RL Algorithms

This table lists the RL algorithms implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.

Name     | Refactored [1] | Recurrent | Box    | Discrete | Multi Processing
A2C      | ✔️             | ✔️        | ✔️     | ✔️       | ✔️
ACER     | ✔️             | ✔️        | ❌ [5] | ✔️       | ✔️
ACKTR    | ✔️             | ✔️        | ❌ [5] | ✔️       | ✔️
DDPG     | ✔️             | ✔️        | ✔️     | ❌       | ❌
DQN      | ✔️             | ❌        | ❌     | ✔️       | ❌
GAIL [2] | ✔️             | ✔️        | ✔️     | ✔️       | ✔️ [4]
PPO1     | ✔️             | ✔️        | ✔️     | ✔️       | ✔️ [4]
PPO2     | ✔️             | ✔️        | ✔️     | ✔️       | ✔️
TRPO     | ✔️             | ✔️        | ✔️     | ✔️       | ✔️ [4]
[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Only implemented for DDPG.
[4] Multi Processing with MPI.
[5] TODO, in project scope.
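
As a rough illustration of how the Box and Discrete columns can be read, the sketch below transcribes them into a plain Python dictionary and filters it by action space. The names `ACTION_SPACE_SUPPORT` and `algorithms_for` are hypothetical and not part of Stable Baselines. The entries assume that a missing check mark means "not supported", with DDPG treated as Box-only and DQN as Discrete-only.

```python
# Hypothetical lookup table transcribed from the Box / Discrete columns above.
# Assumption: a missing check mark means "not supported"; DDPG is treated as
# Box-only and DQN as Discrete-only.
ACTION_SPACE_SUPPORT = {
    "A2C": {"Box", "Discrete"},
    "ACER": {"Discrete"},   # Box is marked TODO, see footnote [5]
    "ACKTR": {"Discrete"},  # Box is marked TODO, see footnote [5]
    "DDPG": {"Box"},
    "DQN": {"Discrete"},
    "GAIL": {"Box", "Discrete"},
    "PPO1": {"Box", "Discrete"},
    "PPO2": {"Box", "Discrete"},
    "TRPO": {"Box", "Discrete"},
}


def algorithms_for(space_name):
    """Return the sorted names of algorithms whose row lists `space_name`."""
    return sorted(
        name
        for name, spaces in ACTION_SPACE_SUPPORT.items()
        if space_name in spaces
    )


if __name__ == "__main__":
    print("Box:", algorithms_for("Box"))
    print("Discrete:", algorithms_for("Discrete"))
```

A lookup like this is only a reading aid; the table itself remains the reference for what each algorithm supports.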

Actions gym.spaces:

  • Box: An N-dimensional box that contains every point in the action space.
  • Discrete: A list of possible actions, where only one of the actions can be used at each timestep.
  • MultiDiscrete: A list of possible actions, where only one action of each discrete set can be used at each timestep.
  • MultiBinary: A list of possible actions, where any combination of the actions can be used at each timestep.
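
To make the four descriptions above concrete, here is a minimal standard-library sketch of what a single action drawn from each kind of space could look like. It does not use the gym API: the `sample_*` helpers are hypothetical stand-ins, and in gym the corresponding classes are `gym.spaces.Box`, `gym.spaces.Discrete`, `gym.spaces.MultiDiscrete` and `gym.spaces.MultiBinary`, each providing a `sample()` method.

```python
import random


def sample_box(low, high, n):
    """Box: n real-valued components, each within [low, high]."""
    return [random.uniform(low, high) for _ in range(n)]


def sample_discrete(n):
    """Discrete: exactly one action index from {0, ..., n - 1}."""
    return random.randrange(n)


def sample_multi_discrete(nvec):
    """MultiDiscrete: one index per discrete set; set i has nvec[i] options."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n):
    """MultiBinary: n independent on/off flags, any combination allowed."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print("Box:          ", sample_box(-1.0, 1.0, 3))
    print("Discrete:     ", sample_discrete(4))
    print("MultiDiscrete:", sample_multi_discrete([3, 2]))
    print("MultiBinary:  ", sample_multi_binary(5))
```

The values printed differ from run to run; what matters is the shape of each action: a vector of floats, a single index, one index per set, and a vector of 0/1 flags.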