RL Algorithms

This table lists the RL algorithms implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous action spaces, and multiprocessing.

| Name     | Refactored [1] | Recurrent | Box    | Discrete | Multi Processing |
| -------- | -------------- | --------- | ------ | -------- | ---------------- |
| A2C      | ✔️             | ✔️        | ✔️     | ✔️       | ✔️               |
| ACER     | ✔️             | ✔️        | ❌ [4] | ✔️       | ✔️               |
| ACKTR    | ✔️             | ✔️        | ❌ [4] | ✔️       | ✔️               |
| DDPG     | ✔️             | ❌        | ✔️     | ❌       | ✔️ [3]           |
| DQN      | ✔️             | ❌        | ❌     | ✔️       | ❌               |
| HER      | ✔️             | ❌        | ✔️     | ✔️       | ❌               |
| GAIL [2] | ✔️             | ✔️        | ✔️     | ✔️       | ✔️ [3]           |
| PPO1     | ✔️             | ❌        | ✔️     | ✔️       | ✔️ [3]           |
| PPO2     | ✔️             | ✔️        | ✔️     | ✔️       | ✔️               |
| SAC      | ✔️             | ❌        | ✔️     | ❌       | ❌               |
| TRPO     | ✔️             | ❌        | ✔️     | ✔️       | ✔️ [3]           |
[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Multiprocessing with MPI.
[4] TODO, in project scope.

Note

Non-array spaces such as Dict or Tuple are not currently supported by any algorithm, except HER, which supports Dict observations when working with gym.GoalEnv.
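To make the HER exception concrete, the sketch below builds the Dict observation layout that gym.GoalEnv prescribes: three keys, "observation", "achieved_goal" and "desired_goal". The shapes and bounds here are illustrative placeholders, not taken from any particular environment; it assumes the gym package is installed.

```python
from gym import spaces

# The Dict observation layout expected by gym.GoalEnv (and consumed by HER).
# Shapes and bounds below are arbitrary, for illustration only.
goal_obs_space = spaces.Dict({
    "observation":   spaces.Box(low=-1.0, high=1.0, shape=(4,)),
    "achieved_goal": spaces.Box(low=-1.0, high=1.0, shape=(2,)),
    "desired_goal":  spaces.Box(low=-1.0, high=1.0, shape=(2,)),
})

# A sample is a dict with one array per key.
sample = goal_obs_space.sample()
assert set(sample) == {"observation", "achieved_goal", "desired_goal"}
```

Algorithms other than HER would reject such a space, since they expect a flat array observation.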

Actions gym.spaces:

  • Box: An N-dimensional box that contains every point in the action space.
  • Discrete: A list of possible actions, where only one action can be used per timestep.
  • MultiDiscrete: A list of possible actions, where at each timestep one action from each discrete set can be used.
  • MultiBinary: A list of possible actions, where at each timestep any of the actions can be used in any combination.