RL Algorithms

This table displays the RL algorithms that are implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.

Name      Refactored [1]  Recurrent  Box     Discrete  Multi Processing
A2C       ✔️              ✔️         ✔️      ✔️        ✔️
ACER      ✔️              ✔️         ❌ [4]  ✔️        ✔️
ACKTR     ✔️              ✔️         ✔️      ✔️        ✔️
DDPG      ✔️              ❌         ✔️      ❌        ✔️ [3]
DQN       ✔️              ❌         ❌      ✔️        ❌
HER       ✔️              ❌         ✔️      ✔️        ❌
GAIL [2]  ✔️              ✔️         ✔️      ✔️        ✔️ [3]
PPO1      ✔️              ❌         ✔️      ✔️        ✔️ [3]
PPO2      ✔️              ✔️         ✔️      ✔️        ✔️
SAC       ✔️              ❌         ✔️      ❌        ❌
TD3       ✔️              ❌         ✔️      ❌        ❌
TRPO      ✔️              ❌         ✔️      ✔️        ✔️ [3]
[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Multi Processing with MPI.
[4] TODO, in project scope.
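
For algorithms whose Multi Processing column is checked without the [3] footnote (e.g. A2C or PPO2), environments are parallelised by vectorising them rather than via MPI. A minimal sketch, assuming Stable Baselines 2.x and its SubprocVecEnv helper:

    import gym

    from stable_baselines import PPO2
    from stable_baselines.common.vec_env import SubprocVecEnv

    def make_env():
        # Each subprocess builds its own copy of the environment
        return gym.make('CartPole-v1')

    if __name__ == '__main__':
        # Run 4 environments in parallel, one per process
        env = SubprocVecEnv([make_env for _ in range(4)])
        model = PPO2('MlpPolicy', env, verbose=1)
        model.learn(total_timesteps=25000)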

Note

Non-array spaces such as Dict or Tuple are not currently supported by any algorithm, except HER, which supports Dict observation spaces when working with gym.GoalEnv.
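
To illustrate the HER exception, here is a minimal sketch using BitFlippingEnv, a goal-based toy environment shipped with Stable Baselines (the hyperparameters are illustrative assumptions, not recommended values):

    from stable_baselines import HER, SAC
    from stable_baselines.common.bit_flipping_env import BitFlippingEnv

    # BitFlippingEnv is a gym.GoalEnv with Dict observations:
    # 'observation', 'achieved_goal' and 'desired_goal'.
    env = BitFlippingEnv(n_bits=10, continuous=True, max_steps=10)

    # HER wraps an off-policy algorithm (here SAC) and relabels stored
    # transitions with achieved goals ('future' sampling strategy).
    model = HER('MlpPolicy', env, SAC, n_sampled_goal=4,
                goal_selection_strategy='future', verbose=1)
    model.learn(total_timesteps=5000)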

Actions gym.spaces (see the example after this list):

  • Box: An N-dimensional box that contains every point in the action space.
  • Discrete: A list of possible actions, where only one action can be used per timestep.
  • MultiDiscrete: A list of possible action sets, where one action from each discrete set can be used per timestep.
  • MultiBinary: A list of possible actions, where any combination of the actions can be used at each timestep.
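
To make the four space types concrete, a short example constructing and sampling each one with gym.spaces (standard gym API, independent of Stable Baselines):

    import numpy as np
    from gym import spaces

    # Box: continuous actions, e.g. 3 torques in [-1, 1]
    box = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

    # Discrete: one of 5 mutually exclusive actions per timestep
    discrete = spaces.Discrete(5)

    # MultiDiscrete: one choice from each set per timestep (3 and 2 options)
    multi_discrete = spaces.MultiDiscrete([3, 2])

    # MultiBinary: 4 independent on/off switches per timestep
    multi_binary = spaces.MultiBinary(4)

    for space in (box, discrete, multi_discrete, multi_binary):
        print(type(space).__name__, space.sample())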

Note

Some logging values (like ep_rewmean, eplenmean) are only available when using a Monitor wrapper. See Issue #339 for more info.
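
A minimal sketch of adding the Monitor wrapper (from stable_baselines.bench) so those episode statistics are recorded; passing filename=None keeps the statistics in memory only, which is one possible setup, not the only one:

    import gym

    from stable_baselines import PPO2
    from stable_baselines.bench import Monitor

    env = gym.make('CartPole-v1')
    # Monitor tracks episode rewards and lengths; pass a filename
    # (e.g. 'logs/monitor.csv') to also write them to disk.
    env = Monitor(env, filename=None)

    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=10000)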