RL Algorithms

This table displays the RL algorithms that are implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.

Name      Refactored [1]  Recurrent  Box     Discrete  Multi Processing
A2C       ✔️              ✔️         ✔️      ✔️        ✔️
ACER      ✔️              ✔️         ❌ [4]  ✔️        ✔️
ACKTR     ✔️              ✔️         ✔️      ✔️        ✔️
DDPG      ✔️              ❌         ✔️      ❌        ✔️ [3]
DQN       ✔️              ❌         ❌      ✔️        ❌
HER       ✔️              ❌         ✔️      ✔️        ❌
GAIL [2]  ✔️              ✔️         ✔️      ✔️        ✔️ [3]
PPO1      ✔️              ❌         ✔️      ✔️        ✔️ [3]
PPO2      ✔️              ✔️         ✔️      ✔️        ✔️
SAC       ✔️              ❌         ✔️      ❌        ❌
TD3       ✔️              ❌         ✔️      ❌        ❌
TRPO      ✔️              ❌         ✔️      ✔️        ✔️ [3]
[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Multi Processing with MPI.
[4] TODO, in project scope.
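
For algorithms whose Multi Processing column is checked without the [3] footnote (e.g. A2C or PPO2), environments are parallelised by vectorising them rather than via MPI. A minimal sketch, assuming Stable Baselines 2.x and its SubprocVecEnv helper:

    import gym

    from stable_baselines import PPO2
    from stable_baselines.common.vec_env import SubprocVecEnv

    def make_env():
        # Each subprocess builds its own copy of the environment
        return gym.make('CartPole-v1')

    if __name__ == '__main__':
        # Run 4 environments in parallel, one per process
        env = SubprocVecEnv([make_env for _ in range(4)])
        model = PPO2('MlpPolicy', env, verbose=1)
        model.learn(total_timesteps=25000)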

Note

Non-array spaces such as Dict or Tuple are not currently supported by any algorithm, except HER, which supports Dict observation spaces when working with gym.GoalEnv.
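
To illustrate the HER exception, here is a minimal sketch using BitFlippingEnv, a goal-based toy environment shipped with Stable Baselines (the hyperparameters are illustrative assumptions, not recommended values):

    from stable_baselines import HER, SAC
    from stable_baselines.common.bit_flipping_env import BitFlippingEnv

    # BitFlippingEnv is a gym.GoalEnv with Dict observations:
    # 'observation', 'achieved_goal' and 'desired_goal'.
    env = BitFlippingEnv(n_bits=10, continuous=True, max_steps=10)

    # HER wraps an off-policy algorithm (here SAC) and relabels stored
    # transitions with achieved goals ('future' sampling strategy).
    model = HER('MlpPolicy', env, SAC, n_sampled_goal=4,
                goal_selection_strategy='future', verbose=1)
    model.learn(total_timesteps=5000)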

Actions gym.spaces (see the example after this list):

  • Box: An N-dimensional box that contains every point in the action space.
  • Discrete: A list of possible actions, where only one action can be used per timestep.
  • MultiDiscrete: A list of possible action sets, where one action from each discrete set can be used per timestep.
  • MultiBinary: A list of possible actions, where any combination of the actions can be used at each timestep.
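
To make the four space types concrete, a short example constructing and sampling each one with gym.spaces (standard gym API, independent of Stable Baselines):

    import numpy as np
    from gym import spaces

    # Box: continuous actions, e.g. 3 torques in [-1, 1]
    box = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

    # Discrete: one of 5 mutually exclusive actions per timestep
    discrete = spaces.Discrete(5)

    # MultiDiscrete: one choice from each set per timestep (3 and 2 options)
    multi_discrete = spaces.MultiDiscrete([3, 2])

    # MultiBinary: 4 independent on/off switches per timestep
    multi_binary = spaces.MultiBinary(4)

    for space in (box, discrete, multi_discrete, multi_binary):
        print(type(space).__name__, space.sample())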

Note

Some logging values (like ep_rewmean, eplenmean) are only available when using a Monitor wrapper. See Issue #339 for more info.
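
A minimal sketch of adding the Monitor wrapper (from stable_baselines.bench) so those episode statistics are recorded; passing filename=None keeps the statistics in memory only, which is one possible setup, not the only one:

    import gym

    from stable_baselines import PPO2
    from stable_baselines.bench import Monitor

    env = gym.make('CartPole-v1')
    # Monitor tracks episode rewards and lengths; pass a filename
    # (e.g. 'logs/monitor.csv') to also write them to disk.
    env = Monitor(env, filename=None)

    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=10000)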