RL Algorithms
This table lists the RL algorithms implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.
Name | Refactored [1] | Recurrent | Box | Discrete | Multi Processing |
---|---|---|---|---|---|
A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ACER | ✔️ | ✔️ | ❌ [4] | ✔️ | ✔️ |
ACKTR | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
DDPG | ✔️ | ❌ | ✔️ | ❌ | ✔️ [3] |
DQN | ✔️ | ❌ | ❌ | ✔️ | ❌ |
HER | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
GAIL [2] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ [3] |
PPO1 | ✔️ | ❌ | ✔️ | ✔️ | ✔️ [3] |
PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
SAC | ✔️ | ❌ | ✔️ | ❌ | ❌ |
TD3 | ✔️ | ❌ | ✔️ | ❌ | ❌ |
TRPO | ✔️ | ❌ | ✔️ | ✔️ | ✔️ [3] |
[1] | Whether or not the algorithm has been refactored to fit the BaseRLModel class. |
[2] | Only implemented for TRPO. |
[3] | Multi Processing with MPI. |
[4] | TODO, in project scope. |
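
The table can also be queried programmatically when choosing an algorithm. The following is a minimal sketch and not part of Stable Baselines: it transcribes the Recurrent, Box, Discrete and Multi Processing columns into a plain Python dictionary and filters on them. The names `Support`, `ALGORITHMS` and `find_algorithms` are hypothetical, and footnote qualifiers (such as MPI-only multiprocessing) are folded into a plain `True`.

```python
from typing import Dict, List, NamedTuple


class Support(NamedTuple):
    """One row of the table above (the 'Refactored' column is omitted)."""
    recurrent: bool
    box: bool
    discrete: bool
    multiprocessing: bool  # True also covers the MPI-only rows (footnote [3])


# Hypothetical transcription of the table; not an API of the library.
ALGORITHMS: Dict[str, Support] = {
    "A2C":   Support(True,  True,  True,  True),
    "ACER":  Support(True,  False, True,  True),
    "ACKTR": Support(True,  True,  True,  True),
    "DDPG":  Support(False, True,  False, True),
    "DQN":   Support(False, False, True,  False),
    "HER":   Support(False, True,  True,  False),
    "GAIL":  Support(True,  True,  True,  True),
    "PPO1":  Support(False, True,  True,  True),
    "PPO2":  Support(True,  True,  True,  True),
    "SAC":   Support(False, True,  False, False),
    "TD3":   Support(False, True,  False, False),
    "TRPO":  Support(False, True,  True,  True),
}


def find_algorithms(**required: bool) -> List[str]:
    """Return the algorithms whose table row matches every given column value."""
    return [
        name
        for name, row in ALGORITHMS.items()
        if all(getattr(row, column) == value for column, value in required.items())
    ]


if __name__ == "__main__":
    # Recurrent policies on continuous (Box) actions.
    print(find_algorithms(recurrent=True, box=True))
    # Algorithms that support Box but not Discrete actions.
    print(find_algorithms(box=True, discrete=False))
```

Keyword arguments mirror the column names, so a query such as `find_algorithms(discrete=True, multiprocessing=False)` reads much like the table itself.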
Note
Non-array spaces such as Dict or Tuple are not currently supported by any algorithm, except HER for Dict when working with gym.GoalEnv.
Actions gym.spaces:
- Box: An N-dimensional box that contains every point in the action space.
- Discrete: A list of possible actions, where only one of the actions can be used at each timestep.
- MultiDiscrete: A list of possible actions, where only one action from each discrete set can be used at each timestep.
- MultiBinary: A list of possible actions, where any of the actions can be used in any combination at each timestep.
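
To make the difference between the four space types concrete, here is a plain-Python sketch of what a single action looks like for each. It deliberately avoids gym itself so that it runs anywhere: the `sample_*` helpers are hypothetical stand-ins, and each docstring names the `gym.spaces` constructor it imitates.

```python
import random
from typing import List


def sample_box(low: float, high: float, size: int) -> List[float]:
    """Like gym.spaces.Box(low, high, shape=(size,)): one real value per dimension."""
    return [random.uniform(low, high) for _ in range(size)]


def sample_discrete(n: int) -> int:
    """Like gym.spaces.Discrete(n): exactly one action index in [0, n)."""
    return random.randrange(n)


def sample_multi_discrete(nvec: List[int]) -> List[int]:
    """Like gym.spaces.MultiDiscrete(nvec): one index per discrete set."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n: int) -> List[int]:
    """Like gym.spaces.MultiBinary(n): each of the n actions is independently off (0) or on (1)."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print(sample_box(-1.0, 1.0, 3))          # three floats between -1 and 1
    print(sample_discrete(4))                # a single index among 4 actions
    print(sample_multi_discrete([3, 2, 5]))  # one index for each of three sets
    print(sample_multi_binary(6))            # six independent 0/1 flags
```

The return types carry the distinction: a Discrete action is a single integer, while Box, MultiDiscrete and MultiBinary actions are vectors whose entries are real-valued, bounded integers, or 0/1 flags respectively.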
Note
Some logging values (like ep_rewmean, eplenmean) are only available when using a Monitor wrapper. See Issue #339 for more info.