RL Algorithms
This table displays the rl algorithms that are implemented in the stable baselines project,
along with some useful characteristics: support for recurrent policies, discrete/continuous actions, multiprocessing.
| Name |
Refactored |
Recurrent |
Box |
Discrete |
Multi Processing |
| A2C |
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
| ACER |
✔️ |
✔️ |
❌ |
✔️ |
✔️ |
| ACKTR |
✔️ |
✔️ |
❌ |
✔️ |
✔️ |
| DDPG |
✔️ |
❌ |
✔️ |
❌ |
❌ |
| DQN |
✔️ |
❌ |
❌ |
✔️ |
❌ |
| GAIL |
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
| PPO1 |
✔️ |
❌ |
✔️ |
✔️ |
✔️ |
| PPO2 |
✔️ |
✔️ |
✔️ |
✔️ |
✔️ |
| SAC |
✔️ |
❌ |
✔️ |
❌ |
❌ |
| TRPO |
✔️ |
❌ |
✔️ |
✔️ |
✔️ |
Note
Non-array spaces such as Dict or Tuple are not currently supported by any algorithm.
Actions gym.spaces:
Box: A N-dimensional box that containes every point in the action
space.
Discrete: A list of possible actions, where each timestep only
one of the actions can be used.
MultiDiscrete: A list of possible actions, where each timestep only one action of each discrete set can be used.
MultiBinary: A list of possible actions, where each timestep any of the actions can be used in any combination.