# RL Algorithms

This table lists the RL algorithms implemented in the Stable Baselines project,
along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.
| Name  | Refactored | Recurrent | Box | Discrete | Multi Processing |
| ----- | ---------- | --------- | --- | -------- | ---------------- |
| A2C   | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| ACER  | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
| ACKTR | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
| DDPG  | ✔️ | ❌ | ✔️ | ❌ | ✔️ |
| DQN   | ✔️ | ❌ | ❌ | ✔️ | ❌ |
| HER   | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| GAIL  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| PPO1  | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
| PPO2  | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| SAC   | ✔️ | ❌ | ✔️ | ❌ | ❌ |
| TRPO  | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
**Note:** Non-array spaces such as `Dict` or `Tuple` are not currently supported by any algorithm,
except HER for `Dict` when working with `gym.GoalEnv`.
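The capability table above can also be expressed as a simple lookup. The sketch below is plain Python transcribed from the table; the dictionary names and helper function are illustrative, not part of the Stable Baselines API:

```python
# Capabilities per algorithm, transcribed from the table above.
# Structure and names are illustrative, not part of the Stable Baselines API.
SUPPORT = {
    "A2C":   dict(recurrent=True,  box=True,  discrete=True,  multiprocessing=True),
    "ACER":  dict(recurrent=True,  box=False, discrete=True,  multiprocessing=True),
    "ACKTR": dict(recurrent=True,  box=False, discrete=True,  multiprocessing=True),
    "DDPG":  dict(recurrent=False, box=True,  discrete=False, multiprocessing=True),
    "DQN":   dict(recurrent=False, box=False, discrete=True,  multiprocessing=False),
    "HER":   dict(recurrent=False, box=True,  discrete=True,  multiprocessing=False),
    "GAIL":  dict(recurrent=True,  box=True,  discrete=True,  multiprocessing=True),
    "PPO1":  dict(recurrent=False, box=True,  discrete=True,  multiprocessing=True),
    "PPO2":  dict(recurrent=True,  box=True,  discrete=True,  multiprocessing=True),
    "SAC":   dict(recurrent=False, box=True,  discrete=False, multiprocessing=False),
    "TRPO":  dict(recurrent=False, box=True,  discrete=True,  multiprocessing=True),
}

def algorithms_supporting(feature):
    """Return the algorithms whose table entry for `feature` is a check mark."""
    return sorted(name for name, caps in SUPPORT.items() if caps[feature])

# Example: which algorithms handle continuous (Box) action spaces?
print(algorithms_supporting("box"))
# Example: which algorithms support recurrent policies?
print(algorithms_supporting("recurrent"))
```

This makes it easy to answer questions like "which algorithms can I try on a continuous-action environment?" without re-reading the table.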
## Actions `gym.spaces`

- `Box`: an N-dimensional box that contains every point in the action space.
- `Discrete`: a list of possible actions, where only one of the actions can be used at each timestep.
- `MultiDiscrete`: a list of possible actions, where at each timestep only one action from each discrete set can be used.
- `MultiBinary`: a list of possible actions, where at each timestep any of the actions can be used in any combination.
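To make the distinction between these space types concrete, here is a minimal pure-Python sketch of how sampling differs between them. These functions are illustrative stand-ins, not the real `gym.spaces` classes (which are NumPy-based); they only mimic the sampling semantics described above:

```python
import random

# Illustrative stand-ins for the gym.spaces sampling semantics; not the gym API.

def sample_box(low, high, n):
    """Box: one real number per dimension, anywhere inside the box."""
    return [random.uniform(low, high) for _ in range(n)]

def sample_discrete(n):
    """Discrete: exactly one of n possible actions per timestep."""
    return random.randrange(n)

def sample_multi_discrete(nvec):
    """MultiDiscrete: one action from each discrete set per timestep."""
    return [random.randrange(n) for n in nvec]

def sample_multi_binary(n):
    """MultiBinary: any combination of the n actions (each on or off)."""
    return [random.randrange(2) for _ in range(n)]

print(sample_box(-1.0, 1.0, 3))       # 3 floats in [-1, 1]
print(sample_discrete(4))             # a single integer in [0, 4)
print(sample_multi_discrete([3, 2]))  # one choice from each set
print(sample_multi_binary(3))         # three independent on/off flags
```

For example, a robot arm with three torque-controlled joints would use a `Box` of dimension 3, while a game with four buttons that can be pressed simultaneously would use `MultiBinary(4)`.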