This table lists the RL algorithms implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.
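| Name | Refactored (1) | Recurrent | Box | Discrete | Multi Processing |
|---|---|---|---|---|---|
| GAIL (2) | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ (3) |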
(1) Whether or not the algorithm has been refactored to fit the …
(2) Only implemented for TRPO.
(3) Multi Processing with MPI.
(4) TODO, in project scope.
Non-array spaces such as Dict or Tuple are not currently supported by any algorithm, except HER for Dict spaces when working with …
Action spaces (gym.spaces):
- Box: an N-dimensional box that contains every point in the action space.
- Discrete: a list of possible actions, where at each timestep only one of the actions can be used.
- MultiDiscrete: a list of possible actions, where at each timestep only one action from each discrete set can be used.
- MultiBinary: a list of possible actions, where at each timestep any of the actions can be used, in any combination.
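To make these four descriptions concrete, here is a small standard-library sketch of what a valid action looks like for each space type. The helper names (`in_box`, `in_discrete`, `in_multidiscrete`, `in_multibinary`) are hypothetical and only mirror the membership rules described above; they are not part of gym. In gym itself the spaces are constructed as `gym.spaces.Box(low, high, shape)`, `Discrete(n)`, `MultiDiscrete(nvec)` and `MultiBinary(n)`.

```python
from typing import Sequence


def in_box(action: Sequence[float], low: Sequence[float], high: Sequence[float]) -> bool:
    """Box: every component is a real value within [low_i, high_i]."""
    return len(action) == len(low) == len(high) and all(
        lo <= a <= hi for a, lo, hi in zip(action, low, high)
    )


def in_discrete(action: int, n: int) -> bool:
    """Discrete: exactly one integer choice in {0, ..., n - 1} per timestep."""
    return isinstance(action, int) and 0 <= action < n


def in_multidiscrete(action: Sequence[int], nvec: Sequence[int]) -> bool:
    """MultiDiscrete: one integer choice per sub-space; sub-space i has nvec[i] options."""
    return len(action) == len(nvec) and all(
        isinstance(a, int) and 0 <= a < n for a, n in zip(action, nvec)
    )


def in_multibinary(action: Sequence[int], n: int) -> bool:
    """MultiBinary: n on/off switches, any combination of which may be active."""
    return len(action) == n and all(a in (0, 1) for a in action)


if __name__ == "__main__":
    print(in_box([0.5, -1.0], low=[-1.0, -1.0], high=[1.0, 1.0]))  # True
    print(in_discrete(2, n=3))                                    # True
    print(in_multidiscrete([1, 0, 4], nvec=[2, 3, 5]))            # True
    print(in_multibinary([1, 0, 1, 1], n=4))                      # True
```

The contrast between the last two is the useful part: MultiDiscrete picks exactly one option from each sub-space, while MultiBinary lets any subset of its switches be on at once.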
Some logging values (like eplenmean) are only available when using a Monitor wrapper. See Issue #339 for more info.
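Statistics such as eplenmean are aggregated per episode, so something has to record each episode's length and total reward as it finishes. The standard-library sketch below shows that bookkeeping. `ToyEnv`, `EpisodeStatsWrapper` and `run_episodes` are hypothetical names, not the Stable Baselines `Monitor` class (which lives in `stable_baselines.bench` and can also log these values to a file); the `info["episode"]` entry imitates the Monitor style and is an assumption of this sketch.

```python
import random
from typing import List, Tuple


class ToyEnv:
    """Hypothetical gym-style env: reward 1.0 per step, each step ends the episode with probability 0.2."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def reset(self) -> float:
        return 0.0

    def step(self, action: int) -> Tuple[float, float, bool, dict]:
        done = self.rng.random() < 0.2
        return 0.0, 1.0, done, {}


class EpisodeStatsWrapper:
    """Minimal Monitor-like wrapper: records total reward and length of every finished episode."""

    def __init__(self, env: ToyEnv) -> None:
        self.env = env
        self.episode_rewards: List[float] = []
        self.episode_lengths: List[int] = []
        self._reward = 0.0
        self._length = 0

    def reset(self) -> float:
        self._reward, self._length = 0.0, 0
        return self.env.reset()

    def step(self, action: int) -> Tuple[float, float, bool, dict]:
        obs, reward, done, info = self.env.step(action)
        self._reward += reward
        self._length += 1
        if done:
            # Keep the finished episode's stats so a logger can average them later.
            self.episode_rewards.append(self._reward)
            self.episode_lengths.append(self._length)
            # Assumed Monitor-style summary exposed through `info`.
            info = dict(info, episode={"r": self._reward, "l": self._length})
        return obs, reward, done, info


def run_episodes(env: EpisodeStatsWrapper, n_episodes: int) -> EpisodeStatsWrapper:
    """Drive the wrapped env with a constant dummy action until n episodes have finished."""
    for _ in range(n_episodes):
        env.reset()
        done = False
        while not done:
            _, _, done, _ = env.step(0)
    return env


if __name__ == "__main__":
    stats = run_episodes(EpisodeStatsWrapper(ToyEnv(seed=0)), n_episodes=5)
    # Mean episode length: the kind of value an "eplenmean" log entry reports.
    print(sum(stats.episode_lengths) / len(stats.episode_lengths))
```

Without such a wrapper, nothing records where episodes start and end, so per-episode averages have no data to draw from.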
Completely reproducible results are not guaranteed across TensorFlow releases or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.
To make computations deterministic on CPU, for your specific problem on one specific platform, you need to pass a seed argument when creating the model and set n_cpu_tf_sess=1 (the number of CPUs used by the TensorFlow session).
If you pass an environment to the model using set_env(), you also need to seed that environment first.
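With Stable Baselines this amounts to something like `PPO2('MlpPolicy', env, seed=0, n_cpu_tf_sess=1)`, plus calling `env.seed(0)` before `model.set_env(env)`. The standard-library sketch below illustrates why both the model and the environment need a seed: each keeps its own random generator, so seeding only one of them still leaves the run nondeterministic. `ToyEnv`, `toy_rollout` and `run` are hypothetical stand-ins, and the sketch does not model the thread-level nondeterminism that n_cpu_tf_sess=1 removes.

```python
import random
from typing import List, Optional


class ToyEnv:
    """Hypothetical environment with noisy rewards and its own RNG."""

    def __init__(self) -> None:
        self.rng = random.Random()  # no seed given: initialized from OS entropy

    def seed(self, seed: int) -> None:
        self.rng.seed(seed)

    def step(self, action: int) -> float:
        # Reward depends on the action plus environment noise.
        return action + self.rng.gauss(0.0, 1.0)


def toy_rollout(model_seed: int, env: ToyEnv, steps: int = 5) -> List[float]:
    """Hypothetical 'policy' that samples actions from its own seeded RNG."""
    policy_rng = random.Random(model_seed)
    return [env.step(policy_rng.randint(0, 1)) for _ in range(steps)]


def run(model_seed: int, env_seed: Optional[int]) -> List[float]:
    env = ToyEnv()
    if env_seed is not None:
        env.seed(env_seed)  # analogous to seeding the env before set_env()
    return toy_rollout(model_seed, env)


if __name__ == "__main__":
    print(run(0, env_seed=0) == run(0, env_seed=0))        # True: model and env both seeded
    print(run(0, env_seed=None) == run(0, env_seed=None))  # almost surely False: env left unseeded
```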
Because of current limitations of TensorFlow 1.x, we cannot yet ensure reproducible results on the GPU. This issue is solved in Stable-Baselines3, the “PyTorch edition”.
TD3 sometimes fails to produce reproducible results for unclear reasons, even when following the steps above (cf. PR #492). If you find the reason, please open an issue ;)
Credit: part of the Reproducibility section comes from PyTorch Documentation