Changelog
For download links, please see the GitHub release page.
Release 2.5.0 (2019-03-28)
Working GAIL, pretrain RL models and hotfix for A2C with continuous actions
- fixed various bugs in GAIL
- added scripts to generate datasets for GAIL
- added tests for GAIL + data for Pendulum-v0
- removed unused `utils` file in DQN folder
- fixed a bug in A2C where actions were cast to `int32` even in the continuous case
- added additional logging to A2C when Monitor wrapper is used
- changed logging for PPO2: do not display NaN when reward info is not present
- changed default value of A2C learning rate schedule
- removed behavior cloning script
- added `pretrain` method to base class, in order to use behavior cloning on all models
- fixed `close()` method for DummyVecEnv
- added support for Dict spaces in DummyVecEnv and SubprocVecEnv (@AdamGleave)
- added support for arbitrary multiprocessing start methods and added a warning that SubprocVecEnv is not thread-safe by default (@AdamGleave)
- added support for Discrete actions for GAIL
- fixed deprecation warning for tf: replaced `tf.to_float()` by `tf.cast()`
- fixed bug in saving and loading ddpg model when using normalization of obs or returns (@tperol)
- changed DDPG default buffer size from 100 to 50000.
- fixed a bug in `ddpg.py` in `combined_stats` for eval: computed mean on `eval_episode_rewards` and `eval_qs` (@keshaviyengar)
- fixed a bug in `setup.py` that would error on non-GPU systems without TensorFlow installed
Release 2.4.1 (2019-02-11)
Bug fixes and improvements
- fixed computation of training metrics in TRPO and PPO1
- added `reset_num_timesteps` keyword when calling `train()` to continue tensorboard learning curves
- reduced the size taken by tensorboard logs (added a `full_tensorboard_log` parameter to enable full logging, which was the previous behavior)
- fixed image detection for tensorboard logging
- fixed ACKTR for recurrent policies
- fixed gym breaking changes
- fixed custom policy examples in the doc for DQN and DDPG
- removed gym spaces patch for equality functions
- fixed tensorflow dependency: the CPU version was being installed, overwriting tensorflow-gpu when present
- fixed a bug in `traj_segment_generator` (used in PPO1 and TRPO) where `new` was not updated (spotted by @junhyeokahn)
Release 2.4.0 (2019-01-17)
Soft Actor-Critic (SAC) and policy kwargs
- added Soft Actor-Critic (SAC) model
- fixed a bug in DQN where the `prioritized_replay_beta_iters` param was not used
- fixed DDPG that did not save target network parameters
- fixed bug related to shape of true_reward (@abhiskk)
- fixed example code in documentation of `tf_util:Function` (@JohannesAck)
- added learning rate schedule for SAC
- fixed action probability for continuous actions with actor-critic models
- added optional parameter to `action_probability` for likelihood calculation of a given action being taken
- added more flexible custom LSTM policies
- added auto entropy coefficient optimization for SAC
- clip continuous actions at test time too for all algorithms (except SAC/DDPG where it is not needed)
- added a means to pass kwargs to the policy when creating a model (+ save those kwargs)
- fixed DQN examples in DQN folder
- added possibility to pass activation function for DDPG, DQN and SAC
Release 2.3.0 (2018-12-05)
- added support for storing model in a file-like object (thanks to @erniejunior)
- fixed wrong image detection when using tensorboard logging with DQN
- fixed bug in PPO2 when passing a non-callable learning rate after loading
- fixed tensorboard logging in PPO2 when `nminibatches=1`
- added early stopping via callback return value (@erniejunior)
- added more flexible custom MLP policies (@erniejunior)
Release 2.2.1 (2018-11-18)
- added VecVideoRecorder to record mp4 videos from environments
Release 2.2.0 (2018-11-07)
- Hotfix for PPO2: the wrong placeholder was used for the value function
Release 2.1.2 (2018-11-06)
- added `async_eigen_decomp` parameter for ACKTR and set it to `False` by default (removes deprecation warnings)
- added methods for calling env methods/setting attributes inside a VecEnv (thanks to @bjmuld)
- updated gym minimum version
Release 2.1.1 (2018-10-20)
- fixed MpiAdam synchronization issue in PPO1 (thanks to @brendenpetersen) issue #50
- fixed dependency issues (new mujoco-py requires a mujoco licence + gym broke MultiDiscrete space shape)
Release 2.1.0 (2018-10-02)
Warning
This version contains breaking changes for DQN policies, please read the full details
Bug fixes + doc update
- added patch fix for the equality function of gym.spaces.MultiDiscrete and gym.spaces.MultiBinary
- fixes for DQN `action_probability`
- re-added double DQN + refactored DQN policies breaking changes
- replaced `async` with `async_eigen_decomp` in ACKTR/KFAC for Python 3.7 compatibility
- removed action clipping for prediction of continuous actions (see issue #36)
- fixed NaN issue due to clipping the continuous action in the wrong place (issue #36)
- documentation was updated (policy + DDPG example hyperparameters)
Release 2.0.0 (2018-09-18)
Warning
This version contains breaking changes, please read the full details
Tensorboard, refactoring and bug fixes
- Renamed DeepQ to DQN breaking changes
- Renamed DeepQPolicy to DQNPolicy breaking changes
- fixed DDPG behavior breaking changes
- changed default policies for DDPG, so that DDPG now works correctly breaking changes
- added more documentation (some modules from common).
- added doc about using custom env
- added Tensorboard support for A2C, ACER, ACKTR, DDPG, DeepQ, PPO1, PPO2 and TRPO
- added episode reward to Tensorboard
- added documentation for Tensorboard usage
- added Identity for Box action space
- fixed render function ignoring parameters when using wrapped environments
- fixed PPO1 and TRPO done values for recurrent policies
- fixed image normalization not occurring when using images
- updated VecEnv objects for the new Gym version
- added test for DDPG
- refactored DQN policies
- added registry for policies, can be passed as string to the agent
- added documentation for custom policies + policy registration
- fixed numpy warning when using DDPG Memory
- fixed DummyVecEnv not copying the observation array when stepping and resetting
- added pre-built docker images + installation instructions
- added `deterministic` argument in the predict function
- added assert in PPO2 for recurrent policies
- fixed predict function to handle both vectorized and unwrapped environments
- added input check to the predict function
- refactored ActorCritic models to reduce code duplication
- refactored Off Policy models (to begin HER and replay_buffer refactoring)
- added tests for auto vectorization detection
- fixed render function, to handle positional arguments
Release 1.0.7 (2018-08-29)
Bug fixes and documentation
- added html documentation using sphinx + integration with read the docs
- cleaned up README + typos
- fixed normalization for DQN with images
- fixed DQN identity test
Release 1.0.1 (2018-08-20)
Refactored Stable Baselines
- refactored A2C, ACER, ACKTR, DDPG, DeepQ, GAIL, TRPO, PPO1 and PPO2 under a single constant class
- added callback to refactored algorithm training
- added saving and loading to refactored algorithms
- refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies
- added new policies for most algorithms (Mlp, MlpLstm, MlpLnLstm, Cnn, CnnLstm and CnnLnLstm)
- added dynamic environment switching (so continual RL learning is now feasible)
- added prediction from observation and action probability from observation for all the algorithms
- fixed graph issues, so models won't collide in names
- fixed behavior_clone weight loading for GAIL
- fixed Tensorflow using all the GPU VRAM
- fixed models so that they are all compatible with vectorized environments
- fixed `set_global_seed` to update `gym.spaces`'s random seed
- fixed PPO1 and TRPO performance issues when learning the identity function
- added new tests for loading, saving, continuous actions and learning the identity function
- fixed DQN wrapping for atari
- added saving and loading for Vecnormalize wrapper
- added automatic detection of action space (for the policy network)
- fixed ACER buffer with constant values assuming `n_stack=4`
- fixed some RL algorithms not clipping the action to be in the action_space, when using `gym.spaces.Box`
- refactored algorithms can take either a `gym.Environment` or a `str` ([if the environment name is registered](https://github.com/openai/gym/wiki/Environments))
- Hotfix in ACER (compared to v1.0.0)
Future Work:
- Finish refactoring HER
- Refactor ACKTR and ACER for continuous implementation
Release 0.1.6 (2018-07-27)
Deobfuscation of the code base + pep8 and fixes
- Fixed `tf.session().__enter__()` being used, rather than `sess = tf.session()` and passing the session to the objects
- Fixed uneven scoping of TensorFlow Sessions throughout the code
- Fixed rolling vecwrapper to handle observations that are not only grayscale images
- Fixed deepq saving the environment when trying to save itself
- Fixed `ValueError: Cannot take the length of Shape with unknown rank.` in `acktr` when running the `run_atari.py` script
- Fixed graph conflicts when calling baselines sequentially
- Fixed mean on empty array warning with deepq
- Fixed kfac eigen decomposition not being cast to float64 when the parameter `use_float64` is set to `True`
- Fixed the Dataset data loader not correctly resetting the id position when shuffling is disabled
- Fixed `EOFError` when reading from connection in the `worker` in `subproc_vec_env.py`
- Fixed `behavior_clone` weight loading and saving for GAIL
- Avoided taking the square root of a negative number in `trpo_mpi.py`
- Removed some duplicated code (a2cpolicy, trpo_mpi)
- Removed unused, undocumented and crashing function `reset_task` in `subproc_vec_env.py`
- Reformatted code to PEP8 style
- Documented all the codebase
- Added atari tests
- Added logger tests
Missing: tests for ACKTR continuous (+ HER, which relies on mujoco…)
Maintainers
Stable-Baselines is currently maintained by Ashley Hill (aka @hill-a), Antonin Raffin (aka @araffin), Maximilian Ernestus (aka @erniejunior) and Adam Gleave (@AdamGleave).
Contributors (since v2.0.0):
In random order…
Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar @abhiskk @JohannesAck @EliasHasle @mrakgr @Bleyddyn @antoine-galataud @junhyeokahn @AdamGleave @keshaviyengar @tperol