Vectorized Environments
Vectorized environments stack n independent environments into a single environment. Instead of training an RL agent on 1 environment per step, we can train it on n environments per step; with SubprocVecEnv, each environment runs in its own process. Because of this, actions passed to the environment are now a vector (of dimension n). The same holds for observations, rewards and end-of-episode signals (dones). In the case of non-array observation spaces such as Dict or Tuple, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n).
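To make the Dict case concrete, here is a minimal sketch in plain Python of how n per-environment dictionary observations could be regrouped into one dictionary of n-long batches. The function name `stack_dict_observations` and the keys `position` and `fuel` are hypothetical and chosen only for illustration; this is not the library's implementation.

```python
from typing import Dict, List


def stack_dict_observations(observations: List[Dict[str, list]]) -> Dict[str, list]:
    """Turn n per-environment dict observations into one dict of n-long batches."""
    keys = observations[0].keys()
    return {key: [obs[key] for obs in observations] for key in keys}


# Three hypothetical environments, each returning a Dict observation whose
# sub-spaces have different shapes.
per_env_obs = [
    {"position": [0.0, 1.0], "fuel": [5]},
    {"position": [2.0, 3.0], "fuel": [4]},
    {"position": [4.0, 5.0], "fuel": [3]},
]

batched = stack_dict_observations(per_env_obs)
# Every sub-observation now has a leading dimension of n = 3.
print(len(batched["position"]), len(batched["fuel"]))
```

Each key keeps its own inner shape; only the leading dimension (one entry per environment) is shared.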
Name | Box | Discrete | Dict | Tuple | Multi Processing |
---|---|---|---|---|---|
DummyVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ❌️ |
SubprocVecEnv | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Note
Vectorized environments are required when using wrappers for frame-stacking or normalization.
Note
When using vectorized environments, the environments are automatically reset at the end of each episode.
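The automatic reset can be illustrated with a small sketch. It assumes a hypothetical toy environment (`CountdownEnv`) and a hypothetical sequential wrapper (`TinyVecEnv`); neither is part of the library, and the real classes return arrays rather than Python lists.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.steps_left = length

    def reset(self):
        self.steps_left = self.length
        return self.steps_left

    def step(self, action):
        self.steps_left -= 1
        done = self.steps_left == 0
        reward = float(action)
        return self.steps_left, reward, done, {}


class TinyVecEnv:
    """Steps n environments in sequence and resets each one when it is done."""

    def __init__(self, env_fns):
        # Each entry of env_fns is a function that builds one environment.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done, _info = env.step(action)
            if done:
                # Automatic reset: the caller receives the first observation
                # of the next episode instead of the terminal one.
                obs = env.reset()
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


vec_env = TinyVecEnv([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
first_obs = vec_env.reset()
obs1, rew1, done1 = vec_env.step([1, 0])  # one action per environment
obs2, rew2, done2 = vec_env.step([1, 0])  # the first environment finishes here
print(obs2, done2)
```

After the second step, the first environment reports `done` and its observation is already the one produced by `reset()`, so the training loop never has to call `reset()` itself between episodes.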
Warning
When using SubprocVecEnv, users must wrap the code in an if __name__ == "__main__": block if using the forkserver or spawn start method (default on Windows). On Linux, the default start method is fork, which is not thread safe and can create deadlocks.
For more information, see Python’s multiprocessing guidelines.
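As a hedged sketch of the pattern described in this warning, the script below picks a thread-safe start method and keeps all work behind the `__main__` guard. The helper `pick_thread_safe_start_method` is a hypothetical name; only `multiprocessing.get_all_start_methods()` is a real standard-library call. The SubprocVecEnv call appears only in a comment, so the script runs without the library installed.

```python
import multiprocessing


def pick_thread_safe_start_method():
    """Return 'forkserver' when the platform offers it, otherwise 'spawn'."""
    available = multiprocessing.get_all_start_methods()
    return "forkserver" if "forkserver" in available else "spawn"


def main():
    start_method = pick_thread_safe_start_method()
    print("start method:", start_method)
    # With the library installed, the vectorized environment would be built
    # here, e.g. SubprocVecEnv(env_fns, start_method=start_method), where
    # env_fns is a list of functions that each create one environment.
    return start_method


if __name__ == "__main__":
    # Needed for 'forkserver' and 'spawn': worker processes may import this
    # module, and the guard keeps them from running main() again.
    main()
```

Passing the start method explicitly avoids relying on the platform default, which differs between Linux and Windows.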
DummyVecEnv
class stable_baselines.common.vec_env.DummyVecEnv(env_fns)
Creates a simple vectorized wrapper for multiple environments.
Parameters: env_fns – ([callable]) a list of functions, each of which creates one Gym environment to vectorize
env_method(method_name, *method_args, **method_kwargs)
Provides an interface to call arbitrary class methods of vectorized environments.
Parameters:
- method_name – (str) The name of the env class method to invoke
- method_args – (tuple) Any positional arguments to provide in the call
- method_kwargs – (dict) Any keyword arguments to provide in the call
Returns: (list) List of items returned by the environment’s method call
get_attr(attr_name)
Provides a mechanism for getting class attributes from vectorized environments.
Parameters: attr_name – (str) The name of the attribute whose value to return
Returns: (list) List of values of ‘attr_name’ in all environments
render(*args, **kwargs)
Gym environment rendering.
Parameters: mode – (str) the rendering type
reset()
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
Returns: ([int] or [float]) observation
set_attr(attr_name, value, indices=None)
Provides a mechanism for setting arbitrary class attributes inside vectorized environments.
Parameters:
- attr_name – (str) Name of attribute to assign new value
- value – (obj) Value to assign to ‘attr_name’
- indices – (list,int) Indices of envs to assign value
Returns: (list) any values returned by the env access methods, collected in a list
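The three accessors documented above can be pictured with a minimal in-process sketch. `ToyEnv` and `ListVecEnv` are hypothetical stand-ins written for this example, not the library's classes; the method names and signatures follow the ones listed above.

```python
class ToyEnv:
    """Hypothetical environment with one attribute and one method."""

    def __init__(self):
        self.difficulty = 1

    def scaled(self, factor, offset=0):
        return self.difficulty * factor + offset


class ListVecEnv:
    """Holds n environments in a list, all in the current process."""

    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def _select(self, indices):
        # None means "all environments"; a single int selects one of them.
        if indices is None:
            return self.envs
        if isinstance(indices, int):
            indices = [indices]
        return [self.envs[i] for i in indices]

    def get_attr(self, attr_name):
        return [getattr(env, attr_name) for env in self.envs]

    def set_attr(self, attr_name, value, indices=None):
        return [setattr(env, attr_name, value) for env in self._select(indices)]

    def env_method(self, method_name, *method_args, **method_kwargs):
        return [
            getattr(env, method_name)(*method_args, **method_kwargs)
            for env in self.envs
        ]


vec_env = ListVecEnv([ToyEnv for _ in range(3)])
vec_env.set_attr("difficulty", 5, indices=[0, 2])
difficulties = vec_env.get_attr("difficulty")        # one value per environment
results = vec_env.env_method("scaled", 2, offset=1)  # one result per environment
print(difficulties, results)
```

Every call returns a list with one entry per environment, which is why the documented return type is always a list.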
SubprocVecEnv
class stable_baselines.common.vec_env.SubprocVecEnv(env_fns, start_method=None)
Creates a multiprocess vectorized wrapper for multiple environments.
Warning
Only ‘forkserver’ and ‘spawn’ start methods are thread-safe, which is important when TensorFlow sessions or other non thread-safe libraries are used in the parent (see issue #217). However, compared to ‘fork’ they incur a small start-up cost and have restrictions on global variables. With those methods, users must wrap the code in an if __name__ == "__main__": block.
For more information, see the multiprocessing documentation.
Parameters:
- env_fns – ([callable]) a list of functions, each of which creates one Gym environment to run in a subprocess
- start_method – (str) method used to start the subprocesses. Must be one of the methods returned by multiprocessing.get_all_start_methods(). Defaults to ‘fork’ on available platforms, and ‘spawn’ otherwise.
env_method(method_name, *method_args, **method_kwargs)
Provides an interface to call arbitrary class methods of vectorized environments.
Parameters:
- method_name – (str) The name of the env class method to invoke
- method_args – (tuple) Any positional arguments to provide in the call
- method_kwargs – (dict) Any keyword arguments to provide in the call
Returns: (list) List of items returned by each environment’s method call
get_attr(attr_name)
Provides a mechanism for getting class attributes from vectorized environments (note: the attribute value returned must be picklable).
Parameters: attr_name – (str) The name of the attribute whose value to return
Returns: (list) List of values of ‘attr_name’ in all environments
render(mode='human', *args, **kwargs)
Gym environment rendering.
Parameters: mode – (str) the rendering type
reset()
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
Returns: ([int] or [float]) observation
set_attr(attr_name, value, indices=None)
Provides a mechanism for setting arbitrary class attributes inside vectorized environments. Note that a single value is broadcast to all selected instances, and that the value must be picklable.
Parameters:
- attr_name – (str) Name of attribute to assign new value
- value – (obj) Value to assign to ‘attr_name’
- indices – (list,tuple) Iterable containing indices of envs whose attr to set
Returns: (list) any values returned by the env access methods, collected in a list
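Because values exchanged with the subprocesses must be picklable, it can help to check a value before passing it to get_attr or set_attr. The helper below is a hypothetical convenience built on the standard `pickle` module; it is a sketch, not something the library provides.

```python
import pickle


def is_picklable(value):
    """Return True if `value` can be pickled, as transfer between processes requires."""
    try:
        pickle.dumps(value)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data such as numbers, strings, lists and dicts pickles fine.
print(is_picklable({"difficulty": 5}))
# A lambda cannot be pickled by the standard pickle module.
print(is_picklable(lambda: 5))
```

Values that fail this check are better replaced with plain data (numbers, strings, lists, dicts) before they are sent to the environments.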
Wrappers
VecFrameStack
VecNormalize
class stable_baselines.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08)
A moving-average, normalizing wrapper for vectorized environments. It supports saving and loading the moving average.
Parameters:
- venv – (VecEnv) the vectorized environment to wrap
- training – (bool) Whether or not to update the moving average
- norm_obs – (bool) Whether to normalize observations or not (default: True)
- norm_reward – (bool) Whether to normalize rewards or not (default: True)
- clip_obs – (float) Max absolute value for observation
- clip_reward – (float) Max absolute value for discounted reward
- gamma – (float) discount factor
- epsilon – (float) To avoid division by zero
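The roles of `training`, `clip_obs` and `epsilon` can be illustrated with a small sketch of running normalization on scalar observations. The class `RunningNormalizer` is hypothetical and uses Welford's online update, which is an assumption of this example rather than a description of the library's internals. Reward normalization, which involves the discount factor `gamma`, is not shown.

```python
import math


class RunningNormalizer:
    """Tracks a running mean/variance of scalars and returns clipped z-scores."""

    def __init__(self, clip=10.0, epsilon=1e-8):
        self.clip = clip
        self.epsilon = epsilon
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    @property
    def var(self):
        # Population variance; defaults to 1.0 before any data is seen.
        return self.m2 / self.count if self.count else 1.0

    def update(self, x):
        # Welford's online update of mean and squared deviations.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, training=True):
        if training:
            self.update(x)
        # epsilon keeps the denominator away from zero.
        z = (x - self.mean) / math.sqrt(self.var + self.epsilon)
        return max(-self.clip, min(self.clip, z))


normalizer = RunningNormalizer(clip=10.0)
for observation in [1.0, 2.0, 3.0]:
    normalizer.normalize(observation)  # training=True updates the statistics

# With training=False the statistics stay fixed and the output is clipped.
frozen = normalizer.normalize(1000.0, training=False)
print(normalizer.mean, normalizer.var, frozen)
```

Freezing the statistics with `training=False` is the typical choice at evaluation time, so that test episodes are normalized with the statistics collected during training.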
VecVideoRecorder
class stable_baselines.common.vec_env.VecVideoRecorder(venv, video_folder, record_video_trigger, video_length=200, name_prefix='rl-video')
Wraps a VecEnv or VecEnvWrapper object to record rendered images as an mp4 video. It requires ffmpeg or avconv to be installed on the machine.
Parameters:
- venv – (VecEnv or VecEnvWrapper)
- video_folder – (str) Where to save videos
- record_video_trigger – (func) Function that defines when to start recording. The function takes the current step number and returns whether we should start recording or not.
- video_length – (int) Length of recorded videos
- name_prefix – (str) Prefix to the video name
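A trigger is just a function from the current step number to a boolean. The sketch below builds one that fires periodically; `make_periodic_trigger` is a hypothetical helper name, and the period of 1000 steps is an arbitrary choice for illustration.

```python
def make_periodic_trigger(period):
    """Return a function that is True on every `period`-th step, starting at step 0."""

    def trigger(step):
        return step % period == 0

    return trigger


record_video_trigger = make_periodic_trigger(1000)

# Steps at which a new recording would start, out of a few sample steps.
starts = [step for step in (0, 1, 999, 1000, 2000) if record_video_trigger(step)]
print(starts)
```

A function built this way could be passed as the `record_video_trigger` argument; a simpler trigger such as `lambda step: step == 0` would record only at the very first step.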