Vectorized Environments

Vectorized Environments are a method for stacking multiple independent environments into a single environment. Instead of training an RL agent on 1 environment per step, it allows us to train it on n environments per step. Because of this, actions passed to the environment are now a vector (of dimension n), and the same holds for observations, rewards and end-of-episode signals (dones). In the case of non-array observation spaces such as Dict or Tuple, where different sub-spaces may have different shapes, the sub-observations are vectors (of dimension n).
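The idea can be sketched in plain Python (this is an illustration of the vectorized API contract, not the library's implementation; `ToyEnv` is a hypothetical environment):

```python
# Minimal sketch of a vectorized environment: n independent copies are
# stepped with a vector of n actions, and observations, rewards and dones
# come back as vectors of length n.

class ToyEnv:
    """Hypothetical 1-D environment: the state counts steps, episode ends at 3."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, float(action), done, {}

def vec_step(envs, actions):
    """Step each env with its own action; return vectors of results."""
    results = [env.step(a) for env, a in zip(envs, actions)]
    obs, rewards, dones, infos = map(list, zip(*results))
    return obs, rewards, dones, infos

n = 4
envs = [ToyEnv() for _ in range(n)]
first_obs = [env.reset() for env in envs]
obs, rewards, dones, infos = vec_step(envs, [1] * n)
```

Each returned quantity has length n, which is what allows an algorithm to process a whole batch of transitions per call.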

Name           Box   Discrete   Dict   Tuple   Multi-Processing
DummyVecEnv    ✔️    ✔️         ✔️     ✔️      ❌️
SubprocVecEnv  ✔️    ✔️         ✔️     ✔️      ✔️

Note

Vectorized environments are required when using wrappers for frame-stacking or normalization.

Note

When using vectorized environments, the environments are automatically reset at the end of each episode. Thus, the observation returned for the i-th environment when done[i] is true will in fact be the first observation of the next episode, not the last observation of the episode that has just terminated. You can access the “real” final observation of the terminated episode—that is, the one that accompanied the done event provided by the underlying environment—using the terminal_observation key in the info dicts returned by the VecEnv.
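The auto-reset behaviour can be sketched for a single sub-environment as follows (a hedged illustration; `CountEnv` is hypothetical and the real wrapper does this per sub-environment):

```python
# When an episode ends, the "real" last observation is stashed in
# info["terminal_observation"], and the observation returned to the caller
# is the first one of the next (freshly reset) episode.

class CountEnv:
    """Hypothetical environment whose episodes last 2 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= 2, {}

def auto_reset_step(env, action):
    obs, reward, done, info = env.step(action)
    if done:
        info["terminal_observation"] = obs  # real final obs of this episode
        obs = env.reset()                   # returned obs starts the next one
    return obs, reward, done, info

env = CountEnv()
env.reset()
auto_reset_step(env, 0)                            # t = 1, not done
obs, reward, done, info = auto_reset_step(env, 0)  # t = 2, episode ends
```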

Warning

When using SubprocVecEnv, users must wrap the code in an if __name__ == "__main__": block when using the forkserver or spawn start method (the default on Windows). On Linux, the default start method is fork, which is not thread-safe and can create deadlocks.

For more information, see Python’s multiprocessing guidelines.
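The guard pattern the warning refers to looks like this (`make_env` is a placeholder factory; in real code its inner function would return a Gym environment, and the commented line shows where SubprocVecEnv would be constructed):

```python
# With the 'spawn' or 'forkserver' start methods, each worker process
# re-imports the main module, so any code that launches subprocesses must
# live under the __main__ guard or it would be re-executed in every worker.

def make_env(rank):
    # Placeholder factory; in practice _init would return a gym.Env instance.
    def _init():
        return {"rank": rank}
    return _init

if __name__ == "__main__":
    # In real code: env = SubprocVecEnv([make_env(i) for i in range(4)])
    env_fns = [make_env(i) for i in range(4)]
    envs = [fn() for fn in env_fns]
```

Only definitions (functions, classes, imports) should sit at module top level; everything that does work belongs under the guard.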

VecEnv

class stable_baselines.common.vec_env.VecEnv(num_envs, observation_space, action_space)[source]

An abstract asynchronous, vectorized environment.

Parameters:
  • num_envs – (int) the number of environments
  • observation_space – (Gym Space) the observation space
  • action_space – (Gym Space) the action space
close()[source]

Clean up the environment’s resources.

env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

Parameters:
  • method_name – (str) The name of the environment method to invoke.
  • indices – (list,int) Indices of envs whose method to call
  • method_args – (tuple) Any positional arguments to provide in the call
  • method_kwargs – (dict) Any keyword arguments to provide in the call
Returns:

(list) List of items returned by the environment’s method call

get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment.

Parameters:
  • attr_name – (str) The name of the attribute whose value to return
  • indices – (list,int) Indices of envs to get attribute from
Returns:

(list) List of values of ‘attr_name’ in all environments

get_images() → Sequence[numpy.ndarray][source]

Return RGB images from each environment

getattr_depth_check(name, already_found)[source]

Check if an attribute reference is being hidden in a recursive call to __getattr__

Parameters:
  • name – (str) name of attribute to check for
  • already_found – (bool) whether this attribute has already been found in a wrapper
Returns:

(str or None) name of module whose attribute is being shadowed, if any.

render(mode: str = 'human')[source]

Gym environment rendering

Parameters:mode – the rendering type
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:([int] or [float]) observation
seed(seed: Optional[int] = None) → List[Union[None, int]][source]

Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.

Parameters:seed – (Optional[int]) The random seed. May be None for completely random seeding.
Returns:(List[Union[None, int]]) Returns a list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
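The seeding rule described above ("each individual environment will still get its own seed, by incrementing the given seed") can be sketched as (an illustration, not the library code):

```python
# Each sub-environment is seeded with base_seed + its index, so runs are
# reproducible yet the n environments do not all follow identical trajectories.

def vec_seeds(seed, num_envs):
    if seed is None:
        return [None] * num_envs  # fully random seeding
    return [seed + i for i in range(num_envs)]
```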
set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments.

Parameters:
  • attr_name – (str) The name of attribute to assign new value
  • value – (obj) Value to assign to attr_name
  • indices – (list,int) Indices of envs to assign value
Returns:

(NoneType)
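The semantics of get_attr, set_attr and env_method can be sketched with plain Python objects standing in for environments (a simplified illustration; `FakeEnv` and its `bump` method are hypothetical):

```python
# indices=None means "all envs"; an int or a list selects a subset.

class FakeEnv:
    def __init__(self):
        self.difficulty = 1

    def bump(self, amount=1):
        self.difficulty += amount
        return self.difficulty

envs = [FakeEnv() for _ in range(3)]

def _resolve(indices, n):
    if indices is None:
        return range(n)
    return [indices] if isinstance(indices, int) else indices

def get_attr(attr_name, indices=None):
    return [getattr(envs[i], attr_name) for i in _resolve(indices, len(envs))]

def set_attr(attr_name, value, indices=None):
    for i in _resolve(indices, len(envs)):
        setattr(envs[i], attr_name, value)

def env_method(method_name, *args, indices=None, **kwargs):
    return [getattr(envs[i], method_name)(*args, **kwargs)
            for i in _resolve(indices, len(envs))]
```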

step(actions)[source]

Step the environments with the given action

Parameters:actions – ([int] or [float]) the action
Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information
step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information

DummyVecEnv

class stable_baselines.common.vec_env.DummyVecEnv(env_fns)[source]

Creates a simple vectorized wrapper for multiple environments, calling each environment in sequence on the current Python process. This is useful for computationally simple environments such as CartPole-v1, where the overhead of multiprocessing or multithreading outweighs the environment computation time. It can also be used for RL methods that require a vectorized environment while you only want a single environment to train with.

Parameters:env_fns – ([callable]) A list of functions that will create the environments (each callable returns a Gym.Env instance when called).
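A common way to build env_fns is with a factory function; this avoids Python's late-binding-closure pitfall, where lambdas created in a loop would all capture the final loop value (`DummyToy` below is a stand-in for a real environment constructor such as gym.make):

```python
# Each element of env_fns is a zero-argument callable returning a fresh env.

class DummyToy:
    def __init__(self, env_id):
        self.env_id = env_id

def make_env(env_id):
    def _init():
        return DummyToy(env_id)   # in real code: gym.make(...) etc.
    return _init

env_fns = [make_env(i) for i in range(3)]
envs = [fn() for fn in env_fns]  # what DummyVecEnv does internally, in sequence
```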
close()[source]

Clean up the environment’s resources.

env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment (see base class).

get_images() → Sequence[numpy.ndarray][source]

Return RGB images from each environment

render(mode: str = 'human')[source]

Gym environment rendering. If there are multiple environments then they are tiled together in one image via BaseVecEnv.render(). Otherwise (if self.num_envs == 1), we pass the render call directly to the underlying environment.

Therefore, some arguments such as mode will have values that are valid only when num_envs == 1.

Parameters:mode – The rendering type.
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:([int] or [float]) observation
seed(seed=None)[source]

Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.

Parameters:seed – (Optional[int]) The random seed. May be None for completely random seeding.
Returns:(List[Union[None, int]]) Returns a list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments (see base class).

step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information

SubprocVecEnv

class stable_baselines.common.vec_env.SubprocVecEnv(env_fns, start_method=None)[source]

Creates a multiprocess vectorized wrapper for multiple environments, distributing each environment to its own process, allowing significant speed up when the environment is computationally complex.

For performance reasons, if your environment is not IO bound, the number of environments should not exceed the number of logical cores on your CPU.

Warning

Only ‘forkserver’ and ‘spawn’ start methods are thread-safe, which is important when TensorFlow sessions or other non thread-safe libraries are used in the parent (see issue #217). However, compared to ‘fork’ they incur a small start-up cost and have restrictions on global variables. With those methods, users must wrap the code in an if __name__ == "__main__": block. For more information, see the multiprocessing documentation.

Parameters:
  • env_fns – ([callable]) A list of functions that will create the environments (each callable returns a Gym.Env instance when called).
  • start_method – (str) method used to start the subprocesses. Must be one of the methods returned by multiprocessing.get_all_start_methods(). Defaults to ‘forkserver’ on available platforms, and ‘spawn’ otherwise.
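The default-selection rule described for start_method can be sketched with the standard library (an illustration of the documented behaviour, not the library's exact code):

```python
import multiprocessing

def default_start_method():
    # Prefer 'forkserver' when the platform offers it, otherwise 'spawn';
    # both are thread-safe, unlike 'fork'.
    available = multiprocessing.get_all_start_methods()
    return "forkserver" if "forkserver" in available else "spawn"
```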
close()[source]

Clean up the environment’s resources.

env_method(method_name, *method_args, indices=None, **method_kwargs)[source]

Call instance methods of vectorized environments.

get_attr(attr_name, indices=None)[source]

Return attribute from vectorized environment (see base class).

get_images() → Sequence[numpy.ndarray][source]

Return RGB images from each environment

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:([int] or [float]) observation
seed(seed=None)[source]

Sets the random seeds for all environments, based on a given seed. Each individual environment will still get its own seed, by incrementing the given seed.

Parameters:seed – (Optional[int]) The random seed. May be None for completely random seeding.
Returns:(List[Union[None, int]]) Returns a list containing the seeds for each individual env. Note that all list elements may be None, if the env does not return anything when being seeded.
set_attr(attr_name, value, indices=None)[source]

Set attribute inside vectorized environments (see base class).

step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information

Wrappers

VecFrameStack

class stable_baselines.common.vec_env.VecFrameStack(venv, n_stack)[source]

Frame stacking wrapper for vectorized environment

Parameters:
  • venv – (VecEnv) the vectorized environment to wrap
  • n_stack – (int) Number of frames to stack
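The frame-stacking idea can be sketched for a single environment with a fixed-length deque (a simplified illustration; the real wrapper stacks arrays along the channel axis for every sub-environment):

```python
from collections import deque

class FrameStacker:
    """The observation exposed to the agent is the last n_stack frames."""
    def __init__(self, n_stack):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, obs):
        # On reset, the buffer is filled with copies of the first observation.
        for _ in range(self.n_stack):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, obs):
        # Each new frame pushes the oldest one out.
        self.frames.append(obs)
        return list(self.frames)
```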
close()[source]

Clean up the environment’s resources.

reset()[source]

Reset all environments

step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information

VecNormalize

class stable_baselines.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08)[source]

A moving average, normalizing wrapper for vectorized environment.

It is pickleable, which allows saving the moving averages and configuration parameters. The wrapped environment venv is not saved, and must be restored manually with set_venv after being unpickled.

Parameters:
  • venv – (VecEnv) the vectorized environment to wrap
  • training – (bool) Whether to update or not the moving average
  • norm_obs – (bool) Whether to normalize observation or not (default: True)
  • norm_reward – (bool) Whether to normalize rewards or not (default: True)
  • clip_obs – (float) Max absolute value for observation
  • clip_reward – (float) Max absolute value for discounted reward
  • gamma – (float) discount factor
  • epsilon – (float) To avoid division by zero
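The kind of moving-average normalization this wrapper performs can be sketched for a scalar observation stream (a simplified, hedged illustration; the real wrapper works on arrays, also normalizes a discounted-reward estimate, and only updates statistics when training=True):

```python
class RunningNorm:
    """Maintain a running mean/variance, then standardize and clip."""
    def __init__(self, clip_obs=10.0, epsilon=1e-8):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon
        self.clip_obs, self.epsilon = clip_obs, epsilon

    def update(self, x):
        # Incremental (Welford-style) mean/variance update.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

    def normalize(self, x):
        # epsilon guards against division by zero when variance is tiny.
        z = (x - self.mean) / (self.var + self.epsilon) ** 0.5
        return max(-self.clip_obs, min(self.clip_obs, z))
```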
get_original_obs() → numpy.ndarray[source]

Returns an unnormalized version of the observations from the most recent step or reset.

get_original_reward() → numpy.ndarray[source]

Returns an unnormalized version of the rewards from the most recent step.

static load(load_path, venv)[source]

Loads a saved VecNormalize object.

Parameters:
  • load_path – the path to load from.
  • venv – the VecEnv to wrap.
Returns:

(VecNormalize)

load_running_average(path)[source]
Parameters:path – (str) path to log dir

Deprecated since version 2.9.0: This function will be removed in a future version

normalize_obs(obs: numpy.ndarray) → numpy.ndarray[source]

Normalize observations using this VecNormalize’s observations statistics. Calling this method does not update statistics.

normalize_reward(reward: numpy.ndarray) → numpy.ndarray[source]

Normalize rewards using this VecNormalize’s rewards statistics. Calling this method does not update statistics.

reset()[source]

Reset all environments

save_running_average(path)[source]
Parameters:path – (str) path to log dir

Deprecated since version 2.9.0: This function will be removed in a future version

set_venv(venv)[source]

Sets the vector environment to wrap to venv.

Also sets attributes derived from this such as num_env.

Parameters:venv – (VecEnv)
step_wait()[source]

Apply a sequence of actions to the sequence of environments:

actions -> (observations, rewards, news)

where ‘news’ is a boolean vector indicating whether each element is new.

VecVideoRecorder

class stable_baselines.common.vec_env.VecVideoRecorder(venv, video_folder, record_video_trigger, video_length=200, name_prefix='rl-video')[source]

Wraps a VecEnv or VecEnvWrapper object to record rendered images as an mp4 video. It requires ffmpeg or avconv to be installed on the machine.

Parameters:
  • venv – (VecEnv or VecEnvWrapper)
  • video_folder – (str) Where to save videos
  • record_video_trigger – (func) Function that defines when to start recording. The function takes the current step number and returns whether recording should start.
  • video_length – (int) Length of recorded videos
  • name_prefix – (str) Prefix to the video name
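A typical record_video_trigger takes this shape (the 10 000-step interval below is a hypothetical value, chosen only for illustration):

```python
# Record a video at step 0 and then every 10 000 steps.
def record_every_10k(step):
    return step % 10_000 == 0
```

Any callable from step number to bool works, e.g. a lambda capturing a configured interval.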
close()[source]

Clean up the environment’s resources.

reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:([int] or [float]) observation
step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information

VecCheckNan

class stable_baselines.common.vec_env.VecCheckNan(venv, raise_exception=False, warn_once=True, check_inf=True)[source]

NaN and inf checking wrapper for vectorized environments; by default it raises a warning, letting you know where the NaN or inf originated.

Parameters:
  • venv – (VecEnv) the vectorized environment to wrap
  • raise_exception – (bool) Whether or not to raise a ValueError, instead of a UserWarning
  • warn_once – (bool) Whether or not to only warn once.
  • check_inf – (bool) Whether or not to check for +inf or -inf as well
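The kind of check this wrapper performs can be sketched with the standard library (a simplified illustration on a flat list of floats; the real wrapper inspects actions, observations and rewards, and honors raise_exception/warn_once):

```python
import math

def find_invalid(values, check_inf=True):
    """Return (index, reason) pairs for every NaN (and optionally inf) found."""
    found = []
    for i, v in enumerate(values):
        if math.isnan(v):
            found.append((i, "nan"))
        elif check_inf and math.isinf(v):
            found.append((i, "inf"))
    return found
```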
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:([int] or [float]) observation
step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns:([int] or [float], [float], [bool], dict) observation, reward, done, information