# Vectorized Environments¶

Vectorized Environments are a way to multiprocess training. Instead of training an RL agent on one environment, they allow you to train it on n environments in parallel (using n processes). Because of that, the actions passed to the environment are now a vector (of dimension n), and the same holds for observations, rewards and end-of-episode signals (dones).
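The idea can be illustrated with a minimal, self-contained sketch (a toy stand-in for illustration only, not the stable-baselines implementation): `env_fns` is a list of callables that each build a fresh environment, one `step` call takes a vector of n actions and returns vectors of n observations, rewards and dones, and an environment that finishes its episode is reset automatically.

```python
class ToyEnv:
    """A trivial environment: the episode ends after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # observation

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done  # observation, reward, done


class ToyVecEnv:
    def __init__(self, env_fns):
        # Each callable creates its own independent environment instance.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:  # vectorized envs reset automatically at episode end
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


vec_env = ToyVecEnv([ToyEnv for _ in range(4)])
print(vec_env.reset())  # [0, 0, 0, 0]
obs, rewards, dones = vec_env.step([0, 0, 0, 0])
print(dones)  # [False, False, False, False]
```

Note that when a sub-environment is done, the observation returned for it is already the first observation of the next episode, which is exactly the auto-reset behavior described in the note below.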

Note

Vectorized environments are required when using wrappers for frame-stacking or normalization.

Note

When using vectorized environments, the environments are automatically reset at the end of each episode.

Warning

Windows users have reported issues with SubprocVecEnv. We recommend using the Docker image in that case. (See Issue #42)

## DummyVecEnv¶

class stable_baselines.common.vec_env.DummyVecEnv(env_fns)[source]

Creates a simple vectorized wrapper for multiple environments

Parameters: env_fns – ([Gym Environment]) the list of environments to vectorize
close()[source]

Clean up the environment’s resources.

env_method(method_name, *method_args, **method_kwargs)[source]

Provides an interface to call arbitrary class methods of vectorized environments

Parameters:

- method_name – (str) The name of the env class method to invoke
- method_args – (tuple) Any positional arguments to provide in the call
- method_kwargs – (dict) Any keyword arguments to provide in the call

Returns: (list) List of items returned by the environment's method call
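The semantics can be sketched in a few lines (a toy illustration, not the library code): the named method is looked up on each wrapped environment, called with the given arguments, and the results are collected into a list, one entry per environment.

```python
class SeededEnv:
    """Toy environment exposing a custom method."""

    def __init__(self, seed):
        self.seed = seed

    def scale(self, factor, offset=0):
        return self.seed * factor + offset


def env_method(envs, method_name, *method_args, **method_kwargs):
    # Call the named method on every environment; collect results in a list.
    return [getattr(env, method_name)(*method_args, **method_kwargs)
            for env in envs]


envs = [SeededEnv(seed) for seed in range(3)]
print(env_method(envs, 'scale', 10, offset=1))  # [1, 11, 21]
```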
get_attr(attr_name)[source]

Provides a mechanism for getting class attributes from vectorized environments

Parameters: attr_name – (str) The name of the attribute whose value to return

Returns: (list) List of values of ‘attr_name’ in all environments
get_images()[source]

Return RGB images from each environment

render(*args, **kwargs)[source]

Gym environment rendering

Parameters: mode – (str) the rendering type
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns: ([int] or [float]) observation
set_attr(attr_name, value, indices=None)[source]

Provides a mechanism for setting arbitrary class attributes inside vectorized environments

Parameters:

- attr_name – (str) Name of attribute to assign new value
- value – (obj) Value to assign to ‘attr_name’
- indices – (list, int) Indices of envs to assign value

Returns: (list) in case env access methods return something, the results will be returned in a list
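Together, get_attr and set_attr behave roughly as follows (a toy sketch for illustration): reads always cover all environments, while writes can be broadcast to all of them or targeted via indices.

```python
class LevelEnv:
    """Toy environment with a tunable attribute."""

    def __init__(self):
        self.difficulty = 0


def get_attr(envs, attr_name):
    # Read 'attr_name' from every environment.
    return [getattr(env, attr_name) for env in envs]


def set_attr(envs, attr_name, value, indices=None):
    # None means "all environments"; an int selects a single one.
    if indices is None:
        indices = range(len(envs))
    elif isinstance(indices, int):
        indices = [indices]
    for i in indices:
        setattr(envs[i], attr_name, value)


envs = [LevelEnv() for _ in range(3)]
set_attr(envs, 'difficulty', 5, indices=1)
print(get_attr(envs, 'difficulty'))  # [0, 5, 0]
```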
step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns: ([int] or [float], [float], [bool], dict) observation, reward, done, information
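The two-phase step API can be sketched as follows (a toy illustration, not the library code): step_async only records the pending actions, and step_wait performs the actual stepping and returns the collected results. In DummyVecEnv the split is trivial since everything runs in one process; in SubprocVecEnv it is what lets all workers step concurrently.

```python
class StepEnv:
    """Toy environment: observation is just action + 1."""

    def step(self, action):
        return action + 1, 1.0, False  # observation, reward, done


class AsyncVecEnv:
    def __init__(self, envs):
        self.envs = envs
        self._pending_actions = None

    def step_async(self, actions):
        # Only record the actions; no stepping happens yet.
        assert self._pending_actions is None, "a step_async run is already pending"
        self._pending_actions = actions

    def step_wait(self):
        actions = self._pending_actions
        self._pending_actions = None
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        obs, rewards, dones = zip(*results)
        return list(obs), list(rewards), list(dones)

    def step(self, actions):
        # The usual synchronous step is just async + wait.
        self.step_async(actions)
        return self.step_wait()


vec = AsyncVecEnv([StepEnv(), StepEnv()])
obs, rewards, dones = vec.step([0, 10])
print(obs)  # [1, 11]
```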

## SubprocVecEnv¶

class stable_baselines.common.vec_env.SubprocVecEnv(env_fns)[source]

Creates a multiprocess vectorized wrapper for multiple environments

Parameters: env_fns – ([Gym Environment]) Environments to run in subprocesses
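The design can be sketched with Python's multiprocessing module (a simplified toy, not the library implementation): each environment lives in its own worker process and communicates over a pipe. This is also why env_fns must be callables (the environment is built inside the subprocess) and why the `if __name__ == '__main__'` guard matters on platforms that spawn processes, such as Windows.

```python
import multiprocessing as mp


class CounterEnv:
    """Toy environment used only to show the worker protocol."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0


def worker(remote, env_fn):
    # The environment is created inside the subprocess, which is why
    # env_fns must be callables rather than live environment objects.
    env = env_fn()
    while True:
        cmd, data = remote.recv()
        if cmd == 'reset':
            remote.send(env.reset())
        elif cmd == 'close':
            remote.close()
            break


def run():
    remotes, work_remotes = zip(*[mp.Pipe() for _ in range(2)])
    procs = [mp.Process(target=worker, args=(wr, CounterEnv))
             for wr in work_remotes]
    for p in procs:
        p.start()
    # "async" phase: send the command to every worker first ...
    for remote in remotes:
        remote.send(('reset', None))
    # ... "wait" phase: then collect results, so workers run concurrently.
    results = [remote.recv() for remote in remotes]
    for remote in remotes:
        remote.send(('close', None))
    for p in procs:
        p.join()
    return results


if __name__ == '__main__':  # guard required when starting subprocesses
    print(run())  # [0, 0]
```

Because commands and results cross process boundaries, anything sent through the pipes has to be picklable, which is why get_attr/set_attr below carry that restriction.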
close()[source]

Clean up the environment’s resources.

env_method(method_name, *method_args, **method_kwargs)[source]

Provides an interface to call arbitrary class methods of vectorized environments

Parameters:

- method_name – (str) The name of the env class method to invoke
- method_args – (tuple) Any positional arguments to provide in the call
- method_kwargs – (dict) Any keyword arguments to provide in the call

Returns: (list) List of items returned by each environment's method call
get_attr(attr_name)[source]

Provides a mechanism for getting class attributes from vectorized environments (note: the attribute value returned must be picklable)

Parameters: attr_name – (str) The name of the attribute whose value to return

Returns: (list) List of values of ‘attr_name’ in all environments
get_images()[source]

Return RGB images from each environment

render(mode='human', *args, **kwargs)[source]

Gym environment rendering

Parameters: mode – (str) the rendering type
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns: ([int] or [float]) observation
set_attr(attr_name, value, indices=None)[source]

Provides a mechanism for setting arbitrary class attributes inside vectorized environments (note: this is a broadcast of a single value to all instances) (note: the value must be picklable)

Parameters:

- attr_name – (str) Name of attribute to assign new value
- value – (obj) Value to assign to ‘attr_name’
- indices – (list, tuple) Iterable containing indices of envs whose attr to set

Returns: (list) in case env access methods return something, the results will be returned in a list
step_async(actions)[source]

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()[source]

Wait for the step taken with step_async().

Returns: ([int] or [float], [float], [bool], dict) observation, reward, done, information

## Wrappers¶

### VecFrameStack¶

class stable_baselines.common.vec_env.VecFrameStack(venv, n_stack)[source]

Frame stacking wrapper for vectorized environment

Parameters:

- venv – (VecEnv) the vectorized environment to wrap
- n_stack – (int) Number of frames to stack
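The stacking logic can be sketched as a rolling buffer (a toy illustration under the assumption of image-like observations stacked along the last axis, not the wrapper's actual code): each update shifts the old frames out and writes the newest frame in, and an environment that just finished an episode gets its stale history cleared.

```python
import numpy as np


class ToyFrameStack:
    """Keep the last n_stack frames per environment, stacked on the last axis."""

    def __init__(self, n_envs, obs_shape, n_stack):
        stacked_shape = obs_shape[:-1] + (obs_shape[-1] * n_stack,)
        self.stacked_obs = np.zeros((n_envs,) + stacked_shape)
        self.frame_size = obs_shape[-1]

    def update(self, obs, dones):
        # Shift old frames left; the newest frame goes on the right.
        self.stacked_obs = np.roll(self.stacked_obs, -self.frame_size, axis=-1)
        for i, done in enumerate(dones):
            if done:  # episode ended: clear stale history for that env
                self.stacked_obs[i] = 0
        self.stacked_obs[..., -self.frame_size:] = obs
        return self.stacked_obs


stack = ToyFrameStack(n_envs=1, obs_shape=(1,), n_stack=4)
stack.update(np.array([[1.0]]), dones=[False])
stacked = stack.update(np.array([[2.0]]), dones=[False])
print(stacked[0])  # [0. 0. 1. 2.]
```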
close()[source]

Clean up the environment’s resources.

reset()[source]

Reset all environments

step_wait()[source]

Wait for the step taken with step_async().

Returns: ([int] or [float], [float], [bool], dict) observation, reward, done, information

### VecNormalize¶

class stable_baselines.common.vec_env.VecNormalize(venv, training=True, norm_obs=True, norm_reward=True, clip_obs=10.0, clip_reward=10.0, gamma=0.99, epsilon=1e-08)[source]

A moving-average, normalizing wrapper for vectorized environments, with support for saving/loading the moving averages.

Parameters:

- venv – (VecEnv) the vectorized environment to wrap
- training – (bool) Whether to update the moving average or not
- norm_obs – (bool) Whether to normalize observations or not (default: True)
- norm_reward – (bool) Whether to normalize rewards or not (default: True)
- clip_obs – (float) Max absolute value for observation
- clip_reward – (float) Max absolute value for discounted reward
- gamma – (float) discount factor
- epsilon – (float) To avoid division by zero
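The normalization can be sketched with running statistics (a toy illustration of the idea, not the wrapper's code; the actual wrapper also normalizes a discounted return for rewards and only updates the statistics when training is enabled): a running mean and variance are maintained incrementally over batches of observations, and each observation is standardized and clipped to [-clip_obs, clip_obs].

```python
import numpy as np


class RunningMeanStd:
    """Incrementally track mean and variance of a stream of batches."""

    def __init__(self, shape, epsilon=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = epsilon

    def update(self, batch):
        # Parallel-variance update: merge batch statistics into the totals.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total


def normalize_obs(obs, rms, clip_obs=10.0, epsilon=1e-8):
    # Standardize with the running stats, then clip; epsilon avoids
    # division by zero when the variance is still tiny.
    return np.clip((obs - rms.mean) / np.sqrt(rms.var + epsilon),
                   -clip_obs, clip_obs)


rms = RunningMeanStd(shape=(1,))
rms.update(np.array([[0.0], [2.0], [4.0]]))
# 2.0 is the running mean, so it normalizes to (approximately) zero.
print(normalize_obs(np.array([[2.0]]), rms))
```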
get_original_obs()[source]

Returns the unnormalized observation

Returns: (numpy float)
load_running_average(path)[source]
Parameters: path – (str) path to log dir
reset()[source]

Reset all environments

save_running_average(path)[source]
Parameters: path – (str) path to log dir
step_wait()[source]

Apply the sequence of actions to the sequence of environments: actions -> (observations, rewards, news)

where ‘news’ is a boolean vector indicating whether each episode has ended.