Warning

This package is in maintenance mode; please use Stable-Baselines3 (SB3) for an up-to-date version. You can find a migration guide in the SB3 documentation.

# Probability Distributions

Probability distributions used for the different action spaces:

• CategoricalProbabilityDistribution -> Discrete
• DiagGaussianProbabilityDistribution -> Box (continuous actions)
• MultiCategoricalProbabilityDistribution -> MultiDiscrete
• BernoulliProbabilityDistribution -> MultiBinary

The policy networks output the parameters of these distributions (passed as flat in the methods below). Actions are then sampled from those distributions.

For instance, in the case of discrete actions, the policy network outputs the probability of taking each action. The CategoricalProbabilityDistribution then allows sampling from it, computing the entropy and the negative log probability (neglogp), and backpropagating the gradient.
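To make these operations concrete, here is a minimal NumPy sketch of what a categorical distribution over discrete actions computes (illustrative only — the library implements this with TensorFlow ops, and the logits values below are made up):

```python
import numpy as np

def softmax(logits):
    # shift by the max for numerical stability
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # hypothetical policy-network output for 3 actions
probs = softmax(logits)

action = int(rng.choice(len(probs), p=probs))  # sample(): stochastic action
mode = int(np.argmax(logits))                  # mode(): deterministic action
neglogp = -np.log(probs[action])               # neglogp(action)
entropy = -np.sum(probs * np.log(probs))       # entropy()
```

During training, neglogp drives the policy-gradient loss, while the entropy term is typically added to encourage exploration.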

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and (log) standard deviation of the distribution (assumed to be a DiagGaussianProbabilityDistribution).
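The corresponding computations for a diagonal Gaussian can be sketched in NumPy as follows (illustrative only; the mean and log standard deviation values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([0.1, -0.3])     # hypothetical policy-network outputs
log_std = np.array([-0.5, 0.0])
std = np.exp(log_std)

sampled = mean + std * rng.standard_normal(mean.shape)  # sample(): stochastic action
deterministic = mean                                    # mode(): deterministic action

# neglogp(x): negative log likelihood of an action under independent Gaussians
x = np.zeros_like(mean)
neglogp = (0.5 * np.sum(((x - mean) / std) ** 2)
           + 0.5 * np.log(2 * np.pi) * x.size
           + np.sum(log_std))

# entropy(): closed form for a diagonal Gaussian
entropy = np.sum(log_std + 0.5 * np.log(2 * np.pi * np.e))
```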

class stable_baselines.common.distributions.BernoulliProbabilityDistribution(logits)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns: (float) the entropy
flatparam()[source]

Return the direct probabilities

Returns: ([float]) the probabilities
classmethod fromflat(flat)[source]

Create an instance of this from new Bernoulli input

Parameters: flat – ([float]) the Bernoulli input data
Returns: (ProbabilityDistribution) the instance from the given Bernoulli input data
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – (ProbabilityDistribution) the distribution to compare with
Returns: (float) the KL divergence of the two distributions
mode()[source]

Returns the mode (most likely value) of the distribution

Returns: (TensorFlow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters: x – (TensorFlow Tensor) the labels of each index
Returns: ([float]) the negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
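For intuition, the Bernoulli quantities above can be sketched in NumPy (illustrative only; the library uses TensorFlow ops, and the logits are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
logits = np.array([1.5, -0.4, 0.0])  # hypothetical output, one logit per binary action
probs = sigmoid(logits)              # flatparam(): the direct probabilities

mode = (probs > 0.5).astype(int)                         # mode(): round each probability
sampled = (rng.random(probs.shape) < probs).astype(int)  # sample()

# neglogp(x): sum of per-component binary cross-entropies
x = np.array([1, 0, 1])
neglogp = -np.sum(x * np.log(probs) + (1 - x) * np.log(1 - probs))
```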
class stable_baselines.common.distributions.BernoulliProbabilityDistributionType(size)[source]
param_shape()[source]

Returns the shape of the input parameters

Returns: ([int]) the shape
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
probability_distribution_class()[source]

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class
sample_dtype()[source]

Returns the dtype of the samples

Returns: (type) the dtype
sample_shape()[source]

Returns the shape of the samples

Returns: ([int]) the shape
class stable_baselines.common.distributions.CategoricalProbabilityDistribution(logits)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns: (float) the entropy
flatparam()[source]

Return the direct probabilities

Returns: ([float]) the probabilities
classmethod fromflat(flat)[source]

Create an instance of this from new logits values

Parameters: flat – ([float]) the categorical logits input
Returns: (ProbabilityDistribution) the instance from the given categorical input
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – (ProbabilityDistribution) the distribution to compare with
Returns: (float) the KL divergence of the two distributions
mode()[source]

Returns the mode (most likely value) of the distribution

Returns: (TensorFlow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters: x – (TensorFlow Tensor) the labels of each index
Returns: ([float]) the negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
class stable_baselines.common.distributions.CategoricalProbabilityDistributionType(n_cat)[source]
param_shape()[source]

Returns the shape of the input parameters

Returns: ([int]) the shape
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
probability_distribution_class()[source]

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class
sample_dtype()[source]

Returns the dtype of the samples

Returns: (type) the dtype
sample_shape()[source]

Returns the shape of the samples

Returns: ([int]) the shape
class stable_baselines.common.distributions.DiagGaussianProbabilityDistribution(flat)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns: (float) the entropy
flatparam()[source]

Return the flat parameters of the distribution

Returns: ([float]) the flat parameters (mean and log std)
classmethod fromflat(flat)[source]

Create an instance of this from new multivariate Gaussian input

Parameters: flat – ([float]) the multivariate Gaussian input data
Returns: (ProbabilityDistribution) the instance from the given multivariate Gaussian input data
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – (ProbabilityDistribution) the distribution to compare with
Returns: (float) the KL divergence of the two distributions
mode()[source]

Returns the mode (most likely value) of the distribution

Returns: (TensorFlow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters: x – (TensorFlow Tensor) the taken action
Returns: ([float]) the negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
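The kl() method has a closed form for diagonal Gaussians. A NumPy sketch of that standard formula (parameters are made up, and this is not the library's TensorFlow code):

```python
import numpy as np

# KL(p || q) for diagonal Gaussians, summed over independent dimensions:
# log(std_q/std_p) + (var_p + (mean_p - mean_q)^2) / (2 * var_q) - 1/2
mean_p, log_std_p = np.array([0.0, 0.0]), np.array([0.0, 0.0])
mean_q, log_std_q = np.array([0.5, -0.5]), np.array([0.2, 0.2])
var_p, var_q = np.exp(2 * log_std_p), np.exp(2 * log_std_q)

kl = np.sum(log_std_q - log_std_p
            + (var_p + (mean_p - mean_q) ** 2) / (2.0 * var_q)
            - 0.5)

# sanity check: the KL of a distribution with itself is zero
kl_self = np.sum(log_std_p - log_std_p
                 + (var_p + 0.0) / (2.0 * var_p)
                 - 0.5)
```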
class stable_baselines.common.distributions.DiagGaussianProbabilityDistributionType(size)[source]
param_shape()[source]

Returns the shape of the input parameters

Returns: ([int]) the shape
proba_distribution_from_flat(flat)[source]

Returns the probability distribution from flat probabilities

Parameters: flat – ([float]) the flat probabilities
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
probability_distribution_class()[source]

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class
sample_dtype()[source]

Returns the dtype of the samples

Returns: (type) the dtype
sample_shape()[source]

Returns the shape of the samples

Returns: ([int]) the shape
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistribution(nvec, flat)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns: (float) the entropy
flatparam()[source]

Return the direct probabilities

Returns: ([float]) the probabilities
classmethod fromflat(flat)[source]

Create an instance of this from new logits values

Parameters: flat – ([float]) the multi-categorical logits input
Returns: (ProbabilityDistribution) the instance from the given multi-categorical input
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – (ProbabilityDistribution) the distribution to compare with
Returns: (float) the KL divergence of the two distributions
mode()[source]

Returns the mode (most likely value) of the distribution

Returns: (TensorFlow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters: x – (TensorFlow Tensor) the labels of each index
Returns: ([float]) the negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
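A MultiDiscrete action is treated as a set of independent categoricals: the flat logits vector is split according to nvec and the per-component log probabilities are summed. A NumPy sketch of this idea (illustrative; nvec and the logits below are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

nvec = [3, 2]                                  # hypothetical MultiDiscrete space
flat = np.array([1.0, 0.0, -1.0, 0.5, -0.5])   # hypothetical concatenated logits
splits = np.split(flat, np.cumsum(nvec)[:-1])  # one logit block per sub-action

action = [0, 1]  # one choice per categorical
neglogp = sum(-np.log(softmax(block)[a]) for block, a in zip(splits, action))
mode = [int(np.argmax(block)) for block in splits]
```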
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistributionType(n_vec)[source]
param_shape()[source]

Returns the shape of the input parameters

Returns: ([int]) the shape
proba_distribution_from_flat(flat)[source]

Returns the probability distribution from a flattened vector of distribution parameters

Parameters: flat – ([float]) the flat probabilities
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
probability_distribution_class()[source]

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class
sample_dtype()[source]

Returns the dtype of the samples

Returns: (type) the dtype
sample_shape()[source]

Returns the shape of the samples

Returns: ([int]) the shape
class stable_baselines.common.distributions.ProbabilityDistribution[source]

Base class for describing a probability distribution.

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns: (float) the entropy
flatparam()[source]

Return the direct probabilities

Returns: ([float]) the probabilities
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – (ProbabilityDistribution) the distribution to compare with
Returns: (float) the KL divergence of the two distributions
logp(x)[source]

Returns the log likelihood

Parameters: x – (TensorFlow Tensor) the labels of each index
Returns: ([float]) the log likelihood of the distribution
mode()[source]

Returns the mode (most likely value) of the distribution

Returns: (TensorFlow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters: x – (TensorFlow Tensor) the labels of each index
Returns: ([float]) the negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
class stable_baselines.common.distributions.ProbabilityDistributionType[source]

Parametrized family of probability distributions

param_placeholder(prepend_shape, name=None)[source]

Returns the TensorFlow placeholder for the input parameters

Parameters:
• prepend_shape – ([int]) the prepend shape
• name – (str) the placeholder name
Returns: (TensorFlow Tensor) the placeholder
param_shape()[source]

Returns the shape of the input parameters

Returns: ([int]) the shape
proba_distribution_from_flat(flat)[source]

Returns the probability distribution from a flattened vector of distribution parameters

Parameters: flat – ([float]) the flat probabilities
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution
Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
probability_distribution_class()[source]

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class
sample_dtype()[source]

Returns the dtype of the samples

Returns: (type) the dtype
sample_placeholder(prepend_shape, name=None)[source]

Returns the TensorFlow placeholder for the sampling

Parameters:
• prepend_shape – ([int]) the prepend shape
• name – (str) the placeholder name
Returns: (TensorFlow Tensor) the placeholder
sample_shape()[source]

Returns the shape of the samples

Returns: ([int]) the shape
stable_baselines.common.distributions.make_proba_dist_type(ac_space)[source]

Return an instance of ProbabilityDistributionType for the correct type of action space

Parameters: ac_space – (Gym Space) the input action space
Returns: (ProbabilityDistributionType) the appropriate instance of a ProbabilityDistributionType
stable_baselines.common.distributions.shape_el(tensor, index)[source]

Get the shape of a TensorFlow Tensor element

Parameters:
• tensor – (TensorFlow Tensor) the input tensor
• index – (int) the index of the element
Returns: ([int]) the shape