Probability Distributions

Probability distributions used for the different action spaces:

CategoricalProbabilityDistribution -> Discrete
DiagGaussianProbabilityDistribution -> Box (continuous actions)
MultiCategoricalProbabilityDistribution -> MultiDiscrete
BernoulliProbabilityDistribution -> MultiBinary

The policy networks output parameters for the distributions (named flat in the methods). Actions are then sampled from those distributions.
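As a rough illustration of what "parameters" means here, the sketch below counts how many values a policy would need to output for one action in each kind of action space. The function name flat_param_size and the string labels are hypothetical. The counts (one logit per discrete choice, a mean and a log std per continuous dimension) are assumptions consistent with the descriptions on this page, not a copy of the library code.

```python
from typing import Sequence, Union


def flat_param_size(space_kind: str, dims: Union[int, Sequence[int]]) -> int:
    """Number of flat parameters needed for one action (illustrative only)."""
    if space_kind == "Discrete":        # dims: number of possible actions
        return int(dims)                # one logit per action
    if space_kind == "Box":             # dims: number of continuous dimensions
        return 2 * int(dims)            # a mean and a log std per dimension
    if space_kind == "MultiDiscrete":   # dims: number of choices per sub-action
        return sum(dims)                # all per-choice logits, concatenated
    if space_kind == "MultiBinary":     # dims: number of binary switches
        return int(dims)                # one logit per switch
    raise ValueError("unknown space kind: {}".format(space_kind))


print(flat_param_size("Discrete", 4))            # 4
print(flat_param_size("Box", 3))                 # 6
print(flat_param_size("MultiDiscrete", [3, 2]))  # 5
print(flat_param_size("MultiBinary", 5))         # 5
```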

For instance, in the case of discrete actions, the policy network outputs the parameters (logits) that define the probability of taking each action. The CategoricalProbabilityDistribution can then sample from the resulting distribution and compute its entropy and the negative log probability (neglogp) of an action, and the gradient can be backpropagated through these quantities.
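The quantities mentioned for discrete actions can be sketched in plain Python. This is a minimal illustration that assumes the network outputs logits (as the constructor argument of CategoricalProbabilityDistribution suggests). The function names are hypothetical, and the sampling method shown here (random.choices) is not necessarily the one the library uses.

```python
import math
import random


def softmax(logits):
    """Turn logits into probabilities that sum to 1 (numerically stable)."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_neglogp(logits, action):
    """Negative log probability of `action`: logsumexp(logits) - logits[action]."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(value - top) for value in logits))
    return log_z - logits[action]


def categorical_entropy(logits):
    """Shannon entropy, -sum(p * log p), in nats."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def categorical_mode(logits):
    """Deterministic action: the index with the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def categorical_sample(logits, rng=random):
    """Stochastic action drawn according to the softmax probabilities."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


logits = [0.0, 0.0, 0.0, 0.0]                # four equally likely actions
print(categorical_neglogp(logits, 2))        # log(4), about 1.386
print(categorical_entropy(logits))           # also log(4) for a uniform distribution
print(categorical_mode([0.1, 2.0, -1.0]))    # 1
print(categorical_sample(logits, random.Random(0)) in range(4))  # True
```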

In the case of continuous actions, a Gaussian distribution with a diagonal covariance matrix is used (DiagGaussianProbabilityDistribution). The policy network outputs the mean and the (log) standard deviation of the distribution.
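For the continuous case, the sketch below writes out the standard formulas for a diagonal Gaussian in plain Python. It assumes the flat parameter vector stores all means first and all log standard deviations second. That layout and the function names are assumptions made for illustration.

```python
import math
import random


def split_flat(flat):
    """Assumed layout: first half means, second half log standard deviations."""
    half = len(flat) // 2
    return flat[:half], flat[half:]


def gaussian_neglogp(flat, x):
    """Negative log density of x, summed over independent dimensions."""
    mean, logstd = split_flat(flat)
    quad = sum(((xi - m) / math.exp(ls)) ** 2 for xi, m, ls in zip(x, mean, logstd))
    return 0.5 * quad + 0.5 * math.log(2.0 * math.pi) * len(x) + sum(logstd)


def gaussian_entropy(flat):
    """Differential entropy: sum over dimensions of logstd + 0.5 * log(2*pi*e)."""
    _, logstd = split_flat(flat)
    return sum(ls + 0.5 * math.log(2.0 * math.pi * math.e) for ls in logstd)


def gaussian_kl(flat_p, flat_q):
    """KL(p || q) between two diagonal Gaussians."""
    mean_p, logstd_p = split_flat(flat_p)
    mean_q, logstd_q = split_flat(flat_q)
    return sum(
        lq - lp + (math.exp(2.0 * lp) + (mp - mq) ** 2) / (2.0 * math.exp(2.0 * lq)) - 0.5
        for mp, lp, mq, lq in zip(mean_p, logstd_p, mean_q, logstd_q)
    )


def gaussian_sample(flat, rng=random):
    """Stochastic action: mean + std * standard normal noise, per dimension."""
    mean, logstd = split_flat(flat)
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, logstd)]


flat = [0.0, 0.0, 0.0, 0.0]                 # 2 dimensions, mean 0 and std 1
print(gaussian_neglogp(flat, [0.0, 0.0]))   # log(2*pi), about 1.838
print(gaussian_entropy(flat))               # log(2*pi*e), about 2.838
print(gaussian_kl(flat, flat))              # 0.0
print(len(gaussian_sample(flat, random.Random(0))))  # 2
```

The deterministic action (the mode) of a Gaussian is simply its mean, so no separate function is needed for it in this sketch.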

class stable_baselines.common.distributions.BernoulliProbabilityDistribution(logits)

entropy()

Returns Shannon’s entropy of the probability distribution

Returns:	(float) the entropy

flatparam()

Returns the flat parameters of the distribution

Returns:	([float]) the flat parameters

classmethod fromflat(flat)

Create an instance of this from new Bernoulli input

Parameters:	flat – ([float]) the Bernoulli input data
Returns:	(ProbabilityDistribution) the instance from the given Bernoulli input data

kl(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:	other – ([float]) the distribution to compare with
Returns:	(float) the KL divergence of the two distributions

mode()

Returns the mode of the distribution, used as the deterministic action

Returns:	(TensorFlow Tensor) the deterministic action

neglogp(x)

Returns the negative log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The negative log likelihood of the distribution

sample()

Returns a sample from the probability distribution

Returns:	(TensorFlow Tensor) the stochastic action
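The Bernoulli case (MultiBinary action spaces) can be sketched in the same way. The example assumes one logit per binary switch, with the probability of a 1 given by the sigmoid of that logit. The function names are hypothetical and the thresholding used for the mode is an illustrative choice.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """log(1 + exp(z)), written to avoid overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bernoulli_neglogp(logits, x):
    """Sum over switches of the sigmoid cross-entropy between logit and label."""
    return sum(softplus(z) - z * xi for z, xi in zip(logits, x))


def bernoulli_entropy(logits):
    """Sum of the per-switch entropies, each equal to softplus(z) - z * p."""
    return sum(softplus(z) - z * sigmoid(z) for z in logits)


def bernoulli_mode(logits):
    """Deterministic action: switch on wherever p > 0.5 (that is, logit > 0)."""
    return [1 if z > 0.0 else 0 for z in logits]


def bernoulli_sample(logits, rng=random):
    """Stochastic action: switch on with probability sigmoid(logit)."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


logits = [0.0, 0.0]                         # two fair coin flips
print(bernoulli_neglogp(logits, [1, 0]))    # 2 * log(2), about 1.386
print(bernoulli_entropy(logits))            # 2 * log(2), about 1.386
print(bernoulli_mode([2.0, -1.0]))          # [1, 0]
print(len(bernoulli_sample(logits, random.Random(0))))  # 2
```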

class stable_baselines.common.distributions.BernoulliProbabilityDistributionType(size)

param_shape()

Returns the shape of the input parameters

Returns:	([int]) the shape

proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:	pi_latent_vector – ([float]) the latent pi values
	vf_latent_vector – ([float]) the latent vf values
	init_scale – (float) the initial scale of the distribution
	init_bias – (float) the initial bias of the distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()

Returns the ProbabilityDistribution class of this type

Returns:	(Type ProbabilityDistribution) the probability distribution class associated

sample_dtype()

Returns the data type of the samples

Returns:	(type) the type

sample_shape()

Returns the shape of a sample

Returns:	([int]) the shape

class stable_baselines.common.distributions.CategoricalProbabilityDistribution(logits)

entropy()

Returns Shannon’s entropy of the probability distribution

Returns:	(float) the entropy

flatparam()

Returns the flat parameters of the distribution

Returns:	([float]) the flat parameters

classmethod fromflat(flat)

Create an instance of this from new logits values

Parameters:	flat – ([float]) the categorical logits input
Returns:	(ProbabilityDistribution) the instance from the given categorical input

kl(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:	other – ([float]) the distribution to compare with
Returns:	(float) the KL divergence of the two distributions

mode()

Returns the mode of the distribution, used as the deterministic action

Returns:	(TensorFlow Tensor) the deterministic action

neglogp(x)

Returns the negative log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The negative log likelihood of the distribution

sample()

Returns a sample from the probability distribution

Returns:	(TensorFlow Tensor) the stochastic action
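The kl method listed above compares two distributions of the same kind. For two categorical distributions given by logits, the Kullback-Leibler divergence can be sketched as follows. The function names are hypothetical, and the direction KL(self || other) is an assumption about how the method is meant to be read.

```python
import math


def log_softmax(logits):
    """Log probabilities: logits minus their log-sum-exp."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(value - top) for value in logits))
    return [value - log_z for value in logits]


def categorical_kl(logits_p, logits_q):
    """KL(p || q) = sum over actions of p * (log p - log q)."""
    logp = log_softmax(logits_p)
    logq = log_softmax(logits_q)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


p = [0.0, 0.0]               # probabilities 0.5 and 0.5
q = [math.log(3.0), 0.0]     # probabilities 0.75 and 0.25
print(categorical_kl(p, p))  # 0.0
print(categorical_kl(p, q))  # 0.5 * log(4/3), about 0.144
```

The divergence is zero only when both distributions agree, and it is not symmetric: categorical_kl(p, q) and categorical_kl(q, p) generally differ.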

class stable_baselines.common.distributions.CategoricalProbabilityDistributionType(n_cat)

param_shape()

Returns the shape of the input parameters

Returns:	([int]) the shape

proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:	pi_latent_vector – ([float]) the latent pi values
	vf_latent_vector – ([float]) the latent vf values
	init_scale – (float) the initial scale of the distribution
	init_bias – (float) the initial bias of the distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()

Returns the ProbabilityDistribution class of this type

Returns:	(Type ProbabilityDistribution) the probability distribution class associated

sample_dtype()

Returns the data type of the samples

Returns:	(type) the type

sample_shape()

Returns the shape of a sample

Returns:	([int]) the shape

class stable_baselines.common.distributions.DiagGaussianProbabilityDistribution(flat)

entropy()

Returns Shannon’s entropy of the probability distribution

Returns:	(float) the entropy

flatparam()

Returns the flat parameters of the distribution

Returns:	([float]) the flat parameters

classmethod fromflat(flat)

Create an instance of this from new multivariate Gaussian input

Parameters:	flat – ([float]) the multivariate Gaussian input data
Returns:	(ProbabilityDistribution) the instance from the given multivariate Gaussian input data

kl(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:	other – ([float]) the distribution to compare with
Returns:	(float) the KL divergence of the two distributions

mode()

Returns the mode of the distribution, used as the deterministic action

Returns:	(TensorFlow Tensor) the deterministic action

neglogp(x)

Returns the negative log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The negative log likelihood of the distribution

sample()

Returns a sample from the probability distribution

Returns:	(TensorFlow Tensor) the stochastic action

class stable_baselines.common.distributions.DiagGaussianProbabilityDistributionType(size)

param_shape()

Returns the shape of the input parameters

Returns:	([int]) the shape

proba_distribution_from_flat(flat)

Returns the probability distribution built from the flat parameters

Parameters:	flat – ([float]) the flattened vector of parameters of the probability distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:	pi_latent_vector – ([float]) the latent pi values
	vf_latent_vector – ([float]) the latent vf values
	init_scale – (float) the initial scale of the distribution
	init_bias – (float) the initial bias of the distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()

Returns the ProbabilityDistribution class of this type

Returns:	(Type ProbabilityDistribution) the probability distribution class associated

sample_dtype()

Returns the data type of the samples

Returns:	(type) the type

sample_shape()

Returns the shape of a sample

Returns:	([int]) the shape

class stable_baselines.common.distributions.MultiCategoricalProbabilityDistribution(nvec, flat)

entropy()

Returns Shannon’s entropy of the probability distribution

Returns:	(float) the entropy

flatparam()

Returns the flat parameters of the distribution

Returns:	([float]) the flat parameters

classmethod fromflat(flat)

Create an instance of this from new logits values

Parameters:	flat – ([float]) the multi categorical logits input
Returns:	(ProbabilityDistribution) the instance from the given multi categorical input

kl(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:	other – ([float]) the distribution to compare with
Returns:	(float) the KL divergence of the two distributions

mode()

Returns the mode of the distribution, used as the deterministic action

Returns:	(TensorFlow Tensor) the deterministic action

neglogp(x)

Returns the negative log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The negative log likelihood of the distribution

sample()

Returns a sample from the probability distribution

Returns:	(TensorFlow Tensor) the stochastic action
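A multi-categorical distribution can be thought of as several independent categorical distributions whose logits are concatenated into one flat vector. The sketch below assumes that nvec lists the number of choices of each sub-action and that flat holds sum(nvec) logits in the same order. The function names are hypothetical.

```python
import math


def split_by_nvec(flat, nvec):
    """Cut the flat logits into one chunk per sub-action."""
    if len(flat) != sum(nvec):
        raise ValueError("flat must contain sum(nvec) logits")
    chunks, start = [], 0
    for size in nvec:
        chunks.append(flat[start:start + size])
        start += size
    return chunks


def _neglogp_one(logits, action):
    """Negative log probability of one categorical choice."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(value - top) for value in logits))
    return log_z - logits[action]


def multi_categorical_neglogp(flat, nvec, actions):
    """Independent sub-actions: the negative log probabilities add up."""
    chunks = split_by_nvec(flat, nvec)
    return sum(_neglogp_one(chunk, a) for chunk, a in zip(chunks, actions))


def multi_categorical_mode(flat, nvec):
    """Deterministic action: the most likely choice of every sub-action."""
    return [max(range(len(chunk)), key=lambda i: chunk[i])
            for chunk in split_by_nvec(flat, nvec)]


nvec = [3, 2]                        # first sub-action has 3 choices, second has 2
flat = [0.0, 0.0, 0.0, 0.0, 0.0]     # all choices equally likely
print(multi_categorical_neglogp(flat, nvec, [1, 0]))          # log(3) + log(2) = log(6)
print(multi_categorical_mode([0.0, 5.0, 1.0, 2.0, 0.0], nvec))  # [1, 0]
```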

class stable_baselines.common.distributions.MultiCategoricalProbabilityDistributionType(n_vec)

param_shape()

Returns the shape of the input parameters

Returns:	([int]) the shape

proba_distribution_from_flat(flat)

Returns the probability distribution built from the flat parameters

Parameters:	flat – ([float]) the flattened vector of parameters of the probability distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:	pi_latent_vector – ([float]) the latent pi values
	vf_latent_vector – ([float]) the latent vf values
	init_scale – (float) the initial scale of the distribution
	init_bias – (float) the initial bias of the distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()

Returns the ProbabilityDistribution class of this type

Returns:	(Type ProbabilityDistribution) the probability distribution class associated

sample_dtype()

Returns the data type of the samples

Returns:	(type) the type

sample_shape()

Returns the shape of a sample

Returns:	([int]) the shape

class stable_baselines.common.distributions.ProbabilityDistribution

Base class for describing a probability distribution.

entropy()

Returns Shannon’s entropy of the probability distribution

Returns:	(float) the entropy

flatparam()

Returns the flat parameters of the distribution

Returns:	([float]) the flat parameters

kl(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:	other – ([float]) the distribution to compare with
Returns:	(float) the KL divergence of the two distributions

logp(x)

Returns the log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The log likelihood of the distribution

mode()

Returns the mode of the distribution, used as the deterministic action

Returns:	(TensorFlow Tensor) the deterministic action

neglogp(x)

Returns the negative log likelihood of x under the distribution

Parameters:	x – (str) the labels of each index
Returns:	([float]) The negative log likelihood of the distribution

sample()

Returns a sample from the probability distribution

Returns:	(TensorFlow Tensor) the stochastic action
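The base class exposes both logp and neglogp, and by definition one is the negation of the other. The sketch below shows one way such a base class could be laid out, with subclasses providing neglogp and the base class deriving logp from it. The class names SketchDistribution and FairCoin are hypothetical, and this layout is an assumption made for illustration, not the library's code.

```python
import math
from abc import ABC, abstractmethod


class SketchDistribution(ABC):
    """Illustrative base class: subclasses supply neglogp, logp comes for free."""

    @abstractmethod
    def neglogp(self, x):
        """Negative log likelihood of x."""

    def logp(self, x):
        """Log likelihood of x, defined as the negation of neglogp."""
        return -self.neglogp(x)


class FairCoin(SketchDistribution):
    """Toy distribution where both outcomes have probability 0.5."""

    def neglogp(self, x):
        return math.log(2.0)


coin = FairCoin()
print(coin.neglogp(1))  # log(2), about 0.693
print(coin.logp(1))     # -log(2), about -0.693
```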

class stable_baselines.common.distributions.ProbabilityDistributionType

Parametrized family of probability distributions

param_placeholder(prepend_shape, name=None)

Returns the TensorFlow placeholder for the input parameters

Parameters:	prepend_shape – ([int]) the prepend shape
	name – (str) the placeholder name
Returns:	(TensorFlow Tensor) the placeholder

param_shape()

Returns the shape of the input parameters

Returns:	([int]) the shape

proba_distribution_from_flat(flat)

Returns the probability distribution built from the flat parameters

Parameters:	flat – ([float]) the flattened vector of parameters of the probability distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:	pi_latent_vector – ([float]) the latent pi values
	vf_latent_vector – ([float]) the latent vf values
	init_scale – (float) the initial scale of the distribution
	init_bias – (float) the initial bias of the distribution
Returns:	(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()

Returns the ProbabilityDistribution class of this type

Returns:	(Type ProbabilityDistribution) the probability distribution class associated

sample_dtype()

Returns the data type of the samples

Returns:	(type) the type

sample_placeholder(prepend_shape, name=None)

Returns the TensorFlow placeholder for the sampling

Parameters:	prepend_shape – ([int]) the prepend shape
	name – (str) the placeholder name
Returns:	(TensorFlow Tensor) the placeholder

sample_shape()

Returns the shape of a sample

Returns:	([int]) the shape
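The prepend_shape argument of the two placeholder methods can be read as the leading dimensions (for example an unknown batch size) placed in front of the shape reported by param_shape() or sample_shape(). The sketch below illustrates that reading with plain lists. The helper name placeholder_shape and the example shapes for a 4-action Discrete space are assumptions.

```python
def placeholder_shape(prepend_shape, own_shape):
    """Full placeholder shape: leading dimensions followed by the per-item shape."""
    return list(prepend_shape) + list(own_shape)


# Hypothetical shapes for a Discrete space with 4 actions
param_shape = [4]    # one logit per action
sample_shape = []    # a sampled action is a single integer

print(placeholder_shape([None], param_shape))   # [None, 4]
print(placeholder_shape([None], sample_shape))  # [None]
```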

stable_baselines.common.distributions.make_proba_dist_type(ac_space)

Returns an instance of ProbabilityDistributionType matching the type of the action space

Parameters:	ac_space – (Gym Space) the input action space
Returns:	(ProbabilityDistributionType) the appropriate instance of a ProbabilityDistributionType

stable_baselines.common.distributions.shape_el(tensor, index)

Gets the shape of a TensorFlow Tensor element (its size along dimension index)

Parameters:	tensor – (TensorFlow Tensor) the input tensor
	index – (int) the element
Returns:	([int]) the shape