Probability Distributions

Probability distributions used for the different action spaces:

  • CategoricalProbabilityDistribution -> Discrete
  • DiagGaussianProbabilityDistribution -> Box (continuous actions)
  • MultiCategoricalProbabilityDistribution -> MultiDiscrete
  • BernoulliProbabilityDistribution -> MultiBinary

The policy networks output parameters for the distributions (named flat in the methods). Actions are then sampled from those distributions.

For instance, in the case of discrete actions, the policy network outputs one logit (unnormalized log probability) per action. The CategoricalProbabilityDistribution makes it possible to sample from the resulting distribution and to compute its entropy and the negative log probability (neglogp) of an action, through which the gradient is backpropagated.
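The operations named above can be sketched in plain Python. This is only an illustration of the math under the assumption that the network output is a list of logits; the helper names (categorical_softmax, categorical_neglogp, categorical_entropy, categorical_mode, categorical_sample) are hypothetical and are not the library's TensorFlow implementation.

```python
import math
import random


def categorical_softmax(logits):
    """Turn logits into probabilities, shifting by the max for stability."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_neglogp(logits, action):
    """Negative log probability of a single action index."""
    return -math.log(categorical_softmax(logits)[action])


def categorical_entropy(logits):
    """Shannon entropy (in nats) of the distribution defined by the logits."""
    probs = categorical_softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def categorical_mode(logits):
    """Most likely action: the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def categorical_sample(logits, rng):
    """Draw an action index with the Gumbel-max trick."""
    noisy = []
    for v in logits:
        u = max(rng.random(), 1e-12)  # keep u away from 0 so log(u) is finite
        noisy.append(v - math.log(-math.log(u)))
    return max(range(len(noisy)), key=noisy.__getitem__)


example_logits = [2.0, 0.5, -1.0]
example_action = categorical_sample(example_logits, random.Random(0))
example_loss_term = categorical_neglogp(example_logits, example_action)
```

In a policy-gradient loss, the neglogp of the action actually taken is the term that is differentiated with respect to the network parameters.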

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and the (log) standard deviation of the distribution (assumed to be a DiagGaussianProbabilityDistribution).
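As a rough sketch of the continuous case, the following plain-Python code treats each action dimension as an independent Gaussian. It assumes the flat parameter vector is the mean followed by the log standard deviation; that layout and all function names here are assumptions made for illustration.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def gaussian_split_flat(flat):
    """Split a flat vector into (mean, logstd) halves (assumed layout)."""
    half = len(flat) // 2
    return flat[:half], flat[half:]


def gaussian_sample(mean, logstd, rng):
    """Stochastic action: mean + std * standard normal noise, per dimension."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, logstd)]


def gaussian_mode(mean, logstd):
    """Deterministic action: the mean of the distribution."""
    return list(mean)


def gaussian_neglogp(mean, logstd, x):
    """Negative log density of x, summed over independent dimensions."""
    quad = sum(((xi - m) / math.exp(s)) ** 2 for xi, m, s in zip(x, mean, logstd))
    return 0.5 * quad + 0.5 * LOG_2PI * len(x) + sum(logstd)


def gaussian_entropy(logstd):
    """Differential entropy: sum over dimensions of logstd + 0.5 * log(2*pi*e)."""
    return sum(s + 0.5 * (LOG_2PI + 1.0) for s in logstd)


mean, logstd = gaussian_split_flat([0.5, -0.5, 0.0, 0.0])
action = gaussian_sample(mean, logstd, random.Random(0))
```

Working with the log standard deviation keeps the standard deviation positive without constraining the network output.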

class stable_baselines.common.distributions.BernoulliProbabilityDistribution(logits)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns:(float) the entropy
flatparam()[source]

Returns the flat parameters of the distribution

Returns:([float]) the flat parameters
classmethod fromflat(flat)[source]

Create an instance of this class from new Bernoulli input

Parameters:flat – ([float]) the Bernoulli input data
Returns:(ProbabilityDistribution) the instance from the given Bernoulli input data
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:other – ([float]) the distribution to compare with
Returns:(float) the KL divergence of the two distributions
mode()[source]

Returns the mode of the distribution, used as the deterministic action

Returns:(Tensorflow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns:(Tensorflow Tensor) the stochastic action
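For MultiBinary actions, each bit can be treated as an independent Bernoulli variable whose probability is the sigmoid of its logit. The sketch below illustrates that idea in plain Python; the function names are hypothetical, and the 0.5 threshold used for the mode is an assumption about how a deterministic action could be chosen.

```python
import math
import random


def bernoulli_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bernoulli_cross_entropy(z, target):
    """Stable form of -target*log(p) - (1-target)*log(1-p), with p = sigmoid(z)."""
    return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))


def bernoulli_neglogp(logits, bits):
    """Negative log likelihood of a bit vector, summed over independent bits."""
    return sum(bernoulli_cross_entropy(z, b) for z, b in zip(logits, bits))


def bernoulli_entropy(logits):
    """Entropy: the cross-entropy of each bit's probability with itself."""
    return sum(bernoulli_cross_entropy(z, bernoulli_sigmoid(z)) for z in logits)


def bernoulli_mode(logits):
    """Deterministic action: 1 where the probability is at least 0.5."""
    return [1 if bernoulli_sigmoid(z) >= 0.5 else 0 for z in logits]


def bernoulli_sample(logits, rng):
    """Stochastic action: each bit is 1 with probability sigmoid(logit)."""
    return [1 if rng.random() < bernoulli_sigmoid(z) else 0 for z in logits]
```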
class stable_baselines.common.distributions.BernoulliProbabilityDistributionType(size)[source]
param_shape()[source]

returns the shape of the input parameters

Returns:([int]) the shape
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

returns the probability distribution from latent values

Parameters:
  • pi_latent_vector – ([float]) the latent pi values
  • vf_latent_vector – ([float]) the latent vf values
  • init_scale – (float) the initial scale of the distribution
  • init_bias – (float) the initial bias of the distribution
Returns:

(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()[source]

returns the ProbabilityDistribution class of this type

Returns:(Type ProbabilityDistribution) the probability distribution class associated
sample_dtype()[source]

returns the type of the sampling

Returns:(type) the type
sample_shape()[source]

returns the shape of the sampling

Returns:([int]) the shape
class stable_baselines.common.distributions.CategoricalProbabilityDistribution(logits)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns:(float) the entropy
flatparam()[source]

Returns the flat parameters of the distribution

Returns:([float]) the flat parameters
classmethod fromflat(flat)[source]

Create an instance of this from new logits values

Parameters:flat – ([float]) the categorical logits input
Returns:(ProbabilityDistribution) the instance from the given categorical input
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:other – ([float]) the distribution to compare with
Returns:(float) the KL divergence of the two distributions
mode()[source]

Returns the mode of the distribution, used as the deterministic action

Returns:(Tensorflow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns:(Tensorflow Tensor) the stochastic action
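To make the kl method of the categorical case more concrete, here is a small plain-Python sketch of the Kullback-Leibler divergence between two categorical distributions given by their logits. It assumes the divergence is measured from the first distribution to the second, KL(p || q); the function names are hypothetical.

```python
import math


def kl_log_softmax(logits):
    """Log probabilities from logits, using the log-sum-exp trick."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def categorical_kl(logits_p, logits_q):
    """KL(p || q) = sum_i p_i * (log p_i - log q_i)."""
    log_p = kl_log_softmax(logits_p)
    log_q = kl_log_softmax(logits_q)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The divergence is zero when both logit vectors define the same distribution, and it is not symmetric in its two arguments.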
class stable_baselines.common.distributions.CategoricalProbabilityDistributionType(n_cat)[source]
param_shape()[source]

returns the shape of the input parameters

Returns:([int]) the shape
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

returns the probability distribution from latent values

Parameters:
  • pi_latent_vector – ([float]) the latent pi values
  • vf_latent_vector – ([float]) the latent vf values
  • init_scale – (float) the initial scale of the distribution
  • init_bias – (float) the initial bias of the distribution
Returns:

(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()[source]

returns the ProbabilityDistribution class of this type

Returns:(Type ProbabilityDistribution) the probability distribution class associated
sample_dtype()[source]

returns the type of the sampling

Returns:(type) the type
sample_shape()[source]

returns the shape of the sampling

Returns:([int]) the shape
class stable_baselines.common.distributions.DiagGaussianProbabilityDistribution(flat)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns:(float) the entropy
flatparam()[source]

Returns the flat parameters of the distribution

Returns:([float]) the flat parameters
classmethod fromflat(flat)[source]

Create an instance of this class from new multivariate Gaussian input

Parameters:flat – ([float]) the multivariate Gaussian input data
Returns:(ProbabilityDistribution) the instance from the given multivariate Gaussian input data
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:other – ([float]) the distribution to compare with
Returns:(float) the KL divergence of the two distributions
mode()[source]

Returns the mode of the distribution, used as the deterministic action

Returns:(Tensorflow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns:(Tensorflow Tensor) the stochastic action
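For diagonal Gaussians the KL divergence has a closed form that sums over the independent dimensions. The following plain-Python sketch shows that formula, assuming each distribution is described by a mean vector and a log standard deviation vector and that the divergence is KL(p || q); the function name is hypothetical.

```python
import math


def diag_gaussian_kl(mean_p, logstd_p, mean_q, logstd_q):
    """KL(p || q) for two diagonal Gaussians, summed over dimensions.

    Per dimension:
        log(std_q / std_p) + (var_p + (mean_p - mean_q)^2) / (2 * var_q) - 1/2
    """
    total = 0.0
    for mp, lp, mq, lq in zip(mean_p, logstd_p, mean_q, logstd_q):
        var_p = math.exp(2.0 * lp)
        var_q = math.exp(2.0 * lq)
        total += lq - lp + (var_p + (mp - mq) ** 2) / (2.0 * var_q) - 0.5
    return total
```

For example, two unit-variance Gaussians whose means differ by 1 in a single dimension have a divergence of 0.5.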
class stable_baselines.common.distributions.DiagGaussianProbabilityDistributionType(size)[source]
param_shape()[source]

returns the shape of the input parameters

Returns:([int]) the shape
proba_distribution_from_flat(flat)[source]

returns the probability distribution from flat probabilities

Parameters:flat – ([float]) the flat probabilities
Returns:(ProbabilityDistribution) the instance of the ProbabilityDistribution associated
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

returns the probability distribution from latent values

Parameters:
  • pi_latent_vector – ([float]) the latent pi values
  • vf_latent_vector – ([float]) the latent vf values
  • init_scale – (float) the initial scale of the distribution
  • init_bias – (float) the initial bias of the distribution
Returns:

(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()[source]

returns the ProbabilityDistribution class of this type

Returns:(Type ProbabilityDistribution) the probability distribution class associated
sample_dtype()[source]

returns the type of the sampling

Returns:(type) the type
sample_shape()[source]

returns the shape of the sampling

Returns:([int]) the shape
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistribution(nvec, flat)[source]
entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns:(float) the entropy
flatparam()[source]

Returns the flat parameters of the distribution

Returns:([float]) the flat parameters
classmethod fromflat(flat)[source]

Create an instance of this from new logits values

Parameters:flat – ([float]) the multi categorical logits input
Returns:(ProbabilityDistribution) the instance from the given multi categorical input
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:other – ([float]) the distribution to compare with
Returns:(float) the KL divergence of the two distributions
mode()[source]

Returns the mode of the distribution, used as the deterministic action

Returns:(Tensorflow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns:(Tensorflow Tensor) the stochastic action
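A MultiDiscrete action is a vector of several discrete sub-actions, so one way to picture the multi-categorical case is as a flat logits vector cut into one chunk per sub-action, each chunk defining its own categorical distribution. The sketch below illustrates this in plain Python; it assumes nvec lists the number of choices of each sub-action, and the function names are hypothetical.

```python
import math


def multicat_split(flat, nvec):
    """Cut a flat logits vector into one chunk per sub-action."""
    if len(flat) != sum(nvec):
        raise ValueError("flat must hold sum(nvec) logits")
    chunks, start = [], 0
    for n in nvec:
        chunks.append(flat[start:start + n])
        start += n
    return chunks


def multicat_log_softmax(logits):
    """Log probabilities from logits, using the log-sum-exp trick."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return [v - log_z for v in logits]


def multicat_neglogp(flat, nvec, actions):
    """Sum of the per-sub-action negative log probabilities."""
    chunks = multicat_split(flat, nvec)
    return -sum(multicat_log_softmax(c)[a] for c, a in zip(chunks, actions))


def multicat_mode(flat, nvec):
    """Most likely choice for each sub-action."""
    return [max(range(len(c)), key=c.__getitem__) for c in multicat_split(flat, nvec)]
```

Because the sub-actions are treated as independent, quantities such as the negative log probability add up across the chunks.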
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistributionType(n_vec)[source]
param_shape()[source]

returns the shape of the input parameters

Returns:([int]) the shape
proba_distribution_from_flat(flat)[source]

Returns the probability distribution from flat probabilities

Parameters:flat – ([float]) the flattened vector of parameters of the probability distribution
Returns:(ProbabilityDistribution) the instance of the ProbabilityDistribution associated
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

returns the probability distribution from latent values

Parameters:
  • pi_latent_vector – ([float]) the latent pi values
  • vf_latent_vector – ([float]) the latent vf values
  • init_scale – (float) the initial scale of the distribution
  • init_bias – (float) the initial bias of the distribution
Returns:

(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()[source]

returns the ProbabilityDistribution class of this type

Returns:(Type ProbabilityDistribution) the probability distribution class associated
sample_dtype()[source]

returns the type of the sampling

Returns:(type) the type
sample_shape()[source]

returns the shape of the sampling

Returns:([int]) the shape
class stable_baselines.common.distributions.ProbabilityDistribution[source]

Base class for describing a probability distribution.

entropy()[source]

Returns Shannon’s entropy of the probability distribution

Returns:(float) the entropy
flatparam()[source]

Returns the flat parameters of the distribution

Returns:([float]) the flat parameters
kl(other)[source]

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters:other – ([float]) the distribution to compare with
Returns:(float) the KL divergence of the two distributions
logp(x)[source]

Returns the log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The log likelihood of the distribution
mode()[source]

Returns the mode of the distribution, used as the deterministic action

Returns:(Tensorflow Tensor) the deterministic action
neglogp(x)[source]

Returns the negative log likelihood

Parameters:x – (str) the labels of each index
Returns:([float]) The negative log likelihood of the distribution
sample()[source]

Returns a sample from the probability distribution

Returns:(Tensorflow Tensor) the stochastic action
class stable_baselines.common.distributions.ProbabilityDistributionType[source]

Parametrized family of probability distributions

param_placeholder(prepend_shape, name=None)[source]

returns the TensorFlow placeholder for the input parameters

Parameters:
  • prepend_shape – ([int]) the prepend shape
  • name – (str) the placeholder name
Returns:

(TensorFlow Tensor) the placeholder

param_shape()[source]

returns the shape of the input parameters

Returns:([int]) the shape
proba_distribution_from_flat(flat)[source]

Returns the probability distribution from flat probabilities

Parameters:flat – ([float]) the flattened vector of parameters of the probability distribution
Returns:(ProbabilityDistribution) the instance of the ProbabilityDistribution associated
proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)[source]

returns the probability distribution from latent values

Parameters:
  • pi_latent_vector – ([float]) the latent pi values
  • vf_latent_vector – ([float]) the latent vf values
  • init_scale – (float) the initial scale of the distribution
  • init_bias – (float) the initial bias of the distribution
Returns:

(ProbabilityDistribution) the instance of the ProbabilityDistribution associated

probability_distribution_class()[source]

returns the ProbabilityDistribution class of this type

Returns:(Type ProbabilityDistribution) the probability distribution class associated
sample_dtype()[source]

returns the type of the sampling

Returns:(type) the type
sample_placeholder(prepend_shape, name=None)[source]

returns the TensorFlow placeholder for the sampling

Parameters:
  • prepend_shape – ([int]) the prepend shape
  • name – (str) the placeholder name
Returns:

(TensorFlow Tensor) the placeholder

sample_shape()[source]

returns the shape of the sampling

Returns:([int]) the shape
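The prepend_shape argument of param_placeholder and sample_placeholder can be read as leading dimensions (typically the batch dimension) placed in front of the per-distribution shape. The tiny sketch below only illustrates that shape arithmetic under this assumption; placeholder_shape is a hypothetical helper, and no TensorFlow placeholder is created.

```python
def placeholder_shape(prepend_shape, inner_shape):
    """Concatenate leading dimensions with a parameter or sample shape."""
    return list(prepend_shape) + list(inner_shape)


# An open batch dimension (None) in front of a 6-element parameter vector.
batched_params = placeholder_shape([None], [6])
# A scalar sample per batch element leaves only the batch dimension.
batched_scalar_samples = placeholder_shape([None], [])
```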
stable_baselines.common.distributions.make_proba_dist_type(ac_space)[source]

Returns an instance of ProbabilityDistributionType for the correct type of action space

Parameters:ac_space – (Gym Space) the input action space
Returns:(ProbabilityDistributionType) the appropriate instance of a ProbabilityDistributionType
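The dispatch performed by make_proba_dist_type follows the mapping between action spaces and distributions listed at the top of this page. The sketch below mimics that mapping with plain strings in place of Gym Space objects, so it runs without Gym or TensorFlow; dist_type_name_for and the NotImplementedError raised for unknown spaces are assumptions made for illustration.

```python
# Action space name -> name of the matching distribution type class.
DIST_TYPE_FOR_SPACE = {
    "Discrete": "CategoricalProbabilityDistributionType",
    "Box": "DiagGaussianProbabilityDistributionType",
    "MultiDiscrete": "MultiCategoricalProbabilityDistributionType",
    "MultiBinary": "BernoulliProbabilityDistributionType",
}


def dist_type_name_for(space_kind):
    """Look up the distribution type class name for an action space name."""
    try:
        return DIST_TYPE_FOR_SPACE[space_kind]
    except KeyError:
        raise NotImplementedError(
            "no probability distribution for action space " + repr(space_kind)
        ) from None
```

The real function takes a Gym Space instance rather than a string and returns a ProbabilityDistributionType instance rather than a class name.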
stable_baselines.common.distributions.shape_el(tensor, index)[source]

get the shape of a TensorFlow Tensor element

Parameters:
  • tensor – (TensorFlow Tensor) the input tensor
  • index – (int) the element
Returns:

([int]) the shape