# Probability Distributions

Probability distributions used for the different action spaces:

• `CategoricalProbabilityDistribution` -> Discrete
• `DiagGaussianProbabilityDistribution` -> Box (continuous actions)
• `MultiCategoricalProbabilityDistribution` -> MultiDiscrete
• `BernoulliProbabilityDistribution` -> MultiBinary

The policy networks output the parameters of these distributions (named `flat` in the methods). Actions are then sampled from the distributions.

For instance, in the case of discrete actions, the policy network outputs the probability of taking each action (as logits). The `CategoricalProbabilityDistribution` makes it possible to sample from this output, compute the entropy and the negative log probability (`neglogp`), and backpropagate the gradient.
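The discrete case can be sketched in plain Python, outside of TensorFlow. The helper names below (`softmax`, `categorical_sample`, and so on) are illustrative and are not part of `stable_baselines`; the sketch assumes the network output is a list of unnormalized logits, as the `CategoricalProbabilityDistribution(logits)` signature suggests.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized logits into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_sample(logits, rng=random):
    """Draw one action index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def categorical_mode(logits):
    """Deterministic action: the index with the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def categorical_neglogp(logits, action):
    """Negative log probability of `action` under the distribution."""
    return -math.log(softmax(logits)[action])


def categorical_entropy(logits):
    """Shannon entropy -sum(p * log p), in nats."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


if __name__ == "__main__":
    logits = [1.0, 3.0, 2.0]  # hypothetical network output for 3 actions
    print("sampled action:", categorical_sample(logits))
    print("greedy action:", categorical_mode(logits))
    print("neglogp of action 1:", categorical_neglogp(logits, 1))
    print("entropy:", categorical_entropy(logits))
```

In a training loop, `neglogp` of the chosen action is the quantity that typically enters the policy-gradient loss, while the entropy is often added as an exploration bonus.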

In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and the (log) standard deviation of the distribution (assumed to be a `DiagGaussianProbabilityDistribution`).
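As a rough illustration of the continuous case, the following plain-Python sketch works with a diagonal Gaussian described by a mean and a log standard deviation per action dimension. The function names are hypothetical, and the formulas are the standard ones for independent Gaussians rather than a copy of the library's TensorFlow code.

```python
import math
import random


def gaussian_sample(mean, logstd, rng=random):
    """Stochastic action: mean + std * standard normal noise, per dimension."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, logstd)]


def gaussian_mode(mean):
    """Deterministic action: the mean of the Gaussian."""
    return list(mean)


def gaussian_neglogp(mean, logstd, x):
    """Negative log density of x, summed over independent dimensions."""
    quad = sum(((xi - m) / math.exp(s)) ** 2 for xi, m, s in zip(x, mean, logstd))
    return 0.5 * quad + 0.5 * math.log(2.0 * math.pi) * len(x) + sum(logstd)


def gaussian_entropy(logstd):
    """Differential entropy: sum over dimensions of logstd + 0.5 * log(2*pi*e)."""
    return sum(s + 0.5 * math.log(2.0 * math.pi * math.e) for s in logstd)


if __name__ == "__main__":
    mean, logstd = [0.5, -1.0], [0.0, -0.5]  # hypothetical network output
    action = gaussian_sample(mean, logstd)
    print("sampled action:", action)
    print("neglogp:", gaussian_neglogp(mean, logstd, action))
    print("entropy:", gaussian_entropy(logstd))
```

Working with the log standard deviation keeps the standard deviation positive without constraining the network output.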

class `stable_baselines.common.distributions.BernoulliProbabilityDistribution`(logits)

`entropy`()

Returns the Shannon entropy of the distribution

Returns: (float) the entropy

`flatparam`()

Returns the flat parameters of the distribution (the values produced by the policy network)

Returns: ([float]) the flat parameters

classmethod `fromflat`(flat)

Creates an instance of this class from new Bernoulli input

Parameters: flat – ([float]) the Bernoulli input data

Returns: (ProbabilityDistribution) the instance created from the given Bernoulli input data

`kl`(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – ([float]) the distribution to compare with

Returns: (float) the KL divergence of the two distributions

`mode`()

Returns the mode of the distribution, used as the deterministic action

Returns: (TensorFlow Tensor) the deterministic action

`neglogp`(x)

Returns the negative log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the negative log likelihood of the distribution

`sample`()

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
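For `MultiBinary` action spaces, each bit is an independent Bernoulli variable. The sketch below shows the usual formulas in plain Python; the function names are illustrative, and it assumes one logit per bit with moderate magnitude (very large logits would need a numerically safer formulation).

```python
import math
import random


def sigmoid(v):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def bernoulli_sample(logits, rng=random):
    """Stochastic action: each bit is 1 with probability sigmoid(logit)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in logits]


def bernoulli_mode(logits):
    """Deterministic action: round each probability to 0 or 1."""
    return [round(sigmoid(v)) for v in logits]


def bernoulli_neglogp(logits, x):
    """Sum over bits of the binary cross-entropy between x and sigmoid(logits)."""
    total = 0.0
    for logit, bit in zip(logits, x):
        p = sigmoid(logit)
        total -= bit * math.log(p) + (1 - bit) * math.log(1.0 - p)
    return total


def bernoulli_entropy(logits):
    """Sum over bits of -(p log p + (1 - p) log(1 - p))."""
    total = 0.0
    for logit in logits:
        p = sigmoid(logit)
        if 0.0 < p < 1.0:  # a saturated bit contributes no entropy
            total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total


if __name__ == "__main__":
    logits = [2.0, -2.0, 0.0]  # hypothetical network output for 3 bits
    print("sampled bits:", bernoulli_sample(logits))
    print("greedy bits:", bernoulli_mode(logits))
    print("entropy:", bernoulli_entropy(logits))
```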
class `stable_baselines.common.distributions.BernoulliProbabilityDistributionType`(size)

`param_shape`()

Returns the shape of the input parameters

Returns: ([int]) the shape

`proba_distribution_from_latent`(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`probability_distribution_class`()

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class

`sample_dtype`()

Returns the data type of the samples

Returns: (type) the type

`sample_shape`()

Returns the shape of the samples

Returns: ([int]) the shape
class `stable_baselines.common.distributions.CategoricalProbabilityDistribution`(logits)

`entropy`()

Returns the Shannon entropy of the distribution

Returns: (float) the entropy

`flatparam`()

Returns the flat parameters of the distribution (the values produced by the policy network)

Returns: ([float]) the flat parameters

classmethod `fromflat`(flat)

Creates an instance of this class from new logits values

Parameters: flat – ([float]) the categorical logits input

Returns: (ProbabilityDistribution) the instance created from the given categorical input

`kl`(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – ([float]) the distribution to compare with

Returns: (float) the KL divergence of the two distributions

`mode`()

Returns the mode of the distribution, used as the deterministic action

Returns: (TensorFlow Tensor) the deterministic action

`neglogp`(x)

Returns the negative log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the negative log likelihood of the distribution

`sample`()

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
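The `kl` method compares two distributions of the same family. A minimal plain-Python sketch for two categorical distributions given as logits follows; it assumes the direction KL(self || other), and the function names are illustrative rather than library API.

```python
import math


def softmax(logits):
    """Turn unnormalized logits into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_kl(logits_p, logits_q):
    """KL(p || q) = sum_i p_i * (log p_i - log q_i), in nats."""
    p = softmax(logits_p)
    q = softmax(logits_q)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    old_logits = [0.0, 0.0]            # hypothetical policy before an update
    new_logits = [math.log(3.0), 0.0]  # hypothetical policy after an update
    print("KL(old || new):", categorical_kl(old_logits, new_logits))
    print("KL(new || old):", categorical_kl(new_logits, old_logits))
```

The divergence is zero only when both distributions are identical, and it is not symmetric, so the order of the two arguments matters.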
class `stable_baselines.common.distributions.CategoricalProbabilityDistributionType`(n_cat)

`param_shape`()

Returns the shape of the input parameters

Returns: ([int]) the shape

`proba_distribution_from_latent`(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`probability_distribution_class`()

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class

`sample_dtype`()

Returns the data type of the samples

Returns: (type) the type

`sample_shape`()

Returns the shape of the samples

Returns: ([int]) the shape
class `stable_baselines.common.distributions.DiagGaussianProbabilityDistribution`(flat)

`entropy`()

Returns the Shannon entropy of the distribution

Returns: (float) the entropy

`flatparam`()

Returns the flat parameters of the distribution (the values produced by the policy network)

Returns: ([float]) the flat parameters

classmethod `fromflat`(flat)

Creates an instance of this class from new multivariate Gaussian input

Parameters: flat – ([float]) the multivariate Gaussian input data

Returns: (ProbabilityDistribution) the instance created from the given multivariate Gaussian input data

`kl`(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – ([float]) the distribution to compare with

Returns: (float) the KL divergence of the two distributions

`mode`()

Returns the mode of the distribution, used as the deterministic action

Returns: (TensorFlow Tensor) the deterministic action

`neglogp`(x)

Returns the negative log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the negative log likelihood of the distribution

`sample`()

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
class `stable_baselines.common.distributions.DiagGaussianProbabilityDistributionType`(size)

`param_shape`()

Returns the shape of the input parameters

Returns: ([int]) the shape

`proba_distribution_from_flat`(flat)

Returns the probability distribution built from the flat parameters

Parameters: flat – ([float]) the flattened vector of parameters of the probability distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`proba_distribution_from_latent`(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`probability_distribution_class`()

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class

`sample_dtype`()

Returns the data type of the samples

Returns: (type) the type

`sample_shape`()

Returns the shape of the samples

Returns: ([int]) the shape
class `stable_baselines.common.distributions.MultiCategoricalProbabilityDistribution`(nvec, flat)

`entropy`()

Returns the Shannon entropy of the distribution

Returns: (float) the entropy

`flatparam`()

Returns the flat parameters of the distribution (the values produced by the policy network)

Returns: ([float]) the flat parameters

classmethod `fromflat`(flat)

Creates an instance of this class from new logits values

Parameters: flat – ([float]) the multi categorical logits input

Returns: (ProbabilityDistribution) the instance created from the given multi categorical input

`kl`(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – ([float]) the distribution to compare with

Returns: (float) the KL divergence of the two distributions

`mode`()

Returns the mode of the distribution, used as the deterministic action

Returns: (TensorFlow Tensor) the deterministic action

`neglogp`(x)

Returns the negative log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the negative log likelihood of the distribution

`sample`()

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
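A `MultiDiscrete` action can be viewed as several independent categorical choices whose logits are concatenated into one flat vector. The sketch below illustrates that idea in plain Python; it assumes `nvec` lists the number of options of each sub-action, and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Turn unnormalized logits into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_logits(flat, nvec):
    """Cut the flat vector into one chunk of logits per sub-action."""
    chunks, start = [], 0
    for n in nvec:
        chunks.append(flat[start:start + n])
        start += n
    return chunks


def multicategorical_mode(flat, nvec):
    """Deterministic action: the argmax of each chunk."""
    return [max(range(len(c)), key=lambda i: c[i]) for c in split_logits(flat, nvec)]


def multicategorical_neglogp(flat, nvec, actions):
    """Independent sub-actions: the negative log probabilities add up."""
    chunks = split_logits(flat, nvec)
    return sum(-math.log(softmax(c)[a]) for c, a in zip(chunks, actions))


def multicategorical_entropy(flat, nvec):
    """Independent sub-actions: the entropies add up."""
    total = 0.0
    for chunk in split_logits(flat, nvec):
        total -= sum(p * math.log(p) for p in softmax(chunk) if p > 0.0)
    return total


if __name__ == "__main__":
    nvec = [2, 3]                      # two sub-actions with 2 and 3 options
    flat = [0.0, 1.0, 5.0, 0.0, 2.0]   # hypothetical network output, length 2 + 3
    print("chunks:", split_logits(flat, nvec))
    print("greedy action:", multicategorical_mode(flat, nvec))
    print("neglogp of [1, 0]:", multicategorical_neglogp(flat, nvec, [1, 0]))
```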
class `stable_baselines.common.distributions.MultiCategoricalProbabilityDistributionType`(n_vec)

`param_shape`()

Returns the shape of the input parameters

Returns: ([int]) the shape

`proba_distribution_from_flat`(flat)

Returns the probability distribution built from the flat parameters

Parameters: flat – ([float]) the flattened vector of parameters of the probability distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`proba_distribution_from_latent`(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`probability_distribution_class`()

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class

`sample_dtype`()

Returns the data type of the samples

Returns: (type) the type

`sample_shape`()

Returns the shape of the samples

Returns: ([int]) the shape
class `stable_baselines.common.distributions.ProbabilityDistribution`

A particular probability distribution

`entropy`()

Returns the Shannon entropy of the distribution

Returns: (float) the entropy

`flatparam`()

Returns the flat parameters of the distribution (the values produced by the policy network)

Returns: ([float]) the flat parameters

`kl`(other)

Calculates the Kullback-Leibler divergence from the given probability distribution

Parameters: other – ([float]) the distribution to compare with

Returns: (float) the KL divergence of the two distributions

`logp`(x)

Returns the log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the log likelihood of the distribution

`mode`()

Returns the mode of the distribution, used as the deterministic action

Returns: (TensorFlow Tensor) the deterministic action

`neglogp`(x)

Returns the negative log likelihood

Parameters: x – (str) the labels of each index

Returns: ([float]) the negative log likelihood of the distribution

`sample`()

Returns a sample from the probability distribution

Returns: (TensorFlow Tensor) the stochastic action
class `stable_baselines.common.distributions.ProbabilityDistributionType`

Parametrized family of probability distributions

`param_placeholder`(prepend_shape, name=None)

Returns the TensorFlow placeholder for the input parameters

Parameters:
• prepend_shape – ([int]) the prepend shape
• name – (str) the placeholder name

Returns: (TensorFlow Tensor) the placeholder

`param_shape`()

Returns the shape of the input parameters

Returns: ([int]) the shape

`proba_distribution_from_flat`(flat)

Returns the probability distribution built from the flat parameters

Parameters: flat – ([float]) the flattened vector of parameters of the probability distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`proba_distribution_from_latent`(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)

Returns the probability distribution from latent values

Parameters:
• pi_latent_vector – ([float]) the latent pi values
• vf_latent_vector – ([float]) the latent vf values
• init_scale – (float) the initial scale of the distribution
• init_bias – (float) the initial bias of the distribution

Returns: (ProbabilityDistribution) the instance of the associated ProbabilityDistribution

`probability_distribution_class`()

Returns the ProbabilityDistribution class of this type

Returns: (Type ProbabilityDistribution) the associated probability distribution class

`sample_dtype`()

Returns the data type of the samples

Returns: (type) the type

`sample_placeholder`(prepend_shape, name=None)

Returns the TensorFlow placeholder for the sampling

Parameters:
• prepend_shape – ([int]) the prepend shape
• name – (str) the placeholder name

Returns: (TensorFlow Tensor) the placeholder

`sample_shape`()

Returns the shape of the samples

Returns: ([int]) the shape
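To make the difference between `param_shape` and `sample_shape` concrete, the sketch below lists the shapes one would expect for each family. These values are an assumption derived from how each family is parameterized (for example, a diagonal Gaussian needs a mean and a log std per action dimension); they are not quoted from the library's code, and the `shapes` helper and its string keys are hypothetical.

```python
def shapes(family, spec):
    """Return (param_shape, sample_shape) for a distribution family.

    `spec` is the constructor argument of the matching *Type class:
    n_cat (int), size (int), or n_vec (list of ints).
    """
    if family == "categorical":
        # One logit per category; a sample is a single index (scalar).
        return [spec], []
    if family == "diag_gaussian":
        # Mean and log std for each dimension; a sample has one value per dimension.
        return [2 * spec], [spec]
    if family == "multi_categorical":
        # Logits of all sub-actions concatenated; one index per sub-action.
        return [sum(spec)], [len(spec)]
    if family == "bernoulli":
        # One logit per bit; a sample has one bit per logit.
        return [spec], [spec]
    raise ValueError("unknown family: {}".format(family))


if __name__ == "__main__":
    for family, spec in [("categorical", 4), ("diag_gaussian", 3),
                         ("multi_categorical", [2, 3]), ("bernoulli", 4)]:
        param_shape, sample_shape = shapes(family, spec)
        print(family, "params:", param_shape, "sample:", sample_shape)
```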
`stable_baselines.common.distributions.make_proba_dist_type`(ac_space)

Returns an instance of ProbabilityDistributionType for the given type of action space

Parameters: ac_space – (Gym Space) the input action space

Returns: (ProbabilityDistributionType) the appropriate instance of a ProbabilityDistributionType
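The selection logic follows the mapping listed at the top of this page. The sketch below mimics it without importing Gym or TensorFlow: it takes the name of the space class as a string and returns the name of the matching distribution class. The `distribution_for_space` helper is hypothetical, and raising `NotImplementedError` for other spaces is an assumption of this sketch; the real function takes a Gym space object and returns a `ProbabilityDistributionType` instance.

```python
# Mapping taken from the list at the top of this page.
SPACE_TO_DISTRIBUTION = {
    "Discrete": "CategoricalProbabilityDistribution",
    "Box": "DiagGaussianProbabilityDistribution",
    "MultiDiscrete": "MultiCategoricalProbabilityDistribution",
    "MultiBinary": "BernoulliProbabilityDistribution",
}


def distribution_for_space(space_name):
    """Look up the distribution class name for an action space class name."""
    try:
        return SPACE_TO_DISTRIBUTION[space_name]
    except KeyError:
        raise NotImplementedError(
            "no probability distribution for action space {}".format(space_name)
        )


if __name__ == "__main__":
    for name in ("Discrete", "Box", "MultiDiscrete", "MultiBinary"):
        print(name, "->", distribution_for_space(name))
```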
`stable_baselines.common.distributions.shape_el`(tensor, index)

Gets the shape of a TensorFlow Tensor element

Parameters:
• tensor – (TensorFlow Tensor) the input tensor
• index – (int) the element

Returns: ([int]) the shape