Probability Distributions
Probability distributions used for the different action spaces:
- CategoricalProbabilityDistribution -> Discrete
- DiagGaussianProbabilityDistribution -> Box (continuous actions)
- MultiCategoricalProbabilityDistribution -> MultiDiscrete
- BernoulliProbabilityDistribution -> MultiBinary
The policy networks output parameters for the distributions (named flat in the methods). Actions are then sampled from those distributions.
For instance, in the case of discrete actions, the policy network outputs the probability of taking each action (expressed as logits). The CategoricalProbabilityDistribution allows sampling from it, computes the entropy and the negative log probability (neglogp), and backpropagates the gradient.
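The three quantities mentioned above can be illustrated with a minimal NumPy sketch. This is not the library's TensorFlow implementation; the function names (`softmax`, `categorical_sample`, `categorical_neglogp`, `categorical_entropy`) are hypothetical and only show the underlying math, assuming the network output is a vector of logits.

```python
import numpy as np


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_sample(logits, rng):
    # Draw one action index according to the softmax probabilities.
    probs = softmax(logits)
    return int(rng.choice(len(probs), p=probs))


def categorical_neglogp(logits, action):
    # Negative log probability of the chosen action.
    return float(-np.log(softmax(logits)[action]))


def categorical_entropy(logits):
    # Shannon entropy of the action distribution (in nats).
    probs = softmax(logits)
    return float(-np.sum(probs * np.log(probs)))


rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])  # illustrative network output
action = categorical_sample(logits, rng)
print(action, categorical_neglogp(logits, action), categorical_entropy(logits))
```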
In the case of continuous actions, a Gaussian distribution is used. The policy network outputs the mean and (log) standard deviation of the distribution (assumed to be a DiagGaussianProbabilityDistribution).
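As a rough illustration of the continuous case, the sketch below treats the flat parameter vector as the means followed by the log standard deviations. That layout is an assumption made for this example, and the helper names are hypothetical; it uses NumPy rather than the library's TensorFlow code.

```python
import numpy as np


def split_flat(flat):
    # Assumed layout: first half holds the means, second half the log stds.
    mean, logstd = np.split(flat, 2, axis=-1)
    return mean, logstd


def gaussian_sample(flat, rng):
    # Reparameterized sample: mean + std * standard normal noise.
    mean, logstd = split_flat(flat)
    return mean + np.exp(logstd) * rng.standard_normal(mean.shape)


def gaussian_neglogp(flat, x):
    # Negative log density of x under a diagonal Gaussian.
    mean, logstd = split_flat(flat)
    std = np.exp(logstd)
    return float(
        0.5 * np.sum(((x - mean) / std) ** 2)
        + 0.5 * np.log(2.0 * np.pi) * x.shape[-1]
        + np.sum(logstd)
    )


rng = np.random.default_rng(0)
flat_params = np.array([0.1, -0.3, 0.0, -1.0])  # two means, two log stds
sampled_action = gaussian_sample(flat_params, rng)
print(sampled_action, gaussian_neglogp(flat_params, sampled_action))
```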
class stable_baselines.common.distributions.BernoulliProbabilityDistribution(logits)
- classmethod fromflat(flat)
  Create an instance of this class from new Bernoulli input.
  Parameters: flat – ([float]) the Bernoulli input data
  Returns: (ProbabilityDistribution) the instance from the given Bernoulli input data
- kl(other)
  Calculate the Kullback-Leibler divergence from the given probability distribution.
  Parameters: other – ([float]) the distribution to compare with
  Returns: (float) the KL divergence of the two distributions
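For intuition, the KL divergence between two factorized Bernoulli distributions can be sketched in NumPy as follows. This is a sketch of the textbook formula under the assumption that both distributions are given as logits; `sigmoid` and `bernoulli_kl` are hypothetical helper names, not library functions.

```python
import numpy as np


def sigmoid(logits):
    # Map logits to probabilities of the bit being 1.
    return 1.0 / (1.0 + np.exp(-logits))


def bernoulli_kl(logits_p, logits_q):
    # KL(p || q), summed over the independent binary components.
    p = sigmoid(logits_p)
    q = sigmoid(logits_q)
    return float(
        np.sum(
            p * (np.log(p) - np.log(q))
            + (1.0 - p) * (np.log(1.0 - p) - np.log(1.0 - q))
        )
    )


logits_a = np.array([0.0, 1.0])
logits_b = np.array([0.5, -0.5])
print(bernoulli_kl(logits_a, logits_b))
```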
class stable_baselines.common.distributions.BernoulliProbabilityDistributionType(size)
- proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)
  Return the probability distribution from latent values.
  Parameters:
  - pi_latent_vector – ([float]) the latent pi values
  - vf_latent_vector – ([float]) the latent vf values
  - init_scale – (float) the initial scale of the distribution
  - init_bias – (float) the initial bias of the distribution
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
class stable_baselines.common.distributions.CategoricalProbabilityDistribution(logits)
- classmethod fromflat(flat)
  Create an instance of this class from new logits values.
  Parameters: flat – ([float]) the categorical logits input
  Returns: (ProbabilityDistribution) the instance from the given categorical input
- kl(other)
  Calculate the Kullback-Leibler divergence from the given probability distribution.
  Parameters: other – ([float]) the distribution to compare with
  Returns: (float) the KL divergence of the two distributions
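The categorical KL divergence can be sketched directly from logits. The NumPy code below illustrates the standard formula and is not taken from the library; `log_softmax` and `categorical_kl` are hypothetical names.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log of the softmax probabilities.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))


def categorical_kl(logits_p, logits_q):
    # KL(p || q) = sum_i p_i * (log p_i - log q_i)
    log_p = log_softmax(logits_p)
    log_q = log_softmax(logits_q)
    return float(np.sum(np.exp(log_p) * (log_p - log_q)))


old_logits = np.array([1.0, 0.0, -1.0])
new_logits = np.array([0.5, 0.5, -1.0])
print(categorical_kl(old_logits, new_logits))
```

A quantity of this kind is typically used to measure how far an updated policy has moved from the previous one.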
class stable_baselines.common.distributions.CategoricalProbabilityDistributionType(n_cat)
- proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)
  Return the probability distribution from latent values.
  Parameters:
  - pi_latent_vector – ([float]) the latent pi values
  - vf_latent_vector – ([float]) the latent vf values
  - init_scale – (float) the initial scale of the distribution
  - init_bias – (float) the initial bias of the distribution
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
class stable_baselines.common.distributions.DiagGaussianProbabilityDistribution(flat)
- classmethod fromflat(flat)
  Create an instance of this class from new multivariate Gaussian input.
  Parameters: flat – ([float]) the multivariate Gaussian input data
  Returns: (ProbabilityDistribution) the instance from the given multivariate Gaussian input data
- kl(other)
  Calculate the Kullback-Leibler divergence from the given probability distribution.
  Parameters: other – ([float]) the distribution to compare with
  Returns: (float) the KL divergence of the two distributions
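For two diagonal Gaussians the KL divergence has a closed form. The NumPy sketch below shows that formula, with each distribution given by its mean and log standard deviation; `diag_gaussian_kl` is a hypothetical helper name and the code is illustrative rather than the library's implementation.

```python
import numpy as np


def diag_gaussian_kl(mean_p, logstd_p, mean_q, logstd_q):
    # Closed-form KL(p || q) for diagonal Gaussians, summed over dimensions:
    # log(std_q / std_p) + (std_p^2 + (mean_p - mean_q)^2) / (2 * std_q^2) - 1/2
    var_p = np.exp(2.0 * logstd_p)
    var_q = np.exp(2.0 * logstd_q)
    return float(
        np.sum(
            logstd_q - logstd_p
            + (var_p + (mean_p - mean_q) ** 2) / (2.0 * var_q)
            - 0.5
        )
    )


mean_a, logstd_a = np.array([1.0, 0.0]), np.array([0.0, 0.0])
mean_b, logstd_b = np.array([0.0, 0.0]), np.array([0.0, 0.0])
print(diag_gaussian_kl(mean_a, logstd_a, mean_b, logstd_b))
```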
class stable_baselines.common.distributions.DiagGaussianProbabilityDistributionType(size)
- proba_distribution_from_flat(flat)
  Return the probability distribution from flat probabilities.
  Parameters: flat – ([float]) the flat probabilities
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
- proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)
  Return the probability distribution from latent values.
  Parameters:
  - pi_latent_vector – ([float]) the latent pi values
  - vf_latent_vector – ([float]) the latent vf values
  - init_scale – (float) the initial scale of the distribution
  - init_bias – (float) the initial bias of the distribution
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistribution(nvec, flat)
- classmethod fromflat(flat)
  Create an instance of this class from new logits values.
  Parameters: flat – ([float]) the multi categorical logits input
  Returns: (ProbabilityDistribution) the instance from the given multi categorical input
- kl(other)
  Calculate the Kullback-Leibler divergence from the given probability distribution.
  Parameters: other – ([float]) the distribution to compare with
  Returns: (float) the KL divergence of the two distributions
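A multi categorical distribution can be thought of as several independent categorical distributions whose logits are concatenated into one flat vector. The NumPy sketch below assumes that `nvec` lists the number of choices per sub-action and that the flat vector is laid out in that order; the helper names are hypothetical.

```python
import numpy as np


def split_logits(flat, nvec):
    # Cut the flat vector into one block of logits per sub-action.
    return np.split(flat, np.cumsum(nvec)[:-1])


def multi_categorical_mode(flat, nvec):
    # Most likely choice for each sub-action.
    return [int(np.argmax(block)) for block in split_logits(flat, nvec)]


def multi_categorical_neglogp(flat, nvec, actions):
    # Independent sub-actions: the negative log probabilities add up.
    total = 0.0
    for block, action in zip(split_logits(flat, nvec), actions):
        shifted = block - np.max(block)
        log_probs = shifted - np.log(np.sum(np.exp(shifted)))
        total -= log_probs[action]
    return float(total)


nvec = [2, 3]  # two sub-actions with 2 and 3 choices
flat_logits = np.array([1.0, 0.0, 0.0, 2.0, 0.0])
print(multi_categorical_mode(flat_logits, nvec))
print(multi_categorical_neglogp(flat_logits, nvec, [0, 1]))
```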
class stable_baselines.common.distributions.MultiCategoricalProbabilityDistributionType(n_vec)
- proba_distribution_from_flat(flat)
  Return the probability distribution from flat probabilities.
  Parameters: flat – ([float]) the flat probabilities (flattened vector of parameters of the probability distribution)
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
- proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)
  Return the probability distribution from latent values.
  Parameters:
  - pi_latent_vector – ([float]) the latent pi values
  - vf_latent_vector – ([float]) the latent vf values
  - init_scale – (float) the initial scale of the distribution
  - init_bias – (float) the initial bias of the distribution
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
class stable_baselines.common.distributions.ProbabilityDistribution
A particular probability distribution
- kl(other)
  Calculate the Kullback-Leibler divergence from the given probability distribution.
  Parameters: other – ([float]) the distribution to compare with
  Returns: (float) the KL divergence of the two distributions
- logp(x)
  Return the log likelihood.
  Parameters: x – (str) the labels of each index
  Returns: ([float]) the log likelihood of the distribution
class stable_baselines.common.distributions.ProbabilityDistributionType
Parametrized family of probability distributions
- param_placeholder(prepend_shape, name=None)
  Return the TensorFlow placeholder for the input parameters.
  Parameters:
  - prepend_shape – ([int]) the prepend shape
  - name – (str) the placeholder name
  Returns: (TensorFlow Tensor) the placeholder
- proba_distribution_from_flat(flat)
  Return the probability distribution from flat probabilities.
  Parameters: flat – ([float]) the flat probabilities (flattened vector of parameters of the probability distribution)
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
- proba_distribution_from_latent(pi_latent_vector, vf_latent_vector, init_scale=1.0, init_bias=0.0)
  Return the probability distribution from latent values.
  Parameters:
  - pi_latent_vector – ([float]) the latent pi values
  - vf_latent_vector – ([float]) the latent vf values
  - init_scale – (float) the initial scale of the distribution
  - init_bias – (float) the initial bias of the distribution
  Returns: (ProbabilityDistribution) the associated ProbabilityDistribution instance
- probability_distribution_class()
  Return the ProbabilityDistribution class of this type.
  Returns: (Type ProbabilityDistribution) the probability distribution class associated
stable_baselines.common.distributions.make_proba_dist_type(ac_space)
Return an instance of ProbabilityDistributionType for the correct type of action space.
Parameters: ac_space – (Gym Space) the input action space
Returns: (ProbabilityDistributionType) the appropriate instance of a ProbabilityDistributionType
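The selection logic can be pictured as a dispatch on the type of the action space, following the mapping listed at the top of this page. The sketch below is only an illustration: the space classes are hypothetical stand-ins defined locally (not the Gym classes), and the function returns the name of the distribution type rather than an instance.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-ins for the Gym action spaces, defined for this example.
@dataclass
class Discrete:
    n: int


@dataclass
class Box:
    shape: Tuple[int, ...]


@dataclass
class MultiDiscrete:
    nvec: Tuple[int, ...]


@dataclass
class MultiBinary:
    n: int


def proba_dist_type_name(ac_space):
    # Pick the distribution type that matches the kind of action space.
    if isinstance(ac_space, Box):
        return "DiagGaussianProbabilityDistributionType"
    if isinstance(ac_space, Discrete):
        return "CategoricalProbabilityDistributionType"
    if isinstance(ac_space, MultiDiscrete):
        return "MultiCategoricalProbabilityDistributionType"
    if isinstance(ac_space, MultiBinary):
        return "BernoulliProbabilityDistributionType"
    raise NotImplementedError("unsupported action space: {!r}".format(ac_space))


print(proba_dist_type_name(Discrete(n=4)))
print(proba_dist_type_name(Box(shape=(2,))))
```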