Warning
This package is in maintenance mode; please use Stable-Baselines3 (SB3) for an up-to-date version. You can find a migration guide in the SB3 documentation.
TensorFlow Utils
- stable_baselines.common.tf_util.avg_norm(tensor)
  Return the average of the L2 norm over the batch.
  Parameters: tensor – (TensorFlow Tensor) The input tensor
  Returns: (TensorFlow Tensor) Average L2 norm of the batch
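  One plausible implementation (a sketch, assuming the usual definition: the per-row L2 norm averaged over the batch, not necessarily the library's exact code):

    import tensorflow as tf

    def avg_norm_sketch(tensor):
        # Mean over the batch of each row's Euclidean (L2) norm.
        return tf.reduce_mean(tf.sqrt(tf.reduce_sum(tf.square(tensor), axis=-1)))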
- stable_baselines.common.tf_util.batch_to_seq(tensor_batch, n_batch, n_steps, flat=False)
  Transform a batch of Tensors into a sequence of Tensors for recurrent policies (see the round-trip sketch under seq_to_batch below).
  Parameters:
  - tensor_batch – (TensorFlow Tensor) The input tensor to unroll
  - n_batch – (int) The number of batches to run (n_envs * n_steps)
  - n_steps – (int) The number of steps to run for each environment
  - flat – (bool) If the input Tensor is flat
  Returns: (TensorFlow Tensor) sequence of Tensors for recurrent policies
- stable_baselines.common.tf_util.calc_entropy(logits)
  Calculates the entropy of the output values of the network.
  Parameters: logits – (TensorFlow Tensor) The input logits for each action
  Returns: (TensorFlow Tensor) The entropy of the output values of the network
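  For a categorical distribution parameterized by logits, the entropy can be computed in a numerically stable way along the following lines (a sketch, not necessarily the library's exact code):

    import tensorflow as tf

    def entropy_sketch(logits):
        # Shifting the logits leaves softmax unchanged but avoids overflow.
        a0 = logits - tf.reduce_max(logits, axis=-1, keepdims=True)
        ea0 = tf.exp(a0)
        z0 = tf.reduce_sum(ea0, axis=-1, keepdims=True)
        p0 = ea0 / z0
        # H(p) = -sum_i p_i log p_i, with log p_i = a0_i - log z0.
        return tf.reduce_sum(p0 * (tf.log(z0) - a0), axis=-1)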
- stable_baselines.common.tf_util.check_shape(tensors, shapes)
  Verifies that the tensors match the given shapes; raises an error if they do not.
  Parameters:
  - tensors – ([TensorFlow Tensor]) The tensors that should be checked
  - shapes – ([list]) The list of shapes for each tensor
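  A typical use is to assert intermediate shapes while building a loss; the shapes below are illustrative:

    import tensorflow as tf
    from stable_baselines.common.tf_util import check_shape

    q_pred = tf.placeholder(tf.float32, [32, 1])
    q_true = tf.placeholder(tf.float32, [32, 1])
    # Raises if either static shape differs from [32, 1].
    check_shape([q_pred, q_true], [[32, 1]] * 2)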
- stable_baselines.common.tf_util.flatgrad(loss, var_list, clip_norm=None)
  Calculates the gradient of the loss and flattens it into a single vector.
  Parameters:
  - loss – (float) the loss value
  - var_list – ([TensorFlow Tensor]) the variables
  - clip_norm – (float) clip the gradients (disabled if None)
  Returns: (TensorFlow Tensor) flattened gradient
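  For example (variables and values are illustrative):

    import tensorflow as tf
    from stable_baselines.common.tf_util import flatgrad

    x = tf.Variable([1.0, 2.0])
    y = tf.Variable([[3.0], [4.0]])
    loss = tf.reduce_sum(tf.square(x)) + tf.reduce_sum(y)
    # One flat vector holding d(loss)/dx and d(loss)/dy back to back,
    # with each per-variable gradient clipped to norm 0.5 first.
    flat = flatgrad(loss, [x, y], clip_norm=0.5)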
- stable_baselines.common.tf_util.function(inputs, outputs, updates=None, givens=None)
  Takes a bunch of TensorFlow placeholders and expressions computed based on those placeholders, and produces f(inputs) -> outputs. The function f takes values to be fed to the inputs' placeholders and produces the values of the expressions in outputs. Just like a Theano function.
  Input values can be passed in the same order as inputs, or can be provided as kwargs based on the placeholder name (passed to the constructor or accessible via placeholder.op.name).
  Example:

    >>> x = tf.placeholder(tf.int32, (), name="x")
    >>> y = tf.placeholder(tf.int32, (), name="y")
    >>> z = 3 * x + 2 * y
    >>> lin = function([x, y], z, givens={y: 0})
    >>> with single_threaded_session():
    ...     initialize()
    ...     assert lin(2) == 6
    ...     assert lin(x=3) == 9
    ...     assert lin(2, 2) == 10

  Parameters:
  - inputs – (TensorFlow Tensor or Object with make_feed_dict) list of input arguments
  - outputs – (TensorFlow Tensor) list of outputs or a single output to be returned from the function. The returned value will also have the same shape.
  - updates – ([tf.Operation] or tf.Operation) list of update functions or a single update function that will be run whenever the function is called; their return values are ignored.
  - givens – (dict) default values to feed for placeholders that are not passed at call time (e.g. givens={y: 0} above)
- stable_baselines.common.tf_util.get_globals_vars(name)
  Returns the global variables within the given scope.
  Parameters: name – (str) the scope
  Returns: ([TensorFlow Variable])
- stable_baselines.common.tf_util.get_trainable_vars(name)
  Returns the trainable variables within the given scope.
  Parameters: name – (str) the scope
  Returns: ([TensorFlow Variable])
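  Both getters take a scope name; a sketch (scope and variable names are illustrative):

    import tensorflow as tf
    from stable_baselines.common.tf_util import get_globals_vars, get_trainable_vars

    with tf.variable_scope("model"):
        tf.get_variable("w", shape=[4, 2])
        # Non-trainable variables show up in the global list only.
        tf.get_variable("step", shape=[], trainable=False)

    all_vars = get_globals_vars("model")      # [model/w, model/step]
    train_vars = get_trainable_vars("model")  # [model/w]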
- stable_baselines.common.tf_util.gradient_add(grad_1, grad_2, param, verbose=0)
  Sum two gradients.
  Parameters:
  - grad_1 – (TensorFlow Tensor) The first gradient
  - grad_2 – (TensorFlow Tensor) The second gradient
  - param – (TensorFlow parameters) The trainable parameters
  - verbose – (int) verbosity level
  Returns: (TensorFlow Tensor) the sum of the gradients
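  A helper like this is mainly needed because tf.gradients returns None for parameters a loss term does not depend on; a sketch of the presumable None handling (an assumption, not guaranteed to match the library's code):

    def gradient_add_sketch(grad_1, grad_2, param, verbose=0):
        # A None gradient means the corresponding loss term does not
        # depend on `param`, so treat it as zero.
        if grad_1 is None and grad_2 is None:
            return None
        if grad_1 is None:
            return grad_2
        if grad_2 is None:
            return grad_1
        return grad_1 + grad_2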
- stable_baselines.common.tf_util.huber_loss(tensor, delta=1.0)
  Computes the Huber loss. Reference: https://en.wikipedia.org/wiki/Huber_loss
  Parameters:
  - tensor – (TensorFlow Tensor) the input value
  - delta – (float) Huber loss delta value
  Returns: (TensorFlow Tensor) Huber loss output
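  The loss is quadratic for residuals smaller than delta and linear beyond, which makes it robust to outliers; typical DQN-style usage (a sketch):

    import tensorflow as tf
    from stable_baselines.common.tf_util import huber_loss

    td_error = tf.placeholder(tf.float32, [None])
    # 0.5 * x^2 where |x| <= delta, delta * (|x| - 0.5 * delta) elsewhere.
    loss = tf.reduce_mean(huber_loss(td_error, delta=1.0))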
- stable_baselines.common.tf_util.in_session(func)
  Wraps a function so that it runs inside a TensorFlow session.
  Parameters: func – (function) the function to wrap
  Returns: (function)
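  It can be used as a decorator, for example:

    from stable_baselines.common.tf_util import in_session, initialize

    @in_session
    def train():
        # Runs with a default TensorFlow session already installed, so
        # session-dependent calls such as initialize() just work.
        initialize()

    train()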
- stable_baselines.common.tf_util.initialize(sess=None)
  Initialize all the uninitialized variables in the global scope.
  Parameters: sess – (TensorFlow Session)
- stable_baselines.common.tf_util.intprod(tensor)
  Calculates the product of all the elements in a list.
  Parameters: tensor – ([Number]) the list of elements
  Returns: (int) the product, cast to an int
- stable_baselines.common.tf_util.is_image(tensor)
  Check if a tensor has the shape of a valid image for TensorBoard logging. Valid images: RGB, RGBD, grayscale.
  Parameters: tensor – (np.ndarray or tf.placeholder)
  Returns: (bool)
- stable_baselines.common.tf_util.make_session(num_cpu=None, make_default=False, graph=None)
  Returns a session that will use <num_cpu> CPUs only.
  Parameters:
  - num_cpu – (int) number of CPUs to use for TensorFlow
  - make_default – (bool) if this should return an InteractiveSession or a normal Session
  - graph – (TensorFlow Graph) the graph of the session
  Returns: (TensorFlow session)
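  Typical setup pairs make_session with initialize (a sketch; the num_cpu value is illustrative):

    import tensorflow as tf
    from stable_baselines.common.tf_util import make_session, initialize

    x = tf.Variable(1.0)
    # Limit TensorFlow to two CPU threads and install the session
    # as the default one.
    sess = make_session(num_cpu=2, make_default=True)
    initialize(sess)
    print(sess.run(x))  # 1.0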
- stable_baselines.common.tf_util.mse(pred, target)
  Returns the mean squared error between prediction and target.
  Parameters:
  - pred – (TensorFlow Tensor) The predicted value
  - target – (TensorFlow Tensor) The target value
  Returns: (TensorFlow Tensor) The mean squared error between prediction and target
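  This amounts to the usual one-liner (a sketch, not necessarily the library's exact code):

    import tensorflow as tf

    def mse_sketch(pred, target):
        # Mean over all elements of the squared difference.
        return tf.reduce_mean(tf.square(pred - target))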
- stable_baselines.common.tf_util.numel(tensor)
  Get a TensorFlow Tensor's number of elements.
  Parameters: tensor – (TensorFlow Tensor) the input tensor
  Returns: (int) the number of elements
- stable_baselines.common.tf_util.outer_scope_getter(scope, new_scope='')
  Remove a scope layer for the getter.
  Parameters:
  - scope – (str) the layer to remove
  - new_scope – (str) optional replacement name
  Returns: (function (function, str, *args, **kwargs): TensorFlow Tensor)
- stable_baselines.common.tf_util.q_explained_variance(q_pred, q_true)
  Calculates the explained variance of the Q value.
  Parameters:
  - q_pred – (TensorFlow Tensor) The predicted Q value
  - q_true – (TensorFlow Tensor) The expected Q value
  Returns: (TensorFlow Tensor) the explained variance of the Q value
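  Explained variance is commonly defined as 1 - Var[q_true - q_pred] / Var[q_true]; a sketch under that assumption:

    import tensorflow as tf

    def q_explained_variance_sketch(q_pred, q_true):
        # 1 means perfect prediction, 0 means no better than predicting
        # the mean, negative means worse than the mean (2-D inputs assumed).
        _, var_true = tf.nn.moments(q_true, axes=[0, 1])
        _, var_residual = tf.nn.moments(q_true - q_pred, axes=[0, 1])
        return 1.0 - var_residual / var_true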
- stable_baselines.common.tf_util.sample(logits)
  Creates a sampling Tensor for non-deterministic policies when using a categorical distribution. It uses the Gumbel-max trick: http://amid.fish/humble-gumbel
  Parameters: logits – (TensorFlow Tensor) The input logits for each action
  Returns: (TensorFlow Tensor) The sampled action
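  The Gumbel-max trick adds independent Gumbel(0, 1) noise to each logit and takes the argmax, which is distributed exactly as a sample from softmax(logits); a sketch:

    import tensorflow as tf

    def gumbel_max_sample(logits):
        # -log(-log(U)) with U ~ Uniform(0, 1) is Gumbel(0, 1) noise;
        # the argmax over (logits + noise) samples from softmax(logits).
        uniform = tf.random_uniform(tf.shape(logits))
        return tf.argmax(logits - tf.log(-tf.log(uniform)), axis=-1)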
- stable_baselines.common.tf_util.seq_to_batch(tensor_sequence, flat=False)
  Transform a sequence of Tensors into a batch of Tensors for recurrent policies.
  Parameters:
  - tensor_sequence – (TensorFlow Tensor) The input tensor to batch
  - flat – (bool) If the input Tensor is flat
  Returns: (TensorFlow Tensor) batch of Tensors for recurrent policies
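  batch_to_seq and seq_to_batch are inverses of each other; a round-trip sketch, assuming the flattened input has one row per (environment, step) pair (sizes are illustrative):

    import tensorflow as tf
    from stable_baselines.common.tf_util import batch_to_seq, seq_to_batch

    n_envs, n_steps, n_hidden = 4, 8, 16
    # Rollout features flattened to one row per (env, step) pair.
    features = tf.placeholder(tf.float32, [n_envs * n_steps, n_hidden])
    # A list of n_steps tensors of shape (n_envs, n_hidden), ready to be
    # stepped through an LSTM one timestep at a time ...
    sequence = batch_to_seq(features, n_envs, n_steps)
    # ... and flattened back into a single (n_envs * n_steps, n_hidden) batch.
    features_again = seq_to_batch(sequence)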
- stable_baselines.common.tf_util.single_threaded_session(make_default=False, graph=None)
  Returns a session which will only use a single CPU.
  Parameters:
  - make_default – (bool) if this should return an InteractiveSession or a normal Session
  - graph – (TensorFlow Graph) the graph of the session
  Returns: (TensorFlow session)
- stable_baselines.common.tf_util.total_episode_reward_logger(rew_acc, rewards, masks, writer, steps)
  Calculates the cumulative episode reward and writes it to the TensorBoard log.
  Parameters:
  - rew_acc – (np.array float) the total running reward
  - rewards – (np.array float) the rewards
  - masks – (np.array bool) the end of episodes
  - writer – (TensorFlow summary writer) the writer to log to
  - steps – (int) the current timestep
  Returns: (np.array float) the updated total running reward