Warning

This package is in maintenance mode. Please use Stable-Baselines3 (SB3) for an up-to-date version. You can find a migration guide in the SB3 documentation.

Tensorflow Utils

stable_baselines.common.tf_util.avg_norm(tensor)[source]

Return the average L2 norm of the batch

Parameters: tensor – (TensorFlow Tensor) The input tensor
Returns: (TensorFlow Tensor) Average L2 norm of the batch
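As a rough illustration of what "average L2 norm of the batch" means, the following pure-Python sketch takes the L2 norm of each row and averages the results. The name avg_l2_norm is hypothetical and the code works on plain lists rather than TensorFlow tensors, so it is a sketch of the idea and not the library's implementation.

```python
import math

def avg_l2_norm(batch):
    """Mean of the per-row L2 norms of a 2-D batch given as a list of rows."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in batch]
    return sum(norms) / len(norms)

# Two rows whose L2 norms are 5 and 10.
batch = [[3.0, 4.0], [6.0, 8.0]]
result = avg_l2_norm(batch)
```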
stable_baselines.common.tf_util.batch_to_seq(tensor_batch, n_batch, n_steps, flat=False)[source]

Transform a batch of Tensors into a sequence of Tensors for recurrent policies

Parameters:
  • tensor_batch – (TensorFlow Tensor) The input tensor to unroll
  • n_batch – (int) The number of batches to run (n_envs * n_steps)
  • n_steps – (int) The number of steps to run for each environment
  • flat – (bool) If the input Tensor is flat
Returns:

(TensorFlow Tensor) sequence of Tensors for recurrent policies
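The unrolling described above can be pictured with plain lists. This sketch assumes the flat batch holds n_envs * n_steps rows in env-major order (row index = env * n_steps + step); the function name, the n_envs parameter, and that ordering are assumptions made for illustration, not details taken from the library.

```python
def batch_to_seq_sketch(batch, n_envs, n_steps):
    """Split a flat batch into n_steps lists, each holding one row per environment.

    Assumes env-major ordering: row index = env * n_steps + step.
    """
    assert len(batch) == n_envs * n_steps
    return [[batch[env * n_steps + step] for env in range(n_envs)]
            for step in range(n_steps)]

# 2 environments, 3 steps each; every row is labelled "e<env>s<step>".
batch = ["e0s0", "e0s1", "e0s2", "e1s0", "e1s1", "e1s2"]
seq = batch_to_seq_sketch(batch, n_envs=2, n_steps=3)
```

Each element of seq then corresponds to one time step, which is the form a recurrent cell consumes step by step.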

stable_baselines.common.tf_util.calc_entropy(logits)[source]

Calculates the entropy of the output values of the network

Parameters: logits – (TensorFlow Tensor) The input logits (unnormalized log-probabilities) for each action
Returns: (TensorFlow Tensor) The entropy of the output values of the network
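For intuition, the entropy of a categorical distribution defined by logits can be computed as below. This is a pure-Python sketch for a single row of logits, using the common max-subtraction trick for numerical stability; the name entropy_from_logits is hypothetical and the library operates on batched tensors instead.

```python
import math

def entropy_from_logits(logits):
    """Entropy (in nats) of softmax(logits) for one row of logits."""
    top = max(logits)
    shifted = [x - top for x in logits]          # largest entry becomes 0
    exp_shifted = [math.exp(x) for x in shifted]
    z = sum(exp_shifted)
    # p_i = exp_shifted_i / z, and -log p_i = log z - shifted_i
    return sum((e / z) * (math.log(z) - s) for e, s in zip(exp_shifted, shifted))

uniform = entropy_from_logits([0.0, 0.0, 0.0, 0.0])   # uniform over 4 actions
peaked = entropy_from_logits([100.0, 0.0, 0.0, 0.0])  # almost deterministic
```

A uniform policy gives the maximum entropy, log(4) here, while a sharply peaked policy gives an entropy close to zero.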
stable_baselines.common.tf_util.check_shape(tensors, shapes)[source]

Verifies that the tensors match the given shapes and raises an error if they do not

Parameters:
  • tensors – ([TensorFlow Tensor]) The tensors that should be checked
  • shapes – ([list]) The list of shapes for each tensor
stable_baselines.common.tf_util.flatgrad(loss, var_list, clip_norm=None)[source]

Calculates the gradient and flattens it

Parameters:
  • loss – (float) the loss value
  • var_list – ([TensorFlow Tensor]) the variables
  • clip_norm – (float) clip the gradients (disabled if None)
Returns:

([TensorFlow Tensor]) flattened gradient
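The flattening step can be sketched without TensorFlow. The example below takes already-computed per-variable gradients as nested lists and concatenates them into one vector. It assumes that, when clip_norm is set, each gradient is rescaled separately so its L2 norm does not exceed clip_norm; that clipping scheme and the name flatten_grads are assumptions for illustration.

```python
import math

def flatten_grads(grads, clip_norm=None):
    """Concatenate per-variable gradients (nested lists) into one flat list.

    Assumption: clipping is applied to each gradient separately by L2 norm.
    """
    def flatten(x):
        if isinstance(x, (list, tuple)):
            return [v for item in x for v in flatten(item)]
        return [float(x)]

    flat = []
    for grad in grads:
        values = flatten(grad)
        if clip_norm is not None:
            norm = math.sqrt(sum(v * v for v in values))
            if norm > clip_norm:
                values = [v * clip_norm / norm for v in values]
        flat.extend(values)
    return flat

grads = [[[3.0, 4.0]], [0.5]]   # a 1x2 gradient and a 1-element gradient
unclipped = flatten_grads(grads)
clipped = flatten_grads(grads, clip_norm=1.0)
```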

stable_baselines.common.tf_util.function(inputs, outputs, updates=None, givens=None)[source]

Takes a set of TensorFlow placeholders and expressions computed from those placeholders, and produces f(inputs) -> outputs. The function f takes the values to be fed to the input placeholders and returns the values of the expressions in outputs, much like a Theano function.

Input values can be passed in the same order as inputs or can be provided as kwargs based on placeholder name (passed to constructor or accessible via placeholder.op.name).

Example:
>>> import tensorflow as tf
>>> from stable_baselines.common.tf_util import function, initialize, single_threaded_session
>>> x = tf.placeholder(tf.int32, (), name="x")
>>> y = tf.placeholder(tf.int32, (), name="y")
>>> z = 3 * x + 2 * y
>>> lin = function([x, y], z, givens={y: 0})
>>> with single_threaded_session():
...     initialize()
...     assert lin(2) == 6
...     assert lin(x=3) == 9
...     assert lin(2, 2) == 10
Parameters:
  • inputs – (TensorFlow Tensor or Object with make_feed_dict) list of input arguments
  • outputs – (TensorFlow Tensor) list of outputs or a single output to be returned from function. Returned value will also have the same shape.
  • updates – ([tf.Operation] or tf.Operation) list of update functions or single update function that will be run whenever the function is called. The return is ignored.
  • givens – (dict) default values for placeholders, used when the caller does not supply them (as with givens={y: 0} in the example)
stable_baselines.common.tf_util.get_globals_vars(name)[source]

Returns the global variables in the given scope

Parameters: name – (str) the scope
Returns: ([TensorFlow Variable])
stable_baselines.common.tf_util.get_trainable_vars(name)[source]

Returns the trainable variables in the given scope

Parameters: name – (str) the scope
Returns: ([TensorFlow Variable])
stable_baselines.common.tf_util.gradient_add(grad_1, grad_2, param, verbose=0)[source]

Sum two gradients

Parameters:
  • grad_1 – (TensorFlow Tensor) The first gradient
  • grad_2 – (TensorFlow Tensor) The second gradient
  • param – (TensorFlow parameters) The trainable parameters
  • verbose – (int) verbosity level
Returns:

(TensorFlow Tensor) the sum of the gradients

stable_baselines.common.tf_util.huber_loss(tensor, delta=1.0)[source]

Reference: https://en.wikipedia.org/wiki/Huber_loss

Parameters:
  • tensor – (TensorFlow Tensor) the input value
  • delta – (float) Huber loss delta value
Returns:

(TensorFlow Tensor) Huber loss output
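For reference, the Huber loss from the linked article is quadratic for small inputs and linear for large ones. The scalar sketch below follows that standard definition; the name huber is hypothetical, and the library applies the same idea elementwise to tensors.

```python
def huber(x, delta=1.0):
    """Standard Huber loss of a scalar residual x."""
    if abs(x) < delta:
        return 0.5 * x * x                 # quadratic region
    return delta * (abs(x) - 0.5 * delta)  # linear region

values = [huber(x) for x in (-3.0, -0.5, 0.0, 0.5, 3.0)]
```

With delta=1.0, residuals of magnitude 0.5 fall in the quadratic region and residuals of magnitude 3.0 fall in the linear region.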

stable_baselines.common.tf_util.in_session(func)[source]

Wraps a function so that it is in a TensorFlow Session

Parameters: func – (function) the function to wrap
Returns: (function)
stable_baselines.common.tf_util.initialize(sess=None)[source]

Initialize all the uninitialized variables in the global scope.

Parameters: sess – (TensorFlow Session)
stable_baselines.common.tf_util.intprod(tensor)[source]

Calculates the product of all the elements in a list

Parameters: tensor – ([Number]) the list of elements
Returns: (int) the product, truncated to an integer
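A typical use of such a helper is turning a shape into a flat element count. The sketch below uses math.prod (Python 3.8+) followed by int(), which truncates toward zero; the name intprod_sketch is hypothetical.

```python
import math

def intprod_sketch(values):
    """Product of all elements, truncated to an int."""
    return int(math.prod(values))

n_features = intprod_sketch([84, 84, 4])   # flat size of an 84x84x4 observation
truncated = intprod_sketch([2.5, 3])       # 7.5 is truncated
```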
stable_baselines.common.tf_util.is_image(tensor)[source]

Check if a tensor has the shape of a valid image for tensorboard logging. Valid image: RGB, RGBD, GrayScale

Parameters: tensor – (np.ndarray or tf.placeholder)
Returns: (bool)
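One plausible reading of "valid image" is a rank-3 shape whose last dimension is a supported channel count. The sketch below assumes a height x width x channels layout with 1 (grayscale), 3 (RGB), or 4 (RGBD) channels; both the layout and the name looks_like_image are assumptions, and the library's actual check may differ.

```python
def looks_like_image(shape):
    """True if shape is (height, width, channels) with 1, 3, or 4 channels."""
    return len(shape) == 3 and shape[-1] in (1, 3, 4)

checks = {
    "grayscale": looks_like_image((84, 84, 1)),
    "rgb": looks_like_image((64, 64, 3)),
    "two_channels": looks_like_image((64, 64, 2)),
    "vector": looks_like_image((10,)),
}
```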
stable_baselines.common.tf_util.make_session(num_cpu=None, make_default=False, graph=None)[source]

Returns a session that will use only <num_cpu> CPUs

Parameters:
  • num_cpu – (int) number of CPUs to use for TensorFlow
  • make_default – (bool) if True, return an InteractiveSession; otherwise return a normal Session
  • graph – (TensorFlow Graph) the graph of the session
Returns:

(TensorFlow session)

stable_baselines.common.tf_util.mse(pred, target)[source]

Returns the Mean squared error between prediction and target

Parameters:
  • pred – (TensorFlow Tensor) The predicted value
  • target – (TensorFlow Tensor) The target value
Returns:

(TensorFlow Tensor) The Mean squared error between prediction and target

stable_baselines.common.tf_util.numel(tensor)[source]

Gets the number of elements of a TensorFlow Tensor

Parameters: tensor – (TensorFlow Tensor) the input tensor
Returns: (int) the number of elements
stable_baselines.common.tf_util.outer_scope_getter(scope, new_scope='')[source]

Removes a scope layer for the getter

Parameters:
  • scope – (str) the layer to remove
  • new_scope – (str) optional replacement name
Returns:

(function (function, str, *args, **kwargs): Tensorflow Tensor)

stable_baselines.common.tf_util.q_explained_variance(q_pred, q_true)[source]

Calculates the explained variance of the Q value

Parameters:
  • q_pred – (TensorFlow Tensor) The predicted Q value
  • q_true – (TensorFlow Tensor) The expected Q value
Returns:

(TensorFlow Tensor) the explained variance of the Q value
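Explained variance is usually defined as 1 - Var(q_true - q_pred) / Var(q_true). The pure-Python sketch below applies that definition to flat lists of values; the function name is hypothetical, and the library computes the same kind of quantity on tensors.

```python
def explained_variance(q_pred, q_true):
    """1 - Var(q_true - q_pred) / Var(q_true) for two equal-length lists."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    residuals = [t - p for p, t in zip(q_pred, q_true)]
    return 1.0 - variance(residuals) / variance(q_true)

q_true = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(q_true, q_true)                 # predictions match exactly
constant = explained_variance([0.0, 0.0, 0.0, 0.0], q_true)  # predictions carry no information
```

A value of 1 means the predictions account for all of the variance in q_true, 0 means they do no better than a constant, and negative values mean they do worse.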

stable_baselines.common.tf_util.sample(logits)[source]

Creates a sampling Tensor for non-deterministic policies when using a categorical distribution. It uses the Gumbel-max trick: http://amid.fish/humble-gumbel

Parameters: logits – (TensorFlow Tensor) The input logits (unnormalized log-probabilities) for each action
Returns: (TensorFlow Tensor) The sampled action
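The Gumbel-max trick adds independent Gumbel noise, -log(-log(u)) with u uniform on (0, 1), to each logit and takes the argmax; the resulting index is distributed according to softmax(logits). The sketch below shows this for a single row of logits in pure Python. The name gumbel_max_sample and the explicit rng argument are hypothetical choices for the example.

```python
import math
import random

def gumbel_max_sample(logits, rng):
    """Draw one index from softmax(logits) using the Gumbel-max trick."""
    noisy = []
    for logit in logits:
        u = rng.random()
        while u == 0.0:          # avoid log(0)
            u = rng.random()
        noisy.append(logit - math.log(-math.log(u)))
    return max(range(len(logits)), key=noisy.__getitem__)

rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
counts = [0, 0, 0]
for _ in range(10000):
    counts[gumbel_max_sample(logits, rng)] += 1
frequencies = [c / 10000 for c in counts]
```

Over many draws the empirical frequencies should approach the softmax probabilities, here roughly 0.7, 0.2, and 0.1.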
stable_baselines.common.tf_util.seq_to_batch(tensor_sequence, flat=False)[source]

Transform a sequence of Tensors into a batch of Tensors for recurrent policies

Parameters:
  • tensor_sequence – (TensorFlow Tensor) The input tensor to batch
  • flat – (bool) If the input Tensor is flat
Returns:

(TensorFlow Tensor) batch of Tensors for recurrent policies
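Going from a per-step sequence back to a flat batch can be sketched with plain lists as well. This example assumes the sequence holds one list per time step with one row per environment, and that the flat batch is env-major (all steps of environment 0 first); the function name and that ordering are assumptions for illustration.

```python
def seq_to_batch_sketch(seq):
    """Merge n_steps lists of n_envs rows into one env-major flat batch."""
    n_steps, n_envs = len(seq), len(seq[0])
    return [seq[step][env] for env in range(n_envs) for step in range(n_steps)]

# 3 time steps, 2 environments; every row is labelled "e<env>s<step>".
seq = [["e0s0", "e1s0"], ["e0s1", "e1s1"], ["e0s2", "e1s2"]]
batch = seq_to_batch_sketch(seq)
```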

stable_baselines.common.tf_util.single_threaded_session(make_default=False, graph=None)[source]

Returns a session that will use only a single CPU

Parameters:
  • make_default – (bool) if True, return an InteractiveSession; otherwise return a normal Session
  • graph – (TensorFlow Graph) the graph of the session
Returns:

(TensorFlow session)

stable_baselines.common.tf_util.total_episode_reward_logger(rew_acc, rewards, masks, writer, steps)[source]

Calculates the cumulative episode reward and writes it to the TensorFlow log

Parameters:
  • rew_acc – (np.array float) the total running reward
  • rewards – (np.array float) the rewards
  • masks – (np.array bool) the end of episodes
  • writer – (TensorFlow Session.writer) the writer to log to
  • steps – (int) the current timestep
Returns:

(np.array float) the updated total running reward
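The bookkeeping behind a running episode reward can be sketched for a single environment. The example assumes that masks[t] is True when step t starts a new episode, so the reward at step t counts toward the new episode, and it collects finished episode totals in a list instead of writing them to a TensorFlow writer. Both the mask convention and the function name are assumptions for illustration.

```python
def accumulate_episode_rewards(running, rewards, masks):
    """Return (finished_episode_totals, updated_running_total) for one environment.

    Assumption: masks[t] is True when step t begins a new episode.
    """
    finished = []
    for reward, done in zip(rewards, masks):
        if done:
            finished.append(running)   # the previous episode just ended
            running = 0.0
        running += reward
    return finished, running

finished, running = accumulate_episode_rewards(
    running=2.0,                        # reward carried over from earlier steps
    rewards=[1.0, 1.0, 5.0, 1.0],
    masks=[False, False, True, False],
)
```

The returned running total plays the role of the updated rew_acc, which is passed back in on the next call.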


stable_baselines.common.tf_util.var_shape(tensor)[source]

Gets the shape of a TensorFlow Tensor

Parameters: tensor – (TensorFlow Tensor) the input tensor
Returns: ([int]) the shape