TensorFlow Utils

stable_baselines.common.tf_util.avg_norm(tensor)

Returns the average L2 norm of the batch.

Parameters:	tensor – (TensorFlow Tensor) The input tensor
Returns:	(TensorFlow Tensor) Average L2 norm of the batch
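As a rough illustration of the quantity described above, the following plain-Python sketch computes the L2 norm of each row of a 2-D batch and averages the results. It assumes the norm is taken over the last axis; avg_l2_norm is a hypothetical name used only here and is not part of the library.

```python
import math


def avg_l2_norm(batch):
    """Average, over rows, of each row's L2 norm (plain-Python stand-in)."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in batch]
    return sum(norms) / len(norms)


# Row norms are 5.0 and 0.0, so the average is 2.5.
print(avg_l2_norm([[3.0, 4.0], [0.0, 0.0]]))
```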

stable_baselines.common.tf_util.batch_to_seq(tensor_batch, n_batch, n_steps, flat=False)

Transforms a batch of Tensors into a sequence of Tensors for recurrent policies.

Parameters:

tensor_batch – (TensorFlow Tensor) The input tensor to unroll
n_batch – (int) The number of batches to run (n_envs * n_steps)
n_steps – (int) The number of steps to run for each environment
flat – (bool) If the input Tensor is flat

Returns:	(TensorFlow Tensor) sequence of Tensors for recurrent policies
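The reshaping idea behind batch_to_seq, and its inverse seq_to_batch, can be sketched with plain Python lists. This is only an illustration under an assumed layout: the batch is environment-major (all steps of environment 0, then all steps of environment 1, and so on). The function names and the n_envs argument are hypothetical and do not mirror the library's signature.

```python
def batch_to_seq_sketch(rows, n_envs, n_steps):
    """Split an environment-major batch into n_steps lists of n_envs rows."""
    assert len(rows) == n_envs * n_steps
    return [[rows[env * n_steps + step] for env in range(n_envs)]
            for step in range(n_steps)]


def seq_to_batch_sketch(sequence):
    """Inverse: merge a per-step sequence back into an environment-major batch."""
    n_steps, n_envs = len(sequence), len(sequence[0])
    return [sequence[step][env] for env in range(n_envs) for step in range(n_steps)]


batch = ["e0s0", "e0s1", "e0s2", "e1s0", "e1s1", "e1s2"]
seq = batch_to_seq_sketch(batch, n_envs=2, n_steps=3)
print(seq)  # one entry per time step, each holding one row per environment
```

Each element of the sequence holds the data of every environment at one time step, which is the form a recurrent policy consumes step by step.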

stable_baselines.common.tf_util.calc_entropy(logits)

Calculates the entropy of the output values of the network.

Parameters:	logits – (TensorFlow Tensor) The input logits (unnormalized log-probabilities) for each action
Returns:	(TensorFlow Tensor) The entropy of the output values of the network
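To make the computation concrete, here is a plain-Python sketch of the entropy of a categorical distribution given by one row of logits. It assumes the distribution is the softmax of the logits and subtracts the maximum logit for numerical stability; softmax_entropy is a hypothetical helper, not the library function.

```python
import math


def softmax_entropy(logits):
    """Entropy (in nats) of the softmax distribution defined by one row of logits."""
    top = max(logits)
    shifted = [x - top for x in logits]  # subtract the max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    # H = sum_i p_i * (log(total) - shifted_i), with p_i = exps_i / total
    return sum((e / total) * (math.log(total) - s) for e, s in zip(exps, shifted))


uniform = softmax_entropy([0.0, 0.0, 0.0, 0.0])  # equals log(4) for four equal logits
peaked = softmax_entropy([1000.0, 0.0])          # nearly deterministic, close to 0
```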

stable_baselines.common.tf_util.check_shape(tensors, shapes)

Verifies that the tensors match the given shapes; raises an error if the shapes do not match.

Parameters:

tensors – ([TensorFlow Tensor]) The tensors that should be checked
shapes – ([list]) The list of shapes for each tensor

stable_baselines.common.tf_util.flatgrad(loss, var_list, clip_norm=None)

Calculates the gradient and flattens it.

Parameters:

loss – (float) the loss value
var_list – ([TensorFlow Tensor]) the variables
clip_norm – (float) clip the gradients (disabled if None)

Returns:	([TensorFlow Tensor]) flattened gradient
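The flattening step can be pictured with the following plain-Python sketch, which works on gradients that have already been computed. It assumes that clipping, when enabled, rescales each gradient separately so that its L2 norm does not exceed clip_norm; flatten and flat_grad_sketch are hypothetical names and the library's exact clipping behaviour may differ.

```python
import math


def flatten(nested):
    """Flatten arbitrarily nested lists of numbers into one flat list."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


def flat_grad_sketch(grads, clip_norm=None):
    """Flatten each gradient, optionally clip its L2 norm, then concatenate."""
    flat = []
    for grad in grads:
        values = flatten(grad)
        if clip_norm is not None:
            norm = math.sqrt(sum(v * v for v in values))
            if norm > clip_norm:
                values = [v * clip_norm / norm for v in values]
        flat.extend(values)
    return flat


# A 2x2 gradient and a 1-element gradient become one vector of length 5.
print(flat_grad_sketch([[[1.0, 2.0], [3.0, 4.0]], [5.0]]))
```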

stable_baselines.common.tf_util.function(inputs, outputs, updates=None, givens=None)

Takes a list of TensorFlow placeholders and expressions computed from those placeholders, and produces f(inputs) -> outputs. The function f takes values to be fed to the input placeholders and produces the values of the expressions in outputs, just like a Theano function.

Input values can be passed in the same order as inputs or can be provided as kwargs based on placeholder name (passed to constructor or accessible via placeholder.op.name).

Example:

>>> import tensorflow as tf
>>> from stable_baselines.common.tf_util import function, initialize, single_threaded_session
>>> x = tf.placeholder(tf.int32, (), name="x")
>>> y = tf.placeholder(tf.int32, (), name="y")
>>> z = 3 * x + 2 * y
>>> lin = function([x, y], z, givens={y: 0})
>>> with single_threaded_session():
...     initialize()
...     assert lin(2) == 6
...     assert lin(x=3) == 9
...     assert lin(2, 2) == 10

Parameters:

inputs – (TensorFlow Tensor or Object with make_feed_dict) list of input arguments
outputs – (TensorFlow Tensor) list of outputs or a single output to be returned from function. Returned value will also have the same shape.
updates – ([tf.Operation] or tf.Operation) list of update functions or single update function that will be run whenever the function is called. The return is ignored.
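givens – (dict) default values to feed for placeholders, given as a placeholder-to-value mapping. As the example shows, a value passed at call time takes precedence over the one in givens.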

stable_baselines.common.tf_util.get_globals_vars(name)

Returns the global variables.

Parameters:	name – (str) the scope
Returns:	([TensorFlow Variable])

stable_baselines.common.tf_util.get_trainable_vars(name)

Returns the trainable variables.

Parameters:	name – (str) the scope
Returns:	([TensorFlow Variable])

stable_baselines.common.tf_util.gradient_add(grad_1, grad_2, param, verbose=0)

Sums two gradients.

Parameters:

grad_1 – (TensorFlow Tensor) The first gradient
grad_2 – (TensorFlow Tensor) The second gradient
param – (TensorFlow parameters) The trainable parameters
verbose – (int) verbosity level

Returns:	(TensorFlow Tensor) the sum of the gradients

stable_baselines.common.tf_util.huber_loss(tensor, delta=1.0)

Reference: https://en.wikipedia.org/wiki/Huber_loss

Parameters:

tensor – (TensorFlow Tensor) the input value
delta – (float) Huber loss delta value

Returns:	(TensorFlow Tensor) Huber loss output
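For reference, the standard Huber loss from the page linked above is quadratic for small inputs and linear for large ones. The scalar sketch below follows that textbook definition; huber is a hypothetical name, and the library applies the same idea elementwise to a tensor.

```python
def huber(x, delta=1.0):
    """Textbook Huber loss of a scalar x with threshold delta."""
    if abs(x) < delta:
        return 0.5 * x * x                 # quadratic region
    return delta * (abs(x) - 0.5 * delta)  # linear region


print(huber(0.5))   # inside the quadratic region
print(huber(-3.0))  # inside the linear region
```

The linear tail makes the loss less sensitive to outliers than a squared error.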

stable_baselines.common.tf_util.in_session(func)

Wraps a function so that it runs in a TensorFlow Session.

Parameters:	func – (function) the function to wrap
Returns:	(function)

stable_baselines.common.tf_util.initialize(sess=None)

Initialize all the uninitialized variables in the global scope.

Parameters:	sess – (TensorFlow Session)

stable_baselines.common.tf_util.intprod(tensor)

Calculates the product of all the elements in a list.

Parameters:	tensor – ([Number]) the list of elements
Returns:	(int) the product, truncated to an integer
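A plain-Python equivalent of this behaviour, using only the standard library, might look as follows. intprod_sketch is a hypothetical name; the sketch assumes the product is computed first and then truncated with int().

```python
import math


def intprod_sketch(values):
    """Product of all elements, truncated to an int."""
    return int(math.prod(values))


print(intprod_sketch([2, 3, 4]))  # e.g. the number of elements of a tensor of shape [2, 3, 4]
print(intprod_sketch([2.5, 3]))   # 7.5 truncated to an int
```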

stable_baselines.common.tf_util.is_image(tensor)

Checks whether a tensor has the shape of a valid image for TensorBoard logging. Valid images: RGB, RGBD, grayscale.

Parameters:	tensor – (np.ndarray or tf.placeholder)
Returns:	(bool)
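A shape check of this kind can be sketched on plain shape tuples. The sketch assumes a channels-last layout (height, width, channels) with 1 channel for grayscale, 3 for RGB, and 4 for RGBD; looks_like_image is a hypothetical name and the library's exact criteria may differ.

```python
def looks_like_image(shape):
    """True for a rank-3 shape whose last dimension is 1, 3, or 4 channels."""
    return len(shape) == 3 and shape[-1] in (1, 3, 4)


print(looks_like_image((84, 84, 3)))  # RGB frame
print(looks_like_image((84, 84)))     # missing channel axis
```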

stable_baselines.common.tf_util.make_session(num_cpu=None, make_default=False, graph=None)

Returns a session that will use only <num_cpu> CPUs.

Parameters:

num_cpu – (int) number of CPUs to use for TensorFlow
make_default – (bool) if this should return an InteractiveSession or a normal Session
graph – (TensorFlow Graph) the graph of the session

Returns:	(TensorFlow session)

stable_baselines.common.tf_util.mse(pred, target)

Returns the mean squared error between prediction and target.

Parameters:

pred – (TensorFlow Tensor) The predicted value
target – (TensorFlow Tensor) The target value

Returns:	(TensorFlow Tensor) The mean squared error between prediction and target

stable_baselines.common.tf_util.numel(tensor)

Gets the number of elements of a TensorFlow Tensor.

Parameters:	tensor – (TensorFlow Tensor) the input tensor
Returns:	(int) the number of elements

stable_baselines.common.tf_util.outer_scope_getter(scope, new_scope='')

Removes a scope layer for the getter.

Parameters:

scope – (str) the layer to remove
new_scope – (str) optional replacement name

Returns:	(function (function, str, *args, **kwargs): TensorFlow Tensor)

stable_baselines.common.tf_util.q_explained_variance(q_pred, q_true)

Calculates the explained variance of the Q value.

Parameters:

q_pred – (TensorFlow Tensor) The predicted Q value
q_true – (TensorFlow Tensor) The expected Q value

Returns:	(TensorFlow Tensor) the explained variance of the Q value
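Explained variance is commonly defined as 1 - Var(true - pred) / Var(true). The sketch below applies that definition to flat Python lists, assuming population variance; explained_variance is a hypothetical name, and the result is undefined when the true values have zero variance.

```python
from statistics import pvariance


def explained_variance(pred, true):
    """1 - Var(true - pred) / Var(true), using population variance."""
    residual = [t - p for p, t in zip(pred, true)]
    return 1.0 - pvariance(residual) / pvariance(true)


q_true = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(q_true, q_true)                   # residual variance is zero
constant = explained_variance([2.5, 2.5, 2.5, 2.5], q_true)    # explains none of the variance
```

A value of 1 means the prediction accounts for all of the variance in the target, 0 means it does no better than a constant, and negative values mean it does worse.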

stable_baselines.common.tf_util.sample(logits)

Creates a sampling Tensor for non-deterministic policies when using a categorical distribution. It uses the Gumbel-max trick: http://amid.fish/humble-gumbel

Parameters:	logits – (TensorFlow Tensor) The input logits (unnormalized log-probabilities) for each action
Returns:	(TensorFlow Tensor) The sampled action
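The Gumbel-max trick adds independent Gumbel(0, 1) noise to each logit and takes the argmax, which yields a sample from the softmax distribution over the logits. The following standard-library sketch illustrates the trick for a single row of logits; gumbel_max_sample and the clamping constant are choices made for this example, not taken from the library.

```python
import math
import random


def gumbel_max_sample(logits, rng):
    """Draw one index with probability softmax(logits) via the Gumbel-max trick."""
    best_index, best_score = 0, -math.inf
    for index, logit in enumerate(logits):
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        score = logit - math.log(-math.log(u))  # logit + Gumbel(0, 1) noise
        if score > best_score:
            best_index, best_score = index, score
    return best_index


rng = random.Random(0)
logits = [math.log(0.2), math.log(0.8)]
draws = [gumbel_max_sample(logits, rng) for _ in range(10000)]
frequency = sum(draws) / len(draws)  # empirical probability of index 1, expected near 0.8
```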

stable_baselines.common.tf_util.seq_to_batch(tensor_sequence, flat=False)

Transforms a sequence of Tensors into a batch of Tensors for recurrent policies.

Parameters:

tensor_sequence – (TensorFlow Tensor) The input tensor to batch
flat – (bool) If the input Tensor is flat

Returns:	(TensorFlow Tensor) batch of Tensors for recurrent policies

stable_baselines.common.tf_util.single_threaded_session(make_default=False, graph=None)

Returns a session which will only use a single CPU.

Parameters:

make_default – (bool) if this should return an InteractiveSession or a normal Session
graph – (TensorFlow Graph) the graph of the session

Returns:	(TensorFlow session)

stable_baselines.common.tf_util.total_episode_reward_logger(rew_acc, rewards, masks, writer, steps)

Calculates the cumulative episode reward and writes it to the TensorFlow log.

Parameters:

rew_acc – (np.array float) the total running reward
rewards – (np.array float) the rewards
masks – (np.array bool) the end of episodes
writer – (TensorFlow Session.writer) the writer to log to
steps – (int) the current timestep

Returns:	(np.array float) the updated total running reward
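The bookkeeping behind the running reward can be sketched for a single environment without any logging. The sketch assumes that a True mask at step t means a new episode starts at step t, so that step's reward counts toward the new episode; the library's own convention may differ. accumulate_episode_rewards is a hypothetical name, and it returns the finished episode totals instead of writing them to a log.

```python
def accumulate_episode_rewards(running_total, rewards, masks):
    """Return the updated running total and the totals of episodes that finished."""
    finished = []
    for reward, new_episode in zip(rewards, masks):
        if new_episode:
            finished.append(running_total)  # the previous episode is complete
            running_total = 0.0
        running_total += reward
    return running_total, finished


# Carry 1.0 over from an earlier rollout; an episode boundary occurs at step 2.
total, done = accumulate_episode_rewards(1.0, [1.0, 1.0, 1.0, 1.0], [False, False, True, False])
print(total, done)
```

The returned running total is what would be passed back in as the first argument on the next rollout, mirroring the role of rew_acc.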

stable_baselines.common.tf_util.var_shape(tensor)

Gets the shape of a TensorFlow Tensor.

Parameters:	tensor – (TensorFlow Tensor) the input tensor
Returns:	([int]) the shape