Warning

This package is in maintenance mode. Please use Stable-Baselines3 (SB3) for an up-to-date version; a migration guide is available in the SB3 documentation.

Schedules

Schedules are used as hyperparameters by most of the algorithms to change the value of a parameter over time (usually the learning rate).

This file is used for specifying various schedules that evolve over time throughout the execution of the algorithm, such as:

  • learning rate for the optimizer
  • exploration epsilon for the epsilon greedy exploration strategy
  • beta parameter for prioritized replay

Each schedule has a function value(t) which returns the current value of the parameter given the timestep t of the optimization procedure.

class stable_baselines.common.schedules.ConstantSchedule(value)

Value remains constant over time.

Parameters:value – (float) Constant value of the schedule
value(step)

Value of the schedule for a given timestep

Parameters:step – (int) the timestep
Returns:(float) the output value for the given timestep
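
The interface described above can be sketched without the library. The class below is a standalone stand-in written for illustration (the name ConstantScheduleSketch is hypothetical); it mirrors the documented constructor argument and value(step) method, and is not the library's source.

```python
class ConstantScheduleSketch:
    """Illustrative stand-in: returns the same value at every timestep."""

    def __init__(self, value):
        self._value = value

    def value(self, step):
        # `step` is accepted to match the schedule interface but is ignored.
        return self._value


schedule = ConstantScheduleSketch(0.1)
values = [schedule.value(step) for step in (0, 10, 1000)]
```

Every entry of values is 0.1, whatever timestep is queried.
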
class stable_baselines.common.schedules.LinearSchedule(schedule_timesteps, final_p, initial_p=1.0)

Linear interpolation between initial_p and final_p over schedule_timesteps. After this many timesteps have passed, final_p is returned.

Parameters:
  • schedule_timesteps – (int) Number of timesteps for which to linearly anneal initial_p to final_p
  • initial_p – (float) initial output value
  • final_p – (float) final output value
value(step)

Value of the schedule for a given timestep

Parameters:step – (int) the timestep
Returns:(float) the output value for the given timestep
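
As a rough illustration of the annealing described above, here is a minimal standalone sketch. The class name LinearScheduleSketch is hypothetical and the body is one plausible reading of the documented behaviour, not a copy of the library code.

```python
class LinearScheduleSketch:
    """Illustrative stand-in: anneals linearly from initial_p to final_p."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, step):
        # Fraction of the annealing period that has elapsed, capped at 1.0
        # so that final_p is returned once schedule_timesteps have passed.
        fraction = min(float(step) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)


# Example: an exploration epsilon annealed from 1.0 to 0.0 over 100 steps.
epsilon = LinearScheduleSketch(schedule_timesteps=100, final_p=0.0, initial_p=1.0)
start, middle, end, later = (epsilon.value(t) for t in (0, 50, 100, 500))
```

Here start is 1.0, middle is 0.5, and both end and later are 0.0, since the value stays at final_p after the annealing period.
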
class stable_baselines.common.schedules.PiecewiseSchedule(endpoints, interpolation=<function linear_interpolation>, outside_value=None)

Piecewise schedule.

Parameters:
  • endpoints – ([(int, int)]) list of pairs (time, value) meaning that the schedule should output value when t==time. The time values must be sorted in increasing order. When t lies between two times, e.g. (time_a, value_a) and (time_b, value_b), such that time_a <= t < time_b, then value(t) returns interpolation(value_a, value_b, alpha), where alpha is the fraction of the time between time_a and time_b that has passed at t.
  • interpolation – (lambda (float, float, float): float) a function that takes the values to the left and to the right of t according to the endpoints, followed by alpha, the fraction of the distance from the left endpoint to the right endpoint that t has covered. See linear_interpolation for an example.
  • outside_value – (float) value returned when t falls outside all the intervals specified in endpoints. If None, an AssertionError is raised instead.
value(step)

Value of the schedule for a given timestep

Parameters:step – (int) the timestep
Returns:(float) the output value for the given timestep
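
The lookup described by the parameters above can be sketched as follows. This is a standalone illustration under stated assumptions: PiecewiseScheduleSketch is a hypothetical name, the local linear_interpolation helper is re-implemented here from its one-line description, and intervals are treated as half-open (time_a <= t < time_b) as the endpoints description states.

```python
def linear_interpolation(left, right, alpha):
    # Local re-implementation for this sketch: alpha=0 gives left, alpha=1 gives right.
    return left + alpha * (right - left)


class PiecewiseScheduleSketch:
    """Illustrative stand-in: interpolates between (time, value) endpoints."""

    def __init__(self, endpoints, interpolation=linear_interpolation,
                 outside_value=None):
        times = [time for time, _ in endpoints]
        assert times == sorted(times), "endpoint times must be increasing"
        self._endpoints = endpoints
        self._interpolation = interpolation
        self._outside_value = outside_value

    def value(self, step):
        # Find the half-open interval [time_a, time_b) that contains step.
        pairs = zip(self._endpoints[:-1], self._endpoints[1:])
        for (time_a, value_a), (time_b, value_b) in pairs:
            if time_a <= step < time_b:
                alpha = float(step - time_a) / (time_b - time_a)
                return self._interpolation(value_a, value_b, alpha)
        # step lies outside every interval: fall back or fail.
        assert self._outside_value is not None
        return self._outside_value


schedule = PiecewiseScheduleSketch(
    [(0, 1.0), (100, 0.5), (200, 0.0)], outside_value=0.0
)
samples = [schedule.value(t) for t in (0, 50, 100, 150, 200)]
```

In this sketch, t=200 is not inside any half-open interval, so the last sample comes from outside_value rather than from the final endpoint.
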
stable_baselines.common.schedules.constant(_)

Returns a constant value for the Scheduler

Parameters:_ – ignored
Returns:(float) 1
stable_baselines.common.schedules.constfn(val)

Create a function that returns a constant. It is useful for learning rate schedules (to avoid code duplication).

Parameters:val – (float)
Returns:(function)
stable_baselines.common.schedules.double_linear_con(progress)

Returns a linear value (x2) with a flattened tail for the Scheduler

Parameters:progress – (float) Current progress status (in [0, 1])
Returns:(float) 1 - progress*2 if (1 - progress*2) >= 0.125 else 0.125
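
The return expression above can be written out directly. The function below is a standalone sketch (the name double_linear_con_sketch is hypothetical) that follows the documented formula rather than the library source.

```python
def double_linear_con_sketch(progress):
    """Decay twice as fast as 1 - progress, but never drop below 0.125."""
    value = 1 - progress * 2
    return value if value >= 0.125 else 0.125


# Sample the curve at a few progress values in [0, 1].
curve = [double_linear_con_sketch(p) for p in (0.0, 0.25, 0.5, 1.0)]
```

The first two samples are 1.0 and 0.5; from progress 0.5 onward the raw value would be 0.0 or negative, so the floor of 0.125 applies.
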
stable_baselines.common.schedules.double_middle_drop(progress)

Returns a linear value with two drops near the middle to a constant value for the Scheduler

Parameters:progress – (float) Current progress status (in [0, 1])
Returns:(float) with p = progress: 1 - p if (1 - p) >= 0.75, 0.075 if 0.25 <= (1 - p) < 0.75, 0.125 if (1 - p) < 0.25
stable_baselines.common.schedules.get_schedule_fn(value_schedule)

Transform (if needed) learning rate and clip range to callable.

Parameters:value_schedule – (callable or float)
Returns:(function)
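
One way to picture this conversion is the standalone sketch below. The name get_schedule_fn_sketch is hypothetical, and the handling of non-numeric input (asserting that it is callable) is an assumption; what the returned function receives as its argument is up to the calling algorithm.

```python
def get_schedule_fn_sketch(value_schedule):
    """Return a callable whether given a plain number or a callable."""
    if isinstance(value_schedule, (float, int)):
        # Wrap a plain number so callers can always invoke the result.
        constant_value = float(value_schedule)
        return lambda _progress: constant_value
    # Assumption: anything that is not a number must already be callable.
    assert callable(value_schedule), "expected a number or a callable"
    return value_schedule


fixed_lr = get_schedule_fn_sketch(3e-4)
decaying_lr = get_schedule_fn_sketch(lambda progress: 3e-4 * (1 - progress))
```

Both fixed_lr and decaying_lr can now be called the same way, so code that consumes a learning rate or clip range does not need to special-case constants.
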
stable_baselines.common.schedules.linear_interpolation(left, right, alpha)

Linear interpolation between left and right.

Parameters:
  • left – (float) left boundary
  • right – (float) right boundary
  • alpha – (float) coeff in [0, 1]
Returns:(float)

stable_baselines.common.schedules.linear_schedule(progress)

Returns a linear value for the Scheduler

Parameters:progress – (float) Current progress status (in [0, 1])
Returns:(float) 1 - progress
stable_baselines.common.schedules.middle_drop(progress)

Returns a linear value with a drop near the middle to a constant value for the Scheduler

Parameters:progress – (float) Current progress status (in [0, 1])
Returns:(float) 1 - progress if (1 - progress) >= 0.75 else 0.075
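
To make the drop concrete, the sketch below evaluates the documented return expression next to a plain linear decay (1 - progress) for comparison. Both function names are hypothetical and the bodies follow the documented formulas, not the library source.

```python
def linear_decay_sketch(progress):
    """Plain linear decay from 1 to 0 as progress goes from 0 to 1."""
    return 1 - progress


def middle_drop_sketch(progress):
    """Linear decay until 1 - progress falls below 0.75, then a constant 0.075."""
    remaining = 1 - progress
    return remaining if remaining >= 0.75 else 0.075


points = (0.0, 0.25, 0.5, 1.0)
linear_values = [linear_decay_sketch(p) for p in points]
drop_values = [middle_drop_sketch(p) for p in points]
```

The two curves agree up to progress 0.25 (values 1.0 and 0.75); beyond that point the linear curve keeps decreasing to 0.0 while the drop variant stays at 0.075.
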