Schedules
Schedules are used as hyperparameters for most of the algorithms, in order to change the value of a parameter over time (usually the learning rate).
This module provides schedules that change a parameter's value over the course of the algorithm's execution, such as:
- learning rate for the optimizer
- exploration epsilon for the epsilon greedy exploration strategy
- beta parameter in prioritized replay
Each schedule has a method value(t), which returns the current value of the parameter given the timestep t of the optimization procedure.
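To illustrate the shared value(t) interface, here is a minimal sketch of how an algorithm might query a schedule once per timestep, using epsilon-greedy exploration as the example. The abstract base class `Schedule` and the helper `epsilon_greedy_action` are hypothetical names introduced for this illustration; they are not taken from stable_baselines.

```python
import random
from abc import ABC, abstractmethod
from typing import Sequence


class Schedule(ABC):
    """Hypothetical base class (illustration only): maps a timestep to a parameter value."""

    @abstractmethod
    def value(self, t: int) -> float:
        """Return the parameter value at timestep t."""


def epsilon_greedy_action(q_values: Sequence[float], schedule: Schedule,
                          t: int, rng: random.Random) -> int:
    """Hypothetical helper: choose an action epsilon-greedily.

    epsilon is read from the schedule at the current timestep, so a
    decaying schedule gradually shifts the agent from exploring to exploiting.
    """
    epsilon = schedule.value(t)
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Passing t explicitly, rather than having the schedule count how often it is called, keeps the schedule stateless, so it can also be queried for logging without side effects.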
class stable_baselines.common.schedules.ConstantSchedule(value)
Value remains constant over time.
Parameters: value – (float) Constant value of the schedule
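A minimal, self-contained sketch of the documented behaviour follows. It is written for illustration rather than imported from stable_baselines, so it is an assumption that the library class behaves identically beyond what is documented above.

```python
class ConstantSchedule:
    """Illustrative sketch (not the library source): value(t) ignores t."""

    def __init__(self, value: float):
        # Stored as _value so the instance attribute does not shadow the value() method.
        self._value = value

    def value(self, t: int) -> float:
        # The timestep is ignored; the same value is returned at every step.
        return self._value


lr = ConstantSchedule(3e-4)
print(lr.value(0), lr.value(1_000_000))  # 0.0003 0.0003
```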
class stable_baselines.common.schedules.LinearSchedule(schedule_timesteps, final_p, initial_p=1.0)
Linear interpolation between initial_p and final_p over schedule_timesteps. Once schedule_timesteps timesteps have passed, final_p is returned.
Parameters:
- schedule_timesteps – (int) Number of timesteps over which to linearly anneal initial_p to final_p
- initial_p – (float) Initial output value
- final_p – (float) Final output value
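The annealing rule above amounts to value(t) = initial_p + min(t / schedule_timesteps, 1) · (final_p − initial_p). The sketch below implements that formula as a standalone class mirroring the documented signature; it is an illustrative re-implementation, not the library's source.

```python
class LinearSchedule:
    """Illustrative sketch of the documented behaviour (not the library source)."""

    def __init__(self, schedule_timesteps: int, final_p: float, initial_p: float = 1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, t: int) -> float:
        # Fraction of the annealing period elapsed, clipped to 1.0 so that
        # final_p is returned once t >= schedule_timesteps.
        fraction = min(float(t) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)


# Anneal an exploration epsilon from 1.0 (the default initial_p) to 0.02 over 10,000 steps.
eps = LinearSchedule(schedule_timesteps=10_000, final_p=0.02)
for t in (0, 5_000, 10_000, 50_000):
    print(t, round(eps.value(t), 4))
# 0 1.0
# 5000 0.51
# 10000 0.02
# 50000 0.02
```

Because the rule only interpolates, it works in either direction: passing initial_p=0.4 and final_p=1.0, for example, anneals a prioritized-replay beta upward.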
class stable_baselines.common.schedules.PiecewiseSchedule(endpoints, interpolation=linear_interpolation, outside_value=None)
Piecewise schedule.
Parameters:
- endpoints – ([(int, int)]) list of pairs (time, value), meaning that the schedule should output value when t == time. The times must be sorted in increasing order. When t lies between two endpoints (time_a, value_a) and (time_b, value_b) with time_a <= t < time_b, the schedule outputs interpolation(value_a, value_b, alpha), where alpha = (t - time_a) / (time_b - time_a) is the fraction of the interval from time_a to time_b that has elapsed at time t.
- interpolation – (lambda (float, float, float): float) a function that takes the value to the left of t, the value to the right of t (according to the endpoints), and alpha, and returns the interpolated value. Alpha is the fraction of the distance from the left endpoint to the right endpoint that t has covered. See linear_interpolation for an example.
- outside_value – (float) value returned when t lies outside all of the intervals specified in endpoints. If None, an AssertionError is raised when an outside value is requested.
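The sketch below re-implements the documented contract as a standalone class, together with a `linear_interpolation` helper whose signature matches the interpolation parameter description. Both are illustrative assumptions written from the text above, not the library's source, and the endpoint values in the usage example are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def linear_interpolation(left: float, right: float, alpha: float) -> float:
    """Blend the left and right values; alpha=0 gives left, alpha=1 gives right."""
    return left + alpha * (right - left)


class PiecewiseSchedule:
    """Illustrative sketch of the documented behaviour (not the library source)."""

    def __init__(self, endpoints: List[Tuple[int, float]],
                 interpolation: Callable[[float, float, float], float] = linear_interpolation,
                 outside_value: Optional[float] = None):
        times = [time for time, _ in endpoints]
        assert times == sorted(times), "endpoint times must be sorted in increasing order"
        self._endpoints = endpoints
        self._interpolation = interpolation
        self._outside_value = outside_value

    def value(self, t: int) -> float:
        # Look for the segment (time_a, value_a), (time_b, value_b) with time_a <= t < time_b.
        for (time_a, value_a), (time_b, value_b) in zip(self._endpoints[:-1], self._endpoints[1:]):
            if time_a <= t < time_b:
                alpha = float(t - time_a) / (time_b - time_a)
                return self._interpolation(value_a, value_b, alpha)
        # t lies outside every interval: fall back to outside_value, or fail loudly.
        assert self._outside_value is not None, "t is outside all intervals and outside_value is None"
        return self._outside_value


# Hypothetical epsilon plan: 1.0 -> 0.1 over 10k steps, then -> 0.01 by 50k, then hold.
eps = PiecewiseSchedule([(0, 1.0), (10_000, 0.1), (50_000, 0.01)], outside_value=0.01)
for t in (0, 5_000, 30_000, 100_000):
    print(t, round(eps.value(t), 4))
# 0 1.0
# 5000 0.55
# 30000 0.055
# 100000 0.01
```

This sketch follows the half-open intervals described above (time_a <= t < time_b), so a t equal to the last endpoint's time falls outside every interval and is answered with outside_value.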