from ddopai.envs.inventory.single_period import NewsvendorEnv
from ddopai.dataloaders.tabular import XYDataLoader
from ddopai.agents.basic import RandomAgent
Experiment functions
EarlyStoppingHandler
EarlyStoppingHandler (patience:int=50, warmup:int=100, criteria:str='J', direction:str='max')
Class to handle early stopping during experiments. The EarlyStoppingHandler calculates the average score over the last “patience” epochs and compares it to the average score over the previous “patience” epochs. Note that here one epoch is defined as the time between evaluations on the validation set; for supervised learning, one epoch is typically one pass through the training data. For reinforcement learning, fewer than one, exactly one, or many episodes may be played in the training environment between evaluation epochs.
| | Type | Default | Details |
|---|---|---|---|
| patience | int | 50 | Number of epochs to evaluate for stopping |
| warmup | int | 100 | How many initial epochs to wait before evaluating |
| criteria | str | J | Whether to use discounted rewards J or total rewards R as criteria |
| direction | str | max | Whether reward shall be maximized or minimized |
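A rough illustration of the stopping rule described above (not the library’s internal implementation): with direction=“max”, the experiment stops once the mean score over the last `patience` epochs no longer improves on the mean over the `patience` epochs before that, and no stopping decision is made during the first `warmup` epochs.

```python
# Illustration only of the rule described above; the actual EarlyStoppingHandler
# implementation may differ in details.
def should_stop(scores, patience=50, warmup=100, direction="max"):
    if len(scores) < warmup or len(scores) < 2 * patience:
        return False
    recent = sum(scores[-patience:]) / patience                 # mean over the last `patience` epochs
    previous = sum(scores[-2 * patience:-patience]) / patience  # mean over the `patience` epochs before
    return recent <= previous if direction == "max" else recent >= previous
```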
EarlyStoppingHandler.add_result
EarlyStoppingHandler.add_result (J:float, R:float)
Add the result of the last epoch to the history and check if the experiment should be stopped.
| | Type | Details |
|---|---|---|
| J | float | Return (discounted rewards) of the last epoch |
| R | float | Total rewards of the last epoch |
| Returns | bool | |
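A hedged usage sketch: after each validation epoch, pass that epoch’s discounted and total rewards to `add_result()` and stop training once it returns `True`. The J/R values below are dummy placeholders for real validation scores.

```python
# Sketch only: J_val and R_val are placeholder validation scores per epoch.
handler = EarlyStoppingHandler(patience=50, warmup=100, criteria="J", direction="max")

n_epochs = 500
for epoch in range(n_epochs):
    # ... train for one epoch, then evaluate on the validation set ...
    J_val, R_val = -10.0, -12.0   # placeholder validation scores for this epoch
    if handler.add_result(J=J_val, R=R_val):
        break  # stop the experiment early
```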
Helper functions
Some functions that are needed to run an experiment
save_agent
save_agent (agent:ddopai.agents.base.BaseAgent, experiment_dir:str, save_best:bool, R:float, J:float, best_R:float, best_J:float, criteria:str='J', force_save=False)
Save the agent if it has improved either R or J (depending on the criteria argument) compared to the previous epochs.
| | Type | Default | Details |
|---|---|---|---|
| agent | BaseAgent | | Any agent inheriting from BaseAgent |
| experiment_dir | str | | Directory to save the agent |
| save_best | bool | | |
| R | float | | |
| J | float | | |
| best_R | float | | |
| best_J | float | | |
| criteria | str | J | |
| force_save | bool | False | |
update_best
update_best (R:float, J:float, best_R:float, best_J:float)
Update the best total rewards R and the best discounted rewards J.
| | Type | Details |
|---|---|---|
| R | float | |
| J | float | |
| best_R | float | |
| best_J | float | |
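A hedged sketch of how the two helpers might be combined at the end of a validation epoch. All values are placeholders, and it is assumed (as the name and docstring suggest, though not stated explicitly above) that `update_best` returns the updated (best_R, best_J) pair.

```python
# Sketch only: persist the agent if the validation score improved, then update
# the running best values for the next epoch. `agent` stands in for any agent
# inheriting from BaseAgent; the directory and scores are placeholders.
best_R, best_J = float("-inf"), float("-inf")

R_val, J_val = -12.0, -10.0            # placeholder validation scores of the current epoch
save_agent(
    agent=agent,
    experiment_dir="results/my_run",   # placeholder directory
    save_best=True,
    R=R_val, J=J_val,
    best_R=best_R, best_J=best_J,
    criteria="J",                      # compare on discounted rewards
)
best_R, best_J = update_best(R_val, J_val, best_R, best_J)
```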
log_info
log_info (R:float, J:float, n_epochs:int, tracking:Literal['wandb'], mode:Literal['train','val','test'])
Logs the same R, J information repeatedly for n_epochs. E.g., to draw a straight line in wandb for algorithms such as XGB, RF, etc. that can be compared to the learning curves of supervised or reinforcement learning algorithms.
| | Type | Details |
|---|---|---|
| R | float | |
| J | float | |
| n_epochs | int | |
| tracking | Literal | only wandb implemented so far |
| mode | Literal | |
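For example, to draw a flat baseline in wandb for a one-shot model (such as XGBoost) next to the learning curves of iteratively trained agents, one might call (values are placeholders):

```python
# Sketch: repeat the final validation scores of a one-shot model for 100 epochs
# so that wandb shows them as a horizontal baseline.
log_info(R=-12.0, J=-10.0, n_epochs=100, tracking="wandb", mode="val")
```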
calculate_score
calculate_score (dataset:List, env:ddopai.envs.base.BaseEnvironment)
Calculate the total rewards R and the discounted rewards J of a dataset.
| | Type | Details |
|---|---|---|
| dataset | List | |
| env | BaseEnvironment | Any environment inheriting from BaseEnvironment |
| Returns | Tuple | |
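Conceptually, R is the undiscounted sum of the per-step rewards in the dataset and J is the same sum discounted by the environment’s gamma. A rough illustration of what the returned pair represents (not the library’s code, and assuming one reward per step):

```python
# Illustration only: R is the plain sum of rewards, J the gamma-discounted sum.
rewards = [0.5, -1.2, 0.3]   # placeholder per-step rewards of one episode
gamma = 0.999                # typically taken from the environment
R = sum(rewards)
J = sum(gamma**t * r for t, r in enumerate(rewards))
```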
Experiment functions
Functions to run experiments
run_experiment
run_experiment (agent:ddopai.agents.base.BaseAgent, env:ddopai.envs.base.BaseEnvironment, n_epochs:int, n_steps:int=None, early_stopping_handler:Optional[__main__.EarlyStoppingHandler]=None, save_best:bool=True, performance_criterion:str='J', tracking:Optional[str]=None, results_dir:str='results', run_id:Optional[str]=None, print_freq:int=10, eval_step_info=False, return_score=False)
Run an experiment with the given agent and environment for n_epochs. It automatically detects whether the train mode of the agent is direct_fit, epochs_fit, or env_interaction and runs the experiment accordingly.
| | Type | Default | Details |
|---|---|---|---|
| agent | BaseAgent | | |
| env | BaseEnvironment | | |
| n_epochs | int | | |
| n_steps | int | None | Number of steps to interact with the environment per epoch. Will be ignored for direct_fit and epochs_fit agents |
| early_stopping_handler | Optional | None | |
| save_best | bool | True | |
| performance_criterion | str | J | other: “R” |
| tracking | Optional | None | other: “wandb” |
| results_dir | str | results | |
| run_id | Optional | None | |
| print_freq | int | 10 | |
| eval_step_info | bool | False | |
| return_score | bool | False | |
Important notes on running experiments
Training mode:

- Agents have a training mode of either `direct_fit`, `epochs_fit`, or `env_interaction`.
- `direct_fit` means that agents are fitted with a single call to the fit method, providing the full X and Y dataset.
- `epochs_fit` means that agents are trained iteratively over epochs. It is assumed that they then have access to the dataloader.

Train, val, test mode:

- The function always sets the agent and environment to the appropriate dataset mode (and therefore, indirectly, the dataloader via the environment).

Early stopping:

- Can optionally be applied for `epochs_fit` and `env_interaction` agents.

Save best agent:

- The `save_agent()` function, given the `save_best` param is `True`, will save the best agent based on the validation score.
- At test time at a later point, one can then load the best agent and evaluate it on the test set (not done automatically by this function).

Logging:

- By setting `tracking` to `"wandb"`, the function will log J and R to wandb.
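Putting these pieces together, a minimal, hedged sketch of a typical call: `my_agent` stands in for any agent inheriting from BaseAgent, `environment` is assumed to be set up as in the `test_agent()` example below, and all argument values are purely illustrative.

```python
# Sketch only: `my_agent` is a placeholder agent and `environment` is assumed
# to be an already constructed environment (see the test_agent() example below).
early_stopping_handler = EarlyStoppingHandler(patience=50, warmup=100, criteria="J", direction="max")

run_experiment(
    agent=my_agent,
    env=environment,
    n_epochs=100,                                   # number of evaluation epochs
    n_steps=1000,                                   # only relevant for env_interaction agents
    early_stopping_handler=early_stopping_handler,  # optional, for epochs_fit/env_interaction agents
    save_best=True,                                 # checkpoint the best agent based on the validation score
    performance_criterion="J",                      # or "R"
    tracking=None,                                  # set to "wandb" to log J and R
    results_dir="results",
    print_freq=10,
)
```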
test_agent
test_agent (agent:ddopai.agents.base.BaseAgent, env:ddopai.envs.base.BaseEnvironment, return_dataset=False, save_features=False, tracking=None, eval_step_info=False)
Tests the agent on the environment for a single episode
| | Type | Default | Details |
|---|---|---|---|
| agent | BaseAgent | | |
| env | BaseEnvironment | | |
| return_dataset | bool | False | |
| save_features | bool | False | |
| tracking | NoneType | None | other: “wandb” |
| eval_step_info | bool | False | |
run_test_episode
run_test_episode (env:ddopai.envs.base.BaseEnvironment, agent:ddopai.agents.base.BaseAgent, eval_step_info:bool=False, save_features:bool=False)
Runs an episode to test the agent’s performance. It assumes that the agent and environment are initialized, are in test/val mode, and have been reset.
| | Type | Default | Details |
|---|---|---|---|
| env | BaseEnvironment | | Any environment inheriting from BaseEnvironment |
| agent | BaseAgent | | Any agent inheriting from BaseAgent |
| eval_step_info | bool | False | Print step info during evaluation |
| save_features | bool | False | Save features (observations) of the dataset. Can be turned off since they sometimes become very large when many lags are included |
Usage example for `test_agent()`:
import numpy as np  # needed for the random example data below

val_index_start = 80 #90_000
test_index_start = 90 #100_000

X = np.random.rand(100, 2)
Y = np.random.rand(100, 1)

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

agent = RandomAgent(environment.mdp_info)

environment.test()
R, J = test_agent(agent, environment)
print(f"R: {R}, J: {J}")
R: -7.269816766556392, J: -7.236762453375597