Experiment functions

Functions to run experiments more efficiently. Using these functions is optional, and they are only compatible with agents defined in this package. Agents from other packages such as Stable Baselines or RLlib may require their own experiment functions.

source

EarlyStoppingHandler

 EarlyStoppingHandler (patience:int=50, warmup:int=100, criteria:str='J',
                       direction:str='max')

Class to handle early stopping during experiments. The EarlyStoppingHandler calculates the average score over the last “patience” epochs and compares it to the average score over the previous “patience” epochs. Note that an epoch is defined here as the time between two evaluations on the validation set: for supervised learning, one epoch is typically one pass through the training data; for reinforcement learning, there may be less than one, exactly one, or many episodes played in the training environment between two evaluation epochs.

Type Default Details
patience int 50 Number of epochs to evaluate for stopping
warmup int 100 How many initial epochs to wait before evaluating
criteria str J Whether to use discounted rewards J or total rewards R as criteria
direction str max Whether reward shall be maximized or minimized

source

EarlyStoppingHandler.add_result

 EarlyStoppingHandler.add_result (J:float, R:float)

Add the result of the last epoch to the history and check if the experiment should be stopped.

Type Details
J float Return (discounted rewards) of the last epoch
R float Total rewards of the last epoch
Returns bool True if the experiment should be stopped
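A minimal sketch of how the handler could be wired into a training loop. The import path is assumed, and evaluate_one_epoch is a hypothetical stand-in for whatever produces J and R for one epoch:

from ddopai.experiments.experiment_functions import EarlyStoppingHandler  # import path assumed

handler = EarlyStoppingHandler(patience=50, warmup=100, criteria="J", direction="max")

for epoch in range(1000):
    J, R = evaluate_one_epoch()  # hypothetical helper returning discounted and total rewards
    if handler.add_result(J, R):  # True once the experiment should be stopped
        print(f"early stopping after epoch {epoch}")
        break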

Helper functions

Some functions that are needed to run an experiment


source

save_agent

 save_agent (agent:ddopai.agents.base.BaseAgent, experiment_dir:str,
             save_best:bool, R:float, J:float, best_R:float, best_J:float,
             criteria:str='J', force_save=False)

Save the agent if it has improved either R or J (depending on the criteria argument) compared to the previous epochs.

Type Default Details
agent BaseAgent Any agent inheriting from BaseAgent
experiment_dir str Directory to save the agent
save_best bool
R float
J float
best_R float
best_J float
criteria str J
force_save bool False

source

update_best

 update_best (R:float, J:float, best_R:float, best_J:float)

Update the best total rewards R and the best discounted rewards J.

Type Details
R float
J float
best_R float
best_J float
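In a manual training loop, save_agent and update_best are typically called together after each validation pass. A hedged sketch, where the validation scores are placeholders, agent is any BaseAgent instance, and update_best is assumed to return the updated pair:

best_R, best_J = -float("inf"), -float("inf")

for epoch in range(5):
    R, J = -8.0 + epoch, -7.9 + epoch  # placeholder validation scores
    save_agent(agent, experiment_dir="results/my_run", save_best=True,
               R=R, J=J, best_R=best_R, best_J=best_J, criteria="J")
    best_R, best_J = update_best(R, J, best_R, best_J)  # assumed to return the updated pair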

source

log_info

 log_info (R:float, J:float, n_epochs:int, tracking:Literal['wandb'],
           mode:Literal['train','val','test'])

Logs the same R, J information repeatedly for n_epochs, e.g., to draw a straight line in wandb for algorithms such as XGB, RF, etc. that can be compared to the learning curves of supervised or reinforcement learning algorithms.

Type Details
R float
J float
n_epochs int
tracking Literal only wandb implemented so far
mode Literal
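For example, a model that is fit once (such as an XGBoost-based agent) produces a single validation score; repeating it lets wandb draw a flat reference line next to learning curves. A one-line sketch with placeholder scores, assuming a wandb run is already initialized:

log_info(R=-7.3, J=-7.2, n_epochs=100, tracking="wandb", mode="val")  # placeholder scores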

source

calculate_score

 calculate_score (dataset:List, env:ddopai.envs.base.BaseEnvironment)

Calculate the total rewards R and the discounted rewards J of a dataset.

Type Details
dataset List
env BaseEnvironment Any environment inheriting from BaseEnvironment
Returns Tuple

Experiment functions

Functions to run experiments


source

run_experiment

 run_experiment (agent:ddopai.agents.base.BaseAgent,
                 env:ddopai.envs.base.BaseEnvironment, n_epochs:int,
                 n_steps:int=None,
                 early_stopping_handler:Optional[__main__.EarlyStoppingHandler]=None,
                 save_best:bool=True,
                 performance_criterion:str='J',
                 tracking:Optional[str]=None, results_dir:str='results',
                 run_id:Optional[str]=None, print_freq:int=10,
                 eval_step_info=False, return_score=False)

Run an experiment with the given agent and environment for n_epochs. It automatically detects whether the train mode of the agent is direct_fit, epochs_fit, or env_interaction and runs the experiment accordingly.

Type Default Details
agent BaseAgent
env BaseEnvironment
n_epochs int
n_steps int None Number of steps to interact with the environment per epoch. Will be ignored for direct_fit and epochs_fit agents
early_stopping_handler Optional None
save_best bool True
performance_criterion str J other: “R”
tracking Optional None other: “wandb”
results_dir str results
run_id Optional None
print_freq int 10
eval_step_info bool False
return_score bool False

Important notes on running experiments

Training mode:

  • Agents have one of three training modes: direct_fit, epochs_fit, or env_interaction. direct_fit means the agent is trained with a single call to the fit method, providing the full X and Y dataset. epochs_fit means the agent is trained iteratively over epochs; it is then assumed to have access to the dataloader. env_interaction means the agent learns by interacting with the environment step by step (the typical reinforcement learning setting).

Train, val, test mode:

  • The function always sets the agent and environment to the appropriate dataset mode (and therefore, indirectly, the dataloader via the environment).

Early stopping:

  • Can be optionally applied for epochs_fit and env_interaction agents.

Save best agent:

  • The save_agent() function, if the save_best param is True, will save the best agent based on the validation score.

  • At test time at a later point, one can then load the best agent and evaluate it on the test set (not done automatically by this function).

Logging:

  • By setting tracking to "wandb", the function will log J and R to wandb.
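Putting these notes together, a hedged sketch of a full training run might look as follows. The import path of run_experiment and EarlyStoppingHandler is assumed, and the RandomAgent is used only to keep the sketch self-contained; in practice you would pass a trainable agent:

import numpy as np

from ddopai.envs.inventory.single_period import NewsvendorEnv
from ddopai.dataloaders.tabular import XYDataLoader
from ddopai.agents.basic import RandomAgent
from ddopai.experiments.experiment_functions import run_experiment, EarlyStoppingHandler  # path assumed

X = np.random.rand(100, 2)
Y = np.random.rand(100, 1)

val_index_start = 80
test_index_start = 90
dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)
agent = RandomAgent(environment.mdp_info)

early_stopping = EarlyStoppingHandler(patience=50, warmup=100, criteria="J")  # optional

run_experiment(
    agent,
    environment,
    n_epochs=10,
    n_steps=365,                       # ignored for direct_fit and epochs_fit agents
    early_stopping_handler=early_stopping,
    save_best=True,
    performance_criterion="J",
    tracking=None,                     # or "wandb"
    results_dir="results",
    print_freq=1,
)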

source

test_agent

 test_agent (agent:ddopai.agents.base.BaseAgent,
             env:ddopai.envs.base.BaseEnvironment, return_dataset=False,
             save_features=False, tracking=None, eval_step_info=False)

Tests the agent on the environment for a single episode

Type Default Details
agent BaseAgent
env BaseEnvironment
return_dataset bool False
save_features bool False
tracking NoneType None other: “wandb”
eval_step_info bool False

source

run_test_episode

 run_test_episode (env:ddopai.envs.base.BaseEnvironment,
                   agent:ddopai.agents.base.BaseAgent,
                   eval_step_info:bool=False, save_features:bool=False)

Runs an episode to test the agent’s performance. It assumes that the agent and environment are initialized, set to test/val mode, and have been reset.

Type Default Details
env BaseEnvironment Any environment inheriting from BaseEnvironment
agent BaseAgent Any agent inheriting from BaseAgent
eval_step_info bool False Print step info during evaluation
save_features bool False Save the features (observations) of the dataset. Can be turned off since they can become very large when many lag features are included

Usage example for test_agent():

import numpy as np

from ddopai.envs.inventory.single_period import NewsvendorEnv
from ddopai.dataloaders.tabular import XYDataLoader
from ddopai.agents.basic import RandomAgent
val_index_start = 80 #90_000
test_index_start = 90 #100_000

X = np.random.rand(100, 2)
Y = np.random.rand(100, 1)

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

agent = RandomAgent(environment.mdp_info)

environment.test()

R, J = test_agent(agent, environment)

print(f"R: {R}, J: {J}")
R: -7.269816766556392, J: -7.236762453375597
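Building on this example, calculate_score can recompute R and J from the per-step dataset of the evaluation episode. Note that the return structure of test_agent with return_dataset=True is assumed here:

environment.test()
R, J, dataset = test_agent(agent, environment, return_dataset=True)  # return structure assumed

R_check, J_check = calculate_score(dataset, environment)
print(f"R: {R_check}, J: {J_check}")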