ERM agents

Newsvendor agents based on Empirical Risk Minimization (ERM) principles.

source

NewsvendorXGBAgent

 NewsvendorXGBAgent (environment_info:ddopai.utils.MDPInfo,
                     cu:float|numpy.ndarray, co:float|numpy.ndarray,
                     obsprocessors:Optional[List[object]]=None,
                     agent_name:str|None='XGBAgent', eta:float=0.3,
                     gamma:float=0, max_depth:int=6,
                     min_child_weight:float=1, max_delta_step:float=0,
                     subsample:float=1, sampling_method:str='uniform',
                     colsample_bytree:float=1, colsample_bylevel:float=1,
                     colsample_bynode:float=1, lambda_:float=1,
                     alpha:float=0, tree_method:str='auto',
                     scale_pos_weight:float=1, refresh_leaf:int=1,
                     grow_policy:str='depthwise', max_leaves:int=0,
                     max_bin:int=256, num_parallel_tree:int=1,
                     multi_strategy:str='one_output_per_tree',
                     max_cached_hist_node:int=65536, nthread:int=1,
                     device:str='CPU')

Agent that solves the Newsvendor problem within the ERM framework, i.e., by quantile regression, using the XGBoost library.

Type Default Details
environment_info MDPInfo
cu float | numpy.ndarray underage cost
co float | numpy.ndarray overage cost
obsprocessors Optional None
agent_name str | None XGBAgent
eta float 0.3 ## XGB params
gamma float 0
max_depth int 6
min_child_weight float 1
max_delta_step float 0
subsample float 1
sampling_method str uniform
colsample_bytree float 1
colsample_bylevel float 1
colsample_bynode float 1
lambda_ float 1
alpha float 0
tree_method str auto
scale_pos_weight float 1
refresh_leaf int 1 updater will always use default
grow_policy str depthwise process type will always use default
max_leaves int 0
max_bin int 256
num_parallel_tree int 1
multi_strategy str one_output_per_tree
max_cached_hist_node int 65536
nthread int 1 ## General params
device str CPU
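
As a rough illustration of why quantile regression fits this problem: with underage cost cu and overage cost co, the cost-minimizing order quantity is the cu / (cu + co) quantile of demand. The sketch below checks this numerically on synthetic demand. The helper `newsvendor_cost` and the sample data are made up for illustration and are not part of ddopai or XGBoost.

```python
import numpy as np

def newsvendor_cost(order, demand, cu, co):
    """Average newsvendor cost of a fixed order quantity over demand samples."""
    underage = np.maximum(demand - order, 0.0)  # unmet demand
    overage = np.maximum(order - demand, 0.0)   # leftover stock
    return float(np.mean(cu * underage + co * overage))

rng = np.random.default_rng(0)
demand = rng.gamma(shape=2.0, scale=10.0, size=5_000)  # synthetic demand (assumption)

cu, co = 0.42857, 1.0
critical_ratio = cu / (cu + co)  # quantile level implied by the two costs

# Order quantity suggested by theory: empirical quantile at the critical ratio
q_star = float(np.quantile(demand, critical_ratio))

# Brute-force check over a grid of candidate order quantities
grid = np.linspace(demand.min(), demand.max(), 2_001)
costs = np.array([newsvendor_cost(q, demand, cu, co) for q in grid])
q_grid = float(grid[np.argmin(costs)])

print(critical_ratio, q_star, q_grid)
```

The grid search and the empirical quantile should land on nearly the same order quantity, which is the reason a quantile regressor can be used directly as a newsvendor policy.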

source

SGDBaseAgent

 SGDBaseAgent (environment_info:ddopai.utils.MDPInfo,
               dataloader:ddopai.dataloaders.base.BaseDataLoader,
               input_shape:Tuple, output_shape:Tuple,
               dataset_params:Optional[dict]=None,
               dataloader_params:Optional[dict]=None,
               optimizer_params:Optional[dict]=None,
               learning_rate_scheduler_params:Optional[Dict]=None,
               obsprocessors:Optional[List]=None, device:str='cpu',
               agent_name:str|None=None, test_batch_size:int=1024,
               receive_batch_dim:bool=False)

Base class for Agents that are trained using Stochastic Gradient Descent (SGD) on PyTorch models.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
input_shape Tuple
output_shape Tuple
dataset_params Optional None parameters needed to convert the dataloader to a torch dataset
dataloader_params Optional None default: {“batch_size”: 32, “shuffle”: True}
optimizer_params Optional None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params Optional None default: None. If dict, then first key is “scheduler” and the rest are the parameters
obsprocessors Optional None default: []
device str cpu “cuda” or “cpu”
agent_name str | None None
test_batch_size int 1024
receive_batch_dim bool False

Important notes:

SGD-based agents are all agents trained via SGD, such as linear models or neural networks. They must meet a few specific requirements to interface properly with the environment.

Torch pre-processors:

  • In addition to the general Numpy-based pre-processors, we also provide pre-processors that work at tensor level within the fit_epoch method and the predict method. They can be used in addition to the Numpy-based pre-processors or instead of them. It is important to ensure that the shape of observations (after pre-processing) is the same for those from the environment and those from the dataloader during training.

Dataloader:

  • As in standard supervised learning with Torch, we use the Torch dataloader to load the data. Instead of defining a custom dataset class, we provide a wrapper that can be placed around our dataloader to make its output and interface the same as those of a Torch dataset. The dataloader is then initialized when the agent is created, so that the agent has access to the same dataloader as the environment.
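
The wrapper idea described here can be sketched as follows. Both classes are hypothetical stand-ins written for this illustration (the real ddopai dataloader and wrapper may look different); the point is only that a map-style dataset needs `__len__` and `__getitem__`.

```python
import numpy as np

class ToyDataLoader:
    """Hypothetical stand-in for a ddopai-style dataloader (not the real class)."""

    def __init__(self, X, Y):
        self.X, self.Y = X, Y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]


class DatasetWrapper:
    """Expose a dataloader through the map-style dataset protocol.

    A map-style Torch dataset is accessed via `__len__` and `__getitem__`,
    so a thin wrapper like this lets the existing dataloader be reused.
    """

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __len__(self):
        return len(self.dataloader)

    def __getitem__(self, idx):
        features, demand = self.dataloader[idx]
        # Cast to float32, the default dtype of PyTorch layers
        return (np.asarray(features, dtype=np.float32),
                np.asarray(demand, dtype=np.float32))


X = np.random.rand(10, 2)
Y = np.random.rand(10, 1)
dataset = DatasetWrapper(ToyDataLoader(X, Y))
features, demand = dataset[3]
print(len(dataset), features.shape, demand.shape)
```

With PyTorch installed, an object like `dataset` could be handed to `torch.utils.data.DataLoader` together with the `batch_size` and `shuffle` settings from `dataloader_params`.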

Training process:

  • The outer loop of the training process (epochs) is handled outside the agent by the [`run_experiment`](https://opimwue.github.io/ddopai/40_experiments/experiment_functions.html#run_experiment) functions (or can be customized). The agent needs a fit_epoch method that defines what happens within an epoch. This includes:
    • Getting the data from the dataloader
    • Pre-processing the data
    • Forward pass
    • Loss calculation
    • Backward pass
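
The five steps above can be sketched in plain NumPy for a linear model trained with a quantile loss. This is a stand-in to show the control flow, not the library's PyTorch implementation: the function signature, the manual gradient, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_epoch(weights, bias, X, Y, quantile, lr=0.01, batch_size=32, rng=None):
    """Run one epoch of mini-batch SGD on a linear model with quantile loss."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(len(X))                 # shuffle once per epoch
    epoch_loss = 0.0
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], Y[idx]                     # 1. get data
        xb = xb.astype(np.float64)                  # 2. pre-process (placeholder)
        pred = xb @ weights + bias                  # 3. forward pass
        diff = yb - pred
        loss = np.mean(np.maximum(quantile * diff,  # 4. loss calculation
                                  (quantile - 1.0) * diff))
        # 5. backward pass: subgradient of the mean loss w.r.t. each prediction
        grad_pred = np.where(diff > 0, -quantile, 1.0 - quantile) / len(xb)
        weights = weights - lr * (xb.T @ grad_pred)
        bias = bias - lr * grad_pred.sum()
        epoch_loss += loss * len(xb)
    return weights, bias, epoch_loss / len(X)

rng = np.random.default_rng(0)
X = rng.random((1000, 2))
Y = rng.random(1000)

w, b = np.zeros(2), 0.0
losses = []
for epoch in range(5):                              # outer loop lives outside the agent
    w, b, loss = fit_epoch(w, b, X, Y, quantile=0.3, rng=rng)
    losses.append(loss)
print(losses)
```

Keeping the epoch loop outside `fit_epoch` mirrors the split described above: the experiment function decides how many epochs to run and when to evaluate, while the agent only knows how to process one pass over the data.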

source

SGDBaseAgent.set_dataloader

 SGDBaseAgent.set_dataloader
                              (dataloader:ddopai.dataloaders.base.BaseData
                              Loader, dataset_params:dict,
                              dataloader_params:dict)

Set the dataloader for the agent by wrapping it into a Torch Dataset

Type Details
dataloader BaseDataLoader
dataset_params dict
dataloader_params dict dict with keys: batch_size, shuffle
Returns None

source

SGDBaseAgent.set_loss_function

 SGDBaseAgent.set_loss_function ()

Set loss function for the model


source

SGDBaseAgent.set_model

 SGDBaseAgent.set_model (input_shape:Tuple, output_shape:Tuple)

Set the model for the agent


source

SGDBaseAgent.set_optimizer

 SGDBaseAgent.set_optimizer (optimizer_params:dict)

Set the optimizer for the model

Type Details
optimizer_params dict dict with keys: optimizer, lr, weight_decay
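
One plausible way to turn such a dict into an optimizer is to pop the name key and pass the remaining entries as keyword arguments. The helper `build_from_config` and the dummy `Adam` class below are hypothetical and exist only so the sketch runs without PyTorch; how the agent resolves the dict internally is not shown in this section.

```python
from types import SimpleNamespace

def build_from_config(namespace, config, name_key, *args):
    """Instantiate `namespace.<config[name_key]>` with the remaining keys as kwargs."""
    kwargs = dict(config)               # copy so the caller's dict stays untouched
    class_name = kwargs.pop(name_key)   # e.g. "Adam"
    factory = getattr(namespace, class_name)
    return factory(*args, **kwargs)

# Dummy class standing in for torch.optim.Adam (assumption for this sketch)
class Adam:
    def __init__(self, params, lr, weight_decay):
        self.params, self.lr, self.weight_decay = params, lr, weight_decay

# With PyTorch installed, `torch.optim` could be passed instead of this namespace
fake_optim = SimpleNamespace(Adam=Adam)

optimizer_params = {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0}
optimizer = build_from_config(fake_optim, optimizer_params, "optimizer", ["w", "b"])
print(type(optimizer).__name__, optimizer.lr, optimizer.weight_decay)
```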

source

SGDBaseAgent.set_learning_rate_scheduler

 SGDBaseAgent.set_learning_rate_scheduler (learning_rate_scheduler_params)

Set the learning rate scheduler (can be None)

Details
learning_rate_scheduler_params

source

SGDBaseAgent.fit_epoch

 SGDBaseAgent.fit_epoch ()

Fit the model for one epoch using the dataloader


source

SGDBaseAgent.draw_action_

 SGDBaseAgent.draw_action_ (observation:numpy.ndarray)

Draw an action based on the fitted model (see predict method)

Type Details
observation ndarray
Returns ndarray

source

SGDBaseAgent.predict

 SGDBaseAgent.predict (X:numpy.ndarray)

Do one forward pass of the model and return the prediction

Type Details
X ndarray
Returns ndarray

source

SGDBaseAgent.train

 SGDBaseAgent.train ()

Set the internal state of the agent and its model to training mode


source

SGDBaseAgent.eval

 SGDBaseAgent.eval ()

Set the internal state of the agent and its model to evaluation mode


source

SGDBaseAgent.to

 SGDBaseAgent.to (device:str)

Move the model to the specified device

Type Details
device str

source

SGDBaseAgent.save

 SGDBaseAgent.save (path:str, overwrite:bool=True)

Save the PyTorch model to a file in the specified directory.

Type Default Details
path str The directory where the file will be saved.
overwrite bool True Allow overwriting; if False, a FileExistsError will be raised if the file exists.
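
The overwrite behavior described in the table can be sketched with the standard library alone. The function `save_bytes` and the default file name `model.pt` are hypothetical; a PyTorch agent would serialize its model at the marked line, for example with `torch.save`.

```python
import tempfile
from pathlib import Path

def save_bytes(payload: bytes, path: str, overwrite: bool = True,
               filename: str = "model.pt") -> Path:
    """Write `payload` to `path/filename`, honoring the overwrite flag."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename        # `filename` is a hypothetical default
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists and overwrite=False")
    target.write_bytes(payload)          # model serialization would happen here
    return target

with tempfile.TemporaryDirectory() as tmp:
    first = save_bytes(b"weights-v1", tmp)
    save_bytes(b"weights-v2", tmp)       # default overwrite=True replaces the file
    content = first.read_bytes()
    try:
        save_bytes(b"weights-v3", tmp, overwrite=False)
        refused = False
    except FileExistsError:
        refused = True

print(content, refused)
```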

source

SGDBaseAgent.load

 SGDBaseAgent.load (path:str)

Load the PyTorch model from a file.

Type Details
path str Only the path to the folder is needed, not the file itself

source

NVBaseAgent

 NVBaseAgent (environment_info:ddopai.utils.MDPInfo,
              dataloader:ddopai.dataloaders.base.BaseDataLoader,
              cu:numpy.ndarray|ddopai.utils.Parameter,
              co:numpy.ndarray|ddopai.utils.Parameter, input_shape:Tuple,
              output_shape:Tuple, optimizer_params:dict|None=None,
              learning_rate_scheduler_params=None,
              dataset_params:dict|None=None,
              dataloader_params:dict|None=None,
              obsprocessors:list|None=None, device:str='cpu',
              agent_name:str|None=None, test_batch_size:int=1024,
              receive_batch_dim:bool=False,
              loss_function:Literal['quantile','pinball']='quantile')

Base agent for the Newsvendor problem implementing the loss function for the Empirical Risk Minimization (ERM) approach based on quantile loss.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params NoneType None TODO: add base class for learning rate scheduler for typing
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
obsprocessors list | None None default: []
device str cpu “cuda” or “cpu”
agent_name str | None None
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile

source

NVBaseAgent.set_loss_function

 NVBaseAgent.set_loss_function ()

Set the loss function for the model to the quantile loss. Training uses the quantile loss rather than the pinball loss with the specific cu and co values, so that the scale of the feedback signal stays similar during training.
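
The scale argument can be made concrete with a small NumPy check. It assumes that "pinball loss with specific cu and co values" means the cost-weighted form cu * max(y - ŷ, 0) + co * max(ŷ - y, 0), and that the quantile loss uses the level q = cu / (cu + co); both function names are made up for this sketch.

```python
import numpy as np

def quantile_loss(y, y_hat, q):
    """Quantile loss at level q; its scale does not depend on cu and co."""
    diff = y - y_hat
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def cost_weighted_loss(y, y_hat, cu, co):
    """Newsvendor cost: underage weighted by cu, overage weighted by co."""
    return float(np.mean(cu * np.maximum(y - y_hat, 0.0)
                         + co * np.maximum(y_hat - y, 0.0)))

rng = np.random.default_rng(1)
y, y_hat = rng.random(1000), rng.random(1000)

results = []
# Same cost ratio, but the second pair of costs is 100 times larger
for cu, co in [(0.42857, 1.0), (42.857, 100.0)]:
    q = cu / (cu + co)
    results.append((quantile_loss(y, y_hat, q),
                    cost_weighted_loss(y, y_hat, cu, co),
                    cu + co))
print(results)
```

Under these assumptions the cost-weighted loss equals (cu + co) times the quantile loss, so both share the same minimizer, while only the quantile loss keeps the same magnitude when the costs are rescaled.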


source

NewsvendorlERMAgent

 NewsvendorlERMAgent (environment_info:ddopai.utils.MDPInfo,
                      dataloader:ddopai.dataloaders.base.BaseDataLoader,
                      cu:numpy.ndarray|ddopai.utils.Parameter,
                      co:numpy.ndarray|ddopai.utils.Parameter,
                      input_shape:Tuple, output_shape:Tuple,
                      optimizer_params:dict|None=None,
                      learning_rate_scheduler_params=None,
                      model_params:dict|None=None,
                      dataset_params:dict|None=None,
                      dataloader_params:dict|None=None,
                      obsprocessors:list|None=None, device:str='cpu',
                      agent_name:str|None='lERM',
                      test_batch_size:int=1024,
                      receive_batch_dim:bool=False, loss_function:Literal[
                      'quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a linear (regression) model. Note that this implementation finds the optimal regression parameters via SGD.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params NoneType None TODO: add base class for learning rate scheduler for typing
model_params dict | None None default: {“relu_output”: False}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
obsprocessors list | None None default: []
device str cpu “cuda” or “cpu”
agent_name str | None lERM
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile

Further information:

References
----------

.. [1] Gah-Yi Ban, Cynthia Rudin, "The Big Data Newsvendor: Practical Insights
    from Machine Learning", 2018.

source

NewsvendorlERMAgent.set_model

 NewsvendorlERMAgent.set_model (input_shape, output_shape)

Set the model for the agent to a linear model

Example usage:

import numpy as np

from ddopai.envs.inventory.single_period import NewsvendorEnv
from ddopai.dataloaders.tabular import XYDataLoader
from ddopai.experiments.experiment_functions import run_experiment, test_agent
val_index_start = 800 #90_000
test_index_start = 900 #100_000

X = np.random.rand(1000, 2)
Y = np.random.rand(1000, 1)

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

agent = NewsvendorlERMAgent(environment.mdp_info,
                            dataloader,
                            cu=np.array([0.42857]),
                            co=np.array([1.0]),
                            input_shape=(2,),
                            output_shape=(1,),
                            optimizer_params= {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0}, # other optimizers: "SGD", "RMSprop"
                            learning_rate_scheduler_params = None, # TODO add base class for learning rate scheduler for typing
                            model_params = {"relu_output": False}, #
                            dataloader_params={"batch_size": 32, "shuffle": True},
                            device = "cpu", # "cuda" or "cpu"
)

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

run_experiment(agent, environment, 2, run_id = "test") # fit agent via run_experiment function

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)
input shape (2,)
INFO:root:Network architecture:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
LinearModel                              [1, 1]                    --
├─Linear: 1-1                            [1, 1]                    3
├─Identity: 1-2                          [1, 1]                    --
==========================================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
INFO:root:Starting experiment
INFO:root:Initial evaluation: R=-29.736253318797445, J=-28.287550833928687
INFO:root:Starting training with epochs fit
-23.17678889235405 -22.124720267178684
Experiment directory: results/test
100%|██████████| 25/25 [00:00<00:00, 903.73it/s]
100%|██████████| 25/25 [00:00<00:00, 1999.34it/s]
100%|██████████| 2/2 [00:00<00:00, 35.22it/s]
INFO:root:Finished training with epochs fit
INFO:root:Evaluation after training: R=-15.499745268755348, J=-14.77032101771835
-16.54230338871762 -15.75806274718322

source

NewsvendorDLAgent

 NewsvendorDLAgent (environment_info:ddopai.utils.MDPInfo,
                    dataloader:ddopai.dataloaders.base.BaseDataLoader,
                    cu:numpy.ndarray|ddopai.utils.Parameter,
                    co:numpy.ndarray|ddopai.utils.Parameter,
                    input_shape:Tuple, output_shape:Tuple,
                    learning_rate_scheduler_params:Optional[Dict]=None,
                    optimizer_params:dict|None=None,
                    model_params:dict|None=None,
                    dataloader_params:dict|None=None,
                    dataset_params:dict|None=None, device:str='cpu',
                    obsprocessors:list|None=None,
                    agent_name:str|None='DLNV', test_batch_size:int=1024,
                    receive_batch_dim:bool=False, loss_function:Literal['q
                    uantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a deep learning model.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
learning_rate_scheduler_params Optional None
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params dict | None None default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
device str cpu “cuda” or “cpu”
obsprocessors list | None None default: []
agent_name str | None DLNV
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile

Further information:

References
----------

.. [1] Afshin Oroojlooyjadid, Lawrence V. Snyder, Martin Takáč,
        "Applying Deep Learning to the Newsvendor Problem", 2018.

source

NewsvendorDLAgent.set_model

 NewsvendorDLAgent.set_model (input_shape, output_shape)

Set the model for the agent to an MLP

Example usage:

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

model_params = {
    "hidden_layers": [64, 64],
}

agent = NewsvendorDLAgent(environment.mdp_info,
                            dataloader,
                            cu=np.array([0.42857]),
                            co=np.array([1.0]),
                            input_shape=(2,),
                            output_shape=(1,),
                            optimizer_params= {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0}, # other optimizers: "SGD", "RMSprop"
                            learning_rate_scheduler_params = None, # TODO add base class for learning rate scheduler for typing
                            model_params = model_params, #
                            dataloader_params={"batch_size": 32, "shuffle": True},
                            device = "cpu" # "cuda" or "cpu"
)

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

run_experiment(agent, environment, 2, run_id = "test") # fit agent via run_experiment function

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)
INFO:root:Network architecture:
input shape (2,)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MLP                                      [1, 1]                    --
├─Sequential: 1-1                        [1, 1]                    --
│    └─Linear: 2-1                       [1, 64]                   192
│    └─ReLU: 2-2                         [1, 64]                   --
│    └─Dropout: 2-3                      [1, 64]                   --
│    └─Linear: 2-4                       [1, 64]                   4,160
│    └─ReLU: 2-5                         [1, 64]                   --
│    └─Dropout: 2-6                      [1, 64]                   --
│    └─Linear: 2-7                       [1, 1]                    65
│    └─Identity: 2-8                     [1, 1]                    --
==========================================================================================
Total params: 4,417
Trainable params: 4,417
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.02
Estimated Total Size (MB): 0.02
==========================================================================================
INFO:root:Starting experiment
INFO:root:Initial evaluation: R=-20.030297947350757, J=-19.11491558256756
INFO:root:Starting training with epochs fit
-22.66337395888819 -21.548795898866043
Experiment directory: results/test
100%|██████████| 25/25 [00:00<00:00, 1212.35it/s]
100%|██████████| 25/25 [00:00<00:00, 1277.10it/s]
100%|██████████| 2/2 [00:00<00:00, 32.30it/s]
INFO:root:Finished training with epochs fit
INFO:root:Evaluation after training: R=-15.082729205825588, J=-14.380392673719802
-16.096224629924393 -15.338865711420437
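
The parameter counts in the network summary above can be reproduced by hand: a fully connected layer with n_in inputs and n_out outputs has n_in * n_out weights plus n_out biases. The helper below is a small illustrative function, not part of ddopai.

```python
def count_mlp_params(input_size, hidden_layers, output_size):
    """Return the number of weights and biases of each fully connected layer."""
    sizes = [input_size, *hidden_layers, output_size]
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

per_layer = count_mlp_params(2, [64, 64], 1)   # MLP with hidden_layers [64, 64]
total = sum(per_layer)
linear_only = count_mlp_params(2, [], 1)       # linear model without hidden layers
print(per_layer, total, linear_only)
```

For `input_shape=(2,)`, `output_shape=(1,)`, and `hidden_layers` of `[64, 64]` this gives 192, 4,160, and 65 parameters, 4,417 in total, and 3 parameters for the linear model of the lERM agent.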

source

BaseMetaAgent

 BaseMetaAgent ()

Initialize self. See help(type(self)) for accurate signature.


source

NewsvendorlERMMetaAgent

 NewsvendorlERMMetaAgent (environment_info:ddopai.utils.MDPInfo,
                          dataloader:ddopai.dataloaders.base.BaseDataLoade
                          r, cu:numpy.ndarray|ddopai.utils.Parameter,
                          co:numpy.ndarray|ddopai.utils.Parameter,
                          input_shape:Tuple, output_shape:Tuple,
                          optimizer_params:dict|None=None,
                          learning_rate_scheduler_params=None,
                          model_params:dict|None=None,
                          dataset_params:dict|None=None,
                          dataloader_params:dict|None=None,
                          obsprocessors:list|None=None, device:str='cpu',
                          agent_name:str|None='lERMMeta',
                          test_batch_size:int=1024,
                          receive_batch_dim:bool=False, loss_function:Lite
                          ral['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a linear (regression) model. In addition to the features, the agent receives the service level (sl) as an input so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be adapted into a full meta-learning algorithm across products and across service levels.

Type Default Details
environment_info MDPInfo Parameters for lERM agent
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params NoneType None TODO: add base class for learning rate scheduler for typing
model_params dict | None None default: {“relu_output”: False}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
obsprocessors list | None None default: []
device str cpu “cuda” or “cpu”
agent_name str | None lERMMeta
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile
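
Feeding the service level to the model can be pictured as appending one extra feature column. The sketch assumes that "sl" denotes the service level cu / (cu + co); the helper `add_service_level_feature` is hypothetical, and how the meta agents actually combine sl with the features is not documented in this section.

```python
import numpy as np

def add_service_level_feature(X, cu, co):
    """Append the service level cu / (cu + co) as an extra feature column."""
    cu = np.broadcast_to(np.asarray(cu, dtype=float), (len(X),))
    co = np.broadcast_to(np.asarray(co, dtype=float), (len(X),))
    sl = cu / (cu + co)
    return np.column_stack([X, sl])

X = np.random.rand(4, 2)
X_low = add_service_level_feature(X, cu=1.0, co=9.0)    # sl = 0.1
X_high = add_service_level_feature(X, cu=9.0, co=1.0)   # sl = 0.9
print(X_low.shape, X_low[0, -1], X_high[0, -1])
```

Because the same features appear with different sl values, a single model can in principle learn order quantities for several cost settings at once.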

source

NewsvendorDLMetaAgent

 NewsvendorDLMetaAgent (environment_info:ddopai.utils.MDPInfo,
                        dataloader:ddopai.dataloaders.base.BaseDataLoader,
                        cu:numpy.ndarray|ddopai.utils.Parameter,
                        co:numpy.ndarray|ddopai.utils.Parameter,
                        input_shape:Tuple, output_shape:Tuple,
                        learning_rate_scheduler_params=None,
                        optimizer_params:dict|None=None,
                        model_params:dict|None=None,
                        dataset_params:dict|None=None,
                        dataloader_params:dict|None=None,
                        device:str='cpu', obsprocessors:list|None=None,
                        agent_name:str|None='DLNV',
                        test_batch_size:int=1024,
                        receive_batch_dim:bool=False, loss_function:Litera
                        l['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a neural network. In addition to the features, the agent receives the service level (sl) as an input so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be adapted into a full meta-learning algorithm across products and across service levels.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
learning_rate_scheduler_params NoneType None TODO: add base class for learning rate scheduler for typing
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params dict | None None default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
device str cpu “cuda” or “cpu”
obsprocessors list | None None default: []
agent_name str | None DLNV
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile

source

NewsvendorDLTransformerAgent

 NewsvendorDLTransformerAgent (environment_info:ddopai.utils.MDPInfo,
                               dataloader:ddopai.dataloaders.base.BaseData
                               Loader,
                               cu:numpy.ndarray|ddopai.utils.Parameter,
                               co:numpy.ndarray|ddopai.utils.Parameter,
                               input_shape:Tuple, output_shape:Tuple, lear
                               ning_rate_scheduler_params:Optional[Dict]=N
                               one, optimizer_params:dict|None=None,
                               model_params:dict|None=None,
                               dataset_params:dict|None=None,
                               dataloader_params:dict|None=None,
                               device:str='cpu',
                               obsprocessors:list|None=None,
                               agent_name:str|None='DLNV',
                               test_batch_size:int=1024,
                               receive_batch_dim:bool=False, loss_function
                               :Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a deep learning model with a Transformer architecture.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
learning_rate_scheduler_params Optional None
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params dict | None None default: {“max_context_length”: 128, “n_layer”: 3, “n_head”: 8, “n_embd_per_head”: 32, “rope_scaling”: None, “min_multiple”: 256, “gating”: True, “drop_prob”: 0.0, “final_activation”: “identity”}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
device str cpu “cuda” or “cpu”
obsprocessors list | None None default: []
agent_name str | None DLNV
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile

source

NewsvendorDLTransformerMetaAgent

 NewsvendorDLTransformerMetaAgent (environment_info:ddopai.utils.MDPInfo,
                                   dataloader:ddopai.dataloaders.base.Base
                                   DataLoader, cu:numpy.ndarray|ddopai.uti
                                   ls.Parameter, co:numpy.ndarray|ddopai.u
                                   tils.Parameter, input_shape:Tuple,
                                   output_shape:Tuple, learning_rate_sched
                                   uler_params:Optional[Dict]=None,
                                   optimizer_params:dict|None=None,
                                   model_params:dict|None=None,
                                   dataset_params:dict|None=None,
                                   dataloader_params:dict|None=None,
                                   device:str='cpu',
                                   obsprocessors:list|None=None,
                                   agent_name:str|None='DLNV',
                                   test_batch_size:int=1024,
                                   receive_batch_dim:bool=False, loss_func
                                   tion:Literal['quantile','pinball']='qua
                                   ntile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a neural network using the attention mechanism. In addition to the features, the agent receives the service level (sl) as an input so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be adapted into a full meta-learning algorithm across products and across service levels.

Type Default Details
environment_info MDPInfo
dataloader BaseDataLoader
cu numpy.ndarray | ddopai.utils.Parameter
co numpy.ndarray | ddopai.utils.Parameter
input_shape Tuple
output_shape Tuple
learning_rate_scheduler_params Optional None
optimizer_params dict | None None default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params dict | None None default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataset_params dict | None None parameters needed to convert the dataloader to a torch dataset
dataloader_params dict | None None default: {“batch_size”: 32, “shuffle”: True}
device str cpu “cuda” or “cpu”
obsprocessors list | None None default: []
agent_name str | None DLNV
test_batch_size int 1024
receive_batch_dim bool False
loss_function Literal quantile