ERM agents

Newsvendor agents based on Empirical Risk Minimization (ERM) principles.

NewsvendorXGBAgent

 NewsvendorXGBAgent (environment_info:ddopai.utils.MDPInfo,
                     cu:float|numpy.ndarray, co:float|numpy.ndarray,
                     obsprocessors:Optional[List[object]]=None,
                     agent_name:str|None='XGBAgent', eta:float=0.3,
                     gamma:float=0, max_depth:int=6,
                     min_child_weight:float=1, max_delta_step:float=0,
                     subsample:float=1, sampling_method:str='uniform',
                     colsample_bytree:float=1, colsample_bylevel:float=1,
                     colsample_bynode:float=1, lambda_:float=1,
                     alpha:float=0, tree_method:str='auto',
                     scale_pos_weight:float=1, refresh_leaf:int=1,
                     grow_policy:str='depthwise', max_leaves:int=0,
                     max_bin:int=256, num_parallel_tree:int=1,
                     multi_strategy:str='one_output_per_tree',
                     max_cached_hist_node:int=65536, nthread:int=1,
                     device:str='CPU')

Agent solving the Newsvendor problem within the ERM framework (i.e., using quantile regression) using the XGBoost library.

	Type	Default	Details
environment_info	MDPInfo
cu	float \| numpy.ndarray		underage cost
co	float \| numpy.ndarray		overage cost
obsprocessors	Optional	None
agent_name	str \| None	XGBAgent
eta	float	0.3	XGBoost parameters
gamma	float	0
max_depth	int	6
min_child_weight	float	1
max_delta_step	float	0
subsample	float	1
sampling_method	str	uniform
colsample_bytree	float	1
colsample_bylevel	float	1
colsample_bynode	float	1
lambda_	float	1
alpha	float	0
tree_method	str	auto
scale_pos_weight	float	1
refresh_leaf	int	1	updater will always use default
grow_policy	str	depthwise	process type will always use default
max_leaves	int	0
max_bin	int	256
num_parallel_tree	int	1
multi_strategy	str	one_output_per_tree
max_cached_hist_node	int	65536
nthread	int	1	General parameters
device	str	CPU
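
The link between ERM and quantile regression comes from the newsvendor cost itself: with underage cost cu and overage cost co, the cost-minimizing order is the cu / (cu + co) quantile of demand. The following is a minimal NumPy sketch of that fact on synthetic data, without features and without XGBoost. The helper `newsvendor_cost` and the demand sample are assumptions made for illustration and are not part of ddopai.

```python
import numpy as np

def newsvendor_cost(order, demand, cu, co):
    """Average underage plus overage cost of a fixed order quantity."""
    underage = np.maximum(demand - order, 0.0)
    overage = np.maximum(order - demand, 0.0)
    return float(np.mean(cu * underage + co * overage))

rng = np.random.default_rng(0)
demand = rng.gamma(shape=2.0, scale=10.0, size=5_000)  # synthetic demand sample
cu, co = 0.42857, 1.0

# Critical ratio: the quantile level targeted by quantile regression.
critical_ratio = cu / (cu + co)

# Empirical quantile of the sample at that level.
q_star = float(np.quantile(demand, critical_ratio))

# Compare against a grid of alternative order quantities.
grid = np.linspace(demand.min(), demand.max(), 2_001)
costs = np.array([newsvendor_cost(q, demand, cu, co) for q in grid])
best_on_grid = float(grid[np.argmin(costs)])

print(f"critical ratio: {critical_ratio:.3f}")
print(f"empirical quantile: {q_star:.2f}, best grid order: {best_on_grid:.2f}")
```

A feature-based agent replaces the single empirical quantile with a model that predicts this quantile conditional on the observed features.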

source

SGDBaseAgent

 SGDBaseAgent (environment_info:ddopai.utils.MDPInfo,
               dataloader:ddopai.dataloaders.base.BaseDataLoader,
               input_shape:Tuple, output_shape:Tuple,
               dataset_params:Optional[dict]=None,
               dataloader_params:Optional[dict]=None,
               optimizer_params:Optional[dict]=None,
               learning_rate_scheduler_params:Optional[Dict]=None,
               obsprocessors:Optional[List]=None, device:str='cpu',
               agent_name:str|None=None, test_batch_size:int=1024,
               receive_batch_dim:bool=False)

Base class for Agents that are trained using Stochastic Gradient Descent (SGD) on PyTorch models.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
input_shape	Tuple
output_shape	Tuple
dataset_params	Optional	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	Optional	None	default: {“batch_size”: 32, “shuffle”: True}
optimizer_params	Optional	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params	Optional	None	default: None. If dict, then first key is “scheduler” and the rest are the parameters
obsprocessors	Optional	None	default: []
device	str	cpu	“cuda” or “cpu”
agent_name	str \| None	None
test_batch_size	int	1024
receive_batch_dim	bool	False

Important notes:

SGD-based agents are all agents that are trained via SGD such as Linear Models or Neural Networks. Some specific requirements are necessary to make them interface properly with the environment.

Torch pre-processors:

In addition to the general NumPy-based pre-processors, we also provide pre-processors that operate on tensors within the fit_epoch and predict methods. They can be used in addition to, or instead of, the NumPy-based pre-processors. It is important to ensure that observations have the same shape after pre-processing, whether they come from the environment or from the dataloader during training.

Dataloader:

As in standard supervised learning with Torch, we use the Torch dataloader to load the data. Instead of defining a custom dataset class, we provide a wrapper that can be placed around our dataloader so that its output and interface match those of a Torch dataset. The dataloader is initialized when the agent is created, so that the agent has access to the same dataloader as the environment.
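
The wrapping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the ddopai wrapper: `ToyXYLoader` is a hypothetical stand-in for a ddopai dataloader that returns a (features, demand) pair per index, and `TorchDatasetWrapper` is a made-up name.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class ToyXYLoader:
    """Hypothetical stand-in for a ddopai dataloader: index -> (features, demand)."""
    def __init__(self, X, Y):
        self.X, self.Y = X, Y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.Y[idx]

class TorchDatasetWrapper(Dataset):
    """Expose the wrapped loader through the torch Dataset interface."""
    def __init__(self, loader):
        self.loader = loader

    def __len__(self):
        return len(self.loader)

    def __getitem__(self, idx):
        x, y = self.loader[idx]
        # Convert the NumPy arrays to float32 tensors for the model.
        return (torch.as_tensor(x, dtype=torch.float32),
                torch.as_tensor(y, dtype=torch.float32))

rng = np.random.default_rng(0)
toy_loader = ToyXYLoader(rng.random((100, 2)), rng.random((100, 1)))

# The keyword arguments correspond to the dataloader_params dict.
torch_loader = DataLoader(TorchDatasetWrapper(toy_loader), batch_size=32, shuffle=True)

X_batch, y_batch = next(iter(torch_loader))
print(X_batch.shape, y_batch.shape)
```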

Training process:

The outer loop of the training process (epochs) is handled outside the agent by the [`run_experiment`](https://opimwue.github.io/ddopai/40_experiments/experiment_functions.html#run_experiment) function (or by a custom loop). The agent needs a fit_epoch method that defines what happens within an epoch. This includes:
- Getting the data from the dataloader
- Pre-processing the data
- Forward pass
- Loss calculation
- Backward pass

source

SGDBaseAgent.set_dataloader

 SGDBaseAgent.set_dataloader
                              (dataloader:ddopai.dataloaders.base.BaseDataLoader,
                              dataset_params:dict, dataloader_params:dict)

Set the dataloader for the agent by wrapping it into a Torch Dataset

	Type	Details
dataloader	BaseDataLoader
dataset_params	dict
dataloader_params	dict	dict with keys: batch_size, shuffle
Returns	None

source

SGDBaseAgent.set_loss_function

 SGDBaseAgent.set_loss_function ()

Set loss function for the model

source

SGDBaseAgent.set_model

 SGDBaseAgent.set_model (input_shape:Tuple, output_shape:Tuple)

Set the model for the agent

source

SGDBaseAgent.set_optimizer

 SGDBaseAgent.set_optimizer (optimizer_params:dict)

Set the optimizer for the model

	Type	Details
optimizer_params	dict	dict with keys: optimizer, lr, weight_decay

source

SGDBaseAgent.set_learning_rate_scheduler

 SGDBaseAgent.set_learning_rate_scheduler (learning_rate_scheduler_params)

Set the learning rate scheduler (can be None)

	Details
learning_rate_scheduler_params
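
The dict conventions for the optimizer and the scheduler (one key naming the class, the remaining keys passed as parameters) can be sketched as below. This is a sketch under assumptions: the helpers `build_optimizer` and `build_scheduler` are hypothetical, and resolving names via `getattr` is one possible design, not necessarily how ddopai does it.

```python
import torch
from torch import nn

def build_optimizer(model, optimizer_params):
    """Look up the optimizer class by name; remaining keys become keyword arguments."""
    params = dict(optimizer_params)             # copy so the caller's dict is untouched
    name = params.pop("optimizer")
    optimizer_cls = getattr(torch.optim, name)  # e.g. Adam, SGD, RMSprop
    return optimizer_cls(model.parameters(), **params)

def build_scheduler(optimizer, scheduler_params):
    """Return None if no scheduler is configured, else build it by name."""
    if scheduler_params is None:
        return None
    params = dict(scheduler_params)
    name = params.pop("scheduler")
    scheduler_cls = getattr(torch.optim.lr_scheduler, name)
    return scheduler_cls(optimizer, **params)

model = nn.Linear(2, 1)
optimizer = build_optimizer(model, {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0})
scheduler = build_scheduler(optimizer, {"scheduler": "StepLR", "step_size": 1, "gamma": 0.5})

# One training step followed by one scheduler step.
model(torch.rand(4, 2)).sum().backward()
optimizer.step()
scheduler.step()
print(optimizer.param_groups[0]["lr"])
```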

source

SGDBaseAgent.fit_epoch

 SGDBaseAgent.fit_epoch ()

Fit the model for one epoch using the dataloader

source

SGDBaseAgent.draw_action_

 SGDBaseAgent.draw_action_ (observation:numpy.ndarray)

Draw an action based on the fitted model (see predict method)

	Type	Details
observation	ndarray
Returns	ndarray

source

SGDBaseAgent.predict

 SGDBaseAgent.predict (X:numpy.ndarray)

Do one forward pass of the model and return the prediction

	Type	Details
X	ndarray
Returns	ndarray

source

SGDBaseAgent.train

 SGDBaseAgent.train ()

Set the internal state of the agent and its model to train

source

SGDBaseAgent.eval

 SGDBaseAgent.eval ()

Set the internal state of the agent and its model to eval

source

SGDBaseAgent.to

 SGDBaseAgent.to (device:str)

Move the model to the specified device

	Type	Details
device	str

source

SGDBaseAgent.save

 SGDBaseAgent.save (path:str, overwrite:bool=True)

Save the PyTorch model to a file in the specified directory.

	Type	Default	Details
path	str		The directory where the file will be saved.
overwrite	bool	True	Allow overwriting; if False, a FileExistsError will be raised if the file exists.

source

SGDBaseAgent.load

 SGDBaseAgent.load (path:str)

Load the PyTorch model from a file.

	Type	Details
path	str	Only the path to the folder is needed, not the file itself
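
The folder-based save and load behaviour described above, including the overwrite check, can be sketched like this. The file name `model.pth`, the helper names, and the choice to store only the state dict are assumptions for illustration. The actual ddopai file layout may differ.

```python
import os
import tempfile

import torch
from torch import nn

MODEL_FILE = "model.pth"  # hypothetical file name, not taken from ddopai

def save_model(model, path, overwrite=True):
    """Save the model weights into the directory `path`."""
    os.makedirs(path, exist_ok=True)
    full_path = os.path.join(path, MODEL_FILE)
    if os.path.exists(full_path) and not overwrite:
        raise FileExistsError(f"{full_path} already exists")
    torch.save(model.state_dict(), full_path)

def load_model(model, path):
    """Load weights from the directory `path` into an already constructed model."""
    full_path = os.path.join(path, MODEL_FILE)
    model.load_state_dict(torch.load(full_path, map_location="cpu"))

with tempfile.TemporaryDirectory() as folder:
    source, target = nn.Linear(2, 1), nn.Linear(2, 1)
    save_model(source, folder)
    load_model(target, folder)            # only the folder is passed, not the file
    weights_match = torch.equal(source.weight, target.weight)

    refused = False
    try:
        save_model(source, folder, overwrite=False)
    except FileExistsError:
        refused = True

print(weights_match, refused)
```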

source

NVBaseAgent

 NVBaseAgent (environment_info:ddopai.utils.MDPInfo,
              dataloader:ddopai.dataloaders.base.BaseDataLoader,
              cu:numpy.ndarray|ddopai.utils.Parameter,
              co:numpy.ndarray|ddopai.utils.Parameter, input_shape:Tuple,
              output_shape:Tuple, optimizer_params:dict|None=None,
              learning_rate_scheduler_params=None,
              dataset_params:dict|None=None,
              dataloader_params:dict|None=None,
              obsprocessors:list|None=None, device:str='cpu',
              agent_name:str|None=None, test_batch_size:int=1024,
              receive_batch_dim:bool=False,
              loss_function:Literal['quantile','pinball']='quantile')

Base agent for the Newsvendor problem implementing the loss function for the Empirical Risk Minimization (ERM) approach based on quantile loss.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params	NoneType	None	TODO: add base class for learning rate scheduler for typing
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
obsprocessors	list \| None	None	default: []
device	str	cpu	“cuda” or “cpu”
agent_name	str \| None	None
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile

source

NVBaseAgent.set_loss_function

 NVBaseAgent.set_loss_function ()

Set the loss function for the model to the quantile loss. For training, the model uses the quantile loss rather than the pinball loss with specific cu and co values, so that the feedback signal has a similar scale during training regardless of the cost parameters.
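
The scale argument can be illustrated numerically. The sketch below assumes that "quantile loss" means the loss at level tau = cu / (cu + co) and that "pinball loss" means underage and overage weighted directly by cu and co. Under that reading, the two losses share the same minimizer and differ only by the factor cu + co. The function names are illustrative and are not the ddopai implementation.

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Quantile loss at level tau; independent of the absolute size of cu and co."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def cost_weighted_pinball_loss(y_true, y_pred, cu, co):
    """Underage and overage weighted directly by cu and co."""
    diff = y_true - y_pred
    return float(np.mean(cu * np.maximum(diff, 0.0) + co * np.maximum(-diff, 0.0)))

rng = np.random.default_rng(1)
y_true = rng.uniform(0.0, 100.0, size=1_000)
y_pred = rng.uniform(0.0, 100.0, size=1_000)

results = []
for cu, co in [(0.3, 0.7), (30.0, 70.0)]:
    tau = cu / (cu + co)
    ql = quantile_loss(y_true, y_pred, tau)
    pl = cost_weighted_pinball_loss(y_true, y_pred, cu, co)
    results.append((ql, pl, cu + co))
    # Same quantile level in both cases, but the cost-weighted loss grows with cu + co.
    print(f"cu={cu}, co={co}: quantile={ql:.4f}, cost-weighted={pl:.4f}")
```

Both cost settings give the same quantile level of 0.3, so the quantile loss is identical across them while the cost-weighted loss scales with the magnitude of the costs.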

source

NewsvendorlERMAgent

 NewsvendorlERMAgent (environment_info:ddopai.utils.MDPInfo,
                      dataloader:ddopai.dataloaders.base.BaseDataLoader,
                      cu:numpy.ndarray|ddopai.utils.Parameter,
                      co:numpy.ndarray|ddopai.utils.Parameter,
                      input_shape:Tuple, output_shape:Tuple,
                      optimizer_params:dict|None=None,
                      learning_rate_scheduler_params=None,
                      model_params:dict|None=None,
                      dataset_params:dict|None=None,
                      dataloader_params:dict|None=None,
                      obsprocessors:list|None=None, device:str='cpu',
                      agent_name:str|None='lERM',
                      test_batch_size:int=1024,
                      receive_batch_dim:bool=False, loss_function:Literal[
                      'quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a linear (regression) model. Note that this implementation finds the optimal regression parameters via SGD.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params	NoneType	None	TODO: add base class for learning rate scheduler for typing
model_params	dict \| None	None	default: {“relu_output”: False}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
obsprocessors	list \| None	None	default: []
device	str	cpu	“cuda” or “cpu”
agent_name	str \| None	lERM
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile

Further information:

References
----------

.. [1] Gah-Yi Ban, Cynthia Rudin, "The Big Data Newsvendor: Practical Insights
    from Machine Learning", 2018.

source

NewsvendorlERMAgent.set_model

 NewsvendorlERMAgent.set_model (input_shape, output_shape)

Set the model for the agent to a linear model

Example usage:

import numpy as np

from ddopai.envs.inventory.single_period import NewsvendorEnv
from ddopai.dataloaders.tabular import XYDataLoader
from ddopai.experiments.experiment_functions import run_experiment, test_agent

val_index_start = 800 #90_000
test_index_start = 900 #100_000

X = np.random.rand(1000, 2)
Y = np.random.rand(1000, 1)

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

agent = NewsvendorlERMAgent(environment.mdp_info,
                            dataloader,
                            cu=np.array([0.42857]),
                            co=np.array([1.0]),
                            input_shape=(2,),
                            output_shape=(1,),
                            optimizer_params= {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0}, # other optimizers: "SGD", "RMSprop"
                            learning_rate_scheduler_params = None, # TODO add base class for learning rate scheduler for typing
                            model_params = {"relu_output": False}, #
                            dataloader_params={"batch_size": 32, "shuffle": True},
                            device = "cpu", # "cuda" or "cpu"
)

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

run_experiment(agent, environment, 2, run_id = "test") # fit agent via run_experiment function

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

input shape (2,)

INFO:root:Network architecture:
/Users/magnus/miniforge3/envs/inventory_gym_2/lib/python3.11/site-packages/torchinfo/torchinfo.py:462: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  action_fn=lambda data: sys.getsizeof(data.storage()),

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
LinearModel                              [1, 1]                    --
├─Linear: 1-1                            [1, 1]                    3
├─Identity: 1-2                          [1, 1]                    --
==========================================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================

INFO:root:Starting experiment
INFO:root:Initial evaluation: R=-29.736253318797445, J=-28.287550833928687
INFO:root:Starting training with epochs fit

-23.17678889235405 -22.124720267178684
Experiment directory: results/test

100%|██████████| 25/25 [00:00<00:00, 903.73it/s]
100%|██████████| 25/25 [00:00<00:00, 1999.34it/s]
100%|██████████| 2/2 [00:00<00:00, 35.22it/s]
INFO:root:Finished training with epochs fit
INFO:root:Evaluation after training: R=-15.499745268755348, J=-14.77032101771835

-16.54230338871762 -15.75806274718322

source

NewsvendorDLAgent

 NewsvendorDLAgent (environment_info:ddopai.utils.MDPInfo,
                    dataloader:ddopai.dataloaders.base.BaseDataLoader,
                    cu:numpy.ndarray|ddopai.utils.Parameter,
                    co:numpy.ndarray|ddopai.utils.Parameter,
                    input_shape:Tuple, output_shape:Tuple,
                    learning_rate_scheduler_params:Optional[Dict]=None,
                    optimizer_params:dict|None=None,
                    model_params:dict|None=None,
                    dataloader_params:dict|None=None,
                    dataset_params:dict|None=None, device:str='cpu',
                    obsprocessors:list|None=None,
                    agent_name:str|None='DLNV', test_batch_size:int=1024,
                    receive_batch_dim:bool=False,
                    loss_function:Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a deep learning model.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
learning_rate_scheduler_params	Optional	None
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params	dict \| None	None	default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
device	str	cpu	“cuda” or “cpu”
obsprocessors	list \| None	None	default: []
agent_name	str \| None	DLNV
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile

Further information:

References
----------

.. [1] Afshin Oroojlooyjadid, Lawrence V. Snyder, Martin Takáč,
        "Applying Deep Learning to the Newsvendor Problem", 2018.

source

NewsvendorDLAgent.set_model

 NewsvendorDLAgent.set_model (input_shape, output_shape)

Set the model for the agent to an MLP

Example usage:

dataloader = XYDataLoader(X, Y, val_index_start, test_index_start)

environment = NewsvendorEnv(
    dataloader = dataloader,
    underage_cost = 0.42857,
    overage_cost = 1.0,
    gamma = 0.999,
    horizon_train = 365,
)

model_params = {
    "hidden_layers": [64, 64],
}

agent = NewsvendorDLAgent(environment.mdp_info,
                            dataloader,
                            cu=np.array([0.42857]),
                            co=np.array([1.0]),
                            input_shape=(2,),
                            output_shape=(1,),
                            optimizer_params= {"optimizer": "Adam", "lr": 0.01, "weight_decay": 0.0}, # other optimizers: "SGD", "RMSprop"
                            learning_rate_scheduler_params = None, # TODO add base class for learning rate scheduler for typing
                            model_params = model_params, #
                            dataloader_params={"batch_size": 32, "shuffle": True},
                            device = "cpu" # "cuda" or "cpu"
)

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

run_experiment(agent, environment, 2, run_id = "test") # fit agent via run_experiment function

environment.test()
agent.eval()

R, J = test_agent(agent, environment)

print(R, J)

INFO:root:Network architecture:

input shape (2,)
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
MLP                                      [1, 1]                    --
├─Sequential: 1-1                        [1, 1]                    --
│    └─Linear: 2-1                       [1, 64]                   192
│    └─ReLU: 2-2                         [1, 64]                   --
│    └─Dropout: 2-3                      [1, 64]                   --
│    └─Linear: 2-4                       [1, 64]                   4,160
│    └─ReLU: 2-5                         [1, 64]                   --
│    └─Dropout: 2-6                      [1, 64]                   --
│    └─Linear: 2-7                       [1, 1]                    65
│    └─Identity: 2-8                     [1, 1]                    --
==========================================================================================
Total params: 4,417
Trainable params: 4,417
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.02
Estimated Total Size (MB): 0.02
==========================================================================================

INFO:root:Starting experiment
INFO:root:Initial evaluation: R=-20.030297947350757, J=-19.11491558256756
INFO:root:Starting training with epochs fit

-22.66337395888819 -21.548795898866043
Experiment directory: results/test

100%|██████████| 25/25 [00:00<00:00, 1212.35it/s]
100%|██████████| 25/25 [00:00<00:00, 1277.10it/s]
100%|██████████| 2/2 [00:00<00:00, 32.30it/s]
INFO:root:Finished training with epochs fit
INFO:root:Evaluation after training: R=-15.082729205825588, J=-14.380392673719802

-16.096224629924393 -15.338865711420437
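
The layer summary printed above (Linear, ReLU, Dropout blocks followed by an output layer, 4,417 parameters in total) can be reproduced with a small PyTorch sketch. The builder `build_mlp` is hypothetical, and the placement of the optional batch norm layer is an assumption. Only the layer order with the default settings is taken from the summary.

```python
import torch
from torch import nn

def build_mlp(input_size, output_size, hidden_layers=(64, 64),
              drop_prob=0.0, batch_norm=False, relu_output=False):
    """Stack Linear -> (BatchNorm) -> ReLU -> Dropout blocks, then an output layer."""
    layers, last = [], input_size
    for width in hidden_layers:
        layers.append(nn.Linear(last, width))
        if batch_norm:                         # placement is an assumption
            layers.append(nn.BatchNorm1d(width))
        layers.append(nn.ReLU())
        layers.append(nn.Dropout(drop_prob))
        last = width
    layers.append(nn.Linear(last, output_size))
    layers.append(nn.ReLU() if relu_output else nn.Identity())
    return nn.Sequential(*layers)

model = build_mlp(input_size=2, output_size=1)
n_params = sum(p.numel() for p in model.parameters())
out_shape = tuple(model(torch.rand(1, 2)).shape)

print(n_params)    # 192 + 4,160 + 65 parameters
print(out_shape)   # one order quantity per observation
```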

source

BaseMetaAgent

 BaseMetaAgent ()

Initialize self. See help(type(self)) for accurate signature.

source

NewsvendorlERMMetaAgent

 NewsvendorlERMMetaAgent (environment_info:ddopai.utils.MDPInfo,
                          dataloader:ddopai.dataloaders.base.BaseDataLoader,
                          cu:numpy.ndarray|ddopai.utils.Parameter,
                          co:numpy.ndarray|ddopai.utils.Parameter,
                          input_shape:Tuple, output_shape:Tuple,
                          optimizer_params:dict|None=None,
                          learning_rate_scheduler_params=None,
                          model_params:dict|None=None,
                          dataset_params:dict|None=None,
                          dataloader_params:dict|None=None,
                          obsprocessors:list|None=None, device:str='cpu',
                          agent_name:str|None='lERMMeta',
                          test_batch_size:int=1024,
                          receive_batch_dim:bool=False,
                          loss_function:Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a linear (regression) model. In addition to the features, the agent receives the service level (sl) as input, so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be extended into a full meta-learning algorithm across products and service levels.

	Type	Default	Details
environment_info	MDPInfo		Parameters for lERM agent
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
learning_rate_scheduler_params	NoneType	None	TODO: add base class for learning rate scheduler for typing
model_params	dict \| None	None	default: {“relu_output”: False}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
obsprocessors	list \| None	None	default: []
device	str	cpu	“cuda” or “cpu”
agent_name	str \| None	lERMMeta
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile
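
How the sl input might enter the model can be sketched with NumPy. This assumes that sl is the service level cu / (cu + co) and that it is simply appended as one extra feature column. The helper `append_service_level` is hypothetical, and the actual input layout in ddopai may differ.

```python
import numpy as np

def append_service_level(features, cu, co):
    """Append sl = cu / (cu + co) as one extra column to a batch of features."""
    n = features.shape[0]
    sl = np.asarray(cu, dtype=float) / (np.asarray(cu, dtype=float) + co)
    # Accept a scalar sl or one sl value per row.
    sl_column = np.reshape(np.broadcast_to(sl, (n,)), (n, 1))
    return np.concatenate([features, sl_column], axis=1)

rng = np.random.default_rng(0)
X = rng.random((4, 2))

# Same features, two different cost settings, hence two different sl inputs.
X_low = append_service_level(X, cu=0.42857, co=1.0)
X_high = append_service_level(X, cu=9.0, co=1.0)

print(X_low.shape)                   # the model input grows from 2 to 3 columns
print(X_low[0, -1], X_high[0, -1])
```

Under this layout, a single trained model can be queried for several service levels by changing only the last input column.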

source

NewsvendorDLMetaAgent

 NewsvendorDLMetaAgent (environment_info:ddopai.utils.MDPInfo,
                        dataloader:ddopai.dataloaders.base.BaseDataLoader,
                        cu:numpy.ndarray|ddopai.utils.Parameter,
                        co:numpy.ndarray|ddopai.utils.Parameter,
                        input_shape:Tuple, output_shape:Tuple,
                        learning_rate_scheduler_params=None,
                        optimizer_params:dict|None=None,
                        model_params:dict|None=None,
                        dataset_params:dict|None=None,
                        dataloader_params:dict|None=None,
                        device:str='cpu', obsprocessors:list|None=None,
                        agent_name:str|None='DLNV',
                        test_batch_size:int=1024,
                        receive_batch_dim:bool=False,
                        loss_function:Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a neural network. In addition to the features, the agent receives the service level (sl) as input, so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be extended into a full meta-learning algorithm across products and service levels.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
learning_rate_scheduler_params	NoneType	None	TODO: add base class for learning rate scheduler for typing
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params	dict \| None	None	default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
device	str	cpu	“cuda” or “cpu”
obsprocessors	list \| None	None	default: []
agent_name	str \| None	DLNV
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile

source

NewsvendorDLTransformerAgent

 NewsvendorDLTransformerAgent (environment_info:ddopai.utils.MDPInfo,
                               dataloader:ddopai.dataloaders.base.BaseDataLoader,
                               cu:numpy.ndarray|ddopai.utils.Parameter,
                               co:numpy.ndarray|ddopai.utils.Parameter,
                               input_shape:Tuple, output_shape:Tuple,
                               learning_rate_scheduler_params:Optional[Dict]=None,
                               optimizer_params:dict|None=None,
                               model_params:dict|None=None,
                               dataset_params:dict|None=None,
                               dataloader_params:dict|None=None,
                               device:str='cpu',
                               obsprocessors:list|None=None,
                               agent_name:str|None='DLNV',
                               test_batch_size:int=1024,
                               receive_batch_dim:bool=False,
                               loss_function:Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a deep learning model with a Transformer architecture.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
learning_rate_scheduler_params	Optional	None
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params	dict \| None	None	default: {“max_context_length”: 128, “n_layer”: 3, “n_head”: 8, “n_embd_per_head”: 32, “rope_scaling”: None, “min_multiple”: 256, “gating”: True, “drop_prob”: 0.0, “final_activation”: “identity”}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
device	str	cpu	“cuda” or “cpu”
obsprocessors	list \| None	None	default: []
agent_name	str \| None	DLNV
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile

source

NewsvendorDLTransformerMetaAgent

 NewsvendorDLTransformerMetaAgent (environment_info:ddopai.utils.MDPInfo,
                                   dataloader:ddopai.dataloaders.base.BaseDataLoader,
                                   cu:numpy.ndarray|ddopai.utils.Parameter,
                                   co:numpy.ndarray|ddopai.utils.Parameter,
                                   input_shape:Tuple, output_shape:Tuple,
                                   learning_rate_scheduler_params:Optional[Dict]=None,
                                   optimizer_params:dict|None=None,
                                   model_params:dict|None=None,
                                   dataset_params:dict|None=None,
                                   dataloader_params:dict|None=None,
                                   device:str='cpu',
                                   obsprocessors:list|None=None,
                                   agent_name:str|None='DLNV',
                                   test_batch_size:int=1024,
                                   receive_batch_dim:bool=False,
                                   loss_function:Literal['quantile','pinball']='quantile')

Newsvendor agent implementing the Empirical Risk Minimization (ERM) approach based on a neural network using the attention mechanism. In addition to the features, the agent receives the service level (sl) as input, so that it can forecast the optimal order quantity for different sl values. Depending on the training pipeline, this model can be extended into a full meta-learning algorithm across products and service levels.

	Type	Default	Details
environment_info	MDPInfo
dataloader	BaseDataLoader
cu	numpy.ndarray \| ddopai.utils.Parameter
co	numpy.ndarray \| ddopai.utils.Parameter
input_shape	Tuple
output_shape	Tuple
learning_rate_scheduler_params	Optional	None
optimizer_params	dict \| None	None	default: {“optimizer”: “Adam”, “lr”: 0.01, “weight_decay”: 0.0}
model_params	dict \| None	None	default: {“hidden_layers”: [64, 64], “drop_prob”: 0.0, “batch_norm”: False, “relu_output”: False}
dataset_params	dict \| None	None	parameters needed to convert the dataloader to a torch dataset
dataloader_params	dict \| None	None	default: {“batch_size”: 32, “shuffle”: True}
device	str	cpu	“cuda” or “cpu”
obsprocessors	list \| None	None	default: []
agent_name	str \| None	DLNV
test_batch_size	int	1024
receive_batch_dim	bool	False
loss_function	Literal	quantile