Single period inventory environments

Static inventory environment where a decision only affects the next period (Newsvendor problem)

NewsvendorEnv

 NewsvendorEnv (underage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                overage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                q_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0,
                q_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=inf,
                dataloader:ddopai.dataloaders.base.BaseDataLoader=None,
                num_SKUs:int=None, gamma:float=1,
                horizon_train:int|str='use_all_data',
                postprocessors:list[object]|None=None, mode:str='train',
                return_truncation:str=True)

Class implementing the Newsvendor problem for both the single-item and the multi-item case. If underage_cost and overage_cost are scalars and there are multiple SKUs, the same cost is applied to every SKU. If they are arrays, their length must equal the number of SKUs. num_SKUs can be passed as a parameter or inferred from the DataLoader.

| | Type | Default | Details |
|---|---|---|---|
| underage_cost | Union | 1 | underage cost per unit |
| overage_cost | Union | 1 | overage cost per unit |
| q_bound_low | Union | 0 | lower bound of the order quantity |
| q_bound_high | Union | inf | upper bound of the order quantity |
| dataloader | BaseDataLoader | None | dataloader |
| num_SKUs | int | None | if None it will be inferred from the DataLoader |
| gamma | float | 1 | discount factor |
| horizon_train | int \| str | use_all_data | if “use_all_data” then horizon is inferred from the DataLoader |
| postprocessors | list[object] \| None | None | default is empty list |
| mode | str | train | Initial mode (train, val, test) of the environment |
| return_truncation | str | True | whether to return a truncated condition in step function |
| Returns | None | | |
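
The scalar-versus-array rule for the cost parameters can be illustrated with plain NumPy. This is a minimal sketch of the behaviour described above, not the library's implementation; `expand_cost` is a hypothetical helper name that does not exist in ddopai.

```python
import numpy as np

def expand_cost(cost, num_SKUs):
    """Hypothetical helper: return one cost value per SKU."""
    cost = np.asarray(cost, dtype=float)
    if cost.ndim == 0:
        # A scalar cost is repeated for every SKU
        return np.full(num_SKUs, float(cost))
    if cost.shape != (num_SKUs,):
        # An array must provide exactly one value per SKU
        raise ValueError(f"expected {num_SKUs} cost values, got shape {cost.shape}")
    return cost

underage = expand_cost(1, num_SKUs=2)           # scalar applied to both SKUs
overage = expand_cost([0.5, 0.25], num_SKUs=2)  # one value per SKU
```

Passing an array whose length differs from the number of SKUs raises an error in this sketch, which mirrors the length requirement stated in the class description.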

NewsvendorEnv.step_

 NewsvendorEnv.step_ (action:numpy.ndarray)

Step function implementing the Newsvendor logic. Note that the dataloader will return an observation and a demand, which will be relevant in the next period. The observation will be returned directly, while the demand will be temporarily stored under self.demand and used in the next step.

| | Type | Details |
|---|---|---|
| action | ndarray | order quantity |
| Returns | Tuple | |
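
The timing described above (the demand is stored first and only costed against the action of the following step) can be sketched with a toy class. Everything here is hypothetical and simplified to a single SKU: `ToyNewsvendor` and its attributes are illustration names, not the ddopai implementation, and the return tuple is shorter than the one used by the real environment.

```python
class ToyNewsvendor:
    """Hypothetical toy class mirroring the step timing described above."""

    def __init__(self, demands, underage_cost=1.0, overage_cost=1.0):
        self.demands = list(demands)
        self.cu, self.co = underage_cost, overage_cost
        self.index = 0
        self.demand = None

    def reset(self):
        # Store the first demand; it is only used when the next action arrives
        self.index = 0
        self.demand = self.demands[0]

    def step(self, action):
        # Cost the action against the demand stored by the previous call
        demand = self.demand
        cost = self.cu * max(demand - action, 0.0) + self.co * max(action - demand, 0.0)
        self.index += 1
        truncated = self.index >= len(self.demands)
        if not truncated:
            self.demand = self.demands[self.index]  # kept for the next step
        return -cost, truncated, {"demand": demand}

env = ToyNewsvendor(demands=[3.0, 1.0])
env.reset()
first = env.step(2.0)   # costed against the stored demand 3.0
second = env.step(2.0)  # costed against the stored demand 1.0
```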

NewsvendorEnv.determine_cost

 NewsvendorEnv.determine_cost (action:numpy.ndarray)

Determine the cost per SKU given the action taken. The cost is the sum of underage and overage costs.
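
A minimal NumPy sketch of this cost, assuming the standard newsvendor form (underage cost times unmet demand plus overage cost times leftover stock). The function name `newsvendor_cost` is hypothetical and the code is not taken from the library; the form is, however, consistent with the printed example output further down this page, where the reward equals the negative sum of `cost_per_SKU`.

```python
import numpy as np

def newsvendor_cost(action, demand, underage_cost, overage_cost):
    """Sketch of the per-SKU newsvendor cost (not the library's code)."""
    action = np.asarray(action, dtype=float)
    demand = np.asarray(demand, dtype=float)
    underage = underage_cost * np.maximum(demand - action, 0.0)  # unmet demand
    overage = overage_cost * np.maximum(action - demand, 0.0)    # leftover stock
    return underage + overage

cost_per_SKU = newsvendor_cost(
    action=[0.5, 3.0], demand=[2.0, 1.0], underage_cost=1.0, overage_cost=2.0
)
reward = -cost_per_SKU.sum()  # total cost across SKUs, negated
```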

NewsvendorEnv.update_cu_co

 NewsvendorEnv.update_cu_co (cu=None, co=None)

Example usage of [`NewsvendorEnv`](https://opimwue.github.io/ddopai/20_environments/21_envs_inventory/single_period_envs.html#newsvendorenv) with a distributional dataloader:

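from ddopai.dataloaders.distribution import NormalDistributionDataLoader
# Module path assumed from the package layout; adjust if NewsvendorEnv is exposed elsewhere.
from ddopai.envs.inventory.single_period import NewsvendorEnv

def run_test_loop(env):
    # Take random actions until the environment signals the end of the horizon
    truncated = False
    while not truncated:
        action = env.action_space.sample()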
        obs, reward, terminated, truncated, info = env.step(action)
        print("##### STEP: ", env.index, "#####")
        print("reward:", reward)
        print("info:", info)
        print("next observation:", obs)
        print("truncated:", truncated)

dataloader = NormalDistributionDataLoader(mean=[4, 3], std=[1, 2], num_units=2)

test_env = NewsvendorEnv(underage_cost=1, overage_cost=2, dataloader=dataloader, horizon_train=3)

obs = test_env.reset(start_index=0)
print("##### RESET #####")

run_test_loop(test_env)

##### RESET #####
##### STEP:  1 #####
reward: -5.549075627672828
info: {'demand': array([2.48613144, 4.94828011]), 'action': array([0.2829225, 1.6024134], dtype=float32), 'cost_per_SKU': array([2.20320894, 3.34586669])}
next observation: None
truncated: False
##### STEP:  2 #####
reward: -1.9300547300834316
info: {'demand': array([3.86237064, 1.66660444]), 'action': array([2.0144682, 1.5844522], dtype=float32), 'cost_per_SKU': array([1.84790245, 0.08215228])}
next observation: None
truncated: False
##### STEP:  3 #####
reward: -5.19810850869845
info: {'demand': array([3.45984581, 0.        ]), 'action': array([0.10056694, 0.9194148 ], dtype=float32), 'cost_per_SKU': array([3.35927887, 1.83882964])}
next observation: None
truncated: True

Example usage of [`NewsvendorEnv`](https://opimwue.github.io/ddopai/20_environments/21_envs_inventory/single_period_envs.html#newsvendorenv) using a fixed dataset:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler

from ddopai.dataloaders.tabular import XYDataLoader

# create a simple dataset bounded between 0 and 1.
# We just scale all the data, pretending that it is the demand.
# When using real data, one should only fit the scaler on the training data
X, Y = make_regression(n_samples=8, n_features=2, n_targets=2, noise=0.1, random_state=42)
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
Y = scaler.fit_transform(Y)

dataloader = XYDataLoader(X, Y, val_index_start = 4, test_index_start = 6)
test_env = NewsvendorEnv(underage_cost=np.array([1,1]), overage_cost=np.array([0.5,0.5]), dataloader=dataloader, horizon_train="use_all_data")

obs = test_env.reset(start_index=0)
print("#################### RESET ####################")

print("#################### RUN IN TRAIN MODE ####################")
run_test_loop(test_env)

print("#################### RUN IN VAL MODE ####################")
test_env.val()
run_test_loop(test_env)

print("#################### RUN IN TEST MODE ####################")
test_env.test()
run_test_loop(test_env)

print("#################### RUN IN TRAIN MODE AGAIN ####################")
test_env.train()
run_test_loop(test_env)

#################### RESET ####################
#################### RUN IN TRAIN MODE ####################
##### STEP:  1 #####
reward: -0.5507963668644685
info: {'demand': array([0.41801109, 0.41814421]), 'action': array([0.70588326, 0.01128393], dtype=float32), 'cost_per_SKU': array([0.14393609, 0.40686028])}
next observation: [0.51654708 0.67238019]
truncated: False
##### STEP:  2 #####
reward: -0.8714066300571378
info: {'demand': array([0.61617324, 0.52211535]), 'action': array([0.180223 , 1.3930281], dtype=float32), 'cost_per_SKU': array([0.43595024, 0.43545639])}
next observation: [0.71467365 0.37996181]
truncated: False
##### STEP:  3 #####
reward: -1.6119519129489481
info: {'demand': array([0.45242345, 0.60924132]), 'action': array([1.8277601, 2.4578085], dtype=float32), 'cost_per_SKU': array([0.68766832, 0.92428359])}
next observation: [0.78011439 1.        ]
truncated: True
#################### RUN IN VAL MODE ####################
##### STEP:  1 #####
reward: -0.5815800605970438
info: {'demand': array([0.        , 0.16760013]), 'action': array([0.11117006, 1.2195902 ], dtype=float32), 'cost_per_SKU': array([0.05558503, 0.52599503])}
next observation: [0.         0.59527916]
truncated: False
##### STEP:  2 #####
reward: -0.5828876160320575
info: {'demand': array([0.33549548, 0.        ]), 'action': array([0.4501956, 1.0510751], dtype=float32), 'cost_per_SKU': array([0.05735007, 0.52553755])}
next observation: None
truncated: True
#################### RUN IN TEST MODE ####################
##### STEP:  1 #####
reward: -0.7298214633019249
info: {'demand': array([0.3316407 , 0.33063685]), 'action': array([0.06531169, 1.2576218 ], dtype=float32), 'cost_per_SKU': array([0.266329  , 0.46349246])}
next observation: [1.         0.71807281]
truncated: False
##### STEP:  2 #####
reward: -0.5407586979670338
info: {'demand': array([0.8554925, 1.       ]), 'action': array([0.5619696, 1.4944715], dtype=float32), 'cost_per_SKU': array([0.29352292, 0.24723577])}
next observation: None
truncated: True
#################### RUN IN TRAIN MODE AGAIN ####################
##### STEP:  1 #####
reward: -0.9409223786788338
info: {'demand': array([0.41801109, 0.41814421]), 'action': array([1.3812015, 1.3367985], dtype=float32), 'cost_per_SKU': array([0.48159521, 0.45932717])}
next observation: [0.51654708 0.67238019]
truncated: False
##### STEP:  2 #####
reward: -0.7144824568212446
info: {'demand': array([0.61617324, 0.52211535]), 'action': array([0.07493836, 0.8686105 ], dtype=float32), 'cost_per_SKU': array([0.54123488, 0.17324757])}
next observation: [0.71467365 0.37996181]
truncated: False
##### STEP:  3 #####
reward: -1.2616030231212196
info: {'demand': array([0.45242345, 0.60924132]), 'action': array([0.84109116, 2.7437797 ], dtype=float32), 'cost_per_SKU': array([0.19433385, 1.06726917])}
next observation: [0.78011439 1.        ]
truncated: True

Newsvendor Env that can provide a variable service level

Static inventory environment where a decision only affects the next period (Newsvendor problem), but with a variable service level (random during training, fixed during testing)

NewsvendorEnvVariableSL

 NewsvendorEnvVariableSL (sl_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0.1,
                          sl_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0.9,
                          sl_distribution:Literal['fixed','uniform']='fixed',
                          evaluation_metric:Literal['pinball_loss','quantile_loss']='quantile_loss',
                          sl_test_val:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=None,
                          underage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                          overage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                          q_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0,
                          q_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=inf,
                          dataloader:ddopai.dataloaders.base.BaseDataLoader=None,
                          num_SKUs:int=None, gamma:float=1,
                          horizon_train:int|str='use_all_data',
                          postprocessors:list[object]|None=None,
                          mode:str='train', return_truncation:str=True,
                          SKUs_in_batch_dimension:bool=True)

| | Type | Default | Details |
|---|---|---|---|
| sl_bound_low | Union | 0.1 | lower bound of the service level during training |
| sl_bound_high | Union | 0.9 | upper bound of the service level during training |
| sl_distribution | Literal | fixed | distribution of the random service level during training, if fixed then the service level is fixed to sl_test_val |
| evaluation_metric | Literal | quantile_loss | quantile loss is the generic quantile loss (independent of cost levels) while pinball loss uses the specific under- and overage costs |
| sl_test_val | Union | None | service level during test and validation, alternatively use cu and co |
| underage_cost | Union | 1 | underage cost per unit |
| overage_cost | Union | 1 | overage cost per unit |
| q_bound_low | Union | 0 | lower bound of the order quantity |
| q_bound_high | Union | inf | upper bound of the order quantity |
| dataloader | BaseDataLoader | None | dataloader |
| num_SKUs | int | None | if None it will be inferred from the DataLoader |
| gamma | float | 1 | discount factor |
| horizon_train | int \| str | use_all_data | if “use_all_data” then horizon is inferred from the DataLoader |
| postprocessors | list[object] \| None | None | default is empty list |
| mode | str | train | Initial mode (train, val, test) of the environment |
| return_truncation | str | True | whether to return a truncated condition in step function |
| SKUs_in_batch_dimension | bool | True | whether SKUs in the observation space are in the batch dimension (used for meta-learning) |
| Returns | None | | |
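
The link between a service level and a pair of underage and overage costs, and the difference between the two evaluation metrics, can be sketched as follows. The sketch assumes the standard newsvendor critical ratio sl = cu / (cu + co) and the textbook forms of both losses; the function names are hypothetical, and the exact scaling used by the library is not confirmed here.

```python
import numpy as np

def critical_ratio(cu, co):
    # Standard newsvendor result: the optimal service level is cu / (cu + co)
    return cu / (cu + co)

def quantile_loss(action, demand, sl):
    # Generic quantile loss: depends only on the service level, not on cost levels
    diff = np.asarray(demand, dtype=float) - np.asarray(action, dtype=float)
    return np.maximum(sl * diff, (sl - 1.0) * diff)

def pinball_loss_with_costs(action, demand, cu, co):
    # Same asymmetric shape, but weighted by the specific underage and overage costs
    diff = np.asarray(demand, dtype=float) - np.asarray(action, dtype=float)
    return np.maximum(cu * diff, -co * diff)

sl = critical_ratio(cu=3.0, co=1.0)
q_loss = quantile_loss(action=[2.0], demand=[4.0], sl=sl)
c_loss = pinball_loss_with_costs(action=[2.0], demand=[4.0], cu=3.0, co=1.0)
```

Under these assumptions the two losses differ only by the factor cu + co, so they rank order quantities identically for a given cost pair while the quantile loss stays comparable across different cost levels.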

NewsvendorEnvVariableSL.determine_cost

 NewsvendorEnvVariableSL.determine_cost (action:numpy.ndarray)

Determine the cost per SKU given the action taken. The cost is the sum of underage and overage costs.

| | Type | Details |
|---|---|---|
| action | ndarray | |
| Returns | ndarray | |

NewsvendorEnvVariableSL.set_observation_space

 NewsvendorEnvVariableSL.set_observation_space (shape:tuple,
                                                low:Union[numpy.ndarray,float]=-inf,
                                                high:Union[numpy.ndarray,float]=inf,
                                                samples_dim_included=True)

Set the observation space of the environment. This is a standard function for simple observation spaces; for more complex observation spaces, it should be overridden. Note that the first dimension is assumed to be n_samples, which is not relevant for the observation space.

| | Type | Default | Details |
|---|---|---|---|
| shape | tuple | | shape of the dataloader features |
| low | Union | -inf | lower bound of the observation space |
| high | Union | inf | upper bound of the observation space |
| samples_dim_included | bool | True | whether the first dimension of the shape input is the number of samples |
| Returns | None | | |

NewsvendorEnvVariableSL.draw_parameter

 NewsvendorEnvVariableSL.draw_parameter (distribution, sl_bound_low,
                                         sl_bound_high, samples)

| | Details |
|---|---|
| distribution | |
| sl_bound_low | |
| sl_bound_high | |
| samples | |
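
The method is documented without details, so the following is only a sketch of the idea stated for this environment (random service levels during training, a fixed one otherwise). `draw_service_level` and its `sl_fixed` and `rng` arguments are assumptions made for illustration and are not part of the ddopai API.

```python
import numpy as np

def draw_service_level(distribution, sl_bound_low, sl_bound_high, samples,
                       sl_fixed=0.5, rng=None):
    """Hypothetical stand-in for drawing one service level per sample."""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "uniform":
        # Random service levels between the two bounds
        return rng.uniform(sl_bound_low, sl_bound_high, size=samples)
    if distribution == "fixed":
        # The same service level for every sample
        return np.full(samples, sl_fixed)
    raise ValueError(f"unknown distribution: {distribution!r}")

train_sl = draw_service_level("uniform", 0.1, 0.9, samples=5,
                              rng=np.random.default_rng(0))
eval_sl = draw_service_level("fixed", 0.1, 0.9, samples=5, sl_fixed=0.8)
```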

NewsvendorEnvVariableSL.get_observation

 NewsvendorEnvVariableSL.get_observation ()

Return the current observation. This function covers the simple case where the observation is only an x,y pair. For more complex observations, it should be overridden.

NewsvendorEnvVariableSL.check_evaluation_metric

 NewsvendorEnvVariableSL.check_evaluation_metric ()

NewsvendorEnvVariableSL.check_sl_distribution

 NewsvendorEnvVariableSL.check_sl_distribution ()

NewsvendorEnvVariableSL.set_val_test_sl

 NewsvendorEnvVariableSL.set_val_test_sl (sl_test_val)

| | Details |
|---|---|
| sl_test_val | |