Single period inventory environments

Static inventory environment where a decision only affects the next period (Newsvendor problem)

NewsvendorEnv

 NewsvendorEnv (underage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                overage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                q_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0,
                q_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=inf,
                dataloader:ddopai.dataloaders.base.BaseDataLoader=None,
                num_SKUs:int=None, gamma:float=1,
                horizon_train:int|str='use_all_data',
                postprocessors:list[object]|None=None, mode:str='train',
                return_truncation:str=True)

Class implementing the Newsvendor problem for the single- and multi-item case. If underage_cost and overage_cost are scalars and there are multiple SKUs, the same cost is used for all SKUs. If they are arrays, their length must equal the number of SKUs. num_SKUs can be set as a parameter or inferred from the DataLoader.

| | Type | Default | Details |
|---|---|---|---|
| underage_cost | Union | 1 | underage cost per unit |
| overage_cost | Union | 1 | overage cost per unit |
| q_bound_low | Union | 0 | lower bound of the order quantity |
| q_bound_high | Union | inf | upper bound of the order quantity |
| dataloader | BaseDataLoader | None | dataloader |
| num_SKUs | int | None | if None it will be inferred from the DataLoader |
| gamma | float | 1 | discount factor |
| horizon_train | int \| str | use_all_data | if "use_all_data" then horizon is inferred from the DataLoader |
| postprocessors | list[object] \| None | None | default is empty list |
| mode | str | train | Initial mode (train, val, test) of the environment |
| return_truncation | str | True | whether to return a truncated condition in step function |
| Returns | None | | |
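The scalar-versus-array rule for underage_cost and overage_cost can be illustrated with a small helper. This is only a sketch of the rule as stated in the class description; `expand_cost` is a hypothetical name and not part of ddopai, and the library's own handling (for example via `ddopai.utils.Parameter`) may differ.

```python
import numpy as np

def expand_cost(cost, num_SKUs):
    """Hypothetical helper: return one cost per SKU from a scalar or array input."""
    cost = np.asarray(cost, dtype=float)
    if cost.ndim == 0:
        # Scalar input: the same cost is used for every SKU.
        return np.full(num_SKUs, float(cost))
    if cost.shape != (num_SKUs,):
        # Array input: the length must equal the number of SKUs.
        raise ValueError(f"expected {num_SKUs} costs, got shape {cost.shape}")
    return cost

print(expand_cost(1, 2))           # scalar applied to both SKUs
print(expand_cost([1.0, 0.5], 2))  # one cost per SKU
```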

NewsvendorEnv.step_

 NewsvendorEnv.step_ (action:numpy.ndarray)

Step function implementing the Newsvendor logic. Note that the dataloader will return an observation and a demand, which will be relevant in the next period. The observation will be returned directly, while the demand will be temporarily stored under self.demand and used in the next step.

| | Type | Details |
|---|---|---|
| action | ndarray | order quantity |
| Returns | Tuple | |
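The one-period delay described for step_ (the demand is stored first and only charged against the following action) can be sketched with a toy class. `ToyNewsvendor` is a hypothetical stand-in written for illustration; it is not the ddopai implementation, it skips observations and info dictionaries, and it returns only a reward and a truncation flag.

```python
import numpy as np

class ToyNewsvendor:
    """Toy stand-in (not the ddopai class) showing the stored-demand pattern."""

    def __init__(self, demands, cu=1.0, co=1.0):
        self.demands = [np.asarray(d, dtype=float) for d in demands]
        self.cu, self.co = cu, co
        self.index = 0
        # Demand that the first order will face; hidden from the agent.
        self.demand = self.demands[0]

    def step(self, action):
        action = np.asarray(action, dtype=float)
        # The cost uses the demand stored at reset or at the previous step.
        cost = (self.cu * np.maximum(self.demand - action, 0.0)
                + self.co * np.maximum(action - self.demand, 0.0))
        self.index += 1
        truncated = self.index >= len(self.demands)
        if not truncated:
            # Store the next demand; it is used when the next action arrives.
            self.demand = self.demands[self.index]
        return -cost.sum(), truncated

env = ToyNewsvendor(demands=[[2.0], [5.0]])
reward_1, truncated_1 = env.step([3.0])  # one unit of overage
reward_2, truncated_2 = env.step([3.0])  # two units of underage
print(reward_1, truncated_1, reward_2, truncated_2)
```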

NewsvendorEnv.determine_cost

 NewsvendorEnv.determine_cost (action:numpy.ndarray)

Determine the cost per SKU given the action taken. The cost is the sum of underage and overage costs.
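As a worked illustration, the per-SKU cost can be written as cu * max(demand - order, 0) + co * max(order - demand, 0). The sketch below uses this standard newsvendor formula with the demand and action values printed for the first step of the distributional example on this page (underage_cost=1, overage_cost=2). `newsvendor_cost` is a hypothetical helper, not the library method, and the reward is assumed to be the negative sum of the per-SKU costs, which is consistent with the printed output.

```python
import numpy as np

def newsvendor_cost(demand, order, cu, co):
    """Hypothetical helper: underage plus overage cost for each SKU."""
    demand = np.asarray(demand, dtype=float)
    order = np.asarray(order, dtype=float)
    underage = np.maximum(demand - order, 0.0)  # units of unmet demand
    overage = np.maximum(order - demand, 0.0)   # units left over
    return cu * underage + co * overage

# Values taken from step 1 of the distributional example.
demand = np.array([2.48613144, 4.94828011])
order = np.array([0.2829225, 1.6024134])

cost_per_sku = newsvendor_cost(demand, order, cu=1.0, co=2.0)
reward = -cost_per_sku.sum()  # assumption: reward is the negative total cost
print(cost_per_sku, reward)
```

Up to float32 rounding of the action, the result agrees with the `cost_per_SKU` and `reward` values shown in that example.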


NewsvendorEnv.update_cu_co

 NewsvendorEnv.update_cu_co (cu=None, co=None)

Example usage of [`NewsvendorEnv`](https://opimwue.github.io/ddopai/20_environments/21_envs_inventory/single_period_envs.html#newsvendorenv) with a distributional dataloader:

from ddopai.dataloaders.distribution import NormalDistributionDataLoader

def run_test_loop(env):
    truncated = False
    while not truncated:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        print("##### STEP: ", env.index, "#####")
        print("reward:", reward)
        print("info:", info)
        print("next observation:", obs)
        print("truncated:", truncated)

dataloader = NormalDistributionDataLoader(mean=[4, 3], std=[1, 2], num_units=2)

test_env = NewsvendorEnv(underage_cost=1, overage_cost=2, dataloader=dataloader, horizon_train=3)

obs = test_env.reset(start_index=0)
print("##### RESET #####")

run_test_loop(test_env)
##### RESET #####
##### STEP:  1 #####
reward: -5.549075627672828
info: {'demand': array([2.48613144, 4.94828011]), 'action': array([0.2829225, 1.6024134], dtype=float32), 'cost_per_SKU': array([2.20320894, 3.34586669])}
next observation: None
truncated: False
##### STEP:  2 #####
reward: -1.9300547300834316
info: {'demand': array([3.86237064, 1.66660444]), 'action': array([2.0144682, 1.5844522], dtype=float32), 'cost_per_SKU': array([1.84790245, 0.08215228])}
next observation: None
truncated: False
##### STEP:  3 #####
reward: -5.19810850869845
info: {'demand': array([3.45984581, 0.        ]), 'action': array([0.10056694, 0.9194148 ], dtype=float32), 'cost_per_SKU': array([3.35927887, 1.83882964])}
next observation: None
truncated: True

Example usage of [`NewsvendorEnv`](https://opimwue.github.io/ddopai/20_environments/21_envs_inventory/single_period_envs.html#newsvendorenv) using a fixed dataset:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler

from ddopai.dataloaders.tabular import XYDataLoader
# create a simple dataset bounded between 0 and 1.
# We just scale all the data, pretending that it is the demand.
# When using real data, one should only fit the scaler on the training data
X, Y = make_regression(n_samples=8, n_features=2, n_targets=2, noise=0.1, random_state=42)
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
Y = scaler.fit_transform(Y)

dataloader = XYDataLoader(X, Y, val_index_start = 4, test_index_start = 6)
test_env = NewsvendorEnv(underage_cost=np.array([1,1]), overage_cost=np.array([0.5,0.5]), dataloader=dataloader, horizon_train="use_all_data")

obs = test_env.reset(start_index=0)
print("#################### RESET ####################")

print("#################### RUN IN TRAIN MODE ####################")
run_test_loop(test_env)

print("#################### RUN IN VAL MODE ####################")
test_env.val()
run_test_loop(test_env)

print("#################### RUN IN TEST MODE ####################")
test_env.test()
run_test_loop(test_env)

print("#################### RUN IN TRAIN MODE AGAIN ####################")
test_env.train()
run_test_loop(test_env)
#################### RESET ####################
#################### RUN IN TRAIN MODE ####################
##### STEP:  1 #####
reward: -0.5507963668644685
info: {'demand': array([0.41801109, 0.41814421]), 'action': array([0.70588326, 0.01128393], dtype=float32), 'cost_per_SKU': array([0.14393609, 0.40686028])}
next observation: [0.51654708 0.67238019]
truncated: False
##### STEP:  2 #####
reward: -0.8714066300571378
info: {'demand': array([0.61617324, 0.52211535]), 'action': array([0.180223 , 1.3930281], dtype=float32), 'cost_per_SKU': array([0.43595024, 0.43545639])}
next observation: [0.71467365 0.37996181]
truncated: False
##### STEP:  3 #####
reward: -1.6119519129489481
info: {'demand': array([0.45242345, 0.60924132]), 'action': array([1.8277601, 2.4578085], dtype=float32), 'cost_per_SKU': array([0.68766832, 0.92428359])}
next observation: [0.78011439 1.        ]
truncated: True
#################### RUN IN VAL MODE ####################
##### STEP:  1 #####
reward: -0.5815800605970438
info: {'demand': array([0.        , 0.16760013]), 'action': array([0.11117006, 1.2195902 ], dtype=float32), 'cost_per_SKU': array([0.05558503, 0.52599503])}
next observation: [0.         0.59527916]
truncated: False
##### STEP:  2 #####
reward: -0.5828876160320575
info: {'demand': array([0.33549548, 0.        ]), 'action': array([0.4501956, 1.0510751], dtype=float32), 'cost_per_SKU': array([0.05735007, 0.52553755])}
next observation: None
truncated: True
#################### RUN IN TEST MODE ####################
##### STEP:  1 #####
reward: -0.7298214633019249
info: {'demand': array([0.3316407 , 0.33063685]), 'action': array([0.06531169, 1.2576218 ], dtype=float32), 'cost_per_SKU': array([0.266329  , 0.46349246])}
next observation: [1.         0.71807281]
truncated: False
##### STEP:  2 #####
reward: -0.5407586979670338
info: {'demand': array([0.8554925, 1.       ]), 'action': array([0.5619696, 1.4944715], dtype=float32), 'cost_per_SKU': array([0.29352292, 0.24723577])}
next observation: None
truncated: True
#################### RUN IN TRAIN MODE AGAIN ####################
##### STEP:  1 #####
reward: -0.9409223786788338
info: {'demand': array([0.41801109, 0.41814421]), 'action': array([1.3812015, 1.3367985], dtype=float32), 'cost_per_SKU': array([0.48159521, 0.45932717])}
next observation: [0.51654708 0.67238019]
truncated: False
##### STEP:  2 #####
reward: -0.7144824568212446
info: {'demand': array([0.61617324, 0.52211535]), 'action': array([0.07493836, 0.8686105 ], dtype=float32), 'cost_per_SKU': array([0.54123488, 0.17324757])}
next observation: [0.71467365 0.37996181]
truncated: False
##### STEP:  3 #####
reward: -1.2616030231212196
info: {'demand': array([0.45242345, 0.60924132]), 'action': array([0.84109116, 2.7437797 ], dtype=float32), 'cost_per_SKU': array([0.19433385, 1.06726917])}
next observation: [0.78011439 1.        ]
truncated: True

Newsvendor Env that can provide a variable service level

Static inventory environment where a decision only affects the next period (Newsvendor problem), but with a variable service level that is drawn at random during training and held fixed during validation and testing.
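In the classical newsvendor model the service level is the critical ratio cu / (cu + co), and the cost-minimizing order quantity is the corresponding quantile of the demand distribution. The sketch below shows that mapping in both directions. The function names are hypothetical, and the normalization cu + co = 1 in the inverse mapping is an assumption made for illustration, not something confirmed for ddopai.

```python
import numpy as np

def service_level_from_costs(cu, co):
    """Critical ratio of the newsvendor model."""
    return cu / (cu + co)

def costs_from_service_level(sl):
    """One possible inverse, assuming the costs are normalized to cu + co = 1."""
    return sl, 1.0 - sl

sl = service_level_from_costs(cu=1.0, co=2.0)
cu, co = costs_from_service_level(0.75)

# With equal costs the service level is 0.5, so the best order is the median.
demand_samples = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
order = np.quantile(demand_samples, service_level_from_costs(1.0, 1.0))
print(sl, (cu, co), order)
```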


NewsvendorEnvVariableSL

 NewsvendorEnvVariableSL
                          (sl_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0.1,
                          sl_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0.9,
                          sl_distribution:Literal['fixed','uniform']='fixed',
                          evaluation_metric:Literal['pinball_loss','quantile_loss']='quantile_loss',
                          sl_test_val:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=None,
                          underage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                          overage_cost:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=1,
                          q_bound_low:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=0,
                          q_bound_high:Union[numpy.ndarray,ddopai.utils.Parameter,int,float]=inf,
                          dataloader:ddopai.dataloaders.base.BaseDataLoader=None,
                          num_SKUs:int=None, gamma:float=1,
                          horizon_train:int|str='use_all_data',
                          postprocessors:list[object]|None=None,
                          mode:str='train', return_truncation:str=True,
                          SKUs_in_batch_dimension:bool=True)

Class implementing the Newsvendor problem for the single- and multi-item case. If underage_cost and overage_cost are scalars and there are multiple SKUs, the same cost is used for all SKUs. If they are arrays, their length must equal the number of SKUs. num_SKUs can be set as a parameter or inferred from the DataLoader.

| | Type | Default | Details |
|---|---|---|---|
| sl_bound_low | Union | 0.1 | lower bound of the service level during training |
| sl_bound_high | Union | 0.9 | upper bound of the service level during training |
| sl_distribution | Literal | fixed | distribution of the random service level during training, if fixed then the service level is fixed to sl_test_val |
| evaluation_metric | Literal | quantile_loss | quantile loss is the generic quantile loss (independent of cost levels) while pinball loss uses the specific under- and overage costs |
| sl_test_val | Union | None | service level during test and validation, alternatively use cu and co |
| underage_cost | Union | 1 | underage cost per unit |
| overage_cost | Union | 1 | overage cost per unit |
| q_bound_low | Union | 0 | lower bound of the order quantity |
| q_bound_high | Union | inf | upper bound of the order quantity |
| dataloader | BaseDataLoader | None | dataloader |
| num_SKUs | int | None | if None it will be inferred from the DataLoader |
| gamma | float | 1 | discount factor |
| horizon_train | int \| str | use_all_data | if "use_all_data" then horizon is inferred from the DataLoader |
| postprocessors | list[object] \| None | None | default is empty list |
| mode | str | train | Initial mode (train, val, test) of the environment |
| return_truncation | str | True | whether to return a truncated condition in step function |
| SKUs_in_batch_dimension | bool | True | whether SKUs in the observation space are in the batch dimension (used for meta-learning) |
| Returns | None | | |
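The difference between the two evaluation_metric options can be sketched as follows, reading the parameter description literally: the quantile loss weights underage and overage by sl and 1 - sl, while the pinball loss weights them by the underage and overage costs. Both functions are hypothetical illustrations; the exact scaling used inside ddopai is not confirmed here.

```python
import numpy as np

def quantile_loss(demand, order, sl):
    """Generic quantile loss at service level sl (independent of cost levels)."""
    under = np.maximum(demand - order, 0.0)
    over = np.maximum(order - demand, 0.0)
    return sl * under + (1.0 - sl) * over

def pinball_loss(demand, order, cu, co):
    """Same shape of loss, but weighted by the specific under- and overage costs."""
    under = np.maximum(demand - order, 0.0)
    over = np.maximum(order - demand, 0.0)
    return cu * under + co * over

demand = np.array([4.0, 1.0])
order = np.array([2.0, 3.0])

q_loss = quantile_loss(demand, order, sl=0.75)
p_loss_normalized = pinball_loss(demand, order, cu=0.75, co=0.25)
p_loss_scaled = pinball_loss(demand, order, cu=3.0, co=1.0)
print(q_loss, p_loss_normalized, p_loss_scaled)
```

When cu + co = 1 the two losses coincide; otherwise the pinball loss is the quantile loss scaled by cu + co, which is why the first is comparable across cost levels and the second is not.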

NewsvendorEnvVariableSL.determine_cost

 NewsvendorEnvVariableSL.determine_cost (action:numpy.ndarray)

Determine the cost per SKU given the action taken. The cost is the sum of underage and overage costs.

| | Type | Details |
|---|---|---|
| action | ndarray | |
| Returns | ndarray | |

NewsvendorEnvVariableSL.set_observation_space

 NewsvendorEnvVariableSL.set_observation_space (shape:tuple,
                                                low:Union[numpy.ndarray,float]=-inf,
                                                high:Union[numpy.ndarray,float]=inf,
                                                samples_dim_included=True)

Set the observation space of the environment. This is a standard function for simple observation spaces. For more complex observation spaces, this function should be overwritten. Note that the first dimension of shape is assumed to be n_samples, which is not relevant for the observation space.

| | Type | Default | Details |
|---|---|---|---|
| shape | tuple | | shape of the dataloader features |
| low | Union | -inf | lower bound of the observation space |
| high | Union | inf | upper bound of the observation space |
| samples_dim_included | bool | True | whether the first dimension of the shape input is the number of samples |
| Returns | None | | |
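The role of samples_dim_included can be sketched without any environment code: when it is True, the leading n_samples dimension is dropped before the bounds are built. `observation_bounds` is a hypothetical helper using only NumPy. In a Gymnasium-style environment the resulting shape and bounds would typically be passed to a Box space, but that step is an assumption and is left out here.

```python
import numpy as np

def observation_bounds(shape, low=-np.inf, high=np.inf, samples_dim_included=True):
    """Hypothetical helper: shape and bounds of a single observation."""
    # Drop the leading n_samples dimension if the input shape includes it.
    obs_shape = tuple(shape[1:]) if samples_dim_included else tuple(shape)
    # Scalar bounds are broadcast to the observation shape.
    low_arr = np.broadcast_to(np.asarray(low, dtype=np.float32), obs_shape)
    high_arr = np.broadcast_to(np.asarray(high, dtype=np.float32), obs_shape)
    return obs_shape, low_arr, high_arr

# A feature matrix with 8 samples and 2 features gives observations of shape (2,).
obs_shape, low_arr, high_arr = observation_bounds((8, 2), low=0.0, high=1.0)
print(obs_shape, low_arr, high_arr)
```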

NewsvendorEnvVariableSL.draw_parameter

 NewsvendorEnvVariableSL.draw_parameter (distribution, sl_bound_low,
                                         sl_bound_high, samples)
| | Details |
|---|---|
| distribution | |
| sl_bound_low | |
| sl_bound_high | |
| samples | |
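draw_parameter is undocumented, so the following is only a guess at the kind of sampling its argument names suggest: drawing `samples` service levels uniformly between the two bounds. `draw_service_level` is a hypothetical function; the `rng` argument and the error raised for other distribution names are assumptions added for illustration, not behavior confirmed for ddopai.

```python
import numpy as np

def draw_service_level(distribution, sl_bound_low, sl_bound_high, samples, rng=None):
    """Hypothetical sketch: draw service levels for training episodes."""
    rng = np.random.default_rng() if rng is None else rng
    if distribution == "uniform":
        # One service level per requested sample, within the given bounds.
        return rng.uniform(sl_bound_low, sl_bound_high, size=samples)
    # Other options (such as "fixed") are not sketched here.
    raise ValueError(f"unsupported distribution in this sketch: {distribution}")

levels = draw_service_level("uniform", 0.1, 0.9, samples=5,
                            rng=np.random.default_rng(0))
print(levels)
```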

NewsvendorEnvVariableSL.get_observation

 NewsvendorEnvVariableSL.get_observation ()

Return the current observation. This function is for the simple case where the observation is only an x,y pair. For more complex observations, this function should be overwritten.


NewsvendorEnvVariableSL.check_evaluation_metric

 NewsvendorEnvVariableSL.check_evaluation_metric ()

NewsvendorEnvVariableSL.check_sl_distribution

 NewsvendorEnvVariableSL.check_sl_distribution ()

NewsvendorEnvVariableSL.set_val_test_sl

 NewsvendorEnvVariableSL.set_val_test_sl (sl_test_val)
| | Details |
|---|---|
| sl_test_val | |