ddop.datasets.load_SID

ddop.datasets.load_SID(include_date=False, one_hot_encoding=False, label_encoding=False, return_X_y=False)

Load and return the store item demand dataset.

Dataset Characteristics:

Number of Instances

887284

Number of Targets

1

Number of Features

6

Target Information
  • ‘demand’ the corresponding demand observation

Feature Information
  • ‘date’ the date

  • ‘weekday’ the day of the week

  • ‘month’ the month of the year

  • ‘year’ the year

  • ‘store’ the store id

  • ‘item’ the item id

Parameters
  • include_date (bool, default=False) – Whether to include the demand date

  • one_hot_encoding (bool, default=False) – Whether to one hot encode categorical features

  • label_encoding (bool, default=False) – Whether to convert the categorical columns (weekday, month, year) to numeric values. Only applied if one_hot_encoding=False

  • return_X_y (bool, default=False) – If True, returns (data, target) instead of a Bunch object. See below for more information about the data and target objects.
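
The interaction between one_hot_encoding and label_encoding can be illustrated with a minimal sketch. It is written in plain Python on a small made-up 'weekday' sample and is not ddop's actual implementation: the helper names (label_encode, one_hot_encode, encode), the sample values, and the sorted category order are all assumptions made for illustration. Only the precedence rule (label encoding is applied only when one_hot_encoding=False) comes from the parameter descriptions above.

```python
# Hypothetical sample of a categorical column; the values are assumed.
weekdays = ["MON", "TUE", "MON", "SUN"]


def label_encode(values):
    """Map each distinct category to an integer code (sorted order, assumed)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def one_hot_encode(values):
    """Expand each value into one 0/1 indicator per distinct category."""
    cats = sorted(set(values))
    return [[int(v == cat) for cat in cats] for v in values]


def encode(values, one_hot_encoding=False, label_encoding=False):
    """Mirror the documented precedence: one-hot wins over label encoding."""
    if one_hot_encoding:
        return one_hot_encode(values)
    if label_encoding:
        return label_encode(values)
    return list(values)  # neither flag set: values are left unchanged


labels = encode(weekdays, label_encoding=True)
one_hot = encode(weekdays, one_hot_encoding=True, label_encoding=True)
print(labels)
print(one_hot)
```

In this sketch, passing both flags yields the one-hot result, which matches the note that label_encoding is ignored when one_hot_encoding=True.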

Returns

  • data (sklearn Bunch) – Dictionary-like object, with the following attributes.

    data: Pandas DataFrame of shape (887284, n_features)

    The data matrix.

    target: Pandas DataFrame of shape (887284, n_targets)

    The target values.

    n_features: int

    The number of features included.

    n_targets: int

    The number of target variables included.

    DESCR: str

    The full description of the dataset.

    data_filename: str

    The path to the location of the data.

    target_filename: str

    The path to the location of the target.

  • (data, target) (tuple if return_X_y is True)
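
Because the return type depends on return_X_y, calling code may want to handle both forms uniformly. The following is a hedged sketch of one way to do that. The helper name split_result is hypothetical, and the stand-in objects below use made-up values in place of the real dataset so that the snippet runs without downloading anything. Only the attribute names data, target, n_features, and n_targets are taken from the description above.

```python
from types import SimpleNamespace


def split_result(result):
    """Return (data, target) from either a Bunch-like object or a tuple."""
    if isinstance(result, tuple):
        data, target = result  # the return_X_y=True form
    else:
        data, target = result.data, result.target  # the Bunch form
    return data, target


# Stand-ins with assumed values; a real call would be load_SID() or
# load_SID(return_X_y=True).
fake_bunch = SimpleNamespace(
    data=[[1, 2], [3, 4]],
    target=[[10], [20]],
    n_features=2,
    n_targets=1,
)
fake_tuple = (fake_bunch.data, fake_bunch.target)

X_from_bunch, y_from_bunch = split_result(fake_bunch)
X_from_tuple, y_from_tuple = split_result(fake_tuple)
print(len(X_from_bunch), fake_bunch.n_features, fake_bunch.n_targets)
```

With the real loader, split_result(load_SID()) and split_result(load_SID(return_X_y=True)) would be expected to yield the same pair, assuming the Bunch exposes data and target as documented.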

Notes

The store item demand dataset was published as part of a demand forecasting challenge on Kaggle [1].

References

[1] https://www.kaggle.com/c/demand-forecasting-kernels-only/overview

Examples

>>> from ddop.datasets import load_SID
>>> X, y = load_SID(return_X_y=True)
>>> print(X.shape)
(887284, 5)