datasetloader = DatasetLoader()
datasetloader.show_dataset_types()
Univariate datasets:
bakery
Multivariate datasets:
arma_10_10
arma_2_2
ar_1
We provide a range of synthetic and real-world datasets to enable reproducible research. Each dataset type typically contains multiple datasets (e.g., 16 multivariate datasets following an ARMA(10,10) process). The datasets are available in the releases of this repository, and the functions below download them automatically.
Loading a dataset takes three steps:
Step 1: Create a DatasetLoader object: datasetloader = DatasetLoader()
Step 2: Check the available dataset types: datasetloader.show_dataset_types(show_num_datasets_per_type=True)
Step 3: Load a dataset: data = datasetloader.load_dataset("arma_10_10", 1)
The first argument (a string) is the name of the dataset type and the second argument (an integer) is the dataset number.
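Because a dataset type usually comes with several numbered datasets, Step 3 can be repeated in a small loop. The sketch below is a hypothetical helper and not part of the library: `load_many` only assumes that the loader object exposes `load_dataset(dataset_type, dataset_number)` as described in the steps, and that the dataset numbers you pass exist for that type.

```python
from typing import Any, Dict, Iterable


def load_many(loader: Any, dataset_type: str, numbers: Iterable[int]) -> Dict[int, Any]:
    """Load several datasets of one type and key them by dataset number.

    Hypothetical convenience helper: `loader` is any object with a
    `load_dataset(dataset_type, dataset_number)` method, such as a
    DatasetLoader instance.
    """
    datasets = {}
    for number in numbers:
        # Each call corresponds to Step 3 of the loading procedure.
        datasets[number] = loader.load_dataset(dataset_type, number)
    return datasets


# Usage (requires network access and the package that provides DatasetLoader):
# datasetloader = DatasetLoader()
# first_three = load_many(datasetloader, "arma_10_10", [1, 2, 3])
```

Keeping the loader as a parameter means the helper can be exercised with a stand-in object, without downloading anything.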
load_data_from_directory (dir)
unzip_file (zip_file_path, output_dir, delete_zip_file=True)
download_file_from_github (url, output_path, token=None)
get_asset_url (dataset_type, dataset_number, version='latest', token=None)
get_dataset_url (dataset_type, dataset_number, release_tag, token=None)
get_release_tag (dataset_type, version, token=None)
get_all_release_tags (token=None)
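To illustrate what a helper with the `unzip_file` signature might do, here is a minimal sketch built on Python's standard `zipfile` module. It is an assumption about the behavior, not the library's actual implementation, and the name `unzip_file_sketch` is hypothetical.

```python
import os
import zipfile


def unzip_file_sketch(zip_file_path, output_dir, delete_zip_file=True):
    """Extract a zip archive into output_dir and optionally delete the archive.

    Illustrative only: mirrors the documented signature of unzip_file,
    but the real function may behave differently.
    """
    os.makedirs(output_dir, exist_ok=True)
    with zipfile.ZipFile(zip_file_path, "r") as archive:
        archive.extractall(output_dir)
        extracted = archive.namelist()
    if delete_zip_file:
        # Remove the archive only after extraction has finished.
        os.remove(zip_file_path)
    return extracted
```

Returning the list of extracted names is a choice made for this sketch so that callers can check what was unpacked.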
DatasetLoader ()
Class to load datasets from the GitHub repository.
DatasetLoader.show_dataset_types (show_num_datasets_per_type=False)
Show an overview of all dataset types available in the repository.
| Type | Default | Details |
---|---|---|---|
show_num_datasets_per_type | bool | False | Whether to show the number of datasets per type |
DatasetLoader.load_dataset (dataset_type:str, dataset_number:int, overwrite:bool=False, version:str='latest', token:str=None)
Load a dataset from the GitHub repository.
| Type | Default | Details |
---|---|---|---|
dataset_type | str | ||
dataset_number | int | ||
overwrite | bool | False | Whether to overwrite the dataset if it already exists |
version | str | latest | Which version of the dataset to load: “latest” or a specific version |
token | str | None | GitHub token to enable more requests (otherwise limited to 60 requests per hour) |
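The `token` argument matters mainly when many datasets are loaded in one session, since unauthenticated requests are rate limited. One way to avoid hard-coding a token is to read it from an environment variable. The sketch below assumes a variable named `GITHUB_TOKEN` (a common convention, not something this library is known to require), and `token_from_env` is a hypothetical helper.

```python
import os
from typing import Optional


def token_from_env(var_name: str = "GITHUB_TOKEN") -> Optional[str]:
    """Return the token stored in the environment, or None if unset or empty."""
    token = os.environ.get(var_name, "").strip()
    return token or None


# Usage (requires network access and a DatasetLoader instance):
# data = datasetloader.load_dataset("arma_10_10", 1, token=token_from_env())
```

Returning `None` when the variable is missing matches the documented default of `token`, so the call still works without a token, subject to the lower request limit.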
Example usage:
download_test = True
if download_test:
    data = datasetloader.load_dataset("bakery", 1)  # dataset type, e.g. "arma_10_10" or "bakery"
    X = data["data_raw_features"]
    y = data["data_raw_target"]
    print(X.shape, y.shape)
(127575, 13) (127575, 1)
| date | weekday | month | year | is_schoolholiday | is_holiday | is_holiday_next2days | store | product | rain | temperature | promotion_currentweek | promotion_lastweek |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-01-02 | FRI | JAN | 2016 | 1 | 0 | 0 | 2 | 101 | 11.9 | 2.1 | 0 | 0 |
1 | 2016-01-03 | SAT | JAN | 2016 | 1 | 0 | 0 | 2 | 101 | 4.1 | 2.6 | 0 | 0 |
2 | 2016-01-04 | SUN | JAN | 2016 | 1 | 0 | 1 | 2 | 101 | 7.9 | 3.2 | 0 | 0 |
3 | 2016-01-05 | MON | JAN | 2016 | 1 | 0 | 1 | 2 | 101 | 3.5 | 3.1 | 0 | 0 |
4 | 2016-01-06 | TUE | JAN | 2016 | 1 | 1 | 0 | 2 | 101 | 0.1 | 4.1 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
127570 | 2019-04-26 | THU | APR | 2019 | 1 | 0 | 0 | 71 | 110 | 4.9 | 8.0 | 0 | 0 |
127571 | 2019-04-27 | FRI | APR | 2019 | 1 | 0 | 0 | 71 | 110 | 6.1 | 7.8 | 0 | 0 |
127572 | 2019-04-28 | SAT | APR | 2019 | 0 | 0 | 0 | 71 | 110 | 1.0 | 6.5 | 0 | 0 |
127573 | 2019-04-29 | SUN | APR | 2019 | 0 | 0 | 1 | 71 | 110 | 9.1 | 6.5 | 0 | 0 |
127574 | 2019-04-30 | MON | APR | 2019 | 0 | 0 | 1 | 71 | 110 | 0.0 | 10.3 | 0 | 0 |
127575 rows × 13 columns