ETL DB writers
Company-level
write_company_name
write_company_name (name:str, additional_info:Dict=None, ignore_company_if_exist:bool=True)
*This function writes the company name to the database and any additional info. Each key in additional_info
becomes a column in the database table if it doesn’t exist, and the associated value is written to that column.
If ignore_company_if_exist
is False and the company name already exists, an error is raised. If ignore_company_if_exist
is True, a warning is logged and the existing record is updated if additional info differs.
Returns the ID that the database has assigned to the company name.*
Category level
Main function to write categories
write_categories
write_categories (categories:dict, company_id:int, category_level_names:list=None)
*This function writes the categories to the database.
The categories are expected to be in a dictionary with the following structure: {1: { “category_name”: None } 2: { “category_name”: [“parent_category_1”, “parent_category_2”, …] } … }
For the first level, the parent list should be None.
If a category on a lower level has another parent from 2 or more levels above, the cateogry should be listed under the lowest level parent. (such that the write db function can first write all parents and then the children)*
Helper functions for categories
write_category_level
write_category_level (categories:list, company_id:int)
*This function writes one level of categories into the database and returns a list of the IDs that the database has assigned. The purpose is to call this function repeatedly for each level of categories.
It will add data to two tables: - categories
with the category names and a flag if it is a leaf category - category_relations
with the parent-child relationships between categories*
write_category_level_descriptions
write_category_level_descriptions (category_level_names, company_id)
This function writes the names of the category levels to the database.
Product data
write_products
write_products (products:pandas.core.frame.DataFrame, company_id:int)
*This function writes the products to the database.
The input must be a dataframe with the following structure: First column: product name (column name is irrelevant) Second column: category name (column name is irrelevant)
Note that each product may have more than one category.*
Store level
Main function to write stores
write_stores
write_stores (store_regions:pandas.core.frame.DataFrame, company_id)
This function writes the store data to the database.
Helper functions for write_stores
add_region_ids
add_region_ids (data, cur)
*Adds region IDs to the given DataFrame by mapping region, type, and country.
Args: data (pd.DataFrame): Input DataFrame containing region
, type
, and country
columns. cur (psycopg2.cursor): Database cursor for querying region IDs.
Returns: pd.DataFrame: DataFrame with an additional region_id
column.*
get_region_ids
get_region_ids (cur, country, abbreviation, type_)
SKU writing
Main function
write_skus
write_skus (store_item_combinations:pandas.core.frame.DataFrame, company_id:int)
*Writes SKU data to the database.
For each store-item combination: - Fetches the product ID by matching item_name
in the products
table, filtering by company. - Fetches the store ID by matching store_name
in the stores
table, filtering by company. - Inserts the resulting combinations into the sku_table
.
Args: store_item_combinations (pd.DataFrame): DataFrame with columns store_name
and item_name
. company_id (int): ID of the company for filtering relevant data.
Raises: Exception: If any errors occur during the database operation.*
Helper functions for SKU writing
get_store_ids
get_store_ids (cur, company_id, store_name_list)
*Fetch store IDs for a given company and a list of store names.
Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. company_id (int): ID of the company. store_name_list (list): List of store names.
Returns: pd.DataFrame: A DataFrame mapping store IDs to store names.*
get_product_ids
get_product_ids (cur, company_id, item_name_list)
*Fetch product IDs for a given company and a list of item names.
Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. company_id (int): ID of the company. item_name_list (list): List of item names.
Returns: pd.DataFrame: A DataFrame mapping product IDs to item names.*
Datapoint-level data
Datapoint IDs
write_datapoints
write_datapoints (sales:pandas.core.frame.DataFrame, company_id:int)
*Writes datapoints to the datapoints
table in the database.
The datapoints consist of skuID
and dateID
, resolved based on store_name
, item_name
, and date
.
Args: sales (pd.DataFrame): DataFrame containing store_name
, item_name
, date
, and additional data. company_id (int): ID of the company for filtering relevant data.
Raises: ValueError: If any store_name
, item_name
, or date
cannot be matched or if duplicate rows are found.*
Time-sku specific data
write_sold_flag
write_sold_flag (sold_flags:pandas.core.frame.DataFrame, company_id, datapoint_ids)
This function writes the sold flag data to the database.
write_prices
write_prices (prices:pandas.core.frame.DataFrame, company_id, datapoint_ids)
This function writes the prices data to the database.
write_sales
write_sales (sales:pandas.core.frame.DataFrame, company_id, datapoint_ids)
This function writes the sales data to the database.
write_SKU_date_specific_data
write_SKU_date_specific_data (data:pandas.core.frame.DataFrame, datapoint_ids, variable_name:str, variable_type:<built-infunctioncallable>, table_name:str, company_id:int, name_in_df=None)
*Writes SKU and date-specific data to the database using the new datapointID
schema.
Args: data (pd.DataFrame): Input data containing item_name
, store_name
, date
, and the variable to insert. variable_name (str): The name of the variable/column to be inserted into the database. variable_type (callable): The type to cast the variable’s values (e.g., int, float, str). table_name (str): The name of the database table to insert into. company_id (int): The company ID for filtering relevant records. name_in_df (str, optional): Column name in the DataFrame for the variable. Defaults to variable_name
.*
Helper functions for time-sku specific data
get_date_ids
get_date_ids (cur, dates_list)
*Fetch date IDs for a given list of dates.
Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. dates_list (list): List of dates.
Returns: pd.DataFrame: A DataFrame mapping date IDs to dates.*
get_datapoint_ids
get_datapoint_ids (cur, datapoint_combinations:pandas.core.frame.DataFrame)
*Fetch datapointID
for given combinations of skuID
and dateID
.
Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. datapoint_combinations (pd.DataFrame): DataFrame with two columns: skuID
and dateID
.
Returns: pd.DataFrame: DataFrame mapping datapointID
, skuID
, and dateID
.*
get_sku_ids
get_sku_ids (cur, store_product_names:pandas.core.frame.DataFrame, company_id:int)
*Fetch skuIDs for given combinations of store_name
and item_name
.
Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. store_product_names (pd.DataFrame): DataFrame with two columns: store_name
and item_name
. company_id (int): ID of the company for filtering relevant data.
Returns: pd.DataFrame: DataFrame mapping skuID
, storeID
, productID
, store_name
, and item_name
.*
Time-region specific data
write_time_region_features
write_time_region_features (time_region_features:pandas.core.frame.DataF rame, name_description:[<class'str'>,<class'str'>], company_id:int)
This function writes data into the database whose values are specific to a time-stamps and regions
Type | Details | |
---|---|---|
time_region_features | DataFrame | |
name_description | [<class ‘str’>, <class ‘str’>] | containing name and description of the feature |
company_id | int |