ETL DB writers

Pre-specified nodes that write data into the database

Company-level

write_company_name

 write_company_name (name:str, additional_info:Dict=None,
                     ignore_company_if_exist:bool=True)

*This function writes the company name to the database and any additional info. Each key in additional_info becomes a column in the database table if it doesn’t exist, and the associated value is written to that column.

If ignore_company_if_exist is False and the company name already exists, an error is raised. If ignore_company_if_exist is True, a warning is logged and the existing record is updated if additional info differs.

Returns the ID that the database has assigned to the company name.*

Category level

Main function to write categories

write_categories

 write_categories (categories:dict, company_id:int,
                   category_level_names:list=None)

*This function writes the categories to the database.

The categories are expected to be in a dictionary with the following structure: {1: { “category_name”: None } 2: { “category_name”: [“parent_category_1”, “parent_category_2”, …] } … }

For the first level, the parent list should be None.

If a category on a lower level has another parent from 2 or more levels above, the cateogry should be listed under the lowest level parent. (such that the write db function can first write all parents and then the children)*

Helper functions for categories

write_category_level

 write_category_level (categories:list, company_id:int)

*This function writes one level of categories into the database and returns a list of the IDs that the database has assigned. The purpose is to call this function repeatedly for each level of categories.

It will add data to two tables: - categories with the category names and a flag if it is a leaf category - category_relations with the parent-child relationships between categories*

write_category_level_descriptions

 write_category_level_descriptions (category_level_names, company_id)

This function writes the names of the category levels to the database.

Product data

write_products

 write_products (products:pandas.core.frame.DataFrame, company_id:int)

*This function writes the products to the database.

The input must be a dataframe with the following structure: First column: product name (column name is irrelevant) Second column: category name (column name is irrelevant)

Note that each product may have more than one category.*

Store level

Main function to write stores

write_stores

 write_stores (store_regions:pandas.core.frame.DataFrame, company_id)

This function writes the store data to the database.

Helper functions for write_stores

add_region_ids

 add_region_ids (data, cur)

*Adds region IDs to the given DataFrame by mapping region, type, and country.

Args: data (pd.DataFrame): Input DataFrame containing region, type, and country columns. cur (psycopg2.cursor): Database cursor for querying region IDs.

Returns: pd.DataFrame: DataFrame with an additional region_id column.*

get_region_ids

 get_region_ids (cur, country, abbreviation, type_)

SKU writing

Main function

write_skus

 write_skus (store_item_combinations:pandas.core.frame.DataFrame,
             company_id:int)

*Writes SKU data to the database.

For each store-item combination: - Fetches the product ID by matching item_name in the products table, filtering by company. - Fetches the store ID by matching store_name in the stores table, filtering by company. - Inserts the resulting combinations into the sku_table.

Args: store_item_combinations (pd.DataFrame): DataFrame with columns store_name and item_name. company_id (int): ID of the company for filtering relevant data.

Raises: Exception: If any errors occur during the database operation.*

Helper functions for SKU writing

get_store_ids

 get_store_ids (cur, company_id, store_name_list)

*Fetch store IDs for a given company and a list of store names.

Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. company_id (int): ID of the company. store_name_list (list): List of store names.

Returns: pd.DataFrame: A DataFrame mapping store IDs to store names.*

get_product_ids

 get_product_ids (cur, company_id, item_name_list)

*Fetch product IDs for a given company and a list of item names.

Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. company_id (int): ID of the company. item_name_list (list): List of item names.

Returns: pd.DataFrame: A DataFrame mapping product IDs to item names.*

Datapoint-level data

Datapoint IDs

write_datapoints

 write_datapoints (sales:pandas.core.frame.DataFrame, company_id:int)

*Writes datapoints to the datapoints table in the database.

The datapoints consist of skuID and dateID, resolved based on store_name, item_name, and date.

Args: sales (pd.DataFrame): DataFrame containing store_name, item_name, date, and additional data. company_id (int): ID of the company for filtering relevant data.

Raises: ValueError: If any store_name, item_name, or date cannot be matched or if duplicate rows are found.*

Time-sku specific data

write_sold_flag

 write_sold_flag (sold_flags:pandas.core.frame.DataFrame, company_id,
                  datapoint_ids)

This function writes the sold flag data to the database.

write_prices

 write_prices (prices:pandas.core.frame.DataFrame, company_id,
               datapoint_ids)

This function writes the prices data to the database.

write_sales

 write_sales (sales:pandas.core.frame.DataFrame, company_id,
              datapoint_ids)

This function writes the sales data to the database.

write_SKU_date_specific_data

 write_SKU_date_specific_data (data:pandas.core.frame.DataFrame,
                               datapoint_ids, variable_name:str,
                               variable_type:<built-infunctioncallable>,
                               table_name:str, company_id:int,
                               name_in_df=None)

*Writes SKU and date-specific data to the database using the new datapointID schema.

Args: data (pd.DataFrame): Input data containing item_name, store_name, date, and the variable to insert. variable_name (str): The name of the variable/column to be inserted into the database. variable_type (callable): The type to cast the variable’s values (e.g., int, float, str). table_name (str): The name of the database table to insert into. company_id (int): The company ID for filtering relevant records. name_in_df (str, optional): Column name in the DataFrame for the variable. Defaults to variable_name.*

Helper functions for time-sku specific data

get_date_ids

 get_date_ids (cur, dates_list)

*Fetch date IDs for a given list of dates.

Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. dates_list (list): List of dates.

Returns: pd.DataFrame: A DataFrame mapping date IDs to dates.*

get_datapoint_ids

 get_datapoint_ids (cur,
                    datapoint_combinations:pandas.core.frame.DataFrame)

*Fetch datapointID for given combinations of skuID and dateID.

Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. datapoint_combinations (pd.DataFrame): DataFrame with two columns: skuID and dateID.

Returns: pd.DataFrame: DataFrame mapping datapointID, skuID, and dateID.*

get_sku_ids

 get_sku_ids (cur, store_product_names:pandas.core.frame.DataFrame,
              company_id:int)

*Fetch skuIDs for given combinations of store_name and item_name.

Args: cur (psycopg2.cursor): Database cursor for executing SQL commands. store_product_names (pd.DataFrame): DataFrame with two columns: store_name and item_name. company_id (int): ID of the company for filtering relevant data.

Returns: pd.DataFrame: DataFrame mapping skuID, storeID, productID, store_name, and item_name.*

Time-region specific data

write_time_region_features

 write_time_region_features
                             (time_region_features:pandas.core.frame.DataF
                             rame,
                             name_description:[<class'str'>,<class'str'>],
                             company_id:int)

This function writes data into the database whose values are specific to a time-stamps and regions

	Type	Details
time_region_features	DataFrame
name_description	[<class ‘str’>, <class ‘str’>]	containing name and description of the feature
company_id	int