Quick Start

ForecastFlowML is designed for scalable forecasting and uses Spark both for feature engineering and for training, prediction, and hyperparameter optimisation.

Use Cases

ForecastFlowML covers three general use cases:

  • The data is stored in a PySpark DataFrame, and we need to build, in parallel, many or large group models that do not fit into driver memory.

  • The data is stored in a PySpark DataFrame, and we need to build, in parallel, a few small group models that fit into driver memory.

  • The data is stored in a Pandas DataFrame, and we need to build, in parallel, a few small group models that fit into driver memory.

This quick guide shows how to develop a scalable forecasting system using a sample of the Kaggle Walmart M5 Competition dataset.

Goals

  • Build an independent model for each store in the dataset.

  • Parallelize the training and inference steps.

  • Use LightGBM as the machine learning algorithm.

  • Use a direct multi-step forecasting approach.

  • Perform backtesting.
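The first two goals amount to splitting the data by store and fitting one model per store concurrently. ForecastFlowML does this with Spark; the plain-Python sketch below only illustrates the pattern, with a hypothetical `fit_mean_model` standing in for LightGBM and a thread pool standing in for Spark executors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def fit_mean_model(sales):
    """Hypothetical stand-in for a real regressor: 'learns' a group's average sales."""
    return fmean(sales)


def train_per_group(rows, group_key="store_id", target_key="sales", max_workers=4):
    """Fit one independent model per group, running the fits concurrently."""
    # Partition the rows by group (e.g. by store_id)
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[target_key])

    # Each group is fitted on its own data only, so the fits can run side by side
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {group: pool.submit(fit_mean_model, ys) for group, ys in groups.items()}
        return {group: future.result() for group, future in futures.items()}


rows = [
    {"store_id": "CA_1", "sales": 2.0},
    {"store_id": "CA_1", "sales": 4.0},
    {"store_id": "TX_1", "sales": 1.0},
]
models = train_per_group(rows)
print(models)  # {'CA_1': 3.0, 'TX_1': 1.0}
```

Threads keep the example short; for CPU-bound model fitting in Python, real parallelism needs separate processes or Spark executors, which is the role Spark plays in this guide.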

Import Packages

from forecastflowml import ForecastFlowML
from forecastflowml import FeatureExtractor
from forecastflowml.data.loader import load_walmart_m5
from lightgbm import LGBMRegressor
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import plotly.express as px
import plotly.io as pio
import pandas as pd

pd.set_option("display.max_columns", 40)

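Initialize Spark

spark = (
    SparkSession.builder.master("local[4]")  # local mode with 4 cores
    .config("spark.driver.memory", "8g")
    .config("spark.sql.shuffle.partitions", "4")  # one shuffle partition per core
    # Arrow speeds up pandas <-> Spark conversion; in Spark 3.x this key replaces
    # the deprecated "spark.sql.execution.arrow.enabled"
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)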

Sample Dataset

df = load_walmart_m5(spark)
df.show(10)
+--------------------+-----------+-------+------+--------+--------+----------+-----+
|                  id|    item_id|dept_id|cat_id|store_id|state_id|      date|sales|
+--------------------+-----------+-------+------+--------+--------+----------+-----+
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-01-29|  2.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-01-30|  5.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-01-31|  3.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-01|  0.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-02|  0.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-03|  0.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-04|  0.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-05|  1.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-06|  0.0|
|FOODS_1_013_TX_2_...|FOODS_1_013|FOODS_1| FOODS|    TX_2|      TX|2011-02-07|  3.0|
+--------------------+-----------+-------+------+--------+--------+----------+-----+
only showing top 10 rows

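Feature Engineering

feature_extractor = FeatureExtractor(
    id_col="id",
    date_col="date",
    target_col="sales",
    lag_window_features={
        # Lags of 7, 14, ..., 56 days
        "lag": [7 * (i + 1) for i in range(8)],
        # Means over 7-, 14- and 30-day windows, each ending 7, 14, 21 or 28 days back
        "mean": [[window, lag] for lag in [7, 14, 21, 28] for window in [7, 14, 30]],
    },
    # Calendar features derived from the date column
    date_features=[
        "day_of_month",
        "day_of_week",
        "week_of_year",
        "quarter",
        "month",
        "year",
    ],
    # Length of the run of zero sales ending 7, 14, 21 or 28 days back
    count_consecutive_values={
        "value": 0,
        "lags": [7, 14, 21, 28],
    },
    # Running count of observations per id
    history_length=True,
)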
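Reading the first rows of the PySpark output below, the generated columns behave as follows for a gap-free daily series: `lag_k` is the value k days earlier, `window_w_lag_k_mean` is the mean of up to w values ending k days earlier (partial windows allowed), `count_consecutive_value_lag_k` is the length of the run of zeros ending k days earlier, and `history_length` counts the rows seen so far. The plain-Python sketch below reproduces those definitions so they can be checked by hand. It is inferred from the output, not taken from FeatureExtractor's code, and the helper names are hypothetical.

```python
def lag(series, t, k):
    """Value k steps before position t, or None if that is before the first row."""
    return series[t - k] if t - k >= 0 else None


def lagged_window_mean(series, t, window, k):
    """Mean of up to `window` values ending k steps before t (partial windows allowed)."""
    end = t - k
    if end < 0:
        return None
    values = series[max(0, end - window + 1) : end + 1]
    return sum(values) / len(values)


def count_consecutive(series, t, k, value=0):
    """Length of the run of `value` that ends k steps before t."""
    end = t - k
    if end < 0:
        return None
    count = 0
    while end >= 0 and series[end] == value:
        count += 1
        end -= 1
    return count


def history_length(t):
    """Number of observations up to and including position t (1-based)."""
    return t + 1


# Sales of FOODS_1_011_WI_2 from 2011-01-31 (position 0) to 2011-02-09 (position 9),
# copied from the PySpark output
sales = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

t = 9  # 2011-02-09
print(lag(sales, t, 7))                    # 0.0
print(lagged_window_mean(sales, t, 7, 7))  # 0.6666666666666666
print(count_consecutive(sales, t, 7))      # 2
print(history_length(t))                   # 10
```

The printed values match the 2011-02-09 row of the PySpark output (`lag_7`, `window_7_lag_7_mean`, `count_consecutive_value_lag_7`, `history_length`).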

Pandas DataFrame

feature_extractor.transform(df.toPandas(), spark=spark)
id item_id dept_id cat_id store_id state_id date sales lag_7 lag_14 lag_21 lag_28 lag_35 lag_42 lag_49 lag_56 window_7_lag_7_mean window_14_lag_7_mean window_30_lag_7_mean window_7_lag_14_mean window_14_lag_14_mean window_30_lag_14_mean window_7_lag_21_mean window_14_lag_21_mean window_30_lag_21_mean window_7_lag_28_mean window_14_lag_28_mean window_30_lag_28_mean count_consecutive_value_lag_7 count_consecutive_value_lag_14 count_consecutive_value_lag_21 count_consecutive_value_lag_28 history_length day_of_month day_of_week week_of_year quarter month year
0 FOODS_1_011_WI_2_evaluation FOODS_1_011 FOODS_1 FOODS WI_2 WI 2011-01-31 2.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 31 2 5 1 1 2011
1 FOODS_1_011_WI_2_evaluation FOODS_1_011 FOODS_1 FOODS WI_2 WI 2011-02-01 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 1 3 5 1 2 2011
2 FOODS_1_011_WI_2_evaluation FOODS_1_011 FOODS_1 FOODS WI_2 WI 2011-02-02 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3 2 4 5 1 2 2011
3 FOODS_1_011_WI_2_evaluation FOODS_1_011 FOODS_1 FOODS WI_2 WI 2011-02-03 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 3 5 5 1 2 2011
4 FOODS_1_011_WI_2_evaluation FOODS_1_011 FOODS_1 FOODS WI_2 WI 2011-02-04 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5 4 6 5 1 2 2011
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1470899 HOUSEHOLD_2_514_WI_3_evaluation HOUSEHOLD_2_514 HOUSEHOLD_2 HOUSEHOLD WI_3 WI 2016-05-18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.071429 0.166667 0.142857 0.142857 0.166667 0.142857 0.142857 0.166667 0.142857 0.214286 0.133333 9.0 2.0 5.0 6.0 1936 18 4 20 2 5 2016
1470900 HOUSEHOLD_2_514_WI_3_evaluation HOUSEHOLD_2_514 HOUSEHOLD_2 HOUSEHOLD WI_3 WI 2016-05-19 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.071429 0.100000 0.142857 0.142857 0.166667 0.142857 0.071429 0.166667 0.000000 0.214286 0.133333 10.0 3.0 6.0 7.0 1937 19 5 20 2 5 2016
1470901 HOUSEHOLD_2_514_WI_3_evaluation HOUSEHOLD_2_514 HOUSEHOLD_2 HOUSEHOLD WI_3 WI 2016-05-20 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.071429 0.100000 0.142857 0.071429 0.166667 0.000000 0.071429 0.166667 0.142857 0.285714 0.166667 11.0 4.0 7.0 0.0 1938 20 6 20 2 5 2016
1470902 HOUSEHOLD_2_514_WI_3_evaluation HOUSEHOLD_2_514 HOUSEHOLD_2 HOUSEHOLD WI_3 WI 2016-05-21 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.071429 0.066667 0.142857 0.071429 0.166667 0.000000 0.071429 0.166667 0.142857 0.285714 0.166667 12.0 5.0 8.0 1.0 1939 21 7 20 2 5 2016
1470903 HOUSEHOLD_2_514_WI_3_evaluation HOUSEHOLD_2_514 HOUSEHOLD_2 HOUSEHOLD WI_3 WI 2016-05-22 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.071429 0.066667 0.142857 0.071429 0.166667 0.000000 0.071429 0.166667 0.142857 0.285714 0.166667 13.0 6.0 9.0 2.0 1940 22 1 20 2 5 2016

1470904 rows × 39 columns

PySpark DataFrame

df_features = feature_extractor.transform(df).localCheckpoint()
df_features.show(10)
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
|                  id|    item_id|dept_id|cat_id|store_id|state_id|      date|sales|lag_7|lag_14|lag_21|lag_28|lag_35|lag_42|lag_49|lag_56|window_7_lag_7_mean|window_14_lag_7_mean|window_30_lag_7_mean|window_7_lag_14_mean|window_14_lag_14_mean|window_30_lag_14_mean|window_7_lag_21_mean|window_14_lag_21_mean|window_30_lag_21_mean|window_7_lag_28_mean|window_14_lag_28_mean|window_30_lag_28_mean|count_consecutive_value_lag_7|count_consecutive_value_lag_14|count_consecutive_value_lag_21|count_consecutive_value_lag_28|history_length|day_of_month|day_of_week|week_of_year|quarter|month|year|
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-01-31|  2.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             1|          31|          2|           5|      1|    1|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-01|  0.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             2|           1|          3|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-02|  0.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             3|           2|          4|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-03|  0.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             4|           3|          5|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-04|  0.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             5|           4|          6|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-05|  0.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             6|           5|          7|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-06|  1.0| null|  null|  null|  null|  null|  null|  null|  null|               null|                null|                null|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                         null|                          null|                          null|                          null|             7|           6|          1|           5|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-07|  0.0|  2.0|  null|  null|  null|  null|  null|  null|  null|                2.0|                 2.0|                 2.0|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                            0|                          null|                          null|                          null|             8|           7|          2|           6|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-08|  0.0|  0.0|  null|  null|  null|  null|  null|  null|  null|                1.0|                 1.0|                 1.0|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                            1|                          null|                          null|                          null|             9|           8|          3|           6|      1|    2|2011|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2011-02-09|  0.0|  0.0|  null|  null|  null|  null|  null|  null|  null| 0.6666666666666666|  0.6666666666666666|  0.6666666666666666|                null|                 null|                 null|                null|                 null|                 null|                null|                 null|                 null|                            2|                          null|                          null|                          null|            10|           9|          4|           6|      1|    2|2011|
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
only showing top 10 rows

Training/Test Dataset

# Hold out the last 28 days (2016-04-25 to 2016-05-22) for testing; train on everything before
df_train = df_features.filter(F.col("date") < "2016-04-25")
df_test = df_features.filter(F.col("date") >= "2016-04-25")
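The split above is a single backtest fold: training data ends on 2016-04-24, and the 28 test days (2016-04-25 to 2016-05-22, the last date in the sample) match the `max_forecast_horizon` used in the training step. Backtesting generalizes this by repeating the split at several earlier cutoffs. The sketch below generates such rolling-origin folds in plain Python; `backtest_windows` is a hypothetical helper, not part of ForecastFlowML, and its most recent fold reproduces the split above.

```python
from datetime import date, timedelta


def backtest_windows(last_date, horizon, n_folds, step=None):
    """Rolling-origin backtest folds as (test_start, test_end) pairs, oldest first.

    Each fold trains on dates < test_start and evaluates on the `horizon`
    days from test_start to test_end (inclusive).
    """
    step = horizon if step is None else step
    windows = []
    for fold in range(n_folds):
        test_end = last_date - timedelta(days=fold * step)
        test_start = test_end - timedelta(days=horizon - 1)
        windows.append((test_start, test_end))
    return windows[::-1]


for test_start, test_end in backtest_windows(date(2016, 5, 22), horizon=28, n_folds=3):
    print(f"train: date < {test_start}  test: {test_start} .. {test_end}")
```

Each fold would then be filtered from `df_features` exactly like `df_train` and `df_test` above, trained, and scored.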

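Training

forecast_flow = ForecastFlowML(
    group_col="store_id",  # train a separate set of models for each store
    id_col="id",
    date_col="date",
    target_col="sales",
    categorical_cols=["item_id", "dept_id", "cat_id"],
    date_frequency="days",
    model_horizon=7,  # each model covers 7 consecutive forecast steps
    max_forecast_horizon=28,  # forecast up to 28 days ahead, i.e. 28 / 7 = 4 models per store
    model=LGBMRegressor(),
)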
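With `model_horizon=7` and `max_forecast_horizon=28`, the `forecast_horizon` column in the training output lists the forecast steps in blocks of seven ([1, ..., 7], [8, ..., 14], ...), which suggests four direct models per store, each responsible for one block. The sketch below reproduces that split together with the rule that keeps direct forecasting free of leakage: a model that predicts up to h days ahead can only rely on values lagged by at least h days. Whether ForecastFlowML filters features in exactly this form is an assumption; `horizon_blocks` and `usable_lags` are hypothetical helpers.

```python
def horizon_blocks(model_horizon, max_forecast_horizon):
    """Split forecast steps 1..max_forecast_horizon into blocks of model_horizon steps."""
    return [
        list(range(start, min(start + model_horizon, max_forecast_horizon + 1)))
        for start in range(1, max_forecast_horizon + 1, model_horizon)
    ]


def usable_lags(block, lags):
    """Lags that are still observed for every step in `block`.

    Forecasting h steps ahead, a lag-k value is known only if k >= h, so a
    model serving the whole block can only use lags >= max(block).
    """
    return [k for k in lags if k >= max(block)]


lags = [7 * (i + 1) for i in range(8)]  # lag_7 .. lag_56, as in the feature step
for block in horizon_blocks(model_horizon=7, max_forecast_horizon=28):
    print(block[0], "-", block[-1], "->", usable_lags(block, lags))
```

The same reasoning applies to the `window_w_lag_k_mean` and `count_consecutive_value_lag_k` features, which only look at values at least k days back; choosing k in multiples of 7 lines them up with the block boundaries.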

PySpark DataFrame with Distributed Results

trained_models = forecast_flow.train(df_train).localCheckpoint()
trained_models.show()
+-----+--------------------+--------------------+--------------------+--------------------+---------------+
|group|    forecast_horizon|               model|          start_time|            end_time|elapsed_seconds|
+-----+--------------------+--------------------+--------------------+--------------------+---------------+
| CA_2|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.8|
| CA_3|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.2|
| WI_2|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.2|
| WI_3|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            2.9|
| CA_1|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            4.3|
| CA_4|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.5|
| TX_1|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.2|
| TX_3|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.0|
| WI_1|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            2.0|
| TX_2|[[1, 2, 3, 4, 5, ...|[€clightgbm.skle...|01-May-2023 (03:2...|01-May-2023 (03:2...|            3.8|
+-----+--------------------+--------------------+--------------------+--------------------+---------------+
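The `model` column holds the fitted LightGBM models in binary form. The visible fragment `lightgbm.sklearn\nLGBMRegressor` is what a Python pickle stream of an `LGBMRegressor` contains, and the leading `€` is most likely the pickle protocol byte 0x80 rendered as text, so the column appears to store pickled models. The sketch below shows that round trip with the standard library only, using `statistics.NormalDist` as a stand-in for a fitted model; it does not call ForecastFlowML.

```python
import pickle
from statistics import NormalDist

# "Fit" a stand-in model: a normal distribution estimated from a few samples
model = NormalDist.from_samples([1.0, 2.0, 3.0])

# Serialize it to bytes, as a binary DataFrame column would hold it
model_bytes = pickle.dumps(model)

# The module path and class name are embedded in the stream, just as
# lightgbm.sklearn / LGBMRegressor are visible in the model column
print(b"statistics" in model_bytes and b"NormalDist" in model_bytes)  # True

# Deserialize it later, e.g. where predictions are made
restored = pickle.loads(model_bytes)
print(restored.mean, restored.stdev)  # 2.0 1.0
```

Because unpickling re-imports the class by its module path, the environment that loads the models needs the same library (here LightGBM) installed.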

PySpark DataFrame with Local Results

forecast_flow.train(df_train, local_result=True)
forecast_flow.model_
group forecast_horizon model start_time end_time elapsed_seconds
0 CA_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:31) 01-May-2023 (03:22:38) 6.6
1 CA_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:38) 01-May-2023 (03:22:42) 3.6
2 WI_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:42) 01-May-2023 (03:22:47) 5.1
3 WI_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:47) 01-May-2023 (03:22:51) 3.2
4 CA_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:30) 01-May-2023 (03:22:37) 7.5
5 CA_4 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:38) 01-May-2023 (03:22:41) 3.8
6 TX_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:42) 01-May-2023 (03:22:47) 5.3
7 TX_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:48) 01-May-2023 (03:22:51) 3.4
8 WI_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:51) 01-May-2023 (03:22:54) 2.4
9 TX_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:22:28) 01-May-2023 (03:22:33) 4.7
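The start and end times in the table above hint at the parallelism: the ten fits overlap in time. The short calculation below, using the timings copied from that table, compares the sum of the per-store `elapsed_seconds` with the wall-clock span from the first start to the last end. It is only a rough check, since the timestamps are rounded to whole seconds.

```python
from datetime import datetime

# (group, start_time, end_time, elapsed_seconds) from the local-results table above;
# all runs are on 01-May-2023, so only the time of day is kept
runs = [
    ("CA_2", "03:22:31", "03:22:38", 6.6),
    ("CA_3", "03:22:38", "03:22:42", 3.6),
    ("WI_2", "03:22:42", "03:22:47", 5.1),
    ("WI_3", "03:22:47", "03:22:51", 3.2),
    ("CA_1", "03:22:30", "03:22:37", 7.5),
    ("CA_4", "03:22:38", "03:22:41", 3.8),
    ("TX_1", "03:22:42", "03:22:47", 5.3),
    ("TX_3", "03:22:48", "03:22:51", 3.4),
    ("WI_1", "03:22:51", "03:22:54", 2.4),
    ("TX_2", "03:22:28", "03:22:33", 4.7),
]


def parse(clock):
    """Parse an HH:MM:SS time of day."""
    return datetime.strptime(clock, "%H:%M:%S")


total_fit_seconds = sum(elapsed for _, _, _, elapsed in runs)
wall_clock_seconds = (
    max(parse(end) for _, _, end, _ in runs) - min(parse(start) for _, start, _, _ in runs)
).total_seconds()

print(f"sum of per-store fits: {total_fit_seconds:.1f} s")  # 45.6 s
print(f"wall clock:            {wall_clock_seconds:.0f} s")  # 26 s
```

Roughly 45.6 seconds of fitting finished within about 26 seconds, consistent with several stores being trained at the same time on the `local[4]` session.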

Pandas DataFrame

forecast_flow.train(df_train.toPandas(), spark=spark)
forecast_flow.model_
group forecast_horizon model start_time end_time elapsed_seconds
0 CA_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:16) 01-May-2023 (03:23:21) 4.4
1 CA_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:21) 01-May-2023 (03:23:25) 3.4
2 WI_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:25) 01-May-2023 (03:23:28) 3.0
3 WI_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:28) 01-May-2023 (03:23:32) 3.3
4 CA_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:14) 01-May-2023 (03:23:20) 5.8
5 CA_4 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:21) 01-May-2023 (03:23:24) 3.3
6 TX_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:24) 01-May-2023 (03:23:28) 3.4
7 TX_3 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:28) 01-May-2023 (03:23:32) 3.4
8 WI_1 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:32) 01-May-2023 (03:23:34) 2.2
9 TX_2 [[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13,... [€clightgbm.sklearn\nLGBMRegressor\nq)q}q... 01-May-2023 (03:23:12) 01-May-2023 (03:23:17) 5.0

Prediction

PySpark DataFrame

forecast = forecast_flow.predict(df_test, trained_models).localCheckpoint()
forecast.show(10)
+-----+--------------------+----------+----------+
|group|                  id|      date|prediction|
+-----+--------------------+----------+----------+
| CA_2|FOODS_1_179_CA_2_...|2016-04-25|  0.481568|
| CA_2|FOODS_1_179_CA_2_...|2016-04-26|0.46724537|
| CA_2|FOODS_1_179_CA_2_...|2016-04-27|0.41596597|
| CA_2|FOODS_1_179_CA_2_...|2016-04-28|0.40775877|
| CA_2|FOODS_1_179_CA_2_...|2016-04-29|0.43439913|
| CA_2|FOODS_1_179_CA_2_...|2016-04-30| 0.4951446|
| CA_2|FOODS_1_179_CA_2_...|2016-05-01| 0.4308696|
| CA_2|FOODS_1_192_CA_2_...|2016-04-25| 0.2172628|
| CA_2|FOODS_1_192_CA_2_...|2016-04-26| 0.1687214|
| CA_2|FOODS_1_192_CA_2_...|2016-04-27| 0.1687214|
+-----+--------------------+----------+----------+
only showing top 10 rows
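For backtesting, the predictions are compared with the observed sales for the same `id` and `date`. In PySpark that is a join of `forecast` with `df_test.select("id", "date", "sales")` on `["id", "date"]`, followed by an aggregation per `group`. The plain-Python sketch below shows the same logic for mean absolute error; `mae_per_group` is a hypothetical helper and the toy rows are illustrative, not values from this run.

```python
from collections import defaultdict


def mae_per_group(predictions, actuals):
    """Mean absolute error per group.

    predictions: iterable of (group, id, date, prediction) rows, shaped like `forecast`
    actuals: dict mapping (id, date) -> observed sales, taken from the test set
    Predictions without a matching actual are skipped.
    """
    errors = defaultdict(list)
    for group, item_id, day, prediction in predictions:
        actual = actuals.get((item_id, day))
        if actual is not None:
            errors[group].append(abs(prediction - actual))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}


# Toy rows for illustration only
predictions = [
    ("CA_2", "item_a", "2016-04-25", 0.5),
    ("CA_2", "item_a", "2016-04-26", 1.0),
    ("TX_1", "item_b", "2016-04-25", 2.0),
]
actuals = {
    ("item_a", "2016-04-25"): 0.0,
    ("item_a", "2016-04-26"): 2.0,
    ("item_b", "2016-04-25"): 3.0,
}
print(mae_per_group(predictions, actuals))  # {'CA_2': 0.75, 'TX_1': 1.0}
```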
df_test.show()
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
|                  id|    item_id|dept_id|cat_id|store_id|state_id|      date|sales|lag_7|lag_14|lag_21|lag_28|lag_35|lag_42|lag_49|lag_56|window_7_lag_7_mean|window_14_lag_7_mean|window_30_lag_7_mean|window_7_lag_14_mean|window_14_lag_14_mean|window_30_lag_14_mean|window_7_lag_21_mean|window_14_lag_21_mean|window_30_lag_21_mean|window_7_lag_28_mean|window_14_lag_28_mean|window_30_lag_28_mean|count_consecutive_value_lag_7|count_consecutive_value_lag_14|count_consecutive_value_lag_21|count_consecutive_value_lag_28|history_length|day_of_month|day_of_week|week_of_year|quarter|month|year|
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-25|  0.0|  0.0|   0.0|   1.0|   0.0|   0.0|   1.0|   0.0|   0.0|                1.0|  0.5714285714285714|                 0.8| 0.14285714285714285|   0.7857142857142857|                  0.8|  1.4285714285714286|   0.7142857142857143|   0.9666666666666667|                 0.0|   0.5714285714285714|   0.7333333333333333|                            1|                             5|                             0|                             8|          1912|          25|          2|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-26|  2.0|  0.0|   0.0|   0.0|   0.0|   0.0|   0.0|   4.0|   0.0|                1.0|  0.5714285714285714|                 0.6| 0.14285714285714285|   0.7857142857142857|   0.6666666666666666|  1.4285714285714286|   0.7142857142857143|   0.9666666666666667|                 0.0|   0.5714285714285714|   0.6333333333333333|                            2|                             6|                             1|                             9|          1913|          26|          3|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-27|  0.0|  2.0|   0.0|   1.0|   0.0|   0.0|   1.0|   2.0|   0.0| 1.2857142857142858|  0.6428571428571429|  0.6666666666666666|                 0.0|   0.7857142857142857|   0.6333333333333333|  1.5714285714285714|   0.7857142857142857|                  1.0|                 0.0|                  0.5|   0.6333333333333333|                            0|                             7|                             0|                            10|          1914|          27|          4|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-28|  1.0|  0.0|   4.0|   0.0|   0.0|   0.0|   1.0|   0.0|   0.0| 0.7142857142857143|  0.6428571428571429|  0.6666666666666666|  0.5714285714285714|   1.0714285714285714|   0.7666666666666667|  1.5714285714285714|   0.7857142857142857|   0.8666666666666667|                 0.0|  0.42857142857142855|   0.6333333333333333|                            1|                             0|                             1|                            11|          1915|          28|          5|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-29|  0.0|  0.0|   0.0|   0.0|   4.0|   0.0|   0.0|   0.0|   0.0| 0.7142857142857143|  0.6428571428571429|  0.6666666666666666|  0.5714285714285714|   0.7857142857142857|   0.7333333333333333|                 1.0|   0.7857142857142857|                  0.8|  0.5714285714285714|   0.7142857142857143|   0.7666666666666667|                            2|                             1|                             2|                             0|          1916|          29|          6|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-04-30|  1.0|  2.0|   0.0|   0.0|   0.0|   0.0|   0.0|   0.0|   0.0|                1.0|  0.7857142857142857|  0.7333333333333333|  0.5714285714285714|   0.7857142857142857|                  0.7|                 1.0|   0.7857142857142857|                  0.8|  0.5714285714285714|   0.7142857142857143|   0.7666666666666667|                            0|                             2|                             3|                             1|          1917|          30|          7|          17|      2|    4|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-01|  4.0|  0.0|   3.0|   0.0|   5.0|   0.0|   6.0|   4.0|   0.0| 0.5714285714285714|  0.7857142857142857|  0.7333333333333333|                 1.0|   0.6428571428571429|                  0.8|  0.2857142857142857|   0.7857142857142857|                  0.8|  1.2857142857142858|   0.6428571428571429|   0.9333333333333333|                            1|                             0|                             4|                             0|          1918|           1|          1|          17|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-02|  0.0|  0.0|   0.0|   0.0|   1.0|   0.0|   0.0|   1.0|   0.0| 0.5714285714285714|  0.7857142857142857|  0.7333333333333333|                 1.0|   0.5714285714285714|                  0.8| 0.14285714285714285|   0.7857142857142857|                  0.8|  1.4285714285714286|   0.7142857142857143|   0.9666666666666667|                            2|                             1|                             5|                             0|          1919|           2|          2|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-03|  0.0|  2.0|   0.0|   0.0|   0.0|   0.0|   0.0|   0.0|   4.0| 0.8571428571428571|  0.9285714285714286|                 0.8|                 1.0|   0.5714285714285714|                  0.6| 0.14285714285714285|   0.7857142857142857|   0.6666666666666666|  1.4285714285714286|   0.7142857142857143|   0.9666666666666667|                            0|                             2|                             6|                             1|          1920|           3|          3|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-04|  0.0|  0.0|   2.0|   0.0|   1.0|   0.0|   0.0|   1.0|   2.0| 0.5714285714285714|  0.9285714285714286|                 0.8|  1.2857142857142858|   0.6428571428571429|   0.6666666666666666|                 0.0|   0.7857142857142857|   0.6333333333333333|  1.5714285714285714|   0.7857142857142857|                  1.0|                            1|                             0|                             7|                             0|          1921|           4|          4|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-05|  0.0|  1.0|   0.0|   4.0|   0.0|   0.0|   0.0|   1.0|   0.0| 0.7142857142857143|  0.7142857142857143|  0.8333333333333334|  0.7142857142857143|   0.6428571428571429|   0.6666666666666666|  0.5714285714285714|   1.0714285714285714|   0.7666666666666667|  1.5714285714285714|   0.7857142857142857|   0.8666666666666667|                            0|                             1|                             0|                             1|          1922|           5|          5|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-06|  4.0|  0.0|   0.0|   0.0|   0.0|   4.0|   0.0|   0.0|   0.0| 0.7142857142857143|  0.7142857142857143|  0.8333333333333334|  0.7142857142857143|   0.6428571428571429|   0.6666666666666666|  0.5714285714285714|   0.7857142857142857|   0.7333333333333333|                 1.0|   0.7857142857142857|                  0.8|                            1|                             2|                             1|                             2|          1923|           6|          6|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-07|  0.0|  1.0|   2.0|   0.0|   0.0|   0.0|   0.0|   0.0|   0.0| 0.5714285714285714|  0.7857142857142857|  0.8666666666666667|                 1.0|   0.7857142857142857|   0.7333333333333333|  0.5714285714285714|   0.7857142857142857|                  0.7|                 1.0|   0.7857142857142857|                  0.8|                            0|                             0|                             2|                             3|          1924|           7|          7|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-08|  0.0|  4.0|   0.0|   3.0|   0.0|   5.0|   0.0|   6.0|   4.0| 1.1428571428571428|  0.8571428571428571|  0.8666666666666667|  0.5714285714285714|   0.7857142857142857|   0.7333333333333333|                 1.0|   0.6428571428571429|                  0.8|  0.2857142857142857|   0.7857142857142857|                  0.8|                            0|                             1|                             0|                             4|          1925|           8|          1|          18|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-09|  1.0|  0.0|   0.0|   0.0|   0.0|   1.0|   0.0|   0.0|   1.0| 1.1428571428571428|  0.8571428571428571|  0.8666666666666667|  0.5714285714285714|   0.7857142857142857|   0.7333333333333333|                 1.0|   0.5714285714285714|                  0.8| 0.14285714285714285|   0.7857142857142857|                  0.8|                            1|                             2|                             1|                             5|          1926|           9|          2|          19|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-10|  0.0|  0.0|   2.0|   0.0|   0.0|   0.0|   0.0|   0.0|   0.0| 0.8571428571428571|  0.8571428571428571|                 0.7|  0.8571428571428571|   0.9285714285714286|                  0.8|                 1.0|   0.5714285714285714|                  0.6| 0.14285714285714285|   0.7857142857142857|   0.6666666666666666|                            2|                             0|                             2|                             6|          1927|          10|          3|          19|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-11|  0.0|  0.0|   0.0|   2.0|   0.0|   1.0|   0.0|   0.0|   1.0| 0.8571428571428571|  0.7142857142857143|  0.6666666666666666|  0.5714285714285714|   0.9285714285714286|                  0.8|  1.2857142857142858|   0.6428571428571429|   0.6666666666666666|                 0.0|   0.7857142857142857|   0.6333333333333333|                            3|                             1|                             0|                             7|          1928|          11|          4|          19|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-12|  1.0|  0.0|   1.0|   0.0|   4.0|   0.0|   0.0|   0.0|   1.0| 0.7142857142857143|  0.7142857142857143|  0.6666666666666666|  0.7142857142857143|   0.7142857142857143|   0.8333333333333334|  0.7142857142857143|   0.6428571428571429|   0.6666666666666666|  0.5714285714285714|   1.0714285714285714|   0.7666666666666667|                            4|                             0|                             1|                             0|          1929|          12|          5|          19|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-13|  0.0|  4.0|   0.0|   0.0|   0.0|   0.0|   4.0|   0.0|   0.0| 1.2857142857142858|                 1.0|  0.7666666666666667|  0.7142857142857143|   0.7142857142857143|   0.8333333333333334|  0.7142857142857143|   0.6428571428571429|   0.6666666666666666|  0.5714285714285714|   0.7857142857142857|   0.7333333333333333|                            0|                             1|                             2|                             1|          1930|          13|          6|          19|      2|    5|2016|
|FOODS_1_011_WI_2_...|FOODS_1_011|FOODS_1| FOODS|    WI_2|      WI|2016-05-14|  0.0|  0.0|   1.0|   2.0|   0.0|   0.0|   0.0|   0.0|   0.0| 1.1428571428571428|  0.8571428571428571|  0.7666666666666667|  0.5714285714285714|   0.7857142857142857|   0.8666666666666667|                 1.0|   0.7857142857142857|   0.7333333333333333|  0.5714285714285714|   0.7857142857142857|                  0.7|                            1|                             0|                             0|                             2|          1931|          14|          7|          19|      2|    5|2016|
+--------------------+-----------+-------+------+--------+--------+----------+-----+-----+------+------+------+------+------+------+------+-------------------+--------------------+--------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+--------------------+---------------------+---------------------+-----------------------------+------------------------------+------------------------------+------------------------------+--------------+------------+-----------+------------+-------+-----+----+
only showing top 20 rows

Pandas DataFrame

forecast_flow.predict(df_test.toPandas(), spark=spark)
       group                               id        date  prediction
0       CA_2      FOODS_1_179_CA_2_evaluation  2016-04-25    0.481568
1       CA_2      FOODS_1_179_CA_2_evaluation  2016-04-26    0.467245
2       CA_2      FOODS_1_179_CA_2_evaluation  2016-04-27    0.415966
3       CA_2      FOODS_1_179_CA_2_evaluation  2016-04-28    0.407759
4       CA_2      FOODS_1_179_CA_2_evaluation  2016-04-29    0.434399
...      ...                              ...         ...         ...
26427   TX_2  HOUSEHOLD_2_481_TX_2_evaluation  2016-05-18    0.215980
26428   TX_2  HOUSEHOLD_2_481_TX_2_evaluation  2016-05-19    0.215980
26429   TX_2  HOUSEHOLD_2_481_TX_2_evaluation  2016-05-20    0.222249
26430   TX_2  HOUSEHOLD_2_481_TX_2_evaluation  2016-05-21    0.334569
26431   TX_2  HOUSEHOLD_2_481_TX_2_evaluation  2016-05-22    0.313987

26432 rows × 4 columns
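A quick sanity check on the returned pandas DataFrame is to confirm that every series got the same number of forecast days. The dates above run from 2016-04-25 to 2016-05-22 (28 days), and 26432 / 28 = 944, which is consistent with a 28-day horizon for each of 944 series. The sketch below runs that check on a small made-up frame with the same columns (`group`, `id`, `date`, `prediction`). The helper `horizon_per_series` is hypothetical and is not part of ForecastFlowML.

```python
import pandas as pd


def horizon_per_series(forecast: pd.DataFrame) -> pd.Series:
    """Number of distinct forecast dates per (group, id); hypothetical helper."""
    return forecast.groupby(["group", "id"])["date"].nunique()


# Toy frame shaped like the predict() output above (values are made up)
forecast_pd = pd.DataFrame(
    {
        "group": ["CA_2"] * 3 + ["TX_2"] * 3,
        "id": ["A_CA_2"] * 3 + ["B_TX_2"] * 3,
        "date": ["2016-04-25", "2016-04-26", "2016-04-27"] * 2,
        "prediction": [0.48, 0.47, 0.42, 0.22, 0.33, 0.31],
    }
)

horizons = horizon_per_series(forecast_pd)
# Every series should have the same horizon length
assert horizons.nunique() == 1
print(horizons)
```

If the horizon really is 28 days, applying the same check to the real output should report 28 for every series.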

Visualize Predictions

past_future = (
    df.select("id", "store_id", "date", "sales")
    .join(forecast, on=["id", "date"], how="left")
    .groupBy("store_id", "date")
    .agg(
        F.sum("sales").alias("sales"),
        F.sum("prediction").alias("prediction"),
    )
    .orderBy("store_id", "date")
    .toPandas()
)
pio.renderers.default = "notebook"
fig = px.line(
    past_future,
    x="date",
    y=["sales", "prediction"],
    facet_row_spacing=0.04,
    facet_col="store_id",
    facet_col_wrap=2,
    height=1000,
    width=720,
)
fig.update_layout(
    legend=dict(orientation="h", yanchor="top", y=1.07, xanchor="center", x=0.5),
    margin=dict(l=0, r=10, t=5, b=5),
    legend_title="",
)
fig.update_traces(line=dict(width=1.7))
fig.update_yaxes(matches=None, title="")
fig.update_xaxes(type="date", range=["2015-11-01", "2016-05-22"])
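The Spark aggregation above left-joins the forecast onto the full sales history, so historical dates have a null `prediction`. Because `F.sum` over only nulls returns null, the prediction line is drawn only over the forecast window. The sketch below reproduces that logic in pandas on toy data. One difference to watch for is that pandas' `sum` turns an all-missing group into 0 unless you pass `min_count=1`. The frames and values here are illustrative only.

```python
import pandas as pd

# Toy history: two series in one store, three days each (made-up values)
history = pd.DataFrame(
    {
        "id": ["A", "A", "A", "B", "B", "B"],
        "store_id": ["CA_2"] * 6,
        "date": ["2016-04-24", "2016-04-25", "2016-04-26"] * 2,
        "sales": [1.0, 0.0, 2.0, 3.0, 1.0, 0.0],
    }
)

# Toy forecast covering only the last two days
forecast = pd.DataFrame(
    {
        "id": ["A", "A", "B", "B"],
        "date": ["2016-04-25", "2016-04-26"] * 2,
        "prediction": [0.5, 0.25, 1.0, 0.75],
    }
)

past_future = (
    history.merge(forecast, on=["id", "date"], how="left")
    .groupby(["store_id", "date"], as_index=False)[["sales", "prediction"]]
    # min_count=1 keeps all-missing groups as NaN, matching Spark's F.sum
    .sum(min_count=1)
    .sort_values(["store_id", "date"])
)
print(past_future)
```

In this toy example, the first date has `sales` but a NaN `prediction`, so a line plot leaves a gap there rather than drawing a misleading zero.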

Backtesting

cv_forecast = forecast_flow.cross_validate(df_train).localCheckpoint()
cv_forecast.show(10)
+-----+--------------------+----------+---+------+-----------+
|group|                  id|      date| cv|target| prediction|
+-----+--------------------+----------+---+------+-----------+
| CA_2|FOODS_1_179_CA_2_...|2016-03-28|  0|   0.0| 0.44766802|
| CA_2|FOODS_1_179_CA_2_...|2016-03-29|  0|   0.0| 0.43386874|
| CA_2|FOODS_1_179_CA_2_...|2016-03-30|  0|   0.0| 0.40635538|
| CA_2|FOODS_1_179_CA_2_...|2016-03-31|  0|   1.0|  0.3618364|
| CA_2|FOODS_1_179_CA_2_...|2016-04-01|  0|   0.0| 0.40051356|
| CA_2|FOODS_1_179_CA_2_...|2016-04-02|  0|   1.0| 0.42851403|
| CA_2|FOODS_1_179_CA_2_...|2016-04-03|  0|   0.0| 0.40656742|
| CA_2|FOODS_1_192_CA_2_...|2016-03-28|  0|   0.0| 0.13468084|
| CA_2|FOODS_1_192_CA_2_...|2016-03-29|  0|   0.0|0.103752814|
| CA_2|FOODS_1_192_CA_2_...|2016-03-30|  0|   2.0|0.103752814|
+-----+--------------------+----------+---+------+-----------+
only showing top 10 rows
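The backtest output has one row per series, date and fold (`cv`), with the actual `target` next to the `prediction`. That layout makes it easy to summarize accuracy per store and fold with a group-by. The sketch below computes RMSE and MAE in pandas on made-up rows with the same columns. The choice of metrics and the `fold_metrics` helper are our own assumptions, not part of ForecastFlowML. On the real result you would first collect it with `cv_forecast.toPandas()`, or compute the same aggregates with `pyspark.sql.functions`.

```python
import numpy as np
import pandas as pd


def fold_metrics(cv: pd.DataFrame) -> pd.DataFrame:
    """RMSE and MAE per group and CV fold (hypothetical helper)."""
    err = cv.assign(error=cv["prediction"] - cv["target"])
    return (
        err.groupby(["group", "cv"])["error"]
        .agg(
            rmse=lambda e: float(np.sqrt(np.mean(e**2))),
            mae=lambda e: float(np.mean(np.abs(e))),
        )
        .reset_index()
    )


# Made-up backtest rows with the same columns as cv_forecast above
cv = pd.DataFrame(
    {
        "group": ["CA_2"] * 4,
        "id": ["A"] * 4,
        "date": ["2016-03-28", "2016-03-29", "2016-04-04", "2016-04-05"],
        "cv": [0, 0, 1, 1],
        "target": [0.0, 1.0, 2.0, 0.0],
        "prediction": [1.0, 0.0, 2.0, 2.0],
    }
)
print(fold_metrics(cv))
```

Both folds here have the same MAE (1.0), but fold 1 has a higher RMSE (about 1.414) because its error is concentrated in a single large miss.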

Visualize Cross Validation

# Give the plotting frame its own name so cv_forecast stays a Spark DataFrame
cv_plot = (
    df_train.select("id", "store_id", "date", "sales")
    .join(
        cv_forecast.select("id", "date", "cv", "prediction"),
        on=["id", "date"],
        how="left",
    )
    # One prediction column per CV fold, named "0", "1", "2"
    .groupBy("id", "store_id", "date", "sales")
    .pivot("cv")
    .sum("prediction")
    .groupBy("store_id", "date")
    .agg(
        F.sum("sales").alias("sales"),
        *[F.sum(str(i)).alias(f"cv_{i}") for i in range(3)],
    )
    .orderBy("store_id", "date")
).toPandas()
pio.renderers.default = "notebook"
fig = px.line(
    cv_plot,
    x="date",
    y=["sales", *[f"cv_{i}" for i in range(3)]],
    facet_row_spacing=0.04,
    facet_col="store_id",
    facet_col_wrap=2,
    height=1000,
    width=720,
)
fig.update_layout(
    legend=dict(orientation="h", yanchor="top", y=1.07, xanchor="center", x=0.5),
    margin=dict(l=0, r=10, t=5, b=5),
    legend_title="",
)
fig.update_traces(line=dict(width=1.7))
fig.update_yaxes(matches=None, title="")
fig.update_xaxes(type="date", range=["2015-11-01", "2016-04-24"])
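The `pivot("cv")` step in the plotting code turns the long backtest output into one prediction column per fold, which lets each fold be drawn as its own line. A pandas equivalent on toy data is sketched below. The `cv_{i}` column naming follows the plotting code, while the data and the `widen_folds` helper are made up for illustration. Unlike the plotting code, which hard-codes `range(3)`, this version derives the fold columns from the data.

```python
import pandas as pd


def widen_folds(cv: pd.DataFrame) -> pd.DataFrame:
    """Pivot long (id, date, cv, prediction) rows to one cv_<fold> column per fold."""
    wide = cv.pivot_table(
        index=["id", "date"], columns="cv", values="prediction", aggfunc="sum"
    )
    # Name the fold columns cv_0, cv_1, ... as in the plotting code
    wide.columns = [f"cv_{fold}" for fold in wide.columns]
    return wide.reset_index()


# Made-up backtest rows: two dates per fold for a single series
cv = pd.DataFrame(
    {
        "id": ["A"] * 4,
        "date": ["2016-03-28", "2016-03-29", "2016-04-04", "2016-04-05"],
        "cv": [0, 0, 1, 1],
        "prediction": [0.5, 0.25, 1.0, 0.75],
    }
)
print(widen_folds(cv))
```

Dates outside a fold's window are NaN in that fold's column, so each `cv_<fold>` line covers only its own backtest window.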