Primary sales forecasting using PA, SPSS Modeler and python

By Alexander Dvoinev posted Thu July 15, 2021 02:54 PM

  
Hello everybody, this is my first post!
What picture do you imagine when you read yet another article about sales forecasting? Something like this, with stable seasonality and trend, with 10 years of history available?

Figure 1 – Passengers dataset

 

With SPSS Modeler, it is easy to make a very accurate forecast for such a series: just use the “Time Series” node with “Expert Modeler” enabled.

This article is different. Look at this diagram. 

Figure 2 – Primary sales

This diagram shows the primary sales amount from a manufacturing company to a regional distributor. In this article, I will show you how to build a forecast model for such a time series using PA, SPSS Modeler and Python.

Planning Analytics part

… is very easy.

We have “Sales” cube with dimensions: Product, Store, Week, Measures.

There are 2 measures:

  • “Sales” with actual sales
  • “Sales forecast”, to which we will export the forecast from SPSS Modeler

The server data can be downloaded here: https://github.com/Dvoynev/Blog/tree/main/Iterative%20forecast%20model/PA

SPSS Modeler part

Let’s look closer at figure 2. We can hardly recognize the weekly seasonality within each year: the peaks spoil it.

If we select the year 2017 and look closer at these peaks, we will see that there is no seasonality in them at all: the distances between them differ a lot.

Figure 3 – 2017 year

These peaks are wholesale orders, and we can assume that a wholesale order appears when the distributor’s stock drops close to the irreducible stock balance. So, in addition to the obvious “from-date” features (week number in the month, month number and others), we will try “history” features (sales N weeks ago) and “moving sum” features.

A simple SPSS Modeler stream with the predictors described can be downloaded here: https://github.com/Dvoynev/Blog/tree/main/Iterative%20forecast%20model/SPSS

Figure 4 – simple stream

It performs the following tasks:

  1. Load actual sales from PA;
  2. Split it to “train” and “test” partitions (validation start and test length are set in stream parameters);
  3. Perform feature engineering:
    1. Create “date” features: week, month and so on;
    2. Create “history” features: previous weeks’ values and moving sums.
  4. Make a forecast with XGBoost.
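Steps 3 and 4 can be sketched in pandas. All column names, lag depths and the window size below are illustrative, not the stream’s actual field names:

```python
# Sketch of the feature engineering described above, using pandas.
import pandas as pd

# Weekly primary sales, indexed by week start date (toy data)
df = pd.DataFrame(
    {"Sales": [10, 0, 0, 80, 5, 0, 0, 90]},
    index=pd.date_range("2021-01-04", periods=8, freq="W-MON"),
)

# "Date" features: position of the week inside the year/month
df["week_of_year"] = df.index.isocalendar().week.astype(int)
df["month"] = df.index.month

# "History" features: sales N weeks ago
for lag in (1, 2, 4):
    df[f"Sales__{lag}"] = df["Sales"].shift(lag)

# "Moving sum" feature: sum of the previous 4 weeks.
# Built on the 1-week lag, so the current week is excluded (no leakage).
df["moving_sum_4"] = df["Sales"].shift(1).rolling(4).sum()

# The resulting frame (minus rows with NaN lags) is what the
# XGBoost node trains on.
print(df.tail(3))
```

The same lag and rolling-sum logic is what the History and @SUM nodes produce inside the stream.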

Let’s look at the resulting forecast, and brace yourself:

Figure 5 – Don’t look at it


Accuracy is rather good in the test partition, but it degrades dramatically in the validation partition. This is because of our “history” features:

Figure 6 – The Cause of forecast degradation

They stop working in the validation partition. We need to use previous weeks’ forecasts (instead of actual sales) to compute the “history” features there.
To do this, we could create 52 model nuggets, one for each week of the year, like this:

Figure 7 – Don’t look at this either

Stop, no! This looks ugly. We need something more elegant. We need the…

...Python part

We will use:

  • Python to loop through all forecast weeks and make a 1-week-ahead forecast for each week;
  • Planning Analytics as a buffer to store the forecasts.

Figure 8 - Loop

First, we should make a stream that forecasts one week ahead. The stream can be downloaded here: https://github.com/Dvoynev/Blog/tree/main/Iterative%20forecast%20model/SPSS

We just add three improvements to the previous stream:

  1. Before building the “history” features: add a “tm1 import” node to import previous weeks’ forecasts, and replace actual zero sales with those forecasts;
  2. Keep only one forecast week (set in a parameter) before the model nugget;
  3. Create a view with all the forecast periods in Planning Analytics. It will be used to loop through the weeks.
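Improvement 1, substituting the imported forecasts for still-unknown actuals, can be sketched like this (the column names are hypothetical, not the stream’s real field names):

```python
# Minimal sketch of improvement 1: where actual sales are zero (i.e. not
# yet known in the forecast horizon), substitute the forecast imported
# from Planning Analytics before computing "history" features.
import pandas as pd

df = pd.DataFrame({
    "Sales":          [10, 80, 0, 0],            # last two weeks: no actuals yet
    "Sales_forecast": [None, None, 70.0, 5.0],   # imported from the TM1 cube
})

# Keep actual sales where available, fall back to the forecast otherwise
df["Sales_filled"] = df["Sales"].where(df["Sales"] != 0, df["Sales_forecast"])
print(df["Sales_filled"].tolist())
```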

The resulting stream will look like this:

Figure 9 – Iterative stream

Finally, the Python part begins. To keep it short, the script will:

  • Train the XGBoost model;
  • Load the forecast periods into a list;
  • Loop through the periods to forecast each week.
  1. First, we define the stream variable and get the nodes by ID:

There is a very useful button in SPSS Modeler to get all the needed node IDs:

  2. Then we run the table node to list all forecast periods.
    The result will be as follows:
  3. The “ForecPeriod” parameter is used to calculate the partitions. We set the first period listed in the table node as the first forecast period, and then train the model once.
  4. Finally, we loop through all forecast periods, put each one into the “ForecPeriod” parameter and export the forecasts to TM1.
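Outside Modeler, the logic of this loop can be sketched in plain Python. The moving-average `predict_one_week` function below is a toy stand-in for the trained XGBoost nugget, and a plain list stands in for the Planning Analytics cube; all names and numbers are illustrative:

```python
# Sketch of the iterative 1-week-ahead loop: each week is forecast from
# previous weeks' values, and the forecast is fed back into the history so
# the next week's "history" features remain defined.
history = [10.0, 80.0, 5.0, 0.0, 90.0, 10.0]   # actual weekly sales
forecast_periods = ["2021-W07", "2021-W08", "2021-W09"]

def predict_one_week(hist):
    """Toy stand-in for the model nugget: average of the last 4 weeks."""
    return sum(hist[-4:]) / 4

forecasts = {}
for period in forecast_periods:
    f = predict_one_week(history)
    forecasts[period] = f
    history.append(f)   # in the real stream: export to TM1, re-import

print(forecasts)
```

In the real setup the feedback step goes through the “Sales forecast” measure in the cube, which is exactly why Planning Analytics works as the buffer between iterations.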

Do not forget to check “Run this script” on stream execution.

Finally, we have an accurate primary sales forecast:

Figure 10 – final forecast

Notes

  • The “History” node was used here for simplicity. If you have data for several customers, products and so on, you should not use the “History” node: it will pick up another product’s/client’s data. Use several Derive nodes with @OFFSET instead (checking that the client/product is the same at that offset).
  • The “@SUM” nodes use the “Sales__1” field (sales with offset 1) because the @SUM function also includes the current period in the sum. Including it would be “data leakage” and would lead to overfitting.
  • I do not describe time series forecasting theory and practice here; this is just an example of how to use PA + SPSS Modeler + Python for it.
    To build a good time series forecast you need to do a lot of other things: feature selection, data preparation, using promotion data, using external data (like weather) and many, many others.
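The data-leakage point from the second note can be demonstrated with pandas (toy data):

```python
# Why the moving sum must be built on "Sales__1" (the 1-week lag):
# a rolling sum over the raw series includes the current week's value,
# which is exactly the target we are trying to predict.
import pandas as pd

sales = pd.Series([10.0, 20.0, 30.0, 40.0])

leaky = sales.rolling(2).sum()           # includes the current week
safe = sales.shift(1).rolling(2).sum()   # previous weeks only

print(leaky.tolist())
print(safe.tolist())
```

A model trained on the “leaky” feature looks excellent on history and falls apart the moment the current week is unknown.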

Further reading

This approach duplicates the functionality of the “split” role, but it is the only way to make forecasts with the “Time Series” node when there are many splits: if the node encounters a time series of poor quality, the whole Time Series node fails for all the split models.
There is an RFE for this, please vote: https://ibm-data-and-ai.ideas.aha.io/ideas/MDLR-I-349
