
Forecasting the Demand for Car Spare Parts Using the Modified Croston Method

By Ji Zhou Liu posted Fri December 07, 2018 02:49 AM

  

Introduction

 

Forecasting demand helps a business balance the cost of holding stock against the cost of running short. Spare parts are a hard case: demand for them is irregular, and a shortage can be expensive. Demand appears at random, with many time periods showing no demand at all and only occasional periods showing some. Demand with this pattern is usually called intermittent demand.
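To make the idea of intermittency concrete, here is a minimal sketch in plain Python that summarizes a demand series by its share of zero-demand periods and its average demand interval (ADI, the number of periods divided by the number of periods with demand). The function name and the sample values are made up for illustration and are not part of the SPSS API. A commonly cited rule of thumb treats a series with an ADI above roughly 1.32 as intermittent, but that threshold is only a guideline.

```python
def intermittency_summary(demand):
    """Return (share of zero-demand periods, average demand interval)."""
    if not demand:
        raise ValueError("demand series is empty")
    n_periods = len(demand)
    n_nonzero = sum(1 for d in demand if d > 0)
    zero_share = (n_periods - n_nonzero) / n_periods
    # ADI: periods per period with demand; larger values mean sparser demand.
    adi = n_periods / n_nonzero if n_nonzero else float("inf")
    return zero_share, adi


# Made-up monthly demand for one part (illustration only).
monthly_demand = [0, 0, 2, 0, 0, 0, 1, 0, 0, 3, 0, 0]
zero_share, adi = intermittency_summary(monthly_demand)
print(zero_share, adi)  # 0.75 4.0
```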

 

The following graph shows an example of an intermittent time series:

sampleData.png

As the time series plot below shows, the value is 0 at most points in time and greater than 0 at only a few. This is consistent with real data for the parts inventory scenario described earlier.




sampleDataGrapy.png

A typical forecasting procedure uses simple exponential smoothing to forecast future demand. However, simple exponential smoothing performs poorly when demand is intermittent. Croston (1972) proposed a model that represents the demand pattern more explicitly by making separate estimates of the demand size and of the interval between demands. Compared with the standard method used previously, this greatly increased the accuracy of intermittent demand forecasts and of the inventory control based on them.
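The separate estimates described above can be sketched in a few lines of plain Python. This is a textbook version of Croston's recursion, not the SPSS implementation: the function name, the default smoothing values, and the initialization from the first non-zero demand are all assumptions chosen for illustration. The demand size and the interval between demands are each smoothed only in periods where demand occurs, and the forecast of average demand per period is their ratio.

```python
def croston_forecast(demand, alpha=0.1, beta=0.1):
    """Textbook Croston forecast of the average demand per period.

    alpha smooths the demand size, beta smooths the inter-demand interval.
    """
    size = None               # smoothed size of non-zero demands
    interval = None           # smoothed number of periods between demands
    periods_since_demand = 1
    for t, d in enumerate(demand):
        if d > 0:
            if size is None:
                # Initialize from the first non-zero demand (one common choice).
                size, interval = float(d), float(t + 1)
            else:
                size += alpha * (d - size)
                interval += beta * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    if size is None:
        return 0.0            # no demand was ever observed
    return size / interval


# A perfectly regular series: 3 units every third period, so 1 unit per period.
print(croston_forecast([0, 0, 3, 0, 0, 3, 0, 0, 3]))  # 1.0
```

Note that the estimates change only when demand occurs, so the forecast stays flat through runs of zero-demand periods.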

 

This blog provides an example of a real-world use case for using the modified Croston method.

 

User Scenarios

The use case comes from the car industry, where some (or all) of the parts are spare parts with intermittent demand series. As a first step in time series analysis, it is always worth checking whether a series is intermittent to gain a better understanding of the data under study. The following data is an example of actual auto parts data, with many demand series but only a moderate number of historical records.

 

Data file example (carparts_5.csv): The file contains five time series that were supplied by a US car company. The time series represent the monthly sales of slow-moving parts and cover a period of 51 months. The first row holds the index (serial number) of each auto part. Each column is the demand series for one of the parts. Thus, the data file has six columns (five time series plus a date/time column) and 52 rows.

 

Here's a screen capture of the carparts_5.csv file:

data.png

The following steps explain how to create a model and produce forecasts in an IBM Watson Studio notebook. We used Python 3.5 with Spark 2.1 as the kernel. The entire code is located here.

You can also get the entire code for Scala 2.11 with Spark 2.1 as the kernel here.

 

The modified Croston method requires that the data be non-negative and contain no missing values. A Time Series Data Preparation (TSDP) component is therefore usually run before modeling. TSDP prepares the input time series so that there are no missing values. We recommend setting all missing values to zero or to the minimum value of the series. Here is the TSDP part of the code in the notebook:



from spss.ml.forecasting.timeseriesdatapreparation import TimeSeriesDataPreparation
from spss.ml.forecasting.reversetimeseriesdatapreparation import ReverseTimeSeriesDataPreparation
from spss.ml.common.wrapper import LocalContainerManager
from spss.ml.forecasting.traditional.timeseriesforecastingcroston import TimeSeriesForecastingCroston, TimeSeriesForecastingCrostonModel
from spss.ml.forecasting.params.temporal import Fit, ForecastEs
from spss.ml.forecasting.params.predictor import Predictor, ScorePredictor


cons = LocalContainerManager()

tsdp = TimeSeriesDataPreparation(cons).\
        setMetricFieldList(["21311636","21012606", "21021840", "21048588", "21311636"]).\
        setDateTimeField("Date").\
        setEncodeSeriesID(True).\
        setInputTimeInterval("MONTH").\
        setOutTimeInterval("MONTH").\
        setMissingImputeType("MIN")
        
tsdpOutput = tsdp.transform(model_DF)

Note that here we replace missing values with the minimum value of the series. This is configured by the following setting:

setMissingImputeType("MIN")  
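To show what this kind of imputation does to a series, here is a minimal sketch in plain Python. It is not the TSDP implementation: the function name is hypothetical, "MIN" simply mirrors the setting name used above, and "ZERO" is a label invented here for the zero-fill alternative mentioned in the text.

```python
def impute_missing(series, method="MIN"):
    """Replace None entries with the series minimum ("MIN") or with 0 ("ZERO")."""
    observed = [v for v in series if v is not None]
    if method == "MIN":
        # Fall back to 0 when the series has no observed values at all.
        fill = min(observed) if observed else 0
    elif method == "ZERO":
        fill = 0
    else:
        raise ValueError("unknown method: %r" % method)
    return [fill if v is None else v for v in series]


print(impute_missing([3, None, 1, None]))          # [3, 1, 1, 1]
print(impute_missing([3, None, 1, None], "ZERO"))  # [3, 0, 1, 0]
```

For intermittent series that contain at least one zero, the two options give the same result, because the series minimum is already zero.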

Next, as an example, we set the last series (index: 21311636) as the target and build a Croston model. You can build Croston models for the other series with the same settings by replacing only the series label. Here we set the following parameters:



target_list = [["21311636"]]
target_predictor_list = [Predictor(target_list)]
tsmc = TimeSeriesForecastingCroston(cons). \
    setInputContainerKeys([tsdp.uid]). \
    setSameSmoothingParam(False). \
    setPredictionInterval("ORIGINAL"). \
    setObjFunction("MSE"). \
    setCILevel(0.95). \
    setTargetList(target_predictor_list)

tsmc_model = tsmc.fit(tsdpOutput) # build a Croston model

Review the explanation for each of the parameters that were set:

  • “setSameSmoothingParam(False)”: means that we are using the original Croston’s method without any modification or bias correction. The two smoothing parameters α and β are optimized by the algorithm and are allowed to differ from each other.
  • “setObjFunction("MSE")”: means that we are using the mean squared error as the cost function when fitting the model.
  • “setPredictionInterval("ORIGINAL")”: specifies which prediction intervals to compute. There are two kinds: Croston’s original prediction interval and the accurate interval by Shenstone and Hyndman. We selected Croston’s original prediction interval. You can also set it to “ACCURATE” for the accurate prediction interval, or to “BOTH” to get both.
  • We are using the default settings for the other model parameters.
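To illustrate what fitting separate smoothing parameters against an MSE objective means, here is a minimal sketch in plain Python. It is an assumption-laden stand-in, not the SPSS optimizer (which the text does not describe): a simple grid search is swapped in, the function names and grid values are hypothetical, and the sample demand values are made up. Each period's forecast uses only earlier observations, and the squared one-step-ahead errors are averaged.

```python
from itertools import product


def one_step_forecasts(demand, alpha, beta):
    """Croston forecast made before each period (None until initialized)."""
    size = interval = None
    gap = 1  # periods since the last non-zero demand
    forecasts = []
    for t, d in enumerate(demand):
        # The forecast for period t uses only data from before period t.
        forecasts.append(None if size is None else size / interval)
        if d > 0:
            if size is None:
                size, interval = float(d), float(t + 1)
            else:
                size += alpha * (d - size)
                interval += beta * (gap - interval)
            gap = 1
        else:
            gap += 1
    return forecasts


def mse(demand, alpha, beta):
    """Mean squared one-step-ahead error over periods that have a forecast."""
    pairs = [(d, f) for d, f in zip(demand, one_step_forecasts(demand, alpha, beta))
             if f is not None]
    if not pairs:
        raise ValueError("not enough demand history to score the model")
    return sum((d - f) ** 2 for d, f in pairs) / len(pairs)


def fit_by_grid_search(demand, grid=(0.05, 0.1, 0.2, 0.3, 0.5)):
    """Return the (alpha, beta) pair on the grid with the lowest MSE."""
    return min(product(grid, grid), key=lambda ab: mse(demand, *ab))


demand = [0, 2, 0, 0, 1, 0, 3, 0, 0, 0, 2, 0]  # made-up values
alpha, beta = fit_by_grid_search(demand)
print(alpha, beta, mse(demand, alpha, beta))
```

Forcing both parameters to be equal would amount to searching only the diagonal of this grid.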


After modeling, you can do the forecasting for the following months by using the produced model as follows:

sp = ScorePredictor()
fitsettings = Fit(outFit=True, outCI=True)
forecast = ForecastEs(outForecast=True, forecastSpan=10, outCI=True)
tsmc_model.setTargets(sp). \
        setFitSettings(fitsettings). \
        setForecast(forecast). \
        setOutInputData(True). \
        setPredictionInterval("ORIGINAL"). \
        setInputContainerKeys([tsdp.uid])

tsmcForecastDf = tsmc_model.transform(tsdpOutput) # doing forecasting using Croston model

rtsdp = ReverseTimeSeriesDataPreparation(cons).\
                setDeriveFutureIndicatorField(True).\
                setInputContainerKeys([tsdp.uid])

rtsdpDF = rtsdp.transform(tsmcForecastDf)
rtsdpDF.show(70)

You can change the forecast horizon by changing the value of the “forecastSpan” setting:

forecast = ForecastEs(outForecast=True,forecastSpan = 10,outCI=True)

RTSDP (Reverse Time Series Data Preparation) runs at the end of the process and is always paired with TSDP: it converts the data back to the original input format. Therefore, if you use TSDP with the SPSS time series components, don’t forget to call RTSDP at the end of your process.

 

By running the code below, we can get the forecast output:

rtsdpDF.show(70)

Here is the screen capture for forecast output:

forcastTable.png

In this table, “Date” and “21311636” are the original fields supplied by the user. After forecasting, four new fields are produced:

  • “$FutureFlag”: a flag field that indicates whether the row is an original observation time point (“$FutureFlag” = 0) or a new predicted time point (“$FutureFlag” = 1). There are 10 predicted values for the coming 10 months.
  • “$TS-21311636”: the predicted value of the target field “21311636”.
  • “$TSLCI-21311636”: the lower bound of the 95% confidence interval for the average demand rate.
  • “$TSUCI-21311636”: the upper bound of the 95% confidence interval for the average demand rate. Together, the two bounds give the range in which the actual average demand rate is expected to fall with 95% confidence.
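As a small illustration of how these fields can be used, the sketch below separates fitted rows from forecast rows by the flag field. It uses plain Python dictionaries rather than a Spark DataFrame, the helper name is hypothetical, and every value in the sample rows (including the date labels) is made up; only the field names come from the table above.

```python
def split_fit_and_forecast(rows, flag_field="$FutureFlag"):
    """Split output rows into (observed rows, forecast rows) using the flag field."""
    observed = [r for r in rows if r[flag_field] == 0]
    future = [r for r in rows if r[flag_field] == 1]
    return observed, future


# Made-up rows that only mimic the shape of the forecast output.
rows = [
    {"Date": "M50", "21311636": 0, "$FutureFlag": 0, "$TS-21311636": 0.9},
    {"Date": "M51", "21311636": 2, "$FutureFlag": 0, "$TS-21311636": 0.9},
    {"Date": "M52", "21311636": None, "$FutureFlag": 1, "$TS-21311636": 1.0},
    {"Date": "M53", "21311636": None, "$FutureFlag": 1, "$TS-21311636": 1.0},
]

observed, future = split_fit_and_forecast(rows)
print(len(observed), len(future))      # 2 2
print([r["Date"] for r in future])     # ['M52', 'M53']
```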

 

Let’s plot the forecast output and see what conclusions we can draw from the Croston model. You can save the forecast output to a CSV file and open it in Excel to create a chart like the one below. Use the following settings to create the plot:

  • Use Date as the abscissa (x-axis) of the plot.
  • Use the quantity demanded as the ordinate (y-axis).
  • Use a black line for the observed time series.
  • Use a pink line for the fitted values at the observed time points.
  • Use a red line for the forecast values.
  • Use blue lines for the confidence interval of the forecast values.

You can also convert the output to pandas and use matplotlib to draw the following image:


# Draw lines
import matplotlib.pyplot as plt
import pandas as pd

plt.figure(figsize=(20,12))
# Read the predicted result and rename the output columns to plot-friendly names
rtsdpDF = rtsdpDF.withColumnRenamed("21311636", "y"). \
    withColumnRenamed("$FutureFlag", "flag"). \
    withColumnRenamed("$TS-21311636", "predict_y"). \
    withColumnRenamed("$TSLCI-21311636", "LCI_y"). \
    withColumnRenamed("$TSUCI-21311636", "UCI_y")
# Split into observed rows (flag == 0) and forecast rows (flag == 1);
# the columns are already renamed, so no further renaming is needed
fitDF = rtsdpDF.where(rtsdpDF.flag == 0)
forcastDF = rtsdpDF.where(rtsdpDF.flag == 1)

# Convert the Spark DataFrames to pandas for plotting
fitDF_pd = fitDF.toPandas()
forcastDF_pd = forcastDF.toPandas()
rtsdpDF_pd = rtsdpDF.toPandas()
                          
x0 = rtsdpDF_pd.Date
y0 = rtsdpDF_pd.y
fit = rtsdpDF_pd.predict_y

x1 = forcastDF_pd.Date
y1 = forcastDF_pd.predict_y
lci = forcastDF_pd.LCI_y
uci = forcastDF_pd.UCI_y

lines = plt.plot(x0, y0, x0, fit, x1, y1, x1, lci, x1, uci, marker='None')

plt.setp(lines[0], color='gray', marker='o')
plt.setp(lines[1], linewidth=2, color='pink')
plt.setp(lines[2], linewidth=3, color='r')
plt.setp(lines[3], linewidth=2, color='b')
plt.setp(lines[4], linewidth=2, color='b')

plt.legend([lines[0], lines[1], lines[2], lines[3], lines[4]], ['Observed demands','fit Observed demands', 'Predicted demands', 'LCI', 'UCI'],loc='upper right',fontsize=8)

plt.xlabel("Period")
plt.ylabel("Demands")
plt.title("Prediction over Time")
plt.show()



plot2.png

The output plot shows the forecast values (red line) and the confidence interval for the forecasts (blue lines). The pink line shows the fitted values at the observed time points; from it you can see how well the model fits the raw data. The model forecasts that over the following 10 months the average demand for this auto part is around one per month, with 95% confidence that the actual average demand rate falls in the range of 0 to 2 per month.

 

Locating IBM SPSS CROSTON

API Documentation for Spark and Python

  • You can get the online API documentation for Spark and Python here.
