Introduction
The volatility of the stock and currency markets is an important indicator of market stability. The non-trading day effect, for example, is a well-known cause of increased stock price volatility on Mondays, after the markets have been closed over the weekend. In addition, exchange rate volatility has been shown to affect many macroeconomic variables, such as inflation and international trade. These concerns call for proper modeling and forecasting of exchange rate volatility to better understand the currency exchange markets.
As these examples show, financial time series exhibit heteroscedasticity, which appears as volatility clustering. In other words, some time periods are more volatile than others because of turbulence and unexpected events. Therefore, analysts in this area should focus not only on future trends but also on changes in volatility.
The generalized autoregressive conditional heteroscedasticity (GARCH) models provide a tool to estimate the volatility of financial time series. They capture the pattern of the variance as a function of the past variances and the past residuals from a mean process. GARCH models then use this function to forecast the future variance, where the mean process might be a constant or an autoregressive moving average (ARMA) model. So the GARCH model is actually an ARMA+GARCH model, where the ARMA model forecasts the mean of the time series and the GARCH model forecasts its variance. For simplicity, “the GARCH model” refers to ARMA+GARCH in the rest of this blog.
Typical Data for the GARCH Model
A typical example of time series data with non-constant variance is the daily closing value log return of the S&P500 index from 1987-03-10 to 2009-01-30, taken from Yahoo Finance, with 5523 observations in total. The log return at time i is defined as $\log(p_i / p_{i-1})$, where $p_i$ and $p_{i-1}$ are the stock prices at times i and i-1.
Figure 1 plots the S&P500 index log return time series. The daily log returns are centered around zero. As Figure 1 shows, the magnitude of the fluctuations differs across time periods.
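The log return definition can be sketched in a few lines of plain Scala. This is only an illustration: the object name `LogReturns` and the function `logReturns` are hypothetical helpers, not part of the SPSS library, and the prices are assumed to be positive and ordered by time.

```scala
object LogReturns {
  // Log return r_i = log(p_i / p_{i-1}) for each consecutive pair of prices.
  // Hypothetical helper for illustration; the result has one element fewer than the input.
  def logReturns(prices: Seq[Double]): Vector[Double] = {
    require(prices.forall(_ > 0.0), "prices must be positive")
    prices.zip(prices.drop(1)).map { case (prev, cur) => math.log(cur / prev) }.toVector
  }
}
```

A useful property of log returns is that they add up over time: the sum of the daily log returns equals the log return over the whole period.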
Figure 1: Daily return for S&P500 index
Figure 2 shows the realized volatility (historical variances) calculated from the S&P500 index time series in Figure 1. The historical variance at each time point is calculated as the sample variance of the 100 days preceding the current day. Thus, the peaks in Figure 2 correspond to the more volatile periods in Figure 1, and the low values in Figure 2 correspond to the periods with small variation in Figure 1. The goal of the GARCH models is to capture and model the time-varying variances in the data.
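The rolling historical variance described here can be sketched as follows. The names `RollingVariance`, `sampleVariance`, and `rolling` are hypothetical, and the use of the unbiased (n - 1) sample variance is an assumption, since the text does not say which divisor was used for the figure.

```scala
object RollingVariance {
  // Unbiased sample variance of one window (divides by n - 1; an assumed convention).
  def sampleVariance(xs: Seq[Double]): Double = {
    require(xs.length >= 2, "need at least two observations")
    val mean = xs.sum / xs.length
    xs.map(x => (x - mean) * (x - mean)).sum / (xs.length - 1)
  }

  // Variance of the `window` observations strictly before each time point.
  // Element k of the result belongs to time index window + k.
  def rolling(returns: Seq[Double], window: Int = 100): Vector[Double] =
    if (returns.length <= window) Vector.empty
    else returns.sliding(window).toVector.dropRight(1).map(sampleVariance)
}
```

The first `window` time points have no value because they do not yet have a full window of earlier observations.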
Figure 2: Historical variance (realized volatility) for S&P500 daily return data
To determine whether an observed time series has a time-varying variance, a test for the GARCH effect is necessary. Once the test indicates that the time series has the GARCH effect, a GARCH model is fit to the data. This step is a required part of GARCH modeling in SPSS, and it is invoked automatically. You do not need to configure it.
User Scenarios
The stock markets and the currency exchange markets are the two fields where GARCH models are most often applied for volatility forecasting. This blog presents two real-world scenarios that use the GARCH models:
- Scenario 1: Users who are familiar with GARCH can specify the settings to build their own model
- Scenario 2: Users who aren't familiar with GARCH can use an “auto GARCH” model that is built automatically
Scenario 1:
The first scenario is in the domain of currency exchange rates. The market for currency exchange rates is affected not only by the forces of supply and demand but also by the policies of nations. In managed floating systems, the authorities sometimes take action if, for example, the market exhibits a period of high volatility. This scenario draws examples and insights about the volatility of currency exchange rates by developing GARCH models.
In this scenario, the data file (dem2gbp.csv) contains one time series with 1,974 observations of the daily exchange rate log returns and another column for the date information. There are no missing values in the data. Figure 1 below is a snapshot of part of the data file.
Figure 1: Partial information of dem2gbp.csv data file
The DEM2GBP field contains the daily observations of the Deutschmark versus British pound foreign exchange rate log returns. This data set has been promoted as an informal benchmark for validating GARCH time series software. An exchange rate pair is the quotation of the value of one currency unit against another. For instance, a EUR/USD transaction that is traded at 1.25 means that €1.00 is bought for $1.25. The data is constructed from the corresponding daily Deutschmark/US dollar and GBP/USD rates that are recorded by the International Monetary Fund in International Financial Statistics. The rates cover the period from January 3, 1984 through December 31, 1991 for a total of 1,974 observations.
The sample includes 456 non-trading periods over the eight-year span of daily Deutschmark/GBP exchange rates, which corresponds to roughly 23% of the observations. The non-trading periods are the weekends and holidays when the currency exchange market is closed. As previously stated, non-trading periods are an important source of volatility in the exchange rates. Mondays and other opening days after non-trading periods often show a sudden increase in the volatility of the currency exchange rates. The graph below shows the observed Deutschmark and British pound exchange rates (log values). As the graph shows, the exchange rates exhibit volatility clustering, as expected.
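As a rough illustration of how a cross rate and its log returns can be derived from two US dollar quotes, consider the sketch below. It assumes the quotation convention from the EUR/USD example (DEM/USD is the dollar price of one Deutschmark, GBP/USD the dollar price of one pound); the object `CrossRate` and its functions are hypothetical and do not reproduce the exact procedure used to build the benchmark file.

```scala
object CrossRate {
  // Cross rate DEM/GBP (pounds per Deutschmark) from two USD quotes: (DEM/USD) / (GBP/USD).
  // The quotation convention is an assumption made for this sketch.
  def crossRates(demUsd: Seq[Double], gbpUsd: Seq[Double]): Vector[Double] = {
    require(demUsd.length == gbpUsd.length, "series must have equal length")
    demUsd.zip(gbpUsd).map { case (d, g) => d / g }.toVector
  }

  // Daily log returns of a rate series: log(rate_i / rate_{i-1}).
  def logReturns(rates: Seq[Double]): Vector[Double] =
    rates.zip(rates.drop(1)).map { case (prev, cur) => math.log(cur / prev) }.toVector
}
```

Because the logarithm turns ratios into differences, the log return of the cross rate equals the DEM/USD log return minus the GBP/USD log return.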
Based on this data, the decision maker and the analyst would like to obtain insights on questions like the following:
- Does my exchange rate time series have time-varying variances in it?
- What is the exchange rate likely to be in the next few days?
- Will the exchange rate become more or less volatile in the following days?
- Are my inferences about the exchange rate and its volatility reasonable?
A solution based on GARCH modeling addresses this business need by fitting an ARMA-type time series model to both the currency exchange rate series and the observed volatility series simultaneously. It provides a way to investigate the autocorrelation among the historical variances of the observed data points. The following generic functions help answer questions such as those listed above:
- GARCH modeling
- Model forecasting
- Model evaluation
Use the following steps in a Watson Studio notebook to create the model and the forecasts that answer the previous questions. We use Scala 2.11 with Spark 2.1 as the kernel.
You can also get the full code here
GARCH modeling
import com.ibm.spss.ml.forecasting.params.{Fit, Predictor, ForecastEs, ScorePredictor}
import com.ibm.spss.ml.common.{Container, ContainerStateManager, LocalContainerManager}
import com.ibm.spss.ml.forecasting.{ReverseTimeSeriesDataPreparation, TimeSeriesDataPreparation}
import com.ibm.spss.ml.forecasting.traditional.{TimeSeriesForecastingGarch, GarchTargetOrderList, ConvergenceCriterion}

// df is assumed to be a Spark DataFrame that was already loaded from dem2gbp.csv

// Prepare the daily DEM2GBP series for time series modeling
val tsdp = TimeSeriesDataPreparation().
  setMetricFieldList(Array("DEM2GBP")).
  setDateTimeField("Date").
  setEncodeSeriesID(true).
  setInputTimeInterval("DAY").
  setOutTimeInterval("DAY").
  setQualityScoreThreshold(0.0).
  setConstSeriesThreshold(0.0).
  setCollectCategories(true)

val tsdp_df = tsdp.transform(df)

// Target field with orders P = 0, Q = 0, p = 1, q = 1: ARMA(0,0)+GARCH(1,1)
val targList: List[List[String]] = List(List("DEM2GBP"))
val targetOrderList = List(GarchTargetOrderList(targList, 0, 0, 1, 1))

// Export the data preparation containers so that the GARCH component can read them
val cm = LocalContainerManager()
cm.exportContainers("Container", tsdp.containers)
val lcm: (String, ContainerStateManager) =
  ("Container", cm)

// Configure the GARCH estimator and its convergence criteria
val garch = TimeSeriesForecastingGarch(lcm._2).
  setTargetOrderList(targetOrderList).
  setInputContainerKeys(List(lcm._1)).
  setLogLHConvergCriterion(ConvergenceCriterion(false, true, 1e-6)).
  setHessianConvergCriterion(ConvergenceCriterion(false, true, 1e-4))

// Fit the model on the prepared data
val garchModel = garch.fit(tsdp_df)

The GARCH component is typically run after the Time Series Data Preparation (TSDP) component. TSDP prepares the data for the GARCH model, for example by processing and replacing missing values and converting the datetime field. The GARCH component therefore assumes that the time series data it receives is “ready,” with no missing values. Linear interpolation is the recommended method for imputing missing values. After forecasting, the Reverse Time Series Data Preparation (RTSDP) component converts the binary output of the GARCH model, or of another SPSS time series model, to a readable format like the raw data.
As mentioned before, the GARCH model usually contains the ARMA part and the GARCH part. The model is written as ARMA(P,Q)+GARCH(p,q), where:
P is the number of past observations to consider when building the ARMA model
Q is the number of past residuals to consider when building the ARMA model
p is the number of past variances to consider when building the GARCH model
q is the number of past residuals to consider when building the GARCH model
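For reference, the standard textbook form of an ARMA(P,Q)+GARCH(p,q) model with these orders can be written as below. The coefficient symbols $\phi_i$, $\theta_j$, and the innovation $z_t$ are notation introduced here for illustration, and the assignment of $\alpha_i$ to squared residuals and $\beta_j$ to past variances follows the common textbook convention; it is an assumption that the SPSS parameter labels map onto these symbols in the same way.

```latex
\begin{aligned}
y_t &= c + \sum_{i=1}^{P} \phi_i \, y_{t-i} + \sum_{j=1}^{Q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t,
\qquad \varepsilon_t = \sigma_t z_t, \\
\sigma_t^2 &= \alpha_0 + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \, \sigma_{t-j}^2 .
\end{aligned}
```

With P = Q = 0 and p = q = 1, this reduces to $y_t = c + \varepsilon_t$ and $\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$.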
The ARMA part reflects the mean of the data. If the mean is constant, we can set the ARMA part to, for example, ARMA(0,0). The GARCH part reflects how the volatility changes over time. From the data graph, we can see that the average value of the data is around 0, so we can use a constant plus white noise as the mean part. To model the changes in volatility, start by considering only one previous residual and one previous variance, that is, GARCH(1,1). The generated model can be evaluated as described in the model evaluation section. If the model does not fit as well as expected, you can return to the modeling step and adjust the P, Q, p, and q values. Thus, we set the “DEM2GBP” field as the target and build an ARMA(0,0)+GARCH(1,1) model. Use the following setting to configure the target:
val targetOrderList = List(GarchTargetOrderList(targList, 0, 0, 1, 1))
Use the default settings for the other model parameters. This configuration specifies that the data has no ARMA effect and only the GARCH effect. After modeling, you get PMML and StatXML outputs that include all the model information.
import java.io.File

// Create an output directory for the PMML under the working directory
val path = System.getProperty("user.dir")
println(path)
val garchPMML = new File(path + "/garchPMML")
garchPMML.mkdir()

// Write the model as PMML, then read it back and print it
garchModel.writePMML(path + "/garchPMML")
val pmml = sc.wholeTextFiles(path + "/garchPMML" + "/0.xml")

pmml.collect.foreach(t => println(t._2))

// Do the same for the StatXML output
val garchStatXML = new File(path + "/garchStatXML")
garchStatXML.mkdir()

garchModel.writeStatXML(path + "/garchStatXML")
val statXML = sc.wholeTextFiles(path + "/garchStatXML" + "/0.xml")

statXML.collect.foreach(t => println(t._2))
When checking the content of the PMML, we can see that a model with no ARMA part but with a GARCH part is built, as shown below:
From this PMML, we get the answer to our first question: this exchange rate time series has time-varying variances.
In the StatXML, we can see the estimated model parameters in the <ParameterEstimates> part that is shown below:
There are four model parameters: c (“0 Constant”), α0 (“0 GARCH Constant”), α1 (“0 GARCH AR Lag 1”), and β1 (“0 GARCH MA Lag 1”). The output records the estimated values and, for some of the estimates, the result of a significance test, listed as their “confidence.” These values suggest that the model is reasonable and reflects the data well, because the significance values of both α1 and β1 are small and the value of c is very close to 0.
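To see how four parameters of this kind turn into a series of in-sample variances, the following sketch runs the textbook GARCH(1,1) recursion with a constant mean. It is not the SPSS implementation: the object `Garch11Filter` is hypothetical, the recursion assumes that α1 multiplies the squared previous residual and β1 the previous variance, and starting it from the average squared residual is an assumed convention.

```scala
object Garch11Filter {
  // Conditional variances for y_t = c + eps_t with
  // sigma2_t = a0 + a1 * eps_{t-1}^2 + b1 * sigma2_{t-1}.
  // The first variance is set to the mean squared residual (an assumed starting value).
  def conditionalVariances(y: Seq[Double], c: Double, a0: Double, a1: Double, b1: Double): Vector[Double] = {
    require(y.nonEmpty, "series must not be empty")
    val eps = y.map(_ - c).toVector
    val startVar = eps.map(e => e * e).sum / eps.length
    // Each step combines the previous variance with the previous squared residual.
    eps.init.scanLeft(startVar) { (prevVar, prevEps) =>
      a0 + a1 * prevEps * prevEps + b1 * prevVar
    }
  }
}
```

The square roots of these values are the in-sample sigma values that are used later for the standardized residuals.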
Model Forecasting
The following questions require forecasting:
- What is the exchange rate likely to be in the next few days?
- Will the exchange rate become more or less volatile in the following days?
Check the mean and the volatility forecasts to determine what the exchange rate is likely to be in the next few days and whether it will become more or less volatile.
// Request fitted values, confidence intervals, and residuals for the in-sample period
val sp = ScorePredictor()
val fit = Fit(outFit = true,
  outCI = true,
  outResidual = true)
// Forecast 10 steps ahead and include confidence intervals
val forecast = ForecastEs(outForecast = true,
  forecastSpan = 10,
  outCI = true)

// Score the prepared data with the fitted model
val garchForecastDf = garchModel.setTargets(sp).
  setFitSettings(fit).
  setForecast(forecast).
  setInputContainerKeys(List(lcm._1)).
  transform(tsdp_df)

// Convert the binary model output back to a readable table
val rtsdp = ReverseTimeSeriesDataPreparation(lcm._2).
  setDeriveFutureIndicatorField(true).
  setInputContainerKeys(Seq(lcm._1, "ScoreOutput"))

val rtsdpDF = rtsdp.transform(garchForecastDf)
To forecast the next 10 time points, set “forecastSpan = 10”.
Here are the outcomes. The confidence intervals of the mean forecasts are calculated with the 10-step-ahead forecasts of sigma (standard errors). Below is the output for the 10-step-ahead mean and sigma forecasts. The volatility is simply the square of sigma. Thus, the forecasts of the volatility can be given as:
Where:
- The “$TS-DEM2GBP” field is the mean forecasts.
- The “$TSVariance-DEM2GBP” field is the volatility forecasts.
Because we are using an ARMA(0,0)+GARCH(1,1) model, both the fitted values and the forecasts of the mean part are the constant c (“0 Constant”). Below is a plot of the last part of the in-sample observations with the 10-step-ahead out-of-sample mean forecasts and the confidence intervals.
Meanwhile, in the output table, the “$TSLCI-DEM2GBP” field is the lower confidence limit of “$TS-DEM2GBP” and the “$TSUCI-DEM2GBP” field is the upper confidence limit. Together they describe the lower and upper bounds of the predicted value. The following plot shows the observed original time series (blue line) with the predicted standard deviations (sigma, red lines):
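The relationship between the variance forecasts and the confidence limits can be sketched with the textbook multi-step GARCH(1,1) forecast. This is an illustration under stated assumptions, not the SPSS computation: the object `Garch11Forecast` is hypothetical, α1 is assumed to multiply the last squared residual and β1 the last variance, and the multiplier 1.96 (a 95% normal interval) is an assumption because the confidence level is not stated here.

```scala
object Garch11Forecast {
  // Variance forecasts for horizons 1..span.
  // Horizon 1 uses the last residual and variance; later horizons use
  // sigma2_{h} = a0 + (a1 + b1) * sigma2_{h-1}.
  def varianceForecasts(lastEps: Double, lastVar: Double,
                        a0: Double, a1: Double, b1: Double, span: Int): Vector[Double] = {
    require(span >= 1, "span must be at least 1")
    val first = a0 + a1 * lastEps * lastEps + b1 * lastVar
    Vector.iterate(first, span)(v => a0 + (a1 + b1) * v)
  }

  // (lower, upper) limits around the constant mean forecast c, using sigma = sqrt(variance).
  // z = 1.96 is an assumed 95% normal multiplier.
  def meanIntervals(c: Double, variances: Seq[Double], z: Double = 1.96): Vector[(Double, Double)] =
    variances.map { v =>
      val sigma = math.sqrt(v)
      (c - z * sigma, c + z * sigma)
    }.toVector
}
```

When α1 + β1 is below 1, the variance forecasts move toward the long-run level α0 / (1 - α1 - β1) as the horizon grows, so the interval width settles rather than growing without bound.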
Model Evaluation
To assess the goodness of fit of the GARCH(1,1) model, two Ljung-Box tests are carried out automatically. The tests are based on the standardized residuals from the fitted model. The standardized residuals are calculated as the difference between the observed series and the predicted mean, divided by the in-sample sigma values. The standardized residuals should approximately follow a standard normal distribution if the GARCH model is appropriate for the data.
The first Ljung-Box test is on the standardized residuals. A p-value greater than 0.05 for this test means that there is no evidence of remaining correlation among the residuals. Thus, to support the claim that the fitted GARCH model is adequate for the data, we need a p-value greater than 0.05 for this test.
The second Ljung-Box test is on the squared standardized residuals. The goal of this test is to detect any remaining GARCH effect in the residuals. A p-value greater than 0.05 for this test means that there is no evidence of remaining GARCH effects in the standardized residuals. In other words, the fitted GARCH model has adequately accounted for the GARCH effects in the original observed time series. Thus, we also need a p-value greater than 0.05 for this test to support the validity of the fitted GARCH model.
The results of the two tests for the fitted GARCH(1,1) model are shown below:
Since the p-values of both tests are greater than 0.05, we can conclude that the fitted GARCH(1,1) model is reasonable for the data.
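The two tests can be sketched by hand with the standard Ljung-Box statistic Q = n(n + 2) Σ ρ_k² / (n - k), summed over lags k = 1..m. The object `LjungBox` and its functions are hypothetical helpers rather than SPSS calls, and the number of lags that SPSS uses is not stated here, so `lags` is left as a parameter.

```scala
object LjungBox {
  // Standardized residuals: (observed - predicted mean) / in-sample sigma.
  def standardized(y: Seq[Double], mean: Seq[Double], sigma: Seq[Double]): Vector[Double] = {
    require(y.length == mean.length && y.length == sigma.length, "inputs must have equal length")
    y.indices.map(i => (y(i) - mean(i)) / sigma(i)).toVector
  }

  // Sample autocorrelation of x at lag k.
  def autocorr(x: Seq[Double], k: Int): Double = {
    val v = x.toVector
    val n = v.length
    val m = v.sum / n
    val denom = v.map(a => (a - m) * (a - m)).sum
    val num = (k until n).map(t => (v(t) - m) * (v(t - k) - m)).sum
    num / denom
  }

  // Ljung-Box statistic Q = n (n + 2) * sum_{k=1..lags} rho_k^2 / (n - k).
  def statistic(x: Seq[Double], lags: Int): Double = {
    val n = x.length
    require(lags >= 1 && lags < n, "lags must be in [1, n)")
    val weighted = (1 to lags).map { k =>
      val r = autocorr(x, k)
      r * r / (n - k)
    }.sum
    n * (n + 2.0) * weighted
  }
}
```

For the first test, pass the standardized residuals to `statistic`; for the second, pass their squares. Q is compared with a chi-square distribution: with 10 lags, for example, the 5% critical value for 10 degrees of freedom is about 18.31, so a smaller Q corresponds to a p-value above 0.05. Software packages may adjust the degrees of freedom for the number of fitted parameters, which this sketch does not do.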
So the answer to the last question:
- Are my inferences about the exchange rate and its volatility reasonable?
is that the inferences about the exchange rate and its volatility are reasonable, and this model is suitable for forecasting this business data from now on.
Summary:
Based on the analysis above, we can consider this model reasonable and a good reflection of the data. The model passes both diagnostic tests and can serve as a typical model for time series data with time-varying volatility.
Scenario 2:
The second scenario is in the domain of the stock market. We work with one of the most important indexes of the US stock market: the S&P500 index. It is the index we introduced in the “Typical Data for the GARCH Model” section. This scenario demonstrates automatic modeling with no user input about the model orders.
The index value is updated every 15 seconds during trading sessions. The “SP500 index” data contains the log returns of the daily closing value of the S&P500 index from 1987-03-10 to 2009-01-30. The data is publicly available from Yahoo Finance. It contains 5523 observations in total. Below are the plots of the S&P500 index log return time series and the realized volatility (historical variances) of the S&P500 index log returns.
As these graphs show, the daily log returns are centered around zero, and the magnitude of the fluctuations differs across time periods. The historical variance at each time point is calculated as the sample variance of the 100 days preceding the current day. Thus, the peaks in the plot on the right correspond to the more volatile periods in the plot on the left.
Figure 2 below shows a snapshot of the data file (sp500ret.csv):
Figure 2: Partial information of sp500ret.csv data file
This time, you only need to specify that you want to build a GARCH model; you don't need to enter any more information about the model. As in the first scenario, you build the model in a notebook, but you switch to the new data set and adjust the modeling settings as shown below:
import com.ibm.spss.ml.forecasting.params.{Fit, Predictor, ForecastEs, ScorePredictor}
import com.ibm.spss.ml.common.{Container, ContainerStateManager, LocalContainerManager}
import com.ibm.spss.ml.forecasting.{ReverseTimeSeriesDataPreparation, TimeSeriesDataPreparation}
import com.ibm.spss.ml.forecasting.traditional.{TimeSeriesForecastingGarch, GarchTargetOrderList, ConvergenceCriterion}

// df is assumed to be a Spark DataFrame that was already loaded from sp500ret.csv

// Prepare the daily SP500RET series for time series modeling
val tsdp = TimeSeriesDataPreparation().
  setMetricFieldList(Array("SP500RET")).
  setDateTimeField("Date").
  setEncodeSeriesID(true).
  setInputTimeInterval("DAY").
  setOutTimeInterval("DAY").
  setQualityScoreThreshold(0.0).
  setConstSeriesThreshold(0.0).
  setCollectCategories(true)

val tsdp_df = tsdp.transform(df)

// Target field only; no orders are given because they are selected automatically
val targList: List[List[String]] = List(List("SP500RET"))
val targetOrderList = List(GarchTargetOrderList(targList))

// Export the data preparation containers so that the GARCH component can read them
val cm = LocalContainerManager()
cm.exportContainers("Container", tsdp.containers)
val lcm: (String, ContainerStateManager) =
  ("Container", cm)

// Configure the GARCH estimator with automatic order selection and iteration history
val garch = TimeSeriesForecastingGarch(lcm._2).
  setTargetOrderList(targetOrderList).
  setInputContainerKeys(List(lcm._1)).
  setLogLHConvergCriterion(ConvergenceCriterion(false, true, 1e-6)).
  setHessianConvergCriterion(ConvergenceCriterion(false, true, 1e-4)).
  setAutoOrderSelection(true).setOutIterHistory(true)

// Fit the model on the prepared data
val garchModel = garch.fit(tsdp_df)
Pay attention to the following changes. In this setting, you don't need to specify P, Q, p, and q for the ARMA part and the GARCH part. The GARCH component selects suitable orders for the model automatically.
val targetOrderList = List(GarchTargetOrderList(targList))
setAutoOrderSelection(true).setOutIterHistory(true)
These settings turn on the automatic modeling process.
Then, the automatic model building process starts. You can also get the full code here
From the output StatXML, we find that there are two iterations in the modeling process:
Finally, the ARMA(0,2)+GARCH(1,1) model is found to be significant, as given in the PMML:
The remaining steps after modeling, such as outputting the orders and the model fitting results of the final model for model diagnostics, and post-estimation, are the same as in the previous scenario.
Thus, the ARMA(0,2)+GARCH(1,1) model is taken as the final GARCH model.
Note that in IBM SPSS, the process of automatically determining the orders of the GARCH model runs in the background, and only the final model is produced unless you choose to track the model selection process.
Locating IBM SPSS GARCH
- All the SPSS algorithms are available in IBM Watson Studio. Users should follow the Watson Studio access policy.