Global Data Science Forum

Automatically analyze your time series in IBM Cloud Pak for Data

By Jing Xu posted 14 days ago

  

Time series data are common in nearly every business field. They often contain valuable patterns and can provide insights for business adjustment and optimization. Time series analysis and forecasting help clear the fog around the future, grounded in solid statistical analysis, data mining, and machine learning techniques.

In the general case, time series analysis and forecasting involve data handling, algorithm selection and tuning, result evaluation, and deployment for forecasting. Some knowledge of data analysis and experience with time series are required to make sure the project is delivered with the agreed quality.

AutoAI experiments in IBM Cloud Pak for Data (IBM CPD) now provide a graphical tool for handling your time series data in an easy-to-use, interactive way. It is a good starting point for a time series analysis project, it can be expanded through a rich set of advanced options, and it can even be extended into a Python notebook.

In this post, we walk through the time series analysis process in an AutoAI experiment in IBM CPD, and look at what we can learn from its graphical progress view and its visualization of experiment results.

Import your time series data into the project

To start a time series analysis in a CPD project, click the blue ‘Add to project +’ button and select ‘AutoAI experiment’.



After you create an AutoAI experiment, you will be asked to ‘Add data sources’.



You can select an existing data file from the project, or browse for one on your local machine. As an example here, we use a local CSV file that records the daily minimum temperatures in Melbourne, Australia, from 1981 to 1990.


Then select ‘Min_Temp’ as the prediction column and ‘Date’ as the date/time column. Now you can click ‘Run experiment’ to kick off your time series analysis! Isn’t that pretty easy for a first time series analysis?
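Before uploading a file, it can help to confirm that the date/time column parses cleanly and the prediction column is numeric. The following is a minimal sketch of such a check, not part of AutoAI itself. The column names ‘Date’ and ‘Min_Temp’ come from the example above; the `load_series` helper, the `%Y-%m-%d` date format, and the three sample rows are assumptions made up for illustration and are not taken from the real file.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows in the same shape as the example file (NOT real data).
SAMPLE_CSV = """Date,Min_Temp
1981-01-02,18.5
1981-01-01,20.0
1981-01-03,19.0
"""


def load_series(text, date_col="Date", value_col="Min_Temp", date_format="%Y-%m-%d"):
    """Parse CSV text into (date, value) pairs sorted by date.

    Raises ValueError if a date or a value cannot be parsed, which is a
    quick signal that the file needs cleaning before it is uploaded.
    The date format is an assumption; adjust it to match your file.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (datetime.strptime(row[date_col], date_format).date(), float(row[value_col]))
        for row in reader
    ]
    rows.sort(key=lambda pair: pair[0])  # time series must be in time order
    return rows


series = load_series(SAMPLE_CSV)
print(len(series), "rows, first:", series[0])
```

For a real file, the same function can be called with the text returned by reading the file from disk.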

Configure your time series analysis process

Of course, if you want to control some of the detailed steps in the experiment, ‘Experiment setting’ provides options for that.

You can change the ‘Optimized metric’ on the ‘General’ tab under ‘Prediction settings’. For example, you can choose Mean absolute error or Root mean square error if you are more comfortable evaluating your results with one of them.
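For readers less familiar with these two metrics, here is a minimal sketch of their standard textbook definitions. It is not AutoAI's internal code, and the input values are made up for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large misses more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Made-up values for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 19.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

In this example the single large miss (16 versus 19) pulls RMSE above MAE, which is the usual reason to prefer RMSE when large errors are especially costly.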

You can adjust the length of the holdout period, which is also applied to backtesting to validate the pipelines on historical data. In general, one holdout period (the range marked in red in the figure below) is split from your input data as a separate data set and used to evaluate all candidate pipelines for the final recommendations. For each backtest, a testing period (the range marked in purple in the figure below) is used to evaluate the pipeline, which is trained on part of the training data (the range marked in green for each backtesting block). With this method, you can see how the recommended pipeline performed on historical time series data.
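The splitting idea can be sketched in a few lines. This is only an illustration under stated assumptions, not AutoAI's exact algorithm: the hypothetical `holdout_and_backtests` function assumes each backtest testing period has the same length as the holdout period, that testing periods sit back to back before the holdout, and that each backtest trains on all points preceding its testing period.

```python
def holdout_and_backtests(n_points, holdout_size, n_backtests):
    """Return index ranges for one holdout period and several backtests.

    Assumption: every testing period is `holdout_size` long and the
    training range is everything before that testing period.
    """
    holdout_start = n_points - holdout_size
    holdout = range(holdout_start, n_points)  # the "red" range

    backtests = []
    test_end = holdout_start
    for _ in range(n_backtests):
        test_start = test_end - holdout_size
        if test_start <= 0:
            break  # not enough history left for another training range
        train = range(0, test_start)        # the "green" range
        test = range(test_start, test_end)  # the "purple" range
        backtests.append((train, test))
        test_end = test_start
    return holdout, backtests


holdout, backtests = holdout_and_backtests(n_points=100, holdout_size=10, n_backtests=3)
print("holdout:", holdout)
for train, test in backtests:
    print("train:", train, "test:", test)
```

Because every training range ends before its testing period starts, no backtest is evaluated on data the pipeline has already seen.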

Follow the analysis progress in the graphical UI

After you confirm the settings and run the experiment, you are taken to a graphical summary dashboard where you can monitor the analysis progress.



At the experiment level, the experiment summary offers two views for monitoring progress. In the ‘Progress map’ view, a tree structure shows the whole data analysis process, from reading the data to post-evaluation with backtesting.



In the ‘Relationship map’ view, you can see how the recommended algorithms are built into pipelines with the selected feature transformations, moving from the inner circle for the data to the outer circles for algorithm, pipeline, and feature transformation.

Review the recommended pipeline with insights

You may have already noticed a pipeline leaderboard listing several pipelines under the Experiment summary view. This list is ranked according to the ‘Optimized metric’ configured in ‘Experiment setting’. Clicking a pipeline opens its details.
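Conceptually, ranking by an optimized metric amounts to a sort. The sketch below illustrates that idea only; it does not show how AutoAI stores or ranks its pipelines. The pipeline names, the scores, and the `rank_pipelines` helper are all hypothetical, and it assumes an error metric where lower is better.

```python
def rank_pipelines(scores, metric):
    """Sort pipeline names by the chosen error metric, lowest (best) first."""
    return sorted(scores, key=lambda name: scores[name][metric])


# Hypothetical pipelines with made-up scores, for illustration only.
scores = {
    "pipeline_a": {"mae": 2.1, "rmse": 2.9},
    "pipeline_b": {"mae": 1.8, "rmse": 3.2},
    "pipeline_c": {"mae": 2.4, "rmse": 2.6},
}

by_mae = rank_pipelines(scores, "mae")
by_rmse = rank_pipelines(scores, "rmse")
print("ranked by MAE: ", by_mae)
print("ranked by RMSE:", by_rmse)
```

Note that the order changes with the metric, which is why the choice of ‘Optimized metric’ matters for which pipeline ends up at the top.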

In the ‘Model evaluation’ view, a sequence chart comparing actual and predicted values shows how the pipeline performs on the holdout period. The dotted green line on the right-hand side of the sequence chart shows the forecast values for the future range.

Of course, some evaluation measures are there for your reference.



If you want to check how the pipeline performs on the testing period of each backtest and on the holdout period, ‘Predictions over time’ provides a flexible chart for a closer look.

Summary

In this article, we went through the key steps of automated time series analysis in an AutoAI experiment in IBM Cloud Pak for Data. The analysis-related configuration options and the interactive progress summary help you understand the logic behind AutoAI, and the pipeline details can further deepen your understanding of your time series data and business patterns. Let AutoAI experiments be a capable assistant for your data analysis work in IBM Cloud Pak for Data.

For more information about AutoAI experiment in IBM Cloud Pak for Data, visit the links below:

