Using time series models with IBM Event Automation

By Dale Lane posted Mon July 21, 2025 09:12 PM

Intro

graphic of an e-bike hire park

Imagine you run a city e-bike hire scheme.

Let’s say that you’ve instrumented your bikes so you can track their location and battery level.

When a bike is on the move, it emits periodic updates to a Kafka topic, and you use these events for a range of maintenance, logistics, and operations reasons.

You also have other Kafka topics, such as a stream of events with weather sensor readings covering the area of your bike scheme.

Do you know how to use predictive models to forecast the likely demand for bikes in the next few hours?

Could you compare these forecasts with the actual usage that follows, and use this to identify unusual demand?

Time series models

A time series is how a machine learning engineer or data scientist would describe a dataset that consists of data values, ordered sequentially over time and labelled with timestamps.

A time series model is a specific type of machine learning model that can analyze this type of sequential time series data. These models are used to predict future values and to identify anomalies.

For those of us used to working with Kafka topics, the machine learning definition of a “time series” sounds exactly like our definition of a Kafka topic: a sequentially ordered set of data values, each labelled with a timestamp.

This means that time series models are a powerful tool to use with Kafka events, allowing us to:

  • recognise patterns in the events on our topics
  • identify anomalies that need our attention
  • predict future events, giving us forecasts that we can use to prepare

“Forecasting can be a powerful tool when applied correctly,” IBM Technical Strategist Joshua Noble explains. “The ability to predict demand, revenue, costs, device failure or market changes are all powerful assets for a business at any size.”

Choosing a model

There are several types of time series model that you can choose from. The right model type for your Kafka topics will depend on the type of data you have.

For example, a univariate topic contains data where time is the only variable that influences the values in the events. With a multivariate topic, the data is also influenced by other independent variables (potentially data available on other Kafka topics) in addition to time.

An example of a multivariate topic could be the rental bike usage topic I described above, where bike usage could be impacted not just by time but also by the current weather conditions as captured on a second Kafka topic.

The type of data that you have, and the use case that you have in mind, will help you to identify a suitable model type to use.

Some of the most commonly used time series model types include:

  • ARIMA (Autoregressive integrated moving average)
  • Exponential smoothing
  • GARCH (Generalized autoregressive conditional heteroscedasticity)
  • LSTM (Long short-term memory)

You can also choose between using an existing model as-is, or fine-tuning a model to further improve its performance on your own specific tasks.

Time series models with Flink SQL

The benefits of time series models are magnified when you apply them to live streams of data. Apache Flink is an ideal processing framework to enable this.

Flink offers a powerful ability to collate and organise time series data from Kafka topic sources, and to prepare it for submission to a time series model for inferencing.

It is also ideal for processing the output from the time series model in a way that enables you to respond in the moment.
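
To make this concrete, here is a minimal Flink SQL sketch of the pattern, assuming a table called readings has been declared over a Kafka topic with an event-time column and a watermark (the table and column names here are illustrative assumptions, not part of the demo). A tumbling window collates the raw stream into ordered, timestamped aggregates that a time series model can consume.

```sql
-- Collate a raw Kafka-backed stream into hourly aggregates.
-- Assumes `readings` is a table declared over a Kafka topic with an
-- event-time column `event_time` and a watermark defined on it.
SELECT
  window_start,
  window_end,
  COUNT(*)         AS reading_count,
  AVG(reading_val) AS avg_value      -- `reading_val` is an illustrative column
FROM TABLE(
  TUMBLE(TABLE readings, DESCRIPTOR(event_time), INTERVAL '1' HOUR))
GROUP BY window_start, window_end;
```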

Seeing time series models in action with Kafka

To bring this idea to life, I prepared a simple demo, using:

  • IBM Event Streams
    • hosting Kafka topics
    • running connectors that provide live streams of time series data
  • IBM Granite
    • creating a fine-tuned time series forecasting model
  • IBM Event Processing
    • authoring and hosting a Flink SQL job
  • Apache Flink
    • collating and aggregating raw events to use with a time series model
    • invoking the time series model
    • outputting the model forecasts to trigger further processing

simplified architectural diagram

Demo scenario : time series events

The demo is themed around the bike hire scheme I described at the start. The goal is to use a time series model to continually refresh a forecast of the number of bike rental journeys that are expected in the coming hours.

The demo uses two source topics:

BIKESHARING.LOCATION

Each event gives the current location and current battery level of a rental bike.

With hundreds of bikes in the program, at peak times there will be many events received on this topic per second, as each bike independently and periodically reports its location.


BIKESHARING.WEATHER

Each event contains an hourly forecast for the area where the bike hire scheme is running.

Event payloads include details such as the temperature, humidity, and wind speed for the coming hour.
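
For reference, here is a hedged sketch of how the two source topics could be declared as Flink SQL tables for a demo like this. The column names, watermark intervals, and connector properties are assumptions for illustration rather than the demo's exact definitions.

```sql
-- Location updates from each bike (column names are assumed for illustration)
CREATE TABLE bike_locations (
  bike_id       STRING,
  user_type     STRING,        -- assumed: 'casual' or 'registered'
  latitude      DOUBLE,
  longitude     DOUBLE,
  battery_level DOUBLE,
  event_time    TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'BIKESHARING.LOCATION',
  'properties.bootstrap.servers' = '<event-streams-bootstrap-address>',
  'format' = 'json',
  'scan.startup.mode' = 'latest-offset'
);

-- Hourly weather forecasts for the area covered by the scheme
CREATE TABLE weather_forecasts (
  forecast_time TIMESTAMP(3),
  temperature   DOUBLE,
  humidity      DOUBLE,
  wind_speed    DOUBLE,
  WATERMARK FOR forecast_time AS forecast_time - INTERVAL '1' MINUTE
) WITH (
  'connector' = 'kafka',
  'topic' = 'BIKESHARING.WEATHER',
  'properties.bootstrap.servers' = '<event-streams-bootstrap-address>',
  'format' = 'json'
);
```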

Demo scenario : using time series models with Flink

I used Event Processing to create a Flink job that:

  • pre-processed the events
    • Updating formats and data structures to match the requirements of the machine learning model
    • This is a good example of how the wide range of built-in data processing functions in Flink play an important part in any data project - the events on your Kafka topics are unlikely to be a perfect match for the requirements of your machine learning model.
  • enriched the events
    • Turning the raw date in each event into variables that identify whether the date is a weekday/weekend and whether it is a known public holiday - all of which could impact the usage of rental bikes
    • This is a good example of how Flink enables more effective solutions than are possible if you simply feed raw input events directly into a machine learning model, as these enriched event streams enable far more accurate and effective inferencing.
  • collated the events
    • Turning the raw stream of location events (with each bike updating its location every few metres) into an aggregate that identifies how many journeys have been taken in the last hour, grouped by the type of user (casual vs registered).
    • This is a good example of how Flink’s powerful time window capabilities can turn a stream of events (that was far too granular for my needs) into a relevant and consumable summary (see the Flink SQL sketch after this list).
  • invoked the time series model
    • Submitting the pre-processed, enriched, collated events to the time series model for inferencing using an HTTP call.
    • This is a good example of how API Enrichment is an effective way to include AI models as part of an event processing solution, without introducing the complexities of hosting and scaling a model as part of the Flink SQL job.
      watsonx.ai can host standard or custom fine-tuned foundation models, which is easier to manage, scale, and maintain than embedding a model directly within the Flink SQL job. For jobs such as this demo (where hourly aggregates are submitted to the model to forecast the coming hours) this is an ideal fit.
  • processed the model forecasts
    • The model forecasts for the next few hours are output to a Kafka topic, where they can be used to trigger further processing or automations.
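
To give a feel for what the enrichment and collation steps can look like in Flink SQL, here is a minimal sketch (not the demo's exact job). It counts journeys per hour by user type and derives a working-day flag from the window start time. Counting distinct bikes as a proxy for journeys, and the table and column names, are assumptions; a public-holiday flag would typically come from a lookup table or user-defined function, and the join with the weather topic is omitted for brevity.

```sql
-- Enrich and collate the raw location stream into hourly journey counts
-- (a sketch over the assumed `bike_locations` table shown earlier).
SELECT
  window_start,
  window_end,
  user_type,
  COUNT(DISTINCT bike_id) AS journeys,  -- assumed proxy for journeys taken
  -- calendar enrichment: DAYOFWEEK returns 1 (Sunday) through 7 (Saturday)
  CASE WHEN DAYOFWEEK(CAST(window_start AS DATE)) IN (1, 7)
       THEN 0 ELSE 1 END AS is_working_day
FROM TABLE(
  TUMBLE(TABLE bike_locations, DESCRIPTOR(event_time), INTERVAL '1' HOUR))
GROUP BY window_start, window_end, user_type;
```

The resulting hourly rows are what gets submitted to the watsonx.ai-hosted model via the API enrichment step; the exact HTTP configuration depends on your model deployment, so it is not shown here.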

screenshot of the event processing flow

Demo scenario : using time series model predictions

The output is hosted on a Kafka topic called BIKESHARING.PREDICTIONS that contains the forecasts for the number of journeys expected (by each user type) in the coming hours.

screenshot of the output topic

These events could then be used for:

  • anomaly detection
    • by comparing the actual usage each hour with the forecast for that hour, any significant deviations could indicate problems that need attention - for example, if the number of bike journeys was far lower than forecast, perhaps this indicates a problem with the bikes? (see the sketch after this list)
  • resource optimisation
    • the forecast could identify where bikes are most needed, and enable an operations team to relocate bikes to the optimal locations in town to maximise usage
  • tailored promotions
    • if the forecast indicates there will be a low number of bike journeys, resulting in many unused bikes, this could be mitigated by sending out promotional offers to registered users with discounts or other incentives
  • pattern recognition
    • trends in bike hire usage could also be used to inform long-term planning
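
As an illustration of the anomaly detection idea, here is a hedged Flink SQL sketch that compares each hour's actual journey count with the forecast that was made for that hour. The table names, column names, and deviation threshold are all assumptions for illustration.

```sql
-- Flag hours where actual usage deviates significantly from the forecast.
-- Assumed tables: `hourly_journeys` from the aggregation step, and
-- `bike_predictions` declared over the BIKESHARING.PREDICTIONS topic.
SELECT
  a.window_start                      AS hour_start,
  a.user_type,
  a.journeys                          AS actual_journeys,
  p.predicted_journeys,
  a.journeys - p.predicted_journeys   AS deviation
FROM hourly_journeys AS a
JOIN bike_predictions AS p
  ON a.window_start = p.forecast_hour
 AND a.user_type    = p.user_type
WHERE ABS(a.journeys - p.predicted_journeys) > 50;  -- illustrative threshold
```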

Learn more

I used a Granite Time Series model for this demo. These are a set of ultra-compact, open-source models optimized for a variety of time-series tasks.

Model cards for each of these models are available on Hugging Face, together with papers that dig into the research that underpins these models.

The source and config for recreating my demo are all available on GitHub, so you can set this up for yourself and see how easy it is to include a time series model in your next event processing project.

If you would like a more in-depth guide to learn how to fine-tune and evaluate a time series model, Joshua Noble’s tutorial is a good place to start.
