Cloud Platform as a Service

Observer Pattern: A native approach to consume Kafka messages in IBM Cloud Code Engine.

Written by

Mahesh Kumawat (mahesh.kumawat1@ibm.com) - Software Developer - IBM Cloud Code Engine

Enrico Regge (reggeenr@de.ibm.com) - Architect - IBM Cloud Code Engine

Karan Kumar (karan.kumar6@ibm.com) - Software Developer - IBM Cloud Code Engine

Enrique Encalada (encalada@de.ibm.com) - Software Engineer - IBM Cloud Code Engine

Part 1: An efficient serverless approach to consume Kafka messages in IBM Cloud Code Engine

In this era of continuous data flow, we encounter numerous events originating from various sources at different times. Managing these events, particularly when their arrival times are unpredictable, can be quite challenging. In this blog, we introduce a pattern that addresses this issue efficiently: a component called the Observer that helps you consume Kafka messages from an Event Streams instance in a serverless fashion, making it cost-effective and well suited to handling unpredictable events.

Caveats Of Using Code Engine’s Kafka Subscription

Assume you have an Event Streams instance in IBM Cloud and a topic within that instance. Imagine you have Code Engine applications or jobs that need to react to messages/events generated by other components (event producers), which send data to your Event Streams topics.

IBM Cloud Code Engine allows you to create an Event Subscription that connects to your Event Streams instance. Your final consumer or destination, which is your Code Engine app or job, can receive events of interest by subscribing to one or more topics. Event information is received as POST HTTP requests for applications and as environment variables for jobs.
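
For illustration, here is a minimal sketch of such an application, assuming Python and Flask; the payload handling is illustrative, as the exact event format is defined by your subscription configuration.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def receive_event():
    # Each new Kafka message arrives as a separate HTTP POST request;
    # the payload layout depends on your subscription configuration.
    event = request.get_json(silent=True)
    print("received event:", event)
    return "", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```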

In this architecture, the consumer managed by Code Engine acts as a proxy, watching for any new messages that arrive in a Kafka instance. When you create a subscription for a set of topics, your app or job receives a separate event for each new message that appears in one of the topics. The consumer is managed by Code Engine, while the topic and final consumer are customer managed.

So, everything seems fine, right? However, there are some caveats:

  • No Delivery Guarantees: Code Engine doesn't guarantee that events will reach your final consumer. While Code Engine retries delivery, it eventually gives up if the final consumer cannot receive the message (e.g., due to overload).

  • Consumer Code Adjustments: Your final consumer code needs adjustments, because messages are delivered as HTTP requests or environment variables rather than consumed through native Kafka client libraries.

  • Scalability: Jobs face scaling limitations under a high volume of messages.

  • Cost Optimization: One job run per event leads to excessive cost.

  • No Batch Processing: With one job run per event, batch processing is not possible.

Therefore, we need an approach that lets us consume Kafka events efficiently with native Kafka client libraries while keeping the same serverless fashion. Well, don't worry: this is where the Observer pattern comes to the rescue.

Saviour in Action: The Observer!!

There are multiple ways to resolve the above caveats. In Code Engine, engineers recently introduced the Observer, a battle-tested concept that addresses these limitations.

The Observer pattern consists of two main parts:

  1. Detecting Event Arrival: It detects when an event arrives in a topic without actually consuming any event.

  2. Triggering Jobs: It triggers/runs specific jobs after detecting that an event has arrived in the topic.

Observer → observes. But what exactly does it observe? Before we dive into the pattern, let's explore some Kafka terminology first.

In a standard Kafka instance, you'll find multiple brokers, which store and manage data. That data is organized into topics; a topic represents a stream of records. Topics, in turn, are divided into partitions. Each topic may have any number of partitions, with each partition maintaining an ordered sequence of records identified by unique offsets. Remember these offsets: they are crucial for the Observer pattern to work. Additionally, there are consumer groups, consisting of one or more consumers subscribed to specific topics.
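
As a quick illustration, here is a minimal sketch that inspects a topic's partitions and their latest offsets, assuming the kafka-python library; the broker address and topic name are placeholders.

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="broker:9092")

topic = "payments"
partitions = consumer.partitions_for_topic(topic) or set()
tps = [TopicPartition(topic, p) for p in partitions]

# end_offsets() returns, per partition, the offset one past the last
# record -- the value the observer later compares with group offsets.
for tp, end in consumer.end_offsets(tps).items():
    print(f"{tp.topic}[{tp.partition}] end offset: {end}")
```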

Now, let's get back to our pattern. Our basic goal is to know when an event has arrived and then to trigger a particular job. To achieve this, we introduce a component called the Observer between your Event Streams instance and your consumers: a daemon job in Code Engine that runs indefinitely. The Observer needs to know when an event arrives and which job to run. But how does it do this? Remember the offsets from the Kafka terminology above? They are the secret sauce. Here's what the Observer does:

  1. It fetches the topic offsets and consumer group offsets from the Kafka brokers. As you know, topics have partitions, and each partition will have its own offset value. The observer compares both the topic offsets and consumer group offsets partition-wise.

  2. If any partition's offset in a topic is greater than the consumer group's offset for that partition, it signals that a message has arrived at that particular partition. With this insight, we can ascertain that an event has arrived.

  3. We associate a set of consumer groups with a set of topics, which we'll call "kafkadata." Using this "kafkadata," the observer can determine which job or jobs to execute when a message arrives (a minimal code sketch follows this list).
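
Here is a minimal sketch of that comparison loop, assuming the kafka-python library. The KAFKADATA mapping, the consumer group id, and the submit_job_run() helper (which would call the Code Engine API or CLI to start a job run) are hypothetical stand-ins, not the actual Observer implementation.

```python
import time
from kafka import KafkaConsumer, TopicPartition

KAFKADATA = {"payments": "payment-job", "shipping": "shipping-job"}  # topic -> job
GROUP_ID = "consumer-jobs-group"  # group whose committed offsets we watch

def submit_job_run(job_name, partitions):
    # Hypothetical: start `job_name` with one instance per lagging partition.
    print(f"would run {job_name} for partitions {partitions}")

consumer = KafkaConsumer(bootstrap_servers="broker:9092", group_id=GROUP_ID)

while True:
    for topic, job in KAFKADATA.items():
        parts = consumer.partitions_for_topic(topic)
        if not parts:
            continue
        tps = [TopicPartition(topic, p) for p in parts]
        end_offsets = consumer.end_offsets(tps)
        # A partition "lags" when the topic offset is ahead of the
        # committed consumer group offset: a new message has arrived.
        lagging = [tp.partition for tp in tps
                   if end_offsets[tp] > (consumer.committed(tp) or 0)]
        if lagging:
            submit_job_run(job, lagging)
    time.sleep(10)  # poll interval
```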

That's the essence of the Observer pattern: it observes topics, compares topic offsets with consumer group offsets, and triggers specific job runs when new messages arrive.

Did you notice some good things here?
- This is still a serverless approach, meaning your consumer does not have to run 24/7 to consume events; it only runs when an event arrives.
- Your consumer can be written with any native Kafka client library, without needing to modify your code.
- You can scale your jobs partition-wise, and a single job run can handle multiple events.

Observer Unleashed in Code Engine

Assume you have an IBM Cloud Event Streams instance with two topics. For this example, we'll use e-commerce events. Let's say you have a payments topic for payment events and a shipping topic for shipment events. The payments topic has 4 partitions, and the shipping topic has 3 partitions. You want to run a payment-job when messages arrive in the payments topic, and similarly, you want to run a shipping-job when messages arrive in the shipping topic.

The maximum number of instances of a job is based on the number of partitions in the topic. In this case, the payments topic allows at most 4 instances of payment-job, and the shipping topic at most 3 instances of shipping-job. Each instance consumes messages partition-wise. So, if new messages arrive in, say, 2 partitions, then only 2 instances of that job run. This ensures efficient resource utilization without idle pods waiting to consume messages: if no message has arrived in a partition, the observer simply does not create an instance for it. A sketch of this partition-wise assignment is shown below.
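
Here is a minimal sketch of this partition-wise assignment, assuming the kafka-python library and that each array job instance reads its index from the JOB_INDEX environment variable that Code Engine sets on job instances; broker, topic, and group names are placeholders.

```python
import os
from kafka import KafkaConsumer, TopicPartition

# Each job instance pins itself to exactly one partition via its index.
partition = int(os.environ.get("JOB_INDEX", "0"))

consumer = KafkaConsumer(bootstrap_servers="broker:9092", group_id="payment-group")
consumer.assign([TopicPartition("payments", partition)])

for record in consumer:
    print(f"partition {partition}, offset {record.offset}: {record.value}")
```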

The architecture below depicts the workflow of the Observer in Code Engine:

The jobs are not triggered unless an event arrives. Here, the observer acts as a wake-up mechanism, activating the corresponding consumers when new messages arrive. This embodies the serverless approach to consuming events: only the Observer job runs continuously, while the other jobs are awakened only when needed. You can configure as many jobs/consumers as required for your project.

These consumers can also be daemon jobs. Once the observer triggers a consumer job, it consumes the events of its particular partition of that topic while running a countdown in the background, such as 60 seconds. If no new events arrive within those 60 seconds, the consumer job stops and releases its CPU and memory resources, and therefore does not incur any further costs. A sketch of this idle countdown is shown below.
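
Here is a minimal sketch of such an idle countdown, again assuming the kafka-python library; the connection details and the 60-second timeout are placeholders.

```python
import time
from kafka import KafkaConsumer, TopicPartition

IDLE_TIMEOUT = 60  # seconds without new messages before the job stops

consumer = KafkaConsumer(bootstrap_servers="broker:9092", group_id="payment-group")
consumer.assign([TopicPartition("payments", 0)])  # partition picked by the observer

last_seen = time.time()
while time.time() - last_seen < IDLE_TIMEOUT:
    # poll() returns {TopicPartition: [records]}; an empty dict means no new events
    batch = consumer.poll(timeout_ms=1000)
    for records in batch.values():
        for record in records:
            print("processing", record.value)
    if batch:
        consumer.commit()        # advance the group offset so the observer sees us caught up
        last_seen = time.time()  # reset the countdown

consumer.close()  # release CPU/memory; the job run ends and stops incurring cost
```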

Since a single observer can watch multiple topics and wake up multiple consumers, the Observer pattern becomes more cost-efficient the more consumers it handles. In a second blog post, we will dive deeper into the actual cost optimizations we achieve through this Observer pattern.

Try it out!!

To learn more about the Observer Pattern and how to use it in an efficient way, we recommend these resources:

- Observer Pattern Sample code: a run script is included that will help you set up all the components required for this example use case

- Getting Started with IBM Cloud Code Engine

Next steps

In Part 2 of this blog post series, we will look at how leveraging the Observer pattern helps with cost optimization and provide an overview of our benchmark results compared to the standard Code Engine event subscription.

If you have feedback, suggestions, or questions about this post, please reach out to us on LinkedIn (Mahesh Kumawat, Enrico Regge or Enrique Encalada).
