Enterprises operating event-driven, message-centric platforms—especially on OpenShift/Kubernetes—continually struggle to efficiently scale Kafka consumers. Traditional resource-based autoscaling (such as CPU/memory triggers) fails to adapt to dynamic workloads, resulting in:
- Under‑provisioning: Consumers fall behind during traffic surges, increasing lag and impacting downstream processing or user experiences.
- Over‑provisioning: Spending unnecessary resources (and cost) during periods of low or no Kafka traffic.
Consider an e‑commerce microservices architecture deployed on Red Hat OpenShift (or Kubernetes), where:
- Multiple services ingest orders, logs, or notifications via Kafka.
- Traffic patterns are unpredictable—daily peaks during promotions, campaigns, or external triggers.
- If Kafka consumer lag increases, downstream services experience processing delays.
The underlying root issues include:
- Mismatch of scaling triggers: Traditional Horizontal Pod Autoscaler (HPA) responds to resource utilization rather than Kafka-specific events.
- Consumer lag accumulation: Lag builds up when consumers can’t keep pace with incoming messages, leading to bottlenecks.
- Static resource allocation: Static replica counts or threshold-based scaling can’t adapt rapidly to fluctuating load.
- Idle or redundant replicas: Without awareness of Kafka partitions and lag, extra consumers either sit idle or are insufficient during spikes.
KEDA: The Solution to Kafka-Based Scaling Challenges
KEDA (Kubernetes Event-Driven Autoscaler) is an open-source component for Kubernetes that allows applications to scale dynamically based on the number of events in various external systems like message queues or databases. It works alongside the standard Kubernetes Horizontal Pod Autoscaler (HPA), extending its capabilities to enable scale-to-zero functionality and more responsive, cost-efficient resource management by reacting to real-world demands beyond just CPU and memory usage.
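Concretely, KEDA is configured through a ScaledObject resource that binds a workload to one or more event-source triggers. The sketch below shows the general shape with placeholder names; the deployment name, broker address, and threshold are all illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-scaledobject
spec:
  scaleTargetRef:
    name: my-deployment        # workload (e.g. a Deployment) to scale
  minReplicaCount: 0           # enables scale-to-zero
  maxReplicaCount: 10
  triggers:
    - type: kafka              # one of many supported event-source scalers
      metadata:
        bootstrapServers: my-kafka:9092
        consumerGroup: my-group
        topic: my-topic
        lagThreshold: "10"     # target lag per replica
```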
Key Features and Benefits
- Event-driven triggers: scaling decisions are based on real event metrics, such as Kafka consumer lag, rather than CPU or memory alone.
- Scale-to-zero: idle workloads can be scaled down to zero replicas, so no resources are consumed when there is no traffic.
- Works with the HPA: KEDA extends the standard Horizontal Pod Autoscaler rather than replacing it.
- Broad scaler catalog: built-in scalers for Kafka and many other event sources, such as message queues and databases.
What is Kafka consumer lag?
In Apache Kafka, consumer lag is the difference between the offset of the last message produced to a partition and the offset of the last message the consumer group has committed; in other words, it measures how many messages the consumers are behind (kafka-consumer-groups.sh reports it as LAG = LOG-END-OFFSET - CURRENT-OFFSET).
Some amount of lag is inevitable, because it always takes time for data to move from producers to consumers. In a well-designed, well-managed Kafka cluster, however, lag should stay minimal, with messages typically consumed within milliseconds of being produced.
Prerequisites
- OpenShift cluster: ensure you have a running OpenShift cluster set up and accessible.
- Kafka cluster: ensure the Red Hat Streams for Apache Kafka operator is installed and a Kafka cluster instance is created.
- KEDA installation: KEDA needs to be installed on your OpenShift cluster before you can use it.
Install KEDA on s390x with the following steps.
Download the KEDA CRDs/deployment YAML (keda-2.17.2.yaml),
Edit the YAML to use the image tag "main" instead of "2.17.2", so that KEDA images with s390x support are pulled:
image: ghcr.io/kedacore/keda-admission-webhooks:main
image: ghcr.io/kedacore/keda-metrics-apiserver:main
image: ghcr.io/kedacore/keda:main
Apply the edited manifest,
oc apply -f keda-2.17.2.yaml
Test autoscaling with KEDA and Kafka:
Create a consumer application deployment,
NOTE: This is a sample deployment used for this walkthrough; you can reuse it as-is to reproduce the validation.
image: registry.redhat.io/amq-streams/kafka-40-rhel9:3.0.0-15
bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --topic orders --group order-processing-group --from-beginning
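The image and consumer command above slot into a minimal Deployment such as the following sketch. The deployment name order-processor, its labels, and the single replica are assumptions for this walkthrough; only the image and the kafka-console-consumer.sh command come from the sample:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor           # assumed name for this walkthrough
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      containers:
        - name: consumer
          image: registry.redhat.io/amq-streams/kafka-40-rhel9:3.0.0-15
          command:
            - /bin/sh
            - -c
            - >-
              bin/kafka-console-consumer.sh
              --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092
              --topic orders
              --group order-processing-group
              --from-beginning
```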
Create a ScaledObject,
apiVersion: keda.sh/v1alpha1
name: order-processor-scaledobject
namespace: test # make sure this matches your deployment's namespace
bootstrapServers: my-cluster-kafka-bootstrap.test.svc:9092
consumerGroup: order-processing-group
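Putting the fields above into a complete ScaledObject might look like the sketch below. The scaleTargetRef name (the consumer Deployment), the orders topic, the replica bounds, and the lagThreshold of 500 (matching the lag check later in this walkthrough) are assumptions to adapt to your environment:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: test                     # must match the deployment's namespace
spec:
  scaleTargetRef:
    name: order-processor             # assumed consumer Deployment name
  minReplicaCount: 0                  # scale to zero when there is no lag
  maxReplicaCount: 3                  # at most one consumer per partition
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-cluster-kafka-bootstrap.test.svc:9092
        consumerGroup: order-processing-group
        topic: orders
        lagThreshold: "500"           # target lag per replica before scaling out
```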
Now produce a large number of messages to the topic and verify that autoscaling behaves correctly.
Create a Kafka Topic
./kafka-topics.sh --create --topic orders --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --replication-factor 3 --partitions 3
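Since the Streams for Apache Kafka operator is installed, the topic can alternatively be declared as a KafkaTopic custom resource. The strimzi.io/cluster label must name your Kafka cluster (my-cluster is assumed here, matching the bootstrap address used throughout):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  namespace: test
  labels:
    strimzi.io/cluster: my-cluster    # must match the Kafka cluster name
spec:
  partitions: 3
  replicas: 3
```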
Produce a large number of messages to the topic
./kafka-producer-perf-test.sh --topic orders --num-records 1000000 --throughput -1 --producer-props bootstrap.servers=my-cluster-kafka-bootstrap.test.svc:9092 batch.size=1000 acks=1 linger.ms=100000 buffer.memory=4294967296 compression.type=none request.timeout.ms=300000 --record-size 1000
This continuously produces a large number of messages (1,000,000 records of 1 KB each) to the topic.
Check the lag,
oc run kafka-lag-check -ti --image=registry.redhat.io/amq-streams/kafka-40-rhel9:3.0.0-15 --rm --restart=Never -- bash -c "bin/kafka-consumer-groups.sh --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --describe --group order-processing-group"
Here the lag exceeds 500, so KEDA should scale the consumer deployment up (and later back down).
Once the producer has finished and the consumers catch up, check whether the deployment scales back down,
Conclusion:
By combining KEDA with Red Hat OpenShift and IBM Z / LinuxONE, enterprises can unlock truly event-driven, intelligent scaling for Kafka-based applications. Instead of relying on static, resource-based triggers, workloads scale automatically with real-time Kafka traffic, keeping consumer lag in check, reducing cost, and improving responsiveness. This integration delivers performance, efficiency, and reliability for mission-critical, event-driven architectures.