
How to Scale OpenShift Applications Based on Apache Kafka Using KEDA on IBM Z & LinuxONE


Authors: Parameshwaran Krishnasamy (Parameshwaran.K@ibm.com), Santosh Vasisht (Santosh.Vasisht@ibm.com), Arya Jena (arya.jena4@ibm.com), Basavaraju G (basavarg@in.ibm.com), Rishika Kedia (rishika.kedia@in.ibm.com)

Enterprises operating event-driven, message-centric platforms—especially on OpenShift/Kubernetes—continually struggle to efficiently scale Kafka consumers. Traditional resource-based autoscaling (such as CPU/memory triggers) fails to adapt to dynamic workloads, resulting in: 
  • Under‑provisioning: Consumers fall behind during traffic surges, increasing lag and impacting downstream processing or user experiences. 
  • Over‑provisioning: Spending unnecessary resources (and cost) during periods of low or no Kafka traffic. 

Real World Challenge 

Consider an e‑commerce microservices architecture deployed on Red Hat OpenShift (or Kubernetes), where: 
  • Multiple services ingest orders, logs, or notifications via Kafka. 
  • Traffic patterns are unpredictable—daily peaks during promotions, campaigns, or external triggers. 
  • If Kafka consumer lag increases, downstream services suffer processing delays. 

 

Root Cause Analysis 

The underlying root issues include: 
  • Mismatch of scaling triggers: Traditional Horizontal Pod Autoscaler (HPA) responds to resource utilization rather than Kafka-specific events (see the example after this list). 
  • Consumer lag accumulation: Lag builds up when consumers can’t keep pace with incoming messages, leading to bottlenecks. 
  • Static resource allocation: Static replica counts or threshold-based scaling can’t adapt rapidly to fluctuating load. 
  • Idle or redundant replicas: Without awareness of Kafka partitions and lag, extra consumers either sit idle or are insufficient during spikes. 
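For context, here is a minimal sketch of the traditional, resource-based HPA the first bullet refers to (the name order-processor-hpa is hypothetical; the target deployment matches the sample used later in this post). Notice that nothing in it is aware of Kafka topics, consumer groups, or lag:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scales only on CPU, regardless of consumer lag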

 

KEDA: The Solution to Kafka-Based Scaling Challenges 

 

 

KEDA (Kubernetes Event-Driven Autoscaler) is an open-source component for Kubernetes that allows applications to scale dynamically based on the number of events in various external systems like message queues or databases. It works alongside the standard Kubernetes Horizontal Pod Autoscaler (HPA), extending its capabilities to enable scale-to-zero functionality and more responsive, cost-efficient resource management by reacting to real-world demands beyond just CPU and memory usage. 

 

Key Features and Benefits 

  • Event-driven triggers: scaling reacts to external event sources (message queues, databases, and more) rather than only CPU and memory. 
  • Scale-to-zero: workloads can scale down to zero replicas when no events are pending. 
  • HPA integration: KEDA extends the standard Horizontal Pod Autoscaler rather than replacing it. 
  • Cost efficiency: resources are consumed in proportion to real-world demand. 

What is Kafka Consumer Lag? 

 

In Apache Kafka, consumer lag is how far a consumer is behind the producers on a topic: the difference between the offset of the most recently produced message and the offset of the last message the consumer has processed. In time terms, it is the delay between a message being produced and that message being consumed. 
Some amount of lag is inevitable, because it always takes time for data to move between producers and consumers. But in a well-designed, well-managed Kafka cluster, lag should be minimal: typically just a handful of milliseconds. 
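For illustration, this is how lag shows up in the output of kafka-consumer-groups.sh (the numbers are hypothetical and the columns are truncated): LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET, so 1500 - 1000 = 500 messages behind.

GROUP                   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
order-processing-group  orders  0          1000            1500            500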
 

Prerequisites: 

OpenShift Cluster: Ensure you have a running OpenShift cluster set up and accessible. 
Kafka Cluster: Ensure the Red Hat Streams for Apache Kafka operator is installed and a Kafka cluster instance is created. 
KEDA Installation: KEDA needs to be installed on your OpenShift cluster before you can use it. 
Please find below the steps for installing KEDA on s390x. 

 

Download the KEDA release YAML (keda-2.17.2.yaml, which bundles the CRDs and the KEDA components). 
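A typical way to fetch it (assuming the v2.17.2 release asset on the KEDA GitHub releases page):

curl -LO https://github.com/kedacore/keda/releases/download/v2.17.2/keda-2.17.2.yaml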

Edit the YAML so that the KEDA images use the "main" tag instead of "2.17.2": 

image: ghcr.io/kedacore/keda-admission-webhooks:main 

image: ghcr.io/kedacore/keda-metrics-apiserver:main 

image: ghcr.io/kedacore/keda:main 
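A quick way to make this substitution in one step (assuming the downloaded file is named keda-2.17.2.yaml):

sed -i 's/:2.17.2/:main/g' keda-2.17.2.yaml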

 

Apply the changes,

oc apply -f keda-2.17.2.yaml 
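Once applied, verify that the KEDA pods come up (the release manifest installs everything into the keda namespace):

oc get pods -n keda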


Test autoscaling with KEDA and Kafka: 

 

Create a consumer application deployment, 
NOTE: This is a sample deployment we used; you can apply the same sample to reproduce this validation. 
 

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-processor
  namespace: test
  labels:
    app: order-processor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-processor
  template:
    metadata:
      labels:
        app: order-processor
    spec:
      containers:
        - name: kafka-consumer
          image: registry.redhat.io/amq-streams/kafka-40-rhel9:3.0.0-15
          command:
            - "/bin/bash"
            - "-c"
            - |
              bin/kafka-console-consumer.sh --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --topic orders --group order-processing-group --from-beginning
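Apply the deployment (the file name order-processor-deployment.yaml is just an example):

oc apply -f order-processor-deployment.yaml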

  

 

 

Create a ScaledObject, 

 

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaledobject
  namespace: test   # make sure this matches your deployment's namespace
spec:
  scaleTargetRef:
    kind: Deployment
    name: order-processor
  minReplicaCount: 1
  maxReplicaCount: 5
  cooldownPeriod: 300
  pollingInterval: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-cluster-kafka-bootstrap.test.svc:9092
        consumerGroup: order-processing-group
        topic: orders
        lagThreshold: "500"
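Apply the ScaledObject and confirm it is ready (again, the file name is just an example):

oc apply -f order-processor-scaledobject.yaml
oc get scaledobject -n test

Behind the scenes, KEDA creates and manages an HPA (typically named keda-hpa-order-processor-scaledobject) to drive the actual scaling:

oc get hpa -n test

Note that, by default, the Kafka scaler does not scale the number of active consumers beyond the number of partitions on the topic (3 in this walkthrough), even though maxReplicaCount is 5.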

 

 

Now produce a large number of messages to the topic and verify that autoscaling happens correctly. 
 
Create a Kafka Topic 

 

./kafka-topics.sh --create --topic orders --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --replication-factor 3 --partitions 3 

 

Produce a large number of messages to the topic 

  

./kafka-producer-perf-test.sh --topic orders --num-records 1000000 --throughput -1 --producer-props bootstrap.servers=my-cluster-kafka-bootstrap.test.svc:9092 batch.size=1000 acks=1 linger.ms=100000 buffer.memory=4294967296 compression.type=none request.timeout.ms=300000 --record-size 1000

 

This continuously produces a large number of messages to the topic. 

 

Check the lag,

 

oc run kafka-lag-check -ti --image=registry.redhat.io/amq-streams/kafka-40-rhel9:3.0.0-15 --rm --restart=Never -- bash -c "bin/kafka-consumer-groups.sh --bootstrap-server my-cluster-kafka-bootstrap.test.svc:9092 --describe --group order-processing-group" 

 

 

In the output of the command above, the LAG column is well over the configured lagThreshold of 500, so KEDA should scale the deployment up (and, once the lag drains, back down). 
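To watch the scale-up as it happens (the replica count should grow toward maxReplicaCount):

oc get deployment order-processor -n test -w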

 

 

 

Once the producer has completely finished producing and the consumers have worked through the backlog, check whether the deployment scales back down to minReplicaCount, 
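One way to observe this is to watch the consumer pods terminate as replicas are removed:

oc get pods -n test -l app=order-processor -w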


Conclusion: 

By combining KEDA with Red Hat OpenShift on IBM Z and LinuxONE, enterprises can unlock truly event-driven scaling for Kafka-based applications. Instead of relying on static, resource-based triggers, workloads scale automatically with real-time Kafka traffic, keeping consumer lag in check, optimizing costs, and improving responsiveness. This integration helps deliver the performance, efficiency, and reliability that mission-critical, event-driven architectures demand. 

 

