In a Streams for Apache Kafka cluster, Apache ZooKeeper has historically functioned as the central coordination service, managing metadata for brokers, producers, and consumers, facilitating leader election, and orchestrating the overall health and consensus of the Kafka ecosystem.
With the introduction of KIP-500 (Kafka Improvement Proposal 500), the Apache Kafka project initiated a major architectural shift to eliminate the ZooKeeper dependency. This proposal introduces the KRaft (Kafka Raft Metadata Mode) quorum-based consensus mechanism, enabling Kafka to natively manage metadata and controller functionality within the Kafka brokers themselves.
As of Apache Kafka 4.0 and Streams for Apache Kafka 3.0 (supported on OCP 4.14 to 4.19), ZooKeeper has been removed entirely, and KRaft is the default and only supported mode for metadata management and controller coordination.
In this blog, we’ll walk through the technical evolution from ZooKeeper-based deployments to the KRaft architecture, outlining the migration path, operational considerations, and key architectural changes.
What is KRaft and why migrate to it?
KRaft (Kafka Raft metadata mode) is a new metadata management paradigm introduced in Apache Kafka that eliminates the reliance on ZooKeeper. By embedding the Raft consensus protocol directly within Kafka brokers, KRaft enables native management of cluster metadata, controller elections, and configuration propagation.
The shift to KRaft streamlines Kafka’s architecture, enhancing system scalability, reducing external dependencies, and lowering operational overhead. It also improves performance and fault tolerance by leveraging Kafka's internal mechanisms for consensus and state replication, aligning metadata management more closely with Kafka’s core design principles.
ZooKeeper to KRaft migration:
In a Kafka cluster, brokers in ZooKeeper mode (ZK mode) store their metadata in Apache ZooKeeper. This is the old mode of handling metadata. Brokers in KRaft mode store their metadata in a KRaft quorum, which is the new and improved mode of handling metadata.
So, migration is the process of moving cluster metadata from ZooKeeper into a KRaft quorum.
NOTE: Once KRaft mode is enabled, rollback to ZooKeeper is not possible. Consider this carefully before proceeding with the migration.
ZooKeeper Mode:
KRaft Mode:
To migrate to KRaft mode, you must be using Streams for Apache Kafka version 2.7 or later, along with Apache Kafka version 3.7.0 or newer. If you're using an older version of either, upgrade to the required versions before proceeding with the migration.
To perform the migration when the cluster is not yet using node pools, all brokers must be defined within a KafkaNodePool resource. This resource must be named kafka and assigned the broker role. Node pool support is enabled in the Kafka resource by adding the strimzi.io/node-pools: enabled annotation.
Migration Procedure:
1. Create a new KafkaNodePool resource.
Example configuration for a node pool used in migrating a Kafka cluster:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: kafka
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3  # must match the existing .spec.kafka.replicas
  roles:
    - broker
  storage:     # must match the existing .spec.kafka.storage configuration
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
NOTE: To retain existing cluster data and preserve the names of nodes and resources, the node pool must be named kafka, and the strimzi.io/cluster label must match the name of the Kafka resource. If these requirements aren't met, new nodes and resources, including persistent volume storage, will be created with different names, which may result in the loss of access to previously stored data.
2. Apply the KafkaNodePool resource:
oc apply -f <node_pool_configuration_file>
Applying this resource transitions Kafka to use node pools.
There are no changes to the existing resources, no rolling updates, and everything remains identical to the previous configuration.
3. Enable support for node pools in the Kafka resource using the strimzi.io/node-pools: enabled annotation:
oc annotate kafka my-cluster strimzi.io/node-pools="enabled" --overwrite
4. To avoid warnings, remove any replicated properties from the Kafka custom resource. Once the KafkaNodePool resource is in use, you can safely delete the properties that were moved to it, such as .spec.kafka.replicas and .spec.kafka.storage.
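For illustration, a minimal sketch of what the Kafka resource might look like after this step, with replicas and storage now owned by the KafkaNodePool. The version, listener, and storage values here are placeholders, not taken from an actual cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
spec:
  kafka:
    version: 3.7.0
    # replicas and storage removed: both are now defined
    # in the KafkaNodePool named "kafka"
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
  zookeeper:
    # still present at this stage: ZooKeeper is only removed
    # after the migration completes (step 11)
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```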
5. Create a node pool with a controller role.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controller
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  roles:
    - controller
  storage:
    type: persistent-claim
    size: 100Gi
    deleteClaim: false
NOTE: For the migration, you cannot use a node pool whose nodes share both the broker and controller roles.
6. Enable KRaft migration in the Kafka resource by setting the strimzi.io/kraft annotation to migration:
oc annotate kafka my-cluster -n <my-project> strimzi.io/kraft="migration" --overwrite
Applying the annotation to the Kafka resource configuration starts the migration.
7. Check that the controllers have started and the brokers have rolled:
oc get pods -n <my-project>
The output shows the nodes in the broker and controller node pools:
NAME READY STATUS RESTARTS
my-cluster-kafka-0 1/1 Running 0
my-cluster-kafka-1 1/1 Running 0
my-cluster-kafka-2 1/1 Running 0
my-cluster-controller-3 1/1 Running 0
my-cluster-controller-4 1/1 Running 0
my-cluster-controller-5 1/1 Running 0
8. Check the status of the migration:
oc get kafka my-cluster -n <my-project> -w
Updates to the metadata state
my-cluster ... KRaftMigration
my-cluster ... KRaftDualWriting
my-cluster ... KRaftPostMigration
9. When the metadata state has reached KRaftPostMigration, enable KRaft in the Kafka resource configuration by setting the strimzi.io/kraft annotation to enabled:
oc annotate kafka my-cluster -n <my-project> strimzi.io/kraft="enabled" --overwrite
10. Check the status of the move to full KRaft mode:
oc get kafka my-cluster -n <my-project> -w
Updates to the metadata state
my-cluster ... KRaftPostMigration
my-cluster ... PreKRaft
my-cluster ... KRaft
11. Remove any ZooKeeper-related configuration from the Kafka resource.
If present, you can remove the following:
- the .spec.zookeeper section
- the log.message.format.version and inter.broker.protocol.version properties from .spec.kafka.config
Removing log.message.format.version and inter.broker.protocol.version will trigger a restart of the brokers and controllers. Removing ZooKeeper-related properties will eliminate any warning messages about ZooKeeper configuration in a KRaft-operated cluster.
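For illustration, a minimal sketch of a Kafka resource after this cleanup, with the .spec.zookeeper section and the two protocol properties gone. The version and listener values are placeholders:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    version: 3.7.0
    # .spec.zookeeper has been deleted entirely, and
    # log.message.format.version / inter.broker.protocol.version
    # no longer appear under .spec.kafka.config
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```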
Verify the cluster state:
oc get kafka my-cluster -n <my-project> -w
The metadata state should now report KRaft, and the ZooKeeper pods are removed from the cluster.
Performing a rollback on the migration
Before the migration is finalized by enabling KRaft in the Kafka resource, at which point the state moves to KRaft, you can perform a rollback as follows:
1. Apply the strimzi.io/kraft="rollback" annotation to the Kafka resource to roll back the brokers:
oc annotate kafka my-cluster -n <my-project> strimzi.io/kraft="rollback" --overwrite
NOTE: The migration process must be in the KRaftPostMigration state to do this. The brokers are rolled back so that they can connect to ZooKeeper again, and the state returns to KRaftDualWriting.
2. Delete the controller node pool:
oc delete KafkaNodePool controller -n <my-project>
3. Apply the strimzi.io/kraft="disabled" annotation to the Kafka resource to return the metadata state to ZooKeeper:
oc annotate kafka my-cluster --overwrite strimzi.io/kraft="disabled"
Switching back to using ZooKeeper:
If the node pool for the brokers is no longer needed, apply the strimzi.io/node-pools: disabled annotation to the Kafka resource so that the cluster reverts to the original Kafka brokers.
If you removed the .spec.kafka.replicas and .spec.kafka.storage properties in step 4 to avoid warnings, add them back to the Kafka resource.
oc annotate kafka my-cluster -n <my-project> strimzi.io/node-pools="disabled" –overwrite
Then delete the broker node pool named kafka:
oc delete KafkaNodePool kafka -n <my-project>
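If the .spec.kafka.replicas and .spec.kafka.storage properties were removed earlier, here is a sketch of how they go back into the Kafka resource. The storage values shown are placeholders and must match the original broker volumes:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
    # the remaining broker, zookeeper, and entityOperator
    # configuration stays as it was before the migration
```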
The transition from ZooKeeper-based Kafka clusters to KRaft mode marks a pivotal evolution in the Apache Kafka architecture, offering a more streamlined, scalable, and resilient metadata management model. With KRaft becoming the default in Apache Kafka 4.0 and Streams for Apache Kafka 3.0, organizations must begin planning and executing their migration strategies to stay aligned with the future of Kafka.
By leveraging the Red Hat OpenShift Operator – Streams for Apache Kafka, enterprises running on IBM zSystems and LinuxONE can perform this migration with confidence, using a structured, operator-driven approach that minimizes disruption and maximizes operational continuity.
Migrating to KRaft not only simplifies your Kafka infrastructure but also enhances performance, fault tolerance, and manageability, making your event-driven architecture more future-ready and cloud-native.