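As ClickHouse adoption grows, many teams are moving away from ZooKeeper to the newer ClickHouse Keeper. ClickHouse Keeper is a lightweight, purpose-built coordination service designed to handle ClickHouse workloads without the operational overhead of a full ZooKeeper cluster. Because it implements the ZooKeeper client protocol, ClickHouse connects to it through the same zookeeper configuration section it already uses.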
Why migrate to ClickHouse Keeper?
- ClickHouse Keeper can run embedded within ClickHouse or as a standalone service, which simplifies deployment and management in a ClickHouse ecosystem.
- It resolves known ZooKeeper issues, such as the default 1 MB limit on packet and node data size and the ZXID overflow problem, which can force restarts in ZooKeeper.
- Because it is implemented in C++, it generally offers better performance with lower CPU, memory, and disk I/O usage than ZooKeeper, which is Java-based and prone to issues such as full GC pauses.

In this guide, I will walk through how to migrate from ZooKeeper to ClickHouse Keeper on a GitOps-managed platform (Flux). We’ll cover the prerequisites, deployment, migration steps, and validation checks to ensure a smooth transition.
Prerequisites
Before starting, make sure you’re on supported versions:
You’ll also need the clickhouse-keeper-converter binary inside your ZooKeeper pod.
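A quick preflight check can confirm the converter is present and executable at the path the Step 3 command uses. This is a minimal bash sketch, assuming kubectl access to the cluster; converter_present is a hypothetical helper, and the namespace and pod name in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Preflight sketch: is the keeper converter present and executable in the ZooKeeper pod?

# Path used by the conversion command in Step 3.
CONVERTER=/tmp/clickhouse-keeper-converter/clickhouse-keeper-converter

# converter_present NAMESPACE POD -> exit 0 if the binary is executable in the pod.
converter_present() {
  local ns="$1" pod="$2"
  # `test -x` runs inside the pod; kubectl exec returns its exit status.
  kubectl exec -n "$ns" "$pod" -- test -x "$CONVERTER"
}

# Usage (hypothetical names):
#   converter_present zookeeper zookeeper-0 && echo "converter found"
```

If the check fails, copy the binary into the pod first. The converter also ships inside the main ClickHouse binary and can be invoked as clickhouse keeper-converter.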
Step 1: Deploy the ClickHouse Operator with Helm
In a GitOps setup, we manage operators with Flux + Helm.

Below is an example HelmRelease for the ClickHouse Operator pinned to the required versions:
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: clickhouse
  namespace: flux-system
spec:
  interval: 10m  # required by the HelmRelease API; how often Flux reconciles
  targetNamespace: ch-system
  chart:
    spec:
      chart: ibm-clickhouse-operator
      version: "v1.2.0"
      sourceRef:
        kind: HelmRepository
        name: clickhouse
  values:
    installCRDs: true
    image:
      repository: icr.io/clickhouse/clickhouse-operator
      tag: v1.2.5
Note: This step is only needed if the ClickHouse operator is older than v1.2.5. If it is already at v1.2.5 or later, skip it.
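The version check in the note above can be scripted. This bash sketch assumes GNU sort -V for version ordering and an image reference that ends in a plain :tag (not a digest); image_tag, version_ge, and needs_operator_upgrade are hypothetical helper names, and the deployment name in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the operator upgrade in Step 1 is needed.
# Assumes GNU `sort -V` and an image reference ending in ":<tag>".

MIN_OPERATOR_VERSION=v1.2.5

# image_tag IMAGE -> prints the tag, e.g. "repo/op:v1.2.5" -> "v1.2.5".
image_tag() {
  printf '%s\n' "${1##*:}"
}

# version_ge A B -> exit 0 if version A >= version B (a leading "v" is ignored).
version_ge() {
  local a="${1#v}" b="${2#v}"
  [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$b" ]
}

# needs_operator_upgrade IMAGE -> exit 0 if the image is older than the minimum.
needs_operator_upgrade() {
  ! version_ge "$(image_tag "$1")" "$MIN_OPERATOR_VERSION"
}

# Usage (list deployments with `kubectl get deploy -n ch-system`; the name is a placeholder):
#   img="$(kubectl get deploy <operator-deployment> -n ch-system \
#            -o jsonpath='{.spec.template.spec.containers[0].image}')"
#   if needs_operator_upgrade "$img"; then echo "apply Step 1"; else echo "skip Step 1"; fi
```

Using sort -V rather than a plain string comparison matters once a component reaches two digits: v1.10.0 sorts after v1.2.5 as a version, but before it as a string.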
Step 2: Deploy ClickHouse Keeper

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: clickhouse-keeper
  namespace: datastores
spec:
  configuration:
    clusters:
      - name: keeper
        layout:
          replicasCount: 1
    settings:
      keeper_server/tcp_port: "2181"
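Deploy this with Flux or Kustomize. One replica is enough for a test environment; in production, run three Keeper replicas so the ensemble keeps quorum if a pod fails. Once applied, verify the pods and note the Keeper service name, which Step 5 points ClickHouse at: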
kubectl get pods -n datastores -l app=clickhouse-keeper
kubectl get svc -n datastores | grep clickhouse-keeper
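Beyond listing pods, you can block until Keeper is Ready and ask it for a health reply: ClickHouse Keeper answers the ruok four-letter command with imok when it is serving requests. This sketch assumes the app=clickhouse-keeper label used above, that nc exists in the Keeper image (it may not), and that ruok is in Keeper's four-letter-word allow list (it is in the default list); the helper names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: wait for Keeper pods, then ask one of them "ruok".
# Assumes the app=clickhouse-keeper label and that `nc` exists in the Keeper image.

# keeper_ready [NAMESPACE] -> waits up to 5 minutes for all Keeper pods to be Ready.
keeper_ready() {
  kubectl wait --for=condition=Ready pod -l app=clickhouse-keeper \
    -n "${1:-datastores}" --timeout=300s
}

# is_imok -> exit 0 if stdin is exactly "imok", Keeper's healthy reply.
is_imok() {
  [ "$(cat)" = "imok" ]
}

# keeper_ruok NAMESPACE POD -> sends "ruok" to the Keeper port inside the pod.
keeper_ruok() {
  kubectl exec -n "$1" "$2" -- sh -c 'echo ruok | nc localhost 2181' | is_imok
}
```

Usage, with a placeholder pod name: keeper_ready datastores && keeper_ruok datastores <keeper-pod> && echo "Keeper is serving".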
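Step 3: Convert ZooKeeper Metadata
Run the converter inside the ZooKeeper pod (on the leader, if ZooKeeper runs as an ensemble), pointing it at ZooKeeper's transaction log and snapshot directories (both /data/version-2 here):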
/tmp/clickhouse-keeper-converter/clickhouse-keeper-converter \
--zookeeper-logs-dir /data/version-2 \
--zookeeper-snapshots-dir /data/version-2 \
--output-dir /tmp/ckeeper-snapshots
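This generates a Keeper-compatible snapshot in /tmp/ckeeper-snapshots. Copy it into the snapshot directory of every Keeper replica; a replica that starts without it can win leader election with an empty tree. Run the conversion while ClickHouse is not writing to ZooKeeper (for example, inside the Step 5 maintenance window), because anything written to ZooKeeper after the conversion will not be in the snapshot.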
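kubectl cp cannot copy directly between two pods, so one way to move the snapshot is through a local temporary directory. This is a sketch, not a tested procedure: it assumes tar is available in both containers (kubectl cp depends on it), that each Keeper pod has a single container, and that the paths match the ones used in this guide. copy_snapshots is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: copy the converted snapshot(s) from the ZooKeeper pod to every Keeper
# replica through a local temp directory. Assumes `tar` exists in both pods.

SNAP_SRC=/tmp/ckeeper-snapshots                                   # converter --output-dir
SNAP_DST=/var/lib/clickhouse-keeper/coordination/snapshots/store  # Keeper snapshot_storage_path

# copy_snapshots ZK_NS ZK_POD KEEPER_NS KEEPER_POD...
copy_snapshots() {
  local zk_ns="$1" zk_pod="$2" keeper_ns="$3"
  shift 3                      # remaining arguments are Keeper pod names
  local tmp f pod
  tmp="$(mktemp -d)"

  # Hop 1: ZooKeeper pod -> local machine.
  if ! kubectl cp "$zk_ns/$zk_pod:$SNAP_SRC" "$tmp/snapshots"; then
    rm -rf "$tmp"; return 1
  fi

  # Hop 2: each snapshot file -> each Keeper replica (every replica needs it).
  for pod in "$@"; do
    for f in "$tmp/snapshots"/*; do
      [ -f "$f" ] || continue
      if ! kubectl cp "$f" "$keeper_ns/$pod:$SNAP_DST/$(basename "$f")"; then
        rm -rf "$tmp"; return 1
      fi
    done
  done
  rm -rf "$tmp"
}
```

Usage, with placeholder names: copy_snapshots <zk-namespace> <zk-pod> datastores <keeper-pod-0> <keeper-pod-1> <keeper-pod-2>. Copying file by file keeps kubectl cp's destination semantics unambiguous.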

Step 4: Restart Keeper with Snapshots
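Make sure Keeper's snapshot_storage_path points to the directory you copied the snapshot into:
/var/lib/clickhouse-keeper/coordination/snapshots/store
Remove any older snapshots from that directory so Keeper starts from the converted one, then restart Keeper. Note that system.zookeeper reads from whichever coordination service ClickHouse is currently configured to use, so until Step 5 this query still shows ZooKeeper's tree; record its output now and compare it after the switchover: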
SELECT * FROM system.zookeeper WHERE path = '/';
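For a more concrete comparison than the root listing, both ZooKeeper and ClickHouse Keeper report a zk_znode_count line in their mntr four-letter-command output. This bash sketch assumes nc is present in both images and, on ZooKeeper 3.5 and later, that mntr has been added to 4lw.commands.whitelist; the helper names and pod names are illustrative. The counts may differ slightly, since each service keeps a few internal system nodes.

```shell
#!/usr/bin/env bash
# Sketch: compare znode counts from the `mntr` four-letter command.
# Assumes `nc` in both images and, on ZooKeeper 3.5+, "mntr" in 4lw.commands.whitelist.

# znode_count -> prints the zk_znode_count value from mntr output on stdin.
znode_count() {
  awk '$1 == "zk_znode_count" { print $2 }'
}

# mntr NAMESPACE POD [PORT] -> raw mntr output from the service inside the pod.
mntr() {
  kubectl exec -n "$1" "$2" -- sh -c "echo mntr | nc localhost ${3:-2181}"
}

# compare_znodes ZK_NS ZK_POD KEEPER_NS KEEPER_POD -> prints both counts.
compare_znodes() {
  local zk keeper
  zk="$(mntr "$1" "$2" | znode_count)"
  keeper="$(mntr "$3" "$4" | znode_count)"
  echo "ZooKeeper znodes: $zk, Keeper znodes: $keeper"
}
```

Usage, with placeholder names: compare_znodes <zk-namespace> <zk-pod> datastores <keeper-pod>.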

Step 5: Update ClickHouse to Use Keeper
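In the ClickHouseInstallation manifest, point the zookeeper section under spec.configuration at the Keeper service (the host must match the service name listed in Step 2):
zookeeper:
  nodes:
    - host: clickhouse-keeper-headless.datastores.svc.cluster.local
      port: 2181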
Flux will reconcile and trigger a rolling restart (ensure spec.restart: "RollingUpdate" is enabled). Perform this during a maintenance window and pause heavy DDL jobs.
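To follow the rollout from the command line, you can trigger Flux and then poll the ClickHouseInstallation status. This sketch assumes ClickHouse is managed by a Flux Kustomization whose name you pass in, that the chi short name is available, and that the operator publishes its state in .status.status (Completed when done), as the Altinity-based operator does; wait_for_chi and all_completed are hypothetical helpers. The status can still read Completed for a moment before the operator picks up the change, so also watch the pods restart.

```shell
#!/usr/bin/env bash
# Sketch: trigger Flux, then poll until every ClickHouseInstallation reports
# "Completed". Assumes the operator publishes its state in .status.status.

# all_completed -> exit 0 if stdin holds one or more words, all equal to "Completed".
all_completed() {
  local w seen=0
  for w in $(cat); do
    [ "$w" = "Completed" ] || return 1
    seen=1
  done
  [ "$seen" -eq 1 ]
}

# wait_for_chi KUSTOMIZATION NAMESPACE [TRIES] -> polls every 10 s, up to TRIES times.
wait_for_chi() {
  local ks="$1" ns="$2" tries="${3:-60}"
  flux reconcile kustomization "$ks" --with-source || return 1
  while [ "$tries" -gt 0 ]; do
    if kubectl get chi -n "$ns" -o jsonpath='{.items[*].status.status}' | all_completed; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```

Usage, with placeholder names: wait_for_chi <kustomization-name> <clickhouse-namespace>.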

Step 6: Validate the Cluster
Run these checks after the switchover:
Keeper connectivity (should list the same top-level znodes as the check in Step 4):
SELECT * FROM system.zookeeper WHERE path = '/' LIMIT 10;
Replication queue size per table:
SELECT database, table, count(*) FROM system.replication_queue GROUP BY database, table;
Replication delay (age of the oldest queued entry per table, in seconds):
SELECT database, table, max(abs(now() - create_time)) FROM system.replication_queue GROUP BY database, table;
Read-only replicas (should return no rows):
SELECT * FROM system.replicas WHERE is_readonly = 1;
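The read-only check above can also gate an automated pipeline. This sketch runs the queries through clickhouse-client inside a ClickHouse pod, fails if any replica is read-only, and only reports the replication queue size, since a non-empty queue is normal while replicas catch up. Namespace and pod names are placeholders, the helper names are illustrative, and because system.replicas and system.replication_queue are local to each server, it assumes you run it against every replica.

```shell
#!/usr/bin/env bash
# Sketch: scripted post-switchover checks against one ClickHouse pod.
# Run it once per replica; the system tables queried here are per server.

# ch_query NAMESPACE POD SQL -> prints the query result.
ch_query() {
  kubectl exec -n "$1" "$2" -- clickhouse-client --query "$3"
}

# expect_zero LABEL VALUE -> exit 1 (with a message on stderr) unless VALUE is 0.
expect_zero() {
  if [ "$2" != "0" ]; then
    echo "FAIL: $1 = $2" >&2
    return 1
  fi
  echo "OK: $1"
}

# validate NAMESPACE POD -> fails on read-only replicas, reports the queue size.
validate() {
  local ns="$1" pod="$2" ro queued
  ro="$(ch_query "$ns" "$pod" "SELECT count() FROM system.replicas WHERE is_readonly = 1")" || return 1
  queued="$(ch_query "$ns" "$pod" "SELECT count() FROM system.replication_queue")" || return 1
  expect_zero "readonly replicas" "$ro" || return 1
  # A non-empty queue is normal while replicas catch up, so only report it.
  echo "replication queue entries: $queued"
}
```

Usage, with placeholder names: validate <clickhouse-namespace> <clickhouse-pod>.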

Step 7: Decommission ZooKeeper
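Once ClickHouse is stable and fully synced with Keeper, scale the ZooKeeper pods down and remove them after an observation period. Make both changes in Git rather than with kubectl, so Flux does not restore ZooKeeper on its next reconciliation.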
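Before scaling ZooKeeper down, it is worth confirming that no clients are still connected to it. ZooKeeper's srvr four-letter command (allowed by default in ZooKeeper 3.5 and later) prints a Connections: line; this sketch parses it, assuming nc exists in the ZooKeeper image and treating a count of 0 or 1 as idle, since the nc session itself may be counted. The helper names and pod names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check that no clients are still connected to a ZooKeeper node.
# Assumes `nc` in the ZooKeeper image; `srvr` is allowed by default in ZooKeeper 3.5+.

# zk_connections -> prints the value of the "Connections:" line from srvr output on stdin.
zk_connections() {
  awk -F': ' '$1 == "Connections" { print $2 }'
}

# zk_idle NAMESPACE POD -> exit 0 if at most one connection is open
# (the nc session running this check may itself be counted).
zk_idle() {
  local n
  n="$(kubectl exec -n "$1" "$2" -- sh -c 'echo srvr | nc localhost 2181' | zk_connections)"
  [ -n "$n" ] && [ "$n" -le 1 ]
}
```

Run it against every ZooKeeper pod, for example zk_idle <zk-namespace> <zk-pod> && echo "no clients left on this node".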

Conclusion
Migrating from ZooKeeper to ClickHouse Keeper simplifies your stack and brings cluster coordination closer to ClickHouse itself. Using GitOps principles ensures the migration is auditable, repeatable, and automated.
If you’re planning this migration in production, test in staging first, schedule it during low-traffic windows, and monitor replication closely.