IBM Storage Ceph Object Storage Multisite Replication Series. Part One

By Daniel Alexander Parkes posted Thu January 25, 2024 12:51 PM


Throughout this series of articles, we will provide hands-on examples to help you set up and configure some of the most critical replication features of the IBM Storage Ceph Object Storage solution. This will include the new Object Storage multisite enhancements released in version 7.0. 

At a high level, the series will cover the following topics:

  • Introduction to IBM Storage Ceph Object Storage Multisite Replication.
  • IBM Storage Ceph Object Multisite Architecture & Configuration
  • New Performance Improvements in 7.0. Replication Sync Fairness
  • Load-Balancing the RGW services. Deploying the Ceph Ingress Service
  • IBM Storage Ceph Object Multisite sync policy
  • IBM Storage Ceph Object Storage Archive Zone

When discussing replication, disaster recovery, and backup and restore, different strategies provide different SLAs for data and application recovery (RTO/RPO). For instance, synchronous replication provides the lowest RPO: zero data loss. IBM Storage Ceph can provide synchronous replication between sites by stretching the Ceph cluster across the data centres. Asynchronous replication, on the other hand, implies a non-zero RPO. In Ceph, asynchronous multisite replication involves replicating the data to another Ceph cluster. Each IBM Storage Ceph storage type (object, block, and file) has its own asynchronous replication method. This blog series covers geo-dispersed Object Storage multisite asynchronous replication.


Introduction to IBM Storage Ceph Object Storage Multisite Replication.

Before getting our hands dirty with the deployment details, let me give you a quick overview of what Ceph Object Storage provides: enterprise-grade, highly mature object geo-replication capabilities. The RGW multisite replication feature facilitates asynchronous object replication across single or multi-zone deployments. Because replication is asynchronous and eventually consistent, Ceph Object Storage operates efficiently over WAN connections.

IBM Storage Ceph Object Storage Multisite Replication provides many benefits for businesses that must store and manage large amounts of data across multiple locations. Here are some of the key benefits of using IBM Storage Ceph Object Storage Multisite Replication:

Improved Data Availability, Multi-Region.

Ceph Object Storage clusters can be geographically dispersed, which improves data availability and reduces the risk of data loss due to hardware failure, natural disasters or other events. There are no strict network latency requirements between sites, because replication is asynchronous and eventually consistent.

Active/Active Replication.

Replication is Active/Active for data (object) access. End users can simultaneously read from and write to their closest S3 endpoint, which means they can access data more quickly and with less downtime.

Metadata updates, however, are accepted only by the designated master zone in the zone group. For example, when creating users and buckets, all metadata modifications made on non-master zones are forwarded to the configured master. If the master fails, a manual master zone failover must be triggered.
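As a rough illustration of what a manual master zone failover can look like, here is a minimal sketch run from a node in the surviving cluster. The zone name `zone2` is an assumption, as are the `run` helper and the `DRY_RUN` switch, which exist only in this example and make the script print the commands instead of executing them by default. Treat it as a starting point under those assumptions, not as the definitive procedure for your environment.

```shell
# Sketch: promote a surviving secondary zone to master.
# NEW_MASTER_ZONE and DRY_RUN are assumptions made for this example.
set -eu

NEW_MASTER_ZONE="${NEW_MASTER_ZONE:-zone2}"  # assumed name of the zone to promote
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands only, do not run them

# Hypothetical helper: echo the command, execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

promote_zone() {
  # Mark the given zone as the master (and default) zone of its zone group.
  run radosgw-admin zone modify --rgw-zone="$1" --master --default
  # Commit a new period so the change becomes part of the realm configuration.
  run radosgw-admin period update --commit
}

promote_zone "$NEW_MASTER_ZONE"
```

After a real failover, the RGW services in the promoted zone typically need a restart to pick up the new period.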

Increased Scalability.

With multisite replication, businesses can quickly scale their storage infrastructure by adding new sites or clusters. This allows businesses to store and manage large amounts of data without worrying about running out of storage capacity or performance.

Realm, Zonegroups and Zones 

An IBM Storage Ceph Object Storage multisite cluster consists of realms, zone groups and zones:

  • A realm defines a global namespace across multiple Ceph storage clusters

  • Zone groups can have one or more zones

  • Zones are the lowest level of the Ceph multisite configuration; each zone is served by one or more object gateways within a single Ceph cluster.

As you can see in the following diagram, IBM Storage Ceph Object Storage multisite replication happens at the zone level. We have a single realm containing two zone groups. The realm's global object namespace ensures unique object IDs across zone groups and zones.

Each bucket is owned by the zone group where it was created, and its object data will only be replicated to other zones in that zone group. Any request for data in that bucket sent to other zone groups will be redirected to the zone group where the bucket resides.

One Realm

On an IBM Storage Ceph Object Storage cluster, you can have one or more Realms; each Realm is an independent global object namespace, meaning each realm will have its own set of Users, Buckets and objects. For example, you can't have two buckets with the same name in a single Realm. In IBM Storage Ceph Object Storage, there is also the concept of "tenants" to isolate S3 namespaces, but that discussion is out of scope; you can find more information on this link.

The following diagram shows an example with two different realms, and therefore two independent namespaces. Each realm has its own zone group and replicating zones.

Two Realms

Each zone represents an IBM Storage Ceph cluster; you can have one or more zones in a zone group. Multisite replication, when configured, happens between zones. In this series of blogs, we will configure only two zones in a zone group, but you can configure any number of replicated zones in a single zone group.

IBM Storage Ceph Multisite Replication Policy

With the 6.1 release, Ceph Object Storage introduced “Multisite Sync Policy”, which provides granular bucket-level replication. This gives the user greater flexibility and reduced costs, and unlocks an array of valuable replication features:

  • Users can enable or disable sync per individual bucket, enabling precise control over replication workflows. 

  • Full-zone replication while opting specific buckets out of replication

  • Replicating a single source bucket to multiple destination buckets

  • Implementing symmetrical and directional data flow configurations per bucket
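To make the first two bullets a little more concrete, the following sketch shows one way they might be expressed with `radosgw-admin sync group` commands: a zone-group-level policy that mirrors everything between two zones, plus a bucket-level policy that opts a single bucket out. The group, flow and pipe IDs, the zone names and the bucket name are all hypothetical, and the `run` helper with its `DRY_RUN` switch belongs only to this example (by default it prints the commands rather than running them).

```shell
# Sketch: replicate the whole zone group, but opt one bucket out.
# All IDs, zone names and the bucket name are assumptions for illustration.
set -eu

ZONES="${ZONES:-zone1,zone2}"        # assumed zone names
BUCKET="${BUCKET:-scratch-bucket}"   # hypothetical bucket to exclude
DRY_RUN="${DRY_RUN:-1}"              # 1 = print commands only

# Hypothetical helper: echo the command, execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

enable_zonegroup_sync() {
  # Zone-group-level group, enabled: sync happens for everything it covers.
  run radosgw-admin sync group create --group-id=group1 --status=enabled
  # Symmetrical flow: data moves in both directions between the listed zones.
  run radosgw-admin sync group flow create --group-id=group1 \
      --flow-id=flow-mirror --flow-type=symmetrical --zones="$ZONES"
  # Pipe matching every bucket in every zone.
  run radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 \
      --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
  # Zone-group-level policy changes are committed with a period update.
  run radosgw-admin period update --commit
}

opt_out_bucket() {
  # Bucket-level group with status "forbidden" stops sync for that bucket.
  run radosgw-admin sync group create --bucket="$1" \
      --group-id="$1-nosync" --status=forbidden
}

enable_zonegroup_sync
opt_out_bucket "$BUCKET"
```

The bucket-level step is deliberately separate: it can be applied or removed per bucket without touching the zone-group-wide policy.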

The following diagram shows an example of the sync policy feature in action.

IBM Storage Ceph Multisite Configuration 

Architecture overview

As part of the 6.1 release, a new MGR module called RGW was added to the Ceph orchestrator, “cephadm”. The RGW manager module makes the configuration of multisite replication straightforward. This section will show you how to configure IBM Storage Ceph Object Storage multisite replication between two zones (each zone is an independent Ceph cluster) through the CLI using the new RGW manager module.

NOTE: In IBM Storage Ceph 7.0, the multisite configuration can also be done using the Ceph UI/Dashboard. We won’t use the UI in this guide, but if you are interested, you can find more information on the following link.

In our setup, we are going to configure multisite replication with the following logical layout. We have a realm called multisite, which contains a single zone group called multizg. Inside the zone group, we have two zones, called zone1 and zone2. Each zone represents a Ceph cluster in a geographically distributed data centre. The following diagram is a logical representation of our multisite configuration.
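As a preview of how this layout maps onto the RGW manager module, the sketch below writes a small realm specification for zone1 and shows the commands that would consume it. The realm, zone group and zone names come from the layout above; the spec file path, the choice of hosts and the frontend port 8000 are assumptions for illustration, and the `run` helper with `DRY_RUN` is specific to this example (it prints the commands by default instead of running them).

```shell
# Sketch: describe the realm/zone group/zone layout for the first cluster.
# SPEC path, host placement and port are assumptions for illustration.
set -eu

SPEC="${SPEC:-/tmp/rgw-realm-zone1.yaml}"  # hypothetical location for the spec file
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only

# Hypothetical helper: echo the command, execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Realm specification consumed by the RGW manager module.
cat > "$SPEC" <<'EOF'
rgw_realm: multisite
rgw_zonegroup: multizg
rgw_zone: zone1
placement:
  hosts:
    - ceph-node-00
    - ceph-node-01
spec:
  rgw_frontend_port: 8000
EOF

# Enable the RGW manager module, then create the realm, zone group and zone.
run ceph mgr module enable rgw
run ceph rgw realm bootstrap -i "$SPEC"
# List the realm tokens; a second cluster uses such a token to join the realm.
run ceph rgw realm tokens
```

Keeping the layout in a spec file means the same description can be reviewed and versioned before anything is applied to the cluster.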


As this is a lab deployment, this is a downsized example. Each Ceph cluster comprises four nodes with six disks each. We are configuring four RadosGW services (one per node) for each cluster: two RGWs will serve the S3 client requests, and the remaining two will be responsible for the multisite replication operations. Ceph Object Storage multisite replication data is transmitted to the other site through the RadosGW services using the HTTP protocol. The advantage is that, at the networking layer, we only need to allow HTTP communication between the Ceph clusters (zones) for which we want to configure multisite replication. The following diagram shows the final architecture we will be configuring step by step in this series of IBM Storage Ceph multisite replication posts.

In our example, we will terminate the SSL connection from the client at the per-site load balancer level. The RGW services will use plain HTTP for all the involved endpoints.

When configuring TLS/SSL, we can terminate the encrypted connection from the client to the S3 endpoint at the load balancer level, at the RGW service level, or at both, re-encrypting the connection from the load balancer to the RGW. Note that re-encryption is not currently supported by the Ceph ingress service (load balancer).

The second blog will enumerate the steps to establish the multisite replication between our Ceph clusters, as depicted in the following diagram.

But before starting the configuration of the Object Storage multisite replication, we need to provide a bit more context of our starting point; we have two ceph clusters deployed, the first cluster with nodes from ceph-node-00 to ceph-node-03 and the second cluster with nodes from ceph-node-04 to ceph-node-07. 

[root@ceph-node-00 ~]# ceph orch host ls
HOST                      ADDR             LABELS                      STATUS   _admin,osd,mon,mgr  osd,mon,mgr   osd,mon,mgr  osd
4 hosts in cluster
[root@ceph-node-04 ~]#  ceph orch host ls
HOST                      ADDR             LABELS                      STATUS  _admin,osd,mon,mgr  osd,mon,mgr  osd,mon,mgr  osd
4 hosts in cluster

The core Ceph services have been deployed, plus Ceph's observability stack, but no RadosGW services have been configured yet. The Ceph services are running containerized on top of a RHEL OS with the help of Podman. For more details on how to get started and deploy IBM Storage Ceph, check this video.

[root@ceph-node-00 ~]# ceph orch ls
NAME                       PORTS        RUNNING  REFRESHED  AGE  PLACEMENT                                          
alertmanager               ?:9093,9094      1/1  6m ago     3w   count:1                                            
ceph-exporter                               4/4  6m ago     3w   *                                                  
crash                                       4/4  6m ago     3w   *                                                  
grafana                    ?:3000           1/1  6m ago     3w   count:1                                            
mgr                                         3/3  6m ago     3w   label:mgr                                          
mon                                         3/3  6m ago     3w   label:mon                                          
node-exporter              ?:9100           4/4  6m ago     3w   *                                                  
osd.all-available-devices                     4  6m ago     3w   label:osd                                          
prometheus                 ?:9095           1/1  6m ago     3w   count:1        
[root@ceph-node-00 ~]# ceph version
ceph version 18.2.0-131.el9cp (d2f32f94f1c60fec91b161c8a1f200fca2bb8858) reef (stable)
[root@ceph-node-00 ~]# podman inspect | jq .[].Labels.summary
"Provides the latest IBM Storage Ceph 7 in a fully featured and supported base image."
# cat /etc/redhat-release 
Red Hat Enterprise Linux release 9.2 (Plow)

Summary & up next

As a recap, in Part One of this multisite series, we have gone through an overview of IBM Storage Ceph Object Storage multisite replication features and architecture, setting the stage to start configuring the multisite replication in Part Two of the series. Part Two is available at this link.

IBM Storage Ceph resources

Find out more about IBM Storage Ceph