
IBM Storage Ceph Object Storage Multisite Replication Series. Part Five

By Daniel Alexander Parkes posted Tue February 06, 2024 02:30 PM

In the previous episode of the series, we discussed everything related to load balancing our RGW S3 endpoints, covering different load-balancing techniques, including the out-of-the-box Ceph-provided load balancer, the `Ingress service`. If you missed it, here is the link for Part Four. In this fifth article of the series, we will discuss the multisite sync policy feature in detail.

Multisite sync policy introduction 

With the latest IBM Storage Ceph 6.1 release, IBM Ceph Object Storage introduces granular bucket-level replication, unlocking many valuable features. Users can enable or disable sync per individual bucket, giving precise control over replication workflows. This makes it possible to do full-zone replication while opting specific buckets out of replication, to replicate a single source bucket to multiple destination buckets, and to implement symmetrical and directional data flow configurations. The following diagram shows an example of the sync policy feature in action:

With the previous synchronization model, we would do full zone sync, meaning that all data and metadata were synced between zones; the sync policy feature gives us the flexibility and granularity to configure replication on a per-bucket basis.

Bucket sync policies also apply to archive zones, but replication involving an archive zone is not bidirectional: objects can be replicated from the active zone to the archive zone, but not from the archive zone back to the active zone, because the archive zone is read-only. We will cover the archive zone in detail in part six of the blog series.

The sync policy feature has been GA’d on a phased release schedule. Here is a list of the features that have already been GA’d with IBM Storage Ceph 6.1 and 7.0:

GA’d in IBM Storage Ceph 6.1:

  • One to One bucket replication

  • Zone group level policy configurations

  • Bucket-level policy configurations

  • Configurable Data flow - Symmetrical

  • Only Greenfield/new multisite deployments supported

GA’d in IBM Storage Ceph 7.0:

  • Object Name filtering

  • Moving from legacy multisite sync (full zone replication) to a sync policy (at the zone group or bucket level)

  • Archive zone sync policy (enable/disable per-bucket replication to the archive zone)

  • Data flow - Symmetrical or Directional

Our final GA release for this feature will be IBM Storage Ceph 7.1, where we will start supporting the following functionalities:

  • Partial user S3 replication API (GetBucketReplication, PutBucketReplication, DeleteBucketReplication)

  • Sync to and from a different bucket (one-to-one or one-to-many)

  • Destination parameter modifications: storage class, destination owner translation, user mode

Here are some sync policy concepts that we need to understand before we get our hands dirty. A sync policy is built from the following components, each of which maps to its own `radosgw-admin` subcommand (see the sketch after this list):

  • Groups: we have one or more groups, each containing lists of data-flow configurations and pipe configurations

  • Data-flow: defines the flow of replicated data between the different zones. It can define symmetrical data flow, in which multiple zones sync data. It can also define directional data flow, in which the data moves in one way from one zone to another.

  • Pipes: A pipe defines the zones and buckets that can use these data flows and their associated properties.
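
Each of these components maps to its own `radosgw-admin` subcommand. As a quick orientation before the hands-on section, this is the general shape of the commands we will be using (a sketch with placeholder values, not commands taken from our lab):

radosgw-admin sync group create --group-id=<group> --status=allowed
radosgw-admin sync group flow create --group-id=<group> --flow-id=<flow> --flow-type=symmetrical --zones=<zoneA>,<zoneB>
radosgw-admin sync group pipe create --group-id=<group> --pipe-id=<pipe> --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
radosgw-admin sync group get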

A sync policy group can be in one of three states (see the status-change sketch after this list):

  • Enabled - sync is allowed and enabled; replication starts as soon as the policy is applied. For example, we can enable sync for the full zone group and then disable (forbid) it on a per-bucket basis.

  • Allowed - sync is permitted but will not start on its own; for example, we can set the zone group policy to allowed and then enable sync on a per-bucket basis.

  • Forbidden - sync, as defined by this group, is not permitted.
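
If we need to change the state of an existing group later, we don’t have to recreate it; as a minimal sketch (assuming a zone group level group called group1, which, as we will see below, also needs a period commit for the change to take effect), the status can be modified in place:

radosgw-admin sync group modify --group-id=group1 --status=enabled
radosgw-admin period update --commit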

We can configure sync policies (groups, flows & pipes) at the zone group level and at the bucket level. A bucket sync policy is always a subset of the zone group policy it belongs to, so if, for example, we don’t allow a flow at the zone group level, it won’t work even if it is allowed at the bucket level. There are further details on the expected behaviour in the official documentation.
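
To illustrate the subset rule, here is a hedged sketch of a bucket-level pipe that stays within a symmetrical zone1/zone2 zone group flow by restricting a hypothetical bucket to a single direction (placeholder names); a pipe that referenced a zone not covered by the zone group flow would simply have no effect:

radosgw-admin sync group pipe create --bucket=<bucket> --group-id=<bucket-group> --pipe-id=<pipe> --source-zones=zone1 --dest-zones=zone2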

Multisite sync policy configuration

The following section explains how to use the new multisite sync policy feature. By default, once we set up multisite replication as we did in the initial blog of the series, all metadata and data are replicated among the zones that are part of the zone group; we will call this sync method `legacy` for the remainder of the article.

As explained in the previous section, a sync policy is made up of a group, a flow and a pipe. We are first going to configure a very permissive zone group policy that allows bi-directional traffic for all buckets on all zones. Once it is in place, we will add per-bucket sync policies that, by design, are a subset of the zone group policy and use more stringent rulesets.

 

We start by adding the zone group policy: we create a new group called “group1” and set its status to allowed. As you may recall from the previous section, this permits sync traffic to flow, but because the policy is set to “allowed” rather than “enabled”, no data synchronisation will actually happen at the zone group level; the idea is to enable synchronisation on a per-bucket basis.

[root@ceph-node-00 ~]# radosgw-admin sync group create --group-id=group1 --status=allowed --rgw-realm=multisite --rgw-zonegroup=multizg

We now create a symmetrical (bi-directional) flow, allowing data to sync in both directions between our zones, zone1 and zone2.

[root@ceph-node-00 ~]#  radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-mirror --flow-type=symmetrical --zones=zone1,zone2

We finally create a pipe; in the pipe, we specify the group-id to use and set an asterisk wildcard for the source and destination buckets and zones, meaning any zone and any bucket can act as the source or destination of the replicated data.

[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --group-id=group1  --pipe-id=pipe1 --source-zones='*'  --source-bucket='*' --dest-zones='*' --dest-bucket='*'

Zone group sync policy modifications require a period update; bucket sync policies do not.

[root@ceph-node-00 ~]# radosgw-admin period update --commit

Once we have committed the new period, all data sync in the zone group will stop, because our zone group policy is set to “allowed” (if we had set it to “enabled”, sync would have continued in the same way as with our initial multisite configuration).
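
As a quick sanity check (output omitted here), you can run the overall sync status command from either cluster; metadata sync is expected to continue regardless, since sync policies only govern data replication:

radosgw-admin sync status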

 

Single Bucket bi-directional sync between zones

Now, we can start enabling sync on a per-bucket basis. We will create a bucket-level policy rule for the existing bucket “testbucket”. Note that the bucket needs to exist before being able to set this policy, and admin commands that modify bucket policies need to run on the master zone; however, bucket sync policies do not require a period update. There is no need to change the data flow, as it is inherited from the zone group policy. A bucket policy flow will only be a subset of the flow defined in the zone group policy; the same happens with pipes.

We create the bucket:

[root@ceph-node-00 ~]# aws --endpoint https://s3.zone1.cephlab.com:443 s3 mb s3://testbucket

make_bucket: testbucket

We create a bucket sync group, using the `--bucket` parameter to specify the bucket, and set the status to enabled so that replication is enabled for our bucket “testbucket”:

[root@ceph-node-00 ~]# radosgw-admin sync group create --bucket=testbucket --group-id=testbucket-1 --status=enabled

 

There is no need to specify a flow, as it is inherited from the zone group policy, so we only need to define a pipe for our bucket sync policy group “testbucket-1”. As soon as this command is applied, data sync replication will start for this bucket.

[root@ceph-node-00 ~]# radosgw-admin sync group pipe create --bucket=testbucket --group-id=testbucket-1 --pipe-id=test-pipe1 --source-zones='*' --dest-zones='*'

NOTE: You can safely ignore the following warnings:

WARNING: cannot find source zone id for name=*

With the `sync group get` command, you can review your group, flow and pipe configurations. We can run the command at the zone group level, where we can see that the status is “allowed”:

[root@ceph-node-00 ~]# radosgw-admin sync group get | jq .[0].val.status

"allowed"

We can also run the `sync group get` command at the bucket level using the `--bucket` parameter. In this case, the status is enabled for “testbucket”:

[root@ceph-node-00 ~]# radosgw-admin sync group get --bucket testbucket | jq .[0].val.status

"Enabled"

Another helpful command is `sync info`; with it, we can preview what sync replication will result from our current configuration. For example, with our zone group sync policy in the “allowed” state, no sync will happen at the zone group level, so the `sync info` command will not show any sources or destinations configured:

[root@ceph-node-00 ~]# radosgw-admin sync info
{
    "sources": [],
    "dests": [],
    "hints": {
        "sources": [],
        "dests": []
    },
    "resolved-hints-1": {
        "sources": [],
        "dests": []
    },
    "resolved-hints": {
        "sources": [],
        "dests": []
    }
}

We can also use the `sync info` command at the bucket level with the `--bucket` parameter. Because we have configured a bi-directional pipe, we see zone2 -> zone1 under sources and zone1 -> zone2 under destinations; this means replication for bucket testbucket happens in both directions. If we PUT an object into testbucket from zone1, it gets replicated to zone2, and if we PUT an object into zone2, it gets replicated to zone1.

[root@ceph-node-00 ~]# radosgw-admin sync info --bucket testbucket
{
    "sources": [
        {
            "id": "test-pipe1",
            "source": {
                "zone": "zone2",
                "bucket": "testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1"
            },
            "dest": {
                "zone": "zone1",
                "bucket": "testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1"
            },
            "params": {
                "source": {
                    "filter": {
                        "tags": []
                    }
                },
                "dest": {},
                "priority": 0,
                "mode": "system",
                "user": "user1"
            }
        }
    ],
    "dests": [
        {
            "id": "test-pipe1",
            "source": {
                "zone": "zone1",
                "bucket": "testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1"
            },
            "dest": {
                "zone": "zone2",
                "bucket": "testbucket:89c43fae-cd94-4f93-b21c-76cd1a64788d.34553.1"
            },
            "params": {
                "source": {
                    "filter": {
                        "tags": []
                    }
                },
                "dest": {},
                "priority": 0,
                "mode": "system",
                "user": "user1"
            }
        }
    ],

So if, for example, we only look at the sources, you can see that they change depending on the cluster from which we run the radosgw-admin command; for example, from cluster2 (ceph-node-04) we have zone1 as the source:

[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin sync info --bucket testbucket | jq '.sources[].source, .sources[].dest'
{
  "zone": "zone1",
  "bucket": "testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2"
}
{
  "zone": "zone2",
  "bucket": "testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2"
}

From cluster1 (ceph-node-00), we have zone2 as the source:

[root@ceph-node-00 ~]# radosgw-admin sync info --bucket testbucket | jq '.sources[].source, .sources[].dest'
{
  "zone": "zone2",
  "bucket": "testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2"
}
{
  "zone": "zone1",
  "bucket": "testbucket:66df8c0a-c67d-4bd7-9975-bc02a549f13e.45330.2"
}

Ok, let’s do a quick test with the AWS CLI to validate the configuration and confirm that replication is working for testbucket. We PUT an object in zone1 and check that it gets replicated to zone2.

[root@ceph-node-00 ~]# aws --endpoint https://s3.zone1.cephlab.com:443 s3 cp /etc/hosts s3://testbucket/firsfile

upload: ../etc/hosts to s3://testbucket/firsfile

We can check the sync has finished with the `radosgw-admin bucket sync checkpoint` command:

[root@ceph-node-00 ~]# ssh ceph-node-04 radosgw-admin bucket sync checkpoint --bucket testbucket
2024-02-02T02:17:26.858-0500 7f3f38729800  1 bucket sync caught up with source:
      local status: [, , , 00000000004.531.6, , , , , , , ]
    remote markers: [, , , 00000000004.531.6, , , , , , , ]
2024-02-02T02:17:26.858-0500 7f3f38729800  0 bucket checkpoint complete

An alternative way to check the sync status is the `radosgw-admin bucket sync status` command:

[root@ceph-node-00 ~]# radosgw-admin bucket sync status --bucket=testbucket
          realm beeea955-8341-41cc-a046-46de2d5ddeb9 (multisite)
      zonegroup 2761ad42-fd71-4170-87c6-74c20dd1e334 (multizg)
           zone 66df8c0a-c67d-4bd7-9975-bc02a549f13e (zone1)
         bucket :testbucket[66df8c0a-c67d-4bd7-9975-bc02a549f13e.37124.2])
   current time 2024-02-02T09:07:42Z

    source zone 7b9273a9-eb59-413d-a465-3029664c73d7 (zone2)
  source bucket :testbucket[66df8c0a-c67d-4bd7-9975-bc02a549f13e.37124.2])
                incremental sync on 11 shards
              bucket is caught up with source

The object is now available in zone2:

[root@ceph-node-00 ~]# aws  --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://testbucket/

2024-01-09 06:27:24        233 firsfile 

Because the replication is bi-directional, we can PUT an object in zone2 and verify that it gets replicated to zone1:

[root@ceph-node-00 ~]# aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 cp   /etc/hosts s3://testbucket/secondfile
upload: ../etc/hosts to s3://testbucket/secondfile
[root@ceph-node-00 ~]# aws  --endpoint https://object.s3.zone1.dan.ceph.blue:443 s3 ls s3://testbucket/
2024-01-09 06:27:24        233 firsfile
2024-02-02 00:40:15        233 secondfile
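
As a final check (a sketch; the endpoint name is the one we used above and may differ in your environment), listing the bucket against the zone2 endpoint should now show both objects as well:

aws --endpoint https://object.s3.zone2.dan.ceph.blue:443 s3 ls s3://testbucket/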

Summary & up next

In Part Five of this series, we discussed the multisite sync policy feature and shared some hands-on examples of configuring granular bucket-level bi-directional replication. In Part Six, we will continue configuring multisite sync policies, covering unidirectional replication with one source and multiple destination buckets.
Links to the rest of the blog series:

IBM Storage Ceph resources

Find out more about IBM Storage Ceph

