
Introduction: Why Performance Matters for Secure Object Storage
Organizations today are challenged not only with managing massive volumes of data — often in the tens of petabytes — but also with the responsibility of securing that data across hybrid and multicloud environments. Object storage systems, such as IBM Storage Ceph, provide the scalability and flexibility required to meet these challenges, offering S3-compatible access, native redundancy, and a growing set of enterprise features.
As encryption in transit and at rest is configured and layered into Ceph Object Gateway (RGW) deployments, it becomes essential to understand its impact on latency, throughput, and resource utilization.
This blog series presents the results of a comprehensive performance benchmarking effort on the IBM Storage Ceph Object Gateway (RGW), conducted by the IBM Storage Ceph Performance and Interoperability team. Special thanks to Jay Rajput for leading the execution of test cases and data collection. Our evaluation focuses on how real-world workloads interact with different configurations of encryption, data protection, and horizontal scaling, offering practical insights for architects, administrators, and developers alike.
Hardware and Software Setup
We tested on a production-grade IBM Storage Ceph cluster, deployed with collocated daemons for RGW, OSDs, MONs, MGRs, and Ingress.
Hardware Specifications
Each test cluster configuration (4-node, 8-node, and 12-node) maintained consistent OSD density (24 per node) and 4 RGW daemons per node, with dedicated VIPs for Ingress-based load balancing.
Software version matrix

Ceph Cluster Configuration

Network Architecture & Hardware Connectivity
Complementing the compute and storage setup, our network underpins the cluster's high-throughput performance:
- Leaf–Spine topology: We’re running a 100 Gbps leaf–spine network with one spine (QFX5120) and three leafs (QFX5120), enabling a scalable, low-latency design. This offers port density now and a future upgrade path (e.g., adding a true spine and repurposing the current one) without impacting performance.
- Dual 100 Gbps uplinks per server via LACP: Each Ceph node utilizes two 100 Gbps ports on a single NIC, bonded using LACP, to connect to both leaf switches for redundancy and link aggregation.
- Per-node limit: Each Ceph storage node is equipped with Intel NICs that support a maximum aggregate throughput of 100 Gbps, even though two ports are available and bonded via LACP. This means per-node throughput is capped at ~12.5 GB/s under optimal conditions.
- Cluster-wide switching capacity: Our leaf–spine topology, built with one QFX5120 spine and three QFX5120 leaf switches, provides full line-rate connectivity across all twelve storage nodes. Each leaf connects to four nodes and uplinks to the spine at 100 Gbps. This yields a theoretical cluster-wide switching capacity of ~150 GB/s. In our large-object benchmarks, the system achieved an aggregate throughput of ~111 GB/s, demonstrating that we were approaching the physical network ceiling, particularly for large-object, read-intensive workloads.
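To make the network ceiling concrete, the short calculation below reproduces the arithmetic from the bullets above: it converts the effective 100 Gbps per-node link speed to GB/s, scales it across twelve nodes, and compares the result with the ~111 GB/s we observed. It is a back-of-the-envelope sketch only, assuming the bonded NIC is effectively limited to 100 Gbps per node.

```python
# Back-of-the-envelope check of the network ceiling described above.
# Assumption: the bonded NIC is effectively limited to 100 Gbps per node.

NODE_LINK_GBPS = 100       # effective per-node NIC bandwidth (gigabits/s)
NODES = 12                 # storage nodes in the largest test cluster
OBSERVED_GET_GBS = 111     # measured aggregate GET throughput (gigabytes/s)

per_node_ceiling_gbs = NODE_LINK_GBPS / 8           # ~12.5 GB/s per node
cluster_ceiling_gbs = per_node_ceiling_gbs * NODES  # ~150 GB/s across 12 nodes

print(f"Per-node ceiling: {per_node_ceiling_gbs:.1f} GB/s")
print(f"Cluster ceiling:  {cluster_ceiling_gbs:.1f} GB/s")
print(f"Observed GET:     {OBSERVED_GET_GBS} GB/s "
      f"({OBSERVED_GET_GBS / cluster_ceiling_gbs:.0%} of the theoretical ceiling)")
```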
Test Methodology
We designed our performance evaluation to answer foundational questions about how to deploy Ceph Object Gateway (RGW) for both performance and security:
- What’s the impact of TLS (SSL) on RGW throughput and latency?
- How much overhead does server-side encryption (SSE-S3/KMS) introduce? (See the client-side sketch after this list.)
- Does securing internal daemon communication (msgr v2) affect CPU utilization?
- How do EC profiles (2+2, 4+2, 8+3) compare to 3x replication?
- What are the performance implications of using HAProxy-based ingress vs. direct access?
- How does performance scale with node count and concurrency?
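As a point of reference for the TLS and SSE questions above, the sketch below shows how an S3 client can request server-side encryption per object against a TLS-terminated RGW endpoint. It uses boto3 purely for illustration; the endpoint URL, credentials, bucket, and key ID are placeholders, and the benchmarks themselves were driven with elbencho rather than this code.

```python
# Illustrative only: requesting SSE on a per-object basis against an RGW endpoint.
# Endpoint, credentials, bucket, and key names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",   # TLS-terminated RGW/ingress VIP (placeholder)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Plain PUT: no server-side encryption requested.
s3.put_object(Bucket="bench-bucket-1", Key="obj-plain", Body=b"x" * 65536)

# SSE-S3: the gateway encrypts the object at rest with keys it manages (AES-256).
s3.put_object(
    Bucket="bench-bucket-1",
    Key="obj-sse-s3",
    Body=b"x" * 65536,
    ServerSideEncryption="AES256",
)

# SSE-KMS: the gateway fetches the data key from the configured KMS (key ID is a placeholder).
s3.put_object(
    Bucket="bench-bucket-1",
    Key="obj-sse-kms",
    Body=b"x" * 65536,
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="my-kms-key-id",
)
```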
Each test case was repeated across PUT and GET workloads with object sizes ranging from 64 KB to 1 GB. Elbencho was used in client-server mode with a thread count of 128 (except for the SSE testing, which used 64 threads), running up to 8 concurrent clients. Each elbencho client used its own bucket. Buckets were created in advance with the default shard count of eleven, and multipart upload was used for objects larger than 1 GB.
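For readers who want to approximate this workload layout outside of elbencho, the hedged sketch below reproduces the per-client bucket pattern and the multipart behavior for large objects using boto3's transfer configuration. The 1 GiB multipart threshold mirrors the methodology above; the endpoint, credentials, and names are placeholders, not our actual test harness.

```python
# Rough approximation of the benchmark's object layout: one bucket per client,
# multipart uploads kicking in above 1 GiB. Names and endpoint are placeholders.
import io
import boto3
from boto3.s3.transfer import TransferConfig

CLIENT_ID = 3  # each load-generating client writes to its own bucket
BUCKET = f"bench-bucket-{CLIENT_ID}"

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)
s3.create_bucket(Bucket=BUCKET)  # buckets were pre-created before the runs

# Use multipart only for objects larger than 1 GiB, matching the methodology above.
xfer = TransferConfig(multipart_threshold=1024 * 1024 * 1024,
                      multipart_chunksize=128 * 1024 * 1024)

for size in (64 * 1024, 32 * 1024 * 1024, 1024 * 1024 * 1024):
    body = io.BytesIO(b"\0" * size)  # built in memory; fine for a sketch
    s3.upload_fileobj(body, BUCKET, f"obj-{size}", Config=xfer)
```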

Executive Summary
IBM Storage Ceph demonstrates exceptional performance and flexibility when deployed on cutting-edge, all-flash infrastructure with 100 GbE networking, such as the IBM Storage Ceph Ready Nodes. As enterprises scale to billions of objects and multi-petabyte workloads, Ceph's ability to handle diverse data patterns, from high IOPS, low-latency, metadata-heavy workloads to high-throughput, bandwidth-intensive workloads, becomes critical.
Large Object Workloads (Throughput Focus)
For objects exceeding 32 MB, the cluster achieved near-linear scaling up to twelve storage nodes, peaking at an aggregate PUT throughput of 65 GB/s and an aggregate GET throughput of ~115 GB/s. Beyond this point, 100 GbE NIC saturation on individual nodes became the primary constraint. This suggests that future benchmark testing would benefit from higher-bandwidth NICs, as large-object workloads still have room to achieve higher throughput with the nodes' remaining available resources.
For large objects, fully in-transit-secured configurations (TLS + msgr v2) maintained high throughput with reasonable overhead, demonstrating that Ceph Object Gateway (RGW) is well suited for secure data pipelines at scale. There is still room for performance improvement when server-side encryption (SSE) is also enabled to provide object encryption at rest.
Small Object Workloads (IOPS & Latency Focus)
Small object tests (64 KB) demonstrated Ceph Object Gateway (RGW)’s ability to scale IOPS efficiently with increasing concurrency and cluster size. With 64 KB objects, the system achieved up to 391K GET IOPS and 86K PUT IOPS on a twelve-node cluster using erasure coding.
To unlock optimal performance for small-object workloads, especially under high concurrency, it's essential to deploy on infrastructure with robust CPU capacity and generous RGW threading, enabling the Ceph Object Gateway to leverage its parallel processing capabilities fully.
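To illustrate the concurrency side of this point, the minimal sketch below drives many small-object GETs in parallel from a single client. It assumes a pre-populated bucket and uses boto3 with a thread pool; the 128 worker threads mirror the per-client thread count used in our testing, but the endpoint, credentials, and key names are placeholders and the real benchmarks used elbencho.

```python
# Minimal sketch of issuing many concurrent small-object GETs against RGW.
# Illustrates the concurrency pattern only; the actual benchmarks used elbencho.
# Endpoint, credentials, bucket, and key names are placeholders.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def get_object(key: str) -> int:
    """Fetch one small object and return the number of bytes read."""
    resp = s3.get_object(Bucket="bench-bucket-1", Key=key)
    return len(resp["Body"].read())

keys = [f"small-obj-{i}" for i in range(10_000)]  # assumed to exist already

# 128 worker threads mirror the per-client thread count used in testing.
with ThreadPoolExecutor(max_workers=128) as pool:
    total_bytes = sum(pool.map(get_object, keys))

print(f"Read {total_bytes} bytes across {len(keys)} GETs")
```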
What’s Next
This post introduces the testbed and methodology and highlights key results. In the upcoming posts of this series, we’ll delve into each performance axis, exploring the impact of TLS and SSE on RGW throughput, scaling behaviors with erasure coding versus replication, how concurrency and daemon density affect latency, and more. You’ll see detailed graphs and architectural guidance drawn directly from production-grade testing. Whether you’re building secure object storage for AI pipelines, backups, or multi-tenant cloud services, stay tuned: there’s much more to uncover. Read Part 2 here.