IBM Storage Ceph

Choosing the Right Data Protection Strategies For Your IBM Storage Ceph Deployments

By Anthony D'Atri posted 29 days ago

  

Choosing a data protection strategy can be complicated

Multiple factors and tradeoffs

  • Usable to raw capacity ratio
  • Replication vs erasure coding
  • EC profile values for k and m
  • Read and Write performance
  • Recovery performance
  • Failure domains
  • Fault tolerance
  • Media saturation
  • min_alloc_size vs media IU (indirection unit)

Replication vs Erasure Coding

Replication

  • Three (usually) copies of all data
  • Fast and simple
  • As few as 3 nodes or racks
  • Well-suited to small and/or hot S3 / CephFS objects and metadata pools
  • High space amplification
    • Ratio of raw capacity to usable capacity
    • Default 3 copies: space amp factor of 3
    • Other values are possible: 1 and 2 are dangerous if your data isn't disposable
    • Stretch mode clusters require 4 copies

Erasure Coding (EC)

  • Splits data into chunks
  • Computes and stores additional parity chunks
  • Akin to RAID6 / RAID60 but more flexible
  • Requires additional CPU for parity calculation
  • Usually requires more hosts/racks than Replication
  • Less space amp
  • EC is often selected to maximize usable capacity for a given amount of raw underlying storage
  • Or to minimize raw capacity needed for desired usable capacity
  • More TB/RU, TB/node, TB/watt
  • Extra IOPS on underlying drives: HDDs saturate
  • Slower recovery: client writes may see higher latency, and the cluster runs with reduced redundancy until recovery completes, which raises the risk of data unavailability or loss in the event of an overlapping failure
  • May burn SSD endurance more quickly (though this is mostly FUD)

EC Profile

  • EC pools have attributes: the most important are K and M
  • K is the number of data chunks: when K=4, 1MB of user data is split into 4 data chunks @ 256KB
  • M is the number of coding chunks, which are the same size as data chunks
  • When M=2, 1MB of user data generates 2 coding chunks @ 256KB
  • We call the above a 4+2 or 4,2 profile
  • Space amp factor is (K+M)/K: for the 4+2 profile above, just 1.5
  • Diminishing returns with large values of K: as K increases beyond, say, 6, the incremental space saving of larger values quickly declines
  • At the cost of more IOPS, slower recovery, and slower scrubbing
  • Subtly, consider relative vs absolute space amp. In the table below the absolute difference between 8+2 and 4+2 is 25 percentage points, but the relative difference is only about 17%.

    Profile    Space amp factor    Usable / raw
    2+2        2.00                50%
    4+2        1.50                67%
    6+2        1.33                75%
    8+2        1.25                80%
  • You can select a profile with M=1

  • But.  Don't.  Just ... Don't.
  • Unless you can afford to lose data
  • This is not facetious: sometimes data can be reconstructed or is only a scratchpad.
  • High risk of data loss
  • Data is unavailable if even one host / rack is down for maintenance
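
The space amp arithmetic above is easy to sanity-check. Here is a minimal Python sketch (an illustration only, not Ceph code; the list of profiles is arbitrary):

```python
# Space amplification factor: raw bytes consumed per byte of user data.
# Replication with N copies is the degenerate EC case K=1, M=N-1, so one
# formula covers both: (K + M) / K.

def space_amp(k: int, m: int) -> float:
    """Space amplification of a K+M EC profile (or K=1, M=N-1 replication)."""
    return (k + m) / k

for name, (k, m) in {"3x replication": (1, 2), "2+2": (2, 2),
                     "4+2": (4, 2), "6+2": (6, 2), "8+2": (8, 2)}.items():
    amp = space_amp(k, m)
    print(f"{name:15s} space amp {amp:.2f}   usable/raw {1 / amp:.0%}")

# Absolute vs relative difference between 4+2 and 8+2:
absolute = space_amp(4, 2) - space_amp(8, 2)   # 0.25
relative = absolute / space_amp(4, 2)          # ~0.167
```

Running this shows the diminishing returns: the step from 2+2 to 4+2 saves far more raw capacity than the step from 4+2 to 8+2.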

Selecting M

  • Tradeoff of space amp, performance, fault tolerance
  • Data is preserved if any K data or coding chunks survive
  • Data is available for reads and writes when any K+1 data or coding chunks are online
  • There are durability and availability benefits to larger values of M: M=3 offers a significant improvement over M=2; beyond 3 there are -- you guessed it -- rapidly diminishing returns, on the timescale of cosmological heat death.
  • Many admins choose M=2. As drive sizes increase, with a concomitant increase in MTTR, the risk of data unavailability or loss due to overlapping failures increases.
  • Consider how long a cluster will take to recover from the failure of a single 30TB HDD or 122TB SSD.
  • Now consider the recovery time when an entire host of those halts and catches fire.
  • 3+3, 6+3 are examples of profiles with higher fault tolerance at the expense of performance and space amp.
  • Decision factors include OSD size and media, network resources, and how existential a threat partial data loss would be.
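
The survivability rules above can be captured in a toy model (a sketch assuming the default min_size of K+1; `pool_state` is a hypothetical helper, not a Ceph API):

```python
# For a K+M EC pool: data survives while at least K of the K+M chunks
# survive; with the default min_size = K+1, the pool serves I/O only
# while at least K+1 chunks are online.

def pool_state(k: int, m: int, failed_chunks: int) -> str:
    alive = k + m - failed_chunks
    if alive < k:
        return "data lost"
    if alive < k + 1:
        return "data intact, I/O suspended"
    return "active"

for failed in range(4):
    print(f"4+2, {failed} overlapping failures: {pool_state(4, 2, failed)}")
for failed in range(5):
    print(f"6+3, {failed} overlapping failures: {pool_state(6, 3, failed)}")
```

With M=2 a second overlapping failure already suspends I/O; M=3 tolerates one more failure before that happens, which is the whole argument for larger M as drives (and thus recovery windows) grow.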

When to choose Replication or EC

  • EC is best across at least M+K+1 failure domains
  • You can do EC on fewer, but it's tricky
  • Strategy is configured at pool creation
  • Can't switch between the two later, so choose .... wisely
  • Some pools require replication:
    • RGW index, CephFS metadata
    • RBD pools are almost always replicated
    • RBD using EC is possible with recent IBM Storage Ceph releases
      • Requires a small adjunct replicated metadata pool
      • Inherently higher latency is usually unacceptable
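
The failure-domain rule of thumb above is simple enough to encode (`ec_comfortable` is a hypothetical helper for illustration, not a Ceph API):

```python
# Rule of thumb from above: an EC pool is best spread across at least
# K + M + 1 failure domains (hosts or racks), so that after a failure
# there is a spare domain to recover into.

def ec_comfortable(k: int, m: int, failure_domains: int) -> bool:
    return failure_domains >= k + m + 1

print(ec_comfortable(4, 2, 7))    # True: 4+2 across 7 hosts
print(ec_comfortable(8, 3, 10))   # False: 8+3 wants at least 12
```

You can run EC across fewer failure domains than this, but as noted above it's tricky, and a single domain outage leaves no room to re-protect the data.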

EC for CephFS and RGW data pools

  • Applications are often less sensitive to latency than with block storage
  • Driven by space amp: cost and density over performance
  • Read throughput can even be higher than with replication, in limited circumstances

Multiple RGW data pools

  • EC data pools can result in larger space amp for very small objects (this should improve in IBM Storage Ceph 9)
  • Small / hot RGW objects benefit from replicated pool performance
  • User objects in a secondary data pool still have a small HEAD RADOS object in the default pool
  • Larger / tepid / cold objects often are fine on an EC pool for efficiency
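
A rough model of the small-object penalty mentioned above (assumptions: a BlueStore min_alloc_size of 4 KiB and chunk-level padding only; real clusters add stripe-unit and metadata effects, so treat this as a sketch):

```python
import math

MIN_ALLOC = 4096  # bytes; assumed BlueStore min_alloc_size

def raw_bytes_ec(obj_size: int, k: int, m: int) -> int:
    """Raw bytes consumed by one object on a K+M EC pool (simplified)."""
    chunk = math.ceil(obj_size / k)
    per_chunk = max(1, math.ceil(chunk / MIN_ALLOC)) * MIN_ALLOC
    return (k + m) * per_chunk

def raw_bytes_replicated(obj_size: int, copies: int = 3) -> int:
    per_copy = max(1, math.ceil(obj_size / MIN_ALLOC)) * MIN_ALLOC
    return copies * per_copy

for size in (1024, 16 * 1024, 4 * 1024 * 1024):
    ec = raw_bytes_ec(size, 4, 2)
    rep = raw_bytes_replicated(size)
    print(f"{size:>8} B object: 4+2 EC uses {ec} B raw, 3x replication {rep} B raw")
```

In this model a 1 KiB object on a 4+2 pool consumes 24 KiB raw versus 12 KiB for triple replication; the EC advantage only appears once objects span several allocation units, which is why small / hot objects belong on a replicated pool.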

Mixed media RGW + CephFS data pools

  • Best of both worlds
  • Modest capacity of fast TLC SSDs for replicated metadata + tiny / hot objects
  • Possible inlining of small user objects / files
  • Dense, cost-effective HDD for EC data pools, but spinners gonna seek
  • Dense pTLC or QLC SSDs for EC data pools: up to 122TB today, larger on the horizon
  • When using coarse-IU pTLC or QLC SSDs, set bluestore_use_optimal_io_size_for_min_alloc_size = true before OSD creation
  • These media are fantastic for object and file storage, but not great choices for RBD pools or CephFS / RGW metadata pools
  • PCIe Gen 4/5/6 allow huge NVMe capacity without bottlenecks

May the Tentacles be with You

This post was requested by IBMer Greg Deffenbaugh and adapted from a deck presented at Ceph Day Seattle 2025
