This article explains the differences and the motivations for why the recommended deployment for IBM Storage Ceph is on bare metal servers rather than on a virtualized or containerized infrastructure environment.
Note that we install IBM Storage Fusion Data Foundation and Red Hat OpenShift Data Foundation, which contain Ceph, in an OpenShift context. More about this later on in this blog.
First, we will take a short look at the functional capabilities of IBM Storage Ceph and why it makes a lot of sense to deploy Ceph on bare metal rather than, although possible, on other infrastructure platforms.
IBM Storage Ceph is an open-source, software-defined, distributed storage system designed for modern data storage needs. IBM Storage Ceph offers a unified storage platform that provides Amazon AWS-compatible S3 object storage, block storage, and file storage, which makes the solution very suitable for a wide range of workload types.
A key strength of IBM Storage Ceph is its ability to scale horizontally. This allows organizations to seamlessly expand their storage capacity by adding Object Storage Daemon (OSD) nodes to the cluster. This scalability, combined with self-healing capabilities, makes Ceph a very suitable solution for handling massive data sets in demanding environments.
IBM Storage Ceph offers support for data replication and automated recovery. By replicating data across multiple Object Storage Daemons (OSDs), fault tolerance and data durability are ensured, even in the event of hardware failures.
IBM Storage Ceph provides a rich set of features such as snapshots, thin provisioning, erasure coding, and bucket-policy-based tiering of object storage to public cloud.
This enables users to optimize storage efficiency and reduce costs.
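To make the efficiency trade-off concrete, here is a minimal Python sketch (illustrative only, not IBM Storage Ceph code) comparing the raw-capacity overhead of a replicated pool with that of an erasure-coded pool. The 3-way replication and 4+2 erasure-coding profiles used below are common examples, not product defaults.

```python
# Illustrative sketch: raw-capacity overhead of replication vs. erasure coding.

def replicated_overhead(size: int) -> float:
    """Raw bytes stored per usable byte for a pool with `size` replicas."""
    return float(size)

def erasure_coded_overhead(k: int, m: int) -> float:
    """Raw bytes per usable byte for an EC pool with k data and m coding chunks."""
    return (k + m) / k

# 3-way replication: 3.0x raw capacity, tolerates the loss of 2 copies.
print(replicated_overhead(3))        # 3.0

# Erasure coding 4+2: 1.5x raw capacity, tolerates the loss of any 2 chunks.
print(erasure_coded_overhead(4, 2))  # 1.5
```

Both layouts survive two simultaneous failures in this example, but the erasure-coded pool consumes half the raw capacity, which is where the cost savings mentioned above come from.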
Because of the software-defined nature of IBM Storage Ceph, the solution can run on practically any type of underlying infrastructure. Ceph can, for example, run on bare metal, virtualized on a hypervisor platform, and also on a container platform such as Kubernetes. This blog post will explain the differences at a higher level and will also clarify why bare metal is our recommended way to go with IBM Storage Ceph.
As already mentioned, IBM Storage Ceph has built-in capabilities to address the concept of a single point of failure. While it can run on industry-standard hardware, the philosophy behind Ceph is that hardware components are subject to failure and will break eventually. With this as a given, the software can address hardware shortcomings effectively and also provides remedial action, which is referred to as self-healing.
Failures can happen at the component, server, or rack level, while the solution remains operational and ensures data durability. This redundancy is provided from within the Ceph software and does not rely on external hardware redundancy in any way.
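The failure-domain idea can be illustrated with a simplified Python sketch. This is not the actual CRUSH placement algorithm Ceph uses, and the topology below is hypothetical, but it shows the principle: replicas are placed on OSDs in distinct racks, so the loss of an entire rack still leaves surviving copies.

```python
# Simplified illustration of Ceph's failure-domain concept (not the real
# CRUSH algorithm): pick one OSD per rack until enough replicas are placed.
from typing import Dict, List

def place_replicas(osd_racks: Dict[str, str], replicas: int) -> List[str]:
    """Choose `replicas` OSDs such that no two share a rack."""
    chosen: List[str] = []
    used_racks = set()
    for osd, rack in sorted(osd_racks.items()):
        if rack not in used_racks:
            chosen.append(osd)
            used_racks.add(rack)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct racks for the requested replica count")

# Hypothetical topology: six OSDs spread over three racks.
topology = {
    "osd.0": "rack1", "osd.1": "rack1",
    "osd.2": "rack2", "osd.3": "rack2",
    "osd.4": "rack3", "osd.5": "rack3",
}
print(place_replicas(topology, 3))  # ['osd.0', 'osd.2', 'osd.4']
```

Because each replica lands in a different rack, any single rack failure leaves two copies intact, which is exactly the redundancy described above.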
Bare Metal deployment
Bare metal is the recommended deployment option for IBM Storage Ceph because of the nature of the software-defined solution and because of several typical advantages that bare metal deployment brings in terms of resource utilization and overall performance.
A bare metal deployment avoids additional abstraction and layering of infrastructure resources by virtualization or container platforms between the IBM Storage Ceph solution and the underlying resources.
Ceph performs faster and more efficiently when interacting directly with infrastructure components rather than through additional abstraction layers, which introduce latency and reduce responsiveness and performance.
Bare metal deployment gives IBM Storage Ceph direct access to the hardware, which eliminates the performance overhead that comes with hypervisor virtualization layers.
This results in improved IO performance, reduced latency, and better overall system responsiveness.
With a bare metal deployment, resources like CPU, memory, and physical storage (HDD, SSD, NVMe) are dedicated and exclusively available to IBM Storage Ceph. This ensures a consistent and predictable performance.
Within virtualized or containerized environments, resources are shared among multiple virtual machines or containers which potentially affects the performance of IBM Storage Ceph.
IBM Storage Ceph relies on fast, direct and low latency network communication between cluster components and nodes. Bare metal allows fine-grained control over network configuration, including low latency and high-bandwidth network connections. This optimizes network performance and minimizes potential bottlenecks which could arise in virtual or containerized environments where networking is abstracted and layered.
A bare metal deployment simplifies the management and administration of an IBM Storage Ceph cluster because it avoids the additional complexity of a virtualized or containerized layer in between. It allows for direct access to hardware monitoring and management tools.
This results in easier troubleshooting and monitoring.
Because IBM Storage Ceph is a software-defined storage solution, it can run agnostic of the underlying infrastructure platform. However, we don't recommend running IBM Storage Ceph virtualized for large scale-out environments.
It is important to realize that virtualized IBM Storage Ceph can be suitable for certain use cases, such as development, testing, or small environments.
For deployments that require high performance, reliability, and scalability, a bare metal deployment is generally recommended to maximize the solution's potential and ensure optimal performance and data integrity, as explained earlier in this blog. The main attention areas in a virtualized deployment are the additional layering and the latency that results from it. Scaling to a larger cluster size will be limited by resource boundaries and/or constraints of the virtualization platform.
IBM Storage Ceph is built for massive scale, which would be more difficult to achieve at considerable size in a virtual machine environment, where the Object Storage Daemon (OSD) nodes would run on top of hypervisor hosts with typically shared resources.
How about IBM Storage Fusion Data Foundation and Red Hat OpenShift Data Foundation?
In this situation, IBM Storage Ceph is a major part of the Data Foundation solution, which runs from an operator-based setup on top of Red Hat OpenShift.
How does that relate to the recommendation of running IBM Storage Ceph on bare metal?
In this configuration, IBM Storage Ceph is a component of the IBM Storage Fusion Data Foundation solution.
There is little complexity involved in the installation, and seamless management and monitoring do not require manual administration, because all of this is abstracted and managed by OpenShift operators, without any further need for manual intervention.
The Rook operator takes care of the cluster setup and volume management rather than a user doing so directly. In this case, Ceph is not a standalone storage solution for production usage, but a provider of persistent storage for containers, which runs on top of the container platform and uses the underlying physical storage resources.
For this purpose, alongside OpenShift disaster recovery capabilities and other Data Foundation functionality, IBM Storage Ceph is a solution component rather than a standalone production storage cluster. Similar functionality, but a different use case and purpose.
Data Foundation is a purpose-built, containerized storage solution with specific OpenShift-related platform functionality, which is something different from a scale-out IBM Storage Ceph cluster serving other usage patterns.
IBM Storage Fusion Data Foundation and Red Hat OpenShift Data Foundation are ideal for quick and easy implementation of persistent platform storage. This is where we deploy IBM Storage Ceph containerized.
However, if there are higher performance or scalability requirements, we recommend that clients deploy a dedicated IBM Storage Ceph cluster, which runs on bare metal rather than virtualized. IBM Storage Ceph gives you the flexibility to make either choice, depending on business needs and situational dependencies.
More information about IBM Storage Ceph
IBM Storage Ceph product website
IBM Storage Ceph product documentation
Data matters. When planning high-performance infrastructure for new or existing applications, it's easy to focus on compute resources and applications without properly planning for the data that will drive the applications' results. Our products are all about solving hard problems faster with data.
IBM helps customers achieve business value with a clear data strategy. Our strategy is simple: unlock data to speed innovation, de-risk data to bring business resilience, and help customers adopt green data to bring cost and energy efficiencies. Value needs to be delivered by connecting an organization's multiple data sources with business drivers to create business value that means something to the organization. Many organizations focus on a single driver with a storage solution, but the best solution is driven by an infrastructure strategy that can accomplish most, if not all, of the drivers for maximum benefit. Our story is not just about another storage product; it is about innovation and a storage portfolio that is powered by our global data platform.