I have been “doing” enterprise storage for the better part of 30 years. I can remember creating highly available Unix clusters by bending 68-wire SCSI cables in a rack to connect dual-ported SCSI devices to servers. We played with SCSI switches to connect more than two servers to a storage device, but switching a protocol designed for point-to-point communications has limitations, not to mention SCSI's distance limitations.
Along came the Fibre Channel (FC) protocol, first in arbitrated loop, then supporting switching. This allowed us to easily connect multiple servers to a shared storage device (and eliminated those 68-wire cables). FC improved storage economics: sharing storage eliminated stranded capacity in servers, simplified capacity planning, and allowed us to push data protection functions (local and remote replication) into the storage system.
While providing better availability, FC didn't change the way we deployed or managed applications. At the time, that was a strength: FC was invisible to applications, other than an occasional pause to create a clean copy for backup via a snapshot. This made shared Storage Area Network (SAN) block storage easy to adopt into data center operations.
Then came server virtualization. With Virtual Machines (VMs), multiple applications could be deployed on a server and moved dynamically between servers in the “cluster”. This allowed operations to better align server resources to application requirements, adjust for changing requirements, and improve application availability. To get the full value of VMs, one had to deploy shared storage, either SAN or Network-Attached Storage (NAS, aka shared file systems).
Next up in the storage evolution was Software Defined Storage (SDS), which separated the storage control plane from the physical data devices. Unbinding hardware from software provided more flexibility in data storage architectures. Customers could source their data management software separately from the hardware, helping to drive down costs. Alternatively, storage resources (drives) in the application servers could be utilized, reducing the hardware support requirements: networks, power supplies, fans, etc. (Like the old days, without the limitations of “captive storage”.)
Now the storage devices in servers could be used as shared resources in a virtualized cluster. SDS was very popular in OpenStack deployments. Based on Red Hat telemetry data, about 50% of production OpenStack deployments use Ceph, an open source SDS. Linux Kernel-based Virtual Machine (KVM) is often deployed on Gluster, another open source SDS. SDS hasn't been as popular in VMware deployments.
Similar to SAN, SDS didn’t change the way applications were deployed. The server virtualization layer sat on top of the SDS layer using the same provisioning methods and interfaces that were used in the SAN and NAS days.
Enter Digital Transformation and Containers
Kubernetes containers have significantly changed the way we build and deploy applications. Containers are ephemeral, designed to be started and stopped. With VMs, as with servers, we avoided shutting them down, to the point that we made major investments in infrastructure to support live migration between physical hosts. Though containers are ephemeral, applications still require their data to persist across the stop/start process. Kubernetes has defined constructs to manage persistent data in an ephemeral world.
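The core of those constructs is the PersistentVolume/PersistentVolumeClaim pair: a pod requests storage through a claim, and the volume it binds to survives the container's restarts. A minimal sketch of such a claim follows; the claim name and storage class name here are illustrative placeholders, not values from any particular cluster:

```yaml
# A PersistentVolumeClaim requests storage that outlives the
# stop/start cycle of the containers that mount it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce           # one node may mount the volume read-write
  resources:
    requests:
      storage: 10Gi           # capacity requested from the storage layer
  storageClassName: fast-block  # placeholder; any class defined in the cluster
```

A pod then references the claim by name in its `volumes` section, and Kubernetes handles binding the claim to a matching PersistentVolume.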
Another key factor with Kubernetes is granularity and scale. “Cloud-native” development breaks complex processes into microservices deployed as containers. Where we had tens to hundreds of VMs, we now have hundreds to thousands of containers in an application stack. Many (most) of these containers don't need persistent storage, but many do, stressing traditional storage provisioning methodologies.
A third factor is the rate of change of the application code. Adopting Continuous Integration / Continuous Delivery (CI/CD) enables constant application upgrades. Data management and storage functions must keep up with this rate of change. The traditional “submit a ticket and wait for storage to be provisioned” approach is unacceptable in the fast-changing environment created by digital transformation.
Hardware independence is taken for granted in Kubernetes environments. Kubernetes runs ‘anywhere’ Linux runs, so essentially Kubernetes runs ‘anywhere’. To maintain the flexibility and portability of container applications, storage resources need to be as hardware independent as practical. So, whether Kubernetes is deployed on VMware, a hyperscaler cloud provider, bare metal, KVM, etc., the storage solution needs to provide the same operations and similar behavior, given the limitations of the underlying infrastructure. (An application will not see the same performance potential when moved from a system running on NVMe devices to one running HDDs.)
Realize the true value of Software Defined Storage
These attributes require a different storage deployment paradigm, one that supports the granularity and dynamic nature of cloud-native deployments. Kubernetes has built-in primitives for defining, requesting, and managing storage resources (CSI, PVs, etc.). It is critical that the physical storage layer works with these constructs, is tightly integrated into the Kubernetes control plane, and works well with CI/CD-style development processes (dynamic, self-service provisioning).
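Dynamic, self-service provisioning is expressed through a StorageClass: when a developer's claim references the class, the named CSI driver creates the backing volume on demand, with no ticket in the loop. A sketch, where the class name and provisioner string are hypothetical stand-ins for whatever CSI driver the cluster actually runs:

```yaml
# A StorageClass ties claims to a CSI driver for on-demand provisioning.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: self-service-block            # hypothetical class name
provisioner: example.csi.vendor.com   # placeholder CSI driver name
reclaimPolicy: Delete                 # remove backing volume when the claim is deleted
volumeBindingMode: WaitForFirstConsumer  # provision on the node where the pod lands
```

`WaitForFirstConsumer` is a common choice here because it delays volume creation until the scheduler has placed the pod, avoiding topology mismatches between the volume and the node.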
Software defined storage meets the hardware independence requirement of OpenShift, providing the ability to run applications on a wide variety of platforms, from mainframes to clouds, and in many cases can extend the useful life of existing physical storage assets. Further, since SDS solutions were designed from the beginning to be flexible, it is straightforward to integrate them tightly into Kubernetes resource management constructs.
The Rook open-source project provides a Container Native Storage (CNS) solution for file, block, and object storage that runs in Kubernetes and leverages mature Ceph storage management. A key capability of Rook/Ceph is that it can utilize your existing block storage infrastructure. So, if you are looking to deploy Kubernetes on VMware with vSAN or on physical servers with a SAN, Rook/Ceph can use those block devices as the physical storage layer; if you are deploying on a cloud like AWS, you can use EBS volumes. In all cases the Rook/Ceph solution hides the hardware idiosyncrasies, presenting a common experience for application developers and Kubernetes administrators.
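In practice, a Rook/Ceph block deployment pairs a CephBlockPool (built from whatever block devices the underlying platform provides) with a StorageClass that applications consume. The sketch below follows the pattern in Rook's published examples; pool name, namespace, and replica count are typical defaults and should be adjusted to your cluster:

```yaml
# A replicated Ceph pool carved from the cluster's block devices.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3                 # keep three copies of each object for durability
---
# A StorageClass exposing the pool to applications via Rook's RBD CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph      # namespace of the Rook operator's Ceph cluster
  pool: replicapool
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
```

Whether the pool sits on vSAN virtual disks, SAN LUNs, or EBS volumes, applications see only the StorageClass, which is exactly the hardware independence described above.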
Better yet, if you have chosen to use OpenShift from Red Hat, the industry-leading Kubernetes distribution, you can get fully supported Rook/Ceph in IBM Storage Fusion. But Fusion offers a lot more than primary storage:
• The Multi-Cloud Gateway, based on the NooBaa project, provides programmatic access to external S3 object storage resources.
• An OpenShift-native backup solution with centralized management for all of your OpenShift environments.
• Supported RamenDR operators to provide integrated namespace-level disaster recovery for your critical applications.
For more on IBM Storage Fusion, look at the other material on the Storage Fusion Community Page or the IBM Storage Fusion Page.
As history has shown, storage technology has been ahead of application deployment methodologies. Software Defined Storage has been around for years and has been adopted by many organizations, realizing hardware and operational efficiencies. The drive to adopt microservices and container-based application deployments makes now the time to adopt SDS to simplify and accelerate the benefits of digital transformation. Building Kubernetes on traditional SAN or NAS infrastructure will work, for now, but it will be just a matter of time before these technologies limit your ability to realize the full benefits of hybrid- and multi-cloud infrastructure.