Sumant Padbidri – IBM Distinguished Engineer, Spectrum Fusion
Running stateful containerized applications in a hybrid cloud is a big part of the digital modernization journey. Businesses are looking for platforms they can trust with their mission-critical applications. As applications scale from on-premises data centers to public clouds to edge locations, businesses need to ensure that data remains secure and available. Otherwise, vast amounts of data can be stranded on geographic islands, out of reach of analytics and AI.
IBM Spectrum Fusion takes on this challenge by providing a container-native data management solution on Red Hat OpenShift, the industry's leading Kubernetes platform. It combines a market-leading, high-performance clustered file system with proven data protection technology to create an innovative, differentiated global data fabric that addresses several enterprise-class data management use cases.
This article introduces IBM Spectrum Fusion's architecture and the capabilities it enables.
IBM Spectrum Fusion is built using a container-native architecture that brings together proven IBM technologies and provides a simple, yet powerful, user experience for application-centric data management. The modular architecture allows for the addition of new and exciting capabilities in the future. Let's take a quick tour of the key components.
A core concept in IBM Spectrum Fusion is an application. It is possible to manage data at the level of an individual PVC, but managing data at the application level enables powerful capabilities such as backing up and restoring an application with consistent data, making copies of an application, and migrating an application. While Kubernetes is an application deployment platform, it does not have a consistent way of defining an application, so IBM Spectrum Fusion provides a custom resource definition that lets you define the scope of an application. Initially, an application equates to an OpenShift project (namespace); more flexible ways of identifying an application's scope will be provided in the future. All components of the architecture are application aware.
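As a rough sketch of the idea, an application-scoping custom resource might look something like the following. Note that the apiVersion, kind, and field names here are illustrative placeholders, not the product's actual schema:

```yaml
# Hypothetical sketch only: the API group, kind, and fields are
# placeholders chosen for illustration, not IBM Spectrum Fusion's
# actual CRD schema.
apiVersion: example.fusion.ibm.com/v1
kind: Application
metadata:
  name: online-store
spec:
  # Initially, an application's scope equates to one
  # OpenShift project (namespace).
  namespace: online-store
```

An explicit scope definition like this is what lets every component — snapshots, backup, restore — operate on all of an application's resources as one unit.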
At the heart of IBM Spectrum Fusion is IBM's proven clustered file system. It is a container-native, software-defined storage solution that provides RWO and RWX storage to containers through Kubernetes-native CSI interfaces. A key differentiator is its truly global namespace: data can be accessed from anywhere without duplication, and efficient caching algorithms maintain high performance even when accessing data in remote regions. Its zero-bottleneck architecture utilizes all available resources in parallel for maximum performance and high availability, and it can scale from a handful of nodes to thousands, keeping up with the demands of a growing business.
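Because the storage is exposed through standard CSI interfaces, applications consume it with ordinary Kubernetes PVCs. A minimal sketch, assuming a hypothetical StorageClass name:

```yaml
# Standard Kubernetes PVC requesting shared (RWX) storage.
# "fusion-shared" is a placeholder StorageClass name, not
# necessarily the name the product installs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: online-store
spec:
  accessModes:
    - ReadWriteMany        # RWX: mountable read-write by many nodes
  storageClassName: fusion-shared
  resources:
    requests:
      storage: 100Gi
```

For single-writer volumes, the same claim with `ReadWriteOnce` (RWO) applies; nothing about the consuming application needs to know which file system sits underneath.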
Providing high availability necessitates creating redundant copies of data, and the data storage component provides flexible ways to balance redundancy against cost. The HCI edition uses efficient erasure coding to achieve redundancy while maximizing utilization. (As a worked example of how erasure coding helps, a hypothetical 4+2 scheme writes four data strips plus two parity strips per stripe, so 4/6 ≈ 66% of raw capacity holds usable data while up to two simultaneous failures can be tolerated.) A single-rack configuration achieves 66% utilization of raw disk capacity while tolerating up to two simultaneous node failures in a 20-node rack, and a three-rack configuration achieves 57% utilization while tolerating the failure of an entire rack. Compare that with a replica-based strategy, where utilization is only 33% when using three replicas. Some deployment topologies do necessitate replicas; the software-only edition uses them when deployed across multiple availability zones.
The data storage component is application-aware, so it can take instantaneous snapshots of all the PVs used by an application regardless of the number of storage classes used. This is a very powerful capability that we'll explore in the next section.
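At the level of a single volume, the underlying primitive is the standard Kubernetes snapshot API; an application-level snapshot applies this across every PVC in the application's scope at once. A per-volume sketch, with a placeholder snapshot class name:

```yaml
# Standard Kubernetes volume snapshot (snapshot.storage.k8s.io).
# "fusion-snapclass" is a placeholder VolumeSnapshotClass name.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: shared-data-snap
  namespace: online-store
spec:
  volumeSnapshotClassName: fusion-snapclass
  source:
    persistentVolumeClaimName: shared-data
```

The application-aware part is the coordination: all of an application's volumes are snapshotted together, regardless of which storage class each one uses.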
Enterprise data management use cases include the ability to periodically backup applications so that they can be restored to a known previous state when needed. IBM Spectrum Fusion includes IBM's proven data protection technology to reliably backup applications. It can create both local backups as well as copy data to an external, even offsite, S3 compliant object store. Local backups can be used to recover quickly (low RTO) while external backups provide protection from disasters (with higher RTO).
A key capability of the data protection component is backing up a consistent state of an application. There are two ways to obtain a consistent state, both of which are supported.
The data protection component provides "incremental forever" backup: only the first backup is a full one, and all subsequent backups capture only the changes. This results in significant performance improvements.
Restoring an application to a known previous state is the most common use of data protection, but the same capability enables several other powerful use cases.
IBM Spectrum Fusion provides Kubernetes-native APIs for data management. All data management capabilities can be accessed via custom resources (CRs) in a declarative way. Kubernetes operators are provided to monitor CRs and implement the desired state. This enables powerful automation capabilities by including data management in CI/CD pipelines and techniques like GitOps.
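For example, because data management is expressed as declarative CRs, the manifests can live in Git next to the workload and be reconciled by a GitOps tool. A sketch using Argo CD (one popular choice — not something the source mandates), with placeholder repository details:

```yaml
# Argo CD Application syncing a Git directory that contains both
# the workload manifests and the data-management CRs (backup
# policies, policy assignments, and so on).
# The repository URL and path are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: online-store
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/acme/online-store.git
    targetRevision: main
    path: manifests/
  destination:
    server: https://kubernetes.default.svc
    namespace: online-store
  syncPolicy:
    automated: {}
```

With this in place, changing a backup schedule becomes a pull request, and the operators reconcile the cluster to the desired state just as they would for any other resource.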
IBM Spectrum Fusion's graphical user interface (GUI) provides user-friendly access to data management capabilities. The GUI is driven by the data management APIs, so the two can be used interchangeably. A good way to learn the APIs is to configure something in the GUI and inspect the CR that gets created.
Let's look at a typical workflow for backing up and restoring an application using CRs and the GUI.
1) Identify an 'Application' that should be backed up. IBM Spectrum Fusion automatically defines an application for every OpenShift project (namespace).
2) Create a 'Backup Storage Location' that specifies an S3-compatible object store bucket that backups are copied to. You can configure multiple locations.
3) Create a 'Backup Policy' that specifies the schedule for taking backups (in cron syntax, for example 0 12 * * * for a daily backup at noon), retention limits, and the storage location to back up to.
4) Create a 'Policy Assignment' that assigns a backup policy to an application.
5) Selecting "run backup now" in the previous step creates a backup as soon as the policy is assigned to the application, and thereafter on the specified schedule. You can also request a backup on demand.
6) Restore the application from a specified backup.
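Tying the steps above together, here is a hedged sketch of what the CRs might look like. The apiVersion, kinds, and field names are illustrative placeholders, not the product's actual schema; inspecting the CRs the GUI creates, as suggested earlier, reveals the real ones.

```yaml
# Hypothetical sketch only: all names and fields below are
# placeholders guessing at the shape of the CRs, not IBM
# Spectrum Fusion's actual schema.
---
apiVersion: example.fusion.ibm.com/v1
kind: BackupStorageLocation          # step 2
metadata:
  name: offsite-s3
spec:
  endpoint: https://s3.example.com
  bucket: fusion-backups
  credentials:
    secretName: s3-credentials
---
apiVersion: example.fusion.ibm.com/v1
kind: BackupPolicy                   # step 3
metadata:
  name: daily-noon
spec:
  schedule: "0 12 * * *"             # cron: daily at 12:00
  retention: 30                      # keep the last 30 backups
  backupStorageLocation: offsite-s3
---
apiVersion: example.fusion.ibm.com/v1
kind: PolicyAssignment               # step 4
metadata:
  name: online-store-daily
spec:
  application: online-store
  backupPolicy: daily-noon
  runNow: true                       # "run backup now" (step 5)
---
apiVersion: example.fusion.ibm.com/v1
kind: Restore                        # step 6
metadata:
  name: online-store-restore
spec:
  application: online-store
  backup: online-store-backup-001    # name of a specific backup
```

Applied in order, these declare the desired state and the operators do the rest, which is exactly what makes the workflow scriptable from a pipeline as well as from the GUI.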