The intent of this article is to explain how the backup and restore solution on the IBM Spectrum Fusion hyperconverged infrastructure appliance is implemented. The narration is data-centric, focusing on what persistent data is, how to identify it, its sources in the appliance, and how to create policies for backup and restore.
Who should read this article?
This article is intended for:
- Architects or administrators who define and create backup policies and identify the persistent data to be protected, both that of containerized workloads or applications and that of the appliance.
- Developers and DevOps engineers who take production snapshots and perform backups using pre-created backup policies.
- Test engineers who want to take snapshots for creating test data sets.
Use of backup and restore functionality generally requires elevated privileges, and it is implied that the different personas have the privileges needed for their intended use cases.
How does it work?
This section provides a broad understanding of how the backup and restore solution in IBM Spectrum Fusion is architected, with a focus on concepts and decisions rather than individual steps or commands. We believe this helps the different personas understand how to set up backup schedules and what to back up.
Persistent data is commonly defined as data that cannot be redeployed if it is lost; if lost, it is lost forever. It is imperative to ensure that the persistent data is not only captured, but is captured and backed up in a manner that is usable. As we look at a cross section of the Fusion stack, including the underlying hardware, it is important to identify what data needs to be categorised as persistent, why that data needs to be backed up, and how this is achieved in IBM Spectrum Fusion. The sources of persistent data in IBM Spectrum Fusion are the OpenShift Container Platform (OCP) cluster, the IBM Spectrum Scale file system that provides the container native storage, and the underlying rack’s hardware configuration, such as that of the rack switches.
At the heart of the backup solution on IBM Spectrum Fusion is IBM Spectrum Protect Plus. IBM Spectrum Protect Plus focuses on protecting persistent data within the OpenShift Container Platform (OCP) cluster.
Backup of OCP Cluster data.
In an OCP cluster, the persistent volumes (PVs) used by stateful containers and the Kubernetes resources are the persistent data that needs to be backed up. The Kubernetes resources in OCP can be either namespace scoped or cluster scoped. When identifying the data that needs to be backed up for an application deployed on OCP, we need to consider the PVs that are used by the application, and the namespace scoped and cluster scoped resources that are deployed and used by the application. Backing up this combination of data ensures that the application’s state is preserved when a backup is taken and can be restored when needed.
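The three categories described above can be sketched as a simple grouping step. This is an illustration only: the inventory, the application name "orders", and the function name are hypothetical, and the set of cluster scoped kinds is a small, incomplete sample of standard Kubernetes kinds.

```python
from collections import defaultdict

# A few Kubernetes kinds that are cluster scoped. Illustrative, not complete.
CLUSTER_SCOPED_KINDS = {
    "ClusterRole",
    "ClusterRoleBinding",
    "CustomResourceDefinition",
    "StorageClass",
}

# Kinds that represent volume data rather than resource definitions.
VOLUME_KINDS = {"PersistentVolumeClaim", "PersistentVolume"}


def group_for_backup(resources):
    """Group (kind, name) pairs into the three categories discussed above."""
    groups = defaultdict(list)
    for kind, name in resources:
        if kind in VOLUME_KINDS:
            groups["persistent_volumes"].append(name)
        elif kind in CLUSTER_SCOPED_KINDS:
            groups["cluster_scoped"].append(name)
        else:
            # Anything not listed above is treated as namespace scoped here.
            groups["namespace_scoped"].append(name)
    return dict(groups)


# Hypothetical inventory for an application called "orders".
inventory = [
    ("Deployment", "orders-api"),
    ("Secret", "orders-db-credentials"),
    ("PersistentVolumeClaim", "orders-db-data"),
    ("ClusterRole", "orders-operator"),
]
print(group_for_backup(inventory))
```

An inventory like this makes it easier to check that nothing the application depends on is left out of the backup.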
Backup of PVs is accomplished by IBM Spectrum Protect Plus by integrating with the Container Storage Interface (CSI) API layer of OpenShift, which is implemented by IBM Spectrum Scale. IBM Spectrum Protect Plus makes API calls to create a point-in-time consistent snapshot of the CSI volume, mounts the snapshot on an alternative container (known as a Data Mover) that is part of IBM Spectrum Protect Plus, and transfers the data to the vSnap server for storage. The backup Service Level Agreements (SLAs), also known as backup policies in IBM Spectrum Protect Plus, provide control for scheduling, snapshot retention, and copy to the vSnap server. If the original volume is damaged or lost, the snapshot or copy backups on the vSnap servers can facilitate recovery. This design allows grouping of OCP application data for backup using OCP’s native capabilities, such as namespaces, and provides a holistic view of PV recovery using snapshots preserved on the vSnap server.
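For readers unfamiliar with CSI snapshots, the sketch below builds the generic Kubernetes VolumeSnapshot object that a CSI snapshot request corresponds to. It does not show what IBM Spectrum Protect Plus creates internally, which this article does not describe. The PVC name, namespace, and snapshot class are hypothetical, and the `snapshot.storage.k8s.io/v1` API version is an assumption that may differ on older clusters.

```python
import json


def volume_snapshot_manifest(pvc_name, namespace, snapshot_class):
    """Build a Kubernetes VolumeSnapshot object for one PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",  # assumed API version
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-snap", "namespace": namespace},
        "spec": {
            # The snapshot class selects the CSI driver that takes the snapshot.
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


# Hypothetical names, for illustration only.
manifest = volume_snapshot_manifest("orders-db-data", "orders", "example-snapclass")
print(json.dumps(manifest, indent=2))
```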
Backup of system namespaces in OCP.
A follow-up to application workload backup is the backup of system namespaces in OCP. The system namespaces referred to here are the namespaces that match openshift-* and kube-*. In a typical deployment scenario, an application deployed in its own namespace interacts with and uses the resources in the system namespaces, including both the Kubernetes resources and PVs. So, when taking an application backup, it is sometimes necessary to also take a backup of these resources in the system namespaces to ensure that the state of the application is fully captured.
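A minimal sketch of how system namespaces could be picked out of a list of namespace names, assuming the lowercase prefixes `openshift-` and `kube-`. The function name and the sample namespaces are hypothetical.

```python
SYSTEM_PREFIXES = ("openshift-", "kube-")


def is_system_namespace(name):
    """Return True when the namespace name starts with a system prefix."""
    # str.startswith accepts a tuple and matches any of its entries.
    return name.startswith(SYSTEM_PREFIXES)


namespaces = ["orders", "openshift-monitoring", "kube-system", "default"]
system = [ns for ns in namespaces if is_system_namespace(ns)]
print(system)  # ['openshift-monitoring', 'kube-system']
```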
Backup of IBM Spectrum Scale file system metadata.
Backup of IBM Spectrum Scale file system metadata is essential to reconstruct the IBM Spectrum Scale file system in the event that the file system is corrupted, or in disaster recovery scenarios when an appliance is being rebuilt from scratch.
Backup of IBM Spectrum Scale file system metadata is handled by the backupmanager Custom Resource (CR) provided by the IBM Spectrum Fusion management stack. The backups are taken daily, based on a configurable schedule, and are copied to the same vSnap server that is configured in IBM Spectrum Protect Plus.
Backup of Hardware configuration.
Backup of hardware configuration is essential to reconfigure the hardware when the configuration is corrupted or in disaster recovery scenarios when an appliance is being rebuilt from scratch.
Similar to file system metadata, the backup of rack hardware configuration is managed by the backupmanager Custom Resource. The hardware backup includes the four switches on an IBM Spectrum Fusion rack: two management switches and two high-speed switches.
Default backup schedules in IBM Spectrum Fusion
Default backup schedules are created and configured for all categories of persistent data after the installation of IBM Spectrum Fusion management stack. These schedules ensure that important data are backed up from day one and also serve as a reference template for creating further backup schedules especially for workload applications.
The following table lists the default backup schedules available on IBM Spectrum Fusion:
|Persistent data|Backup schedule defined in
|Cluster scoped resources|SLA in IBM Spectrum Protect Plus
|Fusion management stack namespace scoped resources|SLA in IBM Spectrum Protect Plus
|System namespace scoped resources|SLA in IBM Spectrum Protect Plus
|IBM Spectrum Protect Plus catalog data|SLA in IBM Spectrum Protect Plus
|Network switch hardware configuration data|triggerSchedule in backupmanager CR
|IBM Spectrum Scale file system metadata|triggerSchedule in backupmanager CR
The default backup schedules can be edited to include further resources to be backed up, to alter the schedules so that they run concurrently, and to change the data retention period.
Backing up an application deployed on OCP
In this section, we look at how to back up an application, including creating a new backup schedule.
Creating backup SLA
The first step in planning for an application backup is to create a backup SLA for your application. A backup SLA contains the schedule on which the backups are taken and the details of the backup location. The backup location is either a snapshot (data saved locally on the same OCP cluster) or a remote backup (data copied to the configured remote vSnap server).
1. Access the Fusion UI and click on the launch out in the Backups widget.
2. Log in to the IBM Spectrum Protect Plus UI.
3. Access the Manage Protection > Policy Overview menu and click on “Add SLA Policy”.
4. Create an SLA with the desired backup schedule.
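The information an SLA carries can be pictured as a small record: a schedule, a backup location, and a retention period. The sketch below is a conceptual model only, not the IBM Spectrum Protect Plus data model. All class, field, and SLA names are hypothetical, and the cron-style schedule string is an assumption made for compactness.

```python
from dataclasses import dataclass
from enum import Enum


class BackupLocation(Enum):
    SNAPSHOT = "snapshot"  # data saved locally on the same OCP cluster
    VSNAP = "vsnap"        # data copied to the configured remote vSnap server


@dataclass(frozen=True)
class BackupSla:
    name: str
    schedule: str            # hypothetical cron-style schedule
    location: BackupLocation
    retention_days: int

    def is_off_cluster(self):
        """True when the backup copy is stored outside the OCP cluster."""
        return self.location is BackupLocation.VSNAP


# Two hypothetical SLAs for the same application.
daily_remote = BackupSla("orders-daily", "0 2 * * *", BackupLocation.VSNAP, 30)
hourly_local = BackupSla("orders-hourly", "0 * * * *", BackupLocation.SNAPSHOT, 2)
print(daily_remote.is_off_cluster(), hourly_local.is_off_cluster())
```

Modelling the location explicitly highlights the main planning decision: a local snapshot stays on the cluster it protects, while a vSnap copy is held on a remote server.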
The next step in creating an application backup is to identify the resources and add them to the backup SLA created earlier. These resources are the cluster scoped resources used by the application and the namespace scoped resources, including the Kubernetes resources and PVs. Resources can also be searched for and added by label: if all the needed resources share a unique label, they can be found and added based on that label.
1. Access the cluster resources from Manage Protection > Containers > OpenShift menu in IBM Spectrum Protect Plus user interface.
2. Navigate to the identified resource that needs to be backed up, select the resource, and click on “Select an SLA policy”.
3. In the SLA policy list, select the SLA policy created earlier.
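Label-based selection, mentioned before the steps above, amounts to filtering resources on a key and value pair. The sketch below shows the idea on plain dictionaries; it does not call the IBM Spectrum Protect Plus search, and the resource records and the `app` label are hypothetical.

```python
def select_by_label(resources, key, value):
    """Return the names of resources whose label `key` equals `value`."""
    return [
        res["name"]
        for res in resources
        # Resources without labels are skipped rather than raising an error.
        if res.get("labels", {}).get(key) == value
    ]


# Hypothetical resources; only two carry the label app=orders.
resources = [
    {"kind": "Deployment", "name": "orders-api", "labels": {"app": "orders"}},
    {"kind": "PersistentVolumeClaim", "name": "orders-db-data", "labels": {"app": "orders"}},
    {"kind": "ConfigMap", "name": "billing-config", "labels": {"app": "billing"}},
    {"kind": "Secret", "name": "unlabelled-secret"},
]
selected = select_by_label(resources, "app", "orders")
print(selected)  # ['orders-api', 'orders-db-data']
```

The unlabelled secret is silently left out, which is why consistent labelling matters when selecting backup resources this way.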
Monitoring backup jobs
After the SLAs are set up and resources are added, the backups are taken as per the schedule. You need to monitor the backup jobs to ensure that they run periodically on schedule and are successful.
Navigate to the “Jobs and Operations” menu to view the scheduled jobs and to check if they are running correctly. Once the scheduled job has started, it will appear in the “Running Jobs” tab.
After a job has completed, it will appear in the “Job History” tab. Access the “Job History” tab to view the list of completed jobs. Completed jobs include both the failed and successful jobs.
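Because the job history contains both failed and successful jobs, a periodic review boils down to counting outcomes and listing failures. The sketch below assumes the history has been exported into simple records; the record layout, job names, and the status strings `COMPLETED` and `FAILED` are hypothetical and are not taken from IBM Spectrum Protect Plus.

```python
from collections import Counter


def summarize(history):
    """Count jobs per status and list the names of failed jobs."""
    counts = Counter(job["status"] for job in history)
    failed = [job["name"] for job in history if job["status"] == "FAILED"]
    return counts, failed


# Hypothetical job history records.
history = [
    {"name": "orders-daily", "status": "COMPLETED"},
    {"name": "orders-hourly", "status": "FAILED"},
    {"name": "system-namespaces", "status": "COMPLETED"},
]
counts, failed = summarize(history)
print(dict(counts), failed)
```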
Creating restore jobs
A precursor to running a restore job is to identify what part of the data is corrupted and needs to be restored. A restore can be run independently for PVs and for namespace scoped or cluster scoped resources, as required. There isn’t a blanket rule for doing restores, and it is assumed that all due diligence and prudence have been employed before doing a restore.
1. To create a restore job, click on the “Create a job” button in the “Jobs and Operations” menu.
4. Select the source from which the restore has to be done.
8. Select the restore point; vary the time range as required to see all restore points.
10. If a PV has been selected, specify a new name for the PV if required. A new PV name is required if a PV of the same name already exists. If a new name is not required, leave the field blank.
11. Select the destination storage class and namespace.
12. Select the job options.
13. Review the selected options and click “Submit” to trigger the restore job.
After the restore job has been submitted, it can be monitored under the “Jobs and Operations” menu.
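The PV naming rule from the restore steps can be expressed as a small check. This is a sketch of the rule as stated in this article, not of how IBM Spectrum Protect Plus validates the input; the function name and PV names are hypothetical.

```python
def restored_pv_name(original, existing, new_name=""):
    """Return the name the restored PV will use, following the rule above."""
    if original in existing:
        # A PV of the same name already exists, so a new name is required.
        if not new_name:
            raise ValueError(f"a new name is required: {original!r} already exists")
        if new_name in existing:
            raise ValueError(f"{new_name!r} already exists as well")
        return new_name
    # No clash: a blank new name means the original name is kept.
    return new_name or original


existing_pvs = {"orders-db-data"}
print(restored_pv_name("orders-db-data", existing_pvs, "orders-db-data-restored"))
print(restored_pv_name("billing-db-data", existing_pvs))
```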
Viewing and editing the backup schedule in the backupmanager CR
The backupmanager CR in IBM Spectrum Fusion is responsible for taking backups of the IBM Spectrum Scale file system metadata and the hardware switch configuration. The backup schedule is independently driven by the cron expression in the CR spec. The cron expression can be modified to change the backup schedule as needed. Refer to the Knowledge Center (KC) link listed under “Additional information and references” for more information on how to edit the cron expression.
Viewing the backup schedule from OpenShift console
1. Access the backupmanager CR in OpenShift console from Administration > CustomResourceDefinitions > controlplanebackups > Instances.
2. Select the YAML tab.
The backup schedule is driven by the cron expression in the triggerSchedule field.
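Before editing the triggerSchedule field, it can help to check what a cron expression means. The sketch below assumes the standard five-field cron format (minute, hour, day of month, month, day of week); the article does not state the exact format the CR accepts, and the value `0 2 * * *` is a hypothetical example, not the shipped default.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def runs_daily(expression):
    """True when the expression fires once a day at a fixed time."""
    fields = split_cron(expression)
    fixed_time = fields["minute"].isdigit() and fields["hour"].isdigit()
    every_day = (
        fields["day_of_month"] == fields["month"] == fields["day_of_week"] == "*"
    )
    return fixed_time and every_day


# Hypothetical schedule: every day at 02:00.
print(split_cron("0 2 * * *"))
print(runs_daily("0 2 * * *"))  # True
print(runs_daily("0 2 * * 0"))  # False: Sundays only
```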
Backups are vital in any production environment. A judicious backup, in terms of both the persistent data being backed up and the schedule on which it is backed up, is essential for a quick recovery and minimal downtime when a disaster situation, the ‘unwanted guest’, comes knocking.
Additional information and references
Reference link to KC https://www.ibm.com/docs/en/spectrum-fusion/2.1?topic=system-back-up-restore
Sincere thanks to Hugh Hocket, Shajeer Mohammed, Pruthvi T d, Anilkumar Hegde and Shyamala Rajagopalan for helping review the article and running through the steps.