
How IBM Storage Fusion Simplifies Backup and Recovery of Complex OpenShift Applications - Part 3: “A Deeper Dive into Recipes”

By Jim Smith posted Mon August 07, 2023 04:17 PM

  

By @Sandeep Prajapati and @Jim Smith

In the previous articles we examined why data protection continues to play a crucial, central role for complex application platforms that rely on stateful, persistent data storage, such as IBM Cloud Pak for Data (article). We also discussed how IBM Storage Fusion orchestrates backup and recovery through recipes, which define repeatable backup and restore operations for complex applications (article).

In this final article we will take a closer look at the anatomy of a recipe by walking through an example of backing up and restoring a complex application. We will again use EnterpriseDB (EDB) to illustrate the concepts. As a review, the three basic elements of a recipe (combined in the skeleton sketch after this list) are:

  • groups – A set of resources or PVCs that are processed as a unit in a backup or restore workflow.
  • hooks – Specific actions executed during a backup or restore operation to facilitate that operation.
  • workflows – An explicit, ordered set of actions that constitute a backup or restore operation, built from the groups and hooks directives.
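At a skeleton level, a recipe simply combines these three sections in a single custom resource; each section is filled in over the rest of this article (the “...” placeholders stand for the detailed definitions shown below):

  groups:        # which resources and persistent volumes to protect
    - ...
  hooks:         # actions that quiesce or validate the application
    - ...
  workflows:     # the ordered backup and restore sequences
    - ...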

In the last article we introduced the general concept of resource groups and illustrated them with a diagram of an example application. In that example, there is a set of volumes (persistent data) that needs to be protected, a set of OpenShift custom resources that needs to be protected, and a set of OpenShift custom resources that does not need to be protected. Let's start by defining two groups to represent the resources that need to be protected.

Groups

Let's look at a specific implementation of a group:

groups:
    - name: edb-volumes
      type: volume
      includedNamespaces:
        - edb
      labelSelector: velero.io/exclude-from-backup!=true
    - name: edb-resources
      type: resource
      includedNamespaces:
        - edb
      includeClusterResources: true
      excludedResourceTypes:
        - pods
        - replicasets
        - deployments
        - services
        - clusterserviceversions

The first part of this recipe defines the groups construct that we discussed in the last article. You can see in this example that we have explicitly defined two groups:

  • A volume group, which defines the persistent volume claims (PVCs) to back up. In this example we are requesting that all of the volumes in the “edb” namespace are protected as long as they don't carry the label that marks them for exclusion (velero.io/exclude-from-backup!=true).
  • A resource group, which backs up all of the resources in the “edb” namespace except for the excluded resource types (pods, replicasets, deployments, services, and clusterserviceversions).
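For example, an individual PVC can be opted out of the edb-volumes group simply by giving it the exclusion label. The PVC below is purely illustrative (the name and size are hypothetical); only the label matters here:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: edb-scratch-data                    # hypothetical PVC name, for illustration only
    namespace: edb
    labels:
      velero.io/exclude-from-backup: "true"   # excluded by the edb-volumes labelSelector
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi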

Hooks

The next recipe constructs we will define are hooks. As described in the last article, hooks are actions that facilitate the backup and recovery processes. Let's look at the two hooks defined in the recipe in more detail:

  hooks:
  - name: clusters-pod-exec
    type: exec
    namespace: edb
    labelSelector: k8s.enterprisedb.io/podRole=instance,role=primary
    singlePodOnly: true
    timeout: 120
    onError: fail
    ops:
    - name: checkpoint
      command: "psql -c CHECKPOINT -U postgres"
      container: postgres
      timeout: 60

The first hook defined above is an execution (exec) hook. The hook is called “clusters-pod-exec” and it has one operation defined, which runs a PostgreSQL checkpoint. The hook also specifies which pod or pods it should be executed in; in this example we are requesting that the hook runs in the pod whose labels indicate that its role is primary (podRole=instance,role=primary), and that it executes in only one pod (singlePodOnly) if more than one pod has these labels. We will use this hook during the backup operation to put the data into a consistent state before the volume snapshot operation.

There is more information in the hook definition, such as the timeout and the action to take on failure. For detailed information about all of the field definitions, please refer to the IBM Storage Fusion documentation.

Let's examine the second hook:

  hooks:
  - name: postgresql-operator-controller-manager-check
    type: check
    namespace: edb
    selectResource: deployment
    labelSelector: app.kubernetes.io/name=cloud-native-postgresql
    timeout: 120
    onError: fail
    chks:
    - name: replicasReady
      timeout: 180
      onError: fail
      condition: "{$.spec.replicas} == {$.status.readyReplicas}"

The next hook (defined above) is a check hook that verifies that the number of ready replicas in the deployment matches the replica count in its specification (the condition field). We will use this check hook in the restore workflow to ensure the deployment is restored correctly.

Workflows

Let's now look at the third element of the recipe, workflows, in more detail.

  workflows:
  - name: backup
    sequence:
    - group: edb-resources
    - hook: clusters-pod-exec/checkpoint
    - group: edb-volumes
  - name: restore
    sequence:
    - group: edb-volumes
    - group: edb-resources
    - hook: postgresql-operator-controller-manager-check/replicasReady

The workflows create an ordered execution of the actions we defined in the groups and the hooks. There are two workflows by default in a recipe, one for backup and one for restore. The backup sequence can be read directly from the workflow definition:

  • Back up the cluster resources as defined in the group “edb-resources”.
  • Execute the “checkpoint” command in the “clusters-pod-exec” hook to put the data into a consistent state.
  • Take a snapshot of the persistent volumes as defined in the group “edb-volumes”.

Note that for some application types, such as IBM Db2, you will also define a hook to be run after the snapshot of the persistent volumes. This is typical for databases that have a mechanism to suspend I/O before the snapshot operation and then need to be instructed to resume I/O after the snapshot completes.
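As an illustrative sketch only (the structure mirrors the EDB hook above, but the namespace, labels, group names, and commands are hypothetical placeholders rather than a validated Db2 recipe), such a backup workflow might pair a suspend and a resume operation around the volume group:

  hooks:
  - name: db2-pod-exec                # hypothetical hook name
    type: exec
    namespace: db2                    # hypothetical namespace
    labelSelector: app=db2            # hypothetical label selector
    singlePodOnly: true
    timeout: 120
    onError: fail
    ops:
    - name: suspend-io
      command: "db2 set write suspend for database"   # quiesce writes before the snapshot
      container: db2
      timeout: 60
    - name: resume-io
      command: "db2 set write resume for database"    # resume writes after the snapshot
      container: db2
      timeout: 60

  workflows:
  - name: backup
    sequence:
    - group: db2-resources            # hypothetical resource group
    - hook: db2-pod-exec/suspend-io
    - group: db2-volumes              # hypothetical volume group
    - hook: db2-pod-exec/resume-io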

After the workflow is executed, IBM Storage Fusion performs the data movement to object storage: the snapshots are mounted by the Backup & Restore service and the data is copied to object storage via the S3 protocol. This processing all takes place after execution of the backup recipe.


The restore sequence reverses the main order of operations:

  • The persistent volumes as defined in the group “edb-volumes” are first restored from object storage via S3.
  • The resources as defined in the group “edb-resources” are restored.
  • The “replicasReady” check is invoked to ensure the desired number of deployment replicas are present on the system.

 

Final specifications

We will complete the specification of the recipe resource with two more important pieces of information.

apiVersion: spp-data-protection.isf.ibm.com/v1alpha1
kind: Recipe
metadata:
  name: edb-backup-restore-recipe
  namespace: ibm-spectrum-fusion-ns
spec:
  appType: edbcluster

First we give our recipe a name, and then we declare which namespace the recipe is stored in. For the present, we recommend that recipes be kept in a common namespace rather than with the application you are protecting. That way you still have a record of the recipe if you lose the protected application's namespace. Note that this is not a strict requirement: IBM Storage Fusion keeps a copy of the recipe used for backup as metadata so that it can use the same recipe on restore, even if you inadvertently delete or lose the recipe on the local cluster.

Note that there is also a field called “appType”, which is not currently used by IBM Storage Fusion and can be set to any value.

Taking the recipe live

Now that you have a recipe defined, it is time to associate it with the backup operation. In IBM Storage Fusion, there is a resource that defines the application (Application) and a backup policy (BackupPolicy) that defines the basic backup policy constructs (when the backup runs, where the data is stored, how long the data is kept, and so on). Applications are associated with backup policies through a backup policy assignment (PolicyAssignment) resource. The policy assignment defines the relationship between a single application and a single backup policy.

In order to use a custom recipe for backup and restore, the recipe resource needs to be specified in the PolicyAssignment.  The easiest way to do this is to patch the policy assignment:

oc -n ibm-spectrum-fusion-ns patch policyassignment <policy-assignment-name> --type merge -p '{"spec":{"recipe":{"name":"<recipe-name>", "namespace":"<recipe-namespace>", "apiVersion":"spp-data-protection.isf.ibm.com/v1alpha1"}}}'
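For reference, using the example recipe defined above, the recipe reference that this patch merges into the PolicyAssignment spec is equivalent to the following YAML (the rest of the PolicyAssignment spec is left unchanged):

  spec:
    recipe:
      name: edb-backup-restore-recipe
      namespace: ibm-spectrum-fusion-ns
      apiVersion: spp-data-protection.isf.ibm.com/v1alpha1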

As you can see, a lot of application-specific knowledge goes into creating functional recipes. In the near future, the IBM Storage Fusion team will be publishing sample recipes in a public repository for some of the most popular applications, such as PostgreSQL, EDB, MongoDB, and IBM Db2. We are also calling on the community to help develop and share recipes for other custom applications. As of the publication date the repository is not live, but you can periodically check the IBM Storage Fusion GitHub page for more information.
