Introduction
In a previous blog, we looked at a special case of the Fusion Recipe exec hook: running K8s or oc commands during data protection workflows. In this article, we look at another feature of the exec hook that lets us undo certain operations when a backup or restore workflow fails.
Background
As with any other application or resource in a container environment, backup or restore operations can fail and disrupt the normal functioning of user applications. Sometimes these failures are severe and lead to application downtime.
Let’s assume you have a MongoDB database that was locked during the backup snapshot operation to ensure an application-consistent snapshot, and the snapshot operation failed for some reason. In this scenario, the Fusion backup job is reported as failed, while the database stays locked and write operations remain halted.
Fusion Recipe has a built-in capability to undo previous operations in the event of failures, provided that the inverseOp field is specified.
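To make the idea concrete, here is a minimal conceptual sketch in Python of a workflow runner that undoes completed steps when a later step fails. It is not Fusion's implementation: the names `Op`, `run_workflow`, and `simulate` are hypothetical, and the choice to run inverse operations in reverse order of completion is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Op:
    """One workflow step. `inverse` names the op that undoes it (hypothetical model)."""
    name: str
    run: Callable[[], None]
    inverse: Optional[str] = None


def run_workflow(ops: Dict[str, Op], sequence: List[str]) -> bool:
    """Run ops in order. If one fails, run the inverses of the completed ops."""
    completed: List[Op] = []
    for name in sequence:
        op = ops[name]
        try:
            op.run()
        except Exception:
            # Assumption: undo in reverse order of completion, skipping ops
            # that declare no inverse.
            for done in reversed(completed):
                if done.inverse is not None:
                    ops[done.inverse].run()
            return False
        completed.append(op)
    return True


def simulate(with_inverse: bool) -> Tuple[bool, bool, List[str]]:
    """Simulate lock -> snapshot (fails) -> unlock for the MongoDB scenario."""
    state = {"locked": False}
    log: List[str] = []

    def lock() -> None:
        state["locked"] = True
        log.append("fsyncLock")

    def unlock() -> None:
        state["locked"] = False
        log.append("fsyncUnlock")

    def snapshot() -> None:
        log.append("snapshot")
        raise RuntimeError("simulated snapshot failure")

    ops = {
        "fsyncLock": Op("fsyncLock", lock, "fsyncUnlock" if with_inverse else None),
        "snapshot": Op("snapshot", snapshot),
        "fsyncUnlock": Op("fsyncUnlock", unlock),
    }
    ok = run_workflow(ops, ["fsyncLock", "snapshot", "fsyncUnlock"])
    return ok, state["locked"], log


if __name__ == "__main__":
    print(simulate(with_inverse=False))  # database is left locked
    print(simulate(with_inverse=True))   # database is unlocked by the inverse op
```

In this model, the workflow fails in both runs, but only the run without an inverse leaves the simulated database locked.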
How to undo an operation in case of a failure in the Fusion Recipe workflow
For the above-mentioned MongoDB database scenario, the initial Fusion Recipe to back up the application is:
...
hooks:
- name: mongodb-pod-exec
  labelSelector: app=mongodb
  timeout: 300
  namespace: ${GROUP.mongodb-resources.namespace}
  onError: fail
  ops:
  - command: >
      ["/bin/bash", "-c", "mongosh -u `printenv MONGO_INITDB_ROOT_USERNAME` -p `printenv MONGO_INITDB_ROOT_PASSWORD` --eval \"db.fsyncLock()\""]
    container: mongodb
    timeout: 300
    name: fsyncLock
    onError: fail
  - command: >
      ["/bin/bash", "-c", "mongosh -u `printenv MONGO_INITDB_ROOT_USERNAME` -p `printenv MONGO_INITDB_ROOT_PASSWORD` --eval \"db.fsyncUnlock()\""]
    container: mongodb
    timeout: 300
    name: fsyncUnlock
    onError: fail
  selectResource: pod
  type: exec
workflows:
- name: backup
  sequence:
  ...
  - hook: mongodb-pod-exec/fsyncLock
  - group: mongodb-volumes
  - hook: mongodb-pod-exec/fsyncUnlock
In the above snippet, the database is locked before the backup snapshot operation and unlocked afterwards so that it is usable again. However, what happens if the snapshot operation fails? The Recipe execution is aborted, leaving the database locked and causing downtime for the application. How can we address this situation? Fortunately, we can specify an undo operation in the inverseOp field of the Fusion Recipe.
To ensure that the database is unlocked in the event of a failure, we can set the inverseOp field as follows:
...
hooks:
- name: mongodb-pod-exec
  labelSelector: app=mongodb
  timeout: 300
  namespace: ${GROUP.mongodb-resources.namespace}
  onError: fail
  ops:
  - command: >
      ["/bin/bash", "-c", "mongosh -u `printenv MONGO_INITDB_ROOT_USERNAME` -p `printenv MONGO_INITDB_ROOT_PASSWORD` --eval \"db.fsyncLock()\""]
    container: mongodb
    timeout: 300
    name: fsyncLock
    onError: fail
    inverseOp: fsyncUnlock
  - command: >
      ["/bin/bash", "-c", "mongosh -u `printenv MONGO_INITDB_ROOT_USERNAME` -p `printenv MONGO_INITDB_ROOT_PASSWORD` --eval \"db.fsyncUnlock()\""]
    container: mongodb
    timeout: 300
    name: fsyncUnlock
    onError: fail
  selectResource: pod
  type: exec
workflows:
- name: backup
  sequence:
  ...
  - hook: mongodb-pod-exec/fsyncLock
  - group: mongodb-volumes
  - hook: mongodb-pod-exec/fsyncUnlock
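Because inverseOp refers to another op by name (here fsyncLock points at fsyncUnlock), a typo in that name is easy to miss. The following is a small, hypothetical sanity check in Python, not a Fusion tool: it takes a hook already loaded into a dictionary and reports any inverseOp that does not match an op name in the same hook. The function name `find_dangling_inverse_ops` is made up for this sketch, and the rule that the inverse must live in the same hook is an assumption based on the recipe above.

```python
from typing import Any, Dict, List


def find_dangling_inverse_ops(hook: Dict[str, Any]) -> List[str]:
    """Return a message for each op whose inverseOp names no op in this hook."""
    ops = hook.get("ops", [])
    names = {op["name"] for op in ops if "name" in op}
    problems: List[str] = []
    for op in ops:
        inverse = op.get("inverseOp")
        if inverse is not None and inverse not in names:
            op_name = op.get("name", "<unnamed>")
            problems.append(f"{op_name}: inverseOp '{inverse}' not found")
    return problems


# Dictionary mirroring only the relevant fields of the hook shown above.
hook = {
    "name": "mongodb-pod-exec",
    "ops": [
        {"name": "fsyncLock", "inverseOp": "fsyncUnlock"},
        {"name": "fsyncUnlock"},
    ],
}

if __name__ == "__main__":
    print(find_dangling_inverse_ops(hook))  # [] because fsyncUnlock exists
```

A check like this could run before the recipe is applied, so that a misspelled inverse is caught before a failed backup depends on it.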
Conclusion
In this article, we have seen how to use the Fusion Recipe inverseOp capability to keep an application usable when a data protection workflow fails. In the next blog, we will explore some other aspects of Fusion Recipes.