Tactical solution – disabling the webhook
The first solution suggested in the Velero documentation is to "…disable the webhooks". Stated more completely, the suggestion is: disable the webhooks and then re-attempt the restore operation. Note that this is not guaranteed to work, but if you had not planned your application backup and recovery to account for the webhooks, it might be the only option you have.
To disable a webhook, edit the webhook resource and set its "failurePolicy" to "Ignore" (the value is case-sensitive). Note the current "failurePolicy" value first so you can set it back after a successful restore. After you have edited the webhook that was preventing the restore from completing, re-submit the restore without cleaning up the PVC data and resources that were previously restored. Fusion will restore the PVCs again but will not restore resources that were already restored. (There is a way to instruct Fusion not to restore the PVC data again, but that is outside the scope of this article.) If the application now restores successfully, edit the webhook and reset "failurePolicy" to its original value.
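Setting "failurePolicy" to "Ignore" changes only what the API server does when the call to the webhook fails (for example, no endpoints, a timeout, or a TLS error). If the webhook server is reachable and explicitly rejects a resource, the request is still denied, which is one reason this tactical fix is not guaranteed to work. The following minimal Python sketch models that rule for a single validating webhook; the enum and function names are hypothetical, and the model ignores details such as multiple webhooks and namespace selectors.

```python
from enum import Enum


class FailurePolicy(Enum):
    FAIL = "Fail"
    IGNORE = "Ignore"


def admitted(policy: FailurePolicy, webhook_reachable: bool,
             webhook_allows: bool = True) -> bool:
    """Return True if the API server would admit the request.

    Simplified model of one validating webhook (hypothetical helper):
    - if the webhook can be called, its own verdict decides;
    - if the call fails (no endpoints, timeout, TLS error), failurePolicy decides.
    """
    if webhook_reachable:
        return webhook_allows
    return policy is FailurePolicy.IGNORE


if __name__ == "__main__":
    # Webhook server not running yet, as during the failed restore.
    print(admitted(FailurePolicy.FAIL, webhook_reachable=False))    # False
    print(admitted(FailurePolicy.IGNORE, webhook_reachable=False))  # True
    # "Ignore" does not help when a running webhook rejects the object.
    print(admitted(FailurePolicy.IGNORE, webhook_reachable=True,
                   webhook_allows=False))                          # False
```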
Strategic solutions to properly protecting applications with webhooks - Overview
There are several ways in which you can strategically accommodate the recovery of an application with webhooks. We will look in detail at how to achieve each of them with the Fusion backup recipe infrastructure:
- Back up the webhook resources, but during restore make sure the webhook server is running before restoring the webhook resource itself. This is the easiest strategic solution to implement.
- Use a transformation to modify the webhook resource before it is restored. (This is similar to what the Velero documentation suggests as "a restore item action plugin to modify the resources".) This is the most complete solution but requires creating the appropriate transformation resources.
- Exclude the webhook resources from the backup and let the controller re-create the webhook server and configurations.
- Patch the webhook resource "failurePolicy" to "Ignore" during backup so that the recovered webhook resource ignores webhook call failures during the recovery, and then set "failurePolicy" back to its original value after the recovery.
Strategic solution 1: Ensure the webhook server is running before restoring the webhook resource
This is the easiest strategic solution to implement. In this solution, we back up the webhook but, during restore, ensure that the webhook server is running before restoring the actual webhook resource.
Details:
In this sample workflow, the group fusion-reference-resources backs up all namespace resources, including the webhook server deployment. During restore, the hook fusion-reference-webhook-deployment-check/replicasReady waits until the webhook server deployment is ready, and only then is the webhook resource (group fusion-reference-cluster-resources) restored.
groups:
  - name: fusion-reference-volumes
    type: volume
  - name: fusion-reference-resources
    excludedResourceTypes:
      - events
      - event.events.k8s.io
      - replicaset
      - pods
    type: resource
  - name: fusion-reference-cluster-resources
    includeClusterResources: true
    includedResourceTypes:
      - validatingwebhookconfigurations.admissionregistration.k8s.io
    labelSelector: deployment-webhook=true
    type: resource
workflows:
  - name: restore
    sequence:
      - group: fusion-reference-volumes
      - group: fusion-reference-resources
      - hook: fusion-reference-webhook-deployment-check/replicasReady
      - hook: mysql-deployment-check/replicasReady
      - group: fusion-reference-cluster-resources
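Why the ordering matters can be checked with a toy model in Python. This is a sketch, not Fusion's implementation: the ToyCluster class and run_restore function are hypothetical, the webhook is assumed to use failurePolicy: Fail, and only the steps that affect admission are modeled. It replays the failing order (from the "Detailed Webhook failure information" section) and the Solution 1 order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Tracks only the state that matters for webhook admission (hypothetical model)."""
    webhook_configured: bool = False    # ValidatingWebhookConfiguration exists
    webhook_server_ready: bool = False  # webhook server Deployment has ready replicas
    restored: list[str] = field(default_factory=list)

    def admits_deployments(self) -> bool:
        # Assumes failurePolicy: Fail, so an unreachable webhook rejects the request.
        return not self.webhook_configured or self.webhook_server_ready


def run_restore(sequence: list[str]) -> tuple[bool, ToyCluster]:
    """Replay recipe steps in order; return (succeeded, final state)."""
    cluster = ToyCluster()
    for step in sequence:
        if step == "group: fusion-reference-cluster-resources":
            cluster.webhook_configured = True
        elif step == "group: fusion-reference-resources":
            if not cluster.admits_deployments():
                return False, cluster  # "failed calling webhook ... no endpoints available"
            cluster.restored.append("deployments")
        elif step == "hook: fusion-reference-webhook-deployment-check/replicasReady":
            # The readiness hook can only succeed once the server Deployment exists.
            if "deployments" not in cluster.restored:
                return False, cluster
            cluster.webhook_server_ready = True
        # Volume groups and the MySQL hook do not affect admission in this model.
    return True, cluster


# Webhook configuration restored first: the failing case.
WEBHOOK_FIRST = [
    "group: fusion-reference-cluster-resources",
    "group: fusion-reference-volumes",
    "group: fusion-reference-resources",
    "hook: mysql-deployment-check/replicasReady",
]
# Strategic solution 1: webhook configuration restored last.
WEBHOOK_LAST = [
    "group: fusion-reference-volumes",
    "group: fusion-reference-resources",
    "hook: fusion-reference-webhook-deployment-check/replicasReady",
    "hook: mysql-deployment-check/replicasReady",
    "group: fusion-reference-cluster-resources",
]

if __name__ == "__main__":
    print(run_restore(WEBHOOK_FIRST)[0])  # False
    print(run_restore(WEBHOOK_LAST)[0])   # True
```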
Pros:
- Keeps the webhook resource in the backup, ensuring comprehensive backup is achieved.
- Improves resiliency compared to excluding the webhook.
Cons:
- Involves creating a restore workflow (dependency management) that restores the webhook at the appropriate point in the sequence.
Strategic solution 2: Use a transformation to modify the webhook resource before it is restored
This is the most complete solution but requires creating the appropriate transformation resources. A resource transformation is essentially a hook into the restore process that modifies the contents of a resource in memory before it is restored to the target cluster. (This is similar to what the Velero documentation suggests as "a restore item action plugin to modify the resources".)
Steps:
- Add another group for the webhook resource with "restoreOverwriteResources: true" to the Fusion Recipe. This additional group references the original group through backupRef but differs in having the restoreOverwriteResources and disableTransform flags enabled. Then take the backup snapshot.
groups:
  ...
  - name: fusion-reference-cluster-resources-overwrite
    backupRef: fusion-reference-cluster-resources
    restoreOverwriteResources: true
    disableTransform: true
    includeClusterResources: true
    includedResourceTypes:
      - validatingwebhookconfigurations.admissionregistration.k8s.io
    labelSelector: deployment-webhook=true
    type: resource
workflows:
  - name: backup
    sequence:
      - group: fusion-reference-cluster-resources
      - group: fusion-reference-resources
      - hook: mysql-pod-exec/flush-tables-with-read-lock
      - group: fusion-reference-volumes
  - name: restore
    sequence:
      - group: fusion-reference-cluster-resources
      - group: fusion-reference-volumes
      - group: fusion-reference-resources
      - hook: fusion-reference-deployment-check/replicasReady
      - hook: fusion-reference-webhook-deployment-check/replicasReady
      - hook: mysql-deployment-check/replicasReady
      - group: fusion-reference-cluster-resources-overwrite
Note: The flag disableTransform: true prevents the transformation from being applied to this group, so the webhook resource is restored exactly as it was backed up.
- Configure and apply a Fusion Transform CR that changes the webhook "failurePolicy" from "Fail" to "Ignore":
apiVersion: data-protection.isf.ibm.com/v1alpha1
kind: Transform
metadata:
  name: transform-deployment-webhook
  namespace: ibm-spectrum-fusion-ns
spec:
  transforms:
    - name: ingress-with-json-patch
      json:
        - op: test
          path: /webhooks/0/failurePolicy
          value: Fail
        - op: replace
          path: /webhooks/0/failurePolicy
          value: Ignore
      subject:
        groupResource: validatingwebhookconfigurations.admissionregistration.k8s.io
        labelSelector: 'deployment-webhook=true'
- Reference the transform in the Restore CR and apply it:
apiVersion: data-protection.isf.ibm.com/v1alpha1
kind: Restore
metadata:
  name: custom-restore-fusion-reference-1
  namespace: ibm-spectrum-fusion-ns
spec:
  transform: transform-deployment-webhook
  backup: <Backup job name>
  recipe:
    name: fusion-reference-bnr-recipe
    namespace: ibm-spectrum-fusion-ns
- Notice the overwrite group at the end of the restore sequence. It brings back the original webhook resource:
- name: restore
  sequence:
    - group: fusion-reference-cluster-resources
    - group: fusion-reference-volumes
    - group: fusion-reference-resources
    - hook: fusion-reference-deployment-check/replicasReady
    - hook: fusion-reference-webhook-deployment-check/replicasReady
    - hook: mysql-deployment-check/replicasReady
    - group: fusion-reference-cluster-resources-overwrite
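The two entries under "json" in the Transform CR are JSON Patch (RFC 6902) operations: "test" guards "replace", so only a webhook whose first entry currently has "failurePolicy: Fail" is changed. The Python sketch below implements just those two operations to show their effect on the backed-up object; it is not Fusion's code, and how Fusion reports a failed "test" is not covered in this article. The sketch follows RFC 6902, where a failed "test" aborts the whole patch.

```python
import copy


class PatchTestFailed(Exception):
    """Raised when a JSON Patch 'test' operation does not match."""


def _parent_and_key(doc, pointer: str):
    """Resolve a JSON Pointer such as /webhooks/0/failurePolicy to (parent, key)."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    return parent, int(last) if isinstance(parent, list) else last


def apply_patch(doc: dict, ops: list) -> dict:
    """Apply 'test' and 'replace' operations (RFC 6902 subset) to a copy of doc.

    A failed 'test' aborts the whole patch, so the input is never partially changed.
    """
    result = copy.deepcopy(doc)
    for op in ops:
        parent, key = _parent_and_key(result, op["path"])
        if op["op"] == "test":
            if parent[key] != op["value"]:
                raise PatchTestFailed(
                    f"{op['path']} is {parent[key]!r}, expected {op['value']!r}")
        elif op["op"] == "replace":
            parent[key] = op["value"]
        else:
            raise NotImplementedError(f"op {op['op']!r} is not covered by this sketch")
    return result


TRANSFORM_OPS = [  # the two operations from the Transform CR above
    {"op": "test", "path": "/webhooks/0/failurePolicy", "value": "Fail"},
    {"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"},
]

if __name__ == "__main__":
    backed_up = {"webhooks": [{"name": "deployment-validator.yourdomain.com",
                               "failurePolicy": "Fail"}]}
    restored = apply_patch(backed_up, TRANSFORM_OPS)
    print(restored["webhooks"][0]["failurePolicy"])   # Ignore
    print(backed_up["webhooks"][0]["failurePolicy"])  # Fail
```

Because both operations address /webhooks/0, only the first webhook entry in the configuration is changed; a configuration with several entries would need a test/replace pair for each index.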
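Applying the Transform CR during restore is what makes this approach work when the webhook resource is already in the backup: the webhook is first restored with "failurePolicy: Ignore", so it cannot block the other resources, and the overwrite group at the end then restores the original configuration.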
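The effect of the two restore-time groups can be traced with a small Python model. The dictionary-based cluster and the restore_group helper are hypothetical; the flags only mimic restoreOverwriteResources and disableTransform as described in this article (existing resources are skipped unless overwrite is requested), and Fusion's real restore runs through Velero.

```python
import copy

# Backed-up webhook configuration, trimmed to the fields that matter here.
BACKED_UP = {
    "metadata": {"name": "deployment-validator", "labels": {"deployment-webhook": "true"}},
    "webhooks": [{"failurePolicy": "Fail"}],
}


def transform(obj: dict) -> dict:
    """Stand-in for the Transform CR: first webhook entry Fail -> Ignore."""
    out = copy.deepcopy(obj)
    if out["webhooks"][0].get("failurePolicy") == "Fail":
        out["webhooks"][0]["failurePolicy"] = "Ignore"
    return out


def restore_group(cluster: dict, obj: dict, *, overwrite: bool = False,
                  disable_transform: bool = False) -> None:
    """Restore one object into cluster (a name -> object dict).

    overwrite mimics restoreOverwriteResources: existing objects are skipped without it.
    disable_transform mimics disableTransform: the object is restored as backed up.
    """
    name = obj["metadata"]["name"]
    if name in cluster and not overwrite:
        return
    cluster[name] = copy.deepcopy(obj) if disable_transform else transform(obj)


def policy(cluster: dict) -> str:
    return cluster["deployment-validator"]["webhooks"][0]["failurePolicy"]


if __name__ == "__main__":
    cluster = {}
    restore_group(cluster, BACKED_UP)   # group: fusion-reference-cluster-resources
    print(policy(cluster))              # Ignore: cannot block the other groups
    # ... volumes, resources and readiness hooks run here ...
    restore_group(cluster, BACKED_UP, overwrite=True, disable_transform=True)
    print(policy(cluster))              # Fail: original configuration is back
```

Calling the last step without overwrite would leave the transformed copy in place, which is why the overwrite group needs restoreOverwriteResources: true as well as disableTransform: true.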
Pros:
- There is no need to exclude the webhook resource from backup or manage its sequencing during recovery.
- Restoring the webhook resource at the end with overwrite ensures the original webhook resource configuration is restored. Basically, this overwrites the transformed webhook resource with its backed-up version.
Cons:
- Requires additional transform resource management.
Strategic solution 3: Exclude the webhook from the backup
In this option, we exclude the webhook from the backup and let the controller re-create the webhook server and its configuration. If no controller manages the webhook, you must apply the webhook configuration manually after the restore process.
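Whether a controller re-creates the webhook or you do it by hand, it helps to know exactly which objects the exclusion leaves out of the backup. However the exclusion is configured in the recipe, the idea can be mirrored in a few lines of Python: partition the application's resources by type and label selector. The resource dictionaries and helper names are hypothetical, and the selector parser supports only key=value terms.

```python
def selector_matches(labels: dict, selector: str) -> bool:
    """Equality-only label selector such as 'deployment-webhook=true' (key=value terms)."""
    for term in selector.split(","):
        key, sep, value = term.strip().partition("=")
        if not sep or labels.get(key) != value:
            return False
    return True


def split_backup(resources: list, excluded_type: str, selector: str) -> tuple:
    """Return (backed_up, excluded); excluded objects must be re-created after restore."""
    backed_up, excluded = [], []
    for res in resources:
        labels = res.get("metadata", {}).get("labels", {})
        if res["type"] == excluded_type and selector_matches(labels, selector):
            excluded.append(res)
        else:
            backed_up.append(res)
    return backed_up, excluded


WEBHOOK_TYPE = "validatingwebhookconfigurations.admissionregistration.k8s.io"
RESOURCES = [  # hypothetical inventory of the reference application
    {"type": "deployments.apps", "metadata": {"name": "deployment-webhook"}},
    {"type": "services", "metadata": {"name": "deployment-webhook"}},
    {"type": WEBHOOK_TYPE,
     "metadata": {"name": "deployment-validator",
                  "labels": {"deployment-webhook": "true"}}},
]

if __name__ == "__main__":
    kept, excluded = split_backup(RESOURCES, WEBHOOK_TYPE, "deployment-webhook=true")
    print("backed up:", [r["metadata"]["name"] for r in kept])
    print("re-create after restore:", [r["metadata"]["name"] for r in excluded])
```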
Pros:
- This is an easy workaround with minimal complexity.
Cons:
- The webhook is not included in the backup which reduces overall resiliency.
- If no webhook controller is present, manual intervention is required to re-create the webhook resources after application recovery, which increases recovery overhead.
Strategic solution 4: Modify the webhook failurePolicy behaviour
In this option we set the webhook resource "failurePolicy" to Ignore during backup so that the recovered webhook resource will ignore any issues during the recovery. We will then set the failurePolicy back to the original value after the recovery.
Recipe hooks
disable-webhook-set-failurepolicy-ignore
kubectl patch validatingwebhookconfiguration <webhook-name> --type='json' -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'
enable-webhook-set-failurepolicy
kubectl patch validatingwebhookconfiguration <webhook-name> --type='json' -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"<Original value>"}]'
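If the hook scripts are generated or wrapped by tooling, building the kubectl argument list programmatically avoids shell-quoting mistakes in the JSON patch and makes it easy to record the original value before disabling the webhook. The Python sketch below is hypothetical: it only assembles the commands (it does not run them), assumes kubectl is on the PATH of the hook container, and assumes that container's service account is allowed to get and patch validatingwebhookconfigurations.

```python
from __future__ import annotations

import json
import shlex

RESOURCE = "validatingwebhookconfiguration"


def read_policy_cmd(config_name: str, index: int = 0) -> list[str]:
    """kubectl argv that prints the current failurePolicy of one webhook entry."""
    return ["kubectl", "get", RESOURCE, config_name,
            "-o", f"jsonpath={{.webhooks[{index}].failurePolicy}}"]


def patch_policy_cmd(config_name: str, policy: str, index: int = 0) -> list[str]:
    """kubectl argv that replaces failurePolicy of one webhook entry via JSON Patch."""
    if policy not in ("Fail", "Ignore"):
        raise ValueError(f"failurePolicy must be 'Fail' or 'Ignore', got {policy!r}")
    patch = [{"op": "replace", "path": f"/webhooks/{index}/failurePolicy", "value": policy}]
    return ["kubectl", "patch", RESOURCE, config_name, "--type=json", "-p", json.dumps(patch)]


if __name__ == "__main__":
    # "deployment-validator" is the configuration name used in the example later on.
    for cmd in (read_policy_cmd("deployment-validator"),
                patch_policy_cmd("deployment-validator", "Ignore")):
        print(shlex.join(cmd))
```

An argument list like this can be passed directly to subprocess.run(cmd, check=True, capture_output=True, text=True); the output of the read command is the value the enable hook should write back.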
Recipe
workflows:
  - name: backup
    sequence:
      - hook: disable-webhook-set-failurepolicy-ignore
      - group: fusion-reference-cluster-resources
      - hook: enable-webhook-set-failurepolicy
      - group: fusion-reference-resources
      - hook: mysql-pod-exec/flush-tables-with-read-lock
      - group: fusion-reference-volumes
  - name: restore
    sequence:
      - group: fusion-reference-cluster-resources
      - group: fusion-reference-volumes
      - group: fusion-reference-resources
      - hook: fusion-reference-deployment-check/replicasReady
      - hook: fusion-reference-webhook-deployment-check/replicasReady
      - hook: mysql-deployment-check/replicasReady
      - hook: enable-webhook-set-failurepolicy
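Tracing the failurePolicy value through both workflows shows what this recipe relies on: the backup copy captures "Ignore", while the source cluster returns to its original value right after the group is backed up. This is a hypothetical Python trace of the recipe above, not Fusion code, and it assumes the original value is "Fail".

```python
from __future__ import annotations


def trace_solution4(original: str = "Fail") -> list[tuple[str, dict]]:
    """Return (step, state) pairs; state holds failurePolicy per location."""
    state = {"source": original, "backup": None, "target": None}
    trace = []

    def step(name: str, **changes) -> None:
        state.update(changes)
        trace.append((name, dict(state)))

    # Backup workflow on the source cluster.
    step("hook: disable-webhook-set-failurepolicy-ignore", source="Ignore")
    step("group: fusion-reference-cluster-resources (backup)", backup=state["source"])
    step("hook: enable-webhook-set-failurepolicy (backup)", source=original)
    # Restore workflow on the target cluster.
    step("group: fusion-reference-cluster-resources (restore)", target=state["backup"])
    step("group: fusion-reference-resources (restore)")
    step("hook: enable-webhook-set-failurepolicy (restore)", target=original)
    return trace


if __name__ == "__main__":
    for name, snapshot in trace_solution4():
        print(f"{name:55} {snapshot}")
```

The trace also exposes a side effect: on the source cluster, the webhook runs with "Ignore" between the two backup hooks, so requests that fail to reach the webhook during that window are admitted.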
Pros:
- There is no need to exclude the webhook resource from the backup or manage its placement during recovery.
- Allows you to back up all resources for the application, including the webhook resources.
Cons:
- Requires a container with sufficient permissions to execute the patch commands on the webhook resource.
- Involves managing permissions and hooks for patching during backup and restore.
Summary
Among the different solutions discussed for recovering applications with webhooks, the first two solutions are more elegant because they retain all application resources in the backup and do not require an additional container for executing hooks to update the “failurePolicy”. Both approaches also eliminate the need for any manual steps during recovery.
Of these two, the Fusion Transform CR approach (solution 2) is the more straightforward, requiring only minimal management of the transformation resource. By following these approaches, you can develop a functional Fusion Recipe workflow for recovering webhook-based applications. However, depending on your scenario, one of the other solutions outlined above may be more suitable.
Feel free to share your experiences with recovering webhook-based applications: the challenges you faced and the resolutions you applied. We will be happy to learn from your story!
Detailed Webhook failure information
The IBM Fusion Recipe framework internally uses Velero for resource processing. Let's walk through the webhook restore problem step by step using a reference application.
Step 1
Assume the reference application has a webhook that rejects any create, update, or scale of a deployment when "spec.replicas" is greater than 1.
Webhook resource applied:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    deployment-webhook: "true"
  name: deployment-validator
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      caBundle: <base64-encoded-ca>
      service:
        name: deployment-webhook
        namespace: fusion-reference
        path: /validate-deployment
        port: 443
    failurePolicy: Fail
    matchPolicy: Equivalent
    name: deployment-validator.yourdomain.com
    namespaceSelector:
      matchLabels:
        webhook: enabled
    rules:
      - apiGroups:
          - apps
        apiVersions:
          - v1
        operations: [ "CREATE", "UPDATE" ]
        resources:
          - deployments
          - deployments/scale
        scope: Namespaced
    sideEffects: None
    timeoutSeconds: 10
Also apply the other resources the webhook needs, such as the TLS secret, the webhook server deployment, and its service, to bring the webhook into effect.
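The article does not show the webhook server itself. As an illustration only, the validation rule from Step 1 could look like the following Python function, which turns an AdmissionReview request into the response the API server expects (the response must echo the request uid). The function name and the 403 status code are choices made for this sketch; serving it over HTTPS at /validate-deployment with the TLS secret is omitted.

```python
def review(admission_review: dict, max_replicas: int = 1) -> dict:
    """Answer an admission.k8s.io/v1 AdmissionReview for a Deployment or a Scale object.

    Both objects carry the desired count in spec.replicas; the request is denied
    when it exceeds max_replicas. A missing value is treated as 1 (the Deployment default).
    """
    request = admission_review["request"]
    obj = request.get("object") or {}
    replicas = obj.get("spec", {}).get("replicas", 1)
    response = {"uid": request["uid"], "allowed": replicas <= max_replicas}
    if not response["allowed"]:
        response["status"] = {
            "code": 403,
            "message": f"spec.replicas={replicas} exceeds the limit of {max_replicas}",
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}


if __name__ == "__main__":
    sample = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": "hypothetical-uid-1",
            "kind": {"group": "apps", "version": "v1", "kind": "Deployment"},
            "operation": "UPDATE",
            "object": {"kind": "Deployment", "spec": {"replicas": 3}},
        },
    }
    print(review(sample)["response"])
```

While this server has no ready endpoints, the API server cannot call it at all; with failurePolicy: Fail, every Deployment create or update in namespaces labeled webhook: enabled is then rejected, including the webhook server's own Deployment during a restore.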
Step 2
Modify the Fusion Recipe backup and restore workflows for the reference application. After the modification, the new resource group and the backup and restore workflows are:
spec:
  appType: fusion-reference
  groups:
    ...
    - name: fusion-reference-cluster-resources
      includeClusterResources: true
      includedResourceTypes:
        - validatingwebhookconfigurations.admissionregistration.k8s.io
      labelSelector: deployment-webhook=true
      type: resource
  workflows:
    - name: backup
      sequence:
        - group: fusion-reference-cluster-resources
        - group: fusion-reference-resources
        - hook: mysql-pod-exec/flush-tables-with-read-lock
        - group: fusion-reference-volumes
    - name: restore
      sequence:
        - group: fusion-reference-cluster-resources
        - group: fusion-reference-volumes
        - group: fusion-reference-resources
        - hook: fusion-reference-deployment-check/replicasReady
        - hook: mysql-deployment-check/replicasReady
Step 3
Take a backup. It completes without issues.
Step 4
On restore, the job fails. The Fusion restore job page shows "Recipe failed" with error code BMYBR0009:
There was an error when processing the job in the Transaction Manager service. The underlying error was: 'Execution of workflow restore of recipe fusion-reference-bnr-recipe completed. Number of failed commands: 1, last failed command: "ResourceGroup/fusion-reference-resources " [\'Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/deployment-webhook: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook. fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'Namespace fusion-reference, resource restore error: error restoring deployments.apps/fusion-reference/fusion-reference: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'error restoring deployment-webhook: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'error restoring fusion-reference: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\']'.
Snippet from the Velero log:
{"level":"error","logSource":"pkg/controller/restore_controller.go:601","msg":"Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/deployment-webhook: Internal error occurred: failed calling webhook \"deployment-validator.yourdomain.com\": failed to call webhook: Post \"https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s\": no endpoints available for service \"deployment-webhook\"","restore":"ibm-backup-restore/backup-resources-8ed93258-e5d5-4245-be39-8cbebdf58f4f","time":"2025-11-11T09:07:17Z"}
{"level":"error","logSource":"pkg/controller/restore_controller.go:601","msg":"Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/fusion-reference: Internal error occurred: failed calling webhook \"deployment-validator.yourdomain.com\": failed to call webhook: Post \"https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s\": no endpoints available for service \"deployment-webhook\"","restore":"ibm-backup-restore/backup-resources-8ed93258-e5d5-4245-be39-8cbebdf58f4f","time":"2025-11-11T09:07:17Z"}
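For the tactical fix, the first task is to identify which webhook rejected which resources and which service it tried to call. The following Python sketch groups Velero's JSON error lines by webhook name. The regular expression is an assumption written against the messages shown above (other Velero versions may word them differently), and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import re
from collections import defaultdict

RESTORE_ERROR = re.compile(
    r'error restoring (?P<resource>.+?): Internal error occurred: '
    r'failed calling webhook "(?P<webhook>[^"]+)"'
    r'(?:.*?no endpoints available for service "(?P<service>[^"]+)")?'
)


def webhooks_blocking_restore(log_lines: list[str]) -> dict:
    """Group Velero error lines by the webhook that rejected each resource."""
    blocked = defaultdict(lambda: {"resources": [], "services": set()})
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        match = RESTORE_ERROR.search(entry.get("msg", ""))
        if match is None:
            continue
        info = blocked[match["webhook"]]
        # Kubernetes names contain no whitespace, so stray spaces can be dropped.
        info["resources"].append("".join(match["resource"].split()))
        if match["service"]:
            info["services"].add(match["service"])
    return dict(blocked)


def _sample_line(resource: str) -> str:
    """Build a log line shaped like the Velero snippet above (shortened)."""
    msg = (f"resource restore error: error restoring {resource}: Internal error occurred: "
           'failed calling webhook "deployment-validator.yourdomain.com": '
           'failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc'
           ':443/validate-deployment?timeout=10s": '
           'no endpoints available for service "deployment-webhook"')
    return json.dumps({"level": "error", "msg": msg})


SAMPLE_LINES = [_sample_line("deployments.apps/ fusion-reference/deployment-webhook"),
                _sample_line("deployments.apps/fusion-reference/fusion-reference")]

if __name__ == "__main__":
    for webhook, info in webhooks_blocking_restore(SAMPLE_LINES).items():
        print(webhook, sorted(info["services"]), info["resources"])
```

For the lines above, it reports deployment-validator.yourdomain.com calling the deployment-webhook service, which points to the ValidatingWebhookConfiguration whose "failurePolicy" the tactical solution would change.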