IBM Fusion

Fusion Backup and Restore considerations for Applications with Webhooks

By Sandeep Prajapati posted Thu November 27, 2025 01:05 PM

  

Background

Kubernetes applications often use admission webhooks to validate incoming requests. The Kubernetes API server calls these webhooks after a request has been authenticated and authorized but before the object is persisted, which ensures that requests are accurate and comply with defined policies.

When recovering an application with webhooks, you need to consider how the webhooks will behave while the application is being recovered. For example, if you are recovering an application that uses an admission webhook and the infrastructure is not yet ready to process the webhook request, you may encounter an error like "Internal error occurred: failed calling webhook" during restore. This typically means that the webhook service was unavailable when the restore service attempted to apply the resources. With another application, you may not encounter this issue if the webhook service was recovered before the other resources were restored. The behavior of the webhooks depends on the restore order.

In this article, we will explore possible solutions to overcome Fusion restore issues with webhook resources.

Encountering a webhook problem on restore

If you encounter a webhook problem on restore in the context of a Fusion Backup & Restore recipe, the problem can be summarized by the following symptoms:

  1. Execution of one of the recipe workflows for restoring resources fails.
  2. The internal error reported by Fusion (and found in the logs) indicates a webhook failure such as "failed calling webhook".

A sample failure, showing how Fusion surfaces a webhook error, is described in detail at the end of this blog.

Velero recommendation for resolving Webhook restore issues

Let’s start by looking at the way Velero recommends working with webhooks during recovery. This is an excerpt from the current Velero documentation:
 
Known issue with restoring resources when Admission webhooks are enabled
The Admission webhooks may forbid a resource to be created based on the input, it may optionally mutate the input as well.
 
Because velero calls the API server to restore resources, it is possible that the admission webhooks are invoked and cause unexpected failures, depending on the implementation and the configuration of the webhooks. To work around such issue, you may disable the webhooks or create a restore item action plugin to modify the resources before they are restored.

We will look at these two general workarounds in more detail: disabling webhooks and modifying resources before they are restored.

Tactical solution – disabling the webhook

The first solution suggested in the Velero documentation is to "…disable the webhooks". The suggestion should be stated more completely as: disable the webhooks and re-attempt the restore operation. Note that this is not guaranteed to work, but if you had not previously planned your application backup or recovery to account for the webhooks, this might be the only option you have.
 
To disable the webhooks, you can edit the webhook resource and set the "failurePolicy" to "Ignore" (the value is case-sensitive). With this setting, the API server admits a request when the webhook call itself fails, for example because the webhook service is unreachable; a webhook that is reachable and rejects a request still blocks it. Note the current setting of the "failurePolicy" so you can set it back to its original value after a successful restore. After you have edited the webhook that was preventing the restore from completing, re-submit the restore without cleaning up the PVC data and resources that were previously restored. Fusion will have to restore the PVCs again but will not restore resources that were already restored. (There is a way to instruct Fusion not to restore the PVC data again, but that is outside the scope of this article.) If you can now successfully restore your application, edit the webhook and reset the "failurePolicy" to its original value.
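
As a sketch of the edit described above, the relevant part of the webhook resource might look like this after the change. The names are taken from the sample webhook at the end of this article; the original value of Fail is an assumption, and your own webhook name and original policy may differ.

```yaml
# Excerpt only: every field not shown here stays unchanged.
# Assumption: the original value was Fail. Record it before editing.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: deployment-validator
webhooks:
- name: deployment-validator.yourdomain.com
  failurePolicy: Ignore   # temporarily changed from Fail for the restore
```

A resource with several entries under webhooks carries one failurePolicy per entry, so each entry that blocks the restore would need the same temporary change.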

Strategic solutions to properly protecting applications with webhooks - Overview

There are several ways in which you can strategically accommodate the recovery of an application with webhooks. We will look in detail at how you can achieve each of these with the Fusion backup recipe infrastructure:
 
  1. Back up the webhook resources, but during restore make sure the webhook server is running before restoring the webhook resource itself. This is the easiest strategic solution to implement.
  2. Use a transformation to modify the webhook resource before it is restored. (Note that this is similar to what the Velero documentation suggests as "a restore item action plugin to modify the resources".) This is the most complete solution but requires creating the appropriate transformation resources.
  3. Exclude the webhook resources from the backup and let the controller re-create the webhook server and configurations.
  4. Patch the webhook resource failurePolicy to Ignore during backup so that the recovered webhook resource ignores any issues during the recovery, and then set the failurePolicy back to its original value after the recovery.

Strategic solution 1: Ensure the webhook server is running before restoring the webhook resource

This is the easiest strategic solution to implement. In this solution we back up the webhook, but during restore we ensure that the webhook server is running before restoring the actual webhook resource. 
 
Details:
In this sample workflow, the group fusion-reference-resources backs up all namespace resources, including the webhook server deployment, and the hook fusion-reference-webhook-deployment-check/replicasReady ensures that the webhook server is recovered and ready first. It is important to note that the webhook resource is restored only after the webhook server is confirmed ready.
 
  groups:
    - name: fusion-reference-volumes
      type: volume
    - name: fusion-reference-resources
      excludedResourceTypes:
        - events
        - event.events.k8s.io
        - replicaset
        - pods
      type: resource
    - name: fusion-reference-cluster-resources
      includeClusterResources: true
      includedResourceTypes:
        - validatingwebhookconfigurations.admissionregistration.k8s.io
      labelSelector: deployment-webhook=true
      type: resource
  workflows:
    - name: restore
      sequence:
        - group: fusion-reference-volumes
        - group: fusion-reference-resources
        - hook: fusion-reference-webhook-deployment-check/replicasReady
        - hook: mysql-deployment-check/replicasReady
        - group: fusion-reference-cluster-resources
Pros:
  • Keeps the webhook resource in the backup, ensuring comprehensive backup is achieved.
  • Improves resiliency compared to excluding the webhook.
Cons:
  • Requires a restore workflow that manages the dependency, so that the webhook is restored at the appropriate point in the sequence.
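
The recipe excerpt above references the hook fusion-reference-webhook-deployment-check/replicasReady without showing its definition. A minimal sketch of what such a check hook could look like follows. The namespace, nameSelector, timeout values, and condition are assumptions for illustration and should be verified against the Fusion recipe documentation for your release.

```yaml
hooks:
  - name: fusion-reference-webhook-deployment-check
    type: check
    namespace: fusion-reference          # assumed application namespace
    selectResource: deployment
    nameSelector: deployment-webhook     # assumed name of the webhook server deployment
    timeout: 120                         # assumed value, in seconds
    onError: fail
    chks:
      - name: replicasReady
        timeout: 180                     # assumed value, in seconds
        onError: fail
        # Passes once every requested replica reports ready.
        condition: "{$.spec.replicas} == {$.status.readyReplicas}"
```

Placing this check between the resource group and the cluster-resources group is what delays the webhook resource until the webhook server can answer requests.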

Strategic solution 2: Use a transformation to modify the webhook resource before it is restored

This is the most complete solution but requires creating the appropriate transformation resources. A resource transformation is essentially a hook into the restore process that allows the program to modify the contents of a resource in memory before it gets restored to the target cluster. (Note that this is similar to what the Velero documentation suggests as "a restore item action plugin to modify the resources".)
 
Steps:
  1. Add another group for the webhook resource with "restoreOverwriteResources: true" in the Fusion Recipe. This additional group references the original group through backupRef but differs by having the restoreOverwriteResources and disableTransform flags enabled. Then take the backup snapshot.
      groups:
        ...
        - name: fusion-reference-cluster-resources-overwrite
          backupRef: fusion-reference-cluster-resources
          restoreOverwriteResources: true
          disableTransform: true
          includeClusterResources: true
          includedResourceTypes:
            - validatingwebhookconfigurations.admissionregistration.k8s.io
          labelSelector: deployment-webhook=true
          type: resource
      workflows:
        - name: backup
          sequence: 
            - group: fusion-reference-cluster-resources
            - group: fusion-reference-resources
            - hook: mysql-pod-exec/flush-tables-with-read-lock
            - group: fusion-reference-volumes
        - name: restore
          sequence:
            - group: fusion-reference-cluster-resources
            - group: fusion-reference-volumes
            - group: fusion-reference-resources
            - hook: fusion-reference-deployment-check/replicasReady
            - hook: fusion-reference-webhook-deployment-check/replicasReady
            - hook: mysql-deployment-check/replicasReady
            - group: fusion-reference-cluster-resources-overwrite
    Note: The flag disableTransform: true prevents the transformation from being applied to this group, so the webhook resource is restored exactly as it was backed up.
  2. Configure and apply a Fusion Transform CR that sets the webhook "failurePolicy" to "Ignore". The "test" operation checks that the current value is "Fail" before the "replace" operation changes it.
    apiVersion: data-protection.isf.ibm.com/v1alpha1
    kind: Transform
    metadata:
      name: transform-deployment-webhook
      namespace: ibm-spectrum-fusion-ns
    spec:
      transforms:
      - name: ingress-with-json-patch
        json:
        - op: test
          path: /webhooks/0/failurePolicy
          value: Fail
        - op: replace
          path: /webhooks/0/failurePolicy
          value: Ignore 
        subject:
          groupResource: validatingwebhookconfigurations.admissionregistration.k8s.io 
          labelSelector: 'deployment-webhook=true'
  3. Reference the Transform resource in the Restore CR and apply it
    apiVersion: data-protection.isf.ibm.com/v1alpha1
    kind: Restore
    metadata:
      name: custom-restore-fusion-reference-1
      namespace: ibm-spectrum-fusion-ns
    spec:
      transform: transform-deployment-webhook
      backup: <Backup job name>
      recipe:
        name: fusion-reference-bnr-recipe
        namespace: ibm-spectrum-fusion-ns
  4. Notice the overwrite group at the end of the restore sequence. It brings back the original webhook resource
        - name: restore
          sequence:
            - group: fusion-reference-cluster-resources
            - group: fusion-reference-volumes
            - group: fusion-reference-resources
            - hook: fusion-reference-deployment-check/replicasReady
            - hook: fusion-reference-webhook-deployment-check/replicasReady
            - hook: mysql-deployment-check/replicasReady
            - group: fusion-reference-cluster-resources-overwrite

Because the Transform CR is applied at restore time, this approach is particularly useful during recovery, that is, when the webhook resource has already been backed up.

Pros:
  • There is no need to exclude the webhook resource from backup or manage its sequencing during recovery.
  • Restoring the webhook resource at the end with overwrite ensures the original webhook resource configuration is restored. Basically, this overwrites the transformed webhook resource with its backed-up version.
Cons:
  • Requires additional transform resource management.

Strategic solution 3: Exclude the webhook from the backup

In this option we exclude the webhook from the backup and let the controller re-create the webhook server and its configurations. If no controller is involved, you must manually apply the webhook configurations after the restore process.
 
Pros:
  • This is an easy workaround with minimal complexity.
Cons:
  • The webhook is not included in the backup which reduces overall resiliency.
  • If no webhook controller is present, manual intervention is required to retrieve webhook resources after application recovery, which increases recovery overhead.
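
A recipe for this option might look like the following sketch, which reuses the group names from solution 1 and simply omits the cluster-resources group. It assumes that the cluster-scoped webhook configuration is only captured by a group that sets includeClusterResources, which you should confirm for your Fusion release.

```yaml
groups:
  - name: fusion-reference-volumes
    type: volume
  - name: fusion-reference-resources
    type: resource
    excludedResourceTypes:
      - events
      - event.events.k8s.io
      - replicaset
      - pods
  # No fusion-reference-cluster-resources group is defined, so the
  # ValidatingWebhookConfiguration is not part of the backup (assumption above).
workflows:
  - name: restore
    sequence:
      - group: fusion-reference-volumes
      - group: fusion-reference-resources
      - hook: mysql-deployment-check/replicasReady
```

With no webhook configuration on the target cluster during restore, the deployments are admitted without a webhook call; the configuration then has to come from the controller or be applied manually afterwards.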

Strategic solution 4: Modify the webhook failurePolicy behaviour

In this option we set the webhook resource "failurePolicy" to Ignore during backup so that the recovered webhook resource will ignore any issues during the recovery. We will then set the failurePolicy back to the original value after the recovery.
 
Recipe hooks
disable-webhook-set-failurepolicy-ignore
kubectl patch validatingwebhookconfiguration <webhook-name> --type='json' -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'
enable-webhook-set-failurepolicy
kubectl patch validatingwebhookconfiguration <webhook-name> --type='json' -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"<Original value>"}]'
Recipe
  workflows:
    - name: backup
      sequence:
        - hook: disable-webhook-set-failurepolicy-ignore
        - group: fusion-reference-cluster-resources
        - hook: enable-webhook-set-failurepolicy
        - group: fusion-reference-resources
        - hook: mysql-pod-exec/flush-tables-with-read-lock
        - group: fusion-reference-volumes
    - name: restore
      sequence:
        - group: fusion-reference-cluster-resources
        - group: fusion-reference-volumes
        - group: fusion-reference-resources
        - hook: fusion-reference-deployment-check/replicasReady
        - hook: fusion-reference-webhook-deployment-check/replicasReady
        - hook: mysql-deployment-check/replicasReady
        - hook: enable-webhook-set-failurepolicy
Pros:
  • There is no need to exclude the webhook resource from the backup or manage its placement during recovery.
  • Allows the user to back up all resources for the application, including webhook resources.
Cons:
  • Requires a container with sufficient permissions to execute the patch commands on the webhook resource.
  • Involves managing permissions and hooks for patching during backup and restore.
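
The permissions mentioned above could be granted along these lines. This is a minimal sketch using standard Kubernetes RBAC. The ClusterRole name, service account name, and namespace are hypothetical; the webhook name is taken from the sample at the end of this article; and the service account is assumed to be the one used by the pod in which the recipe hooks run kubectl.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webhook-failurepolicy-patcher     # hypothetical name
rules:
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["validatingwebhookconfigurations"]
    resourceNames: ["deployment-validator"]   # limit access to the one webhook
    verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: webhook-failurepolicy-patcher     # hypothetical name
subjects:
  - kind: ServiceAccount
    name: webhook-patcher                 # hypothetical service account
    namespace: fusion-reference
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: webhook-failurepolicy-patcher
```

A ClusterRole is used because ValidatingWebhookConfiguration is a cluster-scoped resource, and restricting it with resourceNames keeps the extra privilege narrow.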

Summary

Among the different solutions discussed for recovering applications with webhooks, the first two solutions are more elegant because they retain all application resources in the backup and do not require an additional container for executing hooks to update the “failurePolicy”. Both approaches also eliminate the need for any manual steps during recovery.
 
Of these two, the Fusion Transform CR approach (solution 2) is more straightforward, requiring only minimal management of the transformation resource. By following these approaches, you can develop a functional Fusion Recipe workflow for webhook-based application recovery. However, depending on your specific scenario, you may find one of the other solutions more suitable.
 
Feel free to share your experiences with webhook-based application recovery: the challenges you faced and the resolutions you applied. I will be happy to learn from your story!

Detailed Webhook failure information

The IBM Fusion Recipe framework internally uses Velero for resource processing. Let’s understand the webhook restore problem step by step using a reference application.
 
Step 1
Let’s assume the reference application has a webhook resource that rejects the update or scaling of a deployment (application) if "spec.replicas" is greater than 1.
Webhook resource applied:
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  labels:
    deployment-webhook: "true"
  name: deployment-validator
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: <base64-encoded-ca>
    service:
      name: deployment-webhook
      namespace: fusion-reference
      path: /validate-deployment
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: deployment-validator.yourdomain.com
  namespaceSelector:
    matchLabels:
      webhook: enabled
  rules:
  - apiGroups:
    - apps
    apiVersions:
    - v1
    operations: [ "CREATE", "UPDATE" ]
    resources:
    - deployments
    - deployments/scale
    scope: Namespaced
  sideEffects: None
  timeoutSeconds: 10
Apply the other webhook resources, such as the TLS secret, deployment, and service, to bring the webhook into effect.
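
For illustration, the service that the webhook's clientConfig points to could be defined as in the sketch below. The name, namespace, and port come from the webhook resource above; the pod label in the selector and the targetPort are assumptions about the webhook server deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: deployment-webhook          # matches clientConfig.service.name
  namespace: fusion-reference       # matches clientConfig.service.namespace
spec:
  selector:
    app: deployment-webhook         # assumed label on the webhook server pods
  ports:
    - name: https
      port: 443                     # matches clientConfig.service.port
      targetPort: 8443              # assumed container port of the webhook server
      protocol: TCP
```

When no ready pod matches this selector, the service has no endpoints, which is the condition behind the "no endpoints available for service" message shown in Step 4.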
 
Step 2
Modify the Fusion Recipe backup and restore workflows for the reference application. After modification, the new resource group and the backup and restore workflows are:
spec:
  appType: fusion-reference
  groups:
    ...
    - name: fusion-reference-cluster-resources
      includeClusterResources: true
      includedResourceTypes:
        - validatingwebhookconfigurations.admissionregistration.k8s.io
      labelSelector: deployment-webhook=true
      type: resource
  workflows:
    - name: backup
      sequence:
        - group: fusion-reference-cluster-resources
        - group: fusion-reference-resources
        - hook: mysql-pod-exec/flush-tables-with-read-lock
        - group: fusion-reference-volumes
    - name: restore
      sequence:
        - group: fusion-reference-cluster-resources
        - group: fusion-reference-volumes
        - group: fusion-reference-resources
        - hook: fusion-reference-deployment-check/replicasReady
        - hook: mysql-deployment-check/replicasReady

Step 3
Take a backup. It completes without issues.
 
Step 4
On restore, the following issue is observed.

Fusion Restore Job Page

Recipe failed
BMYBR0009
There was an error when processing the job in the Transaction Manager service. The underlying error was: 'Execution of workflow restore of recipe fusion-reference-bnr-recipe completed. Number of failed commands: 1, last failed command: "ResourceGroup/fusion-reference-resources " [\'Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/deployment-webhook: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook. fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'Namespace fusion-reference, resource restore error: error restoring deployments.apps/fusion-reference/fusion-reference: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'error restoring deployment-webhook: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\', \'error restoring fusion-reference: Internal error occurred: failed calling webhook "deployment-validator.yourdomain.com": failed to call webhook: Post "https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s": no endpoints available for service "deployment-webhook"\']'.

Snippet From Velero log:

{"level":"error","logSource":"pkg/controller/restore_controller.go:601","msg":"Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/deployment-webhook: Internal error occurred: failed calling webhook \"deployment-validator.yourdomain.com\": failed to call webhook: Post \"https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s\": no endpoints available for service \"deployment-webhook\"","restore":"ibm-backup-restore/backup-resources-8ed93258-e5d5-4245-be39-8cbebdf58f4f","time":"2025-11-11T09:07:17Z"}
{"level":"error","logSource":"pkg/controller/restore_controller.go:601","msg":"Namespace fusion-reference, resource restore error: error restoring deployments.apps/ fusion-reference/fusion-reference: Internal error occurred: failed calling webhook \"deployment-validator.yourdomain.com\": failed to call webhook: Post \"https://deployment-webhook.fusion-reference.svc:443/validate-deployment?timeout=10s\": no endpoints available for service \"deployment-webhook\"","restore":"ibm-backup-restore/backup-resources-8ed93258-e5d5-4245-be39-8cbebdf58f4f","time":"2025-11-11T09:07:17Z"}

Acknowledgements: @Jim Smith @Chris Tan
