IBM Fusion


Making Kubernetes or OpenShift Backup and Restore Reliable with Resource Discovery

By Sandeep Prajapati posted Sun January 11, 2026 01:58 PM

  

The Hidden Challenge in Kubernetes Backup & Restore

Backup and restore in Kubernetes is often discussed in terms of tools (Velero, OADP, snapshots, object storage), but the hardest part is frequently overlooked: knowing exactly what to back up and what to restore.

In real-world clusters:

  • Applications span dozens of resource types
  • Operators create resources dynamically
  • Some resources are cluster-scoped and others are namespace-scoped
  • Labels are inconsistent or missing

If resources are missed during backup, restores appear “successful” but applications still fail.

This is where resource discovery becomes critical, and where tools like the get-resources kubectl plugin add significant value.

Why Resource Inventory Matters for Backup & Restore

A Kubernetes application is not just Deployments and Services. It may include:

  • CustomResourceDefinitions (CRDs)
  • Custom resources managed by operators
  • RBAC objects
  • Webhooks
  • ConfigMaps and Secrets created at runtime

Without a complete inventory, backup tools can:

  • Skip critical objects
  • Restore incomplete application states
  • Leave behind orphaned resources

The get-resources plugin helps solve this by building a precise inventory of application-related resources, regardless of how they were created.

Backup Use Case #1: Defining the True Backup Scope

Before you run a backup, you need answers to:

  • What resources belong to this application?
  • Are there cluster-scoped dependencies?
  • Were resources created dynamically after installation?

Using get-resources, you can:
  1. Enumerate all namespace-scoped and cluster-scoped resources
    To retrieve all resources, run:
    oc get-resources > all_resources.csv
    To count the cluster-scoped resources (rows whose fourth column, the namespace, is empty, with the header row skipped):
    awk -F',' 'NR>1 && $4==""' all_resources.csv | wc -l
    To count the namespace-scoped resources:
    awk -F',' 'NR>1 && $4!=""' all_resources.csv | wc -l
  2. Filter by creation timestamp or namespace
    Let’s assume you deployed an application in the “demo” namespace that includes both cluster-scoped and namespace-scoped resources. Because those resources were created after the namespace itself, you can retrieve all application resources by filtering on the namespace's creation timestamp:
    oc get-resources --namespace demo --after=<namespace creationTimestamp>
    For example, reading the timestamp directly from the namespace object:
    oc get-resources --namespace demo --after="$(oc get namespace demo -o jsonpath='{.metadata.creationTimestamp}')"
    To list all resources of a namespace:
    oc get-resources --namespace demo
    To query the resources of multiple namespaces while excluding cluster-scoped resources:
    oc get-resources --namespace demo --namespace=default --exclude-cluster-resources=true
  3. Export the results in machine-readable formats
    oc get-resources --namespace demo
    OR
    oc get-resources --namespace demo --start=<start_timestamp> --end=<end_timestamp>
    OR
    oc get-resources --namespace demo --after=<after timestamp> --resource-data=true
    OR
    oc get-resources --namespace demo --before=<before timestamp> --output=demo_resources
    OR
    oc get-resources --namespace demo --namespace=default --after=<namespace creationTimestamp> --output=demo_resources
  4. Check help for details
    oc get-resources --help

This allows backup workflows to be:

  • Data-driven instead of assumption-driven
  • More predictable
  • Easier to automate

Instead of guessing what Velero should include, you know exactly what exists.
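
As a rough illustration of a data-driven scope, the sketch below reads an exported inventory and prints (without running) a Velero command that names every namespace and resource kind found in it. Several things here are assumptions rather than documented behavior: the CSV is treated as simple comma-separated rows with a header line and no quoted commas, the resource kind is assumed to be in column 1, and only the namespace in column 4 follows the awk filters shown earlier. The function names are hypothetical. Velero normally expects resource names such as deployments, so kinds from the inventory may need mapping before the printed command is usable.

```shell
# Sketch only: derive a candidate Velero scope from a get-resources CSV.
# KIND_COL=1 is an assumption; NS_COL=4 follows the awk filters in the article.
# Check the header row of your own export before relying on either.
KIND_COL=${KIND_COL:-1}
NS_COL=${NS_COL:-4}

# Print the distinct non-empty values of one CSV column (header skipped),
# joined with commas.
distinct_column() {
  local file=$1 col=$2
  awk -F',' -v c="$col" 'NR > 1 && $c != "" { print $c }' "$file" \
    | sort -u | paste -sd, -
}

# Print, but do not execute, a velero command covering what the inventory lists.
velero_scope_command() {
  local file=$1 backup_name=$2
  printf 'velero backup create %s --include-namespaces %s --include-resources %s --include-cluster-resources=true\n' \
    "$backup_name" \
    "$(distinct_column "$file" "$NS_COL")" \
    "$(distinct_column "$file" "$KIND_COL")"
}

# Usage (hypothetical file and backup names):
#   velero_scope_command all_resources.csv demo-backup
```

Printing the command instead of executing it keeps a human review step between discovery and the actual backup.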

Backup Use Case #2: Supporting Label-Less and Legacy Applications

Many backup strategies rely on labels:

app=myapp

But in practice:

  • Older applications may not use labels
  • Operators may create unlabelled resources
  • Third-party components may not follow conventions

The get-resources plugin does not rely solely on labels. It identifies resources based on actual cluster state, making it ideal for:

  • Legacy workloads
  • Vendor operators
  • Complex platforms like OpenShift add-ons

Once we know all the resources of an application, we can label them as needed so that label-based backup selectors capture every one of them. This reduces the risk of silent data loss during backup.
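
One way to apply labels after discovery is to generate the oc label commands from the inventory and review them before running anything. The sketch below assumes a simple CSV (header line, no quoted commas) with the kind in column 1 and the name in column 2; those two positions are assumptions, while the namespace in column 4 follows the awk filters used earlier. The function name label_commands is hypothetical.

```shell
# Sketch only: print `oc label` commands for every row of an inventory CSV.
# KIND_COL and NAME_COL are assumed positions; NS_COL=4 follows the article.
KIND_COL=${KIND_COL:-1}
NAME_COL=${NAME_COL:-2}
NS_COL=${NS_COL:-4}

label_commands() {
  local file=$1 label=$2
  awk -F',' -v k="$KIND_COL" -v n="$NAME_COL" -v ns="$NS_COL" -v l="$label" '
    NR == 1 { next }                                   # skip the header row
    $ns != "" {                                        # namespace-scoped row
      printf "oc label %s %s -n %s %s --overwrite\n", $k, $n, $ns, l
      next
    }
    { printf "oc label %s %s %s --overwrite\n", $k, $n, l }   # cluster-scoped row
  ' "$file"
}

# Usage (hypothetical file name): review the output, then pipe it to sh.
#   label_commands demo_resources.csv app=myapp
```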

Restore Use Case #1: Restore Validation and Confidence

A restore is not complete just because the command finished successfully.

After restore, users need to know:

  • Was every resource recreated?
  • Are annotations and owner references intact?
  • Are cluster-scoped resources present?

By capturing a pre-backup inventory and a post-restore inventory, get-resources enables:

  • Side-by-side comparisons
  • Automated diff checks
  • Restore validation in CI pipelines

This turns restore testing from a manual process into a repeatable verification step.
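
A minimal sketch of such a verification step follows. It reduces both inventories to kind, name, and namespace keys (so differing timestamps do not cause false alarms), prints anything that existed before the backup but is absent after the restore, and returns a non-zero status that a CI job can fail on. The column positions for kind and name, the simple CSV layout, and the function names are assumptions for illustration.

```shell
# Sketch only: compare a pre-backup inventory with a post-restore inventory.
# KIND_COL and NAME_COL are assumed positions; NS_COL=4 follows the article.
KIND_COL=${KIND_COL:-1}
NAME_COL=${NAME_COL:-2}
NS_COL=${NS_COL:-4}

# Reduce an inventory CSV to sorted, unique "kind,name,namespace" keys.
inventory_keys() {
  awk -F',' -v k="$KIND_COL" -v n="$NAME_COL" -v ns="$NS_COL" \
    'NR > 1 { print $k "," $n "," $ns }' "$1" | sort -u
}

# Print resources missing after the restore; return 1 if there are any.
verify_restore() {
  local before_keys after_keys missing
  before_keys=$(mktemp)
  after_keys=$(mktemp)
  inventory_keys "$1" > "$before_keys"
  inventory_keys "$2" > "$after_keys"
  # comm -23 keeps lines that appear only in the first (pre-backup) file.
  missing=$(comm -23 "$before_keys" "$after_keys")
  rm -f "$before_keys" "$after_keys"
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}

# Usage (hypothetical file names):
#   verify_restore pre_backup.csv post_restore.csv || echo "restore incomplete"
```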

Restore Use Case #2: Disaster Recovery Drills

Disaster recovery exercises often fail due to:

  • Missing CRDs
  • Restored workloads without permissions
  • Incomplete operator recovery

With get-resources, users can:

  • Capture a baseline resource inventory
  • Simulate failure (cluster or namespace deletion)
  • Restore from backup
  • Compare restored resources against baseline

This highlights exactly what the DR plan missed, long before a real outage occurs.
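
For a drill report, a per-kind summary is often easier to read than a full row-by-row diff, since a missing CRD or RBAC kind stands out immediately. The sketch below counts resources per kind in the baseline and in the restored inventory and prints only the kinds that came back short. It assumes two non-empty, simple CSV files with a header line and the kind in column 1; that position and the function name kind_count_gaps are assumptions.

```shell
# Sketch only: list kinds with fewer resources after the drill than before.
# KIND_COL=1 is an assumed position; check your export's header row.
KIND_COL=${KIND_COL:-1}

kind_count_gaps() {
  awk -F',' -v k="$KIND_COL" '
    FNR == 1 { file++; next }        # new file starts: count it, skip its header
    file == 1 { base[$k]++ }         # first file is the baseline inventory
    file == 2 { restored[$k]++ }     # second file is the post-restore inventory
    END {
      for (kind in base)
        if (restored[kind] < base[kind])
          printf "%s baseline=%d restored=%d\n", kind, base[kind], restored[kind]
    }
  ' "$1" "$2" | sort
}

# Usage (hypothetical file names):
#   kind_count_gaps baseline.csv restored.csv
```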

Restore Use Case #3: Clean Rebuilds and Environment Migration

When restoring applications into:

  • New clusters
  • Different environments (dev --> prod)
  • Fresh OpenShift installations

it is critical to ensure that:

  • No stale resources remain
  • Only required objects are restored

Resource inventories help users:

  • Identify environment-specific objects
  • Exclude or transform non-portable resources
  • Confirm that restores align with target cluster policies

For instance, resources that require domain transformations during recovery can be identified and handled as follows:

  • Get the base domain of the cluster
    oc get dns cluster -o jsonpath='{.spec.baseDomain}'
  • Filter resources from the inventory that reference the domain in their specification
  • Apply any required domain transformations before performing the restore
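
The filter and transform steps above can be sketched with grep and sed. This is only an illustration under stated assumptions: the resource data has been exported as manifest files into a directory, the domains are plain DNS names (letters, digits, dots, hyphens), and the function names are hypothetical.

```shell
# Sketch only: find and rewrite references to the source cluster's base domain.

# List exported manifest files under a directory that mention the old domain.
# -F treats the domain as a fixed string, so its dots are not wildcards.
files_referencing_domain() {
  local dir=$1 old_domain=$2
  grep -rlF -- "$old_domain" "$dir" | sort
}

# Read a manifest on stdin and write it to stdout with the domain replaced.
rewrite_domain() {
  local old_domain=$1 new_domain=$2 old_escaped
  # Escape dots so sed matches them literally rather than as "any character".
  old_escaped=$(printf '%s' "$old_domain" | sed 's/\./\\./g')
  sed "s/${old_escaped}/${new_domain}/g"
}

# Usage (hypothetical paths and domains):
#   old=$(oc get dns cluster -o jsonpath='{.spec.baseDomain}')
#   files_referencing_domain ./demo_resources "$old"
#   rewrite_domain "$old" target.example.com < route.yaml > route.transformed.yaml
```

Writing the transformed manifest to a new file, rather than editing in place, keeps the original export available for comparison.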

Backup Validation Use Case: Detecting Orphaned Resources

Uninstalls and failed restores often leave behind:

  • Unused RBAC objects
  • Webhooks
  • Finalizers that block deletion

Running get-resources before and after backup/restore operations helps:

  • Detect resource leaks
  • Validate cleanup logic
  • Improve uninstall and rollback procedures

This improves cluster hygiene and long-term stability.
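
A leak check can be sketched as a single awk pass over two inventories: remember every resource seen before the operation, then report anything new afterwards whose kind is in a watch list. The default watch list below (RBAC and webhook kinds) is a choice made for this example, and the kind and name column positions, the simple CSV layout, and the function name are assumptions. Both files are expected to be non-empty and to start with a header line.

```shell
# Sketch only: report resources that appear only after an operation.
# KIND_COL and NAME_COL are assumed positions; NS_COL=4 follows the article.
KIND_COL=${KIND_COL:-1}
NAME_COL=${NAME_COL:-2}
NS_COL=${NS_COL:-4}
DEFAULT_KINDS='ClusterRole|ClusterRoleBinding|Role|RoleBinding|MutatingWebhookConfiguration|ValidatingWebhookConfiguration'

# Arguments: before.csv after.csv [kind-regex]
leftover_resources() {
  local kinds=${3:-$DEFAULT_KINDS}
  awk -F',' -v k="$KIND_COL" -v n="$NAME_COL" -v ns="$NS_COL" \
      -v kinds="^(${kinds})\$" '
    FNR == 1 { file++; next }                 # count files, skip each header
    { key = $k "," $n "," $ns }               # identity: kind,name,namespace
    file == 1 { seen[key] = 1; next }         # remember the "before" state
    !(key in seen) && $k ~ kinds { print key }  # new and on the watch list
  ' "$1" "$2" | sort
}

# Usage (hypothetical file names); pass '.*' to report every kind:
#   leftover_resources before_uninstall.csv after_uninstall.csv
```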

Using Resource Inventories in Automation

Exported outputs (CSV/YAML/JSON) can be:

  • Fed into backup scripts
  • Used in GitOps workflows
  • Analyzed offline for compliance or audit purposes

For platform users, this becomes a foundation layer for:

  • Backup policy enforcement
  • Restore validation pipelines
  • Multi-cluster disaster recovery strategies

Conclusion

Kubernetes backup and restore failures are rarely the result of the backup tool itself; they usually stem from incomplete visibility into what actually exists in the cluster. By introducing a reliable resource discovery step using tools like the get-resources plugin, users gain a clear and comprehensive understanding of application dependencies and cluster state. This enables them to back up with confidence, verify restores accurately, conduct realistic disaster recovery testing, and eliminate blind spots in complex Kubernetes environments. In modern Kubernetes platforms, successful backup and restore is not just about protecting data - it is fundamentally about maintaining visibility and control.

Acknowledgements: @Jim Smith @Chris Tan
