Changes to Backup & Restore in the Management Subsystem

By David O'Shea

Starting with the 10.0.7.0 release, the Management subsystem of API Connect moved to a new back-end PostgreSQL database vendor, and one of the end-user-visible impacts of this is a change to the backup and restore experience. The experience was further improved in 10.0.8 by adding back support for features that are not supported natively by the new database operator.

Backups

Prior to 10.0.7.0, backups were wrapped with a managementbackup CR, which was used to initiate a management backup and expose some backup status to the end user, since the CrunchyData backups are controlled via an operator CLI rather than by a CR. These backups natively supported S3 and local backups via a pgBackRest pod, with sftp backup support added by a wrapper provided by API Connect that uploaded a tarfile of the local backup to your choice of sftp server.

With the switch to EDB Postgres in 10.0.7.0, backups are natively supported by CR, so the managementbackup CR wrapper has been removed and the native EDB Backup CR is used directly. EDB's backups natively support S3, but not local or sftp. Initially, 10.0.7.0 shipped with just the S3 backup support, but in 10.0.8.0 API Connect added back support for both sftp and local backups via a new pod called s3proxy.

The s3proxy pod implements support for both sftp and local backups by exposing an AWS S3-compatible interface to EDB on one side and either persisting backup files to a PVC mounted in the pod or streaming the files to/from an upstream sftp server. The EDB cluster is automatically configured to back up to the s3proxy pod with the appropriate endpoint and credentials when sftp or local is configured in the management CR database backup section.
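For illustration only, an sftp configuration in the management CR database backup section might look something like the sketch below. Treat the field names as assumptions based on the parameters shown in the restore example later in this post and on earlier releases; check the documentation for your release for the exact schema.

databaseBackup:
  protocol: sftp                   # assumed selector; S3 and local are the other options
  host: sftp.example.com           # hypothetical upstream sftp server
  path: /backups/apic/management   # hypothetical directory on the sftp server
  credentials: sftp-backup-secret  # hypothetical secret holding the sftp username/password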

s3proxy makes a number of improvements over the previous pgBackRest implementation, namely:

  • No longer a single point of failure: the pod can scale to 3 replicas in an n3 deployment
  • Backups are individually streamed to the sftp server in real-time rather than being tarred up after the backup and pushed as a single large file as a separate operation
  • Layout of local and sftp backups matches what would be uploaded to an S3 bucket rather than being a different format
    • Note that on the sftp server, it is normal for the data.tar.gz file to appear to be 0 bytes in size. This is because that portion of the backup is uploaded as a multi-part file, so the actual data is stored in hidden files such as .data.tar.gz.part1, .data.tar.gz.part2, and so on.
  • Real-time WAL file streaming to sftp is now supported, avoiding WAL files building up in the DB pod and improving database replica resiliency
  • sftp backups and associated WAL files are now subject to the backup retention period and no longer need to be manually maintained (starting with 10.0.8.1)
  • Full support for passwords with special characters (including spaces)
  • Support for additional sftp servers which previously caused issues (e.g. Azure sftp)

For reference, the backup layout looks like this:
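(The listing below is an illustrative reconstruction, using the sitename, backup datestamp, and WAL file names that appear elsewhere in this post.)

management-site1-db-2024-09-24T155451Z/
  base/
    20241008T084635/
      backup.info
      data.tar.gz
  wals/
    0000000100000018/
      0000000100000018000000EC.gz
      ...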

In the above, the top-level directory is the database sitename - this is generated when the database cluster is created (at initial install or during a restore). The backups themselves are under a base directory, and each backup is contained in a directory named with a datestamp of when the backup was taken. Finally, each backup directory contains a backup.info file with metadata about the backup and a data.tar.gz file containing the main contents of the backup.

As a peer of the base directory there is also a wals directory, which contains a sharded hierarchy of streamed WAL files. These are used to resync database replica pods, and during a restore they are used to recover data that was changing while the backup was in progress, so that the restored database is consistent. Some example files are shown below to show what the sharding looks like:

management-site1-db-2024-09-24T155451Z/wals/0000000100000018/0000000100000018000000EC.gz
management-site1-db-2024-09-24T155451Z/wals/0000000100000018/0000000100000018000000ED.gz
management-site1-db-2024-09-24T155451Z/wals/0000000100000018/0000000100000018000000EE.00000028.backup.gz
management-site1-db-2024-09-24T155451Z/wals/0000000100000018/0000000100000018000000EE.gz

...

management-site1-db-2024-09-24T155451Z/wals/0000000200000018/0000000200000018000000FE.gz
management-site1-db-2024-09-24T155451Z/wals/0000000200000018/0000000200000018000000FF.gz
management-site1-db-2024-09-24T155451Z/wals/0000000200000019/000000020000001900000000.gz
management-site1-db-2024-09-24T155451Z/wals/0000000200000019/000000020000001900000001.gz

Scheduled Backups

One final point to note when configuring scheduled backups is that the configuration has changed in a number of ways:

  • The retention period has changed from a number of backups to a number of days. When a backup completes successfully, older backups that fall outside the retention period are removed
  • There is an extra field at the start of the schedule, as you can now configure seconds (see the example below)
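For example, a schedule of 0 30 1 * * * now means 01:30:00 every day (seconds, minutes, hours, day of month, month, day of week). A sketch of how this might appear in the management CR is shown below; the databaseBackup, schedule, and retention field names are assumptions, but the six-field schedule and days-based retention follow the points above.

databaseBackup:
  schedule: "0 30 1 * * *"   # seconds come first: run at 01:30:00 every day
  retention: 30              # assumed field: keep 30 days of backups rather than a count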

While using scheduled backups is recommended, it may still be desirable to perform an on-demand backup from time to time. This is as straightforward as creating a backup CR with just a few fields:

apiVersion: postgresql.k8s.enterprisedb.io/v1
kind: Backup
metadata:
  name: manual-backup-202409151400
spec:
  cluster:
    name: apic-mgmt-6b022fcc-db

The cluster name must match the deployed cluster - this can be found by running kubectl get cluster in the namespace of your deployment. If the cluster name does not match, the backup will stay pending.
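For example (the namespace and file name here are placeholders), you can look up the cluster name, create the Backup CR shown above, and then watch its status until it completes. The fully-qualified resource names are used to avoid clashes with other operators that also define cluster or backup CRs:

kubectl get clusters.postgresql.k8s.enterprisedb.io -n <management-namespace>
kubectl apply -f manual-backup.yaml -n <management-namespace>
kubectl get backups.postgresql.k8s.enterprisedb.io manual-backup-202409151400 -n <management-namespace> -w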

Restores

Restores are still performed using a managementrestore CR. There are two main ways to restore.

If you have a backup CR available for the backup you wish to restore, you can simply reference that in your restore CR:

apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementRestore
metadata:
  name: mgmt-restore-example
spec:
  backupName: manual-backup-202409151400
  subsystemName: apic-mgmt

If a backup CR is not available (for example, if you are performing disaster recovery, or you wish to restore from a backup in a different location), you can directly specify where the backup is located. Unlike in 10.0.5, this does not need to match the location configured in the management CR. This can be done as follows:

apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementRestore
metadata:
  name: mgmt-dr-restore-example
spec:
  backup:
    credentials: aws-backup-secret
    host: s3.us-west-1.amazonaws.com/us-west-1
    path: mybucket/edb-backups/management-site1-db-2024-09-24T155451Z
  subsystemName: apic-mgmt
  backupId: 20241008T084635

The backup section can contain the same parameters as the management CR backup configuration, so it can, for example, reference an sftp location instead.
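As a sketch of what that might look like (the protocol field and the host, path, and credentials values are assumptions; the actual parameters are whatever your release's management CR backup configuration accepts), an sftp-based restore could resemble:

apiVersion: management.apiconnect.ibm.com/v1beta1
kind: ManagementRestore
metadata:
  name: mgmt-sftp-restore-example
spec:
  backup:
    protocol: sftp                   # assumed field selecting an sftp backup location
    host: sftp.example.com           # hypothetical sftp server
    path: /backups/apic/management/management-site1-db-2024-09-24T155451Z
    credentials: sftp-backup-secret  # hypothetical secret with the sftp credentials
  subsystemName: apic-mgmt
  backupId: 20241008T084635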

Final points

As part of the upgrade to 10.0.8 from 10.0.5, backup settings are automatically migrated to the new style, with the following points to note:

  • Old backups from 10.0.5 cannot be restored in 10.0.8 due to the change in format and the change in PostgreSQL version (v12 to v15)
  • Backup paths will be updated to append /edb to the end, to avoid intermingling new and old backups and potentially causing confusion (see the example below)
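For example, if a hypothetical 10.0.5 configuration used a path of mybucket/apic-backups, the migrated settings would point at mybucket/apic-backups/edb, so the EDB-format backups sit alongside, rather than among, the old ones.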