
Disaster Recovery/High Availability of PowerVC using Backup/Restore

By Ankit Arora posted Mon January 13, 2020 11:14 AM

While most PowerVC deployments use a single instance to manage the infrastructure, some environments need a mechanism to ensure that PowerVC can be recovered in case of a disaster. Although PowerVC does not currently provide full-fledged high availability (for example, active-active), this blog describes a strategy that can be used to enable recovery. You might want to review the PowerVC Knowledge Center topic "Backing up IBM Power® Virtualization Center data" before you proceed.

Taking a PowerVC backup the default way

PowerVC provides the “powervc-backup” and “powervc-restore” commands for backup and restore, respectively. The “powervc-backup” command can be used to take periodic backups, which can later be used to restore another instance of PowerVC (using “powervc-restore”). The system administrator has to maintain two instances of PowerVC, both running exactly the same version and with the same connectivity to resources.

In this blog, PowerVC1 refers to the primary/active instance and PowerVC2 refers to the secondary/passive/standby instance. Once the active PowerVC (PowerVC1) is fully functional and managing the infrastructure, periodic backups can be taken from it.

[caption id="attachment_3483" align="aligncenter"] Help for PowerVC Backup[/caption]

The “powervc-backup” command takes a backup of all the configuration and data of a PowerVC instance and creates an archive file from it. The archive file captures a snapshot of PowerVC at that particular moment in time. This archive file must be provided as input to the “powervc-restore” command when restoring on PowerVC2.
The “powervc-backup” command can be run without any arguments, as shown below, in which case it uses the default values:
[caption id="attachment_3906" align="alignnone" width="1624"] PowerVC Backup operation[/caption]

Please note that, by default, this command stops all PowerVC services before taking the backup. This is because PowerVC uses multiple databases, and a coherent snapshot of them can be captured only when no write operations are happening against them. The "powervc-backup" command does not change or disrupt the PowerVC configuration in any way and is safe to use. By default, all backups are created under the local directory /var/opt/ibm/powervc/backups/<timestamp>. An option to specify a directory of your choice is also available, as shown below:
[caption id="attachment_3907" align="alignnone" width="1786"] PowerVC Backup operation with specified target directory[/caption]

PowerVC does not provide an option to run this command periodically or at set intervals. However, system administrators can easily create a cron job to do this. The backups can also be stored at an external location (for example, a network file system) so that they are not lost if the system that hosts the primary PowerVC goes down.
When PowerVC services are stopped during "powervc-backup" or "powervc-restore", PowerVC goes into maintenance mode, as shown in the image below:

[caption id="attachment_3485" align="aligncenter"] Maintenance mode[/caption]
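The cron job mentioned above could call a small wrapper script. The Python sketch below is one way to do it, under several assumptions: each run of "powervc-backup" creates one subdirectory under the default /var/opt/ibm/powervc/backups location, the command exits with a non-zero status when it fails, and an NFS share is mounted at the hypothetical path /mnt/powervc-nfs/backups. The script name, the mount point, the retention count and the `--run` latch are all our own illustrative choices, not PowerVC features. Keep in mind that each scheduled run without --active puts PowerVC into maintenance mode.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List

# Hypothetical locations; adjust for your environment.
LOCAL_BACKUPS = Path("/var/opt/ibm/powervc/backups")  # default location described above
EXTERNAL_BACKUPS = Path("/mnt/powervc-nfs/backups")   # assumed NFS mount point
KEEP = 7  # how many copies to retain on the external share (hypothetical)

# Plain invocation as shown in this blog. If your release prompts for
# confirmation, add the non-interactive option listed by `powervc-backup -h`.
BACKUP_CMD = ["powervc-backup"]


def newest_subdir(root: Path) -> Path:
    """Return the most recently modified subdirectory (assumed: the latest backup)."""
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        raise FileNotFoundError(f"no backup directories under {root}")
    # Modification time avoids depending on the exact timestamp naming format.
    return max(subdirs, key=lambda p: p.stat().st_mtime)


def copy_to_external(src: Path, dest_root: Path) -> Path:
    """Copy one backup directory to the external share, keeping its name."""
    dest_root.mkdir(parents=True, exist_ok=True)
    dest = dest_root / src.name
    shutil.copytree(src, dest, dirs_exist_ok=True)  # requires Python 3.8+
    return dest


def prune(dest_root: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` newest backup directories; return the removed ones."""
    subdirs = sorted((p for p in dest_root.iterdir() if p.is_dir()),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    removed = subdirs[keep:]
    for p in removed:
        shutil.rmtree(p)
    return removed


def main() -> int:
    # Without --active, this stops PowerVC services (maintenance mode).
    status = subprocess.run(BACKUP_CMD).returncode
    if status != 0:
        return status  # assumed: no archive is written when the backup fails
    copy_to_external(newest_subdir(LOCAL_BACKUPS), EXTERNAL_BACKUPS)
    prune(EXTERNAL_BACKUPS, KEEP)
    return 0


# Safety latch (our own convention, not a PowerVC option): the job runs only
# when called with --run, e.g. from a crontab entry such as
#   0 2 * * * /usr/bin/python3 /root/powervc_backup_job.py --run
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    sys.exit(main())
```

The latch exists because a default backup stops all services; requiring an explicit argument makes it harder to trigger a maintenance-mode window by accident while editing or testing the script.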

Running "powervc-backup" without stopping services

While the default behavior is to stop services before taking a backup, the “powervc-backup” command provides an argument named --active, which attempts to take a backup without stopping the services.
Up to PowerVC 1.4.3, the --active option had certain limitations: it could be used only when no operations were in progress against PowerVC. These shortcomings have been addressed in PowerVC 1.4.4. To take active backups smoothly, it is highly recommended that you upgrade PowerVC to 1.4.4. Below is an example of a successful PowerVC backup using the --active option in 1.4.4.

[caption id="attachment_3908" align="alignnone" width="1621"] Backup in Active mode[/caption]
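If the same backup script is used across PowerVC instances at different releases, it could decide whether to request an active backup from the release number. The Python sketch below assumes the caller already knows the installed release as a dotted string (how you obtain it is left open here); `parse_version` and `backup_command` are hypothetical helpers, and the 1.4.4 threshold simply follows the recommendation above.

```python
from typing import List, Tuple

# Release in which the --active limitations were addressed, per this blog.
ACTIVE_BACKUP_MIN = (1, 4, 4)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted release string such as '1.4.4' or '1.4.4.1' into a tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def backup_command(powervc_version: str) -> List[str]:
    """Build the backup command, adding --active only on 1.4.4 or later."""
    cmd = ["powervc-backup"]
    # Tuple comparison: (1, 4, 4, 1) >= (1, 4, 4) and (1, 4) < (1, 4, 4).
    if parse_version(powervc_version) >= ACTIVE_BACKUP_MIN:
        cmd.append("--active")  # services keep running during the backup
    return cmd
```

This is deliberately conservative: on 1.4.3, --active can still succeed when no operations are in progress, but the sketch falls back to the default (service-stopping) backup there.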

If you are using 1.4.3 and one or more operations are in progress, you might run into the error below, and the command might have to be run multiple times before a successful backup is created. (Note that no backup archive file is created if the backup operation is unsuccessful.)

[caption id="attachment_3486" align="aligncenter"] Active PowerVC Backup failed[/caption]
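On 1.4.3, rerunning the command until it succeeds can be automated. The Python sketch below assumes that "powervc-backup --active" returns a non-zero exit status when it fails (verify this on your system); the attempt limit and the wait between attempts are hypothetical values meant to give in-flight operations time to finish. The runner and sleep functions are parameters so the retry logic can be exercised without PowerVC installed.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical defaults; tune for your environment.
MAX_ATTEMPTS = 5
WAIT_SECONDS = 300  # give in-flight operations time to finish


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def active_backup_with_retry(
    runner: Callable[[Sequence[str]], int] = run_command,
    max_attempts: int = MAX_ATTEMPTS,
    wait_seconds: float = WAIT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry `powervc-backup --active` until it exits 0; return the attempts used.

    Raises RuntimeError if every attempt fails. Because no archive is
    written on failure, a failed attempt leaves nothing to clean up.
    """
    cmd = ["powervc-backup", "--active"]
    for attempt in range(1, max_attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            sleep(wait_seconds)  # no pause after the final failed attempt
    raise RuntimeError(f"active backup failed after {max_attempts} attempts")
```

Capping the number of attempts keeps a busy system from retrying forever; if every attempt fails, a scheduled default (non-active) backup during a quiet period is the fallback.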

Restoring a backup on PowerVC

The backup archive created by the above operation can be used to run a restore on PowerVC2 (the passive/standby PowerVC). The important thing to note is that when “powervc-restore” is run on PowerVC2, all resources, such as VMs and compute nodes, that were originally managed by PowerVC1 are automatically moved to PowerVC2, and all references to them are removed from PowerVC1. Here, compute nodes refer to NovaLink hosts, and VMs refer to VMs deployed on NovaLink hosts. HMCs, HMC-managed hosts, and HMC-managed VMs remain visible on PowerVC1.

[caption id="attachment_3488" align="aligncenter"] Help for PowerVC Restore command[/caption]

Running powervc-restore on the same PowerVC instance

These backups can also be used when PowerVC1 goes into a corrupt state (for example, someone accidentally damages the configuration files beyond recovery). In such a case, the "powervc-restore" command can be run on the same system where PowerVC1 is installed. The system administrator can restore PowerVC to a previous state by providing the appropriate backup archive file as input.

[caption id="attachment_3489" align="aligncenter"] PowerVC Restore using target directory[/caption]
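Choosing the appropriate archive usually means picking the newest backup taken before the configuration was damaged. The Python sketch below assumes one directory per backup and that each directory's modification time reflects when that backup was written (it has not been touched since); `pick_restore_candidate` is a hypothetical helper, and the directory it returns is what you would then pass to "powervc-restore" as shown in the help output.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def pick_restore_candidate(backup_dirs: Iterable[Path],
                           corrupted_at: datetime) -> Optional[Path]:
    """Return the newest backup directory modified before `corrupted_at`.

    Returns None when every backup is newer than the corruption time,
    i.e. no known-good backup is available.
    """
    cutoff = corrupted_at.timestamp()
    earlier = [p for p in backup_dirs if p.stat().st_mtime < cutoff]
    return max(earlier, key=lambda p: p.stat().st_mtime, default=None)
```

Pass a timezone-aware `corrupted_at` (for example, built with `datetime.now(timezone.utc)` arithmetic) to avoid ambiguity around local-time conversions.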

Running "powervc-restore" command on another PowerVC instance

To run the "powervc-restore" command on another PowerVC instance (for example, on PowerVC2, the passive/standby PowerVC), the backup taken from PowerVC1 has to be made accessible to PowerVC2. The restore operation stops all services running on PowerVC2, restores the backup (which consists of configuration and database files), and then starts the services. This process seamlessly unmanages and removes all compute nodes and virtual machines from PowerVC1 (if PowerVC1 is still active) and adds them to PowerVC2, so that PowerVC2 becomes the active, primary management node for all the resources previously managed by PowerVC1.
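Because the backup has to travel from PowerVC1 to PowerVC2 (often over a network share), it can be worth confirming that the copy is intact before restoring. The Python sketch below adds a SHA-256 manifest of our own design: `write_manifest` would run on PowerVC1 after the backup completes, and `verify_manifest` on PowerVC2 before "powervc-restore". The manifest file name and both helpers are hypothetical; PowerVC does not create or check such a file itself.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict

MANIFEST = "SHA256SUMS.json"  # hypothetical file written alongside the backup


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def file_digests(backup_dir: Path) -> Dict[str, str]:
    """Map each file (relative path) in the backup to its SHA-256 digest."""
    return {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST
    }


def write_manifest(backup_dir: Path) -> None:
    """Run on PowerVC1 after the backup completes, before copying it."""
    (backup_dir / MANIFEST).write_text(json.dumps(file_digests(backup_dir), indent=2))


def verify_manifest(backup_dir: Path) -> bool:
    """Run on PowerVC2 before powervc-restore; True if the copy is intact."""
    expected = json.loads((backup_dir / MANIFEST).read_text())
    return file_digests(backup_dir) == expected
```

Comparing whole dictionaries catches missing, extra, and modified files in one check.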

The message below is displayed on PowerVC1 for the managed remote nodes when the backup is restored on PowerVC2.

[caption id="attachment_3490" align="aligncenter"] Message for PowerVC Restore happening on another PowerVC[/caption]

In some cases, PowerVC1 may have crashed completely. The restore on PowerVC2 still succeeds, but the restore process cannot clean up references to the managed resources in PowerVC1 because PowerVC1 is inaccessible. When PowerVC1 eventually comes back up, the PowerVC administrator has to log in to the PowerVC1 GUI and clean up compute resources, such as hosts and VMs, that are in an error state.
In the image below, you can see that the NovaLink host is displayed in an unknown state in PowerVC1 after the host system comes back up:

[caption id="attachment_3491" align="aligncenter"] Remote node is seen in Unknown state[/caption]

When you manually remove the NovaLink host from PowerVC1 after it has been restored on PowerVC2, you are given an option to remove the PowerVC software from the NovaLink host (as shown in the picture below). Do not select this checkbox; otherwise, the NovaLink host might become corrupted.

[caption id="attachment_3492" align="aligncenter"]Manual removal of Remote node from PowerVC[/caption]


Depending on your PowerVC deployment, consider whether PowerVC backup/restore is a suitable option for managing the availability of your PowerVC management controller.

If you have any questions about this topic, please comment below. Watch this space for more information about troubleshooting your environment. In the meantime, don't forget to follow us on LinkedIn, Facebook, and Twitter.

Ankit Arora (aarora06@in.ibm.com)
Divya K Konoor (dikonoor@in.ibm.com)
Kumar Biplab Singh (kumarsi1@in.ibm.com)