Automated Remote Restart in PowerVC - Error scenario and recovery tips
Automated remote restart is a feature widely used by Power Systems clients. PowerVC automates it further through its embedded scheduler: PowerVC monitors hosts for failure and, if a host fails, automatically remote restarts the virtual machines from the failed host on another viable host within the host group.
What should you do if Automated Remote Restart (ARR) fails due to space issues on the destination host?
In this blog, we will learn how to recover from a scenario where Remote Restart of VMs fails because resources are unavailable in the host group.
Prerequisite for ARR
Automated remote restart can be enabled or disabled for each host group, host, and virtual machine.
Note: By default, automated remote restart is disabled on host groups and enabled on hosts and virtual machines. However, automated remote restart does not occur unless the automated remote restart option is enabled on the host group.
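The interaction of the three switches in the note above can be sketched in a few lines of Python. This is only an illustrative model, not PowerVC code: the `ArrSettings` class and `arr_will_trigger` function are hypothetical names, and the sketch assumes that ARR acts on a VM only when the host group, the host, and the VM all have it enabled.

```python
from dataclasses import dataclass


@dataclass
class ArrSettings:
    """Automated remote restart switches at each level (hypothetical model)."""
    host_group_enabled: bool = False  # default from the note: disabled
    host_enabled: bool = True         # default from the note: enabled
    vm_enabled: bool = True           # default from the note: enabled


def arr_will_trigger(settings: ArrSettings) -> bool:
    """Assumption: ARR applies only when all three levels allow it."""
    return (settings.host_group_enabled
            and settings.host_enabled
            and settings.vm_enabled)


if __name__ == "__main__":
    # With the defaults, the host group switch blocks ARR.
    print(arr_will_trigger(ArrSettings()))
    # Enabling the host group is enough when host and VM keep their defaults.
    print(arr_will_trigger(ArrSettings(host_group_enabled=True)))
```

With the default settings the function returns False, which matches the note: nothing happens until the host group option is switched on.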
If the source host suddenly goes down and the destination host does not have enough space to accommodate all VMs from the source host, then ARR fails.
In the following example, we use PowerVC 2.0.2 on NovaLink. First, we will see how ARR fails. Then, we will go through the troubleshooting steps to rebuild all the source host VMs.
Initially, we added two NovaLink hosts, P8_56 and P8_57. P8_56 has only 6.5 free procs, so it cannot hold all the VMs of P8_57, which require 12.5 procs. We will bring P8_57 down and let ARR fail.
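The capacity gap in this setup is simple arithmetic, shown here as a small Python sketch. The `proc_shortfall` helper is a hypothetical name used only for illustration; the numbers are the ones from this example, and the sketch looks at processors only, ignoring memory and other resources.

```python
def proc_shortfall(free_procs: float, required_procs: float) -> float:
    """Return how many more procs the destination needs (0.0 if everything fits)."""
    return max(0.0, required_procs - free_procs)


free_on_p8_56 = 6.5       # free procs on the destination host P8_56
required_by_p8_57 = 12.5  # procs required by all VMs on the source host P8_57

shortfall = proc_shortfall(free_on_p8_56, required_by_p8_57)
print(f"Shortfall on P8_56: {shortfall} procs")  # prints "Shortfall on P8_56: 6.0 procs"
```

A shortfall greater than zero means that at least one VM cannot be rebuilt on the destination, which is the failure we are about to trigger.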
You can view the list of virtual machines on the VM list page.
Now, we will bring down the host P8_57.
On the 'VM list' page, you can see that all VMs on host P8_57 are in the Attention state.
ARR is triggered automatically and the host state changes to Remote Restart Started.
Then, the host state moves to Remote Restart Rebuilding.
Finally, ARR fails because P8_56 (the destination host) does not have enough space for all the VMs. The host state of P8_57 changes to Error.
A notification also appears in the notifications panel.
Virtual machine RRN-1 was left behind on the source host P8_57, which caused the error.
To start with, we must remove some old VMs (56-3, in this example) from the destination host P8_56 to make room for the left-out VM (RRN-1) from the source.
Then, manually trigger the Remote Restart operation on P8_57 to move the VM (RRN-1) to P8_56.
All VMs, including RRN-1, are now on P8_56.
However, the source host is still in Error state.
The next step is to bring up the source host.
Host P8_57 is back in the Operating state.
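The host states that P8_57 went through in this walkthrough can be summarized as a small transition table. This is a sketch of what was observed in this failure scenario only, not a complete PowerVC state model; `OBSERVED_TRANSITIONS` and `is_observed_path` are hypothetical names.

```python
# Host state transitions observed for P8_57 in this walkthrough only.
# A successful ARR run may pass through states that are not listed here.
OBSERVED_TRANSITIONS = {
    "Operating": ["Remote Restart Started"],
    "Remote Restart Started": ["Remote Restart Rebuilding"],
    "Remote Restart Rebuilding": ["Error"],  # ARR failed for lack of space
    "Error": ["Operating"],                  # after manual remote restart and host bring-up
}


def is_observed_path(states):
    """Check that each consecutive pair of states matches an observed transition."""
    return all(
        nxt in OBSERVED_TRANSITIONS.get(cur, [])
        for cur, nxt in zip(states, states[1:])
    )


walkthrough = [
    "Operating",
    "Remote Restart Started",
    "Remote Restart Rebuilding",
    "Error",
    "Operating",
]
print(is_observed_path(walkthrough))
```

Note that the host stays in Error even after the left-out VM is moved manually; it returns to Operating only once the host itself is brought back up.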
As you have seen, it is quite simple to remote restart all VMs even if ARR fails due to space issues on the destination host. Please reach out to us if you have any queries or comments regarding this blog.
Knowledge center link: PowerVC
Imranuddin W Kazi