Since PowerVC 1.2.3, users have been able to manually remote restart virtual machines. That is, when a host goes down, you can manually evacuate its virtual machines. But with PowerVC 1.3.2, the evacuation can be done automatically. This new feature is called automated remote restart (ARR).
One problem with manual remote restarts is that you can remote restart either a single VM or all of the VMs on a host. There's no easy way to select several VMs and remote restart only those. With ARR, you can easily configure which VMs are candidates for remote restart. Then, if it becomes necessary, just those VMs are evacuated. For an overview of the remote restart feature, along with the list of requirements, please see Remote Restart Overview. Automated remote restart is supported for NovaLink, HMC, and KVM hosts.
Getting Started
A minimum PowerVC version of 1.3.2.0 is required for the automated remote restart function, but it is always recommended that you use the latest version and any fix packs that are available. ARR has the same requirements as remote restart (see Remote Restart Overview or the link in the Requirements section below). The only difference between a manual remote restart and an automated remote restart is that PowerVC initiates the automated operation whereas a user initiates the manual restart. Once the requirements have been met, automated remote restart can be configured.
Configuration
Configuring ARR consists of enabling the remote restart capability, enabling automated remote restart, and configuring ARR settings that specify how long a host must be down before PowerVC starts issuing remote restarts. The two settings are the run interval and stabilization.
Enabling remote restart
Remote restart is enabled on a VM at deploy time through a compute template. It also needs to be enabled on the host, which depends on the hardware and firmware levels of the host. You can't change the host setting through PowerVC, but you can verify that it is enabled by viewing the host details page. When remote restart is enabled on both the host and the VM, the VM is eligible for remote restart.
Automated remote restart requires additional configuration properties, as described in the following section.
Enabling automated remote restart
For a VM to be eligible for automated remote restart, the VM must not be excluded from ARR, the host that the VM is on must not be excluded from ARR, and the host group that the host belongs to must be ARR enabled. The following are the defaults for the ARR configuration settings:
- Host groups are ARR disabled by default.
- Hosts are not excluded from ARR by default.
- VMs are not excluded from ARR by default.
So, by default, PowerVC does not automatically remote restart any VMs, because host groups have ARR disabled.
To enable or disable ARR for a host group, open the host group details page in the user interface, click Edit, and select or clear Enable automated remote restart:
The two images below show where to configure automated remote restart for hosts and VMs.
NOTE: For hosts and instances, the UI shows the setting "Excluded from automated remote restart" (just as with the DRO options). ARR is enabled at the host group level. VMs and hosts can be individually excluded from ARR.
Importantly, the automated remote restart setting is independent of the remote restart setting. In the VM pictured above, remote restart is disabled, but automated remote restart is enabled. In this case, the VM would NOT be automatically remote restarted, because PowerVC checks the remote restart setting on hosts and VMs before issuing automated remote restarts. You can change the remote restart setting by resizing the virtual machine and specifying a compute template with remote restart enabled.
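The conditions from this section can be summed up as one predicate: remote restart must be enabled on both the host and the VM, and, separately, ARR must be enabled for the host group with neither the host nor the VM excluded. The sketch below is only an illustration of those rules; the classes and field names are hypothetical and are not part of any PowerVC API.

```python
from dataclasses import dataclass


@dataclass
class HostGroup:
    arr_enabled: bool = False  # host groups are ARR disabled by default


@dataclass
class Host:
    group: HostGroup
    remote_restart_capable: bool  # hardware/firmware dependent; not changeable in PowerVC
    excluded_from_arr: bool = False  # hosts are not excluded by default


@dataclass
class VM:
    host: Host
    remote_restart_enabled: bool  # set through the compute template
    excluded_from_arr: bool = False  # VMs are not excluded by default


def will_be_auto_restarted(vm: VM) -> bool:
    """True only if every condition described in this section holds."""
    host = vm.host
    # Remote restart itself must be enabled on both the host and the VM.
    remote_restart_ok = host.remote_restart_capable and vm.remote_restart_enabled
    # Independently, ARR must be enabled on the host group, and neither
    # the host nor the VM may be excluded from ARR.
    arr_ok = (host.group.arr_enabled
              and not host.excluded_from_arr
              and not vm.excluded_from_arr)
    return remote_restart_ok and arr_ok


group = HostGroup()
host = Host(group=group, remote_restart_capable=True)
vm = VM(host=host, remote_restart_enabled=True)
print(will_be_auto_restarted(vm))  # False: host group ARR is off by default
group.arr_enabled = True
print(will_be_auto_restarted(vm))  # True
vm.remote_restart_enabled = False
print(will_be_auto_restarted(vm))  # False: ARR on, but remote restart off
```

The last call mirrors the VM pictured above: ARR is not excluded, but because remote restart is off, the VM is skipped.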
ARR configuration settings
There are two new configuration options for ARR in host groups that determine how long a host must be down before PowerVC starts issuing remote restarts: run interval and stabilization. The run interval specifies how often (in minutes) PowerVC checks whether a host is down, with a default of 1 minute. Stabilization specifies how many times in a row PowerVC must find the host "down" before kicking off remote restarts, with a default of 5 times. So if the run interval is 3 and the stabilization is 5, it would take roughly 15 minutes (3 minutes x 5 checks) from the time a host goes down for PowerVC to start remote restarting the ARR-enabled VMs from the host.
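The "times in a row" rule can be sketched as a streak counter that resets whenever a check finds the host up, plus the interval-times-count rule of thumb. This is a hypothetical illustration, not PowerVC code; the function names are made up, and the product is only an approximation, since the exact delay also depends on where within a run interval the host fails.

```python
def trigger_check(results, stabilization):
    """Return the 1-based index of the check that starts remote restarts, or None.

    `results` holds one boolean per run-interval check: True means the host
    was found "down". The count must reach `stabilization` in a row, so any
    "up" result resets it.
    """
    streak = 0
    for index, down in enumerate(results, start=1):
        streak = streak + 1 if down else 0
        if streak >= stabilization:
            return index
    return None


def approx_delay_minutes(run_interval, stabilization):
    """Rule of thumb: minutes between checks x consecutive "down" checks."""
    return run_interval * stabilization


print(approx_delay_minutes(1, 5))  # defaults: 5
print(approx_delay_minutes(3, 5))  # example above: 15
# An "up" result at check 3 resets the count, so restarts start at check 8.
print(trigger_check([True, True, False, True, True, True, True, True], 5))  # 8
```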
The following image shows where to set these values:
...And more
There are a few more details about the automated remote restart feature in PowerVC that are helpful to know:
- When a host boots up after its VMs have been automatically remote restarted, and after PowerVC cleans up the evacuated VMs, the host is placed in maintenance mode. This does not happen with manual remote restart and serves as an indication that ARR took place. If the host comes back up while the automated remote restarts are still in progress, it is still placed in maintenance mode.
- As of the 1.3.2.0 release, for both automated remote restarts and manual host-wide remote restarts, VMs are evacuated in order of Availability Priority (values range from 0 to 255). VMs with a higher Availability Priority are moved first. To change the Availability Priority of a VM, you must resize the VM using a new compute template with the desired value. Note that the Availability Priority setting is used by the hypervisor to determine which LPARs get more scheduled processor time in the event of a processor failure. See Alternate Processor Recovery and Partition Availability Priority for more information on what this setting means to the PowerVM platform.
- Since automated remote restart does not have the benefit of an operator to determine if a host system is indeed down, additional checks are performed for assurance. Some of these checks are in place for manual remote restart as well. These checks happen under the covers and do not need to be configured in any way, but may be helpful to know about. Not all of the checks are run for each type of host, but for a host to be considered down, ALL of the checks that are run for that type must report the system as down:
- Nova compute service check
- When a KVM or NovaLink host is registered in PowerVC, a service called nova-compute runs on it. For a KVM or NovaLink host to be considered down, its nova-compute service must be down (determined by a heartbeat signal sent to the PowerVC controller).
- HMC system state check
- For HMC hosts, the state of the host can be determined by the HMC. To be considered "down", the host must be in one of these states: "error", "error - dump in progress", "power off", or "no connection".
- SSH response check
- For KVM and NovaLink hosts, PowerVC will attempt to open an SSH connection with the system. Only if PowerVC receives no response will ARR proceed.
- VIOS Fibre Channel port activity check via registered fabrics
- For NovaLink hosts and for HMC hosts in the "No connection" state (meaning the HMC does not have a connection to the host), the FC ports of the host's VIOSes are checked for activity on the fabrics registered in PowerVC. ARR proceeds only if no activity is reported for the ports on any registered fabric. The fabrics that zone the host's VIOS FC ports don't need to be registered in PowerVC, but registering them improves PowerVC's ability to detect a false positive (where the host isn't actually down) before ARR.
- VIOS SSP check
- For NovaLink hosts that belong to a shared storage pool (SSP) cluster, PowerVC tracks the state of the host's VIOS(es) as reported by the cluster. Hosts belonging to the same cluster also report the state of all other hosts' VIOSes in the cluster. If a host is being considered for ARR and other hosts in the same SSP are registered in PowerVC, the state of the host's VIOS(es) must be "down" for ARR to proceed.
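The Availability Priority ordering described above amounts to a descending sort. The minimal sketch below uses a hypothetical `VMInfo` record and example VM names. How PowerVC orders VMs that share the same priority is not described here, so the stable sort's tie behavior is only an assumption.

```python
from typing import NamedTuple


class VMInfo(NamedTuple):
    """Hypothetical record for illustration; not a PowerVC type."""
    name: str
    availability_priority: int  # valid range 0-255


def evacuation_order(vms):
    """Return VMs ordered highest Availability Priority first."""
    for vm in vms:
        if not 0 <= vm.availability_priority <= 255:
            raise ValueError(f"{vm.name}: priority {vm.availability_priority} outside 0-255")
    # sorted() stays stable with reverse=True, so VMs with equal priority keep
    # their input order (an assumption about tie handling, see lead-in).
    return sorted(vms, key=lambda vm: vm.availability_priority, reverse=True)


vms = [VMInfo("web01", 127), VMInfo("db01", 191), VMInfo("app01", 255), VMInfo("web02", 127)]
print([vm.name for vm in evacuation_order(vms)])
# ['app01', 'db01', 'web01', 'web02']
```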
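The rule that ALL applicable checks must report the host as down can be sketched in two parts: choose which checks apply to a host, then require every one of them to say "down". The check names, function signatures, and the treatment of the FC and SSP checks as skipped when no fabrics or SSP peers are registered are assumptions made for illustration, based on the list above. PowerVC performs these checks internally, and none of this is its API.

```python
from enum import Enum


class HostType(Enum):
    KVM = "KVM"
    NOVALINK = "NovaLink"
    HMC = "HMC"


# HMC-reported states the list above treats as "down" (compared case-insensitively).
HMC_DOWN_STATES = frozenset({"error", "error - dump in progress", "power off", "no connection"})


def applicable_checks(host_type, hmc_state=None, fabrics_registered=False,
                      ssp_peers_registered=False):
    """Return the (illustrative) names of the checks that run for this host."""
    state = (hmc_state or "").lower()
    checks = set()
    if host_type in (HostType.KVM, HostType.NOVALINK):
        checks |= {"nova_compute_down", "ssh_no_response"}
    if host_type is HostType.HMC:
        checks.add("hmc_state_down")
    # FC activity: NovaLink hosts and "no connection" HMC hosts; assumed to
    # run only when at least one fabric is registered in PowerVC.
    fc_candidate = host_type is HostType.NOVALINK or (
        host_type is HostType.HMC and state == "no connection")
    if fc_candidate and fabrics_registered:
        checks.add("fc_ports_idle")
    # SSP: NovaLink hosts sharing an SSP with other registered hosts.
    if host_type is HostType.NOVALINK and ssp_peers_registered:
        checks.add("ssp_vios_down")
    return checks


def host_considered_down(host_type, results, hmc_state=None,
                         fabrics_registered=False, ssp_peers_registered=False):
    """True only if every applicable check reports the host as down.

    `results` maps check name -> True when that check says "down".
    """
    required = applicable_checks(host_type, hmc_state, fabrics_registered,
                                 ssp_peers_registered)
    results = dict(results)
    if host_type is HostType.HMC:
        results["hmc_state_down"] = (hmc_state or "").lower() in HMC_DOWN_STATES
    missing = required - set(results)
    if missing:
        raise ValueError(f"no result for check(s): {sorted(missing)}")
    return all(results[name] for name in required)


print(host_considered_down(HostType.HMC, {}, hmc_state="Power Off"))  # True
print(host_considered_down(
    HostType.NOVALINK,
    {"nova_compute_down": True, "ssh_no_response": True, "fc_ports_idle": False},
    fabrics_registered=True,
))  # False: FC activity was seen, so the host may still be running
```

Requiring every applicable check to agree reflects the caution described above. A single check that still sees signs of life, such as FC traffic, is enough to stop ARR.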
Requirements
The requirements for automated remote restart are a minimum PowerVC version of 1.3.2.0 and all of the existing requirements for Remote Restart, which can be found here: Remote Restart Requirements.
Summary
PowerVC 1.3.2 adds support for automated remote restart: with some configuration, VMs can now be remote restarted automatically when a host goes down. Hopefully the information above answers any questions about this new feature. For more on PowerVC, follow us on LinkedIn, Facebook, and Twitter!