Disk-level validation for Live Partition Mobility (LPM) of NPIV LPARs (logical partitions that use NPIV storage through virtual FC adapters) is an essential part of the LPM process. Starting with version 1061, the HMC enables this disk-level validation by default. The process is further strengthened by added LUN ID validation, which makes the overall validation workflow more robust and reliable. This blog describes this enhancement, which improves NPIV storage validation when an active LPM of an NPIV LPAR is validated. It also explains how misconfigurations can arise and cause issues after LPM.
Problem statement:
Discrepancies in LUN IDs for disks between source and destination hosts can occur due to misconfigured LUN masking at the storage level. The existing LUN-level validation process only verifies the UUID and LUN type, without considering LUN IDs configured at the storage level. If this configuration is overlooked and LPM is performed, the operation might complete successfully, but the LPAR could potentially lose access to the disks.
For instance, the active and inactive WWPNs of a client’s virtual FC adapter may be configured as separate hosts at the storage system, and the same LUNs may then be assigned to these hosts in different orders. As a result, the same LUN ends up with a different LUN ID for each WWPN of the client’s virtual FC adapter.
There was a need to detect such misconfigurations and provide early reporting to the user, as these issues could lead to LUN access problems on destination hosts following migration.
Existing LUN-level validation behavior:
Each Virtual Fibre Channel (VFC) adapter assigned to a client logical partition is provided with two unique WWPNs (Worldwide Port Names). At any given time, the client partition uses one active WWPN to log into the SAN, while the second inactive WWPN is reserved for use when the partition is moved to another managed system via Live Partition Mobility (LPM).
During the LPM validation process, a SCSI INQUIRY command is issued at the storage level on both the source and target VIOS (Virtual I/O Server) as part of the LUN validation. This command retrieves details such as the product ID, vendor name, and description for each LUN. These details, including the descriptor type and UUID, are then compared between the source and destination VIOS. If they match, the LUN-level validation is considered successful.
It is always recommended to configure the same LUN ID for both WWPNs of the VFC adapter when provisioning LUNs for the client. However, due to misconfigured LUN masking at the storage level, the LUN IDs for both the WWPNs could differ.
The existing LPM process does not validate the LUN IDs of individual LUNs. Consequently, if any LUN ID is misconfigured, the LPM process may still complete successfully, but the client partition could face issues after migration. The behavior following migration can be unpredictable and depends on which disk has the mismatched LUN ID. For a non-root disk, a mismatched LUN ID could cause the partition to lose access to the corresponding storage LUN after migration, and I/O errors could be seen. If the LUN ID of the root disk is misconfigured, the partition might appear to hang, and it might fail to boot if a reboot is attempted.
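The gap described above can be illustrated with a minimal Python sketch. It is not the HMC or VIOS implementation; the `LunInfo` type and the `lun_level_validation` function are hypothetical names introduced only to show that a comparison keyed on UUID and descriptor type passes even when LUN IDs differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LunInfo:
    """Hypothetical record of what one side (source or target) sees for a LUN."""
    uuid: str
    descriptor_type: str
    lun_id: int  # carried along, but never examined by the check below


def lun_level_validation(source, target):
    """Pass when every source LUN has a target LUN with the same
    UUID and descriptor type. LUN IDs are deliberately ignored,
    mirroring the pre-enhancement behavior described in the text."""
    target_keys = {(lun.uuid, lun.descriptor_type) for lun in target}
    return all((lun.uuid, lun.descriptor_type) in target_keys for lun in source)


# Same two disks on both sides, but with swapped LUN IDs.
source_view = [LunInfo("UUID1", "NAA", 0), LunInfo("UUID2", "NAA", 1)]
target_view = [LunInfo("UUID2", "NAA", 0), LunInfo("UUID1", "NAA", 1)]

# The check passes although the LUN IDs do not match.
print(lun_level_validation(source_view, target_view))  # True
```

The descriptor type value `"NAA"` is only a placeholder; the point is that nothing in this comparison looks at `lun_id`.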
LUN ID misconfiguration:
As shown in the above two diagrams, the LPAR’s VFC adapter has two WWPNs: wwpn_c1 and wwpn_c2. Currently, wwpn_c1 is active, while wwpn_c2 will be activated on the target host after LPM. Typically, both WWPNs would be mapped to a single host at the storage level. However, in this case, wwpn_c1 is mapped to HOST_A, and wwpn_c2 is mapped to HOST_B.
At the storage system, the user/admin has created two LUNs: disk1 (UUID1) and disk2 (UUID2). When disk1 is mapped to HOST_A first, it is assigned LUN ID 0, and disk2 is assigned LUN ID 1 when mapped afterwards. However, on HOST_B, if disk2 is mapped first, it receives LUN ID 0, and disk1 gets LUN ID 1 when mapped later. As a result, the LUN ID of each disk differs between the active and inactive WWPNs. In this scenario, the LPM operation will complete successfully, but access to both LUNs will be lost afterward because of the LUN ID mismatch.
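The scenario above can be reproduced in a few lines of Python. This is an illustrative sketch only: it assumes, as in the example, that the storage system hands out LUN IDs sequentially in mapping order, and the helper names `map_luns` and `find_lun_id_mismatches` are hypothetical.

```python
def map_luns(mapping_order):
    """Assign LUN IDs 0, 1, 2, ... in the order the disks are mapped to a host.
    Sequential assignment is an assumption taken from the example scenario."""
    return {uuid: lun_id for lun_id, uuid in enumerate(mapping_order)}


def find_lun_id_mismatches(active_view, inactive_view):
    """Return (uuid, active_lun_id, inactive_lun_id) for every disk that is
    visible through both WWPNs but under different LUN IDs."""
    mismatches = []
    for uuid in sorted(active_view.keys() & inactive_view.keys()):
        if active_view[uuid] != inactive_view[uuid]:
            mismatches.append((uuid, active_view[uuid], inactive_view[uuid]))
    return mismatches


# HOST_A (wwpn_c1, active): disk1 mapped first, then disk2.
host_a = map_luns(["UUID1", "UUID2"])
# HOST_B (wwpn_c2, inactive): disk2 mapped first, then disk1.
host_b = map_luns(["UUID2", "UUID1"])

mismatches = find_lun_id_mismatches(host_a, host_b)
print(mismatches)  # [('UUID1', 0, 1), ('UUID2', 1, 0)]
```

Both disks are reported because each one has a different LUN ID depending on which WWPN is used. Mapping both WWPNs to a single host, as recommended, would make the two views identical and the list empty.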
Solution:
With this feature, the validation algorithms have been enhanced to detect LUN ID misconfigurations and report them to the user, because such misconfigurations can lead to LUN access issues after migrating to the destination host. Early reporting also helps the user take corrective action.
This feature is supported from the VIOS 4.1.1.0 release onwards.
Supported version details:
HMC Version | VIOS Version | LUN ID Validation Support
<=1060 | 4.1.0.0 (73D) | NO
<=1060 | 4.1.1.0 (73F) | If LUN level validation is enabled.
1061 | 4.1.0.0 (73D) | NO
1061 | 4.1.1.0 (73F) | YES
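The support matrix above can be expressed as a small decision function. The sketch below only restates the table. The function name, its parameters, and the treatment of HMC versions later than 1061 as equivalent to 1061 are assumptions made for illustration, not documented HMC behavior.

```python
def parse_version(text):
    """Turn a dotted version such as '4.1.1.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def lun_id_validation_applies(hmc_version, vios_version,
                              lun_level_validation_enabled=False):
    """Restate the support matrix as code (illustrative only).

    hmc_version: integer such as 1060 or 1061
    vios_version: dotted string such as '4.1.1.0'
    lun_level_validation_enabled: whether LUN-level validation was
        turned on explicitly (relevant for HMC <= 1060)
    """
    # VIOS releases before 4.1.1.0 do not support LUN ID validation.
    if parse_version(vios_version) < (4, 1, 1, 0):
        return False
    # HMC 1061 enables disk-level validation by default.
    # Assumption: later HMC versions behave like 1061.
    if hmc_version >= 1061:
        return True
    # HMC <= 1060: only when LUN-level validation is enabled.
    return lun_level_validation_enabled


print(lun_id_validation_applies(1060, "4.1.0.0"))        # False
print(lun_id_validation_applies(1060, "4.1.1.0", True))  # True
print(lun_id_validation_applies(1061, "4.1.0.0"))        # False
print(lun_id_validation_applies(1061, "4.1.1.0"))        # True
```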
Validation error display format:
If LPM validation fails due to a LUN ID mismatch, an error message with the LUN ID and UUID is displayed in the following format. This message appears in both the HMC UI and CLI output.
Miscellaneous:
LUN ID Format across different Storage types:
The following snapshots show the LUN ID formats on different storage types and how they can be viewed and compared on the client LPAR.
- SVC storage
- EMC Storage
- Hitachi Storage
Contributors:
Ashok Mungara (ashok.mungara@ibm.com)
Mohit Sharma (mohit.sharma12@ibm.com)