Where do I find LPM Documentation, best practices, and information on error codes?

By Bill Carlson posted Thu June 18, 2020 03:41 PM

  
PowerVM LPM Best Practices
I recently spoke with a major client who, like me, is a big fan of Live Partition Mobility (LPM). LPM is a vital aspect of concurrent maintenance on IBM POWER servers. It enables migration of running partitions (that is, moving them in real time) from one physical server to another while maintaining complete transactional integrity and transferring the entire partition environment: processor state, memory, virtual devices, and connected users. Partitions may also migrate while powered off if required (this is called inactive migration). The main requirement for LPM is that the operating system and application reside on shared virtualized storage and networks accessible from both the source and destination servers.

As with other advanced technologies, there are details to get right before LPM works seamlessly. This is where this article comes into play. The client I was working with was very happy with the LPM function but mentioned that some of the information that helped them get “into production with LPM” was not readily apparent when they first started on this mission. That information was available, but it was not consolidated in a single post, so I wanted to understand the gap and help fill it.

That’s why I am here. You will see three sections of links below. The first collects the most current IBM Redbooks documentation on PowerVM and LPM (with some other external links). The second gives you a checklist to prepare for LPM, as well as some methods and tools to pre-validate your infrastructure before attempting LPM. The third gives you additional, more “English language” information about the error codes you might receive when running LPM.

LPM Education (How to, best practices, etc.)
Where does one find information and education on POWER virtualization (PowerVM), which encompasses the LPM function? This is easy. The IBM Redbooks content is rich, complete, and a great way to start. An introductory primer on LPM is here: http://www.redbooks.ibm.com/abstracts/sg247940.html?Open


Best Practices for PowerVM are here:

https://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248062.html?Open

This last link is about Managing and Monitoring PowerVM. Note that Chapter 14 is dedicated to LPM.

https://www.redbooks.ibm.com/abstracts/sg247590.html?Open


Pre-LPM Checklists and other utilities

Once your production virtual environment is up and running and you want to use LPM for concurrent maintenance or to redistribute workload across servers, you may want to (1) find a checklist of what to validate before your first LPM, and (2) be aware of other utilities that verify source and destination server network and storage equivalence.

One of the best checklists I have found online is here: http://www.redbooks.ibm.com/abstracts/tips1185.html?Open
Validate this first: Common Pre-LPM Gotchas:
  1. Network Bandwidth: First and foremost, LPM users should verify there is a MINIMUM of a 10Gb link between source and destination VIOS (a dedicated link is preferred, but this is not required). If you need to analyze a network performance issue during LPM, this link is very useful for collecting and analyzing information: http://www.ibm.com/support/docview.wss?uid=isg3T1024652
  2. Device configuration: SAN Fibre Channel (fcsX and fscsiX) and storage (hdiskX) device settings need to be identical on the source and destination servers (VIOS). DO NOT FORGET this: mismatched AIX / VIOS device settings between SOURCE and DESTINATION are a good recipe for an LPM validation or LPM failure. Validation will kick the majority of these out, but it is easier to check beforehand that they are all the same.
  3. SAN Configuration: Storage Area Network zoning is a common problem. It is often not identical between source and target servers, or not configured for both WWPNs of a virtual client (NPIV). The devscan utility can pre-check this connectivity before you run an LPM validation that would otherwise fail. It is definitely worth using, depending on the degree of control and visibility you have over your SAN-attached storage. The link to that tool is here: https://www.ibm.com/support/pages/devscan-tool
  4. DLPAR and RMC Configuration: The RMC (Resource Monitoring and Control) subsystem is vital for LPM to work. Verify from the HMC, using the lspartition -dlpar command, that all partitions are working properly with RMC before attempting LPM. More details about RMC can be found at this link: http://www-01.ibm.com/support/docview.wss?uid=isg3T1020611
  5. PRE-VALIDATE: A commonly overlooked fact is that LPM includes a validation function that verifies, to the best of its ability, that the source and destination servers have equivalent connectivity to storage and network resources. It is possible to run this “pre-validation” ahead of a service window to help ensure a successful LPM operation. Make use of it! If it fails, the HSCLA errors it returns can provide additional information on what to correct. NOTE: VIOS 2.2.4 and later include improvements in validation.
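For the device configuration check in item 2, one way to compare settings is to capture the attribute listing for the same device on each VIOS and diff the two. The sketch below is a minimal illustration, assuming the text comes from something like `lsattr -El fscsi0` and that each line starts with the attribute name followed by its value. The sample values are made up, and an attribute with an empty value would be misread by this simple parser.

```python
def parse_lsattr(output):
    """Parse `lsattr -El <device>` style text into {attribute: value}.

    Assumption: each non-empty line begins with the attribute name and its
    value, separated by whitespace. Remaining columns are ignored.
    """
    attrs = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            attrs[fields[0]] = fields[1]
    return attrs


def diff_attrs(source, dest):
    """Return {attribute: (source_value, dest_value)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(source) | set(dest)):
        if source.get(name) != dest.get(name):
            mismatches[name] = (source.get(name), dest.get(name))
    return mismatches


# Illustrative sample text, not captured from a real system.
SOURCE_FSCSI0 = """\
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True
"""
DEST_FSCSI0 = """\
dyntrk       no           Dynamic Tracking of FC Devices        True
fc_err_recov delayed_fail FC Fabric Event Error RECOVERY Policy True
"""

if __name__ == "__main__":
    found = diff_attrs(parse_lsattr(SOURCE_FSCSI0), parse_lsattr(DEST_FSCSI0))
    for name, (src, dst) in found.items():
        print(f"{name}: source={src} destination={dst}")
```

An empty result means the two listings agree. Anything reported is worth correcting before you run LPM validation.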

Key LPM usage tip
If you run LPM using the migrlpar command, there is really only one tip to remember.
If possible, always specify the name or IP address of the MSP (Mover Service Partition) in the migrlpar command. This simplifies validation and reduces the time it takes.
The syntax of the migrlpar command is available here: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8edm/migrlpar.htm
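As a rough illustration of that tip, the sketch below assembles a migrlpar command line that names the MSPs explicitly. The flags (`-o v` to validate, `-o m` to migrate, `-m`, `-t`, `-p`, `-i`) and the `source_msp_name` / `dest_msp_name` attributes reflect my understanding of the HMC command and should be checked against the syntax page linked above. The system, partition, and VIOS names are hypothetical.

```python
import shlex


def build_migrlpar(operation, source_system, target_system, partition,
                   source_msp=None, dest_msp=None):
    """Build an argv list for migrlpar, optionally naming the MSPs.

    Assumed flags: -o (v = validate, m = migrate), -m source system,
    -t target system, -p partition, -i attribute list.
    """
    if operation not in ("v", "m"):
        raise ValueError("operation must be 'v' (validate) or 'm' (migrate)")
    argv = ["migrlpar", "-o", operation, "-m", source_system,
            "-t", target_system, "-p", partition]
    attrs = []
    if source_msp:
        attrs.append(f"source_msp_name={source_msp}")
    if dest_msp:
        attrs.append(f"dest_msp_name={dest_msp}")
    if attrs:
        argv += ["-i", ",".join(attrs)]
    return argv


if __name__ == "__main__":
    # Hypothetical names; validate first, then migrate with the same options.
    cmd = build_migrlpar("v", "SRC-SYS", "DST-SYS", "lpar01",
                         source_msp="vios1a", dest_msp="vios2a")
    print(shlex.join(cmd))
```

Running the validate form first and reusing the same arguments for the migrate form keeps the two operations consistent.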

How Many Concurrent LPMs should I run?
Assuming you have a dedicated 10Gb link between source and destination VIOS, understand that LPM will pretty much saturate that link. Therefore, running LPMs sequentially (one per redundant MSP quad) can be expected to finish in about the same total time as running several concurrently (8 is the current maximum per pair of VIOS at VIOS 2.2.5). We will cover this behavior in more detail in a future blog post.
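The reasoning is simple arithmetic: if the link is the bottleneck, the total amount of memory to copy divided by the link speed is the same whether the migrations run one after another or side by side. The sketch below is a simplified model, assuming decimal units, a fully used link, and no re-sending of memory pages that change during the migration. Real migrations will take longer.

```python
def transfer_seconds(memory_gb, link_gbps=10.0):
    """Lower-bound seconds to copy memory_gb gigabytes over a link_gbps link."""
    gigabits = memory_gb * 8  # 8 bits per byte
    return gigabits / link_gbps


def sequential_total(partitions_gb, link_gbps=10.0):
    """Total time when each partition gets the whole link, one at a time."""
    return sum(transfer_seconds(gb, link_gbps) for gb in partitions_gb)


def concurrent_total(partitions_gb, link_gbps=10.0):
    """Total time when all partitions share the same saturated link."""
    return transfer_seconds(sum(partitions_gb), link_gbps)


if __name__ == "__main__":
    sizes = [64, 128, 32]  # hypothetical partition memory sizes in GB
    print(f"sequential: {sequential_total(sizes):.1f} s")
    print(f"concurrent: {concurrent_total(sizes):.1f} s")
```

Under these assumptions both totals come out the same, which is why concurrency buys little once a single migration can fill the link.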

NOTE: From VIOS 2.2.5 onward, the MSP implementation during LPM is redundant (unless disabled with the -redundantmsp = 0 option). Previously, with 2 VIOS on each of the source and destination servers (4 total) at VIOS level 2.2.3, you could run 16 LPMs at the same time. With redundant MSPs, the failure of a single MSP does not impact the LPM, but the number of LPMs that 4 VIOS can run concurrently drops to 8.

LPM Error Codes
During validation or migration you may see an HSCLxxxx error code. Sometimes these errors can be cryptic. This section provides additional information on the categories of error codes, their meaning, and how to diagnose or correct them.

The link below provides general help with the HSCLA codes you might encounter: https://www.ibm.com/support/knowledgecenter/POWER6/arebi/HSCLA_info.htm

This first class of error codes is emblematic of RMC (Resource Monitoring and Control) subsystem issues. RMC is a vital part of the LPM process, and if it is not configured correctly, LPM validation will fail.
HSCLA246, HSCLA256, HSCLA257, HSCLA296, HSCLA299, HSCLA282, HSCL2957.
If you encounter one of these, follow the advice on the RMC help link given earlier, which I post again here: http://www-01.ibm.com/support/docview.wss?uid=isg3T1020611
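When one of these codes appears, a quick first step is to list the partitions that the HMC reports as having no active RMC connection. The sketch below parses text shaped like the sample, which is how I understand `lspartition -dlpar` output to look (a `Partition:<...>` entry followed by `Active:<1>` or `Active:<0>`). Treat the format as an assumption and adjust the pattern to what your HMC prints. The hostnames and addresses are invented.

```python
import re

# Assumed shape: "Partition:<id*type*serial, hostname, ip>" then "Active:<n>".
ENTRY = re.compile(r"Partition:<([^>]*)>\s*Active:<(\d+)>")


def inactive_partitions(output):
    """Return the hostnames of partitions whose Active flag is not 1."""
    result = []
    for ident, active in ENTRY.findall(output):
        if active != "1":
            fields = [f.strip() for f in ident.split(",")]
            # The second field is assumed to be the hostname.
            result.append(fields[1] if len(fields) > 1 else fields[0])
    return result


# Illustrative sample text, not captured from a real HMC.
SAMPLE = """\
<#0> Partition:<5*9117-MMA*1234567, lpar1.example.com, 192.0.2.11>
       Active:<1>, OS:<AIX, 7.1, 7100-03-01-1341>, DCaps:<0x2c5f>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<1035>
<#1> Partition:<6*9117-MMA*1234567, lpar2.example.com, 192.0.2.12>
       Active:<0>, OS:<, , >, DCaps:<0x0>, CmdCaps:<0x0, 0x0>, PinnedMem:<0>
"""

if __name__ == "__main__":
    for host in inactive_partitions(SAMPLE):
        print(f"RMC not active: {host}")
```

Any partition this reports is a candidate for the RMC troubleshooting steps in the link above.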

The class of error codes below is storage related.
As discussed earlier, the devscan utility (https://www.ibm.com/support/pages/devscan-tool) can help validate your SAN infrastructure.
Additionally, many of these errors include detailed information about what to correct in your storage infrastructure (zoning, WWPNs). Pay attention to the DETAILED ERROR INFO in these outputs; it often contains the nugget of information you need.
In particular for HSCLA24E, there is a useful link: http://www.ibm.com/support/docview.wss?uid=isg3T1019383
For HSCLA356 or HSCLA29A during LPM validation of AIX NPIV Client: http://www.ibm.com/support/docview.wss?uid=isg3T1022675
For LPM validation errors HSCLA27C, HSCL400A, HSCLA29A, here is a useful link: http://www.ibm.com/support/docview.wss?uid=isg3T1023820
NOTE: For your vSCSI devices (the hdisks on the VIOS that are mapped to clients), don’t forget to set the hdisk reserve policy (the reserve_policy attribute) to no_reserve!
Here is another useful link for the vSCSI errors HSCLA319, HSCLA356, or HSCLA29A during LPM validation of an AIX vSCSI client: http://www.ibm.com/support/docview.wss?uid=isg3T1023309
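For the reserve policy note above, the sketch below takes the current setting of each vSCSI backing disk and prints the VIOS command that would change any disk not already set to no_reserve. It assumes you have already collected the settings (for example from the reserve_policy attribute of each hdisk) and that `chdev -dev <hdisk> -attr reserve_policy=no_reserve` is run as padmin on the VIOS. The disk names and values are hypothetical.

```python
def reserve_policy_fixes(policies):
    """Given {hdisk: reserve_policy}, return commands for disks needing a change."""
    return [
        f"chdev -dev {disk} -attr reserve_policy=no_reserve"
        for disk, policy in sorted(policies.items())
        if policy != "no_reserve"
    ]


if __name__ == "__main__":
    # Hypothetical settings gathered from one VIOS.
    current = {
        "hdisk4": "single_path",
        "hdisk5": "no_reserve",
        "hdisk6": "single_path",
    }
    for command in reserve_policy_fixes(current):
        print(command)
```

The function only generates the commands and changes nothing itself, so the list can be reviewed before anything is applied on the source and destination VIOS.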

Here are a couple more HSCLA errors that you may encounter:
  • HSCLA22D: This error is emblematic of a network issue (could be temporary or permanent).
  • HSCLA228: Finally, this error is seen when a system is not in the right state for LPM.

Contacting the PowerVM Team
Have questions for the PowerVM team or want to learn more? Follow our discussion group on LinkedIn (IBM PowerVM) or on IBM Community Discussions.






