How Live Partition Mobility is Tested
By THIRUKUMARAN VASANTHA THANANJAYAN, posted Wed June 17, 2020 01:27 PM
Introduction
IBM has multiple test organizations, each with its own mission. The System Assurance organization has a Software Test team, a Hardware Test team, and a Storage Test team. The Software Test mission includes verification of Operating Systems, VIOS, and Management Consoles. The Hardware System Test mission includes IO adapters, Firmware, Hypervisor, and Serviceability. Besides the System Assurance team, there are development Functional Test teams within AIX, IBM i, Linux, VIOS, Firmware, PowerVM Hypervisor, and Management Consoles, each with its own test mission. Live Partition Mobility (LPM) is a critical PowerVM function; customers increasingly rely on it to avoid downtime during hardware maintenance, firmware upgrades, and similar events. What makes LPM unique is its interaction between Hypervisor, Operating Systems, Firmware, HMC, VIOS, and Storage, and therefore all IBM Test teams are involved in testing LPM.
LPM Testing Coverage Consideration
LPM provides the customer with a sizeable number of options, and the LPM Test Procedures cycle through the combinations listed in the graphic below.
Our LPM test environment (a.k.a. LPM Zone) spreads across three different sites (Austin, Poughkeepsie and Guadalajara) encompassing 50 systems that range from POWER6 to the latest generation systems.
LPM Testing Procedure
Good Path Testing
Systems are set up to address all the attributes mentioned in the graphic above. Within each of the components tested we also test variations, for example: LPM operations between different VIOS levels and LPM operations with different settings within VIOS levels (concurrency levels, security profiles, etc.). Combinations of Firmware levels are included. Testing LPM for a new Firmware release translates to more than 100,000 migrations.
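To illustrate how quickly such combinations multiply, here is a minimal Python sketch that enumerates a test matrix as a Cartesian product. The attribute names and values (`source_firmware`, `FW_A`, and so on) are hypothetical placeholders, not the actual attributes from the graphic or the real IBM test plan.

```python
from itertools import product

# Hypothetical attribute values for illustration only; the real attribute
# list comes from the graphic and is not reproduced here.
ATTRIBUTES = {
    "source_firmware": ["FW_A", "FW_B"],
    "target_firmware": ["FW_A", "FW_B"],
    "vios_concurrency": ["low", "high"],
    "security_profile": ["default", "hardened"],
}


def build_matrix(attributes):
    """Return every combination of attribute values as a list of dicts."""
    names = list(attributes)
    return [dict(zip(names, combo)) for combo in product(*attributes.values())]


matrix = build_matrix(ATTRIBUTES)
print(len(matrix), "configurations")
print(matrix[0])
```

Even four attributes with two values each yield 16 configurations, which gives a feel for how a release can require a very large number of migrations once more attributes and repeated runs are added.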
Bad Path Testing
This is where we inject errors or test failure scenarios. The objective of Bad Path Testing is to make sure that LPM failures can be recovered without any issues.
Scenarios include (but are not limited to):
Stop LPM migration when in progress
VIOS Network cable pull (Source / Target)
VIOS FC cable pull (Source / Target)
VIOS Reboot (in Dual VIOS) (Source / Target)
MSP Failover verification
Inject error in IO Adapter during LPM
Migrate with invalid NPIV mapping (Single port or no Target mapping)
Migrate unsupported (in target) Operating System
Migration from Dual VIOS system to Single VIOS system and vice versa
vNIC – cable pull to trigger Failover
vNIC – cable reconnect to trigger Fallback
Cable pull in Link Aggregation
Memory error injection during Migration
vLAN Bridge / Redundant VIOS / MPIO / Redundant vNIC Backing Device / Redundant MSP overrides
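The pattern behind these scenarios (start a migration, inject a fault, then confirm the partition recovers cleanly) can be sketched as a small simulation. Everything below is a hypothetical model written for illustration: `Partition`, `migrate`, `recover`, and the fault names are assumptions and do not represent IBM's test harness or any HMC interface.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    host: str
    state: str = "running"


class MigrationFailed(Exception):
    """Raised by the simulated migration when a fault is injected."""


def migrate(partition, target, fault=None):
    """Simulated migration: an injected fault aborts the move partway through."""
    source = partition.host
    partition.state = "migrating"
    if fault is not None:
        raise MigrationFailed(
            f"{fault} while moving {partition.name} from {source} to {target}"
        )
    partition.host = target
    partition.state = "running"


def recover(partition):
    """Simulated recovery: the partition resumes on the host that still owns it."""
    partition.state = "running"


def run_bad_path(fault):
    """Run one scenario and return the partition for post-recovery checks."""
    lpar = Partition("lpar1", host="source")
    try:
        migrate(lpar, "target", fault=fault)
    except MigrationFailed:
        recover(lpar)
    return lpar


# Hypothetical fault labels loosely based on the scenario list above.
FAULTS = ["network cable pull", "FC cable pull", "VIOS reboot"]
results = {fault: run_bad_path(fault) for fault in FAULTS}
for fault, lpar in results.items():
    print(f"{fault}: {lpar.name} is {lpar.state} on {lpar.host}")
```

The check that matters in this sketch is the post-recovery invariant: after a failed migration the partition is running again and still lives on the source system.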
Stress Testing
LPM is run continuously between systems in back-to-back loops, with other testing performed periodically in between.
Sample scenarios:
5 days of continuous LPM loops
DLPAR in between LPM loops
DPO (Affinity Optimization) in between LPM loops
Concurrent Code Update / Reject in between LPMs
Hibernate – Migrate – Resume in loop (Note: Hibernate will not be supported on POWER9)
Migrate Large Memory Partition (20TB)
Migrate – Remote Restart in loop
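A back-to-back loop with other operations interleaved can be sketched as follows. This is only an illustrative model: `stress_loop`, the system names `sysA` and `sysB`, and the use of an iteration count instead of a multi-day duration are all assumptions, and the operations are recorded as log entries rather than actually executed.

```python
from itertools import cycle


def stress_loop(hosts, iterations, interleaved_ops):
    """Bounce a partition between two hosts, logging one extra operation
    after each migration."""
    source, target = hosts
    location = source
    log = []
    ops = cycle(interleaved_ops)
    for _ in range(iterations):
        destination = target if location == source else source
        log.append(f"migrate {location} -> {destination}")
        location = destination
        log.append(next(ops))
    return location, log


final, log = stress_loop(("sysA", "sysB"), iterations=4, interleaved_ops=["DLPAR", "DPO"])
for entry in log:
    print(entry)
print("partition ends on", final)
```

With an even number of iterations the partition ends up back on the system where it started, which makes it easy to repeat the loop.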
Follow-ups from LPM Testing & Field Experience
Our goal is to uncover and address all LPM issues to avoid customer impact. Since the last P8 Firmware release we have been able to gauge our customers' LPM success rate by using statistics collected by Call Home. This graph shows how the LPM improvements incorporated in recent Firmware levels had a positive effect on the LPM success rate. Success and failure rate calculations include both LPM Validations and actual migrations.
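Because the rate counts validations as well as migrations, the arithmetic can be sketched like this. The function name, its parameters, and the sample counts are hypothetical; they are not Call Home data and not the formula IBM necessarily uses.

```python
def success_rate(validations_ok, validations_failed, migrations_ok, migrations_failed):
    """Fraction of successful operations, counting validations and migrations together."""
    succeeded = validations_ok + migrations_ok
    total = succeeded + validations_failed + migrations_failed
    if total == 0:
        raise ValueError("no operations recorded")
    return succeeded / total


# Made-up sample counts, for illustration only.
rate = success_rate(validations_ok=90, validations_failed=10,
                    migrations_ok=45, migrations_failed=5)
print(f"success rate: {rate:.1%}")
```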
With the breadth of testing that LPM entails, we are constantly testing LPM within IBM on existing and planned future releases of System Firmware, VIOS, HMC, PowerVC, and NovaLink. In rare cases an LPM issue escapes to the field. When a new LPM issue is encountered (in the field or within IBM), the fix is verified against the originally failing configuration. Thereafter our development team does a thorough analysis to determine when the problem was introduced. If it is determined that the problem can occur in a publicly available release, the fix is included in the next scheduled service pack for each active impacted release. New service packs include fixes for issues uncovered in our testing of follow-on releases as well as fixes for any field-reported issues.
Conclusion
LPM is one of the most important virtualization features of PowerVM. As a result, we dedicate significant effort and resources to ensure that LPM works as expected and recovers successfully in case of failures. Keeping your Power Systems IT environment current on the latest recommended Service Packs, APARs, PTFs, etc. (for Firmware, VIOS, HMC, OSes...) will maximize LPM success rates.
Contacting the PowerVM Team
Have questions for the PowerVM team or want to learn more? Follow our discussion group on LinkedIn (IBM PowerVM) or IBM Community Discussions.
#PowerVM
#powervmblog
#powervmlpm
https://community.ibm.com/community/user/blogs/thirukumaran-vasantha-thananjayan1/2020/06/17/how-live-partition-mobility-is-tested