On the topic of path failures, I wanted to share a recent customer
experience with failover testing. Every installation I perform
includes a step for failover testing to exercise MPIO, VIO,
EtherChannel, etc. These redundancy features frequently have upstream
dependencies on other teams, and it's good to do some basic failover
tests in the spirit of "trust but verify".
We did AIX 7.2 TL4 failover testing with MPIO, using the default
hcheck settings and disabling ports at the switch while running a
sample IO load of about 15 MB/s. With iostat we observed a brief
10-second pause in IO (100% disk busy) on our hdisk when a path
failed. When the path was re-enabled there was no IO interruption for
recovery, but the degraded status lasted a full five minutes in
lsmpio. The path took up to 60 seconds to show as Enabled in lspath,
flipping at the top of the next minute. The behavior was consistent
across sequential tests.
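For anyone reproducing the test, a minimal sketch of the kind of
harness we used (disk and filesystem names are placeholders, and the
loop only approximates 15 MB/s):

# hypothetical load generator: rewrite ~15 MB of a JFS2 file each second
while true; do
    dd if=/dev/zero of=/testfs/io.load bs=1024k count=15 conv=notrunc >/dev/null 2>&1
    sleep 1
done &

# watch for the 100% busy stall on the disk under test
iostat -d hdisk4 1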
I find it interesting that errpt logs "additional FC info" as soon as
the link resumes, yet it can take a full minute before MPIO shows
that path as Enabled in lspath.
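One way to watch both clocks at once (hdisk4 is again a placeholder):

# FC events appear here as soon as the link resumes
errpt | head
# poll the MPIO view; the path flips to Enabled up to a minute later
while true; do date; lspath -l hdisk4; sleep 5; done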
Doing several tests in series, I confirmed that if a path is degraded
and we lose the last "good" path, MPIO will automatically use any
available path regardless of its degraded status. There was no IO
interruption when we removed the last good path and MPIO was forced
onto a degraded path.
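The degraded state is visible per path with the lsmpio commands Gary
describes below:

# "Deg" in the path_status column marks a recently recovered path
lsmpio -l hdisk4
# per-path error counters and timestamps explain why it is degraded
lsmpio -Sl hdisk4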
We did notice a longer pause on a VIO reboot, where we lost one vscsi
path and two NPIV HBAs at once; however, it was still under a minute
(~40 seconds). My test didn't break the interval down by path
technology; I just monitored the overall IO, watching for 100% disk
busy in iostat. All paths healed automatically when the VIO server
came back. I believe the longer duration was due to losing three
paths across two IO types: while a virtual NPIV adapter will respond
within 10 seconds to a link event, I think it takes longer when it
loses contact with its sibling adapter on the VIO server.
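If I were to repeat the test with per-technology reporting, grouping
paths by parent adapter would separate the two legs; a sketch
(adapter names will differ, and the -F field list should be checked
against your lspath level):

# vscsiX parents are the vscsi legs, fscsiX parents the NPIV legs
lspath -F "name parent connection status"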
My experience has been that outside of PowerHA critical VGs or the
watchdog on rootvg, a minute of IO delay is tolerated without error
by most database applications that use JFS2 filesystems on LVM for
storage. As long as you don't log an LVM I/O error or a PV missing
event, IO just continues.
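The post-test check is simply that the error log stayed clean;
something like the following (error labels and descriptions vary
slightly by AIX level):

# nothing LVM- or PV-related here means the pause went unnoticed
errpt | grep -i -e LVM -e "PHYSICAL VOLUME"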
I certainly don't miss RDAC, and the integration of SDDPCM features
into the base MPIO has been great. We've come a long way in storage
reliability without needing third party device drivers or additional
software.
------------------------------------------------------------------
Russell Adams Russell.Adams@AdamsSystems.nl
Principal Consultant Adams Systems Consultancy
http://adamssystems.nl/
Original Message:
Sent: 10/22/2021 2:14:00 AM
From: Tommi Sihvo
Subject: RE: AIXPCM MPIO / Recovery after path failure
Thanks, Gary, for the valuable info!
Br,
tommi
------------------------------
Tommi Sihvo, Lead Service Architect
TietoEVRY, Compute Services
email tommi.sihvo@tieto.com mobile +358 (0)40 5180 Finland
------------------------------
Original Message:
Sent: Thu October 21, 2021 01:15 PM
From: Gary Domrow
Subject: AIXPCM MPIO / Recovery after path failure
Everything that Russell said above is correct, but I can add just a couple more details that might be useful.
1. The initial "settling interval", as Russell called it, varies from 5 to 15 minutes based on the health check interval. AIX uses 5 times hcheck_interval, but limits the result to between 5 and 15 minutes.
2. During the settling interval, AIX continues to send health check commands on those paths. If any of those health check commands fail, the length of the interval is extended.
3. During the settling interval, the "path_status" column of the lsmpio output for the disk (e.g. lsmpio -l hdisk7) includes the status "Deg", which indicates that the path is degraded, meaning that it recently recovered from some error. Looking at the statistics and time stamps shown by lsmpio -Sl hdisk7 may give more information about why the path is currently degraded.
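To put item 1 into numbers (hdisk7 as in the example above; 60
seconds is the usual AIXPCM default, but check your own systems):

# inspect the health check interval on the disk
lsattr -El hdisk7 -a hcheck_interval
# settling interval = 5 x hcheck_interval, clamped to [5 min, 15 min]:
#   hcheck_interval=60  -> 300s            -> 5 minutes (Russell's observation)
#   hcheck_interval=30  -> 150s, clamped   -> 5 minutes
#   hcheck_interval=300 -> 1500s, clamped  -> 15 minutes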
------------------------------
Gary Domrow
Original Message:
Sent: Thu October 21, 2021 01:42 AM
From: Tommi Sihvo
Subject: AIXPCM MPIO / Recovery after path failure
Thanks Russell!
This was really good information!
And yes, we do have the mentioned settings in place, and we are using only NPIV, no vscsi.
Br,
tommi
------------------------------
Tommi Sihvo, Lead Service Architect
TietoEVRY, Compute Services
email tommi.sihvo@tieto.com mobile +358 (0)40 5180 Finland
Original Message:
Sent: Wed October 20, 2021 07:05 AM
From: Russell Adams
Subject: AIXPCM MPIO / Recovery after path failure
On Wed, Oct 20, 2021 at 09:51:47AM +0000, Tommi Sihvo via IBM Community wrote:
> But the curious thing he noted was that it took 5 minutes (under concurrent IO load) before the AIX client started really using the recovered paths.
> After 5 mins it started using those automatically, no manual actions required, so no issues with that. We are using AIXPCM drivers.
There is a "settling interval" during which a recovered path isn't
immediately used. This used to differ across PowerPath, MPIO, RDAC,
and other drivers. Both PowerPath and MPIO wait 5 minutes after a
path comes back before dispatching IO to it; however, I've never seen
this documented.
I believe it's to prevent causing more issues if a port is flapping.
If you test further, you'll discover that even if the path is
recovered and idle, it will be used immediately if there are further
path failures. This demonstrates that keeping the path idle is a
soft choice made by MPIO.
Did he notice yet that if you unplug all the paths causing a total
outage, that one path will still say Enabled and not Failed in lspath?
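It's easy to check while everything is down (the hdisk name is a
placeholder):

# even with every link pulled, one path typically still reads Enabled
lspath -l hdisk4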
> He also did similar testing with RHEL, and there Linux started using
> the recovered paths immediately when the failures were removed.
Using the path immediately exposes IO to degradation if the path is
flapping; it's just the simpler implementation.
> So my question would be is this standard AIX behaviour / something that is hard-coded on the driver itself?
> Or is there some parameter one could finetune to get the time smaller than 5 mins?
There's no need to reduce it. The paths aren't dead, just on standby.
What's absolutely critical is to confirm you have dyntrk, fast_fail,
and your vscsi options set correctly, so that path failures
auto-recover without administrator intervention:
chdev -l vscsiX -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P
Recent AIX and VIO levels default to dyntrk and fast_fail, but
dual-VIO setups with vscsi must have the client AIX vscsiX devices
set manually. If not, when the client fails over across VIOs, the
original path may never return until you perform a cfgmgr.
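To verify the resulting settings (adapter names here are examples;
note that -P defers the change to the next boot):

# FC protocol devices: dynamic tracking and fast fail
lsattr -El fscsi0 -a dyntrk -a fc_err_recov
# vscsi client adapters on dual-VIO vscsi setups
lsattr -El vscsi0 -a vscsi_path_to -a vscsi_err_recov
# if a path has already gone missing, rediscover it with
cfgmgr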
------------------------------------------------------------------
Russell Adams Russell.Adams@AdamsSystems.nl
Principal Consultant Adams Systems Consultancy
http://adamssystems.nl/
Original Message:
Sent: 10/20/2021 5:52:00 AM
From: Tommi Sihvo
Subject: AIXPCM MPIO / Recovery after path failure
Hi,
My colleague did some SAN path recovery testing.
In our default setup we have standard Dual VIOS NPIV setup, 4 virtual fibre channel adapters.
He cut 2 of the 4 SAN paths and restored them after seeing that everything worked as expected.
But the curious thing he noted was that it took 5 minutes (under concurrent IO load) before the AIX client started really using the recovered paths.
After 5 mins it started using those automatically, no manual actions required, so no issues with that. We are using AIXPCM drivers.
He also did similar testing with RHEL, and there Linux started using the recovered paths immediately when the failures were removed.
So my question would be: is this standard AIX behaviour, or something that is hard-coded in the driver itself?
Or is there some parameter one could fine-tune to get the time under 5 minutes?
------------------------------
Tommi Sihvo, Lead Service Architect
TietoEVRY, Compute Services
email tommi.sihvo@tieto.com mobile +358 (0)40 5180 Finland
------------------------------