VIO-FC quirks of LKU and AIX 7.3 TL03/04

By Christian Sonnemans posted Thu February 05, 2026 07:46 AM

  


In my last blog I described my first tests with LKU / LLU on TL04. During those tests I discovered an issue where LKU very occasionally lost an NPIV (MPIO) path. I was already familiar with this issue on TL03 and had logged a case for it, but this rare problem also seems to exist in TL04. Is it an LKU problem or something else?

What happens? During the last (post) phase of LKU, a cleanup of the "old" LPAR (the so-called original) takes place, and the process also re-enables all the NPIV paths.

What I noticed was the following:

Right after the blackout time, LKU tries to re-establish all the NPIV paths, and this succeeds.

But later, in the cleanup phase, one path disappeared again.

The LKU process also produced a message similar to the example below (but not always):

430-249 An error occurred during post-processing.

09/15/2025-08:32:37 Initiating log capture.

Local log capture in progress and can be tracked from:

/tmp/snapimg/lu_2025-09-15_08-32-32_capture.log

On completion, log archive will be stored as:

/tmp/snapimg/lu_2025-09-15_08-32-32_snap.pax.gz

1430-094 The live update operation completed, but one or more errors occurred during post-processing after the update was applied.

At first, Ned (IBM) and I thought it was a timing issue, but the problem lay a bit further down the NPIV road. We therefore created a new case, which was taken up by IBM VIO expert Lyudmila.

The root cause of this issue was very hard to find, because it occurs so rarely: on average, once in ten consecutive LKU runs.

Lyudmila and I therefore spent several months on testing, data collection, and reviewing the debug data, from both the LPAR where I started the LKU and the VIO servers.

Below are Lyudmila's perspective and findings:

During the LKU phase, once the LPAR boots and the final configuration of the vFC ports occurs, the vFC client issues a fabric login, followed by a "discover targets" query to the SAN switch Name Server. The Name Server is then expected to return a list of Port Identifiers that are also registered on the SAN. However, in our case the switch query failed with errno=6 (ENXIO, No such device), as seen in the LPAR boot log:

MS 14483844 12255740 timestamp /usr/lib/methods/cfg_vfc -l fcs1

M0 14483844 timestamp cfg_vfc.c 1090 name of child fscsi1

MS 14483846 12255740 timestamp /usr/lib/methods/cfgemfscsi -l fscsi1

M0 14483846 timestamp cfgfscsidc.c 9531 send_gid_ft: SCIOLNMSRV failure: rc=-1, errno=6

The errpt log confirmed that the vFC adapter was disconnected at that time, so the query to the Name Server also failed:

errpt -a |summ

Dec 16 13:22:38 LVUPDATE   I LVUP_DONE           Live AIX update completed successfully

Dec 16 13:21:47 fscsi1     T FCP_ERR12           ioctl() send seq failure for FFFFFC; ENXIO  [scsi_state LINK_DOWN|

Dec 16 13:21:47 fcs1       P VFC4_ERR3           VFC_ERR_LOC_242 viosN vfchostN vfc transport failed or de-registered; sending async link_down

Dec 16 13:19:30 errdemon   T ERRLOG_ON           ERROR LOGGING TURNED ON

Dec 16 13:10:09 LVUPDATE   I LVUP_STARTED        Live AIX update started

Looking further at VIOS errpt, the vfchost error explained that the client adapter was not yet logged in to the switch but had already attempted to issue an additional query.

errpt -a|summ| grep VIOS_VFC_CLIENT_FAI

Dec 16 13:21:47 vfchost17 T VIOS_VFC_CLIENT_FAI NPIV_ERR_0185 Misbehaved Virtual FC Client Invalid MAD command sent before Fabric Login

Info about the tool summ: IBM AIX Diagnostic Tool "summ", a summarized system error log and report generator for I/O devices.

We identified the root cause: the physical FC adapter setting sw_prli_rjt=no.

As Lyudmila explained, the final root cause is a combination of LKU and a specific type of Fibre Channel adapter in our Power10 systems: the EN1G.

A big shoutout to Lyudmila (Lyudmila Simeonova) and Ned (Nayden Stoyanov).

Ned and Lyudmila deserve a big compliment for their commitment, and especially their perseverance, in unraveling this difficult problem. Their collaboration was once again excellent!

Note: see also the article below, which also describes this problem:

Certain Fibre Adapters in IBM Power Systems are Incorrectly Registered as Targets instead of Initiators in the SAN fabric

Lyudmila's explanation:

With this configuration, VIOS FC adapters (initiators) are allowed to perform process logins to themselves or to other initiators, and if that happens, it may disrupt proper fabric login and SAN storage discovery for NPIV client adapters mapped to the same physical FC ports. This behavior resulted in the observed GID_FT failures and the vfchost error during the cleanup/post phase of LKU.

Friendly warning:

Please check the Fibre Channel adapters in your VIO servers:

If you recognize similar problems at the end of an LKU process, please check your Fibre Channel adapter type!

In other words, please check whether you have installed any of the Fibre Channel adapters in the list below, and whether you are missing an NPIV path after an LKU has taken place:

FC EN0Y & EN12; CCIN EN0Y
FC EN0F & EN0G; CCIN 578D
FC 5708 & 5270; CCIN 2B3B
FC EN1E & EN1F; CCIN 579A
FC EN1J & EN1K; CCIN 579C
FC EN1G & EN1H; CCIN 579B
FC EN2N & EN2P; CCIN 2F05
FC EN2L & EN2M; CCIN 2F06
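To compare your adapters against the list above, you can read the CCIN from the adapter VPD. A minimal sketch, assuming the usual "Customer Card ID Number" label in `lscfg -vpl` output (verify the exact layout on your system):

```shell
# extract_ccin: pulls the CCIN out of `lscfg -vpl fcsX` output fed on stdin.
# Splits on runs of dots so "Customer Card ID Number.....579B" yields "579B".
extract_ccin() {
    awk -F'[.]+' '/Customer Card ID Number/ {print $2}'
}

# On AIX / VIOS (after oem_setup_env) you might loop over the adapters, e.g.:
#   for f in $(lsdev -Cc adapter -S a -F name | grep '^fcs'); do
#       printf '%s CCIN=%s\n' "$f" "$(lscfg -vpl $f | extract_ccin)"
#   done
```

The loop in the comment is an illustration; adapter names and the lsdev filter depend on your configuration.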

Temporary workaround:

The strange thing is that after this failure you can run cfgmgr, and you will notice that the missing path is restored afterwards. This is one of the reasons I did not discover the issue earlier: our LKU scripts always check after an LKU whether any paths are missing and, if so, automatically run cfgmgr.
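That post-LKU check can be sketched as follows; it assumes the default `lspath` output format of "status device parent" (verify on your system):

```shell
# missing_path_devs: given `lspath` output on stdin, print the devices
# that have at least one path in Missing state.
missing_path_devs() {
    awk '$1 == "Missing" {print $2}' | sort -u
}

# After an LKU you might use it like our scripts do (hypothetical usage):
#   if [ -n "$(lspath | missing_path_devs)" ]; then cfgmgr; fi
```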

Remediation of this problem

If you encounter these errors during LKU, the following change at the VIOS FC adapter(s) resolves the issue.

$ chdev -dev fcs# -perm -attr sw_prli_rjt=yes   <-- repeat this for every adapter.
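Repeating the chdev for every adapter can be scripted. A minimal sketch that only generates the commands (so you can review them before running; the lsdev pipeline in the comment is an assumption, check it on your VIOS):

```shell
# gen_prli_cmds: emit one chdev command per FC adapter name fed on stdin.
gen_prli_cmds() {
    while read -r fcs; do
        printf 'chdev -dev %s -perm -attr sw_prli_rjt=yes\n' "$fcs"
    done
}

# On the VIOS (as padmin) you might feed it the real adapter list, e.g.:
#   lsdev -type adapter | awk '/^fcs/ {print $1}' | gen_prli_cmds
```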

$ oem_setup_env

# bosboot -ad /dev/ipldevice

Warning: make sure that the other paths are established via the second VIO server!

And/or de-configure, on your LPARs, the paths to the VIOS that you want to reboot.

$ shutdown -restart 

Reboot the VIOS to activate the new setup.

After the change, or after the reboot of the VIO server, you can check the adapter settings as follows:

Note that sw_prli_rjt is a hidden attribute.

You can check the effective/current value in kdb as shown below:

oem_setup_env

echo "cvfcs" | kdb

Then use the fcs device address from the output and execute the following:

echo "cvfcs -d address" | kdb | grep sw_prli_rjt

char sw_prli_rjt = 0x0 = no   <-- incorrect value (default still active)

char sw_prli_rjt = 0x1 = yes  <-- correct value: yes
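If you check many adapters, extracting just the yes/no value from that kdb line keeps the output readable. A minimal sketch, assuming the line shape "char sw_prli_rjt = 0x1 = yes" shown above:

```shell
# prli_effective: print the effective value (yes/no) from a kdb dump line
# of the form "char sw_prli_rjt = 0x1 = yes" fed on stdin.
prli_effective() {
    awk '$2 == "sw_prli_rjt" {print $6}'
}

# On AIX (in oem_setup_env) you might combine it with kdb, e.g.:
#   echo "cvfcs -d address" | kdb | prli_effective
```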

Documentation reference (Lyudmila): https://www.ibm.com/support/pages/node/7260048

Special thanks to Lyudmila Simeonova and Nayden Stoyanov for fixing this issue.

If you like this blog, or if you have any other questions, please send your questions and comments. I am always ready to answer them, and always up for a good conversation.
