Last year I promised to post my TL04 test results. This blog post is a bit more detailed, including my initial findings and the upgrade process.
Let's start with preparing TL03SP1 for the upgrade.
NIM client deinstall issues:
Normally the AIX upgrade process in combination with LKU is straightforward. However, because of a critical NIM security issue (CVE score 10), we had to apply an efix on both the NIM master and all the clients to mitigate this risk.
The removal of this efix on the client caused some issues.
The normal upgrade process for AIX TL03SP1 to TL04SP00 should be something like:
1. Stop TE (kernel mode): trustchk -p te=off
2. Remove all efixes that are applied to TL03SP1
3. Remove the IP-SEC IP filters and remove the ipsec_v4 and/or ipsec_v6 devices
This is needed because I had also installed an efix for IP-SEC and LKU updates.
4. Apply the new TL level with NIM / update_all.
5. Of course, we use LKU to activate the new TL04 level and keep our database and programs running while executing the LKU (geninstall -k)
6. Afterward we verify that the oslevel is TL04 and that everything is still running as expected.
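Step 6 lends itself to a small scripted check. The sketch below is only an illustration and not part of the tested procedure: it assumes a level string in the format printed by oslevel -s (for example 7300-04-00-2546), and the function names tl_of and check_tl are hypothetical.

```shell
# Hypothetical helper: extract the technology level from a level string.
# Assumed input format: RRRR-TL-SP-YYWW, for example 7300-04-00-2546
tl_of() {
    echo "$1" | awk -F- '{ print $2 }'
}

# Hypothetical helper: return 0 when the level string reports the
# expected TL, 1 otherwise.
check_tl() {
    level=$1
    expected=$2
    if [ "$(tl_of "$level")" = "$expected" ]; then
        echo "OK: $level is at TL$expected"
    else
        echo "MISMATCH: $level is not at TL$expected"
        return 1
    fi
}
```

On the upgraded LPAR this could be called as: check_tl "$(oslevel -s)" 04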
Removing the efixes went fine, except for the NIM client efix.
The main reason was a chicken-and-egg problem: when we remove the efix for the NIM client, the old NIM client wants to become active, but the NIM master is still running the patched version.
Therefore the uninstall command emgr -r -L <efix-nimclient-label> hangs during the removal of the NIM client efix.
The workaround is as follows:
1. Uninstall all efixes except the openssl ifix and the NIM ifix
2. export INUCLIENTS=yes
3. stopsrc -s nimsh
4. Uninstall the NIM ifix
5. unset INUCLIENTS
After removing this last efix, I was able to continue the normal procedure above from step 3.
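Steps 2 to 5 of the workaround can be wrapped in a small function so that INUCLIENTS is always unset again. This is a sketch under assumptions, not the exact script I ran: the function names run and remove_nim_efix and the DRY_RUN switch are hypothetical, and the efix label is a placeholder you have to fill in yourself. Step 1 (removing the other efixes) is left out on purpose.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is only printed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of workaround steps 2 to 5 for the NIM client efix.
# $1 = label of the NIM client efix (placeholder, site specific)
remove_nim_efix() {
    nim_label=$1
    INUCLIENTS=yes
    export INUCLIENTS             # step 2
    run stopsrc -s nimsh          # step 3
    run emgr -r -L "$nim_label"   # step 4
    rc=$?
    unset INUCLIENTS              # step 5, also when the removal failed
    return $rc
}
```

A dry run such as DRY_RUN=1 remove_nim_efix <efix-nimclient-label> only prints the two commands, which is a cheap way to check the order before running it for real.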
Initial LKU blackout time TL03 to TL04.
After the initial upgrade from TL03SP1 to TL04SP00 I looked up the blackout time, which was 11.539175 seconds.
Command: alog -t mobte -o | sed '/^$/d'| tail -1 | cut -d ' ' -f 40
So slightly better than TL03.
But more about improved blackout times later.
LLU updating our running programs with new libraries: libc.a and libpthreads.a
Of course I was very curious about the behavior of LLU, because LLU is officially released in TL04 (7300-04-00-2546).
So I wanted to run llvupdate and monitor its behavior.
On purpose, I did not stop any running program or database. I always use an IBM LDAP test server for my test cases, because that gives me DB2, ibmslapd, and some load on my test environment.
First I ran a preview to see which processes needed to be updated for the new libraries libc.a and libpthreads.a; see the output below:
llvupdate -P
llvupdate preview
An LLU-capable library is newer for process init pid 1.
Library needs to be updated /usr/lib/libpthreads.a(_shr_xpg5.o)
Validating new module /usr/lib/libpthreads.a(_shr_xpg5.o)
Library needs to be updated /usr/lib/libc.a(_shr.o)
Library needs to be updated /usr/lib/libpthreads.a(_shr_xpg5.o)
Library needs to be updated /usr/lib/libc.a(_shr.o)
An LLU-capable library is newer for process db2fmp pid 7995864.
~snip~
And a lot more lines here
~snip~
An LLU-capable library is newer for process db2vend pid 13566394.
Library needs to be updated /usr/lib/libpthreads.a(_shr_xpg5_64.o)
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process ibmslapd pid 17367362.
Library needs to be updated /usr/lib/libpthreads.a(_shr_xpg5_64.o)
Library needs to be updated /usr/lib/libc.a(_shr_64.o)
An LLU-capable library is newer for process ksh pid 17760576.
Library needs to be updated /usr/lib/libc.a(_shr.o)
So there was quite some work to do by llvupdate!
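Because the preview output gets long quickly, a small filter helps to see which processes are affected. The sketch below is an assumption on my side: it relies only on the line format shown in the output above ("An LLU-capable library is newer for process <name> pid <pid>."), and the function name llu_preview_procs is hypothetical. Process names containing spaces are not handled.

```shell
# Print "process-name pid" once for every process the preview reports.
# Reads llvupdate -P output on stdin.
llu_preview_procs() {
    awk '/^An LLU-capable library is newer for process / {
        pid = $NF
        sub(/\.$/, "", pid)      # strip the trailing period after the pid
        print $(NF - 2), pid
    }'
}
```

For a count per process name you could run: llvupdate -P | llu_preview_procs | awk '{ print $1 }' | sort | uniq -c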
After this I executed the command:
llvupdate -a -n2 -t60
All processes could be successfully updated with the new libraries, except the db2sysc processes. This was also the case with TL03, but the issue with this is again a chicken-and-egg problem!
Let me explain what the case is with this part of db2:
Both APARs IJ56521 and IJ55138 are available in TL04, so what is the problem?
# instfix -ik IJ56521
All filesets for IJ56521 were found.
# instfix -ik IJ55138
All filesets for IJ55138 were found.
Description of those APARs:
The crash is caused by either one of these APARs:
IJ56521: A SYSTEM CRASH MAY OCCUR DURING THE LLU OPERATION. APPLIES TO AIX 7300-04
IJ55138: SYSTEM CRASH WHEN LLU FAIL APPLIES TO AIX 7300-04
Unfortunately, those APARs only take effect after you have updated to TL04 AND after the application has been restarted. This makes LLU pointless for those processes during this upgrade.
This means that db2sysc can only be successfully LLU'd the next time, when we move to TL04SP1, and only if SP1 ships a new libc or libpthreads (most likely).
How to avoid kernel crashes with db2sysc running?
We have three options here:
Option 1: The simplest option is to stop and restart the program that is responsible for this process, in this case our LDAP server application and its underlying DB2 database.
Option 2: Exclude the db2sysc processes from being LLU’d (Ned invented a new verb here). This can be done with llvupdate -e process-id
For example, first figure out the PIDs of db2sysc:
ps -ef |grep db2sysc |grep -v grep| awk '{print $2}'
8782116
llvupdate -a -n2 -t60 -e 8782116
Pitfall: if more db2sysc processes are spawned while llvupdate is running, those new processes are not excluded.
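The PID lookup above can also be done with an exact match on the command name, which avoids the grep -v grep step and does not pick up unrelated lines that merely contain the string db2sysc. This is a sketch under assumptions: the function name db2sysc_pids is hypothetical, and it expects "pid command" pairs on stdin, for example from ps -e -o pid= -o comm= (assuming your ps supports the POSIX -o option).

```shell
# Print the PID of every process whose command name is exactly db2sysc.
# Reads lines of the form "<pid> <command>" on stdin; a leading path in
# the command is ignored.
db2sysc_pids() {
    awk '{
        n = split($2, parts, "/")
        if (parts[n] == "db2sysc") print $1
    }'
}
```

Usage: ps -e -o pid= -o comm= | db2sysc_pids. Collecting the PIDs immediately before calling llvupdate narrows the window from the pitfall, but it does not close it. How to pass more than one PID to -e is not covered here; check the llvupdate documentation for that.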
Option 3: Another option is to make use of a sort of exclude file for LKU and LLU:
This file is located at /etc/liveupdate/lvup_BaseProcs
See content of this file below:
internal:
lu_req1 = "/usr/ccs/bin/shlap64"
lu_req2 = "/usr/sbin/syncd"
lu_req3 = "/usr/sbin/rpc.lockd"
lu_req4 = "/usr/sbin/uprintfd"
lu_req5 = "/usr/sbin/nfsd"
lu_req6 = "/usr/sbin/rpc.mountd"
lu_req7 = "/usr/lib/errdemon"
lu_req8 = "/usr/sbin/olvupdate"
lu_req9 = "/usr/sbin/getty"
lu_req10 = "/usr/sbin/automountd"
commands:
db2sysc = /data/ldapdb/sqllib/adm/db2sysc
Warnings!
Warning 1: Adding the db2sysc executable to this list must be done with this command:
lvupdateSetProcs -a -b -n db2sysc -c /data/ldapdb/sqllib/adm/db2sysc
Verify this via command:
lvupdateSetProcs -l -b
Adding it to the file /etc/liveupdate/lvup_BaseProcs does work, but LKU also references this file.
In other words, as soon as you are finished with the LLU, do NOT forget to remove the line again; otherwise LKU might skip moving the db2sysc process to the new kernel.
Use the command below for this:
lvupdateSetProcs -r -b -n db2sysc -c /data/ldapdb/sqllib/adm/db2sysc
Warning 2: I did not completely test this with an upgrade from TL03 to TL04; also, IBM's advice was to use option 2.
But the good news is that LLU is fully supported now and the right APARs are available. My tests for the other processes, such as ssh, ntp, nfs and so on, were all successful.
After being moved to TL04 the next steps…
Upload and enable TE / RBAC
First, I wanted to enable all the basic things for us, such as TE (Trusted Execution) and the IP-SEC IP filters.
As many of my readers know, TE is a kind of basic security feature that we use to protect our executables and libraries. A new TE feature (policy) for runtime verification also became available, which I will discuss later on in this blog post.
So the first thing I always do after installing a new version of AIX is offload the TE databases tsd.dat, lib.tsd.dat and tepolicies.dat to our IBM LDAP servers. We also store our RBAC databases on LDAP. See also the examples in my older RBAC blog series:
https://community.ibm.com/community/user/power/blogs/christian-sonnemans1/2024/09/27/aix-advanced-rbac-part-4
Storing all those databases on LDAP was not a problem, and LDAP user and password authentication caused no issues either.
But reading the TE databases from our LDAP server was an issue.
The problem here is that new functionality in the secldapclntd daemon broke this specific TE functionality. Fortunately, this was not the case for the RBAC databases.
Solution: I opened a case and have already received a test efix that fixes this problem.
IBM recognized this as a defect, and an official fix will be created. After applying this test efix I only had to restart the secldapclntd daemon (restart-secldapclntd).
Once the official fix is available I will update this blog with the efix number.
Enable IP-SEC filters again.
The next thing was to load and enable our IP-SEC filters again. In the future this will no longer be necessary, but remember that during the de-installation step I had to remove the patch for IP-SEC.
Due to this removal, I could not use the new feature in LKU; see the example stanza in lvupdate.data below:
general:
kext_check = yes
ipsec_auto_migrate = yes
So, prior to the upgrade I once more had to stop the IP-SEC filter rules and remove the ipsec_v4 device (rmdev -Rdl ipsec_v4).
The next step should now be easy: create the ipsec_v4 device again and load our filter rules. See the examples below:
/usr/sbin/mkdev -c ipsec -t 4
/usr/sbin/genfilt -n 2 -v 4 -a P -s 0.0.0.0 -m 0.0.0.0 -d 0.0.0.0 -M 0.0.0.0 -g N -c udp -O eq -P 53 -r L -w O -l N -f Y -i all -D "dns"
But on all the upgraded LPARs I got the same error:
ipsec_v4 Available
Can not get IPv4 default filter rule.
0519-002 libodm: The CLASS_SYMBOL does not identify a valid object class.
Check parameters...
Again I had to open a case for this, and it now seems that two files were corrupted during the upgrade: /etc/security/ipsec_filter and /etc/security/ipsec_filter.vc
Removing both files was the solution here.
I am still working on this case to find out why those files were corrupted, and I will update this blog post when I have the final solution. The workaround for now is to remove those files and then load the IP filter rules.
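Since the root cause is still open, it can be useful to keep the corrupted copies for the support case instead of deleting them. The sketch below is my own assumption and not the procedure I tested: it moves the two files aside under a new name, which frees the original names just like removing them does. The function name set_aside_ipsec_files and the suffix naming scheme are hypothetical.

```shell
# Move the two filter files aside instead of deleting them, so that the
# corrupted copies stay available for the support case.
# $1 = directory holding the files (defaults to /etc/security)
# $2 = suffix for the saved copies (hypothetical naming scheme)
set_aside_ipsec_files() {
    dir=${1:-/etc/security}
    suffix=${2:-corrupt.$(date +%Y%m%d%H%M%S)}
    for f in ipsec_filter ipsec_filter.vc; do
        if [ -f "$dir/$f" ]; then
            mv "$dir/$f" "$dir/$f.$suffix" && echo "moved $dir/$f"
        fi
    done
}
```

Called without arguments it works on /etc/security; after that the mkdev and genfilt commands shown above can be run again.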
Blackout improvements TL03SP1 compared with TL04
After all the wrinkles had been ironed out, I ran the promised LKU blackout test.
See also this article by Vinod Kumar Boddukuri:
What’s New in AIX 7.3 TL4: Live Kernel Update Enhancements
For this test I used my test system with the LDAP / DB2 instance and ran the LKU test in a loop for both TL03SP1 and TL04. I repeated this loop test 28 times to get a good estimate of the difference between TL03SP1 and TL04.
For TL03SP1 the average blackout time was 13.19775008 seconds and for TL04SP0 the average blackout time was 11.36420886 seconds.
So for this rather small LPAR the blackout time on TL04 is 13.9% shorter than on TL03SP1.
Of course, this time is very specific for this environment and strongly depends on the number of processes, the number of filesystems and the memory usage.
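For anyone who wants to repeat the comparison on their own LPAR, the arithmetic is simple enough for awk. This is a minimal sketch; the function names mean and pct_reduction are hypothetical, and it assumes you have collected one blackout time per line, for example with the alog command shown earlier.

```shell
# Average of the values on stdin, one blackout time (seconds) per line.
mean() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.6f\n", sum / n }'
}

# Relative reduction in percent going from the old to the new average.
# $1 = old average, $2 = new average
pct_reduction() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (old - new) / old * 100 }'
}
```

With the two averages from my test, pct_reduction 13.19775008 11.36420886 prints 13.9.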
But big compliments to Vinod and his team for improving this technology once again!
New policy for TE (Trusted Execution)
See also this article by BUKAI Biswas:
Runtime Verification support for library shared objects
With TL04 a new TE policy has been introduced for shared library objects: CHKSHOBJS.
This policy is useful when you run TE in kernel mode. (trustchk -p te=on)
This policy is only effective when CHKSHLIB is set (CHKSHLIB=ON). It then checks the integrity of the shared object (.o) files that are listed in lib.tsd.dat.
If the check fails, the shared object is not loaded. This is an extra security policy that can be used.
Be aware that this policy is influenced by two other TE policies, STOP_UNTRUSTD and STOP_ON_CHKFAIL; see the table below:

Of course, there were a lot more enhancements in TL04, which I will discuss in upcoming blog posts. See also this nice video by Andrey Klyachkin about emgr_check_ifixes: AIX 7.3 TL4: Patch management with emgr_check_ifixes (YouTube).
If you like this blog or if you have any questions, please send me your questions and comments. I am always ready to answer them and happy to have a good conversation.