A customer's NIM environment recently started failing on every NIM operation. We had enabled SSL for NIMSH on all clients, both to improve security and to keep LPM operations across systems from breaking NIM control because of changed CPU IDs.
We found out the hard way that the NIM master generates custom SSL certificates in /ssl_nimsh that are valid for only one year. At the end of that year, every LPAR rejected NIM operations, including our weekly mksysb backups.
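A periodic expiry check along these lines would have warned us before the certificates lapsed. This is only a sketch under a few assumptions, not part of the NIM tooling: the function name check_cert_expiry and the 30-day warning window are made up for illustration, the default path is the certificate file our clients reported in their logs, and that file is assumed to be a PEM certificate that the openssl command can read.

```shell
#!/bin/sh
# Hypothetical helper: warn when a certificate is close to expiry.
#   $1  path to a PEM certificate (assumed default: the file named in the client log)
#   $2  warning window in days (30 is an arbitrary choice)
# Returns 0 if the certificate is good past the window, 1 if it expires
# within the window or has already expired, 2 if it cannot be read.
check_cert_expiry() {
    cert=${1:-/ssl_nimsh/certs/MASTER.0}
    days=${2:-30}

    if [ ! -r "$cert" ]; then
        echo "UNREADABLE: $cert"
        return 2
    fi

    # -enddate prints the notAfter field of the certificate.
    end=$(openssl x509 -noout -enddate -in "$cert" 2>/dev/null) || {
        echo "UNREADABLE: $cert"
        return 2
    }

    # -checkend N exits non-zero when the certificate expires
    # within the next N seconds.
    if openssl x509 -noout -checkend $((days * 86400)) -in "$cert" >/dev/null 2>&1; then
        echo "OK: $cert ($end)"
        return 0
    fi

    echo "WARNING: $cert expires within $days days or has expired ($end)"
    return 1
}
```

Running something like check_cert_expiry /ssl_nimsh/certs/MASTER.0 60 from cron on a client, and pointing the same function at whichever certificate files exist under /ssl_nimsh on the master, turns a silent expiry into a warning with weeks of notice.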
The error messages are completely obfuscated, so we could not tell that the certificate had expired. The message on the NIM master was generic:
# nim -o lslpp NIMCLIENT
0042-001 nim: processing error encountered on "master":
0042-006 m_lslpp: (From_Master) connect Error 0
0042-404 nconn: Error connecting to SSL object.
0042-406 nconn: Error verifying SSL object after connection.
405 nconn: Error with certificate at depth .
and the client log had little information:
Thu Jul 11 13:21:11 2024 success: we got 1st write query is 0
Thu Jul 11 13:21:11 2024 success: we got 2nd write local id is 00XXXXXX00
Thu Jul 11 13:21:11 2024 success: we got 3rd write remote id is 00XXXXXX00
Thu Jul 11 13:21:11 2024 success: we got 4th write command is /usr/lpp/bos.sysmgt/nim/methods/c_nimpush "/usr/lpp/bos.sysmgt/nim/methods/c_ckspot" "-l" "-ast_applied=3" "-ast_committed=5" "-aplatform=yes" "-aname=NIMCLIENT" "-alocation=/usr"
Thu Jul 11 13:21:11 2024 passing OpenSSL setting of 1
Thu Jul 11 13:21:11 2024 set symbol table
Thu Jul 11 13:21:11 2024 cert filename discovered: /ssl_nimsh/certs/MASTER.0
Thu Jul 11 13:21:11 2024 seed_prng
Thu Jul 1