Hello Shubhangi Tripathi,
Strange situation indeed, and an interesting problem.
And yes, Andrey is right: posting it in the right discussion group (Power) will help.
Also, this is indeed an older version of AIX 7.2, but anyway…
This is how I would try to find the root cause (maybe you have already done some of this):
First: when did this happen for the first time? Was it after changing or implementing something?
Has the load on the system changed, or are more processes being started?
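To answer the load question with data instead of memory, you could log the total number of processes at regular intervals and look at the trend over the 20-25 days before a failure. This is only a minimal sketch; the function name proc_snapshot is hypothetical:

```shell
# Hypothetical helper: print one line with a timestamp and the current number
# of processes. "ps -e -o pid=" lists one PID per line without a header line.
# The count is approximate: it also includes the ps process itself.
proc_snapshot() {
    count=$(ps -e -o pid= | wc -l | tr -d ' ')
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M')" "$count"
}
```

Put the function plus a call to it in a small script and run it from root's crontab, for example every 15 minutes with the minute field 0,15,30,45 and the output appended to a log file under /var/tmp (script and log names are up to you). AIX cron does not accept the */15 notation, hence the explicit minute list.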
Check the entries in /etc/inittab, more specifically the lines above this one:
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
On our 7.2 systems the order is:
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | /usr/bin/alog -tboot > /dev/console # Power Failure Detection
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | /usr/bin/alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
nimsh:2:wait:/usr/bin/startsrc -e "LIBPATH=/usr/lib" -g nimclient >/dev/console 2>&1
aso:23456789:once:/usr/bin/startsrc -s aso
ofed:2:wait:/usr/sbin/ofedctrl -l >/dev/null 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
ldapclntd:2:wait:/usr/sbin/start-secldapclntd > /dev/console 2>&1
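To compare your own /etc/inittab with the list above, a small sketch like the following could help. It assumes the standard AIX inittab layout (Identifier:RunLevel:Action:Command, with comment lines starting with a colon); entries_before_srcmstr is just a hypothetical helper name. On AIX itself, lsitab srcmstr shows only the srcmstr entry.

```shell
# Hypothetical helper: print every inittab entry that comes before the srcmstr
# entry, so anything that could block or delay srcmstr at boot stands out.
# Returns 1 if no srcmstr entry is found at all.
entries_before_srcmstr() {
    awk '
        /^:/ || NF == 0 { next }             # skip AIX comment lines (":") and blank lines
        /^srcmstr:/     { found = 1; exit }  # stop at the srcmstr entry itself
                        { print }            # anything else precedes srcmstr
        END             { exit (found ? 0 : 1) }
    ' "${1:-/etc/inittab}"
}
```

Without an argument it reads /etc/inittab; you can also pass a copy of the file from a healthy server and compare both outputs.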
When it fails again, check all RSCT subsystems with:
lssrc -a | grep rsct
ctrmc rsct 9044692 active
IBM.HostRM rsct_rm 54723020 active
IBM.ConfigRM rsct_rm 58589612 active
IBM.DRM rsct_rm 1573384 active
IBM.MgmtDomainRM rsct_rm 62652820 active
IBM.ServiceRM rsct_rm 16580906 active
Try to start any failing (inoperative) subsystem with startsrc -s <subsystem_name>.
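The two steps above could be combined in a sketch like this one, assuming the usual lssrc -a columns (Subsystem, Group, PID, Status, with an empty PID for inoperative subsystems). The function names are hypothetical:

```shell
# Hypothetical helper: read "lssrc -a" output on stdin and print the names of
# RSCT subsystems (groups rsct and rsct_rm) whose status is inoperative.
inoperative_rsct() {
    awk '($2 == "rsct" || $2 == "rsct_rm") && $NF == "inoperative" { print $1 }'
}

# Hypothetical helper: try to start each inoperative RSCT subsystem.
# Run as root on the AIX server; it needs a working srcmstr.
start_inoperative_rsct() {
    lssrc -a | inoperative_rsct | while read -r subsys; do
        echo "starting $subsys"
        startsrc -s "$subsys"
    done
}
```

Keep in mind that lssrc and startsrc talk to srcmstr, so if srcmstr itself is down they cannot help and srcmstr has to be checked first.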
Check again with errpt -a to monitor new errors.
Check with ps -ef | grep srcmstr whether the srcmstr process is running.
If it is not running, try to start it manually in the background as root with: /usr/sbin/srcmstr &
Check again with errpt -a to monitor new errors.
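For these last checks, a minimal sketch under some assumptions: that ps -e -o comm= lists the bare command names (one per line, no header) on your AIX level, and that errpt -s takes a start date in the mmddhhmmyy form, which matches the timestamps in the errpt summary output. The helper names are hypothetical:

```shell
# Hypothetical helper: succeed if a process with exactly this command name is
# running. Unlike "ps -ef | grep srcmstr", it cannot match the grep itself.
proc_running() {
    ps -e -o comm= | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Hypothetical helper: current time in the mmddhhmmyy format used by errpt -s,
# so "errpt -a -s <stamp>" only shows entries logged after that moment.
errpt_stamp() {
    date +%m%d%H%M%y
}
```

For example: save stamp=$(errpt_stamp), start /usr/sbin/srcmstr & if proc_running srcmstr fails, and afterwards run errpt -a -s "$stamp" to see only the new entries.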
Check whether the /var filesystem has enough free space; if not, add some space and try again.
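A quick way to check this from a script, as a sketch assuming your df supports the POSIX -P output (df -g /var shows the same information in GB on AIX). The helper names and the 90% threshold are hypothetical choices:

```shell
# Hypothetical helper: print how full a filesystem is, as a bare percentage.
# "df -P" gives the POSIX layout: one header line, then one data line with
# the capacity (for example "87%") in field 5.
fs_used_pct() {
    df -P "${1:-/var}" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Hypothetical helper: warn when the filesystem is at or above a threshold
# (default 90%, an arbitrary assumption).
check_fs_space() {
    used=$(fs_used_pct "$1")
    if [ "$used" -ge "${2:-90}" ]; then
        echo "WARNING: $1 is ${used}% full"
        return 1
    fi
    echo "$1 is ${used}% full"
}
```

If /var is full, a JFS2 filesystem can usually be grown online with chfs -a size=+1G /var, provided the volume group has free space.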
For more detailed logging you can use alog:
alog -ot console | grep srcm
0 Fri Jun 20 10:02:07 CEST 2025 Checking for srcmstr active... 0 Fri Jun 20 10:02:07 CEST 2025 complete
0 Mon Jun 23 11:51:28 CEST 2025 Checking for srcmstr active... 0 Mon Jun 23 11:51:28 CEST 2025 complete
0 Mon Jun 23 16:12:15 CEST 2025 Checking for srcmstr active... 0 Mon Jun 23 16:12:15 CEST 2025 complete
B.t.w., I remember one case where we ran into the maxuproc limit (maximum number of processes per user) and got weird behavior.
You can check it with: lsattr -EHl sys0 | grep maxuproc (the default is 16384).
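To see whether any user actually gets close to that limit, you could count processes per user and compare the counts with maxuproc. A sketch, assuming lsattr -El sys0 -a maxuproc -F value prints just the value; the helper name and the 80% threshold are hypothetical:

```shell
# Hypothetical helper: read one user name per line on stdin (for example from
# "ps -e -o user=") and report users whose process count reaches the given
# percentage of the maxuproc limit.
users_near_maxuproc() {
    limit=$1          # maxuproc value, e.g. from: lsattr -El sys0 -a maxuproc -F value
    pct=${2:-80}      # warning threshold in percent of the limit (assumption: 80%)
    sort | uniq -c | awk -v limit="$limit" -v pct="$pct" '
        $1 * 100 >= limit * pct { printf "%s %d of %d\n", $2, $1, limit }
    '
}
```

On the server this would be used as: ps -e -o user= | users_near_maxuproc "$(lsattr -El sys0 -a maxuproc -F value)"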
Greetings Christian Sonnemans.
------------------------------
Christian Sonnemans
Tactical Unix system engineer
AsnBank
Den Bosch
------------------------------
Original Message:
Sent: Sat June 28, 2025 03:57 PM
From: Shubhangi Tripathi
Subject: SRC daemon becomes inactive and does not respawn even after inittab entry
Hi Team
We have been facing a weird recurring issue on one of our AIX 7.2 servers (7200_05_04): first RMCdaemon stops, then IBM.ConfigRM stops, and finally the srcmstr daemon becomes inactive. This breaks all our sessions to the server and leaves us no option but to reboot it. The reboot resolves the issue, but it recurs within 20-25 days. We have an entry in /etc/inittab to respawn the srcmstr daemon; however, srcmstr does not restart automatically when this happens. We are still trying to find the root cause. Please share any suggestions or information in case you have faced a similar issue.
Entry in the /etc/inittab file:
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
Error log:
DE84C4DB 0625103425 I O ConfigRM IBM.ConfigRM daemon has started.
A6DF45AA 0625103425 I O RMCdaemon The daemon is started.
3CACA614 0625103125 I O sys0 Partition boot reason.
69350832 0625103125 T S SYSPROC SYSTEM SHUTDOWN BY USER
9DBCFDEE 0625103125 T O errdemon ERROR LOGGING TURNED ON
192AC071 0625102825 T O errdemon ERROR LOGGING TURNED OFF
447D3237 0625092825 I O ConfigRM IBM.ConfigRM daemon has been stopped.
A2D4EDC6 0625092825 I O RMCdaemon The daemon is stopped.
=====================
LABEL: RMCD_INFO_1_ST
IDENTIFIER: A2D4EDC6
Date/Time: Wed Jun 25 09:28:05 CDT 2025
Sequence Number: 28481
Machine Id: XXXXX
Node Id: XXXXX
Class: O
Type: INFO
WPAR: Global
Resource Name: RMCdaemon
Description
The daemon is stopped.
Probable Causes
The Resource Monitoring and Control daemon is stopped.
User Causes
1. The stopsrc -s ctrmc command has been executed.
2. The stopsrc -fs ctrmc command has been executed.
3. The stopsrc -cs ctrmc command has been executed or
the rmcctrl -k command has been executed.
Recommended Actions
Confirm that the daemon should be stopped.
Detail Data
DETECTING MODULE
RSCT,rmcd.c,1.151.1.1,1337
ERROR ID
64rCpW0pR.Lc/XHV.8.cZ8....................
REFERENCE CODE
Number of command that stopped the daemon
3
========================
LABEL: CONFIGRM_STOPPED_ST
IDENTIFIER: 447D3237
Date/Time: Wed Jun 25 09:28:05 CDT 2025
Sequence Number: 28482
Machine Id: XXXXXX
Node Id: XXXXX
Class: O
Type: INFO
WPAR: Global
Resource Name: ConfigRM
Description
IBM.ConfigRM daemon has been stopped.
Probable Causes
The RSCT Configuration Manager daemon(IBM.ConfigRMd) has been stopped.
User Causes
The stopsrc -s IBM.ConfigRM command has been executed.
Recommended Actions
Confirm that the daemon should be stopped. Normally, this daemon should
not be stopped explicitly by the user.
Detail Data
DETECTING MODULE
RSCT,ConfigRMDaemon.C,1.32,282
ERROR ID
REFERENCE CODE
------------------------------
Shubhangi Tripathi
------------------------------