Power Global

Connect, learn, share, and engage with IBM Power.

  • 1.  SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Sat June 28, 2025 03:57 PM

    Hi Team 

    We have been facing a strange recurring issue on one of our AIX 7.2 servers (7200_05_04). First the RMC daemon stops, then IBM.ConfigRM stops, and finally the srcmstr daemon becomes inactive. This breaks all our sessions to the server and leaves us no option but to reboot it. The reboot resolves the issue, but it recurs within 20-25 days. We have an entry in /etc/inittab to respawn the srcmstr daemon; however, it does not restart automatically when this issue occurs. We are still trying to find the root cause. Please share any suggestions or information if you have ever faced a similar issue.

    Entry in the /etc/inittab file:

    srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
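    For reference, each /etc/inittab entry has four colon-separated fields: Identifier:RunLevel:Action:Command. With the respawn action, init starts the process if it is not running and starts it again when it exits. A minimal, illustrative Python sketch that splits an entry into those fields; it assumes the command field may itself contain colons (so only the first three are split on) and skips lines starting with ':' (the AIX way of commenting out an entry). The names InittabEntry and parse_inittab_line are hypothetical.

```python
from typing import NamedTuple, Optional

class InittabEntry(NamedTuple):
    ident: str      # unique identifier, e.g. "srcmstr"
    runlevels: str  # run levels in which init processes the entry, e.g. "23456789"
    action: str     # how init treats the process: "respawn", "wait", "once", ...
    command: str    # shell command run by init (may itself contain ':')

def parse_inittab_line(line: str) -> Optional[InittabEntry]:
    """Split one inittab line into its four fields; None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith(":"):
        return None  # blank line, or an entry commented out with a leading ':'
    parts = line.split(":", 3)  # split on the first three colons only
    if len(parts) != 4:
        raise ValueError(f"malformed inittab line: {line!r}")
    return InittabEntry(*parts)

entry = parse_inittab_line(
    "srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller")
print(entry.ident, entry.runlevels, entry.action)  # srcmstr 23456789 respawn
```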

    Error log:


    DE84C4DB   0625103425 I O ConfigRM       IBM.ConfigRM daemon has started.
    A6DF45AA   0625103425 I O RMCdaemon      The daemon is started.
    3CACA614   0625103125 I O sys0           Partition boot reason.
    69350832   0625103125 T S SYSPROC        SYSTEM SHUTDOWN BY USER
    9DBCFDEE   0625103125 T O errdemon       ERROR LOGGING TURNED ON
    192AC071   0625102825 T O errdemon       ERROR LOGGING TURNED OFF
    447D3237   0625092825 I O ConfigRM       IBM.ConfigRM daemon has been stopped.
    A2D4EDC6   0625092825 I O RMCdaemon      The daemon is stopped.
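    The TIMESTAMP column of the errpt summary uses the mmddHHMMyy format, so 0625092825 is 25 June 2025, 09:28. A small Python sketch that decodes the summary lines above and lists them oldest first; it assumes the default IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION layout shown here, and parse_errpt_summary is a hypothetical helper name.

```python
from datetime import datetime

# Summary lines copied from the errpt output above (errpt lists newest first).
SUMMARY = """\
DE84C4DB   0625103425 I O ConfigRM       IBM.ConfigRM daemon has started.
A6DF45AA   0625103425 I O RMCdaemon      The daemon is started.
3CACA614   0625103125 I O sys0           Partition boot reason.
69350832   0625103125 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0625103125 T O errdemon       ERROR LOGGING TURNED ON
192AC071   0625102825 T O errdemon       ERROR LOGGING TURNED OFF
447D3237   0625092825 I O ConfigRM       IBM.ConfigRM daemon has been stopped.
A2D4EDC6   0625092825 I O RMCdaemon      The daemon is stopped.
"""

def parse_errpt_summary(text):
    """Return (timestamp, identifier, resource, description) tuples, oldest first."""
    rows = []
    for line in text.splitlines():
        # IDENTIFIER TIMESTAMP T(ype) C(lass) RESOURCE_NAME DESCRIPTION
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue  # skip blank lines and the errpt header line
        ident, stamp, _etype, _eclass, resource, desc = fields
        # errpt timestamps are mmddHHMMyy, e.g. 0625092825 = 06/25/25 09:28
        when = datetime.strptime(stamp, "%m%d%H%M%y")
        rows.append((when, ident, resource, desc))
    # Reverse to logging order, then sort stably on the timestamp only, so
    # entries from the same minute stay in the order they were logged.
    rows.reverse()
    return sorted(rows, key=lambda row: row[0])

rows = parse_errpt_summary(SUMMARY)
for when, ident, resource, desc in rows:
    print(when.strftime("%Y-%m-%d %H:%M"), ident, resource, desc)
```

    Read this way, the RMC and ConfigRM stop events are logged at 09:28, about an hour before error logging was turned off at 10:28 and the system was shut down.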

    =====================

    LABEL:          RMCD_INFO_1_ST
    IDENTIFIER:     A2D4EDC6
     
    Date/Time:       Wed Jun 25 09:28:05 CDT 2025
    Sequence Number: 28481
    Machine Id:      XXXXX
    Node Id:         XXXXX
    Class:           O
    Type:            INFO
    WPAR:            Global
    Resource Name:   RMCdaemon
     
    Description
    The daemon is stopped.
     
    Probable Causes
    The Resource Monitoring and Control daemon is stopped.
     
    User Causes
    1. The stopsrc -s ctrmc command has been executed.
    2. The stopsrc -fs ctrmc command has been executed.
    3. The stopsrc -cs ctrmc command has been executed or
    the rmcctrl -k command has been executed.
     
            Recommended Actions
            Confirm that the daemon should be stopped.
     
    Detail Data
    DETECTING MODULE
    RSCT,rmcd.c,1.151.1.1,1337
    ERROR ID
    64rCpW0pR.Lc/XHV.8.cZ8....................
    REFERENCE CODE
     
    Number of command that stopped the daemon
               3

    ========================

    LABEL:          CONFIGRM_STOPPED_ST
    IDENTIFIER:     447D3237
     
    Date/Time:       Wed Jun 25 09:28:05 CDT 2025
    Sequence Number: 28482
    Machine Id:      XXXXXX
    Node Id:         XXXXX
    Class:           O
    Type:            INFO
    WPAR:            Global
    Resource Name:   ConfigRM
     
    Description
    IBM.ConfigRM daemon has been stopped.
     
    Probable Causes
    The RSCT Configuration Manager daemon(IBM.ConfigRMd) has been stopped.
     
    User Causes
    The stopsrc -s IBM.ConfigRM command has been executed.
     
            Recommended Actions
            Confirm that the daemon should be stopped. Normally, this daemon should
            not be stopped explicitly by the user.
     
    Detail Data
    DETECTING MODULE
    RSCT,ConfigRMDaemon.C,1.32,282
    ERROR ID
     
    REFERENCE CODE



    ------------------------------
    Shubhangi Tripathi
    ------------------------------


  • 2.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 03:38 AM

    Hi,

    Try checking the following two links:

    • HMC - RMC (Aix4admins): Practical Guide to AIX and its related IBM technologies, like PowerVM, PowerVC, PowerHA, HMC etc.

    • Reconfigure RSCT ID to fix DLPAR issues on cloned AIX systems (Ictbanking)

    They usually help me solve RSCT daemon problems.

    Regards, Igor.



    ------------------------------
    Igor Novotny
    Principal Consultant
    MHM Computer, a.s.
    Prague 15
    00420602369375
    ------------------------------



  • 3.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 03:48 AM

    Hi,

    No, I have never had such an issue, but:

    • If you post in the correct forum (AIX), you will probably get more answers.
    • It is better to open a case at IBM for the root cause analysis.
    • You have a rather old AIX version. The latest SP for AIX 7.2 TL5 is SP9, and SP10 is expected on July 10th.
    • I suspect you have a network connectivity issue between the HMC and the AIX LPAR. Which HMC version do you use? Try recreating the RMC connection (see Igor's link) and check whether it helps.


    ------------------------------
    Andrey Klyachkin

    https://www.power-devops.com
    ------------------------------



  • 4.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 03:59 AM

    Hello Shubhangi Tripathi

    A strange situation indeed, and an interesting problem.

    And yes, Andrey is right: posting it in the right discussion (power group) will help.

    Also, this is indeed an older version of AIX 7.2, but anyway…

    This is how I would try to find the root cause (maybe you already did this):

    First, when did this first happen? Was it after changing or implementing something?

    Has the load on the system changed, or are more processes being started?

    Check the entries in /etc/inittab, specifically the lines above this one:

    srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller

    On our 7.2 systems the order is:

    init:2:initdefault:
    brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
    powerfail::powerfail:/etc/rc.powerfail 2>&1 | /usr/bin/alog -tboot > /dev/console # Power Failure Detection
    tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
    securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
    rc:23456789:wait:/etc/rc 2>&1 | /usr/bin/alog -tboot > /dev/console # Multi-User checks
    rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
    srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
    rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
    nimsh:2:wait:/usr/bin/startsrc -e "LIBPATH=/usr/lib" -g nimclient >/dev/console 2>&1
    aso:23456789:once:/usr/bin/startsrc -s aso
    ofed:2:wait:/usr/sbin/ofedctrl -l >/dev/null 2>&1
    rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
    ldapclntd:2:wait:/usr/sbin/start-secldapclntd > /dev/console 2>&1

    When it fails again, check all subsystems with:

    lssrc -a | grep rsct

     ctrmc            rsct             9044692      active
     IBM.HostRM       rsct_rm          54723020     active
     IBM.ConfigRM     rsct_rm          58589612     active
     IBM.DRM          rsct_rm          1573384      active
     IBM.MgmtDomainRM rsct_rm          62652820     active
     IBM.ServiceRM    rsct_rm          16580906     active

    Try starting any failing subsystems with startsrc -s <subsystem_name>.
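    The check-and-restart step could be scripted roughly as follows. This is a Python sketch, assuming lssrc -a style input (Subsystem, Group, PID and Status columns, with the PID column empty for an inoperative subsystem); the sample text is hypothetical, and the sketch only prints the startsrc commands rather than running them.

```python
# Hypothetical lssrc -a output in which two RSCT subsystems are down.
SAMPLE = """\
Subsystem         Group            PID          Status
 ctrmc            rsct                          inoperative
 IBM.HostRM       rsct_rm          54723020     active
 IBM.ConfigRM     rsct_rm                       inoperative
 IBM.DRM          rsct_rm          1573384      active
"""

def inactive_subsystems(lssrc_text):
    """Return the names of subsystems whose status is not 'active'."""
    names = []
    for line in lssrc_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Subsystem":
            continue  # skip blank lines and the header line
        # The name is always the first column and the status the last; the
        # group and PID columns can be empty, so they are not read by position.
        name, status = fields[0], fields[-1]
        if status != "active":
            names.append(name)
    return names

for name in inactive_subsystems(SAMPLE):
    print(f"startsrc -s {name}")
```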

    Check again with errpt -a to monitor new errors.

    Check with ps -ef | grep srcmstr to see whether the process is running.

    If it is not running, try starting it manually in the background with srcmstr & (as the root user).
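    One pitfall with ps -ef | grep srcmstr is that the grep process itself also matches, which can make a dead srcmstr look alive. A Python sketch that avoids this by comparing the executable name instead, assuming input from ps -e -o pid,ppid,args; the PIDs in the sample are hypothetical and find_process is an illustrative name. Because SRC subsystems such as ctrmc are started by srcmstr, the PPID column also shows which daemons are its children.

```python
import os

# Hypothetical output of: ps -e -o pid,ppid,args
PS_SAMPLE = """\
     PID    PPID COMMAND
       1       0 /etc/init
 4128812       1 /usr/sbin/srcmstr
 9044692 4128812 /usr/sbin/rsct/bin/rmcd
12345678 2222222 grep srcmstr
"""

def find_process(ps_text, name):
    """Return (pid, ppid) pairs whose executable basename equals name."""
    matches = []
    for line in ps_text.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, ppid, args = fields
        # Compare only the executable, so "grep srcmstr" does not count.
        if os.path.basename(args.split()[0]) == name:
            matches.append((int(pid), int(ppid)))
    return matches

print(find_process(PS_SAMPLE, "srcmstr"))  # [(4128812, 1)]
```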

    Check again with errpt -a to monitor new errors.

    And/or:

    alog -ot console |grep srcm

    Check whether the /var filesystem has free space; if not, add some space and try again.

    For more detailed logging with alog, you can run:

    alog -ot console |grep srcm

             0 Fri Jun 20 10:02:07 CEST 2025 Checking for srcmstr active...         0 Fri Jun 20 10:02:07 CEST 2025 complete
             0 Mon Jun 23 11:51:28 CEST 2025 Checking for srcmstr active...         0 Mon Jun 23 11:51:28 CEST 2025 complete
             0 Mon Jun 23 16:12:15 CEST 2025 Checking for srcmstr active...         0 Mon Jun 23 16:12:15 CEST 2025 complete

    By the way, I remember one case where we ran out of maxuproc (the maximum number of processes per user) and got weird behavior.

    You can check this with: lsattr -EHl sys0 | grep maxuproc (the default is 16384).
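    To see whether a user is actually close to that limit, the per-user process count can be compared with the maxuproc value. A rough Python sketch, assuming one user name per line as produced by something like ps -e -o user (first line a header) and an lsattr line of the form shown in the example; the 90% threshold, the user name appuser and the process counts are all hypothetical.

```python
from collections import Counter

def parse_maxuproc(lsattr_line):
    """Extract the value from a 'maxuproc <value> <description> <settable>' line."""
    fields = lsattr_line.split()
    if len(fields) < 2 or fields[0] != "maxuproc":
        raise ValueError(f"not a maxuproc line: {lsattr_line!r}")
    return int(fields[1])

def users_near_limit(ps_user_text, limit, threshold=0.9):
    """Count processes per user (one user name per line after a header line)
    and return the users at or above threshold * limit."""
    counts = Counter(
        line.strip() for line in ps_user_text.splitlines()[1:] if line.strip())
    return {user: n for user, n in counts.items() if n >= threshold * limit}

# Hypothetical inputs: 'appuser' and both process counts are made up.
limit = parse_maxuproc(
    "maxuproc 16384 Maximum number of PROCESSES allowed per user True")
ps_users = "USER\n" + "root\n" * 120 + "appuser\n" * 15000
print(users_near_limit(ps_users, limit))  # {'appuser': 15000}
```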

    Greetings Christian Sonnemans.



    ------------------------------
    Christian Sonnemans
    Tactical Unix system engineer
    AsnBank
    Den Bosch
    ------------------------------



  • 5.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 10:28 AM

    Are you running any cluster on top of this system?

    Is there any reason you are unable to apply the current service pack (SP9) to the AIX installation?



    ------------------------------
    Fredrik Lundholm
    ------------------------------



  • 6.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 01:35 PM
    Hi Everyone 
     
    Thanks for all your suggestions. Here are the answers to a few of the common questions:
     
    1.) We are planning to update this server to AIX 7.2 TL5 SP9. We are currently updating our lower environments and will eventually update this production server too, as recommended by IBM support.
     
    2.) We don't have a cluster on this server at present; however, this server was part of a cluster in the past.
     
    3.) We have HMC version V10 R3 SP1061 / MF71710; the server is an IBM Power10 (IBM,9105-42A).
     
    Note: We have 2 VIOS and 3 LPARs in this physical box. We had an issue with the other two LPARs where the RMC connection (for DLPAR operations) would work on only one LPAR at a time, and we had to switch the RMC connection back and forth between them to perform DLPAR operations. This was fixed after recreating the RSCT database on those LPARs, as they had some duplicate IDs. Now this SRC daemon issue is occurring on the third LPAR on the same physical server. I am not sure whether these issues are connected, but I thought I should bring it up for your suggestions.


    ------------------------------
    Shubhangi Tripathi
    ------------------------------



  • 7.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Mon June 30, 2025 01:06 PM

    Hi

    As mentioned by others, this would be best handled by AIX support case. That said, here are some possibilities to consider:

    1. Termination of subsystems should not cause srcmstr to exit. Is it possible that srcmstr died first, causing its children to die, which would lead to these errpt messages?
    2. Check for any clues in console messages, syslog and application logs in the lead-up, e.g. fork or malloc failures.
    3. Failure to respawn can be caused by init being stuck on a hung 'wait' action command further down the inittab.
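    As a triage aid for point 3, a Python sketch that lists the inittab entries with the wait action placed after srcmstr; each listed command is then a candidate to look for in ps output as a possibly hung process. It uses the inittab order Christian posted as sample input and assumes the Identifier:RunLevel:Action:Command layout; wait_entries_after is a hypothetical helper, and a match only points at something to check, not at a confirmed cause.

```python
# Sample: the /etc/inittab order posted earlier in this thread.
INITTAB = """\
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
rc:23456789:wait:/etc/rc 2>&1 | /usr/bin/alog -tboot > /dev/console # Multi-User checks
rcemgr:23456789:once:/usr/sbin/emgr -B > /dev/null 2>&1
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
nimsh:2:wait:/usr/bin/startsrc -e "LIBPATH=/usr/lib" -g nimclient >/dev/console 2>&1
aso:23456789:once:/usr/bin/startsrc -s aso
ofed:2:wait:/usr/sbin/ofedctrl -l >/dev/null 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
ldapclntd:2:wait:/usr/sbin/start-secldapclntd > /dev/console 2>&1
"""

def wait_entries_after(inittab_text, ident="srcmstr"):
    """Return (identifier, command) for 'wait' entries that follow `ident`."""
    seen = False
    result = []
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith(":"):
            continue  # skip blank lines and ':' comment lines
        parts = line.split(":", 3)  # the command field may contain ':'
        if len(parts) != 4:
            continue
        entry_id, _runlevels, action, command = parts
        if entry_id == ident:
            seen = True
        elif seen and action == "wait":
            result.append((entry_id, command))
    return result

for entry_id, command in wait_entries_after(INITTAB):
    print(entry_id, "->", command)
```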

    HTH



    ------------------------------
    Chris Wickremasinghe
    IBM
    ------------------------------



  • 8.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Tue July 01, 2025 03:00 AM

    "where first RMCdaemon stops then IBM.ConfigRM stops finally leading to an inactive srcmstr daemon". 

    From the posted errpt message:

    ""Probable Causes
    The Resource Monitoring and Control daemon is stopped.
     
    User Causes
    1. The stopsrc -s ctrmc command has been executed.""

    It looks like the RMC daemon is being stopped. Is 'stopsrc -s ctrmc' or 'rmcctrl -z' (most likely?) being executed by any chance?

    Second, how are we concluding that the RMC daemon being inactive makes 'srcmstr' inactive as well? Did you run commands like 'lssrc -a' or 'ps -aef | grep -i srcmstr' to see whether it is running?

    Finally, if the RMC daemon is stopped, it is normal to see IBM.ConfigRM stop as well.

    --Srini. 



    ------------------------------
    VEERA SRINIVAS ANANTOJU
    ------------------------------



  • 9.  RE: SRC daemon becomes inactive and does not respawn even after inittab entry

    Posted Wed July 02, 2025 04:33 AM
    Edited by Phill Rowbottom Wed July 02, 2025 04:34 AM

    You should still be able to connect to the system via ssh or the console to look at the issue and restart the required processes.

    Phill.



    ------------------------------
    Phill Rowbottom
    Unix Consultant
    Service Express
    Bedford
    ------------------------------