HMC

  • 1.  Redundant HMCs : RMC problems

    Posted Thu March 04, 2021 12:02 PM
    Hello , 

    I have some AIX 7.1 LPARs on a P7 system, controlled by two HMCs.

    HMC 1: 172.32.3.66
    HMC 2: 172.32.3.67

    On some of these LPARs, there are files on the AIX systems that grow rapidly, with error messages repeated every 5 seconds.
    The file name is, e.g., /var/ct/3423108940/log/mc/IBM.MgmtDomainRM/default (the number varies), and the errors inside that log look like this:

    Mon Feb 22 15:43:52 GMT 2021(280027) ../../../../../src/rsct/rm/MgmtDomainRM/MCP_cfg.c/01924/1.25 2613-024 MDC could not start a session with 172.32.3.66, from RTAS slot number 3. The mc_timed_start_session function returned 2.
    2610-602 A session could not be established with the RMC subsystem.
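
    To see which HMC address and RTAS slot the repeated 2613-024 messages refer to, the log can be summarised with a small script. This is only a sketch: it assumes the log is plain text in the format shown above and that Python 3 is available. The function name count_session_failures is made up for illustration and is not part of RSCT.

```python
import re
from collections import Counter

# Matches the 2613-024 message format seen in the log excerpt above.
# Assumption: other RSCT levels may word this message differently.
SESSION_ERROR = re.compile(
    r"2613-024 MDC could not start a session with "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}), "
    r"from RTAS slot number (?P<slot>\d+)"
)


def count_session_failures(lines):
    """Count 2613-024 failures per (HMC IP, RTAS slot) pair."""
    counts = Counter()
    for line in lines:
        match = SESSION_ERROR.search(line)
        if match:
            counts[(match.group("ip"), int(match.group("slot")))] += 1
    return counts
```

    It could be called as count_session_failures(open(path, errors="replace")), where path is the log file named above. If the failures turn out to involve only one HMC address, that narrows down where to look first.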

    The LPAR detects only HMC 1 (according to rmcdomainstatus). Where should I start investigating? Any suggestions?

    ------------------------------
    OUSSAMA NAZIH
    ------------------------------


  • 2.  RE: Redundant HMCs : RMC problems

    Posted Fri March 05, 2021 02:48 AM
    Hello there,

    Sounds like https://www.ibm.com/support/pages/apar/IV66651 to me. Can you check whether the fix is included?

    --Srini

    ------------------------------
    VEERA SRINIVAS ANANTOJU
    ------------------------------



  • 3.  RE: Redundant HMCs : RMC problems

    Posted Fri March 05, 2021 02:51 AM
    And regarding:

    "Lpar detect only hmc 1 (rmcdomainstatus).Where should I start investigating ? any suggestions ??"

    Maybe you should check whether the LPAR can reach 172.32.3.66 on port 657 (TCP and UDP).
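
    A minimal sketch of the TCP half of that check, assuming Python 3 is available on the LPAR (or on another host on the same network path). The helper name tcp_port_open is hypothetical. UDP is connectionless, so a probe like this cannot confirm UDP reachability; that part needs a firewall or packet-trace check instead.

```python
import socket


def tcp_port_open(host, port=657, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

    For example, tcp_port_open("172.32.3.66") and tcp_port_open("172.32.3.67") run from the LPAR would show whether either HMC is unreachable over TCP on port 657.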

    --Srini


    ------------------------------
    VEERA SRINIVAS ANANTOJU
    ------------------------------



  • 4.  RE: Redundant HMCs : RMC problems

    IBM Champion
    Posted Fri March 05, 2021 02:56 AM

    Hello!

    We have encountered various headaches with RMC in the past. There are so many different places where the problem can reside.
    It can be:
    - HMC issue (HMC version, network etc)
    - Physical HW firmware issue
    - Some network issue
    - AIX issue (AIX bug etc)

    So basically you need to start pinpointing the possible root cause by excluding one issue at a time.

    I think the easiest way to start is to remove the second HMC connection, clear out those AIX logs, and maybe reboot the remaining HMC.
    > Then check whether the logs are still flooding, or whether everything is fine with only one HMC present.
    > Depending on the outcome of this test, we can select the next possible steps.

    I also recommend opening a Support Case / PMR with IBM and providing logs from both the HMC (pedbg) and AIX (snap).
    However, if your P7 firmware / HMC version / AIX 7.1 systems are not at an up-to-date or supported level, the first answer from IBM will be "Please update your systems" :)

    Br,
    tommi



    ------------------------------
    Tommi Sihvo, Lead Service Architect
    TietoEVRY, Compute Services
    email tommi.sihvo@tieto.com mobile +358 (0)40 5180 Finland
    ------------------------------