DLPAR issue on alt_disk_mksysb'ed system

  • 1.  DLPAR issue on alt_disk_mksysb'ed system

    Posted Tue June 10, 2008 04:47 PM

    Originally posted by: grukrz1


    Hello,

    I have a POWER6 box (firmware EL320_061) with one LPAR configured on it. I installed AIX 5300-08-01-0819 (RSCT 2.4.9.0) on the LPAR with the alt_disk_mksysb method, run from the AIX that came preinstalled on the box (I used a free disk in the box as the alt_disk_mksysb target; the new system runs OK). Because this is a kind of cloning, I ran the following after installing and booting the cloned system:

    /usr/sbin/rsct/bin/rmcctrl -z
    /usr/sbin/rsct/install/bin/recfgct
    /usr/sbin/rsct/bin/rmcctrl -p
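
The three commands above could be wrapped in a small script so that they always run in the same order and stop at the first failure. This is only a sketch: the `run` and `reset_rsct` names and the `DRY_RUN` switch are made up for illustration, and the comments describe what each command is generally understood to do, not something stated in this post.

```shell
#!/bin/sh
# Sketch only: reset the RSCT configuration on a freshly cloned LPAR.
# DRY_RUN defaults to 1 here (an assumption, chosen for safety), so the
# commands are printed instead of executed unless DRY_RUN=0 is set.
RSCT_BIN=/usr/sbin/rsct/bin
RSCT_INSTALL=/usr/sbin/rsct/install/bin

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the three commands from the post.
reset_rsct() {
    # Stop RMC and its resource managers (my understanding of -z).
    run "$RSCT_BIN/rmcctrl" -z || return 1
    # Rebuild the RSCT configuration so the clone gets its own identity.
    run "$RSCT_INSTALL/recfgct" || return 1
    # Re-enable remote client connections, e.g. from the HMC (-p).
    run "$RSCT_BIN/rmcctrl" -p || return 1
}
```

Calling `reset_rsct` as-is only prints the three commands; `DRY_RUN=0 reset_rsct` as root would actually run them.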

    DNS is configured properly on both the LPAR and the HMC7 box.

    The 'host' command returns the proper IP/name when I look up the LPAR or the HMC, from either the LPAR or the HMC.

    There are no communication problems between the HMC and the LPAR (no firewall; e.g. I can log in from the LPAR to the HMC using ssh, and vice versa).

    The problem is that I can't do DLPAR operations: I get an error notification on the HMC that RMC is not running on the LPAR.

    In fact, when I check on the LPAR, I see that only the following rs* SRC subsystems are configured:

    # lssrc -a|grep rs
    IBM.ERRM rsct_rm 233682 active
    IBM.ServiceRM rsct_rm 90608 active
    IBM.AuditRM rsct_rm 168400 active
    ctcas rsct 200884 active
    ctrmc rsct 98672 active
    rsvpd qos inoperative
    #
    IBM.HostRM - is missing
    IBM.ConfigRM - is missing
    IBM.LPRM - is missing
    IBM.ManagementServer is not defined:

    # lsrsrc IBM.ManagementServer
    /usr/sbin/rsct/bin/lsrsrc-api: 2612-010 Resource class IBM.ManagementServer is not defined.
    #
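
The manual comparison above (which resource managers are expected versus which ones `lssrc -a` lists) can be sketched as a small filter. The function name `missing_subsystems` is hypothetical, and which subsystems to expect is passed in by the caller; the three names in the usage line are simply the ones reported missing in this post.

```shell
#!/bin/sh
# Sketch only: print every expected subsystem name that does not appear
# in the first column of `lssrc -a` output.
#   arguments: subsystem names you expect to be defined
#   stdin:     output of `lssrc -a`
missing_subsystems() {
    awk -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        { seen[$1] = 1 }                      # first column is the subsystem name
        END {
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen))
                    print names[i]
        }
    '
}
```

On the LPAR this would be used as `lssrc -a | missing_subsystems IBM.HostRM IBM.ConfigRM IBM.LPRM`; empty output means all of the named subsystems are defined to SRC.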

    Has anyone met a similar problem and knows a fix for it? Why are IBM.HostRM, IBM.ConfigRM and IBM.LPRM missing from the SRC list, and what do I need to do to get them back? (All these services are running on the LPAR where I took the mksysb that I used for the new LPAR.) Rebooting the HMC or the LPAR doesn't help.

    Is there any known bug in RSCT 2.4.9.0 that matches my issue?

    thank you in advance,
    Kris


  • 2.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Wed June 11, 2008 09:26 AM

    Originally posted by: gcorneau


    Not that it's a known issue or anything, but I wouldn't run with RSCT 2.x.x.0 anything. Always put on, at least, patch level 1 (i.e. 2.4.9.1), but the latest are recommended.

    And when you do a mksysb, the recfgct process should be run automagically. AIX is smart enough now to handle that.
    Glen Corneau
    IBM Power Systems Advanced Technical Support


  • 3.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Wed June 11, 2008 02:40 PM

    Originally posted by: esv


    I have a post-install script for all my mksysb clone operations that includes these lines:
    if [ -x /usr/sbin/rsct/install/bin/uncfgct ]
    then
        /usr/sbin/rsct/install/bin/uncfgct -n
        /usr/sbin/rsct/install/bin/cfgct
    fi
    It seems to have worked so far.

    best regards,
    esv.


  • 4.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Fri June 13, 2008 03:51 AM

    Originally posted by: grukrz1


    hello,

    Alas, even after I installed the latest updates for 2.4.9.x:

    # suma -x -a Action=Download -a RqType=Fileset -a RqName=rsct.basic.rte -a RqLevel=2.4.9.2 -a DLTarget=/tmp/suma
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.basic.sp.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.basic.hacmp.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.lprm.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.sensorrm.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.hostrm.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.sr.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.opt.fence.hmc.2.4.9.1.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.utils.2.4.9.2.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.sec.2.4.9.2.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.opt.storagerm.2.4.9.2.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.core.rmc.2.4.9.2.bff
    Download SUCCEEDED: /tmp/suma/installp/ppc/rsct.basic.rte.2.4.9.2.bff
    Summary:
    12 downloaded
    0 failed
    0 skipped
    ... only these rsct subsystems exist:

    # lssrc -a|grep rsc
    ctrmc rsct 242006 active
    IBM.ERRM rsct_rm 254208 active
    IBM.ServiceRM rsct_rm 192728 active
    IBM.AuditRM rsct_rm 172460 active
    ctcas rsct inoperative
    None of the 'rsct' reconfiguration steps mentioned before helps.
    I can manage the box from HMC7, but DLPAR still doesn't work because of the RSCT problem on the LPAR.
    regards,
    K.
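
The suma download listing in this post mixes 2.4.9.1 and 2.4.9.2 packages, which is easier to see as a fileset/level list. The filter below is a rough sketch that assumes the "Download SUCCEEDED: <path>/<fileset>.<level>.bff" line format shown above; the name `suma_levels` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: turn suma "Download SUCCEEDED" lines into "fileset level" pairs.
#   stdin: suma output in the format shown in the post above
suma_levels() {
    # \1 = fileset name (no slashes), \2 = four-part level before ".bff"
    sed -n 's|^.*Download SUCCEEDED: .*/\([^/]*\)\.\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.bff$|\1 \2|p'
}
```

The resulting list can then be compared by eye with the levels that `lslpp -l "rsct.*"` reports as installed.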


  • 5.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Fri June 13, 2008 09:59 AM

    Originally posted by: gcorneau


    Have you tried running through the DLPAR checklist?

    http://www.ibm.com/developerworks/eserver/articles/DLPARchecklist.html

    Otherwise, it's probably time to open a support call.
    Glen Corneau
    IBM Power Systems Advanced Technical Support


  • 6.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Fri June 13, 2008 01:37 PM

    Originally posted by: grukrz1


    yes, I checked items from the check-list.


  • 7.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Wed June 25, 2008 07:51 AM

    Originally posted by: grukrz1


    Hello,

    The problem still exists. Even after installing AIX from scratch on the LPAR (from the delivered AIX 5.3 TL7 CDs), some rsct_rm subsystems are missing from the "lssrc -a" output. Because all the needed rsct.* filesets were installed on the LPAR, I created the missing rsct_rm subsystems by running these commands:

    /usr/sbin/rsct/bin/RMstart -s IBM.HostRM -pIBM.HostRMd
    /usr/sbin/rsct/bin/RMstart -s IBM.ConfigRM -pIBM.ConfigRMd
    /usr/sbin/rsct/bin/RMstart -s IBM.LPRM -pIBM.LPRMd
    /usr/sbin/rsct/bin/RMstart -s IBM.FSRM -pIBM.FSrmd
    /usr/sbin/rsct/bin/RMstart -s IBM.DRM -pIBM.DRMd

    They appeared in the "lssrc -g rsct_rm" output, but after I ran "startsrc -g rsct_rm":

    # startsrc -g rsct_rm
    0513-059 The IBM.ERRM Subsystem has been started. Subsystem PID is 156108.
    0513-029 The IBM.CSMAgentRM Subsystem is already active.
    Multiple instances are not supported.
    0513-029 The IBM.ServiceRM Subsystem is already active.
    Multiple instances are not supported.
    0513-059 The IBM.AuditRM Subsystem has been started. Subsystem PID is 168142.
    0513-059 The IBM.HostRM Subsystem has been started. Subsystem PID is 229636.
    0513-059 The IBM.ConfigRM Subsystem has been started. Subsystem PID is 205250.
    0513-059 The IBM.LPRM Subsystem has been started. Subsystem PID is 139666.
    0513-059 The IBM.FSRM Subsystem has been started. Subsystem PID is 229638.
    0513-059 The IBM.DRM Subsystem has been started. Subsystem PID is 225752.

    They are still not operating:

    # lssrc -g rsct_rm
    Subsystem         Group            PID          Status
     IBM.ServiceRM    rsct_rm          192968       active
     IBM.CSMAgentRM   rsct_rm          209366       active
     IBM.ERRM         rsct_rm          156108       active
     IBM.AuditRM      rsct_rm          168142       active
     IBM.HostRM       rsct_rm                       inoperative
     IBM.ConfigRM     rsct_rm                       inoperative
     IBM.LPRM         rsct_rm                       inoperative
     IBM.FSRM         rsct_rm                       inoperative
     IBM.DRM          rsct_rm                       inoperative

    And the /var/ct/*.stderr logs contain:

    # cat IBM.DRM.stderr
    Cannot start resource manager IBM.DRM due to exception RMOperError, see trace file Message=2645-006 Operation failed due to exception CNoRuntimeDir from CDaemon::init with error code 0.

    The assert subroutine failed: 0, file ../../../../../../src/rspc/usr/bin/drm/IBM.DRMd.C, line 214
    # cat IBM.HostRM.stderr
    Cannot start resource manager IBM.HostRM due to exception RMOperError, see trace file
    error code = 98305
    Message=2645-006 Operation failed due to exception CNoRuntimeDir from CDaemon::init with error code 0.

    The assert subroutine failed: 0, file ../../../../../src/rsct/rm/HostRM/IBM.HostRMd.C, line 216
    # cat IBM.LPRM.stderr
    Cannot start resource manager IBM.LPRM due to exception RMOperError, see trace file Message=2645-006 Operation failed due to exception CNoRuntimeDir from CDaemon::init with error code 0.

    The assert subroutine failed: 0, file ../../../../../src/rsct/rm/LPRM/IBM.LPRMd.C, line 249
    #
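
Checking each inoperative resource manager against its stderr log by hand, as in this post, can be sketched as one filter. It assumes the log naming seen here (`/var/ct/<subsystem>.stderr`) and the "due to exception NAME from" wording of the messages; the function name `report_failed_rms` is hypothetical.

```shell
#!/bin/sh
# Sketch only: for every inoperative subsystem in `lssrc -g rsct_rm` output,
# print the innermost exception name found in its stderr log.
#   stdin: output of `lssrc -g rsct_rm`
#   $1:    log directory (defaults to /var/ct, as in the post)
report_failed_rms() {
    logdir=${1:-/var/ct}
    # Inoperative rows have no PID column, so test the last field.
    awk '$NF == "inoperative" { print $1 }' |
    while read -r name; do
        log="$logdir/$name.stderr"
        if [ -r "$log" ]; then
            # Keep the exception that is followed by "from", e.g. CNoRuntimeDir.
            exc=$(sed -n 's/.*due to exception \([A-Za-z]*\) from.*/\1/p' "$log" | tail -1)
            echo "$name: ${exc:-no exception found}"
        else
            echo "$name: no log at $log"
        fi
    done
}
```

Typical use would be `lssrc -g rsct_rm | report_failed_rms`, which for the logs quoted above would report `CNoRuntimeDir` for each subsystem that has a log.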

    (Please keep in mind that the system was reinstalled from scratch, as a full overwrite installation, from the AIX 5.3 TL7 CD media.)

    I can telnet locally to port 657 on the LPAR. Port 657 is also open from the HMC to the LPAR (I checked it with ssh: I stopped rsct, started sshd on port 657 on the partition, and "ssh -p 657 partition" from HMC7 worked fine).

    I can't access port 657 on the HMC. Is it OK that 657 on the HMC is not reachable using telnet? How can I test it? How can I check the status of the RMC services on the HMC?

    The firewall settings on HMC7 for the network adapter enabled for partition communication allow RMC/657 access for all clients.

    Host name resolution works OK on both the LPAR and the HMC (reverse lookups too).

    Running /usr/sbin/rsct/install/bin/recfgct and restarting RSCT doesn't fix the problem either.

    Is there an up-to-date DLPAR checklist for POWER6 and HMC 7.x? The one at http://www.ibm.com/developerworks/systems/articles/DLPARchecklist.html does not seem to be 100% valid for HMC7 (e.g. the telnet command for HMC-to-partition testing is not available on HMC7).

    I have run out of ideas for how to analyze the issue.

    thank you in advance for any tip,
    Kris
    # lssrc -Ss IBM.HostRM
    #subsysname:synonym:cmdargs:path:uid:auditid:standin:standout:standerr:action:multi:contact:svrkey:svrmtype:priority:signorm:sigforce:display:waittime:grpname:
    IBM.HostRM:::/usr/sbin/rsct/bin/IBM.HostRMd:0:0:/dev/console:/dev/console:/var/ct/IBM.HostRM.stderr:-R:-Q:-K:0:0:20:0:0:-d:20:rsct_rm:
    # lssrc -Ss IBM.LPRM
    #subsysname:synonym:cmdargs:path:uid:auditid:standin:standout:standerr:action:multi:contact:svrkey:svrmtype:priority:signorm:sigforce:display:waittime:grpname:
    IBM.LPRM:::/usr/sbin/rsct/bin/IBM.LPRMd:0:0:/dev/console:/dev/console:/var/ct/IBM.LPRM.stderr:-R:-Q:-K:0:0:20:0:0:-d:20:rsct_rm:
    # lssrc -Ss IBM.DRM
    #subsysname:synonym:cmdargs:path:uid:auditid:standin:standout:standerr:action:multi:contact:svrkey:svrmtype:priority:signorm:sigforce:display:waittime:grpname:
    IBM.DRM:::/usr/sbin/rsct/bin/IBM.DRMd:0:0:/dev/console:/dev/console:/var/ct/IBM.DRM.stderr:-R:-Q:-K:0:0:20:0:0:-d:20:rsct_rm:
    # ls -ld /usr/sbin/rsct/bin/IBM.HostRMd /usr/sbin/rsct/bin/IBM.LPRMd /usr/sbin/rsct/bin/IBM.DRMd
    -r-xr-x--- 1 root system 129178 Oct 04 2007 /usr/sbin/rsct/bin/IBM.DRMd
    -rwxr--r-- 1 bin bin 529113 Sep 26 2007 /usr/sbin/rsct/bin/IBM.HostRMd
    -rwxr--r-- 1 bin bin 100668 Sep 26 2007 /usr/sbin/rsct/bin/IBM.LPRMd
    #


  • 8.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Tue October 07, 2008 11:33 AM

    Originally posted by: phil@stras


    Hi,

    I have the same problem and I would like to know if you resolved your problem since your last post.

    Best regards

    Philippe


  • 9.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Wed October 08, 2008 09:21 AM

    Originally posted by: bassemir


    This thread is pretty old but I recently had a similar problem. I was able to correct my DLPAR issue by running the following command.

    /usr/sbin/rsct/install/bin/recfgct

    Rich


  • 10.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Wed October 08, 2008 09:22 AM

    Originally posted by: bassemir


    Oops, sorry, it is not an old thread; I was looking at the date registered... ha! I need coffee.
    Rich


  • 11.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Thu October 09, 2008 09:20 AM

    Originally posted by: phil@stras


    Hi,
    I tried this command, without results.
    I think it is a firewall problem; I will ask my network security team ...
    All advice is welcome.


  • 12.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Tue October 14, 2008 10:59 AM

    Originally posted by: Malcolm_Preen


    Hi

    I also have this same problem, and have been through the same steps as the first poster from first principles... The same situation occurs.

    Strangely, I have three machines which were recently upgraded from AIX 5.2 to AIX 5300-08-02-0822 - one of them is OK, but the other two have the "IBM.DRM not starting" error.

    As far as I know, the configurations are identical.

    Did the firewall suggestions work? If so, what did you have to do?

    Thanks, Malcolm


  • 13.  Re: DLPAR issue on alt_disk_mksysb'ed system

    Posted Thu November 20, 2008 04:38 AM

    Originally posted by: Malcolm_Preen


    Eventually, I got the following response from IBM.

    The AIX 5.3 with TL 8 requires csm.client 1.7.0.10 csm 1.7.0.10 code caused P4 HMC DLPAR operation failed.

    The problem with DLPAR is occurring on AIX 5.3 TL8. This problem has been identified and there is an APAR (Authorized Program Analysis Report), i.e. a fix, for it.
    Please go to this site and download and apply APAR IZ29205.
    http://www-933.ibm.com/eserver/support/fixes/
    select system P , Products: AIX ,Version 5.3
    Technology :
    Fix Search , use newest date.
    Enter IZ29205
    This should resolve the issue.

    This has resolved my problem, and now all 3 AIX 53 TL 8 LPARs can use DLPAR.
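
For anyone checking whether the APAR mentioned above is already on a system, `instfix -ik` is the usual AIX query. The wrapper below is a sketch: the function name `apar_status` is hypothetical, and the "All filesets for ... were found" text it matches is the message wording as I remember it, so treat that string as an assumption.

```shell
#!/bin/sh
# Sketch only: report whether all filesets of an APAR are installed.
#   $1: APAR number, e.g. IZ29205
apar_status() {
    if ! command -v instfix >/dev/null 2>&1; then
        echo "unknown"            # instfix not available (not an AIX system?)
        return 2
    fi
    # LANG=C so the message text is not translated.
    out=$(LANG=C instfix -ik "$1" 2>&1)
    case $out in
        *"All filesets for $1 were found"*) echo "installed" ;;
        *)                                  echo "missing" ;;
    esac
}
```

On each of the TL8 LPARs, `apar_status IZ29205` should print `installed` once the fix has been applied.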

    Hope this helps someone else.

    Malcolm