PowerHA for AIX

  • 1.  Problems with verify or cspoc communication

    Posted Fri April 10, 2009 11:54 AM

    Originally posted by: Casey_B


    I am having a problem with verification or cspoc. The commands fail with
    communication errors like the following:

    rshexec: cannot connect to node NodeA

    Or

    WARNING: Unable to communicate with the remote node: NodeA.
    Please check that node: hacmp11 has the /usr/es/sbin/cluster/etc/rhosts
    file configured and the clcomdES subsystem running.

    What could be causing this?


  • 2.  Re: Problems with verify or cspoc communication

    Posted Fri April 10, 2009 11:58 AM

    Originally posted by: Casey_B


    Cluster communication is handled through the clcomd daemon.
    This includes all of the cspoc commands, and also the verification
    commands.

    clcomd uses a cluster-specific, host-based access method.
    It also provides a reliable way of contacting other nodes
    even when the topology has changed because of fallovers
    or IP label moves.

    For more details, you can read the following
    section in the manual:

    Understanding the /usr/es/sbin/cluster/etc/rhosts file
    in the HACMP administration guide for 5.5, or 5.4.

    First some background:

    The general flow of operation is that when node A
    wants to talk to node B, it sends an ICMP echo request
    (a "ping" message) to all of the addresses on node B
    that should be node bound, that is, tied to that node.

    This includes all of the boot addresses, standby addresses,
    and the communication path configured when the node was added.

    Node A will then listen for a response. Whichever IP label
    responds back first will be used for initiating communication.

    Then a connection request will be sent to the chosen IP label on node B.
    Node B will check whether the source address for node A is in its
    access allowed list (which is in /usr/es/sbin/cluster/etc/rhosts).
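
    That last check can be approximated by hand. The following is only a rough
    sketch: it assumes the rhosts file holds one IP address or label per line,
    and the function names are made up for illustration. The matching rules
    clcomd really applies are not described in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the rhosts-style file lists this address.
# Assumption: one IP address or label per line.
rhosts_lists() {
    addr=$1
    file=$2
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$addr" "$file"
}

# Hypothetical helper: print each candidate address missing from the file.
missing_from_rhosts() {
    file=$1
    shift
    for addr in "$@"; do
        rhosts_lists "$addr" "$file" || printf '%s\n' "$addr"
    done
}
```

    For example, `missing_from_rhosts /usr/es/sbin/cluster/etc/rhosts 10.1.1.1 10.1.1.2`
    (placeholder addresses) would print whichever of the two addresses is not
    in the file.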

    So, some ideas of how to approach this problem, and common configuration
    problems:

    1) If there is an IP label that is not defined as part of the cluster, but is on the
    same subnet as any of the cluster IP labels, then it could reply to the ICMP message,
    or be the source address for the connection request.

    It is safer to make sure that all possible IP labels that could be used to talk between
    the nodes are listed in ../cluster/etc/rhosts.

    2) The permissions are wrong on ../cluster/etc/rhosts.
    The file needs mode 600, with owner root and group system.

    3) The communication path is not node bound, for instance because it is a service label.
    A service label is not guaranteed to be on a particular node. If it is on the
    wrong node, then most communication to the node associated with that
    communication path will fail.

    4) One workaround is to truncate /usr/es/sbin/cluster/etc/rhosts on
    a node and restart clcomd. (A file that exists but is zero length is
    interpreted to mean that the cluster is in initial configuration, and clcomd
    will accept incoming connection requests from any IP label.)

    This is different from removing the file! A nonexistent file tells
    clcomdES not to accept any incoming connections.

    (To restart clcomd:
    1. stopsrc -s clcomdES
    2. startsrc -s clcomdES
    )

    This may not work if the cluster is at HACMP 5.4.1 or above and you have
    changed the cluster topology.
    The reason is that if there is a configured cluster, the ODM entries are checked
    even if there is no ../cluster/etc/rhosts information.
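
    Items 2 and 4 can be sketched as a few shell helpers. This is only an
    illustration, not a supported procedure: the function names and the `.bak`
    backup suffix are assumptions, and only the stopsrc/startsrc commands come
    from the steps above.

```shell
#!/bin/sh
RHOSTS=/usr/es/sbin/cluster/etc/rhosts

# Item 2: succeed only if the file has mode 600 (-rw-------).
rhosts_mode_ok() {
    # Keep just the nine permission characters from the ls -l output.
    perms=$(ls -l "$1" | cut -c2-10)
    [ "$perms" = "rw-------" ]
}

# Item 4: report which of the three cases described above applies.
rhosts_state() {
    if [ ! -e "$1" ]; then
        echo missing      # per the post: no incoming connections accepted
    elif [ ! -s "$1" ]; then
        echo empty        # per the post: treated as initial configuration
    else
        echo populated
    fi
}

# Item 4: keep a copy (suffix .bak is an assumption), then empty the file
# so that it still exists but has zero length.
truncate_rhosts() {
    cp -p "$1" "$1.bak" && : > "$1"
}

# Restart commands as given in the post; run as root on the AIX node.
# stopsrc can return before the subsystem has fully stopped, so check
# with "lssrc -s clcomdES" if the start fails.
restart_clcomd() {
    stopsrc -s clcomdES
    startsrc -s clcomdES
}
```

    On a node, the sequence would be something like `rhosts_state "$RHOSTS"`,
    then `truncate_rhosts "$RHOSTS"` and `restart_clcomd`. To fix item 2,
    `chmod 600 "$RHOSTS"` and `chown root:system "$RHOSTS"` set the mode and
    ownership described above.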

    Has anyone else seen any other similar problems?

    If none of these hints help you, then I would suggest calling IBM support.
    They would love to help you. :)

    Hope this helps,
    Casey


  • 3.  Re: Problems with verify or cspoc communication

    Posted Thu July 14, 2016 08:31 AM

    Originally posted by: UQ6M_Hubert_Samm


    Let me throw my 2 cents in... We recently upgraded to PowerHA 7.2.0 SP1, which now throws WARNINGS about kernel parameters. According to IBM, if you go into customized verification and choose Details=NO, you should not get these messages. Well, that doesn't work at all; the messages are still there. Plus, the parameters that the warnings are raised on are STATIC and cannot be changed. IMHO, PowerHA has enough complexity. Why clutter it up with things like this that we cannot control?