Originally posted by: Casey_B
Cluster communication is handled through the clcomd daemon.
This includes all of the cspoc commands, and also the verification
commands.
clcomd uses a cluster specific, host based access method.
clcomd also provides a reliable method of contacting
other nodes even when the topology has changed
because of fallovers or IP label moves.
For more details, you can read the following
section in the manual:
Understanding the /usr/es/sbin/cluster/etc/rhosts file
in the HACMP administration guide for 5.5, or 5.4.
First some background:
The general flow of operation is that when node A
wants to talk to node B, it will send
an ICMP echo request message (a "ping" message)
to all of the addresses on node B
that should be node bound.
This includes all of the boot addresses, standby addresses,
and the communication path configured when the node was added.
Node A will then listen for a response. Whichever IP label
responds back first will be used for initiating communication.
Then a connection request will be sent to the chosen IP label on node B.
Node B will check whether the source address for node A is in its
access allowed list (which is in /usr/es/sbin/cluster/etc/rhosts).
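
The selection step described above can be sketched in shell. This is only an illustration of the idea, not how clcomd is implemented: probe_addr and first_responder are hypothetical names, and the candidates are tried one after another instead of all at once.

```shell
# Hypothetical helper: probe one address with a single ICMP echo request.
# "ping -c 1" sends one packet; the exit status is 0 when a reply arrives.
probe_addr() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the first candidate address that answers.
# Usage: first_responder PROBE_COMMAND ADDRESS...
# Simplification: addresses are tried in order, so "first" here means
# first in the list that replies, not fastest on the wire.
first_responder() {
    probe=$1
    shift
    for addr in "$@"; do
        if "$probe" "$addr"; then
            echo "$addr"
            return 0
        fi
    done
    # No candidate replied.
    return 1
}
```

Calling first_responder probe_addr with the boot, standby, and communication path addresses of node B would print the address to try for the connection request, or return non-zero if none replied.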
So, some ideas of how to approach this problem, and common configuration
problems:
1) If there is an IP label that is not defined as part of the cluster, but is on the
same subnet as any of the cluster IP labels, then it could reply to the ICMP echo request,
or be the source address for the connection request.
It is safer to make sure that all possible IP labels that could be used to talk between
the nodes are listed in ../cluster/etc/rhosts.
2) The permissions are wrong on ../cluster/etc/rhosts
The mode needs to be 600, with owner root and group system.
3) The communication path is not node bound, for instance because it is a service label.
A service label is not guaranteed to be on a particular node. If it is on the
wrong node, then most communication to the node associated with that
communication path will fail.
4) One workaround is to truncate the /usr/es/sbin/cluster/etc/rhosts file on
a node, and restart clcomd. (A file that exists but is zero length will be
interpreted to mean that the cluster is in initial configuration, and clcomd will
accept incoming connection requests from any IP label.)
This is different from removing the file! A nonexistent file tells
clcomdES not to accept any incoming connections.
(To restart clcomd:
   stopsrc -s clcomdES
   startsrc -s clcomdES
)
This workaround may not work if the cluster is at HACMP 5.4.1 or above, and you have
changed the cluster topology.
The reason is that if there is a configured cluster, the ODM entries are checked,
even if there is no ../cluster/etc/rhosts information.
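
Several of the checks in items 1, 2 and 4 can be scripted. The following is a rough sketch, not an IBM-supplied tool: all function names are hypothetical, and it assumes the rhosts file holds one address or label per line.

```shell
# Sketch only: these helper names are hypothetical, not part of HACMP.

# Item 1: print each given address that is NOT listed in the file.
# Usage: missing_from_rhosts FILE ADDRESS...
# Assumes one address or label per line (an assumption about the layout).
missing_from_rhosts() {
    file=$1
    shift
    for addr in "$@"; do
        # -x whole-line match, -F fixed string, -q quiet
        if ! grep -qxF -e "$addr" "$file" 2>/dev/null; then
            echo "$addr"
        fi
    done
}

# Item 2: succeed only if the permission bits are exactly 600.
# Only the first 10 characters of the ls output are compared, so any
# trailing ACL marker is ignored.
rhosts_mode_ok() {
    perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$perms" = "-rw-------" ]
}

# Item 2: print "owner group" so it can be compared with "root system".
rhosts_owner() {
    ls -ld "$1" 2>/dev/null | awk '{ print $3, $4 }'
}

# Item 4: report which of the three states the file is in.
# missing   = file does not exist
# empty     = file exists with zero length
# populated = file exists and has content
rhosts_state() {
    if [ ! -e "$1" ]; then
        echo missing
    elif [ -s "$1" ]; then
        echo populated
    else
        echo empty
    fi
}
```

For example, rhosts_state /usr/es/sbin/cluster/etc/rhosts shows whether a node is in the "missing" or the "empty" case from item 4, and missing_from_rhosts followed by the file name and a list of addresses shows which of them still need to be added.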
Has anyone else seen any other similar problems?
If none of these hints help you, then I would suggest calling IBM support.
They would love to help you. :)
Hope this helps,
Casey