AIX

  • 1.  HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 10:52 AM

    Originally posted by: SystemAdmin


    I have started configuring a new two-node HACMP cluster: p561n1 and p561n2. I defined a basic topology: one network (prod_net) with two interfaces on each node (en1 and en2).

    When I synchronize the cluster, I get the following messages:

    Retrieving data from available cluster nodes. This could take a few minutes.

    Start data collection on node p561n2
    Start data collection on node p561n1
    Waiting on node p561n2 data collection, 15 seconds elapsed
    Waiting on node p561n1 data collection, 15 seconds elapsed
    Waiting on node p561n2 data collection, 30 seconds elapsed
    Waiting on node p561n1 data collection, 30 seconds elapsed
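    send select: Il n'existe pas de processus pour lire les données dirigées vers un
    (French locale, truncated as posted; roughly "There is no process to read the data directed to a ...")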

    ERROR: Cannot send the packet to the node: p561n2
    Collector on node p561n1 completed
    Waiting on node p561n2 data collection, 45 seconds elapsed
    ERROR: Comm error found on node: p561n2.
    Data collection complete

    Thanks for your help.
    AN

    I'm running AIX 5.3 TL7 SP1 (RSCT 2.4.8.0) and HACMP 5.4.1.

    p561n1:root:/ > cltopinfo
    Cluster Name: cl_p561
    Cluster Connection Authentication Mode: Standard
    Cluster Message Authentication Mode: None
    Cluster Message Encryption: None
    Use Persistent Labels for Communication: No
    There are 2 node(s) and 1 network(s) defined
    NODE p561n1:
    Network prod_net
    p561n1boot2 192.168.11.1
    p561n1boot1 192.168.10.1
    NODE p561n2:
    Network prod_net
    p561n2boot2 192.168.11.2
    p561n2boot1 192.168.10.2

    No resource groups defined

    /etc/hosts:
    ##########################################################################
    # p561n1 #
    ##########################################################################
    192.168.10.1 p561n1boot1 # boot address for en1
    192.168.11.1 p561n1boot2 # boot address for en2
    172.16.31.121 p561n1admin # persistent address
    172.16.31.140 p561n1 # Service address
    ##########################################################################
    # p561n2 #
    ##########################################################################
    192.168.10.2 p561n2boot1 # boot address for en1
    192.168.11.2 p561n2boot2 # boot address for en2
    172.16.31.122 p561n2admin # persistent address
    172.16.31.141 p561n2 # Service address

    p561n1:root:/ > cat /usr/es/sbin/cluster/etc/rhosts
    192.168.11.1
    192.168.10.1
    192.168.11.2
    192.168.10.2
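One quick check on a setup like this is whether every interface address in the topology is also listed in the clcomd rhosts file. The following is only a minimal sketch under assumptions: the helper names (`parse_hosts`, `missing_from_rhosts`) are hypothetical, the data is copied from the post above, and it assumes rhosts holds IP addresses (one per line) rather than labels.

```python
def parse_hosts(text):
    """Return {label: ip} from /etc/hosts-style text, ignoring comments."""
    entries = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            ip, label = fields[0], fields[1]
            entries[label] = ip
    return entries


def missing_from_rhosts(hosts_text, rhosts_text, labels):
    """List the IPs of `labels` that do not appear in the rhosts text."""
    hosts = parse_hosts(hosts_text)
    trusted = {line.strip() for line in rhosts_text.splitlines() if line.strip()}
    return sorted(hosts[label] for label in labels
                  if label in hosts and hosts[label] not in trusted)


# Data copied from the post (hypothetical variable names).
HOSTS = """\
192.168.10.1 p561n1boot1 # boot address for en1
192.168.11.1 p561n1boot2 # boot address for en2
172.16.31.121 p561n1admin # persistent address
172.16.31.140 p561n1 # Service address
192.168.10.2 p561n2boot1 # boot address for en1
192.168.11.2 p561n2boot2 # boot address for en2
172.16.31.122 p561n2admin # persistent address
172.16.31.141 p561n2 # Service address
"""

RHOSTS = """\
192.168.11.1
192.168.10.1
192.168.11.2
192.168.10.2
"""

BOOT_LABELS = ["p561n1boot1", "p561n1boot2", "p561n2boot1", "p561n2boot2"]

print(missing_from_rhosts(HOSTS, RHOSTS, BOOT_LABELS))  # prints []
```

With the data as posted, all four boot addresses are covered, so this particular check would not have pointed at the cause. Passing the persistent labels instead (`p561n1admin`, `p561n2admin`) reports both persistent addresses as absent from rhosts.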


  • 2.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 10:58 AM

    Originally posted by: alethad


    Have you set up your heartbeats yet? You really need to do that before you can verify or synchronize anything, and I didn't see in your message that you had.
    That probably accounts for the delay in reaching node2.

    alethad


  • 3.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 11:22 AM

    Originally posted by: alethad


    Hey, I also see you have a persistent address in your hosts file but you haven't configured it in your topology yet. Why don't you go ahead and do that too? Of course that wasn't the reason for your verify/sync issue. Don't forget the clhosts file too. I forget about that one sometimes myself.

    alethad


  • 4.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 11:41 AM

    Originally posted by: SystemAdmin


    alethad,

    I get the same error after adding the persistent IP label or the disk heartbeat devices to the topology. The clhosts file is used by client machines to find the cluster nodes.

    I think node p561n1 is unable to contact node p561n2. It must be a problem with clcomdES, but which one?

    Thanks

    NA


  • 5.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 12:16 PM

    Originally posted by: alethad


    I'm not sure why you're getting this particular comm error, but the message does say the error is on node2. I would suggest going to node2 and verifying that your interfaces are configured properly through HA and TCP/IP, plus anything else you can think of to check or test on node2.
    Are you getting errors on your hardware? On either node? I had an issue a few months ago on a new cluster where the error said node2 but the real error was on node1. Sorry to sound confusing, but it did confuse me at the time.

    OK, so the persistent address and heartbeats don't make a difference to your error. At least get the heartbeats done so you don't get an error about not having one configured when you verify/sync again (HA should have said something about that). And make them redundant.

    IBM still recommends setting up the clhosts file. You will get warnings if you don't, if my memory serves me today. :)

    I'll look around in my notes to see if I find anything else that may help.
    You may just have to call support. Or can you?

    Maybe someone else has had this particular error before.
    For what it's worth.


  • 6.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 02:10 PM

    Originally posted by: SystemAdmin


    alethad,

    I erased the HACMP configuration and redid it from node 2. I get the same error. There are no hardware errors, and the interfaces can ping each other.

    I have never seen that error before. I have installed several HACMP 5.4 configurations, but those were with RSCT 2.4.5.x. On this platform I installed AIX TL7 SP1, which includes RSCT 2.4.8.0. Maybe that is the difference?

    I opened an incident with IBM support. We'll see what the problem is.

    Thanks, friend

    NA

    Message was edited by: screendz


  • 7.  Re: HACMP 5.4 synchronization Error

    Posted Tue January 29, 2008 02:21 PM

    Originally posted by: alethad


    Yeah, I know what you mean, I haven't seen it either. I'm not on TL7 yet, just TL6.
    Are there any extended messages in the HA log files anywhere? No packet errors in netstat either?
    No typos anywhere, right? :)
    Just asking not nagging.
    I'd be curious to see what you find out from IBM. If it's not too much trouble.

    Good luck.


  • 8.  Re: HACMP 5.4 synchronization Error

    Posted Thu January 31, 2008 09:43 AM

    Originally posted by: SystemAdmin


    IBM support gave me the answer. The problem was on the customer's switch: the NICs are set to 100 Mb full duplex, but the switch ports are not.
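A speed or duplex disagreement of this kind can be checked by comparing what the adapter reports with what the network team says is configured on the switch port. The sketch below is an assumption-laden illustration, not the procedure IBM support used: the function names are hypothetical, the sample text is hand-written rather than captured from the poster's system, and it relies on the `Media Speed Running:` field that `entstat -d` prints for many AIX Ethernet adapters. The switch-side value has to come from the switch itself; the host cannot see it.

```python
import re


def link_mode(text):
    """Extract (speed_in_mbps, duplex) from a setting string, or None."""
    speed = re.search(r"(\d+)\s*Mb", text, re.IGNORECASE)
    duplex = re.search(r"\b(full|half)\b", text, re.IGNORECASE)
    if not speed or not duplex:
        return None
    return int(speed.group(1)), duplex.group(1).lower()


def nic_running_mode(entstat_text):
    """Return the adapter's running (speed, duplex), or None if not found."""
    m = re.search(r"Media Speed Running:\s*(.+)", entstat_text)
    return link_mode(m.group(1)) if m else None


def modes_match(entstat_text, switch_port_setting):
    """True only if both sides parse to the same fixed speed and duplex."""
    nic = nic_running_mode(entstat_text)
    switch = link_mode(switch_port_setting)
    return nic is not None and nic == switch


# Illustrative text only (assumed layout, not real output from this cluster).
SAMPLE = (
    "Media Speed Selected: 100 Mbps Full Duplex\n"
    "Media Speed Running: 100 Mbps Full Duplex\n"
)

print(modes_match(SAMPLE, "100Mb-Full duplex"))   # prints True
print(modes_match(SAMPLE, "100 Mb half duplex"))  # prints False
print(modes_match(SAMPLE, "auto"))                # prints False
```

A switch port left on "auto" is deliberately treated as a mismatch here, since a forced NIC facing an auto-negotiating port is a common way to end up with differing duplex on the two ends. That situation would also fit the symptoms in this thread, where small pings succeed while the larger data collection traffic fails.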


  • 9.  Re: HACMP 5.4 synchronization Error

    Posted Mon February 04, 2008 11:24 AM

    Originally posted by: alethad


    Glad it was simple.
    Thanks for letting me know.
    Good luck.

    Message was edited by: alethad