PowerHA for AIX

  • 1.  PowerHA 7.2.8 Network Down Failover Problem

    Posted Tue September 10, 2024 12:23 AM

    *Build environment #1 (Result: Abnormal operation)

    AIX 7.3 TL02
    PowerHA 7.2.8.2

    STARTUP="OHN"
    FALLOVER="FNPN"
    FALLBACK="NFB"
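For readers unfamiliar with the abbreviations above, here is a small sketch that expands them into the policy names shown in the menus. The mapping is an assumption based on the usual clmgr-style codes (the FBHPN entry corresponds to the "Fallback To Higher Priority Node In The List" policy mentioned later in this post); `POLICY_NAMES` and `describe` are illustrative names, not part of PowerHA.

```python
# Expansions of the policy abbreviations used in this post.
# Assumption: these match the policy names in your PowerHA release;
# verify against the PowerHA documentation before relying on them.
POLICY_NAMES = {
    "STARTUP": {"OHN": "Online On Home Node Only"},
    "FALLOVER": {"FNPN": "Fallover To Next Priority Node In The List"},
    "FALLBACK": {
        "NFB": "Never Fallback",
        "FBHPN": "Fallback To Higher Priority Node In The List",
    },
}


def describe(settings):
    """Return readable policy names; unknown codes are passed through unchanged."""
    return {
        key: POLICY_NAMES.get(key, {}).get(code, code)
        for key, code in settings.items()
    }


rg_policies = {"STARTUP": "OHN", "FALLOVER": "FNPN", "FALLBACK": "NFB"}
for key, name in describe(rg_policies).items():
    print(f"{key}: {name}")
```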

    # PowerHA Test Scenario


    1. Node A's network down: Failover to Node B (normal)
    2. Node A's network recovery: Node A network recovery message (normal); RG stays online on Node B
    3. Node B's network down: Node B network down message appears, but no failover follows (abnormal)
    4. Halt of Node B (in the above state): Failover to Node A (normal)


    *Build environment #2 (Result: normal operation)

    AIX 7.3 TL02
    PowerHA 7.2.7.0

    STARTUP="OHN"
    FALLOVER="FNPN"
    FALLBACK="NFB"


    1. Node A's network down: Failover to Node B (normal)
    2. Node A's network recovery: Node A network recovery message (normal), Node B RG online
    3. Node B's network down: Node B network down message appears, then the RG fails over to Node A (normal)
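The expected sequence in the steps above can be sketched as a toy model. This is only an illustration of the behavior described in this post (fallover to the next node in the priority list, never fall back), not PowerHA's actual logic; `ToyCluster` and its methods are hypothetical names, and wrapping around the priority list is a simplification for the two-node case.

```python
class ToyCluster:
    """Toy model of one resource group with FNPN fallover and NFB fallback."""

    def __init__(self, priority_order):
        self.priority_order = list(priority_order)  # e.g. ["A", "B"]
        self.network_up = {node: True for node in self.priority_order}
        self.rg_owner = self.priority_order[0]  # OHN: start on the home node

    def network_down(self, node):
        self.network_up[node] = False
        if node == self.rg_owner:
            self._fallover()

    def network_recover(self, node):
        # NFB: a recovered network never moves the resource group back.
        self.network_up[node] = True

    def _fallover(self):
        # Walk the priority list starting after the current owner (wrapping)
        # and pick the first node whose network is up.
        start = self.priority_order.index(self.rg_owner)
        count = len(self.priority_order)
        for offset in range(1, count):
            candidate = self.priority_order[(start + offset) % count]
            if self.network_up[candidate]:
                self.rg_owner = candidate
                return
        self.rg_owner = None  # no healthy node left: RG goes offline


cluster = ToyCluster(["A", "B"])
cluster.network_down("A")
print("after step 1, RG owner:", cluster.rg_owner)
cluster.network_recover("A")
print("after step 2, RG owner:", cluster.rg_owner)
cluster.network_down("B")
print("after step 3, RG owner:", cluster.rg_owner)
```

In this model, step 3 moves the RG back to Node A, which matches the 7.2.7 result; the 7.2.8 result corresponds to the RG staying on Node B after step 3.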


    For what it's worth, the same thing happens when the Fallback Policy is changed to Fallback To Higher Priority Node In The List.

    In 7.2.7, failover works normally when the network fails.

    In 7.2.8, however, failover does not work correctly when the network fails.


    I suspect it is a bug in 7.2.8.
    Has anyone experienced a similar case?



    ------------------------------
    승원 이
    ------------------------------


  • 2.  RE: PowerHA 7.2.8 Network Down Failover Problem

    Posted Wed September 11, 2024 01:58 PM

    In my experience, this kind of problem is caused by network devices such as LAN routers: the new HA node sends a gratuitous ARP packet, but the router ignores it and does not update its ARP table.

    After a switchover, the service IP address is passed from one node to the other, but without a proper ARP update, the router thinks that the service IP address still has the old MAC address.
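The stale-ARP situation described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a capture of real router behavior: the `Router` class, the `honors_gratuitous_arp` flag, the documentation-range IP address, and the MAC addresses are all hypothetical.

```python
SERVICE_IP = "192.0.2.10"         # hypothetical service IP (documentation range)
MAC_NODE_A = "02:00:00:00:00:0a"  # hypothetical MAC of the old owner
MAC_NODE_B = "02:00:00:00:00:0b"  # hypothetical MAC of the new owner


class Router:
    """Toy router that may or may not accept gratuitous ARP updates."""

    def __init__(self, honors_gratuitous_arp):
        self.honors_gratuitous_arp = honors_gratuitous_arp
        self.arp_table = {}

    def learn(self, ip, mac):
        # Normal ARP resolution: remember which MAC answered for this IP.
        self.arp_table[ip] = mac

    def receive_gratuitous_arp(self, ip, mac):
        # A router that ignores gratuitous ARP keeps its old entry.
        if self.honors_gratuitous_arp:
            self.arp_table[ip] = mac

    def next_hop_mac(self, ip):
        return self.arp_table.get(ip)


for honors in (True, False):
    router = Router(honors_gratuitous_arp=honors)
    router.learn(SERVICE_IP, MAC_NODE_A)                    # before the switchover
    router.receive_gratuitous_arp(SERVICE_IP, MAC_NODE_B)   # new node announces itself
    stale = router.next_hop_mac(SERVICE_IP) == MAC_NODE_A
    print(f"honors gratuitous ARP={honors}: stale entry={stale}")
```

When the flag is off, traffic for the service IP keeps going to the old node's MAC address until the entry expires or is cleared.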

    Regards

    Luis A. Rojas Kramer

    TFI Mexico – Grupo Integrador de Productos y Soluciones de TI.

  • 3.  RE: PowerHA 7.2.8 Network Down Failover Problem

    Posted Tue December 24, 2024 11:36 AM

    I have a very similar problem with the same PowerHA version. The only difference is my AIX level, which is 7200-05-08-2420.

    In my environment too, after node A failed and the RG failed over to node B, I got the "network recovery message", but when I run cluster verification, the cluster reports:

    "Network: net_ether_01 State: DOWN".

    So, even though the cluster state is "ST_STABLE" on node "A", when I try to switch the RG back using C-SPOC, PowerHA does not show node "A".

    I didn't test with a different PowerHA version, but you did. So I believe you are right: there is a problem with this PowerHA version.



    ------------------------------
    Dario Papiro
    ------------------------------



  • 4.  RE: PowerHA 7.2.8 Network Down Failover Problem

    Posted Thu January 16, 2025 11:38 AM

    In PowerHA version 7.2.8, there is a bug (APAR IJ51301) that causes commands such as cldump, clstat, and cldisp to report the network status incorrectly.
    A fix for this issue is available from IBM support if you open a case (the fix is not included in SP1 or SP2).

    https://www.ibm.com/support/pages/apar/IJ51301

    In my case, the network status was reported incorrectly, even though the "service IP label" address was pingable and operational. Installing the iFix resolved the issue.



    ------------------------------
    Michal Wiktorek
    https://www.linkedin.com/in/michal-wiktorek-83b2b47b
    ------------------------------