PowerHA for AIX

  • 1.  PowerHA Failover Test: Is this behavior expected?

    Posted Mon March 25, 2024 10:48 AM

    Hello, we recently delivered two S1014 servers to a customer, installed PowerHA, and demonstrated HA using the test scenario the customer provided.

    We have a two-server DB setup and followed the scenario below.

    Environment: two DB servers (DB1, DB2), AIX 7.3 TL02, PowerHA 7.2.8

    Scenario Steps:

    1. Network cable disconnection on DB1

      • The resource group successfully failed over to DB2.
    2. Network cable restored on DB1, then network cable disconnected on DB2

      • The resource group on DB2 did not fail over back to DB1.

    I expected the resource group to move back to DB1, but it did not. The logs showed the network disconnection but no further activity. In addition, DB2 was not visible in 'smit cspoc'.

    After stopping and restarting cluster services on DB1 with 'smit clstop' and 'smit clstart', failover worked again. Once 'smit clstart' had run, the peer node was visible again in C-SPOC.

    Does anyone know whether this is intended behavior? My understanding from previous PowerHA versions is that after a failover from server 1 to server 2, if server 2 then went down, the resource group would move back to server 1 without any manual intervention.



    ------------------------------
    dongyong kang
    ------------------------------


  • 2.  RE: PowerHA Failover Test: Is this behavior expected?

    Posted Tue March 26, 2024 01:21 AM

    Hello.

    You can configure this behaviour by setting the resource group policies. See:

    Resource group startup, fallover, and fallback - IBM Documentation

    Regards



    ------------------------------
    Carsten Stephan
    ------------------------------



  • 3.  RE: PowerHA Failover Test: Is this behavior expected?

    Posted Tue March 26, 2024 05:04 AM

    Hi,

    It depends on your configuration. You can quickly check the resource group settings with:

    clmgr q rg <your resource group>

    Look for the FALLBACK attribute. If it is set to NFB (Never Fallback), the resource group will not be moved back after node1 is restored.
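    As a rough illustration of this check, the Python sketch below parses 'clmgr q rg' output and translates the policy codes into their readable names. It assumes the output is one ATTRIBUTE="value" pair per line; the sample text and the resource group name db_rg are made up, and the code table only covers a few common values (OHN, FNPN, NFB, FBHPN), not the full list.

```python
import re

# Hypothetical sample output. Assumption: 'clmgr q rg <rg>' prints one
# ATTRIBUTE="value" pair per line.
SAMPLE_OUTPUT = """NAME="db_rg"
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="NFB"
"""

# A few policy codes and their SMIT display names (not an exhaustive list).
POLICY_NAMES = {
    "OHN": "Online On Home Node Only",
    "FNPN": "Fallover To Next Priority Node In The List",
    "NFB": "Never Fallback",
    "FBHPN": "Fallback To Higher Priority Node In The List",
}

ATTR_LINE = re.compile(r'^(\w+)="(.*)"$')


def parse_clmgr_attrs(text):
    """Collect ATTRIBUTE="value" lines into a dict; any other line is skipped."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line.strip())
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def describe_policies(attrs):
    """Translate the three policy codes to names; unknown codes are returned unchanged."""
    result = {}
    for key in ("STARTUP", "FALLOVER", "FALLBACK"):
        code = attrs.get(key, "")
        result[key] = POLICY_NAMES.get(code, code)
    return result


if __name__ == "__main__":
    attrs = parse_clmgr_attrs(SAMPLE_OUTPUT)
    for key, name in describe_policies(attrs).items():
        print(f"{key:<9} {name}")
    if attrs.get("FALLBACK") == "NFB":
        print("FALLBACK=NFB: the RG stays put when a higher-priority node comes back.")
```

    On a cluster node you could feed it real output instead of the sample, for example subprocess.run(["clmgr", "q", "rg", "db_rg"], capture_output=True, text=True).stdout, with db_rg replaced by your resource group name.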

    There are several other things to check in your cluster configuration, such as:

    • Do you start PowerHA automatically at server startup?
    • Do you have a quorum (tiebreaker) device configured?
    • Do your start/stop scripts work as intended?

    Troubleshooting a PowerHA configuration is a cumbersome process. By the way, smitty includes an automated test tool for PowerHA configurations. I don't remember where it is, and I don't have a PowerHA installation available to check right now, but I'm sure someone will post the smitty fastpath for it here.



    ------------------------------
    Andrey Klyachkin

    https://www.power-devops.com
    ------------------------------



  • 4.  RE: PowerHA Failover Test: Is this behavior expected?

    Posted Tue March 26, 2024 11:35 AM
    Edited by JES KIRAN CHITTIGALA Tue March 26, 2024 11:35 AM

    Hi,
    When the network or an application fails, PowerHA initiates a fallover to another available node. When the network is later restored on the problem node, PowerHA does not initiate an automatic fallback. However, if the takeover node then also fails or has network problems, the resource group falls over to another available node. If the failure is a global network failure, meaning the network has failed on all available nodes, no fallover is initiated.
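    A toy Python model of the rules above may make them easier to compare with the scenario in the first post. It is not PowerHA's event logic: the Node class and choose_fallover_target function are hypothetical, and "another available node" is taken to mean the first healthy node in priority order. Under this model, the second step of the original scenario would move the resource group back to DB1, which is what the original poster expected but did not observe.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one cluster node for this toy model."""
    name: str
    up: bool = True          # node and cluster services are running
    network_ok: bool = True  # the monitored network is reachable from this node


def choose_fallover_target(nodes, current):
    """Return the node that should host the resource group after a failure.

    nodes: Node objects in priority order; current: name of the hosting node.
    Toy model of the behaviour described in the post, not PowerHA's event logic.
    """
    host = next(n for n in nodes if n.name == current)
    if host.up and host.network_ok:
        return current  # hosting node is healthy: nothing to do
    healthy = [n for n in nodes if n.name != current and n.up and n.network_ok]
    if not healthy:
        return current  # e.g. global network failure: no fallover is initiated
    return healthy[0].name  # first healthy node in priority order


if __name__ == "__main__":
    db1, db2 = Node("DB1"), Node("DB2")
    db1.network_ok = False
    rg_host = choose_fallover_target([db1, db2], "DB1")  # step 1: DB1 cable pulled
    print(rg_host)
    db1.network_ok = True   # DB1 cable restored: no automatic fallback, RG stays on DB2
    db2.network_ok = False  # step 2: DB2 cable pulled
    print(choose_fallover_target([db1, db2], rg_host))
```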



    ------------------------------
    JES KIRAN CHITTIGALA
    ------------------------------



  • 5.  RE: PowerHA Failover Test: Is this behavior expected?

    Posted Mon September 09, 2024 12:52 PM

    Did you manage to resolve this? I'm having the same problem.

    AIX version: tested on both 7.3 and 7.2

    HA version: 7.2.8.2

    Startup Policy     Online On Home Node Only
    Fallover Policy    Fallover To Next Priority Node In The List
    Fallback Policy    Never Fallback

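    A node network disconnected: failover works normally

    A node network restored: recovery message is logged

    B node network disconnected: only a Network Down message is logged (no failover)

    B node halted (in the situation above): failover works normally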
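    The Never Fallback policy listed above only decides what happens when a higher-priority node becomes available again. The hypothetical Python function below sketches that decision for NFB versus FBHPN (Fallback To Higher Priority Node In The List). It is a simplified model, not PowerHA code, and it does not model fallover at all, so it does not explain the missing fallover in the B node network test.

```python
def fallback_target(priority, current, returning, policy):
    """Where the resource group should run after node `returning` becomes available again.

    priority: node names, highest priority first.
    policy: "NFB" (Never Fallback) or "FBHPN" (Fallback To Higher Priority Node In The List).
    Simplified model for illustration, not PowerHA code.
    """
    if policy == "NFB":
        return current  # never move back automatically
    if policy == "FBHPN" and priority.index(returning) < priority.index(current):
        return returning  # a higher-priority node is back: fall back to it
    return current


if __name__ == "__main__":
    nodes = ["A", "B"]  # A is the home (highest-priority) node
    # RG moved to B after A's network was cut; then A's network was restored.
    print(fallback_target(nodes, "B", "A", "NFB"))    # RG stays on B
    print(fallback_target(nodes, "B", "A", "FBHPN"))  # RG moves back to A
```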



    ------------------------------
    승원 이
    ------------------------------