PowerHA for AIX

PowerHA Failover Test: Is this behavior expected?

1. PowerHA Failover Test: Is this behavior expected?

dongyong kang
Posted Mon March 25, 2024 10:48 AM

Hello, we recently delivered two S1014 units to a customer, installed PowerHA, and demonstrated HA according to the test scenario the customer provided.

We have a two-server DB setup and followed the scenario below.

Environment: Two DB servers (DB1, DB2), AIX 7.3 TL02, PowerHA 7.2.8

Scenario steps:

Step 1: Disconnect the network cable on DB1.
Result: The resource group successfully failed over to DB2.

Step 2: Restore the network cable on DB1, then disconnect the network cable on DB2.
Result: The resource group on DB2 did not fail over back to DB1.

I was under the impression that the resource group would fail back to DB1, but it did not. The logs showed the network disconnection but no further activity. Additionally, DB2 was not visible in 'smit cspoc'.

After stopping and starting the cluster services on DB1 using 'smit clstop' and 'smit clstart', failover worked again. Once 'smit clstart' was run, the peer node was visible again in cspoc.

Does anyone know whether this is intended behavior? From what I understand about previous versions of PowerHA, after failing over from server 1 to server 2, if server 2 goes down, the resource group would move back to server 1 without any additional intervention.

------------------------------
dongyong kang
------------------------------
2. RE: PowerHA Failover Test: Is this behavior expected?

Carsten Stephan

IBM Champion
Posted Tue March 26, 2024 01:21 AM

Hello.

You can configure this behaviour of the cluster by setting the resource group policies.

See "Resource group startup, fallover, and fallback" in the IBM Documentation.

Regards

------------------------------
Carsten Stephan
------------------------------

3. RE: PowerHA Failover Test: Is this behavior expected?

Andrey Klyachkin

IBM Champion
Posted Tue March 26, 2024 05:04 AM

Hi,

It depends on your configuration. You can quickly check the settings of the resource group with:

clmgr q rg <your resource group>

In the output you will find the FALLBACK attribute. If it is set to NFB (Never Fallback), your resource group will not be moved back after node 1 is restored.
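As a rough illustration of that check, here is a minimal sketch that pulls the FALLBACK value out of the query output. It assumes the query prints one ATTR="value" pair per line; the sample text and the resource group name db_rg are made up for the example, so on a real cluster you would pipe the actual clmgr output into get_attr instead.

```shell
#!/bin/sh
# Extract one attribute from ATTR="value" lines read on stdin.
# Assumption: the query output has one ATTR="value" pair per line.
get_attr() {
    sed -n "s/^$1=\"\(.*\)\"\$/\1/p"
}

# Hypothetical sample standing in for the output of: clmgr q rg db_rg
sample='NAME="db_rg"
STARTUP="OHN"
FALLOVER="FNPN"
FALLBACK="NFB"'

fallback=$(printf '%s\n' "$sample" | get_attr FALLBACK)

case "$fallback" in
    NFB)   echo "FALLBACK=NFB: the group stays put after the home node returns" ;;
    FBHPN) echo "FALLBACK=FBHPN: the group moves back to the higher priority node" ;;
    *)     echo "FALLBACK=$fallback: not a value handled by this sketch" ;;
esac
```

On a cluster node the equivalent would be something like piping the clmgr query into get_attr FALLBACK, with your own resource group name.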

There are several more things to check in your cluster configuration, such as:

Do you start the PowerHA software automatically at server startup?

Do you have a quorum (tiebreaker) device in your configuration?

Do your start/stop scripts work as intended?

Troubleshooting a PowerHA configuration is a cumbersome process. By the way, there is an automatic test tool for the PowerHA configuration in smitty. I don't remember where it is and don't have a PowerHA installation at hand to check, but I am pretty sure someone will paste the smitty fast path to it here.

------------------------------
Andrey Klyachkin

https://www.power-devops.com
------------------------------

4. RE: PowerHA Failover Test: Is this behavior expected?

JES KIRAN CHITTIGALA
Posted Tue March 26, 2024 11:35 AM
Edited by JES KIRAN CHITTIGALA Tue March 26, 2024 11:35 AM

Hi,
When a network or application fails, PowerHA initiates a failover to another available node. When the network is later restored on the problematic node, PowerHA won't initiate an automatic fallback. However, if the node that took over also fails or gets network issues, the resource group fails over to another available node. If the network failure is global, meaning the network has failed on all available nodes, no failover is initiated.
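To make the three cases in this reply easier to compare, here is a toy model of them in shell. It only encodes the rules as stated above for a two-node cluster and is not PowerHA's actual event logic; the function name decide and its arguments are invented for the illustration.

```shell
#!/bin/sh
# Toy model of the rules described in this reply; not PowerHA's event logic.
# $1: network state ("up" or "down") on the node hosting the resource group
# $2: network state ("up" or "down") on the other node
decide() {
    host_net=$1
    peer_net=$2
    if [ "$host_net" = "up" ]; then
        echo "stay"          # nothing failed here; no automatic fallback either
    elif [ "$peer_net" = "up" ]; then
        echo "failover"      # hosting node lost its network, peer is available
    else
        echo "no failover"   # global network failure: nowhere to move to
    fi
}

decide down up     # prints: failover
decide up up       # prints: stay
decide down down   # prints: no failover
```

Under these rules, the second step of the original test (network down on the node holding the group, network restored on the peer) would be expected to produce a failover, which is why the observed result is surprising.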

------------------------------
JES KIRAN CHITTIGALA
------------------------------

5. RE: PowerHA Failover Test: Is this behavior expected?

승원 이
Posted Mon September 09, 2024 12:52 PM

Did you manage to resolve this? I ran into the same problem.

AIX version: tested on both 7.3 and 7.2

HA version: 7.2.8.2

Startup Policy: Online On Home Node Only
Fallover Policy: Fallover To Next Priority Node In The List
Fallback Policy: Never Fallback

Node A network disconnected: normal failover

Node A network restored: recovery message logged

Node B network disconnected: only a Network Down message is logged (no failover)

Node B halted (in the situation above): normal failover
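For anyone comparing these SMIT policy names with the short codes that clmgr reports, here is a small lookup sketch. The codes are the ones I recall from the clmgr documentation, so please verify them on your own system; the function name policy_code is hypothetical and only the policies mentioned in this thread are covered.

```shell
#!/bin/sh
# Map a policy name as quoted in this thread to the short code used by clmgr.
# Assumption: codes as recalled from the clmgr documentation; verify locally.
policy_code() {
    case "$1" in
        "Online On Home Node Only")                     echo OHN ;;
        "Fallover To Next Priority Node In The List")   echo FNPN ;;
        "Never Fallback")                               echo NFB ;;
        "Fallback To Higher Priority Node In The List") echo FBHPN ;;
        *)                                              echo unknown ;;
    esac
}

policy_code "Online On Home Node Only"
policy_code "Fallover To Next Priority Node In The List"
policy_code "Never Fallback"
```

With the configuration posted above, the FALLBACK attribute would be expected to show the Never Fallback code, so the group is not expected to return to node A on its own once node A's network is restored.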

------------------------------
승원 이
------------------------------
