PowerHA 5.5 Cluster Status Stays UNSTABLE after Fallback

1. PowerHA 5.5 Cluster Status Stays UNSTABLE after Fallback

Archive User
Posted Thu July 30, 2009 12:26 PM
Originally posted by: SystemAdmin

Hello, and thanks for reading my post.

I have a PowerHA 5.5 SP03 cluster with two nodes on AIX 6.1 TL2 SP4 in an Active/Passive configuration.

Everything works fine, except that the cluster status remains UNSTABLE after a fallback.

To correct this I have to manually run the Recover from Script Error option in SMIT. Right after I do this, some output appears in the hacmp.out log.

Here is an extract of the final part of hacmp.out:

WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 180 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 210 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 240 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 270 seconds. Please check cluster status.
:check_for_site_up_complete+54 [ high = high ]
:check_for_site_up_complete+54 version=1.4
:check_for_site_up_complete+55 :check_for_site_up_complete+55 cl_get_path
HA_DIR=es
:check_for_site_up_complete+57 STATUS=0
:check_for_site_up_complete+59 set +u
:check_for_site_up_complete+61 [ ]
:check_for_site_up_complete+72 exit 0
config_too_long: Event 'TE_JOIN_NODE_DEP_COMPLETE' on Cluster Auto_Ambar Completed Successfully.

Apparently something is hanging the check_for_site_up_complete event during fallbacks.

Any help or clues would be appreciated.

Thanks in advance,
Angel Aponte
Venezuela
#PowerHA-(Formerly-known-as-HACMP)-Technical-Forum
#PowerHAforAIX
2. Re: PowerHA 5.5 Cluster Status Stays UNSTABLE after Fallback

Archive User
Posted Mon August 03, 2009 12:41 PM
Originally posted by: Holgervk

What does the following command report?
/usr/es/sbin/cluster/utilities/cllscustom
#PowerHAforAIX
#PowerHA-(Formerly-known-as-HACMP)-Technical-Forum
4. Re: PowerHA 5.5 Cluster Status Stays UNSTABLE after Fallback

Archive User
Posted Mon August 03, 2009 04:58 PM
Originally posted by: Casey_B

Hello Angel,

My guess is that the error happens much earlier than the log entries you quoted.

It looks like site_up_complete runs from start to completion and shows no sign of being hung.

The end of the previous script, right before config_too_long occurs, might show what is happening.
You will also have to look on all nodes. The cluster might be waiting for one node to complete a step of
the event before allowing the others to continue.

You can look for the string "ERROR !!!" as one hint.
You can also look for any exit command with a non-zero value.

(Look particularly at your application start script and its associated monitor.
If you don't have logging specific to your application start scripts, now would be a really
good time to add that logging.)
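
A rough sketch of that search, for anyone who wants to automate it. It assumes that trace lines in hacmp.out end in "exit N", as in the ":check_for_site_up_complete+72 exit 0" line quoted earlier in the thread, and that you run it against a copy of the log from each node. The names find_suspects and scan_hacmp.py are made up for illustration and are not part of PowerHA.

```python
import re
import sys

# Assumption: trace lines end in "exit N", e.g.
# ":check_for_site_up_complete+72 exit 0" from the excerpt in this thread.
EXIT_RE = re.compile(r"\bexit\s+(\d+)\s*$")
ERROR_MARK = "ERROR !!!"


def find_suspects(lines):
    """Return (line_number, reason, text) for each line worth a closer look."""
    suspects = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if ERROR_MARK in text:
            suspects.append((number, "error marker", text))
            continue
        match = EXIT_RE.search(text)
        if match and int(match.group(1)) != 0:
            suspects.append((number, "non-zero exit", text))
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python scan_hacmp.py <copy of hacmp.out>
    with open(sys.argv[1], errors="replace") as log:
        for number, reason, text in find_suspects(log):
            print(f"{number}: [{reason}] {text}")
```

The line numbers it prints are only starting points. The useful context is usually the trace just before each hit, and the same scan should be repeated on the log from every node.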

This would also be a good case to open with IBM support.
Hope this helps,
Casey
#PowerHA-(Formerly-known-as-HACMP)-Technical-Forum
#PowerHAforAIX
