Originally posted by: SystemAdmin
Hello, thanks for reading my post.
I have a PowerHA 5.5 SP03 cluster with two nodes on AIX 6.1 TL2 SP4 in an Active/Passive configuration.
Everything works just fine except that the cluster status remains UNSTABLE after a fallback.
To correct this, I have to manually run the Recover from Script Error option in SMIT. Right after I do that, some information appears in the hacmp.out log.
Here is an extract of the final part of hacmp.out:
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 180 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 210 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 240 seconds. Please check cluster status.
WARNING: Cluster Auto_Ambar has been running recovery program 'TE_JOIN_NODE_DEP_COMPLETE' for 270 seconds. Please check cluster status.
:check_for_site_up_complete[+54] [ high = high ]
:check_for_site_up_complete[+54] version=1.4
:check_for_site_up_complete[+55] :check_for_site_up_complete[+55] cl_get_path
HA_DIR=es
:check_for_site_up_complete[+57] STATUS=0
:check_for_site_up_complete[+59] set +u
:check_for_site_up_complete[+61] [ ]
:check_for_site_up_complete[+72] exit 0
config_too_long: Event 'TE_JOIN_NODE_DEP_COMPLETE' on Cluster Auto_Ambar Completed Successfully.
Apparently something is hanging the check_for_site_up_complete event during fallbacks.
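To see how long each event hung across several fallbacks, the config_too_long WARNING lines can be summarized. Below is a minimal sketch, assuming the warning format shown in the extract above; longest_running_events is a hypothetical helper, not part of PowerHA, and the path to hacmp.out is passed explicitly because its location depends on the release and configuration.

```python
import re
import sys
from collections import defaultdict

# Matches the config_too_long warning format seen in the hacmp.out extract
# (assumption: other releases may word this line differently).
WARNING_RE = re.compile(
    r"WARNING: Cluster (?P<cluster>\S+) has been running recovery program "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) seconds"
)


def longest_running_events(log_text):
    """Return {(cluster, event): longest reported seconds} from the warnings."""
    longest = defaultdict(int)
    for line in log_text.splitlines():
        m = WARNING_RE.search(line)
        if m:
            key = (m.group("cluster"), m.group("event"))
            longest[key] = max(longest[key], int(m.group("secs")))
    return dict(longest)


if __name__ == "__main__":
    # Usage: python summarize_warnings.py /path/to/hacmp.out
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            text = f.read()
    else:
        text = (
            "WARNING: Cluster Auto_Ambar has been running recovery program "
            "'TE_JOIN_NODE_DEP_COMPLETE' for 270 seconds. Please check cluster status.\n"
        )
    for (cluster, event), secs in longest_running_events(text).items():
        print(f"{cluster}: {event} ran for at least {secs} seconds")
```

The warnings are emitted at intervals, so the reported value is a lower bound on how long the event actually ran, not its exact duration.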
Any help or clues would be appreciated.
Thanks in advance,
Angel Aponte
Venezuela
#PowerHA-(Formerly-known-as-HACMP)-Technical-Forum#PowerHAforAIX