Originally posted by: Casey_B
Hello,
First, there is a new forum specifically for HACMP/PowerHA questions.
Here is the location:
http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1611
Secondly, the question is a bit too vague for me to give reasonable advice.
Here are some things that you can look at:
1) You need to know why the cluster is in a barrier state.
Simplified, a barrier state means that the cluster nodes are waiting for all nodes to finish the current
step in the recovery plan.
Look in hacmp.out on each node, just before the "config too long" messages begin, to see what the last
message was.
2) Killing the config too long process will not help the cluster progress any further in the recovery plan.
If there is a process that has been hanging, and you have confirmed it is the last thing
to be run in the hacmp.out (for example, your application stop script that appears to have never exited),
then you COULD try killing that process. The cluster recovery plan would probably continue and enter
a failed state. From the failed state, you can use the smit panel for "Recovery from script failure"
to continue.
In the meantime, HA might try to perform actions on your resource groups. (For example, if you kill the
application script, then HA may try to recover by unmounting the filesystems and varying off the volume groups.)
It really depends on what the exact cluster state is.
All of this is, of course, without knowing your system or looking at your logs,
so the information and advice could be very wrong.
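For point 1 above, here is a rough Python sketch of how you could pull out the last few messages that were
logged before the first "config too long" entry. The function name, the marker string "config_too_long" and
the amount of context are my assumptions, not anything HA provides, so adjust them to whatever your hacmp.out
actually shows.

```python
from collections import deque


def lines_before_marker(lines, marker="config_too_long", context=5):
    """Return the last `context` non-blank lines seen before the first
    line containing `marker`.

    `marker` is an assumed search string; check what your own hacmp.out
    prints when the config too long messages start.
    """
    recent = deque(maxlen=context)  # rolling window of the latest lines
    for line in lines:
        if marker in line:
            return list(recent)
        if line.strip():
            recent.append(line.rstrip("\n"))
    return []  # marker never appeared in the input


def lines_before_marker_in_file(path, marker="config_too_long", context=5):
    """Same as above, but reads the log from `path` (you supply the path)."""
    with open(path, errors="replace") as log:
        return lines_before_marker(log, marker, context)
```

You would run lines_before_marker_in_file() against the hacmp.out on each node and compare what each node
was doing last. It only reads the log, so it is safe to run while the cluster is stuck.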
If it were my cluster, I would try to manually stop all of my applications, manually unmount all of the
filesystems, and manually vary off the volume groups on all nodes. This way, you know that your data is not
currently being accessed, and you can work a little more easily. Then I would think that a reboot might be the
easiest way to bring you to a clean state.
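To make the order of those manual steps explicit, here is a small dry-run sketch in Python. It only builds a
list of command strings and runs nothing. The function name and the example paths are made up for illustration,
and the application stop commands are whatever your own scripts are. umount and varyoffvg are the usual AIX
commands for the last two steps, but please verify each line against your own system before typing it.

```python
def cleanup_plan(app_stop_cmds, filesystems, volume_groups):
    """Build an ordered, dry-run list of commands. Nothing is executed.

    Order: stop applications, unmount filesystems, vary off volume groups.
    """
    plan = list(app_stop_cmds)  # 1. your own application stop commands
    # 2. Unmount deeper mount points first so nested filesystems
    #    are released before their parents.
    by_depth = sorted(
        filesystems,
        key=lambda p: p.rstrip("/").count("/"),
        reverse=True,
    )
    plan.extend(f"umount {fs}" for fs in by_depth)
    # 3. Vary off the volume groups once nothing is mounted from them.
    plan.extend(f"varyoffvg {vg}" for vg in volume_groups)
    return plan


# Hypothetical example values, not taken from any real cluster.
for cmd in cleanup_plan(["/usr/local/bin/stop_app.sh"],
                        ["/data", "/data/logs"],
                        ["datavg"]):
    print(cmd)
```

Reviewing the printed list before you run anything by hand gives you a chance to catch a filesystem or volume
group you forgot about.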
Once you are in a clean state, you can examine your environment and logs in more detail to see what
happened.
Or you could call IBM support. They would love to help you and would be able to review the logs in more detail.
Hope this helps,
Casey