Hacmp clstop hung

1. Hacmp clstop hung

Archive User
Posted Mon April 20, 2009 03:15 AM

Originally posted by: SystemAdmin

I have a two-node cluster on HACMP 5.3, and I am not able to stop the cluster. When I check the processes, I can see a clstop process and a config_too_long process that is more than a month old. In cspoc.log, the current state of the cluster is shown as ST_BARRIER. I need to know whether it is safe to kill the hung clstop and config_too_long processes in order to bring down the cluster. Experts, please help.

2. Re: Hacmp clstop hung

Archive User
Posted Mon April 20, 2009 02:53 PM

Originally posted by: Casey_B

Hello,

First, there is a new forum specifically for HACMP/PowerHA questions.

Here is the location: http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1611

Second, the question is a bit too vague for me to give reasonable advice.

Here are some things that you can look at:

1) You need to know why the cluster is in a barrier state.
Simplified, a barrier state means that the cluster nodes are waiting for all nodes to finish the current
step in the recovery plan.

Look at hacmp.out on each node and find the last message logged before the config_too_long messages
start.

2) Killing the config_too_long process will not help the cluster progress any further in the recovery plan.
If there is a process that has been hanging, and you have confirmed that it is the last thing
run in hacmp.out (for example, an application stop script that appears never to have exited),
then you COULD try killing that process. The cluster recovery plan would probably continue and enter
a failed state. From the failed state, you can use the smit panel for "Recovery from script failure"
to continue.

In the meantime, HA might try to perform actions on your resource groups. (For example, if you kill the
application script, HA may try to recover by unmounting the filesystems and varying off the volume groups.)

It really depends on what the exact cluster state is.
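
For point 1, here is a minimal sketch of pulling out the lines that come just before the first config_too_long message in a copy of hacmp.out. It assumes Python is available wherever you run it and that those messages contain the literal string "config_too_long"; the function names and the default of 20 lines of context are made up for illustration, and the log path is left for you to supply.

```python
from collections import deque
from typing import Iterable, List


def lines_before_marker(lines: Iterable[str],
                        marker: str = "config_too_long",
                        context: int = 20) -> List[str]:
    """Return up to `context` lines preceding the first line that contains `marker`.

    The marker string is an assumption about how the messages appear in the log.
    """
    recent = deque(maxlen=context)  # keeps only the most recent `context` lines
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return []  # marker never seen


def show_context(path: str) -> None:
    """Print the context lines for a log file; `path` is whatever copy of hacmp.out you have."""
    with open(path, errors="replace") as log:
        for line in lines_before_marker(log):
            print(line)
```

Run it against the log from each node and compare the output to see which step each node was on when it stopped making progress.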

All of this, of course, is without knowing your system or looking at your logs,
so the information and advice could be very wrong.

If it were my cluster, I would try to manually stop all of my applications, manually unmount all of the
filesystems, and manually vary off the volume group (on all nodes). This way, you know that your data is not currently
being accessed, and you can work a little more easily. Then, I would think that a reboot might be the easiest way to bring you to
a clean state.
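
The order of the manual steps matters: nested filesystems have to be unmounted before their parents, and the volume group can only be varied off once its filesystems are unmounted. A rough sketch of working out that order is below. It only builds a list of command strings (using the standard AIX `umount` and `varyoffvg` commands) and does not run anything; the function name, mount points, and volume group name are hypothetical, and stopping the applications themselves is left out because it depends entirely on your environment.

```python
from typing import List


def shutdown_plan(mount_points: List[str], volume_groups: List[str]) -> List[str]:
    """Build an ordered list of commands: deepest filesystems first, then the volume groups."""
    # A deeper mount point has more "/" separators and must be unmounted before its parent.
    ordered = sorted(mount_points,
                     key=lambda p: p.rstrip("/").count("/"),
                     reverse=True)
    plan = [f"umount {mp}" for mp in ordered]
    # Volume groups are varied off only after all filesystems are unmounted.
    plan += [f"varyoffvg {vg}" for vg in volume_groups]
    return plan


# Hypothetical example values, not taken from any real cluster.
for command in shutdown_plan(["/app", "/app/data", "/app/data/logs"], ["appvg"]):
    print(command)
```

Review the printed list by hand before running any of it on a node.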

After you are in a clean state, you can examine your environment and logs in more detail to see what happened.

Or you could call IBM support. They would love to help you, and would be able to review the logs in more detail.

Hope this helps,
Casey
