AIX

Hacmp: unmount of filesystem even application stop script returncode != 0

1. Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Sat March 21, 2009 05:59 AM

Originally posted by: Ayaz_Anjum

Folks,

I am observing the following behaviour with HACMP.

If the application stop script returns a non-zero return code, HACMP still continues and unmounts the filesystem. This behaviour is undesirable because, if the application failed to stop, HACMP should abort the process of bringing the resource group offline and should not unmount the filesystems.

We are using HACMP 5.4.1 with AIX 5.3.07. There are only application start and stop scripts, and no application monitoring scripts.

I can think of commenting out the fuser -c -k call in the HACMP event scripts, but to me this does not look like a clean solution.
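
For readers unfamiliar with the fuser call mentioned above: `fuser -c <mountpoint>` reports the processes using files in that filesystem, and adding `-k` kills them so the unmount can go ahead. The following is only a minimal sketch of a read-only check that lists those processes without killing anything. The function names `pids_using_fs` and `fs_is_idle` are hypothetical, and this is not taken from the HACMP event scripts.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs (one per line) of processes that
# still have files open in the filesystem mounted at $1.
# fuser writes the PIDs to stdout and the file name plus usage codes to
# stderr, so stderr is discarded and only numeric tokens are kept.
pids_using_fs() {
    fuser -c "$1" 2>/dev/null | tr -s ' \t' '\n\n' | grep -E '^[0-9]+$'
}

# Hypothetical helper: return 0 only if no process is using the filesystem.
fs_is_idle() {
    [ -z "$(pids_using_fs "$1")" ]
}
```

A stop script could call `fs_is_idle` on each shared mount point after stopping the application, as a way to notice leftover processes before the cluster reaches its own unmount step.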

Any suggestions?

Thanks

2. Re: Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Mon March 23, 2009 11:50 AM

Originally posted by: Sprellster

Why would you want the cluster to abort the resource group movement in this case? The whole purpose of the cluster is to move resources (sometimes forcefully) between machines if they become inoperable for some reason.

3. Re: Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Mon March 23, 2009 01:17 PM

Originally posted by: Casey_B

Hello,
I understand the choice to have manual intervention when your application stop script fails.
(Meaning that if you really want PowerHA/HACMP to force the application down, you exit with a zero, and PowerHA/HACMP will do its best to continue on. Otherwise, you would like to give yourself a chance to handle an unexpected condition manually.)

It would be best to try to code for all conditions in your stop script, though.
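
As a rough illustration of coding for the "did not stop" condition, here is a minimal sketch of the core of a stop script: ask the process to stop, wait a bounded time, and then either kill it and report success or report failure for manual handling. The function names `wait_for_exit` and `stop_app`, and the idea of passing a PID, a timeout and a force flag, are assumptions made for this example; a real SAP or Oracle stop script would use the application's own shutdown commands instead of plain signals.

```shell
#!/bin/sh
# Wait up to $2 seconds for process $1 to exit; return 0 once it is gone,
# 1 if it is still there when the time runs out.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# stop_app PID TIMEOUT FORCE   (hypothetical interface)
# FORCE=yes: kill the process if the graceful stop times out.
# FORCE=no : leave it alone and return non-zero for manual intervention.
stop_app() {
    pid=$1 timeout=$2 force=$3
    kill -TERM "$pid" 2>/dev/null
    if wait_for_exit "$pid" "$timeout"; then
        return 0
    fi
    if [ "$force" = "yes" ]; then
        echo "stop_app: graceful stop timed out, killing $pid" >&2
        kill -KILL "$pid" 2>/dev/null
        wait_for_exit "$pid" "$timeout"
        return $?
    fi
    echo "stop_app: $pid still running, manual intervention needed" >&2
    return 1
}
```

The stop script would then end with something like `stop_app "$APP_PID" 300 no; exit $?` (where `APP_PID` is again a hypothetical variable), so that the exit code reflects which of the two choices described above was made.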

More information about your environment is needed to give you an answer:

What level of HACMP 5.4.1 are you running (which service pack)?
When do you see the behaviour: with a fallover, with a resource group move where the releasing node will still be active, or with a resource group move where the releasing node will not be running cluster services afterwards?

There is an APAR IZ08308 in 5.4.1 that standardizes the behaviour between types of resource group moves.
This may change the behaviour to what you want.

Otherwise, I would agree that commenting out fuser would not be a clean workaround. You are exposing yourself to your edit being changed whenever you apply service packs.

I will look further at this.

Also, there is a new PowerHA/HACMP forum starting.
There is not much information there currently, and the forum hasn't been listed on the main developerWorks forum page yet, but it is open for new topics, and I would like to duplicate the question/response on that forum once this is answered:

http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1611

Thanks,
Casey

4. Re: Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Wed March 25, 2009 04:46 AM

Originally posted by: Ayaz_Anjum

Thanks,

The reason this behaviour is undesirable is that there is a chance of data corruption in the case of databases. I have experienced this with our SAP application: the stop script failed to bring down the database and returned a non-zero code, and then HACMP failed over the filesystem, which resulted in an Oracle crash.

Please see the following:
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ10028
Apparently it is waiting for a DCR.

Coding for all scenarios in the application stop script is not easy, because there can be many reasons why, for example, Oracle is not shutting down. The best option would be to leave it to the DBAs to figure out how to stop the database if the script is failing to do so.

We are using SP4.

Thanks, Ayaz

5. Re: Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Wed March 25, 2009 09:27 AM

Originally posted by: Casey_B

Oracle should be pretty good about recovering from a forced exit.
Most enterprise level databases have transaction logging to be able to recover
the consistency of the database when killed.

From personal experience, I seem to remember that DB2 was pretty good at recovering after being killed.
(In my clusters, if you didn't stop normally, you were killed, and shared memory removed, etc.)

DB2 needed to be started with a flag to use those logs and recover.
Maybe your Oracle start scripts aren't set up with the right flags to recover the database?

Back to a workaround for your problem with the current design of HACMP.
I still think editing the HACMP scripts is the wrong thing to do.

Maybe, if your application stop script is going to fail, and fail in a way that you don't want any further processing to occur, then you could send a page, email, etc., print a big message to the logs (the big message is important, so that a co-worker doesn't forget what you had configured), and stop the script from completing.

Maybe something like this:

echo "ERROR ERROR, stopping script execution"
read

or maybe
echo "ERROR ERROR....House is falling down"
sleep 99999

In the first case, the script would wait for input on standard input that it will never get.
In the second, it would wait for a very long time.
Either way, the cluster would wait for the script and enter "config too long".
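
A possible variation on the same idea, sketched here under stated assumptions: instead of blocking forever, the script logs loudly and then holds until an operator creates a release file, after which it returns and the stop script can finish. The function name `hold_for_operator`, the release-file mechanism, the `app_stop` syslog tag and the poll interval are all hypothetical and are not part of HACMP. While the script is holding, the cluster would presumably still report "config too long" as described above.

```shell
#!/bin/sh
# hold_for_operator RELEASE_FILE [POLL_SECONDS]   (hypothetical interface)
# Print a loud message, write it to syslog, then block until someone
# creates RELEASE_FILE. The file is removed again before returning.
hold_for_operator() {
    release_file=$1
    interval=${2:-30}
    echo "ERROR ERROR: application stop failed, holding stop script." >&2
    echo "Create $release_file to let the script finish." >&2
    # logger sends the message to syslog; ignore errors if it is unavailable.
    logger -t app_stop "stop failed; waiting for $release_file" 2>/dev/null || true
    while [ ! -f "$release_file" ]; do
        sleep "$interval"
    done
    rm -f "$release_file"
}
```

For example, a stop script could call `hold_for_operator /tmp/app_stop.release` after a failed shutdown, and the DBA would run `touch /tmp/app_stop.release` once the database has been brought down by hand (the path is only an example).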

Why do I say "maybe" so many times? :)

Although I can understand choosing manual intervention, there are some real dangers in not continuing with the fallover.

Even if the other node were able to start the application and keep it running, it wouldn't, because the fallover never proceeds.
- This means possibly longer times to recovery.
The application has not been killed, but is not running well.
- With the application not killed, it could be accepting incoming connections while unable to write to the disk, and just losing data.

You have to evaluate anything I say with an understanding of your environment and your application.

Hope this helps
Casey

PS. Can I move some of this information into the PowerHA forum?

6. Re: Hacmp: unmount of filesystem even application stop script returncode != 0

Posted Wed March 25, 2009 03:37 PM

Originally posted by: Ayaz_Anjum

Sure Casey, move the thread to the HACMP forum.

Well, I would say that capturing all the scenarios of application failure in the script, along with the corresponding recovery actions, is not easy, especially when system administration and application administration are with different teams. And this requires a fair knowledge of the application as well.

Thanks

Ayaz
