Automation with Power

Power Business Continuity and Automation

Connect, learn, and share your experiences using the business continuity and automation technologies and practices designed to ensure uninterrupted operations and rapid recovery for workloads running on IBM Power systems.


Application scripts


1. Application scripts

Archive User
Posted 03/16/09 10:22 AM

Originally posted by: Casey_B

PowerHA allows any script to be configured to start, stop and even monitor
an application.

This ability provides a lot of power, but can also cause a lot of problems.

Here are some of the roadblocks that I have personally seen, and
some hints on avoiding them.

Please share any problem that you have seen, or tips that you have!

I hope that this is helpful.

Logging.

Logging is so basic that everyone does it to some degree. Here are some things to consider to make logs more useful.

Don't log only failures; make sure to log progress and times as well.

Take, for example, this made-up script:

#!/bin/ksh
# Program: App stop script
load_database_environment
open_database_connection
stop_database_command
# Report a failure if the stop command returned a non-zero status
if [ $? -ne 0 ]
then
    print "ERROR! my database won't stop"
    exit 1
fi
If the stop script hangs, then there is little in your log to determine whether it is the load_database_environment command
or the open_database_connection command that hung. (Or even the stop_database_command!)

If you use ksh, "set -x" is a good feature to read up on: it traces each command as it runs.

Another useful ksh feature is the built-in $SECONDS variable, which holds the number of seconds since the shell started.

If you are looking at your application script logs, expect that you will be looking through a lot of logs.

Make sure that you can easily determine what your important log entries are.

My personal favorite method is to prefix each trace line by setting PS4 in ksh.

I also personally use a prefix for every log entry to show whether it is an error, a warning, or just informational.
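
The logging hints above could be combined along these lines. This is only a sketch: log_msg and the TRACE prefix are names made up for illustration, and the three database commands from the example script are stubbed out so the sketch runs on its own.

```shell
#!/bin/ksh
# Sketch only: stubs stand in for the made-up commands in the example script.
load_database_environment() { :; }
open_database_connection() { :; }
stop_database_command() { :; }

# log_msg LEVEL TEXT... (hypothetical helper): severity prefix first,
# then a timestamp and the seconds elapsed since the script started.
log_msg() {
    level=$1
    shift
    printf '%s: %s [+%ss] %s\n' "$level" "$(date '+%Y-%m-%d %H:%M:%S')" "$SECONDS" "$*"
}

# Every "set -x" trace line starts with PS4, so traces are easy to filter.
PS4='TRACE: +${SECONDS}s: '
set -x

log_msg INFO "loading database environment"
load_database_environment
log_msg INFO "opening database connection"
open_database_connection
log_msg INFO "stopping database"
if stop_database_command
then
    log_msg INFO "database stopped"
else
    log_msg ERROR "database did not stop"
fi
set +x
```

If the stop hangs, the last INFO line in the log shows which step was running, and the elapsed seconds show how long each earlier step took.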

Assume nothing.

High availability is the art of performing the best possible action in the worst possible scenario.

When you have a hardware failure, things may work differently than in normal testing.

Specific examples and considerations:

Storing your libraries, or executables in an NFS mounted directory may be problematic, especially if the NFS mount is not controlled by PowerHA.

Consider the case where your application libraries and executables are stored in an NFS mounted directory.

For convenience, you added the directory to root's PATH and LIBPATH (through /.profile, or maybe even /etc/environment):

LIBPATH=/nfs_mounted:$LIBPATH
PATH=/nfs_mounted:$PATH

Now assume that the network connection to the NFS directory fails. At this point, even "ls" may appear to hang, because the shell searches /nfs_mounted first when it looks up the command.
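
One possible precaution, sketched here as an assumption rather than anything PowerHA provides: have the application scripts drop the NFS directory from their own search path and call executables by full path. remove_path_entry is a hypothetical helper name.

```shell
# remove_path_entry ENTRY LIST (hypothetical helper)
# Prints the colon-separated LIST with every occurrence of ENTRY removed.
remove_path_entry() {
    (
        # Subshell: the IFS and globbing changes do not leak to the caller.
        IFS=:
        set -f
        result=
        for dir in $2
        do
            [ "$dir" = "$1" ] && continue
            result=${result:+$result:}$dir
        done
        printf '%s\n' "$result"
    )
}

# Demonstration on a literal string, so the real PATH is left untouched.
remove_path_entry /nfs_mounted "/nfs_mounted:/usr/bin:/usr/sbin"
# prints /usr/bin:/usr/sbin
```

Inside a start or stop script this might be used as PATH=$(remove_path_entry /nfs_mounted "$PATH"), and likewise for LIBPATH, so that a dead mount cannot stall command lookup in the script itself.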

Have a secondary plan for stopping the application when the normal method fails.

What if all your application's executables are missing on the node?

Would you want PowerHA to wait until you could sort it out? Would you want to manually kill your processes?
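
A secondary plan could look something like the sketch below: ask the process to stop, wait a bounded time, and only then force it. wait_for_exit and stop_or_kill are hypothetical helper names, and how you find the PIDs of your application is left to you.

```shell
# wait_for_exit PID SECS (hypothetical helper)
# Returns 0 as soon as PID is gone, 1 if it is still alive after SECS seconds.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null
    do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# stop_or_kill PID SECS (hypothetical helper)
# Sends TERM first; sends KILL only if the process outlives the wait.
stop_or_kill() {
    kill -TERM "$1" 2>/dev/null
    wait_for_exit "$1" "$2" && return 0
    echo "WARN: pid $1 still running $2 seconds after TERM, sending KILL"
    kill -KILL "$1" 2>/dev/null
    wait_for_exit "$1" "$2"
}
```

A stop script might first run its normal stop command and then call stop_or_kill for each application process that is still around, so the script finishes on its own even when the application's executables are missing.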

Scripts may not behave the same way when automated as they do on the command line.

Maybe the "ops" user id is used by PowerHA to stop applications.

Maybe it is also used by the application administrators for interactive login.

The application administrators want to see the following prompt when they log in:

$ su - ops
WARNING: You are on the production machine, please hit enter if you want to continue!!

Now imagine the following in the application stop script:

#!/bin/ksh
# Program: App stop script 2
su - ops -c /apps/bin/stop_app

When PowerHA runs this script, there is nobody to hit enter, so the stop script may hang at the prompt.

To expand on the previous example, the operators worked out how to avoid the prompt in non-interactive mode.

Now one of the ops added the following into the ops .profile:

if [ -e /var/apps/log/something ]
then
echo "WARN: We have to do something with something."
echo "WARN: Or maybe we have new mail!!"
fi

Now consider the revised application stop script:

#!/bin/ksh
# Program: App stop script 3
# Flawed on purpose: treats any output at all as a failure
result=$(su - ops -c /apps/bin/stop_app)
if [ -n "$result" ]
then
    echo "ERROR: stop_app returned a message, must be an error!"
fi

This application stop script would work in testing, but fail once the ops user got some mail, or the .profile printed anything to the screen.
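
One way around this, sketched under assumptions: judge the stop by its exit status and treat whatever was printed as log material only. run_stop is a hypothetical helper name, and the sketch assumes the stop command returns a non-zero status when it fails.

```shell
# run_stop COMMAND [ARGS...] (hypothetical helper)
# Success or failure comes from the exit status; any output is only logged.
run_stop() {
    output=$("$@" 2>&1)
    status=$?
    [ -n "$output" ] && echo "INFO: stop command said: $output"
    if [ "$status" -ne 0 ]
    then
        echo "ERROR: stop command exited with status $status"
        return 1
    fi
    return 0
}
```

In the stop script above this might be called as run_stop su - ops -c /apps/bin/stop_app, so that a message about new mail from the .profile ends up in the log instead of being mistaken for a failure.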

Leave nothing

Make sure that the application leaves nothing behind.

Even if your stop script performs well under normal test conditions, your application may fail to stop processes, remove shared memory segments, remove ipc sockets, and unload shared libraries.

Know what processes are used by the application, and make sure to kill any of them left after a normal stop.

You can check for leftover shared memory segments, semaphores, and message queues with "ipcs".

Any that appear to belong to your application and are no longer in use can be deleted with "ipcrm".

Unused shared libraries can be unloaded from memory by using "slibclean".

slibclean is a fairly safe command to run.

Be careful with kills

When you kill processes as mentioned above, make sure you never kill more than you need.

I wrote a stop script that killed everything running under the database user id.

While I was working together with the database administrators to return a system to working order...
I was logged in under my id, and they were working through logs under the database user id.
They said to stop the database while they looked through the logs...
Seconds later, I heard a "Heeeeyyy...I got kicked off of the system" :)

grep -w can help prevent wrong kills.

For example:
"ps -ef | grep db2" will return lines with db2, db2das, db2prod, db2test, etc.
"ps -ef | grep -w db2" will only return lines with "db2" as a separate word.
What other common problems have you seen? What other tips do you have to share about writing application scripts in PowerHA?
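
To try the grep -w tip without touching a live system, here is a small sketch. The process lines are made up and only stand in for "ps -ef" output; the awk line is an alternative I am adding, not part of the original tip.

```shell
# Made-up lines standing in for "ps -ef" output: user, pid, command.
sample_ps() {
    printf '%s\n' \
        'db2      101  app_server' \
        'db2das   102  app_server' \
        'db2prod  103  app_server' \
        'db2test  104  app_server'
}

sample_ps | grep -c db2      # prints 4: every line contains "db2"
sample_ps | grep -cw db2     # prints 1: only the line with "db2" as a whole word

# Stricter alternative: compare the user column exactly and print the pid.
sample_ps | awk '$1 == "db2" { print $2 }'     # prints 101
```

Note that grep -w treats anything other than a letter, digit, or underscore as a word boundary, so a line containing /home/db2/ or db2.log would still match. Comparing the user column exactly is narrower when you are selecting processes to kill.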
2. Re: Application scripts

Archive User
Posted 04/27/09 11:10 AM

Originally posted by: sfoster

I've found that you should only put CRITICAL applications in the start script. If there is a chance that something could prevent the HA fallover, I start it another way (an ITM log monitor watches for the HA fallover, then a Take Action runs my TSM startup, restarts cron, license managers, etc.).
