  • 1.  Application scripts

    Posted Mon March 16, 2009 10:22 AM

    Originally posted by: Casey_B


    PowerHA allows any script to be configured to start, stop and even monitor
    an application.

    This ability provides a lot of power, but can also cause a lot of problems.

    Here are some of the roadblocks that I have personally seen, and
    some hints on avoiding them.

    Please share any problem that you have seen, or tips that you have!

    I hope that this is helpful.

    Logging.

    • Logging is so basic that everyone does it to some degree. Here are some things to consider to make logs more useful.

    • Don't log only failures; make sure to log progress and times as well.
      • Take, for example, this made-up script:
    #!/bin/ksh
    # Program: App stop script
    load_database_environment
    open_database_connection
    stop_database_command
    if [ $? -ne 0 ]
    then
        print "ERROR! my database won't stop"
        exit 1
    fi
    If the stop script hangs, there is little in your log to tell you whether it was load_database_environment
    or open_database_connection that hung. (Or even stop_database_command!)

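    • If you use ksh, "set -x" is a good feature to read up on. It traces each command as it is executed.
      • Another useful ksh feature is the $SECONDS built-in variable, which holds the number of seconds since the shell started.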
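As a rough sketch of the progress logging described above, each step of the made-up stop script can be wrapped so that its start, end, return code and elapsed time all land in the log. The run_step and stop_application names are hypothetical, and the three step functions are empty stand-ins for the made-up commands, not real database calls.

```shell
# Stand-ins for the made-up commands in the example stop script.
load_database_environment() { :; }
open_database_connection()  { :; }
stop_database_command()     { :; }

# Run one step, logging when it starts, when it ends, its return code,
# and how many seconds it took according to $SECONDS.
run_step() {
    step_start=${SECONDS:-0}
    echo "$(date '+%Y-%m-%d %H:%M:%S') START $*"
    "$@"
    step_rc=$?
    step_elapsed=$(( ${SECONDS:-0} - step_start ))
    echo "$(date '+%Y-%m-%d %H:%M:%S') END   $* rc=$step_rc elapsed=${step_elapsed}s"
    return $step_rc
}

# Stop at the first step that fails, so the log shows exactly where.
stop_application() {
    run_step load_database_environment || return 1
    run_step open_database_connection  || return 1
    run_step stop_database_command     || return 1
}
```

If a step hangs, the last line in the log is a START with no matching END, which points at the command that hung.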

    • If you are looking at your application script logs, expect to be looking through a lot of logs.
      • Make sure that you can easily determine which log entries are important.
      • My personal favorite method is to prefix each trace line by setting PS4 in ksh.
      • I also prefix every log entry to show whether it is an error, a warning, or just informational.
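One possible way to set up both kinds of prefix is sketched below. The exact PS4 text and the log helper are assumptions chosen for illustration, not a PowerHA convention.

```shell
# Prefix every "set -x" trace line with elapsed seconds and the line number.
# Single quotes matter: PS4 is expanded each time a trace line is printed.
PS4='+ [${SECONDS}s line ${LINENO}] '
# set -x    # uncomment to turn tracing on

# Hypothetical helper: one log line per call, tagged with a severity prefix.
log() {
    severity=$1
    shift
    echo "$severity: $(date '+%H:%M:%S') $*"
}

log INFO  "stopping application"
log WARN  "stop is taking longer than expected"
log ERROR "application did not stop"
```

With prefixes like these, something as simple as grep '^ERROR' on the log file pulls out the important entries.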

    Assume nothing.

    • High availability is the art of performing the best possible action when the worst possible scenario occurs.

    • When you have a hardware failure, things may work differently than in normal testing.

    Specific examples and considerations:
    • Storing your libraries, or executables in an NFS mounted directory may be problematic, especially if the NFS mount is not controlled by PowerHA.
      • Consider the case where your application libraries and executables are stored in an NFS mounted directory.
        • For convenience, you added the directory to root's PATH, and LIBPATH. (Through /.profile, or maybe even /etc/environment)
    LIBPATH=/nfs_mounted:$LIBPATH
    PATH=/nfs_mounted:$PATH
    • Now assume that the network connection to the NFS directory fails. At this point, even "ls" may appear to hang!
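One way to make a script less sensitive to this, shown here only as a sketch, is to set the search paths explicitly at the top of the script instead of inheriting root's. The directories listed are an assumption about what lives on local disk; adjust them for your system.

```shell
# Use only local directories for command and library lookups, so that a
# hung NFS mount in root's PATH or LIBPATH cannot stall simple commands.
# The directories below are assumed to be on local disk.
PATH=/usr/bin:/bin:/usr/sbin:/sbin
LIBPATH=/usr/lib
export PATH LIBPATH
```

This does not help if the application itself lives on the failed mount, but it keeps basic commands such as "ls" and "ps" usable while the script works out what to do next.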

    • Have a secondary plan for stopping the application when the normal method fails.
      • What if all your application's executables are missing on the node?
      • Would you want PowerHA to wait until you could sort it out? Would you want to manually kill your processes?
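A secondary plan usually starts with knowing when to give up on the normal stop. The following sketch runs a command with a time limit. The run_with_timeout name and the 124 return value are choices made for this example, not anything PowerHA provides.

```shell
# Run a command, but give up after a number of seconds.
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    waited=0
    while kill -0 "$cmd_pid" 2>/dev/null
    do
        if [ "$waited" -ge "$limit" ]
        then
            # Time is up: kill the command and report the timeout.
            kill -9 "$cmd_pid" 2>/dev/null
            wait "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished by itself: pass its exit status back.
    wait "$cmd_pid"
}
```

A stop script could call something like "run_with_timeout 300 /apps/bin/stop_app" and move on to killing the application's processes if the result is non-zero. The 300 second limit is only a placeholder.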
    • Scripts may not behave the same way when automated as they do on the command line.
      • Maybe the "ops" user id is used by PowerHA to stop applications.
      • Maybe it is also used by the application administrators for interactive logins.
      • The application administrators want to see the following prompt when they log in:

    $ su - ops
    WARNING: You are on the production machine, please hit enter if you want to continue!!

    Now imagine the following in the application stop script:
    #!/bin/ksh
    # Program: App stop script 2
    su - ops -c /apps/bin/stop_app
    When PowerHA runs this script, nobody is there to hit enter, so the stop can hang at the prompt.

    • To expand on the previous example, suppose the operators worked out how to avoid the prompt in non-interactive mode.
    • Now one of the ops added the following to the ops .profile:
    if [ -e /var/apps/log/something ]
    then
    echo "WARN: We have to do something with something."
    echo "WARN: Or maybe we have new mail!!"
    fi

    Now consider the revised application stop script:

    #!/bin/ksh
    # Program: App stop script 3
    result=$(su - ops -c /apps/bin/stop_app)
    if [ -n "$result" ]
    then
        echo "ERROR: stop_app returned a message, must be an error!"
    fi

    This application stop script would work in testing, but would report a false error as soon as the ops user got some mail or the .profile printed anything to the screen.
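A less fragile approach, sketched here under the assumption that stop_app returns a meaningful exit status, is to judge success by that status and simply keep whatever was printed in a log. The run_stop helper and the STOP_LOG location are hypothetical.

```shell
# Decide success from the exit status of the stop command, not from whether
# it printed anything. Output is appended to a log file for later review.
STOP_LOG=${STOP_LOG:-/tmp/app_stop.log}

run_stop() {
    "$@" >>"$STOP_LOG" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]
    then
        echo "ERROR: stop command exited with status $rc, see $STOP_LOG"
        return 1
    fi
    echo "INFO: stop command completed"
}

# Intended use in the stop script (not executed here):
#   run_stop su - ops -c /apps/bin/stop_app
```

With this, a "new mail" message from the .profile ends up in the log file instead of being mistaken for a failure.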

    • Leave nothing
      • Make sure that the application leaves nothing behind.
      • Even if your stop script performs well under normal test conditions, your application may fail to stop processes, remove shared memory segments, remove other IPC objects, and unload shared libraries.
    • Know what processes are used by the application, and make sure to kill any that are left after a normal stop.
    • You can check for shared memory segments and other IPC objects (message queues and semaphores) with "ipcs".
      • Any that appear to belong to your application and are no longer in use can be deleted with "ipcrm".
      • Unused shared libraries can be unloaded from memory by using "slibclean".
      • slibclean is a fairly safe command to run.
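As an illustration of the ipcs check, the sketch below picks out the shared memory segment IDs owned by one user. It assumes "ipcs -m" prints data lines in the form "m id key mode owner group". Verify that layout on your own system first. The shm_ids_for_owner name and the db2prod user are hypothetical.

```shell
# Print the IDs of shared memory segments owned by a given user.
# Reads "ipcs -m" output on standard input and assumes data lines look like
#   m <id> <key> <mode> <owner> <group>
shm_ids_for_owner() {
    awk -v owner="$1" '$1 == "m" && $5 == owner { print $2 }'
}

# Intended use (dry run: prints the ipcrm commands instead of running them):
#   ipcs -m | shm_ids_for_owner db2prod | while read id
#   do
#       echo ipcrm -m "$id"
#   done
```

Printing the commands first, rather than running them, gives you a chance to confirm that the segments really are unused before anything is removed.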

    • Be careful with kills
      • When you kill processes as mentioned above, make sure you never kill more than you need.
        • I wrote a stop script that killed everything running under the database user id.
    While I was working together with the database administrators to return a system to working order...
    I was logged in under my id, and they were working through logs under the database user id.
    They said to stop the database while they looked through the logs...
    Seconds later, I heard a "Heeeeyyy...I got kicked off of the system" :)

    • "grep -w" can help prevent wrong kills.
    For example:
    "ps -ef | grep db2" will return lines with db2, db2das, db2prod, db2test, etc.
    "ps -ef | grep -w db2" will return only lines with "db2" as a separate word.
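To combine the whole-word match with a check on the owning user, something like the following sketch could be used to build the list of PIDs to kill. The pids_for name is hypothetical, and it assumes the user is in column 1 and the PID in column 2 of "ps -ef" output.

```shell
# Print the PIDs of processes whose "ps -ef" line contains the given name as
# a whole word and whose owner (first column) is the given user.
# Reads "ps -ef" output on standard input.
pids_for() {
    user=$1
    name=$2
    grep -w "$name" | awk -v user="$user" '$1 == user { print $2 }'
}

# Intended use (prints the PIDs; review them before feeding them to kill):
#   ps -ef | pids_for db2prod db2
```

A login shell running under the same user id does not contain the process name as a separate word, so it is left out of the list.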
    What other common problems have you seen? What other tips do you have to share about writing application scripts in PowerHA?
    #PowerHA-(Formerly-known-as-HACMP)-Technical-Forum
    #PowerHAforAIX


  • 2.  Re: Application scripts

    Posted Mon April 27, 2009 11:10 AM

    Originally posted by: sfoster


    I've found that you should put only CRITICAL applications in the start script. If there is a chance that something could prevent the HA fallover, I use another method to start it (an ITM log monitor on the HA fallover, then a Take Action to run my TSM startup, restart cron, license managers, etc.).