WebSphere Application Server Network Deployment Health Policies are a powerful tool for maintaining a stable environment. They are part of the Intelligent Management suite of capabilities. They make sporadic issues much easier to debug and let you monitor runtime conditions to keep the environment running well. During periods of frequent failures, you can also use them temporarily to take automatic corrective action while you work on a more permanent solution.
You can enable these capabilities easily by creating Health Policies, but there are a few things to keep in mind when doing so. Health Policies are monitored by the "Health Controller", a process that runs on a cycle and checks each health policy for violations after every cycle. The Health Controller has three settings that can greatly affect how effective health policies are in your environment.
- Control Cycle Length: This setting defaults to 5 minutes, which is sufficient for most use cases. However, your environment might change rapidly. For example, CPU usage might climb from 20% to 100% within a minute. In that case, the health policy is unlikely to react in time, and you can reduce the control cycle length so that it reacts to violations faster. On the other hand, a control cycle that is too short can cause performance issues because the controller is constantly checking for updates. It also makes health policies more volatile. Because they react to shorter periods of time, they are more susceptible to "blips" that would not normally trigger the policy.
- Restart Timeout: This setting can be adjusted to accommodate long server start times. The default is 5 minutes, which is usually enough. If your servers take longer than 5 minutes to start, increase it so that a restart has enough time to complete.
- Maximum Consecutive Restarts: This setting specifies the maximum number of times the controller tries to revive an application server once a restart is performed. The default is 3. The limit prevents the controller from repeatedly performing ineffective actions and slowing the environment down further. It can cause confusion if you are not aware of it, because it can disable the actions of a health policy that you might expect to trigger later. You can adjust this setting if you have a frequently triggered policy with problematic server restarts. In general, though, we recommend against using health policies to solve long-term systemic issues.
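To make the trade-offs above more concrete, here is a deliberately simplified Python model. It is not how the Health Controller is implemented. It assumes that the controller averages one-minute CPU samples over each control cycle and that a healthy server resets the consecutive-restart count. The names `cycle_averages`, `violating_cycles` and `RestartLimiter` are hypothetical.

```python
# Hypothetical, simplified model of a cycle-based health controller.
# Assumptions (not WebSphere internals): the controller averages one-minute
# CPU samples over each control cycle, and a healthy server resets the
# consecutive-restart count.


def cycle_averages(samples, cycle_len):
    """Average per-minute samples over consecutive control cycles."""
    averages = []
    for start in range(0, len(samples), cycle_len):
        window = samples[start:start + cycle_len]
        averages.append(sum(window) / len(window))
    return averages


def violating_cycles(samples, cycle_len, threshold):
    """Return the indexes of the cycles whose average exceeds the threshold."""
    return [i for i, avg in enumerate(cycle_averages(samples, cycle_len))
            if avg > threshold]


class RestartLimiter:
    """Hypothetical model of the Maximum Consecutive Restarts setting."""

    def __init__(self, max_consecutive=3):
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def request_restart(self):
        """Return True if another restart is allowed, False once the cap is hit."""
        if self.consecutive >= self.max_consecutive:
            return False
        self.consecutive += 1
        return True

    def mark_healthy(self):
        """Assumed behavior: a healthy server clears the consecutive count."""
        self.consecutive = 0


# Ten minutes of CPU usage (%): a one-minute blip, then a sustained overload.
cpu = [20, 20, 100, 20, 20, 95, 95, 95, 95, 95]

print(cycle_averages(cpu, 5))        # [36.0, 95.0]
print(violating_cycles(cpu, 5, 90))  # [1]: only the sustained overload
print(violating_cycles(cpu, 1, 90))  # [2, 5, 6, 7, 8, 9]: the blip counts too

limiter = RestartLimiter(max_consecutive=3)
print([limiter.request_restart() for _ in range(5)])  # [True, True, True, False, False]
```

In this model, a 5-minute cycle averages the blip away. The sustained overload is only flagged at the end of the second cycle. A 1-minute cycle flags the overload as soon as its first minute ends, but it also flags the harmless blip. The limiter simply refuses further restarts once the cap is reached, which is why a policy can appear to stop acting.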
You can find all of these settings in the Admin Console of WebSphere Application Server Deployment Manager. Navigate to Operational policies > Autonomic Managers > Health Controller to adjust them.
Another thing to note is that health policies frequently run against systems that are having performance issues. This can make a health policy behave badly. If a policy has triggered, there is probably an underlying problem, and that same problem could prevent the policy's actions from working. We occasionally see this with the "Restart Action" that is commonly used in health policies. By default, the restart action requests a "nice" (graceful) shutdown of the JVM that is in violation. This can stall indefinitely if the JVM is having performance issues. A workaround is to bypass the default restart behavior with a Custom Health Action that performs a forced shutdown of the JVM. You can read more about custom health actions here: https://www.ibm.com/docs/en/was-nd/8.5.5?topic=policies-creating-health-policy-custom-actions
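The "nice" versus forced distinction is not specific to WebSphere. As a generic illustration only, here is a sketch in plain Python on a POSIX system. It does not use the WebSphere API. It sends a polite termination request, waits for a bounded grace period, and then forces the process down. A child process that ignores SIGTERM stands in for an unresponsive JVM. The names `stop_process` and `demo` are hypothetical.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds):
    """Request a graceful stop; force it if the process has not exited in time.

    Returns "graceful" if the process exited after the polite request, or
    "forced" if it had to be killed after the grace period.
    """
    proc.terminate()  # polite request: SIGTERM on POSIX, which a process may ignore
    try:
        proc.wait(timeout=grace_seconds)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop: SIGKILL on POSIX, which cannot be ignored
        proc.wait()
        return "forced"


# A stand-in for an unresponsive JVM: it ignores SIGTERM and keeps sleeping.
HUNG_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

# A healthy process that exits on SIGTERM (default signal handling).
HEALTHY_CHILD = (
    "import time\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def demo(child_code, grace_seconds):
    """Start a child process, wait until it is ready, then try to stop it."""
    proc = subprocess.Popen([sys.executable, "-c", child_code],
                            stdout=subprocess.PIPE, text=True)
    with proc:  # closes the pipe and reaps the process on exit
        proc.stdout.readline()  # block until the child has printed "ready"
        return stop_process(proc, grace_seconds)


if __name__ == "__main__":
    print(demo(HEALTHY_CHILD, grace_seconds=5))  # graceful
    print(demo(HUNG_CHILD, grace_seconds=1))     # forced
```

A timeout-then-force fallback like this is only one design. The custom action in this post takes a simpler route and always requests an immediate stop.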
The following example shows how I created a "Forced Restart" custom health action, which you may find helpful. Here are the steps I took to get it working:
- I created the folder "customScripts" in the WAS_HOME directory on the system.
- I created the file "healthScript.sh" inside the "customScripts" folder, gave it execute permissions, and set its contents to the following:
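#!/bin/sh
# userID, password, server, node and cell are supplied when the health controller runs this custom action.
# Print the command with the password masked, then run wsadmin to restart the server.
echo "running: /opt/Moonstone/WAS/bin/wsadmin.sh -lang jython -user $userID -password ***** -f wsadminScript.py $server $node $cell"
/opt/Moonstone/WAS/bin/wsadmin.sh -lang jython -user "$userID" -password "$password" -f wsadminScript.py "$server" "$node" "$cell"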
Here, "/opt/Moonstone/WAS" is my WAS_HOME directory.
- I created the file "wsadminScript.py" inside the "customScripts" folder, gave it execute permissions, and set its contents to the following:
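# wsadmin passes only the script arguments in sys.argv:
# sys.argv[0] = server, sys.argv[1] = node, sys.argv[2] = cell
import sys
print("Running wsadminScript.py")
# 'immediate' stops the server without waiting for a normal graceful shutdown
print AdminControl.stopServer(sys.argv[0], sys.argv[1], 'immediate')
print("Stop Completed")
print AdminControl.startServer(sys.argv[0], sys.argv[1])
print("Start Completed")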
- In the admin console, I went to the "Operational policies > Custom Action" page and clicked the "New Action" button. I then selected "non-java action" and clicked Next.
- I filled out the properties as follows and then saved the custom action:
Name: customHealthAction
Executable: /opt/Moonstone/WAS/customScripts/healthScript.sh
Name of a variable for referencing a user name in executable arguments: userID
The user name to be substituted for the user name variable at invocation time: user1
Name of a variable for referencing a password in executable arguments: password
The password to be substituted for the password variable at invocation time: password for the given user
Supported on Operating Systems of type: UNIX
Working directory: /opt/Moonstone/WAS/customScripts
- I created the health policy for the desired health condition and added the custom action "customHealthAction" to the policy. I then set the response to automatic.
After setting this up, I could see the activity of "customHealthAction" in the file "/opt/Moonstone/WAS/profiles/node1/logs/customHealthAction_native_stdout.log". It had the following contents after the health action triggered:
===============================================================
running: /opt/Moonstone/WAS/bin/wsadmin.sh -lang jython -user user1 -password ***** -f wsadminScript.py server1 node1 ndcell
WASX7209I: Connected to process "dmgr" on node dmgr using RMI connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[server1, node1, ndcell]"
Running wsadminScript.py
WASX7337I: Invoked stop for server "server1" on node "node1"; Waiting for stop completion.
WASX7264I: Stop completed for server "server1" on node "node1"
Stop Completed
WASX7262I: Start completed for server "server1" on node "node1"
Start Completed
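If you want to check this log automatically, for example from a monitoring job, you can look for the stop and start completion messages. Here is a small Python sketch that does this. It assumes the WASX7264I and WASX7262I message formats shown in the sample output. Other WebSphere versions may word them differently. The function names are hypothetical.

```python
import re

# Message formats taken from the sample log; other WAS versions may differ.
STOP_DONE = re.compile(r'WASX7264I: Stop completed for server "([^"]+)" on node "([^"]+)"')
START_DONE = re.compile(r'WASX7262I: Start completed for server "([^"]+)" on node "([^"]+)"')


def restarted_servers(log_text):
    """Return the (server, node) pairs with both a stop and a start completion message."""
    stopped = set(STOP_DONE.findall(log_text))
    started = set(START_DONE.findall(log_text))
    return stopped & started


def check_log_file(path):
    """Read a custom action stdout log and report which servers were restarted."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return restarted_servers(log_file.read())


SAMPLE = (
    'WASX7264I: Stop completed for server "server1" on node "node1"\n'
    'Stop Completed\n'
    'WASX7262I: Start completed for server "server1" on node "node1"\n'
)

print(restarted_servers(SAMPLE))  # {('server1', 'node1')}
```

If the log accumulates output from several runs, a match only shows that a restart completed at some point. It does not show that the most recent run succeeded.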
By tuning these settings and using the restart action workaround, you should be able to keep your environment stable. Health Policies are an excellent tool for debugging intermittent issues and reacting quickly to poor performance. Knowing these nuances lets you use them more confidently and to greater effect.
You can find more information about WebSphere Health Management here: https://www.ibm.com/docs/en/was-nd/9.0.5?topic=managers-configuring-health-management
This blog was created by Brad Mayo, developer on the WebSphere Intelligent Management Team