WebSphere Health Policies are a powerful, proactive monitoring solution for maintaining the health of your server environment. In this post, we'll take you through the steps of creating and applying a custom health policy. If you're new to Health Management in WebSphere, check out this prior post for an overview. If you're interested in basic health policies, check out this prior post.
The first step in policy creation is navigating to the Health Policies page (Operational Policies -> Health Policies). To create a new policy, click the "New..." button.
On the next page, specify your policy name and description, and choose "Custom health condition". Click "Next".
The next page is where a custom health policy is built. The top half of the page is where custom conditions are constructed. Similar to predefined conditions, the lower half of the page has options for how health actions are taken when a condition is breached.
To begin creating a custom condition, click on "Subexpression Builder". This will cause the subexpression builder menu to pop up.
Let's break down the Subexpression Builder menu before we start constructing expressions.
Logical Operator: Used to join expressions together; options are "AND" and "OR". It is only used if your subexpression consists of more than one expression. "AND" causes the condition to trigger only if both expressions are met; "OR" triggers if either expression is met. The Append button uses this option to add a new subexpression onto an existing one.
Select Operand: These are essentially the class of statistics available for custom health conditions. These are:
- PMI (Performance Monitoring Infrastructure) statistics
- ODR (On Demand Router) statistics (Available if topology includes an ODR)
- MBean-based conditions/statistics
- URL return code metrics
Most operands can be selected as "From server start" or "Last reported interval". "From server start" uses an average of the values reported since the server started; "Last reported interval" uses an average of the values reported in the last interval. The interval is the length of the health controller cycle.
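To make the difference between the two selections concrete, here is a minimal Python sketch. It is not WebSphere code: the function names, the sample readings, and the 120-second interval are all made up for illustration, and the real health controller may aggregate values differently.

```python
from statistics import mean

def from_server_start(samples):
    """Average of every value reported since the server started."""
    return mean([value for _, value in samples])

def last_reported_interval(samples, now, interval_seconds):
    """Average of the values reported within the most recent interval."""
    recent = [value for ts, value in samples if now - ts < interval_seconds]
    return mean(recent)

# Hypothetical readings: (seconds since server start, CPU utilization in %)
samples = [(60, 20.0), (120, 30.0), (180, 90.0), (240, 94.0)]

print(from_server_start(samples))                  # 58.5
print(last_reported_interval(samples, 240, 120))   # 92.0
```

With these made-up numbers, a threshold of 80 would be breached under "Last reported interval" but not under "From server start", which is why the choice matters for conditions meant to catch recent spikes.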
MBean operands can be selected for either a Long or String return type.
Depending on the operand chosen, a wide variety of statistics are available (these are the next one or two fields listed after "Select Operand"). For a detailed breakdown of metrics, visit the following Knowledge Center page: https://www.ibm.com/docs/was-nd/9.0.5?topic=policies-custom-health-condition-subexpression-builder
Operator: The operator connects the condition statistic chosen for "Select Operand" with a value. For example, if you want the policy to trigger when CPU exceeds 85%, you'd want to choose the Greater Than operator.
- Equals: The equality operator expresses a case-sensitive match.
- Not Equals: The not equal operator expresses that the operand value is not equal to the value you enter.
- Greater Than: The greater-than operator is for use with numbers.
- Greater Than or Equals: The greater-than or equal to operator is for use with numbers.
- Less Than: The less-than operator is for use with numbers.
- Less Than or Equals: The less-than or equal to operator is for use with numbers.
- Between: The value must be between a Lower bound and Upper bound that you specify.
- In: The value must be in a list of values. You can type in values and add them to a list.
Value: The value represents the threshold for your policy. Using the example of a policy that triggers when CPU exceeds 85%, the value for this policy would be 85.
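As a rough mental model of how each operator compares the reported operand with your value, consider the Python sketch below. The function `condition_met` is hypothetical and not part of WebSphere; in particular, treating the Between bounds as inclusive is an assumption, since the builder description does not say.

```python
def condition_met(operator, reported, value):
    """Return True if the reported operand satisfies the operator and value.

    For "Between", value is a (lower, upper) pair; for "In", value is a list.
    """
    checks = {
        "Equals": lambda: reported == value,            # case-sensitive for strings
        "Not Equals": lambda: reported != value,
        "Greater Than": lambda: reported > value,
        "Greater Than or Equals": lambda: reported >= value,
        "Less Than": lambda: reported < value,
        "Less Than or Equals": lambda: reported <= value,
        "Between": lambda: value[0] <= reported <= value[1],  # inclusive bounds assumed
        "In": lambda: reported in value,
    }
    return checks[operator]()

# The CPU example from the text: trigger when CPU exceeds 85%.
print(condition_met("Greater Than", 90, 85))   # True
print(condition_met("Greater Than", 85, 85))   # False
```

Note that "Greater Than" does not trigger at exactly 85; choose "Greater Than or Equals" if the threshold itself should count as a breach.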
Generate Subexpression: This button will take the values chosen for operand, operator, and value to construct a custom condition subexpression. This subexpression will appear in the Subexpression box below "Generate Subexpression".
Append: This button will take the subexpression from the Subexpression box and will paste it into the "Run Reaction Plan When" box on the original page (outside of the Subexpression builder menu). If any subexpressions are already present in "Run Reaction Plan When", Append will use the value from Logical Operator to join the new subexpression to the existing one.
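The behavior of these two buttons can be pictured as simple string building. The sketch below is only an illustration in Python; `generate_subexpression` and `append` are hypothetical helper names, not WebSphere functions, and the operand string is the one the builder produces for the CPU metric in the example in this post.

```python
def generate_subexpression(operand, operator_symbol, value):
    """Combine operand, operator, and value into one subexpression string."""
    return f"{operand} {operator_symbol} {value}"

def append(run_reaction_plan_when, subexpression, logical_operator="and"):
    """Add a subexpression to the condition box.

    The logical operator is only used when the box already holds something.
    """
    if not run_reaction_plan_when:
        return subexpression
    return f"{run_reaction_plan_when} {logical_operator} {subexpression}"

condition = ""  # the "Run Reaction Plan When" box starts out empty
cpu = generate_subexpression(
    "PMIMetric_FromLastInterval$xdProcessModule$recentCPUUtilization", ">", "80L"
)
condition = append(condition, cpu)
print(condition)
# PMIMetric_FromLastInterval$xdProcessModule$recentCPUUtilization > 80L
```

Appending a second subexpression to a non-empty box would join the two with the selected logical operator.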
Example: Creating a policy that triggers when CPU utilization is over 80% and more than 2 threads are hung
Let's walk through creating a subexpression for a custom condition: we want a policy that will trigger when CPU usage is over 80% and more than 2 threads are hung.
Let's start by creating the first subexpression: the condition where CPU is over 80%.
In the Subexpression Builder, select "PMI Metric". For PMI Module Name, choose "Process Module", and then choose "Process CPU utilization: Last interval" for metric name. Select the "Greater than" operator, and enter 80 in the Value box.
Click "Generate Subexpression", which should cause
PMIMetric_FromLastInterval$xdProcessModule$recentCPUUtilization > 80L
to appear in the subexpression box. Click "Append" to add this to our custom condition. Your screen should look similar to the one below.
Note: The subexpression builder validates the rule when applied, and alerts you to mismatched parentheses and unsupported logic operators.
Let's add our second condition, where more than 2 threads are hung.
In the Subexpression Builder, select "PMI Metric". For PMI Module Name, choose "Thread Pool Module", and then choose "Concurrently Hung Threads" for metric name. Select the "Greater than" operator, and enter 2 in the Value box.
Click "Generate Subexpression", which should cause
PMIMetric_FromLastInterval$threadPoolModule$concurrentlyHungThreads > 2L
to appear in the subexpression box.
Ensure that "and" is the logical operator selected, and click "Append" to add this to our custom condition.
Close out of the Subexpression Builder; your screen should look similar to the one below, with this subexpression:
PMIMetric_FromLastInterval$xdProcessModule$recentCPUUtilization > 80L and PMIMetric_FromLastInterval$threadPoolModule$concurrentlyHungThreads > 2L
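To check your understanding of when this condition is breached, the following Python sketch evaluates the expression against sample readings. It is a toy evaluator written for this post, not how WebSphere parses conditions: it only understands "operand > numberL" clauses joined by "and", and the readings are invented.

```python
import re

CPU = "PMIMetric_FromLastInterval$xdProcessModule$recentCPUUtilization"
HUNG = "PMIMetric_FromLastInterval$threadPoolModule$concurrentlyHungThreads"
EXPRESSION = f"{CPU} > 80L and {HUNG} > 2L"

def breached(expression, readings):
    """Return True only if every 'operand > numberL' clause holds."""
    for clause in expression.split(" and "):
        match = re.fullmatch(r"(\S+) > (\d+)L", clause.strip())
        operand, threshold = match.group(1), int(match.group(2))
        if not readings[operand] > threshold:
            return False
    return True

print(breached(EXPRESSION, {CPU: 91, HUNG: 3}))   # True: both clauses hold
print(breached(EXPRESSION, {CPU: 91, HUNG: 1}))   # False: only CPU is over
print(breached(EXPRESSION, {CPU: 70, HUNG: 5}))   # False: only hung threads are over
```

Because the clauses are joined with "and", high CPU alone or hung threads alone will not run the reaction plan; both must exceed their thresholds in the same evaluation.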
You should now know how to create and configure custom health policies to help monitor your WebSphere server environment. As a next step, you can choose the reaction mode, health actions, and policy targets in response to a policy breach, enabling self-healing capability in your environment. These next steps are the same for a custom policy and a basic policy; see this prior blog post for more information.
To learn more about Health Management in general, visit the following Knowledge Center page: