IBM Workload Automation & Workload Scheduler

Leon's WA Waypoints - monitor events

By Leon Odenbrett posted Thu March 27, 2025 07:26 AM

  

Leon’s WA Waypoints: Monitoring Workload Automation Events

Monitoring events in IBM Workload Automation (IWA) is a critical component of maintaining a stable and efficient scheduling environment. Understanding event errors and implementing proactive monitoring strategies can help prevent disruptions and ensure smooth operations. In this post, we'll explore the two primary types of event errors and best practices for monitoring and troubleshooting them.

Understanding Event Errors

When monitoring events, it’s essential to recognize two categories of errors:

  1. Event Failure: The event fails to trigger due to an issue in the event monitoring process.
  2. Unexpected Event Action Result: The event triggers successfully but encounters an issue executing its designated action.

Let’s look at an example where an event is set up to trigger when the file /tmp/leon.txt appears, launching the job stream CPU1#LEONSCHED.

Situation 1: Event Failure (Event Engine Stops Working)

In this case, the event monitoring process encounters an error, preventing the event from triggering. Specifically, if monman (the event monitoring process) crashes, no events will be processed until it is restarted. To resolve this:

  • Restart monman using the command:

conman startmon

  • Because monman is the process that does the monitoring, it cannot detect its own failure. Use an external monitoring tool such as HP OpenView (HPOV), or any other enterprise monitoring solution, to keep track of monman, JobManager (for dynamic agents), and netman/batchman (for fault-tolerant agents).
  • Additionally, implement an event to monitor your external monitoring tool (e.g., setting up an event to check HPOV status).
  • For debugging, examine the logs located in:

{TWSinstall location}/TWSDATA/stdlist/appserver/engineServer/logs

Specifically, review PlanEventMonitor.log* and messages.log for relevant errors.
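If no enterprise monitoring tool is in place yet, the external check described above can be approximated with a small script run from cron. The following is only a minimal sketch, assuming a Linux or UNIX host where the processes show up in ps under the names used in this post; the REQUIRED list, the function names, and the print-based alert are illustrative placeholders, not part of the product.

```python
import subprocess

# Process names taken from this post. Assumption: they appear under these
# names in ps on your host. Use "JobManager" instead on dynamic agents.
REQUIRED = ["monman", "netman", "batchman"]


def missing_processes(ps_output, required):
    """Return the required process names that do not appear in ps output."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return [name for name in required if name not in running]


def list_running():
    """Return the command names of all running processes, one per line."""
    # 'ps -eo comm=' prints only the command name column, without a header.
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout


def main():
    """Print a status line and return 1 if any required process is absent."""
    missing = missing_processes(list_running(), REQUIRED)
    if missing:
        # Placeholder alert: replace with a call to your monitoring tool.
        print("ALERT: not running: " + ", ".join(missing))
        return 1
    print("OK: all monitored processes are running")
    return 0
```

To use it as a script, end the file with raise SystemExit(main()) so that cron or a wrapper can act on the exit code. Keeping the comparison in missing_processes separate from the ps call makes the logic easy to test without a running agent.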

Situation 2: Unexpected Event Action Result

Here, the event triggers successfully, but its action fails. For example, when the event detects the file /tmp/leon.txt, it attempts to submit CPU1#LEONSCHED. If that job stream no longer exists, the submission fails and an error message is written to the logs.

Example error messages as they appear in the log (the exact message IDs depend on the action that failed):

[3/26/25, 14:36:47:040 CET] 0003ca5a com.ibm.tws.cli.events.command.CommandFactory                I AWSJCL554E Centralized agent update cannot be performed for the agent with operating system: LINUX_X86_64-X86_64 and version: 10.2.1.0 because there is no zip file available in the depot folder: /opt/WA/ws/TWS/depot/agent or the zip file is not readable.

[3/26/25, 14:36:47:044 CET] 0003ca5a com.ibm.tws.cli.exception.ExceptionHelper                    E AWSJCL008E The server has encountered an unexpected error communicating with the client.

[3/26/25, 14:36:48:261 CET] 00000089 com.ibm.tws.event.EventProcessorManager                      I AWSEVP001I The following event has been received: event type = "UPGRADE"; event provider = "GenericEventPlugin"; event scope = "on EU-HWS-LNX277_D".

To proactively monitor such errors:

  • Set up an event to watch for specific error messages like AWSJCL008E.
  • Configure an alert (e.g., email notification) to notify the appropriate support team.
  • Review messages.log in the same log directory mentioned earlier for detailed insights.
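Outside the event engine, the same check can be done by scanning messages.log for error message IDs. The sketch below assumes the ID pattern visible in the excerpt above (AWS, three letters, three digits, and a trailing E for errors); the function names are hypothetical, and the log path is whatever your installation uses.

```python
import re

# Pattern inferred from the sample log lines, e.g. AWSJCL008E.
# IDs ending in I (informational), such as AWSEVP001I, are not matched.
ERROR_ID = re.compile(r"\bAWS[A-Z]{3}\d{3}E\b")


def find_error_ids(lines, watch=None):
    """Return (line_number, message_id) pairs for error IDs found in lines.

    If watch is given, only the message IDs in that set are reported.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for message_id in ERROR_ID.findall(line):
            if watch is None or message_id in watch:
                hits.append((number, message_id))
    return hits


def scan_file(path, watch=None):
    """Scan a log file and return the matching error IDs with line numbers."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_ids(handle, watch)
```

Calling scan_file on your messages.log with watch={"AWSJCL008E"} returns only the lines that carry that ID, which you can then pass to whatever notification mechanism your support team uses.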

Can Events Monitor Themselves?

Partly. An event can watch for errors raised by other events, which also confirms that the event engine is still processing events. If the event engine itself crashes, however, no event can fire, so the engine cannot report its own failure. This is why external monitoring solutions are essential for detecting critical failures.

Conclusion

Proactive event monitoring in IBM Workload Automation helps keep operations running smoothly and minimizes disruptions. By distinguishing between event failures and unexpected action results, setting up external monitoring for key components, and configuring event-based alerts, administrators can maintain a resilient workload automation environment.

Stay tuned for more insights on best practices in workload automation! Let me know in the comments if you have specific monitoring challenges you’d like to discuss.

— Leon Odenbrett

