
JSR-352 (Java Batch) Post #72: Step Events

By David Follis posted Wed January 08, 2020 07:41 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

We started sort of from the outside with Job Instance events and then moved inward to Job Execution events.  Now we’ll go inside the job to events generated at the Step level.

As each step begins, a message is published to the topic for started steps.  As before, the message has properties identifying the job instance and execution.  But since we are at the step level, there is also a step execution identifier included as a property.  This isn’t something you can know ahead of time, but once you have seen it on the step-started message you can use it to find the other messages related to that step.

The content of the message is the same JSON string returned by the REST interface when asking for information about the step (using the step execution id). 
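To make the correlation idea concrete, here is a minimal sketch of a monitor that groups step events by their step execution id.  It is plain Java and does not touch the messaging API at all.  The StepEvent class, its field names, and the short topic labels are hypothetical stand-ins for whatever your listener extracts from the real message properties.

```java
// A sketch only: groups step events by step execution id so that later
// messages (checkpoint, completed, ...) can be tied back to the started one.
final class StepEventCorrelator {

    // Hypothetical holder for the identifying properties of one message.
    // The real property names come from the server; these are illustrative.
    static final class StepEvent {
        final String topic;
        final long jobInstanceId;
        final long jobExecutionId;
        final long stepExecutionId;

        StepEvent(String topic, long jobInstanceId, long jobExecutionId, long stepExecutionId) {
            this.topic = topic;
            this.jobInstanceId = jobInstanceId;
            this.jobExecutionId = jobExecutionId;
            this.stepExecutionId = stepExecutionId;
        }
    }

    private final java.util.Map<Long, java.util.List<StepEvent>> byStep = new java.util.HashMap<>();

    // Record an event under its step execution id.
    void record(StepEvent event) {
        byStep.computeIfAbsent(event.stepExecutionId, k -> new java.util.ArrayList<>()).add(event);
    }

    // All events seen so far for one step execution (empty if none).
    java.util.List<StepEvent> eventsFor(long stepExecutionId) {
        return byStep.getOrDefault(stepExecutionId, java.util.Collections.emptyList());
    }
}
```

A listener would call record() for every step-level message it receives and eventsFor() when it wants the history of one particular step.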

As with the job itself, a message is published to an appropriate topic as the step ends.  Which topic depends on how the step ends.  There is a completed topic for steps that end normally.  This may be the final step in the job, or the flow might transition from here to another step.

If things didn’t go well, a message is published to the failed topic for steps.  A failed step will cause the job to end and might be a handy thing for a monitor to recognize and raise an alert about.

If the job was stopped, either because an operator stopped it or because the job internally stopped itself (remember that end-of-step flow transitions can include stopping the job), then a message is published to the step stopped topic.
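A monitor that subscribes to all three end-of-step topics needs to tell them apart.  Here is a rough sketch of that decision.  It assumes the topic name ends in a segment such as "completed", "failed", or "stopped"; the full topic strings used in the usage note are an assumption too, so check your server's documentation for the real ones.

```java
// A sketch only: classifies how a step ended from the topic a message
// arrived on, assuming the last '/'-separated segment names the outcome.
final class StepEndClassifier {

    enum Outcome { COMPLETED, FAILED, STOPPED, OTHER }

    static Outcome classify(String topic) {
        if (topic == null) {
            return Outcome.OTHER;
        }
        // Take everything after the final '/', or the whole string if there is none.
        String last = topic.substring(topic.lastIndexOf('/') + 1);
        switch (last) {
            case "completed": return Outcome.COMPLETED;
            case "failed":    return Outcome.FAILED;
            case "stopped":   return Outcome.STOPPED;
            default:          return Outcome.OTHER;
        }
    }

    // A failed step ends the job, so it is the one worth alerting on.
    static boolean shouldAlert(String topic) {
        return classify(topic) == Outcome.FAILED;
    }
}
```

With an assumed topic of "batch/jobs/execution/step/failed", shouldAlert() returns true; for the completed and stopped topics it returns false.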

The final step-related topic is checkpoint.  Every time a chunk step takes a checkpoint, a message is published to this topic.  Some chunk steps might have a fixed (or at least predictable) number of checkpoints.  Monitoring the checkpoint events would then give you a count of checkpoints against the known total and the ability to present a progress bar for the step in a monitor.
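The progress bar idea is simple enough to sketch in a few lines.  The class below is hypothetical and knows nothing about messaging: the monitor would call onCheckpoint() each time a checkpoint message arrives for the step.  The expected total is something you have to supply yourself, and the percentage is capped at 100 in case more checkpoints show up than you planned for.

```java
// A sketch only: turns a count of observed checkpoint events into a
// percentage and a text progress bar, given a known expected total.
final class CheckpointProgress {
    private final int expectedCheckpoints;
    private int seen;

    CheckpointProgress(int expectedCheckpoints) {
        if (expectedCheckpoints <= 0) {
            throw new IllegalArgumentException("expectedCheckpoints must be positive");
        }
        this.expectedCheckpoints = expectedCheckpoints;
    }

    // Call once per checkpoint message received for this step.
    void onCheckpoint() {
        seen++;
    }

    // Percentage complete, capped at 100 if the estimate was too low.
    int percent() {
        return (int) Math.min(100L, (100L * seen) / expectedCheckpoints);
    }

    // Renders something like "[##......] 25%" for the given bar width.
    String bar(int width) {
        int filled = (int) ((long) width * percent() / 100);
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < width; i++) {
            sb.append(i < filled ? '#' : '.');
        }
        return sb.append("] ").append(percent()).append('%').toString();
    }
}
```

For a step expected to take 4 checkpoints, one call to onCheckpoint() followed by bar(8) gives "[##......] 25%".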

However, this might not always be possible.  Some jobs have a varying number of checkpoints depending on how chunking is done (time-based rather than item-count-based checkpointing makes the total hard to predict).  Error handling within the job can also influence the checkpoint count.  If retry-rollback exceptions are defined and one is thrown, the step will roll back to the last checkpoint and then retry, checkpointing one record at a time, until the problem record is passed.  This process could make the number of checkpoints seen differ substantially from what was expected.  Still, it is a neat idea in cases where it will work.

Next time we move further inside the step to partitioned steps.