WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #84: The Multi-Server Batch Configuration – Bad, Bad Messages

By David Follis posted Wed April 01, 2020 07:54 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through the RSS feed.

One more time: suppose we have a Liberty server configured as a batch dispatcher and another configured as an executor.  We use the REST interface to submit a job to the dispatcher, which puts a message on the queue for the executor to pick up.  Then... a bad thing happens.

Ok, it depends on the kind of bad thing.  Generally speaking, if the message gets picked up and the job starts to run but then something goes wrong, the job will be marked failed and the message is considered processed.

What’s that about the message being processed?  Ok, remember that for throttling to work the message is ‘being processed’ by the server the entire time the job is running.  For most things that can go wrong, the Job Repository gets updated with the job marked as failed, the batch code finishes processing the message, and all is well.

But.  There are some scenarios where bad things happen on the way to executing the job, before the Job Repository has been updated to show that the job was dispatched.  In those cases an exception gets thrown clear out of the batch code and the message is left unhandled.  What happens next depends on how the messaging engine handles these cases and possibly on how it is configured.
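To make the two kinds of failure concrete, here is a minimal sketch of a message listener.  It is a toy model and not Liberty's actual batch code: the class ListenerSketch, the "job:" message format, the status strings, and the map standing in for the Job Repository are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the two failure zones; every name here is hypothetical. */
public class ListenerSketch {

    /** Stand-in for the Job Repository: job name -> last recorded status. */
    final Map<String, String> jobRepository = new HashMap<>();

    /**
     * Returns normally when the message counts as processed.
     * Throws when the failure happens before the repository knows about the job.
     */
    void onMessage(String messageBody, Runnable job) {
        // Zone 1: nothing is recorded yet, so an exception here escapes
        // the listener and the message is left unhandled.
        if (messageBody == null || !messageBody.startsWith("job:")) {
            throw new IllegalArgumentException("unreadable job request");
        }
        String jobName = messageBody.substring("job:".length());

        jobRepository.put(jobName, "STARTED");

        // Zone 2: the message stays in flight for as long as the job runs.
        // A failure is recorded and the listener still returns normally.
        try {
            job.run();
            jobRepository.put(jobName, "COMPLETED");
        } catch (RuntimeException e) {
            jobRepository.put(jobName, "FAILED");
        }
    }

    public static void main(String[] args) {
        ListenerSketch listener = new ListenerSketch();
        listener.onMessage("job:nightly", () -> {
            throw new IllegalStateException("step blew up");
        });
        System.out.println(listener.jobRepository.get("nightly"));
    }
}
```

In this sketch a job that fails while running still ends with the listener returning normally, so the message is done.  Only a failure before the repository update lets the exception escape, and that is the case the rest of this post worries about.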

Some messaging engines will handle a failure to process a message by just re-delivering it.  That’s really awesome if whatever went wrong was some transient thing and the job gets to run on the second try.  But what if the problem is in the message itself?  What if there’s just something, somehow, wrong with it?

It may be possible to configure a backout threshold for the batch job queue in the messaging engine (such as MQ).  If it is possible, this tells the messaging engine how many times to attempt to re-deliver a message that could not be processed before giving up on it.  You can also specify a backout requeue queue where such bad messages get parked.

This configuration of the messaging engine will prevent an endless loop of the server trying and failing to process the same bad message over and over.  Instead it will try a few times (the backout threshold value) and then give up.  The message stays handy in the backout requeue queue so you or your support organization can look at it to figure out what the fuss was all about.
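The redelivery behavior described above can be sketched in a few lines of Java.  This is a toy in-memory model and not the MQ or Liberty API: the class BackoutSketch and its fields are invented for illustration, and the rule that a message is parked once its backout count reaches the threshold is an assumption about how a typical engine counts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of backout-threshold handling; every name here is hypothetical. */
public class BackoutSketch {

    /** A message plus the number of times its delivery has been backed out. */
    static final class Message {
        final String body;
        int backoutCount = 0;
        Message(String body) { this.body = body; }
    }

    final Deque<Message> jobQueue = new ArrayDeque<>();
    final List<Message> backoutQueue = new ArrayList<>();
    final int backoutThreshold;

    BackoutSketch(int backoutThreshold) {
        this.backoutThreshold = backoutThreshold;
    }

    void put(String body) {
        jobQueue.addLast(new Message(body));
    }

    /** Delivers until the job queue is empty; returns the number of delivery attempts. */
    int drain(Consumer<String> listener) {
        int attempts = 0;
        while (!jobQueue.isEmpty()) {
            Message m = jobQueue.removeFirst();
            attempts++;
            try {
                listener.accept(m.body);
            } catch (RuntimeException e) {
                m.backoutCount++;
                if (m.backoutCount >= backoutThreshold) {
                    backoutQueue.add(m);   // give up and park the bad message
                } else {
                    jobQueue.addFirst(m);  // back out and re-deliver
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        BackoutSketch queue = new BackoutSketch(3);
        queue.put("good job request");
        queue.put("corrupt job request");
        int attempts = queue.drain(body -> {
            if (body.startsWith("corrupt")) {
                throw new IllegalStateException("cannot process message");
            }
        });
        System.out.println("attempts=" + attempts + " parked=" + queue.backoutQueue.size());
    }
}
```

With a threshold of 3, the good message is delivered once and the corrupt one three times before being parked, so main prints attempts=4 parked=1.  Without the threshold check the loop would never end, which is exactly the endless-redelivery problem.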