WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #34: Get Comfortable and Go Back In Your Mind with Retry-Rollback Processing

By David Follis posted Wed March 20, 2019 09:21 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.


Last time we talked about basic retry processing.  In that scenario we had a problem and simply went back and tried to process the same record again.  This time we’re going to go back farther.  Relax and watch the spinning pendant….back you go…farther and farther…wait..not that far…just to the last checkpoint!

(ok, so…song title…. I’m going with “Back in Time” by Huey Lewis, but I’m sure there are others)

Right, so retry-rollback processing undoes whatever we’ve done since the last checkpoint and then tries again. We’ll close the reader and writer and roll back the current chunk transaction. That will undo any transactional things we’ve done in this chunk. Of course, if you aren’t making transactional updates (maybe you’re just appending to the end of a file), you’ll need to manually go back and undo whatever you did, which might mean retry-rollback processing isn’t for your scenario.
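To make the "manually undo" idea a bit more concrete, here is a minimal sketch of a file writer that remembers how long the file was at the last checkpoint and chops the file back to that length when it is reopened. It is plain Java that only mirrors the open/writeItems/checkpointInfo/close shape of a batch writer. The class name TruncatingFileWriter and the choice of the file length as checkpoint data are my assumptions for illustration, not something the specification hands you.

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical writer: it copies the method shapes of a batch ItemWriter
// but does not depend on the batch API jar.
public class TruncatingFileWriter {
    private final Path file;
    private FileChannel channel;

    public TruncatingFileWriter(Path file) {
        this.file = file;
    }

    // checkpoint is the file length saved at the last checkpoint,
    // or null on a fresh start.
    public void open(Serializable checkpoint) throws IOException {
        long committedLength = (checkpoint == null) ? 0L : (Long) checkpoint;
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        // Throw away anything written after the last checkpoint.
        channel.truncate(committedLength);
        channel.position(committedLength);
    }

    public void writeItems(List<Object> items) throws IOException {
        for (Object item : items) {
            byte[] line = (item + "\n").getBytes(StandardCharsets.UTF_8);
            ByteBuffer buffer = ByteBuffer.wrap(line);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        }
    }

    // Saved by the caller at each checkpoint: how long the file is right now.
    public Serializable checkpointInfo() throws IOException {
        return Long.valueOf(channel.position());
    }

    public void close() throws IOException {
        if (channel != null) {
            channel.close();
        }
    }
}
```

Whether truncating is safe depends on nothing else appending to the same file, so treat this as one possible approach rather than a general answer.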

After the rollback, the open method for the reader and writer will be called again. Remember that these methods are passed the checkpoint data from the last checkpoint, which should enable them to reposition themselves and take up where they left off.
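As a rough illustration of that repositioning, here is a small reader over an in-memory list. It is a sketch, not a real batch artifact: the class ListReader is hypothetical and only mimics the open/readItem/checkpointInfo/close method shapes so it can run without the batch API on the classpath. The checkpoint data here is simply the index of the next record to read.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical reader that copies the method shapes of a batch ItemReader.
public class ListReader {
    private final List<String> records;
    private int next; // index of the next record to hand out

    public ListReader(List<String> records) {
        this.records = records;
    }

    // A null checkpoint means a fresh start; otherwise resume at the
    // index that was saved at the last checkpoint.
    public void open(Serializable checkpoint) {
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returns null when there is nothing left to read.
    public Object readItem() {
        return next < records.size() ? records.get(next++) : null;
    }

    public Serializable checkpointInfo() {
        return Integer.valueOf(next);
    }

    public void close() {
        // Nothing to release for an in-memory list.
    }
}
```

If the reader is closed after a failure and reopened with the saved index, the first readItem call hands back the first record after the checkpoint, which is exactly the "take up where they left off" behavior.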

Here comes the weird part. Chunk processing begins again with a call to the reader and the processor, but then we call the writer and immediately checkpoint. No matter how you have configured your job to checkpoint (item-based, time-based, or via an algorithm), on retry-rollback, checkpointing will occur with an item-count of one until we get back to where we failed.

Suppose you had a job with item-count=10 and failed processing item number 13.  There was a checkpoint at item 10 and the JSL says to retry-rollback this exception.  The chunk will roll back to the checkpoint at item 10 and then read/process/write/commit for item 11, then item 12, then item 13.  Once we get past the item that caused the retry-rollback before, then things return to normal and we’ll checkpoint again 10 items later (item 23). 
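The sequence in that example can be traced with a toy simulation. This is not how any batch runtime is implemented; it is a simplified model that assumes a single failure which succeeds on retry, item-based checkpointing, and a final checkpoint when the data runs out. The class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of retry-rollback checkpointing (hypothetical names).
public class RetryRollbackTrace {

    // Returns the item numbers at which a checkpoint is taken when
    // failingItem fails once and is then retried with rollback.
    static List<Integer> checkpoints(int totalItems, int itemCount, int failingItem) {
        List<Integer> taken = new ArrayList<>();
        int lastCheckpoint = 0;
        int sinceCheckpoint = 0;
        int chunkSize = itemCount;
        boolean failedOnce = false;
        int item = 0;
        while (item < totalItems) {
            item++;
            if (item == failingItem && !failedOnce) {
                // Roll back to the last checkpoint and go one item at a time.
                failedOnce = true;
                item = lastCheckpoint;
                sinceCheckpoint = 0;
                chunkSize = 1;
                continue;
            }
            sinceCheckpoint++;
            if (sinceCheckpoint == chunkSize || item == totalItems) {
                taken.add(item);
                lastCheckpoint = item;
                sinceCheckpoint = 0;
                // Once we are past the bad item, return to the normal size.
                if (failedOnce && item >= failingItem) {
                    chunkSize = itemCount;
                }
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        // 30 items, item-count=10, item 13 fails once.
        System.out.println(checkpoints(30, 10, 13)); // prints [10, 11, 12, 13, 23, 30]
    }
}
```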

Be careful of this if you are expecting some specific number of checkpoints from the job. Suppose you have 100,000 items and checkpoint every 1,000 items. You would normally expect 100 checkpoints. But if you fail 500 items into a chunk and have the job configured to retry-rollback that failure, you will have 500 checkpoints just getting back to the spot of the failure.
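Here is the arithmetic as a small sketch, in case you want to estimate the effect for your own numbers. It assumes item-based checkpointing, exactly one failure that succeeds on retry, and one checkpoint for any partial chunk at the end of the data. The class and method names are hypothetical.

```java
// Back-of-the-envelope checkpoint counts (hypothetical helper).
public class CheckpointCount {

    // Checkpoints for a clean run: one per full or partial chunk.
    static long normal(long totalItems, long itemCount) {
        return (totalItems + itemCount - 1) / itemCount; // ceiling division
    }

    // Checkpoints when failingItem (1-based) fails once and is
    // retried with rollback.
    static long withOneRetryRollback(long totalItems, long itemCount, long failingItem) {
        // Last checkpoint taken before the failing item was reached.
        long lastCheckpoint = ((failingItem - 1) / itemCount) * itemCount;
        long before = lastCheckpoint / itemCount;
        // One checkpoint per item from the rollback point through the bad item.
        long oneAtATime = failingItem - lastCheckpoint;
        // Normal chunking resumes after the bad item.
        long after = normal(totalItems - failingItem, itemCount);
        return before + oneAtATime + after;
    }
}
```

Under those assumptions, normal(100000, 1000) gives 100, while withOneRetryRollback(100000, 1000, 500) gives 600: the 500 one-item checkpoints to get back to the failure, plus 100 more for the remaining 99,500 items.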

The reason for this behavior is to allow you to ‘sneak up’ on the problem record and try to commit as many good records as possible before you hit the bad one again. 

When I snap my fingers, you’ll wake up and have a great day….1….2….3…..<snap>