This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post.
The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
A chunk step reads and processes records until a checkpoint is reached. Then the writer is called and the transaction that wraps the whole thing is committed. You can checkpoint after some configurable count of records has been read and processed, or after some fixed amount of time. Or you can get a lot more control by writing your own checkpoint algorithm.
You configure a checkpoint algorithm in two steps. The first is to set the checkpoint-policy attribute on the chunk element to custom. That causes the batch container to ignore the count and time checkpoint configuration (the item-count and time-limit attributes) and look for a checkpoint-algorithm element inside the chunk. The second is to add that checkpoint-algorithm element, whose ref attribute points to your implementation of the CheckpointAlgorithm interface.
The interface has three parts. The first is the checkpointTimeout method, which returns the timeout (in seconds) for the chunk that is about to start. There are probably quite a few reasons why you might want to set a different timeout value for every chunk you process. I tend to think of this as a sort of backstop for the chunk, to make sure it ends in case my code making checkpoint decisions has a bug. (A backstop is an American baseball expression. It is there to protect the spectators from wild pitches and other stray fast-moving balls.)
The second part is the pair of methods, beginCheckpoint and endCheckpoint, that get control at the beginning and end of the chunk. They don’t have anything to do with controlling the chunk; neither returns a value. But they give you a chance to know when the chunk begins and ends (and thus how long it took).
The final part is where the action is. The aptly named isReadyToCheckpoint gets control after every read/process cycle and decides, based on whatever criteria you like, whether this pass through the loop is the right time to checkpoint.
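To make the shape of the interface concrete, here is a minimal sketch. The CheckpointAlgorithm interface declared in it is a local stand-in that mirrors the four methods of javax.batch.api.chunk.CheckpointAlgorithm, so the example compiles without the batch API on the classpath; in a real application you would implement the real interface instead. EveryNItemsAlgorithm, its 30-second timeout, and the CheckpointSketch driver are hypothetical names and values of my own, and the loop only imitates the order in which a container might call the methods.

```java
// Local stand-in that mirrors javax.batch.api.chunk.CheckpointAlgorithm.
// Assumption: declared here only so the sketch compiles without the batch API jar.
interface CheckpointAlgorithm {
    int checkpointTimeout() throws Exception;        // timeout, in seconds, for the coming chunk
    void beginCheckpoint() throws Exception;         // gets control as a chunk begins
    boolean isReadyToCheckpoint() throws Exception;  // gets control after each read/process cycle
    void endCheckpoint() throws Exception;           // gets control as the chunk ends
}

// Hypothetical algorithm: checkpoint after a fixed number of items.
class EveryNItemsAlgorithm implements CheckpointAlgorithm {
    private final int itemsPerChunk;
    private int itemsSeen;
    private int chunksCompleted;

    EveryNItemsAlgorithm(int itemsPerChunk) {
        this.itemsPerChunk = itemsPerChunk;
    }

    @Override
    public int checkpointTimeout() {
        return 30; // arbitrary backstop in case the decision logic has a bug
    }

    @Override
    public void beginCheckpoint() {
        itemsSeen = 0; // a new chunk starts counting from zero
    }

    @Override
    public boolean isReadyToCheckpoint() {
        itemsSeen++;
        return itemsSeen >= itemsPerChunk;
    }

    @Override
    public void endCheckpoint() {
        chunksCompleted++;
    }

    int chunksCompleted() {
        return chunksCompleted;
    }
}

class CheckpointSketch {
    // Imitates the container's chunk loop; returns how many chunks were taken.
    static int run(EveryNItemsAlgorithm algorithm, int totalItems) {
        int processed = 0;
        while (processed < totalItems) {
            algorithm.checkpointTimeout();
            algorithm.beginCheckpoint();
            boolean ready = false;
            while (!ready && processed < totalItems) {
                processed++; // stands in for one read/process cycle
                ready = algorithm.isReadyToCheckpoint();
            }
            // A real container would call the writer and commit here.
            algorithm.endCheckpoint();
        }
        return algorithm.chunksCompleted();
    }

    public static void main(String[] args) {
        System.out.println("chunks: " + run(new EveryNItemsAlgorithm(3), 10));
    }
}
```

Note that the last chunk ends when the input runs out, even though isReadyToCheckpoint never returned true for it.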
For our example, I decided to try to implement my own time-based checkpoint algorithm. Using timestamps collected at the beginning and end of the chunk, plus how long each pass through the read/process cycle takes, I try to determine if the step should take one more pass or go ahead and checkpoint now.
To make my life difficult, the amount of time spent in the reader, processor, and writer varies somewhat randomly, by up to half a second, as I try to checkpoint every five seconds. The endCheckpoint method reports how many iterations we made and how much I missed the goal by.
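One way to make the "one more pass or checkpoint now" decision is sketched below. This is not the code from the sample repository; it is a guess at the general idea under my own assumptions. TimeTargetDecider and all of its method names are hypothetical, the next pass is assumed to take about as long as the average pass so far, and the clock is injected so the logic can be exercised without actually sleeping. In a real CheckpointAlgorithm, beginChunk would be called from beginCheckpoint, shouldCheckpoint from isReadyToCheckpoint, and missedBy from endCheckpoint.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: decides whether stopping now or taking one more pass
// would land closer to the target chunk duration.
class TimeTargetDecider {
    private final long targetMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in real use
    private long chunkStart;
    private int passes;

    TimeTargetDecider(long targetMillis, LongSupplier clock) {
        this.targetMillis = targetMillis;
        this.clock = clock;
    }

    // Record the timestamp at the start of the chunk.
    void beginChunk() {
        chunkStart = clock.getAsLong();
        passes = 0;
    }

    // Call after each read/process pass.
    boolean shouldCheckpoint() {
        passes++;
        long elapsed = clock.getAsLong() - chunkStart;
        long averagePass = elapsed / passes; // assumption: next pass costs about this much
        long missIfStopNow = Math.abs(targetMillis - elapsed);
        long missIfOneMore = Math.abs(targetMillis - (elapsed + averagePass));
        return missIfStopNow <= missIfOneMore;
    }

    // Positive means the chunk ran past the target; negative means it stopped early.
    long missedBy() {
        return (clock.getAsLong() - chunkStart) - targetMillis;
    }

    int passes() {
        return passes;
    }
}

class TimeTargetDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // simulated clock, advanced by hand
        TimeTargetDecider decider = new TimeTargetDecider(5000L, () -> now[0]);
        decider.beginChunk();
        do {
            now[0] += 300L; // pretend each read/process pass takes 300 ms
        } while (!decider.shouldCheckpoint());
        System.out.println("passes=" + decider.passes() + " missedBy=" + decider.missedBy() + "ms");
    }
}
```

This sketch only accounts for read/process time. The writer runs after the decision is made, so a fuller version would also need to estimate how long the write is likely to take.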
Go have a look. The code is in SampleCheckpointAlgorithm.java, and the JSL that uses it is in UsingCheckpointAlgorithm.xml.
The sample parts are here: https://github.com/follisd/batch-samples