
JSR-352 (Java Batch) Post #24: One Chunk at a Time

By David Follis posted Wed January 09, 2019 07:27 AM


This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

Most descriptions of JSR-352 begin by talking about the chunk step type, so I guess it is about time we got around to discussing it.  We’ll get into details in the following posts, but we’ll start with an overview here.


The other step type, batchlet, is pretty simple.  Your application code gets control, does whatever it does, and the step ends.  A chunk step, on the other hand, involves multiple bits of application code, a loop, and transactions managed by the batch container.


The chunk step matches a pretty typical batch programming model that reads records, does some processing on each record, and then writes some results somewhere.  The JSR-352 model breaks the application code down into those three main parts:  reader, processor, and writer.
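
To make the three roles a bit more concrete, here is a minimal sketch in plain Java.  The interfaces below (SketchReader, SketchProcessor, SketchWriter) are simplified stand-ins I made up for illustration; the real ones live in javax.batch.api.chunk as ItemReader, ItemProcessor, and ItemWriter, and the real reader and writer also have methods for opening, closing, and checkpointing.  The uppercase "processing" is just a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the three chunk artifacts.
// They are NOT the javax.batch.api.chunk interfaces.
interface SketchReader { Object readItem(); }                 // null means no more input
interface SketchProcessor { Object processItem(Object item); }
interface SketchWriter { void writeItems(List<Object> items); }

class ThreePartsSketch {
    // Reads every record, processes each one, then hands the results to the writer.
    static List<Object> run(List<String> input) {
        Iterator<String> source = input.iterator();
        List<Object> output = new ArrayList<>();

        SketchReader reader = () -> source.hasNext() ? source.next() : null;
        SketchProcessor processor = item -> ((String) item).toUpperCase();
        SketchWriter writer = items -> output.addAll(items);

        List<Object> processed = new ArrayList<>();
        Object item;
        while ((item = reader.readItem()) != null) {
            processed.add(processor.processItem(item));
        }
        writer.writeItems(processed);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c"))); // prints [A, B, C]
    }
}
```

Note that the writer receives a list of items rather than one item at a time, which is what lets the container group the work into chunks.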


The batch container calls those application artifacts over and over in a loop.  As it does that, it creates a transaction around the processing done by the reader/processor/writer and commits that transaction at checkpoints.  How often it checkpoints is configurable in a few different ways that we’ll look at later.
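
The loop described above might look roughly like the following sketch.  This is not the container's actual code: it assumes a simple count-based checkpoint interval (the itemCount parameter is my assumption, since the ways to configure checkpoints come later), and it just records "begin" and "commit" in a log where a real container would start and commit a transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ChunkLoopSketch {
    // Returns a log of what the pretend container did, so the ordering is visible.
    static List<String> run(List<String> records, int itemCount) {
        List<String> log = new ArrayList<>();
        Iterator<String> reader = records.iterator();
        boolean moreInput = true;

        while (moreInput) {
            log.add("begin");                        // container starts a transaction
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < itemCount) {
                if (!reader.hasNext()) {             // reader has run out of records
                    moreInput = false;
                    break;
                }
                String item = reader.next();         // reader: one record
                chunk.add(item.toUpperCase());       // processor: placeholder work
            }
            if (!chunk.isEmpty()) {
                log.add("write " + chunk);           // writer: gets the whole chunk
            }
            log.add("commit");                       // checkpoint: transaction commits
        }
        return log;
    }

    public static void main(String[] args) {
        // Five records with a checkpoint every two items gives three commits.
        for (String line : run(Arrays.asList("a", "b", "c", "d", "e"), 2)) {
            System.out.println(line);
        }
    }
}
```

The reader and processor are called once per record, while the writer and the commit happen once per chunk.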


When the transaction commits, any transactional updates made by the application (probably by the writer) are committed.  The batch container also gets checkpoint data from the reader and writer (like what record number we just read) and commits that into the Job Repository, where it keeps track of the progress of the job.


The checkpoint data allows a job running a chunk step to be restarted and pick up again from the last checkpoint.  That’s because on a restart of a chunk step the reader will be provided with the last committed checkpoint data so it can just go read the next record after that, rather than start all over again from the beginning.
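
Here is a small sketch of that restart behavior, under some loud assumptions: the reader works over an in-memory list, its checkpoint data is simply the index of the next record, and the Repository class is a made-up stand-in for the Job Repository that remembers only the last committed checkpoint.  The open and checkpointInfo method names mirror the ones on the real ItemReader, but everything else here (ListReader, runStep, failAfter) is hypothetical.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RestartSketch {
    // Hypothetical stand-in for the Job Repository.
    static class Repository {
        Serializable lastCheckpoint;   // null until the first commit
    }

    // Hypothetical reader; its checkpoint data is the index of the next record.
    static class ListReader {
        private final List<String> records;
        private int next;

        ListReader(List<String> records) { this.records = records; }

        // checkpoint is null on a first run, or the last committed value on a restart.
        void open(Serializable checkpoint) {
            next = (checkpoint == null) ? 0 : (Integer) checkpoint;
        }

        String readItem() {
            return next < records.size() ? records.get(next++) : null;
        }

        Serializable checkpointInfo() { return next; }
    }

    // Runs the step, committing every itemCount records. If failAfter is positive,
    // the run "crashes" after that many reads without committing the open chunk.
    static List<String> runStep(List<String> records, int itemCount, int failAfter, Repository repo) {
        List<String> readThisRun = new ArrayList<>();
        ListReader reader = new ListReader(records);
        reader.open(repo.lastCheckpoint);
        int sinceCommit = 0;
        String item;
        while ((item = reader.readItem()) != null) {
            readThisRun.add(item);
            if (readThisRun.size() == failAfter) {
                return readThisRun;                            // simulated failure
            }
            if (++sinceCommit == itemCount) {
                repo.lastCheckpoint = reader.checkpointInfo(); // commit checkpoint data
                sinceCommit = 0;
            }
        }
        repo.lastCheckpoint = reader.checkpointInfo();         // final commit
        return readThisRun;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        Repository repo = new Repository();
        System.out.println(runStep(records, 2, 3, repo));  // prints [r1, r2, r3]
        System.out.println(runStep(records, 2, -1, repo)); // prints [r3, r4, r5]
    }
}
```

In this sketch the first run fails after reading r3, but the last commit happened after r2, so the restarted run reads r3 again and carries on from there instead of going back to r1.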


In upcoming posts we’ll take a closer look at the specifics of the reader, processor, and writer, as well as different ways to specify the checkpoint interval, how errors are handled, listeners, and more! 


Check(point) back regularly to find out more about it… (gak – can’t believe I wrote that).