WebSphere Application Server & Liberty

Jakarta Batch Post 126: No Transaction Chunk

By David Follis posted Wed February 24, 2021 09:51 AM


This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

The issue can be found here.

This one is pretty simple. Standard chunk processing begins a new transaction at the start of each chunk and keeps it open until a checkpoint is reached. That happens either because the JSL specifies an item count or time limit, or because a checkpoint algorithm implementation decides it is time.

When a checkpoint is reached, the reader and writer are called to provide their checkpoint information. The row in the Job Repository for this chunk step (the whole step, or perhaps just one partition) is updated with the checkpoint data, and the transaction is committed. Any transactional work done by the reader or writer (perhaps the writer inserted rows into a table) commits along with the Job Repository updates, giving a consistent point across the application data and the checkpoint data.
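To make that chunk, checkpoint, and commit cycle concrete, here is a small self-contained Java simulation. It is not the JSR-352 API: FakeTransaction, runStep, the in-memory table, and the int[] checkpoint are hypothetical stand-ins for a real transaction manager, an application table, and the step's Job Repository row. What it tries to show is the property described above: a failure partway through a chunk loses only that chunk's uncommitted rows, and a restart resumes from the last committed checkpoint without duplicating data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of a transactional chunk step. FakeTransaction, the in-memory "table" and the
// int[] "repository" checkpoint are hypothetical stand-ins, not JSR-352 APIs.
public class TransactionalChunkSketch {

    // Buffers application writes and the checkpoint until commit; discarding it acts as a rollback.
    static class FakeTransaction {
        private final List<String> pendingRows = new ArrayList<>();
        private int pendingCheckpoint;

        void insertRow(String row) { pendingRows.add(row); }
        void setCheckpoint(int position) { pendingCheckpoint = position; }

        // Application rows and the checkpoint become visible together.
        void commit(List<String> table, int[] repositoryCheckpoint) {
            table.addAll(pendingRows);
            repositoryCheckpoint[0] = pendingCheckpoint;
        }
    }

    /**
     * Runs (or restarts) the step; items before repositoryCheckpoint[0] are skipped.
     * failAtItem >= 0 simulates a failure when that item is reached; -1 means no failure.
     */
    static void runStep(List<String> input, int itemCount, List<String> table,
                        int[] repositoryCheckpoint, int failAtItem) {
        int position = repositoryCheckpoint[0];               // reader resumes from last commit
        while (position < input.size()) {
            FakeTransaction tx = new FakeTransaction();         // one transaction per chunk
            int end = Math.min(position + itemCount, input.size());
            for (int i = position; i < end; i++) {
                if (i == failAtItem) {
                    return;                                     // failure: tx is never committed
                }
                tx.insertRow(input.get(i).toUpperCase(Locale.ROOT)); // process + write in the tx
            }
            position = end;
            tx.setCheckpoint(position);                         // checkpointInfo() analogue
            tx.commit(table, repositoryCheckpoint);             // data + checkpoint commit together
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c", "d", "e", "f", "g");
        List<String> table = new ArrayList<>();
        int[] checkpoint = {0};
        runStep(input, 3, table, checkpoint, 4);   // fails partway through the second chunk
        System.out.println(table + " checkpoint=" + checkpoint[0]);
        runStep(input, 3, table, checkpoint, -1);  // restart picks up at item 3
        System.out.println(table + " checkpoint=" + checkpoint[0]);
    }
}
```

After the simulated failure the table holds only A, B, C with the checkpoint at 3; the restart then adds D through G exactly once.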

But what if your application doesn’t update any transactional resources?  What if you just read from a flat file and produce some sort of report?  What if you don’t care about having a consistent restart point at all?  Maybe you just re-run the job if something bad happens.  Then all this transactional stuff is just overhead that slows things down and creates complications.

The proposal is simply that there be an option you can specify on a chunk step indicating that you don't want or need a transaction wrapped around the chunk processing. Just let it run.

Would the batch container bother to call the reader and writer to obtain checkpoint data in this case? Probably not. There'd be no reason to update the checkpoint data in the repository, although you could. Maybe that could be a separate option, independent of whether the main transaction is present or not.

What about metrics? The counts of reads, writes, and skips are updated at each chunk. Would those still be updated, each in its own small, immediately committed update?
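Here is a rough sketch of how such a step might behave at run time, assuming a hypothetical persistCheckpoint switch alongside the "no transaction" option (nothing like either exists in the specification today; RepositoryRow is likewise an invented in-memory stand-in). No transaction wraps any chunk, output takes effect immediately, the read and write metrics are still bumped at each chunk boundary, and checkpoint data is saved only if asked for. There is no consistent restart point, so a failed run would simply be re-run from the start.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed "no transaction" chunk step. The persistCheckpoint option and
// RepositoryRow are hypothetical; JSR-352 defines no such setting.
public class NonTransactionalChunkSketch {

    // In-memory stand-in for this step's row in the Job Repository.
    static class RepositoryRow {
        Integer checkpoint = null;   // remains null when checkpoints are not persisted
        long readCount = 0;
        long writeCount = 0;
    }

    /** Builds a simple report from flat-file lines with no transaction around any chunk. */
    static List<String> runStep(List<String> lines, int itemCount,
                                boolean persistCheckpoint, RepositoryRow repo) {
        List<String> report = new ArrayList<>();
        int position = 0;                     // always starts at the beginning: "just re-run it"
        while (position < lines.size()) {
            int end = Math.min(position + itemCount, lines.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                chunk.add("item " + i + ": " + lines.get(i).length() + " chars");
            }
            report.addAll(chunk);             // output takes effect at once; nothing to roll back
            position = end;
            repo.readCount += chunk.size();   // metrics: a small standalone update per chunk
            repo.writeCount += chunk.size();
            if (persistCheckpoint) {
                repo.checkpoint = position;   // optional, independent of any transaction
            }
        }
        return report;
    }

    public static void main(String[] args) {
        RepositoryRow repo = new RepositoryRow();
        List<String> report = runStep(List.of("alpha", "be", "gamma"), 2, false, repo);
        report.forEach(System.out::println);
        System.out.println("reads=" + repo.readCount + " checkpoint=" + repo.checkpoint);
    }
}
```

Keeping the metrics update and the checkpoint update as separate decisions mirrors the open questions above: either could be dropped without affecting the other.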

At some point you’ve removed enough value from a chunk step that perhaps it should just be a batchlet with the loop in the application code.  At least that’s what I think.
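For comparison, here is a minimal sketch of the batchlet alternative, where the application owns the loop. To stay self-contained it declares a local Batchlet interface mirroring the shape of the spec's Batchlet (process() returns an exit status, stop() requests termination); a real application would implement the container-provided interface instead. LineCountReportBatchlet and its report format are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReportBatchletSketch {

    // Local stand-in mirroring the shape of the spec's Batchlet interface.
    interface Batchlet {
        String process() throws Exception;
        void stop() throws Exception;
    }

    // Hypothetical batchlet: reads a flat file and summarizes it, with no transactions or checkpoints.
    static class LineCountReportBatchlet implements Batchlet {
        private final BufferedReader input;
        private volatile boolean stopRequested = false;
        long lines = 0;
        long totalChars = 0;

        LineCountReportBatchlet(BufferedReader input) { this.input = input; }

        @Override
        public String process() throws IOException {
            String line;
            // The loop a chunk step would have driven now lives in application code.
            while (!stopRequested && (line = input.readLine()) != null) {
                lines++;
                totalChars += line.length();
            }
            return stopRequested ? "STOPPED" : "COMPLETED";
        }

        @Override
        public void stop() { stopRequested = true; }   // checked on each loop iteration

        String report() { return lines + " lines, " + totalChars + " chars"; }
    }

    public static void main(String[] args) throws Exception {
        LineCountReportBatchlet b = new LineCountReportBatchlet(
                new BufferedReader(new StringReader("alpha\nbe\ngamma\n")));
        System.out.println(b.process() + ": " + b.report());
    }
}
```

Running main prints `COMPLETED: 3 lines, 12 chars`. The trade-off is that stop handling, progress reporting, and any restart logic are now the application's responsibility rather than the container's.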

Chime in at the link above…