WebSphere Application Server

JSR-352 (Java Batch) Post #40: Coding a Skippable, Retryable, Restartable, Partitionable Reader

By David Follis posted Wed May 01, 2019 07:36 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

How about “Listen, Learn, Read On” by Deep Purple? 

An ItemReader has only four methods: open, readItem, checkpointInfo, and close. The open method opens a connection to a datasource or a file. The readItem method reads one record and puts the relevant information into an object to be handed to the processor. The checkpointInfo method returns whatever serializable object holds your checkpoint information. And the close method closes the connection or the file. It all seems pretty simple, really.
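To make the four methods concrete, here is a minimal sketch of a basic reader that is not yet partitionable or restartable. So that it compiles without the batch API on the classpath, it declares a local interface with the same four methods as javax.batch.api.chunk.ItemReader; in a real application you would implement that interface (or extend AbstractItemReader) instead. The class name ListItemReader is hypothetical, and the in-memory list stands in for a file or table.

```java
import java.io.Serializable;
import java.util.List;

// Local stand-in declaring the same four methods as javax.batch.api.chunk.ItemReader,
// so this sketch compiles without the batch API jar.
interface ItemReader {
    void open(Serializable checkpoint) throws Exception;
    Object readItem() throws Exception;
    Serializable checkpointInfo() throws Exception;
    void close() throws Exception;
}

// Hypothetical reader; the in-memory list stands in for a file or database table.
class ListItemReader implements ItemReader {
    private final List<String> records;
    private int next;        // index of the next record to read
    private boolean isOpen;

    ListItemReader(List<String> records) {
        this.records = records;
    }

    @Override
    public void open(Serializable checkpoint) {
        // A real reader would open its file or connection here.
        // This basic version ignores the checkpoint and always starts at the top.
        next = 0;
        isOpen = true;
    }

    @Override
    public Object readItem() {
        if (!isOpen) {
            throw new IllegalStateException("open() has not been called");
        }
        // Returning null tells the batch runtime there is nothing left to read.
        return next < records.size() ? records.get(next++) : null;
    }

    @Override
    public Serializable checkpointInfo() {
        // Position of the next record; an Integer is Serializable.
        return next;
    }

    @Override
    public void close() {
        // A real reader would close its file or connection here.
        isOpen = false;
    }
}

class ListItemReaderDemo {
    public static void main(String[] args) throws Exception {
        ItemReader reader = new ListItemReader(List.of("alpha", "beta", "gamma"));
        reader.open(null);
        Object item;
        while ((item = reader.readItem()) != null) {
            System.out.println(item);
        }
        reader.close();
    }
}
```

The demo loop plays the part of the batch runtime, calling readItem until it gets null back.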

And it can be, unless you want to support skip or retry processing, restart a failed job, or run the step in partitions. Then you need to be a bit more careful.

Since we’re on a stretch of posts about partitions, we’ll consider this aspect first. A basic reader just opens a connection or a file, reads until it runs out of data, and then stops. A partitionable reader needs to know where to start and where to stop. For a file, that means knowing which record to start on and which one to stop on. For a database table, the records must be read in a consistent, sorted order so that each partition can be given its own starting and ending point (for example, a range of key values). All this information can be supplied as PartitionPlan properties that differ for each partition. When your reader is used in an ordinary, non-partitioned chunk step, choose defaults for those properties that make it clear you want to start at the beginning and end at the end.
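The sketch below shows both halves of that arrangement in plain Java: a helper that computes per-partition ranges (the kind of Properties a PartitionMapper might place in a PartitionPlan) and reader-side parsing that falls back to "whole input" defaults. The property names firstRecord and lastRecord are hypothetical. In an actual JSR-352 reader the values would typically arrive through fields annotated @Inject @BatchProperty, whose JSL values refer to #{partitionPlan['firstRecord']} and so on; those annotations are left out here so the code runs on its own.

```java
import java.util.Properties;

// Hypothetical helpers; "firstRecord" and "lastRecord" are illustrative property names.
class PartitionRanges {

    // Mapper side: split records 0..totalRecords-1 into contiguous, roughly equal
    // ranges, one Properties object per partition. If there are fewer records than
    // partitions, some partitions get an empty range (lastRecord < firstRecord).
    static Properties[] split(long totalRecords, int partitions) {
        Properties[] plan = new Properties[partitions];
        long base = totalRecords / partitions;
        long extra = totalRecords % partitions;   // the first 'extra' partitions get one more
        long start = 0;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < extra ? 1 : 0);
            plan[i] = new Properties();
            plan[i].setProperty("firstRecord", Long.toString(start));
            plan[i].setProperty("lastRecord", Long.toString(start + size - 1)); // inclusive
            start += size;
        }
        return plan;
    }

    // Reader side: an unset property means "start at the beginning"...
    static long firstRecord(String value) {
        return (value == null || value.isEmpty()) ? 0L : Long.parseLong(value);
    }

    // ...and "end at the end", so the reader also works in a non-partitioned step.
    static long lastRecord(String value) {
        return (value == null || value.isEmpty()) ? Long.MAX_VALUE : Long.parseLong(value);
    }
}

class PartitionRangesDemo {
    public static void main(String[] args) {
        Properties[] plan = PartitionRanges.split(10, 3);
        for (int i = 0; i < plan.length; i++) {
            System.out.println("partition " + i + ": records "
                    + plan[i].getProperty("firstRecord") + " to "
                    + plan[i].getProperty("lastRecord"));
        }
        // Non-partitioned use: no properties supplied, so read everything.
        System.out.println(PartitionRanges.firstRecord(null) + " to "
                + PartitionRanges.lastRecord(null));
    }
}
```

For ten records and three partitions the demo prints the ranges 0 to 3, 4 to 6, and 7 to 9. Handing out the remainder one record at a time keeps the partition sizes within one record of each other.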

Handling retry with rollback, or a job restart (we haven’t talked about restart processing yet), comes down to the checkpoint data. When the step starts for the first time there is no checkpoint information, so the checkpoint object passed to open is null. On a restart or a retry with rollback, it contains whatever your checkpointInfo method returned at the last committed checkpoint. To pick up where you left off, be sure that object holds everything you will need. This is probably just the record number, or whatever else you use to track your position between the start and end of the range you are processing.
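A minimal sketch of that checkpoint logic, assuming the position is a simple record index: the hypothetical RestartableRangeReader below resumes from the checkpoint when one is passed to open and otherwise starts at the beginning of its range. Its methods mirror the ItemReader signatures, but it does not implement the real interface, so it runs without the batch API. The demo simulates a restart by feeding one reader's checkpoint into a fresh instance.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical reader over records first..last (inclusive) of an in-memory list.
// Method names and signatures mirror javax.batch.api.chunk.ItemReader.
class RestartableRangeReader {
    private final List<String> records;
    private final int first;
    private final int last;
    private int next;          // index of the next record to read

    RestartableRangeReader(List<String> records, int first, int last) {
        this.records = records;
        this.first = first;
        this.last = Math.min(last, records.size() - 1);
    }

    public void open(Serializable checkpoint) {
        // null on the first run; on restart (or retry with rollback) it is the
        // value checkpointInfo() returned at the last committed checkpoint.
        next = (checkpoint == null) ? first : (Integer) checkpoint;
    }

    public Object readItem() {
        // null signals the end of this reader's range.
        return next <= last ? records.get(next++) : null;
    }

    public Serializable checkpointInfo() {
        // Everything needed to resume: the index of the next unread record.
        return next;
    }

    public void close() {
        // Nothing to release for an in-memory list.
    }
}

class RestartDemo {
    public static void main(String[] args) {
        List<String> data = List.of("r0", "r1", "r2", "r3", "r4");

        RestartableRangeReader firstRun = new RestartableRangeReader(data, 1, 3);
        firstRun.open(null);
        System.out.println(firstRun.readItem());   // r1
        System.out.println(firstRun.readItem());   // r2
        Serializable committed = firstRun.checkpointInfo();  // pretend a checkpoint commits here
        firstRun.close();                          // ...and then the job fails

        RestartableRangeReader restarted = new RestartableRangeReader(data, 1, 3);
        restarted.open(committed);
        System.out.println(restarted.readItem());  // r3
        System.out.println(restarted.readItem());  // null: end of range
        restarted.close();
    }
}
```

Because checkpointInfo returns the index of the next unread record rather than the last one processed, open can use the value directly without adding one.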

Handling skip and retry-without-rollback scenarios is fairly easy once all that is working. Before your reader throws an exception that you expect to be configured as skippable or as retryable without rollback, be sure to adjust the position you are tracking so the reader does the right thing on the next call: move forward past a record that can never be read, or stay put (or step back) so that a record hit by a temporary failure is read again.
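Here is a sketch of that adjustment, assuming the position is a record index and using two hypothetical exception types, BadRecordException and TransientReadException. Whether an exception is actually skipped or retried is decided by the skippable-exception-classes, retryable-exception-classes, and no-rollback-exception-classes elements in the JSL, not by the reader; the loop in the demo only imitates what the runtime would do. For the bad-record case the reader steps past the record before throwing, so the next call moves on. For the temporary-failure case it leaves the position alone, so the next call reads the same record again.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception for a record that can never be read (meant to be skippable).
class BadRecordException extends Exception {
    BadRecordException(String message) { super(message); }
}

// Hypothetical exception for a temporary problem (meant to be retried without rollback).
class TransientReadException extends Exception {
    TransientReadException(String message) { super(message); }
}

class SkipAwareReader {
    private final List<String> records;
    private int next;                     // index of the next record to read
    private int transientFailuresLeft;    // simulates a flaky data source

    SkipAwareReader(List<String> records, int transientFailures) {
        this.records = records;
        this.transientFailuresLeft = transientFailures;
    }

    public void open(Serializable checkpoint) {
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    public Object readItem() throws BadRecordException, TransientReadException {
        if (next >= records.size()) {
            return null;                  // end of data
        }
        if (transientFailuresLeft > 0) {
            transientFailuresLeft--;
            // Retry without rollback: leave 'next' alone so the same record is read again.
            throw new TransientReadException("temporary failure at record " + next);
        }
        String record = records.get(next);
        if (record.isEmpty()) {
            int bad = next;
            next++;                       // skip: step past the bad record BEFORE throwing
            throw new BadRecordException("unreadable record " + bad);
        }
        next++;
        return record;
    }

    public Serializable checkpointInfo() { return next; }

    public void close() { }
}

class SkipRetryDemo {
    public static void main(String[] args) {
        SkipAwareReader reader = new SkipAwareReader(List.of("a", "", "c"), 1);
        reader.open(null);
        List<Object> items = new ArrayList<>();
        // Imitates the runtime calling readItem again after a skip or a no-rollback retry.
        while (true) {
            try {
                Object item = reader.readItem();
                if (item == null) break;
                items.add(item);
            } catch (BadRecordException e) {
                System.out.println("skipped: " + e.getMessage());
            } catch (TransientReadException e) {
                System.out.println("retrying: " + e.getMessage());
            }
        }
        System.out.println(items);        // [a, c]
        reader.close();
    }
}
```

If the reader forgot to advance before throwing BadRecordException, the simulated runtime would hit the same empty record forever; that is exactly the kind of loop the position adjustment prevents.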

If you are writing a ‘generic’ reader for some datasource, think about whether you want to support its use in a partitioned step, and plan ahead. Spend some time considering how it will handle skips, retries, rollbacks, and restarts. Somebody else might take your reader and use it in JSL they are writing. Make clear to them how it can be used, what parameters it expects, and how it behaves on errors. Or just put your phone number at the top and expect a call at 3:00 in the morning.