This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.
To start at the beginning, follow the link to the first post
The next post in the series is here
You’ll write an implementation of an ItemReader to fetch data to process in the ItemProcessor part of the chunk step. There are only four methods to implement. Let’s take a look…
We’ll start in an odd place, with the checkpointInfo method. This method gets called at every checkpoint to report how far the reader has worked its way through the source of the data. It has to return a serializable object because the data will be serialized and persisted in the Job Repository along with other information about the progress of the step (or partition, when we get to those). What information should be in the checkpointInfo? Anything the reader needs to pick up where it left off. That might be a record number, an account number, or an offset into something: whatever you need to find your place again. It might also be that nothing works because the data can change underneath you, and you just have to start over.
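To make that a bit more concrete, here is a minimal sketch of what a checkpoint object might look like. The ReaderPosition class and its fields are made up for illustration. The flatten and restore helpers are also hypothetical: they use plain Java serialization to imitate the kind of round trip a checkpoint has to survive, and say nothing about how any particular runtime actually stores it in the Job Repository.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical checkpoint type: just enough state for the reader to find its place again.
class ReaderPosition implements Serializable {
    private static final long serialVersionUID = 1L;

    final String fileName;   // which input the reader was working on
    final long recordsRead;  // how many records it had already handed out

    ReaderPosition(String fileName, long recordsRead) {
        this.fileName = fileName;
        this.recordsRead = recordsRead;
    }
}

// Hypothetical helpers that imitate a persist/restore round trip with plain Java serialization.
class CheckpointRoundTrip {

    // Turn the checkpoint into bytes, as would be needed before writing it to a store.
    static byte[] flatten(Serializable checkpoint) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(checkpoint);
        }
        return bytes.toByteArray();
    }

    // Rebuild the checkpoint from bytes, as would be needed before a restart.
    static Serializable restore(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Serializable) in.readObject();
        }
    }
}
```

If a field of your checkpoint class isn’t itself serializable, flatten fails with a NotSerializableException, which is a cheap way to catch that mistake in a unit test rather than in the middle of a job. A plain Long or String works just as well as a custom class when a single value is all you need.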
The checkpointInfo you provide will be given to the open method when it is called. When a job is run the first time the parameter will be null because there is no data from a prior execution, but if you provide checkpointInfo, the open method should be prepared to use it when the job is restarted. That might mean positioning a cursor, reading your way into a file to a particular record, or whatever is appropriate for your data source. The open method runs inside a transaction that is separate from the chunk transaction, so anything you do in this method will commit before chunk processing starts.
Of course, the open method is matched by a close method, which is your opportunity to close files, database connections, or whatever else you set up in open to establish your data source. Close processing also happens inside a transaction that is separate from the chunk processing transactions.
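Here’s a rough sketch of an open/close pair for a reader that works through a text file one line at a time. The LineFileReader name, its constructor, and the choice of a line count as the checkpoint are all assumptions made for this example. A real reader would implement javax.batch.api.chunk.ItemReader; that is left off here so the sketch compiles without the batch API on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical reader: one line of a text file is one item.
// A real implementation would declare "implements javax.batch.api.chunk.ItemReader".
class LineFileReader {
    private final Path input;       // assumption: passed in here to keep the sketch short
    private BufferedReader reader;  // established in open, released in close
    private long linesRead;         // our position, and also our checkpoint

    LineFileReader(Path input) {
        this.input = input;
    }

    public void open(Serializable checkpoint) throws IOException {
        reader = Files.newBufferedReader(input);
        linesRead = 0;
        if (checkpoint != null) {
            // Restart: read our way back into the file to where we left off.
            long target = (Long) checkpoint;
            while (linesRead < target && reader.readLine() != null) {
                linesRead++;
            }
        }
    }

    public Object readItem() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            linesRead++;
        }
        return line; // null means there is nothing left to read
    }

    public Serializable checkpointInfo() {
        return Long.valueOf(linesRead);
    }

    public void close() throws IOException {
        // Undo whatever open set up; safe to call even if open never ran.
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }
}
```

Skipping forward line by line is simple but gets slow on a large file. If that matters, a byte offset as the checkpoint might serve better, at the cost of more careful bookkeeping.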
The readItem method is where you… read an item to process! This method gets called every time through the chunk processing loop and is expected to return an item to be processed, or null when there is no more data. Be sure to advance whatever you are using to keep track of where you are in the data so your checkpoint information will be current.
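As a sketch of how readItem and the position tracking fit together, here is a small in-memory reader along with a stripped-down imitation of one pass through the chunk loop. AccountReader and ChunkLoopSketch are hypothetical names, the list of account numbers stands in for a real data source, and the loop leaves out everything the actual runtime does around the reader (the processor, the writer, and the transaction).

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical reader over an in-memory list of account numbers.
// A real implementation would declare "implements javax.batch.api.chunk.ItemReader".
class AccountReader {
    private final List<String> accounts;
    private int next; // index of the next item to hand out

    AccountReader(List<String> accounts) {
        this.accounts = accounts;
    }

    public void open(Serializable checkpoint) {
        // First run: start at the top. Restart: resume at the saved index.
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    public Object readItem() {
        if (next >= accounts.size()) {
            return null; // no more items
        }
        return accounts.get(next++); // hand out the item AND advance our position
    }

    public Serializable checkpointInfo() {
        return Integer.valueOf(next);
    }

    public void close() {
        // nothing to release for an in-memory list
    }
}

// Simplified stand-in for one trip through the chunk loop, reader side only.
class ChunkLoopSketch {
    static Serializable readOneChunk(AccountReader reader, int itemCount, List<Object> chunk) {
        for (int i = 0; i < itemCount; i++) {
            Object item = reader.readItem();
            if (item == null) {
                break; // the reader ran out of data
            }
            chunk.add(item);
        }
        // At the checkpoint the reader is asked where it is.
        return reader.checkpointInfo();
    }
}
```

With five accounts and an item count of two, the first call to readOneChunk collects the first two accounts and returns a checkpoint of 2. Handing that 2 to open on a fresh AccountReader picks up at the third account, which is the restart behavior you are after.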
readItem is called repeatedly inside a transaction that will commit at the end of the chunk. If the cursor you are using to keep track of where you are closes on a commit, you will lose your place. You can get around that by configuring the data source you are using as non-transactional so the commit won’t affect your cursor.
That’s it for this time. Next post we’ll look at the