This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.
To start at the beginning, follow the link to the first post.
ItemWriter has a lot in common with the ItemReader. It has an open and a close method, which get control at the beginning and end of the step (and in some retry cases we'll get to later). It also has a checkpointInfo method that gets called at every checkpoint.
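To make that lifecycle a bit more concrete, here is a minimal sketch. So that it runs without a batch runtime on the classpath, it declares a local ItemWriter interface with the same method shapes as javax.batch.api.chunk.ItemWriter; in a real job you would implement that interface (or extend AbstractItemWriter) instead. CountingWriter is a hypothetical writer whose only checkpoint data is the number of items it has written so far.

```java
import java.io.Serializable;
import java.util.List;

// Local stand-in mirroring the method shapes of javax.batch.api.chunk.ItemWriter,
// declared here only so the sketch compiles on its own.
interface ItemWriter {
    void open(Serializable checkpoint) throws Exception;
    void close() throws Exception;
    void writeItems(List<Object> items) throws Exception;
    Serializable checkpointInfo() throws Exception;
}

// Hypothetical writer: its checkpoint data is just a running count of items written.
class CountingWriter implements ItemWriter {
    private long written;

    @Override
    public void open(Serializable checkpoint) {
        // null on a fresh start; on a restart, whatever checkpointInfo() returned
        // at the last committed checkpoint
        written = (checkpoint == null) ? 0L : (Long) checkpoint;
    }

    @Override
    public void writeItems(List<Object> items) {
        // A real writer would send the items somewhere; this one only counts them.
        written += items.size();
    }

    @Override
    public Serializable checkpointInfo() {
        return written; // boxed to Long, which is Serializable
    }

    @Override
    public void close() {
        // Release files, connections, etc. Nothing to release in this sketch.
    }

    long written() {
        return written;
    }
}
```

On a restart the runtime hands the last committed Long back to open, so a fresh CountingWriter continues counting from where the previous one left off.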
As with the ItemReader, the writer can return any serializable object as checkpoint data, and that data will be provided to the open method when the writer needs to pick up where it left off after a failure. If the writer is writing to a transactional resource, then the checkpoint data will be committed to the Job Repository along with whatever application data was written. However, if the target of the writer is non-transactional (like a flat file), then updates to the file aren't coordinated with the commit of the checkpoint data. If that's the case, on a restart you might have to compare the checkpoint data with what is actually in the file and possibly remove some updates to get things back in sync.
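Here is one way the flat-file case might be handled. This is a sketch under a couple of assumptions: the checkpoint data is simply the file's length in bytes after the last completed writeItems call, and on a restart open truncates the file back to that length, throwing away lines from a chunk whose checkpoint never committed. FlatFileWriter is a hypothetical class that only mirrors the ItemWriter method names; it does not implement the real interface.

```java
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;
import static java.nio.file.StandardOpenOption.WRITE;

import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical flat-file writer whose checkpoint data is the file length (a Long).
class FlatFileWriter {
    private final Path path;
    private FileChannel channel;

    FlatFileWriter(Path path) {
        this.path = path;
    }

    public void open(Serializable checkpoint) throws IOException {
        if (checkpoint == null) {
            // Fresh start: begin with an empty file (an assumption; a job could append instead).
            channel = FileChannel.open(path, CREATE, WRITE, TRUNCATE_EXISTING);
        } else {
            // Restart: cut off anything written after the last committed checkpoint.
            channel = FileChannel.open(path, CREATE, WRITE);
            channel.truncate((Long) checkpoint);
        }
        channel.position(channel.size()); // continue writing at the end of the file
    }

    public void writeItems(List<Object> items) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            sb.append(item).append('\n');
        }
        ByteBuffer buf = ByteBuffer.wrap(sb.toString().getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    public Serializable checkpointInfo() throws IOException {
        channel.force(false);      // push the chunk's lines to disk before reporting the offset
        return channel.position(); // byte offset just past the last completed chunk
    }

    public void close() throws IOException {
        if (channel != null) {
            channel.close();
        }
    }
}
```

If the chunk's commit fails after its lines reach the file, the recorded offset still points before them, so the truncate on restart removes them and the file is back in sync with the checkpoint.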
Of course, the main method of the ItemWriter is writeItems(). The readItem method is called to read one item, but writeItems is plural, emphasizing that it writes more than one. The signature for writeItems says that it gets a List of objects to write. As the chunk has progressed, with processItem being called again and again, the results returned from processItem (or from readItem if there's no ItemProcessor) have been piling up in memory. When a checkpoint is reached, those result items are placed into a List and passed to the writeItems method.
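That accumulation can be modeled with a simplified loop. This is not how any particular runtime is written: it assumes an item-count checkpoint policy and leaves out transactions, checkpointInfo calls, skip and retry handling, and so on. The reader, processor, and writer are plain java.util.function types standing in for readItem, processItem, and writeItems, and ChunkLoop is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified model of a chunk step with an item-count checkpoint policy.
final class ChunkLoop {
    static void run(Supplier<Object> reader,            // stands in for readItem()
                    Function<Object, Object> processor, // stands in for processItem()
                    Consumer<List<Object>> writer,      // stands in for writeItems()
                    int itemCount) {
        List<Object> buffer = new ArrayList<>();
        int readInChunk = 0;
        Object item;
        while ((item = reader.get()) != null) {    // null signals end of input
            readInChunk++;
            Object result = processor.apply(item); // null means the item is filtered out
            if (result != null) {
                buffer.add(result);                // results pile up in memory
            }
            if (readInChunk == itemCount) {        // checkpoint reached
                flush(buffer, writer);
                buffer = new ArrayList<>();
                readInChunk = 0;
            }
        }
        flush(buffer, writer);                     // last, possibly partial, chunk
    }

    private static void flush(List<Object> buffer, Consumer<List<Object>> writer) {
        if (!buffer.isEmpty()) {
            writer.accept(buffer); // one call with the whole list
        }
    }
}
```

With five items, an identity processor (the "no ItemProcessor" case), and an itemCount of 2, the writer would be called three times, with [1, 2], [3, 4], and [5].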
Why do it this way? The specification could have said to just call the ItemWriter for each item as it is processed. Well, sometimes it can be more efficient to make updates in bulk. Some databases support bulk or batch inserts that allow you to provide a whole set of updates and make a single call to the database to do them all at once. This can be much more efficient, especially if there is some network latency or other overhead involved in getting to the database.
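As an example of a bulk update, JDBC lets you queue rows with PreparedStatement.addBatch and send them together with executeBatch. The sketch below assumes the items are Strings and that a table called names with a name column exists; both, like the JdbcBatchWriter class itself, are made up for illustration. It deliberately does not commit, on the assumption that the chunk's transaction is managed by the batch runtime.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical writer that sends a whole chunk to the database as one JDBC batch.
class JdbcBatchWriter {
    private final Connection connection;

    JdbcBatchWriter(Connection connection) {
        this.connection = connection;
    }

    public void writeItems(List<Object> items) throws SQLException {
        try (PreparedStatement ps =
                 connection.prepareStatement("INSERT INTO names (name) VALUES (?)")) {
            for (Object item : items) {
                ps.setString(1, (String) item);
                ps.addBatch();  // queued on the client, not sent yet
            }
            ps.executeBatch();  // the whole chunk goes to the driver in one call
        }
    }
}
```

Whether executeBatch really turns into a single network round trip depends on the JDBC driver, but even when it doesn't, the writer code stays the same.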
Of course, you don't have to do that. Your writer can choose to just iterate over the List of objects and write them one at a time to wherever you are writing things. Do whatever is most convenient.
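The one-at-a-time approach can be as simple as a loop. LineWriter is a hypothetical class that writes each item on its own line to whatever java.io.Writer it was given; it only mirrors the writeItems method name and is not a real ItemWriter implementation.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

// Hypothetical writer that ignores any bulk API and handles items individually.
class LineWriter {
    private final Writer out;

    LineWriter(Writer out) {
        this.out = out;
    }

    public void writeItems(List<Object> items) throws IOException {
        for (Object item : items) {
            out.write(String.valueOf(item)); // one write per item
            out.write('\n');
        }
        out.flush(); // make the chunk's output visible before the checkpoint
    }
}
```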
We’ve waved our hands a lot around the idea of a checkpoint. Next time we’ll look at some different ways to decide when they should happen.