JSR-352 (Java Batch) Post #27: Writing results with an ItemWriter

By David Follis posted Wed January 30, 2019 09:48 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.
------

The ItemWriter has a lot in common with the ItemReader.  It has open and close methods that get control at the beginning and end of the step (and in some retry cases we’ll get to later).  It also has a checkpointInfo method that gets called at every checkpoint.

As with the ItemReader, the writer can return any serializable object as checkpoint data and that data will be provided to the open method when the writer needs to pick up where it left off after a failure.  If the writer is writing to a transactional resource, then the checkpoint data will be committed to the Job Repository along with whatever application data was written.  However, if the target of the writer is non-transactional (like a flat file) then updates to the file aren’t coordinated with the commit of the checkpoint data.  If that’s the case, on a restart you might have to compare the checkpoint data with what is actually in the file and possibly remove some updates to get things back in sync.
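
To make the flat-file case concrete, here is a minimal sketch of a writer that uses the file's byte offset as its checkpoint data.  On a restart, open truncates the file back to the last committed offset, which throws away anything written after that checkpoint.  This is just one way to do it, and it only works because the file is append-only.  SketchItemWriter is a local stand-in that copies the method signatures of javax.batch.api.chunk.ItemWriter so the example compiles on its own; FlatFileItemWriter and the one-line-per-item format are made up for illustration.

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Local stand-in with the same method signatures as
// javax.batch.api.chunk.ItemWriter, so the sketch needs no batch API jar.
interface SketchItemWriter {
    void open(Serializable checkpoint) throws Exception;
    void close() throws Exception;
    void writeItems(List<Object> items) throws Exception;
    Serializable checkpointInfo() throws Exception;
}

// Hypothetical flat-file writer: one line per item, checkpoint = byte offset.
class FlatFileItemWriter implements SketchItemWriter {
    private final Path target;
    private FileChannel channel;

    FlatFileItemWriter(Path target) {
        this.target = target;
    }

    @Override
    public void open(Serializable checkpoint) throws IOException {
        channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        // No checkpoint means a fresh start: begin with an empty file.
        long committed = (checkpoint == null) ? 0L : (Long) checkpoint;
        // Drop anything written after the last committed checkpoint.
        channel.truncate(committed);
        channel.position(committed);
    }

    @Override
    public void writeItems(List<Object> items) throws IOException {
        for (Object item : items) {
            ByteBuffer line = ByteBuffer.wrap(
                    (item + "\n").getBytes(StandardCharsets.UTF_8));
            while (line.hasRemaining()) {
                channel.write(line);
            }
        }
    }

    @Override
    public Serializable checkpointInfo() throws IOException {
        // Push the data to disk before reporting how far we got.
        channel.force(false);
        return Long.valueOf(channel.position());
    }

    @Override
    public void close() throws IOException {
        if (channel != null) {
            channel.close();
        }
    }
}
```

In a real step you would implement javax.batch.api.chunk.ItemWriter itself rather than the stand-in.  The point is only that the checkpoint data has to carry enough information for open to put the file back where the last commit left it.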

Of course, the main method of the ItemWriter is writeItems().  The ItemReader’s readItem method is called to read one item, but writeItems is plural, emphasizing that it writes more than one.  Its signature says that it gets a List of objects to write.  As the chunk progresses, with readItem and processItem being called again and again, the results returned from processItem (or readItem if there’s no ItemProcessor) pile up in memory.  When a checkpoint is reached, those result items are placed into a List and passed to the writeItems method.
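
Here is a rough model of that loop.  It is a simplification for illustration, not how any real batch runtime is implemented, and it assumes a checkpoint happens after a fixed number of items.  The names ChunkLoopSketch, ChunkWriter, and runStep are invented; an Iterator stands in for the ItemReader and a Function for the ItemProcessor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopSketch {

    // Hypothetical callback standing in for ItemWriter.writeItems(List<Object>).
    interface ChunkWriter {
        void writeItems(List<Object> items);
    }

    // Assumption: a checkpoint is taken every itemsPerCheckpoint items.
    static void runStep(Iterator<?> reader,
                        Function<Object, Object> processor,
                        ChunkWriter writer,
                        int itemsPerCheckpoint) {
        List<Object> buffer = new ArrayList<>();
        while (reader.hasNext()) {
            Object item = reader.next();            // stands in for readItem()
            Object result = processor.apply(item);  // stands in for processItem()
            buffer.add(result);                     // results pile up in memory
            if (buffer.size() == itemsPerCheckpoint) {
                writer.writeItems(buffer);          // one call per checkpoint
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {
            writer.writeItems(buffer);              // final, possibly short, chunk
        }
    }
}
```

With five input items and a checkpoint every two items, the writer in this model is called three times, with lists of two, two, and one item.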

Why do it this way?  The specification could have said to just call the ItemWriter for each item as it is processed.  Well, sometimes it can be more efficient to make updates in bulk.  Some databases support bulk or batch inserts that allow you to provide a whole set of updates and make a single call to the database to do them all at once.  This can be much more efficient, especially if there is some network latency or other overhead involved in getting to the database. 
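
As a sketch of what a bulk update might look like, here is a writeItems method that uses the standard JDBC batch calls (addBatch and executeBatch on a PreparedStatement).  The class name, the RESULTS table, and the PAYLOAD column are all hypothetical, and how the Connection is obtained and committed is left out.  Whether the whole batch really travels in one round trip depends on the JDBC driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class JdbcBatchWriterSketch {
    private final Connection connection;

    // Assumption: the caller supplies an open connection.
    JdbcBatchWriterSketch(Connection connection) {
        this.connection = connection;
    }

    // Same shape as ItemWriter.writeItems(List<Object>).
    public void writeItems(List<Object> items) throws SQLException {
        // Hypothetical table and column names.
        String sql = "INSERT INTO RESULTS (PAYLOAD) VALUES (?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            for (Object item : items) {
                ps.setString(1, String.valueOf(item));
                ps.addBatch();      // queue the row instead of sending it now
            }
            ps.executeBatch();      // submit all queued rows together
        }
    }
}
```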

Of course, you don’t have to do that.  Your writer can choose to just iterate over the List of objects and write them one at a time to wherever you are writing things.  Whatever is most convenient.

We’ve waved our hands a lot around the idea of a checkpoint.  Next time we’ll look at some different ways to decide when they should happen.

 
