JSR-352 (Java Batch) Post #103: Creating an Aggregate Reader

By David Follis posted Wed August 12, 2020 08:24 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

The specification allows a chunk step to have a Reader, a Processor, and a Writer.  But what if your application needs to read from more than one source?  Suppose you have a table containing client information (name, address, email, etc.) and another table containing loyalty-award-point information.  Both tables use a customer ID as the primary key.  You want to write a chunk step that iterates over all the clients, pulling out the customer information and award-point information so that the processor can create customized email text to be sent out by the writer.


Of course, you could just have the reader get the client information record and have the processor fetch the matching award-point information, but let’s say we want to do it all in the reader.

You might already have ItemReader implementations that can read from both tables.  What you really want to do is put two readers in the JSL for this step, but the spec doesn’t allow that.  What to do?

The idea is to create a sort of generic aggregate reader that calls the other two readers and aggregates the results into a combined object to be passed to the Processor.  That sounds simple but turns out to be rather complicated.

First of all, the aggregate reader will get all of the properties that would normally be injected into both readers.  You’ll need to sort out some syntax for the properties so the aggregate can tell which ones go to which reader.  Injection won’t work for the readers called by the aggregate, so you need some interface into the existing readers to pass along the values.  A Properties object would work well.
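
One possible property syntax is a simple name prefix per reader.  The sketch below assumes the aggregate has already gathered its injected values into a single Properties object, and that names like "client.table" and "points.table" mark which reader a value belongs to.  The class name, the prefixes, and the property names are all made up for illustration; nothing in the specification defines them.

```java
import java.util.Properties;

// Hypothetical helper: splits the aggregate reader's properties by name prefix.
final class ReaderPropertyRouter {
    private ReaderPropertyRouter() {
    }

    /**
     * Returns only the properties whose names start with prefix + ".",
     * with that prefix stripped, e.g. "client.table" becomes "table".
     */
    static Properties forReader(Properties all, String prefix) {
        String marker = prefix + ".";
        Properties result = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(marker)) {
                result.setProperty(name.substring(marker.length()), all.getProperty(name));
            }
        }
        return result;
    }
}
```

The aggregate could then hand forReader(all, "client") to one reader and forReader(all, "points") to the other, through whatever setter or constructor you add to the existing readers.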

The next thing to consider is checkpoint information.  Each reader will have its own.  The aggregate will need to merge them into a single serializable object and, on a restart, separate that object back out so it can pass the right information to each reader’s open processing.
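
A minimal sketch of that merged checkpoint might look like the following.  AggregateCheckpoint and its accessor names are hypothetical; the only things taken from the batch API are that checkpoint data is a Serializable and that a reader's open method is handed null when there is no earlier checkpoint.

```java
import java.io.Serializable;

// Hypothetical wrapper: one serializable object holding both readers' checkpoints.
final class AggregateCheckpoint implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Serializable clientCheckpoint;
    private final Serializable pointsCheckpoint;

    AggregateCheckpoint(Serializable clientCheckpoint, Serializable pointsCheckpoint) {
        this.clientCheckpoint = clientCheckpoint;
        this.pointsCheckpoint = pointsCheckpoint;
    }

    // On a first start there is no checkpoint, so treat null as
    // "both readers start from the beginning".
    static AggregateCheckpoint from(Serializable restored) {
        if (restored == null) {
            return new AggregateCheckpoint(null, null);
        }
        return (AggregateCheckpoint) restored;
    }

    Serializable clientCheckpoint() {
        return clientCheckpoint;
    }

    Serializable pointsCheckpoint() {
        return pointsCheckpoint;
    }
}
```

The aggregate's checkpointInfo method would build one of these from the two readers' own checkpointInfo results, and its open method would call from(...) and pass each half to the matching reader's open.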

The results of the two items read will need to be merged into a joint object to be passed to the processor, but that’s just the usual arrangement to work out between a reader and processor.

Finally(?) there are error cases to consider.  What if one reader fails for some reason?  What if, for some reason, one table runs out of rows before the other one?  Maybe a customer has no award points. 
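
To make those choices concrete, here is a rough sketch of the aggregate's read logic with one possible policy.  Everything in it is an assumption for illustration: ItemSource is a stand-in for the single read method the sketch needs (so it compiles without the batch API), CombinedItem and PairingReader are invented names, and the two sources are paired by position rather than by customer ID.  Matching on the key would require the aggregate to know something about the rows it is reading.

```java
// Stand-in for the one reader method this sketch needs.
interface ItemSource {
    Object readItem() throws Exception;
}

// Hypothetical combined object handed to the processor.
final class CombinedItem {
    final Object client;
    final Object points; // null when no award-point row was available

    CombinedItem(Object client, Object points) {
        this.client = client;
        this.points = points;
    }
}

// Hypothetical aggregate read logic; pairs rows by position, not by customer ID.
final class PairingReader {
    private final ItemSource clients;
    private final ItemSource points;
    private final boolean allowMissingPoints;

    PairingReader(ItemSource clients, ItemSource points, boolean allowMissingPoints) {
        this.clients = clients;
        this.points = points;
        this.allowMissingPoints = allowMissingPoints;
    }

    // An exception thrown by either source simply propagates to the caller.
    Object readItem() throws Exception {
        Object client = clients.readItem();
        if (client == null) {
            return null; // the client source is exhausted, so there is nothing left to read
        }
        Object award = points.readItem();
        if (award == null && !allowMissingPoints) {
            throw new IllegalStateException("award-point source ended before the client source");
        }
        return new CombinedItem(client, award);
    }
}
```

With allowMissingPoints set to true, the processor has to cope with a null award-point value; with it set to false, a short award-point source fails the read instead.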

As you can see, this is an interesting idea but tricky to implement.  It would be especially difficult to do as some sort of generic aggregation reader with no specific knowledge of the existing reader implementations it is calling.