Jakarta Batch Post 124: Multiple Readers

By David Follis posted Wed February 10, 2021 08:30 AM

  
This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.
-----

The issue can be found here.

We talked about this idea a few weeks ago (post #103).  The original specification allows exactly one reader to be defined for a chunk step, but what if you need to read data from multiple sources and merge it together (or something) in the processor?  Sure, you could build a single ItemReader that just reads from both data sources, but you might already have ItemReaders you regularly use that read from each source, and it seems like you shouldn’t have to write a third one just to use them together.  Maybe.
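To make the "write a third one" workaround concrete, here is a minimal sketch of a reader that wraps two existing readers and hands the processor a pair. Everything in it is an assumption for illustration: `SimpleReader` is a local stand-in for the batch ItemReader contract (so the example compiles on its own), and `ListReader` and `PairReader` are hypothetical names, not anything from the specification.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the batch ItemReader contract, declared locally so
// the sketch compiles without the batch API. The real interface also has
// open, close and checkpointInfo, and its methods may throw Exception.
interface SimpleReader {
    Object readItem(); // null means "no more data"
}

// Toy reader over an in-memory list, playing the part of a reader you already own.
class ListReader implements SimpleReader {
    private final Iterator<?> items;

    ListReader(List<?> source) {
        this.items = source.iterator();
    }

    @Override
    public Object readItem() {
        return items.hasNext() ? items.next() : null;
    }
}

// The "third reader": asks each delegate for one item and returns the pair.
class PairReader implements SimpleReader {
    private final SimpleReader left;
    private final SimpleReader right;

    PairReader(SimpleReader left, SimpleReader right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public Object readItem() {
        Object a = left.readItem();
        Object b = right.readItem();
        // Assumption: stop as soon as either source runs dry.
        if (a == null || b == null) {
            return null;
        }
        return new Object[] { a, b };
    }
}
```

Even this toy version has to pick an answer to the "when are we done?" question, and a real one would also have to forward open, close and checkpoint calls to both delegates. That is the boilerplate a multiple-reader feature would presumably take off your hands.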

What would this look like?  Would you just have two reader elements inside your chunk element?  Presumably you could have more than two if we allow more than one.  Is there an upper limit?

How would you know when the chunk step is complete?  Would it end when any one of the N readers returned null, or would all the readers need to run out of data?  Maybe that should be an option for the chunk definition?  Could it be a reasonable thing for a reader to just ‘pass’ on having an item this time around?
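The two completion rules are easy to compare in a small sketch. This is only an illustration of the question, not a proposal: `EndPolicy`, `MultiRead`, `readRound` and `over` are made-up names, and readers are modeled as plain `Supplier`s that return null when they are out of data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical option for the chunk definition.
enum EndPolicy { ANY_EXHAUSTED, ALL_EXHAUSTED }

class MultiRead {
    // Reads one item from every reader. Returns null when the step should end;
    // otherwise a list with one slot per reader (a null slot means that reader
    // had nothing this round).
    static List<Object> readRound(List<Supplier<Object>> readers, EndPolicy policy) {
        if (readers.isEmpty()) {
            return null; // nothing to read from
        }
        List<Object> round = new ArrayList<>();
        int empty = 0;
        for (Supplier<Object> reader : readers) {
            Object item = reader.get();
            if (item == null) {
                empty++;
            }
            round.add(item);
        }
        boolean done = (policy == EndPolicy.ANY_EXHAUSTED)
                ? empty > 0
                : empty == readers.size();
        return done ? null : round;
    }

    // Helper for the sketch: a reader over fixed items that returns null when drained.
    static Supplier<Object> over(Object... items) {
        Iterator<Object> it = Arrays.asList(items).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Notice that as long as null means "out of data", a reader has no way to "pass" for one round and come back later. Supporting that would need some separate marker, which is one more detail to settle.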

What about listeners?  Would a read listener get control around each ItemReader or would we need some way to define which listener(s) go with which readers?  Or at least a way to indicate to a listener which reader’s results it was being passed?  Or would a new read listener get control after all the readers were done and get their collected results in a list (like the ItemWriter)? 
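Two of those listener options can be sketched side by side. Neither interface below exists in the batch API; `TaggedReadListener`, `CollectedReadListener` and `RoundNotifier` are hypothetical names used only to show what each shape might feel like to implement.

```java
import java.util.List;

// Hypothetical option 1: called once per reader, told which reader produced the item.
interface TaggedReadListener {
    void afterRead(String readerId, Object item);
}

// Hypothetical option 2: called once per round with every reader's result,
// much like the writer receives a list of items.
interface CollectedReadListener {
    void afterReadAll(List<Object> items);
}

class RoundNotifier {
    // Delivers one round of results to both listener styles.
    // readerIds and items are parallel lists: items.get(i) came from readerIds.get(i).
    static void notifyRound(List<String> readerIds, List<Object> items,
                            TaggedReadListener tagged, CollectedReadListener collected) {
        for (int i = 0; i < readerIds.size(); i++) {
            tagged.afterRead(readerIds.get(i), items.get(i));
        }
        collected.afterReadAll(items);
    }
}
```

The tagged style keeps existing listener logic close to what it is today, at the cost of a new parameter. The collected style is simpler to call but leaves the listener to work out which slot came from which reader.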

How does skip handling work?  Does the skip limit apply to each reader or collectively to all of them?
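The difference between the two interpretations shows up quickly in a small sketch. `SkipTracker` is a hypothetical helper, not how any implementation actually tracks skips; it assumes the step should fail once the number of skips goes past the limit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper comparing a per-reader skip limit with a collective one.
class SkipTracker {
    private final int limit;
    private final boolean perReader;
    private final Map<String, Integer> skipsByReader = new HashMap<>();
    private int totalSkips;

    SkipTracker(int limit, boolean perReader) {
        this.limit = limit;
        this.perReader = perReader;
    }

    // Records one skipped item for the given reader.
    // Returns false once the applicable count has gone past the limit.
    boolean recordSkip(String readerId) {
        int forReader = skipsByReader.merge(readerId, 1, Integer::sum);
        totalSkips++;
        int relevant = perReader ? forReader : totalSkips;
        return relevant <= limit;
    }
}
```

With a limit of 2 and two readers that each skip twice, the per-reader interpretation lets the step carry on while the collective one fails it on the third skip. Same job definition, quite different behavior.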

What about metrics?  Right now we count how many items were read.  Would you need separate counts for each reader or sum the counts together? 
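One way to avoid choosing is to keep per-reader counts and derive the sum from them. The sketch below assumes each reader has some identifying name; `ReadCounts` and its methods are hypothetical, not part of the metrics the specification defines today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bookkeeping: one read count per reader, with the overall total derived.
class ReadCounts {
    private final Map<String, Long> perReader = new LinkedHashMap<>();

    // Call once for every item a reader returns.
    void itemRead(String readerId) {
        perReader.merge(readerId, 1L, Long::sum);
    }

    // Count for a single reader; zero if it has not returned anything yet.
    long countFor(String readerId) {
        return perReader.getOrDefault(readerId, 0L);
    }

    // Sum across all readers, comparable to a single overall read count.
    long total() {
        long sum = 0;
        for (long count : perReader.values()) {
            sum += count;
        }
        return sum;
    }
}
```

Whether the summed number is actually meaningful depends on the completion rule: if readers can run out at different times, "items read" and "rounds processed" stop being the same thing.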

The properties problem we talked about in an earlier post is solved here by each reader having its own set of properties.

This is clearly a pretty cool thing to be able to do, and it seems like a pretty common problem (multiple input data sources) to have.  But as you can see there are a lot of details that need to be sorted out (and I’m sure I missed some). 

If we allow multiple readers, should we also consider multiple processors?  Writers?

As always, please feel free to join in the discussion at the link above.