
Jakarta Batch Post 147: Extras – NoSQL databases

By David Follis posted Thu July 22, 2021 08:08 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

What about readers and writers that interact with NoSQL databases (or non-relational databases, or not-only-SQL databases, opinions seem to vary)?  Here we’re looking at things like MongoDB and Apache Cassandra (there are certainly others).  JBeret offers readers and writers for these as well.

I don’t want to get into a discussion about database preferences and the pros and cons of relational databases vs non-relational.  Writers of batch applications frequently don’t get any choice.  The data is already wherever it is.  What you’re looking for is a way to get your batch application to read it (or write into it).  Maybe your batch job is just a converter from one datasource to another.  A lot of batch jobs read flat files in various forms and insert records into databases. 

Looking over the code, the MongoDB and Cassandra implementations are quite different.  That makes sense, as Cassandra and MongoDB are quite different.  For the reader with MongoDB you specify criteria for the ‘rows’ you want, and with Cassandra, well, there are CQL statements.  But I thought it was interesting that both readers have a ‘batch size’ parameter you can inject.

The batch size controls how much the reader pre-reads to get ahead.  The results are stored locally to avoid trips back and forth to the database.  On the one hand, you want to read ahead a little bit to avoid making a trip to the database for every row.  On the other hand, you don’t want to store a mountain of data locally.  Some of the readers we’ve looked at earlier appeared to pre-read the entire set of results into memory and then just iterate over it.  The batch size configuration lets you find a good middle ground between making too many trips across the network and needing a giant heap.
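To make that trade-off concrete, here is a minimal sketch of a read-ahead buffer in plain Java.  This is not the JBeret code.  The PageSource interface, the ReadAheadReader class, and the in-memory "database" are all made up for illustration, and a real reader would be talking to a database driver instead.  The only thing borrowed from the batch world is the convention that readItem returns null when the data runs out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a database query that returns one page of results. */
interface PageSource<T> {
    // Returns up to 'limit' items starting at 'offset'; an empty list means no more data.
    List<T> fetch(int offset, int limit);
}

/** Pre-reads 'batchSize' items per trip and hands them out one at a time. */
class ReadAheadReader<T> {
    private final PageSource<T> source;
    private final int batchSize;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int nextOffset = 0;
    private boolean exhausted = false;
    private int roundTrips = 0;

    ReadAheadReader(PageSource<T> source, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    /** Returns the next item, or null at end of data. */
    T readItem() {
        if (buffer.isEmpty() && !exhausted) {
            // Local buffer is empty, so make one trip and pull a whole batch.
            List<T> page = source.fetch(nextOffset, batchSize);
            roundTrips++;
            nextOffset += page.size();
            if (page.size() < batchSize) {
                exhausted = true; // a short page means the source has nothing more
            }
            buffer.addAll(page);
        }
        return buffer.poll(); // null when the buffer is empty
    }

    int roundTrips() {
        return roundTrips;
    }
}

class ReadAheadDemo {
    // Simulated data source holding the integers 0..total-1.
    static PageSource<Integer> inMemorySource(int total) {
        return (offset, limit) -> {
            List<Integer> page = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + limit, total); i++) {
                page.add(i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 10, 100}) {
            ReadAheadReader<Integer> reader =
                    new ReadAheadReader<>(inMemorySource(100), batchSize);
            int count = 0;
            while (reader.readItem() != null) {
                count++;
            }
            System.out.println("batchSize=" + batchSize
                    + " items=" + count
                    + " roundTrips=" + reader.roundTrips());
        }
    }
}
```

Running the demo with different batch sizes shows the number of trips dropping as the batch size goes up, while the buffer never holds more than one batch at a time.  That is the middle ground the parameter is meant to let you tune.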

This is going to wrap up our review of readers and writers that interact with files or databases.  The next couple of weeks will look at interacting with other sources of data.