Jakarta Batch Post 135: Jakarta Batch Extras – Reading CSV Files

By David Follis posted Thu April 29, 2021 08:19 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through the RSS feed.

As we discussed earlier, you can use the FlatFileItemReader to read a file and get control to parse each record apart.  If you are reading a CSV (Comma Separated Values) file, you could extend what the flat file reader does to handle it.  In fact, there are several existing parsing libraries you could exploit instead of writing a parser yourself.

And that’s what both the JBeret and BatchEE folks did.  JBeret offers a JacksonCsvItemReader and also a CsvItemReader.  BatchEE has a JSefaCsvReader.  Now before I peek inside, I just want to remind you that there’s nothing in those that is specific to JBeret or BatchEE (at least that I spotted).  You should be able to use them pretty easily with any JSR-352 implementation.  So if you tell your team lead that you’re planning to use a JBeret item reader in your batch application, don’t let them tell you no just because you’re not running on JBeret.

Ok, so digging around inside all three of these reveals that they all pretty much work the same way, which you’d expect.  Like the FlatFileItemReader, they all build on top of base classes that implement some of the drudgery of managing your position in the file for restarts and that sort of thing. 
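To make that "drudgery" a bit more concrete, here is a minimal sketch of position tracking for restart.  It is not code from JBeret or BatchEE.  The class name RestartableLineReader is made up, it reads from a String instead of a real file so it stands alone, and it only mirrors the open/readItem/checkpointInfo/close shape of a JSR-352 ItemReader rather than implementing the actual interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.StringReader;

// Hypothetical sketch: same method shape as a JSR-352 ItemReader,
// but with no dependency on the batch API.
class RestartableLineReader {
    private final String content;  // stands in for the file contents (assumption)
    private BufferedReader reader;
    private int linesRead;         // the position saved at each checkpoint

    RestartableLineReader(String content) {
        this.content = content;
    }

    // On a fresh start the checkpoint is null; on a restart it holds
    // whatever checkpointInfo() returned at the last committed checkpoint.
    void open(Serializable checkpoint) throws IOException {
        reader = new BufferedReader(new StringReader(content));
        linesRead = 0;
        int toSkip = (checkpoint == null) ? 0 : (Integer) checkpoint;
        // Skip the records that were already processed before the restart.
        while (linesRead < toSkip && reader.readLine() != null) {
            linesRead++;
        }
    }

    // Returns the next record, or null when the input is exhausted.
    String readItem() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            linesRead++;
        }
        return line;
    }

    // Called at each checkpoint so the current position can be persisted.
    Serializable checkpointInfo() {
        return Integer.valueOf(linesRead);
    }

    void close() throws IOException {
        if (reader != null) {
            reader.close();
        }
    }
}
```

A real reader would open the actual file and might remember a byte offset rather than re-reading skipped lines, but the idea is the same: hand back your position at each checkpoint and honor it when open is called on a restart.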

And, ultimately, they all read and parse records out of the file.  The big differences are essentially in which framework they use to do the parsing. 
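Just to show what "parsing" involves here, below is a hand-rolled sketch that splits a single CSV record into fields.  It is not how SuperCSV, Jackson, or JSefa do it.  The class and method names are made up, and it assumes comma delimiters, double-quote quoting, and a doubled quote as the escape for a quote inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting one CSV record into fields.
class CsvLineSketch {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');  // doubled quote is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;   // closing quote
                    }
                } else {
                    field.append(c);        // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;            // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);         // start the next field
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());       // last field has no trailing comma
        return fields;
    }
}
```

Note that a plain String.split(",") would break a quoted field like "Smith, Jane" in two, and even this sketch does not handle a quoted field that spans multiple lines.  Edge cases like those are a good reason to lean on an existing parser.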

The CsvItemReader uses SuperCSV, which you can find easily with a quick internet search.  How ‘super’ is it?  Beats me, but it sure looks pretty easy to use.  And the reader code has a bunch of injected values you can set to customize how it behaves.

Staying with JBeret, there’s also the JacksonCsvItemReader which, as you might expect, uses the Jackson parser.  Information on that is also easy to find.  Jackson is, I think, better known as a JSON parser (and BatchEE has a Jackson JSON Item Reader), but it supports other formats including CSV.

And finally, there’s the BatchEE JSefaCsvReader.  No surprise, it uses the JSefa parser (Java Simple Exchange Format API) to handle reading the CSV file.  JSefa also handles a bunch of other formats, and you can read more about that easily enough.

So which to use?  Well, there are capability differences and I’m sure some benchmarking would show some performance differences that probably vary depending on your data.  The main thing is that the heavy lifting has been done for you already.  Give ‘em all a try and see which one works best for you. 

By the way, I’m not providing links to any of this stuff because I don’t want to suggest I’m endorsing anything.  I’m trying to remain neutral.  And it is all very easy to find anyway.