
Jakarta Batch Post 133: Apache BatchEE – FlatFileItemReader

By David Follis posted 22 days ago

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

Lots of batch jobs read records from a plain text flat file.  There might be some structure to the records using a separator like a comma or a space or some other character.  Or the file might hark back to the punch card days and be organized by column.  Or perhaps some fields are variable length and preceded by a length.  All these possibilities made me wonder how the BatchEE folks could produce a common flat file ItemReader.

Well, what they realized is that no matter what the structure of the record, there are some common things you need to do.  The first of these is just keeping track of what record you are on.  To do this they actually extend another BatchEE class called CountedReader.  The CountedReader tracks the record being processed and provides the count as checkpoint information.  On a restart it uses that checkpoint information to position itself in the flat file at the next record to process.  This is a common function you’d have to provide no matter what kind of flat file you’re processing, so why not have a standard implementation to extend?
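To make the counting idea concrete, here is a minimal sketch of the pattern.  It is not BatchEE's source: CountingReaderSketch and ListBackedReader are hypothetical names, the checkpoint is a plain Long rather than the Serializable that the Jakarta Batch ItemReader interface uses, and the "file" is just a list of lines held in memory.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the counting pattern; not the real CountedReader.
abstract class CountingReaderSketch {
    private long count = 0;

    // The extending class supplies the actual read; null means end of input.
    protected abstract Object doRead();

    // A null checkpoint means a fresh start. Otherwise, re-read and discard
    // the records that were already processed before the restart.
    public void open(Long checkpoint) {
        long alreadyRead = (checkpoint == null) ? 0 : checkpoint;
        for (long i = 0; i < alreadyRead; i++) {
            readItem();
        }
    }

    public Object readItem() {
        Object item = doRead();
        if (item != null) {
            count++;
        }
        return item;
    }

    // The number of records read so far is all a restart needs to know.
    public Long checkpointInfo() {
        return count;
    }
}

// Toy subclass for illustration: reads from memory instead of a file.
class ListBackedReader extends CountingReaderSketch {
    private final Iterator<String> lines;

    ListBackedReader(String... lines) {
        this.lines = Arrays.asList(lines).iterator();
    }

    @Override
    protected Object doRead() {
        return lines.hasNext() ? lines.next() : null;
    }
}
```

If a first run reads two of three records and checkpoints, a second reader opened with that checkpoint skips those two and hands back the third.  Skipping by re-reading is the simplest approach for a sketch like this; whether the real class repositions itself the same way is something to check in the BatchEE source.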

But the CountedReader doesn’t actually open the file or read records.  It assumes that will be handled by some extending class that also implements a doRead method to actually read the records.  Our FlatFileItemReader does just this.  And that is where the injected properties come in that tell us what file to open.

FlatFileItemReader also allows you to specify, via an injected property, a character in column one that marks a comment row.  Any record read that begins with this character will just be skipped.  The default is the pound symbol ‘#’ (aka hash, number, or octothorp; yeah, really, that was a new one for me).
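A doRead that skips comment rows might look roughly like the sketch below.  The class name CommentSkippingReaderSketch and its over helper are made up for illustration, and the input is a String so the example runs on its own; a real reader would open the file named by its injected property instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch of comment skipping; not the BatchEE implementation.
class CommentSkippingReaderSketch {
    private final BufferedReader in;
    private final String commentPrefix;

    CommentSkippingReaderSketch(BufferedReader in, String commentPrefix) {
        this.in = in;
        this.commentPrefix = commentPrefix;
    }

    // Convenience for the example: wrap in-memory text, using '#' for comments.
    static CommentSkippingReaderSketch over(String text) {
        return new CommentSkippingReaderSketch(
                new BufferedReader(new StringReader(text)), "#");
    }

    // Returns the next record that is not a comment, or null at end of input.
    String doRead() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.startsWith(commentPrefix)) {
                    return line;
                }
                // The line starts with the comment prefix: drop it and keep reading.
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Given the text "# header", "alpha", "#skip", "beta" on separate lines, successive calls to doRead return "alpha", then "beta", then null.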

All this gets us the basics of managing and reading through a flat file as part of a Jakarta Batch job, but what about the format of the file?  By default the FlatFileItemReader will just return whatever it reads as a String.  That’s probably fine if your processor can deal with it that way. 

If you need to do some parsing of the record, you can implement your own LineMapper (another BatchEE interface) with a map method and the FlatFileItemReader will call it after reading each record.  The map method gets passed the String that is the record read, along with the line number in case that helps somehow.  Your map method can then parse away and return any Object it likes for the processor to handle.
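Here is a sketch of what such a mapper could look like for a simple comma-separated record.  To keep it self-contained it declares its own LineMapperSketch interface, shaped after the description above (a String plus a line number in, an object out); the actual BatchEE interface may differ in package, generics, or exact signature.  CustomerRecord and CsvCustomerMapper are hypothetical names, and the two-field layout is an assumption for the example.

```java
// Local stand-in shaped like the LineMapper described above; an assumption,
// not the BatchEE interface itself.
interface LineMapperSketch<T> {
    T map(String line, long lineNumber);
}

// Hypothetical record type handed to the processor.
final class CustomerRecord {
    final String name;
    final int age;

    CustomerRecord(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Parses lines of the form "name,age".
class CsvCustomerMapper implements LineMapperSketch<CustomerRecord> {
    @Override
    public CustomerRecord map(String line, long lineNumber) {
        // A limit of -1 keeps trailing empty fields so the count check is honest.
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": expected 2 fields but found " + fields.length);
        }
        try {
            return new CustomerRecord(fields[0].trim(), Integer.parseInt(fields[1].trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": age is not a number", e);
        }
    }
}
```

This is one place the line number earns its keep: putting it in the exception message tells you exactly which record in the file was bad.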

Which means that all you really need to do to read a flat file in any format is write the bit of code that pulls the record apart.  The BatchEE code handles all the boring stuff for you.