This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post. The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
-----
There are lots of different ways to process the various flat-file formats, and the JBeret and BatchEE folks seem to have created readers and writers for all of them. This week I’ll take a quick look at the ones that use BeanIO.
So what is it? BeanIO (according to the web site) is “an open source Java framework for marshalling and unmarshalling Java beans from a flat file, stream, or simple String object.” Well, that sounds a lot like some other things we’ve seen here. From poking around in their docs, it sounds like you can either create an XML document that describes the structure of the file you’re reading, or just annotate the Java class you want the data read into, and BeanIO mostly takes care of the rest.
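To make the “annotate the class and it takes care of the rest” idea a bit more concrete, here is a minimal sketch using only the JDK. To be clear, this is not BeanIO’s API: `@Column`, `Trade`, and `LineMapper` are names I made up for illustration, and a real framework handles far more (types, formats, multiple record layouts, errors). It just shows how annotations on a bean can act as the description of a delimited record.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

// Hypothetical annotation (not from BeanIO): marks where a field sits in the record.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    int at(); // zero-based position within the comma-delimited line
}

// The bean we want each line of the file read into.
class Trade {
    @Column(at = 0) String symbol;
    @Column(at = 1) int quantity;
    @Column(at = 2) double price;
}

// Hypothetical helper: builds a bean from one line, driven only by the annotations.
class LineMapper {
    static <T> T unmarshal(String line, Class<T> type) {
        try {
            String[] parts = line.split(",", -1);
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T bean = ctor.newInstance();
            for (Field f : type.getDeclaredFields()) {
                Column c = f.getAnnotation(Column.class);
                if (c == null) {
                    continue; // field is not part of the record layout
                }
                f.setAccessible(true);
                String raw = parts[c.at()].trim();
                if (f.getType() == int.class) {
                    f.setInt(bean, Integer.parseInt(raw));
                } else if (f.getType() == double.class) {
                    f.setDouble(bean, Double.parseDouble(raw));
                } else {
                    f.set(bean, raw); // this sketch treats everything else as a String
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not map line: " + line, e);
        }
    }
}

class MappingDemo {
    public static void main(String[] args) {
        Trade t = LineMapper.unmarshal("IBM,100,142.5", Trade.class);
        System.out.println(t.symbol + " " + t.quantity + " " + t.price); // prints "IBM 100 142.5"
    }
}
```

The point of the sketch is that the bean class itself carries the file layout, so the reading code never needs to know which column is which.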
That means the reader and writer take care of the usual sorts of things. First off is the management of position within the file and handling checkpoint data received on a retry or restart. It also handles all the interaction with BeanIO. Of course there are things you can give it, through injected parameters from the JSL, to configure BeanIO so it knows what you want.
All of that gets passed to BeanIO as part of the setup (open), and then the reader just slogs its way through the file, letting BeanIO do all the parsing and whatnot. It feels like you could probably use it as-is, just by setting up the right configuration for your file format.
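The position tracking and checkpoint handling described above can be sketched in a few lines. This is my own simplified illustration, not the JBeret or BatchEE source: `CheckpointingLineReader` is a hypothetical class, it reads from an in-memory String instead of a file, and it does not actually implement the JSR-352 `ItemReader` interface, although the method names (`open`, `readItem`, `checkpointInfo`, `close`) mirror that interface. I’m also assuming the checkpoint is simply a count of records already read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical reader that remembers how far it got, so a restart can pick up there.
class CheckpointingLineReader {
    private final String content; // stands in for the real input file
    private BufferedReader in;
    private int recordsRead;      // position: how many records have been handed out

    CheckpointingLineReader(String content) {
        this.content = content;
    }

    // A null checkpoint means a fresh run; otherwise skip what was already processed.
    void open(Serializable checkpoint) {
        in = new BufferedReader(new StringReader(content));
        recordsRead = 0;
        int resumeAt = (checkpoint == null) ? 0 : (Integer) checkpoint;
        while (recordsRead < resumeAt && readItem() != null) {
            // discard records that were committed before the restart
        }
    }

    // Returns the next record, or null at end of input.
    String readItem() {
        try {
            String line = in.readLine();
            if (line != null) {
                recordsRead++; // a real reader would hand the line to its parser here
            }
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What the batch runtime would save at each checkpoint.
    Serializable checkpointInfo() {
        return Integer.valueOf(recordsRead);
    }

    void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RestartDemo {
    public static void main(String[] args) {
        String file = "a\nb\nc\nd";

        CheckpointingLineReader first = new CheckpointingLineReader(file);
        first.open(null);
        first.readItem();
        first.readItem();
        Serializable saved = first.checkpointInfo(); // two records processed
        first.close();

        // Simulated restart: a new reader instance opened with the saved checkpoint.
        CheckpointingLineReader second = new CheckpointingLineReader(file);
        second.open(saved);
        System.out.println(second.readItem()); // prints "c"
        second.close();
    }
}
```

Skipping forward by re-reading is the simplest approach. A reader could also save a byte offset and seek straight to it, at the cost of a little more bookkeeping.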
Well, by now you’re probably wondering if you should use this or one of the other reader/writer implementations we’ve looked at. Good question. As I’ve said before, there are probably some performance differences among them that will no doubt vary with your actual data and structure, so do some testing. There are also a lot of subtle differences in the kinds of things they deal with. If you’ve got nice, normal-looking data with no quirky syntax, then you have more choices, but I suspect there are some odd cases that some of these handle better than others. On the other hand, more configuration options generally slow things down, so simple is sometimes better…and easier to maintain.
I tend to look at all these reader/writer implementations from a different perspective. It isn’t so much “I have a flat file in some format and I need to read/write it.” I think it is more “I was told to use this tool to process our file in this format, and I need a reader/writer to do that,” and JBeret and BatchEE provide a lot of them. Or at least a pretty good starting point.
If you are coming in cold and need to figure out which way is best to read/write your particular file format, you have a lot of options, and you’re going to need to experiment a bit.