This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post.
The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
Ah, batch that involves messaging. We talked about this months ago with the sample pipeline job I wrote. In that case I had a split flow with the writer of one flow feeding the reader of another flow through a message queue. There’s a lot you can do with this.
Well, first I should go through what JBeret and BatchEE provide, since that’s what brought up the topic. There are readers and writers that use JMS, as well as ones that work with Apache Kafka, Apache Artemis, and Apache Camel. You can go look those up, but they are all essentially messaging based (argue in the comments, of course, but I’m at the waving-my-hands-around level).
Messaging writers are pretty cool. Basically, you’re converting whatever you read (flat file, database rows, etc.) into a message to be processed by something else. The job step is essentially a protocol converter. It can also be a way to get some concurrency: rather than have the job process record by record, convert the records into messages to be processed as message-driven beans in a multi-threaded application server.
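The protocol-converter idea can be sketched as a `writeItems()` loop that turns each chunk item into a message. This is a minimal stand-in sketch, not any particular JBeret or BatchEE class: a plain in-memory queue plays the role of the JMS destination, and the class and method names are made up for illustration. With real JMS you’d send each payload with something like `jmsContext.createProducer().send(queue, payload)` inside the same loop.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Sketch of a messaging item writer: each item from the chunk becomes
// a message. The in-memory queue stands in for a real JMS destination.
public class MessageItemWriter {
    private final Queue<String> outbound = new ArrayDeque<>();

    // Mirrors the shape of javax.batch.api.chunk.ItemWriter#writeItems
    public void writeItems(List<?> items) {
        for (Object item : items) {
            // Serialize however suits the payload; toString() is a placeholder.
            outbound.add(item.toString());
        }
    }

    public static void main(String[] args) {
        MessageItemWriter writer = new MessageItemWriter();
        writer.writeItems(List.of("row-1", "row-2", "row-3"));
        System.out.println("queued " + writer.outbound.size()); // prints "queued 3"
    }
}
```

The batch runtime hands `writeItems()` a whole chunk at a time, so one send per item falls out naturally.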
Messaging readers raise an interesting question: when is it done? It is pretty easy to dream up a job that just reads messages from a queue forever, pushing them through the processor as they turn up. But what you’ve really done is create a server, not a batch job. Do what you want, of course, but I think this sort of thing should be run periodically until the queue dries up and then terminate. So every day (or every hour, etc.) the job runs and handles all the messages that have turned up since the last time it ran. Kind of a “your update will be processed within the next 24 hours” kind of thing.
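The “drain the queue, then terminate” policy hangs on one detail of the chunk contract: `readItem()` returning null is what tells the batch runtime the step is finished. A minimal sketch of that policy, with an in-memory queue standing in for JMS (with a real consumer you’d call `consumer.receive(timeoutMillis)` and treat a null return the same way):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the "drain and stop" reader: return the next message, or
// null when the backlog is empty, which ends the step. The in-memory
// queue stands in for a JMS consumer.
public class DrainingReader {
    private final Queue<String> backlog;

    public DrainingReader(Queue<String> backlog) {
        this.backlog = backlog;
    }

    // Mirrors the shape of javax.batch.api.chunk.ItemReader#readItem:
    // returning null signals the end of the step.
    public String readItem() {
        return backlog.poll(); // null once the backlog is drained
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("msg-1");
        queue.add("msg-2");

        DrainingReader reader = new DrainingReader(queue);
        int processed = 0;
        while (reader.readItem() != null) {
            processed++; // stand-in for the processor/writer doing real work
        }
        System.out.println("processed " + processed); // prints "processed 2"
    }
}
```

With a real JMS consumer the receive timeout is the knob: a short timeout means the job ends promptly once the backlog since the last run is handled.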
As you might guess, my favorite use for messaging-based batch jobs is pipelining. Multiple jobs, maybe running concurrently, or multiple flows in a split in a single job, with data flowing between the jobs/flows. One flow does some processing and feeds results through a message queue to the next flow, which does a bit more and passes it along.
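In JSL terms, the single-job version of this is a split whose flows run concurrently, wired together by the messaging reader and writer artifacts. This is a hypothetical sketch: the step, flow, and artifact ref names are all made up for illustration, and the queue wiring lives in the (not shown) reader/writer implementations, not in the JSL itself.

```xml
<!-- Hypothetical pipeline job: "producer" and "consumer" flows run
     concurrently in the split; queueItemWriter and queueItemReader
     are assumed artifacts that share a message queue. -->
<job id="pipelineJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
  <split id="pipeline">
    <flow id="producer">
      <step id="readAndSend">
        <chunk>
          <reader ref="fileItemReader"/>
          <writer ref="queueItemWriter"/>
        </chunk>
      </step>
    </flow>
    <flow id="consumer">
      <step id="receiveAndProcess">
        <chunk>
          <reader ref="queueItemReader"/>
          <processor ref="businessProcessor"/>
          <writer ref="databaseItemWriter"/>
        </chunk>
      </step>
    </flow>
  </split>
</job>
```

The split is what buys the concurrency: both flows run on their own threads, and the job as a whole finishes when both flows do.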
Rather than run one thing then the next, it all kind of happens at once. Cleanup if bad things happen can get a bit tricky, but it can be handled.
Not that this is a new idea… I remember people working on BatchPipes on z/OS decades ago.