This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.
To start at the beginning, follow the link to the first post.
The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
-----
We’ve already talked about using Splits and Flows to run different steps concurrently within the same job. Now we’re ready to move on to partitions. With a partition we’re running multiple copies of the SAME step concurrently.
They run in parallel, so we’ll go with “Parallels” by Yes.
The traditional example of this is a step that needs to do the same processing on a lot of records. Suppose you have a table with one record for each account. You need to look at every account, do some sort of processing, and then put a result somewhere. Maybe you need to insert a new row into a different table for any account that matches some criteria.
A simple chunk step will iterate across every row, do the processing, and, at checkpoints, write the rows that met the criteria. If you have a lot of rows, this could take a long time. With a partitioned step you can break that up into several threads, running concurrently, each processing a different range of the data.
In a later post we’ll look at how to decide how to divide the work and how to tell each thread what it is supposed to do. For now, just assume there is a way, so we can take our table of one million rows and process it with ten threads, each concurrently processing one hundred thousand rows.
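To make the arithmetic concrete, here is a minimal sketch in plain Java of splitting a row count into contiguous ranges and handing each range to its own thread. This is not the JSR-352 partition API; the class `PartitionRangeSketch`, the `Range` type, and the methods `split` and `processAll` are names made up for illustration, and the per-row "work" is just a counter standing in for real processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a hypothetical helper, not part of the Java Batch API.
class PartitionRangeSketch {

    /** Half-open range of row numbers [start, end) assigned to one partition. */
    static final class Range {
        public final long start;
        public final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits totalRows into contiguous ranges whose sizes differ by at most one row. */
    static List<Range> split(long totalRows, int partitions) {
        if (totalRows < 0 || partitions <= 0) {
            throw new IllegalArgumentException("need totalRows >= 0 and partitions > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = totalRows / partitions;
        long remainder = totalRows % partitions;
        long start = 0;
        for (int i = 0; i < partitions; i++) {
            // The first 'remainder' partitions each take one extra row.
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(start, start + size));
            start += size;
        }
        return ranges;
    }

    /** Runs one task per range on its own thread and returns the total rows visited. */
    static long processAll(long totalRows, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Range range : split(totalRows, partitions)) {
                results.add(pool.submit(() -> {
                    long processed = 0;
                    for (long row = range.start; row < range.end; row++) {
                        processed++; // stand-in for the real per-row processing
                    }
                    return processed;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // wait for each partition to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (Range range : split(1_000_000L, 10)) {
            System.out.println(range.start + " to " + range.end);
        }
        System.out.println(processAll(1_000_000L, 10));
    }
}
```

With one million rows and ten partitions, each range covers exactly one hundred thousand rows. When the count does not divide evenly, this sketch gives the leftover rows to the first few partitions, which is only one of several reasonable choices.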
Of course, just because you want ten threads running concurrently doesn’t mean they really will. Depending on the platform and available processors and a host of other factors, they might not run as concurrently as you’d hope. And we’ve also made some assumptions that these threads won’t get in each other’s way. We will have ten threads inserting rows into the same table at the same time. Is that too many? Will the contention doing the inserts outweigh the benefits of concurrent processing? Maybe.
Another thing to remember about partitioned steps is that they can be batchlets instead of chunk steps. You could have a simple batchlet that copies a file, then use a partitioned batchlet step to copy multiple files at the same time, each partition copying a different file.
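As a rough analogy for the file-copy case, the sketch below uses plain Java threads rather than a real batchlet: each task plays the role of one partition and copies exactly one file. The class `ParallelCopySketch` and its methods `copyAll` and `demo` are hypothetical names for this example, and in a real job the batch runtime, not your own thread pool, would start the partitions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a plain-Java stand-in for a partitioned batchlet step.
class ParallelCopySketch {

    /** Copies every source file into targetDir, one concurrent task per file. */
    static List<Path> copyAll(List<Path> sources, Path targetDir) {
        List<Path> copied = new ArrayList<>();
        if (sources.isEmpty()) {
            return copied;
        }
        ExecutorService pool = Executors.newFixedThreadPool(sources.size());
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path source : sources) {
                // Each task is the analogue of one partition: same logic, different file.
                Callable<Path> task = () -> Files.copy(
                        source,
                        targetDir.resolve(source.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                pending.add(pool.submit(task));
            }
            for (Future<Path> result : pending) {
                copied.add(result.get()); // wait for each copy to complete
            }
            return copied;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Creates three small temp files, copies them, and counts the copies whose bytes match. */
    static int demo() {
        try {
            Path in = Files.createTempDirectory("partition-in");
            Path out = Files.createTempDirectory("partition-out");
            List<Path> sources = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                byte[] content = ("data " + i).getBytes(StandardCharsets.UTF_8);
                sources.add(Files.write(in.resolve("file" + i + ".txt"), content));
            }
            List<Path> copies = copyAll(sources, out);
            int matching = 0;
            for (int i = 0; i < copies.size(); i++) {
                byte[] original = Files.readAllBytes(sources.get(i));
                byte[] copy = Files.readAllBytes(copies.get(i));
                if (Arrays.equals(original, copy)) {
                    matching++;
                }
            }
            return matching;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is the shape of the work: the copy logic is written once, and the only thing that differs between the concurrent copies is which file each one is told to handle.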
Next time we’ll have a look at the JSL syntax to define a partition and the simplest way to tell each partition what to do.