WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #83: The Multi-Server Batch Configuration – Partitions

By David Follis posted Wed March 25, 2020 07:34 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunesGoogle PlayStitcher, or use the link to the RSS feed

We’re still talking about dispatchers and executors, but let’s go back for a moment to our discussions about the Java Batch specification and recall how partitioned steps work.

A partitioned step allows you to run multiple copies of the same step concurrently.  You can specify different values for the properties that get injected into the elements of the step (such as the ItemReader) for each partition.  This allows you to tell partition one to start at record zero and process to record 1000, partition two to start at record 1001 and process to record 2000, etc. 

Assuming contention problems or system resource limitations don’t cause problems, you can significantly reduce the elapsed time required to process a set of data by processing different ranges of it concurrently on separate threads. 

But what if one server doesn’t have the capacity (heap, or CPU available on the system where the server runs) to actually run the partitions concurrently in an effective way (or maybe at all!).  Wouldn’t it be cool if we could get the partitions to spread out across several servers? 

Ok, now remember all that dispatcher/executor stuff we’ve been talking about.  It turns out that if you add some configuration to an executor that allows it to also act as a dispatcher, we can do this.  When an executor server reaches a partitioned step, it checks to see if it can also act as a dispatcher for partitions.  If it can, then a message is created for each partition and put on the specified queue. 

Same queue as the job queue or a different one?  Turns out it doesn’t matter.  Either way works, so do whatever makes sense for you.  There’s a property added to the message by dispatchers and executors-acting-as-dispatchers to indicate whether the message represents a ‘job’ or a ‘partition’.  This allows you to configure executor servers to just select jobs for a given application or just partitions or both. 

You might have three tiers of servers:  dispatchers, executors that run jobs, executors that run partitions spun off from those jobs.  And remember that executors and dispatchers communicate through the message queues and the Job Repository, but don’t have to be co-located on the same OS image.  Which would let you spread partition messages to servers scattered across different physical systems and separate heaps, letting you run more, larger, partitions concurrently, hopefully reducing overall elapsed time.