WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #111: Batch Performance – Partitions

By David Follis posted Wed October 07, 2020 07:48 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, or Stitcher, or use the link to the RSS feed.

The next item of interest was how the use of partitions can reduce elapsed time.  In this experiment we used the same reader/writer (no processor) we started with and continued to process 10,000,000 records with an item-count value of 1000.

What we changed was the level of concurrency.  In all our runs so far we used a single thread to process all the records.  This time we’ll split processing of the records across multiple threads.  We’ll begin with just two partitions, each processing 5,000,000 records, and work up to 10 partitions processing 1,000,000 records each.
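In JSL terms, partitioning a chunk step looks roughly like this (a sketch only; the artifact names are hypothetical, and the post doesn’t show the actual job XML from these runs):

```xml
<step id="insertStep">
  <chunk item-count="1000">
    <reader ref="recordReader"/>
    <writer ref="recordWriter"/>
  </chunk>
  <partition>
    <!-- the mapper decides at runtime how many partitions to create
         and what properties each partition receives -->
    <mapper ref="rangeMapper"/>
  </partition>
</step>
```

A static alternative is a `<plan partitions="10" .../>` element in place of the mapper, but a mapper lets the partition count and properties be computed at runtime.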

Our partition mapper arranged for each partition to use a different primary key value range for the database inserts/deletes to avoid contention.  In a real application we’d hopefully be inserting (or deleting) different rows anyway, but how the table is locked (row level, page level, etc.) could still impact you.
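To make the key-range arithmetic concrete, here is a minimal plain-Java sketch (not the actual mapper from these runs; the class and method names are assumptions) of splitting 10,000,000 keys into non-overlapping ranges, one per partition:

```java
// Sketch only: the range-splitting arithmetic a partition mapper might use.
public class KeyRanges {

    // Divide totalRecords keys into non-overlapping [start, end] ranges,
    // one per partition, so concurrent inserts/deletes never collide.
    public static long[][] ranges(long totalRecords, int partitions) {
        long perPartition = totalRecords / partitions;
        long[][] out = new long[partitions][2];
        for (int i = 0; i < partitions; i++) {
            out[i][0] = i * perPartition;            // first key in the range
            out[i][1] = (i + 1) * perPartition - 1;  // last key in the range
        }
        return out;
    }

    public static void main(String[] args) {
        // 10 partitions over 10,000,000 records: 1,000,000 keys apiece.
        for (long[] r : ranges(10_000_000L, 10)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

In a real `PartitionMapper`, each range would be handed to its partition as partition properties on the returned `PartitionPlan`, so each partition’s reader and writer know which keys belong to them.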

Remember that just because we have 10 partitions doesn’t mean all 10 will run concurrently on 10 separate threads.  Instead, 10 work items will be queued up inside the server and picked up as threads become available to run them.  Liberty dynamically adjusts the number of threads it has to try to handle the work; we don’t pre-configure the server with some fixed number of threads.
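The queuing behavior is analogous to submitting tasks to a thread pool: when more partitions are queued than threads are available, the extras simply wait their turn. A stand-alone sketch using plain `java.util.concurrent` (an analogy only, with a fixed-size pool where Liberty’s executor grows dynamically):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PartitionQueueDemo {

    // Submit 'partitions' work items against 'threads' workers; at most
    // 'threads' run at once, and the rest wait in the queue to be picked up.
    static int runPartitions(int partitions, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();
        for (int p = 0; p < partitions; p++) {
            pool.submit(() -> { completed.incrementAndGet(); });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 10 "partitions" against only 3 threads: all 10 still complete,
        // just never more than 3 at a time.
        System.out.println(runPartitions(10, 3) + " of 10 partitions completed");
    }
}
```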

After 10 partitions, we’ll proceed to 20, then 100, and finally 1,000 partitions, each of those last processing just 10,000 records (only ten chunks apiece at an item-count of 1000).

From the results we could see that the first increment, to two partitions, nearly cut the elapsed time in half (137 seconds, down from 272).  Moving to 10 partitions cut the time to not quite one-tenth of the single-threaded version (32 seconds, versus a theoretical 27).  Going to 20 partitions almost halved the time again (down to 17 seconds).  But our gains began to max out: going to 100 partitions didn’t even halve the time from 20 partitions (about 10 seconds).  Why is that?

Most likely it is because the server can’t spin up 100 threads very quickly and instead re-used existing threads to process different partitions sequentially.  And when we moved to 1,000 partitions, the step actually took longer than with just 20 (23 seconds versus 17).
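For reference, the ideal linear speedup would just be the single-threaded elapsed time divided by the partition count. A quick sketch comparing that ideal against the observed numbers quoted above makes the diminishing returns visible:

```java
public class ScalingCheck {

    // Perfectly linear scaling: single-threaded time / partition count.
    static double ideal(double singleThreadedSeconds, int partitions) {
        return singleThreadedSeconds / partitions;
    }

    public static void main(String[] args) {
        double single = 272.0;                      // single-threaded seconds, from the post
        int[] partitions = {2, 10, 20, 100, 1000};
        double[] observed = {137, 32, 17, 10, 23};  // seconds, from the post
        for (int i = 0; i < partitions.length; i++) {
            System.out.printf("%4d partitions: ideal %6.2fs, observed %5.0fs%n",
                    partitions[i], ideal(single, partitions[i]), observed[i]);
        }
    }
}
```

At two partitions the observed 137 seconds is almost exactly the ideal 136; by 100 partitions the ideal (under 3 seconds) is far below the observed 10, and at 1,000 the overhead dominates entirely.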

I should point out that these experiments were run on a dedicated LPAR.  Nothing else was running on this OS image, and nothing else was touching DB2.  The system I was running on was a z14 with 20 GPs.  Clearly a real-world test would involve some contention for resources, which would affect the results.

Nevertheless, it seems quite clear that, when possible, the use of partitions can dramatically reduce elapsed time for a step.