WebSphere Application Server

JSR-352 (Java Batch) Post #39: Creating a Partition Plan Dynamically with a PartitionMapper

By David Follis posted Wed April 24, 2019 07:46 AM

  
This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.
-----

A song about dynamic partition planning.  That’s a tough one.  Maybe a song about breaking up?  Perhaps “50 Ways to Leave Your Lover” by Paul Simon?

Last time we saw how to create a partitioned step just using syntax in the JSL.  It is pretty easy to do and makes it clear how many partitions you want and what properties get passed to each copy of the step.  Meanwhile, out here in the real world, you often don’t know ahead of time exactly what you’re going to want to do.

Fortunately, instead of specifying everything explicitly in the JSL, you can just point to an implementation of the PartitionMapper interface.  The mapper has one method, mapPartitions, that returns a PartitionPlan object.  That means you need an object that implements the PartitionPlan interface:  either a class of your own or the PartitionPlanImpl class that comes with the batch API.

The interface defines methods to return all the information you specified in the JSL:  the number of partitions, the number of threads, and the properties to go to each partition.  A PartitionMapper looks at the data it is going to process and makes some sort of decision about how to split things up.  You might know ahead of time that 10 partitions are how many you want and determine the data ranges for each partition to process dynamically.  Or you might know that you want each partition to process 100,000 records and decide how many partitions you need based on the amount of data you have, possibly restricting it to no more than 10 threads running your partitions at a time.  These numbers are just examples.  You need to determine what works best for you based on your application and your data.
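To make the second scenario concrete, here is a minimal sketch of the arithmetic a mapper might do.  The class and method names (PartitionSizing, partitionsFor, threadsFor) and the record count are made up for illustration and are not part of the batch API.  In a real mapper the two results would be handed to the plan through its setPartitions and setThreads methods.

```java
// Hypothetical helper for illustration; not part of the JSR-352 API.
final class PartitionSizing {

    // Number of partitions needed so that no partition gets more than
    // recordsPerPartition records.
    static int partitionsFor(long totalRecords, long recordsPerPartition) {
        if (totalRecords < 0 || recordsPerPartition <= 0) {
            throw new IllegalArgumentException("invalid record counts");
        }
        // Ceiling division: a partial final chunk still needs its own partition.
        long partitions = (totalRecords + recordsPerPartition - 1) / recordsPerPartition;
        return Math.toIntExact(partitions);
    }

    // Never ask for more threads than there are partitions to run.
    static int threadsFor(int partitions, int maxThreads) {
        return Math.min(partitions, maxThreads);
    }

    public static void main(String[] args) {
        long totalRecords = 1_250_000L; // made-up amount of input data
        int partitions = partitionsFor(totalRecords, 100_000L);
        int threads = threadsFor(partitions, 10);
        // prints "13 partitions, 10 threads"
        System.out.println(partitions + " partitions, " + threads + " threads");
    }
}
```

With these made-up numbers, 13 partitions get created but only 10 run at any one time; the remaining three wait for a thread to free up.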

How do you specify those properties for each partition?  By creating an array of Java Properties objects, one object in the array for each partition.  Then you can set whatever property names and values you like into each Properties object and be assured that the value of a property called start_record in slot zero of the array will be given to the first partition (remember, we start counting at zero).
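As a rough sketch of that array, the helper below splits a record count into contiguous ranges, one Properties object per partition.  The class name PartitionRanges, the build method, and the end_record property are assumptions made for this example; start_record is the name used in the snippets in this post.  A real mapper would pass the resulting array to the plan's setPartitionProperties method.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of the JSR-352 API.
final class PartitionRanges {

    // Builds one Properties object per partition. Slot i of the array holds
    // the range for partition i, and every value is stored as a String.
    static Properties[] build(long totalRecords, int partitions) {
        if (totalRecords < 0 || partitions <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        Properties[] props = new Properties[partitions];
        long base = totalRecords / partitions;      // minimum records per partition
        long remainder = totalRecords % partitions; // leftovers go to the first few
        long start = 0;
        for (int i = 0; i < partitions; i++) {
            long size = base + (i < remainder ? 1 : 0);
            props[i] = new Properties();
            props[i].setProperty("start_record", Long.toString(start));
            // end_record is exclusive: this partition stops just before it.
            props[i].setProperty("end_record", Long.toString(start + size));
            start += size;
        }
        return props;
    }

    public static void main(String[] args) {
        Properties[] plan = build(1000, 3);
        for (int i = 0; i < plan.length; i++) {
            System.out.println("partition " + i + ": "
                    + plan[i].getProperty("start_record") + " to "
                    + plan[i].getProperty("end_record"));
        }
    }
}
```

Splitting 1000 records three ways gives the first partition 334 records and the other two 333 each, so nothing is dropped when the count does not divide evenly.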

One gotcha...  Remember that names and values in Properties objects are Strings.  Use the setProperty method to put values into the object.  But because Properties inherits from Hashtable, you may be tempted to use the put method to do something like this:

 

myProperties.put("start_record", 1000);

 

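You think that sets the start_record value to 1000, but it actually stores an Integer, and when the property value is used, that isn’t what you’ll get because it has to be a String (getProperty, for one, returns null for any value that is not a String).  Do it like this: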

 

myProperties.setProperty("start_record", "1000");
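If you want to see the difference for yourself, here is a small standalone sketch using nothing but java.util.Properties.  The class and method names are made up for the demonstration.

```java
import java.util.Properties;

// Hypothetical demo class; only java.util.Properties is real API here.
final class PropertiesGotcha {

    // The tempting way: compiles fine, but stores an Integer value.
    static Properties viaPut() {
        Properties p = new Properties();
        p.put("start_record", 1000); // autoboxed to Integer, not a String
        return p;
    }

    // The right way: setProperty only accepts Strings.
    static Properties viaSetProperty() {
        Properties p = new Properties();
        p.setProperty("start_record", "1000");
        return p;
    }

    public static void main(String[] args) {
        // getProperty returns null when the stored value is not a String.
        System.out.println(viaPut().getProperty("start_record"));         // prints "null"
        System.out.println(viaSetProperty().getProperty("start_record")); // prints "1000"
    }
}
```

The put call does not fail at compile time or at run time, which is what makes this mistake easy to miss.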

 

It is a bit awkward because you’ll have it as a number in the PartitionMapper and want it as a number in the ItemReader, but you need to make it into a string to get it there.  
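The round trip might look something like the sketch below.  The class and method names are invented for illustration, and the second method only stands in for whatever your reader does with the String it is handed; in a real job the reader would normally receive that String as a batch property rather than a Properties object.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of the JSR-352 API.
final class RecordNumberRoundTrip {

    // Mapper side: the number has to become a String to travel as a property.
    static Properties toPartitionProperties(long startRecord) {
        Properties p = new Properties();
        p.setProperty("start_record", Long.toString(startRecord));
        return p;
    }

    // Reader side: turn the String back into a number before using it.
    static long startRecordFrom(Properties p) {
        String raw = p.getProperty("start_record");
        if (raw == null) {
            throw new IllegalStateException("start_record is missing or is not a String");
        }
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties p = toPartitionProperties(1000L);
        long start = startRecordFrom(p);
        System.out.println(start + 1); // prints "1001", so it is a number again
    }
}
```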

How do I know this is a tempting thing to do?  Well….. 
