WebSphere Application Server

JSR-352 (Java Batch) Post #38: Static Partition Control

By David Follis posted Wed April 17, 2019 08:05 AM

  
This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
-----

A song about control?  What else, “Control” by Janet Jackson…

I’ve tried to avoid spending posts going through syntax details.  You can get that from the specification.  But the best way to explain static partition control is by looking at the syntax of specifying a partition in JSL.  Try to stay awake.  Next week we’ll look at writing code to dynamically define the partitioning.  Doesn’t get more exciting than that…

A step becomes a partitioned step by including the <partition> element inside the <step>.  It looks, roughly, like this:

<step id="step1">
   <chunk>
      ..chunk stuff goes here
   </chunk>
   <partition>
      ..partition stuff goes here
   </partition>
</step>


There’s a lot of different ‘partition stuff’.  This time we’ll just look at the partition plan.  The plan is where we tell the batch runtime how many copies of the step (how many partitions) we want and what properties to pass to each copy.  Here’s an example:
 

<partition>
     <plan partitions="2" threads="2">
          <properties partition="0">
               <property name="startRec" value="0" />
          </properties>
          <properties partition="1">
               <property name="startRec" value="1000" />
          </properties>
     </plan>
</partition>


There’s a lot in there.  First of all, we’ve decided we’re having two partitions.  We also tell the runtime that both of them can run at the same time, on different threads.  If we had 10 partitions, we might tell the runtime to allow only up to five of them to run concurrently.  The threads value doesn’t guarantee you will get that many threads, just that no more partitions than that will run at the same time.  You might want to break the data up into 10 sets but know that contention causes issues when more than five run concurrently.
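As a rough sketch, that 10-partition, five-thread scenario would look like this in JSL (the per-partition properties are omitted here for brevity):

```xml
<partition>
     <plan partitions="10" threads="5">
          <!-- a <properties partition="0"> through <properties partition="9">
               block would go here, one per partition -->
     </plan>
</partition>
```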

Inside the plan we specify the properties to inject into each copy of the step.  In our case the <reader> element in the chunk probably has a property whose value is set to something like “#{partitionPlan['startRec']}”.  That says that on each partition thread the copy of the reader running on that thread will have a value injected for startRec that comes from the partition plan for this partition.
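For example, the reader element might look something like this (a sketch; the ref name "myItemReader" and the wiring of the property are illustrative, not from the post):

```xml
<chunk>
     <reader ref="myItemReader">
          <properties>
               <!-- each partition's copy of the reader gets its own
                    startRec value from the partition plan -->
               <property name="startRec" value="#{partitionPlan['startRec']}" />
          </properties>
     </reader>
     ..writer and other chunk stuff goes here
</chunk>
```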

Note that the partition numbering starts with zero.  Be sure to provide property values for each partition.  You also need to be sure each partition gets all the information it needs.  In our example, how does partition zero know when to stop?  It starts with record zero, but it doesn’t see the start value of ‘1000’ that the other partition gets.  Is it hard-coded to only process 1000 records?  Or do we need to provide a stopRec parameter to each partition also?
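One way to close that gap (a sketch of one option, not the only answer) is to give each partition both a start and a stop value in the plan:

```xml
<plan partitions="2" threads="2">
     <properties partition="0">
          <property name="startRec" value="0" />
          <property name="stopRec" value="999" />
     </properties>
     <properties partition="1">
          <property name="startRec" value="1000" />
          <property name="stopRec" value="1999" />
     </properties>
</plan>
```

The reader would then pick up stopRec the same way it does startRec, with a property whose value is "#{partitionPlan['stopRec']}".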

Defining your partition this way is fine if you know ahead of time what the right partitioning is and what properties to provide to each partition.  A step that copies multiple files might be a good example, because you probably know ahead of time which files get copied at this point in the job.  Scenarios with record numbers tend to be less static (what table will have the same number of rows – forever?).  For those cases you should use dynamic partition definition, which we’ll look at next time.