WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #43: Restarting a Partitioned Step

By David Follis posted Wed May 22, 2019 08:23 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

We haven’t gotten to how you restart a failed job and how all that works (soon), but while we’re talking about partitions I thought I’d take a quick look at what happens when a job restart reaches a partitioned step.

Oh right, we were doing song titles to go with these.  Let’s see…a song about restarting after a failed partition (or a break up)?  Must be thousands of those…take your pick.

The key thing is the partition properties set up by the PartitionMapper.  Do you use the same properties as on the original run of the job or do you create new ones?  If you are dynamically figuring out how to partition things and the underlying data might have changed since the original job ran, do you want to re-partition or stick with the original decision?  Could you miss records using the original scheme?  Do you care?

The PartitionPlan returned by the mapper tells the batch container what you want to do.  The value returned from the getPartitionsOverride method on the PartitionPlan determines the behavior.  If it returns ‘true’ then you are going to provide a whole new plan.  You can have more or fewer partitions than you originally had.  You can supply new properties.  Basically, the step starts over.  This is easy, but if your first attempt at this step processed (and committed) some changes you might be processing those records again.  Maybe that’s ok and maybe it isn’t.  Or maybe you can tell what was already done and set up the new partition properties to do the right thing.

If the partition override value is false, then you want to pick up where you left off in the last try.  Each partition, if it didn’t complete, will start using the checkpoint data from the previous run.  However (and this is a big deal), your plan MUST return the same number of partitions as it did the previous time and assign the same properties to the same partitions.  That means if the prior run gave partition 0 a property of ABC="1" and partition 1 a property of ABC="2" then those same partitions have to get those same properties.
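To make the "same properties to the same partitions" rule concrete, here is a minimal sketch of a helper that always builds the properties in the same order.  The class name PartitionLayout and its build method are made up for illustration and are not part of JSR-352; it just uses the ABC property from the example above.

```java
import java.util.Properties;

// Hypothetical helper (not part of JSR-352): builds the per-partition
// properties in a fixed, repeatable order.
final class PartitionLayout {

    private PartitionLayout() {}

    // Partition i always receives ABC = i + 1, so partition 0 gets "1",
    // partition 1 gets "2", and so on, on every run.
    static Properties[] build(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be >= 1");
        }
        Properties[] props = new Properties[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            props[i] = new Properties();
            props[i].setProperty("ABC", Integer.toString(i + 1));
        }
        return props;
    }
}
```

In a real mapper you would probably call something like this from mapPartitions(), hand the array to the plan with setPartitionProperties, set the count with setPartitions, and call setPartitionsOverride(false).  This only works if the partition count itself is the same on the restart, which is the hard part when you partition dynamically.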

The property values from the previous run aren’t automatically provided to the partitions.  You have to remember what you did and do the same thing again.  If you are dynamically determining how to partition things, and the environment might have changed, this could be challenging to get right.
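One way to "remember what you did" is to write the layout down on the original run and read it back on the restart.  The sketch below is just one possible approach, not something the specification provides: the PartitionLayoutStore class, the partition.count key, and the idea of keeping the layout in a properties file are all assumptions for illustration.  Where that file lives, and how you tie it to a particular job instance, is up to you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper (not part of JSR-352): records the partition layout
// chosen on the first run so that a restart can reproduce it exactly.
final class PartitionLayoutStore {

    private static final String COUNT_KEY = "partition.count";

    private PartitionLayoutStore() {}

    // Flattens the per-partition properties into one file, prefixing each
    // key with its partition index, e.g. "0.ABC=1" and "1.ABC=2".
    static void save(Path file, Properties[] layout) throws IOException {
        Properties flat = new Properties();
        flat.setProperty(COUNT_KEY, Integer.toString(layout.length));
        for (int i = 0; i < layout.length; i++) {
            for (String name : layout[i].stringPropertyNames()) {
                flat.setProperty(i + "." + name, layout[i].getProperty(name));
            }
        }
        try (OutputStream out = Files.newOutputStream(file)) {
            flat.store(out, "partition layout from original run");
        }
    }

    // Rebuilds the array with the same partition count and the same
    // properties in the same slots as the saved layout.
    static Properties[] load(Path file) throws IOException {
        Properties flat = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            flat.load(in);
        }
        int count = Integer.parseInt(flat.getProperty(COUNT_KEY));
        Properties[] layout = new Properties[count];
        for (int i = 0; i < count; i++) {
            layout[i] = new Properties();
        }
        for (String key : flat.stringPropertyNames()) {
            if (key.equals(COUNT_KEY)) {
                continue;
            }
            int dot = key.indexOf('.');
            int index = Integer.parseInt(key.substring(0, dot));
            layout[index].setProperty(key.substring(dot + 1), flat.getProperty(key));
        }
        return layout;
    }
}
```

On the original run the mapper would work out its partitions, call save, and return the plan.  On a restart with the override set to false it would call load instead of re-deriving anything, so changes in the underlying data can’t shift the properties around.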