This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post. The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
-----
If you’ve taken a look at our sample PartitionMapper from last week you might have noticed a couple of things we didn’t talk about relating to values set into the PartitionPlan.
The first is that, in a couple of places, we set the partition count to zero using the setPartitions method. This is a special case you might need to consider. What should you do if you get into your mapper and find it has no data to process? If this is a surprising condition (what do you mean the table of store locations is empty?) then you probably want to throw an exception and fail the job. Something is obviously horribly wrong.
But there might be cases where this is totally normal. Suppose this job runs every day and processes any files it finds in a certain location. It might be completely reasonable for there to be no files there (maybe today is a holiday). In that case, you really don’t want to run this step at all. You could have a batchlet step that runs before the partitioned step to figure out whether there is any data to process, and then use the step’s exit status and flow control to jump over the partitioned step.
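Here is a rough sketch of what that pre-check might look like. To keep it runnable without the batch API on the classpath, the class below only mirrors the shape of a batchlet (a `process()` method whose returned string becomes the exit status, plus `stop()`); a real one would implement `javax.batch.api.Batchlet`. The class name, the exit status strings, and the choice to treat a missing directory as "no data" are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of a pre-check step. In a real job this class would implement
 * javax.batch.api.Batchlet; it is left as a plain class here so it compiles
 * with only the JDK.
 */
final class InputCheckSketch {
    // Hypothetical exit status values; pick whatever your JSL transitions match on.
    static final String HAS_DATA = "HAS_DATA";
    static final String NO_DATA = "NO_DATA";

    private final Path inputDir;

    InputCheckSketch(Path inputDir) {
        this.inputDir = inputDir;
    }

    /** Same shape as Batchlet.process(): the returned string is the exit status. */
    String process() throws IOException {
        // Assumption: a missing directory is treated the same as an empty one.
        if (!Files.isDirectory(inputDir)) {
            return NO_DATA;
        }
        // Stop at the first entry; there is no need to list the whole directory.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            return entries.iterator().hasNext() ? HAS_DATA : NO_DATA;
        }
    }

    /** Same shape as Batchlet.stop(); nothing long-running to interrupt here. */
    void stop() {
    }
}
```

In the JSL, a transition element such as `next` with `on="NO_DATA"` on this step could then route around the partitioned step, while any other exit status falls through to it.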
Or you can just let it go and get into the mapper to discover there is no data. Set the partition count to zero in the plan and return it. You don’t need to set any other data into the plan. The batch container will see the zero partition count value and just end the step. Any defined step listeners will run, of course, so you should be sure they can handle the step not having actually done anything.
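As a minimal sketch of that second approach, here is the plan-building logic a mapper might use. `PlanSketch` is a hypothetical stand-in that mirrors just the `setPartitions` and `setPartitionProperties` methods of `javax.batch.api.partition.PartitionPlanImpl`, so the example compiles with only the JDK; in a real `mapPartitions()` you would use the real class. The planner class and the `fileName` property key are also made up for illustration.

```java
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical stand-in for javax.batch.api.partition.PartitionPlanImpl.
 * It mirrors only the methods this sketch needs.
 */
final class PlanSketch {
    private int partitions;
    private Properties[] partitionProperties;

    void setPartitions(int count) { this.partitions = count; }
    int getPartitions() { return partitions; }

    void setPartitionProperties(Properties[] props) { this.partitionProperties = props; }
    Properties[] getPartitionProperties() { return partitionProperties; }
}

/** Builds a plan with one partition per input file name. */
final class FilePartitionPlanner {
    static PlanSketch planFor(List<String> fileNames) {
        PlanSketch plan = new PlanSketch();
        if (fileNames.isEmpty()) {
            // Nothing to process today: a zero count is all the plan needs.
            plan.setPartitions(0);
            return plan;
        }
        // One Properties object per partition, each naming the file to process.
        Properties[] props = new Properties[fileNames.size()];
        for (int i = 0; i < props.length; i++) {
            props[i] = new Properties();
            props[i].setProperty("fileName", fileNames.get(i)); // hypothetical key
        }
        plan.setPartitions(props.length);
        plan.setPartitionProperties(props);
        return plan;
    }
}
```

Calling `FilePartitionPlanner.planFor` with an empty list gives back a plan whose partition count is zero and whose properties were never set, which is the case described above.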
The second thing you might notice is our call to the setPartitionsOverride method. This has to do with a previously executed job being restarted. If restart processing involves running a partitioned step that has executed already (maybe only partially) in a previous execution, this method tells the container what to do about partitioning.
If you set the partition override value to true, then that tells the container to ignore anything that happened before and just use the partition values found in the partition plan you are creating. You want to do this if you just need to start completely over, or if the data you already processed has been consumed or modified in a way that keeps you from processing it again.
Suppose a partition’s job is to process a particular file. When it is done with the file, it deletes it (or moves it somewhere else). In that case you would just work with the files you find, because only the ones that are left need processing (yes, there are some timing windows here you’d need to worry about, such as a failure after the file is processed but before it is removed).
Do you want to say true or false here? It really depends on how your application is going to handle a restart. Are you going to re-process data you already processed? Can you tell? Will checkpoint data in the partitions help you? If so, you want to say false and use the existing partitioning information so checkpoint data will be used. It is a tough call to make and you need to understand how your application will handle a restart to set it to the right value. Be careful because it can matter a lot.
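One way to picture the decision is as a single flag your mapper sets based on how the application consumes its input. The sketch below is only that: `RestartPlanSketch` is a hypothetical stand-in mirroring `setPartitions` and `setPartitionsOverride` from `javax.batch.api.partition.PartitionPlanImpl` so the code compiles with only the JDK, and the planner class and its `inputConsumedWhenDone` parameter are invented for illustration.

```java
/**
 * Hypothetical stand-in for javax.batch.api.partition.PartitionPlanImpl.
 * It mirrors only the methods this sketch needs.
 */
final class RestartPlanSketch {
    private int partitions;
    private boolean partitionsOverride;

    void setPartitions(int count) { this.partitions = count; }
    int getPartitions() { return partitions; }

    void setPartitionsOverride(boolean override) { this.partitionsOverride = override; }
    boolean getPartitionsOverride() { return partitionsOverride; }
}

final class RestartAwarePlanner {
    /**
     * @param remainingFiles        how many input files the mapper found on this run
     * @param inputConsumedWhenDone true if a finished partition deletes or moves its
     *                              file, so whatever is left is exactly the work to do
     */
    static RestartPlanSketch planFor(int remainingFiles, boolean inputConsumedWhenDone) {
        RestartPlanSketch plan = new RestartPlanSketch();
        plan.setPartitions(remainingFiles);
        // true:  on restart, use the partitioning in this new plan.
        // false: on restart, keep the earlier execution's partitioning so that
        //        partition checkpoint data can be used.
        plan.setPartitionsOverride(inputConsumedWhenDone);
        return plan;
    }
}
```

Whether a single boolean like this captures your restart behavior is exactly the judgment call described above; the sketch just shows where the answer ends up.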