WebSphere Application Server

JSR-352 (Java Batch) Post #45: Job Restart Processing – How it works

By David Follis posted Wed June 05, 2019 07:47 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

To start at the beginning, follow the link to the first post.

The next post in the series is here.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or use the link to the RSS feed.

In this post we’ll take a look at the mechanics of restart processing.  When we wave our hands around and talk about restarting a job, we generally sort of suggest that it just “picks up where it left off” which is, of course, wrong. 

For a song I’m going to go with “Get Right Back” by Maxine Nightingale, a suggestion for an earlier post from a loyal reader. 

In general, a restarted job starts over again with the first step of the JSL (unless the previous execution ended at a <stop> element with a restart attribute, in which case the restart begins at the step that attribute names). But that doesn’t mean it actually runs that step (or any other step). The key is an attribute of the step element called allow-restart-if-complete. If it is set to true, the step is re-run even though it completed successfully on the previous run. Re-running the step sets a new exit status, and flow proceeds according to the step’s conditional transitions in the JSL, evaluated against that new exit status.

On the other hand, if allow-restart-if-complete is false (the default) and the step completed in the previous execution, the exit status from that run is reused. Put more simply, we skip this step and proceed to whichever step we flowed to last time.
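To make the per-step rule concrete, here is a minimal Java sketch of the decision described above. The class, enum, and method names are hypothetical and the set of prior outcomes is simplified; it models the behavior the spec describes, not how any JSR-352 runtime actually implements it.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the choice a batch runtime makes for
 * each step that flow reaches during a job restart. Not a real runtime API.
 */
final class StepRestartDecision {

    /** What happened to the step in the previous job execution (simplified). */
    enum PriorOutcome { COMPLETED, STOPPED, FAILED, NOT_RUN }

    /** What the restarted execution does with the step. */
    enum Action { RUN, REUSE_PRIOR_EXIT_STATUS }

    static Action decide(PriorOutcome prior, boolean allowRestartIfComplete) {
        Objects.requireNonNull(prior, "prior");
        if (prior == PriorOutcome.COMPLETED && !allowRestartIfComplete) {
            // Completed last time and not allowed to re-run: skip the step and
            // let conditional flow use the exit status saved from that run.
            return Action.REUSE_PRIOR_EXIT_STATUS;
        }
        // Everything else runs: completed steps with allow-restart-if-complete="true",
        // steps that were executing when the job stopped or failed,
        // and steps that never ran in the previous execution.
        return Action.RUN;
    }

    public static void main(String[] args) {
        System.out.println(decide(PriorOutcome.COMPLETED, false)); // REUSE_PRIOR_EXIT_STATUS
        System.out.println(decide(PriorOutcome.COMPLETED, true));  // RUN
        System.out.println(decide(PriorOutcome.FAILED, false));    // RUN
    }
}
```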

All of this means that if allow-restart-if-complete is false for every step, or it is true for some steps but those steps produce the same exit status when re-run, flow through the job will proceed exactly as it did the previous time. But if a re-run step produces a different exit status this time, flow could proceed to a different step. That step might not have run at all in the previous execution, in which case it has no saved exit status and must be run in this execution.

That’s all great for steps that completed in the previous execution.  But eventually you might reach a step that was executing when the job stopped or failed.  That step is always run as part of the restart of the job (assuming flow of control reaches that step). 
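How a single re-run step can steer a restart down a different path is easier to see in a small simulation. The Java sketch below walks a hypothetical job: last time it ran A, B, and C, and C failed. On restart, A is skipped; B has allow-restart-if-complete set to true and now returns a different exit status, so flow goes to D, which has no saved exit status and therefore runs; then flow reaches C, which runs because it did not complete before. The step names, exit statuses, and transition table are invented for illustration, and the table is a stand-in for the JSL's next/end/fail/stop elements.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Toy model of conditional flow during a job restart. A hypothetical sketch,
 * not a JSR-352 runtime: transitions are a plain table (exit status -> next step).
 */
final class RestartFlowWalk {

    /** What the previous execution recorded for a step it reached. */
    static final class Prior {
        final boolean completed;
        final String exitStatus;

        Prior(boolean completed, String exitStatus) {
            this.completed = completed;
            this.exitStatus = exitStatus;
        }
    }

    /**
     * Walks the flow starting at firstStep (the JSL's first step, or the step a
     * stop element's restart attribute named) and returns the steps visited,
     * in order, each tagged "ran" or "skipped".
     */
    static List<String> walk(String firstStep,
                             Map<String, Map<String, String>> transitions,
                             Set<String> allowRestartIfComplete,
                             Map<String, Prior> prior,
                             Function<String, String> runStep) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String step = firstStep;
        while (step != null) {
            if (!seen.add(step)) {
                // Guard for this toy model only: refuse to loop on a cyclic table.
                throw new IllegalStateException("step visited twice: " + step);
            }
            Prior p = prior.get(step);
            String exitStatus;
            if (p != null && p.completed && !allowRestartIfComplete.contains(step)) {
                // Completed last time and not re-runnable: reuse the saved exit status.
                exitStatus = p.exitStatus;
                visited.add(step + ":skipped");
            } else {
                // Re-runnable, interrupted last time, or never reached before: run it now.
                exitStatus = runStep.apply(step);
                visited.add(step + ":ran");
            }
            // No matching transition means the job ends after this step (simplification).
            step = transitions.getOrDefault(step, Map.of()).get(exitStatus);
        }
        return visited;
    }

    /** Scenario: last time A -> B -> C and C failed; on restart B re-runs with a new exit status. */
    static List<String> exampleRestart() {
        Map<String, Map<String, String>> transitions = Map.of(
                "A", Map.of("COMPLETED", "B"),
                "B", Map.of("NORMAL", "C", "LATE", "D"),
                "D", Map.of("COMPLETED", "C"));
        Map<String, Prior> prior = Map.of(
                "A", new Prior(true, "COMPLETED"),
                "B", new Prior(true, "NORMAL"),
                "C", new Prior(false, "FAILED"));
        // Hypothetical step logic: B now reports "LATE", everything else "COMPLETED".
        Function<String, String> runStep = s -> s.equals("B") ? "LATE" : "COMPLETED";
        return walk("A", transitions, Set.of("B"), prior, runStep);
    }

    public static void main(String[] args) {
        System.out.println(exampleRestart()); // [A:skipped, B:ran, D:ran, C:ran]
    }
}
```

Passing an empty allow-restart set instead would skip both A and B and go straight to re-running C, repeating the previous execution's path exactly.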

If the step is a partitioned step, then the individual partitions are restarted as we discussed a few weeks ago. 

We’ll talk about how a chunk step gets restarted in a separate post.