WebSphere Application Server

JSR-352 (Java Batch) Post #44: Play it again, Sam! Restarting a failed job

By David Follis posted Wed May 29, 2019 07:33 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through the RSS feed.

First of all, yes, I know that’s not really a quote from Casablanca, but the actual quote (“Play it once, Sam”) doesn’t work as a title for a post about restarting jobs. 

Before we get into the details of how job restart processing works, we need to get our heads around a few basic concepts.  Then we can look at what makes a job restartable, and how you restart it.

To begin with, when you start a job it creates something called a Job Instance.  This represents the submission of this job at this time.  If you submit a job every Tuesday at noon, then every Tuesday at noon you get a new Job Instance.  As the job starts to run, it creates a Job Execution.  A Job Execution represents an attempt to, ahem, execute a particular Job Instance.

Which means, if a job fails and you restart it, you get a new Job Execution for this Job Instance.  If you give up and just submit it again, then you get a new Job Instance with its own Job Execution.  Multiple executions for a single instance mean you have restarted that job. 
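If it helps to see that bookkeeping spelled out, here is a minimal sketch in plain Java. It is a toy model, not the javax.batch classes: the class name InstanceVsExecution and its methods are made up for illustration, and restart is keyed by instance id only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, NOT the javax.batch runtime classes.
class InstanceVsExecution {
    private long nextInstanceId = 1;
    private long nextExecutionId = 1;
    // instance id -> ids of every execution attempted for that instance
    private final Map<Long, List<Long>> executionsByInstance = new LinkedHashMap<>();

    // Submitting a job creates a new instance plus its first execution.
    long submit() {
        long instanceId = nextInstanceId++;
        List<Long> executions = new ArrayList<>();
        executions.add(nextExecutionId++);
        executionsByInstance.put(instanceId, executions);
        return instanceId;
    }

    // Restarting keeps the same instance and adds one more execution to it.
    long restart(long instanceId) {
        long executionId = nextExecutionId++;
        executionsByInstance.get(instanceId).add(executionId);
        return executionId;
    }

    int instanceCount() {
        return executionsByInstance.size();
    }

    int executionCount(long instanceId) {
        return executionsByInstance.get(instanceId).size();
    }

    public static void main(String[] args) {
        InstanceVsExecution repo = new InstanceVsExecution();
        long week1 = repo.submit();   // first Tuesday: runs clean
        long week2 = repo.submit();   // second Tuesday: fails...
        repo.restart(week2);          // ...restarted once
        repo.restart(week2);          // ...and once more
        System.out.println(repo.instanceCount());        // 2
        System.out.println(repo.executionCount(week1));  // 1
        System.out.println(repo.executionCount(week2));  // 3
    }
}
```

An instance with more than one execution is one that has been restarted; submitting again always adds a new instance instead.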

To restart a job, it has to be restartable.  I know... seems obvious, but what does that mean exactly?  Well, first of all it has to be in a state that makes it restartable.  That means the job was started, created an Instance and an Execution, and the Execution’s batch status is either STOPPED or FAILED.  A job typically gets into those states because it was stopped using the Job Operator stop operation or because it threw an unhandled exception (a <stop> or <fail> transition element in the JSL can put it there too).
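As a rough sketch of that state check: the enum below copies the value names of javax.batch.runtime.BatchStatus so the example stands alone without the batch API on the classpath. The class name RestartableStates and the canRestart helper are hypothetical. In a real application you would presumably read the status from JobExecution.getBatchStatus() instead.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper; the enum mirrors the names in javax.batch.runtime.BatchStatus.
class RestartableStates {
    enum BatchStatus { STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED }

    // Only these two statuses leave an execution eligible for restart.
    static final Set<BatchStatus> RESTARTABLE =
            EnumSet.of(BatchStatus.STOPPED, BatchStatus.FAILED);

    static boolean canRestart(BatchStatus status) {
        return RESTARTABLE.contains(status);
    }

    public static void main(String[] args) {
        for (BatchStatus status : BatchStatus.values()) {
            System.out.println(status + " -> " + canRestart(status));
        }
    }
}
```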

There’s an attribute of the JSL that matters too.  In the <job> element in the JSL there is a restartable attribute that defaults to true.  If set to false, then the job cannot be restarted regardless of its state. 
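To make the attribute concrete, here is a small sketch that embeds a made-up JSL document as a Java string and reads the restartable attribute back with the JDK's own XML parser. The job id nightlyLoad, the batchlet ref loadBatchlet, and the class JslRestartable are all hypothetical. A real batch runtime does this parsing for you; the point is only to show where the attribute lives and that leaving it out means true.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Illustration only: a batch runtime reads the JSL itself; names here are invented.
class JslRestartable {
    // Hypothetical JSL for a job that must never be restarted.
    static final String JSL =
            "<job id=\"nightlyLoad\" xmlns=\"http://xmlns.jcp.org/xml/ns/javaee\""
          + " version=\"1.0\" restartable=\"false\">"
          + "<step id=\"step1\"><batchlet ref=\"loadBatchlet\"/></step>"
          + "</job>";

    // Returns the value of restartable on the root element, defaulting to true when absent.
    static boolean isRestartable(String jslXml) {
        try {
            Element job = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(jslXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String value = job.getAttribute("restartable"); // "" when the attribute is missing
            return value.isEmpty() || Boolean.parseBoolean(value);
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as JSL XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isRestartable(JSL));                 // false
        System.out.println(isRestartable("<job id=\"j\"/>"));   // true
    }
}
```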

The actual act of restarting a job requires calling the restart method (instead of the start method) on the Job Operator API.  Restart requires you to specify the identifier of the execution you wish to restart.  This has to be the most recent execution id for a job instance.  Which means, if a job fails and is restarted multiple times, each restart has to specify the execution id of the previous failure.  You can’t just restart a job instance, you have to restart an execution after it has failed.  Which means you can only restart a particular execution once.  After that you get a new execution and, if it fails, you have to restart that one. 
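Those rules can be sketched with a toy operator in plain Java. To be clear, ToyJobOperator is an invented stand-in, not javax.batch.operations.JobOperator: its start takes no arguments, the finish method exists only so the example can fake a failure, and it signals problems with generic exceptions where the real API defines its own (JobExecutionNotMostRecentException, for one).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a batch runtime; NOT the real JobOperator API.
class ToyJobOperator {
    enum Status { STARTED, STOPPED, FAILED, COMPLETED }

    private long nextInstanceId = 1;
    private long nextExecutionId = 1;
    private final Map<Long, Long> instanceOfExecution = new HashMap<>();
    private final Map<Long, List<Long>> executionsOfInstance = new HashMap<>();
    private final Map<Long, Status> statusOfExecution = new HashMap<>();

    // start: new instance, first execution; returns the execution id.
    long start() {
        long instanceId = nextInstanceId++;
        executionsOfInstance.put(instanceId, new ArrayList<>());
        return newExecution(instanceId);
    }

    // restart: takes an execution id, never an instance id.
    long restart(long executionId) {
        Long instanceId = instanceOfExecution.get(executionId);
        if (instanceId == null) {
            throw new IllegalArgumentException("no such execution: " + executionId);
        }
        List<Long> history = executionsOfInstance.get(instanceId);
        long mostRecent = history.get(history.size() - 1);
        if (executionId != mostRecent) {
            throw new IllegalStateException("execution " + executionId
                    + " is not the most recent; restart " + mostRecent + " instead");
        }
        Status status = statusOfExecution.get(executionId);
        if (status != Status.STOPPED && status != Status.FAILED) {
            throw new IllegalStateException("execution is " + status + ", not STOPPED or FAILED");
        }
        return newExecution(instanceId); // same instance, brand new execution
    }

    // Test hook (hypothetical): record how an execution ended.
    void finish(long executionId, Status status) {
        statusOfExecution.put(executionId, status);
    }

    private long newExecution(long instanceId) {
        long id = nextExecutionId++;
        instanceOfExecution.put(id, instanceId);
        executionsOfInstance.get(instanceId).add(id);
        statusOfExecution.put(id, Status.STARTED);
        return id;
    }

    public static void main(String[] args) {
        ToyJobOperator op = new ToyJobOperator();
        long first = op.start();
        op.finish(first, Status.FAILED);
        long second = op.restart(first);      // fine: first is the most recent and FAILED
        op.finish(second, Status.FAILED);
        try {
            op.restart(first);                // rejected: first has already been restarted
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        long third = op.restart(second);      // fine: second is now the most recent
        System.out.println(first + " -> " + second + " -> " + third);
    }
}
```

With the real API the equivalent call is restart(executionId, restartParameters) on the JobOperator you get from BatchRuntime.getJobOperator(), where restartParameters is a java.util.Properties; it returns the id of the new execution.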

That’s enough to get started on restart.