WebSphere Application Server & Liberty

JSR-352 (Java Batch) Post #168: Kubernetes Jobs – Concurrency

By David Follis posted Thu January 27, 2022 09:09 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through its RSS feed.

Finally, we’re going to talk about concurrency with Kubernetes Jobs. You control it by setting .spec.parallelism to the maximum number of pods you want running concurrently; the default is one. As with concurrency in Jakarta Batch, this doesn’t mean you will always get that many pods running at once, just that you won’t get more than that. Usually.

If you want more than one successful completion (indexed or not), you can set parallelism anywhere up to the number of completions and (possibly) get multiple copies of the pod running at the same time until the required number of successful completions is reached. This is much more analogous to a partitioned step in Jakarta Batch.
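Since the comparison here is with a partitioned step, a rough in-process analogy may help. The sketch below is not Kubernetes code: the class `CompletionsModel` and its `run` method are hypothetical, each submitted task stands in for one pod, `completions` stands in for `.spec.completions`, and a fixed thread pool of size `parallelism` plays the role of `.spec.parallelism` (much like the thread count on a partitioned step). A thread pool enforces its cap strictly, whereas Kubernetes only aims for it, which is where the "Usually" comes in.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical analogy (not Kubernetes code): run `completions` units of work,
// with at most `parallelism` of them executing at the same time.
final class CompletionsModel {
    final AtomicInteger running = new AtomicInteger();     // "pods" currently running
    final AtomicInteger maxObserved = new AtomicInteger(); // highest concurrency seen
    final AtomicInteger succeeded = new AtomicInteger();   // successful "completions"

    /** Returns true if every unit finished before the timeout. */
    boolean run(int completions, int parallelism) {
        // The pool size is the concurrency cap, loosely like .spec.parallelism.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (int index = 0; index < completions; index++) {
            final int completionIndex = index; // comparable to an Indexed Job's index
            pool.submit(() -> {
                int now = running.incrementAndGet();
                maxObserved.accumulateAndGet(now, Math::max);
                try {
                    doWork(completionIndex);
                    succeeded.incrementAndGet();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // no new work accepted; already-queued work still runs
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Placeholder for whatever one "pod" would actually process.
    private void doWork(int completionIndex) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling `new CompletionsModel().run(8, 3)` processes eight units with never more than three in flight, roughly like a Job with eight completions and a parallelism of three.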

But the documentation for Kubernetes Jobs goes one step further and encourages you to think of each pod instance essentially as a server. In this pattern (the documentation calls it a parallel Job with a work queue), you leave .spec.completions unset, the pods are all started concurrently, and the completion count effectively equals whatever parallelism value you set. Each pod spins up and starts processing whatever it processes, somehow coordinating with the other instances.

The tricky part is shutting down. Somehow the different pods need to figure out that it is time to go away. Maybe when they run out of data? Anyway, as soon as any pod ends successfully, that counts as a completion, so no new pods get created. Presumably the other pods are wrapping up too. Once all of them are done, you’ve reached your completion count and the job is over.
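The pattern described above, where each pod behaves like a server and keeps pulling work until there is none left (Kubernetes calls this a Job with a work queue), can be sketched the same way. Again this is a hypothetical in-process model, not a Kubernetes API: each thread stands in for a pod, a shared `ConcurrentLinkedQueue` stands in for whatever external queue or database real pods would coordinate through, and the names `WorkQueueModel` and `runWorkers` are made up for illustration. Each worker decides on its own that it is done when the queue comes back empty, which is the "run out of data" shutdown speculated about above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the work-queue style: every worker (standing in for
// a pod) pulls items from shared storage and exits when it finds none.
final class WorkQueueModel {
    /** Runs `parallelism` workers over the items; returns how many were processed. */
    static int runWorkers(List<Integer> items, int parallelism) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>(items);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < parallelism; w++) {
            Thread worker = new Thread(() -> {
                Integer item;
                // Coordination happens through the queue: poll() hands each item to exactly one worker.
                while ((item = queue.poll()) != null) {
                    processed.incrementAndGet(); // stand-in for real processing of `item`
                }
                // Queue empty: this worker decides by itself that it is finished and exits normally.
            });
            workers.add(worker);
            worker.start();
        }
        // Like the Job, the whole run is over only once every worker has exited.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }
}
```

Nothing tells the workers how much work exists up front; the shared queue is the only coordination point, which is why shutdown is "the tricky part" when the real queue lives outside the pods.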

Of course, if something goes wrong and a pod fails, a new one is started to run that part of the job again (up to the retry limit set by .spec.backoffLimit).
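The retry behavior can be modeled as simple bookkeeping. This is a deliberately simplified, hypothetical sketch, not the actual Job controller algorithm: the class `RetryModel` starts "waves" of at most `parallelism` attempts (never more than the completions still needed), replaces each failed attempt in a later wave, and gives up once failures exceed a `backoffLimit` parameter that only loosely mirrors `.spec.backoffLimit`. It ignores timing entirely.

```java
import java.util.function.IntPredicate;

// Hypothetical, heavily simplified bookkeeping for a fixed-completion Job:
// failed attempts are replaced until enough succeed or too many fail.
// This is NOT the real Job controller logic (no timing, no real pods).
final class RetryModel {
    /** Returns the number of successful completions reached. */
    static int run(int completions, int parallelism, int backoffLimit, IntPredicate attemptSucceeds) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        int succeeded = 0;
        int failed = 0;
        int attempt = 0;
        while (succeeded < completions && failed <= backoffLimit) {
            // One "wave": at most `parallelism` attempts, capped by what is still needed.
            int wave = Math.min(parallelism, completions - succeeded);
            for (int i = 0; i < wave; i++) {
                if (attemptSucceeds.test(attempt++)) {
                    succeeded++;
                } else {
                    failed++; // a replacement gets started in a later wave
                }
            }
        }
        return succeeded;
    }
}
```

With every third attempt failing, `RetryModel.run(5, 2, 6, attempt -> attempt % 3 != 2)` still reaches five completions, just after two extra attempts; if every attempt fails, the loop stops as soon as failures exceed the limit.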

The Kubernetes Job documentation seems to make kind of a big deal of this, but it feels a whole lot like just a set of servers, with Kubernetes keeping some configured number of them active.

I mean, sure, you could say the work being done is part of a ‘job’ because the pods are reading records or processing database rows in some range rather than handling HTTP requests. And maybe, because the ‘servers’ end when they figure out they’ve run out of things to do (something HTTP servers don’t usually do), it is kind of a job.

But it feels like a bit of a stretch to me.