This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post.
The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through the link to the RSS feed.
As we continue our look at Kubernetes Jobs, this week we explore yet another configuration option: spec.completionMode. You can specify one of two values: Indexed or NonIndexed. The default (and the behavior we’ve been assuming so far) is NonIndexed.
Ok, so what happens if you specify an Indexed completion mode? Well, it ties into the value you specify for spec.completions. Each started pod is assigned an index value between zero and spec.completions minus one (so if you set completions to 5 you’ll get index values from zero to four). To finish the job you need one successful completion for each index value.
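To make that concrete, here is a minimal sketch that builds an Indexed Job manifest as a Python dictionary and prints it as JSON (which kubectl accepts in place of YAML). The job name, container name, and image are hypothetical placeholders, and the restart policy is just one reasonable choice; only spec.completions and spec.completionMode come from the discussion above.

```python
import json


def indexed_job_manifest(name, image, completions):
    """Build a Job manifest that uses Indexed completion mode.

    `name` and `image` are hypothetical placeholders chosen by the caller.
    """
    if completions < 1:
        raise ValueError("completions must be at least 1")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # One successful completion is required per index value.
            "completions": completions,
            "completionMode": "Indexed",
            "template": {
                "spec": {
                    "containers": [{"name": "worker", "image": image}],
                    # Assumption: restart the container in place on failure.
                    "restartPolicy": "OnFailure",
                }
            },
        },
    }


def expected_indexes(completions):
    """Index values handed out: zero through completions minus one."""
    return list(range(completions))


if __name__ == "__main__":
    manifest = indexed_job_manifest(
        "indexed-demo", "example.com/batch-worker:latest", 5
    )
    print(json.dumps(manifest, indent=2))
    print(expected_indexes(5))
```

With completions set to 5, the index values the job has to satisfy are zero through four.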
This is really kind of the same as just setting spec.completions to a value bigger than one. You need that many successful completions to finish. The difference is just that each started pod gets an input value (Kubernetes exposes it to the container as the JOB_COMPLETION_INDEX environment variable) that tells it which completion it is trying to achieve. And you could use that value as an index into something.
When we talked about Completions bigger than one last week, we said that you might process through a bunch of records trying to successfully process some number of them. With indexing you could use the index to tell you which record you are trying to process, not just whatever is ‘next’ (however you are keeping track of that on your own).
Your pod would get started with an index of zero to process the zeroth (is that a word? Apparently.) record. It would, if configured, get restarted over and over until it completed successfully, at which point another pod would get started to process the first record, and so on.
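Here is a rough sketch of what the program inside the pod might look like. For Indexed Jobs, Kubernetes sets the JOB_COMPLETION_INDEX environment variable in the container; everything else here (the RECORDS list and the function names) is made up for illustration and stands in for wherever your records really live.

```python
import os
import sys

# Hypothetical stand-in for the real data source, one entry per index.
RECORDS = ["alpha", "bravo", "charlie", "delta", "echo"]


def completion_index(environ=os.environ):
    """Read the index Kubernetes assigned to this pod."""
    raw = environ.get("JOB_COMPLETION_INDEX")
    if raw is None:
        raise RuntimeError("JOB_COMPLETION_INDEX is not set; is this an Indexed Job?")
    return int(raw)


def record_for(index, records=RECORDS):
    """Pick the record this pod is responsible for."""
    if not 0 <= index < len(records):
        raise IndexError(f"no record for index {index}")
    return records[index]


def main(environ=os.environ):
    index = completion_index(environ)
    record = record_for(index)
    print(f"processing record {index}: {record}")
    # Returning zero counts as a successful completion for this index.
    # An uncaught exception exits non-zero and counts as a failure.
    return 0


if __name__ == "__main__" and "JOB_COMPLETION_INDEX" in os.environ:
    sys.exit(main())
```

The pod never has to work out what is ‘next’. The index it was handed says which record is its responsibility, and a failure just means the same index gets tried again.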
Or, the index might just be a key that each started instance uses to find parameters that tell it what range of records to process. I guess in our discussion last week that might have been an option too…each started pod might process 1000 records (instead of just the one I discussed). Using indexes as a key into some sort of parameterization table might let you customize the range. But you do still need to know in the YAML how many successful completions you need.
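Both flavors of that idea can be sketched in a few lines. The first function assumes fixed-size chunks of 1000 records per index; the second looks the range up in a parameter table. The PARAMETERS table and its values are entirely hypothetical; in practice they might come from a ConfigMap, a database, or a file.

```python
def record_range(index, chunk_size=1000):
    """Fixed-size chunks: index 0 covers 0-999, index 1 covers 1000-1999, ..."""
    start = index * chunk_size
    return range(start, start + chunk_size)


# Hypothetical parameter table keyed by completion index.
# Each entry is (first record, one past the last record).
PARAMETERS = {
    0: (0, 1000),
    1: (1000, 1500),
    2: (1500, 4000),
}


def range_for(index, table=PARAMETERS):
    """Customized ranges: look up the bounds for this index in the table."""
    first, end = table[index]
    return range(first, end)


if __name__ == "__main__":
    for index in sorted(PARAMETERS):
        chunk = range_for(index)
        print(f"index {index}: records {chunk.start} through {chunk.stop - 1}")
```

Note that the number of entries in the table has to match spec.completions in the Job definition, which is the ‘you need to know up front’ part.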
In that sense it feels a lot like having JSR-352 JSL for a partitioned step with the partition plan in the JSL. You have to know from the start what you’re going to do. Well, and a partitioned step runs things concurrently. What we’ve described here all happens one at a time. Next week we’ll get concurrent…