JSR-352 (Java Batch) Post #96: SMF Recording

By David Follis posted Wed June 24, 2020 07:55 AM

This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application.

This series is also available as a podcast on iTunes, Google Play, and Stitcher, or through the RSS feed.

If you’re running a batch workload on z/OS, you’re worried about (at least) two things: how long the jobs take (elapsed time) and how much CPU they use. This might be because you’re trying to keep the batch workload inside a particular part of the day (the infamous “batch window”) or for financial reasons (four-hour rolling averages, internal chargeback, or whatever). Tracking everything you need to know about your batch workload is most easily done by processing the SMF Type 30 records z/OS writes for the batch initiator address space where the job ran.

But what about Java Batch jobs running inside a Liberty server? You will, of course, still get Type 30 records for the server address space. That doesn’t help very much, though, because one server could run multiple jobs… concurrently.

WebSphere Type 120 records to the rescue! Liberty writes SMF 120 subtype 12 records for Java Batch jobs. A record gets written at the end of the job, at the end of each step, at the end of each partition, and at the end of each flow. Like the Type 30s you’d examine for traditional batch applications, the 120-12 records contain information about elapsed and CPU time. They also contain information about how the job or step ended (batch and exit status values) plus information to help you identify the specific job (and step, flow, etc.).

What they don’t contain that the Type 30s have is information about I/O and memory usage. A traditional batch job occupies the entire address space, so all I/O and memory usage belongs to that job. Liberty Java Batch jobs share a Java heap and the rest of the resources of the address space, and it isn’t possible to attribute usage of those shared resources to a particular job.

Even CPU accounting for Java Batch jobs isn’t simple. Remember our multi-threading capabilities with partitions and split/flow constructs? The CPU recorded for a job is actually only the CPU used by the ‘main’ job thread. To find the total CPU for the whole job you really need to hunt down the end-of-partition and end-of-step records and sum up the CPU time from those.
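To make that summing concrete, here is a minimal Java sketch. It assumes the 120-12 records have already been parsed into simple objects; `BatchRecord`, its fields, and the "JOB"/"STEP"/"PARTITION"/"FLOW" labels are all hypothetical names for illustration, not the real record layout. Which kinds of record you add together without double counting depends on how CPU is attributed in your records, so the set of kinds is passed in rather than hard-coded.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class BatchCpuTotals {

    /** Hypothetical, already-parsed view of one SMF 120-12 record (not the real layout). */
    public static final class BatchRecord {
        final long jobExecutionId; // assumed key that ties records to one job execution
        final String kind;         // illustrative labels: "JOB", "STEP", "PARTITION", "FLOW"
        final long cpuMicros;      // CPU time reported by this record, in microseconds

        public BatchRecord(long jobExecutionId, String kind, long cpuMicros) {
            this.jobExecutionId = jobExecutionId;
            this.kind = kind;
            this.cpuMicros = cpuMicros;
        }
    }

    /** Sums CPU per job execution, counting only records whose kind is in kindsToSum. */
    public static Map<Long, Long> totalCpuByExecution(List<BatchRecord> records,
                                                      Set<String> kindsToSum) {
        Map<Long, Long> totals = new TreeMap<>();
        for (BatchRecord r : records) {
            if (kindsToSum.contains(r.kind)) {
                // merge() starts a new total or adds to the existing one
                totals.merge(r.jobExecutionId, r.cpuMicros, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<BatchRecord> records = List.of(
                new BatchRecord(42, "JOB", 1_000),
                new BatchRecord(42, "STEP", 5_000),
                new BatchRecord(42, "PARTITION", 20_000),
                new BatchRecord(42, "PARTITION", 22_000),
                new BatchRecord(43, "JOB", 800));

        // Assumption: job, step, and partition records do not overlap in the CPU they report.
        Map<Long, Long> totals =
                totalCpuByExecution(records, Set.of("JOB", "STEP", "PARTITION"));
        System.out.println(totals); // prints {42=48000, 43=800}
    }
}
```

This simple sum only makes sense when all the records come from servers on comparable hardware.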

Why doesn’t Liberty do that for you? Well, remember that partitions can spread across multiple servers, which could be located on different z/OS images, which could be running on different physical hardware. To calculate the correct ‘total’ CPU for the job you need to normalize the different CPU times. Do you trust us to do that right? Didn’t think so….
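If you do want to try the normalization yourself, the arithmetic might look something like the sketch below. It is a minimal illustration only: the system names, the speed factors, and the idea of a single relative-speed number per system are all assumptions. Where those factors come from, and whether one number per machine is good enough, is exactly the judgment call described above.

```java
import java.util.Map;

public class NormalizedCpu {

    /**
     * Scales raw CPU time by a relative speed factor for the system that ran the work.
     * The factor table is hypothetical; you would supply values you trust for your hardware.
     */
    public static double normalize(long cpuMicros, String systemName,
                                   Map<String, Double> speedFactor) {
        Double factor = speedFactor.get(systemName);
        if (factor == null) {
            throw new IllegalArgumentException("No speed factor for system " + systemName);
        }
        return cpuMicros * factor;
    }

    public static void main(String[] args) {
        // Assumed for illustration: SYSB's processors do 1.5 times the work of SYSA's
        // per unit of CPU time, so SYSA is the reference system with factor 1.0.
        Map<String, Double> speedFactor = Map.of("SYSA", 1.0, "SYSB", 1.5);

        double total = normalize(20_000, "SYSA", speedFactor)
                     + normalize(10_000, "SYSB", speedFactor);
        System.out.println(total); // prints 35000.0 (in SYSA-equivalent microseconds)
    }
}
```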