Performance management is a key focus of any platform. A system has several resources (CPU, I/O, storage and networks) that work together to process a job. To assess the system’s overall health, mainframe system reports can be reviewed using key performance metrics for these resources. These metrics are measured and compared against service level agreements (SLAs) or rule-of-thumb performance standards. An SLA is a contract between the user and the system that establishes specific goals for business-critical workloads. If the results fall short of those goals, mission-critical workloads running on the system will usually suffer. Possible remedies include tuning performance to bring the metrics back to baseline standards, buying more resources, offloading eligible workloads to specialty engines, or adjusting priorities to take resources from less-critical workloads.
Types of Workloads
First, let’s look at the typical workloads that run on mainframe systems. They fall into two flavors, batch processing and online transactional processing:
- Batch workloads: Typically submitted as batch jobs, these process high volumes of data and produce outputs or reports. Batch workloads are usually scheduled programs that run without user interaction, often during off-peak hours. One example of a batch workload is a mainframe job that generates a large number of customer billing statements or processes customer orders.
- Transactional workloads: Typically involve end-user interactions. These transactions are usually short and are often considered mission-critical workloads for the business. A few examples of transactional workloads are bank ATM transactions, merchant credit card processing at checkout stations and online order purchases.
Below are some of the key metrics that can be used to gauge system performance. Mainframe systems regularly capture these metrics and provide the data to performance-monitoring tools, which display it to end users.
Average Throughput
The average throughput is the average number of service completions per unit of time. For example, it might measure the number of transactions per second or per minute. Transactional workloads are typically measured using this performance metric.
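As a minimal sketch of the arithmetic, the snippet below divides a count of completed transactions by the length of the measurement interval. The function name and the sample figures are hypothetical and are not taken from any monitoring product.

```python
def average_throughput(completions: int, interval_seconds: float) -> float:
    """Return the average number of completions per second over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return completions / interval_seconds


# Hypothetical sample: 18,000 transactions completed in a 5-minute interval.
tps = average_throughput(18_000, 5 * 60)
print(f"{tps:.1f} transactions per second")  # 60.0 transactions per second
```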
Average Response Time
The average response time measures the average amount of time it takes to complete a single service. This metric is typically used to measure transactional workloads. It can also be used to set SLA goals for workloads.
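The sketch below averages a handful of per-transaction response times and compares the result with a response time goal. The sample values, the goal and the function names are assumptions made up for illustration.

```python
from statistics import mean
from typing import Sequence


def average_response_time(response_times: Sequence[float]) -> float:
    """Return the mean response time, in seconds, of the sampled transactions."""
    if not response_times:
        raise ValueError("at least one response time is required")
    return mean(response_times)


def meets_goal(response_times: Sequence[float], goal_seconds: float) -> bool:
    """Return True when the average response time is within the goal."""
    return average_response_time(response_times) <= goal_seconds


# Hypothetical response times, in seconds, for four transactions.
samples = [0.25, 0.5, 0.25, 1.0]
print(average_response_time(samples))           # 0.5
print(meets_goal(samples, goal_seconds=0.75))   # True
```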
Resource Utilization
Resource utilization measures how long a resource was busy. For example, it might measure CPU utilization, processor storage utilization, I/O rates or paging rates. This metric typically shows how much time a workload, whether batch or transactional, spends on resources over a period of time.
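For a resource such as a CPU, utilization can be sketched as busy time divided by the length of the measurement interval. The function name and the figures below are hypothetical.

```python
def utilization_percent(busy_seconds: float, interval_seconds: float) -> float:
    """Return the share of the interval, as a percentage, that the resource was busy."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if not 0 <= busy_seconds <= interval_seconds:
        raise ValueError("busy_seconds must lie between 0 and interval_seconds")
    return 100.0 * busy_seconds / interval_seconds


# Hypothetical sample: a CPU that was busy for 540 seconds of a 15-minute interval.
print(utilization_percent(540, 15 * 60))  # 60.0
```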
Resource Velocity
Resource velocity is a measure of resource contention. When multiple workloads require one resource (e.g., CPU) at the same time, they contend for it. While one workload is using the resource, the other workloads wait in a queue. Resource velocity is the ratio of the time a workload spends using the resource (A) to the sum of that time (A) and the time it spends waiting in the queue (B). The formula for resource velocity then becomes: A / (A + B)
This value is expressed as a percentage ranging from zero to 100. A value of zero represents a high amount of contention for a resource, whereas a value of 100 means there’s no contention. This metric can also be specified as an SLA goal for workloads.
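The A / (A + B) ratio can be sketched as follows, scaled to the 0 to 100 range described above. The function name and the sample times are hypothetical.

```python
def resource_velocity(using_time: float, waiting_time: float) -> float:
    """Return velocity as a percentage: 100 * A / (A + B).

    using_time   -- time spent using the resource (A)
    waiting_time -- time spent waiting in the queue for the resource (B)
    """
    if using_time < 0 or waiting_time < 0:
        raise ValueError("times must be non-negative")
    total = using_time + waiting_time
    if total == 0:
        raise ValueError("no using or waiting time was recorded")
    return 100.0 * using_time / total


# Hypothetical sample: 30 seconds using the CPU, 10 seconds waiting for it.
print(resource_velocity(30, 10))  # 75.0
# No waiting at all means no contention.
print(resource_velocity(30, 0))   # 100.0
```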
Performance Index
As part of an SLA, workloads are classified into service classes, and each service class has a goal. Goals for workloads can be expressed as response time, velocity, etc. Because an SLA defines several types of goals for various workloads, a single metric, the performance index (PI), is used to determine how workloads are performing with respect to their defined goals. PI is a ratio of the defined goal and the achieved result, arranged so that a lower value is better: for a response time goal it is the achieved response time divided by the goal, and for a velocity goal it is the goal velocity divided by the achieved velocity. A PI value of exactly one means a workload is meeting its goal, a value of less than one means it is exceeding its goal and a value greater than one means it is missing its goal.
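The sketch below computes a PI for the two goal types mentioned above. It assumes the convention used by z/OS Workload Manager, in which the ratio is arranged so that a PI below one means the goal is being exceeded and a PI above one means it is being missed. The function names and the sample goals are hypothetical.

```python
def pi_response_time(goal_seconds: float, achieved_seconds: float) -> float:
    """PI for a response time goal: achieved / goal (a slower response raises the PI)."""
    if goal_seconds <= 0:
        raise ValueError("goal_seconds must be positive")
    return achieved_seconds / goal_seconds


def pi_velocity(goal_percent: float, achieved_percent: float) -> float:
    """PI for a velocity goal: goal / achieved (a lower velocity raises the PI)."""
    if achieved_percent <= 0:
        raise ValueError("achieved_percent must be positive")
    return goal_percent / achieved_percent


def describe(pi: float) -> str:
    """Translate a PI value into a goal status under the assumed convention."""
    if pi < 1:
        return "exceeding goal"
    if pi > 1:
        return "missing goal"
    return "meeting goal"


# Hypothetical service class with a 0.5-second goal that achieved 0.75 seconds.
pi = pi_response_time(goal_seconds=0.5, achieved_seconds=0.75)
print(pi, describe(pi))  # 1.5 missing goal

# Hypothetical service class with a velocity goal of 40 that achieved 50.
pi = pi_velocity(goal_percent=40, achieved_percent=50)
print(pi, describe(pi))  # 0.8 exceeding goal
```

Expressing every goal type on the same scale is what allows service classes with different kinds of goals to be compared side by side.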
Mainframe systems have several resources that work together to process mission-critical workloads. How well these resources are used ultimately determines the performance of your system. The five performance metrics outlined above will aid in assessing the health of your system.
Hemanth Rama is a senior software engineer at BMC Software with over 11 years of experience in IT. He holds one patent and two pending patent applications. Hemanth has led several projects, and also works on BMC Mainview for z/OS, CMF Monitor and Sysprog Services product lines. He recently began working on Intelligent Capping for zEnterprise (iCap), a product which optimizes MLC costs. Hemanth holds a master’s degree in computer science from Northern Illinois University. He writes regularly on LinkedIn Pulse, BMC communities and his personal blog.