I have been closely involved in mainframe capacity management for more than four decades, as a performance analyst and as an author of commercial performance and capacity management products. I continue to be amazed by the mystique and astounding complexity attributed to the job of ensuring that business workload performance requirements are met. I am even more perplexed by the human and machine resources frequently allocated to optimizing and conserving hardware resource usage.
Capacity and Workload Management
Capacity in and of itself is multidimensional. In most cases CPU capacity is only one dimension of the overall capacity-management process. For many workloads the availability of sufficient I/O bandwidth and central storage is equally, if not more, important.
Some fundamental facts underlying workload performance management have remained constant over time. For example, making more CPU capacity available to a critical business transaction that is I/O intensive isn’t likely to improve its performance. You need a basic understanding of how the really important workloads utilize the primary physical resources (CPU, central storage and I/O bandwidth). This information alone will enable you to approximate how a particular workload is likely to behave in a resource-constrained environment.
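That approximation can be made concrete with simple Amdahl-style arithmetic: only the CPU portion of a transaction benefits from added CPU capacity. The sketch below uses hypothetical per-transaction timings (the numbers and the function are illustrative, not from any real workload) to show why extra CPU barely helps an I/O-intensive transaction.

```python
# Hedged sketch (hypothetical numbers): estimate the response-time benefit of
# extra CPU capacity for two workloads with different resource profiles.
# Amdahl-style reasoning: only the CPU portion of a transaction speeds up.

def speedup_from_cpu(cpu_seconds: float, io_seconds: float, cpu_factor: float) -> float:
    """Ratio of old to new per-transaction time when CPU becomes cpu_factor times faster."""
    old = cpu_seconds + io_seconds
    new = cpu_seconds / cpu_factor + io_seconds
    return old / new

# An I/O-intensive transaction: 0.01 s of CPU, 0.19 s of I/O wait
io_bound = speedup_from_cpu(0.01, 0.19, cpu_factor=2.0)
# A CPU-intensive transaction: 0.15 s of CPU, 0.05 s of I/O wait
cpu_bound = speedup_from_cpu(0.15, 0.05, cpu_factor=2.0)

print(f"I/O-bound speedup: {io_bound:.2f}x")   # barely improves (~3 percent)
print(f"CPU-bound speedup: {cpu_bound:.2f}x")  # substantial improvement
```

Doubling CPU speed buys the I/O-bound transaction almost nothing, while the CPU-bound one improves markedly; knowing which profile a workload has tells you which capacity dimension is worth buying.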
What defines a capacity problem? Is running all available processors at 100 percent utilization a bad thing? Is it even a problem? What about all that metric data produced by commercial performance monitors? Do you know what those metrics mean? Should you even care?
Capacity management is ultimately about ensuring reasonable performance goals are met and consequently that an organization’s requirements are satisfied. There are many reasons to care about processor utilization being pegged at 100 percent, but if current business requirements are being satisfied you might not care at all.
Assuming you have assigned reasonable relative importance levels to coexisting workloads, the z/OS workload manager will manage workload access to physical resources quite effectively without any outside help. When the important workloads have been appropriately categorized and a particular workload is still not achieving its performance goals, the root cause will in all likelihood be insufficient physical resources. Other factors can, of course, come into play, but most often it is simply that the workload requires more resources than are available to achieve the stated performance goal.
Is Managing Hardware Resources Worth the Effort?
In the early days of commercial computing, hardware resources were so expensive that only the largest organizations could afford them. The cost of CPU capacity was typically measured in millions of dollars per MIPS, central storage in dollars per byte, and so on. Furthermore, the scalability characteristics of these computer systems were very limited in every dimension: 1 MIPS or less of CPU capacity, 1 MB or less of central storage, etc. In this world—the one in which performance monitors were born—it made sense to conserve and optimize hardware resource utilization at the expense of application workload performance.
Fast forward to today’s reality where massively scalable processor complexes are composed of physical resources that cost a fraction of what they did a decade ago. Does it make sense to expend scarce and costly people resources to squeeze the last ounce of capacity out of currently allocated computing hardware? Not really.
Today, people and software account for the lion's share of IT budgets. More often than not it's more cost effective for a business to add hardware capacity than to invest significant people resources in an attempt to rationalize current resource consumption levels. At the end of the day, performance management is about satisfying the workload performance requirements of an organization. One needs to avoid getting lost in the details and look at the big picture. In cases where a clear, quantifiable and real cost results from failing to meet workload performance goals, it may well be to the organization's overall financial benefit to add physical capacity (in one or more dimensions) rather than claw through report after report in an attempt to identify ways to free up or reallocate current hardware resources.
Effective Use of Performance Management Tools
Knowing which workloads are really critical to an organization’s success enables you to concentrate performance management activities on those things that matter. Knowing what resources those workloads proportionally consume on a per-unit-of-work basis can give you a pretty good idea of how they will perform during periods of resource constraint. This also enables you to establish a rough baseline resource profile that can be referenced to diagnose future performance problems should they arise. This is an important point worth emphasizing. Indeed there’s an old diagnostician’s axiom that states, “If you don’t know what it looks like when it’s working then you have little hope of understanding what’s wrong when it isn’t.”
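A baseline resource profile like the one described above can be as simple as a handful of per-unit-of-work numbers checked against current measurements. The sketch below is illustrative only; the metric names, values and tolerance are assumptions, not real workload data.

```python
# Hedged sketch (hypothetical metric names and values): compare current
# per-unit-of-work resource consumption against a recorded baseline to
# flag which dimension of a workload's profile has drifted.

def profile_deviations(baseline: dict, current: dict, tolerance: float = 0.25) -> dict:
    """Return metrics whose current value deviates from baseline by more
    than the given fractional tolerance, with their fractional drift."""
    flagged = {}
    for metric, base in baseline.items():
        now = current.get(metric, 0.0)
        drift = (now - base) / base
        if abs(drift) > tolerance:
            flagged[metric] = round(drift, 2)
    return flagged

# Baseline profile for a critical workload, per unit of work (illustrative only)
baseline = {"cpu_ms": 12.0, "io_requests": 40.0, "storage_mb": 6.0}
current  = {"cpu_ms": 12.5, "io_requests": 95.0, "storage_mb": 6.1}

print(profile_deviations(baseline, current))  # I/O per transaction has ballooned
```

Here the comparison points straight at I/O consumption as the dimension that no longer "looks like it does when it's working," which is exactly the diagnostic value of keeping a baseline.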
This leads to the topic of performance tools—more specifically software performance monitors. Over the years, traditional performance monitors have become increasingly complex and verbose, producing vast volumes of data and precious little actionable information. Intense competition among vendors of these tools led to feature wars, which in most cases have resulted in products that generate more metric data than any human could hope to transform into actionable information.
Clearly, some significant filtering and distillation of this data is required. When deciding which metrics should be captured, you need to ask: “Will the value of this metric in any way impact the decisions I make relative to the performance of the workloads my organization really cares about?” If the answer is “yes” then by all means collect and analyze it in context with other equally well chosen metrics; otherwise don’t bother to collect it at all.
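That collection-time test can be expressed as a simple predicate. The sketch below assumes a hypothetical metric catalog and workload names; it is a minimal illustration of filtering at the source rather than drowning in data at analysis time.

```python
# Hedged sketch (hypothetical catalog): collect only metrics that inform
# decisions about the workloads the organization really cares about.

PRIORITY_WORKLOADS = {"ORDER_ENTRY", "PAYMENTS"}

def worth_collecting(metric: dict) -> bool:
    """A metric earns collection only if it is tied to a priority workload
    and could actually influence a performance decision."""
    return metric["workload"] in PRIORITY_WORKLOADS and metric["actionable"]

catalog = [
    {"name": "cpu_per_txn", "workload": "ORDER_ENTRY",   "actionable": True},
    {"name": "spool_lines", "workload": "BATCH_REPORTS", "actionable": False},
    {"name": "io_rate",     "workload": "PAYMENTS",      "actionable": True},
]

collected = [m["name"] for m in catalog if worth_collecting(m)]
print(collected)  # ['cpu_per_txn', 'io_rate']
```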
Performance management tools are just that—tools. Use them to establish baseline performance profiles for your key workloads, and to track them over time. But strive for simplicity. Track only those metrics germane to understanding the behavior of your organization’s “priority 1” workloads. And don’t be distracted by reams of irrelevant (non-actionable) metric data that will probably require hours to analyze.
Final Thoughts and Advice
The bottom line is that performance management can be as complex or as simple as you choose to make it. It's extremely important to maintain perspective. Know how the cost of hardware resources compares with that of people, because this will assist in making quality trade-off decisions. Know what the organization's critical workloads are and ensure their relative importance is communicated to the z/OS workload manager. And invest the time required to identify the performance metrics relevant to those workloads.
Focusing on the workloads fundamental to an organization’s success and on the key metrics to monitor their behavior will simplify the performance-management process, increase the likelihood of positive results, and lower operational cost as well.
Ron Higgin is an independent software consultant at RHCS. He has more than 40 years of experience working with enterprise-class computing systems, and most of his career has been dedicated to the design and development of large-scale mainframe software systems, infrastructure and products.