Editor’s note: This article is based on content initially published in the SHARE President's Corner blog.
The term “tip of the iceberg” refers to a small, evident part or aspect of something that is largely hidden. At this early stage, it is an ideal term to describe IBM System z Advanced Workload Analysis Reporter (IBM zAware).
IBM zAware is a program designed to automatically assist in mainframe troubleshooting. It analyzes minute details in system logs, as well as other data relevant to system and application performance, to isolate problems and help discover the causes of anomalous behavior. It sifts through vast volumes of messages, comparing message traffic during known-good operations with message traffic during the current time period. When performance or reliability issues occur, IBM zAware points operators to any unusual message traffic. In this respect, zAware is essentially an analytic, message-based troubleshooter. This program is exciting from a number of angles:
It can help IT managers identify the root cause of a problem faster than traditional trial-and-error guesswork. It’s especially useful for identifying needle-in-a-haystack problems.
It can help keep degradation from snowballing and impacting the system further. It does this by flagging traffic that looks unusual, helping programmers quickly focus on that behavior.
It can improve operational efficiency by showing mainframe managers where problems reside. These managers then spend less time trying to find problems and more time fixing them.
IBM zAware opens the door to more automated management and analysis programs that will greatly simplify mainframe management over time. For instance, its XML output is now consumed by Tivoli, which can potentially extend troubleshooting further. In addition, NetView can use IBM zAware data to improve problem determination.
This fourth point is very important. Today’s mainframes generate a lot of message/status traffic—far more than human managers can track and analyze. So, with IBM zAware, IBM has applied its vast analytics knowledge to message-traffic analysis—an effort that will initially streamline mainframe troubleshooting but has the potential to be expanded across the entire mainframe management environment.
In short, analytics-driven problem analysis could in the future be used to streamline configuration management, to tune applications (advanced application performance management), to support security deployment and operations, and more. The net result could be that someday, by using IBM zAware and related analytics programs, the mainframe could practically manage itself, with very little human involvement. (Mainframe managers wouldn’t “go away”; instead, they would take on new tasks that align business operations with the underlying technology to achieve new levels of efficiency in service delivery.)
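To make the comparison described above more concrete (message traffic during known-good operations versus the current period), here is a minimal Python sketch of the general idea: build a per-message baseline from known-good intervals, then rank the messages in the current interval by how far their counts depart from it. This is not IBM zAware's actual algorithm, which this article doesn't describe. The function names, the message IDs, and the scoring rule (a simple standard-deviation distance, with never-before-seen messages ranked first) are all hypothetical choices for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def build_baseline(good_intervals):
    """Return {message_id: (mean_count, std_dev)} over known-good intervals.

    Each interval is a list of message IDs observed during that period
    (a hypothetical input format chosen for this sketch).
    """
    per_interval = [Counter(interval) for interval in good_intervals]
    all_ids = set().union(*per_interval)
    baseline = {}
    for msg_id in all_ids:
        # Counter returns 0 for IDs absent from an interval.
        series = [counts[msg_id] for counts in per_interval]
        baseline[msg_id] = (mean(series), pstdev(series))
    return baseline

def rank_anomalies(baseline, current_interval, min_std=1.0):
    """Rank messages in the current interval, most unusual first."""
    scores = {}
    for msg_id, count in Counter(current_interval).items():
        if msg_id not in baseline:
            # Never seen during known-good operation: treat as most unusual.
            scores[msg_id] = float("inf")
        else:
            mu, sigma = baseline[msg_id]
            # Floor the std dev to avoid dividing by zero when a count never varied.
            scores[msg_id] = abs(count - mu) / max(sigma, min_std)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical message IDs: three known-good intervals and one current interval.
good = [
    ["MSG001"] * 10 + ["MSG002"] * 2,
    ["MSG001"] * 12 + ["MSG002"] * 2,
    ["MSG001"] * 11 + ["MSG002"] * 3,
]
current = ["MSG001"] * 11 + ["MSG002"] * 2 + ["MSG999"] * 5

ranked = rank_anomalies(build_baseline(good), current)
print(ranked[0])  # ('MSG999', inf): the message never seen in the baseline
```

The unfamiliar message rises to the top while routine traffic scores near zero, which is the "point operators to unusual traffic" behavior in miniature. A production tool would also have to account for messages that vanish from the current interval and for normal time-of-day patterns; this sketch ignores both.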
How Is IBM zAware Being Used?
Unfortunately, few customers have enough experience with IBM zAware yet to provide many use case scenarios. The primary reason is that the product was released only recently, and it’s necessary to capture at least 90 days of system information in order to establish a baseline for “normal operations.” Customers are just starting to deploy IBM zAware now.
Still, IBM was able to provide me with a few use cases based on internal usage. The first is based on a development System z deployed in Poughkeepsie, NY. In this case, an IBM IMS database was having a problem. The problem message informed database managers that the definition of a data file was missing.
This particular problem was generating a lot of message traffic, but the cause was unknown. IBM zAware had been deployed on this system during its development, so the data it had gathered gave it a good understanding of what the environment looked like when operating correctly.
By analyzing the difference between the known-good configuration data and the problem at hand, IBM zAware was able to show IT managers that operators had made a configuration mistake when reconfiguring Virtual Telecommunications Access Method (VTAM), the subsystem that implements network communications.
If this problem hadn’t been isolated by IBM zAware, it had the potential to cascade, possibly causing a VTAM communications failure. As mainframe operators know, failures are unacceptable on mainframes. So, in this case, IBM zAware helped identify a potential failure that could have caused a major mainframe communications outage.
IBM also provided another internal scenario: a Lightweight Directory Access Protocol (LDAP) server failure. In this case, an LDAP directory server in a test environment would ABEND (abnormally end) and then restart. IBM zAware was used to isolate the cause of this failure and found a message that described the root of the problem. Systems administrators had overlooked this message, but IBM zAware was able to “highlight” that this particular LDAP server needed to be reconfigured with more storage so that the problem could be debugged.
Although two use cases aren’t much to go on, they do represent a start. They show how answers to problems can be found by analyzing message traffic, and how IBM zAware can sift through mountains of that traffic to quickly identify problems, leading to more rapid problem resolution.
The Bottom Line
When problems occur, systems administrators need to get a handle on the source of the problem and be positioned to fix it rapidly. And some mainframe shops might not have the right skills in the right place to perform in-depth root-cause problem determination. This is why a tool such as IBM zAware is so valuable: It saves time while helping to address skills-shortage issues.
Based on what we’ve seen to date, and what I believe the future of this product could be, I think we’ve only seen the tip of the iceberg when it comes to machine-driven advanced analytics management tools.
Joe Clabby is a 30-plus year veteran of the IT industry with experience in sales, marketing and research/analysis. He currently focuses on consolidation, virtualization and provisioning of IT resources.