
Monitors, Analytics, and AI ... Oh My!

  

Delivering high-quality service to end users is the primary mission of IT organizations.  This imperative grows steadily more important as the world becomes ever more entrenched in a technology-driven digital economy.  Thankfully, more tools than ever before are available to the systems programmers, admins, and operators tasked with maintaining SLAs.  But what are all these tools, and what capabilities does each provide to help deliver the expected quality of service?  I divide these tools into the following broad categories: Monitors, Analytics, and Artificial Intelligence.

 



Monitors

Monitors come in two flavors: performance monitors and application performance monitors (APMs).  System performance monitoring is a well-established need.  Market-leading products such as OMEGAMON have a long history of providing real-time insight into system performance.  The metrics observed are largely resource data: feeds, speeds, and utilizations of the related infrastructure components.  Performance monitors are robust data collectors, with a large number of metrics available to understand every aspect of operating system and subsystem performance.  Collected metrics can also be retained as history, enabling trending that is useful when comparing current performance to a baseline.  The characteristic that distinguishes performance monitors from other tools is real-time access to data.  Not only can data be displayed, but alerts and actions based on the evaluation of real-time metrics are possible.  These capabilities make performance monitors the tool of choice for operations war-room calls where service must be restored immediately.
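To make the idea of alerts and actions driven by real-time metrics concrete, here is a minimal sketch in Python.  It is not how OMEGAMON or any other product implements alerting; the AlertRule class, the evaluate function, and the metric name cpu_busy_pct are all hypothetical names I made up for illustration.  The point is simply that each rule is defined ahead of time with a fixed threshold and an action to run when the threshold is crossed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """A predefined rule: fire when a metric exceeds a fixed threshold."""
    metric: str                            # hypothetical metric name, e.g. "cpu_busy_pct"
    threshold: float                       # value above which the rule fires
    action: Callable[[str, float], None]   # what to do when it fires


def evaluate(sample: dict[str, float], rules: list[AlertRule]) -> list[str]:
    """Check one real-time sample against every rule and run matching actions.

    Returns the names of the metrics whose rules fired.
    """
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        # Metrics missing from the sample are skipped rather than treated as zero.
        if value is not None and value > rule.threshold:
            rule.action(rule.metric, value)
            fired.append(rule.metric)
    return fired
```

A rule such as AlertRule("cpu_busy_pct", 90.0, notify), where notify is whatever function pages the operator, would fire on a sample of {"cpu_busy_pct": 95.0} and stay quiet on {"cpu_busy_pct": 50.0}.  Note that someone had to decide on the 90.0 in advance.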

Application Performance Monitors are a much later addition to the arsenal.  They focus on understanding application health and responsiveness across the entirety of an enterprise application.  Application architectures now span multiple systems of dissimilar types, both cloud and on-prem, with internal and external network connections.  Representing the entirety of a complex application on a single pane of glass is where APMs shine.  While many APMs support distributed platforms, very few support z/OS environments.  This is where IBM Z Application Performance Management Connect and IBM Observability by Instana Application Performance Monitoring on z/OS_1.1 fit into the puzzle.  These products track the execution of an application through mainframe systems and subsystems, in addition to off-platform nodes in the application topology.

The majority of business transactions executed today rely on mainframe backend processing, making insight into a workload’s execution on z/OS a critical component of an APM.

 

Analytics

Analytics engines are very powerful tools.  They allow for searching and visualization of vast quantities of data, turning that data into useful information.  In recent years, these tools have been used to provide insight into IT operational data.  Modern analytics tools simplify the use of ad hoc queries, lessening the reliance on canned reports.  Dashboards and custom visualizations tailored to specific people and purposes are easily created.  The technology that has enabled analytics engines to become valuable IT operations tools is known as data streaming.  Data streaming collects key systems information, system and application logs, and user-specific data.  The streamer then delivers this data directly to your analytics platform.  This takes place in what we refer to as ‘near real-time’.  Armed with data that was traditionally not available until the following day, operators and IT analysts can respond to system and application slowdowns just moments after they occur.  The paradigm shift from traditional historical data to near real-time is profoundly important when considering today's emphasis on system availability.  Waiting until tomorrow for diagnostic data is simply no longer acceptable.  IBM Z Operational Log and Data Analytics is IBM's latest entry in this arena.  It provides both the data streaming component and analytics visualization capabilities.

Additionally, this solution is fully integrated with IBM’s artificial intelligence solution focused on operational data.
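For readers who want to picture what an ad hoc query over streamed data looks like, here is a minimal sketch in Python.  It assumes, purely for illustration, that the streamer delivers one JSON record per line with an app field and a response_ms field; those field names and the slow_transactions function are my own inventions, not the record format or API of IBM Z Operational Log and Data Analytics or any other product.

```python
import json
from collections import defaultdict
from typing import Iterable


def slow_transactions(lines: Iterable[str], threshold_ms: float = 500.0) -> dict[str, int]:
    """Ad hoc query: count slow transactions per application.

    `lines` is any iterable of JSON strings, so it can be a list of
    records already received or a live feed that yields them as they arrive.
    """
    counts: defaultdict[str, int] = defaultdict(int)
    for line in lines:
        record = json.loads(line)
        # "app" and "response_ms" are assumed field names for this sketch.
        if record["response_ms"] > threshold_ms:
            counts[record["app"]] += 1
    return dict(counts)
```

Because the function only needs an iterable, the same question can be asked of the last few minutes of data as easily as of yesterday's, which is the practical difference near real-time delivery makes.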

 

Artificial Intelligence

So much data, so little time!  The mainframe has always been rife with management data.  The piece that was lacking was a timely, efficient mechanism for processing all of that management data.  As discussed in the Analytics section, harvesting data from System z is now easier than ever.  One of the principal targets of that data is artificial intelligence.  Using AI to address or prevent problematic system behavior has many benefits.  To begin with, what better mechanism to process the enormous quantity of systems management data available than a computer!  Performance analysts have been awash in data, forced to choose manually which data to evaluate.  With AI driving the process, substantially greater amounts of systems management data can be evaluated in problem diagnosis.  Applying AI to IT operations in this way is commonly referred to as AIOps.

AIOps is dynamic in nature, which differentiates it from traditional systems management disciplines.  Performance monitors require predefined alert mechanisms to be manually codified to generate events indicating a system problem.  With AIOps, the ability to ‘know the unknown’ removes the limitation of only being able to prepare for the problems of the past.  For a deeper understanding of this topic, see Patrick Chan’s blog Know your anomalies and be friends with them!
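To illustrate the difference between a predefined alert and a dynamic one, here is a deliberately simple sketch in Python.  It uses a rolling z-score, a basic statistical technique I have swapped in for illustration; real AIOps products use far more sophisticated models, and this is not how any IBM solution works.  The function name find_anomalies and its parameters are hypothetical.  The key idea is that nobody codes a threshold by hand: "normal" is learned from the metric's own recent history.

```python
from statistics import mean, stdev


def find_anomalies(values: list[float], window: int = 20, z_limit: float = 3.0) -> list[int]:
    """Return the indexes of values that deviate sharply from recent history.

    Each value is compared with the mean and standard deviation of the
    `window` values immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            anomalies.append(i)
    return anomalies
```

Fed a metric that has hovered between 50 and 51 for its last twenty samples, this sketch flags a sudden reading of 90 without anyone having decided in advance that 90 was a problem.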

 

Summary

After reading this blog, I hope the old adage ‘The right tool for the job’ comes to mind.  All of the above technologies play a role in the management of a highly available, resilient IT environment.  Here we have lightly touched on the capabilities found in performance monitors, analytics engines, and AIOps.  In subsequent blogs, we will explore these topics more deeply, applying the available functionality to specific use cases.  In the meantime, for a look at how some of the solutions mentioned in this blog can be applied to monitoring z/OS Connect, check out z/OS Connect Monitoring in 3 minutes on my YouTube Channel Wayne_Z_World.

Comments

Wed April 20, 2022 11:48 AM

Great overview Wayne!!!  I'm looking forward to the use case blogs!!