Observability – game-changing monitoring via OMEGAMON Data Provider, including new functions


I talk to a lot of customers about z/OS monitoring. Something that doesn’t get old is sharing screenshots of metrics on a variety of analytics platforms. Existing Enhanced 3270 user interface and Tivoli Enterprise Portal workspaces might show data in spreadsheet form or as bar graphs with some coloring. That data is limited to a span of hours, up to a maximum of about a week. To see more, you used to need a data lake, such as Tivoli Data Warehouse. Not anymore. With OMEGAMON Data Provider actively streaming data to a wide variety of analytics platforms, a tremendous variety of graphing options is available, covering anything from moments to years, depending on the storage choices made within the analytics platforms.

Observability is about “seeing”. It’s hard to see an anomaly or trend by looking at raw data in a spreadsheet or across a screen. Through analytics platforms, it’s easy to spot trends via different graphing models. Even better, those analytics platforms might raise an alert when they see an anomaly, for example a metric that drifts more than a standard deviation away from its baseline. These capabilities can greatly improve the time to resolution for a problem. They can also surface a potential problem sooner and help a business resolve it before there is a true disruption. Preventing a problem is far less expensive than detecting a problem that has already occurred. I do need to call out the role of situations as another measure of improved observability. Situation processing has existed for years within OMEGAMON. A business can set up situations to trigger alerts when a previously defined abnormality begins to occur. There is nothing to look at, but an alert is sent, which speeds both observability and time to resolution.
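To make the standard deviation idea concrete, here is a minimal Python sketch of the kind of check an analytics platform might run. It is not how any particular platform or OMEGAMON itself implements alerting. The function name, the window size, the threshold of three standard deviations and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return the indexes of samples that stray from the recent baseline.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` samples that precede it.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has no spread to compare against; skip it.
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical CPU-busy percentages, one per collection interval.
cpu_percent = [41, 43, 42, 44, 42, 43, 41, 90, 43]
print(flag_anomalies(cpu_percent))
```

With these made-up values, only the spike to 90 (index 7) is flagged. Comparing each sample against a rolling window, rather than against the whole history, lets the baseline follow normal shifts in workload.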

Here's a visual observability example where no pre-defined situation exists. In this first screenshot, the view is of private storage for a number of different jobs running within a system. At the bottom of the screen is a line that appears to be growing linearly over time. And time is an important element here. Existing or traditional monitoring user interfaces often show only a small fraction of time, such as minutes, in which the growth is not observable. The time span shown here is considerably longer.

In image 2, we've isolated the single line for that job, which also changes the scale. It's much clearer that storage is obtained, used, and then most of it, but not all, is released.

Reducing the time interval further, it becomes even clearer that there is a storage leak within this job. Fortunately, this particular monitoring was done in a development environment. The job in question had recently been updated and placed in a test environment. The code was corrected before it was put into production. Monitoring code in your DevOps environment is just as important as monitoring it in your production environment.
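The pattern in these images, where storage is obtained and then mostly but not entirely released, can also be checked numerically. The following Python sketch fits a least-squares line through the low points of a series, that is, the storage still held after each release. It is only an illustration under stated assumptions: the function names, the `min_growth` threshold and the sample values (in hypothetical megabytes) are invented, and none of it comes from OMEGAMON or any analytics platform.

```python
def troughs(samples):
    """Return the local minima: the storage still held after each release."""
    return [
        samples[i]
        for i in range(1, len(samples) - 1)
        if samples[i - 1] > samples[i] <= samples[i + 1]
    ]


def growth_per_interval(samples):
    """Least-squares slope of the samples, in units per step."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples, min_growth=0.5):
    """True when the post-release low points keep climbing."""
    lows = troughs(samples)
    if len(lows) < 2:
        return False  # not enough release cycles to judge
    return growth_per_interval(lows) > min_growth


# Hypothetical private storage readings (MB), one per collection interval.
leaky_job = [100, 140, 104, 144, 108, 148, 112, 152]
healthy_job = [120, 150, 120, 150, 120, 150]
print(looks_like_leak(leaky_job), looks_like_leak(healthy_job))
```

Fitting the line through the low points rather than through every sample is a deliberate choice: a healthy job that obtains and fully releases storage produces a sawtooth too, and only the rising floor separates it from a leak.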

The skills necessary to build the graphing models apply to all platforms and a wide variety of tools. This is not a secret science, nor is it unique to the mainframe. The capabilities are fundamentally similar to graphing within a spreadsheet application. Once the data starts streaming, choose the Key Performance Indicators (KPIs) within the data and start building new views. Line graphs, bar graphs, speedometers/tachometers, pie charts and more. All with the goal of visualizing changes in baseline system performance.

New Function Updates: APAR OA63539, PTF UJ09309

All of the above capabilities have been part of OMEGAMON Data Provider since its first deliverable. Now, with the general availability of the fourth “new function” APAR (OA63539, PTF UJ09309), there are some new capabilities and improved documentation to get you on your way to exploiting this form of observability.

New data, across many agents, now streaming

Each of the OMEGAMON agents has been streaming through OMEGAMON Data Provider since June. In the meantime, some agents have delivered functional updates, and the data from each of those updates needs to be streamed as well. Some of those updates include the following: OMEGAMON for CICS 5.6 was released, coincident with CICS Transaction Server 6.1. The new Program Trace Facility data can now be streamed. OMEGAMON for Db2 Performance Expert 5.5 was released and is required to monitor Db2 V13. New data attributes for Db2 are now streamable. OMEGAMON Monitor for z/OS 5.6 was updated to include support for the IBM z16 processor family. This support includes streaming for new encryption capabilities as well as the monitoring of containers running within z/OS Container Extensions. There have been functional enhancements to other agents as well. Each of them has provided streaming updates.

As OMEGAMON agents are updated with new functions, expect to see updates to streaming capabilities, coincident with the agent release or shortly after.

Improvements for streaming to IBM Instana

Support for Instana was delivered with the third new function APAR. That included monitoring of CICS, Db2 and z/OS infrastructure. Instana was recently updated to leverage OMEGAMON to provide information about MQ and Java Virtual Machines (JVMs) as well.

You might ask yourself: why would I need OMEGAMON to plug into Instana? Don’t I get enough information out of the box already? Instana is a distributed application performance monitor. It can look at end-to-end applications, from mobile or desktop devices, through networks, to back-end transaction and database servers on a wide range of platforms. It uses tracing capabilities across the components it monitors. As a result, several views are provided to end users:
a topological diagram of the interconnections between component parts,

an infrastructure diagram, represented like pizza boxes in communities, with some towers representing mainframe and virtualized servers, and

graphics with details from the tracing of the components of the applications.

The analytics within Instana can tell when an application begins to deviate from its baseline behavior. But what happens on a mainframe when an underlying subsystem starts to have issues? Perhaps a buffer pool is constrained, and now ALL applications on the mainframe start to show critical issues. Is it panic time?

Well, with OMEGAMON streaming to Instana, subsystem or infrastructure issues will begin to appear BEFORE applications start to get alerts. A business can correct the infrastructure issues, and the performance of all business applications can continue to meet service level agreements (SLAs). Alongside the displays for components that are analyzed through tracing, a similar display is added for infrastructure monitoring and metrics analysis. This additional capability can drive faster time to resolution of performance problems.

Documentation Improvements

The documentation for OMEGAMON Data Provider was enhanced with this release. Much of the functionality was previously delivered, but new explanations and better diagnostic information are now provided.

While OMEGAMON Data Provider is pretty easy to deploy, there are many moving parts. For example, to successfully stream data, near-term history needs to be turned on via one of the traditional user interfaces, OMEGAMON Data Broker must have its YAML configuration updated to select which data is to be streamed, and OMEGAMON Data Connect needs to direct where the data will be sent. A common question we’ve gotten from customers is: How do I know if I’ve set this all up correctly? A new section of the documentation explains the expected messages, the order in which they should appear, and where you can find them. This makes it easy for a system programmer to tell whether everything is okay or whether additional work is needed to correct a problem.

The documentation for some OMEGAMON Data Provider settings has also been improved. It now answers questions such as:

  1. What does an Interval 0 (zero) really mean?
  2. What if Data Broker is configured to stream certain data, but Data Connect doesn’t identify a target for that data?
  3. What if there are multiple collection settings at different intervals for the same table, but they don’t match the user interface defined collection interval?

Rather than repeating the explanations here, check out the documentation updates for a complete understanding. Suffice it to say, it’s important to understand some of these operational behaviors for proper monitoring.

High availability and scalable performance are key attributes of OMEGAMON Data Provider. The documentation has been updated to show the configuration and settings necessary to operate multiple Data Brokers and Data Connect instances to meet operational goals.

Speedy diagnosis of problems is also important. The Spring Boot server utilized within OMEGAMON Data Provider has a number of log settings for capturing diagnostic information. Rather than point users solely to the open source manuals for Spring Boot, some of that information has been replicated within the documentation for faster and easier operational deployment.

More information about OMEGAMON Data Provider

This blog started with observability. It is a game-changing way of looking at traditional z/OS performance metrics. More data will be made streamable. More starter dashboards will be made available. Stay with this blog and bookmark our “master blog” to stay in touch with the latest happenings for OMEGAMON Data Provider.