
Using OMEGAMON for JVM to diagnose API failures

By Yun Han Li posted Tue August 11, 2020 02:20 AM

  
This blog was originally published on March 28, 2019, by Nigel Williams, IBM Systems Center Montpellier.



Introduction

In the article Monitoring APIs with OMEGAMON for JVM, we provided an overview of how OMEGAMON for JVM can be used to monitor the status of a z/OS Connect EE API workload. In this article, we show some examples of how it can be used to diagnose API failures.

Identifying APIs that are failing

The ‘top level’ OMEGAMON for JVM views allow operations staff to quickly check the health status of z/OS Connect EE. Figure 1 shows the catalog and phonebook APIs deployed to a z/OS Connect EE server.

Figure 1 APIs view

In Figure 1 we see the different HTTP methods used to invoke each API, request counts, error counts, timeouts and response times. For example, we can see that within the chosen interval, 1 error has occurred for the catalog API and 62 errors for the phonebook API. You can specify the duration or time range for which you want to see monitoring data.

Below we show how OMEGAMON for JVM helps to identify the cause of these failures.

Investigating timeouts

z/OS Connect EE requests time out if they do not complete within a specified time. We can use the services view (Figure 2) to identify which service has timed out.

Figure 2 Services view

Note: In asynchronous mode, requests time out if they do not complete within the time that is specified by the asyncRequestTimeout attribute on the zosconnect_zosConnectManager element in server.xml.
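To check which timeout a server is configured with, the attribute named in the note can be read directly from server.xml. The following is a minimal sketch, assuming a hypothetical server.xml fragment; the 30s value and the helper name async_request_timeout are illustrative assumptions, not taken from the server used in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical server.xml fragment. The 30s value is an illustrative
# assumption, not the setting of the server shown in the figures.
SERVER_XML = """
<server>
    <zosconnect_zosConnectManager asyncRequestTimeout="30s"/>
</server>
"""


def async_request_timeout(xml_text):
    """Return the asyncRequestTimeout attribute value, or None if it is not set."""
    root = ET.fromstring(xml_text)
    manager = root.find("zosconnect_zosConnectManager")
    if manager is None:
        return None
    return manager.get("asyncRequestTimeout")


print(async_request_timeout(SERVER_XML))
```

If the element or the attribute is absent, the helper returns None, in which case the server's default timeout applies.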

Figure 2 shows that whereas most API requests that invoke the placeOrder service are ending normally, one request timed out. So how do we find out more information about this timeout?

Figure 3 shows the list of catalog API requests for the placeOrder service in descending order of response time.

Figure 3 placeOrder requests

OMEGAMON for JVM helps you to break down total response time (Req Time) into the time spent in the z/OS Connect server (zOSConnect Time) and the time spent waiting for a response from the System of Record (SoR). Figure 3 shows that the cause of the timeout is a delay in the SoR.
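The same breakdown can be reasoned about outside the product. The sketch below assumes that the total response time is the sum of the time spent in the z/OS Connect server and the time spent waiting for the SoR; the class name, field names and millisecond values are hypothetical and are not read from Figure 3.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    req_time_ms: int         # total response time (Req Time)
    zosconnect_time_ms: int  # time spent in the z/OS Connect server

    @property
    def sor_time_ms(self):
        # Assumption: whatever is not spent in z/OS Connect is SoR wait time.
        return self.req_time_ms - self.zosconnect_time_ms

    def dominant_component(self):
        """Name the component that accounts for most of the response time."""
        if self.sor_time_ms > self.zosconnect_time_ms:
            return "SoR"
        return "z/OS Connect"


# Hypothetical values for a request that timed out.
timed_out = RequestTiming(req_time_ms=30000, zosconnect_time_ms=12)
print(timed_out.sor_time_ms, timed_out.dominant_component())
```

A request whose SoR time dwarfs its z/OS Connect time points the investigation at the backend subsystem rather than at the z/OS Connect server.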

Figure 4 shows the details of the request that timed out.

Figure 4 placeOrder timeout details

Figure 4 gives a complete view of the timeout:

  • It occurred at 11:07:06 on March 19th.
  • It occurred when using the CICS service provider for a connection to CICSMOB1.
  • The CICS transaction that timed out was MZPO and the program that was being executed was DFH0XCMN.
  • An HTTP return code 503 was returned to the client.
  • The identity of the client was EMPLOY1.

Investigating other failures

Figure 1 above also shows that errors are occurring for the phonebook API, and Figure 2 shows that these are associated with the contactsBrowse service. These failures are not timeouts, so how do we find out more information about them?

Figure 5 below shows the list of phonebook API requests for the contactsBrowse service.

Figure 5 contactsBrowse requests

By scrolling right until the HTTP code column appears, and then ordering the requests in HTTP response code descending order, we can see the requests that have failed (Figure 6).

Figure 6 HTTP codes in descending order

Figure 6 shows that the failed requests had an HTTP response code of 500 (‘The server encountered an unexpected condition which prevented it from fulfilling the request’).
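If request records are exported for offline analysis, the same ordering and filtering can be reproduced in a few lines. This is only a sketch: the record layout, field names and sample values are hypothetical and do not represent an OMEGAMON for JVM export format.

```python
# Hypothetical request records; the field names are assumptions.
requests = [
    {"service": "contactsBrowse", "http_code": 200},
    {"service": "contactsBrowse", "http_code": 500},
    {"service": "contactsBrowse", "http_code": 200},
    {"service": "contactsBrowse", "http_code": 500},
]

# Order by HTTP response code, highest first, so that errors rise to the top.
by_code_desc = sorted(requests, key=lambda r: r["http_code"], reverse=True)

# Keep only client and server errors (4xx and 5xx).
failed = [r for r in by_code_desc if r["http_code"] >= 400]

for record in failed:
    print(record["service"], record["http_code"])
```

Sorting in descending order places 5xx server errors ahead of 4xx client errors and successful responses, which mirrors the view in Figure 6.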

Figure 7 shows the details of one of the failed requests.

Figure 7 contactsBrowse request details

Figure 7 gives a complete view of the failure:

  • It occurred at 11:07:58 on March 19th.
  • It occurred when using the IMS service provider for a connection to IMC1.
  • The IMS transaction that failed was IVTNO.
  • An HTTP return code 500 was returned to the client.
  • The identity of the client was EMPLOY1.

We now have the information we need to investigate the root cause of the failure by looking at the IMS message logs.

Summary

This article gives an example of how OMEGAMON for JVM can be used to monitor a z/OS Connect EE server for API failures. It also shows how to identify the target subsystem and resource when there is a problem with a specific API.

More information

More information on monitoring z/OS Connect EE APIs with OMEGAMON for JVM can be found in the OMEGAMON for JVM Knowledge Center.
