Db2 Tools for z/OS

 View Only

Db2 Query Monitor: Alerts for Monitoring -904 Conditions

By Tom Glaser posted Fri December 01, 2023 01:16 PM

  

Alerts for Monitoring  -904 Conditions

By Tom Glaser

Date: December, 2023

Practical example: How to monitor -904 using IBM Db2 Query Monitor

When Db2 encounters a situation where it requires some kind of resource that is not available, the task will fail with a -904 SQL Code.  It may be a plan, a package, a buffer pool, internal work space or something else. The official response from Db2 is:

UNSUCCESSFUL EXECUTION CAUSED BY AN UNAVAILABLE

RESOURCE. REASON reason-code,

TYPE OF RESOURCE resource-type,

AND RESOURCE NAME resourcename

Explanation

The SQL statement could not be executed because a required resource was not available.

Associated with the SQL Code of -904 are reason codes (what kind of resource), a resource-type (such as a buffer pool) and the actual name.  The reason code is a good starting place to resolve this issue.  Some reason codes show up in the Db2 MSTR address space (ssidMSTR), others do not.  Some reason codes are critical, others are not.  Some objects are critical, such as the Db2 catalog database, DSNDB06.  Others are not, such as the Db2 “play” database DSN8D13.  For those messages that show up in MSTR, you can use an automation tool to pull off the SQL Code and reason code.  But for those that are not in MSTR, here’s where Db2 Query Monitor can help.  This is a list of reason codes returned with a -904…

SQL Code -904 Reason Codes as of Db2 V12:

Reason Code

Description

00C200EA

DB2 is unable to perform the drain function on an object

00C200E0

A table space or index space cannot be accessed because DB2 is unable to create the referenced buffer pool

00C200E8

The buffer pool cannot be created because of insufficient virtual storage

00C20110

DB2 cannot access object because the highest formatted page is higher than that derived from the VSAM high-used RBA.

00C20206

An attempt was made to access a page which is currently in the logical page list

00C20208

The user attempted to access a group-buffer-pool-dependent page set or partition. However, the DB2 member was unable to connect to the group buffer pool because there was not enough hardware system area.

00C20252

DB2 is unable to obtain a physical lock

00C900A0

Object is in recover pending status

00C900A3

Object is in check pending status

00C90080

Object is read-only; an update is trying to take place

00C90081

An attempt was made to allocate a resource that is stopped for all access

00C90082

Db2 utility is executing against object

00C90083

Utility is executing, no updates permitted

00C90084

Our of temp space

00C90085

The limit of 253 extends for a temporary file has been reached

00C90086

In utility status

00C90096

The page or row lock on the page caused the total number of page or row locks concurrently held to reach the installation maximum number of page or row locks (NUMLKUS) allowed for a single agent in the system.

00C90305

Limit reached on thread for sort work space;

00D50001

You have exceeded the amount of space allowed for processing XML values.

00D70025

Could not create IDCAMS file

00E7000E

IDAA Resource is unavailable

00E7000F

Deadlock condition

In a production environment, it’s imperative that the DBAs monitor resource unavailable conditions, as this can cause an outage.  There are multiple ways of doing this; the end result will be the same.  You would like to receive a page on critical conditions as well as an email message listing the issue. 

As a DBA, you want to be aware of the issue as it happens.  If you are not monitoring for -904, the normal procedures might be as follows:

-        The application might research the -904 on their own (google).

-        Jump on slack or teams and text a DBA to see if one is available.

-        If no DBA is available, they might call their cell or group pager. 

-        The DBA finally gets involved to assist.

If you automate tracking -904, you can start on the issue immediately and possibly have it resolved before the application calls you.  To set up monitoring for -904, let’s follow these steps:

·       Update PROFILE

·       Add an ACTION

o   Include an email message and a WTO (Write To Operator) message

·       Add a SCOPE

·       Add a RESPONSE

The first step is to update your Profile.  This can be accomplished (if you have the authority) under ISPF or the web interface.  For now, jump to Option 8 of Query Monitor on ISPF (Work with Profiles):

UPDATE PROFILE:

Inside your monitoring profile, you will notice a section on Alerts.  In this example, we are not monitoring for CPU consumption, nor elapsed times, getpages or the number of SQL Calls.  We do not care about anomalies.  You will notice field Generate SQLCODE Alerts is set to Y.  You have the option of including specific SQL Codes ore excluding them.  In this example, I’m including a -904.  When you exit this screen, update the following panel:

PF3 out to save the alert. 

ADD an ACTION

For step 2, we will create two ACTIONS.  One is for the WTO and the other is for an email message.  ACTION is what will happen when the alert is fired.   Note that you can include the reason codes when the SCOPE is defined (see example at the bottom of this document).  Query Monitor is setup to send an email message, but not page.  You can also set up other options for Query Monitor to page you, but this is outside the scope of this document.   In this example, we are going to send a WTO to syslog so that this alert can be picked off the syslog by an automation tool to page Db2 Support.  Let’s start with the WTO. 

In the browser client, navigate to CONFIGURATION > ACTIONS:

…and add a note:

The action panel will look like the following:

The data contained in the “Message” field is what will be displayed in your syslog.  If you are new to Knowledge Base Management Language (KBML) coding, you might want to check out the Db2 Query Monitor User’s Guide (SC27-8803) for more details.

www.ibm.com/docs/en/SSAURY_3.3.0/pdf/cqmugc3.pdf

When the above alert is fired off, you will see the following message:

Your automation tool that monitors specific messages can look at message SDB-904, pull off the reason code and parse the output looking for additional information.

The second part of step 1 of monitoring -904 is to create an email message. 

The main panel of the Action will look like this:

This is a server-based action looking at Db2SqlStatement for the subject type.  You can put more than one email address for folks to receive this email note.  This can be the application DBA, Db2 Support Team, etc.  Under the “Subject” field is what will show up in your in-box.  If an id and password is required, supply that.

Action Group:

Allows to select between CAE-Agent based Actions, CAE-Server based Actions and other action groups. Leave this as “Server-based-Actions” because the WTO action is initiated by a CAE Agent on z/OS.

Subject Type:

Allows to specify the Domain Element type which is associated with the event.  Usually, alerts are raised when a Db2 SQL Statement is executed.  Leave this as “Db2SqlStatement”.

Event Type:

Specify the type of event which should trigger this action. The drop-down list shows all possible events for the specified subject type (in this case Db2SqlStatement). If you specified a different subject type, you would see different event types.

Under the “Message” is the KBML text.  You can change this information to whatever you like, this is only an example:

DB2 SSID:  ${subject.db2Ssid}

LPAR:      ${subject.smfId}

Date/Time: ${event.datetime}

SQL Code: -904

  User:    ${event.getAttribute("USER")}

  Jobname: ${event.getAttribute("JOBNAME")}

  Plan:    ${subject.plan}

  Package: ${subject.programName}

  Collection ID:  ${subject.collectionId}

  Correlation ID: ${event.getAttribute("CorrelationId")}

  Event Type: ${event.displayName}

DB2 Objects/Reason Code

==============================================

${subject.db2Ssid}/${event.getAttribute("MessageTokens")}

${if event is a SqlError then

"NOTE: SQLSTATE and MessageTokens are for SqlError only"+

"\nevent.getAttribute(\"SQLSTATE\"):"+ event.getAttribute("SQLSTATE")+

"\nevent.getAttribute(\"MessageTokens\"):"+

event.getAttribute("MessageTokens")

else

"event is not an SqlError"}

${urlEncode(toString(event.getAttribute("MessageTokens")))}

Alert Link: ${messageDetailsURL(event)}

DEFINE SCOPE: A scope defines a set of domain elements and or events based on several criteria.

We will be working with constraint-based scopes: Defined by a set of elements that meet specified criteria.  Look at “Scope” as setting up the criteria for your alert.  For example, do you want to monitor all Db2 subsystems, or just a specific one?  Do you want to only look at certain ID or package?  This is all defined in the scope.  Click on the ELEMENTS tab and Constraint Based Scope.  Set up your definition to look at the subject, like the following example.

 

Once your scope is defined, create the Response.

RESPONSE: This ties everything together.  A response is defined by choosing a scope, selecting one or more event types that are pertinent to that scope, then selecting an action, or multiple actions, that address those events for that scope.

Give it a name that uniquely identifies the Scope, such as Monitoring -904.  Click on “Domain Elements & Event Types.”  Now click on “everything” for the domain elements.  This is where you will find the “scope” you had created earlier.

 

Select the scope you had created.  In our example, we called it “Monitoring -904.”

Click OK to close this section.  Notice another tab called “Actions to Execute” at the top of your scope?  This will execute those actions.  Remember we had created two actions, one for WTO and the other for an email note?  Let’s add them.  Click on Actions to Execute.  As you can see, there are 2 actions we selected.  Note that these names can be anything.  For our email alert, we are using “TS4430 email -904.”  The WTO is “Monitoring for ‘904.”   

At the top of your Response, you will see this list:

We’ve selected 3 options.  Here are the details:

Select the appropriate options in the Response is Triggered When section.

The following options are available:

An event is posted

Triggers a response when an event has been posted to the message board.

An event is acknowledged

Triggers a response when an event is acknowledged by a user.

An acknowledged event is unacknowledged

Triggers a response when an event that was previously acknowledged by a user is

unacknowledged.

An event is cleared

Triggers a response when an event is removed from the message board.

The priority of an event changes

Triggers a response when the priority level associated with an event is altered.

The repetition count of an event increments

Triggers a response when the event has taken place again, thus the number of times the event

has occurred is incremented.

The alert status of an element has changed

Triggers a response when the alert status of an element is raised or lowered.

An event is restored

Triggers a response when an event is restored to the message board.

An event is correlated

Triggers a response when the event is correlated.

Click OK followed by the RED check mark in the upper left-hand corner, to create your Response.

Why not stop a tablespace and use your favorite query tool to run a select against the table that’s stopped.  This will produce a -904 SQL Code with a reason code of 00C90081.  Here’s our test:

The syslog will show the following:

Here’s the email note within my Outlook email inbox:

When you open the email note:

When you click on the link inside the email note, it will automatically log you into the CAE/Web interface and take you directly to the record for additional information:

What if you want to monitor one specific reason code for a -904? This is doable.

For this to work, you must change the Scope definition.  Here are the details (and these details are also in the Query Monitor’s Guide).

Now let’s take the case where additional notifications need to happen if the reason for resource

unavailable SQL Codes is that the resource is in a “stopped’ status.  In this case, Db2 issues a -904

with a reason code of “00C90081”. For SqlError event types, the Db2 reason code is included in the

MessageTokens field of the event record.

Here are the steps to create a scope which checks for a “stopped status reason code” in the

“MessageToken” field of an SqlError with a -904 SQL Code.

1. Create a new scope using the Scopes Editor of the Configuration Browser. This would be an event

scope for -904 SQL Codes.  Above, we created a scope pointing to Elements.  Let’s create a new scope.  Click on the EVENTS tab then:

Enter the following into the panel that’s displayed:

(event is in scope "Everything")

and (event is a SqlError)

and (event.HighestSqlCode is -904)

and regexMatches("^00C90081", event.getAttribute("MessageTokens").toString())

Like the following…and click the red check mark sign:

When you tie this scope to your Response, only 00C90081 will be monitored.

Summary: -904 is an SQL code that should be monitored, maybe you only want it in production.  There are multiple ways to set this up, the above is 2 examples.  A key point here is that the web interface (CAE) to Query Monitor needs to be installed.  There are many more reasons why the web interface should be installed.  Stay tuned for more examples.

Do you have an idea?  Below is a link to take you to IBM’s aha web page to submit ideas for Db2 Query Monitor. 

https://ibm-data-and-ai.ideas.ibm.com/?project=DB2ZQUEMON 


    #IBMChampion
    3 comments
    22 views

    Permalink

    Comments

    Fri January 12, 2024 11:18 AM

    Hi Tom,

    I believe that should definitely help us with our getpage alerts. We'll look at getting that APAR so we can work with those. Is there something similar for DelayProblem, SqlElapsedTimeProblem, or SqlError events? Or is there something I can do with KBML on the Scopes screen to get the object for these particular events?

    Thanks, Daniel

    Thu January 11, 2024 11:37 PM

    Hi Daniel,

    Can you take a look at this blog I wrote on monitoring objects.  

    https://community.ibm.com/community/user/datamanagement/blogs/preethi-n/2023/04/28/support-for-monitoring-alerts-on-getpage 

    Does this answer your question?

    Thanks, Tom

    Thu January 11, 2024 05:40 PM

    This was very helpful, thank you for posting on this. My organization has a potential use case where they would like to send certain alerts depending on the object in question. Your regexMatches example seems like a possible way we could gather this information from the MessageTokens in some cases (i.e. some event types occasionally list the Database and/or table/tablespace), and possibly the SQL text. But I don't think these would be solutions across the board. Any thoughts on how we could do this from the scopes screen?