MQ Monitoring Agent: SAMPINT vs. "Situation Interval" for Sampled Tables

By Kristen Meren posted Mon January 08, 2018 08:06 AM


by Walter Pietroni

When you create a situation based on a sampled table (an attribute group whose data is collected at specified intervals), the situation may fire not when you expect it to but at another, apparently wrong, time.
What could cause this behavior?
Is there something wrong in the setup that needs to be fixed?

Let's assume that the sampled table is the "Queue Statistics" attribute group.
Because "Queue Statistics" is a sampled table, it uses the SAMPINT value configured in the agent's .cfg file.
Every SAMPINT seconds, the MQ agent runs a RESET QUEUE STATISTICS against WebSphere MQ (now IBM MQ) to get statistics for the monitored queues (STATISTICS(YES) must be enabled in the same .cfg file) and keeps the data in memory.
So the data held in memory by the MQ agent is refreshed only once every SAMPINT.

If the situation you create is based on the Queue Statistics attribute group, then each time the "Situation Interval" expires, the ITM code checks the value that is held in memory by the MQ agent at that moment and evaluates whether the condition is met. It does not trigger a new collection from the queue manager.
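
The mechanism described above can be sketched as a toy model in Python. This is only an illustration, not the agent's actual implementation: the `SampledTable` class, the `collect` callback, the `msgs_put` field and the "no messages put" predicate are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SampledTable:
    """Toy model of a sampled attribute group held in agent memory."""
    sampint: int                      # seconds between collections (SAMPINT)
    collect: Callable[[int], Dict]    # hypothetical collector: time -> row
    last_sample_time: Optional[int] = None
    row: Optional[Dict] = None

    def tick(self, now: int) -> None:
        """Refresh the cached row only when SAMPINT has elapsed."""
        if (self.last_sample_time is None
                or now - self.last_sample_time >= self.sampint):
            self.row = self.collect(now)
            self.last_sample_time = now


def evaluate(table: SampledTable, predicate, now: int):
    """Evaluate a situation against the cached row; never collects."""
    return predicate(table.row), now - table.last_sample_time


# Hypothetical queue: nothing is put until t=169s, then one message arrives.
def collect(now: int) -> Dict:
    return {"msgs_put": 0 if now < 169 else 1}


table = SampledTable(sampint=300, collect=collect)
results = []
for now in range(0, 361, 60):         # situation interval of 60s
    table.tick(now)
    fired, age = evaluate(table, lambda r: r["msgs_put"] == 0, now)
    results.append((now, fired, age))
```

In this sketch the evaluations at 180s and 240s still report the condition as true, because the cached row is 180 and 240 seconds old; only the evaluation at 300s sees the refreshed data.
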
See example below:

The MQ agent was started at 17:12:26.
SAMPINT in your environment is set to 300 seconds, and this is the definition of your situation:




The queue status clearly shows that a message was put at 17:15:15, yet the MQ agent fired an alert at 17:15:26 (see the excerpt from the operation log below). Based on the situation definition above, the alert should not have been raised and the related action should not have been executed:
dis qs(QL1)
     1 : dis qs(QL1)
AMQ8450: Display queue status details.
   QUEUE(QL1)                              TYPE(QUEUE)
   CURDEPTH(38)                            IPPROCS(0)
   LGETDATE( )                             LGETTIME( )
   LPUTDATE(2016-12-15)                    LPUTTIME(17.15.15)
   MEDIALOG( )                             MONQ(MEDIUM)
   MSGAGE(86229)                           OPPROCS(0)
   QTIME( , )                              UNCOM(NO)

Operation log:
1161215171426835KRAIRA002 Executed with status 0 ...
1161215171526075KRAIRA002 Executed with status 0 ...
1161215171626119KRAIRA002 Executed with status 0 ...
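
To line the operation log entries up with the LPUTTIME value, it can help to decode the leading timestamp. The sketch below assumes the 16-digit prefix follows the CYYMMDDHHMMSSmmm layout (C is the century digit, with 1 meaning 20xx) and that the message ID follows immediately after it; `parse_oplog_line` is a hypothetical helper name, not part of any ITM tooling.

```python
from datetime import datetime


def parse_oplog_line(line: str):
    """Split an operation log line into (timestamp, message id, text).

    Assumes a CYYMMDDHHMMSSmmm prefix, where C=0 means 19xx and C=1 means 20xx.
    """
    stamp, rest = line[:16], line[16:]
    year = 1900 + 100 * int(stamp[0]) + int(stamp[1:3])
    ts = datetime(
        year,
        int(stamp[3:5]),             # month
        int(stamp[5:7]),             # day
        int(stamp[7:9]),             # hour
        int(stamp[9:11]),            # minute
        int(stamp[11:13]),           # second
        int(stamp[13:16]) * 1000,    # milliseconds -> microseconds
    )
    msg_id, _, text = rest.partition(" ")
    return ts, msg_id, text


log_lines = [
    "1161215171426835KRAIRA002 Executed with status 0 ...",
    "1161215171526075KRAIRA002 Executed with status 0 ...",
    "1161215171626119KRAIRA002 Executed with status 0 ...",
]
parsed = [parse_oplog_line(line) for line in log_lines]
for ts, msg_id, text in parsed:
    print(ts.isoformat(timespec="milliseconds"), msg_id, text)
```

Under that assumption, the three entries decode to 17:14:26, 17:15:26 and 17:16:26 on 2016-12-15, one minute apart, which matches the one-minute situation interval.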

Then the situation alert was closed at 17:17:26.

In the example above, the situation interval is shorter than SAMPINT (situation interval = 1 minute, SAMPINT = 300 seconds). No messages had been put to the monitored queue since the MQ agent started, so the data held in memory still satisfied the situation condition. The situation therefore also evaluated to true in the intervals after the message was put to the queue, and it continued to do so until SAMPINT expired and the MQ agent gathered new values from IBM MQ.
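
The timing in this example can be worked out with a few lines of Python. The sketch assumes that the agent collects at start time and then every SAMPINT seconds, and that situation evaluations are aligned to the agent start time (which is consistent with the :26 seconds seen in the operation log); `stale_evaluations` is a hypothetical helper written for this illustration.

```python
from datetime import datetime, timedelta


def stale_evaluations(agent_start, sampint_s, situation_interval_s, event_time):
    """Return the next collection time after an event, plus the situation
    evaluations that run after the event but before that collection."""
    elapsed = (event_time - agent_start).total_seconds()
    # Collections are assumed to happen at agent_start + k * SAMPINT;
    # k is the first collection strictly after the event.
    k = int(elapsed // sampint_s) + 1
    next_sample = agent_start + timedelta(seconds=k * sampint_s)

    stale = []
    step = timedelta(seconds=situation_interval_s)
    t = agent_start + step
    while t < next_sample:
        if t >= event_time:
            stale.append(t)          # evaluated against pre-event data
        t += step
    return next_sample, stale


next_sample, stale = stale_evaluations(
    agent_start=datetime(2016, 12, 15, 17, 12, 26),
    sampint_s=300,
    situation_interval_s=60,
    event_time=datetime(2016, 12, 15, 17, 15, 15),   # LPUTTIME from dis qs
)
print("next collection:", next_sample.time())
print("stale evaluations:", [t.strftime("%H:%M:%S") for t in stale])
```

With the values from this example, the next collection falls at 17:17:26, and the evaluations at 17:15:26 and 17:16:26 run against data collected before the message was put. That matches the operation log entries and the time at which the alert was closed.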

In short, when you design a situation based on a sampled table, the results you get depend on both the "Situation Interval" and SAMPINT.

Hope this is useful!