
Uniform Cluster: monitor application resource usage metrics

By Louis Horsley posted Mon April 06, 2020 04:42 AM

  
Introduction

Earlier in the 9.1.x time frame we delivered various capabilities to enable the automatic balancing of explicitly named application instances across a set of queue managers configured in a uniform cluster. We also introduced the MQSC administrative command DISPLAY APSTATUS to display details about the instances of applications currently connected to a queue manager.

As of IBM MQ version 9.1.5 we have added a feature that monitors a named application's instances through resource usage metrics, which you can query for graphing, alerting and reporting. We can monitor application instances on a stand-alone queue manager, on a queue manager in a regular cluster, or across the members of a uniform cluster.


System topic publications & subscriptions

When any instances of an application name exist, resource usage metrics for that application are published as PCF messages to a system topic. To retrieve these messages, we create a subscription to the system topic.

In this blog we will use the sample application amqsrua to subscribe to this topic and parse the PCF messages. However, if you want to create an administrative subscription manually, you can use a topic string in the following format:


  • $SYS/MQ/INFO/QMGR/$QUEUE_MANAGER_NAME/Monitor/STATAPP/$APPLICATION_NAME/INSTANCE
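If you prefer to script the administrative subscription, filling in the placeholders is simple string formatting. The following Python sketch builds the topic string and a matching MQSC DEFINE SUB command. The subscription name and destination queue are hypothetical names chosen by the caller, and the destination queue is assumed to already exist on the queue manager.

```python
def statapp_topic(qmgr_name: str, app_name: str) -> str:
    """Return the STATAPP INSTANCE system topic for one application name."""
    if not qmgr_name or not app_name:
        raise ValueError("queue manager and application names must be non-empty")
    return f"$SYS/MQ/INFO/QMGR/{qmgr_name}/Monitor/STATAPP/{app_name}/INSTANCE"


def define_sub_mqsc(sub_name: str, qmgr_name: str, app_name: str, dest_queue: str) -> str:
    """Build an MQSC DEFINE SUB command that routes the PCF publications to a queue.

    sub_name and dest_queue are hypothetical, caller-chosen names; the
    destination queue must already exist on the queue manager.
    """
    topic = statapp_topic(qmgr_name, app_name)
    return f"DEFINE SUB('{sub_name}') TOPICSTR('{topic}') DEST('{dest_queue}')"
```

For example, define_sub_mqsc("STATAPP.MY_APP_1", "QM1", "MY_APP_1", "STATAPP.METRICS.Q") produces a command you could feed to runmqsc QM1. If you pass it through a shell, use single quotes around the command so the shell does not expand $SYS.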
 

The amqsrua sample program blocks the terminal while it waits for and receives publications, which are published every 10 seconds. To get a one-time snapshot, append -n 1 to the command to receive a single publication, or -n N to receive N publications. To run indefinitely, omit -n, as follows:


  • amqsrua -m $QUEUE_MANAGER_NAME -c STATAPP -t INSTANCE -o $APPLICATION_NAME
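To drive amqsrua from a script, for example to take periodic snapshots, you can build the same argument list and run it with Python's subprocess module. This is a sketch that uses only the flags shown above and assumes amqsrua is on the PATH; the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def amqsrua_args(qmgr: str, app_name: str, count: Optional[int] = None) -> List[str]:
    """Build the amqsrua argument list for STATAPP INSTANCE metrics.

    count=None omits -n, so amqsrua runs until interrupted;
    count=N adds -n N so it exits after N publications.
    """
    args = ["amqsrua", "-m", qmgr, "-c", "STATAPP", "-t", "INSTANCE", "-o", app_name]
    if count is not None:
        if count < 1:
            raise ValueError("count must be a positive integer")
        args += ["-n", str(count)]
    return args


def snapshot(qmgr: str, app_name: str, timeout: float = 60.0) -> str:
    """Run amqsrua for one publication and return its printed output.

    Raises subprocess.TimeoutExpired if no publication arrives in time.
    """
    result = subprocess.run(amqsrua_args(qmgr, app_name, count=1),
                            capture_output=True, text=True, check=True,
                            timeout=timeout)
    return result.stdout
```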
 
Simple demo

As a simple test we can run amqsrua to get the metrics for itself on a single queue manager:


  • Create and start a stand-alone queue manager:

    crtmqm QM1 && strmqm QM1

  • Run amqsrua:

    /opt/mqm/samp/bin/amqsrua -m QM1 -c STATAPP -t INSTANCE -o amqsrua -n 1
 

You should see output similar to the following:


  • Publication received PutDate:20200311 PutTime:15394102 Interval:10.000 seconds
    amqsrua Instance count 1
    amqsrua Movable instance count 0
    amqsrua Instance shortfall count 0
    amqsrua Instances started 0
    amqsrua Initiated outbound instance moves 0
    amqsrua Completed outbound instance moves 0
    amqsrua Instances ended during reconnect 0
    amqsrua Instances ended 0

 

As we can see from the output above, the Instance count is 1 (amqsrua itself). Movable instance count is 0 because amqsrua is not a reconnectable application, and Instances ended is 0 because we received this publication before amqsrua ended!

The remaining metrics relate to balancing and all have the value 0 because we are running on a stand-alone queue manager, where balancing cannot occur.
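Because amqsrua prints each metric on its own line in the form <application name> <metric> <value>, its output is easy to turn into structured data for graphing or alerting. The following Python sketch parses output in the format shown above into one dictionary per publication. It assumes application names contain no spaces and that the layout matches the sample output in this post, which may differ in other releases.

```python
import re
from typing import Dict, List

# Matches the header line that amqsrua prints for each publication.
HEADER = re.compile(
    r"Publication received PutDate:(\d{8}) PutTime:(\d{8}) Interval:([\d.]+) seconds"
)


def parse_amqsrua(text: str, app_name: str) -> List[Dict]:
    """Split amqsrua output into publications with their metric values."""
    publications: List[Dict] = []
    prefix = app_name + " "
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.search(line)
        if header:
            publications.append({
                "put_date": header.group(1),
                "put_time": header.group(2),
                "interval": float(header.group(3)),
                "metrics": {},
            })
        elif line.startswith(prefix) and publications:
            # Metric name is everything between the app name and the last token.
            name, _, value = line[len(prefix):].rpartition(" ")
            if name and value.isdigit():
                publications[-1]["metrics"][name] = int(value)
    return publications
```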

To really see the benefit of these metrics, we need to run this against a uniform cluster with multiple reconnectable application instances connected.


Uniform Cluster Demo

Task 1: Create a demo test environment

1.1 Set up terminals

  • Open four terminals; we will refer to them as terminals 1 to 4.


1.2 Create JSON CCDT

  • In terminal 4, create a file named ccdt.json in /tmp (the location MQCCDTURL points to in step 1.3) and paste in the following JSON:
  • {
      "channel": [
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1414 } ], "queueManager": "QMGROUP" },
          "connectionManagement": { "clientWeight": 1, "affinity": "none" },
          "type": "clientConnection"
        },
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1414 } ], "queueManager": "QM1" },
          "type": "clientConnection"
        },
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1415 } ], "queueManager": "QMGROUP" },
          "connectionManagement": { "clientWeight": 1, "affinity": "none" },
          "type": "clientConnection"
        },
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1415 } ], "queueManager": "QM2" },
          "type": "clientConnection"
        },
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1416 } ], "queueManager": "QMGROUP" },
          "connectionManagement": { "clientWeight": 1, "affinity": "none" },
          "type": "clientConnection"
        },
        {
          "name": "CLIENTCHL",
          "clientConnection": { "connection": [ { "host": "localhost", "port": 1416 } ], "queueManager": "QM3" },
          "type": "clientConnection"
        }
      ]
    }
    

    Note:

    We define the CLIENTCHL client-connection (CLNTCONN) channel twice for each queue manager: first as a member of the queue manager group QMGROUP, and second as a direct connection to that specific queue manager. The ports match the listener ports given to crtmqm -p in step 1.4.

    The group definitions allow reconnectable application instances to load balance across all queue managers in the uniform cluster, while the direct definitions are used to couple application instances to specific queue managers.

    In the event of a failed rebalance, an application instance using a group definition will reconnect to any queue manager within the group, whereas an instance using a direct definition will only reconnect to the specified queue manager.

    We will take advantage of the direct definitions to put the application into an unbalanced state initially, by deploying all the application instances to QM1. We will then watch the instances rebalance in our resource usage metrics.
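Because every queue manager needs the same pair of entries, you can generate the CCDT rather than editing it by hand, which also makes it harder for its ports to drift out of step with the listener ports given to crtmqm -p in step 1.4. The following Python sketch is one way to do that; build_ccdt and write_ccdt are hypothetical helpers, and they emit only the fields used in the JSON above.

```python
import json
from typing import Dict, List


def _connection(host: str, port: int) -> List[Dict]:
    return [{"host": host, "port": port}]


def build_ccdt(qmgr_ports: Dict[str, int], group: str = "QMGROUP",
               channel: str = "CLIENTCHL", host: str = "localhost") -> Dict:
    """Return a JSON CCDT with a group entry and a direct entry per queue manager."""
    entries = []
    for qmgr, port in qmgr_ports.items():
        # Group entry: reconnectable instances may be balanced onto this QM.
        entries.append({
            "name": channel,
            "clientConnection": {"connection": _connection(host, port), "queueManager": group},
            "connectionManagement": {"clientWeight": 1, "affinity": "none"},
            "type": "clientConnection",
        })
        # Direct entry: couples an instance to this specific QM.
        entries.append({
            "name": channel,
            "clientConnection": {"connection": _connection(host, port), "queueManager": qmgr},
            "type": "clientConnection",
        })
    return {"channel": entries}


def write_ccdt(path: str, qmgr_ports: Dict[str, int]) -> None:
    """Write the generated CCDT as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_ccdt(qmgr_ports), f, indent=2)
```

For this demo, write_ccdt("/tmp/ccdt.json", {"QM1": 1414, "QM2": 1415, "QM3": 1416}) produces an equivalent file.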


1.3 Export environment variables

  • Run the following commands in terminal 4:

    export MQAPPLNAME="MY_APP_1"
    export MQCCDTURL="file:///tmp/ccdt.json"

    The first command names all applications subsequently deployed from this terminal ‘MY_APP_1’; the second routes them to our CCDT file.


  • In all four terminals, run the following command to add the MQ binaries and samples to the PATH variable (terminal 4 needs them for crtmqm, amqsghac and amqstrgc):

     export PATH=$PATH:/opt/mqm/bin:/opt/mqm/samp/bin


1.4 Set up uniform cluster

  • In terminal 4, create a file named uniclus.ini and paste in the following:

  • AutoCluster:
    Repository2Conname=localhost(1414)
    Repository2Name=QM1
    Repository1Conname=localhost(1415)
    Repository1Name=QM2
    ClusterName=MY_CLUSTER
    Type=Uniform

  • Create a file uniclus.mqsc and paste in the following:

  • define channel('+AUTOCL+_+QMNAME+') chltype(clusrcvr) trptype(tcp) conname('+CONNAME+') cluster('+AUTOCL+') replace
    define channel(CLIENTCHL) chltype(svrconn) trptype(tcp) replace

    Note: this defines the CLIENTCHL server-connection (SVRCONN) channel on every queue manager, matching the channel name used in the CCDT.


  • From the directory containing uniclus.ini and uniclus.mqsc, create and start three QMs: QM1, QM2 and QM3

  • crtmqm -p 1414 -ii uniclus.ini -ic uniclus.mqsc -iv "CONNAME=localhost(1414)" QM1 && strmqm QM1

    crtmqm -p 1415 -ii uniclus.ini -ic uniclus.mqsc -iv "CONNAME=localhost(1415)" QM2 && strmqm QM2

    crtmqm -p 1416 -ii uniclus.ini -ic uniclus.mqsc -iv "CONNAME=localhost(1416)" QM3 && strmqm QM3
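The three crtmqm commands differ only in the queue manager name and port, so they are easy to generate. The Python sketch below builds each argument list using only the flags shown above and runs crtmqm and strmqm through subprocess. It assumes the MQ binaries are on the PATH and that uniclus.ini and uniclus.mqsc are in the current directory; the helper names are hypothetical.

```python
import subprocess
from typing import Dict, List


def crtmqm_args(qmgr: str, port: int, host: str = "localhost",
                ini: str = "uniclus.ini", mqsc: str = "uniclus.mqsc") -> List[str]:
    """Build the crtmqm argument list for one uniform cluster member."""
    return ["crtmqm", "-p", str(port), "-ii", ini, "-ic", mqsc,
            "-iv", f"CONNAME={host}({port})", qmgr]


def create_members(qmgr_ports: Dict[str, int]) -> None:
    """Create and start each queue manager, stopping at the first failure."""
    for qmgr, port in qmgr_ports.items():
        subprocess.run(crtmqm_args(qmgr, port), check=True)
        subprocess.run(["strmqm", qmgr], check=True)
```

Calling create_members({"QM1": 1414, "QM2": 1415, "QM3": 1416}) mirrors the three commands above. Because subprocess passes arguments directly rather than through a shell, the CONNAME value needs none of the quoting the shell version uses to protect the parentheses.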



Task 2: How are my application instances behaving in my uniform cluster?


2.1 Deploy reconnectable applications

  • In terminal 4, repeat the following command 9 times:

    amqsghac SYSTEM.DEFAULT.LOCAL.QUEUE QM1 > /dev/null 2>&1 &

    Note:

    We redirect stdout and stderr to /dev/null and run the process in the background (&) so we can continue working in the same terminal.

    The sample program amqsghac is a get application with the reconnectable option set.

    As we have deployed all application instances to QM1, the uniform cluster will initially be in an unbalanced state.
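Rather than repeating the command by hand, you can start the instances from a script. This Python sketch starts a number of copies of a sample with the same arguments as above, discarding their output, and sets MQAPPLNAME and MQCCDTURL explicitly in each child's environment instead of relying on the exports from step 1.3. The helper names are hypothetical, and the sketch assumes the samples are on the PATH.

```python
import os
import subprocess
from typing import Dict, List, Mapping


def app_env(app_name: str, ccdt_url: str, base: Mapping[str, str]) -> Dict[str, str]:
    """Copy the base environment and add the MQ client variables used in this post."""
    env = dict(base)
    env["MQAPPLNAME"] = app_name
    env["MQCCDTURL"] = ccdt_url
    return env


def launch(program: str, count: int, qmgr: str = "QM1",
           queue: str = "SYSTEM.DEFAULT.LOCAL.QUEUE", app_name: str = "MY_APP_1",
           ccdt_url: str = "file:///tmp/ccdt.json") -> List[subprocess.Popen]:
    """Start `count` background instances of a sample, discarding stdout and stderr."""
    env = app_env(app_name, ccdt_url, os.environ)
    return [
        subprocess.Popen([program, queue, qmgr], env=env,
                         stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for _ in range(count)
    ]
```

For this step, procs = launch("amqsghac", 9) starts the nine instances; calling p.terminate() on each element of procs stops them when you have finished.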


2.2 Show application resource usage metrics on each member of the uniform cluster

  • Run the following command in terminals 1, 2 and 3, replacing QM* with QM1, QM2 and QM3 respectively:

    amqsrua -m QM* -c STATAPP -t INSTANCE -o MY_APP_1

  • You should see output similar to:

  • Terminal 1, QM1:
    Publication received PutDate:20200312 PutTime:12504279 Interval:34.584 seconds
    MY_APP_1 Instance count 3
    MY_APP_1 Movable instance count 3
    MY_APP_1 Instance shortfall count 0
    MY_APP_1 Instances started 9
    MY_APP_1 Initiated outbound instance moves 6
    MY_APP_1 Completed outbound instance moves 6
    MY_APP_1 Instances ended during reconnect 0
    MY_APP_1 Instances ended 6

    Publication received PutDate:20200312 PutTime:12505279 Interval:10.000 seconds
    MY_APP_1 Instance count 3
    MY_APP_1 Movable instance count 3
    MY_APP_1 Instance shortfall count 0
    MY_APP_1 Instances started 0
    MY_APP_1 Initiated outbound instance moves 0
    MY_APP_1 Completed outbound instance moves 0
    MY_APP_1 Instances ended during reconnect 0
    MY_APP_1 Instances ended 0
    ...

  • Terminal 2, QM2:
    Publication received PutDate:20200312 PutTime:12504840 Interval:34.576 seconds
    MY_APP_1 Instance count 3
    MY_APP_1 Movable instance count 3
    MY_APP_1 Instance shortfall count 0
    MY_APP_1 Instances started 3
    MY_APP_1 Initiated outbound instance moves 0
    MY_APP_1 Completed outbound instance moves 0
    MY_APP_1 Instances ended during reconnect 0
    MY_APP_1 Instances ended 0
    ...

  • Terminal 3, QM3:
    Publication received PutDate:20200312 PutTime:12504731 Interval:10.000 seconds
    MY_APP_1 Instance count 3
    MY_APP_1 Movable instance count 3
    MY_APP_1 Instance shortfall count 0
    MY_APP_1 Instances started 0
    MY_APP_1 Initiated outbound instance moves 0
    MY_APP_1 Completed outbound instance moves 0
    MY_APP_1 Instances ended during reconnect 0
    MY_APP_1 Instances ended 0
    ...

  • In the first block of amqsrua output on QM1, Instances started is 9, which reflects the nine new instances of MY_APP_1 that we initially deployed. Initiated and Completed outbound instance moves are both 6, which tells us that within this interval six instances of MY_APP_1 successfully reconnected to other members of the uniform cluster.

  • In the second block on QM1, the first blocks on QM2 and QM3, and all subsequent blocks on every queue manager, Instance count is 3 and Movable instance count is 3. This is a clear indicator that our application is balanced across all members of the uniform cluster.

  • Note the first line of each block: Interval:XX.XXX seconds. This is the length of the interval covered by that publication, as publications are sent to the system topic once per interval. The value may be different for the first block because we may start the subscription at any point during an interval. Also bear in mind that the interval cycles of the individual queue managers in the cluster will most likely be out of step with each other.
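These observations can be checked mechanically. The Python sketch below converts a PutTime value (the MQMD HHMMSSTH format: hours, minutes, seconds, tenths and hundredths) into seconds since midnight, which shows how far apart the queue managers' cycles are, and applies a simple balance test to the latest Instance count from each queue manager. The rule that counts may differ by at most one is our own illustrative threshold, not a definition used by the uniform cluster itself.

```python
from typing import Dict


def put_time_seconds(put_time: str) -> float:
    """Convert an 8-digit HHMMSSTH PutTime into seconds since midnight."""
    if len(put_time) != 8 or not put_time.isdigit():
        raise ValueError(f"unexpected PutTime: {put_time!r}")
    hours, minutes = int(put_time[0:2]), int(put_time[2:4])
    seconds, hundredths = int(put_time[4:6]), int(put_time[6:8])
    return hours * 3600 + minutes * 60 + seconds + hundredths / 100


def is_balanced(instance_counts: Dict[str, int], tolerance: int = 1) -> bool:
    """True if per-queue-manager instance counts differ by at most `tolerance`."""
    if not instance_counts:
        return False  # nothing connected, nothing to judge
    counts = instance_counts.values()
    return max(counts) - min(counts) <= tolerance
```

With the first-block PutTimes from QM1 and QM2, put_time_seconds("12504840") - put_time_seconds("12504279") is about 5.61 seconds, and is_balanced({"QM1": 3, "QM2": 3, "QM3": 3}) returns True.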


2.3 Deploy non-reconnectable applications

  • In terminal 4, repeat the following command 3 times:

    amqstrgc SYSTEM.DEFAULT.LOCAL.QUEUE QM1 > /dev/null 2>&1 &

    Note:

    The sample application amqstrgc does not have the reconnectable option set and waits indefinitely.

    The environment variable MQAPPLNAME was previously set to ‘MY_APP_1’, so these non-reconnectable instances will have the same application name as the reconnectable instances we deployed in step 2.1.

  • After a short time, once the application instances have rebalanced, you should see output similar to the following:

    • Terminal 1, QM1:
      ...
      MY_APP_1 Instance count 4
      MY_APP_1 Movable instance count 1
      ...

    • Terminal 2, QM2:
      ...
      MY_APP_1 Instance count 4
      MY_APP_1 Movable instance count 4
      ...

    • Terminal 3, QM3:
      ...
      MY_APP_1 Instance count 4
      MY_APP_1 Movable instance count 4
      ...

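  • From this we can see that QM1 now has 4 instances of MY_APP_1 but only one of them is reconnectable; the other three are the non-reconnectable amqstrgc instances, which cannot move. QM2 and QM3 now have 4 reconnectable instances each, so all 12 instances are spread evenly at 4 per queue manager.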
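The end state can be reproduced with simple arithmetic. The Python sketch below is a hypothetical model, not the uniform cluster's actual balancing algorithm: it treats non-reconnectable instances as fixed in place and hands out reconnectable instances one at a time to whichever queue manager currently has the fewest. With three pinned instances on QM1 and nine reconnectable ones, it reaches the same 4/4/4 split, with one movable instance left on QM1, as the metrics above.

```python
from typing import Dict


def expected_distribution(pinned: Dict[str, int], movable: int) -> Dict[str, Dict[str, int]]:
    """Greedy model: place each reconnectable instance on the least-loaded QM.

    `pinned` maps queue manager name to its non-reconnectable instance count.
    Ties go to the earliest QM in dict order, because min() returns the
    first minimal element it encounters.
    """
    totals = dict(pinned)
    moved = {qm: 0 for qm in pinned}
    for _ in range(movable):
        target = min(totals, key=lambda qm: totals[qm])
        totals[target] += 1
        moved[target] += 1
    return {qm: {"instance_count": totals[qm], "movable": moved[qm]} for qm in pinned}
```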


Conclusion

We have seen how to query real-time resource usage metrics on the state of application instances on any queue manager within a uniform cluster. The real benefit of this new feature comes when you export these metrics into Prometheus and Grafana, as shown in the mq-metric-samples project on GitHub, to build an automated, real-time monitoring and reporting solution. Watch this space for more on integrating application resource usage metrics with Prometheus and Grafana.
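As a small taste of that integration, the following Python sketch renders one publication's metrics (for example, a metrics dictionary taken from parsed amqsrua output) in the Prometheus text exposition format. The mq_app_ metric names and the qmgr and appl label names are our own hypothetical choices, not those used by mq-metric-samples, which remains the more complete route. Each value describes the most recent interval, so the sketch exposes them all as gauges.

```python
import re
from typing import Dict


def metric_name(description: str) -> str:
    """Turn e.g. 'Movable instance count' into 'mq_app_movable_instance_count'."""
    return "mq_app_" + re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")


def label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_exposition(qmgr: str, app_name: str, metrics: Dict[str, int]) -> str:
    """Render one publication's metrics as Prometheus gauge samples."""
    labels = f'qmgr="{label_value(qmgr)}",appl="{label_value(app_name)}"'
    lines = []
    for description, value in metrics.items():
        name = metric_name(description)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"
```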
