Summary
In mod release 12.0.12.0 of IBM App Connect Enterprise (ACE) we introduced a new path analysis feature to help with diagnosing problems in production environments. When enabled this feature keeps track of the paths that are traversed by messages flowing through message flows and allows display of this aggregated data. This enhances the capabilities of Administrators or Developers working with complex production systems giving a new way to quickly drill down to areas of interest. In this article we will explore this new feature and how to enable it, discuss how it works and then use an example to show how it can be used to aid troubleshooting.
Enabling Path Analysis
By default path analysis is disabled, in order to enable this feature you can use one of the following three methods:
- Modify the server.conf.yaml to set the "enabled" property of the PathTracker Resource Manager to "true"
PathTracker:
enabled: true # Enable path tracking for all message flows, default is false.
- Use mqsichangeproperties to set the "enabled" property of the PathTracker Resource Manager to true
mqsichangeproperties IN -e IS -o PathTracker -n enabled -v true
- Use the Admin REST API to patch the "enabled" property of the PathTracker Resource Manager to "true"
curl -v http://localhost:7600/apiv2/resource-managers/path-tracker -X PATCH --data "{ \"properties\": { \"enabled\": true} }" -H "Content-Type: application/json"
curl -v http://localhost:4414/apiv2/servers/default/resource-managers/path-tracker -X PATCH --data "{ \"properties\": { \"enabled\": true} }" -H "Content-Type: application/json"
If you need to disable the Path Tracker the enabled property can be set to "false" using the same methods.
Turning the Path Tracker Resource Manager on and off using the Admin REST API or mqsichangeproperties is dynamic and changes will take effect immediately. Changing the server.conf.yaml file requires an Integration Server restart in order to become effective.
Exploring Path Analysis Data
Once the Path Analysis feature is enabled a new tab will be visible in the Web Admin User Interface on the Message Flow Page called "Path Analysis" as shown in the screenshot below:
When you click on this tab you will be taken to the Path Analysis view, this shows an entry for the top 25 unique paths that were traversed through the message flow. For example:
We can expand an entry in the table in order to see more details about the path that was taken. For example expanding the first entry:
This table shows a path with the internal identifier "(0000475D-66056311-00000005)", this identifier is generated automatically each time a new path is added. The "Message count" column shows the number of messages that have taken this particular path through the flow since this path was added to the Path Tracker.
The table that is shown in the expanded section for this path entry shows the actual path through the message flow. Each entry indicates a transition between 2 terminals in the flow.
So for example we can see that for this particular path execution started in a HTTP Input Node called Business Partner 5 and then propagated via the out terminal to a Compute Node called "ExtractRoutingInfo".
After this the message was propagated via the out terminal of the Compute Node to a HTTP Request Node called callBackEnd before finally being propagated via the out terminal to a HTTP Reply node which completed the flow transaction.
The picture below shows the flow diagram in the Integration Toolkit with the patch represented by this table highlighted:
Clicking on the "Show message list" link in the row associated with a path, shows a list of the message ID's for the last 100 messages that traversed this path along with the timestamp:
In addition to the graphical display, the Admin REST API also allows access to the underlying data. For example it is possible to report all data held by the Path Tracker:
// curl -v http://localhost:4451/apiv2/servers/default/resource-managers/path-tracker/report-all -X POST
{
"pathData": [
{
"applicationName": "demoFlow",
"applicationURI": "/apiv2/servers/default/applications/demoFlow",
"flowName": "backEnd",
"messageCount": 2,
"messageList": [
{
"id": "(000120AC-66057024-00000001)",
"timestamp": "2024-03-28T13:27:00.774Z"
},
{
"id": "(000120AC-6605702A-00000002)",
"timestamp": "2024-03-28T13:27:06.119Z"
}
],
"path": [
{
"source": {
"flowLabel": "backEnd",
"flowName": "backEnd",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Input",
"nodeName": "backEnd#FCMComposite_1_1",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd/nodes/HTTP%20Input",
"terminalName": "out",
"type": "ComIbmWSInputNode"
},
"target": {
"flowLabel": "backEnd",
"flowName": "backEnd",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Reply",
"nodeName": "backEnd#FCMComposite_1_2",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd/nodes/HTTP%20Reply",
"terminalName": "in",
"type": "ComIbmWSReplyNode"
}
}
],
"pathIdentifier": "(00011F3F-66057024-00000001)"
},
{
"applicationName": "demoFlow",
"applicationURI": "/apiv2/servers/default/applications/demoFlow",
"flowName": "demoFlow",
"messageCount": 1,
"messageList": [
{
"id": "(000120AE-66057024-00000001)",
"timestamp": "2024-03-28T13:27:00.774Z"
}
],
"path": [
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "BusinessPartner 1",
"nodeName": "demoFlow#FCMComposite_1_1",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/BusinessPartner%201",
"terminalName": "out",
"type": "ComIbmWSInputNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "ExtractRoutingInfo",
"nodeName": "demoFlow#FCMComposite_1_6",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/ExtractRoutingInfo",
"terminalName": "in",
"type": "ComIbmComputeNode"
}
},
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "ExtractRoutingInfo",
"nodeName": "demoFlow#FCMComposite_1_6",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/ExtractRoutingInfo",
"terminalName": "out",
"type": "ComIbmComputeNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "callBackend",
"nodeName": "demoFlow#FCMComposite_1_9",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/callBackend",
"terminalName": "in",
"type": "ComIbmWSRequestNode"
}
},
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "callBackend",
"nodeName": "demoFlow#FCMComposite_1_9",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/callBackend",
"terminalName": "out",
"type": "ComIbmWSRequestNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Reply",
"nodeName": "demoFlow#FCMComposite_1_10",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/HTTP%20Reply",
"terminalName": "in",
"type": "ComIbmWSReplyNode"
}
}
],
"pathIdentifier": "(00011F3F-66057024-00000002)"
},
{
"applicationName": "demoFlow",
"applicationURI": "/apiv2/servers/default/applications/demoFlow",
"flowName": "demoFlow",
"messageCount": 1,
"messageList": [
{
"id": "(000120B6-6605702A-00000001)",
"timestamp": "2024-03-28T13:27:06.120Z"
}
],
"path": [
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "BusinessPartner 4",
"nodeName": "demoFlow#FCMComposite_1_5",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/BusinessPartner%204",
"terminalName": "out",
"type": "ComIbmWSInputNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "checkRouting",
"nodeName": "demoFlow#FCMComposite_1_7",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/checkRouting",
"terminalName": "in",
"type": "ComIbmJavaComputeNode"
}
},
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "checkRouting",
"nodeName": "demoFlow#FCMComposite_1_7",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/checkRouting",
"terminalName": "out",
"type": "ComIbmJavaComputeNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "ExtractRoutingInfo",
"nodeName": "demoFlow#FCMComposite_1_6",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/ExtractRoutingInfo",
"terminalName": "in",
"type": "ComIbmComputeNode"
}
},
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "ExtractRoutingInfo",
"nodeName": "demoFlow#FCMComposite_1_6",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/ExtractRoutingInfo",
"terminalName": "out",
"type": "ComIbmComputeNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "callBackend",
"nodeName": "demoFlow#FCMComposite_1_9",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/callBackend",
"terminalName": "in",
"type": "ComIbmWSRequestNode"
}
},
{
"source": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "callBackend",
"nodeName": "demoFlow#FCMComposite_1_9",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/callBackend",
"terminalName": "out",
"type": "ComIbmWSRequestNode"
},
"target": {
"flowLabel": "demoFlow",
"flowName": "demoFlow",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Reply",
"nodeName": "demoFlow#FCMComposite_1_10",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/demoFlow/nodes/HTTP%20Reply",
"terminalName": "in",
"type": "ComIbmWSReplyNode"
}
}
],
"pathIdentifier": "(00011F3F-6605702A-00000003)"
}
],
"reportGeneratedTimestamp": "2024-03-28T01:27:10Z"
}
This can pull back a large amount of data so it is also possible to filter the results returned via the Admin REST API using message flow and application name. For example:
//curl 'http://localhost:4451/apiv2/servers/default/resource-managers/path-tracker/report?application=demoFlow&flow=backEnd' -X 'POST'
{
"pathData": [
{
"applicationName": "demoFlow",
"applicationURI": "/apiv2/servers/default/applications/demoFlow",
"flowName": "backEnd",
"messageCount": 2,
"messageList": [
{
"id": "(000120AC-66057024-00000001)",
"timestamp": "2024-03-28T13:27:00.774Z"
},
{
"id": "(000120AC-6605702A-00000002)",
"timestamp": "2024-03-28T13:27:06.119Z"
}
],
"path": [
{
"source": {
"flowLabel": "backEnd",
"flowName": "backEnd",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Input",
"nodeName": "backEnd#FCMComposite_1_1",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd/nodes/HTTP%20Input",
"terminalName": "out",
"type": "ComIbmWSInputNode"
},
"target": {
"flowLabel": "backEnd",
"flowName": "backEnd",
"flowURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd",
"libraryName": "",
"libraryURI": "",
"nodeLabel": "HTTP Reply",
"nodeName": "backEnd#FCMComposite_1_2",
"nodeURI": "/apiv2/servers/default/applications/demoFlow/messageflows/backEnd/nodes/HTTP%20Reply",
"terminalName": "in",
"type": "ComIbmWSReplyNode"
}
}
],
"pathIdentifier": "(00011F3F-66057024-00000001)"
}
],
"reportGeneratedTimestamp": "2024-03-28T01:32:47Z"
}
Exploring an Example Use Case
In order to demonstrate how the Path Analysis feature can enhance problem determination capabilities we will consider an example scenario. In this scenario an integration has been deployed which supports HTTP Integrations between 5 business partners and a back end system also via HTTP. In this sample we simulate the back end system using a simple HTTP flow that always returns success.
In our example an ACE Administrator has received reports from "Business Partner 4" that intermittently responses are delayed and these delayed messages have corrupted message data. The business partner is unable to reliably reproduce the issue but it does happen to a small percentage of messages. This flow is running on an Integration Node and the Administrator has already checked the syslog and can find no errors reported.
Let's look at the Path Analysis tab in the Admin Web User Interface to explore how we might use the Path Tracker to narrow down the problem.
Immediately we can see that there is one path through the flow that has been taken by a much smaller proportion of the messages processed by the system. The path with identifier "(00011F3F-6605784B-00000007)" has only processed 15 messages but all of the other paths have processed over 1000 messages. Let's start by expanding this path so we can see the detail:
This is already looking like a good candidate for a path that might be causing the delayed messages. We can see from the very first entry in this list that this path is associated with Business Partner 4 which is the one which is reporting a problem.
In this example the node names have been set to give a clue as to what might be causing the problem. So for instance "PrepareRareRequest" gives a clue as to why this is only affecting a small proportion of messages. The node name "ThrowAwayFailureDetails" and "BuildBlob" also may explain why the message data is appearing to be corrupt. In a real example these names would be related to a business function performed by the node, on it's own this may be enough to narrow down the problem however there is some more information that we can gather just by looking at the path. Let's consider entry number 7:
Here we can see that the message was propagated out of the catch terminal of the HTTP Input Node. This means that an exception was thrown down stream in the flow. In most cases the node throwing the exception will be the one immediately preceding this entry:
So already we can hypothesise that the failure is caused by a call to a remote back end failing in the HTTP Request Node named "callAnotherBackEnd".
Let's use another feature of the Path Analysis tab to find some more information. At the top of the tab is a search box, this will match entries in either the path definitions or in the message list. In this case let's enter the string "BusinessPartner 4" to see all of the paths that messages for this business partner took:
The search will auto expand all paths containing the string "BusinessPartner 4". In addition to the path we previously examined we can see that the vast majority of messages for Business Partner 4 take a much simpler path through the flow:
By comparing the 2 paths we can immediately see some differences:
- The checkRouting node propagated the message to a different terminal , in the failing case messages went through the "alternate" terminal instead of the "out" terminal
- Both paths include a backend request made via a HTTP Request Node however these are made by different nodes and therefore likely to going to different back end servers.
- The failing path encountered an exception and then some exception processing logic was executed in the nodes "ThrowAwayFailureDetails" and "Build blob"
- The exception handling logic calls the same HTTP Reply Node to send a response back to the user.
The diagram below shows the underlying message flow in the Integration Toolkit with the successful path highlighted in yellow and the failing path highlighted in red for illustrative purposes:
So based solely on the information from the Path Tracker we can conclude that:
- A small number of messages are following a different routing path, mostly likely based on message data
- On this problematic routing path a call is made to a remote back end server via HTTP that fails
- The message flow attempts to handle the failure and makes a reply back to the client
If we examine the node properties of the callAnotherBackend node we can see that in this case the error is caused by a non-existent back end, however this could easily be simulating an intermittent error with a remote system that we were able to uncover using only the Path Analysis feature:
We were also able to determine that the exception processing branch was making an HTTP Reply which is why there are no error messages output to the syslog (or stdout on Integration Servers).
In our simplified example the catch processing branch contains a compute node that explicitly discards the ExceptionList. In a real production system it is more common to see error handling logic which deals with the ExceptionList, often writing it to a customer implemented logging facility. Looking for nodes with names related to writing log or audit entries, often via MQ or to a database would give a clue as to where to find the exception list if the message flow developer had chosen to log it.
Similarly in this simplified example the message data is modified using ESQL to contain some binary characters:
SET OutputRoot.BLOB.BLOB = X'AAABACAAABAC';
In a real example the nodes between the catch terminal and the HTTP Reply could all be involved in transforming the output message.
By comparing what we discovered using the Path Tracker with what now know when we examined the source code of the message flow we can see that the hypothesis was correct, and more importantly we were able to reach this hypothesis in a matter of minutes using the Path Anlaysis tab and without having access to the message flow source code.
In addition if we did want to dig deeper we can use the message ID list for the failing path to correlate entries in either Activity Log or Trace. Lets look at the most recent message processed by the failing path:
We can filter the activity log using this message ID to get further details:
(base) davicrig:tmp$ grep activityLog.txt* -e "(0001A27A-66058585-000004B0)"
activityLog.txt:11501,I,"2024-03-28 14:58:13.681044",107130,"BIP11501I: Received data from input node 'BusinessPartner 4'.","MSGFLOW=demoFlow,NODE=BusinessPartner 4,NODETYPE=INPUT,",BusinessPartner 4,demoFlow,(0001A27A-66058585-000004B0),
activityLog.txt:13072,I,"2024-03-28 14:58:13.681708",107130,"BIP13072I: Invoking a Web Service.","MSGFLOW=demoFlow,NODE=callAnotherBackend,RM=WebService,",http://loclahost:7800/iDontExist,121,0,(0001A27A-66058585-000004B0),
activityLog.txt:13074,E,"2024-03-28 14:58:21.459285",107130,"BIP13074E: Web Service invocation was unsuccessful.","MSGFLOW=demoFlow,NODE=callAnotherBackend,RM=WebService,",http://loclahost:7800/iDontExist,BIP3152,7778,(0001A27A-66058585-000004B0),
activityLog.txt:13072,I,"2024-03-28 14:58:21.459816",107130,"BIP13072I: Invoking a Web Service.","MSGFLOW=demoFlow,NODE=callAnotherBackend,RM=WebService,",http://loclahost:7800/iDontExist,121,0,(0001A27A-66058585-000004B0),
activityLog.txt:13074,E,"2024-03-28 14:58:28.958923",107130,"BIP13074E: Web Service invocation was unsuccessful.","MSGFLOW=demoFlow,NODE=callAnotherBackend,RM=WebService,",http://loclahost:7800/iDontExist,BIP3152,7499,(0001A27A-66058585-000004B0),
activityLog.txt:11506,I,"2024-03-28 14:58:28.960894",107130,"BIP11506I: Committed a local transaction.","MSGFLOW=demoFlow,NODE=BusinessPartner 4,",(0001A27A-66058585-000004B0),
Here we can clearly see the unsuccessful call to http://loclahost:7800/iDontExist as well as a single inline retry associated with this message ID. This message ID is also logged in service trace as context information so if you need to engage IBM Support this message ID can be used to correlate service trace entries.
How it Works
To interpret the Path Analysis data correctly it can help to understand a little about how it works internally. When this feature is turned on as the message progresses through the flow it stores context information which is updated at each terminal transition in memory. For performance reasons this is stored on a per thread basis.
When the message flow completes this context information is converted into a completed path representation. At this point a per-flow cache is interrogated to see if this path has been seen before. If the path is new then the path will be added to the cache. At this point the message number is added to the message list for this path.
In order to ensure that the cache does not consume an unbounded amount of memory for cases where the path could be determined dynamically, or where there may be loops in the flow the size of the cache is limited to 20 entries per flow and for similar reasons the message list is limited to 100 message IDs per path.
If the size of the cache exceeds 20 elements and a new path is observed then the entry which was accessed least recently is evicted from the cache in order to make room for the new entry.
The message list stored the most recent 100 entries for each path, if a new message is added the oldest entry is evicted from the message list.
This has a few consequences for how the data should be interpreted. Firstly it is possible for a path to drop out of the cache and then be added back to the cache at a later time because that path hasn't been traversed for a long time. At this point it's message list will have been lost and message count reset, so the message counts should be viewed as a relative measure of path usage rather than an absolute number of messages processed.
Summary
In this article we have learned how to use the new Path Analysis feature in IBM App Connect Enterprise version 12.0.12.0. We have discussed how to enable and disable Path Analysis and looked at how it might be used by working through a simplified example. Finally we have discussed some of the implementation details so that we can properly understand how to interpret the data.