
Improve insights dashboard starter kit (PJ46295)

  
z/TPF real-time runtime metrics collection (PJ45657) provides information about the state of the z/TPF system and the resources used by message processing and applications. The z/TPF real-time insights dashboard starter kit provides a sample analytics pipeline that enables you to view graphical representations of metrics, perform statistical analysis, perform machine learning, and more.

Improve insights dashboard starter kit (PJ46295 and z/TPF tools download) enhances the online calculation of the skip factor, which improves the accuracy of the modeled_cpu metric that is calculated offline. Other improvements to the starter kit include simpler administration of dual server modes, simpler installation of the tpfrtmc offline utility, improved metrics on the statistics dashboards, a more intuitive folder layout, and more.

The following example scenario uses real-time runtime metrics collection with the z/TPF real-time insights dashboard starter kit. The ZRTMC CPU Utilization Message Type Analysis sample dashboard is shown. Note that, in this example, all message traffic in the z/TPF system sets the name-value pair MsgType to indicate what type of message was processed by the system.

[Figure: The sample dashboard provides a graph panel and two correlation analysis panels.]

In the upper panel, we have a graph of CPU usage over time. Notice that the actual_cpu (yellow line) is rising over time. The actual_cpu is the same value that you would see for system CPU utilization in Continuous Data Collection (CDC) and the ZSTAT U command displays. Notice that the modeled_cpu (green line) rises similarly to the actual_cpu. The modeled_cpu indicates what the actual_cpu is expected to be at a point in time, based on the name-value pair sample data that was collected. PJ46295 improves the accuracy of the modeled_cpu calculation for several different scenarios. Because the modeled_cpu is similar to the actual_cpu, we may have name-value pair data that helps explain why the change in actual_cpu is occurring.

In the lower two panels, we have the results of two different statistical analyses. Each lower panel calculates correlation coefficients for the name-value pair MsgType over the left and right time frames of the graph in the upper panel (Coef-Lt and Coef-Rt respectively). Notice that both lower panels show a red “Variation” indicator when the data does not have enough variation to calculate a meaningful correlation. This is an enhancement included in the z/TPF real-time insights dashboard starter kit download as part of this effort. In the lower left panel, the Booking message type is highlighted, which indicates that on the right side of the graph in the upper panel the increase in the number of Booking messages received is highly correlated with the rise in actual_cpu. In the lower right panel, the Booking message type does not look remarkable, which indicates that on the right side of the graph in the upper panel the average amount of CPU used by the Booking messages is unchanged from the left side of the graph.
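
The left and right correlation calculation described above can be illustrated with a minimal Python sketch. This is not the starter kit's implementation: the use of Pearson's coefficient, the split of the time range into two equal halves, the min_std threshold, the function names, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev

VARIATION = "Variation"  # hypothetical marker mirroring the dashboard's indicator


def pearson(xs, ys, min_std=1e-9):
    """Return Pearson's r for two equal-length series, or VARIATION when
    either series is too flat for the coefficient to be meaningful."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    sx, sy = pstdev(xs), pstdev(ys)
    if sx < min_std or sy < min_std:
        return VARIATION
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def left_right_coefficients(metric, actual_cpu, split=None):
    """Correlate a per-MsgType metric with actual_cpu separately over the
    left and right parts of the time range (Coef-Lt, Coef-Rt)."""
    if split is None:
        split = len(actual_cpu) // 2  # assumption: split at the midpoint
    coef_lt = pearson(metric[:split], actual_cpu[:split])
    coef_rt = pearson(metric[split:], actual_cpu[split:])
    return coef_lt, coef_rt


# Hypothetical samples: Booking message counts and CPU utilization per interval.
booking_count = [100, 100, 100, 100, 110, 130, 150, 170]
actual_cpu = [40.0, 40.5, 39.8, 40.2, 42.0, 46.0, 50.0, 54.0]

coef_lt, coef_rt = left_right_coefficients(booking_count, actual_cpu)
print(coef_lt, coef_rt)
```

With these made-up samples, the Booking count is flat on the left, so the left coefficient is reported as the variation marker, while on the right the count rises in step with actual_cpu and the coefficient is close to 1. Passing the average CPU per Booking message as the metric instead would correspond to the analysis in the lower right panel.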

Given the insights discussed above, it appears that the rise in actual_cpu on the system can be attributed to a rise in the traffic rate of Booking messages, as opposed to Booking messages using more CPU over time. If this rise in actual_cpu is problematic, these insights provide a possible lead in your investigation.
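
The distinction between a higher traffic rate and a higher CPU cost per message can be made concrete with a small arithmetic sketch. The function name and all numbers below are hypothetical; the sketch only assumes that the CPU consumed by a message type is roughly its message rate multiplied by its average CPU per message.

```python
def attribute_cpu_change(rate_lt, cost_lt, rate_rt, cost_rt):
    """Split the change in CPU consumed by one message type (rate * cost)
    between the left and right time frames into two parts:
    the part due to the change in message rate and the part due to the
    change in average CPU per message. The two parts sum to the total change."""
    rate_effect = (rate_rt - rate_lt) * cost_lt
    cost_effect = rate_rt * (cost_rt - cost_lt)
    return rate_effect, cost_effect


# Hypothetical values: Booking messages per second and CPU milliseconds per message.
rate_effect, cost_effect = attribute_cpu_change(
    rate_lt=100, cost_lt=2.0, rate_rt=170, cost_rt=2.0
)
print(rate_effect, cost_effect)  # 140.0 0.0
```

In this made-up case, the whole increase comes from the rate term and none from the cost term, which matches the pattern described above: more Booking messages, each using about the same amount of CPU.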

For more information about APAR PJ46295 and the z/TPF real-time insights dashboard starter kit, see the APEDIT, the Tools for z/TPF 1.1 & z/TPFDF 1.1 download page, and IBM Knowledge Center.