
zPET Experiences with Z Digital Integration Hub (zDIH)

  

z/OS Platform Evaluation and Test (zPET) runs customer-like workloads in a Parallel Sysplex environment to perform the final verification of new IBM Z hardware and software. For more information about zPET, please check out our community.

Introduction

What is zDIH?

What is zDIH? To quote the official documentation: “The IBM Digital Integration Hub (IBM zDIH) provides real-time consumable information flow at scale, protects production environments from unpredictable inquiry traffic with adaptability, and offers flexible interactions with API and event-based architectures. IBM zDIH is applicable to use cases that have systems of record on IBM z/OS”.

But what does this all mean? Without delving into too much detail, the zDIH solution is made up of several components, which provide the following capabilities:

  1. Integrate with Systems of Record efficiently by using templates
  2. Create zDIH applications automatically and quickly with a developer kit
  3. Store the relevant information in memory with cache technology and present with standards-based interfaces

Because zDIH offers an in-memory data cache, primarily fed by logstreams, it in effect creates a cached copy of your data, which brings several benefits. First, it allows for real-time information flow at scale: relevant data being funneled into your existing systems of record can be propagated to zDIH as well. This means that hybrid cloud application developers, business analysts, and similar consumers can access the data they need without disturbing, or even interacting with, the existing systems of record. That in turn brings additional benefits such as faster development of hybrid cloud applications, self-service for cloud-based end users, and cost optimization through the separation of query processing from core transactions.

For more information on what zDIH is and what it can be used for, we highly recommend you read up on the following documents:

Additionally, you can refer to the diagram below to get a better sense of where zDIH fits in an environment that uses a system of record. A full explanation of the numbered components can be found here. For the purposes of this blog we will mainly discuss number 2, the zDIH in-memory caches, which are implemented in a JVM on z/OS running within a UNIX System Services (USS) environment.

Why should you read this blog?

With all that said, you may still be asking yourself, “why do I need to read this blog?” For those who are new to the z/OS Platform Evaluation and Test (zPET) blog, we are a team of testers and system programmers with decades of experience on IBM Z. We often call ourselves “IBM’s first customer” because we get the opportunity to test both the latest and greatest Z hardware and software and the newest IBM z/OS products, such as zDIH.

When we first got our hands on zDIH, it was a bit of an information overload. Not only was there new terminology to learn (nodes, clusters, members, and so on), but there were also lots of configuration files and settings to sort through and arrange appropriately. But don’t worry: since we have already gone through this process, we hope that you can take some of the things we learned, apply them to your environment, and get set up with zDIH in a timely fashion.

General zDIH Setup

Foreword

Before we dive into some of the zPET-specific configurations we made for zDIH in our environment, we first want to point out some useful guides from zDIH itself. Below you will find links to the main zDIH documentation as well as the best practices guide and frequently asked questions (FAQs). Our experiences with zDIH helped to shape a lot of this documentation and we feel that it is the best place to look first when you run into any questions or concerns.

Additionally, in the rest of this blog we will assume you have a working knowledge of the product, meaning we won’t stop to define all the nomenclature associated with it. If we do mention something that confuses you, please refer to the official documentation to gain a better understanding of the concept. With that, let’s start going through some things we found were helpful to us in our environment.

General filesystem structure

In zPET we have a LOT of different tools and products installed in USS, so as you can imagine, housekeeping can be quite a chore for us. For this reason, we spent some time carefully creating a USS directory structure that is both organized and intuitive. This is by no means something that you must do in your environment, but perhaps it can give you some ideas on how to proceed.

Our base directory is simply /zdih, where we mount a relatively small zFS file system so that nothing is stored in the root file system. (A small Python sketch after the list shows one way to create this layout.) Each of the major bullets below represents a sub-directory of /zdih (for example /zdih/current, /zdih/levels, and so on), and each nested bullet is a sub-directory of the bullet it’s nested under (for example, the config bullet represents our /zdih/instances/<instance name>/config directory):

  • current = a symlink to /levels/<current zDIH level>. This allows us to easily change the symlink when updating the product version or level without having to change any JCL or scripts that use it
  • global = holds globally applicable files, such as certificates and the checkpoint directory
    • scripts = where we store scripts that apply to all of zDIH, such as those that come with the zDIH product code and are originally shipped under /<install-directory>/bin
      • apps = where we store the scripts that load data into zDIH
    • chkpt = the checkpoint directory where zDIH client application checkpoint files get created
  • instances = where we store information for all of our zDIH cluster members
    • <instance name> = we have 1 sub-directory for each cluster member
      • config = where we store the zdih.xml and log4j2.properties configuration files for the member
      • logs = the directory the cluster member's logs will get written to (pointed to by its log4j2.properties file)
      • persistence = where a member's persistence information is written to (pointed to by its zdih.xml configuration file)
      • Note: because persistence can take up a considerable amount of space, we mount a unique zFS file system on these directories when persistence is enabled for a member/cluster
  • levels = where we install the zDIH code. We could install every version of zDIH here, but in the past we've taken the approach of having "seta" and "setb" sub-directories, so that when you receive new service you simply overwrite the older of the two service levels and always have a backup available
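
To make this layout concrete, here is a minimal Python sketch that creates the same skeleton and flips the current symlink to a new level. It assumes the /zdih paths described above; the instance names (svr01, svr02) and the active level name (seta) are placeholders for your own values.

import os

BASE = "/zdih"
INSTANCES = ["svr01", "svr02"]   # placeholder member names

# global directories, the two level directories, and one directory tree per cluster member
dirs = ["global/scripts/apps", "global/chkpt", "levels/seta", "levels/setb"]
dirs += [f"instances/{inst}/{sub}" for inst in INSTANCES
         for sub in ("config", "logs", "persistence")]
for d in dirs:
    os.makedirs(os.path.join(BASE, d), exist_ok=True)

# /zdih/current points at the active level; flipping this one symlink later is all
# that is needed to move every JCL member and script that references it to a new level
current = os.path.join(BASE, "current")
if os.path.islink(current):
    os.remove(current)
os.symlink(os.path.join(BASE, "levels", "seta"), current)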

Logs

There are two main logs that you will be concerned with when working with zDIH: the JCL joblog and the USS log. This becomes clear when you examine a cluster member’s log4j2.properties file and see that there are settings for both an appender.console.* log and an appender.rolling.* log; the former is the JCL joblog and the latter is the USS log. In theory, if you set the logging level of both to the same value (INFO, for example), you would in effect get two duplicate logs. However, this is not advised: zDIH can be quite chatty, and an appender.console.filter.threshold.level of INFO would give you a very large JCL joblog very quickly. For this reason, we set appender.console.filter.threshold.level to WARN and reserve INFO for rootLogger.level.
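
For illustration, the two lines we adjust in a member’s log4j2.properties look something like the excerpt below; the surrounding appender and logger definitions come from the example properties file shipped with zDIH and are omitted here.

# keep full INFO-level detail in the rolling USS log
rootLogger.level = INFO

# send only warnings and errors to the JCL joblog (console appender)
appender.console.filter.threshold.level = WARN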

This setup is what zDIH ships with by default in the example properties files, and you can read more about it here. Additionally, even more helpful tips about logging configuration changes can be found in the zDIH Best Practices guide.

Specific instance, config, and log directory for each member

As we stated previously, we created a directory called /zdih/instances, which contains a unique instance subdirectory for each zDIH cluster member that we have defined. Within the instance subdirectory, we have additional config and logs directories, which contain the configuration files and the logs, respectively, for the member. In theory, you could co-locate all of your configuration files in one directory and all of your logs in another, such as /zdih/member-config and /zdih/member-logs, or choose any number of other setups. However, we found this layout to make the most sense for us because it allowed us to simply duplicate a cluster member’s files, directories, and JCL, and then modify the JCL to point to the new instance directory for some of its parameters. Again, this is merely a suggestion, and you can feel free to structure your directories however you wish.
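
As a rough illustration of that “duplicate and adjust” approach, the sketch below copies an existing member’s config directory into a new instance directory; the instance names are placeholders, and the copied zdih.xml, log4j2.properties, and the JCL still need to be edited by hand for the new member.

import os
import shutil

SRC = "/zdih/instances/svr01"   # existing member (placeholder name)
DST = "/zdih/instances/svr02"   # new member (placeholder name)

# copy only the configuration files; logs and persistence are member-specific output
shutil.copytree(os.path.join(SRC, "config"), os.path.join(DST, "config"))

# create empty logs and persistence directories for the new member
for sub in ("logs", "persistence"):
    os.makedirs(os.path.join(DST, sub), exist_ok=True)

print(f"created {DST}; now update its zdih.xml, log4j2.properties, and the JCL that points to it")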

z/OS Workload Manager configuration

In zPET we often make use of z/OS Workload Manager (WLM) to help manage system resources for our running jobs, and zDIH is no exception. We won’t delve into exact details here (as such an explanation could be its own lengthy blog), so instead we recommend that you refer to zDIH’s published write-up on the subject available here.

Hints and Tips

DBeaver connection to zDIH

The zDIH Management Center that is shipped with zDIH is highly useful for cluster management and monitoring and has some built-in support for SQL queries. However, folks such as administrators, developers, and analysts may already be familiar with other SQL clients, such as DBeaver, and prefer to use those for SQL queries instead. For this reason, we recommend that you install DBeaver and try using it to interact with your zDIH cache maps.

The zDIH team has put together some great documentation on installing DBeaver and establishing a connection to your zDIH installation, so we suggest that you peruse that documentation here.

ZOAU

During our testing of zDIH, one pain point we frequently ran into was the size of JCL joblogs, as well as the length of some of the lines in them. The behavior we saw was that when we tried to save a joblog via the XDC command, some extremely long output lines would get truncated (even when we defined a large block size for XDC). Additionally, XDC would sometimes encounter an ABEND because the job output was simply too large. To circumvent these problems, we turned to Z Open Automation Utilities (ZOAU), which lets you use Python to programmatically write job output to a USS file for further investigation and does a good job of capturing even the longest of lines.

The zDIH team has already provided an excellent overview of using ZOAU in this fashion, so please read up on that section of the doc for more info.
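
As a rough sketch of the idea (not the zDIH team’s exact approach), the example below assumes a ZOAU level whose Python API provides zoautil_py.jobs.read_output(); the job ID, step name, and DD name are placeholders, so check the ZOAU documentation for the exact API and arguments available at your installed level.

# assumption: the zoautil_py jobs module is available at your ZOAU level
from zoautil_py import jobs

# placeholders -- substitute a real job ID, step name, and DD name from your zDIH member's job
JOB_ID, STEP, DD = "JOB12345", "ZDIH", "STDOUT"

# read the spool content for that DD as a string, long lines included
output = jobs.read_output(JOB_ID, STEP, DD)

# write it to a USS file for further investigation
with open(f"/tmp/{JOB_ID}_{DD}.txt", "w") as f:
    f.write(output)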

Test.py

A typical zDIH deployment consists of clusters with more than one member, and often more than one cluster. Depending on your environment, keeping track of all this in a systematic way can be challenging. For example, we wanted to keep track of which cluster members were running on which system, which ones did (and did not) have persistence enabled, and so on. To solve this, we wrote a simple Python script to “crawl” the directories and pull relevant information for our clusters. Here is an example of that script:

import os
os.chdir('./instances')    # open the directory your instances are located in
cwd = os.getcwd()
dir_list = os.listdir(cwd)
print(dir_list)            # print the contents of the working directory
dir_list.sort()            # sort the contents

# loop through each directory (instance) in the list
for each_dir in dir_list:
    # an example of directory names you may wish to skip as they are not DIH instances
    if "." in each_dir:
        print(f'skipping {each_dir}\n')
    # we use the keyword svr to detect that a directory is a proper DIH instance
    elif "svr" in each_dir:
        os.chdir(f'/zdih/instances/{each_dir}/config')    # go into each directory
        print(each_dir)
        # open the instance's zdih.xml file and read its contents
        with open('zdih.xml') as f:
            lines = f.readlines()
        # iterate over the contents of zdih.xml looking for specific xml tags holding the info we want to print
        for line in lines:
            if ("<cluster-name>" in line or
                    "</port>" in line or
                    "<interface>" in line or
                    "<member>" in line or
                    '<persistence enabled="true">' in line):
                line = line.strip()
                print(line)
        print('')

Now, a few things about this script:

  • This is by no means an efficient or robust script; it is very much a "quick and dirty" script that we wrote to get something working and added to as we went
  • This script makes several assumptions
    • All member sub-directories are nested under a master "instances" directory
    • Each member's name (and therefore the name of the sub-directory) contains the string "svr"
    • All member config directories are structured in the same way, i.e. located in /zdih/instances/{each_dir}/config
  • There are several hardcoded values in the script, such as the os.chdir() statements, so obviously these would need to be updated in your version of the script

Once again, this is a very rudimentary example script that could be improved and augmented. However, it is quite useful for getting a quick overview of which members are running where, so give it a try and see what additions and modifications you can make to it! Please note that this script is provided on an as-is basis only. Service support for this tool should not be assumed and none is implied.

"Do's and Don'ts"

Do's
  • Recycle the Management Center to alleviate issues
    • During our testing we encountered some intermittent issues with the zDIH Management Center which we found could be alleviated by simply restarting it. Please note that because the zDIH Management Center and zDIH clusters are separate processes, restarting the Management Center had no ill effects on the cluster(s). 
  • Add port in zdih.xml <member> element
    • As described in this doc topic, zDIH cluster members can be added to the <member-list> element by adding the member’s IP address with or without the member’s port number
    • We suggest always adding the port number (for example, <member>host-name:port</member>). In our testing, omitting the port increased the time it took for members to join a cluster via the default discovery process; a short illustrative excerpt follows this list of Do's
  • Note the order in which cluster members are taken down when persistence is enabled
    • If for some reason you need to take down a zDIH cluster which has persistence enabled, it is very important to bring the nodes back up in last-out-first-in order (meaning the last node to go down should be the first node brought back up)
    • Some ways in which we verified which was the last node to go down included:
      • Manual intervention, i.e. ensuring that an operator manually cancelled/stopped the last surviving node and that the same node was the first to be manually started later
      • Inspecting the JCL joblogs for timestamps indicating when the job ended
      • Inspecting the USS logs for timestamps indicating the last time a node took an action before coming down
  • Enable PAGEFRAMESIZE64=(1M,4K,4K,4K,4K,4K,4K)
    • zDIH provides an excellent explanation of why the default page frame size should be set as such here
    • We are calling extra attention to this fact here because all of our testing was done using this product recommendation
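
For illustration, a <member-list> with the port numbers included might look like the excerpt below (the host names and ports are placeholders for your own members’ addresses and listen ports):

<member-list>
    <member>hosta.example.com:5701</member>
    <member>hostb.example.com:5701</member>
</member-list>
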
Don'ts
  • Don’t delete maps from a persistence enabled cluster
    • During our testing, a particular dataset we worked with included three maps which we persisted to a cluster. At some point, we added an additional map as a quick test. Afterwards, we removed that map’s persistence entry from the member’s zdih.xml configuration, which resulted in a NullPointerException. In short, once persistence is enabled on a cluster for a given map, the map needs to remain persisted or be removed properly, not simply deleted from the XML.
    • In order to properly remove a cache map from the zDIH server, you must destroy the cache map. To do so, navigate to your Management Center > Scripting and run the following script:
    • function destroy_map(){
      hazelcast.getMap("map-to-destroy").destroy();
      return name + " => " + node;
      }
      destroy_map();
    • Again, please note that this script is provided on an as-is basis only. Service support for this tool should not be assumed and none is implied
  • Don't comment out properties in log4j2.properties
    • While this may seem like an obvious “don’t do this” type of thing, it tripped us up several times during our testing, hence our mentioning it here
    • If you comment out properties in the log4j2.properties file for your zDIH member, you can expect exceptions in your member’s startup JCL (such as an org.apache.logging.log4j.core.config.ConfigurationException), and your member will fail to start
    • To ensure your zDIH member works properly, make sure you follow the zDIH guidelines for customizing your logging setup, which can be found here

Closing Remarks

In conclusion, we hope that you found this blog useful and insightful. We covered a number of topics here, including a brief overview of what zDIH is, some specifics of our zDIH configuration in zPET, and helpful tips on what to do and what not to do with the product. As we continue to use the product more broadly, we hope to discover new use cases for it and refine our existing configuration. As we do so, we will be sure to post more blogs discussing our findings!

Resources & Helpful Links

Author

Trent Balta (Trent.Balta@ibm.com)