
Troubleshooting network issues with App Connect Enterprise running in CP4I

By AMAR SHAH posted Tue September 26, 2023 04:53 AM

  

This article explains the steps to capture network traces for investigating network communication issues with an App Connect Enterprise flow deployed to IBM Cloud Pak for Integration (CP4I) or OpenShift Container Platform (OCP).

You can capture network traces with the tcpdump utility either at the OCP node level or at an individual container level. We will look at both methods.

Collecting a network trace from an OpenShift Container Platform node:

  • Log in to your OCP cluster from the command line (oc login) on the bastion or infrastructure host.
  • Get the list of nodes in your OCP cluster:

# oc get nodes

NAME                                STATUS   ROLES                  AGE    VERSION
master0.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
master1.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
master2.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
worker0.test-cp4i-cluster.ibm.com   Ready    worker                 193d   v1.25.11+1485cc9
worker1.test-cp4i-cluster.ibm.com   Ready    worker                 193d   v1.25.11+1485cc9
worker2.test-cp4i-cluster.ibm.com   Ready    worker                 194d   v1.25.11+1485cc9

  • Get the names of the pods running in your namespace:

# oc get pods -n [namespace]

  • To capture a network trace on the node where your App Connect Integration Server/Integration Runtime (IS/IR) pod is running, first identify the worker node that hosts the App Connect pod using the command below:

# oc get pod [IS/IR pod name] -n [namespace] -o jsonpath='{.spec.nodeName}'

It will produce output like the following:

worker0.test-cp4i-cluster.ibm.com
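
Alternatively, a plain 'oc get pod ... -o wide' reports the hosting node in its NODE column, which can be handy if you are checking several pods at once:

# oc get pod [IS/IR pod name] -n [namespace] -o wide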

  • To capture network activity on node worker0.test-cp4i-cluster.ibm.com, run the following command:

# oc debug node/worker0.test-cp4i-cluster.ibm.com

Temporary namespace openshift-debug-znngt is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.22.109.87
If you don't see a command prompt, try pressing enter.

  • Set /host as the root directory within the debug shell:

sh-4.4# chroot /host

  • From within the chroot environment, obtain the node's interface names:

sh-4.4# ip addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000

    link/ether 22:22:0a:16:6d:57 brd ff:ff:ff:ff:ff:ff
    inet 10.22.109.87/20 brd 10.22.111.255 scope global dynamic noprefixroute ens3
       valid_lft 467sec preferred_lft 467sec
    inet6 fe80::2022:aff:fe16:6d57/64 scope link noprefixroute

  • Start a toolbox container on the debug pod; this pulls the rhel8/support-tools image, which provides tcpdump:

sh-4.4# toolbox

Trying to pull registry.redhat.io/rhel8/support-tools:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob bea2a0b08f4f done
Copying blob d47b62675169 done
Copying config a27e26d4d6 done
Writing manifest to image destination
Storing signatures
a27e26d4d65c432428a7c60f504206261d352ec88757a7502750d3688772e8ea
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
e3a5489eb67b49ed6a034a5d3703632e6ba5077b5da11713e7bd0d99e376318a
toolbox-root
Container started successfully. To exit, type 'exit'.

  • Initiate a tcpdump session on the cluster node and redirect the output to a capture file. This example uses ens3, the interface name obtained in the previous step:

(You run the tcpdump command interactively, so you can control the duration of the packet capture.)

[root@worker0 /]# tcpdump -nn -s 0 -i ens3 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap

dropped privs to tcpdump
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
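
If the node carries a lot of traffic, you can keep the capture small by appending a tcpdump filter expression. For example, to capture only traffic on the flow's HTTP listener port (7800 in the example later in this article):

[root@worker0 /]# tcpdump -nn -s 0 -i ens3 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap port 7800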

  • To capture a matching service trace on the ACE side, enable service trace on the Integration Server/Integration Runtime where the applications or message flows that you want to troubleshoot are deployed (one possible approach is sketched below).
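
One way to enable service trace is through a server.conf.yaml override, which on CP4I is typically supplied to the IS/IR via a Configuration object of type serverconf. A minimal sketch of the relevant server.conf.yaml settings (verify against the server.conf.yaml template shipped with your ACE version before using):

trace: 'service'          # enable service-level trace; the default is 'none'
traceSize: '1G'           # approximate maximum amount of trace data to keep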

  • Process the workload through the message flow to reproduce the network issue (for example, as sketched below).
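
For an HTTP-based flow such as the /Flow1 example used later in this article, one simple way to drive traffic while the capture is running is to send test requests to the flow's HTTP listener, for instance through a port-forward. The pod name and endpoint here are illustrative:

$ oc port-forward pod/is-01-quickstart2-is-7949d78764-wht8q 7800:7800 -n [namespace] &
$ curl -v http://localhost:7800/Flow1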

  • Once the problem has been reproduced, stop the network trace by pressing Ctrl+C at the debug pod prompt, which will show output similar to the following:

^C23965 packets captured
24116 packets received by filter
0 packets dropped by kernel

  • The network trace is stored at /host/var/tmp/ in the debug pod.
  • Exit from the debug pod:
sh-4.4# exit
exit

Removing debug pod ...
Temporary namespace openshift-debug-znngt was removed.

  • Copy the capture file off the debug pod to your local system (replace the file names with the actual ones on your system/debug pod):

# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'cat /host/var/tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap' > /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap

Temporary namespace openshift-debug-6ck2r is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Temporary namespace openshift-debug-6ck2r was removed.
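
For large capture files, it may be quicker to compress the file on the node and copy the compressed output instead; a sketch of the same command, assuming gzip is available in the debug image:

# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'gzip -c /host/var/tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap' > /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap.gz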

  • Stop the service trace on the IS/IR.

  • Now you can open the network trace in a tool such as Wireshark to investigate the issue.

Viewing the network trace in the Wireshark tool

In this example, the message flow is listening on endpoint /Flow1 on port 7800. You can verify that the connection has been established on port 7800 by running the 'netstat -an' command after logging in to the ACE container terminal.

$ netstat -an | grep 7800

tcp        0      0 0.0.0.0:7800            0.0.0.0:*               LISTEN     
tcp        0      0 10.254.12.77:7800       10.254.16.2:53612       ESTABLISHED
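
Within Wireshark, a display filter helps isolate the flow's traffic from everything else captured on the interface; for example, with the HTTP listener on port 7800:

tcp.port == 7800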

Collecting a network trace from a specific App Connect Pod/Container:

In this section, we look at the procedure to capture a network trace against a specific ACE Integration Runtime/Server pod.

  • Log in to your OCP cluster from the command line (oc login) on the bastion or infrastructure host.

  • Get the list of nodes in your OCP cluster:

# oc get nodes

  • Find the worker node on which the pod that you want to debug is running:

$ oc get pod [pod name] -n [namespace] -o jsonpath='{.spec.nodeName}'

worker0.test-cp4i-cluster.ibm.com

  • Remotely log in to that worker node using 'oc debug':

# oc debug node/worker0.test-cp4i-cluster.ibm.com

Creating debug namespace/openshift-debug-node-zpbbm...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.169.235.132
If you don't see a command prompt, try pressing enter.

sh-5.1#

  • Confirm that the pod you want to debug is available by running the following command:

sh-4.4# chroot /host crictl ps

CONTAINER           IMAGE                                                                                                                    CREATED             STATE               NAME                                 ATTEMPT             POD ID              POD

ba6f3a8e4fb26       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33607dc738b2496b67ff5c25abb04bff4a9a1d805d11907804cb820c7f88c27e   6 minutes ago       Running             container-00                         0                   ca5f5677403a0       worker0test-cp4i-clusteribmcom-debug

d2e548c6275fd       b4b91515cd98963de9baaf010f57eaccda1c1f02913b39a0abd3388cede0af57                                                         4 days ago          Running             is-01-quickstart2                    2                   69d64934352ae       is-01-quickstart2-is-7949d78764-wht8q

e5159b911c401       icr.io/cpopen/appconnect-operator-catalog@sha256:8061f9df021abfaa34ff9dbc184bbd5504d28145b4188c306689f8bdcefc3826        4 days ago          Running             registry-server                      0                   d95af9a94333a       appconnect-operator-catalogsource-x9dbg

  • Determine the container's process ID (PID). For that, you first need the container ID, which you can obtain with the following command. (You can run this command in a separate terminal.)

# oc describe pod is-01-quickstart2-is-7949d78764-wht8q

Status:           Running
IP:               10.254.21.184
IPs:
  IP:           10.254.21.184
Controlled By:  ReplicaSet/is-01-quickstart2-is-7949d78764
Containers:
  is-01-quickstart2:
  Container ID:   cri-o://d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372

Check the 'Container ID' field and copy the hex string. In the above example it is: d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372

  • Now determine the container's PID:

sh-4.4# chroot /host crictl inspect --output yaml d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372 | grep 'pid' | awk '{print $2}'

2491468

(The PID obtained here is 2491468.)
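
As an alternative to the oc describe step, you can usually look up the container ID directly on the node with crictl; a sketch, assuming the container is named is-01-quickstart2 and that your crictl version supports the --name filter:

sh-4.4# chroot /host crictl ps -q --name is-01-quickstart2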

  • From within the debug pod, initiate a tcpdump session against the container and redirect the output to a capture file. This example uses 2491468 as the container's process ID and any as the interface name. The nsenter command enters the namespace of a target process and runs a command in that namespace. Because the target process in this example is a container's process ID, the tcpdump command is run in the container's network namespace from the host:

sh-4.4# nsenter -n -t 2491468 -- tcpdump -nn -i any -w /host/var/tmp/my-cluster-node-my-container_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap

dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
^C117 packets captured
133 packets received by filter
0 packets dropped by kernel
(use Ctrl+C to stop the tcpdump packet capture).
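
If you want to confirm that the PID really belongs to the intended container, you can list the interfaces in its network namespace with the same nsenter technique; the address shown should match the pod IP reported by oc describe (10.254.21.184 in this example). This assumes the ip utility is available in the debug image:

sh-4.4# nsenter -n -t 2491468 -- ip addr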

  • The network trace is stored at /host/var/tmp/ in the debug pod.
  • Exit from the debug pod by typing the 'exit' command in the debug pod terminal.

  • Copy the capture file off the debug pod to your local system using the following command (replace my-cluster-node_18_09_2023-04_16_51-UTC.pcap with the actual file name generated under /host/var/tmp):

# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'cat /host/var/tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap' > /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap

Temporary namespace openshift-debug-6ck2r is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Temporary namespace openshift-debug-6ck2r was removed.

 

  • Now you can open the network trace in a tool such as Wireshark to investigate the issue.

In this example, the message flow is listening for incoming traffic on endpoint /Flow1 on port 7800. You can verify that the connection has been established on port 7800 by running the 'netstat -an' command after logging in to the container terminal.

sh-4.4$ netstat -an | grep 7800
tcp        0      0 0.0.0.0:7800            0.0.0.0:*               LISTEN     
tcp        0      0 10.254.12.77:7800       10.254.16.2:53612       ESTABLISHED
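
If netstat is not included in the container image, the ss utility (where present) provides the same information:

sh-4.4$ ss -tan | grep 7800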
Wireshark trace for ACE pod
