This article explains how to capture network traces for investigating network communication issues with an App Connect Enterprise (ACE) flow deployed to Cloud Pak for Integration (CP4I) or OpenShift Container Platform (OCP).
You can capture network traces with the tcpdump utility either at the OCP node level or at an individual container level. We will look at both methods.
Collecting a network trace from an OpenShift Container Platform node:
- Log in to your OCP cluster from the command line (oc login) on the bastion or infrastructure host.
- Get the list of nodes of your OCP cluster:
# oc get nodes
NAME                                STATUS   ROLES                  AGE    VERSION
master0.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
master1.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
master2.test-cp4i-cluster.ibm.com   Ready    control-plane,master   194d   v1.25.11+1485cc9
worker0.test-cp4i-cluster.ibm.com   Ready    worker                 193d   v1.25.11+1485cc9
worker1.test-cp4i-cluster.ibm.com   Ready    worker                 193d   v1.25.11+1485cc9
worker2.test-cp4i-cluster.ibm.com   Ready    worker                 194d   v1.25.11+1485cc9
- Get the names of the pods running in your namespace:
# oc get pods -n [namespace]
- If you need to capture a network trace at the node where your App Connect Integration Server (IS) or Integration Runtime (IR) pod is running, first identify the worker node hosting the App Connect pod:
# oc get pod [IS/IR pod name] -n [namespace] -o jsonpath='{.spec.nodeName}'
This produces output similar to:
worker0.test-cp4i-cluster.ibm.com
- To capture network activity on node worker0.test-cp4i-cluster.ibm.com, run the following command:
# oc debug node/worker0.test-cp4i-cluster.ibm.com
Temporary namespace openshift-debug-znngt is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.22.109.87
If you don't see a command prompt, try pressing enter.
- Set /host as the root directory within the debug shell:
sh-4.4# chroot /host
- From within the chroot environment, obtain the node's interface names:
sh-4.4# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 22:22:0a:16:6d:57 brd ff:ff:ff:ff:ff:ff
inet 10.22.109.87/20 brd 10.22.111.255 scope global dynamic noprefixroute ens3
valid_lft 467sec preferred_lft 467sec
inet6 fe80::2022:aff:fe16:6d57/64 scope link noprefixroute
- Start the toolbox container on the debug pod (it provides tcpdump and other support tools):
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools:latest...
Getting image source signatures
Checking if image destination supports signatures
Copying blob bea2a0b08f4f done
Copying blob d47b62675169 done
Copying config a27e26d4d6 done
Writing manifest to image destination
Storing signatures
a27e26d4d65c432428a7c60f504206261d352ec88757a7502750d3688772e8ea
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
e3a5489eb67b49ed6a034a5d3703632e6ba5077b5da11713e7bd0d99e376318a
toolbox-root
Container started successfully. To exit, type 'exit'.
- Initiate a tcpdump session on the cluster node and redirect the output to a capture file. This example uses ens3, the interface obtained in the previous step. (You run the tcpdump command interactively, so you can control the duration of the packet capture.)
[root@worker0 /]# tcpdump -nn -s 0 -i ens3 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap
dropped privs to tcpdump
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
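If the capture grows too large, you can optionally restrict it to the traffic of interest with a standard tcpdump capture filter. A minimal variation of the command above that records only traffic on the flow's HTTP listener port (7800 in this article's example):
[root@worker0 /]# tcpdump -nn -s 0 -i ens3 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap port 7800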
- To capture a matching service trace on the ACE side, enable service trace on the Integration Server/Integration Runtime where the applications/message flows that you want to troubleshoot are deployed; one way to do this is sketched below.
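One way to enable service trace is to supply a server.conf.yaml override through an App Connect Configuration object of type serverconf and reference it from the IS/IR. The following is a minimal sketch; the object name and namespace are placeholders, your App Connect operator version may offer other routes (for example, enabling trace from the App Connect Dashboard), and adding the configuration restarts the pod:
# cat server.conf.yaml
trace: 'service'
traceSize: '100M'
# base64 -w0 server.conf.yaml   (paste the output into spec.contents below)
# cat <<EOF | oc apply -f -
apiVersion: appconnect.ibm.com/v1beta1
kind: Configuration
metadata:
  name: quickstart-serverconf
  namespace: [namespace]
spec:
  type: serverconf
  contents: <base64-encoded server.conf.yaml>
EOF
Then add quickstart-serverconf to the spec.configurations list of your IntegrationServer/IntegrationRuntime.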
- Process the workload through the message flow to reproduce the network issue.
- Once the problem has been reproduced, stop the network trace by pressing Ctrl+C at the debug pod prompt, which then shows output similar to:
^C23965 packets captured
24116 packets received by filter
0 packets dropped by kernel
- The network trace is stored at /host/var/tmp/ in the debug pod.
- Exit from the debug pod:
sh-4.4# exit
exit
Removing debug pod ...
Temporary namespace openshift-debug-znngt was removed.
- Copy the capture file off the node to your local system (replace the file names with the actual ones on your system/debug pod):
# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'cat /host/var/tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap' > /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap
Temporary namespace openshift-debug-6ck2r is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Temporary namespace openshift-debug-6ck2r was removed.
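For large captures, you can optionally compress the file in transit; a minimal variation of the same command, assuming gzip is available on the node:
# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'cat /host/var/tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap | gzip' > /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap.gz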
- Stop the service trace on the IS/IR.
- Now you can open the network trace in a tool such as Wireshark to investigate the issue.
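If you prefer the command line, tshark (Wireshark's CLI companion, assuming it is installed on your workstation) can apply the same kind of filter; for example, to show only traffic on port 7800:
$ tshark -r /tmp/my-cluster-node_18_09_2023-04_16_51-UTC.pcap -Y 'tcp.port == 7800'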
In this example, the message flow is listening on endpoint /Flow1 on port 7800. To verify that the connection has been established on port 7800, log in to the ACE container terminal and run the 'netstat -an' command:
$ netstat -an |grep 7800
tcp 0 0 0.0.0.0:7800 0.0.0.0:* LISTEN
tcp 0 0 10.254.12.77:7800 10.254.16.2:53612 ESTABLISHED
Collecting a network trace from a specific App Connect Pod/Container:
In this section, we look at the procedure for capturing a network trace against a specific ACE Integration Runtime/Server pod.
- Log in to your OCP cluster from the command line (oc login) on the bastion or infrastructure host.
- Get the list of nodes of your OCP cluster:
# oc get nodes
- Find the worker node on which the pod that you want to debug is running:
$ oc get pod [pod name] -n [namespace] -o jsonpath='{.spec.nodeName}'
worker0.test-cp4i-cluster.ibm.com
- Remotely log in to that worker node using 'oc debug':
# oc debug node/worker0.test-cp4i-cluster.ibm.com
Creating debug namespace/openshift-debug-node-zpbbm...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.169.235.132
If you don't see a command prompt, try pressing enter.
sh-5.1#
- Confirm that the pod you want to debug is visible by running the following command:
sh-4.4# chroot /host crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
ba6f3a8e4fb26 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:33607dc738b2496b67ff5c25abb04bff4a9a1d805d11907804cb820c7f88c27e 6 minutes ago Running container-00 0 ca5f5677403a0 worker0test-cp4i-clusteribmcom-debug
d2e548c6275fd b4b91515cd98963de9baaf010f57eaccda1c1f02913b39a0abd3388cede0af57 4 days ago Running is-01-quickstart2 2 69d64934352ae is-01-quickstart2-is-7949d78764-wht8q
e5159b911c401 icr.io/cpopen/appconnect-operator-catalog@sha256:8061f9df021abfaa34ff9dbc184bbd5504d28145b4188c306689f8bdcefc3826 4 days ago Running registry-server 0 d95af9a94333a appconnect-operator-catalogsource-x9dbg
- Determine the container's process ID. For that, you first need the container ID, which you can obtain as follows. (You can run this command in a separate terminal.)
# oc describe pod is-01-quickstart2-is-7949d78764-wht8q
Status: Running
IP: 10.254.21.184
IPs:
IP: 10.254.21.184
Controlled By: ReplicaSet/is-01-quickstart2-is-7949d78764
Containers:
is-01-quickstart2:
Container ID: cri-o://d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372
Check the 'Container ID' field and copy the hex string. In the above example it is d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372.
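Alternatively, you can extract the container ID directly with a jsonpath query (a one-liner sketch; the containerStatuses index may differ if your pod has more than one container):
# oc get pod is-01-quickstart2-is-7949d78764-wht8q -n [namespace] -o jsonpath='{.status.containerStatuses[0].containerID}'
cri-o://d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372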
- Now determine the container's PID:
sh-4.4# chroot /host crictl inspect --output yaml d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372 | grep 'pid' | awk '{print $2}'
2491468
(The PID obtained here is 2491468).
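Depending on your crictl version, a go-template query may return the PID more precisely than grepping the YAML (a sketch; verify that your crictl supports the go-template output format):
sh-4.4# chroot /host crictl inspect -o go-template --template '{{.info.pid}}' d2e548c6275fd6787cb49a9162a1c8b737401dac5ac66c5c1237714ecd159372
2491468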
- From within the debug pod, initiate a tcpdump session against the container and redirect the output to a capture file. This example uses 2491468 as the container's process ID and any as the interface name. The nsenter command enters the namespace of a target process and runs a command in that namespace. Because the target process here belongs to a container, the tcpdump command runs in the container's network namespace from the host:
sh-4.4# nsenter -n -t 2491468 -- tcpdump -nn -i any -w /host/var/tmp/my-cluster-node-my-container_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
^C117 packets captured
133 packets received by filter
0 packets dropped by kernel
(Use Ctrl+C to stop the tcpdump packet capture.)
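As in the node-level capture, you can optionally narrow the capture with a tcpdump filter expression, for example to the flow's HTTP port:
sh-4.4# nsenter -n -t 2491468 -- tcpdump -nn -i any -w /host/var/tmp/my-cluster-node-my-container_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap port 7800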
- The network trace is stored at /host/var/tmp/ in the debug pod.
- Exit from the debug pod by typing the 'exit' command at the debug pod terminal.
- Copy the capture file off the debug pod to your local system using the following command (replace my-cluster-node-my-container_18_09_2023-04_16_51-UTC.pcap with the actual file name generated on your system under /host/var/tmp):
# oc debug node/worker0.test-cp4i-cluster.ibm.com -- bash -c 'cat /host/var/tmp/my-cluster-node-my-container_18_09_2023-04_16_51-UTC.pcap' > /tmp/my-cluster-node-my-container_18_09_2023-04_16_51-UTC.pcap
Temporary namespace openshift-debug-6ck2r is created for debugging node...
Starting pod/worker0test-cp4i-clusteribmcom-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Temporary namespace openshift-debug-6ck2r was removed.
- Now you can open the network trace in a tool such as Wireshark to investigate the issue.
In this example, the message flow is listening for incoming traffic on endpoint /Flow1 on port 7800. To verify that the connection has been established on port 7800, log in to the container terminal and run the 'netstat -an' command, as shown below:
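For example (the addresses shown are from this article's example environment):
$ netstat -an | grep 7800
tcp 0 0 0.0.0.0:7800 0.0.0.0:* LISTEN
tcp 0 0 10.254.12.77:7800 10.254.16.2:53612 ESTABLISHED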