WebSphere Application Server & Liberty


Lessons from the field #3: OpenShift Live Container Debugging

By Kevin Grigorenko posted Tue March 02, 2021 03:10 PM

There are many ways to diagnose issues in containers running on the OpenShift Container Platform, such as reviewing container logs and remoting into a running container with oc rsh or oc exec.

There are also two advanced techniques:

  • Diagnostic sidecar containers: modify the Deployment configuration to add a diagnostic container to the pod that shares the same cgroup (i.e. resources). When the new deployment is rolled out, you can remote into the diagnostic container in that pod to gather diagnostics, since it can see the processes and other resources of the other containers in the pod.
  • kubectl/oc debug: create a copy of the pod, with or without modifications, for further investigation. This is similar in some ways to the diagnostic sidecar container, although a bit easier to use.
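As a sketch of the second technique, creating an interactive debug copy of a pod might look like the following (the pod and project names are placeholders matching the example later in this post):

```shell
# Create a copy of the pod for debugging and open a shell in it
# ("myimage-7d57d6599f-5qq7z" and "myproject" are placeholder names).
oc debug pod/myimage-7d57d6599f-5qq7z -n myproject
```

oc debug leaves the original pod running and deletes the copy when the shell exits.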
But what if you want to run tools or diagnostics on a live pod without restarting it, and all of the above approaches are insufficient? This becomes more pressing in the container world, where images are often extremely minimal due to general security and file-size practices, so containers often lack even basic tools like top (top -H is a great way to investigate CPU usage by thread, for example):

% oc rsh myimage-7d57d6599f-5qq7z -n myproject
sh-4.4$ top -H
sh: top: command not found

There is work underway on Ephemeral Containers, which are like diagnostic sidecar containers except that the pod/deployment doesn't have to be restarted; however, as of this writing, this feature is in alpha, and most customers gate (i.e. disable) alpha features. (Note that the ephemeral containers example uses the debug command with the --target flag, which is different from making a copy of the pod without the --target flag.)
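For reference, if the alpha feature were enabled on the cluster, attaching an ephemeral debug container might look like the following sketch (the diagnostic image and the "myimage" container name are assumptions):

```shell
# Attach an ephemeral container to the running pod without restarting it;
# --target shares the process namespace of the named container so its PIDs
# are visible (the image and "myimage" container name are placeholders).
kubectl debug -it myimage-7d57d6599f-5qq7z -n myproject \
  --image=registry.example.com/diagnostics/tools --target=myimage
```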

If you have the authority, one alternative is to start a debug container on the node that's running the pod. Unlike virtual machines, Linux containers work such that the processes in a container are simply processes on the node (with namespaces and cgroups isolating containers from each other). In other words, PID 1 in Container X is simply PID N on Node Y. For example, the container above doesn't even have the ps command, but we can see that it's running a Java process as PID 1:

sh-4.4$ ps
sh: ps: command not found
sh-4.4$ readlink -f /proc/1/exe
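
This /proc/&lt;pid&gt;/exe trick works on any Linux system, not just inside a container. A quick local demonstration, using a throwaway background sleep as the stand-in process:

```shell
# Start a background process and inspect it through /proc, the same way
# we inspected PID 1 inside the container above.
sleep 60 &
pid=$!
readlink -f /proc/$pid/exe   # resolves to the binary actually being executed
kill $pid
```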

Next, find which node the pod is running on:

$ oc get pod myimage-7d57d6599f-5qq7z -n myproject -o jsonpath='{.spec.nodeName}'

Remote into that node, or go to the OpenShift web console under Node Details > Terminal, and run ps:

$ oc debug node/ -t
Creating debug namespace/openshift-debug-node-zpbbm ...
Starting pod/10169235132-debug ...
To use host binaries, run `chroot /host`
Pod IP:
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.2# ps -elf | grep java
4 S 1000660+ 73138 73125 0 80 0 - 4270784 futex_ Feb15 ? 00:17:07 /opt/java/openjdk/jre/bin/java [...]
4 S 1000630+ 77182 77171 1 80 0 - 950102 futex_ 2020 ? 1-12:05:33 /opt/java/openjdk/jre/bin/java [...]
0 S root 81704 80212 0 80 0 - 2274 pipe_w 15:25 ? 00:00:00 grep java
4 S 1001 84890 84878 0 80 0 - 1003889 futex_ 2020 ? 01:16:51 java -classpath [...]
4 S 1001 100958 100636 5 80 0 - 3230065 futex_ 2020 ? 6-07:47:20 /opt/ibm/java/jre/bin/java [...]

We see our first problem: there are multiple "/opt/java/openjdk/jre/bin/java" processes running on that node, and we don't know which node PID corresponds to our container (all we know is that it's PID 1 from inside the container). What we can do is use the runc command to list the containers by PID (we search only for certain PIDs to make this a bit more efficient):

sh-4.2# for pid in $(pgrep -f java); do runc list | grep $pid; done | awk '{print $1}'
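
runc list prints one container per line, with the container ID in the first column and its PID in the second. The matching logic can be sketched locally against canned runc list-style output (the IDs, PIDs, and paths below are made up):

```shell
# Canned output imitating `runc list` columns: ID  PID  STATUS  BUNDLE ...
sample='abc123 73138 running /run/containers/storage/abc123
def456 77182 running /run/containers/storage/def456'

# Match the PID column exactly, then print the container ID; comparing the
# second field avoids false positives that a bare grep for the PID could
# produce (e.g. the PID appearing inside a container ID or bundle path).
echo "$sample" | awk -v pid=73138 '$2 == pid { print $1 }'
```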

For each of these container IDs, we can run runc's state command until we find the right pod name (some output removed for clarity):

# runc state 76d7cbc64b8411fc04390c940fe14c797d4a996a00a56d1014312a7aa7b6d260
"pid": 73138,
"rootfs": "/var/data/criorootstorage/overlay/a83cefbb7952694e724af131870657c6b13043f9fc847b7c4757457224a308da/merged",
"io.kubernetes.pod.name": "myimage-7d57d6599f-5qq7z",
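
If you're scripting this step, the pod name can be extracted from the runc state JSON. A minimal sketch against a canned fragment of that output (jq is cleaner if it's available on the node):

```shell
# Canned fragment of `runc state` JSON output
state='{"pid": 73138, "annotations": {"io.kubernetes.pod.name": "myimage-7d57d6599f-5qq7z"}}'

# Pull out the pod name annotation with sed
echo "$state" | sed -n 's/.*"io.kubernetes.pod.name": "\([^"]*\)".*/\1/p'
```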

Success! Now we know that the PID of our target process in a particular container on the node itself is 73138 and we can run normal diagnostics on it:

sh-4.2# top -H -p 73138
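
top -H lists one native thread ID (TID) per row. To map a hot thread back to a Java thread dump, note that thread dumps generally report the native thread ID in hexadecimal (e.g. HotSpot's nid=0x... or the native thread ID in an IBM javacore), so convert the decimal TID shown by top:

```shell
# Convert a decimal TID from `top -H` to the hexadecimal form shown in
# Java thread dumps (73250 is just an example value).
printf '0x%x\n' 73250   # → 0x11e22
```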

The oc debug command allows you to specify a different image to start the debug container with. If the node doesn't have the built-in tool that you need, build an image with that tool, push it to your registry, and start oc debug with that image. For example:

$ oc debug node/ -t --image=image-registry.openshift-image-registry.svc:5000/diagnostics/mydiagnosticimage -n diagnostics
sh-5.0# gcore 73138

One final note: you can even access the container's filesystem through the rootfs value from runc state. For example, to attach with gdb you need the process binary, but the path you see in the ps output (/opt/java/openjdk/jre/bin/java) is not where the actual binary lives on the node; it's inside the container's filesystem:

sh-4.2# ls -l /var/data/criorootstorage/overlay/a83cefbb7952694e724af131870657c6b13043f9fc847b7c4757457224a308da/merged/opt/java/openjdk/jre/bin/java
-rwxr-xr-x. 1 root root 8640 Jan 20 02:57 /var/data/criorootstorage/overlay/a83cefbb7952694e724af131870657c6b13043f9fc847b7c4757457224a308da/merged/opt/java/openjdk/jre/bin/java
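
Attaching gdb from the node might then look like the following sketch (gdb must be available on the node or in your debug image; the overlay path is the rootfs value from runc state above):

```shell
# Attach gdb to the node PID, pointing it at the java binary inside the
# container's root filesystem; print all thread backtraces, then exit
# (-batch detaches from the process automatically).
gdb /var/data/criorootstorage/overlay/a83cefbb7952694e724af131870657c6b13043f9fc847b7c4757457224a308da/merged/opt/java/openjdk/jre/bin/java \
  -p 73138 -batch -ex 'thread apply all bt'
```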

With this approach, you can perform all of the normal debugging that you're used to from the node once you find the PID from runc. If a node doesn't come pre-installed with a tool that you need, you can build a custom diagnostic image with the tools and start it on the node. You can even install runc in your diagnostic container and run it as follows so that you don't have to chroot into the node to find the PID:

runc --root /host/run/runc list

See our team's previous post in the Lessons from the field series: Database timing on WAS traditional
