Hi Folks,
In this blog post, I show how to use advanced debugging techniques for OpenShift Container Platform on Power using bpftrace and lsof. It unlocks the steps to debug complicated problems, and you can follow them to troubleshoot your application or your cluster.
The steps show you how to create an image to debug, launch the image as part of a Pod, use bpftrace on the command line, use bpftrace as a script, and use lsof to inspect an application or cluster.
Let me show you how:
1. To create an image
In this image, you need to include the tools to debug your cluster or application, such as:
1. bpftrace is a dynamic tracing tool for Linux that lets you trace kernel and user-space events using BPF. It enables you to write scripts that attach to system calls, function calls, kprobes, tracepoints, and uprobes, so you can see almost everything happening on the system.
2. lsof is a command-line utility for listing open files, including regular files, directories, pipes, sockets, and more. It lets you see which processes are holding a given resource and may be the culprit of a problem.
To create your debug tools image:
1. Define a new image in a file named Dockerfile that matches the kernel level of the Red Hat Enterprise Linux CoreOS (RHCOS) system you are inspecting. For instance, for Red Hat OpenShift Container Platform 4.12, select stream8 as it aligns with RHCOS 4.12; for 4.13, select stream9.
FROM quay.io/centos/centos:stream8
RUN dnf install bpftrace lsof -y
2. Build the image using the following command:
$ podman build -t quay.ocp-power.xyz/powercloud/debug-tools .
3. Log in to your registry, then push the image to it:
$ podman push quay.ocp-power.xyz/powercloud/debug-tools
Once the image is ready, you can use the image in your Pod and take advantage of the tools.
Tip: if you need to push to your internal registry on OpenShift Container Platform, the documentation covers it in Exposing the Registry.
Tip: quay.ocp-power.xyz/powercloud/debug-tools is a placeholder and contains no actual image; update it to match the location where you have deployed your image.
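Before using the image in a Pod, you can sanity-check that the tools are actually present in it (the registry path is the same placeholder as above; substitute your own):

```
$ podman run --rm quay.ocp-power.xyz/powercloud/debug-tools bpftrace --version
$ podman run --rm quay.ocp-power.xyz/powercloud/debug-tools lsof -v
```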
2. To launch the image as part of a Pod
By default, the OpenShift Container Platform locks tools down to a cgroup and namespace, so a Pod has limited visibility into the platform as a whole. For bpftrace and lsof to fully inspect the running platform, the Pod must define elevated privileges, mount the host's /sys, and enable the host namespaces.
kind: Pod
apiVersion: v1
metadata:
  name: debug-tools
  namespace: example-ns # 1
spec:
  nodeSelector:
    kubernetes.io/os: linux
  restartPolicy: Always
  containers:
    - name: diagnostic
      image: quay.ocp-power.xyz/powercloud/debug-tools # 2
      imagePullPolicy: IfNotPresent
      command: [ "sh", "-c", "sleep 1h" ]
      resources:
        requests:
          cpu: 1000m
          memory: 2048Mi
      volumeMounts: # 3
        - name: host-sys
          mountPath: /sys
      terminationMessagePath: /dev/termination-log
      securityContext: # 4
        privileged: true
        seccompProfile:
          type: RuntimeDefault
        capabilities:
          add:
            - CAP_SYS_ADMIN
            - CAP_FOWNER
            - NET_ADMIN
            - SYS_ADMIN
          drop:
            - ALL
        runAsUser: 0
        runAsGroup: 0
        runAsNonRoot: false
        readOnlyRootFilesystem: false
        allowPrivilegeEscalation: true
  volumes: # 5
    - name: host-sys
      hostPath:
        path: /sys
        type: Directory
  nodeName: ip-*-*-*-*.**** # 6
  priorityClassName: system-cluster-critical
  hostPID: true # 7
  hostIPC: true # 8
  hostNetwork: true # 9
Details about the highlighted points in the YAML file:
1. The project (namespace) name you want to use.
2. The image we built for the debug Pod; note that this image is not maintained.
3. The volume mount inside the Pod.
4. The security context that grants superuser privileges.
5. The volume created from the host; the /sys path is required by bpftrace.
6. The node name, here pinning the Pod to a master node instead of an arbitrary worker node.
7. hostPID: true means the Pod is given access to the host's process ID namespace.
8. hostIPC: true means the Pod is given access to the host's interprocess communication (IPC) namespace.
9. hostNetwork: true means the Pod is given access to the host's network namespace.
To deploy your debug tools container:
1. Login to OpenShift with kube-admin level privileges.
$ oc login
2. Check your kube-admin privileges
$ oc --kubeconfig=./openstack-upi/auth/kubeconfig auth can-i create pod -A
yes
3. If you haven’t created your OpenShift project yet, use oc new-project example-ns
4. Create the Pod Definition using the working sample at link
5. Apply the Pod
$ oc apply -f debug-pod.yml
6. Check the status of the Pod
$ oc --kubeconfig=./openstack-upi/auth/kubeconfig get pods
NAME READY STATUS RESTARTS AGE
debug-tools 1/1 Running 0 15s
You have created a working debug tools Pod and deployed it to the OpenShift Container Platform.
Tip: If you didn’t want to target a specific node, you could create a DaemonSet which launches a single Pod on each Linux Host/Node.
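A minimal DaemonSet along those lines could look like the following sketch (it reuses the image and namespace from the Pod definition above; trim or extend the security settings to match your needs):

```yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: debug-tools
  namespace: example-ns
spec:
  selector:
    matchLabels:
      app: debug-tools
  template:
    metadata:
      labels:
        app: debug-tools
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - name: diagnostic
          image: quay.ocp-power.xyz/powercloud/debug-tools
          command: [ "sh", "-c", "sleep infinity" ]
          securityContext:
            privileged: true
          volumeMounts:
            - name: host-sys
              mountPath: /sys
      volumes:
        - name: host-sys
          hostPath:
            path: /sys
            type: Directory
```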
Tip: Some scripts and command line expressions take a lot of memory; you may need to increase your Pod’s memory limits.
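For example, you could add limits next to the requests in the container spec (the values here are illustrative; size them to your workload):

```yaml
resources:
  requests:
    cpu: 1000m
    memory: 2048Mi
  limits:
    cpu: 2000m
    memory: 4096Mi
```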
3. To use bpftrace on the command line
The bpftrace utility takes input on the command line and enables a real-time trace.
1. Connect to your debug-tools Pod
$ oc rsh debug-tools
2. Run your script
# bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
Attaching 1 probe...
runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
grpc_health_pro
grpc_health_pro
3. The trace runs in a loop until you press CTRL+C to exit.
You’ve seen how to run a bpftrace expression on the command line.
Tip: Brendan Gregg has a nice cheat sheet for building your expressions: Link
Tip: The bpftrace repository has more examples for you to base your expression on link.
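As another sketch of a command-line expression (adjust the probe and interval to your needs), the following counts system calls per process for five seconds and prints the totals on exit:

```
# bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @calls[comm] = count(); } interval:s:5 { exit(); }'
```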
4. To use bpftrace scripts
The bpftrace utility takes file input and enables a real-time trace. For instance, you can use the opensnoop.bt script to trace file access.
To use the opensnoop.bt script:
1. Connect to your debug-tools Pod
$ oc rsh debug-tools
2. Copy the script. Note: you can also bake your scripts into your debug tools image.
$ curl -O -L https://raw.githubusercontent.com/iovisor/bpftrace/master/tools/opensnoop.bt
3. Make the script executable
$ chmod +x opensnoop.bt
4. Run the script and filter the output to focus on the entries you need:
# ./opensnoop.bt | grep -v kube
2098864 grpc_health_pro 3 0 /usr/bin/grpc_health_probe
2098832 conmon 9 0 /etc/localtime
2098871 runc:[2:INIT] 5 0 /proc/sys/kernel/cap_last_cap
3186541 crio 129 0 /tmp/crio-log-8cee46f55e41ce6990b121a4756bb6dfd4b16aa4249f69e50
2098871 runc:[2:INIT] 5 0 /proc/self/attr/keycreate
2098871 runc:[2:INIT] 5 0 /proc/thread-self/attr/exec
2098871 runc:[2:INIT] 5 0 /proc/self/fd
2098871 runc:[2:INIT] 5 0 /proc/self/status
2098871 runc:[2:INIT] 5 0 /etc/passwd
2098871 runc:[2:INIT] 9 0 /etc/group
2098871 runc:[2:INIT] 5 0 /etc/group
2098871 runc:[2:INIT] 5 0 /proc/self/setgroups
2098848 runc 3 0 /tmp/.pidfile3346035309
As you can see, you get very detailed insight into what’s happening on your node and in your cluster. You can filter with grep -v to exclude data and grep to select the data you need.
Tip: You may need to install the kernel headers (linux-headers) in your image. With them, I was able to see FIPS calls on my host/node, which confirmed my code was calling the FIPS checks correctly.
$ bpftrace -I /usr/src/kernels/4.18.0-448.el8.ppc64le/include -e 'kprobe:vfs_open { printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name)); }' | grep fips
open path: fips
open path: fips
open path: fips_enabled
open path: fips_enabled
open path: fips_enabled
Note, the OpenShift Container Platform on IBM Power Systems supports FIPS mode configuration.
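In the same spirit as opensnoop.bt, you can write your own script file. The following is a hypothetical example (not part of the bpftrace tools collection) that counts new program executions per command name until you press CTRL+C:

```
#!/usr/bin/env bpftrace
// execcount.bt: count execve() calls per command name
tracepoint:syscalls:sys_enter_execve
{
  @execs[comm] = count();
}
```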
5. To use lsof
lsof adds an extra tool you can use to debug an application, Node, or cluster.
- Find the process numbers of the application you are debugging (for instance, etcd):
$ ps -ef | grep 'etcd '
root 36437 36425 6 07:14 ? 00:02:17 etcd --logger=zap --log-level=info --experimental-initial-corrupt-check=true --initial-advertise-peer-urls=https://HIDDEN:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-HIDDEN.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-ip-HIDDEN.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-ip-HIDDEN.crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-ip-HIDDEN.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://HIDDEN:2379 --listen-client-urls=https://HIDDEN:2379,unixs://HIDDEN:0 --listen-peer-urls=https:// HIDDEN:2380 --metrics=extensive --listen-metrics-urls=https://HIDDEN:9978
root 77208 74533 0 07:51 pts/0 00:00:00 grep etcd
- Run the lsof command with a process ID to list its open files and libraries:
sh-4.4# lsof -p 36425
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
conmon 36425 root cwd DIR 0,24 180 519623 /run/containers/storage/overlay-containers/700120d21f123f5136717bf947080c79707d23d600824547c67f583c9d7d2d18/userdata
conmon 36425 root rtd DIR 259,4 253 157286970 /
conmon 36425 root txt REG 259,4 160056 2941029 /usr/bin/conmon
conmon 36425 root mem REG 59,4 2835103 /usr/lib64/libpcre2-8.so.0.7.1 (path dev=0,853, inode=187024850)
conmon 36425 root mem REG 59,4 2645049 /usr/lib64/libffi.so.6.0.2 (path dev=0,853, inode=187024700)
conmon 36425 root mem REG 259,4 2595716 /usr/lib64/libgpg-error.so.0.24.2 (path dev=0,853, inode=187024726)
As you can see, lsof shows each open file and network connection for the process.
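lsof can also narrow the view to network activity. For example, to see which processes are bound to the etcd client port (output will vary on your node):

```
# lsof -i :2379 -P -n
```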
Summary
In this post, you’ve learned how to create a debug image, launch it as part of a Pod, and use bpftrace and lsof to inspect an application or cluster on OpenShift Container Platform on IBM Power Systems.
Thanks for reading! I hope you found this helpful :)
Acknowledgements
Thanks to Paul Bastide for his support.