Containers, Kubernetes, OpenShift on Power


Advanced debugging techniques for OpenShift Container Platform on Power

By Gaurav Bankar posted Tue April 04, 2023 09:10 AM

  

Hi Folks,

In this blog post, I show how to use advanced debugging techniques for OpenShift Container Platform on Power using bpftrace and lsof. It walks through the steps to debug complicated problems, and you can follow the same steps to troubleshoot your application or your cluster.

The steps show you how to create an image to debug, launch the image as part of a Pod, use bpftrace on the command line, use bpftrace as a script, and use lsof to inspect an application or cluster.

Let me show you how:

1.    To create an image

In this image, you need to include the tools to debug your cluster or application, such as:

1.     bpftrace is a dynamic tracing tool for Linux that traces kernel and user-space events using eBPF. It lets you write scripts that trace system events such as function calls, system calls, kprobes, tracepoints, and uprobes, so you can see almost everything happening on the system.

2.     lsof is a command-line utility for listing open files, including regular files, directories, pipes, sockets, and more. It lets you see which processes hold which files open, and therefore which process is the culprit of a problem.


To create your debug tools image:

1.     Define a new image in a file named Dockerfile, matching the kernel level of the Red Hat CoreOS (RHCOS) system you are inspecting. For instance, for Red Hat OpenShift Container Platform 4.12, select stream8, as it aligns with RHCOS 4.12. For 4.13, select stream9.

FROM quay.io/centos/centos:stream8
RUN dnf install bpftrace lsof -y

2.     Build the image using the following command:

$ podman build -t quay.ocp-power.xyz/powercloud/debug-tools .

3.     After you log in to your registry, push the image to it.

$ podman push quay.ocp-power.xyz/powercloud/debug-tools

Once the image is ready, you can use the image in your Pod and take advantage of the tools.
Tip: If you need to push to your internal registry on OpenShift Container Platform, the documentation covers it in Exposing the Registry.

Tip: quay.ocp-power.xyz/powercloud/debug-tools is a placeholder and contains no actual image; update it to match the location where you have deployed your image.
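If you are on OpenShift Container Platform 4.13, or you want kernel headers available for kprobe-based tracing, a variant Dockerfile might look like the sketch below (the exact package names are an assumption based on the CentOS Stream repositories):

```dockerfile
# CentOS Stream 9 aligns with the RHCOS kernel in OpenShift 4.13
FROM quay.io/centos/centos:stream9
# kernel-headers helps bpftrace resolve struct layouts for kprobes
RUN dnf install -y bpftrace lsof kernel-headers
```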


2.    To launch the image as part of a Pod

By default, OpenShift Container Platform confines a Pod's tools to the Pod's own cgroup and namespaces, so the Pod has limited visibility into the platform. For bpftrace and lsof to fully inspect the running platform, the Pod must define elevated privileges, mount the host's /sys, and request the host namespaces.

kind: Pod
apiVersion: v1
metadata:
  name: debug-tools
  namespace: example-ns # 1
spec:
  nodeSelector:
    kubernetes.io/os: linux
  restartPolicy: Always
  containers:
    - name: diagnostic
      image: quay.ocp-power.xyz/powercloud/debug-tools # 2
      imagePullPolicy: IfNotPresent
      command: [ "sh", "-c", "sleep 1h" ]
      resources:
        requests:
          cpu: 1000m
          memory: 2048Mi
      volumeMounts: # 3
      - name: host-sys
        mountPath: /sys
      terminationMessagePath: /dev/termination-log
      securityContext: # 4
        privileged: true
        seccompProfile:
          type: RuntimeDefault
        capabilities:
          add:
            - SYS_ADMIN
            - FOWNER
            - NET_ADMIN
          drop:
            - ALL
        runAsUser: 0
        runAsGroup: 0
        runAsNonRoot: false
        readOnlyRootFilesystem: false
        allowPrivilegeEscalation: true
  volumes: # 5
  - name: host-sys
    hostPath:
      path: /sys
      type: Directory
  nodeName: ip-*-*-*-*.**** # 6
  priorityClassName: system-cluster-critical
  hostPID: true # 7
  hostIPC: true # 8
  hostNetwork: true # 9


Details about the highlighted points in the YAML file:

  1. Use the project (namespace) name that you want to use.
  2. The image we built for the debug Pod; note that this image is not maintained.
  3. The volume mount inside the Pod.
  4. The security context that grants superuser privileges.
  5. The volume created from the host path /sys, which bpftrace requires.
  6. The node name pins the Pod to a specific node; here we target a master node instead of a worker node.
  7. hostPID: true gives the Pod access to the host's process ID namespace.
  8. hostIPC: true gives the Pod access to the host's interprocess communication (IPC) namespace.
  9. hostNetwork: true gives the Pod access to the host's network namespace.



    To deploy your debug tools container:

    1.     Log in to OpenShift with cluster-admin (kubeadmin) privileges.

    $ oc login

    2.     Check your cluster-admin privileges:

    $ oc --kubeconfig=./openstack-upi/auth/kubeconfig auth can-i create pod -A

    yes

    3.     If you haven’t created your OpenShift project yet, create it with oc new-project example-ns

    4.     Create the Pod definition file (for example, debug-pod.yml) using the working sample above

    5.     Apply the Pod

    $ oc apply -f debug-pod.yml

    6.     Check the status of the Pod

    $ oc --kubeconfig=./openstack-upi/auth/kubeconfig get pods
    NAME        READY   STATUS      RESTARTS  AGE
    debug-tools 1/1     Running     0         15s

    You have created a working debug tools Pod and deployed it to the OpenShift Container Platform.

    Tip: If you didn’t want to target a specific node, you could create a DaemonSet which launches a single Pod on each Linux Host/Node.

    Tip: Some scripts and command line expressions take a lot of memory; you may need to increase your Pod’s memory limits.
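    The DaemonSet tip could be sketched roughly as follows (a minimal, untested sketch; the labels are assumptions, and the privileged security context carries over from the Pod definition):

```yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: debug-tools
  namespace: example-ns
spec:
  selector:
    matchLabels:
      app: debug-tools
  template:
    metadata:
      labels:
        app: debug-tools
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: diagnostic
        image: quay.ocp-power.xyz/powercloud/debug-tools
        command: [ "sh", "-c", "sleep 1h" ]
        securityContext:
          privileged: true
        volumeMounts:
        - name: host-sys
          mountPath: /sys
      volumes:
      - name: host-sys
        hostPath:
          path: /sys
          type: Directory
```

    Note that privileged Pods generally require the service account to be allowed the privileged security context constraint (for example, via oc adm policy add-scc-to-user privileged).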



    3.    To use bpftrace on the command line

    The bpftrace utility takes input on the command line and enables a real-time trace. 

    1.     Connect to your debug-tools Pod

    $ oc rsh debug-tools

    2.     Run your script

    # bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
    Attaching 1 probe...
    runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    runc /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    runc:[2:INIT] /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
    grpc_health_pro
    grpc_health_pro

    3.     This runs in a loop until you press CTRL+C to exit the trace.

    You’ve seen how to run a bpftrace expression on the command line.

    Tip: Brendan Gregg has a nice cheat sheet for building your expressions: Link

    Tip: The bpftrace repository has more examples for you to base your expression on link.
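    As a bridge between one-liners and full scripts, here is a minimal sketch of a standalone bpftrace program that counts openat() system calls per process name (save it as, say, countopens.bt and run it with bpftrace countopens.bt; the file name is illustrative):

```
// count openat() syscalls per process name;
// bpftrace prints the @opens map automatically when you press CTRL+C
tracepoint:syscalls:sys_enter_openat
{
	@opens[comm] = count();
}
```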

4.    To use bpftrace scripts

The bpftrace utility takes file input and enables a real-time trace. For instance, you can use the opensnoop.bt script to trace file access.

To use the opensnoop.bt script:

1.     Connect to your debug-tools Pod

$ oc rsh debug-tools

2.     Copy the script. Note that you can bake your scripts into your debug tools image.

$ curl -O -L https://raw.githubusercontent.com/iovisor/bpftrace/master/tools/opensnoop.bt 

3.     Make the script executable

$ chmod +x opensnoop.bt 

4.     Run the script, filtering out kube-related entries to focus on the rest:

# ./opensnoop.bt | grep -v kube
2098864 grpc_health_pro     3   0 /usr/bin/grpc_health_probe
2098832 conmon              9   0 /etc/localtime
2098871 runc:[2:INIT]       5   0 /proc/sys/kernel/cap_last_cap
3186541 crio              129   0 /tmp/crio-log-8cee46f55e41ce6990b121a4756bb6dfd4b16aa4249f69e50
2098871 runc:[2:INIT]       5   0 /proc/self/attr/keycreate
2098871 runc:[2:INIT]       5   0 /proc/thread-self/attr/exec
2098871 runc:[2:INIT]       5   0 /proc/self/fd
2098871 runc:[2:INIT]       5   0 /proc/self/status
2098871 runc:[2:INIT]       5   0 /etc/passwd
2098871 runc:[2:INIT]       9   0 /etc/group
2098871 runc:[2:INIT]       5   0 /etc/group
2098871 runc:[2:INIT]       5   0 /proc/self/setgroups
2098848 runc                3   0 /tmp/.pidfile3346035309



You can see that you get very detailed information about what’s happening on your node and in your cluster. Use grep -v to exclude data and plain grep to select the data you need.
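The exclude/include filtering works on any text stream. Here is a small standalone sketch with made-up sample lines (the trace content below is hypothetical, for illustration only):

```shell
# hypothetical opensnoop-style output, used only to demonstrate filtering
cat > /tmp/trace-sample.txt <<'EOF'
2098864 grpc_health_pro 3 0 /usr/bin/grpc_health_probe
3186541 kubelet 129 0 /var/lib/kubelet/pods
2098848 runc 3 0 /tmp/.pidfile
EOF

# grep -v excludes matching lines; plain grep selects the data you need
grep -v kube /tmp/trace-sample.txt
# → prints only the grpc_health_pro and runc lines
```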

Tip: You may need to install the linux-headers package in your image. With the expression below, I was able to see FIPS-related calls on my host/Node, which confirmed my code was calling FIPS checks correctly.

$ bpftrace -I /usr/src/kernels/4.18.0-448.el8.ppc64le/include -e 'kprobe:vfs_open { printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name)); }' | grep fips
open path: fips
open path: fips
open path: fips_enabled
open path: fips_enabled
open path: fips_enabled

 

Note that OpenShift Container Platform on IBM Power Systems supports FIPS mode configuration.
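You can also confirm FIPS mode on a Node directly from the kernel flag (a minimal sketch; /proc/sys/crypto/fips_enabled exists on kernels built with FIPS support, and the command falls back gracefully elsewhere):

```shell
# 1 = FIPS mode enabled, 0 = disabled; "unknown" if the flag is absent
fips="$(cat /proc/sys/crypto/fips_enabled 2>/dev/null || echo unknown)"
echo "FIPS mode: ${fips}"
```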



5.    To use lsof

lsof adds an extra tool you can use to debug an application, Node, or cluster.
  1. Find the process numbers of the application you are debugging (for instance, etcd):
$ ps -ef | grep 'etcd '
root       36437   36425  6 07:14 ?        00:02:17 etcd --logger=zap --log-level=info --experimental-initial-corrupt-check=true --initial-advertise-peer-urls=https://HIDDEN:2380 --cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-HIDDEN.crt --key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-serving-ip-HIDDEN.key --trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-serving-ca/ca-bundle.crt --client-cert-auth=true --peer-cert-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-ip-HIDDEN.crt --peer-key-file=/etc/kubernetes/static-pod-certs/secrets/etcd-all-certs/etcd-peer-ip-HIDDEN.key --peer-trusted-ca-file=/etc/kubernetes/static-pod-certs/configmaps/etcd-peer-client-ca/ca-bundle.crt --peer-client-cert-auth=true --advertise-client-urls=https://HIDDEN:2379 --listen-client-urls=https://HIDDEN:2379,unixs://HIDDEN:0 --listen-peer-urls=https:// HIDDEN:2380 --metrics=extensive --listen-metrics-urls=https://HIDDEN:9978
root       77208   74533  0 07:51 pts/0    00:00:00 grep etcd
  2. Run the lsof command with the process ID to list its open files and libraries:
sh-4.4# lsof +p 36425
COMMAND   PID USER   FD      TYPE  DEVICE SIZE/OFF      NODE NAME
conmon  36425 root  cwd       DIR  0,24      180    519623 /run/containers/storage/overlay-containers/700120d21f123f5136717bf947080c79707d23d600824547c67f583c9d7d2d18/userdata
conmon  36425 root  rtd       DIR  259,4      253 157286970 /
conmon  36425 root  txt       REG  259,4   160056   2941029 /usr/bin/conmon
conmon  36425 root  mem       REG  59,4            2835103 /usr/lib64/libpcre2-8.so.0.7.1 (path dev=0,853, inode=187024850)
conmon  36425 root  mem       REG  59,4            2645049 /usr/lib64/libffi.so.6.0.2 (path dev=0,853, inode=187024700)
conmon  36425 root  mem       REG  259,4            2595716 /usr/lib64/libgpg-error.so.0.24.2 (path dev=0,853, inode=187024726)

As you can see, lsof shows each open file and network connection for the process.
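lsof gathers most of this information from /proc, so even in an image without lsof you can inspect a process’s open file descriptors directly (a minimal sketch using the current shell’s own PID):

```shell
# each symlink under /proc/<pid>/fd is one open file descriptor;
# $$ expands to the current shell's PID
ls -l /proc/$$/fd
```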

Summary

In this post, you’ve learned how to create a debug image, launch it as part of a Pod, and use bpftrace and lsof to inspect an application or cluster on OpenShift Container Platform on IBM Power Systems.

Thanks for reading! I hope you found this helpful :)

Acknowledgements

Thanks to Paul Bastide for his support.
