In our previous blog post, we discussed the value of running the Linux perf native CPU sampling profiler to investigate Java CPU usage in production. However, perf generally requires root access, and OpenShift application containers usually don't run with such privileged access. This post describes how to run perf in OpenShift: we'll create a diagnostic container with perf installed, run that container with root access on the worker node, figure out the target container's process ID using runc, and finally run perf as normal.
Preparing Java
First, it's best to restart the JVM with certain parameters to improve perf call stacks. With containers, this means adding Java arguments to your Dockerfile and rebuilding your deployment in OpenShift:
- IBM Java/Semeru/OpenJ9 offer the command line argument -Xjit:perfTool, which writes /tmp/perf-$PID.map, used by perf to resolve JIT-compiled Java method names. If not all symbols are resolved, try adding -Xlp:codecache:pagesize=4k. Only the last -Xjit option is processed, so if there is additional JIT tuning, combine the perfTool option with that tuning; for example, -Xjit:perfTool,exclude={com/example/generated/*}.
- On HotSpot Java, use -XX:+PreserveFramePointer along with something like libperf-jvmti.so or perf-map-agent to resolve JIT-compiled method names (see the sketch after this list).
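As a rough, hedged sketch of the HotSpot workflow (the agent path is an assumption, since libperf-jvmti.so is built as part of perf and its location varies by distribution, and MyApp and <PID> are placeholders), the jitdump approach looks roughly like this:
java -XX:+PreserveFramePointer -agentpath:/usr/lib64/libperf-jvmti.so MyApp
perf record -k 1 -F 99 -g -p <PID> -- sleep 15
perf inject --jit -i perf.data -o perf.data.jit
perf report -i perf.data.jit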
Here's an example Dockerfile of a Java program that burns one CPU, running on Semeru Runtime Open Edition with -Xjit:perfTool:
FROM ibm-semeru-runtimes:open-17-jdk
RUN printf 'public class BurnCPU { public static void main(String... args) { System.out.println("Burning 1 CPU..."); while (true) {} } }' > BurnCPU.java && javac BurnCPU.java
CMD ["java", "-Xjit:perfTool", "BurnCPU"]
This is published on DockerHub so you may create and run a deployment directly for testing:
$ oc create deployment burncpu --image=docker.io/kgibm/burncpu
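To confirm the test deployment started, you can check its logs (a minimal sanity check; oc logs against the deployment picks one of its pods):
$ oc logs deployment/burncpu
Burning 1 CPU...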
Creating a perf container
We'll need a container to run on the target worker node that has perf as well as some other useful utilities. Here's an example Dockerfile based on Fedora (note that this distribution doesn't need to match the distribution of your target container):
FROM fedora
RUN dnf install -y perf runc procps-ng binutils less lsof psmisc sysstat vim zip util-linux && \
dnf clean all
Then build and push this image to your OpenShift registry. Alternatively, you may use a public image from DockerHub that was built from the above: docker.io/kgibm/perfcontainer
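For example, a build-and-push sketch using podman (the registry route hostname and the myproject namespace below are placeholders for your environment):
$ podman build -t perfcontainer .
$ podman login -u $(oc whoami) -p $(oc whoami -t) default-route-openshift-image-registry.apps.example.com
$ podman tag perfcontainer default-route-openshift-image-registry.apps.example.com/myproject/perfcontainer:latest
$ podman push default-route-openshift-image-registry.apps.example.com/myproject/perfcontainer:latest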
Running the perf container
Next, use kubectl, oc, or the OpenShift web console under Workloads > Pods to find the worker node where your Java pod is running.
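For example, with the burncpu test deployment (assuming the default app=burncpu label that oc create deployment applies), the NODE column shows where the pod is scheduled:
$ oc get pods -l app=burncpu -o wide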
Once you've found the worker node, start a debug container on that worker node and point to the image created in the previous section. For example:
$ oc debug node/10.169.235.132 -t --image=docker.io/kgibm/perfcontainer:latest
Creating debug namespace/openshift-debug-node-5bdmv ...
Starting pod/10.169.235.132-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.169.235.132
If you don't see a command prompt, try pressing enter.
sh-5.1#
Note that OpenShift tends to clean up debug pods very aggressively (an idle timeout of about one minute) if no active command is running, so you can run a command like top and then press Ctrl+C when you're ready to run more commands.
Run perf top to make sure it's working and press Ctrl+C once you've confirmed.
If it's not working, depending on the error message, you may need to enable perf on the node with a command such as:
sysctl -w kernel.perf_event_paranoid=-1
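You can check the current value first from the same debug shell (the setting is kernel-wide, so it's visible from the debug pod):
sh-5.1# sysctl kernel.perf_event_paranoid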
Finding the target process
Next, we'll want to find the target process ID. We can list all Java processes, use runc to dump each container's state, and then find the right process ID. For example:
sh-5.1# for containerid in $(for pid in $(pgrep -f java); do runc --root /host/run/runc list | grep $pid; done | awk '{print $1}'); do runc --root /host/run/runc state $containerid | grep -e '"id"' -e '"pid"' -e '"rootfs"' -e '"io.kubernetes.pod.name"'; done
"id": "47f4957dda2502616f717ddae284a467847c4df679fcc308e4d78f3f9624f473",
"pid": 38956,
"rootfs": "/var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/merged",
"io.kubernetes.pod.name": "burncpu-8dbb7b7d5-rm8l8",
In the above example, there's a single Java process on the worker node: its pod name is burncpu-8dbb7b7d5-rm8l8, its worker node PID is 38956, and its ephemeral filesystem is at /var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/merged. From the point of view of the debug container, host paths are prefixed with /host/, so the ephemeral filesystem is actually at /host/var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/merged.
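As an alternative lookup, if crictl is available on the worker node (it's normally present on OpenShift nodes running CRI-O, though treat that as an assumption about your node image), the PID can be found from the debug pod as well; burncpu is the container name from this example and <CONTAINERID> is a placeholder:
sh-5.1# chroot /host crictl ps --name burncpu
sh-5.1# chroot /host crictl inspect <CONTAINERID> | grep -e '"pid"' -e '"io.kubernetes.pod.name"'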
For IBM Java/Semeru/OpenJ9 with -Xjit:perfTool, we'll want to find the /tmp/perf-$PID.map file. This is generated in the /tmp folder of the running container rather than on the worker node. To find it, take the ephemeral filesystem path above, go up one directory, and then go down into the diff directory:
sh-5.1# ls -l /host/var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/diff
total 0
drwxr-xr-x. 2 root root 18 Dec 28 16:04 etc
drwxrwxrwt. 3 root root 53 Dec 28 16:04 tmp
There's our container's tmp directory, and we can list its contents to show the perf map file:
sh-5.1# ls -l /host/var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/diff/tmp/
total 32
-rw-r-----. 1 root root 30591 Dec 28 16:04 perf-1.map
However, the PID in the perf map file name is PID 1 from inside the container. For our perf command to map things correctly, we need to take the PID we found from runc above (in this example, 38956) and make a symbolic link to this file inside our debug container's /tmp directory with that PID:
sh-5.1# ln -s /host/var/lib/containers/storage/overlay/645b2b122388c89ea956197ec0795cb31381c76948417c0dc32cca06bf17aaac/diff/tmp/perf-1.map /tmp/perf-38956.map
Now we can run perf as we normally would. For this usage, we'll generally want to focus on just our target PID. For example:
perf record --call-graph dwarf,65528 -F 99 -g -p 38956 -- sleep 15
Finally, we can analyze the perf.data file as we normally would. The following basic report quickly shows the CPU usage of our process, and the top stack frame has been successfully resolved to BurnCPU.main:
# perf report -n --show-cpu-utilization | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles'
# Event count (approx.): 45922815255
#
# Children Self sys usr Samples Command Shared Object Symbol
# ........ ........ ........ ........ ............ .............. ................. ..........................................
#
98.54% 98.54% 0.00% 98.54% 1470 main [JIT] tid 38956 [.] BurnCPU.main([Ljava/lang/String;)V_hot
Be careful about the debug container idle timeout mentioned before: once OpenShift deletes the pod, your perf.data file will be gone. The simplest approach is to run perf archive, grab the /tmp/perf-${PID}.map file, copy these files to the worker node into something like /host/tmp, and then download them from the worker node.
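A hedged sketch of that sequence from inside the debug pod, using the PID from this example and the default perf.data output name:
sh-5.1# perf archive perf.data
sh-5.1# cp perf.data perf.data.tar.bz2 /tmp/perf-38956.map /host/tmp/
The files can then be retrieved while a debug pod still exists, for example with oc cp against the debug pod (reading under /host/tmp), or with whatever file transfer mechanism your environment allows.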
This exercise is complete, and you may exit the debug shell, which deletes the debug pod.
Clean up
If you used the burncpu container for testing, don't forget to delete the deployment:
$ oc delete deployment burncpu
Conclusion
In summary, this post described how to run the Linux perf native CPU sampling profiler on Java workloads in OpenShift in production. We create a diagnostic container with perf installed and then run it as root on the worker node that is running the target Java process. We find the worker node PID of the target Java process using runc. Then, we create a symbolic link to the perf map file in the diagnostic container's /tmp directory. Finally, we run perf as we normally would.