WebSphere Application Server & Liberty

Lessons from the field #33: Manually gathering core dumps on Kubernetes/OpenShift

By Kevin Grigorenko posted Tue September 19, 2023 12:06 PM

  

In a previous post, we discussed the general value of getting operating system core dumps on Linux to investigate Java crashes and for other diagnostics. In a subsequent post, we discussed general best practices for gathering Linux core dumps by using a piped kernel core_pattern.

In this post, we will discuss a variation of gathering operating system core dumps where a container in Kubernetes or OpenShift is crashing immediately on startup. This means the container and/or pod will be automatically restarted, so anything written only to the container's own file system is typically lost. If you follow the best practices from the second post mentioned above, the core dump will go to the worker node, and there are various techniques to download a dump from a worker node. If you use systemd-coredump, for example, then depending on the size of the core dump, you may need to change system settings (see an example for OpenShift).

However, what if such changes cannot be made to live production systems, or you are hitting some issue with the piped core dump processing program, and you just want to write the core dump somewhere? In our case, Ubuntu's apport was not producing the dump and we didn't have the time to investigate. We hit this situation with a customer and I'm sharing our simple workaround:

First, create writecore.sh somewhere on the worker node. In general, well-known executable paths are recommended in case of SELinux restrictions. The following example uses /usr/local/bin, but change this as needed. Also change the destination directory of /tmp/ to wherever you want to write the cores and the log. Note that some worker nodes have limited disk space in /tmp/, so ensure the target directory has sufficient disk space.

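cat > /usr/local/bin/writecore.sh <<"EOF"
#!/bin/sh
# Run by the kernel through core_pattern: the arguments are %p %P %t and the core image arrives on stdin.
/usr/bin/echo "[$(/usr/bin/date)] Asked to create core for ${1}.${2}.${3}" >>/tmp/writecore.log
# Stream the core image from stdin to the destination file; errors go to the log.
/usr/bin/cat - > /tmp/core.${1}.${2}.${3}.dmp 2>>/tmp/writecore.log
/usr/bin/echo "[$(/usr/bin/date)] Finished writing core for ${1}.${2}.${3}" >>/tmp/writecore.log
EOF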

Next, make the script executable:

chmod +x /usr/local/bin/writecore.sh
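
Before pointing the kernel at the script, it may be worth exercising the same logic by hand. The following is only a sketch: it copies the three steps of writecore.sh into a shell function that writes to a scratch directory instead of /tmp, then feeds it fake data on stdin. The DEST variable, the writecore_dryrun function name, and the argument values are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Scratch directory so that nothing on the node is touched (hypothetical DEST variable).
DEST=$(mktemp -d)

# Same three steps as writecore.sh, but writing under $DEST instead of /tmp.
writecore_dryrun() {
  echo "[$(date)] Asked to create core for ${1}.${2}.${3}" >>"$DEST/writecore.log"
  cat - >"$DEST/core.${1}.${2}.${3}.dmp" 2>>"$DEST/writecore.log"
  echo "[$(date)] Finished writing core for ${1}.${2}.${3}" >>"$DEST/writecore.log"
}

# Simulate the kernel: the arguments stand in for %p %P %t (illustrative values),
# and stdin stands in for the core image.
printf 'not a real core' | writecore_dryrun 7 1234 1694014912

# Show what was produced.
ls "$DEST"
cat "$DEST/writecore.log"
```

If the dry run leaves a core.7.1234.1694014912.dmp file and two log lines in the scratch directory, the redirections behave as intended. It says nothing about SELinux or about how the kernel invokes the real script.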

If SELinux is in use (check with getenforce), change the security context:

chcon --reference=/usr/bin/cat /usr/local/bin/writecore.sh
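
As a rough sketch of the SELinux check, the snippet below reports the current mode and tolerates nodes where getenforce is not installed. The selinux_mode variable and the printed messages are my own; only getenforce and its three possible outputs (Enforcing, Permissive, Disabled) come from SELinux itself.

```shell
#!/bin/sh
# Ask SELinux for its mode if the tool exists; otherwise record that it is unavailable.
if command -v getenforce >/dev/null 2>&1; then
  selinux_mode=$(getenforce)
else
  selinux_mode="unavailable"
fi
echo "SELinux mode: $selinux_mode"

# Suggest a next step based on the mode.
case "$selinux_mode" in
  Enforcing)  echo "Relabel the script with chcon and watch the journal for AVC denials." ;;
  Permissive) echo "Denials are logged but not enforced." ;;
  *)          echo "No relabeling should be needed." ;;
esac
```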

Get the current core_pattern so that you can revert it later:

sysctl kernel.core_pattern
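
To avoid relying on terminal scrollback, the current value can also be saved to a file. This is a minimal sketch that reads the same setting through /proc, which needs no special privileges. The BACKUP variable and its default path are assumptions, so pick any location that will still be there when you revert.

```shell
#!/bin/sh
# Hypothetical backup location for the original pattern.
BACKUP="${BACKUP:-/tmp/core_pattern.orig}"

# /proc/sys/kernel/core_pattern holds the same value that sysctl kernel.core_pattern prints.
cat /proc/sys/kernel/core_pattern >"$BACKUP"
echo "Saved core_pattern: $(cat "$BACKUP")"

# To revert later (as root), something like:
#   sysctl -w "kernel.core_pattern=$(cat "$BACKUP")"
```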

Update the core_pattern:

sysctl -w "kernel.core_pattern=|/usr/local/bin/writecore.sh %p %P %t"
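
The three specifiers become the script's arguments and therefore part of the file name. Per the core(5) man page, %p is the PID as seen in the PID namespace of the crashing process (inside a container this is often a small number), %P is the PID as seen in the initial PID namespace (that is, on the worker node), and %t is the time of the dump in seconds since the Epoch. The sketch below splits an illustrative file name back into those parts. The example values and variable names are made up.

```shell
#!/bin/sh
# Example file name in the form core.<%p>.<%P>.<%t>.dmp (values are illustrative).
name="core.7.1234.1694014912.dmp"

# Split on dots into: prefix, %p, %P, %t, suffix.
IFS=. read -r prefix ns_pid host_pid epoch suffix <<EOF
$name
EOF

echo "PID inside the container's PID namespace (%p): $ns_pid"
echo "PID as seen on the worker node (%P): $host_pid"
echo "Time of dump, seconds since the Epoch (%t): $epoch"
```

With GNU date, date -d @1694014912 converts the timestamp to a readable time, which helps match a core file to the corresponding journal entries.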

Now core dumps should be processed through writecore.sh and written to the destination directory on the worker node. Note that I had some further issues as seen in journalctl -f:

Sep 06 10:41:52 localhost.localdomain audit[4985]: AVC avc:  denied  { map } for  pid=4985 comm="writecore.sh" path="/usr/bin/bash" dev="vda4" ino=201689627 scontext=system_u:system_r:kernel_generic_helper_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=0
Sep 06 10:41:52 localhost.localdomain kernel: audit: type=1400 audit(1694014912.395:806): avc:  denied  { map } for  pid=4985 comm="writecore.sh" path="/usr/bin/bash" dev="vda4" ino=201689627 scontext=system_u:system_r:kernel_generic_helper_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=0
Sep 06 10:41:52 localhost.localdomain kernel: Core dump to |/usr/local/bin/writecore.sh pipe failed

My worker node did not have semanage installed to be able to modify SELinux permissions, so I just temporarily switched SELinux to permissive mode and it worked:

setenforce Permissive

Finally, reproduce the issue and core dumps should be written to /tmp. Check that this directory has plenty of free disk space and make sure to manage the core dumps there until you revert kernel.core_pattern. Don't forget to return SELinux to enforcing mode (setenforce Enforcing) if you switched it to permissive temporarily. Both the core_pattern and SELinux changes will also be reverted on reboot of the node.
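
A small check along these lines can help keep an eye on the destination directory while the workaround is active. It is only a sketch: the DEST and MIN_FREE_KB variables and the 1 GiB threshold are assumptions, and the file name pattern matches what writecore.sh produces above.

```shell
#!/bin/sh
DEST="${DEST:-/tmp}"                    # destination directory used by writecore.sh
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"   # hypothetical threshold: 1 GiB

# df -P prints one line per file system; column 4 is the available space in KB with -k.
avail_kb=$(df -Pk "$DEST" | awk 'NR==2 {print $4}')
echo "Free space in $DEST: ${avail_kb} KB"

if [ "$avail_kb" -lt "$MIN_FREE_KB" ]; then
  echo "Warning: less than ${MIN_FREE_KB} KB free in $DEST"
fi

# List any cores written so far.
ls -lh "$DEST"/core.*.dmp 2>/dev/null || echo "No core dumps in $DEST yet"
```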
