In a previous post, we discussed the general value of getting operating system core dumps on Linux to investigate Java crashes and other diagnostics. In a subsequent post, we discussed general best practices for gathering Linux core dumps by using a piped kernel core_pattern.
In this post, we will discuss a variation of gathering operating system core dumps where a container in Kubernetes or OpenShift is crashing immediately on startup, meaning the container and/or pod is automatically restarted. If you follow the best practices mentioned in the second link above, the core dump will go to the worker node, and there are various techniques to download a dump from a worker node. If using systemd-coredump, for example, you may need to change system settings depending on the size of the core dump (see an example for OpenShift).
However, what if such changes cannot be made to live production systems, or you are hitting some issue with the piped core dump processing program (in our case, Ubuntu's apport was not producing the dump and we didn't have time to investigate), and you just want to write the core dump somewhere? We hit this situation with a customer, and I'm sharing our simple workaround:
First, create writecore.sh somewhere on the worker node. In general, well-known executable paths are recommended in case of SELinux restrictions. In the following example, /usr/local/bin is used, but change it as needed. Also change the destination directory of /tmp/ to wherever you want to write the cores and the log. Note that some worker nodes have limited disk space in /tmp/, so ensure the target directory has sufficient space.
cat > /usr/local/bin/writecore.sh <<"EOF"
#!/bin/sh
# Arguments passed by the kernel per the core_pattern specifiers %p %P %t:
#   ${1} = PID of the dumped process (in its own PID namespace)
#   ${2} = PID of the dumped process (in the initial PID namespace, i.e. as seen on the worker node)
#   ${3} = time of the dump (seconds since the epoch)
/usr/bin/echo "[$(/usr/bin/date)] Asked to create core for ${1}.${2}.${3}" >>/tmp/writecore.log
# The kernel streams the core dump over stdin; write it to a file
/usr/bin/cat - > /tmp/core.${1}.${2}.${3}.dmp 2>>/tmp/writecore.log
/usr/bin/echo "[$(/usr/bin/date)] Finished writing core for ${1}.${2}.${3}" >>/tmp/writecore.log
EOF
Next, make the script executable:
chmod +x /usr/local/bin/writecore.sh
If SELinux is in use (check getenforce
), change the security context:
chcon --reference=/usr/bin/cat /usr/local/bin/writecore.sh
Get the current core_pattern
so that you can revert it later:
sysctl kernel.core_pattern
Update the core_pattern
:
sysctl -w "kernel.core_pattern=|/usr/local/bin/writecore.sh %p %P %t"
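At this point you can exercise the pipeline end to end with a deliberate crash rather than waiting for the real one. A minimal sketch, assuming you run it in a throwaway shell on the worker node:

```shell
# Trigger a deliberate SIGSEGV in a child shell to exercise the core
# pipeline. ulimit -c unlimited ensures the dump is not suppressed by
# a zero core file size limit.
status=0
sh -c 'ulimit -c unlimited; kill -SEGV $$' || status=$?
echo "crasher exit status: $status"   # typically 139 (128 + SIGSEGV's signal number 11)
```

After the crash, a file such as /tmp/core.&lt;pid&gt;.&lt;pid&gt;.&lt;epoch&gt;.dmp and the /tmp/writecore.log entries should appear on the worker node.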
Now core dumps should be processed through writecore.sh
and written to the destination directory on the worker node. Note that I had some further issues as seen in journalctl -f
:
Sep 06 10:41:52 localhost.localdomain audit[4985]: AVC avc: denied { map } for pid=4985 comm="writecore.sh" path="/usr/bin/bash" dev="vda4" ino=201689627 scontext=system_u:system_r:kernel_generic_helper_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=0
Sep 06 10:41:52 localhost.localdomain kernel: audit: type=1400 audit(1694014912.395:806): avc: denied { map } for pid=4985 comm="writecore.sh" path="/usr/bin/bash" dev="vda4" ino=201689627 scontext=system_u:system_r:kernel_generic_helper_t:s0 tcontext=system_u:object_r:shell_exec_t:s0 tclass=file permissive=0
Sep 06 10:41:52 localhost.localdomain kernel: Core dump to |/usr/local/bin/writecore.sh pipe failed
My worker node did not have semanage installed to modify the SELinux policy, so I temporarily switched SELinux to permissive mode and it worked:
setenforce Permissive
Finally, reproduce the issue and the core dumps should be written to /tmp. Check that this directory has plenty of free disk space and make sure to manage the core dumps there until you revert kernel.core_pattern. Don't forget to re-enable SELinux enforcement if you switched it to permissive mode. Note that both the core_pattern and SELinux changes are reverted automatically on a reboot of the node.
#Java#RedHatOpenShift#WebSphereLiberty#automation-portfolio-specialists-app-platform