Core dumps are produced when a process crashes, and IBM Java and IBM Semeru Runtimes also request them when a process runs out of Java heap space (OutOfMemoryError). They are critical diagnostic artifacts that help determine the cause of a problem through tools such as the Memory Analyzer Tool.
In recent versions of OpenShift that use CoreOS, the default value of the Linux setting kernel.core_pattern is |/usr/lib/systemd/systemd-coredump. When core_pattern starts with the pipe character (|), the core dump is sent to the specified program on the worker node (e.g. /usr/lib/systemd/systemd-coredump). Therefore, the core dump will not exist within the container but rather on the worker node.
However, systemd-coredump up to and including version v250 truncates 64-bit process core dumps at 2GB by default. If the Java process has a virtual size greater than 2GB, the truncated core dump is unlikely to be valuable for analysis.
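One quick way to spot a dump that hit the cap is to compare the core file's size against the 2GB limit, since systemd-coredump cuts the file off at ProcessSizeMax. This is a rough sketch, not a definitive test; the helper name is hypothetical, and in practice the size would come from stat -c %s on the core file:

```shell
# Default ProcessSizeMax cap in bytes (2G) on systemd <= v250.
LIMIT=$((2 * 1024 * 1024 * 1024))

# Hypothetical helper: flag a core file size that reached the cap.
# In practice: check_core_size "$(stat -c %s /path/to/core)"
check_core_size() {
  if [ "$1" -ge "$LIMIT" ]; then
    echo "likely truncated at 2G"
  else
    echo "under the cap"
  fi
}

check_core_size $((3 * 1024 * 1024 * 1024))   # a dump that hit the limit
check_core_size $((512 * 1024 * 1024))        # a small dump
```

A file at exactly 2GB is the strongest hint of truncation; anything well under the limit is probably complete.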
This post will go through how to determine the version of systemd-coredump in use, whether its settings have been modified, and, if not, how to modify its settings to avoid core dump truncation.
Query the systemd-coredump version
First, list the nodes; we'll focus on the worker nodes as an example:
$ oc get nodes -l node-role.kubernetes.io/worker
NAME STATUS ROLES AGE VERSION
worker0.ibm.com Ready worker 40d v1.24.6+5658434
worker1.ibm.com Ready worker 40d v1.24.6+5658434
worker2.ibm.com Ready worker 40d v1.24.6+5658434
Choose a worker node and determine the core_pattern by replacing $NODE with the node name:
oc debug node/$NODE -t -- chroot /host sysctl kernel.core_pattern
For example:
$ oc debug node/worker0.ibm.com -t -- chroot /host sysctl kernel.core_pattern
Starting pod/worker0ibmcom-debug ...
To use host binaries, run `chroot /host`
kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
This verifies that systemd-coredump is the core dump processing program in use.
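The pipe check itself can be scripted. As a sketch (using the sysctl value from the output above as a sample), the first character of core_pattern decides whether dumps are piped to a handler or written to a file:

```shell
# Sample value taken from the sysctl output above.
pattern='|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e'

# A leading '|' means the kernel pipes the dump to the named program
# running on the worker node, rather than writing a file in the
# process's working directory.
case "$pattern" in
  \|*)
    handler="${pattern#|}"       # drop the leading pipe
    handler="${handler%% *}"     # keep just the program path
    echo "piped to handler: $handler"
    ;;
  *)
    echo "written using file pattern: $pattern"
    ;;
esac
```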
Determine the systemd version by replacing $NODE with the node name:
oc debug node/$NODE -t -- chroot /host coredumpctl --version
For example:
$ oc debug node/worker0.ibm.com -t -- chroot /host coredumpctl --version
Starting pod/worker0ibmcom-debug ...
To use host binaries, run `chroot /host`
systemd 239 (239-58.el8_6.9)
In the above example, systemd is version 239, which is <= 250, and therefore the default is to truncate core dumps at 2GB.
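This version check can also be scripted. A minimal sketch that parses the first line of the coredumpctl --version output shown above and compares the major version against 250:

```shell
# Sample first line from `coredumpctl --version` on the node above.
version_line='systemd 239 (239-58.el8_6.9)'

# The second field is the major systemd version number.
ver=$(echo "$version_line" | awk '{print $2}')

if [ "$ver" -le 250 ]; then
  echo "systemd $ver: default 2G core dump cap applies"
else
  echo "systemd $ver: newer defaults"
fi
```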
Query the systemd-coredump configuration
Next, print the systemd-coredump configuration by replacing $NODE with the node name:
oc debug node/$NODE -t -- chroot /host sh -c "find /etc/systemd/coredump.conf /etc/systemd/coredump.conf.d/ /usr/lib/systemd/coredump.conf.d/ /usr/local/lib/systemd/coredump.conf.d/ -type f | xargs cat"
For example:
$ oc debug node/worker0.ibm.com -t -- chroot /host sh -c "find /etc/systemd/coredump.conf /etc/systemd/coredump.conf.d/ /usr/lib/systemd/coredump.conf.d/ /usr/local/lib/systemd/coredump.conf.d/ -type f | xargs cat"
Starting pod/worker0ibmcom-debug ...
To use host binaries, run `chroot /host`
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See coredump.conf(5) for details.
[Coredump]
#Storage=external
#Compress=yes
#ProcessSizeMax=2G
#ExternalSizeMax=2G
#JournalSizeMax=767M
#MaxUse=
#KeepFree=
By default, the main configuration file lists the compiled-in defaults as commented-out lines. This confirms that systemd-coredump will truncate a core dump at 2GB.
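Because drop-in files are read after the main file and the last uncommented assignment wins, the effective value can be sketched as "last uncommented ProcessSizeMax= in the concatenated configuration" (commented lines are only the compiled-in defaults). The helper name here is hypothetical:

```shell
# Hypothetical helper: return the last uncommented ProcessSizeMax=
# assignment from configuration text on stdin, mirroring systemd's
# drop-in override order.
effective_process_size_max() {
  grep '^ProcessSizeMax=' | tail -n 1 | cut -d= -f2
}

# Only the commented default present: no override, so output is empty.
printf '#ProcessSizeMax=2G\n' | effective_process_size_max

# Default plus a drop-in override: the override wins.
printf '#ProcessSizeMax=2G\nProcessSizeMax=100G\n' | effective_process_size_max
```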
Update systemd-coredump configuration
Configuration updates to OpenShift worker nodes are generally done through MachineConfig objects:
Aside from a few specialized features, most changes to operating systems on OpenShift Container Platform nodes can be done by creating what are referred to as MachineConfig objects that are managed by the Machine Config Operator.
Create a systemd-coredump-overwrite.conf file to represent the changes we will make, using the example below. Consult the documentation for your particular version of OpenShift; in particular, update the version on the second line to match your product version.
variant: openshift
version: 4.12.0
metadata:
  name: 99-systemd-coredump-overwrite
  labels:
    machineconfiguration.openshift.io/role: worker
storage:
  files:
    - path: /etc/systemd/coredump.conf.d/99-systemd-coredump-overwrite.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          # 99-systemd-coredump-overwrite.conf
          # See coredump.conf(5) for details.
          [Coredump]
          #Storage=external
          #Compress=yes
          ProcessSizeMax=100G
          ExternalSizeMax=100G
          #JournalSizeMax=767M
          #MaxUse=
          #KeepFree=
Next, install the Butane program, then run Butane on the conf file to produce the yaml file:
butane systemd-coredump-overwrite.conf -o ./99-systemd-coredump-overwrite.yaml
Next, apply the yaml file. Note: nodes that are updating might not be available for scheduling.
$ oc apply -f 99-systemd-coredump-overwrite.yaml
machineconfig.machineconfiguration.openshift.io/99-systemd-coredump-overwrite created
Retrieve the status of the update:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-28d12e5e81b78289040369dd4c6479bb True False False 3 3 3 0 40d
worker rendered-worker-a8a7efd99c8afe44e785443fce2c0dae False True False 3 1 1 0 40d
As the update cascades through the worker nodes, the UPDATING column is True and UPDATEDMACHINECOUNT is set to the number of nodes that have been updated. Once all nodes have been updated, the UPDATED column will be True:
$ oc get machineconfigpool
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-28d12e5e81b78289040369dd4c6479bb True False False 3 3 3 0 40d
worker rendered-worker-172c1a2f1aa57e48eccb3670e581c8ef True False False 3 3 3 0 40d
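Rather than re-running oc get machineconfigpool by hand, the wait can be scripted. A rough sketch, assuming oc is logged in with sufficient privileges; the pool_updated helper is hypothetical and isolates the condition check so it can be exercised without a cluster:

```shell
# Hypothetical helper: succeeds when the pool's Updated condition is True.
pool_updated() {
  [ "$1" = "True" ]
}

# Against a live cluster, you would feed it the real condition status:
#   until pool_updated "$(oc get machineconfigpool worker \
#       -o jsonpath='{.status.conditions[?(@.type=="Updated")].status}')"; do
#     sleep 30
#   done

pool_updated True  && echo "rollout complete"
pool_updated False || echo "still updating"
```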
You can verify the updated configuration by running the same command as above. For example:
$ oc debug node/worker0.ibm.com -t -- chroot /host sh -c "find /etc/systemd/coredump.conf /etc/systemd/coredump.conf.d/ /usr/lib/systemd/coredump.conf.d/ /usr/local/lib/systemd/coredump.conf.d/ -type f | xargs cat"
Starting pod/worker0ibmcom-debug ...
To use host binaries, run `chroot /host`
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See coredump.conf(5) for details.
[Coredump]
#Storage=external
#Compress=yes
#ProcessSizeMax=2G
#ExternalSizeMax=2G
#JournalSizeMax=767M
#MaxUse=
#KeepFree=
# 99-systemd-coredump-overwrite.conf
# See coredump.conf(5) for details.
[Coredump]
#Storage=external
#Compress=yes
ProcessSizeMax=100G
ExternalSizeMax=100G
#JournalSizeMax=767M
#MaxUse=
#KeepFree=
cat: '/usr/lib/systemd/coredump.conf.d/*.conf': No such file or directory
cat: '/usr/local/lib/systemd/coredump.conf.d/*.conf': No such file or directory
Notice that the original, built-in configuration file is still there, and now we also see the new override file written.
Finally, although systemd-coredump configuration changes do not require a restart of the node, they do require that systemd reload its configuration (which is normally done at system boot). This can be done on a worker-by-worker basis with the following command, replacing $NODE with the worker node name:
oc debug node/$NODE -t -- chroot /host systemctl daemon-reload
Note that you may also need to increase MaxUse and KeepFree depending on available disk space and the expected size of core dumps.
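To apply the reload across every worker in one pass, a sketch along these lines can be used; the reload_cmd helper and the node names are hypothetical, and the loop only prints the commands (pipe it to sh to actually execute them with a logged-in oc):

```shell
# Hypothetical helper: construct the reload command for one node.
reload_cmd() {
  echo "oc debug node/$1 -t -- chroot /host systemctl daemon-reload"
}

# Print the command for each worker node; pipe the loop to `sh` to run them.
for NODE in worker0.ibm.com worker1.ibm.com worker2.ibm.com; do
  reload_cmd "$NODE"
done
```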
For details on how to download a core dump from a worker node, see example instructions.
Reverting the change
If at any point the change needs to be reverted, simply delete the MachineConfig object:
$ oc delete machineconfig 99-systemd-coredump-overwrite
machineconfig.machineconfiguration.openshift.io "99-systemd-coredump-overwrite" deleted
Review oc get machineconfigpool to confirm that the change has propagated. Deleting the MachineConfig removes the override file, returning the settings to their previous state. Note that you will need to run systemctl daemon-reload on each node again, as described above.
#automation-portfolio-specialists-app-platform #Java #WebSphere #OpenLiberty #WebSphereLiberty #RedHatOpenShift