Authors: Praveen Pandey (praveen.pandey@in.ibm.com), Sachin Bappalige (sachin.pb@in.ibm.com)
Introduction
In large-scale enterprise environments, system serviceability and root-cause analysis are critical. When unexpected failures such as crashes or hangs occur, it is essential to capture the system state at the moment of failure to facilitate effective debugging and recovery. This is where First Failure Data Capture (FFDC) plays a pivotal role. It collects vital system information the very first time a problem arises, which significantly improves the chances that system administrators and developers can perform an accurate root cause analysis (RCA).
The Operating System (OS) is central to overall system stability and reliability. Capturing its state during failures is the first and most important step toward identifying the root cause of system-level problems. Among the core components of FFDC on Linux systems are kdump and fadump—mechanisms that generate a memory dump (vmcore) when the kernel crashes. However, capturing the vmcore is just the beginning. The real diagnostic power lies in interpreting it, and that’s where the crash tool comes into play.
For IBM POWER systems, Firmware-Assisted Dump (fadump) serves as a reliable alternative to the traditional kdump mechanism. Unlike kdump, fadump leverages system firmware to preserve memory contents across reboots. Upon a crash, the system firmware ensures critical memory regions are retained, and after a reboot, the new kernel (a fresh OS image) invokes user-space kdump tools to save the crash dump. This firmware-assisted approach offers improved reliability—especially for complex I/O and PCI states, since it captures the memory state from a fully reinitialized system, making it highly suitable for enterprise-grade workloads.
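Whether fadump is active on a given system can be checked from sysfs before relying on it. The following is a minimal sketch, assuming the kernel exposes /sys/kernel/fadump/enabled and /sys/kernel/fadump/registered (older kernels use /sys/kernel/fadump_enabled and /sys/kernel/fadump_registered). The function name fadump_state and its root-prefix argument are hypothetical helpers introduced for illustration, not part of any tool.

```shell
# Report the fadump state by reading sysfs.
# $1 = optional root prefix (hypothetical hook so the function can be
#      exercised against a fake tree; leave empty on a real system)
fadump_state() {
    root=${1:-}
    # Newer kernels use the fadump/ directory, older ones the fadump_ prefix.
    for dir in "$root/sys/kernel/fadump/" "$root/sys/kernel/fadump_"; do
        if [ -r "${dir}enabled" ]; then
            enabled=$(cat "${dir}enabled")
            registered=0
            [ -r "${dir}registered" ] && registered=$(cat "${dir}registered")
            echo "enabled=$enabled registered=$registered"
            return 0
        fi
    done
    echo "fadump not available"
    return 1
}
```

Calling fadump_state with no argument reads the real sysfs tree. A value of 1 for both fields would suggest that fadump is enabled and registered with the firmware.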
The crash tool plays a significant role in the kdump/fadump workflow: it analyzes the crash dump files that these mechanisms generate. It provides an interactive session for examining the contents of a crashed Linux kernel's memory image, and it is used by developers, testers, and support engineers to investigate kernel bugs. It reads two files: vmcore, the memory dump from the crashed kernel, and vmlinux, the uncompressed, debug-enabled kernel image (with symbols).
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/20250626/vmcore
From the crash shell, you can run commands such as bt, ps, vm, kmem, and log.
Overview of its role and functionality
a) Command-Line Utility
Crash is a command-line utility designed to analyze Linux kernel crash dump files. It provides a comprehensive set of commands for inspecting various aspects of the system's state during the crash.
b) Interpretation of Dump Files
Crash can interpret the contents of crash dump files generated by Kdump. These dump files contain information about the kernel's state, including memory contents, process information, kernel data structures, registers, and stack traces.
c) Memory Inspection
Crash allows users to inspect the contents of the kernel's memory at the time of the crash. This includes examining memory regions, displaying memory addresses, and searching for specific patterns or values within memory.
d) Process Information
Users can use crash to retrieve information about running processes at the time of the crash. This includes details such as process IDs, parent process IDs, process states, CPU usage, and more.
e) Kernel Data Structures
Crash provides commands for examining various kernel data structures, such as task structures, file system information, network-related structures, and more. This allows users to gain insights into the kernel's internal state.
f) Stack Traces
One of the essential features of crash is its ability to display stack traces for processes and kernel threads. This helps users identify the sequence of function calls leading up to the crash, aiding in debugging and troubleshooting efforts.
g) Symbol Resolution
Crash can resolve memory addresses to corresponding symbols in the kernel's symbol table, providing meaningful names for functions, variables, and data structures. This makes it easier to interpret the contents of memory and stack traces.
h) Debugging and Troubleshooting
The crash tool is invaluable for debugging and troubleshooting kernel crashes. It provides detailed insights into the system's state at the time of a crash, helping system administrators and developers identify the root cause of crashes and implement appropriate fixes.
Common commands frequently used for debugging purposes
ps: Displays information about processes at the time of the crash, including process IDs, parent process IDs, process states, CPU usage, and command names.
bt: Shows backtraces for processes and kernel threads, indicating the sequence of function calls leading up to the crash. This command is invaluable for understanding the execution flow before the crash.
vm: Provides information about virtual memory mappings, including mappings between virtual addresses and physical addresses, as well as memory protection attributes.
rd: Reads the contents of memory at a specific virtual or physical address. This command is useful for inspecting memory regions; the related search command looks for patterns or values within memory.
task: Displays detailed information about a specific process, including its task structure, memory mappings, file descriptors, and thread information.
files: Lists open files and file descriptors associated with processes at the time of the crash. This command helps identify file-related issues that may have contributed to the crash.
log: Prints messages from the system log buffer captured at the time of the crash. This command provides additional context about system events leading up to the crash.
bt <pid>: Shows the kernel stack for a specific process or kernel thread, selected by PID or task address. This form is useful for examining the call stack of a thread and identifying the functions being executed.
mod: Displays information about loaded kernel modules at the time of the crash, including module names, addresses, sizes, and dependencies.
sym: Resolves memory addresses to corresponding symbols in the kernel's symbol table, providing meaningful names for functions, variables, and data structures.
These are just a few examples of the commands available in the crash tool. Depending on the specific debugging scenario, users may need additional commands to extract relevant information from crash dump files and diagnose kernel-related issues effectively.
You need to ensure that the kernel-debuginfo package is installed and that it is at the same level as the kernel that crashed.
To start the utility, type a command of the following form at a shell prompt:
crash /var/crash/<timestamp>/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
Note that the vmlinux must come from the same kernel version as the one captured by kdump. To find out which kernel you are currently running, use the uname -r command.
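The version match can be checked before starting crash. The following is a minimal sketch, assuming the debuginfo layout used in this article (/usr/lib/debug/lib/modules/<release>/vmlinux). The function name crash_cmdline and the DEBUG_ROOT variable are hypothetical, and the release string of the crashed kernel has to be supplied by the user.

```shell
# Root of the debuginfo tree; DEBUG_ROOT is a hypothetical override for testing.
DEBUG_ROOT=${DEBUG_ROOT:-/usr/lib/debug}

# Print the crash command line for a dump, or fail if the matching
# debug vmlinux is not installed.
# $1 = release string of the crashed kernel, $2 = path to the vmcore
crash_cmdline() {
    vmlinux="$DEBUG_ROOT/lib/modules/$1/vmlinux"
    if [ ! -f "$vmlinux" ]; then
        echo "no debug vmlinux for kernel $1 (expected $vmlinux)" >&2
        return 1
    fi
    printf 'crash %s %s\n' "$2" "$vmlinux"
}
```

Passing "$(uname -r)" as the first argument is only appropriate when the running kernel is the same one that produced the vmcore.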
This starts an interactive crash> shell where various commands can be used.
Command | Description |
bt | Shows backtrace of the crashing thread |
ps | Lists all processes at time of crash |
vm | Displays virtual memory layout |
task | Provides detailed info on a specific process |
files | Lists open files and FDs per process |
log | Prints kernel log buffer at time of crash |
mod | Shows loaded kernel modules |
bt <pid> | Displays kernel stack of a specific thread |
sym | Resolves symbols from memory addresses |
Below is an example run of crash tool on a vmcore collected on a system with fadump enabled
Crash Tool Automation & Scripting
The crash tool supports scripting via macros and command files, making it possible to automate parts of the crash analysis. For example, a script can be created to automatically collect:
a) Stack traces of all CPUs
cat > analysis.crash <<EOF
bt -a
exit
EOF
crash vmcore vmlinux -i analysis.crash > crash_report.txt
NOTE: In the example above, the analysis output produced by the input script analysis.crash is collected in crash_report.txt.
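When the same set of commands is collected for many dumps, the command file can be generated instead of written by hand. The following is a minimal sketch; write_crash_script is a hypothetical helper name, and the commands passed to it (bt -a, ps, log) are only an example selection from the commands described earlier.

```shell
# Write a crash command file that runs each argument as one crash command,
# then exits the session so that a redirected run terminates.
# $1 = output file, remaining arguments = crash commands
write_crash_script() {
    out=$1
    shift
    : > "$out"                       # create or truncate the file
    for cmd in "$@"; do
        printf '%s\n' "$cmd" >> "$out"
    done
    printf 'exit\n' >> "$out"
}
```

For instance, write_crash_script analysis.crash 'bt -a' 'ps' 'log' produces a file that can be passed to crash with the -i option in the same way as the hand-written one.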
Live system analysis using Crash
Crash can also be used to examine the current state of a running Linux system, providing access to kernel data structures, memory, and other system information. This helps to investigate system hangs, performance issues, or other problems without needing a core dump.
If no MEMORY-IMAGE argument is given on the crash command line, the session is invoked on the live system. By default, /dev/crash is used if it exists. If it does not exist, /dev/mem is used, unless the kernel has been configured with CONFIG_STRICT_DEVMEM, in which case /proc/kcore is used. It is also permissible to enter /dev/crash, /dev/mem, or /proc/kcore explicitly.
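The selection order described here can be written out as a small decision function. This is only an illustration of the rule as stated in the text, not crash's actual implementation; the function name live_memory_source and both of its arguments are hypothetical.

```shell
# Echo the memory source a live session would be expected to use.
# $1 = root prefix (empty on a real system; hypothetical hook for testing)
# $2 = "y" if the kernel was built with CONFIG_STRICT_DEVMEM
live_memory_source() {
    root=$1
    strict=$2
    if [ -e "$root/dev/crash" ]; then
        echo /dev/crash          # preferred whenever it exists
    elif [ "$strict" = "y" ]; then
        echo /proc/kcore         # /dev/mem access is restricted
    else
        echo /dev/mem
    fi
}
```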
Example :

Crash supports loadable extensions that add new analysis commands. Crash extension modules are written in C using crash’s plugin API and compiled into .so (shared object) files. These extensions are loaded at runtime with the extend command, as shown below:
crash> extend /path/to/module.so
Common Examples of Extensions:
Extension | Purpose |
net | Analyze networking data structures like sockets |
ext4 | Show ext4 filesystem internals |
dm | Device Mapper analysis |
vfs | View VFS structures and mounts |
You can find these extensions in /usr/libexec/crash/ or /usr/lib64/crash/, depending on the distribution.
Example: dminfo.c is an extension that can be used as a reference; it retrieves Device Mapper information.
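Loading several extensions by hand in every session can be avoided by generating the extend commands. The following is a minimal sketch; extend_commands is a hypothetical helper name, and the directory passed to it must be whichever location your distribution actually uses.

```shell
# Print one "extend" command per shared object found in a directory.
# $1 = directory holding crash extension modules
extend_commands() {
    for so in "$1"/*.so; do
        [ -e "$so" ] || continue   # skip the literal pattern when nothing matches
        printf 'extend %s\n' "$so"
    done
}
```

The output can be redirected into a command file and passed to crash with the -i option, so that the extensions are loaded before the analysis commands run.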
Crash Dump Data Collector:
The crashdc data collector is a script that can be used in conjunction with the crash utility to automatically generate a text file containing the major information about a newly generated crash dump.
Conclusion
The crash tool remains a critical component of the Linux First Failure Data Capture (FFDC) strategy and a cornerstone for effective post-mortem kernel debugging. Mastering the usage of crash for analyzing vmcore files significantly shortens the time required to isolate and resolve complex kernel-level issues across enterprise environments whether on RHEL, SLES, or other Linux distributions. With capabilities such as symbol resolution, in-depth memory and process analysis, and support for automation and scripting, crash enables teams not only to identify the root cause of system failures but also to build a proactive approach to prevent recurrence. The crash tool empowers system administrators and developers with the visibility and control needed to maintain uptime, stability, and trust in enterprise Linux deployments.
Special thanks to Sachin Sant, Aditya Gupta, and Sourabh Jain for sharing their knowledge and expertise, which were instrumental in bringing this much-needed document together!