
CRASH TOOL for vmcore analysis in Enterprise POWER Systems

By Sachin P Bappalige posted 20 hours ago

  

Authors: Praveen Pandey (praveen.pandey@in.ibm.com), Sachin Bappalige (sachin.pb@in.ibm.com)

Introduction 

In large-scale enterprise environments, system serviceability and root cause analysis (RCA) are critical. When unexpected failures such as crashes or hangs occur, it is essential to capture the system state at the moment of failure so that debugging and recovery can proceed effectively. This is where First Failure Data Capture (FFDC) plays a pivotal role. It collects vital system information the very first time a problem arises, which significantly improves the chances that system administrators and developers can diagnose and resolve the issue accurately.

The Operating System (OS) is central to overall system stability and reliability. Capturing its state during failures is the first and most important step toward identifying the root cause of system-level problems. Among the core components of FFDC on Linux systems are kdump and fadump, mechanisms that generate a memory dump (vmcore) when the kernel crashes. However, capturing the vmcore is just the beginning. The real diagnostic power lies in interpreting it, and that is where the crash tool comes into play.

For IBM POWER systems, Firmware-Assisted Dump (fadump) serves as a reliable alternative to the traditional kdump mechanism. Unlike kdump, fadump relies on system firmware to preserve memory contents across a reboot. When a crash occurs, the firmware retains the critical memory regions, the system is reset, and a fresh copy of the kernel boots and invokes the user-space kdump tools to save the crash dump. This firmware-assisted approach offers improved reliability, especially where complex I/O and PCI state is involved, because the dump is saved from a fully reinitialized system rather than from a kernel started on top of the crashed one. That makes it well suited to enterprise-grade workloads.
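Before relying on fadump, it helps to confirm that it is actually enabled and registered with the firmware. The following is a minimal sketch, assuming the sysfs entries described in the kernel's fadump documentation (/sys/kernel/fadump/enabled and registered on newer kernels, /sys/kernel/fadump_enabled and fadump_registered on older ones). The function name fadump_status and its optional root-prefix argument are hypothetical and exist only so the check can be tried against a fake directory tree.

```shell
#!/bin/sh
# Report fadump state from sysfs.
# $1 (optional, hypothetical): root prefix, useful for testing against
# a fake tree; leave empty to read the real /sys.
fadump_status() {
    root="${1:-}"
    # Newer kernels: /sys/kernel/fadump/{enabled,registered}
    # Older kernels: /sys/kernel/fadump_enabled, fadump_registered
    for base in "$root/sys/kernel/fadump/" "$root/sys/kernel/fadump_"; do
        if [ -r "${base}enabled" ]; then
            enabled=$(cat "${base}enabled")
            # Treat a missing "registered" file as not registered.
            registered=$(cat "${base}registered" 2>/dev/null || echo 0)
            echo "enabled=$enabled registered=$registered"
            return 0
        fi
    done
    echo "fadump sysfs entries not found"
    return 1
}
```

Calling fadump_status with no argument on a POWER system that has fadump configured would be expected to print enabled=1 registered=1; any other result suggests the dump would not be captured through firmware.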

The crash tool plays a significant role in the kdump/fadump workflow: it is the utility used to analyze the dump files those mechanisms generate. It provides an interactive session for examining the memory image of a crashed Linux kernel, and it is used by developers, testers, and support engineers to investigate kernel bugs. It reads two files:

vmcore: the memory dump from the crashed kernel
vmlinux: the uncompressed, debug-enabled kernel image (with symbols)

  

Usage: 

crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/20250626/vmcore 

Or 

crash vmlinux vmcore 

From the crash shell, you can run commands such as bt, ps, vm, kmem, and log.

Overview of its role and functionality

a) Analysis Tool

Crash is a command-line utility designed to analyze Linux kernel crash dump files. It provides a comprehensive set of commands for inspecting various aspects of the system's state during the crash. 

b) Interpretation of Dump Files

Crash can interpret the contents of crash dump files generated by kdump or fadump. These dump files contain information about the kernel's state, including memory contents, process information, kernel data structures, registers, and stack traces.

c) Memory Inspection

Crash allows users to inspect the contents of the kernel's memory at the time of the crash. This includes examining memory regions, displaying memory addresses, and searching for specific patterns or values within memory. 

d) Process Information

Users can use crash to retrieve information about running processes at the time of the crash. This includes details such as process IDs, parent process IDs, process states, CPU usage, and more. 

e) Kernel Data Structures

Crash provides commands for examining various kernel data structures, such as task structures, file system information, network-related structures, and more. This allows users to gain insights into the kernel's internal state.

f) Stack Traces

One of the essential features of crash is its ability to display stack traces for processes and kernel threads. This helps users identify the sequence of function calls leading up to the crash, aiding in debugging and troubleshooting efforts. 

g) Symbol Resolution

Crash can resolve memory addresses to corresponding symbols in the kernel's symbol table, providing meaningful names for functions, variables, and data structures. This makes it easier to interpret the contents of memory and stack traces.  

h) Debugging and Troubleshooting

The crash tool is invaluable for debugging and troubleshooting kernel crashes. It provides detailed insights into the system's state at the time of a crash, helping system administrators and developers identify the root cause of crashes and implement appropriate fixes. 

Common commands frequently used for debugging purposes

ps: Displays information about processes at the time of the crash, including process IDs, parent process IDs, process states, CPU usage, and command names.

bt: Shows backtraces for processes and kernel threads, indicating the sequence of function calls leading up to the crash. This command is invaluable for understanding the execution flow before the crash.

vm: Provides information about virtual memory mappings, including mappings between virtual addresses and physical addresses, as well as memory protection attributes. 

rd: Reads and displays the contents of physical or virtual memory at a specific address. It is useful for inspecting memory regions. The related search command looks for patterns or values within memory, and kmem reports on the state of the kernel memory subsystems.

task: Displays the contents of the task_struct (and thread_info) of a specific process or thread. Use it together with vm and files to see that task's memory mappings and file descriptors.

files: Lists open files and file descriptors associated with processes at the time of the crash. This command helps identify file-related issues that may have contributed to the crash. 

log: Prints messages from the system log buffer captured at the time of the crash. This command provides additional context about system events leading up to the crash. 

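bt <pid>: Shows the kernel stack for a specific process or kernel thread. Crash has no separate stack command; passing a PID or task address to bt limits the backtrace to that task, and foreach bt prints a backtrace for every task. This is useful for examining the call stack of a thread and identifying the functions being executed.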

mod: Displays information about the kernel modules loaded at the time of the crash, including module names, addresses, and sizes. It can also load module debuginfo (mod -s for one module, mod -S for all).

sym: Resolves memory addresses to corresponding symbols in the kernel's symbol table, providing meaningful names for functions, variables, and data structures. 

These are just a few of the commands available in the crash tool that are frequently used for debugging. Depending on the specific debugging scenario, users may use additional commands to extract relevant information from crash dump files and diagnose kernel-related issues effectively.

To start the utility, first ensure that the kernel-debuginfo package is installed and that its version matches the kernel that crashed. Then type the command in the following form at a shell prompt:

crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/<timestamp>/vmcore

Note that the vmlinux file must come from the same kernel version as the one captured in the vmcore. The $(uname -r) substitution expands to the release of the currently running kernel (run uname -r to see it), so it is only correct when the system has rebooted into the same kernel that crashed.
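When the running kernel may differ from the one that crashed, the release can be read from the dump itself instead of from uname -r. The sketch below assumes the crash --osrelease option, which prints the OSRELEASE string stored in a kdump dumpfile header, and the debuginfo path layout used earlier in this article. The function names debug_vmlinux_path and open_vmcore are hypothetical.

```shell
#!/bin/sh
# Build the debuginfo vmlinux path for a given kernel release string.
# The directory layout is the one used in this article; other
# distributions may place vmlinux elsewhere.
debug_vmlinux_path() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Open a vmcore with the vmlinux that matches the crashed kernel,
# not the running one.
open_vmcore() {
    vmcore="$1"
    # Ask crash for the release recorded in the dump header.
    release=$(crash --osrelease "$vmcore") || return 1
    vmlinux=$(debug_vmlinux_path "$release")
    if [ ! -f "$vmlinux" ]; then
        echo "missing $vmlinux: install kernel-debuginfo for $release" >&2
        return 1
    fi
    crash "$vmlinux" "$vmcore"
}
```

A call such as open_vmcore /var/crash/<timestamp>/vmcore would then start the session only if the matching debuginfo is present, which avoids a confusing version-mismatch failure inside crash.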

This starts an interactive crash> shell where various commands can be used.

Command      Description
bt           Shows backtrace of the crashing thread
ps           Lists all processes at time of crash
vm           Displays virtual memory layout
task         Shows the task_struct contents of a specific process
files        Lists open files and FDs per process
log          Prints kernel log buffer at time of crash
mod          Shows loaded kernel modules
foreach bt   Displays kernel stacks of all threads
sym          Resolves symbols from memory addresses
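Several of the commands in the table (bt, files, vm, and task) accept a PID, so the per-process view of a suspect task can be gathered in one pass. The following is a small sketch that only prints the crash commands for a given PID; the helper name task_triage_cmds is hypothetical, and the output is meant to be saved to a file and fed to crash with -i.

```shell
#!/bin/sh
# Print the crash commands that inspect a single task, given its PID.
task_triage_cmds() {
    pid="$1"
    printf 'bt %s\n' "$pid"      # kernel stack of the task
    printf 'files %s\n' "$pid"   # open files and file descriptors
    printf 'vm %s\n' "$pid"      # virtual memory layout
    printf 'task %s\n' "$pid"    # task_struct contents
}
```

For example, task_triage_cmds 1234 > task.crash produces a four-line command file for PID 1234 (a placeholder value taken from ps output in a real session).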

Below is an example run of the crash tool on a vmcore collected on a system with fadump enabled.

[Screenshot of the crash session not reproduced.]

Crash Tool Automation & Scripting

The crash tool supports scripting via macros and command files, making it possible to automate parts of the crash analysis. For example, a script can be created to automatically collect: 

a) Stack traces of all CPUs 

b) Kernel log 

c) Module list 

d) Faulting process info 

Example: 

cat > analysis.crash <<EOF
log
bt
ps
mod
exit
EOF

crash vmcore vmlinux -i analysis.crash > crash_report.txt

NOTE: In the example above, crash_report.txt contains the crash analysis output produced by the commands in the input script analysis.crash.
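The four items listed above map fairly directly onto crash commands. As one possible variant of the script, the sketch below uses bt -a for the stacks of the active task on every CPU, and set with no arguments followed by bt for the faulting (current context) task. The function name write_crash_script is hypothetical.

```shell
#!/bin/sh
# Write a crash command file covering: kernel log, stack traces of
# all CPUs, module list, and the faulting task.
write_crash_script() {
    out="$1"
    cat > "$out" <<'EOF'
log
bt -a
mod
set
bt
exit
EOF
}
```

After write_crash_script analysis.crash, the report could be produced with crash -s vmlinux vmcore -i analysis.crash > crash_report.txt, where -s suppresses the startup banner so that the file contains only command output.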

Live system analysis using Crash  
Crash can also be used to examine the current state of a running Linux system, providing access to kernel data structures, memory, and other system information. It helps to investigate system hangs, performance issues, or other problems without needing a core dump.

If no MEMORY-IMAGE argument is given on the crash command line, the session is invoked on the live system. By default, /dev/crash is used if it exists. If it does not exist, /dev/mem is used, unless the kernel has been configured with CONFIG_STRICT_DEVMEM, in which case /proc/kcore is used. It is also permissible to specify /dev/crash, /dev/mem, or /proc/kcore explicitly.
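The selection order described above can be mirrored in a few lines of shell, which is handy for predicting which memory source a live session will use. This is only an illustrative sketch: crash makes this decision itself, the function name live_memory_source is hypothetical, and reading the kernel configuration from /boot/config-$(uname -r) is an assumption that does not hold on every distribution.

```shell
#!/bin/sh
# Predict the memory source for a live crash session.
# $1 (optional, hypothetical): root prefix, for testing with a fake tree.
# $2 (optional): kernel config file; the default path is an assumption.
live_memory_source() {
    root="${1:-}"
    config="${2:-/boot/config-$(uname -r)}"
    if [ -e "$root/dev/crash" ]; then
        echo /dev/crash
    elif [ -r "$config" ] && grep -q '^CONFIG_STRICT_DEVMEM=y' "$config"; then
        # /dev/mem access is restricted, so /proc/kcore is used instead.
        echo /proc/kcore
    else
        echo /dev/mem
    fi
}
```

Running live_memory_source with no arguments before starting crash as root gives a quick indication of which device the session is likely to open.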

 

Crash Tool Extensions

Crash supports loadable extensions that add new analysis commands. Extension modules are written in C against crash's extension API and compiled into .so (shared object) files. They are loaded at runtime with the extend command, as shown below:

crash> extend /path/to/module.so 

Common Examples of Extensions: 

Extension   Purpose
net         Analyze networking data structures like sockets
ext4        Show ext4 filesystem internals
dm          Device Mapper analysis
vfs         View VFS structures and mounts

You can find these extensions in /usr/libexec/crash/ or /usr/lib64/crash/, depending on the distribution.

Example: dminfo.c is an extension source file that can be used as a reference; it provides Device Mapper information.
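To load every extension found in one of those directories without typing each path, the extend commands can be generated and placed in a crash input file. The sketch below only prints the commands; the helper name extend_cmds is hypothetical, and the directory passed to it should be whichever location your distribution actually uses.

```shell
#!/bin/sh
# Print one "extend" command per shared object found in a directory.
extend_cmds() {
    dir="$1"
    for so in "$dir"/*.so; do
        [ -e "$so" ] || continue   # the glob matched nothing
        printf 'extend %s\n' "$so"
    done
}
```

For example, extend_cmds /usr/lib64/crash > load_ext.crash writes a file that can be passed to crash with -i at startup.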
 
Crash Dump Data Collector
crashdc is a data collector script that can be used with the crash utility to automatically generate a text file containing the major information about a newly generated crash dump.
 
 
Conclusion 

The crash tool remains a critical component of the Linux First Failure Data Capture (FFDC) strategy and a cornerstone of effective post-mortem kernel debugging. Mastering the use of crash for analyzing vmcore files significantly shortens the time required to isolate and resolve complex kernel-level issues across enterprise environments, whether on RHEL, SLES, or other Linux distributions. With capabilities such as symbol resolution, in-depth memory and process analysis, and support for automation and scripting, crash enables teams not only to identify the root cause of system failures but also to build a proactive approach to preventing recurrence. The crash tool gives system administrators and developers the visibility and control needed to maintain uptime, stability, and trust in enterprise Linux deployments.

Special thanks to Sachin Sant, Aditya Gupta, and Sourabh Jain for sharing their knowledge and expertise, which were instrumental in bringing this much-needed document together!
