Chapter 14. Configuring kdump on the command line - RHEL9

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel

The memory for kdump is reserved during the system boot. You can configure the memory size in the system’s Grand Unified Bootloader (GRUB) configuration file. The memory size depends on the crashkernel= value specified in the configuration file and the size of the physical memory of the system.

14.1. Estimating the kdump size

When planning and building your kdump environment, it is important to know the space required by the crash dump file.

The makedumpfile --mem-usage command estimates the space required by the crash dump file. It generates a memory usage report. The report helps you decide the dump level and the pages that are safe to exclude.

Procedure

Enter the following command to generate a memory usage report:

# makedumpfile --mem-usage /proc/kcore


TYPE            PAGES       EXCLUDABLE    DESCRIPTION
-------------------------------------------------------------
ZERO            501635      yes           Pages filled with zero
CACHE           51657       yes           Cache pages
CACHE_PRIVATE   5442        yes           Cache pages + private
USER            16301       yes           User process pages
FREE            77738211    yes           Free pages
KERN_DATA       1333192     no            Dumpable kernel data

Important

The makedumpfile --mem-usage command reports required memory in pages. This means that you must calculate the size of memory in use against the kernel page size.
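As a rough sketch of that calculation, the following shell function multiplies a page count from the report by the page size and prints the result in MiB. The function name pages_to_mib and the choice of MiB as the output unit are illustrative assumptions and are not part of makedumpfile. The page count is the KERN_DATA value from the sample report above.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to MiB (rounded down).
# $1 = number of pages, $2 = page size in KB (for example 4 or 64).
pages_to_mib() {
    echo "$(( $1 * $2 / 1024 ))"
}

# KERN_DATA from the sample report, assuming 4 KB pages.
pages_to_mib 1333192 4
```

On a running system, getconf PAGESIZE reports the page size in bytes, so divide that value by 1024 before passing it as the second argument.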

By default, the RHEL kernel uses 4 KB sized pages on AMD64 and Intel 64 CPU architectures, and 64 KB sized pages on IBM POWER architectures.

14.2. Configuring kdump memory usage on RHEL 9

The kexec-tools package maintains the default crashkernel= memory reservation values. The kdump service uses the default value to reserve the crash kernel memory for each kernel. The default value can also serve as the reference base value to estimate the required memory size when you set the crashkernel= value manually. The minimum size of the crash kernel can vary depending on the hardware and machine specifications.

The automatic memory allocation for kdump also varies based on the system hardware architecture and available memory size. For example, on AMD64 and Intel 64-bit architectures, the default value for the crashkernel= parameter will work only when the available memory is more than 1 GB. The kexec-tools utility configures the following default memory reserves on AMD64 and Intel 64-bit architecture:

crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
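To illustrate how a ranged value resolves, the following sketch picks the reservation for a given amount of system memory. It assumes that each comma-separated entry has the form <range>:<size>, that the start of a range is inclusive and the end exclusive, that the first matching entry wins, and that a range written as 64G- has no upper bound. The function names to_mib and pick_crashkernel are hypothetical. The sketch handles only the M and G suffixes and only the ranged form, and it is not the kernel's own parser.

```shell
#!/bin/sh
# Convert a size such as 192M or 4G to MiB (only M and G are handled).
to_mib() {
    case "$1" in
        *G) echo "$(( ${1%G} * 1024 ))" ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Print the size selected by a ranged crashkernel= value.
# $1 = ranged value, $2 = total system memory in MiB.
# Runs in a subshell so that the IFS change stays local.
pick_crashkernel() (
    spec=$1
    total_mib=$2
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}          # for example 4G-64G
        size=${entry#*:}            # for example 256M or 256M@16M
        start_mib=$(to_mib "${range%%-*}") || exit 1
        end=${range#*-}             # empty for an open-ended range
        if [ "$total_mib" -ge "$start_mib" ]; then
            if [ -z "$end" ] || [ "$total_mib" -lt "$(to_mib "$end")" ]; then
                echo "${size%%@*}"  # drop an optional @offset
                exit 0
            fi
        fi
    done
    exit 1                          # no range matched
)

# A system with 8 GiB of memory falls into the 4G-64G range.
pick_crashkernel '1G-4G:192M,4G-64G:256M,64G-:512M' 8192
```

With less than 1 GiB of memory no range matches and the function prints nothing, which mirrors the statement that the default value works only when more than 1 GB is available.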

You can also run kdumpctl estimate to get an approximate value without triggering a crash. The estimated crashkernel= value might not be exact, but it can serve as a reference for setting an appropriate crashkernel= value.

Note

The crashkernel=auto option in the boot command line is no longer supported on RHEL 9 and later releases.

Prerequisites

You have root permissions on the system.
You have fulfilled kdump requirements for configurations and targets. For details, see Supported kdump configurations and targets.
You have installed the zipl utility if the system is an IBM Z system.

Procedure

Configure the default value for the crash kernel:

# kdumpctl reset-crashkernel --kernel=ALL

When configuring the crashkernel= value, test the configuration by rebooting the system with kdump enabled. If the kdump kernel fails to boot, increase the memory size gradually to set an acceptable value.

To use a custom crashkernel= value:

    Configure the required memory reserve.

    crashkernel=192M

    Optionally, you can set the amount of reserved memory to a variable depending on the total amount of installed memory by using the syntax crashkernel=<range1>:<size1>,<range2>:<size2>. For example:

    crashkernel=1G-4G:192M,2G-64G:256M

    The example reserves 192 MB of memory if the total amount of system memory is 1 GB or higher and lower than 4 GB. If the total amount of memory is more than 4 GB, 256 MB is reserved for kdump.

    Optional: Offset the reserved memory.

    Some systems require the memory to be reserved at a certain fixed offset, because the crashkernel reservation happens very early and some areas need to remain available for special usage. If the offset is set, the reserved memory begins there. To offset the reserved memory, use the following syntax:

    crashkernel=192M@16M

    The example reserves 192 MB of memory starting at 16 MB (physical address 0x01000000). If you offset to 0 or do not specify a value, kdump offsets the reserved memory automatically. You can also offset memory when setting a variable memory reservation by specifying the offset as the last value. For example, crashkernel=1G-4G:192M,2G-64G:256M@16M.

    Update the boot loader configuration:

    # grubby --update-kernel ALL --args "crashkernel=<custom-value>"

    The <custom-value> must contain the custom crashkernel= value that you have configured for the crash kernel. 

Reboot for changes to take effect:

# reboot

Verification

The commands to test kdump configuration will cause the kernel to crash with data loss. Follow the instructions with care. You must not use an active production system to test the kdump configuration.

Cause the kernel to crash by activating the sysrq key. The address-YYYY-MM-DD-HH:MM:SS/vmcore file is saved to the target location as specified in the /etc/kdump.conf file. If you select the default target location, the vmcore file is saved in the partition mounted under /var/crash/.

Activate the sysrq key to boot into the kdump kernel:

# echo c > /proc/sysrq-trigger

The command causes the kernel to crash and reboots the system if required.
Display the /etc/kdump.conf file and check if the vmcore file is saved in the target destination. 

Additional resources

How to manually modify the boot parameter in grub before the system boots.
grubby(8) man page on your system. 

14.3. Configuring the kdump target

The crash dump is usually stored as a file in a local file system or written directly to a device. Optionally, you can send the crash dump over a network by using the NFS or SSH protocols. Only one of these options to preserve a crash dump file can be set at a time. The default behaviour is to store it in the /var/crash/ directory of the local file system.

Prerequisites

You have root permissions on the system.
You have fulfilled the requirements for kdump configurations and targets. For details, see Supported kdump configurations and targets.

Procedure

To store the crash dump file in /var/crash/ directory of the local file system, edit the /etc/kdump.conf file and specify the path:

path /var/crash

The option path /var/crash represents the path to the file system in which kdump saves the crash dump file.
Note
    When you specify a dump target in the /etc/kdump.conf file, then the path is relative to the specified dump target.
    When you do not specify a dump target in the /etc/kdump.conf file, then the path represents the absolute path from the root directory. 

Depending on the file system mounted in the current system, the dump target and the adjusted dump path are configured automatically.

To secure the crash dump file and the accompanying files produced by kdump, you should set up proper attributes for the target destination directory, such as user permissions and SELinux contexts. Additionally, you can define a script, for example kdump_post.sh in the kdump.conf file as follows:

kdump_post <path_to_kdump_post.sh>

The kdump_post directive specifies a shell script or a command that executes after kdump has completed capturing and saving a crash dump to the specified destination. You can use this mechanism to extend the functionality of kdump to perform actions including the adjustments in file permissions.
The kdump target configuration 

# grep -v ^# /etc/kdump.conf | grep -v ^$
ext4 /dev/mapper/vg00-varcrashvol
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31

The dump target is specified (ext4 /dev/mapper/vg00-varcrashvol) and is therefore mounted at /var/crash. The path option is also set to /var/crash. As a result, kdump saves the vmcore file in the /var/crash/var/crash directory.
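The interaction between the dump target and the path option can be sketched as a small shell function. The name effective_dump_dir and its two arguments are hypothetical. The function only restates the rule from the note above (the path is relative to the dump target when one is specified, and absolute otherwise) and does not read /etc/kdump.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the directory in which kdump would save vmcore.
# $1 = mount point of the dump target (empty if no target is specified)
# $2 = value of the path option
effective_dump_dir() {
    if [ -n "$1" ]; then
        # A dump target is specified: path is relative to its mount point.
        echo "${1%/}/${2#/}"
    else
        # No dump target: path is absolute from the root directory.
        echo "$2"
    fi
}

# The example configuration above: target mounted at /var/crash, path /var/crash.
effective_dump_dir /var/crash /var/crash
```

For the example configuration the function prints /var/crash/var/crash, which matches the directory named in the explanation.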

To change the local directory for saving the crash dump, edit the /etc/kdump.conf configuration file as a root user:
    Remove the hash sign (#) from the beginning of the #path /var/crash line.

    Replace the value with the intended directory path. For example:

    path /usr/local/cores

    Important

    In RHEL 9, the directory defined as the kdump target using the path directive must exist when the kdump systemd service starts to avoid failures. Unlike in earlier versions of RHEL, the directory is no longer created automatically if it does not exist when the service starts.

To write the file to a different partition, edit the /etc/kdump.conf configuration file:

    Remove the hash sign (#) from the beginning of the #ext4 line, depending on your choice.
        device name (the #ext4 /dev/vg/lv_kdump line)
        file system label (the #ext4 LABEL=/boot line)
        UUID (the #ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937 line) 

    Change the file system type and the device name, label, or UUID to the required values. Both UUID="correct-uuid" and UUID=correct-uuid are valid syntax for specifying UUID values. For example:

    ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937

    Important

    It is recommended to specify storage devices by using LABEL= or UUID=. Disk device names such as /dev/sda3 are not guaranteed to be consistent across reboots.

    When you use Direct Access Storage Device (DASD) on IBM Z hardware, ensure the dump devices are correctly specified in /etc/dasd.conf before proceeding with kdump.

To write the crash dump directly to a device, edit the /etc/kdump.conf configuration file:
    Remove the hash sign (#) from the beginning of the #raw /dev/vg/lv_kdump line.

    Replace the value with the intended device name. For example:

    raw /dev/sdb1

To store the crash dump to a remote machine by using the NFS protocol:
    Remove the hash sign (#) from the beginning of the #nfs my.server.com:/export/tmp line.

    Replace the value with a valid hostname and directory path. For example:

    nfs penguin.example.com:/export/cores

    Restart the kdump service for the changes to take effect:

    sudo systemctl restart kdump.service

    Note

    While using the NFS directive to specify the NFS target, kdump.service automatically attempts to mount the NFS target to check the disk space. There is no need to mount the NFS target in advance. To prevent kdump.service from mounting the target, use the dracut_args --mount directive in kdump.conf. This will enable kdump.service to call the dracut utility with the --mount argument to specify the NFS target.

To store the crash dump to a remote machine by using the SSH protocol:
    Remove the hash sign (#) from the beginning of the #ssh user@my.server.com line.
    Replace the value with a valid username and hostname.

    Include your SSH key in the configuration.
        Remove the hash sign from the beginning of the #sshkey /root/.ssh/kdump_id_rsa line.

        Change the value to the location of a key valid on the server you are trying to dump to. For example:

        ssh john@penguin.example.com
        sshkey /root/.ssh/mykey

Additional resources

Files produced by kdump after system crash.

14.4. Configuring the kdump core collector

The kdump service uses a core_collector program to capture the crash dump image. In RHEL, the makedumpfile utility is the default core collector. It helps shrink the dump file by:

Compressing the size of a crash dump file and copying only necessary pages by using various dump levels.
Excluding unnecessary crash dump pages.
Filtering the page types to be included in the crash dump. 

Note

Crash dump file compression is enabled by default in RHEL 7 and later.

If you need to customise the crash dump file compression, follow this procedure.

Syntax

core_collector makedumpfile -l --message-level 1 -d 31

Options

-c, -l or -p: specify the compressed dump file format, applied page by page, using zlib for the -c option, lzo for the -l option, or snappy for the -p option.
-d (dump_level): excludes pages so that they are not copied to the dump file.
--message-level: specifies the message types. You can restrict the printed output by specifying message_level with this option. For example, specifying 7 as message_level prints common messages and error messages. The maximum value of message_level is 31.
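The dump level passed to -d is a bit mask. As a minimal sketch, assuming the bit assignments listed in the makedumpfile(8) man page (1 for zero pages, 2 for non-private cache pages, 4 for private cache pages, 8 for user data, 16 for free pages), the following function lists which page types a given level excludes. The function name describe_dump_level is hypothetical, and the labels reuse the TYPE column of the memory usage report.

```shell
#!/bin/sh
# Hypothetical helper: list the page types excluded by a dump level.
# $1 = dump level (0 to 31)
describe_dump_level() {
    level=$1
    for entry in 1:ZERO 2:CACHE 4:CACHE_PRIVATE 8:USER 16:FREE; do
        bit=${entry%%:*}
        # Print the page type if its bit is set in the level.
        if [ "$(( level & bit ))" -ne 0 ]; then
            echo "${entry#*:}"
        fi
    done
}

# Level 31 sets all five bits, so every excludable page type is listed.
describe_dump_level 31
```

A level of 17, for example, would list only ZERO and FREE, which keeps cache and user pages in the dump at the cost of a larger file.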

Prerequisites

You have root permissions on the system.
You have fulfilled the requirements for kdump configurations and targets. For details, see Supported kdump configurations and targets.

Procedure

As the root user, edit the /etc/kdump.conf configuration file and remove the hash sign (#) from the beginning of the #core_collector makedumpfile -l --message-level 1 -d 31 line.
Ensure that the line reads as follows to enable crash dump file compression:

core_collector makedumpfile -l --message-level 1 -d 31

The -l option selects the lzo compressed dump file format. The -d option specifies the dump level as 31. The --message-level option specifies the message level as 1.

Also, consider the following examples with the -c and -p options:

To compress a crash dump file by using -c:

core_collector makedumpfile -c -d 31 --message-level 1

To compress a crash dump file by using -p:

core_collector makedumpfile -p -d 31 --message-level 1

Additional resources

makedumpfile(8) man page on your system
Configuration file for kdump 

14.5. Configuring the kdump default failure responses

By default, when kdump fails to create a crash dump file at the configured target location, the system reboots and the dump is lost in the process. You can change the default failure response and configure kdump to perform a different operation when it fails to save the core dump to the primary target. The additional actions are:

dump_to_rootfs: Saves the core dump to the root file system.
reboot: Reboots the system, losing the core dump in the process.
halt: Stops the system, losing the core dump in the process.
poweroff: Powers the system off, losing the core dump in the process.
shell: Runs a shell session from within the initramfs so that you can record the core dump manually.
final_action: Enables additional operations such as reboot, halt, and poweroff after a successful kdump or when the shell or dump_to_rootfs failure action completes. The default is reboot.
failure_action: Specifies the action to perform when a dump might fail in a kernel crash. The default is reboot.

Prerequisites

You have root permissions on the system.
You have fulfilled the requirements for kdump configurations and targets. For details, see Supported kdump configurations and targets.

Procedure

As a root user, remove the hash sign (#) from the beginning of the #failure_action line in the /etc/kdump.conf configuration file.

Replace the value with a required action.

failure_action poweroff
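Before restarting the service, it can be useful to check that the configured value is one of the actions listed above. The following sketch reads a kdump.conf style file, takes the last uncommented failure_action line, falls back to the documented default of reboot, and rejects any other value. The function name check_failure_action and the temporary sample file are assumptions for illustration and are not part of kdump.

```shell
#!/bin/sh
# Hypothetical helper: print the effective failure_action of a config file.
# $1 = path to a kdump.conf style file
check_failure_action() {
    # Commented lines start with "#failure_action" and therefore do not match.
    action=$(awk '$1 == "failure_action" { a = $2 } END { print a }' "$1")
    action=${action:-reboot}    # documented default when the line is absent
    case "$action" in
        dump_to_rootfs|reboot|halt|poweroff|shell)
            echo "$action" ;;
        *)
            echo "invalid failure_action: $action" >&2
            return 1 ;;
    esac
}

# Demonstration on a temporary sample file rather than /etc/kdump.conf.
conf=$(mktemp)
printf 'path /var/crash\nfailure_action poweroff\n' > "$conf"
check_failure_action "$conf"
rm -f "$conf"
```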

Additional resources

Configuring the kdump target

14.6. Configuration file for kdump

The configuration file for the kdump kernel is /etc/sysconfig/kdump. This file controls the kdump kernel command line parameters. For most configurations, use the default options. However, in some scenarios you might need to modify certain parameters to control the kdump kernel behaviour. For example, you can modify the KDUMP_COMMANDLINE_APPEND option to append arguments to the kdump kernel command line to obtain detailed debugging output, or the KDUMP_COMMANDLINE_REMOVE option to remove arguments from the kdump command line.

KDUMP_COMMANDLINE_REMOVE

This option removes arguments from the current kdump command line. It removes parameters that can cause kdump errors or kdump kernel boot failures. These parameters might have been parsed from the previous KDUMP_COMMANDLINE process or inherited from the /proc/cmdline file.

When this variable is not configured, it inherits all values from the /proc/cmdline file. Configuring this option also provides information that is helpful in debugging an issue.

To remove certain arguments, add them to KDUMP_COMMANDLINE_REMOVE as follows: 

KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb"

KDUMP_COMMANDLINE_APPEND

This option appends arguments to the current command line. These arguments might have been parsed by the previous KDUMP_COMMANDLINE_REMOVE variable.

For the kdump kernel, disabling certain modules such as mce, cgroup, numa, hest_disable can help prevent kernel errors. These modules can consume a significant part of the kernel memory reserved for kdump or cause kdump kernel boot failures.

To disable memory cgroups on the kdump kernel command line, set the option as follows:

KDUMP_COMMANDLINE_APPEND="cgroup_disable=memory"
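The combined effect of the two options can be pictured with a short sketch: start from an inherited command line, drop every argument whose name appears in the remove list, then append the extra arguments. The function build_kdump_cmdline and the sample command line are hypothetical. This is an illustration of the idea only and does not claim to reproduce how the kdump scripts build the command line.

```shell
#!/bin/sh
# Hypothetical helper: apply a remove list and an append string to a command line.
# $1 = inherited command line, $2 = names to remove, $3 = arguments to append
build_kdump_cmdline() (
    set -f                      # no pathname expansion while splitting words
    cmdline=$1
    remove=$2
    append=$3
    out=
    for arg in $cmdline; do
        name=${arg%%=*}         # "hugepages=512" becomes "hugepages"
        keep=yes
        for r in $remove; do
            if [ "$name" = "$r" ]; then keep=no; fi
        done
        if [ "$keep" = yes ]; then out="$out $arg"; fi
    done
    if [ -n "$append" ]; then out="$out $append"; fi
    echo "${out# }"
)

# Sample inherited command line (an assumption for this example).
build_kdump_cmdline "ro quiet hugepages=512 root=/dev/sda1" \
    "hugepages quiet" "cgroup_disable=memory"
```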

Additional resources

The Documentation/admin-guide/kernel-parameters.txt file
The /etc/sysconfig/kdump file 

14.7. Testing the kdump configuration

After configuring kdump, you must manually test a system crash and ensure that the vmcore file is generated in the defined kdump target. The vmcore file is captured from the context of the freshly booted kernel. Therefore, vmcore has critical information for debugging a kernel crash.

Warning

Do not test kdump on active production systems. The commands to test kdump will cause the kernel to crash with loss of data. Depending on your system architecture, ensure that you schedule significant maintenance time because kdump testing might require several reboots with a long boot time.

If the vmcore file is not generated during the kdump test, identify and fix issues before you run the test again.

If you make any manual system modifications, you must test the kdump configuration at the end of any system modification. For example, test the kdump configuration after any of the following changes:

Package upgrades.
Hardware level changes, for example, storage or networking changes.
Firmware upgrades.
New installation and application upgrades that include third party modules.
Adding more memory through hot-plugging on hardware that supports this mechanism.
Changes in the /etc/kdump.conf or /etc/sysconfig/kdump file.

Prerequisites

You have root permissions on the system.
You have saved all important data. The commands to test kdump cause the kernel to crash with loss of data.
You have scheduled significant machine maintenance time depending on the system architecture. 

Procedure

Enable the kdump service:

# kdumpctl restart

Check the status of the kdump service with kdumpctl:

# kdumpctl status
  kdump:Kdump is operational

Optionally, if you use the systemctl command, the output prints in the systemd journal.

Start a kernel crash to test the kdump configuration. The sysrq-trigger key combination causes the kernel to crash and might reboot the system if required.

# echo c > /proc/sysrq-trigger

On a kernel reboot, the address-YYYY-MM-DD-HH:MM:SS/vmcore file is created at the location you have specified in the /etc/kdump.conf file. The default is /var/crash/.
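After the system comes back up, a quick check for the dump can be sketched as follows. The function has_vmcore is a hypothetical helper that looks one directory level below the given location, matching the address-YYYY-MM-DD-HH:MM:SS/vmcore layout described above. The sketch assumes the default /var/crash location and a local target.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any <dir>/<subdirectory>/vmcore file exists.
# $1 = directory that holds the per-crash subdirectories
has_vmcore() {
    for f in "$1"/*/vmcore; do
        # If nothing matches, the pattern stays literal and -f is false.
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}

if has_vmcore /var/crash; then
    echo "vmcore found under /var/crash"
else
    echo "no vmcore found under /var/crash"
fi
```

If you configured a different path or dump target in /etc/kdump.conf, pass that directory instead.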

Additional resources

Configuring the kdump target

14.8. Files produced by kdump after system crash

After your system crashes, the kdump service captures the kernel memory in a dump file (vmcore) and it also generates additional diagnostic files to aid in troubleshooting and postmortem analysis.

Files produced by kdump:

vmcore - main kernel memory dump file containing system memory at the time of the crash. It includes data as per the configuration of the core_collector program specified in the kdump configuration. By default, it contains the kernel data structures, process information, stack traces, and other diagnostic information.
vmcore-dmesg.txt - contents of the kernel ring buffer log (dmesg) from the primary kernel that panicked.
kexec-dmesg.log - has kernel and system log messages from the execution of the secondary kexec kernel that collects the vmcore data. 

Additional resources

What is the kernel ring buffer

14.9. Enabling and disabling the kdump service

You can enable or disable the kdump functionality on a specific kernel or on all installed kernels. You must routinely test the kdump functionality and validate that it operates correctly.

Prerequisites

You have root permissions on the system.
You have completed kdump requirements for configurations and targets. See Supported kdump configurations and targets.
All configurations for installing kdump are set up as required. 

Procedure

Enable the kdump service for multi-user.target:

# systemctl enable kdump.service

Start the service in the current session:

# systemctl start kdump.service

Stop the kdump service:

# systemctl stop kdump.service

Disable the kdump service:

# systemctl disable kdump.service

Warning

It is recommended to set kptr_restrict=1 as the default. When kptr_restrict is set to 1, the kdumpctl service loads the crash kernel regardless of whether Kernel Address Space Layout Randomization (KASLR) is enabled.

If kptr_restrict is not set to 1 and KASLR is enabled, the contents of the /proc/kcore file are generated as all zeros. The kdumpctl service fails to access the /proc/kcore file and load the crash kernel. The kexec-kdump-howto.txt file displays a warning message, which recommends that you set kptr_restrict=1. To ensure that the kdumpctl service loads the crash kernel, verify the following setting in the sysctl.conf file:

kernel.kptr_restrict=1

14.10. Preventing kernel drivers from loading for kdump

You can prevent the capture kernel from loading certain kernel drivers by adding the KDUMP_COMMANDLINE_APPEND= variable in the /etc/sysconfig/kdump configuration file. By using this method, you can prevent the kdump initial RAM disk image initramfs from loading the specified kernel module. This helps to prevent out-of-memory (OOM) killer errors or other crash kernel failures.

You can set the KDUMP_COMMANDLINE_APPEND= variable by using one of the following configuration options:

rd.driver.blacklist=<modules>
modprobe.blacklist=<modules> 

Prerequisites

You have root permissions on the system.

Procedure

Display the list of modules that are loaded to the currently running kernel. Select the kernel module that you intend to block from loading:

$ lsmod

Module                  Size  Used by
fuse                  126976  3
xt_CHECKSUM            16384  1
ipt_MASQUERADE         16384  1
uinput                 20480  1
xt_conntrack           16384  1

Update the KDUMP_COMMANDLINE_APPEND= variable in the /etc/sysconfig/kdump file. For example:

KDUMP_COMMANDLINE_APPEND="rd.driver.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid-hyperv"

Also, consider the following example by using the modprobe.blacklist=<modules> configuration option:

KDUMP_COMMANDLINE_APPEND="modprobe.blacklist=emcp modprobe.blacklist=bnx2fc modprobe.blacklist=libfcoe modprobe.blacklist=fcoe"

Restart the kdump service:

# systemctl restart kdump
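When the list of modules is long, the comma-separated rd.driver.blacklist= argument can be assembled with a small helper. The function name rd_blacklist_arg is hypothetical, and the sketch only prints the line to place in /etc/sysconfig/kdump without editing the file.

```shell
#!/bin/sh
# Hypothetical helper: join module names into one rd.driver.blacklist= argument.
# Runs in a subshell so that the IFS change stays local.
rd_blacklist_arg() (
    IFS=,
    # "$*" joins all arguments with the first character of IFS.
    echo "rd.driver.blacklist=$*"
)

# Print the variable assignment for three of the modules from the example above.
echo "KDUMP_COMMANDLINE_APPEND=\"$(rd_blacklist_arg hv_vmbus hv_storvsc hv_utils)\""
```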

Additional resources

dracut.cmdline man page on your system.

14.11. Running kdump on systems with encrypted disk

When you run a LUKS encrypted partition, the system requires a certain amount of available memory. If the system has less than the required amount of available memory, the cryptsetup utility fails to mount the partition. As a result, capturing the vmcore file to an encrypted target location fails in the second kernel (capture kernel).

The kdumpctl estimate command helps you estimate the amount of memory you need for kdump. kdumpctl estimate prints the recommended crashkernel value, which is the most suitable memory size required for kdump.

The recommended crashkernel value is calculated based on the current kernel size, kernel module, initramfs, and the LUKS encrypted target memory requirement.

If you are using the custom crashkernel= option, kdumpctl estimate prints the LUKS required size value. The value is the memory size required for LUKS encrypted target.

Procedure

Print the estimated crashkernel= value:

# kdumpctl estimate

Encrypted kdump target requires extra memory, assuming using the keyslot with minimum memory requirement
   Reserved crashkernel:    256M
   Recommended crashkernel: 652M

   Kernel image size:   47M
   Kernel modules size: 8M
   Initramfs size:      20M
   Runtime reservation: 64M
   LUKS required size:  512M
   Large modules: <none>
   WARNING: Current crashkernel size is lower than recommended size 652M.

Configure the amount of required memory by increasing the crashkernel= value.
Reboot the system. 
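To see how far the current reservation is from the recommendation, the two values can be pulled out of the report with awk. This is a minimal sketch that assumes the output layout shown above, with both sizes given in megabytes and an M suffix. The function name crashkernel_shortfall is hypothetical, and the sample text is copied from the example output rather than produced by running kdumpctl.

```shell
#!/bin/sh
# Hypothetical helper: read a kdumpctl estimate report on standard input and
# print how many MB the reserved size is below the recommended size (0 if none).
crashkernel_shortfall() {
    awk '
        /Reserved crashkernel:/    { sub(/M$/, "", $3); reserved = $3 }
        /Recommended crashkernel:/ { sub(/M$/, "", $3); recommended = $3 }
        END { d = recommended - reserved; print (d > 0 ? d : 0) }
    '
}

# Sample lines taken from the example output above.
printf '   Reserved crashkernel:    256M\n   Recommended crashkernel: 652M\n' |
    crashkernel_shortfall
```

On a live system, the report could be piped in directly, for example with kdumpctl estimate | crashkernel_shortfall, provided that the output format matches the assumption.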

Note

If the kdump service still fails to save the dump file to the encrypted target, increase the crashkernel= value as required.