Oops! Debugging Kernel Panics | Linux Journal

A look into what causes kernel panics and some utilities to help gain
more information.

Working in a Linux environment, how often have you seen a kernel panic?
When it happens, your system is left in a crippled state until
you reboot it completely. And, even after you get your system back into a
functional state, you’re still left with the question: why? You may have no
idea what happened or why it happened. Those questions can be answered, though,
and the following guide will help you root out the cause of some of the conditions
that led to the original crash.


Figure 1. A Typical Kernel Panic

Let’s start by looking at a set of utilities known as
kexec and kdump. kexec allows you to boot into
another kernel from an existing (and running) kernel, and
kdump is a
kexec-based crash-dumping mechanism for Linux.

Installing the Required Packages

First and foremost, your kernel should have the following components statically built in to its image:


You can find these settings in /boot/config-`uname -r`.
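
As a rough sketch of that check, the following helper reports whether a handful of options are set to "y" in a given config file. The option names are an assumption taken from the options the kernel's kdump documentation commonly calls for, not a list confirmed by this article, and the function name check_kdump_config is made up for illustration.

```shell
# check_kdump_config FILE: report each option as built in (=y) or missing.
# The option list is an assumption; adjust it to match your own requirements.
check_kdump_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE \
               CONFIG_RELOCATABLE CONFIG_DEBUG_INFO CONFIG_MAGIC_SYSRQ; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    # Non-zero exit status when at least one option is not built in.
    return "$missing"
}
```

You would call it as check_kdump_config "/boot/config-$(uname -r)" and look for any line that starts with MISSING.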

Make sure that your operating system is up to date with the latest-and-greatest package versions:

$ sudo apt update && sudo apt upgrade

Install the following packages
(I am currently using Debian, but the
same should and will apply to Ubuntu):

$ sudo apt install gcc make binutils linux-headers-`uname -r`
 ↪kdump-tools crash linux-image-`uname -r`-dbg

Note: Package names may vary
across distributions.

During the installation, you will be prompted with questions to enable
kexec to handle reboots (answer whatever you’d like, but I answered
“no”; see Figure 2).


Figure 2. kexec Configuration Menu

And to enable kdump to run and load at system boot, answer
“yes” (Figure 3).


Figure 3. kdump Configuration Menu

Configuring kdump

Open the /etc/default/kdump-tools file, and at the very top,
you should see the following:


Eventually, you’ll write a custom module that will trigger an OOPS kernel
condition, and in order to have kdump gather and save the state of the
system for postmortem analysis, you’ll need to enable your kernel to
panic on this OOPS condition. In order to do that, uncomment the line
that starts with KDUMP_SYSCTL so that it sets kernel.panic_on_oops=1.


The initial testing will require that SysRq be enabled. There
are several ways to do that, but here I provide instructions
to enable support for this feature on system reboot. Open the
/etc/sysctl.d/99-sysctl.conf file, and make sure that the
kernel.sysrq=1 line (closer to the bottom of the file) is uncommented.
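
If you’d rather confirm the edit from a script, here is a minimal sketch that checks a sysctl file for an active (uncommented) kernel.sysrq=1 line. The helper name sysrq_configured is hypothetical, and the check only looks at the one file you pass in, not at every file sysctl reads at boot.

```shell
# sysrq_configured FILE: succeed only if FILE contains an uncommented
# "kernel.sysrq = 1" setting (spaces around "=" are optional).
sysrq_configured() {
    grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}
```

For example, sysrq_configured /etc/sysctl.d/99-sysctl.conf && echo "SysRq will be enabled at boot".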


Now, open this file: /etc/default/grub.d/kdump-tools.default. You
will find a single line that looks like this:


Modify the section that reads crashkernel=384M-:128M to crashkernel=128M.
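
The 384M-:128M value uses the kernel’s range syntax: each comma-separated entry is start-[end]:size, and the first range that contains the amount of system RAM decides how much memory is reserved. The sketch below walks through that selection under simplifying assumptions: it understands only M and G suffixes, ignores any @offset part, and the function names to_mb and resolve_crashkernel are invented for this example.

```shell
# to_mb SIZE: convert a size such as 384M or 2G to whole megabytes.
to_mb() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# resolve_crashkernel SPEC RAM_MB: print the reservation (for example 128M)
# that a range-style crashkernel= value selects for RAM_MB of system memory.
# Prints nothing when no range matches.
resolve_crashkernel() {
    spec=$1
    ram_mb=$2
    old_ifs=$IFS
    IFS=,
    for entry in $spec; do
        range=${entry%%:*}      # for example 384M- or 512M-2G
        size=${entry#*:}        # for example 128M
        start=${range%%-*}
        end=${range#*-}         # empty means "no upper bound"
        start_mb=$(to_mb "$start") || continue
        if [ "$ram_mb" -lt "$start_mb" ]; then continue; fi
        if [ -n "$end" ]; then
            end_mb=$(to_mb "$end") || continue
            if [ "$ram_mb" -ge "$end_mb" ]; then continue; fi
        fi
        echo "$size"
        break
    done
    IFS=$old_ifs
}
```

With 4095MB of RAM, resolve_crashkernel "384M-:128M" 4095 selects 128M, which is the same amount the fixed crashkernel=128M setting reserves unconditionally.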

Now, update your GRUB boot configuration file:

$ sudo update-grub
[sudo] password for petros:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.9.0-8-amd64
Found initrd image: /boot/initrd.img-4.9.0-8-amd64

And, reboot the system.

Verifying Your kdump Environment

After coming back from the reboot, dmesg will log the crashkernel reservation:

$ sudo dmesg |grep -i crash
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64
 ↪root=UUID=bd76b0fe-9d09-40a9-a0d8-a7533620f6fa ro quiet
[    0.000000] Reserving 128MB of memory at 720MB for crashkernel
 ↪(System RAM: 4095MB)
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/
 ↪root=UUID=bd76b0fe-9d09-40a9-a0d8-a7533620f6fa ro
 ↪quiet crashkernel=128M

Your kernel also should have the following features enabled (a “1”
means enabled):

$ sudo sysctl -a|grep kernel|grep -e panic_on_oops -e sysrq
kernel.panic_on_oops = 1
kernel.sysrq = 1
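
The crashkernel value also can be read back from the running kernel’s command line in /proc/cmdline. A minimal sketch, with the helper name crashkernel_arg chosen just for this example:

```shell
# crashkernel_arg CMDLINE: print the value of the crashkernel= parameter
# found in a kernel command-line string, or nothing if it is absent.
crashkernel_arg() {
    for word in $1; do
        case $word in
            crashkernel=*) echo "${word#crashkernel=}" ;;
        esac
    done
}
```

Running crashkernel_arg "$(cat /proc/cmdline)" on the system configured above should print the value you set in the GRUB file.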

Your kdump service should be running:

$ sudo systemctl status kdump-tools.service
 kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service;
    ↪enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2019-02-26 08:13:34 CST;
    ↪1h 33min ago
  Process: 371 ExecStart=/etc/init.d/kdump-tools start
   ↪(code=exited, status=0/SUCCESS)
 Main PID: 371 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   CGroup: /system.slice/kdump-tools.service

Feb 26 08:13:34 deb-panic systemd[1]: Starting Kernel crash
 ↪dump capture service...
Feb 26 08:13:34 deb-panic kdump-tools[371]: Starting
 ↪kdump-tools: loaded kdump kernel.
Feb 26 08:13:34 deb-panic kdump-tools[505]: /sbin/kexec -p
 ↪--command-line="BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64 root=
Feb 26 08:13:34 deb-panic kdump-tools[506]: loaded kdump kernel
Feb 26 08:13:34 deb-panic systemd[1]: Started Kernel crash dump
 ↪capture service.

Your crash kernel should be loaded (into memory and in the 128M region
you defined earlier):

$ cat /sys/kernel/kexec_crash_loaded
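
That file holds 1 when a crash kernel is loaded and 0 when it is not. If you want to fold the check into a script, something along these lines would do; the function name crash_kernel_state and its return codes are my own choices for this sketch.

```shell
# crash_kernel_state [FILE]: describe whether a crash kernel is loaded.
# FILE defaults to /sys/kernel/kexec_crash_loaded; the optional argument
# exists only so the function can be tried against an ordinary file.
crash_kernel_state() {
    state_file=${1:-/sys/kernel/kexec_crash_loaded}
    if [ ! -r "$state_file" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$(cat "$state_file")" = "1" ]; then
        echo "loaded"
    else
        echo "not loaded"
        return 1
    fi
}
```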

You can verify your kdump configuration further here:

$ sudo kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2d000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet irqpoll nr_cpus=1 nousb
 ↪--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Let’s also test it without actually running it:

$ sudo kdump-config test
USE_KDUMP:         1
KDUMP_SYSCTL:      kernel.panic_on_oops=1
KDUMP_COREDIR:     /var/crash
crashkernel addr:  0x2d000000
kdump kernel addr:
kdump kernel:
   /var/lib/kdump/vmlinuz: symbolic link to /boot/
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to
kexec command to be used:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet irqpoll nr_cpus=1 nousb
 ↪--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

The Moment of Truth

Now that your environment is loaded to make use of kdump, you
probably should test it, and the best way to test it is by forcing a
kernel crash over SysRq. Assuming your kernel is built with SysRq support,
crashing a running kernel is as simple as typing:

$ echo "c" | sudo tee -a /proc/sysrq-trigger

What should you expect? You will see a kernel panic/crash similar to the
one shown in Figure 1. Following this crash, the kernel loaded over kexec will
collect the state of the system, which includes everything relevant in
memory, on the CPU, in dmesg, in loaded modules and more. It then
will save this valuable crash data somewhere in /var/crash for
further analysis. Once the collection of information completes, the system
will reboot automatically and will bring you back to a functional state.

What Now?

You now have your crash file, and again, it’s located in /var/crash:

$ cd /var/crash/
$ ls
201902261006  kexec_cmd
$ cd 201902261006/
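
Each crash lands in its own timestamped directory, so after a few crashes it helps to jump straight to the newest one. Here is a small sketch that relies on the YYYYMMDDHHMM names sorting in chronological order; the function name latest_crash_dir is hypothetical.

```shell
# latest_crash_dir [ROOT]: print the most recent timestamped crash
# directory under ROOT (default /var/crash). Fails if there is none.
latest_crash_dir() {
    crash_root=${1:-/var/crash}
    latest=
    # Glob results are sorted, so the last match is the newest timestamp.
    for dir in "$crash_root"/[0-9]*/; do
        [ -d "$dir" ] || continue
        latest=${dir%/}
    done
    [ -n "$latest" ] && echo "$latest"
}
```

For example: cd "$(latest_crash_dir)".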

Although before opening the crash file, you probably should install the
kernel’s source package:

$ sudo apt source linux-image-`uname -r`

Earlier, you installed a debug version of your Linux kernel containing
the unstripped debug symbols required for this type of debugging
analysis. Now you need that kernel. Open the kernel crash file with the
crash utility:

$ sudo crash dump.201902261006 /usr/lib/debug/

Once everything loads, a summary of the panic will appear on the screen:

      KERNEL: /usr/lib/debug/vmlinux-4.9.0-8-amd64
    DUMPFILE: dump.201902261006  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Feb 26 10:07:21 2019
      UPTIME: 00:04:09
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 100
    NODENAME: deb-panic
     RELEASE: 4.9.0-8-amd64
     VERSION: #1 SMP Debian 4.9.144-3 (2019-02-02)
     MACHINE: x86_64  (2592 Mhz)
      MEMORY: 4 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 563
     COMMAND: "tee"
        TASK: ffff88e69628c080 [THREAD_INFO: ffff88e69628c080]
         CPU: 2

Notice the reason for the panic: sysrq: SysRq : Trigger
a crash. Also, notice the command that led to it:
tee. None of this should be a surprise since you
triggered it.

If you run a backtrace of the kernel functions that led to the
panic, you should see the following (processed by CPU core no. 2):

crash> bt
PID: 563    TASK: ffff88e69628c080  CPU: 2   COMMAND: "tee"
 #0 [ffffa67440b23ba0] machine_kexec at ffffffffa0c53f68
 #1 [ffffa67440b23bf8] __crash_kexec at ffffffffa0d086d1
 #2 [ffffa67440b23cb8] crash_kexec at ffffffffa0d08738
 #3 [ffffa67440b23cd0] oops_end at ffffffffa0c298b3
 #4 [ffffa67440b23cf0] no_context at ffffffffa0c619b1
 #5 [ffffa67440b23d50] __do_page_fault at ffffffffa0c62476
 #6 [ffffa67440b23dc0] page_fault at ffffffffa121a618
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffffa102be62  RSP: ffffa67440b23e78  RFLAGS: 00010282
    RAX: ffffffffa102be50  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88e69fd10648  RDI: 0000000000000063
    RBP: ffffffffa18bf320   R8: 0000000000000001   R9: 0000000000007eb8
    R10: 0000000000000001  R11: 0000000000000001  R12: 0000000000000004
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa67440b23e78] __handle_sysrq at ffffffffa102c597
 #8 [ffffa67440b23ea0] write_sysrq_trigger at ffffffffa102c9db
 #9 [ffffa67440b23eb0] proc_reg_write at ffffffffa0e7ac00
#10 [ffffa67440b23ec8] vfs_write at ffffffffa0e0b3b0
#11 [ffffa67440b23ef8] sys_write at ffffffffa0e0c7f2
#12 [ffffa67440b23f38] do_syscall_64 at ffffffffa0c03b7d
#13 [ffffa67440b23f50] entry_SYSCALL_64_after_swapgs at ffffffffa121924e
    RIP: 00007f3952463970  RSP: 00007ffc7f3a4e58  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f3952463970
    RDX: 0000000000000002  RSI: 00007ffc7f3a4f60  RDI: 0000000000000003
    RBP: 00007ffc7f3a4f60   R8: 00005648f508b610   R9: 00007f3952944480
    R10: 0000000000000839  R11: 0000000000000246  R12: 0000000000000002
    R13: 0000000000000001  R14: 00005648f508b530  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

In your backtrace, you should notice the symbol address of what is stored in
the instruction pointer (RIP) at the time of the exception: ffffffffa102be62. Let’s take a look at this symbol address:

crash> sym ffffffffa102be62
ffffffffa102be62 (t) sysrq_handle_crash+18 ./debian/build/
↪build_amd64_none_amd64/./drivers/tty/sysrq.c: 144

Wait a minute! The exception seems to have been triggered in line 144
of the drivers/tty/sysrq.c file and inside the
sysrq_handle_crash function. Hmm...I wonder what’s happening
in this kernel source file. (This is why I had you install your kernel source
package moments ago.) Let’s navigate to the /usr/src
directory and untar the source package:

$ cd /usr/src
$ ls
linux_4.9.144-3.debian.tar.xz  linux_4.9.144.orig.tar.xz
linux_4.9.144-3.dsc            linux-headers-4.9.0-8-amd64
$ sudo tar xJf linux_4.9.144.orig.tar.xz
$ vim linux-4.9.144/drivers/tty/sysrq.c

Locate the sysrq_handle_crash function:

static void sysrq_handle_crash(int key)

And more specifically, look at line 144:

*killer = 1;

It was this line that led to the page fault logged in line #6 of the backtrace:

#6 [ffffa67440b23dc0] page_fault at ffffffffa121a618

Okay. So, now you should have a basic understanding of how to debug bad
kernel code,
but what happens if you want to debug your very own custom kernel modules
(for example, drivers)? I wrote a simple Linux kernel module that essentially
invokes a similar style of kernel crash when loaded. Call it
test-module.c and save it somewhere in your home directory:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/version.h>

static int test_module_init(void)
{
	int *p = 1;
	printk("%d\n", *p);
	return 0;
}

static void test_module_exit(void)
{
	/* nothing to clean up */
}

module_init(test_module_init);
module_exit(test_module_exit);

You’ll need a Makefile to compile this kernel module (save it in the
same directory):

obj-m += test-module.o

all:
	$(MAKE) -C/lib/modules/$(shell uname -r)/build M=$(PWD)

Run the make command to compile the module and do
not delete any of the compilation artifacts; you’ll need
those later:

$ make
make -C/lib/modules/4.9.0-8-amd64/build M=/home/petros
make[1]: Entering directory '/usr/src/
  CC [M]  /home/petros/test-module.o
/home/petros/test-module.c: In function "test_module_init":
/home/petros/test-module.c:7:11: warning: initialization makes
 ↪pointer from integer without a cast [-Wint-conversion]
  int *p = 1;
  Building modules, stage 2.
  MODPOST 1 modules
  LD [M]  /home/petros/test-module.ko
make[1]: Leaving directory '/usr/src/

Note: you may see a compilation warning. Ignore it
for now; the bad pointer it flags is what will trigger your kernel crash.

Be careful now. Once you load the .ko file, the system will
crash, so make sure that everything is saved and synchronized to disk:

$ sync && sudo insmod test-module.ko

Similar to before, the system will crash, the kexec
kernel/environment will help gather everything and save it somewhere in
/var/crash, followed by an automatic reboot. After you have
rebooted and are back in a functional state, locate the new crash
directory and change into it:

$ cd /var/crash/201902261035/

Also, copy the unstripped kernel object file for your test-module from
your home directory and into the current working directory:

$ sudo cp ~/test-module.o /var/crash/201902261035/

Load the crash file with your debug kernel:

$ sudo crash dump.201902261035 /usr/lib/debug/

Your summary should look something like this:

      KERNEL: /usr/lib/debug/vmlinux-4.9.0-8-amd64
    DUMPFILE: dump.201902261035  [PARTIAL DUMP]
        CPUS: 4
        DATE: Tue Feb 26 10:37:47 2019
      UPTIME: 00:11:16
LOAD AVERAGE: 0.24, 0.06, 0.02
       TASKS: 102
    NODENAME: deb-panic
     RELEASE: 4.9.0-8-amd64
     VERSION: #1 SMP Debian 4.9.144-3 (2019-02-02)
     MACHINE: x86_64  (2592 Mhz)
      MEMORY: 4 GB
       PANIC: "BUG: unable to handle kernel NULL pointer
 ↪dereference at 0000000000000001"
         PID: 1493
     COMMAND: "insmod"
        TASK: ffff893c5a5a5080 [THREAD_INFO: ffff893c5a5a5080]
         CPU: 3

The reason for the kernel crash is summarized as follows:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000001. The userspace command that led to the panic
was your insmod.

A backtrace will reveal a page fault exception at address ffffffffc05ed005:

crash> bt
PID: 1493   TASK: ffff893c5a5a5080  CPU: 3  COMMAND: "insmod"
 #0 [ffff9dcd013b79f0] machine_kexec at ffffffffa3a53f68
 #1 [ffff9dcd013b7a48] __crash_kexec at ffffffffa3b086d1
 #2 [ffff9dcd013b7b08] crash_kexec at ffffffffa3b08738
 #3 [ffff9dcd013b7b20] oops_end at ffffffffa3a298b3
 #4 [ffff9dcd013b7b40] no_context at ffffffffa3a619b1
 #5 [ffff9dcd013b7ba0] __do_page_fault at ffffffffa3a62476
 #6 [ffff9dcd013b7c10] page_fault at ffffffffa401a618
    [exception RIP: init_module+5]
    RIP: ffffffffc05ed005  RSP: ffff9dcd013b7cc8  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000080000000  RSI: ffff893c5a5a5ac0  RDI: ffffffffc05ed000
    RBP: ffffffffc05ed000   R8: 0000000000020098   R9: 0000000000000006
    R10: 0000000000000000  R11: ffff893c5a4d8100  R12: ffff893c5880d460
    R13: ffff893c56500e80  R14: ffffffffc05ef000  R15: ffffffffc05ef050
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9dcd013b7cc8] do_one_initcall at ffffffffa3a0218e
 #8 [ffff9dcd013b7d38] do_init_module at ffffffffa3b81531
 #9 [ffff9dcd013b7d58] load_module at ffffffffa3b04aaa
#10 [ffff9dcd013b7e90] SYSC_finit_module at ffffffffa3b051f6
#11 [ffff9dcd013b7f38] do_syscall_64 at ffffffffa3a03b7d
#12 [ffff9dcd013b7f50] entry_SYSCALL_64_after_swapgs at ffffffffa401924e
    RIP: 00007f124662c469  RSP: 00007fffc4ca04a8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000564213d111f0  RCX: 00007f124662c469
    RDX: 0000000000000000  RSI: 00005642129d3638  RDI: 0000000000000003
    RBP: 00005642129d3638   R8: 0000000000000000   R9: 00007f12468e3ea0
    R10: 0000000000000003  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000564213d10130  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000139  CS: 0033  SS: 002b

Let’s try to look at the symbol at the address ffffffffc05ed005:

crash> sym ffffffffc05ed005
ffffffffc05ed005 (t) init_module+5 [test-module]

Hmm. The issue occurred somewhere in the module initialization code of
the test-module kernel driver. But what happened to all of
the details shown in the earlier analysis? Well, because this code is
not part of the debug kernel image, you’ll need to find a way to load
it into your crash analysis. This is why I instructed you to copy over the
unstripped object file into your current working directory. Now it’s time to load
the module’s object file:

crash> mod -s test_module ./test-module.o
     MODULE       NAME                   SIZE  OBJECT FILE
ffffffffc05ef000  test_module           16384  ./test-module.o

Now you can go back and look at the same symbol address:

crash> sym ffffffffc05ed005
ffffffffc05ed005 (T) init_module+5 [test-module]
 ↪/home/petros/test-module.c: 8

And, now it’s time to revisit your code and look at line 8:

$ sed -n 8p ~/test-module.c
	printk("%d\n", *p);

There you have it. The page fault occurred when you tried to
print the poorly defined pointer. Remember the compilation warning from
earlier? Well, it was warning you for a reason, and in this case,
it’s the reason that triggered the kernel panic. You may not be as
fortunate in future coding cases.

What Else Can You Do Here?

The kernel crash file will preserve many artifacts from your system at the
instant of your crash. You can list a short summary of available commands with the
help command:

crash> help

*            files        mach         repeat       timer
alias        foreach      mod          runq         tree
ascii        fuser        mount        search       union
bt           gdb          net          set          vm
btop         help         p            sig          vtop
dev          ipcs         ps           struct       waitq
dis          irq          pte          swap         whatis
eval         kmem         ptob         sym          wr
exit         list         ptov         sys          q
extend       log          rd           task

For instance, if you want to see a general summary of memory usage:

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   979869       3.7 GB         ----
         FREE   835519       3.2 GB   85% of TOTAL MEM
         USED   144350     563.9 MB   14% of TOTAL MEM
       SHARED     8374      32.7 MB    0% of TOTAL MEM
      BUFFERS     3849        15 MB    0% of TOTAL MEM
       CACHED        0            0    0% of TOTAL MEM
         SLAB     5911      23.1 MB    0% of TOTAL MEM

   TOTAL SWAP  1047807         4 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  1047807         4 GB  100% of TOTAL SWAP

 COMMIT LIMIT  1537741       5.9 GB         ----
    COMMITTED    16370      63.9 MB    1% of TOTAL LIMIT

If you want to see what dmesg logged up to the point of
the failure:

crash> log

[    0.000000] Linux version 4.9.0-8-amd64
 ↪(debian-kernel@lists.debian.org) (gcc version 6.3.0
 ↪20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian
 ↪4.9.144-3 (2019-02-02)
[    0.000000] Command line: BOOT_IMAGE=/boot/
↪vmlinuz-4.9.0-8-amd64 root=UUID=bd76b0fe-9d09-40a9-
↪a0d8-a7533620f6fa ro quiet crashkernel=128M
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001:
 ↪'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002:
 ↪'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004:
 ↪'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:

[ .... ]

Using the same crash utility, you can drill even deeper into memory
locations and their contents, what’s being handled by every CPU core
at the time of the crash and so much more. If you want to learn more
about these functions, simply type help followed by the
function name:

crash> help mount

Something similar to a man page will load onto your screen.
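
If you find yourself typing the same handful of commands for every dump, you can keep them in a file and have crash run them for you. The sketch below only writes such a file; the function name write_crash_script and the choice of commands are mine, and it assumes the -i (read commands from a file) and -s (silent startup) options described in the crash man page are available in your version.

```shell
# write_crash_script FILE: save a few crash commands for non-interactive use.
write_crash_script() {
    cat > "$1" <<'EOF'
bt
log
kmem -i
exit
EOF
}
```

You then could pass the file to crash with something like sudo crash -s -i report.cmds followed by the same debug kernel and dump file arguments used earlier, redirecting the output to a text file for later reading.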


So, there you have it: an introduction into kernel crash debugging. This
barely scrapes the surface, but hopefully, it will provide
you with a proper starting point to help diagnose kernel crashes in
production, development and test environments.
