Wednesday June 15, 2005
Diagnosing kernel hangs/panics with kmdb and moddebug
If you experience hangs or panics during Solaris boot, whether it's during installation or after you've already installed, using the kernel debugger can be a big help in collecting the first set of "what happened" information.
The kernel debugger is named "kmdb" in Solaris 10 and later, and is invoked by supplying the '-k' switch in the kernel boot arguments. So a common first request from a kernel engineer starting to examine a problem is "try booting with kmdb".
Sometimes it's useful to either set a breakpoint to pause the kernel startup and examine something, or to just set a kernel variable to enable or disable a feature, or enable debugging output. If you use -k to invoke kmdb, but also supply the '-d' switch, the debugger will be entered before the kernel really starts to do anything of consequence, so that you can set kernel variables or breakpoints.
So "booting with the -kd flags" is the key to "booting under the kernel debugger". Now, how do we do that?
On modern Solaris and OpenSolaris systems, GRUB is used to boot; to enable the kernel debugger, you add -kd arguments to the "kernel" (or "kernel$") line in the GRUB menu entry. When presented with the GRUB menu, hit 'e' to edit the entry, highlight the kernel line, and hit 'e' again to edit it; add the -kd arguments just after the /platform/i86pc/kernel/$ISADIR/unix argument, so that it says
kernel$ /platform/i86pc/kernel/$ISADIR/unix -kd
and then hit 'b' to boot that edited menu entry. '-k' means "start the debugger"; '-d' means "immediately enter the debugger after loading the kernel". After some booting status, you'll see the kernel debugger announce itself like this:
[0]>
(The number in square brackets is the CPU that is running the kernel debugger; that number might change for later entries into the debugger.)
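Edits made at the GRUB menu apply only to that one boot. If you expect to need the debugger repeatedly, one option is a separate entry in menu.lst. The following is only a sketch: the title is made up, and the module$ line assumes the usual boot archive path, so copy it (and any root or findroot lines) from your existing entry rather than trusting this one.

```
# Hypothetical extra menu.lst entry. The title is invented, and the module$
# line is an assumption; use the one from your existing entry.
title Solaris (kmdb)
kernel$ /platform/i86pc/kernel/$ISADIR/unix -kd
module$ /platform/i86pc/$ISADIR/boot_archive
```

Keeping the normal entry alongside it means you can still boot without the debugger by picking the original line.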
To turn on module-load messages, set the moddebug kernel variable and then continue booting:
[0]> moddebug/W 80000000
[0]> :c
That will give you debug output for each kernel module that loads. (See /usr/include/sys/modctl.h, near the bottom, for moddebug flag information. I find 0x80000000 is the only one I ever really use.)
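Breakpoints, mentioned earlier, work from the same prompt. As a rough sketch, the session below sets a breakpoint, lists it, and continues; the target ufs`ufs_mount is only an illustrative choice, so substitute whatever function you actually want to stop in.

```
// ufs`ufs_mount is an arbitrary example target, not a recommendation
[0]> ::bp ufs`ufs_mount
[0]> ::events
[0]> :c
```

When the breakpoint is hit, the debugger prompt comes back and you can inspect state before typing :c again to carry on.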
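If the system panics while running under kmdb, it drops into the debugger, where you can get a stack backtrace with
[0]> $c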
A few other very useful information commands during a panic are
::msgbuf
which will show you the last things the kernel printed onscreen, and
::status
which shows a summary of the state of the machine in panic.
To switch the debugger to another CPU (here, from CPU 1 to CPU 0), use
[1]> 0::switch
There's obviously a lot more you can do with the kernel debugger, but these small tips will sometimes help get you from "I have no idea what to do" to "I have a few ideas to try that might let me continue to boot or install", which can make all the difference.
( Jun 15 2005, 04:26:17 PM PDT )