
Tuesday September 25, 2007
Why is dump device configured so late ??
The other day I was playing around with kmdb [setting breakpoints,
single-stepping, clearing breakpoints, etc.], and at one point I
thought, "Let me take a dump," so I issued:
[0]> $<systemdump
nopanicdebug:   0   =   0x1
panic[cpu0]/thread=fffffffffbc25d20: BAD TRAP: type=d (#gp General protection) rp=fffffffffbc46670 addr=f000ff53f000ff00
fffffffffbc46550 unix:die+ea ()
fffffffffbc46660 unix:trap+3de ()
fffffffffbc46670 unix:cmntrap+e9 ()
skipping system dump - no dump device configured
rebooting...
Only the last two lines of output matter for this post (a lot was
intentionally left out).
I couldn't collect the crash dump because there was 'no dump device
configured'. I felt bad that the state I wanted was gone and that I
had to repeat the whole cycle to get back to it...
I know that, by default, the primary swap partition also acts as the
dump device. The swap device is specified during installation and
appears in /etc/vfstab as something like:
/dev/dsk/c1t0d0s1   -   -   swap   -   no   -
That brings me to the question: where in the boot sequence is swap
configured as the dump device?
Looking at the code to see where the above error message comes from,
it turns out to be this call in dumpsys():
uprintf("skipping system dump - no dump device configured\n");
dumpvp not being set is the reason dumpsys() prints this message. To
understand where it gets set, I put a breakpoint in dumpinit() and
then continued.
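To make that control flow concrete, here is a tiny user-space model of the check. It is a sketch under stated assumptions, not the kernel source: `struct vnode` is reduced to a path string, and `dumpinit_model()` and `dumpsys_model()` are hypothetical stand-ins that capture only two facts from the code above, namely that dumpsys() gives up while dumpvp is still NULL and that dumpinit() is what sets it. The real functions do considerably more.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for the kernel vnode. */
struct vnode {
    const char *path;
};

/* Models the kernel global: stays NULL until a dump device is configured. */
struct vnode *dumpvp = NULL;

/* Hypothetical model of dumpinit(): remember vp as the dump device. */
void dumpinit_model(struct vnode *vp)
{
    dumpvp = vp;
}

/*
 * Hypothetical model of the early check in dumpsys().
 * Returns 1 if a dump would be written, 0 if it is skipped.
 */
int dumpsys_model(void)
{
    if (dumpvp == NULL) {
        printf("skipping system dump - no dump device configured\n");
        return 0;
    }
    printf("dumping to %s\n", dumpvp->path);
    return 1;
}
```

Calling dumpsys_model() before dumpinit_model() prints the same message seen on the console; once dumpinit_model() has been handed the swap device, the model reports a dump instead.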
The state when the breakpoint is hit:
[1]> $c
dumpinit(ffffff0154b1c880, 0, 0)
swapctl+0x68a(1, fffffd7fffdffc90, ffffff00042acee4)
uadmin+0x10f(10,
[1]> ::ptree
fffffffffbc250b0 sched
ffffff01513923a8 fsflush
ffffff015138e3a8 pageout
ffffff015138c3a8 init
ffffff0152b7a3a8 svc.configd
ffffff0152aaf3a8 svc.startd
ffffff01548f13a8 fs-usr
ffffff01549ce3a8 swapadd
ffffff0154a1b3a8 swap
[1]>
That means dumpvp gets set in the kernel in response to the command
'swap -1 -a /dev/dsk/c1t0d0s1'. So if the system were to panic before
this point, we would not be able to collect a crash dump for
subsequent analysis. Running '::modinfo' shows 161 kernel modules
loaded (on my test system); all of this state would be lost too.
I am wondering why this has to be pushed to such a late stage. Why
can't it be done earlier?
Just as /etc/system is read, /etc/vfstab could also be read early:
look for an entry whose FS type is 'swap' and try configuring it as
dumpvp at the earliest possible moment. I wonder whether this could be
done immediately after /dev, /devices and the disk driver (sd?)
modules are loaded.
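A minimal sketch of the vfstab lookup proposed above, written as ordinary user-space C purely for illustration. find_swap_device() is a hypothetical helper: it scans vfstab-format text and returns the first "device to mount" whose FS type (fourth) field is swap. An in-kernel version would have to use the kernel's own file-reading routines at that early point in boot, and user-space Solaris code would more likely use getvfsent(3C); neither is shown here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan vfstab-format text and copy the first
 * "device to mount" whose FS type field is "swap" into out.
 * vfstab fields: special, fsckdev, mountp, fstype, fsckpass, automnt, mntopts.
 * Returns 1 if a swap entry was found and fits in out, else 0.
 */
int find_swap_device(const char *vfstab, char *out, size_t outlen)
{
    const char *line = vfstab;

    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[1024];
        char special[256], fsckdev[256], mountp[256], fstype[64];
        const char *p;

        /* Copy one line into a NUL-terminated buffer (truncating if huge). */
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Skip blank lines and comments, allowing leading whitespace. */
        p = buf;
        while (*p == ' ' || *p == '\t')
            p++;

        if (*p != '#' && *p != '\0' &&
            sscanf(p, "%255s %255s %255s %63s",
                   special, fsckdev, mountp, fstype) == 4 &&
            strcmp(fstype, "swap") == 0) {
            if (strlen(special) >= outlen)
                return 0;   /* caller's buffer is too small */
            strcpy(out, special);
            return 1;
        }

        line = nl ? nl + 1 : NULL;
    }
    return 0;
}
```

For the vfstab line shown earlier, the helper yields /dev/dsk/c1t0d0s1, which is the device the early-boot code would then try to set up as dumpvp.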
Even with this change, we can't capture panics that happen before
dumpvp is configured, but that window would be significantly reduced!
Moinak suggested exploring the option of making it part of
bootenv.rc. Good point, but that makes it x86-specific; it needs more
investigation.
Comments please.
( Sep 25 2007, 01:23:43 AM PDT )