Tuesday September 25, 2007

Why is dump device configured so late ??

The other day I was playing around with kmdb (setting breakpoints, single stepping, clearing breakpoints, etc.) and at one stage thought, "Let me take a dump", and then issued:

    [0]> $<systemdump
    nopanicdebug:   0               =       0x1

     panic[cpu0]/thread=fffffffffbc25d20: BAD TRAP: type=d (#gp General protection) rp=fffffffffbc46670     addr=f000ff53f000ff00

     fffffffffbc46550 unix:die+ea ()
     fffffffffbc46660 unix:trap+3de ()
     fffffffffbc46670 unix:cmntrap+e9 ()

     skipping system dump - no dump device configured
     rebooting...

It is the last two lines of output that matter to this post (so a lot was left out intentionally).
I couldn't collect the crash file, the reason being 'no dump device configured'. I felt bad that the information I wanted was gone and that I had to repeat the whole cycle to get back to that state ...

I know that, by default, the primary swap partition itself acts as the dump device. The swap device is specified at install time and is recorded in /etc/vfstab as (something like):
        /dev/dsk/c1t0d0s1       -       -       swap    -       no      -

That brings me to the question: where in the boot sequence do we configure swap as the dump device?
Looking at the code to see where the above error message comes from, it turns out to be dumpsys():
          uprintf("skipping system dump - no dump device configured\n");
dumpvp not being set is the reason why dumpsys() prints this message. To understand where it gets set, I put a breakpoint in dumpinit() and then continued.
The state when the breakpoint is hit:

        [1]> $c
        dumpinit(ffffff0154b1c880, 0, 0)
        swapctl+0x68a(1, fffffd7fffdffc90, ffffff00042acee4)
        uadmin+0x10f(10,
        [1]> ::ptree
        fffffffffbc250b0  sched
           ffffff01513923a8  fsflush
           ffffff015138e3a8  pageout
           ffffff015138c3a8  init
                 ffffff0152b7a3a8  svc.configd
                 ffffff0152aaf3a8  svc.startd
                       ffffff01548f13a8  fs-usr
                            ffffff01549ce3a8  swapadd
                                 ffffff0154a1b3a8  swap
        [1]

That means dumpvp gets set in the kernel only in response to the command 'swap -1 -a /dev/dsk/c1t0d0s1', which swapadd runs from the fs-usr service under svc.startd. So if the system were to panic before this point, we would not be able to collect a crash file for subsequent analysis. Running '::modinfo' shows 161 kernel modules loaded (on my test system); all of that state would be lost as well.

I am wondering why it should be pushed to this stage. Why can't it be done earlier?

Just as /etc/system is read early in boot, /etc/vfstab could also be read: look for an entry with FS type 'swap' and try configuring it as dumpvp at the earliest possible moment. I wonder if this can be done immediately after /dev, /devices and the disk driver (sd?) modules are loaded.
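
To make the vfstab idea a bit more concrete, here is a minimal user-space sketch in Python of just the lookup step: scan vfstab text and pick the first entry whose FS type is 'swap'. It is only an illustration of the selection logic, not kernel code. The function names and the sample file contents are hypothetical, and the field order is assumed to follow the standard seven-column vfstab layout.

```python
# Assumed field order of an /etc/vfstab entry (standard seven columns).
VFSTAB_FIELDS = (
    "device_to_mount", "device_to_fsck", "mount_point",
    "fs_type", "fsck_pass", "mount_at_boot", "mount_options",
)


def find_swap_devices(vfstab_text):
    """Return the 'device to mount' of every entry whose FS type is 'swap'."""
    devices = []
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != len(VFSTAB_FIELDS):
            continue  # skip malformed entries rather than guess
        entry = dict(zip(VFSTAB_FIELDS, fields))
        if entry["fs_type"] == "swap":
            devices.append(entry["device_to_mount"])
    return devices


def pick_early_dump_device(vfstab_text):
    """Hypothetical policy: use the first swap entry, or None if there is none."""
    devices = find_swap_devices(vfstab_text)
    return devices[0] if devices else None


# Illustrative contents only; the swap line is the one quoted in this post.
SAMPLE = """\
#device         device          mount   FS      fsck    mount   mount
#to mount       to fsck         point   type    pass    at boot options
/dev/dsk/c1t0d0s0       /dev/rdsk/c1t0d0s0      /       ufs     1       no      -
/dev/dsk/c1t0d0s1       -       -       swap    -       no      -
swap    -       /tmp    tmpfs   -       yes     -
"""

if __name__ == "__main__":
    print(pick_early_dump_device(SAMPLE))
```

Note that the check is on the FS type column, so a tmpfs entry that merely names 'swap' as its device is not picked up. The hard part, doing this before the root filesystem and disk driver are fully available, is exactly what the sketch leaves out.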

Even with this change, we can't capture panics that happen before dumpvp is configured, but that window would be significantly reduced.

Moinak suggested exploring the option of making it part of bootenv.rc. Good point, but that makes it x86 specific, so it needs more investigation.
Comments please.
( Sep 25 2007, 01:23:43 AM PDT )
