
Wednesday October 10, 2007
Dump device configuration - part2
Dump device configuration - part2
To my earlier
post
on 'why the dump device configured so late', I got a very interesting
reply, questioning the claim: 'window will be significantly reduced'.
Let me be frank, as of the first post, I didn't conduct any studies as
to quantify the gain.
So the mail prompted me to do some tests to quantify the gain.
I want to understand how many function calls get made from the word
'go' [i.e from
main()]
till
dumpinit()
gets called; And also figure out how many modules get loaded in the
same window.
That seems a pretty arduous task. At the same time I can't spend days
trying to just get this data. This is where OpenSolaris rocks - if not
for DTrace, I would have given up on this. DTrace kernel module gets
loaded very early in the boot right after the kernel starts [i.e
unix/genunix/krtld modules are loaded]. DTrace supports a feature
called 'Anonymous tracing' - which can be used in this context just to
get the data I am interested. This won't get me right from
main()
but as close to
main()
as possible. This option is good enough to get an idea of what
the numbers look like.
I needed answers for atleast the following questions :
- How many function calls are made from the point
DTrace can start to the point of dumpinit() ?
- How many function calls are made from the point
root is mounted to the point of dumpinit() ?
- How many function calls are made from
start_other_cpus()
[this routine starts the rest of the CPUs in a multi cpu system; Till
this point, only the boot cpu does all the work; This opens up scope
for race conditions / corruption / hangs - So even if we could capture
crash file from this point, that would still be a nice win] to the
point of dumpinit() ?
I put together the following simple script to answer the 3rd question.
# cat mp_init.d
BEGIN {
dumpinit = 0;
}
/* start collecting data here */
fbt::start_other_cpus:return
/dumpinit == 0/
{
dumpinit = 1;
}
fbt:::entry
/dumpinit == 1/
{
@[probefunc]=count();
}
/* stop collecting data here */
fbt::dumpinit:return
/dumpinit == 1/
{
dumpinit = 0;
}
#
And invoke the script as follows :
#
dtrace -As ./mp_init.d
dtrace: saved anonymous enabling in
/kernel/drv/dtrace.conf
dtrace: added forceload directives to
/etc/system
dtrace: run update_drv(1M) or reboot
to enable changes
#
Then reboot the system and when system comes back, just run :
#
dtrace -ae > mp_init.out
mp_init.out has thousands of
lines which look something like :
tsc_scalehrtime
369164
syscall_exit
375467
Module information can be gotten using 'kmdb' - For this I booted the
system under kmdb, put a break point in
start_other_cpus()
and let it continue. When the break point is hit, I ran
'::modinfo' and saved the o/p; Then I put one more break point in
dumpinit()
and again let it continue; When the new breakpoint is hit collect the
modinfo o/p again and comparing this with the previous one gives you an
idea of how many modules got loaded in between.
With simple changes to the script, I got answers for the 1st and 2nd
questions as well. Module info can be gotten as described above.
Then I used awk(1) to post process
mp_init.out
and the resulting numbers look like this :
Q) How many function calls are made from the
point DTrace can start to the point of dumpinit() ?
A) 59 million function calls were made from
the point dtrace can start
to the point of
dumpinit:return
Q) How many function calls are made from the point
root is mounted to the point of dumpinit() ?
A) 52 million calls were made from mounting root
to dumpinit:return
- 120 modules got
loaded in the same window
Q) How many function calls are made from
start_other_cpus()
to the point of dumpinit() ?
A) 43 million calls were made from
start_other_cpus() to dumpinit:return
- 52 modules got
loaded in the same window
The data speaks for itself !! Data modified by 43million routines can
be captured in a dump, if we could advance dump device configuration to
just before
start_other_cpus().
Gain is more if it could be advanced closer to mount of root FS.
( Oct 10 2007, 05:36:01 AM PDT )
Permalink