Thursday June 30, 2005 | Richard McDougall's Weblog Commentary from Race Control |
Using DTrace for Memory Analysis
Following on from yesterday's post on using prstat to look at memory slow-downs, here is a more advanced way of investigating. The DTrace probes help us identify all sources of paging in the system, and give us the ability to drill down quickly to identify cause and effect.
With DTrace, you can probe more deeply into the sources of activity observed with higher-level memory analysis tools. For example, if you determine that a significant amount of paging activity is due to a memory shortage, you can determine which process is initiating the paging activity. In another example, if you see a significant amount of paging due to file activity, you can drill down to see which process and which file are responsible.
DTrace allows for memory analysis through the vminfo provider and, optionally, through deeper tracing of virtual memory paging with the fbt provider. The vminfo provider probes correspond to the fields in the "vm" named kstat: a probe provided by vminfo fires immediately before the corresponding vm value is incremented. The table below from the DTrace guide lists the probes available from the vminfo provider. A probe takes the following arguments:
For example, if you should see the following paging activity with vmstat, indicating page-in from the swap device, you could drill down to investigate.
sol8# vmstat -p 3
memory page executable anonymous filesystem
swap free re mf fr de sr epi epo epf api apo apf fpi fpo fpf
1512488 837792 160 20 12 0 0 0 0 0 8102 0 0 12 12 12
1715812 985116 7 82 0 0 0 0 0 0 7501 0 0 45 0 0
1715784 983984 0 2 0 0 0 0 0 0 1231 0 0 53 0 0
1715780 987644 0 0 0 0 0 0 0 0 2451 0 0 33 0 0
sol10$ dtrace -n anonpgin'{@[execname] = count()}'
dtrace: description 'anonpgin' matched 1 probe
svc.startd 1
sshd 2
ssh 3
dtrace 6
vmstat 28
filebench 913
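The same pattern extends to the other vminfo probes. What follows is a minimal sketch, not part of the original post: it assumes you want to separate anonymous page-ins (anonpgin) from file-system page-ins (fspgin), and it sums arg0, which the DTrace guide documents as the amount by which the statistic is about to be incremented, so the totals are in pages rather than probe firings. The script name pgin-by-type.d is hypothetical.

```
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Sketch only: pages paged in per process, split by source. */

/* Anonymous memory coming back from the swap device. */
vminfo:::anonpgin
{
        @anon[execname] = sum(arg0);
}

/* Regular file data being paged in. */
vminfo:::fspgin
{
        @fs[execname] = sum(arg0);
}

dtrace:::END
{
        printf("Anonymous page-ins (pages):\n");
        printa("  %-50s %15@d\n", @anon);
        printf("File-system page-ins (pages):\n");
        printa("  %-50s %15@d\n", @fs);
}
```

Run it as root (for example, ./pgin-by-type.d) and press ^C to print the totals. A process that shows up only in the second list is reading files rather than suffering from a memory shortage.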
Using DTrace to Estimate Memory Slowdowns
Using DTrace, we can directly measure the elapsed time around the page-in probes while a process is waiting for page-in from the swap device, as in this example.
sched:::on-cpu
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@oncpu[execname] = sum(vtimestamp - self->on);
self->on = 0;
}
vminfo:::anonpgin
{
self->anonpgin = 1;
}
fbt::pageio_setup:return
{
self->wait = timestamp;
}
fbt::pageio_done:entry
/self->anonpgin == 1/
{
self->anonpgin = 0;
@pageintime[execname] = sum(timestamp - self->wait);
self->wait = 0;
}
END
{
normalize(@oncpu, 1000000);
printf("Who's on cpu (milliseconds):\n");
printa(" %-50s %15@d\n", @oncpu);
normalize(@pageintime, 1000000);
printf("Who's waiting for pagein (milliseconds):\n");
printa(" %-50s %15@d\n", @pageintime);
}
With an aggregation by execname, we can see who is being held up by paging the most.
sol10$ ./whospaging.d
^C
Who's on cpu (milliseconds):
 svc.startd 1
 loop.sh 2
 sshd 2
 ssh 3
 dtrace 6
 vmstat 28
 pageout 60
 fsflush 120
 filebench 913
 sched 84562
Who's waiting for pagein (milliseconds):
 filebench 230704
The DTrace script displays the amount of time the program spends doing useful work compared to the amount of time it spends waiting for page-in.
The next script measures the elapsed time between when a program stalls on a page-in from the swap device (an anonymous page-in) and when it resumes, for a specific target pid given on the command line.
sched:::on-cpu
/pid == $1/
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@time["
In the following example, the program spends 0.9 seconds doing useful work, and 230 seconds waiting for page-ins.
sol10$ ./pagingtime.d 22599
dtrace: script './pagingtime.d' matched 10 probes
^C
 1 2 :END
Time breakdown (milliseconds):
 <on cpu> 913
 <paging wait> 230704
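A per-process breakdown of this kind can be sketched as follows. This is a sketch under assumptions, not the original pagingtime.d: the aggregation keys and heading are taken from the output above, the target pid is read from $1 as in the listing's first clause, and the fbt provider (mentioned earlier) is assumed for the pageio_setup and pageio_done probes. It also assumes that pageio_done runs in the same thread that took the anonymous page-in, so the thread-local flags line up.

```
#!/usr/sbin/dtrace -s

/* Sketch only: on-CPU time vs. anonymous page-in wait for one pid ($1). */

sched:::on-cpu
/pid == $1/
{
        self->on = vtimestamp;
}

sched:::off-cpu
/self->on/
{
        @time["<on cpu>"] = sum(vtimestamp - self->on);
        self->on = 0;
}

/* Mark this thread as being inside an anonymous page-in. */
vminfo:::anonpgin
/pid == $1/
{
        self->anonpgin = 1;
}

/* The I/O has been set up; start the wait clock. */
fbt::pageio_setup:return
/self->anonpgin/
{
        self->wait = timestamp;
}

/* The I/O is complete; charge the elapsed time to paging wait. */
fbt::pageio_done:entry
/self->anonpgin == 1/
{
        self->anonpgin = 0;
        @time["<paging wait>"] = sum(timestamp - self->wait);
        self->wait = 0;
}

dtrace:::END
{
        normalize(@time, 1000000);
        printf("Time breakdown (milliseconds):\n");
        printa("  %-50s %15@d\n", @time);
}
```

With the figures shown above, 913 ms on CPU against 230704 ms of paging wait means the process spent well over 99% of the measured time waiting for page-ins.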
( Jun 30 2005, 01:15:58 PM PDT )