|
A while ago Jonathan Adams posted an mdb cheatsheet, summarizing the syntax and commands of the Solaris Modular Debugger. It's a pretty handy reference, and last week I used it as a handout when teaching a class that included both mdb and DTrace. Well, I have the mdb cheatsheet - what about a DTrace one?
Here is my DTrace cheatsheet, styled by Todd. This doesn't include all of DTrace - it includes what I use most frequently. As a reminder, I flicked through the DTraceToolkit to see what I frequently used in those scripts.
If you learn what everything does on that one page cheatsheet including the one-liners, you'll know a significant amount of DTrace. If something has been summarized too much on that cheatsheet, refer to the full DTrace Guide.
A few happenings in the increasingly busy world of DTrace:
Alan Hargreaves has prototyped a DTrace Bourne shell provider, and is looking ahead to see how this can be integrated with Solaris and OpenSolaris. This is great news for anyone who would like to analyze or troubleshoot their Bourne shell scripts. It is also good news for people (especially system administrators) wanting to learn DTrace, as it may provide a familiar path (sh scripting) for you to learn user-land DTracing.
Why sh, and not csh, tcsh, ksh, bash or zsh? Alan could have done any, we wanted to start with the Bourne shell to both get it done and as an example for the other shell providers to follow (watch this space). So if you really just care about bash or whatever, then learning about the sh provider now should be a useful heads-up of what should be possible.
hello.sh
When learning to DTrace software, it can be best to start with the simple stuff. Confirm that what you think will happen, does indeed happen, then move to progressivly more complex scripts.
Here is hello.sh, a dead simple script to start with,
# cat -n hello.sh
1 #!./sh
2
3 echo "Hello World!"
4 echo "Hello World!"
5 echo "Hello World!"
6 echo "Hello World!"
7 echo "Hello World!"
The following DTrace one-liner will run hello.sh (-c), trace all sh provider probes (sh*:::), and frequency count their probe names (@[probename] = count()).
# dtrace -Zn 'sh$target::: { @[probename] = count(); }' -c ./hello.sh
dtrace: description 'sh$target::: ' matched 9 probes
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
dtrace: pid 248847 has exited
script-begin 1
script-end 1
builtin-entry 5
builtin-return 5
line 5
So builtin-entry and builtin-return were called 5 times, one for each echo. The line probe was also called 5 times, once for each line that the shell executed.
Frequency counting builtins
Each probe provides useful information in their args, such as the script name, builtin or exec name, line number, etc. Here I'll use the builtin-entry probe to frequency count the builtin names,
# dtrace -Zn 'sh$target:::builtin-entry { @[copyinstr(arg1)] = count(); }' -c ./hello.sh
dtrace: description 'sh$target:::builtin-entry ' matched 1 probe
Hello World!
[...]
dtrace: pid 248851 has exited
echo 5
That found that echo was called 5 times. Nothing suprising there, this is just confirming that the probes behave as they should.
Line number tracing
The line probe allows us to trace line execution. This one liner counts how many times each line was run,
# dtrace -Zn 'sh$target:::line { @[arg1] = count(); }' -c ./hello.sh
dtrace: description 'sh$target:::line ' matched 1 probe
Hello World!
[...]
dtrace: pid 248855 has exited
3 1
4 1
5 1
6 1
7 1
Each of the lines 3 to 7 were called once, as expected. This one-liner should be useful for finding loop hotspots.
And the following one-liner traces line execution live,
# dtrace -Zn 'sh$target:::line { trace(arg1); }' -c ./hello.sh
dtrace: description 'sh$target:::line ' matched 1 probe
Hello World!
[...]
CPU ID FUNCTION:NAME
0 54539 execute:line 3
0 54539 execute:line 4
0 54539 execute:line 5
0 54539 execute:line 6
0 54539 execute:line 7
dtrace: pid 248869 has exited
Great, that works. Is this getting too easy? lets move on...
func_abc.sh
Now for somthing a little harder (and more interesting). In the following script, func_a() calls func_b() which calls func_c().
# cat -n func_abc.sh
1 #!./sh
2
3 func_c()
4 {
5 echo "Function C"
6 sleep 1
7 }
8
9 func_b()
10 {
11 echo "Function B"
12 sleep 1
13 func_c
14 }
15
16 func_a()
17 {
18 echo "Function A"
19 sleep 1
20 func_b
21 }
22
23 func_a
What order will the lines be executed now? Try thinking about it before checking the answer below.
Tracing line execution
# dtrace -Zn 'sh$target:::line { trace(arg1); }' -c ./func_abc.sh
dtrace: description 'sh$target:::line ' matched 1 probe
Function A
CPU ID FUNCTION:NAME
0 54539 execute:line 23
0 54539 execute:line 18
0 54539 execute:line 19 pause
Function B
0 54539 execute:line 20
0 54539 execute:line 11
0 54539 execute:line 12 pause
Function C
0 54539 execute:line 13
0 54539 execute:line 5
0 54539 execute:line 6 pause
dtrace: pid 248873 has exited
There were some conspicuous one-second pauses during the output, which are sadly lost in this screen shot. I've added the word "pause" so that you know when that happened.
The order of the lines does make sense. What might be confusing is that we see the output text "Function B" followed by the trace of line 20 -- this is due to DTrace output buffering.
Tracing function flow - sh_flow.d
Since we have function-entry and function-return probes, we can trace function execution and indent the output based on the function depth. The following demonstrates sh_flow.d - a script which does just that for any script running on the system (with this sh provider enabled binary), and also traces builtin calls and command executions. sh_flow.d is run in one window, and func_abc.sh runs in another.
# ./sh_flow.d
C TIME FILE -- NAME
0 2007 Aug 10 18:41:10 func_abc.sh -> func_a
0 2007 Aug 10 18:41:10 func_abc.sh > echo
0 2007 Aug 10 18:41:10 func_abc.sh | sleep
0 2007 Aug 10 18:41:11 func_abc.sh -> func_b
0 2007 Aug 10 18:41:11 func_abc.sh > echo
0 2007 Aug 10 18:41:11 func_abc.sh | sleep
0 2007 Aug 10 18:41:12 func_abc.sh -> func_c
0 2007 Aug 10 18:41:12 func_abc.sh > echo
0 2007 Aug 10 18:41:12 func_abc.sh | sleep
0 2007 Aug 10 18:41:13 func_abc.sh <- func_c
0 2007 Aug 10 18:41:13 func_abc.sh <- func_b
0 2007 Aug 10 18:41:13 func_abc.sh <- func_a
^C
The legend is,
| -> | function entry |
| <- | function return |
| > | builtin entry |
| | | command entry |
Great! the output makes it clear who is calling who.
The sh_flow.d script is,
#!/usr/sbin/dtrace -Zs
/*
* sh_flow.d - snoop Bourne shell execution showing function flow.
* Written for the Solaris sh DTrace provider.
*
* This traces shell activity from all Bourne shells on the system that are
* running with sh provider support.
*
* USAGE: sh_flow.d # hit Ctrl-C to end
*
* This watches shell function entries and returns, and indents child
* function calls. Shell builtins are also printed.
*
* FIELDS:
* C CPU-id
* TIME Time of execution
* FILE Filename that this function belongs to
* NAME Shell function, builtin or command name
*
* If a name isn't available at the time of tracing, "<null>" is printed.
*
* WARNING: Watch the first column carefully, it prints the CPU-id. If it
* changes, then it is very likely that the output has been shuffled.
*/
#pragma D option quiet
#pragma D option switchrate=10
dtrace:::BEGIN
{
depth = 0;
printf("%s %-20s %-22s -- %s\n", "C", "TIME", "FILE", "NAME");
}
sh*:::function-entry
{
depth++;
printf("%d %-20Y %-22s %*s-> %s\n", cpu, walltimestamp,
basename(copyinstr(arg0)), depth*2, "", copyinstr(arg1));
}
sh*:::function-return
{
printf("%d %-20Y %-22s %*s<- %s\n", cpu, walltimestamp,
basename(copyinstr(arg0)), depth*2, "", copyinstr(arg1));
depth--;
}
sh*:::builtin-entry
{
printf("%d %-20Y %-22s %*s > %s\n", cpu, walltimestamp,
basename(copyinstr(arg0)), depth*2, "", copyinstr(arg1));
}
sh*:::command-entry
{
printf("%d %-20Y %-22s %*s | %s\n", cpu, walltimestamp,
basename(copyinstr(arg0)), depth*2, "", copyinstr(arg1));
}
Tracing function flow with times - sh_flowtime.d
A great feature of DTrace is the availability of high resolution timestamps. Here I've enhanced the previous script to print delta times,
# ./sh_flowtime.d
C TIME FILE DELTA(us) -- NAME
0 2007 Aug 10 18:52:51 func_abc.sh 0 -> func_a
0 2007 Aug 10 18:52:51 func_abc.sh 54 > echo
0 2007 Aug 10 18:52:52 func_abc.sh 1022880 | sleep
0 2007 Aug 10 18:52:52 func_abc.sh 34 -> func_b
0 2007 Aug 10 18:52:52 func_abc.sh 44 > echo
0 2007 Aug 10 18:52:53 func_abc.sh 1029963 | sleep
0 2007 Aug 10 18:52:53 func_abc.sh 44 -> func_c
0 2007 Aug 10 18:52:53 func_abc.sh 43 > echo
0 2007 Aug 10 18:52:54 func_abc.sh 1029863 | sleep
0 2007 Aug 10 18:52:54 func_abc.sh 33 <- func_c
0 2007 Aug 10 18:52:54 func_abc.sh 14 <- func_b
0 2007 Aug 10 18:52:54 func_abc.sh 7 <- func_a
^C
The legend for understanding the delta times is,
The legend is,
| -> | function entry | time from the previous event, to the entry of this function |
| <- | function return | time from the previous event, to the completion of this function |
| > | builtin entry | time from the previous event, to when this builtin completed execution |
| | | command entry | time from the previous event, to when this command completed execution |
Hey - we found the latency in the script, it's those 1.02 second calls to "sleep"! :-)
Being able to present this information in such a compelling way is awsome stuff.
I won't include the source to that script just yet (this blog entry will get too long), but I'll encourage Alan to bundle these scripts along with the patch/binary of the sh provider (when he makes that available).
Tracing type times - sh_typetime.d
If your script is rather long, the above output from sh_flowtime.d may get rather long. This condensed summary may be more practical,
# ./sh_typetime.d
Tracing... Hit Ctrl-C to end.
^C
FILE TYPE NAME TIME(us)
func_abc.sh builtin echo 82
func_abc.sh func func_c 260
func_abc.sh func func_b 285
func_abc.sh func func_a 478
func_abc.sh exec sleep 3080418
It's pretty obvious from that output where the time was spent.
I was experimenting with more exciting forms of that type of script,
# ./sh_typetime2.d
Tracing... Hit Ctrl-C to end.
^C
FILE TYPE NAME COUNT ELAPSED ON-CPU
func_abc.sh func func_c 1 296 31
func_abc.sh func func_b 1 346 44
func_abc.sh func func_a 1 529 61
func_abc.sh builtin echo 3 87 77
func_abc.sh exec sleep 3 3085831 53984
Lots more will be possible - Alan only wrote the provider two days ago, so I haven't had much of a chance to write scripts yet!
Real world tracing
Alan literally wrote this provider from scratch in the last few days, so we haven't had a chance to point it at something big yet. Here's at least one example of something familiar, /etc/profile,
# ./sh_typetime2.d
Tracing... Hit Ctrl-C to end.
^C
FILE TYPE NAME COUNT ELAPSED ON-CPU
profile builtin [ 1 3 2
profile builtin umask 1 4 3
profile builtin export 1 5 3
profile builtin trap 2 30 22
Cool. See Alan's blog for updates.
I recently tried out the new iSCSI Target Server DTrace provider which Adam wrote (which should be included in Solaris sometime soon). It provides observability of iSCSI traffic from the the server's perspective. This is great news if you have an iSCSI Solaris server, or are planning to have one.
In this entry I'll post a few screenshots of the iSCSI provider in action, both one-liners and scripts.
one-liners
I'll start by listing the probes - which may convey a sense of what is possible to trace. Look down the "NAME" column,
# dtrace -ln 'iscsi*:::'
ID PROVIDER MODULE FUNCTION NAME
11696 iscsi100653 iscsitgtd send_async_logout async-send
11697 iscsi100653 iscsitgtd send_async_scsi async-send
11698 iscsi100653 iscsitgtd handle_scsi_data data-receive
11699 iscsi100653 iscsitgtd iscsi_conn_data_rqst data-request
11700 iscsi100653 iscsitgtd send_datain_pdu data-send
11701 iscsi100653 iscsitgtd iscsi_handle_login_pkt login-command
11702 iscsi100653 iscsitgtd send_login_reject login-response
11703 iscsi100653 iscsitgtd iscsi_handle_login_pkt login-response
11704 iscsi100653 iscsitgtd handle_logout_msg logout-command
11705 iscsi100653 iscsitgtd conn_state logout-response
11706 iscsi100653 iscsitgtd handle_noop_cmd nop-receive
11707 iscsi100653 iscsitgtd iscsi_conn_pkt nop-send
11708 iscsi100653 iscsitgtd handle_scsi_cmd scsi-command
11709 iscsi100653 iscsitgtd send_scsi_rsp scsi-response
11710 iscsi100653 iscsitgtd handle_task_mgt task-command
11711 iscsi100653 iscsitgtd iscsi_conn_pkt task-response
11712 iscsi100653 iscsitgtd handle_text_msg text-command
11713 iscsi100653 iscsitgtd handle_text_msg text-response
The easiest one-liner would be to frequency count iSCSI operations,
# dtrace -n 'iscsi*::: { @[probename] = count(); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C
nop-receive 3
nop-send 3
data-receive 55
data-request 55
scsi-response 103
scsi-command 342
data-send 2865
The above output shows that the iSCSI server is doing more send operations (2865 data-sends), than receive (55 data-receives).
Now to see who is using this iSCSI server, by frequency counting remote IP addresses,
# dtrace -n 'iscsi*::: { @[args[0]->ci_remote] = count(); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C
192.168.1.109 1338
The above shows that 192.168.1.108 triggered 1338 iSCSI operations - which may have been sends, receives, nops, etc. Breaking down further is just a little more DTrace,
# dtrace -qn 'iscsi*::: { @[args[0]->ci_remote, probename] = count(); }
END { printa("%32s %18s %@16d\n", @); }'
^C
192.168.1.109 nop-receive 2
192.168.1.109 nop-send 2
192.168.1.109 data-receive 16
192.168.1.109 data-request 16
192.168.1.109 scsi-response 43
192.168.1.109 scsi-command 253
192.168.1.109 data-send 2619
Great, while a simple one-liner - it quickly identifies who is connecting and what operations they are doing. Some performance wins are about locating and eliminating unnecessary work, which one-liners such as this help identify.
Now for printing payload bytes by iSCSI operation,
# dtrace -n 'iscsi*::: { @[probename] = sum(args[1]->ii_datalen); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C
data-request 0
scsi-response 0
data-receive 99328
scsi-command 140800
data-send 20672512
The above one-liner prints total bytes by operation, which does lose some data - what was the count by operation, average, min, max? DTrace can answer all of these seperately, or provide a summary as a distribution plot. Here I've used quantize for a power of 2 plot,
# dtrace -n 'iscsi*::: { @[probename] = quantize(args[2]->ii_datalen); }'
dtrace: description 'iscsi*::: ' matched 18 probes
^C
data-request
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12
1 | 0
scsi-response
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 55
1 | 0
data-receive
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@ 2
2048 | 0
4096 |@@@@@@@@@@@@@ 4
8192 |@@@@@@@@@@@@@@@@@@@@ 6
16384 | 0
scsi-command
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 209
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 | 0
512 |@@@ 18
1024 |@@ 10
2048 |@@ 11
4096 |@ 4
8192 |@@ 12
16384 | 0
data-send
value ------------- Distribution ------------- count
256 | 0
512 | 1
1024 | 16
2048 | 21
4096 | 28
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2559
16384 | 0
The plots above show that for the data-send operation, the payload size was usually in the 8 to 16 Kbyte bucket.
scripts
Apart from one-lines, DTrace allows powerful scripts to be contstructed. I've put a few on the SolarisInternals DTrace Topics: iSCSI page.
Here I'll post my favourite so far - a script which measures read/write latency (or "response time") in nanoseconds,
# ./iscsirwlat.d
Tracing... Hit Ctrl-C to end.
^C
Read Latency (ns),
192.168.1.109
value ------------- Distribution ------------- count
32768 | 0
65536 |@@@ 27
131072 |@@ 17
262144 | 4
524288 |@@@@ 39
1048576 |@@@ 32
2097152 |@@@@@@@@@@@@@@@@@ 164
4194304 |@ 11
8388608 |@ 5
16777216 |@ 10
33554432 |@ 7
67108864 |@ 5
134217728 |@@@@@ 46
268435456 |@ 14
536870912 | 0
Write Latency (ns),
192.168.1.109
value ------------- Distribution ------------- count
32768 | 0
65536 |@@ 14
131072 |@@ 16
262144 |@@@ 20
524288 |@@@ 20
1048576 |@@@@@ 33
2097152 |@ 5
4194304 | 0
8388608 | 2
16777216 |@ 10
33554432 |@@@ 18
67108864 |@@@@@ 33
134217728 |@@@@@@@@@@@@@ 93
268435456 |@@ 12
536870912 | 0
The script is on the DTrace Topics page. It is a glimpse of things to come; how many of those were synchronous requests (which may hurt application performance) versus asynchronous flushing or read ahead (which may not hurt performance)? More DTrace is needed - which will probably include running DTrace on the client side as well...
About 2 months ago I posted an update to the JavaScript DTrace provider to Bugzilla. In case you are interested and missed it, here are some details:
The new additions were,
- the ability to identify anonymous functions (js_function-info)
- access to function entry arguments (js_function-args)
- access to function return value (js_function-rval)
I was finding that the JavaScript provider was only helping with about
4 problems out of 10 (still, better than nothing). The improvements that
I added should take that to around 8 out of 10 - so it is now proving to
be really useful.
Anyhow, here are some before and after screenshots, tracing a simple JavaScript clock (it's enough to demonstrate the features without bogging down in application specifics),
Old,
# ./js_funcflow.d
C TIME FILE -- FUNC
1 2007 May 31 06:07:13 clock2.html -> startTime
1 2007 May 31 06:07:13 clock2.html -> getHours
1 2007 May 31 06:07:13 clock2.html <- getHours
1 2007 May 31 06:07:13 clocklib.js -> padLibZero
1 2007 May 31 06:07:13 clocklib.js <- padLibZero
1 2007 May 31 06:07:13 clock2.html -> getMinutes
1 2007 May 31 06:07:13 clock2.html <- getMinutes
1 2007 May 31 06:07:13 clocklib.js -> padLibZero
1 2007 May 31 06:07:13 clocklib.js <- padLibZero
1 2007 May 31 06:07:13 clock2.html -> getSeconds
1 2007 May 31 06:07:13 clock2.html <- getSeconds
1 2007 May 31 06:07:13 clocklib.js -> padLibZero
1 2007 May 31 06:07:13 clocklib.js <- padLibZero
1 2007 May 31 06:07:13 clock2.html -> getElementById
1 2007 May 31 06:07:13 clock2.html <- getElementById
1 2007 May 31 06:07:13 clock2.html -> setTimeout
1 2007 May 31 06:07:13 clock2.html <- setTimeout
1 2007 May 31 06:07:13 clock2.html <- startTime
^C
New,
# ./js_argflow.d
C TIME FILE -- FUNC()
1 2007 May 31 06:07:34 clock2.html -> startTime()
1 2007 May 31 06:07:34 clock2.html -> getHours()
1 2007 May 31 06:07:34 clock2.html <- startTime:9 = 0x80000000
1 2007 May 31 06:07:34 clocklib.js -> padLibZero(0x6)
1 2007 May 31 06:07:34 clocklib.js <- padLibZero:3 = 0x955fd70
1 2007 May 31 06:07:34 clock2.html -> getMinutes()
1 2007 May 31 06:07:34 clock2.html <- startTime:10 = 0x80000000
1 2007 May 31 06:07:34 clocklib.js -> padLibZero(0x7)
1 2007 May 31 06:07:34 clocklib.js <- padLibZero:3 = 0x955fd50
1 2007 May 31 06:07:34 clock2.html -> getSeconds()
1 2007 May 31 06:07:34 clock2.html <- startTime:11 = 0x80000000
1 2007 May 31 06:07:34 clocklib.js -> padLibZero(0x22)
1 2007 May 31 06:07:34 clocklib.js <- padLibZero:4 = 0x22
1 2007 May 31 06:07:34 clock2.html -> getElementById()
1 2007 May 31 06:07:34 clock2.html <- startTime:12 = 0x80000000
1 2007 May 31 06:07:34 clock2.html -> setTimeout()
1 2007 May 31 06:07:34 clock2.html <- startTime:13 = 0x80000000
1 2007 May 31 06:07:34 clock2.html <- startTime:13 = 0x80000000
^C
Now there is some visibility for function arguments and return values. If arguments are strings, it is possible to print them as such from DTrace.
Old,
# ./js_funccalls.d
Tracing... Hit Ctrl-C to end.
^C
FILE FUNC CALLS
clock2.html getElementById 1
clock2.html getHours 1
clock2.html getMinutes 1
clock2.html getSeconds 1
clock2.html setTimeout 1
clock2.html startTime 1
clocklib.js padLibZero 3
New,
# ./js_funcinfo.d
Tracing... Hit Ctrl-C to end.
^C
BASE[FILE:LINE] FUNCNAME RUN[FILE:LINE] CALLS
clock2.html:7 getElementById clock2.html:12 1
clock2.html:7 getHours clock2.html:9 1
clock2.html:7 getMinutes clock2.html:10 1
clock2.html:7 getSeconds clock2.html:11 1
clock2.html:7 setTimeout clock2.html:13 1
clock2.html:7 startTime clock2.html:13 1
clocklib.js:2 padLibZero clock2.html:10 1
clocklib.js:2 padLibZero clock2.html:11 1
clocklib.js:2 padLibZero clock2.html:9 1
The "BASE" info describes where that function was declared or first referred to; the "RUN" info describes where that function was executed. These details are especially useful for tracing anonymous functions.
The scripts used above can be downloaded in a tar file from the OpenSolaris DTrace JavaScript site.
There are a number of people at Sun and at Mozilla helping to put DTrace probes in Mozilla, not just for JavaScript - but also for tracing browser latencies at a level which is useful for developers. For more information see the OpenSolaris page, and on Bugzilla: 370906, 388564.
In this entry I'll demonstrate DTracing off-CPU time.
On my way back from OSCON 2007, I found myself in Portland airport with a laptop, no Internet and half an hour to kill. In fact, the same laptop that I used during the SVOSUG talk in April. It was a chance to finish the demo that I started back then.
For those that were there or listened in, the talk had an unfortunate start with over an hour of audiovisual issues - including problems with the laptop video driver, laptop boot process, the data projector, lights and microphone. Alan DuBoff did a good job of fixing things while I began plan B (presenting from a SunRay), and he fixed the video driver such that it works better with data projectors than any other Acer Ferrari I've tried.
One issue we didn't fix that night was a 20 second wait when starting up certain applications, such as gnome-terminal. It was to make a great live DTrace demo to finish with, but I ran out of time to do it properly (we finished at around 11:30pm). Here it is as a blog entry instead.
The problem
When running gnome-terminal, there is a 20 second wait before the terminal appears on the screen.
Initial check
During the 20 second wait,
- vmstat 1 showed idle CPUs
- prstat -m showed that gnome-terminal was not on-CPU
DTrace investigation
The gnome-terminal application is off-CPU for some reason, and the CPUs are otherwise idle. There are numerous different ways to begin the investigation with DTrace, including,
- using the sched provider to trace off-CPU to on-CPU event time
- using the syscall provider to trace elapsed time for syscalls
- using the pid provider to trace elapsed time for application binary and library calls
I'll start with the sched provider (the syscall provider would also make a good starting point). I'll keep restarting gnome-terminal, so that I can DTrace all the events (this is harder if you are trying to DTrace something that is already off-CPU).
The following one-liner runs gnome-terminal and measures the time from that process leaving the CPU to when it returns, and only prints user stack traces if that time was over 1 second,
# dtrace -n 'sched:::off-cpu /pid == $target/ { self->start = timestamp; }
sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
{ printf("waited: %d ms\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
CPU ID FUNCTION:NAME
0 48718 resume:on-cpu waited: 20034 ms
libc.so.1`door_call+0x1a
libc.so.1`_nsc_trydoorcall+0x213
libnsl.so.1`_door_getipnodebyname_r+0x8f
libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
libnsl.so.1`getipnodebyname+0xdf
libsocket.so.1`get_addr+0x126
libsocket.so.1`_getaddrinfo+0x414
libsocket.so.1`getaddrinfo+0x19
libORBit-2.so.0.1.0`get_netid+0x91
libORBit-2.so.0.1.0`link_get_local_hostname+0x35
libORBit-2.so.0.1.0`link_server_setup+0x56
libORBit-2.so.0.1.0`giop_server_new+0x5e
libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132
^C
Hmm - looks like that user stack trace was truncated. Time to boost the ustackframes value,
# dtrace -x ustackframes=64 -n '
sched:::off-cpu /pid == $target/ { self->start = timestamp; }
sched:::on-cpu /self->start && ((timestamp - self->start) > 1000000000)/
{ printf("waited: %d ms\n", (timestamp - self->start) / 1000000); ustack(); }
' -c gnome-terminal
dtrace: description 'sched:::off-cpu ' matched 6 probes
dtrace: pid 101415 has exited
CPU ID FUNCTION:NAME
0 48720 resume:on-cpu waited: 20038 ms
libc.so.1`door_call+0x1a
libc.so.1`_nsc_trydoorcall+0x213
libnsl.so.1`_door_getipnodebyname_r+0x8f
libnsl.so.1`_get_hostserv_inetnetdir_byname+0xb62
libnsl.so.1`getipnodebyname+0xdf
libsocket.so.1`get_addr+0x126
libsocket.so.1`_getaddrinfo+0x414
libsocket.so.1`getaddrinfo+0x19
libORBit-2.so.0.1.0`get_netid+0x91
libORBit-2.so.0.1.0`link_get_local_hostname+0x35
libORBit-2.so.0.1.0`link_server_setup+0x56
libORBit-2.so.0.1.0`giop_server_new+0x5e
libORBit-2.so.0.1.0`ORBit_ORB_start_servers+0x1d8
libORBit-2.so.0.1.0`IOP_generate_profiles+0x67
libORBit-2.so.0.1.0`ORBit_marshal_object+0x8e
libORBit-2.so.0.1.0`ORBit_marshal_value+0x391
libORBit-2.so.0.1.0`orbit_small_marshal+0xf7
libORBit-2.so.0.1.0`ORBit_small_invoke_stub+0x11c
libORBit-2.so.0.1.0`ORBit_small_invoke_stub_n+0x43
libORBit-2.so.0.1.0`ORBit_c_stub_invoke+0x132
libgconf-2.so.4.1.0`ConfigServer_add_client+0x4f
libgconf-2.so.4.1.0`gconf_get_config_server+0xb9
libgconf-2.so.4.1.0`gconf_engine_connect+0x24f
libgconf-2.so.4.1.0`gconf_engine_get_default+0x4c
libgconf-2.so.4.1.0`gconf_client_get_default+0x2a
libgnomeui-2.so.0.1401.0`libgnomeui_post_args_parse+0x187
libgnome-2.so.0.1401.0`gnome_program_postinit+0x61
libgnome-2.so.0.1401.0`gnome_program_init_common+0x37b
libgnome-2.so.0.1401.0`gnome_program_initv+0x4d
libgnome-2.so.0.1401.0`gnome_program_init+0x56
gnome-terminal`main+0x2d5
gnome-terminal`_start+0x7a
Good. This stack trace is likely to have led to the process leaving the CPU for the measured 20038 ms (the system was otherwise idle, so it is unlikely to have been kicked off due to scheduling). Reading through the lines, it looks like it is resolving a hostname - a common source of latencies when DNS is misconfigured.
There are a number of lines we could begin studying to confirm what is happening (and arrive at the same answer); I'll start with libnsl.so.1`getipnodebyname+0xdf, which should have a man page entry,
# man getipnodebyname
[...]
struct hostent *getipnodebyname(const char *name, int af,
int flags, int *error_num);
[...]
Now to check what hostname is being resolved,
# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { trace(copyinstr(arg0)); }'
-c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 50488 getipnodebyname:entry marlin
^C
dtrace: pid 101758 terminated by SIGINT
Oh - "marlin" is the hostname of this laptop, which should resolve just fine. Checking related configuration files,
# grep hosts /etc/nsswitch.conf
hosts: files dns
#
# cat -n /etc/hosts
1 #
2 # Copyright 2006 Sun Microsystems, Inc. All rights reserved.
3 # Use is subject to license terms.
4 #
5 #ident "@(#)hosts 1.1 06/08/04 SMI"
6 #
7 127.0.0.1 localhost
8 192.168.1.166 marlin marlin.sf.fw.jpn.com
Hmm - the name service switch file should cause /etc/hosts (/etc/inet/hosts) to be read first, which contains a valid entry for "marlin".
Lets take a closer look at the getipnodebyname() call, and trace the address family and flags arguments,
# dtrace -n 'pid$target:libnsl:getipnodebyname:entry { printf("%s af=%d flags=%d",
copyinstr(arg0), arg1, arg2); }' -c gnome-terminal
dtrace: description 'pid$target:libnsl:getipnodebyname:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 50488 getipnodebyname:entry marlin af=26 flags=19
^C
dtrace: pid 101821 terminated by SIGINT
Ok, address family 26 is,
# grep 26 /usr/include/sys/socket.h
#define AF_INET6 26 /* Internet Protocol, Version 6 */
IPv6! Should have checked earlier,
# grep ipnodes /etc/nsswitch.conf
ipnodes: files dns
#
# cat -n /etc/inet/ipnodes
1 #
2 # Copyright 2006 Sun Microsystems, Inc. All rights reserved.
3 # Use is subject to license terms.
4 #
5 #ident "@(#)ipnodes 1.1 06/08/04 SMI"
6 #
7 ::1 localhost
8 127.0.0.1 localhost
Adding an entry to the ipnodes file for "marlin" fixed the issue.
Take 2
Was using the sched provider and then pid to trace getipnodebyname() the best path to take? It's hard to say - there are many different ways DTrace can help you solve problems. Lets try some other paths.
Here I'll trace elapsed times from the syscall provider - since if we have an off-CPU issue on an idle system, it will almost certainly be visible at the syscall layer.
The procsystime tool from the DTraceToolkit performs different types of syscall time analysis. I'm using it as it saves a minute or so of typing. Here it runs the gnome-terminal command and produces a report of elapsed times by syscall,
# /opt/DTT/procsystime -e gnome-terminal
Elapsed Times for command gnome-terminal,
SYSCALL TIME (ns)
gtime 1882
sigpending 1921
priocntlsys 4200
sysi86 4354
getgid 5152
sysconfig 5606
getsockname 5904
systeminfo 7040
setcontext 7342
fxstat 7943
getpeername 9393
getrlimit 11717
getuid 13927
uname 14661
sigaction 14698
getpid 19241
setsockopt 23743
getcwd 33784
fsat 41700
stat64 43024
readv 45937
listen 49874
llseek 52341
mkdir 52633
pipe 64612
chmod 65684
fcntl 70864
utime 76677
fstat64 109420
ioctl 133301
access 157803
unlink 165703
open64 221120
accept 276863
bind 281026
writev 343976
brk 344410
memcntl 347475
write 358764
getdents64 422206
munmap 486137
connect 588166
resolvepath 686768
so_socket 748039
close 846782
open 1127761
read 1191102
mmap 1592314
xstat 8521257
pollsys 9817661
doorfs 20035299808
The doorfs() call takes the door descriptor as the first argument,
# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(arg0); }' -c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 572 doorfs:entry 3
0 572 doorfs:entry 3
0 572 doorfs:entry 3
0 572 doorfs:entry 3
^C
dtrace: pid 101833 terminated by SIGINT
#
# dtrace -n 'syscall::doorfs:entry /pid == $target/ { trace(fds[arg0].fi_pathname); }'
-c gnome-terminal
dtrace: description 'syscall::doorfs:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 572 doorfs:entry /var/run/name_service_door
0 572 doorfs:entry /var/run/name_service_door
0 572 doorfs:entry /var/run/name_service_door
0 572 doorfs:entry /var/run/name_service_door
^C
dtrace: pid 101835 terminated by SIGINT
The door calls were to /var/run/name_service_door, pointing to name resolution and the nscd process (which can be confirmed with more DTrace).
Running procsystime on nscd produces,
# /opt/DTT/procsystime -e -n nscd
Hit Ctrl-C to stop sampling...
^C
Elapsed Times for processes nscd,
SYSCALL TIME (ns)
ioctl 2452
llseek 2615
fstat64 3996
sysconfig 13933
open 15416
read 26336
gtime 52980
so_socket 323066
close 349885
xstat 430465
send 454944
connect 795246
doorfs 360127347
lwp_park 10009952745
pollsys 20032365315
nanosleep 153088010532
nscd is multi-threaded, with many threads sleeping as they wait for work - meaning that large off-CPU elapsed times may have nothing to do with gnome-terminal. The time for pollsys() is interesting, as at 20 seconds it matches the measured time in gnome-terminal.
Analysing pollsys() further (the poll() manpage is similar),
# man poll
[...]
int poll(struct pollfd fds[], nfds_t nfds, int timeout);
[...]
The first argument is an array with length specified by the second argument. Looping over an array is difficult from DTrace (due to lack of loops), but this doesn't really present a difficulty when troubleshooting. Here I'll dump the struct in hex, and the other values as ints,
# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/ { printf("nfds=%d", arg1);
tracemem(copyin(arg0, 16), 16); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 542 pollsys:entry nfds=1
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 ....@...........
0 542 pollsys:entry nfds=1
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 05 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 ....@...........
0 542 pollsys:entry nfds=1
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 05 00 00 00 40 00 42 fd 0b 7b ab 46 f0 d7 c0 0d ....@.B..{.F....
0 542 pollsys:entry nfds=1
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
0: 05 00 00 00 40 00 00 00 0b 7b ab 46 f0 d7 c0 0d ....@....{.F....
dtrace: pid 102041 has exited
Don't worry - DTrace lets you cast variables as structs if you like; I'm just dumping the data in hex as this is a short one-liner.
The nfds value shows that there is only ever one file descriptor in the array; and the hex dump (on this little endian architecture) shows that the file descriptor number is "5" (the first member of struct pollfd).
Now DTrace is used to print the pathname for that file descriptor,
# dtrace -n 'syscall::pollsys:entry /execname == "nscd"/
{ trace(fds[*(uint32_t *)copyin(arg0, 4)].fi_pathname); }' -c gnome-terminal
dtrace: description 'syscall::pollsys:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 542 pollsys:entry <unknown>
0 542 pollsys:entry <unknown>
^C
dtrace: pid 102063 terminated by SIGINT
Well, that didn't work. Not all file descriptors have pathnames, of course. It will save some time to borrow pfiles for the next step,
# pfiles `pgrep -x nscd`
101917: /usr/sbin/nscd
Current rlimit: 256 file descriptors
0: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
O_RDWR
/devices/pseudo/mm@0:null
1: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
O_RDWR
/devices/pseudo/mm@0:null
2: S_IFCHR mode:0666 dev:270,0 ino:6815752 uid:0 gid:3 rdev:13,2
O_RDWR
/devices/pseudo/mm@0:null
3: S_IFDOOR mode:0777 dev:279,0 ino:0 uid:0 gid:0 size:0
O_RDWR FD_CLOEXEC door to nscd[101917]
4: S_IFSOCK mode:0666 dev:277,0 ino:17874 uid:0 gid:0 size:0
O_RDWR
SOCK_RAW
SO_SNDBUF(8192),SO_RCVBUF(8192)
sockname: AF_ROUTE
peername: AF_ROUTE
5: S_IFSOCK mode:0666 dev:277,0 ino:53988 uid:0 gid:0 size:0
O_RDWR
SOCK_DGRAM
SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(57344)
sockname: AF_INET 192.168.1.78 port: 51697
peername: AF_INET 192.168.1.5 port: 53
File descriptor 5 was a socket to the remote host 192.168.1.5 port 53 (DNS). This shows that nscd is waiting for 20 seconds on a DNS socket, and we know that gnome-terminal has a 20 second wait on the /var/run/name_service_door file. They are probably related.
Proving that they are related, if needed, is simply more DTrace. This time I'll borrow a script from /usr/demo/dtrace, which shows who is waiting for nscd and how long for,
# dtrace -s /usr/demo/dtrace/nscd.d -c gnome-terminal
dtrace: script '/usr/demo/dtrace/nscd.d' matched 27 probes
dtrace: pid 101594 has exited
nscd gnome-session
value ------------- Distribution ------------- count
16384 | 0
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
65536 | 0
nscd gnome-terminal
value ------------- Distribution ------------- count
16384 | 0
32768 |@@@@@@@@@@ 1
65536 |@@@@@@@@@@@@@@@@@@@@ 2
131072 | 0
262144 | 0
524288 | 0
1048576 | 0
2097152 | 0
4194304 | 0
8388608 | 0
16777216 | 0
33554432 | 0
67108864 | 0
134217728 | 0
268435456 | 0
536870912 | 0
1073741824 | 0
2147483648 | 0
4294967296 | 0
8589934592 | 0
17179869184 |@@@@@@@@@@ 1
34359738368 | 0
In the 17.1 to 34.3 second bucket is gnome-terminal waiting for nscd - our 20 second wait.
The /usr/demo/dtrace/nscd.d script works by tracing who is nscd is waking up via the sched:::wakeup probe, after having traced the time when threads sleep (see the "sched" chapter in the Dynamic Tracing Guide on docs.sun.com).
The wrap
DTrace provides many ways to solve performance issues or for troubleshooting. Above were only a few techniques for analysing off-CPU time, but there are more available.
Some techniques can get very complex, and require much systems knowledge. Don't worry: if you only figure out one way to solve your problem - you've still solved your problem. (Systems knowledge should help you solve system problems faster).
I was recently interviewed about DTrace for SDNtv, Sun's Developer Network Channel. You can watch the video in a browser or download it and watch it later.
For the interview we discussed what DTrace is, what DTrace can do, and I ran the colortrace demo. It was a short interview aimed at introducing people to DTrace for the first time.
I found SDNtv itself quite interesting: the host I had, Mark Herring, was technically savvy and both understood DTrace and knew what best to ask. It was also filmed in one take - that was a big suprise - I've always assumed such interviews are cut together creatively, but this was a real interview filmed in one shot.
But what is especially interesting is the idea of videoing Sun engineers as they discuss their own products, and making those videos publically available. Not that long ago (6 years?), I remember being a Sun customer in Australia who was trying to learn more about Solaris and the kernel, and in particular (being a code geek and a Unix history geek) I wanted to know who wrote what and why. There was very little public information about Sun's engineers - it seemed that the very names of the engineers were confidential. And, growing up in Australia, there were no conferences to visit where Sun engineers gave presentations, like there are in the US.
When Bryan first emailed me in early 2004 suggesting I check out something new called DTrace, I had no idea who Bryan was. I remember staring at his sig and thinking "I wish I knew who this guy was". A quick Internet search didn't help much (try that now!).
Times have changed. Engineers now have blogs, and discuss source code on OpenSolaris mailing lists. And SDNtv lets you see and hear them speak!
I was recently asked to give a very short but important DTrace
demonstration to a wide audience, where I would only have time
to run and discuss one script.
It is a tough choice - there are so many to pick from, including
the 100+ scripts and oneliners in the
DTraceToolkit.
Also, a short text-based demo may not convey the incredible power of
DTrace - if you aren't paying attention it may look like just
another tool; DTrace is far more than that.
What a great excuse to use some color escape sequences (yes, I've
wanted to do this since I started writing the DTraceToolkit).
With a colorized output it won't look like just another tool,
and I can draw appropriate attention to a key DTrace feature.
I wrote colortrace.d (or colourtrace.d, for readers from the Commonwealth),
which prints a flow indented trace of function entrys and returns for the
entire softwark stack. Different layers of the softwark stack have been colorized:
- blue - application functions
- green - library functions
- violet - system calls
- red - kernel functions
I added a column to print delta times in nanoseconds, to illustrate
that yes - we can measure whatever times we want. When Adam previously performed a similar demonstration, he added a color for a Java function layer.
Here I run colortrace.d on a bash shell, and while colortrace.d is running
I press the letter "t" on the bash shell:
# ./colortrace.d -p 101695
DELTA(ns) TYPE FUNCTION
120805042 syscall <- read
34461 libc.so.1 <- _read
13474 libc.so.1 <- read
8387 bash <- rl_getc
12899 bash <- rl_read_key
10044 bash -> _rl_dispatch
11760 bash -> _rl_dispatch_subseq
7189 bash -> rl_insert
7020 bash -> _rl_insert_char
6777 bash -> _rl_any_typein
8922 bash <- _rl_any_typein
7489 bash -> rl_insert_text
9230 libc.so.1 -> strlen
12040 libc.so.1 <- strlen
7362 libc.so.1 -> strncpy
6956 libc.so.1 <- strncpy
6996 bash <- rl_insert_text
7095 bash <- _rl_insert_char
6069 bash <- rl_insert
8117 bash <- _rl_dispatch_subseq
6484 bash <- _rl_dispatch
6992 bash -> rl_redisplay
5764 libc.so.1 -> strlen
6577 libc.so.1 <- strlen
5356 libc.so.1 -> strncpy
6249 libc.so.1 <- strncpy
6992 libc.so.1 -> memset
7572 libc.so.1 <- memset
7749 libc.so.1 -> strlen
6118 libc.so.1 <- strlen
6627 bash -> _rl_move_cursor_relative
6573 bash <- _rl_move_cursor_relative
7533 bash -> _rl_output_some_chars
7458 libc.so.1 -> fwrite
6335 libc.so.1 -> _fwrite_unlocked
7085 libc.so.1 -> _realbufend
6167 libc.so.1 -> getxfdat
6250 libc.so.1 <- getxfdat
6490 libc.so.1 <- _realbufend
6306 libc.so.1 <- _fwrite_unlocked
6115 libc.so.1 <- fwrite
6488 bash <- _rl_output_some_chars
7628 libc.so.1 -> fflush
6317 libc.so.1 -> _fflush_u
6407 libc.so.1 -> _xflsbuf
5046 libc.so.1 -> _realbufend
52132 libc.so.1 -> getxfdat
5178 libc.so.1 <- getxfdat
5480 libc.so.1 <- _realbufend
7046 libc.so.1 -> _write
29263 syscall -> write
12044 kernel -> write32
9820 kernel -> write
5170 kernel -> getf
5514 kernel -> set_active_fd
5643 kernel <- set_active_fd
10803 kernel <- getf
7063 kernel -> nbl_need_check
5365 kernel <- nbl_need_check
6075 kernel -> fop_rwlock
5619 kernel -> fs_rwlock
4939 kernel <- fs_rwlock
5117 kernel <- fop_rwlock
5819 kernel -> fop_write
5389 kernel -> spec_write
6382 kernel -> smark
5471 kernel -> gethrestime_sec
5174 kernel -> gethrestime
5159 kernel -> pc_gethrestime
4724 kernel -> gethrtime
4923 kernel -> tsc_gethrtime
4831 kernel <- tsc_gethrtime
4875 kernel <- gethrtime
5240 kernel <- pc_gethrestime
5215 kernel <- gethrestime
5368 kernel <- gethrestime_sec
5710 kernel <- smark
5656 kernel -> strwrite
5529 kernel -> strwrite_common
5543 kernel -> straccess
4976 kernel -> mutex_owned
4761 kernel <- mutex_owned
5611 kernel <- straccess
6066 kernel -> strput
4713 kernel -> mutex_owned
4705 kernel <- mutex_owned
5836 kernel -> canputnext
6037 kernel <- canputnext
6160 kernel -> strmakedata
5542 kernel -> allocb_cred
5513 kernel -> allocb
4982 kernel -> kmem_cache_alloc
5522 kernel -> kmem_cache_alloc_debug
5185 kernel -> verify_and_copy_pattern
5054 kernel <- verify_and_copy_pattern
6464 kernel -> dblk_constructor
4853 kernel -> kmem_cache_alloc
4696 kernel -> kmem_cache_alloc_debug
5188 kernel -> verify_and_copy_pattern
4707 kernel <- verify_and_copy_pattern
5153 kernel -> gethrtime
4843 kernel -> tsc_gethrtime
4587 kernel <- tsc_gethrtime
4836 kernel <- gethrtime
5133 kernel -> getpcstack
4742 kernel <- getpcstack
4849 kernel -> kmem_log_enter
5100 kernel <- kmem_log_enter
5047 kernel <- kmem_cache_alloc_debug
5262 kernel <- kmem_cache_alloc
5773 kernel <- dblk_constructor
4970 kernel -> gethrtime
4610 kernel -> tsc_gethrtime
4504 kernel <- tsc_gethrtime
4975 kernel <- gethrtime
4906 kernel -> getpcstack
4534 kernel <- getpcstack
4991 kernel -> kmem_log_enter
4760 kernel <- kmem_log_enter
4740 kernel <- kmem_cache_alloc_debug
5060 kernel <- kmem_cache_alloc
5642 kernel <- allocb
5394 kernel -> crhold
5467 kernel <- crhold
5435 kernel <- allocb_cred
5531 kernel -> uiomove
5663 kernel -> xcopyin_nta
4871 kernel <- kcopy
5012 kernel <- uiomove
5939 kernel <- strmakedata
5853 kernel -> stream_willservice
5280 kernel <- stream_willservice
6012 kernel -> putnext
4937 kernel -> mutex_owned
4680 kernel <- mutex_owned
5381 kernel -> mutex_owned
4694 kernel <- mutex_owned
6160 kernel -> ttcompatwput
5016 kernel -> putnext
4776 kernel -> mutex_owned
4523 kernel <- mutex_owned
4981 kernel -> mutex_owned
4661 kernel <- mutex_owned
5683 kernel -> ldtermwput
5704 kernel -> bcanputnext
5815 kernel -> claimstr
5410 kernel <- claimstr
5598 kernel -> bcanput
6099 kernel <- bcanput
5602 kernel -> releasestr
5013 kernel -> cv_broadcast
5708 kernel -> getpcstack
4690 kernel <- getpcstack
5167 kernel -> apic_intr_enter
5135 kernel -> psm_get_cpu_id
4733 kernel <- psm_get_cpu_id
5540 kernel -> psm_get_cpu_id
4531 kernel <- psm_get_cpu_id
5295 kernel -> psm_get_cpu_id
4503 kernel <- psm_get_cpu_id
4917 kernel <- apic_intr_enter
5296 kernel -> hilevel_intr_prolog
10448 kernel -> apic_intr_exit
5041 kernel -> psm_get_cpu_id
4481 kernel <- psm_get_cpu_id
5119 kernel <- apic_intr_exit
5333 kernel <- hilevel_intr_epilog
5143 kernel <- cv_broadcast
5124 kernel <- releasestr
5350 kernel <- bcanputnext
5851 kernel -> ldtermwmsg
6826 kernel -> msgdsize
5427 kernel <- msgdsize
5652 kernel -> ldterm_output_msg
4981 kernel -> allocb
4817 kernel -> kmem_cache_alloc
5224 kernel -> kmem_cache_alloc_debug
5270 kernel -> verify_and_copy_pattern
5033 kernel <- verify_and_copy_pattern
5303 kernel -> dblk_constructor
4692 kernel -> kmem_cache_alloc
4928 kernel -> kmem_cache_alloc_debug
5243 kernel -> getpcstack
4569 kernel <- getpcstack
5069 kernel -> apic_intr_enter
4830 kernel <- apic_intr_enter
4972 kernel -> hilevel_intr_prolog
40463 kernel -> apic_intr_exit
5107 kernel -> psm_get_cpu_id
4549 kernel <- psm_get_cpu_id
4995 kernel <- apic_intr_exit
5123 kernel <- hilevel_intr_epilog
5633 kernel -> verify_and_copy_pattern
4867 kernel <- verify_and_copy_pattern
5360 kernel -> gethrtime
4633 kernel -> tsc_gethrtime
4566 kernel <- tsc_gethrtime
4999 kernel <- gethrtime
4856 kernel -> getpcstack
4739 kernel <- getpcstack
5058 kernel -> kmem_log_enter
4818 kernel <- kmem_log_enter
4786 kernel <- kmem_cache_alloc_debug
5189 kernel <- kmem_cache_alloc
5063 kernel <- dblk_constructor
4939 kernel -> gethrtime
4694 kernel -> tsc_gethrtime
4706 kernel <- tsc_gethrtime
4749 kernel <- gethrtime
4852 kernel -> getpcstack
4780 kernel <- getpcstack
4838 kernel -> kmem_log_enter
4669 kernel <- kmem_log_enter
5068 kernel <- kmem_cache_alloc_debug
4970 kernel <- kmem_cache_alloc
4925 kernel <- allocb
6118 kernel -> movtuc
6115 kernel <- movtuc
5246 kernel -> freeb
5098 kernel -> dblk_lastfree
5180 kernel -> crfree
5155 kernel <- crfree
5292 kernel -> kmem_cache_free
4946 kernel -> kmem_cache_free_debug
4814 kernel -> kmem_log_enter
4880 kernel <- kmem_log_enter
4998 kernel -> gethrtime
4598 kernel -> tsc_gethrtime
4653 kernel <- tsc_gethrtime
4949 kernel <- gethrtime
4841 kernel -> getpcstack
4515 kernel <- getpcstack
5073 kernel -> kmem_log_enter
4595 kernel <- kmem_log_enter
5352 kernel -> dblk_destructor
5061 kernel -> kmem_cache_free
4670 kernel -> kmem_cache_free_debug
4721 kernel -> kmem_log_enter
4863 kernel <- kmem_log_enter
4914 kernel -> gethrtime
4555 kernel -> tsc_gethrtime
4853 kernel <- tsc_gethrtime
4761 kernel <- gethrtime
4825 kernel -> getpcstack
4714 kernel <- getpcstack
4888 kernel -> kmem_log_enter
4580 kernel <- kmem_log_enter
5219 kernel -> copy_pattern
4980 kernel <- copy_pattern
4978 kernel <- kmem_cache_free_debug
5806 kernel <- kmem_cache_free
4964 kernel <- dblk_destructor
4893 kernel -> copy_pattern
4530 kernel <- copy_pattern
5010 kernel <- kmem_cache_free_debug
4931 kernel <- kmem_cache_free
5002 kernel <- dblk_lastfree
5293 kernel <- freeb
5397 kernel <- ldterm_output_msg
5285 kernel -> msgdsize
4840 kernel <- msgdsize
6236 kernel -> drv_setparm
5415 kernel <- drv_setparm
5231 kernel -> putnext
4820 kernel -> mutex_owned
4617 kernel <- mutex_owned
5286 kernel -> mutex_owned
4526 kernel <- mutex_owned
5851 kernel -> ptemwput
5325 kernel -> bcanputnext
4759 kernel -> claimstr
4458 kernel <- claimstr
5060 kernel -> bcanput
5074 kernel <- bcanput
4841 kernel -> releasestr
4663 kernel -> cv_broadcast
4771 kernel <- cv_broadcast
4773 kernel <- releasestr
4758 kernel <- bcanputnext
5838 kernel -> ptemwmsg
5242 kernel -> putnext
4584 kernel -> mutex_owned
4811 kernel <- mutex_owned
5169 kernel -> mutex_owned
4473 kernel <- mutex_owned
6277 kernel -> ptswput
5691 kernel -> putq
6413 kernel -> qenable_locked
5001 kernel -> mutex_owned
4522 kernel <- mutex_owned
5765 kernel <- qenable_locked
5307 kernel -> mutex_owned
4553 kernel <- mutex_owned
5277 kernel <- putq
5303 kernel -> cv_broadcast
4546 kernel <- cv_broadcast
5789 kernel <- ptswput
4863 kernel -> mutex_owned
4681 kernel <- mutex_owned
4865 kernel -> mutex_owned
4482 kernel <- mutex_owned
5909 kernel <- putnext
5438 kernel <- ptemwmsg
5338 kernel <- ptemwput
5258 kernel -> mutex_owned
4592 kernel <- mutex_owned
4857 kernel -> mutex_owned
4719 kernel <- mutex_owned
4820 kernel <- putnext
4941 kernel <- ldtermwmsg
5600 kernel <- ldtermwput
4921 kernel -> mutex_owned
4483 kernel <- mutex_owned
5049 kernel -> mutex_owned
4495 kernel <- mutex_owned
4763 kernel <- putnext
5565 kernel <- ttcompatwput
4934 kernel -> mutex_owned
4543 kernel <- mutex_owned
4859 kernel -> mutex_owned
4682 kernel <- mutex_owned
4805 kernel <- putnext
5706 kernel -> stream_runservice
5130 kernel -> mutex_owned
4505 kernel <- mutex_owned
6436 kernel -> queue_service
5501 kernel -> runservice
5329 kernel -> entersq
5686 kernel <- entersq
6490 kernel -> ptswsrv
4974 kernel -> getq
4688 kernel -> getq_noenab
4922 kernel <- getq_noenab
5330 kernel <- getq
4888 kernel -> bcanputnext
5246 kernel -> getpcstack
4641 kernel <- getpcstack
5019 kernel -> apic_intr_enter
5023 kernel <- apic_intr_enter
4991 kernel -> hilevel_intr_prolog
40252 kernel -> apic_intr_exit
4984 kernel -> psm_get_cpu_id
4831 kernel <- psm_get_cpu_id
5005 kernel <- apic_intr_exit
4903 kernel <- hilevel_intr_epilog
5265 kernel -> claimstr
5086 kernel <- claimstr
5119 kernel -> bcanput
5071 kernel <- bcanput
4889 kernel -> releasestr
4679 kernel -> cv_broadcast
4965 kernel <- cv_broadcast
4878 kernel <- releasestr
4769 kernel <- bcanputnext
5142 kernel -> putnext
4871 kernel -> mutex_owned
4567 kernel <- mutex_owned
5209 kernel -> mutex_owned
4526 kernel <- mutex_owned
6066 kernel -> strrput
6220 kernel -> qclaimed
5323 kernel <- qclaimed
5297 kernel -> mutex_owned
4483 kernel <- mutex_owned
5265 kernel -> putq
4667 kernel -> qenable_locked
4592 kernel -> mutex_owned
4739 kernel <- mutex_owned
4880 kernel <- qenable_locked
4880 kernel -> mutex_owned
4659 kernel <- mutex_owned
4864 kernel <- putq
4883 kernel -> mutex_owned
4690 kernel <- mutex_owned
5938 kernel -> pollwakeup
6412 kernel -> pollnotify
4819 kernel -> mutex_owned
4517 kernel <- mutex_owned
5862 kernel -> cv_signal
5412 kernel -> disp_lock_enter
5260 kernel -> lock_set_spl
4898 kernel <- splx
5298 kernel <- lock_set_spl
5513 kernel <- disp_lock_enter
6097 kernel -> sleepq_wakeone_chan
5878 kernel -> sleepq_unlink
5882 kernel <- sleepq_unlink
6150 kernel -> ts_wakeup
6835 kernel -> setbackdq
4878 kernel -> gethrtime_unscaled
4767 kernel -> tsc_gethrtimeunscaled
4768 kernel <- tsc_gethrtimeunscaled
5305 kernel <- gethrtime_unscaled
5345 kernel -> cpu_update_pct
4993 kernel -> scalehrtime
5250 kernel -> tsc_scalehrtime
4980 kernel <- tsc_scalehrtime
5317 kernel <- scalehrtime
5715 kernel -> cpu_decay
5451 kernel -> exp_x
4760 kernel <- exp_x
5225 kernel <- cpu_decay
5253 kernel <- cpu_update_pct
6435 kernel -> cpu_choose
5793 kernel -> disp_lowpri_cpu
6383 kernel <- disp_lowpri_cpu
5624 kernel <- cpu_choose
6285 kernel -> disp_lock_enter_high
5746 kernel <- disp_lock_enter_high
6425 kernel -> cpu_resched
6065 kernel -> poke_cpu
5701 kernel -> apic_send_ipi
5626 kernel -> get_apic_cmd1
5355 kernel <- get_apic_cmd1
5742 kernel <- apic_send_ipi
5167 kernel <- poke_cpu
5254 kernel <- cpu_resched
6656 kernel -> cpu_wakeup
5605 kernel <- cpu_wakeup
5194 kernel <- setbackdq
5575 kernel <- ts_wakeup
5068 kernel -> disp_lock_exit_high
4899 kernel <- disp_lock_exit_high
5750 kernel <- sleepq_wakeone_chan
5206 kernel -> disp_lock_exit
4893 kernel <- splx
5125 kernel <- disp_lock_exit
5716 kernel <- cv_signal
5366 kernel <- pollnotify
5885 kernel <- pollwakeup
5684 kernel <- strrput
5051 kernel -> mutex_owned
4614 kernel <- mutex_owned
5096 kernel -> mutex_owned
4530 kernel <- mutex_owned
4840 kernel <- putnext
5045 kernel -> getq
4681 kernel -> getq_noenab
4647 kernel <- getq_noenab
5013 kernel <- getq
5021 kernel -> cv_broadcast
4747 kernel <- cv_broadcast
5865 kernel <- ptswsrv
6120 kernel -> leavesq
5334 kernel <- leavesq
5146 kernel -> cv_broadcast
4552 kernel <- cv_broadcast
6149 kernel <- runservice
5560 kernel <- queue_service
5922 kernel <- stream_runservice
5748 kernel <- strput
6336 kernel <- strwrite_common
5398 kernel <- strwrite
5481 kernel <- spec_write
5895 kernel <- fop_write
6266 kernel -> fop_rwunlock
5432 kernel -> fs_rwunlock
5318 kernel <- fs_rwunlock
5455 kernel <- fop_rwunlock
5430 kernel -> releasef
5303 kernel -> clear_active_fd
5447 kernel <- clear_active_fd
5098 kernel -> cv_broadcast
4594 kernel <- cv_broadcast
5455 kernel <- releasef
5460 kernel <- write
6000 kernel <- write32
6729 syscall <- write
28571 libc.so.1 <- _write
10001 libc.so.1 <- _xflsbuf
6926 libc.so.1 <- _fflush_u
6176 libc.so.1 <- fflush
7718 bash <- rl_redisplay
54421 bash <- readline_internal_char
8667 bash -> readline_internal_char
11003 libc.so.1 -> sigsetjmp
11960 libc.so.1 -> __csigsetjmp
6691 libc.so.1 <- __csigsetjmp
6367 libc.so.1 <- sigsetjmp
6954 bash -> _rl_init_argument
7834 bash <- _rl_init_argument
7598 bash -> rl_read_key
7065 bash -> _rl_next_macro_key
6371 bash <- _rl_next_macro_key
6299 bash -> rl_getc
6818 libc.so.1 -> fileno
6908 libc.so.1 <- fileno
6850 libc.so.1 -> read
7184 libc.so.1 -> _read
28550 syscall -> read
9637 kernel -> read32
6043 kernel -> read
5360 kernel -> getf
5147 kernel -> set_active_fd
4994 kernel <- set_active_fd
6298 kernel <- getf
5840 kernel -> nbl_need_check
4779 kernel <- nbl_need_check
5503 kernel -> fop_rwlock
5009 kernel -> fs_rwlock
4558 kernel <- fs_rwlock
4867 kernel <- fop_rwlock
5894 kernel -> fop_read
5403 kernel -> spec_read
5469 kernel -> smark
5361 kernel -> gethrestime_sec
4852 kernel -> gethrestime
5152 kernel -> pc_gethrestime
4890 kernel -> gethrtime
4720 kernel -> tsc_gethrtime
4807 kernel <- tsc_gethrtime
5195 kernel <- gethrtime
5355 kernel <- pc_gethrestime
5085 kernel <- gethrestime
4978 kernel <- gethrestime_sec
5017 kernel <- smark
5537 kernel -> strread
5804 kernel -> straccess
4817 kernel -> mutex_owned
4734 kernel <- mutex_owned
5419 kernel <- straccess
5344 kernel -> mutex_owned
4534 kernel <- mutex_owned
5591 kernel -> strget
4831 kernel -> mutex_owned
4535 kernel <- mutex_owned
4874 kernel -> mutex_owned
4675 kernel <- mutex_owned
5018 kernel -> getq_noenab
4821 kernel <- getq_noenab
5847 kernel <- strget
4908 kernel -> mutex_owned
4584 kernel <- mutex_owned
5557 kernel -> qbackenable
4978 kernel <- qbackenable
6433 kernel -> strwaitq
4917 kernel -> mutex_owned
4469 kernel <- mutex_owned
6029 kernel -> allocb_wait
5397 kernel -> allocb
4903 kernel -> kmem_cache_alloc
5236 kernel -> kmem_cache_alloc_debug
5242 kernel -> verify_and_copy_pattern
5294 kernel <- verify_and_copy_pattern
5744 kernel -> dblk_constructor
... 200 lines truncated ...
^C
There is a lot of output as flow passes from the user program to the kernel
and back again, all of which is traceable.
If you were familiar with previous state of the art tracing and debugging tools,
then DTrace has just raised the bar. Previously, truss (or strace)
by default prints some of the lines in violet, apptrace and truss -u
the lines in green, truss -ua.out the lines in blue, software debuggers
the lines in blue and green, and TNF could print a small nuber of the
red lines. DTrace sees all.
DTrace can also print user level instructions, which I could have printed
above in orange. If I had, the output would have been be many times longer.
Perhaps more useful would be to print the higher level kernel events which DTrace
can observe, which can instrument events within functions. Other possible
enhancements would be to print function entry and return arguments,
and to walk user and kernel data structures to print members of interest.
The colortrace.d script which generated the above output is as follows:
#!/usr/sbin/dtrace -s
/*
* colortrace.d - Trace function, syscall and kernel events, in color.
* Written for DTrace (Solaris 10 3/05).
*
* 26-Jan-2005, ver 1.00
*
* USAGE: colortrace.d -p PID
*
* This script demonstrates some key DTrace features: the capability
* to observe function calls across the entire software stack, and
* to measure timestamps for any event. colortrace.d shows the function
* flow of a process, along with delta times between lines of output.
*
* NOTES: Run this on something small, such as a shell. If you are on
* a multi-CPU server, pbind the process to a single CPU else the output
* may by shuffled.
*/
#pragma D option quiet
#pragma D option switchrate=10
/* see "man -s5 dtterm" for a color list */
inline string RED = "\033[31;1m";
inline string BLUE = "\033[34;1m";
inline string GREEN = "\033[32;1m";
inline string VIOLET = "\033[35;1m";
inline string OFF = "\033[0m";
dtrace:::BEGIN
{
last = timestamp;
printf("%-12s %16s %s\n", "DELTA(ns)", "TYPE", "FUNCTION");
}
syscall:::entry
/pid == $target/
{
depth++;
printf("%-12d %16s %s%*s -> %s%s\n", timestamp - last, probeprov,
VIOLET, depth, " ", probefunc, OFF);
self->insyscall = 1;
last = timestamp;
}
syscall:::return
/pid == $target/
{
printf("%-12d %16s %s%*s <- %s%s\n", timestamp - last, probeprov,
VIOLET, depth, " ", probefunc, OFF);
depth = depth > 0 ? --depth : 0;
self->insyscall = 0;
last = timestamp;
}
fbt:::entry
/self->insyscall/
{
depth++;
printf("%-12d %16s %s%*s -> %s%s\n", timestamp - last, "kernel",
RED, depth, " ", probefunc, OFF);
last = timestamp;
}
fbt:::return
/self->insyscall/
{
printf("%-12d %16s %s%*s <- %s%s\n", timestamp - last, "kernel",
RED, depth, " ", probefunc, OFF);
depth = depth > 0 ? --depth : 0;
last = timestamp;
}
pid$target:a.out::entry
{
depth++;
printf("%-12d %16s %s%*s -> %s%s\n", timestamp - last, probemod,
BLUE, depth, " ", probefunc, OFF);
last = timestamp;
}
pid$target:a.out::return
{
printf("%-12d %16s %s%*s <- %s%s\n", timestamp - last, probemod,
BLUE, depth, " ", probefunc, OFF);
depth = depth > 0 ? --depth : 0;
last = timestamp;
}
pid$target:lib*::entry
{
depth++;
printf("%-12d %16s %s%*s -> %s%s\n", timestamp - last, probemod,
GREEN, depth, " ", probefunc, OFF);
last = timestamp;
}
pid$target:lib*::return
{
printf("%-12d %16s %s%*s <- %s%s\n", timestamp - last, probemod,
GREEN, depth, " ", probefunc, OFF);
depth = depth > 0 ? --depth : 0;
last = timestamp;
}
Once you pick up the syntax of DTrace programming (which is straightforward
and based on C), you'll see that this is a trivial script. The version I demonstrated took minutes to write; I've spent a little more time on this version to make the output neater. A great place to start learning the syntax is the
DTrace Guide.
To capture output for this blog entry, I changed the color escape sequences
to print HTML font tags. Since the entire output of DTrace can be programmed,
we could generate XML if needed.
The times printed in the left column are delta times from the previous event -
which will help locate the source of latencys. It can take a little thought
to comprehend what they really mean - a large value does not always mean that
the named function to the right was slow. If needed, how these
time measurements are taken can be customized by modifying the script.
While colortrace.d serves its purpose as a demonstration of DTrace
observability, it doesn't actually represent a common use of DTrace. In practice, we use the
filtering features of DTrace to only trace events of interest, and the
aggregation features of DTrace to summarize data in useful ways.
|
|