« iSCSI DTrace provide... | Main | DTrace Links »
http://blogs.sun.com/brendan/date/20070810 Friday August 10, 2007

DTrace Bourne shell (sh) provider

Alan Hargreaves has prototyped a DTrace Bourne shell provider, and is looking ahead to see how this can be integrated with Solaris and OpenSolaris. This is great news for anyone who would like to analyze or troubleshoot their Bourne shell scripts. It is also good news for people (especially system administrators) wanting to learn DTrace, as it may provide a familiar path (sh scripting) for you to learn user-land DTracing.

Why sh, and not csh, tcsh, ksh, bash or zsh? Alan could have done any, we wanted to start with the Bourne shell to both get it done and as an example for the other shell providers to follow (watch this space). So if you really just care about bash or whatever, then learning about the sh provider now should be a useful heads-up of what should be possible.

hello.sh

When learning to DTrace software, it can be best to start with the simple stuff. Confirm that what you think will happen, does indeed happen, then move to progressivly more complex scripts.

Here is hello.sh, a dead simple script to start with,

The following DTrace one-liner will run hello.sh (-c), trace all sh provider probes (sh*:::), and frequency count their probe names (@[probename] = count()).

So builtin-entry and builtin-return were called 5 times, one for each echo. The line probe was also called 5 times, once for each line that the shell executed.

Frequency counting builtins

Each probe provides useful information in their args, such as the script name, builtin or exec name, line number, etc. Here I'll use the builtin-entry probe to frequency count the builtin names,

That found that echo was called 5 times. Nothing suprising there, this is just confirming that the probes behave as they should.

Line number tracing

The line probe allows us to trace line execution. This one liner counts how many times each line was run,

Each of the lines 3 to 7 were called once, as expected. This one-liner should be useful for finding loop hotspots.

And the following one-liner traces line execution live,

Great, that works. Is this getting too easy? lets move on...

func_abc.sh

Now for somthing a little harder (and more interesting). In the following script, func_a() calls func_b() which calls func_c().

# cat -n func_abc.sh   
     1  #!./sh
     2
     3  func_c()
     4  {
     5          echo "Function C"
     6          sleep 1
     7  }
     8
     9  func_b()
    10  {
    11          echo "Function B"
    12          sleep 1
    13          func_c
    14  }
    15
    16  func_a()
    17  {
    18          echo "Function A"
    19          sleep 1
    20          func_b
    21  }
    22
    23  func_a
What order will the lines be executed now? Try thinking about it before checking the answer below.

Tracing line execution

There were some conspicuous one-second pauses during the output, which are sadly lost in this screen shot. I've added the word "pause" so that you know when that happened.

The order of the lines does make sense. What might be confusing is that we see the output text "Function B" followed by the trace of line 20 -- this is due to DTrace output buffering.

Tracing function flow - sh_flow.d

Since we have function-entry and function-return probes, we can trace function execution and indent the output based on the function depth. The following demonstrates sh_flow.d - a script which does just that for any script running on the system (with this sh provider enabled binary), and also traces builtin calls and command executions. sh_flow.d is run in one window, and func_abc.sh runs in another.

The legend is,

Great! the output makes it clear who is calling who.

The sh_flow.d script is,

Tracing function flow with times - sh_flowtime.d

A great feature of DTrace is the availability of high resolution timestamps. Here I've enhanced the previous script to print delta times,

The legend for understanding the delta times is, The legend is,

Hey - we found the latency in the script, it's those 1.02 second calls to "sleep"! :-)

Being able to present this information in such a compelling way is awsome stuff.

I won't include the source to that script just yet (this blog entry will get too long), but I'll encourage Alan to bundle these scripts along with the patch/binary of the sh provider (when he makes that available).

Tracing type times - sh_typetime.d

If your script is rather long, the above output from sh_flowtime.d may get rather long. This condensed summary may be more practical,

It's pretty obvious from that output where the time was spent.

I was experimenting with more exciting forms of that type of script,

Lots more will be possible - Alan only wrote the provider two days ago, so I haven't had much of a chance to write scripts yet!

Real world tracing

Alan literally wrote this provider from scratch in the last few days, so we haven't had a chance to point it at something big yet. Here's at least one example of something familiar, /etc/profile,

Cool. See Alan's blog for updates.



Posted by brendan [DTrace] ( August 10, 2007 12:23 PM ) Permalink | Comments[1]

Comments:

Great stuff ! Look forward to seeing more on this.

Posted by Jeroen on August 14, 2007 at 03:30 AM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed