/usr/sbin/wall -a Brendan Gregg
Brendan Gregg, Fishworks engineer

Friday Nov 03, 2006

I've just decreased my vim start up time by 340x, which makes for an interesting example of using DTrace:

I noticed that the vim text editor was taking longer and longer to fire up, and it was now taking up to 14 seconds for vim to initialise. Something is definitely wrong - and there should be some big game for DTrace to hunt down.

Syscall analysis is a good starting point, and so I run procsystime from the DTraceToolkit to show syscall timing information and counts:

# /opt/DTT/procsystime -a -n vim
Hit Ctrl-C to stop sampling...
^C

Elapsed Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack               1866
      sigpending               2046
[...]
           close           96958757
         pollsys         1694151254
       nanosleep        12098745228
          TOTAL:        14219906941

CPU Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack                382
      sigpending                420
[...]
         connect            2644571
           xstat            3809476
            open            9690091
          TOTAL:           26863724

Syscall Counts for processes vim,

         SYSCALL              COUNT
          access                  1
           chmod                  1
[...]
            mmap                 67
             brk                188
           xstat                213
          TOTAL:               1041

This output has been cut to fit. Reading from bottom up:

  • Syscall Counts show nothing obvious - 1041 syscalls isn't a lot; what is more interesting is their elapsed and on-CPU times -- for example, those 213 xstat()s may have blocked on a slow NFS share.
  • CPU Times in total are 26 ms - so they don't explain our long start up times.
  • Elapsed Times show a total of 14 seconds, 12 seconds of which are during nanosleep. This corresponds to the long delay experienced as vim loaded.

Next is to examine why vim is calling nanosleep. We can start by pulling out user stack traces when nanosleep is called, either from a DTrace one-liner or using dtruss from the DTraceToolkit:

# dtrace -n 'syscall::nanosleep:entry /execname == "vim"/ { @[ustack()] = count(); }'
dtrace: description 'syscall::nanosleep:entry ' matched 1 probe
^C


              libc.so.1`__nanosleep+0x7
              libX11.so.4`_X11TransConnectDisplay+0x6b8
              libX11.so.4`XOpenDisplay+0xbd
              libXt.so.4`XtOpenDisplay+0xa0
              vim`setup_term_clip+0x233
              vim`main+0xb5f
              vim`_start+0x80
               12

# /opt/DTT/dtruss -st nanosleep -n vim
PID/LWP   SYSCALL(args)                  = return
849553/1:  nanosleep(0x8046C50, 0x8046C58, 0xD9213FA8)           = 0 0

              libc.so.1`__nanosleep+0x7
              libX11.so.4`_X11TransConnectDisplay+0x6b8
              libX11.so.4`XOpenDisplay+0xbd
              libXt.so.4`XtOpenDisplay+0xa0
              vim`setup_term_clip+0x233
              vim`main+0xb5f
              vim`_start+0x80
[...]

This shows how nanosleep was being called by vim. It was being called from libX11 in functions such as _X11TransConnectDisplay() -- huh? I'm using vim in text mode, why would it be talking to X11?

Well, if I have an X11 issue, let me start by checking the DISPLAY variable:

$ echo $DISPLAY
:11.0

Ahh - that doesn't seem right. Maybe I'm connected via ssh, and we are doing X11 port forwarding?:

$ ptree $$
369710 screen
  369715 /bin/bash
    851597 ptree 369715

Rather than showing sshd in our ancestory, this has shown screen. The screen software provides text based multiple session recovery. When I initialised screen, I did so via an ssh session; now I'm connecting via the console, and that ssh session is long gone - as is the DISPLAY :11.0.

Setting DISPLAY to :0.0 fixes the issue, and now procsystime shows the following syscall elapsed times:

# /opt/DTT/procsystime -Te -n vim
Hit Ctrl-C to stop sampling...
^C

Elapsed Times for processes vim,

         SYSCALL          TIME (ns)
     sigaltstack               1980
      systeminfo               2221
[...]
          access            4190720
           xstat            7960249
            open           16273686
          TOTAL:           42198388

A total elapsed time of 42 ms is 340 times faster than it was previously!

... I didn't answer why vim was talking to X11 in the first place: in the previous stack trace, the vim function that calls X is setup_term_clip(). There are numerous things we can do with DTrace to learn more about this, but first I'll check the vim source to see if a comment explains what is going on. This is what I found:

   /* Start using the X clipboard, unless the GUI was started. */
[...]
      setup_term_clip()

Ahh - vim was trying to use the X clipboard, which was failing as our DISPLAY variable was corrupt.

Thursday Oct 05, 2006

Yesterday at the CEC 2006 conference, Bryan Cantrill and I gave a demonstration of a prototype DTrace tcp provider, which is one of a suggested collection of DTrace network providers. I've just put together a website to list the demonstrations performed - CEC 2006 DTrace TCP.

Below are a couple of examples from the website. tcpio2.d traces TCP traffic and provides IP address, TCP port, TCP payload size and TCP flag information:

# ./tcpio2.d
           LADDR:PORT                 RADDR:PORT   BYTES FLAGS
   192.168.1.185:79     <-    192.168.1.109:56145      0 (SYN)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (SYN|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      2 (PUSH|ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     ->    192.168.1.109:56145    126 (PUSH|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (FIN|ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (ACK)
   192.168.1.185:79     <-    192.168.1.109:56145      0 (FIN|ACK)
   192.168.1.185:79     ->    192.168.1.109:56145      0 (ACK)
^C

And tcpaccept2.d aggregates inbound connections by hostname and service name,

# ./tcpaccept2.d
dtrace: script './tcpaccept2.d' matched 1 probe
Tracing... Hit Ctrl-C to end.
^C
   HOSTNAME                   PORT        COUNT
   marlin6                    finger          1
   bass                       login           2
   marlin                     ssh             2
   marlin                     telnet          2
   marlin                     finger          3

The source to these scripts and numerous other demonstrations are on the website. Check DTrace discuss for discussions about this prototype provider.

Monday Sep 18, 2006

I've recently been writing a DTrace provider for JavaScript, using the user statically defined tracing interface (USDT). The codename for this project is Helper Monkey.

This was achieved by adding USDT probes to the Spider Monkey JavaScript Engine, which is used by the Mozilla family of browsers including Firefox. Since Solaris Nevada ships with Firefox, the DTrace JavaScript provider can be used by running a script that preloads a DTrace enhanced version of libmozjs.so, and runs the shipped version of Firefox.

So far I've added several probes to observe JavaScript function calls, object creation and destruction (garbage collection), and script execution. I've also written several DTrace scripts to report probe activity. While I have more probes planned, this provider has already reached a point where it may be useful to people, and so we are making the first (very alpha) version available.

DTrace meeting JavaScript may sound like an odd combination, especially if you grew impressions of JavaScript when it was first used to make buttons change colour, or to make temperature converters. But now JavaScript, as part of AJAX, is driving new and serious Web 2.0 applications. Google Maps is one of what may be many new (and very cool) applications to embrace these technologies.

Ways already exist to fetch debug information from JavaScript, such as running a debug build of the browser. However with DTrace people can run a normal browser build, with USDT probes, and dynamically enable debug information as needed. Without stopping or starting the browser. The following demonstrations should convey some of the advantages of DTracing JavaScript.


JavaScript: clock.html

The following is the source to clock.html, a simple website with some JavaScript code to print a clock with the format HH:MM:SS.

<HTML>
<HEAD>
<TITLE>Clock, JavaScript</TITLE>
<SCRIPT type="text/javascript">
function padZero(i)
{
        if (i < 10) return "0" + i
        return i
}

function startTime()
{
        var now = new Date
        var time = padZero(now.getHours()) + ":" +
            padZero(now.getMinutes()) + ":" +
            padZero(now.getSeconds());
        document.getElementById('clock').innerHTML = "time: " + time + "<br>"
        var timeout = setTimeout('startTime()', 1000)
}
</SCRIPT>
</HEAD>
<BODY onload="startTime()">
<DIV id="clock"></DIV>
</BODY>
</HTML>

While this clock is running in my browser, the following DTrace scripts are used for analysis.


DTrace/JavaScript: Function Calls

The js_funccalls.d DTrace script counts JavaScript function calls as it is running. Here I run it for 9 seconds then hit Ctrl-C,

# ./js_funccalls.d
Tracing... Hit Ctrl-C to end.
^C
 FILE                             FUNC                                    CALLS
 clock.html                       getElementById                              9
 clock.html                       getHours                                    9
 clock.html                       getMinutes                                  9
 clock.html                       getSeconds                                  9
 clock.html                       setTimeout                                  9
 clock.html                       startTime                                   9
 clock.html                       padZero                                    27

The FUNC column lists JavaScript functions, and the FILE column is where they were called from (their context). DTrace captured 9 updates to the clock, with most of the functions called 9 times. padZero() was called 27 times, as it is called three times during each clock update.

The source to js_funccalls.d is straightforward,

#!/usr/sbin/dtrace -Zs
/*
 * js_funccalls.d - measure JavaScript function calls using DTrace.
 *                  Written for the Solaris Nevada DTrace JavaScript provider.
 *
 * 12-Sep-2006, ver 0.90
 *
 * USAGE: js_calls.d            # hit Ctrl-C to end
 *
 * FIELDS:
 *              FILE            Filename that contained the function
 *              FUNC            JavaScript function name
 *              CALLS           Function calls during this sample
 */

#pragma D option quiet

dtrace:::BEGIN
{
        printf("Tracing... Hit Ctrl-C to end.\n");
}

javascript*:::function-entry
{
        @funcs[basename(copyinstr(arg0)), copyinstr(arg2)] = count();
}

dtrace:::END
{
        printf(" %-32s %-36s %8s\n", "FILE", "FUNC", "CALLS");
        printa(" %-32s %-36s %@8d\n", @funcs);
}

the probe javascript*:::function-entry fires when a JavaScript function begins to execute. There is another probe, javascript*:::function-return for when the function completes. The probe arguments are still being designed, for now arg0 is a the file pathname, arg1 is a classname (if available), and arg2 is the function name.


DTrace/JavaScript: Function Flow

Now we will use both the entry and return probes to trace the flow of the JavaScript, with indentation as we enter and exit functions. The js_funcflow.d script does this.

# ./js_funcflow.d
C TIME                  FILE                     -- FUNC
1 2006 Sep 18 12:03:28  clock.html               -> startTime
1 2006 Sep 18 12:03:28  clock.html                 -> getHours
1 2006 Sep 18 12:03:28  clock.html                 <- getHours
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getMinutes
1 2006 Sep 18 12:03:28  clock.html                 <- getMinutes
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getSeconds
1 2006 Sep 18 12:03:28  clock.html                 <- getSeconds
1 2006 Sep 18 12:03:28  clock.html                 -> padZero
1 2006 Sep 18 12:03:28  clock.html                 <- padZero
1 2006 Sep 18 12:03:28  clock.html                 -> getElementById
1 2006 Sep 18 12:03:28  clock.html                 <- getElementById
1 2006 Sep 18 12:03:28  clock.html                 -> setTimeout
1 2006 Sep 18 12:03:28  clock.html                 <- setTimeout
1 2006 Sep 18 12:03:28  clock.html               <- startTime
...

The above output is repeated every second, as the clock is updated. We can see that each of the functions is called within startTime, which was called by the timeout. The arrows denote as we enter (->) or leave (<-) functions.

While this output makes for a great demo, it may not be that practical for larger JavaScript applications, where the output can run into thosands of lines. The output can also be shuffled on multi-CPU clients if the JavaScript engine skips between CPUs; for this reason the first column has been printed, C for CPU-id, if this changes then the output may be out of order.

DTrace/JavaScript: Object Creation

The following DTrace script tallys newly created objects,

# ./js_objnew.d
Tracing... Hit Ctrl-C to end.
^C
 FILE                     LINENO  CLASS                                  COUNT
 clock.html               13      Date                                       5

I ran the script for five seconds then hit Ctrl-C. The output shows that 5 Date objects were created, caused by line 13 of clock.html. If you count to line 13 on clock.html listed earlier, you will see that it was "var now = new Date". Great!

DTrace/JavaScript: Object Leakage

Apart from trace object creation, we can also trace object destruction from the garbage collector. The js_objgcx.d script does this, providing tallys of the objects in memory and their origin, every five seconds.

# ./js_objgcx.d
Tracing... Hit Ctrl-C to end.

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                        5

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       10

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       15

 FILE                     LINENO  CLASS                                   TOTAL
 clock.html               13      Date                                       20
...

As objects are created, their tally is increased; as they are garbage collected, their tally is decreased.

Every five seconds we find that there is an additional 5 Date objects in memory, and another summary is printed out. At the end of this output there were 20 Date objects in memory. If I were to leave this running for a long time, eventually the garbage collector would kick in and trim the number down (to, say, 1000) - which we could see happen in our output. For more complex JavaScript which creates objects faster, this will naturally happen sooner.

If our script created objects that were not garbage collected, their count would continue increasing, and the browser would consume more and more RAM.

DTrace/JavaScript: Function Time

Since we can trace function entry and return, we can also measure the elapsed time for functions to execute. This can be useful to identify if there is a slow function whose code could be improved. The js_functime.d script provides several experimental metrics for function calls,

# ./js_functime.d 
Tracing... Hit Ctrl-C to end.
^C
ELAPSED TIME DISTRIBUTION,

  getSeconds                                        
           value  ------------- Distribution ------------- count    
            4096 |                                         0        
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           16384 |                                         0        

  padZero                                           
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@               23       
            8192 |@@@@@@@@@@@@@@                           13       
           16384 |                                         0        

  getElementById                                    
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  getHours                                          
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  getMinutes                                        
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
           65536 |                                         0        

  setTimeout                                        
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@                               3        
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           9        
          131072 |                                         0        

  startTime                                         
           value  ------------- Distribution ------------- count    
          524288 |                                         0        
         1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 12       
         2097152 |                                         0        

___ OVERLAP TIMES: ___                                     ______ ELAPSED _____
FILE                     FUNCTION                 COUNT    AVG(us)      SUM(us)
clock.html               getSeconds                  12          8          104
clock.html               padZero                     36          7          275
clock.html               getMinutes                  12         45          548
clock.html               getElementById              12         53          640
clock.html               getHours                    12         59          714
clock.html               setTimeout                  12         67          807
clock.html               startTime                   12       1219        14637

These function times are overlapping, such that the time fore startTime() contains both the startTime() instructions, and any sub functions called - such as padZero(), etc.

What I found interesting from the output was that getHours() (average 59 us) was much slower than getSeconds() (average 8 us); wouldn't they have equivalent times? Some digging with DTrace shows what other userland library functions occured during getHours() and getSeconds():

getHours()

DTracing PID: 278562, "firefox-bin"
CPU FUNCTION                                 
  0  -> js_Interpret                            getHours                         
  0    -> js_Invoke                             libmozjs.so.1                    
  0      -> JS_GetPrivate                       libmozjs.so.1                    
  0      <- JS_GetPrivate                       libmozjs.so.1                    
  0      -> js_ComputeThis                      libmozjs.so.1                    
  0      <- js_ComputeThis                      libmozjs.so.1                    
  0      -> date_getHours                       libmozjs.so.1                    
  0        -> date_getProlog                    libmozjs.so.1                    
  0          -> JS_InstanceOf                   libmozjs.so.1                    
  0          <- JS_InstanceOf                   libmozjs.so.1                    
  0        <- date_getProlog                    libmozjs.so.1                    
  0        -> DaylightSavingTA                  libmozjs.so.1                    
  0  -> PRMJ_DSTOffset                          libmozjs.so.1                    
  0    -> jsll_udivmod                          libmozjs.so.1                    
  0      -> CountLeadingZeros                   libmozjs.so.1                    
  0      <- CountLeadingZeros                   libmozjs.so.1                    
  0      -> norm_udivmod32                      libmozjs.so.1                    
  0      <- norm_udivmod32                      libmozjs.so.1                    
  0    <- jsll_udivmod                          libmozjs.so.1                    
  0    -> PRMJ_basetime                         libmozjs.so.1                    
  0      -> PRMJ_ToExtendedTime                 libmozjs.so.1                    
  0        -> PRMJ_LocalGMTDifference           libmozjs.so.1                    
  0          -> memset                          libc.so.1                        
  0          <- memset                          libc.so.1                        
  0          -> mktime                          libc.so.1                        
  0            -> ___errno                      libc.so.1                        
  0            <- ___errno                      libc.so.1                        
  0            -> getsystemTZ                   libc.so.1                        
  0              -> assert_no_libc_locks_held   libc.so.1                        
  0              <- assert_no_libc_locks_held   libc.so.1                        
  0              -> getenv                      libc.so.1                        
  0                -> findenv                   libc.so.1                  
...
[175 lines in total]

getSeconds()

DTracing PID: 278562, "firefox-bin"
CPU FUNCTION                                 
  0  -> js_Interpret                            getSeconds                       
  0    -> js_Invoke                             libmozjs.so.1                    
  0      -> JS_GetPrivate                       libmozjs.so.1                    
  0  <- JS_GetPrivate                           libmozjs.so.1                    
  0  -> js_ComputeThis                          libmozjs.so.1                    
  0  <- js_ComputeThis                          libmozjs.so.1                    
  0  -> date_getUTCSeconds                      libmozjs.so.1                    
  0    -> date_getProlog                        libmozjs.so.1                    
  0      -> JS_InstanceOf                       libmozjs.so.1                    
  0      <- JS_InstanceOf                       libmozjs.so.1                    
  0    <- date_getProlog                        libmozjs.so.1                    
  0    -> SecFromTime                           libmozjs.so.1                    
  0      -> floor                               libm.so.2                        
  0      <- floor                               libm.so.2                        
  0      -> fmod                                libm.so.2                        
  0      <- fmod                                libm.so.2                        
  0    <- SecFromTime                           libmozjs.so.1                    
  0    -> js_NewNumberValue                     libmozjs.so.1                    
  0    <- js_NewNumberValue                     libmozjs.so.1                    
  0  <- date_getUTCSeconds                      libmozjs.so.1                    
  0  -> PR_AtomicIncrement                      libnspr4.so                      
  0  <- PR_AtomicIncrement                      libnspr4.so                      
  0  -> memset                                  libc.so.1                        
  0  <- memset                                  libc.so.1                        
  0  <- js_Invoke                               libmozjs.so.1                    
  0  -> JS_GetPrivate                           libmozjs.so.1                    
  0  <- JS_GetPrivate                           libmozjs.so.1                    
  0  -> JS_GetStringBytes                       libmozjs.so.1                    
  0    -> js_GetStringBytes                     libmozjs.so.1                    
  0      -> PR_Lock                             libnspr4.so                      
  0        -> mutex_lock                        libc.so.1             
...
[59 lines in total]

Tracing getSeconds() produced 59 lines of output (from about 28 function calls), however getHours() produced 175 lines of output (from about 86 function calls). The output has been trimmed, but enough has been included to see the difference - getHours() looks up the timezone, getSeconds() does not!

Probes and Arguments so far

  • javascript*:::function-entry (filename, classname, funcname)
  • javascript*:::function-return (filename, classname, funcname)
  • javascript*:::object-create-start (filename, classname)
  • javascript*:::object-create (filename, classname, *object, lineno)
  • javascript*:::object-create-done (filename, classname)
  • javascript*:::object-finalize (NULL, classname, *object)
  • javascript*:::execute-start (filename, lineno)
  • javascript*:::execute-done (filename, lineno)

More probes will be added, and the arguments are likely to change (improve).

What's next

The DTrace scripts, the modified Spider Monkey source (for Firefox 1.5.0.4), and a precompiled Firefox 1.5.0.4 libmozjs.so for Solaris Nevada x86, are available in this tar file. This is very much an alpha release for the super-keen (you'll need the latest Solaris Nevada at least, and compilation skills); it even contains code comment tags for debugging that will be dropped ("BDG"). I'll be spending time finding out how best deliver this, such as making it easier for people to get running, and asking the Mozilla folks for help on how best to distrubite it.