Saurabh Mishra's Weblog

20050718 Monday July 18, 2005

Dtrace rocks...


Sometime back I had a problem with my desktop and as a result it started crawling whenever Java ticker used to kick in. I think I must share this with the rest of the world. I'd also share a kernel problem that we cracked and it was related to performance. So Dtrace has helped in solving many problems so far.

My desktop running Solaris 10 started crawling when I noticed that Xsun is eating up 68% of CPU. From prstat(1M)

# prstat
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   594 ******     85M   78M run     30    0  14:03:19  68% Xsun/1
   796 root       16M   13M sleep   59    0   1:06:25 5.8% stfontserverd/18
[.]

I then started Dtrac'ing Xsun and noticed that lwp_sigmask() syscall call is being made too frequently by Xsun. Here is the data :-

# ./syscall.d
^C
Ran for 26 seconds


  writev                                                         2832
  pollsys                                                        3261
  read                                                           5910
  doorfs                                                        27199
  lwp_sigmask                                                  217592

LWP ID     COUNT
1          217592

              libc.so.1`__systemcall6+0x20
              libc.so.1`pthread_sigmask+0x1b4
              libc.so.1`sigprocmask+0x20
              libc.so.1`sighold+0x54
              libST.so.1`fsexchange+0x78
              libST.so.1`FSSessionDisposeFontInstance+0x8c
             9063

              libc.so.1`__systemcall6+0x20
              libc.so.1`pthread_sigmask+0x1b4
              libc.so.1`sigprocmask+0x20
              libc.so.1`sigrelse+0x54
              libST.so.1`fsexchange+0xc0
              libST.so.1`FSSessionGetFontRenderingParams+0x8c

...and many more such stack traces from libST.so.1`fsexchange().

Infact the stack is like this:-

              libc.so.1`__systemcall6+0x20
              libc.so.1`pthread_sigmask+0x1b4
              libc.so.1`sigprocmask+0x20
              libc.so.1`sighold+0x54
              libST.so.1`fsexchange+0x90
              libST.so.1`FSSessionGetFontRenderingParams+0x8c
              libST.so.1`GetRenderProps+0x344
              libST.so.1`GlyphVectorRepQuery+0xf4
              libST.so.1`STGlyphVectorQuery+0xd0
              SUNWXst.so.1`_XSTUseCache+0x68

Notice that in this stack trace, we are calling sighold() and sigrelse() too frequently. So this process is disabling and enabling signals for some reason. Looks like we are rendering characters, but why do we block and unblock signals in this path?. Here is the Dtrace script which was used :-

#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        start = timestamp;

}

syscall:::entry
/execname == "Xsun"/
{
        @s[probefunc] = count();
}

syscall::lwp_sigmask:entry
/execname == "Xsun"/
{
        @c[curthread->t_tid] = count();
        @st[ustack(6)] = count();
}

END
{
        printf("Ran for %d seconds\n\n", (timestamp - start) / 1000000000);

        trunc(@s,5);
        printa(@s);

        printf("\n%-10s %-10s\n", "LWP ID", "COUNT");
        printa("%-10d %@d\n", @c);

        printa(@st);
}

In fact Dtrace could help us in solving much more complex problems. Happy Dtrac'ing...
(2005-07-18 19:44:08.0) Permalink

Comments:

Post a Comment:

Comments are closed for this entry.

Locations of visitors to this page
archives
links
referers