I was working on an issue that wasn't reproducible at will but had a failure frequency low enough to dtruss.  The problem with this approach is that it can leave you with a large dataset.  Unless you're actively watching for errors in /var/adm/messages or application logs and can quickly cross-reference or mark the dtruss output the only option is to post-process the ELAPSD column and calculate where in the dtruss dataset you need to look.  Here lies the first issue.  By default dtruss doesn't provide the date & time when the dtruss was started.  Easily fixed by running "date; dtruss <options>" but not ideal.  The first enhancement I've added to Brendan's dtruss is to print a walltimestamp in the dtrace:::BEGIN to make it similar to the truss(1M) output.  The second enhancement I added is to embed the walltimestamp in the truss output as a new column, the first column, which makes finding the relevant bit of dtruss trivial. I've added the '-w' flag to dtruss, also active via the '-a' argument.

# USAGE: dtruss [-acdeflhoLsw] [-t syscall] { -p PID | -n name | command }
#
#               -p PID          # examine this PID
#               -n name         # examine this process name
#               -t syscall      # examine this syscall only
#               -a              # print all details
#               -c              # print system call counts
#               -d              # print relative timestamps (us)
#               -e              # print elapsed times (us)
#               -f              # follow children as they are forked
#               -l              # force printing of pid/lwpid per line
#               -o              # print on cpu times (us)
#               -s              # print stack backtraces
#               -L              # don't print pid/lwpid per line
#               -w              # print walltimestamp (system date/time) 
#               -b bufsize      # dynamic variable buf size (default is "4m") 

Here's a quick example of my dtruss running showing both enhancements:

$ pfexec  ./dtruss -a -p `pgrep nwamd`
dtruss started tracing at 2009 Mar 12 22:57:44 
TIMESTAMP              PID/LWP    RELATIVE  ELAPSD    CPU SYSCALL(args)          = return
2009 Mar 12 22:57:55    4310/11:        65 10010277     11 lwp_park(0x0, 0xFDFCEF3C, 0x0)         = -1 Err#62
2009 Mar 12 22:58:05    4310/11:       118 10009670     10 lwp_park(0x0, 0xFDFCEF3C, 0x0)         = -1 Err#62
^C

CALL                                        COUNT
lwp_park                                        3

You can download my updated version of dtruss here.  Enjoy.

Comments:

Post a Comment:
Comments are closed for this entry.

This blog copyright 2009 by Steve Scargall