Tom Erickson's Weblog
Wednesday January 24, 2007
Chime and the DTraceToolkit, Part 2
In my previous entry, Chime and the DTraceToolkit, I demonstrated how to adapt a script from the DTraceToolkit for display in Chime (including screenshots). I chose a simple script, bitesize.d, and deferred the investigation of how easily Chime could run more complicated scripts from the toolkit.
Before I pick up that investigation again (this time using procsystime), I want to highlight a point from the previous entry that may have gone unnoticed in the larger discussion. The point is that Chime can display DTrace programs directly from the command line, provided that they satisfy a few simple requirements. You can try this with many of the OneLiners from Brendan Gregg's DTrace Tools page, for example:
# Read bytes by process,
dtrace -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
To run with Chime, simply change dtrace to chime:
/opt/OSOL0chime/bin/chime -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'
You can use -n to specify a program string and -s to specify a program file, just as you would with dtrace(1M). Similarly, you can specify DTrace options using -xoption=value and -Z. The column headers are taken from identifiers in your D program unless you specify headers with the -h option. You can specify a title for the title bar other than "Display" using the -t option. The previous entry has more detail on how to improve the display (and how to save it with the -w option).
Adapting a Shell Script
Most of the DTrace programs used in the toolkit are wrapped in shell scripts that offer extensive options. The iosnoop script, for example, has the following options:
# USAGE: iosnoop [-a|-A|-DeghiNostv] [-d device] [-f filename]
# [-m mount_point] [-n name] [-p PID]
Since the DTrace API cannot compile shell scripts, Chime cannot run them as-is. The DTrace program first needs to be removed from its wrapper. What I found is that the various "snoop" scripts (iosnoop, execsnoop, opensnoop, etc.) are not good candidates for adaptation to Chime because they do not use aggregations and instead rely on printf(). In a snoop, every probe firing is appended to the output. In Chime, that would translate into an event-driven display rather than a regular sampling of the aggregate. The Java DTrace API provides a ConsumerListener interface to listen for ProbeData in a DataEvent; the formatted elements of a DTrace printf() statement can thus be obtained from the getRecords() method of a PrintfRecord included in the probe data. The point is that it's certainly possible to add an event-driven display module to Chime, but currently Chime does not support printf(). Often it makes sense to change a script that relies on printf() so that it aggregates its data instead, but I don't think that's the case with any of the snoop scripts.
If the shell script wraps a DTrace program that does use aggregations, you still have to look at each shell option one by one and decide how it will be supported in Chime. As an example, I've done this with procsystime, a script that supports the following options:
# USAGE: procsystime [-acehoT] [ -p PID | -n name | command ]
#
# -p PID # examine this PID
# -n name # examine this process name
# -a # print all details
# -c # print syscall counts
# -e # print elapsed times
# -o # print CPU times
# -T # print totals
I decided that the -a, -c, -e, and -o options could all be dropped, since there is no reason for Chime not to display all the fields at once. When running procsystime -a on the command line, syscall counts, elapsed times, and CPU times are displayed in separate output blocks stacked vertically, so that only syscall counts are visible without scrolling up to see the aggregated times. Chime will improve on this by placing the three value types side-by-side in a row for each process, with column headers that never scroll out of view.
Similarly, I decided to drop the -T option, since I saw no reason not to display totals every time.
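To picture the side-by-side layout, here is a rough shell sketch. It is not how Chime builds its table; it only merges three separate blocks of values into one row per system call. The function name side_by_side and the "kind syscall value" input format are assumptions made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Chime or the DTraceToolkit):
# read "kind syscall value" lines, where kind is count, elapsed, or cpu,
# and print one row per syscall with the three values side by side.
side_by_side() {
    awk '
        $1 == "count"   { count[$2] = $3;   seen[$2] = 1 }
        $1 == "elapsed" { elapsed[$2] = $3; seen[$2] = 1 }
        $1 == "cpu"     { cpu[$2] = $3;     seen[$2] = 1 }
        END {
            # Missing values print as 0.
            for (name in seen)
                printf "%s %d %d %d\n", name, count[name], elapsed[name], cpu[name]
        }
    ' | sort    # awk iteration order is unspecified, so sort by syscall name
}
```

Feeding it the three stacked blocks in any order yields rows of the form "syscall count elapsed cpu", which is the shape of the table Chime displays.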
That left the three mutually exclusive options: [ -p PID | -n name | command ]. Without a shell wrapper, the only way to support these in Chime is with macro arguments for the -p and -n options, and target process for the command option. The basic idea is to replace shell script option assignments with macro argument assignments. So, from procsystime:
/*
* Command line arguments
*/
inline int OPT_elapsed = '$opt_elapsed';
inline int OPT_cpu = '$opt_cpu';
inline int OPT_counts = '$opt_counts';
inline int OPT_filter = '$opt_filter';
inline int OPT_pid = '$opt_pid';
inline int OPT_name = '$opt_name';
inline int OPT_totals = '$opt_totals';
inline int OPT_command = '$opt_command';
inline int PID = '$pid';
inline string NAME = "'$pname'";
inline string COMMAND = "'$command'";
we change the assignments as follows in procsystime.d as it will be used by Chime:
inline string NAME = $$1;
inline int PID = $2;
inline int OPT_pid = (PID > 0);
inline int OPT_name = (NAME != "");
inline int OPT_command = ($target > 0);
inline int OPT_filter = (OPT_pid || OPT_name || OPT_command);
The -a, -c, -e, -o, and -T (totals) options are dropped. I chose not to tab-align the assignment operators, since the alignment does not display correctly in Chime's program viewer (in spite of monospace font). I think that inlining the cacheable expressions should be more efficient than assigning them to non-cacheable global variables in the BEGIN clause, but I may be missing a more efficient way to do this.
Differences such as the removal of the END clause along with its printa() statements were already explained in the previous entry; I won't repeat them here. The unwrapped program is now ready for display in Chime:
/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time"
The above command generates a draft display (I could have omitted the -h and -t options for this first cut). The most important part of the command to note here is the -xdefaultargs option. This sets the "defaultargs" DTrace compile-time option, which tells DTrace to use zero (0) or empty string ("") as the value for unspecified macro args. Since PID, NAME, and COMMAND are mutually exclusive, we need to leave all but one unspecified. Without -xdefaultargs, that would fail with the following error:
failed to compile script /opt/OSOL0chime/displays/new/procsystime.d: line 44: macro argument $$1 is not defined
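The effect of defaultargs on the inlined options can be modeled in a few lines of shell. This is only a sketch of the logic in the D assignments above, not D itself: the function filter_flags is hypothetical, and an omitted or empty shell argument stands in for an unspecified macro argument (empty string for $$1, zero for $2 and $target).

```shell
#!/bin/sh
# Hypothetical model of the inline assignments in the adapted procsystime.d.
# Arguments: NAME ($$1), PID ($2), target process ID ($target).
# Omitted or empty arguments take the defaultargs values:
# "" for the string and 0 for the integers.
filter_flags() {
    name=${1:-}
    pid=${2:-0}
    target=${3:-0}

    opt_pid=0
    if [ "$pid" -gt 0 ]; then opt_pid=1; fi

    opt_name=0
    if [ -n "$name" ]; then opt_name=1; fi

    opt_command=0
    if [ "$target" -gt 0 ]; then opt_command=1; fi

    # OPT_filter = (OPT_pid || OPT_name || OPT_command)
    opt_filter=0
    if [ $((opt_pid + opt_name + opt_command)) -gt 0 ]; then opt_filter=1; fi

    echo "OPT_pid=$opt_pid OPT_name=$opt_name OPT_command=$opt_command OPT_filter=$opt_filter"
}
```

Calling filter_flags with no arguments leaves every flag at 0, which corresponds to tracing without a filter; supplying exactly one of the three sets that flag and OPT_filter.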
Specifying Macro Arguments
Chime reads the macro variables in your program and generates a dialog to prompt for the argument values:
You can specify labels for the macro variables using the -M option, potentially sparing people the effort of inspecting your program to decide what kind of values are expected:
/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \
-M "Process Name, Process ID"
To enforce the mutual exclusivity of these options, surround the macro argument labels after -M with square brackets:
/opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \
-M "[Process Name, Process ID, \$target]"
In order to group all three options, it is necessary to include the special string "$target" (escaping the dollar-sign), which indicates the order of the Target Process field relative to the named macro arguments. If the $target variable appears in your DTrace program, and you omit "$target" after -M, the Target Process appears first by default. The above command instead places it last (just for demonstration purposes):
Now it is only possible to specify one of the three options, as intended in the original procsystime script. The first field (in the order specified by -M) is selected initially. The remaining unselected fields are disabled.
It is possible to bracket multiple groups of argument labels after -M, resulting in multiple radio button groups. The first label in each group is selected by default. Without the square brackets, Chime will not generate radio buttons, allowing any or all of the arguments to be specified. (Macro argument grouping can also be done in Chime's display creation wizard.)
Without the -xdefaultargs option, Chime will require all options to be specified before enabling the OK button.
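The bracket grouping described above can be sketched as a small shell function. This is not Chime's parser; it is a hypothetical illustration (the name macro_groups and the output words "selected", "disabled", and "free" are my own) of how a -M label string splits into radio button groups, with the first label of each group selected and ungrouped labels left free.

```shell
#!/bin/sh
# Hypothetical illustration of -M label grouping (not Chime's code).
# Prints one line per label: "<group> <state> <label>".
# Group 0 means the label is outside any square brackets.
macro_groups() {
    printf '%s\n' "$1" | awk '
    {
        group = 0       # current group number; 0 = not inside brackets
        ngroups = 0     # number of bracketed groups seen so far
        n = split($0, parts, ",")
        for (i = 1; i <= n; i++) {
            label = parts[i]
            opens = (label ~ /\[/)
            closes = (label ~ /\]/)
            gsub(/\[|\]/, "", label)              # strip the brackets
            gsub(/^[ \t]+|[ \t]+$/, "", label)    # trim surrounding blanks
            if (opens) { group = ++ngroups; first = 1 }
            state = "free"
            if (group > 0) {
                state = first ? "selected" : "disabled"
                first = 0
            }
            printf "%d %s %s\n", group, state, label
            if (closes) group = 0
        }
    }'
}
```

For example, macro_groups '[Process Name, Process ID, $target]' reports all three labels in group 1 with Process Name selected, while the unbracketed string 'Process Name, Process ID' reports both labels as free.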
If the macro argument labels are not enough to make clear what is expected, a person can still inspect the DTrace program by clicking the View D Program button:
The procsystime Display
After selecting Target Process in the argument prompt and clicking OK (accepting the "date" value already specified with -c on the command line), the following display appears:
This is a rough first cut, with many opportunities for improvement:
- Add a totals row (to satisfy the original -T option)
- Link the ranges of the Elapsed Time and CPU Time columns so they are visually comparable
- Add unit labels to the nanosecond time values
- Use a different color for the Count column, to differentiate it from the time columns
- Set the initial sort to Elapsed Time descending
- Make the window a little wider
As with bitesize.d, we'll also want to accumulate values over time, rather than clearing them after each time interval. These changes are not yet supported from the command line, so we need to save what we've done so far by repeating the command with the -w option:
; /opt/OSOL0chime/bin/chime -c 'date' -s procsystime.d -xdefaultargs \
-h "System Call, Count, Elapsed Time, CPU Time" -t "Process System Call Time" \
-M "[Process Name, Process ID, \$target]" -w
Wrote /opt/OSOL0chime/displays/new/process_system_call_time.xml
To run, enter "/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/process_system_call_time.xml".
This writes the .xml display description that we can edit using Chime's display wizard. Both the previous entry and the New Display Wizard page give examples of how to make these kinds of changes, so I won't go through the steps here. Instead I'll skip to the final display after all the changes are finished:
I also took some text from the comments in the procsystime script to put in the Description pane when the new display is selected:
The description helps people understand what they're looking at and remains visible while the display is running as long as the new display is selected.
To try the procsystime display for yourself, add the following files to /opt/OSOL0chime/displays/new:
Then run /opt/OSOL0chime/bin/chime without any options and select New Displays from the Trace Group pulldown. Finally, double-click Process System Call Time ... from the Traces list. Macro argument grouping (for mutually exclusive options) is a new feature in Chime version 1.4.20 (16 Jan 2007), so if you don't already have that version, you can download it here.
( Jan 24 2007, 11:20:43 PM PST / Jan 24 2007, 11:20:43 PM PST )
Trackback: http://blogs.sun.com/tomee/entry/chime_and_the_dtracetoolkit_part
Thursday January 04, 2007
Chime and the DTraceToolkit
A while back someone suggested in the "Chime Comments/Questions" thread on the DTrace discussion list:
"Maybe we can all chip in money and beer (or pop if he doesn't drink alcohol) to get Brendan to integrate his toolkit with chime. :)"
I've been meaning to look into this myself. Chime is a graphical tool for displaying DTrace aggregations. Its ability to sort by multiple columns, display column totals, and plot aggregation values over time might provide useful advantages over command-line output for many of the scripts in the toolkit. I wanted to see if the idea of integrating the toolkit is feasible.
Generating a Display from the Toolkit
First, I downloaded and installed the DTraceToolkit. Next I scanned Brendan's DTrace Tools page for a simple script and found bitesize.d:
"a simple program to examine the way in which processes use the disks ..."
This sounded like a good one to start with, so I located bitesize.d in the toolkit:
; find . -name bitesize.d
./DTraceToolkit-0.96/Bin/bitesize.d
./DTraceToolkit-0.96/Disk/bitesize.d
;
I diff'd the files and found that they are identical, so I ran the one in Bin to get an idea of what Chime ought to display:
; ./Bin/bitesize.d
Tracing... Hit Ctrl-C to end.
^C
PID CMD
;
Nothing. Of course, I needed to generate some I/O in order to get any data. So I ran it again, and this time I wrote some bytes to a file in another window:
; echo "cat dog mouse" > tmp.txt
;
then I pressed Ctrl-C:
; ./Bin/bitesize.d
Tracing... Hit Ctrl-C to end.
^C
     PID CMD
       3 fsflush\0
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
            1024 |                                         0
       0 sched\0
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@                     1
            8192 |@@@@@@@@@@@@@@@@@@@@                     1
           16384 |                                         0
;
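The layout of the output above matches DTrace's quantize() aggregating function, which counts values in power-of-two buckets; each row's value is the lower bound of its bucket. Assuming that is what bitesize.d uses, the bucket for a given I/O size can be sketched in shell as follows (the helper name pow2_bucket is hypothetical, and the sketch handles positive integers only):

```shell
#!/bin/sh
# Hypothetical helper: print the power-of-two bucket that a positive
# integer falls into, that is, the largest power of two not exceeding it.
pow2_bucket() {
    value=$1
    bucket=1
    # Keep doubling while the next power of two still fits under the value.
    while [ $((bucket * 2)) -le "$value" ]; do
        bucket=$((bucket * 2))
    done
    echo "$bucket"
}
```

So an I/O of exactly 512 bytes lands in the 512 row, and an I/O of 5000 bytes would land in the 4096 row.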
Without even looking at the script, I could see that it's a distribution of I/O size in bytes aggregated by process ID and command. Chime has -n and -s options like those in dtrace(1M) for running arbitrary DTrace programs, so I used the -s option to specify the program file:
; /opt/OSOL0chime/bin/chime -s ./Bin/bitesize.d
WARNING: printa on line 79 makes aggregation @Size unavailable to Chime.
;
An empty Chime window accompanied the console warning. No problem. This issue with printa() is covered on the Chime project page under Adding New Chime Displays (a link in the left margin) in the section Adapting an Existing DTrace Program, item 2:
If you use the printa() action on an aggregation, that aggregation will thereafter be unavailable to Chime. Typically, printa() is found in a profile probe such as tick-1sec. You should delete these tick clauses from your program. Instead of printing your aggregations, you are now relying on Chime to display them.
So I made a copy of bitesize.d with the following changes:
; diff ./Bin/bitesize.d ~/bitesize.d
48a49
> * 02-Jan-2007 Tom Erickson Adapted for use in Chime.
51,52d51
< #pragma D option quiet
<
54,61d52
< * Print header
< */
< dtrace:::BEGIN
< {
< printf("Tracing... Hit Ctrl-C to end.\n");
< }
<
< /*
71,79d61
< }
<
< /*
< * Print final report
< */
< dtrace:::END
< {
< printf("\n%8s %s\n", "PID", "CMD");
< printa("%8d %S\n%@d\n", @Size);
;
There is no need to print anything in the BEGIN and END clauses, so I deleted them. Chime will display the aggregated distribution on a regular interval. That interval will default to 1 second, but Chime lets you change the interval while the display is running. (It's also possible to set the initial interval by specifying the aggrate option in the display's XML description.)
It would be nice if Chime could run bitesize.d without modifications. If there were a DTrace option to ignore specified actions, Chime could tell DTrace to ignore both printa() and clear(), but for now you need to delete these actions from existing DTrace programs.
So I tried again with the modified bitesize.d:
; /opt/OSOL0chime/bin/chime -s ~/bitesize.d
Although the console warning about printa() was gone, I still got the same empty Chime window. Of course, the reason is that I needed to generate some I/O in order to get any data (just as I do when running the script on the command line). Yes, Chime should print a message to that effect on the console. I'll add that soon. (When a colleague of mine is asked for work status he responds, "It's perfect and getting better every day.") Fortunately, Chime has the option to wait a given number of seconds before generating the display, just for cases like this, giving time to generate I/O in another window:
; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10
By default, Chime waits one second before generating the display. Waiting ten seconds lets us run the following in another window, as before:
; echo "cat dog mouse" > tmp.txt
;
As I write this, I realize that a better solution might be to drop the -S option and instead make Chime wait as long as it needs for non-empty aggregations before generating the display (although it would need some way to know which aggregations to wait for); but for now, a ten second countdown on the console gets the job done. This time a working display appeared:
The generated column names are taken from the DTrace program (except for "bucket" over the I/O size buckets, which lack a name within the program). Also, the default title "Display" in the title bar isn't very helpful. Still, this is a good start, and we can easily improve on it:
; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10 -h "PID,Command,Bytes,Count" \
-t "I/O Size Distribution"
This looked good, so I ran it once more with the -w option to write the .xml display description:
; /opt/OSOL0chime/bin/chime -s ~/bitesize.d -S 10 -h "PID,Command,Bytes,Count" \
-t "I/O Size Distribution" -w
0
Wrote /opt/OSOL0chime/displays/new/bitesize.d
Wrote /opt/OSOL0chime/displays/new/i_o_size_distribution.xml
To run, enter "/opt/OSOL0chime/bin/chime -C /opt/OSOL0chime/displays/new/i_o_size_distribution.xml".
;
Chime prints a message telling you how to run the generated display. The -S option to delay a given number of seconds will no longer be necessary, since the display is already generated (it no longer needs data to generate columns and column headers) and will, from now on, run successfully even before any data appears.
Options
Chime also borrows -Z and -x from dtrace(1M) to specify DTrace options for the generated display. For example, if the display were failing due to drops, you could add -xaggsize=4m to increase the size of the aggregation buffer (from Chime's default 256 KB). If you wanted to change the initial interval from one second to five seconds, you would specify -xaggrate=5s.
To make further changes to the display not supported on the command line, you can edit the XML description in Chime's New Display Wizard. For example, bitesize.d does not use a tick-1sec clause to print data on a regular interval, but instead prints the accumulated data once at the very end. This probably indicates that a display of running totals would more closely match the intention of the script than a display of values per time interval. To access the wizard, run Chime without any options and click the icon in the toolbar:
Click the "Browse ..." button and select i_o_size_distribution.xml to load the generated display description. Then click "Next" at the bottom of the wizard twice to reach the "Set Cleared Aggregations" step:
Click "Clear only selected aggregations" and leave the "@Size" aggregation unchecked. This will result in running totals, since the aggregation values will never get reset to zero. Then click "Next" at the bottom of the wizard three more times to reach the "Provide a Description" step:
The above description is taken (mostly unchanged) from the comments in the original bitesize.d file. Enter the description and click "Finish".
To run the display, first select "New Displays" from the "Trace Group" pulldown:
Select "I/O Size Distribution" (the display title specified with -t on the command line) to view its description:
Double-click "I/O Size Distribution" to run the display:
Now the counts grow over time, just like they do when running the script on the command line, except that we can see them grow instead of blindly picking the instant for the final viewable result (the Chime display can also be paused at any time).
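The difference between clearing an aggregation after each interval and letting it accumulate can be sketched in shell. This is only an illustration with made-up input, not Chime code; the helper name interval_and_total is hypothetical. The first column is what a display that clears each interval would show, and the second is the running total you get when @Size is never cleared.

```shell
#!/bin/sh
# Hypothetical helper: read one per-interval count per line and print
# "<per-interval value> <running total>" for each interval.
interval_and_total() {
    awk '{ total += $1; printf "%d %d\n", $1, total }'
}
```

For example, piping the per-interval counts 1, 2, and 3 (one per line) through interval_and_total gives running totals of 1, 3, and 6.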
One script from the DTraceToolkit is now integrated with Chime. Granted, bitesize.d is probably the simplest script in the toolkit (other than the one-liners, which you can try as-is with Chime's -n option, except for a few that don't use aggregations; for example /opt/OSOL0chime/bin/chime -n 'sysinfo:::readch { @bytes[execname] = sum(arg0); }'). The other scripts I looked at will not integrate so easily, since they are shell scripts with extensive options and many do not use aggregations. I'm thinking that some will need custom visualization tools to do them justice. I still think that many of them can be adapted in a way that preserves the original intention and increases usefulness, but the process will be more involved (for example, deciding if changing a script to use aggregations defeats the original performance considerations, or deciding which options are supportable by equivalent features in Chime, or which options justify multiple displays, etc.). Still I hope that this bitesize.d example gives a good idea of how to go about integrating an existing script with Chime.
After integrating a few more scripts from the toolkit, the next thing to do will be to create a separate "DTraceToolkit" subdirectory for them under /opt/OSOL0chime/displays, move the appropriate .xml and .d files from /opt/OSOL0chime/displays/new to the new subdirectory, and add a description.xml file to summarize the directory contents. Thereafter the group of displays can be selected from the "Trace Group" pulldown and a description of the group will appear in the "Description" pane. If many scripts from the toolkit are integrated, they can be organized in a subtree under the "DTraceToolkit" directory just as they are in the toolkit itself. Chime automatically generates submenus under the "Load" menu (under "File" in the menu bar) that mirror the directory subtree under /opt/OSOL0chime/displays. The "Trace Group" pulldown provides a flat list of the trace groups most recently loaded (remembered across multiple Chime sessions).
It would be nice if Chime could manage display directories for you, but for now you still need to create the directories manually. That's on my to-do list, along with a text pane to write the directory description (for now, you'll need to copy an existing description.xml file into each new directory and modify it).
Let me know if there's a particular script that you'd like to see integrated. If you're interested in integrating a script on your own, excellent! You are welcome to post it on the DTrace mailing list (that is the official discussion list for the Chime project).
( Jan 25 2007, 05:32:21 PM PST / Jan 04 2007, 04:04:36 PM PST )
Trackback: http://blogs.sun.com/tomee/entry/chime_and_the_dtracetoolkit