I've recently extended DTrace's inline facility,
so it seems like a good time to go back and review some of the more advanced features of
DTrace's D language, and how they are used to make observing the system easier for DTrace users.
This entry is a bit long, but if you hang in there you'll be rewarded with a peek at a new DTrace feature
that is headed for Solaris Express.
Translators
Early in DTrace's development, once Bryan and I had assembled the nascent DTrace prototype far enough to be able to locate probes, trace data, and execute simple D expressions, it was obvious that even this early stage of DTrace was an incredibly powerful kernel debugging tool. For fun and posterity, here is one of the earliest known actual D programs at work once the compiler, tracing framework, and access to kernel types had been connected together:
From: Bryan Cantrill <bmc@eng.sun.com>
Subject: Leaving now...
To: mws@eng.sun.com (Michael Shapiro)
Date: Tue, 12 Feb 2002 17:47:36 -0800 (PST)
But this is pretty hot:
# dtrace -f 'bcopy/(arg2 > 1000) &&
(curthread->t_procp->p_cred->cr_uid == 31992)/{stack(20)}'
dtrace: 2 probes enabled.
CPU ID FUNCTION:NAME
1 8576 bcopy:entry
genunix`getproc+0x4f8
genunix`cfork+0x64
...
And while this was indeed hot (and still is), you can immediately see how the relationship between the question ("What are the stack traces of all bcopy() calls performed on behalf of user Bryan of length greater than 1000 bytes?") and its realization in D requires knowledge of the Solaris kernel implementation (i.e. that a kthread_t has a proc_t pointer, which has a cred_t pointer, which contains the UID of the user associated with that process). So while great for us kernel programmers, this immediately presented two challenges for us to grapple with in making DTrace more accessible:
- How can we allow administrators and developers to express concepts that they readily understand,
like the idea that a process has a particular UID associated with it, without requiring them to
understand how those concepts are implemented?
- How can we allow DTrace users to write programs that continue to work as the implementation of
these concepts changes over time inside of Solaris?
To address these two issues in DTrace, we created the notion of a translator. A translator is a collection of D assignment statements provided by the supplier of an interface that can be used to translate an input expression into an object of struct type. Like any D statements, the body of the translator can refer directly to kernel types and kernel global data structures, as well as other DTrace variables. If you're familiar with object-oriented programming, you can imagine a translator sort of like a class that implements a bunch of "get" methods (of course, we don't have functions in D since we can't allow recursion). Translator definitions correspond to the implementation of some piece of software, like a part of the kernel, but they yield a struct that is in effect a stable interface to that software.
For example, DTrace provides a translator from a kernel thread pointer such as the built-in curthread variable to the /proc lwpsinfo structure. This structure is well-defined and documented in proc(4), and is what you get if you read the file /proc/pid/lwp/lwpid/lwpsinfo on your Solaris system. Here is an excerpt of the translator definition, which is delivered to you in the file /usr/lib/dtrace/procfs.d:
translator lwpsinfo_t < kthread_t *T > {
...
pr_syscall = T->t_sysnum;
pr_pri = T->t_pri;
pr_clname = `sclass[T->t_cid].cl_name;
...
};
As you can see, each statement in the translator body is in effect an expression that can be inlined by the D compiler to produce the value of that member when it is referenced. For example, the pr_clname field represents the idea that every Solaris LWP has an associated scheduling class with a well-defined name (e.g. TS=timeshare, RT=real-time) that you can pass to commands like priocntl(1) using the -c option. To retrieve the string, you take the class ID, an integer index into the kernel's sclass array, and then grab the name from the contents of that array. The translator isolates the DTrace programs that you write from that implementation detail, so that if we were to, say, delete sclass and replace it with a hash table, you could still reliably use pr_clname in DTrace on various versions of Solaris and get the same result.
Inlines
To use a translator in D, you apply the xlate operator to an input expression and specify an output type of either the desired structure or a pointer to it, as shown in the following example:
io:::wait-start
{
printf("%s tid %d waiting for i/o, class=%s\n",
execname, tid, xlate<lwpsinfo_t>(curthread).pr_clname);
}
Here we translate curthread to retrieve pr_clname and record the scheduling class of every thread that blocks on I/O. The results of running this for a few seconds on my desktop look like this:
dtrace: script '/dev/stdin' matched 1 probe
CPU ID FUNCTION:NAME
1 2053 biowait:wait-start sched tid 0 waiting for i/o, class=SYS
1 2053 biowait:wait-start cat tid 1 waiting for i/o, class=TS
^C
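To make the shape of a translator concrete outside of procfs.d, here is a minimal sketch of a user-defined one. The struct procsum type, its members, and the translator itself are hypothetical names invented for illustration. The kernel members p_pidp->pid_id and p_cred->cr_uid are assumed to match your Solaris build, which is exactly the kind of implementation detail a translator is meant to hide.

```d
/* Hypothetical stable view of a process; not part of DTrace itself. */
struct procsum {
	pid_t ps_pid;	/* process ID */
	uid_t ps_uid;	/* effective user ID */
};

/* Implementation side: assumes these kthread_t/proc_t members exist. */
translator struct procsum < kthread_t *T > {
	ps_pid = T->t_procp->p_pidp->pid_id;
	ps_uid = T->t_procp->p_cred->cr_uid;
};

/* Consumer side: only the names in struct procsum are referenced. */
syscall::write:entry
/xlate <struct procsum> (curthread).ps_uid == 0/
{
	printf("%s pid %d wrote as root\n", execname,
	    xlate <struct procsum> (curthread).ps_pid);
}
```

If the kernel later stored credentials differently, only the translator body would need to change; the probe clause would stay the same.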
While using the xlate operator directly is fun (for me, anyway), DTrace also provides an inline facility that makes D programs that use translators easier to read and write. An inline is the declaration of a typed identifier that is replaced by the compiler with the result of an expression whenever that identifier is referenced somewhere else in the program. This is more powerful than simple lexical substitution like the sort provided by C's #define, as we'll see in a moment. Here are some example inline declarations:
inline int c = 123;
inline uid_t uid = curthread->t_procp->p_cred->cr_uid;
Once declared, inlines can be used anywhere as if they were variables provided for you by DTrace. We can also use inlines to substitute translator expressions, which allows us to connect together all of the ideas discussed so far. For example, DTrace provides a built-in curlwpsinfo variable to let you access all of the process model information for the current LWP. This variable is not a variable at all, but instead the following inline provided for you by /usr/lib/dtrace/procfs.d:
inline lwpsinfo_t *curlwpsinfo = xlate <lwpsinfo_t *> (curthread);
So using the inlines and translators provided for you by DTrace, you can rewrite the previous example like this, using only the stable interfaces defined in proc(4):
io:::wait-start
{
printf("%s tid %d waiting for i/o, class=%s\n",
execname, tid, curlwpsinfo->pr_clname);
}
Together, inlines and translators let us provide stable representations of Solaris kernel interfaces in a form that resembles a Solaris administrative or user-programming concept that is already well-understood, while allowing us to continue to evolve the Solaris implementation underneath.
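With these pieces in hand, the question from the 2002 e-mail can be restated without walking kernel pointers by hand. This is a sketch, not something from the original message: it assumes the curpsinfo inline and the pr_euid member of psinfo_t described in proc(4), neither of which is shown above, and the inline names big and who are made up for the example.

```d
/* Hypothetical helper inlines; 31992 is the UID from the e-mail above. */
inline int big = 1000;
inline uid_t who = 31992;

/*
 * bcopy() calls longer than 'big' bytes made on behalf of user 'who'.
 * curpsinfo->pr_euid is assumed to correspond to cr_uid in the kernel's
 * cred_t, reached through a translator rather than directly.
 */
fbt::bcopy:entry
/arg2 > big && curpsinfo->pr_euid == who/
{
	stack(20);
}
```

The predicate now names only the stable psinfo_t member, so the chain from kthread_t to proc_t to cred_t no longer appears in the program.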
Observing File Descriptors
I recently added an extension to the inline facility in DTrace to permit inlines to define identifiers that act like D associative arrays, instead of scalar variables similar to the examples in the previous section. Everything from this point forward will be available in Build 16 of Nevada (aka the next Solaris release) which you will be able to download here at some point in the future. We'll likely backport this feature to a Solaris 10 Update later this year as well. To create an inline that acts like an associative array using the new DTrace feature, you can use a declaration like this:
inline int a[int x, int y] = x + y;
Given this definition, a reference to the expression a[1, 2] would be as if you typed 3 in your program. Using this new facility, I've added an fds[] array to DTrace that returns information about the file descriptors associated with the process corresponding to the current thread. The array's base type is the fileinfo_t structure already used by DTrace's I/O provider, with a new member for the open(2) flags. Here's an example of fds[] in action:
$ dtrace -q -s /dev/stdin
syscall::write:entry
/ execname == "ksh" && fds[arg0].fi_oflags & O_APPEND /
{
printf("ksh %d appending to %s\n", pid, fds[arg0].fi_pathname);
}
^D
If I run this command on my desktop and start typing commands in another shell, I see output like this:
ksh 127453 appending to /home/mws/.sh_history
ksh 127453 appending to /home/mws/.sh_history
...
That is, given a file descriptor specified as an argument to write(2), I can match writes by ksh where the file descriptor was opened O_APPEND and then print the pathname of the file to which the data is being appended.
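Because fds[] can be used in an expression like any other D array, it also composes with aggregations. As a rough sketch (the aggregation name appends is made up, and this assumes the fi_oflags and fi_pathname members behave as in the example above), the following counts appending writes by program and file instead of printing each one:

```d
/* Count write(2) calls on descriptors that were opened with O_APPEND. */
syscall::write:entry
/fds[arg0].fi_oflags & O_APPEND/
{
	@appends[execname, fds[arg0].fi_pathname] = count();
}
```

The aggregation is printed when dtrace exits, for example after you press Control-C.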
All of the implementation for fds[] is provided by a translator and an inline (i.e. zero new kernel support required). The translator converts a kernel file structure to a DTrace fileinfo_t, and then the inline declaration to define fds[] looks like this:
inline fileinfo_t fds[int fd] = xlate <fileinfo_t> (
fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);
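The array form of inline can also be tried in isolation. The following sketch reuses the a[] declaration from earlier in this section in a complete script, and assumes a build that includes the new feature. Per the definition above, the reference a[1, 2] should be compiled as if the constant 3 had been written.

```d
inline int a[int x, int y] = x + y;

BEGIN
{
	/* a[1, 2] is replaced by the compiler with the expression 1 + 2 */
	printf("a[1, 2] = %d\n", a[1, 2]);
	exit(0);
}
```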
I'll discuss how inlines can affect how we programmatically compute the stability of your DTrace programs in a future blog entry.