Sam Falkner's WeblogSam Falkner's Weblog |
|
Friday Dec 23, 2005
A DTrace Provider for NFS
As a kernel developer working on NFSv4, I love DTrace. The fbt and syscall providers are my constant companions. But last June, at the NFSv4 Bakeathon, I had a more difficult problem -- an infinite loop involving volatile filehandles. The problem was reproducible (good), but it took a couple of minutes running a test suite to get to that point (bad). Thousands of over-the-wire operations occurred before the more interesting things began to happen. Using snoop, which "knows" the NFSv4 protocol, I knew what Solaris NFSv4 was doing, but I didn't know why it was doing it. DTrace could tell me the "why" part, but I couldn't think of a script that would get me just the right information, without drowning me in uninteresting information. Sleep deprived, I began whining about how DTrace should have a new provider; one than "knows" the NFSv4 protocol, and lets you write protocol related events into your scripts. A provider that lets you do things like grab filehandles, and stuff them into variables, to be used later in conditionals. Upon getting home, I began writing such a provider. In fact, I wrote one that could solve the problem discussed above. As it grew, there were some parts of the code that were getting messy, so now I'm in the midst of refactoring the provider. A New ProviderHere is how the NFSv4 provider is starting to look. As usual for DTrace probes, we have: provider:module:function:name Provider is either "nfs" or "nfs-server". Plain old "nfs" is the NFS client -- the thing that lets you mount remote file systems. Yes, this means that technically there are really two providers, but they will be nearly the same structure. Learning one provider will be sufficient to learn both. Module is either "nfs" or "nfssrv", again for client and server. I would recommend just leaving this part blank, since it doesn't buy you any more than the provider slot. If you do not leave it blank, make sure that provider and module match. If they don't match, you won't get any probes. For example, "nfs:nfssrv:<something>:<something>" will not match any probes. Function is an operation or an attribute. More on that below. :-) Name is either "start" or "done". For a probe on an client operation, "start" would fire when the client sent the operation over-the-wire to the server, and "done" would fire when it received its response from the server. For the server, "start" would fire when the server received a request, and "done" would fire when the server answered the request. Callback functions, where the server makes a request of the client, are just what you would expect -- client "start" fires when the client first receives a request from the server, etc. The ProbesAs mentioned above, the function slot is an operation or an attribute. Operations are of the form "op-<operation-name>", where <operation-name> is an operation defined in the NFSv4 protocol. Examples: nfs::op-read:start /* client did a read operation */ nfs-server::op-read:start /* server got a read request */ nfs::op-compound:done /* client's compound op finished */ For attribute probes, we use "attr-<attribute-name>", where <attribute-name> is a type defined in the NFSv4 protocol. It can be either an argument to an operation (examples:) nfs::attr-seqid4:start /* client sent a sequence ID with some op*/ nfs::attr-filehandle:done /* client received a filehandle */ or an attribute of a file. Examples: nfs::attr-owner:start /* client is sending an owner file attribute */ nfs::attr-filehandle:done /* client received a filehandle */ Above, notice that attr-filehandle makes sense as an argument to an operation, or an attribute of a file. The nice thing about the attr-* probes is that you can trace these attributes going over the wire, without caring about what operation is sending them, or how it's being sent. Arguments to the Probes: args[0] and args[1]For all probes, there are two arguments. args[0] is the same for all probes: it is a structure that holds things such as the tag and transaction id (xid) of the operation. It will likely hold the network address of the machine that it's talking with; more things may be added in the future.
The second argument, args[1], is different for every probe. For op-* probes, it's the structure that is being sent or received. For attr-* probes, it's just the attribute itself. Examples:
Any working examples yet?The provider is still being implemented, and it doesn't yet handle very many of the details of the NFSv4 protocol. But the framework is there, and it does handle "compound", which is the fundamental over-the-wire operation in NFSv4. As its name implies, a compound operation can do more than one thing. For example, it can create a new file and get the new file's attributes, all in one over-the-wire operation. As mentioned before, compounds have a field called a "tag", which is a descriptive string attached that very briefly describes the purpose of the request. The tag for our hypothetical create-file-and-get-its-attributes operation might simply be "create". Here is a simple D script that uses the op-compound probe to show which over-the-wire requests the client spends the most time waiting upon. The requests are collated by the tag. You could use such a script to help get an idea of what compounds are taking the most time, and maybe start thinking about where to do further performance analysis. Following the example script is example output. While running this example script, I used "tar" to copy a bunch of files within an NFS mounted directory, and then used "rm -rf" to delete them. Don't treat this as a serious benchmark or anything -- it's just to give an idea of what's easily possible with the new provider.
Results:total time spent in calls renew 276 commit 11054 link 21647 rmdir 367802 mkdir 422512 access 599114 readdir 753105 lookup valid 822106 lookup 862495 read 1026893 close 1095138 getattr 1841042 write 6674532 remove 6802618 setattr 7101811 open 7539189 Comments:
|
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||