Sam Falkner's WeblogSam Falkner's Weblog |
|
Friday Nov 14, 2008
nnodes
Traditionally, NFS has always been able to share an ordinary file system, and thus has always dealt with vnodes. With the advent of Parallel NFS (pNFS), which has stripes of data distributed across multiple servers, this is no longer the case. The initial implementation of pNFS uses the DMU API provided by ZFS to store its stripe data. The pNFS protocol also requires that a server community support proxy i/o; that is, a client must be able to perform all i/o against one server, and if the data requested is on another node, then the server must perform the i/o by proxy. Neither the DMU nor proxy i/o are accessed via vnodes. Another change brought by pNFS is the distributed nature of the server. What has always been confined to a single server is now distributed over multiple servers. This necessitates a control protocol, for communication among the server nodes, and implies that some server tasks may have longer latencies than before. In pathological cases, e.g. a server reboot of one particular server, the latencies may be very high. This will likely require new ways that the NFS server is implemented, and it will become advantageous for the NFS server code to use new APIs, APIs with different design goals from vnodes. Asynchronous methods become more desirable, to deal with the new latencies involved with processing client requests. Enter nnodes. nnodes can be thought of as vnodes, but customized for the needs of NFS (especially pNFS), and with three distinct ops-vector/private-data sections. The figure below shows an NFS server implementation interacting with an nnode that is backed by a vnode.
The fact that the nnode has the three distinct sections for data, metadata, and state, makes it easy to mix and match commonly needed implementations. Here are some examples:
But this is just the beginning. There will doubtless be more constructions in the future. nnodes also serve as a place to cache or store information relevant to a file-like object. For example, in the pNFS data server, we can cache stateids that are known to be valid. Thus, the data server will not need to contact the metadata server on every i/o operation. Today I have been writing a more verbose comment header for the header file "nnode.h". Look for it in our repository soon. Posted at 03:54PM Nov 14, 2008 by samf in NFS |
Wednesday May 23, 2007
nfs4trace: a New Direction
My previous DTrace provider for NFS has been floundering, due to a needed design change, and my priorities with pNFS. Fortunately, a new design is being worked by two engineers recently assigned to the project. [Read More]Posted at 02:00PM May 23, 2007 by samf in NFS |
Thursday Jun 08, 2006
nfs4trace release: all ops covered!
I have just pushed a new release of the DTrace provider for NFSv4. This is in fact the second release I've made -- the first release didn't get a blog entry. You can be sure to see all releases by watching the announcements section of the nfs4trace project. An RSS feed can be had here: http://www.opensolaris.org/rss/os/project/nfs4trace/announcements/rss2.xml This release is based on Nevada build 41. I mention this so that you will know what bugs and bug fixes are present in this release. The release is in the form of bfu archives, so follow the usual procedure to install them. New FeaturesThis release of nfs4trace provides a set of probes for every NFSv4 operation. There is an overarching probe for the compound op, called "op-compound". Each op within a compound has its own probe, e.g. "op-lookup". For every probe, args[0] is a pointer to a nfs4_dtrace_info_t. This has the following fields:
Besides the normal compound operations, there is a probe for all callback-related operations. The probe "cb-compound" is analagous to "op-compound", but covers the callback channel. Each operation within a callback, e.g. OP_CB_GETATTR, also has a probe -- for this example, "cb-getattr".
To see a complete list of all probes, run
What's Left?Eventually, there will be probes for every nontrivial attribute, which will fire regardless of which operation is using them. But not all of these attributes are accounted for yet. Stay tuned. What else is left? Your feedback will help decide this! Please try this, and let me know if you have any input. Thanks! Posted at 09:44AM Jun 08, 2006 by samf in NFS | Comments[1]
Monday Nov 21, 2005
IETF, ZFS and DTrace
What have I been up to?I recently traveled to beautiful Vancouver for the 64th IETF. There, Lisa and I attended many meetings, most notably the NFSv4 working group meeting, and presented our recent Internet Draft. The Internet Draft concerns NFSv4 ACLs. It attempts to clear up ambiguities in the NFSv4 spec, RFC3530. It also proposes a way for ACLs and UNIX-style modes to live together in harmony. Hopefully, it will end up as one or more RFCs, and NFSv4 clients and servers can have a truly useful ACL model. Meanwhile, as has been mentioned in many other places, ZFS has been released. Check out blog entries from Lisa and Mark on how ACLs work in ZFS. I also gave a presentation on DTrace at a joint meeting of the Front Range OpenSolaris User Group (FROSUG) and the Front Range Unix User Group (FRUUG). Why was I, a humble NFS engineer, giving a presentation on DTrace? Well, for one thing, I use DTrace quite a bit. But I'll be blogging more about DTrace and NFS a bit later. Technorati tags: IETF NFS OpenSolaris Solaris ZFS DTrace Posted at 05:00PM Nov 21, 2005 by samf in NFS | Comments[0]
Tuesday Jun 14, 2005
ACLs Everywhere
I was tempted to call this posting ACLs are my resume, but that's a bit extreme. Still, Access Control Lists (ACLs) seem to follow me around. Here are some of the places where I have worked on ACLs.
Solaris 10Lisa and I managed to integrate support for NFSv4 ACLs into Solaris 10. The effort to add ACL support began late in the Solaris 10 release cycle. Some of the problems we hit (outlined below) weren't even thought about when we began. We couldn't do much testing against other vendors until very late in the release cycle. There were a few show stopper bugs we had to fix before Solaris 10 could ship. This was one of the most intense but rewarding projects I've worked on! NFSv4 ACL support breaks down into two big pieces: support for the over-the-wire operations involving ACLs, and translation between the various ACL models (more on them below). The over-the-wire pieces of ACL handling are scattered throughout NFSv4. nfs4_vnops.c has the usual vnode ops. See nfs4_getsecattr() and nfs4_setsecattr() for the front end (as far as the file system is concerned) to ACLs. Other pieces are in nfs4_client.c and nfs4_xdr.c. The translators are contained in nfs4_acl.c. For this article, I will focus on the translators. TranslationSolaris has had support for ACLs for a long time. The ACL model supported before Solaris 10 is called POSIX-draft. This was supposed to become a POSIX standard, but the effort was abandoned. The latest draft is what was implemented for Solaris. For on-disk file systems, the Solaris UFS filesystem implements POSIX-draft ACLs. For versions two and three of the Network File System (NFS), an undocumented side-band protocol enables users to manipulate ACLs on the server. To the best of my knowledge, the only implementation of this protocol outside of Solaris is for Linux. NFSv4 introduces a powerful new ACL model. It's powerful enough that every POSIX-draft ACL can be translated into an NFSv4 ACL. But NFSv4 ACLs can go beyond POSIX-draft semantics; thus, not all NFSv4 ACLs can be translated into POSIX-draft ACLs. The presence of two ACL models makes it desirable to seamlessly translate between the two, so we implemented ACL translation in the kernel for NFSv4. Translation gives us many benefits:
At Connectathon 2005, we gave a presentation on implementing NFSv4 ACLs in Solaris 10. However, now that Solaris is open, I would like to talk about the translators at the code level. Translating POSIX-draft ACLs into NFSv4 ACLsIn the Solaris kernel, ACLs are passed around inside of vsecattr_t structures. The main entry point for translating POSIX-draft ACLs into NFSv4 ACLs is vs_aent_to_ace4(). Here is what the call stack looks like when a POSIX-draft ACL is translated into an NFSv4 ACL. vs_aent_to_ace4() ln_aent_to_ace4() (once for the regular ACL, once for the default ACL) ln_aent_preprocess() for every ACE: mode_to_ace4_access() access_mask_set() ace4_make_deny() access_mask_set()
Translating NFSv4 ACLs into POSIX-draft ACLsThe entry point for translating NFSv4 ACLs into POSIX-draft ACLs is vs_ace4_to_aent(). Here is what the call stack looks like when making this translation. vs_ace4_to_aent() ln_ace4_to_aent() ace4_list_init() ace4_to_aent_legal() ace4vals_find() ace4_list_to_aent() ace4_list_free()
To see when some things go wrong, you can turn on nfs4_acl_debug. The debugging code isn't very complete, and it might be replaced by static Dtrace probes some day. But for now, you can do this: # mdb -kw > nfs4_acl_debug/W 1 Future ACL support in SolarisAs you can see from reading the code, and perhaps from all of the debugging prints, there are lots of things that can go wrong when translating ACLs from NFSv4 into POSIX-draft format. Besides that, wouldn't it be nice to be able to set an arbitrary NFSv4 ACL, and use the new ACL model? Things are getting much better with the arrival of ZFS. The goal of ZFS's ACL implementation is to implement NFSv4 ACLs in a way that is compatible with Solaris. The ZFS ACL model is still in flux, but it is rapidly solidifying. We will be releasing an Internet Draft in the near future, in which we will propose a way for UNIX and UNIX-like systems to support NFSv4 ACLs. If all goes well, this will be the ACL model used by ZFS. Posted at 09:20AM Jun 14, 2005 by samf in NFS | Comments[0] |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||