Connectathon 2009 wrapped up a few weeks ago. The event was held at the TechMart in Santa Clara, CA. My group was there testing NFSv3, NFSv4.0 and NFSv4.1/pNFS. In addition to the testing that went on, Connectathon included an interesting set of talks.
All of the presentations were related to NFS, and many of them were on NFSv4.1. The presentations took a different approach this year compared to years past. The agenda contained a lot of WiP (Work in Progress) topics. The purpose was to create an environment in which, rather than just spewing information at the audience, presenters would engage them in a conversation about interesting problems encountered during development. The talks are open to the public each year and the presentations are also posted afterward, but here is a short summary of what was presented.
Tigran Mkrtchyan (a dCache engineer from DESY) presented the work that he is doing around creating pNFS access methods for dCache.
Karen Rochford (Sun) presented the OpenSolaris Delivery of pNFS. This covered many of the aspects of the NFSv4.1/pNFS project in OpenSolaris including how to download and install the OpenSolaris implementation.
Bill Baker and Rich Brown (Sun) presented the Solaris Client WiP. They discussed the OpenSolaris method for generating layout hints (i.e. the Simple Policy Engine (SPE)), different strategies for when to issue GETDEVICEINFO/LAYOUTGET and methods for observing pNFS activity.
The SPE allows an administrator on the client to define policies that express a set of attributes (i.e. path, uid, gid, file extension (e.g. .jpeg, .mpeg), time, date) that are used to form a layout_hint to send to the pNFS server at file creation time. The beauty of the SPE is that administrators on the client are able to control the layout of the file without modifying the applications. The downside is that the pNFS server is not required to honor the layout_hint.
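To make the idea concrete, here is a minimal sketch of attribute-based policy matching. It is not the OpenSolaris SPE syntax or implementation: the `Policy` and `LayoutHint` classes, the stripe fields, the first-match-wins rule and the sample policies are all assumptions made for illustration, and the time and date attributes are left out to keep it short.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class LayoutHint:
    # Hypothetical hint fields, chosen only for this example.
    stripe_count: int
    stripe_unit: int  # bytes


@dataclass(frozen=True)
class Policy:
    # An attribute left as None matches anything.
    hint: LayoutHint
    path_glob: Optional[str] = None
    uid: Optional[int] = None
    gid: Optional[int] = None
    extension: Optional[str] = None

    def matches(self, path: str, uid: int, gid: int) -> bool:
        if self.path_glob is not None and not fnmatchcase(path, self.path_glob):
            return False
        if self.uid is not None and uid != self.uid:
            return False
        if self.gid is not None and gid != self.gid:
            return False
        if self.extension is not None and not path.endswith(self.extension):
            return False
        return True


def pick_layout_hint(policies, path, uid, gid):
    """Return the hint of the first matching policy, or None if none match."""
    for policy in policies:
        if policy.matches(path, uid, gid):
            return policy.hint
    return None


# Assumed sample policies: wide stripes for video, narrow ones for one user's home.
POLICIES = [
    Policy(LayoutHint(stripe_count=8, stripe_unit=1 << 20), extension=".mpeg"),
    Policy(LayoutHint(stripe_count=2, stripe_unit=64 << 10),
           path_glob="/export/home/*", uid=1001),
]
```

At file creation time the client would call something like `pick_layout_hint(POLICIES, path, uid, gid)` and, if a hint comes back, send it along with the create. The server remains free to ignore it.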
The set of tools for observing pNFS activity on OpenSolaris includes the NFS DTrace providers, nfsstat -l to view layouts on the client, nfsstat -c -v 41 to view the NFSv4.1 operations executed by a client, and snoop/Wireshark.
Rob Thurlow (Sun) gave a Federated File System (FedFS) update. FedFS will be a work item on the new NFSv4 Working Group (WG) charter. FedFS controls how servers use referrals to stitch together a multi-vendor, multi-server namespace. Rob discussed some of the misconceptions of pNFS vs. FedFS, how FedFS works and the current problems being solved.
Mike Eisler (NetApp) presented an update on NFSv4.1. With the NFSv4.1 specification in the RFC editor's queue, the NFSv4 WG is currently going through the process of updating its charter. Newly approved work items include FedFS, and currently proposed work items include MAC labeling, metadata striping and de-duplication. Mike also touched on a topic that I fully agree with: if you only think of pNFS as a way to achieve high performance, you are missing the point. The separation of data and metadata that pNFS provides allows for new data management possibilities. The positioning of pNFS in the industry should not neglect the management aspects.
Brian Pawlowski (NetApp) gave a presentation titled "NFS and Its Use". The presentation discussed business trends such as Utility Computing, Cloud Computing and Storage, or better yet, Applications as a Service. He then asked us to ponder whether or not NFS is changing with the world.
Ricardo Labiaga (NetApp) presented a WiP about the Linux NFSv4.1 back channel. The plans for the architecture and implementation of the NFSv4.1 back channel were discussed.
Mahesh Siddheshwar (Sun) talked about the OpenSolaris NFS/RDMA support. OpenSolaris has had NFS over RDMA support that complies with the relevant IETF drafts since Solaris Nevada build 98. It is shipping as a part of the OpenSolaris 2008.11 supported release. Mahesh gave an overview of the technology, provided some preliminary performance information and covered the status of Linux client and OpenSolaris server interoperability. An interesting side discussion occurred during this presentation, in which Tom Talpey stated that the NFS direct document for NFSv4.1 needs to be written. Documents exist for NFSv2, NFSv3 and NFSv4.0, but the new document needs to cover how to do the new NFSv4.1 ops (e.g. LAYOUTGET, etc.). This is expected to be a short but nonetheless important document.
Piyush Shivam, Jeff Smith and I (Sun) presented three different pNFS server related topics in the OpenSolaris pNFS Server WiP. Piyush presented on the state division that pNFS introduces. Jeff presented on the trade-offs of when to store layouts persistently. I presented on how the management of the OpenSolaris pNFS implementation will look. This presentation generated a good amount of discussion. One of the side discussions was about the OpenSolaris Control Protocol. The Linux community is also looking at the requirements of MDS to DS communication, and this is another aspect of pNFS that we could collaborate on.
Benny Halevy (Panasas) gave two talks. First, he moderated a discussion on the Linux VFS API for the server. Second, he presented the dirty page state model for Linux pNFS. The first presentation brought up the point, again, of standardizing the control protocol and whether or not this should be an NFSv4 WG item. The second presentation covered page flushing in NFS and the differences between regular NFS and pNFS. The presentation gave a great picture of the state model for dirty page syncing with pNFS. It was an interesting talk that discussed a lot of the corner cases of page syncing and layouts. For example, what happens if your page spans multiple layouts? What happens if you have a layout when you write, but don't have one when you commit? I also learned that the NetApp server always replies with STABLE writes because of its use of NVRAM in the filer.
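The remark about STABLE writes matters to a client because a WRITE reply reports how the data was committed, and only an UNSTABLE reply obliges the client to send a COMMIT later. The sketch below shows that bookkeeping in its simplest form. The `stable_how` values are the ones defined by the NFS protocol, but the `DirtyRangeTracker` class is made up for illustration and is not the Linux state model from the talk. It ignores write verifiers and layouts entirely.

```python
from enum import IntEnum


class StableHow(IntEnum):
    # stable_how values as defined by the NFS protocol.
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


class DirtyRangeTracker:
    """Hypothetical helper: tracks written ranges that still need a COMMIT."""

    def __init__(self):
        self.uncommitted = []  # list of (offset, length) tuples

    def on_write_reply(self, offset, length, committed):
        # Only UNSTABLE replies leave data that a later COMMIT must flush.
        if committed == StableHow.UNSTABLE:
            self.uncommitted.append((offset, length))

    def needs_commit(self):
        return bool(self.uncommitted)

    def on_commit_ok(self):
        # A successful COMMIT makes all previously unstable data durable.
        self.uncommitted.clear()
```

Against a server that always replies with a stable commit level, `needs_commit()` never becomes true, so the client can release its dirty pages as soon as the WRITE reply arrives.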
Pranoop Erasani (NetApp) presented on the Clustered ONTAP pNFS Server. This gave a detailed overview of NetApp's next-generation ONTAP server.
Sam Falkner (Sun) presented on nnodes, an abstraction layer for NAS. Nnodes are a major part of the OpenSolaris NFS server architecture. The need for something like nnodes became apparent when the NFSv4.1 server entities (MDS and DS) in the OpenSolaris server turned out to be similar in many areas (e.g. sessions implementation), but very different in others (e.g. method for accessing data). Nnodes allow for the right flexibility in data and metadata access methods.
Bruce Fields (CITI) presented an overview of the CITI Linux Projects currently on their plate.
GFS2 pNFS:
CITI was working on a GFS2-backed pNFS implementation, but it is on the back burner right now. They are assuming that the infrastructure built for GFS2 will work for GPFS. I don't recall if this was a comment that Bruce made or a comment from someone in the audience, but it is still interesting: findings have been that while a cluster file system may make NFS work well, it makes pNFS hard. No reasons were given.
Blocks-based pNFS:
CITI is working on a block client for testing against the EMC server. Additionally, a pyNFS server was written to simulate a block server and fill the gap when they didn't have a server to test against, but now LSI and EMC have block servers. Therefore, pyNFS is not actively developed or used now.
NFSv4.1 SSV:
CITI has a prototype of SSV and would like someone to test against.
Directory Delegations:
Since no one else in the community has an implementation of directory delegations, their implementation will stay on the shelf for a while longer.
File Delegations Problem:
They are fixing a problem in which delegations are not recalled when files are removed, renamed, chowned, etc. on the local file system.
Client-side Referrals:
Trond has been doing work on this, and it works as long as IPv4 addresses are passed in the referral.