Lisa Week's Weblog

     
 
Connectathon 2009

Connectathon 2009 wrapped up a few weeks ago. The event was held at the TechMart in Santa Clara, CA. My group was there testing NFSv3, NFSv4.0 and NFSv4.1/pNFS. In addition to the testing that went on, Connectathon included an interesting set of talks.

All of the presentations were related to NFS, and many of them were on NFSv4.1. The presentations took a different approach this year compared to years past: the agenda contained a lot of Work in Progress (WiP) topics. The purpose was to create an environment where, rather than just spewing information at the audience, the presenters would engage them in a conversation about interesting problems encountered during development. The talks are open to the public each year and the presentations are also posted afterward, but here is a short summary of what was presented.

Tigran Mkrtchyan (a dCache engineer from DESY) presented the work that he is doing around creating pNFS access methods for dCache.

Karen Rochford (Sun) presented the OpenSolaris Delivery of pNFS. This covered many of the aspects of the NFSv4.1/pNFS project in OpenSolaris including how to download and install the OpenSolaris implementation.

Bill Baker and Rich Brown (Sun) presented the Solaris Client WiP. They discussed the OpenSolaris method for generating layout hints (i.e. the Simple Policy Engine (SPE)), different strategies for when to issue GETDEVICEINFO/LAYOUTGET and methods for observing pNFS activity.

The SPE allows an administrator on the client to define policies that express a set of attributes (i.e. path, uid, gid, file extension (e.g. .jpeg, .mpeg), time, date) that are used to form a layout_hint to send to the pNFS server at file creation time. The beauty of the SPE is that administrators on the client are able to control the layout of the file without modifying the applications. The downside is that the pNFS server is not required to honor the layout_hint.
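
To make the idea a bit more concrete, here is a minimal Python sketch of attribute-based policy matching. This is not the OpenSolaris SPE: the Policy class, the hint fields (stripe_count, stripe_unit) and the first-match-wins rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    # Hypothetical policy: every attribute that is not None must match.
    hint: dict
    path_prefix: Optional[str] = None
    uid: Optional[int] = None
    gid: Optional[int] = None
    extension: Optional[str] = None

    def matches(self, path: str, uid: int, gid: int) -> bool:
        if self.path_prefix is not None and not path.startswith(self.path_prefix):
            return False
        if self.uid is not None and uid != self.uid:
            return False
        if self.gid is not None and gid != self.gid:
            return False
        if self.extension is not None and not path.endswith(self.extension):
            return False
        return True


def layout_hint_for(policies, path, uid, gid):
    """Return the hint of the first matching policy, or None if none match."""
    for policy in policies:
        if policy.matches(path, uid, gid):
            return policy.hint
    return None


# Made-up policies: stripe video files widely, home directories narrowly.
policies = [
    Policy(hint={"stripe_count": 8, "stripe_unit": 1 << 20}, extension=".mpeg"),
    Policy(hint={"stripe_count": 2, "stripe_unit": 64 << 10}, path_prefix="/export/home/"),
]

video_hint = layout_hint_for(policies, "/export/media/clip.mpeg", 1001, 10)
home_hint = layout_hint_for(policies, "/export/home/lisa/notes.txt", 1001, 10)
no_hint = layout_hint_for(policies, "/tmp/scratch.dat", 1001, 10)
```

Whatever hint comes out of a step like this is only a request; as noted above, the server is free to ignore it.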

The set of tools for observing pNFS activity on OpenSolaris includes the NFS DTrace providers, nfsstat -l to view layouts on the client, nfsstat -c -v 41 to view the NFSv4.1 operations executed by a client, and snoop/wireshark.

Rob Thurlow (Sun) gave a Federated File System (FedFS) update. FedFS will be a work item on the new NFSv4 Working Group (WG) charter. FedFS controls how servers use referrals to stitch together a multi-vendor, multi-server namespace. Rob discussed some of the misconceptions of pNFS vs. FedFS, how FedFS works and the current problems being solved.

Mike Eisler (NetApp) presented an update on NFSv4.1. With the NFSv4.1 specification in the RFC editor's queue, the NFSv4 WG is currently going through the process to update its charter. New, approved work items include FedFS, and currently proposed work items include MAC labeling, metadata striping and de-duplication. Mike also touched on a topic that I fully agree with: if you only think of pNFS as a way to achieve high performance, you are missing the point. The separation of data and metadata that pNFS provides allows for new data management possibilities. The positioning of pNFS in the industry should not neglect the management aspects.

Brian Pawlowski (NetApp) gave a presentation titled "NFS and Its Use". The presentation discussed business trends such as Utility Computing, Cloud Computing and Storage, or better yet, Applications as a Service. He then asked us to ponder whether or not NFS is changing with the world.

Ricardo Labiaga (NetApp) presented a WiP about the Linux NFSv4.1 back channel. The plans for the architecture and implementation of the NFSv4.1 back channel were discussed.

Mahesh Siddheshwar (Sun) talked about the OpenSolaris NFS/RDMA support. OpenSolaris has had NFS over RDMA support that is in compliance with the relevant IETF drafts since Solaris Nevada build 98. It is shipping as a part of the OpenSolaris 2008.11 supported release. Mahesh gave an overview of the technology, provided some preliminary performance information and covered the status of Linux client and OpenSolaris server interoperability. An interesting side discussion occurred during this presentation where Tom Talpey stated that the NFS direct document for NFSv4.1 needs to be written. Documents exist for NFSv2, NFSv3 and NFSv4.0, but the new document needs to cover how to do the new NFSv4.1 ops (e.g. LAYOUTGET, etc.). This is expected to be a short, but nonetheless important, document.

Piyush Shivam, Jeff Smith and I (Sun) presented three different pNFS server related topics in the OpenSolaris pNFS Server WiP. Piyush presented on the state division that pNFS introduces. Jeff presented on trade-offs for when to store the layouts persistently. I presented on how the management of the OpenSolaris pNFS implementation will look. This presentation generated a good amount of discussion. One of the side discussions that occurred was about the OpenSolaris Control Protocol. The Linux community is also looking at the requirements of the MDS to DS communication and this is another aspect of pNFS that we could collaborate on.

Benny Halevy (Panasas) gave two talks. First, he was the moderator for a discussion on the Linux VFS API for the Server. Second, he presented the dirty page state model for Linux pNFS. The first presentation brought up the point, again, of standardization of the control protocol and whether or not this should be an NFSv4 WG item. The second presentation covered page flushing in NFS and the differences between regular NFS and pNFS. The presentation gives a great picture of the state model for dirty page syncing with pNFS. It was an interesting talk that discussed a lot of the corner cases of page syncing and layouts. For example, what happens if your page spans multiple layouts? What happens if you have a layout when you write, but don't have one when you commit? I also learned that the NetApp server always returns with STABLE writes because of its use of NVRAM in the filer.

Pranoop Erasani (NetApp) presented on the Clustered ONTAP pNFS Server. This gave a detailed overview of NetApp's next-generation ONTAP server.

Sam Falkner (Sun) presented on nnodes, an abstraction layer for NAS. Nnodes are a major part of the OpenSolaris NFS server architecture. The need for something like nnodes was apparent when the NFSv4.1 server entities (MDS and DS) in the OpenSolaris server were similar in many areas (e.g. sessions implementation), but very different in others (e.g. method for accessing data). Nnodes allows for the right flexibility in data and metadata access methods.

Bruce Fields (CITI) presented an overview of the CITI Linux projects currently on their plate:
GFS2 pNFS: CITI was working on a GFS2-backed pNFS implementation, but it is on the back burner right now. They are assuming that the infrastructure built for GFS2 will work for GPFS. I don't recall if this was a comment that Bruce made or a comment from someone in the audience, but it is still interesting: findings have been that while a cluster file system may make NFS work well, it makes pNFS hard. No reasons were given, but this is an interesting statement.
Blocks-based pNFS: CITI is working on a block client for testing against the EMC server. Additionally, a pyNFS server was written to simulate a block server and fill the gap when they didn't have a server to test against, but now LSI and EMC have block servers. Therefore, pyNFS is not actively developed/used now.
NFSv4.1 SSV: CITI has a prototype of SSV and would like someone to test against.
Directory Delegations: Since no one in the community has an implementation of directory delegations, their implementation will stay on the shelf for a while longer.
File Delegations Problem: They are fixing a problem where delegations are not recalled when files are removed, renamed, chowned, etc. on the local file system.
Client-side Referrals: Trond has been doing work on this and it works if we have IPv4 addresses passed to the referral.

@ 08:30 AM PDT [ Comments [0] ]
 
 
 
 
pNFS Server Control Protocol Documentation
The NFSv4.1/pNFS server control protocol describes the communication between the MDS and DS components of the pNFS server. The following picture (taken from the NFSv4.1 specification) illustrates where the control protocol fits in:
    +-----------+
    |+-----------+                                 +-----------+
    ||+-----------+                                |           |
    |||           |        NFSv4.1 + pNFS          |           |
    +||  Clients  |<------------------------------>|   Server  |
     +|           |                                |           |
      +-----------+                                |           |
           |||                                     +-----------+
           |||                                           |
           |||                                           |
           ||| Storage        +-----------+              |
           ||| Protocol       |+-----------+             |
           ||+----------------||+-----------+  Control   |
           |+-----------------|||           |    Protocol|
           +------------------+||  Storage  |------------+
                               +|  Devices  |
                                +-----------+

The control protocol is not standardized as a part of the NFSv4.1 specification; therefore, it is left up to each implementer to specify the control protocol for their implementation.

The OpenSolaris implementation's control protocol documentation has been available on the NFSv4.1/pNFS OpenSolaris project for a while, but I have recently migrated the documentation to our wiki in order to allow for more collaboration.

I am currently working on the implementation of the control protocol along with some others in our team. Specifically, I am working on DS_REMOVE. This is the capability to remove a file from the namespace and from storage. The implementation will come in a couple of phases. The first phase of DS_REMOVE is fairly simple. It just instruments the mds_op_remove() function (in nfs41_srv.c) in order to send a DS_REMOVE message to the data servers contained in the layout. This will prompt the data servers to remove the objects identified by the DS_REMOVE arguments. The first phase will not address any failure to contact the data servers contained in the layout. It will also not address any failure of the MDS. Future phases will deal with appropriate state invalidation upon remove, truncation of files (e.g. SETATTR of size 0) and failures of the MDS/DS.
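
The first phase can be pictured with a small Python simulation. This is a sketch of the message flow only, not the OpenSolaris code: the DataServer and MetadataServer classes, the single object id argument and the in-memory dictionaries are all hypothetical stand-ins.

```python
class DataServer:
    """Hypothetical stand-in for a DS; stored objects are just ids in a set."""

    def __init__(self, name):
        self.name = name
        self.objects = set()

    def ds_remove(self, object_id):
        # Handle a DS_REMOVE message by dropping the named object.
        self.objects.discard(object_id)


class MetadataServer:
    """Hypothetical stand-in for the MDS: a namespace plus per-file layouts."""

    def __init__(self):
        self.namespace = {}  # file name -> object id
        self.layouts = {}    # object id -> list of DataServer

    def create(self, name, object_id, data_servers):
        self.namespace[name] = object_id
        self.layouts[object_id] = list(data_servers)
        for ds in data_servers:
            ds.objects.add(object_id)

    def remove(self, name):
        # Phase one only: remove the file from the namespace, then send
        # DS_REMOVE to every data server in the layout. There is no retry
        # and no handling of unreachable data servers or MDS failure.
        object_id = self.namespace.pop(name)
        for ds in self.layouts.pop(object_id):
            ds.ds_remove(object_id)


ds1, ds2 = DataServer("ds1"), DataServer("ds2")
mds = MetadataServer()
mds.create("foo", 42, [ds1, ds2])
mds.remove("foo")
```

If one of the data servers were unreachable in this model, its object would simply be orphaned, which is exactly the kind of failure the later phases need to address.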

@ 08:50 PM PST [ Comments [2] ]
 
 
 
 
DTrace Providers for NFS

NFSv3 and NFSv4 Server Providers Delivered

In the past couple of months, when I have not been heads-down on pNFS, I have helped Sam and Adam with the DTrace Providers for NFS. Our focus for this round of work was on the server side providers and we are happy to report that the NFSv4 server provider was putback to Solaris Nevada build 80 and the NFSv3 server provider was putback to Solaris Nevada build 84.

Follow-on work will include providers for the client side (NFSv3 and NFSv4) as well as both the NFSv4.1 client and server. Currently, we don't have any dates for when you can expect the follow-on work to start up, but keep checking the project page for updates.

What does a DTrace Provider for NFS do for you?

Having DTrace providers for NFS allows us to use DTrace to trace the NFS activity on a system. It makes probes available at the start and finish of every NFS over-the-wire operation. This gives users a very powerful method of collecting information about NFS usage (e.g. how many and which operations were executed, how long each operation took, etc.). And, for someone like me, an NFS developer, it provides a great debugging tool. Instead of having to use snoop to capture NFS activity and separately using DTrace probes (usually sdt or fbt) to see what is going on in the kernel, we can use DTrace from end to end.
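
To show what having a probe at both the start and the finish of an operation buys you, here is a small Python sketch that pairs start and done events to get per-operation counts and average latency. It does not use DTrace at all: the event tuples below are a made-up format for illustration, not the output of the providers.

```python
from collections import defaultdict


def summarize(events):
    """Pair start/done events and return {op_name: (count, avg_latency_ns)}.

    events is an iterable of (timestamp_ns, request_id, op_name, phase)
    tuples, where phase is "start" or "done" (a hypothetical format).
    """
    started = {}                          # (request_id, op_name) -> start time
    totals = defaultdict(lambda: [0, 0])  # op_name -> [count, total_ns]
    for ts, req, op, phase in events:
        if phase == "start":
            started[(req, op)] = ts
        elif (req, op) in started:
            totals[op][0] += 1
            totals[op][1] += ts - started.pop((req, op))
    return {op: (n, total // n) for op, (n, total) in totals.items()}


events = [
    (1000, 1, "read", "start"),
    (1500, 2, "write", "start"),
    (4000, 1, "read", "done"),
    (6500, 2, "write", "done"),
    (7000, 3, "read", "start"),
    (8000, 3, "read", "done"),
]
summary = summarize(events)
```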

Examples and documentation

For some examples on how to use the providers check out the NFSv3 and NFSv4 provider documentation on the DTrace wiki.

@ 02:41 PM PST [ Comments [1] ]
 
 
 
 
pNFS source, BFU archives and totally random facts

pNFS source and BFU archives you say...

The first part of the title for this blog entry mentions pNFS source and BFU archives... Well, the reason that is mentioned is that we have just posted the latest source and BFU archives for the OpenSolaris NFSv4.1/pNFS client and server. As you probably know, we have not integrated into Nevada yet; heck, we haven't even gone through PSARC Inception yet (PSARC is the Platform Software Architecture Review Committee and Inception is one of the first steps in the architecture review process), but we hope this early release gives you the opportunity to play around with pNFS a bit.

Go ahead. Download the source and archives then tell us what you think. I, personally, as well as many other members of our team, put a lot of effort into getting everything ready for this posting as well as trying to produce useful and accurate release notes and documentation. So, we'd like to hear your feedback, especially if you run into problems. Actually, if you don't run into problems we want to know about that too! Success stories are cool. We are always available at nfsv41-discuss AT opensolaris DOT org.

Testing this implementation at the NFSv4.1 Bakeathon

Just in case you are interested, this implementation was the one that we tested while at the NFSv4.1 Bakeathon in Austin, TX earlier this month. The NFSv4.1 Bakeathon is an interoperability testing event where all of the people that are active in implementing the NFSv4.1 specification bring their stuff to test. This event was like past NFSv4 Bakeathons and Connectathons, but differed in that it was held specifically to focus on NFSv4.1 testing and on ironing out issues with the NFSv4.1 specification, which is currently in draft form. We tested the OpenSolaris client and server against all other NFSv4.1 implementations in attendance.

I have been attending the Bakeathons and Connectathons since about 2004 and unfortunately, due to unforeseen circumstances, I wasn't able to attend this last one. This was a bummer because these are cool events to attend and take part in. I, personally, always gain a deeper understanding of the product and protocol that I am implementing because it is just a bunch of engineers sitting in a room together hacking on code and discussing their interpretation of the specification in order to get the first glimmer of interoperability. I look forward to the next event.

And, finally some completely useless and random facts...

489 - Number of pages in the latest draft of the NFSv4.1 specification

Rich Lowe (richlowe) - First person that downloaded and installed our archives after they were announced on 6/25/2007. Rich sent me mail at 12:20am MT on 6/26/2007, 21 minutes after the release was announced on nfs-discuss AT opensolaris DOT org. That seems pretty hard to beat...but let me know if you did.

128 - Number of new or modified files in the pNFS source (as of the June 25, 2007 release)

3 - Number of female engineers working on pNFS

14 - Number of days until our PSARC Inception Review (it is on July 11, 2007).

6 - Number of days until our PSARC Inception Review materials are due... I better get going. :)

@ 08:49 AM PDT [ Comments [0] ]
 
 
 
 
pNFS Screencast Available
It has been a while since my last blog entry so I thought I'd give you a little update on what I have been up to!

I have been spending my days working on pNFS (Parallel NFS). pNFS is a distributed, parallel file system which provides a highly scalable solution to data access and management. It does this by allowing for parallel data transfers across many NFSv4.1 file servers and by providing for a single, unified namespace for all objects in the pNFS file system.

The pNFS protocol is being standardized in the NFSv4 Working Group of the IETF (Internet Engineering Task Force) as a part of the NFSv4.1 specification effort.

With that, I'd like to invite you to check out a demo of the pNFS technology. And, while you are there take a look at what is going on with the NFS version 4.1 pNFS project.
@ 08:02 PM PDT [ Comments [0] ]
 
 
 
 
NFSv4 and ZFS ACLs

Who I am:

Well, since this is my first blog entry, I suppose I should introduce myself...
I am Lisa Week.  I've been working at Sun Microsystems for almost 4.5 years.  I graduated with a Bachelor of Science Degree in Computer Engineering from the South Dakota School of Mines and Technology (SDSM&T).  I started working for Sun as an intern and was offered a full-time position after I graduated.  When I came to Sun as a full-time employee I was in the Data Resource Management Group.  We developed and maintained GUIs and APIs for things like file system mounts and shares and the Solaris Volume Manager (actually, I didn't work on this much).  I then went on to develop a CIM/WBEM provider for NFS along with Evan.  This is what introduced me to NFS and got me interested in it.  I am now having fun and being challenged by working on NFSv4.  As of late, I have been focusing most of my time on NFSv4 ACLs, so this is what I'll be writing about for my maiden blog entry...

Thanks for reading and I hope you come back...

What I've been up to:

Before going much further, I'd like to cover what ACLs are for those of you who might not know.  ACLs are Access Control Lists.  They give users the ability to have fine-grained access control over their files and directories.  This includes giving users the ability to specify access to their files in a simple, meaningful way.

Over the last several months, I've been doing a lot of work with NFSv4 ACLs.  First, I worked with Sam to get NFSv4 ACL support into Solaris 10.  The major portion of this work involved implementing the pieces needed to pass ACLs over-the-wire as defined by section 5.11 of the NFSv4 specification (RFC3530) and the translators (code to translate from UFS (also referred to as POSIX-draft) ACLs to NFSv4 ACLs and back).  At that point, Solaris was further along with regard to ACLs than it ever had been, but was still not able to support the full semantics of NFSv4 ACLs.  So...here comes ZFS!

After getting the support for NFSv4 ACLs into Solaris 10, I started working on the ZFS ACL model with Mark and Sam.  So, you might wonder why a couple of NFS people (Sam and I) would be working with ZFS (Mark) on the ZFS ACL model...well, that is a good question.  The reason is that ZFS has implemented native NFSv4 ACLs.  This is really exciting because it is the first time that Solaris is able to support the full semantics of NFSv4 ACLs as defined by RFC3530.

In order to implement native NFSv4 ACLs in ZFS, there were a lot of problems we had to overcome.  Some of the biggest struggles were ambiguities in the NFSv4 specification and the requirement for ZFS to be POSIX compliant.  These problems have been captured in an Internet Draft submitted by Sam and me on October 14, 2005.

ACLs in the Computer Industry:

What makes NFSv4 ACLs so special...so special that the shiny, new ZFS implements them?  No previous attempt to specify a standard for ACLs has succeeded; therefore, we've seen a lot of different (non-standard) ACL models in the industry.  With NFS Version 4, we now have an IETF approved standard for ACLs.

As well as being a standard, the NFSv4 ACL model is very powerful.  It has a rich set of inheritance properties as well as a rich set of permission bits beyond just read, write and execute (as explained in the Access mask bits section below).  And for the Solaris NFSv4 implementation this means better interoperability with other vendors' NFSv4 implementations.

ACLs in Solaris:

Like I said before, ZFS has native NFSv4 ACLs!  This means that ZFS can fully support the semantics as defined by the NFSv4 specification (with the exception of a couple things, but that will be mentioned later).

What makes up an ACL?

ACLs are made up of zero or more Access Control Entries (ACEs).  Each ACE has multiple components and they are as follows:

1.) Type component:
        The type component of the ACE defines the type of ACE.  There
        are four types of ACEs: ALLOW, DENY, AUDIT, ALARM.


        The ALLOW type ACEs permit access.
        The DENY type ACEs restrict access.
        The AUDIT type ACEs audit accesses.
        The ALARM type ACEs alarm accesses.

        The ALLOW and DENY type of ACEs are implemented in ZFS.
        AUDIT and ALARM type of ACEs are not yet implemented in ZFS.

        The possibilities of the AUDIT and ALARM type ACEs are described below.  I
        wanted to explain the flags that need to be used in conjunction with them before
        going into any detail on what they do, therefore, I gave this description its own
        section.

2.) Access mask bits component:
        The access mask bit component of the ACE defines the accesses
        that are controlled by the ACE.

        There are two categories of access mask bits:
        1.) The bits that control the access to the file
                e.g. write_data, read_data, write_attributes, read_attributes
        2.) The bits that control the management of the file
                e.g. write_acl, write_owner

        For an explanation of what each of the access mask bits actually control in ZFS,
        check out Mark's blog.

3.) Flags component:
        There are three categories of flags:
        1.) The bits that define inheritance properties of an ACE.
                i.e. file_inherit, directory_inherit, inherit_only,
                      no_propagate_inherit
                Again, for an explanation of these flags, check out Mark's blog.
        2.) The bits that define whether or not the ACE applies to a user or group
                i.e. identifier_group
        3.) The bits that work in conjunction with the AUDIT and ALARM type ACEs
                i.e. successful_access_flag, failed_access_flag.
                ZFS doesn't support these flags since it doesn't support AUDIT and
                ALARM type ACEs.

4.) who component:
        The who component defines the entity that the ACE applies to.

        For NFSv4, this component is a string identifier and it can be a user, group or
        special identifier (OWNER@, GROUP@, EVERYONE@).  An important thing to
        note about the EVERYONE@ special identifier is that it literally means everyone
        including the file's owner and owning group.  EVERYONE@ is not equivalent to
        the UNIX other entity.  (If you are curious as to why NFSv4 uses strings rather
        than integers (uids/gids), check out Eric's blog.)

        For ZFS, this component is an integer (uid/gid).
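
As a rough picture of how the four components fit together, here is a minimal Python sketch that models an ACE and parses the compact who:access_mask:flags:type notation used in the examples in this entry. The Ace class and parse_ace() are hypothetical helpers, not part of ZFS or NFS, and treating "/" as the separator between multiple bits or flags is an assumption.

```python
from dataclasses import dataclass

ACE_TYPES = {"allow", "deny", "audit", "alarm"}


@dataclass(frozen=True)
class Ace:
    who: str                # user, group, or owner@ / group@ / everyone@
    access_mask: frozenset  # e.g. {"read_data", "write_data"}
    flags: frozenset        # e.g. {"file_inherit"}
    ace_type: str           # one of ACE_TYPES


def _names(field):
    # Assumption: multiple names within one field are separated by "/".
    return frozenset(part for part in field.split("/") if part)


def parse_ace(text):
    """Parse "who:access_mask:flags:type" into an Ace."""
    who, mask, flags, ace_type = text.split(":")
    if ace_type not in ACE_TYPES:
        raise ValueError(f"unknown ACE type: {ace_type}")
    return Ace(who, _names(mask), _names(flags), ace_type)


ace = parse_ace("lisagab:write_data:failed_access_flag:alarm")
plain = parse_ace("everyone@:read_data::allow")
```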

What do AUDIT and ALARM ACE types do?

The AUDIT and ALARM type ACEs trigger an audit or alarm event upon successful or failed accesses (as defined in the access mask bits of the ACE), depending on the presence of the successful/failed access flags (described above).  The ACEs of type AUDIT and ALARM don't play a role when doing access checks on a file.  They only define an action to happen in the event that a certain access is attempted.

For example, let's say we have the following ACL:

lisagab:write_data::deny
lisagab:write_data:failed_access_flag:alarm

The first ACE affects the access that user, "lisagab", has to the  file.  The second ACE says if user, "lisagab", attempts to access this file for writing and fails, trigger an alarm event.

One important thing to remember is that what we do in the event of auditing or alarming is still undefined.  You can think of it like this, though: when the access in question happens, auditing could be the logging of the event to a file and alarming could be the sending of an email to an administrator.
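
Here is a small Python sketch of how the example ACL above could be evaluated for events. Since the actual behavior is undefined, everything here is an assumption: the tuple layout, the exact-name match on who, and the idea that the function simply returns the list of event types that would fire.

```python
def triggered_events(acl, requester, requested, succeeded):
    """Return the event types ("audit"/"alarm") fired by one access attempt.

    acl is a list of (who, access_mask, flags, ace_type) tuples, where
    access_mask and flags are sets of names (a hypothetical layout).
    """
    wanted_flag = "successful_access_flag" if succeeded else "failed_access_flag"
    events = []
    for who, mask, flags, ace_type in acl:
        if ace_type not in ("audit", "alarm"):
            continue  # ALLOW and DENY ACEs never trigger events
        if who == requester and mask & requested and wanted_flag in flags:
            events.append(ace_type)
    return events


acl = [
    ("lisagab", {"write_data"}, set(), "deny"),
    ("lisagab", {"write_data"}, {"failed_access_flag"}, "alarm"),
]

# lisagab tries to write and is denied by the first ACE: the alarm fires.
failed_write = triggered_events(acl, "lisagab", {"write_data"}, succeeded=False)
# A failed read is not covered by the alarm ACE's access mask.
failed_read = triggered_events(acl, "lisagab", {"read_data"}, succeeded=False)
```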

How is access checking done?

To quote the NFSv4 specification:
 To determine if a request succeeds, each nfsace4 entry is processed
in order by the server. Only ACEs which have a "who" that matches
the requester are considered. Each ACE is processed until all of the
bits of the requester's access have been ALLOWED. Once a bit (see
below) has been ALLOWED by an ACCESS_ALLOWED_ACE, it is no longer
considered in the processing of later ACEs. If an ACCESS_DENIED_ACE
is encountered where the requester's access still has unALLOWED bits
in common with the "access_mask" of the ACE, the request is denied.

What this means is:

The most important thing to note about access checking with NFSv4 ACLs is that it is very order dependent.  If a request for access is made, each ACE in the ACL is traversed in order.  The first ACE that matches the who of the requester and defines the access that is being requested is honored.

For example, let's say user, "lisagab", is requesting the ability to read the data of file, "foo", and "foo" has the following ACL:

everyone@:read_data::allow
lisagab:write_data::deny

lisagab would be allowed the ability to read_data because lisagab is covered by "everyone@".

Another thing that is important to know is that the access determined is cumulative.

For example, let's say user, "lisagab", is requesting the ability to read and write the data of file, "bar", and "bar" has the following ACL:

lisagab:read_data::allow
lisagab:write_data::allow

lisagab would be allowed the ability to read_data and write_data.
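
The quoted algorithm and the two examples above can be sketched in a few lines of Python. This is a simplified model, not the ZFS or NFS server code: who matching only handles an exact user name and everyone@ (no groups, owner@ or group@), and denying any bits that no ACE allowed is an assumption on my part.

```python
def check_access(acl, requester, requested):
    """Walk the ACL in order and decide whether the request succeeds.

    acl is an ordered list of (who, access_mask, ace_type) tuples, where
    access_mask is a set of bit names and ace_type is "allow" or "deny".
    """
    remaining = set(requested)  # bits that have not been ALLOWED yet
    for who, mask, ace_type in acl:
        if who not in (requester, "everyone@"):
            continue  # only ACEs whose who matches the requester count
        if ace_type == "allow":
            remaining -= mask
            if not remaining:
                return True  # every requested bit has been ALLOWED
        elif ace_type == "deny" and remaining & mask:
            return False  # a still-unallowed bit was denied
    # Assumption: bits that no ACE allowed are treated as denied.
    return False


foo_acl = [
    ("everyone@", {"read_data"}, "allow"),
    ("lisagab", {"write_data"}, "deny"),
]
bar_acl = [
    ("lisagab", {"read_data"}, "allow"),
    ("lisagab", {"write_data"}, "allow"),
]

foo_read = check_access(foo_acl, "lisagab", {"read_data"})
foo_write = check_access(foo_acl, "lisagab", {"write_data"})
bar_both = check_access(bar_acl, "lisagab", {"read_data", "write_data"})

# Order dependence: the same two ACEs in a different order give a
# different answer.
deny_first = check_access(
    [("lisagab", {"read_data"}, "deny"), ("everyone@", {"read_data"}, "allow")],
    "lisagab", {"read_data"})
allow_first = check_access(
    [("everyone@", {"read_data"}, "allow"), ("lisagab", {"read_data"}, "deny")],
    "lisagab", {"read_data"})
```

In this model foo_read and bar_both come out allowed and foo_write comes out denied, matching the examples, while the last two calls show why the position of a DENY entry in the list matters.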

How to use ZFS/NFSv4 ACLs on Solaris:

Many of you may remember the setfacl(1) and getfacl(1) commands.  Well, those are still around, but won't help you much with manipulating ZFS or pure NFSv4 ACLs.  Those commands are only capable of manipulating the POSIX-draft ACLs as implemented by UFS.

As a part of the ZFS putback, Mark has modified the chmod(1) and ls(1) command line utilities in order to manipulate ACLs on Solaris.

chmod(1) and ls(1) now give us the ability to manipulate and display ZFS/NFSv4 ACLs.  Interestingly enough, these utilities can also handle POSIX-draft ACLs, so now there is a one-stop shop for all your ACL needs.

The End:

I would like to say congratulations to the ZFS team and thanks for the t-shirt.
@ 01:40 PM PST [ Comments [5] ]
 
 
 
 
 