
Tom Haynes

20080518 Sunday May 18, 2008
Connectathon.org is down, enjoy at least my talks

Down for routine maintenance - you can at least enjoy the following until then:


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

20080517 Saturday May 17, 2008
Slides for Connectathon 2008 are being posted

Cthon '08 went off without a hitch. It started out uneventfully as Kerberos worked right out of the box. Evidently Sun's Kerberos team has been working on making initial configuration painless. And they succeeded.

The public talks were well received and we've started posting the slides as they are sent in. You can check them out on Talks 08.

I'll post more as they arrive.

Also, we videoed most of the talks this year. As that content becomes available, we will post it up as well.



20080511 Sunday May 11, 2008
It is Connectathon time again

Be sure to visit www.connectathon.org and see when the talks are scheduled. These are open to the public.

Sun Microsystems, Inc. is involved with 6 presentations and then NetApp has 5 of them. I'll be giving two of them, but I'm actually more excited about the one on nfsreplay by Shehjar Tikoo and the Linux development git one by Bruce Fields and Benny Halevy.

Normally we can't share images of the event, but here is one from before the other vendors set up their gear:

Not shown

Each of the Sun workstations is probably a node in a pNFS community.



20080224 Sunday February 24, 2008
What is a BakeAThon?

I've given up trying to explain to people what it is I do for a living. But I think I'm going to keep on trying to explain what a BakeAThon is to them.

How can I do one without the other? Well, I can abstract the process.

So a BakeAThon (and a ConnectAThon) is an interoperability testing event. And the best way to describe it is that you have 10 different people with a set of rules for a game (if you say AD&D edition 3 rules, you have another set of problems to deal with). Each of them has read the rules and believes that they know all of the intricacies. But none of them have played the game with anyone else.

So they all get together and start to play each other. And they start to argue about each and every move. Sometimes it is pretty obvious whose interpretation is wrong. And sometimes they call someone else over to help decide.

As soon as player A is done with player B, they start with player C. Except sometimes they are also playing with player D at the same time. Or player B comes back to see if they have gotten rule 5.3 correct now.

Sometimes they all vote on how to interpret a rule and even change the rule book. And sometimes player F was at the bathroom when that happened and causes the debate to start back up again.

Then they all go away again for 3 months, promising to play games against each other remotely. They meet back up again at the next BakeAThon - sometimes there is a new player or someone didn't show up. But they are willing to chime in over email.

But they have to start all over from scratch because no one played remotely and they've been busy playing with themselves.

A further complication is that some people only play defense and some only play offense. Sometimes you get a team where they split those duties. So when you talk to one person about how they run their offense, they shrug and say that they only do defense. And the problem that arises here is when the team's offense only plays against their defense - they get pretty good at it and understand some simple shortcuts that make it easy. But when they play another team, those same shortcuts cause problems.

So a BakeAThon is pretty much like that. The major difference is that the competitiveness isn't in winning a game but in getting the game adopted by other people. I.e., Foo Inc. and Bar Inc. may have differences and fight over customers, but while at a BakeAThon, they work together to make NFS a better protocol.



20071207 Friday December 07, 2007
Steve Dickson (of RedHat) releases prototype pseudo-fs root for Linux NFSv4

I've slammed the Linux NFSv4 implementation before for not having the same namespace as NFSv3. I.e., it used the 'fsid=0' hack to export the root of the v4 namespace and thus that path may not be the same as '/'.

Well, over on the nfsv4 <at> linux-nfs.org mailing list, Steve just announced a prototype which fixes that problem! And the crowd goes wild!

The following patch series gives rpc.mountd the ability to allocate
a dynamic pseudo root, so the 'fsid=0' export option is no longer 
required. This allows v2, v3 and v4 clients mounts without any 
changes to the server's exports list.

One anomaly of the Linux NFS server is that it requires a pseudo root
to be defined. Currently the only way a pseudo root can be defined is by 
setting the fsid to zero (i.e. fsid=0). So if we wanted to make v4
the default mounting version and have things just work like v2/v3
all of the existing exports configurations would have to change 
(i.e. a 'fsid=0' would have to be added) to support a v4 mounts,
which, imho, is unacceptable. So this patch series address
this problem.

I think this might also mark the first major piece of work on the Linux NFSv4 code to come from some place other than CITI. I might be wrong, but I think this is a sign of the maturity of the code.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20071022 Monday October 22, 2007
We just delivered Mirror Mounts for NFSv4!

What a group effort. I love how teammates come out of the woodwork to help you get the final touches on a project. The code was putback today and you should start seeing it in build 77. I'll leave you with a teaser of things to come:

The putback for

PSARC 2007/563 Add _AT_TRIGGER to fstatat(2)
PSARC 2007/416 NFSv4 Mirror-mounts
5035401 allow clients to cross server filesystem boundaries if the fs is visible
6613892 nftw(3C) has potential security issues

enhances the NFSv4 clients to automatically mount filesystems when they are encountered at the NFSv4 server; this enhancement does not require the use of the automounter and therefore does not rely on the content or propagation of automounter maps. An example of the utility of this feature is in the presence of ZFS at the NFS server. With the ease of creation and management of numerous ZFS filesystems, the enhanced NFSv4 client will immediately provide access to the newly created and shared ZFS filesystems.

And here is a roll call:

Core team:
Calum Mackay: Cambridge, UK (team lead)
Tom Haynes: Tulsa, OK, senior engineer
Bill Baker: Austin, TX, srstaff engr/architect/advisor
Rob Thurlow: Ft Collins, CO, senior engr/advisor
Spencer Shepler: Austin, TX, srstaff engr/RTI advocate/advisor
Helen Chao: Menlo Park, CA, QE lead
Lily Li: Beijing, QE engineer
Evan Layton: Broomfield, CO, engineer
Alok Aggarwal: Atlanta, GA, engineer
Rich Brown: Chicago, IL, sr engr/PSARC Intern

And there were more who stepped up and made significant contributions to get the work out of the door.



20071009 Tuesday October 09, 2007
Mirror mounts on the way

I've been very busy trying to get Mirror Mounts out the door. Our last task was to fix a find bug:

[tdh@mrx ~]> sudo mount kanigix:/ /mnt
[tdh@mrx ~]> cd /mnt
[tdh@mrx /mnt]> cd zoo
[tdh@mrx zoo]> cd mms
[tdh@mrx mms]> ls -la
total 20
drwxr-xr-x   5 root     sys            5 Jul 19 21:41 .
drwxr-xr-x  11 root     sys           11 Oct  8 13:08 ..
drwxr-xr-x   5 root     sys            5 Jul 19 21:41 node1
drwxr-xr-x   5 root     sys            5 Jul 19 21:41 node2
drwxr-xr-x   5 root     sys            5 Jul 19 21:42 node3
[tdh@mrx mms]> df -F nfs -h
Filesystem             size   used  avail capacity  Mounted on
kanigix:/               20G   8.9G    11G    46%    /mnt
kanigix:/zoo           637G    42K   637G     1%    /mnt/zoo
kanigix:/zoo/mms       637G    31K   637G     1%    /mnt/zoo/mms

Now if we try to find, it should mount a subdirectory for us and traverse down it.

[tdh@mrx mms]> find .
.
./node3
find: cannot open .: Resource temporarily unavailable

It fails because of a security check to make sure the stat(2) information from before an opendir(2) matches a fstat(2) from after the opendir(2). The pre-stat gets the vnode which will be mounted on and the post-fstat gets the root of the new filesystem.
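
To make that check concrete, here is a minimal Python sketch of the idea. It is not the nftw(3C) source: the function names `same_object` and `open_dir_checked` are made up for illustration, `os.open` stands in for opendir, and the EAGAIN error is only an assumption chosen to match the message find printed above. The device and inode numbers in the second half are hypothetical.

```python
import errno
import os
import tempfile
from types import SimpleNamespace


def same_object(before, after):
    """True when two stat results name the same device and inode."""
    return (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)


def open_dir_checked(path):
    """Open a directory only if it is still the object we stat'ed."""
    before = os.stat(path)            # taken before the open, like stat(2)
    fd = os.open(path, os.O_RDONLY)   # stands in for opendir
    after = os.fstat(fd)              # taken after the open, like fstat(2)
    if not same_object(before, after):
        os.close(fd)
        # EAGAIN is an assumption; it matches what find reported.
        raise OSError(errno.EAGAIN, os.strerror(errno.EAGAIN), path)
    return fd


# An ordinary directory passes the check.
with tempfile.TemporaryDirectory() as scratch:
    fd = open_dir_checked(scratch)
    os.close(fd)

# Hypothetical numbers: the vnode that gets mounted on, versus the root
# of the filesystem that the open caused to be mounted on top of it.
pre = SimpleNamespace(st_dev=1, st_ino=42)
post = SimpleNamespace(st_dev=2, st_ino=3)
mismatch = not same_object(pre, post)
```

With a mirror mount, the open itself triggers the mount, so the "before" and "after" identities differ exactly once per mount point, which is why a second find gets one level deeper.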

But since the mount has now taken place, we pass the test which just failed.

[tdh@mrx mms]> df -F nfs -h
Filesystem             size   used  avail capacity  Mounted on
kanigix:/               20G   8.9G    11G    46%    /mnt
kanigix:/zoo           637G    42K   637G     1%    /mnt/zoo
kanigix:/zoo/mms       637G    31K   637G     1%    /mnt/zoo/mms
kanigix:/zoo/mms/node3
                       637G    31K   637G     1%    /mnt/zoo/mms/node3
[tdh@mrx mms]> find .
.
./node3
./node3/sub2
find: cannot open .: Resource temporarily unavailable
[tdh@mrx mms]> df -F nfs -h
Filesystem             size   used  avail capacity  Mounted on
kanigix:/               20G   8.9G    11G    46%    /mnt
kanigix:/zoo           637G    42K   637G     1%    /mnt/zoo
kanigix:/zoo/mms       637G    31K   637G     1%    /mnt/zoo/mms
kanigix:/zoo/mms/node3
                       637G    31K   637G     1%    /mnt/zoo/mms/node3
kanigix:/zoo/mms/node3/sub2
                       637G    31K   637G     1%    /mnt/zoo/mms/node3/sub2

But we puke on the next mirror mount.

The problem is actually deep down in nftw(3C) and you can look at the PSARC case at Add S_IFTRIGGER to st_mode [PSARC/2007/563 FastTrack timeout 10/04/2007]. This is pretty interesting reading. Most people told me that my proposal would generate considerable controversy.

The PSARC community decided that there was still a hole in the security check inside nftw(3C). While I could argue against this (it is pretty hard to rename or move an export on a live filesystem), in the end I decided that a counter-proposal made more sense than mine.

You've got to realize, I've been working on this project for the past 6 months. We were going to putback on Friday - time to turn over onto the pNFS project.

We decided as a group that the new proposal was the right thing to do for both the Mirror Mount project and Solaris (and indirectly OpenSolaris).



20070626 Tuesday June 26, 2007
OpenSolaris Project Models and pNFS

I believe that the majority of OpenSolaris development occurs within Sun Microsystems Engineering. As much as we would like for it to snowball in the wild, that has not happened. I'm saying this from my biased view, I know some projects have been proposed externally from Sun, e.g., the i18n port of the closed library. I also acknowledge the work that Dennis Clark is leading for the PPC port. There are more and I am not trying to take away from them. I am relating my experience with trying to get projects off the ground on OpenSolaris - see for example OpenSolaris Project: NFS Server in non-Global Zones.

So what does happen is that a new project gets started and there is no external indication of forward progress. People might start asking for code drops and the reality is that because of the huge internal pressure towards quality in Sun Engineering, that is not going to happen until the code has baked a bit. It gets to the point that a prime question on new project proposals is whether code will be released. Again, there isn't some hidden agenda within Sun to withhold the code - we are just new to this model and we want things to be perfect, not just good enough.

Look back at the discussion that went on for Project Proposal -- Honeycomb Information and dev tools and the lack of a code drop. The OpenSolaris Project: HoneyComb Fixed Content Storage already shows a binary drop and plans for a code drop in the Fall of 2007. Some valid reasons for a group to not drop code right away are that they do not understand the process (they need someone to help them) and they need to clear a legal hurdle to make sure that they are not violating the rights of either an individual or a company. I've seen both occur internally. The good news is that we have internal people ready and willing to help development groups.

What I find really exciting are projects that have a significant external presence. And sometimes that external pressure doesn't contribute directly to the code work. In NFSv4 and NFSv4.1, the external collaboration takes place through the IETF and Connectathon. Both companies and open source developers come together to design and implement future NFS protocol extensions. Interoperability across multiple OS platforms is ensured via the yearly meetings at Connectathon. And with the UMICH CITI developers working on Projects: NFS Version 4 Open Source Reference Implementation, which is mainly distributed to Linux, but forms a reference for both BSD directly and OSX indirectly, and Sun working on OpenSolaris, it is possible for vendors to do compatibility testing all year long.

Take for example NetApp, which provides only an NFS server. They are able to test new NFSv4.1 features against Linux and OpenSolaris clients. Admittedly this isn't new; NetApp was able to use the Solaris 10 beta code to test NFSv4. And the companies in question all sign NDAs and exchange hardware and engineering drops of binaries for testing.

So there is almost no work being driven from OpenSolaris into this open design project. There is an OpenSolaris Project: NFS version 4.1 pNFS, but it is mainly a portal to the Sun NFS team's work. A question that they asked themselves was whether they were going to do binary drops, code drops, or any drop at all. It wasn't a legal issue; the design is done in the open and all of the coding is new development. It wasn't a fear of the unknown; they had already shared binaries in the past. No, rather it was a concern about the impact of providing a drop on the development schedule. Would the overhead of publishing code and/or binaries kill the final deliverable?

Another OpenSolaris reality is that Sun expects to make money. I know that is an evil concept to some open source developers, but we bet the company on being able to deliver quality and sell service along with the source. So making the deadline for the pNFS deliverable is a major concern for the group.

I'm happy that the group decided that they could both deliver on time and make code and binary drops. Lisa just announced for the group the latest drop in FYI: pNFS Code and BFU Archives posted. You can check out the b66 implementation by downloading it. The code is rough in the sense that you wouldn't want to put it in production, but it gives other developers a chance to see what is going on and allows them to test their own implementations. Remember, this code has not been putback into Nevada - it lives in a group workspace. Before OpenSolaris, it would have only been shared under NDA and the expectation that the person installing the code assumed responsibility for any problems.

Project development in OpenSolaris is different than that occurring in other open source communities. There are different hurdles to jump, but there are different expectations as well. Internal developers are proud of the quality that they demand of the code and want to keep that bar high. That in turn makes early code drops hard for them to deliver. It is something they are learning to do. And the pNFS team is leading the way.



20070502 Wednesday May 02, 2007
Linux NFSv4 namespace implementation fools ya with false advertising

We occasionally get a bug assigned to us about how a Solaris client cannot mount a Linux export over NFSv4, but can over NFSv3. In some cases we get it as an automounter bug. We will close the bug because it is an intrinsic problem with the NFSv4 implementation on Linux - well, it is actually false advertising.

Consider the following export on a Linux box:

[tdh@adept ~]> more /etc/exports
/home *(rw,fsid=0,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)
[tdh@adept ~]> exportfs
/home           <world>
[tdh@sandman ~]> showmount -e adept
export list for adept:
/home/mrx    *
/home/tdh    *
/home/spud   *
/home/coach  *
/home/loghyr *

Hmm, that looks wrong - I know I had them exported like that earlier, but clearly the server is only exporting one export.

Anyway, it would seem that the one export is '/home' and that we should be able to go to '/home' and then access the remote filesystem. And we can't:

[tdh@sandman ~]> cd /net/adept/home
[tdh@sandman home]> ls -la
total 7
dr-xr-xr-x   6 root     root           6 May  2 16:31 .
dr-xr-xr-x   2 root     root           2 May  2 16:31 ..
dr-xr-xr-x   1 root     root           1 May  2 16:31 coach
dr-xr-xr-x   1 root     root           1 May  2 16:31 loghyr
dr-xr-xr-x   1 root     root           1 May  2 16:31 mrx
dr-xr-xr-x   1 root     root           1 May  2 16:31 spud
dr-xr-xr-x   1 root     root           1 May  2 16:31 tdh
[tdh@sandman home]> cd tdh
tdh: Permission denied.

What happened?

A snoop trace shows right off the bat that things are not going well:

     sandman -> adept.internal.excfb.com NFS C NULL4
adept.internal.excfb.com -> sandman      NFS R NULL4
     sandman -> adept.internal.excfb.com NFS C 4 (secinfo     ) PUTROOTFH LOOKUP home SECINFO tdh
adept.internal.excfb.com -> sandman      NFS R 4 (secinfo     ) NFS4ERR_NOENT PUTROOTFH NFS4_OK LOOKUP NFS4ERR_NOENT

We should remove the automounter from our testing:

[tdh@sandman home]> sudo mount -o vers=4 adept:/home /mnt
nfs mount: adept:/home: No such file or directory

Again, we see:

     sandman -> adept.internal.excfb.com NFS C 4 (secinfo     ) PUTROOTFH SECINFO home
adept.internal.excfb.com -> sandman      NFS R 4 (secinfo     ) NFS4ERR_OP_ILLEGAL PUTROOTFH NFS4_OK ILLEGAL NFS4ERR_OP_ILLEGAL

Okay, but it works with NFSv3:

[tdh@sandman home]> ls -la /mnt
total 162
drwxr-xr-x  20 root     root        4096 Feb 18 17:49 .
drwxr-xr-x  27 root     root        1024 Apr 23 09:31 ..
drwxr-xr-x   2 1094     100         4096 Feb 18 13:12 nfsv2
drwxr-xr-x   2 1813     100         4096 Feb 18 13:12 nfsv3
drwxr-xr-x   4 3530     100         4096 Feb 18 17:46 nfsv4

Note that we now have uids and earlier we had root ownerships.

What is going on here? Well another simple example shows what the Linux server is doing to us:

[tdh@sandman home]> sudo umount /mnt
[tdh@sandman home]> sudo mount -o vers=4 adept:/ /mnt
[tdh@sandman home]> ls -la /mnt
total 162
drwxr-xr-x   2 nobody   nobody      4096 Feb 18 13:12 nfsv2
drwxr-xr-x   2 nobody   nobody      4096 Feb 18 13:12 nfsv3
drwxr-xr-x   4 nobody   nobody      4096 Feb 18 17:46 nfsv4

And note that I have chopped some of the output for space.

Okay, it is clear to me that adept:/home is actually adept:/ as far as NFSv4 is concerned. The problem is that the Linux NFSv4 implementation of the pseudo-fs namespace is done such that one of the exports is made the root fs. If we look at the export again, we see:

[tdh@adept ~]> more /etc/exports
/home *(rw,fsid=0,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)

The 'fsid=0' option is assigning '/home' to be '/' as far as NFSv4 is concerned. The problem is that this breaks automounter implementations which use the MOUNTPROC_EXPORTALL RPC call to get a list of mountable exports. It also breaks humans reading such output to get a sense of what to mount.
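
A small Python sketch of the renaming may help. It only models what the traces above suggest: the 'fsid=0' export becomes the NFSv4 root and paths under it are addressed relative to it. The function name `v4_path` is hypothetical and this is not how rpc.mountd or the kernel actually computes anything.

```python
import posixpath


def v4_path(pseudo_root, server_path):
    """Map a server path (as NFSv3 and showmount see it) to the path an
    NFSv4 client would have to ask for when pseudo_root has fsid=0.

    Returns None when the path is not under the pseudo root at all.
    """
    pseudo_root = posixpath.normpath(pseudo_root)
    server_path = posixpath.normpath(server_path)
    if server_path == pseudo_root:
        return "/"                       # the export itself becomes '/'
    if pseudo_root == "/":
        return server_path               # namespaces already agree
    if not server_path.startswith(pseudo_root + "/"):
        return None                      # outside the v4 namespace
    return server_path[len(pseudo_root):]


home_as_v4 = v4_path("/home", "/home")        # what 'adept:/home' turns into
tdh_as_v4 = v4_path("/home", "/home/tdh")
```

An automounter that takes '/home' from the export list and asks for it over NFSv4 is therefore asking for a path that does not exist in the v4 namespace, which lines up with the NFS4ERR_NOENT in the first trace.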

You can find a discussion of how the NFSv4 namespace is built for Linux at Exporting Directories.


20070211 Sunday February 11, 2007
Posted slides for Connectathon 2007

I've started posting the slides for Connectathon 2007: Talks 2007. As I get the remaining slides, I'll add them there.



20070207 Wednesday February 07, 2007
You know you've been at a convention too long when ...

Went to the local Starbucks here at Connectathon 2007. The guy looked up and said "Awake, right?" The guy I was with was floored. I told him, what is so hard - I'm 6'5", currently sporting a handlebar, and always wearing a Green Lantern hoodie.

The event is going along fine. The main problem is that NFSv3 is too solid and the NFSv4 implementations are also getting that way. The NFSv4.1 stuff is really still in the design phase. But developers are getting small victories when they either get code to compile or even run against other vendors. I think that Connectathon 2008 will be more frantic and the victories will be larger.



20070122 Monday January 22, 2007
Connectathon 2007 - Talk Schedule posted

I just posted the presentation schedule for Connectathon 2007 as Talks 2007. The talks are open to the public (go ahead, jam the room and ask probing questions):

Parkside Hall
180 Park Ave.
San Jose, CA 95113

This is the building which is connected to The Tech Museum of Innovation in downtown San Jose. In most other cities, I'd tell you to look for the introverted and pale computer geeks. That wouldn't work for that area.



20070110 Wednesday January 10, 2007
Dot NFS in action

Wow, I was in the Sun blogging tool, it is Apache Roller, and I deleted an image file. The name had a typo. Anyway, I caught it displaying a .nfs file:

Not shown

.nfs797 is a file which was deleted, but for which some other application still has a file handle open. Since it was deleted and that file name might be reused, a temporary file name is assigned to the deleted file. You might see this a lot on some of your systems. Now, if the server reboots before the client releases the file handle, the .nfs file might stay on disk for a very long time. And if these files are large, they will eat up your space.
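
The behavior the .nfs name is standing in for is easiest to see on a local POSIX filesystem, where a deleted file stays readable through any handle that is still open. The Python sketch below shows only that local behavior (the file name is made up, and it will not work on Windows). Over NFS the client cannot rely on the server to keep an unlinked file alive, so it renames the file to a .nfs name instead and removes it when the last handle is closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "typo-image.png")   # hypothetical name
    with open(path, "w") as writer:
        writer.write("still here")

    reader = open(path)      # some other application holds the file open
    os.unlink(path)          # the name goes away ...
    name_gone = not os.path.exists(path)
    data = reader.read()     # ... but the contents are still reachable
    reader.close()           # only now can the space be reclaimed
```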

Under Solaris, you can use /usr/lib/fs/nfs/nfsfind to clean up ones which are over a week old:

if [ ! -s /etc/dfs/sharetab ]; then exit ; fi

# Get all NFS filesystems exported with read-write permission.

DIRS=`/usr/bin/nawk '($3 != "nfs") { next }
        ($4 ~ /^rw$|^rw,|^rw=|,rw,|,rw=|,rw$/) { print $1; next }
        ($4 !~ /^ro$|^ro,|^ro=|,ro,|,ro=|,ro$/) { print $1 }' /etc/dfs/sharetab`

for dir in $DIRS
do
        find $dir -type f -name .nfs\* -mtime +7 -mount -exec rm -f {} \;
done
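
The nawk filter is dense, so here is the same selection rule restated as a Python sketch. The regular expressions are copied from the script; the function name `writable_shares` and the sample sharetab lines are made up for illustration. A share is kept when its options mention rw, or when they never mention ro (in which case the share is treated as read-write).

```python
import re

# Same patterns as the nawk script: rw or ro as a whole option,
# or followed by '=' (a host list).
RW = re.compile(r"^rw$|^rw,|^rw=|,rw,|,rw=|,rw$")
RO = re.compile(r"^ro$|^ro,|^ro=|,ro,|,ro=|,ro$")


def writable_shares(sharetab_text):
    """Return the paths of NFS shares that can be written to."""
    dirs = []
    for line in sharetab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] != "nfs":
            continue                      # not an NFS share
        opts = fields[3] if len(fields) > 3 else ""
        if RW.search(opts) or not RO.search(opts):
            dirs.append(fields[0])
    return dirs


# Hypothetical sharetab contents: path, resource, fstype, options, description.
SAMPLE = """\
/export/zfs  -  nfs  rw         home
/export/ro   -  nfs  ro         readonly
/export/sec  -  nfs  sec=sys    defaults
/export/mix  -  nfs  ro=a,rw=b  mixed
/export/smb  -  smb  rw         other
"""

selected = writable_shares(SAMPLE)
```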

For fun, what do you think happens to any of those .nfs files which are still referenced by a client? Well, they get a new .nfs name. The name isn't special, it is just a convention. Think about it for a while.



20070104 Thursday January 04, 2007
Connectathon 2007 is just about here

I've got my airline tickets, I've got my hotel room, and I've let everyone know I'll be in town. Connectathon 2007 is about to happen. Watch that space for a list of public presentations - if you are in the downtown San Jose area, you are welcome to drop in to listen.



20061231 Sunday December 31, 2006
How NFSv4 should work when crossing filesystems

In Some fun with NFSv4 and automount across a ssh tunnel, I revealed the work going on in Solaris for Mirror Mounts. The example was a desire to automount across a ssh tunnel. Well, I dusted off wont, the box from hell (being used by my son for video games) and created some zfs filesystems on it:

# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
zoo                398K   118G  24.5K  /zoo
zoo/home           256K   118G  35.5K  /export/zfs
zoo/home/braves   24.5K   118G  24.5K  /export/zfs/braves
zoo/home/kanigix  24.5K   118G  24.5K  /export/zfs/kanigix
zoo/home/loghyr   24.5K   118G  24.5K  /export/zfs/loghyr
zoo/home/mrx      24.5K   118G  24.5K  /export/zfs/mrx
zoo/home/nfsv2    24.5K   118G  24.5K  /export/zfs/nfsv2
zoo/home/nfsv3    24.5K   118G  24.5K  /export/zfs/nfsv3
zoo/home/nfsv4    24.5K   118G  24.5K  /export/zfs/nfsv4
zoo/home/spud     24.5K   118G  24.5K  /export/zfs/spud
zoo/home/tdh      24.5K   118G  24.5K  /export/zfs/tdh
# uname -a
SunOS wont 5.11 snv_55 i86pc i386 i86pc

I then opened a ssh tunnel to it on my Fedora Core 4 box and did a little bit of exploring:

[tdh@adept tdh]> uname -a
Linux adept 2.6.15-1.1833_FC4 #1 Wed Mar 1 23:41:37 EST 2006 i686 i686 i386 GNU/Linux
[tdh@adept ~/usenix]> ssh -fN -L "5049:wont:2049" wont
Password:
[tdh@adept ~/usenix]> sudo mount -o port=5049 -t nfs4 localhost:/ /nfs4/wont
[tdh@adept ~/usenix]> cd /nfs4/wont
[tdh@adept wont]> ls -la
total 6
drwxr-xr-x  38 root root 1024 Dec 31 17:49 .
drwxr-xr-x   4 root root 4096 Dec 31 18:17 ..
drwxr-xr-x   4 root sys   512 Dec 31 17:50 export
[tdh@adept wont]> cd export
[tdh@adept export]> ls -la
total 4
drwxr-xr-x   4 root sys   512 Dec 31 17:50 .
drwxr-xr-x  38 root root 1024 Dec 31 17:49 ..
drwxr-xr-x  11 root sys    11 Dec 31 17:50 zfs
[tdh@adept export]> cd zfs
[tdh@adept zfs]> ls -la
total 16
drwxr-xr-x  11 root sys  11 Dec 31 17:50 .
drwxr-xr-x   4 root sys 512 Dec 31 17:50 ..
drwxr-xr-x   2 root sys   2 Dec 31 17:50 braves
drwxr-xr-x   2 root sys   2 Dec 31 17:50 kanigix
drwxr-xr-x   2 root sys   2 Dec 31 17:50 loghyr
drwxr-xr-x   2 root sys   2 Dec 31 17:50 mrx
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv2
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv3
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv4
drwxr-xr-x   2 root sys   2 Dec 31 17:50 spud
drwxr-xr-x   2 root sys   2 Dec 31 17:50 tdh
[tdh@adept zfs]> cd tdh
[tdh@adept tdh]> ls -la
total 3
drwxr-xr-x   2 root sys  2 Dec 31 17:50 .
drwxr-xr-x  11 root sys 11 Dec 31 17:50 ..

Notice that I only did one mount command. As I crossed down into the exported filesystems, the Linux 2.6 implementation of NFSv4 did the mounts automatically for me in the background. Also, note that since '/' is not exported from wont, this must be a pseudo-fs:

[tdh@adept tdh]> showmount -e wont
Export list for wont:
/export/zfs         (everyone)
/export/zfs/nfsv2   (everyone)
/export/zfs/nfsv3   (everyone)
/export/zfs/nfsv4   (everyone)
/export/zfs/tdh     (everyone)
/export/zfs/loghyr  (everyone)
/export/zfs/kanigix (everyone)
/export/zfs/mrx     (everyone)
/export/zfs/spud    (everyone)
/export/zfs/braves  (everyone)

Let's export '/' and see what happens:

# share -F nfs -o rw -d "root" /

And on the Linux box:

[tdh@adept tdh]> cd /nfs4/wont
[tdh@adept wont]> ls -la
total 6
drwxr-xr-x  38 root root 1024 Dec 31 17:49 .
drwxr-xr-x   4 root root 4096 Dec 31 18:17 ..
drwxr-xr-x   4 root sys   512 Dec 31 17:50 export

What happened? Why didn't we see the root directory on wont? Well, when we did the mount command earlier, we basically got a reference to a file handle in the pseudo-fs. We need to flush this by umounting and remounting:

[tdh@adept wont]> cd
[tdh@adept ~]> sudo umount /nfs4/wont/
[tdh@adept ~]> sudo mount -o port=5049 -t nfs4 localhost:/ /nfs4/wont
[tdh@adept ~]> cd /nfs4/wont
[tdh@adept wont]> ls -la
total 67
drwxr-xr-x  38 root root 1024 Dec 31 17:49 .
drwxr-xr-x   4 root root 4096 Dec 31 18:17 ..
lrwxrwxrwx   1 root root    9 Dec 31 13:17 bin -> ./usr/bin
drwxr-xr-x   5 root sys   512 Dec 31 14:12 boot
drwxr-xr-x   2 root root  512 Dec 31 14:51 Desktop
drwxr-xr-x  24 root sys  4096 Dec 31 14:42 dev
drwxr-xr-x  10 root sys   512 Dec 31 14:42 devices
drwxr-xr-x   2 root root  512 Dec 31 14:51 Documents
drwxr-xr-x   9 root root  512 Dec 31 17:31 .dt
-rwxr-xr-x   1 root root 5111 Dec 31 14:51 .dtprofile
-rw-------   1 root root   16 Dec 31 17:31 .esd_auth
drwxr-xr-x  87 root sys  4608 Dec 31 17:52 etc
drwxr-xr-x   4 root sys   512 Dec 31 17:50 export
...

Let's walk down the paths again and see what happens:

[tdh@adept wont]> cd export
[tdh@adept export]> ls -la
total 5
drwxr-xr-x   4 root sys   512 Dec 31 17:50 .
drwxr-xr-x  38 root root 1024 Dec 31 17:49 ..
drwxr-xr-x   2 root root  512 Dec 31 13:17 home
drwxr-xr-x  11 root sys    11 Dec 31 17:50 zfs
[tdh@adept export]> cd zfs
[tdh@adept zfs]> ls -la
total 16
drwxr-xr-x  11 root sys  11 Dec 31 17:50 .
drwxr-xr-x   4 root sys 512 Dec 31 17:50 ..
drwxr-xr-x   2 root sys   2 Dec 31 17:50 braves
drwxr-xr-x   2 root sys   2 Dec 31 17:50 kanigix
drwxr-xr-x   2 root sys   2 Dec 31 17:50 loghyr
drwxr-xr-x   2 root sys   2 Dec 31 17:50 mrx
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv2
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv3
drwxr-xr-x   2 root sys   2 Dec 31 17:50 nfsv4
drwxr-xr-x   2 root sys   2 Dec 31 17:50 spud
drwxr-xr-x   2 root sys   2 Dec 31 17:50 tdh
[tdh@adept zfs]> cd tdh
[tdh@adept tdh]> ls -la
total 3
drwxr-xr-x   2 root sys  2 Dec 31 17:50 .
drwxr-xr-x  11 root sys 11 Dec 31 17:50 ..

Let's make sure we are in the right place:

# scp sandman:/export/home/tdh/.tcshrc .
Password:
.tcshrc              100% |*******************************************************************|  5417       00:00
# chown tdh:staff .tcshrc
# ls -la
total 18
drwxr-xr-x   2 root     sys            3 Dec 31 18:10 .
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 ..
-rw-------   1 tdh      staff       5417 Dec 31 18:10 .tcshrc

And on the client:

[tdh@adept tdh]> ls -la
total 9
drwxr-xr-x   2 root sys       3 Dec 31 18:10 .
drwxr-xr-x  11 root sys      11 Dec 31 17:50 ..
-rw-------   1 tdh  nobody 5417 Dec 31 18:10 .tcshrc
[tdh@adept tdh]> grep 10 /etc/group
wheel:x:10:root

The nobody shows up for the group because there is no mapping between the string "staff" and "wheel". In NFSv3, the numeric 10 would have gone across the wire and the ls command would have spit out "wheel".
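
Here is a rough Python sketch of the difference, under some loud assumptions: the function names are hypothetical, the domain "example.com" is a placeholder, and the two small tables stand in for the client's /etc/group. NFSv4 sends owners and groups as "name@domain" strings, so the sketch strips the domain and ignores everything a real id mapping daemon does about matching domains.

```python
NOBODY = "nobody"


def v3_group_name(wire_gid, client_groups_by_gid):
    """NFSv3: a number crosses the wire and the client looks it up."""
    return client_groups_by_gid.get(wire_gid, str(wire_gid))


def v4_group_name(wire_name, client_group_names):
    """NFSv4: a string crosses the wire; unknown names become nobody."""
    name, _, _domain = wire_name.partition("@")
    return name if name in client_group_names else NOBODY


# Stand-in for the Linux client's /etc/group: gid 10 is wheel, no staff.
linux_by_gid = {0: "root", 10: "wheel"}
linux_names = set(linux_by_gid.values())

seen_over_v3 = v3_group_name(10, linux_by_gid)
seen_over_v4 = v4_group_name("staff@example.com", linux_names)
```

Neither answer is really right for this file: NFSv3 silently shows a different group that happens to share the number, while NFSv4 at least admits it has no mapping.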

Okay, let's check to see what the Solaris client would have done:

[tdh@sandman ~]> ssh -fN -L "5049:wont:2049" wont
Password:
[tdh@sandman ~]> su -
Password:
Sun Microsystems Inc.   SunOS 5.11      snv_54  October 2007
# mkdir -p /nfs4/wont
# mount -o port=5049 localhost:/ /nfs4/wont
# exit
[tdh@sandman ~]> cd /nfs4/wont
[tdh@sandman wont]> ls -la
total 134
drwxr-xr-x  38 root     root        1024 Dec 31 17:49 .
drwxr-xr-x   3 root     root         512 Dec 31 18:17 ..
...
drwxr-xr-x   2 root     root         512 Dec 31 14:51 Desktop
drwxr-xr-x   2 root     root         512 Dec 31 14:51 Documents
lrwxrwxrwx   1 root     root           9 Dec 31 13:17 bin -> ./usr/bin
drwxr-xr-x   5 root     sys          512 Dec 31 14:12 boot
drwxr-xr-x  24 root     sys         4096 Dec 31 14:42 dev
drwxr-xr-x  10 root     sys          512 Dec 31 14:42 devices
drwxr-xr-x  87 root     sys         4608 Dec 31 17:52 etc
drwxr-xr-x   4 root     sys          512 Dec 31 17:50 export
...
[tdh@sandman wont]> cd export
[tdh@sandman export]> ls -la
total 9
drwxr-xr-x   4 root     sys          512 Dec 31 17:50 .
drwxr-xr-x  38 root     root        1024 Dec 31 17:49 ..
drwxr-xr-x   2 root     root         512 Dec 31 13:17 home
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 zfs
[tdh@sandman export]> cd zfs
[tdh@sandman zfs]> ls -la
total 5
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 .
drwxr-xr-x   4 root     sys          512 Dec 31 17:50 ..

Okay, we have hit the crux of the problem for Mirror Mounts. We have a filesystem crossing on the server which needs to be mirrored on the client. We have to do this manually (or with an automounter if the ports are open):

[tdh@sandman zfs]> cd
[tdh@sandman ~]> su -
Password:
Sun Microsystems Inc.   SunOS 5.11      snv_54  October 2007
# mount -o port=5049 localhost:/export/zfs /nfs4/wont/export/zfs
# ls -la /nfs4/wont/export/zfs
total 32
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 .
drwxr-xr-x   4 root     sys          512 Dec 31 17:50 ..
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 braves
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 kanigix
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 loghyr
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 mrx
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 nfsv2
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 nfsv3
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 nfsv4
drwxr-xr-x   2 root     sys            2 Dec 31 17:50 spud
drwxr-xr-x   2 root     sys            3 Dec 31 18:10 tdh
# tcsh
# ls -la /nfs4/wont/export/zfs/tdh
total 6
drwxr-xr-x   2 root     sys            3 Dec 31 18:10 .
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 ..
# mount -o port=5049 localhost:/export/zfs/tdh /nfs4/wont/export/zfs/tdh
# ls -la  /nfs4/wont/export/zfs/tdh
total 18
drwxr-xr-x   2 root     sys            3 Dec 31 18:10 .
drwxr-xr-x  11 root     sys           11 Dec 31 17:50 ..
-rw-------   1 tdh      staff       5417 Dec 31 18:10 .tcshrc

Notice how the '/export/zfs' gave information about the child filesystems whereas '/' did not. Also, note how we get the correct group name because the '/etc/group' is the same on the two Solaris hosts. Finally, even with zfs presenting up the child filesystems, we did have to manually mount the child in order to peer into it.

So the Mirror Mounts project in the NFSv4 development team is going to fix all of this. Under the hood, the client is going to understand it is about to traverse to a different filesystem and do the equivalent of a NFSv3 mount.
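
As a sketch of the "understand it is about to traverse" part: the NFSv4 protocol gives each object an fsid attribute, and it changes when a path crosses into a different filesystem on the server. The Python below is only an illustration of that idea, not the Solaris client code. The function name `crossing_points` is made up and the fsid values are hypothetical labels for the filesystems on wont.

```python
def crossing_points(walk):
    """Given (path, fsid) pairs from the root downward, return the paths
    where the fsid differs from the parent's, i.e. where the client
    would need a mount of its own to mirror the server."""
    mounts = []
    previous = None
    for path, fsid in walk:
        if previous is not None and fsid != previous:
            mounts.append(path)
        previous = fsid
    return mounts


# Hypothetical fsids for the walk done by hand above.
walk_to_tdh = [
    ("/", "ufs-root"),
    ("/export", "ufs-root"),
    ("/export/zfs", "zoo/home"),
    ("/export/zfs/tdh", "zoo/home/tdh"),
]

needed = crossing_points(walk_to_tdh)
```

Those two paths are the same two mounts that had to be done by hand on sandman.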


Originally posted on Kool Aid Served Daily
Copyright (C) 2006, Kool Aid Served Daily
