Okay, I got hot and bothered - I thought I found a bug in the ZFS code. Work with me as I show my reasoning and see if you say "Doh!" before I do...
I create some zfs filesystems:
# zpool create zoo c0d1s0 c0d1s1
# zfs create zoo/home
# zfs set mountpoint=/export/zfs zoo/home
# zfs set sharenfs=on zoo/home
# zfs set compression=on zoo/home
# zfs set quota=10G zoo/home
# zfs create zoo/home/nfsv2
# zfs create zoo/home/nfsv3
# zfs create zoo/home/nfsv4
# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
zoo             3.21G   131G  99.5K  /zoo
zoo/home         395K  10.0G  99.5K  /export/zfs
zoo/home/nfsv2  98.5K  10.0G  98.5K  /export/zfs/nfsv2
zoo/home/nfsv3  98.5K  10.0G  98.5K  /export/zfs/nfsv3
zoo/home/nfsv4  98.5K  10.0G  98.5K  /export/zfs/nfsv4
zoo/x86         3.21G   131G  3.21G  /zoo/x86
I then create some user accounts to go along with the new filesystems:
# useradd -m -u 1094 -g 100 -c "Mr. NFSv2" -d /export/zfs/nfsv2 nfsv2
# chown nfsv2:100 /export/zfs/nfsv2
# useradd -m -u 1813 -g 100 -c "Mr. NFSv3" -d /export/zfs/nfsv3 nfsv3
# chown nfsv3:100 /export/zfs/nfsv3
# useradd -m -u 3530 -g 100 -c "Mr. NFSv4" -d /export/zfs/nfsv4 nfsv4
# chown nfsv4:100 /export/zfs/nfsv4
# ls -al /export/zfs
total 10
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
drwxr-xr-x 4 root   sys    512 Mar 20 00:31 ..
dr-xr-xr-x 3 root   root     3 Mar 20 00:40 .zfs
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 nfsv2
drwxr-xr-x 2 nfsv3  protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   2 Mar 20 00:33 nfsv4
This is on wont, an Opteron box, and I'm checking things out on sandman, an Ultra 5 (sparc). I haven't created users on it yet. What do I see:
# cd /net/wont/export/zfs/
# ls -la
total 6
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:00 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv2
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv3
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv4
So the bug is obviously that the owner and group are not being reported correctly. It could be ZFS or NFSv4. I add a user account to see whether it is an NFSv4 ID mapping problem:
# useradd -m -u 3530 -m -g 100 -c "Mr. NFSv4" -d /export/home/nfsv4 nfsv4
64 blocks
# pwd
/net/wont/export/zfs
# ls -la
total 6
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:00 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv2
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv3
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv4
Nope, no luck. How about some content?
# su - nfsv4
Sun Microsystems Inc. SunOS 5.11 snv_35 October 2007
$ ls -la
total 4
drwxr-xr-x 2 nfsv4  protos   2 Mar 20 00:33 .
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:04 .zfs
$ touch it
$ ls -la
total 5
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 .
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:04 .zfs
-rw-r--r-- 1 nfsv4  protos   0 Mar 20 01:04 it
And then if I check on the client:
> cd nfsv4
> ls -la
total 5
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 .
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:06 .zfs
-rw-r--r-- 1 nfsv4  protos   0 Mar 20 01:04 it
> ls -la
total 7
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:07 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv2
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
And now the correct owner appears... Was it some name caching issue? Does it repeat?
We can see the NFSv4 ID mapping taking place:
> cd nfsv3
> ls -la
total 4
drwxr-xr-x 2 nobody protos   2 Mar 20 00:33 .
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:08 .zfs
> cd ..
> ls -la
total 8
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:08 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 00:56 nfsv2
drwxr-xr-x 2 nobody protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
So, before I cd into a directory, the client should... Doh!
The problem here is that before I cd into the directory, the mount has not taken place. This is in /net and thus we are using the automounter. You could argue that since we are presenting a list of exported filesystems to the user, we should have already gone down into them and gathered some information.
What happens if we have 1300 or 25,000 ZFS user accounts here? And we have 1k or 25k clients trying to connect?
The answer is what we call a mount storm and it kills server performance.
I've seen this really kill NAS boxes in customers' data centers (Linux clients with the AMD automounter). I'm glad the Solaris automounter is not getting the data ahead of time.
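To put rough numbers on the mount storm, here is a back-of-the-envelope sketch in Python. It is only arithmetic on the figures above, not a measurement. The function names are made up for illustration, and "one filesystem touched per client" is an assumption about how home directories typically get used.

```python
def mounts_eager(clients: int, filesystems: int) -> int:
    """Every client mounts every exported filesystem just to list the parent."""
    return clients * filesystems


def mounts_lazy(clients: int, touched_per_client: int) -> int:
    """Each client mounts only the filesystems its users actually enter."""
    return clients * touched_per_client


for clients, filesystems in [(1_000, 1_300), (25_000, 25_000)]:
    eager = mounts_eager(clients, filesystems)
    # Assumption: each client only ever enters one home directory.
    lazy = mounts_lazy(clients, touched_per_client=1)
    print(f"{clients:,} clients x {filesystems:,} filesystems: "
          f"eager={eager:,} mounts, lazy={lazy:,} mounts")
```

Even at the small end, eager gathering is over a million mount requests against the server versus a thousand for the lazy approach.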
Now that we know what is going on, let's have some fun. First, we let the automounts time out:
# ls -la
total 6
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:27 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv2
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv3
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv4
Let's confirm our hypothesis:
# cd nfsv4
# ls -la
total 5
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 .
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:27 .zfs
-rw-r--r-- 1 nfsv4  protos   0 Mar 20 01:04 it
# cd ..
# ls -la
total 7
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:27 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv2
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
And let's try to nuke the stuff:
# chown nfsv4 nfsv3
chown: nfsv3: Not owner
# ls -la
total 8
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:27 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv2
drwxr-xr-x 2 nobody protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
# useradd -m -u 1094 -m -g 100 -c "Mr. NFSv2" -d /export/home/nfsv2 nfsv2
64 blocks
# ls -la
total 8
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:30 .zfs
dr-xr-xr-x 1 root   root     1 Mar 20 01:22 nfsv2
drwxr-xr-x 2 nobody protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
# chown nfsv4 nfsv2
chown: nfsv2: Not owner
# ls -la
total 9
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:30 .zfs
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 nfsv2
drwxr-xr-x 2 nobody protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
The automounter is lazy: it only goes to the server when it has to service a user request.
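The lazy behaviour seen in the listings above can be modelled in a few lines of Python. This is a toy sketch, not how the Solaris automounter is implemented. The class and field names are invented for illustration, and the placeholder attributes are simply copied from what `ls -la` showed before any `cd`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attrs:
    mode: str
    owner: str
    group: str


# What the client shows for an entry it has not mounted yet (taken from the listings).
PLACEHOLDER = Attrs("dr-xr-xr-x", "root", "root")


@dataclass
class LazyAutomount:
    """Toy model: `server` stands in for the real attributes held on the NFS server."""
    server: dict
    mounted: set = field(default_factory=set)
    mount_requests: int = 0

    def cd(self, name: str) -> None:
        # Entering a directory is what triggers the mount, once per filesystem.
        if name not in self.mounted:
            self.mount_requests += 1
            self.mounted.add(name)

    def ls(self) -> dict:
        # Listing the parent never contacts the server for unmounted entries.
        return {
            name: self.server[name] if name in self.mounted else PLACEHOLDER
            for name in self.server
        }


server = {
    "nfsv2": Attrs("drwxr-xr-x", "nfsv2", "protos"),
    "nfsv3": Attrs("drwxr-xr-x", "nfsv3", "protos"),
    "nfsv4": Attrs("drwxr-xr-x", "nfsv4", "protos"),
}
am = LazyAutomount(server)
before = am.ls()   # every entry is a root/root placeholder
am.cd("nfsv4")     # first access mounts nfsv4 only
after = am.ls()    # nfsv4 shows its real owner; the others are unchanged
print(before["nfsv4"].owner, "->", after["nfsv4"].owner, "| mounts:", am.mount_requests)
```

The point of the model is that `ls` on the parent costs the server nothing, while each `cd` costs exactly one mount.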
Evidently the name cache clears pretty quickly as well:
# useradd -m -u 1813 -m -g 100 -c "Mr. NFSv3" -d /export/home/nfsv3 nfsv3
64 blocks
# ls -la
total 9
drwxr-xr-x 5 root   sys      5 Mar 20 00:33 .
dr-xr-xr-x 2 root   root     2 Mar 20 00:56 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:32 .zfs
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 nfsv2
drwxr-xr-x 2 nfsv3  protos   2 Mar 20 00:33 nfsv3
drwxr-xr-x 2 nfsv4  protos   3 Mar 20 01:04 nfsv4
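The switch from nobody to nfsv3 fits how NFSv4 identifies owners: as name@domain strings that the client resolves against its own accounts, rather than as raw uids. Here is a minimal Python sketch of that lookup. The function name, the `example.com` domain, and the dictionary of accounts are all hypothetical, and the real mapping service does more than this (caching, for one).

```python
NOBODY = "nobody"


def map_owner(wire_owner: str, local_accounts: dict, local_domain: str) -> str:
    """Resolve an NFSv4 'name@domain' owner string to a local account name."""
    name, _, domain = wire_owner.partition("@")
    if domain != local_domain or name not in local_accounts:
        return NOBODY  # owners the client cannot resolve are shown as nobody
    return name


domain = "example.com"  # hypothetical NFSv4 domain shared by client and server
# Client accounts just before nfsv3 is added (name -> uid).
accounts = {"root": 0, "nfsv2": 1094, "nfsv4": 3530}

first = map_owner("nfsv3@example.com", accounts, domain)
accounts["nfsv3"] = 1813  # the useradd from the transcript
second = map_owner("nfsv3@example.com", accounts, domain)
print(first, "->", second)
```

Nothing changes on the server between the two lookups; only the client's view of who nfsv3 is.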
Finally, why does it work correctly in the following sequence?
# mkdir -p /nfsv4/wont/zfs/nfsv2
# ls -la /nfsv4/wont/zfs/nfsv2
total 4
drwxr-xr-x 2 root   root   512 Mar 20 01:34 .
drwxr-xr-x 3 root   root   512 Mar 20 01:34 ..
# mount wont:/export/zfs/nfsv2 /nfsv4/wont/zfs/nfsv2
# ls -la /nfsv4/wont/zfs/nfsv2
total 4
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 .
drwxr-xr-x 3 root   root   512 Mar 20 01:34 ..
dr-xr-xr-x 3 root   root     3 Mar 20 01:34 .zfs
# mkdir -p /nfsv4/wont/zfs/nfsv3
# ls -la /nfsv4/wont/zfs
total 8
drwxr-xr-x 4 root   root   512 Mar 20 01:35 .
drwxr-xr-x 3 root   root   512 Mar 20 01:34 ..
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 nfsv2
drwxr-xr-x 2 root   root   512 Mar 20 01:35 nfsv3
# mount wont:/export/zfs/nfsv3 /nfsv4/wont/zfs/nfsv3
# ls -la /nfsv4/wont/zfs
total 8
drwxr-xr-x 4 root   root   512 Mar 20 01:35 .
drwxr-xr-x 3 root   root   512 Mar 20 01:34 ..
drwxr-xr-x 2 nfsv2  protos   2 Mar 20 00:33 nfsv2
drwxr-xr-x 2 nfsv3  protos   2 Mar 20 00:33 nfsv3
Because we explicitly do the mount request, which causes directory information to be gathered.