Monday March 20, 2006
zfs, NFSv4, and the automounter

Okay, I got hot and bothered - I thought I found a bug in the ZFS code. Work with me as I show my reasoning and see if you say "Doh!" before I do...

I create some zfs filesystems:

# zpool create zoo c0d1s0 c0d1s1
# zfs create zoo/home
# zfs set mountpoint=/export/zfs zoo/home
# zfs set sharenfs=on zoo/home
# zfs set compression=on zoo/home
# zfs set quota=10G zoo/home
# zfs create zoo/home/nfsv2
# zfs create zoo/home/nfsv3
# zfs create zoo/home/nfsv4
# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zoo                   3.21G   131G  99.5K  /zoo
zoo/home               395K  10.0G  99.5K  /export/zfs
zoo/home/nfsv2        98.5K  10.0G  98.5K  /export/zfs/nfsv2
zoo/home/nfsv3        98.5K  10.0G  98.5K  /export/zfs/nfsv3
zoo/home/nfsv4        98.5K  10.0G  98.5K  /export/zfs/nfsv4
zoo/x86               3.21G   131G  3.21G  /zoo/x86

I then create some user accounts to go along with the new filesystems:

# useradd -m -u 1094 -g 100 -c "Mr. NFSv2" -d /export/zfs/nfsv2 nfsv2
# chown nfsv2:100 /export/zfs/nfsv2
# useradd -m -u 1813 -g 100 -c "Mr. NFSv3" -d /export/zfs/nfsv3 nfsv3
# chown nfsv3:100 /export/zfs/nfsv3
# useradd -m -u 3530 -g 100 -c "Mr. NFSv4" -d /export/zfs/nfsv4 nfsv4
# chown nfsv4:100 /export/zfs/nfsv4
# ls -al /export/zfs
total 10
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
drwxr-xr-x   4 root     sys          512 Mar 20 00:31 ..
dr-xr-xr-x   3 root     root           3 Mar 20 00:40 .zfs
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 nfsv2
drwxr-xr-x   2 nfsv3    protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         2 Mar 20 00:33 nfsv4

This is on wont, an Opteron box, and I'm checking things out on sandman, an Ultra 5 (SPARC). I haven't created users on sandman yet. What do I see?

# cd /net/wont/export/zfs/
# ls -la
total 6
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:00 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv2
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv3
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv4

So the bug is obviously that the owner and group are not being reported correctly. It could be ZFS or NFSv4. I add a user account to see whether it is an NFSv4 ID mapping problem:

#  useradd -m -u 3530 -m -g 100 -c "Mr. NFSv4" -d /export/home/nfsv4 nfsv4
64 blocks
# pwd
/net/wont/export/zfs
# ls -la
total 6
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:00 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv2
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv3
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv4

Nope, no luck. How about some content?

# su - nfsv4
Sun Microsystems Inc.   SunOS 5.11      snv_35  October 2007
$ ls -la
total 4
drwxr-xr-x   2 nfsv4    protos         2 Mar 20 00:33 .
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:04 .zfs
$ touch it
$ ls -la
total 5
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 .
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:04 .zfs
-rw-r--r--   1 nfsv4    protos         0 Mar 20 01:04 it

And then if I check on the client:

> cd nfsv4
> ls -la
total 5
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 .
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:06 .zfs
-rw-r--r--   1 nfsv4    protos         0 Mar 20 01:04 it
> cd ..
> ls -la
total 7
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:07 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv2
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4

And now the correct owner appears... Was it some name caching issue? Does it repeat?

We can see the NFSv4 ID mapping taking place:

> cd nfsv3
> ls -la
total 4
drwxr-xr-x   2 nobody   protos         2 Mar 20 00:33 .
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:08 .zfs
> cd ..
> ls -la
total 8
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:08 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 00:56 nfsv2
drwxr-xr-x   2 nobody   protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4

So, before I cd into a directory, the client should... Doh!

The problem here is that before I cd into the directory, the mount has not taken place. This is in /net and thus we are using the automounter. You could argue that since we are presenting a list of exported filesystems to the user, we should have already gone down into them and gathered some information.

What happens if we have 1300 or 25,000 ZFS user accounts here? And we have 1k or 25k clients trying to connect?

The answer is what we call a mount storm and it kills server performance.

I've seen this really kill NAS boxes in customers' data centers (Linux clients with the AMD automounter). I'm glad the Solaris automounter is not getting the data ahead of time.
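To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the worst case, where every client eagerly mounts every exported filesystem and each (client, filesystem) pair costs one mount request. The mount_requests helper is made up for illustration.

```shell
#!/bin/sh
# Worst-case mount count if every client eagerly mounted every exported
# filesystem. Assumption: one mount request per (client, filesystem) pair.
mount_requests() {
    # $1 = number of filesystems, $2 = number of clients
    echo $(( $1 * $2 ))
}

mount_requests 1300 1000      # the smaller case from the text
mount_requests 25000 25000    # the larger case from the text
```

Even the smaller case is over a million requests, all for directories most users will never enter.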

Now that we know what is going on, let's have some fun. First, we let the automounts time out:

# ls -la
total 6
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:27 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv2
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv3
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv4

Let's confirm our hypothesis:

# cd nfsv4
# ls -la
total 5
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 .
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:27 .zfs
-rw-r--r--   1 nfsv4    protos         0 Mar 20 01:04 it
# cd ..
# ls -la
total 7
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:27 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv2
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4

And let's try to nuke the stuff:

# chown nfsv4 nfsv3
chown: nfsv3: Not owner
# ls -la
total 8
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:27 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv2
drwxr-xr-x   2 nobody   protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4
# useradd -m -u 1094  -m -g 100 -c "Mr. NFSv2" -d /export/home/nfsv2 nfsv2
64 blocks
# ls -la
total 8
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:30 .zfs
dr-xr-xr-x   1 root     root           1 Mar 20 01:22 nfsv2
drwxr-xr-x   2 nobody   protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4
# chown nfsv4 nfsv2
chown: nfsv2: Not owner
# ls -la
total 9
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:30 .zfs
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 nfsv2
drwxr-xr-x   2 nobody   protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4

The automounter is lazy: it only goes to the server when it has to service a user request.
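The lazy behaviour can be imitated with a toy model. This is only a sketch of the idea, not how the automounter is implemented: two temporary directories stand in for the server and for /net on the client, and the list_parent and enter helpers are hypothetical names.

```shell
#!/bin/sh
# Toy model only: it imitates the lazy behaviour seen under /net and is
# not how the automounter works internally. All names are hypothetical.
server=$(mktemp -d)   # stands in for wont:/export/zfs
client=$(mktemp -d)   # stands in for /net/wont/export/zfs

for fs in nfsv2 nfsv3 nfsv4; do
    mkdir "$server/$fs" "$client/$fs"        # client side starts as an empty stub
    echo "owner=$fs" > "$server/$fs/attrs"   # real attributes live on the "server"
done

# Listing the parent never touches the "server": entries that have not
# been entered yet show placeholder attributes, like the root-owned stubs.
list_parent() {
    for fs in nfsv2 nfsv3 nfsv4; do
        if [ -f "$client/$fs/attrs" ]; then
            echo "$fs $(cat "$client/$fs/attrs")"
        else
            echo "$fs owner=root (placeholder)"
        fi
    done
}

# Entering a directory is what triggers the "mount" and fetches attributes.
enter() {
    [ -f "$client/$1/attrs" ] || cp "$server/$1/attrs" "$client/$1/attrs"
}

list_parent     # all three entries are placeholders
enter nfsv4
list_parent     # only nfsv4 now shows its real owner
```

The point of the design is that the cost of listing the parent stays constant no matter how many filesystems sit underneath it.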

Evidently the name cache clears pretty quickly as well:

# useradd -m -u 1813 -m -g 100 -c "Mr. NFSv3" -d /export/home/nfsv3 nfsv3
64 blocks
# ls -la
total 9
drwxr-xr-x   5 root     sys            5 Mar 20 00:33 .
dr-xr-xr-x   2 root     root           2 Mar 20 00:56 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:32 .zfs
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 nfsv2
drwxr-xr-x   2 nfsv3    protos         2 Mar 20 00:33 nfsv3
drwxr-xr-x   2 nfsv4    protos         3 Mar 20 01:04 nfsv4
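
The nobody-to-nfsv3 flip above fits how NFSv4 handles owners: they travel over the wire as name strings of the form user@domain, and the client maps them onto local accounts. Here is a toy sketch of that lookup, assuming unknown names fall back to nobody. The passwd-style table, the example.com domain, and the map_owner helper are all made up for illustration; the real work is done by the client's ID mapping service.

```shell
#!/bin/sh
# Toy model of name-based owner mapping. The table, the domain, and the
# map_owner helper are hypothetical, for illustration only.
passwd_file=$(mktemp)
printf '%s\n' 'root:x:0:0::/:/sbin/sh' \
              'nobody:x:60001:60001::/:' \
              'nfsv4:x:3530:100:Mr. NFSv4:/export/home/nfsv4:/bin/sh' > "$passwd_file"

# Map an owner string of the form user@domain to a local account name.
map_owner() {
    name=${1%%@*}       # strip the @domain part
    if awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$passwd_file"; then
        echo "$name"
    else
        echo nobody     # assumption: unknown names fall back to nobody
    fi
}

map_owner nfsv4@example.com    # known locally
map_owner nfsv3@example.com    # account not created on the client yet
echo 'nfsv3:x:1813:100:Mr. NFSv3:/export/home/nfsv3:/bin/sh' >> "$passwd_file"
map_owner nfsv3@example.com    # resolves once the account exists
```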

Finally, why does it work correctly in the following sequence?

# mkdir -p /nfsv4/wont/zfs/nfsv2
# ls -la /nfsv4/wont/zfs/nfsv2
total 4
drwxr-xr-x   2 root     root         512 Mar 20 01:34 .
drwxr-xr-x   3 root     root         512 Mar 20 01:34 ..
# mount wont:/export/zfs/nfsv2 /nfsv4/wont/zfs/nfsv2
# ls -la /nfsv4/wont/zfs/nfsv2
total 4
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 .
drwxr-xr-x   3 root     root         512 Mar 20 01:34 ..
dr-xr-xr-x   3 root     root           3 Mar 20 01:34 .zfs
# mkdir -p /nfsv4/wont/zfs/nfsv3
# ls -la /nfsv4/wont/zfs
total 8
drwxr-xr-x   4 root     root         512 Mar 20 01:35 .
drwxr-xr-x   3 root     root         512 Mar 20 01:34 ..
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 nfsv2
drwxr-xr-x   2 root     root         512 Mar 20 01:35 nfsv3
# mount wont:/export/zfs/nfsv3 /nfsv4/wont/zfs/nfsv3
# ls -la /nfsv4/wont/zfs
total 8
drwxr-xr-x   4 root     root         512 Mar 20 01:35 .
drwxr-xr-x   3 root     root         512 Mar 20 01:34 ..
drwxr-xr-x   2 nfsv2    protos         2 Mar 20 00:33 nfsv2
drwxr-xr-x   2 nfsv3    protos         2 Mar 20 00:33 nfsv3

Because we explicitly do the mount request, which causes directory information to be gathered.


Originally posted on Kool Aid Served Daily
Copyright (C) 2006, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/zfs_nfsv4_and_the_automounter