I need to be able to see if a mount request generated an error in the mountd daemon. I have a custom kernel that has changed the mount() function call to return the error code. So, now I'm using a very simple DTrace script to catch the error codes:
#!/usr/sbin/dtrace -s
pid$1::mount:return
{
    printf("rc = %d", args[1]);
}
pid$1::audit_mountd_mount:return
{
    printf("rc = %d", args[1]);
}
The issue is what happens when I try to run it:
[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: failed to compile script ./mountd_res.d: line 18: args[ ] may not be referenced because probe description pid100824::mount:return matches an unstable set of probes
What I think this means is that I've got multiple definitions of mount() and not all of them return something. Okay, I can narrow down the probe to just the one I want:
[root@pnfs-4-11 ~]> dtrace -l -f mount
   ID    PROVIDER     MODULE  FUNCTION  NAME
 2378  lx-syscall             mount     entry
 2379  lx-syscall             mount     return
13708         fbt    genunix  mount     entry
13709         fbt    genunix  mount     return
31190     syscall             mount     entry
31191     syscall             mount     return
65944   pid100824     mountd  mount     entry
65945   pid100824  libc.so.1  mount     entry
65946   pid100824     mountd  mount     return
65947   pid100824  libc.so.1  mount     return
And if I adjust my script:
pid$1:mountd:mount:return
{
    printf("rc = %d", args[1]);
}
We see I've fixed this issue!
[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: failed to compile script ./mountd_res.d: line 18: index 1 is out of range for pid100824:mountd:mount:return args[ ]
Okay, I got my syntax wrong for the return code. The pid provider does not supply a typed args[] array here; on a return probe the return value is in arg1:
pid$1:mountd:mount:return
{
    printf("rc = %d", arg1);
}
And now I see the correct output:
[root@pnfs-4-11 ~]> ./mountd_res.d `pgrep -x mountd`
dtrace: script './mountd_res.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  65946                     mount:return rc = 0
  0  65240        audit_mountd_mount:return rc = 1
  0  65946                     mount:return rc = 0
  0  65240        audit_mountd_mount:return rc = 1
We've had some interesting questions on nfs-discuss lately about mountd, and I thought I'd use a userland script I wrote, snoop, and DTrace to show some interesting properties of mountd.
My perl script will send UDP mount requests to a server and spoof the client IP. I want to control what I send and I'll sometimes spoof a request from a non-existent machine.
BTW - you will notice I don't talk much about what the share is. Unless I state otherwise, it is:
[root@silver ~]> share | grep tdh
-@tank/home /export/zfs/tdh rw ""
We can try a host without a name mapping:
[tdh@slayer ~/src]> host 10.10.20.41
Host 41.20.10.10.in-addr.arpa. not found: 3(NXDOMAIN)
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 10.10.20.42
Note that since we don't have a client to receive the reply, we'll snoop it:
30 0.01297 10.10.20.42 -> silver MOUNT3 C Mount /export/zfs/tdh
31 0.01712 silver -> thens.internal.excfb.com DNS C 42.20.10.10.in-addr.arpa. Internet PTR ?
32 0.05912 thens.internal.excfb.com -> silver DNS R Error: 3(Name Error)
33 0.00522 silver -> thens.internal.excfb.com DNS C 42.20.10.10.in-addr.arpa. Internet PTR ?
34 0.00056 thens.internal.excfb.com -> silver DNS R Error: 3(Name Error)
35 0.00777 silver -> 10.10.20.42 MOUNT3 R Mount Permission denied
BTW - right off the bat we can see that mountd tries to resolve the client IP. What happens if there is a reverse entry?
[tdh@slayer ~/src]> host 192.168.4.14
14.4.168.192.in-addr.arpa domain name pointer blast-4-14.internal.excfb.com.
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.14
Well?
37 0.03198 blast-4-14.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
38 0.00089 silver -> thens.internal.excfb.com DNS C 14.4.168.192.in-addr.arpa. Internet PTR ?
39 0.00051 thens.internal.excfb.com -> silver DNS R 14.4.168.192.in-addr.arpa. Internet PTR blast-4-14.internal.excfb.com.
40 0.00290 silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix
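The PTR question in both traces is nothing more than the client's address with its octets reversed under in-addr.arpa. As a minimal sketch (Python is my choice for illustration here, mountd itself is C, and ptr_name is a made-up helper), the standard ipaddress module builds the same name, minus the trailing dot that snoop prints:

```python
import ipaddress

def ptr_name(addr: str) -> str:
    """Return the reverse-lookup (PTR) name for an IP address."""
    return ipaddress.ip_address(addr).reverse_pointer

# The two spoofed source addresses used above.
print(ptr_name("10.10.20.42"))
print(ptr_name("192.168.4.14"))
```

This only shows what is being asked of DNS; it says nothing about how mountd caches the answer.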
So a client has to have a reverse mapping from IP to host name before we allow a mount to succeed. And we can see that in the source code for usr/src/cmd/fs.d/nfs/mountd/mountd.c:
getclientsnames(transp, &nb, &clnames);
if (clnames == NULL || nb == NULL) {
    /*
     * We failed to get a name for the client, even 'anon',
     * probably because we ran out of memory. In this situation
     * it doesn't make sense to allow the mount to succeed.
     */
    error = EACCES;
    goto reply;
}
What if I change the share to be:
[root@silver ~]> zfs set sharenfs=rw=blast4-14:blast4-15 tank/home/tdh
Will it work or fail?
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.14
[tdh@slayer ~/src]> sudo ./udp_raw.pl src_addr 192.168.4.15
Note that I wanted to show something warm in the cache and something cold:
24 0.03500 blast-4-14.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
25 0.00129 silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount Permission denied
41 0.03706 blast-4-15.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh (retransmit)
42 0.01419 silver -> thens.internal.excfb.com DNS C 15.4.168.192.in-addr.arpa. Internet PTR ?
43 0.00048 thens.internal.excfb.com -> silver DNS R 15.4.168.192.in-addr.arpa. Internet PTR blast-4-15.internal.excfb.com.
44 0.00089 silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
Two points here: the DNS cache is not flushed when a share reloads, but the NFS auth cache must be. If it were not flushed, we would have gotten permission granted.
Okay, I can show you what is going on by this example:
[root@silver ~]> zfs set sharenfs=rw=blast4-14.internal.excfb.com:blast4-15 tank/home/tdh
And:
24 0.03769 blast-4-14.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
25 0.00122 silver -> blast-4-14.internal.excfb.com MOUNT3 R Mount Permission denied
41 0.03115 blast-4-15.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh (retransmit)
42 0.00092 silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
Okay, we should have gotten permission granted for the first!
[root@silver ~]> zfs set sharenfs=rw=blast4-17.internal.excfb.com:blast4-15 tank/home/tdh
And that fails as well:
20 0.03105 blast-4-17.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
21 0.00086 silver -> thens.internal.excfb.com DNS C 17.4.168.192.in-addr.arpa. Internet PTR ?
22 0.00056 thens.internal.excfb.com -> silver DNS R 17.4.168.192.in-addr.arpa. Internet PTR blast-4-17.internal.excfb.com.
23 0.00163 silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount Permission denied
Ohhh! Smug mode off
Okay, I'm interested in what the function in_access_list will tell me. I've got a small DTrace script, which I developed iteratively, that will let me know what is going on here:
#!/usr/sbin/dtrace -Fs
/*
 * Thanks to Peter Harvey;
 * http://blogs.sun.com/peteh/entry/dereferencing_user_space_pointers_in
 *
 * # ./mountd.d `pgrep -x mountd`
 *
 */
dtrace:::BEGIN
{
    printf("Sampling... Hit Ctrl-C to end.\n");
}
pid$1::check_client_new:return
{
    printf("Access permission is %d", arg1);
}
pid$1::in_access_list:entry
{
    self->trace_me = 1;
    printf("Access list is %s", copyinstr(arg2));
}
pid$1::in_access_list:return
{
    self->trace_me = 0;
    printf("Access permission is %d", arg1);
}
pid$1::strcasecmp:entry
/self->trace_me == 1/
{
    printf("host vs list entry: |%s|, |%s|\n", copyinstr(arg0),
        copyinstr(arg1));
}
pid$1::strcasecmp:entry
/self->trace_me == 1/
{
    printf("Comparison is %d", arg1);
}
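Note that I need to trace strcasecmp because I can't iterate over the array in D. Be aware that the second strcasecmp clause also fires on entry, so the "Comparison" value below is arg1 (the address of the second string), not the result of the comparison; a return probe would be needed for that. I get as a result: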
[root@silver ~]> ./mount_trace.sh
+ pgrep -x mountd
+ /root/mountd.d 634
dtrace: script '/root/mountd.d' matched 6 probes
CPU FUNCTION
  0 | :BEGIN Sampling... Hit Ctrl-C to end.
  0 -> in_access_list Access list is blast4-17.internal.excfb.com:blast4-15
  0   -> strcasecmp host vs list entry: |blast4-17.internal.excfb.com|, |blast-4-17.internal.excfb.com|
  0    | strcasecmp:entry Comparison is 135156944
  0    | strcasecmp:entry host vs list entry: |blast4-15|, |blast-4-17.internal.excfb.com|
  0    | strcasecmp:entry Comparison is 135156944
  0 <- in_access_list Access permission is 0
^C
D'oh! My share is wrong, wrong I say!
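What the trace shows can be sketched in a few lines of Python. This is only a stand-in for the hostname comparison I saw above, not mountd's real in_access_list (which does more than this); the function body is my assumption based on the strcasecmp calls in the trace:

```python
def in_access_list(client_name: str, access_list: str) -> bool:
    """Case-insensitive, whole-string match against each
    colon-separated entry, as the strcasecmp calls suggest.
    Assumption: no domain is appended and no prefix matching is done."""
    for entry in access_list.split(":"):
        if entry.lower() == client_name.lower():
            return True
    return False

share = "blast4-17.internal.excfb.com:blast4-15"   # my typo: missing hyphen
client = "blast-4-17.internal.excfb.com"           # what the PTR lookup returned
print(in_access_list(client, share))               # no entry matches
```

The point is that the name the server resolves for the client has to match a list entry character for character, apart from case.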
If instead I try:
[root@silver ~]> zfs set sharenfs=rw=blast-4-17.internal.excfb.com:blast-4-15 tank/home/tdh
We expect blast-4-15 to fail and blast-4-17 to succeed:
28 0.03211 blast-4-15.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
29 0.00150 silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
49 0.03460 blast-4-17.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh (retransmit)
50 0.00087 silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix
This shows again that you need an exact match and that we don't append the domain name to the end of the hostname. What would happen if I added this to the end of the server's /etc/hosts?
192.168.4.14 blast-4-15
I expect it to work. Does it?
[root@silver ~]> grep MOUNT xxx
24 0.03176 blast-4-15.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
27 0.00124 silver -> blast-4-15.internal.excfb.com MOUNT3 R Mount Permission denied
49 0.00160 blast-4-17.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh (retransmit)
52 0.00098 silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix
Hmm, I bet the name entry is cached, which I can show by trying one which is not cached:
[root@silver ~]> zfs set sharenfs=rw=blast-4-17.internal.excfb.com:blast-4-31 tank/home/tdh
And:
[root@silver ~]> grep MOUNT xxx
24 0.03633 blast-4-17.internal.excfb.com -> silver MOUNT3 C Mount /export/zfs/tdh
27 0.00102 silver -> blast-4-17.internal.excfb.com MOUNT3 R Mount OK FH=01CC Auth=unix
45 0.02997 blast-4-31 -> silver MOUNT3 C Mount /export/zfs/tdh (retransmit)
46 0.00189 silver -> blast-4-31 MOUNT3 R Mount OK FH=01CC Auth=unix
And we can see the lack of caching from two clues: the name output in snoop, i.e., "blast-4-31", and the lack of DNS activity between the two packets.
Some of the behaviour shocked me, and I made some stupid mistakes that were hard to track down afterwards. As an exercise in triage, it was great. I now have the beginnings of a DTrace debugging tool that I can point people at if they need some help. I'm very, very happy about that part!
I'm trying to track down whether the client address is ever being set in an NFS request. I've checked with build 117, 112, 109, 85, and now I'm trying 79a. I've got a VMware image running on a laptop. But for some reason my probe isn't loading:
# ./req.d
dtrace: failed to compile script ./req.d: line 3: probe description ::rfs_dispatch:entry does not match any probes
# dtrace -f rfs_dispatch
dtrace: invalid probe specifier rfs_dispatch: probe description ::rfs_dispatch: does not match any probes
A clue can be found here:
# share
#
The clue is that with no shares defined, the 'nfssrv' module is not loaded. If we create a share, we see:
# share -F nfs -o rw,anon=0 /export/home
# dtrace -f rfs_dispatch
dtrace: description 'rfs_dispatch' matched 2 probes
^C
# ./req.d
dtrace: script './req.d' matched 1 probe
We have success!
We had a recent integration that exposed some nasty interactions between an OpenSolaris client and a Linux server. There are bugs on both sides, but what I want to do here is document the behavior you'll see and what you can do to fix it.
The first problem was that the fix for 6790413 AUTH_NONE implementation in kernel RPC caused a nasty interaction with a Linux server: the client now tried the first security flavor in the array returned in the server's reply to the MOUNT request. The issue can be seen here:
[thud@adept nfs]> more /etc/exports
/ *(sync)
/home 192.168.1.0/255.255.255.0(rw,async,no_subtree_check,insecure,no_root_squash)
And a mount request from an OpenSolaris client:
[thud@witch ~]> sudo mount -o vers=3 wont:/home /mnt
[thud@witch ~]> cd /mnt
[thud@witch /mnt]> ls -la
total 35
drwxr-xr-x 3 root root 4096 Feb 25 2008 .
drwxr-xr-x 27 root root 30 Jul 17 00:34 ..
drwx------ 25 thud staff 4096 Mar 19 00:22 thud
[thud@witch /mnt]> cd thud
thud: Permission denied.
Why? Well, look at what the server sends back:
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 1 (Add mount entry)
MOUNT:Status = 0 (OK)
MOUNT:File handle = [DADF]
MOUNT: 01000700010005000000000053CF6DE4FF1C4572BB2950392EB6993C
MOUNT:Authentication flavor = none,unix,390003,390004,390005
MOUNT:
The OpenSolaris client selected AUTH_NONE, as it was first. If we try this again:
[thud@witch ~]> sudo umount /mnt
[thud@witch ~]> sudo mount -o vers=3,sec=sys wont:/home /mnt
[thud@witch ~]> cd /mnt/thud
We are happy.
Note that this case works for a Linux client because, if there is no command line option, the client will default to AUTH_SYS. It ignores the list from the server.
Well, we discussed whether we wanted to use the default security flavor as defined in nfssec.conf(4) or if we wanted to re-order the array on strongest flavor or if we wanted to do both (i.e., re-order only if the default was not present).
It turns out that you should honor the array's order as much as possible (See Section 2.7 of RFC2623). We've decided to use any option provided on the command line, then the default, and then the first entry in the array. I.e., if no command line option and no default, we consult the server's list. Also, if there is a command line option, it has to be present in the list or the mount fails. If on the other hand the default is not present, then we take the first entry in the list.
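The ordering we settled on can be written down as a small Python sketch. This is my paraphrase of the rules above, not the mount_nfs source; choose_flavor and its arguments are made-up names, the flavor strings are just labels, and treating an empty server list as a failure is my assumption:

```python
from typing import Optional, Sequence

def choose_flavor(server_flavors: Sequence[str],
                  requested: Optional[str] = None,
                  default: Optional[str] = None) -> str:
    """Pick a security flavor for an NFSv3 mount."""
    # 1. A command line option wins, but it must be in the server's list.
    if requested is not None:
        if requested not in server_flavors:
            raise PermissionError("security mode does not match the server")
        return requested
    # 2. Otherwise use the client default if the server offers it.
    if default is not None and default in server_flavors:
        return default
    # 3. Otherwise honor the server's order and take its first entry.
    if not server_flavors:
        raise PermissionError("server offered no security flavors")
    return server_flavors[0]

offered = ["none", "sys", "krb5", "krb5i", "krb5p"]
print(choose_flavor(offered, default="sys"))
print(choose_flavor(offered))
```

With the default in place, the list from the old Linux server no longer lands us on AUTH_NONE; without any default we still fall through to the first entry.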
You can track this fix in 6860784 mount_nfs needs to choose default auth first for NFSv3 mounts. If you need relief, for now specify 'sec=sys' on your mount command or add it to your automount maps.
In the meantime, I started a discussion with the Linux NFS developers about the issue (Security negotiation), and it turns out that they decided that returning AUTH_NONE as the first flavor was a bug. This was fixed in nfs-utils (commit 3c1bb23c0379864722e79d19f74c180edcf2c36e in version 1.1.3).
And sure enough, my stock Fedora Core 8 server has version 1.1.0. So I updated my server to Fedora Core 11 to see what would happen. I was actually surprised that, with version 1.1.5, the mount failed:
[root@witch ~]> mount -o vers=3 adept:/home /mnt
nfs mount: security mode does not match the server exporting adept:/home
It turns out that the Linux server is not returning any security flavors with the exact same exports as before!
MOUNT:----- NFS MOUNT -----
MOUNT:
MOUNT:Proc = 1 (Add mount entry)
MOUNT:Status = 0 (OK)
MOUNT:File handle = [DADF]
MOUNT: 01000700010005000000000053CF6DE4FF1C4572BB2950392EB6993C
MOUNT:Authentication flavor =
MOUNT:
Again, this works with a Linux client, and that is because they basically ignore the array of security flavors and try AUTH_SYS by default.
The bug (which I later verified has been seen by others; see Red Hat Bugzilla – Bug 467613 rpc.mountd does not announce any flavors) is that if no 'sec=' is mentioned in the export definition, then no security flavor is set. If we change the export to instead be:
/home 192.168.1.0/255.255.255.0(sec=sys,rw,async,no_subtree_check,insecure,no_root_squash)
Then we restore interoperability.
There is a lesson buried in here: don't just test against your own client/server. Both sides failed that lesson at different points. Also, we do cutting edge pNFS and NFSv4.1 interoperability testing all the time, but we don't with NFSv3. While as developers we may think that development work is over, we do make bug fixes to support customers and we need to be careful to reduce customer pain.
AUTH_NONE is one of the least understood security flavors you can use with NFS (see nfssec(5) for more details). When you share a resource, you can specify the security flavors with 'sec'. You can also specify an anonymous uid with 'anon'. I mention that because the two interact.
The way they interact is that any unauthenticated user id is mapped to the anonymous uid. The primary way to be unauthenticated is to be the root uid on the client and not be in the 'root' access list. As the default for 'anon' is the uid of the user nobody, this means that the client root typically has no special permissions on the server (and with 'anon=-1', it is denied access altogether). A server admin can grant clients root permissions by saying 'anon=0' in the share. As we will see, that can be very dangerous.
The secondary way to be unauthenticated requires the share to have 'sec=none' set. share_nfs(1M) states that if the client uses either AUTH_NONE or a security mode that is not in the share, then the NFS request is treated as unauthenticated.
Let's try some examples:
On the server:
[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone
[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone0
[root@pnfs-9-24 ~]> zfs create pnfs2/sysnone55
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none,anon=0 pnfs2/sysnone0
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none pnfs2/sysnone
[root@pnfs-9-24 ~]> zfs set sharenfs=sec=sys:none,anon=55 pnfs2/sysnone55
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone
Note that we have held off on doing the following:
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone0
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone55
And on the client:
[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone /mnt
[root@pnfs-9-23 ~]> ls -la /mnt
total 8
drwxrwxrwx 2 root root 2 Feb 12 16:20 .
drwxr-xr-x 35 root root 37 Feb 12 16:31 ..
[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la !$
ls -la /mnt/foo
-rw-r--r-- 1 nobody nobody 0 Feb 12 21:58 /mnt/foo
[root@pnfs-9-23 ~]> umount /mnt
Since there was no anon set, we get the default, which is nobody.
[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone0 /mnt
[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la /mnt/foo
-rw-r--r-- 1 root root 0 Feb 12 22:00 /mnt/foo
[root@pnfs-9-23 ~]> umount /mnt
Since 'anon=0', we are going to use uid 0. I'll point out the danger later.
[root@pnfs-9-23 ~]> mount -o vers=3,sec=none pnfs-9-24:/pnfs2/sysnone55 /mnt
[root@pnfs-9-23 ~]> touch /mnt/foo
touch: cannot create /mnt/foo: Permission denied
What happened here? Well, remember that we didn't set directory permissions, so it is most likely that root owns this 'directory':
[root@pnfs-9-24 ~]> ls -la /pnfs2/sysnone55
total 6
drwxr-xr-x 2 root root 2 Feb 12 22:15 .
drwxr-xr-x 12 root root 12 Feb 12 22:15 ..
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone0
[root@pnfs-9-24 ~]> chmod 777 /pnfs2/sysnone55
The prior example worked because 'anon=0' matched up perfectly. So now:
[root@pnfs-9-23 ~]> touch /mnt/foo
[root@pnfs-9-23 ~]> ls -la /mnt/foo
-rw-r--r-- 1 55 55 0 Feb 12 22:01 /mnt/foo
[root@pnfs-9-23 ~]> nfsstat -m /mnt
/mnt from pnfs-9-24:/pnfs2/sysnone55
 Flags: vers=3,proto=tcp,sec=none,hard,intr,link,symlink,acl,rsize=32768,wsize=32768,retrans=5,timeo=600
 Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
Let's see some non-root behavior here:
[thud@pnfs-9-23 ~]> touch /mnt/bar
[thud@pnfs-9-23 ~]> ls -la /mnt
total 10
drwxrwxrwx 2 root root 4 Feb 12 22:18 .
drwxr-xr-x 35 root root 37 Feb 12 16:31 ..
-rw-r--r-- 1 55 55 0 Feb 12 22:18 bar
-rw-r--r-- 1 55 55 0 Feb 12 22:01 foo
And if we go back to the prior case (pnfs2/sysnone0):
[thud@pnfs-9-23 ~]> touch /mnt/bar
[thud@pnfs-9-23 ~]> ls -la /mnt
total 10
drwxrwxrwx 2 root root 4 Feb 12 22:20 .
drwxr-xr-x 35 root root 37 Feb 12 16:31 ..
-rw-r--r-- 1 root staff 0 Feb 12 22:20 bar
-rw-r--r-- 1 root root 0 Feb 12 22:00 foo
So if we mix 'sec=none' and 'anon=0', it is easy enough to give every remote user root access on the server.
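To make the mapping concrete, here is a rough Python model of what the examples above show. It is only my reading of share_nfs(1M) and of the output, not the server code; effective_uid and its parameters are made-up names, and 60001 is the Solaris uid for nobody, which I am assuming as the anon default:

```python
from typing import Optional, Sequence

UID_NOBODY = 60001  # assumed default for 'anon' (the user nobody)

def effective_uid(flavor: str, uid: int, client: str,
                  sec: Sequence[str], anon: int = UID_NOBODY,
                  root_hosts: Sequence[str] = ()) -> Optional[int]:
    """Return the uid the server acts as, or None if access is denied."""
    unauthenticated = False
    if flavor == "none" or flavor not in sec:
        # Only a share with sec=none accepts such requests at all.
        if "none" not in sec:
            return None
        unauthenticated = True
    elif uid == 0 and client not in root_hosts:
        # Client root outside the root= list is also unauthenticated.
        unauthenticated = True
    if not unauthenticated:
        return uid
    return None if anon == -1 else anon

# A normal user over sec=none on the anon=0 share becomes root.
print(effective_uid("none", 1234, "pnfs-9-23", ["sys", "none"], anon=0))
```

The uid 1234 and the way the client name is compared are placeholders; the thing to notice is that with 'sec=none' the uid in the request does not matter at all.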
But we haven't examined the real power of 'sec=none' here:
[root@pnfs-9-23 ~]> mount -o vers=3,sec=krb5i pnfs-9-24:/pnfs2/sysnone0 /mnt
nfs mount: security mode does not match the server exporting pnfs-9-24:/pnfs2/sysnone0
[root@pnfs-9-23 ~]> mount -o vers=4,sec=krb5i pnfs-9-24:/pnfs2/sysnone0 /mnt
[root@pnfs-9-23 ~]> nfsstat -m /mnt
/mnt from pnfs-9-24:/pnfs2/sysnone0
 Flags: vers=4,proto=tcp,sec=krb5i,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
 Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
Okay, our NFS tester extraordinaire, Helen Chao, found a bug here. According to the man pages, you can argue either that the v3 mount should have succeeded or that it should have failed. On the one hand, you specified that you wanted the mount to only be krb5i. On the other hand, you told the share to map all unlisted modes to AUTH_NONE. The slight bug here is that we need to clearly document what our expectations are.
The next bug is not as minor - v3 and v4 should have the same behavior. We don't care which side of the fence we fall on about allowed or denied, we just want consistency.
The next bug is very subtle here - 'nfsstat -m' reports that the mount is via krb5i. But is it? (Is this even a bug?)
[root@pnfs-9-23 ~]> touch /mnt/gar
[root@pnfs-9-23 ~]> ls -la /mnt
total 11
drwxrwxrwx 2 root root 5 Feb 12 22:29 .
drwxr-xr-x 35 root root 37 Feb 12 16:31 ..
-rw-r--r-- 1 root staff 0 Feb 12 22:20 bar
-rw-r--r-- 1 root root 0 Feb 12 22:00 foo
-rw-r--r-- 1 root root 0 Feb 12 22:29 gar
Can't tell anything there. Can I as a normal user?
[thud@pnfs-9-23 ~]> touch /mnt/googoo
touch: cannot stat /mnt/googoo: Permission denied
[thud@pnfs-9-23 ~]> kinit
Password for thud@NFSV4.SUN.COM:
[thud@pnfs-9-23 ~]> touch /mnt/googoo
[thud@pnfs-9-23 ~]> ls -al /mnt
total 12
drwxrwxrwx 2 root root 6 Feb 12 22:32 .
drwxr-xr-x 35 root root 37 Feb 12 16:31 ..
-rw-r--r-- 1 root staff 0 Feb 12 22:20 bar
-rw-r--r-- 1 root root 0 Feb 12 22:00 foo
-rw-r--r-- 1 root root 0 Feb 12 22:29 gar
-rw-r--r-- 1 root root 0 Feb 12 22:32 googoo
Okay, so it isn't a bug at all - the client is correct here. I.e., the client is using kerberos to talk to the server. The share does not absolve the server from having to understand kerberos. We can clearly see that in that the user without a ticket is denied permission to create a file. And we also see that the uid is clearly mapped to the anon uid on the server.
Helen Chao, a colleague who had never really used Linux, asked me to help configure a kernel. I asked why and she said she needed to test NFSv4 over RDMA. It turns out that the stock 2.6.25 kernel with Fedora Core 9 already had the support in it. We followed the directions in nfs-rdma.txt and were not able to get it running.
Helen (a great test engineer) proceeded to investigate from there and couldn't get a simple loopback or NFS mount to succeed.
So I exported the root to all hosts and went to work debugging this issue. A 'rpcinfo -p' on the server showed the expected registered services. The same call from a client failed, but a ping worked:
[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
^C
[th199096@jhereg ~]> ping pnfs-9-30
pnfs-9-30 is alive
I thought that perhaps it was a firewall issue and disabled IPTABLES.
No luck and I knew the mount should succeed - I tried it with my home Core 8 box and an OpenSolaris server. It worked, but then again, that Linux box has been configured for ages. Long story short, I asked Chuck Lever for help.
His only suggestion was to turn off selinux or as he puts it:
Also disable selinux, just so your systems behave like normal Unix.
So I followed the directions I found here: How to Disable SELinux and now the mount works:
# mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: /mnt: mounted OK
#
Most of the help I found with Google on the RPC messages wasn't informative. Either the suggestion was to turn off IPTABLES or there was no reply.
AUTH_SYS is an insecure security mode, yet it is commonly used within companies. It can be used as the proverbial open lock on a door - the fact that the lock is there means do not enter. But I've seen people terminated for ignoring that lock.
With that in mind, I want to go over the simple security schemes employed within a company and show why they don't work. The punchline will be of course Kerberos. Speaking of myths, one is that you need NFSv4 in order to deploy Kerberos. You don't - common servers and clients easily speak Kerberos with NFSv3. And ignore NFSv2, please, please.
With an export (or share), the most lax security is typically the default:
[root@pnfs-9-26 ~]> zfs create rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
[root@pnfs-9-26 ~]> zfs set sharenfs=on rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp /export/home/secure rw ""
I.e., every machine in the world can mount pnfs-9-26:/export/home/secure. The reasons for this default are simple:
By default, root has access almost like any other user, but it is mapped to the user nobody. We can see this here if we grant wide open permissions on the export:
[root@pnfs-9-26 ~]> chmod 777 /export/home/secure
[root@pnfs-9-26 ~]> ls -la /export/home/secure
total 6
drwxrwxrwx 2 root root 2 Oct 5 11:39 .
drwxr-xr-x 5 th199096 staff 6 Oct 5 11:39 ..
We should be able to create a file as anyone from another machine:
[root@jhereg ~]> mount -o vers=3 pnfs-9-26:/export/home/secure /mnt
[root@jhereg ~]> touch /mnt/i_am_root
That worked:
[root@pnfs-9-26 secure]> ls -la
total 7
drwxrwxrwx 2 root root 3 Oct 5 11:51 .
drwxr-xr-x 5 th199096 staff 6 Oct 5 11:39 ..
-rw-r--r-- 1 nobody nobody 0 Oct 5 11:51 i_am_root
Notice that root has been mapped to nobody. What happens if we do it as a normal user:
[th199096@jhereg ~]> touch /mnt/i_am_jhereg
[th199096@jhereg ~]> touch /mnt/i_am_th199096
And we get the correct user:
[root@pnfs-9-26 secure]> ls -la
total 9
drwxrwxrwx 2 root root 5 Oct 5 11:54 .
drwxr-xr-x 5 th199096 staff 6 Oct 5 11:39 ..
-rw-r--r-- 1 th199096 staff 0 Oct 5 11:54 i_am_jhereg
-rw-r--r-- 1 nobody nobody 0 Oct 5 11:51 i_am_root
-rw-r--r-- 1 th199096 staff 0 Oct 5 11:54 i_am_th199096
Now what happens if we try to remove i_am_th199096 as root?
[root@jhereg ~]> rm /mnt/i_am_th199096
rm: /mnt/i_am_th199096: override protection 644 (yes/no)? y
We are allowed to do that, but is it a property of being root or the permissions? We can check this with a simple change of the share:
[root@pnfs-9-26 secure]> zfs set sharenfs=anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp /export/home/secure anon=-1 ""
See share_nfs(1M) for a description of anon. Notice I didn't specify whether rw is set or not. We can retry the delete:
[root@jhereg ~]> rm /mnt/i_am_jhereg
NFS3 getattr failed for pnfs-9-26: RPC: Authentication error; s1 = 13, s2 = 0
rm: /mnt/i_am_jhereg: Permission denied
If you want to make sure to deny root level access to a share, then you need to set anon=-1.
Conversely, if you want to enable root level access to a share, you can set anon=0:
[root@pnfs-9-26 secure]> zfs set sharenfs=anon=0 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp /export/home/secure anon=0 ""
I've recreated the two files in the background (which shows by the way that rw is the default). And when we test the deletion:
[root@jhereg ~]> rm /mnt/i_am_jhereg
[root@jhereg ~]>
No pesky question that implies I am not a god!
If I want to allow root access from one host but deny it from all others, I can use the root= access list:
[root@pnfs-9-26 secure]> zfs set sharenfs=root=pnfs-9-25.central.sun.com rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp /export/home/secure sec=sys,root=pnfs-9-25 ""
PS: The sec=sys is stating this is an AUTH_SYS share. Also, since I am using DNS for hosts in /etc/resolv.conf, I need a FQDN.
Try to remove:
[root@jhereg ~]> rm /mnt/i_am_th199096
rm: /mnt/i_am_th199096: override protection 644 (yes/no)? yes
Since it worked and we got a prompt, it has to be the permission set which is enabling this. If we tighten things down a bit more:
[root@pnfs-9-26 secure]> zfs set sharenfs=root=pnfs-9-25.central.sun.com,anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 secure]> share
-@rootpool/exp /export/home/secure anon=-1,sec=sys,root=pnfs-9-25 ""
We can see we are locked out:
[root@jhereg ~]> rm /mnt/i_am_root
rm: /mnt/i_am_root: Permission denied
And yet the other machine reigns supreme:
[root@pnfs-9-25 ~]> rm /mnt/i_am_root
[root@pnfs-9-25 ~]>
We'll revisit the effectiveness of using root= without anon= when we look at permissions.
So we can keep machines from getting access altogether by restricting the rw= access list:
[root@pnfs-9-26 ~]> zfs set sharenfs=rw=pnfs-9-25.central.sun.com rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp /export/home/secure sec=sys,rw=pnfs-9-25.central.sun.com ""
which yields on the two clients:
[root@jhereg ~]> ls -la /mnt
/mnt: Permission denied
and
[root@pnfs-9-25 ~]> ls -la /mnt
drwxrwxrwx 2 root root 6 Oct 5 19:33 .
drwxr-xr-x 36 root root 39 Oct 5 19:11 ..
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_here
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs-9-25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs_9_25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_th199096
Note that the client jhereg must be caching a file handle for the root of the export /export/home/secure on the server pnfs-9-26. If it were not, it would have to reissue the mount request, which would have to fail. Also note that it is not just the mountd requests which have to check access list permissions. If it were, then the above operations would always work. SunOS used to work this way and the Solaris NFS team made a change back in the 1995/96 time frame; see, for example, Brent Callaghan's presentation at the 1996 Connectathon: NFS Client Authentication. Briefly, the security reason for doing so is that if a rogue client somehow sniffed out a valid file handle, it would have complete access to all of the information on that share.
We can likewise grant read only access via the ro= access list.
All of rw, rw=, ro, and ro= interact as described by share_nfs(1M).
So access lists work on machines. If a machine is able to mount a share from a server, then all users on that client can access everything on that server. Right?
Wrong. The directory and file permissions determine user access. Contrast this with a model derived from a client only having one user logged in at a time. In that situation, it may not be the machine which is important but rather the user.
If I wanted to only grant access to a single user, then I would set the owner of the share to be that user and I would also set the permissions to be 700:
[root@pnfs-9-26 ~]> chown th199096:staff /export/home/secure/
[root@pnfs-9-26 ~]> chmod 700 /export/home/secure/
[root@pnfs-9-26 ~]> ls -la /export/home/secure/
total 10
drwx------ 2 th199096 staff 6 Oct 5 19:33 .
drwxr-xr-x 5 th199096 staff 6 Oct 5 11:39 ..
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_here
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs-9-25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs_9_25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_th199096
And let's change the share to be wide open:
[root@pnfs-9-26 ~]> zfs set sharenfs=on rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp /export/home/secure rw ""
We see root access is denied (because it maps to nobody):
[root@pnfs-9-25 ~]> ls -la /mnt
/mnt: Permission denied
total 3
But on that same machine, th199096 is granted access:
[root@pnfs-9-25 ~]> su - th199096
[th199096@pnfs-9-25 ~]> ls -la /mnt
total 12
drwx------ 2 th199096 staff 6 Oct 5 19:33 .
drwxr-xr-x 36 root root 39 Oct 5 19:11 ..
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_here
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs-9-25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs_9_25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_th199096
By the way, if we grant either root= or anon=0 access, then this all goes out the window:
[root@pnfs-9-26 ~]> zfs set sharenfs=rw,anon=0 rootpool/export/home/secure
yields:
[root@pnfs-9-25 ~]> ls -la /mnt
total 12
drwx------ 2 th199096 staff 6 Oct 5 19:33 .
drwxr-xr-x 36 root root 39 Oct 5 19:11 ..
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_here
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs-9-25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:27 i_am_pnfs_9_25
-rw-r--r-- 1 th199096 staff 0 Oct 5 13:30 i_am_th199096
A client's root only gets to boss things around if the server grants permission.
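The behavior in the three listings above can be modelled in a few lines. This is only a sketch of the idea, not mountd or nfsd code: the names map_uid and may_list are made up for illustration, the uid 199096 is a placeholder for th199096's real uid, 60001 is assumed to be the server's "nobody" uid, and group permissions are ignored.

```python
from typing import Optional

UID_NOBODY = 60001   # assumed default uid for "nobody" on the server
TH_UID = 199096      # hypothetical uid for th199096; the real value is not shown


def map_uid(claimed_uid: int, client: str, anon: int = UID_NOBODY,
            root_hosts: frozenset = frozenset()) -> Optional[int]:
    """Return the uid the server acts as, or None if the request is refused."""
    if claimed_uid != 0 or client in root_hosts:
        return claimed_uid   # ordinary users, and root from a root= host, pass through
    if anon == -1:
        return None          # anon=-1: unknown users (here, root) are denied
    return anon              # otherwise root is mapped to the anon uid


def may_list(uid: Optional[int], owner: int, mode: int) -> bool:
    """Read+search check on a directory using owner/other bits only."""
    if uid is None:
        return False
    if uid == 0:
        return True          # an unsquashed root bypasses the mode bits
    bits = ((mode >> 6) & 7) if uid == owner else (mode & 7)
    return (bits & 5) == 5   # needs both r (4) and x (1)


# sharenfs=on: root from pnfs-9-25 becomes nobody and is locked out by mode 700
print(may_list(map_uid(0, "pnfs-9-25"), TH_UID, 0o700))          # False
# the owner's uid passes straight through
print(may_list(map_uid(TH_UID, "pnfs-9-25"), TH_UID, 0o700))     # True
# sharenfs=rw,anon=0: root stays root and the mode bits no longer matter
print(may_list(map_uid(0, "pnfs-9-25", anon=0), TH_UID, 0o700))  # True
```

Note that the only input the model has for a user is the uid the client claims, which is the crux of the AUTH_SYS discussion that follows.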
Take a server on which the root account is locked down. Assume the admins don't want an inadvertent 'rm -rf /net' to nuke their server, so by default they create shares of the form:
[root@pnfs-9-26 ~]> zfs set sharenfs=rw,anon=-1 rootpool/export/home/secure
And further, at some point someone decides to lock down a share's permissions, i.e., mode 700 with the user th199096 as the owner.
How long would it take someone to get access over AUTH_SYS?
Not long - even though we know root access is out and we can assume they do not know my password. Since we use NIS, they can do a 'ypcat passwd | grep th199096' and grab my uid. Then they only have to create a dummy account with that uid on a test machine.
What if we create a special account, not in NIS? Well, they may not have root access on the server, but if they have any access, then they could cd to the parent directory, issue an 'ls -la', see the user name, and then grep for it out of /etc/passwd.
You could lock down the machine, lock down the NIS database, etc. But the fact remains that if I can mount it, then I can create a simple script to try every UID until I get access. How many servers out there check for getattr storms?
The answer is to further restrict the access lists. But eventually, if I'm able to gain access to one of the restricted machines or if I can bring up my box with the same IP as one of the restricted machines, I can get access.
But all I need to do to combat this without all of these "extreme" measures is to enable Kerberos on the server:
[root@pnfs-9-26 ~]> zfs set sharenfs=sec=krb5,rw,anon=-1 rootpool/export/home/secure
[root@pnfs-9-26 ~]> share
-@rootpool/exp /export/home/secure anon=-1,sec=krb5,rw ""
I am the right user (actually, my uid on pnfs-9-25 matches the uid of the user th199096 on pnfs-9-26), but it fails:
[th199096@pnfs-9-25 ~]> ls -al /mnt
NFS3 access failed for pnfs-9-26: RPC: Authentication error; s1 = 13, s2 = 0
/mnt: Permission denied
total 3
I call a "share" an "export" because I learned the terminology at another company, one based on the SunOS style and not the Solaris style. It turns out I have other expectations on how shares work. I thought the following was legal:
[root@pnfs-9-24 ~]> zfs set sharenfs=rw=pnfs-9-25:jhereg rootpool/export/home/secure
And all I got was:
[root@pnfs-9-25 ~]> mount pnfs-9-24:/export/home/secure /mnt
nfs mount: mount: /mnt: Permission denied
I reinstrumented mountd to spit out some debug messages and I saw:
[root@pnfs-9-24 ~]>
Oct 5 16:04:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25| vs |pnfs-9-25.Central.Sun.COM|
Oct 5 16:04:27 pnfs-9-24 mountd[1598]: Considering |jhereg| vs |pnfs-9-25.Central.Sun.COM|
Oct 5 16:04:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25| vs |pnfs-9-25.Central.Sun.COM|
Oct 5 16:04:27 pnfs-9-24 mountd[1598]: Considering |jhereg| vs |pnfs-9-25.Central.Sun.COM|
Oct 5 16:04:27 pnfs-9-24 mountd[1598]: pnfs-9-25.Central.Sun.COM denied access to /export/home/secure
So it never considers the FQDN. Interesting, so what happens if we add it?
[root@pnfs-9-24 ~]> zfs set sharenfs=root=pnfs-9-25.Central.sun.com,anon=-1 rootpool/export/home/secure
We see:
[root@pnfs-9-25 ~]> mount pnfs-9-24:/export/home/secure /mnt
[root@pnfs-9-25 ~]>
And on the console:
[root@pnfs-9-24 ~]>
Oct 5 16:06:27 pnfs-9-24 mountd[1598]: Considering |pnfs-9-25.Central.sun.com| vs |pnfs-9-25.Central.Sun.COM|
By the way, the compare is case insensitive. This took me way longer to track down than I liked. And it had me going down dead-ends with other "bugs".
The share_nfs(1M) man page has this to say:
access_list
The access_list argument is a colon-separated list whose components may be any number of the following:
hostname
The name of a host. With a server configured for DNS or LDAP naming in the nsswitch "hosts" entry, any hostname must be represented as a fully qualified DNS or LDAP name.
And sure enough:
[root@pnfs-9-24 ~]> grep hosts /etc/nsswitch.conf
# "hosts:" and "services:" in this file are used only if the
#hosts: nis [NOTFOUND=return] files
hosts: files dns
# before searching the hosts databases.
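To make the comparison in the debug output concrete, here is a minimal sketch of the matching as I observed it. It is not the mountd source; host_allowed is a made-up name, and only plain hostname components are handled (the netgroup, domain suffix, and network forms of an access list are left out).

```python
def host_allowed(access_list: str, client_name: str) -> bool:
    """Compare each colon-separated entry against the name the server resolved.

    The match is a whole-string, case-insensitive compare. A short name is
    never expanded to an FQDN, so it cannot match a client that the server
    resolved through DNS.
    """
    wanted = client_name.lower()
    return any(entry.lower() == wanted for entry in access_list.split(":"))


client = "pnfs-9-25.Central.Sun.COM"   # what the server resolved the client to

print(host_allowed("pnfs-9-25:jhereg", client))           # False: short names only
print(host_allowed("pnfs-9-25.Central.sun.com", client))  # True: FQDN, case ignored
```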
Besides RTFMing myself (which I had done earlier, but not well enough), I was struck by the thought that I wish we had made this choice at a previous company. It solves a lot of problems and reduces the number of name server queries (which caused many of those problems), but it is not as flexible. Consider a multi-homed client thorton which can be either thorton.central.sun.com or thorton.be.central.sun.com. With just rw=thorton, we can leverage the search domains to allow access to both interfaces at once.
But, depending on the ordering in the search domains, we may end up sending more name lookups than we want. Also, I've heard some sysadmins espouse the belief that those interfaces represent different machines. And if you want both to have access, you explicitly grant them both access.
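As a rough illustration of that trade-off, the sketch below models the alternative design in which a short name in the access list is expanded through the search domains. It is a simplification under stated assumptions: the function names are hypothetical, each candidate is counted as one name lookup, and real resolver rules such as ndots handling are ignored.

```python
from typing import List, Optional


def candidates(short_name: str, search_domains: List[str]) -> List[str]:
    """Expand a short name into one candidate FQDN per search domain, in order."""
    return [f"{short_name}.{domain}" for domain in search_domains]


def lookups_until_match(short_name: str, search_domains: List[str],
                        client_fqdn: str) -> Optional[int]:
    """Count how many candidates are tried before one matches the client."""
    for tried, name in enumerate(candidates(short_name, search_domains), start=1):
        if name.lower() == client_fqdn.lower():
            return tried
    return None  # no search domain yields the client's name


domains = ["central.sun.com", "be.central.sun.com"]

# Both interfaces of the multi-homed client are covered by the one short name...
print(lookups_until_match("thorton", domains, "thorton.central.sun.com"))     # 1
# ...but the second interface costs an extra lookup with this ordering.
print(lookups_until_match("thorton", domains, "thorton.be.central.sun.com"))  # 2
```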