20070224 Saturday February 24, 2007
Dusting off ElHam

ElHam is a filesystem testing tool designed to detect corruption, be multiprotocol, and stress a filesystem. It isn't designed to be a benchmark. I'm in the middle of debugging a really nasty NFSv4 bug (read that to mean we haven't a clue as to what is going on or how to reproduce it), and I need to generate sufficient load on a test system.

So I went and got ElHam from SourceForge.net. I wrote it when I was at NetApp as a tool we could use internally to get multiprotocol lock testing, generate metadir traffic, and to hand out to customers for corruption testing. As such, we stuck a BSD license on it and hung it off of SourceForge.net.

It still needs work. For example, I figured out that it wasn't detecting big-endianness. I also have to make a pass through it to make sure that I capture the return value of every function call and check that it is valid. One of the things you need for corruption testing is early detection of problems.

Sometimes in trying to detect corruption, you can get a false positive because of client side caching. If your focus is strictly on the server, i.e., you are testing a filer, that is bad. So you might be tempted to turn off client side caching. It also appears to go faster, but again, ElHam is not designed to be a benchmark.

The other evil with turning off client side caching is that it effectively negates both locking in general and NFSv4 delegations. ElHam is designed to have multiple readers and writers, both local and remote, changing files in a directory tree. Client side caching issues are something it should have to live with.

Anyway, multiple instances (from different architectures and OSes) are possible because ElHam records what is supposed to be in every data block. So when another instance comes along, it is able to compute what should be in the data block and then it can see if the on-disk image is corrupt. I need to write a small application to inject corruption - this will help me get signatures to show people what ElHam has detected.
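A minimal sketch of that idea, not ElHam's actual on-disk format: if the expected contents of a block are a pure function of values any instance can recover, then a reader on any architecture can recompute the block and compare it with what is on disk. The names `expected_block` and `verify_block`, the file id / block number / generation inputs, the 4 KiB block size, and the use of SHA-256 as the pattern generator are all assumptions made for illustration. Packing the seed with an explicit byte order keeps big-endian and little-endian hosts in agreement.

```python
import hashlib
import struct

BLOCK_SIZE = 4096  # assumed block size, not taken from ElHam


def expected_block(file_id: int, block_no: int, generation: int) -> bytes:
    """Recompute the bytes a block should hold from its identity alone."""
    # Explicit big-endian packing so every host derives the same seed.
    seed = struct.pack(">QQQ", file_id, block_no, generation)
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(seed + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])


def verify_block(data: bytes, file_id: int, block_no: int, generation: int) -> bool:
    """Return True when the on-disk bytes match the recomputed pattern."""
    return data == expected_block(file_id, block_no, generation)
```

Flipping a single byte in a block makes `verify_block` return False, which is roughly what a small corruption injector would be used to demonstrate.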

The current big issue is that ElHam is designed to push a filesystem to capacity and then back off. That is, reads and writes in the face of a full filesystem are interesting. To aid in that testing, it is best that the 'data', 'meta', and 'history' (see ElHam docs) directories each be on a different filesystem. Well, yesterday I had all three on the same filesystem and it got full. So I'm trying to reproduce that and see what is happening.

A really neat way to do this is to use ZFS to create different filesystems and then set quotas to control how much space each filesystem is allowed:

# zfs create zoo/elham 
# zfs set sharenfs=on zoo/elham
# zfs create zoo/elham/data
# zfs create zoo/elham/meta
# zfs create zoo/elham/history
# zfs list zoo/elham/*
NAME                USED  AVAIL  REFER  MOUNTPOINT
zoo/elham/data     36.7K   654G  36.7K  /zoo/elham/data
zoo/elham/history  36.7K   654G  36.7K  /zoo/elham/history
zoo/elham/meta     36.7K   654G  36.7K  /zoo/elham/meta
# zfs set quota=2G zoo/elham/data
# zfs set quota=20G zoo/elham/meta
# zfs set quota=20G zoo/elham/history
# zfs list zoo/elham/*
NAME                USED  AVAIL  REFER  MOUNTPOINT
zoo/elham/data     36.7K  2.00G  36.7K  /zoo/elham/data
zoo/elham/history  36.7K  20.0G  36.7K  /zoo/elham/history
zoo/elham/meta     36.7K  20.0G  36.7K  /zoo/elham/meta

Note that I give the 'history' and 'meta' filesystems much more of a quota. I don't want to run out of space on them.
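If the layout has to be recreated often, the command sequence above can be generated rather than retyped. This is only a sketch: `zfs_setup_commands` is a hypothetical helper that builds the same `zfs create` and `zfs set` lines shown above as strings and does not run them.

```python
def zfs_setup_commands(parent: str, quotas: dict) -> list:
    """Build the zfs commands for one parent dataset and its quota-limited children."""
    cmds = [f"zfs create {parent}", f"zfs set sharenfs=on {parent}"]
    # Create every child first, then apply the quotas, as in the session above.
    for name in quotas:
        cmds.append(f"zfs create {parent}/{name}")
    for name, quota in quotas.items():
        cmds.append(f"zfs set quota={quota} {parent}/{name}")
    return cmds


if __name__ == "__main__":
    layout = {"data": "2G", "meta": "20G", "history": "20G"}
    for cmd in zfs_setup_commands("zoo/elham", layout):
        print(cmd)
```

Keeping the quotas in one dictionary makes it easy to shrink 'data' further when the goal is to hit the full-filesystem case quickly.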

I'm going to kick off several instances of ElHam and see if I can fill this puppy up.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20070219 Monday February 19, 2007
grub: error 17, cannot mount selected partition - type 0xbf

Okay, I just took the machine which has been running Fedora Core 4 for the longest time and installed Solaris Nevada b56 on it. And I had one of the most painful experiences ever with Solaris. The install went fine, but when it came up, GRUB dropped to a command line prompt and gave out:

error 17, cannot mount selected partition

When pushed with a 'cat /', it would also mention that it did not like partition type 0xbf.

I tried everything: rebooting from the DVD, dropping into single-user mode, reinstalling GRUB, etc. No luck.

I thought it was my BIOS, so I kept changing the boot device. But that didn't make sense at the time, since the machine was at least booting into GRUB. In retrospect, it does make sense: the BIOS could get the hard drive to boot, but GRUB had no idea about the very hard drive it was on.

Okay, I noticed, both when booting into single-user mode and when the BIOS was reporting the hard drives, that the single hard drive was on the 2nd IDE loop, i.e., it was /dev/dsk/c1d0s0. I checked /etc/vfstab, and it was slated to read from there.

I finally got mad enough and swapped the IDE cables - this took 10-15 minutes because the cables in my Shuttle SS51G are tight and I had to pull out the drive cage. Anyway, when I rebooted, I did get farther. It would go through the GRUB menu and reboot.

I got into single-user mode and fixed up /etc/vfstab to use /dev/dsk/c0d0s0. Still no luck. A quick search turned up this goldmine: Swapping drives between Solaris machines. Okay, it wasn't as quick as I wanted; I had to go through several pages first. Anyway, I had suspected I would have to touch 'devfsadm' and 'bootadm'. I was right.

I followed the instructions:

  1. Boot into Solaris Safeboot mode. You can get access at the Grub menu, usually the 2nd option. Note: I had to use the DVD install media to do this.
  2. Mount the found Solaris partition on /a. Safeboot will usually find the slice on the disk with Solaris and ask if you want it to mount on /a. Select Yes.
  3. Move /a/dev, /a/devices, and /a/etc/path_to_inst to another name (I just append .orig) and then create new directories, (mkdir) /a/dev and /a/devices, and touch /a/etc/path_to_inst. I did not do this step.
  4. Run "devfsadm -r /a" to rebuild the device tree.
  5. Edit /a/boot/solaris/bootenv.rc and modify the line with "setprop bootpath '/pci@0,0....'" to match the path you'll find mounted for /a (i.e. run a 'df -k' command, and you should see /a mounted from /dev/dsk/c1d0s0 or something, then run 'ls -l /dev/dsk/c1d0s0' or whatever your device listed was, and you should see the actual link point to ../../devices/pci@0,0/...). The path to bootpath you want should be the hard disk which is mounted as /a and you just need to find the expanded /devices/pci@0,0/... path and put that in the bootenv.rc file on the Solaris root filesystem on the hard disk (sans the /devices/ prefix of course). This is a key step.
  6. Now run "bootadm update-archive -v -R /a" to rebuild the boot-archive on /a.
  7. Make sure to edit /etc/vfstab
  8. run a 'touch /a/reconfigure'
  9. Run "cd /; sync; sync; sync; umount /a" And I skipped this one.
  10. and finally reboot.
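
Step 5 is the fiddly one, so here is a small sketch of just the string handling: take the symlink target that 'ls -l' prints for the disk slice and strip everything up to and including the 'devices' component. `bootpath_from_link` is a hypothetical helper, and the device path in the example is made up for illustration; use whatever your own 'ls -l' reports.

```python
def bootpath_from_link(link_target: str) -> str:
    """Turn a /dev/dsk symlink target into the value for bootpath."""
    marker = "/devices/"
    idx = link_target.find(marker)
    if idx < 0:
        raise ValueError(f"no {marker!r} component in {link_target!r}")
    # Keep the slash that starts the physical path, drop everything before it.
    return link_target[idx + len(marker) - 1:]


if __name__ == "__main__":
    # Hypothetical link target, not taken from the machine in this post.
    target = "../../devices/pci@0,0/pci-ide@1f,1/ide@1/cmdk@0,0:a"
    print(f"setprop bootpath '{bootpath_from_link(target)}'")
```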

The system came up.



20070218 Sunday February 18, 2007
Linux sendmail not sending mail and no error messages

Can I get a big "Doh!" from the crowd? I'm trying to upgrade my domain server from Fedora Core 4 to Fedora Core 6. I want to isolate what I will need to change to go to Solaris. Everything is kinda going okay, network addresses did not change after a reboot.

But sendmail is queuing my outgoing mail and not logging anything. It was actually telling me what was going wrong, but the verbiage was just too weird.

Make my changes to sendmail.mc and make:

[root@adept mail]# make
WARNING: 'sendmail.mc' is modified. Please install package sendmail-cf to update your configuration.

This actually means do the following:

[tdh@adept doc]> sudo yum install sendmail-cf

I just couldn't parse it correctly. Here is how I found my "Doh!" moment:

The mail queues have entries:

[root@adept mail]# mailq
/var/spool/mqueue (4 requests)
-----Q-ID----- --Size-- -----Q-Time----- ------------Sender/Recipient-----------
l1J0feON002810*       9 Sun Feb 18 18:41 <root@adept.internal.excfb.com>
                                         <tdh@sun.com>
...

Some testing:

[root@adept mail]# sendmail -v loghyr@loghyr.com
kdjfjklfs
.
loghyr@loghyr.com... Connecting to [127.0.0.1] via relay...
220 adept.internal.excfb.com ESMTP Sendmail 8.13.8/8.13.8; Sun, 18 Feb 2007 18:52:31 -0600
>>> EHLO adept.internal.excfb.com
250-adept.internal.excfb.com Hello [127.0.0.1], pleased to meet you
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-8BITMIME
250-SIZE
250-DSN
250-ETRN
250-AUTH DIGEST-MD5 CRAM-MD5
250-DELIVERBY
250 HELP
>>> MAIL From:<root@adept.internal.excfb.com> SIZE=10 AUTH=root@adept.internal.excfb.com
250 2.1.0 <root@adept.internal.excfb.com>... Sender ok
>>> RCPT To:<loghyr@loghyr.com>
>>> DATA
250 2.1.5 <loghyr@loghyr.com>... Recipient ok
354 Enter mail, end with "." on a line by itself
>>> .
250 2.0.0 l1J0qV2d002864 Message accepted for delivery
loghyr@loghyr.com... Sent (l1J0qV2d002864 Message accepted for delivery)
Closing connection to [127.0.0.1]
>>> QUIT
221 2.0.0 adept.internal.excfb.com closing connection

Note that it is talking to 127.0.0.1 and that is not right. What do the sendmail config files look like?

[root@adept mail]# ls -la send*
-rw-r--r--   1 root root 58203 Feb 11 10:58 sendmail.cf
-rw-r--r--   1 root root  7257 Feb 18 17:29 sendmail.mc
-rw-r--r--   1 root root  7209 Feb 18 17:19 sendmail.mc.stock

Okay, that hasn't changed today.

[root@adept mail]# make
WARNING: 'sendmail.mc' is modified. Please install package sendmail-cf to update your configuration.

I then get the "Doh!" and install sendmail-cf as shown above!

[root@adept mail]# make
[root@adept mail]# ls -la send*
-rw-r--r-- 1 root root 59161 Feb 18 18:54 sendmail.cf
-rw-r--r-- 1 root root 58203 Feb 11 10:58 sendmail.cf.bak
-rw-r--r-- 1 root root  7257 Feb 18 17:29 sendmail.mc
-rw-r--r-- 1 root root  7209 Feb 18 17:19 sendmail.mc.stock
[root@adept mail]# service sendmail restart
Shutting down sm-client:                                   [  OK  ]
Shutting down sendmail:                                    [  OK  ]
Starting sendmail:                                         [  OK  ]
Starting sm-client:                                        [  OK  ]

Still not delivering, and I am suspicious about why it is trying to talk to domains directly:

l1J0o9AP002853       33 Sun Feb 18 18:50 <tdh@adept.internal.excfb.com>
                 (Deferred: Connection timed out with www.loghyr.com.)
                                         <loghyr@loghyr.com>

I have to send outgoing mail through cox.net. Look what I have in my sendmail.mc:

[root@adept mail]# grep cox.net sendmail.mc
dnl define(`SMART_HOST', `smtp.central.cox.net')dnl

Bzzt, fix it!
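
The problem in that grep output is the leading dnl: m4 discards everything from dnl through the end of the line, so the SMART_HOST define never takes effect, and the fix is to drop that leading dnl. A quick check along these lines can catch it before the next make. This is a sketch only; `smart_host_status` is a hypothetical helper, and its regular expression handles only the simple one-line define form shown above.

```python
import re

# Matches a define(`SMART_HOST', `...') macro call anywhere in a line.
SMART_HOST_RE = re.compile(r"define\(`SMART_HOST',\s*`([^']*)'\)")


def smart_host_status(mc_text: str):
    """Return (host, active) for the first SMART_HOST define, or None."""
    for line in mc_text.splitlines():
        match = SMART_HOST_RE.search(line)
        if match:
            # A dnl ahead of the define makes m4 discard the rest of the line.
            active = "dnl" not in line[:match.start()]
            return match.group(1), active
    return None
```

Run against the line in my sendmail.mc, it reports the cox.net host as present but inactive.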

And that flushes a bunch of requests after a make and restart!



20070217 Saturday February 17, 2007
In Kernel Sharetab: Have a single file pseudo-fs working!

I had originally implemented the In Kernel Sharetab with GFS and followed the requirement that everything had to be in a directory. As such, I made /system/dfs/sharetab and symlinked /etc/dfs/sharetab to it. Well, while I still believe that is the proper place for it in the namespace, I decided to hack GFS to allow a single file to be a filesystem - think /etc/mnttab. Between a lab shutdown yesterday and redoing the entire set of changes on kanigix and the latest OpenSolaris drop, I've got the thing working:

[tdh@kanigix dfs]> ls -la
total 17
drwxr-xr-x   2 root     sys          512 Feb 13 07:16 .
drwxr-xr-x  88 root     sys         4608 Feb 17 14:19 ..
-rw-r--r--   1 root     sys          354 Feb 17 11:46 dfstab
-rw-r--r--   1 root     root          68 Feb 17 12:07 fstypes
-r--r--r--   1 root     root         246 Feb 17 14:22 sharetab
[tdh@kanigix dfs]> cat sharetab
/export/zfs/tdh -       nfs     rw
/export/zfs/monster     -       nfs     rw
/export/zfs/nfsv4       -       nfs     rw
/export/zfs/nfsv2       -       nfs     rw
/       -       nfs     rw
/zoo/isos       -       nfs     rw
/export/zfs/nfsv3       -       nfs     rw
/export/home    -       nfs     sec=sys,rw=engineering  home dirs
/export/zfs     -       nfs     rw

What I don't have working just right is the attribute changes:

[tdh@kanigix dfs]> sudo unshare -F nfs /
[tdh@kanigix dfs]> ls -la
total 17
drwxr-xr-x   2 root     sys          512 Feb 13 07:16 .
drwxr-xr-x  88 root     sys         4608 Feb 17 14:19 ..
-rw-r--r--   1 root     sys          354 Feb 17 11:46 dfstab
-rw-r--r--   1 root     root          68 Feb 17 12:07 fstypes
-r--r--r--   1 root     root         246 Feb 17 14:34 sharetab

The size and time will not change until I read the file:

[tdh@kanigix dfs]> cat sharetab
/export/zfs/tdh -       nfs     rw
/export/zfs/monster     -       nfs     rw
/export/zfs/nfsv4       -       nfs     rw
/export/zfs/nfsv2       -       nfs     rw
/zoo/isos       -       nfs     rw
/export/zfs/nfsv3       -       nfs     rw
/export/home    -       nfs     sec=sys,rw=engineering  home dirs
/export/zfs     -       nfs     rw
[tdh@kanigix dfs]> ls -la sharetab 
-r--r--r--   1 root     root         234 Feb 17 14:34 sharetab

I can easily fix that. Instead of recompiling the BFUs, I'm just going to rebuild 'sharefs'. Note if I could unload this module, I could do all of this without rebooting:

[tdh@kanigix sharefs]> pwd
/home/tdh/ws/kanigix/usr/src/uts/common/fs/sharefs
[tdh@kanigix sharefs]> cd ../../../intel/sharefs
[tdh@kanigix sharefs]> dmake
dmake: defaulting to parallel mode.
See the man page dmake(1) for more information on setting up the .dmakerc file.
kanigix --> 1 job
...

To see what I need to get into /kernel, a 'dmake install' will tell me a lot:

[tdh@kanigix sharefs]> dmake install
dmake: defaulting to parallel mode.
See the man page dmake(1) for more information on setting up the .dmakerc file.
/usr/bin/rm -f /home/tdh/ws/kanigix/proto/root_i386/kernel/fs/amd64/sharefs; install -s -m 755 -f /home/tdh/ws/kanigix/proto/root_i386/kernel/fs/amd64 debug64/sharefs
/usr/bin/rm -f /home/tdh/ws/kanigix/proto/root_i386/kernel/fs/sharefs; install -s -m 755 -f /home/tdh/ws/kanigix/proto/root_i386/kernel/fs debug32/sharefs
[tdh@kanigix sharefs]> sudo cp /home/tdh/ws/kanigix/proto/root_i386/kernel/fs/amd64/sharefs /kernel/fs/amd64/sharefs
[tdh@kanigix sharefs]> sudo cp /home/tdh/ws/kanigix/proto/root_i386/kernel/fs/sharefs /kernel/fs/sharefs

Now I reboot and test!

The attributes are now following the changes:

[tdh@kanigix dfs]> ls -la sharetab 
-r--r--r--   1 root     root         182 Feb 17 14:43 sharetab
[tdh@kanigix dfs]> sudo share -F nfs /
[tdh@kanigix dfs]> ls -la sharetab
-r--r--r--   1 root     root         194 Feb 17 14:46 sharetab
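
The before and after behaviour can be modelled in a few lines. This is a toy, not the sharefs code: `ShareTabModel` is a hypothetical class, the line format is simplified, and the only point is the difference between refreshing the reported size when the file is read and refreshing it whenever the share list changes.

```python
class ShareTabModel:
    """Toy in-memory stand-in for a kernel-generated sharetab file."""

    def __init__(self, update_on_change: bool):
        self.entries = []
        self.size = 0  # what a stat() of the file would report
        self.update_on_change = update_on_change

    def _render(self) -> str:
        return "".join(f"{path}\t-\tnfs\trw\n" for path in self.entries)

    def share(self, path: str) -> None:
        self.entries.append(path)
        if self.update_on_change:
            self.size = len(self._render())

    def unshare(self, path: str) -> None:
        self.entries.remove(path)
        if self.update_on_change:
            self.size = len(self._render())

    def read(self) -> str:
        text = self._render()
        self.size = len(text)  # reading always refreshes the cached size
        return text
```

With `update_on_change=False` the size stays stale after an unshare until the next `read()`, which is the behaviour I saw before the rebuild.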

I'm pretty much done with the project. I have to pull out some code changes, tidy things up, do some more unit testing, and ship it off for some quality assurance. I'll check to see if I can get code put up to OpenSolaris.org in case people want to play with it.



20070216 Friday February 16, 2007
Followup on the download speed of kanigix

Been pretty busy, so I popped off a download on mrx as a test for "Starting a performance analysis of my Frankenstein vs Sun w2100z". mrx is currently running Fedora Core 6 (for another project), and I saw it get:

[tdh@mrx ~/]> wget http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso

 0% [] 2,753,773    791K/s  eta 1h 47m 

I think it is safe to say that the cache at Sun is not helping my w2100z to beat my Frankenstein. Also, it could be that the VPN is imposing a penalty on the w2100z.

My guess is going to go either with the NIC/driver or the hard disk or ZFS. There, I've narrowed it down.

First, let's look at ZFS:


[tdh@kanigix ~]> zfs get all zoo
NAME  PROPERTY       VALUE                  SOURCE
zoo   type           filesystem             -
zoo   creation       Sun Jan 14 14:08 2007  -
zoo   used           15.2G                  -
zoo   available      668G                   -
zoo   referenced     39.6K                  -
zoo   compressratio  1.04x                  -
zoo   mounted        yes                    -
zoo   quota          none                   default
zoo   reservation    none                   default
zoo   recordsize     128K                   default
zoo   mountpoint     /zoo                   default
zoo   sharenfs       off                    default
zoo   shareiscsi     off                    default
zoo   checksum       on                     default
zoo   compression    off                    default
zoo   atime          on                     default
zoo   devices        on                     default
zoo   exec           on                     default
zoo   setuid         on                     default
zoo   readonly       off                    default
zoo   zoned          off                    default
zoo   snapdir        hidden                 default
zoo   aclmode        groupmask              default
zoo   aclinherit     secure                 default
zoo   canmount       on                     default
zoo   xattr          on                     default

So, no compression enabled. Bzzt, we have to dig deeper:

[tdh@kanigix ~]> zfs get all zoo/home
NAME      PROPERTY       VALUE                  SOURCE
zoo/home  type           filesystem             -
zoo/home  creation       Sun Jan 14 14:10 2007  -
zoo/home  used           8.48G                  -
zoo/home  available      668G                   -
zoo/home  referenced     44.1K                  -
zoo/home  compressratio  1.08x                  -
zoo/home  mounted        yes                    -
zoo/home  quota          none                   default
zoo/home  reservation    none                   default
zoo/home  recordsize     128K                   default
zoo/home  mountpoint     /export/zfs            local
zoo/home  sharenfs       on                     local
zoo/home  shareiscsi     off                    default
zoo/home  checksum       on                     default
zoo/home  compression    on                     local
zoo/home  atime          on                     default
zoo/home  devices        on                     default
zoo/home  exec           on                     default
zoo/home  setuid         on                     default
zoo/home  readonly       off                    default
zoo/home  zoned          off                    default
zoo/home  snapdir        hidden                 default
zoo/home  aclmode        groupmask              default
zoo/home  aclinherit     secure                 default
zoo/home  canmount       on                     default
zoo/home  xattr          on                     default

Okay, before we fiddle with ZFS, let's check to see if we can eliminate it as a suspect:

[tdh@kanigix /kanigix]> df -h .
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1d1s4         21G    21M    20G     1%    /kanigix
[tdh@kanigix /kanigix]> wget http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso

 0% [ ] 4,213,237    291.40K/s  ETA 3:22:18^C

So ZFS does look like a factor! Well, not really: while I am getting better speeds than the other day, they are on par with what the ZFS filesystem gets today:


[tdh@kanigix ~]> df -h .
Filesystem             size   used  avail capacity  Mounted on
zoo/home/tdh           683G   8.5G   668G     2%    /export/zfs/tdh
[tdh@kanigix ~]> wget http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso

 0% [  ] 2,832,077    323.02K/s  ETA 3:22:50^C

I think I've eliminated both the disk and ZFS from being the problem. I think the issue is probably the network card or the driver. I'll have to see if there is a fix for my nge0 problem and then I can try it instead of the rge0.

[tdh@kanigix ~]> ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
rge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.2.115 netmask ffffff00 broadcast 192.168.2.255


20070214 Wednesday February 14, 2007
Starting a performance analysis of my Frankenstein vs Sun w2100z

I started downloading the Ubuntu DVD image and it said it would take 10 hours:

[tdh@kanigix isos]> wget http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso
--11:44:04-- http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso          
          => `ubuntu-6.10-dvd-i386.iso'
Resolving mirror.mcs.anl.gov... 146.137.96.7, 146.137.96.15
Connecting to mirror.mcs.anl.gov|146.137.96.7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3,725,318,144 (3.5G) [application/x-iso9660-image]

 0% [      ] 2,078,715    100.13K/s ETA 10:09:22

This is on the desktop I built over the Christmas break. I routinely download the Solaris Nevada DVDs in a much shorter time, so I decided the server must be gating my performance. I wanted to download the DVD to a lab system at work and then pull it down to my home system, but the SWAN DNS looks hosed, so I couldn't do that. Instead, I started a download on my w2100z and was told it would take about 2 hours:


[tdh@warlock ~]> wget http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso
--11:44:33--  http://mirror.mcs.anl.gov/pub/ubuntu-iso/DVDs/ubuntu/edgy/release/ubuntu-6.10-dvd-i386.iso
          => `ubuntu-6.10-dvd-i386.iso.1'
Resolving webcache.central.sun.com... 129.147.62.26, 129.147.62.30, 129.147.62.25
Connecting to webcache.central.sun.com|129.147.62.26|:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 3,725,318,144 (3.5G) [application/x-iso9660-image]

1% [>   ] 67,507,355   541.70K/s  ETA 1:57:51

Since the two systems share the same link into my house and they are both running close to the same OS, I think they should have the same transfer speeds. I'm wrong. Why?

For kanigix, it could be:

  • Minus - Crappy driver for my cheap NIC.
  • Minus - DVD is on the same loop as my root HD.
    • By the way, I'm not writing to that disk.
  • Unknown - Filesystem is a real ZFS pool of 4 disks.

For warlock, it could be:

  • Plus - Proxy cache is filling faster than system is emptying.
  • Minus - Going through VPN software.
  • Plus - Optimized drivers for NIC.
  • Unknown - Filesystem is a ZFS pool on 1 disk.

A rule of thumb I use is that VPN software imposes a 33% penalty on transfers. I could be wrong in this scenario.
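
To put numbers on that, here is a back-of-the-envelope sketch using the byte counts and rates from the two wget runs above. Two assumptions to flag: wget's K/s is treated as 1024 bytes per second, and the "33% penalty" is read as the VPN leaving 67% of the raw throughput. The function names are mine, purely for illustration.

```python
def eta_seconds(total_bytes: int, done_bytes: int, rate_kib_per_s: float) -> float:
    """Remaining transfer time if the current rate holds (rate in KiB/s)."""
    return (total_bytes - done_bytes) / (rate_kib_per_s * 1024)


def rate_without_vpn(observed_kib_per_s: float, penalty: float = 0.33) -> float:
    """Estimate the rate with the VPN removed, given a fractional penalty."""
    return observed_kib_per_s / (1.0 - penalty)


def hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    s = int(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


ISO_BYTES = 3_725_318_144  # Length reported by wget above

if __name__ == "__main__":
    print("kanigix:", hms(eta_seconds(ISO_BYTES, 2_078_715, 100.13)))
    print("warlock:", hms(eta_seconds(ISO_BYTES, 67_507_355, 541.70)))
    print("warlock sans VPN: %.0f KiB/s" % rate_without_vpn(541.70))
```

The computed ETAs land close to, though not exactly on, what wget printed, since wget averages the rate its own way. Under the 33% assumption, warlock without the VPN would be running at roughly 800 KiB/s.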

Without looking at solid data, I think my next step would be to disconnect the VPN session, which will remove the first two points on warlock. If I'm still getting much better transfer speeds, I'll know to look at the drivers.

I can also look at transfer speeds to other systems in the house.

I also need to start learning some performance tools (probably dtrace) to see what is going on.



20070212 Monday February 12, 2007
Trying to get a Kerberized NFSv4 server/client on a NSLU2

Normally I don't summarize what I'm about to write, but I think this entry is all over the place, and there is useful information in here. So: I'm trying to get first Kerberos and then NFSv4 working on an NSLU2 running OpenSlug. In order to validate my results, I also try to get a Linux NFSv4 server up and running on one of my Shuttle SS51G boxes. I finally get that to work, but I have no luck getting the NSLU2 working correctly as either a server or a client.

I decided to try another Linux client to see if I could get the process streamlined:

[tdh@sandman ~]> kadmin -p tdh/admin
Couldn't open log file /var/krb5/kdc.log: Permission denied
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin:  addprinc -randkey nfs/mrbill.internal.excfb.com
WARNING: no policy specified for nfs/mrbill.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "nfs/mrbill.internal.excfb.com@INTERNAL.EXCFB.COM" created.
kadmin:  addprinc -randkey host/mrbill.internal.excfb.com
WARNING: no policy specified for host/mrbill.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "host/mrbill.internal.excfb.com@INTERNAL.EXCFB.COM" created.
kadmin:  ktadd -k /export/keytabs/mrbill.keytab -e des-cbc-crc:normal nfs/mrbill.internal.excfb.com
kadmin: No such file or directory while adding key to keytab

Okay, not only do I need to fix the above, I also need to fix not being able to add to /var/krb5/kdc.log. We can get the keytab generated with:

[tdh@sandman /export]> sudo chown tdh:staff keytabs/

And we see:

kadmin:  ktadd -k /export/keytabs/mrbill.keytab -e des-cbc-crc:normal nfs/mrbill.internal.excfb.com
Entry for principal nfs/mrbill.internal.excfb.com with kvno 4, encryption type DES cbc mode with CRC-32 added to keytab WRFILE:/export/keytabs/mrbill.keytab.
kadmin:  ktadd -k /export/keytabs/mrbill.keytab -e des-cbc-crc:normal host/mrbill.internal.excfb.com
Entry for principal host/mrbill.internal.excfb.com with kvno 3, encryption type DES cbc mode with CRC-32 added to keytab WRFILE:/export/keytabs/mrbill.keytab.

Okay, the first thing to note is that mrbill is running OpenSlug:

root@mrbill:~# uname -a
Linux mrbill 2.6.16 #1 PREEMPT Fri Jun 9 07:34:31 PDT 2006 armv5teb unknown unknown GNU/Linux

We try to get the keytab:

root@mrbill:~# mount sandman:/export/keytabs /mnt/sandman/keytabs
mount: can't get address for sandman
root@mrbill:~# host sandman
-sh: host: not found

Why? Well it turns out that:

root@mrbill:~# cat /etc/resolv.conf
search mshome
nameserver 192.168.2.108
nameserver 182.168.2.1

I thought that the domain entered in the turnup init was for the CIFS domain. Easy enough to fix...

root@mrbill:~# cat /etc/resolv.conf
search internal.excfb.com
nameserver 192.168.2.108
nameserver 182.168.2.1
root@mrbill:~#  mount sandman:/export/keytabs /mnt/sandman/keytabs
root@mrbill:~# cd /etc
root@mrbill:/etc# cp /mnt/sandman/keytabs/mrbill.keytab krb5.keytab
cp: cannot open `/mnt/sandman/keytabs/mrbill.keytab' for reading: Permission denied
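
Why the search line mattered: a short name like 'sandman' is completed with each search domain before it is looked up. The sketch below shows only that expansion for a dot-free short name; it is a simplification that ignores resolver options such as ndots, and `candidate_names` is a hypothetical helper.

```python
def candidate_names(host: str, search_domains: list) -> list:
    """List the fully qualified names a resolver would try for a short name."""
    if host.endswith("."):
        return [host]  # already absolute, the search list is not applied
    names = [f"{host}.{domain}." for domain in search_domains]
    names.append(f"{host}.")  # the bare name is tried as well
    return names


if __name__ == "__main__":
    print(candidate_names("sandman", ["mshome"]))
    print(candidate_names("sandman", ["internal.excfb.com"]))
```

With 'search mshome' the first candidate is sandman.mshome., which no nameserver here knows about; with 'search internal.excfb.com' the first candidate is the real host.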

What now? (Permissions)

root@mrbill:/etc# ls -la /mnt/sandman/keytabs
total 9
drwxr-xr-x  2 tdh  uucp  512 Feb 12  2007 .
drwxr-xr-x  5 root root 4096 Feb 12 08:22 ..
-rw-r--r--  1 root root 1968 Feb 12 06:50 krb5.conf
-rw-------  1 tdh  uucp  161 Feb 12  2007 mrbill.keytab
-rw-r--r--  1 root root  155 Feb 12 06:48 mrx.keytab

Fix them up on the server and:

root@mrbill:/etc# cp /mnt/sandman/keytabs/mrbill.keytab krb5.keytab

We need to get a good copy of krb5.conf, idmapd.conf, and sysconfig/nfs. For now, we will leave idmapd.conf alone, to illustrate the NFSv4 mapid issue.

root@mrbill:/etc# scp mrx:/etc/krb5.conf .
root@mrbill:/etc# scp mrx:/etc/sysconfig/nfs sysconfig

Now this time I know kerberos is not installed:

root@mrbill:/# ls -la ./usr/kerberos/bin/kinit
ls: ./usr/kerberos/bin/kinit: No such file or directory

And we can easily add it:

root@mrbill:/# ipkg list | grep krb5
kernel-module-rpcsec-gss-krb5 - 2.6.16-r6.6 - rpcsec-gss-krb5 kernel module
root@mrbill:/# ipkg install kernel-module-rpcsec-gss-krb5
Installing kernel-module-rpcsec-gss-krb5 (2.6.16-r6.6) to root...
Downloading http://ipkg.nslu2-linux.org/feeds/slugos-bag/cross/3.10-beta/kernel-module-rpcsec-gss-krb5_2.6.16-r6.6_ixp4xxbe.ipk
Installing kernel-module-auth-rpcgss (2.6.16-r6.6) to root...
Downloading http://ipkg.nslu2-linux.org/feeds/slugos-bag/cross/3.10-beta/kernel-module-auth-rpcgss_2.6.16-r6.6_ixp4xxbe.ipk
Configuring kernel-module-auth-rpcgss
Configuring kernel-module-rpcsec-gss-krb5

Still not there for me:

root@mrbill:/# ls -la ./usr/kerberos/bin/kinit
ls: ./usr/kerberos/bin/kinit: No such file or directory
root@mrbill:/# find . -name kinit

My guess is that you can export with kerberos, you just can't mount it.

We should confirm that!

root@mrbill:~# mkdir /home/nfs4
root@mrbill:~# chmod 777 /home/nfs4
root@mrbill:~# cd /home/nfs4
root@mrbill:/home/nfs4# touch see_me
root@mrbill:/home/nfs4# chown tdh:10 see_me
root@mrbill:/home/nfs4# ls -la
total 8
drwxrwxrwx  2 root root 4096 Feb 12 09:00 .
drwxrwxr-x  8 root root 4096 Feb 12 09:00 ..
-rw-r--r--  1 tdh  uucp    0 Feb 12 09:00 see_me

And I try to add the export:

root@mrbill:/home/nfs4# more /etc/exports
/home/NFS4 172.16.0.0/16(rw,fsid=0,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)
root@mrbill:/home/nfs4# cd ..
root@mrbill:/home# ls -la
total 32
drwxrwxr-x   8 root root  4096 Feb 12 09:00 .
drwxr-xr-x  18 root root  4096 Feb  5 22:44 ..
drwxrwxrwx   2 tdh  uucp  4096 Feb  5 23:03 NFS4
drwxrwxrwx   2 root root  4096 Feb 12 09:00 nfs4
drwxr-xr-x   2 root root  4096 Feb  5 22:53 nfsv2
drwxr-xr-x   2 root root  4096 Feb  5 22:53 nfsv3
drwxr-xr-x   2 root root  4096 Feb  5 22:53 nfsv4
lrwxrwxrwx   1 root root     7 Feb  5 22:26 root -> ../root
drwxr-xr-x   2 tdh  staff 4096 Feb  7 21:21 tdh
root@mrbill:/home#

Looks like /home/NFS4 was created for me, or I'm suffering from severe memory loss...

I could have done this last week, note the time stamp.

root@mrbill:/home# ls -la NFS4
total 8
drwxrwxrwx  2 tdh    uucp 4096 Feb  5 23:03 .
drwxrwxr-x  8 root   root 4096 Feb 12 09:00 ..
-rw-r--r--  1 200096 uucp    0 Feb  5 23:03 ut

Must be memory loss!

root@mrbill:/home# cd NFS4/
root@mrbill:/home/NFS4# touch see_me
root@mrbill:/home/NFS4# chown tdh:10 see_me
root@mrbill:/home/NFS4# ls -la
total 8
drwxrwxrwx  2 tdh    uucp 4096 Feb 12 09:03 .
drwxrwxr-x  8 root   root 4096 Feb 12 09:00 ..
-rw-r--r--  1 tdh    uucp    0 Feb 12 09:03 see_me
-rw-r--r--  1 200096 uucp    0 Feb  5 23:03 ut

And yes:

[tdh@mrx ipk]> showmount -e mrbill
Export list for mrbill:
/home/NFS4 172.16.0.0/16

I was in 172.16.0.0/16 space last week. Touch up the export and:

[tdh@mrx ipk]> showmount -e mrbill
Export list for mrbill:
/home/NFS4 192.168.2.0/24
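
A quick way to see why the old export had to be touched up is to test whether a client address falls inside the exported network. This sketch uses Python's standard ipaddress module; the client address 192.168.2.50 is a made-up example on my current subnet, not mrx's real address.

```python
import ipaddress


def client_allowed(client_ip: str, export_network: str) -> bool:
    """True when the client address falls inside the exported network."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(export_network)


if __name__ == "__main__":
    client = "192.168.2.50"  # hypothetical client address
    for network in ("172.16.0.0/16", "192.168.2.0/24"):
        print(network, client_allowed(client, network))
```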

Okay, I do the mount and I'll claim it gets done as NFSv3:

[tdh@mrx ipk]> sudo mount mrbill:/home/NFS4 /mnt/mrbill/NFS4
[tdh@mrx ipk]> ls -la /mnt/mrbill/NFS4
total 8
drwxrwxrwx 2 tdh    wheel 4096 Feb 12 03:03 .
drwxr-xr-x 3 root   root  4096 Feb 12 11:08 ..
-rw-r--r-- 1 tdh    wheel    0 Feb 12 03:03 see_me
-rw-r--r-- 1 200096 wheel    0 Feb  5 17:03 ut

Why do I claim it is NFSv3? Because with NFSv4 I would expect the ID mapping to be hosed, and here the owners show up correctly. Can we verify this? Yes:

[tdh@mrx ipk]> sudo umount /mnt/mrbill/NFS4
[tdh@mrx ipk]> sudo mount -o vers=3 mrbill:/home/NFS4 /mnt/mrbill/NFS4
[tdh@mrx ipk]> ls -la /mnt/mrbill/NFS4
total 8
drwxrwxrwx 2 tdh    wheel 4096 Feb 12 03:03 .
drwxr-xr-x 3 root   root  4096 Feb 12 11:08 ..
-rw-r--r-- 1 tdh    wheel    0 Feb 12 03:03 see_me
-rw-r--r-- 1 200096 wheel    0 Feb  5 17:03 ut
[tdh@mrx ipk]> sudo umount /mnt/mrbill/NFS4
[tdh@mrx ipk]> sudo mount -o vers=4 mrbill:/home/NFS4 /mnt/mrbill/NFS4
'vers=4' is not supported.  Use '-t nfs4' instead.
[tdh@mrx ipk]> sudo mount -t nfs4 mrbill:/home/NFS4 /mnt/mrbill/NFS4
mount.nfs4: mount point /mnt/mrbill/NFS4 does not exist

Okay, mrbill knows nothing about NFSv4 as far as I can tell:

root@mrbill:/home/NFS4# mount -t nfs4 sandman:/export/home /mnt/sandman/home
mount: unknown filesystem type 'nfs4'

I'm sensing protocol discrimination here:

root@mrbill:/home/NFS4# ipkg list | grep -i nfs
kernel-module-lockd - 2.6.16-r6.6 - lockd kernel module; NFS file locking service version 0.5.
kernel-module-nfs - 2.6.16-r6.6 - nfs kernel module
kernel-module-nfs - 2.6.16-r6.4 -
kernel-module-nfsd - 2.6.16-r6.6 - nfsd kernel module
nfs-utils - 1.0.6-r7 - userspace utilities for kernel nfs
nfs-utils-doc - 1.0.6-r7 - userspace utilities for kernel nfs

Time to check the log file:

Feb 12 09:08:29 (none) user.warn kernel: nfsd: nfsv4 idmapping failing: has idmapd not been started?

Okay, configure idmapping and reboot:

Feb 12 09:16:37 (none) user.info kernel: Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Feb 12 09:16:37 (none) user.warn kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Feb 12 09:16:37 (none) user.warn kernel: NFSD: unable to find recovery directory /var/lib/nfs/v4recovery
Feb 12 09:16:37 (none) user.warn kernel: NFSD: starting 90-second grace period

Try the mount again:

[tdh@mrx ipk]> sudo mount -t nfs4 mrbill:/home/NFS4 /mnt/mrbill/NFS4
mount.nfs4: Permission denied

And try it from a Solaris client:

[tdh@sandman keytabs]> sudo mount mrbill:/home/NFS4 /mnt/mrbill/NFS4
[tdh@sandman keytabs]> sudo mount mrbill:/home/NFS4 /mnt/mrbill/NFS4
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
NFS compound failed for server mrbill: error 7 (RPC: Authentication error)
nfs mount: mount: /mnt/mrbill/NFS4: Permission denied

Okay, can we get Kerberos working at all on the NSLU2?

root@mrbill:~# more /etc/exports
/home/NFS4 192.168.2.0/24(rw,fsid=0,sec=krb5,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)
root@mrbill:~# exportfs -rv
exportfs: /etc/exports:1: unknown keyword "sec=krb5"
unexporting sandman.internal.excfb.com:/home/NFS4 from kernel

The keyword is not correct? Time to try on a known good Linux config:

[tdh@mrx ipk]> cat /etc/exports
/home/tdh 192.168.2.0/24(rw,fsid=0,sec=krb5,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)
[tdh@mrx ipk]> sudo exportfs -rv
exportfs: /etc/exports:1: unknown keyword "sec=krb5"

Okay, here is what we are supposed to do:

[tdh@mrx ipk]> cat /etc/exports
/home/tdh gss/krb5(rw,fsid=0,insecure,no_subtree_check,sync,anonuid=65534,anongid=65534)
[tdh@mrx ipk]> sudo exportfs -rv
exporting gss/krb5:/home/tdh
exporting gss/krb5:/home/tdh to kernel
gss/krb5:/home/tdh: Cannot allocate memory

By sheer effort of will, I determined that the firewall was on.

root@mrbill:~# showmount -e mrx
Export list for mrx:
/home/tdh gss/krb5
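
For what it's worth, the only difference between the two attempts is where the security flavor lives: as a sec= option (rejected by the exportfs on both boxes) or as a gss/krb5 pseudo-client (accepted). Here is a minimal Python sketch that classifies an export line either way. The helper names parse_export and kerberos_style are my own inventions, not part of nfs-utils, and it assumes one client spec per line, which is all I have here:

```python
def parse_export(line):
    """Split one single-client /etc/exports entry into (path, client, options)."""
    path, spec = line.split(None, 1)
    client, _, opts = spec.strip().partition("(")
    options = opts.rstrip(")").split(",") if opts else []
    return path, client, options


def kerberos_style(line):
    """Classify how an export line asks for Kerberos, if at all."""
    _, client, options = parse_export(line)
    if client.startswith("gss/"):
        return "client"   # gss/krb5 pseudo-client, the form exportfs accepted
    if any(option.startswith("sec=") for option in options):
        return "option"   # sec=krb5 option, the form exportfs rejected
    return "none"


if __name__ == "__main__":
    print(kerberos_style("/home/tdh gss/krb5(rw,fsid=0,sync)"))
```

Running a check like this over /etc/exports before exportfs -rv would at least tell me which style I typed.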

First let's see what happens without Kerberos:

[tdh@sandman ~]> sudo mount -o vers=3 mrx:/home/tdh /mnt/mrx/tdh
[tdh@sandman ~]> ls -la /mnt/mrx/tdh
total 230394
drwxr-xr-x   7 tdh      staff       4096 Feb 12 02:01 .
drwxr-xr-x   3 root     root         512 Feb 12 11:49 ..

And NFSv4:

[tdh@sandman ~]> sudo mount mrx:/home/tdh /mnt/mrx/tdh
nfs mount: mrx:/home/tdh: No such file or directory

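Okay, I knew about this, but forgot it: with fsid=0 on the export, /home/tdh is the root of the NFSv4 namespace, so the client has to ask for mrx:/ instead. I think I heard Bruce complaining about still having it: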

[tdh@sandman ~]> sudo mount mrx:/ /mnt/mrx/tdh
[tdh@sandman ~]> ls -al /mnt/mrx/tdh
total 230394
drwxr-xr-x   7 tdh      nobody      4096 Feb 12 02:01 .
drwxr-xr-x   3 root     root         512 Feb 12 11:49 ..
-rw-------   1 tdh      nobody        68 Feb 12 01:51 .Xauthority
-rw-------   1 tdh      nobody        96 Feb 12 11:31 .lesshst

And now we turn on Kerberos:

[tdh@sandman ~]> sudo mount mrx:/ /mnt/mrx/tdh
NFS compound failed for server mrx: error 7 (RPC: Authentication error)
NFS compound failed for server mrx: error 7 (RPC: Authentication error)
NFS compound failed for server mrx: error 7 (RPC: Authentication error)
nfs mount: mount: /mnt/mrx/tdh: Permission denied

We can be very specific about what security flavor we want to use:

[tdh@sandman ~]> sudo mount -o sec=krb5 mrx:/ /mnt/mrx/tdh
nfs mount: mount: /mnt/mrx/tdh: Permission denied

Note that the compound failure messages must have been about AUTH_NONE, AUTH_SYS, and AUTH_DH.

I think I've found the answer in Mike Eisler's blog post Real Authentication in NFS; scroll down into the comments:

> Also, does NetApp require a root principle like Solaris did prior to 10?

Actually even prior to Solaris 10, the Solaris NFS server would allow
an NFSv3 mount if root didn't have Kerberos credentials. ONTAP is the
same way. However, if using NFSv4, because NFSv4 has no separate mount
protocol, an NFSv4 server cannot distinguish a mount from a LOOKUP. If
a volume is exported with sec=krb5, then the NFSv4 requests need to be
using Kerberos. Since UNIX clients usually require one to be superuser
to do an NFS mount, superuser (root) needs to have credentials. Root
credentials aren't required, but whatever uid the credentials map to
has to have search permissions for the path name.

And we can try that here:

kadmin:  addprinc root
WARNING: no policy specified for root@INTERNAL.EXCFB.COM; defaulting to no policy
Enter password for principal "root@INTERNAL.EXCFB.COM":
Re-enter password for principal "root@INTERNAL.EXCFB.COM":
Principal "root@INTERNAL.EXCFB.COM" created.

And then we grab a ticket:

[tdh@sandman ~]> sudo kinit root
Password for root@INTERNAL.EXCFB.COM:
[tdh@sandman ~]> sudo mount -o sec=krb5 mrx:/ /mnt/mrx/tdh

Aargh!

[tdh@sandman ~]> ls -la /mnt/mrx/tdh
total 230394
drwxr-xr-x   7 tdh      nobody      4096 Feb 12 02:01 .
drwxr-xr-x   3 root     root         512 Feb 12 11:49 ..
-rw-------   1 tdh      nobody        68 Feb 12 01:51 .Xauthority
-rw-------   1 tdh      nobody        96 Feb 12 11:31 .lesshst

Since we can't even get the export shared without Kerberos on mrbill, that does not explain the issue on that machine.

This works:

[tdh@sandman ~]> sudo mount -o vers=3 mrbill:/home/NFS4 /mnt/mrbill/NFS4

And this does not:

[tdh@sandman ~]> sudo mount -o vers=4 mrbill:/ /mnt/mrbill/NFS4
nfs mount: mount: /mnt/mrbill/NFS4: Resource temporarily unavailable

I'll come back to this later...


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily
Installing a Kerberos KDC and setting up NFS mounts

We always seem to have problems at Connectathon setting up Kerberos. So I decided to take the cookbook we use there and get Kerberos working on my home systems. Please note that I could easily clean up the notes to not show some errors I make. But then, where is the love?

Also, as with any first foray into a new tool, I have no clue what I am doing. I kinda understand tickets and the ideas behind Kerberos, but I'm really in the dark as to what I'm supposed to do.

First edit /etc/krb5/krb5.conf:

# diff krb5.conf stock/krb5.conf
35c35
<         default_realm = INTERNAL.EXCFB.COM
---
>         default_realm = ___default_realm___
38,41c38,43
<         INTERNAL.EXCFB.COM = {
<                 kdc = sandman.internal.excfb.com
<                 kdc = ultralord.internal.excfb.com
<                 admin_server = sandman.internal.excfb.com
---
>         ___default_realm___ = {
>                 kdc = ___master_kdc___
>                 kdc = ___slave_kdc1___
>                 kdc = ___slave_kdc2___
>                 kdc = ___slave_kdcN___
>                 admin_server = ___master_kdc___

Then edit /etc/krb5/kdc.conf:

# diff kdc.conf stock/kdc.conf
32c32
<       INTERNAL.EXCFB.COM = {
---
>       ___default_realm___ = {
41,42d40
<               sunw_dbprob_enable = true
<               sunw_dbprop_master_ulogsize = 1000

Make sure you can resolve the KDCs via DNS (or whatever name service you use):

# host sandman
sandman.internal.excfb.com has address 192.168.2.109
# host sandman.internal.excfb.com
sandman.internal.excfb.com has address 192.168.2.109

Create the Kerberos database:

# /usr/sbin/kdb5_util create -r INTERNAL.EXCFB.COM -s
Initializing database '/var/krb5/principal' for realm 'INTERNAL.EXCFB.COM',
master key name 'K/M@INTERNAL.EXCFB.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key:
Re-enter KDC database master key to verify:

Start getting some principals:

# /usr/sbin/kadmin.local
Authenticating as principal root/admin@INTERNAL.EXCFB.COM with password.
kadmin.local:  addprinc tdh/admin
WARNING: no policy specified for tdh/admin@INTERNAL.EXCFB.COM; defaulting to no policy
Enter password for principal "tdh/admin@INTERNAL.EXCFB.COM":
Re-enter password for principal "tdh/admin@INTERNAL.EXCFB.COM":
Principal "tdh/admin@INTERNAL.EXCFB.COM" created.

Get some kiprop installed:

kadmin.local:  addprinc -randkey kiprop/sandman.internal.excfb.com
WARNING: no policy specified for kiprop/sandman.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
add_principal: Principal or policy already exists while creating "kiprop/sandman.internal.excfb.com@INTERNAL.EXCFB.COM".
kadmin.local:  addprinc -randkey kiprop/ultralord.internal.excfb.com
WARNING: no policy specified for kiprop/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "kiprop/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM" created.

Enable kadmin and changepw:

kadmin.local:  ktadd -k /etc/krb5/kadm.keytab kadmin/sandman.internal.excfb.com
Entry for principal kadmin/sandman.internal.excfb.com with kvno 3, encryption type AES-128 CTS mode with 96-bit SHA-1 HMAC added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kadmin/sandman.internal.excfb.com with kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kadmin/sandman.internal.excfb.com with kvno 3, encryption type ArcFour with HMAC/md5 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kadmin/sandman.internal.excfb.com with kvno 3, encryption type DES cbc mode with RSA-MD5 added to keytab WRFILE:/etc/krb5/kadm.keytab.
kadmin.local:  ktadd -k /etc/krb5/kadm.keytab changepw/sandman.internal.excfb.com
Entry for principal changepw/sandman.internal.excfb.com with kvno 3, encryption type AES-128 CTS mode with 96-bit SHA-1 HMAC added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal changepw/sandman.internal.excfb.com with kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal changepw/sandman.internal.excfb.com with kvno 3, encryption type ArcFour with HMAC/md5 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal changepw/sandman.internal.excfb.com with kvno 3, encryption type DES cbc mode with RSA-MD5 added to keytab WRFILE:/etc/krb5/kadm.keytab.

Enable kiprop:

kadmin.local:  ktadd -k /etc/krb5/kadm.keytab kiprop/sandman.internal.excfb.com
Entry for principal kiprop/sandman.internal.excfb.com with kvno 3, encryption type AES-128 CTS mode with 96-bit SHA-1 HMAC added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kiprop/sandman.internal.excfb.com with kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kiprop/sandman.internal.excfb.com with kvno 3, encryption type ArcFour with HMAC/md5 added to keytab WRFILE:/etc/krb5/kadm.keytab.
Entry for principal kiprop/sandman.internal.excfb.com with kvno 3, encryption type DES cbc mode with RSA-MD5 added to keytab WRFILE:/etc/krb5/kadm.keytab.

Quit:

kadmin.local:  quit

Enable the services:

# svcadm enable -r network/security/krb5kdc
# svcadm enable -r network/security/kadmin

Authenticate the admin account:

# /usr/sbin/kadmin -p tdh/admin
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin: Communication failure with server while initializing kadmin interface

Hmm, I got the right password. I can see what happens when it is wrong:

# /usr/sbin/kadmin -p tdh/admin
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin: Incorrect password while initializing kadmin interface

Ahh, let's see if Kerberos is up and running:

# grep kadmin /var/adm/messages
Feb 11 23:31:19 sandman svc.startd[7]: [ID 748625 daemon.error] network/security/kadmin:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Feb 11 23:31:57 sandman kadmin[4143]: [ID 737709 user.error] unable to open connection to ADMIN server (t_error 9)
Feb 11 23:33:56 sandman kadmin[4146]: [ID 737709 user.error] unable to open connection to ADMIN server (t_error 9)

No, it is not.

# svcs -xv
svc:/network/security/kadmin:default (Kerberos administration daemon)
 State: maintenance since Sun Feb 11 23:31:19 2007
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: man -M /usr/share/man -s 1M kadmind
   See: /var/svc/log/network-security-kadmin:default.log
Impact: This service is not running.
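
If you end up staring at this output a lot, a small Python sketch can pull out the state and the log file worth reading. parse_svcs_x and log_files are made-up helpers, and they assume nothing beyond the layout shown above (an FMRI line followed by Key: value lines):

```python
def parse_svcs_x(output):
    """Turn `svcs -xv` text into a list of dicts, one per service."""
    services = []
    for line in output.splitlines():
        if line.startswith("svc:"):
            fmri, _, description = line.partition(" ")
            services.append({"fmri": fmri,
                             "description": description.strip("()"),
                             "see": []})
        elif services and ":" in line:
            key, _, value = line.strip().partition(":")
            if key == "See":
                services[-1]["see"].append(value.strip())
            else:
                services[-1][key.lower()] = value.strip()
    return services


def log_files(service):
    """The See: entries that are plain paths, i.e. the logs to go read."""
    return [entry for entry in service["see"] if entry.startswith("/")]
```

For the kadmin entry above, log_files would hand back the /var/svc/log path, which is where the real complaint ends up.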

Clear the maintenance state:

# svcadm clear /network/security/kadmin:default

Restart:

# svcadm enable -r network/security/kadmin

Check:

# svcs -xv
#

And try again:

# /usr/sbin/kadmin -p tdh/admin
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin: Communication failure with server while initializing kadmin interface

If we look at kadm5.acl:

*/admin@___default_realm___ *

Hmm, touch that up:

*/admin@INTERNAL.EXCFB.COM *

And for sanity:

# grep default *
kdc.conf:[kdcdefaults]
kdc.conf:               default_principal_flags = +preauth
krb5.conf:[libdefaults]
krb5.conf:        default_realm = INTERNAL.EXCFB.COM
krb5.conf:      ___domainname___ = ___default_realm___
krb5.conf:        default = FILE:/var/krb5/kdc.log
krb5.conf:[appdefaults]

Okay, time to fix up krb5.conf as well:

[domain_realm]
        ___domainname___ = INTERNAL.EXCFB.COM
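
The stock Solaris files mark everything you are meant to fill in as ___name___, so rather than grepping for "default" I could look for the placeholders themselves. A rough Python sketch follows; leftover_placeholders is a hypothetical helper, and the ___name___ pattern is simply what I see in these stock files:

```python
import re

# Stock config placeholders look like ___default_realm___ or ___master_kdc___.
PLACEHOLDER = re.compile(r"___\w+?___")


def leftover_placeholders(text):
    """Return (line_number, placeholder) for each ___name___ still in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits
```

Fed the [domain_realm] fragment above, it still reports ___domainname___ on the left-hand side, which I suspect wants a real domain name as well.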

And restart:

# svcadm restart network/security/krb5kdc
# svcadm restart network/security/kadmin

And try again:

# /usr/sbin/kadmin -p tdh/admin
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin: Communication failure with server while initializing kadmin interface

Okay, we know it is talking to something, i.e., it understands a bad password.

Let's try something else:

# kadmin.local
Authenticating as principal root/admin@INTERNAL.EXCFB.COM with password.
kadmin.local:  addprinc admin/admin@INTERNAL.EXCFB.COM
WARNING: no policy specified for admin/admin@INTERNAL.EXCFB.COM; defaulting to no policy
Enter password for principal "admin/admin@INTERNAL.EXCFB.COM":
Re-enter password for principal "admin/admin@INTERNAL.EXCFB.COM":
Principal "admin/admin@INTERNAL.EXCFB.COM" created.
kadmin.local:  quit

Okay, time to search. If we look at System Administration Guide: Security Services:

Communication failure with server while initializing kadmin interface

    Cause: The host that was entered for the admin server, also called the master KDC,
    did not have the kadmind daemon running.

    Solution: Make sure that you specified the correct host name for the master KDC.
    If you specified the correct host name, make sure that kadmind is running on
    the master KDC that you specified.

But wait:

# svcs | grep krb
online         23:43:04 svc:/network/security/krb5kdc:default
# svcs | grep kad
maintenance    23:42:54 svc:/network/security/kadmin:default
# svcs -vx
svc:/network/security/kadmin:default (Kerberos administration daemon)
 State: maintenance since Sun Feb 11 23:42:54 2007
Reason: Restarting too quickly.
   See: http://sun.com/msg/SMF-8000-L5
   See: man -M /usr/share/man -s 1M kadmind
   See: /var/svc/log/network-security-kadmin:default.log
Impact: This service is not running.

Let's look at the log file:

Feb 11 23:42:53 sandman kadmind[4275](Error): Keytab file "/etc/krb5/kadm5.keytab" does not exist
Feb 11 23:42:53 sandman kadmind[4275](Error): Keytab file "/etc/krb5/kadm5.keytab" does not exist
Feb 11 23:42:53 sandman kadmind[4275](info): No dictionary file specified, continuing without one.
Feb 11 23:42:53 sandman kadmind[4275](Error): Unable to set RPCSEC_GSS service names ('kadmin@sandman.internal.excfb.com,changepw@sandman.internal.excfb.com')
krb5kdc: Interrupted system call - while selecting for network input(1)
Feb 11 23:43:03 sandman krb5kdc[4105](info): shutting down
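
The interesting bit is buried in the repeats. A small Python sketch that fishes the missing keytab path out of log text like the above; missing_keytabs is my own helper name, and the regular expression is only matched against the message format seen in this log:

```python
import re

MISSING_KEYTAB = re.compile(r'Keytab file "([^"]+)" does not exist')


def missing_keytabs(log_text):
    """Return the distinct keytab paths a log reports as missing, in order."""
    seen = []
    for match in MISSING_KEYTAB.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen
```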

Hmm, we need to create a keytab:

# ls -la /etc/krb5/kadm5.keytab
/etc/krb5/kadm5.keytab: No such file or directory

Ack, why do I have a kadm.keytab and not a kadm5.keytab?

# mv kadm.keytab kadm5.keytab

Because that is what I frigging entered in my session!

# /usr/sbin/kadmin -p tdh/admin
Authenticating as principal tdh/admin with password.
Password for tdh/admin@INTERNAL.EXCFB.COM:
kadmin:

The correct incantations should have been:

kadmin.local:  ktadd -k /etc/krb5/kadm5.keytab kadmin/sandman.internal.excfb.com
kadmin.local:  ktadd -k /etc/krb5/kadm5.keytab changepw/sandman.internal.excfb.com
kadmin.local:  ktadd -k /etc/krb5/kadm5.keytab kiprop/sandman.internal.excfb.com
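
To keep from doing that again, here is a hedged Python sketch that reads the ktadd commands from a session and reports which principals went into the wrong keytab. ktadd_keytabs and wrong_keytab are hypothetical helpers; they only understand the -k flag as used above and take the last word of the command as the principal:

```python
import shlex


def ktadd_keytabs(commands):
    """Map each ktadd'd principal to the keytab given with -k (None = default)."""
    result = {}
    for command in commands:
        words = shlex.split(command)
        if not words or words[0] != "ktadd":
            continue
        keytab = None
        if "-k" in words:
            keytab = words[words.index("-k") + 1]
        result[words[-1]] = keytab
    return result


def wrong_keytab(commands, expected):
    """Principals whose ktadd did not target the expected keytab path."""
    return sorted(principal
                  for principal, keytab in ktadd_keytabs(commands).items()
                  if keytab != expected)
```

With my original session and expected set to /etc/krb5/kadm5.keytab, every one of the kadmin, changepw, and kiprop entries would have been flagged.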

Okay, back to our regularly scheduled programming:

What principals exist?

kadmin:  listprincs
K/M@INTERNAL.EXCFB.COM
admin/admin@INTERNAL.EXCFB.COM
changepw/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
kadmin/changepw@INTERNAL.EXCFB.COM
kadmin/history@INTERNAL.EXCFB.COM
kadmin/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
kiprop/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
kiprop/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
krbtgt/INTERNAL.EXCFB.COM@INTERNAL.EXCFB.COM
tdh/admin@INTERNAL.EXCFB.COM

To kerberize NFS, we need to touch up /etc/nfssec.conf:

# diff nfssec.conf nfssec.conf.stock
48,50c48,50
< krb5          390003  kerberos_v5     default -               # RPCSEC_GSS
< krb5i         390004  kerberos_v5     default integrity       # RPCSEC_GSS
< krb5p         390005  kerberos_v5     default privacy         # RPCSEC_GSS
---
> #krb5         390003  kerberos_v5     default -               # RPCSEC_GSS
> #krb5i                390004  kerberos_v5     default integrity       # RPCSEC_GSS
> #krb5p                390005  kerberos_v5     default privacy         # RPCSEC_GSS
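
The edit is nothing more than uncommenting three lines. As a sketch of the same change in Python (enable_krb5_flavors is my own name for it, and it assumes the stock layout of a flavor name followed by its number):

```python
def enable_krb5_flavors(text):
    """Uncomment the krb5, krb5i and krb5p entries in nfssec.conf text."""
    wanted = {"krb5", "krb5i", "krb5p"}
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            candidate = stripped[1:].lstrip()
            fields = candidate.split()
            # Only touch real entries: flavor name followed by its number.
            if len(fields) >= 2 and fields[0] in wanted and fields[1].isdigit():
                line = candidate
        out.append(line)
    return "\n".join(out) + "\n"
```

Ordinary comments that merely mention krb5 are left alone, since they have no flavor number in the second column.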

We need to add a nfs principal:

kadmin:  addprinc -randkey nfs/sandman.internal.excfb.com
WARNING: no policy specified for nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM" created.
kadmin:  ktadd nfs/sandman.internal.excfb.com
Entry for principal nfs/sandman.internal.excfb.com with kvno 3, encryption type AES-128 CTS mode with 96-bit SHA-1 HMAC added to keytab WRFILE:/etc/krb5/krb5.keytab.
Entry for principal nfs/sandman.internal.excfb.com with kvno 3, encryption type Triple DES cbc mode with HMAC/sha1 added to keytab WRFILE:/etc/krb5/krb5.keytab.
Entry for principal nfs/sandman.internal.excfb.com with kvno 3, encryption type ArcFour with HMAC/md5 added to keytab WRFILE:/etc/krb5/krb5.keytab.
Entry for principal nfs/sandman.internal.excfb.com with kvno 3, encryption type DES cbc mode with RSA-MD5 added to keytab WRFILE:/etc/krb5/krb5.keytab.

Verify that it does indeed exist:

# klist -k
Keytab name: FILE:/etc/krb5/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   3 nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
   3 nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
   3 nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM
   3 nfs/sandman.internal.excfb.com@INTERNAL.EXCFB.COM

And now we are going to have to make a share that is kerberized and set up a client to access it:

# /usr/sbin/kclient

Starting client setup

---------------------------------------------------
Do you want to use DNS for kerberos lookups ? [y/n]: n
        No action performed.
Enter the Kerberos realm: INTERNAL.EXCFB.COM
Specify the KDC hostname for the above realm: sandman.internal.excfb.com
sandman.internal.excfb.com

Note, this system and the KDC's time must be within 5 minutes of each other for Kerberos to function.  Both systems should run some form of time
 synchronization system like Network Time Protocol (NTP).

Setting up /etc/krb5/krb5.conf.

Enter the krb5 administrative principal to be used: tdh/admin
Obtaining TGT for tdh/admin ...
Password for tdh/admin@INTERNAL.EXCFB.COM:

Do you have multiple DNS domains spanning the Kerberos realm INTERNAL.EXCFB.COM ? [y/n]: n
        No action performed.

Do you plan on doing Kerberized nfs ? [y/n]: y

nfs/ultralord.internal.excfb.com entry ADDED to KDC database.
nfs/ultralord.internal.excfb.com entry ADDED to keytab.

host/ultralord.internal.excfb.com entry ADDED to KDC database.
host/ultralord.internal.excfb.com entry ADDED to keytab.

Do you want to copy over the master krb5.conf file ? [y/n]: y
Enter the pathname of the file to be copied: /etc/krb5/krb5.conf
cp: /etc/krb5/krb5.conf and /etc/krb5/krb5.conf are identical

Copy of /etc/krb5/krb5.conf failed, exiting.
---------------------------------------------------
Setup FAILED.

Hmm, how are we supposed to enter that? I bet we need to use /net. Which I don't have configured right now. Okay, the hard way:

# scp sandman:/etc/krb5/krb5.conf /etc/krb5/krb5.conf

Now, let's set up a test share:

# cd /export
# mkdir kerberos
# cd kerberos
# touch see_me
# chown tdh:staff see_me
# ls -la
total 4
drwxr-xr-x   2 root     root         512 Feb 12 00:23 .
drwxr-xr-x   4 root     sys          512 Feb 12 00:23 ..
-rw-r--r--   1 tdh      staff          0 Feb 12 00:23 see_me
# share -F nfs -o sec=krb5:krb5i:krb5p -d "Kerberos" /export/kerberos
# share -F nfs -d "Home dirs" /export/home
# share
-               /export/kerberos   sec=krb5,sec=krb5i,sec=krb5p   "Kerberos"
-               /export/home   rw   "Home dirs"

Now try to get some access:

[tdh@ultralord ~]> kinit
kinit(v5): Client not found in Kerberos database while getting initial credentials
[tdh@ultralord ~]> sudo klist -k
Keytab name: FILE:/etc/krb5/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   4 nfs/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 nfs/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 nfs/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 nfs/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 host/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 host/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 host/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
   4 host/ultralord.internal.excfb.com@INTERNAL.EXCFB.COM
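
Each principal shows up four times, presumably one key per encryption type, just like the four ktadd lines for nfs/sandman earlier. A minimal Python sketch to summarize `klist -k` output instead of counting rows by eye; keytab_entries is a made-up helper and it only handles the two-column layout shown here:

```python
from collections import Counter


def keytab_entries(klist_output):
    """Count keytab entries per (kvno, principal) in `klist -k` output."""
    counts = Counter()
    for line in klist_output.splitlines():
        fields = line.split()
        # Data rows are "<kvno> <principal@REALM>"; headers fail this test.
        if len(fields) == 2 and fields[0].isdigit() and "@" in fields[1]:
            counts[(int(fields[0]), fields[1])] += 1
    return counts
```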

Okay, I think I need to add user principals for tdh:

kadmin:  addprinc tdh
WARNING: no policy specified for tdh@INTERNAL.EXCFB.COM; defaulting to no policy
Enter password for principal "tdh@INTERNAL.EXCFB.COM":
Re-enter password for principal "tdh@INTERNAL.EXCFB.COM":
Principal "tdh@INTERNAL.EXCFB.COM" created.

[tdh@ultralord ~]> kinit
Password for tdh@INTERNAL.EXCFB.COM:

And now I want to get a mount:

[tdh@ultralord ~]> sudo mkdir -p /mnt/sandman/home
[tdh@ultralord ~]> sudo mkdir -p /mnt/sandman/kerberos
[tdh@ultralord ~]> sudo showmount -e sandman
export list for sandman:
/export/kerberos (everyone)
/export/home     (everyone)
[tdh@ultralord ~]> sudo mount sandman:/export/kerberos /mnt/sandman/kerberos
[tdh@ultralord ~]> sudo mount sandman:/export/home /mnt/sandman/home
[tdh@ultralord ~]> ls -al /mnt/sandman/kerberos
total 4
drwxr-xr-x   2 root     root         512 Feb 12 00:23 .
drwxr-xr-x   4 root     root         512 Feb 12 00:36 ..
-rw-r--r--   1 tdh      staff          0 Feb 12 00:23 see_me
[tdh@ultralord ~]> ls -la /mnt/sandman/home
total 22
drwxr-xr-x   4 root     root         512 Dec 30 15:01 .
drwxr-xr-x   4 root     root         512 Feb 12 00:36 ..
drwx------   2 root     root        8192 Dec 20 11:28 lost+found
drwxr-xr-x   4 tdh      staff        512 Jan 21 20:48 tdh

Success!

But wait, we need to show that a client without kerberos enabled will be denied access to sandman:/export/kerberos:

[tdh@kanigix ~]> sudo mkdir -p /mnt/sandman/home
[tdh@kanigix ~]> sudo mkdir -p /mnt/sandman/kerberos
[tdh@kanigix ~]> sudo mount sandman:/export/kerberos /mnt/sandman/kerberos
nfs mount: mount: /mnt/sandman/kerberos: Permission denied

Some other things to do would be to set up /etc/pam.conf to allow single sign-on - i.e., use ssh without a password. We also need to set up ultralord as a slave.

But before I tune this out, we need to get a Linux client up and running. Why? Because we need to show we can interoperate.

Some systems only support single DES, so we need to create special keytabs for them:

kadmin:  addprinc -randkey nfs/mrx.internal.excfb.com
WARNING: no policy specified for nfs/mrx.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "nfs/mrx.internal.excfb.com@INTERNAL.EXCFB.COM" created.
kadmin:  addprinc -randkey host/mrx.internal.excfb.com
WARNING: no policy specified for host/mrx.internal.excfb.com@INTERNAL.EXCFB.COM; defaulting to no policy
Principal "host/mrx.internal.excfb.com@INTERNAL.EXCFB.COM" created.

Now, I've created /export/keytabs to store the keytab files we will need:

# cd /export
# mkdir keytabs
# share -F nfs -o ro /export/keytabs

And we can create the keytab:

kadmin:  ktadd -k /export/keytabs/mrx.keytab -e des-cbc-crc:normal nfs/mrx.internal.excfb.com
Entry for principal nfs/mrx.internal.excfb.com with kvno 3, encryption type DES cbc mode with CRC-32 added to keytab WRFILE:/export/keytabs/mrx.keytab.
kadmin:  ktadd -k /export/keytabs/mrx.keytab -e des-cbc-crc:normal host/mrx.internal.excfb.com
Entry for principal host/mrx.internal.excfb.com with kvno 3, encryption type DES cbc mode with CRC-32 added to keytab WRFILE:/export/keytabs/mrx.keytab.

We see we are in business:

# cp /etc/krb5/krb5.conf /export/keytabs/
# ls -la
total 10
drwxr-xr-x   2 root     root         512 Feb 12 00:50 .
drwxr-xr-x   5 root     sys          512 Feb 12 00:46 ..
-rw-r--r--   1 root     root        1968 Feb 12 00:50 krb5.conf
-rw-------   1 root     root         155 Feb 12 00:48 mrx.keytab
# chmod +r mrx.keytab

And now we set up the Linux machine:

[root@mrx ~]# mkdir -p /mnt/sandman/keytabs
[root@mrx ~]# showmount -e sandman
Export list for sandman:
/export/kerberos (everyone)
/export/home     (everyone)
/export/keytabs  (everyone)
[root@mrx ~]# mount sandman:/export/keytabs /mnt/sandman/keytabs

We should make sure we do not have access to sandman:/export/kerberos:

[root@mrx ~]# mkdir -p /mnt/sandman/kerberos
[root@mrx ~]# mkdir -p /mnt/sandman/home
[root@mrx ~]# mount sandman:/export/kerberos /mnt/sandman/kerberos
mount: sandman:/export/kerberos failed, security flavor not supported

What do we need to change?

[root@mrx ~]# cd /etc
[root@mrx etc]# ls -la k*
-rw-r--r-- 1 root root  657 Jan  9 14:03 krb5.conf
-rw-r--r-- 1 root root 2241 Jul 13  2006 krb.conf
-rw-r--r-- 1 root root 1296 Jul 13  2006 krb.realms
[root@mrx etc]# mkdir stock
[root@mrx etc]# cp k* stock
[root@mrx etc]# cp /mnt/sandman/keytabs/krb5.conf .
cp: overwrite `./krb5.conf'? y
[root@mrx etc]# cp /mnt/sandman/keytabs/mrx.keytab krb5.keytab

And we try to authenticate:

[tdh@mrx ~]> kinit
kinit: Command not found.

Okay, we need to install the kerberos packages:

[tdh@mrx /]> sudo yum install krb5-workstation
Loading "installonlyn" plugin
Setting up Install Process
Setting up repositories
Reading repository metadata in from local files
Parsing package install arguments
Nothing to do

No, we don't. Where is that rascally rabbit?

[tdh@mrx /]> sudo find . -name kinit
./usr/kerberos/bin/kinit
[tdh@mrx /]> ./usr/kerberos/bin/kinit
Password for tdh@INTERNAL.EXCFB.COM:

And we try the mount:

[tdh@mrx /]> sudo mount sandman:/export/kerberos /mnt/sandman/kerberos
mount: sandman:/export/kerberos failed, security flavor not supported
[tdh@mrx /]> ./usr/kerberos/bin/klist
Ticket cache: FILE:/tmp/krb5cc_1066
Default principal: tdh@INTERNAL.EXCFB.COM

Valid starting     Expires            Service principal
02/12/07 01:01:42  02/12/07 09:01:42  krbtgt/INTERNAL.EXCFB.COM@INTERNAL.EXCFB.COM
        renew until 02/13/07 00:59:17



Kerberos 4 ticket cache: /tmp/tkt1066
klist: You have no tickets cached

What is up here?

# snoop -x 0,2000 -o /tmp/m2s.snoop sandman mrx
Using device /dev/hme (promiscuous mode)
33 ^C

Note: I used -x 0,2000 to get payload data. I knew I would want to look at most of the packet.

And

[tdh@mrx ~]> sudo mount -t nfs4 sandman:/export/kerberos /mnt/sandman/kerberos
mount.nfs4: Operation not permitted

 26   0.00034 mrx.internal.excfb.com -> sandman      NFS C 4 () PUTFH FH=324D LOOKUP export GETFH GETATTR 10011a 30a23a
 27   0.00030      sandman -> mrx.internal.excfb.com NFS R 4 () NFS4_OK PUTFH NFS4_OK LOOKUP NFS4_OK GETFH NFS4_OK FH=30E6 GETATTR NFS4_OK
 28   0.00033 mrx.internal.excfb.com -> sandman      NFS C 4 () PUTFH FH=30E6 LOOKUP kerberos GETFH GETATTR 10011a 30a23a
 29   0.00021      sandman -> mrx.internal.excfb.com NFS R 4 () NFS4ERR_WRONGSEC PUTFH NFS4_OK LOOKUP NFS4ERR_WRONGSEC

I popped into Wireshark and found that mrx is only sending AUTH_SYS and AUTH_NULL.

Note: I used Wireshark because it will parse the payload data for me. I didn't want to be doing byte conversions and consulting some specs!
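
The snoop summary lines are regular enough to pick apart on their own. Here is a rough Python sketch that splits an NFSv4 reply summary into the overall status and the per-operation results, so the failing operation stands out. parse_reply and first_failure are hypothetical helpers, written only against the reply format in the trace above:

```python
def parse_reply(summary):
    """Split a snoop NFSv4 reply summary into (status, [(op, status), ...])."""
    tokens = summary.split("()", 1)[1].split()
    tokens = [token for token in tokens if "=" not in token]  # drop FH=... details
    overall, rest = tokens[0], tokens[1:]
    ops = list(zip(rest[0::2], rest[1::2]))
    return overall, ops


def first_failure(summary):
    """Return the first (op, status) that is not NFS4_OK, or None."""
    _, ops = parse_reply(summary)
    for op, status in ops:
        if status != "NFS4_OK":
            return op, status
    return None
```

On packet 29 this points at the LOOKUP of kerberos coming back NFS4ERR_WRONGSEC, while the earlier LOOKUP of export in packet 27 is clean.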

The write-up NetApp Filer, NFSv4, and Linux uses -o sec=krb5. We can try that:

[tdh@mrx ~]> sudo mount -t nfs4 -o sec=krb5 sandman:/export/kerberos /mnt/sandman/kerberos
Warning: rpc.gssd appears not to be running.
mount.nfs4: Invalid argument

Which is strange, since it is running:

[tdh@mrx ~]> sudo chkconfig --list | grep rpcgssd
rpcgssd         0:off   1:off   2:off   3:on    4:on    5:on    6:off
[tdh@mrx ~]> sudo chkconfig --list | grep rpcidmapd
rpcidmapd       0:off   1:off   2:off   3:on    4:on    5:on    6:off

What does the log state?

RPC: Couldn't create auth handle (flavor 390003)

I've copied the stock krb5.conf back and now the diffs are:

[tdh@mrx /etc]> diff krb5.conf stock/krb5.conf
7c7
<  default_realm = INTERNAL.EXCFB.COM
---
>  default_realm = EXAMPLE.COM
14,17c14,17
<  INTERNAL.EXCFB.COM = {
<   kdc = sandman.internal.excfb.com:88
<   admin_server = sandman.internal.excfb.com:749
<   default_domain = internal.excfb.com
---
>  EXAMPLE.COM = {
>   kdc = kerberos.example.com:88
>   admin_server = kerberos.example.com:749
>   default_domain = example.com
21,22c21,22
<  .internal.excfb.com = INTERNAL.EXCFB.COM
<  internal.excfb.com = INTERNAL.EXCFB.COM
---
>  .example.com = EXAMPLE.COM
>  example.com = EXAMPLE.COM

You know what, rpc.gssd is not running!

[tdh@mrx /etc]> ps -ef | grep rpc
rpc       1877     1  0 01:49 ?        00:00:00 portmap
root      1898     1  0 01:49 ?        00:00:00 rpc.statd
root      1931     1  0 01:49 ?        00:00:00 rpc.idmapd
tdh       2697  2519  0 02:04 pts/0    00:00:00 grep rpc
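
The lesson is that chkconfig only says a service is enabled for a runlevel, not that it is alive. A minimal Python sketch that answers the second question from `ps -ef` text; running_commands is my own helper and it assumes the usual eight-column Linux layout shown above:

```python
def running_commands(ps_output):
    """Return the set of command names (no arguments) seen in `ps -ef` text."""
    names = set()
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD: the command is the 8th field.
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[1].isdigit():
            names.add(fields[7].split()[0])
    return names
```

Checking for "rpc.gssd" in that set would have caught this before I went digging through krb5.conf.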

[tdh@mrx /etc]> sudo sh -c "ulimit -c unlimited;/usr/sbin/rpc.gssd -f -vvv"
Using keytab file '/etc/krb5.keytab'
Processing keytab entry for principal 'nfs/mrx.internal.excfb.com@INTERNAL.EXCFB.COM'
We will use this entry (nfs/mrx.internal.excfb.com@INTERNAL.EXCFB.COM)
Processing keytab entry for principal 'host/mrx.internal.excfb.com@INTERNAL.EXCFB.COM'
We will NOT use this entry (host/mrx.internal.excfb.com@INTERNAL.EXCFB.COM)
Using (machine) credentials cache: 'MEMORY:/tmp/krb5cc_machine_INTERNAL.EXCFB.COM'

And I put it in the background. Hmm, why doesn't it like the host entry?

Alright, I went back to why rpc.gssd isn't starting up at boot. The startup script has:

[ -f /etc/sysconfig/nfs ] && . /etc/sysconfig/nfs
[ "${SECURE_NFS}" != "yes" ] && exit 0

# ls -la /etc/sysconfig/nfs
#

Time to create it (see Learning NFSv4 with Fedora Core 2 (Linux 2.6.5 kernel)):

# This entry should be "yes" if you are using RPCSEC_GSS_KRB5 (auth=krb5,krb5i, or krb5p)
SECURE_NFS="yes"
# This entry sets the number of NFS server processes.  8 is the default
RPCNFSDCOUNT=8

[tdh@mrx sysconfig]> sudo /etc/init.d/rpcgssd start
Starting RPC gssd:                                         [  OK  ]
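
The gate in the startup script is simple: no SECURE_NFS="yes" in /etc/sysconfig/nfs, no rpc.gssd. A hedged Python sketch of the same check, useful for eyeballing a box before mounting; shell_assignments and gssd_would_start are hypothetical helpers that only understand plain NAME=value lines, not arbitrary shell:

```python
import shlex


def shell_assignments(text):
    """Parse simple NAME=value lines, ignoring comments and blank lines."""
    values = {}
    for line in text.splitlines():
        words = shlex.split(line, comments=True)
        if len(words) == 1 and "=" in words[0]:
            name, _, value = words[0].partition("=")
            values[name] = value
    return values


def gssd_would_start(sysconfig_text):
    """Mirror the script's test: SECURE_NFS must be exactly "yes"."""
    return shell_assignments(sysconfig_text).get("SECURE_NFS") == "yes"
```

An empty or missing file behaves like my original setup: the function returns False and the daemon never starts.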

God I'm totally hacked about this:

[tdh@mrx sysconfig]> sudo mount -o sec=krb5 sandman:/export/kerberos /mnt/sandman/kerberos
[tdh@mrx sysconfig]> ls -la /mnt/sandman/kerberos
total 5
drwxr-xr-x 2 root root   512 Feb 12 00:23 .
drwxr-xr-x 5 root root  4096 Feb 12 00:49 ..
-rw-r--r-- 1 tdh  wheel    0 Feb 12 00:23 see_me
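
A listing that works does not by itself prove which security flavor the mount ended up with. Here is a sketch of one way to look, assuming the kernel reports a sec= option in /proc/mounts (not every kernel does) and using a made-up function name, mount_sec:

```shell
# Hypothetical helper: reads /proc/mounts style lines on stdin and prints
# the value of any sec= option for the mount point given as $1.
mount_sec() {
    awk -v mp="$1" '
        $2 == mp {                            # field 2 is the mount point
            n = split($4, opt, ",")           # field 4 is the option list
            for (i = 1; i <= n; i++)
                if (opt[i] ~ /^sec=/) print substr(opt[i], 5)
        }
    '
}
```

It would be run as `mount_sec /mnt/sandman/kerberos < /proc/mounts`; no output means the option was not reported.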

Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20070211 Sunday February 11, 2007
Posted slides for Connectathon 2007

I've started posting the slides for Connectathon 2007: Talks 2007. As I get the remaining slides, I'll add them there.



20070207 Wednesday February 07, 2007
You know you've been at a convention too long when ...

Went to the local Starbucks here at Connectathon 2007. The guy looked up and said "Awake, right?" The guy I was with was floored. I told him, what is so hard - I'm 6'5", currently sporting a handlebar, and always wearing a Green Lantern hoodie.

The event is going along fine. The main problem is that NFSv3 is too solid and the NFSv4 implementations are also getting that way. The NFSv4.1 stuff is really still in the design phase. But developers are getting small victories when they either get code to compile or even run against other vendors. I think that Connectathon 2008 will be more frantic and the victories will be larger.


Ben Rockwood's talk at Connectathon 2007

Ben's talk was interesting - the take home point I got was that sysadmins are not dumb and they can create unique architectures off of the building blocks you provide them. The more tools you can give them (dtrace and source code), the more they can do. That might not have been what others took home. I can't help that.

One of the things I had a problem with when I was a sysadmin was in talking to developers who would discount my ideas. I had one discount my suggestions about a new command syntax. The product has been deployed for over 5 years and everyone probably uses that syntax without thinking. My way wasn't necessarily better, just a different way of approaching the syntax. What was frustrating then though (and still to this day with other products) was the fact that the engineer who didn't have to administer the box didn't want to listen to the guy who did.

I'm back to wearing my developer hat, but I still try to listen to the sysadmins. I made a recent decision with the In Kernel Sharetab to use a symlink to solve a problem I could have coded over. I decided to scrap that idea, not because of a design review, but because I finally listened to that sysadmin in my head who told me the symlink would be a pain to work with.

So for me, I liked listening to Ben tell developers how he deploys their products and makes money doing so. He came in, said he was nervous and explained how his wife had told him that was silly. He said he told her it was like going to 3M to give a presentation on sticky notes - the audience laughed. I told him after the talk the reason why he got invited to 3M was because he was doing things with the sticky notes that 3M couldn't envision.

I.e., the innovation of sticky notes was in the past. In order for 3M to make more money, they needed to go outside their safe idea of what they thought people could do with sticky notes.

The thing which really seemed to spark the most debate (and which I started) was when Ben claimed in a room full of protocol developers that NFSv4 was too risky compared to NFSv3. Yet this was right after he said he was using ZFS and not UFS. What he wasn't articulating very well was that they went to ZFS for feature sets that they could exploit to sell to customers. The provisioning and manageability of ZFS far outweighed the stability of UFS.

In comparing NFSv4 with NFSv3, his company did not find an overwhelming need for NFSv4 with respect to their business model. In other words, NFSv3 is sufficient for their customers' needs. Another business might find that NFSv3 is not sufficient for its customers' needs.

The other point he made was that they needed the replication (and to some extent migration) that NFSv4.1 was going to provide. This message was well received. I think it gives developers here ammo to take back to their management trees.



20070206 Tuesday February 06, 2007
Sharemgr and In Kernel Sharetab presentation at Connectathon 2007

I found out late Sunday that my presentation was Monday instead of Tuesday. Not a problem! I was working off of a set of slides that Doug McCallum had put together for an internal presentation on just the sharemgr work. I tied it together with my work by looking at a case study on unshareall. You can read all about it here: The Management of Shares.

What was really interesting for me was the contrast between what I presented and what was presented before me. The discussion before mine was a heated debate about the state of pNFS and NFSv4.1. This stuff is in the early design phase. By that I mean there are prototypes which interoperate to a degree, but the spec is still changing.

Anyway, my presentation was not on that technical level. And I felt a little bit of indifference to what I was talking about. I presented a very simple problem, one where the design made perfect sense when it was drafted. And I talked about how we are keeping the spirit of the design intact and fixing what will be a performance issue.

I felt better after the presentation when two different people approached me and told me how they had similar issues facing them. They were interested in the approaches Doug and I took to solve our problems. One of them even took an OpenSolaris starter kit in order to look at how Doug solved his management problems. In short, this was a big win for OpenSolaris. By the way, I had plenty of people asking me for starter kits once they knew I had them.

The other thing which came out of my presentation was that it was related to the one I gave last year: Scaling NFS Services. In that one, I looked at what outside of the server can cause issues (think processor farms) and in this one, I looked at how we can fix some of the problems caused by scalability.



20070202 Friday February 02, 2007
Kerbilicous security for ZFS

We've decided that everything at Connectathon has to be secured by Kerberos - we want the additional testing that we can get. It wasn't clear to me how to invoke a complex share command on a zfs filesystem. In particular, I couldn't find an example which set a security style or had multiple options. So here is what I did.

First I prepare some areas. Note that I'm pretty explicit about what is available.

# zfs create zoo/home/krb5
# zfs create zoo/home/all
# zfs create zoo/home/krb5i
# zfs create zoo/home/krb5p
# zfs create zoo/home/sys
# zfs create zoo/home/krb

And now we let zfs know what we want:

# zfs set sharenfs="sec=krb5:krb5i:krb5p:sys,rw" zoo/home/all
# zfs set sharenfs="sec=krb5:krb5i:krb5p,rw" zoo/home/krb
# zfs set sharenfs="sec=krb5i,rw" zoo/home/krb5i
# zfs set sharenfs="sec=krb5p,rw" zoo/home/krb5p
# zfs set sharenfs="sec=krb5,rw" zoo/home/krb5

And to check the properties:

# zfs list -o name,sharenfs
NAME               SHARENFS
zoo                off
zoo/home           on
zoo/home/all       sec=krb5:krb5i:krb5p:sys,rw
zoo/home/krb       sec=krb5:krb5i:krb5p,rw
zoo/home/krb5      sec=krb5,rw
zoo/home/krb5i     sec=krb5i,rw
zoo/home/krb5p     sec=krb5p,rw
zoo/home/nfsv2     on
zoo/home/nfsv3     on
zoo/home/nfsv4     on
zoo/home/sys       on
zoo/home/tdh       on
zoo/ws             off

And since I am testing my bits for the In Kernel Sharetab:

# cat /system/dfs/sharetab
/export/zfs/tdh -       nfs     rw
/export/zfs/krb5p       -       nfs     sec=krb5p,rw
/export/zfs/nfsv4       -       nfs     rw
/export/zfs/nfsv2       -       nfs     rw
/export/zfs/krb5i       -       nfs     sec=krb5i,rw
/export/zfs/krb5        -       nfs     sec=krb5,rw
/export/zfs/sys -       nfs     rw
/export/zfs/nfsv3       -       nfs     rw
/export/zfs/all -       nfs     sec=krb5,rw,sec=krb5i,rw,sec=krb5p,rw,sec=sys,rw
/export/zfs     -       nfs     rw
/export/zfs/krb -       nfs     sec=krb5,rw,sec=krb5i,rw,sec=krb5p,rw

Hmm, I think those entries should be compacted.
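
To show what "compacted" could look like, here is a sketch that folds the expanded form back into the colon-separated form used on the zfs command line. It is only an illustration, not the sharetab code: the function name compact_share_opts is made up, and it deliberately leaves the string alone unless every sec= flavor carries identical options.

```shell
# Hypothetical helper: collapse "sec=A,opts,sec=B,opts,..." into
# "sec=A:B:...,opts" when every flavor has the same options;
# otherwise print the input unchanged.
compact_share_opts() {
    printf '%s\n' "$1" | awk '
    {
        n = split($0, tok, ",")
        nflav = 0
        for (i = 1; i <= n; i++) {
            if (tok[i] ~ /^sec=/) {
                nflav++
                flav[nflav] = substr(tok[i], 5)   # text after "sec="
                opts[nflav] = ""
            } else if (nflav == 0) {
                print $0; exit                    # options before any sec=
            } else {
                opts[nflav] = opts[nflav] "," tok[i]
            }
        }
        if (nflav == 0) { print $0; exit }
        list = flav[1]
        for (i = 2; i <= nflav; i++) {
            if (opts[i] != opts[1]) { print $0; exit }   # options differ
            list = list ":" flav[i]
        }
        print "sec=" list opts[1]
    }'
}
```

Feeding it the /export/zfs/krb options, `compact_share_opts 'sec=krb5,rw,sec=krb5i,rw,sec=krb5p,rw'`, gives back the string that was originally handed to zfs set.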

By the way, if there is no sec=, then the default is sys.
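
That default can be folded into a small sketch that reports the flavors for any option string from the sharetab. The function name share_flavors is hypothetical and the parsing is an assumption based only on the entries shown above:

```shell
# Hypothetical helper: print the security flavors named in an option
# string, one per line; a string with no sec= is treated as sec=sys.
share_flavors() {
    printf '%s\n' "$1" | awk '
    {
        n = split($0, tok, ",")
        seen = 0
        for (i = 1; i <= n; i++) {
            if (tok[i] ~ /^sec=/) {
                seen = 1
                m = split(substr(tok[i], 5), f, ":")   # sec=a:b:c form
                for (j = 1; j <= m; j++) print f[j]
            }
        }
        if (!seen) print "sys"                         # no sec= means sys
    }'
}
```

So `share_flavors rw` prints sys, while `share_flavors 'sec=krb5i,rw'` prints krb5i.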


Converting a UFS slice to ZFS

When I installed a machine, instead of figuring out how to leave a lot of space in a slice, I went ahead and made a slice to be mounted as /zfs. I knew that I wanted to be able to reuse that space later for zfs. When I went to create a pool, this is what I did:

First I found the slice number:

# df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1d0s0         20G   6.8G    13G    36%    /
/devices                 0K     0K     0K     0%    /devices
/dev                     0K     0K     0K     0%    /dev
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                    10G   788K    10G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/usr/lib/libc/libc_hwcap2.so.1
                        20G   6.8G    13G    36%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                    10G    52K    10G     1%    /tmp
swap                    10G    32K    10G     1%    /var/run
/dev/dsk/c1d0s3         20G   487M    19G     3%    /altroot
/dev/dsk/c1d0s5        163G    64M   161G     1%    /zfs
/dev/dsk/c1d0s7         20G    20M    19G     1%    /export/home
/dev/dsk/c0t0d0s2      3.6G   3.6G     0K   100%    /media/CDROM
/dev/lofi/1            467M   467M     0K   100%    /isos/mnt/companion
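
Rather than eyeballing the table, a sketch like the following pulls the device out for a given mount point. The function name device_for_mount is made up, and it assumes the six-column layout shown above; wrapped entries (like the libc one) are simply skipped.

```shell
# Hypothetical helper: reads df output on stdin and prints the device
# whose "Mounted on" column equals $1. Only complete six-column lines
# are considered, so entries wrapped onto two lines are ignored.
device_for_mount() {
    awk -v mp="$1" 'NF == 6 && $NF == mp { print $1 }'
}
```

Used as `df -h | device_for_mount /zfs`, which for the table above prints /dev/dsk/c1d0s5.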

Next I took the UFS filesystem off the system and out of /etc/vfstab:

# umount /zfs
# vi /etc/vfstab
...
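
I did the edit by hand in vi. As an alternative sketch, the entry could be commented out with a filter instead. The function name comment_out_mount is hypothetical; the only thing it relies on is that the third field of a vfstab line is the mount point.

```shell
# Hypothetical helper: print the vfstab named in $2 with the entry for
# mount point $1 commented out. Nothing is modified in place.
comment_out_mount() {
    awk -v mp="$1" '
        /^#/     { print; next }            # keep existing comments as-is
        $3 == mp { print "#" $0; next }     # disable the matching entry
                 { print }
    ' "$2"
}
```

Something like `comment_out_mount /zfs /etc/vfstab > /tmp/vfstab.new` leaves the original untouched, so the result can be reviewed before it is copied into place.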

Then I tried to create the new pool:

# zpool create zoo /dev/dsk/c1d0s5
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c1d0s5 contains a ufs filesystem.

One of the features I really like about zfs is that it not only tells me exactly what is wrong, it also tells me how to fix it. I don't have to go look something up. So to fix it up:

# zpool create -f zoo /dev/dsk/c1d0s5
#

And here it is later:

[tdh@sunnfsv4-109 ~]> zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
zoo                     165G   3.10G    162G     1%  ONLINE     -
