That last bug was actually pretty easy to solve. It turned out to be in yet more code that had never been tested. If we look at the code in usr/src/uts/common/fs/nfs/dserv_server.c (which will be modified by the time you read this), we see:
711     mutex_enter(&inst->dmi_content_lock);
712     error = find_open_root_objset(inst, sid, &root_objset);
713     if (error) {
714             error = (error == ENOENT) ? EIO : error;
715             goto out;
716     }
717
718     error = find_open_mdsfs_objset(inst, dataset_id, root_objset,
719         &(dnd->dnd_objset));
720     if (error == 0 || error != ENOENT) {
721             if (error == 0)
722                     dnd->dnd_flags |= DSERV_NNODE_FLAG_OBJSET;
723             goto out;
724     }
The EAGAIN in "Hard day of debugging the hard stuff, but now have multiple datasets on the same DS" was returned on line 712 by find_open_root_objset(). The new panic was occurring down in the call to find_open_mdsfs_objset() on line 718. Before my changes, find_open_root_objset() always returned the first pNFS dataset root, so if you had two, you only ever saw the first one.
I decided to skip right to the code, and the problem was easy to spot in find_open_root_objset(). I started there because I hadn't changed find_open_mdsfs_objset() and it was known to work.
541 static int
542 find_open_root_objset(dserv_mds_instance_t *inst, mds_sid mds_sid,
543     open_root_objset_t **root_objset)
544 {
...
603     /*
604      * Find the root pNFS object set.
605      */
606     for (tmp_root = list_head(&inst->dmi_datasets); tmp_root != NULL;
607         tmp_root = list_next(&inst->dmi_datasets, tmp_root)) {
608             if (ds_guid.dg_zpool_guid ==
609                 tmp_root->oro_ds_guid.dg_zpool_guid &&
610                 ds_guid.dg_objset_guid ==
611                 tmp_root->oro_ds_guid.dg_objset_guid) {
612                     /*
613                      * This is our root pNFS object set!
614                      */
615                     found_root_objset = 1;
616                     break;
617             }
618     }
The loop never sets *root_objset when it finds a match, so when we use it on line 718, it is garbage. The fix is a simple assignment. Hmm, I could replace tmp_root with *root_objset, which is what the original author seemed to think was going on here.
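To make the shape of that fix concrete, here is a minimal user-space sketch of the same pattern: walk a list, compare both GUIDs, and store the match through the out parameter. This is not the actual dserv code. The types, the singly linked oro_next list, and the find_root() name are all hypothetical stand-ins so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real dserv structures. */
typedef struct ds_guid {
	uint64_t dg_zpool_guid;
	uint64_t dg_objset_guid;
} ds_guid_t;

typedef struct root_objset {
	ds_guid_t oro_ds_guid;
	struct root_objset *oro_next;	/* assumption: a plain singly linked list */
} root_objset_t;

/*
 * Walk the list and, on a match, hand the node back through 'out'.
 * Returns 0 when a match is found and -1 otherwise.
 */
static int
find_root(root_objset_t *head, ds_guid_t want, root_objset_t **out)
{
	root_objset_t *tmp;

	assert(out != NULL);
	*out = NULL;	/* never leave the caller holding garbage */

	for (tmp = head; tmp != NULL; tmp = tmp->oro_next) {
		if (want.dg_zpool_guid == tmp->oro_ds_guid.dg_zpool_guid &&
		    want.dg_objset_guid == tmp->oro_ds_guid.dg_objset_guid) {
			/* The assignment the loop above was missing. */
			*out = tmp;
			return (0);
		}
	}
	return (-1);
}
```

Clearing *out up front is a choice I made for the sketch: a caller that ignores the return value dereferences NULL right away instead of wandering off into stale memory.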
But in any event, here is the 'pee' in pNFS:
[root@pnfs-17-21 ~]> mount -o vers=4 pnfs-17-24:/pnfs2/pnfs /pnfs/pnfs-17-24
[root@pnfs-17-21 ~]> cp /etc/passwd /pnfs/pnfs-17-24/qwhoei
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/qwhoei
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 1
Layout [0]:
Layout obtained at: Sat Jul 25 02:20:00:343367 2009
status: UNKNOWN, iomode: LAYOUTIOMODE_RW
offset: 0, length: EOF
num stripes: 4, stripe unit: 32768
Stripe [0]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [1]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [2]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
Stripe [3]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
[root@pnfs-17-21 ~]> ls -la /pnfs/pnfs-17-24/qwhoei
-rw-r--r-- 1 root root 881 Jul 25 02:20 /pnfs/pnfs-17-24/qwhoei
Well, actually, there really isn't any parallel activity going on here. The file is 881 bytes and the default stripe size is 32k. So it all goes to stripe 0. But there is a lot going on in the background which was touched by my changes.
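The arithmetic behind that is worth a quick sketch. I am assuming plain round-robin striping, where stripe unit n of the file lands on stripe n modulo the stripe count. The real layout mapping may have more knobs than this. The helper names stripe_index() and stripe_units_touched() are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Which stripe holds the byte at 'offset', assuming simple round-robin
 * striping (an assumption, not taken from the pNFS code).
 */
static uint32_t
stripe_index(uint64_t offset, uint32_t stripe_unit, uint32_t num_stripes)
{
	assert(stripe_unit > 0 && num_stripes > 0);
	return ((uint32_t)((offset / stripe_unit) % num_stripes));
}

/*
 * How many stripe units a file of 'size' bytes covers, rounding up.
 * If every stripe unit is written exactly once, this is also the
 * number of writes sent to the data servers.
 */
static uint64_t
stripe_units_touched(uint64_t size, uint32_t stripe_unit)
{
	assert(stripe_unit > 0);
	return ((size + stripe_unit - 1) / stripe_unit);
}
```

With a stripe unit of 32768 and 4 stripes, every byte of an 881-byte file maps to stripe 0 and the file covers a single stripe unit, which agrees with the DS I/O count of 1 above.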
I could either start testing the kspe to get a smaller stripe size or use mkfile to create a file of 2*4*32k, so that there are two writes to each stripe.
[root@pnfs-17-21 ~]> mkfile 256k chunky
[root@pnfs-17-21 ~]> ls -la chunky
-rw------T 1 root root 262144 Jul 25 02:34 chunky
[root@pnfs-17-21 ~]> cp chunky /pnfs/pnfs-17-24/pChunky
[root@pnfs-17-21 ~]> ls -la /pnfs/pnfs-17-24/pChunky
-rw------- 1 root root 262144 Jul 25 02:34 /pnfs/pnfs-17-24/pChunky
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/pChunky
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 8
Layout [0]:
Layout obtained at: Sat Jul 25 02:34:22:325669 2009
status: UNKNOWN, iomode: LAYOUTIOMODE_RW
offset: 0, length: EOF
num stripes: 4, stripe unit: 32768
Stripe [0]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [1]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [2]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
Stripe [3]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
But I have no idea if the copy is identical to the original. I'll test that later...
Actually, I can test that now by using a large text file, say about 400k:
[root@pnfs-17-21 ~]> cp nfs4_vnops.c /pnfs/pnfs-17-24/Pnfs4_vnops.c
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/Pnfs4_vnops.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 13
Layout [0]:
Layout obtained at: Sat Jul 25 02:55:17:818063 2009
status: UNKNOWN, iomode: LAYOUTIOMODE_RW
offset: 0, length: EOF
num stripes: 4, stripe unit: 32768
Stripe [0]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [1]:
tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
Stripe [2]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
Stripe [3]:
tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
[root@pnfs-17-21 ~]> ls -la /pnfs/pnfs-17-24/Pnfs4_vnops.c
-rw-r--r-- 1 root root 409540 Jul 25 02:55 /pnfs/pnfs-17-24/Pnfs4_vnops.c
[root@pnfs-17-21 ~]> diff nfs4_vnops.c /pnfs/pnfs-17-24/Pnfs4_vnops.c
The diff prints nothing, so the copy matches the original. So we have pNFS!