Friday July 24, 2009
Starting to test massive layout, device, and kspe changes

I'm in the process of finishing off some coding I started before the BAT last month. It started off as a simple observation: my DSes with two datasets were only using the first dataset. It was time to rewrite the device and layouts to use the mds_sids. An mds_sid is a unique value which identifies a ZFS dataset on a zpool. Because it is unique, it allows a mapping between the dataset name that humans use, e.g., 'ds1:ppool/pnfs', and the value used by computers.
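To make the name-to-identifier idea concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the struct sid_map_entry, the function sid_lookup, the second dataset name, and the numeric ids are all made up, and the real mds_sid layout in the pNFS code is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: pairs a human-readable dataset name with a machine id. */
struct sid_map_entry {
	const char *name;	/* e.g. "ds1:ppool/pnfs" */
	uint64_t sid;		/* made-up value standing in for an mds_sid */
};

/* Two datasets on one DS, each with its own (invented) id. */
static const struct sid_map_entry sid_map[] = {
	{ "ds1:ppool/pnfs",  0x1001 },
	{ "ds1:ppool/pnfs2", 0x1002 },
};

/* Returns 0 and stores the id in *sid if name is known, -1 otherwise. */
int
sid_lookup(const char *name, uint64_t *sid)
{
	size_t i;

	assert(name != NULL && sid != NULL);
	for (i = 0; i < sizeof (sid_map) / sizeof (sid_map[0]); i++) {
		if (strcmp(sid_map[i].name, name) == 0) {
			*sid = sid_map[i].sid;
			return (0);
		}
	}
	return (-1);
}
```

The point of keying on the id rather than on "the first dataset of this DS" is that a second dataset on the same server gets its own entry and can be told apart.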

Well, doing all that also means it is time for the kspe (kernel simple policy engine) to be integrated. I had all of the above coded and the prototype for the kspe, but had to go to BAT to test other stuff. Well, this week I picked things back up and finished the code off. I'm still not completely happy with it, especially with where the kspe lives, i.e., in the 'nfs' or 'nfssrv' module.

I got my MDS up and running, but I was hitting issues with the above. Those only caused the modules not to load, so I was able to use scp to get new modules over. When I fixed those issues, I ran into the next major one: I had added another index to the layout nfs4 database table, but I'd only told the table it had one.
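The index mismatch is easy to picture with a toy model. The sketch below is not the nfs4 database code; struct table, table_init, and table_add_index are hypothetical names, and the only thing it borrows from the story above is the idea that a table is told up front how many indices it will hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table that is told at creation how many indices to expect. */
struct table {
	unsigned declared;	/* index count promised at creation */
	unsigned created;	/* indices actually registered so far */
};

void
table_init(struct table *t, unsigned declared)
{
	assert(t != NULL);
	t->declared = declared;
	t->created = 0;
}

/*
 * Returns 0 on success, -1 when more indices are added than declared.
 * This sketch reports the error; kernel code might assert instead.
 */
int
table_add_index(struct table *t)
{
	assert(t != NULL);
	if (t->created >= t->declared)
		return (-1);
	t->created++;
	return (0);
}
```

With a table initialized for one index, the first table_add_index() call succeeds and the second fails, which is the shape of the bug: the new index was added in the code, but the declared count was never bumped.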

I got into a nasty reboot loop, and because of fastboot I couldn't get to the grub menu. I ended up doing a power cycle from the LOM, and that caused the grub menu to come up. I could then boot into single user mode and do:

# echo "setprop boot-file 'kmdb'" >> /a/boot/solaris/bootenv.rc
# reboot

That ended up fixing the fastboot issue. From then on, what I would do in a panic situation was get into single user mode and issue:

# cd /
# rm kernel/misc/amd64/nfssrv kernel/misc/nfssrv 
# reboot

That would bring me up safely in multi-user mode and then I could scp the new kernel modules over.

But I forgot that the DSes use the same initialization routines as the MDSes. If I hadn't already configured all of the boxes to drop into kmdb on a panic, I would have been ticked.

Hey, I have a DS up and running with one dataset (gotta start small) and a down MDS:

[root@pnfs-17-24 ~]> 
panic[cpu1]/thread=ffffff01d9ee70c0: assertion failed: e->refcnt > 1, file: ../../common/fs/nfs/nfs4_db.c, line: 133

ffffff00081aa980 genunix:assfail+7e ()
ffffff00081aa9b0 nfssrv:rfs4_dbe_rele+67 ()
ffffff00081aaa90 nfssrv:ds_reportavail+4d8 ()
ffffff00081aab40 nfssrv:nfs_ds_cp_dispatch+9e ()
ffffff00081aac30 rpcmod:svc_getreq+20d ()
ffffff00081aaca0 rpcmod:svc_run+197 ()
ffffff00081aacd0 rpcmod:svc_do_run+81 ()
ffffff00081abeb0 nfs:nfssys+a0e ()
ffffff00081abf00 unix:brand_sys_syscall32+292 ()
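
The assertion text, e->refcnt > 1, suggests that a release expects more than one reference to be outstanding, which would be the case if the table keeps a reference of its own and every caller adds one on top. That reading is an assumption on my part, and the sketch below only models it; struct entry, entry_hold, and entry_rele are hypothetical names, not the functions in nfs4_db.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry; refcnt starts at 1 for the owning table (assumed). */
struct entry {
	unsigned refcnt;
};

void
entry_init(struct entry *e)
{
	assert(e != NULL);
	e->refcnt = 1;
}

/* A caller takes a reference before using the entry. */
void
entry_hold(struct entry *e)
{
	assert(e != NULL);
	e->refcnt++;
}

/*
 * Returns 0 on a balanced release. Returns -1 in the case where a check
 * like "refcnt > 1" would fail, i.e., a release with no matching hold.
 */
int
entry_rele(struct entry *e)
{
	assert(e != NULL);
	if (e->refcnt <= 1)
		return (-1);
	e->refcnt--;
	return (0);
}
```

Under this model, a hold followed by a release is fine, while a second release on the same entry trips the check. A panic like the one above would then point at a code path that releases an entry it never held, or releases it twice.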

I'm off to solve it. See ya' later!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily