
Thursday March 29, 2007

ZFS Bootable datasets - happily rumbling

For the last while, I've been helping with the testing of the recently announced ZFS Boot bits that Lin putback yesterday.

We've got the regression testing on these bits completed - these changes don't break existing ZFS functionality, and we've validated that the basic functionality of ZFS bootable datasets works as designed.

I'm now looking at some additional tests for these bits, trying to boot mirrors with missing/detached disks, that sort of thing. (This week, I brought up a Thumper with root on a 47-way mirrored pool! :-)

As with the ZFS Mountroot bits before, I thought that writing a script to automate the install of these bits would be pretty useful while we don't yet have full ZFS support in the installer. Here it is: zfs-actual-root-install.sh.

This is how you use it:

root@usuki[88] ./zfs-actual-root-install.sh --help
Usage : zfs-actual-root-install.sh [options to pass to zpool]
eg. ./zfs-actual-root-install.sh mirror c0t0d0s0 c0t0d1s0

You need to be running a fresh install of at least snv_50
 (with a BFU of Lin's zfsboot bits) for this to work.

Note also, you must supply a disk using slice notation: we need SMI
labels to boot, whereas "zpool create c0t0d0" would use EFI labels.
Only single disks, or mirrors are supported. No stripes or raidz please.

If you set the environment variable $ROOT_FS, we use that as the root
filesystem.
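
The restrictions in that help text (slices only, single disk or mirror, no stripes or raidz) can be sketched as a pre-flight check. This is only an illustration and is not taken from zfs-actual-root-install.sh; check_root_vdevs is a hypothetical helper name, and it assumes bash and the usual cXtYdZsN device naming.

```shell
#!/bin/bash
# Hypothetical pre-flight check, NOT part of zfs-actual-root-install.sh.
# Returns 0 if the arguments look like a single slice or a mirror of slices.
check_root_vdevs() {
    local dev
    if [ "$1" = "mirror" ]; then
        shift
        # A mirror needs at least two devices.
        [ $# -ge 2 ] || { echo "mirror needs at least two slices" >&2; return 1; }
    elif [ $# -ne 1 ]; then
        # Several bare devices would make a stripe, which is not supported.
        echo "only a single slice or a mirror is supported" >&2
        return 1
    fi
    for dev in "$@"; do
        case "$dev" in
            raidz*|mirror|spare)
                echo "unsupported vdev keyword: $dev" >&2
                return 1 ;;
            *s[0-9]|*s1[0-5])
                ;;  # ends in a slice number, e.g. c0t0d0s0
            *)
                echo "$dev is not a slice; whole disks would get EFI labels" >&2
                return 1 ;;
        esac
    done
}
```

Calling check_root_vdevs mirror c0t0d0s0 c0t0d1s0 succeeds, while check_root_vdevs c0t0d0 fails because the argument has no slice suffix.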

As mentioned above, ZFS root boot only works with SMI-labeled disks. If you've ever given ZFS the entire disk, it will have put an EFI label on it, so you need to remove that using fdisk, then rewrite the label using format or fmthard. Not too scary - here's me having just changed the disk type:

             Total disk size is 8924 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                               Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1       Active    Solaris2          1  8923    8923    100


SELECT ONE OF THE FOLLOWING:
   1. Create a partition
   2. Specify the active partition
   3. Delete a partition
   4. Change between Solaris and Solaris2 Partition IDs
   5. Exit (update disk configuration and exit)
   6. Cancel (exit without updating disk configuration)
Enter Selection: 5


format> l
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
Warning: This disk has an EFI label. Changing to SMI label will erase all
current partitions.
Continue? y
Auto configuration via format.dat[no]? 
Auto configuration via generic SCSI-2[no]?

Here's the script in action:

root@usuki[92] ./zfs-actual-root-install.sh mirror c2t0d0s0 c2t1d0s0
Updating vfstab on UFS root
Starting to copy data from UFS root to /zfsroot - this may take some time.
.
.
.
10576640 blocks
.
.
There's a copy of the old UFS root in /zfsroot/etc/vfstab.old-ufs-root
diffs are new vs. old :
6a7
> /dev/dsk/c0d0s0       /dev/rdsk/c0d0s0        /       ufs     1       no      -
12,13c13
< rootpool/rootfs - / zfs - no -
< /dev/dsk/c0d0s0 /dev/rdsk/c0d0s0 /ufsroot ufs - yes -
---
> rootpool/rootfs - /zfsroot zfs - yes -
Creating ram disk for /zfsroot
updating /zfsroot/platform/i86pc/amd64/boot_archive...this may take a minute
updating /zfsroot/platform/i86pc/boot_archive...this may take a minute
Installing grub on /dev/rdsk/c2t0d0s0
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 260 sectors starting at 50 (abs 16115)
Installing grub on /dev/rdsk/c2t1d0s0
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 260 sectors starting at 50 (abs 16115)

Okay, assuming we haven't broken anything, when you next reboot, you
should be able to select a grub menu entry for ZFS on root!
Remember to report anything suspicious via bugster or
zfs-discuss@opensolaris.org.
If your boot device has changed because of this, remember to change your
bios settings.  (you should now boot from /dev/dsk/c2t0d0s0 /dev/dsk/c2t1d0s0)

And finally, here's me booting with the new root:

# df -h / 
Filesystem             size   used  avail capacity  Mounted on
rootpool/rootfs         67G   4.6G    62G     7%    /
# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
rootpool         4.57G  62.4G    24K  /rootpool
rootpool/rootfs  4.57G  62.4G  4.57G  legacy
# zpool status -v
  pool: rootpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rootpool      ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c2t0d0s0  ONLINE       0     0     0
            c2t1d0s0  ONLINE       0     0     0

errors: No known data errors
# 

You can install a root pool to a slice that isn't slice 0, but the script won't work this out and will simply run installgrub against that slice. If that's your situation, manually run the installgrub command afterwards to put the new ZFS-capable grub on whatever your boot device is.
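
As a rough sketch of that manual step, the helper below only prints the installgrub command for a device rather than running it. The stage1 and stage2 paths are the usual Solaris x86 defaults and are an assumption here, as is the helper name grub_install_cmd; check where the ZFS-capable grub files actually live on your system before running anything.

```shell
#!/bin/bash
# Print (do not run) the installgrub command for a given boot slice.
# Assumption: stage1/stage2 are in /boot/grub, the usual Solaris x86 location.
grub_install_cmd() {
    local slice="$1"
    # Accept c2t0d0s0, /dev/dsk/c2t0d0s0 or /dev/rdsk/c2t0d0s0;
    # strip any directory part and always target the raw device.
    slice="${slice##*/}"
    echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/${slice}"
}

# Example: show the command for the first half of the mirror used above.
grub_install_cmd /dev/dsk/c2t0d0s0
```

Printing the command first gives you a chance to check the device name before anything is written to the disk.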

One other thing to watch out for: if you're BFUing development archives, you might run into 6528202, so take a copy of /boot/platform/i86pc/kernel/unix before you BFU! If you're happy to wait for a full install of nv_62 or later, you don't need to worry about this step.

I've said it before, but having ZFS on your root filesystem is just completely awesome - being able to incrementally back up, snapshot and roll back your root filesystem really is amazingly useful. I wrote mountrootadm to help out even more. Of course, eventually I suspect LiveUpgrade will handle all this for you, but in the meantime this does the trick.
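
For a flavour of what that looks like, here is a minimal sketch using the standard zfs snapshot, zfs rollback and zfs send -i subcommands. The dataset name rootpool/rootfs comes from the output above; the wrapper functions, the RUN variable and the snapshot names are made up for illustration. By default the functions only print the commands.

```shell
#!/bin/bash
# Sketch only: wrapper names and snapshot names are hypothetical.
ROOT_DS="${ROOT_DS:-rootpool/rootfs}"
# Default is a dry run that prints each command.
# Export RUN as an empty string to execute the commands for real.
RUN="${RUN-echo}"

# Take a named snapshot of the root filesystem.
snap_root()     { $RUN zfs snapshot "${ROOT_DS}@$1"; }

# Roll the root filesystem back to a named snapshot.
rollback_root() { $RUN zfs rollback "${ROOT_DS}@$1"; }

# Write an incremental stream (changes between two snapshots) to stdout.
send_incr()     { $RUN zfs send -i "${ROOT_DS}@$1" "${ROOT_DS}@$2"; }

# Example: snapshot before a risky change.
snap_root before-bfu
```

When run for real, the output of send_incr would normally be redirected to a file or piped to zfs receive on another pool.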

Let me know if you've any thoughts or comments about the script. Happy rumbling!

Update: Bart quite rightly pointed out to us that there's no need for all that mucking around with failsafe-boot in order to reconstruct the /dev and /devices filesystems. Much easier and faster is:

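mkdir -p /zfs-root-tmp.$$
# nosub hides anything mounted below /, exposing the on-disk /dev and /devices
mount -F lofs -o nosub / /zfs-root-tmp.$$
(cd /zfs-root-tmp.$$; tar cvf - devices dev ) | (cd /zfsroot; tar xvf -)
umount /zfs-root-tmp.$$
# rmdir rather than rm -rf: if the umount failed, nothing under / gets deleted
rmdir /zfs-root-tmp.$$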

So I've updated the post above to change that, fixed the script and tested it - works just fine. Thanks Bart!

Update: Lin pointed out a typo in the create_dirs script where /tmp was being given the wrong permissions, so I've fixed that in this version of the script too.

Update: we were wrong about it being a typo. Normal service resuming...

(2007-03-29 06:32:47.0)

Trackback URL: http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling
Comments:

Excellent, thanks Tim. An automated procedure is almost as good as out-of-the-box support.

Posted by Dick Davies on March 30, 2007 at 01:31 AM IST #

Tim,
Nice script, but I believe you missed a Note in #(4) Populate the UFS root content to the ZFS root filesystem:
Copy all of the files in the UFS root filesystem to the newly created ZFS root filesystem. The following command does this without crossing mountpoints. This command will take on the order of 30 minutes, give or take. Note, this will not cross mountpoints, if /usr, /var, or other filesystems are on other mountpoints, they will need to be copied over following this command.
# cd /
# find . -xdev -depth -print | cpio -pvdm /zfsroot
I have modified your script to copy /usr, /var. I also create zfs filesystems for them. If you like, I can email my modified script.
Ron Halstead

Posted by Ron Halstead on April 24, 2007 at 06:05 PM IST #

Thanks for pointing that out Ron - you're right about the script not crossing mountpoints, so users beware!

I guess I could check in /etc/vfstab to see if /var and /usr are separate filesystems, but where do I end - should I check /opt, /usr/local and all other mounted filesystems ? There's an interesting thread on zfs-discuss about this idea, started from Lori's blog post - now that we can easily carve up the filesystem namespace, where should we start ? All good things to keep in mind!
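
A check along those lines might look something like the sketch below. It is not in the script; separate_mounts is a hypothetical helper that reads a vfstab-format file and prints every ufs or zfs mount point other than /, which are the ones a copy that does not cross mountpoints would skip. Restricting it to ufs and zfs is an assumption made to keep the example short.

```shell
#!/bin/bash
# Hypothetical helper: list on-disk filesystems mounted somewhere other than /.
# vfstab fields: device, fsck device, mount point, FS type, fsck pass,
#                mount at boot, mount options.
separate_mounts() {
    awk '$1 !~ /^#/ && NF >= 7 && ($4 == "ufs" || $4 == "zfs") && $3 != "/" { print $3 }' \
        "${1:-/etc/vfstab}"
}
```

Running separate_mounts with no argument reads /etc/vfstab; anything it prints would need to be copied to the new root separately.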

Posted by Tim Foster on April 26, 2007 at 09:32 PM IST #

Worked as described. I have one thing happening now at boot. I get an error message about /dev/random: "No randomness provider enabled for /dev/random. Use cryptoadm to provide one." Does anyone know how to fix this? Another thing: Is it possible to upgrade to a higher build when using zfs boot?

Posted by Nicolas Linkert on June 02, 2007 at 10:25 PM IST #

Hi Nicolas, Glad you're finding zfs-boot useful! I see you asked zfs-discuss about this too - I'll just point to my reply there - hope this helps ?

Posted by Tim Foster on June 03, 2007 at 04:17 PM IST #

This looks super. I'm running on snv_65 and cannot seem to change the EFI label to SMI. Perhaps I'm just too new to this to see what I'm doing wrong, but I've tried a number of different methods that I thought should do the trick. Could you please point me in the proper direction with somewhat detailed instructions for this or perhaps there is now an easier method with builds around 65? Thanks much!

Posted by ylon on June 07, 2007 at 08:17 PM IST #

Hi ylon, that's odd - have you tried "format -e" ? More here.

Posted by Tim Foster on June 11, 2007 at 12:14 AM IST #

gosh! i felt so crazy doing diz one

Posted by marjorie on March 08, 2008 at 07:45 AM GMT #

Tim,
You might want to add some checking for the dlmgmt bug that causes issues with the /etc/.dlmgmt_door object. To get your script to work, I had to do 'svcadm disable datalink-management' before feeding the script my slices. Upon rebooting, the system drops to maintenance mode since the datalink service is running, but the service can be re-enabled from the console. Without doing this, the datalink service tries to create the door file before /etc has become writeable. There is a bug for this that is expected to be fixed in nv86 I think?

Posted by Blake on March 19, 2008 at 01:33 AM GMT #
