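Saturday November 01, 2008

  Today I took part in the OpenSPARC workshop and demonstrated features of the ZFS file system there. The atmosphere in the room felt good, and a few simple demo programs I had written showed off ZFS data safety, Samba sharing, compression, and other features nicely. I thought it went very well.

  There was one small accident during the demo, though: the mount point of an earlier ZFS pool had not been removed, so another of my demo programs reported an error while creating its storage pool.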

  # zpool create r1pool raidz c1t0d0s3 c1t0d0s4
  mountpoint /r1pool is not empty, the pool has been created, but not mounted.
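
  A check like the following, run before zpool create, would have caught the leftover mount point. It is only a minimal sketch: dir_is_empty is a hypothetical helper name, not a ZFS command, and the pool and device names in the commented example are simply the ones from my demo.

```shell
# dir_is_empty DIR
# Succeeds (returns 0) when DIR does not exist or is an empty directory.
# Fails (returns 1) when DIR exists and is a non-directory or has entries.
# Hypothetical helper, not part of ZFS.
dir_is_empty() {
    dir=$1
    [ -e "$dir" ] || return 0      # nothing there: no collision possible
    [ -d "$dir" ] || return 1      # exists but is not a directory
    # ls -A lists every entry except "." and ".."; no output means empty
    [ -z "$(ls -A "$dir")" ]
}

# Example (commented out so that sourcing this file has no side effects):
# if dir_is_empty /r1pool; then
#     zpool create r1pool raidz c1t0d0s3 c1t0d0s4
# else
#     echo "/r1pool is not empty; inspect it before creating the pool" >&2
# fi
```

  The point of the check is to stop and look at the directory before anything is created or deleted, rather than to clean it up automatically.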

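  When I then went to delete that mount point, /r1pool, by hand, in my hurry I deleted the contents of /rpool instead. The system reported a busy error, but by then it was too late. Worst of all, my system was installed with ZFS root, the root pool is named rpool, and /rpool/boot/grub holds the all-important menu.lst file, which was deleted along with everything else.

  I did not notice the problem at the time. I shut down in a hurry after the demo, and when I turned the machine on that evening... what? Why would the system not boot, and why did it only get as far as the grub> prompt?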

  grub>

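  This was bad. The laptop also has Windows XP installed as a dual boot, and switching between the two systems depends on menu.lst. With that file deleted, not only Solaris but Windows XP could no longer boot either.

  So I tried the findroot command by hand several times, and every attempt failed:

  grub> findroot (pool_rpool,1,a)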
  File not found

  Oh, so /rpool/etc/bootsign and /rpool/boot/grub/bootsign were gone as well, which meant findroot had little chance of working. I switched to the traditional root command instead and got into the system:

  grub> root (hd0,1,a)
  grub> kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
  grub> module$ /platform/i86pc/$ISADIR/boot_archive
  grub> boot

  Okay, now I was in Solaris, so I started restoring the original contents of /rpool by hand:

  # mkdir -p /rpool/boot/grub/bootsign /rpool/etc
  # echo "pool_rpool" > /rpool/etc/bootsign
  # touch /rpool/boot/grub/bootsign/pool_rpool
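
  The three commands above can be wrapped in a small function so they are easy to repeat against any root directory. This is a minimal sketch: restore_bootsign and its argument order are hypothetical, and it only recreates the same two locations I restored by hand.

```shell
# restore_bootsign ROOT SIGNATURE
# Recreate the boot-signature files under ROOT (for example /rpool).
# Hypothetical helper that mirrors the manual mkdir/echo/touch steps.
restore_bootsign() {
    root=$1
    sign=$2
    mkdir -p "$root/boot/grub/bootsign" "$root/etc" || return 1
    # ROOT/etc/bootsign holds the signature name as text
    echo "$sign" > "$root/etc/bootsign" || return 1
    # ROOT/boot/grub/bootsign/<name> is the empty marker file
    touch "$root/boot/grub/bootsign/$sign"
}

# Example (commented out so that sourcing this file changes nothing):
# restore_bootsign /rpool pool_rpool
```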

  Then I found a sample menu.lst, edited it, and put it at /rpool/boot/grub/menu.lst:

default 0
timeout 10
#---------- ADDED BY BOOTADM - DO NOT EDIT ----------
title Solaris snv_101
findroot (pool_rpool,1,a)
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive
#---------------------END BOOTADM--------------------
title Windows XP
        rootnoverify (hd0,0)
        chainloader +1
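
  Since findroot only works when the signature file it names is present, it is worth checking a rebuilt menu.lst against the bootsign directory before rebooting. The sketch below assumes a POSIX sed and that the signature is the first field inside the parentheses; check_findroot is a hypothetical name.

```shell
# check_findroot MENU ROOT
# For every findroot line in MENU, verify that the signature file it names
# exists under ROOT/boot/grub/bootsign. Returns 0 only if all are present.
# Hypothetical helper; assumes a POSIX sed.
check_findroot() {
    menu=$1
    root=$2
    rc=0
    # Extract the first field inside the parentheses, e.g. pool_rpool
    for sign in $(sed -n 's/^[[:space:]]*findroot[[:space:]]*(\([^,)]*\).*/\1/p' "$menu"); do
        if [ ! -f "$root/boot/grub/bootsign/$sign" ]; then
            echo "no signature file for $sign under $root" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example (commented out):
# check_findroot /rpool/boot/grub/menu.lst /rpool
```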

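  After a reboot all was well: both systems booted normally through grub again.

  This incident made me face a practical question: should /rpool be mounted automatically or not? If it is, you can end up in an idiotic situation like mine and delete files that grub needs to boot; if it is not, many users may not know how to edit menu.lst. It is a dilemma. Taking a snapshot of the top-level file system looks like it would protect against this kind of mistake, but even with a snapshot, once the system cannot get past grub the data still cannot be recovered, because ZFS data cannot be modified from the grub prompt.

  Still, after this lesson I lean toward not mounting /rpool at boot, and mounting it only when necessary.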

  # zfs set canmount=noauto rpool

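  With this setting the top-level file system is not mounted at boot, so important files cannot be deleted by accident the way I did it. On the rare occasion that something under /rpool needs changing, just mount it by hand: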

  # zfs mount rpool

  I think this setting costs relatively little, and it helps protect important system files.
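
  With canmount=noauto in place, the mount, edit, unmount cycle can be wrapped so the pool never stays mounted longer than needed. This is a minimal sketch: with_rpool_mounted is a hypothetical helper, and the ZFS variable is an assumption added only so the sequence can be dry-run with ZFS=echo.

```shell
# Command used to talk to ZFS. Overriding it (e.g. ZFS=echo) gives a dry run.
# The variable is a hypothetical knob, not something zfs itself reads.
ZFS=${ZFS:-zfs}

# with_rpool_mounted CMD [ARGS...]
# Mount the top-level rpool dataset, run CMD, then unmount rpool again.
# Returns the exit status of CMD unless mounting or unmounting fails.
with_rpool_mounted() {
    "$ZFS" mount rpool || return 1
    "$@"
    status=$?
    "$ZFS" unmount rpool || return 1
    return "$status"
}

# Example (commented out):
# with_rpool_mounted vi /rpool/boot/grub/menu.lst
```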

  Of course, if you do run into the problem I hit, the only way in is to type the grub commands by hand as I did, which is a small test of memory and typing. Good luck!


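Thursday September 11, 2008

  Over the past few days several people have asked me how, from a running system, to find and modify the contents of a ZFS root file system installed on another device.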

 

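  (1) The simplest way is to boot into failsafe mode. Failsafe automatically searches all devices in the system, lists those that may hold a root file system (UFS or ZFS), and asks whether to mount it on /a: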

SunOS Release 5.11 Version snv_97 32-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Booting to milestone "milestone/single-user:default".
Configuring /dev
Searching for installed OS instances...

ROOT/snv_97 was found on rpool.
Do you wish to have it mounted read-write on /a? [y,n,?] y
mounting rpool on /a

Starting shell.
# zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool    68G  6.02G  62.0G     8%  ONLINE  /a
# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool                           6.32G  60.6G  36.5K  /a/rpool
rpool/ROOT                      5.32G  60.6G    18K  legacy
rpool/ROOT/snv_97               5.32G  60.6G  5.32G  /a
rpool/dump                       512M  60.6G   512M  -
rpool/export                      37K  60.6G    19K  /a/export
rpool/export/home                 18K  60.6G    18K  /a/export/home
rpool/swap                       512M  60.9G   210M  -

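  You can then go into /a and modify the contents of that ZFS file system as needed.


  (2) If your system has no failsafe mode, the other way is a manual import. This does not require failsafe mode, but you need some knowledge of how a ZFS root file system is laid out before making changes:

   # zpool import        (lists all pools available for import)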

  pool: rpool
    id: 7154650442157903689
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

        rpool       ONLINE
          c0t1d0s0  ONLINE

  # zpool import -f -R /mnt rpool

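  A word on why these options are used:

  -f forces the import. The pool may previously have been in use by an active system, and -f overrides the warnings and errors related to that;
  -R /mnt (or some other alternate root path) keeps file systems in the pool such as <rpool>/export and <rpool>/export/home from colliding with the same mount points on the running system, which would prevent some of them from mounting. With -R they are mounted in subdirectories under the given alternate root.

  Once the import succeeds, run zfs mount to see the mounted file systems. You may notice that some file systems in the pool were not mounted automatically. For example: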

# zfs mount
rpool/export                    /mnt/export
rpool/export/home               /mnt/export/home
rpool                           /mnt/rpool

  rpool/ROOT/snv_97 was not mounted. That is because it was the root file system of the pool, and such a file system has to be mounted by hand, as follows:

  # zfs unmount -a
  # rm -rf /mnt/*
  # zfs mount rpool/ROOT/snv_97
  # zfs mount -a

  Now go into /mnt, and you can edit the contents of the pool's original root file system.
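
  To confirm from a script whether a particular dataset made it into the zfs mount listing, the output can be filtered by its first column. This is a minimal sketch: is_mounted is a hypothetical helper that reads the listing on standard input, and on Solaris nawk may be needed in place of awk.

```shell
# is_mounted DATASET
# Reads `zfs mount` output on stdin and succeeds only if DATASET appears
# in the first column. Hypothetical helper.
is_mounted() {
    awk -v ds="$1" '$1 == ds { found = 1 } END { exit !found }'
}

# Example (commented out):
# if zfs mount | is_mounted rpool/ROOT/snv_97; then
#     echo "root dataset is mounted"
# else
#     echo "root dataset still needs: zfs mount rpool/ROOT/snv_97"
# fi
```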

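  When you are done, if you do not want this pool to show up on the system in future, export it with zpool export rpool. The devices behind it can then be removed and used in another system.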



Saturday May 24, 2008

  I'd like to write a Chinese version as well when I have some spare time, but sorry, I'm crazy busy at the moment.

  This is Robin Guo from the ZFS testing team in ERI. I just saw that Nevada 90
was released 2 days ago.

  If you have never heard of ZFS, this might be a good chance to start with it.

  And if you know ZFS but have never installed a machine with ZFS as the
Root Filesystem (/), this might be a good point to try it!
See ZFS Boot on OpenSolaris

  Nevada 90 has just been released with the *great feature* of ZFS (Install+Boot) as the Root Filesystem (/). The putback happened on May 14, 2008, and was finally integrated into Nevada 90.

  As the Testing Lead for the ZFS testing group, I'd like to see more and more
people use ZFS on their machines and enjoy the benefits of ZFS.

  Here are the new ZFS-related features in Nevada 90. They are supported on all platforms (Sun4u/Sun4v/x86/amd64).

  (1) You can select ZFS as the install type in the text installer (UFS is still the default, though). Unfortunately, if you use the GUI installer you will miss this great feature, because the GUI install is UFS-only so far.
  If you choose ZFS as the install type, a menu appears asking for the pool name, root filesystem name, and pool size, and lets you decide whether to make /var a separate filesystem from /. The configuration is meaningful and straightforward, and the installation starts once all settings are done.

  (2) You can now use a ZFS Volume (ZVOL) as the dump or swap device. The only
difference from before is that dump and swap must be separate volumes on ZFS. (With UFS we could specify one slice to serve as dump and swap at the same time.)

  (3) You get almost *all the benefits* of ZFS. You can snapshot your Root Filesystem at no space cost when the snapshot is first created, and roll back to a specific snapshot whenever you want. You can install your Root Filesystem in a mirror configuration to guard against disk corruption. You can set compression on your disk to save space, and you get the performance improvements of ZFS from the very beginning.
  For example, we once measured the install time on one machine, and the
OS install time dropped from 25min (UFS) to 20min (ZFS). Yes, that can mean an amazing
improvement!

  (4) You can always use LiveUpgrade (LU, the recommended way) to migrate your old OS to Nevada 90 or a later release. LU has gained quite a lot in this version and will also provide a way to upgrade existing Zones, regardless of whether they resided on UFS or ZFS before.

  Of course, installing your Root Filesystem on ZFS could be a risk for you. ZFS is still new, and ZFS Root has just been released in its first version with support for sparc/x86/amd64, so the choice is yours.

  *Note*: so far there are some limitations of ZFS Root that I can share with you, to help you decide whether you'd like to give it a try.

  A. ZFS Root can only reside on a single slice, or on multiple slices in a mirror configuration. The disk label needs to be SMI. It is not possible to create a raidz ZFS Root pool so far.

  B. Preserving existing slices is not supported very well by ZFS Install at present, so if you have a very important data slice on your single disk, you may drop me a line or mail the alias zfs-discuss@opensolaris.org to get support.

  C. If you'd like to use LiveUpgrade to upgrade your system, it deserves a try, but if you already have Zones, you may need to check the upgrade_log after LU finishes to make sure all your Zones were upgraded successfully. You can also drop me a line or mail the alias zfs-discuss@opensolaris.org to get further support with LU.

  Well, I don't want to write too much, since this message is going out to *all*. The ZFS testing team is currently crazy busy with the project to backport ZFS (Install+Boot) to s10u6 as well. We have a good chance of seeing an official commercial release of the amazing features above in s10u6.

  Let ZFS Root roll! I hope you enjoy it, really.

  For any questions, you can use either of the links below,
http://opensolaris.org/os/community/zfs/
http://www.opensolaris.org/os/community/zfs/boot/
  or contact the ZFS community at zfs-discuss@opensolaris.org , you are always welcome!

  - Cheers

Friday February 15, 2008

   Maybe it is just bad luck, but one LX50 x86 machine has never been able to complete a Live Upgrade from UFS to ZFS. Every time it gets to luactivate, it reports these errors:

ERROR: No matching BIOS id found for: </dev/dsk/c1t10d0s0>
ERROR: Cannot determine GRUB id for ABE disk </dev/dsk/c1t10d0s0>

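   So I decided to spend some time working out what was going on. The steps I tried were as follows.

   I first suspected /boot/grub/menu.lst, but after comparing it with the menu.lst on another amd64 machine (where LU from UFS to ZFS succeeded), I ruled that out.

   I took a look at /sbin/luactivate, and it is a shell script! So I turned on shell tracing and followed it to the failing statement,
and found that it calls the system's /sbin/biosdev to get the available devices. What does that command return?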

# /sbin/biosdev
0x80 /pci@1,0/pci8086,340f@7/sd@9,0
0x84 /pci@1,0/pci8086,340f@7/sd@d,0
0x85 /pci@1,0/pci8086,340f@7/sd@e,0
0x86 /pci@1,0/pci8086,340f@7,1/sd@0,0
0x87 /pci@1,0/pci8086,340f@7,1/sd@1,0

Wow... where have 0x81, 0x82 and 0x83 gone? (Those three disks are not listed here at all, so no wonder
luactivate cannot find the BIOS id.) Let's try the diagnostic option, -d:

# /sbin/biosdev -d
adding /pci@1,0/pci8086,340f@7/sd@9,0
adding /pci@1,0/pci8086,340f@7/sd@a,0
adding /pci@1,0/pci8086,340f@7/sd@b,0
adding /pci@1,0/pci8086,340f@7/sd@c,0
adding /pci@1,0/pci8086,340f@7/sd@d,0
adding /pci@1,0/pci8086,340f@7/sd@e,0
adding /pci@1,0/pci8086,340f@7,1/sd@0,0
adding /pci@1,0/pci8086,340f@7,1/sd@1,0
matching edd 0x80
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x81
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x82
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x83
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x84
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x85
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x86
magic not valid ddbe pathinfolen 44
No matches by edd
matching edd 0x87
magic not valid ddbe pathinfolen 44
No matches by edd
matching first block 0x80
matched by first block
0x80 /pci@1,0/pci8086,340f@7/sd@9,0
matching first block 0x81
No matches by first block
Could not match 0x81
matching first block 0x82
No matches by first block
Could not match 0x82
matching first block 0x83
No matches by first block
Could not match 0x83
matching first block 0x84
matched by first block
0x84 /pci@1,0/pci8086,340f@7/sd@d,0
matching first block 0x85
matched by first block
0x85 /pci@1,0/pci8086,340f@7/sd@e,0
matching first block 0x86
matched by first block
0x86 /pci@1,0/pci8086,340f@7,1/sd@0,0
matching first block 0x87
matched by first block
0x87 /pci@1,0/pci8086,340f@7,1/sd@1,0
0x80 /pci@1,0/pci8086,340f@7/sd@9,0
0x84 /pci@1,0/pci8086,340f@7/sd@d,0
0x85 /pci@1,0/pci8086,340f@7/sd@e,0
0x86 /pci@1,0/pci8086,340f@7,1/sd@0,0
0x87 /pci@1,0/pci8086,340f@7,1/sd@1,0

    Clearly, messages like the following mark the disks that fail to be listed:

 

matching first block 0x83
No matches by first block
Could not match 0x83
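
Spotting the gaps by eye works for eight disks, but the same check can be scripted. The sketch below is an assumption-laden helper, not part of biosdev: missing_bios_ids is a hypothetical name, it treats 0x80 as the first hard-disk id, and it reports every id up to the highest one listed that has no line of its own. On Solaris, nawk may be needed in place of awk.

```shell
# missing_bios_ids
# Reads `biosdev` output on stdin and prints each BIOS id from 0x80 up to
# the highest id seen that does not appear in the output.
# Hypothetical helper; 0x80 as the starting id is an assumption.
missing_bios_ids() {
    awk '
        /^0x/ { seen[tolower($1)] = 1 }      # remember every id that is listed
        END {
            last = -1
            # find the highest listed id in the range 0x80..0xff
            for (i = 128; i <= 255; i++) {
                id = sprintf("0x%x", i)
                if (id in seen) last = i
            }
            # report the ids below it that were never listed
            for (i = 128; i <= last; i++) {
                id = sprintf("0x%x", i)
                if (!(id in seen)) print id
            }
        }'
}

# Example (commented out):
# /sbin/biosdev | missing_bios_ids
```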

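At this point a bug can be filed. Oddly, this problem has only been seen on the LX50, where
biosdev fails to recognize any disk with an SMI label. But zfsboot currently requires disks with an SMI
label, so for this model the bug is fatal. Try the following on a disk that is listed correctly: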

# fdisk /dev/rdsk/c1t13d0p0
remove the EFI label (if it exists...)
create a new partition of SOLARIS2 (SMI label)

 

# /sbin/biosdev
0x80 /pci@1,0/pci8086,340f@7/sd@9,0
0x85 /pci@1,0/pci8086,340f@7/sd@e,0
0x86 /pci@1,0/pci8086,340f@7,1/sd@0,0
0x87 /pci@1,0/pci8086,340f@7,1/sd@1,0

See that? c1t13d0 is gone!

Filed CR #6663634 to track this issue. Time for a rest, have a nice weekend...

Sunday February 03, 2008

As everyone knows, ZFS root for x86 has been supported since OpenSolaris Nevada 62;
see http://www.opensolaris.org/os/community/zfs/boot/


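The default install still goes onto UFS, though, and users need to follow http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/
to convert UFS to ZFS. A script for this is provided at http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling

The next goal for ZFS is to become the default file system at install time. With Newboot integrated into Nevada, the groundwork for this project is in place. It is currently in development and testing, and will appear in a new release from the OpenSolaris community before long. As a member of the test team, I had the chance to get a first look at the install process.

(1) The user can choose the file system type, UFS or ZFS, at install time. With UFS, nothing changes from earlier installs.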


(2) Select some disk devices as the physical devices for the rootpool. Please note:
    The rootpool can be a single disk device, or a device slice, or a mirrored configuration. If you use a whole disk for a rootpool, you must use slice notation (e.g. c0d0s0) so that it is labeled with an SMI label.

(3) Configure the names of the rootpool and file system, and specify the sizes of the swap and dump areas, for example:
    Storage Pool: rootpool
    File system: rootfs
    Swap: 1g
    Dump: 2g

(4) Continue the install. All packages are installed onto the specified ZFS file system, and when the install finishes the system automatically boots into ZFS root.

If you already have Solaris installed on UFS and want to convert and upgrade it to ZFS root, you need to go through LiveUpgrade. (The steps below assume you already have a CD or network install image that includes the ZFS install feature.)

If your system runs an older OS, such as Solaris 10, an s10 update, s9, s8, or an snv build from before ZFS install, you should first do a LiveUpgrade (steps 1 and 2) or an upgrade install to the new OS on UFS.

If the system already has the ZFS install feature, you can skip step (2), but you still need step (1) to upgrade the LU packages.


(1) Upgrade the LU packages

LOC=<CD/netinstall image>

$LOC/Solaris_11/Tools/Installers/liveupgrade20 -nodisplay
if [[ $? -ne 0 ]]; then
        echo "liveupgrade20 fails!"
        exit 1
fi

(2) UFS LiveUpgrade; an extra blank slice is needed, such as /dev/dsk/c0t0d0s7

    # lucreate -c oldBE -n ufsBE -m /:<slice>:ufs
    # luupgrade -u -n ufsBE -s $LOC
    # luactivate ufsBE
    # init 6

Now the system is running the new OS on UFS. If you then want to convert UFS->ZFS, do the following steps.

 
(3) Select some disk devices as the physical devices for the rootpool. Please note:

    The rootpool can be a single disk device, or a device slice, or a mirrored configuration. If you use a whole disk for a rootpool, you must use slice notation (e.g. c0d0s0) so that it is labeled with an SMI label.

(4) Create the rootpool
    # zpool create -f rootpool <vdev>

(5) LiveUpgrade

    # lucreate -c ufsBE -n rootfs -p rootpool
    # luupgrade -u -n rootfs -s $LOC
    # luactivate rootfs
    # init 6

    After the reboot, if all went well the system comes up on ZFS root. You can check the current state with the following commands:

# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
rootpool/BE/rootfs   17289216 2880402 11784717    20%    /
/devices                   0       0       0     0%    /devices
/dev                       0       0       0     0%    /dev
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  592052     488  591564     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
sharefs                    0       0       0     0%    /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap2.so.1
                     14665119 2880402 11784717    20%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                  591656      92  591564     1%    /tmp
swap                  591608      44  591564     1%    /var/run
/dev/dsk/c2t0d0s2      61316   61316       0   100%    /media/CDROM
rootpool/export      17289216      19 11784717     1%    /export
rootpool/export/home 17289216      19 11784717     1%    /export/home
ts-auto-pool           28160      18   28055     1%    /ts-auto-pool

# swap -l
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/rootpool/swap 182,1         8  1048568  1048568

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/zvol/dsk/rootpool/dump (dedicated)
Savecore directory: /var/crash/woopass
  Savecore enabled: yes
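
To pick the root file system out of a df -k listing like the one above from a script, filtering on the last column is enough. This is a minimal sketch: root_fs_source is a hypothetical helper that reads the listing on standard input. A result such as rootpool/BE/rootfs, a dataset name rather than a /dev/dsk path, indicates that root is on ZFS.

```shell
# root_fs_source
# Reads `df -k` output on stdin and prints the source (first column) of the
# line whose mount point (last column) is exactly "/".
# Hypothetical helper.
root_fs_source() {
    awk '$NF == "/" { print $1 }'
}

# Example (commented out):
# df -k | root_fs_source
```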


  I will run some more experiments on ZFS root and post about them afterwards.

    Have fun! Thanks for your interest in ZFS, and especially in ZFS rocking on / !

 

I'd like to take the chance to introduce some developers and QE engineers who have ZFS-related blogs on this site, according to this link:

http://www.opensolaris.org/os/community/zfs/blogs/ 

Developers: (I wish I could introduce more, but here I just list some of their efforts or "features". Isn't it a good chance to visit their blogs?)

 
QE Engineer

  • Michael Byrne  Ireland (Quality Lead For ZFS testing)
  • Tim Foster  Ireland (Besides ZFS, Tim also contributes a lot of effort to the OpenSolaris community)
  • Robin Guo   BJ ERI (** It's ME)
  • Forrest Wu   BJ ERI ( Our Star !!!)
  • Huajian Luo   BJ ERI
  • Jesse Zhang   BJ ERI
  • Anastasios Christopolos (Ireland)

Have fun! You can find them through their blogs.

ZFS-related questions can be sent to: zfs-discuss@opensolaris.org

This blog copyright 2009 by Robin Guo