Our good friend Isaac Rozenfeld talks about the Multiplicity of Solaris.
When talking about Solaris I will use the phrase "The Vastness of Solaris".
If you have attended a Solaris Boot Camp or Tech Day in the last few years, you have
an idea of what we are talking about - when we go on about Solaris hour after hour
after hour.
But the key point in Isaac's multiplicity discussion is how the cornucopia of
Solaris features work together to do some pretty spectacular (and competitively
differentiating) things. In the past we've looked at combinations such as
ZFS and Zones, or Service Management, Role Based Access Control (RBAC) and
Least Privilege. Based on
a conversation last week in St. Louis, let's consider how ZFS and Solaris
Fault Management (FMA) play together.
Preparation
Let's begin by creating some fake devices that we can play with. I don't have enough disks
on this particular system, but I'm not going to let that slow me down. If you have sufficient
real hot swappable disks, feel free to use them instead.
# mkfile 1g /dev/dsk/disk1
# mkfile 1g /dev/dsk/disk2
# mkfile 512m /dev/dsk/disk3
# mkfile 512m /dev/dsk/disk4
# mkfile 1g /dev/dsk/disk5
Now let's create a couple of zpools using the fake devices.
pool1 will be a 1GB mirrored pool using disk1 and disk2.
pool2 will be a 512MB mirrored pool using disk3 and disk4.
Device spare1 will serve as a hot spare for both pools in case of a problem -
which we are about to inflict upon the pools.
# zpool create pool1 mirror disk1 disk2 spare spare1
# zpool create pool2 mirror disk3 disk4 spare spare1
# zpool status
pool: pool1
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
pool: pool2
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
pool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk3 ONLINE 0 0 0
disk4 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
So far so good. If we were to run a scrub on either pool, it would complete almost immediately.
Remember that unlike hardware RAID disk replacement, ZFS scrubbing and resilvering
touch only blocks that contain actual data. Since there is no data in these pools (yet),
there is little for the scrubbing process to do.
# zpool scrub pool1
# zpool scrub pool2
# zpool status
pool: pool1
state: ONLINE
scrub: scrub completed with 0 errors on Mon Feb 18 09:24:16 2008
config:
NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
pool: pool2
state: ONLINE
scrub: scrub completed with 0 errors on Mon Feb 18 09:24:17 2008
config:
NAME STATE READ WRITE CKSUM
pool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk3 ONLINE 0 0 0
disk4 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
Let's populate both pools with some data. I happen to have a directory of
scenic images that I use as screen backgrounds - that will work nicely.
# cd /export/pub/pix
# find scenic -print | cpio -pdum /pool1
# find scenic -print | cpio -pdum /pool2
# df -k | grep pool
pool1 1007616 248925 758539 25% /pool1
pool2 483328 248921 234204 52% /pool2
And yes, cp -r would have been just as good.
Problem 1: Simple data corruption
Time to inflict some harm upon the pool. First, some simple corruption.
Writing some zeros over half of the mirror should do quite nicely.
# dd if=/dev/zero of=/dev/dsk/disk1 bs=8192 count=10000 conv=notrunc
10000+0 records in
10000+0 records out
At this point we are unaware that anything has happened to our data. So let's
try accessing some of the data to see if we can observe ZFS self healing in action.
If your system has plenty of memory and is relatively idle, accessing the data may
not be sufficient. If you still end up with no errors after the cpio, try a
zpool scrub - that will catch all errors in the data.
# cd /pool1
# find . -print | cpio -ov > /dev/null
416027 blocks
Let's ask our friend fmstat(1M) whether anything is wrong.
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.1 0 0 0 0 0 0
disk-transport 0 0 0.0 366.5 0 0 0 0 32b 0
eft 0 0 0.0 2.6 0 0 0 0 1.4M 0
fmd-self-diagnosis 1 0 0.0 0.2 0 0 0 0 0 0
io-retire 0 0 0.0 1.1 0 0 0 0 0 0
snmp-trapgen 1 0 0.0 16.0 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 620.3 0 0 0 0 0 0
syslog-msgs 1 0 0.0 9.7 0 0 0 0 0 0
zfs-diagnosis 162 162 0.0 1.5 0 0 1 0 168b 140b
zfs-retire 1 1 0.0 112.3 0 0 0 0 0 0
As the guys in the Guinness commercial say, "Brilliant!" The important thing to note
here is that the zfs-diagnosis engine has run several times indicating that there is
a problem somewhere in one of my pools. I'm also running this on Nevada so the
zfs-retire engine has also run, kicking in a hot spare due to excessive errors.
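If you would rather not eyeball the whole table, a small filter can pull out just the ZFS modules that have seen events. This is only a sketch: the function name zfs_fma_activity is my own invention, and it assumes the column order shown in the fmstat listing above (module name, then ev_recv, then ev_acpt).

```shell
#!/bin/sh
# Sketch: report ZFS-related FMA modules that have received events.
# Reads fmstat-style text on stdin. Assumes column 1 is the module name,
# column 2 is ev_recv and column 3 is ev_acpt, as in the listing above.
# The function name is hypothetical, not part of FMA.
zfs_fma_activity() {
    awk '$1 ~ /^zfs-/ && $2 > 0 {
        printf "%s recv=%s acpt=%s\n", $1, $2, $3
    }'
}

# Typical use on a live system (not executed here):
#   fmstat | zfs_fma_activity
```

No output means the ZFS diagnosis and retire engines have had nothing to chew on since the counters were last reset.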
So which pool is having the problems? We continue our FMA investigation
to find out.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH Major
Fault class : fault.fs.zfs.vdev.checksum
Description : The number of checksum errors associated with a ZFS device
exceeded acceptable levels. Refer to
http://sun.com/msg/ZFS-8000-GH for more information.
Response : The device has been marked as degraded. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
# zpool status -x
pool: pool1
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver in progress, 44.83% done, 0h0m to go
config:
NAME STATE READ WRITE CKSUM
pool1 DEGRADED 0 0 0
mirror DEGRADED 0 0 0
spare DEGRADED 0 0 0
disk1 DEGRADED 0 0 162 too many errors
spare1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 INUSE currently in use
errors: No known data errors
This tells us all that we need to know. The device disk1 was found to have
quite a few checksum errors - so many in fact that it was replaced automatically
by a hot spare. The spare is resilvering and a full complement of data replicas
will be available soon. The entire process was automatic and completely observable.
Since we inflicted harm upon the (fake) disk device ourselves, we know that it is in fact quite
healthy. So we can restore our pool to its original configuration rather simply - by detaching
the spare and clearing the error. We should also clear the FMA counters and repair the
ZFS vdev so that we can tell if anything else is misbehaving in either this or another pool.
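On a pool with many devices, the row you care about is the one with non-zero error counters. Here is a rough sketch of a filter that picks those rows out of zpool status text. The function name vdevs_with_errors is made up for illustration, and it assumes the NAME STATE READ WRITE CKSUM layout shown above. It will not flag a device that is merely UNAVAIL with zero counters.

```shell
#!/bin/sh
# Sketch: print config rows from zpool status text (stdin) whose READ,
# WRITE or CKSUM counter is non-zero. Assumes the five-column layout
# NAME STATE READ WRITE CKSUM shown above; the function name is hypothetical.
vdevs_with_errors() {
    awk '
        # The column header marks the start of the device table.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # The spares list and the errors line mark its end.
        $1 == "spares" || $1 == "errors:" { in_cfg = 0 }
        in_cfg && NF >= 5 && ($3 != 0 || $4 != 0 || $5 != 0) {
            printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}

# Typical use on a live system (not executed here):
#   zpool status -x | vdevs_with_errors
```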
# zpool detach pool1 spare1
# zpool clear pool1
# zpool status pool1
pool: pool1
state: ONLINE
scrub: resilver completed with 0 errors on Mon Feb 18 10:25:26 2008
config:
NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.5 0 0 0 0 0 0
disk-transport 0 0 0.0 223.5 0 0 0 0 32b 0
eft 1 0 0.0 4.6 0 0 0 0 1.4M 0
fmd-self-diagnosis 4 0 0.0 0.6 0 0 0 0 0 0
io-retire 1 0 0.0 1.1 0 0 0 0 0 0
snmp-trapgen 4 0 0.0 8.8 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 372.7 0 0 0 0 0 0
syslog-msgs 4 0 0.0 5.4 0 0 0 0 0 0
zfs-diagnosis 0 0 0.0 1.4 0 0 0 0 0 0
zfs-retire 0 0 0.0 0.0 0 0 0 0 0 0
# fmdump -v -u d82d1716-c920-6243-e899-b7ddd386902e
TIME UUID SUNW-MSG-ID
Feb 18 09:51:49.3025 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
100% fault.fs.zfs.vdev.checksum
Problem in:
Affects: zfs://pool=pool1/vdev=449a3328bc444732
FRU: -
Location: -
# fmadm repair zfs://pool=pool1/vdev=449a3328bc444732
fmadm: recorded repair to zfs://pool=pool1/vdev=449a3328bc444732
# fmadm faulty
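The fmdump and fmadm repair steps above involve copying a rather long resource name by hand. A sketch of how that could be scripted follows. The function name affected_resources is hypothetical, and it assumes the "Affects:" line format shown in the fmdump -v output above.

```shell
#!/bin/sh
# Sketch: extract the affected resource names from fmdump -v text (stdin).
# Assumes lines of the form "Affects: zfs://pool=.../vdev=..." as shown
# above. The function name is hypothetical.
affected_resources() {
    awk '$1 == "Affects:" { print $2 }' | sort -u
}

# Typical use on a live system (not executed here), where $uuid holds an
# EVENT-ID taken from fmadm faulty:
#   fmdump -v -u "$uuid" | affected_resources |
#   while IFS= read -r fmri; do
#       fmadm repair "$fmri"
#   done
```

Only do this when you know the device really is healthy, as we do here because we caused the damage ourselves.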
Problem 2: Device failure
Time to do a little more harm. In this case I will simulate the failure of
a device by removing the fake device. Again we will access the pool and then
consult fmstat to see what is happening (are you noticing a pattern here????).
# rm -f /dev/dsk/disk2
# cd /pool1
# find . -print | cpio -oc > /dev/null
416027 blocks
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.5 0 0 0 0 0 0
disk-transport 0 0 0.0 214.2 0 0 0 0 32b 0
eft 1 0 0.0 4.6 0 0 0 0 1.4M 0
fmd-self-diagnosis 4 0 0.0 0.6 0 0 0 0 0 0
io-retire 1 0 0.0 1.1 0 0 0 0 0 0
snmp-trapgen 4 0 0.0 8.8 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 372.7 0 0 0 0 0 0
syslog-msgs 4 0 0.0 5.4 0 0 0 0 0 0
zfs-diagnosis 0 0 0.0 1.4 0 0 0 0 0 0
zfs-retire 0 0 0.0 0.0 0 0 0 0 0 0
Rats, the find was served entirely from cache left over from the last example. As before,
should this happen, proceed directly to zpool scrub.
# zpool scrub pool1
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.5 0 0 0 0 0 0
disk-transport 0 0 0.0 190.5 0 0 0 0 32b 0
eft 1 0 0.0 4.1 0 0 0 0 1.4M 0
fmd-self-diagnosis 5 0 0.0 0.5 0 0 0 0 0 0
io-retire 1 0 0.0 1.0 0 0 0 0 0 0
snmp-trapgen 6 0 0.0 7.4 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 329.0 0 0 0 0 0 0
syslog-msgs 6 0 0.0 4.6 0 0 0 0 0 0
zfs-diagnosis 16 1 0.0 70.3 0 0 1 1 168b 140b
zfs-retire 1 0 0.0 509.8 0 0 0 0 0 0
Again, hot sparing has kicked in automatically. The evidence of this is the
zfs-retire engine running.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 18 11:07:29 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3 Major
Feb 18 11:16:43 06bfe323-2570-46e8-f1a2-e00d8970ed0d
Fault class : fault.fs.zfs.device
Description : A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for
more information.
Response : No automated response will occur.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
# zpool status -x
pool: pool1
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: resilver in progress, 4.94% done, 0h0m to go
config:
NAME STATE READ WRITE CKSUM
pool1 DEGRADED 0 0 0
mirror DEGRADED 0 0 0
disk1 ONLINE 0 0 0
spare DEGRADED 0 0 0
disk2 UNAVAIL 0 0 0 cannot open
spare1 ONLINE 0 0 0
spares
spare1 INUSE currently in use
errors: No known data errors
As before, this tells us all that we need to know. A device (disk2) has failed and
is no longer in operation. Sufficient spares existed and one was automatically
attached to the damaged pool. Resilvering onto the spare is under way, and once it
completes the data will again be fully mirrored.
But here's the magic. Let's repair the device - again simulated with our fake
device.
# mkfile 1g /dev/dsk/disk2
# zpool replace pool1 disk2
# zpool status pool1
pool: pool1
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 4.86% done, 0h1m to go
config:
NAME STATE READ WRITE CKSUM
pool1 DEGRADED 0 0 0
mirror DEGRADED 0 0 0
disk1 ONLINE 0 0 0
spare DEGRADED 0 0 0
replacing DEGRADED 0 0 0
disk2/old UNAVAIL 0 0 0 cannot open
disk2 ONLINE 0 0 0
spare1 ONLINE 0 0 0
spares
spare1 INUSE currently in use
errors: No known data errors
Get a cup of coffee while the resilvering process runs.
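If coffee is not an option, you can poll instead. The sketch below classifies the "scrub:" line of zpool status text. The function name scrub_state is my own, and it assumes the line is labelled "scrub:" as in the output shown here; other ZFS releases may label it differently.

```shell
#!/bin/sh
# Sketch: classify the scrub/resilver line of zpool status text (stdin).
# Prints one of: running, done, idle, unknown. Assumes the line starts
# with "scrub:" as in the listings above. The function name is hypothetical.
scrub_state() {
    awk '
        $1 == "scrub:" {
            if ($0 ~ /in progress/)    s = "running"
            else if ($0 ~ /completed/) s = "done"
            else                       s = "idle"
            print s
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Typical use on a live system (not executed here):
#   while [ "$(zpool status pool1 | scrub_state)" = "running" ]; do
#       sleep 10
#   done
```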
# zpool status
pool: pool1
state: ONLINE
scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:
NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 AVAIL
# fmadm faulty
Notice the nice integration with FMA. Not only was the new device resilvered, but
the hot spare was detached and the FMA fault was cleared. The fmstat counters still
show that there was a problem and the fault report still exists in the fault log for later
interrogation.
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.5 0 0 0 0 0 0
disk-transport 0 0 0.0 171.5 0 0 0 0 32b 0
eft 1 0 0.0 3.6 0 0 0 0 1.4M 0
fmd-self-diagnosis 6 0 0.0 0.6 0 0 0 0 0 0
io-retire 1 0 0.0 0.9 0 0 0 0 0 0
snmp-trapgen 6 0 0.0 6.8 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 294.3 0 0 0 0 0 0
syslog-msgs 6 0 0.0 4.2 0 0 0 0 0 0
zfs-diagnosis 36 1 0.0 51.6 0 0 0 1 0 0
zfs-retire 1 0 0.0 170.0 0 0 0 0 0 0
# fmdump
TIME UUID SUNW-MSG-ID
Feb 16 11:38:16.0976 48935791-ff83-e622-fbe1-d54c20385afc ZFS-8000-GH
Feb 16 11:38:30.8519 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233 ZFS-8000-GH
Feb 18 09:51:49.3025 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713 ZFS-8000-GH
Feb 18 09:56:24.8029 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH
Feb 18 10:23:07.2228 7c04a6f7-d22a-e467-c44d-80810f27b711 ZFS-8000-GH
Feb 18 10:25:14.6429 faca0639-b82b-c8e8-c8d4-fc085bc03caa ZFS-8000-GH
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3
Feb 18 11:16:44.2497 06bfe323-2570-46e8-f1a2-e00d8970ed0d ZFS-8000-D3
# fmdump -V -u 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
TIME UUID SUNW-MSG-ID
Feb 18 11:07:29.5195 50ea07a0-2cd9-6bfb-ff9e-e219740052d5 ZFS-8000-D3
TIME CLASS ENA
Feb 18 11:07:27.8476 ereport.fs.zfs.vdev.open_failed 0xb22406c635500401
nvlist version: 0
version = 0x0
class = list.suspect
uuid = 50ea07a0-2cd9-6bfb-ff9e-e219740052d5
code = ZFS-8000-D3
diag-time = 1203354449 236999
de = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = fmd
authority = (embedded nvlist)
nvlist version: 0
version = 0x0
product-id = Dimension XPS
chassis-id = 7XQPV21
server-id = arrakis
(end authority)
mod-name = zfs-diagnosis
mod-version = 1.0
(end de)
fault-list-sz = 0x1
fault-list = (array of embedded nvlists)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x3a2ca6bebd96cfe3
vdev = 0xedef914b5d9eae8d
(end asru)
resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x3a2ca6bebd96cfe3
vdev = 0xedef914b5d9eae8d
(end resource)
(end fault-list[0])
fault-status = 0x3
__ttl = 0x1
__tod = 0x47b9bb51 0x1ef7b430
# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset
# fmadm reset zfs-retire
fmadm: zfs-retire module has been reset
Problem 3: Unrecoverable corruption
Those of you who have attended one of my Boot Camps or Solaris Best Practices training classes know that
House is one of my favorite TV shows - the only one that I watch regularly. And this next example would make a perfect episode. Is it likely to happen? No, but it is so cool when it does :-)
Remember our second pool, pool2? It has the same contents as pool1.
Now, let's do the unthinkable - let's corrupt both halves of the mirror. Surely data loss will follow, but the fact that Solaris stays up and running and can report what happened is pretty spectacular. But it gets so much better than that.
# dd if=/dev/zero of=/dev/dsk/disk3 bs=8192 count=10000 conv=notrunc
# dd if=/dev/zero of=/dev/dsk/disk4 bs=8192 count=10000 conv=notrunc
# zpool scrub pool2
# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.5 0 0 0 0 0 0
disk-transport 0 0 0.0 166.0 0 0 0 0 32b 0
eft 1 0 0.0 3.6 0 0 0 0 1.4M 0
fmd-self-diagnosis 6 0 0.0 0.6 0 0 0 0 0 0
io-retire 1 0 0.0 0.9 0 0 0 0 0 0
snmp-trapgen 8 0 0.0 6.3 0 0 0 0 32b 0
sysevent-transport 0 0 0.0 294.3 0 0 0 0 0 0
syslog-msgs 8 0 0.0 3.9 0 0 0 0 0 0
zfs-diagnosis 1032 1028 0.6 39.7 0 0 93 2 15K 13K
zfs-retire 2 0 0.0 158.5 0 0 0 0 0 0
As before, lots of zfs-diagnosis activity. And two hits to zfs-retire. But we
only have one spare - this should be interesting. Let's see what is happening.
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH Major
Feb 18 13:18:42 c3889bf1-8551-6956-acd4-914474093cd7
Fault class : fault.fs.zfs.vdev.checksum
Description : The number of checksum errors associated with a ZFS device
exceeded acceptable levels. Refer to
http://sun.com/msg/ZFS-8000-GH for more information.
Response : The device has been marked as degraded. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 16 11:38:30 9f7f288c-fea8-e5dd-bf23-c0c9c4e07233 ZFS-8000-GH Major
Feb 18 09:51:49 2ac4568f-4040-cb5d-f3b8-ae3d69e7d713
Feb 18 10:23:07 7c04a6f7-d22a-e467-c44d-80810f27b711
Feb 18 13:18:42 0a1bf156-6968-4956-d015-cc121a866790
Fault class : fault.fs.zfs.vdev.checksum
Description : The number of checksum errors associated with a ZFS device
exceeded acceptable levels. Refer to
http://sun.com/msg/ZFS-8000-GH for more information.
Response : The device has been marked as degraded. An attempt
will be made to activate a hot spare if available.
Impact : Fault tolerance of the pool may be compromised.
Action : Run 'zpool status -x' and replace the bad device.
# zpool status -x
pool: pool2
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:
NAME STATE READ WRITE CKSUM
pool2 DEGRADED 0 0 2.60K
mirror DEGRADED 0 0 2.60K
spare DEGRADED 0 0 2.43K
disk3 DEGRADED 0 0 5.19K too many errors
spare1 DEGRADED 0 0 2.43K too many errors
disk4 DEGRADED 0 0 5.19K too many errors
spares
spare1 INUSE currently in use
errors: 247 data errors, use '-v' for a list
So ZFS tried to bring in a hot spare, but there were insufficient replicas to
be able to reconstruct all of the data. But here is where it gets interesting.
Let's see what zpool status -v says about things.
# zpool status -v
pool: pool1
state: ONLINE
scrub: resilver completed with 0 errors on Mon Feb 18 11:23:13 2008
config:
NAME STATE READ WRITE CKSUM
pool1 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk1 ONLINE 0 0 0
disk2 ONLINE 0 0 0
spares
spare1 INUSE in use by pool 'pool2'
errors: No known data errors
pool: pool2
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: resilver completed with 602 errors on Mon Feb 18 13:20:14 2008
config:
NAME STATE READ WRITE CKSUM
pool2 DEGRADED 0 0 2.60K
mirror DEGRADED 0 0 2.60K
spare DEGRADED 0 0 2.43K
disk3 DEGRADED 0 0 5.19K too many errors
spare1 DEGRADED 0 0 2.43K too many errors
disk4 DEGRADED 0 0 5.19K too many errors
spares
spare1 INUSE currently in use
errors: Permanent errors have been detected in the following files:
/pool2/scenic/cider mill crowds.jpg
/pool2/scenic/Cleywindmill.jpg
/pool2/scenic/csg_Landscapes001_GrandTetonNationalPark,Wyoming.jpg
/pool2/scenic/csg_Landscapes002_ElowahFalls,Oregon.jpg
/pool2/scenic/csg_Landscapes003_MonoLake,California.jpg
/pool2/scenic/csg_Landscapes005_TurretArch,Utah.jpg
/pool2/scenic/csg_Landscapes004_Wildflowers_MountRainer,Washington.jpg
/pool2/scenic/csg_Landscapes!idx011.jpg
/pool2/scenic/csg_Landscapes127_GreatSmokeyMountains-NorthCarolina.jpg
/pool2/scenic/csg_Landscapes129_AcadiaNationalPark-Maine.jpg
/pool2/scenic/csg_Landscapes130_GettysburgNationalPark-Pennsylvania.jpg
/pool2/scenic/csg_Landscapes131_DeadHorseMill,CrystalRiver-Colorado.jpg
/pool2/scenic/csg_Landscapes132_GladeCreekGristmill,BabcockStatePark-WestVirginia.jpg
/pool2/scenic/csg_Landscapes133_BlackwaterFallsStatePark-WestVirginia.jpg
/pool2/scenic/csg_Landscapes134_GrandCanyonNationalPark-Arizona.jpg
/pool2/scenic/decisions decisions.jpg
/pool2/scenic/csg_Landscapes135_BigSur-California.jpg
/pool2/scenic/csg_Landscapes151_WataugaCounty-NorthCarolina.jpg
/pool2/scenic/csg_Landscapes150_LakeInTheMedicineBowMountains-Wyoming.jpg
/pool2/scenic/csg_Landscapes152_WinterPassage,PondMountain-Tennessee.jpg
/pool2/scenic/csg_Landscapes154_StormAftermath,OconeeCounty-Georgia.jpg
/pool2/scenic/Brig_Of_Dee.gif
/pool2/scenic/pvnature14.gif
/pool2/scenic/pvnature22.gif
/pool2/scenic/pvnature7.gif
/pool2/scenic/guadalupe.jpg
/pool2/scenic/ernst-tinaja.jpg
/pool2/scenic/pipes.gif
/pool2/scenic/boat.jpg
/pool2/scenic/pvhawaii.gif
/pool2/scenic/cribgoch.jpg
/pool2/scenic/sun1.gif
/pool2/scenic/sun1.jpg
/pool2/scenic/sun2.jpg
/pool2/scenic/andes.jpg
/pool2/scenic/treesky.gif
/pool2/scenic/sailboatm.gif
/pool2/scenic/Arizona1.jpg
/pool2/scenic/Arizona2.jpg
/pool2/scenic/Fence.jpg
/pool2/scenic/Rockwood.jpg
/pool2/scenic/sawtooth.jpg
/pool2/scenic/pvaptr04.gif
/pool2/scenic/pvaptr07.gif
/pool2/scenic/pvaptr11.gif
/pool2/scenic/pvntrr01.jpg
/pool2/scenic/Millport.jpg
/pool2/scenic/bryce2.jpg
/pool2/scenic/bryce3.jpg
/pool2/scenic/monument.jpg
/pool2/scenic/rainier1.gif
/pool2/scenic/arch.gif
/pool2/scenic/pv-anzab.gif
/pool2/scenic/pvnatr15.gif
/pool2/scenic/pvocean3.gif
/pool2/scenic/pvorngwv.gif
/pool2/scenic/pvrmp001.gif
/pool2/scenic/pvscen07.gif
/pool2/scenic/pvsltd04.gif
/pool2/scenic/banhall28600-04.JPG
/pool2/scenic/pvwlnd01.gif
/pool2/scenic/pvnature08.gif
/pool2/scenic/pvnature13.gif
/pool2/scenic/nokomis.jpg
/pool2/scenic/lighthouse1.gif
/pool2/scenic/lush.gif
/pool2/scenic/oldmill.gif
/pool2/scenic/gc1.jpg
/pool2/scenic/gc2.jpg
/pool2/scenic/canoe.gif
/pool2/scenic/Donaldson-River.jpg
/pool2/scenic/beach.gif
/pool2/scenic/janloop.jpg
/pool2/scenic/grobacro.jpg
/pool2/scenic/fnlgld.jpg
/pool2/scenic/bells.gif
/pool2/scenic/Eilean_Donan.gif
/pool2/scenic/Kilchurn_Castle.gif
/pool2/scenic/Plockton.gif
/pool2/scenic/Tantallon_Castle.gif
/pool2/scenic/SouthStockholm.jpg
/pool2/scenic/BlackRock_Cottage.jpg
/pool2/scenic/seward.jpg
/pool2/scenic/canadian_rockies_csg110_EmeraldBay.jpg
/pool2/scenic/canadian_rockies_csg111_RedRockCanyon.jpg
/pool2/scenic/canadian_rockies_csg112_WatertonNationalPark.jpg
/pool2/scenic/canadian_rockies_csg113_WatertonLakes.jpg
/pool2/scenic/canadian_rockies_csg114_PrinceOfWalesHotel.jpg
/pool2/scenic/canadian_rockies_csg116_CameronLake.jpg
/pool2/scenic/Castilla_Spain.jpg
/pool2/scenic/Central-Park-Walk.jpg
/pool2/scenic/CHANNEL.JPG
In my best Hugh Laurie voice trying to sound very Northeastern American, that is so cool! But we're not even
done yet. Let's take this list of files and restore them - in this case, from pool1. Operationally this
would be from a back up tape or nearline backup cache, but for our purposes, the contents in pool1 will
do nicely.
First, let's clear the zpool error counters and return the spare disk. We want to make sure
that our restore works as desired. Oh, and clear the FMA stats while we're at it.
# zpool clear pool2
# zpool detach pool2 spare1
# fmadm reset zfs-diagnosis
fmadm: zfs-diagnosis module has been reset
# fmadm reset zfs-retire
fmadm: zfs-retire module has been reset
Now individually restore the files that have errors in them and check again. You can even export and reimport
the pool and you will find a very nice, happy, and thoroughly error free ZFS pool. Some rather unpleasant gnashing of
zpool status -v output with awk has been omitted for sanity's sake.
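For the curious, the restore step could look something like the sketch below. This is not the script that was actually run. The function names damaged_files and restore_from are hypothetical, and the sketch assumes that the damaged files are listed as absolute paths after the "Permanent errors" line, and that a good copy lives at the same relative path under another mount point (pool1 in our case).

```shell
#!/bin/sh
# Sketch: pull the damaged file list out of zpool status -v text (stdin).
# Assumes absolute paths follow the "errors: Permanent errors" line, as in
# the listing above. Entries that are not paths are skipped.
damaged_files() {
    awk '
        $1 == "pool:" { listing = 0 }              # next pool ends the list
        /errors: Permanent errors/ { listing = 1; next }
        listing && /^[ \t]*\// { sub(/^[ \t]+/, ""); print }
    '
}

# Sketch: copy each file named on stdin from a healthy tree.
#   $1 = mount point of the damaged pool (for example /pool2)
#   $2 = mount point holding good copies (for example /pool1)
restore_from() {
    while IFS= read -r f; do
        src="$2${f#"$1"}"
        if [ -f "$src" ]; then
            cp -p "$src" "$f"
        else
            echo "no good copy for $f" >&2
        fi
    done
}

# Typical use on a live system (not executed here):
#   zpool status -v pool2 | damaged_files | restore_from /pool2 /pool1
```

Reading whole lines rather than awk fields matters here, since several of the file names contain spaces.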
# zpool scrub pool2
# zpool status pool2
pool: pool2
state: ONLINE
scrub: scrub completed with 0 errors on Mon Feb 18 14:04:56 2008
config:
NAME STATE READ WRITE CKSUM
pool2 ONLINE 0 0 0
mirror ONLINE 0 0 0
disk3 ONLINE 0 0 0
disk4 ONLINE 0 0 0
spares
spare1 AVAIL
errors: No known data errors
# zpool export pool2
# zpool import pool2
# dircmp -s /pool1 /pool2
Conclusions and Review
So what have we learned? ZFS and FMA are two great tastes that taste great together. No, that's chocolate
and peanut butter, but you get the idea. One more great example of Isaac's Multiplicity of Solaris.
That, and I have finally found a good lab exercise for the FMA training materials. Ever since Christine Tran put
the FMA workshop together, we have been looking for some good FMA lab exercises. The materials reference a synthetic
fault generator that is not available in public (for obvious reasons). I haven't explored the FMA test harness
enough to know if there is anything in there that would make a good lab. But this exercise that we have just
explored seems to tie a number of key pieces together.
And of course, one more reason why Roxy says,
"You should run Solaris."
Bob,
What great blogs you write! The knowledge I receive every time I read them is unbelievable.
Thanks
Mark
Posted by Mark Huff on February 20, 2008 at 04:47 PM CST #
Great example of taking a real world problem and showing how Solaris behaves. Believe Roxy says spare1=disk5 ? ;)
Posted by Isaac on February 24, 2008 at 12:37 PM CST #
and thx for the Multiplicity plug!
Posted by isaac on February 24, 2008 at 12:40 PM CST #
I guess I could just ask you directly (you're about 5 meters away from me), but I'm seeing 149 events for zfs-diagnosis on a Thumper, but nothing faulted. I assume these are zfs checksum errors, automagically fixed by the zfs gods?
bash-3.00# fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
cpumem-retire 0 0 0.0 0.0 0 0 0 0 0 0
disk-monitor 0 0 0.0 0.0 0 0 0 0 54K 0
disk-transport 0 0 0.0 20680.8 0 1 0 0 32b 0
eft 0 0 0.0 3.7 0 0 0 0 1.4M 0
fabric-xlate 0 0 0.0 0.0 0 0 0 0 0 0
fmd-self-diagnosis 0 0 0.0 0.0 0 0 0 0 0 0
io-retire 0 0 0.0 0.0 0 0 0 0 0 0
snmp-trapgen 0 0 0.0 0.0 0 0 0 0 0 0
sp-monitor 0 0 0.0 3.6 0 0 0 0 20b 0
sysevent-transport 0 0 0.0 161.7 0 0 0 0 0 0
syslog-msgs 0 0 0.0 0.0 0 0 0 0 32b 0
zfs-diagnosis 149 0 0.0 0.8 0 0 0 0 0 0
zfs-retire 0 0 0.0 0.0 0 0 0 0 0 0
Posted by Charles Soto on April 20, 2009 at 10:51 AM CDT #