Wednesday November 16, 2005
Demonstrating ZFS Self-Healing
I'm the kind of guy who likes to tinker. To see under the bonnet. I used to have a go at "fixing" TVs by taking the back off and seeing what could be adjusted (which is kind of anathema to one of the philosophies of ZFS).
So, when I have been presenting and demonstrating ZFS to customers, the thing I really like to show is what ZFS does when I inject "silent data corruption" into one device in a mirrored storage pool.
This is cool, because ZFS does a couple of things that are not done by any comparable product:
This all happens before the data is passed off to the process that asked for it. This is how it looks in slideware:
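As a rough illustration of that read path, here is a toy model in plain shell. It is not how ZFS is implemented: two ordinary files stand in for the two sides of a mirror, a `cksum` value recorded at write time stands in for the block checksum, and the names `side0`, `side1` and `mirror_read` are made up for this sketch. The idea it shows is that a read verifies the checksum, falls back to the other copy on a mismatch, and repairs the damaged copy before handing the data back.

```shell
#!/bin/sh
# Toy model only: plain files stand in for mirror sides, and a cksum
# recorded at write time stands in for the block checksum.
# side0, side1 and mirror_read are hypothetical names for this sketch.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
side0="$work/side0"
side1="$work/side1"

# "Write": store the same data on both sides and remember its checksum.
printf 'important data\n' > "$side0"
cp "$side0" "$side1"
expected=$(cksum < "$side0")

# "Read": find a side whose checksum matches, overwrite any side that
# fails verification with the good copy, then return the good data.
mirror_read() {
    good=""
    for side in "$side0" "$side1"; do
        if [ "$(cksum < "$side")" = "$expected" ]; then
            good=$side
            break
        fi
    done
    [ -n "$good" ] || { echo "no valid copy left" >&2; return 1; }
    for side in "$side0" "$side1"; do
        if [ "$(cksum < "$side")" != "$expected" ]; then
            cp "$good" "$side"          # repair the damaged side
            echo "repaired $side" >&2
        fi
    done
    cat "$good"
}

# Silently damage one side, then read through the checking path.
printf 'garbage garbage\n' > "$side0"
result=$(mirror_read)
echo "$result"
```

The caller only ever sees the good data, and by the time the read returns, both copies match again.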

The key to demonstrating this live is how to inject corruption, without having to apply a magnet or lightning bolt to my disk. Here is my version of such a demonstration:
cleek[bash]# zpool create demo mirror /export/zfs/zd0 /export/zfs/zd1
cleek[bash]# zfs create demo/ccs
cleek[bash]# cp -pr /usr/ccs/bin /demo/ccs
cleek[bash]# zfs list
NAME USED AVAIL REFER MOUNTPOINT
demo 2.57M 231M 9.00K /demo
demo/ccs 2.51M 231M 2.51M /demo/ccs
cleek[bash]# cd /demo/ccs
cleek[bash]# find . -type f -exec cat {} + | cksum
1891695928 2416605
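The `find ... | cksum` pipeline concatenates the contents of every regular file under the current directory and checksums the resulting stream, so a change to any byte in any file shows up as a different result. A minimal sketch of that property on a scratch directory (the `tree_cksum` helper and the file names are made up for illustration):

```shell
#!/bin/sh
set -eu

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT
printf 'alpha\n' > "$tree/a"
printf 'beta\n'  > "$tree/b"

# Hypothetical helper: checksum the concatenated contents of all
# regular files below the given directory.
tree_cksum() {
    (cd "$1" && find . -type f -exec cat {} + | cksum)
}

before=$(tree_cksum "$tree")
same=$(tree_cksum "$tree")      # unchanged tree, same result

printf 'bEta\n' > "$tree/b"     # change one byte, keep the length
after=$(tree_cksum "$tree")

echo "before: $before"
echo "again:  $same"
echo "after:  $after"
```

The result depends on the order in which `find` visits the files, so it is only meaningful for comparing the same tree with itself, which is all this demonstration needs.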
cleek[bash]# dd bs=1024k count=32 conv=notrunc if=/dev/zero of=/export/zfs/zd0
32+0 records in
32+0 records out
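The `conv=notrunc` option matters here: without it, `dd` would truncate the backing file to the 32 MB it writes instead of overwriting that much of it in place. A small sketch of the same technique on a throwaway file, scaled down to kilobytes (the scratch file and the sizes are assumptions for illustration, not part of the demonstration above):

```shell
#!/bin/sh
set -eu

img=$(mktemp)
trap 'rm -f "$img"' EXIT

# Build a 64 KiB scratch file filled with 'x' bytes.
dd if=/dev/zero bs=1024 count=64 2>/dev/null | tr '\0' 'x' > "$img"
size_before=$(( $(wc -c < "$img") ))

# Overwrite the first 32 KiB with zeros, leaving the rest in place.
dd if=/dev/zero of="$img" bs=1024 count=32 conv=notrunc 2>/dev/null
size_after=$(( $(wc -c < "$img") ))

# Count the zeroed bytes and the bytes that survived.
zeros=$(( $(tr -d 'x' < "$img" | wc -c) ))
kept=$(( $(tr -d '\0' < "$img" | wc -c) ))

echo "size before=$size_before after=$size_after zeros=$zeros kept=$kept"
```

The file keeps its size, the front of it is wiped, and nothing about the file itself hints that its contents have changed.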
cleek[bash]# find . -type f -exec cat {} + | cksum
1891695928 2416605
cleek[bash]# zpool status demo
pool: demo
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
demo ONLINE 0 0 0
mirror ONLINE 0 0 0
/export/zfs/zd0 ONLINE 0 0 0
/export/zfs/zd1 ONLINE 0 0 0
The checksum still matches and no errors are reported because ZFS still has all the data for this filesystem cached, so it does not need to read anything from the storage pool's devices.
cleek[bash]# cd /
cleek[bash]# zpool export -f demo
cleek[bash]# zpool import -d /export/zfs demo
cleek[bash]# cd -
/demo/ccs
cleek[bash]# zpool status demo
pool: demo
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool online' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
demo ONLINE 0 0 0
mirror ONLINE 0 0 0
/export/zfs/zd0 ONLINE 0 0
/export/zfs/zd1 ONLINE 0 0 0
cleek[bash]# zpool online demo /export/zfs/zd0
Bringing device /export/zfs/zd0 online
cleek[bash]# find . -type f -exec cat {} + | cksum
1891695928 2416605
Note that my checksum is the same.
cleek[bash]# zpool status
[...]
NAME STATE READ WRITE CKSUM
demo ONLINE 0 0 0
mirror ONLINE 0 0 0
/export/zfs/zd0 ONLINE 0 0
/export/zfs/zd1 ONLINE 0 0 0
Of course, if I wanted to know the instant things happened, I could also use DTrace (in another window):
cleek[bash]# dtrace -n :zfs:zio_checksum_error:entry
dtrace: description ':zfs:zio_checksum_error:entry' matched 1 probe
CPU ID FUNCTION:NAME
0 40650 zio_checksum_error:entry
0 40650 zio_checksum_error:entry
0 40650 zio_checksum_error:entry
0 40650 zio_checksum_error:entry
[...]
Posted at 09:00AM Nov 16, 2005 by timc in Sun | Comments[4]