Eric Kustarz's Weblog

e-street


Friday February 02, 2007

 corrupted files and 'zpool status -v'

If ZFS detects a checksum error or a read I/O failure and is not able to correct it (say, by successfully reading from the other side of a mirror), then it keeps a log of the objects that are permanently damaged (perhaps due to silent corruption).

Previously (that is, before snv_57), the output we gave was only somewhat useful:

# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE      26     0     0
          c1t1d0s7  ONLINE      12     0     0
          c1t1d0s6  ONLINE      14     0     0

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          0x0      0x13    lvl=0 blkid=0
          0x5      0x4     lvl=0 blkid=0
          0x17     0x4     lvl=0 blkid=0
          0x1d     0x4     lvl=0 blkid=0
          0x24     0x5     lvl=0 blkid=0
          0x2a     0x4     lvl=0 blkid=0
          0x2a     0x6     lvl=0 blkid=0
          0x30     0x4     lvl=0 blkid=0
          0x36     0x0     lvl=0 blkid=2

If you were lucky, the DATASET object number would actually get converted into a dataset name. If it wasn't, you had to use zdb(1M) to figure out the dataset name and mountpoint. After that, you had to use the '-inum' option to find(1) to figure out what the actual file was (see the opensolaris thread on it). While it is really powerful to have this ability at all, it would be much nicer to have ZFS do all the dirty work for you. We are, after all, shooting for easy administration.
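For the curious, here is roughly what that find(1) '-inum' step boils down to, as a minimal Python sketch. It assumes the object number from the OBJECT column shows up as the file's inode number (which is what the '-inum' trick relies on) and that you already know the dataset's mountpoint. The function name find_by_object_number is made up for this example and is not part of any ZFS tool.

```python
import os


def _lstat_or_none(path):
    """lstat a path, returning None if it vanished or is unreadable."""
    try:
        return os.lstat(path)
    except OSError:
        return None


def find_by_object_number(mountpoint, object_number):
    """Return paths under mountpoint whose inode number equals object_number.

    Hypothetical helper: a rough equivalent of
    'find <mountpoint> -mount -inum <n>'.
    """
    root_dev = os.lstat(mountpoint).st_dev
    matches = []
    for dirpath, dirnames, filenames in os.walk(mountpoint):
        keep = set()
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            st = _lstat_or_none(path)
            # Skip entries on other filesystems (e.g. child datasets),
            # since inode numbers are only unique within one filesystem.
            if st is None or st.st_dev != root_dev:
                continue
            if st.st_ino == object_number:
                matches.append(path)
            keep.add(name)
        # Prune directories we skipped so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d in keep]
    return matches
```

The OBJECT column is hex, so convert it first: something like find_by_object_number("/monkey", int("0x4", 16)), assuming the dataset in question is mounted at /monkey. Objects that are not files or directories (metadata objects, for instance) won't turn up this way.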

With the putback of 6410433 ('zpool status -v' would be more useful with filenames), observability has greatly improved:

# zpool status -v
  pool: monkey
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        monkey      ONLINE      24     0     0
          c1t1d0s6  ONLINE      10     0     0
          c1t1d0s7  ONLINE      14     0     0

errors: Permanent errors have been detected in the following files:

        /monkey/a.txt
        /monkey/bananas/b.txt
        /eric/c.txt
        /monkey/sub/dir/d.txt
        monkey/ghost:/e.txt
        monkey/ghost:/boo/f.txt
        monkey/dnode:<0x0>
        <metadata>:<0x13>

For the listings above, we attempt to print out the full path to the file. If we successfully find the full path and the dataset is mounted, then we print out the full path with a leading "/" (such as in the "/monkey/a.txt" example above). If we successfully find it but the dataset is not mounted, then we print out the dataset name (no leading "/"), followed by a colon and the path within the dataset to the file (see the "monkey/ghost:/e.txt" example above).

If we can't translate the object number to a file path (either because of an error or because the object doesn't have a real file path associated with it, as is the case for, say, a dnode_t), then we print out the dataset name followed by the object's number (as in the "monkey/dnode:<0x0>" case above). If an object in the MOS (the Meta Object Set) gets corrupted, then we print out the special tag <metadata>, followed by the object number.
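If you want to script against the new error list, the four forms described above are easy to tell apart. Here is a minimal Python sketch that classifies one line from the "errors:" section. The function name classify_error_entry and the "kind" labels are invented for this example, and the parsing rules are inferred only from the sample output above, so treat it as a starting point rather than a spec of the format.

```python
import re

# Matches "<dataset>:<0xNN>", e.g. "monkey/dnode:<0x0>" or "<metadata>:<0x13>".
OBJECT_RE = re.compile(r"^(?P<dataset>.+):<(?P<obj>0x[0-9a-fA-F]+)>$")


def classify_error_entry(entry):
    """Classify one entry from the 'errors:' list of 'zpool status -v'.

    Hypothetical helper; the rules are inferred from the sample output.
    """
    entry = entry.strip()

    # Mounted dataset: full path with a leading "/".
    if entry.startswith("/"):
        return {"kind": "mounted_path", "path": entry}

    # No path available: dataset name (or <metadata>) plus object number.
    match = OBJECT_RE.match(entry)
    if match:
        obj = int(match.group("obj"), 16)
        if match.group("dataset") == "<metadata>":
            return {"kind": "mos_object", "object": obj}
        return {"kind": "dataset_object",
                "dataset": match.group("dataset"),
                "object": obj}

    # Unmounted dataset: "dataset:/path/within/dataset".
    dataset, sep, path = entry.partition(":")
    if sep and path.startswith("/"):
        return {"kind": "unmounted_path", "dataset": dataset, "path": path}

    return {"kind": "unknown", "raw": entry}
```

For example, classify_error_entry("monkey/ghost:/e.txt") comes back tagged as an unmounted path in dataset "monkey/ghost", while "<metadata>:<0x13>" comes back as MOS object 19.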

Couple this with background scrubbing and you have very impressive fault management and observability. What other filesystem/storage system can give you this ability?

Note: these changes are in snv_57, will hopefully make s10u4, and perhaps even Leopard :)

If you're stuck on old bits (without the above-mentioned changes) and are trying to figure out how to translate object numbers to filenames, then check out this thread.




