Thursday Mar 13, 2008
Bob Netherton took a look at my last post on corrupted file recovery (?) and
asked whether I had considered using the noerror option to dd. Yes, I did
experiment with dd and the noerror option.
The noerror option is described in dd(1) as:
noerror
    Does not stop processing on an input error. When an input error
    occurs, a diagnostic message is written on standard error, followed
    by the current input and output block counts in the same format as
    used at completion. If the sync conversion is specified, the missing
    input is replaced with null bytes and processed normally. Otherwise,
    the input block will be omitted from the output.
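One side effect of the sync conversion is worth keeping in mind: it pads every short input block to the block size, and that includes the ordinary short block at the end of a file. A minimal sketch with a scratch file (the temporary directory, the 100-byte input and the 64-byte block size are made up for illustration):

```shell
# Illustration only: conv=sync pads each short input block to bs with nulls.
# The scratch directory and the 100-byte test file are hypothetical.
tmpdir=$(mktemp -d)

# Build a 100-byte input file.
dd if=/dev/zero of="$tmpdir/in" bs=100 count=1 2>/dev/null

# Copy with a 64-byte block size: one full block plus one 36-byte partial block.
dd if="$tmpdir/in" of="$tmpdir/out" bs=64 conv=noerror,sync 2>/dev/null

in_size=$(wc -c < "$tmpdir/in" | tr -d ' ')
out_size=$(wc -c < "$tmpdir/out" | tr -d ' ')
echo "input: $in_size bytes, output: $out_size bytes"

rm -rf "$tmpdir"
```

The copy comes out at a multiple of the block size, so a file recovered with sync may need trimming back to its original length.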
This looks like the perfect solution, rather than my dd and iseek
script. But I didn't post this because, quite simply, I don't really
understand what I get out of it.
Recall that I had a corrupted file which is 2.9 MBytes in size.
Somewhere around 1.1 MBytes into the file, the data is corrupted and
fails the ZFS checksum test.
|
# zpool scrub zpl_slim
# zpool status -v zpl_slim
  pool: zpl_slim
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.
        Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire
        pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h2m with 1 errors on Tue Mar 11 13:12:42 2008
config:

        NAME        STATE     READ WRITE CKSUM
        zpl_slim    DEGRADED     0     0     9
          c2t0d0s0  DEGRADED     0     0     9

errors: Permanent errors have been detected in the following files:

        /mnt/root/lib/amd64/libc.so.1
# ls -ls /mnt/root/lib/amd64/libc.so.1
4667 -rwxr-xr-x 1 root bin 2984368 Oct 31 18:04 /mnt/root/lib/amd64/libc.so.1
|
I attempted to use dd with the noerror flag using several
different block sizes to see what I could come up with. Here are
those results:
|
# for i in 1k 8k 16k 32k 128k 256k 512k
> do
> dd if=libc.so.1 of=/tmp/whii.$i bs=$i conv=noerror
> done
read: I/O error
1152+0 records in
1152+0 records out
...
grond# ls -ls /tmp/whii*
3584 -rw-r--r-- 1 root root 1835008 Mar 13 11:27 /tmp/whii.128k
2464 -rw-r--r-- 1 root root 1261568 Mar 13 11:27 /tmp/whii.16k
2320 -rw-r--r-- 1 root root 1184768 Mar 13 11:27 /tmp/whii.1k
4608 -rw-r--r-- 1 root root 2359296 Mar 13 11:27 /tmp/whii.256k
2624 -rw-r--r-- 1 root root 1343488 Mar 13 11:27 /tmp/whii.32k
7168 -rw-r--r-- 1 root root 3670016 Mar 13 11:27 /tmp/whii.512k
2384 -rw-r--r-- 1 root root 1220608 Mar 13 11:27 /tmp/whii.8k
|
hmmm... all of these files are of
different sizes, so I'm really unsure what I've ended up with. None
of them are the same size as the original file, which is a bit
unexpected.
|
# dd if=libc.so.1 of=/tmp/whaa.1k bs=1k conv=noerror
read: I/O error
1152+0 records in
1152+0 records out
read: I/O error
1153+0 records in
1153+0 records out
read: I/O error
1154+0 records in
1154+0 records out
read: I/O error
1155+0 records in
1155+0 records out
read: I/O error
1156+0 records in
1156+0 records out
read: I/O error
1157+0 records in
1157+0 records out
# ls -ls /tmp/whaa.1k
2320 -rw-r--r-- 1 root root 1184768 Mar 13 11:12 /tmp/whaa.1k
|
hmmm... well, dd
did copy some of the file, but seemed to give up after around 5
attempts and I only seemed to get the first 1.1 MBytes of the file.
What is going on here? A quick look at the dd
source (open source is a good thing) shows that there is a
definition of BADLIMIT which is how many times dd
will try before giving up. The default compilation sets BADLIMIT to
5. Aha! A quick download of the dd
code and I set BADLIMIT to be really huge and tried again.
|
# bigbaddd if=libc.so.1 of=/tmp/whbb.1k bs=1k conv=noerror
read: I/O error
1152+0 records in
1152+0 records out
...
read: I/O error
3458+0 records in
3458+0 records out
^C
I give up
# ls -ls /tmp/whbb.1k
6920 -rw-r--r-- 1 root root 3543040 Mar 13 11:47 /tmp/whbb.1k
|
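Going back to the block-size loop: with BADLIMIT at its default of 5, the seven file sizes in that listing fit a simple pattern. Each one equals the number of whole blocks that fit before the first failing read, plus five more records, times the block size. The sketch below is only arithmetic on the numbers in the listings, not a statement about dd internals; the variable and function names are invented, and the failure offset of 1179648 bytes is taken from the "1152+0 records" reported at bs=1k.

```shell
# Arithmetic check on the listings above, not a claim about dd internals.
# Assumption: reads start failing at byte 1179648 (1152 records of 1k) and
# dd counts 5 more records (BADLIMIT) before giving up.
first_bad=1179648
badlimit=5

predict_size() {
    # $1 = block size in bytes
    echo $(( (first_bad / $1 + badlimit) * $1 ))
}

for kb in 1 8 16 32 128 256 512
do
    echo "${kb}k: $(predict_size $((kb * 1024)))"
done
```

The predicted sizes match the ls output for every block size, which would explain why none of the files has the same size as any other, or as the original.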
As dd processes the input file, it doesn't really do a seek, so it can't
get past the corruption. It is getting something, because od shows that
the end of the whbb.1k file is not full of nulls. But I don't believe
this data is in a form which could be useful, and I can't explain why
the new file is much larger than the original. I suspect that dd gets
stuck at the corrupted area and does not seek beyond it. In any case, it
appears that letting dd do the dirty work by itself will not achieve the
desired results.
This is, of course, yet another opportunity...
Wednesday Mar 12, 2008
I was RASing around with ZFS the other day, and managed to find a
file which was corrupted.
|
# zpool scrub zpl_slim
# zpool status -v zpl_slim
  pool: zpl_slim
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.
        Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire
        pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h2m with 1 errors on Tue Mar 11 13:12:42 2008
config:

        NAME        STATE     READ WRITE CKSUM
        zpl_slim    DEGRADED     0     0     9
          c2t0d0s0  DEGRADED     0     0     9

errors: Permanent errors have been detected in the following files:

        /mnt/root/lib/amd64/libc.so.1
# ls -ls /mnt/root/lib/amd64/libc.so.1
4667 -rwxr-xr-x 1 root bin 2984368 Oct 31 18:04 /mnt/root/lib/amd64/libc.so.1
|
argv! Of course, this particular file is easily extracted from the
original media; it doesn't contain anything unique. For those who might
be concerned that it is the C runtime library, and thus very critical to
running Solaris, the machine in use is only 32-bit, so the 64-bit (amd64)
version of this file is never used. But suppose this were an important
file for me and I wanted to recover something from it? This is a more
interesting challenge...
First, let's review a little bit about
how ZFS works. By default, when ZFS writes anything, it generates a
checksum which is recorded someplace else, presumably safe.
Actually, the checksum is recorded at least twice, just to be doubly
sure it is correct. And that record is also checksummed. Back to the
story, the checksum is computed on a block, not for the whole file.
This is an important distinction which will come into play later. If
we perform a storage pool scrub, ZFS will find the broken file and
report it to you (see above), which is a good thing -- much better
than simply ignoring it, like many other file systems will do.
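To make the block-versus-file distinction concrete, here is a rough sketch that checksums a file in 128k pieces with cksum(1). It is only an analogy: ZFS keeps its own checksums in its own metadata and does not use cksum. The helper name block_sums is invented, and the sketch uses the portable skip= operand (Solaris dd also accepts iseek=).

```shell
# Analogy only: ZFS stores its own checksums in its metadata; this sketch uses
# cksum(1) to show what "one checksum per block" buys you.
# block_sums is a hypothetical helper name.
block_sums() {
    # $1 = file, $2 = block size in bytes (defaults to 128k)
    file=$1
    bs=${2:-131072}
    size=$(wc -c < "$file" | tr -d ' ')
    nblocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$nblocks" ]
    do
        # Checksum block i on its own and print "index checksum".
        sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | cksum | awk '{print $1}')
        echo "$i $sum"
        i=$((i + 1))
    done
}
```

Running block_sums on a good copy and on a damaged copy and diffing the two outputs points at the damaged blocks only, whereas a single whole-file checksum can only say that something, somewhere, is wrong.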
OK, so we know that somewhere in the
midst of this 2.8 MByte file, we have some corruption. But can we at
least recover the bits that aren't corrupted? The answer is yes.
But if you try a copy, then it bails with an error.
|
# cp /mnt/root/lib/amd64/libc.so.1 /tmp
/mnt/root/lib/amd64/libc.so.1: I/O error
|
Since the copy was not successful,
there is no destination file, not even a partial file. It turns out
that cp
uses mmap(2) to map the
input file and copies it to the output file with a big write(2).
Since the write doesn't complete correctly, it complains and removes
the output file. What we need is something less clever, dd.
|
# dd if=/mnt/root/lib/amd64/libc.so.1 of=/tmp/whee
read: I/O error
2304+0 records in
2304+0 records out
# ls -ls /tmp/whee
2304 -rw-r--r-- 1 root root 1179648 Mar 12 18:53 /tmp/whee
|
OK, from this experiment we know that
we can get about 1.2 MBytes by directly copying with dd. But this
isn't all, or even half of the file. We can get a little more clever
than that. To make it simpler, I wrote a little ksh
script:
|
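#!/bin/ksh
# usage: getaround.ksh input-file output-prefix
integer i=0
while ((i < 23))                # 23 blocks of 128k cover the 2984368-byte file
do
        typeset -RZ2 j=$i       # two-digit, zero-padded suffix: 00, 01, ...
        dd if=$1 of=$2.$j bs=128k iseek=$i count=1
        i=i+1
done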
|
This script will write each of the first 23 128kByte blocks from the
first argument (a file) to a unique filename, formed by appending the
block number to the second argument. dd is really dumb and doesn't offer
much error handling, which is why I hardwired the count into the script.
An enterprising soul with a little bit of C programming skill could do
something more complex which handles the more general case. OK, that was
difficult to understand, and I wrote it. To demonstrate (I first
apologize for the redundant verbosity):
|
# ./getaround.ksh libc.so.1 /tmp/zz
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
read: I/O error
0+0 records in
0+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
1+0 records in
1+0 records out
0+1 records in
0+1 records out
# ls -ls /tmp/zz.*
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.00
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.01
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.02
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.03
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.04
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.05
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.06
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.07
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.08
  0 -rw-r--r-- 1 root root      0 Mar 12 19:00 /tmp/zz.09
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.10
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.11
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.12
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.13
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.14
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.15
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.16
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.17
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.18
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.19
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.20
256 -rw-r--r-- 1 root root 131072 Mar 12 19:00 /tmp/zz.21
200 -rw-r--r-- 1 root root 100784 Mar 12 19:00 /tmp/zz.22
|
So we can clearly see that the 10th
(128kByte) block is corrupted, but the rest of the blocks are ok. We
can now reassemble the file with a zero-filled block.
|
# dd if=/dev/zero of=/tmp/zz.09 bs=128k count=1
1+0 records in
1+0 records out
# cat /tmp/zz.* > /tmp/zz
# ls -ls /tmp/zz
5832 -rw-r--r-- 1 root root 2984368 Mar 12 19:03 /tmp/zz
|
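A quick sanity check on the numbers in the listings, as plain shell arithmetic. The plain dd copy stopped after 2304 records at dd's default 512-byte block size, which is the 1179648 bytes of /tmp/whee and also exactly nine 128k blocks, consistent with zz.09 being the piece that came back empty. The variable names are only for illustration.

```shell
# Sanity-check the sizes reported in the listings above.
bs=131072                       # 128k block size used by the script

# Plain dd copied 2304 records of 512 bytes before the I/O error.
copied=$((2304 * 512))
echo "bytes copied by plain dd: $copied"
echo "whole 128k blocks before the error: $((copied / bs))"

# 22 full blocks (zz.00 through zz.21, with zz.09 zero-filled) plus the tail.
total=$((22 * bs + 100784))
echo "reassembled size: $total"
```

The reassembled size works out to 2984368 bytes, the same as the original file.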
Now I have recreated the file with a
zero-filled hole where the data corruption was. Just for grins, if
you try to compare with the previous file, you should get what you
expect.
|
# cmp libc.so.1 /tmp/zz
cmp: EOF on libc.so.1
|
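The script above hardwires 23 blocks. A more general variant might derive the block count from the file size and write straight into a single output file, leaving zeros wherever a block cannot be read. The following is a sketch in portable sh under those assumptions, not the script used in this post: recover_blocks is an invented name, it uses the skip= and seek= operands (Solaris dd also spells these iseek= and oseek=), and it treats any block that comes back shorter than expected as unreadable.

```shell
# Sketch only: recover_blocks is a hypothetical helper, not the author's script.
# It copies a file block by block and leaves zeros where a block cannot be read.
recover_blocks() {
    # $1 = source file, $2 = destination file, $3 = block size (default 128k)
    src=$1
    dst=$2
    bs=${3:-131072}
    # Take the size from the directory entry so the file data is not read here.
    size=$(ls -ln "$src" | awk '{print $5}')
    nblocks=$(( (size + bs - 1) / bs ))
    piece=$(mktemp)
    bad=0
    : > "$dst"
    i=0
    while [ "$i" -lt "$nblocks" ]
    do
        # The last block may be short; every other block is a full bs bytes.
        want=$(( size - i * bs ))
        [ "$want" -gt "$bs" ] && want=$bs

        # Read block i on its own, so one failure cannot affect the others.
        dd if="$src" of="$piece" bs="$bs" skip="$i" count=1 2>/dev/null
        got=$(ls -ln "$piece" | awk '{print $5}')
        if [ "$got" -ne "$want" ]
        then
            # Unreadable or short block: substitute zeros of the right length.
            dd if=/dev/zero of="$piece" bs="$want" count=1 2>/dev/null
            bad=$((bad + 1))
        fi

        # Place the piece at its original offset in the output file.
        dd if="$piece" of="$dst" bs="$bs" seek="$i" conv=notrunc 2>/dev/null
        i=$((i + 1))
    done
    rm -f "$piece"
    echo "$nblocks blocks, $bad zero-filled"
}
```

It would be called as recover_blocks libc.so.1 /tmp/zz. Reading each block with its own dd invocation is the same trick as the ksh script: every read starts from an explicit offset, so a failure in one block does not stop the rest.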
How is this useful?
Personally, I'm not sure this will be
very useful for many corruption cases. As a RAS guy, I advocate many
verified copies of important data placed on diverse systems and
media. But most folks aren't so inclined. Every time we talk about
this on the zfs-discuss alias, somebody will say that they don't care
about corruption in the middle of their mp3 files. I'm no audiophile,
but I prefer my mp3s to be hole-less. So I did this little exercise
to show how you can regain full access to the non-corrupted bits of a
corrupted file in a more-or-less easy way. Consider this a proof of
concept. There are many possible variations, such as filling with
spaces instead of nulls
when you are missing parts of a text file -- opportunities abound.
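For the space-filled variation, one way to build a 128k filler block is to pipe /dev/zero through tr. This is a small sketch; the filler path is made up for illustration.

```shell
# Build a 128k block of spaces instead of nulls (for text files).
# The filler path is made up for illustration.
filler=${TMPDIR:-/tmp}/space_block.$$
dd if=/dev/zero bs=128k count=1 2>/dev/null | tr '\000' ' ' > "$filler"
ls -l "$filler"
```

Redirecting the same pipeline to /tmp/zz.09 instead would drop the spaces into the hole before the pieces are concatenated.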
Don't forget 'conv=sync'; that may help. (otherw...
The ",sync" part is important in the con...
I agree that if dd actually handled the EIO proper...