Holy smokes! A holey file!
Wednesday Mar 12, 2008
I was RASing around with ZFS the other day, and managed to find a file which was corrupted.
    # zpool scrub zpl_slim
    errors: Permanent errors have been detected in the following files:
    # ls -ls /mnt/root/lib/amd64/libc.so.1
argv! Of course, this particular file is easily extracted from the original media; it doesn't contain anything unique. For those who might be concerned that it is the C runtime library, and thus very critical to running Solaris: the machine in use is only 32-bit, so the 64-bit (amd64) version of this file is never used. But suppose this were an important file for me and I wanted to recover something from it? This is a more interesting challenge...
First, let's review a little bit about how ZFS works. By default, when ZFS writes anything, it generates a checksum which is recorded someplace else, presumably safe. Actually, the checksum is recorded at least twice, just to be doubly sure it is correct. And that record is also checksummed. Back to the story: the checksum is computed per block, not for the whole file. This is an important distinction which will come into play later. If we perform a storage pool scrub, ZFS will find the broken file and report it to you (see above), which is a good thing -- much better than simply ignoring it, as many other file systems do.
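To make the per-block point concrete, here is a minimal sketch that prints one checksum per block of an ordinary file. It is only an illustration: the function name `block_sums` is made up, `cksum` stands in for whatever checksum ZFS actually uses, and ZFS keeps its checksums in its own metadata rather than computing them on demand like this.

```shell
# block_sums FILE [BLOCK_SIZE]
# Hypothetical helper: print "index checksum" for every block of FILE.
# cksum is a stand-in; this is not how ZFS computes or stores checksums.
block_sums() {
    file=$1
    bs=${2:-131072}                       # assume 128 KiB blocks by default
    size=$(wc -c < "$file")
    nblocks=$(( (size + bs - 1) / bs ))   # round up to cover a short last block
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Read exactly one block and keep only the CRC field of cksum's output.
        sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | cksum | cut -d' ' -f1)
        echo "$i $sum"
        i=$((i + 1))
    done
}
```

Running `block_sums` on a good copy and a damaged copy of the same file and diffing the two listings shows which block indexes differ, which is the same kind of localization that lets us salvage everything outside the bad block.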
OK, so we know that somewhere in the midst of this 2.8 MByte file, we have some corruption. But can we at least recover the bits that aren't corrupted? The answer is yes. But if you try a copy, then it bails with an error.
    # cp /mnt/root/lib/amd64/libc.so.1 /tmp
Since the copy was not successful, there is no destination file, not even a partial file. It turns out that cp uses mmap(2) to map the input file and copies it to the output file with a big write(2). Since the write doesn't complete correctly, it complains and removes the output file. What we need is something less clever, dd.
    # dd if=/mnt/root/lib/amd64/libc.so.1 of=/tmp/whee
OK, from this experiment we know that we can get about 1.2 MBytes by directly copying with dd. But this isn't all, or even half of the file. We can get a little more clever than that. To make it simpler, I wrote a little ksh script:
    #!/bin/ksh
This script writes each of the first 23 128kByte blocks of the file named by the first argument to its own output file, whose name is the second argument with the block number appended. dd is really dumb and doesn't offer much error handling, which is why I hardwired the count into the script. An enterprising soul with a little bit of C programming skill could do something more complex which handles the more general case. OK, that was difficult to understand, and I wrote it. To demonstrate (and I apologize in advance for the redundant verbosity):
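For readers who want to try the idea, the following is a rough sketch of a block-by-block extractor along the lines described. It is not the original getaround.ksh: the function name `extract_blocks`, the status messages, and the optional block-size argument are assumptions made for illustration. The two-digit, zero-based suffix is chosen to match the /tmp/zz.09 name used later in this post.

```shell
# extract_blocks SRC PREFIX NBLOCKS [BLOCK_SIZE]
# Hypothetical sketch: copy each block of SRC to PREFIX.NN, one dd per block,
# so a read error in one block does not stop the others from being copied.
extract_blocks() {
    src=$1
    prefix=$2
    nblocks=$3                 # block count is passed in, not detected
    bs=${4:-131072}            # assume 128 KiB blocks by default
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        out=$(printf '%s.%02d' "$prefix" "$i")
        # skip=$i seeks past the first i blocks; count=1 copies just one.
        if dd if="$src" of="$out" bs="$bs" skip="$i" count=1 2>/dev/null; then
            echo "block $i: ok"
        else
            echo "block $i: read error"
        fi
        i=$((i + 1))
    done
}
```

A call such as `extract_blocks libc.so.1 /tmp/zz 23` would mirror the hardwired count of 23 described above. Passing the count as an argument is a small generalization, but the caller still has to know how many blocks the file has.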
    # ./getaround.ksh libc.so.1 /tmp/zz
So we can clearly see that the 10th (128kByte) block, number 09 when counting from zero, is corrupted, but the rest of the blocks are OK. We can now reassemble the file with a zero-filled block.
    # dd if=/dev/zero of=/tmp/zz.09 bs=128k count=1
Now I have recreated the file with a zero-filled hole where the data corruption was. Just for grins, if you try to compare with the previous file, you should get what you expect.
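The reassembly step can be sketched as a small function. This is an illustrative guess at the procedure, not the exact commands used above: the name `reassemble` and its arguments are made up, and it assumes the pieces are named PREFIX.NN with two-digit suffixes.

```shell
# reassemble PREFIX BAD_SUFFIX OUT [BLOCK_SIZE]
# Hypothetical sketch: zero-fill the unreadable piece, then join all pieces.
reassemble() {
    prefix=$1
    bad=$2                     # two-digit suffix of the bad piece, e.g. 09
    out=$3
    bs=${4:-131072}            # assume 128 KiB blocks by default
    # Overwrite the damaged piece with exactly one block of zeros.
    dd if=/dev/zero of="$prefix.$bad" bs="$bs" count=1 2>/dev/null
    # Two-digit suffixes sort lexically, so the glob expands in block order.
    cat "$prefix".[0-9][0-9] > "$out"
}
```

For example, `reassemble /tmp/zz 09 /tmp/zz+` would produce a file of the original length as long as the bad block is not the final, possibly short, block. If it is the last one, the zero fill would need to be trimmed to the true remaining size.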
    # cmp libc.so.1 /tmp/zz+
How is this useful?
Personally, I'm not sure this will be very useful for many corruption cases. As a RAS guy, I advocate many verified copies of important data placed on diverse systems and media. But most folks aren't so inclined. Every time we talk about this on the zfs-discuss alias, somebody will say that they don't care about corruption in the middle of their mp3 files. I'm no audiophile, but I prefer my mp3s to be hole-less. So I did this little exercise to show how you can regain full access to the non-corrupted bits of a corrupted file in a more-or-less easy way. Consider this a proof of concept. There are many possible variations, such as filling with spaces instead of nulls when you are missing parts of a text file -- opportunities abound.
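As one example of such a variation, a space-filled block can be produced by translating the nulls from /dev/zero into spaces. This is only a sketch; the function name `space_block` and the optional block-size argument are assumptions.

```shell
# space_block OUT [BLOCK_SIZE]
# Hypothetical sketch: write one block consisting entirely of spaces to OUT.
space_block() {
    out=$1
    bs=${2:-131072}            # assume 128 KiB blocks by default
    # dd emits one block of NUL bytes; tr rewrites each NUL as a space.
    dd if=/dev/zero bs="$bs" count=1 2>/dev/null | tr '\0' ' ' > "$out"
}
```

Calling `space_block /tmp/zz.09` in place of the zero-fill dd would leave a run of blanks in the reassembled text file rather than a run of nulls, which many editors and pagers handle more gracefully.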

Nice post Richard!
You could also use 'conv=noerror,sync', which repl...
Yes, which is why my next post goes into some deta...