Sunday July 01, 2007 | The Navel of Narcissus Josh Simons' Coordinates in the Blogosphere |
HPC Consortium: A Customer View of ZFS

Thomas Nau of the University of Ulm gave a customer view of ZFS at Sun's HPC Consortium meeting in Dresden this past week. His talk was titled "ZFS – How safe do you think your data is without?"

Thomas motivated his discussion by asking the audience whether it would be okay to miss the one event in a trillion that would lead to the discovery of a new particle, to lose all of the email on your mail server, to lose access to all of your mp3 or video files, or, perhaps worse, to be unaware of most of the errors that have occurred at your site.

Currently, it is a matter of trust when we store data in a file system. We trust that the disk drives, the controllers, the multiple pieces of firmware, the battery backup, the cabling, the adapters, the operating system, and so on all perform well enough to protect our data. And we hope as well that the "human factor" does not cause data loss. But of course there are things that can and will go wrong: bit rot, phantom writes, DMA errors, driver and firmware bugs, accidental overwrites, misdirected reads and writes, and more.

In addition, because volume managers and file systems are commonly separate pieces of software, the volume manager has no knowledge of the importance of particular pieces of data, for example critical metadata whose loss would lead to the loss of an entire file or file system rather than "just" some data within an individual file. This "all data is equal" view of the file system further increases the vulnerability of stored data.

Thomas then went on to share a case study involving data corruption at the University of Ulm. It was one of those nightmare scenarios involving the loss of email services for the entire university. It wasn't as if they hadn't thought about data protection at Ulm. Their mail service is supplied by a two-node cluster and two disk arrays fully connected through a SAN, with offsite mirrors, regular backups, etc.
And yet, one day one of the email servers panicked with a "freeing free inode" error message. After running fsck on the file system for 10+ hours, during which no user access to the file system was allowed, they felt they had fixed the problem, since fsck had found and fixed several issues. They rebooted and the system crashed within ten seconds. One more fsck and they saw the same crash again.

Because they had been considering and planning a migration to ZFS, they took this opportunity to invest an additional 40 hours to copy all of their email data into a ZFS file system, after rigging up a temporary email server and recovering enough of people's important email to carry them to the following weekend, when they could do the switchover. They have had no problems since moving to ZFS. The Infrastructure Department is still doing a root-cause analysis of this failure, but believes at this point that a power failure about four weeks before the outage may have somehow let their mirrors get out of sync.

As Thomas pointed out, ZFS cannot eliminate hardware problems or change the math concerning the number of failures that will result from the failure rates of the underlying components. And it can't make human decisions smarter. But it can detect and inform you about errors. It will detect out-of-sync mirrors. And it will correct underlying problems if you let it.

Thomas then gave further details on several of ZFS's more prominent safety features, including the ubiquitous use of checksums to provide end-to-end data integrity, the built-in volume manager that allows ZFS to selectively double- or triple-replicate file system metadata depending on its importance, and the copy-on-write approach used by ZFS to avoid ever overwriting valid data that is in use.

For more information on ZFS, go here.

(2007-07-01 03:58:10.0)
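To make the checksum and mirror points from the talk a little more concrete, here is a toy sketch in Python. It is not ZFS code and does not reflect ZFS's on-disk format. The `ToyMirror` class, its method names, and the choice of SHA-256 are all assumptions made for illustration. The one idea it tries to show is that when a block's checksum is stored apart from the block itself, a read can tell which side of a mirror is wrong and repair it from the good copy.

```python
import hashlib


class ToyMirror:
    """Hypothetical two-way mirror; checksums live apart from the data blocks."""

    def __init__(self):
        self.replicas = [{}, {}]  # two pretend disks: block id -> bytes
        self.checksums = {}       # block id -> expected SHA-256 hex digest

    def write(self, block_id, data):
        # Record the checksum separately, then write both sides of the mirror.
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.replicas:
            disk[block_id] = data

    def read(self, block_id):
        # Verify every replica against the stored checksum.
        expected = self.checksums[block_id]
        good = None
        bad_disks = []
        for disk in self.replicas:
            data = disk.get(block_id)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                if good is None:
                    good = data
            else:
                bad_disks.append(disk)
        if good is None:
            raise IOError(f"block {block_id!r}: no replica matches its checksum")
        # Self-heal: overwrite each bad replica with the verified copy.
        for disk in bad_disks:
            disk[block_id] = good
        return good, len(bad_disks)


mirror = ToyMirror()
mirror.write("inode-7", b"mail spool metadata")
mirror.replicas[1]["inode-7"] = b"mail spool metadatA"  # simulate silent corruption
data, repaired = mirror.read("inode-7")
print(data, repaired)
```

A plain mirror without checksums would have no way to know which of the two differing copies to trust. In this sketch the stored checksum breaks the tie, and the read reports how many replicas it had to repair.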