Confessions of an operating systems junkie

Val Henson's weblog


Monday July 12, 2004

Is your software crash-only?

I've resisted starting a weblog for all the usual reasons - lack of time, neophobia, anti-herding instinct - but mostly because the only thing I really wanted to write was a rant about Sun's marketing strategy for ZFS (that's the Zettabyte File System, not Dynamic File Service, DFS, or DynFS), and management doesn't want us writing rants about Sun's marketing, no matter how entertaining they are. Finally, I had a halfway decent idea for a blog topic while talking to a presenter at USENIX '04: my favorite systems papers.

Crash-only Software

This is a short workshop paper that appeared in HotOS IX. From the abstract: "Crash-only programs crash safely and recover quickly. There is only one way to stop such software - by crashing it - and only one way to bring it up - by initiating recovery." As motivation, the authors show that on a system running Red Hat 8.0 with ext3 as the root file system, it is faster to crash and recover (as in a power outage) than to shut down cleanly and restart - 75 seconds to crash and recover versus 104 seconds for a clean reboot, with no "important" data loss in either case. (The irony of this result should be apparent to any systems person.) The authors argue that crash-only systems, which are made up exclusively of crash-only components, are a good choice for certain classes of problems, where the best way to deal with bugs is to simply restart (crash) the component behaving badly.
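To make the idea concrete, here is a minimal sketch in Python. It is not from the paper: the class name, the JSON file format, and the write-to-a-temp-file-then-rename trick are all assumptions chosen for illustration. The point is only that the component has no shutdown method, and that starting it always means running recovery.

```python
import json
import os
import tempfile


class CrashOnlyCounter:
    """Hypothetical crash-only component: it has no shutdown method.

    The only way to stop it is to kill the process; the only way to
    start it is to run recovery, which happens in the constructor.
    """

    def __init__(self, path):
        self.path = path
        self.value = self._recover()

    def _recover(self):
        # The committed file is only ever replaced atomically, so it is
        # either absent or a complete snapshot. Leftover temp files from
        # an interrupted update are simply ignored.
        try:
            with open(self.path) as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            return 0

    def increment(self):
        new_value = self.value + 1
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write the new snapshot to a temp file in the same directory,
        # force it to disk, then atomically rename it over the old one.
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"value": new_value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.value = new_value
        return new_value
```

A crash at any point in increment() leaves either the old snapshot or the new one on disk, never a half-written file, so a restart after a crash takes exactly the same code path as a first start. A real system would also have to worry about syncing the directory entry and cleaning up stale temp files, which this sketch skips.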

The crash-only philosophy is already widespread. The examples most notable to me, as a file systems developer, are Google's clustered file system, GFS (indeed, all of Google's software); NetApp's WAFL file system, used internally in their filers; and ZFS, in which the on-disk state is always self-consistent. This paper simply clarifies the tradeoffs and properties of crash-only software, and, most importantly, introduces a nifty name for the concept. Recommended reading for all programmers.

(2004-07-12 17:11:51.0)
