
Wednesday November 16, 2005
The End-to-end argument meets ZFS
I'm really a networking and security type at heart. So why am I excited about ZFS?
Back when I was studying for a degree in computer science, I took what
was then (and probably still is) the best undergraduate course in MIT's
CS department: Computer Systems Engineering, better known as "6.033" or just "'033".
A major part of the course was a series of case studies -- we would
read an important paper on a system, write a short analysis, and then
discuss the system in class.
One of the key papers presented was Saltzer, Reed, and Clark's "End-to-End Arguments in System Design".
I'll quote the abstract:
This paper presents a design principle that helps guide placement of
functions among the modules of a distributed computer system. This
principle, called the end-to-end argument, suggests that functions
placed at low levels of a system may be redundant or of little value
when compared with the cost of providing them at that low level.
Examples discussed in the paper include bit error recovery, security
using encryption, duplicate message suppression, recovery from system
crashes, and delivery acknowledgement. Low level mechanisms to
support these functions are justified only as performance
enhancements.
The paper has spawned a lot of debate and more than a few followups
over the years, and interminable arguments about what counts as an end,
but overall I think it's held up pretty well.
Fast forward to a couple of years ago, when I first saw a high-level
overview of the ZFS design. I immediately thought of this paper.
ZFS applies the end-to-end principle to filesystem design.
End-to-end is normally applied to distributed systems, where two
distinct "ends" are communicating with each other, often in real time
or with relatively short delays.
Here, the "ends" are separated mainly by time: one "end" writes data to
the filesystem, and the other "end" expects to get the exact same data
back in the future. (And the "middle" is the storage subsystem,
which these days is itself a complex distributed system.)
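To make the "ends separated by time" idea concrete, here is a minimal Python sketch. It is not ZFS code; FlakyStore and EndToEndWriter are names I made up for illustration, and the choice of SHA-256 is an assumption of the sketch. The point is only that the check lives at the end, outside the storage "middle", so the middle never has to be trusted to report its own corruption.

```python
import hashlib


class FlakyStore:
    """Hypothetical 'middle': a block store that may silently corrupt data."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = bytes(data)

    def read(self, addr):
        return self.blocks[addr]

    def corrupt(self, addr):
        # Flip one bit of a non-empty block to simulate silent corruption.
        data = bytearray(self.blocks[addr])
        data[0] ^= 0x01
        self.blocks[addr] = bytes(data)


class EndToEndWriter:
    """Hypothetical 'end': keeps its own checksums, separate from the store."""

    def __init__(self, store):
        self.store = store
        self.checksums = {}  # held by the end, not by the middle

    def put(self, addr, data):
        self.checksums[addr] = hashlib.sha256(data).digest()
        self.store.write(addr, data)

    def get(self, addr):
        data = self.store.read(addr)
        # The reading end verifies what the writing end recorded.
        if hashlib.sha256(data).digest() != self.checksums[addr]:
            raise IOError("block %r failed end-to-end verification" % (addr,))
        return data
```

With this, a put() followed by a get() returns the original bytes, while calling store.corrupt() in between makes get() raise instead of quietly handing back bad data.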
By placing the functionality required for robustness at a relatively
high layer within the storage stack, ZFS can perform these functions
with reduced overall system cost; you can use a much simpler disk
subsystem to get a desired level of performance, availability and
reliability.
For instance, the filesystem knows for sure which disk blocks are in
use. The disk doesn't. If you replace a disk in a mirror or
RAID-Z group, ZFS only needs to copy the blocks that are currently in
use to the new disk; when lower layers are responsible for redundancy,
you have to copy the whole disk. With the upper layer responsible for
redundancy, the repair
takes less time, and your window of exposure to an additional failure
can be significantly shorter.
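The difference is easy to see in a toy model. The Python sketch below is only an illustration under simplifying assumptions: a "disk" is a list of blocks, the set of in-use addresses is handed in directly, and make_disk, resilver_whole_disk, and resilver_in_use are hypothetical names, not anything from ZFS itself.

```python
def make_disk(n_blocks):
    """Build a toy disk: a list of small byte blocks, one per address."""
    return [bytes([addr % 256]) * 4 for addr in range(n_blocks)]


def resilver_whole_disk(source, target):
    """Lower-layer rebuild: it cannot tell used from free, so it copies all."""
    copied = 0
    for addr in range(len(source)):
        target[addr] = source[addr]
        copied += 1
    return copied


def resilver_in_use(source, target, in_use):
    """Filesystem-driven rebuild: copy only the blocks known to be in use."""
    copied = 0
    for addr in sorted(in_use):
        target[addr] = source[addr]
        copied += 1
    return copied
```

For a 1000-block disk with only four blocks in use, the first function copies 1000 blocks and the second copies 4. Real repair time depends on much more than a block count, but the count is what the upper layer's knowledge buys you.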
I'm hoping this leads to simpler (and cheaper) storage hardware in the
long run -- JBODs seem to be ideal for ZFS, and you can take the
battery-backed NVRAM out of the RAID controllers and give it to the lumberjacks.
(2005-11-16 09:20:06.0)