Matthew Ahrens' Weblog

what I have to say


Monday August 30, 2004

What is ZFS?

It occurs to me that before I can really talk much about ZFS, you need to know what it is, and how it's generally arranged. So here's an overview of what ZFS is, reproduced from our internal webpage:

ZFS is a vertically integrated storage system that provides end-to-end data integrity, immense (128-bit) capacity, and very simple administration.

To applications, ZFS looks like a standard POSIX filesystem. No porting is required.

To administrators, ZFS presents a pooled storage model that completely eliminates the concept of volumes and the associated problems of partition management, provisioning, and filesystem grow/shrink. Thousands or even millions of filesystems can all draw from a common storage pool, each one consuming only as much space as it actually needs. Moreover, the combined I/O bandwidth of all devices in the pool is available to all filesystems at all times.

All operations are copy-on-write transactions, so the on-disk state is always valid. There is no need to fsck(1M) a ZFS filesystem, ever.

Every block is checksummed to prevent silent data corruption, and the data is self-healing in mirrored or RAID configurations. When one copy is damaged, ZFS detects it (via the checksum) and uses another copy to repair it.
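To make the detect-and-repair idea concrete, here is a minimal toy sketch in Python. It is not ZFS code: the `ToyMirror` class, its method names, and the choice of SHA-256 are all assumptions made for illustration. It only shows the shape of the logic: verify each copy against a checksum on read, and use a good copy to overwrite a bad one.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Return a SHA-256 digest; the hash choice is purely illustrative."""
    return hashlib.sha256(data).digest()


class ToyMirror:
    """Hypothetical N-way mirror holding a single block (not real ZFS)."""

    def __init__(self, copies: int = 2):
        self.copies = [b""] * copies   # what each "disk" currently holds
        self.expected = checksum(b"")  # in this toy, kept apart from the data

    def write(self, data: bytes) -> None:
        self.copies = [data] * len(self.copies)
        self.expected = checksum(data)

    def read(self) -> bytes:
        for candidate in self.copies:
            if checksum(candidate) == self.expected:
                # Found a good copy: overwrite every copy with it,
                # which repairs any damaged ones.
                self.copies = [candidate] * len(self.copies)
                return candidate
        raise IOError("all copies failed checksum verification")


mirror = ToyMirror()
mirror.write(b"important data")
mirror.copies[0] = b"importent data"  # simulate silent corruption on one disk
assert mirror.read() == b"important data"
assert mirror.copies[0] == b"important data"  # the damaged copy was repaired
```

Without the checksum, a reader that happened to hit the first disk would have returned the corrupted bytes and never known.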

ZFS provides an unlimited number of snapshots, which are created in constant time and do not require additional copies of any data. Snapshots provide point-in-time copies of the data for backups and end-user recovery from fat-finger mistakes.
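One way to see why a snapshot can be constant-time when nothing is ever overwritten in place is the toy Python sketch below. Everything in it (`ToyFilesystem`, the flat tuple of blocks) is a hypothetical simplification rather than how ZFS is actually laid out. In particular, each write here copies the whole tuple of block references, where a real copy-on-write tree would copy far less.

```python
class ToyFilesystem:
    """Hypothetical filesystem whose state is an immutable tuple of blocks."""

    def __init__(self):
        self.live = ()        # current version of the filesystem
        self.snapshots = {}   # snapshot name -> saved version

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: build a new version instead of modifying the old one.
        blocks = list(self.live)
        while len(blocks) <= index:
            blocks.append(b"")
        blocks[index] = data
        self.live = tuple(blocks)

    def snapshot(self, name: str) -> None:
        # Constant time: remember a reference to the current version.
        # No block data is copied.
        self.snapshots[name] = self.live


fs = ToyFilesystem()
fs.write(0, b"thesis draft")
fs.write(1, b"notes")
fs.snapshot("monday")
fs.write(0, b"oops, overwritten")  # the fat-finger mistake

assert fs.snapshots["monday"][0] == b"thesis draft"  # old data still readable
assert fs.snapshots["monday"][1] is fs.live[1]       # unchanged block is shared
```

The snapshot costs nothing extra until the live filesystem diverges from it, and even then only the changed blocks take up new space.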

ZFS provides clones, which provide a fast, space-efficient way of making "copies" of filesystems. Clones are extremely useful when many almost-identical "copies" of a set of data are required -- for example, multiple code sourcebases, one for each engineer or bug being fixed; or multiple system images, one for each zone or netboot-ed machine.

ZFS also provides quotas, to limit space consumption; reservations, to guarantee availability of space in the future; and compression, to reduce both disk space and I/O bandwidth requirements. It supports the full range of NFSv4/NT-style ACLs as well.

What exactly is a "pooled storage model"? Basically, it means that rather than stacking one filesystem on top of one volume on top of some disks, you stack many filesystems on top of one storage pool on top of lots of disks. Take a home directory server with a few thousand users and a few terabytes of data. Traditionally, you'd probably set it up so that there are a few filesystems, each a few hundred gigabytes, and put a couple hundred users on each filesystem.

That seems odd -- why is there an arbitrary grouping of users into filesystems? It would be more logical to have either one filesystem for all users, or one filesystem for each user. In the traditional model we can rule out the latter, because it would require that we statically partition our storage and decide up front how much space each user gets -- ugh. Using one big filesystem would be plausible, but performance may become a problem with large filesystems -- both common run-time performance and performance of administrative tasks. Many backup tools are filesystem-based. The run-time of fsck(1M) grows faster than linearly with the size of the filesystem, so it could take a lot longer to fsck that one 8TB filesystem than it would to fsck 80 100GB filesystems. Furthermore, some filesystems simply don't support more than a terabyte or so of storage.

It's inconvenient to run out of space in a traditional filesystem, and it happens all too often. You might have lots of free space in a different filesystem, but you can't easily use it. You could manually migrate users to different filesystems to balance the free space... (hope your users don't mind downtime! hope you find the right backup tape when it comes time to restore!) Eventually you'll have to install new disks, make a new volume and filesystem out of them, and then migrate some users over to the new filesystem, incurring downtime. I experienced these kinds of problems with my home directory when I was attending school (using VxFS on VxVM on Solaris), and they still plague some home directory and other file servers at Sun (using UFS on SVM on Solaris).

With ZFS, you can have one storage pool which encompasses all the storage attached to your server. Then you can easily create one filesystem for each user. When you run low on storage, simply attach more disks and add them to the pool. No downtime. This is the scenario on the home directory server that I use at Sun, which uses ZFS on Solaris.
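If it helps to see the accounting, here is a toy Python model of the scenario just described. It is only a sketch: `ToyPool` and its methods are hypothetical names invented for this example, not ZFS interfaces, and it tracks nothing but gigabytes. The point is that filesystems are created without a size, any of them can use any free space in the pool, and adding a disk makes more space available to all of them.

```python
class ToyPool:
    """Hypothetical storage pool: capacity is simply the sum of its disks."""

    def __init__(self, disk_sizes_gb):
        self.disks = list(disk_sizes_gb)
        self.used = {}  # filesystem name -> GB consumed

    @property
    def free(self) -> int:
        return sum(self.disks) - sum(self.used.values())

    def create_filesystem(self, name: str) -> None:
        self.used[name] = 0  # no size is chosen up front

    def write(self, name: str, gb: int) -> None:
        if gb > self.free:
            raise OSError("pool is out of space")
        self.used[name] += gb

    def add_disk(self, size_gb: int) -> None:
        self.disks.append(size_gb)  # new space is usable by every filesystem


pool = ToyPool([100, 100])
for user in ("alice", "bob"):
    pool.create_filesystem(user)

pool.write("alice", 150)  # more than any single disk holds
pool.write("bob", 40)
assert pool.free == 10

try:
    pool.write("bob", 30)  # the pool is nearly full
except OSError:
    pool.add_disk(100)     # attach another disk; nobody is migrated
    pool.write("bob", 30)

assert pool.free == 80
```

In the traditional setup, alice's 150GB could not have fit on either 100GB volume, and freeing space for bob would have meant moving somebody.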

Thus concludes ZFS lesson 1. (2004-08-30 15:38:27.0)
