my personal space Maybee so

Wednesday Nov 16, 2005


This is an incredibly exciting day for me. I have been working on ZFS for almost five years, and now we are finally going public with it! The last five years have been fantastic. There is nothing to compare with the experience of working with a small group of really talented engineers to write a significant piece of software from scratch. I particularly want to mention Matthew Ahrens, my partner-in-crime for the DMU. Matt continues to amaze me with his insight and coding ability. I also want to mention Mark Shellenbaum. I started this project working with Mark on the ZPL (ZFS POSIX layer). When I moved on to work on the DMU, he stuck with the thankless, nitty-gritty job of turning ACLs from a little-used feature of most file systems into a powerful and useful feature of ZFS. Finally, I have to mention Jeff Bonwick, the guy who started the whole thing and kept it all together over the long, strange trip. ZFS really is his brainchild.

OK, so ZFS has tons of cool new features in it. Many of them catch your imagination immediately: Pooled Storage, Instantaneous Snapshots, or Clones. But some of them don't sound so cool, or sound like something you've heard about before. I'm going to blog about a couple of the features in this category: Quotas and Reservations. You may think you know all about quotas: they were the bane of your existence at the always disk-space-starved university you went to. But those were the old days; read on and find out what they mean in the brand-new world of ZFS.

Why Do I Care?

In the pooled storage environment provided by ZFS, the file system becomes a point of control for system administrators. File systems in this environment are cheap, fast, and easy to create. It's likely that administrators will create a file system for every single user and probably a file system for every project as well. ZFS provides a powerful hierarchical naming scheme for managing this potentially very large namespace. But since all of these file systems draw their space from a common pool, there is also a need to manage space consumption in the pool. Administrators need to be able to place limits on the amount of space any single user's or project's file system can consume. They also need to be able to guarantee that space will be available for user and project file systems. ZFS introduces quota and reservation properties on the file system to provide this management ability.

How Is This Different?

In traditional file systems, there is a one-to-one mapping between file systems and physical storage. The amount of space allocated to a file system is pre-determined at creation time. In this type of environment there is no need for the management controls described above, and so they simply don't exist.

The concepts of quotas and reservations do exist in today's file systems. However, they are applied at different levels and so have very different semantics. A "traditional" quota applies to the space used by all files owned by a particular user, and limits the amount of space that user can consume within a file system. It has no impact on the amount of space consumed by the file system itself. A "traditional" reservation is usually applied to a file. It guarantees space within the file system for that file. It also has no impact on the amount of space available to the file system as a whole.

So What Does It Really Mean?

The ZFS quota and reservation properties are designed to give the administrator the ability to manage the way space is consumed within the storage pool. Each file system within the pool can have a quota or reservation assigned to it. A quota is a limit on the amount of space the file system can consume. For example, if a pool (named tank) is created with 36 gigabytes of space, and some file system (named fs1) is created within that pool and given a quota of 10 gigabytes, the file system fs1 will never be allowed to use more than 10GB of the 36GB of space in the pool. A reservation is a guarantee of space to a file system. For example, in the pool tank just described, if some file system (named fs2) is created and given a reservation of 10GB, the pool will now report only 26GB of available space, since 10GB has been committed to file system fs2. File system fs2 could grow to use all 36GB, but the sum of the space used by all other file systems can never be more than 26GB.
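The arithmetic in the tank example can be checked with a toy model. The Python below is only a sketch, not how ZFS is implemented: the `FS` class and both functions are hypothetical, sizes are whole gigabytes, and it assumes a flat pool in which each file system ties up the larger of its usage and its reservation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FS:
    used: int = 0                 # GB actually in use
    quota: Optional[int] = None   # cap on this file system's usage (None = no quota)
    reservation: int = 0          # GB guaranteed to this file system


def pool_available(size: int, fss: Dict[str, FS]) -> int:
    # Each file system ties up the larger of its usage and its reservation.
    return size - sum(max(fs.used, fs.reservation) for fs in fss.values())


def fs_available(name: str, size: int, fss: Dict[str, FS]) -> int:
    fs = fss[name]
    # Uncommitted pool space plus the unused part of this fs's own reservation.
    avail = pool_available(size, fss) + max(0, fs.reservation - fs.used)
    return avail if fs.quota is None else min(avail, fs.quota - fs.used)


tank = {"fs1": FS(quota=10), "fs2": FS(reservation=10)}
print(pool_available(36, tank), fs_available("fs1", 36, tank), fs_available("fs2", 36, tank))
# prints: 26 10 36
```

The pool reports 26GB because fs2's unused reservation is subtracted up front, while fs2 itself sees that 26GB plus its own 10GB.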

A quota is not subject to the available space limitations of a pool. It is possible to set a quota greater than the space available in a pool. For example, if we increase the reservation of file system fs2 to 30 gigabytes (permitted only if fs1 is currently less than 6GB in size), the file system fs1 will not be able to grow beyond 6GB even though it has a quota of 10GB. There is now only 6GB of space available in the pool for all file systems other than fs2. Note that it is illegal to set a quota below the current file system size, as the file system would immediately be in violation of its quota.
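The two quota rules in this paragraph, checking a new quota only against current usage and capping growth by whatever the pool can still supply, can be sketched as follows. The function names and the `uncommitted` parameter (pool space not used or reserved by any file system) are hypothetical, chosen only for illustration.

```python
from typing import Optional


def check_new_quota(used: int, new_quota: int) -> None:
    # Quotas are compared with current usage, not with pool space:
    # a quota below what the file system already uses is refused.
    if new_quota < used:
        raise ValueError(f"quota {new_quota} is below current usage {used}")


def growth_limit(used: int, quota: Optional[int], uncommitted: int) -> int:
    # Largest size the file system can reach right now, bounded both by its
    # quota and by pool space nobody has used or reserved.
    # `uncommitted` must already exclude this file system's own usage.
    room = uncommitted if quota is None else min(quota - used, uncommitted)
    return used + room


# tank: 36GB pool, fs2 reserves 30GB, fs1 has a 10GB quota and uses 2GB.
check_new_quota(used=2, new_quota=100)   # accepted, even though the pool is only 36GB
print(growth_limit(used=2, quota=10, uncommitted=36 - 30 - 2))
# prints: 6
```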

A reservation, in contrast, is limited by the available space in the pool. It is simply not possible to reserve more space than is available. It is possible, however, to set a reservation below the current amount of space used by a file system. While this has no impact when first set, it does have meaning if the size of the file system ever drops below the set reservation. Space freed is returned to the pool, for distribution to any file system, if the file system is using more space than its reservation. However, if the file system is below its reservation then freed space remains reserved for future use by only this file system.
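Here is a sketch of the reservation rules under similar simplifying assumptions (hypothetical function names, whole-GB sizes, a single level of file systems). A reservation request only has to find pool space for the increase in what the file system ties up, and a free returns space to the pool only until usage drops to the reservation.

```python
def charge(used: int, reservation: int) -> int:
    # Pool space attributed to a file system: usage or reservation, whichever is larger.
    return max(used, reservation)


def can_set_reservation(new_res: int, used: int, reservation: int, pool_free: int) -> bool:
    # Only the increase in charge must come out of free pool space, so a
    # reservation at or below current usage is always accepted.
    return charge(used, new_res) - charge(used, reservation) <= pool_free


def freed_to_pool(used: int, reservation: int, freed: int) -> int:
    # How much of `freed` GB goes back to the pool; the rest stays reserved.
    return charge(used, reservation) - charge(used - freed, reservation)


# A file system using 8GB with a 5GB reservation deletes 2GB at a time.
print(freed_to_pool(8, 5, 2), freed_to_pool(6, 5, 2), freed_to_pool(4, 5, 2))
# prints: 2 1 0
```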

Quotas and reservations are particularly powerful in the hierarchical file system environment supported by ZFS. Unlike other ZFS properties, the quota and reservation properties are not inherited. Rather, they affect their descendants directly. For example, giving a reservation of 10GB to file system fs1 does not mean that some child fs1/child also receives a 10GB reservation. A quota limits the sum of the space consumed by the file system it is placed on and all of its descendants. A reservation reserves space for use by the file system it is placed on and all of its descendants. So quotas and reservations limit or reserve space that can be consumed from that point in the hierarchy down. Note that snapshots are considered descendants of the file system they originated from.

Each file system tracks its own quota, reservation, and the amount of space it's using. The space used is the sum of the space used directly by the file system and the space used by all descendants. When a file system wants to use more space, it must check against its own quota and all of its ancestors' quotas. Reservations are also checked at each level: space available in a reservation is used first to satisfy a space request. If the request cannot be satisfied by existing reservations, a final check is made at the pool root against the pool's available free space. If the request can be satisfied, the space change is applied to the file system. If the change takes the file system over its reservation, the change is applied recursively to the parent.
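The check described above can be modeled with a small tree of datasets. This is a hypothetical Python sketch of the rules as stated in this post, not the actual DMU/DSL code: the `Dataset` class, its methods, and the choice to charge each child to its parent as max(used, reservation) are assumptions made for illustration.

```python
import errno
from typing import Optional


class Dataset:
    """Toy dataset in a ZFS-like hierarchy (sizes in GB).

    `used` is this dataset's own data plus the charge of each child, where a
    child's charge is max(child.used, child.reservation).
    """

    def __init__(self, parent: Optional["Dataset"] = None,
                 quota: Optional[int] = None, pool_size: Optional[int] = None):
        self.parent = parent
        self.quota = quota            # None means no quota
        self.reservation = 0
        self.used = 0
        self.pool_size = pool_size    # only meaningful on the pool root

    def _charge(self) -> int:
        return max(self.used, self.reservation)

    def _propagate(self, old_charge: int) -> None:
        # Apply any change in this dataset's charge to its parent, recursively.
        delta = self._charge() - old_charge
        if delta and self.parent is not None:
            parent_old = self.parent._charge()
            self.parent.used += delta
            self.parent._propagate(parent_old)

    def can_grow(self, delta: int) -> bool:
        node, need = self, delta
        while True:
            new_used = node.used + need
            if node.quota is not None and new_used > node.quota:
                return False                       # over this level's quota
            if node.parent is None:
                return new_used <= node.pool_size  # final check at the pool root
            # Unused reservation at this level is consumed first; only the
            # remainder has to be found further up the tree.
            need = max(new_used, node.reservation) - node._charge()
            if need <= 0:
                return True
            node = node.parent

    def write(self, delta: int) -> None:
        # Positive delta allocates space, negative delta frees it.
        if not self.can_grow(delta):
            raise OSError(errno.ENOSPC, "quota or pool space exhausted")
        old = self._charge()
        self.used += delta
        self._propagate(old)

    def set_reservation(self, new_res: int) -> None:
        if self.parent is None:
            raise ValueError("reservations on the pool root are not modeled")
        old = self._charge()
        extra = max(self.used, new_res) - old    # new space the parent must supply
        if extra > 0 and not self.parent.can_grow(extra):
            raise OSError(errno.ENOSPC, "not enough space to reserve")
        self.reservation = new_res
        self._propagate(old)


# The 36GB tank example: fs1 has a 10GB quota, fs2 a 10GB reservation.
tank = Dataset(pool_size=36)
fs1 = Dataset(tank, quota=10)
fs2 = Dataset(tank)
fs2.set_reservation(10)
print(tank.used, fs1.can_grow(11), fs2.can_grow(36))
# prints: 10 False True
```

In this model a write inside a child of fs2 is absorbed by fs2's unused reservation and never reaches the pool root, which is the "reservations are used first" behavior described above.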

OK, Let's See Some Examples:

Steve the administrator has a pool with 500GB of space. He has 6 users working on 3 projects. Using the zfs(1M) command, he creates home and 6 file systems under home (one for each user). He also creates project and 3 file systems under it (one for each project):
    # zfs list -o name,used,available,reservation,quota
    NAME                   USED  AVAIL  RESERV  QUOTA
    pool                   162K   489G    none  none
    pool/home             59.5K   489G    none  none
    pool/home/ahrens         8K   489G    none  none
    pool/home/billm          8K   489G    none  none
    pool/home/bonwick        8K   489G    none  none
    pool/home/marks          8K   489G    none  none
    pool/home/maybee         8K   489G    none  none
    pool/home/perrin         8K   489G    none  none
    pool/project          33.5K   489G    none  none
    pool/project/dmu         8K   489G    none  none
    pool/project/spa         8K   489G    none  none
    pool/project/zpl         8K   489G    none  none
    
Steve does not want users putting everything in their home directories. The bulk of their files should end up in their project directories. So he decides to set a 100GB quota on pool/home:
    # zfs set quota=100g pool/home
    # zfs list -o name,used,available,reservation,quota
    NAME                   USED  AVAIL  RESERV  QUOTA
    pool                   162K   489G    none  none
    pool/home             59.5K   100G    none  100G
    pool/home/ahrens         8K   100G    none  none
    pool/home/billm          8K   100G    none  none
    pool/home/bonwick        8K   100G    none  none
    pool/home/marks          8K   100G    none  none
    pool/home/maybee         8K   100G    none  none
    pool/home/perrin         8K   100G    none  none
    pool/project          33.5K   489G    none  none
    pool/project/dmu         8K   489G    none  none
    pool/project/spa         8K   489G    none  none
    pool/project/zpl         8K   489G    none  none
    
Note that although each user's home directory now shows 100G available, if the combined usage by all users reaches 100G, no user will be able to create any more files.

One of Steve's users, bonwick, tends to be a space hog, so Steve further limits him with an individual quota:
    # zfs set quota=20g pool/home/bonwick
    # zfs list -o name,used,available,reservation,quota pool/home/bonwick
    NAME                   USED  AVAIL  RESERV  QUOTA
    pool/home/bonwick        8K  20.0G    none  20.0G
    
The quota on pool/home is intended to prevent the users from consuming all of the pool space with files in their home directories. But Steve also wants to make sure that there is a reasonable amount of the pool available for home directory use (i.e., it isn't all used by the project file systems). So he reserves some space for home directories:
    # zfs set reservation=60g pool/home
    # zfs list -o name,used,available,reservation,quota
    NAME                   USED  AVAIL  RESERV  QUOTA
    pool                  60.0G   429G    none  none
    pool/home             59.5K   100G   60.0G  100G
    pool/home/ahrens         8K   100G    none  none
    pool/home/billm          8K   100G    none  none
    pool/home/bonwick        8K  20.0G    none  20.0G
    pool/home/marks          8K   100G    none  none
    pool/home/maybee         8K   100G    none  none
    pool/home/perrin         8K   100G    none  none
    pool/project          33.5K   429G    none  none
    pool/project/dmu         8K   429G    none  none
    pool/project/spa         8K   429G    none  none
    pool/project/zpl         8K   429G    none  none
    
As you can see, reserving space for pool/home has decreased the available space for the projects under pool/project.

Finally, although the spa project is going to start out small, it's already known that it's going to need a lot of space. So Steve reserves 150GB just for that project:
    # zfs set reservation=150G pool/project/spa
    # zfs list -o name,used,available,reservation,quota
    NAME                   USED  AVAIL  RESERV  QUOTA
    pool                 210.0G   279G    none  none
    pool/home             59.5K   100G   60.0G  100G
    pool/home/ahrens         8K   100G    none  none
    pool/home/billm          8K   100G    none  none
    pool/home/bonwick        8K  20.0G    none  20.0G
    pool/home/marks          8K   100G    none  none
    pool/home/maybee         8K   100G    none  none
    pool/home/perrin         8K   100G    none  none
    pool/project          33.5K   279G    none  none
    pool/project/dmu         8K   279G    none  none
    pool/project/spa         8K   429G    150G  none
    pool/project/zpl         8K   279G    none  none
    
Note again that reserving space has decreased the generally available space in the pool.
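As a cross-check on the listings, the AVAIL column can be recomputed from the quotas and reservations alone. The sketch below is hypothetical: it ignores the kilobyte-sized usage, takes 489G as the pool's usable capacity (the figure from which the 429G and 279G in the later listings are derived), and assumes a child is charged to its parent as max(used, reservation).

```python
POOL_SIZE = 489  # GB; usable capacity assumed from the listings

# name -> (parent, quota, reservation) after all of Steve's commands.
DATASETS = {
    "pool": (None, None, 0),
    "pool/home": ("pool", 100, 60),
    "pool/home/ahrens": ("pool/home", None, 0),
    "pool/home/billm": ("pool/home", None, 0),
    "pool/home/bonwick": ("pool/home", 20, 0),
    "pool/home/marks": ("pool/home", None, 0),
    "pool/home/maybee": ("pool/home", None, 0),
    "pool/home/perrin": ("pool/home", None, 0),
    "pool/project": ("pool", None, 0),
    "pool/project/dmu": ("pool/project", None, 0),
    "pool/project/spa": ("pool/project", None, 150),
    "pool/project/zpl": ("pool/project", None, 0),
}


def used(name):
    # Own data is treated as zero; each child contributes max(used, reservation).
    return sum(charge(child) for child, (parent, _, _) in DATASETS.items()
               if parent == name)


def charge(name):
    return max(used(name), DATASETS[name][2])


def avail(name):
    parent, quota, reservation = DATASETS[name]
    u = used(name)
    if parent is None:
        room = POOL_SIZE - u
    else:
        # Unused part of this dataset's reservation, plus what the parent can supply.
        room = max(0, reservation - u) + avail(parent)
    return room if quota is None else min(room, quota - u)


for name in DATASETS:
    print(f"{name:<22} {avail(name)}G")
```

Under these assumptions the model matches the final listing's AVAIL values. pool/project/spa sees its own 150G reservation on top of the 279G the rest of the pool can offer, which is why it shows 429G while its siblings show 279G.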

Where Can I Find Out More?

There's a lot more information about Quotas, Reservations, and all the other cool features of ZFS in the admin guide. Heck, you can even get the source on OpenSolaris!

Comments:

Mark,
I quickly scanned the admin guide, and thought about a "what if" scenario...

Pg 27 states

"In fact, the file deletion can end up consuming more disk space, since a new version of the directory will need to be created to reflect the new state of the namespace."

What if the administrator wants to remove the file permanently, and have the removal cascade through all snapshots?

Thanks

Posted by Amit Kulkarni on November 16, 2005 at 06:10 PM MST #

Looks like it was somewhat inspired by NetApp's system of Qtrees, snapshots, and COW technology. How does it perform on writes when the filesystem gets close to full? The system will have to hunt for new places to stick new data since it's COW. Also it will be interesting to see if you provide the tools that prevent the novice administrator from greatly overallocating the filesystem, and thus running into performance and/or space issues. Good job Sun, can't wait to test it out!

Posted by Jason Santos on November 17, 2005 at 09:29 PM MST #

Everything old is new again. I'm sure it was great fun creating a new file system! However, many of the "innovative" features have been around in other operating systems for a long time. They are innovative because the old Solaris file system was just so ancient. Much like parchment is innovative compared to clay tablets. Granted, there are some new ideas, but many other environments have had the bulk of these features for years or decades. NetWare has supported concurrent user-, directory-, and volume-level quotas and extremely robust ACLs going back almost 20 years. NetWare's ACLs and quotas have been tied to an enterprise directory service for more than 10 years. NetWare's "traditional file system", circa 1986, included TTS (Transaction Tracking System), which supported implicit and explicit transaction tracking. By 1990, NetWare 4.0's TFS supported block suballocation and compression. Novell's current file system, NSS, available for several years, supports many of the features of Sun ZFS. One of the proof-of-concept predecessors of NSS, designed to be a low-memory-footprint file system, was, I think, also called "ZFS" for "Zero File System". (NetWare's TFS stored most FS structures in RAM.) So that's sort of funny. Again, it's a great technical feat to finally have a decent file system for Solaris after suffering the one I was paying thousands of dollars a year to have, but the rest of the world was doing some of this stuff on a 16 MHz 80386 10+ years ago. An amazing project, to be sure, but essentially "dragging Solaris kicking and screaming into the mid-1990's".

Posted by Don't Get Out Much? on November 18, 2005 at 09:41 PM MST #

Thanks for the feedback!

In regard to "removing a file from snapshots": at the moment, snapshot content is immutable. The only way to "permanently" remove a file from the file system is to remove all snapshots that reference the file. It's an intriguing concept to remove part of a snapshot, but it violates one of its fundamental properties.

About edge performance: yes, like all file systems, we are susceptible to performance degradation when there is very little space left. As you surmise, this is partially due to fragmentation of the storage. We will be addressing this in a future release. In the meantime, it's best to leave a few MB of space in the pool as a buffer (or throw another disk into the pool when you get close to maximum capacity).

About innovation: yes, it's true, not every feature of ZFS is innovative. Many of the features we offer are the expected/required features of a modern file system. But we also offer innovation; indeed, I believe that even many of the expected features are implemented in innovative ways and so offer scalability and performance beyond anything else out there. As a package, I believe that ZFS is truly innovative (but of course I work for Sun and helped build this product)! So I beg to differ with you. We have not just dragged Solaris into the mid-1990's; we have brought Solaris well into the 21st century.

Posted by Mark Maybee on November 19, 2005 at 10:12 AM MST #

"Don't Get Out Much" is just trolling, I suggest you ignore and/or delete the comment.

Posted by S on November 27, 2005 at 07:47 PM MST #

http://bugs.opensolaris.org/view_bug.do?bug_id=6531759 Does this tell me what I THINK it does? ...is an animal running around in Cupertino... please tell me!!

Posted by bluecube on April 19, 2007 at 11:11 AM MDT #
