Useful stuff for your blog-reading pleasure.

Thursday September 06, 2007

7 Easy Tips for ZFS Starters

So you're now curious about ZFS. Maybe you read Jonathan's latest blog entry on ZFS or you've followed some other buzz on the Solaris ZFS file system or maybe you saw a friend using it. Now it's time for you to try it out yourself. It's easy and here are seven tips to get you started quickly and effortlessly:

1. Check out what Solaris ZFS can do for you

First, try to form a picture of what the Solaris ZFS filesystem is, what features it has and how it can work to your advantage. Check out the CSI:Munich video for a fun demo on how Solaris ZFS can turn 12 cheap USB memory sticks into highly available, enterprise-class, robust storage. Of course, what works with USB sticks also works with your own hard disks or any other storage device. Also, there are great ZFS screencasts that show you some more powerful features in an easy-to-follow way. Finally, there's a nice writeup on "What is ZFS?" at the OpenSolaris ZFS Community's homepage.

2. Read some (easy) documentation

It's easy to configure Solaris ZFS. Really. You just need to know two commands: zpool(1M) and zfs(1M). That's it. So, get your hands on a Solaris system (or download and install it for free) and take a look at those manpages. If you still want more, then there's of course the ZFS Administration Guide with detailed planning, configuration and troubleshooting steps. If you want to learn even more, check out the OpenSolaris ZFS Community Links page. German-speaking readers are invited to read my German white paper on ZFS or listen to episode #006 of the POFACS podcast.

3. Dive into the pool

Solaris ZFS manages your storage devices in pools. Pools are a convenient way of abstracting storage hardware and turning it into a repository of blocks to store your data in. Each pool takes a number of devices and applies an availability scheme (or none) to them. Pools can then be easily expanded by adding more disks to them. Use pools to manage your hardware and its availability properties. You could create a mirrored pool for data that should be protected against disk failure and that needs fast access to hardware. Then, you could add another pool using RAID-Z (which is similar to, but better than, RAID-5) for data that needs to be protected but where performance is not the first priority. For scratch, test or demo data, a pool without any RAID scheme is ok, too. Pools are easily created:

zpool create mypool mirror c0d0 c1d0

This creates a mirror out of the two disk devices c0d0 and c1d0. Similarly, you can easily create a RAID-Z pool by saying:

zpool create mypool raidz c0d0 c1d0 c2d0

The easiest way to turn a disk into a pool is:

zpool create mypool c0d0

It's that easy. All the complexity of finding, sanity-checking, labeling, formatting and managing disks is hidden behind this simple command.
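
Expanding a pool later is just as short. Here is a minimal sketch, assuming a mirrored pool called mypool and two additional disks; the device names c2d0 and c3d0 and the function name are made up for illustration. It first previews the change with zpool add -n and only then adds the new mirror:

```shell
# Sketch: grow an existing mirrored pool by one more mirror pair.
# expand_pool is a hypothetical helper; device names are placeholders.
expand_pool() {
    pool=$1; shift
    # -n only prints the resulting layout without changing the pool.
    zpool add -n "$pool" mirror "$@" || return 1
    # The real thing: ZFS stripes across the old and the new mirror.
    zpool add "$pool" mirror "$@" || return 1
    zpool list "$pool"
}

# Usage (as root): expand_pool mypool c2d0 c3d0
```

You may want to run the -n line by hand and look at its output before letting the second command loose on a pool you care about.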

If you don't have any spare disks to try this out with, then you can just create yourself some files, then use them as if they were block devices:

# mkfile 128m /export/stuff/disk1
# mkfile 128m /export/stuff/disk2
# zpool create testpool mirror /export/stuff/disk1 /export/stuff/disk2
# zpool status testpool
  pool: testpool
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        testpool                 ONLINE       0     0     0
          mirror                 ONLINE       0     0     0
            /export/stuff/disk1  ONLINE       0     0     0
            /export/stuff/disk2  ONLINE       0     0     0

errors: No known data errors

The cool thing about this procedure is that you can create as many virtual disks as you like and then test ZFS's features such as data integrity, self-healing, hot spares, RAID-Z and RAID-Z2 etc. without having to find any free disks.
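
To see self-healing in action on such a file-backed mirror, you can damage one of the files on purpose and then scrub the pool. The following is only a sketch: the function name is made up, and the offsets are my assumption, chosen to stay well away from the ZFS labels at the beginning and end of each 128 MB file. Copy some files into /testpool first so that there is something to repair.

```shell
# Sketch: overwrite part of one mirror half, then let ZFS find and fix it.
# corrupt_and_scrub is a hypothetical helper, not a ZFS command.
corrupt_and_scrub() {
    pool=$1; victim=$2
    # Write 16 MB of random bytes at offset 32 MB.
    # conv=notrunc keeps the rest of the file intact.
    dd if=/dev/urandom of="$victim" bs=1024k seek=32 count=16 conv=notrunc || return 1
    # A scrub reads all data, verifies checksums and repairs from the good half.
    zpool scrub "$pool" || return 1
    # The scrub runs in the background; repeat this until it has finished.
    zpool status -v "$pool"
}

# Usage (as root): corrupt_and_scrub testpool /export/stuff/disk1
```

Whether the CKSUM column actually shows errors depends on whether the overwritten region held any data, so don't be surprised if a nearly empty pool stays clean.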

When creating a pool for production data, think about redundancy. There are three basic properties to storage: availability, performance and space. And it's a good idea to prioritize them in that order: Make sure you have redundancy (mirroring, RAID-Z, RAID-Z2) so ZFS can self-heal data when stuff goes wrong at the hardware level. Then decide how much performance you want. Generally, mirroring is faster and more flexible than RAID-Z/Z2, especially if the pool is degraded and ZFS needs to reconstruct data. Space is the cheapest of all three, so don't be greedy and try to give priority to the other two. Richard Elling has some great recommendations on RAID, space and MTTDL. Roch has also posted a great article on mirroring vs. RAID-Z.
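
To get a feeling for the space side of that trade-off, here is a small back-of-the-envelope helper. It is a rough sketch only: usable_gb is a made-up function, it looks at a single group of equally sized disks and it ignores metadata overhead, so the numbers zpool list reports will differ somewhat.

```shell
# usable_gb LAYOUT DISKS SIZE_GB
# Rough usable capacity of one group of equally sized disks (approximation).
usable_gb() {
    layout=$1; disks=$2; size=$3
    case $layout in
        stripe) echo $(( $disks * $size )) ;;        # no redundancy at all
        mirror) echo "$size" ;;                      # every disk holds the same data
        raidz)  echo $(( ($disks - 1) * $size )) ;;  # one disk's worth of parity
        raidz2) echo $(( ($disks - 2) * $size )) ;;  # two disks' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}
```

For example, usable_gb mirror 2 320 prints 320, while usable_gb raidz 3 320 prints 640: the third disk buys you space, and the mirror buys you speed and flexibility.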

4. The power to give

Once you have set up your basic pool, you can already access your new ZFS file system: Your pool has been automatically mounted for you in the root directory. If you followed the examples above, then you can just cd to /mypool and start using ZFS!

But there's more: Creating additional ZFS file systems that use your pool's resources is very easy, just say something like:

zfs create mypool/home
zfs create mypool/home/johndoe
zfs create mypool/home/janedoe

Each of these commands only takes seconds to complete and every time you will get a full new file system, already set up and mounted for you to start using it immediately. Notice that you can manage your ZFS filesystems hierarchically as seen above. Use pools to manage storage properties at the hardware level, use filesystems to present storage to your users and applications. Filesystems have properties (compression, quotas, reservations, etc.) that you can easily administer using zfs set and that are inherited across the hierarchy. Check out Chris Gerhard's blog on more thoughts about file system organization.
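
Setting properties could look like the sketch below. The filesystem names follow the example above; the function names and the quota and reservation values are just placeholders I picked for illustration.

```shell
# Turn on compression once at the top; child filesystems inherit it.
home_defaults() {
    zfs set compression=on mypool/home
}

# limit_user USER QUOTA RESERVATION
# Cap one home directory and guarantee it a minimum amount of space.
limit_user() {
    zfs set quota="$2" "mypool/home/$1" &&
    zfs set reservation="$3" "mypool/home/$1"
}

# Show where each filesystem gets its compression setting from.
show_compression() {
    zfs get -r compression mypool/home
}

# Usage (as root): home_defaults; limit_user johndoe 10G 1G; show_compression
```

In the zfs get output, the SOURCE column tells you whether a value was set locally or inherited from a parent filesystem.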

5. Snapshot early, snapshot often

ZFS snapshots are quick, easy and cheap. Much cheaper than the horrible experience when you realize that you just deleted a very important file that hasn't been backed up yet! So, use snapshots whenever you can. If you're wondering whether to snapshot or not, just do it. I recently spent only about $220 on two 320 GB USB disks for my home server to expand my pool with. At these prices, the time you spend thinking about whether to snapshot or not may cost you more than just buying more disk.

Again, Chris has some wisdom on this topic in his ZFS snapshot massacre blog entry. He once had over 60000 snapshots and he's snapshotting filesystems by the minute! Since snapshots in ZFS “just work” and since they only take up the space that actually changes between snapshots, there's really no reason not to take snapshots all the time. Maybe once per minute is a bit excessive, but once a week, once per day or once an hour per active filesystem is definitely good advice.

Instead of time-based snapshotting, Chris came up with the idea to snapshot a file system shared with Samba whenever the Samba user logs in!
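
If you want to roll your own time-based snapshots, a minimal sketch could look like this. The "auto-" naming scheme, the function names and the script path in the crontab line are my own assumptions, not an established convention.

```shell
# snapshot_name FILESYSTEM STAMP
# Builds a name such as mypool/home/johndoe@auto-2007-09-06-1120.
snapshot_name() {
    echo "$1@auto-$2"
}

# take_snapshot FILESYSTEM
# Takes a snapshot named after the current date and time.
take_snapshot() {
    stamp=$(date +%Y-%m-%d-%H%M)
    zfs snapshot "$(snapshot_name "$1" "$stamp")"
}

# Hypothetical crontab entry for an hourly snapshot, assuming the functions
# above are wrapped in a script at /usr/local/bin/zfs-autosnap:
# 0 * * * * /usr/local/bin/zfs-autosnap mypool/home/johndoe
```

To get a deleted file back, look into the hidden .zfs/snapshot directory at the top of the filesystem; each snapshot shows up there as a read-only directory.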

6. See the Synergy

ZFS by itself is very powerful. But the full beauty of it can be unleashed by combining ZFS with other great Solaris 10 features. Here are some examples:

  • Tim Foster has written a great SMF service that will snapshot your ZFS filesystems on a regular basis. It's fully automatic, configurable and integrated with SMF in a beautiful way.

  • ZFS can create block devices, too. They are called zvols. Since Nevada build 54, they are fully integrated into the Solaris iSCSI infrastructure. See Ben Rockwood's blog entry on the beauty of iSCSI with ZFS.

  • A couple of people are now elevating this concept even further: Take two Thumpers, create big zvols inside them, export them through iSCSI and mirror over them with ZFS on a server. You'll get a huge, distributed storage subsystem that can be easily exported and imported on a regular network. A poor man's SAN and a powerful shared storage for future HA clusters thanks to ZFS, iSCSI and Thumper! Jörg Möllenkamp is taking this concept a bit further by thinking about ZFS, iSCSI, Thumper and SAM-FS.

  • Check out some cool Sun StorageTek Availability Suite and ZFS demos here.

  • ZFS and boot support is still in the works, but if you're brave, you can try it out with the newer Solaris Nevada distributions on x64 systems. Think about the possibilities together with Solaris Live Upgrade! Create a new boot environment in seconds while not needing to find or dedicate a new partition, thanks to snapshots, while saving most of the needed disk space!

And that's only the beginning. As ZFS becomes more and more adopted, we'll see many more creative uses of ZFS with other Solaris 10 technologies and other OSes.
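
The zvol idea from the list above can be tried out along these lines. This is a sketch under assumptions: the function and volume names are made up, and the shareiscsi property is what I understand the Nevada builds of that time to use for exporting a zvol through iSCSI, so check zfs(1M) on your build first.

```shell
# make_iscsi_volume VOLUME SIZE
# Creates a zvol of the given size and shares it as an iSCSI target.
make_iscsi_volume() {
    # -V creates a volume (block device) instead of a filesystem.
    zfs create -V "$2" "$1" || return 1
    # Assumption: the shareiscsi property is available on your build.
    zfs set shareiscsi=on "$1"
}

# Usage (as root): make_iscsi_volume mypool/vol1 10g
# The block device then shows up under /dev/zvol/dsk/mypool/vol1.
```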

7. Beam me up, ZFS!

One of the most amazing features of ZFS is zfs send/receive. zfs send will turn a ZFS filesystem into a bitstream that you can save to a file, pipe through bzip2 for compression or send through ssh to a distant server for archiving or for remote replication through the corresponding zfs receive command. It also supports incremental sending and receiving out of subsequent snapshots through the -i modifier.

This is a powerful feature with a lot of uses:

  • Create your Solaris zone as a ZFS filesystem, complete with applications, configuration, automation scripts, users etc., then archive it for later use with zfs send | bzip2 >zone_archive.zfs.bz2. Later, unpack it and create hundreds of cloned zones out of this master copy.

  • Easily migrate ZFS filesystems between pools on the same machine or on distant machines (through ssh) with zfs send/receive.

  • Create a crontab entry that takes a snapshot every minute, then zfs send -i it over ssh to a second machine where it is piped into zfs receive. Tadah! You'll get free, finely-grained, online remote replication of your precious data.

  • Easily create efficient full or incremental backups of home directories (each in their own ZFS filesystems) through ZFS send. Again, you can compress them and treat them like you would, say, treat a tar archive.
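
Two of the uses above, sketched as small shell functions. The function names, the host name and the filesystem names in the usage lines are placeholders; the commands themselves are plain zfs send, zfs receive, bzip2 and ssh.

```shell
# backup_full SNAPSHOT ARCHIVE
# Writes a complete, compressed copy of one snapshot to a file.
backup_full() {
    zfs send "$1" | bzip2 > "$2"
}

# replicate_incremental FILESYSTEM OLD_SNAP NEW_SNAP HOST DEST_FILESYSTEM
# Sends only the changes between two snapshots to another machine.
replicate_incremental() {
    zfs send -i "$1@$2" "$1@$3" | ssh "$4" zfs receive "$5"
}

# Usage (as root):
#   backup_full mypool/home/johndoe@monday /backup/johndoe-monday.zfs.bz2
#   replicate_incremental mypool/home/johndoe monday tuesday backuphost tank/johndoe
```

The incremental variant assumes that the destination already holds the older snapshot, for instance from an earlier full zfs send.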

See? It is easy, isn't it? I hope this guide helps you find your way around the world of ZFS. If you want more, drop by the OpenSolaris ZFS Community: we have a mailing list/forum where bright and friendly people hang out who will be glad to help you.

"7 Easy Tips for ZFS Starters" has been brought to you by Constantin's Blooog.
This entry was created on 2007-09-06 11:20:15.0 PST.

You're welcome to use this Permalink, add a comment below or send your feedback to constantin at sun dot com.


Thursday August 16, 2007

ZFS Snapshot Replication Script

One of the greatest features of the OpenSolaris ZFS filesystem is its snapshots. You can easily create a snapshot by saying zfs snapshot pool/filesystem@snapshot and then replicate that snapshot by saying something like zfs send pool/filesystem@snapshot | zfs receive newpool/some_other_filesystem. So far, so great.

Now let's say you have a nice pool and have been creating snapshots on a regular basis. After a few months, you decide to remodel your pool layout or migrate some of your filesystems over to a new pool for whatever reason. Then you're facing a lot of those zfs send and receive commands, especially if, like Chris, you take snapshots on a per-minute basis. That's why the ZFS community is now waiting for 6421958: A zfs send -r option.

I had to migrate quite a few filesystems and many snapshots (thanks to Tim's excellent ZFS Snapshot SMF Service) lately when I set up a new pool strategy for my home server so I wrote myself a script to do the replication job. Since it may take some time for the send -r feature to be implemented, I turned it into a more generic utility.

Disclaimer: Please be advised that this script has only been tested a couple of times and it is provided to you completely on an "as-is" basis. Please have a look at the script to understand how it works and try it out on some non-risky pools and filesystems before you do real stuff with it. Run a backup before using this script and don't shoot me if something goes wrong.

Ok, what can this script do for you? First of all, check out its -h flag to see what options it provides:

# zfs-replicate -h
usage: zfs-replicate [-h] [-F ] [-n] [-s] [-v]
                     [-m [-c "FMRI|pattern[ FMRI|pattern]...]" ]
                     source dest

where source and dest is a ZFS filesystem, snapshot or volume.

Options:
  -h: Print this help.
  -F: Force a rollback of the destination filesystem to the most recent
      snapshot before replicating a snapshot. This is equivalent to the -F
      option of zfs receive.
  -n: Don't actually replicate anything, just print what would be done.
      This will only print the next step but nothing dependent on that step
      since it won't actually be executed. For instance, -ns will print the
      snapshot command but not print the subsequent send/receive command as it
      depends on the snapshot actually being taken.
  -s: After sending existing snapshots, make a final one and replicate it as well.
      This option requires that the source be a filesystem and not a snapshot.
  -m: After sending all snapshots, migrate the source to the dest filesystem by
      unmounting the source filesystem and changing the new filesystem's
      mountpoint to that of the source one. This option includes -s.
  -c: A space delimited list of SMF services in quotes to be temporarily disabled
      before unmounting the source, then re-enable after changing the mountpoint
      of the destination. Requires -m.
  -v: Be verbose.

Great, let's try it out. Here's a pool with some data and some snapshots as well as another, empty pool:


# zfs list -r piscina
NAME                            USED  AVAIL  REFER  MOUNTPOINT
piscina                         384M  87.6M    19K  /piscina
piscina/ficheros                384M  87.6M   384M  /piscina/ficheros
piscina/ficheros@instantanea1    17K      -   128M  -
piscina/ficheros@instantanea2    17K      -   256M  -
piscina/ficheros@instantanea3      0      -   384M  -
# zfs list -r lago
NAME   USED  AVAIL  REFER  MOUNTPOINT
lago   105K   472M    18K  /lago

Now, let's copy the ficheros filesystem along with all its snapshots in one go:


# zfs-replicate -v piscina/ficheros lago
Sending piscina/ficheros@instantanea1 to lago.
Sending piscina/ficheros@instantanea2 to lago.
(incremental to piscina/ficheros@instantanea1.)
Sending piscina/ficheros@instantanea3 to lago.
(incremental to piscina/ficheros@instantanea2.)
# zfs list -r lago
NAME                         USED  AVAIL  REFER  MOUNTPOINT
lago                         384M  87.6M    20K  /lago
lago/ficheros                384M  87.6M   384M  /lago/ficheros
lago/ficheros@instantanea1    17K      -   128M  -
lago/ficheros@instantanea2    17K      -   256M  -
lago/ficheros@instantanea3      0      -   384M  - 

It works. And it automatically sent the later snapshots incrementally to save time and bandwidth, too!
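
In case you wonder what such a run boils down to, here is a stripped-down sketch of the idea. This is not the zfs-replicate script itself, just an illustration with a made-up function name: send the oldest snapshot in full, then each later one as an increment against its predecessor.

```shell
# replicate_all SRC_FILESYSTEM DEST_POOL
# Sends all snapshots of one filesystem to another pool, oldest first.
replicate_all() {
    src=$1; dest=$2; prev=
    target="$dest/${src##*/}"    # e.g. lago/ficheros
    # -H: no header line, -s creation: oldest snapshot first.
    # The grep keeps only snapshots of src itself, not of its children.
    for snap in $(zfs list -H -t snapshot -o name -s creation -r "$src" | grep "^$src@"); do
        if [ -z "$prev" ]; then
            zfs send "$snap" | zfs receive "$target"
        else
            zfs send -i "$prev" "$snap" | zfs receive "$target"
        fi || return 1
        prev=$snap
    done
}

# Usage (as root): replicate_all piscina/ficheros lago
```

The sketch has no error reporting, no dry-run mode and no handling of snapshots that already exist on the destination, which is where the real script earns its keep.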

If we now add another snapshot to our original pool piscina and then run zfs-replicate again, it will skip already replicated snapshots and just copy those that are additional:


# zfs snapshot piscina/ficheros@instantanea4 
# zfs-replicate -v piscina/ficheros lago
Sending piscina/ficheros@instantanea4 to lago.
(incremental to piscina/ficheros@instantanea3.)
# zfs list -r lago
NAME                         USED  AVAIL  REFER  MOUNTPOINT
lago                         384M  87.6M    20K  /lago
lago/ficheros                384M  87.6M   384M  /lago/ficheros
lago/ficheros@instantanea1    17K      -   128M  -
lago/ficheros@instantanea2    17K      -   256M  -
lago/ficheros@instantanea3      0      -   384M  -
lago/ficheros@instantanea4      0      -   384M  -

This is useful because you can now run this script on a regular basis to have one pool automatically backed up to another pool. In fact, the -s option will first make sure all existing snapshots are copied over, then create a new snapshot for you and copy that one over as well, all in one command.
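
Skipping what is already there comes down to finding the newest snapshot that both sides have in common. One way to do that, again only as a sketch with made-up function names and not necessarily how zfs-replicate does it:

```shell
# snap_names FILESYSTEM
# Prints the snapshot names (the part after the @) of one filesystem, oldest first.
snap_names() {
    zfs list -H -t snapshot -o name -s creation -r "$1" | sed -n "s|^$1@||p"
}

# last_common SRC_FILESYSTEM DEST_FILESYSTEM
# Prints the newest snapshot name that exists on both sides (empty if none).
last_common() {
    dest_snaps=$(snap_names "$2")
    common=
    for s in $(snap_names "$1"); do
        for d in $dest_snaps; do
            if [ "$s" = "$d" ]; then common=$s; fi
        done
    done
    echo "$common"
}

# Usage: last_common piscina/ficheros lago/ficheros
```

Everything newer than that snapshot on the source still needs to be sent, starting with an increment against the common one.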

Sometimes the destination filesystem gets touched or otherwise modified, and then a zfs receive into it will no longer work. In that case, you can use the -F switch, which forces a rollback of the destination filesystem to its latest snapshot, and then you'll be back in business.

Finally, another scenario is file system migration: You have a filesystem in one pool and want to migrate it with all its snapshots to another pool, with minimal downtime. This can be done using the -m option: First copy all existing snapshots while the source file system is still live, then unmount it, create a final snapshot, copy that one over to the destination file system as well, set the destination filesystem's mountpoint to be the same as the source filesystem's one and remount the destination file system. (Please note that the script might appear frozen when unmounting while it's waiting for some open files to be closed, etc.).

If you're worried about some daemons depending on your filesystem's availability (like Samba), you can use the -c option to provide their names. zfs-replicate will then bring down the matching SMF services right before unmounting and restart them automatically after re-mounting the migrated filesystem. Again, you might need to wait until the SMF service is really down (Read: The last Samba connection has closed).
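
Done by hand, the final migration step described above might look roughly like the following. It is a sketch only and not taken from the script: migrate_fs and the snapshot name migrate-final are made up, and it assumes that all earlier snapshots up to LAST_SNAP have already been copied to the destination.

```shell
# migrate_fs SRC_FILESYSTEM DEST_FILESYSTEM LAST_SNAP [SMF_SERVICE...]
migrate_fs() {
    src=$1; dest=$2; last=$3; shift 3
    final=migrate-final    # hypothetical name for the last snapshot
    # Remember where the source is mounted before touching anything.
    mnt=$(zfs get -H -o value mountpoint "$src") || return 1
    # -t: only until the next reboot, -s: wait until the service is really down.
    for svc in "$@"; do svcadm disable -st "$svc" || return 1; done
    zfs unmount "$src" || return 1
    zfs snapshot "$src@$final" || return 1
    zfs send -i "$src@$last" "$src@$final" | zfs receive -F "$dest" || return 1
    # Changing the mountpoint remounts the destination in the old place.
    zfs set mountpoint="$mnt" "$dest" || return 1
    for svc in "$@"; do svcadm enable "$svc"; done
}

# Usage (as root): migrate_fs piscina/ficheros lago/ficheros instantanea4 samba
```

Note that the source filesystem keeps its old mountpoint property in this sketch, so you would want to change it or destroy the source before the next reboot.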

I hope this script is useful to you and again, I assume you know what you're doing and do some testing before using it in production. I'm sure there are still some bugs and shortcomings, so please send email to constantin (dot) gonzalez (at) sun (dot) com or leave a comment and I'll try to make the script better for you.

Many thanks to Chris Gerhard, whose backup script was an inspiration for me in hacking together this utility. Also, many thanks to Tim Foster for some code-review and initial feedback (Sorry, I haven't managed to implement some locking yet...). Let me know when you're in Munich and you'll get some well-deserved beer!

This entry was created on 2007-08-16 13:41:02.0 PST.


Sunday August 12, 2007

ZFS Interview in the POFACS Podcast (German)

Last week, I was interviewed by the German podcast POFACS, the podcast for alternative computer systems. Today, the interview went live, so if you happen to understand German and want to learn about ZFS while driving to work or while jogging, you're invited to listen to the interview.

I was actually amazed at how long the interview turned out: It's 40 minutes, while recording the piece only felt like 20 minutes or so. The average commute time in Germany is about 20 minutes, so this interview will easily cover both ways to and from work. But there's more: This episode of POFACS also introduces you to the NetBSD operating system and the German Unix User Group GUUG. Finally, the guys at POFACS were also kind enough to feature the HELDENFunk podcast in a short introductory interview. Thanks!

So with a total playing time of 1 hour and 20 minutes, this episode has you covered for at least two commutes or a couple of jogging runs :).

This entry was created on 2007-08-12 10:41:54.0 PST.




This is Sun employee Constantin Gonzalez' personal blog.
All opinions expressed herein are solely of the author and do not necessarily reflect those of his employer.
If you want to contact the author, please send email to constantin (dot) gonzalez (at) sun (dot) com.
Thank you for reading this blog!