Automatic snapshots into The Cloud
I've been rather busy of late, with a recent trip to MPK resulting in a ton of work to bring back home, so I haven't had the chance to blog as much as I'd like; apologies.
But, recently, there's been some activity with the ZFS Automatic Snapshot service that I thought I'd publicise a little bit. It seems that great minds think alike: myself, Brock Pytlik from the IPS team and Glenn Brunette (ok, two great minds, and me :-) all seem to have come to the independent conclusion that automatic snapshots on a local machine are good, but snapshots going to a remote machine are great, and have become more interested in dusting off the lesser-known zfs/backup-save-cmd option of the ZFS Automatic Snapshot service.
The timing here is excellent, as this is something I'd been thinking about with the advent of the Sun Cloud API (which relates to my day job at the moment in an interesting kind of way). More work to come in these areas I hope, but after a few mails back & forth with Glenn, he's made it first-past-the-post, with an implementation to send auto snapshots to S3 storage, which looks pretty nifty to me!
There's a heap of other stuff we could do here, though we need a few things for this to really fly:
- A means to list all snapshots on the remote end
- A means to choose the most recent common snapshot between the local and remote ends, and send an incremental send stream between that snapshot, and the one we've just taken
- A means to define what "remote end" means, in an extensible way (be it removable media, network devices, cloud storage etc.)
- An ability to send/recv into ZFS-based Cloud storage - (storing flat ZFS send streams in the cloud isn't as useful imho - I'd like to be able to browse these from any device)
- Use the auto-snapshot zfs/interval SMF property set to none to take event-driven snapshots, so we could do things like hook the service into nwam, taking an on-demand snapshot whenever we get a network connection (assuming a sensible time period has elapsed since our last snapshot), so we never lose data. The zfs auto backup prototype I'd posted before did this for local disk storage, but I never really took the idea further, waiting for better ZFS removable-media support.
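To make the "most recent common snapshot" item a bit more concrete, here's a rough sketch in Python of how the selection and the resulting send command might look. This isn't what the service or Glenn's S3 implementation actually does; the function names, the dataset name and the snapshot names are all made up for illustration, and it assumes we already have a list of snapshot names from each end (locally, something like `zfs list -H -t snapshot -o name -s creation` with the dataset prefix stripped would give them oldest first).

```python
from typing import List, Optional


def latest_common_snapshot(local: List[str], remote: List[str]) -> Optional[str]:
    """Return the newest snapshot name present on both ends, or None.

    `local` is assumed to be ordered oldest to newest. The order of
    `remote` does not matter; it is only used for membership checks.
    """
    remote_names = set(remote)
    # Walk the local list from newest to oldest and stop at the first match.
    for name in reversed(local):
        if name in remote_names:
            return name
    return None


def send_command(dataset: str, new_snap: str, common: Optional[str]) -> List[str]:
    """Build the argv for zfs send (hypothetical helper).

    If the two ends share a snapshot we send an incremental stream from it
    to the snapshot just taken; otherwise we fall back to a full stream.
    """
    target = f"{dataset}@{new_snap}"
    if common is None:
        return ["zfs", "send", target]
    return ["zfs", "send", "-i", f"{dataset}@{common}", target]


if __name__ == "__main__":
    # Hypothetical snapshot names, oldest first.
    local_snaps = ["snap-1", "snap-2", "snap-3", "snap-4"]
    remote_snaps = ["snap-1", "snap-2"]
    common = latest_common_snapshot(local_snaps, remote_snaps)
    print(" ".join(send_command("tank/home", "snap-4", common)))
```

The output of that command would then be piped to whatever "remote end" means, which is exactly the bit that still needs a proper, extensible definition.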
Of course, there just aren't enough hours in the day for one person to do all of this, but if you're interested in these sorts of problems, do subscribe yourself to the ZFS Automatic Snapshot email alias and dive in!
But once again, kudos to Glenn for giving this a whirl!

I'd love to see a Visual Panel to configure this :)
Posted by Glynn Foster on April 27, 2009 at 09:39 PM IST #
The idea sounds quite interesting; I'm looking up the Sun Cloud API for further specification. Thanks.
Posted by Robin Guo on April 28, 2009 at 04:29 AM IST #
Why would you want to configure it in a separate visual panel, rather than the Time Slider GUI?
Posted by 192.18.8.1 on April 28, 2009 at 12:32 PM IST #
192.18.8.1: Fair point I guess - sort of forgot about that GUI :)
Posted by Glynn Foster on April 28, 2009 at 10:41 PM IST #
I've been told by several people who know more than I do that storing "zfs send" output is inappropriate for backups, for at least two reasons:
1. The zfs send format is subject to change as ZFS is updated and improved. Without stability you might not be able to restore.
2. It lacks verifiability and/or recoverability: if your send data gets corrupted you may have just lost everything.
So your idea of backing up to ZFS filesystems in the cloud would be an improvement, provided you can maintain the systems on the two ends of the pipe with the same version of ZFS, which sounds a bit dubious now that I think of it. Also there's the issue of cost, since it probably involves a virtual solaris server in the cloud: that's probably even more expensive than backing up to S3, which is starting to get a little painful for me already.
I am going to be backing up to a tahoe grid (http://allmydata.org). You can build a tahoe grid with a bunch of friends, or you can get an unlimited amount of storage from allmydata.com for a reasonably low flat rate and no transfer fees. What I'm not sure of yet is how best to integrate that with a ZFS-based fileserver and a bunch of diverse machines, many of which are laptops and will travel. Figuring that out will be interesting.
Posted by David Abrahams on June 09, 2009 at 08:57 PM IST #
Yep, no arguments there David: I mentioned exactly this in my blog post,
"* An ability to send/recv into ZFS-based Cloud storage - (storing flat ZFS send streams in the cloud isn't as useful imho - I'd like to be able to browse these from any device)"
and have asked the Cloud API people for this support.
Maintaining the same version of ZFS at each end would be a case of versioning, given you can create arbitrary dataset versions (not pool versions, zfs create -o) so yes, you'd need to make sure they're in sync.
Posted by Tim Foster on June 10, 2009 at 09:14 AM IST #
Tim, I saw your bit about ZFS-based cloud storage, which is why I called it “your idea” ;-). I think you're missing my point(s), though. First, unless I am mistaken, there are more important reasons than usability for mission-critical backups not to use zfs send/receive. I realize you can set the on-disk version of a zfs dataset but the documentation gives no indication that it will affect the stream representation:
“The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS.”
Seems pretty stark to me.
The other point is about cost. At least right now, ZFS storage in the cloud isn't cost-competitive.
Posted by David Abrahams on June 10, 2009 at 07:05 PM IST #
Wouldn't EBS (à la Amazon, or a workalike) in the Sun Cloud address something like this? So I guess the question would be: will the Sun Cloud have an EBS workalike? You would think the zvol technology would lend itself to that.
At any rate I sure hope the Sun Cloud has some sort of block device storage provider, as there are IO intensive applications I would like to run in the cloud. Porting the storage backends to S3 or WebDav simply is not an option.
Posted by Zachary Schneider on August 05, 2009 at 09:19 PM IST #