ZFS Automatic For The People - Snapshots 0.9 and Backups 0.2
I'm releasing what might be the last set of ZFS Automatic Snapshot & Backup tools on this blog! Don't worry - the services aren't going away, but some folks in the Desktop group are starting to look more carefully at this stuff, so they might move to opensolaris.org as a project somewhere. More on that later.
I've got two new packages available now. The main change in each is more GUI integration: some simple zenity tools that let users configure the services without needing the command line. These appear in the GNOME menu under "Administration" after install - you may need to pkill gnome-panel or log out to see the change.
I'm not happy with the rights model for these GUI tools (they run using gksu to root), but hopefully they'll be replaced soon. I've also gone to more trouble to package these services properly, so installation/removal is much easier. More details below.
ZFS Automatic Snapshot Service 0.9
Apart from the GUI code being new, I've a few extra bits here:
- Packaging - we now have a Makefile and build a package, called TIMFauto-snapshot for now. Suggestions on better package names welcome. These are just SVR4 packages - I haven't had bandwidth to play with IPS yet, so beware the postinstall/preremove scripts!
- Adding some pre-configured SMF instances of common use-cases, eg. daily, hourly, weekly and monthly snapshot schedules, so users don't have to worry about creating a manifest for these.
- The inclusion of a special "fs-name" value, "//" which makes the service look at the ZFS user property "com.sun:auto-snapshot:<label>" so that users can mark filesystems to be included in a given snapshot schedule without needing to touch the SMF properties. The pre-configured instances above use this.
- Changing the timeout for the start and stop methods to be infinite - the 10 second timeout was occasionally causing the service to give up trying to start when lots of instances were on the system. Because of the way I'm adding cron entries, I need to serialize the startup for these instances, which means they can take longer than 10 seconds to start, depending on the number of instances you have. This isn't ideal, so better suggestions are welcome. I think this may fix the problem you were seeing Siegfried, but I still haven't been able to reproduce it exactly.
- A user had commented that having ":" characters in the snapshot names makes them inaccessible from samba clients - so I've added a $SEP variable to the method script to allow administrators to change this. I was in two minds as to whether enough people would want this that I should add it as a new SMF property instead, but for now, if you need this, just edit line 64 of the method script.
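As a rough sketch of what the $SEP change amounts to, here is how a snapshot name could be assembled with a configurable separator. The "zfs-auto-snap" prefix, the name layout and the snapshot_name helper are assumptions for illustration, not a copy of the method script.

```shell
#!/bin/sh
# Separator used inside snapshot names. ":" is the default; an
# administrator with Samba clients might choose "_" instead.
SEP=${SEP:-:}

# snapshot_name LABEL DATE HOUR MINUTE  (hypothetical helper)
# Builds a name such as zfs-auto-snap:daily-2007-11-02-08:11
snapshot_name() {
    printf 'zfs-auto-snap%s%s-%s-%s%s%s\n' "$SEP" "$1" "$2" "$3" "$SEP" "$4"
}
```

With SEP set to "_", the same call yields a name with no ":" in it, which is the point of the variable.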
Check out the updated README for more information.
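For the "//" instances, marking a filesystem comes down to setting a ZFS user property on it. A minimal sketch follows; the helper names are made up, and the value "true" is an assumption, so check the README for the value the service actually expects.

```shell
#!/bin/sh
# Path to the zfs binary; override (for example ZFS=echo) for a dry run.
ZFS=${ZFS:-zfs}

# mark_for_snapshot FILESYSTEM LABEL  (hypothetical helper)
# Sets the user property that the "//" instances look for.
# The value "true" is an assumption, not taken from the README.
mark_for_snapshot() {
    "$ZFS" set "com.sun:auto-snapshot:$2=true" "$1"
}

# unmark_for_snapshot FILESYSTEM LABEL  (hypothetical helper)
# Clears the local setting so the filesystem inherits from its parent.
unmark_for_snapshot() {
    "$ZFS" inherit "com.sun:auto-snapshot:$2" "$1"
}
```

For example, mark_for_snapshot tank/home daily would add tank/home to the daily schedule without touching any SMF properties.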
ZFS Automatic Backup Service 0.2
Not much has changed here - the GUI is the main addition, along with putting the service in its own package. The same caveats as per the auto snapshot service apply here - this package is currently called TIMFauto-backup.
Check out the updated README for more information.
Now, with the news out of the way, I'd like to ask for help. Given that both of these services are likely to get more attention in the future (the folks putting together Indiana think these would be really handy to have in the distro, which I'm really thrilled about) I want to see if we can get some more eyes on the code.
If there's anyone out there who'd like to offer code-reviews of either of these services, I'd be really interested in hearing from you. I use these on a regular basis myself, and I know there's quite a few folks out there who use them too, so they do work as intended (or at least have done so far, Murphy's law dictates that these versions I'm releasing today will kick your cat and burn your house down!) - however, as far as I know they've never been code-reviewed, from either a high-level architectural point of view, or a nitty gritty "your indenting sucks" perspective.
Would these be candidates for the Desktop consolidation, or some other consolidation targeting Indiana, and perhaps even Solaris some day ? Certainly the automatic backup service is currently quite desktop oriented, but a scheduled snapshot service sounds like something we'd want on servers too - ON perhaps ? (I'm not sure how to initiate that decision-making process either way)
So please, all comments would be welcome - I can take it, and if it means scrapping these implementations altogether and going back to the drawing board, then this would be a really good time to find that out!
Now - you've read this far (thanks) - here's the software. In each tarball, there's a prebuilt "proto" directory containing the package (which is architecture neutral). To install, as root simply do:
# cd <install dir>/proto
# pkgadd -d <name of package>
Enjoy!
Tim, I'm winding up with all of the services in maintenance, and they are complaining in the log that:
[ Nov 1 22:47:43 Executing start method ("/lib/svc/method/zfs-auto-snapshot start"). ]
'//': not a ZFS filesystem
[ Nov 1 22:47:43 Method "start" exited with status 0. ]
Hopefully you can help me out here :)
This all looks pretty interesting, but I haven't used it before. I am trying to set it up on my home server.
I certainly think that auto-snapshotting could live in the core product-- architecturally, it would probably make sense for it to fall under the ZFS umbrella and live in OS/Net. We run a similar (but probably much less fancy) service on jurassic. Just my $0.02.
Posted by Dan Price on November 02, 2007 at 05:56 AM GMT #
Murphy's law strikes again - good catch Dan, thanks!
The problem was that the "schedule_snapshots" function normally checks the fs-name property to see if it's a valid ZFS filesystem on method start, exiting if it isn't.
Unfortunately, every system I tested it on had ZFS root, and so the check to see if "//" was a valid ZFS filesystem always succeeded!
There needed to be a check here to see if we were looking at the special fs-name value "//" (I probably should have chosen a better keyword)
Fixed this, and uploaded the proper version of 0.9 now.
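The shape of that check, as a sketch only (is_valid_fs_name is a hypothetical helper, not the code in the method script):

```shell
#!/bin/sh
# Path to the zfs binary; overridable so the logic can be tested.
ZFS=${ZFS:-zfs}

# is_valid_fs_name NAME  (hypothetical helper)
# Returns 0 if NAME is the special "//" keyword or an existing dataset.
is_valid_fs_name() {
    # Test for the keyword first. On a ZFS-root system "//" happens to
    # resolve to the root filesystem, which hides the bug; elsewhere
    # the lookup below fails and the service drops to maintenance.
    if [ "$1" = "//" ]; then
        return 0
    fi
    "$ZFS" list -H -o name "$1" >/dev/null 2>&1
}
```

Checking the keyword before asking zfs means the result no longer depends on what the root filesystem happens to be.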
Posted by Tim Foster on November 02, 2007 at 08:11 AM GMT #
Commenting on my own blog (I know) - I thought I'd point one thing out about "//" and auto-snapshots.
Given we're allowing users to arbitrarily mark filesystems for backup, it means that if we want to snapshot tank and its 3 children (tank/a, tank/b and tank/c), the script currently iterates over each filesystem in turn.
Instead, it would be more efficient to set fs-name to tank and turn on recursive snapshots via the "zfs/snapshot-children" SMF property (which would result in zfs snapshot -r tank@[..] ) - for large numbers of filesystems, this could make a big difference to performance.
With a bit more work, in the case of "//", we could determine whether every child is marked for snapshotting, and then optimise the way we take the snapshots accordingly. Haven't done that yet, but it's a TODO I should have mentioned in the README.
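One way that optimisation could look, as a dry-run sketch rather than anything in the current script. Both helpers are hypothetical; they read "name<TAB>value" lines on stdin and only print the commands they would run.

```shell
#!/bin/sh
# Input for both helpers: "name<TAB>value" lines, as printed by e.g.
#   zfs get -H -r -o name,value com.sun:auto-snapshot:daily tank
# (depending on the release, that listing may also include snapshots,
# which would need filtering out first).

# all_marked: succeeds only if there is at least one line and every
# value is "true" (the "true" value is an assumption).
all_marked() {
    awk -F'\t' '$2 != "true" { bad = 1 }
                END { if (NR == 0 || bad) exit 1; exit 0 }'
}

# plan_snapshots PARENT SNAPNAME: prints the zfs commands to run.
plan_snapshots() {
    listing=$(cat)
    if printf '%s\n' "$listing" | all_marked; then
        # Everything under PARENT is marked: one recursive snapshot.
        echo "zfs snapshot -r $1@$2"
    else
        # Otherwise fall back to one snapshot per marked filesystem.
        printf '%s\n' "$listing" |
            awk -F'\t' -v snap="$2" \
                '$2 == "true" { print "zfs snapshot " $1 "@" snap }'
    fi
}
```

If tank, tank/a, tank/b and tank/c are all marked, this prints a single recursive command; if any one of them is unmarked, it prints one command per marked filesystem instead.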
Posted by Tim Foster on November 02, 2007 at 09:17 AM GMT #
Installing on Sol10u4 x86 (fully patched):
Your prototype has /usr/bin as root:sys, when it currently is root:bin... other than that, all appears well (although migration instructions from older releases would be nice)
pkgmk appears to be broken on large filesystems (or maybe just zfs...) , FYI:
pkgmk -f `pwd`/src/prototype -d `pwd`/proto -r `pwd`/src
## Building pkgmap from package prototype file.
## Processing pkginfo file.
## Attempting to volumize 21 entries in pkgmap.
pkgmk: ERROR: Objects selected for part 1 require 41 blocks, limit=-1110110856.
## Packaging was not successful.
carson:gandalf 1 SOL$ df -h proto/
Filesystem size used avail capacity Mounted on
media/data 3.1T 211G 1.5T 13% /export/data
Posted by Carson Gaspar on November 02, 2007 at 09:27 AM GMT #
Thanks Carson! Sounds like 6430563 - the packaging utilities weren't largefile aware. You can work around it by setting a smaller quota on the filesystem you're building in.
Thanks for noting the permissions thing - on my nv_70b system it's "/usr/bin d none 0755 root sys"
Migration instructions - delete the old method script and default instance, then pkgadd the new one. Older instances/manifests should work fine with the new code (if they don't, it's a bug)
Thanks for taking the time to provide feedback!
Posted by Tim Foster on November 02, 2007 at 09:52 AM GMT #
One nit: during install of TIMFauto-snapshot the postinstall script complains:
## Executing postinstall script.
couldn't set locale correctly
couldn't set locale correctly
Installation of <TIMFauto-snapshot> was successful.
Posted by Vladimir Kotal on November 06, 2007 at 10:22 PM GMT #
Also, the GUI could be better - couple frequency setting with zfs datasets in one window so that multiple backups can be configured at once.
Posted by Vladimir Kotal on November 06, 2007 at 10:23 PM GMT #
Actually, I am confused - after running the GUI tool do I need to import the manifest by hand or will it do it for me ? If not, where is the manifest stored ?
Also, what would happen if cron is not running ? (e.g. computer turned off for a night) Will the snapshots be simply missing ?
Posted by Vladimir Kotal on November 06, 2007 at 10:34 PM GMT #
Okay, one at a time Vladimir ! :-)
1. locale messages - no idea, probably not my fault. What locale is your system running in, and does it have the correct shared library in /usr/lib/locale/<locale name>/ ?
2. Yes the GUI is basic, I know - if you'd like to contribute a better GUI, I'd be happy to take a look.
3. If you're using the config tool that appears in the GNOME menu, then you don't need to do anything. It sets user properties on ZFS filesystems to mark them to be snapshotted under whatever preconfigured-schedule you select.
(and you can have the same filesystem selected for multiple schedules, you just need to rerun the GUI, see 2)
The preconfigured service instances get automatically started after you pkgadd TIMFauto-snapshot. More in the README, see the documentation for the special "fs-name" value "//".
If you're using it from the command line with a dataset as the argument, then yes you do need to import the resulting manifest, which is saved in the directory you ran the script from as "auto-snapshot-instance.xml" - the GUI tells you this at the end of the set of steps. This looks a bit like the image at
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_prototype_1
More info in the README.
4. If cron or your computer isn't running, snapshots won't be taken (much as I'd love to automatically power on your computer, take snapshots and power it off again once they're done, this would be impolite!)
The service has an SMF dependency on the cron service, so if cron breaks for some reason, the snapshot service(s) will drop to maintenance mode.
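For the command-line case in point 3, importing the saved manifest might look like the sketch below. import_snapshot_manifest is a hypothetical wrapper; the only real command it relies on is svccfg import.

```shell
#!/bin/sh
# Path to svccfg; override (for example SVCCFG=echo) for a dry run.
SVCCFG=${SVCCFG:-svccfg}

# import_snapshot_manifest [FILE]  (hypothetical helper)
# Imports the manifest the script saved in the current directory,
# or the file given as the first argument.
import_snapshot_manifest() {
    manifest=${1:-auto-snapshot-instance.xml}
    if [ ! -f "$manifest" ]; then
        echo "no manifest at $manifest" >&2
        return 1
    fi
    "$SVCCFG" import "$manifest"
}
```

Run it from the directory where the script was run. Depending on the manifest, the new instance may still need to be enabled with svcadm afterwards.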
Does this help ?
Posted by Tim Foster on November 07, 2007 at 11:13 AM GMT #
Tim,
Version 0.9 looks good so far. Just thought you'd like to know that I'm testing this for use in production backup of the data for our alpha product (running on an 18.6Tbyte x4500 Thumper). If it works well through our beta testing and this SMF service becomes part of OpenSolaris or integrated into ZFS then we'll probably use it for our production release.
Thanks for writing this!
Posted by Reid Spencer on November 28, 2007 at 09:43 PM GMT #
Found a typo bug in the method script. Line 307:
if [ -n "${HAS_RECUSRSIVE}" ]
It is "RECURSIVE" not "RECUSRSIVE"
Fixing this made everything work for me :)
Posted by Reid Spencer on November 28, 2007 at 11:26 PM GMT #
Thanks for spotting the error Reid! I've one other change in the pipeline at the moment, and that's to fix how the "//" works with recursive snapshots - so that's a bit ugly at the moment still.
If you need recursive snapshots now, then specifying filesystems via "fs-name" is a better way to go than using "//" and the zfs property.
Posted by Tim Foster on November 29, 2007 at 11:04 AM GMT #
Hi Tim,
ad 4: I was merely hinting at something like an anacron-like service. The frequency can be remembered, and after the system comes up again the date of the last snapshot is checked and, if it is overdue, a new snapshot is taken. (If multiple intervals were missed it makes sense to take only one snapshot, of course.)
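The check itself is small. A sketch, assuming times are kept as seconds since the epoch (snapshot_overdue is a hypothetical helper, not part of the service):

```shell
#!/bin/sh
# snapshot_overdue LAST_EPOCH INTERVAL_SECONDS NOW_EPOCH  (hypothetical)
# Succeeds if at least one full interval has passed since the last
# snapshot. However many intervals were missed, the caller takes just
# one catch-up snapshot.
snapshot_overdue() {
    [ $(( $3 - $1 )) -ge "$2" ]
}
```

At startup the service would compare the creation time of the newest snapshot for a schedule against the current time, and take a single snapshot if the check succeeds.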
Posted by Vladimir Kotal on November 30, 2007 at 08:37 PM GMT #
Tim,
Setting frequent, hourly and daily snapshots for the same filesystem had an issue - when the frequent pass ran, it was deleting all but 4 snapshots - including daily and monthly. So, within 5 hours, all I had for snapshots was 3 hourly snapshots and a single frequent. Looking through /lib/svc/method/zfs-auto-snapshot I noticed that one of the calls to the destroy_older_snapshots function wasn't passing the $LABEL argument.
Here's the diff:
316c316
< destroy_older_snapshots $fs $KEEP
---
> destroy_older_snapshots $fs $KEEP $LABEL
Posted by Breandan Dezendorf on January 14, 2008 at 05:37 AM GMT #
Good catch Breandan - I'll include this fix in 0.10 - thanks for letting me know!
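To show why the missing $LABEL argument mattered, here is a sketch of label-aware pruning. It is not the real destroy_older_snapshots; the snapshots_to_destroy helper and the "zfs-auto-snap:<label>-" naming are assumptions for illustration.

```shell
#!/bin/sh
# snapshots_to_destroy LABEL KEEP  (hypothetical helper)
# Reads snapshot names from stdin, oldest first, and prints the ones
# to destroy: everything carrying LABEL except the newest KEEP.
# Snapshots from other schedules are never counted or printed.
snapshots_to_destroy() {
    awk -v tag="@zfs-auto-snap:$1-" -v keep="$2" '
        index($0, tag) { names[++n] = $0 }
        END { for (i = 1; i <= n - keep; i++) print names[i] }'
}
```

Without the label filter, every snapshot on the filesystem counts towards the keep limit, so a frequent pass ends up removing hourly and daily snapshots as well.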
Posted by Tim Foster on January 14, 2008 at 09:46 AM GMT #
Tim,
I installed automatic backup on a Solaris 10 u4 machine. As there is no Administration button(?) in the menu, there is no Administration -> Automatic Backups.
Is your service intended solely for OpenSolaris or can it work on Solaris 10 u4?
Posted by Ron Halstead on February 05, 2008 at 07:55 PM GMT #
Hi Ron, I haven't tested the automatic backup stuff on s10u4, but it should work fine assuming s10u4 is able to mount your USB disks.
There might be some differences with the way GNOME picks up menu entries on s10u4 (did you restart the panel, or log out and log back in again?)
Anyway, as a workaround, to run the gui from the command line, run
% /usr/bin/zfs-auto-backup-admin.sh
Posted by 192.18.43.225 on February 06, 2008 at 09:37 AM GMT #
Note for future visitors - this version is now out of date - the latest version is always available via a link on the sidebar of my blog (at the time of writing, this is version 0.10)
Posted by Tim Foster on June 15, 2008 at 01:16 PM IST #
Hi there Tim,
I don't see the link you cite in your sidebar. Or maybe I'm not entirely clear on /what/ you're saying is out-of-date. Could you clarify?
Posted by David Abrahams on June 09, 2009 at 08:45 PM IST #
The auto-snapshot 0.9 code is out of date: the current version is 0.11, and is linked via "ZFS Automatic Snapshot SMF Service - 0.11 nv_100" (but that too is going to be out of date once I get the 0.12 fixes done!)
Posted by Tim Foster on June 10, 2009 at 08:51 AM IST #