I've finished writing the next round of features for my ZFS Automatic Snapshots SMF Service prototype. You can download this as zfs-auto-snapshot-0.6.tar.gz

The main new features in this release are:

  • send/receive support
  • multiple schedules per filesystem

The send/receive support means that, if you want it to, the service can send a backup of each snapshot as either a full stream or an incremental stream, depending on how the service is configured. The service will also send snapshots of all child filesystems, if required, though without send -r support in ZFS, this is a little unwieldy at the moment.

There's an SMF property which the user can set to the command that should receive the backup stream. Typically, this would be a "zfs receive", but there's no reason why you couldn't simply cat the output to a unique file on an NFS server. I've altered the bundled GUI to also ask for these new options when it's constructing a new manifest.

The multiple schedules per filesystem feature allows the user to assign an optional label to each snapshot schedule, allowing multiple schedules for the same dataset. For example, for a given filesystem you might choose to take monthly full backups, sent to a remote server (and backed up to tape as a flat file), but also daily incremental backups, perhaps via zfs send/receive to a different server.

The label is also useful to quickly tell which services are running for which filesystems. For example, here's the configuration on my desktop at the moment:

root@haiiro[236] svcs | grep zfs
online         Aug_31   svc:/system/filesystem/zfs/auto-snapshot:space-archive
online         Aug_31   svc:/system/filesystem/zfs/auto-snapshot:tank-root_filesystem
online         13:28:27 svc:/system/filesystem/zfs/auto-snapshot:space-timf
online         17:47:37 svc:/system/filesystem/zfs/auto-snapshot:default
online         18:00:02 svc:/system/filesystem/zfs/auto-snapshot:tank-new,backup
online         18:01:02 svc:/system/filesystem/zfs/auto-snapshot:tank-new,moreoften

I've updated the documentation and README for these new features, but let me know if anything's unclear.

Finally, I'm trying hard to do the right thing in the face of failure. The service will move to maintenance should a backup fail for any reason, and the cron job should be removed in that case. Also, I'm doing some basic locking: the service checks whether a zfs send command is still running before attempting to send another backup stream from the same instance. Unfortunately, as far as I can see there's no atomic way to set/get SMF properties, but feedback is welcome.
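
For illustration only, one common way to get an atomic lock from a shell script is mkdir, which either creates the directory or fails in a single step. This is a sketch of that general technique, not what the method script does (the paragraph above suggests the service tracks this through SMF properties), and the lock directory path and function names are made up for the example:

```shell
#!/bin/sh
# Hypothetical lock location; one lock directory per service instance.
LOCK_ROOT=${LOCK_ROOT:-/tmp/zfs-auto-snapshot-locks}

# Try to take the lock for an instance name.
# mkdir succeeds only if the directory did not already exist.
acquire_lock() {
    mkdir -p "$LOCK_ROOT" && mkdir "$LOCK_ROOT/$1" 2>/dev/null
}

# Give the lock back once the send has finished.
release_lock() {
    rmdir "$LOCK_ROOT/$1"
}

if acquire_lock "tank-new,backup"; then
    echo "lock taken, a zfs send would run here"
    release_lock "tank-new,backup"
else
    echo "a previous send is still running, skipping"
fi
```

A real script would also need to cope with stale locks left behind by a crashed send, which this sketch ignores.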

I hope you find this stuff useful, and if you run into problems, bug reports would be great!

ps. Chris is also doing some pretty snazzy stuff with ZFS snapshots over on his blog: well worth checking out!

[ update here ]


Comments:

Great job, this is a really cute and useful hack! I tried something like this before with simple cron jobs and it worked well, but you made a really really cool thing integrated in SMF and all... I also used a simple shared ip failover to achieve a poor man's shared storage cluster.... but without the shared storage costs, thanks to zfs :P (I know that's not like having shared storage, but sometimes you are on budget constraints and I prefer solaris+zfs over linux+drbd... :)

Posted by sickness on September 07, 2006 at 01:50 AM IST #

Thanks sickness! - btw. I found a slight bug when using the GUI to configure "no backups". SMF doesn't like setting properties to "" (empty string); instead, it sets empty values to '""' (two double quotes). So to work around this, I've now made the backup options "none", "full" and "incremental" and updated the archives accordingly. I haven't revved the tarball though, as this is a minor change.

Posted by Tim Foster on September 07, 2006 at 11:39 AM IST #

Thanks for the informative entry!

Posted by Amit Kulkarni on November 20, 2006 at 06:07 PM GMT #

This comment made life a lot easier for me ... ;-) just a quick question though.... Having received a snapshot from box a on box b, how do I rename (and promote, I think) the snapshot on box b, so I replace an existing zfs mount-point?

Posted by grif on July 06, 2007 at 11:43 PM IST #

Hi grif - good to hear you're finding this useful! There's a few ways to do what you're after - rename the old filesystem on box b (zfs rename) and then rename the new filesystem to whatever you like, or simply change the zfs "mountpoint" properties for the old and new filesystems respectively. Does that help ?

Posted by Tim Foster on July 06, 2007 at 11:52 PM IST #

Hi Tim to a certain degree yes... but still I get a funny error message...

isssoltest06,PA06,~ > pfexec zfs rename isssoltest06-oradata/oracle/data/oracle isssoltest06-oradata/oracle/data1
cannot rename 'isssoltest06-oradata/oracle/data/oracle': out of space
isssoltest06,PA06,~ > zpool list
NAME                   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
isssoltest06-oraarch  79.5G  60.2G  19.3G  75%  ONLINE  -
isssoltest06-oradata   298G   235G  63.7G  78%  ONLINE  -

any ideas? you are most welcome to mail me offline...

Posted by grif on July 13, 2007 at 10:54 AM IST #

That's odd grif - I'm not sure what the problem is there, I'll send mail tomorrow..

Posted by Tim Foster on July 17, 2007 at 07:47 PM IST #

This is great! I have a basic question and then wondering why I have an error in the log.
1) When setting this up for the first time to use incremental back-ups, do I need to have a first back-up in place? For example, I just did " > /lc/bkups/hello" for the script. Not sure if it needed a send?

2) Have this error in one of my logs and don't understand why:
[ Sep 5 19:20:02 Disabled. ]
[ Sep 5 19:20:02 Rereading configuration. ]
[ Sep 5 19:20:14 Enabled. ]
[ Sep 5 19:20:14 Executing start method ("/lib/svc/method/zfs-auto-snapshot start") ]
[ Sep 5 19:20:14 Method "start" exited with status 0 ]
[ Sep 6 00:00:01 Rereading configuration. ]
[ Sep 6 00:00:01 No 'refresh' method defined. Treating as :true. ]
[ Sep 6 00:00:01 Stopping for maintenance due to administrative_request. ]
[ Sep 6 00:00:01 Executing stop method ("/lib/svc/method/zfs-auto-snapshot stop") ]
[ Sep 6 00:00:01 Method "stop" exited with status 0 ]
[ Sep 6 00:00:01 Stopping for maintenance due to administrative_request. ]
[ Sep 6 00:00:01 Rereading configuration. ]

thanks!

Posted by aorchid on September 06, 2007 at 09:35 PM IST #

Hi aorchid - glad you're finding it useful ( look for version 0.8 if you're not already using it )

If you're doing incremental backups, you don't need an initial backup - the system should create one of those for you first. It looks for an older snapshot that matches the same naming policy ("zfs-auto-snapshot:label"), and if that doesn't exist, it takes a full backup the first time. Subsequent backups are incremental.
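
Roughly, that decision can be sketched like this. It's a dry run that only prints the zfs send command rather than running it, and the function name and dataset names are made up for the example rather than taken from the method script:

```shell
#!/bin/sh
# Print the zfs send command that would be used for a new snapshot.
# $1 = the new snapshot
# $2 = the newest older snapshot with the same naming policy, or "" if none exists
choose_send_cmd() {
    new_snap=$1
    prev_snap=$2
    if [ -z "$prev_snap" ]; then
        # Nothing to base an incremental on: send a full stream.
        echo "zfs send $new_snap"
    else
        # Send only the changes between the older and the new snapshot.
        echo "zfs send -i $prev_snap $new_snap"
    fi
}

# First run: no earlier snapshot, so a full stream.
choose_send_cmd "tank/home@zfs-auto-snap-2007-09-07-00:00:00" ""

# Later run: an earlier snapshot exists, so an incremental stream.
choose_send_cmd "tank/home@zfs-auto-snap-2007-09-08-00:00:00" \
    "tank/home@zfs-auto-snap-2007-09-07-00:00:00"
```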

For the backup command, the script does a " zfs send $LAST_SNAP | $BACKUP_SAVE_CMD" - so your backup command should probably be "cat > foo", rather than just "> foo" (anyone else, feel free to correct me if you've better ideas - I'm usually wary of using cat in this way)
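
The difference between the two forms can be seen without ZFS at all. Here's a small sketch that uses printf as a stand-in for zfs send and a temporary directory as a made-up destination. Running the command string through eval is an assumption for the sake of the demo, not necessarily how the method script invokes it:

```shell
#!/bin/sh
# Stand-in for "zfs send $LAST_SNAP": just writes some bytes to stdout.
fake_send() {
    printf 'stream-data'
}

# Pipe the fake stream into a save command given as a string.
# Assumption: the string is evaluated by the shell.
save_with() {
    fake_send | eval "$1"
}

tmpdir=$(mktemp -d)

# A bare redirection creates the file, but nothing reads the pipe,
# so the file ends up empty.
save_with "> $tmpdir/bare"

# cat reads the stream from stdin and writes it to the file.
save_with "cat > $tmpdir/with_cat"

# Compare the sizes of the two results.
wc -c < "$tmpdir/bare"
wc -c < "$tmpdir/with_cat"
```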

As for the errors you're seeing, unfortunately logging is one of the known weaknesses of this service - see more in the last paragraph of http://blogs.sun.com/timf/entry/smf_philosophy_more_on_zfs

You might find more information in the cron logs, usually mailed to the cron user - /var/mail/root probably in this case.

Finally, when debugging this, it's sometimes easiest to import the manifest, and start the service, and then directly run the method script from the command line giving the SMF URI as the argument, rather than waiting for cron to do it - eg.

# /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:space-timf,frequent

Hope this helps ?

Posted by Tim Foster on September 07, 2007 at 11:34 AM IST #

Thanks for your pointers with this (running version .8). I can run the backup from the command line using:
1. pfexec zfs send home/ftp@today | pfexec zfs recv -d zz/bkups, then
2. pfexec zfs send -i today home/ftp@zfs-auto-snap-2007-09-23-12:18:32 | pfexec zfs recv -d zz/bkups

Checking zfs list -t snapshot demonstrates all snapshots are present. Strangely enough, all these auto-snapshot services are being maintenanced once they run, but one of them continues to make snapshots locally, but not on the backup disk:

@solenv ~ % zfs list -t snapshot
NAME USED AVAIL REFER MOUNTPOINT
zz/ftp@today 15K - 19K -
zz/ftp@zfs-auto-snap-2007-09-21-00:00:01 0 - 19K -
zz/ftp@zfs-auto-snap-2007-09-23-12:17:58 0 - 19K -
zz/ftp@zfs-auto-snap-2007-09-23-12:18:32 0 - 19K -

svcs:

maintenance Sep_21 svc:/system/filesystem/zfs/auto-snapshot:zz-ftp

You can see that the service has been in maintenance mode since the 21st, but there is a snapshot from it dated 9-23. How is that happening?

thanks

Posted by aorchid on September 23, 2007 at 08:33 PM IST #

That's interesting - I'd thought that my marking the service as maintenance (read the method script, there's a few scenarios where we deem this a sane thing to do) would disable the service, and remove the crontab entry for that set of automatic snapshots - evidently not. You can verify this by running "crontab -l" as root. If there's still an entry for zfs-auto-snapshot, then that'll be it.

As regards further debugging, rather than just running "pfexec zfs send", try actually running the method script directly from /lib/svc/method, using the FMRI as the argument to it. You'll get even more information by turning on the verbose property in the service, or just cut right to the chase, and invoke using ksh -x, eg. "ksh -x /lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:space-timf,frequent"

That zfs send/recv works is good to know - now I just need to work out why my method script is deciding we should be moving to maintenance.

Feel free to mail me offline with more details, and we'll get this sorted out. [ first.last@sun.com ]

Posted by Tim Foster on September 23, 2007 at 09:25 PM IST #

Note for future visitors - version 0.6 is out of date! The latest version of this service is available from the sidebar of my blog (at the time of writing, this is version 0.10)

Posted by Tim Foster on June 15, 2008 at 01:10 PM IST #

Hello Tim,
I'm just playing with your tool on Solaris 10 (10/09) but I can't see sending/receiving support there. Is it working on Solaris 10 too?
If I run "pfexec ./zfs-auto-snapshot-admin.sh rpool/export/test"
it asks about frequency, number to save, children and label. That is all.
Version: zfs-auto-snapshot-0.6.tar.gz
Thanks for answer.
Jan Hlodan

Posted by Jan Hlodan on October 15, 2009 at 05:57 PM IST #


This blog copyright 2010 by timf