I think I'm about done with the next release. Here's the Changelog entry:

0.12

  • Add event-based snapshots
  • Add support to change the separator character in snapshot names
    • set the default value of "zfs/sep" to "_"
    • useful for CIFS clients that previously choked on colons in snapshot names
  • Improved shutdown speed via http://blogs.sun.com/dp/entry/speeding_to_a_halt
  • Add support to allow the user to disable auto-snapshots of new pools
  • Bugfix to allow snapshots of datasets with spaces in their names
  • Bugfix to properly deal with namespace clashes in dataset names
  • Exported $LAST_SNAP and $PREV_SNAP variables when performing backups
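
As a rough illustration of that last item, a backup save command could use the exported variables to name whatever it stores. This is only a sketch: it assumes $LAST_SNAP holds the name of the snapshot being backed up and $PREV_SNAP the earlier snapshot of an incremental backup (empty for a full one). The function backup_stream_name and the file naming scheme are made up for this example and aren't part of the service.

```shell
# Build a file name for a backup stream from the exported variables.
# Assumption: LAST_SNAP/PREV_SNAP contain snapshot names like pool/fs@snap.
backup_stream_name() {
    # Replace characters that are awkward in file names.
    last=$(printf '%s' "${LAST_SNAP:-}" | tr '/@:' '___')
    if [ -n "${PREV_SNAP:-}" ]; then
        prev=$(printf '%s' "$PREV_SNAP" | tr '/@:' '___')
        printf '%s.incr-from.%s.zfs\n' "$last" "$prev"
    else
        printf '%s.full.zfs\n' "$last"
    fi
}
```

A save command could then write whatever it receives to a file with that name.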

The main new thing here is the "event-based-snapshot" instance, as described in the README and Defect 9595. It's nothing earth shattering, but a useful feature I think.

I've found that in my day to day use of OpenSolaris, I tend to take the lazy option of running "zfs snapshot -r rpool@snap" whenever I'm about to do something to the system that I might regret later, and don't want to wait for the 15 minute ":frequent" instance to fire. Later, I go to run the same thing again, and find I've already got a snapshot called rpool@snap, so I take a new one, rpool@snap2 - can you tell where this is going?

Eventually, I find myself out of disk-space and with no real clue what was interesting about rpool@snapn in the first place. I torch the snapshots and get on with life. So far, this has worked just fine, but it's a bit manual and results in us snapshotting rpool/swap and rpool/dump, and I don't really want that.

Some of the stuff Erwann added for 2009.06 helps a bit - with the "snapshot this directory" button in Nautilus, we could take a snapshot of a single dataset, but it doesn't group them, and still leaves the name of the snapshot as the only place to describe the snapshot contents.

So, my solution was to add the svc:/system/filesystem/zfs/auto-snapshot:event instance. Most of the code for this was in 0.11, but I'm now including a manifest that uses it. This service isn't managed using cron; instead, you get to manually run the method script each time you want to take a snapshot. You also have the option of supplying a description that gets stored in a user property on the snapshot, com.sun:auto-snapshot-desc (feel free to go all "Web 2.0" here, and add #hashtags if you want!)

Where this wins over a simple "zfs snapshot -r rpool@snap" is that it uses the com.sun:auto-snapshot properties used by the other instances to determine which datasets we want to take snapshots of (and these can be overridden for this instance by com.sun:auto-snapshot:event).
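
To make the property logic concrete, here's a rough sketch (not the actual method script) of how that decision could be made. It assumes tab-separated input as printed by "zfs list -H -o name,com.sun:auto-snapshot,com.sun:auto-snapshot:event", where "-" means a property isn't set; the function name select_event_datasets is made up for the example.

```shell
# Print the datasets an :event snapshot would include.
# Input columns (tab-separated): name, com.sun:auto-snapshot,
# com.sun:auto-snapshot:event
select_event_datasets() {
    awk -F'\t' '
    {
        # The instance-specific property wins when it is set.
        value = ($3 != "-") ? $3 : $2
        if (value == "true") print $1
    }'
}

# Example:
# zfs set com.sun:auto-snapshot=false rpool/swap
# zfs list -H -o name,com.sun:auto-snapshot,com.sun:auto-snapshot:event \
#     | select_event_datasets
```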

I've hacked together a (flawed) GUI that I've added as a launcher on my GNOME panel which shows a Zenity dialog box asking for a description of the snapshot, runs the method script, then pops up a notification once the snapshots have been taken. I've not included this in the package, since I'm sure someone will do a better job of it, but download from here if you want it. Obviously this is but one use for event-based snapshots: if you set the zfs/backup-save-cmd SMF property on that instance, you'd have a 1-click "backup my stuff" button! :-)



More of my awesome GUI skills...



The snapshot event notification popup

As ever, the README also documents these changes, and you can get the sources and build yourself a package via:

$ hg clone ssh://anon@hg.opensolaris.org/hg/jds/zfs-snapshot

I'm not sure when these changes will land in OpenSolaris - there'll be a 0.12.1 release as soon as I get to write support for the 'zfs list -d' command that Chris added for the bug I filed, 6762432, so perhaps we'll wait for that (I thought it polite to wait till everyone was able to run a version of OpenSolaris that included that fix before making the changes).

Comments welcome here, or on the zfs-auto-snapshot mailing list


Comments:

Would it be possible to use this in enterprise Solaris? We really need an automatic snapshots program but we only use enterprise Solaris, not OpenSolaris.

Posted by Eli Kleinman on June 25, 2009 at 01:39 AM IST #

Hi Eli,

This version, no unfortunately: we moved to using ksh93 as the scripting shell a few changesets back, and ksh93 isn't on s10 as far as I know. Earlier versions would work fine with the ksh that's in s10 (ksh88), see:

6766696 change for 6764535 introduced ksh93-specific syntax
http://bugs.opensolaris.org/view_bug.do?bug_id=6766696

it might be possible to backport subsequent ksh93'isms to ksh88 though.

http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_11#comment-1226590056000
http://mail.opensolaris.org/pipermail/zfs-auto-snapshot/2008-December/000077.html

Posted by Tim Foster on June 25, 2009 at 10:11 AM IST #

Solaris 10 includes /usr/dt/bin/dtksh which, although quite old, is based on ksh93. If there are no other opensolaris dependencies in 0.12, it might be possible to run auto snapshots using dtksh rather than ksh.

We've been using earlier versions of auto snapshots on Solaris 10 for a while now and really appreciate it. Tim - thanks for this work!

Posted by Sean Walmsley on June 25, 2009 at 01:34 PM IST #

No problem, glad you find it useful! dtksh might do the trick for s10 users, not a bad idea, worth investigating. Are snapshot and user-properties backported to s10 yet? (can you do "zfs set foo:bar=baz mypool@snap")

I suspect the 'zfs list -d' stuff in the upcoming 0.12.1 release would not be in any s10 updates just yet, but that'd be easier to back out (I'd just be using it for a performance boost)

Posted by Tim Foster on June 25, 2009 at 01:55 PM IST #

Just tried this on S10 U6 and it doesn't work:

cannot set property for 'mypool@snap': snapshot properties cannot be modified

I suppose it's possible this was implemented in U7, but I don't remember seeing any ZFS updates in the U7 release notes.

Oh well...

Posted by Sean Walmsley on June 25, 2009 at 05:14 PM IST #

Thanks for all the help, it works great (I tested on Solaris 10 10/08).

I had to change 3 things to get it to work.

1) Change the shell to /usr/dt/bin/dtksh.
2) On lines 511 and 517, remove the -o com.sun:auto-snapshot-desc="$EVENT" option; Solaris 10 10/08 doesn't include this option.
3) On line 970, remove the space between $SWAPVOLS and $(echo so that it reads:
SWAPVOLS="$SWAPVOLS$(echo $swap | sed -e 's#/dev/zvol/dsk/##')"
The space was causing the script to exit with an error that the pool ' rpool/swap' can not be found (note the leading space).
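
For anyone wondering why a single space matters, here's a small illustration. It isn't the service's code: the sed expression is the one from line 970, while the helper names swap_dataset and same_dataset are made up for the example.

```shell
# Strip the device prefix from a swap device path, as the sed on line 970 does.
swap_dataset() {
    echo "$1" | sed -e 's#/dev/zvol/dsk/##'
}

# Exact string comparison of two dataset names.
same_dataset() {
    [ "$1" = "$2" ]
}

SWAPVOLS=""
# With the space, the first entry picks up a leading blank.
with_space="$SWAPVOLS $(swap_dataset /dev/zvol/dsk/rpool/swap)"
# Without it, the name is clean.
no_space="$SWAPVOLS$(swap_dataset /dev/zvol/dsk/rpool/swap)"
```

Here same_dataset "$no_space" rpool/swap succeeds, while the same check on "$with_space" fails, because ' rpool/swap' is not the name of any dataset.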

Posted by Eli Kleinman on June 25, 2009 at 05:48 PM IST #

Excellent! I've a sneaky feeling 3) above is a bug introduced by our messing about with $IFS to get over the spaces-in-dataset-names bug. I'll investigate.

Posted by Tim Foster on June 25, 2009 at 07:04 PM IST #

Yep, good catch Eli: that was a bug - I've fixed this in the repository now:

changeset: 41:175684ca36d7
tag: tip
user: Tim Foster <tim.foster@sun.com>
date: Thu Jun 25 20:11:03 2009 +0100
description:
9684 6762453 broke auto-include and disabling of snapshots of swapvols

http://defect.opensolaris.org/bz/show_bug.cgi?id=9684

Posted by Tim Foster on June 25, 2009 at 08:18 PM IST #

I don't know if it is a bug or not, but at least for the auto-snapshot:frequent instance, setting the period seems to do nothing. I set it to 1 but upon enabling the instance, the crontab was set to 0,15,30,45 anyway. I have not played with the period for the other instances because the default is fine.

This brings me to a question though: is it safe to edit the zfssnap crontab directly, say to set daily snapshots to happen at 4am? I know the crontab will be changed when the service is restarted or disabled/enabled, but I don't know of a better way to set times. Perhaps that would make a good RFE.

Thanks for this excellent tool.

Posted by John on July 02, 2009 at 05:08 PM IST #

hey John,

Did you svcadm refresh after setting the period value? As for hacking the crontab directly, that's only going to work as far as the next service stop/start: the service writes its own crontab entry each time. Instead, you really want to set the 'zfs/offset' property, which would do just what you're after, described in the README, and at

http://blogs.sun.com/timf/entry/6777694_need_ability_to_control
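
In case it helps, the set-then-refresh sequence looks roughly like this. It's a sketch: set_smf_prop and the RUN dry-run variable are made up for the example, and the value 1 is just the period mentioned above. svccfg setprop and svcadm refresh are the standard SMF commands.

```shell
# Set a property on a service instance, then refresh the instance so the
# new value becomes part of its running configuration.
# Setting RUN=echo prints the commands instead of running them.
set_smf_prop() {
    fmri=$1
    prop=$2
    value=$3
    ${RUN:-} svccfg -s "$fmri" setprop "$prop" = "$value"
    ${RUN:-} svcadm refresh "$fmri"
}

# Example (dry run):
# RUN=echo set_smf_prop svc:/system/filesystem/zfs/auto-snapshot:frequent zfs/period 1
```

Since the crontab entry is written when the service starts, a restart of the instance may also be needed before the new schedule shows up.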

Glad you like the service!

Posted by Tim Foster on July 02, 2009 at 07:14 PM IST #

Thanks Tim. My eyes have been bleeding from reading documentation on sendmail in trying to send a simple email to an outside SMTP server.

As a result, after reading the zfs-auto-snapshot README, I missed the offset feature. Also I never knew about refreshing SMF services.

Thanks for the help!

Posted by John on July 02, 2009 at 07:47 PM IST #

Hey, there's no chance you are planning on making a zfs-auto-scrub service is there? ;)

Posted by John on July 02, 2009 at 10:22 PM IST #

Feature request:

I'm running a fileserver that frequently is idle for several hours at a time. The disks in the machine eventually spin down. It would be nice if auto-snapshot could somehow detect this state and stand back, to avoid powering up the disks.

Love the service as it stands too!

Posted by Erik C Johansson on July 03, 2009 at 09:28 AM IST #

Thanks Erik - that's a good idea for an enhancement. Offhand, I've no idea how to do what you're suggesting, but I suspect it involves asking zfs what disks are involved in each pool and (hopefully) just querying /dev/pm to get the disk state before choosing to snapshot or not. I'll keep it in mind when thinking about future versions!

Posted by Tim Foster on July 07, 2009 at 03:11 PM IST #

Btw. John asked about an auto-scrub script, Constantin might be looking at that pretty soon :-)

Posted by Tim Foster on July 09, 2009 at 03:34 PM IST #

Hello, Tim.

I've been looking at these scripts for a long time, but it took me a while to start adopting them in my servers ;)

I want to report that the package SUNWzfs-auto-snapshot from snv_114 installs just fine on Solaris 10u6, and after tweaking /lib/svc/method/zfs-auto-snapshot to use dtksh instead of ksh93 (or making an appropriate symlink) it works.

However, on sol10u6 and snv_105 hosts (never on snv_114 so far) I occasionally hit this problem which I want to share:

cannot create snapshot 'pool/projects/appserv/localzfs/logs/var-opt-SUNWappserver-domain1@zfs-auto-snap:frequent-2009-07-10-13:52': dataset is busy
no snapshots were created
Error: Unable to take recursive snapshots of pool/projects@zfs-auto-snap:frequent-2009-07-10-13:52.
Moving service svc:/system/filesystem/zfs/auto-snapshot:frequent to maintenance mode.

What gives?

I have seen similar behavior with ZFS automount after reboots (especially with LiveUpgrade), and those were cleared by manually mounting and unmounting the filesystems. ZFS'es in question here, however, are mounted and running (a local zone in this case), so I can't just toggle them.

It's probably not your bug but ZFS's (although if your script could catch it and work around the problematic dataset instead of failing altogether - that would be nice). Since I haven't seen it on the snv_114 host, it may have been fixed by that build. Or I'm lucky ;)

If you by chance know when it (was or will be) fixed in Solaris 10, or how to work around it without zone reboots - please let me know :)

Thanks a lot,
//Jim

Posted by Jim Klimov on July 10, 2009 at 11:07 AM IST #

Hi Jim,

You're hitting http://defect.opensolaris.org/bz/show_bug.cgi?id=5000 - a ZFS bug, which was tracked by http://bugs.opensolaris.org/view_bug.do?bug_id=6462803 and fixed in nv_111. I'm not sure which s10 update that'll appear in.

In terms of working around it, that's tricky. You could choose to disable the zfs/fs-name='//' property, instead manually setting each individual dataset you want snapshots taken on, then also set zfs/snapshot-children to false. More in the auto-snapshot README.

This will be quite a bit slower though - the system would snapshot each dataset in turn rather than using 'zfs snapshot -r': the ones that were 'busy' due to this bug would still fail, but it would attempt to take snapshots of other datasets in the tree, rather than failing-fast. I've not tried this workaround, but it could be worth trying.

Posted by Tim Foster on July 10, 2009 at 11:35 AM IST #

Hi Tim,

The "zfs/sep" option is a great thing, so I won't have to manually hack the method ;-).

I was wondering... Wouldn't it be useful to have the possibility to have some zfs properties like:

* com.sun:snapshot-pre-command and
* com.sun:snapshot-post-command

(and the corresponding code in the method to retrieve and use it of course) so the user could define a command that would be run before and after taking the snapshot of the corresponding ZFS filesystem. The problem I see would be security issues...

I'm thinking of this for example to make consistent auto-snapshots of database files, or some application files using your services.

Posted by Alexandre Dumont on July 22, 2009 at 09:22 AM IST #

Where can we get this 0.12 pkg? Is it available in Opensolaris repositories yet?

Posted by Alexandre Dumont on July 22, 2009 at 09:27 AM IST #

Where do I download the latest version (0.12) of ZFS Automatic Snapshots? Thanks.

Posted by Ben Le on July 27, 2009 at 11:24 PM IST #

I just stumbled upon this. Very cool!

I, too, take snapshots before potentially regrettable actions. Although I name my snapshots based on the action (rpool@before-action-name). Makes it super easy to locate a desired snapshot in the future (not just via 'zfs list -t snapshot' but also via Time Slider's Delete Snapshots dialog, which has a 'Snapshot Name' column but does not include descriptions).

Having grabbed the latest source along with your "flawed" (but works just fine from where I'm sitting) GUI, my RFE is that the GUI should have a field so that you can name the snapshot. This would be in addition to the description field, which I like because it gives me the opportunity to elaborate on the snapshot name.

Thanks for making this feature available!

Posted by Joanmarie Diggs on August 03, 2009 at 01:56 AM IST #

Thanks for your work. I'm using auto-snapshot but still missing some improvements:

Keeping only non-empty snapshots. I'm using OpenSolaris on file servers, and lots of empty snapshots make searching through them very hard (Open the first - no. Open the second - no... Oh no!). The clients run Windows and don't have Time Slider. For now I use a simple cron job to eradicate empty snapshots, except the last one. But it's useless for frequent snapshots - only one or two snapshots from the last hour are not empty.

Unified prefix. You are already using metadata, and this could avoid taking duplicate snapshots. Another reason - the samba shadow_copy vfs module needs a single prefix.

Create daily (weekly, monthly...) snapshots at an exact time. It's cool to do backups from these snapshots, but that task is resource-intensive and I'd prefer to schedule it for night time.

P.S. Sorry for my bad English.

Posted by Alexander Dorofeev on August 04, 2009 at 11:58 PM IST #

Glad you like it, Joanmarie! - hopefully that feature will make it into the next version.

Alexander: non-empty snapshots are complex, it's not something I feel comfortable writing without underlying zfs support ('snapshot diff' http://bugs.opensolaris.org/view_bug.do?bug_id=6425091 )

The unified prefix thing you're talking about I think is to avoid taking snapshots on overlapping periods, which is something that's being addressed in the current python rewrite.

As for the snapshot-at-specified-time, that's the 'zfs/offset' feature that's in the current code (but may not make it into the next version from what I've heard)

Posted by Tim Foster on August 05, 2009 at 01:20 PM IST #

No Tim. I don't mean snapshot diff. I mean the following: if an old snapshot is empty - kill it.

Posted by Alexander Dorofeev on August 05, 2009 at 09:51 PM IST #

What does "if old snapshot is empty" mean then?

Posted by Tim Foster on August 05, 2009 at 11:19 PM IST #

OK. I'll give you an example. Let's "autosnap" a new snapshot on test/one. All we need to do is create the new snapshot and eradicate the old ones with zero used values.

zfs list -rH -s creation -o name,used -t snapshot test/one :
...
test/one@test 16K
test/one@02 1.02M
test/one@03 16K
test/one@04 0
test/one@05 0
test/one@06 1.02M
test/one@07 0 <---- This is "Just created" snapshot

zfs list -rH -s creation -o name,used -t snapshot test/one | grep 0$:
test/one@04 0 <---- Kill!
test/one@05 0 <---- Kill!
test/one@07 0 <---- Keep

That's all. Sorry about tabs in "console".
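
Spelled out as a filter, the idea is roughly the following. This is only a sketch of the proposal: it assumes tab-separated "zfs list -H -s creation -o name,used -t snapshot" output for a single dataset on stdin (oldest first), and the name zero_used_except_last is made up. It compares the used column with 0 exactly, since "grep 0$" would also match any value that merely ends in 0. It only prints candidates and destroys nothing.

```shell
# Print snapshots whose "used" column is exactly 0, always skipping the
# final (most recently created) line of the input.
zero_used_except_last() {
    awk -F'\t' '
    # A line is only judged once a newer line has been seen,
    # so the last line is never printed.
    NR > 1 && prev_used == "0" { print prev_name }
    { prev_name = $1; prev_used = $2 }'
}

# Example:
# zfs list -H -s creation -o name,used -t snapshot test/one | zero_used_except_last
```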

Posted by Alexander Dorofeev on August 05, 2009 at 11:58 PM IST #

Aah, but that's not quite what the 0 means: space for snapshots means space unique to that snapshot, or shared with previous snapshots, or shared with the filesystem. If you delete older snapshots that shared that data, the space would then be accounted to that snapshot and the 0 would change: so deleting this snapshot might be the wrong thing to do if it happened to be the only snapshot referencing filesystem data that was just about to be deleted.
http://docs.sun.com/app/docs/doc/819-5461/gbciq?a=view#gbcxc

Time Slider tries to jump through hoops to remove snapshots in a gentle manner, but it's a complex problem and not one I'm going to add to the core service in this implementation. The current plan is to have time-slider replace the SUNWzfs-auto-snapshot implementation, so one way or another you'll get this eventually.

See http://src.opensolaris.org/source/xref/jds/time-slider/usr/share/time-slider/lib/time_slider/cleanupmanager.py#77

Posted by Tim Foster on August 06, 2009 at 09:21 AM IST #

OK. I understand. But you are talking about "parallel" snapshots (frequent, hourly, daily). This is the root of the problem. If we look at snapshots from the user's side, we see this: the user doesn't need to know what kind of snapshot it is (hourly, daily). They need just one thing - the snapshot with the lost data. And (maybe I'm wrong) 99% of users are not using GNOME (I'm talking about an OpenSolaris box as a headless NAS). They use Windows and Mac OS X, they haven't heard of Time Slider - and they see lots of garbage.

A possible workaround is ONE "timeline" with a unified prefix for each FS, produced by ONE process. If all "tags" like "hourly" are stored in user properties of the snapshot, all will be cool: users will get only the snapshots they need, and the service will purge empty snapshots safely.

Possible algorithm:
- The service starts to take a new snapshot.
- The service looks at previous snapshots and checks their user properties (hourly, daily...) to determine which events are due.
- If an event is due, the service takes the snapshot with the needed user properties.
- If the previous snapshot's used property is zero, the service destroys it.

About old snapshots: their deletion may be based on user properties. For example, say we need to keep 4 hourly and 2 daily snapshots and have the following snapshot sequence:
- 01 hourly
- 02 hourly daily
- 03 hourly
- 04 hourly
- 05 hourly <-- destroy
- 06 hourly <-- destroy
- 07 hourly daily
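
The retention example above can be sketched as a small filter. Assumptions: input is one snapshot per line, newest first, with the name followed by its tags; the function name retention_destroy_list and the input format are invented for illustration, and only the "hourly" and "daily" tags are handled.

```shell
# Print the snapshots that no retention rule wants to keep.
# $1 = number of hourly snapshots to keep, $2 = number of daily ones.
retention_destroy_list() {
    awk -v keep_hourly="$1" -v keep_daily="$2" '
    {
        keep = 0
        for (i = 2; i <= NF; i++) {
            seen[$i]++
            limit = 0
            if ($i == "hourly") limit = keep_hourly + 0
            if ($i == "daily")  limit = keep_daily + 0
            # A snapshot survives if any of its tags is still within quota.
            if (seen[$i] <= limit) keep = 1
        }
        if (!keep) print $1
    }'
}
```

Fed the seven lines above with limits of 4 and 2, it prints 05 and 06; 07 survives because it is still one of the two newest daily snapshots.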

Sorry for the very long comment. I could implement everything I wrote, but I really hate forks and parallel projects with the same functionality.

Posted by Alexander Dorofeev on August 06, 2009 at 11:33 AM IST #

Yes, the one timeline concept is what's currently being prototyped.

Posted by Tim Foster on August 06, 2009 at 11:52 AM IST #

Can I help you with this project?

Posted by Alexander Dorofeev on August 06, 2009 at 11:58 AM IST #

I'm not actively working on it - it's the time-slider guys actually doing the work: feel free to drop a mail to zfs-auto-snapshot[at]opensolaris.org though: afaik, the code is still in the prototype phase.

Posted by Tim Foster on August 06, 2009 at 12:01 PM IST #

Thanks Tim. I'll try to do this.

Posted by Alexander Dorofeev on August 06, 2009 at 12:24 PM IST #

Hello,

I'd like to thank you for your work, we're using an older version on two NAS servers of ours and it works really well.

However I'd like to install 0.12 manually on those because of the nice new feature 'set the default value of "zfs/sep" to "_"' which I had to hack into our current version myself. Unfortunately I cannot for the life of me find a .tar.gz download. The latest I can find are 0.10 and 0.11-EA.

Could you please tell me where I can obtain a .tar.gz download for version 0.12?

Posted by derfraenk on August 20, 2009 at 11:22 AM IST #

I've been trying to find somewhere to download the zfs-snapshot package from, and am having the darndest bad luck.
I found the {src,pkg}.opensolaris.org sites, but not enough information to actually grab something and then play with it.

Blame me for being a Solaris purist (and not playing with OpenSolaris)

Posted by Jean-Paul Blaquiere on August 22, 2009 at 01:27 PM IST #

@Jean-Paul: you can download sources using:

adumont$ hg clone ssh://anon@hg.opensolaris.org/hg/jds/zfs-snapshot

Then you can create the pkg files running make.

Nevertheless, when trying to install the package, I get:

adumont$ pfexec pkgadd -d .

The following packages are available:
1 SUNWzfs-auto-snapshot ZFS Automatic Snapshot Service
(all) 0.12
2 src ZFS Automatic Snapshot Service
(all) 0.12

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]: 1

Processing package instance <SUNWzfs-auto-snapshot> from </export/home/adumont/zfs-snapshot/proto>

ZFS Automatic Snapshot Service(all) 0.12
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.

Current administration requires that a unique instance of the
<SUNWzfs-auto-snapshot> package be created. However, the maximum
number of instances of the package which may be supported at one time
on the same system has already been met.

No changes were made to the system.

Do I have to remove the previous version of the package? Or is there another way to upgrade to 0.12?

Posted by Alexandre Dumont on August 23, 2009 at 08:32 PM IST #

@Alexandre

Note that I'm merely an end user with a background in liberal arts (read: I don't know what I'm doing). ;-) That said....

I did a 'man pkgadd' and discovered that pkgadd lets you specify an installation administration file. I took a look at the default one (/var/sadm/install/admin/default) and found the line 'instance=unique'. So I created an alternative file, copying default and changing the instance line to 'instance=overwrite'. Problem was solved. Might not have been solved "correctly," but that's what snapshots are for, right? :-)
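
For reference, that workaround amounts to something like this. A sketch only: make_overwrite_admin and the /tmp path are made up for the example; the default admin file location and the instance=overwrite setting are as described above, and -a is pkgadd's option for naming an admin file.

```shell
# Copy an admin file, switching the instance policy to overwrite.
# $1 = source admin file, $2 = file to write
make_overwrite_admin() {
    sed -e 's/^instance=.*/instance=overwrite/' "$1" > "$2"
}

# Example:
# make_overwrite_admin /var/sadm/install/admin/default /tmp/admin.overwrite
# pfexec pkgadd -a /tmp/admin.overwrite -d .
```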

Posted by Joanmarie Diggs on August 23, 2009 at 09:07 PM IST #

Alexandre,

thank you very much, that worked like a charm! Great! :)

Kind regards

Posted by derfraenk on August 24, 2009 at 01:58 PM IST #

Tim,

The hourly and frequent services were put into maintenance after the system time changed because of DST switching one hour backwards. I could restart them, but had to delete the hourly and frequent snapshots manually.

Sorry, if you already knew this. And by the way - thanks for this really great feature!

Sebastian

Posted by Sebastian Weyrauch on October 27, 2009 at 09:15 AM GMT #

This blog copyright 2009 by timf