Sunday February 25, 2007
Here's a small update to the ZFS Automatic Snapshot SMF service first mentioned here. There are a few bugfixes in this release, and a small feature addition.
The first bugfix is the one that Dick kindly pointed out - sorry about that, all working now.
The second was a bit more subtle. Every once in a while, when starting my machine, I noticed that although the snapshot service instances were all listed as online, with no errors in their logs, nothing was being snapshotted - the entries that should have been added to the system crontab on starting the service weren't there.
It turns out that this was because I didn't have any form of locking around my access to the crontabs, so the service instances were stepping on each other's toes: in this case, SMF was being too damned efficient and was starting all my instances in parallel (as it should! :-). The locking I'm using now probably isn't the best, but let me know what you think. It seems to work fine.
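To illustrate the general idea (this is not the code from the method script), here is a minimal sketch of serialising a read-modify-write on a crontab-like file. It assumes a lock based on `mkdir`, which either creates the directory or fails, so only one instance gets through at a time. The lock path, the retry count and the function names are all made up for the example, and it works on an ordinary file rather than the real crontab.

```shell
#!/bin/sh
# Hypothetical lock location, chosen for this example only.
LOCK_DIR="${LOCK_DIR:-/tmp/zfs-auto-snapshot-demo.lock}"

# Take the lock, retrying once a second. mkdir fails if the
# directory already exists, so only one caller can succeed.
acquire_lock() {
    tries=0
    until mkdir "$LOCK_DIR" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            return 1    # give up rather than wait forever
        fi
        sleep 1
    done
    return 0
}

release_lock() {
    rmdir "$LOCK_DIR"
}

# Add one entry to an existing crontab-like file while holding the lock:
# save the existing entries, append the new one, then swap the file in.
add_entry() {
    file=$1
    entry=$2
    acquire_lock || return 1
    tmp="$file.new.$$"
    cat "$file" > "$tmp" && echo "$entry" >> "$tmp" && mv "$tmp" "$file"
    status=$?
    release_lock
    return $status
}
```

Calling `add_entry /path/to/file "0 * * * * /bin/true"` from several instances at once should then leave every entry in place, because each caller waits for the lock before reading the existing entries.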
I've also added a small feature - the addition of a "verbose" property for the instances, with the default setting being false. I have a recursive, rolling snapshot every 10 minutes on one of my filesystems, and I was getting tired of my system logs getting spammed all the time by my instance deleting older snapshots, so that's made life a bit nicer.
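As a rough sketch of how a flag like this can gate log output (not the actual method script): the helper name `log_verbose` is invented for the example, and the `zfs/verbose` property name in the comment is an assumption about how the value would be read from the instance.

```shell
#!/bin/sh
# In a real method script the value would come from the instance,
# for example (property name assumed, not taken from the manifest):
#   VERBOSE=$(svcprop -p zfs/verbose "$SMF_FMRI")
VERBOSE="${VERBOSE:-false}"

# Print the message only when verbose logging is switched on.
log_verbose() {
    if [ "$VERBOSE" = "true" ]; then
        echo "$@"
    fi
}
```

With the default of false, routine messages such as those about deleting older snapshots stay out of the logs, while anything that should always be reported can still go through a plain `echo`.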
As always, let me know if you've any feedback - I'm always interested in ways I can improve this service.
The new version is now available in zfs-auto-snapshot-0.8.tar.gz and I've updated the README with these changes.
(2007-02-25 11:44:09.0) Permalink Comments [18]
Posted by Frederic Van De Velde on February 27, 2007 at 12:46 PM GMT #
Thanks Frederic, that's an interesting addition. I had originally intended the "offset" property to allow the user to specify an offset into each period when we'd actually run the snapshot job. (eg. snapshot time==period start + n seconds) This turned out to be a real pain to try to code though, so I haven't implemented it yet.
My only worry with specifying a property like "cronprefix" is that it exposes the implementation details of the service more than I'd have liked -- if we move to a new snazzy cron-like backend, then apart from me delivering a new implementation of the service method script, users would also have to change their service manifests, which would be a real pain for them.
Can you think of a way to abstract your aim without creating a dependency on the way the method script is actually implemented?
Posted by Tim Foster on February 27, 2007 at 01:05 PM GMT #
Posted by Matthew Painter on March 22, 2007 at 09:16 AM GMT #
Glad you're finding it useful - sorry that it's not working as reliably as it should for you. Logging is still a problem for this service, it seems - there's just not enough info in the log to work out what's going on.
Are there any more clues in /var/spool/mail/root as to why the backups aren't completing? (cron should save its output there by default.) I might have another go at fixing the logging for this service, to see if I can capture stdout and stderr from the backup command and send that to the logs, rather than having cron take care of it.
Let me know if that helps, and feel free to email me off line if you like.
Posted by Tim Foster on March 22, 2007 at 11:48 AM GMT #
I think it's worthwhile to keep the remote zpool snapshots in sync. Just curious, and thanks again for putting your scripts out there.
Posted by Ben Paultre on May 16, 2007 at 08:52 PM IST #
Posted by Tim Foster on May 21, 2007 at 11:06 AM IST #
I see your script checks if it's on Nevada (5.11) to make recursive snapshots. However Solaris 10u3 (11/06) has this feature, so I simply removed those checks from your scripts in my local installation.
Also, the zfs-auto-snap prefix for snapshots was a bit long for me, so I removed it entirely, relying on the LABEL instead, since I hardly have any non-automated snapshots. Actually I just made the prefix a variable and set it to empty. It might be nice to be able to customize this prefix.
Posted by Siegfried Leonard on August 14, 2007 at 09:15 PM IST #
Thank you for the work on your script! It is really nice.
I made another change to your script and that is using period separators for the time in the snapshot name instead of colons. Windows users connecting through samba had trouble accessing the directories that had colon in the name because samba would mangle it.
I made a patch to your script containing my modifications if you are interested. Once again, great work on this utility!
Posted by Siegfried Leonard on August 17, 2007 at 06:45 AM IST #
That makes sense Siegfried, thanks for pointing it out! Having Samba deal with colons properly would seem like the proper solution, but I'll have a look and see if I can include that in the next version. Till then - users beware :-)
Posted by Tim Foster on August 21, 2007 at 12:11 PM IST #
I have also found that root's crontab was deleted while this utility was running, removing all my custom cron entries. I will have to investigate why this happened. It could be due to a reset while it was writing the crontab file, but that should be an atomic replacement.
While working on this, I'll attempt to get around the locking mechanism using a better way that will not spit out dozens of SMF error lines when booting the system with several auto snapshot entries. Other than that, this utility has been very useful and working well so far.
Posted by Siegfried Leonard on September 02, 2007 at 10:02 AM IST #
Interesting. The cron update isn't atomic, in that I have to save existing entries before adding new ones, but it should never be removing existing entries. In any case, line 130 is what you're after - I'd be interested to find out if you can improve this!
Can you expand on what you're seeing when booting with several auto-snapshot entries? I've 4 auto-snapshot entries, and all's well.
Posted by Tim Foster on September 03, 2007 at 11:21 AM IST #
I have 10 entries, and I see messages like this per entry when booting:
svc.startd[7]: [ID 636263 daemon.warning] svc:/system/filesystem/zfs/auto-snapshot:tank,daily: Method "/lib/svc/method/zfs-auto-snapshot start" failed due to signal KILL.
Posted by Siegfried Leonard on September 04, 2007 at 03:19 AM IST #
Hmm, haven't seen that here Siegfried, but I'll try to reproduce.
Version 0.9 will have the ability to snapshot -r on capable s10 systems, will allow users to change the prefix in the instance manifest as well as separator character (probably only globally in the method script - I don't imagine this would be a common requirement) and it'll have support for send/recv -r as well.
As soon as send/recv -r gets integrated into nevada, and I get a chance to write and test the changes to this service, I'll post the new version.
Feel free to contact me offline if you can shed any more light on those weird SMF startup warnings!
Posted by Tim Foster on September 07, 2007 at 07:22 AM IST #
Thanks, Mr. Foster, I'll be looking forward to version 0.9.
I haven't gotten around to debugging the SMF startup warnings, but I am thinking of adding a property to control whether the snapshot entry should take a snapshot if a scrub is in progress.
This would be useful for the hourly snapshot, since the scrub doesn't have time to complete before a snapshot is taken, making it restart the scrub from the beginning. My pool was in an infinite scrub loop until I temporarily disabled the hourly snapshot entry.
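A check along these lines might look like the sketch below. It is not part of the service. It assumes that `zpool status` reports a running scrub with the text "scrub in progress", and the function names and the placeholder snapshot step are made up for illustration.

```shell
#!/bin/sh
# Reads `zpool status` output on stdin and succeeds if it
# reports a running scrub (the matched text is an assumption).
scrub_in_progress() {
    grep "scrub in progress" >/dev/null
}

# Hypothetical wrapper: skip the snapshot while the pool is scrubbing.
maybe_snapshot() {
    pool=$1
    if zpool status "$pool" | scrub_in_progress; then
        echo "scrub in progress on $pool, skipping snapshot"
        return 0
    fi
    # Placeholder for the real snapshot command.
    echo "would snapshot $pool here"
}
```

Keeping the text match in its own function means it can be tried against saved `zpool status` output without touching a real pool.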
Posted by Siegfried Leonard on September 09, 2007 at 05:20 AM IST #
Still not sure why root's crontab gets emptied, but it happens almost every boot.
Your code makes sense, so I'm not sure what could cause it. Maybe the /tmp volume is not mounted by the time the script is called?
Posted by Siegfried Leonard on October 15, 2007 at 08:17 AM IST #
Hmm, that's still very odd Siegfried. Can you mail me offline, and we'll try to get this sorted out - I'm very interested in working out what's going on here...
Posted by Tim Foster on October 19, 2007 at 08:59 AM IST #
Updated packages here:
http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people
Posted by Tim Foster on November 20, 2007 at 01:54 PM GMT #
Note for future visitors - this version is now out of date - the latest version is always available via a link on the sidebar of my blog (at the time of writing, this is version 0.10)
Posted by Tim Foster on June 15, 2008 at 01:15 PM IST #