ZFS Automatic Snapshots in nv_100
We got this code into nv_100, as part of LSARC 2008/571 and (at least inside Sun, so far) folks have been starting to play with it.
It's the first time I've been able to use the GNOME Nautilus integration that Erwann came up with, and I think it's pretty cool. Big ups to Niall & Erwann for all their hard work in helping to get this integrated - without them, this stuff would still just be kicking around on my blog!
We've had a few comments so far - most were known bugs that have already been fixed. I'll list them here, and add comments as we go along.
Services enabled by default
SUNWzfs-auto-snapshot delivers all its instances as disabled, but the accompanying desktop support, SUNWgnome-time-slider (the desktop service that uses SUNWzfs-auto-snapshot, integrates more tightly with the desktop and monitors disk space), had a postrun script that enables the services out of the box. Just run svcadm disable <service> to disable them if you want to, but see below for more ideas if you just don't want to snapshot everything...
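A rough sketch of that step follows. The frequent, hourly and daily instance names appear later in this post; weekly and monthly are an assumption based on the schedules the service is described as running. The SVCADM variable and the function name are purely illustrative and not part of the package.

```shell
# Hypothetical helper: disable every auto-snapshot instance.
# The list of instance names is an assumption (see the note above).
SVCADM=${SVCADM:-svcadm}

disable_auto_snapshots() {
    for inst in frequent hourly daily weekly monthly; do
        # Needs the appropriate SMF authorisation (e.g. run via pfexec).
        $SVCADM disable "auto-snapshot:$inst" || return 1
    done
}

# Usage: disable_auto_snapshots
```

Setting SVCADM=echo first prints the commands instead of running them, which is a cheap way to check what would happen.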
Noisy cron job
There were some changes close to integration that made the home directory for the 'zfssnap' role go away, which had an impact on the way we were planning on doing logging. Originally, the cron job would just echo messages onto the end of the SMF instance's log file in /var/svc/log, but since the cron job now runs as a non-root user, we aren't able to write to those anymore.
So we changed it to write logs to the zfssnap user directory, but that wasn't good either, so we eventually moved all logging for the cron job to syslog. A small bug, though, meant that the service is still a bit too noisy, and so cron ends up sending love letters in the form of svcprop errors to /var/mail/zfssnap - sorry about that. This was actually fixed pre-nv_100, but it just missed the integration date.
Details here on how to grab the sources and build your own version of the SUNWzfs-auto-snapshot package if you want the fix sooner rather than later.
Service inexplicably dropping to maintenance mode
This is probably the most common failure - I'd filed 6749498 about this, which turned out to be a duplicate of 6462803. I say "inexplicably", but /var/adm/messages will actually have more detail: I don't have a way to explain to SMF why we're dropping the service to maintenance mode, so you just need to look for the log in the right place. Logging during service start/stop gets picked up by SMF; day-to-day log messages (and there aren't many of those) get handled by syslog, as noted above.
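A minimal sketch of where to look when an instance lands in maintenance. svcs -x and svcadm clear are standard SMF commands; the grep pattern is only a guess at what the service's syslog entries contain and may need adjusting, and the function name is made up for this example.

```shell
# Hypothetical helper: pull likely auto-snapshot entries out of a syslog file.
# The pattern "auto-snap" is an assumption about the message text.
snapshot_log_entries() {
    logfile=${1:-/var/adm/messages}
    grep -i 'auto-snap' "$logfile"
}

# Usage:
#   svcs -x                                      # explain services in maintenance
#   snapshot_log_entries                         # day-to-day messages from syslog
#   pfexec svcadm clear auto-snapshot:frequent   # once the cause is fixed
```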
Otherwise, a few other words of advice:
The service on startup will arrange to take automatic snapshots of all datasets on all pools on the system. You can have it not do this by setting a ZFS user property at the top level dataset in each pool, eg.
$ pfexec zfs set com.sun:auto-snapshot=false rpool
This is much better than disabling the service altogether, because you keep the option of having the service take snapshots of just the datasets you are interested in, eg.
$ zfs set com.sun:auto-snapshot=false space
$ zfs set com.sun:auto-snapshot=true space/timf
$ zfs set com.sun:auto-snapshot=false space/timf/foo
$ zfs set com.sun:auto-snapshot:frequent=false space/timf/onnv
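To check the result of settings like these, zfs get -r reports the property's value for everything under a dataset, along with whether it was set locally or inherited. A sketch using the same example pool name; the ZFS variable and the function name are only there for illustration.

```shell
ZFS=${ZFS:-zfs}

# Show com.sun:auto-snapshot for a dataset and all of its descendants,
# including the source of each value (local, inherited, or unset).
show_auto_snapshot() {
    $ZFS get -r -o name,property,value,source com.sun:auto-snapshot "$1"
}

# Usage: show_auto_snapshot space
```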
Better yet, if you use ZFS Delegation to allow users the userprop permission, they can set user properties on their own filesystems, and choose which of their filesystems get included in the snapshot schedule as above.
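A sketch of that delegation, assuming a user called timf who should control space/timf (both names are borrowed from the example above; substitute your own). The function name and the ZFS variable are hypothetical.

```shell
ZFS=${ZFS:-zfs}

# Let a user set user properties (such as com.sun:auto-snapshot)
# on a dataset and its descendants.
delegate_userprop() {
    user=$1
    dataset=$2
    $ZFS allow "$user" userprop "$dataset"
}

# Usage (as a suitably privileged user):
#   delegate_userprop timf space/timf
# after which timf can run, for example:
#   zfs set com.sun:auto-snapshot=false space/timf/foo
```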
Have a look at the hg history, the README for more documentation, and the auto-snapshot.xml service manifest if you're really interested in what's going on behind the scenes. Enjoy!
Tim,
You've gone into gruesome detail about ZFS snapshot operations without actually answering the high level questions about it?
What does it do?
Who can use it?
How does it work?
What are the benefits and trade-offs?
That's what I want to know.
Posted by Jim Laurent on October 11, 2008 at 01:05 PM IST #
Fantastic news Tim! Nautilus integration should enhance the desktop greatly - something that differentiates OpenSolaris from other platforms...
Posted by Che Kristo on October 11, 2008 at 03:29 PM IST #
Superb !!! I think I need to explore this feature of ZFS snapshot. A lot of work can be saved like this.....Keep up the good work. I think ZFS has simply revolutionised the way we handle our servers....Long live Sun !!
Posted by Ashish Nabira on October 11, 2008 at 05:18 PM IST #
Good point Jim. I think the desktop guys are going to cover that at some point, but I'll take a stab at answering your questions:
This feature takes periodic snapshots of ZFS filesystems on Solaris machines - this allows you to always visit older versions of your data at any time. Snapshots are a very efficient way of doing this - more at Matt's blog post here:
http://blogs.sun.com/ahrens/entry/is_it_magic
Anyone can use it, seemingly - it's on by default right now. As a user, just click the "Restore" icon on your file manager (the icon is currently a white and red life-belt) then drag the slider that appears in order to see older versions of files in the current window.
It works using ZFS' snapshot feature, cron and some simple scripting to take snapshots of the on-disk state, every 15 minutes, every hour, every day, every week and every month, maintaining different numbers of these going back into the past.
Benefits: you never lose files you deleted by accident within certain time windows, and can see older copies of everything on disk. This has saved my bacon on more than one occasion. It's always on, so I never worry about backups anymore (other than occasionally sending data off the physical machine I'm using).
Trade-offs: it can use more disk space, depending on your usage, but there's a background job that will automatically try to delete snapshots taken if your disk space goes above a certain threshold.
That help at all?
Posted by Tim Foster on October 11, 2008 at 05:54 PM IST #
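The "maintaining different numbers of these" part of that answer can be sketched roughly as follows. This is not the service's actual code: it assumes snapshot names for a single dataset and schedule sort chronologically (which holds for date-stamped names like the ones quoted further down this thread), and the number to keep is supplied by the caller rather than read from the service's configuration.

```shell
# Hypothetical sketch: given snapshot names on stdin (one per line, all from
# the same dataset and schedule) and a number to keep, print the oldest
# snapshots that would need destroying.
snapshots_to_prune() {
    keep=$1
    LC_ALL=C sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END {
            # Everything except the newest "keep" entries is surplus.
            for (i = 1; i <= NR - keep; i++) print names[i]
        }'
}

# Usage (the keep count of 4 is illustrative only):
#   zfs list -H -t snapshot -o name | grep '^space/timf@zfs-auto-snap:frequent' \
#       | snapshots_to_prune 4
```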
Screenshots!!! You need screenshots!
Posted by D Pierce on October 11, 2008 at 06:01 PM IST #
Good point D - I'll let Erwann take the glory for that one and link to his post as soon as it's available :-)
Posted by Tim Foster on October 11, 2008 at 06:04 PM IST #
I forgot to mention - all that "pfexec zfs ..." stuff is built into Niall's excellent "time-slider-setup" GUI, so no need to touch the command line at all. He writes much better GUIs than I do, just thought I'd point that out!
Posted by Tim Foster on October 11, 2008 at 08:31 PM IST #
I CAN HAZ SCREENSHOTS:
http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs
Posted by Tim Foster on October 14, 2008 at 10:34 AM IST #
Pretty cool, guys.
I, uh, demoed something essentially identical in May (flex web gui instead of Gnome); patent disclosure was a year ago, and I also proposed it to the Monterey thing.
Posted by Charlie (Colorado) on October 15, 2008 at 03:16 AM IST #
Nice feature, but why is it started even if no ZFS pools reside on the system? Disabling it is the only way on non-ZFS installations. Why are ZFS snapshots turned on by default on new installations even if no ZFS pools exist?
Posted by Rolf M Dietze on November 06, 2008 at 08:15 PM GMT #
Yep, as I mentioned it's a bug that the service is enabled by default - time-slider had a postinstall script that reenabled the service, even though it was delivered as disabled.
There's another bug to make enabling the service on a system with no pools a no-op.
Posted by Tim Foster on November 06, 2008 at 08:48 PM GMT #
Hi,
I am using SUNWzfs-auto-snapshot with NV100. Everything works fine. The package is just great.
However, one question remains: How can I change the way the name of a snapshot is generated? Currently, the snapshot name contains colons (":"). Having colons in the snapshot name prevents Windows users from accessing the right snapshot under the ".zfs" directory themselves. The colon is a forbidden character in Windows and is used to mark Alternate Data Streams (ADSs). Windows users cannot see the right snapshot names in the ".zfs" folder. So I'd like to ask for the provision of a property that allows me to control the way the snapshot names are generated.
Thanks for the great work so far!
Posted by John on November 10, 2008 at 09:03 PM GMT #
No way to do that at the moment John, except that there is a variable used through the method script, $SEP which you can change to whatever you like. Check out line 62ish of
/lib/svc/method/zfs-auto-snapshot
http://src.opensolaris.org/source/xref/jds/zfs-snapshot/src/lib/svc/method/zfs-auto-snapshot#62
I'll try to get an SMF property into the service for the next release.
Posted by Tim Foster on November 10, 2008 at 09:22 PM GMT #
Man, you are awesome!
Thank you so much for that immediate help.
Best regards!
Posted by john on November 11, 2008 at 12:33 AM GMT #
is it(zfs-auto-snapshot) available with new solaris_x86 10/08 U6 ?
ps
otherwise looks great, thx guys for great work
vlaho
Posted by VLAHO DJURKOVIC on November 20, 2008 at 08:30 PM GMT #
Hi Tim, Hi Vlaho,
that's an interesting point. Owing to special policies in our companies we cannot use OpenSolaris currently. We have to rely on the official services provided by Sun. Thus, we have to use Solaris 10/08 instead of OpenSolaris 2008.05 (2008.11).
However, the zfs-auto-snapshot feature has proven extremely useful in our testbed. Is there any chance that the zfs-auto-snapshot OpenSolaris IPS package will be available as pkg for Solaris 10/08?
Best regards,
John
Posted by John on November 28, 2008 at 09:55 PM GMT #
Well, there is official Sun support available for 2008.05 and 2008.11 - more information at
http://www.sun.com/service/opensolaris
That said, I don't think there's anything in the service that would prevent it from running on recent s10 updates - it's more a question of prioritising this over other work for the updates.
Posted by Tim Foster on November 28, 2008 at 10:58 PM GMT #
Dear Tim,
thanks for the hint! Unfortunately, our company still considers Solaris as more "trust-worthy/stable/reliable/you-name-it" than OpenSolaris. Although Sun supports OpenSolaris, I am limited to Sun 10/08.
Currently, I am experimenting with the following solution:
1st: I downloaded "pkg" for Solaris:
wget http://download.java.net/updatecenter2/promoted/B15-RC4/pkg-toolkit-2.0.0-sunos-i386.zip
2nd: I installed the package:
unzip ...
mv .org.opensolaris,pkg /.
mv pkg /.
3rd: I changed the package sources:
vi /.org.opensolaris,pkg/catalog/sun.com/attrs
change "origin" to read
S origin: http://pkg.opensolaris.org/
and also change
vi /.org.opensolaris,pkg/cfg_cache
"origin" to read
origin = http://pkg.opensolaris.org/
4th: I installed SUNWzfs-auto-snapshot
/pkg/bin/pkg install SUNWzfs-auto-snapshot
5th: I changed the RBAC file
vi /etc/user_attr
and added
zfssnap::::type=role;auths=solaris.smf.manage. \
zfs-auto-snapshot;profiles=ZFS File System Management
as you proposed in your blog.
6th: We have M$ Windoze clients. So I had to change the hour and minute separator
vi /lib/svc/method/zfs-auto-snapshot
SEP="-"
7th: I added the service to the system:
svcadm restart manifest-import
8th: I started the services that we currently need here:
svcadm enable auto-snapshot\:frequent
svcadm enable auto-snapshot\:hourly
svcadm enable auto-snapshot\:daily
9th: Testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing, testing ...
10th: So far, everything seems to be perfect on the test machine. However, it'll be some more weeks/months until I can release the stuff into our "real systems."
Once more, I'd like to thank Tim for this great piece of software!
Best regards,
John
Posted by John on December 04, 2008 at 03:28 PM GMT #
Tim,
I am trying to have zfs-autosnap send incremental/full snapshots via SSH. I am running 2008.11, which has this service installed already; however, since the zfssnap home directory doesn't exist (nor is it defined in /etc/passwd), it seems that's a roadblock for SSH!
What ways can we get around needing a home directory, with a .ssh directory that contains known_hosts, and optionally your authorized_keys2 file for passwordless logins?
Besides creating a home directory which you already went through great pains to get away from, how can we work around this?
Any insight would be greatly appreciated!
Posted by Brent Jones on December 18, 2008 at 06:18 AM GMT #
Hi Brent, it looks like ssh takes a -i option to specify identity files, so that'd likely work on the client (sending) side to point to the right set of keys.
On the server side, you can't use the zfssnap role to receive the content (since it's a role, it has no ssh access) so you'll need a normal user with privileges to recv to a given dataset: "zfs allow" is probably the best way of doing that, rather than creating another user with the full ZFS File System Management profile.
Sorry, haven't had a chance to try any of this, been a bit preoccupied of late! (see recent blog entries :-)
Posted by Tim Foster on December 19, 2008 at 11:40 AM GMT #
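Tim had not tried this, and the sketch below has not been run against a real pool either, but the two halves might look something like this. The user name, host, key path and dataset names are all placeholders, and the permission list passed to zfs allow is an assumption about what an unprivileged receive needs. The function only prints the sending pipeline so that it can be reviewed before being run.

```shell
# Hypothetical sketch: print (rather than run) an incremental send over ssh,
# using an explicit identity file since the zfssnap role has no ~/.ssh.
print_incremental_send() {
    key=$1
    from=$2
    to=$3
    remote=$4
    target=$5
    printf 'zfs send -i %s %s | ssh -i %s %s zfs receive %s\n' \
        "$from" "$to" "$key" "$remote" "$target"
}

# Receiving side, run once by an administrator (placeholder names):
#   zfs allow backup create,mount,receive tank/backups
#
# Usage:
#   print_incremental_send /path/to/key space/timf@snap1 space/timf@snap2 \
#       backup@server tank/backups/timf
```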
Tim,
Thanks for responding. I did end up making a home directory for zfssnap, but I didn't know about the ssh -i flag. I'll definitely make use of that, thank you!
I've got incremental snapshots sending successfully to the other server, just need to find a way to gracefully clean up the old ones.
Wish you could have essentially a mirrored filesystem, so you could mirror all of the snapshots (frequent, hourly, daily, etc...) but it seems ZFS receive only allows one "set" of snapshots... sad ;(
Thanks!
Posted by Brent Jones on December 19, 2008 at 11:35 PM GMT #
Do you accept patches for your script ? AFAIK the performance can be improved a lot by using builtin string/pattern operators, list, compound variables etc.
Posted by Roland Mainz on January 04, 2009 at 06:08 PM GMT #
Most definitely yes Roland - thanks! There's an opensolaris.org mailing list and hg repository :-)
Posted by Tim Foster on January 04, 2009 at 09:03 PM GMT #
I'm running OpenSolaris 2008.10 snv_101b_rc2. I've started getting this error:
$ sudo /usr/lib/time-slider-cleanup -y
File "/usr/lib/time-slider-cleanup", line 10, in <module>
main(abspath(__file__))
File "/usr/lib/../share/time-slider/lib/time_slider/cleanupmanager.py", line 363, in main
cleanup.send_notification()
File "/usr/lib/../share/time-slider/lib/time_slider/cleanupmanager.py", line 259, in send_notification
if linedetails[1]:
IndexError: list index out of range
I've tried starting & restarting the services, but that doesn't help.
Posted by Tom on February 09, 2009 at 02:16 PM GMT #
Hi Tom,
I've seen that bug once before. IIRC it was because the time slider cleanup code is trying to send a GUI notification but either there isn't a user logged in on the desktop, or the user doesn't have the correct permissions (and there's not enough error handling in the code). This is outside my code unfortunately, so I don't know what the outcome was.
I'd suggest you log a bug on http://defect.opensolaris.org and let the Time Slider gui folks take a look at it.
Posted by Tim Foster on February 09, 2009 at 02:24 PM GMT #
I guess I'll have to put in a bug. Thanks for pointing me there.
Who logs into the file server anyways?
IMO, there's been a bit too much of 'click on this' to do sys admin coming in Solaris. I'm logging into the machine via SSH and I don't have the Gnome menu to click on. I run /usr/sbin/blah (which you can click on by ....). This ain't Winders :-)
Posted by Tom on February 09, 2009 at 05:27 PM GMT #
Tim, if you are using the backup save command option, how does the target command know which file system is being backed up? I have looked at the environment variables passed to the target along with the SMF properties and did not see anything.
Ideally, I want to know exactly which snapshot is being sent so that I can save it (after encrypting it) with a matching name so that I know which encrypted snapshots are which.
Any ideas?
Glenn
Posted by Glenn Brunette on April 24, 2009 at 03:54 PM IST #
Yep, I think a backup command that does eg. "ssh timf@machine MY_BACKUP=$FILESYS /home/timf/bin/do_my_backup" would do it?
[ certainly "ssh localhost FOO=BAR env" appears to set $FOO on the "remote" end as expected ]
Posted by Tim Foster on April 24, 2009 at 04:09 PM IST #
Yes, but where does $FILESYS get set? Since you are doing a pipe to the backup command (spawning a sub-shell) and you did not export the FILESYS variable, the FILESYS parameter is not being set (based upon my tests). Can you confirm or suggest a fix?
Posted by Glenn Brunette on April 24, 2009 at 04:48 PM IST #
Och, you're absolutely right. Feel free to log a bug against solaris/zfs/utility, we should really fix this - hopefully something we'll get right in the python reimplementation whenever that happens.
Posted by Tim Foster on April 24, 2009 at 05:22 PM IST #
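The behaviour Glenn describes is ordinary shell semantics: a variable that has not been exported is invisible to child processes. A small demonstration, using a stand-in child command rather than the service's real backup hook (the function name and the dataset value are illustrative):

```shell
# Stand-in for a backup command: report what it sees in its environment.
child_sees() {
    sh -c 'printf "%s\n" "${FILESYS:-<unset>}"'
}

unset FILESYS
FILESYS=space/timf       # plain shell variable, not exported
not_exported=$(child_sees)

export FILESYS           # now part of the environment of child processes
exported=$(child_sees)

echo "without export: $not_exported"
echo "with export:    $exported"
```

The first line reports <unset> and the second reports space/timf, which is why a backup command spawned by the method script cannot see FILESYS unless the script exports it or passes it explicitly.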
Quick question - I have some zfs directories shared via CIFS, but colon ':' is not a valid CIFS character, so a directory name such as "/share/.zfs/snapshot/zfs-auto-snap\:frequent-2009-05-18-22\:45\:02" is not valid, and instead gets translated to garbled gook, "ZDXBDC~J"
Is there some way to configure the way snapshot names get generated, to eliminate the ':' characters?
Thanks for your awesomely helpful tools. :-)
Posted by Edward Ned Harvey on May 19, 2009 at 04:11 AM IST #
Hi Edward,
Yep - this is actually becoming an oft-requested feature, I think we need to add an SMF property to allow users to set it.
In the meantime, you can edit the method script at
/lib/svc/method/zfs-auto-snapshot
and change the line that reads:
SEP=":"
(it's line 65 on my version, but I may be running development bits here)
and change that to any other character that's allowed in snapshot names. Note that if you change this, the pattern used to destroy older snapshots also changes, so just make sure you manually delete any snapshots taken by the service with the colon separator character, otherwise, they'll live forever.
Glad you're finding the snapshot service useful!
Posted by Tim Foster on May 19, 2009 at 10:28 AM IST #
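For the manual clean-up, something along these lines could list the snapshots left behind under the old separator. It is a sketch only: it assumes the service's snapshots are named zfs-auto-snap followed by the separator (as in Edward's example), the function name and ZFS variable are hypothetical, and it prints destroy commands for review rather than running them.

```shell
ZFS=${ZFS:-zfs}

# Print a "zfs destroy" line for every snapshot taken with the old separator.
# Review the output carefully before running any of it.
old_separator_snapshots() {
    old_sep=${1:-:}
    $ZFS list -H -t snapshot -o name |
        grep -F "@zfs-auto-snap${old_sep}" |
        sed 's/^/zfs destroy /'
}

# Usage: old_separator_snapshots ':'
```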