I've been thinking rather a lot about ZFS snapshots recently, and have a few ideas that I thought people would be interested in. I'll mention one now (the rest will have to wait a bit - watch this space).

On zfs-discuss some folks were thinking that a mechanism to take snapshots automatically based on a schedule would be a good idea. I agreed, and suggested that this would be really nice with integration into SMF and I posted some thoughts as to how that could work.

Well, I've now got a working prototype. Based on SMF, it works by creating a cron job that takes periodic snapshots of the filesystem you specify.

Rather than a single default instance of the service, I was thinking that you should have multiple instances, one per set of automatic snapshots you want to take. I've also got support for creating recursive snapshots - though, again, getting a -r flag for zfs snapshot would make that support a bit nicer (and atomic!)

I haven't yet implemented the "rolling snapshot" functionality, which would keep only the x most recent snapshots. For that, I'm waiting on the new -s flag for zfs list, which would allow me to sort by snapshot creation date and remove the oldest (tail -1 is your friend!). I'm also a bit limited by what cron can do at the moment, but I reckon what I've got is good enough for starters.
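Until that lands, here is a minimal sketch of how the pruning step could work, assuming all the names come from a single filesystem and end in a zfs-auto-snap-YYYY-MM-DD-HH:MM:SS timestamp, so that a plain lexical sort is also a chronological one. The function name snapshots_to_prune is made up for illustration and isn't part of the prototype; it only prints the candidates, it doesn't destroy anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of the prototype): read snapshot names on
# stdin, one per line, and print those that fall outside the newest KEEP.
# Assumption: names carry a YYYY-MM-DD-HH:MM:SS suffix and belong to one
# filesystem, so sorting the names also sorts them by creation time.
snapshots_to_prune() {
    keep="$1"
    # Oldest first, then print everything except the last $keep lines.
    LC_ALL=C sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}
```

You could feed it from something like zfs list -H -t snapshot -o name, filtered down to one filesystem's zfs-auto-snap entries, and review the output before passing any of it to zfs destroy.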

What does it look like? Well, here's a "screenshot":

# svcs | grep zfs
online         18:36:11 svc:/system/filesystem/zfs/auto-snapshot:space-timf
# svcs -l svc:/system/filesystem/zfs/auto-snapshot:space-timf
fmri         svc:/system/filesystem/zfs/auto-snapshot:space-timf
name         ZFS automatic snapshots
enabled      true
state        online
next_state   none
state_time   Wed May 10 18:36:11 2006
logfile      /var/svc/log/system-filesystem-zfs-auto-snapshot:space-timf.log
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   require_all/none svc:/system/cron (online)

Okay, not that exciting to look at. To make life easier, I also wrote a simple admin GUI, which asks you the right questions and constructs the instance manifest for you. It does need a pretty recent version of zenity (thanks Glynn!) to work, but that's included in Solaris, so you should be okay. Here's what that looks like:

[Screenshot: Tim's ZFS auto-snapshot admin script]

Oh, if you'd like to give this a whirl, get a copy of this tarball and do the following:

# cp zfs-auto-snapshot/lib/svc/method/zfs-auto-snapshot /lib/svc/method
# svccfg import zfs-auto-snapshot/zfs-auto-snapshot.xml
#  [ now create an instance, using my GUI, or your text editor of choice ]
# svccfg import my-auto-snapshot-instance.xml
# svcadm enable svc:/system/filesystem/zfs/auto-snapshot:tank-foo

If all goes well, you should see a new entry in your crontab (check using crontab -l) and you'll start getting regular snapshots. And since this is SMF, you can disable with svcadm disable svc:/system/filesystem/zfs/auto-snapshot:tank-foo. I've tested this on snv_35, and it seems to be alright, but let me know if you encounter anything weird.

Now, there's more work to do: in particular, the error handling here isn't stellar (I'd like the service to degrade if we weren't able to take a snapshot for some reason), I also need to implement the rolling snapshot functionality, and I should probably be a bit more sensitive wrt. security roles and profiles. Still, for a first attempt, I think this is cool.

Here's me showing all the snapshots I have, both automatic and manual, of one of my filesystems:

# zfs list -r space/timf
NAME                                           USED  AVAIL  REFER  MOUNTPOINT
space/timf                                    1.28G  24.9G  1.28G  /space/timf
space/timf@backup                             1.66M      -   458M  -
space/timf@more-recent                         114K      -   989M  -
space/timf@something_else                     87.5K      -  1003M  -
space/timf@zfs-auto-snap-2006-05-10-19:00:00      0      -  1.28G  -
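The automatic snapshot at the bottom is named after the time it was taken. As a rough sketch of how such a name could be put together (auto_snapshot_name is a hypothetical helper for illustration, and I'm not claiming the method script builds its names exactly this way):

```shell
#!/bin/sh
# Hypothetical helper: build a snapshot name in the style shown in the
# listing above, i.e. <filesystem>@zfs-auto-snap-YYYY-MM-DD-HH:MM:SS.
auto_snapshot_name() {
    dataset="$1"
    # An optional second argument supplies a fixed timestamp; otherwise
    # the current local time is used.
    stamp="${2:-$(date +%Y-%m-%d-%H:%M:%S)}"
    printf '%s@zfs-auto-snap-%s\n' "$dataset" "$stamp"
}
```

Something like zfs snapshot "$(auto_snapshot_name space/timf)" would then take a snapshot stamped with the current time.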

Any/all comments welcome!

Update 11th May: I've fixed a bug in the method script that could cause one auto-snapshot cron job to overwrite a separate job that was a child of the parent. Also tweaked the GUI to change the snapshot period depending on the interval type. The link above is updated to point to the new tarball.

Update 12th May: Fixed another bug in how cron jobs are created (oops).

Update 8th June: You probably want to see more recent posts on this topic, here, here, here and here.

Comments:

Nice work Tim.

Any idea if this functionality could be incorporated into the ZFS web browser GUI so we could do most all ZFS admin/monitoring in one GUI?

We appreciate your quick effort from forum to prototype!

Posted by Wes Williams on May 10, 2006 at 08:15 PM IST #

Hey Wes, no problem - hope it's useful. I see your point about the web GUI, I'll mention it to Steve and see what he thinks. For now, this is just a prototype to see if people like the idea of automatic snapshots, probably needs a bit more peer-review before we start making more ambitious plans (but hey, today automatic snapshots, tomorrow, The World! ;-)

Posted by Tim Foster on May 10, 2006 at 08:26 PM IST #

"...to see if people like the idea of automatic snapshots"

Are you kidding me? I'm sure they'll be a blockbuster hit! Okay, maybe not that enthusiastic, but certainly well received at least. 8)

Posted by Wes Williams on May 10, 2006 at 08:38 PM IST #

I'm aware of ZFS rollback of an entire system but I vaguely recall reading somewhere that commands can also be more discrete - rollback or restore of directories or individual files. Please, am I mistaken?

Posted by Graham Perrin on May 28, 2007 at 09:19 AM IST #

The latest version is at

http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8

and I've got version 0.9 in the works at the moment, with some additional features...

Posted by Tim Foster on September 18, 2007 at 11:37 AM IST #

Oh, sorry, forgot to answer your question Graham - no, you can't roll back individual files or directories, but of course you can access them within the

<dataset>/.zfs/snapshot/<snapshot-name>/

file hierarchy and copy them manually from there.
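As a minimal sketch of that manual copy, taking <dataset> above to mean the filesystem's mountpoint (restore_from_snapshot is a hypothetical helper, not something shipped with ZFS or with the auto-snapshot service):

```shell
#!/bin/sh
# Hypothetical helper: copy one file out of a snapshot back into the live
# filesystem.
#   $1 = mountpoint of the filesystem (assumed, e.g. /space/timf)
#   $2 = snapshot name (the part after the @)
#   $3 = path of the file, relative to the mountpoint
restore_from_snapshot() {
    mountpoint="$1"
    snapshot="$2"
    relpath="$3"
    src="$mountpoint/.zfs/snapshot/$snapshot/$relpath"
    if [ ! -f "$src" ]; then
        echo "no such file in snapshot: $src" >&2
        return 1
    fi
    # -p preserves mode and timestamps; the live copy is overwritten.
    cp -p "$src" "$mountpoint/$relpath"
}
```

For example, restore_from_snapshot /space/timf backup docs/notes.txt would replace the live docs/notes.txt with the version held in the "backup" snapshot.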

Posted by Tim Foster on September 18, 2007 at 11:42 AM IST #

Note for future visitors - this version is now out of date - the latest version is available via a link on the sidebar of my blog (at the time of writing, this is version 0.10)

Posted by Tim Foster on June 15, 2008 at 01:11 PM IST #

I've noticed some odd behavior from the 'frequent' and 'hourly' services due to a scheduled scrub.

I scrubbed the whole pool over the course of 5 hours on Sunday night and found today that the frequent and hourly services had transitioned to maintenance:

"[ Jul 6 11:31:02 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:31:02 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:31:02 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:31:02 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]
[ Jul 6 11:45:54 Stopping for maintenance due to administrative_request. ]"

I'm running Solaris 10 05/08. Have you seen this behavior before? Does the SMF put a service in maintenance after a set number of failures?

cheers,
Blake

Posted by Blake Irvin on July 09, 2008 at 04:12 PM IST #

Hi Blake, hmm, interesting.

To answer your question, SMF doesn't drop the service to maintenance automatically - the code that takes/destroys the snapshots checks at various points to see if operations were successful - if they fail for any reason, we ask SMF to degrade the service (to allow the admin (you!) to investigate the problem)

Check out /lib/svc/method/zfs-auto-snapshot and look for places where it calls check_failure().

There should be logging in the SMF log in /var/svc/log/system-filesystem-zfs-auto-snapshot*.log that should explain why we dropped out to maintenance mode - it might help track down what you're seeing.

Posted by Tim Foster on July 09, 2008 at 04:24 PM IST #

I will be looking at the method script shortly to debug. The log entries I posted were from the SMF service log you specified - but there aren't any illuminating details. I was aware that the service should avoid taking a snapshot during a scrub, but wasn't expecting it to transition to maintenance. I'll post my findings if I can.

Posted by Blake Irvin on July 09, 2008 at 04:33 PM IST #

More digging and it appears that the error happens when the snapshot service sees that a file system has been destroyed:

"To: root@filer1.domain.com
Subject: Output from "cron" command
Content-Length: 1311

Your "cron" job on filer1
/lib/svc/method/zfs-auto-snapshot svc:/system/filesystem/zfs/auto-snapshot:hourly

produced the following output:

cannot open 'pit@zfs-auto-snap.hourly-2008-07-08-12.00.00': dataset does not exist
cannot create snapshot 'pit@zfs-auto-snap.hourly-2008-07-09-12.00.00': dataset already exists"

I guess I don't fully understand how the service tracks snapshots - I thought it only looked at the custom zfs property com.sun:automatic-snapshot:*

Posted by Blake Irvin on July 09, 2008 at 09:30 PM IST #

That's weird - could the filesystem have been destroyed while the method script was running via cron? Could be a race condition. Alternatively, could there be two instances of the method script running with the same label?

Posted by Tim Foster on July 10, 2008 at 01:37 PM IST #

Oh - last thing, would this explain what you're seeing Blake?

If you're using one of the canned instances, turning on "snapshot-children" doesn't work. That would result in us querying all datasets for properties, then recursively snapshotting all of them, which would mean we'd step on our own toes.
eg. taking recursive snapshots of tank *and* tank/foo would fail.

0.11 will fix this by narrowing the list of datasets gathered by com.sun:auto-snapshot, and recursively snapshotting based on inheritance rules, so the above would be narrowed to just taking snapshots of "tank".
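To illustrate the narrowing idea only (this is a simplified sketch, not the 0.11 code, and narrow_datasets is a made-up name): given a list of datasets that all want recursive snapshots, drop any dataset whose ancestor is already in the list.

```shell
#!/bin/sh
# Hypothetical helper: read dataset names on stdin, one per line, and print
# only those that are not already covered by an ancestor in the same list.
narrow_datasets() {
    # In the C locale a parent always sorts before its children, so every
    # ancestor has been seen by the time a child is examined.
    LC_ALL=C sort -u | awk '
        {
            covered = 0
            for (p in kept) {
                # A descendant starts with "<ancestor>/".
                if (index($0, p "/") == 1) { covered = 1; break }
            }
            if (!covered) { kept[$0] = 1; print }
        }
    '
}
```

Fed "tank" and "tank/foo", this prints just "tank", which is the single recursive snapshot described above.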

Unfortunately, it's all or nothing right now - you can't snapshot "just tank itself" and recursively snapshot "tank/foo"; for that, you'd need another instance. We could add another property, com.sun:recursive-auto-snapshot perhaps, but that would result in more complexity...

Posted by Tim Foster on August 20, 2008 at 02:41 PM IST #

Hey Tim

I'm sure I'm missing something, so I want to check: is there no function included yet to automatically destroy older snapshots?
Am I wrong?

Thanks for your answer.

Posted by issa kandji on September 24, 2008 at 04:40 PM IST #

Hi Issa, that functionality is there, but you need to be running the latest version of the code. The latest stable version is 0.10, to be replaced in a few days with 0.11.

Posted by 192.18.1.36 on September 24, 2008 at 05:11 PM IST #

Thanks for your answer.
Just for information, the link to 0.10 on your page is broken.
I've hit another issue trying to install 0.10: I see that it doesn't come with an install script (one that launches a GUI).
I tried to find one of your posts explaining how it can be done (I have to admit that I'm new to Solaris) but I didn't find one.
Maybe the weather in Dublin is bad right now, so you have your hands on your keyboard... Maybe I'll get an answer in a few minutes :-)

Posted by issa kandji on September 25, 2008 at 09:04 AM IST #

The README file says to run pkgadd TIMFauto-snapshot, but this is the error message I got:
pkgadd: ERROR: no package associated with <TIMFauto-snapshot>;
Even when I try to spool it (pkgadd -d <package-dir> -s /var/spool/pkg), it doesn't want to be part of my packages :-).

Thanks

Posted by Issa Kandji on September 25, 2008 at 09:58 AM IST #

Issa, it looks like the mediacast people changed the URL.

http://mediacast.sun.com/users/timsf/media/zfs-auto-snapshot-0.10.tar.gz/details

To add the package, as root, do:

# cd ./zfs-auto-snapshot-0.10
# pkgadd -d proto TIMFauto-snapshot

Nope, there's no graphical installer. This stuff will be in OpenSolaris 2008.11 by default, so you won't have to do anything to install it! :-)

Posted by 192.18.1.36 on September 25, 2008 at 10:06 AM IST #


This blog copyright 2008 by timf