Wednesday Jun 24, 2009

I think I'm about done with the next release. Here's the Changelog entry:

0.12

  • Add event-based snapshots
  • Add support to change the separator character in snapshot names
    • set the default value of "zfs/sep" to "_"
    • useful for CIFS clients that previously choked on colons in snapshot names
  • Improved shutdown speed via http://blogs.sun.com/dp/entry/speeding_to_a_halt
  • Add support to allow the user to disable auto-snapshots of new pools
  • Bugfix to allow snapshots of datasets with spaces in their names
  • Bugfix to properly deal with namespace clashes in dataset names
  • Exported $LAST_SNAP and $PREV_SNAP variables when performing backups

The main new thing here is the "event-based-snapshot" instance, as described in the README and Defect 9595. It's nothing earth shattering, but a useful feature I think.

I've found that in my day to day use of OpenSolaris, I tend to take the lazy option of running "zfs snapshot -r rpool@snap" whenever I'm about to do something to the system that I might regret later, and don't want to wait for the 15 minute ":frequent" instance to fire. Later, I go to run the same thing again, and find I've already got a snapshot called rpool@snap, so I take a new one, rpool@snap2 - can you tell where this is going?

Eventually, I find myself out of disk-space and with no real clue what was interesting about rpool@snapn in the first place. I torch the snapshots and get on with life. So far, this has worked just fine, but it's a bit manual and results in us snapshotting rpool/swap and rpool/dump, and I don't really want that.

Some of the stuff Erwann added for 2009.06 helps a bit - with the "snapshot this directory" button in Nautilus, we could take a snapshot of a single dataset, but it doesn't group them, and still leaves the name of the snapshot as the only place to describe the snapshot contents.

So, my solution was to add the svc:/system/filesystem/zfs/auto-snapshot:event instance. Most of the code for this was in 0.11, but I'm now including a manifest that uses it. This service isn't managed using cron; instead, you get to manually run the method script each time you want to take a snapshot. You also have the option of supplying a description that gets stored in a user property on the snapshot, com.sun:auto-snapshot-desc (feel free to go all "Web 2.0" here, and add #hashtags if you want!)

Where this wins over a simple "zfs snapshot -r rpool@snap" is that it uses the com.sun:auto-snapshot properties used by the other instances to determine which datasets we want to take snapshots of (and can be overridden by com.sun:auto-snapshot:event).
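To make that selection rule a bit more concrete, here's a minimal sketch of the idea rather than the code the service actually runs. The function name select_event_datasets is made up for illustration, and I'm assuming the input is tab-separated "dataset, general value, event value" lines, with "-" meaning the property isn't set:

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs-auto-snapshot): reads lines of
#   dataset<TAB>com.sun:auto-snapshot<TAB>com.sun:auto-snapshot:event
# on stdin and prints the datasets that should get an event snapshot.
select_event_datasets() {
    awk -F '\t' '
        {
            # The event-specific property wins when it is set;
            # "-" is how an unset user property is shown.
            value = ($3 != "-") ? $3 : $2
            if (value == "true") print $1
        }'
}

# Illustrative use on a real system (assumption, left commented out here):
#   zfs list -H -t filesystem,volume \
#       -o name,com.sun:auto-snapshot,com.sun:auto-snapshot:event \
#     | select_event_datasets
```

With this rule, setting com.sun:auto-snapshot:event=false on rpool/swap and rpool/dump keeps them out of event snapshots without touching what the other instances do.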

I've hacked together a (flawed) GUI that I've added as a launcher on my GNOME panel which shows a Zenity dialog box asking for a description of the snapshot, runs the method script, then pops up a notification once the snapshots have been taken. I've not included this in the package, since I'm sure someone will do a better job of it, but download from here if you want it. Obviously this is but one use for event-based snapshots: if you set the zfs/backup-save-cmd SMF property on that instance, you'd have a 1-click "backup my stuff" button! :-)



More of my awesome GUI skills...



The snapshot event notification popup

As ever, the README also documents these changes, and you can get the sources and build yourself a package via:

$ hg clone ssh://anon@hg.opensolaris.org/hg/jds/zfs-snapshot

I'm not sure when these changes will land in OpenSolaris - there'll be a 0.12.1 release as soon as I get to write support for the 'zfs list -d' command that Chris added for the bug I filed 6762432, so perhaps we'll wait for that (I thought it polite to wait till everyone was able to run a version of OpenSolaris that included that fix before making the changes).

Comments welcome here, or on the zfs-auto-snapshot mailing list

Monday Jun 15, 2009

Some of the work I've done recently involved some changes to virt-install(1) to teach it how to install xVM guests from OpenSolaris AI servers - work that you'll see going back soon as a patch to virt-install as part of our xVM 3.3 changes.

This ended up shaking a few bugs out of AI and OpenSolaris, two of which became stoppers for the 2009.06 release (which made for a rather exciting weekend): one was fixed, the other was documented in the release notes.

Along the way though, and the point of this blog post, I got to learn a bit more about OpenSolaris and the boot process when we're using AI and what to do when things go wrong.

For x86 (the only thing I really cared about in my case, SPARC differs slightly) AI works by downloading the kernel and a very small boot archive via PXE and TFTP. The client then boots with this image and the svc:/system/filesystem/root:live-media SMF service arranges to download the solaris.zlib and solarismisc.zlib files, and mounts them on the client.

However, should that service fail for some reason - we're left with a pretty unfriendly OpenSolaris environment. There's a tradeoff between fast/low memory installs and easy-to-debug environments so it's a tough one to call.

However, if you do need to debug stuff early in boot with an AI image, I added a comment to Defect 6851 that explains how you do it. This came from an email I was writing to a colleague today who was running into the same problem - I figured posting those comments to the bug report and writing this short blog post would be a good thing to do. Hope this helps someone out there?

Wednesday Jun 03, 2009

A few weeks ago, we had a Saturday that was everything a Saturday should be - not too early a start, a nice breakfast, an energetic, if slightly damp, walk with Calum and the four of us visiting the Botanic Gardens for a picnic.

We're sort of in a weird state at home at the moment, trying to go around appreciating everything that Dublin has to offer (hence the visit to the Botanic gardens), unsure of what the future holds.

Ever since our first trip to New Zealand we've always thought it'd be an interesting place to live, but before my most recent trip there for Glynn & Jayne's wedding, we'd talked about my using the time over there to consider more carefully whether it's somewhere we'd want to emigrate to.

Over the course of the two and a bit weeks there, I was gradually leaning towards a "yes" answer to the question above, but it's funny - as soon as I heard the rumours about a supposed deal with IBM to buy Sun, I'd decided that if the deal went through, that'd be it, we'd really seriously look into moving. In some ways, that the deal fell through was a bit of a relief, not only because I think it'd have been the wrong thing for Sun, but also because I was off the hook in terms of facing that big decision to move. Now that there's another deal pending with Oracle, I need to face that question again.

On the other side of the argument, we've got a pretty cushy number in Ireland at the moment: our house is a 25 minute cycle from the office (the missus has a shorter commute to her office) and the creche we use for Ella, and possibly Calum too, is right next door -- but that, in a way, makes us feel even more trapped: should we give up what appears to be a perfect setup and fling ourselves into the unknown? Looking further out, there are good primary schools in the local area for the kids, but getting them to a secondary school would mean quite a commute for them I think, so we'd end up having to move somewhere in a few years anyway.

There's some things that could make moving easier. Already, I'm the only person in Ireland working in the Solaris xVM kernel group, so I'm working remotely wherever I am: working from the other side of the world probably wouldn't be that much different for me, assuming I'm allowed that opportunity with the pending acquisition. And of course, I'm not just moving me - we're a family, so anything we do has to work for everyone, otherwise it's not going to happen.

So is the decision to move made yet? No, not at all - it is a realization though, that we need to make that decision soon, rather than leave it hanging over us. Perhaps the best way, is to try to get out there for a year, and see how we settle - a sort of "Try and Buy" approach I suppose.

But, between then and now, there's plenty to enjoy about Ireland, and it seems like considering the question on whether to emigrate or not has made us think a lot about life in general and what we want to get out of it. I think we all should be enjoying it a lot more, dancing in the bluebells as much as we can.

Wednesday May 27, 2009

It's with regret, that I won't be able to head over to CommunityOne West next week - the OpenSolaris tracks look pretty interesting, but as ever with this sort of thing, it's as much about the people you'll meet as the actual content of the sessions - in fact, I'd almost argue, it's more about the people you'll meet, than the sessions.

It's been ages since I've chatted to OpenSolaris-folk in person (other than my immediate colleagues in the xVM team and Glynn of course) and while email and IRC are good, they're no real match for having a beer and a natter with like-minded people - the OpenSolaris Party on Monday night looks like a pretty good opportunity for that. I would also quite like to be over there for the launch of 2009.06 having played a bit of a part in this release too (note to self: filing bugs that end up on a stopper list makes for a somewhat exciting weekend), oh well.

My travel plans this year have been put on hold for a while - having been away from the wife + kids for a few weeks while I was down in New Zealand, closely followed by another week in MPK, I've promised to stay put in Ireland for a bit, although more about that in a future blog post I think.

Still, I'll be doing my best to keep up with what's going on at C1, and hope that I can at least read some blog reports on how the event goes next week. Wish I was there! If you wish you were there too, but live a bit closer, and feel like learning more about OpenSolaris, then do fill in the box below :-)

Monday Apr 27, 2009

Original image by James Jordan

I've been rather busy of late, with a recent trip to MPK resulting in a ton of work to bring back home, so haven't had much chance to blog as much as I'd like, apologies.

But, recently, there's been some activity with the ZFS Automatic Snapshot service that I thought I'd publicise a little bit. It seems that great minds think alike: myself, Brock Pytlik from the IPS team and Glenn Brunette (ok, two great minds, and me :-) all seem to have come to the independent conclusion that automatic snapshots on a local machine are good, but snapshots going to a remote machine are great, and have become more interested in dusting off the lesser-known zfs/backup-save-cmd option of the ZFS Automatic Snapshot service.

The timing here is excellent, as this is something I'd been thinking about with the advent of the Sun Cloud API (which relates to my day job at the moment in an interesting kind of way). More work to come in these areas I hope, but after a few mails back & forth with Glenn, he's made it first-past-the-post, with an implementation to send auto snapshots to S3 storage, which looks pretty nifty to me!

There's a heap of other stuff we could do here, we need a few things for this to really fly though:

  • A means to list all snapshots on the remote end
  • A means to choose the most recent common snapshot between the local and remote ends, and send an incremental send stream between that snapshot, and the one we've just taken
  • A means to define what "remote end" means, in an extensible way (be it removable media, network devices, cloud storage etc.)
  • An ability to send/recv into ZFS-based Cloud storage - (storing flat ZFS send streams in the cloud isn't as useful imho - I'd like to be able to browse these from any device)
  • With the auto-snapshot zfs/interval SMF property set to none, we can take event-driven snapshots, so we could do things like hook the service into nwam, so that we take an on-demand snapshot whenever we get a network connection (assuming a sensible time period has elapsed since our last snapshot) so we never lose data. The zfs auto backup prototype I'd posted before did this for local disk storage, but I never really took the idea further, waiting for better ZFS removable-media support.
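The "most recent common snapshot" item in the list above is the easiest one to sketch. What follows is only an illustration under a few assumptions: latest_common_snapshot is a made-up name, and each end can produce a plain list of snapshot names (the part after the "@"), one per line, oldest first:

```shell
#!/bin/sh
# Hypothetical helper: latest_common_snapshot LOCAL_LIST REMOTE_LIST
# Each file holds one snapshot name per line, oldest first.
# Prints the newest local snapshot that also exists on the remote end,
# or nothing at all if the two ends share no snapshot.
latest_common_snapshot() {
    awk '
        NR == FNR { remote[$0] = 1; next }   # first file: remember remote names
        ($0 in remote) { last = $0 }         # second file: keep the newest match
        END { if (last != "") print last }
    ' "$2" "$1"
}

# Illustrative use (assumption, left commented out; "backuphost" and the
# dataset names are placeholders):
#   zfs list -H -t snapshot -o name -s creation | ... > local.txt
#   common=$(latest_common_snapshot local.txt remote.txt)
#   zfs send -i "tank/home@$common" "tank/home@$newest" \
#     | ssh backuphost zfs recv tank/backup/home
```

If nothing comes back, there's no common base and a full send stream is the only option.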

Of course, there's just not enough hours in the day for one person to do all of this, but if you're interested in these sorts of problems, do subscribe yourself to the ZFS Automatic Snapshot email alias and dive in!

But once again, kudos to Glenn for giving this a whirl!

Tuesday Mar 10, 2009

I voted:

RECORDED:  ballot 971674dbf07ccbd25a1bf7935a61ecc1a8b26493 on "Board
Election 2009/Change Constitution" from Tim Foster
Connection to poll.opensolaris.org closed.

- "yes" to what looks like an excellent change to the constitution, and for a set of seven people (I expressed preferences for all sixteen candidates) that I really would love to see on the OGB this year.

Yes, I voted before seeing much in the way of electioneering from several candidates, my view being that people going for posts on the OGB should be pretty well known in their work on the project already and I shouldn't only be hearing about them in the days running up to the election.

There's detail on the above on the 2009 elections page.

Wednesday Jan 21, 2009

We've got the venue confirmed: here are details of the upcoming Irish OpenSolaris User Group meeting:

Topic: General OpenSolaris Discussions
Date: Thursday 29th January 2009
Time: 19:30
Location: The Vaults

Look forward to seeing you there! If you can help out with equipment or have ideas for presentations, or just feel like saying "hi", drop mail to the mailing list.

Monday Jan 19, 2009

I sent some mail to the Irish OpenSolaris User group list today, proposing to kick-start our user group meetings again.

Meetings haven't happened at ie-osug since last February, and we're trying to see if a change of tack would help get things going again.

Our meetings from June '06 - Feb '08 were more like a mini lecture-series about OpenSolaris, and while I think these were interesting, they often came across as a bit formal: yes we had pints afterwards (which were usually great) but there was never the atmosphere of community we'd hoped for.

So, this time, we're going to try the approach the SFOSUG use - try holding the meetings in a pub. The location we're thinking of, The Vaults, serves food & beer and we'll hopefully be able to reserve a small room in the place. We'll bring along a wifi access point and a few laptops, and an LCD projector. We'll still be able to do "feature presentations" if people feel like doing them, but hopefully the more informal atmosphere of a pub will help get people talking a little more, and perhaps grow the user group and get more people participating.

I don't have the exact time/date yet - but will post more when I have it, I suspect it'll be Thursday January 29th. Do please comment here, or send mail to the list if you think this would be a good way to get more people interested in OpenSolaris in Ireland?

Wednesday Jan 14, 2009

I'm absolutely thrilled to announce, that Scott Seighman is continuing to write the OpenSolaris in review posts that I'd been maintaining for a while in this category.

Original image by Dan Taylor

So, without further ado, here's Scott's first post - OpenSolaris in Review - December 2008. Time to update your RSS feeds everyone!

Finally, speaking from experience, these posts are a lot easier to write when people in the community help out by contributing additional content each month - so do please send Scott anything you might have that's worthy of a mention.

Thanks for taking on this job Scott - much appreciated!

Monday Jan 12, 2009

I'm rubbish at this sort of thing and generally don't do Internet memes, but since stevel tagged me, the worry of whether I'm interesting enough to come up with 7 things you may not know about me has been weighing heavily on my mind. Judge for yourself whether they're interesting or not!

  1. As a kid, I lived opposite Rathfarnham shopping centre, which in those days was shut at weekends and at 6pm each night. This made for the perfect place to tear around on my blue Raleigh Grifter, a fantastic machine - which probably weighed as much as I did at the time. I have fond memories of that bike - its motorcycle-gripshift and its wonderful, if sometimes deadly, 3-speed Sturmey Archer hub.

    Much of the time was spent just messing about, playing chicken with brick walls (pretending you were an X-Wing trying to pull up before crashing into the shields of the now-fully-operational Death Star. I still have all my teeth, in case you're wondering), but playing "Squares", a mixture of a slow bicycle race, sumo wrestling and Kick Start, was the best way to spend your time on a bike.

    Here's how you play: decide on a playing area, a square or rectangle, marked out by the white lines of a few empty car parking spaces. Then get a few of your friends/siblings on bikes to cycle around within the confines of that square. If you touch the ground with your feet or go outside the square, you're out. The winner is the last person on their bike. Hilarity ensues. (and the odd grazed knee, I'm guessing)

  2. I'm the most optimistic person I know. I think this tends to eventually get on people's nerves, but I can usually be relied on to put a cheery spin on whatever situation we're in at the time.

  3. Back in the day, I spent way too much time writing mod files using FastTracker and others. This was all on our 16MHz i386 (2MB RAM!) with a handmade Covox thing stuck in LPT1 - I couldn't afford real hardware, which was probably for the best. I've lost most of what I wrote over the years, but have dredged up a few bits of music here, with one converted to mp4, just in case you don't have a mod player handy. If you don't, don't worry, you're not missing much!

  4. From time to time, I wonder what life would be like if I followed the other career ideas I'd had. Over the years, I've wanted to be a cabinet maker and a rally driver. I studied Science in university and was still doing two units of Botany in my final year, before deciding computers were for me.

    Even now, I wonder whether I'd be any good at photography, probably not - though more on my SmugMug page (I like the New Zealand ones - and am looking forward to visiting there this year for Glynn & Jayne's wedding!) On the other hand, without the pressures of deadlines and actually putting bread on the table, perhaps being an amateur isn't so bad after all.

  5. I'm the only Genesis fan I've ever met - I mostly prefer the Peter Gabriel era, but like all of their stuff really. I tend not to really like any other prog rock band I've listened to.

  6. I used to role-play rather a lot: epic games that'd last multiple weekends, with story arcs that went on and on. We played anything we could get our hands on - D&D, Runequest, Call of Cthulhu, Paranoia, Shadowrun, the system didn't really matter to us. Haven't played in a year or two though, but it's still something I really enjoy. I'm not sure whether this qualifies me as "evil" stevel? Depends who you ask I guess? :-)

  7. Do you ever get the feeling that you have a really good idea floating around in your head, but can never quite organise your thoughts enough to express it? I get that all the time. I hold a few software patents (not really worth talking about) and have submitted one or two other patent disclosures, but still feel like I haven't quite had my "Eureka!" moment yet - it's not something I worry about, but it's in there somewhere.

Ok, with that out of the way, here's another list of people who can choose to ignore this Internet meme if they so choose:

Here are the rules:

  • Link to your original tagger(s) and list these rules in your post.
  • Share seven facts about yourself in the post.
  • Tag seven people at the end of your post by leaving their names and the links to their blogs.
  • Let them know they’ve been tagged.

Saturday Dec 27, 2008

One of the many great things about being a parent around Christmas, is that not only do you get to see the looks on the faces of your kids when they're opening their presents (itself, priceless) but you also get to play with the presents with them!

Ella was really lucky this year, receiving lots of lovely presents, but one of the best, in my opinion, came from Duncan & Denise, an Alphabet Jigsaw - handmade by a company in Westport, Co. Mayo. It's one of the most ingenious things I've seen.

Ella thinks it's wonderful too. Here she is playing with it wearing her Upsy Daisy costume (thanks Santa!) with Gramps on the kitchen floor:

Calum, being only 15 days old when Christmas arrived, wasn't quite as much into the spirit of the occasion as Ella was, though he did manage to look quite adorable in his "My first Christmas" outfit. May the joys of parenthood continue!

Friday Dec 19, 2008

So far, parental leave is going pretty well - 9 days in and we're coping ok (and by the way, I'm not reading work email at the moment, which is probably a good thing - in my sleep-deprived state, I'm not likely to have much in the way of coherent responses!)

To lessen the effects of cabin fever we decided to go on our first family trip with Ella and Calum yesterday. This year in the Dublin Docklands, there's a Christmas market, and it sounded a bit easier to get the kids to that, rather than fly to one of the real Christmas markets in Europe.

A few things became clear during this trip - getting all four of us out of the house is going to take a lot more practice, and our car doesn't fit two prams (the missus carried Calum in a sling instead, but he won't stay this size forever: perhaps it's time to consider a mini-van? Our days of 3-door BMW coupes seem to be long gone...)

Was the market worth going to? Well, yes, if only just to see the look on Ella's face when she saw the carousel - so many rocking horses in one place, how fantastic! The Bratwurst and Lebkuchen were also very welcome treats. More photos below.

Thumbnails of Tim's photos taken at the Dublin Docklands 12 Days of Christmas market

Happy Christmas everyone!

Wednesday Dec 10, 2008

(though I'm happy that's being announced too)

Last night, close to midnight, I'd started writing a blog entry about Solaris on the desktop, the different environments we've used, and was going to talk a bit about how far we've come from when I joined Sun back in 1996 up to today's release of OpenSolaris 2008.11 (I was also going to mention my role in writing a chunk of Time Slider for the current desktop environment) but that post's for another day.

There was a much more important delivery today: Calum Henry Foster, 7lbs 2 oz, was born at 6:04am about 30 minutes after we arrived in the hospital. And yes, I broke a few red lights on the way in! Mother & baby are both doing well and I am the proudest father in the world (again!) - I'm grinning from ear to ear right now.

Tuesday Dec 09, 2008

Since I started the zfs-auto-snapshot work back in May 2006, there's been a missing piece of functionality. Well, I'm happy to say - it's missing no longer!

With thanks to Luca Morettoni, we now have a fix for:

6777694 Need ability to control when auto-snapshots are taken

We hashed out some of the details in private mail, then on the mailing list here; then came a few test scripts from Luca, some paperwork, and the integration was done - thanks for your patience Luca! [ involving signing an SCA, sending me an hg export patch which I could push to the repository, and waiting for me to push the changes - a few more good patches, and I'd be delighted to give Luca commit privileges to make this a bit easier ]

So now with this changeset, the SMF property "zfs/offset", defined since version 0.1, now actually does something - it takes a value in seconds to specify exactly when the various auto-snapshot cron jobs should fire. As always, bug reports welcome!
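As a rough illustration of what an offset in seconds means for a cron job, here's a small arithmetic sketch. It's an assumption on my part that a once-a-day job is the case of interest, and offset_to_cron is a made-up name; the method script's real conversion may well differ, so the README remains the place to check:

```shell
#!/bin/sh
# Hypothetical helper: offset_to_cron SECONDS
# Prints the "minute hour" crontab fields for a once-a-day job that
# should fire SECONDS after midnight. Cron only has minute granularity,
# so any leftover seconds are dropped.
offset_to_cron() {
    offset=$1
    minute=$(( (offset / 60) % 60 ))
    hour=$(( (offset / 3600) % 24 ))
    printf '%s %s\n' "$minute" "$hour"
}

offset_to_cron 3900    # 1 hour and 5 minutes after midnight
```

Staggering the offsets like this is one way to stop a set of machines from all snapshotting at the same instant.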

I've updated the README with details about this feature. We'll try to get it included in the wos as soon as we can.

Monday Nov 10, 2008

Original image by mborowick

Today's a very interesting day for storage systems - it's cool to see the Fishworks team are announcing the Sun Storage 7000 series systems: congratulations one and all. Great things are afoot in my opinion, these are fantastic systems.

While I'm not working on storage systems at Sun any more, I do feel an amount of empathy for those guys: I am working on a software appliance [1] in the form of xVM Server, and I can certainly appreciate what it takes to take a perfectly working OpenSolaris install, strip it down to the bare minimum, add stuff to make it shine especially brightly for a given task, and (of particular focus for me at the moment!) get a product out to the market.

That said, in my previous job in the Solaris ZFS test group, I did run into the Fishworks project, and that story might be worth telling. (And if there's rose-coloured glasses coming across in this post, I apologise: I love my current job, as much fun as QE was, it was also pretty grueling at times ;-)

It was coming into October 2007, and PSARC 2007/618 - the addition of L2ARC devices to ZFS - was looming. These devices, along with Separate ZFS Intent Log devices (as a pair, affectionately known as ReadZilla and WriteZilla) and their intelligent application in a hybrid storage pool are some of the most exciting things about the products being announced today and I've really been looking forward to today's announcement: it always gives me kicks to see Sun technology hit the market when I've been able to contribute to the product personally, even in the small way that I did in this case.

Anyway, Brendan had got in touch with the ZFS test group to see whether we could do anything to help out.

Our job as QE engineers on ZFS was to write and maintain the ZFS test suite. Clearly we needed to update the test suite to work with these new L2ARC devices. We'd done the same thing for slog devices, but in this case, we were looking for test coverage quickly. There was a ton of other work piling up on my plate: Solaris 10 update testing for ZFS, the Newboot Sparc work for Nevada, test sponsor duties for the fingerprint authentication project, on top of all the other daily stuff going on. Busy busy.

So, I started hacking about to see how quickly I could get us a very general set of tests on the L2ARC. The answer? Pretty quickly indeed.

Rather than start from scratch by coming up with a closed set of assertions about L2ARC devices, discussing those assertions with colleagues, making sure they were carefully worded, before setting about implementing tests to verify each assertion, I decided to just wing it.

Now that's not to say that we shouldn't also go about writing tests properly, but for a quick fix (in every sense of the word), I wrote a 90 line shell wrapper around /usr/sbin/zpool which you can download here, if you're interested.

The wrapper maintained a list of devices that it'd try to add to every zpool created with the wrapper; creating a pool would use up one device from the list, destroying the pool via the wrapper would return the device to the list. Pretty simple. This gave us a phenomenal amount of testing for free.
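The bookkeeping half of that idea fits in a few lines. To be clear, this is a fresh sketch and not the wrapper itself: the DEVLIST file, its default path and both function names are assumptions made up for illustration.

```shell
#!/bin/sh
# DEVLIST is a hypothetical file holding one spare device path per line.
DEVLIST=${DEVLIST:-/var/tmp/l2arc-spares}

# take_device: print the first spare device and remove it from the list.
# Returns non-zero when no spare device is left.
take_device() {
    dev=$(head -n 1 "$DEVLIST")
    [ -n "$dev" ] || return 1
    # Rewrite the list without its first line.
    tail -n +2 "$DEVLIST" > "$DEVLIST.tmp" && mv "$DEVLIST.tmp" "$DEVLIST"
    printf '%s\n' "$dev"
}

# return_device DEV: put a device back on the list when its pool is destroyed.
return_device() {
    printf '%s\n' "$1" >> "$DEVLIST"
}

# Illustrative use inside a "create" wrapper (assumption, left commented out):
#   dev=$(take_device) && /usr/sbin/zpool create "$pool" "$@" cache "$dev"
```

A real wrapper would also need to remember which device went to which pool, so that the destroy path knows what to hand back.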

We could use this with our existing test suite, and it would add an L2ARC device to every pool. We could test big and small L2ARC devices, ones based on lofi devices backed by files in /tmp or ramdisks (attempting to simulate really fast disks, despite the weird VM hoops we were jumping through - which resulted in great hilarity when run with our somewhat insane stress tests running on really large machines...) and generally give the code a good run through.

The wrapper found a respectable number of bugs, and was worth its weight in gold, despite its lack of formality in terms of the way we usually write tests. I'm not sure if it's still being used by the ZFS QE team, but I was pretty fond of it.

I think one of the reasons why L2ARC was so pleasant to test was down to its design. Like the intent log devices, they integrate beautifully into the rest of the system, with very little extra work on behalf of the user: and that usually makes test engineers happy too (or at least lets them concentrate on the underlying feature, rather than having to spend extra time making sure the CLI was working properly)

Of course helping on L2ARC testing wasn't all work - I was lucky enough to make it over to the Bay Area for the first OpenSolaris developer summit that month, and while in town Brendan was kind enough to invite me up to the Fishworks office for a quick chat about the testing, a look around, and a rather excellent burger for lunch. I even got the chance to discover that I'm completely dreadful at Fish-pong, perhaps lacking in the basic grounding of American football, table tennis and volley ball rules that my Irish upbringing just didn't provide - but that's another story.

I never got a chance to test on one of the physical Storage 7000 series boxes themselves, nor play with what looks like one of the snappiest web interfaces I've seen in a long time, instead I was focusing on L2ARC itself, and helping to make sure it was solid enough to integrate into Solaris. However, that same operating system is the very one that underpins these appliances, so in that sense - I'm glad I could help!


[1] although yes, today's announcements are software and hardware - indeed, xVM Server's not much without the right hardware to back it up either..

Friday Oct 31, 2008

It's been a pretty hectic Z-Day and Halloween, but a great birthday overall! (Of course, if you were to ask E, she'd maintain it was her birthday today as well - but that's ok, I'm happy to share :-)

I was woken up by herself and the missus this morning, being presented with my birthday present: a set of knee and elbow pads and a unicycle, which I'm absolutely thrilled about!

As a result, when working from home today, coffee breaks were spent wobbling precariously around the kitchen, hanging on to various bits of furniture for dear life - definitely more practice needed, but I think I'm really going to enjoy this particular form of transport: the goal, to commute to work on it at least once, but one step (and fall) at a time - I'm a long way off being able to commute on it.

Work-wise, Halloween has been haunted by a wodge of xVM Server work, a not-too-terrifying zfs-auto-snapshot putback, the creepiness of some of my code getting pushed to pkg.opensolaris.org as part of nv_100a, and the blood-curdling results of more people trying out the service, running into both unknown and known issues along the way. Bug reports are always welcome though, however horrifying!

Tonight though has been entirely work-free: answering the door to trick-or-treaters, some nice pizza and some excellent beer (on an American theme tonight, Sierra Nevada Bigfoot 2008 and Anchor Steam Liberty Ale, yum) and the by now, traditional photographs of fireworks - so, here goes with continuing that tradition!

Luca pointed out some problems with doing a pkg image update to nv_100a bits regarding the new SUNWzfs-auto-snapshot functionality.

You can follow the discussion on the indiana-discuss@ mailing list, but so far, it looks like a few workarounds are needed. On a fresh install, it should all be fine, but if you're upgrading from an older development build of 2008.11 (unless we come up with a better fix) it appears to deliver the zfssnap role as a locked account (*LK* in /etc/shadow) which isn't allowed to execute cron jobs.

To work around this, you need to unlock the zfssnap role (I'd recommend running pfexec passwd -N zfssnap), add the following to /etc/user_attr:

zfssnap::::type=role;auths=solaris.smf.manage.zfs-auto-snapshot;profiles=ZFS File System Management

then clear the maintenance state of the service:

$ pfexec svcadm clear frequent daily hourly weekly monthly

Thanks for the report Luca, and nice screenshots on your blog entry! I'll add comments to this post if we come up with any better solutions. As always, for those following along at home, the latest zfs-auto-snapshot bits are in our mercurial source repository, which you can get with $ hg clone ssh://anon@hg.opensolaris.org/hg/jds/zfs-snapshot

Friday Oct 17, 2008

Having missed August and September's reviews and, by the looks of things, October's news review as well, it seems like now is a good time to call it quits and pass the torch to someone else in the OpenSolaris community. I just don't appear to have enough bandwidth to produce these any more - my day job's super hectic, and at home we're expecting a new arrival in December: something has to give, and it's the monthly review, sorry.

I believe that some sort of fine-grained journalistic role is really important for the OpenSolaris community - the existing newsletters are fantastic, but rely on contributions more so than just digging in and seeing what people are talking about at a community and project level, so I really hope someone will continue on with this work. Glynn did an excellent job before me with weekly news (sample here), and Dan's posts about what's new in build... before that were also fantastic.

From a personal perspective, compiling these reports has also been highly educational. If you're interested in OpenSolaris, this is a great way to get an overview of what's happening and where you yourself might want to contribute to the code, so I strongly urge you to have a go!

All that said, I thought I could pass along some tips on how I put these reports together, in hopes it'll be useful for whoever takes over.

The first place I tend to look for news is the opensolaris-announce mailing list, checking for big announcements. Next up are the ON flag days list and the list of ON putbacks over the past month. Finally, the ARC caselog always makes for interesting reading.

After that, it gets a bit more random. I used some basic scripting to help out - opensolaris-lists.sh. Pass this a month as an argument, and it'll proceed to open the thread list for that month for every OpenSolaris mailing list in your browser - 10 at a time, pausing for Firefox to catch up, and you to hit any key to proceed.

I wasn't reading every email on every OpenSolaris mailing list (though I did read a lot); rather, I scanned the Subject: lines looking for interesting threads, looked at the length of threads to determine what other people found interesting, and over time built up ideas in my head as to whose emails were worth reading regardless.

Having done that, I'd started to build up a text file with the following format:

nth October 2008
Some headline text to explain the links
http://opensolaris.org/some/link
http://foo.com/some/related/link

mth October 2008
Another headline
http://opensolaris.org/another/link

Then, I passed that text file through a basic html formatter I threw together, format-monthly-opensolaris.awk and then published. Along with each link, I left a quick plea to have people comment on stuff I'd left out that they thought was interesting over the past month. In months where I was super-organised, I was compiling that list throughout the month, rather than waiting for the end of the month - but in cases where I'd left things too late, it'd take most of an evening to put the list together, 3 or 4 hours I'd say.
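For whoever takes over, here is a minimal sketch of what such a formatter might look like. It is not the real format-monthly-opensolaris.awk (which isn't reproduced in this post); the function name format_news and the HTML it emits are assumptions for illustration. It relies on awk's paragraph mode, so each blank-line-separated entry becomes one record: the first line is the date, the second the headline, and the rest are links.

```shell
# Hypothetical stand-in for the formatter described above.
# Reads the date/headline/links text format on stdin, writes HTML to stdout.
format_news() {
    awk '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per entry
    {
        # Line 1 is the date, line 2 the headline, the rest are URLs.
        printf "<h3>%s</h3>\n<p>%s</p>\n<ul>\n", $1, $2
        for (i = 3; i <= NF; i++)
            printf "  <li><a href=\"%s\">%s</a></li>\n", $i, $i
        print "</ul>"
    }'
}
```

You'd run it as format_news < news.txt. Note that this sketch does no HTML escaping, so a headline containing < or & would need handling before publishing.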

I think my editorial style tended to veer more towards the technical posts, covering new and notable putbacks, project creations and occasional media happenings. I was admittedly biased towards ON, where I now work :-) I also tried to cover flamewars on the lists with as balanced a view as I could. Most particularly though, I didn't want to just have the monthly news posts turn into marketing material for Sun Microsystems, Inc - this was supposed to be a community service, for everyone contributing to OpenSolaris, so I hope whoever takes over has similar views! Here's all the reviews I've written, from June 2005 to the present (at varying levels of granularity) to get you in the mood.

So there you have it: now that you know how to produce these monthly reports, we just need someone to do it - volunteers? I'll update this post with a link to whoever puts together October's report!

Tuesday Oct 14, 2008

Nice screenshots Erwann! Take a peek here.

Saturday Oct 11, 2008

We got this code into nv_100, as part of LSARC 2008/571 and (at least inside Sun, so far) folks have been starting to play with it.

It's the first time I've been able to use the GNOME Nautilus integration that Erwann came up with, and I think it's pretty cool. Big ups to Niall & Erwann for all their hard work on helping to get this integrated - without them, this stuff would still just be kicking around on my blog!

We've had a few comments so far - most were known bugs and fixed already. I'll list them here, and add comments as we go along.

Services enabled by default

SUNWzfs-auto-snapshot delivers all its instances as disabled, but the accompanying desktop support, SUNWgnome-time-slider (the desktop service that uses SUNWzfs-auto-snapshot, integrates more tightly with the desktop and monitors disk space) had a postrun script that enables the services out of the box. Just run svcadm disable <service> to disable them if you want to, but see below for more ideas if you just don't want to snapshot everything...

Noisy cron job

There were some changes close to integration that made the home directory for the 'zfssnap' role go away, which had an impact on the way we were planning on doing logging. Originally, the cron job would just echo messages onto the end of the SMF instance's log file in /var/svc/log but since the cron job now runs as a non-root user, we aren't able to write to those files anymore.

So we changed it to write logs to the zfssnap user directory, but that wasn't good either, so we eventually moved all logging for the cron job to syslog. A small bug though meant that the service is still a bit too noisy, and so cron ends up sending love letters in the form of svcprop errors to /var/mail/zfssnap - sorry about that. This was actually fixed pre-nv_100, but it just missed the integration date.

Details here on how to grab the sources and build your own version of the SUNWzfs-auto-snapshot package if you want the fix sooner rather than later.

Service inexplicably dropping to maintenance mode

This is probably the most common failure - I'd filed 6749498 about this, which turned out to be a duplicate of 6462803. I say "inexplicably", but /var/adm/messages will actually have more detail - as noted above, I don't have a way to explain to SMF why we're dropping the service to maintenance mode, so you just need to look for the log in the right place. Logging during service start/stop gets picked up by SMF; day-to-day log messages (and there aren't many of those) get handled by syslog.

Otherwise, a few other words of advice:

The service on startup will arrange to take automatic snapshots of all datasets on all pools on the system. You can have it not do this by setting a ZFS user property at the top level dataset in each pool, eg.

$ pfexec zfs set com.sun:auto-snapshot=false rpool

This is a much better way than just disabling the service altogether: this way, you get the option to have the service take snapshots of only the datasets you are interested in, eg.

$ zfs set com.sun:auto-snapshot=false space
$ zfs set com.sun:auto-snapshot=true space/timf
$ zfs set com.sun:auto-snapshot=false space/timf/foo
$ zfs set com.sun:auto-snapshot:frequent=false space/timf/onnv

Better yet, if you use ZFS Delegation to allow users the userprop permission, they can set user properties on their own filesystems, and choose which of their filesystems get included in the snapshot schedule as above.
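To see how the four zfs set commands above combine, here is a small simulation of the lookup, as a sketch only. It assumes that user properties are inherited from parent datasets (as ZFS does) and that a label-specific property such as com.sun:auto-snapshot:frequent wins over the generic com.sun:auto-snapshot one wherever it is found; the helper names effective and included are made up for this example and are not taken from the method script.

```shell
# Locally set properties from the example above: "dataset property value".
props='space com.sun:auto-snapshot false
space/timf com.sun:auto-snapshot true
space/timf/foo com.sun:auto-snapshot false
space/timf/onnv com.sun:auto-snapshot:frequent false'

# Print the value of property $2 for dataset $1, walking up to parent
# datasets when it is not set locally. Prints nothing if it is unset.
effective() {
    ds=$1
    while :; do
        val=$(printf '%s\n' "$props" |
            awk -v d="$ds" -v p="$2" '$1 == d && $2 == p { print $3 }')
        if [ -n "$val" ]; then
            echo "$val"
            return 0
        fi
        case $ds in
            */*) ds=${ds%/*} ;;   # strip the last path component
            *) return 0 ;;        # reached the pool: nothing set
        esac
    done
}

# Succeed if dataset $1 would be snapshotted by the instance labelled $2.
# Assumption: the label-specific property takes precedence when present.
included() {
    val=$(effective "$1" "com.sun:auto-snapshot:$2")
    [ -n "$val" ] || val=$(effective "$1" "com.sun:auto-snapshot")
    [ "$val" = "true" ]
}
```

With this table, included space/timf/onnv daily succeeds (it inherits true from space/timf) while included space/timf/onnv frequent fails. On a real system there is no need for the table, since zfs get -H -o value com.sun:auto-snapshot <dataset> reports the inherited value directly.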

Have a look at the hg history, the README for more documentation, and the auto-snapshot.xml service manifest if you're really interested in what's going on behind the scenes. Enjoy!

Wednesday Oct 01, 2008

Yow - October already, how did that happen? I'm a bit late with August's OpenSolaris monthly news, and now September's has piled up on me as well. I'll try to get them out soon - not enough hours in the day.

In the meantime, I remain very busy with xVM Server work, and my sideline project of getting ZFS Automatic Snapshots into Solaris is hopefully just about done - the status on that, is that I putback the zfssnap role last week to the ON source tree, and the Desktop consolidation have delivered SUNWzfs-auto-snapshot to the WOS already, so we'll have the ZFS Automatic Snapshot service in nv_100 - w00t :-)

Back to the day-job: one of the things that popped out of the gate work for the xVM Server product, was a growing frustration with the older version of hg we're using to manage the xVM and xVM-Server gates. We need support for webrev -r to be able to produce readable webrevs for source trees being managed by Mercurial MQ, and had been maintaining our own private copy of hg and the cdm module for a long time.

We're still not quite at the level where we can move off it entirely, but I was able to spend some time alleviating the problem by a quick bit of ksh hackery. The attachment I sent to the scm-migration-dev mailing list, patch-webrev.ksh makes for slightly nicer webrevs of MQ patches. So, no longer do you have to try to get your head around diffs of diffs. (yuck!)

I suspect this script won't scale for very large source trees, but it's certainly a step in the right direction. Hope you find it useful.

Update (later that day): johnlev spotted a bug in my script where it was reporting more changes than had actually been made in the patch. I've got a new version here which does the trick. I've fixed the link above too.

Saturday Sep 20, 2008

We're down in my parents' house in Wicklow this weekend - a bit of a family get-together, Lyd and Edu are over from Barcelona, and Duncan & Denise are down from Carlingford - the occasion being Duncan & Lyd's birthday. We had a BBQ, yes in September, and thankfully the Irish weather was kind to us and the Sun was shining all day - gorgeous. Sorry Glynn & Jayne, wish you were here!

One of the conversations over lunch was about our respective blogs (we all have one now, apparently), and everyone was complaining that mine had an almost complete lack of anything interesting at all right now - they've probably got a point. Posts in my "Off-topic" category have been pretty thin on the ground of late. Actually, I'm even slipping with the technical ones too - OpenSolaris monthly news posts are late, it's nearly the end of September, and I haven't done August's yet either. Sorry about that, there's just not enough hours in the day at the moment.

So, to appease some of my less technical readers (hi Mum & Dad!) here's a post that barely mentions computers. [ suffice to say, that this being Software Freedom Day, I'm composing this post on OpenSolaris 2008.11 nv_98, I used GNOME, Gimp, Exiftool and Gedit to write it - scarcely a scrap of proprietary software here, and I like it! Ok, on with the non-technical content]

Apart from hanging around with the family this weekend, I was down here for something just as enjoyable. Recently it was a milestone birthday for my father-in-law, and we had clubbed together to get him a day out experiencing falconry, on a Hawk Walk. The voucher was for two people, and as my mother-in-law isn't terribly fond of birds, I was invited along.

What a fantastic outing it was! A group of eight of us spent a few hours learning about the sport of falconry, then got to spend time handling and flying a pair of Harris hawks in the open, and saw several other large birds-of-prey up close and very personal.

I brought the camera along, and quickly managed to fill a 1GB CF card - here's some of the better shots, but it was a tough choice.

A few emotions strike you when you see one of these birds flying towards the leather glove you're wearing in your left hand. Fear initially - it's all beak & talons arriving awfully fast, as the bird's going for the piece of meat you're holding. The landing is pretty dramatic too, but then wonder takes over. Close up they're absolutely amazing creatures. Surprisingly light too, but then again, they're birds, right?

Back at the centre, we got to see a Snowy Owl, some Ferruginous Buzzards, a pair of Lanner Falcons, and an Eagle Owl and got to bring one of the falcons out to see how vastly it differs in the air from the hawks we were flying earlier in the day.

Would I recommend the day out? Absolutely, yes! The guides were friendly and engaging, informative, and very very passionate about their hobby - a really fascinating experience, which I'd love to repeat sometime. More over on Falconry Ireland's web page and check out their Flickr stream too.

Thursday Sep 11, 2008

I'm probably the last person on earth to discover this - but just today, I used the Mercurial bisect command, and thought I'd write up my experiences in case anyone else hasn't played with it before. I'd read about hg bisect in the hgbook, but never had an opportunity to use it in anger.

Here's the problem I was seeing - in builds of xVM Server that I've been doing, we were producing ISO images, but after installation, the pkg command wasn't working properly. Exploring the image a bit, with some help from the pkg python stack trace, I found the problem was that some items in /var/pkg were symlinks pointing to a non-existent mountpoint on the installed image.

Looking at the build logs from distro constructor, cpio was complaining that there was no space left on the device it was writing to. Digging around a bit more and running another build just to make sure, I found the source of the problem - we were df'ing the source directory for the cpio, then doing a mkfile of that size, creating a lofi device that big, then creating a UFS filesystem on that device. There was the problem - the space overhead incurred by the filesystem meant that we were trying to pour a gallon into a pint pot.
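To make the overhead problem concrete, here is a sketch of sizing the backing file with some headroom instead of using the payload size as-is. The function name padded_size_kb and the default of 1% are assumptions for illustration; the real calculation in distro constructor may well differ.

```shell
# Print a size in KB for the file backing the lofi device: the payload
# size plus a percentage of headroom for filesystem metadata.
#   $1 = payload size in KB (for example from du -sk)
#   $2 = headroom percentage (hypothetical default: 1)
padded_size_kb() {
    payload_kb=$1
    percent=${2:-1}
    # Round the headroom up so small payloads still get at least 1 KB extra.
    echo $(( payload_kb + (payload_kb * percent + 99) / 100 ))
}

padded_size_kb 1000000    # prints 1010000
```

The point is simply that the file handed to mkfile has to be larger than the data being copied into the filesystem built on top of it; how much larger depends on the filesystem, so treat the percentage as a tunable rather than a constant.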

So - I knew what the problem was, pulling the tip changeset from the distro constructor even showed me that the problem was already fixed (my favourite kind of bug!) - the fix being to make the file which backs the lofi device just a bit bigger. My question was, what changeset introduced this fix? Enter hg bisect.

With it, you just need to identify where you know the code is bad, and where you know the code is good, and a test to determine whether the change is present. In my case, the test was really short:

grep "Add 1%" build_dist.lib

- but you could conceivably have the test build an entire OS image, install it, and check for the change. The bisect command then does a chop through all of the changesets, narrowing down to where the change was introduced.

In my case, a source tree of 105 changesets resulted in my only having to perform 6 tests to determine where the change occurred. A grep across 105 files would have completed in no time, but had I actually needed to build an OS image for each test, 105 builds would have taken a very long time indeed.
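For anyone wondering why so few tests are needed, here is a toy binary search over a linear history. It is only a sketch of the idea: Mercurial's real bisect works on a changeset graph rather than a simple range, and the names first_good and has_fix are invented for this example. As in my case, the newest revision is assumed to be good (the fix is present) and the oldest bad.

```shell
# Find the first revision in [lo, hi] for which the test command succeeds,
# assuming every revision before it fails and every one from it on succeeds.
# Usage: first_good LO HI TEST...   Prints "<revision> <tests run>".
first_good() {
    lo=$1; hi=$2; shift 2
    tests=0
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        tests=$(( tests + 1 ))
        if "$@" "$mid"; then
            hi=$mid              # fix already present: look earlier
        else
            lo=$(( mid + 1 ))    # fix absent: look later
        fi
    done
    echo "$lo $tests"
}

# Hypothetical stand-in for the grep test: pretend the fix landed in 102.
has_fix() { [ "$1" -ge 102 ]; }

first_good 0 104 has_fix    # prints "102 7"
```

Each test halves the remaining range, so 105 revisions need about log2(105), or roughly 7, tests in this simplified model. Newer Mercurial releases also have a --command option to hg bisect that runs a test like the grep for you at each step, if your version has it.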

Here's some edited highlights:

timf@haiiro[435] hg bisect -g tip
timf@haiiro[436] hg bisect -b 0  
Testing changeset 52:42e67ad1e103 (105 changesets remaining, ~6 tests)
125 files updated, 0 files merged, 5 files removed, 0 files unresolved
timf@haiiro[438] grep "Add 1%" build_dist.lib   
timf@haiiro[439] hg bisect -b
Testing changeset 78:76e8ef490770 (53 changesets remaining, ~5 tests)
119 files updated, 0 files merged, 95 files removed, 0 files unresolved
.
.
timf@haiiro[447] hg bisect -b                
Testing changeset 103:b8d33c12a531 (4 changesets remaining, ~2 tests)
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
timf@haiiro[448] grep "Add 1%" build_dist.lib
	# Calculate the size of the pkg data directory.  Add 1% of the
timf@haiiro[449] hg bisect -g                
Testing changeset 102:ef08a25b1d1c (2 changesets remaining, ~1 tests)
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
timf@haiiro[450] grep "Add 1%" build_dist.lib
	# Calculate the size of the pkg data directory.  Add 1% of the
timf@haiiro[451] hg bisect -g                
The first good revision is:
changeset:   102:ef08a25b1d1c
user:        Karen Tung 
date:        Wed Aug 06 20:22:36 2008 -0700
summary:     2810 pkg archive size not big enough sometimes

So - I need to update our copy of distro_constructor to be based on changeset ef08a25b1d1c, which gets me the fix for 2810. Yahoo!

Tuesday Aug 26, 2008

I'm just about done with this release of the ZFS Automatic Snapshot SMF service and have just pushed some changes to the mercurial repository on opensolaris.org.

This is a pretty important release, in terms of fixing stuff that's been bugging me about the service since I initially released it. But inevitably, with lots of change comes the possibility of lots of bugs - so, I was hoping to get some feedback on how it's looking before it gets officially released.

So, if you're feeling brave (that means don't use this in production yet!) fire up your favourite source code management system (which is hg, right?) and access the mostly-untested ZFS Automatic Snapshot 0.11 Early Access release via:

hg clone ssh://anon@hg.opensolaris.org/hg/jds/zfs-snapshot

I'm working with Niall & Erwann in the Desktop group here, who have been tasked with DSK-5, to get ZFS Automatic Snapshots on the desktop, and so far, it looks like my code will be providing the back-end service (obviously as well as ZFS :-) so some of the changes are things that make most sense when running this on a desktop or laptop machine.

With that in mind, I've not made any changes to my bundled GUI, since it'll be going away real soon now. However, I've done my best to ensure that there's always ways of turning off the small-system-focused bits, and the service remains backwards-compatible with earlier manifests.

So what's going to be new in 0.11 ? Well having seen Nils write up his changes in that form (more on that later), I thought I'd have a go at writing a Changelog too - so here's the annotated Changelog entry so far for 0.11:

0.11

  • Add RBAC support
    • the service now runs under a zfssnap role
    • service start/stop logs stay under /var/svc/log
    • other logs saved to /export/home/zfssnap (and syslog) [ yes, this sucks a bit - better solutions welcome? ]
  • Add a 'zfs/interval' property value 'none' which doesn't use cron
  • Add a cache of svcprops to the method script (good idea Nils!)
  • Add a com.sun:auto-snapshot user property used by all instances, com.sun:auto-snapshot:$LABEL takes precedence
  • Remove the seconds field of the snapshot name - it's not needed (good idea Håkan!)
  • Changed the way // works with recursive snapshots - ignore snapshot-children, and instead automatically determine when we can take recursive snapshots based on which datasets have the zfs user properties
  • Set avoidscrub to false by default (6343667 was fixed in nv_94)
  • Bugfix from Dan (thanks!) - Volumes are datasets too
  • Automatically snapshot everything by setting com.sun:auto-snapshot=true on startup. (this gets done on all top level datasets - an existing property set to false on the top level dataset overrides this)
  • Check for missed snapshots on startup
  • Clean up shell style a bit
  • Clean up preremove script (I need to make these scripts redundant before we move to IPS, I know)
  • Write this Changelog
    In terms of user-visibility, the most obvious changes are running under RBAC, and taking snapshots of all filesystems by default - I realise the latter could be controversial, but you can turn it off if you don't like it. I'm also pretty happy with the changes to the "//" schedule - we now ignore "zfs/snapshot-children" for this particular case, and instead use the list of filesystems marked as "com.sun:auto-snapshot=true" to work out which filesystems we can take recursive snapshots of, and which we have to take individual snapshots of. This makes a big difference on large systems.
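One way to picture the recursive-versus-individual decision is the sketch below. It is a guess at the general idea, not the logic from the method script: it assumes a dataset can be snapshotted recursively only when it and every one of its descendants are marked true, and the function name plan_snapshots and its input format are invented for illustration.

```shell
# Reads "dataset true|false" lines (the effective com.sun:auto-snapshot
# value for every dataset) and prints one "recursive NAME" or
# "single NAME" line per snapshot operation that would be needed.
plan_snapshots() {
    awk '
    { name[NR] = $1; on[$1] = ($2 == "true") }
    END {
        # A dataset is "full" when it and every descendant are included.
        for (i = 1; i <= NR; i++) {
            d = name[i]
            full[d] = on[d]
            for (j = 1; j <= NR; j++)
                if (index(name[j], d "/") == 1 && !on[name[j]])
                    full[d] = 0
        }
        for (i = 1; i <= NR; i++) {
            d = name[i]
            if (!on[d])
                continue
            parent = d
            sub("/[^/]*$", "", parent)
            # Skip datasets already covered by a recursive snapshot higher up.
            if (full[d] && parent != d && full[parent])
                continue
            kind = full[d] ? "recursive" : "single"
            print kind, d
        }
    }'
}
```

Fed a pool where space/timf is included but space/timf/foo is not, this would print "single space/timf" and then a "recursive" line for any fully included subtree beneath it, so one zfs snapshot -r covers that whole subtree instead of one call per dataset.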

    One thing that's missing from this release, is Nils Goroll's suggested changes about improving the way the system performs scheduling - more details here. I feel that moving away from cron would result in less familiarity in what the service does: if cron is the problem, certainly one solution is running away from it, but wouldn't it be cool to get cron's shortcomings fixed instead? Yeah, one of those "ample-free time" problems.

    So, without further ado, there's full documentation in the README - enjoy, and please let me know if you see anything weird - there's still time to fix it before 2008.11 (and yes, all this despite my day-job being super hectic right now! xVM Server is getting 99% of my time at the moment, so I definitely expect bugs in this early access release of zfs-auto-snapshots!)

    Tuesday Aug 05, 2008

    Life remains as hectic as ever - day-job still amazingly busy (but good-busy) and things are in the same state at home: I'm typing this entry from my in-laws' house in Carrickfergus, where we've retreated for the week while we get some building work done to our house [ we're dry-lining the interiors of all external walls, which hopefully will make for a less chilly winter, but it's a messy job, and E's creche is shut for 2 weeks. Migrating north seemed like the sane thing to do ]

    As a result of the above, I haven't been able to give OpenSolaris the attention it deserves, except where my day-job contributes to it of course :-) So, with some amount of guilt, here goes with July's (slightly shorter than usual?) news report - as always, please add missing stuff to the comments section.

    Tuesday Jul 22, 2008

    I stumbled on this tip today on planet.gnome.org about how to tune what gets displayed on your favourite planet - this has made me extremely happy, as I now get to have a userContent.css file that says:

    @-moz-document domain(planet.opensolaris.org) {
      div.observatory div.person-info { display:none; }
      div.observatory div.post { display:none; }
    }
    

    Why? Well, in its own words, "Planet OpenSolaris is a window into the world, work and lives of OpenSolaris hackers and contributors." - more particularly, I don't feel it's the right place for documentation or marketing spiel about OpenSolaris - there's other places for that.

    Don't get me wrong, I'm thrilled that those guys are writing content about OpenSolaris that Google will cache and end-users will benefit from - they're doing a fantastic job! Personally though, I go to planet.opensolaris.org to read what people think: I don't go there to read software documentation or watch hundreds of screen shots of installation wizards, let alone read about quintuple-boot setups (gak!). Come on guys - there's got to be real people behind the marketeers?

    So, to paraphrase Lt. Ripley, I vote we take off and userContent.css those entire posts from orbit. It's the only way to be sure.

    Of course, I could be wrong - if so, feel free to load your text editor of choice, and with feeling, type div.timf div.person-info ... I'll totally understand!

    Saturday Jul 12, 2008

    I'm packing at the moment for another trip to California - this time for a bunch of meetings in the Sun office in Menlo Park related to xVM Server and to generally get to see some of the folks I'm now working with face to face (you can say what you like about remote-work, nothing beats getting to be in the same room as your colleagues from time to time)

    This blog has been a bit quiet of late - I've been super busy since I started in the Solaris kernel group working on xVM. Over the last while, I've been learning as much as I can about xVM in Solaris, working on a bunch of nightly build scripts, which are now cranking out those iso images (hurrah - now I just need to make the iso images contain the right packages!) I also got to fix 6714450, though we need to hang on for the opensolaris.org webapp to be refreshed in order to take full advantage of it. More about that at a later date.

    Coincidentally, Gman is also going to be in Menlo Park at the same time as me, so it'll be great to see him again - we Foster-brothers don't get to meet that often!

    Anyway, if you see me wandering around the labyrinth that is MPK17 stop me and say "hi", I'll be there from the 14th-18th!

    Wednesday Jul 02, 2008

    Here's the news for June 2008 - things are still pretty hectic for me as I continue to learn my new day-job, so haven't been able to keep on top of the news as much as I'd like, and I've got a sinking feeling I'm missing some important stuff in the list below. So, if you feel like helping out, please add comments to this post.

    If you really want to help out, and would like to become a guest-editor for July's news, drop me a mail and I'll send on the scripts I use when compiling the list - it'll still be a pretty time consuming task though (2 or 3 hours, typically), but it's for a good cause!

    So, without further ado, here's some of the stuff that went on in June:

    Wednesday Jun 11, 2008

    I did my first xVM gatekeeping putback today - pretty minor changes to some build scripts, but a nice way to ease myself into the joys of dealing with hg, MQ and our gate setup. Thanks to johnlev for helping me through the putback procedure!

    For those that care, the bug was http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6710070 - and if anyone's interested, you can see the fix as a comment to this post.

    My next task is to tackle the xvm-unstable gate. At the moment, for contributors to the Xen Community on OpenSolaris.org, building the unstable bits is a bit unwieldy - at least more so than building the current stable gate that's used for Nevada. We've got 3 repositories up there at the moment, but my aim is to make the rest of our repositories available, and get the build and gate management scripts to work regardless of whether you've got a sun.com email address.

    Monday Jun 02, 2008

    Here's my roll-up of news on opensolaris.org during May - usual rules apply: if I've missed anything you care about, or have inaccuracies in my posts below, feel free to add comments.

    I didn't manage to keep on top of news as much as I'd have liked during the month, so if the sysadmins notice that server logs on mail.opensolaris.org were a bit busier than usual for a Sunday evening, well, that was probably me digging around for links to post. Anyway enjoy!

    This blog copyright 2009 by timf