One of the ways we tested the ZFS L2ARC
Today's a very interesting day for storage systems - it's cool to see the Fishworks team are announcing the Sun Storage 7000 series systems: congratulations one and all. Great things are afoot in my opinion, these are fantastic systems.
While I'm not working on storage systems at Sun any more, I do feel an amount of empathy for those guys: I am working on a software appliance [1] in the form of xVM Server, and I can certainly appreciate what it takes to take a perfectly working OpenSolaris install, strip it down to the bare minimum, add stuff to make it shine especially brightly for a given task, and (of particular focus for me at the moment!) get a product out to the market.
That said, in my previous job in the Solaris ZFS test group, I did run into the Fishworks project, and that story might be worth telling. (And if there are rose-coloured glasses coming across in this post, I apologise: I love my current job, and as much fun as QE was, it was also pretty grueling at times ;-)
It was coming into October 2007, and PSARC 2007/618 - the addition of L2ARC devices to ZFS - was looming. These devices, along with separate ZFS Intent Log devices (as a pair, affectionately known as ReadZilla and WriteZilla) and their intelligent application in a hybrid storage pool, are some of the most exciting things about the products being announced today, and I've really been looking forward to today's announcement: it always gives me kicks to see Sun technology hit the market when I've been able to contribute to the product personally, even in the small way that I did in this case.
Anyway, Brendan had got in touch with the ZFS test group to see whether we could do anything to help out.
Our job as QE engineers on ZFS was to write and maintain the ZFS test suite. Clearly we needed to update the test suite to work with these new L2ARC devices. We'd done the same thing for slog devices, but in this case, we were looking for test coverage quickly. There was a ton of other work piling up on my plate: Solaris 10 update testing for ZFS, the Newboot Sparc work for Nevada, test sponsor duties for the fingerprint authentication project, on top of all the other daily stuff going on. Busy busy.
So, I started hacking about to see how quickly I could get us a very general set of tests on the L2ARC. The answer? Pretty quickly indeed.
Rather than start from scratch by coming up with a closed set of assertions about L2ARC devices, discussing those assertions with colleagues, making sure they were carefully worded, before setting about implementing tests to verify each assertion, I decided to just wing it.
Now that's not to say that we shouldn't also go about writing tests properly, but for a quick fix (in every sense of the word), I wrote a 90 line shell wrapper around /usr/sbin/zpool which you can download here, if you're interested.
The wrapper maintained a list of devices that it'd try to add to every zpool created with the wrapper; creating a pool would use up one device from the list, destroying the pool via the wrapper would return the device to the list. Pretty simple. This gave us a phenomenal amount of testing for free.
We could use this with our existing test suite, and it would add an L2ARC device to every pool. We could test big and small L2ARC devices, ones based on lofi devices backed by files in /tmp or ramdisks (attempting to simulate really fast disks, despite the weird VM hoops we were jumping through - which resulted in great hilarity when run with our somewhat insane stress tests running on really large machines...) and generally give the code a good run through.
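For anyone curious what a wrapper like that might look like, here's a minimal sketch of the idea. It is not the original 90 line script: the names ZPOOL, STATE_DIR, free.list, used.list and wrapped_zpool are all made up for illustration, and it assumes the cache device is attached with a follow-up "zpool add <pool> cache <device>" after the pool has been created, which may not be how the original did it.

```shell
#!/bin/sh
# Sketch only: NOT the original script. All names here (ZPOOL, STATE_DIR,
# free.list, used.list, wrapped_zpool) are hypothetical.

ZPOOL="${ZPOOL:-/usr/sbin/zpool}"                 # the real binary being wrapped
STATE_DIR="${STATE_DIR:-/var/tmp/l2arc_wrapper}"  # where the device lists live

# Print the pool name: the first operand after the subcommand, skipping
# options (and the values of options that take a separate value).
pool_name() {
    shift
    while [ $# -gt 0 ]; do
        case "$1" in
        -o|-O|-R|-m|-t) shift; [ $# -gt 0 ] && shift ;;
        -*) shift ;;
        *)  echo "$1"; return 0 ;;
        esac
    done
    return 1
}

wrapped_zpool() {
    free="$STATE_DIR/free.list"   # spare devices, one path per line
    used="$STATE_DIR/used.list"   # "pool device" pairs handed out so far
    case "$1" in
    create)
        "$ZPOOL" "$@" || return $?
        pool=$(pool_name "$@") || return 0
        dev=$(head -n 1 "$free" 2>/dev/null)
        [ -n "$dev" ] || return 0             # no spare device: leave pool alone
        if "$ZPOOL" add "$pool" cache "$dev"; then
            # Device is now in use: take it off the free list and record it.
            sed '1d' "$free" > "$free.tmp" && mv "$free.tmp" "$free"
            echo "$pool $dev" >> "$used"
        fi
        ;;
    destroy)
        "$ZPOOL" "$@" || return $?
        pool=$(pool_name "$@") || return 0
        [ -f "$used" ] || return 0
        # Return this pool's device to the free list and forget the pairing.
        awk -v p="$pool" '$1 == p { print $2 }' "$used" >> "$free"
        awk -v p="$pool" '$1 != p' "$used" > "$used.tmp" && mv "$used.tmp" "$used"
        ;;
    *)
        "$ZPOOL" "$@"                         # everything else passes straight through
        ;;
    esac
}

# As a standalone script, the last line would be:  wrapped_zpool "$@"
```

Dropped ahead of /usr/sbin in PATH (with that last line uncommented), something like this would let an existing test suite run unchanged while every pool it creates quietly picks up a cache device from the list.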
The wrapper found a respectable number of bugs, and was worth its weight in gold, despite its lack of formality in terms of the way we usually write tests. I'm not sure if it's still being used by the ZFS QE team, but I was pretty fond of it.
I think one of the reasons why L2ARC was so pleasant to test was its design. Like the intent log devices, L2ARC devices integrate beautifully into the rest of the system, with very little extra work on behalf of the user: and that usually makes test engineers happy too (or at least lets them concentrate on the underlying feature, rather than having to spend extra time making sure the CLI is working properly).
Of course helping on L2ARC testing wasn't all work - I was lucky enough to make it over to the Bay Area for the first OpenSolaris developer summit that month, and while in town Brendan was kind enough to invite me up to the Fishworks office for a quick chat about the testing, a look around, and a rather excellent burger for lunch. I even got the chance to discover that I'm completely dreadful at Fish-pong, perhaps lacking in the basic grounding of American football, table tennis and volley ball rules that my Irish upbringing just didn't provide - but that's another story.
I never got a chance to test on one of the physical Storage 7000 series boxes themselves, nor play with what looks like one of the snappiest web interfaces I've seen in a long time; instead I was focusing on L2ARC itself, and helping to make sure it was solid enough to integrate into Solaris. However, that same operating system is the very one that underpins these appliances, so in that sense - I'm glad I could help!
[1] although yes, today's announcements are software and hardware - indeed, xVM Server's not much without the right hardware to back it up either..
Yes, the latest ZFS test suite now supports multiple DIY wrappers, based on your wrapper prototype. A user just needs to add their own wrapper to the <ZFS_TEST_SUITES>/bin/ directory, where you can already see zpool_smi, zpool_crypto and zfs_crypto. People can certainly add more wrappers if they like, just following the same style.
The usage has changed a bit; wrappers are now invoked like this:
$ stf_configure -c DISKS="<disks>" -c WRAPPER="<wrapper_list>"
wrapper_list can be a comma-separated list when multiple wrappers are used, and DIY wrappers can be included too. For example:
wrapper_list: "crypt,smi" -> all test pools will be created with an SMI label and ZFS-crypto support.
If no wrapper is specified, the default configuration is still an EFI label, without ZFS-crypto...
Posted by Robin Guo on November 11, 2008 at 04:48 AM GMT #