http://blogs.sun.com/csg/date/20070817 Friday August 17, 2007

Dan Stole the Subject for My Farewell

I had been thinking of what subject line to use for my farewell post, as my internship has come to an end - back to school. Halfway through last week I picked the perfect farewell subject line. Unfortunately, Dan used the Hitchhiker's Guide reference I'd settled on when he left last Friday. He should be most of the way back to the east coast now (he's driving). I've got a plane to catch early Tuesday morning. I started thinking about Spinal Tap references for the title, but nothing quite fitting came to mind.

My part of the project is pretty much wrapped up: the code's cleaned up and is now getting some wider attention from people other than Steve. While waiting for feedback, I spent some time looking at a P2 panic that appeared Wednesday morning, which revealed an odd API quirk and an inconsistency in counting the number of CPUs available to a zone. These bugs, and my project's RFEs, have been transferred to Steve.

I've had a good time here, and I may be back after I graduate in May. Thanks to all the folks I've met here at Sun for showing me a good time, and teaching me a lot. I'm heading back to school having signed the OpenSolaris contributor agreement, so if I have any spare time at school, I may keep working on OpenSolaris while there. In fact, one of the projects I passed on initially, and later found myself wanting, was prototyped by two people previously but needs a bit of work. It was just posted as a new project on opensolaris.org, and I'm thinking of jumping into that.

As soon as I leave, my login account goes away, so this will be the last entry in this blog. I'm [colin at cs dot brown dot edu], or [colin dot s dot gordon at gmail dot com], and I'll keep blogging at http://ahamsandwich.wordpress.com/.



Posted by csg [Personal] ( August 17, 2007 04:19 PM ) Permalink | Comments[0]
http://blogs.sun.com/csg/date/20070801 Wednesday August 01, 2007

Favorite Comments and Constants in OpenSolaris Source

While perusing the code for OpenSolaris, my fellow interns and I have come across some great comments, variable names, and documentation, as any large code base must inevitably have:

Posted by csg [Personal] ( August 01, 2007 02:58 PM ) Permalink | Comments[1]
http://blogs.sun.com/csg/date/20070724 Tuesday July 24, 2007

Layers of Complexity

I've been learning a lot lately about the layering of software from the kernel up through userland utilities. There are frequently many more layers of indirection than are immediately obvious.

Many of the tests I'm running that still fail do so because of a side effect of one test, which turns off the default binding property of all pools. Normally this is fine to do because pooladm -x; pooladm -d; pooladm -e will clear everything out. Disabling pools wipes out all properties on the default pool and pool_pset, and enabling recreates the properties with default values. When I made disabling a no-op at the kernel level, this prevented these properties from being cleared by the test suite's environment cleanup routines. So a number of tests failed because they couldn't bind to anything. Another, more obnoxious side effect of this is that sshd is unable to accept new sessions because it can't find a pool to bind to.

Changing the kernel code for pool_set_status() to reinitialize the properties of the default resources should fix things, right? Well, it would if the sequence of events I assumed was correct. What I assumed initially, without thinking about it, was that the sequence of events would be:

Unfortunately, I had been spending too much time in pset.c, where it really is that simple - the psrset command simply calls the function in the syscall layer which carries out the appropriate task. I completely forgot about both libpool and poold. The actual call sequence is:

But this is enclosed in a conditional which only does all of this if you are setting the pools state to a different state. So because pools are permanently enabled, when tests cleaned up after themselves to set up a fresh testing environment, libpool saw that pools were already enabled, and didn't restart the pools service. So when the later tests ran which disable pools partway through, the invocation of svcadm saw the pools service was already disabled, and did not execute the additional pooladm -d, so no call was ever made into the kernel to reset these properties.

I had not really noticed before how differently userland and kernelspace were written. The complexity in kernelspace tends to be intricacies - locking, complicated invariants and such. Userspace's complexity seems to be largely layering and indirection. Or perhaps that's just what I perceive most because userspace is so large. Unlike the kernel, where the definition a symbol corresponds to is always clear (making following flow of control with tools like cscope or OpenGrok usually straightforward), it's not always easy to figure out the flow of control between programs in userspace without some preexisting knowledge.
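
The trap above can be shown with a toy model. This is only a rough sketch in Python, not the real libpool, svcadm, or kernel code: the class names, the "default_binding" key, and the call counter are all made up for illustration. The one thing it borrows from the real system is the shape of the problem: a userland layer that only acts when the requested state differs from the state it already sees, sitting above a kernel routine that would happily reset the properties if anyone ever called it.

```python
class ToyKernel:
    """Hypothetical stand-in for the kernel side (not the real pool_set_status())."""

    DEFAULTS = {"default_binding": True}  # made-up property name

    def __init__(self):
        self.props = dict(self.DEFAULTS)
        self.calls = 0

    def set_status(self, enabled):
        # Assumed fix: every status call reinitializes the default properties.
        self.calls += 1
        self.props = dict(self.DEFAULTS)


class ToyUserland:
    """Hypothetical userland layer that only acts on a real state change."""

    def __init__(self, kernel, enabled=True):
        self.kernel = kernel
        self.enabled = enabled

    def set_status(self, enabled):
        if enabled == self.enabled:
            return False  # same state requested: nothing reaches the kernel
        self.enabled = enabled
        self.kernel.set_status(enabled)
        return True


kernel = ToyKernel()
userland = ToyUserland(kernel, enabled=True)

kernel.props["default_binding"] = False  # side effect left behind by one test
reached = userland.set_status(True)      # cleanup asks for the state already in effect
print(reached, kernel.calls, kernel.props)
```

In this model the cleanup request never reaches the kernel, so the stale property survives even though the kernel-side reset is in place. The fix in the kernel is correct, but the layer above it filters out the call that would trigger it.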

Fixing this cleaned up a lot of inconclusive tests, and a number of tests which were failing because of properties not being reset. After this and fixing a couple tests which made assumptions about processor sets and pools being mutually exclusive, as of this afternoon the pools test tally has improved from that in the last entry to:
Result Total:
        FAIL: 7
        PASS: 705
        UNSUPPORTED: 5
In addition to this, I've cleaned up some code, merged redundant code paths, done the final implementation for the last class of processor sets, and fixed several race conditions and deadlocks related to the conversion of pool reference counting to per-thread. And because two of the failures also fail on the gate (it was one the other week, but I've found bugs in several test cases), there are really only 5 failures left. Current builds also hold up to as many runs of the processor set stress test as I've been able to subject them to, without deadlock or panic. Things are looking good.

Posted by csg [Personal] ( July 24, 2007 05:03 PM ) Permalink | Comments[0]
http://blogs.sun.com/csg/date/20070619 Tuesday June 19, 2007

Two Weeks in the Kernel

Steve Lawrence suggested I make some notes about my first couple weeks at Sun, while it was still fresh. This seems to be the most appropriate place. He suggested a couple questions to answer for myself: I was also somewhat surprised that, given my previous internship working in filesystems and my interest in filesystems following that experience, I'm not working on ZFS. Though really, I suppose it's in tune with my other goals for this internship - to try something different. To some degree an internship is to help decide if the work you do there is the sort of work you'd like to do in the long term. I think last summer I reached the conclusion that I enjoy operating systems work - filesystems, kernels and such. I enjoy the problems in those and related areas. So for me this is more about sampling the variety which can be had within that space. And thus far, the Solaris kernel group has been great, and I don't expect that to change.

Posted by csg [Personal] ( June 19, 2007 09:50 PM ) Permalink | Comments[0]