Tuesday June 14, 2005
Simplified Pseudo-Filesystem Implementation
A common complaint we hear from Linux users who try Solaris for the first time is that our /proc sucks. "Sucks" in this case usually refers to either the fact that you can't cat files under /proc and get text out, or the fact that we don't have things like /proc/pci. The former is a flamewar for another day; right now I want to drill down on the latter.
The idea behind the Solaris /proc is simple: export information about the process model (hence its name). We try not to put other stuff there. This raises the question: "Where does other stuff go?" Some functionality is available from libraries or devices; other functionality might not be available at all. But some functionality may be best suited for a hierarchical namespace like... a filesystem.
To satisfy the need for pseudo-filesystem interfaces and the desire for consistent nomenclature, we've introduced a /system directory in Solaris 10. /system is intended to contain mount points for file systems that export non-process system information. Under /system we currently mount two new filesystems: the contract filesystem (ctfs(7FS), at /system/contract) and the kernel object filesystem (objfs(7FS), at /system/object).
Now that we have a home for this kind of thing, and now that OpenSolaris is open for business, there are many opportunities for the industrious to fortify Solaris with the filesystem conveniences they desire. Of course, just because there's an opportunity doesn't mean it's easy. There's a lot of work involved in interfacing with the parts of the kernel you want to interact with, and a lot of work involved in writing the FS glue. The latter is actually the insidious part; if you've spent any time perusing the source under usr/src/uts/common/fs, you've probably noticed that most of the filesystems are copied and pasted from each other and often reimplement a lot of complex algorithms. I consider this a badge of shame, but the practice dates back to the earliest days of the OS (in other words, there's no one left to pin this badge on).
One of the projects I worked on in Solaris 10 was SMF, for which I was responsible for creating a process tracking mechanism. The result was process contracts, and we decided that the most appropriate interface for process contracts was a filesystem.1
Having spent a lot of time reading filesystem code, I was familiar with the mistakes of the past and for the sanity of my successors was determined not to repeat them. To make a long, uninteresting story short, I ultimately created a library of abstractions called gfs which I used when implementing ctfs. With these abstractions I was able to tuck away a lot of the complexity of implementing common fs entry points such as VOP_READDIR, leaving behind a relatively simple filesystem-specific implementation. When Eric later wrote the objfs filesystem, he spent a lot of time refining the gfs interfaces to be even more streamlined.
The end result is that it is now pretty darn easy to create pseudo filesystems on Solaris. For a rough before/after comparison, check out this behemoth, weighing in at over 70 lines of grungy C code. I hope we never have to write something like it again. The equivalent from ctfs is much cleaner:
static int
ctfs_tdir_do_readdir(vnode_t *vp, struct dirent64 *dp, int *eofp,
    offset_t *offp, offset_t *nextp, void *data)
{
    uint64_t zuniqid;
    ctid_t next;
    ct_type_t *ty = ct_types[gfs_file_index(vp)];

    zuniqid = VTOZ(vp)->zone_uniqid;
    next = contract_type_lookup(ty, zuniqid, *offp);

    if (next == -1) {
        *eofp = 1;
        return (0);
    }

    dp->d_ino = CTFS_INO_CT_DIR(next);
    numtos(next, dp->d_name);
    *offp = next;
    *nextp = next + 1;

    return (0);
}
As you can see, there's little boilerplate. In fact, the function's body is almost completely specific to reading a ctfs template directory2.
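To make the division of labor concrete, here is a rough userland sketch of the pattern, not the actual gfs code. Every name prefixed with toy_ is hypothetical, and the sorted ID array stands in for whatever the real lookup walks. The generic driver owns the loop, the buffer bound, and the EOF bookkeeping; the callback only answers "what is the next entry at or after this offset?", which is the same contract the ctfs function above appears to follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userland stand-ins; these are NOT the real gfs types. */
typedef struct toy_dirent {
    unsigned long d_ino;
    char d_name[32];
} toy_dirent_t;

/* Per-filesystem callback: produce the entry at or after *offp. */
typedef int (*toy_readdir_cb)(toy_dirent_t *dp, int *eofp,
    long *offp, long *nextp, void *data);

/*
 * Generic driver.  Fills at most 'max' entries starting at *offp, advances
 * *offp past the last entry produced, and sets *eofp when the callback
 * reports the end.  Returns the entry count, or -1 on callback error.
 */
static int
toy_readdir(toy_dirent_t *buf, int max, long *offp, int *eofp,
    toy_readdir_cb cb, void *data)
{
    int n = 0;

    *eofp = 0;
    while (n < max) {
        long off = *offp;
        long next = off;

        if (cb(&buf[n], eofp, &off, &next, data) != 0)
            return (-1);
        if (*eofp)
            break;
        *offp = next;
        n++;
    }
    return (n);
}

/* A sparse, ascending set of IDs (assumed model of a contract ID space). */
typedef struct toy_idset {
    const long *ids;
    int nids;
} toy_idset_t;

/* Smallest ID >= start, or -1 if there is none. */
static long
toy_lookup(const toy_idset_t *set, long start)
{
    for (int i = 0; i < set->nids; i++) {
        if (set->ids[i] >= start)
            return (set->ids[i]);
    }
    return (-1);
}

/* The only filesystem-specific part: mirrors the shape of the ctfs code. */
static int
toy_ids_do_readdir(toy_dirent_t *dp, int *eofp, long *offp, long *nextp,
    void *data)
{
    long next = toy_lookup(data, *offp);

    if (next == -1) {
        *eofp = 1;
        return (0);
    }
    dp->d_ino = 1000 + (unsigned long)next;    /* arbitrary inode scheme */
    (void) snprintf(dp->d_name, sizeof (dp->d_name), "%ld", next);
    *offp = next;
    *nextp = next + 1;
    return (0);
}
```

A caller hands toy_readdir() a buffer, a starting offset, and toy_ids_do_readdir; calling it again with the updated offset resumes where the previous call stopped, so the callback never has to know how large the caller's buffer is.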
The next step is obvious: we need to take some time and refactor existing filesystems to use these interfaces wherever possible. As part of my initial putback I also reimplemented parts3 of /proc to use them4. A more thorough eye needs to be turned to /proc to finish the job (and perhaps elevate its low-level gfs usage to complete gfs management5), and things like fdfs have little reason to be ignored.
What have we learned here?
- /system is our home for new pseudo filesystems.
- Copying old code is a bad idea.
- Factoring code is a good idea (and fun, because you usually get to delete the aforementioned old code).
- ctfs and objfs are good examples to follow when writing your spiffy new pseudo filesystem.
- I like footnotes.
We've made other refinements to our file system implementation in Solaris 10. Things like the new file system interfaces (which replace the crusty, fixed-length vnode and vfs definitions with an opaque structure initialized with a variable length parameter list) have done much to improve the sanity of filesystem developers. I encourage you to explore our filesystem code; perhaps you'll find a new class of improvements we should make (or even make them yourself!).
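For a feel of what "an opaque structure initialized with a variable length parameter list" can look like, here is a small userland sketch. It is an assumed model, not the Solaris interface: all toy_ and myfs_ names are hypothetical, and the NULL-terminated template table is one plausible reading of "variable length". The point is that a filesystem names only the operations it implements, and everything else gets a default, so adding an operation to the table later does not break existing filesystems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_op_fn)(void *node);

/* One template entry: an operation name and its implementation. */
typedef struct toy_op_def {
    const char *name;
    toy_op_fn func;
} toy_op_def_t;

enum { TOY_OP_OPEN, TOY_OP_CLOSE, TOY_OP_READDIR, TOY_OP_COUNT };

static const char *const toy_op_names[TOY_OP_COUNT] = {
    "open", "close", "readdir"
};

/*
 * Opaque in spirit: callers should go through toy_make_ops() and
 * toy_op_call().  The layout is visible only to keep the sketch in one file.
 */
typedef struct toy_ops {
    toy_op_fn op[TOY_OP_COUNT];
} toy_ops_t;

#define TOY_ENOTSUP 48    /* arbitrary "not supported" code for this sketch */

/* Default used for every operation the template leaves out. */
static int
toy_op_default(void *node)
{
    (void) node;
    return (TOY_ENOTSUP);
}

/*
 * Build an ops vector from a template terminated by a NULL name.
 * Returns 0 on success, -1 if the template names an unknown operation.
 */
static int
toy_make_ops(const toy_op_def_t *tmpl, toy_ops_t *ops)
{
    int i;

    for (i = 0; i < TOY_OP_COUNT; i++)
        ops->op[i] = toy_op_default;

    for (; tmpl->name != NULL; tmpl++) {
        for (i = 0; i < TOY_OP_COUNT; i++) {
            if (strcmp(tmpl->name, toy_op_names[i]) == 0) {
                ops->op[i] = tmpl->func;
                break;
            }
        }
        if (i == TOY_OP_COUNT)
            return (-1);
    }
    return (0);
}

/* Dispatch through the ops vector. */
static int
toy_op_call(const toy_ops_t *ops, int which, void *node)
{
    return (ops->op[which](node));
}

/* A filesystem that implements only "open". */
static int
myfs_open(void *node)
{
    (void) node;
    return (0);
}

static const toy_op_def_t myfs_tmpl[] = {
    { "open", myfs_open },
    { NULL, NULL }
};
```

With this shape, myfs gets a working "open" and a sensible failure for "close" and "readdir" without mentioning either, which is the property a fixed-length structure cannot offer.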
Footnotes:
1I'm not sure this was the wisest choice, but it seems to have worked out all right.
2"But what is a template directory?" you may be asking yourself. I'll elaborate on contracts another day.
3At the time I was already modifying a bajillion popular kernel files, and wasn't eager to add more to the list. To avoid unnecessarily complicating my merge process, I only rewrote those few functions which didn't require coordinated changes in other files.
4I had already added a bunch of code for some contract-related entries in /proc, and wanted to "restore balance to the force" by removing an equivalent amount of code. Perhaps surprisingly, we tend to get more excited about an opportunity to delete code than an opportunity to add more...
5See the comment at the top of gfs.c.
(2005-06-14 16:15:00.0)
Alas, this comes a week late.
Dimas demos Looking Glass
Charlie talks about NetBeans
Eric talks to some new CS students
FISL continued to be a fantastic experience. While on the first day (ignoring our formal morning session) we pretty much just hung around and talked with whomever we could lure to our booth with the promise of free t-shirts, DVDs, and candy1 (not to mention software), we tried to make the second and following days a little more structured. There is a lot of cool stuff in Solaris, and we wanted to make sure that some of the lesser known technology saw the light of day. Though we quickly discovered we could have spent the entire conference demoing Project Looking Glass and DTrace, or chatting about OpenSolaris, we felt obligated to try to break the popularity feedback loop.
We broke the last three days down into hour-long segments, and assigned topics to each. For each topic we had2 a short presentation we expected to consume only about 15 minutes, and left the rest of the time open for questions and discussion. This of course varied from subject to subject (e.g. DTrace was demo-heavy, OpenSolaris was more interactive), but the idea usually held. We posted the next two hours of topics on a sign outside our booth and waited. Sometimes people would show up, sometimes not. While it may have been the case that the Fault Management Architecture and the Service Management Facility topics simply were poorly scheduled against some of the larger conference presentations, I personally think they suffered most from the lack of sexy names (as an engineer, I have a subconscious aversion to anything having to do with "Management"). Certainly the people who attended each found them interesting enough to stay through our presentations and ask a lot of good questions; perhaps all it would have taken to draw a larger initial crowd was some spontaneous marketeering. We'll have to try "The Hardware Bouncer" and "SMF: Booty Guru" next year.
The other topics were more successful. Talking about OpenSolaris was, of course, a lot of fun. Since I didn't get too many questions on the development mechanics part of my pre-session talk, I left that part out of my booth talk. I don't think it was missed (which is a little disappointing, but also understandable -- it's hard enough for a native English speaker to follow the volume of terminology and acronyms introduced). My only awkward moment during the conference had to be when, in my first OpenSolaris booth talk, I tried to illustrate the possibilities of what the community could do with OpenSolaris that Sun would not. Needing an appliance to use as an example, I quickly chose TiVo. While very recognizable in the US, I can't imagine they exist at all in Brazil. Upon seeing a sea of blank faces, I quickly backpedalled to the generic "personal video recorder" which, judging by the unchanged gaze of the crowd, was still a lost cause. Then -- and I have no idea how I got there -- I said "or a car". Few people design software for their cars, and those who do so for a living apparently have a hard time of it. That's when I decided to cut my losses and move on to my next bullet. The following day I ditched the appliance idea entirely and instead used the example of a MIPS port, which was easily understood by all.
Our booth was open and inviting, and spacious enough to allow multiple large simultaneous conversations. Unfortunately, this meant it was difficult for one unmiked person to fill the space, especially with the high ceiling catching a lot of the ambient noise in the hall. Miraculously we all seemed to escape with our voices intact. We had a well-stocked fridge in our booth; after some experimentation I found that soda was much more effective at soothing my throat than plain water. Much to my chagrin, the local staff quickly (after only my second can) dubbed me "Coke Boy". It was certainly a misplaced nickname, as I very seldom drink soda back in the States3. That said, I found the taste of the local soda Guaraná very appealing; it's a shame it isn't common in the US (assuming you can get it at all -- something I need to check the next time I'm at BevMo).
CodeBreakers and Solaris
The high point of the conference was, without a doubt, the people we met. Going to an open source conference as emissaries from a not-yet-open-source operating system, I was concerned we would be facing a surly welcome. To the contrary, almost everyone I talked to seemed more focused on technology than dogma. A few were justifiably skeptical4, but most were eager to see what we were about, if not install Solaris as quickly as they could. It is almost inconceivable that the original plan was to bring 10 DVDs; we went through enough of the couple hundred we brought on the first day that we had to carefully ration the remains5. Perhaps the most enthusiastic were Thiago, Iru, and others from CodeBreakers, who are already working to form an OpenSolaris user group in Brazil. Hopefully we'll be able to make it to CONISLI in November to see them again, and the group they have started.
As much fun as the conference was, it was exhausting. We all slept extremely well the last evening.
Footnotes:
1As Eric quickly discovered, the metallic wrappers around the chocolate candies would make a silvery mess of anyone's hands who ate enough of them. This was not intentional.
2Often written the night before. Or sometimes just a few hours before. Interestingly, the more impromptu the materials, the better they seemed to be. It shouldn't be too surprising; this let us tailor the content to our audience, which is always a recipe for more effective communication.
3 I suppose it's better than "Crack Baby".
4 While Sun has a long history of contributing to open source, Solaris is unlike anything we, or anyone else, have done before.
5 Which for me meant: give one to anyone who came up to me and asked.
(2005-06-14 02:00:00.0)