It Must Be Time for Tea

Mike Kupfer's Weblog


20091125 Wednesday November 25, 2009

Signed Crypto Gets Its Own Tarball

The OS/Net (ON) component of OpenSolaris has some closed-source code. The binaries for this code (well, the binaries that are redistributable) are made available to non-Sun developers in the form of a compressed tar file, which the build tools incorporate into BFU archives or packages. These closed-bins tarballs also contain binaries for open-source cryptographic code. To satisfy US government regulations, the OpenSolaris cryptography framework requires that certain crypto binaries be signed. Most external developers don't have the necessary key and certificate to sign their binaries, so we provide a working set for them.

This setup has worked okay since the launch of OpenSolaris in 2005, but it's got a couple problems. First, bindrop, the script that puts the crypto binaries into the closed-bins tarball, works off a hard-coded list. As with any manually-maintained list, this introduces a risk that it will not be updated when a new crypto module is added to the system. Second, bindrop gets the crypto binaries by extracting them from the SVR4-format packages that are generated from the ON gate. With the upcoming move to IPS, those packages will go away, and it will be much harder to extract the binaries from the IPS packages that will replace them.

So I'm working on changes to the way we deliver the signed crypto binaries. First, we'll be splitting the crypto out into its own tarball. This gives us more flexibility about when we deliver the crypto binaries. Second, instead of using a hard-coded list, we'll scan the proto area, which is the staging area before files are packaged. Any properly signed binaries will be included in the crypto tarball. If you're interested, CR 6855998 has more details.

The code for this has been written, though it still needs a little more polishing, like making sure that error messages are handled correctly. I'm hoping to get this into build 130, but it might slip into 131.

(2009-11-25 11:28:48.0) Permalink Comments [0]

20091104 Wednesday November 04, 2009

OSCON 2009: FreeBSD

The last "people" talk that I went to at OSCON was Marshall Kirk McKusick's talk "Building and Running an Open-Source Community". I was interested in this talk for a couple reasons. First, I don't know much about how the BSD communities work, and I'm always interested in how large open-source communities do things. Second, I was at Berkeley during the time of the Computer Systems Research Group (CSRG). And while I got to know some of the CSRG staff, I was not directly involved with the development of Berkeley Unix. So I wanted to find out more about what was going on while I was there.

Kirk mentioned that the CSRG started up in the 1970s, after Bill Joy was already at Berkeley. At first, they didn't use a source code control system. Then around 1980 they started using SCCS. There are various reasons for using a source code control system, such as making it easy to review changes if a regression is discovered. For the CSRG, introducing SCCS enabled better productivity for the CSRG staff. Although they still reviewed all the changes that were checked in prior to a release, they could hand off some of the mechanics, such as merging patches and testing, to trusted committers.

This basic structure, with a core team, a group of committers, and a group of developers, is still used today for FreeBSD development. Kirk mentioned a couple details that I thought were interesting. In particular, he said that most developers don't want to be committers. This is usually because they don't want to be involved that much; they just have a change or two that they want to see made. Kirk also mentioned that committers are held to higher standards for things like email etiquette. And all changes must be reviewed by at least one other committer.

The FreeBSD core team is 9 people who are nominated from and elected by the committers every 2 years. They maintain the FreeBSD roadmap, they resolve conflicts between committers, and they admit and remove committers.

Kirk pointed out that people can contribute to FreeBSD in ways other than writing code. They can write documentation, they can do testing, they can do release engineering, and so forth. These people can be committers, too, and there's no relative weighting between code committers and other committers. This also means that these other folks can be elected to the core team. In fact, the latest election brought an advocacy/marketing committer onto the core team. At the time of the talk, there were 390 committers and around 6,000 developers.

FreeBSD does a stable .0 release every couple years. The stable branch has a 5-year lifetime and a binary compatibility guarantee. Minor (dot) releases happen on the stable branch every 6 months or so. Development happens on the trunk, and important bug fixes are merged into the stable branch. They use Subversion and provide a CVS mirror.

The pre-release freeze times vary: for a new stable branch it's about a month, and for a dot release it's about a week. I'm not sure how to compare those times with the times for OpenSolaris. For example, OpenSolaris freezes are gradual: first there's a period where only bug fixes are allowed in (no new features), then there's a period where only fixes for stopper bugs are allowed in. I wish I'd thought to ask Kirk for more details on how FreeBSD manages freezes.

One of the ways the BSD license is different from the GPL or the CDDL is that it lets people make proprietary changes to the code. Kirk said that this does happen, but those changes are usually specific to a particular product. Because they aren't generally interesting, the FreeBSD project probably wouldn't take the changes even if they were offered.

(2009-11-04 15:36:59.0) Permalink Comments [0]

20091010 Saturday October 10, 2009

OSCON 2009: Building Belonging

Ubuntu is widely known for its community-building efforts, so I was very interested to hear what Jono Bacon, the Ubuntu community manager, had to say at OSCON.

Jono suggested that people want to be in a community (any community) because it gives them a sense of belonging. This makes sense to me--we are, after all, social creatures. And Ubuntu seems to do a good job at this. Jono showed a photo from the recent Ubuntu Developer Summit in Prague, in which all the volunteers were highlighted. I didn't have time to count, but it appeared to me that over half the people there were volunteers, not Canonical employees.

So how does a community foster this sense of belonging? Jono had several suggestions. One was to treat all contributions as gifts. That doesn't mean accepting every contribution. But if the contribution can't be accepted for some reason, be constructive and gentle in your feedback.

Jono suggested some ways that structure can help foster belonging. For example, Ubuntu structures everything in teams, even user groups. This makes everything simple and uniform: to get involved, find a team that you want to join. And because the teams are smaller than the community as a whole, it's easier to get started and make connections.

Structure can also be used to encourage socializing. By this Jono didn't mean the structured team-building exercises that one sees in the corporate world. Rather, the idea is to budget time just for chatting, sharing stories, etc. Besides encouraging social bonding, this promotes information exchange that might not happen otherwise.

Jono also noted the importance of the virtual environment. First, he drew an analogy with physical neighborhoods. People are more inclined to hang around someplace that is kept-up, safe, and inviting. Second, the virtual environment can help build belonging by making contributions (and contributors) visible.

Community members can reinforce a sense of belonging by framing questions or issues in a way that leads to progress. Jono's example was to change the thought "this sucks" to "I won't live life this way".

(2009-10-10 11:46:21.0) Permalink

SCM Mounts: Done (Almost)

I've finished the workaround for the sshd privileges issue. I ended up writing a simple setuid C program so that our PAM module could unmount the loopback filesystems. I had been using an RBAC-based approach, but that requires that the user own the mount point for each loopback mount. The more I worked on it, the more failure scenarios I ran into because of that requirement. The setuid approach had none of those issues, and it turned out to be much simpler to code than I had been expecting.

So the changes have been committed to the repository for the SCM infrastructure, and the new bits have been deployed on the backup SCM server. The only thing left is to deploy on the primary SCM server.

Unfortunately, this doesn't mean I'll now have time to finish off the OSCON trip report. Instead, I'll be focusing on a change to the way we deliver crypto binaries to ON developers.

(2009-10-10 11:35:07.0) Permalink

20090925 Friday September 25, 2009

Progress with SCM Mounts

I've been busy implementing a workaround to the sshd privileges issue that I mentioned a couple months ago. Unfortunately, this has meant some delays in my OSCON trip report, but I hope to post the next installment of that sometime in the next week.

Besides implementing the actual workaround, I've been beefing up the scripts that we use for setting up a test environment. Even though we have a backup production system that I could use for testing, it's safer to test a change of this size on a separate system. I sleep a lot better knowing that if I break something during development, I definitely won't inconvenience users.

I've also been learning about the privilege-management facilities that are available in (Open)Solaris. We had some problems finding a concise but sufficiently detailed writeup of how the mount/unmount privileges work. While the information is present in the umount(2) man page, a much clearer explanation is given in the output from "ppriv -lv".

(2009-09-25 14:32:59.0) Permalink

20090905 Saturday September 05, 2009

OSCON 2009: No Unicorns

Between the opensolaris.org site migration, a code review for Darren Moffat [1], and catching a cold, I'm still having to write my OSCON 2009 trip report in dribs and drabs.

The next "people" talk on my list was a talk by Rolf Skyberg of eBay called "There Are No Unicorns: And Other Lessons Learned While Running an Innovation Team". Having worked in a few research settings, but not being a researcher myself, I was hoping the talk would give me a better understanding of how innovation happens. And Rolf did have some things to say here. In particular, he pointed out that innovation is by nature disruptive, which can lead to resistance. So be sure to focus on the cultural changes that will be needed for your innovation to be successful, don't just focus on the technology or specific product.

The bulk of the talk seemed more about navigating corporate politics. Rolf's points included the importance of knowing how your work and your team's work will be evaluated, and how your project fits into the larger organizational picture. Understanding how your project fits in is particularly important when times are tough. If your project isn't in a core area for your organization, and it doesn't have someone high-up to defend it, it's more likely to get cancelled when funding runs low.

Rolf also talked about how traumatic events in a company's history can affect its behavior for a long time. In 1998 eBay was off-line for 3 days, which led to a very strong emphasis in the company on reliability and scalability. But, Rolf argued, the outage also led to an inflexible organization that has a hard time incorporating new innovations.

I've seen this never-again effect in other contexts. When I was working for Xerox, I wanted to look at the source code for Pilot, which was the operating system we were using. I was told I couldn't do that, so I asked why. I was told that sometime in the past, some application code had been written that depended on an interface that was private to Pilot. I doubt that this was accidental usage, because all this stuff was written in Mesa, which made it easy to keep public interfaces separate from private ones. Anyway, in the normal course of development, the Pilot group made some incompatible changes to that interface, which broke the application--right before the application was about to ship. The Pilot group wanted the application programmers to fix their code, even if it meant a schedule slip. But they were overruled by management and forced to revert the interface change. They vowed that they would not let this happen again, and from then on they kept their source squirreled away on private servers, accessible only to people in the group.

I've also seen this effect at Sun. I started at Sun in 1992, not too long after Solaris 2.0 had shipped. Solaris 2.0 introduced a lot of changes that were incompatible with SunOS 4.x. This caused considerable disruption for customers, and I gather that many were not shy in voicing their displeasure. I remember being told that all the pain around Solaris 2.0 was a big factor in the decision not to do any more major (.0) releases of Solaris.

Of course, that policy has born considerable fruit over the years. The ISVs that I've spoken with have praised Solaris because new Solaris releases rarely broke their code. Other platforms that their code ran on tended to cause breakage pretty regularly.

On the other hand, this compatibility doesn't come for free. Project teams must design new interfaces carefully, so that they can evolve in a compatible fashion. The interface review and cross-checking that is done by the ARCs is largely manual. Standard (cross-platform) interfaces are not always stable, so project teams must sometimes do extra engineering to support the new interface without breaking code that depended on the old interface. There is a mechanism for removing old, obsolete code, but again, this requires supporting the old and new interfaces for some time.

Rolf compared companies to creatures when he talked about the effect of traumatic events. That seems to imply some sort of organizational memory of the traumas. But I wonder if that's really how it works. Is it really an organizational memory, or is it more the memory of a few key individuals? If it's the memory of key individuals, then I think it's not until these individuals move on (or relax around the trauma) that the never-again imperative starts to lose its hold.


[1] Darren is removing 92,000 lines of closed-source code--the obsolete smartcard support--from ON.

(2009-09-05 15:39:47.0) Permalink Comments [1]

20090822 Saturday August 22, 2009

Still Reviewing Web Pages

I'm still reviewing web pages in preparation for the migration of opensolaris.org to XWiki. So far I've finished reviewing the ON Developer Reference and the SCM-related pages in the Tools web space.

The only thing left for (my) review is the SCM Migration project web space. Since that project is no longer active, I don't plan to look too carefully. But a sanity check does seem in order, and maybe there are some obsolete pages or attachments that we can delete.

The ON Developer Reference (DevRef) is a particularly tough case for the migration software because of its extensive use of anchored links. I had been planning to retire the XML (Docbook) source that the DevRef currently uses, and keep everything in XWiki markup, but I'm not looking forward to fixing all the cross-references. So I'm having second thoughts about that strategy.

(2009-08-22 16:00:16.0) Permalink Comments [2]

20090816 Sunday August 16, 2009

OpenSolaris.org Moving to XWiki

I'm afraid I haven't made any progress on my OSCON trip report this week. We've started the beta testing for migrating opensolaris.org from the current portal application to XWiki. I've been reviewing the ON Developer Reference, and it's taken more of my time than I had expected. (In fact, I'm still not done.)

If you're a community or project leader, please do take the time to review the pages that you're responsible for. Some issues will be easier to fix if they are identified before the migration. And the migration team needs user feedback to help identify which issues cause the most trouble.

If you just have a question, you can ask on the website-discuss list (at opensolaris.org). If you're sure you've found a bug, either in the migration code or in the new XWiki-based site, go to defect.opensolaris.org and file a bug under Development: product=website, component=site-wiki.

(2009-08-16 11:57:17.0) Permalink

20090807 Friday August 07, 2009

OSCON 2009: Lying and Geniuses

Alas, it's taking me much longer to write up my OSCON trip report than I had planned. Part of that is that I've been busy with other things, like resurrecting a testbed for the SCM framework, so that I can start on the workaround for the "unwanted mounts" issue. And part of it is that it's just taking me longer to do the writing, despite already having an outline. So since I have finished my comments on a couple talks, I'll go ahead and post those now.

My favorite session at OSCON was "How to Lie Like a Geek" by Michael Schwern, which was a lively talk about things we can do to help or hinder communication. Most of us are familiar with technical lying, particularly bad benchmarking and abuse of statistics. Michael recommended shootout.alioth.debian.org as a place to get good benchmarks.

Michael also suggested several ways to "lie" that are more generic. Many of these had to do with getting so caught up in details that the big picture is lost. For example, there's lying by information overload: providing too much detail. There's also lying by pedantry or by being excessively literal: focusing on the wrong details.

A fascinating way to mislead is to state the obvious. For example, consider this exchange (which I've adapted from the example Michael used):

Alice: We should change this code.
Bob: But that could introduce bugs.
Alice: Of course. Why are you arguing against making the change?
Bob (annoyed): I'm not, I'm just saying we might introduce bugs. Don't put words in my mouth.

I've seen exchanges like this in the past. At best, there's this unnecessary hiccup in the conversation. At worst, the conversation goes off into the weeds, with endless rounds of claims and counter-claims about who said what and what was really meant.

When I mentioned this example to my wife, she pointed out that people often make implicit requests in their statements. So if Bob says something obvious, he's clearly not offering new information, so it's not surprising for Alice to interpret his comment as a request. If Bob had a different request in mind, like "let's defer that until after the code freeze", it'd be more helpful to say that in the first place.

Another talk that I liked a lot was "Programmer Insecurity and the Genius Myth", by Ben Collins-Sussman and Brian Fitzpatrick. I had to chuckle at this quote, which Ben and Brian said came from the 2008 Google I/O developer conference:

Can you guys make it possible to create open source projects that start out hidden to the world, then get "revealed" when they're ready?

The opensolaris.org site supports hidden projects. And for awhile we created new projects as hidden, so that the project team could get the project page set up before making the project visible. But hiding the project doesn't provide that much benefit. After all, the project space is a work area; it's okay for things to be rough. And we've had a few problems with projects lingering in hidden mode for a long time. So we've moved towards making projects visible from the start.

Of course, it's natural for people to want to get things right before sharing them with the outside world. But there are potential advantages to sharing early. One is that if you're going down the wrong path, early sharing improves the odds that you'll discover the problem quickly. And if you have to explore a couple dead-ends before you come up with something that works, the information from those explorations is available for others to learn from.

And then there's the "bus factor", which refers to the number of people who would have to all be run over by a bus to stop the project. Working in the open--with archived discussions, publicly visible code, any plans and documentation that you've written down--makes it easier to recover if a team member is unavailable for whatever reason.

Ben and Brian did point out that it is possible to share too early. The project needs to be far enough along that it won't get stalled when outsiders show up and start asking questions and suggesting changes.

But since working in the open like this can be challenging, what are things we can do to encourage it? Ben and Brian made several suggestions, both social and technical.

On the social side, the first thing to remember is not to let your ego get tied up in your design or code. You've probably heard that before, but I think it bears repeating. When someone points out a problem in my design or code, I try to think of it as an opportunity to improve something that I care about, and as an opportunity to learn. But that's often not my initial, automatic response; it takes some effort.

Even if you don't expect your project to benefit from outsiders' comments, engaging in a conversation can have benefits. As Ben and Brian put it, if you want to influence people, you need to be open to influence.

On the technical side, consider what behavior your tools are encouraging. For example, the current opensolaris.org portal doesn't keep page histories, which discourages its use for collaborative writing and editing. That's one of the things that will be fixed by the move to XWiki in September.

Ben and Brian also recommended responding to questions and arguments on the project web site, rather than by email. With that approach, they said, the discussion is less likely to degenerate into pointless argumentation. I don't recall them saying why they think this works. I suppose one reason is that it helps keep arguments from being repeated.

(2009-08-07 11:13:30.0) Permalink Comments [1]

20090729 Wednesday July 29, 2009

OSCON 2009

The O'Reilly Open Source Convention (OSCON) was at the San Jose Convention Center this year. It's been in Portland in the past, and while I like Portland, the additional expenses for air fare and hotel probably would have meant staying home this year. So I'm glad it was in San Jose. We'll see where O'Reilly decides to hold it next year. At the feedback session after the closing keynote, there seemed to be quite a few people who would like the conference to return to Portland.

Lunches on Wednesday and Thursday were catered. At my last OSCON (2006) the conference provided basic box lunches; nothing special. The lunches this year were impressive, with a variety of well-prepared dishes.

I attended several talks, which I've organized into 2 categories: people talks and technology talks. The people talks covered things like communication skills, community-building, etc. The technology talks that I went to covered topics that I wanted to learn more about, either for use at work or for personal projects. I also went to a couple BOFS and the closing keynote session. In order to make the writeups more digestible, I'll cover the BOFS and keynote here, with separate postings over the next few days for the "people" and "technology" sessions.

BOFS

I went to Brian Nitz's SourceJuicer BOF session Wednesday night. Alas, only a few people attended, mostly from Sun.

Brian gave a short presentation and demo. This helped fill in some of the holes in my knowledge of SourceJuicer, like how Pkgfactory and Robo-Porter fit into the picture. (Pkgfactory is an automated mechanism that feeds into SourceJuicer. Robo-Porter is a component of Pkgfactory.)

Individuals can contribute spec files to SourceJuicer, though apparently the tags aren't quite the same as they are for RPM spec files. Builds are done in a freshly-created zone which has a minimal build environment. Thus the spec file must list build dependencies (as well as runtime dependencies).

Thursday night I went to the Silicon Valley OpenSolaris Users Group (SVOSUG) meeting, which was relocated to OSCON for this month. John Weeks demoed a couple toys that he has built and talked about what went into making them. John Plocher also demoed his programmable xylophone. I always find this sort of presentation fascinating, even though I've never built anything similar myself.

Closing Keynote

A few things struck me from closing keynote address, which was given by Jim Zemlin of the Linux Foundation.

The first thing that grabbed my attention was how Microsoft's attitude towards open source and the GPL have changed over the years. Jim had a Microsoft quote from a few years ago about how open source and the GPL are just horrible (a threat to business, if I remember correctly). But this year, Microsoft is releasing some code under the GPL because that's what customers want.

The second thing was Jim's discussion about the introduction of netbooks and the changes that he predicts for the PC business ecosystem. In particular, Jim predicts that wireless service providers will be making discounted PCs available, just as they make discounted cell phones available today. If I understand his argument correctly, end users will focus more on the applications and services that they get from the wireless provider, and less on the underlying operating system. And the platform - hardware plus operating system - providers will be under pressure to make their platforms attractive to the wireless service providers (cheap, good functionality). The resulting competitive pressure should improve the opportunities for operating systems other than Windows (Jim was focusing on Linux, of course).

I suppose this could happen; I don't know the PC business well enough to have an opinion. It'll certainly be interesting to watch. And it'll be interesting to see whether these subsidized netbooks are treated more like computers or like phones. Linux is already bundled on some netbooks, and from what I've read, users can run into problems if they upgrade from someone other than the netbook provider. And Jim mentioned that the Linux Foundation has been getting a lot more phone calls in the last 6 months. Some of them are from people offering kudos, some are like the one he played for us, which was from a guy who needed technical support. Compare that with my cell phone: I know who made the hardware, but I have no idea what OS it's running.

The third item was a discussion about software patents and the danger they present to the open source movement. This was mostly old news to me, but Jim mentioned something I hadn't heard about before: DefensivePublications.org. One of the problems with the current patent system, at least in the USA, is getting information recorded so that it is counted as prior art. Filing defensive patents is a pain, and the US patent office doesn't keep up on the zillions of articles that are published at academic conferences and in trade magazines. You can challenge a patent after it's been granted, but that's a pain, too. But now there's an organization dedicated to collecting technical disclosures and publishing it in a prior art database that the patent office will check. Very cool.

(2009-07-29 17:08:36.0) Permalink

20090721 Tuesday July 21, 2009

Unwanted Mounts

As described in the design document, source code access on opensolaris.org is done via ssh. The user doesn't invoke ssh directly. Rather, the user runs Mercurial (or Subversion), which invokes ssh using its standard processing for ssh URLs. Once connected to the server, a custom restricted shell invokes the server-side program. This is all done in a chroot environment, with loopback mounts providing access to only those repositories that the user has write access to.

The loopback mounts are created when the user logs in, and they are torn down when the source code management (SCM) operation completes. This is done by way of a custom PAM module. As part of the session's "open" processing, the module determines what repositories to grant access to, and it establishes those mount points. As part of the session's "close" processing, it removes those mount points.

We recently noticed that the loopback mounts were not getting unmounted. This causes a couple problems. One is that thousands of unused loopback mounts accumulate on the server. If nothing else, this makes life more difficult for administrators.

The lingering mounts can also lead to a denial of service problem, which we've witnessed a few times. The problem occurs if a repository is deleted and recreated while there is still a loopback mount for it. Future references to the loopback mount will fail with an error. This can interfere with the setup of a user's loopback mounts in a subsequent login, resulting in a situation where users are unable to access recently created repositories. Worse, attempts to unmount the broken loopback mount fail, and lofs doesn't support forced unmount. So the only way to recover is to reboot the server.

After the third or so instance of this, we decided to figure out why the loopback mounts were not getting unmounted. Arguments can be passed to a PAM module by putting them after the module name in /etc/pam.conf, and there's a convention to enable debugging output with the argument "debug", e.g.,

other	session requisite	pam_foo.so.1	debug
    

For this to be useful, syslogd needs to be configured to display the debug output. For example, put

auth.debug	/var/adm/auth.log
    

in syslog.conf and utter

# svcadm restart system/system-log

Once we made these two changes, we could see that the session-open routine was running normally, but it didn't look like the session-close routine was getting invoked.

This seemed awfully strange, so we enabled PAM framework debugging with

# touch /etc/pam_debug

(This, too, requires that syslogd be configured to put auth.debug output somewhere accessible.)

This showed that our session-close routine was, in fact, being invoked.

Looking more closely at the session-closed routine, we noticed that it checks what user it is invoked as. If it's not invoked as uid 0, it bails out, before doing any debug logging. Moving the debug logging to come before the uid check confirmed that it was running as the user whose session was ending.

Some Googling revealed a known issue in OpenSSH (from which the Solaris SSH is derived) in which the session-close routine is called as the session's user, not uid 0.

From the comments in the OpenSSH Bugzilla, it looks like a fix is available from upstream, so we're hopeful that we just need to talk to the Sun SSH team about getting the fix into OpenSolaris. We're also looking into possible workarounds, in case the fix can't be pulled in promptly.

Update 2009-09-16

I filed a bug for this: 6869790.

The current status is that the Solaris SSH team is discussing possible fixes, but they haven't come up with a good approach yet. Just reverting the code isn't an option because it would break support for hardware acceleration. And the upstream privilege separation code is different from the code in Solaris, so they can't just use the upstream patch.

(2009-07-21 14:29:21.0) Permalink Comments [2]

20090213 Friday February 13, 2009

OpenSolaris and gnuserv

I installed OpenSolaris 2008.11 on my notebook (a VAIO TX) several weeks ago. I've been tweaking the environment, in preparation for the day when I move to OpenSolaris on my desktop system.

One of the issues that came up was that gnuserv would exit immediately after being started. This meant that every time I wanted an editor (e.g., for a Mercurial commit), I had to wait for a new XEmacs process to start.

I looked around for some sort of error message but couldn't find anything. I finally started XEmacs using truss -f. Looking at the truss output, I saw that gnuserv was looking in /etc/hosts and not finding an entry with the notebook's hostname ("loiosh").

I added "loiosh" to the localhost (127.0.0.1) line, and that fixed the problem.

(2009-02-13 13:49:37.0) Permalink Comments [4]

20090211 Wednesday February 11, 2009

Mercurial pretxn Hook Race

Currently the ON gate (or at least the open source part) is mirrored on opensolaris.org. We were having a discussion the other day about what needs to be done so that we can actually host it on opensolaris.org.

One of the issues that came up is the race in the Mercurial pre-transaction hooks, such as the pretxnchangegroup hook. These hooks let a repository reject pushes that don't meet whatever criteria that the hook has set. For the ON gate, we use it for things like making sure there is an approved RTI for the changegroup.

The problem is that the implementation of these hooks opens up a race condition. The metadata for the changegroups gets written to the repository, then the pre-transaction hook gets run. The advantage of this approach is that the pre-transaction hooks can use existing APIs and code paths when examining the incoming changegroup. But Mercurial repositories are structured so that readers don't need a lock; instead they depend on an atomic update of the top-level metadata. So the disadvantage of this approach is that there's a window during which someone pulling from the gate could get the pending changegroup, even if the hook later rejects it.

This issue is described in Section 10.3 of Bryan O'Sullivan's Mercurial book; it is also issue 1321 in the Mercurial bug tracker. The workaround that the Mercurial book describes is the one that we used for the ON gate: the repository that people push to is write-only. After the pre-transaction hooks have cleared the changegroup, another hook pushes the changegroup to a second clone repository, which developers pull from.

While this approach is functional, it's not esthetically pleasing. And there's a practical problem: the SCM infrastructure on opensolaris.org doesn't support having two repositories tied together like that. I'm sure it could be done, but administration (e.g., updating the access list) would be clumsy, and it might require giving the ON gatekeepers shell access to the opensolaris.org servers (which would not please them or the server administrators).

Fortunately, Matt Mackall has devised a fix for the race condition. The new changegroup will not be visible for pulls until it has passed the pre-transaction hooks. And if I understand correctly, the fix will not require changes to existing hooks, except for the case of Python hooks that spawn subprocesses..

There are other changes that we will probably make before hosting the gate on opensolaris.org. For example, we'll probably change the SCM console (the web interface for managing repositories) so that it scales better for large numbers of committers. But getting a fix for this race condition means we'll have one less issue to deal with.

(2009-02-11 14:48:57.0) Permalink

20081005 Sunday October 05, 2008

Printing opensolaris.org Pages with Recent Builds

Back in August I upgraded my desktop to snv_95. I sometimes print pages from opensolaris.org to read during my commute, but with snv_95 the pages came out pretty much unreadable. They looked like they had been through several fax transmissions, with blotchy, almost indecipherable characters. At the time I chalked it up to known issues with fonts and went back to running an earlier build (thank you, Live Upgrade).

I revisited the issue last week, after noticing that the headers and footers from Firefox looked okay. It was just the main text that was messed up. I checked my preferences (Content>Fonts&Colors>Advanced)--the checkbox "Allow pages to choose their own fonts" was enabled. I disabled it and tried again, and now the printed pages are legible.

(2008-10-05 14:52:26.0) Permalink

20081001 Wednesday October 01, 2008

Using the MH date2local function

Bill Janssen recently suggested on the MH-E developers list a different format for showing the contents of a folder. Normally MH-E just shows the month and day, e.g., 09/17, for the date. One of Bill's suggestions was that if the message is less than a day old, MH-E could show the time instead, e.g., 13:45.

I've been happy just seeing the date, but I could see how showing the time might be useful. I did have one concern, which was whether MH (which is what MH-E runs on top of) would use the message's timezone or my local timezone. I routinely get email from all over the USA, plus the United Kingdom, China, India, Australia, and Japan. To get an accurate sense of when the emails were sent--for the timestamps to be useful--I'd want to see them all in my timezone.

Unfortunately, some experimentation showed me that Bill's patch used the sender's timezone.

But poking around in the mh-format man page showed a date2local function that would convert the displayed time to my local timezone. Ah, just what I wanted.

I first tried using it in something like

%02(hour(date2local{date})):%02(min(date2local{date}))

but that produced error messages. Looking more closely at the man page showed me that date2local works by side-effect; it doesn't return an updated date string. Okay, so how do you use it? I couldn't figure it out from the man page, and Googling for date2local didn't show any usage examples.

But I did get some Google hits that reminded me of the sample format files that typically ship with MH. After staring at them for a bit, I tried

%(void(date2local{date}))%02(hour{date}):%02(min{date})

and that worked.

(2008-10-01 16:05:22.0) Permalink

Calendar

« November 2009
SunMonTueWedThuFriSat
1
2
3
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today

RSS Feeds

XML
All
/General
/OpenSolaris
/Solaris

Search

Links




Navigation



Referers

Today's Page Hits: 94