A method to the madness

The dust is still settling from this morning's announcement. Besides being picked up by every tech news site out there, the story has unleashed a blogging storm both on blogs.sun.com and across the rest of the blogosphere. Yes, I'm talking about Sun open sourcing Java (under the GPL, no less).

Not to be left out, let me just say that this is absolutely the right move and will prove to be a win-win-win for Sun, the Java community, and the entire open source community (none of which are mutually exclusive). Personally, I think the biggest gain is summed up in this piece from Jonathan's blog:

The GPL is the same license used to manage the evolution of GNU/Linux - in choosing the GPL, we've opened the door to comingling the communities, and the code itself.

The whole idea of Java grew out of the "write once, run anywhere" mentality. However, when the platform used to run applications is a closed, proprietary system, conflicts of interest crop up everywhere. Granted, the JVM was always free, and the source code was never really secret, but licensing incompatibilities kept some groups (such as many Linux distributions) from bundling Java as part of their systems. Thus there was a wall separating two otherwise completely compatible communities.

Today Sun has taken down that wall. It is now up to the open source community to explore the potential benefits that can be gained.

This blog has been up for less than a week and I've already been hit by my first comment spammer, Mr. "xvsasd"! Is that even possible? I suppose that since all Sun blogs are automatically syndicated to the blogs.sun.com feed, that's how links to my posts are being picked up. I guess I'll be manually moderating my comments for now. *sigh*

A few months back, I took on co-system administration duties for one of our group's most critical data and application servers. It provides several important services, including housing our back-end test harness database, maintaining the Bugzilla database for Sun's Secure Application Switches, and acting as the main execution server for our test automation. Until recently, it was cranking away on a five-year-old system running an ancient version of Red Hat (7.1, I believe). It was in desperate need of an upgrade, so we got hold of an X4200 to replace it.

The X4200 class of servers is certified to run Solaris, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and various editions of Microsoft Windows Server. Flying in the face of logic, we chose to install Debian stable (which involved compiling a custom kernel to support the X4200's SAS disks). There are a lot of advantages to running a Debian system, most notably its superior apt-get package management system. After some initial aggravation setting it up, it has performed with minimal complaints since.

Our situation is somewhat unique, however. We have just one system to support, with only a small set of functions. Most corporations would not choose to buy racks of expensive Sun hardware and then run an unsupported operating system on them, so those who want the benefits of a Debian-based distribution are on their own. It appears, though, that this may soon change. According to reports, Sun and Canonical are ready to announce plans to certify Ubuntu (a Debian derivative) on Sun's x86 servers (including the X4200).

Yet another step in the right direction.

One feature that is sorely lacking in our test automation harness is the ability to share and lock resources between competing groups of tests. Consider this situation: Our group is responsible for testing some of the systems in Sun's X64 family of servers. In particular, much of our focus is on the Sun Blade 8000 (code named Andromeda). Given two indisputable facts:

  1. These systems are expensive, and thus there is not an infinite supply of them for us to test.
  2. There are many different components to one system, some of which can be tested independently.

It is possible (and in fact very likely) that we might try to run competing groups of tests on the same hardware at the same time. The reality is that we've run into this problem a lot more often than we would like, and our current test automation harness provides no mechanism to ensure that separate groups of tests do not collide in this manner.

Thus we are faced with a somewhat classic resource sharing problem, but with a unique twist: some resources (such as an Andromeda chassis) might have sub-resources which can be locked and used independently and without interference. Therefore, a resource can only be locked for use if it and all of its sub-resources are available. Our specific problem details can be described as follows:

  • Resources are structured in a hierarchical format as just described.
  • The resource hierarchy is stored in a centralized SQL database.
  • Each logical group of automated tests defines a set of resources that must be locked before execution can begin.
  • Each group of automated tests is run by an execution daemon. There may be multiple execution daemons running, possibly on different servers, all attempting to lock the same resources in the centralized database before executing their tests.

One of the considerations in coming up with a resource sharing policy for this situation is that we should strive for simplicity over any optimizations made for speed or to maximize concurrent execution. While both of these benefits would be nice, the number one priority is to ensure a consistent, well-known, and repeatable test environment so that the test results are not invalidated by external interference. Any complexity that we add to the design introduces additional possibilities for infrastructure problems, which could create overhead in test maintenance and debugging. Throughput should be a secondary priority, which is why I have enforced the policy that every resource required at any point in the life of a group of tests (even if it is needed only for a short period of time or by only a few test cases) must be locked by an execution group before any tests in that group can begin.
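
To make the hierarchy rule concrete, here is a minimal Python sketch (not the harness's actual code) of a resource tree in which a resource counts as available only if it and every one of its sub-resources is unlocked. The class name `Resource`, its fields, and the example names `chassis`, `blade0`, `blade1`, and `group-A` are all made up for illustration; the real hierarchy lives in the SQL database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One node in the resource hierarchy (hypothetical structure)."""
    name: str
    locker: Optional[str] = None          # name of the lock holder, if any
    children: List["Resource"] = field(default_factory=list)

    def is_available(self) -> bool:
        """True only if this resource and every descendant is unlocked."""
        if self.locker is not None:
            return False
        return all(child.is_available() for child in self.children)


# A hypothetical chassis with two independently testable blades.
blade0 = Resource("blade0")
blade1 = Resource("blade1")
chassis = Resource("chassis", children=[blade0, blade1])

blade1.locker = "group-A"                 # group A is testing one blade
chassis_free = chassis.is_available()     # a sub-resource is busy
blade0_free = blade0.is_available()       # the other blade is untouched
```

With one blade locked, the chassis as a whole is unavailable, while the other blade can still be handed to a group that needs only that blade.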

It turns out that having a hierarchy of resources does not present a significant problem in coming up with a resource sharing policy. As long as resources and their children are properly ordered, per Dijkstra's solution to the infamous dining philosophers problem, and resource locks are always acquired in that same order, the same basic principles apply. Here, then, is my basic locking algorithm, presented as a solution to the problem described above:

  • An execution group requires a set of resources; we'll call it R.
  • For each resource r in R (in alphabetical order):
    • Lock the database tables associated with resources (to avoid race conditions in database access).
    • If r is unlocked AND there is nobody waiting in the queue for r (the queue is maintained in the database), lock it (by assigning your name to the resource as the locker in the database).
      • Recursively lock each child resource of r in alphabetical order.
      • If any children of r are locked, release all children of r (but do NOT release r itself). This will allow any execution groups that require only children of r to execute while we are waiting for resources to be released. After waiting n (10?) minutes, go back to the beginning of the children locking loop and re-attempt to lock the children. This time, do NOT release the child locks if a locked child is encountered; instead, enter the queue for that child and initiate the pseudo-busy waiting process for that resource.
    • Else enter the queue for resource r and initiate the pseudo-busy waiting process for r.
    • Unlock the database tables associated with resources.
  • Once all resources are acquired, execute all tests in the group and then release the locked resources in R.
  • Pseudo-busy waiting process:
    • Sleep for (5?) minutes.
    • Check if the resource is available. If it is, lock it and remove yourself from the queue. Otherwise, go back to the previous step.

As you can see, the algorithm is not built for speed and is decidedly greedy in its locking policy. As I said above, though, speed and concurrency are not the main goals. Reliability, consistency, and simplicity are. There are still a few problems to iron out, but this basic algorithm should ensure that groups of automated tests contending for the same resources do not interfere with one another.
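
For the curious, the core of the algorithm can be sketched in Python. This is a simplified, in-memory illustration under several assumptions, not the real implementation: a `threading.Lock` stands in for locking the database tables, the names `ResourceTable`, `try_acquire`, `acquire_all`, and `release` are hypothetical, and the sketch always releases the children when one of them is busy (it leaves out the second pass, where the child locks are kept and the group queues on the busy child).

```python
import threading
import time
from collections import deque


class ResourceTable:
    """In-memory stand-in for the centralized resource database."""

    def __init__(self, children):
        # children maps every resource name to the list of its child names.
        self.children = children
        self.locker = {name: None for name in children}
        self.queue = {name: deque() for name in children}
        self.table_lock = threading.Lock()  # stands in for the SQL table lock

    def try_acquire(self, name, owner):
        """Try to lock name and its whole subtree; return True on success."""
        with self.table_lock:
            waiting = self.queue[name]
            holder = self.locker[name]
            if holder != owner:
                if holder is not None or (waiting and waiting[0] != owner):
                    if owner not in waiting:
                        waiting.append(owner)  # wait our turn in the queue
                    return False
                if waiting:
                    waiting.popleft()  # we were at the head of the queue
                self.locker[name] = owner
            taken = []
            if self._lock_children(name, owner, taken):
                return True
            # A descendant is busy: give back the children, but keep name.
            for child in taken:
                self.locker[child] = None
            return False

    def _lock_children(self, name, owner, taken):
        """Recursively lock descendants in alphabetical order."""
        for child in sorted(self.children[name]):
            if self.locker[child] not in (None, owner):
                return False
            if self.locker[child] is None:
                self.locker[child] = owner
                taken.append(child)
            if not self._lock_children(child, owner, taken):
                return False
        return True

    def release(self, name, owner):
        """Release name and any descendants held by owner."""
        with self.table_lock:
            self._release_subtree(name, owner)

    def _release_subtree(self, name, owner):
        if self.locker[name] == owner:
            self.locker[name] = None
        for child in self.children[name]:
            self._release_subtree(child, owner)


def acquire_all(table, owner, names, poll_seconds=300):
    """Lock every required resource, in alphabetical order, before testing."""
    for name in sorted(names):
        while not table.try_acquire(name, owner):
            time.sleep(poll_seconds)  # pseudo-busy wait, then retry


# Hypothetical scenario: group-A holds one blade while group-B wants the chassis.
table = ResourceTable({"chassis": ["blade0", "blade1"], "blade0": [], "blade1": []})
acquire_all(table, "group-A", ["blade1"], poll_seconds=0)
first_try = table.try_acquire("chassis", "group-B")    # blade1 is busy
parent_after_fail = table.locker["chassis"]            # parent lock is kept
blade0_after_fail = table.locker["blade0"]             # children are given back
table.release("blade1", "group-A")
second_try = table.try_acquire("chassis", "group-B")   # now the subtree is free
```

In the scenario at the bottom, group-B's first attempt on the chassis fails because one blade is busy; it keeps the chassis lock but gives back the free blade, and succeeds once group-A releases. Children locked as part of a parent skip the queue check here, which is another simplification.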

So I've decided to give it another shot. My last blogging attempt (along with its short-lived corollary, my internal Sun blog) was sustained for a while, but quickly faded after I started working here at Sun Microsystems. After much contemplation, though, I think I'm at a point where I can give it another go.

With this decision made, the first order of business was to decide on the location of my Blog2.0. I did consider reviving owenkellett.info, but ultimately I decided to leave it alone and go a different route for a few main reasons:

  • As an employee of Sun Microsystems, I do feel somewhat compelled to contribute to the corporate blogging effort that Sun, and in particular CEO Jonathan Schwartz, has pioneered.
  • owenkellett.info was started before my first full-time employment and has a very limited overlap with my current tenure at Sun.
  • I felt the need to start fresh but do not yet want to purge my previous blog's identity.

Considering all of this, I was left with an obvious option: launch a new, public, personal blog as part of the blogs.sun.com community. My only apprehension about this prospect was that Sun's blogging site uses Roller as its blogging engine, while I had become particularly attached to WordPress during my owenkellett.info toils. Having previously explored Roller, I had not come away particularly impressed.

It turns out, however, that Roller, while it may be behind in a few features and have less robust plugin support in comparison to WordPress, suffers mostly from its smaller user base and less established development community. Thus while WordPress has a large collection of community-developed themes and plugins, Roller has a much smaller set. In addition, it may just be a matter of personal preference, but I don't like any of the stock Roller themes installed for blogs.sun.com. To be vague, they all feel very "Web1.0" as opposed to belonging to the blogosphere's "Web2.0" style.

So last night I dug through the Roller docs and cracked open a can of CSS to see what Roller can really do. I have to admit that I am a fan of the Java/Hibernate combination which makes up the foundation of the back-end engine. Additionally, the use of the Velocity template language to generate the web GUI front-end is pretty slick and makes for fairly clean, modular, and simple theme and template code. And while I don't claim to be a CSS expert by any means, the result of my efforts is the layout of this site. In my opinion, it looks more "blog-like" than any of the stock templates in the blogs.sun.com library. [Since I do not have a readily accessible machine with Windows installed, though, I still have not been able to test the layout on Internet Explorer. The structure is fairly basic and, to the best of my knowledge, W3C standards compliant.]

In any case, I'm back. Hopefully I can stick with it this time, and this blog will help me keep my perspective on whatever lies ahead.