A method to the madness

Our group has just finished migrating (as of today) from Perforce to Subversion for source control of all of our automation test scripts and harness code. The decision was driven more by political and economic reasons than technical ones, but personally I do feel that for our specific needs, Subversion is a better overall choice. The advantages that we gain include:

  • Cost. Perforce costs roughly $200 per user per year. With our group holding upwards of 30 licenses, that comes to roughly $6,000 or more per year, a non-trivial cost. Subversion is open source and free. The benefit here is clear.
  • Easier code sharing and collaboration between us and other internal Sun groups. Seeing as many of our tests can be used by other QA groups within Sun testing similar or identical products, Perforce licensing hurdles have started to block some of our collaboration initiatives.
  • More flexibility when editing checked out code. One of my personal pet peeves with Perforce is its requirement that you explicitly notify the repository that you are going to modify a file before it allows you to do so (i.e., "p4 edit"). Not only is this a minor annoyance while editing files, it also makes it difficult to work on checked out code when you do not have direct network access to the repository. I do realize that there is a workaround for this problem, and I also realize the benefits of notifying the repository when code is being edited. In general, though, it makes even simple tasks in Perforce cumbersome and somewhat unintuitive.
  • Consistency between us and Andromeda/Galaxy developers as well as other Sun QA groups. In general, it seems that Subversion is the source control system of choice for many of the groups with which we are directly involved. Using the same system only makes sense.

As far as disadvantages go, the one main pitfall that has been touted in comparisons between Subversion and Perforce is that Subversion does not have automatic branch tracking. Thus fundamental software engineering techniques such as maintaining multiple branches of code and tracking and merging changes between branches are considerably more manual and error-prone in Subversion. However, since we, as a QA group, use merging features considerably less than a software development group would, I would contend that this negative has minimal impact for our purposes.

The move to Subversion is therefore a welcome one. With it, though, comes the minor overhead of adopting slightly different techniques when working with a Subversion repository as opposed to a Perforce depot. For the benefit of all, here are some basic tasks that many members of our group are used to doing with Perforce, along with their Subversion equivalents [Note these are guidelines for the Unix-based clients. I stay away from Windows as much as possible.]:

  • Checkout code from repository:

    Perforce: The process of setting up a local workspace in Perforce always seemed overly complex to me. You had to set multiple environment variables (usually P4USER, P4CLIENT, P4PORT, but others are configurable) to configure the repository location and user credentials. You then needed to set up a client workspace, mapping depot directories to local directories. After this process, "p4 sync" would check out a copy of the desired workspace for you.

    Subversion: Simpler.
    svn checkout --username [USERNAME] http://repository/location/and/desired/subdirectory [local directory]

  • Update your local copy to the latest revision in the repository:

    Perforce:
    p4 sync

    Subversion: One step process.
    svn update

  • Make changes to a file in the repository:

    Perforce: Two step process.
    p4 edit [filename]
    Modify the file as desired

    Subversion: One step process.
    Modify the file as desired

  • Add a new file to the repository:

    Perforce:
    p4 add [filename]

    Subversion:
    svn add [filename]

  • Delete a file from the repository:

    Perforce:
    p4 delete [filename]

    Subversion:
    svn delete [filename]

  • Undo changes to a file in your local copy:

    Perforce:
    p4 revert [filename]

    Subversion:
    svn revert [filename]

  • Check the status of modified, added, deleted, etc. files in your local copy:

    Perforce:
    p4 opened

    Subversion:
    svn status

  • Compare a modified local copy of a file to the original checked out copy:

    Perforce:
    p4 diff [filename]

    Subversion:
    svn diff [filename]

  • Commit changes from your local copy into the repository:

    Perforce:
    p4 submit
    An editor window using the program specified by the environment variable P4EDITOR will open for you to enter a description of the changes.

    Subversion:
    svn commit
    An editor window using the program specified by the environment variable SVN_EDITOR will open for you to enter a description of the changes. Alternatively, you can bypass the editor and enter the description inline:
    svn commit -m "This is a description of the changes"

  • Create a branch:

    Perforce:
    p4 integrate [source_tree] [dest_tree]
    Perforce automatically keeps track of branched files. Therefore, merging changes between branches is more automated.

    Subversion:
    svn copy [source_tree] [dest_tree]
    As far as SVN is concerned, once a tree is branched, the two trees are completely separate and independent copies. After branching a tree, it is good practice to make note of it in the revision log so that the revision number can be found later when changes need to be merged with other branches.

  • Move a file:

    Perforce: Two steps.
    p4 integrate [source file] [dest file]
    p4 delete [source file]
    In order to move a file in the Perforce depot, the source file must be branched and then deleted. This creates a somewhat counterintuitive revision history since instead of associating the entire revision history log with the new file, it simply points to the revision history of the originally branched file and maintains all history from that point forward.

    Subversion: One step.
    svn move [source file] [dest file]
    Under the hood, this performs the same copy and delete actions as the Perforce process. However, with SVN the entire revision history is copied to the new file since it does not maintain automatic branch information. For simple file moves, this is the more desirable behavior.

  • Resolve conflicts between your local changes and others' conflicting changes:
    If others have changed certain files since you last updated your local copy, any conflicting changes that you have made to those files need to be resolved.

    Perforce: Semi-automated process.
    p4 resolve [file]
    Perforce will walk you through all of the conflicts between your changes and the others' changes and ask you how you would like to merge the changes. There is also the option to resolve the conflicts manually.

    Subversion: Completely manual process.
    After conflicts have been discovered (either when trying to do an svn commit or after an svn update), Subversion leaves three copies of the file in your local tree: filename.mine, filename.rOLDREV, and filename.rNEWREV, where OLDREV and NEWREV are revision numbers. These represent your new copy of the file, the original checked out copy of the file, and the repository's current copy of the file, respectively. At this point, it is up to the user to examine the differences between the three files and decide how to merge the conflicts together. After the file is updated to the user's liking, the following command must be run:
    svn resolved [filename]
    At that point, the newly resolved file can be committed to the repository.
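
Since the Subversion process is manual, it is easy to run "svn resolved" while a conflict marker is still sitting in the file. Below is a minimal Python sketch of a pre-check, assuming the conflicted file contains the usual markers that Subversion writes into text files ("<<<<<<< .mine", "=======", ">>>>>>> .rNEWREV"). The function name leftover_markers and the sample file harness.cfg are made up for illustration; they are not part of Subversion.

```python
import re
import tempfile
from pathlib import Path

# Markers Subversion typically writes into a conflicted text file:
#   <<<<<<< .mine
#   =======
#   >>>>>>> .r42
MARKER = re.compile(r"^(<{7} |={7}$|>{7} )")


def leftover_markers(path):
    """Return (line_number, text) for every conflict marker still in the file."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            if MARKER.match(text):
                hits.append((number, text))
    return hits


# Demo on a throwaway file (hypothetical contents) that is still conflicted.
with tempfile.TemporaryDirectory() as workdir:
    conflicted = Path(workdir) / "harness.cfg"
    conflicted.write_text(
        "timeout = 30\n"
        "<<<<<<< .mine\n"
        "retries = 5\n"
        "=======\n"
        "retries = 3\n"
        ">>>>>>> .r42\n",
        encoding="utf-8",
    )
    found = leftover_markers(conflicted)
    for number, text in found:
        print(f"line {number}: {text}")
```

If the list comes back empty, the file is probably safe to mark as resolved. The check is deliberately crude: a line consisting of exactly seven "=" characters (a heading underline, say) would be reported as a false positive.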

There is a lot more that you can do with both of these source control systems, but for basic needs, this list is fairly comprehensive. An excellent online book that documents the complete feature set of Subversion can be found here: http://svnbook.red-bean.com/ and Perforce docs can be found via http://www.perforce.com

OpenJDK

The dust is still settling from this morning's announcement. Aside from the story being picked up by every tech news site out there, it also unleashed a blogging storm on both blogs.sun.com and the rest of the blogosphere. Yes, I'm talking about Sun open sourcing Java (under the GPL no less).

Not to be left out, let me just say that this is absolutely the right move and will prove to be a win-win-win for Sun, the Java community, and the entire open source community (none of which are mutually exclusive). Personally, I think the biggest gain is summed up in this piece from Jonathan's blog:

"The GPL is the same license used to manage the evolution of GNU/Linux - in choosing the GPL, we've opened the door to comingling the communities, and the code itself."

The whole idea of Java grew out of the "write once, run anywhere" mentality. However, when the platform used to run applications is a closed, proprietary system, conflicts of interest crop up everywhere. Granted, the JVM was always free, and the source code was never really secret, but licensing incompatibilities restricted some groups (such as many Linux distributions) from bundling Java as part of their systems. Thus there was a wall separating two otherwise completely compatible communities.

Today Sun has taken down that wall. It is now up to the open source community to explore the potential benefits that can be gained.

This blog has been up for less than a week and I've already been hit by my first comment spammer, Mr. "xvsasd"! Is that even possible? I suppose that since all Sun blogs are automatically syndicated to the blogs.sun.com feed, that's how links to my posts are being picked up. I guess I'll be manually moderating my comments for now. *sigh*

A few months back, I took on co-system administration duties for one of our group's most critical data and application servers. It provides several important services, including housing our back-end test harness database, maintaining the Bugzilla database for Sun's Secure Application Switches, and acting as the main execution server for our test automation. Until recently, it was cranking away on a five-year-old system running an ancient version of Red Hat (7.1, I believe). In desperate need of an upgrade, we got a hold of an X4200 to replace it.

The X4200 class of servers is certified to run Solaris, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and various editions of Microsoft Windows Server. Flying in the face of logic, we chose to install Debian stable (along with the custom kernel compiling that was needed to support the X4200's SAS disks). There are a lot of advantages to running a Debian system, most notably its superior apt-get package management system. After some initial aggravation setting it up, it has performed with minimal complaints since.

Our situation is somewhat unique, however. We have just one system to support with only a small set of functions. Most corporations, though, would not choose to buy racks of expensive Sun hardware and then run an unsupported operating system on them. Those who want to take advantage of the benefits of a Debian-based distribution are on their own. It appears, though, that this may soon change. According to reports, Sun and Canonical are ready to announce plans to certify Ubuntu (a Debian derivative) on Sun's x86 servers (including the X4200).

Yet another step in the right direction.

One feature that is sorely lacking in our test automation harness is the ability to share and lock resources between competing groups of tests. Consider this situation: Our group is responsible for testing some of the systems in Sun's X64 family of servers. In particular, much of our focus is on the Sun Blade 8000 (code named Andromeda). Given two indisputable facts:

  1. These systems are expensive, and thus there is not an infinite supply of them for us to test.
  2. There are many different components to one system, some of which can be tested independently.

It is possible (and in fact very likely) that we might try to run competing groups of tests on the same hardware at the same time. The reality is that we've run into this problem a lot more than we care for, and our current test automation harness provides no mechanism to ensure that separate groups of tests do not collide in this manner.

Thus we are faced with a somewhat classic resource sharing problem, but with a unique twist: some resources (such as an Andromeda chassis) might have sub-resources which can be locked and used independently and without interference. Therefore, a resource can only be locked for use if it and all of its sub-resources are available. Our specific problem details can be described as follows:

  • Resources are structured in a hierarchical format as just described.
  • The resource hierarchy is stored in a centralized SQL database.
  • Each logical group of automated tests defines a set of resources that must be locked before execution can begin.
  • Each group of automated tests is run by an execution daemon. There may be multiple execution daemons running on possibly different servers which will be attempting to lock the same resources in the centralized database before executing the tests.

One of the considerations in coming up with a resource sharing policy for this situation is that we should strive for simplicity over any optimizations made for speed or to maximize concurrent execution. While both of these benefits would be nice, the number one priority is to ensure a consistent, well-known, and repeatable test environment so that the test results are not invalidated by external interference. Any complexity that we add to the design introduces additional possibilities for infrastructure problems, which could create overhead in test maintenance and debugging. Throughput should be a secondary priority, which is why I have enforced the policy that every resource required at any point during the life of a group of tests (even if it is only needed for a short period of time or by only a few test cases) must be locked by the execution group before any tests in that group can begin.
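
To make the rule about sub-resources concrete, here is a small Python sketch of an availability check over a resource tree. It is an illustration only: the Resource class, the function names, and the resource names (chassis, blade0, blade1) are assumptions of mine, and the real hierarchy lives in the SQL database rather than in memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """A node in the resource hierarchy (all names here are hypothetical)."""
    name: str
    children: List["Resource"] = field(default_factory=list)
    locked_by: Optional[str] = None  # execution group currently holding it


def is_available(resource: Resource) -> bool:
    """True only if the resource and every sub-resource is unlocked."""
    if resource.locked_by is not None:
        return False
    return all(is_available(child) for child in resource.children)


def lock_tree(resource: Resource, owner: str) -> bool:
    """Lock the resource and all of its sub-resources, or lock nothing."""
    if not is_available(resource):
        return False
    resource.locked_by = owner
    for child in sorted(resource.children, key=lambda c: c.name):
        lock_tree(child, owner)  # cannot fail: availability was checked above
    return True


blade0 = Resource("blade0")
blade1 = Resource("blade1")
chassis = Resource("chassis", children=[blade0, blade1])

# One group takes a single blade, so the chassis as a whole is unavailable.
assert lock_tree(blade0, "blade-tests")
assert not lock_tree(chassis, "chassis-tests")
# The other blade is independent and can still be locked on its own.
assert lock_tree(blade1, "other-blade-tests")
```

This sketch refuses the chassis lock outright when a blade is busy. It says nothing yet about waiting or queueing, and it ignores concurrent access to the tree.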

It turns out that having a hierarchy of resources does not present a significant problem in coming up with a resource sharing policy. As long as resources and their children are properly ordered per Dijkstra's solution to the infamous dining philosophers problem, and resource locks are always acquired in the same order, the same basic principles apply. Thus here is my basic locking algorithm, presented as a solution to the problem described above:

  • An execution group requires a set of resources; we'll call this set R.
  • For each resource r in R (in alphabetical order):
    • Lock the database tables associated with resources (to avoid race conditions in database access).
    • If r is unlocked AND there is nobody waiting in the queue for r (the queue is maintained in the database), lock it (by assigning your name to the resource as the locker in the database).
      • Recursively lock each child resource of r in alphabetical order.
      • If any children of r are locked, release all children of r (but do NOT release r itself). This will allow any execution groups that require only children of r to execute while we are waiting for resources to be released. After waiting n (10?) minutes, go back to the beginning of the children locking loop and re-attempt to lock the children. This time, do NOT release the child locks if a locked child is encountered; instead, enter the queue for that child and initiate the pseudo-busy waiting process for that resource.
    • Else enter into the queue for resource r and initiate the pseudo-busy waiting process for r.
    • Unlock the database tables associated with resources
  • Once all resources are acquired, execute all tests in the group and then release the locked resources in R
  • Pseudo-busy waiting process:
    • sleep for (5?) minutes.
    • check if the resource is available. If it is, lock it and remove yourself from queue. Otherwise, go back to the previous step.

As you can see, the algorithm is not built for speed and is decidedly greedy in its locking policy. As I said above, though, speed and concurrency are not the main goals. Reliability, consistency, and simplicity are. There are still a few problems to iron out, but this basic algorithm should ensure that groups of automated tests contending for the same resources do not interfere.
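
The core of the algorithm (ordered acquisition plus a wait queue per resource) can be sketched in a few lines of Python. This is a simplified sketch under several assumptions, not the harness code: an in-memory LockTable guarded by a threading.Lock stands in for the locked SQL tables, threads stand in for execution daemons, the queue is served first come, first served, the handling of child resources is left out, and the polling interval is a fraction of a second rather than minutes. The guard is released before sleeping so that other groups can reach the table while one is waiting. All class, function, and resource names are hypothetical.

```python
import threading
import time
from collections import deque


class LockTable:
    """In-memory stand-in for the centralized resource tables."""

    def __init__(self, names):
        self._guard = threading.Lock()  # plays the role of the table lock
        self._owner = {name: None for name in names}
        self._queue = {name: deque() for name in names}

    def try_lock(self, name, group):
        """Take `name` if it is free and nobody is queued ahead of `group`."""
        with self._guard:
            queue = self._queue[name]
            first_in_line = not queue or queue[0] == group
            if self._owner[name] is None and first_in_line:
                self._owner[name] = group
                if queue:
                    queue.popleft()  # the head can only be `group` here
                return True
            if group not in queue:
                queue.append(group)  # wait our turn
            return False

    def release(self, name, group):
        with self._guard:
            if self._owner[name] == group:
                self._owner[name] = None

    def owner(self, name):
        with self._guard:
            return self._owner[name]


def acquire_all(table, names, group, poll_seconds=0.01):
    """Lock every resource in alphabetical order, polling while busy."""
    for name in sorted(names):
        while not table.try_lock(name, group):
            time.sleep(poll_seconds)  # the pseudo-busy wait


def run_group(table, names, group, log):
    acquire_all(table, names, group)
    try:
        log.append(group)  # stands in for executing the tests
    finally:
        for name in names:
            table.release(name, group)


table = LockTable(["chassis-a", "chassis-b"])
log = []
# The two groups list their resources in opposite orders on purpose.
threads = [
    threading.Thread(target=run_group,
                     args=(table, ["chassis-a", "chassis-b"], "group-1", log)),
    threading.Thread(target=run_group,
                     args=(table, ["chassis-b", "chassis-a"], "group-2", log)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join(timeout=5)
print(sorted(log))
```

Because both groups sort their resource names before locking, neither can end up holding one chassis while waiting on the other, which is the property the alphabetical ordering is there to provide.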