One feature that is sorely lacking in our test automation harness is the ability to share and lock resources between competing groups of tests. Consider this situation: Our group is responsible for testing some of the systems in Sun's X64 family of servers. In particular, much of our focus is on the Sun Blade 8000 (code named Andromeda). Given two indisputable facts:
- These systems are expensive, and thus there are not an infinite supply for us to test.
- There are many different components to one system, some of which can be tested independently.
Thus we are faced with a somewhat classic resource sharing problem, but with a unique twist: some resources (such as an Andromeda chassis) might have sub-resources which can be locked and used independently and without interference. Therefore, a resource can only be locked for use if it and all of its sub-resources are available. Our specific problem details can be described as follows:
- Resources are structured in a hierarchical format as just described.
- The resource hierarchy is stored in a centralized SQL database.
- Each logical group of automated tests defines a set of resources that must be locked before execution can begin.
- Each group of automated tests is run by an execution daemon. There may be multiple execution daemons running on possibly different servers which will be attempting to lock the same resources in the centralized database before executing the tests.
It turns out that having a hierarchy of resources does not present a significant problem in coming up with a resource sharing policy. As long as resources and their children are properly ordered per Dijkstra's solution to the infamous dining philosopher's problem, and resource locks are always acquired in the same order, the same basic principles apply. Thus here is my basic locking algorithm presented as a solution to the above described problem:
- An execution group requires a set of resources, we'll call them R
- Foreach resource r in R (in alphabetical order)
- Lock the database tables associated with resources (to avoid race conditions in database access).
- If r is unlocked AND there is nobody waiting in the queue for r (the queue is maintained in the database), lock it (by assigning your name to the resource as the locker in the database).
- Recursively lock each child resource of r in alphabetical order.
- If any children of r are locked, release all children of r (but do NOT release r itself). This will allow any execution groups that require only children of r to execute while we are waiting for resources to be released. After waiting n (10?) minutes, go back to the beginning of the children locking loop and re-attempt to lock the children. This time, do NOT release the child locks if a locked child is encountered, enter into the queue for that child instead and initiate the pseudo-busy waiting process for that resource.
- Else enter into the queue for resource r and initiate the pseudo-busy waiting process for r.
- Unlock the database tables associated with resources
- Once all resources are acquired, execute all tests in the group and then release the locked resources in R
- Pseudo-busy waiting process:
- sleep for (5?) minutes.
- check if the resource is available. If it is, lock it and remove yourself from queue. Otherwise, go back to the previous step.