Who Cares About Software Reliability
Cristian has done some pretty interesting work in software reliability. Check out his paper on Distributed Speculative Execution for Reliability and Fault Tolerance. The paper talks about speculative execution in a grid environment, but with CPU vendors planning CPUs this decade that can run not just 1 or 2 threads but 32, 64, or more in parallel, perhaps we will figure out how to use some of those extra threads to make software more reliable. Sun's upcoming Rock processor, with 16 cores, employs scout threads to increase performance by speculatively executing instructions ahead of the main thread, for instance pre-executing instructions after a branch. Branch prediction and out-of-order execution are already done today to a limited extent on modern processors, but scout threads, by employing separate hardware threads, promise to do so more efficiently than current processors. Maybe in another few years, if we have a 32-core CPU, we will use some of the extra threads not to increase performance but to increase reliability, similar to what Cristian's research does at the cluster level.
This is exactly why, ever since I graduated college, I have been an IEEE member. Be it local chapter meetings, worldwide conferences, or their many publications, my interactions with the IEEE and its members have always sparked my intellectual curiosity. Maybe when my cell phone gets a 32-core processor with scout threads, I can finally stop rebooting it.

Posted by Ravenor on February 23, 2007 at 09:53 AM PST #