Monday Feb 11, 2008

What is Predictive Self Healing?

Ten years ago Sun learned some hard lessons in scaling up Solaris to run on systems with large numbers of CPUs and lots of RAM. For example, if you have a system with 16 CPUs, your odds of experiencing a CPU failure event go up by roughly a factor of 16. In order for these large machines to be viable, we had to learn to produce a software layer that could survive a CPU or memory failure and protect the applications that were running. We call that technology Predictive Self Healing, and it’s one of the key features we’re adding to xVM Server. While a few years ago a system with more than two CPUs was exclusively the realm of high-end UNIX hardware, it’s now easy to find off-the-shelf x86 hardware with 16 CPU cores and a quarter terabyte of RAM. The number of components that can fail in a system like that is huge, and with server consolidation, a single failure could crash ten or even twenty operating systems. xVM Server can protect guest operating systems (even Linux and Windows) from various classes of hardware faults, even on hardware not built by Sun.
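
The factor-of-16 figure can be checked with a little arithmetic. This is only a rough sketch: it assumes every CPU fails independently with the same small probability over some period, and the value of p used here is made up for illustration, not a measured failure rate.

```python
def failure_probability(per_cpu_probability: float, cpu_count: int) -> float:
    """Probability that at least one of cpu_count CPUs fails.

    Assumes failures are independent and equally likely per CPU.
    """
    return 1.0 - (1.0 - per_cpu_probability) ** cpu_count


# Hypothetical per-CPU failure probability over some period (illustrative only).
p = 0.001

single = failure_probability(p, 1)
sixteen = failure_probability(p, 16)
ratio = sixteen / single

print(f"1 CPU:   {single:.6f}")
print(f"16 CPUs: {sixteen:.6f}")
print(f"ratio:   {ratio:.2f}")
```

Under these assumptions the ratio comes out just under 16, because the chance of at least one failure is 1 - (1 - p)^n, which is close to n times p when p is small.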

Comments:

We are doing a lot more than that now. We can diagnose IO errors. Memory page retire can off-line memory. I have an operating system comparison sitting in my inbox. It is clear that Sun leads the market in this technology as far as the OS is concerned.

I run the San Diego open OS Forum and it is very clear that this technology interests the systems admins and it helps them sleep better at night. Sun needs to talk about this more aggressively.

I think that we met at the EBC presenting Sun's Vision Class. How are you?

Posted by Foz Saeed on February 11, 2008 at 10:29 AM PST
