jdh's blog
theme and variations
Thursday Feb 22, 2007
working remotely: Prague Paradise
This past year, several reasons for travelling to Prague on business happened to coincide, and I decided to take advantage of them. But instead of just doing a business trip, I proposed to my management that I work out of the Sun office in Prague for a short time while also mixing in vacation time, so I could both experience the city and make a connection with Sun folks at a non-U.S. site. My management was very obliging, and before I knew it, my Prague plan panned out.
Posted at 03:47PM Feb 22, 2007 by Julia Harper in Sun | Comments[0]
Friday Dec 08, 2006
Complexity: basic system monitoring
The other day I had my second contact with a Real Customer, and I learned something. (Come to think of it, I learned something at my first meeting as well. This is a trend I'm trying hard to extend by insinuating myself into meetings with customers.) We have a bunch of really cool systems at Sun -- across a wide range, from high end to low end, covering both SPARC and x64 platforms. Some customers buy lots and lots of them. And then want to manage them.
Posted at 07:36PM Dec 08, 2006 by Julia Harper in Sun | Comments[1]
Saturday Nov 11, 2006
Complexity and Completeness: FMA
Complexity brings joy into the life of an engineer: it is so satisfying to find all the nooks and crannies of a problem and come up with a solution that covers them all. No, wait. Complexity is the nemesis of the engineer: it is so hard to be satisfied with an 80% solution. The desire to provide a complete solution is almost unbearable, beyond all reasonable expectations of the company or customer. Must an engineer resort to damned statistics to prove that an inelegant or incomplete solution is sufficient, and a more efficient use of time? Sufficient for whom? For the consumer, of course. Being associated with a less than perfect solution repulses me, yet I am the very customer that would never pay what it costs for perfection.

One of my favorite examples of the agony and ecstasy of complexity is FMA (Fault Management Architecture - see Mike Shapiro's blog). FMA is hard.[1] I've looked at the blog entry by past Sun luminary Andy Rudoff, in which he provides a summary of the concepts of FMA. The concepts are so pure and beautiful. And simple!

But it all starts with fault trees. One must explain every fault that could occur, and every symptom (error) it might produce. Then in the middle there are all the timing issues - how long will related errors take to show up? And will they show up at all, given that the paths for communication are not perfect? And at the end is the problem of how to isolate the fault. Who are all the constituents who will be affected, and how do we guarantee there is no race condition between multiple actors?

Right now we take an all-or-nothing approach to a vertical segment of faults -- all CPU faults, for example. Essentially we diagnose a complete subtree of faults, but ignore faults caused by a component closer to the root of the fault tree (maybe the fault is really a power supply problem that affects all components in the box). This is how we make the problem tractable.
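To make the subtree idea concrete, here is a toy sketch in Python. It is not how FMA is actually implemented; the `FaultNode` class, the `diagnose` function, and every fault and error name in it are made up for illustration. It only shows what "diagnose a complete subtree, ignore what sits closer to the root" looks like when the same error can come from two places.

```python
from dataclasses import dataclass, field


@dataclass
class FaultNode:
    """One node in a toy fault tree (hypothetical, not an FMA data structure)."""
    name: str
    symptoms: set = field(default_factory=set)    # errors this fault can produce
    children: list = field(default_factory=list)  # faults further from the root


def subtree(node):
    """Yield the node and all of its descendants, parents before children."""
    yield node
    for child in node.children:
        yield from subtree(child)


def diagnose(segment_root, observed_errors):
    """Return the names of faults inside the segment that could explain
    at least one observed error. Faults outside the segment are never seen."""
    return [
        node.name
        for node in subtree(segment_root)
        if node.symptoms & observed_errors
    ]


# Illustrative tree: a power supply fault sits closer to the root than the CPU
# segment, and can produce the same error as a CPU cache fault.
cache = FaultNode("cpu_cache_fault", {"l2_ecc_error"})
core = FaultNode("cpu_core_fault", {"core_trap"})
cpu = FaultNode("cpu", children=[cache, core])
power = FaultNode("power_supply_fault", {"voltage_droop", "l2_ecc_error"}, [cpu])

cpu_only = diagnose(cpu, {"l2_ecc_error"})      # the vertical-segment view
whole_box = diagnose(power, {"l2_ecc_error"})   # the view from the root
```

Restricted to the CPU segment, the only suspect is the cache fault. Seen from the root, the power supply is a suspect too. The segment view is tractable precisely because it never has to consider that second possibility.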
But the complete subtree approach gets particularly hard for I/O, where components can have a great deal of interaction, not necessarily all in a nice hierarchy, and errors are reflected in all directions. Cindi McGuire has led a herculean effort to wrestle I/O fault management down into something containable and expressible, but getting every device to participate in FMA has stumped this effort. The job is not complete. The ability to diagnose with complete accuracy remains an unrealized vision.

So I wonder - would it be so bad to just sprinkle some FMA around? For example, if we have evidence that a particular subset of faults is most common or catastrophic, can we provide just enough error reporting and diagnosis to narrow the fault landscape and find the instigators of those most egregious faults? Maybe we allow drivers to be enhanced to some minimal level of error reporting to enable just that type of diagnosis. This might make it easier for driver writers inside and outside of Sun to inch their way along the path of enabling the full FMA vision.

[1] let's go shopping

Posted at 06:45PM Nov 11, 2006 by Julia Harper in Sun | Comments[0]