Mike Shapiro's Blog

Monday Jan 24, 2005

I recently wrote an article for the ACM's Queue Magazine entitled Self-Healing in Modern Operating Systems, which you can read online or download as a PDF file. The article was subsequently discussed on Slashdot. I stated the thesis of the article as follows:

Your operating system provides threads as a programming primitive that permits applications to scale transparently and perform better as multiple processors, multiple cores per die, or more hardware threads per core are added. Your operating system also provides virtual memory as a programming abstraction that allows applications to scale transparently with available physical memory resources. Now we need our operating systems to provide the new abstractions that will enable self-healing activities or graceful degradation in service without requiring developers to rewrite applications or administrators to purchase expensive hardware that tries to work around the operating system instead of with it.

The article provides an overview of our approach to building real self-healing technology into Solaris 10, and tries to make the case that administrators and developers will only benefit from automated diagnosis and repair technology in a cost-effective fashion when the operating system is involved and provides new stable abstractions for these RAS interactions. You can learn more about self-healing in Solaris 10 on BigAdmin and at our Knowledge Article Web.

Sunday Jan 23, 2005

I work in Solaris Kernel Development at Sun Microsystems, where among other things I'm the architect for RAS (Reliability, Availability, Serviceability) features in Solaris. My research and engineering interests are focused on technology to enhance the availability of computer systems, including programming languages and debugging tools for developers, operating system technologies for handling and recovering from software and hardware faults and defects, and tools for administrators and users that improve the user experience. My work at Sun includes the design and implementation of:
  • Commands: dtrace(1M), dumpadm(1M), fmadm(1M), fmdump(1M), fmstat(1M), mdb(1), pgrep(1), pkill(1)
  • Daemons: fmd(1M)
  • Libraries: libctf, libdtrace, libfmd_adm, libfmd_log, libproc
  • Kernel Subsystems: Lock-Free Error Queues, Panic Subsystem, Firmware Locking, Error Trap Interpositioning (on_trap), UltraSPARC-I and II CPU and Memory Error Handling, DTrace Virtual Machine
  • File and Data Formats: CTF (Compact C Type Format), DOF (DTrace Object Format), FCF (FMD Checkpoint Format)

as well as contributions to the design of coreadm(1M), user core files, kernel crash dumps, the /proc filesystem, and other related areas. In Solaris 10, I designed and implemented the D programming language and compiler for DTrace, and led the effort to create Sun's architecture for Predictive Self-Healing, part of our innovative approach to Fault Management that is debuting in Solaris 10.

Contrary to earlier blog easter-eggs, I resemble neither the battering-ram power of Bosco Baracus nor the pasty-haired impishness of Larry Fine. I do, however, look pretty much exactly like my cartoon action figure, as seen in InsideJack Episode 2.

Prior to working at Sun, I was causing trouble with my partner-in-crime Bryan Cantrill at Brown University, where I received a BS and MS in Computer Science. I'm originally from the Boston area, and spend much of my free time reliving basketball games from the '80s now on DVD, this year's Red Sox triumph, and the weekly drama of a Man Named Brady.
