Dilpreet Bindra's Weblog

Tuesday June 14, 2005

Match Made in Heaven

The wedding:

Dearly beloved, we are gathered here today to unite the community to this code, OpenSolaris, in the bonds of holy hacking, which is an honorable estate. Into this, these two now come to be joined. If anyone present can show just and legal cause why they may not be joined, let them speak now or forever hold their peace. :::sorry, time's up::: Who gives this code to this community? (Sun) Community, will you have this code as your lawful source, to live together in the estate of hackery? Will you love her, debug her, test her, enhance her, and keep her bug free and in health; forsaking all others, be true to her as long as you both shall live? (you will). OpenSolaris, will you have the community as your lawful developer, to live together in the estate of hackery? Will you love them, take putbacks from them, teach them, learn from them, and keep them with some defects (as few as possible) and in health; forsaking all others, be true to them as long as you both shall live? (it will).

I now pronounce you code and developers, you may now look at the source.

The honeymoon:

Now that that's taken care of, we can finally share our favorite little pieces of code with the community. For this blog I would like to focus on the error handling portions of the I/O subsystem. Discussing fault diagnosis first would be more appropriate, but I will go backwards, especially since Andy wrote a great introduction to fault diagnosis in his blog.

As part of the Predictive Self-Healing work that went into S10 we "hardened" [1] the nexus drivers that connect the system bus to the PCI bus. One such driver is the pcisch nexus driver (the driver name can be seen via 'prtconf -D'), which attaches to the Schizo/Tomatillo/Xmits bridge chips (internal names which can be seen via 'prtdiag -v').

To "harden" this driver (and actually to "harden" any driver) we first needed to understand the underlying hardware and the various fault conditions that could lead to errors being detected and/or reported by the hostbridge. I will table this discussion for a later blog so that we can focus on error handling.

Once we had a clear indication of what faults exist and what their associated detectors were, the next step was to determine how the errors should be reported and handled in the driver, so that we can recover (wherever possible) and persist the error information for diagnosis (via a diagnosis engine which understands the relationship between faults and errors for this subsystem).

Since this driver controls hardware that bridges two bus standards, as is usually the case, we also need to "harden" the devices which exist on the two bus standards to be able to take full advantage of the logging/recovery options we may have available. For example, if we encounter a PCI master abort during a PIO [2] Read transaction (possible fault(s) may be a leaf [3] device not responding, the requesting driver/user addressing a nonexistent device, or an attempt to access the device after power managing it):

As you can see from the above, the detector in this instance was actually the hostbridge, but the CPU reported the error up to the kernel. Both locations have important data: the CPU recorded the address of the device which either failed to respond or did not exist due to an incorrect address, and the hostbridge recorded the error condition which caused it to send the Bus Error to the CPU in the first place. Without both pieces of information, error handling *and* fault diagnosis are severely limited. The device in question may also have some status which could help solve this case; for example, it could currently be power managed.

So to reap all this information the CPU, nexus, and leaf must be "hardened".
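To make the point about needing all three sources concrete, here is a toy userland model in C. It is only a sketch under stated assumptions: the struct, the enum, and diagnose_pio_read() are hypothetical names invented for illustration and are not taken from the pcisch driver or from any Solaris interface. It shows how the three candidate faults listed for the PIO read example can only be told apart when CPU, nexus, and leaf data are all available.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified record of what each layer captured for one
 * failed PIO read. None of these names come from the real driver.
 */
typedef struct {
    bool     cpu_bus_error;      /* CPU took a bus error on the load */
    uint64_t cpu_fault_addr;     /* address the CPU was accessing */
    bool     nexus_master_abort; /* hostbridge logged a PCI master abort */
    bool     leaf_known;         /* some leaf device owns cpu_fault_addr */
    bool     leaf_powered_down;  /* leaf status: device was power managed */
} pio_error_report_t;

typedef enum {
    DIAG_INSUFFICIENT_DATA,     /* CPU or nexus telemetry is missing */
    DIAG_BAD_ADDRESS,           /* no device exists at that address */
    DIAG_POWER_MANAGED,         /* device was accessed while powered down */
    DIAG_DEVICE_NOT_RESPONDING  /* device exists, is powered, stayed silent */
} pio_diag_t;

/* Pick a suspect for a failed PIO read from the combined telemetry. */
pio_diag_t
diagnose_pio_read(const pio_error_report_t *r)
{
    /* Without both the CPU address and the nexus error we cannot say much. */
    if (!r->cpu_bus_error || !r->nexus_master_abort)
        return DIAG_INSUFFICIENT_DATA;

    /* The address did not map to any leaf: the requester is the suspect. */
    if (!r->leaf_known)
        return DIAG_BAD_ADDRESS;

    /* The leaf's own status explains the silence. */
    if (r->leaf_powered_down)
        return DIAG_POWER_MANAGED;

    return DIAG_DEVICE_NOT_RESPONDING;
}
```

In this model, dropping the nexus or CPU fields collapses every case into DIAG_INSUFFICIENT_DATA, and dropping the leaf fields makes the last three outcomes indistinguishable.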

The "hardened" response to the above example:

If you would like some more details on the inner workings of the PCI nexus error handling code, please read the comment block here

The happy ending:

Previously, the same situation would have caused us to send a SIGBUS to the offending process and it would have either cored or printed some cryptic message. We did not reap the information in the nexus or the leaf and did no diagnosis.

Now we send the SIGBUS to the offending application (and if the application is not able to recover and is managed by SMF it will be restarted) and diagnose the detected error telemetry to the appropriate suspects (the device which failed to respond, and the driver of the device).
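For the application side, here is a minimal sketch in C of what "able to recover" from a SIGBUS could look like, using only standard POSIX calls (sigaction, sigsetjmp, siglongjmp). The function names install_sigbus_handler(), guarded_access(), simulate_bus_error(), and harmless_access() are assumptions made up for this example; simulate_bus_error() just raises the signal so the pattern can be exercised without real hardware. It is single-threaded and not how any particular Solaris application does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf bus_error_jmp;               /* where to resume after SIGBUS */
static volatile sig_atomic_t bus_guard_armed;  /* 1 while inside guarded_access */

/* SIGBUS handler: unwind to the guard if one is active, else die as usual. */
static void
on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    if (bus_guard_armed) {
        bus_guard_armed = 0;
        siglongjmp(bus_error_jmp, 1);
    }
    /* Not guarded: restore the default action and re-deliver the signal. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

/* Install the handler; returns 0 on success, -1 on failure. */
int
install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof (sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Run access(arg); return 0 if it completed, -1 if SIGBUS interrupted it. */
int
guarded_access(void (*access)(void *), void *arg)
{
    /* savemask=1 so SIGBUS is unblocked again after the jump. */
    if (sigsetjmp(bus_error_jmp, 1) != 0)
        return -1;
    bus_guard_armed = 1;
    access(arg);
    bus_guard_armed = 0;
    return 0;
}

/* Stand-in for a device access that fails; raises SIGBUS artificially. */
void
simulate_bus_error(void *arg)
{
    (void)arg;
    raise(SIGBUS);
}

/* Stand-in for a device access that succeeds. */
void
harmless_access(void *arg)
{
    (void)arg;
}
```

After install_sigbus_handler() succeeds, guarded_access(simulate_bus_error, NULL) returns -1 instead of killing the process, while guarded_access(harmless_access, NULL) returns 0. An application that cannot recover this way simply takes the default action, which is where an SMF restart would come in.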

The above is only one simple example; another is the Uncorrected ECC Errors (UEs) we detect while executing a non-privileged thread. Previously, a UE taken by a user thread would have caused you downtime (the system would panic) with a cryptic panic message. Now the system is able to restart your application (if managed by SMF), diagnose the fault (pointing you to the failed component, if user action is required), and retire the faulty page.


1: "hardened" for this discussion means handling and reporting errors which are detected by the underlying hardware in a manner which aids fault diagnosis.
2: PIO stands for programmed I/O and is a request, either read or write, to an I/O device.
3: leaf refers to the endpoint device.
4: DEVSEL# is an active-low signal on the PCI bus which is asserted when a device accepts an address sent on the bus.

(2005-06-14 08:50:16.0)

Friday December 03, 2004

nice solaris 10 article

Dan Price just sent this great Solaris 10 article internally and I thought I should share it with the world. Some highlights:

"...someone who knows DTrace could walk into a strange Solaris 10-based environment with machines and configurations he had never seen before and use it to track the problem down."
"Many people will wonder, "Is Solaris 10 better than Red Hat Enterprise Server 3, Windows Server 2003, and SUSE Linux Enterprise Server 9?" Under most conditions the answer is yes, thanks to the above-mentioned features that are unique to Solaris 10. While SLES9 has Usermode Linux to do operating system virtualization, it requires assigned system resources and doesn't offer optimal performance. Solaris Containers require only storage (hard drive) space to work and don't suck up as much system resources, making this feature more efficient while providing similar functionality. ReiserFS v4 may be a significant step forward for Linux file systems, but looking through the feature list on its Web site, I don't see anything like the ability to add storage space dynamically or integrated checksums to protect against data corruption. ReiserFS v4 is also not 128-bit, so its ceiling is much lower than that of ZFS. DTrace has no equivalent anywhere, as far as I can tell."

The paragraph above is great, especially the last sentence, which gets to the heart of what Bryan, Mike and Adam have been trying to convey all along.

(2004-12-03 10:09:47.0)

Saturday November 27, 2004

interesting

Just in case you don't read comments to weblogs, here is one by William Strathearn in response to Adam's latest blog that I thought you might find interesting (the blog is interesting as well).

Oh, and not to be outdone, Eric "the professor" Schrock has another great lesson posted for us OS internals junkies.

(2004-11-27 23:49:57.0)

Tuesday November 23, 2004

perception versus reality

First, let me apologize for being away so long and promise :::fingers crossed::: to blog as frequently as possible (but only when I have something interesting to say). Eric mentioned a while ago that I was the first FMAer to blog. Much as I would love the title, it is incorrect: Andy Rudoff owns that distinction. Andy, along with others, has been working on and has delivered a really cool Fault Diagnosis Engine (I will have more on what this is later), which allows you to describe the faults in a particular system or subsystem based on the symptoms (error telemetry) that the system or subsystem experiences in the presence of each fault. I will let him describe it further; more from me later regarding FMA (Predictive Self-Healing).

Now to the reason for this blog. During dinner last night with some Solaris engineers: Jeff Bonwick, Bill Moore (needs a blog because he has a lot of cool things to say), Valerie Bubb and Val Henson (former ZFS junkie), I mentioned how awed I was by the sheer talent gathered in the Solaris group and how the people here never cease to amaze me. They thought it might be blog-worthy, and after reading Jim's response to Joe Barr, I knew it needed to be shared.

Joe's article states: "Perhaps he [Frank Ottink] thinks that a large, vital, worldwide, dedicated group of highly skilled Solaris kernel hackers is going to appear out of nowhere and make it so, simply because Sun has hung the "Open Source" sign out on Solaris 10." As Jim has already mentioned in his blog, we *already* have all of the above, except that the term "hacker" does not do them justice. The development community in Sun, and particularly in the Solaris team, are engineers in every sense of the word. They are driven, focused, devoted visionaries with the attention to detail of master craftsmen and craftswomen. The article goes on to say: "Let's come back to this in a couple of years and compare the community of Linux kernel hackers with the community of Solaris 10 kernel hackers. If the Solaris "community" is as large, skilled, and productive as the Linux community, I'll eat one of those Red Hats." Solaris 10 is a testament to how skilled and productive our community *already* is. In a hallway conversation earlier in the evening, Bryan (of DTrace fame) happened to mention that he was told "you're a hacker, we're going to make you into an engineer" when he joined Sun. If he ever was a hacker, he sure isn't now.

This is the culture and the community we have to offer, and what we will export. IMHO, this is much more valuable to the community at large than the code itself.
(2004-11-23 10:19:48.0)

