Platform Independent FMA for sun4v: July 2008 update
There are two active sub-projects at the moment. The first is the development of a sun4v generic topology enumerator. This project is quite far along - the code is in test, and should be integrated into OpenSolaris in the next 30 days. PSARC 2008/392 is the first visible result of the project. The PSARC case really doesn't give much insight into how the enumerator works, so a short description is in order.
On sun4v systems, the firmware in the SP provides a Physical Resource Inventory (PRI). The contents of this structure, as its name suggests, detail all of the physical attributes of the platform. It is primarily for use with Logical Domains (LDOMs), but some portions of it are consumed by other subsystems, FMA being one of them. The PRI already had the capability to describe relationships between components, much like a topology map file was doing in Solaris. So with FMA PI, we defined a set of new properties for certain PRI nodes that our new Solaris-side enumerator picks up to construct the FMA topology. End result - sun4v platforms control their topology from the SP firmware. We can take a PRI node that looks something like this:

node component node_0x3dac {
    back -> node_0x38aa;
    type = "dimm";
    fru = 0x1;
    label = "J585";
    topo-hc-name = "dimm";
    part_number = "511-1161-01 Rev 01";
    serial_number = "00CE0108042425E8B3";
    dash_number = "00";
    rev_number = "00";
    sdram_manuf_jedec_id = "80CE";
    amb_vid = "111D";
    amb_rid = "21";
    amb_did = "0482";
    id = 0x0;
    nac = "MB/CPU0/CMP0/BR0/CH0/D0";
    back -> node_0x3da4;
}

and turn it into an FMA topology entry that looks like this:

hc://:product-id=SUNW,FOO:server-id=sca-foo-1/chassis=0/motherboard=0/cpuboard=0/chip=0/memory-controller=0/branch=0/dram-channel=0/dimm=0
    ASRU: -
    FRU: hc://:product-id=SUNW,FOO:server-id=sca-foo-1:serial=00CE0108042425E8B3:part=511-1161-01 Rev 0100:revision=00/chassis=0/motherboard=0/cpuboard=0/chip=0/memory-controller=0/branch=0/dram-channel=0/dimm=0
    Label: MB/CPU0/CMP0/BR0/CH0/D0

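To make the mapping concrete, here is a rough Python sketch of how the DIMM node's properties could be assembled into that entry. It is not the real enumerator, and the names hc_fmri, topo_entry, and ancestors are made up for illustration. It also makes two assumptions: that the ancestor path has already been collected by walking the PRI hierarchy, and that the part field is part_number followed by dash_number, which is only inferred from the sample output.

```python
# Hypothetical sketch only: function and variable names are illustrative,
# not taken from the actual FMA PI enumerator.

# Properties of the DIMM node, copied from the PRI listing.
dimm_node = {
    "topo-hc-name": "dimm",
    "id": 0x0,
    "part_number": "511-1161-01 Rev 01",
    "dash_number": "00",
    "rev_number": "00",
    "serial_number": "00CE0108042425E8B3",
    "nac": "MB/CPU0/CMP0/BR0/CH0/D0",
}

# Assumption: the ancestors were gathered earlier by walking the PRI hierarchy.
ancestors = [
    ("chassis", 0), ("motherboard", 0), ("cpuboard", 0), ("chip", 0),
    ("memory-controller", 0), ("branch", 0), ("dram-channel", 0),
]

authority = {"product-id": "SUNW,FOO", "server-id": "sca-foo-1"}


def hc_fmri(authority, path, fru_ids=None):
    """Format authority fields and (name, instance) pairs as an hc FMRI string."""
    fields = dict(authority)
    fields.update(fru_ids or {})
    auth = ":".join(f"{key}={value}" for key, value in fields.items())
    hier = "/".join(f"{name}={inst}" for name, inst in path)
    return f"hc://:{auth}/{hier}"


def topo_entry(node, ancestors, authority):
    """Build the resource, FRU, and label for one PRI node."""
    path = ancestors + [(node["topo-hc-name"], node["id"])]
    fru_ids = {
        "serial": node["serial_number"],
        # Assumption: part = part_number + dash_number (inferred from the sample).
        "part": node["part_number"] + node["dash_number"],
        "revision": node["rev_number"],
    }
    return {
        "resource": hc_fmri(authority, path),
        "fru": hc_fmri(authority, path, fru_ids),
        "label": node["nac"],
    }


entry = topo_entry(dimm_node, ancestors, authority)
for key, value in entry.items():
    print(f"{key}: {value}")
```

The point of the sketch is that everything in the topology entry comes from data the firmware already publishes. The Solaris side only formats it.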
The only thing I've not reflected above is the hierarchy built into the PRI, but the fwd and back links give you the notion. Once the enumerator is putback to Solaris, sun4v platforms no longer need their own enumerators. The topology is encoded into the firmware, and the FMA PI enumerator transforms it into a usable topology.

The other active project is for SPARC generic CPU and memory diagnosis. This project is still in the earlier stages of development. Processor-agnostic ereports have been defined and are being fine-tuned. The diagnosis rules that consume those ereports will follow. An analogy to the project is the MCA work done on x86 - sun4v FMA PI aims to provide a base level of FMA support for any sun4v platform. But with sun4v, as there's more that can be done with the SP firmware, we're targeting a stronger base feature set. Since the project is still early on, I'll leave it at that for now.
If you've read any of my "FMA Triad" posts, you may notice the pattern to the projects: topology, telemetry, and diagnosis rules. I'll endeavor to give more updates on the FMA PI projects as the rest of 2008 progresses. At the very least, you'll likely hear from me when the OpenSolaris putback happens.
:wq