Wednesday Nov 04, 2009

A few months ago, I talked about the x86gentopo project - a reusable baseboard enumeration approach for x86 systems. It's in the same spirit that drove the platform-independent sun4v FMA work.

Yesterday evening, the project integrated into OpenSolaris. If you missed the flag day message, here it is again.

If you don't use x86 systems, or don't care about x86 FMA, you can hit delete now.

There is a new x86 generic FMA topology enumerator available with the putback of:

PSARC/2009/490 x86 Generic FMA Topology Enumerator
6785310 Implement SMBIOS contained elements/handles
6841286 Need x86 generic FMA topo enumerator
6853537 x86gentopo needs OEM-Specific SMBIOS structures
6865771 Topology relationships should be derived from contained handles & elements of SMBIOS
6865814 Chip enumerator should derive serials & labels using libsmbios, if SMBIOS is FM aware
6865845 /dev/fm should export the Initial APICID, SMBIOS based ID/instance to the chip enumerator
6866456 Generic Topology FMRI ereport

The new x86 generic enumerator creates physical topology, as well as identity information (serial number, part number, etc.), for i86pc class systems which contain a compatible SMBIOS. The X64 Platform Resource Management Specification (PRMS-1) describes what a compliant SMBIOS is (currently in uncirculated draft form).

To correctly diagnose faults, the cpu and memory ereport generators have been modified to report x86 generic topology when a compliant SMBIOS is found. If a compliant platform SMBIOS is not found, the x86 generic enumerator and x86 generic ereport generators will revert to existing (legacy) enumeration and ereport generation.

If a platform does contain a compliant SMBIOS and wishes to force legacy enumeration, the kernel tunable variable x86gentopo_legacy can be set in /etc/system:

    set x86gentopo_legacy = 1

To report bugs against the x86 generic topology enumerator and/or the cpu/mem ereport generators, please use the following product/cat/subcat:

solaris/fma/other : x86 generic enumerator
solaris/fma/mem : memory ereport generator
solaris/fma/cpu : cpu ereport generator

To report bugs against the SMBIOS structures, please use the following product/cat/subcat:

solaris/library/libsmbios

For more information please visit the OpenSolaris x86gentopo project page:

http://hub.opensolaris.org/bin/view/Project+x86gentopo/WebHome

Thank you,
The x86gentopo team
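
The fallback rules in the flag day message boil down to a small decision. Here is a minimal sketch in C, under stated assumptions: the enum, the function name choose_topo_mode, and the smbios_compliant flag are hypothetical and exist only for illustration; only the x86gentopo_legacy tunable name comes from the message. This is not the actual enumerator source.

```c
#include <stdbool.h>

/* Hypothetical type for illustration only; not taken from the Solaris source. */
typedef enum { TOPO_LEGACY, TOPO_GENERIC } topo_mode_t;

/*
 * Decide which enumeration path to take.
 *
 * smbios_compliant:  assumed to be true when the platform SMBIOS was
 *                    found and judged compliant.
 * x86gentopo_legacy: value of the tunable from /etc/system (0 if unset).
 */
topo_mode_t
choose_topo_mode(bool smbios_compliant, int x86gentopo_legacy)
{
	/* A non-zero tunable forces legacy enumeration, compliant or not. */
	if (x86gentopo_legacy != 0)
		return (TOPO_LEGACY);

	/* Otherwise use the generic path only with a compliant SMBIOS. */
	return (smbios_compliant ? TOPO_GENERIC : TOPO_LEGACY);
}
```

In this sketch, a compliant system with the tunable left at 0 gets TOPO_GENERIC; every other combination falls back to TOPO_LEGACY.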

Now to get that PRMS finalized and made public...

:wq

Tuesday Oct 27, 2009

Nehalem EX is coming. There's been a bit of press on Solaris and EX recently as well as a whitepaper describing the Solaris modifications to take advantage of EX's capabilities. On the FMA front, support for EX integrated into b127 last week.

:wq

Thursday Oct 22, 2009

Solaris 10 Update 8 is now posted and available for download. And there's been plenty of bug fix work for FMA. Beyond lots of memory leak and core dump fixes, here are my favorite fixes.


6743295 fault.memory.dimm is overloaded
6758561 KA pages for fault.memory.dimm* are needlessly different

These two fixes provide some cleanup for DIMM faults. CR 6743295 explains the more tangible benefit, IMHO. In addition to learning that a DIMM has been declared faulty, you now get better separation, via the fault and its knowledge article, of why the DIMM is deemed bad.


6394503 fmdump should show contents of rotated logs without specifying them explicitly
6535637 Add Severity level to payload of list.suspects event

I lump these together as administrative improvements. The first expands fmdump to show information from historical logs and not just the current log. FMA error and fault logs are rotated periodically, and the new behavior is to display data from all logs. The second adds a severity level to the payload of the list.suspects event.


6618751 Include memboard in T5440 FBR/FBU diagnosis

Beyond fixing a nasty core dump in the diagnosis flow, this fix also improves the diagnosis. This was a gap from when T5440 first shipped. On configurations where DIMMs on memory boards are in the mix, the memory board itself is part of the FB-DIMM channel. That component is now included in the diagnosis of channel errors.


6747341 Add FMA to hermon driver
6656720 Initial hxge driver

Add two more FMA-hardened drivers to Solaris 10!


6800878 CMI_MAX_{CHIPS,CORES_PER_CHIP,STRANDS_PER_CORE} should be dynamic

The impetus for this change is the higher core counts coming in the x86 world. With newer chips from both AMD and Intel, the previously defined maximums would be insufficient to fault manage all the cores and strands. Not anymore :)
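
As a rough illustration of the general idea (not the actual kernel change for this CR), the sketch below sizes a per-strand table from counts discovered at run time instead of from compile-time maximums. Everything in it is an assumption made for the example: the cpu_table_t type, the function names, and the flat chip/core/strand indexing are all hypothetical.

```c
#include <stdlib.h>

/* Hypothetical table sized from run-time counts, not fixed maximums. */
typedef struct cpu_table {
	unsigned int nchips;
	unsigned int ncores_per_chip;
	unsigned int nstrands_per_core;
	void **slots;			/* one slot per strand */
} cpu_table_t;

/* Allocate a table for the given counts; returns NULL on failure. */
cpu_table_t *
cpu_table_create(unsigned int nchips, unsigned int ncores,
    unsigned int nstrands)
{
	cpu_table_t *t = malloc(sizeof (*t));

	if (t == NULL)
		return (NULL);
	t->nchips = nchips;
	t->ncores_per_chip = ncores;
	t->nstrands_per_core = nstrands;
	/* Overflow checking is omitted to keep the sketch short. */
	t->slots = calloc((size_t)nchips * ncores * nstrands,
	    sizeof (void *));
	if (t->slots == NULL) {
		free(t);
		return (NULL);
	}
	return (t);
}

/* Map (chip, core, strand) to a flat index into slots. */
size_t
cpu_table_index(const cpu_table_t *t, unsigned int chip,
    unsigned int core, unsigned int strand)
{
	return (((size_t)chip * t->ncores_per_chip + core) *
	    t->nstrands_per_core + strand);
}

/* Release the table and its slot array. */
void
cpu_table_destroy(cpu_table_t *t)
{
	if (t != NULL) {
		free(t->slots);
		free(t);
	}
}
```

With sizes chosen at run time, a system with more chips, cores, or strands than an older fixed limit simply gets a larger table rather than running past the end of a statically sized one.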


6818561 FMA topology fails on Sun Blade T6300

This was just a flat-out embarrassment. Topology was completely missing from the T6300 blades, rendering FMA largely ineffective (errors were still logged, but diagnosis was crippled). Glad this one is fixed.

:wq

This blog copyright 2009 by Scott Davenport