Sun Servers, product quality, escalations and more.....
Remco's Sun blog

Wednesday Feb 04, 2009
UltraSPARC IV+ Cache Line retirement in Solaris 9 and 10
Solaris 9 kernel patch 122300-28 and Solaris 10 kernel patch 137111-02 introduced cache line retirement (a while ago).
Before cache line retirement, a single weak cell generating correctable errors (CEs) in a processor's L2 or L3 cache could cause the processor to be offlined; it would do no further useful work and simply await replacement. With cache line retirement, the offending cache line is no longer used and the processor continues to run as normal. The result is more available processing power and less downtime. An UltraSPARC IV+ CPU module has 2 MB of L2 cache and 32 MB of L3 cache. If 64 of the 524,288 L3 cache lines are retired, the CPU module is offlined, well before the retirements can have any impact on system performance. (The actual implementation is a little more complex, as caches are organized in indexes and ways; a generic primer is at http://en.wikipedia.org/wiki/CPU_cache .) On Solaris 10, cache line retirement is implemented with FMA (Fault Management Architecture) and persists across reboots through FMA log replay, which happens at Solaris boot time.
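
To put the threshold in perspective, here is a rough back-of-the-envelope sketch in shell arithmetic. The 64-byte line size is an assumption, inferred from 32 MB divided by 524,288 lines; the variable names are only illustrative.

```shell
#!/bin/sh
# Sketch: how small the 64-line retirement threshold is relative to the L3 cache.
# The 64-byte line size is an assumption inferred from 32 MB / 524,288 lines.

l3_bytes=$((32 * 1024 * 1024))   # 32 MB L3 cache, from the text
line_bytes=64                    # assumed cache line size in bytes
retire_limit=64                  # retired lines before the module is offlined

total_lines=$((l3_bytes / line_bytes))
retired_bytes=$((retire_limit * line_bytes))
# Shell arithmetic is integer only, so express the fraction in parts per million.
limit_ppm=$((retire_limit * 1000000 / total_lines))

echo "L3 cache lines:     $total_lines"
echo "Bytes at threshold: $retired_bytes"
echo "Share of the cache: about $limit_ppm ppm"
```

Under that assumption the threshold amounts to 4 KB out of 32 MB, or roughly 0.012% of the L3 cache, which is why the module is offlined long before performance could suffer.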

To see whether, and how many, cache lines are retired on Solaris 9:
kstat -n pn_cacheline_retire
And for Solaris 10:
fmdump -av
To see if a CPU module needs replacement in Solaris 10:
fmadm faulty -a
Example FMA message:
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Oct 28 15:15:38 9f7ef7e2-4282-cd2d-f7b4-eeac8c9986d6  SUN4U-8001-1E  Major

Fault class : fault.cpu.ultraSPARC-IVplus.l3cachetag
Affects     : cpu:///cpuid=4/serial=80010221135D2559
              cpu:///cpuid=20/serial=80010221135D2559
                  degraded but still in service
FRU         : "Slot C" (hc://:product-id=SUNW,Sun-Fire-V890:server-id=s064130/component=Slot C)

Description : The number of errors associated with this CPU has exceeded
              acceptable levels.  Refer to http://sun.com/msg/SUN4U-8001-1E for
              more information.

Response    : The fault manager will attempt to remove the affected CPU from
              service.

Impact      : System performance may be affected.

Action      : Schedule a repair procedure to replace the affected CPU, the
              identity of which can be determined using fmdump -v -u
              .
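
For monitoring scripts, the interesting fields of such a message can be pulled out with awk. This is only a sketch: the function name fault_summary is made up for illustration, and the field labels are assumed to match the example message above, which may differ between Solaris updates.

```shell
#!/bin/sh
# Sketch: extract the fault class and FRU from "fmadm faulty" style text.
# Assumes the "Fault class :" and "FRU :" labels shown in the example message;
# other Solaris releases may lay the report out differently.

fault_summary() {
    # Reads the report on stdin and prints one "label: value" line per field.
    awk '
        /^Fault class/ { sub(/^[^:]*: */, ""); print "class: " $0 }
        /^FRU/         { sub(/^[^:]*: */, ""); print "fru: " $0 }
    '
}

# On a live Solaris 10 system: fmadm faulty -a | fault_summary
```

The function only filters text, so it can be tried against a saved copy of the message before being pointed at a live system.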
Also
psrinfo
will show "offline" processors. Generally, Solaris 10 with FMA doesn't alert the sysadmin until action needs to be taken; in that case it posts a message in /var/adm/messages. Solaris 9, which is more verbose in its error reporting by default, logs each retired index and way in /var/adm/messages and subsequently offlines the processor if the threshold is crossed.
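
A quick way to spot such processors is to filter the psrinfo output. The sketch below assumes the default psrinfo layout, with the processor id in the first column and the state (for example on-line or off-line) in the second; the function name not_online is hypothetical.

```shell
#!/bin/sh
# Sketch: list processors that psrinfo does not report as "on-line".
# Assumes the default layout: "<id> <state> since <date> <time>".

not_online() {
    # Reads psrinfo-style lines on stdin and prints "<id> <state>" for every
    # processor whose state is anything other than on-line.
    awk 'NF >= 2 && $2 != "on-line" { print $1, $2 }'
}

# On a live system: psrinfo | not_online
```

Empty output means every processor is on-line; anything else is worth checking against fmadm faulty (Solaris 10) or /var/adm/messages (Solaris 9).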

Reference: UltraSPARC-IV+
Reference: SunSolve Document ID 23862 (requires access to SunSolve contract documents).

Cache line retirement should not be confused with Memory page retirement.
Posted at 12:15PM Feb 04, 2009 by remco in Tech  | 

Monday Nov 28, 2005
The birth of an urban legend: Forcing network ports (to full duplex).
In the early days of twisted-pair Ethernet, getting ports from different vendors to play nicely together was sometimes a challenge. Those days are over. I guess that, to battle this problem, some companies require Ethernet ports to be "forced" in software to the highest speed and full duplex. This sometimes seems to work but too often causes weird connection problems. Now that most of these interoperability problems are gone, the "old" rule still causes hard-to-find problems every now and then. Autonegotiation works nowadays, and has for quite some time (and if it doesn't, have the vendor fix it). I guess forcing ports is now no more than an urban legend, IT style.
Posted at 10:30AM Nov 28, 2005 by remco in Tech  |  Comments[5]