Eric Saxe's dump o' core
End of Line
Archives
« December 2005 »
SunMonTueWedThuFriSat
    
1
2
3
4
5
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
       
Today
Click me to subscribe
Search

Links


Navigation

 

Today's Page Hits: 7

« Previous month (Nov 2005) | Main | Next month (Jan 2006) »
Wednesday Dec 07, 2005
Do-it-yourself Kernel Development Preso
Russ Blaine gave this presentation to the ACM student chapter at Northeastern University. The first part of the presentation describes (in seat gripping detail) our adventures in root causing a *very* recent bug:

6348316 cyclic subsystem baffled by x86 softint behavior

It's a nice ride, with lots of sights along the way...the cyclic subsystem, device configuration, autovectored interrupts, and more... The second part of the presentation, is essentially the same as the OpenSolaris presentation Steve presented at UCSD last week, and the last part talks about why the Solaris Kernel Group is such a great place to work, who should work for us, and why.
Technorati Tags: OpenSolaris
Posted at 05:07PM Dec 07, 2005 by Eric Saxe in Solaris  |  Comments[1]

Tuesday Dec 06, 2005
UltraSPARC T1 and Solaris: Threading Throughout
We are unveiling several UltraSPARC T1 (aka Niagara) based servers today. If you don't know what the T1 processor is, and you haven't heard about this chip and the systems that will house it, then you really should have a look. Seriously, this chip is impressive. Working at Sun, every now and again i'm fortunate enough to hear about new products and technologies we've got coming down the pipe. When I had first heard about the Niagara (T1) processor I was in disbelief.

32 logical CPUs presented by a single chip at 72 Watts?! Simply amazing.

Housed in T1's 2 square inch package are 8 processor cores, each capable of running 4 threads simultaneously. For me, the gravity of all this really sunk in when I invoked psrinfo(1M) on a test box and watched as the top of the output scrolled out of view in my xterm:
esaxe@ontario-mc25$ psrinfo
0       on-line   since 10/14/2000 20:54:37
1       on-line   since 10/14/2000 20:54:39
2       on-line   since 10/14/2000 20:54:39
3       on-line   since 10/14/2000 20:54:39
4       on-line   since 10/14/2000 20:54:39
5       on-line   since 10/14/2000 20:54:39
6       on-line   since 10/14/2000 20:54:39
7       on-line   since 10/14/2000 20:54:39
8       on-line   since 10/14/2000 20:54:39
9       on-line   since 10/14/2000 20:54:39
10      on-line   since 10/14/2000 20:54:39
11      on-line   since 10/14/2000 20:54:39
12      on-line   since 10/14/2000 20:54:39
13      on-line   since 10/14/2000 20:54:39
14      on-line   since 10/14/2000 20:54:39
15      on-line   since 10/14/2000 20:54:39
16      on-line   since 10/14/2000 20:54:39
17      on-line   since 10/14/2000 20:54:39
18      on-line   since 10/14/2000 20:54:39
19      on-line   since 10/14/2000 20:54:39
20      on-line   since 10/14/2000 20:54:39
21      on-line   since 10/14/2000 20:54:39
22      on-line   since 10/14/2000 20:54:39
23      on-line   since 10/14/2000 20:54:39
24      on-line   since 10/14/2000 20:54:39
25      on-line   since 10/14/2000 20:54:39
26      on-line   since 10/14/2000 20:54:39
27      on-line   since 10/14/2000 20:54:39
28      on-line   since 10/14/2000 20:54:39
29      on-line   since 10/14/2000 20:54:39
30      on-line   since 10/14/2000 20:54:39
31      on-line   since 10/14/2000 20:54:39
Yes, I know the system's clock is off by a few years. But seriously, output like this is something i'm used to seeing on monsters like the Sun Fire E25K and Sun Fire E6900 Servers. It was mind expanding indeed to see this sort of output from a small box with but a single physical processor.

Like UltraSPARC-IV and UltraSPARC-IV+ the T1 implements a Chip Multi-Threading (CMT) architecture...(which simply means it is able to run multiple threads of execution simultaneously). For a nice bit of background on CMT technology check out this article (I won't rehash it all here).

Solaris, as you might imagine is a natural fit for CMT systems like the T1000 and T2000, as it has efficiently operated across systems having twice as many CPUs (and more) for years.

For CMT however, acheiving good performance requires more than simply being able to scale. In a previous blog entry I talked about some of the CMT scheduling optimizations we've implemented in Solaris. Andrei will be discussing these optimizations more specifically in the context of the T1 processor (so again, I won't rehash), but it is worth underscoring that (especially for threaded processor architectures), the optimized thread placement and load balancing performed by the scheduler is a huge performance win.

Looking ahead, it's likely that workload characterization is going to be an important (and interesting) area of research. For example, we know that throughput on threaded processor architectures is maximized when threads running on the same core (sharing the same pipeline) are able to effectively execute in each other's stall cycles. CPI (Cycles Per Instruction) should therefore be an interesting metric to note when trying to characterize a given workload's scaling ability on this architecture. What other workload characteristics and metrics will be important/useful to collect and observe? We've got our work cut out for us. :)

Technorati Tags:
Posted at 09:16AM Dec 06, 2005 by Eric Saxe in Solaris  |  Comments[2]