Eric Saxe's dump o' core
End of Line
Tuesday Dec 06, 2005
UltraSPARC T1 and Solaris: Threading Throughout
We are unveiling several UltraSPARC T1 (aka Niagara) based servers today. If you don't know what the T1 processor is, and you haven't heard about this chip and the systems that will house it, then you really should have a look. Seriously, this chip is impressive. Working at Sun, every now and again I'm fortunate enough to hear about new products and technologies we've got coming down the pipe. When I first heard about the Niagara (T1) processor I was in disbelief.

32 logical CPUs presented by a single chip at 72 Watts?! Simply amazing.

Housed in the T1's 2 square inch package are 8 processor cores, each capable of running 4 threads simultaneously. For me, the gravity of all this really sank in when I invoked psrinfo(1M) on a test box and watched as the top of the output scrolled out of view in my xterm:
esaxe@ontario-mc25$ psrinfo
0       on-line   since 10/14/2000 20:54:37
1       on-line   since 10/14/2000 20:54:39
2       on-line   since 10/14/2000 20:54:39
3       on-line   since 10/14/2000 20:54:39
4       on-line   since 10/14/2000 20:54:39
5       on-line   since 10/14/2000 20:54:39
6       on-line   since 10/14/2000 20:54:39
7       on-line   since 10/14/2000 20:54:39
8       on-line   since 10/14/2000 20:54:39
9       on-line   since 10/14/2000 20:54:39
10      on-line   since 10/14/2000 20:54:39
11      on-line   since 10/14/2000 20:54:39
12      on-line   since 10/14/2000 20:54:39
13      on-line   since 10/14/2000 20:54:39
14      on-line   since 10/14/2000 20:54:39
15      on-line   since 10/14/2000 20:54:39
16      on-line   since 10/14/2000 20:54:39
17      on-line   since 10/14/2000 20:54:39
18      on-line   since 10/14/2000 20:54:39
19      on-line   since 10/14/2000 20:54:39
20      on-line   since 10/14/2000 20:54:39
21      on-line   since 10/14/2000 20:54:39
22      on-line   since 10/14/2000 20:54:39
23      on-line   since 10/14/2000 20:54:39
24      on-line   since 10/14/2000 20:54:39
25      on-line   since 10/14/2000 20:54:39
26      on-line   since 10/14/2000 20:54:39
27      on-line   since 10/14/2000 20:54:39
28      on-line   since 10/14/2000 20:54:39
29      on-line   since 10/14/2000 20:54:39
30      on-line   since 10/14/2000 20:54:39
31      on-line   since 10/14/2000 20:54:39
Yes, I know the system's clock is off by a few years. But seriously, output like this is something I'm used to seeing on monsters like the Sun Fire E25K and Sun Fire E6900 servers. It was mind-expanding indeed to see this sort of output from a small box with but a single physical processor.

Like the UltraSPARC-IV and UltraSPARC-IV+, the T1 implements a Chip Multi-Threading (CMT) architecture, which simply means it is able to run multiple threads of execution simultaneously. For a nice bit of background on CMT technology check out this article (I won't rehash it all here).

Solaris, as you might imagine, is a natural fit for CMT systems like the T1000 and T2000, as it has efficiently operated across systems having twice as many CPUs (and more) for years.

For CMT, however, achieving good performance requires more than simply being able to scale. In a previous blog entry I talked about some of the CMT scheduling optimizations we've implemented in Solaris. Andrei will be discussing these optimizations more specifically in the context of the T1 processor (so again, I won't rehash), but it is worth underscoring that, especially for threaded processor architectures, the optimized thread placement and load balancing performed by the scheduler are a huge performance win.
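
To give a feel for the kind of placement decision involved, here is a minimal sketch in Python. It is not the Solaris scheduler and makes no claim about how Solaris actually does it. It assumes the T1 layout described above (8 cores with 4 hardware threads each) and a simple policy of putting each new runnable thread on the least-loaded core, so threads spread across pipelines before they start sharing one. The names `place_thread` and `place_many` are hypothetical.

```python
# Hypothetical sketch of core-aware thread placement; NOT the Solaris scheduler.
# Assumes the T1 layout described in the post: 8 cores x 4 hardware threads.
CORES = 8
THREADS_PER_CORE = 4


def place_thread(core_load):
    """Return the index of the least-loaded core that still has a free
    hardware thread, or None if every hardware thread is busy."""
    candidates = [c for c in range(len(core_load))
                  if core_load[c] < THREADS_PER_CORE]
    if not candidates:
        return None
    # min() returns the first minimum, so ties go to the lowest-numbered core.
    return min(candidates, key=lambda c: core_load[c])


def place_many(n):
    """Place n runnable threads one at a time; return per-core thread counts."""
    core_load = [0] * CORES
    for _ in range(n):
        core = place_thread(core_load)
        if core is None:
            break  # all 32 logical CPUs are occupied
        core_load[core] += 1
    return core_load
```

With this policy, 8 runnable threads land one per core rather than packing onto two cores, which is the intuition behind spreading load across pipelines on a threaded processor.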

Looking ahead, it's likely that workload characterization is going to be an important (and interesting) area of research. For example, we know that throughput on threaded processor architectures is maximized when threads running on the same core (sharing the same pipeline) are able to effectively execute in each other's stall cycles. CPI (Cycles Per Instruction) should therefore be an interesting metric to note when trying to characterize a given workload's scaling ability on this architecture. What other workload characteristics and metrics will be important/useful to collect and observe? We've got our work cut out for us. :)
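
As a rough illustration of why CPI matters here, the sketch below computes CPI from cycle and instruction counts and then applies a deliberately crude model: it assumes a core that issues at most one instruction per cycle and threads that interleave perfectly in each other's stall cycles. Under those assumptions a thread with CPI c keeps the pipeline busy 1/c of the time, so 4 such threads can keep it busy at most min(1, 4/c) of the time. The function names and the model are illustrative assumptions, not measurements or a description of the T1 hardware.

```python
def cpi(cycles, instructions):
    """Cycles Per Instruction for a measured interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def ideal_core_utilization(thread_cpi, threads_per_core=4):
    """Idealized upper bound on pipeline utilization (an assumption, not a
    hardware model): each thread issues one instruction every thread_cpi
    cycles, and threads fill each other's stall cycles perfectly."""
    return min(1.0, threads_per_core / thread_cpi)
```

In this toy model, four threads that each run at a CPI of 4 could in principle saturate the core, while four threads at a CPI of 8 would leave it idle half the time.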

Posted at 09:16AM Dec 06, 2005 by Eric Saxe in Solaris  |  Comments[2]

Comments:

hi there,

thanks for the info.

it will be nice to see some prices mentioned in some of the blogs -- easy friendly table with some common configuration vs prices for say 1 CPU, 2 CPU, 4 CPU systems.

thank you,

BR,
~A

Posted by anjan bacchu on December 06, 2005 at 06:23 PM PST #

Here is pricing information for the T1000 and the T2000. Both systems have one physical processor, and appear to Solaris as 32 logical CPUs (8 cores x 4 threads per core). HTH... -Eric

Posted by Eric Saxe on December 07, 2005 at 05:02 PM PST #
