Russ Blaine gave this presentation to the ACM student chapter at Northeastern University. The first part of the presentation describes (in seat-gripping detail) our adventures in root-causing a *very* recent bug:
6348316 cyclic subsystem baffled by x86 softint behavior
It's a nice ride, with lots of sights along the way: the cyclic subsystem, device configuration, autovectored interrupts, and more...
The second part of the presentation is essentially the same as the OpenSolaris presentation Steve gave at UCSD last week, and the last part talks about why the Solaris Kernel Group is such a great place to work, who should work for us, and why.
Technorati Tags: Solaris, OpenSolaris
We are unveiling several UltraSPARC T1 (aka Niagara) based servers today.
If you don't know what the T1 processor is, and you haven't heard about this
chip and the systems that will house it, then you really should have a
look.
Seriously, this chip is impressive. Working at Sun, every now and again I'm
fortunate enough to hear about new products and technologies we've got
coming down the pipe. When I first heard about the Niagara (T1) processor,
I was in disbelief.
32 logical CPUs presented by a single chip at 72 watts?! Simply amazing.
Housed in T1's 2-square-inch package are 8 processor cores, each capable of
running 4 threads simultaneously. For me, the gravity of all this really
sank in when I invoked psrinfo(1M) on a test box and watched as the top of
the output scrolled out of view in my xterm:
esaxe@ontario-mc25$ psrinfo
0 on-line since 10/14/2000 20:54:37
1 on-line since 10/14/2000 20:54:39
2 on-line since 10/14/2000 20:54:39
3 on-line since 10/14/2000 20:54:39
4 on-line since 10/14/2000 20:54:39
5 on-line since 10/14/2000 20:54:39
6 on-line since 10/14/2000 20:54:39
7 on-line since 10/14/2000 20:54:39
8 on-line since 10/14/2000 20:54:39
9 on-line since 10/14/2000 20:54:39
10 on-line since 10/14/2000 20:54:39
11 on-line since 10/14/2000 20:54:39
12 on-line since 10/14/2000 20:54:39
13 on-line since 10/14/2000 20:54:39
14 on-line since 10/14/2000 20:54:39
15 on-line since 10/14/2000 20:54:39
16 on-line since 10/14/2000 20:54:39
17 on-line since 10/14/2000 20:54:39
18 on-line since 10/14/2000 20:54:39
19 on-line since 10/14/2000 20:54:39
20 on-line since 10/14/2000 20:54:39
21 on-line since 10/14/2000 20:54:39
22 on-line since 10/14/2000 20:54:39
23 on-line since 10/14/2000 20:54:39
24 on-line since 10/14/2000 20:54:39
25 on-line since 10/14/2000 20:54:39
26 on-line since 10/14/2000 20:54:39
27 on-line since 10/14/2000 20:54:39
28 on-line since 10/14/2000 20:54:39
29 on-line since 10/14/2000 20:54:39
30 on-line since 10/14/2000 20:54:39
31 on-line since 10/14/2000 20:54:39
Yes, I know the system's clock is off by a few years. But seriously, output
like this is something I'm used to seeing on monsters like the Sun Fire E25K
and Sun Fire E6900 servers. It was mind-expanding indeed to see this sort of
output from a small box with but a single physical processor.
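The arithmetic behind that output is simple: each of the T1's hardware threads is presented to Solaris as its own logical CPU, so 8 cores times 4 threads per core gives the 32 entries psrinfo lists. The Python sketch below is only an illustration (not Sun tooling; `logical_cpus` and `online_cpus` are hypothetical names). It does that multiplication and, for comparison, asks the host OS how many logical CPUs are online via `os.sysconf("SC_NPROCESSORS_ONLN")`, falling back to `os.cpu_count()` where that query isn't available.

```python
import os
from typing import Optional


def logical_cpus(cores: int, threads_per_core: int) -> int:
    """Logical CPUs on a CMT chip: every hardware thread appears as its own CPU."""
    return cores * threads_per_core


def online_cpus() -> Optional[int]:
    """Number of logical CPUs the host OS reports as online, or None if unknown."""
    try:
        # POSIX-style sysconf query, commonly available on Unix-like systems.
        return os.sysconf("SC_NPROCESSORS_ONLN")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (e.g. on Windows) or the name is unsupported.
        return os.cpu_count()


if __name__ == "__main__":
    # UltraSPARC T1: 8 cores, 4 hardware threads per core.
    print("T1 logical CPUs:", logical_cpus(8, 4))
    print("Online CPUs on this host:", online_cpus())
```

Run on a machine with all strands online, the second line should agree with the number of entries psrinfo prints.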
Like the UltraSPARC-IV and UltraSPARC-IV+, the T1 implements a Chip
Multi-Threading (CMT) architecture, which simply means it can run multiple
threads of execution simultaneously. For a nice bit of background on CMT
technology, check out this article (I won't rehash it all here).
Solaris, as you might imagine, is a natural fit for CMT systems like the
T1000 and T2000, since it has run efficiently on systems with twice as many
CPUs (and more) for years. For CMT, however, achieving good performance
requires more than simply being able to scale.
In a previous blog entry I talked about some of the CMT scheduling
optimizations we've implemented in Solaris. Andrei will be discussing these
optimizations more specifically in the context of the T1 processor (so
again, I won't rehash them here), but it is worth underscoring that,
especially for threaded processor architectures, the optimized thread
placement and load balancing performed by the scheduler are a huge
performance win.
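To make the thread placement point a little more concrete, here is a toy placement policy in Python. It is a hypothetical sketch of the general idea (spread runnable threads across cores before making any two of them share a pipeline), not the Solaris dispatcher's actual algorithm; `place_threads`, `pack_threads`, and their parameters are illustrative names.

```python
from typing import List


def place_threads(n_threads: int, cores: int, strands_per_core: int) -> List[int]:
    """Return the core chosen for each thread, spreading load across cores.

    Each new thread goes to the core with the fewest threads already placed
    (lowest core index on ties), so pipelines are shared only once every
    core already has work.
    """
    if n_threads > cores * strands_per_core:
        raise ValueError("more threads than hardware strands")
    load = [0] * cores
    placement = []
    for _ in range(n_threads):
        core = min(range(cores), key=lambda c: load[c])  # first least-loaded core
        load[core] += 1
        placement.append(core)
    return placement


def pack_threads(n_threads: int, strands_per_core: int) -> List[int]:
    """Naive contrast: fill every strand of one core before using the next."""
    return [t // strands_per_core for t in range(n_threads)]


if __name__ == "__main__":
    # T1-like layout: 8 cores with 4 strands each, 10 runnable threads.
    print("spread:", place_threads(10, 8, 4))
    print("packed:", pack_threads(10, 4))
```

With 10 threads on the 8x4 layout, the spreading policy yields `[0, 1, 2, 3, 4, 5, 6, 7, 0, 1]`, so only cores 0 and 1 have two threads sharing a pipeline. The naive packer puts four threads on each of cores 0 and 1 and two on core 2, leaving five cores idle.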
Looking ahead, it's likely that workload characterization is going to be an important
(and interesting) area of research. For example, we know that throughput
on threaded processor architectures is maximized when threads running
on the same core (sharing the same pipeline) are able to effectively
execute in each other's stall cycles. CPI (Cycles Per Instruction) should
therefore be an interesting metric to note when trying to characterize a
given workload's scaling ability on this architecture: a thread with a high
CPI spends many cycles stalled, leaving room for the other threads on its
core. What other workload
characteristics and metrics will be important/useful to collect and observe?
We've got our work cut out for us. :)
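The stall-cycle argument can be illustrated with a deliberately simplified throughput model. The sketch below assumes a core that issues at most one instruction per cycle (`peak_ipc = 1.0`) and perfect interleaving, so each thread contributes 1/CPI instructions per cycle until the pipeline saturates. The function names are hypothetical, and real cores also contend for caches and other shared resources that this model ignores.

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles Per Instruction over a measured interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def core_ipc(thread_cpi: float, threads: int, peak_ipc: float = 1.0) -> float:
    """Idealized core throughput (instructions per cycle) for N interleaved threads.

    A thread running alone retires 1 / thread_cpi instructions per cycle. If the
    threads perfectly fill each other's stall cycles, their rates add up until
    the pipeline's peak issue rate is reached.
    """
    if thread_cpi <= 0 or threads < 0:
        raise ValueError("thread_cpi must be positive and threads non-negative")
    return min(peak_ipc, threads / thread_cpi)


if __name__ == "__main__":
    stall_heavy = cpi(cycles=400, instructions=100)  # CPI 4.0: stalled 3 of 4 cycles
    for n in (1, 2, 4):
        print(n, "thread(s):", core_ipc(stall_heavy, n))
```

In this model, a per-thread CPI of 4.0 keeps the pipeline only 25% busy with one thread, 50% with two, and fully busy with four. With a CPI of 1.5, two threads already saturate it, so additional threads on that core add little. That is why per-thread CPI is a useful first hint about how well a workload might scale on a threaded core.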
Technorati Tags: Niagara, NiagaraCMT, Solaris