Keith Bierman's Weblog


Tuesday December 06, 2005

What I want to work with next

Typically one blogs about what one has done or discovered. In this entry, I'll talk about an area I want to spend some time working in RealSoonNow.

As a performance analyst, much of my work has been reductionist. That is, I take some application and make it go faster. Step 0 is to measure it before doing anything to it, followed by figuring out little bits that don't go as fast as they ought (or determining that the wrong algorithms were used, and doing some wholesale surgery ;>) and iterating. The key has always been to isolate the smallest bits possible (faster turnaround, better leverage, etc.). And such work has been rewarding in many different ways. But most of the time, my computers are not focused on running just one application.
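The "measure first, then isolate a small piece" loop described above can be sketched in a few lines of Python. This is only an illustration: `measure`, `slow_sum`, and `fast_sum` are hypothetical names invented here, and the "hot spot" is a toy, not anything from a real application.

```python
import time

def measure(func, *args, repeats=5):
    """Return the best wall-clock time (in seconds) over several runs of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def slow_sum(n):
    # Hypothetical hot spot: a deliberately naive loop.
    total = 0
    for i in range(n):
        total += i
    return total

def fast_sum(n):
    # Candidate replacement: closed-form sum of 0..n-1.
    return n * (n - 1) // 2

if __name__ == "__main__":
    n = 200_000
    # Step 0: measure before changing anything.
    baseline = measure(slow_sum, n)
    # Check that the replacement gives the same answer, then measure again.
    assert slow_sum(n) == fast_sum(n)
    improved = measure(fast_sum, n)
    print(f"baseline {baseline:.6f}s, improved {improved:.6f}s")
```

Taking the best of several runs is one common way to reduce noise from whatever else the machine is doing; it says nothing about how the rest of the system behaves, which is exactly the limitation discussed next.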

My laptop, for example, currently has over 300 threads in 80 processes running, and I'm not even driving it very hard. If I want to focus on any specific process or thread, tools such as Performance Analyzer (or its earlier, more primitive predecessors, such as gprof) are fine.

But if what I really want to do is maximize the performance of the overall system (throughput), I've largely been toolless.

Worse, everything that my friends at Intel (see their last couple of Intel Developer Fora) have been saying is that they are going to move to a strongly multicore strategy (Justin Rattner spoke of hundreds to thousands of cores, and ElReg reported this as http://www.theregister.co.uk/2005/03/04/intel_100_core/). 

With the DProfile utility (keyword dataspace if you want to search for it at docs.sun.com) developed by Nicolai Kosche and friends, it's now possible to see how all the various threads and processes actually interact inside the memory hierarchy.

Of course, this took a lot of infrastructure: SPARC needed to supply enough runtime instrumentation, Solaris the APIs (including DTrace), the compilers their own instrumentation (for optimal results), and the Performance Analyzer extensions to collect and display the appropriate information (this is where that keyword dataspace comes in handy for docs.sun.com searches).

No doubt Intel and Microsoft will eventually have as many threads in a chip as Sun does today with Niagara (2010++??). No doubt, someday Windows+++ will have mature support for highly threaded applications (in addition to robustly supporting heavily loaded systems). Intel has, of course, purchased several suppliers of threaded tools, so ... and to be fair, the hardware threads only have to be on a single board to provide much of the same software opportunity (of course, the RAS is much better with just one chip ;>).

But why wait? Clearly such "complicated" environments are no longer the sole province of supercomputer users and major IT departments. A power desktop user probably has far more challenging apps than I have on my laptop, and visual processing is easily parallelized. So getting started now with the next generation of tools is going to be a lot like it must have been for the first radiologists: lots to learn, with brand new shiny technology!

So keep your eyes peeled for anything from *.sun.com with words like DProfile or dataspace and dig in!


[ T:http://technorati.com/tag/NiagaraCMT]

(2005-12-06 10:01:01.0)

Amazingly stupid competitor quote

You have to wonder if they've been misquoted:


Don't they even bother to read the literature?

UltraSPARC T (nee Niagara) does break a lot of ground for a microprocessor. But effectively reducing latency (which caches are intended to do) is something that multithreading is known to be good for. So megacaches aren't required.



[ T:http://technorati.com/tag/NiagaraCMT]


(2005-12-06 10:00:02.0)

Dec. 6th notable events

Of course, today is the big announcement of the first UltraSPARC T based systems (nee Niagara).

It is also the 2nd birthday of Jerry Sandor Bierman. When available, pictures from his birthday party will be posted on Flickr.

(2005-12-06 10:00:00.0)

Which Evolves Faster: Hardware or Software?

Conventional wisdom has always had it that hardware is the "long pole"
in system design. Software can be changed up until the last moment (and
even beyond via patches). So the conventional answer is, of course, that
software evolves faster.

But for large, complex software, is it really true? Let's consider the
new UltraSPARC T chips (formerly known as Niagara). As can be found
#link to hw_blog (anyone know the best pointer?) there are 32 hardware
threads per chip.

Given that these hardware threads are quite fundamentally different from
32 separate cores, just how does an OS such as Solaris deal with
them? The answer is by ignoring the differences and, to a first
approximation, treating them as 32 "CPUs".
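To make the "32 CPUs" abstraction concrete, here is a small Python sketch. `os.cpu_count()` reports the number of logical CPUs the OS exposes, so hardware threads are counted the same way as cores. The `group_by_core` helper and its numbering scheme (consecutive ids share a core) are assumptions made up for illustration, as is the choice of 4 threads per core in the usage line; none of this describes how Solaris actually numbers or schedules hardware threads.

```python
import os

def logical_cpu_count():
    """Number of CPUs the OS reports; hardware threads are counted as CPUs."""
    return os.cpu_count() or 1

def group_by_core(n_logical, threads_per_core):
    """Group logical CPU ids into cores.

    Assumes consecutive ids share a core. This layout is an illustrative
    assumption, not the real numbering used by any particular OS.
    """
    return [
        list(range(start, min(start + threads_per_core, n_logical)))
        for start in range(0, n_logical, threads_per_core)
    ]

if __name__ == "__main__":
    print("logical CPUs seen by this OS:", logical_cpu_count())
    # Assumed layout for a 32-thread chip with 4 threads per core.
    for core_id, threads in enumerate(group_by_core(32, 4)):
        print(f"core {core_id}: logical CPUs {threads}")
```

A tool that only sees the flat list of logical CPUs has no way to tell that several of them share one core's execution resources, which is the kind of detail the flat "CPU" view hides.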

This mostly works well, but it is the interesting performance corner
cases that cause great confusion, because the tools (e.g. mpstat) haven't
really evolved to keep up with the hardware.

Sometimes the hardware does evolve faster.

Of course, the point of software layers such as an Operating System is to provide abstraction of hardware details. Deciding just which hardware details need to be directly exposed is a deliberate process.

(2005-12-06 08:07:14.0)
