
Tuesday December 06, 2005
What I want to work with next.
Typically one blogs about what one has done, or has discovered. In this
entry, I'll talk about an area I want to spend some time working in
RealSoonNow.
As a performance analyst, much of my work has been reductionist.
That is, I take some application and make it go faster. Step 0 is to
measure it before doing anything to it, followed by figuring out little
bits that don't go as fast as they ought (or determining that the wrong
algorithms were used, and doing some wholesale surgery ;>) and
iterating. The key has always been to isolate the smallest bits
possible (faster turnaround, better leverage, etc.). And such work has
been rewarding in many different ways. But most of the time, my
computers are not focused on running just one application.
My laptop, for example, currently has over 300 threads in 80 processes running. I'm
not even driving it very hard. If I want to focus on any of the
specific processes or threads, tools such as Performance Analyzer (or
its earlier, more primitive predecessors, such as gprof) are fine.
But if what I really want to do is to maximize the performance of the
overall system (throughput), I've largely been toolless.
Worse, my friends at Intel (see their last couple of Intel Developer
Fora) have been saying that they are going to move to a strongly
multicore strategy (Justin Rattner spoke of hundreds to thousands of
cores, and ElReg reported this at http://www.theregister.co.uk/2005/03/04/intel_100_core/).
With the DProfile utility developed by Nicolai Kosche and friends
(search docs.sun.com for the keyword "dataspace"), it's now possible to
see how all the various threads and processes actually interact inside
the memory hierarchy.
Of course, this took a lot of infrastructure: SPARC needed to supply
enough runtime instrumentation, Solaris the APIs (including DTrace),
the compilers the instrumentation (for optimal results), and the
Performance Analyzer the extensions to collect and display the appropriate
information (this is where that keyword "dataspace" comes in handy for docs.sun.com searches).
No doubt Intel and Microsoft will eventually have as many threads in a
chip as Sun does today with Niagara (2010++??). No doubt someday Windows+++ will
have mature support for highly threaded applications (in addition to
robustly supporting heavily loaded systems). Intel has, of course,
purchased several suppliers of threaded tools, so ... And to be fair, the hardware threads only have to be on a single board to provide much of the same software opportunity (of course, the RAS is much better with just one chip ;>).
But why wait? Clearly such "complicated" environments are no longer the
sole province of supercomputer users and major IT departments (and a
power desktop user probably has far more challenging apps than I have
on my laptop; visual processing is easily parallelized ...), so getting
started now with the next generation of tools is going to be a lot like
it must have been for the first radiologists: lots to learn, with brand-new,
shiny technology!
So keep your eyes peeled for anything from *.sun.com with words like
DProfile or dataspace and dig in!
(2005-12-06 10:01:01.0)
Amazingly stupid competitor quote
You have to wonder if they've been misquoted:
Don't they even bother to read the
literature? UltraSPARC T (née Niagara) does break a lot of ground for a microprocessor. But reducing effective latency (which is what caches are intended to do) is something multithreading is known to be good for. So megacaches aren't required.
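A back-of-the-envelope model shows why. This is a simplified textbook idealization, not a description of how Niagara's pipeline actually schedules threads. Assume each thread computes for $C$ cycles and then stalls for $L$ cycles on a cache miss, and that a core can switch to any ready thread at no cost. With $N$ threads per core, the pipeline can be busy whenever at least one thread is ready, so its utilization is at best roughly:

```latex
U(N) \approx \min\left(1,\; \frac{N\,C}{C + L}\right)

% Illustrative (hypothetical) numbers: C = 25, L = 75
U(1) = \frac{25}{25 + 75} = 0.25
\qquad
U(4) = \min\left(1,\; \frac{4 \cdot 25}{100}\right) = 1
```

The values $C = 25$ and $L = 75$ are purely illustrative. Under these assumptions a single thread keeps the pipeline busy only a quarter of the time, while four threads per core (the UltraSPARC T1 arrangement: 8 cores times 4 threads = 32) can cover the stall time without a larger cache. The model ignores contention for shared caches and memory bandwidth, which matters in practice.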
(2005-12-06 10:00:02.0)
Dec. 6th notable events
Of course, today is the big announcement of the first UltraSPARC T based systems (née Niagara).
It is also the 2nd birthday of Jerry Sandor Bierman. When available, pictures from his birthday party will be posted on Flickr.
(2005-12-06 10:00:00.0)
Which Evolves Faster: Hardware or Software?
Conventional wisdom has always had it that hardware is the "long pole"
in system design. Software can be changed up until the last moment (and
even beyond via patches). So the conventional answer is, of course, that
software evolves faster.
But for large, complex software, is it really true? Let's consider the
new UltraSPARC T chips (formerly known as Niagara), which have 32 hardware
threads per chip.
Given that these hardware threads are fundamentally quite different from
having 32 separate cores, just how does an OS such as Solaris deal with
them? The answer: it ignores the differences and, to a first
approximation, treats them as 32 "CPUs".
Treating hardware threads as CPUs mostly works well, but there are interesting
performance corner cases that cause great confusion ... because the tools
(e.g., mpstat) haven't really evolved to keep up with the hardware.
Sometimes the hardware does evolve faster.
Of course, the point of software layers such as an Operating System is
to abstract away hardware details. Deciding just which hardware details
need to be directly exposed is a deliberate process.
(2005-12-06 08:07:14.0)