Keith Bierman's Weblog


20051215 Thursday December 15, 2005

Language Standardization considered harmful? Les Hatton makes a reasonable argument for it.

....This is exacerbated by the process of language standardisation. We would all agree that standardisation is an important step forward in engineering maturity, however, if the process of standardisation ignores historical lessons, then this may well be worse than useless. Language standardisation suffers from two important drawbacks as practised today. First of all, language committees (and I’ve sat on a few in my time), have an irresistible temptation to fiddle. They will persist in adding features which seem like a good idea at the time, without any notion as to whether they will work or not. Of course, this is normal in engineering. It is similar to the role of mutation in Darwinian evolution. What is not normal however is the second drawback. This embodies the opposing principle to control process feedback. It is called “backwards compatibility” and is often expressed in the hallowed rule that “thou shalt not break old code”. So drawback one guarantees the continual injection of features which may or may not work, (most don’t) and drawback two guarantees that you can’t take them out again. In other words it is a technique whereby learning from previous mistakes is guaranteed not to take place. In backwards compatibility, you take as a starting point all the failure modes which have occurred so far and then add new and poorly understood failure modes. We call the result a modern programming language. If other engineering disciplines pursued this doctrine, hammers for example would have micro-processor controlled ejection mechanisms to cause the head to fly off randomly every few minutes as they used to about 40 years ago when made with wooden handles. Not surprisingly, they were redesigned fairly quickly.

The longtime reader (all three of you) may recognize some resonance with my comments to Bill Moffitt in a previous entry. I think that the rule about not breaking existing code is a must, but the consequence (as I described to Bill in the case of Fortran) is that the model for language evolution ought to be more like Algol (begat C, begat C++, begat Java). Not that the old language necessarily dies, but the Standards group should limit itself to a revision or two with substantive new features and then restrict itself to canonization of existing practice and harmonization of feature sets (or bindings to other things, etc.). The creative energy of the language's supporters should go into the "next language", which will be free to discard the bits of the original language found to be errors in practice.

(2005-12-15 15:39:39.0) Permalink Comments [3]

20051214 Wednesday December 14, 2005

Xeon is just as Good as Opteron (says HP!) Thanks to the folks at theInq for calling our attention to this one. Aside from the Inq's warning that you need an HP signon, also be prepared for a Microsoft Word source document. It's actually a rather nice writeup, and its conclusion seems to be "true": if your workload is bottlenecked on I/O, having a faster processor with a faster memory interface won't help.

Of course, that really is pretty obvious and doesn't take 21 pages to justify, now does it?

In a variety of places, their detailed analysis favors the AMD processor, for example from page 14:

1. The more content that can be cached in memory the greater the Opteron advantage. This performance difference is tied to the different designs of the two processors... detailing the FSB vs. HT issues

"Countered with"
2. If any of the server sub-components become a bottleneck, the Opteron memory access speed advantage is negated.

So the obvious solution is to favor systems with fast processors, fast and ample memory bandwidth and fast I/O subsystems.

Unfortunately, that doesn't result in "all processors being equal", at least not if you buy the right subsystems ;>

(2005-12-14 13:18:40.0) Permalink

20051212 Monday December 12, 2005

The sad saga of xemacs vs. gnu emacs I'd been a longtime user of the Xemacs that came packaged with the Sun Studio tools years ago. I knew there was a split, and a dreamed-of merge ... but I'd never really quizzed Ben Wing or Martin Buckholtz (sp?) (two of the Sun engineers who contributed the linkage code, and were Xemacs maintainers) about the how and why of the split and fork.

Here is a pointer to at least one side's worth of the sad saga.
(2005-12-12 13:25:28.0) Permalink Comments [2]

20051209 Friday December 09, 2005

What Some Customers Are Saying...part 1
Unix Admin
Spanish
Psynix
Linux Stuff
zdnet (ok, not quite users)
Ben Rockwood (look for HP Marketing)
Bach
Performance Guru
Rich Teer

Kshitij

dotNetMavin
zdnet (again, mostly quoting JS)
(2005-12-09 15:07:21.0) Permalink Comments [0]

20051208 Thursday December 08, 2005

Caches Considered Harmful For what seems like forever, designers have been adding more and more cache to systems to reduce latency to memory. This has been successful, and while it hasn't been the only approach, it has been the most typical.

But has it been Good?

  • Caches are very energy intensive (essentially a large amount of SRAM close to the CPU). The larger they are, the more energy wasteful they are.
  • Caches, on average, produce a benefit on the order of sqrt(size), so the heat outpaces the benefit.
    • Of course, with heat you pay several times. You pay for the electricity to create the heat, you pay in the system design to cool the device, you pay in the data center to cool the entire system, and you typically pay a price in RAS because heat kills.
    • Notably, adding cores (provided there is enough memory bandwidth) provides nearly linear improvement in throughput.
    • And for cache experts, increasing the associativity increases their effectiveness.
  • Caches help us avoid dealing with the underlying issue of doing useful work while waiting for memory. Putting off the harder work of innovation, or at least limiting the innovation to the process level rather than the architecture level, is a form of laziness.
  • When the data one wants isn't in the cache, it's often worse than it would have been had there been no cache (so fancy non-cache-polluting loads and stores may be added to the ISA, the compiler, etc.).
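To put rough numbers on the sqrt(size) point, here is a small back-of-the-envelope sketch. It takes the rule of thumb from the list at face value (cache benefit grows like the square root of size) and treats core scaling as perfectly linear, which is the idealized case that assumes memory bandwidth keeps up. The function names are made up for illustration, and this is a toy model, not a measurement of any real chip.

```python
import math

def cache_speedup(size_factor: float) -> float:
    """Rule-of-thumb benefit from growing a cache by size_factor (benefit ~ sqrt(size))."""
    return math.sqrt(size_factor)

def core_speedup(core_factor: float, efficiency: float = 1.0) -> float:
    """Throughput gain from multiplying the core count.

    efficiency=1.0 is the idealized "nearly linear" case and assumes
    memory bandwidth is not the bottleneck.
    """
    return core_factor * efficiency

# Compare what the same multiplier buys when spent on cache versus on cores.
for factor in (2, 4, 16):
    print(f"{factor:>2}x resources: cache ~{cache_speedup(factor):.2f}x, "
          f"cores ~{core_speedup(factor):.2f}x")
```

Under these assumptions, quadrupling the cache buys roughly a 2x benefit, while quadrupling the cores buys close to 4x the throughput.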
So what are the alternatives to caching?

As in the citations above, the key observation is that if one has additional "threads" ready to do useful work, that work can be done while awaiting the data to be returned (from memory, from cache, from disk, ... wherever) rather than leaving all that hot and possibly expensive iron (silicon) idle. And that's precisely what UltraSPARC T1 does.
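The latency-hiding idea can be sketched with a very simple model: each thread computes for a while, then stalls waiting for data, and a core with several ready threads can switch to another one during the stall. The function and the cycle counts below are hypothetical, chosen only to make the arithmetic easy to follow. They are not UltraSPARC T1 figures.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of time a core does useful work in a simple latency-hiding model.

    Each thread alternates compute_cycles of work with stall_cycles of waiting.
    With enough threads the stalls are fully overlapped and the core saturates
    at 1.0. Thread-switch overhead is ignored (an assumption of this sketch).
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    busy = threads * compute_cycles
    return min(1.0, busy / (compute_cycles + stall_cycles))

# Hypothetical workload: 25 cycles of work, then a 75-cycle wait for memory.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): core busy {core_utilization(n, 25, 75):.0%} of the time")
```

In this toy workload a single thread keeps the core busy a quarter of the time, and four threads are enough to cover the stalls completely, without any cache involved.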

So when you hear someone making a spurious claim about the UST1 being cache starved, ask them how big a cache they think it should have, and why? What level of associativity? What's the downside? And, of course, point out that the application performance is what counts, and it doesn't support the contention that the UltraSPARC T1 family systems are cache starved.


NB: of course, caches aren't all bad. If you are focused on minimum latency (fastest response time for a single thread) they can be very effective. But if your goal is the most aggregate work for the least power, they are certainly not your friend.

To learn more about caches


(2005-12-08 16:57:41.0) Permalink Comments [2]

20051207 Wednesday December 07, 2005

With All Due Respect Jonathan 9.6GHz is not the clock of our UltraSPARC T1 processor. As any hardware engineer knows, clock speed is a simply measurable entity: you stick test leads on the appropriate wires and count the phase transitions. The correct number is 1.2GHz.

Now, as software engineers we know that folks like to compute theoretical maximum operation rates, and 1.2GHz * 8 cores does yield 9.6 GOPS as a theoretical limit. And thanks to many years of industry confusion, probably due to the infamous Dhrystone benchmark, clock rate and operations per second have become hopelessly confused in the minds of many. 
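For the record, the arithmetic looks like this. The sketch assumes at most one operation per cycle per core, which is what the 9.6 figure implies; the function name is made up for illustration.

```python
def peak_ops_per_second(clock_hz: float, cores: int, ops_per_cycle_per_core: int = 1) -> float:
    """Theoretical peak operation rate: clock rate times cores times ops per cycle.

    This is an upper bound, not a clock speed and not a prediction of real work done.
    """
    return clock_hz * cores * ops_per_cycle_per_core

peak = peak_ops_per_second(1.2e9, 8)  # 1.2GHz clock, 8 cores
print(f"{peak / 1e9:.1f} GOPS")       # prints "9.6 GOPS"
```

Note that the units are operations per second, not hertz. The clock stays at 1.2GHz no matter how many cores are multiplied in.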

But fundamentally there is no link between clock and work done: you can have really, really simple instructions (the limiting case is a single instruction), but then you need a very large number of instructions per unit of useful work done ;>

As our Opteron-based systems should have demonstrated, clock speed isn't a solid predictor of performance (Intel has a faster clock, and poorer performance).

I suppose one can argue that it is "pleasing that in lots of scenarios we do see scaling that is in fact more or less linear with the core count". See a benchmark focused blog for examples.



(2005-12-07 14:52:04.0) Permalink Comments [1]

20051206 Tuesday December 06, 2005

What I want to work with next. Typically one blogs about what one has done, or has discovered. In this entry, I'll talk about an area I want to spend some time working in RealSoonNow.

As a performance analyst, much of my work has been reductionist. That is, I take some application and make it go faster. Step 0 is to measure it before doing anything to it, followed by figuring out little bits that don't go as fast as they ought (or determining that the wrong algorithms were used, and doing some wholesale surgery ;>) and iterating. The key has always been to isolate the smallest bits possible (faster turnaround, better leverage, etc.). And such work has been rewarding in many different ways. But most of the time, my computers are not focused on running just one application.

My laptop, for example, currently has over 300 threads in 80 processes running. I'm not even driving it very hard. If I want to focus on any of the specific processes or threads, tools such as Performance Analyzer (or its earlier, more primitive predecessors, such as gprof) are fine.

But if what I really want to do is to maximize the performance of the overall system (throughput) I've largely been toolless.

Worse, everything that my friends at Intel (see their last couple of Intel Developer Fora) have been saying is that they are going to move to a strongly multicore strategy (Justin Rattner spoke of hundreds to thousands of cores, and ElReg reported this as http://www.theregister.co.uk/2005/03/04/intel_100_core/). 

With the DProfile utility (keyword dataspace if you want to search for it at docs.sun.com) developed by Nicolai Kosche and friends, it's now possible to see how all the various threads and processes actually interact inside the memory hierarchy.

Of course, this took a lot of infrastructure: SPARC needed to supply enough runtime instrumentation, Solaris the APIs (including DTrace), the compilers their own instrumentation (for optimal results), and the Performance Analyzer extensions to collect and display the appropriate information (this is where that keyword dataspace comes in handy for docs.sun.com searches).

No doubt Intel and Microsoft will eventually have as many threads in a chip as Sun does today with Niagara (2010++??) No doubt, someday Windows+++ will have mature support for highly threaded applications (in addition to robustly supporting heavily loaded systems). Intel has, of course, purchased several suppliers of threaded tools so ... and to be fair, the hardware threads only have to be on a single board to provide much of the same software opportunity (of course, the RAS is much better with just one chip ;>)

But why wait? Clearly such "complicated" environments are no longer the sole province of supercomputer users and major IT departments (and a power desktop user probably has a lot more challenging apps than I have on my laptop, visual processing is easily parallelized....) so getting started now with the next generation of tools is going to be a lot like it must have been for the first radiologists. Lots to learn, with brand new shiny technology!

So keep your eyes peeled for anything from *.sun.com with words like DProfile or dataspace and dig in!


[ T:http://technorati.com/tag/NiagaraCMT]

(2005-12-06 10:01:01.0) Permalink Comments [0]

Amazingly stupid competitor quote You have to wonder if they've been misquoted:


Don't they even bother to read the literature?

UltraSPARC T (nee Niagara) does break a lot of ground for a microprocessor. But effectively reducing latency (which caches are intended to do) is something multithreading is known to be good for. So megacaches aren't required.



[ T:http://technorati.com/tag/NiagaraCMT]


(2005-12-06 10:00:02.0) Permalink Comments [0]

Dec. 6th notable events Of course, today is the big announcement of the first UltraSPARC T based systems (nee Niagara).

It is also the 2nd birthday of Jerry Sandor Bierman. When available, pictures from his birthday party will be located on Flickr
(2005-12-06 10:00:00.0) Permalink Comments [0]

Which Evolves Faster: Hardware or Software? Conventional wisdom has always had it that hardware is the "long pole"
in system design. Software can be changed up until the last moment (and
even beyond via patches). So the conventional answer is, of course, that
software evolves faster.

But, for large complex software is it really true? Let's consider the
new UltraSPARC T chips (formerly known as Niagara). As can be found
#link to hw_blog (anyone know the best pointer?) there are 32 hardware
threads per chip.

Given that these hardware threads are quite fundamentally different than
having 32 separate cores, just how does an OS such as Solaris deal with
them? The answer is by ignoring the differences and to a first
approximation treating them as 32 "CPUs".
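From the application side, that abstraction shows up as a plain count of logical CPUs. A minimal sketch using Python's standard `os.cpu_count()`, which reports logical CPUs and says nothing about whether they are separate cores or hardware threads sharing a core; the helper name is made up, and the value printed depends entirely on the machine it runs on.

```python
import os

def logical_cpus() -> int:
    """Return the number of logical CPUs the OS reports (at least 1).

    os.cpu_count() can return None when the count cannot be determined,
    so fall back to 1 in that case.
    """
    count = os.cpu_count()
    return count if count else 1

print(f"The OS reports {logical_cpus()} logical CPUs")
```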

This mostly works well; but it's the interesting performance corner cases
that cause great confusion ... because the tools (e.g. mpstat) haven't
really evolved to keep up with the hardware.

Sometimes the hardware does evolve faster.

Of course, the point of software layers such as an Operating System is to provide abstraction of hardware details. Just which hardware details need to be directly exposed is a deliberate process.

(2005-12-06 08:07:14.0) Permalink Comments [0]
