BM Seer Unofficial thoughts from an anonymous Sun employee

POWER6 err, try >10GB/s peak

Wednesday Jun 27, 2007

I finally saw an IBM POWER6 presentation that says:

    "Balanced design with highest system bandwidth
    – 2X Memory Bandwidth (> 10 GB/sec)"
What happened to the 300GB/s bandwidth implied by the IBM press release, the bandwidth that could download the iTunes library in 30 sec? Oh, that was meaningless marketing fluff (is that the right word?).

So to move the iTunes library from POWER6 memory to the POWER6 CPU at the 10GB/s listed above would really take 15 minutes (30 times the claimed 30 seconds), and that is at p570 peak memory speed (not measured!).

    postscript: Another doc listed 45GB/s SMP+IO bandwidth (whatever that means). Time for IBM to publish measured STREAM bcopy performance so we can put an end to this nonsense.

:) On a Sun Fire E25K going to disk (yes, IO, which is slower than IBM's memory) it would take about half that 15-minute time. Sun demonstrated a delivered 21 GB/sec of delivered disk-to-CPU bandwidth. (Yes, I need to say 'delivered' twice, as IBM has a tendency to only mention peak numbers and then omit the word 'peak'.)
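For anyone who wants to check the arithmetic, here is a rough sketch. It takes the press-release figures at face value (300 GB/s for 30 seconds) to infer a library size, which is an assumption and not a measured number, and then divides by the two bandwidths quoted above. The function and variable names are made up for illustration.

```python
# Figures quoted in the post; the library size is inferred from them, not measured.
CLAIMED_BW_GB_PER_S = 300.0   # bandwidth implied by the press release
CLAIMED_SECONDS = 30.0        # download time quoted alongside it


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Time to move size_gb at a sustained bandwidth_gb_per_s."""
    return size_gb / bandwidth_gb_per_s


# Library size implied by the claim: 300 GB/s * 30 s
library_gb = CLAIMED_BW_GB_PER_S * CLAIMED_SECONDS

rates = {
    "POWER6 peak memory (10 GB/s)": 10.0,
    "Sun Fire E25K delivered disk (21 GB/s)": 21.0,
}

for label, rate in rates.items():
    secs = transfer_seconds(library_gb, rate)
    print(f"{label}: {secs:.0f} s, about {secs / 60:.1f} min")
```

Swap in a different claimed time or bandwidth to see how sensitive the comparison is to the marketing numbers.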

OK, we don't have delivered IO performance measured on an IBM p595, as IBM doesn't trust the public with those numbers...

For a news article about bigger systems, see this:
http://www.betanews.com/article/Sun_We_Can_Build_a_Faster_Supercomputer_Than_IBM/1182889189

Comments:

If you put so much emphasis on the STREAM benchmark, what do you think of this then?
--------------------------------------------------------------------------------
Machine ID                      ncpus    COPY     SCALE       ADD     TRIAD  (MB/s)
--------------------------------------------------------------------------------
HP_AlphaServer_GS1280-1300        64   407351.0   400142.0   437010.0   431450.0
HP_AlphaServer_GS1280-1300        32   207441.0   207441.0   226230.0   224390.0
Sun_SPARC_Enterprise_M9000       128   224401.0   223113.0   224271.0   227059.0
Sun_SPARC_Enterprise_M9000        64   114920.0   114618.0   130035.0   134369.0
Sun_SPARC_Enterprise_M8000        32    60313.6    60230.8    69301.1    69629.9
--------------------------------------------------------------------------------
Those Alpha results are dated 2004, and the Alphas themselves are even older. Still, they seem to stack up very nicely against the newish Suns.

Posted by Mike on June 30, 2007 at 11:50 PM PDT #

The GS1280 used a torus NUMA architecture and Rambus memory. Each Alpha 21364 processor had two on-chip memory controllers, for a total bidirectional memory bandwidth per processor of 12.3 GB/sec. This very high bandwidth per processor for 2004-era technology is due entirely to the very high bandwidth Rambus memory, and a memory controller and chip architecture designed to take advantage of it.

The actual bisection bandwidth of the GS1280 was 49.6 GB/sec, par for the course in the 2003-2004 timeframe.

Generally, a torus interconnect sacrifices system bisection bandwidth for nearby processor to processor bandwidth. This is more useful in HPC systems where jobs sharing data can be localized to a few adjacent processors. But for large, ad-hoc databases, such locality cannot be guaranteed. Which is probably why HP did not do TPC-H runs on the GS1280.

HP's ability to produce the STREAM results shows Tru64 has NUMA memory locality support that basically makes all bandwidth local. Indeed, the GS1280 STREAM results divided by the number of processors get close to the 6.2 GB/sec one-direction limit of the processor.
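A quick sketch of that division, using the COPY column from the table posted earlier in this thread. It assumes the table is in MB/s with 1 MB = 10^6 bytes (the usual STREAM convention) and simply divides by the ncpus column as listed. Whether the 6.2 GB/sec figure is decimal or binary gigabytes is not stated here, so treat the comparison as approximate.

```python
# (machine, ncpus, COPY in MB/s) copied from the table in the earlier comment
rows = [
    ("HP_AlphaServer_GS1280-1300", 64, 407351.0),
    ("HP_AlphaServer_GS1280-1300", 32, 207441.0),
    ("Sun_SPARC_Enterprise_M9000", 128, 224401.0),
    ("Sun_SPARC_Enterprise_M9000", 64, 114920.0),
    ("Sun_SPARC_Enterprise_M8000", 32, 60313.6),
]


def per_cpu_gb_per_s(copy_mb_per_s, ncpus):
    """COPY bandwidth per CPU in decimal GB/s (assumes 1 GB = 1000 MB)."""
    return copy_mb_per_s / ncpus / 1000.0


for machine, ncpus, copy in rows:
    print(f"{machine:28s} {ncpus:4d} cpus  {per_cpu_gb_per_s(copy, ncpus):.2f} GB/s per cpu")
```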

So yes, a processor nobody wanted and a memory technology nobody wanted, when joined together with a pretty decent OS, could produce very good results in 2004.

If anything, the GS1280 results show one of the limits of STREAM. On a NUMA system, it does not actually measure interconnect bandwidth.

Posted by Mark on July 01, 2007 at 12:10 PM PDT #

Alpha is the processor nobody wanted? Sigh. I don't know about TPC-H benchmarks, but the TPC-C results are impressive. OK, TPC-C is probably easy to partition to exploit memory locality. But I believe SAP SD results would count then? "hp AlphaServer Model GS 1280, 32-way SMP, Alpha 21364C (EV7) 1150 MHz, 1835 KB L2 cache" does 23220 SAPS, and "Sun Fire 15000, 72-way SMP, UltraSPARC III, 1200 MHz, 8 MB L2 cache" does 29820 SAPS. I'd say that's not too bad for 2002-era tech (date of certification is 01/27/2003).
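Put as ratios, using only the two SAPS figures quoted above and treating "N-way" as the CPU count (a simplification, since the two systems differ in much more than CPU count), a small sketch:

```python
# SAPS figures and CPU counts as quoted above
gs1280_saps, gs1280_cpus = 23220, 32   # hp AlphaServer GS 1280, 32-way
sf15k_saps, sf15k_cpus = 29820, 72     # Sun Fire 15000, 72-way

# Whole-system throughput of the GS1280 relative to the Sun Fire 15000
system_ratio = gs1280_saps / sf15k_saps

# Throughput per CPU on each system
gs1280_per_cpu = gs1280_saps / gs1280_cpus
sf15k_per_cpu = sf15k_saps / sf15k_cpus

print(f"GS1280 / Sun Fire 15000 system SAPS: {system_ratio:.1%}")
print(f"SAPS per CPU: GS1280 {gs1280_per_cpu:.0f}, Sun Fire 15000 {sf15k_per_cpu:.0f}")
```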

Posted by Mike on July 01, 2007 at 08:00 PM PDT #

HP never ran a TPC-C on the GS1280. They ran it on the 4-socket ES45.

Yes, the 32-socket GS1280 did well on the SAP result. Notice HP didn't run the benchmark on the 64-socket version. Why? Scaling problems, perhaps? This would make sense as system bisection bandwidth was probably the same for the 64-socket and 32-socket versions of the GS1280, but latencies would be double for the bigger box.

Alpha, when paired with Tru64, was the ISA/API platform the software developers and ISVs ignored. Sure it was fast, and very well suited for smaller SMP designs, but DEC could not do the big SMPs which Sun and HP were doing in the late 1990s. That makes sense, the VAX had a uniprocessor legacy. The poor SMP implementations doomed Alpha when ERP and big Oracle databases dominated IT spending in the late 1990s.

HP may have had a hot SAP box in the AlphaServer, but 99% of HP customers running SAP on UNIX did it on PA-RISC, not Alpha.

Alpha's lack of success proves it is the system, not the processor, which matters for computers, and it is developer support of the ISA/API platform which determines the success of a compute platform.

How fast did the Alpha run applications which were not available for Tru64? That is the speed test which really matters, and the speed test which killed the Alpha.

Posted by Mark on July 03, 2007 at 12:45 PM PDT #

I'd say that HP never ran the TPC-C benchmark on Alpha boxes because they had to sell HP-PAs and Itanium, but we'll never know... "DEC could not do the big SMPs which Sun and HP were doing in the late 1990s"? It greatly depends on how you count, I think. If a 32-way machine runs Oracle+SAP at 77% of the rate of the really big E15K, isn't it big enough "for most of us"? I believe it was HP and bad marketing that killed Alpha, not a speed test.

Posted by Mike on July 03, 2007 at 06:29 PM PDT #
