BM Seer Facts & Questions from an Anonymous Sun Source

Datacenter Throughput is King: SPEC CPU2006 Rate Blade System World Record: Sun Blade 6000 w/T6320

Friday Apr 11, 2008

What most people forget is that datacenters are really throughput engines. I don't know of any datacenter (besides home ones) that uses only one thread or one core. When you look at racks of servers in a datacenter, you are looking at thousands of threads, which means 10,000 to 100,000 or more in a complete datacenter. Lots of work to be done, lots of threads doing it!

Sun has announced blade system world record results for SPECint_rate2006 and SPECfp_rate2006. These results were run on the Sun Blade 6000 system with 10 Sun Blade T6320 server modules which use the 1.4 GHz UltraSPARC T2 processor.

The Sun Blade 6000 system fully populated with 10 T6320 server modules delivered a SPECint_rate2006 score of 838, a world record result for blade systems.

The Sun Blade 6000 system (10 RUs) powered by 10 Sun UltraSPARC T2 1.4 GHz processors provides 73% more integer throughput than the IBM p 570 (16 RUs) equipped with 8 POWER6 4.7 GHz processors, as measured by SPECint_rate2006.

The Sun Blade 6000 system fully populated with 10 T6320 server modules delivered a blade system world record SPECfp_rate2006 score of 571.

Sun has chosen to submit a single run as both SPECfp_rate_base2006 and SPECfp_rate2006 (which is allowed under the run rules) in order to emphasize that, even without aggressive tuning, the score of 571 is a blade system record for both base and peak.

The Sun Blade 6000 system powered by 10 Sun UltraSPARC T2 1.4 GHz processors provides 55% more floating-point throughput than the IBM p 570 equipped with 8 POWER6 4.7 GHz processors, as measured by SPECfp_rate_base2006 (571 vs. 369).

The IBM p 570 system uses 1.6x as many rack units as the Sun Blade 6000 system (16 RU vs. 10 RU).
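The percentage claims above can be recomputed from the published scores. Here is a minimal sketch, assuming only the numbers quoted in this post; the dictionary layout and the `percent_more` helper are made up for illustration.

```python
# Scores and rack units as quoted in this post (results as of 7 Apr 2008).
SUN_BLADE_6000 = {"int_peak": 838, "fp_base": 571, "rack_units": 10}
IBM_P570 = {"int_peak": 484, "fp_base": 369, "rack_units": 16}


def percent_more(ours, theirs):
    """Return how many percent larger `ours` is than `theirs`."""
    return (ours / theirs - 1.0) * 100.0


int_advantage = percent_more(SUN_BLADE_6000["int_peak"], IBM_P570["int_peak"])
fp_advantage = percent_more(SUN_BLADE_6000["fp_base"], IBM_P570["fp_base"])
rack_ratio = IBM_P570["rack_units"] / SUN_BLADE_6000["rack_units"]

print(f"SPECint_rate2006:     {int_advantage:.0f}% more throughput")
print(f"SPECfp_rate_base2006: {fp_advantage:.0f}% more throughput")
print(f"Rack units, IBM vs. Sun: {rack_ratio:.1f}x")
```

Swapping in other systems from the charts below only requires changing the two dictionaries.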

SPEC CPU2006 Performance Charts - bigger is better, selected recent results

SPECint_rate2006

Please see www.spec.org for complete results

System                  Processor      GHz  Chips  Cores  Threads  Peak  Base
Sun B6000 w/10 x T6320  UltraSPARC T2  1.4     10     80      640   838   752
HP Superdome            Itanium 2      1.6     32     64       64   824   770
Sun M9000               SPARC VI       2.4     32     64       64   650   553
IBM p 570               POWER6         4.7      8     16       32   484   420

Results as of 7 Apr 2008 from www.spec.org.

SPECfp_rate2006

Please see www.spec.org for complete results or for just SPECfp_rate2006 results ordered by peak score.

System                  Processor      GHz  Chips  Cores  Threads  Peak  Base
Sun M9000               SPARC VI       2.4     32     64       64   600   556
Sun B6000 w/10 x T6320  UltraSPARC T2  1.4     10     80      640   571   571
IBM p 570               POWER6         4.7      8     16       32   430   369
HP rx8640               Itanium 2      1.6     16     32       32   371   357

Results as of 7 Apr 2008 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. First is the CPU measure, which is the performance on the suite as a single stream. This can be either a single thread or a run that the compiler has parallelized automatically. This measure is further divided into base and optimized (peak) runs. Base uses the same compiler flags for all kernels, whereas optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

The second measure is Rate. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results.
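As a rough sketch of how a rate score comes together: as I understand the SPEC CPU2006 run rules, each benchmark's rate ratio is the number of copies times the reference time divided by the measured time, and the suite score is the geometric mean of those ratios. The benchmark names and timings below are invented for illustration and are not real SPEC data.

```python
import math


def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Per-benchmark rate ratio: copies * reference time / measured time."""
    return copies * reference_seconds / elapsed_seconds


def geometric_mean(values):
    """Geometric mean, computed via logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical two-benchmark "suite"; names and timings are made up.
hypothetical_runs = [
    {"name": "bench_a", "copies": 4, "reference_seconds": 1000.0, "elapsed_seconds": 500.0},
    {"name": "bench_b", "copies": 4, "reference_seconds": 2000.0, "elapsed_seconds": 4000.0},
]

ratios = [
    rate_ratio(r["copies"], r["reference_seconds"], r["elapsed_seconds"])
    for r in hypothetical_runs
]
score = geometric_mean(ratios)
print(f"per-benchmark ratios: {ratios}, suite rate score: {score:.1f}")
```

Because the score is a geometric mean, one very fast benchmark cannot dominate the result the way it would with an arithmetic mean.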

Disclosure Statement:

SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 838 SPECint_rate2006, 752 SPECint_rate_base2006.

SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 838 SPECint_rate2006, 752 SPECint_rate_base2006. IBM p 570 (POWER6, 8 chips, 16 cores), 484 SPECint_rate2006, 420 SPECint_rate_base2006.

SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 571 SPECfp_rate2006, 571 SPECfp_rate_base2006.

SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 571 SPECfp_rate_base2006. IBM p 570 (POWER6, 8 chips, 16 cores), 369 SPECfp_rate_base2006.

Results Summary

Results
Reference Date: Apr 7, 2008
System: Sun Blade 6000 with 10 T6320 Modules
Processor: 10 Sun UltraSPARC T2, 1.4 GHz
  838 SPECint_rate2006
  752 SPECint_rate_base2006
  571 SPECfp_rate2006
  571 SPECfp_rate_base2006
Software: Solaris 10, Sun Studio 12 compiler, gccfss


TPC-C Reminder

Monday Apr 30, 2007

When Sun had the world record, we said TPC-C was too simplistic and old, and that was years ago. TPC-C has problems, and IBM has heavily tuned it (see below). Why does IBM still point to this 14+ year old benchmark? Why do they avoid new benchmarks with the latest-GHz full-system IBM p595 on:

  • SPECjbb2005?
  • SPECint_rate2006?
  • SPECfp_rate2006?
  • Linpack?
  • SPECint_2006?
  • SPECfp_2006?
  • ....the list goes on...

Doesn't IBM want fair comparisons? I guess IBM would just be beaten by Sun in performance and $/perf so they want to avoid comparisons.

It is funny that last year I egged HP on about SPECjbb2005, "why no results?" Someone commented that HP thinks it is a bad benchmark, so they won't publish on it. Now HP has the top result. Changed their tune?

Notice how this differs from Sun: when Sun established a world record on TPC-C, Sun told the world the benchmark was too simplistic, and it is sticking to that. The world has become a lot more complicated in the past 7 years and computing has evolved a lot, so we won't go back to something that was created 14 years ago. Sun never quotes the 23-year-old Dhrystone benchmark anymore either. :)

The press and analysts overwhelmingly see TPC-E as the successor to the simplistic 14-year-old TPC-C.

IBM's TPC-C "tuning"(?) that won't apply to anything in the real world

June 2005 Interview with Bruce Lindsay (IBM Fellow) at http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf

    "And the good news is that about 40-70% of the stuff we do in performance tuning actually ends up helping end users."

This means that 30% to 60% of IBM's TPC-C tunings don't help users.

Really, beyond the huge disk size of the large TPC-C results (which has a lot to do with TPC-C being 14 years old), the quote below points to tuning that is legal but seems a bit too "tricky" for my taste...

    "We get down to the level of worrying about the physical column order in the table so the reference columns are near each other, minimizing cache misses during fetching. This is feasible in the TPC-C benchmark because there are only five tables and only ten to fifteen columns in each table. In a more realistic application, where there are many more queries to be considered, the tables are typically much, much wider, in the 80 to 100 column range; and there are dozens if not thousands of tables. Then this kind of analysis is no longer practical." Bruce Lindsay, IBM Fellow

For those who may not remember, IBM didn't even end the EOL'ed SPECint_rate2000 on a high note. See: http://www.spec.org/cpu2000/results/rint2000.html and search for "1644" and "1513"

various footnotes:

"It's well-understood in the technical communities that TPC-C no longer represents current customer workloads since the transaction load that its models are made of are small, primitive and disconnected transactions. While this model was acceptable for the workloads of the late 1980s, it misses the mark..." Sun's World Record TPC-C Press release, August 2000

Disclosure Statement

The TPC-C result referenced above was the overall performance world record as of August 31, 2000. Sun Enterprise 10000 server (Starfire) running Sybase Adaptive Server Enterprise (ASE), 156,873.03 transactions per minute (tpmC), $48.81 price/tpmC, available February 28, 2001. A full disclosure report and executive summary are available through the TPC Web site located at www.tpc.org.


Power5+ now off the road?

Saturday Mar 31, 2007

IBM lacks Power5+ benchmarks on new & old workloads that everyone else is publishing on. Why no latest-GHz full-system IBM p595 publications on:

  • SPECjbb2005?
  • SPECint_rate2006?
  • SPECfp_rate2006?
  • Linpack?
  • SPECint_2006?
  • SPECfp_2006?
  • ....the list goes on...

Don't they want comparisons? I hear IBM bloggers still love TPC-C, so is the IBM p595 only suited for that very old (14-year-old) test? The press and analysts overwhelmingly see TPC-E as the successor to the simplistic 14-year-old TPC-C. Seven years ago, when Sun established a world record on TPC-C, Sun told the world the benchmark was too simplistic. It is good to see the rest of the industry catching up. Sun never quotes the 23-year-old Dhrystone benchmark anymore either. :)

For those who may not remember, IBM didn't even end the EOL'ed SPECint_rate2000 on a high note: see http://www.spec.org/cpu2000/results/rint2000.html and search for "1644" and "1513". Since we're talking history, I should be clear and state that by "1513" I wasn't talking about the year in which Juan Ponce de Leon is definitely known to have sighted what is now the USA and claimed it for Spain. :)


IBM & TPC-C - more hints at over-optimisation

Tuesday Feb 20, 2007

Another clue to IBM's over-optimisation of TPC-C? Let's look historically. Since 2002, IBM has sped up SPECint_rate2000 by 6.1x. Clearly this was due to newer systems, faster GHz, higher thread counts, improved caches, and software improvements.

Funny: at the same time, IBM increased TPC-C by 10x. Since these are the same systems, there must be a lot more software work to get this kind of increase!

In 4-5 years the IBM TPC-C tuning outpaced the SPECint_rate2000 tuning by 64%, and this is 10 years after TPC-C was made public, so before 2002 there must already have been plenty of time to properly index and tune a database. Considering all of the compiler work on SPECint_rate2000, it seems like IBM went to a lot of extra effort on TPC-C.

Somewhat funny, but looking at the post earlier today, it seems like things are lining up.

The math:
From the IBM p690 ca May 02 to the current IBM p595:

  • SPECint_rate2000: 1513 / 249 = 6.1x
  • TPC-C tpmC: 4033378 / 403255 = 10.0x
In the last 4-5 years the IBM high-end tpmC has outpaced the high-end SPECint_rate2000 by
    10.0 / 6.1 = 1.64x -> 64%
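The same arithmetic as a small script, for anyone who wants to plug in other results. This is a minimal sketch that uses only the figures from the disclosure statements in this post and rounds the way the post does; the `growth` helper is a made-up name.

```python
# Published results quoted in this post's disclosure statements.
SPECINT_RATE2000 = {"p690_2002": 249, "p5_595": 1513}
TPMC = {"p690_2002": 403255, "p5_595": 4033378}


def growth(old, new):
    """Speed-up factor from the old result to the new one, rounded to one decimal as in the post."""
    return round(new / old, 1)


spec_growth = growth(SPECINT_RATE2000["p690_2002"], SPECINT_RATE2000["p5_595"])
tpcc_growth = growth(TPMC["p690_2002"], TPMC["p5_595"])
outpaced_by = round(tpcc_growth / spec_growth, 2)

print(f"SPECint_rate2000 growth: {spec_growth}x")
print(f"TPC-C tpmC growth:       {tpcc_growth}x")
print(f"tpmC outpaced SPECint_rate2000 by {outpaced_by}x ({(outpaced_by - 1) * 100:.0f}%)")
```

Carrying the unrounded growth factors through gives a slightly larger ratio (about 1.65), so the rounding does not change the conclusion.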

Disclosure Statements

IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) 4,033,378 tpmC, 2.97 US $/tpmC, Avail 01/22/07, IBM DB2 9, IBM AIX 5L V5.3, Microsoft COM+. As of May 21, 2002: IBM eServer pSeries 690 Turbo (1300 MHz, 32 CPU), 403,255.36 tpmC, $19.57/tpmC, available by November 22, 2002. Results as of 2/15/07, see http://www.tpc.org.

IBM System p5 595 (Power5+ 2.3GHz 64p, 128thread), 64 cores, 32 chips, 2 cores/chip (SMT on), 1513 SPECint_rate2000. IBM eServer pSeries 690 Turbo (1300 MHz, 32 CPU) 249 SPECint_rate2000. SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 2/15/07.


judging by the wrong things: IBM & TPC-C

Tuesday Feb 20, 2007

Is IBM 3.3x or 1.4x faster? I guess it depends on whether you use an over-optimised benchmark like TPC-C. As mentioned yesterday, IBM doesn't publish on a variety of standard benchmarks like SPECint_rate2006 or SPECjbb2005 on their high-end systems, so we have to look at SPECint_rate2000, which is just about to be EOL'ed and completely replaced by SPECint_rate2006.

First let's compare an IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) to an HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, single core/CPU) on SPECint_rate2000.

Constructing a SPECint_rate2000 ratio
1.4x = 1513/1108
we find that the IBM 595 is 1.4x faster. That makes sense because this isn't the latest HP dual-core Itanium2. Both the IBM and HP systems have results on both TPC-C and SPECint_rate2000.

OK, now using TPC-C, let's compare an IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) to an HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, single core/CPU).

Constructing a TPC-C ratio
3.3x = 4033378/1231433
What? Comparing the same systems, the IBM is 3.3x faster?!? It looks like TPC-C over-inflates what can be expected from IBM.
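To put the two ratios side by side, here is a minimal sketch using only the results from the disclosure statements in this post; the `RESULTS` table and the `advantage` helper are names I made up for the example.

```python
# Results quoted in this post's disclosure statements (as of 2/15/07).
RESULTS = {
    "IBM p5 595": {"specint_rate2000": 1513, "tpmC": 4033378},
    "HP Integrity Superdome": {"specint_rate2000": 1108, "tpmC": 1231433},
}


def advantage(metric, faster="IBM p5 595", slower="HP Integrity Superdome"):
    """Ratio of the `faster` system's result to the `slower` system's on one metric."""
    return RESULTS[faster][metric] / RESULTS[slower][metric]


spec_ratio = advantage("specint_rate2000")
tpcc_ratio = advantage("tpmC")
gap = tpcc_ratio / spec_ratio  # how much bigger the TPC-C lead is than the SPECint lead

print(f"SPECint_rate2000: IBM is {spec_ratio:.1f}x faster")
print(f"TPC-C tpmC:       IBM is {tpcc_ratio:.1f}x faster")
print(f"The TPC-C lead is {gap:.1f}x the SPECint_rate2000 lead")
```

The two benchmarks measure different things, so the gap is a hint rather than proof, but it is the number that made me look twice.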

My guess is IBM over-optimised and played lots of tuning tricks on TPC-C, correct? So is TPC-C relevant to customers if this is the case?

...maybe that's why seven years ago Sun, upon publishing a world record TPC-C result said:

"It's well-understood in the technical communities that TPC-C no longer represents current customer workloads since the transaction load that its models are made of are small, primitive and disconnected transactions. While this model was acceptable for the workloads of the late 1980s, it misses the mark..."
http://www.sun.com/smi/Press/sunflash/2000-08/sunflash.20000831.1.html

You'll also notice the Aug 2000 press release said, "Customer workloads nowadays require a more ad hoc workload than the TPC-C specifies."

Disclosure Statements

IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) 4,033,378 tpmC, 2.97 US $/tpmC, Avail 01/22/07, IBM DB2 9, IBM AIX 5L V5.3, Microsoft COM+. HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread), 1,231,433 tpmC, 4.82 US $/tpmC, Avail 06/05/06, Microsoft SQL Server 2005 Enterprise Edt SP1, Microsoft Windows Server 2003 Datacenter Ed.(64-bit)SP1. Results as of 2/15/07, see http://www.tpc.org.

IBM System p5 595 (Power5+ 2.3GHz 64p, 128thread), 64 cores, 32 chips, 2 cores/chip (SMT on), 1513 SPECint_rate2000. HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, 16 cells), 64 cores, 64 chips, 1 core/chip, 1108 SPECint_rate2000. SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 2/15/07.

The world record TPC-C result referenced above was the overall performance world record as of August 31, 2000. Sun Enterprise 10000 server (Starfire) running Sybase Adaptive Server Enterprise (ASE), 156,873.03 tpmC, $48.81 price/tpmC, available February 28, 2001. A full disclosure report and executive summary are available through the TPC Web site located at http://www.tpc.org.
