Friday Apr 11, 2008
What most people forget is that datacenters are really throughput
engines. I don't know of any datacenter (besides home ones) that uses
only one thread or one core. When you look at racks of servers in a datacenter, you are
looking at thousands of threads, which means 10,000 to 100,000 or more in a complete datacenter. Lots of work to be done, lots of threads doing it!
Sun has announced blade system world record results for
SPECint_rate2006 and SPECfp_rate2006.
These results were
run on the Sun Blade 6000 system with 10 Sun Blade T6320 server modules which
use the 1.4 GHz UltraSPARC T2 processor.
The Sun Blade 6000 system fully populated with 10 T6320 server modules
delivered a SPECint_rate2006 score of 838, a world record result for
blade systems.
The Sun Blade 6000 system (10 RUs) powered by 10 Sun UltraSPARC T2 1.4 GHz
processors provides 73% more integer throughput than the IBM p 570 (16 RUs)
equipped with 8 POWER6 4.7 GHz processors, as measured by
SPECint_rate2006.
The Sun Blade 6000 system fully populated with 10 T6320 server modules
delivered a blade system world record SPECfp_rate2006 score of 571.
Sun has chosen to submit a single run as both
SPECfp_rate_base2006 and SPECfp_rate2006
(which is allowed under the run rules) in order
to emphasize that, even without aggressive tuning, the
score of 571 is a blade system record for both base and peak.
The Sun Blade 6000 system powered by 10 Sun UltraSPARC T2 1.4 GHz
processors provides 55% more floating-point throughput than the IBM p 570
equipped with 8 POWER6 4.7 GHz processors, as measured by
SPECfp_rate_base2006 (571 vs. 369).
The IBM p 570 system uses 1.6x as many rack units as the Sun Blade 6000 system (16 RU vs. 10 RU).
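The percentage comparisons above are simple ratios of published scores. A minimal sketch of that arithmetic, using only the Sun and IBM numbers quoted in the tables of this post (the dictionary keys and the `percent_more` helper are names made up for illustration):

```python
# Scores and rack units as quoted in this post (results as of 7 Apr 2008).
sun = {"int_peak": 838, "fp_base": 571, "rack_units": 10}
ibm = {"int_peak": 484, "fp_base": 369, "rack_units": 16}

def percent_more(a, b):
    """Return how much larger a is than b, as a rounded percentage."""
    return round((a / b - 1) * 100)

# Integer throughput compared on peak, floating point compared on base.
int_advantage = percent_more(sun["int_peak"], ibm["int_peak"])
fp_base_advantage = percent_more(sun["fp_base"], ibm["fp_base"])
rack_unit_ratio = ibm["rack_units"] / sun["rack_units"]

print(f"SPECint_rate2006:     Sun is {int_advantage}% higher")
print(f"SPECfp_rate_base2006: Sun is {fp_base_advantage}% higher")
print(f"Rack units:           IBM uses {rack_unit_ratio:.1f}x as many")
```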
SPEC CPU2006 Performance Charts -
bigger is better, selected recent results
SPECint_rate2006
Please see
www.spec.org for complete results
| System | Processor Type | GHz | Chips | Cores | Threads | Peak | Base |
|---|---|---|---|---|---|---|---|
| Sun B6000 w/10 x T6320 | UltraSPARC T2 | 1.4 | 10 | 80 | 640 | 838 | 752 |
| HP Superdome | Itanium 2 | 1.6 | 32 | 64 | 64 | 824 | 770 |
| Sun M9000 | SPARC VI | 2.4 | 32 | 64 | 64 | 650 | 553 |
| IBM p 570 | POWER6 | 4.7 | 8 | 16 | 32 | 484 | 420 |
Results as of 7 Apr 2008 from www.spec.org.
SPECfp_rate2006
Please see
www.spec.org for complete results
or for just
SPECfp_rate2006 results ordered by peak score.
| System | Processor Type | GHz | Chips | Cores | Threads | Peak | Base |
|---|---|---|---|---|---|---|---|
| Sun M9000 | SPARC VI | 2.4 | 32 | 64 | 64 | 600 | 556 |
| Sun B6000 w/10 x T6320 | UltraSPARC T2 | 1.4 | 10 | 80 | 640 | 571 | 571 |
| IBM p 570 | POWER6 | 4.7 | 8 | 16 | 32 | 430 | 369 |
| HP rx8640 | Itanium 2 | 1.6 | 16 | 32 | 32 | 371 | 357 |
Results as of 7 Apr 2008 from www.spec.org.
Benchmark Description
SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and
CINT2006. CFP2006 targets floating-point performance, while CINT2006
targets integer performance.
Each suite has two different measures. First is the CPU measure, which
is the performance on the suite as a single stream. This can be either
a single thread or a run automatically parallelized by the compiler. This
measure is further defined by base and optimized runs. Base uses the same
compiler flags for all kernels, whereas optimized is allowed to use different
compiler flags for each kernel. Results are compared against a baseline
system run that was standardized by SPEC.
The second measure is Rate. It measures how many copies of the CPU
benchmarks can be run at a time. Typically, it is run as n processes on n
processors. It shows how well the same job mix can run on a system
under some load. It is also run as a base and optimized set of
results.
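To make the Rate measure concrete, here is a rough sketch. It assumes the usual description of the CPU2006 rate metric: each benchmark's rate is the number of copies times the reference time divided by the elapsed time, and the suite score is the geometric mean of those per-benchmark rates. The benchmark names and timings are invented for illustration and are not real results.

```python
import math

def benchmark_rate(copies, reference_seconds, elapsed_seconds):
    """Rate for one benchmark: copies run concurrently, scaled by ref/elapsed."""
    return copies * reference_seconds / elapsed_seconds

def suite_rate(rates):
    """Overall score: geometric mean of the per-benchmark rates."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

# Hypothetical data: (name, reference time in s, elapsed time in s with all copies running).
runs = [
    ("bench_a", 9000.0, 6000.0),
    ("bench_b", 8000.0, 8000.0),
]
copies = 4  # hypothetical: one copy per hardware thread on a 4-thread system

rates = [benchmark_rate(copies, ref, elapsed) for _, ref, elapsed in runs]
score = suite_rate(rates)
print(f"per-benchmark rates: {rates}, suite rate: {score:.2f}")
```

Because the score scales with the number of copies a system can keep running without slowing down, a machine with many threads can post a high rate even when each individual thread is modest.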
Disclosure Statement:
SPEC, SPECint reg tm of Standard Performance Evaluation Corporation.
Sun result submitted to SPEC,
other results from www.spec.org as of 4/7/08.
Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores),
838 SPECint_rate2006, 752 SPECint_rate_base2006.
IBM p 570 (POWER6, 8 chips, 16 cores), 484 SPECint_rate2006,
420 SPECint_rate_base2006.
SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation.
Sun result submitted to SPEC,
other results from www.spec.org as of 4/7/08.
Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores),
571 SPECfp_rate2006, 571 SPECfp_rate_base2006.
IBM p 570 (POWER6, 8 chips, 16 cores),
369 SPECfp_rate_base2006.
Results Summary
| Results | |
|---|---|
| Reference Date: | Apr 7, 2008 |
| System: | Sun Blade 6000 with 10 T6320 Modules |
| Processor: | 10 Sun UltraSPARC T2, 1.4 GHz |
| | 838 SPECint_rate2006 |
| | 752 SPECint_rate_base2006 |
| | 571 SPECfp_rate2006 |
| | 571 SPECfp_rate_base2006 |
| Software: | Solaris 10, Sun Studio 12 Compiler gccfss |
Monday Apr 30, 2007
When Sun had the world record we said it was too simplistic and old, and
that was years ago. TPC-C has problems, and IBM has heavily tuned it.
Why does IBM still point to this 14+ year old benchmark? Why do they
avoid new benchmarks with the latest-GHz full-system IBM p595 on:
- SPECjbb2005?
- SPECint_rate2006?
- SPECfp_rate2006?
- Linpack?
- SPECint_2006?
- SPECfp_2006?
- ....the list goes on...
Doesn't IBM want fair comparisons? I guess IBM would just be beaten by Sun
in performance and $/perf so they want to avoid comparisons.
It is funny that last year I egged HP on about SPECjbb2005, "why no results?"
Someone commented that HP thinks it is a bad benchmark, so they won't publish
on it. Now HP has the top result. Changed their tune?
Notice how this is different from Sun? When Sun established a World Record TPC-C result, it told the world the benchmark was too simplistic,
and it is sticking to that. The world has become a lot more complicated in the past 7 years
and computing has evolved a lot, so we won't go back to something that was created
14 years ago. Sun never quotes the 23-year-old Dhrystone benchmark anymore either.
The press and analysts overwhelmingly
see TPC-E as the successor to the simplistic 14-year-old TPC-C.
IBM's TPC-C "tuning"(?) that won't apply to anything in the real
world
June 2005 Interview with Bruce Lindsay (IBM Fellow) at
http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf
"And the good news is that about 40-70% of the stuff we do in
performance tuning actually ends up helping end users."
This means that 30% to 60% of IBM's TPC-C tunings don't help users.
Beyond the huge disk size of the large TPC-C results (which has a lot to do with TPC-C being 14 years old),
the quote below points to tuning that is legal but seems a bit too "tricky" for my taste...
"We get down to the level of worrying about the physical column order in the table so the reference columns are near each other, minimizing cache misses during fetching. This is feasible in the TPC-C benchmark because there are only five tables and only ten to fifteen columns in each table. In a more realistic application, where there are many more queries to be considered, the tables are typically much, much wider, in the 80 to 100 column range; and there are dozens if not thousands of tables. Then this kind of analysis is no longer practical." Bruce Lindsay, IBM Fellow
For those who may not remember, IBM didn't even end the EOL'ed SPECint_rate2000 on a high note. See:
http://www.spec.org/cpu2000/results/rint2000.html and search for "1644" and "1513"
Various footnotes:
"It's well-understood in the technical communities that TPC-C no longer represents current customer workloads since the transaction load that its models are made of are small, primitive and disconnected transactions. While this model was acceptable for the workloads of the late 1980s, it misses the mark..." Sun's World Record TPC-C press release, August 2000
Disclosure Statement
The TPC-C result referenced above was the fastest overall performance world record as of August 31, 2000. Sun Enterprise 10000 server (Starfire) running Sybase Adaptive Server Enterprise (ASE), 156,873.03 transactions per minute (tpmC), $48.81 price/tpmC, available February 28, 2001. A full disclosure report and executive summary are available through the TPC Web site located at
www.tpc.org.
Saturday Mar 31, 2007
IBM lacks Power5+ benchmarks on new & old workloads that
everyone else is publishing on. Why no latest-GHz full-system IBM p595 publications
on:
- SPECjbb2005?
- SPECint_rate2006?
- SPECfp_rate2006?
- Linpack?
- SPECint_2006?
- SPECfp_2006?
- ....the list goes on...
Don't they want comparisons?
I hear IBM bloggers still love TPC-C, so is the IBM p595 only suited for that very old
(14-year-old) test? The press and analysts overwhelmingly
see TPC-E as the successor to the simplistic 14-year-old TPC-C. 7 years ago, when Sun
established a World Record TPC-C result, Sun told the world the benchmark was too simplistic.
It is good to see the rest of the industry is catching up.
Sun never quotes the 23-year-old Dhrystone benchmark anymore either.
For those who may not remember, IBM didn't even end the EOL'ed SPECint_rate2000 on a high note:
http://www.spec.org/cpu2000/results/rint2000.html, search for "1644" and "1513"
Since we're talking history, I should be clear and state that by "1513" I wasn't
talking about the year that Juan Ponce de Leon definitely is known to have sighted what is
now the USA and claimed it for Spain.
Tuesday Feb 20, 2007
Another clue to IBM's over-optimisation of TPC-C? Let's look
historically. Since 2002, IBM has sped up SPECint_rate2000 by
6.1x. Clearly this was due to newer systems, faster GHz, higher
thread count, improved caches, and software improvements.
Funny: at the same time, IBM increased TPC-C by 10x. Since these are
the same systems, there must have been a lot more software work to get this
kind of increase!
In 4-5 years the IBM TPC-C tuning outpaced the SPECint_rate2000 tuning
by 64%, and this came 10 years after TPC-C was made public, so
before 2002 there must have been plenty of time to properly index and
tune a database. Considering all of the compiler work on SPECint_rate2000,
it seems like IBM went to a lot of extra effort on TPC-C.
Somewhat funny, but looking at the post earlier today, it seems like
things are lining up.
The math:
From the IBM p690 ca May 02 to the current IBM p595:
- SPECint_rate2000: 1513 / 249 = 6.1x
- TPC-C tpmC: 4033378 / 403255 = 10.0x
In the last 4-5 years the IBM high-end tpmC has outpaced the
high-end SPECint_rate2000 by
10.0 / 6.1 = 1.64x -> 64%
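The same arithmetic as a short sketch, using only the four published figures quoted above (the variable names are chosen here for illustration). It follows the post in rounding each growth factor to one decimal before dividing. Dividing the unrounded factors instead gives about 1.65x, so the conclusion does not depend on the rounding.

```python
# Published results quoted in this post: IBM p690 (May 2002) vs. IBM p5 595.
specint_rate2000 = {"p690": 249, "p595": 1513}
tpmc = {"p690": 403255, "p595": 4033378}

# Growth factor of each benchmark from the p690 to the p595.
spec_growth = specint_rate2000["p595"] / specint_rate2000["p690"]
tpcc_growth = tpmc["p595"] / tpmc["p690"]

# The post's method: round each factor first, then divide.
relative_rounded = round(tpcc_growth, 1) / round(spec_growth, 1)
# Alternative: divide the unrounded factors.
relative_exact = tpcc_growth / spec_growth

print(f"SPECint_rate2000 growth: {spec_growth:.1f}x")
print(f"TPC-C tpmC growth:       {tpcc_growth:.1f}x")
print(f"TPC-C vs. SPEC (rounded inputs): {relative_rounded:.2f}x")
print(f"TPC-C vs. SPEC (exact inputs):   {relative_exact:.2f}x")
```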
Disclosure Statements
IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) 4,033,378 tpmC, 2.97 US $/tpmC, Avail 01/22/07, IBM DB2 9, IBM AIX 5L V5.3, Microsoft COM+.
As of May 21, 2002: IBM eServer pSeries 690 Turbo (1300 MHz, 32 CPU), 403,255.36 tpmC, $19.57/tpmC, available by November 22, 2002.
Results as of 2/15/07, see http://www.tpc.org.
IBM System p5 595 (Power5+ 2.3GHz 64p, 128thread), 64 cores, 32 chips, 2 cores/chip (SMT on), 1513 SPECint_rate2000. IBM eServer pSeries 690 Turbo
(1300 MHz, 32 CPU) 249 SPECint_rate2000. SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 2/15/07.
Tuesday Feb 20, 2007
Is IBM 3.3x or 1.4x faster? I guess it depends on whether you use an
over-optimised benchmark like TPC-C. As mentioned yesterday,
IBM doesn't publish on a variety of standard benchmarks like
SPECint_rate2006 or SPECjbb2005 on their high-end systems, so we
have to look at SPECint_rate2000, which is just about to be EOL'ed
and completely replaced by SPECint_rate2006.
First let's compare an IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) to
a HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread, single core/CPU)
on SPECint_rate2000.
Constructing a SPECint_rate2000 ratio
1.4x = 1513/1108
we find that the IBM 595 is 1.4x faster. That makes sense because this
isn't the latest HP dual-core Itanium2. Both the IBM and HP systems have
results on both TPC-C and SPECint_rate2000.
OK, now using TPC-C, let's compare an IBM p5 595 (Power5+ 2.3GHz 64p,
128thread) to an HP Integrity Superdome
(Itanium2 1.6 GHz 64p, 64thread, single core/CPU).
Constructing a TPC-C ratio
3.3x = 4033378/1231433
what?
comparing the same systems the IBM is 3.3x faster ?!?
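Both ratios can be put side by side in a few lines. This sketch uses only the published results quoted in this post. The `gap` variable is introduced here for illustration and shows how much larger the IBM advantage appears on TPC-C than on SPECint_rate2000 for the same pair of systems.

```python
# Published results quoted in this post (as of 2/15/07).
ibm_p595 = {"specint_rate2000": 1513, "tpmc": 4033378}
hp_superdome = {"specint_rate2000": 1108, "tpmc": 1231433}

# IBM-to-HP ratio on each benchmark.
spec_ratio = ibm_p595["specint_rate2000"] / hp_superdome["specint_rate2000"]
tpcc_ratio = ibm_p595["tpmc"] / hp_superdome["tpmc"]

# How much the TPC-C advantage exceeds the SPECint_rate2000 advantage.
gap = tpcc_ratio / spec_ratio

print(f"SPECint_rate2000: IBM is {spec_ratio:.1f}x HP")
print(f"TPC-C tpmC:       IBM is {tpcc_ratio:.1f}x HP")
print(f"TPC-C advantage is {gap:.1f}x the SPECint_rate2000 advantage")
```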
It looks like TPC-C over-inflates what can be expected from IBM.
My guess is IBM over-optimised and played lots of tuning tricks
on TPC-C, correct? So is TPC-C relevant to customers if this
is the case?
...maybe that's why seven years ago Sun, upon publishing a world
record TPC-C result said:
"It's well-understood in the technical communities that TPC-C no longer
represents current customer workloads since the transaction load that
its models are made of are small, primitive and disconnected transactions.
While this model was acceptable for the workloads of the late 1980s, it
misses the mark..."
http://www.sun.com/smi/Press/sunflash/2000-08/sunflash.20000831.1.html
You'll also notice the Aug 2000 press release said, "Customer workloads
nowadays require a more ad hoc workload than the TPC-C specifies."
Disclosure Statements
IBM p5 595 (Power5+ 2.3GHz 64p, 128thread) 4,033,378 tpmC,
2.97 US $/tpmC, Avail 01/22/07, IBM DB2 9, IBM AIX 5L V5.3, Microsoft COM+.
HP Integrity Superdome (Itanium2 1.6 GHz 64p, 64thread), 1,231,433 tpmC,
4.82 US $/tpmC, Avail 06/05/06, Microsoft SQL Server 2005 Enterprise Edt SP1,
Microsoft Windows Server 2003 Datacenter Ed.(64-bit)SP1. Results as of
2/15/07, see http://www.tpc.org.
IBM System p5 595 (Power5+ 2.3GHz 64p, 128thread), 64 cores, 32 chips,
2 cores/chip (SMT on), 1513 SPECint_rate2000. HP Integrity Superdome
(Itanium2 1.6 GHz 64p, 64thread, 16 cells), 64 cores, 64 chips,
1 core/chip, 1108 SPECint_rate2000. SPEC, SPECint, SPECfp reg tm of
Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 2/15/07.
The world record TPC-C result referenced above was an overall performance
world record as of August 31, 2000. Sun Enterprise 10000 server (Starfire)
running Sybase Adaptive Server Enterprise (ASE), 156,873.03 tpmC, $48.81 price/tpmC, available February 28, 2001. A full disclosure report and executive summary are available through the TPC Web site located at
http://www.tpc.org.