BM Seer Facts & Questions from an Anonymous Sun Source

Datacenter Throughput is King: SPEC CPU2006 Rate Blade System World Record: Sun Blade 6000 w/T6320

Friday Apr 11, 2008

What most people forget is that datacenters are really throughput engines. I don't know of any datacenter (besides home ones) that uses only one thread or one core. When you look at racks of servers in a datacenter, you are looking at thousands of threads, which means 10,000 to 100,000 or more in a complete datacenter. Lots of work to be done, and lots of threads doing it!

Sun has announced blade system world record results for SPECint_rate2006 and SPECfp_rate2006. These results were run on the Sun Blade 6000 system with 10 Sun Blade T6320 server modules which use the 1.4 GHz UltraSPARC T2 processor.

The Sun Blade 6000 system fully populated with 10 T6320 server modules delivered a SPECint_rate2006 score of 838, a world record result for blade systems.

The Sun Blade 6000 system (10 RUs) powered by 10 Sun UltraSPARC T2 1.4 GHz processors provides 73% more integer throughput than the IBM p 570 (16 RUs) equipped with 8 POWER6 4.7 GHz processors, as measured by SPECint_rate2006.

The Sun Blade 6000 system fully populated with 10 T6320 server modules delivered a blade system world record SPECfp_rate2006 score of 571.

Sun has chosen to submit a single run as both SPECfp_rate_base2006 and SPECfp_rate2006 (which is allowed under the run rules) in order to emphasize that, even without aggressive tuning, the score of 571 is a blade-system record for both base and peak.

The Sun Blade 6000 system powered by 10 Sun UltraSPARC T2 1.4 GHz processors provides 55% more floating-point throughput than the IBM p 570 equipped with 8 POWER6 4.7 GHz processors, as measured by SPECfp_rate_base2006 (571 vs. 369).

The IBM p 570 system uses 1.6 times as many rack units as the Sun Blade 6000 system (16 RU vs. 10 RU).
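As a quick check, the comparisons above can be recomputed from the published scores in this post's tables. This minimal Python sketch uses only those figures (SPECint_rate2006 peak, SPECfp_rate_base2006, and rack units); the helper name `percent_more` is ours, not SPEC's. On these numbers the integer advantage is about 73%, the floating-point base advantage about 55%, and the rack-unit ratio 1.6.

```python
# Published scores quoted in this post (www.spec.org, as of 7 Apr 2008).
SUN_B6000 = {"int_peak": 838, "fp_base": 571, "rack_units": 10}
IBM_P570 = {"int_peak": 484, "fp_base": 369, "rack_units": 16}


def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


int_advantage = percent_more(SUN_B6000["int_peak"], IBM_P570["int_peak"])  # about 73.1
fp_advantage = percent_more(SUN_B6000["fp_base"], IBM_P570["fp_base"])     # about 54.7
ru_ratio = IBM_P570["rack_units"] / SUN_B6000["rack_units"]               # 1.6

print(f"SPECint_rate2006 (peak): {int_advantage:.0f}% more")  # 73% more
print(f"SPECfp_rate_base2006:    {fp_advantage:.0f}% more")   # 55% more
print(f"Rack units (IBM/Sun):    {ru_ratio:.1f}x")            # 1.6x
```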

SPEC CPU2006 Performance Charts - bigger is better, selected recent results

SPECint_rate2006

Please see www.spec.org for complete results

System                   Processor      GHz  Chips  Cores  Threads  Peak  Base
Sun B6000 w/10 x T6320   UltraSPARC T2  1.4     10     80      640   838   752
HP Superdome             Itanium 2      1.6     32     64       64   824   770
Sun M9000                SPARC64 VI     2.4     32     64       64   650   553
IBM p 570                POWER6         4.7      8     16       32   484   420

Results as of 7 Apr 2008 from www.spec.org.

SPECfp_rate2006

Please see www.spec.org for complete results or for just SPECfp_rate2006 results ordered by peak score.

System                   Processor      GHz  Chips  Cores  Threads  Peak  Base
Sun M9000                SPARC64 VI     2.4     32     64       64   600   556
Sun B6000 w/10 x T6320   UltraSPARC T2  1.4     10     80      640   571   571
IBM p 570                POWER6         4.7      8     16       32   430   369
HP rx8640                Itanium 2      1.6     16     32       32   371   357

Results as of 7 Apr 2008 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. The first is the speed measure, which is the performance on the suite as a single stream; this can be either a single-threaded run or a run automatically parallelized by the compiler. This measure is further divided into base and peak (optimized) runs: base uses the same compiler flags for all benchmarks, whereas peak may use different compiler flags for each benchmark. Each result is expressed as a ratio against a reference system run standardized by SPEC.

The second measure is rate, which measures throughput: how much work the system completes when several copies of each benchmark run at the same time. Typically, it is run as n copies on n processors. It shows how well a system handles the same job mix under load. Rate is also reported as base and peak (optimized) results.
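To make the speed and rate measures concrete, here is a minimal Python sketch of how per-benchmark ratios roll up into a suite score. It assumes the CPU2006 conventions as we understand them: a speed ratio is reference time divided by measured time, a rate ratio is the number of copies times reference time divided by elapsed time, and the suite score is the geometric mean of the per-benchmark ratios. The function names and the timings in the usage example are hypothetical, and the sketch ignores run-rule details such as taking the median of several runs.

```python
from statistics import geometric_mean


def speed_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Speed: how many times faster one copy runs than on the reference system."""
    return reference_seconds / measured_seconds


def rate_ratio(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Rate: copies run simultaneously, scaled by reference time / elapsed time."""
    return copies * reference_seconds / elapsed_seconds


def suite_score(ratios: list[float]) -> float:
    """Suite metric: geometric mean of the per-benchmark ratios."""
    return geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical timings for a two-benchmark "suite" (not real SPEC data).
    speed = [speed_ratio(9000.0, 1000.0), speed_ratio(8000.0, 2000.0)]  # 9.0, 4.0
    rate = [rate_ratio(64, 9000.0, 18000.0), rate_ratio(64, 8000.0, 32000.0)]  # 32.0, 16.0
    print(f"speed score: {suite_score(speed):.1f}")  # 6.0
    print(f"rate score:  {suite_score(rate):.1f}")   # 22.6
```

The geometric mean keeps any single benchmark from dominating the suite score, which is why a system must do well across the whole job mix.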

Disclosure Statement:

SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 838 SPECint_rate2006, 752 SPECint_rate_base2006. IBM p 570 (POWER6, 8 chips, 16 cores), 484 SPECint_rate2006, 420 SPECint_rate_base2006.

SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 571 SPECfp_rate2006, 571 SPECfp_rate_base2006.

SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 4/7/08. Sun Blade T6320 (UltraSPARC T2, 10 chips, 80 cores), 571 SPECfp_rate_base2006. IBM p 570 (POWER6, 8 chips, 16 cores), 369 SPECfp_rate_base2006.

Results Summary

Results
Reference Date: Apr 7, 2008
System: Sun Blade 6000 with 10 T6320 Modules
Processor: 10 Sun UltraSPARC T2, 1.4 GHz
  838 SPECint_rate2006
  752 SPECint_rate_base2006
  571 SPECfp_rate2006
  571 SPECfp_rate_base2006
Software: Solaris 10, Sun Studio 12 Compiler gccfss

World Record Single-Chip UltraSPARC T2 SPECint_rate2006 Performance with gccfss

Thursday Feb 14, 2008

World's fastest chip. The Sun SPARC Enterprise T5120 server, running at 1.4 GHz, delivered a world-record single-chip result of 83.9 SPECint_rate2006. Please remember that what matters is system and chip performance, not things inside a chip (like perf/transistor, perf/NAND-gate, perf/metal-layers, perf/thread, perf/bore, oops, perf/core, perf/silicon grain).

The Sun SPARC Enterprise T5120, using the GCC for SPARC Systems (gccfss) compiler, topped all competitors' single-chip results, including beating the IBM p 570 single-chip 4.7 GHz POWER6 result by 38%. IBM used its proprietary compiler, XL C/C++.

The Sun SPARC Enterprise T5120, using the gccfss compiler, beat the performance of the HP DL360 G5 with a single quad-core 3.16 GHz Xeon X5460 by 15%.
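The single-chip percentages above follow directly from the peak scores in the SPECint_rate2006 table in this post. A small Python sketch that recomputes them (the dictionary labels are ours):

```python
# Single-chip SPECint_rate2006 peak scores from the table in this post.
T5120_GCCFSS = 83.9  # Sun SPARC Enterprise T5120, UltraSPARC T2 1.4 GHz, gccfss
competitors = {
    "HP DL360 G5 (Xeon X5460 3.16 GHz)": 73.0,
    "IBM p 570 (POWER6 4.7 GHz, 1 chip)": 60.9,
}

advantage_pct = {}
for system, score in competitors.items():
    # Percentage by which the T5120 result exceeds this competitor's result.
    advantage_pct[system] = (T5120_GCCFSS / score - 1) * 100
    print(f"T5120 vs. {system}: {advantage_pct[system]:.0f}% higher")
```

This prints 15% for the HP DL360 G5 and 38% for the IBM p 570.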

The gccfss compiler lets developers combine Sun's SPARC optimization technology with popular gcc coding conventions, delivering performance that was not possible before without time-consuming code changes.

For more information on gccfss and how to get it, go to http://cooltools.sunsource.net/gcc/.

Sun also submitted results on the SPECfp_rate2006 benchmark suite using just a single disk. The Sun SPARC Enterprise T5120 server, running at 1.4 GHz, delivered a result of 62.1 SPECfp_rate2006.

The previously reported T5120/T5220 result relied on SPEC's electrical-equivalence rule, but that configuration used more disks than fit in a T5120. This single-disk result shows that performance is comparable regardless of the disk configuration.

SPEC CPU2006 Performance Charts - bigger is better, selected recent results, see www.spec.org for complete results

SPECint_rate2006

System               Processor      GHz   Chips  Cores  Threads  Peak  Base
T5120 (gccfss 4.2)   UltraSPARC T2  1.4       1      8       64  83.9  76.2
T5220 (gccfss 4.2)   UltraSPARC T2  1.4       1      8       64  83.2  75.6
T5120/T5220          UltraSPARC T2  1.4       1      8       64  78.5  73.0
T5220 (gccfss)       UltraSPARC T2  1.4       1      8       64  78.0  71.6
Asus P5E3            Intel QX9650   3.0       1      4        4  76.7  69.0
HP DL360 G5          Intel X5460    3.16      1      4        4  73.0  62.1
Asus P5E3            Intel QX6850   3.0       1      4        4  69.1  64.9
Dell T3400           Intel QX9650   3.0       1      4        4  68.8  61.4
IBM p 570            POWER6         4.7       1      2        4  60.9  53.2
Fujitsu RX100        Intel X3210    2.13      1      4        4  54.4  48.0

SPECfp_rate2006

System               Processor      GHz   Chips  Cores  Threads  Peak  Base
T6320                UltraSPARC T2  1.4       1      8       64  62.3  58.1
T5120/T5220          UltraSPARC T2  1.4       1      8       64  62.3  57.9
T5120 (one disk)     UltraSPARC T2  1.4       1      8       64  62.1  57.9
IBM p 570            POWER6         4.7       1      2        4  58.0  51.5
Asus P5E3            Intel QX9650   3.0       1      4        4  52.0  49.9
Dell T3400           Intel QX9650   3.0       1      4        4  47.2  44.9
HP DL360 G5          Intel X5460    3.16      1      4        4  44.5  41.3

Results as of 12 Feb 2008 from www.spec.org.

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • "Rate": system performance of CPUs, memory, and compiler
  • "Speed": single-thread performance of chip, memory, and compiler; not intended to stress multi-core designs

The strategic metrics include:

  • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • SPECfp_rate2006: throughput for 17 floating-point benchmarks derived from real applications, including chemistry, physics, genetics, and weather
  • "Base" variants of both of the above metrics, which require more conservative compilation, such as using the same flags for all benchmarks

Disclosure Statement:

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 2/12/08. Sun SPARC Enterprise T5120 gccfss (UltraSPARC T2, 1 chip, 8 cores), 83.9 SPECint_rate2006. IBM p570 (POWER6, 1 chip, 2 cores), 60.9 SPECint_rate2006. HP DL360 G5 (Xeon X5460, 1 chip, 4 cores), 73.0 SPECint_rate2006. Sun SPARC Enterprise T5120 (UltraSPARC T2, 1 chip, 8 cores), 62.1 SPECfp_rate2006.

Results Summary

Results
Reference Date: Feb 12, 2008
System: Sun SPARC Enterprise T5120
Processor: Sun UltraSPARC T2, 1.4 GHz
  83.9 SPECint_rate2006
  62.1 SPECfp_rate2006
Software: Solaris 10, Sun Studio 12 Compiler gccfss

UltraSPARC T2: more floating-point performance

Tuesday Aug 07, 2007

This posting covers more about floating-point performance on the Sun UltraSPARC T2. The previous posting discussed SPECfp_rate2006 scores and the open-sourcing of the UltraSPARC T2 design.

The UltraSPARC T2 has eight floating-point units, one per core, that are well suited to scientific applications. Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz beats all single-chip scores, showing 14230 (est) SPECompMbase2001 and 15081 (est) SPECompMpeak2001.

How do these preliminary runs (SPEC rules require us to use the term "estimated") compare with published SPECompMbase2001/SPECompMpeak2001 scores?

• These Sun UltraSPARC T2 1.4 GHz scores beat the best published single-chip result, from the IBM p520 with a 1.9 GHz POWER5+ processor, by nearly 85% on SPECompMpeak2001 (about 75% on SPECompMbase2001).
• ...Sun is still waiting for POWER6 4.7 GHz results. Maybe the UltraSPARC T2 results will scare IBM away from ever publishing a single-chip result?
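The size of the SPEComp advantage depends on which metric is compared. This Python sketch recomputes both gaps from the figures in the disclosure statement below; the peak gap works out to about 84.5% and the base gap to about 75%.

```python
# SPEComp2001 medium-suite scores from the disclosure statement.
t2_est = {"base": 14230, "peak": 15081}  # UltraSPARC T2 1.4 GHz (estimated)
p520 = {"base": 8141, "peak": 8174}      # IBM p520 POWER5+ 1.9 GHz (published)

gain_pct = {}
for metric in ("base", "peak"):
    # Percentage by which the T2 estimate exceeds the p520 result on this metric.
    gain_pct[metric] = (t2_est[metric] / p520[metric] - 1) * 100
    print(f"SPECompM{metric}2001: {gain_pct[metric]:.1f}% higher")
```

This prints 74.8% for base and 84.5% for peak.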
Benchmark description:

The SPEComp benchmark tests the performance of 9 high-performance computing applications. It is used to compare the performance of shared-memory servers. All applications in this suite, written in C/C++ and Fortran, use the OpenMP programming model, which provides a portable, scalable model for developing parallel applications for platforms ranging from the desktop to the supercomputer.

The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on many architectures, from the largest Unix servers to small Windows NT platforms.

Disclosure statement:

All UltraSPARC T2 SPEC OMP metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use preproduction systems. SPEC and SPEComp registered trademarks of Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/ 15081 (est) SPECompMbase2001/SPECompMpeak2001. Competitive results from www.spec.org as of August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.

Performance of the new Sun UltraSPARC T2

Tuesday Aug 07, 2007

Sun UltraSPARC T2 is an amazing chip and very fast! The UltraSPARC T2 features several industry firsts:

• Eight cores and 64 threads
• Integrated 10 GbE networking and I/O
• Dedicated cryptographic and floating-point units per core
• 10 cryptographic functions supported in hardware
• Open-source design: www.opensparc.net

Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz beat all single-chip scores, showing 78.3 est. SPECint_rate2006. How do these preliminary runs (SPEC rules require us to use the term "estimated") compare with SPECint_rate2006 results?

• These Sun UltraSPARC T2 1.4 GHz scores beat the best published single-chip IBM POWER6 4.7 GHz result by 29%.
• These Sun UltraSPARC T2 1.4 GHz scores beat the best single-chip estimated scores of the AMD Barcelona by 23%.
• These Sun UltraSPARC T2 1.4 GHz scores beat the best single-chip published scores of the 2.66 GHz Intel X5355 (Clovertown) by 48%.

Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz beat all single-chip scores, showing 62.3 est. SPECfp_rate2006. How do these preliminary runs compare with SPECfp_rate2006 results?

• These Sun UltraSPARC T2 1.4 GHz scores beat the best published single-chip IBM POWER6 4.7 GHz result by 7%.
• These Sun UltraSPARC T2 1.4 GHz scores beat the best single-chip estimated scores of the AMD Barcelona by 11%.
• These Sun UltraSPARC T2 1.4 GHz scores beat the best single-chip published scores of the 2.66 GHz Intel X5355 (Clovertown) by 66%.

Performance per core doesn't matter, and GHz doesn't matter; what matters is the number of cores, efficiency, and the design of the chip! Competitors are saying that the UltraSPARC T2 is proprietary... this makes no sense. Both the UltraSPARC T1 and UltraSPARC T2 are open-source designs (www.opensparc.net). You won't find the latest Intel, AMD, or IBM designs released as open source.

Disclosure Statement:

All Sun UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use preproduction systems. SPEC, SPECint, SPECfp registered trademarks of Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 78.3 est. SPECint_rate2006, 62.3 est. SPECfp_rate2006. Competitive results from www.spec.org as of August 6, 2007. IBM POWER6 4.7GHz (1 chip, 2 cores, 4 threads) 60.9 SPECint_rate2006, 58.0 SPECfp_rate2006. AMD Barcelona 2.6 GHz (1 chip, 4 cores, 4 threads) 63.9 est. SPECint_rate2006, 56.3 est. SPECfp_rate2006. Barcelona estimates based upon "The Register" article stating that the 2.6 GHz quad-core is 21% (integer) and 50% (floating point) faster than the 2.66 GHz Intel system. Fujitsu RX300 Intel X5355 2.66 GHz (1 chip, 4 cores, 4 threads) 52.8 SPECint_rate2006, 37.5 SPECfp_rate2006.
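The percentages above and the Barcelona estimates can be reproduced from the disclosure figures. One assumption is needed: this sketch uses 37.5 for the Intel X5355 SPECfp_rate2006 score, because that is the value consistent with both the 66% claim (62.3 / 37.5 is about 1.66) and the 50% Barcelona uplift (37.5 × 1.5 is about 56.3). The dictionary labels are ours.

```python
# Disclosure figures: SPECint_rate2006 ("int") and SPECfp_rate2006 ("fp").
t2_est = {"int": 78.3, "fp": 62.3}       # UltraSPARC T2 1.4 GHz (estimated)
intel_x5355 = {"int": 52.8, "fp": 37.5}  # assumption: fp = 37.5 (see lead-in)

# Barcelona estimate: Intel score scaled by the quoted 21% (int) and 50% (fp) uplifts.
barcelona_est = {"int": intel_x5355["int"] * 1.21, "fp": intel_x5355["fp"] * 1.50}

competitors = {
    "IBM POWER6 4.7 GHz": {"int": 60.9, "fp": 58.0},
    "AMD Barcelona 2.6 GHz (est.)": {"int": 63.9, "fp": 56.3},  # as published, rounded
    "Intel X5355 2.66 GHz": intel_x5355,
}

advantage_pct = {}
for name, scores in competitors.items():
    # Whole-percent advantage of the T2 estimate over this competitor, per metric.
    advantage_pct[name] = {
        metric: round((t2_est[metric] / scores[metric] - 1) * 100)
        for metric in ("int", "fp")
    }
    print(f"T2 vs. {name}: int +{advantage_pct[name]['int']}%, fp +{advantage_pct[name]['fp']}%")

print(f"Barcelona estimate: {barcelona_est['int']:.2f} int, {barcelona_est['fp']:.2f} fp")
```

This reproduces 29%/7% against POWER6, 23%/11% against Barcelona, and 48%/66% against the X5355, and Barcelona estimates of 63.89 and 56.25 before rounding.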

Reminder: The Niagara 2 score was obtained from a full "reportable" SPEC run, but is designated as an "estimate" because a pre-production system was used.

...more information on the UltraSPARC T2 later today.

new two-processor quad-core AMD result estimated on SPECfp_rate (8 cores total)

Tuesday Jul 31, 2007

AMD has made public a new estimated SPECfp_rate2006 result for a two-processor (2-chip) quad-core system: the two-chip quad-core Barcelona result is an estimated 69.5 SPECfp_rate2006. Note that this result has not been submitted to SPEC, so it is marked "estimated".

  Added note: all SPEC members are allowed to post preliminary numbers as long as they mark them with the term "estimated". Given how quickly AMD and Intel incorporate chips into full systems, it won't be long before we see submitted results. I imagine they will use newer software when they submit systems as well.

IBM has a submitted result of 58.0 SPECfp_rate2006 on a 1-chip IBM POWER6 p 570 (4.7 GHz). Clearly performance per core doesn't matter, since every vendor ships chips with different numbers of cores, with different processing strengths and very different costs. So one has to look at chip and, more importantly, system performance, and know the system price. Does anyone know the system cost differential of 2-chip AMD quad-cores vs. 1-chip IBM POWER6 dual-cores?

  Added note: we've all learned that the price per core can vary by orders of magnitude, with IBM leading the industry in $/core by a huge margin. When IBM cites the best performance per core, every customer should ask: what is the $/core when configured with the memory I want? You will be surprised by the comparison.

AMD's data can be found on slide 20:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/July_2007_AMD_Analyst_Day_Randy_Allen_FINAL.pdf

Required Disclosure:

SPEC and the benchmark name SPECfp_rate2006 are registered trademarks of the Standard Performance Evaluation Corporation. AMD Barcelona (two-chip, quad-core, 8 cores total) estimated 69.5 SPECfp_rate2006. IBM System p 570 POWER6 (1 chip, 2 cores/chip, 4 threads total, 4.7 GHz) 58.0 SPECfp_rate2006. Results as of July 26, 2007. For latest scores visit www.spec.org.

bucket-o-records SPEC CPU2006 Sun Blade X8420

Thursday Jan 11, 2007

Sun Blade X8420 is 1.9x faster than the best Intel Woodcrest system on SPECint_rate2006 and 2.1x faster than the best Intel Woodcrest system on SPECfp_rate2006. The Sun Blade X8420 is also 22% faster than the 4-way dual-core Itanium 2 system on SPECfp_rate2006.
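The multiples quoted above are rounded. This Python sketch recomputes them from the peak scores in the tables below, taking the Fujitsu CELSIUS R640 rows as the best Woodcrest results and the HP rx6600 row as the Itanium 2 system; the variable names are ours. The integer multiple is about 1.85, which rounds to the 1.9x quoted.

```python
# Peak scores from the SPECint_rate2006 and SPECfp_rate2006 tables in this post.
x8420 = {"int": 93.1, "fp": 87.3}           # Sun Blade X8420, 4x Opteron 8220
best_woodcrest = {"int": 50.3, "fp": 42.5}  # Fujitsu CELSIUS R640, 2x Xeon 5160
rx6600_fp = 71.4                            # HP rx6600, 4x dual-core Itanium 2

int_multiple = x8420["int"] / best_woodcrest["int"]
fp_multiple = x8420["fp"] / best_woodcrest["fp"]
itanium_gain_pct = (x8420["fp"] / rx6600_fp - 1) * 100

print(f"SPECint_rate2006 vs. best Woodcrest: {int_multiple:.2f}x")  # 1.85x
print(f"SPECfp_rate2006 vs. best Woodcrest:  {fp_multiple:.2f}x")   # 2.05x
print(f"SPECfp_rate2006 vs. HP rx6600:       {itanium_gain_pct:.0f}% faster")  # 22% faster
```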

The Sun Blade X8420 delivered the best result, a SPECint_rate2006 score of 93.1, using the Solaris 10 and Sun Studio 11 combination. It also delivered the best SPECfp_rate2006 result of all x86 systems, 87.3.

SPEC CPU2006 Performance Charts (bigger is better, selected recent results)

SPECint_rate2006

System                 Processor               GHz   Chips  Cores  Threads  Peak  Base
Sun Blade X8420        AMD Opteron 8220        2.8       4      8        8  93.1  80.4
Fujitsu CELSIUS R640   Xeon 5160 (Woodcrest)   3.0       2      4        4  50.3  48.8
Sun Ultra 40 M2        AMD Opteron 2220SE      2.8       2      4        4  48.8  41.9
HP DL585               Opteron 854             2.8       4      4        4  46.9  41.4
Supermicro X7DBE       Xeon 5160 (Woodcrest)   3.0       2      4        4    --  45.2
Sun Fire X4200         Opteron 285             2.6       2      4        4  42.8  37.8
Fujitsu RX220          Opteron 280             2.4       2      4        4  40.0  35.7
Sun Fire X4200         Opteron 256             3.0       2      2        2  26.4  23.1
HP DL585               Opteron 854             2.8       2      2        2  25.2  22.3
Dell PrecWork 380      Pentium EE              3.73      1      2        2    --  23.1
HP DL380 G4            Pentium 4               3.8       2      2        2    --  20.9

SPECfp_rate2006

System                        Processor               GHz   Chips  Cores  Threads  Peak  Base
Sun Blade X8420               AMD Opteron 8220        2.8       4      8        8  87.3  82.5
HP rx6600                     Itanium 2 dual-core     1.6       4      8        8  71.4  69.1
HP DL585                      Opteron 854             2.8       4      4        4  49.3  45.6
FSC CELSIUS R640 (WinXP Pro)  Xeon 5160 (Woodcrest)   3.0       2      4        4  42.5  41.4
Sun Fire X4200                Opteron 285             2.6       2      4        4  38.1  36.0

Results as of 09 Jan 2007 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. The first is the speed measure, which is the performance on the suite as a single stream; this can be either a single-threaded run or a run automatically parallelized by the compiler. This measure is further divided into base and peak (optimized) runs: base uses the same compiler flags for all benchmarks, whereas peak may use different compiler flags for each benchmark. Each result is expressed as a ratio against a reference system run standardized by SPEC.

The second measure is rate, which measures throughput: how much work the system completes when several copies of each benchmark run at the same time. Typically, it is run as n copies on n processors. It shows how well a system handles the same job mix under load. Rate is also reported as base and peak (optimized) results.

Disclosure Statement:

  SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 1/9/07. Sun Blade X8420 (AMD Opteron 8220, 4 chips/8 cores, Solaris 10) 93.1 SPECint_rate2006. Sun Blade X8420 (AMD Opteron 8220, 4 chips/8 cores, Solaris 10) 87.3 SPECfp_rate2006.

Results Summary

  Results
  X8420 93.1 SPECint_rate2006
  X8420 87.3 SPECfp_rate2006
  Reference Date: Jan 09, 2007
  System: Sun Blade X8420, 64GB memory
  Processors: four 2.8 GHz Opteron 8220
  Software: Solaris 10, Sun Studio 11
