BM Seer Unofficial thoughts from an anonymous Sun employee

SPECjbb2005 Sun X4600 M2 X86 World Record Multi-JVM

Tuesday Aug 19, 2008

The Sun Fire X4600 M2 (8 Opteron 2.5 GHz QC) running Sun Java SE 6 Update 6-p achieved a result of 683542 SPECjbb2005 bops, 85443 SPECjbb2005 bops/JVM, the best score among all x86-based servers on the SPECjbb2005 benchmark.

The Sun Fire X4600 M2 demonstrated 53% better performance than the Dell PowerEdge R900 result of 446209 SPECjbb2005 bops, 55776 SPECjbb2005 bops/JVM, which used 4 Intel Xeon quad-core processors at 2.93 GHz and the BEA JRockit JDK 1.6.0_02.

The Sun Fire X4600 M2 (8-chip) demonstrated 3% better performance over the IBM p570 result of 664167 SPECjbb2005 bops, 83021 SPECjbb2005 bops/JVM which used 8 Power6 dual-core processors.

Note: An IBM blogger made a snarky comment that Sun "should know better". Sun clearly pointed out that this result is for 8-chip systems. No other vendor posted results on x86 systems with this many chips on this benchmark, so Sun compared to 4-chip x86 results. Maybe IBM would like to publish the price of their (16 RU) POWER6 8-chip system compared to the Sun X4600 M2 as configured for this benchmark? ...or is it easier to ignore system price and confuse people by pointing to core count, thereby dodging the cost-per-core issue?

The Sun Fire X4600 M2 used Solaris 10 5/08 and Sun JDK 1.6.0_06 Performance Release to obtain this leading result.

SPECjbb2005 Performance Chart (ordered by performance)

bops: SPECjbb2005 Business Operations per Second (bigger is better)

| System | Chips, Cores, Threads | GHz | Type | SPECjbb2005 bops | SPECjbb2005 bops/JVM |
|---|---|---|---|---|---|
| Sun Fire X4600 M2 | 8, 32, 32 | 2.5 | 8360SE | 683542 | 85443 |
| IBM p570 | 8, 16, 32 | 4.7 | POWER6 | 664167 | 83021 |
| Dell PE R900 | 4, 16, 16 | 2.93 | X7350 | 446209 | 55776 |
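
The percentages quoted in this post can be re-derived from the bops figures in the chart. The following is only a small sketch of that arithmetic; the function name is made up for illustration, and the count of 8 JVM instances is an assumption inferred from the ratio of bops to bops/JVM, not something stated in the post.

```python
# SPECjbb2005 bops figures copied from the chart in this post.
RESULTS = {
    "Sun Fire X4600 M2": 683542,
    "IBM p570": 664167,
    "Dell PE R900": 446209,
}

# ASSUMPTION: 8 JVM instances per run, inferred from bops / (bops/JVM).
ASSUMED_JVMS = 8


def percent_better(ours: float, theirs: float) -> float:
    """Relative advantage of `ours` over `theirs`, in percent."""
    return (ours / theirs - 1.0) * 100.0


sun = RESULTS["Sun Fire X4600 M2"]
for name in ("Dell PE R900", "IBM p570"):
    print(f"Sun vs {name}: {percent_better(sun, RESULTS[name]):.1f}% better")

for name, bops in RESULTS.items():
    print(f"{name}: about {round(bops / ASSUMED_JVMS)} bops/JVM")
```

Rounded to whole percentages, the two comparisons come out at 53% and 3%, matching the claims in the text.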

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).

Disclosure Statement:

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 8/7/2008 on www.spec.org. Sun Fire X4600 M2(8 chips, 32 cores) 683542 SPECjbb2005 bops, 85443 SPECjbb2005 bops/JVM submitted for review. Dell PE R900(4 chips, 16 cores) 446209 SPECjbb2005 bops, 55776 SPECjbb2005 bops/JVM. IBM p570 (8 chips, 16 cores) 664167 SPECjbb2005 bops, 83021 SPECjbb2005 bops/JVM.

Results Summary

Reference Date: Aug 8, 2008
Results 683542 SPECjbb2005 bops, 85443 SPECjbb2005 bops/JVM
System: Sun Fire X4600 M2
Processor: 8 x AMD Opteron 8360SE 2.5 GHz
Operating System: Solaris 10 5/08
JVM: Java HotSpot(TM) 32-Bit Server, Version 1.6.0_06-p


Extremely Fast Pattern Matching on Sun SPARC Enterprise T5220/T5240

Friday Aug 08, 2008

Sun SPARC Enterprise T5220 / T5240 beats IBM Cell Broadband Engine with significantly easier application code development!

Pattern matching and string searching are important to a variety of commercial, government and HPC applications. One of the core functions needed for text identification algorithms in data repositories is real-time string searching. For this benchmark, both IBM and Sun used the Aho-Corasick algorithm for string searching.

Note: Got this from an internal website on info that is going public.

The 2-chip Sun SPARC Enterprise T5240 performed string searching at a rate of 6.12 GB/s (49.0 Gbit/sec) whereas the 2-chip IBM Cell Broadband Engine DD3 Blade performed string searching at a rate of 0.48 GB/s (3.8 Gbit/sec).

The 1-chip Sun SPARC Enterprise T5220 performed string searching at a rate of 3.08 GB/s (24.6 Gbits/s).

The Sun SPARC Enterprise T5240 demonstrated a 2x speedup over the Sun SPARC Enterprise T5220.
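
As a quick sanity check on the units, the GB/s figures above convert to Gbit/s by multiplying by 8 (the post assumes an 8-bit byte), and the speedups follow from simple ratios. This is just an illustrative sketch; the helper names are not from the benchmark code.

```python
BITS_PER_BYTE = 8  # the post assumes an 8-bit byte when reporting Gbit/s

# Search rates in GB/s as quoted in this post.
RATES_GB_PER_S = {
    "T5240 (2 chips)": 6.12,
    "T5220 (1 chip)": 3.08,
    "IBM Cell DD3 Blade (2 chips)": 0.48,
}


def gbytes_to_gbits(gb_per_s: float) -> float:
    """Convert a rate in GB/s to Gbit/s."""
    return gb_per_s * BITS_PER_BYTE


def speedup(fast: float, slow: float) -> float:
    """How many times faster `fast` is than `slow`."""
    return fast / slow


for system, rate in RATES_GB_PER_S.items():
    print(f"{system}: {rate} GB/s = {gbytes_to_gbits(rate):.1f} Gbit/s")

t5240 = RATES_GB_PER_S["T5240 (2 chips)"]
print(f"T5240 vs T5220: {speedup(t5240, RATES_GB_PER_S['T5220 (1 chip)']):.2f}x")
print(f"T5240 vs Cell blade: {speedup(t5240, RATES_GB_PER_S['IBM Cell DD3 Blade (2 chips)']):.2f}x")
```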

The Aho-Corasick algorithm as deployed on the IBM Cell Broadband Engine DD3 Blade required substantial optimization and tuning to achieve the reported performance, whereas on the Sun SPARC Enterprise T5220 or T5240 only a basic implementation of the algorithm and a simple compilation were needed.

Performance Summary

| System | Throughput (GBits/sec) | Chips | Cores | GHz |
|---|---|---|---|---|
| Sun SPARC Enterprise T5240 | 49.0 | 2 | 16 | 1.4 |
| Sun SPARC Enterprise T5220 | 24.6 | 1 | 8 | 1.4 |
| IBM Cell Broadband Engine DD3 Blade | 3.8 | 2 | 16 | 3.2 |

IBM results are obtained from Figure 7(d) of IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008. Sun benchmark results as of 08/05/2008.

Benchmark Description

One of the core functions needed for text identification algorithms in data repositories is real-time string searching. This string searching benchmark demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code creation and speed of code execution.

In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a variant of the Aho-Corasick string searching algorithm that uses deterministic finite automata. The algorithm first constructs a graph that represents a dictionary, then walks that graph using successive input characters from a text file. Each "state" in the graph includes a state transition table (STT) that is accessed using the next input character from the text file to determine the address of the next state in the graph. IBM defines an automaton as a two-step loop that: (1) obtains the address of the next state from the STT, and (2) fetches the next state in the graph.
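
To make the description above concrete, here is a minimal textbook-style sketch of an Aho-Corasick automaton in its deterministic (DFA) form. It is not IBM's code and not Sun's ANSI C implementation; the function names, the dict-based state transition table and the tiny example dictionary are all assumptions chosen for readability rather than speed.

```python
from collections import deque


def build_dfa(words):
    """Build an Aho-Corasick DFA: a state transition table plus match lists."""
    goto = [{}]   # trie edges: goto[state][char] -> child state
    out = [[]]    # out[state] -> dictionary words that end at this state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    alphabet = sorted({ch for w in words for ch in w})
    fail = [0] * len(goto)          # failure links (longest proper suffix state)
    stt = [{} for _ in goto]        # state transition table, one row per state
    queue = deque()
    for ch in alphabet:             # the root row: stay at root if no edge
        nxt = goto[0].get(ch, 0)
        stt[0][ch] = nxt
        if nxt:
            queue.append(nxt)
    while queue:                    # breadth-first, so shallower states come first
        state = queue.popleft()
        out[state] = out[state] + out[fail[state]]
        for ch in alphabet:
            if ch in goto[state]:
                nxt = goto[state][ch]
                fail[nxt] = stt[fail[state]][ch]
                stt[state][ch] = nxt
                queue.append(nxt)
            else:                   # no trie edge: reuse the failure state's row
                stt[state][ch] = stt[fail[state]][ch]
    return stt, out


def search(text, stt, out):
    """Walk the DFA over `text`; return (start_index, word) for every match."""
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Step 1: look up the next state in the STT (unknown chars go to root).
        # Step 2: move to that state and report any words ending there.
        state = stt[state].get(ch, 0)
        for word in out[state]:
            hits.append((pos - len(word) + 1, word))
    return hits


stt, out = build_dfa(["he", "she", "his", "hers"])
print(search("ushers", stt, out))
```

The loop in `search` is the two-step automaton loop described above: one table lookup per input character, then a move to the next state. A production version would typically store the table as a flat array indexed by byte value instead of Python dicts.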

IBM reports the performance of its Cell Broadband Engine (CBE) to execute this algorithm to search a 4.4 MB version of the King James Bible using a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each of the two CBEs executes 16 automata, for a total of 256 automata. All automata and hence all SPEs access a single, shared dictionary.

IBM describes elaborate optimizations of the Aho-Corasick algorithm, including state shuffling, state replication, alphabet shuffling and state caching. These optimizations were required to: (1) overcome "memory congestion", i.e., contention amongst the SPEs for access to the shared dictionary, and (2) compensate for the limited local storage that is associated with each SPE. These optimizations were necessary to achieve the performance reported for the CBE DD3 Blade. IBM does not provide references that indicate where to obtain the dictionary and Bible. IBM reports the algorithmic performance in Gbits/s but does not indicate whether an 8-bit byte is extended to 10 bits as required for network transmission.

In order to closely approximate the dictionary and Bible that were used by IBM, Sun used a dictionary of 25,144 English words (the Open Solaris file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list) for which the average word length is 8.22 characters, and a 4.6 MB version of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.

In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code generation and speed of code execution, Sun implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the algorithm were required to achieve the performance reported for the T5220 and T5240.

The source code was compiled using the -m64 -xO3 and -xopenmp options. The dictionary is represented using a graph that comprises 187 MB. Each core of the T5220 or T5240 executes 8 automata using one OpenMP thread per automaton. Thus, the T5220 executes 64 total automata and the T5240 executes 128 total automata. All automata and hence all cores access a single, shared dictionary. Access to this dictionary is accelerated by the large, shared L2 caches of the Sun SPARC Enterprise T5220 and T5240.
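
The post does not say how the input text is divided among the automata, so the following is only a sketch of one plausible scheme: split the text into equal chunks, one per automaton, and let each worker read slightly past its chunk boundary so that words straddling two chunks are still found exactly once. The function names, the overlap rule and the use of `str.find` in place of a real automaton are all assumptions; Python threads are used purely to show the partitioning and will not reproduce OpenMP-style parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Automaton counts from the post: 8 automata per core.
T5220_AUTOMATA = 8 * 8    # 8 cores  -> 64 automata
T5240_AUTOMATA = 16 * 8   # 16 cores -> 128 automata


def count_in_chunk(text, words, start, end, overlap):
    """Count dictionary hits whose first character lies in [start, end)."""
    window = text[start:end + overlap]   # read a little past the boundary
    total = 0
    for word in words:
        pos = window.find(word)
        while pos != -1 and pos < end - start:
            total += 1
            pos = window.find(word, pos + 1)
    return total


def parallel_count(text, words, automata):
    """Split `text` into up to `automata` chunks and sum the per-chunk counts."""
    overlap = max(len(w) for w in words) - 1
    size = max(1, -(-len(text) // automata))          # ceiling division
    bounds = [(i, min(i + size, len(text))) for i in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=automata) as pool:
        parts = pool.map(
            lambda b: count_in_chunk(text, words, b[0], b[1], overlap), bounds
        )
        return sum(parts)


sample = "ushers and his heirs"
dictionary = ["he", "she", "his", "hers"]
print(parallel_count(sample, dictionary, automata=8))
```

Because every worker only counts matches that start inside its own chunk, the total is the same no matter how many automata share the work.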

Disclosure Statement:

Pattern Matching: Sun SPARC Enterprise T5240 (2 x 1.4 GHz UltraSPARC T2 Plus, 2 chips, 16 cores), Solaris 10, Sun C 5.9, 49.0 GBits/sec; Sun SPARC Enterprise T5220 (1 x 1.4 GHz UltraSPARC T2, 1 chip, 8 cores), Solaris 10, Sun C 5.9, 24.6 GBits/sec; IBM Cell Broadband Engine DD3 Blade (2 x 3.2 GHz Cell Broadband Engine, 2 chips, 16 cores), Linux kernel v2.6.16, IBM CBE Software Development Kit v2.1, 3.8 GBits/sec.

System Configuration

Throughput (GBits/sec): 24.6 (T5220), 49.0 (T5240)
Reference Date: August 5, 2008
Systems: Sun SPARC Enterprise T5220, T5240
Total Number Processors: 1, 2
Processor/GHz of Server: 1.4 GHz UltraSPARC T2, T2 Plus
Operating System: Solaris 10


SPECjvm2008 Sun Fire X4450 First Result Ever Published

Thursday Jul 17, 2008

The Sun Fire X4450 server demonstrates Sun's leadership in Java-based computing by publishing the first result ever for the new SPECjvm2008 benchmark. The Sun Fire X4450 server delivered a result of 260.08 SPECjvm2008 Base ops/m.

Now we just need the other vendors (SPEC members who must have approved the benchmark) to step up and start publishing...

SPECjvm2008 Performance Chart (ordered by performance)

base: SPECjvm2008 Base ops/m (bigger is better)
peak: SPECjvm2008 Peak ops/m (bigger is better)
Ch/Co/Lc: Chips, Cores, Logical CPUs

| System | Ch | Co | Lc | GHz | Type | base | peak |
|---|---|---|---|---|---|---|---|
| Sun Fire X4450 | 4 | 16 | 16 | 2.933 | X7350 QC | 260.08 | - |

Benchmark Description

SPECjvm2008 (Java Virtual Machine Benchmark) is a benchmark suite for measuring the performance of a Java Runtime Environment (JRE), containing several real life applications and benchmarks focusing on core java functionality. The suite focuses on the performance of the JRE executing a single application; it reflects the performance of the hardware processor and memory subsystem, but has low dependence on file I/O and includes no network I/O across machines. The SPECjvm2008 workload mimics a variety of common general purpose application computations. These characteristics reflect the intent that this benchmark will be applicable to measuring basic Java performance on a wide variety of both client and server systems.

SPEC also finds user experience of Java important, and the suite therefore includes startup benchmarks and has a required run category called base, which must be run without any tuning of the JVM to improve the out of the box performance.

SPECjvm2008 benchmark highlights:

  • Leverages real life applications (like derby, sunflow, and javac) and area-focused benchmarks (like xml, serialization, crypto, and scimark).
  • Also measures the performance of the operating system and hardware in the context of executing the JRE.

Disclosure Statement:

SPEC, SPECjvm reg tm of Standard Performance Evaluation Corporation. Results as of 07/16/08 on www.spec.org. Sun Fire X4450 260.08 SPECjvm2008 Base ops/m.

Results Summary

Certified Results
Performance: 260.08 SPECjvm2008 Base ops/m
Reference Date: July 8, 2008
Systems: Sun Fire X4450
Total Number Processors: 4
Processor/ GHz of Server: Intel Xeon X7350 QC 2.933 GHz
Operating System: Solaris 10 5/08
JVM: Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_06 Performance Release


SPEComp2001 Sun SPARC Enterprise M8000 2.52GHz

Wednesday Jul 16, 2008

The Sun SPARC Enterprise M8000 server using the new SPARC64 VII 2.52 GHz processor delivered a SPECompM2001 result of 104,714.

The Sun SPARC Enterprise M8000 server (2.52GHz SPARC64 VII processors) beat the best posted IBM p570 result (4.7GHz POWER6) by 11% on the SPECompM2001 benchmark.

The Sun SPARC Enterprise M8000 server (2.52GHz SPARC64 VII processors) delivered a SPECompL2001 result of 581,807, the fastest result using 16 chips or fewer.

The new 2.52GHz SPARC64 VII processors delivered 58% more performance for the Sun SPARC Enterprise M8000 when compared to the SPARC64 VI 2.28GHz processors as measured by the SPECompM2001 benchmark.

Benchmark Description

The SPEC OMPM2001 Benchmark Suite was released in June 2001 and tests HPC performance using OpenMP for parallelism.

  • 11 programs (3 in C and 8 in Fortran) parallelized using OpenMP API
Goals of suite:
  • Targeted to mid-range (4-32 processor) parallel systems
  • Run rules, tools and reporting similar to SPEC CPU2000
  • Programs representative of HPC and Scientific Applications
Result Landscape SPECompM2001 (bigger is better, ordered by peak metric, representative results)

| Result (Peak) | Result (Base) | Cores | Chips | OpenMP Threads | System |
|---|---|---|---|---|---|
| 157880 | 148510 | 64 | 32 | 64 | IBM p5 p595, POWER5 2.3GHz |
| 104714 | 75418 | 64 | 16 | 127 | Sun SE M8000, SPARC64 VII 2.52GHz |
| 94350 | 84017 | 16 | 8 | 32 | IBM p570, POWER6 4.7GHz |
| 66283 | 59179 | 32 | 16 | 32 | Sun SE M8000, SPARC64 VI 2.28GHz |
| 56211 | 45275 | 16 | 8 | 32 | IBM p5-575, POWER5 1.9GHz |
| 46444 | 44164 | 32 | 16 | 32 | SGI Altix 4700, Itanium2 1.6GHz |
| 45895 | 35534 | 16 | 8 | 32 | IBM p5-560Q, POWER5+ 1.8GHz |

Results from www.spec.org as of 14 July 2008
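
The 11% and 58% claims in this post are both comparisons of SPECompM2001 peak results from the table above. A short sketch of the arithmetic follows; the dictionary keys and the function name are illustrative only.

```python
# SPECompM2001 peak results copied from the table above.
PEAK = {
    "Sun SE M8000, SPARC64 VII 2.52GHz": 104714,
    "IBM p570, POWER6 4.7GHz": 94350,
    "Sun SE M8000, SPARC64 VI 2.28GHz": 66283,
}


def gain_percent(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


m8000_vii = PEAK["Sun SE M8000, SPARC64 VII 2.52GHz"]
for system, score in PEAK.items():
    if score != m8000_vii:
        print(f"M8000 (SPARC64 VII) vs {system}: +{gain_percent(m8000_vii, score):.0f}%")
```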

SPECompL2001 (bigger is better, results ordered by peak metric)

| Result (Peak) | Result (Base) | Cores | Chips | OpenMP Threads | System |
|---|---|---|---|---|---|
| 1456653 | 1250890 | 256 | 64 | 192 | Sun SE M9000, SPARC64 VII 2.52GHz |
| 1230446 | 1148235 | 128 | 64 | 128 | Sun SE M9000, SPARC64 VI 2.4GHz |
| 1056459 | 1005583 | 64 | 32 | 128 | IBM p5 595, POWER5 2.3GHz |
| 1005076 | 987139 | 256 | 128 | 256 | SGI Altix 4700, Itanium 2 1.6GHz |
| 672757 | 620741 | 64 | 32 | 128 | IBM p5 595, POWER5 1.9GHz |
| 581807 | 532576 | 64 | 16 | 64 | Sun SE M8000, SPARC64 VII 2.52GHz |

Results from www.spec.org as of 14 July 2008

Disclosure Statement:

SPEC, SPEComp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 07/14/08. Sun results submitted to SPEC. Sun SPARC Enterprise M8000 (64 cores, 16 chips, 64/127 OMP threads, 2.52GHz) 104714 SPECompM2001, 75418 SPECompMbase2001. Sun SPARC Enterprise M8000 (32 cores, 16 chips, 32 OMP threads, 2.28GHz) 59179 SPECompMbase2001. IBM p 570 (16 cores, 8 chips, 32 OMP threads, 4.7GHz Power6) 94350 SPECompM2001.

SPEC, SPEComp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 07/14/08. Sun results submitted to SPEC. Sun SPARC Enterprise M8000 (64 cores, 16 chips, 64 OMP threads, 2.52GHz) 581807 SPECompL2001, 532576 SPECompLbase2001.

Results Summary

Result
M8000: 581807 SPECompL2001
M8000: 104714 SPECompM2001
Reference Date: Jul 14, 2008
System: Sun SPARC Enterprise M8000
Total Number Processors: 16
Total Memory : 256 GB (128x2GB DIMMs)
Processor/GHz of Server: SPARC64 VII, 2.52 GHz
Operating System: Solaris 10
Compiler: Sun Studio 12


SPECompL2001 Sun SPARC Enterprise M9000 @ 2.52GHz

Wednesday Jul 16, 2008

The Sun SPARC Enterprise M9000 server using the new SPARC64 VII 2.52 GHz processor delivered results on the SPEC OMPL2001 benchmarks.

The Sun SPARC Enterprise M9000 server, powered by 2.52GHz SPARC64 VII processors, reset the world record for SPECompL2001 with a result of 1,456,653 and set a world record SPECompLbase2001 result of 1,250,890.

The Sun SPARC Enterprise M9000 server beats the 128-socket SGI Altix 4700, Itanium2 DC 1.6GHz by 45% on SPECompL2001. There are no POWER6 results on this benchmark at this scale. (Post a comment if I missed one on the SPEC website and I will post a correction.)

Benchmark Description

The SPEC OMPM2001 Benchmark Suite was released in June 2001 and tests HPC performance using OpenMP for parallelism.

  • 11 programs (3 in C and 8 in Fortran) parallelized using OpenMP API
Goals of suite:
  • Targeted to mid-range (4-32 processor) parallel systems
  • Run rules, tools and reporting similar to SPEC CPU2000
  • Programs representative of HPC and Scientific Applications

Result Landscape SPECompL2001 (bigger is better, Results ordered by peak metric)

| Result (Peak) | Result (Base) | Cores | Chips | OpenMP Threads | System |
|---|---|---|---|---|---|
| 1456653 | 1250890 | 256 | 64 | 192 | Sun SE M9000, SPARC64 VII 2.52GHz |
| 1230446 | 1148235 | 128 | 64 | 128 | Sun SE M9000, SPARC64 VI 2.4GHz |
| 1056459 | 1005583 | 64 | 32 | 128 | IBM p5 595, POWER5 2.3GHz |
| 1005076 | 987139 | 256 | 128 | 256 | SGI Altix 4700, Itanium 2 1.6GHz |
| 672757 | 620741 | 64 | 32 | 128 | IBM p5 595, POWER5 1.9GHz |
| 581807 | 532576 | 64 | 16 | 64 | Sun SE M8000, SPARC64 VII 2.52GHz |

Results from www.spec.org as of 14 July 2008

Disclosure Statement:

SPEC, SPEComp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 07/14/08. Sun results submitted to SPEC. Sun SPARC Enterprise M9000 (256 cores, 64 chips, 192/256 OMP threads, 2.52GHz) 1456653 SPECompL2001, 1250890 SPECompLbase2001. SGI Altix 4700 (256 cores, 128 chips, 256 OMP threads, Itanium 2 1.6GHz) 1005076 SPECompL2001, 987139 SPECompLbase2001.

Results Summary

Result
M9000: 1456653 SPECompL2001
  1250890 SPECompLbase2001
Reference Date: Jul 14, 2008
System: Sun SPARC Enterprise M9000
Total Number Processors: 64
Total Memory : 1 TB (512x2GB DIMMs)
Processor/GHz of Server: SPARC64 VII, 2.52 GHz
Operating System: Solaris 10
Compiler: Sun Studio 12


more on gallon/mile not MPG and huge amount of datacenter power draws

Tuesday Jul 15, 2008

ecogeek covers why MPG (miles/gallon) is a silly measurement and why gallons/mile is a much better metric. They didn't draw the conclusion that watt/performance is also the better measurement, but the same reasoning applies. For more, read: http://www.ecogeek.org/content/view/1875/69/
http://blogs.sun.com/bmseer/entry/mpg_and_perf_watts_are
http://blogs.sun.com/bmseer/entry/miles_gal_perf_watt_use

Also read this interesting fact:

"At the moment, the world's data centres are estimated to consume about 14 gigawatts of power, and to be responsible for 2% of global carbon-dioxide emissions—roughly the same as air traffic." The Economist article.

So why do other companies only measure watts on slow low-GHz CPUs and tiny 8GB-16GB memory instead of measuring on a wide variety of benchmarks like Sun does? My guess (and some unofficial measurements) is that HP and IBM would lose. I'm still waiting for HP's 2.93GHz X64 and IBM Power6 (5GHz or 4.7GHz) system power measurements for instance...


Linpack HPC benchmark Sun SPARC Enterprise M9000 @ 2.52GHz

Monday Jul 14, 2008

The Sun SPARC Enterprise M9000 server running 2.52GHz SPARC64 VII processors delivered 2.023 TFLOPS on the Linpack HPC benchmark.

For single servers, the Sun SPARC Enterprise M9000 server outperforms the best IBM Power 595 5GHz POWER6 published result by two times on the Linpack HPC benchmark. This system is the largest that IBM makes for its 5GHz Power6-based servers.

A single Sun SPARC Enterprise M9000 server shows 2.7 times the performance on the Linpack HPC benchmark when compared to the HP Integrity Superdome Itanium 2 system.

The Sun Performance Library was enhanced to take advantage of the SPARC64 VII architecture.

Benchmark Description

The Linpack benchmark suite measures the performance for factoring and solving a dense set of linear equations in double-precision floating-point.

The Linpack HPC benchmark allows the solution of any size matrix with a single right hand side. It was developed to allow vendors to show off their hardware. Because big problems allow for peak performance potentials, the benchmark is seen as an upper bound of potential performance of a machine. The run rules are much more flexible. The solution technique must use a pivoting scheme and the driver must follow the spirit of the Linpack 1000 or Linpack 100 benchmarks.

LINPACK HPC Performance Chart - GFLOPS (bigger is better)

Table below does not include clustered solutions.

| System | GFLOPS (Total) | GFLOPS (Peak) | Threads | CPUs | Type | GHz |
|---|---|---|---|---|---|---|
| Sun SPARC Enterprise M9000 | 2023.0 | 2580.5 | 256 | 64 | SPARC64 VII | 2.52 |
| Sun SPARC Enterprise M9000 | 1032.0 | 1228.8 | 128 | 64 | SPARC64 VI | 2.4 |
| IBM Power 595 | 1028.0 | 1280.0 | 64 | 32 | POWER6 | 5.0 |
| HP Superdome | 745.5 | 819.2 | 128 | 64 | Itanium 2 | 1.6 |
| Sun SPARC Enterprise M8000 | 548.2 | 645.1 | 64 | 16 | SPARC64 VII | 2.52 |
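
One useful way to read the chart is as efficiency: measured GFLOPS divided by peak GFLOPS. The sketch below computes that for each row. It also shows that the peak column is consistent with cores x GHz x 4; the factor of 4 floating-point operations per cycle per core is an assumption inferred from the chart's numbers, not a figure stated in the post, and the "Threads" column is treated as a core count in line with the disclosure statement.

```python
# Rows copied from the chart: (system, measured GFLOPS, peak GFLOPS, cores, GHz).
SYSTEMS = [
    ("Sun SPARC Enterprise M9000 (SPARC64 VII)", 2023.0, 2580.5, 256, 2.52),
    ("Sun SPARC Enterprise M9000 (SPARC64 VI)", 1032.0, 1228.8, 128, 2.4),
    ("IBM Power 595 (POWER6)", 1028.0, 1280.0, 64, 5.0),
    ("HP Superdome (Itanium 2)", 745.5, 819.2, 128, 1.6),
    ("Sun SPARC Enterprise M8000 (SPARC64 VII)", 548.2, 645.1, 64, 2.52),
]

# ASSUMPTION: 4 floating-point operations per cycle per core, inferred from the chart.
ASSUMED_FLOPS_PER_CYCLE = 4


def efficiency(measured: float, peak: float) -> float:
    """Fraction of theoretical peak actually achieved."""
    return measured / peak


def theoretical_peak(cores: int, ghz: float, flops_per_cycle: int = ASSUMED_FLOPS_PER_CYCLE) -> float:
    """Peak GFLOPS as cores x clock (GHz) x flops per cycle."""
    return cores * ghz * flops_per_cycle


for name, measured, peak, cores, ghz in SYSTEMS:
    print(
        f"{name}: {efficiency(measured, peak):.1%} of peak "
        f"(computed peak {theoretical_peak(cores, ghz):.1f} GFLOPS)"
    )

best = SYSTEMS[0][1]
print(f"M9000 vs IBM Power 595: {best / 1028.0:.2f}x")
print(f"M9000 vs HP Superdome:  {best / 745.5:.2f}x")
```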

Disclosure Statement:

Linpack HPC, results from http://www.netlib.org/benchmark/index.html as of 07/01/08. Sun SPARC Enterprise M9000 (SPARC64 VII @2.52, 64 chips, 256 cores), 2.023 TFLOPS. IBM Power 595 (POWER6 5.0GHz, 32 chips, 64 cores) 1028.0 GFLOPS. HP Superdome (Itanium 2 1.6GHz/24MB, 64 chips, 128 cores) 745.5 GFLOPS.

Linpack HPC, results from http://www.netlib.org/benchmark/index.html as of 04/13/07. Sun SPARC Enterprise M9000 (SPARC64 VI @2.4, 64 chips, 128 cores), 1.032 TFLOPS. IBM p5 595 (POWER5 1.9GHz, 32 chips, 64 cores) 418.0 GFLOPS. HP Superdome (Itanium 2 1.6GHz/24MB, 64 chips, 128 cores) 745.5 GFLOPS.

Results Summary

SAE (Strategic Applications Engineering) has submitted results for the LINPACK HPC benchmark.
Published Results
Performance: 2.023 TFLOPS
System: Sun SPARC Enterprise M9000
Total Number Processors: 64
Processor/GHz of Server: SPARC64 VII, 2.52 GHz
Operating System: Solaris 10
Compiler: Sun Studio 12


World Record SAP-SD 2-Tier: Sun SPARC Enterprise M9000 SPARC64 VII SAP-SD 2-Tier ERP 6.0 (2005)

Monday Jul 14, 2008

The Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) set a world record for the SAP-SD 2-Tier Standard Application benchmark on SAP ERP 6.0 (2005), outperforming the largest IBM Power 595 (5GHz POWER6).

The 64-way Sun SPARC Enterprise M9000 with 2.52 GHz SPARC64 VII processors achieved 39,100 users on the two-tier SAP Sales and Distribution (SD) standard SAP ERP 2005 application benchmark.

The 64-way Sun SPARC Enterprise M9000 beat the 32-way IBM Power 595 (5GHz 64-core Power6) by 10%. This is the largest configuration that IBM makes. IBM has a very different and very complicated core. Users should compare hardware system costs for these two systems. The IBM p595 achieved 35,400 users on SAP-SD 2005 6.0 (177,950 SAPS, 5,561 SAPS/proc, 08-Apr-08).

The 64-way Sun SPARC Enterprise M9000 beat the 64-way HP Integrity Superdome by 30%. The HP Integrity Superdome achieved 30,000 users on SAP-SD 2005 6.0 (152,530 SAPS, 2,383 SAPS/proc, 18-Dec-06).

SAP-SD 2-Tier ERP 6.0 (2005) Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains systems on the various SAP products.

SAP-SD 2-Tier Performance Table (in decreasing performance order).

| System | OS | Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date |
|---|---|---|---|---|---|---|---|
| Sun SPARC Enterprise M9000, 64xSPARC64 VII @2.52GHz, 1024 GB | Solaris 10 | Oracle 10g | 39,100 | 2005 / 6.0 | 196,564 | 3,071 | 14-Jul-08 |
| IBM Power 595, 32xPOWER6 @5.0GHz, 64 cores, 512 GB | AIX 6.1 | DB2 9.5 | 35,400 | 2005 / 6.0 | 177,950 | 5,561 | 08-Apr-08 |
| HP Integrity SD64B, 64xItanium2 @1.6GHz, 128 cores, 512 GB | HP-UX 11iV3 | Oracle 10g | 30,000 | 2005 / 6.0 | 152,530 | 2,383 | 18-Dec-06 |
| Sun SPARC Enterprise M9000, 64xSPARC64 VI @2.4GHz, 1024 GB | Solaris 10 | Oracle 10g | 25,130 | 2005 / 6.0 | 129,420 | 2,022 | 11-Jul-08 |
| IBM p5 595, 64xPOWER5+ @2.3GHz, 64 cores, 512 GB | AIX 5.3 | DB2 9 | 23,456 | 2004 / 5.0 | 117,520 | 1,836 | 25-Jul-06 |
| Sun SPARC Enterprise M8000, 16xSPARC64 VI @2.4GHz, 256 GB | Solaris 10 | Oracle 10g | 7,300 | 2005 / 6.0 | 36,570 | 2,285 | 17-Apr-07 |
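
The user-count comparisons and the SAPS/Proc column for the three systems discussed in the text can be re-derived from the table. This is only a sketch of that arithmetic; the tuple layout and function names are made up for illustration.

```python
# (system, benchmark users, SAPS, processors) for the three systems compared in the text.
ROWS = [
    ("Sun SPARC Enterprise M9000 (SPARC64 VII)", 39100, 196564, 64),
    ("IBM Power 595 (POWER6)", 35400, 177950, 32),
    ("HP Integrity SD64B (Itanium 2)", 30000, 152530, 64),
]


def saps_per_proc(saps: int, procs: int) -> int:
    """SAPS divided by processor (chip) count, rounded to the nearest integer."""
    return round(saps / procs)


def percent_more_users(ours: int, theirs: int) -> float:
    """Percentage by which `ours` exceeds `theirs`."""
    return (ours / theirs - 1.0) * 100.0


sun_users = ROWS[0][1]
for name, users, saps, procs in ROWS:
    line = f"{name}: {saps_per_proc(saps, procs)} SAPS/proc"
    if users != sun_users:
        line += f", M9000 supports {percent_more_users(sun_users, users):.0f}% more users"
    print(line)
```

Note that SAPS/Proc is a per-chip figure, so it favors systems with fewer, larger processors; the headline claims in the post are based on total users.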

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Disclosure Statement:

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 2004/2005 application benchmark as of 07/14/08: Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0; Sun SPARC Enterprise M9000 (64 processors, 128 cores, 256 threads) 64 x 2.4 GHz SPARC64 VI, 1024GB memory, 25,130 SD benchmark users, 1.65 sec. avg. response time, Cert#2008040, Oracle 10g, Solaris 10, SAP ECC Release 6.0; Sun SPARC Enterprise M8000 (16 processors, 32 cores, 64 threads) 16 x 2.4 GHz SPARC64 VI, 256GB memory, 7,300 SD benchmark users, 1.98 sec. avg. response time, Cert#2007026, Oracle 10g, Solaris 10, SAP ECC Release 6.0; IBM Power 595 (32 processors, 64 cores, 128 threads), 35,400 SD benchmark users, 32 x 5.0 GHz POWER6, 512 GB, DB2 9.5, AIX 6.1, Cert. 2008019, SAP ECC Release 6.0; IBM System p5 595 (64 processors, 64 cores, 128 threads), 23,456 SD benchmark users, 64 x 2.3 GHz POWER5+, 512 GB, DB2 9, AIX 5.3, Cert. 2006045, SAP ECC Release 5.0; HP Integrity SD64B (64 processors, 128 cores, 256 threads), 30,000 SD benchmark users, 64 x 1.6 GHz Dual-Core Intel Itanium 2, 512 GB, Oracle 10g, HP-UX 11iV3, Cert#2006089, SAP ECC Release 6.0; SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.

Sun's submitted results for the SAP-SD 2-Tier benchmark
Certified Results
Performance: 39,100 benchmark users
Server: Sun SPARC Enterprise M9000
Processors: 64 x 2.52 GHz SPARC64 VII
Memory: 1024 GB
Operating system: Solaris 10
Database S/W: Oracle 10g
SAP S/W: SAP ECC 6.0
SAP Certification: #2008042
Storage: 1 x Internal System Disk
8 x Sun StorageTek(tm) 6140 Arrays


mpg and perf/watt are misleading

Monday Jun 23, 2008

Last Friday I blogged about an article on research by Duke University's Larrick & Soll:

Posting a vehicle’s fuel efficiency in “gallons per mile” (GPM) rather than “miles per gallon” (MPG) would help consumers make better decisions about car purchases and environmental impact, researchers from Duke University’s Fuqua School of Business report in the June 20 issue of Science magazine.
The main issue is that people usually make comparisons by linear improvement in miles/gallon, but this leads most people into errors. Switching to gallons/mile (and, as I said, watt/performance for servers) avoids these problems.

If one does the calculations correctly, of course, it doesn't matter, but on a quick look one can be misled. For example (do this quickly!), if one climbs a hill for 10 miles getting 10 mpg and then coasts down the hill for 10 miles getting 100 mpg, how many mpg does one average? If you didn't come up with an answer of about 18 mpg (nearly double the uphill rate), then you should consider looking at the reciprocal calculation.

If on that same hill that same car uses 1 gal/10 miles (10 mpg) uphill and 0.1 gal/10 miles (100 mpg) downhill, then it is easy to see that coasting downhill can only come close to doubling your fuel efficiency. Even if you doubled your fuel efficiency on the downhill section to 200 mpg (0.05 gal/10 miles), you can see that your average fuel efficiency doesn't change much.
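
The hill example works out as follows when you add up gallons first and divide once at the end. This is just a small sketch of the arithmetic; the function name is made up.

```python
def average_mpg(legs):
    """Average mpg over a trip given (miles, mpg) legs: total miles / total gallons."""
    total_miles = sum(miles for miles, _ in legs)
    total_gallons = sum(miles / mpg for miles, mpg in legs)
    return total_miles / total_gallons


# 10 miles uphill at 10 mpg (1 gal), then 10 miles downhill at 100 mpg (0.1 gal).
print(f"{average_mpg([(10, 10), (10, 100)]):.2f} mpg")

# Doubling the downhill figure to 200 mpg (0.05 gal) barely moves the average.
print(f"{average_mpg([(10, 10), (10, 200)]):.2f} mpg")
```

The first trip averages about 18.2 mpg and the second about 19.0 mpg: the uphill gallon dominates, which is obvious in gallons/mile and easy to miss in miles/gallon.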

As I've said before, on servers it is also critical to understand watt/performance on a wide variety of benchmarks. Sun understands this. This way you avoid benchmarks where vendors only highlight small-memory and low-GHz configurations.

Finally, increase your server utilisation (even by a small amount) and look closely at power-performance (watt/perf).


Don't use mpg & perf/watt, please use gpm & watt/perf!

Friday Jun 20, 2008

miles/gallon is misleading to consumers! Remember when I said perf/watt is misleading? How do we all avoid these 'math illusions'? Duke University researchers tell us this is simple: just "flip 'em".

Posting a vehicle’s fuel efficiency in “gallons per mile” (GPM) rather than “miles per gallon” (MPG) would help consumers make better decisions about car purchases and environmental impact, researchers from Duke University’s Fuqua School of Business report in the June 20 issue of Science magazine.
Video of Larrick & Soll discussing their research: click here
Article on Larrick & Soll’s research, which was funded by Duke University.

Check out the above video: you can see that people try to judge by linear improvement in miles/gallon, but this is very misleading. They recommend that we switch to gallons/mile!

Remember back in March 2007, when I said the metric is watt/performance and not perf/watt? http://blogs.sun.com/bmseer/entry/power_efficiency_metrics_clearing_up. Time for SPEC to reconsider their metrics, and only allow default settings to be measured in benchmarks (if power management is not on by factory default, it should NOT be measured in a test). That way customers are best served.

Improving inefficient cars saves a lot of gas; the same reasoning shows that improving %utilisation IS the big win, especially when coupled with efficient servers.
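
To see why improving the inefficient end matters most, compare the gallons saved over a fixed distance. The numbers below (10,000 miles, and the 10, 20, 25 and 50 mpg figures) are hypothetical values picked purely for illustration; they are not taken from the Duke research or from any measurement.

```python
def gallons_saved(miles: float, old_mpg: float, new_mpg: float) -> float:
    """Fuel saved over `miles` when efficiency improves from old_mpg to new_mpg."""
    return miles / old_mpg - miles / new_mpg


MILES = 10_000  # hypothetical yearly distance

# Both upgrades "double the mpg", yet the savings differ a lot.
print(f"10 -> 20 mpg saves {gallons_saved(MILES, 10, 20):.0f} gallons")
print(f"25 -> 50 mpg saves {gallons_saved(MILES, 25, 50):.0f} gallons")
```

Expressed in gallons per mile the difference is visible at a glance (0.10 to 0.05 versus 0.04 to 0.02), which is the same argument for quoting servers in watt/perf.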

Nothing like a little vindication to start the weekend, OK it's getting late, cya next week :)

For a table of savings at different miles/gallon, see: http://www.fuqua.duke.edu/news/mpg/table.pdf


Sun's new IB switches

Friday Jun 20, 2008

Sun is talking more about its IB switches. The large-scale switch is called the Sun Datacenter Switch 3456. You may have seen its internal Sun code name "Magnum."

Sun uses some of these highlights about the Sun Datacenter Switch 3456:

  • Ideally suited for the Sun Blade 6048 modular system to deliver an open PetaScale architecture
  • Highly scalable supporting up to four Sun Datacenter Switches and up to 13,824 server nodes
  • Replaces 300 discrete InfiniBand switches and thousands of cables with a single core switch
  • A 3:1 reduction of physical ports and cables for server connectivity

The smallest IB switch is called the Sun Datacenter Switch 3x24. You may have seen its internal Sun code name "NanoMagnum."

Sun uses some of these highlights about the Sun Datacenter Switch 3x24:

  • When combined with the Sun Blade 6048 modular system, this 1RU 19" InfiniBand switch delivers a high performance switching solution for two up to 288 blade servers
  • Extremely low latency using industry-standard IB transport, and commodity processors from AMD, Intel, and Sun
  • Replaces competitive 12RU solutions and hundreds of cables with 1/3 the number of cables, and occupies only up to four rack units for clusters of up to 288 blade servers

If you want to see how you can use this switch to connect six Sun Blade 6048 racks, see this blog: http://blogs.sun.com/simons/entry/inside_nano_magnum_the_sun.


Sun Launches Sun Blade X6450 Server (Xeon) Module Today

Wednesday Jun 18, 2008

Today Sun announced its powerful new Sun Blade X6450 server module at the International Supercomputing Conference (ISC) in Dresden, Germany (blog pics here).

sun.com Sun Blade X6450 features

Sun writes: "The Sun Blade X6450 is the newest in the 6000 series server modules, and brings the Sun Constellation System to the next level of performance through enhanced features that include up to four Intel Xeon dual- or quad-core processors, an optional 16GB Compact Flash storage subsystem, 24 DIMM slots and 110Gbps I/O throughput. This potent technology combination can deliver up to seven teraflops of performance per fully populated Sun Blade 6048 chassis and up to 71% more compute cores."

I'm sure http://blogs.sun.com/HPC/ will write more about ISC in Dresden.


another useless unrealistic uber-simplistic TPC-C result

Thursday Jun 12, 2008

The IBM Power 595 reached over 6 million tpmC on the TPC-C benchmark, but IBM avoids single-system TPC-H like the plague. Why? Why didn't IBM measure and publish the server watts actually used on this benchmark? Did that 4TByte of memory flame their power meters?

    {postscript: an IBM blogger says it is Sun speaking and then points to this blog. No, these are the BM Seer's opinions (yes, I am a Sun employee), and they don't necessarily represent Sun or Sun's management. I'm glad Sun doesn't post on the above-mentioned benchmark. It is worthless. Sun publishes on most benchmarks, I'd say more than IBM. To see a huge list of very reasonable benchmarks avoided by IBM on the POWER6 servers, see: blogs.sun.com/bmseer/entry/they_tried_to_make_ibm}

It is no secret that, in my opinion, the 16-year-old TPC-C benchmark has been worthless for at least a decade. The problem isn't that TPC-C is old but that it does not represent databases today (did it even then?).

Has IBM just optimized solely for TPC-C on hyper-expensive cores? Their engineers basically admit extreme benchmark optimization: http://blogs.sun.com/bmseer/entry/careful_reading_shows_a_lot http://blogs.sun.com/bmseer/tags/tpc-c

It is simplistic, small, and encourages silly configs. Even honest people in IBM admitted a year ago that it is losing relevance: ftp://ftp.software.ibm.com/eserver/benchmarks/wp_TPC-E_Benchmark_022307.pdf

Even IBM admits in the paper above, "TPC-C configurations do not reflect typical client configurations." They go on to list "Ease of partitioning: Unrealistically easy". Also, all referential integrity for every table is turned OFF!

"The TPC-C benchmark is comprised of 5 stored procedure calls: New-Order, Payment, Delivery, Order-Status and Stock-Level." see this Microsoft blog from over a year ago. FIVE, Five, really only five - a huge server doing only 5 very-very simple things on 9 tables. No one in the world has a database that looks like this - it is really useless.

IBM and other vendors keep pushing TPC-C for bragging rights. They spend a huge effort telling customers that they need it.

What's next? IBM re-hyping other ancient benchmarks like Dhrystones as the most relevant benchmark for POWER6?

Disclosure Information:

IBM Power 595 (5 GHz, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5 TPC-C result of 6,085,166 tpmC ($2.81/tpmC, configuration available 12/10/08). Results as of 6/10/08, see www.tpc.org. TPC-C, TPC-H, TPC-E are trademarks of the Transaction Processing Performance Council (TPC).


Eco on the desktop: Thin(Sunray) vs. Thick(big PC)

Wednesday Jun 11, 2008

Thick versus Thin Clients: Today online at 4PM(east)/1PM(pac) there will be a debate to discuss the energy use and TCO of a thin client model versus their thicker alternatives. See: http://blogs.intel.com/technology/2008/06/ecotechnology_great_debates_at.php

I'm sure the "Think Thin" blog will have follow-up thoughts on this afterwards at: blogs.sun.com/ThinkThin

Desktop systems have a very different usage model (in most people's work environment they are mostly IDLE or at very low utilisation), so thin usually wins. Servers are a different beast: you really want to run fewer servers at high utilisation.


SPECpower_ssj too many measurements?

Monday Jun 09, 2008

A posting last week clearly demonstrated that even small increases in utilisation provide HUGE savings.

Then I started looking at the data. It seems that one only really needs two points(!) (active-idle & 100%) to determine the watts used at any utilisation. Let's take a look at the HP DL580 SPECpower_ssj result.

%util   Measured watts   Linear predicted watts   Diff     %Diff
100%    387w             -                        -        0%
90%     376w             375.4w                   0.6w     0%
80%     368w             363.8w                   4.2w     1%
70%     359w             352.2w                   6.8w     2%
60%     347w             340.6w                   6.4w     2%
50%     335w             329w                     6w       2%
40%     322w             317.4w                   4.6w     1%
30%     309w             305.8w                   3.2w     1%
20%     294w             294.2w                   -0.2w    0%
10%     280w             282.6w                   -2.6w    -1%
idle    271w             -                        -        0%

I'll look at other SPECpower_ssj results. But it seems that SPEC should simply add idle watts and wattage measurements at 100% utilisation to ALL SPEC benchmarks and not redesign benchmarks to measure watts at 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%. In the worst case above, the linear prediction was ONLY 2% different from the actual watts!
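For anyone who wants to repeat the check on other results, here is a minimal Python sketch of the two-point prediction used in the table above. It assumes a straight line between the active-idle and 100% measurements; the function names (predict_watts, compare) are my own illustrative choices and not part of any SPEC tool. The measured values are the HP DL580 G5 numbers quoted in this post.

```python
# Two-point linear power model: a sketch, assuming watts scale linearly
# between active idle (0%) and 100% target load.

# Measured watts from the HP DL580 G5 SPECpower_ssj2008 result quoted in this post.
# Key 0 stands for active idle.
MEASURED = {
    100: 387, 90: 376, 80: 368, 70: 359, 60: 347, 50: 335,
    40: 322, 30: 309, 20: 294, 10: 280, 0: 271,
}


def predict_watts(util_pct, idle_watts, full_watts):
    """Interpolate watts at util_pct from the idle and 100% measurements."""
    return idle_watts + (full_watts - idle_watts) * util_pct / 100.0


def compare(measured):
    """Return (util, measured, predicted, diff, pct_diff) rows, highest load first."""
    idle, full = measured[0], measured[100]
    rows = []
    for util in sorted(measured, reverse=True):
        predicted = predict_watts(util, idle, full)
        diff = measured[util] - predicted
        rows.append((util, measured[util], predicted, diff,
                     100.0 * diff / measured[util]))
    return rows


if __name__ == "__main__":
    for util, actual, predicted, diff, pct in compare(MEASURED):
        print(f"{util:>4}%  {actual:>4}w  {predicted:7.1f}w  {diff:+5.1f}w  {pct:+5.1f}%")
```

The two endpoints match by construction, so the interesting rows are the nine in between. Swapping in the numbers from another SPECpower_ssj disclosure is just a matter of replacing the MEASURED dictionary.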

I have long said SPEC should just add watt/perf to all of their benchmarks as currently designed.

Disclosure statement

SPECpower_ssj2008:HP Proliant DL580 G5 (4-chip QC Xeon L7345 1.86GHz), 546 overall ssj_ops/watt, 359,523 ssj_ops and 387 watt at 100% target load, 325,931 ssj_ops and 376 watt at 90% target load, 291,991 ssj_ops and 368 watt at 80% target load, 255,512 ssj_ops and 359 watt at 70% target load, 217,222 ssj_ops and 347 watt at 60% target load, 180,262 ssj_ops and 335 watt at 50% target load, 145,079 ssj_ops and 322 watt at 40% target load, 110,173 ssj_ops and 309 watt at 30% target load, 71,409 ssj_ops and 294 watt at 20% target load, 36,070 ssj_ops and 280 watt at 10% target load, and Active Idle 271 watts. SPEC, SPECpower reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 12/11/07.

In a more realistic configuration, according to HP's own power calculator, an HP DL580 G5 with four QC Xeon 2.93GHz Tigerton processors and 64 GB memory should draw 1,072 watts. HP DL580 power consumption from HP Power Calculator, system configured with 4 x 2.93GHz processors, redundant PSU, 16 x 4GB DIMMs, 8 x 36GB SAS drives, 1 x PCI card, 80% utilisation on 9/10/07: http://h30099.www3.hp.com/configurator/powercalcs.asp
