Thursday Oct 23, 2008
World Record multi-JVM 2-chip performance. Nice way to end my day presenting some more performance facts...
The Sun Blade T6340 server module
and Sun SPARC Enterprise T5240 server, each with two 1.4 GHz
UltraSPARC T2 Plus processors, obtained the best multi-JVM 2 chip results on
the SPECjbb2005 server-side Java benchmark.
A Sun Blade T6340 server equipped with two UltraSPARC T2 Plus
processors at 1.4GHz, delivered a World Record 2-chip result of 388456
SPECjbb2005 bops, 24279 SPECjbb2005 bops/JVM.
A Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2
Plus processors at 1.4GHz, delivered an outstanding 2-chip result of
384934 SPECjbb2005 bops, 24058 SPECjbb2005 bops/JVM.
The Sun Blade T6340 (two 1.4 GHz UltraSPARC T2 Plus chips) demonstrated 11%
better performance over the IBM Power 550 (four 4.2 GHz POWER6 chips) result of 350642 SPECjbb2005 bops,
87661 SPECjbb2005 bops/JVM.
The Sun Blade T6340 (two 1.4 GHz UltraSPARC T2 Plus chips) demonstrated
17% better performance over the IBM x3650 result of 330605 SPECjbb2005 bops,
82651 SPECjbb2005 bops/JVM which uses two 3.3 GHz Xeon chips.
The Sun Blade T6340 and Sun SPARC Enterprise T5240 each used Solaris 10 10/08
and Sun JDK 1.6.0_06 Performance Release to obtain these leading results.
Benchmark Description
SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the
database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
Competitive Landscape
SPECjbb2005 Performance Chart (ordered by performance,
bops: SPECjbb2005 Business Operations per Second (bigger is better)
| System |
Processors |
Performance |
| Ch,Cr,Thr |
GHz Type |
SPECjbb2005 bops |
SPECjbb2005 bops/JVM |
| Sun Blade T6340 |
2,16,128 |
1.4 UltraSPARC T2 Plus |
388456 |
24279 |
| Sun SPARC Enterprise T5240 |
2,16,128 |
1.4 UltraSPARC T2 Plus |
384934 |
24058 |
| IBM Power 550 |
4,8,16 |
4.2 Dual-Core POWER6 |
350642 |
87661 |
| IBM x3650 |
2,8,8 |
3.3 QC Xeon |
330605 |
82651 |
| IBM x3650 |
2,8,8 |
3.1 QC Xeon |
323172 |
80793 |
| Fujitsu RX200 |
2,8,8 |
3.3 QC Xeon |
316728 |
79182 |
| Dell M600 |
2,8,8 |
2.3 QC Xeon |
314513 |
78628 |
| IBM BC HS21XM |
2,8,8 |
3.0 QC Xeon |
310028 |
77507 |
| Dell PE 2950 |
2,8,8 |
3.16 QC Xeon |
305411 |
76353 |
Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.
Disclosure Statement:
SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 10/17/2008 on spec.org. Sun Blade T6340 (2 chips, 16 cores) 388456 SPECjbb2005 bops, 24279 SPECjbb2005 bops/JVM submitted for review.
Sun SE T5240 (2 chips, 16 cores) 384934 SPECjbb2005 bops, 24058 SPECjbb2005 bops/JVM submitted for review.
IBM p550 (4 chips, 8 cores) 350642 SPECjbb2005 bops, 87661 SPECjbb2005 bops/JVM
IBM x3650 (2 chips, 8 cores) 330605 SPECjbb2005 bops, 82651 SPECjbb2005 bops/JVM
IBM x3650 (2 chips, 8 cores) 323172 SPECjbb2005 bops, 80793 SPECjbb2005 bops/JVM
Fujitsu RX200 (2 chips, 8 cores) 316728 SPECjbb2005 bops, 79182 SPECjbb2005 bops/JVM
Dell M600 (2 chips, 8 cores) 314513 SPECjbb2005 bops, 78628 SPECjbb2005 bops/JVM
IBM BC HS21XM (2 chips, 8 cores) 310028 SPECjbb2005 bops, 77507 SPECjbb2005 bops/JVM
Dell PE 2950 (2 chips, 8 cores) 305411 SPECjbb2005 bops, 76353 SPECjbb2005 bops/JVM
Results Summary
| T6340 Results: |
|
388456 SPECjbb2005 bops, 24279 SPECjbb2005 bops/JVM |
| T5240 Results: |
|
384934 SPECjbb2005 bops, 24058 SPECjbb2005 bops/JVM |
| Reference Date: |
|
Oct 21, 2008 |
| Systems: |
|
Sun Blade T6340 |
| |
|
Sun SPARC Enterprise T5240 |
| Total Number Processors: |
|
2, 2 |
| Processor/GHz of Server: |
|
UltraSPARC T2 Plus 1.4 GHz |
| Operating System: |
|
Solaris 10 10/08 |
| JVM: |
|
Java HotSpot(TM) 32-Bit Server, Version 1.6.0_06 Performance Release |
Friday Aug 08, 2008
Sun SPARC Enterprise T5220 / T5240 beats IBM Cell Broadband Engine with
significantly easier application code development!
Pattern matching or string searching are important to a variety of
commercial, government and HPC applications. One of the core
functions needed for text identification algorithms in data
repositories is real-time string searching. For this benchmark, both
IBM and Sun used the Aho-Corasick algorithm for string searching.
Note: Got this from an internal website on info that is going public.
The 2-chip Sun SPARC Enterprise T5240 performed string searching at a
rate of 6.12 GB/s (49.0 Gbit/sec) whereas the 2-chip IBM Cell Broadband Engine
DD3 Blade performed string searching at a rate of 0.48 GB/s (3.8 Gbit/sec).
The 1-chip Sun SPARC Enterprise T5220 performed string searching at a
rate of 3.08 GB/s (24.6 Gbits/s).
The Sun SPARC Enterprise T5240 demonstrated a 2x speedup over the Sun
SPARC Enterprise T5220.
The Aho-Corasick algorithm as deployed on the IBM Cell Broadband
Engine DD3 Blade required substantial optimization and tuning to achieve the
reported performance, whereas on the Sun SPARC Enterprise T5220 or T5240
only a basic implementation of the algorithm and a simple compilation were
needed.
Performance Summary
| System |
Throughput (GBits/sec) |
Chips |
Cores |
GHz |
Sun SPARC Enterprise T5240 |
49.0 |
2 |
16 |
1.4 |
Sun SPARC Enterprise T5220 |
24.6 |
1 |
8 |
1.4 |
IBM Cell Broadband Engine DD3 Blade |
3.8 |
2 |
16 |
3.2 |
IBM results are obtained from Figure 7(d) of
IEEE Computer, Volume 41, Number
4, pp. 42-50, April 2008. Sun benchmark results as of 08/05/2008.
Benchmark Description
One of the core functions needed for text identification algorithms in data
repositories is real-time string searching. This string searching benchmark
demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for
both ease of code creation and speed of code execution.
In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a
variant of the Aho-Corasick string searching algorithm that uses deterministic
finite automata. The algorithm first constructs a graph that represents a
dictionary, then walks that graph using successive input characters from a text
file. Each "state" in the graph includes a state transition table (STT)
that is accessed using the next input character from the text file to
determine the address of the next state in the graph. IBM defines an automaton
as a two-step loop that: (1) obtains the address of the next state from the
STT, and (2) fetches the next state in the graph.
IBM reports the performance of its Cell Broadband Engine (CBE) to execute this
algorithm to search a 4.4 MB version of the King James Bible using a dictionary
of the 20,000 most used words in the English language (average word length of
7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each
of the two CBEs executes 16 automata, for a total of 256 automata.
All automata and hence all SPEs access a single, shared dictionary.
IBM describes elaborate optimizations of the Aho-Corasick algorithm, including
state shuffling, state replication, alphabet shuffling and state caching.
These optimizations were required to: (1) overcome "memory congestion", i.e.,
contention amongst the SPEs for access to the shared dictionary, and (2)
compensate for the limited local storage that is associated with each SPE.
These optimizations were necessary to achieve the performance reported for
the CBE DD3 Blade.
IBM does not provide references that indicate where to obtain the dictionary
and Bible. IBM reports the algorithmic performance in Gbits/s but does not
indicate whether an 8-bit byte is extended to 10 bits as required for network
transmission.
In order to closely approximate the dictionary and Bible that were used by IBM,
Sun used a dictionary of 25,144 English words (the Open Solaris
file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list)
for which the average word length is 8.22 characters, and a 4.6 MB version
of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For
reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.
In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus
processors for both ease of code generation and speed of code execution, Sun
implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the
algorithm were required to achieve the performance reported for the T5220 and TT5240.
The source code was compiled using the -m64 -xO3 and -xopenmp options. The
dictionary is represented using a graph that comprises 187 MB. Each core of
the T5220 or T5240 executes 8 automata using one OpenMP thread per automaton.
Thus, the T5220 executes 64 total automata and the T5240 executes 128 total
automata. All automata and hence all cores access a single, shared dictionary.
Access to this dictionary is accelerated by the large, shared L2 caches of the
Sun SPARC Enterprise T5220 and T5240.
Disclosure Statement:
Pattern Matching: Sun SPARC Enterprise T5240 (2 x 1.4 GHz UltraSPARC T2 Plus, 2 chips, 16 cores),
Solaris 10, Sun C 5.9, 49.0 GBits/sec;
Sun SPARC Enterprise T5220 (1 x 1.4 GHz UltraSPARC T2, 1 chip, 8 cores),
Solaris 10, Sun C 5.9, 24.6 GBits/sec;
IBM Cell Broadband Engine DD3 Blade (2 x 3.2 GHz Cell Broadband Engine,
2 chips, 16 cores), Linux kernel v2.6.16, IBM CBE Software Development Kit
v2.1, 3.8 GBits/sec.
System Configuration
| Throughput (GBits/sec) |
|
24.6 T5220 |
|   |
|
49.0 T5240 |
| Reference Date: |
|
August 5, 2008 |
| Systems: |
|
Sun SPARC Enterprise T5220, T5240
|
| Total Number Processors: |
|
1, 2 |
| Processor/GHz of Server: |
|
1.4 GHz UltraSPARC T2, T2 Plus |
| Operating System: |
|
Solaris 10 |