Thursday Jan 31, 2008
The Sun Modular Datacenter S20 configured with Sun SPARC Enterprise T5120 servers demonstrated superior performance/watt and performance/space compared to Dell servers using Xeon quad-core processors.
The Sun Modular Datacenter S20, fully configured with 102 Sun SPARC Enterprise T5120 servers, each with a single UltraSPARC T2, can deliver nearly 455,000 web processing operations/second.
- The Sun Modular Datacenter S20 fully configured with the Sun SPARC Enterprise T5120 systems requires only 160 square feet of space. By comparison, a configuration of Dell servers with 2.66GHz Xeon 5355 in 160
square feet of traditional datacenter space constrained to 150 Watts/square
foot would achieve only about 57,500 web processing operations/second. To
achieve the same level of performance with the Dell configuration in a
traditional datacenter, over 1250 square feet would be needed.
- The Sun SPARC Enterprise T5120s configured in a Sun Modular
Datacenter S20 are very efficient in terms of performance per unit of space. A Sun Modular
Datacenter S20 fully configured with Dell servers with 2.66GHz Xeon 5355 would provide only about 1/3 of the web processing
performance of a Sun Modular Datacenter S20 fully configured with Sun SPARC
Enterprise T5120 servers with UltraSPARC T2.
- A Sun Modular Datacenter S20 provides 2840 web processing ops/sec/sq ft vs.
only 365 web processing ops/sec/sq ft for a traditional datacenter of Dell
servers.
- The Sun Modular Datacenter S20 is very efficient at cooling. Using the same
Sun hardware, the Sun Modular Datacenter S20 is 40% more efficient than a
traditional datacenter. This translates into savings of 1459 metric tons over 5 years.
Notes:
Because the Sun Modular Datacenter's integrated, high-efficiency power and cooling
supports up to 25kW per rack, servers, disks, and switches can be racked more densely
than in a traditional datacenter.
Many traditional datacenters are constrained to 150 Watts/square foot; this
figure was used in the above estimates.
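The space and power figures above can be cross-checked with a little arithmetic. The following Python sketch uses only the rounded numbers quoted in this post (the constant and function names are illustrative). Because 57,500 ops/sec is itself approximate, the Dell density comes out near 359 rather than the 365 quoted, while the floor space needed to match 455,000 ops/sec (about 1,266 sq ft) agrees with "over 1250 square feet". The 24 kW figure is simply the 150 W/sq ft limit applied to 160 sq ft.

```python
# Space-efficiency arithmetic for the Sun Modular Datacenter S20 comparison.
# All inputs are the rounded figures quoted in the post; names are illustrative.

S20_OPS = 455_000            # web processing ops/sec, fully configured S20
S20_AREA_SQFT = 160          # floor space of the S20
DELL_OPS_160SQFT = 57_500    # Dell Xeon 5355 config in 160 sq ft at 150 W/sq ft
POWER_LIMIT_W_PER_SQFT = 150


def ops_per_sqft(ops: float, area_sqft: float) -> float:
    """Throughput density in web processing ops/sec per square foot."""
    return ops / area_sqft


def area_needed(target_ops: float, density: float) -> float:
    """Floor space (sq ft) required to reach target_ops at a given density."""
    return target_ops / density


s20_density = ops_per_sqft(S20_OPS, S20_AREA_SQFT)
dell_density = ops_per_sqft(DELL_OPS_160SQFT, S20_AREA_SQFT)
dell_area = area_needed(S20_OPS, dell_density)
power_budget_kw = POWER_LIMIT_W_PER_SQFT * S20_AREA_SQFT / 1000

if __name__ == "__main__":
    print(f"S20 density:  {s20_density:,.0f} ops/sec/sq ft")
    print(f"Dell density: {dell_density:,.0f} ops/sec/sq ft")
    print(f"Dell area for {S20_OPS:,} ops/sec: {dell_area:,.0f} sq ft")
    print(f"Power budget of 160 sq ft at 150 W/sq ft: {power_budget_kw:.0f} kW")
```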
BM Seer: I'll add more of the background numbers and calculations when I can find a public version of them. If you are a customer, contact Sun; I'm sure you can get them before I can post them.
Benchmark Description
Web processing performance is based on internal analysis of web
processing workloads. The workload simulates multiple users' web
sessions accessing a web server via static and dynamic HTTP requests (including
both HTTP and HTTPS transactions) and is reported as web operations
per second.
System Configuration & Results
| Results | 455,000 web ops/sec |
| Reference Date | January 29, 2008 |
| System | 1 x Sun Modular Datacenter S20 |
| Servers | 102 x Sun SPARC Enterprise T5120 |
| Total Number of Processors | 102 chips / 816 cores (8 threads/core) |
| Processor/GHz, Memory per Server | Sun UltraSPARC T2 1.4 GHz, 64GB |
| Operating System | Solaris 10 |
| Software | Sun JSWS 7.0 Update 2 |
Storage & Network
- 6 x Sun StorageTek 3510 (dual RAID controller)
- 51 x Sun StorageTek 3510 (JBOD)
- 9 x Brocade Silkworm 4100 switches
- 5 x Cisco Catalyst 6509 NEBS switches
Wednesday Aug 15, 2007
disingenuous = "giving a false appearance of simple frankness".
IBM bloggers try to make valid SPEC estimates (preliminary results)
look "fake"; they keep using words like "virtual", "not real",
and "strange", while for IBM they use words like "real" 7 times.
All UltraSPARC T2 SPEC CPU and SPEC OMP metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use pre-production systems. Sun customer systems, to be announced later, are expected to perform similarly. SPEC rules do allow comparing these preliminary scores with published results.
For details, see the posting
"Estimated" what does that mean for Sun's UltraSPARC T2, at:
http://blogs.sun.com/bmseer/entry/estimated_what_does_that_mean
IBM bloggers also imply that IBM doesn't estimate results, yet two years
ago I was seeing lots of IBM POWER6 estimates from IBM, and those
were estimates not based on runs.
SPEC, SPECint, SPECfp, and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of August 6, 2007.
Tuesday Aug 07, 2007
This posting covers more about floating-point on the Sun UltraSPARC T2. The
previous posting discussed SPECfp_rate2006 scores and the open-sourcing of the UltraSPARC T2 design.
The UltraSPARC T2 has eight floating-point units that are well suited for scientific applications. Based upon preliminary runs, the
Sun UltraSPARC T2 processor at 1.4 GHz beats all single-chip scores,
showing 14230 (est.)/15081 (est.) SPECompMbase2001/SPECompMpeak2001.
How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare with published SPECompMbase2001/SPECompMpeak2001 results?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
IBM p520 POWER5+ 1.9GHz processor published result by 85%.
- ...Sun is waiting for POWER6 4.7GHz results; maybe the UltraSPARC T2 results will scare IBM away from ever publishing a single-chip result?
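For reference, the percentage can be reproduced from the scores in the disclosure statement below. This small Python sketch (the function name is illustrative) shows that the 85% figure corresponds to the peak metric (about 84.5%); base-to-base, the gap is about 75%.

```python
# Estimated UltraSPARC T2 SPEC OMPM2001 scores vs. the published single-chip
# IBM p520 (POWER5+ 1.9GHz) scores, both taken from the disclosure statement.

T2 = {"base": 14230, "peak": 15081}    # estimates (pre-production system)
P520 = {"base": 8141, "peak": 8174}    # published results


def pct_higher(a: float, b: float) -> float:
    """Percentage by which score a exceeds score b (bigger is better)."""
    return (a / b - 1.0) * 100.0


advantage = {metric: pct_higher(T2[metric], P520[metric]) for metric in T2}

if __name__ == "__main__":
    for metric, pct in advantage.items():
        print(f"SPECompM{metric}2001: {pct:.1f}% higher")
```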
Benchmark description:
The SPEC OMP benchmark tests the performance of 9 high-performance
computing applications. It is used to compare the
performance of shared-memory servers. All C/C++ and Fortran
applications in this suite use the OpenMP programming model, which
provides a portable, scalable model for developing parallel
applications for platforms ranging from the desktop to the
supercomputer.
The OpenMP Application Program Interface (API) supports
multi-platform shared-memory parallel programming in C/C++ and Fortran
on all architectures, from the largest Unix servers to the small
Windows NT platforms.
Disclosure statement:
All UltraSPARC T2 SPEC OMP metrics quoted are from full “reportable” runs,
but are nevertheless designated as “estimates” because they use preproduction
systems. SPEC and SPEComp are registered trademarks of Standard Performance
Evaluation Corporation.
Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/ 15081 (est) SPECompMbase2001/SPECompMpeak2001.
Competitive results from www.spec.org as of
August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.
Tuesday Aug 07, 2007
Sun UltraSPARC T2 is an amazing chip and very fast! The UltraSPARC T2 features several industry firsts:
- Eight cores and 64 threads
- Integrated 10 GbE networking and I/O
- Dedicated cryptographic and floating-point units per core
- 10 cryptographic functions supported in hardware
- Open-source design: www.opensparc.net
Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz
beat all single-chip scores, showing 78.3 est. SPECint_rate2006.
How do these preliminary runs (we must use the term "estimated" by
SPEC rules) compare with published SPECint_rate2006 results?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
IBM POWER6 4.7GHz processor published result by 29%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
estimated scores of the AMD Barcelona by 23%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
published scores of the 2.66GHz Intel X5355 (Clovertown) by 48%.
Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz
beat all single-chip scores, showing 62.3 est. SPECfp_rate2006.
How do these preliminary runs (we must use the term "estimated" by
SPEC rules) compare with published SPECfp_rate2006 results?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best
published single-chip IBM POWER6 4.7GHz processor result by 7%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip estimated scores of the AMD Barcelona by 11%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
published scores of the 2.66GHz Intel X5355 (Clovertown) by 66%.
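The percentages in both lists follow from the scores in the disclosure statement below, computed as (T2 score / competitor score - 1). A minimal Python sketch; the Intel SPECfp_rate2006 baseline is taken as 37.5, the value implied by the 66% claim (62.3 / 1.66 ≈ 37.5), so treat that one entry as an assumption.

```python
# Percent advantage of the estimated UltraSPARC T2 SPEC CPU2006 rate scores
# over the single-chip competitor scores in this post's disclosure.
# Assumption: Intel SPECfp_rate2006 baseline = 37.5 (implied by the 66% claim).

T2_INT, T2_FP = 78.3, 62.3

competitors = {
    # name: (SPECint_rate2006, SPECfp_rate2006)
    "IBM POWER6 4.7GHz": (60.9, 58.0),
    "AMD Barcelona 2.6GHz (est.)": (63.9, 56.3),
    "Intel X5355 2.66GHz": (52.8, 37.5),
}


def pct_higher(a: float, b: float) -> float:
    """Percentage by which a exceeds b for a bigger-is-better metric."""
    return (a / b - 1.0) * 100.0


# name -> (int advantage %, fp advantage %), rounded to whole percent
results = {
    name: (round(pct_higher(T2_INT, i)), round(pct_higher(T2_FP, f)))
    for name, (i, f) in competitors.items()
}

if __name__ == "__main__":
    for name, (int_pct, fp_pct) in results.items():
        print(f"{name:30s} int +{int_pct}%  fp +{fp_pct}%")
```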
Performance per core doesn't matter, and GHz doesn't matter; what matters
is the number of cores, efficiency, and the design of the chip! Competitors
are saying that the UltraSPARC T2 is proprietary... this makes no sense.
Both the UltraSPARC T1 and UltraSPARC T2 are open-source designs (www.opensparc.net). You will not find the
latest designs of Intel, AMD, or IBM as open-source designs.
Disclosure Statement:
All Sun UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable”
runs, but are nevertheless designated as “estimates” because they use
preproduction systems. SPEC, SPECint, and SPECfp are registered trademarks of
Standard Performance Evaluation Corporation. Sun UltraSPARC T2
1.4GHz (1 chip, 8 cores, 64 threads) 78.3 est. SPECint_rate2006,
62.3 est. SPECfp_rate2006.
Competitive results from www.spec.org as of August 6, 2007.
IBM POWER6 4.7GHz (1 chip, 2 cores, 4 threads) 60.9 SPECint_rate2006,
58.0 SPECfp_rate2006.
AMD Barcelona 2.6 GHz (1 chip, 4 cores, 4 threads) 63.9 est. SPECint_rate2006,
56.3 est. SPECfp_rate2006. Barcelona estimates are based upon a "The Register"
article stating that the 2.6GHz quad-core is 21% and 50% faster than the Intel 2.66GHz system.
Fujitsu RX300 Intel X5355 2.66 GHz (1 chip, 4 cores, 4 threads) 52.8 SPECint_rate2006, 37.5 SPECfp_rate2006.
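The Barcelona estimates can be traced back to the Intel baseline using the two speedups attributed to "The Register". The Python sketch below assumes, as the numbers suggest, that 21% applies to the integer metric and 50% to floating point: the integer estimate reproduces 63.9 from 52.8, and dividing the floating-point estimate by 1.50 recovers the Intel SPECfp_rate2006 baseline it was derived from.

```python
# Reconstructing the AMD Barcelona estimates from the Intel X5355 baseline.
# Assumption: the 21% speedup applies to integer, the 50% speedup to floating point.

INTEL_INT = 52.8          # Fujitsu RX300, Intel X5355, SPECint_rate2006
INT_SPEEDUP = 1.21
FP_SPEEDUP = 1.50
BARCELONA_FP_EST = 56.3   # floating-point estimate quoted in the disclosure

barcelona_int_est = INTEL_INT * INT_SPEEDUP          # forward: baseline -> estimate
implied_intel_fp = BARCELONA_FP_EST / FP_SPEEDUP     # backward: estimate -> baseline

if __name__ == "__main__":
    print(f"Barcelona SPECint_rate2006 est.: {barcelona_int_est:.1f}")
    print(f"Intel SPECfp_rate2006 implied by the 56.3 estimate: {implied_intel_fp:.1f}")
```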
Reminder: The Niagara 2 score was obtained from a full "reportable" SPEC
run, but is designated as an "estimate" because a pre-production system
was used.
...more information on the UltraSPARC T2 later today.
Monday Aug 06, 2007
Many news sources are now covering the UltraSPARC T2, the new high-performance chip from Sun.
This new UltraSPARC T2 chip leads in many ways. I'll cover the performance numbers tomorrow.
For now:
http://www.computerworld.com.au/index.php/id;898889798
http://www.reuters.com/article/technologyNews/idUSN0625780420070806
http://www.channelweb.co.uk/vnunet/news/2195718/sun-lifts-lid-niagara-processor
etc.
For some of my previous comments:
http://blogs.sun.com/bmseer/entry/news_trickles_out_on_niagara2
Please remember that the previous-generation chip, the UltraSPARC T1,
just set an application-tier world record (all details at the link). How many times has the "old" chip, with half as many threads, set a world record weeks before the new one is announced?
A final note: I venture that this chip is going to lead for database, the application tier, and of course the web tier. Oh, and don't forget HPC; yes, it is that versatile.
Monday Jul 16, 2007
Update:
A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is about two times faster
than a single-node SGI Altix (1.6GHz dual-core Itanium 2) in runs with 1, 2, and 4 cores in
both benchmark test cases.
Runs on the 4-node Sun Blade X6250 cluster also outperformed the
SGI Itanium 2 dual-core 1.6GHz cluster in both test cases, up to the
maximum of 16 cores on all 4 nodes in each cluster.
Question: can the Itanic dual-core keep floating?
The 4-node Sun Blade X6250 cluster outperformed the
SGI Altix XE cluster by up to 25% in runs of both test cases, up to the maximum
of 16 cores on all 4 nodes in each cluster.
Even in the single-node configuration, the Sun Blade X6250 beats an SGI
Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4%
faster in the 1-core results.
In summary:
The World Record single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160)
beats the best posted results for any single-node blade or server.
All posted results are for 2-socket dual-core platforms.
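The comparisons above can be checked against the run times in the two tables below. In this Python sketch, the Sun X6250 and both SGI columns are transcribed for 1 to 16 cores, and each ratio is other time / Sun time, so a value above 1 means the Sun blade finished sooner (the dictionary and function names are illustrative). Against the Itanium 2 Altix the ratios at 1, 2, and 4 cores fall between roughly 1.98 and 2.11; the largest margin over the Altix XE is about 24.7% (Case 1, 8 cores).

```python
# Speedup ratios from the PowerFLOW tables. Times are in seconds (smaller is
# better), so other_time / sun_time > 1 means the Sun Blade X6250 was faster.

SUN_X6250 = {
    "case1": {1: 822.7, 2: 418.5, 4: 214.9, 8: 118.6, 16: 77.5},
    "case2": {1: 1966.4, 2: 987.5, 4: 500.5, 8: 258.4, 16: 164.5},
}
SGI_ALTIX_ITANIUM = {
    "case1": {1: 1631.4, 2: 832.7, 4: 438.4, 8: 227.2, 16: 117.9},
    "case2": {1: 3884.0, 2: 2000.4, 4: 1054.5, 8: 526.7, 16: 272.1},
}
SGI_ALTIX_XE = {
    "case1": {1: 866.1, 2: 448.8, 4: 264.8, 8: 147.9, 16: 78.1},
    "case2": {1: 2043.6, 2: 1062.4, 4: 620.7, 8: 316.0, 16: 174.4},
}


def speedups(sun: dict, other: dict) -> dict:
    """Map (case, cores) -> other_time / sun_time for core counts both have."""
    return {
        (case, cores): other[case][cores] / t
        for case, times in sun.items()
        for cores, t in times.items()
        if cores in other.get(case, {})
    }


vs_itanium = speedups(SUN_X6250, SGI_ALTIX_ITANIUM)
vs_xeon = speedups(SUN_X6250, SGI_ALTIX_XE)

if __name__ == "__main__":
    for key in sorted(vs_itanium):
        print(key, f"vs Altix (Itanium 2): {vs_itanium[key]:.2f}x",
              f"vs Altix XE: {vs_xeon[key]:.2f}x")
```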
EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model)
results in seconds (smaller is better)
| # CPU | IBM e135 Opteron DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opteron DC 3.0GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3GHz IB XC3.1 RC1 | HP RX2660 Itanium 2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3.0GHz IB SLES 10 | SGI Altix Itanium 2 DC 1.6GHz ProPack 5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
| 1 | - | - | - | - | - | 822.7 | 1631.4 | 866.1 |
| 2 | - | - | - | - | - | 418.5 | 832.7 | 448.8 |
| 4 | - | - | - | - | - | 214.9 | 438.4 | 264.8 |
| 8 | 182.9 | 137.2 | 137.8 | 134.7 | 214.3 | 118.6 | 227.2 | 147.9 |
| 16 | 96.3 | 70.4 | 71.3 | 70.5 | 111.4 | 77.5 | 117.9 | 78.1 |
| 32 | 51.5 | 37.0 | 40.6 | 36.6 | 57.9 | - | 60.2 | 41.9 |
| 64 | 31.5 | 21.5 | 22.9 | 21.1 | 31.8 | - | - | 28.0 |
| 96 | 24.7 | 17.3 | - | - | - | - | - | - |
| 128 | 19.0 | - | - | - | - | - | - | 18.1 |
"-" no result published
EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model)
results in seconds (smaller is better)
| # CPU | IBM e135 Opteron DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opteron DC 3GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3.0GHz IB XC3.1 RC1 | HP RX2660 Itanium 2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3GHz IB SLES 10 | SGI Altix Itanium 2 DC 1.6GHz ProPack 5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
| 1 | - | - | - | - | - | 1966.4 | 3884.0 | 2043.6 |
| 2 | - | - | - | - | - | 987.5 | 2000.4 | 1062.4 |
| 4 | - | - | - | - | - | 500.5 | 1054.5 | 620.7 |
| 8 | 424.9 | 310.0 | 306.4 | 258.4 | 490.7 | 258.4 | 526.7 | 316.0 |
| 16 | 216.0 | 165.4 | - | 160.1 | 253.9 | 164.5 | 272.1 | 174.4 |
| 32 | 112.8 | 82.3 | 84.4 | 83.3 | 129.3 | - | 139.4 | 90.3 |
| 64 | 61.5 | 43.8 | 43.8 | 43.2 | 68 | - | 75.6 | 48.7 |
| 96 | 45.2 | 32.3 | - | - | - | - | - | - |
| 128 | 36.8 | - | - | 24.4 | - | - | - | 32.8 |
"-" no result published
The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.
Real-world CFD engineering models are typically very large and are best analyzed
with many cores to achieve reasonable turnaround times. PowerFLOW scales very well
on these large models, often linearly up to 64 and even 128 cores.
The two test cases are models of the same analysis, flow over a car body,
at different sizes (different mesh refinement). Both models are rather large
and scale very well up to and even beyond 64 cores.
Case #1
Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
Case #2
Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).
It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
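A short Python sketch of the fine-equivalent weighting described above. The text only states that a second-finest voxel costs half as much as a finest one; assuming that each further level of coarsening halves the cost again is an extrapolation made for this example, and the voxel counts are hypothetical rather than taken from either benchmark case.

```python
# Fine-equivalent voxel count: weight each voxel by how often it is processed
# relative to a finest-scale voxel. Assumption for this sketch: every coarser
# scale is processed half as often as the scale below it (cost factor 0.5**level).

from typing import Sequence


def fine_equivalent(voxels_per_scale: Sequence[float]) -> float:
    """voxels_per_scale[0] is the finest scale, [1] the next coarser, and so on."""
    return sum(count * 0.5 ** level for level, count in enumerate(voxels_per_scale))


# Hypothetical distribution in millions of voxels (not from the benchmark cases).
example = [4.0, 4.0, 8.0]

if __name__ == "__main__":
    print(f"{sum(example):.1f}M voxels -> {fine_equivalent(example):.1f}M fine-equivalent")
```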
The two test cases in the suite require 6 to 8 GB of memory when run with only
one core on a single node. This per-node memory requirement is reduced when running in DMP
(distributed memory parallel) cluster mode on multiple nodes.
Performance when running PowerFLOW in a multi-node configuration is significantly
enhanced when using high-performance interconnects such as InfiniBand.
Disclosure Statement:
Exa Corporation Copyright
All information on the EXA website is Copyright 1996-2007 by Exa Corporation.
PowerFLOW is a registered trademark of EXA Corporation.
Results from
http://www.exa.com/user_center/index.html as of 07/02/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 x 2-socket Sun Blade X6250
2 x 3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
InfiniBand (Voltaire) interconnects (PCI-Express HCAs)
Software Configuration:
Linux 64-bit SUSE SLES 10
EXA PowerFLOW V3.6c & V4.c
EXA PowerFLOW Benchmark Test Suite
Voltaire GridStack 4.1.5-7 for SLES 10
Friday Jul 13, 2007
The Sun SPARC Enterprise M8000 has topped the performance of the brand-new
4.7GHz POWER6-based IBM p570. The Sun Studio 12 compilers, Solaris 10, and
Sun Performance Library played a key role in obtaining this performance.
The Sun SPARC Enterprise M8000 outperforms the best published POWER6-based
system from IBM, the p570, by over 12% on the Linpack
benchmark (Highly Parallel Computing). As a reminder, IBM cores cost a lot more than
any other vendor's, so you can't just look at perf/core. Compare systems of similar
pricing and configuration.
The Sun SPARC Enterprise M8000 tops the HP Itanium 2 rx8640
system by 40% on the Linpack HPC benchmark.
The Sun SPARC Enterprise M8000, using Sun Studio 12,
delivered a score of 268.6 GFLOPS on the Linpack HPC benchmark.
Funny: I read an IBM blog that said all was quiet for them in benchmarks.
Sun decided to keep working during the summer, and I can barely keep
up with my regular job, because this blogging hobby is keeping me busy:
so many of my friends in the benchmarking group are producing so
many great results on Sun systems!
LINPACK HPC Performance Chart - GFLOPS (bigger is better)
| System | GFLOPS Total | GFLOPS Peak | Parallelism | Chips, Cores | Processor Type | GHz |
| Sun SPARC Enterprise M9000 | 1032.0 | 1228.8 | 128 | 64, 128 | SPARC64 VI | 2.4 |
| Sun SPARC Enterprise M8000 | 268.6 | 307.2 | 32 | 16, 32 | SPARC64 VI | 2.4 |
| Sun SPARC Enterprise M8000 | 255.3 | 291.84 | 32 | 16, 32 | SPARC64 VI | 2.28 |
| IBM p570 | 239.4 | 300.8 | 16 | 8, 16 | POWER6 | 4.7 |
| HP rx8640 | 192.4 | 204.8 | 32 | 16, 32 | Itanium 2 | 1.6 |
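Every Peak entry in this chart equals cores x GHz x 4, i.e. it is consistent with four double-precision floating-point operations per cycle per core for each system listed; that factor is inferred from the table rather than from vendor documentation. This small Python sketch recomputes Rpeak and the Linpack efficiency (achieved / peak GFLOPS) for each row; efficiencies range from about 80% (IBM p570) to about 94% (HP rx8640).

```python
# Linpack HPC chart: recompute Rpeak and efficiency (Rmax / Rpeak).
# FLOPS_PER_CYCLE = 4 is inferred from the chart itself (every Peak entry equals
# cores * GHz * 4); it is not taken from vendor documentation.

FLOPS_PER_CYCLE = 4

systems = [
    # (system, Rmax GFLOPS, Peak GFLOPS as listed, cores, GHz)
    ("Sun SPARC Enterprise M9000", 1032.0, 1228.8, 128, 2.4),
    ("Sun SPARC Enterprise M8000", 268.6, 307.2, 32, 2.4),
    ("Sun SPARC Enterprise M8000", 255.3, 291.84, 32, 2.28),
    ("IBM p570", 239.4, 300.8, 16, 4.7),
    ("HP rx8640", 192.4, 204.8, 32, 1.6),
]


def rpeak(cores: int, ghz: float) -> float:
    """Theoretical peak GFLOPS: cores x clock (GHz) x flops per cycle."""
    return cores * ghz * FLOPS_PER_CYCLE


# (system, GHz, efficiency as a fraction of peak)
efficiency = [(name, ghz, rmax / rpeak(cores, ghz))
              for name, rmax, _listed, cores, ghz in systems]

if __name__ == "__main__":
    for name, ghz, eff in efficiency:
        print(f"{name:28s} {ghz:>5} GHz  efficiency {eff:.1%}")
```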
Benchmark Description
The Linpack benchmark suite measures the performance of factoring
and solving a dense system of linear equations in double-precision
floating point.
The Linpack HPC benchmark allows the solution of a matrix of any size
with a single right-hand side. It was developed to allow vendors
to show off their hardware. Because big problems allow performance to
approach peak, the benchmark is seen as an upper bound on the
potential performance of a machine. The run rules are much more
flexible than for the fixed-size Linpack 100 and Linpack 1000 benchmarks:
the solution technique must use a pivoting scheme, and the
driver must follow the spirit of the Linpack 1000 or Linpack 100
benchmarks.
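To make "factoring and solving a dense system with a pivoting scheme" concrete, here is a minimal Python sketch of Gaussian elimination with partial pivoting, plus the conversion from run time to GFLOPS using the conventional Linpack operation count of 2/3·n³ + 2·n² (implementations differ slightly in the lower-order term). It illustrates the technique only; Linpack HPC submissions use tuned, blocked, parallel libraries, and the function names here are hypothetical.

```python
# Gaussian elimination with partial pivoting on a dense system Ax = b, and the
# Linpack-style GFLOPS figure for a given matrix size and run time.

from typing import List


def solve_partial_pivoting(a: List[List[float]], b: List[float]) -> List[float]:
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy [A | b]
    for k in range(n):
        # Pivot: bring the row with the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def linpack_gflops(n: int, seconds: float) -> float:
    """GFLOPS for solving an n x n system, using 2/3*n^3 + 2*n^2 operations."""
    ops = 2.0 / 3.0 * n ** 3 + 2.0 * n ** 2
    return ops / seconds / 1e9


if __name__ == "__main__":
    A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
    rhs = [8.0, -11.0, -3.0]
    print(solve_partial_pivoting(A, rhs))  # approximately [2.0, 3.0, -1.0]
    print(f"{linpack_gflops(1000, 0.01):.2f} GFLOPS")
```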
Disclosure Statement:
Linpack HPC, results from http://www.netlib.org/benchmark/index.html
as of 07/13/07. Sun SPARC Enterprise M8000 (SPARC64 VI @2.4GHz, 16 chips,
32 cores), 268.6 GFLOPS. IBM p570 (POWER6 4.7GHz, 8 chips, 16 cores)
239.4 GFLOPS. HP rx8640 (Itanium 2 1.6GHz/24MB, 16 chips,
32 cores), 192.4 GFLOPS. Linpack Benchmark Performance Report
Results Summary
| Published Results |
| Performance | 268.6 GFLOPS |
| System | Sun SPARC Enterprise M8000, 256GB |
| Total Number of Processors | 16 |
| Processor/GHz of Server | SPARC64 VI, 2.4 GHz |
| Operating System | Solaris 10 |
| Compiler | Sun Studio 12 |
Wednesday Jul 11, 2007
Hot off the wire: AT&T Selects Sun Microsystems Servers for Next-Generation Video Services.
Sun's servers and storage will be deployed in IP-video super hub offices & IP-video hub offices in the AT&T U-verse network.
Read more at:
http://money.cnn.com/news/newsfeeds/articles/prnewswire/AQW09811072007-1.htm
...or at:
http://biz.yahoo.com/prnews/070711/aqw098.html?.v=12
Wednesday Jul 11, 2007
The Sun Blade X6250 posted a World Record on the ABAQUS Explicit benchmark
test suite for the MCAE application ABAQUS V6.6.
The Sun Blade X6250 used 3GHz dual-core (DC) Xeon 5160 processors. On the various
test cases, Sun beats the Intel Supermicro by 1% to 39%!!
Even when all of the test cases are averaged, the Sun Blade X6250
beats the Intel Supermicro by 4% to 9% (geometric mean of all 6 test cases at each cpu level listed).
Both machines have 2 sockets and dual-core processors.
Runs were made at 1, 2, and 4 cores, and a geometric mean was computed
at each of these "cpu" levels from the 6 test cases in the benchmark test suite.
The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors,
running 64-bit SUSE Linux SLES 10, beats the platforms in the table below
with results posted at the ABAQUS website,
for all 6 test cases in the ABAQUS "Explicit" benchmark test suite
and at all 3 "cpu" levels (1, 2, and 4 "cpus").
About The ABAQUS Explicit Module
This module, designed for crash and high-velocity impact analyses
(including wave propagation and inertia effects), is very scalable,
and its analysis models tend to be very large, similar to CFD models.
Timely results for these typically large jobs are best obtained using multiple
processing units, either on a single multi-core server in SMP mode or on
a multi-node cluster of multi-core platforms interconnected in DMP mode.
Consequently, this module is meant to run primarily in a multi-cpu configuration.
ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape
(time in seconds, smaller is better; Sun % faster, bigger is better)
| Platform | Cores | e1 | e2 | e3 | e4 | e5 | e6 | Geometric Mean |
| Sun Blade X6250/5160 | 4 | 10451 | 4509 | 3853 | 1887 | 1990 | 5202 | |
| Intel Super/5160's/RH4 | 4 | 10696 | 4646 | 3881 | 1997 | 2126 | 5460 | |
| Sun % Faster | | 2% | 3% | 1% | 6% | 7% | 5% | 4% |
| Sun Blade X6250/5160 | 2 | 14232 | 7401 | 5477 | 2935 | 3327 | 7582 | |
| Intel Super/5160's/RH4 | 2 | 14878 | 8044 | 6316 | 3310 | 3483 | 8048 | |
| Sun % Faster | | 5% | 9% | 15% | 13% | 5% | 6% | 9% |
| Sun Blade X6250/5160 | 1 | 24800 | 14198 | 10174 | 5147 | 6112 | 9553 | |
| Intel Super/5160 | 1 | 25076 | 14616 | 10563 | 5225 | 6272 | 13242 | |
| Sun % Faster | | 1% | 3% | 4% | 1% | 3% | 39% | 8% |
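The "Geometric Mean" column can be recomputed from the run times. The Python sketch below takes, for each core count, the ratio Intel time / Sun time for each of the six tests and then the geometric mean of those ratios (function names are illustrative). It gives about 3.9%, 8.6%, and 7.7% at 4, 2, and 1 cores, which round to the 4%, 9%, and 8% in the table.

```python
# Geometric-mean comparison for the ABAQUS/Explicit table. Times are in seconds
# (smaller is better); a ratio intel_time / sun_time > 1 means Sun was faster.

import math

SUN = {
    4: [10451, 4509, 3853, 1887, 1990, 5202],
    2: [14232, 7401, 5477, 2935, 3327, 7582],
    1: [24800, 14198, 10174, 5147, 6112, 9553],
}
INTEL = {
    4: [10696, 4646, 3881, 1997, 2126, 5460],
    2: [14878, 8044, 6316, 3310, 3483, 8048],
    1: [25076, 14616, 10563, 5225, 6272, 13242],
}


def geometric_mean(values):
    """Geometric mean computed via the mean of logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def sun_pct_faster(cores: int) -> float:
    """Geometric-mean percentage by which Sun's times beat Intel's at a core count."""
    ratios = [intel / sun for sun, intel in zip(SUN[cores], INTEL[cores])]
    return (geometric_mean(ratios) - 1.0) * 100.0


if __name__ == "__main__":
    for cores in (4, 2, 1):
        print(f"{cores} core(s): Sun {sun_pct_faster(cores):.1f}% faster (geometric mean)")
```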
Abaqus/Explicit Benchmark Problems
The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The number of increments listed below is approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.
E1: Car crash
This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain and the car is given an initial velocity of 25 mph.
E1
Increments: 62,934
Number of elements: 274,632
E2: Cell phone drop
This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.
E2
Increments: 87,369
Number of elements: 45,785
Memory requirement: 300 MB
E3: Sheet forming
This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.
E3
Increments: 31,177
Number of elements: 34,540 (deformable only)
Memory requirement: 550 MB
E4: Projectile penetration
This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.
E4
Increments: 12,433
Number of elements: 237,100
Memory requirement: 1400 MB
E5: Blast loaded plate
This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.
E5
Increments: 81,716
Number of elements: 50,000
Memory requirement: 150 MB
E6: Concentric spheres
This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.
E6
Increments: 23,291
Number of elements: 244,124
Memory requirement: 1000 MB
ABAQUS "Standard" & "Explicit" Benchmark Test Suites
Voltaire GridStack 4.1.5-7 for SLES 10
Disclosure Statement:
The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus,
Abaqus/Standard, Abaqus/Explicit.
All information on the ABAQUS website is Copyright 2004-2007 by Dassault Systèmes.
Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 x 2-socket Sun Blade X6250s
2 x 3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
InfiniBand (Voltaire) interconnects (PCI-Express HCAs)
Software Configuration:
Linux: 64-bit SUSE SLES 10
ABAQUS V6.6-3
Tuesday Jul 10, 2007
The Sun Blade X6250 outperforms all posted ANSYS V11.0 (MCAE) results at the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000XAL (same 3GHz Xeon 5160), by as much as 39%, at
each of the three "cpu" levels tested (1, 2, and all 4 cores available
on both 2-socket platforms equipped with dual-core processors).
At each of these processor configurations, Sun wins in 6 of
the 7 cases in the benchmark test suite. Overall, on the geometric mean,
Sun was about 10% higher.
The only case where the Sun X6250 loses, bm-2, has an exceptionally high I/O component, and even there Sun was only 3-4% slower.
The Sun X6250 had 10K rpm internal disk drives, whereas
the Intel S5000XAL had 15K rpm drives.
The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest)
and under 64-bit Linux SuSE SLES 10 beats all of the following
platforms with results posted at the ANSYS website
for all 7 test cases in the ANSYS "Standard" benchmark test suite (1-, 2- & 4-cpu).
Yes, this result was run with Linux; Sun wants to show that we can
win with every OS. There is now an officially certified, supported, and maintained Solaris build of ANSYS V11.0
for x86-64 platform architectures, compiled with the recent Sun Studio 11 compilers.
This is the first SX64 version that has become available.
Competitive Landscape
ANSYS V11.0 "Standard" Benchmark Test Suite on Sun Blade X6250 and Intel S5000XAL
(run times in seconds, smaller is better; for %, bigger is better)
| System | Cores | bm-1 | bm-2 | bm-3 | bm-4 | bm-5 | bm-6 | bm-7 |
| Sun X6250/5160 | 4 | 100 | 1362 | 343 | 164 | 181 | 131 | 752 |
| Intel S5000XAL/5160 | 4 | 109 | 1312 | 369 | 169 | 187 | 161 | 1048 |
| Sun % better | | 9% | -4% | 8% | 3% | 3% | 23% | 39% |
| Sun X6250/5160 | 2 | 118 | 1398 | 385 | 183 | 223 | 169 | 1064 |
| Intel S5000XAL/5160 | 2 | 128 | 1356 | 417 | 186 | 244 | 211 | 1437 |
| Sun % better | | 9% | -3% | 8% | 2% | 9% | 25% | 35% |
| Sun X6250/5160 | 1 | 150 | 1455 | 456 | 211 | 339 | 253 | 1770 |
| Intel S5000XAL/5160 | 1 | 164 | 1416 | 489 | 215 | 340 | 314 | 2330 |
| Sun % better | | 9% | -3% | 7% | 2% | 1% | 24% | 32% |
(Please note: per-core performance isn't the right metric for comparing different CPUs, as system costs vary greatly; core counts are used here only to identify the configuration.
It is "SYSTEM" performance, not "core" performance, that matters!)
Key Technical Points
- The test cases from the ANSYS standard benchmark test suite all have a substantial I/O
component: 15% to 20% of the total run time is associated with I/O activity
(primarily scratch files).
Performance will be enhanced by using the fastest available drives and striping
together more than one of them, or by using a high-performance disk storage system with high-performance interconnects. When running the SX64 build, ZFS might be a good choice.
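A rough way to reason about how much faster storage can help is Amdahl's law applied to the I/O share of the run. The Python sketch below assumes, purely for illustration, that only the I/O portion speeds up, by a uniform factor; with I/O at 15% to 20% of run time, even infinitely fast storage caps the overall gain at 1 / (1 - f), about 18% to 25%. The I/O speedup factors used are hypothetical.

```python
# Amdahl's-law estimate: if a fraction f of the run time is I/O and only that
# part becomes s times faster, the whole run becomes 1 / ((1 - f) + f / s) faster.

import math


def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Whole-run speedup when only the I/O share gets io_speedup times faster."""
    if not 0.0 <= io_fraction <= 1.0 or io_speedup <= 0.0:
        raise ValueError("need 0 <= io_fraction <= 1 and io_speedup > 0")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    for f in (0.15, 0.20):                    # I/O share quoted for the ANSYS cases
        for s in (1.5, 2.0, math.inf):        # hypothetical I/O speedup factors
            label = "unbounded" if math.isinf(s) else f"{s}x"
            print(f"I/O share {f:.0%}, I/O speedup {label}: "
                  f"{overall_speedup(f, s):.3f}x overall")
```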
ANSYS 11.0 Standard Test Cases
bm-1
Name: Exhaust Elbow Manifold
Description: Static structural analysis. Solved for equivalent stresses.
Statistics: ~850,000 DOF Model
bm-2
Name: Floor Panel
Description: Surface body geometry. Harmonic analysis with mode superposition.
Statistics: ~765,000 DOF Model
bm-3
Name: Engine Assembly - Piston and Crank
Description: Assembly with contact. Nonlinear structural DOF solution.
Statistics: ~250,000 DOF Model
bm-4
Name: Electric Motor
Description: Electromagnetic analysis. Solved for magnetic field intensities.
Statistics: ~250,000 DOF Model
bm-5
Name: Brake Rotor
Description: Thermal transient analysis. Solved for temperature DOFs.
Statistics: ~230,000 DOF Model
bm-6
Name: Wing Section
Description: Static structural analysis.
Statistics: ~250,000 DOF Model
bm-7
Name: Wing Section
Description: Static structural analysis.
Statistics: ~800,000 DOF Model
Notes: bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. Bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 GB of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. Bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 GB of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less than optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.
Disclosure Statement:
The following are trademarks or registered trademarks of ANSYS, Inc. : ANSYS Multiphysics TM All information on the ANSYS website is Copyrighted 2007 by ANSYS, Inc. Results at
http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.
Hardware Configuration:
Sun Blade X6250
4 x 2-socket Sun Blade X6250s
2 x 3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
32 GB memory
Software Configuration:
64-bit Linux SuSE SLES 10
(Note: Sun works great with Linux; that is why we show all kinds of benchmarks!)
ANSYS V11.0
ANSYS 11 "Standard" Benchmark Test Suite
Wednesday Jun 20, 2007
The Sun Fire V890 with 8x 2.1 GHz UltraSPARC IV+ obtained a result of
244846 SPECjbb2005 bops, 30606 SPECjbb2005/JVM on the server-side Java benchmark.
The Sun Fire V890 is 40% faster than the expensive 4-core IBM p570 (4.7 GHz POWER6). I'll leave it to the reader to check the system prices and see that "per-core performance" comparisons are meaningless at best and, in fact, disingenuous, because they make you compare systems that cost you very different amounts. You'll spend more for IBM.
At the high end, the 16-core IBM p570 ($Megabucks) is only 2.8 times faster than the 16-core Sun Fire V890 at 2.1 GHz. Again, check the prices of the servers to really understand what you are getting.
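Both ratios above follow directly from the published bops figures. A minimal Python sketch of the arithmetic, using only the values disclosed in this post (the constant and function names are illustrative, not part of any SPEC tool):

```python
# Published SPECjbb2005 bops values quoted in this post (www.spec.org, 6/20/2007).
SUN_V890_2_1GHZ = 244_846   # Sun Fire V890, 8 chips / 16 cores, 2.1 GHz US-IV+
IBM_P570_4CORE = 175_474    # IBM p570, 2 chips / 4 cores, 4.7 GHz POWER6
IBM_P570_16CORE = 691_975   # IBM p570, 8 chips / 16 cores, 4.7 GHz POWER6


def speedup(a: float, b: float) -> float:
    """How many times faster result a is than result b (bigger is better)."""
    return a / b


def percent_faster(a: float, b: float) -> float:
    """How much faster a is than b, expressed as a percentage."""
    return (a / b - 1.0) * 100.0


v890_vs_4core = percent_faster(SUN_V890_2_1GHZ, IBM_P570_4CORE)
p570_16core_vs_v890 = speedup(IBM_P570_16CORE, SUN_V890_2_1GHZ)

print(f"V890 vs 4-core p570: {v890_vs_4core:.1f}% faster")
print(f"16-core p570 vs V890: {p570_16core_vs_v890:.2f}x")
```

The first figure prints as 39.5%, which rounds to the 40% quoted above; the second prints as 2.83x.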
Postscript: IBM will drone on every time about performance/core, or try to equate systems on a per-core basis. The reality is that IBM's cores cost so much more than anyone else's cores. Get very suspicious any time you see IBM saying things like "8-core IBM system vs. 8-core Sun system" or "perf/core IBM wins". Price out one of these comparisons and you WILL BE TRULY AMAZED at the smoke & mirrors they are using. Also note how often they quote performance on a 16-core system but do the price comparison on an 8-core system, with lots of memory on the Sun system and a sparse config on the IBM system.
If you want to see a chip-to-chip comparison, see
BM Seer's posting of the UltraSPARC T1 result. Now that box rocks.
SPECjbb2005 Performance Chart (ordered by performance)
bops: SPECjbb2005 Business Operations per Second (bigger is better)
| System | Date | Processors (Chips, Cores, Threads) | Processors (GHz / Type) | SPECjbb2005 bops | JVMs | SPECjbb2005 bops/JVM |
|---|---|---|---|---|---|---|
| IBM p570 POWER6 | 6/07 | 8, 16, 32 | 4.7 POWER6 | 691,975 | 8 | 86,497 |
| Sun Fire V890 | 6/07 | 8, 16, 16 | 2.1 US-IV+ | 244,846 | 8 | 30,606 |
| IBM p570 POWER6 | 6/07 | 2, 4, 8 | 4.7 POWER6 | 175,474 | 2 | 87,737 |
| Sun Fire V890 | 10/05 | 8, 16, 16 | 1.5 US-IV+ | 117,986 | 4 | 29,497 |
Benchmark Description
SPECjbb2005 (Java Business Benchmark) measures the performance of a Java-implemented application tier (server-side Java). The benchmark is based on order processing in a wholesale supplier application. The performance of the user tier and the database tier is not measured in this test. The metrics given are SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
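As a sanity check on the chart, bops/JVM is total bops divided by the number of JVM instances. The sketch below recomputes it from the chart values. The half-up rounding is an assumption chosen because it reproduces the published figures; it is not taken from SPEC's run rules, and the names are illustrative.

```python
# SPECjbb2005 chart values: (system, bops, JVM instances, published bops/JVM).
RESULTS = [
    ("IBM p570 POWER6, 16 cores", 691_975, 8, 86_497),
    ("Sun Fire V890 2.1 GHz, 16 cores", 244_846, 8, 30_606),
    ("IBM p570 POWER6, 4 cores", 175_474, 2, 87_737),
    ("Sun Fire V890 1.5 GHz, 16 cores", 117_986, 4, 29_497),
]


def bops_per_jvm(bops: int, jvms: int) -> int:
    """Average bops per JVM instance, rounded half up using integer arithmetic."""
    return (2 * bops + jvms) // (2 * jvms)


for name, bops, jvms, published in RESULTS:
    computed = bops_per_jvm(bops, jvms)
    status = "matches" if computed == published else "differs from"
    print(f"{name}: {computed:,} bops/JVM ({status} published {published:,})")
```

Integer arithmetic is used instead of Python's built-in `round()`, which rounds halves to even and would turn 29,496.5 into 29,496 rather than the published 29,497.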
Disclosure Statement:
SPECjbb2005
Sun Fire V890 (8 chips, 16 cores, 16 threads, 2.1 GHz) 244846 SPECjbb2005 bops, 30606 SPECjbb2005 bops/JVM;
Sun Fire V890 (8 chips, 16 cores, 16 threads, 1.5 GHz) 117986 SPECjbb2005 bops, 29497 SPECjbb2005 bops/JVM;
IBM System p 570 (4.7 GHz) running AIX 5L V5.3, 175,474 SPECjbb2005 bops, 87,737 SPECjbb2005 bops/JVM, 2 chips, 4 cores, 8 threads;
IBM System p 570 (4.7 GHz) running AIX 5L V5.3, 691,975 SPECjbb2005 bops, 86,497 SPECjbb2005 bops/JVM, 8 chips, 16 cores, 32 threads.
SPEC, SPECjbb reg tm of Standard Performance Evaluation
Corporation. Results as of 06/20/2007 on www.spec.org.
Results Summary
| Results | |
|---|---|
| Sun Fire V890 | 244846 SPECjbb2005 bops, 30606 SPECjbb2005 bops/JVM |
| Reference Date | June 20, 2007 |
| System | Sun Fire V890, 64 GB |
| Total Number of Processors | 8 |
| Processor/GHz of Server | US-IV+ 2.1 GHz |
| Operating System | Solaris 10 6/06 |
| JVM | Java HotSpot(TM) 32-Bit Server, Version 1.6.0_02 |
See Also
SPECjbb2005 Benchmark Reports
IBM Consolidation Press Release
Wednesday Jun 13, 2007
InfoWorld has published a very positive review of the Sun Fire X4500 server. The combination of the X4500, Solaris 10 and Sun's ZFS scored a great 8.8 rating with an "Excellent" recommendation.
Paul Venezia, the author and reviewer, started with a description of the X4500 (code name "Thumper"), highlighting its design and unprecedented hard drive capacity.
He evaluated the X4500 running Solaris, noting that other OSes simply did not have the file system capabilities to take full advantage of the X4500's huge number of drives.
Paul writes: "ZFS and the X4500 go hand in hand, seemingly created for each other in a love story rivaling anything that’s come out of Hollywood in the past 10 years."
"Thumper is aptly named and is a truly unique product from a company that seems to be pulling away from a faltering reputation in the server market. Recent studies have shown that within a few short years, the world will generate more data than it can store. It would seem that Sun is doing its part to bridge that gap."
Read all about it at the full link:
www.infoworld.com/infoworld/article/07/06/07/23TCthumper_1.html
Wednesday Jun 13, 2007
Sun Blade X6250 delivers a pair of x86 SPEC CPU2006 integer performance world records:
Sun Blade X6250 (Dual-Core Intel Xeon 5160), running Solaris 10 and using the Sun Studio 12 compiler, delivered the best x86 result for the SPECint2006 benchmark.
Sun Blade X6250 (Dual-Core Intel Xeon 5160), using Solaris 10 and Sun Studio 12, delivered the x86 4-core world record on SPECint_rate2006.
The Sun Blade X6250 server had a SPECint2006 result of 21.0 and a SPECint_rate2006 result of 65.0. The advanced features of the freely available Sun Studio 12 compiler were critical for getting this level of performance on the Sun Blade X6250.
The Sun Blade X6250 is only 3% slower than the peak SPECint2006 score of the very expensive new IBM POWER6 p570, which was recently announced. SPECint2006 is a single job stream, so let's now turn to comparing 4-thread results: in this case the Sun Blade X6250 is 7% faster than the peak SPECint_rate2006 score of the very expensive new IBM POWER6 p570 (both IBM and Sun at 4 threads). Oh, and remember that clock rate is no longer how you compare systems: the Sun Blade X6250 runs at 3 GHz and the IBM POWER6 at 4.7 GHz. CPU frequency is basically irrelevant; it is CPU and system architecture that matter!
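The 3% and 7% figures are simple relative differences between peak scores. A short sketch of that arithmetic, using the scores listed in this post (the names are illustrative):

```python
# Peak scores quoted in this post (www.spec.org / IBM public websites, 6/06/07).
X6250_SPECINT = 21.0
P570_SPECINT = 21.6
X6250_SPECINT_RATE = 65.0  # 4 copies
P570_SPECINT_RATE = 60.9   # 4 threads on 1 chip / 2 cores


def percent_diff(a: float, b: float) -> float:
    """Signed difference of a relative to b, in percent (negative means a is slower)."""
    return (a / b - 1.0) * 100.0


speed = percent_diff(X6250_SPECINT, P570_SPECINT)
rate = percent_diff(X6250_SPECINT_RATE, P570_SPECINT_RATE)

print(f"SPECint2006 peak: X6250 is {abs(speed):.1f}% slower")
print(f"SPECint_rate2006 peak: X6250 is {rate:.1f}% faster")
```

These print 2.8% slower and 6.7% faster, which round to the 3% and 7% stated above.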
SPEC CPU2006 Landscape - bigger is better, selected recent results
SPECint2006
| System | Processor Type | GHz | Chips | Cores | Peak | Base |
|---|---|---|---|---|---|---|
| IBM p570 (POWER6) | POWER6 | 4.7 | 1 | 1 | 21.6 | 17.8 |
| Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 21.0 | |
| Supermicro X7DB8+ board | Intel Xeon 5160 | 3.0 | 2 | 4 | 20.8 | 18.9 |
| Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 16.1 | |
SPECint_rate2006
| System | Processor Type | GHz | Chips | Cores | Threads / Copies | Peak | Base |
|---|---|---|---|---|---|---|---|
| Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 65.0 | |
| Supermicro X7DB8+ | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 64.9 | 60.0 |
| IBM p570 (POWER6) | POWER6 | 4.7 | 1 | 2 | 4 | 60.9 | 53.2 |
| Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 4 | 60.4 | |
| Fujitsu BX620 S3 | Xeon 5160 (Woodcrest) | 3.0 | 2 | 4 | 4 | 59.4 | 56.7 |
Results as of 06 Jun 2007 from www.spec.org.
Benchmark Description
SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and
CINT2006. CFP2006 targets floating-point performance, while CINT2006
targets integer performance.
Each suite has two different measures. First is the CPU (speed) measure, which is the performance on the suite as a single stream. This can be either a single thread or an automatically parallelized (compiler-generated) run. This measure is further divided into base and optimized (peak) runs. Base uses the same compiler flags for all kernels, whereas optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.
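To make "compared against a baseline system" concrete: each benchmark's ratio is the reference machine's run time divided by the measured run time, and the overall score is the geometric mean of those ratios. The sketch below assumes that scheme and uses made-up timings for three hypothetical kernels (`kernel_a` and so on); it is an illustration, not SPEC's tooling.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Ratio for one benchmark: reference-machine time divided by measured time."""
    return reference_seconds / measured_seconds


def overall_score(ratios: list[float]) -> float:
    """Geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference_seconds, measured_seconds) pairs for three made-up kernels;
# a real CINT2006 result covers 12 benchmarks.
runs = {
    "kernel_a": (9000.0, 450.0),
    "kernel_b": (10000.0, 500.0),
    "kernel_c": (8000.0, 200.0),
}
ratios = [spec_ratio(ref, t) for ref, t in runs.values()]
print(f"overall score: {overall_score(ratios):.1f}")
```

A geometric mean keeps one unusually fast kernel (here the 40x `kernel_c`) from dominating the score the way an arithmetic mean would.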
The second measure is Rate. It measures throughput: how many copies of the CPU measure can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix runs on a system under load. It is also reported as base and optimized (peak) results.
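A minimal sketch of the Rate measure, assuming the CPU2006 convention that each benchmark's ratio is copies x reference time / elapsed time, with the overall score again a geometric mean. The copy count and timings below are hypothetical.

```python
import math


def rate_ratio(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Throughput ratio for one benchmark: copies x reference time / elapsed time."""
    return copies * reference_seconds / elapsed_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical run: 4 copies (one per core) of three made-up kernels,
# given as (reference_seconds, elapsed_seconds_for_all_copies).
COPIES = 4
timings = [(9000.0, 600.0), (10000.0, 800.0), (8000.0, 400.0)]
ratios = [rate_ratio(COPIES, ref, elapsed) for ref, elapsed in timings]
print(f"rate score: {geometric_mean(ratios):.1f}")
```

With `COPIES = 1` the ratio reduces to the single-stream ratio used for the speed measure.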
Disclosure Statement:
SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation.
Results from
www.spec.org or from IBM public websites as of 6/06/07.
Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 65.0 SPECint_rate2006;
Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 21.0 SPECint2006;
IBM System p 570 (POWER6, 1 chip/1 core, AIX 5L v5.3) 21.6 SPECint2006;
IBM System p 570 (POWER6, 4 threads, 1 chip/2 cores, AIX 5L v5.3) 60.9 SPECint_rate2006.
System Configuration
| Results | |
|---|---|
| Reference Date | Jun 06, 2007 |
| System | Sun Blade X6250 (SPEED: 16GB memory, 8x2GB; RATE: 32GB memory, 8x4GB) |
| X6250 | 21.0 SPECint2006 |
| X6250 | 65.0 SPECint_rate2006 |
| Total Number of Processors | 2 x Intel Xeon 5160 |
| Software | Solaris 10 11/06, Sun Studio 12 Compiler, MicroQuill's SmartHeap Library v7.4 |
See Also
All Benchmark results on Sun Blade 6000 Blade Server