Friday Jan 11, 2008
For the HPC crowd, the latest version of Sun's MPI library for Solaris (both x86 and SPARC of course) can now be
freely downloaded from the ClusterTools 7.1 download page.
Sun's ClusterTools 7.1 is based on Open MPI 1.2.4, another of the open efforts to which Sun actively contributes.
The release adds Intel support, improved parallel debugger support, PBS Pro validation, improved memory
usage for communication operations, and various updates. Sun's high-performance compiler (Sun Studio 12) is of course also supported.
Friday Oct 26, 2007
Sun gets two new World Records for scientific desktop performance
on the Sun Ultra 24 (single 3.0 GHz Intel DC Xeon E6850)
and the 64-bit SuSE Linux SLED 10 operating system.
Results obtained from the most current competitive platforms have
been recently posted for two different Mathematica 6 benchmarks:
- The Wolfram (Mathematica ISV) Benchmark
- The Independent (Mathematica MMA6.0.nb ) Benchmark
Although both Mathematica 6 benchmark test suites
contain 15 test cases, the test cases are different,
and the two test suites are separate and distinct from each other.
The Ultra 24 beats all results currently listed at both benchmark sites.
On the Wolfram (Mathematica ISV) benchmark, the Ultra 24
beats other current Intel Xeon (Woodcrest) dual-core platforms
(3.0 GHz & 2.66 GHz), Intel-based Apple Mac desktops, Itanium 2 platforms,
Pentium 4 platforms, and the IBM Power-based platforms.
On the independent Mathematica MMA6.0 notebook
benchmark, the Ultra 24 beats posted results from primarily
current competitive Apple Mac desktops:
MacPro, MacBook, iMac, and Apple Powerbook G4.
Results for both benchmark test suites are shown in the two tables
below under "Competitive Landscape".
Table 1. The Wolfram (Mathematica ISV) Benchmark
Summary results as in the installed Mathematica 6 Data Base.
This is the latest version of Mathematica timing tests.
Overall performance in 15 test calculations (Bigger is better)
The current reference is a machine with a 2.4 GHz Pentium 4 processor
| PLATFORM | Score |
| 1 socket DC 3GHz Intel Xeon DC E6850 SLED 10 SP 1 Ultra 24 | 3.266 |
| 2-socket DC 3GHz Intel Xeon 5160 MS 32-bit | 2.84 |
| 2-socket DC 3.2 GHz Opteron 2224 Ultra 40 M2 64-bit SLED 10 32 GB | 2.736 |
| 2-socket DC 3.2GHz Opteron 2224 Ultra 40 M2 32-bit Windows XP SP2 8GB | 2.45 |
| 2 socket DC 2.66 GHz Intel Xeon 64-bit Apple MAC 10.4.8 | 2.14 |
| 2 socket QC 1.6 GHz Intel Xeon 5310 32-bit Cent OS Linux | 1.88 |
| 2 socket DC 2.5 GHz G5 Apple MAC OS 10.4.8 32-bit | 1.22 |
| 1 socket 2.4 GHz Pent. 4 MS Win XP 32-bit | 1.00 |
The Independent (Mathematica MMA6.0.nb ) Benchmark
Summary results as listed at the independent Mathematica MMA6
http://smc.vnet.net/timings60.html website.
This is the latest version of the "Mathematica MMA" timing tests.
Overall performance in 15 test calculations (Bigger is better)
The current reference machine is one with a 2.33GHz Intel Core 2 Duo processor.
| PLATFORM | Score |
| Sun Ultra 24 3.0 GHz DC Intel E6850 8GB SuSE 10 SP 1 | 1.27505 |
| MacPro, 3.0GHz Intel Core2 Duo, 4GB, MacOS 10.4.9 [4] | 1.25404 |
| AMD Athlon 64 FX-74, 3.0GHz Socket F (1207 FX) DSDC, Windows [5] | 1.14464 |
| iMac, 2.33GHz Intel Core2 Duo, 3GB, MacOS 10.4.9 [2] | 1.00338 |
| MacBook Pro, 2.33GHz Intel Core2 Duo, 2 GB RAM, MacOS X 10.4.9 [1] | 1.00105 |
| MacBook, 2GHz Intel Core2 Duo, 2GB, MacOS 10.4.10 [6] | 0.880472 |
Benchmark Description
The Wolfram (Mathematica ISV) Benchmark
The Wolfram (Mathematica ISV) benchmark is a revised one
that now comes embedded in the latest release of Mathematica (currently V6.0)
along with a database of results from current hardware vendor platforms.
This benchmark was developed by Schoeller Porter, one of the principal
developers of Mathematica. He described the benchmark as follows:
This is the standard benchmark suite for Mathematica, initially introduced
in Mathematica 5.1 (as MathematicaMark2004). It includes both workstation
and parallel benchmarks. The parallel benchmark is automatically invoked
when the Parallel Computing Toolkit is loaded
and compute kernels are available.
It is actively developed, and MathematicaMark 6.0 is the current version.
The 15 Task benchmark includes:
Benchmark Name: MathematicaMark6
Full Version Number:6.0.1
Date: September 14, 2007
Benchmark Result: 3.266
Total Time 26.39
Results:
Data Fitting: 1.273
Digits of Pi: 0.488
Discrete Fourier Transform: 0.765
Eigenvalues of a Matrix: 2.059
Elementary Functions: 3.645
Gamma Function: 0.368
Large Integer Multiplication: 0.734
Matrix Arithmetic: 2.798
Matrix Multiplication: 3.062
Matrix Transpose: 1.298
Numerical Integration: 2.017
Polynomial Expansion: 1.352
Random Number Sort: 1.506
Singular Value Decomposition: 2.346
Solving a Linear System: 2.679
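As a quick sanity check on the report above, the per-test timings can be summed and compared with the reported total time. The sketch below only re-adds the numbers listed in this post; it does not attempt to reproduce how the composite score of 3.266 is derived, since that formula is not given here.

```python
# Per-test timings in seconds, copied from the MathematicaMark6 report above.
timings = {
    "Data Fitting": 1.273,
    "Digits of Pi": 0.488,
    "Discrete Fourier Transform": 0.765,
    "Eigenvalues of a Matrix": 2.059,
    "Elementary Functions": 3.645,
    "Gamma Function": 0.368,
    "Large Integer Multiplication": 0.734,
    "Matrix Arithmetic": 2.798,
    "Matrix Multiplication": 3.062,
    "Matrix Transpose": 1.298,
    "Numerical Integration": 2.017,
    "Polynomial Expansion": 1.352,
    "Random Number Sort": 1.506,
    "Singular Value Decomposition": 2.346,
    "Solving a Linear System": 2.679,
}

# Sum of the individual tests; the report lists "Total Time 26.39".
total = sum(timings.values())
print(f"{len(timings)} tests, total time {total:.2f} s")

# Which test dominates the total run time?
slowest = max(timings, key=timings.get)
share = timings[slowest] / total
print(f"slowest test: {slowest} ({timings[slowest]} s, {share:.1%} of total)")
```

The summed timings agree with the reported total, which suggests the list of 15 tests above is complete.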
The Independent (Mathematica MMA6.0.nb ) Benchmark
The Mathematica MMA 6 benchmark is a widely recognized benchmark.
The tasks are representative of important desktop scientific
computing activities.
This benchmark was developed by karl.unterkofler@fh-vorarlberg.ac.at
The benchmark consists of 15 tasks.
Disclosure Statement:
Mathematica MMA 6 Scientific Benchmark Sun Fire Ultra 24 score: 1.27505. Mathematica is a registered
trademark of Wolfram Research, Inc. Results as of 10/23/07 on http://smc.vnet.net/timings60.html
Results Summary
The Sun Ultra 24 workstation gives the best
desktop scientific computing performance as demonstrated with both
the Wolfram (Mathematica ISV) Benchmark and
the Independent (Mathematica MMA6.0.nb) Benchmark.
Both of these 15-task benchmarks consist of operations that are representative
of computing a variety of scientific functions.
| Reference Date | 23 October 2007 |
| |
| The Wolfram (Mathematica ISV) Benchmark |
| Platform | Sun Ultra 24 Workstation |
| Total Number Processors | 1 |
| Processor/GHz of Workstation | Intel DC E6850/3.0 GHz |
| Memory | 4x2 GB DDR2 667 MHz dimms |
| Operating System | 64-bit SUSE SLED 10 SP 1 |
| Graphics | nVidia Quadro FX 1700 framebuffer |
| Disks | 2x146 GB 15K rpm SAS striped |
| Software | Mathematica 6 (Scientific Application) Wolfram (ISV) Benchmark |
| Composite Score | 3.266 |
| |
| The Independent (Mathematica MMA6.0.nb ) Benchmark |
| Platform | Sun Ultra 24 Workstation |
| Total Number Processors | 1 |
| Processor/GHz of Workstation | Intel DC E6850/3.0 GHz |
| Memory | 4x2 GB DDR2 667 MHz dimms |
| Operating System | 64-bit SUSE SLED 10 SP 1 |
| Graphics | nVidia Quadro FX 1700 framebuffer |
| Disks | 2x146 GB 15K rpm SAS striped |
| Software | Mathematica 6 (Scientific Application) The Independent (Mathematica MMA6.0.nb) Benchmark |
| Composite Score | 1.27505 |
Friday Oct 26, 2007
I now see on our Sun internal sites that some of my fellow engineers didn't fully check
all of the submission dates for all the SPEC APC claims. I'll repost the results when these APC experts get this all straightened out. Until then I've removed the page so I don't violate any SPEC rules.
This sort of thing usually gets straightened out in a few days. As always, it was unintentional, but it happens in a busy world.
Wednesday Oct 24, 2007
The Sun Ultra 24 desktop sets a world record in the MCAD market.
The Ultra 24 beats competitive platforms from Dell, IBM, and HP.
The single-socket Ultra 24 can use either Intel dual-core or quad-core
processors. The Sun Ultra 24 demonstrates both excellent performance
and $/performance.
Pro/E is a leading MCAD software system. Most major MCAE ISV applications have
integration with Pro/E. Pro/E is used in a variety of different disciplines
such as automotive, aircraft, aerospace, marine, oil & gas, earth moving,
biomedical, heavy industry, atomic energy, etc.
Sun supports Pro/E on Opteron-based desktop platforms and Xeon-based platforms.
Pro/E users appreciate Solaris for its maturity, reliability, superb
maintenance, and comprehensive, well-developed network features. This is
a benefit for many engineering corporations that have distributed design.
The OCUS V5 benchmark has a 32-bit "Normal" benchmark
and a newer 64-bit "Large Memory" benchmark to show performance on larger new workloads.
The 32-bit "Normal" OCUS V5 benchmark and World Record Ultra 24 Performance
The Sun Ultra 24 (3GHz QC Intel QX6850 Xeon processor, 8GB memory,
an nVidia Quadro FX 5600 framebuffer, 2x15K SAS striped drives under 64-bit
Win 2003 SP 2 XP 64-Ed.) sets a new MCAD world record running
the Pro/E Wildfire OCUS V5 32-bit "Normal" benchmark,
beating all "legitimate" hardware vendors with results currently posted
at the OCUS V5 www.proesite.com benchmark website.
Reruns on the same Ultra 24 platform but with a 3GHz DC Intel
Xeon E6850 processor, also with the nVidia Quadro FX 5600, produced essentially
the same world record results as the initial runs with a
3GHz QC QX6850 processor.
Further reruns on the same Ultra 24 platform with a 3GHz DC Xeon
E6850 but with an nVidia Quadro FX 1700 produced essentially
the same world record results as the initial runs with a
3GHz QC QX6850 processor.
These results obtained with Pro/E WF 3 are better than any others posted
at the Pro/E Wildfire OCUS V5 "Normal" benchmark website by "legitimate"
hardware vendors.
The top competition comes from current Dell and HP desktop
platforms, both with the recent dual-core 3GHz Woodcrest 5160
Intel processors or the Intel Core2 Duo Extreme processors.
The 64-bit "Large Memory" OCUS V5 benchmark and World Record Ultra 24 Performance
- The Sun Ultra 24 with a 3GHz DC Intel Xeon E6850, 8GB memory,
an nVidia Quadro FX 1700 framebuffer, and 2x15K SAS striped drives under 64-bit
Win 2003 SP 2 XP 64-Ed. sets a new MCAD world record running
the Pro/E Wildfire OCUS V5 64-bit "Large Memory" benchmark.
- Reruns on the same Ultra 24, again with a 3GHz DC Intel Xeon E6850
but with the nVidia Quadro FX 5600 framebuffer instead
of the nVidia Quadro FX 1700, also produced essentially the same
world record results.
- Further reruns on the same Ultra 24 platform but now with a 3GHz QC Intel
QX6850 processor (same nVidia Quadro FX 5600 framebuffer)
produced essentially the same world record results as
the initial runs with a 3GHz DC E6850 Xeon and an nVidia Quadro
FX 1700 framebuffer.
- These results obtained with Pro/E WF 3 are better than any others posted
at the Pro/E Wildfire OCUS benchmark website.
The top competition comes from current Dell and HP desktop
platforms, both equipped with the recent dual-core 3GHz Woodcrest 5160
Intel processors or the Intel Core2 Duo Extreme processors.
PRO/E WILDFIRE MCAD OCUS V5 32-bit "NORMAL" BENCHMARK
Selected results are run times in seconds, smaller is better
Ultra 24 vs. Topmost Current Posted Competitive Result
| Platform | Processor | Total (s) | Graphics (s) | CPU (s) | Disk I/O (s) | OS |
| Ultra 24 | 1x3.0GHz QC Intel QX6850 | 1228 | 664 | 563 | 91 | Win 64 XP |
| Dell Prec 390 | 1x2.93GHz Intel Core2 X6800 | 1285 | 692 | 591 | 95 | Win 64 XP |
PRO/E WILDFIRE MCAD OCUS V5 64-bit "Large Memory" BENCHMARK
Selected results are run times in seconds, smaller is better
Ultra 24 vs. Topmost Current Posted Competitive Result
| Platform | Processor | Total (s) | Graphics (s) | CPU (s) | Disk I/O (s) | OS |
| Ultra 24 | 1x3.0GHz DC Intel E6850 | 2809 | 877 | 1926 | 352 | Win 64 XP |
| Dell Prec. 490 | 1x3.0GHz DC Intel 5160 | 3026 | 1094 | 1925 | 341 | Win 64 XP |
For results see OCUS website: http://www.proesite.com
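To put the two OCUS V5 comparisons into percentages, the run times quoted in this post can be compared component by component. This is only a sketch using the numbers listed here; the helper name `percent_faster` and the convention of dividing the competitor's time by the Ultra 24 time are choices made for this illustration, not part of the OCUS benchmark itself.

```python
# Run times in seconds, copied from the two comparison tables in this post.
results = {
    '32-bit "Normal"': {
        "Ultra 24":      {"Total": 1228, "Graphics": 664, "CPU": 563, "Disk I/O": 91},
        "Dell Prec 390": {"Total": 1285, "Graphics": 692, "CPU": 591, "Disk I/O": 95},
    },
    '64-bit "Large Memory"': {
        "Ultra 24":       {"Total": 2809, "Graphics": 877, "CPU": 1926, "Disk I/O": 352},
        "Dell Prec. 490": {"Total": 3026, "Graphics": 1094, "CPU": 1925, "Disk I/O": 341},
    },
}

def percent_faster(ours_s, theirs_s):
    """Speed advantage in percent: positive when our run time is shorter."""
    return (theirs_s / ours_s - 1.0) * 100.0

for benchmark, platforms in results.items():
    ultra = platforms["Ultra 24"]
    # The other entry in each table is the top competitive result.
    rival_name = next(name for name in platforms if name != "Ultra 24")
    rival = platforms[rival_name]
    print(f"{benchmark}: Ultra 24 vs. {rival_name}")
    for component in ultra:
        gain = percent_faster(ultra[component], rival[component])
        print(f"  {component:9s} {gain:+6.1f}%")
```

A negative value for a component means the competitor was faster on that part even though the Ultra 24 leads on total elapsed time.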
Results Summary
PRO/E WILDFIRE MCAD OCUS V5 32-bit "NORMAL" BENCHMARK
|
|
| Submitted Results | 32-bit "Normal" OCUS V5 Benchmark |
| Reference Date | 23 October 2007 |
| Platform | Sun Ultra 24 Workstation |
| Total Number Processors | 1 |
| Processor/GHz of Workstation | Intel QC QX6850/3.0 GHz |
| Memory | 4x2 GB DDR2 667MHz dimms |
| Operating System | Win 2003 SP 2 64 Ed. |
| Graphics | nVidia Quadro FX 5600 framebuffer |
| Disks | 2x146 GB 15K rpm SAS striped |
| Software | Pro/E Wildfire 3 (MCAD Application), OCUS V5 32-bit "Normal" Benchmark |
| Total Elapsed Time | 1228 seconds |
| Total CPU Time | 563 seconds |
| Total Graphics Time | 664 seconds |
| Total Disk I/O Time | 91 seconds |
|
PRO/E WILDFIRE MCAD OCUS V5 64-bit "Large Memory" BENCHMARK
|
|
| Submitted Results | 64-bit OCUS V5 Benchmark |
| Reference Date | 23 October 2007 |
| Platform | Sun Ultra 24 Workstation |
| Total Number Processors | 1 |
| Processor/GHz of Workstation | Intel DC E6850/3.0 GHz |
| Memory | 4x2 GB DDR2 667MHz dimms |
| Operating System | Win 2003 SP 2 64 Ed. |
| Graphics | nVidia Quadro FX 1700 framebuffer |
| Disks | 2x146 GB 15K rpm SAS striped |
| Software | Pro/E Wildfire 3 (MCAD Application), OCUS V5 64-bit "Large Memory" Benchmark |
| Total Elapsed Time | 2809 seconds |
| Total CPU Time | 1926 seconds |
| Total Graphics Time | 877 seconds |
| Total Disk I/O Time | 352 seconds |
|
Wednesday Aug 08, 2007
Why does Sun designate yesterday's performance results as "estimates"?
Why that word? Did some Sun marketeer just throw a dart and pick a big number? No. All
UltraSPARC T2 SPEC CPU and SPEC OMP metrics quoted are from full “reportable” runs,
but are nevertheless designated as “estimates” because they use
pre-production systems. Sun customer systems, to be announced later, are expected to perform similarly. SPEC rules do allow comparing
these preliminary scores with published results.
Is Sun the only vendor to use this clause? No. Intel and AMD have
a long history of using preliminary numbers at chip announcements to get
the word out about their performance. Sun is just following their lead,
and trumping their performance.
OK, back to why the word "estimates". The SPEC CPU committee voted
to use that specific word for preliminary scores. Members include
IBM, Intel, AMD, HP, .... And every employee of a member company must follow the rules.
By license agreement, SPEC members and customers agree to run and report results as specified in each benchmark suite's documentation.
from SPEC FAQ
Postings on Sun's UltraSPARC T2 performance:
http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
OpenSPARC T2:
http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available
Ubuntu (already booted on UltraSPARC T2):
Ubuntu & Canonical & UltraSPARC T1 (May06).
As a Sun employee I try my best to follow every rule when talking about results in public, but I'm an engineer, so sometimes it is hard to follow all the legalese. I try to correct things as soon as I see an error, and I do my best to remind other Sun bloggers to put in the proper disclosure statement for SPEC & TPC benchmark results. Though quite
honestly I wish SPEC & TPC would streamline the rules, make them more consistent, and minimize the lengthy disclosure statements.
Of course because Sun is in the lead and because I made some suggestions, I'm sure this entry will be fully scrutinized by every
competitor. If I made errors let me know in the comments and I will correct them.
Disclosure Statement
SPEC, SPECint, SPECfp, and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of August 6, 2007. Actually this one is short because I didn't put any
specific results in this posting; the ones at the links have the more extensive disclosures because they show scores & results.
Tuesday Aug 07, 2007
Beyond UltraSPARC T2, what other technologies matter? There are two more keys to Sun providing such effective performance in the
new single-chip Sun UltraSPARC T2 64-thread processor: Solaris (and
now of course OpenSolaris) and the Sun Studio compilers. Here is a nice slide of the hardware history of SPARC; I borrowed this one from
an entry in "On the Record".
An important thing to remember is
that besides Sun's long history with SPARC, we've also led the way in parallelism. Over 15 years ago, Solaris supported 64-way SPARC systems and
provided near-linear scaling. For those of you old enough to remember, at
that time IBM, SGI, HP, and everyone else thought there was no way Sun
could produce effective 64-way systems. They were wrong, and now our competitors have finally
all introduced systems with lots of processors and/or threads.
Solaris and Sun Studio compilers have a LONG history and lots of experience with industrial-strength applications with lots of threads.
Solaris and Sun Studio compilers were great at scaling to 64-way systems 15 years ago. With a lot more experience and hard work, we are even better at scaling and will scale to lots more threads right now. Many thanks to all of those compiler & OS engineers!
Postings on Sun's UltraSPARC T2 performance:
http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
OpenSPARC T2:
http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available
...I've focused on Solaris, but there are options, for
example Ubuntu. Ubuntu has already booted on the UltraSPARC T2.
As a reminder, Ubuntu and Canonical proved it on an UltraSPARC T1 almost 14 months ago; see this article on that work.
Tuesday Aug 07, 2007
This posting has more about floating-point on the Sun UltraSPARC T2. The
previous posting discussed SPECfp_2006 scores and the open-sourcing of the UltraSPARC T2 design.
In the UltraSPARC T2 there are eight floating-point units that are well suited for scientific applications. Based upon preliminary runs, the
Sun UltraSPARC T2 processor at 1.4 GHz beats all single-chip scores,
showing 14230(est)/15081(est) SPECompMbase2001/SPECompMpeak2001.
How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to other SPECompMbase2001/SPECompMpeak2001 scores?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
IBM p520 POWER5+ 1.9GHz processor published result by 85%.
- ...Sun is waiting for POWER6 4.7GHz results, maybe UltraSPARC T2 results will scare IBM from ever publishing a single-chip result?
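For readers who want to check the comparison themselves, the percentage can be recomputed from the scores in the disclosure statement of this post (UltraSPARC T2 estimates of 14230 base and 15081 peak; IBM p520 published 8141 base and 8174 peak). The sketch below is plain arithmetic on those four numbers; the function name `percent_gain` is made up for this illustration.

```python
# SPECompM2001 scores quoted in this post's disclosure statement.
# The UltraSPARC T2 values are "estimates" in SPEC terminology.
scores = {
    "Sun UltraSPARC T2 1.4GHz (est)": {"base": 14230, "peak": 15081},
    "IBM p520 POWER5+ 1.9GHz":        {"base": 8141,  "peak": 8174},
}

def percent_gain(new, old):
    """How much higher `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0

t2 = scores["Sun UltraSPARC T2 1.4GHz (est)"]
p520 = scores["IBM p520 POWER5+ 1.9GHz"]

base_gain = percent_gain(t2["base"], p520["base"])
peak_gain = percent_gain(t2["peak"], p520["peak"])

print(f"SPECompMbase2001: {base_gain:.1f}% higher")
print(f"SPECompMpeak2001: {peak_gain:.1f}% higher")
```

On these numbers the 85% figure appears to correspond to the peak metric, with the base metric showing a somewhat smaller lead.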
Benchmark description:
The SpecOMP benchmark is a test of the performance of 9 High
Performance computing applications. It is used to compare the
performance of shared memory servers. All C/C++ and FORTRAN
applications in this suite use the OpenMP programming model that
provides a portable, scalable model for developing parallel
applications for platforms ranging from the desktop to the
supercomputer.
The OpenMP Application Program Interface (API) supports
multi-platform shared-memory parallel programming in C/C++ and Fortran
on all architectures, from the largest Unix servers to the small
Windows NT platforms.
Disclosure statement:
All UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs,
but are nevertheless designated as “estimates” because they use preproduction
systems. SPEC and SPEComp are registered trademarks of Standard Performance
Evaluation Corporation.
Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/ 15081 (est) SPECompMbase2001/SPECompMpeak2001.
Competitive results from www.spec.org as of
August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.
Friday Jul 27, 2007
Brian May of the rock group Queen: "...60-year-old guitarist and songwriter said he plans to submit his thesis, ''Radial Velocities in the Zodiacal Dust Cloud,'' to supervisors at Imperial College London within the next two weeks." writes the New York Times.
for more see:
http://www.nytimes.com/aponline/arts/AP-People-Brian-May.html
NYT also says:
Filed at 10:10 a.m. ET
LONDON (AP) -- Brian May is completing his doctorate in astrophysics, more than 30 years after he abandoned his studies to form the rock group Queen.
Congrats!
Monday Jul 16, 2007
Update:
A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is two times faster
than a single-node SGI 1.6GHz Itanium 2 dual-core in runs with 1, 2, and 4 cores in
both benchmark test cases.
Other runs on the 4-node cluster of Sun Blade X6250 outperformed the
SGI Itanium2 dual-core 1.6GHz cluster in runs of both test cases up to the
maximum of 16 cores on all 4 nodes in each cluster.
question: can the Itanic dual-core keep floating?
The 4-node Sun Blade X6250 cluster outperformed the
SGI Altix XE cluster by up to 25% in runs of both test cases up to the maximum
of 16 cores on all 4 nodes in each cluster.
Even at the single node configuration, the Sun Blade X6250 beats an SGI
Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4 core runs. It is also 4%
faster in the 1-core results.
In summary:
World Record single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160)
beats the best posted results for any single node blades and servers.
All posted results are for 2 socket dual-core platforms
EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model)
results in seconds (smaller is better)
| # CPU | IBM e135 Opt DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opt DC 3.0GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3GHz IB XC3.1 RC1 | HP RX2660 Itan2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3.0GHz IB SLES 10 | SGI Altix Itan2 DC 1.6GHz Pro Pack5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
| 1 | - | - | - | - | - | 822.7 | 1631.4 | 866.1 |
| 2 | - | - | - | - | - | 418.5 | 832.7 | 448.8 |
| 4 | - | - | - | - | - | 214.9 | 438.4 | 264.8 |
| 8 | 182.9 | 137.2 | 137.8 | 134.7 | 214.3 | 118.6 | 227.2 | 147.9 |
| 16 | 96.3 | 70.4 | 71.3 | 70.5 | 111.4 | 77.5 | 117.9 | 78.1 |
| 32 | 51.5 | 37.0 | 40.6 | 36.6 | 57.9 | - | 60.2 | 41.9 |
| 64 | 31.5 | 21.5 | 22.9 | 21.1 | 31.8 | - | - | 28.0 |
| 96 | 24.7 | 17.3 | - | - | - | - | - | - |
| 128 | 19.0 | - | - | - | - | - | - | 18.1 |
"-" no result published
EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model)
results in seconds (smaller is better)
| # CPU | IBM e135 Opt DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opt DC 3GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3.0GHz IB XC3.1 RC1 | HP RX2660 Itan2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3GHz IB SLES 10 | SGI Altix Itan2 DC 1.6GHz Pro Pack5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
| 1 | - | - | - | - | - | 1966.4 | 3884.0 | 2043.6 |
| 2 | - | - | - | - | - | 987.5 | 2000.4 | 1062.4 |
| 4 | - | - | - | - | - | 500.5 | 1054.5 | 620.7 |
| 8 | 424.9 | 310.0 | 306.4 | 258.4 | 490.7 | 258.4 | 526.7 | 316.0 |
| 16 | 216.0 | 165.4 | - | 160.1 | 253.9 | 164.5 | 272.1 | 174.4 |
| 32 | 112.8 | 82.3 | 84.4 | 83.3 | 129.3 | - | 139.4 | 90.3 |
| 64 | 61.5 | 43.8 | 43.8 | 43.2 | 68 | - | 75.6 | 48.7 |
| 96 | 45.2 | 32.3 | - | - | - | - | - | - |
| 128 | 36.8 | - | - | 24.4 | - | - | - | 32.8 |
"-" no result published
The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.
Real-world CFD engineering models are typically very large and are best analyzed
with many cores in order to achieve reasonable turnaround on run times. Scalability when running these
large models with PowerFLOW is very good, often linear up to 64 and even 128 cores.
The two test cases are models of the same analysis, flow
over a car body, but of
different sizes (different mesh refinement). Both models are rather large and scale very well
up to and even beyond 64 cores.
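One common way to quantify this kind of scaling is to compute speedup (1-core time divided by N-core time) and parallel efficiency (speedup divided by N). The sketch below applies those two standard definitions to the Sun Blade X6250 run times posted in this entry, which cover 1 to 16 cores; it says nothing about behavior at 64 or 128 cores, for which no Sun results are listed here.

```python
# Elapsed times in seconds for the Sun X6250 column of the two
# PowerFLOW tables in this post, keyed by number of cores.
sun_x6250 = {
    "Case 1": {1: 822.7, 2: 418.5, 4: 214.9, 8: 118.6, 16: 77.5},
    "Case 2": {1: 1966.4, 2: 987.5, 4: 500.5, 8: 258.4, 16: 164.5},
}

def speedup(times, cores):
    """Speedup relative to the 1-core run."""
    return times[1] / times[cores]

def efficiency(times, cores):
    """Parallel efficiency: 1.0 would be perfectly linear scaling."""
    return speedup(times, cores) / cores

for case, times in sun_x6250.items():
    print(case)
    for cores in sorted(times):
        print(f"  {cores:2d} cores: speedup {speedup(times, cores):5.2f}, "
              f"efficiency {efficiency(times, cores):.0%}")
```

The same two functions work for any other column of the tables, provided a 1-core time is available to serve as the baseline.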
Case #1
Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
Case #2
Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).
It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
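The fine-equivalent idea can be illustrated with a small calculation. The sketch below assumes that the cost keeps halving at each coarser level (weight 1 at the finest scale, 1/2 at the second-finest, 1/4 at the next, and so on); only the first halving is stated in the description above, so the general rule is an assumption, and the voxel counts are hypothetical values chosen for illustration, not the actual distribution in the benchmark cases.

```python
def fine_equivalent(counts_by_level):
    """Weighted element count, with level 0 as the finest scale.

    Assumes each coarser level is updated half as often as the one
    below it, so level k contributes count / 2**k.
    """
    return sum(count / 2 ** level for level, count in enumerate(counts_by_level))

# Hypothetical voxel counts per resolution level (finest first).
voxels = [4_000_000, 6_000_000, 8_000_000]

total = sum(voxels)
equivalent = fine_equivalent(voxels)
print(f"total voxels:           {total:,}")
print(f"fine-equivalent voxels: {equivalent:,.0f}")
```

This is why a model's fine-equivalent count is always at most its raw voxel count, and why two models with similar raw counts can carry quite different computational loads.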
The two test cases in the suite require from 6 to 8 GB of memory when running with only
one core on a single node. This memory requirement per node is reduced when running in
distributed-memory (dmp) cluster mode on multiple nodes.
Performance when running PowerFLOW in a multi-node configuration is significantly
enhanced when using high-performance interconnects such as Infiniband.
Disclosure Statement:
Exa Corporation Copyright
All information on the EXA website is under Copyright 1996-2007 by Exa Corporation.
PowerFLOW is a registered trademark of EXA Corporation.
Results from
http://www.exa.com/user_center/index.html as of 07/02/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 2-socket Sun Blade X6250
2x3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
Software Configuration:
Linux 64-bit SUSE SLES 10
EXA PowerFLOW V3.6c & V4.c
EXA PowerFLOW Benchmark Test Suite
Voltaire GridStack 4.1.5-7 for SLES 10
Friday Jul 13, 2007
The Sun SPARC Enterprise M8000 has topped the performance of the brand new
4.7GHz POWER6 based p570. The Sun Studio 12 Compilers, Solaris 10, and
Sun Performance Library played a key role in obtaining this performance.
The Sun SPARC Enterprise M8000 outperforms the best published POWER6 based
system from IBM, the p570, by over 12% on the Linpack
benchmark (Highly Parallel Computing). As a reminder, IBM cores cost lots more than
those of any other vendor, so you can't just look at perf/core. Compare systems of similar
pricing and configuration.
The Sun SPARC Enterprise M8000 tops the HP Itanium 2 rx8640
system by 40% on the Linpack HPC benchmark.
The Sun SPARC Enterprise M8000, using Sun Studio 12
delivered a score of 268.6 GFLOPS on the Linpack HPC benchmark.
Funny, I read an IBM blog that said all was quiet for them in benchmarks.
Sun decided to keep working during the summer,
and I almost can't keep
going on my regular job because this blogging hobby is keeping me busy:
so many of my friends in the benchmarking group are producing so
many great results on Sun systems!
LINPACK HPC Performance Chart - GFLOPS (bigger is better)
| System | GFLOPS (Total) | GFLOPS (Peak) | Parallelism | Chips, cores | Processor type | GHz |
| Sun SPARC Enterprise M9000 | 1032.0 | 1228.8 | 128 | 64,128 | SPARC64 VI | 2.4 |
| Sun SPARC Enterprise M8000 | 268.6 | 307.2 | 32 | 16,32 | SPARC64 VI | 2.4 |
| Sun SPARC Enterprise M8000 | 255.3 | 291.84 | 32 | 16,32 | SPARC64 VI | 2.28 |
| IBM p570 | 239.4 | 300.8 | 16 | 8,16 | POWER6 | 4.7 |
| HP rx8640 | 192.4 | 204.8 | 32 | 16,32 | Itanium 2 | 1.6 |
Benchmark Description
The Linpack benchmark suite measures the performance for factoring
and solving a dense set of linear equations in double-precision
floating-point.
The Linpack HPC benchmark allows the solution of any size
matrix with a single right hand side. It was developed to allow vendors
to show off their hardware. Because big problems allow for peak
performance potentials, the benchmark is seen as an upper bound of
potential performance of a machine. The run rules are much more
flexible. The solution technique must use a pivoting scheme and the
driver must follow the spirit of the Linpack 1000 or Linpack 100
benchmarks.
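Since the benchmark is read as an upper bound, it can be useful to look at how close each delivered result comes to its theoretical peak. The sketch below uses the total and peak GFLOPS figures from the chart in this post. The peak values listed there are all consistent with cores x GHz x 4 floating-point operations per cycle; that factor of 4 is inferred from the numbers and treated here as an assumption, not a documented specification.

```python
# Assumption inferred from the chart: every listed peak equals
# cores * GHz * 4 floating-point operations per cycle.
FLOPS_PER_CYCLE = 4

# (system, delivered GFLOPS, peak GFLOPS, cores, GHz) from the chart.
systems = [
    ("Sun SPARC Enterprise M9000", 1032.0, 1228.8, 128, 2.4),
    ("Sun SPARC Enterprise M8000", 268.6, 307.2, 32, 2.4),
    ("Sun SPARC Enterprise M8000", 255.3, 291.84, 32, 2.28),
    ("IBM p570", 239.4, 300.8, 16, 4.7),
    ("HP rx8640", 192.4, 204.8, 32, 1.6),
]

def theoretical_peak(cores, ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak GFLOPS under the flops-per-cycle assumption above."""
    return cores * ghz * flops_per_cycle

def efficiency(delivered, peak):
    """Fraction of theoretical peak actually delivered on Linpack."""
    return delivered / peak

for name, delivered, peak, cores, ghz in systems:
    computed = theoretical_peak(cores, ghz)
    print(f"{name:28s} {ghz:4.2f} GHz  peak {peak:7.2f} "
          f"(computed {computed:7.2f})  efficiency {efficiency(delivered, peak):.1%}")
```

Efficiency is a different lens from raw GFLOPS: it shows how much of the hardware's nominal capability the system, compiler, and math library together manage to deliver.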
Disclosure Statement:
Linpack HPC, results from http://www.netlib.org/benchmark/index.html
as of 07/13/07. Sun SPARC Enterprise M8000 (SPARC64 VI @2.4, 16 chips,
32 cores), 268.6 GFLOPS. IBM p570 (POWER6 4.7GHz, 8 chips, 16 cores)
239.4 GFLOPS. HP rx8640 (Itanium 2 1.6GHz/24MB, 16 chips,
32 cores), 192.4 GFLOPS. Linpack Benchmark Performance Report
Results Summary
| Published Results | |
| Performance: | 268.6 GFLOPS |
| System: | Sun SPARC Enterprise M8000, 256GB |
| Total Number Processors: | 16 |
| Processor/GHz of Server: | SPARC64 VI, 2.4 GHz |
| Operating System: | Solaris 10 |
| Compiler: | Sun Studio 12 |
Thursday Jul 12, 2007
Sun doesn't sleep in the summer (other vendors are quiet, even
those that have brand new products, huh?). Sun continues to set a variety
of world records, with more to come this month and next month.
Here is a review of 4 very recent HPC benchmarks.
A World Record
Another World Record
Another World Record
Another World Record
Also a couple of commercial ones
note: Sun talks about delivered system performance not
... "use 'per-core' quick hide the fact that
these are super expensive cores or 'look at my peaks'" used by others.
Thursday Jul 12, 2007
The Sun Blade X6250 cluster was up to 27% faster, or 6% faster on geometric mean, than an SGI Altix XE 210 cluster (Xeon 3 GHz dual-core 5160 Woodcrest) with Infiniband interconnects.
A cluster of four Sun Blade X6250 blades (Xeon 3 GHz 5160) with Infiniband
interconnects was used to set this record. Each of these two-socket blades had dual-core Intel Xeon EM64T
5160 3 GHz (Woodcrest) processors, for 16 total cores.
The Sun Blade X6250 (Xeon 3 GHz 5160) cluster, running the "Fluent 6" standard benchmark for the computational
fluid dynamics (CFD) program, established
a world record for runs of the test suite using from 1 to 16 cores.
Workload description
Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes.
It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines
(aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow in some manner.
Fluent, like many major ISVs, has developed a benchmark test suite to evaluate the performance
of platforms. For several years results have been posted from hardware vendor platforms
at the Fluent website.
CFD models tend to be extremely large (fluid flow over entire car, aircraft and submarine bodies
and complex flow involving mixing of species and chemical reaction).
To achieve reasonable run times for these analyses, many processing units are
necessary. Currently the most effective way of achieving this is via an interconnected cluster
of multi-core rack-mounted servers or blades. The current set of entries posted at the Fluent
website reflects this fact.
FLUENT 6 Benchmark ("Ratings", bigger is better)
Rating = # of sequential runs in 1 day = 86,400 / (Total Elapsed Run Time in Seconds)
| Machine | Sockets | NCPUS | FL5M1 | FL5M2 | FL5M3 | FL5L1 | FL5L2 | FL5L3 |
| Sun Blade X6250 3GHz WC 5160 | 2 | 8 | 4965.5 | 10504.6 | 2563.8 | 1399.2 | 1028.3 | 174.9 |
| SGI Altix XE210 3GHz WC 5160 | 2 | 8 | 4937.1 | 9626.7 | 2014.0 | 1343.7 | 899.5 | 161.0 |
| |
| Sun Blade X6250 3GHz WC 5160 | 2 | 4 | 2780.4 | 5358.1 | 1336.9 | 731.7 | 573.7 | 101.2 |
| SGI Altix XE210 3GHz WC 5160 | 2 | 4 | 2681.1 | 4657.7 | 998.0 | 679.2 | 449.7 | 80.7 |
| |
| Sun Blade X6250 3GHz WC 5160 | 2 | serial | 919.4 | 1465.6 | 352.9 | 207.2 | 142.6 | 27.6 |
| SGI Altix XE210 3GHz 5160 | 2 | serial | 910.9 | 1445.4 | 349.5 | 204.1 | 136.6 | 26.8 |
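The rating definition used in this table (the number of back-to-back runs that fit in one day, i.e. 86,400 divided by the elapsed time in seconds) is easy to invert, which helps when you want to think in wall-clock time rather than ratings. The sketch below is a minimal illustration using two FL5L3 ratings from the 8-CPU rows; the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400

def rating(elapsed_seconds):
    """Fluent rating: how many sequential runs fit in one day."""
    return SECONDS_PER_DAY / elapsed_seconds

def elapsed_from_rating(r):
    """Invert the rating to recover the elapsed time of one run, in seconds."""
    return SECONDS_PER_DAY / r

# FL5L3 ratings from the 8-CPU rows of the table.
sun_fl5l3 = 174.9
sgi_fl5l3 = 161.0

print(f"Sun: {elapsed_from_rating(sun_fl5l3):.0f} s per run")
print(f"SGI: {elapsed_from_rating(sgi_fl5l3):.0f} s per run")

# Because a rating is inversely proportional to elapsed time,
# the ratio of ratings is the speed advantage.
advantage = (sun_fl5l3 / sgi_fl5l3 - 1.0) * 100.0
print(f"Sun advantage on FL5L3: {advantage:.1f}%")
```

Because higher ratings mean shorter runs, ratios of ratings can be compared directly across machines without converting back to seconds.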
Other interesting points:
- The "Fluent 6" standard benchmark test suite consists of "small", "medium", and
"large" test cases. However, both the small and medium sized test cases are
really on the small side and do not scale well beyond 16 cores.
- The largest test case in the suite, "fl5l3", requires 9 GB when running with only
one core on a single node. This memory requirement per node is reduced when running in
distributed-memory (dmp) cluster mode on multiple nodes with multiple cores.
- Fluent runs are CPU-intensive and sometimes memory-intensive but do not require high-performance I/O file systems.
- Very recently Fluent has developed a new benchmark test suite with extremely large
models specifically intended to be run either on large multi-core servers or
large multi-node clusters of multi-core platforms.
Workload Details
Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, etc. The test cases represent a range of typical industry simulations.
Descriptions
| Class | Benchmark | Cells | Mesh | Models | Solver | Description |
| small | FL5S1 | 32,000 | hexahedral | ke | segregated implicit | turbulent flow in a bend |
| small | FL5S2 | 32,000 | hexahedral | ke | coupled implicit | turbulent flow in a bend |
| small | FL5S3 | 89,856 | hexahedral | ke | coupled implicit | flow in a compressor, rotor 37 |
| medium | FL5M1 | 155,188 | tetrahedral | ke 6spe reac DPM P1 | segregated implicit | coal combustion in a boiler, with particle tracking |
| medium | FL5M2 | 242,782 | hybrid, hanging-node | ke | segregated implicit | turbulent flow in an engine valveport |
| medium | FL5M3 | 352,800 | hexahedral | ke 6spe react | segregated implicit | combustion in a high velocity burner |
| large | FL5L1 | 847,746 | hexahedral | ke | coupled explicit | transonic flow around a fighter |
| large | FL5L2 | 3,618,080 | hybrid | RNG ke | segregated implicit | external aerodynamics around a car body |
| large | FL5L3 | 9,792,512 | hexahedral | RSM | segregated implicit | turbulent flow in a transition duct |
Small Class Ratings
Small class problems contain fewer than 100,000 cells.
FL5S1 - Accelerating turbulent flow in an elbow duct using segregated implicit solver
Accelerating Turbulent Flow in an Elbow Duct using Segregated Implicit Solver
Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The segregated implicit solver
in FLUENT 5 is used to solve this flow.
Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit
FL5S2 - Accelerating turbulent flow in an elbow duct using coupled implicit solver
Accelerating Turbulent Flow in an Elbow Duct using Coupled Implicit Solver
Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The coupled implicit solver in
FLUENT 5 is used to solve this flow.
Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit
FL5S3 - Transonic flow in rotating fan
Transonic Flow through a Rotor
The flow through a transonic fan rotor (designated rotor 37 by NASA
Lewis) was computed. It has 36 blades. The calculation was performed at
a rotational speed of 17189 rpm. The domain boundaries consist of a
hub, blade and shroud surface, a pressure inlet and outlet surface, and
periodic surfaces.
Number of cells 89,856
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit
Medium Class
Medium class problems contain between 100,000 and 500,000 cells.
FL5M1 - Coal combustion in a boiler
Coal Combustion in a Boiler
This application couples a continuous gas phase calculation with a
discrete phase (particle) calculation. 500 coal particles are injected
into an industrial boiler where their trajectories are computed using a
Lagrangian formulation that includes dispersed phase inertia,
hydrodynamic drag and the force of gravity. Each particle injection is
subject to heating/cooling, vaporization, boiling and solid combustion.
During the injection calculations, momentum, heat and mass exchanges
are calculated and stored as source terms which are then used in the
subsequent gas phase calculation. Furthermore, stochastic modeling of
particle tracks, requiring a fixed number of "tries" per particle, is
used to account for local turbulent fluctuations. In this calculation,
10 stochastic tries per particle are used, resulting in a total of 5000
particle tracks per discrete phase update. There are 10 continuous
phase iterations per discrete phase update.
Number of cells 155,188
Cell type tetrahedral
Models k-epsilon turbulence, 6 species with reaction, dispersed phase,
P1 radiation
Solver segregated implicit
FL5M2 - Turbulent flow in an engine valveport
Turbulent Flow in an Engine Valveport
Flow is computed in an automotive valve port modeled using a zonal
hybrid mesh. The region around the valve has been meshed with
tetrahedral cells, while the duct providing the inlet flow to the valve
has been meshed with hexahedra. Pyramid cells are used to transition
between the hexahedral and tetrahedral cells. A fourth cell type called
a prismatic (or wedge) cell is used for the cylinder downstream of the
valve. Furthermore, hanging-node adaption was used to improve the
accuracy of the predicted flow field.
Number of cells 242,782
Cell type hybrid hanging-node adaption
Models k-epsilon turbulence
Solver segregated implicit
FL5M3 - Combustion in a high velocity burner
Combustion in a High Velocity Burner
Fuel (CH4) is injected into ports of a high velocity gas burner located
near the centerline. Air is supplied through the outer ports, with
secondary air delivered into an outer annular region. Directly
downstream of the annulus is a wedge-shaped annular baffle. The mixing
of fuel and air occurs downstream of this baffle and recirculation
zones behind the baffle provide stability and an attachment point for
the flame in the main combustion chamber. Combustion is assumed to
proceed via a two-step reaction mechanism, with turbulent mixing as the
limiting rate, as described by the Magnussen model.
Reference: M. Cavelli, A. Milani, "Spark-ignited wide stability gas
burner for on/off and continuous duty," IFRF HT Meeting, Milan, October
1996.
Number of cells 352,800
Cell type hexahedral
Models k-epsilon turbulence, 6 species with reaction
Solver segregated implicit
Large Class
Large class problems contain more than 500,000 cells.
FL5L1 - Transonic flow around a fighter aircraft
Transonic Flow Around a Fighter Aircraft
Flow around the AGARD M-151 combat aircraft research model is computed.
The simulation geometry contains canards and forward swept wings, but
no tail. The conditions modeled were Mach number 0.9 and 10.46 degrees
angle of attack.
Number of cells 847,764
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit
FL5L2 - Exterior flow around a passenger sedan
Exterior Flow Around a Passenger Sedan
This benchmark represents the computation of the exterior flow field
around a simplified model of a passenger sedan. The simulation geometry
was used for the Japan External Aerodynamics competition. A
viscous-hybrid grid with prismatic cells is used to adequately model
the boundary layer regions.
Number of cells 3,618,080
Cell type hybrid
Models k-epsilon turbulence
Solver segregated implicit
FL5L3 - Turbulent flow through a transition duct
Turbulent Flow Through a Transition Duct
Turbulent flow of air through a duct is computed for this benchmark.
The cross-sectional planes of the duct transition from a circle at the
inlet to a rectangle at the outflow boundary. The Reynolds-Stress Model
(7 equation) is used for computing turbulence.
Number of cells 9,792,512
Cell type hexahedral
Models RSM turbulence
Solver segregated implicit
The cluster of Sun Blade X6250s outperformed the following competitive
hardware vendor clusters at all core levels considered
(1-core smp and 1-, 2-, 4-, 8- and 16-core parallel runs)
and for all 9 test cases in the benchmark test suite:
HP BL460C (EM64T_WOODCREST_2CORE,3000,WINCCS,IB_HPMPI)
HP DL140 (EM64T_WOODCREST_2CORE,3000,LINUX,IB)
HP DL145_G2 (OPTERON_2CORE,2200,WINCCS,IB_HPMPI)
SGI ALTIX4700 (IA64_MONTECITO_2CORE,1600,LINUX)
SGI ALTIXXE210 (EM64T_WOODCREST_2CORE,3000,LINUX,IB_VOLTAIRE)
TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,SLES10,GIGE)
TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,WINCCS,GIGE)
BULL NOVASCALE (EM64T_WOODCREST_2CORE,3000,RHEL4,IB)
APPRO XTREMESERVER (OPTERON_2CORE,2800,RHEL4,IB)
Disclosure Statement:
All information on the Fluent website is Copyrighted 1995-2007 by Fluent Inc.
Results from http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm as of July 2, 2007.
Sun Blade X6250
4 2-socket Sun Blade X6250's
2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest)
Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
Software Configuration:
64-bit SUSE SLES 10
Fluent V6.3.26
Fluent 6 Standard Benchmark Test Suite
Voltaire GridStack 4.1.5-7 for SLES 10
Wednesday Jul 11, 2007
The Sun Blade X6250 posted a World Record on the ABAQUS Explicit benchmark
test suite with the MCAE application ABAQUS V6.6.
The Sun Blade X6250 used 3 GHz dual-core Xeon 5160 processors. On the individual
test cases Sun beats the Intel Supermicro by 1% to 39%.
Averaged over all of the test cases, the Sun Blade X6250
still beats the Intel Supermicro by
4% to 9% (geometric mean of all 6 test cases at each cpu level listed).
Both machines have 2 sockets and dual core processors.
Runs were made at 1- 2- and 4-cores and a geometric mean was established
at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.
The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors
and under 64-bit Linux SuSE SLES 10 beats all of the following
platforms with results posted at the ABAQUS website
and for all 6 test cases in the ABAQUS "Explicit" benchmark test suite
and at the 3 "cpu" levels (1-, 2- & 4-"cpu's"):
About The ABAQUS Explicit Module
This module, designed for crash and high velocity impact analyses
(including wave propagation and inertia effects), is very scalable,
and analysis models tend to be very large, similar to CFD models.
Timely results for typically large jobs are best obtained using multiple processing units,
either on a single large multi-core server in smp mode or on
a multi-node cluster of multi-core platforms interconnected in dmp mode.
Consequently this module is meant to run primarily
in a multi-cpu situation.
ABAQUS V6.6-1 Benchmark Test Suites Explicit Benchmark Test Suite Landscape
(time in seconds where smaller is better, Sun % better where bigger is better)
| Platform | Cores | e1 | e2 | e3 | e4 | e5 | e6 | Geometric Mean |
| |
| Sun Blade X6250/5160 | 4 | 10451 | 4509 | 3853 | 1887 | 1990 | 5202 | |
| Intel Super/5160's/RH4 | 4 | 10696 | 4646 | 3881 | 1997 | 2126 | 5460 | |
| Sun % Faster | | 2% | 3% | 1% | 6% | 7% | 5% | 4% |
| |
| Sun Blade X6250/5160 | 2 | 14232 | 7401 | 5477 | 2935 | 3327 | 7582 | |
| Intel Super/5160's/RH4 | 2 | 14878 | 8044 | 6316 | 3310 | 3483 | 8048 | |
| Sun % Faster | | 5% | 9% | 15% | 13% | 5% | 6% | 9% |
| |
| Sun Blade X6250/5160 | 1 | 24800 | 14198 | 10174 | 5147 | 6112 | 9553 | |
| Intel Super/5160 | 1 | 25076 | 14616 | 10563 | 5225 | 6272 | 13242 | |
| Sun % Faster | | 1% | 3% | 4% | 1% | 3% | 39% | 8% |
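The post does not spell out how the "Sun % Faster" figures are computed. The following Python sketch assumes each percentage is the ratio of the competitor's time to Sun's time minus one, and that the last column is the geometric mean of those six ratios. Under that assumption it reproduces the 4-core row of the table; the function names are illustrative.

```python
import math


def percent_faster(sun_seconds: float, other_seconds: float) -> float:
    """Percent by which the Sun run is faster, given elapsed times (smaller is better)."""
    return (other_seconds / sun_seconds - 1.0) * 100.0


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# 4-core elapsed times in seconds for e1..e6, copied from the table above.
sun_4core = [10451, 4509, 3853, 1887, 1990, 5202]
intel_4core = [10696, 4646, 3881, 1997, 2126, 5460]

# Per-test-case percentages, rounded to whole percent as in the table.
per_case = [round(percent_faster(s, o)) for s, o in zip(sun_4core, intel_4core)]

# Assumed definition of the last column: geometric mean of the time ratios.
ratios = [o / s for s, o in zip(sun_4core, intel_4core)]
overall = round((geometric_mean(ratios) - 1.0) * 100.0)

print(per_case, overall)
```

A geometric mean is the usual choice for averaging ratios because it treats a 2x gain and a 2x loss symmetrically, which an arithmetic mean of percentages does not.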
Abaqus/Explicit Benchmark Problems
The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The number of increments listed in the tables below are approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.
E1: Car crash
This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain and the car is given an initial velocity of 25 mph.
E1
Increments: 62,934
Number of elements: 274,632
E2: Cell phone drop
This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.
E2
Increments: 87,369
Number of elements: 45,785
Memory requirement: 300 MB
E3: Sheet forming
This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.
E3
Increments: 31,177
Number of elements: 34,540 (deformable only)
Memory requirement: 550 MB
E4: Projectile penetration
This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.
E4
Increments: 12,433
Number of elements: 237,100
Memory requirement: 1400 MB
E5: Blast loaded plate
This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.
E5
Increments: 81,716
Number of elements: 50,000
Memory requirement: 150 MB
E6: Concentric spheres
This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.
E6
Increments: 23,291
Number of elements: 244,124
Memory requirement: 1000 MB
ABAQUS "Standard" & "Explicit" Benchmark Test Suites
Voltaire GridStack 4.1.5-7 for SLES 10
Disclosure Statement:
The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus,
Abaqus/Standard, Abaqus/Explicit.
All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systems.
Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 2-socket Sun Blade X6250's
2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
Software Configuration:
Linux: 64-bit SUSE SLES 10
ABAQUS V6.6-3
Thursday Jun 21, 2007
Record SPECapc Unigraphics UGS-NX3 MCAD Benchmark Sun Ultra 40 M2
The Sun Ultra 40 M2 (dual nVidia Quadro FX 5600s SLI mode & 3.0 GHz dual-core Opteron 2222 SE) sets a new world record running the SPEC APC
UGS-NX3 graphics oriented MCAD benchmark beating all desktop platforms,
including the Woodcrest and Intel Core2 "Extreme Processor" X6800 cpu's.
In dual framebuffer SLI mode the Ultra 40 M2 with 3.0 GHz
2222 SE dual core Opteron processors outperforms a
Dell 690 (3.0 GHz Woodcrest) by 7% overall.
SPECapc Unigraphics NX 3 Benchmark (Larger numbers indicate greater speed)
| System | Overall Composite | CPU Composite | File I/O Composite | Graphics Composite |
| Sun Ultra 40 M2, 2x3.0GHz Opteron 2222SE, 2x FX 5600 (SLI) | 9.61 | 4.47 | 2.93 | 20.95 |
| Dell Precision 690, 2x3.0GHz Woodcrest, 2x FX 4600 (SLI) | 8.98 | 3.52 | 3.06 | 27.95 |
| Sun Ultra 40 M2, 2.8GHz Opteron 2220SE, 2x FX 5500 (SLI) | 7.19 | 3.08 | 3.00 | 16.85 |
| Dell Precision 690, 3.0GHz Woodcrest, 2x FX 4500 (SLI) | 6.30 | 3.25 | 1.64 | 12.29 |
Current posted results at the SPEC website for the SPEC APC UGS-NX3
benchmark: http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html
Benchmark Description
The SPEC APC MCAD benchmarks consist of tasks
representative of what a designer would do in a typical
session.
This consists of "Graphics", "CPU", and "I/O" activities.
A subscore is given for each of these subcategories as
well as the overall score.
The benchmark results shown here pertain to the SPEC APC
UGS-NX3 benchmark. The MCAD application Unigraphics was used.
This is a prominent system used by major engineering
organizations worldwide.
The SPEC APC MCAD benchmark test suite for UGS-NX3 was developed
under the auspices of the SPEC APC Committee. Results for a variety
of current desktop platforms from various hardware vendors are shown
at the SPEC APC website.
The characteristics of this MCAD application benchmark
are very similar to other types of MCAD application benchmarks
in that it consists of several groups of tasks each group involving
different types of activity: graphics intensive, cpu intensive,
and I/O intensive.
The benchmark scoring will improve with the clock rate of
the processor. The cpu intensive operations are sufficiently
large that faster dimms will definitely provide some benefit.
The graphics operations are intensive enough that using a better
framebuffer will also contribute to higher performance. In fact
using a second framebuffer in nVidia SLI mode will also improve
performance by providing up to double the graphics performance
component. The models are large enough and the I/O big enough
that using multiple striped disks to store the assemblies and parts
as well as writing plot and other types of database and interface
files will also improve performance.
Unigraphics is one of the prominent top 5 MCAD systems used
extensively by all sorts of diverse engineering organizations
worldwide. There is a very big and broad market for the desktop
platform that exhibits the leading price/performance with this
code.
Disclosure Statement:
SPEC is a registered trademark and SPECapc is a service mark of the Standard Performance
Evaluation Corporation.
Dell Precision 690,2xFX4600,overall composite 8.98;
Sun Ultra 40 M2,2xFX5500,overall composite 7.19;
Dell Precision 690,2xFX4500,overall composite 6.30.
Sun Ultra 40 M2, 2xFX 5600, overall composite 9.61.
Results from http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html as of June 20, 2007.
| Results | Dual FX 5600 | Dual FX 5500 |
| Overall Composite: | 9.61 | 7.19 |
| CPU Composite: | 4.47 | 3.08 |
| File I/O Composite: | 2.93 | 3.00 |
| Graphics Composite: | 20.95 | 16.85 |
| Reference Date: | 06/08/07 | 11/10/06 |
| System: | Sun Ultra U40 M2 | Sun Ultra U40 M2 |
| Processor/GHz: | Opteron 2222SE/3.0 | Opteron 2220SE/2.8 |
System Configuration
Hardware Configuration:
Sun Ultra 40 M2
2-socket 2x3.0 GHz dual core Opteron 2222 SE processors
2x4x1 GB DDR2 667 MHz dimms
2x nVidia Quadro FX 5600 (SLI)
Sun Ultra 40 M2
2-socket 2x2.8 GHz dual core Opteron 2200 processors
2x4x1 GB DDR2 667 MHz dimms
2x nVidia Quadro FX 5500 (SLI)
Software Configuration:
64-bit Windows XP Pro SP 1
Unigraphics NX 3 (EDS-PLM Solutions)
SPEC APC UGS-NX3 Benchmark Test Suite
nVidia Quadro driver for Win XP: 160.02
It probably doesn't hurt that the three month avai...