BM Seer Facts & Questions from an Anonymous Sun Source

new version of MPI Library: Sun ClusterTools 7.1

Friday Jan 11, 2008

For the HPC crowd, the latest version of Sun's MPI library for Solaris (both x86 and SPARC, of course) can now be freely downloaded: ClusterTools 7.1 download.

Sun's ClusterTools 7.1 is based on Open MPI 1.2.4, another of the open efforts to which Sun actively contributes.

The release adds Intel support, improved parallel debugger support, PBS Pro validation, improved memory usage for communication operations, and various updates. Sun's high-performance compiler (Sun Studio 12) is of course also supported.


Mathematica V6.0 World Records on Sun Ultra 24 Workstation

Friday Oct 26, 2007

Sun gets two new World Records for scientific desktop performance on the Sun Ultra 24 (single 3.0 GHz Intel DC Xeon E6850) and the 64-bit SuSE Linux SLED 10 operating system.

Results obtained from the most current competitive platforms have been recently posted for two different Mathematica 6 benchmarks:

  • The Wolfram (Mathematica ISV) Benchmark
  • The Independent (Mathematica MMA6.0.nb ) Benchmark

Although both Mathematica 6 benchmark test suites contain 15 test cases, the test cases differ, and the two suites are separate and distinct from each other. The Ultra 24 beats all results currently listed at both benchmark sites.

On the Wolfram (Mathematica ISV) benchmark, the Ultra 24 beats other current Intel Xeon (Woodcrest) dual-core platforms (3.0 GHz and 2.66 GHz), Intel-based Apple Mac desktops, Itanium 2 platforms, Pentium 4 platforms, and the IBM Power-based platforms.

On the independent Mathematica MMA6.0 notebook benchmark, the Ultra 24 beats the posted results, which come primarily from current competitive Apple Mac systems: MacPro, MacBook, iMac, and Apple PowerBook G4. Results for both benchmark test suites are shown in the two tables below under "Competitive Landscape".

Table 1. The Wolfram (Mathematica ISV) Benchmark

Summary results as in the installed Mathematica 6 database. This is the latest version of the Mathematica timing tests. Scores give overall performance in 15 test calculations (bigger is better). The current reference is a machine with a 2.4 GHz Pentium 4 processor.

PLATFORM                                                                Score
1 socket DC 3GHz Intel Xeon DC E6850 SLED 10 SP 1 Ultra 24              3.266
2-socket DC 3GHz Intel Xeon 5160 MS 32-bit                              2.84
2-socket DC 3.2 GHz Opteron 2224 Ultra 40 M2 64-bit SLED 10 32 GB       2.736
2-socket DC 3.2GHz Opteron 2224 Ultra 40 M2 32-bit Windows XP SP2 8GB   2.45
2 socket DC 2.66 GHz Intel Xeon 64-bit Apple MAC 10.4.8                 2.14
2 socket QC 1.6 GHz Intel Xeon 5310 32-bit Cent OS Linux                1.88
2 socket DC 2.5 GHz G5 Apple MAC OS 10.4.8 32-bit                       1.22
1 socket 2.4 GHz Pent. 4 MS Win XP 32-bit                               1.00

Table 2. The Independent (Mathematica MMA6.0.nb) Benchmark

Summary results as listed at the independent Mathematica MMA6 website, http://smc.vnet.net/timings60.html. This is the latest version of the "Mathematica MMA" timing tests. Scores give overall performance in 15 test calculations (bigger is better). The current reference machine is one with a 2.33GHz Intel Core 2 Duo processor.

PLATFORM                                                             Score
Sun Ultra 24 3.0 GHz DC Intel E6850 8GB SuSE 10 SP 1                 1.27505
MacPro, 3.0GHz Intel Core2 Duo, 4GB, MacOS 10.4.9 [4]                1.25404
AMD Athlon 64 FX-74, 3.0GHz Socket F (1207 FX) DSDC, Windows [5]     1.14464
iMac, 2.33GHz Intel Core2 Duo, 3GB, MacOS 10.4.9 [2]                 1.00338
MacBook Pro, 2.33GHz Intel Core2 Duo, 2 GB RAM, MacOS X 10.4.9 [1]   1.00105
MacBook, 2GHz Intel Core2 Duo, 2GB, MacOS 10.4.10 [6]                0.880472

Benchmark Description

The Wolfram (Mathematica ISV) Benchmark

The Wolfram (Mathematica ISV) benchmark is a revised one that now comes embedded in the latest release of Mathematica (currently V6.0), along with a database of results from current hardware vendor platforms. This benchmark was developed by Schoeller Porter, one of the principal developers of Mathematica. He described the benchmark as follows: "This is the standard benchmark suite for Mathematica, initially introduced in Mathematica 5.1 (as MathematicaMark2004). It includes both workstation and parallel benchmarks. The parallel benchmark is automatically invoked when the Parallel Computing Toolkit is loaded and compute kernels are available. It is actively developed, and MathematicaMark 6.0 is the current version."

The 15 Task benchmark includes:

Benchmark Name: MathematicaMark6
Full Version Number: 6.0.1
Date: September 14, 2007
Benchmark Result: 3.266
Total Time: 26.39
Results:
Data Fitting: 1.273
Digits of Pi: 0.488
Discrete Fourier Transform: 0.765
Eigenvalues of a Matrix: 2.059
Elementary Functions: 3.645
Gamma Function: 0.368
Large Integer Multiplication: 0.734
Matrix Arithmetic: 2.798
Matrix Multiplication: 3.062
Matrix Transpose: 1.298
Numerical Integration: 2.017
Polynomial Expansion: 1.352
Random Number Sort: 1.506
Singular Value Decomposition: 2.346
Solving a Linear System: 2.679

The Independent (Mathematica MMA6.0.nb ) Benchmark

The Mathematica MMA 6 benchmark is a widely recognized benchmark. The tasks are representative of important scientific computing desktop activities. This benchmark was developed by karl.unterkofler@fh-vorarlberg.ac.at. The benchmark consists of 15 tasks.

Disclosure Statement:

Mathematica MMA 6 Scientific Benchmark Sun Fire Ultra 24 score: 1.27505. Mathematica is a registered trademark of Wolfram Research, Inc. Results as of 10/23/07 on http://smc.vnet.net/timings60.html

Results Summary

The Sun Ultra 24 workstation gives the best desktop scientific computing performance, as demonstrated with both the Wolfram (Mathematica ISV) Benchmark and the Independent (Mathematica MMA6.0.nb) Benchmark. Both of these 15-task benchmarks consist of operations that are representative of computing a variety of scientific functions.

    Reference Date 23 October 2007
     
    The Wolfram (Mathematica ISV) Benchmark
    Platform Sun Ultra 24 Workstation
    Total Number Processors 1
    Processor/GHz of Workstation Intel DC E6850/3.0 GHz
    Memory 4x2 GB DDR2 667 MHz dimms
    Operating System 64-bit SUSE SLED 10 SP 1
    Graphics nVidia Quadro FX 1700 framebuffer
    Disks 2x146 GB 15K rpm SAS striped
    Software Mathematica 6 (Scientific Application)
    Wolfram (ISV) Benchmark
    Composite Score 3.266
     
    The Independent (Mathematica MMA6.0.nb ) Benchmark
    Platform Sun Ultra 24 Workstation
    Total Number Processors 1
    Processor/GHz of Workstation Intel DC E6850/3.0 GHz
    Memory 4x2 GB DDR2 667 MHz dimms
    Operating System 64-bit SUSE SLED 10 SP 1
    Graphics nVidia Quadro FX 1700 framebuffer
    Disks 2x146 GB 15K rpm SAS striped
    Software Mathematica 6 (Scientific Application)
    The Independent (Mathematica MMA6.0.nb ) Benchmark
    Composite Score 1.27505


SPEC APC SolidWorks 2007 MCAD

Friday Oct 26, 2007

I now see on our Sun internal sites that some of my fellow engineers didn't fully check all of the submission dates for all the SPEC APC claims. I'll repost the results when these APC experts get this all straightened out. Until then I've removed the page so I don't violate any SPEC rules.

This sort of thing usually gets straightened out in a few days. As always it was unintentional, but it happens in a busy world.


World Record MCAD: Pro/E Wildfire OCUS V5 Benchmarks Sun Ultra 24

Wednesday Oct 24, 2007

The Sun Ultra 24 desktop sets a world record in the MCAD market. The Ultra 24 beats competitive platforms from Dell, IBM, and HP. The single-socket Ultra 24 can use either Intel dual-core or quad-core processors. The Sun Ultra 24 demonstrates both excellent performance and $/performance.

Pro/E is a leading MCAD software system. Most major MCAE ISV applications have integration with Pro/E. Pro/E is used in a variety of different disciplines such as automotive, aircraft, aerospace, marine, oil & gas, earth moving, biomedical, heavy industry, atomic energy, etc.

Sun supports Pro/E on Opteron-based desktop platforms and Xeon-based platforms. Pro/E users appreciate Solaris for its maturity, reliability, superb maintenance, and comprehensive, well-developed network features. This is a benefit for many engineering corporations that have distributed design.

The OCUS V5 benchmark has a 32-bit "Normal" benchmark and a newer 64-bit "Large Memory" benchmark to show performance on larger new workloads.

The 32-bit "Normal" OCUS V5 benchmark and World Record Ultra 24 Performance

  • The Sun Ultra 24 (3GHz QC Intel QX6850 Xeon processor, 8GB memory, an nVidia Quadro FX 5600 framebuffer, 2x15K SAS striped drives, under 64-bit Win 2003 SP 2 XP 64-Ed.) sets a new MCAD world record running the Pro/E Wildfire OCUS V5 32-bit "Normal" benchmark, beating all "legitimate" hardware vendors with results currently posted at the OCUS V5 benchmark website, www.proesite.com.
  • Reruns on the same Ultra 24 platform, but with a 3GHz DC Intel Xeon E6850 processor and the same nVidia Quadro FX 5600, produced essentially the same world record results as the initial runs with a 3GHz QC QX6850 processor.
  • Further reruns on the same Ultra 24 platform with a 3GHz DC Xeon E6850, but with an nVidia Quadro FX 1700, also produced essentially the same world record results as the initial runs with a 3GHz QC QX6850 processor.
  • These results, obtained with Pro/E WF 3, are better than any others posted at the Pro/E Wildfire OCUS V5 "Normal" benchmark website by "legitimate" hardware vendors. The topmost competition comes from current Dell and HP desktop platforms, both with the recent dual-core 3GHz Woodcrest 5160 Intel processors or the Intel Core2 Duo Extreme processors.

The 64-bit "Large Memory" OCUS V5 benchmark and World Record Ultra 24 Performance

    • The Sun Ultra 24 (3GHz DC Intel Xeon E6850, 8GB memory, an nVidia Quadro FX 1700 framebuffer, 2x15K SAS striped drives, under 64-bit Win 2003 SP 2 XP 64-Ed.) sets a new MCAD world record running the Pro/E Wildfire OCUS V5 64-bit "Large Memory" benchmark.
    • Reruns on the same Ultra 24, again with a 3GHz DC Intel Xeon E6850 but with the nVidia Quadro FX 5600 framebuffer instead of the nVidia Quadro FX 1700, also produced essentially the same world record results.
    • Further reruns on the same Ultra 24 platform, but now with a 3GHz QC Intel QX6850 processor (same nVidia Quadro FX 5600 framebuffer), produced essentially the same world record results as the initial runs with a 3GHz DC E6850 Xeon and an nVidia Quadro FX 1700 framebuffer.
    • These results, obtained with Pro/E WF 3, are better than any others posted at the Pro/E Wildfire OCUS benchmark website. The topmost competition comes from current Dell and HP desktop platforms, both equipped with the recent dual-core 3GHz Woodcrest 5160 Intel processors or the Intel Core2 Duo Extreme processors.

    PRO/E WILDFIRE MCAD OCUS V5 32-bit "NORMAL" BENCHMARK

    Selected results are run times in seconds, smaller is better
    Ultra 24 vs. Topmost Current Posted Competitive Result

    Time (in seconds)
    Platform       Processor                    Total  Graphics  CPU   Disk I/O  OS
    Ultra 24       1x3.0GHz QC Intel QX6850     1228   664       563   91        Win 64 XP
    Dell Prec 390  1x2.93GHz Intel Core2 X6800  1285   692       591   95        Win 64 XP

    PRO/E WILDFIRE MCAD OCUS V5 64-bit "Large Memory" BENCHMARK

    Selected results are run times in seconds, smaller is better
    Ultra 24 vs. Topmost Current Posted Competitive Result

    Time (in seconds)
    Platform       Processor                    Total  Graphics  CPU   Disk I/O  OS
    Ultra 24       1x3.0GHz DC Intel E6850      2809   877       1926  352       Win 64 XP
    Dell Prec. 490 1x3.0GHz DC Intel 5160       3026   1094      1925  341       Win 64 XP

    For results see OCUS website: http://www.proesite.com

    Results Summary

    PRO/E WILDFIRE MCAD OCUS V5 32-bit "NORMAL" BENCHMARK

    Submitted Results 32-bit "Normal" OCUS V5 Benchmark
    Reference Date 23 October 2007
    Platform Sun Ultra 24 Workstation
    Total Number Processors 1
    Processor/GHz of Workstation Intel QC QX6850/3.0 GHz
    Memory 4x2 GB DDR2 667MHz dimms
    Operating System Win 2003 SP 2 64 Ed.
    Graphics nVidia Quadro FX 5600 framebuffer
    Disks 2x146 GB 15K rpm SAS striped
    Software Pro/E Wildfire 3 (MCAD Application)
    OCUS V5 32-bit "Normal" Benchmark
    Total Elapsed Time 1228 seconds
    Total CPU Time 563 seconds
    Total Graphics Time 664 seconds
    Total Disk I/O Time 91 seconds

    PRO/E WILDFIRE MCAD OCUS V5 64-bit "Large Memory" BENCHMARK
    Submitted Results 64-bit OCUS V5 Benchmark
    Reference Date 23 October 2007
    Platform Sun Ultra 24 Workstation
    Total Number Processors 1
    Processor/GHz of Workstation Intel DC E6850/3.0 GHz
    Memory 4x2 GB DDR2 667MHz dimms
    Operating System Win 2003 SP 2 64 Ed.
    Graphics nVidia Quadro FX 1700 framebuffer
    Disks 2x146 GB 15K rpm SAS striped
    Software Pro/E Wildfire 3 (MCAD Application)
    OCUS V5 64-bit "Large Memory" Benchmark
    Total Elapsed Time 2809 seconds
    Total CPU Time 1926 seconds
    Total Graphics Time 877 seconds
    Total Disk I/O Time 352 seconds

    "Estimated": what does that mean for Sun's UltraSPARC T2?

    Wednesday Aug 08, 2007

    Why does Sun designate yesterday's performance results as "estimates"? Why that word? Did some Sun marketeer just throw a dart and pick a big number? No. All UltraSPARC T2 SPEC CPU and SPEC OMP metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use pre-production systems. Sun customer systems, to be announced later, are expected to perform similarly. SPEC rules do allow comparing these preliminary scores with published results.

    Is Sun the only vendor to use this clause? No. Intel and AMD have made a long history of using preliminary numbers at chip announcements to get the word out about their performance. Sun is just following their lead, and trumping their performance :)

    Ok, back to why the word "estimates?" The SPEC CPU committee voted to use that specific word for preliminary scores. Members include IBM, Intel, AMD, HP, .... And every employee of a member company must follow the rules.

      "By license agreement, SPEC members and customers agree to run and report results as specified in each benchmark suite's documentation." (from the SPEC FAQ)

    Postings on Sun's UltraSPARC T2 performance:
    http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
    http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
    http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
    OpenSPARC T2:
    http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available
    Ubuntu (already booted on UltraSPARC T2):
    Ubuntu & Canonical & UltraSPARC T1 (May06).

    As a Sun employee I try my best to follow every rule when talking about results in public, but I'm an engineer, so sometimes it is hard to follow all the legalese; I try to correct things as soon as I see an error. And I do my best to remind other Sun bloggers to put in the proper disclosure statement for SPEC & TPC benchmark results. Though quite honestly, I wish SPEC & TPC would streamline the rules, make them more consistent, and minimize the lengthy disclosure statements.

    Of course because Sun is in the lead and because I made some suggestions, I'm sure this entry will be fully scrutinized by every competitor. If I made errors let me know in the comments and I will correct them.

    Disclosure Statement

    SPEC, SPECint, SPECfp, and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of August 6, 2007. Actually this one is short because I didn't put any specific results in this posting; the ones at the links have more extensive disclosures because they show scores & results.


    Solaris and Sun Studio compiler important to UltraSPARC T2 announcements & benchmarks

    Tuesday Aug 07, 2007

    Beyond UltraSPARC T2, what other technologies matter? There are two more keys to Sun providing such effective performance in the new single-chip Sun UltraSPARC T2 64-thread processor: Solaris (and now of course OpenSolaris) and the Sun Studio compilers. Here is a nice slide of the hardware history of SPARC; I borrowed this one from the "SPARC History" entry in Sun's "On the Record" blog -- blogs.sun.com/ontherecord

    An important thing to remember is that besides Sun's long history with SPARC, we've also led the way in parallelism. Over 15 years ago, Solaris supported 64-way SPARC systems and provided near-linear scaling. For those of you old enough to remember, at that time IBM, SGI, HP, and everyone else thought there was no way Sun could produce effective 64-way systems. They were wrong, and now our competitors have finally all introduced systems with lots of processors and/or threads.

    Solaris and Sun Studio compilers have a LONG history and lots of experience with industrial-strength applications with lots of threads.

    Solaris and Sun Studio compilers were great at scaling to 64-way systems 15 years ago. With a lot more experience and hard work, we are even better at scaling and will scale to lots more threads right now. Many thanks to all of those compiler & OS engineers!

    Postings on Sun's UltraSPARC T2 performance:
    http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
    http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
    http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
    OpenSPARC T2:
    http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available

    ...I've focused on Solaris, but there are options, for example Ubuntu. Ubuntu has already booted on the UltraSPARC T2.

    As a reminder, Ubuntu and Canonical proved it on an UltraSPARC T1 almost 14 months ago; see this article on that work.


    UltraSPARC T2: more floating-point performance

    Tuesday Aug 07, 2007

    This posting has more about floating-point on the Sun UltraSPARC T2. The previous posting discussed SPECfp_2006 scores and the open-sourcing of the UltraSPARC T2 design.

    In the UltraSPARC T2 there are eight floating-point units that are well suited for scientific applications. Based upon preliminary runs the Sun UltraSPARC T2 processor at 1.4 GHz beats all single chip scores showing 14230(est)/15081(est) SPECompMbase2001/SPECompMpeak2001.

    How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to SPECompMbase2001/SPECompMpeak2001 scores?

    • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip IBM p520 POWER5+ 1.9GHz processor published result by 85%.
    • ...Sun is waiting for POWER6 4.7GHz results, maybe UltraSPARC T2 results will scare IBM from ever publishing a single-chip result?
    Benchmark description:

    The SpecOMP benchmark is a test of the performance of 9 High Performance computing applications. It is used to compare the performance of shared memory servers. All C/C++ and FORTRAN applications in this suite use the OpenMP programming model that provides a portable, scalable model for developing parallel applications for platforms ranging from the desktop to the supercomputer.

    The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, from the largest Unix servers to the small Windows NT platforms.

    Disclosure statement:

    All UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use preproduction systems. SPEC and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/ 15081 (est) SPECompMbase2001/SPECompMpeak2001. Competitive results from www.spec.org as of August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.
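
    For readers who want to check the percentage themselves, here is a minimal sketch that compares the four scores quoted in the disclosure statement. The function and variable names are made up for illustration. On these numbers the peak comparison comes out at roughly 84.5% and the base comparison at roughly 74.8%, so the 85% figure appears to refer to the peak metric, rounded.

```python
def percent_faster(score: float, baseline: float) -> float:
    """Return how much higher `score` is than `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0


# Scores quoted in the disclosure statement above.
T2_BASE, T2_PEAK = 14230, 15081      # Sun UltraSPARC T2 1.4GHz (estimates)
P520_BASE, P520_PEAK = 8141, 8174    # IBM p520 POWER5+ 1.9GHz (published)

base_gain = percent_faster(T2_BASE, P520_BASE)
peak_gain = percent_faster(T2_PEAK, P520_PEAK)

print(f"SPECompMbase2001: {base_gain:.1f}% higher")
print(f"SPECompMpeak2001: {peak_gain:.1f}% higher")
```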


    Queen Guitarist to Complete Doctorate in Astrophysics

    Friday Jul 27, 2007

    Brian May of the rock group Queen: "...60-year-old guitarist and songwriter said he plans to submit his thesis, ''Radial Velocities in the Zodiacal Dust Cloud,'' to supervisors at Imperial College London within the next two weeks," writes the New York Times.

    for more see:
    http://www.nytimes.com/aponline/arts/AP-People-Brian-May.html

    NYT also says:
    Filed at 10:10 a.m. ET
    LONDON (AP) -- Brian May is completing his doctorate in astrophysics, more than 30 years after he abandoned his studies to form the rock group Queen.

    Congrats!


    Update: World Record EXA PowerFLOW Cluster & Single Node

    Monday Jul 16, 2007

    Update:

    A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is two times faster than a single-node SGI 1.6GHz Itanium 2 dual-core system in runs with 1, 2, and 4 cores in both benchmark test cases.

    Other runs on the 4-node cluster of Sun Blade X6250 outperformed the SGI Itanium2 dual-core 1.6GHz cluster in runs of both test cases up to the maximum of 16 cores on all 4 nodes in each cluster.

      question: can the Itanic dual-core keep floating?

    The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by 25% in runs of both test cases up to the maximum of 16 cores on all 4 nodes in each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4% faster in the 1-core results.

    In summary:
    World Record single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single node blades and servers. All posted results are for 2 socket dual-core platforms

    EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)

    # CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
           Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
           DC 2.4GHz  DC 3GHz    DC 3.0GHz  DC 3GHz    DC 1.6GHz  DC 3.0GHz  DC 1.6GHz  DC 3GHz
           Myrinet    IB         IB         IB         IB         IB
           SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
    1      -          -          -          -          -          822.7      1631.4     866.1
    2      -          -          -          -          -          418.5      832.7      448.8
    4      -          -          -          -          -          214.9      438.4      264.8
    8      182.9      137.2      137.8      134.7      214.3      118.6      227.2      147.9
    16     96.3       70.4       71.3       70.5       111.4      77.5       117.9      78.1
    32     51.5       37.0       40.6       36.6       57.9       -          60.2       41.9
    64     31.5       21.5       22.9       21.1       31.8       -          -          28.0
    96     24.7       17.3       -          -          -          -          -          -
    128    19.0       -          -          -          -          -          -          18.1

    "-" no result published
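
    To read scaling off a table like this one, a common approach is to compute speedup and parallel efficiency relative to the 1-core run. The sketch below does that for the Sun X6250 column of Case 1; the helper name `scaling` is invented for illustration, and only the elapsed times come from the table.

```python
# Sun Blade X6250 column of the Case 1 table: cores -> elapsed seconds.
sun_x6250_case1 = {1: 822.7, 2: 418.5, 4: 214.9, 8: 118.6, 16: 77.5}


def scaling(times):
    """Map each core count to (speedup, efficiency) relative to the 1-core run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in sorted(times.items())}


results = scaling(sun_x6250_case1)
for cores, (speedup, efficiency) in results.items():
    print(f"{cores:>3} cores: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

    The same function can be pointed at any other column that has a 1-core result, such as the two SGI columns.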

    EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)

    # CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
           Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
           DC 2.4GHz  DC 3GHz    DC 3GHz    DC 3.0GHz  DC 1.6GHz  DC 3GHz    DC 1.6GHz  DC 3GHz
           Myrinet    IB         IB         IB         IB         IB
           SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
    1      -          -          -          -          -          1966.4     3884.0     2043.6
    2      -          -          -          -          -          987.5      2000.4     1062.4
    4      -          -          -          -          -          500.5      1054.5     620.7
    8      424.9      310.0      306.4      258.4      490.7      258.4      526.7      316.0
    16     216.0      165.4      -          160.1      253.9      164.5      272.1      174.4
    32     112.8      82.3       84.4       83.3       129.3      -          139.4      90.3
    64     61.5       43.8       43.8       43.2       68         -          75.6       48.7
    96     45.2       32.3       -          -          -          -          -          -
    128    36.8       -          -          24.4       -          -          -          32.8

    "-" no result published

    The EXA PowerFLOW Benchmark Test Suite
    The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

    Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround on run times. Scalability when running these large models with PowerFLOW is very good, often linear or perfect up to 64 and even 128 cores.

    The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, but of different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.

      Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
      Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

    It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
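
    A small sketch may make the fine-equivalent idea concrete. The text only states the cost of the second-finest scale (half); the code below assumes the same halving continues at every coarser scale, which is an assumption for illustration and not something the benchmark description confirms. The per-scale counts are hypothetical, not the actual distribution in either test case.

```python
def fine_equivalent(counts_by_level):
    """Weight each resolution scale by its assumed relative update cost.

    counts_by_level[0] is the finest scale, [1] the second-finest, and so on.
    Assumption: each coarser scale is updated half as often as the one before.
    """
    return sum(count / (2 ** level) for level, count in enumerate(counts_by_level))


# Hypothetical voxel counts per scale (finest first).
example_counts = [4_000_000, 6_000_000, 8_000_000]

total_voxels = sum(example_counts)
fe_voxels = fine_equivalent(example_counts)

print(f"total voxels:           {total_voxels:,}")
print(f"fine-equivalent voxels: {fe_voxels:,.0f}")
```

    Under this weighting a model with many coarse voxels carries a much smaller load than its raw voxel count suggests, which is consistent with the gap between the raw and fine-equivalent figures quoted for Case #1.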

    The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. This memory requirement per node is reduced when running in a dmp cluster mode on multiple nodes.

    Performance when running PowerFLOW in a multi-node configuration is significantly enhanced when using high-performance interconnects such as Infiniband.

    Disclosure Statement:

    All information on the EXA website is Copyright 1996-2007 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250
      2x3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

    Software Configuration:

      Linux 64-bit SUSE SLES 10
      EXA PowerFLOW V3.6c & V4.c
      EXA PowerFLOW Benchmark Test Suite
      Voltaire GridStack 4.1.5-7 for SLES 10


    Linpack Benchmark: Sun SPARC Enterprise M8000 Beats IBM POWER6

    Friday Jul 13, 2007

    The Sun SPARC Enterprise M8000 has topped the performance of the brand new 4.7GHz POWER6-based p570. The Sun Studio 12 compilers, Solaris 10, and the Sun Performance Library played key roles in obtaining this performance.

    The Sun SPARC Enterprise M8000 outperforms IBM's best published POWER6-based system, the p570, by over 12% on the Linpack benchmark (Highly Parallel Computing). As a reminder, IBM cores cost lots more than any other vendor's, so you can't just look at perf/core. Compare systems of similar pricing and configuration.

    The Sun SPARC Enterprise M8000 tops the HP Itanium 2 rx8640 system by 40% on the Linpack HPC benchmark.

    The Sun SPARC Enterprise M8000, using Sun Studio 12 delivered a score of 268.6 GFLOPS on the Linpack HPC benchmark.

      Funny, I read an IBM blog that said all was quiet for them in benchmarks. Sun decided to keep working during the summer :), and I almost can't keep going on my regular job, because this blogging hobby is keeping me busy: so many of my friends in the benchmarking group are producing so many great results on Sun systems!

    LINPACK HPC Performance Chart - GFLOPS (bigger is better)

    System                      GFLOPS           Processors
                                Total   Peak     Parallelism  Chips,cores  Type        GHz
    Sun SPARC Enterprise M9000  1032.0  1228.8   128          64,128       SPARC64 VI  2.4
    Sun SPARC Enterprise M8000  268.6   307.2    32           16,32        SPARC64 VI  2.4
    Sun SPARC Enterprise M8000  255.3   291.84   32           16,32        SPARC64 VI  2.28
    IBM p570                    239.4   300.8    16           8,16         POWER6      4.7
    HP rx8640                   192.4   204.8    32           16,32        Itanium 2   1.6
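
    The Peak column can be reproduced as cores x GHz x floating-point operations per cycle. The sketch below uses 4 operations per cycle per core for every system; that value is an assumption inferred from the fact that it reproduces the table's peak numbers, not a figure stated in the post. It also shows the measured-to-peak efficiency and the M8000's lead over the other two systems.

```python
FLOPS_PER_CYCLE = 4  # assumption: chosen because it reproduces the Peak column

# (system, measured GFLOPS, cores, GHz), taken from the table above.
systems = [
    ("Sun SPARC Enterprise M8000", 268.6, 32, 2.4),
    ("IBM p570", 239.4, 16, 4.7),
    ("HP rx8640", 192.4, 32, 1.6),
]


def peak_gflops(cores, ghz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFLOPS for the given core count and clock."""
    return cores * ghz * flops_per_cycle


def efficiency(measured, peak):
    """Fraction of theoretical peak actually delivered."""
    return measured / peak


for name, measured, cores, ghz in systems:
    peak = peak_gflops(cores, ghz)
    print(f"{name:<28} peak {peak:6.1f}  efficiency {efficiency(measured, peak):.1%}")

m8000_vs_p570 = 268.6 / 239.4
m8000_vs_rx8640 = 268.6 / 192.4
print(f"M8000 vs p570:   {m8000_vs_p570 - 1:.1%} higher")
print(f"M8000 vs rx8640: {m8000_vs_rx8640 - 1:.1%} higher")
```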

    Benchmark Description

    The Linpack benchmark suite measures the performance for factoring and solving a dense set of linear equations in double-precision floating-point.

    The Linpack HPC benchmark allows the solution of any size matrix with a single right hand side. It was developed to allow vendors to show off their hardware. Because big problems allow for peak performance potentials, the benchmark is seen as an upper bound of potential performance of a machine. The run rules are much more flexible. The solution technique must use a pivoting scheme and the driver must follow the spirit of the Linpack 1000 or Linpack 100 benchmarks.

    Disclosure Statement:

    Linpack HPC, results from http://www.netlib.org/benchmark/index.html as of 07/13/07. Sun SPARC Enterprise M8000 (SPARC64 VI @2.4, 16 chips, 32 cores), 268.6 GFLOPS. IBM p570 (POWER6 4.7GHz, 8 chips, 16 cores) 239.4 GFLOPS. HP rx8640 (Itanium 2 1.6GHz/24MB, 16 chips, 32 cores), 192.4 GFLOPS. Linpack Benchmark Performance Report

    Results Summary

    Published Results
    Performance: 268.6 GFLOPS
    System: Sun SPARC Enterprise M8000, 256GB
    Total Number Processors: 16
    Processor/GHz of Server: SPARC64 VI, 2.4 GHz
    Operating System: Solaris 10
    Compiler: Sun Studio 12


    Lots more results to come

    Thursday Jul 12, 2007

    Sun doesn't sleep in the summer (other vendors are quiet, even those that have brand new products, huh?). Sun continues to set a variety of world records, with more to come this month and next month. Here is a review of 4 very recent HPC benchmarks.

    A World Record

    Another World Record

    Another World Record

    Another World Record

    Also a couple of commercial ones

    note: Sun talks about delivered system performance, not the "use 'per-core' to quickly hide the fact that these are super expensive cores" or "look at my peaks" arguments used by others.


    CFD World Record Fluent Sun Blade X6250 Cluster (Xeon 3 GHz 5160)

    Thursday Jul 12, 2007

    The Sun Blade X6250 cluster was up to 27% faster, or 6% faster on geometric mean, than an SGI Altix XE 210 cluster (Xeon 3 GHz dual-core 5160 Woodcrest) with Infiniband interconnects.

    A cluster of four Sun Blade X6250 blades (Xeon 3 GHz 5160) with Infiniband interconnects was used to set this record. Each of these two-socket blades had dual-core Intel Xeon EM64T 5160 3 GHz (Woodcrest) processors, for 16 cores in total.

    The Sun Blade X6250 (Xeon 3 GHz 5160) cluster, running the "Fluent 6" standard benchmark for the Fluent computational fluid dynamics (CFD) program, established a world record for runs of the test suite using from 1 to 16 cores.

    Workload description

    Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes. It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines (aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow in some manner.

    Fluent, like many major ISVs, has developed a benchmark test suite to evaluate the performance of platforms. For several years, results from hardware vendor platforms have been posted at the Fluent website.

    CFD models tend to be extremely large (fluid flow over entire car, aircraft, and submarine bodies, and complex flow involving mixing of species and chemical reaction). To get reasonable run times for the analyses, many processing units are necessary. Currently the most effective way of achieving this is via an interconnected cluster of multi-core rack-mounted servers or blades. The current set of entries posted at the Fluent website reflects this fact.

    FLUENT 6 Benchmark ("Ratings", bigger is better)

    Rating = number of sequential runs in 1 day = 86,400 / (Total Elapsed Run Time in Seconds)
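
    The rating formula is easy to invert, which helps when comparing ratings against wall-clock times. A minimal Python sketch, with illustrative function names that are not part of Fluent:

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds: float) -> float:
    """Number of back-to-back benchmark runs that fit in one day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(rating_value: float) -> float:
    """Invert the rating to recover the elapsed run time in seconds."""
    if rating_value <= 0:
        raise ValueError("rating must be positive")
    return SECONDS_PER_DAY / rating_value


# A run that takes 8 minutes (480 s) could be repeated 180 times per day.
print(rating(480.0))
```

    Because the rating is inversely proportional to elapsed time, a rating that is 10% higher corresponds to a run that finishes in about 9% less time.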

    Machine                        Sockets  NCPUS    FL5M1   FL5M2    FL5M3   FL5L1   FL5L2   FL5L3
    Sun Blade X6250 3GHz WC 5160   2        8        4965.5  10504.6  2563.8  1399.2  1028.3  174.9
    SGI Altix XE210 3GHz WC 5160   2        8        4937.1   9626.7  2014.0  1343.7   899.5  161.0

    Sun Blade X6250 3GHz WC 5160   2        4        2780.4   5358.1  1336.9   731.7   573.7  101.2
    SGI Altix XE210 3GHz WC 5160   2        4        2681.1   4657.7   998.0   679.2   449.7   80.7

    Sun Blade X6250 3GHz WC 5160   2        serial    919.4   1465.6   352.9   207.2   142.6   27.6
    SGI Altix XE210 3GHz 5160      2        serial    910.9   1445.4   349.5   204.1   136.6   26.8

    Other interesting points:

    • The "Fluent 6" standard benchmark test suite consists of "small", "medium", and "large" test cases. However, both the small and medium test cases are really on the small side and do not scale well beyond 16 cores.
    • The largest test case in the suite, "fl5l3", requires 9 GB when running with only one core on a single node. This memory requirement per node is reduced when running in dmp cluster mode on multiple nodes with multiple cores.
    • Fluent runs are cpu intensive and sometimes memory intensive, but do not require high-performance I/O file systems.
    • Very recently Fluent has developed a new benchmark test suite with extremely large models, specifically intended to be run either on large multi-core servers or on large multi-node clusters of multi-core platforms.
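
    One common way to quantify how well a case scales is parallel speedup and efficiency. Since the rating is inversely proportional to elapsed time, speedup is simply the ratio of the parallel rating to the serial rating. The sketch below uses the Sun Blade X6250 ratings for FL5L3 from the table above. The helper names are hypothetical, and the published ratings are rounded, so the results are approximate.

```python
def speedup(parallel_rating: float, serial_rating: float) -> float:
    """Speedup relative to the serial run (ratings are inversely
    proportional to elapsed time, so their ratio is the speedup)."""
    return parallel_rating / serial_rating


def efficiency(parallel_rating: float, serial_rating: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(parallel_rating, serial_rating) / cores


# Sun Blade X6250, FL5L3: serial, 4-core and 8-core ratings from the table.
serial, four, eight = 27.6, 101.2, 174.9
for cores, value in ((4, four), (8, eight)):
    print(cores, round(speedup(value, serial), 2),
          round(efficiency(value, serial, cores), 2))
```

    An efficiency close to 1.0 means the extra cores are being used well. Values that fall as cores are added are what "does not scale well" looks like in numbers.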

    Workload Details

    Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, etc. The test cases represent a range of typical industry simulations.

    
    Descriptions
    Class   Benchmark  Cells      Mesh                  Models                  Solver               Description
    small
            FL5S1         32,000  hexahedral            ke                      segregated implicit  turbulent flow in a bend
            FL5S2         32,000  hexahedral            ke                      coupled implicit     turbulent flow in a bend
            FL5S3         89,856  hexahedral            ke                      coupled implicit     flow in a compressor, rotor 37
    medium
            FL5M1        155,188  tetrahedral           ke, 6spe reac, DPM, P1  segregated implicit  coal combustion in a boiler, with particle tracking
            FL5M2        242,782  hybrid, hanging-node  ke                      segregated implicit  turbulent flow in an engine valveport
            FL5M3        352,800  hexahedral            ke, 6spe reac           segregated implicit  combustion in a high velocity burner
    large
            FL5L1        847,746  hexahedral            ke                      coupled explicit     transonic flow around a fighter
            FL5L2      3,618,080  hybrid                RNG ke                  segregated implicit  external aerodynamics around a car body
            FL5L3      9,792,512  hexahedral            RSM                     segregated implicit  turbulent flow in a transition duct
    
    
    Small Class
    
    Small class problems contain fewer than 100,000 cells.
    
    FL5S1 - Accelerating turbulent flow in an elbow duct using segregated implicit solver
    
    Flow is accelerated through a 90 degree elbow duct with a rectangular
    cross section. The geometry and flow have a symmetry plane permitting
    the modeling of only half the domain. Because of the curvature of the
    duct, significant secondary flow occurs, with velocity components
    normal to the principal flow direction. The segregated implicit solver
    in FLUENT 5 is used to solve this flow.
    
    Number of cells 32,000
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5S2 - Accelerating turbulent flow in an elbow duct using coupled implicit solver
    
    Flow is accelerated through a 90 degree elbow duct with a rectangular
    cross section. The geometry and flow have a symmetry plane permitting
    the modeling of only half the domain. Because of the curvature of the
    duct, significant secondary flow occurs, with velocity components
    normal to the principal flow direction. The coupled implicit solver in
    FLUENT 5 is used to solve this flow.
    
    Number of cells 32,000
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver coupled implicit
    
    FL5S3 - Transonic flow in rotating fan
    Transonic Flow through a Rotor
    
    The flow through a transonic fan rotor (designated rotor 37 by NASA
    Lewis) was computed. It has 36 blades. The calculation was performed at
    a rotational speed of 17189 rpm. The domain boundaries consist of a
    hub, blade and shroud surface, a pressure inlet and outlet surface, and
    periodic surfaces.
    
    Number of cells 89,856
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver coupled implicit
    
    
    Medium Class
    
    Medium class problems contain between 100,000 and 500,000 cells.
    
    FL5M1 - Coal combustion in a boiler
    
    This application couples a continuous gas phase calculation with a
    discrete phase (particle) calculation. 500 coal particles are injected
    into an industrial boiler where their trajectories are computed using a
    Lagrangian formulation that includes dispersed phase inertia,
    hydrodynamic drag and the force of gravity. Each particle injection is
    subject to heating/cooling, vaporization, boiling and solid combustion.
    During the injection calculations, momentum, heat and mass exchanges
    are calculated and stored as source terms which are then used in the
    subsequent gas phase calculation. Furthermore, stochastic modeling of
    particle tracks, requiring a fixed number of "tries" per particle, is
    used to account for local turbulent fluctuations. In this calculation,
    10 stochastic tries per particle are used, resulting in a total of 5000
    particle tracks per discrete phase update. There are 10 continuous
    phase iterations per discrete phase update.
    
    Number of cells 155,188
    Cell type tetrahedral
    Models k-epsilon turbulence, 6 species with reaction, dispersed phase,
    P1 radiation
    Solver segregated implicit
    
    FL5M2 - Turbulent flow in an engine valveport
    
    Flow is computed in an automotive valve port modeled using a zonal
    hybrid mesh. The region around the valve has been meshed with
    tetrahedral cells, while the duct providing the inlet flow to the valve
    has been meshed with hexahedra. Pyramid cells are used to transition
    between the hexahedral and tetrahedral cells. A fourth cell type called
    a prismatic (or wedge) cell is used for the cylinder downstream of the
    valve. Furthermore, hanging-node adaption was used to improve the
    accuracy of the predicted flow field.
    
    Number of cells 242,782
    Cell type hybrid hanging-node adaption
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5M3 - Combustion in a high velocity burner
    
    Fuel (CH4) is injected into ports of a high velocity gas burner located
    near the centerline. Air is supplied through the outer ports, with
    secondary air delivered into an outer annular region. Directly
    downstream of the annulus is a wedge-shaped annular baffle. The mixing
    of fuel and air occurs downstream of this baffle and recirculation
    zones behind the baffle provide stability and an attachment point for
    the flame in the main combustion chamber. Combustion is assumed to
    proceed via a two-step reaction mechanism, with turbulent mixing as the
    limiting rate, as described by the Magnussen model.
    
    Reference: M. Cavelli, A. Milani, "Spark-ignited wide stability gas
    burner for on/off and continuous duty," IFRF HT Meeting, Milan, October
    1996.
    
    Number of cells 352,800
    Cell type hexahedral
    Models k-epsilon turbulence, 6 species with reaction
    Solver segregated implicit
    
    Large Class
    
    Large class problems contain more than 500,000 cells.
    
    FL5L1 - Transonic flow around a fighter aircraft
    
    Flow around the AGARD M-151 combat aircraft research model is computed.
    The simulation geometry contains canards and forward swept wings, but
    no tail. The conditions modeled were Mach number 0.9 and 10.46 degrees
    angle of attack.
    
    Number of cells 847,764
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5L2 - Exterior flow around a passenger sedan
    
    This benchmark represents the computation of the exterior flow field
    around a simplified model of a passenger sedan. The simulation geometry
    was used for the Japan External Aerodynamics competition. A
    viscous-hybrid grid with prismatic cells is used to adequately model
    the boundary layer regions.
    
    Number of cells 3,618,080
    Cell type hybrid
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5L3 - Turbulent flow through a transition duct
    
    Turbulent flow of air through a duct is computed for this benchmark.
    The cross-sectional planes of the duct transition from a circle at the
    inlet to a rectangle at the outflow boundary. The Reynolds-Stress Model
    (7 equation) is used for computing turbulence.
    
    Number of cells 9,792,512
    Cell type hexahedral
    Models RSM turbulence
    Solver segregated implicit
    

    The cluster of Sun Blade X6250 blades outperformed the following competitive hardware vendor clusters at all core levels considered (1-core smp; 1-, 2-, 4-, 8-, and 16-core parallel runs) and for all nine test cases in the benchmark test suite:

      HP BL460C (EM64T_WOODCREST_2CORE,3000,WINCCS,IB_HPMPI)
      HP DL140 (EM64T_WOODCREST_2CORE,3000,LINUX,IB)
      HP DL145_G2 (OPTERON_2CORE,2200,WINCCS,IB_HPMPI)
      SGI ALTIX4700 (IA64_MONTECITO_2CORE,1600,LINUX)
      SGI ALTIXXE210 (EM64T_WOODCREST_2CORE,3000,LINUX,IB_VOLTAIRE)
      TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,SLES10,GIGE)
      TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,WINCCS,GIGE)
      BULL NOVASCALE (EM64T_WOODCREST_2CORE,3000,RHEL4,IB)
      APPRO XTREMESERVER (OPTERON_2CORE,2800,RHEL4,IB)

    Disclosure Statement:

    All information on the Fluent website is Copyrighted 1995-2007 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm as of July 2, 2007.

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest)
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

    Software Configuration:

      64-bit SUSE SLES 10
      Fluent V6.3.26
      Fluent 6 Standard Benchmark Test Suite
      Voltaire GridStack 4.1.5-7 for SLES 10


    World Record ABAQUS V6.6 on the Sun Blade X6250 Cluster

    Wednesday Jul 11, 2007

    The Sun Blade X6250, using dual-core 3 GHz Xeon 5160 processors, posted a World Record on the ABAQUS Explicit benchmark test suite for the MCAE application ABAQUS V6.6. On the individual test cases Sun beats the Intel Supermicro by 1% to 39%. Averaged over all of the test cases, the Sun Blade X6250 still beats the Intel Supermicro by 4% to 9%, depending on the core count (geometric mean of all 6 test cases at each cpu level listed).

    Both machines have 2 sockets and dual-core processors. Runs were made with 1, 2, and 4 cores, and a geometric mean was established at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.

    The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ABAQUS website and for all 6 test cases in the ABAQUS "Explicit" benchmark test suite and at the 3 "cpu" levels (1-, 2- & 4-"cpu's"):

    About The ABAQUS Explicit Module

    This module, designed for crash and high-velocity impact analyses (including wave propagation and inertia effects), is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results for typically large jobs are best obtained using multiple processing units, either on a single multi-core server in smp mode or on a multi-node cluster of multi-core platforms interconnected in dmp mode.

    Consequently this module is meant to run primarily in a multi-cpu situation, either in smp mode on a single large multi-core machine or in dmp mode over a cluster of machines.

    ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape (time in seconds, where smaller is better; Sun % Faster, where bigger is better)

    Platform                Cores     e1     e2     e3    e4    e5     e6  Geometric Mean

    Sun Blade X6250/5160    4      10451   4509   3853  1887  1990   5202
    Intel Super/5160's/RH4  4      10696   4646   3881  1997  2126   5460
    Sun % Faster                      2%     3%     1%    6%    7%     5%  4%

    Sun Blade X6250/5160    2      14232   7401   5477  2935  3327   7582
    Intel Super/5160's/RH4  2      14878   8044   6316  3310  3483   8048
    Sun % Faster                      5%     9%    15%   13%    5%     6%  9%

    Sun Blade X6250/5160    1      24800  14198  10174  5147  6112   9553
    Intel Super/5160        1      25076  14616  10563  5225  6272  13242
    Sun % Faster                      1%     3%     4%    1%    3%    39%  8%
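
    The "Sun % Faster" rows can be reproduced from the elapsed times. The sketch below assumes the percentage is the ratio of the competitor's time to Sun's time, minus one, and that the geometric mean is taken over those time ratios. The post does not spell out the exact method, so treat both as assumptions; the function names are illustrative.

```python
import math


def percent_faster(competitor_seconds: float, sun_seconds: float) -> float:
    """Percentage by which the Sun run is faster, based on elapsed time."""
    return (competitor_seconds / sun_seconds - 1.0) * 100.0


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Elapsed times in seconds for e1..e6 at 4 cores, from the table above.
sun_4 = [10451, 4509, 3853, 1887, 1990, 5202]
intel_4 = [10696, 4646, 3881, 1997, 2126, 5460]

per_case = [percent_faster(i, s) for i, s in zip(intel_4, sun_4)]
ratios = [i / s for i, s in zip(intel_4, sun_4)]
overall = (geometric_mean(ratios) - 1.0) * 100.0

print([round(p) for p in per_case])
print(round(overall))
```

    With the 4-core times this yields the same rounded percentages as the 4-core "Sun % Faster" row, including the 4% geometric mean.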

    Abaqus/Explicit Benchmark Problems

    The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications, including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The numbers of increments listed below are approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.

      E1: Car crash
      This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain and the car is given an initial velocity of 25 mph.

      E1
      Increments: 62,934
      Number of elements: 274,632

      E2: Cell phone drop
      This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.

      E2
      Increments: 87,369
      Number of elements: 45,785
      Memory requirement: 300 MB

      E3: Sheet forming
      This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.

      E3
      Increments: 31,177
      Number of elements: 34,540 (deformable only)
      Memory requirement: 550 MB

      E4: Projectile penetration
      This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.

      E4
      Increments: 12,433
      Number of elements: 237,100
      Memory requirement: 1400 MB

      E5: Blast loaded plate
      This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.

      E5
      Increments: 81,716
      Number of elements: 50,000
      Memory requirement: 150 MB

      E6: Concentric spheres
      This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.

      E6
      Increments: 23,291
      Number of elements: 244,124
      Memory requirement: 1000 MB

    Disclosure Statement:

    The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systèmes. Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

    Software Configuration:

      Linux: 64-bit SUSE SLES 10
      ABAQUS V6.6-3
      ABAQUS "Standard" & "Explicit" Benchmark Test Suites
      Voltaire GridStack 4.1.5-7 for SLES 10


    Record SPECapc Unigraphics UGS-NX3 MCAD Benchmark Sun Ultra 40 M2

    Thursday Jun 21, 2007


    The Sun Ultra 40 M2 (dual nVidia Quadro FX 5600s in SLI mode and 3.0 GHz dual-core Opteron 2222 SE processors) sets a new world record running the graphics-oriented SPEC APC UGS-NX3 MCAD benchmark, beating all desktop platforms, including those with the Woodcrest and Intel Core2 "Extreme Processor" X6800 CPUs.

    In dual framebuffer SLI mode the Ultra 40 M2 with 3.0 GHz 2222 SE dual core Opteron processors outperforms a Dell 690 (3.0 GHz Woodcrest) by 7% overall.

    SPECapc Unigraphics NX 3 Benchmark (Larger numbers indicate greater speed)

    System                                                       Overall    CPU        File I/O   Graphics
                                                                 Composite  Composite  Composite  Composite
    Sun Ultra 40 M2, 2x3.0GHz Opteron 2222SE, 2x FX 5600 (SLI)   9.61       4.47       2.93       20.95
    Dell Precision 690, 2x3.0GHz Woodcrest, 2x FX 4600 (SLI)     8.98       3.52       3.06       27.95
    Sun Ultra 40 M2, 2.8GHz Opteron 2220SE, 2x FX 5500 (SLI)     7.19       3.08       3.00       16.85
    Dell Precision 690, 3.0GHz Woodcrest, 2x FX 4500 (SLI)       6.30       3.25       1.64       12.29

    Current posted results at the SPEC website for the SPEC APC UGS-NX3 benchmark: http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html
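
    The 7% overall advantage quoted above follows directly from the two overall composite scores. A minimal sketch (the function name is illustrative; only the overall composites from the table are used):

```python
def percent_higher(score: float, baseline: float) -> float:
    """Percentage by which `score` exceeds `baseline` (larger is better)."""
    return (score / baseline - 1.0) * 100.0


# Overall composites from the table: Sun Ultra 40 M2 (2x FX 5600) vs.
# Dell Precision 690 (2x FX 4600).
sun_overall, dell_overall = 9.61, 8.98
print(round(percent_higher(sun_overall, dell_overall)))
```

    Note that this compares only the overall composite. The individual CPU, File I/O and Graphics composites can point in different directions, as the table shows.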

    Benchmark Description

    The SPEC APC MCAD benchmarks consist of tasks representative of what a designer would do in a typical session. This consists of "Graphics", "CPU", and "I/O" activities. A subscore is given for each of these subcategories as well as the overall score. The benchmark results shown here pertain to the SPEC APC UGS-NX3 benchmark. The MCAD application Unigraphics was used. This is a prominent system used by major engineering organizations worldwide.

    The SPEC APC MCAD benchmark test suite for UGS-NX3 was developed under the auspices of the SPEC APC Committee. Results for a variety of current desktop platforms from various hardware vendors are shown at the SPEC APC website.

    The characteristics of this MCAD application benchmark are very similar to other types of MCAD application benchmarks in that it consists of several groups of tasks each group involving different types of activity: graphics intensive, cpu intensive, and I/O intensive.

    The benchmark score improves with the clock rate of the processor. The cpu-intensive operations are sufficiently large that faster DIMMs will definitely provide some benefit. The graphics operations are intensive enough that using a better framebuffer will also contribute to higher performance. In fact, using a second framebuffer in nVidia SLI mode will also improve performance by providing up to double the graphics performance component. The models are large enough and the I/O big enough that using multiple striped disks to store the assemblies and parts, as well as to write plot and other types of database and interface files, will also improve performance.

    Unigraphics is one of the top 5 MCAD systems, used extensively by diverse engineering organizations worldwide. There is a very big and broad market for the desktop platform that exhibits the leading price/performance with this code.

    Disclosure Statement:

    SPEC is a registered trademark and SPECapc is a service mark of the Standard Performance Evaluation Corporation. Dell Precision 690, 2xFX4600, overall composite 8.98; Sun Ultra 40 M2, 2xFX5500, overall composite 7.19; Dell Precision 690, 2xFX4500, overall composite 6.30; Sun Ultra 40 M2, 2xFX5600, overall composite 9.61. Results from http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html as of June 20, 2007.

    Results               Dual FX 5600        Dual FX 5500
    Overall Composite:    9.61                7.19
    CPU Composite:        4.47                3.08
    File I/O Composite:   2.93                3.00
    Graphics Composite:   20.95               16.85
    Reference Date:       06/08/07            11/10/06
    System:               Sun Ultra U40 M2    Sun Ultra U40 M2
    Processor/GHz:        Opteron 2222SE/3.0  Opteron 2220SE/2.8

    System Configuration

    Hardware Configuration:

    Sun Ultra 40 M2

      2-socket 2x3.0 GHz dual core Opteron 2222 SE processors
      2x4x1 GB DDR2 667 MHz dimms
      2x nVidia Quadro FX 5600 (SLI)

    Sun Ultra 40 M2

      2-socket 2x2.8 GHz dual core Opteron 2220 SE processors
      2x4x1 GB DDR2 667 MHz dimms
      2x nVidia Quadro FX 5500 (SLI)
    Software Configuration:

    64-bit Windows XP Pro SP 1

    Unigraphics NX 3 (EDS-PLM Solutions)

    SPEC APC UGS-NX3 Benchmark Test Suite

    nVidia Quadro driver for Win XP: 160.02
