BM Seer Facts & Questions from an Anonymous Sun Source

Fluent World Records on Sun's Blades

Wednesday Jan 23, 2008

Two benchmark test suites (standard and new) for the Fluent computational fluid dynamics (CFD) code were run on a mini cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual core Intel 5260. The Sun Blade X6250 mini cluster beats all posted results for both FLUENT benchmark test suites at each core level from one up to the maximum sixteen cores that were available on the X6250 cluster.

  • In runs of the standard test suite, the X6250 cluster was from 15% (lower core levels) to 42% (highest core level) faster than the previous top set of results, posted for a Harpertown quad-core 3.0 GHz Xeon.
  • Scaling efficiency on the X6250 cluster ranged from 100% at 1 core to an average of 84% at 8 cores when running the standard large test cases.
  • In runs of the new, larger, and more representative test suite, the X6250 cluster was from 15% (lower core levels) to 65% (highest core level) faster than the top results posted for a Harpertown quad-core 3.0 GHz Xeon.
  • Scaling efficiency on the X6250 cluster ranged from 100% at 1 core to an average of 84% at 8 cores when running all six test cases in the new test suite.

Four 2-socket Sun Blade X6250 blades with Infiniband interconnects were used. Comparisons are presented against the results posted at the FLUENT Performance website.

Two benchmark test suites were considered. The first is the long-standing "FLUENT 6" benchmark test suite, consisting of 9 test cases (3 small, 3 medium, and 3 large); it is referred to above as the standard test suite. The second is the "FLUENT 6.3" benchmark test suite (referred to as the new test suite), consisting of larger models (both in memory requirements and in mesh/model size) that are better suited to multi-core, multi-node dmp cluster environments and more representative of current engineering CFD work.

Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes. It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines (aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow.

CFD models tend to be extremely large (fluid flow over entire car, aircraft, and submarine bodies, and complex flows involving mixing of species and chemical reactions). Achieving reasonable run times for these analyses requires many processing units. Currently the most effective way of achieving this is via an interconnected cluster of multi-core rack-mounted servers or blades.

Standard FLUENT 6 Benchmark Test Suite, large workload results ("ratings" bigger is better)

See www.fluent.com/software/fluent/fl6bench/fl6bench_6.3/ for the full table of results.

Rating = number of sequential runs of the test case possible in 1 day = 86,400 / (total elapsed run time in seconds)
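As a rough illustration of the rating formula, the following Python sketch converts between elapsed time and rating. The function names are illustrative only and are not part of any Fluent tooling; the value 259.4 is the 1-core FL5L1 rating reported for the Sun X6250 in the table below.

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds: float) -> float:
    """Number of back-to-back runs of a test case that fit in one day."""
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(rating_value: float) -> float:
    """Invert a rating to recover the elapsed run time in seconds."""
    return SECONDS_PER_DAY / rating_value


# 1-core FL5L1 rating reported for the Sun X6250
run_time = elapsed_from_rating(259.4)
print(f"{run_time:.1f} seconds per run")
```

A higher rating therefore simply means a shorter elapsed time for the same case, which is why ratings can be compared directly across systems.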

System                            NCPUS   FL5L1   FL5L2   FL5L3
Sun X6250 3.33GHz DC 5260             1   259.4   178.5    32.4
Intel 3.0GHz QC Harpertown            1   220.5   151.2    27.9
SGI Altix XE210 3.0GHz Xeon 5160      1   210.7   153.5    29.6
IBM X3550 3GHz DC 5160                1   188.0   134.7     n/a

Sun X6250 3.33GHz DC 5260             2   493.4   351.8    61.9
Intel 3.0GHz QC Harpertown            2   420.7   297.3    54.2
SGI Altix XE210 3.0GHz Xeon 5160      2   396.1   298.0    56.2
IBM X3550 3GHz DC 5160                2   342.6   236.8    55.0

Sun X6250 3.33GHz DC 5260             4   931.8   675.7   122.0
SGI Altix XE210 3.0GHz Xeon 5160      4   679.2   449.7    80.7
IBM X3550 3GHz DC 5160                4   623.5   411.4    94.9

Sun X6250 3.33GHz DC 5260             8  1811.3  1227.3   207.2
Intel 3.0GHz QC Harpertown            8  1279.1   710.5   120.0
SGI Altix XE210 3.0GHz Xeon 5160      8  1343.7   899.5   161.0
IBM X3550 3GHz DC 5160                8  1273.4   862.3   149.9

Sun X6250 3.33GHz DC 5260            16  2941.3  1577.4   246.0
SGI Altix XE210 3.0GHz Xeon 5160     16  2584.9  1788.8   319.0
IBM X3550 3GHz DC 5160               16  2479.2  1722.0   306.8
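
The scaling efficiency quoted for the large standard cases can be approximated from the ratings above. This is a minimal sketch that assumes efficiency is defined as the speedup over the 1-core rating divided by the number of cores; the post does not state the exact definition, so treat that as an assumption.

```python
# Sun X6250 ratings at 1 and 8 cores for the three large cases (from the table above)
ratings = {
    "FL5L1": {1: 259.4, 8: 1811.3},
    "FL5L2": {1: 178.5, 8: 1227.3},
    "FL5L3": {1: 32.4, 8: 207.2},
}


def scaling_efficiency(rating_1_core: float, rating_n_cores: float, n_cores: int) -> float:
    """Speedup relative to one core, divided by the core count (assumed definition)."""
    return (rating_n_cores / rating_1_core) / n_cores


eff_8 = {case: scaling_efficiency(r[1], r[8], 8) for case, r in ratings.items()}
mean_eff_8 = sum(eff_8.values()) / len(eff_8)

for case, eff in eff_8.items():
    print(f"{case}: {eff:.0%}")
print(f"mean at 8 cores: {mean_eff_8:.0%}")
```

Under this definition the 8-core mean comes out at roughly 84%, in line with the figure quoted in the summary.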

New FLUENT 6.3 Benchmark Test Suite, "Ratings" (bigger is better)

Rating = number of sequential runs of the test case possible in 1 day = 86,400 / (total elapsed run time in seconds)

System                      NCPUS      eddy     turbo  aircraft     sedan  truck14m truckpoly
Sun X6250 3.33GHz DC 5260       1     109.2     440.4      96.6      65.1       7.0       8.3
Intel 3.0GHz QC Harpertown      1      95.9       n/a      84.2      55.9       6.2       6.9

Sun X6250 3.33GHz DC 5260       2     208.9     823.1     178.8     121.3      14.6      16.1
Intel 3.0GHz QC Harpertown      2     183.1     741.3     162.9     109.6      12.4      13.4

Sun X6250 3.33GHz DC 5260       4     415.6    1590.4     353.8     246.2      29.9      31.9

Sun X6250 3.33GHz DC 5260       8     780.8    2805.2     577.1     384.4      55.0      57.3
Intel 3.0GHz QC Harpertown      8     491.4    1685.0     321.0     207.2      32.1      33.2

Sun X6250 3.33GHz DC 5260      16    1095.8    3744.3     682.9     429.9      73.7      74.6

Key Technical Points

  • The "small" and even the "medium" test cases in the standard suite are not very large and are no longer representative of typical usage.
  • Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround times. Scalability when running these large models with Fluent is very good, often linear up to 64 and even 128 cores.
  • Performance when running Fluent in a multi-node configuration is significantly enhanced by high-performance interconnects such as Infiniband.
  • Fluent supports a variety of interconnects from various hardware vendors (e.g. Voltaire, Cisco/Topspin, QLogic [formerly Silverstorm], Myrinet), MPI implementations (HP-MPI, MVAPICH(2), LAM, plus private vendor versions), and communication protocols (e.g. ssh and rsh).
  • There is still no officially certified Solaris build of Fluent for x86-64 platforms. However, a prototype build compiled a while ago with Sun Studio 11 compilers outperformed, at that time, all other platforms running other operating systems (64-bit Linux). Competitive results are currently posted at the Fluent website by several hardware vendors, including current AMD- and Intel-based platforms running 64-bit Linux.
  • Very recently, Fluent has developed a new benchmark test suite with larger models specifically intended to be run either on large multi-core servers or on large multi-node clusters of multi-core platforms.

    Benchmark Description

    The Original Standard "Fluent 6" Benchmark Test Suite

    Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, numerics and parallelization, compilers and optimization, in addition to performance characteristics of the hardware where the simulation is performed. The problems selected represent a range of simulations typical of those which might be found in industry. The principal objective of this benchmark suite is to provide comprehensive and fair comparative information of the performance of FLUENT on available hardware platforms.

    Disclosure Statement:

    All information on the Fluent website is Copyrighted 1995-2008 by Fluent Inc. Results from www.fluent.com as of January 7, 2008.

    System Configuration

    4 Sun Blade X6250's
    3.33 GHz dual core Intel 5260
    2 internal striped 15K SAS drives (cluster shared file system)
    Infiniband (Voltaire) interconnects

    SuSE Linux Enterprise Server SLES 10
    Voltaire OFED gridstack
    HP-MPI
    Fluent V6.3.26
    Fluent 6 Standard Benchmark Test Suite
    Fluent 6.3 "New" Benchmark Test Suite

    See Also:

    New Fluent benchmark results posted at: http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.3

    Standard Fluent benchmark results posted at: http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm


    Update: World Record EXA PowerFLOW Cluster & Single Node

    Monday Jul 16, 2007

    Update:

    A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is two times faster than a single-node SGI 1.6GHz Itanium 2 dual-core system in runs with 1, 2, and 4 cores on both benchmark test cases.

    Other runs on the 4-node cluster of Sun Blade X6250 outperformed the SGI Itanium2 dual-core 1.6GHz cluster in runs of both test cases up to the maximum of 16 cores on all 4 nodes in each cluster.

      question: can the Itanic dual-core keep floating?

    The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by up to 25% in runs of both test cases, up to the maximum of 16 cores on all 4 nodes in each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4% faster in the 1-core results.

    In summary:
    The world-record single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single-node blade or server. All posted results are for 2-socket dual-core platforms.
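
    The single-node comparison against the Itanium 2 system can be checked from the 1-, 2-, and 4-core rows of the two tables that follow. The sketch below simply divides the SGI Altix (Itanium 2) elapsed times by the Sun X6250 elapsed times; the variable and function names are illustrative.

```python
# Elapsed seconds at 1, 2, and 4 cores, copied from the PowerFLOW tables
sun_x6250 = {
    "case1": [822.7, 418.5, 214.9],
    "case2": [1966.4, 987.5, 500.5],
}
sgi_altix_itanium2 = {
    "case1": [1631.4, 832.7, 438.4],
    "case2": [3884.0, 2000.4, 1054.5],
}


def time_ratios(slower_times, faster_times):
    """Elapsed-time ratio per core count; 2.0 means twice as fast."""
    return [slow / fast for slow, fast in zip(slower_times, faster_times)]


ratios = {
    case: time_ratios(sgi_altix_itanium2[case], sun_x6250[case])
    for case in sun_x6250
}

for case, values in ratios.items():
    print(case, [f"{v:.2f}x" for v in values])
```

    Because these are elapsed times, smaller is better, so the ratio is taken as other system over Sun.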

    EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)

    # CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
           Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
           DC 2.4GHz  DC 3GHz    DC 3.0GHz  DC 3GHz    DC 1.6GHz  DC 3.0GHz  DC 1.6GHz  DC 3GHz
           Myrinet    IB         IB         IB         IB         IB
           SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
    1      -          -          -          -          -          822.7      1631.4     866.1
    2      -          -          -          -          -          418.5      832.7      448.8
    4      -          -          -          -          -          214.9      438.4      264.8
    8      182.9      137.2      137.8      134.7      214.3      118.6      227.2      147.9
    16     96.3       70.4       71.3       70.5       111.4      77.5       117.9      78.1
    32     51.5       37.0       40.6       36.6       57.9       -          60.2       41.9
    64     31.5       21.5       22.9       21.1       31.8       -          -          28.0
    96     24.7       17.3       -          -          -          -          -          -
    128    19.0       -          -          -          -          -          -          18.1

    "-" no result published

    EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)

    # CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
           Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
           DC 2.4GHz  DC 3GHz    DC 3GHz    DC 3.0GHz  DC 1.6GHz  DC 3GHz    DC 1.6GHz  DC 3GHz
           Myrinet    IB         IB         IB         IB         IB
           SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
    1      -          -          -          -          -          1966.4     3884.0     2043.6
    2      -          -          -          -          -          987.5      2000.4     1062.4
    4      -          -          -          -          -          500.5      1054.5     620.7
    8      424.9      310.0      306.4      258.4      490.7      258.4      526.7      316.0
    16     216.0      165.4      -          160.1      253.9      164.5      272.1      174.4
    32     112.8      82.3       84.4       83.3       129.3      -          139.4      90.3
    64     61.5       43.8       43.8       43.2       68         -          75.6       48.7
    96     45.2       32.3       -          -          -          -          -          -
    128    36.8       -          -          24.4       -          -          -          32.8

    "-" no result published

    The EXA PowerFLOW Benchmark Test Suite
    The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

    Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround times. Scalability when running these large models with PowerFLOW is very good, often linear up to 64 and even 128 cores.

    The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, at different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.

      Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
      Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

    It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
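
    The fine-equivalent idea can be sketched as a weighted sum. The example below assumes that each successively coarser scale is processed half as often as the one before it, which extends the single example given above (second-finest scale at half cost) to all levels; that extension, the function name, and the per-level voxel counts are assumptions made for illustration and are not taken from the benchmark cases.

```python
def fine_equivalent(counts_by_level):
    """Weighted element count.

    counts_by_level[0] is the finest scale. Each coarser level is
    assumed to cost half as much per element as the level before it.
    """
    return sum(count / 2**level for level, count in enumerate(counts_by_level))


# Hypothetical voxel counts per resolution level, finest first
voxels_by_level = [4_000_000, 6_000_000, 8_000_000]

total_voxels = sum(voxels_by_level)
fine_equivalent_voxels = fine_equivalent(voxels_by_level)

print(f"total voxels:           {total_voxels:,}")
print(f"fine-equivalent voxels: {fine_equivalent_voxels:,.0f}")
```

    This is why a case's fine-equivalent count is always lower than its raw count, as in the two case descriptions above.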

    The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. The memory requirement per node is reduced when running in dmp cluster mode on multiple nodes.

    Performance when running PowerFLOW in a multi-node configuration is significantly enhanced by high-performance interconnects such as Infiniband.

    Disclosure Statement:

    All information on the EXA website is Copyright 1996-2007 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250
      2x3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

    Software Configuration:

      Linux 64-bit SUSE SLES 10
      EXA PowerFLOW V3.6c & V4.c
      EXA PowerFLOW Benchmark Test Suite
      Voltaire GridStack 4.1.5-7 for SLES 10


    World Record EXA PowerFLOW Cluster & Single Node

    Thursday Jul 12, 2007

    Entry updated; please see http://blogs.sun.com/bmseer/entry/update_world_record_exa_powerflow for the latest.


    World Record ABAQUS V6.6 on the Sun Blade X6250 Cluster

    Wednesday Jul 11, 2007

    The Sun Blade X6250, equipped with 3GHz dual-core Xeon 5160 processors, posted world-record results on the ABAQUS Explicit benchmark test suite for the MCAE application ABAQUS V6.6. On the individual test cases Sun beats the Intel Supermicro by 1% to 39%. The Sun Blade X6250 also beats the Intel Supermicro when all of the test cases are averaged, by 4% to 9% (geometric mean of all 6 test cases at each cpu level listed).

    Both machines have 2 sockets and dual-core processors. Runs were made at 1, 2, and 4 cores, and a geometric mean was established at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.

    The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ABAQUS website and for all 6 test cases in the ABAQUS "Explicit" benchmark test suite and at the 3 "cpu" levels (1-, 2- & 4-"cpu's"):

    About The ABAQUS Explicit Module

    This module, designed for crash and high-velocity impact analyses (including wave propagation and inertia effects), is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results for typically large jobs are best obtained using multiple processing units, either on a single multi-core server in smp mode or on a multi-node cluster of multi-core platforms interconnected in dmp mode.

    Consequently this module is meant to run primarily in a multi cpu situation either in smp mode on a single large multi core machine or in dmp mode over a cluster of machines.

    ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape (times in seconds, smaller is better; Sun % faster, bigger is better)

    Platform                Cores     e1     e2     e3     e4     e5     e6  Geometric Mean

    Sun Blade X6250/5160        4  10451   4509   3853   1887   1990   5202
    Intel Super/5160's/RH4      4  10696   4646   3881   1997   2126   5460
    Sun % Faster                      2%     3%     1%     6%     7%     5%              4%

    Sun Blade X6250/5160        2  14232   7401   5477   2935   3327   7582
    Intel Super/5160's/RH4      2  14878   8044   6316   3310   3483   8048
    Sun % Faster                      5%     9%    15%    13%     5%     6%              9%

    Sun Blade X6250/5160        1  24800  14198  10174   5147   6112   9553
    Intel Super/5160            1  25076  14616  10563   5225   6272  13242
    Sun % Faster                      1%     3%     4%     1%     3%    39%              8%
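
    The "Sun % Faster" rows and the geometric-mean column can be reproduced from the elapsed times. The post does not spell out the formula; the sketch below assumes "% faster" means the ratio of the other system's time to Sun's time, minus one, which matches the rounded figures in the table. Function and variable names are illustrative.

```python
import math

# Elapsed seconds for e1..e6, keyed by core count (from the table above)
sun = {
    4: [10451, 4509, 3853, 1887, 1990, 5202],
    2: [14232, 7401, 5477, 2935, 3327, 7582],
    1: [24800, 14198, 10174, 5147, 6112, 9553],
}
intel = {
    4: [10696, 4646, 3881, 1997, 2126, 5460],
    2: [14878, 8044, 6316, 3310, 3483, 8048],
    1: [25076, 14616, 10563, 5225, 6272, 13242],
}


def percent_faster(other_seconds: float, sun_seconds: float) -> float:
    """Assumed definition: (other time / Sun time - 1) * 100."""
    return (other_seconds / sun_seconds - 1.0) * 100.0


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


geo_mean_percent = {}
for cores in sun:
    time_ratios = [o / s for o, s in zip(intel[cores], sun[cores])]
    geo_mean_percent[cores] = (geometric_mean(time_ratios) - 1.0) * 100.0

for cores, pct in geo_mean_percent.items():
    print(f"{cores} core(s): Sun faster by {pct:.1f}% (geometric mean)")
```

    A geometric mean is used rather than an arithmetic one so that a single outlier, such as e6 at 1 core, does not dominate the summary.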

    Abaqus/Explicit Benchmark Problems

    The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The number of increments listed in the tables below are approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.

      E1: Car crash
      This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain and the car is given an initial velocity of 25 mph.

      E1
      Increments: 62,934
      Number of elements: 274,632

      E2: Cell phone drop
      This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone is defined such that a severe oblique impact occurs.

      E2
      Increments: 87,369
      Number of elements: 45,785
      Memory requirement: 300 MB

      E3: Sheet forming
      This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.

      E3
      Increments: 31,177
      Number of elements: 34,540 (deformable only)
      Memory requirement: 550 MB

      E4: Projectile penetration
      This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.

      E4
      Increments: 12,433
      Number of elements: 237,100
      Memory requirement: 1400 MB

      E5: Blast loaded plate
      This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.

      E5
      Increments: 81,716
      Number of elements: 50,000
      Memory requirement: 150 MB

      E6: Concentric spheres
      This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.

      E6
      Increments: 23,291
      Number of elements: 244,124
      Memory requirement: 1000 MB

      ABAQUS "Standard" & "Explicit" Benchmark Test Suites
      Voltaire GridStack 4.1.5-7 for SLES 10

    Disclosure Statement:

    The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systems. Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
    Software Configuration:

      Linux: 64-bit SUSE SLES 10
    ABAQUS V6.6-3


    World Record ANSYS on Sun Blade X6250 (Xeon 3GHz DC 5160)

    Tuesday Jul 10, 2007

    The Sun Blade X6250 outperforms all posted ANSYS V11.0 (MCAE) results at the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000 XAL (same 3GHz Xeon 5160) by as much as 39% across the three "cpu" levels tested (1, 2, and all 4 cores available on both 2-socket platforms equipped with dual-core processors). Sun wins at these processor configurations in 6 of the 7 cases in the benchmark test suite. Overall, on the geometric mean, Sun was 10% faster.

    The only case where the Sun X6250 loses, "bm-2", has an exceptionally high I/O component, and even so Sun was only 3-4% slower. The Sun X6250 had 10K rpm internal disk drives, whereas the Intel S5000 XAL had 15K rpm drives.

    The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ANSYS website for all 7 test cases in the ANSYS "Standard" benchmark test suite (1-, 2- & 4-cpu).

    Yes, this result was run with Linux; Sun wants to show that we can win with every OS. There is now an officially certified, supported, and maintained Solaris build of ANSYS V11.0 for x86-64 platforms, compiled with recent Sun Studio 11 compilers. This is the first SX64 version to become available.

    Competitive Landscape

    ANSYS V 11.0 "Standard" Benchmark Test Suite on X2200 M2 & Constellation Blades (run times in seconds, smaller is better; for % bigger is better)

    System               Cores   bm-1   bm-2   bm-3   bm-4   bm-5   bm-6   bm-7

    Sun X6250/5160           4    100   1362    343    164    181    131    752
    Intel S5000XAL/5160      4    109   1312    369    169    187    161   1048
    Sun % better                   9%    -4%     8%     3%     3%    23%    39%

    Sun X6250/5160           2    118   1398    385    183    223    169   1064
    Intel S5000XAL/5160      2    128   1356    417    186    244    211   1437
    Sun % better                   9%    -3%     8%     2%     9%    25%    35%

    Sun X6250/5160           1    150   1455    456    211    339    253   1770
    Intel S5000XAL/5160      1    164   1416    489    215    340    314   2330
    Sun % better                   9%    -3%     7%     2%     1%    24%    32%

      (Please note: per-core performance isn't the right metric for comparing different CPUs, as system costs vary greatly; core counts are used here only to identify the configuration. It is "SYSTEM" performance, not "core" performance, that matters!)

    Key Technical Points

    • The test cases from the ANSYS standard benchmark test suite all have a substantial I/O component: 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system with high-performance interconnects. When running the SX64 build, a ZFS file system may be worth employing.

    ANSYS 11.0 Standard Test Cases

      bm-1
      Name: Exhaust Elbow Manifold
      Description: Static structural analysis. Solved for equivalent stresses.
      Statistics: ~850,000 DOF Model

      bm-2
      Name: Floor Panel
      Description: Surface body geometry. Harmonic analysis with mode superposition.
      Statistics: ~765,000 DOF Model

      bm-3
      Name: Engine Assembly - Piston and Crank
      Description: Assembly with contact. Nonlinear structural DOF solution.
      Statistics: ~250,000 DOF Model

      bm-4
      Name: Electric Motor
      Description: Electromagnetic analysis. Solved for magnetic field intensities.
      Statistics: ~250,000 DOF Model

      bm-5
      Name: Brake Rotor
      Description: Thermal transient analysis. Solved for temperature DOFs.
      Statistics: ~230,000 DOF Model

      bm-6
      Name: Wing Section
      Description: Static structural analysis.
      Statistics: ~250,000 DOF Model
      Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

      bm-7
      Name: Wing Section
      Description: Static structural analysis.
      Statistics: ~800,000 DOF Model
      Notes: bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 Gbytes of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 Gb file containing the factored matrix. It requires over 1 Gbyte of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less than optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

    Disclosure Statement:

    The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics(TM). All information on the ANSYS website is Copyrighted 2007 by ANSYS, Inc. Results at http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
      32 GB memory
    Software Configuration:
      64-bit Linux SuSE SLES 10
      (note: Sun works great with Linux, that is why we show all kinds of benchmarks! )
      ANSYS V11.0
      ANSYS 11 "Standard" Benchmark Test Suite


    Sun Blade X6220 4-thread SPEC OMPM2001 World Record

    Thursday Jun 14, 2007

    The Sun Blade X6220 (Opteron 2222SE) delivers the best performance on the SPEC OMPM2001 benchmark suite of all systems with 4-thread results.

    The Sun Blade X6220 server, in a two-way, dual-core configuration, produced the best 4-thread SPECompM2001 result of 13847. The results show that the combination of Solaris 10 and Sun Studio 12 is unmatched by the competition for assisting users in writing parallel code.

    How does this compare to a 4-thread IBM POWER6?
    We don't know; it is just another benchmark that IBM doesn't publish on. But we do know of a 32-thread IBM System p 570 (4.7 GHz, 32 threads, 16 cores, 8 chips) result of 86,624 SPECompMpeak2001. That is 8 times more threads than we need to make a useful comparison, so do the math. Check the IBM system pricing to see the expense of a POWER6 versus blades.

    SPECompM2001 Performance - peak, bigger is better
    Peak   Base   Cores  Chips  Threads  System
    13847  13348  4      2      4        Sun Blade X6220, Opteron 2222SE, 3.0GHz
    13817  13195  4      1      4        Sun Fire V40z, Opteron 254, 2.8GHz
    13222  12763  4      2      4        Sun Fire X4100/X4200 M2, Opteron 2220SE, 2.8GHz
    12574  12127  4      2      4        Sun Fire X2200 M2, Opteron 2218, 2.6GHz
    10964  10424  4      2      4        Sun Fire X4100, Opteron 285, 2.6GHz
    8174   8141   2      1      4        IBM System p5 520 (1900 MHz, 2 CPU)

    Benchmark Description

    The SPEC OMPM2001 Benchmark Suite was released in June 2001 and tests HPC performance using OpenMP for parallelism.

    • 11 programs (3 in C and 8 in Fortran) parallelized using OpenMP API
    Goals of suite:
    • Targeted to mid-range (4-32 processor) parallel systems
    • Run rules, tools and reporting similar to SPEC CPU2000
    • Programs representative of HPC and Scientific Applications

    Disclosure Statement:

    SPEC and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of Jun 6, 2007; Sun result submitted to SPEC. Sun Blade X6220 (4 cores, 2 chips, 4 threads), 13,847 SPECompM2001. Sockets refers to chips. The 32-thread, 16-core IBM System p 570 (4.7 GHz, 8 chips, 32 threads) result (86,624 SPECompMpeak2001) was submitted to SPEC, but IBM does not list the date on their official website http://www-03.ibm.com/systems/p/benchmarks/hpc.html. Why not?

    System Configuration

    Result
    X6220 4-threads: 13847 SPECompM2001
    Reference Date: Jun 06, 2007
    System: Sun Blade X6220 16GB memory (4x2GB per chip), DDR667
    Total Number Processors: 2
    Processor/GHz of Server: Opteron 2222SE, 3.0 GHz
    Operating System: Solaris 10 11/06
    Compiler: Sun Studio 12

    See Also
    sun.com X6220 Blade Benchmark Page


    Sun Blade X6250 & Sun Studio 12 x86 World Record

    Wednesday Jun 13, 2007

    Sun Blade X6250 delivers a pair of x86 SPEC CPU2006 integer performance world records:

    The Sun Blade X6250 (Dual-Core Intel Xeon 5160), running Solaris 10 and using the Sun Studio 12 compiler, delivered the best x86 result for the SPECint2006 benchmark.

    The Sun Blade X6250 (Dual-Core Intel Xeon 5160), using Solaris 10 and Studio 12, delivered the x86 4-core world record on SPECint_rate2006.

    The Sun Blade X6250 server had a SPECint2006 result of 21.0 and a SPECint_rate2006 result of 65.0. The advanced features of the freely available Sun Studio 12 compiler were critical for getting this level of performance on the Sun Blade X6250.

    The Sun Blade X6250 is only 3% slower than the peak score of the very expensive new IBM POWER6 p570, which was recently announced. SPECint2006 is a single job stream, so let's now turn to comparing 4-thread results: here the Sun Blade X6250 is 7% faster than the peak SPECint_rate2006 score of the very expensive new IBM POWER6 p570 (both IBM and Sun at 4 threads). Oh, and remember that clock rate is no longer how you compare systems: the Sun Blade X6250 is at 3GHz and the IBM POWER6 is at 4.7GHz. CPU frequency is basically irrelevant; it is CPU and system architecture that matters!
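
    The two percentages quoted above follow from the published scores. Since bigger is better for SPEC scores, this small sketch takes the difference relative to the IBM score; the helper name is illustrative.

```python
def percent_difference(score: float, reference_score: float) -> float:
    """Percent by which score exceeds reference_score (negative if lower)."""
    return (score / reference_score - 1.0) * 100.0


# SPECint2006: Sun Blade X6250 (21.0) vs IBM p570 POWER6 peak (21.6)
speed_diff = percent_difference(21.0, 21.6)

# SPECint_rate2006 at 4 threads: Sun Blade X6250 (65.0) vs IBM p570 POWER6 peak (60.9)
rate_diff = percent_difference(65.0, 60.9)

print(f"SPECint2006:      {speed_diff:+.1f}%")
print(f"SPECint_rate2006: {rate_diff:+.1f}%")
```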

    SPEC CPU2006 Landscape - bigger is better, selected recent results

    SPECint2006

    System   Processor Type   GHz   Chips   Cores   Peak   Base
    IBM p570 (power6) Power6 4.7 1 1 21.6 17.8
    Sun Blade X6250 Intel Xeon 5160 3.0 2 4 21.0
    Supermicro X7DB8+ board Intel Xeon 5160 3.0 2 4 20.8 18.9
    Sun Ultra 40 M2 AMD Opteron 2222SE 3.0 2 4 16.1

    SPECint_rate2006

    System   Processor Type   GHz   Chips   Cores   Threads/Copies   Peak   Base
    Sun Blade X6250 Intel Xeon 5160 3.0 2 4 4 65.0
    Supermicro X7DB8+ Intel Xeon 5160 3.0 2 4 4 64.9 60.0
    IBM p570 (Power6) Power6 4.7 1 2 4 60.9 53.2
    Sun Ultra 40 M2 AMD Opteron 2222SE 3.0 2 4 4 60.4
    Fujitsu BX620 S3 Xeon 5160 (Woodcrest) 3.0 2 4 4 59.4 56.7

    Results as of 06 Jun 2007 from www.spec.org.

    Benchmark Description

    SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

    Each suite has two different measures. The first is the CPU (speed) measure, which is the performance on the suite as a single stream. This can be either a single-threaded run or a run that the compiler has parallelized automatically. This measure is further divided into base and optimized (peak) runs: base uses the same compiler flags for all kernels, whereas optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

    The second measure is Rate, a throughput measure of how many copies of the CPU benchmarks can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. Like the CPU measure, it is reported as a base and an optimized set of results.
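To make the two measures concrete, here is a rough Python sketch of how such scores are formed, assuming the usual SPEC CPU2006 conventions: each benchmark gets a ratio of the SPEC reference time to the measured time, the rate ratio is additionally scaled by the number of copies, and the overall metric is the geometric mean of the per-benchmark ratios. The function names and the timings are placeholders for illustration, not SPEC tooling or measurements from any real system.

```python
import math


def speed_ratio(reference_seconds, measured_seconds):
    """Speed ratio for one benchmark: reference time over measured time."""
    return reference_seconds / measured_seconds


def rate_ratio(copies, reference_seconds, elapsed_seconds):
    """Rate ratio for one benchmark: copies times reference time over elapsed time."""
    return copies * reference_seconds / elapsed_seconds


def geometric_mean(ratios):
    """Overall metric: geometric mean of the per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder timings in seconds (NOT real measurements)
reference = [1000.0, 2000.0, 4000.0]
measured = [100.0, 100.0, 100.0]

speed = geometric_mean([speed_ratio(r, m) for r, m in zip(reference, measured)])
rate = geometric_mean([rate_ratio(4, r, m) for r, m in zip(reference, measured)])
print(f"speed metric: {speed:.1f}, rate metric with 4 copies: {rate:.1f}")
```

The geometric mean keeps any single benchmark from dominating the overall score, which is why a result is quoted as one number per suite.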

    Disclosure Statement:

    SPEC, SPECint, SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org or from IBM public websites as of 6/06/07. Sun Blade X6250 (Intel Xeon 5160, 2chips/4cores, Solaris 10) 65.0 SPECint_rate2006; Sun Blade X6250 (Intel Xeon 5160, 2chips/4cores, Solaris 10) 21.0 SPECint2006; IBM System p 570 (POWER6, 1chip/1core, AIX 5L v5.3) 21.6 SPECint2006; IBM System p 570 (POWER6, 4 threads, 1chip/2cores, AIX 5L v5.3) 60.9 SPECint_rate2006.

    System Configuration

    Results
    Reference Date: Jun 06, 2007
    System: Sun Blade X6250
    SPEED: 16GB memory 8x2GB
    RATE : 32GB memory 8x4GB
    X6250 21.0 SPECint2006
    X6250 65.0 SPECint_rate2006
    Total Number Processors: 2 x Intel Xeon 5160
    Software: Solaris 10 11/06, Sun Studio 12 Compiler, MicroQuill's SmartHeap Library v7.4

    See Also

  • All Benchmark results on Sun Blade 6000 Blade Server

    SPARC & Opteron perfect storm for SPECjAppServer World Record

    Thursday Jan 18, 2007

    The Sun Blade 8000 & Sun Fire E6900: a perfect mix for SPECjAppServer2004 World Record Performance.

    The Sun Blade 8000 Modular Server, consisting of ten Sun Blade X8420 Server Modules as the application tier (4x Opteron 8220 DC 2.8GHz) and a Sun Fire E6900 for the Database tier (24x UltraSPARC IV+ 1.8 GHz) delivered a WORLD RECORD result of 7174.56 SPECjAppServer2004 JOPS@Standard.

    The ten Sun Blade X8420 Server Modules demonstrated 5% better performance than the best HP result of 6812.79 SPECjAppServer2004 JOPS@Standard, which used 11 rx3600 servers for the application tier and a Superdome with 32 dual-core Itanium 2 processors for the database.

    This result shows the Sun Blade 8000 Modular Server with 64% better performance than the IBM result of 4368.02 SPECjAppServer2004 JOPS@Standard, which used 20 IBM xSeries BladeCenter HS20 blades for the application servers and an IBM p5 570 for the database.

    This result shows the Sun Blade X8420 Server Module with 8% improved scaling over the Sun Blade X8400 Server Module.

    This benchmark result demonstrates that the Sun Blade X8420 and Sun Fire E6900 running the Solaris 10 Operating system can support over 43,000 concurrent users accessing J2EE applications.

    This result highlights the performance benefits of the latest BEA WebLogic Server release 9.2 on Sun Blade X8420 Server Modules.

    This benchmark used IBM DB2 8.2.6 on the Sun Fire E6900 equipped with 24 UltraSPARC IV+ to deliver this world record result.
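The percentage leads quoted above can be reproduced from the published JOPS@Standard figures. The following is a small Python sketch under that assumption; the dictionary labels and the `lead_percent` helper are our own shorthand, not names used by SPEC.

```python
# JOPS@Standard figures quoted in this post
RESULTS = {
    "Sun Blade X8420 + Sun Fire E6900": 7174.56,
    "HP rx3600 + Superdome": 6812.79,
    "Sun Blade X8400 + Sun Fire E6900": 6662.98,
    "IBM HS20 + p5 570": 4368.02,
}


def lead_percent(score, other):
    """Percent by which `score` exceeds `other` (bigger is better)."""
    return (score / other - 1.0) * 100.0


# Find the top result, then report its lead over every other entry
best = max(RESULTS, key=RESULTS.get)
for name, jops in sorted(RESULTS.items(), key=lambda kv: kv[1], reverse=True):
    if name != best:
        print(f"{best} leads {name} by {lead_percent(RESULTS[best], jops):.0f}%")
```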

    Competitive Landscape

    SPECjAppServer2004 Performance Chart (bigger is better) as of 01/17/2007

    Vendor  JOPS@Standard  J2EE Server                              DB Server

    Sun     7174.56        1 x Sun Blade 8000 (10 x X8420)          1 x Sun Fire E6900
                           80 cores, 40 chips @ 2.8 GHz AMD 8220    48 cores, 24 chips @ 1.8 GHz US-IV+
                           BEA WebLogic 9.2                         IBM DB2 8.2.6

    HP      6812.79        11 x HP rx3600                           1 x 9000 Superdome
                           44 cores, 22 chips @ 1.6 GHz Itanium 2   64 cores, 32 chips @ 1.6 GHz Itanium 2
                           Oracle OC4J 10.1.3.2                     Oracle 10g 10.2.0.2

    Sun     6662.98        1 x Sun Blade (10 x X8400)               1 x Sun Fire E6900
                           80 cores, 40 chips @ 2.6 GHz AMD 880     48 cores, 24 chips @ 1.5 GHz US-IV+
                           BEA WebLogic 9.2                         IBM DB2 8.2.5

    HP      4915.49        4 x HP rx6600                            1 x 9000 Superdome
                           32 cores, 16 chips @ 1.6 GHz Itanium 2   64 cores, 32 chips @ 1.6 GHz Itanium 2
                           BEA WebLogic 9.1                         Oracle 10g 10.2.0.2

    IBM     4368.02        20 x IBM HS20                            1 x IBM p570
                           40 cores, 40 chips @ 3.6 GHz Intel Xeon  16 cores, 8 chips @ 1.9 GHz IBM Power5
                           WebSphere 6.1                            IBM DB2 v9.1

    Sun     4098.77        7 x Sun Fire T2000                       1 x Sun Fire E6900
                           56 cores, 7 chips @ 1.2 GHz US-T1        40 cores, 20 chips @ 1.5 GHz US-IV+
                           BEA WebLogic 9.0                         Oracle 10g 10.1.0.4

    SPECjAppServer2004 Results Page

    Benchmark Description

    SPECjAppServer2004 (Java Application Server) is a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE) technology-based application servers. SPECjAppServer2004 is an end-to-end application which exercises all major J2EE technologies implemented by compliant application servers as follows:

    • The web container, including servlets and JSPs
    • The EJB container
    • EJB2.0 Container Managed Persistence
    • JMS and Message Driven Beans
    • Transaction management
    • Database connectivity

    Moreover, SPECjAppServer2004 also heavily exercises all parts of the underlying infrastructure that make up the application environment, including hardware, JVM software, database software, JDBC drivers, and the system network.

    The primary metric of the SPECjAppServer2004 benchmark is jAppServer Operations Per Second (JOPS) which is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain. There is NO price/performance metric in this benchmark.
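As a tiny illustration of how the primary metric is assembled, here is a Python sketch that simply adds the two domain metrics. The function name and the input values are placeholders chosen for illustration; they are not taken from any published result.

```python
def jops(dealer_ops_per_sec, manufacturing_ops_per_sec):
    """Primary metric: Dealer Domain metric plus Manufacturing Domain metric."""
    return dealer_ops_per_sec + manufacturing_ops_per_sec


# Placeholder inputs (NOT from a real benchmark run)
print(f"{jops(60.0, 40.0):.2f} JOPS")
```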

    Disclosure Statement:

    SPECjAppServer2004 10 Sun Blade X8420 (80 cores, 40 chips) and 1 Sun Fire E6900 (48 cores, 24 chips) 7174.56 SPECjAppServer2004 JOPS@Standard.
    SPECjAppServer2004 11 HP rx3600 (44 cores, 22 chips) and HP 9000 Superdome (64 cores, 32 chips) 6812.79 SPECjAppServer2004 JOPS@Standard.
    SPECjAppServer2004 20 IBM xSeries BladeCenter HS20 (40 cores, 40 chips) and IBM eServer p5 570 (16 cores, 8 chips) 4368.02 SPECjAppServer2004 JOPS@Standard.
    SPEC, SPECjAppServer are registered trademarks of the Standard Performance Evaluation Corporation.
    Results from http://www.spec.org as of 01/17/2007.

      Certified Results 7174.56 SPECjAppServer2004 JOPS@Standard
      Reference Date: Jan 17, 2007
      Systems: 10 x Sun Blade X8420, 32GB
      1 x Sun Fire E6900, 192GB, 4 x Sun StorageTek SE3510 FC Array
      Total Number Processors: 40, 24
      Processor/GHz of Server: AMD Opteron 8220 2.8 GHz
      UltraSPARC IV+ 1.8 GHz
      Operating System: Solaris 10 6/06
      Software: BEA WebLogic 9.2 Advantage Edition
      IBM DB2 8.2.6 Enterprise Edition
      JVM: J2SE 5.0 update 10

    bucket-o-records SPEC CPU2006 Sun Blade X8420

    Thursday Jan 11, 2007

    Sun Blade X8420 is 1.9x faster than the best Intel Woodcrest system on SPECint_rate2006 and is also 2.1x faster than the best Intel Woodcrest on SPECfp_rate2006. The Sun Blade X8420 is also 22% faster than 4-way Itanium2 dual-core on SPECfp_rate.

    Sun Blade X8420 delivered the best result with a SPECint_rate2006 score of 93.1, using the Solaris 10 and Sun Studio 11 combo. The Sun Blade X8420 also delivered the best result among all x86 systems on the SPECfp_rate2006 benchmark, with a score of 87.3.

    SPEC CPU2006 Performance Charts (bigger is better, selected recent results)

    SPECint_rate2006

    System                Processor Type         GHz   Chips  Cores  Threads  Peak  Base
    Sun Blade X8420       AMD Opteron 8220       2.8   4      8      8        93.1  80.4
    Fujitsu CELSIUS R640  Xeon 5160 (Woodcrest)  3.0   2      4      4        50.3  48.8
    Sun Ultra 40 M2       AMD Opteron 2220SE     2.8   2      4      4        48.8  41.9
    HP DL585              Opteron 854            2.8   4      4      4        46.9  41.4
    Supermicro X7DBE      Xeon 5160 (Woodcrest)  3.0   2      4      4        ---   45.2
    Sun Fire X4200        Opteron 285            2.6   2      4      4        42.8  37.8
    Fujitsu RX220         Opteron 280            2.4   2      4      4        40.0  35.7
    Sun Fire X4200        Opteron 256            3.0   2      2      2        26.4  23.1
    HP DL585              Opteron 854            2.8   2      2      2        25.2  22.3
    Dell PrecWork 380     Pentium EE             3.73  1      2      2        ---   23.1
    HP DL380 G4           Pentium 4              3.8   2      2      2        ---   20.9

    SPECfp_rate2006

    System            Processor Type                          GHz  Chips  Cores  Threads  Peak  Base
    Sun Blade X8420   AMD Opteron 8220                        2.8  4      8      8        87.3  82.5
    HP rx6600         Itanium2 dual-core                      1.6  4      8      8        71.4  69.1
    HP DL585          Opteron 854                             2.8  4      4      4        49.3  45.6
    FSC CELSIUS R640  Intel Xeon 5160 (Woodcrest), WinXP Pro  3.0  2      4      4        42.5  41.4
    Sun Fire X4200    Opteron 285                             2.6  2      4      4        38.1  36.0

    Results as of 09 Jan 2007 from www.spec.org.

    Benchmark Description

    SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

    Each suite has two different measures. The first is the CPU (speed) measure, which is the performance on the suite as a single stream. This can be either a single-threaded run or a run that the compiler has parallelized automatically. This measure is further divided into base and optimized (peak) runs: base uses the same compiler flags for all kernels, whereas optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

    The second measure is Rate, a throughput measure of how many copies of the CPU benchmarks can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. Like the CPU measure, it is reported as a base and an optimized set of results.

    Disclosure Statement:

      SPEC, SPECint, SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 1/9/07. Sun Blade X8420 (AMD Opteron 8220, 4chips/8cores, Solaris 10) 93.1 SPECint_rate2006. Sun Blade X8420 (AMD Opteron 8220, 4chips/8cores, Solaris 10) 87.3 SPECfp_rate2006.

    Results Summary

      Results
      X8420 93.1 SPECint_rate2006
      X8420 87.3 SPECfp_rate2006
      Reference Date: Jan 09, 2007
      System: Sun Blade X8420, 64GB memory
      Processors: four 2.8 GHz Opteron 8220
      Software: Solaris 10, Sun Studio 11

    World Record SPEC OMP2001 Sun Blade X8420

    Thursday Jan 11, 2007

    The Sun Blade X8420 topped the IBM System p5 550 (2.1 GHz, 4 CPU) result by 16%. Using dual-core AMD Opteron 8220 processors, the Sun Blade X8420 produced the best SPECompM2001 result of all systems running 8 threads: 23224.

    The results show that the combination of Solaris 10 using Sun Studio 11 is unmatched by the competition for assisting users in writing parallel code.

    More info on sun.com X8420 Benchmark Page.

    SPECompM2001 (bigger is better, ordered by peak metric)

    Peak   Base   Cores  Chips  Thrds  System
    23224  22531  8      4      8      Sun Blade X8420 M2, Opteron 8220, 2.8GHz, 64GB
    21167  20409  8      4      8      Sun Fire X4600 M2, Opteron 8220SE, 2.8GHz, 16GB
    20319  19708  8      8      8      Sun Fire X4600, Opteron 856, 3.0GHz
    19983  15355  4      2      8      IBM System p5 550 (2.1 GHz, 4 CPU)
    19653  18949  8      4      8      Sun Blade X8400, Opteron 885, 2.6GHz
    17948  16621  8      4      8      HP DL585, Opteron 880, 2.4GHz
    16096  14335  8      4      8      IBM 570, POWER5, 1.9 GHz

    Benchmark Description

    The SPEC OMPM2001 Benchmark Suite was released in June 2001 and tests HPC performance using OpenMP for parallelism.

    • 11 programs (3 in C and 8 in Fortran) parallelized using OpenMP API
    Goals of suite:
    • Targeted to mid-range (4-32 processor) parallel systems
    • Run rules, tools and reporting similar to SPEC CPU2000
    • Programs representative of HPC and Scientific Applications

    Disclosure Statement:

      SPEC, SPEComp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.SPEC.org as of Jan 08, 2007; Sun result submitted to SPEC. Sun Blade X8420 (8 cores, 4 chips, 8 threads), 23,224 SPECompM2001. IBM System p5 550 (4 cores, 2 chips, 8 threads), 19,983 SPECompM2001. Sockets refers to chips.

    Results Summary

      Result
      X8420 8-threads: 23224 SPECompM2001
      Reference Date: Jan 09, 2007
      System: Sun Blade X8420
      Total Memory : 64 GB DDR667 (16x4GB DIMMs)
      Processors/GHz: Four Opteron 8220, 2.8 GHz
      Operating System: Solaris 10
      Compiler: Sun Studio 11
