BM Seer Unofficial thoughts from an anonymous Sun employee

ABAQUS V6.7 Benchmarks Sun Blade X6250 Cluster World Record

Friday Jan 25, 2008

The ABAQUS "Explicit" benchmark test suite was run on a mini cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual-core Intel 5260. The Sun Blade X6250 mini cluster beats all posted results at the ABAQUS V6.7 website at up to eight cores.

  • The closest posted results from a competitor's platform were primarily from an HP XC with dual-core 3GHz 5160 processors and, to a limited degree (at the 4 "cpu" level), from an Intel Supermicro with 3GHz quad-core E5472's.
  • In runs of the six test cases in the benchmark suite, the X6250 cluster was nominally 17% faster than the best results from either the top HP or the top Intel cluster, across the four core levels considered and all 6 test cases.
  • The scalability efficiency of the X6250 cluster ranged from 100% (at 1 core) to 81% (geometric mean at 8 cores), considering all 6 test cases at each of the four core levels.
  • Four 2-socket Sun Blade X6250 blades with Infiniband interconnects were used, and runs were made at four core levels: 1, 2, 4, and 8. Comparisons are presented against the current leading competitors' results, also obtained with high-performance interconnects and posted at the ABAQUS V6.7 website. These include results from IBM, HP, and Intel platforms and clusters with current dual-core and quad-core Intel processors.
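The post does not spell out how "scalability efficiency" is computed. A minimal sketch, assuming the conventional definitions speedup S_p = T_1 / T_p and efficiency E_p = S_p / p (T_p being the elapsed time on p cores), combined across test cases with a geometric mean; the function names are illustrative only:

```python
from math import exp, log


def speedup(t1: float, tp: float) -> float:
    """Speedup on p cores: one-core elapsed time divided by p-core elapsed time."""
    return t1 / tp


def efficiency(t1: float, tp: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by core count (1.0 means ideal scaling)."""
    return speedup(t1, tp) / cores


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed through logarithms."""
    return exp(sum(log(v) for v in values) / len(values))


def mean_efficiency(times_1core: list[float], times_p: list[float], cores: int) -> float:
    """Geometric-mean efficiency over several test cases run at one core level."""
    effs = [efficiency(t1, tp, cores) for t1, tp in zip(times_1core, times_p)]
    return geometric_mean(effs)


if __name__ == "__main__":
    # Illustrative numbers only: 800 s on 1 core, 125 s on 8 cores.
    print(f"{efficiency(800, 125, 8):.0%}")  # prints 80%
```

To apply it to the results table, pass the six e1-e6 elapsed times for one system at 1 core and at p cores to `mean_efficiency`.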

    ABAQUS V6.7 "Explicit" Benchmark Test Suite, time in elapsed seconds

    Please note: this table has been modified since the original posting to correct it and to make sure only V6.7 results are shown. Sorry for the confusion; the Sun internal information sites changed after my posting.
    System            CPU                 e1     e2     e3     e4     e5     e6
    One core results
    Sun Blade X6250   3.33GHz DC 5260  23565  12399  11037   4884   4648  11975
    Sun Blade X6250   3.0GHz QC 5365   26401  14236  12302   5456   5349  13266
    Intel Supermicro  3.0GHz QC E5472  24815  13738  12504   5273   5299  13456
    HP XC             3.0GHz DC 5160   23957  13659  11289   5157   5122  12601
    Bull R440         3.0GHz DC 5160   25132  14086  12237   5352   5231  13213
    Two core results
    Sun Blade X6250   3.33GHz DC 5260  12008   6465   5218   2647   2447   6739
    Sun Blade X6250   3.0GHz QC 5365   14262   7501   6379   2959   2742   7486
    Intel Supermicro  3.0GHz QC E5472  14060   7151   6341   2900   2693   7880
    HP XC             3.0GHz DC 5160   13229   6998   6201   2838   2657   7336
    Bull R440         3.0GHz DC 5160   13859   7283   6575   2997   2756   7752
    Four core results
    Sun Blade X6250   3.33GHz DC 5260   7868   3888   3064   1482   1328   4025
    Sun Blade X6250   3.0GHz QC 5365    8595   4195   3372   1577   1440   4375
    Intel Supermicro  3.0GHz QC E5472   8264   3857   3438   1616   1440   4534
    HP XC             3.0GHz DC 5160    9843   4434   4413   1856   1619   5235
    Bull R440         3.0GHz DC 5160   10067   4559   4485   1964   1651   5378
    Eight core results
    Sun Blade X6250   3.33GHz DC 5260   5209   2439   1922    979    736   2510
    Sun Blade X6250   3.0GHz QC 5365    5650   2556   2158   1090    824   2774
    Intel Supermicro  3.0GHz QC E5472   6077   2473   2529   1205    910   3339
    HP XC             3.0GHz DC 5160    5140   2311   2280   1074    823   2948
    Bull R440         3.0GHz DC 5160    5366   2406   2303   1127    860   3092

    About The ABAQUS Explicit Module

    This module, designed for crash and high-velocity impact analyses, is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results are best obtained by using multiple processing units for these typically large jobs, either on a single multi-core server in SMP mode or on a multi-node cluster of interconnected multi-core platforms in DMP mode.

    • The test cases in the ABAQUS "Explicit" benchmark test suite do not require much memory (all are around a few hundred megabytes).
    • The ABAQUS test cases scale very well up to 16 cores. All of the solvers in the Explicit module work in DMP mode on clusters. The default MPI implementation for ABAQUS is HP-MPI.
    • Based on the maximum physical memory of a platform, the user can stipulate the maximum portion of this memory that can be allocated to the ABAQUS job. This is done in the "abaqus_v6.env" file, which resides either in the directory from which the job was launched or in the ABAQUS "site" subdirectory under the installation directory.
    • The test cases for the ABAQUS benchmark test suites all have a substantial I/O component, primarily associated with temporary scratch files. Performance will be enhanced by using the fastest available drives and striping several of them together, or by using a high-performance disk storage system with high-performance interconnects.

    System Configuration

  • 4 Sun Blade X6250
  • 3.33 GHz dual-core Intel 5260
  • 2 internal striped 15K SAS drives (cluster shared file system)
  • Infiniband (Voltaire) interconnects
  • 64-bit SUSE Linux Enterprise Server SLES 10
  • Voltaire OFED GridStack-4.1.5_7-sles-k2.6.16.21-0.8-smp-x86_64
  • HP-MPI
  • ABAQUS V6.7 Explicit Module
  • ABAQUS 6.7 Explicit Benchmark Test Suite
  • Disclosure Statement:

    The following are trademarks or registered trademarks of Dassault Systemes or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systemes. Results from http://www.simulia.com/support/v67/v67_performance.html as of Jan. 18, 2008.


    CFD World Record Fluent Sun Blade X6250 Cluster (Xeon 3 GHz 5160)

    Thursday Jul 12, 2007

    The Sun Blade X6250 cluster was up to 27% faster, and 6% faster on the geometric mean, than an SGI Altix XE210 cluster with the same 3 GHz dual-core Xeon 5160 (Woodcrest) processors and Infiniband interconnects.

    A cluster of four Sun Blade X6250 blades (Xeon 3 GHz 5160) with Infiniband interconnects was used to set this record. Each two-socket blade had two dual-core Intel Xeon EM64T 5160 3 GHz (Woodcrest) processors, for 16 cores in total.

    Running the "Fluent 6" standard benchmark for the computational fluid dynamics (CFD) program Fluent, the Sun Blade X6250 (Xeon 3 GHz 5160) cluster established a world record for runs of the test suite using from 1 to 16 cores.

    Workload description

    Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes. It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines (aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow in some manner.

    Fluent, like many major ISVs, has developed a benchmark test suite to evaluate platform performance. For several years, results from hardware vendors' platforms have been posted at the Fluent website.

    CFD models tend to be extremely large (fluid flow over entire car, aircraft, and submarine bodies, and complex flows involving species mixing and chemical reactions). To keep analysis run times reasonable, many processing units are necessary. Currently the most effective way of achieving this is an interconnected cluster of multi-core rack-mounted servers or blades. The current set of entries posted at the Fluent website reflects this fact.

    FLUENT 6 Benchmark ("Ratings", bigger is better)

    Rating = number of sequential runs that fit in one day = 86,400 / (total elapsed run time in seconds)

    Machine                       Sockets  NCPUS     FL5M1     FL5M2     FL5M3     FL5L1     FL5L2     FL5L3
    Sun Blade X6250 3GHz WC 5160        2      8    4965.5   10504.6    2563.8    1399.2    1028.3     174.9
    SGI Altix XE210 3GHz WC 5160        2      8    4937.1    9626.7    2014.0    1343.7     899.5     161.0

    Sun Blade X6250 3GHz WC 5160        2      4    2780.4    5358.1    1336.9     731.7     573.7     101.2
    SGI Altix XE210 3GHz WC 5160        2      4    2681.1    4657.7     998.0     679.2     449.7      80.7

    Sun Blade X6250 3GHz WC 5160        2 serial     919.4    1465.6     352.9     207.2     142.6      27.6
    SGI Altix XE210 3GHz WC 5160        2 serial     910.9    1445.4     349.5     204.1     136.6      26.8
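Since the rating is simply 86,400 divided by the elapsed time, a posted rating converts directly back to seconds per run. A minimal sketch (the function names are illustrative, not part of any Fluent tool):

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds: float) -> float:
    """Fluent-style rating: number of back-to-back runs that fit in one day."""
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(r: float) -> float:
    """Invert a posted rating to recover the elapsed time per run, in seconds."""
    return SECONDS_PER_DAY / r


if __name__ == "__main__":
    # FL5L3 on the 8-core Sun Blade X6250 cluster is rated 174.9 in the table above.
    print(f"FL5L3, 8 cores: about {elapsed_from_rating(174.9):.0f} s per run")
    # A job that takes exactly one hour rates 24.
    print(rating(3600))
```

The inversion is handy when comparing against results that are published as elapsed times rather than ratings.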
    Other interesting points:

    • The "Fluent 6" standard benchmark test suite consists of "small", "medium", and "large" test cases. However, both the small and medium test cases are really on the small side and do not scale well beyond 16 cores.
    • The largest test case in the suite, "fl5l3", requires 9 GB of memory when run on a single core of one node. This per-node memory requirement is reduced when running in DMP cluster mode on multiple nodes with multiple cores.
    • Fluent runs are CPU-intensive and sometimes memory-intensive, but do not require high-performance I/O file systems.
    • Very recently, Fluent has developed a new benchmark test suite with extremely large models, specifically intended to be run either on large multi-core servers or on large multi-node clusters of multi-core platforms.

    Workload Details

    Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, etc. The test cases represent a range of typical industry simulations.

    
    Descriptions
    Class   Benchmark     Cells  Mesh                  Models               Solver               Description
    small   FL5S1        32,000  hexahedral            ke                   segregated implicit  turbulent flow in a bend
            FL5S2        32,000  hexahedral            ke                   coupled implicit     turbulent flow in a bend
            FL5S3        89,856  hexahedral            ke                   coupled implicit     flow in a compressor, rotor 37
    medium  FL5M1       155,188  tetrahedral           ke 6spe reac DPM P1  segregated implicit  coal combustion in a boiler, with particle tracking
            FL5M2       242,782  hybrid, hanging-node  ke                   segregated implicit  turbulent flow in an engine valveport
            FL5M3       352,800  hexahedral            ke 6spe reac         segregated implicit  combustion in a high velocity burner
    large   FL5L1       847,746  hexahedral            ke                   coupled explicit     transonic flow around a fighter
            FL5L2     3,618,080  hybrid                RNG ke               segregated implicit  external aerodynamics around a car body
            FL5L3     9,792,512  hexahedral            RSM                  segregated implicit  turbulent flow in a transition duct
    
    
    Small Class Ratings
    
    Small class problems contain fewer than 100,000 cells.
    
    FL5S1 - Accelerating turbulent flow in an elbow duct using segregated implicit solver
    
    Flow is accelerated through a 90 degree elbow duct with a rectangular
    cross section. The geometry and flow have a symmetry plane permitting
    the modeling of only half the domain. Because of the curvature of the
    duct, significant secondary flow occurs, with velocity components
    normal to the principal flow direction. The segregated implicit solver
    in FLUENT 5 is used to solve this flow.
    
    Number of cells 32,000
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5S2 - Accelerating turbulent flow in an elbow duct using coupled implicit solver
    
    Flow is accelerated through a 90 degree elbow duct with a rectangular
    cross section. The geometry and flow have a symmetry plane permitting
    the modeling of only half the domain. Because of the curvature of the
    duct, significant secondary flow occurs, with velocity components
    normal to the principal flow direction. The coupled implicit solver in
    FLUENT 5 is used to solve this flow.
    
    Number of cells 32,000
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver coupled implicit
    
    FL5S3 - Transonic flow in rotating fan
    Transonic Flow through a Rotor
    
    The flow through a transonic fan rotor (designated rotor 37 by NASA
    Lewis) was computed. It has 36 blades. The calculation was performed at
    a rotational speed of 17189 rpm. The domain boundaries consist of a
    hub, blade and shroud surface, a pressure inlet and outlet surface, and
    periodic surfaces.
    
    Number of cells 89,856
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver coupled implicit
    
    
    Medium Class
    
    Medium class problems contain between 100,000 and 500,000 cells.
    
    FL5M1 - Coal combustion in a boiler
    
    This application couples a continuous gas phase calculation with a
    discrete phase (particle) calculation. 500 coal particles are injected
    into an industrial boiler where their trajectories are computed using a
    Lagrangian formulation that includes dispersed phase inertia,
    hydrodynamic drag and the force of gravity. Each particle injection is
    subject to heating/cooling, vaporization, boiling and solid combustion.
    During the injection calculations, momentum, heat and mass exchanges
    are calculated and stored as source terms which are then used in the
    subsequent gas phase calculation. Furthermore, stochastic modeling of
    particle tracks, requiring a fixed number of "tries" per particle, is
    used to account for local turbulent fluctuations. In this calculation,
    10 stochastic tries per particle are used, resulting in a total of 5000
    particle tracks per discrete phase update. There are 10 continuous
    phase iterations per discrete phase update.
    
    Number of cells 155,188
    Cell type tetrahedral
    Models k-epsilon turbulence, 6 species with reaction, dispersed phase,
    P1 radiation
    Solver segregated implicit
    
    FL5M2 - Turbulent flow in an engine valveport
    
    Flow is computed in an automotive valve port modeled using a zonal
    hybrid mesh. The region around the valve has been meshed with
    tetrahedral cells, while the duct providing the inlet flow to the valve
    has been meshed with hexahedra. Pyramid cells are used to transition
    between the hexahedral and tetrahedral cells. A fourth cell type called
    a prismatic (or wedge) cell is used for the cylinder downstream of the
    valve. Furthermore, hanging-node adaption was used to improve the
    accuracy of the predicted flow field.
    
    Number of cells 242,782
    Cell type hybrid hanging-node adaption
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5M3 - Combustion in a high velocity burner
    
    Fuel (CH4) is injected into ports of a high velocity gas burner located
    near the centerline. Air is supplied through the outer ports, with
    secondary air delivered into an outer annular region. Directly
    downstream of the annulus is a wedge-shaped annular baffle. The mixing
    of fuel and air occurs downstream of this baffle and recirculation
    zones behind the baffle provide stability and an attachment point for
    the flame in the main combustion chamber. Combustion is assumed to
    proceed via a two-step reaction mechanism, with turbulent mixing as the
    limiting rate, as described by the Magnussen model.
    
    Reference: M. Cavelli, A. Milani, "Spark-ignited wide stability gas
    burner for on/off and continuous duty," IFRF HT Meeting, Milan, October
    1996.
    
    Number of cells 352,800
    Cell type hexahedral
    Models k-epsilon turbulence, 6 species with reaction
    Solver segregated implicit
    
    Large Class
    
    Large class problems contain more than 500,000 cells.
    
    FL5L1 Transonic flow around a fighter aircraft
    
    Flow around the AGARD M-151 combat aircraft research model is computed.
    The simulation geometry contains canards and forward swept wings, but
    no tail. The conditions modeled were Mach number 0.9 and 10.46 degrees
    angle of attack.
    
    Number of cells 847,764
    Cell type hexahedral
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5L2 Exterior flow around a passenger sedan
    
    This benchmark represents the computation of the exterior flow field
    around a simplified model of a passenger sedan. The simulation geometry
    was used for the Japan External Aerodynamics competition. A
    viscous-hybrid grid with prismatic cells is used to adequately model
    the boundary layer regions.
    
    Number of cells 3,618,080
    Cell type hybrid
    Models k-epsilon turbulence
    Solver segregated implicit
    
    FL5L3 Turbulent flow through a transition duct
    
    Turbulent flow of air through a duct is computed for this benchmark.
    The cross-sectional planes of the duct transition from a circle at the
    inlet to a rectangle at the outflow boundary. The Reynolds-Stress Model
    (7 equation) is used for computing turbulence.
    
    Number of cells 9,792,512
    Cell type hexahedral
    Models RSM turbulence
    Solver segregated implicit
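The three classes are defined purely by mesh cell count. A one-function sketch of that rule, checked against the cell counts listed in the descriptions above; how a case with exactly 100,000 or 500,000 cells would be classed is not stated, so the boundary handling here is an assumption:

```python
def size_class(cells: int) -> str:
    """Map a Fluent 6 benchmark case to its size class by mesh cell count."""
    if cells < 100_000:
        return "small"
    if cells <= 500_000:  # assumption: exactly 500,000 counts as medium
        return "medium"
    return "large"


# Cell counts from the benchmark descriptions above.
CASES = {
    "FL5S1": 32_000, "FL5S2": 32_000, "FL5S3": 89_856,
    "FL5M1": 155_188, "FL5M2": 242_782, "FL5M3": 352_800,
    "FL5L1": 847_764, "FL5L2": 3_618_080, "FL5L3": 9_792_512,
}

if __name__ == "__main__":
    for name, cells in CASES.items():
        print(f"{name:6} {cells:>10,} {size_class(cells)}")
```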
    

    The Sun Blade X6250 cluster outperformed the following competitive hardware vendor clusters at all core levels considered (1-core SMP, 1-core parallel, and 2-, 4-, 8-, and 16-core parallel runs) and for all 9 test cases in the benchmark test suite:

      HP BL460C (EM64T_WOODCREST_2CORE,3000,WINCCS,IB_HPMPI)
      HP DL140 (EM64T_WOODCREST_2CORE,3000,LINUX,IB)
      HP DL145_G2 (OPTERON_2CORE,2200,WINCCS,IB_HPMPI)
      SGI ALTIX4700 (IA64_MONTECITO_2CORE,1600,LINUX)
      SGI ALTIXXE210 (EM64T_WOODCREST_2CORE,3000,LINUX,IB_VOLTAIRE)
      TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,SLES10,GIGE)
      TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,WINCCS,GIGE)
      BULL NOVASCALE (EM64T_WOODCREST_2CORE,3000,RHEL4,IB)
      APPRO XTREMESERVER (OPTERON_2CORE,2800,RHEL4,IB)

    Disclosure Statement:

    All information on the Fluent website is Copyrighted 1995-2007 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm as of July 2, 2007.

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest)
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

    Software Configuration:

      64-bit SUSE SLES 10
      Fluent V6.3.26
      Fluent 6 Standard Benchmark Test Suite
      Voltaire GridStack 4.1.5-7 for SLES 10


    World Record ABAQUS V6.6 on the Sun Blade X6250 Cluster

    Wednesday Jul 11, 2007

    The Sun Blade X6250, with 3GHz dual-core Xeon 5160 processors, posted a world record on the ABAQUS Explicit benchmark test suite for the MCAE application ABAQUS V6.6. On the individual test cases, Sun beats the Intel Supermicro by 1% to 39%!! Even when you average over all of the test cases, the Sun Blade X6250 beats the Intel Supermicro by 4% to 9% (geometric mean of all 6 test cases at each "cpu" level listed).

    Both machines have 2 sockets and dual-core processors. Runs were made at 1, 2, and 4 cores, and a geometric mean was established at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.

    The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ABAQUS website and for all 6 test cases in the ABAQUS "Explicit" benchmark test suite and at the 3 "cpu" levels (1-, 2- & 4-"cpu's"):

    About The ABAQUS Explicit Module

    This module, designed for crash and high-velocity impact analyses (including wave propagation and inertia effects), is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results are best obtained by using multiple processing units for these typically large jobs, either on a single multi-core server in SMP mode or on a multi-node cluster of interconnected multi-core platforms in DMP mode.

    Consequently, this module is meant to run primarily on multiple CPUs, either in SMP mode on a single large multi-core machine or in DMP mode over a cluster of machines.

    ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape (time in seconds, smaller is better; Sun % faster, bigger is better)

    Platform                Cores     e1     e2     e3     e4     e5     e6  Geometric Mean

    Sun Blade X6250/5160        4  10451   4509   3853   1887   1990   5202
    Intel Super/5160's/RH4      4  10696   4646   3881   1997   2126   5460
    Sun % Faster                      2%     3%     1%     6%     7%     5%              4%

    Sun Blade X6250/5160        2  14232   7401   5477   2935   3327   7582
    Intel Super/5160's/RH4      2  14878   8044   6316   3310   3483   8048
    Sun % Faster                      5%     9%    15%    13%     5%     6%              9%

    Sun Blade X6250/5160        1  24800  14198  10174   5147   6112   9553
    Intel Super/5160            1  25076  14616  10563   5225   6272  13242
    Sun % Faster                      1%     3%     4%     1%     3%    39%              8%
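The 4-core and 2-core "Sun % Faster" rows can be reproduced by taking the competitor-to-Sun elapsed-time ratio minus one, rounded to a whole percent, and the last column by the geometric mean of those ratios. A sketch under that assumption (the post does not state its exact formula; function names are illustrative):

```python
from math import exp, log


def pct_faster(t_sun: float, t_other: float) -> float:
    """Percent by which Sun is faster: other/Sun elapsed-time ratio minus one."""
    return (t_other / t_sun - 1.0) * 100.0


def geomean_pct_faster(sun: list[float], other: list[float]) -> float:
    """Geometric mean of the per-case time ratios, expressed as percent faster."""
    logs = [log(o / s) for s, o in zip(sun, other)]
    return (exp(sum(logs) / len(logs)) - 1.0) * 100.0


# 4-core elapsed times (seconds) for e1..e6, taken from the table above.
SUN_4 = [10451, 4509, 3853, 1887, 1990, 5202]
INTEL_4 = [10696, 4646, 3881, 1997, 2126, 5460]

if __name__ == "__main__":
    row = [round(pct_faster(s, o)) for s, o in zip(SUN_4, INTEL_4)]
    print(row, round(geomean_pct_faster(SUN_4, INTEL_4)))
```

Using a geometric mean of ratios keeps the long-running e1 case from dominating the summary figure the way an arithmetic mean of raw times would.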

    Abaqus/Explicit Benchmark Problems

    The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications, including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The numbers of increments listed below are approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.

      E1: Car crash
      This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain, and the car is given an initial velocity of 25 mph.

      E1
      Increments: 62,934
      Number of elements: 274,632

      E2: Cell phone drop
      This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types, including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.

      E2
      Increments: 87,369
      Number of elements: 45,785
      Memory requirement: 300 MB

      E3: Sheet forming
      This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.

      E3
      Increments: 31,177
      Number of elements: 34,540 (deformable only)
      Memory requirement: 550 MB

      E4: Projectile penetration
      This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.

      E4
      Increments: 12,433
      Number of elements: 237,100
      Memory requirement: 1400 MB

      E5: Blast loaded plate
      This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.

      E5
      Increments: 81,716
      Number of elements: 50,000
      Memory requirement: 150 MB

      E6: Concentric spheres
      This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.

      E6
      Increments: 23,291
      Number of elements: 244,124
      Memory requirement: 1000 MB

      ABAQUS "Standard" & "Explicit" Benchmark Test Suites
      Voltaire GridStack 4.1.5-7 for SLES 10

    Disclosure Statement:

    The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systemes. Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      4 2-socket Sun Blade X6250's
      2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
      Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
    Software Configuration:

      Linux: 64-bit SUSE SLES 10
      ABAQUS V6.6-3


    World Record ANSYS on Sun Blade X6250 (Xeon 3GHz DC 5160)

    Tuesday Jul 10, 2007

    The Sun Blade X6250 outperforms all posted ANSYS V11.0 (MCAE) results at the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000XAL (with the same 3GHz Xeon 5160) by as much as 40%, and Sun leads at all three "cpu" levels tested (1, 2, and all 4 cores available on both 2-socket platforms equipped with dual-core processors). At each of these processor configurations, Sun wins 6 of the 7 cases in the benchmark test suite. Overall, on the geometric mean, Sun was 10% better.

    The only case where the Sun X6250 loses, "bm-2", has an exceptionally high I/O component, and even there Sun was only 3-4% slower. The Sun X6250 had 10K rpm internal disk drives, while the Intel S5000XAL had 15K rpm drives.

    The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ANSYS website for all 7 test cases in the ANSYS "Standard" benchmark test suite (1-, 2- & 4-cpu).

    Yes, this result was run on Linux; Sun wants to show that we can win with every OS. There is now an officially certified, supported, and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers. This is the first SX64 version that has become available.

    Competitive Landscape

    ANSYS V11.0 "Standard" Benchmark Test Suite on Sun Blade X6250 and Intel S5000XAL (run times in seconds, smaller is better; for %, bigger is better)

    System                Cores   bm-1   bm-2   bm-3   bm-4   bm-5   bm-6   bm-7

    Sun X6250/5160            4    100   1362    343    164    181    131    752
    Intel S5000XAL/5160       4    109   1312    369    169    187    161   1048
    Sun % better                    9%    -4%     8%     3%     3%    23%    39%

    Sun X6250/5160            2    118   1398    385    183    223    169   1064
    Intel S5000XAL/5160       2    128   1356    417    186    244    211   1437
    Sun % better                    9%    -3%     8%     2%     9%    25%    35%

    Sun X6250/5160            1    150   1455    456    211    339    253   1770
    Intel S5000XAL/5160       1    164   1416    489    215    340    314   2330
    Sun % better                    9%    -3%     7%     2%     1%    24%    32%

      (Please note: per-core performance isn't the right metric for comparing different CPUs, as system costs vary greatly; core counts are used here only to identify the configuration. It is "SYSTEM" performance, not "core" performance, that matters!)

    Key Technical Points

    • The test cases from the ANSYS standard benchmark test suite all have a substantial I/O component: 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping several of them together, or by using a high-performance disk storage system with high-performance interconnects. When running the SX64 build, employing ZFS might be a good idea.
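A simple way to bound what faster disks can buy is Amdahl's law applied to the I/O share of the run. A sketch, assuming the non-I/O portion is unaffected and the I/O portion speeds up uniformly; the 2x storage speedup below is hypothetical, not a measured figure:

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl-style estimate: only the I/O share of the run gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # 15-20% I/O share as quoted above; hypothetical 2x faster scratch storage.
    for f in (0.15, 0.20):
        print(f"I/O share {f:.0%}: about {overall_speedup(f, 2.0):.2f}x overall")
```

Even an infinitely fast disk caps the gain at 1/(1 - f), i.e. 1.25x for a 20% I/O share, which is why the CPU and memory subsystem still dominate these results.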

    ANSYS 11.0 Standard Test Cases

      bm-1
      Name:Exhaust Elbow Manifold
      Description:Static structural analysis. Solved for equivalent stresses.
      Statistics:~850,000 DOF Model

      bm-2
      Name:Floor Panel
      Description:Surface body geometry. Harmonic analysis with mode superposition.
      Statistics:~765,000 DOF Model

      bm-3
      Name:Engine Assembly - Piston and Crank
      Description:Assembly with contact. Nonlinear structural DOF solution.
      Statistics:~250,000 DOF Model

      bm-4
      Name:Electric Motor
      Description:Electromagnetic analysis. Solved for magnetic field intensities.
      Statistics:~250,000 DOF Model

      bm-5
      Name:Brake Rotor
      Description:Thermal transient analysis. Solved for temperature DOFs.
      Statistics:~230,000 DOF Model

      bm-6
      Name:Wing Section
      Description:Static structural analysis.
      Statistics:~250,000 DOF Model

      bm-7
      Name:Wing Section
      Description:Static structural analysis.
      Statistics:~800,000 DOF Model
      Notes: bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 Gbytes of memory. The model used for these runs uses Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 Gbyte of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less than optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

    Disclosure Statement:

    The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics(TM). All information on the ANSYS website is Copyrighted 2007 by ANSYS, Inc. Results at http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.

    Hardware Configuration:

    Sun Blade X6250

      Four 2-socket Sun Blade X6250 blades
      2 x 3.0 GHz dual-core Intel Xeon EM64T 5160 (Woodcrest) processors per blade
      32 GB memory
    Software Configuration:
      64-bit SUSE Linux Enterprise Server (SLES) 10
      (Note: Sun works great with Linux; that is why we show all kinds of benchmarks!)
      ANSYS V11.0
      ANSYS 11 "Standard" Benchmark Test Suite


    Sun Blade X6250 & Sun Studio 12 x86 World Record

    Wednesday Jun 13, 2007

    The Sun Blade X6250 delivers a pair of x86 SPEC CPU2006 integer performance world records:

    The Sun Blade X6250 (dual-core Intel Xeon 5160), running Solaris 10 and compiled with Sun Studio 12, delivered the best x86 result for the SPECint2006 benchmark.

    The Sun Blade X6250 (dual-core Intel Xeon 5160), running Solaris 10 and Sun Studio 12, delivered the x86 4-core world record on SPECint_rate2006.

    The Sun Blade X6250 achieved a SPECint2006 result of 21.0 and a SPECint_rate2006 result of 65.0. The advanced features of the freely available Sun Studio 12 compiler were critical to reaching this level of performance on the Sun Blade X6250.

    The Sun Blade X6250 is only 3% slower than the peak SPECint2006 score of the very expensive, recently announced IBM POWER6 p570. SPECint2006 is a single job stream, so let's now turn to comparing 4-thread results: here the Sun Blade X6250 is 7% faster than the peak SPECint_rate2006 score of the IBM POWER6 p570 (both IBM and Sun at 4 threads). Oh, and remember that clock rate is no longer how you compare systems: the Sun Blade X6250 runs at 3 GHz and the IBM POWER6 at 4.7 GHz. CPU frequency is basically irrelevant; it is CPU and system architecture that matter!
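    As a quick check of the percentages above, the sketch below recomputes them from the published scores listed in the tables. It is a minimal illustration using only those numbers; the helper name pct_diff is our own, and the "per GHz" figure is simply score divided by clock rate, included only to make the point about frequency concrete.

```python
# Published scores quoted in this post (SPEC results as of June 2007).
x6250_speed, p570_speed = 21.0, 21.6  # SPECint2006 (peak)
x6250_rate, p570_rate = 65.0, 60.9    # SPECint_rate2006 (peak), 4 copies each
x6250_ghz, p570_ghz = 3.0, 4.7        # processor clock rates in GHz


def pct_diff(a: float, b: float) -> float:
    """Percentage by which score a differs from score b (positive means a is faster)."""
    return (a / b - 1.0) * 100.0


speed_gap = pct_diff(x6250_speed, p570_speed)  # X6250 vs p570, single stream
rate_gap = pct_diff(x6250_rate, p570_rate)     # X6250 vs p570, 4 copies

# Score per GHz: a crude illustration that clock rate alone does not predict the result.
x6250_per_ghz = x6250_speed / x6250_ghz
p570_per_ghz = p570_speed / p570_ghz

print(f"SPECint2006:      X6250 vs p570 {speed_gap:+.1f}%")
print(f"SPECint_rate2006: X6250 vs p570 {rate_gap:+.1f}%")
print(f"SPECint2006 per GHz: X6250 {x6250_per_ghz:.2f}, p570 {p570_per_ghz:.2f}")
```

    Running it gives about -2.8% and +6.7%, which the post rounds to 3% slower and 7% faster.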

    SPEC CPU2006 Landscape - bigger is better, selected recent results

    SPECint2006

    System | Processor | GHz | Chips | Cores | Peak | Base
    IBM p570 | POWER6 | 4.7 | 1 | 1 | 21.6 | 17.8
    Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 21.0 | -
    Supermicro X7DB8+ board | Intel Xeon 5160 | 3.0 | 2 | 4 | 20.8 | 18.9
    Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 16.1 | -

    SPECint_rate2006

    System | Processor | GHz | Chips | Cores | Threads/Copies | Peak | Base
    Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 65.0 | -
    Supermicro X7DB8+ | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 64.9 | 60.0
    IBM p570 | POWER6 | 4.7 | 1 | 2 | 4 | 60.9 | 53.2
    Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 4 | 60.4 | -
    Fujitsu BX620 S3 | Intel Xeon 5160 (Woodcrest) | 3.0 | 2 | 4 | 4 | 59.4 | 56.7

    Results as of 06 Jun 2007 from www.spec.org.

    Benchmark Description

    SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

    Each suite has two different measures. The first is the speed (CPU) measure, which is the performance on the suite run as a single stream. This can be either a single thread or an automatically parallelized (compiler-generated) run. This measure is further split into base and optimized (peak) runs: base uses the same compiler flags for all benchmarks, whereas optimized is allowed to use different compiler flags for each benchmark. Results are normalized against a reference system standardized by SPEC.
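    To make "normalized against a reference system" concrete: in SPEC CPU2006 each benchmark's ratio is the reference machine's run time divided by the measured run time, and the overall speed metric is the geometric mean of those ratios. The sketch below shows only that arithmetic; the run times are made-up round numbers, not real SPEC reference times, and it ignores SPEC's run rules (for example, how repeated runs are combined). Base and peak results use the same scoring; they differ only in how the benchmarks were compiled.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Per-benchmark ratio: how many times faster than the SPEC reference machine."""
    return reference_seconds / measured_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed via logs to avoid overflow on long products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) times in seconds for three benchmarks.
times = [(1000.0, 100.0), (2000.0, 100.0), (4000.0, 100.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in times]  # 10.0, 20.0, 40.0
score = geometric_mean(ratios)  # cube root of 10 * 20 * 40 = 8000, i.e. about 20.0

print(ratios, round(score, 2))
```

    The geometric mean keeps any single benchmark from dominating: doubling one of the three ratios raises the overall score by the same factor (the cube root of 2) no matter which benchmark it is.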

    The second measure is Rate, a throughput measure: it shows how many copies of the single-stream workload the system can run at once. Typically, it is run as n copies on n processors, and it shows how well the same job mix runs on a system under load. Like the speed measure, it is reported as both base and optimized (peak) results.
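    For the rate metric, the per-benchmark ratio scales with the number of copies: in SPEC CPU2006 it is the number of copies times the reference time, divided by the elapsed time from the start of the first copy to the end of the last. The overall rate score is again a geometric mean. The sketch below illustrates that arithmetic only; the times are hypothetical and chosen to keep the numbers easy to follow, while 4 copies matches the 4-copy results in the SPECint_rate2006 table.

```python
import math


def rate_ratio(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Throughput ratio for one benchmark run as `copies` concurrent instances."""
    return copies * reference_seconds / elapsed_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed via logs to avoid overflow on long products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


copies = 4  # e.g. one copy per core on a 2-chip, 4-core system
# Hypothetical (reference, elapsed) times in seconds; elapsed covers all copies.
runs = [(1000.0, 250.0), (2000.0, 250.0)]
ratios = [rate_ratio(copies, ref, elapsed) for ref, elapsed in runs]  # 16.0, 32.0
score = geometric_mean(ratios)  # sqrt(16 * 32), about 22.63

print(ratios, round(score, 2))
```

    If all 4 copies finished in the time a single copy takes when run alone, the rate ratio would be exactly 4 times the single-copy speed ratio; contention for shared resources such as memory bandwidth and caches is what pulls real rate results below that ideal.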

    Disclosure Statement:

    SPEC, SPECint, and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org or from IBM public websites as of 6/06/07. Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 65.0 SPECint_rate2006; Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 21.0 SPECint2006; IBM System p 570 (POWER6, 1 chip/1 core, AIX 5L v5.3) 21.6 SPECint2006; IBM System p 570 (POWER6, 4 threads, 1 chip/2 cores, AIX 5L v5.3) 60.9 SPECint_rate2006.

    System Configuration

    Reference date: Jun 06, 2007
    System: Sun Blade X6250
    Processors: 2 x Intel Xeon 5160
    Memory: 16 GB (8 x 2 GB) for the speed (SPECint2006) run; 32 GB (8 x 4 GB) for the rate (SPECint_rate2006) run
    Results: 21.0 SPECint2006; 65.0 SPECint_rate2006
    Software: Solaris 10 11/06, Sun Studio 12 compiler, MicroQuill's SmartHeap Library v7.4

    See Also

  • All Benchmark results on Sun Blade 6000 Blade Server