BM Seer: Unofficial thoughts from an anonymous Sun employee

ANSYS using Sun Storage 7410 with the Sun Blade X6250

Monday Nov 10, 2008

The ANSYS V11.0 Standard Benchmarks (ANSYS is an HPC MCAE application), run on the Sun Blade X6250 with the Sun Storage 7410, show good performance relative to the Sun Fire X4540 Storage Server and to Direct Attached Storage (DAS).

Note: I know that many customers who use ANSYS use NAS storage. Because NAS is so easy to use, I don't think many of them are weighing the tradeoffs between NAS and DAS. Regardless, this is an interesting test.

Finite element applications and many other HPC applications have a substantial I/O component. The Sun Storage 7410 Unified Storage System is well-suited for many customers that want to use NAS storage for HPC MCAE applications such as ANSYS. In this test the performance of the Sun Blade X6250 was measured with both the Sun Storage 7410 Unified Storage System and the Sun Fire X4540 Storage Server.

  • The Sun Storage 7410 system, connected via a 1GbE/10GbE switch over NFS, performed comparably to a Sun Fire X4540 Storage Server also connected via a 1GbE/10GbE switch over NFS.
  • In certain intentionally I/O-intensive test cases, e.g. ANSYS "bm-2", the Sun Storage 7410 system was 7% faster than the Sun Fire X4540 Storage Server system.
  • When running 10 sequences of the ANSYS "BM" test suite concurrently on 10 Sun Blade X6250s in a cluster, all sharing the same Sun Storage 7410, the runtime of any one sequence increased by nominally less than 33% as a result of all nodes sharing the same I/O system.
  • When running the ANSYS "BM" test suite on a single Sun Blade X6250 node with the Sun Storage 7410 connected via a 10GbE switch as the shared file system, the total runtime for the entire benchmark sequence was 8% shorter than with a Direct Attached Storage (DAS) system (4 striped 15K rpm SAS drives) running internally on the same X6250 compute node. Overall, the Sun Storage 7410 outperformed the DAS configuration on 23 (72%) of the 32 test cases.
  • For the most I/O-intensive test case, "bm-7" (see the description below), the Sun Storage 7410 connected via a 10GbE switch as the shared file system was 14% to 32% faster than the Direct Attached Storage (DAS) system (4 striped 15K rpm SAS drives) running internally on the same X6250 compute node.
  • Tests were run using the ANSYS 11 "BM" Test Suite in SMP single-node mode.
  • These results demonstrate the advantages of the fully integrated hardware and software of the Sun Unified Storage Systems.
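A quick way to check the bm-2 and bm-7 percentages in the bullets above is to recompute them from the wall-clock times in the results table. The Python sketch below assumes that "X% faster" means (slower time / faster time - 1) x 100; the post does not define the metric, and the helper name `percent_faster` is hypothetical.

```python
# Sketch: recompute the bm-2 and bm-7 "percent faster" claims from the
# wall-clock times (seconds) in the results table.
# Assumption: "X% faster" = (slower time / faster time - 1) * 100.


def percent_faster(slower_sec: float, faster_sec: float) -> float:
    """Return how much faster, in percent, the quicker run is."""
    return (slower_sec / faster_sec - 1.0) * 100.0


# bm-2, single X6250 node: Sun Fire X4540 vs Sun Storage 7410, keyed by cores.
bm2_x4540 = {1: 1334, 4: 1240}
bm2_ss7410 = {1: 1246, 4: 1159}

# bm-7, single X6250 node: DAS vs Sun Storage 7410, keyed by cores.
bm7_das = {1: 2466, 2: 1621, 4: 1213, 8: 1227}
bm7_ss7410 = {1: 2079, 2: 1410, 4: 1060, 8: 925}

bm2 = {c: percent_faster(bm2_x4540[c], bm2_ss7410[c]) for c in bm2_x4540}
bm7 = {c: percent_faster(bm7_das[c], bm7_ss7410[c]) for c in bm7_das}

for cores, pct in bm2.items():
    print(f"bm-2, {cores} core(s): Sun Storage 7410 {pct:.1f}% faster than X4540")
for cores, pct in bm7.items():
    print(f"bm-7, {cores} core(s): Sun Storage 7410 {pct:.1f}% faster than DAS")
```

Under that definition the bm-2 advantage over the X4540 comes out at about 7% for both the 1-core and 4-core runs, and the bm-7 advantage over DAS ranges from about 14.4% (4 cores) to 32.6% (8 cores).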

Results Summary (all configurations run under SUSE Linux Enterprise Server 10 SP1)

ANSYS V11.0 "BM" Test Suite - SMP Single Node Mode
Wall-clock (WALL) times in seconds; smaller is better. "-" = no result reported.

                 1x X6250  1x X6250  1x X6250  10x X6250 SS7410
Test   Cores     DAS       X4540     SS7410    Min       Max
bm-1   1         173       -         165       172       197
bm-1   2         139       -         137       139       166
bm-1   4         115       -         121       123       149
bm-1   8         138       -         111       119       142
bm-2   1         1204      1334      1246      1334      1467
bm-2   2         1149      -         1208      1313      1417
bm-2   4         1107      1240      1159      1271      1371
bm-2   8         1152      -         1137      1277      1329
bm-3   1         557       -         688       727       815
bm-3   2         486       -         636       670       772
bm-3   4         437       -         591       670       698
bm-3   8         538       -         559       606       676
bm-4   1         220       -         202       207       244
bm-4   2         191       -         179       178       219
bm-4   4         174       -         160       160       190
bm-4   8         235       -         148       147       175
bm-5   1         580       -         344       355       395
bm-5   2         357       -         240       257       276
bm-5   4         242       -         194       207       241
bm-5   8         335       -         200       221       266
bm-6   1         301       -         255       263       289
bm-6   2         202       -         175       186       212
bm-6   4         153       -         133       138       154
bm-6   8         169       -         115       120       148
bm-7   1         2466      -         2079      2178      2284
bm-7   2         1621      -         1410      1425      1564
bm-7   4         1213      -         1060      1092      1267
bm-7   8         1227      -         925       955       1133
bm-8   1         1467      -         1279      1311      1378
bm-8   2         1008      -         929       945       1068
bm-8   4         783       -         796       821       933
bm-8   8         1016      -         858       881       928
Total            21155               19439
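The totals, the 23-of-32 win count and the concurrency penalty quoted in the summary bullets can all be recomputed from the table. This Python sketch copies three of its columns (DAS, single-node SS7410, and the 10-node SS7410 maximum); the variable names are illustrative, and it assumes the sharing penalty is measured as the 10-node maximum relative to the single-node SS7410 time, which the post does not state explicitly.

```python
# Sketch: recompute the summary statistics from the results table.
# Each list holds wall-clock seconds for 1, 2, 4 and 8 cores.
das = {
    "bm-1": [173, 139, 115, 138], "bm-2": [1204, 1149, 1107, 1152],
    "bm-3": [557, 486, 437, 538], "bm-4": [220, 191, 174, 235],
    "bm-5": [580, 357, 242, 335], "bm-6": [301, 202, 153, 169],
    "bm-7": [2466, 1621, 1213, 1227], "bm-8": [1467, 1008, 783, 1016],
}
ss7410 = {  # single X6250 node, Sun Storage 7410 over 10GbE
    "bm-1": [165, 137, 121, 111], "bm-2": [1246, 1208, 1159, 1137],
    "bm-3": [688, 636, 591, 559], "bm-4": [202, 179, 160, 148],
    "bm-5": [344, 240, 194, 200], "bm-6": [255, 175, 133, 115],
    "bm-7": [2079, 1410, 1060, 925], "bm-8": [1279, 929, 796, 858],
}
ss7410_10x_max = {  # slowest of 10 X6250 nodes sharing one Sun Storage 7410
    "bm-1": [197, 166, 149, 142], "bm-2": [1467, 1417, 1371, 1329],
    "bm-3": [815, 772, 698, 676], "bm-4": [244, 219, 190, 175],
    "bm-5": [395, 276, 241, 266], "bm-6": [289, 212, 154, 148],
    "bm-7": [2284, 1564, 1267, 1133], "bm-8": [1378, 1068, 933, 928],
}

total_das = sum(sum(times) for times in das.values())
total_ss7410 = sum(sum(times) for times in ss7410.values())
reduction_pct = (1.0 - total_ss7410 / total_das) * 100.0

# One (DAS, SS7410) pair per test/core-count combination.
pairs = [(d, s) for test in das for d, s in zip(das[test], ss7410[test])]
wins = sum(1 for d, s in pairs if s < d)

# Assumption: sharing slowdown = 10-node max / single-node SS7410 time - 1.
worst_slowdown_pct = max(
    (m / s - 1.0) * 100.0
    for test in ss7410
    for s, m in zip(ss7410[test], ss7410_10x_max[test])
)

print(f"Totals: DAS {total_das} s, SS7410 {total_ss7410} s "
      f"({reduction_pct:.1f}% less runtime)")
print(f"SS7410 faster than DAS in {wins} of {len(pairs)} cases "
      f"({wins / len(pairs):.0%})")
print(f"Worst slowdown with 10 nodes sharing the SS7410: {worst_slowdown_pct:.1f}%")
```

It reproduces the 21155 s and 19439 s totals (about 8.1% less total runtime with the SS7410), 23 wins in 32 cases (72%), and a worst-case sharing slowdown of 33.0% (bm-5 on 8 cores).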

Key Technical Points

About The ANSYS "Standard" & "Distributed" Benchmarks

ANSYS is a general-purpose MCAE engineering analysis application based on the Finite Element Method. It performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic, and can be linear or nonlinear in material behavior or deformation.

In the most recent release of the ANSYS benchmarks there are two test suites: the SMP "BM" suite, designed to run on a single node with multiple processors, and the DMP "BMD" suite, intended to run on multi-node clusters.

  • The test cases in the ANSYS standard benchmark test suite all have a substantial I/O component: 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping several of them together, or by using a high-performance disk storage system with high-performance interconnects. When running the SX64 (Solaris x64) build, a ZFS file system might be a good choice.
  • The ANSYS standard test cases don't scale very well: at best up to 8 cores.
  • The memory requirements of the test cases in the ANSYS standard benchmark test suite are modest: each requires less than 3 GB.
  • There is now an officially certified, supported and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers. This is the first SX64 version to become available.
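One way to see what the 15% to 20% I/O share implies is Amdahl's law: if a fraction f of the run time is I/O and only that part is sped up by a factor s, the whole-job speedup is 1 / ((1 - f) + f / s). The Python sketch below assumes I/O and computation do not overlap, which is a simplification, and the storage speedups it loops over are hypothetical values chosen for illustration.

```python
# Sketch: Amdahl's-law bound on how much faster storage can help, assuming
# I/O and computation do not overlap (a simplification).
import math


def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Whole-job speedup when only the I/O share of the run time is accelerated."""
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


for f in (0.15, 0.20):  # I/O share of run time quoted for the BM suite
    for s in (2.0, 4.0, math.inf):  # hypothetical storage speedups
        label = "infinitely" if math.isinf(s) else f"{s:g}x"
        print(f"I/O share {f:.0%}, storage {label} faster: "
              f"job {overall_speedup(f, s):.3f}x faster")
```

With f = 0.20, for example, even infinitely fast storage caps the whole-job speedup at 1 / 0.8 = 1.25x under these assumptions.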

Benchmark Description

ANSYS 11.0 Standard Benchmarks

bm-1
Name: Exhaust Elbow Manifold
Description: Static structural analysis. Solved for equivalent stresses.
Statistics: ~850,000 DOF Model
Solver Used: Sparse Solver
Memory Used: 750 MB total, 560 MB solver

Notes: This model exhibits ideal performance for the sparse solver. It is a shell model where the sparse solver is faster than the PCG solver. Contrast this job with bm-7, where the sparse solver requires much more memory and CPU time to solve a similar-sized model that is dominated by 3D elements.

bm-2
Name: Floor Panel
Description: Surface body geometry. Harmonic analysis with mode superposition.
Statistics: ~765,000 DOF Model
Solver Used: Sparse Solver / Block Lanczos
Memory Used: Total memory used 1380 MB, 600 MB for Lanczos solver, 400 GB I/O

Notes: This shell model uses Block Lanczos to compute 200 modes as part of a harmonic analysis. The Lanczos run uses out-of-core mode with 600 MB of memory and requires 400 GB of I/O. This is a good medium-sized model for testing system CPU performance and I/O performance. The mesh has been refined to model the performance characteristics of larger, more detailed models.

bm-3
Name: Engine Assembly - Piston and Crank
Description: Assembly with contact. Nonlinear structural DOF solution.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver, Nonlinear with contact
Memory Used: Total job memory 800 MB, solver memory 600 MB, 9 cumulative

Notes: This is a good medium sized nonlinear analysis with contact. It should run well on most systems. The default solver options will use pivoting so it has to run in out-of-core mode.

bm-4
Name: Electric Motor
Description: Electromagnetic analysis. Solved for magnetic field intensities.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver, Emag application, Nonlinear - 4 cumulative load steps
Memory Used: Total job memory 650 MB

Notes: Small model, very fast solver times. Should run well on all systems.

bm-5
Name: Brake Rotor
Description: Thermal transient analysis. Solved for temperature DOFs.
Statistics: ~230,000 DOF Model
Solver Used: JCG Solver
Memory Used: Total job memory 210 MB

Notes: Small model of a thermal analysis. Uses JCG iterative solver (diagonal preconditioner only).

bm-6
Name: Wing Section
Description: Static structural analysis.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver
Memory Used: Total memory 1015 MB, solver memory 707 MB (bm-6)

Notes: bm-6 and bm-7 have larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 is a more complex model that performs better on larger systems. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems with 1 million or more degrees of freedom.

bm-7
Name: Wing Section
Description: Static structural analysis.
Statistics: ~800,000 DOF Model
Solver Used: Sparse Solver
Memory Used: Total memory 2700 MB, solver memory 1200 MB (bm-7 on a 64-bit system)

Notes: bm-6 and bm-7 have larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will run slower on a 32-bit machine limited to 2 or 3 GB of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 GB of solver memory to run in optimal out-of-core mode. On small-memory systems the solver will run in a less-than-optimal out-of-core mode and will require heavy I/O. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems with 1 million or more degrees of freedom.
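To make the bm-6/bm-7 comparison concrete, the sketch below divides the 1-core wall-clock times from the results table and compares that with the ratio of model sizes (~800,000 vs ~250,000 DOF). The exponent it prints is a rough two-point power-law fit, an illustrative assumption rather than a statement about the sparse solver's actual complexity.

```python
# Sketch: compare runtime growth with model-size growth for bm-6 -> bm-7.
import math

dof_bm6, dof_bm7 = 250_000, 800_000  # approximate model sizes from the descriptions
# 1-core wall-clock seconds (bm-6, bm-7) from the results table, single X6250 node.
times = {"DAS": (301, 2466), "SS7410": (255, 2079)}

dof_ratio = dof_bm7 / dof_bm6
ratios = {}
for storage, (t6, t7) in times.items():
    time_ratio = t7 / t6
    # Two-point fit of time ~ DOF**k (illustrative only).
    k = math.log(time_ratio) / math.log(dof_ratio)
    ratios[storage] = (time_ratio, k)
    print(f"{storage}: {dof_ratio:.1f}x the DOF took {time_ratio:.2f}x the time "
          f"(k ~ {k:.2f})")
```

Both storage configurations show roughly an 8x increase in runtime for a 3.2x larger model, which illustrates why bm-7 separates systems more sharply than bm-6.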

Disclosure Statement:

The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics(TM). All information on the ANSYS website is copyrighted by ANSYS, Inc. Results from http://www.ansys.com/services/hardware-support-db.htm as of Jun 2, 2008.

System Configuration

Hardware Configuration:

  • Sun Blade X6250
  • 3.00 GHz Quad-core Intel E5460 processors
  • 10 GbE interconnect
  • Sun Fire X4540 Storage Server
  • Sun Storage 7410 Unified Storage System
  • 4x striped 15K RPM SAS drives (DAS)

Software Configuration:

  • OS : 64-bit SUSE Linux Enterprise Server SLES 10 SP 1
  • MPI: HP-MPI hpmpi-2.02.05.01-20070708r.x86_64.rpm
  • Application: ANSYS V 11.0 Module
  • Benchmark: ANSYS 11 Standard "BM" & Distributed "BMD" Benchmark Test Suites

    World Record ANSYS on Sun Blade X6250 (Xeon 3GHz DC 5160)

    Tuesday Jul 10, 2007

    The Sun Blade X6250 outperforms all posted ANSYS V11.0 (MCAE) results on the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000XAL (with the same 3 GHz Xeon 5160) by as much as 40% at each of the three "cpu" levels tested (1, 2, and all 4 cores, on two-socket platforms equipped with dual-core processors). At these processor configurations, Sun wins in 6 of the 7 cases in the benchmark test suite. Overall, on the geometric mean, Sun's performance was 10% higher.

    The only case where the Sun X6250 loses, "bm-2", has an exceptionally high I/O component, and even there Sun was only 3-4% slower. The Sun X6250 had 10K rpm internal disk drives, whereas the Intel S5000XAL had 15K rpm drives.

    The Sun Blade X6250 with 3.0 GHz Xeon EM64T 5160 (Woodcrest) processors running 64-bit SUSE Linux Enterprise Server 10 beats all other platforms with results posted on the ANSYS website across the 7 test cases in the ANSYS "Standard" benchmark test suite (1, 2 and 4 cpus).

    Yes, this result was run on Linux; Sun wants to show that it can win with every OS. There is now an officially certified, supported and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers. This is the first SX64 version to become available.

    Competitive Landscape

    ANSYS V11.0 "Standard" Benchmark Test Suite on Sun Blade X6250 and Intel S5000XAL (run times in seconds, smaller is better; for %, bigger is better)

    System               Cores  bm-1  bm-2  bm-3  bm-4  bm-5  bm-6  bm-7

    Sun X6250/5160           4   100  1362   343   164   181   131   752
    Intel S5000XAL/5160      4   109  1312   369   169   187   161  1048
    Sun % better                  9%   -4%    8%    3%    3%   23%   39%

    Sun X6250/5160           2   118  1398   385   183   223   169  1064
    Intel S5000XAL/5160      2   128  1356   417   186   244   211  1437
    Sun % better                  9%   -3%    8%    2%    9%   25%   35%

    Sun X6250/5160           1   150  1455   456   211   339   253  1770
    Intel S5000XAL/5160      1   164  1416   489   215   340   314  2330
    Sun % better                  9%   -3%    7%    2%    1%   24%   32%

      (Please note: per-core performance isn't the right metric for comparing different CPUs, because system costs vary greatly; core counts are used here only to identify the configuration. It is "SYSTEM" performance, not "core" performance, that matters!)
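The "Sun % better" rows appear to be (Intel time - Sun time) / Sun time; that formula is inferred from the numbers, not stated in the post. The Python sketch below applies it to the 4-core rows and also computes the geometric mean of the per-test time ratios, one common way to summarize a suite like this (the post does not say exactly which cases its 10% geometric-mean figure covers).

```python
# Sketch: reproduce the 4-core "Sun % better" row and a geometric-mean summary.
from math import prod  # Python 3.8+

tests = ["bm-1", "bm-2", "bm-3", "bm-4", "bm-5", "bm-6", "bm-7"]
sun = [100, 1362, 343, 164, 181, 131, 752]     # Sun X6250/5160, 4 cores (seconds)
intel = [109, 1312, 369, 169, 187, 161, 1048]  # Intel S5000XAL/5160, 4 cores (seconds)

# Assumed definition: percent by which Sun's runtime beats Intel's.
pct_better = [round((i - s) / s * 100) for s, i in zip(sun, intel)]

# Geometric mean of Intel/Sun time ratios (> 1 means Sun is faster overall).
geo_mean = prod(i / s for s, i in zip(sun, intel)) ** (1 / len(sun))

for name, pct in zip(tests, pct_better):
    print(f"{name}: {pct:+d}%")
print(f"Geometric-mean advantage, 4 cores: {(geo_mean - 1) * 100:.1f}%")
```

This reproduces the published 4-core row (9%, -4%, 8%, 3%, 3%, 23%, 39%) and gives a 4-core geometric-mean advantage of about 10.9%.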

    Key Technical Points

    • The test cases in the ANSYS standard benchmark test suite all have a substantial I/O component: 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping several of them together, or by using a high-performance disk storage system with high-performance interconnects. When running the SX64 (Solaris x64) build, a ZFS file system might be a good choice.

    ANSYS 11.0 Standard Test Cases

      bm-1
      Name: Exhaust Elbow Manifold
      Description: Static structural analysis. Solved for equivalent stresses.
      Statistics: ~850,000 DOF Model

      bm-2
      Name: Floor Panel
      Description: Surface body geometry. Harmonic analysis with mode superposition.
      Statistics: ~765,000 DOF Model

      bm-3
      Name: Engine Assembly - Piston and Crank
      Description: Assembly with contact. Nonlinear structural DOF solution.
      Statistics: ~250,000 DOF Model

      bm-4
      Name: Electric Motor
      Description: Electromagnetic analysis. Solved for magnetic field intensities.
      Statistics: ~250,000 DOF Model

      bm-5
      Name: Brake Rotor
      Description: Thermal transient analysis. Solved for temperature DOFs.
      Statistics: ~230,000 DOF Model

      bm-6
      Name: Wing Section
      Description: Static structural analysis.
      Statistics: ~250,000 DOF Model
      Notes: Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

      bm-7
      Name: Wing Section
      Description: Static structural analysis.
      Statistics: ~800,000 DOF Model
      Notes: bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 GB of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 GB of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less-than-optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

    Disclosure Statement:

    The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics(TM). All information on the ANSYS website is copyright 2007 ANSYS, Inc. Results at http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.

    Hardware Configuration:

    Sun Blade X6250

      4 two-socket Sun Blade X6250s
      2 x 3.0 GHz dual-core Intel Xeon EM64T 5160 (Woodcrest) processors
      32 GB memory
    Software Configuration:
      64-bit SUSE Linux Enterprise Server 10
      (note: Sun works great with Linux; that is why we show all kinds of benchmarks!)
      ANSYS V11.0
      ANSYS 11 "Standard" Benchmark Test Suite
