ANSYS using Sun Storage 7410 with the Sun Blade X6250
Monday Nov 10, 2008
The HPC MCAE application ANSYS V11.0 Standard Benchmarks on the Sun Blade X6250 with Sun Storage 7410 show good performance relative to the Sun Fire X4540 Storage Server and to Direct Attached Storage (DAS).
Note: I know that many customers that use ANSYS use NAS storage. Because of the ease of use of NAS, I don't think many are doing the tradeoff between the two. Regardless, this is an interesting test...
Finite element applications and many other HPC applications have a substantial I/O component. The Sun Storage 7410 Unified Storage System is well-suited for many customers that want to use NAS storage for HPC MCAE applications such as ANSYS. In this test the performance of the Sun Blade X6250 was measured with both the Sun Storage 7410 Unified Storage System and the Sun Fire X4540 Storage Server.
- The Sun Storage 7410 system connected via 1GbE/10GbE switch over NFS was very comparable in performance to a Sun Fire X4540 Storage Server also connected via a 1GbE/10GbE switch over NFS.
- In certain intentionally very I/O-intensive test cases, e.g. ANSYS "bm-2", the Sun Storage 7410 system was 7% faster than the Sun Fire X4540 Storage Server system.
- When running 10 sequences of the ANSYS "BM" test suite concurrently on 10 Sun Blade X6250s in a cluster all sharing the same Sun Storage 7410, the runtime of any one sequence increased by nominally less than 33% because all nodes shared the same I/O system.
- When running the ANSYS "BM" test suite on a single Sun Blade X6250 node and using the Sun Storage 7410 connected via a 10GbE switch as the shared file system, the total runtime for the entire benchmark sequence was 8% faster than a Direct Attached Storage (DAS) system (4 striped 15k rpm SAS drives) running internally on the same X6250 compute node. Overall, the Sun Storage 7410 outperformed the DAS configuration on 23 (72%) of 32 test cases.
- For the most I/O-intensive test case, "bm-7" (see description below), the Sun Storage 7410 connected via a 10GbE switch as the shared file system was from 14% to 32% faster than the Direct Attached Storage (DAS) system (4 striped 15k rpm SAS drives) running internally on the same X6250 compute node.
- Tests were performed running the ANSYS 11 "BM" Test Suite (for SMP single node mode).
- These results demonstrate the advantages of the fully integrated hardware and software of the Sun Unified Storage Systems.
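The percentages quoted in the list above can be reproduced from the wall-clock times in the results table. The following is a minimal sketch, assuming "N% faster" means the ratio of baseline to candidate wall time minus one; the post does not state its formula, and `percent_faster` is a hypothetical helper introduced only for illustration.

```python
def percent_faster(baseline_sec, candidate_sec):
    """Assumed definition: (baseline / candidate - 1) * 100."""
    return (baseline_sec / candidate_sec - 1.0) * 100.0


# bm-2, 1 core: Sun Fire X4540 (1334 s) vs Sun Storage 7410 (1246 s)
bm2_gain = percent_faster(1334, 1246)

# bm-7: (DAS wall sec, Sun Storage 7410 wall sec), keyed by core count
bm7_wall = {1: (2466, 2079), 2: (1621, 1410), 4: (1213, 1060), 8: (1227, 925)}
bm7_gain = {cores: percent_faster(das, nas) for cores, (das, nas) in bm7_wall.items()}

print(f"bm-2, 1 core: {bm2_gain:.1f}% faster")
for cores, gain in bm7_gain.items():
    print(f"bm-7, {cores} core(s): {gain:.1f}% faster")
```

Under this assumed definition the bm-7 gains run from a little over 14% (4 cores) to a little under 33% (8 cores), which is in line with the range quoted above.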
Results Summary (All configurations run under SuSE Linux SLES 10 SP 1)
ANSYS V 11.0 "BM" Test Suite - SMP Single Node Mode

| Test | Cores | 1x X6250 DAS WALL (sec) | 1x X6250 X4540 WALL (sec) | 1x X6250 SS7410 WALL (sec) | 10x X6250 SS7410 WALL (sec) Min | 10x X6250 SS7410 WALL (sec) Max |
|---|---|---|---|---|---|---|
| bm-1 | 1 | 173 | | 165 | 172 | 197 |
| bm-1 | 2 | 139 | | 137 | 139 | 166 |
| bm-1 | 4 | 115 | | 121 | 123 | 149 |
| bm-1 | 8 | 138 | | 111 | 119 | 142 |
| bm-2 | 1 | 1204 | 1334 | 1246 | 1334 | 1467 |
| bm-2 | 2 | 1149 | | 1208 | 1313 | 1417 |
| bm-2 | 4 | 1107 | 1240 | 1159 | 1271 | 1371 |
| bm-2 | 8 | 1152 | | 1137 | 1277 | 1329 |
| bm-3 | 1 | 557 | | 688 | 727 | 815 |
| bm-3 | 2 | 486 | | 636 | 670 | 772 |
| bm-3 | 4 | 437 | | 591 | 670 | 698 |
| bm-3 | 8 | 538 | | 559 | 606 | 676 |
| bm-4 | 1 | 220 | | 202 | 207 | 244 |
| bm-4 | 2 | 191 | | 179 | 178 | 219 |
| bm-4 | 4 | 174 | | 160 | 160 | 190 |
| bm-4 | 8 | 235 | | 148 | 147 | 175 |
| bm-5 | 1 | 580 | | 344 | 355 | 395 |
| bm-5 | 2 | 357 | | 240 | 257 | 276 |
| bm-5 | 4 | 242 | | 194 | 207 | 241 |
| bm-5 | 8 | 335 | | 200 | 221 | 266 |
| bm-6 | 1 | 301 | | 255 | 263 | 289 |
| bm-6 | 2 | 202 | | 175 | 186 | 212 |
| bm-6 | 4 | 153 | | 133 | 138 | 154 |
| bm-6 | 8 | 169 | | 115 | 120 | 148 |
| bm-7 | 1 | 2466 | | 2079 | 2178 | 2284 |
| bm-7 | 2 | 1621 | | 1410 | 1425 | 1564 |
| bm-7 | 4 | 1213 | | 1060 | 1092 | 1267 |
| bm-7 | 8 | 1227 | | 925 | 955 | 1133 |
| bm-8 | 1 | 1467 | | 1279 | 1311 | 1378 |
| bm-8 | 2 | 1008 | | 929 | 945 | 1068 |
| bm-8 | 4 | 783 | | 796 | 821 | 933 |
| bm-8 | 8 | 1016 | | 858 | 881 | 928 |
| Total | | 21155 | | 19439 | | |
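The totals and the "23 of 32" count can be cross-checked against the single-node DAS and Sun Storage 7410 wall times. This is a small sketch with the values copied from the table; the variable names are illustrative, and both ways of expressing the overall gain are shown because the post does not say which one it used.

```python
# (test, cores, DAS wall sec, Sun Storage 7410 wall sec), single-node runs
RESULTS = [
    ("bm-1", 1, 173, 165), ("bm-1", 2, 139, 137), ("bm-1", 4, 115, 121), ("bm-1", 8, 138, 111),
    ("bm-2", 1, 1204, 1246), ("bm-2", 2, 1149, 1208), ("bm-2", 4, 1107, 1159), ("bm-2", 8, 1152, 1137),
    ("bm-3", 1, 557, 688), ("bm-3", 2, 486, 636), ("bm-3", 4, 437, 591), ("bm-3", 8, 538, 559),
    ("bm-4", 1, 220, 202), ("bm-4", 2, 191, 179), ("bm-4", 4, 174, 160), ("bm-4", 8, 235, 148),
    ("bm-5", 1, 580, 344), ("bm-5", 2, 357, 240), ("bm-5", 4, 242, 194), ("bm-5", 8, 335, 200),
    ("bm-6", 1, 301, 255), ("bm-6", 2, 202, 175), ("bm-6", 4, 153, 133), ("bm-6", 8, 169, 115),
    ("bm-7", 1, 2466, 2079), ("bm-7", 2, 1621, 1410), ("bm-7", 4, 1213, 1060), ("bm-7", 8, 1227, 925),
    ("bm-8", 1, 1467, 1279), ("bm-8", 2, 1008, 929), ("bm-8", 4, 783, 796), ("bm-8", 8, 1016, 858),
]

das_total = sum(das for _, _, das, _ in RESULTS)
nas_total = sum(nas for _, _, _, nas in RESULTS)

# Cases where the Sun Storage 7410 run finished sooner than the DAS run
wins = sum(1 for _, _, das, nas in RESULTS if nas < das)

# Two common ways to express the overall gain
runtime_reduction_pct = (das_total - nas_total) / das_total * 100.0
speed_ratio_pct = (das_total / nas_total - 1.0) * 100.0

print(f"DAS total: {das_total} s, SS7410 total: {nas_total} s")
print(f"SS7410 faster on {wins} of {len(RESULTS)} cases ({wins / len(RESULTS):.0%})")
print(f"Total runtime reduction: {runtime_reduction_pct:.1f}%")
print(f"Speed ratio gain: {speed_ratio_pct:.1f}%")
```

The sums match the Total row, and the runtime reduction relative to DAS comes out at about 8%.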
Key Technical Points
About The ANSYS "Standard" & "Distributed" Benchmarks
ANSYS is a general purpose engineering analysis MCAE application that is based on the Finite Element Method. It performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior or deformations are concerned.
In the most recent release of the ANSYS benchmarks there are now two test suites: the SMP "BM" suite, designed to run on a single node with multiple processors, and the DMP "BMD" suite, intended to run on multi-node clusters.
- The test cases from the ANSYS standard benchmark test suite all have a substantial I/O component: 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance improves with the fastest available drives striped together, or with a high-performance disk storage system on high-performance interconnects. When running the SX64 build, a ZFS file system might be a good choice.
- The ANSYS standard test cases don't scale very well; at best they scale up to 8 cores.
- The memory requirements for the test cases in the ANSYS standard benchmark test suite are modest, at less than 3 GB.
- There is now an officially certified, supported and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers. This is the first SX64 version that has become available.
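The limited scaling can be made concrete with the usual speedup and parallel-efficiency definitions. This is a rough sketch using the single-node Sun Storage 7410 wall times for bm-7 from the results table; the choice of bm-7 as the example and the variable names are mine.

```python
# bm-7 wall-clock seconds on one X6250 with the Sun Storage 7410, by core count
bm7_wall = {1: 2079, 2: 1410, 4: 1060, 8: 925}

serial = bm7_wall[1]

# Speedup = T(1) / T(n); parallel efficiency = speedup / n
speedup = {n: serial / t for n, t in bm7_wall.items()}
efficiency = {n: s / n for n, s in speedup.items()}

for n in sorted(bm7_wall):
    print(f"{n} core(s): speedup {speedup[n]:.2f}x, efficiency {efficiency[n]:.0%}")
```

Going from 1 to 8 cores yields a little over 2x for this case, so most of the added cores go unused once I/O and the serial portions dominate.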
Benchmark Description
ANSYS 11.0 Standard Benchmarks
bm-1
Name: Exhaust Elbow Manifold
Description: Static structural analysis. Solved for equivalent stresses.
Statistics: ~850,000 DOF Model
Solver Used: Sparse Solver
Memory Used: 750 MB Total, 560 MB Solver
Notes: This model exhibits ideal performance for the sparse solver. It is a shell model where the sparse solver is faster than the pcg solver. Contrast this job with bm-7 where the sparse solver requires much more memory and CPU time to solve a similar sized model that is dominated by 3D elements.
bm-2
Name: Floor Panel
Description: Surface body geometry. Harmonic analysis with mode superposition.
Statistics: ~765,000 DOF Model
Solver Used: Sparse Solver/ Block Lanczos
Memory Used: Total memory used 1380 MB, 600 MB for Lanczos solver, 400 GB I/O
Notes: This shell model uses block Lanczos to compute 200 modes as part of a harmonic analysis. The Lanczos run uses out-of-core mode with 600 MB of memory and requires 400 GB of I/O. This is a good medium-sized model to test system CPU performance and I/O performance. The mesh has been refined to model performance characteristics of larger, more detailed models.
bm-3
Name: Engine Assembly - Piston and Crank
Description: Assembly with contact. Nonlinear structural DOF solution.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver, Nonlinear with contact
Memory Used: Total job memory 800 MB, Solver memory 600 MB 9 Cumulative
Notes: This is a good medium-sized nonlinear analysis with contact. It should run well on most systems. The default solver options will use pivoting, so it has to run in out-of-core mode.
bm-4
Name: Electric Motor
Description: Electromagnetic analysis. Solved for magnetic field intensities.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver, Emag application, Nonlinear - 4 cumulative load steps
Memory Used: Total job memory 650 MB
Notes: Small model, very fast solver times. Should run well on all systems.
bm-5
Name: Brake Rotor
Description: Thermal transient analysis. Solved for temperature DOFs.
Statistics: ~230,000 DOF Model
Solver Used: JCG Solver
Memory Used: Total job memory 210 MB
Notes: Small model of a thermal analysis. Uses JCG iterative solver (diagonal preconditioner only).
bm-6
Name: Wing Section
Description: Static structural analysis.
Statistics: ~250,000 DOF Model
Solver Used: Sparse Solver
Memory Used: Total memory 1015 MB, Solver memory 707 MB (bm-6)
Notes: bm-6 and bm-7 have larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 is a more complex model that performs better on larger systems. Comparing bm-6 and bm-7 gives a good indication of how systems perform as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems with 1 million or more degrees of freedom.
bm-7
Name: Wing Section
Description: Static structural analysis.
Statistics: ~800,000 DOF Model
Solver Used: Sparse Solver
Memory Used: Total memory 2700 MB, Solver memory 1200 MB (bm-7 on 64-bit system)
Notes: bm-6 and bm-7 have larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will run slower on a 32-bit machine limited to 2 or 3 GB of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 GB of solver memory to run in optimal out-of-core mode. On small-memory systems the solver will run out-of-core and will require heavy I/O. Comparing bm-6 and bm-7 gives a good indication of how systems perform as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems with 1 million or more degrees of freedom.
Disclosure Statement:
The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics™. All information on the ANSYS website is copyrighted by ANSYS, Inc. Results from http://www.ansys.com/services/hardware-support-db.htm as of Jun 2, 2008.
System Configuration
Hardware Configuration: