Sun Blade X6275 Cluster Beats SGI Running Fluent Benchmarks
A Sun Blade 6048 Modular System with 8 Sun Blade X6275 Server Modules configured with QDR InfiniBand cluster interconnect delivered outstanding performance running the FLUENT 12 benchmark test suite. The Sun cluster consistently delivered the best or near-best results per node for the 6 benchmark tests considered, at every node count available for these runs.
- The Sun Blade X6275 cluster delivered the best results for the truck_poly_14m tests for all rank counts tested.
- For this large truck_poly_14m test case, the Sun Blade X6275 cluster beat the best results by SGI by as much as 19%.
- Of the 54 test cases presented here, the Sun Blade X6275 cluster delivered the best results in 87% of the tests, 47 of the 54 cases.
Performance Landscape
FLUENT 12 Benchmark Test Suite

Results are "Ratings" (bigger is better) for each benchmark test case.
Rating = number of sequential runs of the test case possible in 1 day = 86,400 / (total elapsed run time in seconds)

| System | Nodes | Ranks | eddy 417k | turbo 500k | aircraft 2m | sedan 4m | truck 14m | truck_poly 14m |
|---|---|---|---|---|---|---|---|---|
| Sun Blade X6275 | 16 | 128 | 6496.2 | 19307.3 | 8408.8 | 6341.3 | 1060.1 | 984.1 |
| Best Intel | 16 | 128 | 5236.4 (3) | 15638.0 (7) | 7981.5 (1) | 6582.9 (1) | 1005.8 (1) | 933.0 (1) |
| Best SGI | 16 | 128 | 7578.9 (5) | 14706.4 (6) | 6789.8 (4) | 6249.5 (5) | 1044.7 (4) | 926.0 (4) |
| Sun Blade X6275 | 8 | 64 | 5308.8 | 26790.7 | 5574.2 | 5074.9 | 547.2 | 525.2 |
| Best Intel | 8 | 64 | 5016.0 (1) | 25226.3 (1) | 5220.5 (1) | 4614.2 (1) | 513.4 (1) | 490.9 (1) |
| Best SGI | 8 | 64 | 5142.9 (4) | 23834.5 (4) | 4614.2 (4) | 4352.6 (4) | 529.4 (4) | 479.2 (4) |
| Sun Blade X6275 | 4 | 32 | 3066.5 | 13768.9 | 3066.5 | 2602.4 | 289.0 | 270.3 |
| Best Intel | 4 | 32 | 2856.2 (1) | 13041.5 (1) | 2837.4 (1) | 2465.0 (1) | 266.4 (1) | 251.2 (1) |
| Best SGI | 4 | 32 | 3083.0 (4) | 13190.8 (4) | 2588.8 (5) | 2445.9 (5) | 266.6 (4) | 246.5 (4) |
| Sun Blade X6275 | 2 | 16 | 1714.3 | 7545.9 | 1519.1 | 1345.8 | 144.4 | 141.8 |
| Best Intel | 2 | 16 | 1585.3 (1) | 7125.8 (1) | 1428.1 (1) | 1278.6 (1) | 134.7 (1) | 132.5 (1) |
| Best SGI | 2 | 16 | 1708.4 (4) | 7384.6 (4) | 1507.9 (4) | 1264.1 (5) | 128.8 (4) | 133.5 (4) |
| Sun Blade X6275 | 1 | 8 | 931.8 | 4061.1 | 827.2 | 681.5 | 73.0 | 73.8 |
| Best Intel | 1 | 8 | 920.1 (2) | 3900.7 (2) | 784.9 (2) | 644.9 (1) | 70.2 (2) | 70.9 (2) |
| Best SGI | 1 | 8 | 953.1 (4) | 4032.7 (4) | 843.3 (4) | 651.0 (4) | 71.4 (4) | 72.0 (4) |
| Sun Blade X6275 | 1 | 4 | 550.4 | 2425.3 | 533.6 | 423.0 | 41.6 | 41.6 |
| Best Intel | 1 | 4 | 515.7 (1) | 2244.2 (1) | 490.8 (1) | 392.2 (1) | 37.8 (1) | 38.4 (1) |
| Best SGI | 1 | 4 | 561.6 (4) | 2416.8 (4) | 526.9 (4) | 412.6 (4) | 40.9 (4) | 40.8 (4) |
| Sun Blade X6275 | 1 | 2 | 299.6 | 1328.2 | 293.9 | 232.1 | 21.3 | 21.6 |
| Best Intel | 1 | 2 | 274.3 (1) | 1201.7 (1) | 266.1 (1) | 214.2 (1) | 18.9 (1) | 19.6 (1) |
| Best SGI | 1 | 2 | 294.2 (4) | 1302.7 (4) | 289.0 (4) | 226.4 (4) | 20.5 (4) | 21.2 (4) |
| Sun Blade X6275 | 1 | 1 | 154.7 | 682.6 | 149.1 | 114.8 | 9.7 | 10.1 |
| Best Intel | 1 | 1 | 143.5 (1) | 631.1 (1) | 137.4 (1) | 106.2 (1) | 8.8 (1) | 9.0 (1) |
| Best SGI | 1 | 1 | 153.3 (4) | 677.5 (4) | 147.3 (4) | 111.2 (4) | 10.3 (4) | 9.5 (4) |
| Sun Blade X6275 | 1 | serial | 155.6 | 676.6 | 156.9 | 110.0 | 9.4 | 10.3 |
| Best Intel | 1 | serial | 146.6 (2) | 650.0 (2) | 150.2 (2) | 105.6 (2) | 8.8 (2) | 9.7 (2) |

Sun Blade X6275, X5570 QC 2.93 GHz, QDR, SMT on / Turbo mode on
(1) Intel Whitebox (X5560 QC 2.80 GHz, RHEL5, IB)
(2) Intel Whitebox (X5570 QC 2.93 GHz, RHEL5)
(3) Intel Whitebox (X5482 QC 3.20 GHz, RHEL5, IB)
(4) SGI Altix ICE_8200IP95 (X5570 2.93 GHz +turbo, SLES10, IB)
(5) SGI Altix_ICE_8200IP95 (X5570 2.93 GHz, SLES10, IB)
(6) SGI Altix_ICE_8200EX (Intel64 QC 3.00 GHz, Linux, IB)
(7) Qlogic Cluster (X5472 QC 3.00 GHz, RHEL5.2, IB Truescale)
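
To make the rating definition concrete, here is a minimal Python sketch that converts between elapsed time and rating and compares two ratings. The function names are made up for illustration; the only inputs are the 86,400 seconds-per-day constant from the definition and two ratings copied from the table (truck 14m at 8 nodes).

```python
SECONDS_PER_DAY = 86_400  # constant used in the rating definition


def rating(elapsed_seconds: float) -> float:
    """Number of sequential runs of a test case that fit in one day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(rating_value: float) -> float:
    """Invert the rating to recover the total elapsed run time in seconds."""
    if rating_value <= 0:
        raise ValueError("rating must be positive")
    return SECONDS_PER_DAY / rating_value


def percent_gain(rating_a: float, rating_b: float) -> float:
    """How much higher rating_a is than rating_b, in percent."""
    return (rating_a / rating_b - 1.0) * 100.0


if __name__ == "__main__":
    sun, intel = 547.2, 513.4  # truck 14m, 8 nodes / 64 ranks, from the table
    print(f"Sun elapsed:   {elapsed_from_rating(sun):.1f} s")
    print(f"Intel elapsed: {elapsed_from_rating(intel):.1f} s")
    print(f"Sun advantage: {percent_gain(sun, intel):.1f}%")
```

Because the rating is just the reciprocal of elapsed time scaled by a constant, a percentage gain in rating is the same as the ratio of elapsed times minus one.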
Results and Configuration Summary
Hardware Configuration:
- 8 x Sun Blade X6275 Server Module (Dual-Node Blade, 16 nodes), each node with:
  - 2 x 2.93GHz Intel X5570 QC processors
  - 24 GB (6 x 4GB, 1333 MHz DDR3 DIMMs)
  - On-board QDR InfiniBand Host Channel Adapters, QNEM
Software Configuration:
- OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
- Interconnect Software: OFED ver 1.4.1
- Shared File System: Lustre ver 1.8.0.1
- Application: FLUENT V12.0.16
- Benchmark: FLUENT 12 Benchmark Test Suite
Benchmark Description
The benchmark tests are representative of typical large user CFD models intended for execution in distributed memory processor (DMP) mode over a cluster of multi-processor platforms.
Key Points and Best Practices
Observations About the Results
The Sun Blade X6275 cluster delivered excellent performance, especially shining with the larger models.
These processors include a turbo boost feature coupled with a SpeedStep option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can upclock the CPU, temporarily increasing the processor frequency from 2.93GHz to 3.2GHz.
Memory placement is a very significant factor with Nehalem processors. Current Nehalem platforms have two sockets. Each socket has three memory channels and each channel has 3 bays for DIMMs. For example, if one DIMM is placed in the 1st bay of each of the 3 channels, the DIMM speed will be 1333 MHz with the X5570s. Altering the DIMM arrangement to an unbalanced configuration, say by adding just one more DIMM into the 2nd bay of one channel, will cause the DIMM frequency to drop from 1333 MHz to 1067 MHz.
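
As a rough illustration of what that frequency drop costs, the sketch below estimates theoretical peak memory bandwidth per socket at both DIMM speeds. It rests on assumptions that are not stated in this post: each DDR3 channel is 64 bits (8 bytes) wide, and the quoted "MHz" figures are treated as mega-transfers per second. Sustained bandwidth in practice is lower than these peak numbers.

```python
BYTES_PER_TRANSFER = 8    # assumption: 64-bit wide DDR3 channel
CHANNELS_PER_SOCKET = 3   # from the text: three memory channels per socket


def peak_bandwidth_gbs(mega_transfers_per_s: float,
                       channels: int = CHANNELS_PER_SOCKET) -> float:
    """Theoretical peak bandwidth per socket in GB/s (10^9 bytes/s)."""
    bytes_per_second = mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    balanced = peak_bandwidth_gbs(1333)    # one DIMM per channel
    unbalanced = peak_bandwidth_gbs(1067)  # after the frequency drop
    loss = (1.0 - unbalanced / balanced) * 100.0
    print(f"1333 MHz: {balanced:.1f} GB/s per socket")
    print(f"1067 MHz: {unbalanced:.1f} GB/s per socket")
    print(f"Peak bandwidth lost: {loss:.0f}%")
```

Under these assumptions, the unbalanced configuration gives up roughly a fifth of the theoretical peak bandwidth on every socket.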
About the FLUENT 12 Benchmark Test Suite
The FLUENT application performs computational fluid dynamic analysis on a variety of different types of flow and allows for chemically reacting species. transient dynamic and can be linear or nonlinear as far
- CFD models tend to be very large where grid refinement is required to accurately capture conditions in the boundary layer region adjacent to the body over which flow is occurring. Fine grids are also required to determine accurate turbulence conditions. As such, these models can run for many hours or even days while also using a large number of processors.
- CFD models typically scale very well and are well suited for execution on clusters. The FLUENT 12 benchmark test cases scale well.
- The memory requirements for the test cases in the FLUENT 12 benchmark test suite range from a few hundred megabytes to about 25 GB. As the job is distributed over multiple nodes, the memory requirements per node are correspondingly reduced.
- The benchmark test cases for the FLUENT module do not have a substantial I/O component. However, performance will be enhanced very substantially by using high performance interconnects such as InfiniBand for inter-node cluster message passing. This nodal message passing data can be stored locally on each node or on a shared file system.
- As a result of the large amount of inter-node message passing, performance can be further enhanced by more than a 3x factor, as indicated here, by implementing the Lustre-based shared file I/O system.
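
One way to put a number on "scales well" is to compute speedup and parallel efficiency from the ratings in the table. The sketch below does this for the Sun Blade X6275 truck_poly 14m results at 8 ranks per node. The helper name and the choice of the 1-node run as the baseline are illustrative assumptions, not part of the benchmark definition.

```python
# Sun Blade X6275 ratings for truck_poly 14m, copied from the table
# (nodes -> rating, 8 ranks per node).
SUN_TRUCK_POLY = {1: 73.8, 2: 141.8, 4: 270.3, 8: 525.2, 16: 984.1}


def scaling(ratings: dict) -> dict:
    """Return {nodes: (speedup, efficiency)} relative to the smallest node count."""
    base_nodes = min(ratings)
    base_rating = ratings[base_nodes]
    result = {}
    for nodes in sorted(ratings):
        speedup = ratings[nodes] / base_rating      # ratings scale like 1/time
        efficiency = speedup / (nodes / base_nodes)  # 1.0 means perfect scaling
        result[nodes] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    for nodes, (speedup, efficiency) in scaling(SUN_TRUCK_POLY).items():
        print(f"{nodes:2d} nodes: speedup {speedup:5.2f}, "
              f"efficiency {efficiency * 100:5.1f}%")
```

Since the rating is inversely proportional to elapsed time, the ratio of two ratings is directly the speedup, so no conversion back to seconds is needed.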
See Also
FLUENT 12.0 Benchmark:
http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.4.x/
Disclosure Statement
All information on the Fluent website is Copyrighted 1995-2009 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl6bench/ as of October 20, 2009 and this presentation.
