Update: World Record EXA PowerFLOW Cluster & Single Node
Monday Jul 16, 2007
Update:
A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is about twice as fast as a single-node SGI 1.6 GHz dual-core Itanium 2 in runs with 1, 2, and 4 cores in both benchmark test cases.
Runs on the 4-node Sun Blade X6250 cluster also outperformed the SGI Itanium 2 dual-core 1.6 GHz cluster in both test cases, at every core count up to the maximum of 16 cores (all 4 nodes in each cluster).
Question: can the Itanic dual-core keep floating?
The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by up to 25% in both test cases, at every core count up to the maximum of 16 cores on all 4 nodes in each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs, and it is also 4% faster in the 1-core results.
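The ratios quoted above can be checked against the single-node timings published in the Case 1 and Case 2 tables below. This is a minimal sketch that assumes "N times faster" means the ratio of elapsed times and "X% faster" means (ratio - 1) x 100; the post does not say which convention it uses, and the function names are illustrative only.

```python
# Single-node timings in seconds (smaller is better), copied from the
# Case 1 and Case 2 tables, keyed as {case: {cores: seconds}}.
SUN_X6250 = {
    "case1": {1: 822.7, 2: 418.5, 4: 214.9},
    "case2": {1: 1966.4, 2: 987.5, 4: 500.5},
}
SGI_ALTIX_ITANIUM = {
    "case1": {1: 1631.4, 2: 832.7, 4: 438.4},
    "case2": {1: 3884.0, 2: 2000.4, 4: 1054.5},
}
SGI_ALTIX_XE = {
    "case1": {1: 866.1, 2: 448.8, 4: 264.8},
    "case2": {1: 2043.6, 2: 1062.4, 4: 620.7},
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is: ratio of elapsed times."""
    return baseline_s / candidate_s


def percent_faster(baseline_s: float, candidate_s: float) -> float:
    """Assumed convention: 'X% faster' == (time ratio - 1) * 100."""
    return (speedup(baseline_s, candidate_s) - 1.0) * 100.0


if __name__ == "__main__":
    for case in ("case1", "case2"):
        for cores in (1, 2, 4):
            sun = SUN_X6250[case][cores]
            itanium = SGI_ALTIX_ITANIUM[case][cores]
            xe = SGI_ALTIX_XE[case][cores]
            print(f"{case} {cores} core(s): "
                  f"{speedup(itanium, sun):.2f}x vs Itanium 2, "
                  f"{percent_faster(xe, sun):.1f}% faster than Altix XE")
```

Under this convention the Itanium 2 ratios fall between about 1.98x and 2.11x, and the 4-core Altix XE margins come out at about 23% (Case 1) and 24% (Case 2). Measuring instead as a reduction in run time, (old - new) / old, would give smaller percentages.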
In summary:
The World Record single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single-node blade or server.
All posted results are for 2-socket dual-core platforms.
EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)
| # CPUs (cores) | IBM e135 Opt DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opt DC 3.0GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3GHz IB XC3.1 RC1 | HP RX2660 Itan2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3.0GHz IB SLES 10 | SGI Altix Itan2 DC 1.6GHz ProPack 5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | - | 822.7 | 1631.4 | 866.1 |
| 2 | - | - | - | - | - | 418.5 | 832.7 | 448.8 |
| 4 | - | - | - | - | - | 214.9 | 438.4 | 264.8 |
| 8 | 182.9 | 137.2 | 137.8 | 134.7 | 214.3 | 118.6 | 227.2 | 147.9 |
| 16 | 96.3 | 70.4 | 71.3 | 70.5 | 111.4 | 77.5 | 117.9 | 78.1 |
| 32 | 51.5 | 37.0 | 40.6 | 36.6 | 57.9 | - | 60.2 | 41.9 |
| 64 | 31.5 | 21.5 | 22.9 | 21.1 | 31.8 | - | - | 28.0 |
| 96 | 24.7 | 17.3 | - | - | - | - | - | - |
| 128 | 19.0 | - | - | - | - | - | - | 18.1 |
"-" no result published
EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)
| # CPUs (cores) | IBM e135 Opt DC 2.4GHz Myrinet SLES 9 | HP BL460 Xeon DC 3GHz IB RHEL 4 | HP BL460 Opt DC 3.0GHz IB XC3.1 RC1 | HP DL140 Xeon DC 3GHz IB XC3.1 RC1 | HP RX2660 Itan2 DC 1.6GHz IB RHEL 4 | Sun X6250 Xeon 5160 DC 3.0GHz IB SLES 10 | SGI Altix Itan2 DC 1.6GHz ProPack 5 | SGI Altix XE Xeon DC 3GHz SLES 10 |
|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | - | 1966.4 | 3884.0 | 2043.6 |
| 2 | - | - | - | - | - | 987.5 | 2000.4 | 1062.4 |
| 4 | - | - | - | - | - | 500.5 | 1054.5 | 620.7 |
| 8 | 424.9 | 310.0 | 306.4 | 258.4 | 490.7 | 258.4 | 526.7 | 316.0 |
| 16 | 216.0 | 165.4 | - | 160.1 | 253.9 | 164.5 | 272.1 | 174.4 |
| 32 | 112.8 | 82.3 | 84.4 | 83.3 | 129.3 | - | 139.4 | 90.3 |
| 64 | 61.5 | 43.8 | 43.8 | 43.2 | 68 | - | 75.6 | 48.7 |
| 96 | 45.2 | 32.3 | - | - | - | - | - | - |
| 128 | 36.8 | - | - | 24.4 | - | - | - | 32.8 |
"-" no result published
The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard test cases, each a simulation of external airflow around an automobile. The two cases are models of the same analysis at different sizes (different mesh refinement).
Real-world CFD engineering models are typically very large and are best analyzed with many cores to achieve reasonable turnaround times. PowerFLOW scales very well on such models, often close to linearly up to 64 and even 128 cores, and both benchmark models are large enough to scale well up to and beyond 64 cores.
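One way to put numbers on how well a run scales is to compute speedup and parallel efficiency relative to the smallest published core count. This is a minimal sketch, assuming efficiency is measured against the 8-core run, since the clusters in the tables only publish results from 8 cores upward; the helper name `scaling` is illustrative.

```python
def scaling(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} relative to the smallest core count.

    speedup    = T(base) / T(n)
    efficiency = speedup / (n / base); 1.0 means perfect linear scaling.
    """
    base = min(timings)
    t_base = timings[base]
    return {
        n: (t_base / t, (t_base / t) / (n / base))
        for n, t in sorted(timings.items())
    }


# HP BL460 Xeon DC 3GHz IB, Case 2 (larger model): seconds from the table.
hp_bl460_case2 = {8: 310.0, 16: 165.4, 32: 82.3, 64: 43.8, 96: 32.3}

for cores, (s, eff) in scaling(hp_bl460_case2).items():
    print(f"{cores:>3} cores: speedup {s:5.2f}x vs 8 cores, efficiency {eff:.0%}")
```

For this series the efficiency stays around 94% at 16 and 32 cores, is about 88% at 64 cores, and about 80% at 96 cores.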
Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).
It is important to note that voxels and surfels in different variable-resolution (VR) regions have different computational costs. To account for this, fine-equivalent voxel and surfel counts measure computational load while taking into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
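The fine-equivalent weighting can be sketched as a sum over resolution levels. This sketch assumes the halving pattern continues, so that each further coarser level is updated half as often again (the text only states the second-finest case). The per-level counts in the example are hypothetical; only the Case 1 and Case 2 totals come from the case descriptions.

```python
def fine_equivalent(counts_by_level: list[int]) -> float:
    """Weight element counts by relative update frequency.

    counts_by_level[0] is the finest VR level, counts_by_level[1] the
    second-finest, and so on. Assumes each coarser level is updated half
    as often as the next finer one, so level k costs 1 / 2**k per element.
    """
    return sum(count / 2**level for level, count in enumerate(counts_by_level))


# Hypothetical voxel counts for three VR levels (finest first).
example = [1_000_000, 2_000_000, 4_000_000]
print(fine_equivalent(example))  # 1e6 + 1e6 + 1e6 = 3000000.0

# Published totals (voxels, fine-equivalent voxels), in millions.
for name, total, fine_eq in [("Case 1", 18.2, 8.4), ("Case 2", 23.6, 18.9)]:
    print(f"{name}: fine-equivalent is {fine_eq / total:.0%} of raw voxel count")
```

By the published figures, Case 1's fine-equivalent count is about 46% of its raw voxel count and Case 2's about 80%, which suggests that a much larger share of Case 2's voxels sit at the finest resolution.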
The two test cases in the suite require 6 to 8 GB of memory when run on a single core of a single node. This per-node memory requirement is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple nodes.
Performance when running PowerFLOW in a multi-node configuration is significantly enhanced by high-performance interconnects such as InfiniBand.
Disclosure Statement:
All information on the EXA website is Copyright 1996-2007 Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 x 2-socket Sun Blade X6250 nodes
2 x 3 GHz dual-core Intel Xeon 5160 (Woodcrest, EM64T) per node
InfiniBand (Voltaire) interconnect (PCI Express HCAs)
Software Configuration:
Linux 64-bit SUSE SLES 10
EXA PowerFLOW V3.6c & V4.c
EXA PowerFLOW Benchmark Test Suite
Voltaire GridStack 4.1.5-7 for SLES 10










