BM Seer Facts & Questions from an Anonymous Sun Source

Update: World Record EXA PowerFLOW Cluster & Single Node

Monday Jul 16, 2007

Update:

A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is about two times faster than a single-node SGI 1.6 GHz Itanium 2 dual-core in runs with 1, 2, and 4 cores in both benchmark test cases.

The 4-node Sun Blade X6250 cluster also outperformed the SGI Itanium 2 dual-core 1.6 GHz cluster in both test cases, in runs up to the maximum of 16 cores across all 4 nodes of each cluster.

    question: can the Itanic dual-core keep floating?

The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by up to 25% in both test cases, in runs up to the maximum of 16 cores across all 4 nodes of each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4% faster in the 1-core results.
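For readers who want to check the percentages themselves, here is a minimal Python sketch that recomputes the ratios from the elapsed times posted in the two result tables of this entry. The times are copied from those tables; the names `CASE1`, `CASE2`, `speed_ratio` and `ratios` are made up for this illustration and are not part of any Exa or Sun tool.

```python
# Elapsed times in seconds, copied from the two result tables in this post.
# Keys are core counts; values are (Sun X6250, SGI Altix Itanium 2, SGI Altix XE).
CASE1 = {
    1: (822.7, 1631.4, 866.1),
    2: (418.5, 832.7, 448.8),
    4: (214.9, 438.4, 264.8),
    8: (118.6, 227.2, 147.9),
    16: (77.5, 117.9, 78.1),
}
CASE2 = {
    1: (1966.4, 3884.0, 2043.6),
    2: (987.5, 2000.4, 1062.4),
    4: (500.5, 1054.5, 620.7),
    8: (258.4, 526.7, 316.0),
    16: (164.5, 272.1, 174.4),
}


def speed_ratio(sun_seconds, other_seconds):
    """How many times faster the Sun run is (ratio of elapsed times)."""
    return other_seconds / sun_seconds


def ratios(case):
    """Map core count -> (ratio vs Altix Itanium 2, ratio vs Altix XE)."""
    return {
        cores: (speed_ratio(sun, itanium), speed_ratio(sun, xe))
        for cores, (sun, itanium, xe) in case.items()
    }


if __name__ == "__main__":
    for name, case in (("Case 1", CASE1), ("Case 2", CASE2)):
        for cores, (vs_itanium, vs_xe) in ratios(case).items():
            print(f"{name} {cores:>2} cores: {vs_itanium:.2f}x vs Itanium 2, "
                  f"{(vs_xe - 1) * 100:.0f}% faster than Altix XE")
```

A ratio of 2.00 means the Sun run took half the time; the percentage is simply the ratio minus one.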

In summary:
World record: a single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single-node blade or server. All posted results are for 2-socket dual-core platforms.

EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)

# CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
       Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
       DC 2.4GHz  DC 3GHz    DC 3.0GHz  DC 3GHz    DC 1.6GHz  DC 3.0GHz  DC 1.6GHz  DC 3GHz
       Myrinet    IB         IB         IB         IB         IB
       SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
1      -          -          -          -          -          822.7      1631.4     866.1
2      -          -          -          -          -          418.5      832.7      448.8
4      -          -          -          -          -          214.9      438.4      264.8
8      182.9      137.2      137.8      134.7      214.3      118.6      227.2      147.9
16     96.3       70.4       71.3       70.5       111.4      77.5       117.9      78.1
32     51.5       37.0       40.6       36.6       57.9       -          60.2       41.9
64     31.5       21.5       22.9       21.1       31.8       -          -          28.0
96     24.7       17.3       -          -          -          -          -          -
128    19.0       -          -          -          -          -          -          18.1

"-" no result published

EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)

# CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
       Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
       DC 2.4GHz  DC 3GHz    DC 3GHz    DC 3.0GHz  DC 1.6GHz  DC 3GHz    DC 1.6GHz  DC 3GHz
       Myrinet    IB         IB         IB         IB         IB
       SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10
1      -          -          -          -          -          1966.4     3884.0     2043.6
2      -          -          -          -          -          987.5      2000.4     1062.4
4      -          -          -          -          -          500.5      1054.5     620.7
8      424.9      310.0      306.4      258.4      490.7      258.4      526.7      316.0
16     216.0      165.4      -          160.1      253.9      164.5      272.1      174.4
32     112.8      82.3       84.4       83.3       129.3      -          139.4      90.3
64     61.5       43.8       43.8       43.2       68         -          75.6       48.7
96     45.2       32.3       -          -          -          -          -          -
128    36.8       -          -          24.4       -          -          -          32.8

"-" no result published

The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround times. Scalability when running these large models with PowerFLOW is very good, often linear (perfect) up to 64 and even 128 cores.

The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, but of different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.
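One way to put a number on "scales very well" is to compute speedup and parallel efficiency from the posted times. The sketch below does this for the Sun Blade X6250 column of the result tables in this entry; the function name `scaling` and the choice of the smallest core count as the baseline are assumptions made for this illustration, not something Exa publishes.

```python
# Sun Blade X6250 elapsed times in seconds, copied from the result tables
# in this post. Keys are core counts.
SUN_CASE1 = {1: 822.7, 2: 418.5, 4: 214.9, 8: 118.6, 16: 77.5}
SUN_CASE2 = {1: 1966.4, 2: 987.5, 4: 500.5, 8: 258.4, 16: 164.5}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the smallest core count.

    speedup    = baseline time / time at this core count
    efficiency = speedup / (cores / baseline cores); 1.0 is perfectly linear.
    """
    base_cores = min(times)
    base_time = times[base_cores]
    result = {}
    for cores in sorted(times):
        speedup = base_time / times[cores]
        ideal = cores / base_cores
        result[cores] = (speedup, speedup / ideal)
    return result


if __name__ == "__main__":
    for name, times in (("Case 1", SUN_CASE1), ("Case 2", SUN_CASE2)):
        for cores, (speedup, efficiency) in scaling(times).items():
            print(f"{name} {cores:>2} cores: speedup {speedup:.2f}, "
                  f"efficiency {efficiency:.0%}")
```

The same function works for any other column of the tables, for example a system whose first posted result is at 8 cores; the baseline then becomes the 8-core run.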

    Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
    Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
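The fine-equivalent idea can be sketched in a few lines of Python. The text above only states the weight for the second-finest scale (one half); the sketch assumes the update rate keeps halving at every coarser scale, so a voxel k levels below the finest counts as 1/2^k of a fine voxel. That generalization, the function name `fine_equivalent`, and the per-level counts are all hypothetical and are not taken from Exa documentation or from the benchmark cases.

```python
def fine_equivalent(counts_by_level):
    """Weight each element count by how often its resolution level is updated.

    counts_by_level[0] is the finest scale, counts_by_level[1] the
    second-finest, and so on. Assumption: the update rate halves at every
    coarser level, so level k has weight 1 / 2**k.
    """
    return sum(count / 2 ** level for level, count in enumerate(counts_by_level))


# Hypothetical voxel counts per level, made up for illustration only.
example_counts = [4_000_000, 6_000_000, 8_000_000]

# 4,000,000 * 1 + 6,000,000 / 2 + 8,000,000 / 4
print(fine_equivalent(example_counts))  # prints 9000000.0
```

In this made-up example, 18 million voxels amount to only 9 million fine-equivalent voxels, which is why the fine-equivalent figure is the better guide to run time.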

The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. The memory requirement per node is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple nodes.

Performance when running PowerFLOW in a multi-node configuration is significantly enhanced by high-performance interconnects such as InfiniBand.

Disclosure Statement:

All information on the Exa website is Copyright 1996-2007 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.

System Configuration

Hardware Configuration:

Sun Blade X6250

    4 x 2-socket Sun Blade X6250 blades
    2 x 3 GHz DC Intel Xeon EM64T 5160 (Woodcrest) per blade
    InfiniBand (Voltaire) interconnects (PCI-Express HCAs)

Software Configuration:

    Linux 64-bit SUSE SLES 10
    EXA PowerFLOW V3.6c & V4.c
    EXA PowerFLOW Benchmark Test Suite
    Voltaire GridStack 4.1.5-7 for SLES 10


World Record EXA PowerFLOW Cluster & Single Node

Thursday Jul 12, 2007

Entry updated; please see http://blogs.sun.com/bmseer/entry/update_world_record_exa_powerflow for the latest.
