BM Seer Facts & Questions from an Anonymous Sun Source

Sun Blade X6250 Cluster EXA PowerFLOW World Record

Thursday Jan 24, 2008

The EXA PowerFLOW benchmark test suite was run on a mini cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual-core Intel Xeon X5260. The Sun Blade X6250 mini cluster beats all results posted at the EXA PowerFLOW Performance website at every core count considered, up to eight cores.

  • In runs of the benchmark test suite, the Sun X6250 cluster was nominally 20% faster than the best result from the top IBM, HP, or SGI clusters. The advantage ranged from 15% to 24% across the four core counts considered.
  • The scaling efficiency of the Sun X6250 cluster ranged from 100% (at 1 core) to 83% on average (at 8 cores).
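
The scaling-efficiency figure can be reproduced from the X5260 rows of the two result tables in this post. The sketch below assumes the usual definition of parallel efficiency, T(1) / (N × T(N)), and averages it over the two test cases; the post does not spell out its formula, so both the definition and the function names are assumptions made for illustration.

```python
# Elapsed times in seconds for the Sun Blade X6250 (3.33GHz X5260),
# copied from the two result tables in this post.
ELAPSED = {
    "smaller model": {1: 720.49, 2: 389.36, 4: 195.43, 8: 110.36},
    "larger model": {1: 1714.22, 2: 925.77, 4: 463.11, 8: 252.64},
}


def scaling_efficiency(times, cores):
    """Parallel efficiency (assumed definition): T(1) / (cores * T(cores))."""
    return times[1] / (cores * times[cores])


def mean_efficiency(cores):
    """Average the efficiency over both test cases."""
    values = [scaling_efficiency(times, cores) for times in ELAPSED.values()]
    return sum(values) / len(values)


for n in (1, 2, 4, 8):
    print(f"{n} cores: {mean_efficiency(n):.1%}")
```

Under that definition the 8-core average works out to roughly 83%, which matches the figure quoted above.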

Four 2-socket Sun X6250 blades with Infiniband interconnects were used, and runs were made at four core counts: 1, 2, 4, and 8. Comparisons are presented against the current leading competitors' results, also obtained with high performance interconnects and posted at the EXA PowerFLOW Performance website. These include results from IBM, HP, and SGI platforms.

EXA PowerFLOW V 3.6c Benchmark Case 1 (Smaller Model)

Results are total elapsed time in seconds

                                           Number of Cores
System             Processor                       1        2        4        8
Sun Blade X6250    3.33GHz DC Intel X5260     720.49   389.36   195.43   110.36
Sun Blade X6250    3.0GHz DC Intel 5160       822.71   418.47   214.48   118.63
Sun Blade X6250    3.0GHz QC Intel X5365      844.30   430.72   214.41   121.25
Sun Fire X2200 M2  3.0GHz DC Opteron          943.41   461.11   232.93   123.12
Sun Blade X8440    3.0GHz DC Opteron          937.24   472.58   238.58   127.44
HP BL460           3.0GHz DC Xeon                 --       --       --   137.22
Sun Fire X4450     2.93GHz QC Intel 7350      874.12   462.12   241.08   137.74
IBM e1350          3.0GHz DC Xeon 5160            --       --   240.31   141.46
HP BL465           2.6GHz DC Opteron              --       --       --   146.31
SGI Altix          3.0GHz DC Xeon             866.09   448.80   264.82   147.88
HP rx2660          1.6GHz DC Itanium2             --       --       --   214.25
SGI Altix          1.6GHz DC Itanium2         1631.4   832.68   438.43   227.24
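
As a cross-check on the 15% to 24% range quoted earlier, the sketch below compares the 3.33GHz X5260 times for this smaller model against the best non-Sun result published at each core count. It assumes "faster" means the ratio of elapsed times minus one; the post does not state its formula, and the dictionary and function names are illustrative only.

```python
# Case 1 elapsed times in seconds, copied from the table above.
SUN_X5260 = {1: 720.49, 2: 389.36, 4: 195.43, 8: 110.36}

# Non-Sun systems; core counts with no published result are simply omitted.
COMPETITORS = {
    "HP BL460 (Xeon)": {8: 137.22},
    "IBM e1350 (Xeon 5160)": {4: 240.31, 8: 141.46},
    "HP BL465 (Opteron)": {8: 146.31},
    "SGI Altix (Xeon)": {1: 866.09, 2: 448.80, 4: 264.82, 8: 147.88},
    "HP rx2660 (Itanium2)": {8: 214.25},
    "SGI Altix (Itanium2)": {1: 1631.4, 2: 832.68, 4: 438.43, 8: 227.24},
}


def best_competitor(cores):
    """Return (name, seconds) of the fastest published non-Sun result."""
    published = {name: t[cores] for name, t in COMPETITORS.items() if cores in t}
    name = min(published, key=published.get)
    return name, published[name]


def percent_faster(cores):
    """Assumed definition: (competitor time / Sun time - 1) * 100."""
    _, seconds = best_competitor(cores)
    return (seconds / SUN_X5260[cores] - 1.0) * 100.0


for n in sorted(SUN_X5260):
    name, _ = best_competitor(n)
    print(f"{n} cores: {percent_faster(n):.1f}% faster than {name}")
```

With these numbers the smallest margin is at 2 cores (about 15%) and the largest at 8 cores (about 24%).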

EXA PowerFLOW V 3.6c Benchmark Case 2 (Larger Model)

Results are total elapsed time in seconds

                                           Number of Cores
System             Processor                       1        2        4        8
Sun Blade X6250    3.33GHz DC Intel X5260    1714.22   925.77   463.11   252.64
Sun Blade X6250    3.0GHz DC Intel 5160      1966.38   987.47   500.54   258.42
Sun Blade X6250    3.0GHz QC Intel X5365     1991.78  1010.63   507.21   278.64
Sun Fire X2200 M2  3.0GHz DC Opteron         2273.02  1086.65   550.34   282.53
Sun Blade X8440    3.0GHz DC Opteron         2210.13  1130.27   562.68   289.81
HP BL460           3.0GHz DC Xeon                 --       --       --   310.03
IBM e1350          3.0GHz DC Xeon 5160            --       --   557.04   314.23
SGI Altix          3.0GHz DC Xeon            2043.59  1062.38   620.67   315.96
Sun Fire X4450     2.93GHz QC Intel 7350     2062.00  1066.34   598.81   319.92
HP BL465           2.6GHz DC Opteron              --       --       --   331.82
HP rx2660          1.6GHz DC Itanium2             --       --       --   490.74
SGI Altix          1.6GHz DC Itanium2        3883.97  2000.44  1054.45   526.74

Key Technical Points

  • Real world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround on run times. Scalability when running these large models with PowerFLOW is very good, often linear or perfect up to 64 cores or more.
  • Performance when running PowerFLOW in a multi-node configuration is significantly enhanced when using high performance interconnects such as Infiniband.
  • PowerFLOW supports a variety of interconnects from various hardware vendors (gigabit Ethernet; Infiniband, e.g. Voltaire, Cisco/Topspin, QLogic; and Myrinet), MPI implementations (HP-MPI, MVAPICH2, LAM), and communication protocols (e.g. ssh and rsh).
  • There is still no officially certified Solaris build of PowerFLOW for x86-64 platforms.
  • The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, but of different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.
  • The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. This per-node memory requirement is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple nodes.
  • PowerFLOW runs are CPU and memory intensive but do not require any special high performance I/O file systems.
  • The test suite provides a run script ("exabench") that automatically runs one or both test cases over a range of core counts on the cluster nodes specified in an "mpi_file", along with the number of cores requested per node.
Disclosure Statement:

Exa Corporation copyright: all information on the EXA website is copyrighted © 2007, 1996-2006 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of January 17, 2008.

Benchmark Description

The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

    Case #1
    Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).

    Case #2
    Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

When describing the size of the cases, it is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
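
A minimal sketch of how a fine-equivalent count could be computed. The description above only states that the second-finest scale costs half as much as the finest; the sketch assumes the halving continues at each coarser scale. The per-level voxel counts are hypothetical and are not the actual breakdown of either benchmark case.

```python
def fine_equivalent(counts_by_level):
    """Weight each resolution level by how often it is updated.

    counts_by_level[0] is the finest scale. Level k is assumed to be
    processed once every 2**k timesteps, so it is weighted by 1 / 2**k.
    """
    return sum(count / 2**level for level, count in enumerate(counts_by_level))


# Hypothetical voxel counts, finest scale first (illustration only).
example_levels = [4_000_000, 6_000_000, 8_000_000]

total = sum(example_levels)
equivalent = fine_equivalent(example_levels)
print(f"{total:,} voxels ~ {equivalent:,.0f} fine-equivalent voxels")
```

Here 18 million raw voxels reduce to 9 million fine-equivalent voxels (4.0 + 3.0 + 2.0 million), which shows why a case's fine-equivalent size can be much smaller than its raw voxel count.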

System Configuration

  • 4 Sun Blade X6250 blades
  • 3.33 GHz dual-core Intel Xeon X5260 processors
  • 2 internal striped 15K SAS drives (cluster shared file system)
  • Infiniband (Voltaire) interconnects
  • SUSE Linux Enterprise Server SLES 10
    Voltaire OFED GridStack-4.1.5_7-sles-k2.6.16.21-0.8-smp-x86_64
    HP-MPI
    EXA PowerFLOW 3.6c
    PowerFLOW 3.6 Benchmark Test Suite

    See Also

    Current EXA PowerFLOW V3.6c & V4.6c results at (EXA password required):
    http://www.exa.com/user_center/index.html


    Update: World Record EXA PowerFLOW Cluster & Single Node

    Monday Jul 16, 2007

    Update:

    A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is two times faster than a single-node SGI 1.6GHz dual-core Itanium 2 in runs with 1, 2, and 4 cores in both benchmark test cases.

    In further runs, the 4-node cluster of Sun Blade X6250 blades outperformed the SGI 1.6GHz dual-core Itanium 2 cluster in both test cases up to the maximum of 16 cores across all 4 nodes in each cluster.

      question: can the Itanic dual-core keep floating?

    The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by up to 25% in runs of both test cases up to the maximum of 16 cores across all 4 nodes in each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4% faster in the 1-core results.

    In summary:
    World Record: a single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single-node blade or server. All posted results are for 2-socket dual-core platforms.

    EXA PowerFLOW V 3.6c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)

    System        IBM e135     HP BL460     HP BL460     HP DL140     HP RX2660    Sun X6250    SGI Altix    SGI Altix XE
    Processor     Opt          Xeon         Opt          Xeon         Itan2        Xeon 5160    Itan2        Xeon
    Clock         DC 2.4GHz    DC 3GHz      DC 3.0GHz    DC 3GHz      DC 1.6GHz    DC 3.0GHz    DC 1.6GHz    DC 3GHz
    Interconnect  Myrinet      IB           IB           IB           IB           IB
    OS            SLES 9       RHEL 4       XC3.1 RC1    XC3.1 RC1    RHEL 4       SLES 10      ProPack5     SLES 10
    # CPU
    1             -            -            -            -            -            822.7        1631.4       866.1
    2             -            -            -            -            -            418.5        832.7        448.8
    4             -            -            -            -            -            214.9        438.4        264.8
    8             182.9        137.2        137.8        134.7        214.3        118.6        227.2        147.9
    16            96.3         70.4         71.3         70.5         111.4        77.5         117.9        78.1
    32            51.5         37.0         40.6         36.6         57.9         -            60.2         41.9
    64            31.5         21.5         22.9         21.1         31.8         -            -            28.0
    96            24.7         17.3         -            -            -            -            -            -
    128           19.0         -            -            -            -            -            -            18.1

    "-" no result published

    EXA PowerFLOW V 3.6c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)

    System        IBM e135     HP BL460     HP BL460     HP DL140     HP RX2660    Sun X6250    SGI Altix    SGI Altix XE
    Processor     Opt          Xeon         Opt          Xeon         Itan2        Xeon 5160    Itan2        Xeon
    Clock         DC 2.4GHz    DC 3GHz      DC 3GHz      DC 3.0GHz    DC 1.6GHz    DC 3GHz      DC 1.6GHz    DC 3GHz
    Interconnect  Myrinet      IB           IB           IB           IB           IB
    OS            SLES 9       RHEL 4       XC3.1 RC1    XC3.1 RC1    RHEL 4       SLES 10      ProPack5     SLES 10
    # CPU
    1             -            -            -            -            -            1966.4       3884.0       2043.6
    2             -            -            -            -            -            987.5        2000.4       1062.4
    4             -            -            -            -            -            500.5        1054.5       620.7
    8             424.9        310.0        306.4        258.4        490.7        258.4        526.7        316.0
    16            216.0        165.4        -            160.1        253.9        164.5        272.1        174.4
    32            112.8        82.3         84.4         83.3         129.3        -            139.4        90.3
    64            61.5         43.8         43.8         43.2         68           -            75.6         48.7
    96            45.2         32.3         -            -            -            -            -            -
    128           36.8         -            -            24.4         -            -            -            32.8

    "-" no result published

    The EXA PowerFLOW Benchmark Test Suite
    The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

    Real world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround on run times. Scalability when running these large models with PowerFLOW is very good, often linear or perfect up to 64 and even 128 cores.

    The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, but of different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.

      Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
      Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

    It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.

    The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. This per-node memory requirement is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple nodes.

    Performance when running PowerFLOW in a multi-node configuration is significantly enhanced when using high performance interconnects such as Infiniband.

    Disclosure Statement:

    Exa Corporation copyright: all information on the EXA website is under copyright 1996-2007 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.

    System Configuration

    Hardware Configuration:

    Sun Blade X6250

      Four 2-socket Sun Blade X6250 blades
      2x3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
      Infiniband (Voltaire) interconnects (PCI-Express HCAs)

    Software Configuration:

      Linux 64-bit SUSE SLES 10
      EXA PowerFLOW V3.6c & V4.c
      EXA PowerFLOW Benchmark Test Suite
      Voltaire GridStack 4.1.5-7 for SLES 10


    World Record EXA PowerFLOW Cluster & Single Node

    Thursday Jul 12, 2007

    Entry updated; please see http://blogs.sun.com/bmseer/entry/update_world_record_exa_powerflow for the latest.
