BM Seer Unofficial thoughts from an anonymous Sun employee

Fluent World Records on Sun's Blades

Wednesday Jan 23, 2008

Two benchmark test suites (standard and new) for the Fluent computational fluid dynamics (CFD) code were run on a mini-cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual-core Intel 5260. The Sun Blade X6250 mini-cluster beats all posted results for both FLUENT benchmark test suites at each core level, from one core up to the sixteen cores available on the X6250 cluster.

  • In runs of the standard test suite, the X6250 cluster was from 15% (lower core levels) to 42% (highest core level) faster than the previous top set of results, posted from a Harpertown quad-core 3.0 GHz Xeon.
  • Scaling efficiency on the X6250 cluster ranged from 100% at 1 core to an average of 84% at 8 cores when running the standard large test cases.
  • In runs of the new, larger and more representative test suite, the X6250 cluster was from 15% (lower core levels) to 65% (highest core level) faster than the top results posted from a Harpertown quad-core 3.0 GHz Xeon.
  • Scaling efficiency on the X6250 cluster ranged from 100% at 1 core to an average of 84% at 8 cores when running all six of the test cases in the new test suite.
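The scaling-efficiency figure for the standard large cases can be checked with a short calculation. The sketch below assumes efficiency means (rating at N cores) / (N x rating at 1 core); the post does not state its exact definition, so that formula and the function name scaling_efficiency are assumptions. The ratings are the Sun X6250 1-core and 8-core values from the standard-suite results table in this post.

```python
def scaling_efficiency(rating_1core, rating_ncores, ncores):
    """Parallel efficiency relative to ideal linear scaling (assumed definition)."""
    return rating_ncores / (ncores * rating_1core)


# Sun X6250 ratings for the three large standard cases, copied from the results table.
one_core = {"FL5L1": 259.4, "FL5L2": 178.5, "FL5L3": 32.4}
eight_cores = {"FL5L1": 1811.3, "FL5L2": 1227.3, "FL5L3": 207.2}

efficiencies = {
    case: scaling_efficiency(one_core[case], eight_cores[case], 8)
    for case in one_core
}
average = sum(efficiencies.values()) / len(efficiencies)

for case, eff in efficiencies.items():
    print(f"{case}: {eff:.1%}")
print(f"average: {average:.1%}")
```

Under this assumed definition the three-case average comes out at roughly 84%, which is consistent with the figure quoted in the summary.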

Four 2-socket Sun Blade X6250 blades with InfiniBand interconnects were used. Comparisons are presented against the results posted at the FLUENT Performance website.

Two benchmark test suites were considered. The first is the long-standing standard "FLUENT 6" benchmark test suite, consisting of 9 test cases (3 small, 3 medium, and 3 large); it is referred to above as the standard test suite. The second is the "FLUENT 6.3" benchmark test suite (referred to as the new test suite), consisting of larger models (in both memory requirements and mesh/model size) that are better suited to multi-core, multi-node distributed-memory parallel (DMP) cluster environments and more representative of current engineering CFD work.

Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes. It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines (aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow.

CFD models tend to be extremely large (fluid flow over entire car, aircraft and submarine bodies, and complex flow involving mixing of species and chemical reaction). Achieving reasonable run times for these analyses requires many processing units. Currently the most effective way of providing them is an interconnected cluster of multi-core rack-mounted servers or blades.

Standard FLUENT 6 Benchmark Test Suite, large workload results ("ratings" bigger is better)

See www.fluent.com/software/fluent/fl6bench/fl6bench_6.3/ for the full table of results.

Rating = number of sequential runs of the test case possible in 1 day = 86,400 / (total elapsed run time in seconds)
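To make the rating definition concrete, here is a minimal sketch that converts between elapsed run time and rating. The function names rating and elapsed_from_rating are illustrative and not part of any Fluent tooling; the only input taken from this post is the published 1-core FL5L1 rating of 259.4.

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds):
    """Number of back-to-back runs of a test case that fit in one day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


def elapsed_from_rating(r):
    """Invert the rating formula to recover the elapsed run time in seconds."""
    if r <= 0:
        raise ValueError("rating must be positive")
    return SECONDS_PER_DAY / r


# The 1-core FL5L1 rating of 259.4 corresponds to a run of about 333 seconds.
print(f"{elapsed_from_rating(259.4):.1f} s")
```

Because the rating is inversely proportional to elapsed time, a rating twice as large means the run finished in half the time.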

System                            NCPUS    FL5L1    FL5L2    FL5L3
Sun X6250 3.33GHz DC 5260             1    259.4    178.5     32.4
Intel 3.0GHz QC Harpertown            1    220.5    151.2     27.9
SGI Altix XE210 3.0GHz Xeon 5160      1    210.7    153.5     29.6
IBM X3550 3GHz DC 5160                1    188.0    134.7      n/a

Sun X6250 3.33GHz DC 5260             2    493.4    351.8     61.9
Intel 3.0GHz QC Harpertown            2    420.7    297.3     54.2
SGI Altix XE210 3.0GHz Xeon 5160      2    396.1    298.0     56.2
IBM X3550 3GHz DC 5160                2    342.6    236.8     55.0

Sun X6250 3.33GHz DC 5260             4    931.8    675.7    122.0
SGI Altix XE210 3.0GHz Xeon 5160      4    679.2    449.7     80.7
IBM X3550 3GHz DC 5160                4    623.5    411.4     94.9

Sun X6250 3.33GHz DC 5260             8   1811.3   1227.3    207.2
Intel 3.0GHz QC Harpertown            8   1279.1    710.5    120.0
SGI Altix XE210 3.0GHz Xeon 5160      8   1343.7    899.5    161.0
IBM X3550 3GHz DC 5160                8   1273.4    862.3    149.9

Sun X6250 3.33GHz DC 5260            16   2941.3   1577.4    246.0
SGI Altix XE210 3.0GHz Xeon 5160     16   2584.9   1788.8    319.0
IBM X3550 3GHz DC 5160               16   2479.2   1722.0    306.8

New FLUENT 6.3 Benchmark Test Suite, "Ratings" (bigger is better)

Rating = number of sequential runs of the test case possible in 1 day = 86,400 / (total elapsed run time in seconds)

System                      NCPUS       eddy      turbo   aircraft      sedan   truck14m  truckpoly
Sun X6250 3.33GHz DC 5260       1      109.2      440.4       96.6       65.1        7.0        8.3
Intel 3.0GHz QC Harpertown      1       95.9        n/a       84.2       55.9        6.2        6.9

Sun X6250 3.33GHz DC 5260       2      208.9      823.1      178.8      121.3       14.6       16.1
Intel 3.0GHz QC Harpertown      2      183.1      741.3      162.9      109.6       12.4       13.4

Sun X6250 3.33GHz DC 5260       4      415.6     1590.4      353.8      246.2       29.9       31.9

Sun X6250 3.33GHz DC 5260       8      780.8     2805.2      577.1      384.4       55.0       57.3
Intel 3.0GHz QC Harpertown      8      491.4     1685.0      321.0      207.2       32.1       33.2

Sun X6250 3.33GHz DC 5260      16     1095.8     3744.3      682.9      429.9       73.7       74.6
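For readers who want to compare two systems case by case, the following sketch turns a pair of ratings into a "percent faster" figure. It assumes "X% faster" means (rating A / rating B - 1) x 100, and the helper name percent_faster is hypothetical. The inputs are the 8-core rows of the table above. The post does not say how its summary percentages were aggregated, so per-case values need not match them exactly.

```python
def percent_faster(rating_a, rating_b):
    """How much faster system A is than system B, in percent (assumed definition)."""
    return (rating_a / rating_b - 1.0) * 100.0


cases = ["eddy", "turbo", "aircraft", "sedan", "truck14m", "truckpoly"]

# 8-core ratings copied from the FLUENT 6.3 results table.
sun_8 = [780.8, 2805.2, 577.1, 384.4, 55.0, 57.3]
harpertown_8 = [491.4, 1685.0, 321.0, 207.2, 32.1, 33.2]

for case, sun, intel in zip(cases, sun_8, harpertown_8):
    print(f"{case:<10} {percent_faster(sun, intel):6.1f}% faster")
```

Since ratings are runs per day, a ratio of ratings is also the inverse ratio of elapsed times, so the same helper works for either quantity as long as the arguments are ordered accordingly.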

Key Technical Points

  • The "small" and even "medium" test cases in the standard suite are neither very large nor any longer very representative of typical usage.
  • Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround on run times. Scalability when running these large models with Fluent is very good, often linear or perfect up to 64 and even 128 cores.
  • Performance when running Fluent in a multi-node configuration is significantly enhanced by high-performance interconnects such as InfiniBand.
  • Fluent supports a variety of interconnects from various hardware vendors (e.g. Voltaire, Cisco/Topspin, QLogic [formerly Silverstorm], Myrinet), MPI implementations (HP-MPI, MVAPICH(2), LAM, plus private vendor versions) and communication protocols (e.g. ssh and rsh).
  • There is still no officially certified Solaris build of Fluent for x86-64 platforms. However, a prototype build compiled some time ago with the Sun Studio 11 compilers outperformed, at that time, all other platforms running other operating systems (64-bit Linux). Competitive results are currently posted at the Fluent website from several hardware vendors, including current AMD and Intel based platforms running 64-bit Linux operating systems.
  • Very recently, Fluent has developed a new benchmark test suite with larger models, specifically intended to be run either on large multi-core servers or on large multi-node clusters of multi-core platforms.

    Benchmark Description

    The Original Standard "Fluent 6" Benchmark Test Suite

    Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, numerics and parallelization, compilers and optimization, in addition to performance characteristics of the hardware where the simulation is performed. The problems selected represent a range of simulations typical of those which might be found in industry. The principal objective of this benchmark suite is to provide comprehensive and fair comparative information of the performance of FLUENT on available hardware platforms.

    Disclosure Statement:

    All information on the Fluent website is Copyrighted 1995-2008 by Fluent Inc. Results from www.fluent.com as of January 7, 2008.

    System Configuration

    4 Sun Blade X6250's
    3.33 GHz dual core Intel 5260
    2 internal striped 15K SAS drives (cluster shared file system)
    Infiniband (Voltaire) interconnects

    SuSE Linux Enterprise Server SLES 10
    Voltaire OFED gridstack
    HP-MPI
    Fluent V6.3.26
    Fluent 6 Standard Benchmark Test Suite
    Fluent 6.3 "New" Benchmark Test Suite

    See Also:

    New Fluent benchmark results posted at: http://www.fluent.com/software/fluent/fl6bench/fl6bench_6.3

    Standard Fluent benchmark results posted at: http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm
