BM Seer: Unofficial thoughts from an anonymous Sun employee

CFD World Record: Fluent on a Sun Blade X6250 Cluster (Xeon 3 GHz 5160)

Thursday Jul 12, 2007

The Sun Blade X6250 cluster was up to 27% faster (6% faster on geometric mean) than an SGI Altix XE210 cluster with the same processors (dual-core Xeon 5160 3 GHz, Woodcrest) and InfiniBand interconnects.

A cluster of four Sun Blade X6250 blades (Xeon 3 GHz 5160) with InfiniBand interconnects was used to set this record. Each two-socket blade held two dual-core Intel Xeon EM64T 5160 3 GHz (Woodcrest) processors, for a total of 16 cores.

Running the "Fluent 6" standard benchmark for the computational fluid dynamics (CFD) program Fluent, the Sun Blade X6250 cluster established a world record for runs of the test suite using from 1 to 16 cores.

Workload description

Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes. It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines (aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow in some manner.

Fluent, like many major ISVs, has developed a benchmark test suite to evaluate platform performance. For several years, results from hardware vendors' platforms have been posted on the Fluent website.

CFD models tend to be extremely large (fluid flow over entire car, aircraft, and submarine bodies, and complex flows involving species mixing and chemical reaction). To keep run times reasonable, these analyses must use many processing units. Currently, the most effective way to achieve this is an interconnected cluster of multi-core rack-mounted servers or blades, and the current set of entries posted on the Fluent website reflects this.

FLUENT 6 Benchmark ("Ratings", bigger is better)

Rating = number of sequential runs that can be completed in one day = 86,400 / (total elapsed run time in seconds)
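Because a rating is simply 86,400 divided by the elapsed time, any published rating can be converted back into wall-clock time per run, and vice versa. A minimal Python sketch; the function names are illustrative and not part of any Fluent tool:

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds: float) -> float:
    """Fluent 6 style rating: how many back-to-back runs fit in one day."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return SECONDS_PER_DAY / elapsed_seconds


def run_time_seconds(rating_value: float) -> float:
    """Invert a rating to recover the wall-clock time of a single run."""
    if rating_value <= 0:
        raise ValueError("rating must be positive")
    return SECONDS_PER_DAY / rating_value


if __name__ == "__main__":
    # Serial FL5L3 rating for the Sun Blade X6250, taken from the results table.
    t = run_time_seconds(27.6)
    print(f"FL5L3 serial run: about {t:.0f} s ({t / 3600:.2f} h)")
```

Run as a script, it reports that a single serial FL5L3 run on the Sun Blade X6250 took roughly 3,130 seconds, a little under an hour.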

Machine                        Sockets  NCPUS      FL5M1    FL5M2   FL5M3   FL5L1   FL5L2   FL5L3
Sun Blade X6250 3GHz WC 5160   2        8         4965.5  10504.6  2563.8  1399.2  1028.3   174.9
SGI Altix XE210 3GHz WC 5160   2        8         4937.1   9626.7  2014.0  1343.7   899.5   161.0

Sun Blade X6250 3GHz WC 5160   2        4         2780.4   5358.1  1336.9   731.7   573.7   101.2
SGI Altix XE210 3GHz WC 5160   2        4         2681.1   4657.7   998.0   679.2   449.7    80.7

Sun Blade X6250 3GHz WC 5160   2        serial     919.4   1465.6   352.9   207.2   142.6    27.6
SGI Altix XE210 3GHz WC 5160   2        serial     910.9   1445.4   349.5   204.1   136.6    26.8
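Since both ratings share the same 86,400-second numerator, the ratio of two ratings equals the inverse ratio of their run times, so per-test speedups and a geometric mean can be computed directly from the table. The sketch below covers only the rows reproduced here, treating "serial" as one core:

```python
from math import exp, log

TESTS = ["FL5M1", "FL5M2", "FL5M3", "FL5L1", "FL5L2", "FL5L3"]

# Ratings copied from the table above, keyed by core count ("serial" = 1).
SUN = {
    8: [4965.5, 10504.6, 2563.8, 1399.2, 1028.3, 174.9],
    4: [2780.4, 5358.1, 1336.9, 731.7, 573.7, 101.2],
    1: [919.4, 1465.6, 352.9, 207.2, 142.6, 27.6],
}
SGI = {
    8: [4937.1, 9626.7, 2014.0, 1343.7, 899.5, 161.0],
    4: [2681.1, 4657.7, 998.0, 679.2, 449.7, 80.7],
    1: [910.9, 1445.4, 349.5, 204.1, 136.6, 26.8],
}


def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(v)))."""
    return exp(sum(log(v) for v in values) / len(values))


def speedups(a, b):
    """Per-test rating ratio a/b, which equals run time of b over run time of a."""
    return [x / y for x, y in zip(a, b)]


if __name__ == "__main__":
    for cores in (1, 4, 8):
        ratios = speedups(SUN[cores], SGI[cores])
        best = max(ratios)
        print(f"{cores} core(s): max {100 * (best - 1):.1f}% "
              f"({TESTS[ratios.index(best)]}), "
              f"geomean {100 * (geometric_mean(ratios) - 1):.1f}%")
```

The geometric mean is the conventional way to summarize ratios such as speedups, because it treats a 2x gain and a 2x loss symmetrically.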
Other interesting points:

  • The "Fluent 6" standard benchmark test suite consists of "small", "medium", and "large" test cases. However, the small and medium test cases are all really on the small side and do not scale well beyond 16 cores.
  • The largest test case in the suite, FL5L3, requires 9 GB of memory when run on a single core of a single node. This per-node memory requirement is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple multi-core nodes.
  • Fluent runs are CPU-intensive, and sometimes memory-intensive, but do not require high-performance I/O file systems.
  • Very recently, Fluent has developed a new benchmark test suite with extremely large models, specifically intended to run either on large multi-core servers or on large multi-node clusters of multi-core platforms.
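As a rough illustration of why DMP mode lowers the per-node requirement, the sketch below assumes the 9 GB quoted for FL5L3 scales with cell count and is split evenly across nodes. Both are simplifying assumptions: real parallel runs also hold copies of partition-boundary cells and per-process data, so actual per-node usage is typically somewhat higher than these figures.

```python
CELLS_FL5L3 = 9_792_512
SERIAL_MEMORY_GB = 9.0  # figure quoted in the post for a one-core run


def bytes_per_cell(total_gb: float, cells: int) -> float:
    """Average memory per cell; assumes 1 GB = 10**9 bytes."""
    return total_gb * 1e9 / cells


def memory_per_node_gb(total_gb: float, nodes: int) -> float:
    """Idealized estimate: mesh split evenly, no boundary or per-process overhead."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return total_gb / nodes


if __name__ == "__main__":
    print(f"~{bytes_per_cell(SERIAL_MEMORY_GB, CELLS_FL5L3):.0f} bytes per cell")
    for nodes in (1, 2, 4):
        print(f"{nodes} node(s): ~{memory_per_node_gb(SERIAL_MEMORY_GB, nodes):.2f} GB each")
```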

Workload Details

Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, etc. The test cases represent a range of typical industry simulations.


Descriptions
Class   Benchmark       Cells   Mesh                    Models                Solver               Description
small
        FL5S1          32,000   hexahedral              ke                    segregated implicit  turbulent flow in a bend
        FL5S2          32,000   hexahedral              ke                    coupled implicit     turbulent flow in a bend
        FL5S3          89,856   hexahedral              ke                    coupled implicit     flow in a compressor, rotor 37
medium
        FL5M1         155,188   tetrahedral             ke 6spe reac DPM P1   segregated implicit  coal combustion in a boiler, with particle tracking
        FL5M2         242,782   hybrid, hanging-node    ke                    segregated implicit  turbulent flow in an engine valveport
        FL5M3         352,800   hexahedral              ke 6spe reac          segregated implicit  combustion in a high velocity burner
large
        FL5L1         847,746   hexahedral              ke                    coupled explicit     transonic flow around a fighter
        FL5L2       3,618,080   hybrid                  RNG ke                segregated implicit  external aerodynamics around a car body
        FL5L3       9,792,512   hexahedral              RSM                   segregated implicit  turbulent flow in a transition duct

Model abbreviations: ke = k-epsilon turbulence; RNG ke = RNG k-epsilon turbulence; RSM = Reynolds Stress Model; 6spe reac = 6 species with reaction; DPM = discrete phase (particle) model; P1 = P1 radiation model.
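The size classes used in the sections below can be expressed as a simple threshold function. A small Python sketch using the cell counts from the table above; how a mesh of exactly 100,000 or 500,000 cells would be classified is an assumption, since the class definitions leave the boundaries open:

```python
# Cell counts from the descriptions table.
BENCHMARKS = {
    "FL5S1": 32_000,
    "FL5S2": 32_000,
    "FL5S3": 89_856,
    "FL5M1": 155_188,
    "FL5M2": 242_782,
    "FL5M3": 352_800,
    "FL5L1": 847_746,
    "FL5L2": 3_618_080,
    "FL5L3": 9_792_512,
}


def size_class(cells: int) -> str:
    """Small below 100,000 cells, medium up to 500,000 (boundaries assumed), else large."""
    if cells < 100_000:
        return "small"
    if cells <= 500_000:
        return "medium"
    return "large"


if __name__ == "__main__":
    for name, cells in BENCHMARKS.items():
        # The letter after "FL5" (S, M, L) encodes the published class.
        print(f"{name}: {cells:>9,} cells -> {size_class(cells)}")
```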

Small Class

Small class problems contain fewer than 100,000 cells.

FL5S1 - Accelerating turbulent flow in an elbow duct using segregated implicit solver

Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The segregated implicit solver
in FLUENT 5 is used to solve this flow.

Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit

FL5S2 - Accelerating turbulent flow in an elbow duct using coupled implicit solver

Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The coupled implicit solver in
FLUENT 5 is used to solve this flow.

Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit
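FL5S1 and FL5S2 differ only in the solver: a segregated solver solves the governing equations one after another, while a coupled solver solves them simultaneously. As a loose analogy only (Fluent's actual solvers handle large nonlinear systems and are far more involved), the hypothetical 2x2 linear system below is solved both ways:

```python
def coupled_solve(a, b):
    """Solve both equations of a 2x2 linear system at once (Cramer's rule)."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x = (b[0] * a22 - a12 * b[1]) / det
    y = (a11 * b[1] - a21 * b[0]) / det
    return x, y


def segregated_solve(a, b, tol=1e-12, max_iter=1000):
    """Solve each equation for its own unknown while holding the other fixed,
    and repeat (block Gauss-Seidel) until the updates stop changing."""
    (a11, a12), (a21, a22) = a
    x = y = 0.0
    for iteration in range(1, max_iter + 1):
        x_new = (b[0] - a12 * y) / a11      # equation 1 solved for x
        y_new = (b[1] - a21 * x_new) / a22  # equation 2 solved for y, using the new x
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return (x_new, y_new), iteration
        x, y = x_new, y_new
    raise RuntimeError("segregated iteration did not converge")


if __name__ == "__main__":
    A = ((4.0, 1.0), (1.0, 3.0))  # hypothetical, diagonally dominant toy system
    rhs = (9.0, 7.0)
    print("coupled:   ", coupled_solve(A, rhs))
    print("segregated:", segregated_solve(A, rhs))
```

The coupled route gets both unknowns in one step but must handle the whole system at once; the segregated route is cheaper per step but needs several outer iterations. Very loosely, this mirrors why a coupled CFD solver typically needs more memory per cell than a segregated one.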

FL5S3 - Transonic flow through a fan rotor

The flow through a transonic fan rotor (designated rotor 37 by NASA
Lewis) was computed. It has 36 blades. The calculation was performed at
a rotational speed of 17189 rpm. The domain boundaries consist of a
hub, blade and shroud surface, a pressure inlet and outlet surface, and
periodic surfaces.
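Two quantities follow directly from the numbers above: the rotational speed in radians per second, and the blade-passing frequency (revolutions per second times the number of blades). If the periodic surfaces bound a single blade passage, which is an assumption since the description does not say how many passages are meshed, the modeled sector spans 360/36 = 10 degrees. A Python sketch:

```python
from math import pi

BLADES = 36
RPM = 17_189


def angular_speed_rad_s(rpm: float) -> float:
    """Convert revolutions per minute to radians per second."""
    return rpm * 2.0 * pi / 60.0


def blade_passing_frequency_hz(rpm: float, blades: int) -> float:
    """Rate at which blades sweep past a fixed point: rev/s times blade count."""
    return rpm / 60.0 * blades


def passage_pitch_deg(blades: int) -> float:
    """Angular width of one blade passage (assumed to be the periodic sector)."""
    return 360.0 / blades


if __name__ == "__main__":
    print(f"angular speed: {angular_speed_rad_s(RPM):.1f} rad/s")
    print(f"blade-passing frequency: {blade_passing_frequency_hz(RPM, BLADES):.1f} Hz")
    print(f"single-passage sector: {passage_pitch_deg(BLADES):.1f} deg")
```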

Number of cells 89,856
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit


Medium Class

Medium class problems contain between 100,000 and 500,000 cells.

FL5M1 - Coal combustion in a boiler

This application couples a continuous gas phase calculation with a
discrete phase (particle) calculation. 500 coal particles are injected
into an industrial boiler where their trajectories are computed using a
Lagrangian formulation that includes dispersed phase inertia,
hydrodynamic drag and the force of gravity. Each particle injection is
subject to heating/cooling, vaporization, boiling and solid combustion.
During the injection calculations, momentum, heat and mass exchanges
are calculated and stored as source terms which are then used in the
subsequent gas phase calculation. Furthermore, stochastic modeling of
particle tracks, requiring a fixed number of "tries" per particle, are
used to account for local turbulent fluctuations. In this calculation,
10 stochastic tries per particle are used, resulting in a total of 5000
particle tracks per discrete phase update. There are 10 continuous
phase iterations per discrete phase update.
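The particle bookkeeping described above reduces to simple arithmetic: 500 particles times 10 stochastic tries gives 5,000 tracks per discrete phase update, with one update every 10 continuous phase iterations. The sketch below spells that out; the generator's exact ordering (a particle update at the start of each block of gas-phase iterations) is a hypothetical illustration, not Fluent's documented scheduling.

```python
PARTICLES = 500
TRIES_PER_PARTICLE = 10
GAS_ITERS_PER_DPM_UPDATE = 10


def tracks_per_update(particles: int, tries: int) -> int:
    """Each injected particle is tracked once per stochastic try."""
    return particles * tries


def coupling_schedule(total_gas_iters: int, gas_per_dpm: int):
    """Yield ("dpm", n) before every block of gas_per_dpm ("gas", i) iterations.

    The ordering is a hypothetical illustration of the coupling pattern.
    """
    updates = 0
    for i in range(total_gas_iters):
        if i % gas_per_dpm == 0:
            updates += 1
            yield ("dpm", updates)
        yield ("gas", i + 1)


if __name__ == "__main__":
    steps = list(coupling_schedule(30, GAS_ITERS_PER_DPM_UPDATE))
    n_dpm = sum(1 for kind, _ in steps if kind == "dpm")
    total_tracks = n_dpm * tracks_per_update(PARTICLES, TRIES_PER_PARTICLE)
    print(f"{n_dpm} particle updates, {total_tracks} tracks in 30 gas iterations")
```

For example, 30 gas phase iterations under this schedule contain 3 discrete phase updates and therefore 15,000 particle tracks.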

Number of cells 155,188
Cell type tetrahedral
Models k-epsilon turbulence, 6 species with reaction, dispersed phase, P1 radiation
Solver segregated implicit

FL5M2 - Turbulent flow in an engine valveport

Flow is computed in an automotive valve port modeled using a zonal
hybrid mesh. The region around the valve has been meshed with
tetrahedral cells, while the duct providing the inlet flow to the valve
has been meshed with hexahedra. Pyramid cells are used to transition
between the hexahedral and tetrahedral cells. A fourth cell type called
a prismatic (or wedge) cell is used for the cylinder downstream of the
valve. Furthermore, hanging-node adaption was used to improve the
accuracy of the predicted flow field.
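Why pyramids are needed: two cells can share a face cleanly only if the face shapes match. Hexahedra have only quadrilateral faces and tetrahedra only triangular ones, so they cannot be joined face to face; a pyramid has one quadrilateral base and four triangular sides, so it can sit against both. The sketch below encodes the face makeup of the four cell types named in the description (the table and helpers are illustrative, not a Fluent data structure):

```python
# Face makeup of the four cell types named above (illustrative table):
# (triangular faces, quadrilateral faces, nodes)
CELL_TYPES = {
    "tetrahedron": (4, 0, 4),
    "pyramid": (4, 1, 5),
    "prism": (2, 3, 6),  # also called a wedge
    "hexahedron": (0, 6, 8),
}


def can_share_face(a: str, b: str, face: str) -> bool:
    """True if both cell types have at least one face of the given shape."""
    if face not in ("tri", "quad"):
        raise ValueError("face must be 'tri' or 'quad'")
    idx = 0 if face == "tri" else 1
    return CELL_TYPES[a][idx] > 0 and CELL_TYPES[b][idx] > 0


def conformal_neighbors(a: str, b: str) -> bool:
    """True if the two cell types can meet across some matching face."""
    return can_share_face(a, b, "tri") or can_share_face(a, b, "quad")


if __name__ == "__main__":
    for pair in [("hexahedron", "tetrahedron"), ("hexahedron", "pyramid"),
                 ("pyramid", "tetrahedron"), ("prism", "tetrahedron")]:
        print(pair, conformal_neighbors(*pair))
```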

Number of cells 242,782
Cell type hybrid hanging-node adaption
Models k-epsilon turbulence
Solver segregated implicit

FL5M3 - Combustion in a high velocity burner

Fuel (CH4) is injected into ports of a high velocity gas burner located
near the centerline. Air is supplied through the outer ports, with
secondary air delivered into an outer annular region. Directly
downstream of the annulus is a wedge-shaped annular baffle. The mixing
of fuel and air occurs downstream of this baffle and recirculation
zones behind the baffle provide stability and an attachment point for
the flame in the main combustion chamber. Combustion is assumed to
proceed via a two-step reaction mechanism, with turbulent mixing as the
limiting rate, as described by the Magnussen model.

Reference: M. Cavelli, A. Milani, "Spark-ignited wide stability gas
burner for on/off and continuous duty," IFRF HT Meeting, Milan, October
1996.
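The idea behind a mixing-limited rate is that reaction proceeds as fast as turbulent eddies break up, on a time scale of k/epsilon, and is capped by whichever of fuel, oxidizer, or hot products is scarcest. The sketch below is the single-step Magnussen-Hjertager expression with its commonly quoted constants; Fluent's eddy-dissipation implementation applies a related form per reaction, and the constants and two-step mechanism actually used in FL5M3 are not given here, so the inputs are hypothetical.

```python
def magnussen_fuel_rate(rho, k, eps, y_fuel, y_ox, y_prod, s, a=4.0, b=0.5):
    """Mixing-limited fuel consumption rate, Magnussen-Hjertager form (kg/m^3/s).

    rho    density (kg/m^3)
    k      turbulent kinetic energy (m^2/s^2)
    eps    turbulent dissipation rate (m^2/s^3)
    y_*    mass fractions of fuel, oxidizer and products
    s      stoichiometric oxidizer-to-fuel mass ratio
    a, b   model constants (4.0 and 0.5 are the commonly quoted values)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    mixing_rate = eps / k  # inverse of the turbulent eddy time scale k/eps
    limiting = min(y_fuel, y_ox / s, b * y_prod / (1.0 + s))
    return a * rho * mixing_rate * limiting


if __name__ == "__main__":
    # Hypothetical local state; s = 4.0 for single-step CH4 + 2 O2 (64 g O2 per 16 g CH4).
    print(magnussen_fuel_rate(rho=1.0, k=1.0, eps=100.0,
                              y_fuel=0.05, y_ox=0.1, y_prod=0.1, s=4.0))
```

In the example call the product term (0.01) is the smallest of the three, so it sets the rate.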

Number of cells 352,800
Cell type hexahedral
Models k-epsilon turbulence, 6 species with reaction
Solver segregated implicit

Large Class

Large class problems contain more than 500,000 cells.

FL5L1 - Transonic flow around a fighter aircraft

Flow around the AGARD M-151 combat aircraft research model is computed.
The simulation geometry contains canards and forward swept wings, but
no tail. The conditions modeled were Mach number 0.9 and 10.46 degrees
angle of attack.
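Far-field boundaries for a case like this are typically specified as a freestream Mach number plus flow-direction components. The sketch below converts the 10.46 degree angle of attack into those components, assuming x along the body axis, y upward, and no sideslip; the speed of sound is a caller-supplied assumption, since the description does not give the freestream temperature.

```python
from math import cos, radians, sin

MACH = 0.9
ALPHA_DEG = 10.46


def freestream_direction(alpha_deg: float):
    """Unit flow-direction components (x, y) for angle of attack alpha, no sideslip."""
    a = radians(alpha_deg)
    return cos(a), sin(a)


def freestream_velocity(mach: float, speed_of_sound: float, alpha_deg: float):
    """Freestream velocity components (m/s) from Mach number and a given speed of sound."""
    dx, dy = freestream_direction(alpha_deg)
    speed = mach * speed_of_sound
    return speed * dx, speed * dy


if __name__ == "__main__":
    print("direction:", freestream_direction(ALPHA_DEG))
    # 340.3 m/s is the sea-level standard-atmosphere value, used here as an assumption.
    print("velocity: ", freestream_velocity(MACH, 340.3, ALPHA_DEG))
```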

Number of cells 847,764
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit

FL5L2 - Exterior flow around a passenger sedan

This benchmark represents the computation of the exterior flow field
around a simplified model of a passenger sedan. The simulation geometry
was used for the Japan External Aerodynamics competition. A
viscous-hybrid grid with prismatic cells is used to adequately model
the boundary layer regions.
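Prismatic boundary-layer cells are usually grown off the wall in layers whose heights increase geometrically, so the total thickness of the stack follows the geometric series h1 (r^n - 1)/(r - 1). A sketch of that sizing arithmetic; the first-layer height, growth ratio, and layer count in the example are hypothetical, since the benchmark description does not give them:

```python
def prism_stack_thickness(first_height: float, growth: float, layers: int) -> float:
    """Total height of n prism layers with geometric growth: h1 * (r**n - 1) / (r - 1)."""
    if layers < 1:
        raise ValueError("need at least one layer")
    if growth == 1.0:
        return first_height * layers  # uniform layers: the series degenerates
    return first_height * (growth ** layers - 1.0) / (growth - 1.0)


def layer_heights(first_height: float, growth: float, layers: int):
    """Individual layer heights, innermost (wall) layer first."""
    return [first_height * growth ** i for i in range(layers)]


if __name__ == "__main__":
    # Hypothetical sizing: 0.1 mm first layer, 20% growth, 10 layers.
    h1, r, n = 1e-4, 1.2, 10
    print(f"outermost layer: {layer_heights(h1, r, n)[-1] * 1e3:.3f} mm")
    print(f"stack thickness: {prism_stack_thickness(h1, r, n) * 1e3:.3f} mm")
```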

Number of cells 3,618,080
Cell type hybrid
Models k-epsilon turbulence
Solver segregated implicit

FL5L3 - Turbulent flow through a transition duct

Turbulent flow of air through a duct is computed for this benchmark.
The cross-sectional planes of the duct transition from a circle at the
inlet to a rectangle at the outflow boundary. The Reynolds Stress Model
(RSM, 7 equations) is used to compute turbulence.
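The "7 equations" count follows from symmetry: the Reynolds stress tensor u_i'u_j' is symmetric, so in three dimensions only six of its nine components are independent, and one more transport equation is solved for the dissipation rate. A short Python check:

```python
from itertools import combinations_with_replacement


def reynolds_stress_components():
    """Independent components of the symmetric 3D tensor u_i' u_j' (i <= j)."""
    return [f"{a}'{b}'" for a, b in combinations_with_replacement("uvw", 2)]


def rsm_equation_count() -> int:
    """Six Reynolds-stress transport equations plus one for the dissipation rate."""
    return len(reynolds_stress_components()) + 1


if __name__ == "__main__":
    print(reynolds_stress_components())
    print(rsm_equation_count(), "transport equations")
```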

Number of cells 9,792,512
Cell type hexahedral
Models RSM turbulence
Solver segregated implicit

The Sun Blade X6250 cluster outperformed the following competitors' clusters at every core count considered (1-core SMP, 1-core parallel, and 2-, 4-, 8-, and 16-core parallel runs) and on all nine test cases in the benchmark suite:

    HP BL460C (EM64T_WOODCREST_2CORE,3000,WINCCS,IB_HPMPI)
    HP DL140 (EM64T_WOODCREST_2CORE,3000,LINUX,IB)
    HP DL145_G2 (OPTERON_2CORE,2200,WINCCS,IB_HPMPI)
    SGI ALTIX4700 (IA64_MONTECITO_2CORE,1600,LINUX)
    SGI ALTIXXE210 (EM64T_WOODCREST_2CORE,3000,LINUX,IB_VOLTAIRE)
    TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,SLES10,GIGE)
    TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,WINCCS,GIGE)
    BULL NOVASCALE (EM64T_WOODCREST_2CORE,3000,RHEL4,IB)
    APPRO XTREMESERVER (OPTERON_2CORE,2800,RHEL4,IB)

Disclosure Statement:

All information on the Fluent website is Copyright 1995-2007 Fluent Inc. Results are from http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm as of July 2, 2007.

Hardware Configuration (Sun Blade X6250):

    4 two-socket Sun Blade X6250 blades
    2 x 3.0 GHz dual-core Intel Xeon EM64T 5160 (Woodcrest) per blade
    InfiniBand (Voltaire) interconnect (PCI Express HCAs)

Software Configuration:

    64-bit SUSE SLES 10
    Fluent V6.3.26
    Fluent 6 Standard Benchmark Test Suite
    Voltaire GridStack 4.1.5-7 for SLES 10
