Friday Jan 25, 2008
The ABAQUS "Explicit" benchmark test suite was run on a mini cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual-core Intel Xeon 5260. The Sun Blade X6250 mini cluster beats all results posted at the ABAQUS V6.7 website up to eight cores.
The closest posted results from competitors' platforms came primarily from an HP XC with dual-core 3GHz 5160 processors and, to a limited degree (at the 4-"cpu" level), from an Intel Supermicro with 3GHz quad-core E5472s. Across the six cases in the benchmark test suite and the four core levels considered, the X6250 cluster was nominally 17% faster than the best result from either the top HP or the top Intel cluster.
The scalability efficiency of the X6250 cluster ranged from 100% (at 1 core) to 81% (geometric mean at 8 cores), considering all 6 test cases at each of the four core levels.
Four two-socket Sun Blade X6250 blades with InfiniBand interconnects were used, and runs were made at four core levels: 1, 2, 4, and 8.
Comparisons are presented against the current leading competitors' results, also obtained with high performance interconnects and posted at the ABAQUS V6.7 website. These include results from IBM, HP, and Intel platforms and clusters with current dual-core and quad-core Intel processors.
ABAQUS V6.7 "Explicit" Benchmark Test Suite, time in elapsed seconds
Please note: this table has been modified since the original posting to correct it and to make sure only V6.7 results are shown. Sorry for the confusion; the Sun internal information sites changed after my posting.
| System | CPU | e1 | e2 | e3 | e4 | e5 | e6 |
|---|---|---|---|---|---|---|---|
| One core results | | | | | | | |
| Sun Blade X6250 | 3.33GHz DC 5260 | 23565 | 12399 | 11037 | 4884 | 4648 | 11975 |
| Sun Blade X6250 | 3.0GHz QC 5365 | 26401 | 14236 | 12302 | 5456 | 5349 | 13266 |
| Intel Supermicro | 3.0GHz QC E5472 | 24815 | 13738 | 12504 | 5273 | 5299 | 13456 |
| HP XC | 3.0GHz DC 5160 | 23957 | 13659 | 11289 | 5157 | 5122 | 12601 |
| Bull R440 | 3.0GHz DC 5160 | 25132 | 14086 | 12237 | 5352 | 5231 | 13213 |
| Two core results | | | | | | | |
| Sun Blade X6250 | 3.33GHz DC 5260 | 12008 | 6465 | 5218 | 2647 | 2447 | 6739 |
| Sun Blade X6250 | 3.0GHz QC 5365 | 14262 | 7501 | 6379 | 2959 | 2742 | 7486 |
| Intel Supermicro | 3.0GHz QC E5472 | 14060 | 7151 | 6341 | 2900 | 2693 | 7880 |
| HP XC | 3.0GHz DC 5160 | 13229 | 6998 | 6201 | 2838 | 2657 | 7336 |
| Bull R440 | 3.0GHz DC 5160 | 13859 | 7283 | 6575 | 2997 | 2756 | 7752 |
| Four core results | | | | | | | |
| Sun Blade X6250 | 3.33GHz DC 5260 | 7868 | 3888 | 3064 | 1482 | 1328 | 4025 |
| Sun Blade X6250 | 3.0GHz QC 5365 | 8595 | 4195 | 3372 | 1577 | 1440 | 4375 |
| Intel Supermicro | 3.0GHz QC E5472 | 8264 | 3857 | 3438 | 1616 | 1440 | 4534 |
| HP XC | 3.0GHz DC 5160 | 9843 | 4434 | 4413 | 1856 | 1619 | 5235 |
| Bull R440 | 3.0GHz DC 5160 | 10067 | 4559 | 4485 | 1964 | 1651 | 5378 |
| Eight core results | | | | | | | |
| Sun Blade X6250 | 3.33GHz DC 5260 | 5209 | 2439 | 1922 | 979 | 736 | 2510 |
| Sun Blade X6250 | 3.0GHz QC 5365 | 5650 | 2556 | 2158 | 1090 | 824 | 2774 |
| Intel Supermicro | 3.0GHz QC E5472 | 6077 | 2473 | 2529 | 1205 | 910 | 3339 |
| HP XC | 3.0GHz DC 5160 | 5140 | 2311 | 2280 | 1074 | 823 | 2948 |
| Bull R440 | 3.0GHz DC 5160 | 5366 | 2406 | 2303 | 1127 | 860 | 3092 |
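The post does not say exactly how its scalability efficiency figures were computed. A common convention is efficiency = T1 / (n × Tn), where T1 is the one-core elapsed time and Tn the elapsed time on n cores. The sketch below assumes that convention (the function names are illustrative only) and applies it to the e5 times of the 3.33GHz X6250 cluster. Other conventions, such as measuring against a multi-core baseline, give different numbers.

```python
def speedup(t_one_core: float, t_n_cores: float) -> float:
    """Speedup of an n-core run relative to the one-core run (elapsed seconds)."""
    return t_one_core / t_n_cores


def parallel_efficiency(t_one_core: float, t_n_cores: float, cores: int) -> float:
    """Assumed definition: speedup divided by core count (1.0 means perfect scaling)."""
    return speedup(t_one_core, t_n_cores) / cores


# e5 elapsed times (seconds), Sun Blade X6250 with 3.33GHz DC 5260, from the table above
t1, t8 = 4648, 736
print(f"e5 speedup on 8 cores: {speedup(t1, t8):.2f}")
print(f"e5 efficiency on 8 cores: {parallel_efficiency(t1, t8, 8):.1%}")
```

Applying the same two functions to each test case and core level gives a per-case view of scaling that can be compared with the summary figures above.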
About The ABAQUS Explicit Module
This module, designed for crash and high-velocity impact analyses, is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results for these typically large jobs are best obtained using multiple processing units, either on a single multi-core server in smp mode or on a multi-node cluster of interconnected multi-core platforms in dmp mode.
- The test cases in the ABAQUS "Explicit" benchmark test suite do not require much memory (all around a few hundred megabytes).
- The ABAQUS test cases scale very well up to 16 cores. All of the solvers in the Explicit module work in dmp mode on clusters. The default MPI implementation for ABAQUS is HP-MPI.
- Based on the maximum physical memory on a platform, the user can stipulate the maximum portion of this memory that can be allocated to the ABAQUS job. This is done in the "abaqus_v6.env" file, which resides either in the directory from which the job was launched or in the ABAQUS "site" subdirectory under the installation's home directory.
- The test cases for the ABAQUS benchmark test suites all have a substantial I/O component, primarily associated with temporary scratch files. Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high performance disk storage system with high performance interconnects.
System Configuration
4 Sun Blade X6250
3.33 GHz dual-core Intel 5260
2 internal striped 15K SAS drives (cluster shared file system)
Infiniband (Voltaire) interconnects
64-bit SUSE Linux Enterprise Server SLES 10
Voltaire OFED GridStack-4.1.5_7-sles-k2.6.16.21-0.8-smp-x86_64
HP-MPI
ABAQUS V6.7 Explicit Module
ABAQUS 6.7 Explicit Benchmark Test Suite
Disclosure Statement:
The following are trademarks or registered trademarks of Dassault Systemes or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit.
All information on the ABAQUS website is copyright 2004-2007 by Dassault Systemes.
Results from http://www.simulia.com/support/v67/v67_performance.html as of Jan. 18, 2008.
Thursday Jul 12, 2007
The Sun Blade X6250 cluster was up to 27% faster, and 6% faster on the geometric mean, than an SGI Altix XE 210 cluster (Xeon 3 GHz dual-core 5160 Woodcrest) with InfiniBand interconnects.
A cluster of four Sun Blade X6250 blades with InfiniBand interconnects was used to set this record. Each two-socket blade had two dual-core 3 GHz Intel Xeon EM64T 5160 (Woodcrest) processors, for 16 cores in total.
Running the computational fluid dynamics (CFD) program Fluent on the "Fluent 6" standard benchmark, the Sun Blade X6250 (Xeon 3 GHz 5160) cluster established a world record for runs of the test suite using from 1 to 16 cores.
Workload description
Fluent is one of the most prominent commercial CFD (Computational Fluid Dynamics) codes.
It is distributed worldwide to major engineering organizations in a broad spectrum of disciplines
(aircraft, aerospace, automotive, marine, etc.) that are involved with fluid flow in some manner.
Like many major ISVs, Fluent has developed a benchmark test suite to evaluate the performance of platforms, and for several years results from hardware vendors' platforms have been posted at the Fluent website.
CFD models tend to be extremely large (fluid flow over entire car, aircraft, and submarine bodies, and complex flows involving mixing of species and chemical reaction). To keep run times reasonable, these analyses must use many processing units. Currently the most effective way of achieving this is an interconnected cluster of multi-core rack-mounted servers or blades, and the current set of entries posted at the Fluent website reflects this.
FLUENT 6 Benchmark ("Ratings", bigger is better)
Rating = number of sequential runs in 1 day = 86,400 / (total elapsed run time in seconds)
| Machine | Sockets | NCPUS | FL5M1 | FL5M2 | FL5M3 | FL5L1 | FL5L2 | FL5L3 |
|---|---|---|---|---|---|---|---|---|
| Sun Blade X6250 3GHz WC 5160 | 2 | 8 | 4965.5 | 10504.6 | 2563.8 | 1399.2 | 1028.3 | 174.9 |
| SGI Altix XE210 3GHz WC 5160 | 2 | 8 | 4937.1 | 9626.7 | 2014.0 | 1343.7 | 899.5 | 161.0 |
| Sun Blade X6250 3GHz WC 5160 | 2 | 4 | 2780.4 | 5358.1 | 1336.9 | 731.7 | 573.7 | 101.2 |
| SGI Altix XE210 3GHz WC 5160 | 2 | 4 | 2681.1 | 4657.7 | 998.0 | 679.2 | 449.7 | 80.7 |
| Sun Blade X6250 3GHz WC 5160 | 2 | serial | 919.4 | 1465.6 | 352.9 | 207.2 | 142.6 | 27.6 |
| SGI Altix XE210 3GHz WC 5160 | 2 | serial | 910.9 | 1445.4 | 349.5 | 204.1 | 136.6 | 26.8 |
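Because a rating is defined as 86,400 divided by the elapsed time of one run, each rating can be converted back into a run time, and two ratings can be compared directly (bigger is better). A minimal sketch; the function names are illustrative and not part of the Fluent benchmark kit:

```python
SECONDS_PER_DAY = 86_400


def rating_from_time(elapsed_seconds: float) -> float:
    """Fluent rating: number of sequential runs that would fit in one day."""
    return SECONDS_PER_DAY / elapsed_seconds


def time_from_rating(rating: float) -> float:
    """Invert a rating to recover the elapsed time of one run, in seconds."""
    return SECONDS_PER_DAY / rating


def percent_faster(rating_a: float, rating_b: float) -> float:
    """How much faster system A is than system B, given their ratings."""
    return (rating_a / rating_b - 1.0) * 100.0


# FL5L3 on 8 cores, Sun Blade X6250 (rating from the table above)
print(f"Sun FL5L3 run time: {time_from_rating(174.9):.0f} s")
# FL5M3 on 8 cores, Sun Blade X6250 vs SGI Altix XE210
print(f"Sun faster on FL5M3: {percent_faster(2563.8, 2014.0):.0f}%")
```

The FL5M3 comparison at eight cores reproduces the 27% figure quoted at the top of this post.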
Other interesting points:
- The "Fluent 6" standard benchmark test suite consists of "small", "medium", and "large" test cases. However, the small and medium test cases are all really on the small side and do not scale well beyond 16 cores.
- The largest test case in the suite, FL5L3, requires 9 GB when running on only one core of a single node. This per-node memory requirement is reduced when running in dmp cluster mode on multiple multi-core nodes.
- Fluent runs are CPU-intensive and sometimes memory-intensive, but do not require high performance I/O file systems.
- Very recently Fluent has developed a new benchmark test suite with extremely large models, specifically intended to be run either on large multi-core servers or on large multi-node clusters of multi-core platforms.
Workload Details
Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, etc. The test cases represent a range of typical industry simulations.
Descriptions
| Class | Benchmark | Cells | Mesh | Models | Solver | Description |
|---|---|---|---|---|---|---|
| small | FL5S1 | 32,000 | hexahedral | ke | segregated implicit | turbulent flow in a bend |
| small | FL5S2 | 32,000 | hexahedral | ke | coupled implicit | turbulent flow in a bend |
| small | FL5S3 | 89,856 | hexahedral | ke | coupled implicit | flow in a compressor, rotor 37 |
| medium | FL5M1 | 155,188 | tetrahedral | ke, 6 species with reaction, DPM, P1 | segregated implicit | coal combustion in a boiler, with particle tracking |
| medium | FL5M2 | 242,782 | hybrid, hanging-node | ke | segregated implicit | turbulent flow in an engine valveport |
| medium | FL5M3 | 352,800 | hexahedral | ke, 6 species with reaction | segregated implicit | combustion in a high velocity burner |
| large | FL5L1 | 847,746 | hexahedral | ke | coupled explicit | transonic flow around a fighter |
| large | FL5L2 | 3,618,080 | hybrid | RNG ke | segregated implicit | external aerodynamics around a car body |
| large | FL5L3 | 9,792,512 | hexahedral | RSM | segregated implicit | turbulent flow in a transition duct |
Small Class
Small class problems contain fewer than 100,000 cells.
FL5S1 - Accelerating turbulent flow in an elbow duct using segregated implicit solver
Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The segregated implicit solver
in FLUENT 5 is used to solve this flow.
Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit
FL5S2 - Accelerating turbulent flow in an elbow duct using coupled implicit solver
Flow is accelerated through a 90 degree elbow duct with a rectangular
cross section. The geometry and flow have a symmetry plane permitting
the modeling of only half the domain. Because of the curvature of the
duct, significant secondary flow occurs, with velocity components
normal to the principal flow direction. The coupled implicit solver in
FLUENT 5 is used to solve this flow.
Number of cells 32,000
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit
FL5S3 - Transonic flow in rotating fan
Transonic Flow through a Rotor
The flow through a transonic fan rotor (designated rotor 37 by NASA
Lewis) was computed. It has 36 blades. The calculation was performed at
a rotational speed of 17189 rpm. The domain boundaries consist of a
hub, blade and shroud surface, a pressure inlet and outlet surface, and
periodic surfaces.
Number of cells 89,856
Cell type hexahedral
Models k-epsilon turbulence
Solver coupled implicit
Medium Class
Medium class problems contain between 100,000 and 500,000 cells.
FL5M1 - Coal combustion in a boiler
This application couples a continuous gas phase calculation with a
discrete phase (particle) calculation. 500 coal particles are injected
into an industrial boiler where their trajectories are computed using a
Lagrangian formulation that includes dispersed phase inertia,
hydrodynamic drag and the force of gravity. Each particle injection is
subject to heating/cooling, vaporization, boiling and solid combustion.
During the injection calculations, momentum, heat and mass exchanges
are calculated and stored as source terms which are then used in the
subsequent gas phase calculation. Furthermore, stochastic modeling of
particle tracks, requiring a fixed number of "tries" per particle, is
used to account for local turbulent fluctuations. In this calculation,
10 stochastic tries per particle are used, resulting in a total of 5000
particle tracks per discrete phase update. There are 10 continuous
phase iterations per discrete phase update.
Number of cells 155,188
Cell type tetrahedral
Models k-epsilon turbulence, 6 species with reaction, dispersed phase,
P1 radiation
Solver segregated implicit
FL5M2 - Turbulent flow in an engine valveport
Flow is computed in an automotive valve port modeled using a zonal
hybrid mesh. The region around the valve has been meshed with
tetrahedral cells, while the duct providing the inlet flow to the valve
has been meshed with hexahedra. Pyramid cells are used to transition
between the hexahedral and tetrahedral cells. A fourth cell type called
a prismatic (or wedge) cell is used for the cylinder downstream of the
valve. Furthermore, hanging-node adaption was used to improve the
accuracy of the predicted flow field.
Number of cells 242,782
Cell type hybrid, hanging-node adaption
Models k-epsilon turbulence
Solver segregated implicit
FL5M3 - Combustion in a high velocity burner
Fuel (CH4) is injected into ports of a high velocity gas burner located
near the centerline. Air is supplied through the outer ports, with
secondary air delivered into an outer annular region. Directly
downstream of the annulus is a wedge-shaped annular baffle. The mixing
of fuel and air occurs downstream of this baffle and recirculation
zones behind the baffle provide stability and an attachment point for
the flame in the main combustion chamber. Combustion is assumed to
proceed via a two-step reaction mechanism, with turbulent mixing as the
limiting rate, as described by the Magnussen model.
Reference: M. Cavelli, A. Milani, "Spark-ignited wide stability gas
burner for on/off and continuous duty," IFRF HT Meeting, Milan, October
1996.
Number of cells 352,800
Cell type hexahedral
Models k-epsilon turbulence, 6 species with reaction
Solver segregated implicit
Large Class
Large class problems contain more than 500,000 cells.
FL5L1 Transonic flow around a fighter aircraft
Flow around the AGARD M-151 combat aircraft research model is computed.
The simulation geometry contains canards and forward swept wings, but
no tail. The conditions modeled were Mach number 0.9 and 10.46 degrees
angle of attack.
Number of cells 847,764
Cell type hexahedral
Models k-epsilon turbulence
Solver segregated implicit
FL5L2 Exterior flow around a passenger sedan
This benchmark represents the computation of the exterior flow field
around a simplified model of a passenger sedan. The simulation geometry
was used for the Japan External Aerodynamics competition. A
viscous-hybrid grid with prismatic cells is used to adequately model
the boundary layer regions.
Number of cells 3,618,080
Cell type hybrid
Models k-epsilon turbulence
Solver segregated implicit
FL5L3 Turbulent flow through a transition duct
Turbulent Flow Through a Transition Duct
Turbulent flow of air through a duct is computed for this benchmark.
The cross-sectional planes of the duct transition from a circle at the
inlet to a rectangle at the outflow boundary. The Reynolds-Stress Model
(7 equation) is used for computing turbulence.
Number of cells 9,792,512
Cell type hexahedral
Models RSM turbulence
Solver segregated implicit
The Sun Blade X6250 cluster outperformed the following competitive hardware vendor clusters at all core levels considered (1-core smp, 1-core parallel, and 2-, 4-, 8-, and 16-core parallel runs) and for all nine test cases in the benchmark test suite:
HP BL460C (EM64T_WOODCREST_2CORE,3000,WINCCS,IB_HPMPI)
HP DL140 (EM64T_WOODCREST_2CORE,3000,LINUX,IB)
HP DL145_G2 (OPTERON_2CORE,2200,WINCCS,IB_HPMPI)
SGI ALTIX4700 (IA64_MONTECITO_2CORE,1600,LINUX)
SGI ALTIXXE210 (EM64T_WOODCREST_2CORE,3000,LINUX,IB_VOLTAIRE)
TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,SLES10,GIGE)
TYAN TYPHOON_630 (EM64T_WOODCREST_2CORE,2300,WINCCS,GIGE)
BULL NOVASCALE (EM64T_WOODCREST_2CORE,3000,RHEL4,IB)
APPRO XTREMESERVER (OPTERON_2CORE,2800,RHEL4,IB)
Disclosure Statement:
All information on the Fluent website is copyright 1995-2007 by Fluent Inc. Results from http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm as of July 2, 2007.
Sun Blade X6250
4 2-socket Sun Blade X6250's
2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest)
Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
Software Configuration:
64-bit SUSE SLES 10
Fluent V6.3.26
Fluent 6 Standard Benchmark Test Suite
Voltaire GridStack 4.1.5-7 for SLES 10
Wednesday Jul 11, 2007
Sun Blade X6250 Posts World Record on the ABAQUS Explicit Benchmark
The Sun Blade X6250, using 3GHz dual-core Xeon 5160 processors, set a world record on the ABAQUS "Explicit" benchmark test suite of the MCAE application ABAQUS V6.6. On the various test cases, Sun beats the Intel Supermicro by 1% to 39%!
The Sun Blade X6250 beats the Intel Supermicro even when all of the test cases are averaged, by 4% to 9% (geometric mean of all 6 test cases at each of the cpu levels listed).
Both machines have 2 sockets and dual-core processors.
Runs were made at 1, 2, and 4 cores, and a geometric mean was established at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.
The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors, running 64-bit SUSE Linux SLES 10, beats all of the following platforms with results posted at the ABAQUS website, for all 6 test cases in the ABAQUS "Explicit" benchmark test suite and at the 3 "cpu" levels (1-, 2-, and 4-"cpu"):
About The ABAQUS Explicit Module
This module, designed for crash and high-velocity impact analyses (including wave propagation and inertia effects), is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results for these typically large jobs are best obtained using multiple processing units. Consequently, this module is meant to run primarily in a multi-cpu setting, either in smp mode on a single large multi-core machine or in dmp mode over a cluster of interconnected multi-core machines.
ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape
(time in seconds, smaller is better; Sun % faster, bigger is better)
| Platform | Cores | e1 | e2 | e3 | e4 | e5 | e6 | Geometric Mean |
|---|---|---|---|---|---|---|---|---|
| Sun Blade X6250/5160 | 4 | 10451 | 4509 | 3853 | 1887 | 1990 | 5202 | |
| Intel Super/5160's/RH4 | 4 | 10696 | 4646 | 3881 | 1997 | 2126 | 5460 | |
| Sun % Faster | | 2% | 3% | 1% | 6% | 7% | 5% | 4% |
| Sun Blade X6250/5160 | 2 | 14232 | 7401 | 5477 | 2935 | 3327 | 7582 | |
| Intel Super/5160's/RH4 | 2 | 14878 | 8044 | 6316 | 3310 | 3483 | 8048 | |
| Sun % Faster | | 5% | 9% | 15% | 13% | 5% | 6% | 9% |
| Sun Blade X6250/5160 | 1 | 24800 | 14198 | 10174 | 5147 | 6112 | 9553 | |
| Intel Super/5160 | 1 | 25076 | 14616 | 10563 | 5225 | 6272 | 13242 | |
| Sun % Faster | | 1% | 3% | 4% | 1% | 3% | 39% | 8% |
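The "Sun % Faster" rows appear to be computed as (competitor time / Sun time - 1), and the geometric-mean column as the geometric mean of those same time ratios. That is an inference from the numbers, not something the post states. A sketch under that assumption (function names are illustrative) reproduces the four-core row:

```python
import math


def percent_faster(competitor_times, sun_times):
    """Per-case advantage: how much longer the competitor took, as a percentage."""
    return [(c / s - 1.0) * 100.0 for c, s in zip(competitor_times, sun_times)]


def geometric_mean_percent(competitor_times, sun_times):
    """Geometric mean of the per-case time ratios, expressed as a percentage."""
    ratios = [c / s for c, s in zip(competitor_times, sun_times)]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


# Four-core elapsed times (seconds) for e1..e6, from the table above
sun = [10451, 4509, 3853, 1887, 1990, 5202]
intel = [10696, 4646, 3881, 1997, 2126, 5460]

print([round(p) for p in percent_faster(intel, sun)])  # [2, 3, 1, 6, 7, 5]
print(round(geometric_mean_percent(intel, sun)))  # 4
```

The geometric mean is used rather than the arithmetic mean so that one long-running case (such as e1) does not dominate the summary; each case contributes only through its ratio.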
Abaqus/Explicit Benchmark Problems
The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications, including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The number of increments listed below is approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.
E1: Car crash
This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain, and the car is given an initial velocity of 25 mph.
E1
Increments: 62,934
Number of elements: 274,632
E2: Cell phone drop
This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types, including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.
E2
Increments: 87,369
Number of elements: 45,785
Memory requirement: 300 MB
E3: Sheet forming
This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.
E3
Increments: 31,177
Number of elements: 34,540 (deformable only)
Memory requirement: 550 MB
E4: Projectile penetration
This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.
E4
Increments: 12,433
Number of elements: 237,100
Memory requirement: 1400 MB
E5: Blast loaded plate
This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.
E5
Increments: 81,716
Number of elements: 50,000
Memory requirement: 150 MB
E6: Concentric spheres
This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.
E6
Increments: 23,291
Number of elements: 244,124
Memory requirement: 1000 MB
ABAQUS "Standard" & "Explicit" Benchmark Test Suites
Voltaire GridStack 4.1.5-7 for SLES 10
Disclosure Statement:
The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus,
Abaqus/Standard, Abaqus/Explicit.
All information on the ABAQUS website is copyright 2004-2007 by Dassault Systemes.
Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.
System Configuration
Hardware Configuration:
Sun Blade X6250
4 2-socket Sun Blade X6250's
2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
Infiniband (Voltaire) Interconnects (PCI-Express HCA's)
Software Configuration:
Linux: 64-bit SUSE SLES 10
ABAQUS V6.6-3
Tuesday Jul 10, 2007
The Sun Blade X6250 outperforms all ANSYS V11.0 (MCAE) results posted at the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000XAL (with the same 3GHz Xeon 5160) by as much as 39% (bm-7 at 4 cores), at each of the three "cpu" levels tested (1, 2, and all 4 cores available on both two-socket platforms equipped with dual-core processors).
Sun wins at each of these processor configurations in 6 of the 7 cases in the benchmark test suite. Overall, on the geometric mean, Sun was 10% faster.
The only case where the Sun X6250 loses, "bm-2", has an exceptionally high I/O component, and even there Sun was only 3-4% slower.
The Sun X6250 had 10K rpm internal disk drives where
the Intel S5000 XAL had 15K rpm drives.
The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors, running 64-bit SUSE Linux SLES 10, beats all of the following platforms with results posted at the ANSYS website, for all 7 test cases in the ANSYS "Standard" benchmark test suite (1-, 2-, and 4-cpu).
Yes, this result was run on Linux; Sun wants to show that we can win with every OS. There is now an officially certified, supported, and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers.
This is the first SX64 version that has become available.
Competitive Landscape
ANSYS V11.0 "Standard" Benchmark Test Suite on the Sun Blade X6250
(run times in seconds, smaller is better; for %, bigger is better)
| System | Cores | bm-1 | bm-2 | bm-3 | bm-4 | bm-5 | bm-6 | bm-7 |
|---|---|---|---|---|---|---|---|---|
| Sun X6250/5160 | 4 | 100 | 1362 | 343 | 164 | 181 | 131 | 752 |
| Intel S5000XAL/5160 | 4 | 109 | 1312 | 369 | 169 | 187 | 161 | 1048 |
| Sun % better | | 9% | -4% | 8% | 3% | 3% | 23% | 39% |
| Sun X6250/5160 | 2 | 118 | 1398 | 385 | 183 | 223 | 169 | 1064 |
| Intel S5000XAL/5160 | 2 | 128 | 1356 | 417 | 186 | 244 | 211 | 1437 |
| Sun % better | | 9% | -3% | 8% | 2% | 9% | 25% | 35% |
| Sun X6250/5160 | 1 | 150 | 1455 | 456 | 211 | 339 | 253 | 1770 |
| Intel S5000XAL/5160 | 1 | 164 | 1416 | 489 | 215 | 340 | 314 | 2330 |
| Sun % better | | 9% | -3% | 7% | 2% | 1% | 24% | 32% |
(Please note: per-core performance isn't the right metric for comparing different CPUs, as system costs vary greatly; core counts are used here only to identify the configuration. It is "SYSTEM" performance, not "core" performance, that matters!)
Key Technical Points
- The test cases from the ANSYS standard benchmark test suite all have a substantial I/O component, with 15% to 20% of the total run time associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high performance disk storage system with high performance interconnects. When running the SX64 build, a ZFS file system may be worth employing.
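One rough way to judge how much faster storage can help is an Amdahl's-law style estimate: if a fraction f of the run time is I/O and the storage becomes k times faster, the whole run speeds up by 1 / ((1 - f) + f / k). The sketch below assumes that I/O time scales perfectly with storage speed (for example, ideal striping across two drives giving k = 2), which real systems rarely achieve, so treat the results as upper-bound style estimates.

```python
def overall_speedup(io_fraction: float, io_speedup: float) -> float:
    """Amdahl-style estimate: only the I/O share of the run time gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0.0:
        raise ValueError("io_speedup must be positive")
    compute_fraction = 1.0 - io_fraction
    return 1.0 / (compute_fraction + io_fraction / io_speedup)


# 15% and 20% I/O share (from the text), with storage assumed to be 2x faster
for f in (0.15, 0.20):
    print(f"I/O share {f:.0%}: whole-run speedup {overall_speedup(f, 2.0):.3f}x")
```

Even infinitely fast storage caps the gain at 1 / (1 - f), about 1.25x for a 20% I/O share, which is why faster disks help but cannot substitute for faster compute.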
ANSYS 11.0 Standard Test Cases
bm-1
Name:Exhaust Elbow Manifold
Description:Static structural analysis. Solved for equivalent stresses.
Statistics:~850,000 DOF Model
bm-2
Name:Floor Panel
Description:Surface body geometry. Harmonic analysis with mode superposition.
Statistics:~765,000 DOF Model
bm-3
Name:Engine Assembly - Piston and Crank
Description:Assembly with contact. Nonlinear structural DOF solution.
Statistics:~250,000 DOF Model
bm-4
Name:Electric Motor
Description:Electromagnetic analysis. Solved for magnetic field intensities.
Statistics:~250,000 DOF Model
bm-5
Name:Brake Rotor
Description:Thermal transient analysis. Solved for temperature DOFs.
Statistics:~230,000 DOF Model
bm-6
Name:Wing Section
Description:Static structural analysis.
Statistics:~250,000 DOF Model
bm-7
Name:Wing Section
Description:Static structural analysis.
Statistics:~800,000 DOF Model
Notes: bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 GB of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 GB of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less than optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.
Disclosure Statement:
The following are trademarks or registered trademarks of ANSYS, Inc. : ANSYS Multiphysics TM All information on the ANSYS website is Copyrighted 2007 by ANSYS, Inc. Results at
http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.
Hardware Configuration:
Sun Blade X6250
4 2-socket Sun Blade X6250's
2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
32 GB memory
Software Configuration:
64-bit Linux SuSE SLES 10
(note: Sun works great with Linux, that is why we show all kinds of benchmarks! )
ANSYS V11.0
ANSYS 11 "Standard" Benchmark Test Suite
Wednesday Jun 13, 2007
Sun Blade X6250 Delivers a pair of x86 SPEC CPU2006 integer performance World Records:
The Sun Blade X6250 (Dual-Core Intel Xeon 5160), running Solaris 10 and using the Sun Studio 12 compiler, delivered the best x86 result for the SPECint2006 benchmark.
The Sun Blade X6250 (Dual-Core Intel Xeon 5160), using Solaris 10 and Sun Studio 12, delivered the x86 4-core world record on SPECint_rate2006.
The Sun Blade X6250 server had a SPECint2006 result of 21.0 and a SPECint_rate2006 result of 65.0. The advanced features of the freely available Sun Studio 12 compiler were critical for getting this level of performance on the Sun Blade X6250.
The Sun Blade X6250 is only 3% slower than the peak score of the very expensive, recently announced IBM POWER6 p570. SPECint2006 is a single job stream, so let's now turn to comparing 4-thread results: here the Sun Blade X6250 is 7% faster than the peak SPECint_rate2006 score of the very expensive new IBM POWER6 p570 (both IBM and Sun at 4 threads). And remember that clock rate is no longer how you compare systems: the Sun Blade X6250 runs at 3GHz and the IBM POWER6 at 4.7GHz. CPU frequency is basically irrelevant; it is CPU and system architecture that matters!
SPEC CPU2006 Landscape - bigger is better, selected recent results
SPECint2006
| System | Processor | GHz | Chips | Cores | Peak | Base |
|---|---|---|---|---|---|---|
| IBM p570 (POWER6) | POWER6 | 4.7 | 1 | 1 | 21.6 | 17.8 |
| Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 21.0 | |
| Supermicro X7DB8+ board | Intel Xeon 5160 | 3.0 | 2 | 4 | 20.8 | 18.9 |
| Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 16.1 | |
SPECint_rate2006
| System | Processor | GHz | Chips | Cores | Threads / Copies | Peak | Base |
|---|---|---|---|---|---|---|---|
| Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 65.0 | |
| Supermicro X7DB8+ | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 64.9 | 60.0 |
| IBM p570 (POWER6) | POWER6 | 4.7 | 1 | 2 | 4 | 60.9 | 53.2 |
| Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 4 | 60.4 | |
| Fujitsu BX620 S3 | Xeon 5160 (Woodcrest) | 3.0 | 2 | 4 | 4 | 59.4 | 56.7 |
Results as of 06 Jun 2007 from www.spec.org.
Benchmark Description
SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and
CINT2006. CFP2006 targets floating-point performance, while CINT2006
targets integer performance.
Each suite has two different measures. The first is the speed (CPU) measure, which is the performance on the suite as a single stream. This can be either a single thread or an automatically parallelized (compiler-generated) run. This measure is further divided into base and peak (optimized) runs: base uses the same compiler flags for all benchmarks, whereas peak is allowed to use different compiler flags for each benchmark. Results are normalized against a reference system run standardized by SPEC.
The second measure is Rate, a throughput measure of how many copies of the job mix can be run at a time. Typically, it is run as n processes on n processors, and it shows how well the same job mix runs on a system under load. It is also reported as base and peak (optimized) results.
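As a simplified sketch of how these metrics are formed (SPEC's run rules add details omitted here, such as taking the median of several runs): each benchmark gets a ratio of reference time to measured time, a Rate ratio is additionally multiplied by the number of concurrent copies, and the overall score is the geometric mean of the per-benchmark ratios. The function names and the input times below are hypothetical, not real SPEC data.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speed_score(reference_times, measured_times):
    """Speed metric: geometric mean of reference_time / measured_time per benchmark."""
    return geometric_mean([r / m for r, m in zip(reference_times, measured_times)])


def rate_score(copies, reference_times, measured_times):
    """Rate metric: each ratio is scaled by the number of concurrent copies."""
    return geometric_mean([copies * r / m for r, m in zip(reference_times, measured_times)])


# Hypothetical times (seconds) for three benchmarks; not real SPEC data
ref = [9000, 10000, 12000]
print(speed_score(ref, [450, 500, 600]))  # every ratio is 20, so about 20
print(rate_score(4, ref, [1800, 2000, 2400]))  # 4 copies, every ratio 4 x 5 = 20
```

Because the Rate ratio scales with the number of copies, a 4-copy Rate result and a single-stream Speed result are not directly comparable numbers, which is why the two tables above are reported separately.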
Disclosure Statement:
SPEC, SPECint, and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation.
Results from www.spec.org or from IBM public websites as of 6/06/07.
Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 65.0 SPECint_rate2006;
Sun Blade X6250 (Intel Xeon 5160, 2 chips/4 cores, Solaris 10) 21.0 SPECint2006;
IBM System p 570 (POWER6, 1 chip/1 core, AIX 5L v5.3) 21.6 SPECint2006;
IBM System p 570 (POWER6, 4 threads, 1 chip/2 cores, AIX 5L v5.3) 60.9 SPECint_rate2006.
System Configuration
| Field | Value |
|---|---|
| Reference date | Jun 06, 2007 |
| System | Sun Blade X6250 (SPEED: 16GB memory, 8x2GB; RATE: 32GB memory, 8x4GB) |
| X6250 | 21.0 SPECint2006 |
| X6250 | 65.0 SPECint_rate2006 |
| Total number of processors | 2 x Intel Xeon 5160 |
| Software | Solaris 10 11/06, Sun Studio 12 Compiler, MicroQuill's SmartHeap Library v7.4 |