Tuesday Sep 18, 2007
Intel non-default BIOS setting changes results by 25%? Sure, turning off
prefetch is a technique, but if you don't know a priori whether you
should, should you use it to judge performance?
Always interesting when you have more information. I guess our
friends at AMD wanted everyone to see what our friends at Intel
were doing so they submitted two SPEC results for them.
Case in point: on Clovertown, there are two AMD-submitted results on the same hardware that differ
by 25%.
Point: Normal mode = prefetch on
Gives 163,080 SPECjbb2005 bops
www.spec.org/jbb2005/results/res2007q2/jbb2005-20070326-00276.txt
Counter-point: Disable HW prefetcher in BIOS for benchmark improvement
Gives 203,754 SPECjbb2005 bops
www.spec.org/jbb2005/results/res2007q2/jbb2005-20070326-00275.txt
...both on the same hardware:
same: 2-socket SuperMicro X7DBE (Intel 2.66GHz Xeon quad-core X5355), 16 GB
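As a quick check of the headline number, the relative gain can be worked out directly from the two published scores. This is only an arithmetic sketch: the variable names are illustrative, and the two-JVM count is an assumption inferred from the published bops/JVM figures.

```python
# Scores taken from the two SPECjbb2005 results cited above.
prefetch_on_bops = 163_080   # default BIOS, hardware prefetcher enabled
prefetch_off_bops = 203_754  # hardware prefetcher disabled in BIOS

# Relative gain from the non-default BIOS setting.
gain = (prefetch_off_bops - prefetch_on_bops) / prefetch_on_bops
print(f"gain from disabling prefetch: {gain:.1%}")

# Assumption: both runs used 2 JVMs, which is what the published
# bops/JVM figures (81540 and 101877) imply.
jvms = 2
print(prefetch_on_bops / jvms, prefetch_off_bops / jvms)
```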
Disclosure statement
SPECjbb2005 SuperMicro X7DBE (2 chips, 8 cores, 2.66 GHz) SPECjbb2005 bops=163080, SPECjbb2005 bops/JVM=81540 submitted by AMD;
SuperMicro X7DBE (2 chips, 8 cores, 2.66 GHz) SPECjbb2005 bops=203754, SPECjbb2005 bops/JVM=101877 submitted by AMD; SPEC, SPECjbb are registered trademarks of Standard Performance Evaluation Corporation. Results 3/7/07 on www.spec.org.
Friday Mar 30, 2007
The Sun Fire X2200 M2 server beats Woodcrest on
large CFD models. The X2200 M2 Cluster beats all currently posted
Opteron cluster results (dual core HP XC4000 2.2GHz, HP DL145 G2 2.2GHz,
HP XW9300 2.4GHz, and HP DL585 2.6GHz) for all "cpu" levels and for all
test cases. All clusters had the high performance Infiniband interconnects.
The X2200 M2 beats the IBM X3650 2.66GHz quad core Clovertown across the board at
all cpu levels and for all test cases.
Tests were run on the official version of Fluent (lnxamd64 V6.3.26 build).
The Sun Opteron server numbers were generated under 64-bit SUSE SLES 9 SP 3.
Sun has many customers who use Solaris, Linux, and Windows, so we show
benchmarks on all of these.
The X2200 M2 cluster has the best performance on the largest
and most complex test, "FL5L3", which is most closely representative of
actual customer benchmarks (it requires over 9GB of memory and is best run using
several CPUs). FL5L3 simulates turbulent flow through a transition duct.
Note that the X2200 M2 cluster results shown in the following table are consistently
better than those obtained on the Woodcrest cluster systems at every
indicated "cpu" level (4 to 32).
The efficiency of the Sun X2200 M2 cluster is superb at well above 90% up to 32 cores. This essentially perfect scalability is contrasted with the Woodcrest
clusters where scalability has dropped off and efficiency is below 70% at
and above 4 cores.
Scaling Performance : Results in "Ratings" (# runs/day, bigger is better)
| System | 4 Cores | 8 Cores | 16 Cores | 32 Cores |
|---|---|---|---|---|
| Sun X2200 M2 2.8GHz Opteron | 89.9 | 174.4 | 341.5 | 664.4 |
| HP BL460C 3.0GHz Woodcrest | 80.3 | 155.4 | 299.0 | 576.0 |
| HP DL140 3.0GHz Woodcrest | N/A | 160.7 | 320.5 | 620.1 |
| Bull NovaScale 3.0GHz Woodcrest | 78.9 | 157.8 | 313.2 | 619.0 |
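To make the "consistently better" comparison concrete, the ratings can be divided pairwise at each core count. The sketch below is illustrative only; the helper name `advantage` and the data layout are assumptions, and the numbers are copied from the scaling table.

```python
# Ratings (runs/day) copied from the scaling table; None marks N/A.
CORES = [4, 8, 16, 32]
sun = [89.9, 174.4, 341.5, 664.4]
woodcrest = {
    "HP BL460C": [80.3, 155.4, 299.0, 576.0],
    "HP DL140": [None, 160.7, 320.5, 620.1],
    "Bull NovaScale": [78.9, 157.8, 313.2, 619.0],
}

def advantage(sun_ratings, other_ratings):
    """Return {cores: sun/other} for every core count with a posted result."""
    return {
        c: s / o
        for c, s, o in zip(CORES, sun_ratings, other_ratings)
        if o is not None
    }

ratios = {name: advantage(sun, r) for name, r in woodcrest.items()}
for name, by_cores in ratios.items():
    line = ", ".join(f"{c} cores: {v:.2f}x" for c, v in by_cores.items())
    print(f"{name}: {line}")
```

A ratio above 1.00 means the X2200 M2 cluster posted the higher rating at that core count.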
Fluent Performance : Results in "Ratings" (# runs/day, bigger is better)
| System | Interconnect/MPI | cores | FL5L1 | FL5L2 | FL5L3 |
|---|---|---|---|---|---|
| X2200 2.8GHz DC 2220 SLES 9 SP 3 | IB(V)/HP-MPI | 8 | 1219.5 | 952.1 | 174.4 |
| X2100 3.0GHz SC 156 SLES 9 SP3 | IB(V)/MVAPICH | 8 | 1148.2 | 1063.4 | 184.6 |
| HPDL140 3.0GHz DC WC EM64T Linux | IB/HP-MPI | 8 | 1378.0 | 915.0 | 160.7 |
| Bull Nova 3.0 GHz DC WC EM64T RHEL4 | IB | 8 | 1323.6 | 884.1 | 157.8 |
| HP BL460C 3.0GHz WC EM64T WinCCS | IB(V) | 8 | 1289.6 | 881.6 | 155.4 |
| Intel White 3.0GHz WC EM64T DC RHAS4 | IB(Mellanox) | 8 | --- | 828.0 | 137.8 |
| Tyan Typh. 630 2.3GHz WC SLES 10 | GbE | 8 | 1011.7 | 692.4 | 122.7 |
| Tyan Typh. 630 2.3GHz WC WinCCS | GbE | 8 | 981.8 | 635.3 | --- |
| HPDL140 3.6GHz EM64T WINCCS | IB | 8 | 970.8 | 675.0 | 120.0 |
| HPDL585 2.6GHz DC 152 RHEL4 | IB(V)/HP-MPI | 8 | 966.2 | 723.2 | 119.2 |
| HPXC4000 2.2GHz DC 148 Linux | IB(V)/HP-MPI | 8 | 951.0 | 680.4 | 102.7 |
| HPDL145 G2 Opteron 2.2GHz DC WinCCS | IB(V) | 8 | 847.1 | 654.5 | 119.2 |
| IBMX3650 2.66GHz 4C Clovert. EM64T RHEL4 | ? | 8 | 953.6 | 551.2 | 93.3 |
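Because a rating is a count of runs per day, it can be turned back into an approximate wall-clock time per run by dividing the number of seconds in a day by the rating. The sketch below does this for two FL5L3 entries from the table; the helper name `seconds_per_run` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def seconds_per_run(rating):
    """Convert a rating (runs/day) into approximate seconds per run."""
    return SECONDS_PER_DAY / rating

# FL5L3 ratings at 8 cores, copied from the table above.
fl5l3 = {
    "X2200 2.8GHz DC 2220": 174.4,
    "IBMX3650 2.66GHz 4C Clovert.": 93.3,
}
for system, rating in fl5l3.items():
    print(f"{system}: {seconds_per_run(rating):.0f} s per run")
```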
Benchmark Description
Nine industrial CFD applications ranging in size from 32,000 to
10,000,000 cells have been selected to demonstrate the performance of
FLUENT on a variety of hardware platforms. The performance of a CFD
code will depend on several factors including size and topology of the
mesh, physical models, numerics and parallelization, compilers and
optimization, in addition to performance characteristics of the
hardware where the simulation is performed. The problems selected
represent a range of simulations typical of those which might be found
in industry. The principal objective of this benchmark suite is to
provide comprehensive and fair comparative information of the
performance of FLUENT on available hardware platforms.
System Configuration
Hardware Configuration:
Sun Fire X2200 M2
2-socket 2x2.8 GHz dual core Opteron 2220 processors
4x1GB + 4x2GB (12GB) DDR2 667 MHz dimms
IB(Voltaire)/PCI-Express (interconnect)
Software Configuration:
64-bit SuSE SLES 9 SP 3
Fluent V6.3.26
Voltaire Infiniband Software Stack: 3.5.5_16-S2sles9.k2.6.5_7.244_smp.x86_64
Message Passing Interface: HP-MPI V hpmpi-2.02.05.00-20061003r.x86_64
See Also
Current V6.2(.16) results at:
http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm