Thursday Jan 24, 2008
The EXA PowerFLOW benchmark test suite
was run on a mini cluster of Sun Blade X6250
blades with the recently announced 3.33 GHz dual-core Intel X5260 processor. The Sun Blade X6250 mini cluster beats all
posted results at the PowerFLOW Performance website
up to the eight cores that were considered.
- In runs of the benchmark test suite, the Sun X6250
cluster was nominally 20% faster than the best result from
the top IBM, HP, or SGI clusters. The advantage ranged from 15% to 24%
across the four core levels considered.
- The scaling efficiency of the Sun X6250 cluster ranged from 100%
(at 1 core) to on average 83% (at 8 cores).
Four 2-socket Sun X6250 blades with Infiniband
interconnects were used and runs were
made at different core levels: 1, 2, 4, and 8.
Comparisons are presented against the current leading competitors' results,
which were also obtained with high performance interconnects and are posted
at the EXA PowerFLOW Performance website. These include results from IBM,
HP, and SGI platforms.
EXA PowerFLOW V 3.6c Benchmark Case 1 (Smaller Model)
Results are total elapsed time in seconds
| System | Processor | 1 core | 2 cores | 4 cores | 8 cores |
| Sun Blade X6250 | 3.33GHz DC Intel X5260 | 720.49 | 389.36 | 195.43 | 110.36 |
| Sun Blade X6250 | 3.0GHz DC Intel 5160 | 822.71 | 418.47 | 214.48 | 118.63 |
| Sun Blade X6250 | 3.0GHz QC Intel X5365 | 844.30 | 430.72 | 214.41 | 121.25 |
| Sun Fire X2200 M2 | 3.0GHz DC Opteron | 943.41 | 461.11 | 232.93 | 123.12 |
| Sun Blade X8440 | 3.0GHz DC Opteron | 937.24 | 472.58 | 238.58 | 127.44 |
| HP BL460 | 3.0GHz DC Xeon | -- | -- | -- | 137.22 |
| Sun Fire X4450 | 2.93GHz QC Intel 7350 | 874.12 | 462.12 | 241.08 | 137.74 |
| IBM e1350 | 3.0GHz DC Xeon 5160 | -- | -- | 240.31 | 141.46 |
| HP BL465 | 2.6GHz DC Opteron | -- | -- | -- | 146.31 |
| SGI Altix | 3.0GHz DC Xeon | 866.09 | 448.80 | 264.82 | 147.88 |
| HP rx2660 | 1.6GHz DC Itanium2 | -- | -- | -- | 214.25 |
| SGI Altix | 1.6GHz DC Itanium2 | 1631.4 | 832.68 | 438.43 | 227.24 |
EXA PowerFLOW V 3.6c Benchmark Case 2 (Larger Model)
Results are total elapsed time in seconds
| System | Processor | 1 core | 2 cores | 4 cores | 8 cores |
| Sun Blade X6250 | 3.33GHz DC Intel X5260 | 1714.22 | 925.77 | 463.11 | 252.64 |
| Sun Blade X6250 | 3.0GHz DC Intel 5160 | 1966.38 | 987.47 | 500.54 | 258.42 |
| Sun Blade X6250 | 3.0GHz QC Intel X5365 | 1991.78 | 1010.63 | 507.21 | 278.64 |
| Sun Fire X2200 M2 | 3.0GHz DC Opteron | 2273.02 | 1086.65 | 550.34 | 282.53 |
| Sun Blade X8440 | 3.0GHz DC Opteron | 2210.13 | 1130.27 | 562.68 | 289.81 |
| HP BL460 | 3.0GHz DC Xeon | -- | -- | -- | 310.03 |
| IBM e1350 | 3.0GHz DC Xeon 5160 | -- | -- | 557.04 | 314.23 |
| SGI Altix | 3.0GHz DC Xeon | 2043.59 | 1062.38 | 620.67 | 315.96 |
| Sun Fire X4450 | 2.93GHz QC Intel 7350 | 2062.00 | 1066.34 | 598.81 | 319.92 |
| HP BL465 | 2.6GHz DC Opteron | -- | -- | -- | 331.82 |
| HP rx2660 | 1.6GHz DC Itanium2 | -- | -- | -- | 490.74 |
| SGI Altix | 1.6GHz DC Itanium2 | 3883.97 | 2000.44 | 1054.45 | 526.74 |
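The headline figures quoted at the top of this entry can be re-derived from the two tables above. The following is a minimal Python sketch, not part of the EXA test suite: the function and variable names are illustrative, and the timings are copied from the 3.33GHz Sun Blade X6250 rows and from the fastest IBM, HP, or SGI result at 8 cores (the HP BL460 in both tables).

```python
# Elapsed times in seconds for the 3.33GHz Sun Blade X6250, copied from the
# two tables above (first table = smaller model, second table = larger model).
SUN_X6250 = {
    "smaller": {1: 720.49, 2: 389.36, 4: 195.43, 8: 110.36},
    "larger": {1: 1714.22, 2: 925.77, 4: 463.11, 8: 252.64},
}
# Fastest posted IBM, HP, or SGI time at 8 cores (HP BL460 in both tables).
BEST_COMPETITOR_8 = {"smaller": 137.22, "larger": 310.03}


def scaling_efficiency(times, cores):
    """Parallel efficiency: ideal time (T1 / cores) divided by measured time."""
    return times[1] / (cores * times[cores])


def percent_faster(reference_time, our_time):
    """How much faster our_time is than reference_time, in percent."""
    return (reference_time / our_time - 1.0) * 100.0


eff_8 = {case: scaling_efficiency(t, 8) for case, t in SUN_X6250.items()}
avg_eff_8 = sum(eff_8.values()) / len(eff_8)
lead_8 = {
    case: percent_faster(BEST_COMPETITOR_8[case], SUN_X6250[case][8])
    for case in SUN_X6250
}

for case in SUN_X6250:
    print(f"{case}: efficiency at 8 cores {eff_8[case]:.1%}, "
          f"lead over best competitor {lead_8[case]:.1f}%")
print(f"average efficiency at 8 cores: {avg_eff_8:.1%}")
```

With these inputs the average 8-core efficiency comes out at about 83%, consistent with the figure quoted in the summary.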
Key Technical Points
- Real world CFD engineering models
are typically very large and are best analyzed
with many cores in order to achieve reasonable
turnaround on run times. Scalability when running these
large models with PowerFLOW is very good, often
linear (that is, perfect) up to 64 cores or more.
- Performance when running PowerFLOW in a
multi-node configuration is significantly
enhanced when using high performance interconnects such as Infiniband.
- PowerFLOW supports a variety of interconnects from various hardware vendors
(starting with gigabit ethernet, then
Infiniband, e.g. Voltaire, Cisco/Topspin, QLogic,
then Myrinet), MPIs
(HP-MPI, MVAPICH 2, LAM), and communication protocols
(e.g. ssh and rsh).
- There is still no officially certified
Solaris build of PowerFLOW
for x86-64 platform architectures.
- The PowerFLOW benchmark test suite consists of two
test cases. They are two models of the same analysis but of
different sizes (different mesh refinement), pertaining to flow
over a car body. Both models are rather large and scale very well
up to and even beyond 64 cores.
- The two test cases in the suite require from
6 to 8 GB of memory when running with only
one core on a single node. The memory
requirement per node is reduced when running in distributed-memory
parallel (DMP) cluster mode across multiple nodes.
- PowerFLOW runs are CPU and memory intensive but do not require
any special high performance I/O file systems.
- The test suite provides a run script ("exabench")
that automatically runs one or both test cases over a range
of core levels on the cluster
nodes specified in an "mpi_file", along
with the number of requested cores to be used per node.
Disclosure Statement:
Exa Corporation Copyright
All information on the EXA website is Copyright © 2007, 1996-2006 by Exa Corporation.
PowerFLOW is a registered trademark of EXA Corporation.
Results from http://www.exa.com/user_center/index.html as of January 17, 2008.
Benchmark Description
The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard
cases, each a simulation of external airflow around an automobile.
Case #1
Description: This smaller case has 18.2 million voxels (8.4 million
fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
Case #2
Description: This larger case has 23.6 million voxels (18.9 million
fine-equivalent) and 1.7 million surfels (1.5 million
fine-equivalent).
When describing the size of the cases, it is important to note that
voxels and surfels within different VR regions have different
computational costs associated with them. To account for this,
fine-equivalent voxels and surfels are a measure of computational load
that takes into account the lower cost of processing coarser scales of
resolution. For example, a voxel at the second-finest scale is
processed only half as often (every other timestep) as a voxel at the
finest scale, and thus has half the computational cost.
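As a rough illustration of the fine-equivalent idea, the sketch below weights each voxel by how often it is updated. It assumes that every coarser scale halves the update frequency again (the description above only states this for the second-finest scale), and the per-scale voxel counts are made-up numbers, not the actual breakdown of either benchmark case.

```python
def fine_equivalent(counts_by_scale):
    """Sum voxel counts weighted by update frequency.

    counts_by_scale[0] is the finest scale; index k is assumed to be
    updated once every 2**k timesteps (an assumption beyond k = 1).
    """
    return sum(count / 2 ** k for k, count in enumerate(counts_by_scale))


# Hypothetical voxel counts per scale, finest first (illustrative only).
example_counts = [4_000_000, 6_000_000, 8_000_000]
total = sum(example_counts)
fine_eq = fine_equivalent(example_counts)
print(f"{total} voxels -> {fine_eq:.0f} fine-equivalent voxels")
```

The same weighting would apply to surfels; the point is only that a case dominated by coarse regions has a much smaller fine-equivalent count than raw count.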
System Configuration
4 Sun Blade X6250's
3.33 GHz dual core Intel X5260 processors
2 internal striped 15K SAS drives (cluster shared file system)
Infiniband (Voltaire) interconnects
SUSE Linux Enterprise Server SLES 10
Voltaire OFED GridStack-4.1.5_7-sles-k2.6.16.21-0.8-smp-x86_64
HP-MPI
EXA PowerFLOW 3.6c
PowerFLOW 3.6 Benchmark Test Suite
See Also
Current EXA PowerFLOW V3.6c & V4.6c results at (EXA password required):
http://www.exa.com/user_center/index.html
Tuesday Apr 10, 2007
Solaris 10 outperforms Linux by more than 25% on some tests
when running the same applications on the same hardware. This helps
prove that Solaris is a very high-performance operating system.
Pro/E is the foremost MCAD system and is distributed to major
engineering corporations worldwide. Most Product Lifecycle Management
(PLM) systems have seamless links to this biggest and best MCAD
system. This includes all of the major MCAE ISV applications.
The Pro/E Wildfire 3 OCUS V5 benchmarks were used to demonstrate this
Solaris advantage over Linux. These benchmarks are endorsed by Pro/E users as
being very representative of typical, frequently used
operations, including those with the large memory requirements of the
increasingly common large assemblies that exceed 32-bit
capabilities.
The Solaris Studio compilers and associated performance libraries
continue to make applications perform better on X64 platforms
relative to builds of the same applications using other
compilers, performance libraries, and operating systems on the
same platforms.
In fact, a recent Desktop Engineering article
demonstrated that Sun Ultra 40 M2 desktops with four large-capacity
146 GB 15K rpm internal drives, two sockets with dual-core 3.0 GHz
Opteron 2222 processors, and 32 GB of 667 MHz DDR2 memory (8 x 4 GB
DIMMs) can function essentially as a personal server,
permitting the engineer to perform design operations with an MCAD
system and concurrently perform CPU- and I/O-intensive design
verification analyses:
http://www.deskeng.com/Articles/Feature/A-Server-on-Every-Desk-200702081652.html
The Sun desktops use the nVidia Quadro FX framebuffers that allow the user to
perform MCAD or even more graphics intensive operations in a minimum of rendering time. Sun desktops
equipped with the high end framebuffer offerings have set world
records with graphics intensive benchmarks such as the SPEC APC
UGS-NX3 benchmark:
http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html
as well as the Ensight engineering visualization
benchmark:
http://www.ensight.com/rendering-performance-tests.html
The Pro/E Wildfire 3 MCAD OCUS V5 (time in seconds)
Solaris 10 vs. Linux on X64 (Sun Ultra 40 M2 - same hardware)
| | Total | Graphics | CPU | Disk I/O | Solaris 10 %Faster |
| 32-bit Normal Benchmark |
| Solaris 10 | 1810 | 913 | 893 | 96 | 25% |
| SuSE Linux 10 | 2271 | 990 | 1278 | 107 | |
| 64-bit Large-Memory Benchmark |
| Solaris 10 | 5224 | 1202 | 4008 | 388 | 6% |
| SuSE Linux 10 | 5563 | 1373 | 4164 | 441 | |
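The "Solaris 10 %Faster" column follows from the Total times. As a small sketch (the dictionary layout and function name are illustrative, not part of the OCUS benchmark), the same ratio can also be applied to the Graphics, CPU, and Disk I/O columns to see where the difference arises:

```python
# Times in seconds from the table above: (Total, Graphics, CPU, Disk I/O).
COLUMNS = ("Total", "Graphics", "CPU", "Disk I/O")
RESULTS = {
    "32-bit Normal": {
        "Solaris 10": (1810, 913, 893, 96),
        "SuSE Linux 10": (2271, 990, 1278, 107),
    },
    "64-bit Large-Memory": {
        "Solaris 10": (5224, 1202, 4008, 388),
        "SuSE Linux 10": (5563, 1373, 4164, 441),
    },
}


def solaris_percent_faster(benchmark):
    """Percent by which Solaris 10 beats Linux in each column (lower time wins)."""
    solaris = RESULTS[benchmark]["Solaris 10"]
    linux = RESULTS[benchmark]["SuSE Linux 10"]
    return {
        name: (lin / sol - 1.0) * 100.0
        for name, sol, lin in zip(COLUMNS, solaris, linux)
    }


for benchmark in RESULTS:
    pct = solaris_percent_faster(benchmark)
    summary = ", ".join(f"{name} {value:.0f}%" for name, value in pct.items())
    print(f"{benchmark}: {summary}")
```

The Total column reproduces the 25% and 6% figures; the per-column percentages are derived here for illustration and were not reported in the original results.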
Configuration
Sun Ultra 40 M2 desktop
2x2.8 GHz DC Opteron 2220's
8 GB (2x4x1 GB 667 MHz DDR2 DIMMs)
1x nVidia Quadro FX 5500
Solaris 10
64-bit SUSE Linux Enterprise for Desktop (SLED 10)
Application: 64-bit Pro/E Wildfire 3
Benchmark: Pro/E OCUS V5 (32-bit Normal benchmark, 64-bit Large Memory benchmark)
Monday Feb 05, 2007
Solaris can improve your performance and Solaris gives you great features.
Sun really feels that Solaris has a strong lead over other operating
systems. We've shown various head-to-head comparisons on this blog. You
can see links to a few of those below.
It is also important to remember that Sun has many important customers
who are running RedHat Linux, SuSE Linux, and Windows. So a variety of
benchmarks are also done with those, as you can see in last week's entry
on SPECjAppServer.
So expect to see results on a mix of operating systems as Sun fully
understands different customers have different needs. We still believe
most can get many benefits from moving to Solaris -- so if you are one
of those people who can switch, the evidence continues to mount that
it is a very good idea to use Solaris.
January 17, 2007
Variety ways Solaris is leading Linux
December 20, 2006
Solaris again beating Linux on benchmark
January 03, 2007
update: Solaris beating Linux Performance
September 22, 2006
Yet another Solaris v. Linux performance comparison
EDA vendors supporting Solaris:
http://blogs.sun.com/bmseer/entry/another_strong_isv_votes_for
http://blogs.sun.com/bmseer/entry/eda_vendors_seeing_solaris_benefits
afternote:
I forgot to mention that Solaris is also Open:
http://www.opensolaris.org/os/
...not something you see with IBM's AIX or HP/UX.
Wednesday Jan 03, 2007
(Update with corrections of previous entry)
The Sun Fire X4100/X4200 M2 4-thread, 2-socket world record shows that
Solaris 10 with Sun Studio 11 is faster than Linux on Opteron systems (31% faster than with the PGI compiler and 8% faster than with the PathScale compiler).
The Sun Fire X4100/X4200 M2 two-way dual-core server produced the best SPECompM2001 result of 13222.
Solaris 10 with Sun Studio 11 was 31% faster than Linux/PGI on an
AMD Tyan system using SuSE Linux SLES9 SP3 64-bit and PGI 6.2-4. The AMD Tyan system even used faster CL4 DIMMs. Both results were submitted in Nov 2006.
The Sun Fire X4100/X4200 M2 also topped the IBM p5 520 POWER5+ 1.9GHz AIX5L V5.3 result by 61%.
Solaris with Sun Studio 11 is 8% faster than SuSE Linux SLES9 SP3 64-bit using the QLogic PathScale Compiler Suite v2.5.
There is a growing body of favorable comparisons showing
Solaris advantages over Linux on performance. Remember the
previous BM Seer posting on Java performance.
Even more examples of Solaris beating Linux coming soon.
SPECompM2001 Results (bigger is better)
| SPECompM2001 | Cores | Chips | Thrds | System |
| 13222 | 4 | 2 | 4 | Sun Fire X4100/X4200 M2, Opteron 2220SE, 2.8GHz |
| 12574 | 4 | 2 | 4 | Sun Fire X2200 M2, Opteron 2218, 2.6GHz |
| 12172 | 4 | 2 | 4 | AMD Tyan n6650w, Opteron 8220, 2.8GHz |
| 10085 | 4 | 2 | 4 | AMD Tyan n6650w, Opteron 8220, 2.8GHz |
| 8174 | 2 | 1 | 4 | IBM System p5 520 (1900 MHz, 2 CPU) |
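Because SPECompM2001 is a bigger-is-better score, the percentages quoted in this entry are ratios of scores rather than of elapsed times. A short sketch follows; the compiler labels for the two AMD Tyan results are taken from the disclosure statement, and truncating (rather than rounding) to a whole percent is an assumption, chosen because it reproduces the quoted 31%, 8%, and 61%.

```python
import math

SUN_X4100_M2 = 13222  # Sun Fire X4100/X4200 M2, Solaris 10, Sun Studio 11
OTHERS = {
    "AMD Tyan n6650w, Linux, PathScale": 12172,
    "AMD Tyan n6650w, Linux, PGI": 10085,
    "IBM System p5 520, AIX": 8174,
}


def percent_higher(score, baseline):
    """Percent by which score exceeds baseline (bigger score is better)."""
    return (score / baseline - 1.0) * 100.0


# Truncate to whole percent (assumed convention, see lead-in).
advantage = {
    name: math.floor(percent_higher(SUN_X4100_M2, other))
    for name, other in OTHERS.items()
}
for name, pct in advantage.items():
    print(f"Sun Fire X4100/X4200 M2 vs {name}: {pct}% higher")
```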
Benchmark Description
The SPEC OMPM2001 Benchmark Suite was released in June 2001 and
tests HPC performance using OpenMP for parallelism. It consists of
11 programs (8 in Fortran and 3 in C) parallelized using the OpenMP API.
Goals of the benchmark:
Targeted to mid-range (4-32 processor) parallel systems
Run rules, tools and reporting similar to SPEC CPU2000
Programs representative of HPC and Scientific Applications
See Also:
SPEC OMP2001 Page
sun.com X4100 Benchmark Page
sun.com X4200 Benchmark Page
Disclosure Statement:
SPEC and SPEComp are registered trademarks of the Standard Performance Evaluation Corporation.
Results from www.spec.org as of Jan 2, 2007.
Sun Fire X4100/X4200 M2 (4 cores, 2 chips, 4 threads), 13,222 SPECompM2001.
AMD Tyan n6650w (4 cores, 2 chips, 4 threads), 10,085 SPECompM2001 PGI compiler.
AMD Tyan n6650w (4 cores, 2 chips, 4 threads), 12,172 SPECompM2001 Pathscale compiler. Sockets refers to chips.
Results Summary
| X4100/X4200 M2 4-threads: | 13222 SPECompM2001 |
| X2200 M2 4-threads: | 12574 SPECompM2001 |
| Reference Date: | Oct 16, 2006 |
| System: | Sun Fire X4100/X4200M2 16GB memory (4x2GB per chip), DDR667 |
| Processors: | two Opteron 2220SE, 2.8 GHz |
| Operating System: | Solaris 10 |
| Compiler: | Sun Studio 11 |
Wednesday Dec 20, 2006
This entry has been updated, for the latest please go to:
http://blogs.sun.com/bmseer/entry/update_solaris_beating_linux_performance.