BM Seer Facts & Questions from an Anonymous Sun Source

Sun Blade X6250 Cluster EXA PowerFLOW World Record

Thursday Jan 24, 2008

The EXA PowerFLOW benchmark test suite was run on a mini cluster of Sun Blade X6250 blades with the recently announced 3.33 GHz dual-core Intel X5260 processor. At every core count tested, up to eight cores, the Sun Blade X6250 mini cluster beats all results posted at the PowerFLOW Performance website.

  • In runs of the benchmark test suite, the Sun X6250 cluster was nominally 20% faster than the best result from the top IBM, HP, or SGI clusters. The lead ranged from 15% to 24% across the four core counts considered (1, 2, 4, and 8).
  • The scaling efficiency of the Sun X6250 cluster ranged from 100% (at 1 core) to on average 83% (at 8 cores).
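The scaling-efficiency figures can be checked against the elapsed times in the result tables of this post. What follows is a minimal sketch, assuming efficiency is defined as T(1) / (N × T(N)); the post does not state the formula, and the names `X6250_TIMES` and `scaling_efficiency` are illustrative only.

```python
# Elapsed times in seconds for the Sun Blade X6250 (3.33GHz X5260), keyed by
# core count, copied from the two result tables in this post.
X6250_TIMES = [
    {1: 720.49, 2: 389.36, 4: 195.43, 8: 110.36},    # first table
    {1: 1714.22, 2: 925.77, 4: 463.11, 8: 252.64},   # second table
]

def scaling_efficiency(times, cores):
    """Assumed definition: T(1) / (cores * T(cores)); 1.0 means perfect scaling."""
    return times[1] / (cores * times[cores])

# Print the efficiency per table and the average at each core count.
for cores in (1, 2, 4, 8):
    per_case = [scaling_efficiency(t, cores) for t in X6250_TIMES]
    average = sum(per_case) / len(per_case)
    cases = ", ".join(f"{e:.1%}" for e in per_case)
    print(f"{cores} cores: {cases} (average {average:.1%})")

avg_eff8 = sum(scaling_efficiency(t, 8) for t in X6250_TIMES) / len(X6250_TIMES)
```

Under that assumed definition, the 8-core average over the two test cases comes out at roughly 83%, in line with the figure quoted above.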

Four 2-socket Sun X6250 blades with Infiniband interconnects were used, and runs were made at four core counts: 1, 2, 4, and 8. Comparisons are presented against the current leading competitors' results, also obtained with high-performance interconnects and posted at the EXA PowerFLOW Performance website. These include results from IBM, HP, and SGI platforms.

EXA PowerFLOW V 3.6c Benchmark Case 1 (Smaller Model)

Results are total elapsed time in seconds

System             Processor                       Number of Cores
                                                   1        2        4        8
Sun Blade X6250    3.33GHz DC Intel X5260     720.49   389.36   195.43   110.36
Sun Blade X6250    3.0GHz DC Intel 5160       822.71   418.47   214.48   118.63
Sun Blade X6250    3.0GHz QC Intel X5365      844.30   430.72   214.41   121.25
Sun Fire X2200 M2  3.0GHz DC Opteron          943.41   461.11   232.93   123.12
Sun Blade X8440    3.0GHz DC Opteron          937.24   472.58   238.58   127.44
HP BL460           3.0GHz DC Xeon                 --       --       --   137.22
Sun Fire X4450     2.93GHz QC Intel 7350      874.12   462.12   241.08   137.74
IBM e1350          3.0GHz DC Xeon 5160            --       --   240.31   141.46
HP BL465           2.6GHz DC Opteron              --       --       --   146.31
SGI Altix          3.0GHz DC Xeon             866.09   448.80   264.82   147.88
HP rx2660          1.6GHz DC Itanium2             --       --       --   214.25
SGI Altix          1.6GHz DC Itanium2         1631.4   832.68   438.43   227.24

EXA PowerFLOW V 3.6c Benchmark Case 2 (Larger Model)

Results are total elapsed time in seconds

System             Processor                       Number of Cores
                                                   1        2        4        8
Sun Blade X6250    3.33GHz DC Intel X5260    1714.22   925.77   463.11   252.64
Sun Blade X6250    3.0GHz DC Intel 5160      1966.38   987.47   500.54   258.42
Sun Blade X6250    3.0GHz QC Intel X5365     1991.78  1010.63   507.21   278.64
Sun Fire X2200 M2  3.0GHz DC Opteron         2273.02  1086.65   550.34   282.53
Sun Blade X8440    3.0GHz DC Opteron         2210.13  1130.27   562.68   289.81
HP BL460           3.0GHz DC Xeon                 --       --       --   310.03
IBM e1350          3.0GHz DC Xeon 5160            --       --   557.04   314.23
SGI Altix          3.0GHz DC Xeon            2043.59  1062.38   620.67   315.96
Sun Fire X4450     2.93GHz QC Intel 7350     2062.00  1066.34   598.81   319.92
HP BL465           2.6GHz DC Opteron              --       --       --   331.82
HP rx2660          1.6GHz DC Itanium2             --       --       --   490.74
SGI Altix          1.6GHz DC Itanium2        3883.97  2000.44  1054.45   526.74
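The "15% to 24% faster" range quoted at the top of this post can be reproduced from the two tables above. This is a rough sketch, assuming "faster" means the best non-Sun elapsed time divided by the X6250 (3.33GHz X5260) elapsed time, minus one; the helper name `lead_over_best` is made up for illustration.

```python
# Elapsed seconds keyed by core count, copied from the two tables above.
# Core counts with no posted result ("--") are simply left out.
SUN = [
    {1: 720.49, 2: 389.36, 4: 195.43, 8: 110.36},     # first table
    {1: 1714.22, 2: 925.77, 4: 463.11, 8: 252.64},    # second table
]
OTHERS = [
    {   # non-Sun systems, first table
        "HP BL460": {8: 137.22},
        "IBM e1350": {4: 240.31, 8: 141.46},
        "HP BL465": {8: 146.31},
        "SGI Altix (Xeon)": {1: 866.09, 2: 448.80, 4: 264.82, 8: 147.88},
        "HP rx2660": {8: 214.25},
        "SGI Altix (Itanium2)": {1: 1631.4, 2: 832.68, 4: 438.43, 8: 227.24},
    },
    {   # non-Sun systems, second table
        "HP BL460": {8: 310.03},
        "IBM e1350": {4: 557.04, 8: 314.23},
        "SGI Altix (Xeon)": {1: 2043.59, 2: 1062.38, 4: 620.67, 8: 315.96},
        "HP BL465": {8: 331.82},
        "HP rx2660": {8: 490.74},
        "SGI Altix (Itanium2)": {1: 3883.97, 2: 2000.44, 4: 1054.45, 8: 526.74},
    },
]

def lead_over_best(sun, others, cores):
    """Percent by which the best non-Sun time exceeds the Sun time (assumed metric)."""
    best = min(t[cores] for t in others.values() if cores in t)
    return (best / sun[cores] - 1.0) * 100.0

leads = [lead_over_best(s, o, n) for s, o in zip(SUN, OTHERS) for n in (1, 2, 4, 8)]
print(f"lead over best non-Sun result: {min(leads):.0f}% to {max(leads):.0f}%")
print(f"mean lead: {sum(leads) / len(leads):.0f}%")
```

Only the SGI Altix rows have posted results at 1 and 2 cores, so at those core counts the "best competitor" is by necessity the Xeon-based Altix.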

Key Technical Points

  • Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround times. PowerFLOW scales very well on these large models, often linearly, up to 64 cores or more.
  • Performance when running PowerFLOW in a multi-node configuration is significantly enhanced by high-performance interconnects such as Infiniband.
  • PowerFLOW supports a variety of interconnects from various hardware vendors (gigabit Ethernet; Infiniband from, e.g., Voltaire, Cisco/Topspin, and QLogic; and Myrinet), MPI implementations (HP-MPI, MVAPICH 2, LAM), and communication protocols (e.g. ssh and rsh).
  • There is still no officially certified Solaris build of PowerFLOW for x86-64 platforms.
  • The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis, flow over a car body, at different sizes (different mesh refinement). Both models are rather large and scale very well up to and even beyond 64 cores.
  • The two test cases in the suite require from 6 to 8 GB of memory when running on only one core on a single node. This per-node memory requirement is reduced when running in distributed-memory parallel (DMP) cluster mode across multiple nodes.
  • PowerFLOW runs are CPU and memory intensive but do not require any special high-performance I/O file systems.
  • The test suite provides a run script ("exabench") that automatically runs one or both test cases over a range of core counts on the cluster nodes specified in an "mpi_file", along with the number of cores to be used per node.

Disclosure Statement:

Exa Corporation Copyright: All information on the EXA website is Copyrighted © 2007, 1996-2006 by Exa Corporation. PowerFLOW is a registered trademark of EXA Corporation. Results from http://www.exa.com/user_center/index.html as of January 17, 2008.

Benchmark Description

The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

    Case #1
    Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).

    Case #2
    Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

When describing the size of the cases, it is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
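As a rough illustration of the fine-equivalent idea, the sketch below weights each resolution level by how often it is processed. It assumes the halving described above for the second-finest scale continues at every coarser level (a factor of 0.5 per level), which the description does not state explicitly. The per-level counts are hypothetical and are not the actual breakdown of either benchmark case.

```python
def fine_equivalent(counts_by_level):
    """Sum element counts weighted by assumed relative cost.

    counts_by_level[k] is the number of voxels (or surfels) that sit k levels
    coarser than the finest scale. The weight 0.5 ** k is an assumption that
    extends the "half as often" rule to every coarser level.
    """
    return sum(count * 0.5 ** k for k, count in enumerate(counts_by_level))

# Hypothetical per-level voxel counts: finest, second-finest, third-finest.
voxels = [4_000_000, 6_000_000, 8_000_000]

print(f"raw voxels:             {sum(voxels):,}")
print(f"fine-equivalent voxels: {fine_equivalent(voxels):,.0f}")
```

With these made-up counts, 18 million raw voxels reduce to 9 million fine-equivalent voxels, the same kind of gap seen between the raw and fine-equivalent figures quoted for the two cases.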

System Configuration

  • 4 Sun Blade X6250 blades
  • 3.33 GHz dual-core Intel 5260 processors
  • 2 internal striped 15K SAS drives (cluster shared file system)
  • Infiniband (Voltaire) interconnects
  • SUSE Linux Enterprise Server (SLES) 10
  • Voltaire OFED GridStack-4.1.5_7-sles-k2.6.16.21-0.8-smp-x86_64
  • HP-MPI
  • EXA PowerFLOW 3.6c
  • PowerFLOW 3.6 Benchmark Test Suite

    See Also

    Current EXA PowerFLOW V3.6c & V4.6c results at (EXA password required):
    http://www.exa.com/user_center/index.html


    Solaris 10 Outperforms Linux: same MCAD software & same hardware

    Tuesday Apr 10, 2007

    Solaris 10 outperforms Linux by more than 25% on some tests when running the same applications on the same hardware. This helps prove that Solaris is a very high-performance operating system.

    Pro/E is the foremost MCAD system and is distributed to major engineering corporations worldwide. Most Product Lifecycle Management (PLM) systems, and all of the major MCAE ISV applications, have seamless links to it.

    The Pro/E Wildfire 3 OCUS V5 benchmarks were used to demonstrate this Solaris advantage over Linux. Pro/E users endorse these benchmarks as representative of typical, frequently used operations, including the large-memory work on big assemblies that is increasingly common and exceeds 32-bit capabilities.

    The Solaris Studio compilers and associated performance libraries continue to make applications perform better on X64 platforms relative to builds of the same applications using other compilers, performance libraries, and operating systems on the same platforms.

    In fact, a recent Desktop Engineering article demonstrated that a Sun Ultra 40 M2 desktop with four large-capacity 146 GB 15K rpm internal drives, two sockets of dual-core 3.0 GHz Opteron 2222 processors, and 32 GB of 667 MHz DDR2 memory (eight 4 GB DIMMs) can function essentially as a personal server, letting the engineer perform design operations in the MCAD system while concurrently running CPU- and I/O-intensive design verification analyses:
    http://www.deskeng.com/Articles/Feature/A-Server-on-Every-Desk-200702081652.html

    The Sun desktops use the nVidia Quadro FX framebuffers that allow the user to perform MCAD or even more graphics intensive operations in a minimum of rendering time. Sun desktops equipped with the high end framebuffer offerings have set world records with graphics intensive benchmarks such as the SPEC APC UGS-NX3 benchmark:
    http://www.spec.org/gpc/apc.data/specapc_nx3_summary.html

    as well as the Ensight engineering visualization benchmark:
    http://www.ensight.com/rendering-performance-tests.html

    The Pro/E Wildfire 3 MCAD OCUS V5 (time in seconds)

    Solaris 10 vs. Linux on X64 (Sun Ultra 40 M2 - same hardware)

                                     Total  Graphics     CPU  Disk I/O  Solaris 10 % Faster
    32-bit Normal Benchmark
      Solaris 10                      1810       913     893        96                  25%
      SuSE Linux 10                   2271       990    1278       107
    64-bit Large-Memory Benchmark
      Solaris 10                      5224      1202    4008       388                   6%
      SuSE Linux 10                   5563      1373    4164       441
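To see where the Solaris advantage comes from, the same "% faster" calculation can be applied to each column of the table above, not just the total. This is a small sketch, assuming "% faster" means the Linux time divided by the Solaris time, minus one; the dictionary layout and function name are illustrative only.

```python
# Elapsed seconds from the table above, in column order:
# (total, graphics, cpu, disk_io).
COMPONENTS = ("Total", "Graphics", "CPU", "Disk I/O")
RESULTS = {
    "32-bit Normal": {
        "solaris": (1810, 913, 893, 96),
        "linux": (2271, 990, 1278, 107),
    },
    "64-bit Large-Memory": {
        "solaris": (5224, 1202, 4008, 388),
        "linux": (5563, 1373, 4164, 441),
    },
}

def percent_faster(solaris_time, linux_time):
    """Assumed metric: how much longer the Linux run took, in percent."""
    return (linux_time / solaris_time - 1.0) * 100.0

for benchmark, runs in RESULTS.items():
    parts = ", ".join(
        f"{name} {percent_faster(s, l):.0f}%"
        for name, s, l in zip(COMPONENTS, runs["solaris"], runs["linux"])
    )
    print(f"{benchmark}: {parts}")
```

Under that assumption the totals reproduce the 25% and 6% figures in the table, and the per-column numbers suggest that most of the 32-bit gap comes from the CPU portion of the benchmark.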

    Configuration

    Sun Ultra 40 M2 desktop
    2 x 2.8 GHz DC Opteron 2220 processors
    8 GB (2x4x1 667 MHz DDR2 DIMMs)
    1x nVidia Quadro FX 5500
    Solaris 10
    64-bit SUSE Linux Enterprise for Desktop (SLED 10)
    Application: 64-bit Pro/E Wildfire 3
    Benchmark: Pro/E OCUS V5 (32-bit Normal benchmark, 64-bit Large Memory benchmark)


    Solaris, Linux, Windows

    Monday Feb 05, 2007

    Solaris can improve your performance and Solaris gives you great features. Sun really feels that Solaris has a strong lead over other operating systems. We've shown various head-to-head comparisons on this blog. You can see links to a few of those below.

    It is also important to remember that Sun has many important customers who are running RedHat Linux, SuSE Linux, and Windows, so a variety of benchmarks are run on those systems as well, as you can see in last week's entry on SPECjAppServer.

    So expect to see results on a mix of operating systems as Sun fully understands different customers have different needs. We still believe most can get many benefits from moving to Solaris -- so if you are one of those people who can switch, the evidence continues to mount that it is a very good idea to use Solaris.

    January 17, 2007
    Variety ways Solaris is leading Linux

    December 20, 2006
    Solaris again beating Linux on benchmark

    January 03, 2007
    update: Solaris beating Linux Performance

    September 22, 2006
    Yet another Solaris v. Linux performance comparison

    EDA vendors supporting Solaris:
    http://blogs.sun.com/bmseer/entry/another_strong_isv_votes_for
    http://blogs.sun.com/bmseer/entry/eda_vendors_seeing_solaris_benefits

    afternote:
    I forgot to mention that Solaris is also Open:
    http://www.opensolaris.org/os/

    ...not something you see with IBM's AIX or HP/UX.


    update: Solaris beating Linux Performance

    Wednesday Jan 03, 2007

    (Update with corrections of previous entry)

    The Sun Fire X4100/X4200M2 4-thread, 2-socket world record shows that Solaris 10 with Sun Studio 11 is faster than Linux on comparable Opteron systems (31% faster than with the PGI compilers and 8% faster than with the PathScale compilers). The Sun Fire X4100/X4200M2 two-way dual-core server produced the best SPECompM2001 result of 13222.

    Sun Solaris 10/Studio 11 was 31% faster than Linux/PGI on an AMD Tyan system running 64-bit SuSE Linux SLES9 SP3 with PGI 6.2-4. The AMD Tyan system even used faster CL4 DIMMs. Both results were submitted in Nov 2006. The Sun Fire X4100/X4200M2 also topped the IBM p5 520 POWER5+ 1.9GHz AIX5L V5.3 result by 61%.

    Solaris/Studio 11 is 8% faster than Linux SuSE SLES9 SP3 64bit using QLogic PathScale Compiler Suite v2.5.

    There is a growing body of favorable comparisons showing Solaris advantages over Linux on performance. Remember the previous BM Seer posting on Java performance.

    Even more examples of Solaris beating Linux coming soon.

    SPECompM2001 Results (bigger is better)
    SPECompM2001  Cores  Chips  Thrds  System
    13222             4      2      4  Sun Fire X4100/X4200 M2, Opteron 2220SE, 2.8GHz
    12574             4      2      4  Sun Fire X2200 M2, Opteron 2218, 2.6GHz
    12172             4      2      4  AMD Tyan n6650w, Opteron 8220, 2.8GHz
    10085             4      2      4  AMD Tyan n6650w, Opteron 8220, 2.8GHz
    8174              2      1      4  IBM System p5 520 (1900 MHz, 2 CPU)

    Benchmark Description

    The SPEC OMPM2001 Benchmark Suite was released in June 2001 and tests HPC performance using OpenMP for parallelism. It consists of 11 programs (8 in Fortran and 3 in C) parallelized using OpenMP API.

    Goals of the benchmark:

  • Targeted to mid-range (4-32 processor) parallel systems
  • Run rules, tools and reporting similar to SPEC CPU2000
  • Programs representative of HPC and Scientific Applications

    See Also:
    SPEC OMP2001 Page
    sun.com X4100 Benchmark Page
    sun.com X4200 Benchmark Page

    Disclosure Statement:

    SPEC and SPEComp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of Jan 2, 2007. Sun Fire X4100/X4200 M2 (4 cores, 2 chips, 4 threads), 13,222 SPECompM2001. AMD Tyan n6650w (4 cores, 2 chips, 4 threads), 10,085 SPECompM2001 (PGI compiler). AMD Tyan n6650w (4 cores, 2 chips, 4 threads), 12,172 SPECompM2001 (PathScale compiler). Sockets refers to chips.

    Results Summary

      X4100/X4200 M2 4-threads: 13222 SPECompM2001
      X2200 M2 4-threads: 12574 SPECompM2001
      Reference Date: Oct 16, 2006
      System: Sun Fire X4100/X4200M2 16GB memory (4x2GB per chip), DDR667
      Processors: two Opteron 2220SE, 2.8 GHz
      Operating System: Solaris 10
      Compiler: Sun Studio 11


    Solaris again beating Linux on benchmark

    Wednesday Dec 20, 2006

    This entry has been updated, for the latest please go to:
    http://blogs.sun.com/bmseer/entry/update_solaris_beating_linux_performance.
