BM Seer Unofficial thoughts from an anonymous Sun employee

Sun Fire X2200 M2 running Fluent CFD Beats Woodcrest & Clovertown

Friday Mar 30, 2007

The Sun Fire X2200 M2 server beats Woodcrest on large CFD models. The X2200 M2 Cluster beats all currently posted Opteron cluster results (dual core HP XC4000 2.2GHz, HP DL145 G2 2.2GHz, HP XW9300 2.4GHz, and HP DL585 2.6GHz) for all "cpu" levels and for all test cases. All clusters had the high performance Infiniband interconnects.

The X2200 M2 beats the IBM X3650 2.66GHz quad core Clovertown across the board at all cpu levels and for all test cases.

Tests were run on the official version of Fluent (lnxamd64 V6.3.26 build). The Sun Opteron server numbers were generated under 64-bit SUSE SLES 9 SP 3. Sun has many customers that use Solaris, Linux, and Windows, so we show benchmarks on all of these.

The X2200 M2 cluster has the best performance on the larger, more complex test, "FL5L3". This test is most closely representative of actual customer benchmarks (it requires over 9GB of memory and is best run using several CPUs). FL5L3 simulates turbulent flow through a transition duct.

Note that the X2200 M2 cluster results shown in the following table are consistently better than those obtained on the Woodcrest cluster systems at all indicated "cpu" levels (4 to 32).

The efficiency of the Sun X2200 M2 cluster is superb at well above 90% up to 32 cores. This essentially perfect scalability is contrasted with the Woodcrest clusters where scalability has dropped off and efficiency is below 70% at and above 4 cores.

Scaling Performance : Results in "Ratings" (# runs/day, bigger is better)

System 4 Cores 8 Cores 16 Cores 32 Cores
Sun X2200 M2 2.8GHz Opteron 89.9 174.4 341.5 664.4
HP BL460C 3.0GHz Woodcrest 80.3 155.4 299.0 576.0
HP DL140 3.0GHz Woodcrest N/A 160.7 320.5 620.1
Bull NovaScale 3.0GHz Woodcrest 78.9 157.8 313.2 619.0
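
For readers who want to check scalability themselves, here is a minimal sketch that computes parallel efficiency from the ratings in the scaling table (the 8-core values match the FL5L3 column of the next table). It assumes efficiency is measured relative to the smallest core count listed for each system, because no single-core ratings are given here. The percentages quoted above may use a different baseline, so the output will not necessarily reproduce them. The function name `relative_efficiency` is made up for this illustration.

```python
# Ratings (runs/day) copied from the scaling table above.
ratings = {
    "Sun X2200 M2 (2.8GHz Opteron)": {4: 89.9, 8: 174.4, 16: 341.5, 32: 664.4},
    "HP BL460C (3.0GHz Woodcrest)": {4: 80.3, 8: 155.4, 16: 299.0, 32: 576.0},
    "HP DL140 (3.0GHz Woodcrest)": {8: 160.7, 16: 320.5, 32: 620.1},
    "Bull NovaScale (3.0GHz Woodcrest)": {4: 78.9, 8: 157.8, 16: 313.2, 32: 619.0},
}


def relative_efficiency(by_cores):
    """Efficiency of each run relative to the smallest core count listed.

    Assumption: the baseline is the smallest listed core count,
    not a single-core run (which the table does not include).
    """
    base_cores = min(by_cores)
    base_rating = by_cores[base_cores]
    return {
        cores: (rating / base_rating) / (cores / base_cores)
        for cores, rating in sorted(by_cores.items())
    }


for system, by_cores in ratings.items():
    effs = relative_efficiency(by_cores)
    print(system, {cores: f"{eff:.0%}" for cores, eff in effs.items()})
```

A ratio of 1.0 means perfect scaling from the baseline; anything lower is the fraction of ideal throughput actually delivered.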

Fluent Performance : Results in "Ratings" (# runs/day, bigger is better)

System Interconnect/MPI cores FL5L1 FL5L2 FL5L3
X2200 2.8GHz DC 2220 SLES 9 SP 3 IB(V)/HP-MPI 8 1219.5 952.1 174.4
X2100 3.0GHz SC 156 SLES 9 SP3 IB(V)/MVAPICH 8 1148.2 1063.4 184.6
HPDL140 3.0GHz DC WC EM64T Linux IB/HP-MPI 8 1378.0 915.0 160.7
Bull Nova 3.0 GHz DC WC EM64T RHEL4 IB 8 1323.6 884.1 157.8
HP BL460C 3.0GHz WC EM64T WinCCS IB(V) 8 1289.6 881.6 155.4
Intel White 3.0GHz WC EM64T DC RHAS4 IB(Mellanox) 8 --- 828.0 137.8
Tyan Typh. 630 2.3GHz WC SLES 10 GbE 8 1011.7 692.4 122.7
Tyan Typh. 630 2.3GHz WC WinCCS GbE 8 981.8 635.3 ---
HPDL140 3.6GHz EM64T WINCCS IB 8 970.8 675.0 120.0
HPDL585 2.6GHz DC 152 RHEL4 IB(V)/HP-MPI 8 966.2 723.2 119.2
HPXC4000 2.2GHz DC 148 Linux IB(V)/HP-MPI 8 951.0 680.4 102.7
HPDL145 G2 Opteron 2.2GHz DC WinCCS IB(V) 8 847.1 654.5 119.2
IBMX3650 2.66GHz 4C Clovert. EM64T RHEL4 ? 8 953.6 551.2 93.3

Benchmark Description

Nine industrial CFD applications ranging in size from 32,000 to 10,000,000 cells have been selected to demonstrate the performance of FLUENT on a variety of hardware platforms. The performance of a CFD code will depend on several factors including size and topology of the mesh, physical models, numerics and parallelization, compilers and optimization, in addition to performance characteristics of the hardware where the simulation is performed. The problems selected represent a range of simulations typical of those which might be found in industry. The principal objective of this benchmark suite is to provide comprehensive and fair comparative information of the performance of FLUENT on available hardware platforms.

System Configuration

Hardware Configuration:

    Sun Fire X2200 M2
    2-socket 2x2.8 GHz dual core Opteron 2220 processors
    4x1GB + 4x2GB (12GB) DDR2 667 MHz dimms
    IB(Voltaire)/PCI-Express (interconnect)

Software Configuration:

    64-bit SuSE SLES 9 SP 3
    Fluent V6.3.26
    Voltaire Infiniband Software Stack: 3.5.5_16-S2sles9.k2.6.5_7.244_smp.x86_64
    Message Passing Interface: HP-MPI V hpmpi-2.02.05.00-20061003r.x86_64

See Also

Current V6.3(.26) results at:
http://www.fluent.com/software/fluent/fl5bench/flbench_6.3/fullres.htm


Intel funny comparisons and calculations

Thursday Mar 22, 2007

Intel is really playing with information, ZDNet writes:

    "...instead of comparing Intel's latest greatest chips to AMD's latest greatest chips (as Intel should be doing to legimately convince Wall St., the press, and customers of leadership and/or breakaway performance), more than half of the data points that show Intel leading or breaking away show it doing so against older AMD chips (in some cases, single-core chips or chips from an older generation of Opterons) and in some cases, with retired benchmarks"
SOURCE: "AMD’s no angel, but Intel’s public usage of benchmark data is feloniously misleading" Posted by David Berlind @ 3:33pm http://blogs.zdnet.com/Berlind/?p=366

David missed something we've blogged about here: some Intel systems with normal-size memory configurations draw more watts than some competing systems. Watt and configuration data really need to be shown somewhere in the Intel presentation mentioned above if a chart is to have validity. http://blogs.sun.com/bmseer/entry/woodcrest_memory_lacks_some_important

And one other thing: in this presentation Intel used perf/$ and perf/watt. As blogged yesterday, everyone needs to use $/perf and watt/perf to really help customers.


bucket-o-records SPEC CPU2006 Sun Blade X8420

Thursday Jan 11, 2007

Sun Blade X8420 is 1.9x faster than the best Intel Woodcrest system on SPECint_rate2006 and is also 2.1x faster than the best Intel Woodcrest on SPECfp_rate2006. The Sun Blade X8420 is also 22% faster than 4-way Itanium2 dual-core on SPECfp_rate.

Sun Blade X8420 delivered the best result with a SPECint_rate2006 score of 93.1, using the Solaris 10 and Studio 11 combo. The Sun Blade X8420 also delivered the best result of 87.3 on the SPECfp_rate2006 benchmark among all x86 systems.

SPEC CPU2006 Performance Charts (bigger is better, selected recent results)

SPECint_rate2006

System Processors Performance Results
Type GHz Chips Cores Threads Peak Base
Sun Blade X8420 AMD Opteron 8220 2.8 4 8 8 93.1 80.4
Fujitsu CELSIUS R640 Xeon 5160 (Woodcrest) 3.0 2 4 4 50.3 48.8
Sun Ultra 40 M2 AMD Opteron 2220SE 2.8 2 4 4 48.8 41.9
HP DL585 Opteron 854 2.8 4 4 4 46.9 41.4
Supermicro X7DBE Xeon 5160 (Woodcrest) 3.0 2 4 4 --- 45.2
Sun Fire X4200 Opteron 285 2.6 2 4 4 42.8 37.8
Fujitsu RX220 Opteron 280 2.4 2 4 4 40.0 35.7
Sun Fire X4200 Opteron 256 3.0 2 2 2 26.4 23.1
HP DL585 Opteron 854 2.8 2 2 2 25.2 22.3
Dell PrecWork 380 Pentium EE 3.73 1 2 2 -- 23.1
HP DL380 G4 Pentium 4 3.8 2 2 2 -- 20.9

SPECfp_rate2006

System Processors Performance Results
Type GHz Chips Cores Threads Peak Base
Sun Blade X8420 AMD Opteron 8220 2.8 4 8 8 87.3 82.5
HP rx6600 Itanium2 dual-core 1.6 4 8 8 71.4 69.1
HP DL585 Opteron 854 2.8 4 4 4 49.3 45.6
FSC CELSIUS R640 Intel Xeon 5160 (Woodcrest), WinXP Pro 3.0 2 4 4 42.5 41.4
Sun Fire X4200 Opteron 285 2.6 2 4 4 38.1 36.0

Results as of 09 Jan 2007 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. First is the CPU measure, which is the performance on the suite as a single stream. This can be either a single thread or automatic compiled parallel run. This measure is further defined by base and optimized runs. Base uses the same compiler flags for all kernels, where optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

The second measure is Rate. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results.

Disclosure Statement:

    SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 1/9/07. Sun Blade X8420 (AMD Opteron 8220, 4chips/8cores, Solaris 10) 93.1 SPECint_rate2006. Sun Blade X8420 (AMD Opteron 8220, 4chips/8cores, Solaris 10) 87.3 SPECfp_rate2006.

Results Summary

    Results
    X8420 93.1 SPECint_rate2006
    X8420 87.3 SPECfp_rate2006
    Reference Date: Jan 09, 2007
    System: Sun Blade X8420, 64GB memory
    Processors: four 2.8 GHz Opteron 8220
    Software: Solaris 10, Sun Studio 11

Out-of-box Performance: Java6 advances

Thursday Dec 14, 2006

In case you missed it: "Java 6 Leads Out-of-the-Box Server Performance"

Dave Dagastine's blog this week goes into the advances of Java 6. Java6 is Sun's fastest most-reliable release and specifically targets out-of-the-box performance. link: http://blogs.sun.com/dagastine/entry/java_6_leads_out_of

Bottom line: It means no tuning options are needed for the JVM to achieve optimal performance. YEAH! Lots of great details in the blog.


Intel chip power & details on Opteron vs. Woodcrest, etc

Friday Dec 08, 2006

Here is a processor wattage chart, but notice that Intel has the memory controller off chip, so you need to add 30-35 watts to these figures (Opteron includes it on chip): http://www.intel.com/products/processor_number/chart/xeon.htm

For more details on power budget breakdown I got pointed to these two pages for an AMD comparison (looks like it is part of a bigger preso?, no confidential statement): http://www.amd.com/us-en/assets/content_type/DownloadableAssets/opt_vs_wc_8_dimms.pps
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/4_opt_vs_4_pax.pps


details on power budgets: Opteron advantage over Woodcrest

Thursday Dec 07, 2006

More details on power budget differences that give Opteron at least a 34% lead over Woodcrest.

I gave some basics of this in this posting: http://blogs.sun.com/bmseer/entry/design_strategies%3A_wattage_advantage_of

Woodcrest power budget: Dual-core Xeons: 160 watts for two sockets (80w each) PLUS 44.8 watts for chipset (incl. memory controllers) PLUS 166.4 watts for FB-DIMMs (16 DIMMs).

    {{typo corrected: yes FB-DIMMs suck an amazing 170 watts for 16 DIMMs -- that's nearly 100watts more than DDR2. That is why Intel-based systems only report wattage on small memory configs, but still use the same large memory configs for various benchmarks.}}

Opteron power budget: Dual-core Opterons: 190 watts for two sockets (95w max each) PLUS 16 watts for chipset PLUS 70.4 watts for DDR2 (16 DIMMs).

...and this is looking at just the chips -- not adding the typical controllers you'd have for a functioning system like disk, network, etc...
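
To see where the 34% comes from, here is a small sketch that adds up the two power budgets. It uses the corrected 166.4 watt FB-DIMM figure and treats each budget as a simple sum of CPU, chipset, and memory watts for a 2-socket, 16-DIMM system. The dictionary keys are just labels chosen for this example.

```python
# Power budget components in watts, taken from the figures above
# (2 sockets, 16 DIMMs each). Key names are illustrative only.
woodcrest = {"cpus": 2 * 80.0, "chipset": 44.8, "fb_dimms": 166.4}
opteron = {"cpus": 2 * 95.0, "chipset": 16.0, "ddr2_dimms": 70.4}

woodcrest_total = sum(woodcrest.values())
opteron_total = sum(opteron.values())

# How much more power the Woodcrest budget needs, as a fraction.
extra = woodcrest_total / opteron_total - 1

print(f"Woodcrest: {woodcrest_total:.1f} W")
print(f"Opteron:   {opteron_total:.1f} W")
print(f"Woodcrest needs {extra:.0%} more power")
```

Note that the Opteron CPUs alone are budgeted higher (190 vs 160 watts); the gap comes from the chipset and the memory.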


Woodcrest memory lacks an important power-saving feature

Thursday Dec 07, 2006

There are technical reasons why a 32GB 2-processor Woodcrest server draws a hefty 510 watts. Intel decided not to implement the energy saving "page open mode" for the power-hungry FB-DIMMs. So CPU power throttling may have limited benefit on Woodcrest systems.

System 8GB 10GB 16GB 32GB
Woodcrest 330 watts 2-socket 400 watts 1-socket 430 watts 2-socket 510 watts 2-socket
Disk Config 1x150GB 7200 rpm SATA disk disk one 73GB 15K rpm SAS (disk idle) just 2 SATA HDDs
Sources: Intel disclosed intel whitepaper Sun measured www.c0t0d0s0.org posting

Intel has shown that a 10GB 1 socket Woodcrest draws 400 watts, but you have to dig past some marketing spin to find it, see page 3 of www.intel.com/it/pdf/energy-efficient-perf-for-the-data-center.pdf.

Sun publishes performance, watts, and configuration together on all of its benchmarks for the Sun Fire T2000 (~330 watts) and the Sun Fire T1000 (~185 watts): http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp

  • 330 watts 32GB Sun Fire T2000
    • 32GB; 4 x 73GB 10K rpm SAS disks, 3 Northstar NICs, Crystal FCAL
    • 32GB T2000 draws 100 fewer watts and has twice the memory of the 16GB Woodcrest config
    • measured by Sun, CPUs busy, network busy, disks idle
  • 185 watts 16GB Sun Fire T1000
    • Measured on every T2000/T1000 benchmark

Woodcrest 16GB 430 watt measured config details:

    Dell 2950
    2 x 3GHz Woodcrest Xeon 5160 (4MB L2 cache)
    16GB = 8 x 2GB DIMM;
    one 73GB 15K rpm SAS (disk idle)
    1333MHz FSB
    PERC 5/i, x6 Backplane Integrated Controller Card
    QLogic 2462 Dual Channel 4Gb Optical FC HBA PCI-E
    OS: SuSE - SLES
    all bios settings correct


Design strategies: wattage advantage of Opteron vs. Woodcrest

Tuesday Dec 05, 2006

Some things to look at when you see marketing around wattage. You can avoid errors by looking at total measured wattage when systems are running and doing real work. I've seen a lot of Intel marketing about the wattage of Woodcrest being 65 watts. But that really doesn't show the whole picture. I'll break it down a bit...

What GHz at what wattage?: First recognize that Woodcrest at 2.66 GHz and 2.33 GHz is 65 watts for the chip only, but Woodcrest at 3.0 GHz is 80 watts. ...and all the benchmarks I've seen are on the 80 watt 3.0 GHz systems.

What about the memory controller?: The CPU isn't everything. Woodcrest designs have an external memory controller. Opteron designs have an integrated memory controller. So you need to add another 30 watts (or more) for the pair of Woodcrest CPUs.

What about the memory technology differences?: The CPU+Memory_controller isn't everything. Woodcrest designs use FB-DIMMs. Opteron designs use the more power efficient DDR2. FB-DIMMs draw a lot more power. In fact, as I've blogged about before, a 32GB 2-socket Woodcrest system draws 500 watts, measured when the CPU is busy! Sun's Opteron systems draw well over 100 watts less.

Every IT department I talk to really wants to cut costs -- power consumption is a growing major factor in IT costs.

...this just in...

Sun is now shipping a wattage meter with the "Try-and-buy" program for Sun Fire T2000. More details at: http://blogs.sun.com/cohen/entry/kill_a_watt_--_power


few hundred watts a difference? maybe 600kW!

Friday Dec 01, 2006

Hallway discussion: "What diff does 200 watts really make?" I was having a discussion about a 32GB T2000 (330 watts) vs a 32GB Woodcrest (500 watts) and was told this is only about a 200 watt difference, which in the end is a very small change.

But the problem is that when you aggregate and consider all of the losses from utility to computing it can add up rather quickly. ...maybe up to 600kW or more. To give everyone an example, I'll base the power losses on a real Sun datacenter of mixed systems, but use the T2000 as an example and a Woodcrest system as a comparable.

For example, let's say we had 700 kW of T2000 32GB servers (700,000 watts / 330 watts = 2,121 servers or 100 racks). We lose about 40% due to air conditioning and power distribution in the datacenter, plus a 3% loss in utility power distribution. All in all, this is 1200kW of power out of the utility.

OK, with Woodcrest this is 2121 servers x 500 watts/server or 1.06MW for servers. Assuming the same percentage loss at every stage, this means the utility has to provide 1.83MW (for Woodcrest) vs. 1.2MW (for Sun Fire T2000).
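
The arithmetic can be sketched roughly as follows. This assumes the 40% and 3% losses are fractions of the power entering each stage, so the server load is divided by (1 - loss) at each step. That is one reading of the numbers above, and with it the totals land close to, though not exactly on, the rounded figures quoted. The function name `utility_power_kw` is made up for this example.

```python
def utility_power_kw(server_watts, n_servers,
                     datacenter_loss=0.40, utility_loss=0.03):
    """Power the utility must generate (kW) to feed a fleet of servers.

    Assumption: each loss is a fraction of the power entering that
    stage, so the load is divided by (1 - loss) at each step.
    """
    it_load_kw = server_watts * n_servers / 1000.0
    at_datacenter_kw = it_load_kw / (1 - datacenter_loss)
    return at_datacenter_kw / (1 - utility_loss)


n_servers = 2121  # from the post: 700,000 watts / 330 watts per T2000

t2000_kw = utility_power_kw(330, n_servers)
woodcrest_kw = utility_power_kw(500, n_servers)

print(f"T2000 fleet:     {t2000_kw:,.0f} kW from the utility")
print(f"Woodcrest fleet: {woodcrest_kw:,.0f} kW from the utility")
print(f"Difference:      {woodcrest_kw - t2000_kw:,.0f} kW")
```

The point of the sketch is that every watt saved at the server is multiplied by the same loss factor on its way back to the utility, which is how 170 watts per box turns into roughly 600 kW.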


another datapoint of Woodcrest burning lots-o-watts

Friday Nov 17, 2006

Surprised that a 32GB 2-processor Woodcrest server draws a hefty 510 watts. Woodcrest vendors need to be transparent... it will get out. A recent internet search found that even Intel knows a 10GB 1 socket Woodcrest draws 400 watts, see page 3 of www.intel.com/it/pdf/energy-efficient-perf-for-the-data-center.pdf.

Woodcrest vendors need to publish configuration, performance and watts all together whenever they show performance! No more games.

System 8GB 10GB 16GB 32GB
Woodcrest 330 watts 2-socket 400 watts 1-socket 430 watts 2-socket 510 watts 2-socket
Disk Config 1x150GB 7200 rpm SATA disk disk one 73GB 15K rpm SAS (disk idle) just 2 SATA HDDs
Source: Intel disclosed intel whitepaper Sun measured www.c0t0d0s0.org posting

Sun publishes performance, watts, and configuration together on all of its benchmarks for the Sun Fire T2000 (~330 watts) and the Sun Fire T1000 (~185 watts): http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp

  • 330 watts 32GB Sun Fire T2000
    • 32GB; 4 x 73GB 10K rpm SAS disks, 3 Northstar NICs, Crystal FCAL
    • 32GB T2000 draws 100 fewer watts and has twice the memory of the 16GB Woodcrest config
    • measured by Sun, CPUs busy, network busy, disks idle
  • 185 watts 16GB Sun Fire T1000
    • Measured on every T2000/T1000 benchmark

Woodcrest 16GB 430 watt measured config details:

    Dell 2950
    2 x 3GHz Woodcrest Xeon 5160 (4MB L2 cache)
    16GB = 8 x 2GB DIMM;
    one 73GB 15K rpm SAS (disk idle)
    1333MHz FSB
    PERC 5/i, x6 Backplane Integrated Controller Card
    QLogic 2462 Dual Channel 4Gb Optical FC HBA PCI-E
    OS: SuSE - SLES
    all bios settings correct

If you have a Woodcrest, measure the watts and post them; clearly "wastecrest" vendors don't want you to know.

Seems Intel likes to use marketing spin and avoid the facts: http://www.intel.com/business/bss/infrastructure/enterprise/power_thermal.pdf and http://www.intel.com/performance/server/xeon/ppw.htm
Also http://www.principledtechnologies.com/clients/reports/Intel/WSPECint_rate_0506.pdf


Total Tyranny of low utilization datacenters

Friday Nov 17, 2006


In this blog and other blogs I've commented on, Woodcrest supporters always want to say their servers are better at low utilisation. This is totally the wrong way to go! They first claim typical datacenters are running at low utilisations, example: Xen claims typical datacenters are at 15%. Horrible, HORRIBLE.

So why shouldn't we just add all kinds of techniques to save power at lower utilisations? Clearly that is the best way to save money, right? Wrong.

Let's take a simple example of a 400 watt server (@ 100%) that saves 20 watts for each 10% reduction in utilisation. I'll show this in a table below and compare the equivalent work done relative to 100% so you can see the hyperbolic nature of the curve. Of course I'm only looking at one server so there is some discretisation, but when you have a datacenter it will quickly approach these numbers.

%Utilisation 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0%
Watts-at-Util 400 380 360 340 320 300 280 260 240 220 220
watts/work 400 422 450 486 533 600 700 867 1200 2200 inf.

Now that I've got you shocked, let's look at a more typical example. Let's compare 5 servers running at 10% utilisation (that is 220 watts each or 1100 watts for the 5 of them). A single server running at 50% utilisation uses only 300 watts! The 10% case requires almost 3.7 times the power! OUCH!
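
The table and the 5-server comparison can be reproduced with a short sketch. It assumes the simple linear model described above (400 watts at 100%, 20 watts saved per 10% drop) and defines watts/work as watts divided by the utilised fraction. It covers 10% to 100% only, since no work is done at 0%. The function names are made up for this example.

```python
def watts_at(util_pct, full_watts=400, saving_per_10pct=20):
    """Server power draw under the linear model described in the post."""
    steps_below_full = (100 - util_pct) / 10
    return full_watts - saving_per_10pct * steps_below_full


def watts_per_work(util_pct):
    """Watts needed per unit of work, where 100% utilisation = 1 unit."""
    return watts_at(util_pct) / (util_pct / 100)


for util in range(100, 0, -10):
    print(f"{util:>3}%  {watts_at(util):>5.0f} W  "
          f"{watts_per_work(util):>6.0f} watts/work")

# Same total work: five servers at 10% vs one server at 50%.
five_at_10 = 5 * watts_at(10)
one_at_50 = watts_at(50)
print(f"{five_at_10:.0f} W vs {one_at_50:.0f} W "
      f"= {five_at_10 / one_at_50:.1f}x the power")
```

Because the idle floor barely moves while the work done shrinks, watts/work grows like 1/utilisation, which is the hyperbolic curve the table shows.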

Bottom line: It is far too easy to be fooled into thinking you are saving money if power-saving features at low utilisation are your answer. By the by, a significant number of Sun's large servers run at over 80% utilisation, using Solaris of course.

Here is an example from 2004 of someone on different products who likely understands this math. As reported in Computerworld:

    "Dennis Callahan, CIO at The Guardian Life Insurance Company of America in New York, server utilization has shot up to nearly 50% in the past 18 months, with a goal in coming years of nearly 70%.


New World Record SPECint_rate2006 Sun Ultra 40 M2 Workstation

Thursday Nov 16, 2006

The Sun Ultra 40 M2 Workstation demonstrates world record integer throughput performance for all x86 systems on the new and improved SPEC CPU benchmark called "SPECint_rate2006". The new benchmark fixes things like SPECint_rate2000 having floating-point applications in the integer suite. Whaaaat? Yes, strange but true.

The Sun Ultra 40 M2 delivered a SPECint_rate2006 score of 48.8, using the Solaris 10 and Studio 11 combination. Sun's Opteron beats Woodcrest by 7%. As you can see below, 'Peak' means you add a few more compiler flags. I guess the Woodcrest vendors didn't have any other flags to try, or maybe they saw no improvement so they avoided publishing? Anyone know?

Competitive Landscape

Selected SPEC CPU2006 (SPECint_rate2006) Performance Results - bigger is better, see www.spec.org for complete results.

System Processors Performance Results
Type GHz Chips Cores Threads Peak Base
Sun Ultra 40 M2 AMD Opteron 2220SE 2.8 2 4 4 48.8 41.9
HP DL585 Opteron 854 2.8 4 4 4 46.9 41.4
Supermicro X7DBE Woodcrest, Xeon 5160 3.0 2 4 4 --- 45.2
Sun Fire X4200 Opteron 285 2.6 2 4 4 42.8 37.8
Fujitsu RX220 Opteron 280 2.4 2 4 4 40.0 35.7
Sun Fire X4200 Opteron 256 3.0 2 2 2 26.4 23.1
HP DL585 Opteron 854 2.8 2 2 2 25.2 22.3
Dell PrecWork 380 Pentium EE 3.73 1 2 2 -- 23.1
HP DL380 G4 Pentium 4 3.8 2 2 2 -- 20.9

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. First is the CPU measure, which is the performance on the suite as a single stream. This can be either a single thread or automatic compiled parallel run. This measure is further defined by base and optimized runs. Base uses the same compiler flags for all kernels, where optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by www.spec.org.

The second measure is Rate. I think this one is a LOT more important. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors or threads. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results. "Rate" is what you use for any multi-threaded workstation and all servers.

Disclosure Statement:

SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/14/06. Sun Ultra 40 M2, 48.8 SPECint_rate2006.

System Configuration

  • Sun Ultra 40 M2
  • 2 x 2.8 GHz Opteron 2220SE
  • 16GB memory
  • Solaris 10
  • Sun Studio 11
  • 48.8 SPECint_rate2006


Sun Opteron x4100 outscaling woodcrest (and outperforming = side benefit)

Wednesday Nov 15, 2006

Woodcrest scaling issues? Yes, remember scaling is critical for system performance, so don't look too much at single core performance or single job performance as it can lead to the wrong conclusions. In fact Sun's Opteron scaling means that the Sun systems can outperform Woodcrest by 18% to 22% as shown below.

On 4 core/2 chip Intel Woodcrest systems they are only seeing 2.8x to 2.9x scaling on 4 cores -- this doesn't bode well for quad-core or larger systems made out of these. Sun sees 3.6x to 4.1x scaling in the table below. Couple this with the high wattage of these Woodcrest systems (31-Oct posting) and Woodcrest may have issues.

Opteron leads poor Woodcrest scaling & performance on Fluent 6 Benchmark (Both systems 2 sockets and using dual-core)

System GHz/Chip #cores FL5M3 (scaling) FL5L2 (scaling)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 4-core 827.0 (2.8x) 400.0 (2.9x)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 2-core 553.7 (1.9x) 226.0 (1.6x)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 1-core 297.3 (1.0x) 138.0 (1.0x)
Sun
Sun X4100 M2 2.8GHz Opteron DC 2200 4-core 979.9 (3.6x) 486.6 (4.1x)
Sun X4100 M2 2.8GHz Opteron DC 2200 2-core 516.1 (1.9x) 241.8 (2.1x)
Sun X4100 M2 2.8GHz Opteron DC 2200 1-core 273.5 (1.0x) 117.6 (1.0x)

Rating = No. of sequential runs of test case possible in 1 day, 86,400/(Total Elapsed Run Time in Seconds)
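
As a quick illustration of how the rating and the scaling factors relate, here is a small sketch. The `rating` function follows the formula just given; the scaling factor is assumed to be the rating at n cores divided by the 1-core rating on the same system, which reproduces the numbers in parentheses in the table. The FL5M3 ratings are copied from the table; the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def rating(elapsed_seconds):
    """Fluent rating: sequential runs of the test case possible in one day."""
    return SECONDS_PER_DAY / elapsed_seconds


def scaling(ratings_by_cores):
    """Speedup of each core count relative to the 1-core rating."""
    base = ratings_by_cores[1]
    return {cores: r / base for cores, r in sorted(ratings_by_cores.items())}


# FL5M3 ratings from the table above.
fl5m3 = {
    "Intel S5000XAL (Woodcrest 5160)": {1: 297.3, 2: 553.7, 4: 827.0},
    "Sun X4100 M2 (Opteron)": {1: 273.5, 2: 516.1, 4: 979.9},
}

for system, by_cores in fl5m3.items():
    factors = scaling(by_cores)
    print(system, {cores: f"{f:.1f}x" for cores, f in factors.items()})

# A run that takes 864 seconds rates 100 runs/day.
print(rating(864))
```

Woodcrest starts ahead at 1 core (297.3 vs 273.5) but falls behind at 4 cores, which is why the scaling factor matters more than the single-core number.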

Fluent results at: http://www.fluent.com/software/fluent/fl5bench/flbench_6.2/fullres.htm

...I suspect even better performance and scaling on Sun Fire X4100 M2 with Solaris...


World Record Performance SPECapc Sun beats Woodcrest

Tuesday Nov 14, 2006

Sun Ultra 40 M2 w/2xFX 5500 nVidia Framebuffers (SLI) World Record Performance SPECapc Unigraphics UGS-NX3

The Sun Ultra 40 M2 with dual nVidia Quadro FX 5500s in SLI mode sets a world record running the SPEC APC UGS-NX3 graphics oriented MCAD benchmark beating all desktop platforms, including the Woodcrest and Intel Core2 "Extreme Processor" X6800 cpu's.

The SPEC APC MCAD benchmarks consist of tasks representative of what a designer would do in a typical session. This consists of "Graphics", "CPU", and "I/O" activities.

  • In dual framebuffer SLI mode the Ultra 40 M2 with 2.8GHz 2220SE dual core Opteron processors outperforms a Dell 690 (3.0 GHz Woodcrest) by 14% overall and by 37% in the graphics test components.
  • In addition, in dual framebuffer SLI mode the existing Ultra 40 outperforms the Dell 690 (Woodcrest 3.0 GHz) by 16% overall and by 61% in the graphics component. The Ultra 40 has 3.0 GHz single core Opteron 256 processors (400 MHz DDR1 dimms) versus the Ultra 40 M2's 2.8 GHz dual core Opteron 2220SE processors (667 MHz DDR2 dimms), and edges the Ultra 40 M2 by about 1% overall.

The Sun Ultra 40 with a single nVidia Quadro FX 5500 outperforms most other high end desktops equipped with a single framebuffer with currently posted results obtained running the SPEC APC UGS-NX3 benchmark.

  • The Sun Ultra 40 with FX 5500 framebuffer outperforms (is faster than) Woodcrest desktops. H-P XW 6400 (4% overall, 39% on graphics); Dell Precision 690 (9% overall, 52% on graphics); IBM Intellistation Z Pro 9228 (14% overall, 62% on graphics)

  • The Sun Ultra 40 with FX 5500 framebuffer also outperforms all desktops equipped with the Intel 2.93 GHz X6800 "Extreme Processors". H-P XW 4400 (6% overall, 47% on graphics); Dell Precision 390 (10% overall, 47% on graphics)
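
The percentages in these bullets can be checked against the composite scores in the competitive landscape table with a one-line ratio. This sketch assumes "outperforms by N%" means the ratio of the two composite scores minus one; the helper name `percent_faster` is made up for this example.

```python
def percent_faster(score, baseline):
    """How much larger `score` is than `baseline`, in percent."""
    return (score / baseline - 1) * 100


# (overall composite, graphics composite) from the SPECapc table.
ultra40_m2_sli = (7.19, 16.85)
ultra40_sli = (7.28, 19.81)
dell_690_sli = (6.30, 12.29)

for name, (overall, graphics) in [("Ultra 40 M2 (SLI)", ultra40_m2_sli),
                                  ("Ultra 40 (SLI)", ultra40_sli)]:
    print(f"{name} vs Dell 690 (SLI): "
          f"{percent_faster(overall, dell_690_sli[0]):.0f}% overall, "
          f"{percent_faster(graphics, dell_690_sli[1]):.0f}% graphics")
```

The same helper applies to the single-framebuffer comparisons by swapping in the corresponding rows.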

Sun Opteron desktops have dominated with leading MCAD benchmark results dating back to the introduction of the Sun W1100 and W2100. Sun desktops continue to exhibit excellent MCAD performance as demonstrated by the world record results here for this SPEC APC UGS-NX3 benchmark.

SPECapc Unigraphics NX 3 Benchmark Competitive Landscape (larger is faster):

System Overall Composite CPU Composite File I/O Composite Graphics Composite
Sun Ultra 40, 3.0GHz Opteron 256, 2x FX 5500 (SLI) 7.28 2.94 2.85 19.81
Sun Ultra 40 M2, 2.8GHz Opteron 2220SE, 2x FX 5500 (SLI) 7.19 3.08 3.00 16.85
Fujitsu Siemens CELSIUS, 3.0GHz Intel 5150, FX 5500 6.42 3.67 2.28 10.17
Dell Precision 690, 3.0GHz Woodcrest, 2x FX 4500 (SLI) 6.30 3.25 1.64 12.29
Sun Ultra 40, 3.0GHz Opteron 256, FX 5500 5.66 2.94 1.96 10.11
HP xw6400 WS, 3.0GHz Woodcrest, FX 4500 5.42 3.39 3.51 7.26
HP xw4400, 2.93GHz X6800, FX 3500 5.33 3.40 4.52 6.87
Dell Precision 690, 3.00 GHz Woodcrest, FX 3500 5.17 3.38 3.69 6.64
Dell Precision 390, 2.93 GHz X6800, FX 3500 5.16 3.46 2.18 6.87
IBM Intellistation Z Pro 9228, 3.0GHz Woodcrest, FX 3500 4.96 3.43 2.84 6.23

Results Summary for the SPECapc Unigraphics NX 3 benchmark:
Results Dual FX 5500 Dual FX 5500
Overall Composite: 7.19 7.28
CPU Composite: 3.08 2.94
File I/O Composite: 3.00 2.85
Graphics Composite: 16.85 19.81
Reference Date: 11/10/06 10/12/06
System: Sun Ultra U40 M2 Sun Ultra U40
Processor/GHz: Opteron 2220SE/2.8 Opteron 256/3.0

Disclosure Statement:

SPEC is a registered trademark and SPECapc is a service mark of the Standard Performance Evaluation Corporation. Results from www.spec.org as of Oct 12, 2006: Sun Ultra 40, 2xFX 5500, overall composite 7.28; Dell Precision 690, 2xFX 4500, overall composite 6.30. Results from www.spec.org as of Oct 12, 2006: Sun Ultra 40, FX 5500, overall composite 5.66; HP xw6400, FX 4500, overall composite 5.42; Dell Precision 690, FX 3500, overall composite 5.17; IBM Intellistation Z Pro 9228, FX 3500, overall composite 4.96. Results from www.spec.org as of Nov. 8, 2006: Fujitsu Siemens CELSIUS, FX 5500, overall composite 6.42. Results from www.spec.org as of Nov 10, 2006: Sun Ultra 40 M2, 2xFX 5500, overall composite 7.19. Results from www.spec.org as of Oct 12, 2006: Sun Ultra 40, FX 5500, overall composite 5.66; HP xw4400, FX 3500, overall composite 5.33; Dell Precision 390, FX 3500, overall composite 5.16.


Woodcrest scaling problems? (Fluent part1)

Friday Nov 03, 2006

Does the Woodcrest have scaling issues now? It may be caused by the rush to increase core count without really considering design. On 4 core/2 chip Intel Woodcrest systems we are only seeing 3.0x to 3.3x scaling on 4 cores -- this doesn't bode well for quad-core or larger systems made out of these. Couple this with the high wattage of these chips (Tuesday's posting) and this chip may have issues.

Poor Woodcrest scaling & Performance on Fluent 6 Benchmark

System GHz/Chip #cores FL5L1 (scaling) FL5L2 (scaling)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 4-core 2-Socket 631.8 (3.3x) 400.0 (3.0x)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 2-core 1-Socket 372.8 (1.9x) 226.0 (1.7x)
INTEL S5000XAL 3.0GHz Xeon Woodcrest 5160 1-core 194.0 (1.0x) 133.0 (1.0x)

Rating = No. of sequential runs of test case possible in 1 day, 86,400/(Total Elapsed Run Time in Seconds)

Fluent results at: http://www.fluent.com/software/fluent/fl5bench/flbench_6.2/fullres.htm
