BM Seer Facts & Questions from an Anonymous Sun Source

Web Processing Performance Sun Modular Datacenter (Sun MD) S20

Thursday Jan 31, 2008

The Sun Modular Datacenter S20 configured with Sun SPARC Enterprise T5120 servers demonstrated superior performance/watt and performance/space compared to Dell servers using Xeon quad-core processors.

The Sun Modular Datacenter S20, fully configured with 102 Sun SPARC Enterprise T5120 servers, each with a single UltraSPARC T2, can deliver nearly 455,000 web processing operations/second.

  • The Sun Modular Datacenter S20 fully configured with the Sun SPARC Enterprise T5120 systems only requires 160 square feet of space. By comparison, a configuration of Dell servers with 2.66GHz Xeon 5355 in 160 square feet of traditional datacenter space constrained to 150 Watts/square foot would achieve only about 57,500 web processing operations/second. To achieve the same level of performance with the Dell configuration in a traditional datacenter, over 1250 square feet would be needed.
  • The Sun SPARC Enterprise T5120s configured in a Sun Modular Datacenter S20 are very efficient in terms of space-performance. A Sun Modular Datacenter S20 fully-configured with Dell servers with 2.66GHz Xeon 5355 would only be able to provide about 1/3 of the web processing performance of a Sun Modular Datacenter fully-configured with Sun SPARC Enterprise T5120 servers with UltraSPARC T2.
  • A Sun Modular Datacenter S20 provides 2840 web-processing-ops/sec/sq-ft vs. a traditional datacenter of Dell servers, which provides only 365 web-processing-ops/sec/sq-ft.
  • The Sun Modular Datacenter S20 is very efficient at cooling. Using the same Sun hardware, the Sun Modular Datacenter S20 is 40% more efficient than a traditional datacenter. This translates into a savings of 1459 metric tons over 5 years.
Notes:

    Due to Sun Modular Datacenter's integrated, high-efficiency power and cooling of up to 25kW per rack, servers, disks and switches can be racked more densely than a traditional datacenter.

    Many traditional datacenters are constrained to 150 Watts/square foot; this figure was used in the above estimates.

    BM Seer: I'll be adding more of the background numbers and calculations when I can find a public version of them. If you are a customer, contact Sun; I'm sure you can get versions before I can post them.
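Until the background numbers are posted, the per-square-foot figures can be roughly reproduced by simple division. Here is a minimal sketch in Python that uses only numbers quoted in this post; the variable names are mine, not Sun's.

```python
# Figures quoted in the post; variable names are illustrative only.
sun_md_ops = 455_000           # web ops/sec, fully configured Sun MD S20
dell_ops_same_area = 57_500    # web ops/sec, Dell Xeon 5355 servers in 160 sq ft
floor_area_sqft = 160
power_limit_w_per_sqft = 150   # traditional datacenter constraint

sun_density = sun_md_ops / floor_area_sqft
dell_density = dell_ops_same_area / floor_area_sqft
power_budget_w = power_limit_w_per_sqft * floor_area_sqft
area_for_parity_sqft = sun_md_ops / dell_density

print(f"Sun MD S20 density : {sun_density:.0f} ops/sec/sq-ft")
print(f"Dell density       : {dell_density:.0f} ops/sec/sq-ft")
print(f"Power budget       : {power_budget_w} W for {floor_area_sqft} sq ft")
print(f"Dell area for parity: {area_for_parity_sqft:.0f} sq ft")
```

The Sun figure lands close to the 2840 quoted, and the Dell area for parity comes out over 1250 square feet. The Dell density computed this way is a little below the 365 quoted, so the original estimate presumably used slightly different rounding or inputs.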

Benchmark Description

Web processing performance is based on internal analysis of web processing workloads. The workload simulates multiple user web sessions accessing a web server via static and dynamic HTTP (it contains both HTTP and HTTPS transactions), and the result is reported as web operations per second.

System Configuration & Results

Results: 455,000 web ops/sec
Reference Date: January 29, 2008
System: 1 x Sun Modular Datacenter S20
Servers: 102 x Sun SPARC Enterprise T5120
Total Number Processors: 102 chips / 816 cores (8 threads/core)
Processor/GHz Server: Sun UltraSPARC T2 1.4 GHz, 64GB
Operating System: Solaris 10
Software: Sun JSWS 7.0 Update 2
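
The totals in this configuration are easy to cross-check. A small sketch, assuming one UltraSPARC T2 chip per server as stated above; the per-server throughput is my own division and is not a figure Sun published here.

```python
servers = 102            # Sun SPARC Enterprise T5120, one chip each
cores_per_chip = 8
threads_per_core = 8
total_ops = 455_000      # web ops/sec for the whole Sun MD S20

total_cores = servers * cores_per_chip
total_threads = total_cores * threads_per_core
ops_per_server = total_ops / servers   # derived average, not a published result

print(f"{total_cores} cores, {total_threads} hardware threads")
print(f"about {ops_per_server:.0f} web ops/sec per server on average")
```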

Storage & Network

  • 6 x Sun StorageTek 3510 (Dual-raid controller)
  • 51 x Sun StorageTek 3510 (JBOD)
  • 9 x Brocade Silkworm 4100 switches
  • 5 x Cisco Catalyst 6509 NEBs Switches


IBM real & real

Wednesday Aug 15, 2007

disingenuous = "giving a false appearance of simple frankness".

IBM bloggers try to make valid SPEC estimates (preliminary results) look "fake". They keep using words like "virtual", "not real", and "strange", and then for IBM they use the word "real" 7 times.

All UltraSPARC T2 SPEC CPU and SPEC OMP metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use pre-production systems. Sun customer systems, to be announced later, are expected to perform similarly. SPEC rules do allow comparing these preliminary scores with published results.

For details see:
"Estimated" what does that mean for Sun's UltraSPARC T2, which is at:
http://blogs.sun.com/bmseer/entry/estimated_what_does_that_mean

IBM bloggers also imply that IBM doesn't estimate results. I was seeing lots of IBM POWER6 estimates from IBM 2 years ago, and they were estimates not based on runs.

SPEC, SPECint, SPECfp, and SPEComp are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of August 6, 2007.


UltraSPARC T2: more floating-point performance

Tuesday Aug 07, 2007

This posting covers more about floating-point on the Sun UltraSPARC T2. The previous posting discussed SPECfp_rate2006 scores and the open-sourcing of the UltraSPARC T2 design.

In the UltraSPARC T2 there are eight floating-point units that are well suited for scientific applications. Based upon preliminary runs the Sun UltraSPARC T2 processor at 1.4 GHz beats all single chip scores showing 14230(est)/15081(est) SPECompMbase2001/SPECompMpeak2001.

How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to SPECompMbase2001/SPECompMpeak2001 scores?

  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip IBM p520 POWER5+ 1.9GHz processor published result by 85%.
  • ...Sun is waiting for POWER6 4.7GHz results, maybe UltraSPARC T2 results will scare IBM from ever publishing a single-chip result?
Benchmark description:

The SPEC OMP benchmark is a test of the performance of 9 high-performance computing applications. It is used to compare the performance of shared-memory servers. All C/C++ and Fortran applications in this suite use the OpenMP programming model, which provides a portable, scalable model for developing parallel applications for platforms ranging from the desktop to the supercomputer.

The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, from the largest Unix servers to the small Windows NT platforms.

Disclosure statement:

All UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use preproduction systems. SPEC and SPEComp are registered trademarks of the Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/15081 (est) SPECompMbase2001/SPECompMpeak2001. Competitive results from www.spec.org as of August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.
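
For anyone who wants to check the percentage, here is a minimal sketch that derives it from the four scores in the disclosure statement. The helper name `percent_faster` is mine. The quoted 85% appears to correspond to the peak metric; the base metric works out to roughly 75%.

```python
def percent_faster(score, baseline):
    """Percentage by which score exceeds baseline (bigger score is better)."""
    return (score / baseline - 1.0) * 100.0

# Scores from the disclosure statement (T2 values are estimates).
t2_base, t2_peak = 14230, 15081        # Sun UltraSPARC T2 1.4GHz
p520_base, p520_peak = 8141, 8174      # IBM p520 POWER5+ 1.9GHz

base_gain = percent_faster(t2_base, p520_base)
peak_gain = percent_faster(t2_peak, p520_peak)

print(f"SPECompMbase2001: T2 ahead by {base_gain:.1f}%")
print(f"SPECompMpeak2001: T2 ahead by {peak_gain:.1f}%")
```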


Performance of the new Sun UltraSPARC T2

Tuesday Aug 07, 2007

Sun UltraSPARC T2 is an amazing chip and very fast! The UltraSPARC T2 features several industry firsts:

  • Eight cores and 64 threads
  • Integrated 10 GbE networking and I/O
  • Dedicated cryptographic and floating-point units per core
  • 10 cryptographic functions supported with hardware
  • Open-source design: www.opensparc.net

Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz beats all single-chip scores, showing 78.3 est. SPECint_rate2006. How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to SPECint_rate2006 results?

  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip IBM POWER6 4.7GHz processor published result by 29%.
  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip estimated scores of the AMD Barcelona by 23%.
  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip published scores of the 2.66GHz Intel X5355 (Clovertown) by 48%.

Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz beats all single-chip scores, showing 62.3 est. SPECfp_rate2006. How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to SPECfp_rate2006 results?

  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best published single-chip IBM POWER6 4.7GHz processor result by 7%.
  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip estimated scores of the AMD Barcelona by 11%.
  • These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip published scores of the 2.66GHz Intel X5355 (Clovertown) by 66%.

Performance per core doesn't matter and GHz doesn't matter; what matters is the number of cores, efficiency, and design of the chip! Competitors are saying that UltraSPARC T2 is proprietary... this makes no sense. Both UltraSPARC T1 and UltraSPARC T2 are open-source designs (www.opensparc.net). You do not find the latest designs of Intel, AMD, or IBM as open-source designs.

Disclosure Statement:

All Sun UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs, but are nevertheless designated as “estimates” because they use preproduction systems. SPEC, SPECint, and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 78.3 est. SPECint_rate2006, 62.3 est. SPECfp_rate2006. Competitive results from www.spec.org as of August 6, 2007. IBM POWER6 4.7GHz (1 chip, 2 cores, 4 threads) 60.9 SPECint_rate2006, 58.0 SPECfp_rate2006. AMD Barcelona 2.6 GHz (1 chip, 4 cores, 4 threads) 63.9 est. SPECint_rate2006, 56.3 est. SPECfp_rate2006. Barcelona estimates based upon "The Register" article stating 2.6GHz quad is 21% and 50% faster than Intel 2.66 system. Fujitsu RX300 Intel X5355 2.66 GHz (1 chip, 4 cores, 4 threads) 52.8 SPECint_rate2006, 47.5 SPECfp_rate2006.
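
The percentages in the bullets can be recomputed from the scores in this disclosure statement. The sketch below is only a cross-check; the dictionary layout and the `percent_faster` helper are my own. With the numbers as printed, every bullet is reproduced except the Intel SPECfp_rate2006 one: 62.3 against 47.5 works out to about 31% rather than 66%, so one of those two figures may contain a typo. I have not tried to guess which.

```python
def percent_faster(score, baseline):
    """Percentage by which score exceeds baseline (bigger score is better)."""
    return (score / baseline - 1.0) * 100.0

# Scores exactly as printed in the disclosure statement.
t2 = {"int_rate": 78.3, "fp_rate": 62.3}
rivals = {
    "IBM POWER6 4.7GHz": {"int_rate": 60.9, "fp_rate": 58.0},
    "AMD Barcelona 2.6GHz (est.)": {"int_rate": 63.9, "fp_rate": 56.3},
    "Intel X5355 2.66GHz": {"int_rate": 52.8, "fp_rate": 47.5},
}

gains = {
    name: {metric: percent_faster(t2[metric], value)
           for metric, value in scores.items()}
    for name, scores in rivals.items()
}

for name, g in gains.items():
    print(f"{name}: int_rate +{g['int_rate']:.0f}%, fp_rate +{g['fp_rate']:.0f}%")
```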

Reminder: The Niagara 2 score was obtained from a full "reportable" SPEC run, but is designated as an "estimate" because a pre-production system was used.

...more information on the UltraSPARC T2 later today.


Lots hitting the wires: UltraSPARC T2, the next generation

Monday Aug 06, 2007

Many news sources are now covering UltraSPARC T2, the new high-performance chip from Sun. This new UltraSPARC T2 chip leads in many ways. I'll cover the performance numbers tomorrow.

For now:
http://www.computerworld.com.au/index.php/id;898889798
http://www.reuters.com/article/technologyNews/idUSN0625780420070806
http://www.channelweb.co.uk/vnunet/news/2195718/sun-lifts-lid-niagara-processor
etc.

For some of my previous comments:
http://blogs.sun.com/bmseer/entry/news_trickles_out_on_niagara2

Please remember that the previous generation chip, the UltraSPARC T1, just set an application-tier world record (all details at link). How many times has the "old" chip with half as many threads set a world record weeks before the new one is announced?

A final note: I venture that this chip is going to lead for database, application tier, and of course web tier. Oh, and don't forget HPC. Yes, it is that versatile.


Update: World Record EXA PowerFLOW Cluster & Single Node

Monday Jul 16, 2007

Update:

A single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) is two times faster than a single-node SGI 1.6GHz Itanium 2 dual-core in runs with 1, 2, and 4 cores in both benchmark test cases.

Other runs on the 4-node cluster of Sun Blade X6250 outperformed the SGI Itanium2 dual-core 1.6GHz cluster in runs of both test cases up to the maximum of 16 cores on all 4 nodes in each cluster.

    question: can the Itanic dual-core keep floating?

The 4-node Sun Blade X6250 cluster outperformed the SGI Altix XE cluster by up to 25% in runs of both test cases up to the maximum of 16 cores on all 4 nodes in each cluster. Even in the single-node configuration, the Sun Blade X6250 beats an SGI Altix XE (3 GHz Xeon 5160 DC) by up to 23% in 4-core runs. It is also 4% faster in the 1-core results.

In summary:
World Record: the single-node Sun Blade X6250 (Intel Xeon 3 GHz DC 5160) beats the best posted results for any single-node blades and servers. All posted results are for 2-socket dual-core platforms.
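
To show how these comparisons fall out of the run times, here is a small sketch using the Case 1 times (in seconds) for the Sun X6250, the SGI Altix Itanium 2, and the SGI Altix XE from the results tables in this post. The speedup and efficiency definitions are the usual ones (time on 1 core divided by time on N cores, and that ratio divided by N); they are my choice and are not taken from the EXA site.

```python
# Case 1 (smaller model) run times in seconds, smaller is better.
sun_x6250 = {1: 822.7, 2: 418.5, 4: 214.9, 8: 118.6, 16: 77.5}
sgi_itanium_1core = 1631.4
sgi_xe_4core = 264.8

# Parallel speedup and efficiency of the Sun cluster relative to its 1-core run.
speedup = {n: sun_x6250[1] / t for n, t in sun_x6250.items()}
efficiency = {n: s / n for n, s in speedup.items()}

# Cross-vendor comparisons: how much longer the other system takes.
itanium_ratio = sgi_itanium_1core / sun_x6250[1]
xe_gain_pct = (sgi_xe_4core / sun_x6250[4] - 1.0) * 100.0

for n in sorted(speedup):
    print(f"{n:2d} cores: speedup {speedup[n]:.2f}, efficiency {efficiency[n]:.0%}")
print(f"Sun vs SGI Itanium 2, 1 core: {itanium_ratio:.2f}x faster")
print(f"Sun vs SGI Altix XE, 4 cores: {xe_gain_pct:.0f}% faster")
```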

EXA PowerFLOW V 6.3c Benchmark Case 1 (Smaller Model) results in seconds (smaller is better)

#CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
      Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
      DC 2.4GHz  DC 3GHz    DC 3.0GHz  DC 3GHz    DC 1.6GHz  DC 3.0GHz  DC 1.6GHz  DC 3GHz
      Myrinet    IB         IB         IB         IB         IB
      SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10

1     -          -          -          -          -          822.7      1631.4     866.1
2     -          -          -          -          -          418.5      832.7      448.8
4     -          -          -          -          -          214.9      438.4      264.8
8     182.9      137.2      137.8      134.7      214.3      118.6      227.2      147.9
16    96.3       70.4       71.3       70.5       111.4      77.5       117.9      78.1
32    51.5       37.0       40.6       36.6       57.9       -          60.2       41.9
64    31.5       21.5       22.9       21.1       31.8       -          -          28.0
96    24.7       17.3       -          -          -          -          -          -
128   19.0       -          -          -          -          -          -          18.1

"-" no result published

EXA PowerFLOW V 6.3c Benchmark Case 2 (Larger Model) results in seconds (smaller is better)

#CPU  IBM e135   HP BL460   HP BL460   HP DL140   HP RX2660  Sun X6250  SGI Altix  SGI Altix XE
      Opt        Xeon       Opt        Xeon       Itan2      Xeon 5160  Itan2      Xeon
      DC 2.4GHz  DC 3GHz    DC 3GHz    DC 3.0GHz  DC 1.6GHz  DC 3GHz    DC 1.6GHz  DC 3GHz
      Myrinet    IB         IB         IB         IB         IB
      SLES 9     RHEL 4     XC3.1 RC1  XC3.1 RC1  RHEL 4     SLES 10    ProPack5   SLES 10

1     -          -          -          -          -          1966.4     3884.0     2043.6
2     -          -          -          -          -          987.5      2000.4     1062.4
4     -          -          -          -          -          500.5      1054.5     620.7
8     424.9      310.0      306.4      258.4      490.7      258.4      526.7      316.0
16    216.0      165.4      -          160.1      253.9      164.5      272.1      174.4
32    112.8      82.3       84.4       83.3       129.3      -          139.4      90.3
64    61.5       43.8       43.8       43.2       68         -          75.6       48.7
96    45.2       32.3       -          -          -          -          -          -
128   36.8       -          -          24.4       -          -          -          32.8

"-" no result published

The EXA PowerFLOW Benchmark Test Suite
The PowerFLOW performance benchmark test suite consists of two standard cases, each a simulation of external airflow around an automobile.

Real-world CFD engineering models are typically very large and are best analyzed with many cores in order to achieve reasonable turnaround on run times. Scalability when running these large models with PowerFLOW is very good, often linear or perfect, up to 64 and even 128 cores.

The PowerFLOW benchmark test suite consists of two test cases. They are two models of the same analysis but of different sizes (different mesh refinement), pertaining to flow over a car body. Both models are rather large and scale very well up to and even beyond 64 cores.

    Case #1 Description: This smaller case has 18.2 million voxels (8.4 million fine-equivalent) and 1.2 million surfels (690 K fine-equivalent).
    Case #2 Description: This larger case has 23.6 million voxels (18.9 million fine-equivalent) and 1.7 million surfels (1.5 million fine-equivalent).

It is important to note that voxels and surfels within different VR regions have different computational costs associated with them. To account for this, fine-equivalent voxels and surfels are a measure of computational load that takes into account the lower cost of processing coarser scales of resolution. For example, a voxel at the second-finest scale is processed only half as often (every other timestep) as a voxel at the finest scale, and thus has half the computational cost.
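
As a rough illustration of the fine-equivalent idea, the sketch below weights each resolution level by how often it is processed. Two things here are assumptions of mine: that the halving continues at every successively coarser level (the text above only states it for the second-finest scale), and the per-level voxel counts, which are made up purely for the example and are not the benchmark's real breakdown.

```python
def fine_equivalent(counts_by_level):
    """Sum element counts weighted by update frequency.

    counts_by_level[0] is the finest scale; level k is assumed to be
    processed once every 2**k timesteps, so it carries 1 / 2**k of the cost.
    """
    return sum(count / (2 ** level) for level, count in enumerate(counts_by_level))

# Hypothetical voxel counts per level, finest first (illustration only).
hypothetical_voxels = [4_000_000, 6_000_000, 8_000_000]

total = sum(hypothetical_voxels)
equivalent = fine_equivalent(hypothetical_voxels)
print(f"{total:,} voxels -> {equivalent:,.0f} fine-equivalent voxels")
```

This is why a case can have many more raw voxels than fine-equivalent voxels: most of the coarse ones are touched only occasionally.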

The two test cases in the suite require from 6 to 8 GB of memory when running with only one core on a single node. This memory requirement per node is reduced when running in DMP cluster mode on multiple nodes.

Performance when running PowerFLOW in a multi-node configuration is significantly enhanced when using high-performance interconnects such as InfiniBand.

Disclosure Statement:

All information on the EXA website is Copyright 1996-2007 by Exa Corporation. PowerFLOW is a registered trademark of Exa Corporation. Results from http://www.exa.com/user_center/index.html as of 07/02/07.

System Configuration

Hardware Configuration:

Sun Blade X6250

    4 2-socket Sun Blade X6250
    2x3GHz DC Intel Xeon EM64T 5160 (Woodcrest)
    Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

Software Configuration:

    Linux 64-bit SUSE SLES 10
    EXA PowerFLOW V3.6c & V4.c
    EXA PowerFLOW Benchmark Test Suite
    Voltaire GridStack 4.1.5-7 for SLES 10


Linpack Benchmark: Sun SPARC Enterprise M8000 Beats IBM POWER6

Friday Jul 13, 2007

The Sun SPARC Enterprise M8000 has topped the performance of the brand new 4.7GHz POWER6 based p570. The Sun Studio 12 Compilers, Solaris 10, and Sun Performance Library played a key role in obtaining this performance.

The Sun SPARC Enterprise M8000 outperforms the best published POWER6-based system, the IBM p570, by over 12% on the Linpack HPC (Highly Parallel Computing) benchmark. As a reminder, IBM cores cost a lot more than any other vendor's, so you can't just look at perf/core. Compare systems of similar pricing and configuration.

The Sun SPARC Enterprise M8000 tops the HP Itanium 2 rx8640 system by 40% on the Linpack HPC benchmark.

The Sun SPARC Enterprise M8000, using Sun Studio 12, delivered a score of 268.6 GFLOPS on the Linpack HPC benchmark.

    Funny, I read an IBM blog that said all was quiet for them in benchmarks. Sun decided to keep working during the summer :), and I almost can't keep up with my regular job because this blogging hobby is keeping me busy: so many of my friends in the benchmarking group are producing so many great results on Sun systems!

LINPACK HPC Performance Chart - GFLOPS (bigger is better)

System                      GFLOPS  GFLOPS   Parallelism  Chips,Cores  Processor   GHz
                            Total   Peak                               Type
Sun SPARC Enterprise M9000  1032.0  1228.8   128          64,128       SPARC64 VI  2.4
Sun SPARC Enterprise M8000  268.6   307.2    32           16,32        SPARC64 VI  2.4
Sun SPARC Enterprise M8000  255.3   291.84   32           16,32        SPARC64 VI  2.28
IBM p570                    239.4   300.8    16           8,16         POWER6      4.7
HP rx8640                   192.4   204.8    32           16,32        Itanium 2   1.6
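
The Peak column and the percentage claims can be reproduced from the chart. In the sketch below the factor of 4 floating-point operations per core per cycle is an assumption I inferred from the chart (it makes cores × GHz × 4 match every Peak value listed); it is not stated on the Linpack site. Efficiency here simply means measured GFLOPS divided by peak GFLOPS.

```python
FLOPS_PER_CYCLE = 4  # assumption: inferred so that cores * GHz * 4 matches the Peak column

# (system, measured GFLOPS, cores, GHz) taken from the chart.
systems = [
    ("Sun SPARC Enterprise M9000", 1032.0, 128, 2.4),
    ("Sun SPARC Enterprise M8000", 268.6, 32, 2.4),
    ("Sun SPARC Enterprise M8000 (2.28)", 255.3, 32, 2.28),
    ("IBM p570", 239.4, 16, 4.7),
    ("HP rx8640", 192.4, 32, 1.6),
]

peaks = {}
efficiency = {}
for name, measured, cores, ghz in systems:
    peaks[name] = cores * ghz * FLOPS_PER_CYCLE
    efficiency[name] = measured / peaks[name]
    print(f"{name:34s} peak {peaks[name]:7.2f}  efficiency {efficiency[name]:.1%}")

# Percentage by which the 2.4GHz M8000 beats the IBM and HP systems.
gain_ibm = (268.6 / 239.4 - 1.0) * 100.0
gain_hp = (268.6 / 192.4 - 1.0) * 100.0
print(f"M8000 vs IBM p570: +{gain_ibm:.1f}%, vs HP rx8640: +{gain_hp:.1f}%")
```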

Benchmark Description

The Linpack benchmark suite measures the performance for factoring and solving a dense set of linear equations in double-precision floating-point.

The Linpack HPC benchmark allows the solution of any size matrix with a single right hand side. It was developed to allow vendors to show off their hardware. Because big problems allow for peak performance potentials, the benchmark is seen as an upper bound of potential performance of a machine. The run rules are much more flexible. The solution technique must use a pivoting scheme and the driver must follow the spirit of the Linpack 1000 or Linpack 100 benchmarks.

Disclosure Statement:

Linpack HPC results from http://www.netlib.org/benchmark/index.html as of 07/13/07. Sun SPARC Enterprise M8000 (SPARC64 VI 2.4GHz, 16 chips, 32 cores) 268.6 GFLOPS. IBM p570 (POWER6 4.7GHz, 8 chips, 16 cores) 239.4 GFLOPS. HP rx8640 (Itanium 2 1.6GHz/24MB, 16 chips, 32 cores) 192.4 GFLOPS. Linpack Benchmark Performance Report

Results Summary

Published Results
Performance: 268.6 GFLOPS
System: Sun SPARC Enterprise M8000, 256GB
Total Number Processors: 16
Processor/GHz of Server: SPARC64 VI, 2.4 GHz
Operating System: Solaris 10
Compiler: Sun Studio 12


World Record EXA PowerFLOW Cluster & Single Node

Thursday Jul 12, 2007

Entry updated; please see http://blogs.sun.com/bmseer/entry/update_world_record_exa_powerflow for the latest.


AT&T picks Sun for next-gen Video

Wednesday Jul 11, 2007

Hot off the wire: AT&T Selects Sun Microsystems Servers for Next-Generation Video Services.

Sun's servers and storage will be deployed in IP-video super hub offices & IP-video hub offices in the AT&T U-verse network.

Read more at:
http://money.cnn.com/news/newsfeeds/articles/prnewswire/AQW09811072007-1.htm
...or at:
http://biz.yahoo.com/prnews/070711/aqw098.html?.v=12


World Record ABAQUS V6.6 on the Sun Blade X6250 Cluster

Wednesday Jul 11, 2007

The Sun Blade X6250, using the Xeon 3GHz DC 5160, posted a World Record on the ABAQUS Explicit benchmark test suite of the MCAE application ABAQUS V6.6. On the various test cases Sun beats the Intel Supermicro by 1% to 39%! The Sun Blade X6250 beats the Intel Supermicro even when you average all of the test cases, by 4% to 9% (geometric mean of all 6 test cases at each of the cpu levels listed).

Both machines have 2 sockets and dual core processors. Runs were made at 1- 2- and 4-cores and a geometric mean was established at each of these "cpu" levels based on the 6 test cases in the benchmark test suite.

The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) processors and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ABAQUS website and for all 6 test cases in the ABAQUS "Explicit" benchmark test suite and at the 3 "cpu" levels (1-, 2- & 4-"cpu's"):

About The ABAQUS Explicit Module

This module, designed for crash and high-velocity impact analyses (including wave propagation and inertia effects), is very scalable, and analysis models tend to be very large, similar to CFD models. Timely results for typically large jobs are best obtained using multiple processing units, either on a single multi-core server in smp mode or on a multi-node cluster of multi-core platforms interconnected in dmp mode.

Consequently this module is meant to run primarily in a multi cpu situation either in smp mode on a single large multi core machine or in dmp mode over a cluster of machines.

ABAQUS V6.6-1 Explicit Benchmark Test Suite Landscape (time in seconds, where smaller is better; Sun % faster, where bigger is better)

Platform                Cores  e1     e2     e3     e4    e5    e6     Geometric Mean

Sun Blade X6250/5160    4      10451  4509   3853   1887  1990  5202
Intel Super/5160's/RH4  4      10696  4646   3881   1997  2126  5460
Sun % Faster                   2%     3%     1%     6%    7%    5%     4%

Sun Blade X6250/5160    2      14232  7401   5477   2935  3327  7582
Intel Super/5160's/RH4  2      14878  8044   6316   3310  3483  8048
Sun % Faster                   5%     9%     15%    13%   5%    6%     9%

Sun Blade X6250/5160    1      24800  14198  10174  5147  6112  9553
Intel Super/5160        1      25076  14616  10563  5225  6272  13242
Sun % Faster                   1%     3%     4%     1%    3%    39%    8%
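
For readers who want to reproduce the "Sun % Faster" rows, here is a minimal sketch for the 4-core results. It takes the ratio of Intel time to Sun time for each test case and then the geometric mean of those ratios. Computing the mean over the ratios is my assumption about how the table's last column was derived; it does reproduce the 4% shown.

```python
from statistics import geometric_mean

# 4-core run times in seconds for cases e1..e6 (smaller is better).
sun_4core = [10451, 4509, 3853, 1887, 1990, 5202]
intel_4core = [10696, 4646, 3881, 1997, 2126, 5460]

# Ratio > 1 means the Intel Supermicro run took longer than the Sun run.
ratios = [intel / sun for sun, intel in zip(sun_4core, intel_4core)]
per_case_pct = [round((r - 1.0) * 100.0) for r in ratios]
mean_pct = (geometric_mean(ratios) - 1.0) * 100.0

print("Sun % faster per case:", per_case_pct)
print(f"Geometric mean: {mean_pct:.1f}%")
```

The geometric mean is the usual choice for averaging ratios like these because no single long-running case (e1 here) dominates the result.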

Abaqus/Explicit Benchmark Problems

The problems described below provide an estimate of the performance that can be expected when running Abaqus/Explicit on different computers. The jobs are representative of typical Abaqus/Explicit applications including high-speed dynamic impact events and quasi-static events with complicated contact conditions. The numbers of increments listed in the tables below are approximate and can vary somewhat depending on the hardware platform and the number of parallel domains.

    E1: Car crash
    This benchmark consists of a passenger car impacting a rigid wall. The car is meshed primarily with shell elements of type S3RS and S4RS with isotropic hardening Mises plasticity material behavior. The various components of the car are connected using multi-point constraints and connector elements. Many of the suspension and drivetrain components are modeled as rigid bodies. The car, road surface, and wall are placed into a single general contact domain and the car is given an initial velocity of 25 mph.

    E1
    Increments: 62,934
    Number of elements: 274,632

    E2: Cell phone drop
    This benchmark consists of a simplified model of a cell phone impacting a fixed rigid floor. The cell phone components are meshed using a variety of element types including C3D8R, C3D10M, and S4R. The material behavior is modeled using linear elasticity, isotropic hardening Mises plasticity, and hyperelasticity. The components are assembled using surface-based mesh ties and placed into a general contact domain that also includes the floor. The initial velocity and orientation of the cell phone are defined such that a severe oblique impact occurs.

    E2
    Increments: 87,369
    Number of elements: 45,785
    Memory requirement: 300 MB

    E3: Sheet forming
    This benchmark consists of forming a sheet metal part by the deep drawing process. The deformable sheet metal blank is meshed with shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. The tools are meshed using surface elements of type SFM3D4R which are declared rigid. General contact is defined between the blank and tools. The analysis sequence consists of two steps. During the first step the blank is clamped between the binder and die and then during the second step the punch is displaced to form the part. Since the process is essentially quasi-static the computations are performed over a sufficiently long time period to render inertial effects negligible. The performance of this analysis is a direct measure of the performance of the three-dimensional general contact algorithm.

    E3
    Increments: 31,177
    Number of elements: 34,540 (deformable only)
    Memory requirement: 550 MB

    E4: Projectile penetration
    This benchmark consists of a projectile penetrating a steel plate at an oblique angle. Both the projectile and plate are meshed using hexahedral elements of type C3D8R and use a rate-dependent isotropic hardening Mises plasticity material model with failure. The projectile and plate are placed into a general contact domain with surface erosion. The edges of the plate are held fixed and the initial velocity of the projectile is specified so that the projectile passes completely through the plate.

    E4
    Increments: 12,433
    Number of elements: 237,100
    Memory requirement: 1400 MB

    E5: Blast loaded plate
    This benchmark consists of a stiffened steel plate subjected to a high intensity blast load. The plate is meshed using shell elements of type S4R and uses an isotropic hardening Mises plasticity material model. There is no contact.

    E5
    Increments: 81,716
    Number of elements: 50,000
    Memory requirement: 150 MB

    E6: Concentric spheres
    This benchmark consists of a large number of concentric spheres with clearance between each sphere. The spheres are meshed using hexahedral elements of type C3D8R and use an isotropic hardening Mises plasticity material model. All of the spheres are placed into a single general contact domain and the outer sphere is violently shaken which results in complex contact interactions between the contained spheres.

    E6
    Increments: 23,291
    Number of elements: 244,124
    Memory requirement: 1000 MB

    ABAQUS "Standard" & "Explicit" Benchmark Test Suites
    Voltaire GridStack 4.1.5-7 for SLES 10

Disclosure Statement:

The following are trademarks or registered trademarks of Abaqus, Inc. or its subsidiaries in the United States and/or other countries: Abaqus, Abaqus/Standard, Abaqus/Explicit. All information on the ABAQUS website is Copyrighted 2004-2007 by Dassault Systèmes. Results from http://www.simulia.com/support/v66/v66_performance.html as of 7/2/07.

System Configuration

Hardware Configuration:

Sun Blade X6250

    4 2-socket Sun Blade X6250's
    2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
    Infiniband (Voltaire) Interconnects (PCI-Express HCA's)

Software Configuration:

    Linux: 64-bit SUSE SLES 10
    ABAQUS V6.6-3


World Record ANSYS on Sun Blade X6250 (Xeon 3GHz DC 5160)

Tuesday Jul 10, 2007

The Sun Blade X6250 outperforms all posted ANSYS V11.0 (MCAE) results at the www.ansys.com website. A single Sun Blade X6250 beats a single Intel S5000 XAL (same 3GHz Xeon 5160) by as much as 40% at the three "cpu" levels tested (1, 2, and all 4 cores available on both 2-socket platforms equipped with dual-core processors). Sun wins at these processor configurations in 6 of the total 7 cases in the benchmark test suite. Overall, on the geometric mean, Sun was 10% higher.

The only case where the Sun X6250 loses, "bm-2", has an exceptionally high I/O component, and even so Sun was only 3-4% slower. The Sun X6250 had 10K rpm internal disk drives, whereas the Intel S5000 XAL had 15K rpm drives.

The Sun Blade X6250 with 3.0GHz Xeon EM64T 5160 (Woodcrest) and under 64-bit Linux SuSE SLES 10 beats all of the following platforms with results posted at the ANSYS website for all 7 test cases in the ANSYS "Standard" benchmark test suite (1-, 2- & 4-cpu).

Yes, this result was run with Linux; Sun wants to show that we can win with every OS. There is now an officially certified, supported, and maintained Solaris build of ANSYS V11.0 for x86-64 platform architectures, compiled with recent Sun Studio 11 compilers. This is the first SX64 version that has become available.

Competitive Landscape

ANSYS V 11.0 "Standard" Benchmark Test Suite on X2200 M2 & Constellation Blades (run times in seconds, smaller is better; for % bigger is better)

System               Cores  bm-1  bm-2  bm-3  bm-4  bm-5  bm-6  bm-7

Sun X6250/5160       4      100   1362  343   164   181   131   752
Intel S5000XAL/5160  4      109   1312  369   169   187   161   1048
Sun % better                9%    -4%   8%    3%    3%    23%   39%

Sun X6250/5160       2      118   1398  385   183   223   169   1064
Intel S5000XAL/5160  2      128   1356  417   186   244   211   1437
Sun % better                9%    -3%   8%    2%    9%    25%   35%

Sun X6250/5160       1      150   1455  456   211   339   253   1770
Intel S5000XAL/5160  1      164   1416  489   215   340   314   2330
Sun % better                9%    -3%   7%    2%    1%    24%   32%

    (Please note: per-core performance isn't the right metric for comparing different CPUs, as system costs vary greatly; core counts are used here only to identify the configuration. It is "SYSTEM" performance, not "core" performance, that matters!)

Key Technical Points

  • The test cases from the ANSYS standard benchmark test suite all have a substantial I/O component, where 15% to 20% of the total run time is associated with I/O activity (primarily scratch files). Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system with high-performance interconnects. When running with the SX64 build, employing ZFS might be a good idea.
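
To get a feel for how much faster storage could help, here is a back-of-the-envelope sketch using Amdahl's law with the 15% to 20% I/O share mentioned above. The 2x I/O speedup is a hypothetical value chosen only for illustration; it is not a measured result for any drive or for ZFS.

```python
def overall_speedup(io_fraction, io_speedup):
    """Amdahl's law: only the I/O share of the run time gets faster."""
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)

HYPOTHETICAL_IO_SPEEDUP = 2.0  # illustration only, not a measurement

for io_fraction in (0.15, 0.20):
    s = overall_speedup(io_fraction, HYPOTHETICAL_IO_SPEEDUP)
    print(f"I/O share {io_fraction:.0%}: whole run about {(s - 1.0) * 100.0:.0f}% faster")
```

Even a large improvement in I/O is capped by the I/O share of the run: with 20% of the time in I/O, infinitely fast disks could make the whole run at most 25% faster.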

ANSYS 11.0 Standard Test Cases

    bm-1
    Name:Exhaust Elbow Manifold
    Description:Static structural analysis. Solved for equivalent stresses.
    Statistics:~850,000 DOF Model

    bm-2
    Name:Floor Panel
    Description:Surface body geometry. Harmonic analysis with mode superposition.
    Statistics:~765,000 DOF Model

    bm-3
    Name:Engine Assembly - Piston and Crank
    Description:Assembly with contact. Nonlinear structural DOF solution.
    Statistics:~250,000 DOF Model

    bm-4
    Name:Electric Motor
    Description:Electromagnetic analysis. Solved for magnetic field intensities.
    Statistics:~250,000 DOF Model

    bm-5
    Name:Brake Rotor
    Description:Thermal transient analysis. Solved for temperature DOFs.
    Statistics:~230,000 DOF Model

    bm-6
    Name:Wing Section
    Description:Static structural analysis.
    Statistics:~250,000 DOF Model

    bm-7
    Name:Wing Section
    Description:Static structural analysis.
    Statistics:~800,000 DOF Model
    Notes:bm-6 and bm-7 are designed to demonstrate the ability of systems to handle larger memory demands and increased I/O. bm-6 should run well on any system. bm-7 will be substantially impaired in performance on a 32-bit machine limited to 2 or 3 Gbytes of memory. The model used for these runs selects Solid95 20-node brick elements. The cost of matrix factorization for these elements is much higher than for the shell-dominated model in bm-1. bm-7 generates a large 12.8 GB file containing the factored matrix. It requires over 1 Gbyte of solver memory to run in optimal out-of-core mode. On PC workstations the solver will run using less than optimal out-of-core memory, requiring excessive I/O during factorization. Comparing bm-6 and bm-7 is a good indication of performance characteristics for systems as larger problems are attempted. These problems will differentiate hardware performance most accurately for users expecting to solve problems approaching 1 million degrees of freedom or more.

Disclosure Statement:

The following are trademarks or registered trademarks of ANSYS, Inc.: ANSYS Multiphysics TM. All information on the ANSYS website is copyrighted 2007 by ANSYS, Inc. Results at http://www.ansys.com/services/hardware-support-db.htm, July 2, 2007.

Hardware Configuration:

Sun Blade X6250

    4 x 2-socket Sun Blade X6250 blades
    2x3.0 GHz DC Intel Xeon EM64T 5160 (Woodcrest) processors
    32 GB memory
Software Configuration:
    64-bit Linux SuSE SLES 10
    (note: Sun works great with Linux, that is why we show all kinds of benchmarks! )
    ANSYS V11.0
    ANSYS 11 "Standard" Benchmark Test Suite


Current SPECjbb2005 Sun Fire V890 2.1GHz

Wednesday Jun 20, 2007

The Sun Fire V890 with 8x 2.1 GHz UltraSPARC IV+ obtained a result of 244846 SPECjbb2005 bops, 30606 SPECjbb2005 bops/JVM on the server-side Java benchmark.

The Sun Fire V890 is 40% faster than the expensive 4-core IBM p570 (4.7 GHz power6). I'll leave it to the reader to check the system prices and see that "per core performance" comparisons don't work at best and are in fact disingenuous, because they make you compare systems with very different costs to you. You'll spend more for IBM.

At the high-end the IBM p570 16-core ($Megabucks) is only 2.8 times faster than the Sun Fire V890 2.1GHz (16-core). Again check the prices of the servers to really understand what you are getting.
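The two ratios quoted above can be re-derived from the published bops figures. The snippet below is only a quick sanity check; the constant names and the `speedup` helper are made up for this illustration, and the bops values are copied from the SPECjbb2005 chart in this post.

```python
# bops figures copied from the SPECjbb2005 chart in this post
V890_2_1GHZ_BOPS = 244_846   # Sun Fire V890 2.1 GHz, 8 chips / 16 cores
P570_4CORE_BOPS = 175_474    # IBM p570 4.7 GHz, 2 chips / 4 cores
P570_16CORE_BOPS = 691_975   # IBM p570 4.7 GHz, 8 chips / 16 cores


def speedup(a_bops: float, b_bops: float) -> float:
    """Return how many times the throughput of system A exceeds system B."""
    return a_bops / b_bops


v890_vs_p570_4core = speedup(V890_2_1GHZ_BOPS, P570_4CORE_BOPS)
p570_16core_vs_v890 = speedup(P570_16CORE_BOPS, V890_2_1GHZ_BOPS)

print(f"V890 vs. 4-core p570:  {v890_vs_p570_4core:.2f}x")
print(f"16-core p570 vs. V890: {p570_16core_vs_v890:.2f}x")
```

The first ratio comes out at about 1.40 (the "40% faster" claim) and the second at about 2.83 (the "2.8 times" claim). Neither number says anything about price, which is the point of the comparison above.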

Postscript: IBM will drone on every time about performance/core or try to equate systems on a per-core basis. The reality is that IBM's cores cost so much more than anyone else's cores. Get very suspicious any time you see IBM saying things like "8-core IBM system vs. 8-core Sun system" or "perf/core IBM wins". Price out one of these comparisons and you WILL BE TRULY AMAZED at the smoke & mirrors they are using. Also note how many times they will mention performance on 16-core and then do a price comparison on 8-core, with lots of memory on the Sun system and a sparse config on the IBM system.

If you want to see a chip to chip comparison see BM Seer's posting of UltraSPARC T1 result. Now that box rocks.

SPECjbb2005 Performance Chart (ordered by performance; bops: SPECjbb2005 Business Operations per Second; bigger is better)

System | Date | Processors (Chips, Cores, Threads) | GHz | Type | SPECjbb2005 bops | JVMs | SPECjbb2005 bops/JVM
IBM p570 power6 | 6/07 | (8, 16, 32) | 4.7 | power6 | 691,975 | 8 | 86,497
Sun Fire V890 | 6/07 | (8, 16, 16) | 2.1 | US-IV+ | 244,846 | 8 | 30,606
IBM p570 power6 | 6/07 | (2, 4, 8) | 4.7 | power6 | 175,474 | 2 | 87,737
Sun Fire V890 | 10/05 | (8, 16, 16) | 1.5 | US-IV+ | 117,986 | 4 | 29,497

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
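As a rough illustration of how the two metrics relate, the sketch below assumes that bops/JVM is simply the total bops divided by the number of JVM instances, rounded half up to a whole number. That rounding rule and the helper name `bops_per_jvm` are assumptions made for this example; the figures are the ones published in the chart in this post.

```python
import math

# (system, total bops, JVM instances, published bops/JVM) from the chart in this post
RESULTS = [
    ("IBM p570 power6, 16 cores", 691_975, 8, 86_497),
    ("Sun Fire V890 2.1 GHz", 244_846, 8, 30_606),
    ("IBM p570 power6, 4 cores", 175_474, 2, 87_737),
    ("Sun Fire V890 1.5 GHz", 117_986, 4, 29_497),
]


def bops_per_jvm(total_bops: int, jvms: int) -> int:
    """Average bops per JVM instance (assumed rounding: half up)."""
    return math.floor(total_bops / jvms + 0.5)


for system, total_bops, jvms, published in RESULTS:
    computed = bops_per_jvm(total_bops, jvms)
    status = "matches" if computed == published else "differs from"
    print(f"{system}: {computed} bops/JVM ({status} published {published})")
```

Under that assumption all four published bops/JVM values are reproduced from the totals.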

Disclosure Statement:

SPECjbb2005 Sun Fire V890 (8 chip, 16 cores 16 threads) 244846 SPECjbb2005 bops, 30606 SPECjbb2005 bops/JVM, Sun Fire V890 (8 chip, 16 cores, 16 threads) 117986 SPECjbb2005 bops, 29497 SPECjbb2005 bops/JVM, IBM System p 570 (4.7 GHz) running AIX 5L V5.3 175,474 SPECjbb2005 bops, 87,737 SPECjbb2005 bops/JVM, 2 chips, 4 cores, 8 threads, IBM System p 570 (4.7 GHz) running AIX 5L V5.3, 691,975 SPECjbb2005 bops, 86,497 SPECjbb2005 bops/JVM, 8 chips, 16 cores, 32 threads, SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 06/20/2007 on www.spec.org.

Results Summary
Results
Sun Fire V890: 244846 SPECjbb2005 bops
30606 SPECjbb2005 bops/JVM
Reference Date: June 20, 2007
Systems: Sun Fire V890, 64 GB
Total Number Processors: 8
Processor/GHz of Server: US-IV+ 2.1 GHz
Operating System: Solaris 10 6/06
JVM: Java HotSpot(TM) 32-Bit Server, Version 1.6.0_02

See Also

SPECjbb2005 Benchmark Reports

IBM Consolidation Press Release


InfoWorld review Sun X4500 server "Thumper"

Wednesday Jun 13, 2007

InfoWorld has published a very positive review of the Sun Fire X4500 server. The combination of the X4500 running Solaris 10 with Sun's ZFS scored a great 8.8 rating with an "Excellent" recommendation.

Paul Venezia, the author and reviewer, started with a description of the X4500 (code name "Thumper"), highlighting its design and unprecedented hard drive capacity. He evaluated the X4500 running Solaris, mentioning that other OSes simply did not have the file system capabilities to take full advantage of the X4500's huge number of drives.

    Paul writes: "ZFS and the X4500 go hand in hand, seemingly created for each other in a love story rivaling anything that’s come out of Hollywood in the past 10 years."

    "Thumper is aptly named and is a truly unique product from a company that seems to be pulling away from a faltering reputation in the server market. Recent studies have shown that within a few short years, the world will generate more data than it can store. It would seem that Sun is doing its part to bridge that gap."

Read all about it at the full link: www.infoworld.com/infoworld/article/07/06/07/23TCthumper_1.html


Sun Blade X6250 & Sun Studio 12 x86 World Record

Wednesday Jun 13, 2007

Sun Blade X6250 Delivers a pair of x86 SPEC CPU2006 integer performance World Records:

Sun Blade X6250 (Dual-Core Intel Xeon 5160) running Solaris 10 and using the Sun Studio 12 compiler delivered the best x86 result for the SPECint2006 benchmark.

Sun Blade X6250 (Dual-Core Intel Xeon 5160) using Solaris 10 and Studio 12 delivered the x86 4-core world record on SPECint_rate2006.

The Sun Blade X6250 server had a SPECint2006 result of 21.0 and a SPECint_rate2006 result of 65.0. The advanced features of the freely available Sun Studio 12 compiler were critical for getting this level of performance on the Sun Blade X6250.

On SPECint2006, the Sun Blade X6250 is only 3% slower than the peak score of the very expensive new IBM POWER6 p570, which was recently announced. SPECint2006 is a single job stream, so let's now turn to comparing 4-thread results: in this case the Sun Blade X6250 is 7% faster than the peak SPECint_rate2006 score of the very expensive new IBM POWER6 p570 (both IBM and Sun at 4 threads). Oh, and remember that clock rate is no longer how you compare systems: the Sun Blade X6250 is at 3 GHz and the IBM POWER6 is at 4.7 GHz. CPU frequency is basically irrelevant; it is CPU and system architecture that matters!
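For readers who want to check the percentages, here is a small sketch using the scores from the tables in this post. The helper name `percent_difference` is made up for this example.

```python
def percent_difference(score: float, reference: float) -> float:
    """Signed difference of score relative to reference, in percent."""
    return (score / reference - 1.0) * 100.0


# SPECint2006: Sun Blade X6250 (21.0) vs. IBM p570 peak (21.6)
speed_gap = percent_difference(21.0, 21.6)

# SPECint_rate2006 at 4 threads: Sun Blade X6250 (65.0) vs. IBM p570 peak (60.9)
rate_gap = percent_difference(65.0, 60.9)

print(f"SPECint2006:      {speed_gap:+.1f}%")
print(f"SPECint_rate2006: {rate_gap:+.1f}%")
```

A negative value means the first system is slower than the reference; the two results round to the "3% slower" and "7% faster" figures quoted above.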

SPEC CPU2006 Landscape - bigger is better, selected recent results

SPECint2006

System | Processor Type | GHz | Chips | Cores | Results (Peak / Base)
IBM p570 (power6) | Power6 | 4.7 | 1 | 1 | 21.6 / 17.8
Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 21.0
Supermicro X7DB8+ board | Intel Xeon 5160 | 3.0 | 2 | 4 | 20.8 / 18.9
Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 16.1

SPECint_rate2006

System | Processor Type | GHz | Chips | Cores | Threads / Copies | Results (Peak / Base)
Sun Blade X6250 | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 65.0
Supermicro X7DB8+ | Intel Xeon 5160 | 3.0 | 2 | 4 | 4 | 64.9 / 60.0
IBM p570 (Power6) | Power6 | 4.7 | 1 | 2 | 4 | 60.9 / 53.2
Sun Ultra 40 M2 | AMD Opteron 2222SE | 3.0 | 2 | 4 | 4 | 60.4
Fujitsu BX620 S3 | Xeon 5160 (Woodcrest) | 3.0 | 2 | 4 | 4 | 59.4 / 56.7

Results as of 06 Jun 2007 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. The first is the CPU (speed) measure, which is the performance on the suite as a single stream. This can be either a single-threaded run or a run automatically parallelized by the compiler. This measure is further defined by base and optimized (peak) runs. Base uses the same compiler flags for all kernels, whereas optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

The second measure is Rate. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results.
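To make the two measures a little more concrete, here is a minimal sketch of how such scores are typically formed: each kernel's run time is turned into a ratio against the SPEC reference time, and the per-kernel ratios are combined with a geometric mean. For rate runs the ratio is scaled by the number of copies. The function names and all timing values below are hypothetical and chosen only to keep the arithmetic easy to follow; they are not real SPEC reference times or results.

```python
import math


def speed_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Ratio for one kernel in a speed (single-stream) run."""
    return reference_seconds / measured_seconds


def rate_ratio(copies: int, reference_seconds: float, elapsed_seconds: float) -> float:
    """Ratio for one kernel in a rate run with several concurrent copies."""
    return copies * reference_seconds / elapsed_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, used to combine per-kernel ratios into one score."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (reference, measured) times in seconds for three made-up kernels
hypothetical_speed_runs = [(9000.0, 450.0), (8000.0, 500.0), (10000.0, 400.0)]
speed_score = geometric_mean([speed_ratio(r, m) for r, m in hypothetical_speed_runs])

# The same made-up kernels run as 4 concurrent copies (hypothetical elapsed times)
hypothetical_rate_runs = [(9000.0, 600.0), (8000.0, 500.0), (10000.0, 625.0)]
rate_score = geometric_mean([rate_ratio(4, r, e) for r, e in hypothetical_rate_runs])

print(f"speed score: {speed_score:.1f}")
print(f"rate score:  {rate_score:.1f}")
```

With these made-up numbers the per-kernel speed ratios are 20, 16 and 25, which combine to a speed score of 20. The geometric mean keeps a single unusually fast kernel from dominating the overall score.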

Disclosure Statement:

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org or from IBM public websites as of 6/06/07. Sun Blade X6250 (Intel Xeon 5160, 2chips/4cores, Solaris 10) 65.0 SPECint_rate2006; Sun Blade X6250 (Intel Xeon 5160, 2chips/4cores, Solaris 10) 21.0 SPECint2006; IBM System p 570 (POWER6, 1chip/1core, AIX 5L v5.3) 21.6 SPECint2006; IBM System p 570 (POWER6, 4 threads, 1chip/2cores, AIX 5L v5.3) 60.9 SPECint_rate2006.

System Configuration

Results
Reference Date: Jun 06, 2007
System: Sun Blade X6250
SPEED: 16GB memory 8x2GB
RATE : 32GB memory 8x4GB
X6250 21.0 SPECint2006
X6250 65.0 SPECint_rate2006
Total Number Processors: 2 x Intel Xeon 5160
Software: Solaris 10 11/06, Sun Studio 12 Compiler, MicroQuill's SmartHeap Library v7.4

See Also

  • All Benchmark results on Sun Blade 6000 Blade Server