BM Seer Facts & Questions from an Anonymous Sun Source

UltraSPARC T2 Plus: Designed for Web, App, & Database

Friday Apr 11, 2008

To see under the covers of the amazing UltraSPARC T2 Plus based systems and their design, check out this great blog: http://blogs.sun.com/deniss/date/20080410. More postings to come on this great product. Remember, it is delivered system performance that is key. A couple of warnings about the results of others:

  • Check the prices for the configs as benchmarked (especially watch out for entry-level pricing, as realistic configs on competitors can cost 2x to 10x more when configured with the fastest processors and full fast memory).
  • Watch out for performance-per-widget metrics. Some things you can see (servers), some things you can't see (cores). Some cores are extremely expensive, and this totally throws off any advantage of per-core performance.
  • Watch for benchmarks that aren't published (I'm still waiting for IBM p570 4-core & 8-core STREAM or LMbench performance).
  • Watch out for 1.x GHz published on one benchmark and 2.x GHz published for performance.


MySQL Information Performance, DB, Consolidation, Virtualization, Watts, & Costs

Tuesday Feb 26, 2008

{update} There is a lot of information about MySQL and Sun at http://www.sun.com/mysql. In addition, I've put together a list of several blogs on MySQL performance.

* a very interesting result that compares the Solaris open-source stack (OS, DB, Web, Virtualization) on a 1-chip UltraSPARC T2 server with a proprietary stack on a 4-chip QC Xeon, and beats it. Actual watts and costs were also measured. Seems real configurations of HP DL580s draw lots of watts:
http://blogs.sun.com/ritu/entry/mysql_benchmark_us_t2_beats

* an ERP result using MySQL with SugarCRM:
http://blogs.sun.com/vanga/entry/scaling_sugarcrm_with_mysql_on

* great information about tuning MySQL on Linux and some performance results:
http://blogs.sun.com/allanp/entry/tuning_mysql_on_linux

* nice writeup on InnoDB on SysBench:
http://blogs.sun.com/realneel/entry/tuning_mysql_innodb_for_sysbench

For a variety of things on MySQL see:
http://blogs.sun.com/barton808/entry/mysql_done_deal_talking_with


MySQL Performance Results, DB, Consolidation, Virtualization, Watts, & Costs

Tuesday Feb 26, 2008

Getting ready to head off for lunch, I took off my blinders and saw all of the MySQL announcements. There are even several blogs on MySQL performance. Already some very interesting things are coming from bringing MySQL into Sun.

* a very interesting result that compares the Solaris open-source stack (OS, DB, Web, Virtualization) on a 1-chip UltraSPARC T2 server with a proprietary stack on a 4-chip QC Xeon, and beats it. Actual watts and costs were also measured. Seems real configurations of HP DL580s draw lots of watts:
http://blogs.sun.com/ritu/entry/mysql_benchmark_us_t2_beats

* an ERP result using MySQL with SugarCRM:
http://blogs.sun.com/vanga/entry/scaling_sugarcrm_with_mysql_on

* great information about tuning MySQL on Linux and some performance results:
http://blogs.sun.com/allanp/entry/tuning_mysql_on_linux

For a variety of things on MySQL see:
http://blogs.sun.com/barton808/entry/mysql_done_deal_talking_with


Siebel CRM 8.0 PSPP UltraSPARC T2 beats POWER6 and sets World Record

Thursday Jan 10, 2008

arrgghhh... I've been asked to show only Sun's results. You must now do your own math with the information posted on Oracle's website: http://www.oracle.com/apps_benchmark/doc/Sun_Siebel8_10000_PSPP_On_Solaris.pdf
http://www.oracle.com/apps_benchmark/doc/IBM_Siebel8_7000_PSPP_On_AIX_POWER6%20Final.pdf

IBM no longer holds the world record and really needs to post a correction on:
http://www-03.ibm.com/systems/p/hardware/benchmarks/erp.html


Four Sun SPARC Enterprise T5120 and T5220 servers (UltraSPARC T2 processors) set a new World Record using Siebel's standard Platform Sizing and Performance Program (PSPP) benchmark suite with Siebel CRM 8.0 Industry Applications and Oracle 10g R2 DB running on Solaris 10.

The Sun results using the UltraSPARC T2 supported 30% higher Siebel benchmark concurrent users compared to other results on the Siebel CRM Applications Release 8.0.

Sun again shows the UltraSPARC T2 servers are ideally suited for Oracle database applications. The database server ran Oracle 10g R2 on this Siebel benchmark.

{ Stuff deleted }

Sun's Solaris and CoolThreads-based servers prove once again to be the best combination for scalability and resource utilization in the datacenter, giving users a consistent response time on critical applications, as shown by the 10,000-user benchmark on Siebel CRM 8.0.

The 10,000-user Siebel benchmark result on 4 Sun SPARC Enterprise T5120/T5220 servers running Solaris 10 shows a scalable and cost-effective platform for deploying Siebel CRM applications with Oracle 10g R2.

The 10,000 active concurrent user Siebel benchmark was run end to end on the new generation of Sun SPARC Enterprise servers using CoolThreads technology, with the highest level of space and energy efficiency.

See Also: http://www.oracle.com/apps_benchmark/html/white-papers-siebel.html

Siebel CRM 8.0 PSPP Performance Chart as of 01/04/2008 (bigger is better)

Vendor: Sun
Users: 10,000
Web Server: 1 x Sun SPARC Enterprise T5120 (4 cores, 1 chip @1.2 GHz US-T2, 8 GB RAM)
    Siebel CRM 8.0 SIA [20204] ENU
    Sun Java System Web Server 6.1 SP8
    Solaris 10 8/07
Application Servers: 1 x Sun SPARC Enterprise T5220 (8 cores, 1 chip @1.4 GHz US-T2, 32 GB RAM)
    1 x Sun SPARC Enterprise T5220 (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM)
    Siebel CRM 8.0 SIA [20204] ENU
    Solaris 10 8/07
Database Server: 1 x Sun SPARC Enterprise T5120 (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM)
    Oracle 10gR2 Database Server v10.2.0.1.0
    Solaris 10 8/07
. . . . .

As noted on the official benchmark report: "Siebel CRM Release 8.0 Industry Application Platform Sizing and Performance benchmarks are based on Siebel CRM Release 8.0 customized industry applications and reflect a heavier scenario mix and more-aggressive think times than earlier version. Results of this benchmark are not comparable with those of prior Siebel CRM Release 7 benchmarks."

Benchmark Description

Siebel CRM 8.0 Platform Sizing and Performance Program (PSPP) is a multi-tier benchmark designed to stress the Siebel CRM Release 8.0 architecture and to demonstrate that large customers can successfully deploy many thousands of concurrent users. Among the Siebel CRM Release 8.0 architecture features exercised are the following:

  • Smart Web Architecture: Takes advantage of the newest Web browser technology to deliver a highly interactive experience. The interaction model, which is similar to Windows-based applications, also improves productivity. Utilization rates on the web server are low, allowing customers to retain existing Web server infrastructure.
  • Smart Network Architecture: Allows Siebel CRM Release 8.0 customers to leverage their existing network infrastructure by compressing and caching user interface components, so that browser/Web server interaction occurs only when the application requests data. This allows customers to avoid expensive network upgrades that can be necessary with competing products.

  • Server Connection Broker: The Siebel Connection Broker (SCBroker) is a server component that provides intraserver load balancing. SCBroker distributes server requests across multiple instances of Application Object Managers (AOMs) running on a Siebel server.
  • Smart Database Connection Pooling and Multiplexing: Allows customers to scale their database without introducing expensive and complex transaction-processing monitors.
  • Server Request Broker: Server Request Broker (SRBroker) processes synchronous server requests - requests that must be run immediately, and for which the calling process waits for completion.
  • Enterprise Application Integration: Allows customers to integrate their existing systems with Siebel CRM applications.
  • eScript: eScript is a scripting or programming language that application developers use to write simple scripts to extend Siebel applications. JavaScript, a popular scripting language used primarily on Web sites, is its core language.

The test simulated real-world requirements of a large organization, consisting of 10,000 concurrent, active users from multiple departments accessing a call center. Test conditions simulated service representatives running Siebel Financial Services Call Center and partner organizations running Siebel Partner Relationship Management (Web sales and Web service). Siebel Workflow and the Siebel Scripting Engine were used to incorporate business-process-management customizations. The application also simulated integration with Web systems, using the Siebel Enterprise Application Integration component and Siebel Web Services.

Disclosure Statement:

Siebel CRM 8.0 Platform Sizing and Performance Program (PSPP) benchmark as of 01/04/08. Sun Microsystems: 10,000 users, 1 x Sun SPARC Enterprise T5120 web server (4 cores, 1 chip @1.2 GHz US-T2, 8 GB RAM), Siebel CRM 8.0 SIA [20204] ENU, Sun Java System Web Server 6.1 SP8, Solaris 10 8/07, 1 x Sun SPARC Enterprise T5220 application server (8 cores, 1 chip @1.4 GHz US-T2, 32 GB RAM), 1 x Sun SPARC Enterprise T5220 application server (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM), Siebel CRM 8.0 SIA [20204] ENU, Solaris 10 8/07, 1 x Sun SPARC Enterprise T5120 database server (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM), Oracle 10gR2 Database Server v10.2.0.1.0, Solaris 10 8/07. Oracle and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. More info: www.oracle.com/apps_benchmark/html/white-papers-siebel.html

Power Reference:

Sun measured: Database Server (1.2 GHz T5120, 8 core, 32G memory): 291W, Gateway/Application Server #1 (1.4 GHz T5220, 8 core, 32G memory): 323W, Application Server #2 (1.2 GHz T5220, 8 core, 32G memory): 376W, Web Server (1.2 GHz T5120, 4 core, 8G memory): 212W.
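As a quick check on the measured figures above, the four readings can be totalled and related to the 10,000-user result. This is only a back-of-the-envelope sketch: the dictionary keys are labels chosen here, and users-per-watt is an illustrative derived figure, not a metric defined by the PSPP benchmark.

```python
# Measured power per server, in watts, as listed in the Power Reference.
measured_watts = {
    "database (1.2 GHz T5120)": 291,
    "gateway/app #1 (1.4 GHz T5220)": 323,
    "app #2 (1.2 GHz T5220)": 376,
    "web (1.2 GHz T5120)": 212,
}
USERS = 10_000  # concurrent users in the published result

total_watts = sum(measured_watts.values())
users_per_watt = USERS / total_watts  # illustrative derived figure

print(f"total: {total_watts} W")
print(f"users per watt: {users_per_watt:.1f}")
```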

IBM power calculation based on the following: The p570 is supplied in building blocks, called CECs, with 2 chips and 4 cores per chassis. Up to 4 CECs can be connected together to create a single 8-chip, 16-core SMP system. Each CEC is 4 RU, and each CEC is estimated to consume 1,040 watts when configured with 2 processors, based on 80% of the maximum reported power consumption in the IBM p6 570 power specifications of 06/07/07, posted at ftp://ftp.software.ibm.com/common/ssi/rep_sp/n/PSB01628USEN/PSB01628USEN.PDF

System Configuration

Certified Results 10,000 Users
Reference Date: January 4, 2008
Systems: 1 x Sun SPARC Enterprise T5120, web server (one 1.2GHz UltraSPARC T2)
1 x Sun SPARC Enterprise T5220, gateway/application server (one 1.4GHz UltraSPARC T2)
1 x Sun SPARC Enterprise T5220, application server (one 1.2GHz UltraSPARC T2)
1 x Sun SPARC Enterprise T5120, database server (one 1.2GHz UltraSPARC T2)
Operating System: Solaris 10 8/07
Software: Sun Java System Web Server 6.1 SP8
Siebel CRM 8.0 SIA [20204] ENU
Oracle 10gR2 Database Server v10.2.0.1.0


UltraSparc T2 and Tigerton Tests

Monday Jan 07, 2008

You may have missed this writeup about UltraSparc T2 and Tigerton Tests which looked at low-level memory access measurements: http://blogs.sun.com/psa/entry/ultrasparc_t2_sun

A quote from a Sun employee I like... "You can only compute as fast as you can move data"


gcc, SPEC CPU2006, & Sun SPARC Enterprise T5220

Friday Nov 02, 2007

Sun has released benchmark results on SPEC CPU with GCCfss. GCCfss is a GCC-compatible frontend with a Sun Studio backend. If you have code developed with GCC, you can now just use it to run really fast on UltraSPARC T2, with all kinds of great optimizations.

For more on GCCfss see: http://cooltools.sunsource.net/gcc/

The Sun SPARC Enterprise T5220 server, running at 1.4 GHz, delivered a result of 78.0 SPECint_rate2006, which is slightly lower (1%) than the result with the full Sun Studio 12 compiler.

The Sun SPARC Enterprise T5220 using the GCC for SPARC Systems (gccfss) compiler topped all competitors' single-chip results, including the 4.7 GHz POWER6 result from IBM, which used a proprietary compiler, by over 28%.

The gccfss compiler allows one to use the optimal Sun SPARC optimization tools along with the popular gcc coding conventions and deliver performance that has not been possible before without time-consuming code changes.

SPEC CPU2006 Performance Charts: bigger is better, selected recent results

SPECint_rate2006

System          Processor      GHz   Chips  Cores  Threads  Peak  Base
T5120/T5220     UltraSPARC T2  1.4   1      8      64       78.5  73.0
T5220 (gccfss)  UltraSPARC T2  1.4   1      8      64       78.0  71.6
HP DL360 G5     Intel X5365    3.0   1      4      4        61.3  53.8
IBM p 570       Power6         4.7   1      2      4        60.9  53.2
Fujitsu RX300   Intel X5355    2.66  1      4      4        52.8  50.5

Results as of 30 Oct 2007 from www.spec.org.

Benchmark Description

SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

Each suite has two different measures. First is the CPU measure, which is the performance on the suite as a single stream. This can be either a single thread or automatic compiled parallel run. This measure is further defined by base and optimized runs. Base uses the same compiler flags for all kernels, where optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

The second measure is Rate. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results.
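To make the Rate measure concrete, here is a rough sketch of how a rate-style score can be put together: each benchmark's rate is the number of copies times the ratio of a reference time to the measured time, and the overall score is the geometric mean across benchmarks. The benchmark names are real CINT2006 components, but the timings and the helper names are made up for illustration; the SPEC run rules at www.spec.org are the authority on the exact calculation.

```python
from math import prod

def benchmark_rate(copies, ref_seconds, run_seconds):
    """Rate for one benchmark: copies * (reference time / measured time)."""
    return copies * ref_seconds / run_seconds

def overall_rate(rates):
    """Overall score: geometric mean of the per-benchmark rates."""
    return prod(rates) ** (1.0 / len(rates))

# Hypothetical timings in seconds for 64 concurrent copies; not real measurements.
copies = 64
timings = {
    "400.perlbench": (1000.0, 800.0),   # (reference, measured)
    "401.bzip2": (1000.0, 1000.0),
    "403.gcc": (1000.0, 1250.0),
}

rates = [benchmark_rate(copies, ref, run) for ref, run in timings.values()]
print(f"overall rate: {overall_rate(rates):.1f}")
```

The geometric mean keeps one unusually fast or slow component from dominating the overall score, which is why a single number can stand in for the whole job mix.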

Disclosure Statement:

SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 10/30/07. Sun SPARC Enterprise T5220 gccfss (UltraSPARC T2, 1 chip, 8 cores), 78.0 SPECint_rate2006; IBM p570 (POWER6, 1 chip, 2 cores), 60.9 SPECint_rate2006; HP DL360 G5 (Intel X5365 1chip 4-core), 61.3 SPECint_rate2006; Fujitsu RX300 (Intel X5355, 1-chip, 4-core) 52.8 SPECint_rate2006; Sun SPARC Enterprise T5220 (UltraSPARC T2, 1 chip, 8 cores), 78.5 SPECint_rate2006.

Results Summary

Results
Reference Date: Oct 30, 2007
System: Sun SPARC Enterprise T5220
Processor: Sun UltraSPARC T2, 1.4 GHz
  78.0 SPECint_rate2006
Software: Solaris 10, Sun Studio 12 Compiler gccfss


careful reading shows a lot

Wednesday Oct 24, 2007

You have to read some things carefully

    "...And the good news is that about 40-70% of the stuff we do in performance tuning actually ends up helping end users," -- Bruce Lindsay(IBM Fellow), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf

    "This is feasible in the TPC-C benchmark because there are only five tables and only ten to fifteen columns in each table. In a more realistic application, where there are many more queries to be considered, the tables are typically much, much wider, in the 80 to 100 column range; and there are dozens if not thousands of tables. Then this kind of analysis(ed note: tuning) is no longer practical." -- Bruce Lindsay(IBM Fellow since '96), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf

    "The idea is to get the numbers by hook and by crook." -- Bruce Lindsay(IBM Fellow since '96), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf

    The TPC-C benchmark is an industry standard for measuring the ability of a system to process complex online transactions and large volumes of business data. The TPC-C benchmark is unique in the way it exercises all components of a system, including processors, memory, networking, storage, operating system and database software, demonstrating total system performance in a way that many of the other benchmarks touted by some competitors do not. -- Bruce Lindsay(IBM Fellow since '96), July 25, 2006, http://www-03.ibm.com/solutions/sap/doc/content/news/pressrelease/1623288130.html

Issues:
  • This means that 30% to 60% of IBM's TPC-C tuning is useless for customers.
  • IBM clearly over-hyped TPC-C, just 2-3 months after they publicly showed all of its problems and "optimizations" they used.

    Next:

      "Significantly, the high utilization rate of the System z9 mainframes -- systems can and do operate at 80 to 100 percent utilization -- combined with its ability to "virtualize" workloads, can enable a single mainframe processor to perform far more work than a single x86 processor running Microsoft Windows. The latter may run as low as 5 percent utilization." - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/19577.wss
    Issues:
  • Used different work for the mainframe and for its competitor.
  • "do" and "may" mean very different things.
  • "mainframes do operate at 80-100%", "x86 processor running Microsoft Windows. The latter may run as low as 5%". So it is a valid but totally useless statement.
  • An equally valid statement: x86 systems do operate at 80-100% and mainframes may run as low as 5%.

    Next:

      "First of all, the math is really simple. 4.7 is greater than 1.4. IBM's POWER6 4.7 GHz chip is faster than Sun's 1.4 GHz UltraSPARC T1 chip. And second of all, the IBM System p 570 remains the #1 SPECjbb2005 2-core result (1)." - Marketing Program manager of IBM performance blog, Jun07
    Issues:
  • Did not compare system or chip performance but only quoted the GHz of a chip?
  • Made a true statement about core count but ignored that IBM cores cost much more than Sun UltraSPARC T1 and/or UltraSPARC T2 cores on a per-core basis. I know this is hard to verify since IBM isn't public about pricing, so you'll have to ask your IBM people to price specific configurations for you. Be specific so you understand exactly what is priced.

    Next:

      "Even more impressive, the processor bandwidth of the POWER6 chip – 300 gigabytes per second -- could download the entire iTunes catalog in about 60 seconds" - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/21580.wss
    Issues:
  • Added up every bandwidth (L3 cache, address bandwidth?!?,...) in a chip, even though peak memory bandwidth is at most a 10th of that, and delivered bandwidth is a lot less.
  • Stated "processor bandwidth", even though "delivered" system bandwidth would actually be required to move the data (not address :) ).

    Next:

      "IBM calculates that 30 SunFire v890s can be consolidated into a single rack of the new IBM machine, saving more than $100,000 per year on energy costs (3)." - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/21580.wss
    Issues:
  • Used a 2-year-old Sun result compared to a POWER6 system yet to be shipped as of the May press release.
  • Said V890, so that people think it is a current comparison; you had to read the footnotes to see that it was the slower 1.5 GHz CPU. Sun has introduced 1.8 GHz and 2.1 GHz since.
  • Made a "conservative" comparison by giving IBM another 15% in performance.
  • Claimed Sun at 20% utilisation and IBM at 60% utilisation; that is one way to get 3x over your competition :)
  • Never showed exactly what power was drawn by a 4.7 GHz, 64 GB memory system, at ??MHz DDR2 used in the comparison, etc.
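The utilisation point is easy to see with a little arithmetic. The sketch below is illustrative only: it assumes the claimed consolidation count scales linearly with the ratio of the two assumed utilisation levels, which is a simplification made here and not IBM's published method, and the function name is invented for this example.

```python
def equal_utilization_count(claimed_servers, competitor_util, own_util):
    """Rescale a consolidation claim as if both sides ran at the same utilization.

    Assumes, for illustration only, that the claimed count scales linearly
    with the ratio of the two assumed utilization levels.
    """
    utilization_factor = own_util / competitor_util
    return claimed_servers / utilization_factor

# 30 servers claimed, with 20% vs 60% utilization assumed in the comparison.
rescaled = equal_utilization_count(30, 0.20, 0.60)
print(f"{rescaled:.0f} servers at equal utilization")
```

Under that assumption, the 3x utilisation gap alone accounts for two thirds of the headline number.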

    This was a bit of a repeat, but some things should not be forgotten.

    I've never been about popularity or names. You don't need my expertise to see funny things in IBM's statements. Don't attack me, attack the facts. Anonymously yours, Sun's BM Seer.

    Disclosure statement:

    TPC-C is a trademark of Transaction Processing Performance Council (TPC). More info www.tpc.org.

    UltraSPARC T2 - not being selective at all

    Wednesday Oct 10, 2007

    Take a look at the last dozen posts, lots of world records for the UltraSPARC T2. All done with a single chip that beats many 2-socket and even 4-socket X64 systems. That is pretty amazing!

    There are people attacking it, and many of the attacks are weak or have their facts wrong. Linus Torvalds suggests that Sun was selective on benchmarks. What??? Sun compared against every system that x64 vendors submitted on every tier of the datacenter. Linus points to one case where a 2-socket quad-core was faster than the US T2, but the BIOS had to be changed from defaults to get a better result. I guess you can change the BIOS from Linux... :) (oh, wait, yeah, that X5365 result was on Windows?).

    Bottom line: the UltraSPARC T2 is very innovative, low power, 64-threads and leading the industry. Boots Solaris, Boots Ubuntu, Open-source hardware!,...


    SPEC CPU2006 UltraSPARC T2 exactly real just like we said

    Tuesday Oct 09, 2007

    Today, Sun submitted the SPECint_rate2006 and SPECfp_rate2006 Single-Chip World Records on the Sun SPARC Enterprise T5120/T5220. What are these servers? UltraSPARC T2 1.4GHz servers that you will hear loads more on today.

    The Sun SPARC Enterprise T5120 is the 1RU version, and the Sun SPARC Enterprise T5220 is the 2RU version. Both servers are electronically equivalent, with the 2RU having a bit more connectivity and storage if you need it.

    The Sun SPARC Enterprise T5220 server, running at 1.4 GHz, beat all single-chip results running SPECint_rate2006 with a result of 78.5.

    The Sun SPARC Enterprise T5220 server beat the best single IBM 4.7 GHz dual-core POWER6 processor result by 29% and beat the best published single 3 GHz quad-core Xeon result by 28% on SPECint_rate2006. There are no single quad-core Opteron results published for SPECint_rate2006.

    "But I've heard there is no floating point on Niagara processors" :) Nay, the 1.4 GHz UltraSPARC T2 in the Sun SPARC Enterprise T5220 server beat all single-chip results running SPECfp_rate2006 with a result of 62.3.

    The Sun SPARC Enterprise T5220 server beat the best single IBM 4.7 GHz POWER6 processor based system result by 7% and beat the best published single 3 GHz quad-core Intel Xeon result by 61% on SPECfp_rate2006.

    There are no single quad-core Opteron results published for SPECfp_rate2006.

    SPEC CPU2006 Performance Charts - bigger is better, selected recent results, please see www.spec.org for complete results.

    SPECint_rate2006

    System         Processor      GHz   Chips  Cores  Threads  Peak  Base
    T5120/T5220    UltraSPARC T2  1.4   1      8      64       78.5  73.0
    HP DL380 G5    Intel X5365    3.0   1      4      4        61.3  53.8
    IBM p 570      Power6         4.7   1      2      4        60.9  53.2
    Fujitsu RX300  Intel X5355    2.66  1      4      4        52.8  50.5

    SPECfp_rate2006

    System         Processor      GHz   Chips  Cores  Threads  Peak  Base
    T5120/T5220    UltraSPARC T2  1.4   1      8      64       62.3  57.9
    IBM p 570      Power6         4.7   1      2      4        58.0  51.5
    HP DL380 G5    Intel X5365    3.0   1      4      4        38.8  36.4
    Fujitsu RX300  Intel X5355    2.66  1      4      4        37.5  36.2

    Results as of 27 Sep 2007 from www.spec.org.
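For anyone who wants to check the quoted margins, the percentages follow directly from the peak scores in the two tables. A small sketch, with the dictionary labels and the helper name chosen here for illustration:

```python
# Peak results copied from the two tables above (single-chip systems).
specint_rate = {"UltraSPARC T2": 78.5, "Xeon X5365": 61.3, "POWER6": 60.9, "Xeon X5355": 52.8}
specfp_rate = {"UltraSPARC T2": 62.3, "POWER6": 58.0, "Xeon X5365": 38.8, "Xeon X5355": 37.5}

def lead_percent(results, leader, other):
    """Percentage by which the leader's score exceeds the other's score."""
    return (results[leader] / results[other] - 1.0) * 100.0

for name, table in (("SPECint_rate2006", specint_rate), ("SPECfp_rate2006", specfp_rate)):
    for other in table:
        if other != "UltraSPARC T2":
            lead = lead_percent(table, "UltraSPARC T2", other)
            print(f"{name}: T2 leads {other} by {lead:.0f}%")
```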

    Benchmark Description

    SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and CINT2006. CFP2006 targets floating-point performance, while CINT2006 targets integer performance.

    Each suite has two different measures. First is the CPU measure, which is the performance on the suite as a single stream. This can be either a single thread or automatic compiled parallel run. This measure is further defined by base and optimized runs. Base uses the same compiler flags for all kernels, where optimized is allowed to use different compiler flags for each kernel. Results are compared against a baseline system run that was standardized by SPEC.

    The second measure is Rate. It is a measure of how many CPU measures can be run at a time. Typically, it is run as n processes on n processors. It shows how well the same job mix can run on a system under some load. It also is run as a base and optimized set of results.

    Disclosure Statement:

    SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 9/27/07. Sun SPARC Enterprise T5220/T5120 (UltraSPARC T2, 1 chip, 8 cores), 78.5 SPECint_rate2006, IBM p570 (POWER6, 1 chip, 2 cores), 60.9 SPECint_rate2006, HP DL380 G5 (X5365, 1 chip, 4 cores), 61.3 SPECint_rate2006, Sun SPARC Enterprise T5220 (UltraSPARC T2, 1 chip, 8 cores), 62.3 SPECfp_rate2006.

    SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from www.spec.org as of 9/27/07. Sun SPARC Enterprise T5220/T5120 (UltraSPARC T2, 1 chip, 8 cores), 62.3 SPECfp_rate2006. IBM p570 (POWER6, 1 chip, 2 cores), 58.0 SPECfp_rate2006, Sun SPARC Enterprise T5220 (UltraSPARC T2, 1 chip, 8 cores), 62.3 SPECfp_rate2006. HP DL380 G5 (X5365, 1 chip, 4 cores), 38.8 SPECfp_rate2006.

    System Configuration

    Results
    Reference Date: Oct 09, 2007
    System: Sun SPARC Enterprise T5120/T5220
    Processor: Sun UltraSPARC T2, 1.4 GHz
      78.5 SPECint_rate2006
      62.3 SPECfp_rate2006
    Software: Solaris 10, Sun Studio 12 Compiler


    Ultra-FAST Cryptography on the Sun UltraSPARC T2

    Tuesday Oct 09, 2007

    The UltraSPARC T2 processor has very low-overhead cryptography that basically allows one to add security at 'zero-cost'. A single Sun UltraSPARC T2 processor achieves up to 37,000 RSA 1024-bit signs/s and up to 38.9 Gbit/s of AES-128 throughput.

    The comparisons below demonstrate the performance of a single 1.4 GHz UltraSPARC T2 on RSA1024 (sign private key) and AES128-CBC operations.

    • The UltraSPARC T2 delivers over 4.1 times greater RSA1024 performance and 4.6 times greater AES128 performance than the 2-way quad-core 3 GHz Xeon.
    • The UltraSPARC T2 delivers over 9.3 times greater RSA1024 performance and 10 times greater AES128 performance than the 2-way dual-core 2.6 GHz Opteron.
    • The UltraSPARC T2 also delivers over 3 times greater RSA1024 performance and 15.6 times greater AES128 performance than a system using the Cavium Nitrox PX crypto accelerator card.
    • The UltraSPARC T2 delivers over 30.8 times greater RSA1024 performance than the 2-way IBM p510 1.5 GHz Power5.

    To achieve these great results, the UltraSPARC T2 processor has an on-chip cryptographic accelerator (SPU) that consists of a cipher/hash unit and an enhanced modular arithmetic unit (MAU). This is an evolution of the previous generation UltraSPARC T1, which only contained modular arithmetic units.

    Sun's UltraSPARC T2 processor introduces support for common bulk ciphers, secure hash operations and both prime and binary field Elliptic Curve Cryptography. The UltraSPARC T2 processor supports RC4, DES, 3DES, AES-128, AES-192, AES-256, MD5, SHA-1, SHA-256.

    Competitive Landscape

    RSA/AES Cryptography Benchmark Performance as of 8/07/07 as measured by Sun on the following platforms.

    System                      Processor         GHz  Chips  Total cores  OS              RSA1024 (K signs/s)  AES128 (Gb/s)  Notes
    Sun SPARC Enterprise T5220  UltraSPARC T2     1.4  1      8            Solaris 10      37.0                 38.9           actual
    Accelerator card            Sun SCA6000       -    -      -            -               13.0                 1.0            actual
    Sun Fire T2000              UltraSPARC T1     1.2  1      8            Solaris 10      12.9                 -              actual
    Accelerator card            Cavium Nitrox PX  -    -      -            -               12.0                 2.5            datasheet
    Sun Fire T1000              UltraSPARC T1     1    1      8            Solaris 10      10.8                 -              actual
    -                           quad-core Xeon    3    2      8            -               9.0                  8.4            actual
    Sun Fire V490*              US IV+            1.5  4      8            Solaris 10      8.0                  -              actual
    IBM p690                    Power4            1.3  16     32           AIX 5.1         6.1                  -              actual
    Fujitsu PP850               SPARC64 V         1.9  16     16           Solaris 10      6.0                  -              actual
    -                           Opteron           2.6  2      4            -               4.0                  3.9            actual
    Sun Fire V40z               Opteron sc        2.6  4      4            Solaris 10      3.3                  -              actual
    Dell PE 1850                Xeon              3.6  2      2            Linux RHEL4 U1  1.9                  -              actual
    Dell PE 2850                Xeon              3.6  2      2            Linux SLES 9    1.9                  -              actual
    IBM p510                    Power5            1.5  1      2            AIX 5.3         1.2                  -              actual

    * Used a Sun Crypto Accelerator (SCA) 4000 in the Sun Fire V490 testing.
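The "times greater" multiples quoted earlier can be reproduced from the table. The sketch below uses the rounded table values, so treat the output as approximate; the system labels and the helper name are chosen here for illustration.

```python
# (RSA1024 K signs/s, AES128 Gb/s) as listed in the table; None = not reported.
t2 = (37.0, 38.9)
others = {
    "2-chip quad-core Xeon 3 GHz": (9.0, 8.4),
    "2-chip dual-core Opteron 2.6 GHz": (4.0, 3.9),
    "Cavium Nitrox PX (datasheet)": (12.0, 2.5),
    "IBM p510 Power5 1.5 GHz": (1.2, None),
}

def speedup(ours, theirs):
    """How many times faster 'ours' is, or None when 'theirs' is missing."""
    return None if theirs is None else ours / theirs

for system, (rsa, aes) in others.items():
    rsa_x = speedup(t2[0], rsa)
    aes_x = speedup(t2[1], aes)
    aes_text = "n/a" if aes_x is None else f"{aes_x:.2f}x"
    print(f"{system}: RSA {rsa_x:.2f}x, AES {aes_text}")
```

Because the table values are rounded, a ratio can differ slightly from the quoted multiple; the Opteron RSA ratio, for instance, works out to about 9.25 from these inputs.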

    Benchmark Description

    The RSA/AES-128 Cryptography benchmark was developed by Sun to measure maximum throughput of RSA private key (sign) operations and AES-128 operations that a system can perform. On multi-chip and/or multi-core systems, multiple processes are used to achieve the maximum throughput. Two microbenchmark programs are used, pk11rsaperf/pk11aesperf on Solaris and OpenSSL speed test on non-Solaris systems. Though each microbenchmark uses different crypto APIs, they both measure the raw throughput of the same crypto operations.

    • pk11rsaperf & pk11aesperf are part of a set of cryptographic microbenchmark programs internally developed by the Crypto Product Group of NSN. pk11aesperf measures the performance of AES-128-CBC processing, as performed by the Solaris Cryptographic Framework via the PKCS#11 API. Different key sizes, data sizes and varying numbers of concurrent threads can be tested. The metric is aggregate operations per second for pk11rsaperf and Gb/s for pk11aesperf (for large object sizes).

    • OpenSSL speed test, the standard microbenchmark included in the open-source OpenSSL package, measures raw cryptographic algorithm performance as implemented in the OpenSSL library - libcrypto.so via its own proprietary crypto APIs. For RSA the metric is operations per second, while for AES-128-CBC, the metric is Gb/s.

    Disclosure Statement:

    RSA/AES Cryptography Benchmark Performance as of 08/07/07 as measured by Sun on the following platforms: Sun SPARC Enterprise T5220 37K RSA1024 signs/s, 38.9 AES128 Gb/s; Sun SCA6000 (actual) 13K RSA1024 signs/s, 1 AES128 Gb/s; Cavium Nitrox PX (datasheet) 12K RSA1024 signs/s, 2.5 AES128 Gb/s; 2-chip quad-core Xeon 3GHz 9K RSA1024 signs/s, 8.4 AES128 Gb/s; 2-chip dual-core Opteron 2.6GHz 4K RSA1024 signs/s, 3.9 AES128 Gb/s; Sun Fire T2000 1.2 GHz (8 cores, 1 chip) Solaris 10, 12,850 RSA1024 signs/s; Sun Fire T1000 1GHz (8 cores, 1 chip) Solaris 10, 10,764 RSA1024 signs/s; IBM p690 1.3 GHz (32 cores, 16 chips) AIX 5.1, 6,131 RSA1024 signs/s; Fujitsu PRIMEPOWER850 1.9 GHz (16 cores, 16 chips) Solaris 10, 6,038 RSA1024 signs/s; Dell PowerEdge 1850 3.6 GHz (2 cores, 2 chips) RHEL4 U1, 1,926 RSA1024 signs/s; Dell PowerEdge 2850 3.6 GHz (2 cores, 2 chips) SLES 9, 1,900 RSA1024 signs/s; IBM p5 510 1.5 GHz (2 cores, 1 chip, SMT) AIX 5.3, 1,200 RSA1024 signs/s.

    Results Summary

    Results
    Reference Date: August 7, 2007
    Systems: Sun SPARC Enterprise T5120/T5220
    Total Number Processors: 1 chip / 8 cores/chip (8 threads/core)
    Processor/GHz of Server: Sun UltraSPARC T2 1.4 GHz
      37.0 K RSA1024 signs/s
      38.9 Gb/s AES128
    Operating System: Solaris 10


    UltraSPARC T2 and its NIU that'll be good for you

    Tuesday Oct 09, 2007

    This summer we announced the UltraSPARC T2 chip, but one of the things we didn't talk about much was the US T2's NIU. So let's look at some of the delivered results.

    By the by, you'll see a lot more on performance results on this blog today. Yep, it's launch day. While many of my colleagues are at CEC bellying up to the buffets and dropping their money at the tables, some of us are at home working to show you the latest :)

    The UltraSPARC T2 has an integrated NIU (10GbE Network Interface Unit, the 10GbE is silent :) ), which provides better performance and reduces the CPU overhead of network traffic when compared to servers that must use NICs (network interface cards). The UltraSPARC T2's NIU has much lower latency, which reduces CPU overhead.

    • 10GbE transmit, maximum throughput is 36% higher performance and CPU efficiency is 23% better
    • 10GbE receive, maximum throughput is almost twice the performance, exceeding x8 bus bandwidth by 16%
    UltraSPARC T2 with NIU has the following measured results: TX 14.6 Gb/s; RX 18.2 Gb/s. In contrast, the Atlas NIC has the following measured results: TX 10.7 Gb/s; RX 9.4 Gb/s.
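    The "36% higher" and "almost twice" figures follow directly from the measured throughputs. A minimal sketch of the arithmetic, using only the four Gb/s values quoted above (the variable and function names are made up for illustration):

```python
# Measured throughput in Gb/s, as quoted in the post.
NIU_GBPS = {"tx": 14.6, "rx": 18.2}   # UltraSPARC T2 integrated NIU
NIC_GBPS = {"tx": 10.7, "rx": 9.4}    # Atlas NIC


def gain_percent(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


tx_gain = gain_percent(NIU_GBPS["tx"], NIC_GBPS["tx"])
rx_ratio = NIU_GBPS["rx"] / NIC_GBPS["rx"]

print(f"TX: NIU is {tx_gain:.0f}% faster than the NIC")
print(f"RX: NIU delivers {rx_ratio:.2f}x the NIC throughput")
```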

    All performance tests were run by Sun and of course used Solaris 10.

    ... but what about standard benchmarks? My advice is either get this blog in your RSS or check back every hour as "happy days are here again".

    [7] Comments

    David Patterson(UC Berkeley) on UltraSPARC T2

    Thursday Sep 20, 2007

    In a video, Prof. David Patterson opines on UltraSPARC T2 and how Sun's CMT has some very fresh ideas to move the industry forward on practical computing. He talks about the old-fashioned and outdated concepts of "peak" or "clock speed" and the need to look at delivered performance. Hear, hear!!!

    He shows that the UltraSPARC T2 out of the box is almost 1.5x to 2x faster than Clovertown (quad-core) & Opteron and has a three to four times performance-per-watt advantage. In addition, he says the UltraSPARC T2 is the easiest to program and auto-tune.

    He did concede that if you look at the archaic (he used the word "old-fashioned") 20th century metrics of peak and clock, the UltraSPARC T2 is 2x to 7x slower -- but he (like me) focuses on delivered performance.

    David Patterson is a Professor of Computer Science at the University of California, Berkeley. David and John Hennessy (Stanford University) wrote the textbook "Computer Architecture: A Quantitative Approach, Fourth Edition".

    AFTERNOTE #1

      To respond to the comment below (comments are now closed): I'm sure the professor will give us more details and a comparison of floating-point performance on important applications between the UltraSPARC T2 and the various x64 architectures. He's very complete and thoughtful.

      In terms of other comparisons, there are CPU benchmark (int & fp) comparisons that were done at the UltraSPARC T2 launch, where it was the best chip in several comparisons. There will probably be even more results before long on commercial benchmarks.

    AFTERNOTE #2

    [1] Comments

    Sun UltraSPARC T2 & IBM Power6 comparison blogged about

    Wednesday Aug 29, 2007

    More preliminary UltraSPARC T2 performance is blogged about at: http://blogs.sun.com/jmeyer/entry/power6_goes_thud_part_v

    Where John states:

      And IBM knows that next quarter, Sun will be introducing systems based on the new UltraSPARC T2, the world's first true system-on-a-chip and the world's fastest microprocessor. Preliminary estimates on one popular benchmark show that a single rack of UltraSPARC T2-based systems will outperform four racks of 4.7GHz POWER6-based p5 570s (more on that as we get closer to system announcement). No kidding.
    I haven't seen this internal info yet, but I'll try to dig it up. Looking at other tests, I believe this one.

    ...John also talks more about the lagging IBM POWER6 rollout.

    [2] Comments

    Oracle & UltraSPARC T1 - Commercial databases and CMT are no problem

    Thursday Aug 23, 2007

    In the last posting we showed Oracle Database with SAP-SD benchmarks all running on a Sun Fire T2000. As Sun has been saying since day one of CMT, major databases are perfectly matched for UltraSPARC T1. By the way, Sun has also used open-source databases on benchmarks.

    We have lots of customers deploying RDBMS on UltraSPARC T1 and planning on UltraSPARC T2 servers. It really works well, even though competitors and doubters want to say it is special purpose. Sorry, it isn't.

    Here is an opinion:

      "Now Sun's T2 is out and it's pretty much the world beater they promised - 30% faster on SPEC throughput than IBM's 4.7 Ghz Dual core Power6 and, more significantly, one third the cost and somewhere between two and three times the throughput of the Itanium. ... anyone still buying HP-UX and Itanium after Rock comes out will be doing it because they hate Sun and are quietly hoping for a miracle, just as DEC's partisans (and HP's own MPE customer base) did before them." -- zdnet's Paul Murphy

      Source: "A Dumb prediction: IBM will Buy HP's Unix Customers," By Paul Murphy, zdnet, 08/17/07, http://blogs.zdnet.com/Murphy/?p=941

    [2] Comments

    UltraSPARC T2, and Old UltraSPARC T1 world records & new Xeon's

    Thursday Aug 09, 2007

    Postscript:

    Be careful when comparing performance results. As an example, look at a comment in yesterday's "Can I use 64 threads in a chip?" posting. At least this comment pointed out that you can use 4-8 threads in 2-chip Intel-based systems, but it was really trying to be a stab at UltraSPARC performance. Here was the comment: One really needs to look at the complete data on those .pdf's to make a fair comparison (also in the disclosure statement below).

  • First: The T2000 SAP-SD result used a 1.2GHz UltraSPARC T1. Sun now ships a faster 1.4GHz UltraSPARC T1 and has announced the 1.4GHz UltraSPARC T2. The 1.4GHz T2 has double the threads of that 1.4GHz T1 (double the computational power).
  • Second: The T2000 SAP-SD result was submitted in Dec 2005, at that time it was near the performance of the expensive 4-way POWER5 IBM p550.
  • Third: The 2-chip Dual-core Xeon SAP-SD result above was submitted 18 months after the T2000 SAP-SD result.
  • Fourth: Different versions of the benchmark. The 2-chip Dual-core Xeon was run with ECC 6.0 (not ECC 5.0). The newer version of the benchmark takes more computational work to produce the same results. The Dual-core SAP-SD result was also run with Solaris 10 on Xeon, how cool is that!
  • Fifth: The 2-chip quad-core Xeon SAP-SD result above was submitted 19 months after the T2000 SAP-SD result.
  • Sixth: The Sun result used the open-source MySQL MaxDB database, how cool is that! The Xeon results used Oracle or Microsoft SQL Server.
      postscript:
      Sun later used Oracle. Others suggested the US T1 has some sort of silly database limitation - NOT TRUE!

    You'll see more results soon.

    Triffids, as a reminder, if you work for a partner company of SAP you must include the following disclosures when you post results. If you do not, then you don't need to put this in, but as you can see, the data in it would have allowed you to make a better comparison of systems. Don't worry, I'm not asking you to identify yourself at all.

    Disclosure Statement:

    Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1x 1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg response time, Cert#2005047., MaxDB 7.5 database, Solaris 10; Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark IBM System eServer p5 550 (4-way, 4 procs, 4 cores, 8 threads) 4x 1.9 GHz POWER5+, 32GB mem, 1,000 SD benchmark users, 1.97s avg resp time, Cert#2005040, IBM DB2 Universal Database 8.2.2, SuSE Linux Enterprise Server 9; Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model BFi20 S2 (2 procs, 4 cores, 4 threads) 2x Intel Xeon 5160, 3.0 GHz, 16GB mem, 1,020 SD benchmark users, 1.94s avg resp time, Cert#2007031, Oracle 10g, Solaris 10; Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model TX300 S3 (2 procs, 8 cores, 8 threads) 4x Quad-Core Intel Xeon Processor X5355 2.66 GHz, 32GB mem, 1865 SD benchmark users, 1.99s avg resp time, Cert#2007025, SQL Server 2005, Windows Server 2003 Enterprise Edition; SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.

    I edited in:
    2 processors into Quad-Core Intel Xeon Processor X5355 2.66 GHz

    ...and..

    32 threads to the Sun Fire T2000, 1 processor / 8 cores ...in order to make the comparisons more consistent.
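    The disclosure data makes it easy to see how much a "per widget" metric depends on which widget you pick. Here is a minimal sketch that divides the SD benchmark users by chips, cores, and threads, using only the counts given in the disclosure statement above. The SdResult class and its field names are made up for illustration. Keep in mind that the first two results are ECC 5.0 and the last two are ECC 6.0, so the rows are not strictly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SdResult:
    """One two-tier SAP SD result, with counts taken from the disclosure statement."""
    system: str
    benchmark: str
    users: int
    chips: int
    cores: int
    threads: int

    def per(self, unit: str) -> float:
        """SD users per 'chips', 'cores', or 'threads'."""
        return self.users / getattr(self, unit)


RESULTS = [
    SdResult("Sun Fire T2000 (UltraSPARC T1 1.2 GHz)", "ECC 5.0", 950, 1, 8, 32),
    SdResult("IBM p5 550 (POWER5+ 1.9 GHz)", "ECC 5.0", 1000, 4, 4, 8),
    SdResult("PRIMERGY BFi20 S2 (Xeon 5160 3.0 GHz)", "ECC 6.0", 1020, 2, 4, 4),
    SdResult("PRIMERGY TX300 S3 (Xeon X5355 2.66 GHz)", "ECC 6.0", 1865, 2, 8, 8),
]

for r in RESULTS:
    print(
        f"{r.system} [{r.benchmark}]: "
        f"{r.per('chips'):.1f} users/chip, "
        f"{r.per('cores'):.1f} users/core, "
        f"{r.per('threads'):.1f} users/thread"
    )
```

    The same four results rank differently depending on whether you divide by chips, cores, or threads, which is why the full disclosure matters.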
