Friday Apr 11, 2008
For a look under the covers at the design of the amazing UltraSPARC T2 Plus based systems, check out this great blog: http://blogs.sun.com/deniss/date/20080410.
More postings to come on this great product. Remember it is delivered-system-performance that is key.
A couple of warnings about the results of others:
- Check the prices for the configs as benchmarked (especially watch out for entry-level pricing, as realistic configs from competitors can cost 2x to 10x more when configured with the fastest processors and full, fast memory).
- Watch out for performance-per-widget metrics. Some things you can see (servers), some things you can't see (cores). Some cores are extremely expensive, and this totally throws off any advantage in per-core performance.
- Watch for benchmarks that aren't published (I'm still waiting for IBM p570 4-core & 8-core STREAM or LMbench performance).
- Watch out for 1.x GHz published on one benchmark and 2.x GHz published for performance.
Tuesday Feb 26, 2008
{update} There is a lot of information about MySQL and Sun at http://www.sun.com/mysql
In addition, I've put together a list of several blogs on MySQL performance.
* a very interesting result that compares the Solaris open-source stack (OS, DB, Web, Virtualization) on a 1-chip UltraSPARC T2 server, beating a proprietary stack on a 4-chip QC Xeon. It also measured actual watts and costs. It seems real configurations of HP DL580s draw lots of watts:
http://blogs.sun.com/ritu/entry/mysql_benchmark_us_t2_beats
* an ERP result using MySQL with SugarCRM:
http://blogs.sun.com/vanga/entry/scaling_sugarcrm_with_mysql_on
* great information about tuning MySQL on Linux and some performance results:
http://blogs.sun.com/allanp/entry/tuning_mysql_on_linux
* nice writeup on InnoDB on SysBench:
http://blogs.sun.com/realneel/entry/tuning_mysql_innodb_for_sysbench
For a variety of things on MySQL see:
http://blogs.sun.com/barton808/entry/mysql_done_deal_talking_with
Tuesday Feb 26, 2008
Getting ready to head off for lunch, I took off my blinders and saw
all of the MySQL announcements. There are even several blogs on MySQL performance. Already some very interesting things are coming from bringing MySQL into Sun.
* a very interesting result that compares the Solaris open-source stack (OS, DB, Web, Virtualization) on a 1-chip UltraSPARC T2 server, beating a proprietary stack on a 4-chip QC Xeon. It also measured actual watts and costs. It seems real configurations of HP DL580s draw lots of watts:
http://blogs.sun.com/ritu/entry/mysql_benchmark_us_t2_beats
* an ERP result using MySQL with SugarCRM:
http://blogs.sun.com/vanga/entry/scaling_sugarcrm_with_mysql_on
* great information about tuning MySQL on Linux and some performance results:
http://blogs.sun.com/allanp/entry/tuning_mysql_on_linux
For a variety of things on MySQL see:
http://blogs.sun.com/barton808/entry/mysql_done_deal_talking_with
Thursday Jan 10, 2008
arrgghhh... I've been asked to show only Sun's results. You must now do your own
math with the information posted on Oracle's website:
http://www.oracle.com/apps_benchmark/doc/Sun_Siebel8_10000_PSPP_On_Solaris.pdf
http://www.oracle.com/apps_benchmark/doc/IBM_Siebel8_7000_PSPP_On_AIX_POWER6%20Final.pdf
IBM no longer holds the world record and really needs to post a correction on:
http://www-03.ibm.com/systems/p/hardware/benchmarks/erp.html
Four Sun SPARC Enterprise T5120 and T5220 servers (UltraSPARC T2
processors) set a new World Record using Siebel's standard Platform Sizing
and Performance Program (PSPP) benchmark suite with Siebel CRM 8.0 Industry
Applications and Oracle 10g R2 DB running on Solaris 10.
The Sun results using the UltraSPARC T2 supported 30% more concurrent Siebel
benchmark users than other results on the Siebel CRM
Applications Release 8.0.
Sun again shows the UltraSPARC T2 servers are ideally suited for
Oracle database applications. The database server ran Oracle 10g R2 on
this Siebel benchmark.
{ Stuff deleted }
Sun's Solaris and CoolThreads-based servers prove once again to be
the best combination for scalability and resource utilization in the
datacenter, giving users a consistent response time on critical applications,
as shown by the 10,000-user benchmark on Siebel CRM 8.0.
The 10,000-user Siebel benchmark results on 4 Sun SPARC
Enterprise T5120/T5220 servers running Solaris 10 show a scalable and
cost-effective platform for deploying Siebel CRM applications and Oracle 10g R2.
The 10,000 active concurrent Siebel user benchmark was run end
to end on the new generation of Sun SPARC Enterprise servers using CoolThreads
technology with the highest level of space and energy efficiency.
See Also: http://www.oracle.com/apps_benchmark/html/white-papers-siebel.html
Siebel CRM 8.0 PSPP Performance Chart as of 01/04/2008 (bigger is better)
| Vendor | Users | Web Server | Application Servers | Database Server |
| Sun | 10,000 | 1 x Sun SPARC Enterprise T5120 (4 cores, 1 chip @1.2 GHz US-T2, 8 GB RAM); Siebel CRM 8.0 SIA [20204] ENU; Sun Java System Web Server 6.1 SP8; Solaris 10 8/07 | 1 x Sun SPARC Enterprise T5220 (8 cores, 1 chip @1.4 GHz US-T2, 32 GB RAM); 1 x Sun SPARC Enterprise T5220 (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM); Siebel CRM 8.0 SIA [20204] ENU; Solaris 10 8/07 | 1 x Sun SPARC Enterprise T5120 (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM); Oracle 10gR2 Database Server v10.2.0.1.0; Solaris 10 8/07 |
| . | . | . | . | . |
As noted on the official benchmark report: "Siebel CRM Release 8.0 Industry
Application Platform Sizing and Performance benchmarks are based on Siebel CRM
Release 8.0 customized industry applications and reflect a heavier scenario mix
and more-aggressive think times than earlier version. Results of this benchmark
are not comparable with those of prior Siebel CRM Release 7 benchmarks."
Benchmark Description
Siebel CRM 8.0 Platform Sizing and Performance Program (PSPP) is a multi-tier
benchmark designed to stress the Siebel CRM Release 8.0 architecture and to demonstrate
that large customers can successfully deploy many thousands of concurrent users.
Among the Siebel CRM Release 8.0 architecture features exercised are the following:
- Smart Web Architecture: Takes advantage of the newest Web browser technology to deliver
a highly interactive experience. The interaction model, which is similar to Windows-based
applications, also improves productivity. Utilization rates on the web server are low, allowing
customers to retain existing Web server infrastructure.
- Smart Network Architecture: Allows Siebel CRM Release 8.0 customers to leverage their
existing network infrastructure by compressing and caching user interface components,
so that browser/Web server interaction occurs only when the application requests data.
This allows customers to avoid expensive network upgrades that can be necessary with
competing products.
- Server Connection Broker: The Siebel Connection Broker (SCBroker) is a server component that
provides intraserver load balancing. SCBroker distributes server requests across multiple
instances of Application Object Managers (AOMs) running on a Siebel server.
- Smart Database Connection Pooling and Multiplexing: Allows customers to scale their
database without introducing expensive and complex transaction-processing monitors.
- Server Request Broker: Server Request Broker (SRBroker) processes synchronous server
requests - requests that must be run immediately, and for which the calling process
waits for completion.
- Enterprise Application Integration: Allows customers to integrate their existing systems
with Siebel CRM applications.
- eScript: eScript is a scripting or programming language that application developers use to write
simple scripts to extend Siebel applications. JavaScript, a popular scripting language used
primarily on Web sites, is its core language.
The test simulated real-world requirements of a large organization, consisting of 10,000
concurrent, active users from multiple departments accessing a call center. Test conditions
simulated service representatives running Siebel Financial Services Call Center and partner
organizations running Siebel Partner Relationship Management (Web sales and Web service).
Siebel Workflow and the Siebel Scripting Engine were used to incorporate business-process-management
customizations. The application also simulated integration with Web systems, using the Siebel
Enterprise Application Integration component and Siebel Web Services.
Disclosure Statement:
Siebel CRM 8.0 Platform Sizing and Performance Program (PSPP) benchmark as of 01/04/08.
Sun Microsystems: 10,000 users,
1 x Sun SPARC Enterprise T5120 web server (4 cores, 1 chip
@1.2 GHz US-T2, 8 GB RAM), Siebel CRM 8.0 SIA [20204] ENU, Sun Java System Web Server 6.1 SP8,
Solaris 10 8/07,
1 x Sun SPARC Enterprise T5220 application server (8 cores, 1 chip @1.4 GHz US-T2,
32 GB RAM), 1 x Sun SPARC Enterprise T5220 application server (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM) Siebel
CRM 8.0 SIA [20204] ENU, Solaris 10 8/07,
1 x Sun SPARC Enterprise T5120 database server (8 cores, 1 chip @1.2 GHz US-T2, 32 GB RAM),
Oracle 10gR2 Database Server v10.2.0.1.0, Solaris 10 8/07
Oracle and Siebel are registered trademarks of Oracle Corporation and/or its affiliates.
More info www.oracle.com/apps_benchmark/html/white-papers-siebel.html
Power Reference:
Sun measured: Database Server (1.2 GHz T5120, 8 core, 32G memory): 291W,
Gateway/Application Server #1 (1.4 GHz T5220, 8 core, 32G memory): 323W,
Application Server #2 (1.2 GHz T5220, 8 core, 32G memory): 376W,
Web Server (1.2 GHz T5120, 4 core, 8G memory): 212W.
IBM power calculation based on the following:
The p570 is supplied in building blocks with 2 chips, 4 cores per chassis
called a CEC. Up to 4 CECs can be connected together to create a
single 8 chip, 16 core SMP system.
Each CEC is 4 RU, and each CEC is estimated to consume 1,040 watts when
configured with 2 processors, based on the following:
IBM p6 570 power specifications, taking 80% of the maximum reported power
consumption published 06/07/07 and posted at
ftp://ftp.software.ibm.com/common/ssi/rep_sp/n/PSB01628USEN/PSB01628USEN.PDF
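To make the Sun side of the power reference easy to check, here is a minimal sketch that just adds up the four measured wattages listed above and normalizes by the 10,000 certified users. The variable names are mine, and the per-1,000-users figure is an illustrative derived number, not something reported in the benchmark.

```python
# Wall-power readings reported by Sun in the Power Reference above (watts).
sun_measured_watts = {
    "database server (1.2 GHz T5120)": 291,
    "gateway/application server #1 (1.4 GHz T5220)": 323,
    "application server #2 (1.2 GHz T5220)": 376,
    "web server (1.2 GHz T5120)": 212,
}
users = 10_000  # concurrent Siebel users in the certified result

# Sum of the four servers, then a derived (illustrative) per-1,000-user figure.
total_watts = sum(sun_measured_watts.values())
watts_per_1000_users = total_watts / (users / 1000)

print(f"Total measured power: {total_watts} W")
print(f"Power per 1,000 users: {watts_per_1000_users:.1f} W")
```

The same arithmetic can be repeated for any other vendor once you know how many chassis were in the benchmarked configuration.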
System Configuration
| Certified Results | 10,000 Users |
| Reference Date: | January 4, 2008 |
| Systems: | 1 x Sun SPARC Enterprise T5120, web server (one 1.2GHz UltraSPARC T2) |
| | 1 x Sun SPARC Enterprise T5220, gateway/application server (one 1.4GHz UltraSPARC T2) |
| | 1 x Sun SPARC Enterprise T5220, application server (one 1.2GHz UltraSPARC T2) |
| | 1 x Sun SPARC Enterprise T5120, database server (one 1.2GHz UltraSPARC T2) |
| Operating System: | Solaris 10 8/07 |
| Software: | Sun Java System Web Server 6.1 SP8 |
| | Siebel CRM 8.0 SIA [20204] ENU |
| | Oracle 10gR2 Database Server v10.2.0.1.0 |
Monday Jan 07, 2008
You may have missed this writeup about UltraSparc T2 and Tigerton Tests which looked at low-level memory access measurements: http://blogs.sun.com/psa/entry/ultrasparc_t2_sun
A quote from a Sun employee I like... "You can only compute as fast as you can move data"
Friday Nov 02, 2007
Sun has released benchmark results on SPEC CPU with GCCfss. GCCfss is a GCC-compatible frontend with the Sun Studio backend. If you have code developed with
GCC, you can now use it to run really fast on UltraSPARC T2, with all
kinds of great optimizations.
For more on GCCfss see:
http://cooltools.sunsource.net/gcc/
The Sun SPARC Enterprise
T5220 server, running at 1.4 GHz, delivered a result of
78.0 SPECint_rate2006, which is slightly lower (under 1%) when
compared with the full Sun Studio 12 compiler.
The Sun SPARC Enterprise T5220 using the GCC for SPARC Systems
(gccfss) compiler topped all competitors' single-chip results,
including the 4.7 GHz POWER6 result from IBM, which used a
proprietary compiler, by over 28%.
The gccfss compiler allows one to use the optimal Sun SPARC optimization tools
along with the popular gcc coding conventions and deliver performance
that has not been possible before without time consuming code
changes.
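For anyone who wants to do the math themselves, this small sketch reproduces the two percentages quoted above from the peak SPECint_rate2006 numbers in this post. The helper name `percent_diff` is my own; the three scores come straight from the chart below.

```python
def percent_diff(a: float, b: float) -> float:
    """Return how much larger a is than b, in percent (negative if smaller)."""
    return (a / b - 1.0) * 100.0

# Peak SPECint_rate2006 results quoted in this post.
gccfss_peak = 78.0   # T5220 with gccfss
studio_peak = 78.5   # T5120/T5220 with Sun Studio 12
power6_peak = 60.9   # IBM p570, one 4.7 GHz POWER6 chip

gccfss_vs_studio = percent_diff(gccfss_peak, studio_peak)
gccfss_vs_power6 = percent_diff(gccfss_peak, power6_peak)

print(f"gccfss vs Sun Studio 12: {gccfss_vs_studio:+.1f}%")
print(f"gccfss vs POWER6:        {gccfss_vs_power6:+.1f}%")
```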
SPEC CPU2006 Performance Charts: bigger is better, selected recent results
SPECint_rate2006
| System | Processor | GHz | Chips | Cores | Threads | Peak | Base |
| T5120/T5220 | UltraSPARC T2 | 1.4 | 1 | 8 | 64 | 78.5 | 73.0 |
| T5220 (gccfss) | UltraSPARC T2 | 1.4 | 1 | 8 | 64 | 78.0 | 71.6 |
| HP DL360 G5 | Intel X5365 | 3.0 | 1 | 4 | 4 | 61.3 | 53.8 |
| IBM p 570 | Power6 | 4.7 | 1 | 2 | 4 | 60.9 | 53.2 |
| Fujitsu RX300 | Intel X5355 | 2.66 | 1 | 4 | 4 | 52.8 | 50.5 |
Results as of 30 Oct 2007 from www.spec.org.
Benchmark Description
SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and
CINT2006. CFP2006 targets floating-point performance, while CINT2006
targets integer performance.
Each suite has two different measures. First is the CPU measure, which
is the performance on the suite as a single stream. This can be either
a single thread or automatic compiled parallel run. This measure is
further defined by base and optimized runs. Base uses the same compiler
flags for all kernels, where optimized is allowed to use different
compiler flags for each kernel. Results are compared against a baseline
system run that was standardized by SPEC.
The second measure is Rate. It is a measure of how many CPU measures
can be run at a time. Typically, it is run as n processes on n
processors. It shows how well the same job mix can run on a system
under some load. It also is run as a base and optimized set of
results.
Disclosure Statement:
SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation.
Sun result submitted to SPEC, other results from www.spec.org as of 10/30/07.
Sun SPARC Enterprise T5220 gccfss (UltraSPARC T2, 1 chip, 8 cores),
78.0 SPECint_rate2006; IBM p570 (POWER6, 1 chip, 2 cores), 60.9 SPECint_rate2006; HP DL360 G5 (Intel X5365 1chip 4-core),
61.3 SPECint_rate2006; Fujitsu RX300 (Intel X5355, 1-chip, 4-core) 52.8 SPECint_rate2006;
Sun SPARC Enterprise T5220 (UltraSPARC T2, 1 chip, 8 cores), 78.5 SPECint_rate2006.
Results Summary
| Results |
| Reference Date: | Oct 30, 2007 |
| System: | Sun SPARC Enterprise T5220 |
| Processor: | Sun UltraSPARC T2, 1.4 GHz |
| | 78.0 SPECint_rate2006 |
| Software: | Solaris 10, Sun Studio 12 Compiler gccfss |
Wednesday Oct 24, 2007
You have to read some things carefully
"...And the good news is that about 40-70% of the
stuff we do in performance tuning actually ends up helping end users,"
-- Bruce Lindsay(IBM Fellow), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf
"This is feasible in the TPC-C benchmark because there
are only five tables and only ten to fifteen columns in each table.
In a more realistic application, where there are many more queries
to be considered, the tables are typically much, much wider, in
the 80 to 100 column range; and there are dozens if not thousands
of tables. Then this kind of analysis(ed note: tuning) is no longer
practical." -- Bruce Lindsay(IBM Fellow since '96), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf
"The idea is to get the numbers by hook and by crook." -- Bruce Lindsay(IBM Fellow since '96), May 06, http://www.sigmod.org/sigmod/record/issues/0506/p71-column-winslet.pdf
The TPC-C benchmark is an industry standard for measuring the ability of a system to process complex online transactions and large volumes of business data. The TPC-C benchmark is unique in the way it exercises all components of a system, including processors, memory, networking, storage, operating system and database software, demonstrating total system performance in a way that many of the other benchmarks touted by some competitors do not. -- Bruce Lindsay(IBM Fellow since '96), July 25, 2006, http://www-03.ibm.com/solutions/sap/doc/content/news/pressrelease/1623288130.html
Issues:
This means that 30% to 60% of IBM's TPC-C tuning is useless for customers.
IBM clearly over-hyped TPC-C, just 2-3 months after they publicly showed all of its problems and "optimizations" they used.
Next:
"Significantly, the high utilization rate of the System z9 mainframes -- systems can and do operate at 80 to 100 percent utilization -- combined with its ability to "virtualize" workloads, can enable a single mainframe processor to perform far more work than a single x86 processor running Microsoft Windows. The latter may run as low as 5 percent utilization." - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/19577.wss
Issues:
Used different work for the mainframe and for its competitor.
"Do" and "may" mean very different things:
"mainframes do operate at 80-100%", "x86 processor running Microsoft Windows. The latter may run as low as 5%". So it is a valid but totally useless statement.
An equally valid (and equally useless) statement: x86 systems do operate at 80-100% and
mainframes may run as low as 5%.
Next:
"First of all, the math is really simple. 4.7 is greater than 1.4. IBM's POWER6 4.7 GHz chip is faster than Sun's 1.4 GHz UltraSPARC T1 chip. And second of all, the IBM System p 570 remains the #1 SPECjbb2005 2-core result (1)."
Marketing Program manager of IBM performance blog, Jun07
Issues:
Did not compare system or chip performance but only quoted the GHz of a chip?
Made a true statement about core count but ignored that IBM cores cost much more than Sun UltraSPARC T1 and/or UltraSPARC T2 cores on a per-core basis. I know this
is hard to verify since IBM isn't public about pricing, so you'll have to ask your IBM people to price specific configurations for you. Be specific so you understand exactly what is priced.
Next:
"Even more impressive, the processor bandwidth of the POWER6 chip – 300 gigabytes per second -- could download the entire iTunes catalog in about 60 seconds" - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/21580.wss
Issues:
Added every bandwidth (L3 cache, address bandwidth?!?,...) in a chip,
even though peak memory bandwidth is at most about a 10th of that, and delivered bandwidth is a lot less.
Stated "processor bandwidth", even though "delivered" system bandwidth would actually be required to move the data (not addresses).
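As a rough sanity check on the quote, the sketch below works out how much data the claim implies and how long the same transfer would take if only a tenth of that bandwidth were really available for moving data. It assumes decimal units (1 TB = 1000 GB), and the one-tenth factor is simply the upper bound stated above, not a measured number.

```python
claimed_bandwidth_gb_s = 300.0  # "processor bandwidth" from the press release
download_seconds = 60.0         # "about 60 seconds"

# Data volume implied by the claim, in decimal terabytes.
implied_catalog_tb = claimed_bandwidth_gb_s * download_seconds / 1000.0

# Assumption taken from the post: peak memory bandwidth is at most ~1/10 of the claim.
memory_bandwidth_gb_s = claimed_bandwidth_gb_s / 10.0
seconds_at_memory_bw = implied_catalog_tb * 1000.0 / memory_bandwidth_gb_s

print(f"Implied data volume: {implied_catalog_tb:.0f} TB")
print(f"Time at one tenth of the bandwidth: {seconds_at_memory_bw:.0f} s")
```

Delivered bandwidth is lower still, so the real transfer time would be longer than this sketch suggests.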
Next:
"IBM calculates that 30 SunFire v890s can be consolidated into a single rack of the new IBM machine, saving more than $100,000 per year on energy costs (3)." - IBM Press Release http://www-03.ibm.com/press/us/en/pressrelease/21580.wss
Issues:
Used a 2-year-old Sun result compared to a POWER6 system yet to be shipped as of the May press release.
Said V890, so that people think it is a current comparison; you had to read the footnotes to learn that it was the slower 1.5 GHz CPU. Sun has introduced 1.8 GHz and 2.1 GHz since.
Made a "conservative" comparison by giving IBM another 15% in performance.
Claimed Sun at 20% utilisation and IBM at 60% utilisation; that is one way to get 3x
over your competition.
Never showed exactly what power was drawn by the 4.7 GHz, 64 GB memory system,
at ??MHz DDR2, used in the comparison, etc.
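To see how far the assumptions alone carry the comparison, here is a small sketch using only the utilisation figures and the 15% bonus mentioned above. Multiplying the two factors together is my assumption about how they combine; IBM's footnotes may have applied them differently.

```python
sun_utilization_pct = 20   # utilisation IBM assumed for the Sun servers
ibm_utilization_pct = 60   # utilisation IBM assumed for its own system

# Advantage that comes purely from the assumed utilisation levels.
utilization_factor = ibm_utilization_pct / sun_utilization_pct

# The extra 15% performance credited to IBM (assumed here to multiply in).
performance_bonus = 1.15
combined_factor = utilization_factor * performance_bonus

print(f"From utilisation assumptions alone: {utilization_factor:.1f}x")
print(f"With the extra 15% applied:         {combined_factor:.2f}x")
```

None of this involves a measured result; it is all from the choice of inputs.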
This was a bit of a repeat, but some things should not be forgotten.
I've never been about popularity or names. You don't need my expertise
to see funny things in IBM's statements. Don't attack me, attack the facts.
Anonymously yours, Sun's BM Seer.
Disclosure statement:
TPC-C is a trademark of Transaction Processing Performance Council (TPC). More info www.tpc.org.
Wednesday Oct 10, 2007
Take a look at the last dozen posts, lots of world records for the UltraSPARC T2.
All done with a single chip that beats many 2-socket and even 4-socket X64 systems.
That is pretty amazing!
There are people attacking it, and many are weak or have their facts wrong.
Linus Torvalds suggests that Sun was selective on benchmarks. What??? Sun compared
against every system that x64 vendors submitted on every tier of the datacenter.
Linus points to one case where a 2-socket quad-core was faster than the US T2, but
the BIOS had to be changed from defaults to get a better result.
I guess you can change the BIOS from Linux...
(oh, wait, yeah, that X5365 result was on Windows?).
Bottom line: the UltraSPARC T2 is very innovative, low power, 64-threads and leading the industry. Boots Solaris, Boots Ubuntu, Open-source hardware!,...
Tuesday Oct 09, 2007
Today, Sun submitted the SPECint_rate2006 and SPECfp_rate2006
Single-Chip World Records on the Sun SPARC Enterprise T5120/T5220.
What are these servers? UltraSPARC T2 1.4GHz servers that you will
hear loads more on today.
The Sun SPARC Enterprise T5120 is the 1RU version, and the
Sun SPARC Enterprise T5220 is the 2RU version, both of these
servers are electronically equivalent with the 2RU having a bit more
connectivity and storage if you need.
The Sun SPARC Enterprise T5220 server, running at 1.4 GHz, beat all single-chip results running SPECint_rate2006 with a result of 78.5.
The Sun SPARC Enterprise T5220 server beat the best single IBM 4.7 GHz dual-core POWER6 processor result by 29% and beat the best published single
3 GHz Xeon quad-core by 28% on SPECint_rate2006. There are no single quad-core Opteron results published for SPECint_rate2006.
"But I've heard there is no floating point on Niagara processors."
Nay, the 1.4 GHz UltraSPARC T2 in the Sun SPARC Enterprise T5220 server beat
all single-chip results running SPECfp_rate2006 with a result of 62.3.
The Sun SPARC Enterprise T5220 server beat the best single IBM 4.7 GHz
POWER6 processor based system result by 7% and beats the best published
single 3 GHz quad-core Intel Xeon by 61% for SPECfp_rate2006.
There are no single quad-core Opteron results published for SPECfp_rate2006.
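The percentages above can be checked against the peak scores in the charts that follow. This is a minimal sketch: the dictionary layout and function names are mine, while the scores are the peak results listed in this post.

```python
# Peak rate results copied from the SPEC CPU2006 charts in this post.
T2 = "UltraSPARC T2 1.4 GHz"
PEAK = {
    "SPECint_rate2006": {
        T2: 78.5,
        "POWER6 4.7 GHz": 60.9,
        "Xeon X5365 3.0 GHz": 61.3,
    },
    "SPECfp_rate2006": {
        T2: 62.3,
        "POWER6 4.7 GHz": 58.0,
        "Xeon X5365 3.0 GHz": 38.8,
    },
}

def advantage_pct(ours: float, theirs: float) -> int:
    """Percentage by which `ours` exceeds `theirs`, rounded to a whole percent."""
    return round((ours / theirs - 1.0) * 100.0)

def t2_advantages(results: dict) -> dict:
    """Map each rival chip to the T2's advantage over it, in percent."""
    t2_score = results[T2]
    return {
        name: advantage_pct(t2_score, score)
        for name, score in results.items()
        if name != T2
    }

for suite, results in PEAK.items():
    for rival, pct in t2_advantages(results).items():
        print(f"{suite}: T2 ahead of {rival} by {pct}%")
```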
SPEC CPU2006 Performance Charts -
bigger is better, selected recent results,
please see
www.spec.org for complete results.
SPECint_rate2006
| System | Processor | GHz | Chips, Cores | Threads | Peak | Base |
| T5120/T5220 | UltraSPARC T2 | 1.4 | 1, 8 | 64 | 78.5 | 73.0 |
| HP DL380 G5 | Intel X5365 | 3.0 | 1, 4 | 4 | 61.3 | 53.8 |
| IBM p 570 | Power6 | 4.7 | 1, 2 | 4 | 60.9 | 53.2 |
| Fujitsu RX300 | Intel X5355 | 2.66 | 1, 4 | 4 | 52.8 | 50.5 |
SPECfp_rate2006
| System | Processor | GHz | Chips, Cores | Threads | Peak | Base |
| T5120/T5220 | UltraSPARC T2 | 1.4 | 1, 8 | 64 | 62.3 | 57.9 |
| IBM p 570 | Power6 | 4.7 | 1, 2 | 4 | 58.0 | 51.5 |
| HP DL380 G5 | Intel X5365 | 3.0 | 1, 4 | 4 | 38.8 | 36.4 |
| Fujitsu RX300 | Intel X5355 | 2.66 | 1, 4 | 4 | 37.5 | 36.2 |
Results as of 27 Sep 2007 from www.spec.org.
Benchmark Description
SPEC CPU2006 is made up of two suites of benchmarks, CFP2006 and
CINT2006. CFP2006 targets floating-point performance, while CINT2006
targets integer performance.
Each suite has two different measures. First is the CPU measure, which
is the performance on the suite as a single stream. This can be either
a single thread or automatic compiled parallel run. This measure is
further defined by base and optimized runs. Base uses the same compiler
flags for all kernels, where optimized is allowed to use different
compiler flags for each kernel. Results are compared against a baseline
system run that was standardized by SPEC.
The second measure is Rate. It is a measure of how many CPU measures
can be run at a time. Typically, it is run as n processes on n
processors. It shows how well the same job mix can run on a system
under some load. It also is run as a base and optimized set of
results.
Disclosure Statement:
SPEC, SPECint and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation.
Sun results submitted to SPEC, other results from www.spec.org as of 9/27/07.
Sun SPARC Enterprise T5220/T5120 (UltraSPARC T2, 1 chip, 8 cores),
78.5 SPECint_rate2006, 62.3 SPECfp_rate2006;
IBM p570 (POWER6, 1 chip, 2 cores), 60.9 SPECint_rate2006, 58.0 SPECfp_rate2006;
HP DL380 G5 (X5365, 1 chip, 4 cores), 61.3 SPECint_rate2006, 38.8 SPECfp_rate2006.
System Configuration
| Results |
| Reference Date: | Oct 09, 2007 |
| System: | Sun SPARC Enterprise T5120/T5220 |
| Processor: | Sun UltraSPARC T2, 1.4 GHz |
| | 78.5 SPECint_rate2006 |
| | 62.3 SPECfp_rate2006 |
| Software: | Solaris 10, Sun Studio 12 Compiler |
Tuesday Oct 09, 2007
The UltraSPARC T2 processor has very low-overhead cryptography that
basically allows one to add security at 'zero-cost'. A single Sun UltraSPARC
T2 processor achieves up to 37,000 RSA 1024-bit signs/s and up to
38.9 Gbit/s of AES-128 throughput.
The comparisons below demonstrate the performance of a single 1.4 GHz
UltraSPARC T2 on RSA1024 (sign private key) and AES128-CBC operations:
- The UltraSPARC T2 delivers over 4.1 times greater RSA1024 performance and 4.6 times greater AES128
performance than the 2-way quad-core 3 GHz Xeon.
- The UltraSPARC T2 delivers over 9.2 times greater RSA1024 performance and 10 times greater AES128
performance than the 2-way dual-core 2.6 GHz Opteron.
- The UltraSPARC T2 also delivers over 3 times greater RSA1024 performance and 15.6 times greater AES128
performance than a system using the Cavium Nitrox PX crypto
accelerator card.
- The UltraSPARC T2 delivers over 30.8 times greater RSA1024
performance than the 2-way IBM p510 1.5 GHz Power5.
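The ratios in the list above follow directly from the numbers in the competitive table further down. Here is a minimal sketch that recomputes them; the structure and names are mine, and a missing AES figure is represented as `None` because the table lists none for that system.

```python
# (RSA1024 K signs/s, AES128 Gb/s) taken from the competitive table in this post.
T2 = (37.0, 38.9)
RIVALS = {
    "2-chip quad-core Xeon 3 GHz": (9.0, 8.4),
    "2-chip dual-core Opteron 2.6 GHz": (4.0, 3.9),
    "Cavium Nitrox PX (datasheet)": (12.0, 2.5),
    "IBM p510 Power5 1.5 GHz": (1.2, None),  # no AES128 figure listed
}

def speedup(t2_value, rival_value):
    """Ratio of the T2 result to the rival result, or None if the rival has no figure."""
    if rival_value is None:
        return None
    return t2_value / rival_value

ratios = {
    name: (speedup(T2[0], rsa), speedup(T2[1], aes))
    for name, (rsa, aes) in RIVALS.items()
}

for name, (rsa_x, aes_x) in ratios.items():
    aes_text = f"{aes_x:.2f}x" if aes_x is not None else "n/a"
    print(f"{name}: RSA1024 {rsa_x:.2f}x, AES128 {aes_text}")
```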
To achieve these great results, the UltraSPARC T2 processor has an on-chip cryptographic
accelerator (SPU) that consists of a cipher/hash unit and an enhanced modular
arithmetic unit (MAU). This is an evolution of the previous-generation UltraSPARC T1, which only contained modular arithmetic units.
Sun's UltraSPARC T2 processor introduces support for common bulk
ciphers, secure hash operations and both prime and binary field
Elliptic Curve Cryptography. The UltraSPARC T2 processor supports RC4, DES,
3DES, AES-128, AES-192, AES-256, MD5, SHA-1, SHA-256.
Competitive Landscape
RSA/AES Cryptography Benchmark Performance as of 8/07/07 as
measured by Sun on the following platforms.
| System | Processor GHz | Chips total-cores | Operating System | 1024bit RSA (K signs/s) | AES128 (Gbit/s) | notes |
| Sun SPARC Enterprise T5220 | UltraSPARC T2 1.4 GHz | 1 chip 8 core | Solaris 10 | 37.0 K | 38.9 Gb/s | actual |
| Accelerator card | Sun SCA6000 | | | 13.0 K | 1.0 Gb/s | actual |
| Sun Fire T2000 | UltraSPARC T1 1.2 GHz | 1 chip 8 core | Solaris 10 | 12.9 K | | actual |
| Accelerator card | Cavium Nitrox PX | | | 12.0 K | 2.5 Gb/s | datasheet |
| Sun Fire T1000 | UltraSPARC T1 1 GHz | 1 chip 8 core | Solaris 10 | 10.8 K | | actual |
| | quad-core Xeon 3 GHz | 2 chip 8 core | | 9.0 K | 8.4 Gb/s | actual |
| Sun Fire V490* | US IV+ 1.5 GHz | 4 chip 8 core | Solaris 10 | 8.0 K | | actual |
| IBM p690 | Power4 1.3 GHz | 16 chip 32 core | AIX 5.1 | 6.1 K | | actual |
| Fujitsu PP850 | SPARC64 V 1.9 GHz | 16 chip 16 core | Solaris 10 | 6.0 K | | actual |
| | Opteron 2.6 GHz | 2 chip 4 core | | 4.0 K | 3.9 Gb/s | actual |
| Sun Fire V40z | Opteron sc 2.6 GHz | 4 chip 4 core | Solaris 10 | 3.3 K | | actual |
| Dell PE 1850 | Xeon 3.6 GHz | 2 chip 2 core | Linux RHEL4 U1 | 1.9 K | | actual |
| Dell PE 2850 | Xeon 3.6 GHz | 2 chip 2 core | Linux SLES 9 | 1.9 K | | actual |
| IBM p510 | Power5 1.5 GHz | 1 chip 2 core | AIX 5.3 | 1.2 K | | actual |
* Used a Sun Crypto Accelerator (SCA) 4000 in the Sun Fire V490
testing.
Benchmark Description
The RSA/AES-128 Cryptography benchmark was developed by Sun to
measure maximum throughput of RSA private key (sign) operations and
AES-128 operations that a system can perform. On multi-chip and/or
multi-core systems, multiple processes are used to achieve the
maximum throughput. Two microbenchmark programs are used,
pk11rsaperf/pk11aesperf on Solaris and OpenSSL speed test on
non-Solaris systems. Though each microbenchmark uses different crypto
APIs, they both measure the raw throughput of the same crypto
operations.
pk11rsaperf and pk11aesperf are part of a set of
cryptographic microbenchmark programs internally developed by the
Crypto Product Group of NSN. pk11aesperf measures the performance of
AES-128-CBC processing, as performed by the Solaris Cryptographic
Framework via the PKCS#11 API. Different key sizes, data sizes and
varying numbers of concurrent threads can be tested. The metric is
aggregate operations per second for pk11rsaperf and Gb/s for
pk11aesperf (for large object sizes).
OpenSSL speed test, the standard microbenchmark
included in the open-source OpenSSL package, measures raw
cryptographic algorithm performance as implemented in the OpenSSL
library - libcrypto.so via its own proprietary crypto APIs. For RSA
the metric is operations per second, while for AES-128-CBC, the
metric is Gb/s.
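If you run the OpenSSL speed test yourself, its cipher throughput is printed in thousands of bytes per second, so a conversion is needed before comparing against the Gb/s figures here. The sketch below shows one way to do that and to add up several processes, as the multi-process runs described above would require. The function names and the sample readings are hypothetical; they are not taken from Sun's harness or from any run in this post.

```python
def kbytes_per_sec_to_gbit(kbytes_per_sec: float) -> float:
    """Convert an OpenSSL speed figure (1000s of bytes/s) to gigabits per second."""
    bytes_per_sec = kbytes_per_sec * 1000.0
    return bytes_per_sec * 8.0 / 1e9

def aggregate_gbit(per_process_kbytes: list) -> float:
    """Sum per-process throughput figures into one aggregate Gb/s number."""
    return sum(kbytes_per_sec_to_gbit(k) for k in per_process_kbytes)

# Hypothetical readings from four concurrent processes (made-up values).
sample_readings = [125000.0, 125000.0, 125000.0, 125000.0]

print(f"One process:  {kbytes_per_sec_to_gbit(sample_readings[0]):.2f} Gb/s")
print(f"All processes: {aggregate_gbit(sample_readings):.2f} Gb/s")
```

Use the large-block column of the output when comparing, since the AES figures in this post are quoted for large object sizes.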
Disclosure Statement:
RSA/AES Cryptography Benchmark Performance as of 08/07/07 as measured by Sun on the following platforms:
Sun SPARC Enterprise T5220 37K RSA1024 signs/s, 38.9 AES128 Gb/s;
Sun SCA6000 (actual) 13K RSA1024 signs/s, 1 AES128 Gb/s;
Cavium Nitrox PX (datasheet) 12K RSA1024 signs/s, 2.5 AES128 Gb/s;
2-chip quad-core Xeon 3GHz 9K RSA1024 signs/s, 8.4 AES128 Gb/s;
2-chip dual-core Opteron 2.6GHz 4K RSA1024 signs/s, 3.9 AES128 Gb/s;
Sun Fire T2000 1.2 GHz (8 cores,
1 chip) Solaris 10, 12,850 RSA1024 signs/s; Sun Fire T1000 1GHz (8 cores, 1 chip) Solaris 10, 10,764
RSA1024 signs/s; IBM p690 1.3 GHz
(32 cores, 16 chips) AIX 5.1, 6,131 RSA1024 signs/s; Fujitsu PRIMEPOWER850 1.9 GHz (16 cores, 16
chips) Solaris 10, 6,038 RSA1024 signs/s; Dell PowerEdge 1850 3.6 GHz (2 cores, 2 chips) RHEL4 U1,
1,926 RSA1024 signs/s; Dell
PowerEdge 2850 3.6 GHz (2 cores, 2 chips) SLES 9, 1,900 RSA1024
signs/s; IBM p5 510 1.5 GHz (2 cores, 1
chip, SMT) AIX 5.3, 1,200 RSA1024 signs/s.
Results Summary
| Results | 37.0 K RSA1024 signs/s |
| | 38.9 Gb/s AES128 |
| Reference Date: | August 7, 2007 |
| Systems: | Sun SPARC Enterprise T5120/T5220 |
| Total Number Processors: | 1 chip / 8 cores/chip (8 threads/core) |
| Processor/GHz of Server: | Sun UltraSPARC T2 1.4 GHz |
| Operating System: | Solaris 10 |
Tuesday Oct 09, 2007
This summer we announced the UltraSPARC T2 chip, but one of the things
we didn't talk about much was the US T2's NIU. So let's look at some
of the delivered results.
By the by, you'll see a lot more performance results on this blog
today. Yep, it's launch day. While many of my colleagues are at CEC bellying
up to the buffets and dropping their money at the tables, some of us are
at home working to show you the latest.
The UltraSPARC T2 has an integrated NIU (10GbE Network Interface Unit; the "10GbE" is silent), which provides better
performance and reduces the CPU overhead of network traffic when compared
to servers that must use NICs (network interface cards). The
UltraSPARC T2's NIU has much lower latency, which reduces CPU overhead.
- 10GbE transmit: maximum throughput is 36% higher and CPU
efficiency is 23% better
- 10GbE receive: maximum throughput is almost twice as high,
exceeding x8 bus bandwidth by 16%
The UltraSPARC T2 with NIU has the following measured results: TX 14.6 Gb/s; RX 18.2 Gb/s. In contrast, the Atlas NIC has the following measured results: TX 10.7 Gb/s; RX 9.4 Gb/s.
All performance tests were run by Sun and of course used Solaris 10.
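The "36% higher" and "almost twice" figures can be recomputed from the four measured throughputs quoted above. This is a minimal sketch with names of my choosing; it does not attempt the CPU-efficiency or bus-bandwidth comparisons, since the post gives no raw numbers for those.

```python
# Measured throughput in Gb/s, as quoted in this post.
niu = {"tx": 14.6, "rx": 18.2}        # UltraSPARC T2 integrated NIU
atlas_nic = {"tx": 10.7, "rx": 9.4}   # Atlas NIC

# Transmit advantage as a percentage, receive advantage as a plain ratio.
tx_gain_pct = (niu["tx"] / atlas_nic["tx"] - 1.0) * 100.0
rx_ratio = niu["rx"] / atlas_nic["rx"]

print(f"Transmit: {tx_gain_pct:.0f}% higher throughput")
print(f"Receive:  {rx_ratio:.2f}x the throughput")
```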
... but what about standard benchmarks? My advice is to either get this
blog in your RSS or check back every hour, as "happy days are here again".
Thursday Sep 20, 2007
In a video, Prof. David Patterson opines on UltraSPARC T2 and how Sun's CMT
has some very fresh ideas to move the industry forward on practical
computing. He talks about the old-fashioned and outdated concepts of "peak" or "clock speed" and the need to look at delivered performance.
Hear, hear!!!
He shows that the UltraSPARC T2 out of the box is almost 1.5x to 2x faster
than Clovertown (quad-core) & Opteron and has a three- to four-fold
performance-per-watt advantage. In addition, he says the UltraSPARC T2 is
the easiest to program and auto-tune.
He did concede that if you look at the archaic (he used the word
"old-fashioned") 20th century metrics of peak and clock, the
UltraSPARC T2 is 2x to 7x slower -- but he (like me) focuses on delivered
performance.
David Patterson is a Professor of Computer Science at the University of
California, Berkeley. David and John Hennessy (Stanford University)
wrote the textbook "Computer Architecture: A Quantitative Approach, Fourth Edition".
AFTERNOTE #1
To respond to the comment below (comments are now closed): I'm sure the professor will give us more details and comparisons of floating-point performance on important applications between the UltraSPARC T2 and the various x64 architectures. He's very complete and thoughtful.
In terms of other comparisons, there are CPU benchmark (int & fp) comparisons that were done at UltraSPARC T2 launch, where it was the best chip in several comparisons. There will probably be
even more results before long on commercial benchmarks.
AFTERNOTE #2
Wednesday Aug 29, 2007
More preliminary UltraSPARC T2 performance is blogged about at:
http://blogs.sun.com/jmeyer/entry/power6_goes_thud_part_v
Where John states:
And IBM knows that next quarter, Sun will be introducing systems based on the new UltraSPARC T2, the world's first true system-on-a-chip and the world's fastest microprocessor. Preliminary estimates on one popular benchmark show that a single rack of UltraSPARC T2-based systems will outperform four racks of 4.7GHz POWER6-based p5 570s (more on that as we get closer to system announcement). No kidding.
I haven't seen this internal info yet, but I'll try to dig it up. Looking
at other tests, I believe this one.
...John also talks more about the lagging IBM POWER6 rollout.
Thursday Aug 23, 2007
In the last posting we showed Oracle Database with SAP-SD benchmarks all
running on a Sun Fire T2000. As Sun has been saying since day one of CMT,
major databases are perfectly matched for UltraSPARC T1. By the way, Sun
has also used open-source databases on benchmarks.
We have lots of customers deploying RDBMS on UltraSPARC T1 and planning
on UltraSPARC T2 servers. It really works well, even though competitors
and doubters want to say it is special purpose. Sorry, it isn't.
Here is an opinion:
"Now Sun's T2 is out and it's pretty much the world beater they promised -
30% faster on SPEC throughput than IBM's 4.7 Ghz Dual core Power6 and,
more significantly, one third the cost and somewhere between two and three
times the throughput of the Itanium. ... anyone still buying HP-UX and
Itanium after Rock comes out will be doing it because they hate Sun and are
quietly hoping for a miracle, just as DEC's partisans (and HP's own MPE
customer base) did before them." -- zdnet's Paul Murphy
Source: "A Dumb prediction: IBM will Buy HP's Unix Customers," By Paul Murphy, zdnet, 08/17/07,
http://blogs.zdnet.com/Murphy/?p=941
Thursday Aug 09, 2007
Postscript:
Be careful when comparing performance results. As an example, look at
a comment on yesterday's
"Can I use 64 threads in a chip?" posting. At
least this comment pointed out that you can use 4-8 threads in 2-chip Intel-based systems, but it was really trying to
take a stab at UltraSPARC performance. Here was the comment:
One really needs to look at the complete data on those .pdf's
to make a fair comparison (also in the disclosure statement
below).
First: The T2000 SAP-SD result used a 1.2GHz UltraSPARC T1. Sun now ships a faster 1.4GHz UltraSPARC T1 and has announced the 1.4GHz UltraSPARC T2. The 1.4GHz T2 has double the threads of that 1.4GHz T1 (double the computational power).
Second: The T2000 SAP-SD result was submitted in Dec 2005; at that time it
was near the performance of the expensive 4-way POWER5 IBM p550.
Third: The 2-chip Dual-core Xeon SAP-SD result above was
submitted 18 months after the T2000 SAP-SD result.
Fourth: Different versions of the benchmark. The 2-chip
Dual-core Xeon was run with ECC 6.0 (not ECC 5.0). The newer version
of the benchmark takes more computational work to produce the same results.
The Dual-core SAP-SD result was also run with Solaris 10 on Xeon, how cool is that!
Fifth: The 2-chip quad-core Xeon SAP-SD result above was
submitted 19 months after the T2000 SAP-SD result.
Sixth: The Sun result used the open-source MySQL MaxDB database,
how cool is that! The Xeon results used Oracle or Microsoft SQL Server.
postscript:
Sun later used Oracle. Others suggested the US T1 has some sort of silly database limitation - NOT TRUE!
You'll see more results soon.
Triffids, as a reminder, if you work for a partner company of SAP you must
include the following disclosures when you post results. If you do not,
then you don't need to put this in, but as you can see, the data in
it would have allowed you to make a better comparison of systems.
Don't worry, I'm not asking you to identify yourself at all.
Disclosure Statement:
Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1x 1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg response time, Cert#2005047, MaxDB 7.5 database, Solaris 10; Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark IBM System eServer p5 550 (4-way, 4 procs, 4 cores, 8 threads) 4x 1.9 GHz POWER5+, 32GB mem, 1,000 SD benchmark users, 1.97s avg resp time, Cert#2005040, IBM DB2 Universal Database 8.2.2, SuSE Linux Enterprise Server 9;
Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model BFi20 S2 (2 procs, 4 cores, 4 threads) 2x Intel Xeon 5160, 3.0 GHz, 16GB mem, 1,020 SD benchmark users, 1.94s avg resp time, Cert#2007031, Oracle 10g, Solaris 10;
Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model TX300 S3 (2 procs, 8 cores, 8 threads) 2x Quad-Core Intel Xeon Processor X5355 2.66 GHz, 32GB mem, 1,865 SD benchmark users, 1.99s avg resp time, Cert#2007025, SQL Server 2005, Windows Server 2003 Enterprise Edition; SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.
I edited in:
2 processors into Quad-Core Intel Xeon Processor X5355 2.66 GHz
...and..
32 threads to the Sun Fire T2000, 1 processor / 8 cores
...in order to make the comparisons more consistent.
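To see why the per-widget numbers matter, here is a rough sketch that normalizes the SD user counts from the disclosure statement by chips, cores, and threads. The class and function names are hypothetical, made up for this illustration. The counts are taken as written in the disclosure, and results are only treated as comparable when they ran the same ECC version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SapSdResult:
    """One two-tier SAP-SD result, as stated in the disclosure statement."""
    system: str
    ecc_version: str
    chips: int
    cores: int
    threads: int
    users: int


# Figures copied from the disclosure statement; nothing here is measured anew.
RESULTS = [
    SapSdResult("Sun Fire T2000", "5.0", 1, 8, 32, 950),
    SapSdResult("IBM p5 550", "5.0", 4, 4, 8, 1000),
    SapSdResult("PRIMERGY BFi20 S2", "6.0", 2, 4, 4, 1020),
    SapSdResult("PRIMERGY TX300 S3", "6.0", 2, 8, 8, 1865),
]


def per_unit(result: SapSdResult, unit: str) -> float:
    """SD users per 'chips', 'cores', or 'threads'."""
    return result.users / getattr(result, unit)


def comparable(a: SapSdResult, b: SapSdResult) -> bool:
    """Only results from the same benchmark version should be compared."""
    return a.ecc_version == b.ecc_version


if __name__ == "__main__":
    for r in RESULTS:
        print(
            f"{r.system} (ECC {r.ecc_version}): "
            f"{per_unit(r, 'chips'):.1f} users/chip, "
            f"{per_unit(r, 'cores'):.1f} users/core, "
            f"{per_unit(r, 'threads'):.1f} users/thread"
        )
```

Depending on which widget you divide by, the ranking changes, which is exactly why the complete disclosure is needed before drawing conclusions.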