Tuesday Oct 09, 2007
The UltraSPARC T2 processor has very low-overhead cryptography that
basically allows one to add security at 'zero-cost'. A single Sun UltraSPARC
T2 processor achieves up to 37,000 RSA 1024-bit signs/s and up to
38.9 Gbit/s of AES-128 throughput.
The comparisons below demonstrate the performance of a single 1.4 GHz
UltraSPARC T2 on RSA1024 (sign private key) and AES128-CBC operations:
- The UltraSPARC T2 delivers over 4.1
times greater RSA1024 performance and 4.6 times greater AES128
performance than the 2-way quad-core 3 GHz Xeon.
- The UltraSPARC T2 delivers over 9.3
times greater RSA1024 performance and 10 times greater AES128
performance than the 2-way dual-core 2.6 GHz Opteron.
- The UltraSPARC T2 also delivers over 3
times greater RSA1024 performance and 15.6 times greater AES128
performance than a system using the Cavium Nitrox PX crypto
accelerator card.
- The UltraSPARC T2 delivers over 30.8 times greater RSA1024
performance than the 2-way IBM p510 1.5 GHz Power5.
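The ratios in the list above can be reproduced from the figures in the Competitive Landscape table of this post. The following is a minimal sketch, not part of Sun's benchmark tooling; the names `T2`, `COMPETITORS`, and `speedup` are made up for illustration.

```python
# Throughput figures copied from the Competitive Landscape table in this post.
# RSA is in thousands of 1024-bit signs/s, AES is in Gbit/s.
# All names here are illustrative assumptions, not part of any Sun tool.
T2 = {"rsa_k": 37.0, "aes_gbps": 38.9}

COMPETITORS = {
    "2-way quad-core 3 GHz Xeon": {"rsa_k": 9.0, "aes_gbps": 8.4},
    "2-way dual-core 2.6 GHz Opteron": {"rsa_k": 4.0, "aes_gbps": 3.9},
    "Cavium Nitrox PX": {"rsa_k": 12.0, "aes_gbps": 2.5},
    "IBM p510 1.5 GHz Power5": {"rsa_k": 1.2, "aes_gbps": None},  # no AES result listed
}


def speedup(ours, theirs):
    """Return ours / theirs, or None when no competitor value was published."""
    if theirs is None:
        return None
    return ours / theirs


if __name__ == "__main__":
    for name, scores in COMPETITORS.items():
        rsa = speedup(T2["rsa_k"], scores["rsa_k"])
        aes = speedup(T2["aes_gbps"], scores["aes_gbps"])
        aes_text = "n/a" if aes is None else f"{aes:.1f}x"
        print(f"{name}: RSA {rsa:.1f}x, AES {aes_text}")
```

Because the table values are rounded, the Opteron RSA ratio works out to 9.25 here rather than the 9.3 quoted above; the other ratios match the text to one decimal place.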
To achieve these great results, the UltraSPARC T2 processor has an on-chip cryptographic
accelerator (SPU) that consists of a cipher/hash unit and an enhanced modular
arithmetic unit (MAU). This is an evolution of the previous-generation UltraSPARC T1, which only contained modular arithmetic units.
Sun's UltraSPARC T2 processor introduces support for common bulk
ciphers, secure hash operations, and both prime and binary field
Elliptic Curve Cryptography. The UltraSPARC T2 processor supports RC4, DES,
3DES, AES-128, AES-192, AES-256, MD5, SHA-1, and SHA-256.
Competitive Landscape
RSA/AES Cryptography Benchmark Performance as of 8/07/07 as
measured by Sun on the following platforms.
| System | Processor GHz | Chips total-cores | Operating System | 1024bit RSA (K signs/s) | AES128 (Gbit/s) | notes |
| Sun SPARC Enterprise T5220 | UltraSPARC T2 1.4 GHz | 1 chip 8 core | Solaris 10 | 37.0 K | 38.9 Gb/s | actual |
| Accelerator card | Sun SCA6000 | | | 13.0 K | 1.0 Gb/s | actual |
| Sun Fire T2000 | UltraSPARC T1 1.2 GHz | 1 chip 8 core | Solaris 10 | 12.9 K | | actual |
| Accelerator card | Cavium Nitrox PX | | | 12.0 K | 2.5 Gb/s | data-sheet |
| Sun Fire T1000 | UltraSPARC T1 1 GHz | 1 chip 8 core | Solaris 10 | 10.8 K | | actual |
| | quad-core Xeon 3 GHz | 2 chip 8 core | | 9.0 K | 8.4 Gb/s | actual |
| Sun Fire V490* | US IV+ 1.5 GHz | 4 chip 8 core | Solaris 10 | 8.0 K | | actual |
| IBM p690 | Power4 1.3 GHz | 16 chip 32 core | AIX 5.1 | 6.1 K | | actual |
| Fujitsu PP850 | SPARC64 V 1.9 GHz | 16 chip 16 core | Solaris 10 | 6.0 K | | actual |
| | Opteron 2.6 GHz | 2 chip 4 core | | 4.0 K | 3.9 Gb/s | actual |
| Sun Fire V40z | Opteron sc 2.6 GHz | 4 chip 4 core | Solaris 10 | 3.3 K | | actual |
| Dell PE 1850 | Xeon 3.6 GHz | 2 chip 2 core | Linux RHEL4 U1 | 1.9 K | | actual |
| Dell PE 2850 | Xeon 3.6 GHz | 2 chip 2 core | Linux SLES 9 | 1.9 K | | actual |
| IBM p510 | Power5 1.5 GHz | 1 chip 2 core | AIX 5.3 | 1.2 K | | actual |
* Used a Sun Crypto Accelerator (SCA) 4000 in the Sun Fire V490
testing.
Benchmark Description
The RSA/AES-128 Cryptography benchmark was developed by Sun to
measure maximum throughput of RSA private key (sign) operations and
AES-128 operations that a system can perform. On multi-chip and/or
multi-core systems, multiple processes are used to achieve the
maximum throughput. Two microbenchmark programs are used,
pk11rsaperf/pk11aesperf on Solaris and OpenSSL speed test on
non-Solaris systems. Though each microbenchmark uses different crypto
APIs, they both measure the raw throughput of the same crypto
operations.
pk11rsaperf and pk11aesperf are part of a set of
cryptographic microbenchmark programs internally developed by the
Crypto Product Group of NSN. pk11aesperf measures the performance of
AES-128-CBC processing, as performed by the Solaris Cryptographic
Framework via the PKCS#11 API. Different key sizes, data sizes, and
varying numbers of concurrent threads can be tested. The metric is
aggregate operations per second for pk11rsaperf and Gb/s for
pk11aesperf (for large object sizes).
OpenSSL speed test, the standard microbenchmark
included in the open-source OpenSSL package, measures raw
cryptographic algorithm performance as implemented in the OpenSSL
library (libcrypto.so) via OpenSSL's own crypto APIs. For RSA
the metric is operations per second, while for AES-128-CBC the
metric is Gb/s.
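To make the two metrics concrete, here is a minimal sketch of how per-worker counts could be rolled up into aggregate operations per second and Gb/s. It is not taken from pk11rsaperf, pk11aesperf, or OpenSSL; the function names are hypothetical, and treating 1 Gbit as 10^9 bits is an assumption.

```python
def aggregate_ops_per_sec(ops_per_worker, elapsed_seconds):
    """Sum the operations completed by all workers and divide by wall-clock time.

    ops_per_worker: iterable with one completed-operation count per process/thread.
    elapsed_seconds: wall-clock duration of the measurement window.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(ops_per_worker) / elapsed_seconds


def gbits_per_sec(total_bytes, elapsed_seconds):
    """Convert bytes processed in a time window to Gbit/s (assuming 1 Gbit = 10**9 bits)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_bytes * 8 / elapsed_seconds / 1e9


if __name__ == "__main__":
    # Made-up example: 8 workers that each complete 1,000 signs in a 2-second window.
    print(aggregate_ops_per_sec([1000] * 8, 2.0))
    # Made-up example: 1.25 GB encrypted in 1 second.
    print(gbits_per_sec(1_250_000_000, 1.0))
```

The point of summing across workers is that a single process cannot keep all cores and threads busy, so the reported figure is the total for the whole system rather than the rate of any one process.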
Disclosure Statement:
RSA/AES Cryptography Benchmark Performance as of 08/07/07 as measured by Sun on the following platforms:
Sun SPARC Enterprise T5220 37K RSA1024 signs/s, 38.9 AES128 Gb/s;
Sun SCA6000 (actual) 13K RSA1024 signs/s, 1 AES128 Gb/s;
Cavium Nitrox PX (datasheet) 12K RSA1024 signs/s, 2.5 AES128 Gb/s;
2-chip quad-core Xeon 3GHz 9K RSA1024 signs/s, 8.4 AES128 Gb/s;
2-chip dual-core Opteron 2.6GHz 4K RSA1024 signs/s, 3.9 AES128 Gb/s;
Sun Fire T2000 1.2 GHz (8 cores,
1 chip) Solaris 10, 12,850 RSA1024 signs/s; Sun Fire T1000 1GHz (8 cores, 1 chip) Solaris 10, 10,764
RSA1024 signs/s; IBM p690 1.3 GHz
(32 cores, 16 chips) AIX 5.1, 6,131 RSA1024 signs/s; Fujitsu PRIMEPOWER850 1.9 GHz (16 cores, 16
chips) Solaris 10, 6,038 RSA1024 signs/s; Dell PowerEdge 1850 3.6 GHz (2 cores, 2 chips) RHEL4 U1,
1,926 RSA1024 signs/s; Dell
PowerEdge 2850 3.6 GHz (2 cores, 2 chips) SLES 9, 1,900 RSA1024
signs/s; IBM p5 510 1.5 GHz (2 cores, 1
chip, SMT) AIX 5.3, 1,200 RSA1024 signs/s.
Results Summary
| Results | 37.0 K RSA1024 signs/s; 38.9 Gb/s AES128 |
| Reference Date: | August 7, 2007 |
| Systems: | Sun SPARC Enterprise T5120/T5220 |
| Total Number Processors: | 1 chip / 8 cores/chip (8 threads/core) |
| Processor/GHz of Server: | Sun UltraSPARC T2 1.4 GHz |
| Operating System: | Solaris 10 |
Thursday Sep 20, 2007
In a video, Prof. David Patterson opines on the UltraSPARC T2 and how Sun's CMT
has some very fresh ideas to move the industry forward on practical
computing. He talks about the old-fashioned and outdated concepts of "peak" and "clock speed" and the need to look at delivered performance.
Hear, hear!
He shows that the UltraSPARC T2 out of the box is almost 1.5x to 2x faster
than Clovertown (quad-core) and Opteron, with a three to four times
performance-per-watt advantage. In addition, he says the UltraSPARC T2 is
the easiest to program and auto-tune.
He did concede that if you look at the archaic (he used the word
"old-fashioned") 20th century metrics of peak and clock, the
UltraSPARC T2 is 2x to 7x slower, but he (like me) focuses on delivered
performance.
David Patterson is a Professor of Computer Science at the University of
California, Berkeley. David and John Hennessy (Stanford University)
wrote the textbook "Computer Architecture: A Quantitative Approach" (Fourth Edition).
AFTERNOTE #1
To respond to the comment below (comments are now closed): I'm sure the professor will give us more details and comparisons of floating-point performance on important applications between the UltraSPARC T2 and the various x64 architectures; he's very complete and thoughtful.
As for other comparisons, there are CPU benchmark (int & fp) comparisons that were done at the UltraSPARC T2 launch, where it was the best chip in several comparisons. There will probably be
even more results before long on commercial benchmarks.
AFTERNOTE #2
Wednesday Aug 29, 2007
More preliminary UltraSPARC T2 performance is blogged about at:
http://blogs.sun.com/jmeyer/entry/power6_goes_thud_part_v
Where John states:
And IBM knows that next quarter, Sun will be introducing systems based on the new UltraSPARC T2, the world's first true system-on-a-chip and the world's fastest microprocessor. Preliminary estimates on one popular benchmark show that a single rack of UltraSPARC T2-based systems will outperform four racks of 4.7GHz POWER6-based p5 570s (more on that as we get closer to system announcement). No kidding.
I haven't seen this internal info yet, but I'll try to dig it up. Looking
at other tests, I believe this one.
...John also talks more about the lagging IBM POWER6 rollout.
Thursday Aug 23, 2007
In the last posting we showed Oracle Database with SAP-SD benchmarks all
running on a Sun Fire T2000. As Sun has been saying since day one of CMT,
major databases are perfectly matched for UltraSPARC T1. By the way, Sun
has also used open-source databases on benchmarks.
We have lots of customers deploying RDBMS on UltraSPARC T1 and planning
on UltraSPARC T2 servers. It really works well, even though competitors
and doubters want to say it is special purpose. Sorry, it isn't.
Here is an opinion:
"Now Sun's T2 is out and it's pretty much the world beater they promised -
30% faster on SPEC throughput than IBM's 4.7 Ghz Dual core Power6 and,
more significantly, one third the cost and somewhere between two and three
times the throughput of the Itanium. ... anyone still buying HP-UX and
Itanium after Rock comes out will be doing it because they hate Sun and are
quietly hoping for a miracle, just as DEC's partisans (and HP's own MPE
customer base) did before them." -- zdnet's Paul Murphy
Source: "A Dumb prediction: IBM will Buy HP's Unix Customers," By Paul Murphy, zdnet, 08/17/07,
http://blogs.zdnet.com/Murphy/?p=941
Thursday Aug 23, 2007
The SPARC Enterprise Model T2000 | Sun Fire Model T2000 is the performance leader in Two-Tier SAP-SD Standard Application Benchmarks on single-processor systems as of August 22nd, 2007. This result used the Oracle database on the UltraSPARC T1. Again, as Sun has always maintained, the UltraSPARC T1 is good at the database tier, application tier, and web tier!
- The Sun Fire Model T2000 supported 1100 SD Benchmark Users (5530 SAPS) using Oracle 10g, making it the fastest single-processor system.
- Sun Fire Model T2000 beats a 2-chip dual-core Itanium2-based HP Integrity rx2660.
- Sun Fire Model T2000 beats a 2-chip dual-core Opteron-based HP ProLiant DL365.
- Sun Fire Model T2000 beats a 2-chip dual-core Xeon-based Fujitsu BFi20 S2 (Unicode).
- The Fujitsu BX620 S4 that uses two-chip 3GHz Xeon Quad-cores is only 1.8x faster than a single chip Sun Fire Model T2000 using UltraSPARC T1.
- The IBM p570 that uses two-chip 4.7GHz POWER6 is only 1.8x faster than a single chip Sun Fire Model T2000 using UltraSPARC T1.
- The just-announced UltraSPARC T2 has twice the thread count of the
UltraSPARC T1.
SAP-SD 2-Tier Performance, Benchmark Users (bigger is better)
| Sys | Users | # / GHz / Type | Mem | OS | DB | LI/Hr | SAPS | BM rev | Date |
| IBM p570 | 2035 | two 4.7 POWER6+ DC | 32 GB | AIX 5L 5.3 | Oracle 10g | 203,670 | 10,180 | 6.0 | 5/21/07 |
| Fujitsu BX620 S4 | 1940 | two 3.0 Xeon QC | 32 GB | Windows Srvr 2003 EE | SQL Server 2005 | 194,000 | 9,700 | 6.0 | 8/13/07 |
| Sun Fire T2000 | 1100 | one 1.4 US T1 | 64 GB | Solaris 10 | Oracle 10g | 110,670 | 5,530 | 6.0 | 8/22/07 |
| HP Integrity rx2660 | 1090 | two 1.6 Itan2 DC | 32 GB | HP-UX 11iV3 | DB2 9 | 109,670 | 5,480 | 6.0 | 3/20/07 |
| HP ProLiant DL365 | 1083 | two 2.8 Opt DC | 32 GB | Windows Srvr 2003 EE | SQL Srvr 2005 | 108,670 | 5,430 | 6.0 | 2/9/07 |
| Fujitsu BFi20 S2 Unicode | 1020 | two 3 Xeon 5160 DC | 16 GB | Solaris 10 | Oracle 10g | 102,330 | 5,120 | 6.0 | 5/4/07 |
| IBM p550 | 1000 | four 1.9 POWER5+ DC | 32 GB | SuSE Linux ES9 | DB2 UDB 8.2.2 | 100,330 | 5,020 | 5.0 | 10/04/05 |
| Sun Fire T2000 | 950 | one 1.2 US T1 | 32 GB | Solaris 10 | MaxDB 7.5 | 95,670 | 4,780 | 5.0 | 11/17/05 |
| IBM x3250 | 850 | one 2.13 Xeon | 8 GB | Windows Srvr 2003 EE | DB2 9 | 88,000 | 4,400 | 6.0 | 5/11/07 |
Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
Benchmark Description
The SAP Standard Application SD (Sales and Distribution) Benchmark is a
two-tier ERP business test that is indicative of full business workloads
of complete order processing and invoice processing, and demonstrates the
ability to run both the application and database software on a single
system. The SAP Standard Application SD Benchmark represents the critical
tasks performed in real-world ERP business environments.
SAP is one of the premier world-wide ERP application providers, and maintains
a suite of benchmark tests to demonstrate the performance of competitive
systems on the various SAP products.
SAP has specified that the Benchmark Users metric is the only metric to be used
for public comparisons.
However, Benchmark Users can be traded off with response time in performance
tuning, and so comparing Line Items per Hour or SAPS
may be a different way to compare the actual power of systems.
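The LI/Hr and SAPS columns in the table above are tied together by SAP's definition that 100 SAPS equals 2,000 fully processed order line items per hour. The sketch below applies that relationship to the table rows. The function name is hypothetical, and rounding to the nearest 10 is an assumption inferred from the published values, not a documented SAP rule.

```python
# SAP defines 100 SAPS as 2,000 fully processed order line items per hour,
# so one SAPS corresponds to 20 line items per hour.
LINE_ITEMS_PER_HOUR_PER_SAPS = 20


def saps_from_line_items(line_items_per_hour):
    """Convert line items/hour to SAPS, rounded half-up to the nearest 10.

    The rounding step is an assumption that happens to reproduce the table above.
    """
    raw = line_items_per_hour / LINE_ITEMS_PER_HOUR_PER_SAPS
    return int((raw + 5) // 10 * 10)


# (system, line items per hour, published SAPS) copied from the table above.
ROWS = [
    ("IBM p570", 203_670, 10_180),
    ("Fujitsu BX620 S4", 194_000, 9_700),
    ("Sun Fire T2000 (1.4 GHz)", 110_670, 5_530),
    ("HP Integrity rx2660", 109_670, 5_480),
    ("HP ProLiant DL365", 108_670, 5_430),
    ("Fujitsu BFi20 S2", 102_330, 5_120),
    ("IBM p550", 100_330, 5_020),
    ("Sun Fire T2000 (1.2 GHz)", 95_670, 4_780),
    ("IBM x3250", 88_000, 4_400),
]

if __name__ == "__main__":
    for system, li_per_hour, published in ROWS:
        print(system, saps_from_line_items(li_per_hour), published)
```

Since SAPS is derived directly from line items per hour, both columns rank systems the same way; they differ from Benchmark Users only through the response-time trade-off mentioned above.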
Funny: Sun compares against current IBM results, while IBM bloggers
decided to make comparisons on a different SAP benchmark,
comparing their latest system to a 16-month-old result on a US-IV system
that is two processor GHz upgrades behind. I guess that is one way to win...
Disclosure Statement:
Two-tier SAP Standard Sales and Distribution (SD) standard SAP ERP 2004/2005 application benchmark:
SPARC Enterprise Model T2000 | Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1 x 1.4 GHz
UltraSPARC T1, 64GB memory, 1100 SD Benchmark users, 1.91 sec avg response time,
Cert#2007051, Oracle 10g, Solaris 10;
Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1 x 1.2 GHz
UltraSPARC T1, 32GB memory, 950 SD Benchmark users, 1.91 sec avg response time,
Cert#2005047., MaxDB 7.5 database, Solaris 10;
Fujitsu Siemens Computers PRIMERGY Model BX620 S4
(2-way, 2 procs, 8 cores, 8 threads), 2 x 3.0 GHz Quad-Core Intel Xeon,
32 GB memory, 1940 SD Benchmark users, 1.99 sec avg response time,
Cert#2007049, SQL Server 2005, Windows Server 2003 Enterprise Edition;
HP ProLiant DL365 (2-way, 2 procs, 4 cores, 4 threads)
2 x 2.8 GHz Opteron, 32GB memory, 1083 SAP SD Benchmark users,
1.98 sec avg response time, Cert#2007006, SQL Server 2005,
Windows Server 2003 Enterprise Edition;
HP Integrity rx2660 (2-way, 2 procs, 4 cores, 8 threads)
2 x 1.6 GHz Itanium, 32GB memory, 1090 SAP SD Benchmark users,
1.93 sec avg response time, Cert#2007016, DB2 9, HP-UX 11iV3;
IBM System p 570 (2-way, 2 procs, 4 cores, 8 threads) 2 x 4.7 GHz
POWER6+, 32GB memory, 2035 SD Benchmark users, 1.99s avg resp time, Cert#2007037, Oracle 10g, AIX 5L Version 5.3;
Fujitsu Siemens Computers PRIMERGY Model BFi20 S2 (2-way, 2 procs, 4 cores, 4 threads)
2 x 3GHz Intel Xeon 5160 dual-core, 16GB memory,(Unicode) 1020 SAP SD Benchmark users,
1.94 sec avg response time, Cert#2007031, Oracle 10g, Solaris 10;
IBM System x3250 (1-way, 1 proc, 4 cores, 4 threads) 1 x 2.13 GHz
Xeon, 8GB memory, 850 SD Benchmark users, 1.59s avg resp time, Cert#2007036,
DB2 9, Windows Server 2003 Enterprise Edition;
IBM System eServer p5 550 (4-way, 4 procs, 4 cores, 8 threads) 4 x 1.9 GHz
POWER5+, 32GB memory, 1000 SD Benchmark users, 1.97s avg resp time, Cert#2005040,
IBM DB2 Universal Database 8.2.2, SuSE Linux Enterprise Server 9;
SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries.
More info http://www.sap.com/benchmark.
| Certified Results | |
| Performance: | 1100 benchmark users |
| Server: | Sun Fire |
| Processors: | 1 1.4 GHz UltraSPARC T1 |
| Memory: | 64 GB |
| Operating system: | Solaris 10 |
| Database S/W: | Oracle 10g |
| SAP S/W: | SAP ECC 6.0 |
| SAP Certification: | 2007051 |
| Storage: | Sun StorEdge 6020 |
Thursday Aug 09, 2007
Postscript:
Be careful when comparing performance results. As an example, look at
a comment on yesterday's
"Can I use 64 threads in a chip?" posting. At
least this comment pointed out that you can use 4-8 threads in 2-chip Intel-based systems, but it was really trying to
take a stab at UltraSPARC performance. Here was the comment:
One really needs to look at the complete data on those .pdf's
to make a fair comparison (also in the disclosure statement
below).
First: The T2000 SAP-SD result used a 1.2GHz UltraSPARC T1. Sun now ships a faster 1.4GHz UltraSPARC T1 and has announced the 1.4GHz UltraSPARC T2. The 1.4GHz T2 has double the threads of the 1.4GHz T1 (double the computational power).
Second: The T2000 SAP-SD result was submitted in Dec 2005, at that time it
was near the performance of the expensive 4-way POWER5 IBM p550.
Third: The 2-chip Dual-core Xeon SAP-SD result above was
submitted 18 months after the T2000 SAP-SD result.
Fourth: Different versions of the benchmark. The 2-chip
Dual-core Xeon was run with ECC 6.0 (not SAP 5.0). The newer version
of the benchmark takes more computational work to produce the same results.
The dual-core SAP-SD result was also run with Solaris 10 on Xeon, how cool is that!
Fifth: The 2-chip quad-core Xeon SAP-SD result above was
submitted 19 months after the T2000 SAP-SD result.
Sixth: The Sun result used the open-source MySQL MaxDB database,
how cool is that! The Xeon results used Oracle or Microsoft SQL Server.
postscript:
Sun later used Oracle. Others suggested the US T1 has some sort of silly database limitation - NOT TRUE!
You'll see more results soon.
Triffids, as a reminder, if you work for a partner company of SAP you must
include the following disclosures when you post results. If you do not,
then you don't need to put this in, but as you can see, the data in
it would have allowed you to make a better comparison of systems.
Don't worry, I'm not asking you to identify yourself at all.
Disclosure Statement:
Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark Sun Fire T2000 (1-way, 1 proc, 8 cores, 32 threads) 1x 1.2 GHz UltraSPARC T1, 32 GB mem, 950 SD benchmark users, 1.91 sec avg response time, Cert#2005047., MaxDB 7.5 database, Solaris 10; Two-tier SAP ECC 5.0 Standard Sales and Distribution (SD) benchmark IBM System eServer p5 550 (4-way, 4 procs, 4 cores, 8 threads) 4x 1.9 GHz POWER5+, 32GB mem, 1,000 SD benchmark users, 1.97s avg resp time, Cert#2005040, IBM DB2 Universal Database 8.2.2, SuSE Linux Enterprise Server 9;
Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model BFi20 S2 (2 procs, 4 cores, 4 threads) 2x Intel Xeon 5160, 3.0 GHz, 16GB mem, 1,020 SD benchmark users, 1.94s avg resp time, Cert#2007031, Oracle 10g, Solaris 10;
Two-tier SAP ECC 6.0 Standard Sales and Distribution (SD) benchmark Fujitsu Siemens Computers PRIMERGY Model TX300 S3 (2 procs, 8 cores, 8 threads) 4x Quad-Core Intel Xeon Processor X5355 2.66 GHz, 32GB mem, 1865 SD benchmark users, 1.99s avg resp time, Cert#2007025, SQL Server 2005, Windows Server 2003 Enterprise Edition; SAP, R/3, mySAP reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark.
I edited in:
2 processors into Quad-Core Intel Xeon Processor X5355 2.66 GHz
...and..
32 threads to the Sun Fire T2000, 1 processor / 8 cores
...in order to make the comparisons more consistent.
Wednesday Aug 08, 2007
Can someone really use 64 threads in a chip? The answer is simple:
when you look out into your datacenter, do you see racks of servers
or just a single naked core sitting alone in the back corner?
If you see racks of servers, you are running lots and lots of threads.
Think of it this way: if you have a bunch of dual-core single-socket
1RU servers filling a rack, you have around 80 threads in a rack;
with 2-socket servers you have 160, and with quad-core 2-socket servers you have 320 threads.
Now how would you judge the performance of a single rack (with 80-320 threads)?
Would you run one copy of "gzip" or "tar", compare that to your laptop, and say that the rack is slow? Of course not. You'd run a whole bunch of them.
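The rack arithmetic above is simple enough to sketch in a few lines. The figure of 40 servers per rack is an assumption (it is what the "around 80 threads" number implies for dual-core single-socket 1RU boxes), and the function name is made up for illustration.

```python
SERVERS_PER_RACK = 40  # assumption: implied by "around 80 threads" for dual-core 1-socket 1RU servers


def hardware_threads(servers, sockets_per_server, cores_per_socket, threads_per_core=1):
    """Total hardware threads across a group of identical servers."""
    return servers * sockets_per_server * cores_per_socket * threads_per_core


if __name__ == "__main__":
    print(hardware_threads(SERVERS_PER_RACK, 1, 2))  # dual-core, single-socket rack
    print(hardware_threads(SERVERS_PER_RACK, 2, 2))  # dual-core, 2-socket rack
    print(hardware_threads(SERVERS_PER_RACK, 2, 4))  # quad-core, 2-socket rack
    print(hardware_threads(1, 1, 8, 8))              # one UltraSPARC T2: 8 cores x 8 threads
```

Seen this way, a single UltraSPARC T2 chip carries most of the thread count of a whole rack of dual-core single-socket servers, which is why it should be tested with a rack's worth of concurrent work.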
So when you are performance testing an UltraSPARC T1 or UltraSPARC T2
server, throw lots of work at it and it will have no problem. There
is massive parallelism in every datacenter with racks of servers, which is perfect for UltraSPARC T1/T2. Every datacenter with web tiers, application tiers,
and databases behind those tiers runs tons of threads. And remember that the
UltraSPARC T1, at introduction and even as recently as last week, continues to set leading performance records at every tier.
Intelligence test
Would you judge performance of an UltraSPARC T2 by running a single "gzip" or "tar"?
Wednesday Aug 08, 2007
Why does Sun designate yesterday's performance results as "estimates"?
Why that word? Did some Sun marketeer just throw a dart and pick a big number? No. All
UltraSPARC T2 SPEC CPU and SPEC OMP metrics quoted are from full “reportable” runs,
but are nevertheless designated as “estimates” because they use
pre-production systems. Sun customer systems, to be announced later, are expected to perform similarly. SPEC rules do allow comparing
these preliminary scores with published results.
Is Sun the only vendor to use this clause? No. Intel and AMD have
a long history of using preliminary numbers at chip announcements to get
the word out about their performance. Sun is just following their lead,
and trumping their performance.
Ok, back to why the word "estimates?" The SPEC CPU committee voted
to use that specific word for preliminary scores. Members include
IBM, Intel, AMD, HP, .... And every employee of a member company must follow the rules.
By license agreement, SPEC members and customers agree to run and report results as specified in each benchmark suite's documentation.
from SPEC FAQ
Postings on Sun's UltraSPARC T2 performance:
http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
OpenSPARC T2:
http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available
Ubuntu (already booted on UltraSPARC T2):
Ubuntu & Canonical & UltraSPARC T1 (May06).
As a Sun employee I try my best to follow every rule when talking about results in public, but I'm an engineer so sometimes it is hard to follow all the legalese so I try to correct things as soon as I see an error. And I do my best to remind other Sun bloggers to put in the proper disclosure statement for SPEC & TPC benchmark results. Though quite
honestly I wish SPEC & TPC would streamline the rules, make them more consistent, and minimize the lengthy disclosure statements.
Of course because Sun is in the lead and because I made some suggestions, I'm sure this entry will be fully scrutinized by every
competitor. If I made errors let me know in the comments and I will correct them.
Disclosure Statement
SPEC, SPECint, SPECfp, and SPEComp registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of August 6, 2007. Actually this one is short because I didn't put any
specific results in this posting, the ones at the links have the more extensive disclosures because they show scores & results.
Tuesday Aug 07, 2007
Beyond UltraSPARC T2, what other technologies matter? There are two more keys to Sun providing such effective performance in the
new single-chip Sun UltraSPARC T2 64-thread processor: Solaris (and
now of course OpenSolaris) and the Sun Studio compilers. Here is a nice slide of the hardware history of SPARC; I borrowed this one from
an entry in "On the Record".
An important thing to remember is
that besides Sun's long history with SPARC, we've also led the way in parallelism. Over 15 years ago, Solaris supported 64-way SPARC systems and
provided near-linear scaling. For those of you old enough to remember, at
that time IBM, SGI, HP, and everyone else thought there was no way Sun
could produce effective 64-way systems. They were wrong, and now our competitors have finally
all introduced systems with lots of processors and/or threads.
Solaris and Sun Studio compilers have a LONG history and lots of experience with industrial-strength applications with lots of threads.
Solaris and Sun Studio compilers were great at scaling to 64-way systems 15 years ago; with a lot more experience and hard work, we are even better at scaling now and will scale to lots more threads. Many thanks to all of those compiler & OS engineers!
Postings on Sun's UltraSPARC T2 performance:
http://blogs.sun.com/bmseer/entry/performance_of_the_new_sun
http://blogs.sun.com/bmseer/entry/ultrasparc_t2_more_floating_point
http://blogs.sun.com/sprack/entry/ultrasparc_t2_world_class_crypto
OpenSPARC T2:
http://blogs.sun.com/d/entry/ultrasparc_t2_documentation_available
...I've focused on Solaris, but there are options, for
example Ubuntu. Ubuntu has already booted on the UltraSPARC T2.
As a reminder, Ubuntu and Canonical proved it on an UltraSPARC T1 almost 14 months ago; see this article on that work.
Tuesday Aug 07, 2007
More about floating-point on the Sun UltraSPARC T2 in this posting. In
the previous posting, SPECfp_2006 scores and the open-sourcing of the UltraSPARC T2 design were discussed.
In the UltraSPARC T2 there are eight floating-point units that are well suited for scientific applications. Based upon preliminary runs the
Sun UltraSPARC T2 processor at 1.4 GHz beats all single chip scores
showing 14230(est)/15081(est) SPECompMbase2001/SPECompMpeak2001.
How do these preliminary runs (we must use the term "estimated" by SPEC rules) compare to published SPECompMbase2001/SPECompMpeak2001 scores?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
IBM p520 POWER5+ 1.9GHz processor published result by about 75% on base and about 85% on peak.
- ...Sun is waiting for POWER6 4.7GHz results, maybe UltraSPARC T2 results will scare IBM from ever publishing a single-chip result?
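The percentage advantage can be checked against the scores listed in the disclosure statement of this post. This is a minimal sketch; the helper name `percent_faster` is made up for illustration.

```python
def percent_faster(score, baseline):
    """Percentage by which `score` exceeds `baseline` (higher scores are better)."""
    return (score / baseline - 1.0) * 100.0


# Scores copied from the disclosure statement in this post.
T2_EST = {"base": 14230, "peak": 15081}   # SPECompMbase2001 / SPECompMpeak2001 (estimates)
IBM_P520 = {"base": 8141, "peak": 8174}   # published results

if __name__ == "__main__":
    for metric in ("base", "peak"):
        pct = percent_faster(T2_EST[metric], IBM_P520[metric])
        print(f"{metric}: {pct:.1f}% faster")
```

With these inputs the advantage is a little under 75% on base and a little under 85% on peak, so the 85% figure corresponds to the peak metric, rounded.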
Benchmark description:
The SpecOMP benchmark is a test of the performance of 9 High
Performance computing applications. It is used to compare the
performance of shared memory servers. All C/C++ and FORTRAN
applications in this suite use the OpenMP programming model that
provides a portable, scalable model for developing parallel
applications for platforms ranging from the desktop to the
supercomputer.
The OpenMP Application Program Interface (API) supports
multi-platform shared-memory parallel programming in C/C++ and Fortran
on all architectures, from the largest Unix servers to the small
Windows NT platforms.
Disclosure statement:
All UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable” runs,
but are nevertheless designated as “estimates” because they use preproduction
systems. SPEC, and SPEComp registered trademarks of Standard Performance
Evaluation Corporation.
Sun UltraSPARC T2 1.4GHz (1 chip, 8 cores, 64 threads) 14230 (est)/ 15081 (est) SPECompMbase2001/SPECompMpeak2001.
Competitive results from www.spec.org as of
August 6, 2007. IBM p520 1.9GHz (1 chip, 2 cores, 4 threads) published 8141/8174 SPECompMbase2001/SPECompMpeak2001.
Tuesday Aug 07, 2007
Sun UltraSPARC T2 is an amazing chip and very fast! The UltraSPARC T2 features several industry firsts:
- Eight cores and 64 threads
- Integrated 10 GbE networking and I/O
- Dedicated cryptographic and floating-point units per core
- 10 cryptographic functions supported with hardware
- open-source design: www.opensparc.net
Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz,
beat all single chip scores showing 78.3 est. SPECint_rate2006.
How do these preliminary runs (we must use the term "estimated" by
SPEC rules) compare to SPECint_rate2006 results?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
IBM POWER6 4.7GHz processor published result by 29%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
estimated scores of the AMD Barcelona by 23%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
published scores of the 2.66GHz Intel X5355 (Clovertown) by 48%.
Based upon preliminary runs, the Sun UltraSPARC T2 processor at 1.4 GHz,
beat all single chip scores showing 62.3 est. SPECfp_rate2006.
How do these preliminary runs (we must use the term "estimated" by
SPEC rules) compare to SPECfp_rate2006 results?
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best
published single-chip IBM POWER6 4.7GHz processor result by 7%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip estimated scores of the AMD Barcelona by 11%.
- These Sun UltraSPARC T2 1.4GHz processor scores beat the best single-chip
published scores of the 2.66GHz Intel X5355 (Clovertown) by 66%.
Performance per core doesn't matter and GHz doesn't matter; what matters
is the number of cores, efficiency, and design of the chip! Competitors
are saying that UltraSPARC T2 is proprietary... this makes no sense.
Both UltraSPARC T1 and UltraSPARC T2 are open-source designs (www.opensparc.net). You do not find the
latest designs of Intel, AMD, or IBM as open source.
Disclosure Statement:
All Sun UltraSPARC T2 SPEC CPU metrics quoted are from full “reportable”
runs, but are nevertheless designated as “estimates” because they use
preproduction systems. SPEC, SPECint, SPECfp registered trademarks of
Standard Performance Evaluation Corporation. Sun UltraSPARC T2
1.4GHz (1 chip, 8 cores, 64 threads) 78.3 est. SPECint_rate2006,
62.3 est. SPECfp_rate2006.
Competitive results from www.spec.org as of August 6, 2007.
IBM POWER6 4.7GHz (1 chip, 2 cores, 4 threads) 60.9 SPECint_rate2006,
58.0 SPECfp_rate2006.
AMD Barcelona 2.6 GHz (1 chip, 4 cores, 4 threads) 63.9 est SPECint_rate2006,
56.3 est. SPECfp_rate2006. Barcelona estimates based upon "The Register"
article stating 2.6GHz quad is 21% and 50% faster than Intel 2.66 system.
Fujitsu RX300 Intel X5355 2.66 GHz (1 chip, 4 cores, 4 threads) 52.8 SPECint_rate2006, 37.5 SPECfp_rate2006.
Reminder: The Niagara 2 score was obtained from a full "reportable" SPEC
run, but is designated as an "estimate" because a pre-production system
was used.
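The Barcelona integer estimate in the disclosure above comes from scaling a published Intel score by the percentage quoted in "The Register". A minimal sketch of that derivation follows; the function and variable names are hypothetical, and only the SPECint_rate2006 figures from the disclosure are used.

```python
def estimate_from_claim(published_score, percent_faster):
    """Scale a published score by a claimed percentage advantage."""
    return published_score * (1.0 + percent_faster / 100.0)


# Figures copied from the disclosure statement above.
INTEL_X5355_INT_RATE = 52.8      # published SPECint_rate2006
BARCELONA_INT_CLAIM_PCT = 21.0   # "21% faster" claim cited from The Register
T2_INT_RATE_EST = 78.3           # UltraSPARC T2 estimate

if __name__ == "__main__":
    barcelona_est = estimate_from_claim(INTEL_X5355_INT_RATE, BARCELONA_INT_CLAIM_PCT)
    t2_advantage = (T2_INT_RATE_EST / barcelona_est - 1.0) * 100.0
    print(f"Barcelona SPECint_rate2006 estimate: {barcelona_est:.1f}")
    print(f"UltraSPARC T2 advantage over that estimate: {t2_advantage:.1f}%")
```

An estimate built this way inherits any error in both the published baseline and the quoted percentage, which is why it is labeled "est" alongside the UltraSPARC T2 numbers.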
...more information on the UltraSPARC T2 later today.
Monday Aug 06, 2007
Many news sources are now covering UltraSPARC T2, the new high-performance chip from Sun.
This new UltraSPARC T2 chip leads in many ways. I'll cover the performance numbers tomorrow.
For now:
http://www.computerworld.com.au/index.php/id;898889798
http://www.reuters.com/article/technologyNews/idUSN0625780420070806
http://www.channelweb.co.uk/vnunet/news/2195718/sun-lifts-lid-niagara-processor
etc..
For some of my previous comments:
http://blogs.sun.com/bmseer/entry/news_trickles_out_on_niagara2
Please remember that the previous generation chip, the UltraSPARC T1,
just set an application-tier world record (all details at link). How many times has the "old" chip with half as many threads set a world record weeks before the new one is announced?
A final note. I venture that this chip is going to lead for database, application tier, and of course web tier, oh and don't forget HPC, yes it is that versatile.
I think we should take care with the general compa...