BM Seer Facts & Questions from an Anonymous Sun Source

Ultra-FAST Cryptography on the Sun UltraSPARC T2

Tuesday Oct 09, 2007

The UltraSPARC T2 processor has very low-overhead cryptography that basically allows one to add security at 'zero-cost'. A single Sun UltraSPARC T2 processor achieves up to 37,000 RSA 1024-bit signs/s and up to 38.9 Gbit/s of AES-128 throughput.

The comparisons below demonstrate the performance of a single 1.4 GHz UltraSPARC T2 on RSA1024 (private-key sign) and AES128-CBC operations:

  • The UltraSPARC T2 delivers over 4.1 times greater RSA1024 performance and 4.6 times greater AES128 performance than the 2-way quad-core 3 GHz Xeon.
  • The UltraSPARC T2 delivers over 9.3 times greater RSA1024 performance and 10 times greater AES128 performance than the 2-way dual-core 2.6 GHz Opteron.
  • The UltraSPARC T2 also delivers over 3 times greater RSA1024 performance and 15.6 times greater AES128 performance than a system using the Cavium Nitrox PX crypto accelerator card.
  • The UltraSPARC T2 delivers over 30.8 times greater RSA1024 performance than the 2-way IBM p510 1.5 GHz Power5.
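
The ratios quoted above can be checked against the figures in the Competitive Landscape table. Here is a minimal Python sketch, assuming the rounded table values (K signs/s and Gb/s) as inputs. The dictionary keys and the `speedup` helper are names made up for illustration, and because the inputs are rounded, the results may differ from the quoted figures in the last digit.

```python
# Rounded figures from the benchmark table: (RSA1024 K signs/s, AES128 Gb/s).
# None means no AES figure is listed for that platform.
RESULTS = {
    "UltraSPARC T2 1.4 GHz": (37.0, 38.9),
    "2-way quad-core Xeon 3 GHz": (9.0, 8.4),
    "2-way dual-core Opteron 2.6 GHz": (4.0, 3.9),
    "Cavium Nitrox PX": (12.0, 2.5),
    "IBM p510 Power5 1.5 GHz": (1.2, None),
}

BASELINE = "UltraSPARC T2 1.4 GHz"


def speedup(baseline, other):
    """Return (RSA ratio, AES ratio) of baseline over other; AES is None if unlisted."""
    b_rsa, b_aes = RESULTS[baseline]
    o_rsa, o_aes = RESULTS[other]
    aes = None if b_aes is None or o_aes is None else b_aes / o_aes
    return b_rsa / o_rsa, aes


for name in RESULTS:
    if name == BASELINE:
        continue
    rsa, aes = speedup(BASELINE, name)
    aes_text = "n/a" if aes is None else f"{aes:.1f}x"
    print(f"{name}: RSA {rsa:.1f}x, AES {aes_text}")
```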

To achieve these results, the UltraSPARC T2 processor has an on-chip cryptographic accelerator (SPU) that consists of a cipher/hash unit and an enhanced modular arithmetic unit (MAU). This is an evolution of the previous-generation UltraSPARC T1, which contained only modular arithmetic units.

Sun's UltraSPARC T2 processor introduces support for common bulk ciphers, secure hash operations, and both prime and binary field Elliptic Curve Cryptography. The UltraSPARC T2 processor supports RC4, DES, 3DES, AES-128, AES-192, AES-256, MD5, SHA-1 and SHA-256.

Competitive Landscape

RSA/AES Cryptography Benchmark Performance as of 8/07/07 as measured by Sun on the following platforms.

System | Processor | GHz | Chips | Total cores | Operating System | 1024-bit RSA (K signs/s) | AES128 (Gbit/s) | Notes
--- | --- | --- | --- | --- | --- | --- | --- | ---
Sun SPARC Enterprise T5220 | UltraSPARC T2 | 1.4 | 1 | 8 | Solaris 10 | 37.0 | 38.9 | actual
Accelerator card | Sun SCA6000 | | | | | 13.0 | 1.0 | actual
Sun Fire T2000 | UltraSPARC T1 | 1.2 | 1 | 8 | Solaris 10 | 12.9 | | actual
Accelerator card | Cavium Nitrox PX | | | | | 12.0 | 2.5 | datasheet
Sun Fire T1000 | UltraSPARC T1 | 1 | 1 | 8 | Solaris 10 | 10.8 | | actual
 | quad-core Xeon | 3 | 2 | 8 | | 9.0 | 8.4 | actual
Sun Fire V490* | US IV+ | 1.5 | 4 | 8 | Solaris 10 | 8.0 | | actual
IBM p690 | Power4 | 1.3 | 16 | 32 | AIX 5.1 | 6.1 | | actual
Fujitsu PP850 | SPARC64 V | 1.9 | 16 | 16 | Solaris 10 | 6.0 | | actual
 | Opteron | 2.6 | 2 | 4 | | 4.0 | 3.9 | actual
Sun Fire V40z | Opteron sc | 2.6 | 4 | 4 | Solaris 10 | 3.3 | | actual
Dell PE 1850 | Xeon | 3.6 | 2 | 2 | Linux RHEL4 U1 | 1.9 | | actual
Dell PE 2850 | Xeon | 3.6 | 2 | 2 | Linux SLES 9 | 1.9 | | actual
IBM p510 | Power5 | 1.5 | 1 | 2 | AIX 5.3 | 1.2 | | actual

* Used a Sun Crypto Accelerator (SCA) 4000 in the Sun Fire V490 testing.

Benchmark Description

The RSA/AES-128 Cryptography benchmark was developed by Sun to measure the maximum throughput of RSA private key (sign) operations and AES-128 operations that a system can perform. On multi-chip and/or multi-core systems, multiple processes are used to achieve the maximum throughput. Two sets of microbenchmark programs are used: pk11rsaperf/pk11aesperf on Solaris, and the OpenSSL speed test on non-Solaris systems. Though the two use different crypto APIs, they both measure the raw throughput of the same crypto operations.

  • pk11rsaperf & pk11aesperf are part of a set of cryptographic microbenchmark programs internally developed by the Crypto Product Group of NSN. pk11aesperf measures the performance of AES-128-CBC processing, as performed by the Solaris Cryptographic Framework via the PKCS#11 API. Different key sizes, data sizes and varying numbers of concurrent threads can be tested. The metric is aggregate operations per second for pk11rsaperf, and Gb/s for pk11aesperf (for large object sizes).

  • OpenSSL speed test, the standard microbenchmark included in the open-source OpenSSL package, measures raw cryptographic algorithm performance as implemented in the OpenSSL library (libcrypto.so) via its own crypto APIs. For RSA the metric is operations per second, while for AES-128-CBC the metric is Gb/s.
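
As a rough illustration of the two metrics described above, the following Python sketch times a stand-in for the RSA private-key operation and shows the unit conversions. It is not pk11rsaperf or the OpenSSL speed test; it only uses Python's built-in modular exponentiation, and the function names, the toy modulus and exponent, and the worker counts are assumptions made for the example.

```python
import time


def ops_per_second(op, duration=0.2):
    """Call op() repeatedly for about `duration` seconds; return calls per second."""
    count = 0
    start = time.perf_counter()
    while True:
        op()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            return count / elapsed


def aggregate_ops(per_worker_rates):
    """Aggregate metric: sum of the rates reported by each process or thread."""
    return sum(per_worker_rates)


def gbits_per_second(bytes_processed, seconds):
    """Bulk-cipher metric: bytes processed in `seconds`, expressed in Gb/s."""
    return bytes_processed * 8 / seconds / 1e9


# Simplified stand-in for an RSA-1024 sign: one modular exponentiation
# with a roughly 1024-bit modulus and exponent. These are arbitrary
# numbers chosen for the example, not a real RSA key.
N = (1 << 1023) | 1
D = (1 << 1023) - 1
MESSAGE = 0x1234567890ABCDEF

rate = ops_per_second(lambda: pow(MESSAGE, D, N))
print(f"single worker: {rate:.0f} modexp/s")

# Hypothetical: eight workers that each report the same rate.
print(f"aggregate over 8 workers: {aggregate_ops([rate] * 8):.0f} modexp/s")

# Hypothetical: 1 GiB encrypted in 2 seconds.
print(f"{gbits_per_second(1 << 30, 2.0):.2f} Gb/s")
```

The absolute numbers this prints say nothing about any of the systems in the table; the point is only how a per-worker rate becomes an aggregate rate, and how bytes and seconds become Gb/s.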

Disclosure Statement:

RSA/AES Cryptography Benchmark Performance as of 08/07/07 as measured by Sun on the following platforms: Sun SPARC Enterprise T5220 37K RSA1024 signs/s, 38.9 AES128 Gb/s; Sun SCA6000 (actual) 13K RSA1024 signs/s, 1 AES128 Gb/s; Cavium Nitrox PX (datasheet) 12K RSA1024 signs/s, 2.5 AES128 Gb/s; 2-chip quad-core Xeon 3GHz 9K RSA1024 signs/s, 8.4 AES128 Gb/s; 2-chip dual-core Opteron 2.6GHz 4K RSA1024 signs/s, 3.9 AES128 Gb/s; Sun Fire T2000 1.2 GHz (8 cores, 1 chip) Solaris 10, 12,850 RSA1024 signs/s; Sun Fire T1000 1GHz (8 cores, 1 chip) Solaris 10, 10,764 RSA1024 signs/s; IBM p690 1.3 GHz (32 cores, 16 chips) AIX 5.1, 6,131 RSA1024 signs/s; Fujitsu PRIMEPOWER850 1.9 GHz (16 cores, 16 chips) Solaris 10, 6,038 RSA1024 signs/s; Dell PowerEdge 1850 3.6 GHz (2 cores, 2 chips) RHEL4 U1, 1,926 RSA1024 signs/s; Dell PowerEdge 2850 3.6 GHz (2 cores, 2 chips) SLES 9, 1,900 RSA1024 signs/s; IBM p5 510 1.5 GHz (2 cores, 1 chip, SMT) AIX 5.3, 1,200 RSA1024 signs/s.

Results Summary

  • Results: 37.0 K RSA1024 signs/s; 38.9 Gb/s AES128
  • Reference Date: August 7, 2007
  • Systems: Sun SPARC Enterprise T5120/T5220
  • Total Number Processors: 1 chip / 8 cores/chip (8 threads/core)
  • Processor/GHz of Server: Sun UltraSPARC T2 1.4 GHz
  • Operating System: Solaris 10


UltraSPARC T2 and its NIU that'll be good for you

Tuesday Oct 09, 2007

This summer we announced the UltraSPARC T2 chip, but one of the things we didn't talk about much was the US T2's NIU. So let's look at some of the delivered results.

By the by, you'll see a lot more performance results on this blog today. Yep, it's launch day. While many of my colleagues are at CEC bellying up to the buffets and dropping their money at the tables, some of us are at home working to show you the latest :)

The UltraSPARC T2 has an integrated NIU (10GbE Network Interface Unit, the 10GbE is silent :) ) which provides better performance and reduces the CPU overhead of network traffic when compared to servers that must use NICs (network interface cards). The UltraSPARC T2's NIU has much lower latency, which reduces CPU overhead.

  • 10GbE transmit: maximum throughput is 36% higher and CPU efficiency is 23% better
  • 10GbE receive: maximum throughput is almost twice as high, exceeding x8 bus bandwidth by 16%

The UltraSPARC T2 with NIU has the following measured results: TX 14.6 Gb/s; RX 18.2 Gb/s. In contrast, the Atlas NIC has the following measured results: TX 10.7 Gb/s; RX 9.4 Gb/s.
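
The percentages above follow from the measured figures. Here is a small Python sketch of the arithmetic, using the numbers quoted in this post. The variable names are illustrative, and the implied x8 bus bandwidth is derived here from the 16% claim rather than stated in the post.

```python
# Measured throughput in Gb/s, as quoted above.
NIU = {"tx": 14.6, "rx": 18.2}
ATLAS_NIC = {"tx": 10.7, "rx": 9.4}


def percent_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


tx_gain = percent_gain(NIU["tx"], ATLAS_NIC["tx"])
rx_ratio = NIU["rx"] / ATLAS_NIC["rx"]

# Derived, not stated in the post: the bus bandwidth implied by
# "exceeding x8 bus bandwidth by 16%".
implied_x8_bus = NIU["rx"] / 1.16

print(f"TX gain: {tx_gain:.0f}%")
print(f"RX ratio: {rx_ratio:.2f}x")
print(f"implied x8 bus bandwidth: {implied_x8_bus:.1f} Gb/s")
```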

All performance tests were run by Sun and of course used Solaris 10.

... but what about standard benchmarks? My advice is to either get this blog in your RSS reader or check back every hour, as "happy days are here again".
