Cryptography Acceleration on UltraSPARC T2 Systems
Delivering on the same promise of Chip Multi-threading, Sun is now
offering an even more powerful chip, with more complete cryptographic
features coupled with on-chip I/O and integrated 2x10-Gbps networking:
a true System on Chip. The UltraSPARC T2, successor to the
UltraSPARC T1, combines the power of eight 8-way multithreaded cores,
offering 64 simultaneous processing threads, while capping power
consumption relative to comparable 8-core systems. More detailed
information about the UltraSPARC T2 specification can be viewed
here.
Zero Cost Cryptography
The UltraSPARC T1 processor offered 8 on-chip Modular Arithmetic Units (MAUs), one per core. Today, these 8 MAUs are used to offload and accelerate RSA/DSA operations without compromising the performance of regular core functionality. The RSA operation is an important component of the SSL full handshake.
The other part of a cryptographic workload is encryption, decryption and hashing of sensitive data. Each core of the UltraSPARC T2 processor contains a Streams Processing Unit (SPU), an offload engine for encryption, decryption and hash operations that can be used to offload DES, 3DES, AES-128, AES-192, AES-256, RC4, MD5, SHA1 and SHA256. It also offers ECCp-160 and ECCb-163, used in public key exchange.
In addition to the encryption/decryption/hash functionality, the UltraSPARC T2 processor has an on-chip Random Number Generator, which cryptographic applications normally use as a source of entropy.
- The eight MAUs, one per core, are driven by the Niagara Crypto Provider (NCP) device driver in the Solaris 10 OS on both UltraSPARC T1 and UltraSPARC T2 processors. NCP supports hardware-assisted acceleration of RSA and DSA cryptographic operations. NCP is responsible for load balancing across the MAUs, which reduces programming complexity. NCP is pre-configured and enabled in the Solaris Cryptographic Framework on both T1 and T2 systems.
- The eight SPUs, one per core on the UltraSPARC T2 processor, are driven by the Niagara2 Crypto Provider (N2CP) device driver. N2CP supports hardware-assisted acceleration of DES, 3DES, AES, RC4, SHA1, SHA256, MD5, ECC and CRC32. N2CP is responsible for load balancing across the SPUs, which reduces programming complexity. N2CP is pre-configured and enabled in the Solaris Cryptographic Framework on T2 systems.
- The Random Number Generator unit is driven by the Niagara2 Random Number Generator (N2RNG) driver in the Solaris 10 OS.
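The division of labour between the three drivers can be written down as a small lookup table. This is only a reading aid built from the lists above, not an interface of the Solaris Cryptographic Framework; the lowercase provider keys, the "RANDOM" label and the function name provider_for are names made up for this sketch.

```python
# Algorithm families per driver, copied from the list above.
# The dictionary keys and the "RANDOM" label are illustrative names only.
PROVIDER_ALGORITHMS = {
    "ncp": {"RSA", "DSA"},
    "n2cp": {"DES", "3DES", "AES", "RC4", "SHA1", "SHA256", "MD5", "ECC", "CRC32"},
    "n2rng": {"RANDOM"},
}


def provider_for(algorithm):
    """Return the driver that handles an algorithm family, or None."""
    name = algorithm.upper()
    for provider, algorithms in PROVIDER_ALGORITHMS.items():
        if name in algorithms:
            return provider
    return None


if __name__ == "__main__":
    for algorithm in ("RSA", "AES", "SHA256", "SHA512"):
        print(algorithm, "->", provider_for(algorithm))
```

On an UltraSPARC T1 system only the "ncp" row applies, since the SPUs and the random number generator are T2 features.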
How to Configure
Just as on UltraSPARC T1, a PKCS#11 compliant cryptographic application on UltraSPARC T2 can take advantage of the new cryptographic features via the Solaris Cryptographic Framework. N2CP, NCP and N2RNG are pre-configured and enabled by default on US T2 based systems. Secure applications such as Apache, SJS Web Server and Java applications need to be configured to use the PKCS#11 route if their default route is not PKCS#11. The instructions given in my earlier blog for UltraSPARC T1 still apply on UltraSPARC T2, except that new mechanisms can be added to offload encryption, decryption and hashing operations. For example, on US T1, to configure SJS Web Server NSS, we have:
$ modutil -dbdir . -nocertdb -add "Solaris Cryptographic Framework" \
    -libfile /usr/lib/libpkcs11.so -mechanisms RSA
On US T2, more mechanisms can be listed, for example:
$ modutil -dbdir . -nocertdb -add "Solaris Cryptographic Framework" \
    -libfile /usr/lib/libpkcs11.so -mechanisms RSA:AES:MD5
For Apache, the corresponding mod_ssl directive is:
SSLCryptoDevice pkcs11
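The two modutil invocations above differ only in the colon-separated -mechanisms list. As a rough sketch, a small helper can assemble that command line from a list of mechanism names. The function names modutil_add_command and as_shell are invented for this example, the defaults simply mirror the values used above, and nothing here runs modutil; it only builds the argument list.

```python
import shlex


def modutil_add_command(mechanisms,
                        dbdir=".",
                        module_name="Solaris Cryptographic Framework",
                        libfile="/usr/lib/libpkcs11.so"):
    """Build the modutil argument list for a list of mechanism names."""
    if not mechanisms:
        raise ValueError("at least one mechanism is required")
    return [
        "modutil", "-dbdir", dbdir, "-nocertdb",
        "-add", module_name,
        "-libfile", libfile,
        "-mechanisms", ":".join(mechanisms),  # modutil expects a colon-separated list
    ]


def as_shell(argv):
    """Render an argument list as a copy-and-paste shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    print(as_shell(modutil_add_command(["RSA"])))                # T1-style
    print(as_shell(modutil_add_command(["RSA", "AES", "MD5"])))  # T2-style
```

Keeping the command as an argument list rather than a single string avoids quoting problems with the module name, which contains spaces.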
The complex mechanisms CKM_DSA_SHA1, CKM_MD5_RSA_PKCS, CKM_SHA1_RSA_PKCS, CKM_SHA256_RSA_PKCS, CKM_SHA384_RSA_PKCS and CKM_SHA512_RSA_PKCS still can't be offloaded, although the UltraSPARC T2 processor supports each SHA, MD5, DSA and RSA algorithm individually. To offload these complex mechanisms, JSSE or Metaslot has to break them into simple ones that the pkcs11_kernel.so provider can handle individually, because there is no single provider that aggregates the NCP and N2CP capabilities. This issue is being tracked and worked on in CR #6337157.
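To make "breaking a complex mechanism into simple ones" concrete, here is a minimal sketch of how a CKM_SHA1_RSA_PKCS signature decomposes into a hash step (what CKM_SHA_1 computes) followed by an RSA step (what CKM_RSA_PKCS computes on a DigestInfo). It is plain Python arithmetic for illustration, not the way JSSE, Metaslot or the drivers implement it, and the function names are invented for this example. The DER prefix is the standard PKCS #1 v1.5 DigestInfo header for SHA-1.

```python
import hashlib

# DER header of a DigestInfo structure for SHA-1 (PKCS #1 v1.5).
SHA1_DIGEST_INFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def digest_step(message):
    """Simple mechanism 1: hash only (the part a hash engine could take)."""
    return hashlib.sha1(message).digest()


def encode_digest_info(digest):
    """Glue between the two steps: wrap the SHA-1 digest in a DigestInfo."""
    if len(digest) != 20:
        raise ValueError("expected a 20-byte SHA-1 digest")
    return SHA1_DIGEST_INFO_PREFIX + digest


def rsa_pkcs_step(digest_info, d, n):
    """Simple mechanism 2: PKCS #1 v1.5 padding, then modular exponentiation."""
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    if len(digest_info) > k - 11:
        raise ValueError("modulus too small for this DigestInfo")
    padded = b"\x00\x01" + b"\xff" * (k - len(digest_info) - 3) + b"\x00" + digest_info
    signature = pow(int.from_bytes(padded, "big"), d, n)
    return signature.to_bytes(k, "big")


def sha1_rsa_pkcs_sign(message, d, n):
    """The complex mechanism, expressed as the two simple steps in sequence."""
    return rsa_pkcs_step(encode_digest_info(digest_step(message)), d, n)
```

In this picture the first function is the kind of work N2CP accelerates and the modular exponentiation in the second is the kind of work NCP accelerates, which is why a splitter sitting above the two providers is needed.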
A blueprint article with step-by-step instructions on how to take full advantage of the US T2 hardware cryptographic features is currently in the works. I will update this blog with a link as soon as it's ready. If you have specific questions, please post them here and I will try to answer them.
Kernel SSL Proxy
Traditionally, applications running on systems equipped with cryptographic accelerators pay the cost of moving data from user land to the kernel and back to user land for encryption, decryption and hashing. If the application is a secure network application such as a secure web server, it should be possible to complete the cryptographic operation in the kernel itself and let the application handle clear text for both inbound and outbound data. This saves the CPU cycles spent moving data from user land to the kernel and back. The behaviour is analogous to SSL proxy devices, where the cryptographic operations terminate outside the system, except that here the cryptographic operation completes in the kernel of the same system where the application is running, thereby preserving end-to-end security. Letting the kernel complete the cryptographic operations makes the kernel act as an SSL proxy. This technique is at the heart of the Kernel Cryptographic Solution (KSSL) offered with the Solaris 10 OS.
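From the application's point of view, sitting behind a kernel SSL proxy means serving clear text on a local port while the kernel owns the SSL port. The sketch below shows only that application side: a back end with no SSL code at all, plus a tiny client to poke it. The handler class, the helper names and the loopback address are choices made for this example, and the KSSL configuration that would map an SSL port onto this clear-text port is not shown.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ClearTextHandler(BaseHTTPRequestHandler):
    """Serves plain HTTP; the application contains no SSL code."""

    def do_GET(self):
        body = b"hello from the clear-text back end\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_backend(host="127.0.0.1", port=0):
    """Start the back end in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), ClearTextHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, host="127.0.0.1"):
    """Issue one clear-text GET, as the proxy would after decrypting."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_backend()
    print(fetch(server.server_address[1]))
    server.shutdown()
    server.server_close()
```

Binding the back end to the loopback address keeps the clear-text port unreachable from the network, so only traffic that has passed through the proxy on the same host can reach it.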
An introduction to the Kernel Cryptographic Solution (KSSL) is provided in my previous blog post.
On UltraSPARC T1 and UltraSPARC T2 based systems, KSSL can offload the RSA/DSA operation to the MAU and thus avoid data movement overhead from User Land to Kernel and back.
On UltraSPARC T2 based systems, in addition to offloading the RSA operation, KSSL also offloads some commonly used encryption algorithms such as RC4, DES and 3DES, along with the HMAC (SHA1_HMAC, MD5_HMAC) computation for outbound packets.
A perfect match for datacenters
US T2 based systems provide 2x10-Gbps NICs onboard in addition to the 4 onboard gigabit ports. As with on-chip cryptographic acceleration, these on-chip 10-Gbps NICs do not suffer from the bus bandwidth limitations of PCI-X or PCI-Express devices. US T1 and T2 based systems both allow additional PCI-Express devices to be added for further I/O requirements such as disks or additional 10-Gbps cards. Both also offer virtualization techniques such as Zones and LDOMs. The on-chip cryptographic engine and the on-chip 10-Gbps interfaces can be virtualized or partitioned and are accessible even from a virtualized environment.
Packaging all these features in just 1U or 2U systems makes US T2 based systems ideal for datacenters looking for expansion and/or consolidation, with considerable savings in power and cooling. These systems can be used in any deployment scenario in the datacenter, from the edge tier to the backend database tier.
Comments?
Related Links
Sun Fire T2000 and Secure Applications
Ultra-Fast Cryptography on the Sun UltraSPARC T2
Why on-chip accelerators?
UltraSPARC T2 - World's True System on a Chip
UltraSPARC T2 Supplement to the UltraSPARC architecture 2007
Using the n2cp0 provider with hardware-based hashing algorithms causes SSL handshake errors. When I disabled the hashing algorithms the SSL handshake worked and I could see incremental counts for both RSAPRIVATE (NCP) and AES (N2CP) drivers. Any help on how to make the hardware hashing algorithms work with Apache using the PKCS#11 library and openssl version 0.9.7d bundled with S10U4 ?
Posted by Jeroen on December 17, 2007 at 12:43 AM PST #
Hi Jeroen,
Which version of Solaris do you have? There is a bug for this exact problem: http://bugs.opensolaris.org/view_bug.do?bug_id=6606361
It's been fixed in OpenSolaris, but there is no patch yet for the Solaris 10 updates. So if you want the fix now, you will need to switch to the OpenSolaris community version; the DVD image is at http://opensolaris.org/os/downloads/sol_ex_dvd/. If you are already on an older version of OpenSolaris, you can get the fix by doing a "bfu" to the following archive: http://dlc.sun.com/osol/on/downloads/current/on-bfu-nightly-osol-nd.sparc.tar.bz2 Instructions on how to do a bfu can be found at http://opensolaris.org/os/community/on/devref_toc/devref_5/ (please read section 5.3). Remember that you need to have OpenSolaris installed before you can do this bfu.
Thanks,
-ning
Posted by Ning Sun on December 21, 2007 at 11:43 AM PST #
Hi Ning;
We're using httpclient in java to connect to an SSL web service. We're also using NCP and have CKM_MD5_RSA_PKCS (PKCS #1 MD5 With RSA Encryption) disabled in sunpkcs11-solaris.cfg.
When we attempt to open the connection we're getting the following error:
java.lang.NoClassDefFoundError
at javax.crypto.Cipher.getInstance(DashoA12275)
at com.sun.net.ssl.internal.ssl.JsseJce.getCipher(JsseJce.java:90)
at com.sun.net.ssl.internal.ssl.RSACipher.<init>(RSACipher.java:35)
at com.sun.net.ssl.internal.ssl.RSACipher.getInstance(RSACipher.java:69)
I think it's interesting that 1) the server certificate is CKM_MD5_RSA_PKCS and 2) the code works under windows.
In your opinion is this failure due to using the pkcs11 provider in Java and having CKM_MD5_RSA_PKCS in the disabled list? I have found a few others having this same issue but no open bug reports to explain it.
Posted by Nathan Wray on February 13, 2008 at 08:47 AM PST #
Hi Ning,
In your document: "USING THE CRYPTOGRAPHIC ACCELERATORS IN THE ULTRASPARC T1 AND T2 PROCESSORS" why is Prefork mode recommended for Apache 2.2.x web server versus Worker mode (threaded)?
thank you,
Emiro
Posted by Emiro Uribe on August 21, 2009 at 08:19 AM PDT #
@Jeroen and @Ning: to get a patch for a bug in a Solaris 10 update release, one has to raise a support call and have Sun Services file an escalation. The patches for this particular bug (CR 6606361) were released in the S10u5 time frame.
Posted by Vladimir Kotal on August 25, 2009 at 04:05 AM PDT #
@Emiro: The 819-5782 blueprint does not seem to mention the MPM-worker mode at all. I think this is because the Apache shipped with Solaris 10 by default only supports the prefork mode. In general, the MPM-worker mode could be beneficial for SSL processing with crypto offloaded to the MAUs on Niagara CPUs (depending on workload). AFAIK, an officially supported Apache with the worker mode module is available via Webstack: http://www.sun.com/systems/solutions/amp/index.jsp
Posted by Vladimir Kotal on August 25, 2009 at 04:33 AM PDT #