Optimizing OpenSolaris With Open Source
OpenSolaris uses common source to implement both userland libraries (pkcs11_softtoken.so for cryptography and libmd.so for hash algorithms) and kernel modules (the /kernel/crypto/amd64/* modules, accessible through pkcs11_kernel.so), so all these optimizations apply to both userland and kernel. Replacing C with assembly wasn't a straight drop-in process, mainly because the OpenSolaris implementation differs from the open source code I used. The main differences are in the function definitions and in the data structures for keys and context (state).
Availability
All these optimizations are available in OpenSolaris 2008.11 and Solaris Nevada build 93 or later. Except for AES and SHA2, I backported all optimizations to Solaris 10 10/08 (aka U6). I backported the AES optimization into the next Solaris 10 update, which should be released in 2009.
[Figure: ARCFOUR optimization shown by openssl speed]
Performance Gain
The chart shows ARCFOUR performance gains with AMD64 2.2GHz and Intel EM64T 2.1GHz processors running OpenSolaris.
ARCFOUR shows a large gain of 2x-4x with amd64 assembly over C.
All performance numbers shown here use the same two systems with these two processors.
I show the gain from running the benchmark
/usr/sfw/bin/amd64/openssl speed -evp rc4 -elapsed -engine pkcs11
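For orientation, here is a minimal plain-Python sketch of the ARCFOUR algorithm itself. It is not the OpenSolaris C code or the amd64 assembly, only the textbook algorithm, and the function name `arcfour` is made up for this illustration. It shows the inner loop that any optimized version has to speed up: one state swap and one table lookup per byte of data.

```python
def arcfour(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with ARCFOUR (the operation is symmetric)."""
    # Key scheduling: permute the 256-byte state under control of the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: one swap and one table lookup per output byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


ciphertext = arcfour(b"Key", b"Plaintext")
# Applying the same key again recovers the plaintext.
assert arcfour(b"Key", ciphertext) == b"Plaintext"
```

Because each output byte depends on the state left behind by the previous one, the loop is inherently serial, which is one reason hand-scheduled assembly can matter here.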
Running SPECweb2005-banking shows these improvements with ARCFOUR and MD5 optimization:
[Photo: Marc Bevand]
[Figure: MD5 optimization shown by openssl speed]
By the way, Bevand also wrote software that runs on the Sony PlayStation 3 to crack UNIX crypt passwords by brute force. (The crypt algorithm is used by default on Solaris, but not on OpenSolaris 2008.11. If you have CRYPT_DEFAULT=__unix__ set in /etc/security/policy.conf, you have this vulnerability.) The software takes advantage of the 128-bit-wide PS3 processor by parallelizing boolean operations across each of the 128 bits on each of 7 cores. On average, a UNIX crypt password can be cracked in 70 days with one PS3 at a cost of $1100 of electricity (at $0.10/kWh). Multiple PS3s will crack a password faster. But let's get back to our topic, MD5 . . .
Performance Gain
I show the gain in MD5 performance by running the benchmark
/usr/sfw/bin/amd64/openssl speed -evp md5 -elapsed -engine pkcs11
The gain increases with data size.
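A rough way to see how throughput depends on data size is to time a hash at several block sizes, much as openssl speed does. The sketch below uses Python's hashlib rather than the pkcs11 engine, so its absolute numbers say nothing about the OpenSolaris results. The function name `md5_throughput` and the 0.2 second measurement window are arbitrary choices for illustration, and the block sizes mirror those that openssl speed typically reports.

```python
import hashlib
import time


def md5_throughput(block_size: int, duration: float = 0.2) -> float:
    """Return MD5 throughput in bytes/second for one block size."""
    block = b"\x00" * block_size
    processed = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        hashlib.md5(block).digest()  # one complete hash per block
        processed += block_size
    elapsed = time.perf_counter() - start
    return processed / elapsed


if __name__ == "__main__":
    for size in (16, 64, 256, 1024, 8192):
        rate = md5_throughput(size)
        print(f"{size:5d} bytes: {rate / 1e6:8.1f} MB/s")
```

With small blocks, fixed per-call costs (setup, padding, finalization) take up most of the time, so a faster inner loop has less room to show. That is one plausible reason, not a measured one, for the gain growing with data size.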
SHA1 and the SHA2 family of hash algorithms are NIST standards. The older SHA1 standard should be avoided in favor of SHA2 because of weaknesses recently found in the algorithm. NIST is working on a replacement for SHA1 and SHA2 (called AHS). Each generation of SHA hashes has a more complex algorithm and a longer hash result, as shown here:
$ digest -a md5 osol-0811-rc2-ai.iso
b9e8a553b310150d56a4269c651d6bc4
$ digest -a sha1 osol-0811-rc2-ai.iso
27f031fdd594ade74eeefda0afc428a75d1fa13e
$ digest -a sha256 osol-0811-rc2-ai.iso
fa19f36503aeaf9392a76946b00f5f657cc88cf6f456378f74d06b99ada35e6e
$ digest -a sha384 osol-0811-rc2-ai.iso
1572843b6802dc98a2a6e46fbe99e9547a4895ce4ca11fbde8e1129cb67d887e39ac67ae4981eefeccd8395e477ea718
$ digest -a sha512 osol-0811-rc2-ai.iso
126687444951c68129b253ddc752f9a9c476fd1690bcf79405ee94d89ae2ecec3d90cd5fb8797b4cd6b2c286e1b000469cb03b10ca5c7d876ab3ceb76e019526
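The same comparison can be sketched with Python's hashlib on systems without the Solaris digest(1) command. The helper names `file_digest` and `digest_bits` are made up for this illustration. The first streams a file through a hash in chunks, and the second reports the result length in bits for each algorithm.

```python
import hashlib


def file_digest(path: str, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def digest_bits(algorithm: str) -> int:
    """Length of the hash result in bits."""
    return hashlib.new(algorithm).digest_size * 8


for name in ("md5", "sha1", "sha256", "sha384", "sha512"):
    print(f"{name:7s} {digest_bits(name):4d} bits")
```

Each hex character encodes 4 bits, so the 32-character MD5 result above is 128 bits and the 128-character SHA512 result is 512 bits.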
For SHA1 and SHA2 I used OpenSSL's hand-coded amd64 assembly written by Andy Polyakov (who also wrote the ARCFOUR assembly, above).
[Figure: SHA2 optimization shown by microbenchmarks]
[Figure: SHA1 optimization shown by openssl speed]
[Figure: AES encryption flash animation]
At first I replaced the OpenSolaris C implementation of AES with the OpenSSL assembly implementation, but to my surprise the assembly version performed about the same as the C version compiled with the Sun Studio C compiler and "cc -O" optimization. In fairness, the OpenSSL implementation makes a performance gain by tightly integrating the code for AES and CBC feedback mode. However, with AES alone or AES with other feedback modes, the assembly and C implementations perform about the same.
Next, I tried Dr. Brian Gladman's AES implementation and found it was faster than both the C and the OpenSSL assembly implementations, so I used Gladman's assembly source. That source is written in YASM-style assembly and macro syntax, so I translated it to Solaris assembly language and cpp-style #define/#ifdef macros.
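To make "CBC feedback mode" concrete, here is a small sketch of the chaining itself. Python's standard library has no AES, so the block function below is a deliberately trivial, insecure stand-in (`toy_encrypt_block` and `toy_decrypt_block` are hypothetical names), and only the CBC structure around it is the point. Each plaintext block is XORed with the previous ciphertext block before it is encrypted. That per-block XOR and hand-off is presumably the kind of work an integrated AES-CBC routine can fold into the cipher loop.

```python
BLOCK = 16  # AES block size in bytes


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # Hypothetical stand-in for AES: rotate left one byte, XOR with key.
    # NOT secure; it only gives CBC something invertible to wrap.
    rotated = block[1:] + block[:1]
    return xor_bytes(rotated, key)


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    unmasked = xor_bytes(block, key)
    return unmasked[-1:] + unmasked[:-1]  # rotate right one byte


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    previous, out = iv, bytearray()
    for offset in range(0, len(plaintext), BLOCK):
        block = plaintext[offset:offset + BLOCK]
        # Feedback step: mix in the previous ciphertext block first.
        previous = toy_encrypt_block(xor_bytes(block, previous), key)
        out += previous
    return bytes(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, bytearray()
    for offset in range(0, len(ciphertext), BLOCK):
        block = ciphertext[offset:offset + BLOCK]
        out += xor_bytes(toy_decrypt_block(block, key), previous)
        previous = block
    return bytes(out)


key, iv = bytes(range(16)), bytes(16)
message = b"A" * 32  # two identical plaintext blocks
encrypted = cbc_encrypt(message, key, iv)
assert cbc_decrypt(encrypted, key, iv) == message
```

Because of the feedback, the two identical plaintext blocks in the example encrypt to different ciphertext blocks.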
[Figure: IPsec optimization shown by netperf]
[Figure: AES optimization shown by the encrypt(1) command (lower is better)]
[Figure: AES-128 optimization shown by openssl speed]
AES is used by IPsec, and the optimization has improved IPsec throughput. Dan McDonald ran the netperf benchmark with one pair of connected e1000g Ethernet ports on two Galaxy systems. Throughput on the 56x4 TCP_STREAM tests with just one SA (no parallelism) improved from 362 Mbit/sec to 463 Mbit/sec. From these numbers, FTP throughput improves from about 300 Mbit/sec to 444 Mbit/sec.
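As a quick check on what those figures mean in relative terms, the arithmetic can be written out as below. The helper name `percent_gain` is made up for this illustration, and the FTP starting point is only the approximate 300 Mbit/sec quoted above.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


ipsec_gain = percent_gain(362, 463)  # netperf TCP_STREAM, Mbit/sec
ftp_gain = percent_gain(300, 444)    # FTP estimate, Mbit/sec (approximate)

print(f"IPsec: {ipsec_gain:.1f}%  FTP: {ftp_gain:.1f}%")
```

That works out to roughly a 28% gain for the single-SA netperf test and roughly 48% for the FTP estimate.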
Posted at 06:11PM Dec 12, 2008 by DanX in Solaris | Comments[1]
First of all I want to point out that OpenSSL rc4-x86_64 module is *not* "based on an earlier version by Marc Bevand." Well, not to diminish Marc's effort, rc4-x86_64 second optimization round was triggered by his submission, but it was *second* round.
Secondly, the original OpenSSL rc4-x86_64 effectively has three code paths: AMD, Intel pre-Core and Intel Core specific. The second one was omitted from OpenSolaris. I'm not judging the decision (though from the commentary section it seems that it was done based on wrong analysis), I simply feel that this needs to be said.
As for OpenSSL AES performance. Again, I'm not judging the decision, it just needs to be said. OpenSSL module has a number of countermeasures against timing attacks, which naturally have impact on performance. The number varies from release to release, e.g. in recent 0.9.8 you'll find that the loops are folded to preclude correlation between D- and I-cache timings. Then the last round is properly protected (as far as I can see Brian's code provides this as an option, but it's not utilized in OpenSolaris). Development branch provides even further degree of protection... In other words OpenSSL AES assembler modules are not only about performance, they're as much [and sometimes even more] about security. Cheers. A.
Posted by Andy Polyakov on January 08, 2009 at 12:57 AM PST #