Speeding up your small crypto operations
When you use the Cryptographic Framework, either directly or plugged into something like NSS, small crypto operations can bog down your machine because contention on the memory allocator's lock blocks other operations. Switching to libumem, which satisfies most allocations from per-CPU caches instead of a single global lock, relieves the load on the allocation locks. On a T1000, which reports 24 CPUs, with each CPU continuously doing a small SHA1 or AES operation, the problem is very noticeable:
| Memory allocator: | libc.so | libumem.so |
| Avg Time per Operation: | 29040 nsec | 20803 nsec |
| Operations per second: | 34424 | 48070 |
| CPU idle time: | 41% | 0% |
| Memory allocator: | libc.so | libumem.so |
| Avg Time per Operation: | 127245 nsec | 52903 nsec |
| Operations per second: | 7859 | 18903 |
| CPU idle time: | 0% | 0% |
| Memory allocator: | libc.so | libumem.so |
| Avg Time per Operation: | 67035 nsec | 35353 nsec |
| Operations per second: | 14917 | 28285 |
| CPU idle time: | 55% | 0% |
| Memory allocator: | libc.so | libumem.so |
| Avg Time per Operation: | 186246 nsec | 165087 nsec |
| Operations per second: | 5369 | 6057 |
| CPU idle time: | 0% | 0% |
Ignoring the performance gains for a moment, note the CPU utilization for the 16-byte operations. With the default memory allocator, 41% and 55% of the CPU time was idle. On a 24-CPU machine that is roughly 10 and 13 CPUs' worth of capacity sitting on the sidelines doing nothing.
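The idle-capacity figures come from multiplying each idle fraction by the 24 CPUs the T1000 reports:

```latex
\begin{aligned}
0.41 \times 24 &\approx 9.8 \ \text{CPUs idle (first 16-byte case)} \\
0.55 \times 24 &= 13.2 \ \text{CPUs idle (second 16-byte case)}
\end{aligned}
```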
Looking at the 256- and 512-byte data files, malloc locking does not cause any idle time, but it still reduces performance. Once you get past 1024 bytes, the locking is less of a problem: the crypto algorithms take up most of each operation's time, so the locks collide less often.
In the end, this can give you an overall performance boost with SSL, because many transactions are small. You should see better performance from your web servers and directory servers: many web server transactions, particularly those from clients, are small, as are most directory server (LDAP) transactions.
Make sure you are using libumem. The easiest way is to preload the library:
# LD_PRELOAD=/usr/lib/libumem.so.1; export LD_PRELOAD
# LD_PRELOAD_64=/usr/lib/64/libumem.so.1; export LD_PRELOAD_64
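Exporting the variables from a root shell preloads libumem into every command started from that shell. If you would rather limit it to one server, or compare the two allocators side by side, a small wrapper can set the variables for a single command only. This is a minimal sketch. The function name `with_umem` and the benchmark name `./crypto_bench` are hypothetical, and the 64-bit path assumes the standard Solaris `/usr/lib/64` library directory.

```shell
# Hypothetical helper: run ONE command with libumem preloaded, without
# exporting LD_PRELOAD into the rest of the shell session.
with_umem() {
    # 32-bit processes pick up LD_PRELOAD; 64-bit processes use
    # LD_PRELOAD_64, which must point at the 64-bit copy of the library.
    env LD_PRELOAD=/usr/lib/libumem.so.1 \
        LD_PRELOAD_64=/usr/lib/64/libumem.so.1 \
        "$@"
}

# Usage (./crypto_bench is a hypothetical benchmark binary):
#   time ./crypto_bench              # default libc malloc
#   time with_umem ./crypto_bench    # same run with libumem preloaded
```

On Solaris, running `pldd` against the resulting process ID should list `libumem.so.1` among the loaded objects, which confirms the preload took effect.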