The following numbers from a kernel microbenchmark run on a T5440 show that the crypto stack scales nicely in the current build, snv_117. The microbenchmark calls crypto_encrypt() in a loop for the CKM_AES_CBC mechanism with a 128-bit key.
#modload saes_scale_atomic (8192 byte input data size, crypto_encrypt() atomic call, in-place)
| # Threads | Throughput in MBytes/sec |
|---|---|
| 1 | 400 |
| 8 | 3189 |
| 16 | 6331 |
| 32 | 12663 |
| 64 | 14344 |
| 128 | 8920 |
| 256 | 7483 |
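To put a number on "scales nicely", the table can be turned into a scaling efficiency per thread count. The following is a minimal sketch, not part of the benchmark itself: the variable names are made up for illustration, and efficiency is defined here (as an assumption) as measured throughput divided by the thread count times the single-thread throughput.

```python
# Throughput in MBytes/sec by thread count, copied from the table above
# (run with the default n2cp.conf settings).
throughput = {1: 400, 8: 3189, 16: 6331, 32: 12663, 64: 14344, 128: 8920, 256: 7483}

baseline = throughput[1]

# Scaling efficiency: measured throughput relative to perfect linear
# scaling from the single-thread result (1.0 means perfectly linear).
efficiency = {n: mbps / (n * baseline) for n, mbps in throughput.items()}

# Thread count that achieved the highest absolute throughput.
peak_threads = max(throughput, key=throughput.get)

for n in sorted(throughput):
    print(f"{n:>4} threads: {throughput[n]:>6} MB/s, efficiency {efficiency[n]:.2f}")
print(f"peak throughput at {peak_threads} threads")
```

With these numbers, efficiency stays at roughly 0.99 through 32 threads, falls to about 0.56 at 64 threads, and absolute throughput peaks at 64 threads before dropping.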
So, why the decrease after 64 threads? It turned out to be caused by too many thread context switches, which happen when the threads cv_wait on a CWQ. Incidentally, there are 32 CWQ units on a T5440. I added the following line to n2cp.conf and reran the above tests:
n2cp-sync-threads=8;
#modload saes_scale_atomic (8192 byte input data size, crypto_encrypt() atomic call, in-place)
| # Threads | Throughput in MBytes/sec |
|---|---|
| 1 | 400 |
| 8 | 3189 |
| 16 | 6331 |
| 32 | 12663 |
| 64 | 15109 |
| 128 | 18363 |
| 256 | 22932 |
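The effect of the setting is easiest to see by dividing the second run by the first at each thread count. This is a small sketch using only the figures from the two tables; the names `default_run`, `tuned_run`, and `speedup` are illustrative and not part of any tool.

```python
# Throughput in MBytes/sec by thread count, copied from the two tables.
default_run = {1: 400, 8: 3189, 16: 6331, 32: 12663, 64: 14344, 128: 8920, 256: 7483}
tuned_run = {1: 400, 8: 3189, 16: 6331, 32: 12663, 64: 15109, 128: 18363, 256: 22932}

# Ratio of throughput with n2cp-sync-threads=8 to throughput with the default.
speedup = {n: tuned_run[n] / default_run[n] for n in default_run}

# Thread counts where the setting made any difference at all.
changed = [n for n in sorted(speedup) if speedup[n] != 1.0]

for n in sorted(speedup):
    print(f"{n:>4} threads: {default_run[n]:>6} -> {tuned_run[n]:>6} MB/s (x{speedup[n]:.2f})")
```

The results are identical up to 32 threads. The setting only changes the outcome from 64 threads upward, where the gain grows from about 5% at 64 threads to roughly 2x at 128 and 3x at 256.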
There is a penalty for setting the spinners to 8, though: increased CPU consumption. In practice, a workload is unlikely to have more than 64 threads all doing crypto_encrypt() at the same instant, so the default value of 1 will work fine.