Zero-cost security?
In today's environment, security is
becoming ever more essential, whether we are talking about web
servers, databases, file systems or networking. However, the high
cost associated with security is problematic. If I have a system that
is capable of performing X operations per second when running in a
non-secure mode, then when I flip that metaphorical switch and go
secure, the throughput that the system can sustain will fall
drastically. 2X slowdowns are commonplace, and 5X or even 10X
slowdowns are not uncommon.
As a result of this high
cost, there is often significant reluctance to develop and deploy the
comprehensive security strategies that are required in today's world,
leading to the serious consequences that we read about all too
frequently.
So what is typically done to remedy this
situation?
If you look at the security overheads, the vast
majority is frequently attributable to the
cryptographic operations that underpin the security protocols.
General-purpose processors, however, are ill-suited to performing
cryptographic operations. As a result, we often try to offload the
cryptographic processing to custom hardware that can perform the
operations orders of magnitude faster than can be achieved on the
processor.
Accordingly, accelerators should allow us to
convert the significant security overheads into virtually negligible
overheads. Essentially, accelerators should allow us to achieve
zero-cost security, by which I mean that going secure should have a
negligible performance impact.
Unfortunately,
accelerators have largely failed to deliver on this.
This is
basically a result of the way we have architected and deployed
accelerators; we have a system, and then, almost as an afterthought,
we add in the PCI-based accelerator card. With this architecture, the
cost of offloading an operation to the accelerator can be very high,
significantly limiting the types of cryptographic operation that can
be cost-effectively offloaded; it's frequently more cost-effective to
just perform the processing on the processor!
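
To make the offload trade-off concrete, here is a minimal sketch of a
break-even model. It assumes a simple linear cost: software time grows
with message size, while the accelerator pays a fixed offload overhead
plus a smaller per-byte cost. The function name and all of the numbers
are hypothetical, chosen only for illustration; they are not
measurements of any real card or processor.

```python
def break_even_bytes(sw_ns_per_byte, accel_ns_per_byte, overhead_ns):
    """Message size above which offloading beats software.

    Assumed (hypothetical) linear model:
      software time    = n * sw_ns_per_byte
      accelerator time = overhead_ns + n * accel_ns_per_byte
    """
    if accel_ns_per_byte >= sw_ns_per_byte:
        # The accelerator is never faster per byte, so offload never wins.
        return float("inf")
    return overhead_ns / (sw_ns_per_byte - accel_ns_per_byte)


# Illustrative values only: a large fixed overhead versus a small one.
high_overhead = break_even_bytes(10.0, 1.0, 9000.0)
low_overhead = break_even_bytes(10.0, 1.0, 90.0)
print(high_overhead, low_overhead)
```

Under this model, shrinking the fixed overhead shrinks the smallest
operation worth offloading by the same factor, which is the effect that
a tighter coupling between accelerator and processor is aiming for.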
With the
UltraSPARC T2 processor, we have moved the crypto accelerators
on-chip and tightly coupled them with the processor cores. As a
result, it has been possible to radically reduce the overheads
associated with offloading an operation to the accelerators. In turn,
this allows the T2 accelerators to cost-effectively handle a much
broader range of cryptographic operations than traditional off-chip
accelerators and enables the UltraSPARC T2 processor to deliver
zero-cost security in a wide variety of application spaces.

The benefit is really dependent on the kind of crypto you're doing though, isn't it?
I mean, persistent (e.g. SSH and LDAP) connections do the expensive work once at session startup, then they fall back to symmetric (cheap) encryption for the rest of the traffic.
HTTP seems to be the area where it really shines - short sessions - but even then something like a keepalive reduces the benefit. I guess disk encryption would be another good candidate?
Posted by Dick Davies on August 08, 2007 at 03:18 PM PDT #
While bulk ciphers are fairly cheap in SW compared to public-key operations, the cost rapidly mounts when you have a multitude of concurrent streams or even just a few high BW streams. For the T2 processor, with its integrated support for 10Gb Ethernet, the SW cost of performing bulk ciphers and hashes to match these kinds of bit rates is significant -- the HW support ensures we can facilitate line speed encryption and decryption.
In SW, for AES-128, over 500 instructions are required to process each 16-byte block. As a result, even high-performance single-thread processors such as Clovertown will struggle to provide more than around 1.5Gb/s/core; so even using all four cores in the quad-core processor (roughly 6Gb/s in total), it won't be possible to obtain line rate with 10GbE if you are interested in secure networking.
Posted by Lawrence Spracklen on August 10, 2007 at 05:53 PM PDT #
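The arithmetic in the reply above can be checked with a short sketch. It uses only the figures quoted there (about 500 instructions per 16-byte AES-128 block, around 1.5Gb/s per core, four cores, a 10Gb/s line rate). Perfect scaling across cores is an assumption made here to give the software path its best case, and the function names are hypothetical.

```python
BLOCK_BITS = 16 * 8            # one AES block is 16 bytes
INSTRUCTIONS_PER_BLOCK = 500   # "over 500" in the comment, so a lower bound
PER_CORE_GBPS = 1.5            # approximate software AES-128 rate per core
CORES = 4                      # quad-core part
LINE_RATE_GBPS = 10.0          # 10GbE


def aggregate_gbps(per_core_gbps, cores):
    # Assumes perfect scaling across cores (an optimistic assumption).
    return per_core_gbps * cores


def instructions_per_second(rate_gbps):
    # Instructions needed each second to encrypt at the given bit rate.
    blocks_per_second = rate_gbps * 1e9 / BLOCK_BITS
    return blocks_per_second * INSTRUCTIONS_PER_BLOCK


total = aggregate_gbps(PER_CORE_GBPS, CORES)
needed = instructions_per_second(LINE_RATE_GBPS)
print(total)                    # 6.0 Gb/s, short of the 10 Gb/s line rate
print(needed)                   # 39062500000.0, about 39 billion per second
print(total >= LINE_RATE_GBPS)  # False
```

Even with every core dedicated to encryption and nothing left for the application, the software path tops out at roughly 6Gb/s on these figures.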
Good point - 64 threads maps to quite a few zones and associated traffic, I suppose.
The last time I did any kind of benchmark I was on 802.11g with a 1.4GHz VIA CPU - things have moved on a bit since then :)
Posted by Dick Davies on August 11, 2007 at 02:35 AM PDT #
It's not just SSHv2 or LDAP. Think of NFS, pNFS, CIFS, WebDAV. Also, there are no cache-based side-channel timing attacks against T2's crypto HW, because it does not touch the memory caches while it computes the results.
Posted by Nico on August 13, 2007 at 03:45 PM PDT #