In today's environment, security is becoming ever more essential, whether we are talking about web servers, databases, file systems or networking. However, the high cost associated with security is problematic: if I have a system that is capable of performing X operations per second when running in a non-secure mode, then when I flip that metaphorical switch and go secure, the throughput that the system can sustain will fall drastically. 2X slowdowns are commonplace, and 5X or even 10X slowdowns are not that uncommon.

As a result of this high cost, there is often significant reluctance to develop and deploy the comprehensive security strategies that are required in today's world, leading to the serious consequences that we read about all too frequently.

So what is typically done to remedy this situation?

If you look at the security overheads, the vast majority is frequently attributable to the cryptographic operations that underpin the security protocols. However, general-purpose processors are ill-suited to performing cryptographic operations. As a result, we often try to offload the cryptographic processing to custom hardware that can perform the operations orders of magnitude faster than can be achieved on the processor.

Accordingly, accelerators should allow us to convert the significant security overheads into virtually negligible overheads. Essentially, accelerators should allow us to achieve zero-cost security, by which I mean that going secure should have a negligible performance impact.

Unfortunately, accelerators have largely failed to deliver on this.

This is basically a result of the way we have architected and deployed accelerators: we have a system, and then, almost as an afterthought, we add in the PCI-based accelerator card. With this architecture, the cost of offloading an operation to the accelerator can be very high, significantly limiting the type of cryptographic operation that can be cost-effectively offloaded; it's frequently more cost-effective to just perform the processing on the processor!
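
To make the offload trade-off concrete, here is a minimal sketch in Python. The cost model (a fixed per-operation offload overhead plus a per-byte processing time) and every number in it are assumptions chosen purely for illustration; they are not measurements of any real accelerator card or processor.

```python
# Toy cost model: offloading pays off only when the fixed overhead of
# reaching the accelerator is smaller than the time saved on the data.
# All rates and overheads below are hypothetical.

def software_time(n_bytes, cpu_bytes_per_s):
    """Seconds to process n_bytes in software on the processor."""
    return n_bytes / cpu_bytes_per_s

def offload_time(n_bytes, overhead_s, accel_bytes_per_s):
    """Seconds to process n_bytes on an accelerator, including the fixed offload overhead."""
    return overhead_s + n_bytes / accel_bytes_per_s

def break_even_bytes(overhead_s, cpu_bytes_per_s, accel_bytes_per_s):
    """Payload size above which offloading is faster than software."""
    if accel_bytes_per_s <= cpu_bytes_per_s:
        return float("inf")  # the accelerator never wins
    saving_per_byte = 1.0 / cpu_bytes_per_s - 1.0 / accel_bytes_per_s
    return overhead_s / saving_per_byte

# Hypothetical figures, for illustration only.
CPU_RATE = 200e6          # bytes/s for software crypto
ACCEL_RATE = 2e9          # bytes/s for the accelerator
HIGH_OVERHEAD = 20e-6     # seconds per offload, standing in for a far-away card
LOW_OVERHEAD = 0.5e-6     # seconds per offload, standing in for a tightly coupled unit

if __name__ == "__main__":
    for label, overhead in (("high overhead", HIGH_OVERHEAD), ("low overhead", LOW_OVERHEAD)):
        size = break_even_bytes(overhead, CPU_RATE, ACCEL_RATE)
        print(f"{label}: offload wins above about {size:.0f} bytes")
```

With these made-up figures, the break-even payload is a few kilobytes for the high-overhead case and on the order of a hundred bytes for the low-overhead case, which is the sense in which a large offload cost limits the operations worth offloading.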

With the UltraSPARC T2 processor, we have moved the crypto accelerators on-chip and tightly coupled them with the processor cores. As a result, it has been possible to radically reduce the overheads associated with offloading an operation to the accelerators. In turn, this allows the T2 accelerators to cost-effectively handle a much broader range of cryptographic operations than traditional off-chip accelerators and enables the UltraSPARC T2 processor to deliver zero-cost security in a wide variety of application spaces.

Comments:

The benefit is really dependent on the kind of crypto you're doing though, isn't it?

I mean, persistent (e.g. SSH and LDAP) connections do the expensive work once at session startup, then they fall back to symmetric (cheap) encryption for the rest of the traffic.

HTTP seems to be the area where it really shines - short sessions - but even then something like a keepalive reduces the benefit. I guess disk encryption would be another good candidate?

Posted by Dick Davies on August 08, 2007 at 03:18 PM PDT #

While bulk ciphers are fairly cheap in SW compared to public-key operations, the cost rapidly mounts when you have a multitude of concurrent streams or even just a few high BW streams. For the T2 processor, with its integrated support for 10Gb Ethernet, the SW cost of performing bulk ciphers and hashes to match these kinds of bit rates is significant; the HW support ensures we can facilitate line speed encryption and decryption.

In SW, processing each 16-byte block with AES-128 requires over 500 instructions. As a result, even high-performance single-thread processors such as Clovertown will struggle to provide more than around 1.5Gb/s/core; so even using all four cores in the quad-core processor (at best roughly 6Gb/s in aggregate), it won't be possible to achieve line rate with 10GbE if you are interested in secure networking.
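
The arithmetic above can be checked with a quick sketch in Python. It uses only the figures quoted in this comment (500 instructions per 16-byte block, around 1.5Gb/s/core, four cores, a 10Gb/s line rate) and assumes perfectly linear scaling across cores, which is optimistic; the function and constant names are just illustrative.

```python
# Back-of-the-envelope check of the software AES-128 numbers quoted above.
# Assumes throughput scales linearly with core count (an optimistic assumption).

BLOCK_BITS = 16 * 8        # one AES block is 16 bytes
INSTR_PER_BLOCK = 500      # "over 500 instructions" per block, taken as 500
PER_CORE_GBPS = 1.5        # quoted software throughput per core
CORES = 4                  # quad-core processor
LINE_RATE_GBPS = 10.0      # 10GbE

def instructions_per_second(gbps):
    """Instructions per second needed to encrypt at the given rate in Gb/s."""
    blocks_per_second = gbps * 1e9 / BLOCK_BITS
    return blocks_per_second * INSTR_PER_BLOCK

def aggregate_gbps(per_core_gbps, cores):
    """Best-case total throughput with every core doing nothing but crypto."""
    return per_core_gbps * cores

if __name__ == "__main__":
    total = aggregate_gbps(PER_CORE_GBPS, CORES)
    print(f"aggregate software throughput: {total} Gb/s")
    print(f"shortfall against line rate: {LINE_RATE_GBPS - total} Gb/s")
    print(f"instructions/s per core at {PER_CORE_GBPS} Gb/s: {instructions_per_second(PER_CORE_GBPS):.3e}")
    print(f"instructions/s for full line rate: {instructions_per_second(LINE_RATE_GBPS):.3e}")
```

Even in this best case, all four cores together fall short of the line rate, and that is before any cycles are spent on the application itself.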

Posted by Lawrence Spracklen on August 10, 2007 at 05:53 PM PDT #

Good point - 64 threads maps to quite a few zones and associated traffic, I suppose.

The last time I did any kind of benchmark I was on 802.11g with a 1.4GHz VIA CPU - things have moved on a bit since then :)

Posted by Dick Davies on August 11, 2007 at 02:35 AM PDT #

It's not just SSHv2 or LDAP. Think of NFS, pNFS, CIFS, WebDAV. Also, there are no side-channel timing attacks against T2's crypto HW because it does not affect the memory caches while it computes the results.

Posted by Nico on August 13, 2007 at 03:45 PM PDT #


This blog copyright 2009 by sprack