Slog - Sohrab's weblog
Moore forwarding (all pun intended)
My previous post discussed two possibilities:
1. Today's general-purpose computing chips with multi-core architectures could process network packets fast enough to shift network traffic processing away from expensive specialized hardware and back onto general-purpose servers.
2. General-purpose computing chips with multi-core architectures may provide "enough" performance to consider using a general-purpose operating system in certain deployments.
Armed with my theory, I set about investigating the performance-related questions (point 2) with the help of some great talent in the Solaris networking group (a lot of the credit goes to Garrett D, Sangeeta M, and a few other folks who volunteered their time to this effort). The question I set out to answer: does CMT (chip multithreading) help forwarding performance, in particular on general-purpose OSes? If so, by how much?
The tests we ran were experimental in nature (and should be treated as such), done more to prove a theory than to provide data in a sterile, sanitized "benchmark". At the same time, the people running the tests are highly skilled engineers with some degree of exposure to network performance.
We first tried to get some baseline numbers using a Spirent system generating small (64-byte) UDP packets against an x64 Sun server running Solaris 10 with 2 sockets (2 freestanding CPUs), each socket having 2 cores. We then compared these numbers against a Sun prototype machine with a single UltraSPARC T2 (8-core) processor. After a bit of tinkering (a lot more than in the x64 case, since we had to understand the system), the initial results are quite outstanding. In a bidirectional forwarding test, the single-chip UltraSPARC T2 pumped close to 300% more packets through the system than the 2-chip (2 cores each) x64 system. It is interesting to note that the UltraSPARC T2, when using 1.5K-byte packets, saturated the 10G line (single stream at 18% CPU utilization). We also ran a sanity test against Linux, and it appears that Solaris was at least 30% better in an apples-to-apples comparison.
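To put the two packet sizes in perspective, here is a rough back-of-the-envelope sketch of how many frames per second a 10G link can carry in one direction. It assumes standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter, and inter-frame gap per frame) and treats the test packets as 64-byte and 1518-byte Ethernet frames; those sizes and the function name are my assumptions for illustration, not measurements from our tests.

```python
# Per-frame overhead on the wire that is not part of the Ethernet frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Theoretical upper bound on frames per second, one direction, at line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    ten_gig = 10e9  # 10 Gbit/s link
    # Assumed frame sizes: minimum-size frame and a full-size (1.5K payload) frame.
    for frame_bytes in (64, 1518):
        pps = max_frames_per_second(ten_gig, frame_bytes)
        print(f"{frame_bytes:>5}-byte frames: {pps / 1e6:.2f} Mpps")
```

Under these assumptions, small frames demand roughly 18 times as many packets per second as full-size frames for the same link, which is why the 64-byte case stresses the per-packet processing path while the 1.5K case can fill the line at low CPU utilization.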
As one would expect when network traffic is processed in interrupt context, CPU utilization across the threads of the UltraSPARC T2 is somewhat uneven. This leads us to believe that we can significantly improve throughput by distributing the load much more evenly across all the threads (today a few threads are highly utilized while the majority are starved of work). All this has generated a great deal of excitement, and the Crossbow project on OpenSolaris has an architecture to parallelize the network stack without any added overhead so that all the Niagara threads can do work in parallel. This is currently planned to be made available as part of Project Crossbow (http://opensolaris.org/os/project/crossbow/, http://blogs.sun.com/sunay), so stay tuned for future posts on this effort.
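To illustrate the general idea of spreading packet work across many hardware threads, here is a toy sketch that assigns each flow to a thread by hashing its 5-tuple. This is a generic technique and only my illustration; it is not a description of how Crossbow is actually designed, and the function names and the sample flows are hypothetical.

```python
import zlib
from collections import Counter

# The UltraSPARC T2 has 8 cores with 8 hardware threads each.
N_THREADS = 64


def pick_thread(src_ip, dst_ip, src_port, dst_port, proto, n_threads=N_THREADS):
    """Map a flow's 5-tuple to a thread index (hypothetical helper).

    Hashing the whole tuple keeps every packet of one flow on the same
    thread, which preserves per-flow packet ordering.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_threads


def spread(flows, n_threads=N_THREADS):
    """Return how many flows land on each thread index."""
    counts = Counter(pick_thread(*flow, n_threads) for flow in flows)
    return [counts.get(t, 0) for t in range(n_threads)]


if __name__ == "__main__":
    # Made-up traffic: many UDP flows that differ only in source port.
    flows = [("10.0.0.1", "10.0.1.1", 1024 + i, 53, "udp") for i in range(10000)]
    per_thread = spread(flows)
    print("busiest thread:", max(per_thread), "flows")
    print("idlest thread: ", min(per_thread), "flows")
```

Note that a single stream always hashes to one thread, so a scheme like this only evens out the load when the traffic consists of many flows.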
Bear in mind, though, that there is also an ongoing effort to examine the network performance of the UltraSPARC T2 chip much closer to the metal (point 1). There, the initial data on the UltraSPARC T2 shows a multifold improvement in throughput over the test described above. More information can be found in the following paper:
http://www.sun.com/products-n-solutions/docs/SUNP_wp.pdf
Posted at 07:15AM Jan 11, 2008 by sohrab in Sun | Comments[1]