Saturday March 29, 2008
If I could just get the stupid thing to build
Link Aggravation
I've been trying to do something useful with Link Aggregation on a T5120 connected to a Linksys SRW2048 switch. The whole rig includes a couple of those cute little 1U X4200 servers we sell, the T5120 (a 64-hardware-thread CMT system), the Faban test harness, a benchmark designed to test a whole load of Web 2.0 stuff, and a MySQL instance.
I couldn't for the life of me make Link Aggregation work on incoming packets. It was obvious from various forum threads that the Linksys load balances packets based on the MAC address of the system sending the traffic, but the implications of that took a long time to sink in. I'd configure the aggregation on the System Under Test (hereafter referred to as the SUT) using dladm:
dladm create-aggr -d e1000g1 -d e1000g2 1
This creates an aggregation called aggr1 from the interfaces e1000g1 and e1000g2 (the trailing 1 is the aggregation key). Then I configured the switch, which involved using Windows, Internet Explorer and a web-based interface that logged me out after 5 minutes of inactivity. It's fairly straightforward to configure a Link Aggregation Group (LAG) on the Linksys, provided you do everything in exactly the right order (and don't get distracted for 5 minutes). A LAG is the switch-side aggregation of two or more ports; in my case, hopefully, the ones connected to the SUT interfaces that are part of the aggregation.
My test harness has two agents running on separate X4200s, generating load against the SUT. The SUT has two interfaces aggregated (or teamed), which results in a virtual network interface called aggr1. You can use dladm to look at the traffic on the interfaces that make up the aggregation (here sampled every 5 seconds for key 1):
dladm show-aggr -s -i 5 1
The people who wrote dladm didn't get the formatting of the output right, and you basically have to memorize the column positions of the output data. You end up using the %ipkts and %opkts metrics for each interface, as those values never go much above 100 and so their position doesn't change. The output looks like this:
key: 1    ipackets  rbytes     opackets  obytes     %ipkts  %opkts
Total     193732    167576255  398676    515340307
e1000g1   194852    168869030  214144    274881494  100.6   53.7
e1000g2   0         0          185943    242021790  0.0     46.6
This shows all of the incoming traffic (%ipkts) going to e1000g1. That traffic is coming from two systems, each with a different MAC address (duh), so why no load balancing?
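The %ipkts and %opkts columns are just each port's ipackets and opackets divided by the Total row (194852 / 193732 is how e1000g1 ends up at 100.6). So rather than memorizing column positions, you can split each line on whitespace and recompute the shares yourself. This is a minimal sketch, assuming the layout shown above; the sample text is copied from that output, and the function names are hypothetical.

```python
# Sketch: parse `dladm show-aggr -s` style output by splitting on whitespace
# instead of relying on fixed column positions, then recompute each port's
# share of input and output packets. ASSUMPTION: the layout matches the
# sample below (header row starting "key:", a Total row, one row per port).

SAMPLE = """\
key: 1    ipackets  rbytes     opackets  obytes     %ipkts  %opkts
Total     193732    167576255  398676    515340307
e1000g1   194852    168869030  214144    274881494  100.6   53.7
e1000g2   0         0          185943    242021790  0.0     46.6
"""


def parse_aggr_stats(text):
    """Return (totals, ports); each value is (ipackets, rbytes, opackets, obytes)."""
    totals = None
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "key:":
            continue  # skip blank lines and the header row
        counters = tuple(int(f) for f in fields[1:5])
        if fields[0] == "Total":
            totals = counters
        else:
            ports[fields[0]] = counters
    return totals, ports


def packet_shares(totals, ports):
    """Percentage of total input/output packets seen on each port."""
    in_total, out_total = totals[0], totals[2]
    shares = {}
    for name, (ipk, _rbytes, opk, _obytes) in ports.items():
        pct_in = round(100.0 * ipk / in_total, 1) if in_total else 0.0
        pct_out = round(100.0 * opk / out_total, 1) if out_total else 0.0
        shares[name] = (pct_in, pct_out)
    return shares


totals, ports = parse_aggr_stats(SAMPLE)
for name, (pct_in, pct_out) in packet_shares(totals, ports).items():
    print(f"{name}: {pct_in}% in, {pct_out}% out")
```

The recomputed numbers match the %ipkts and %opkts columns dladm printed, so the same parser can be pointed at the live output instead of eyeballing columns.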
It turns out that the load balancing on the switch is more routing than load balancing. For a 2-port LAG (as in this case), packets from source MAC address 1 are sent to the first port of the LAG, packets from source MAC address 2 are sent to the second port, packets from source MAC address 3 go to the first port again, and so on. Packets from the same source MAC address are always sent to the same port, to avoid any re-ordering issues. So, depending on the traffic on my network, it's really a matter of luck whether the traffic from the two agents goes to the same port on the LAG or to different ports. This wouldn't be a problem with 2000 clients sending traffic from 2000 different systems, but there are only two (which might well be the case for a server fronted by a reverse proxy).
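The per-source-MAC behaviour can be modelled with a toy hash. This is a minimal sketch, assuming the switch simply takes the source MAC address modulo the number of ports in the LAG; the SRW2048's actual hash function isn't documented here, and the MAC addresses are made up.

```python
# Hypothetical model of source-MAC load balancing on a 2-port LAG.
# ASSUMPTION: the port is chosen from the low bits of the source MAC;
# the real switch hash is vendor-specific and not described in this post.


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def lag_port(src_mac, n_ports=2):
    """Every frame from the same source MAC maps to the same port:
    no reordering, but a single sender can never use more than one link."""
    return mac_to_int(src_mac) % n_ports


# Two hypothetical load-generator MAC addresses (locally administered range).
agents = ["02:00:00:00:00:10", "02:00:00:00:00:12"]
for mac in agents:
    print(mac, "-> LAG port", lag_port(mac))
```

With both hypothetical agents ending in an even byte, every incoming frame lands on port 0 and the second link sits idle, which is the same pattern dladm showed.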
There is a workaround that we tried in our test environment: we changed the MAC address of one of the client systems using ifconfig (not something I'd generally recommend) until the incoming traffic (according to dladm) was balanced across the two interfaces. This seemed to work every time, and had me chanting "Wax on, wax off" as I toggled the MAC address of one of the loaders and watched the traffic move to a different interface on the SUT. Here's the final result as given by dladm:
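The trial-and-error MAC toggling amounts to a search: keep stepping one loader's address until the switch sends it to a different port from the other loader. This sketch assumes the same toy "MAC modulo port count" hash as a stand-in for the switch's real algorithm; the `next_mac` and `rebalance` helpers and the addresses are hypothetical, and on real hardware the only way to confirm the result is to watch dladm, as we did.

```python
# Sketch of the "wax on, wax off" workaround as a search over MAC addresses.
# ASSUMPTION: hypothetical low-bits hash standing in for the real switch hash.


def lag_port(src_mac, n_ports=2):
    """Toy port selection: source MAC modulo the number of LAG ports."""
    return int(src_mac.replace(":", ""), 16) % n_ports


def next_mac(mac):
    """Increment a MAC address by one, wrapping at 48 bits."""
    value = (int(mac.replace(":", ""), 16) + 1) % (1 << 48)
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def rebalance(fixed_mac, movable_mac, n_ports=2):
    """Return a MAC for the movable host that maps to a different port
    than the fixed host's MAC does."""
    if n_ports < 2:
        raise ValueError("need at least two ports to rebalance")
    busy_port = lag_port(fixed_mac, n_ports)
    candidate = movable_mac
    while lag_port(candidate, n_ports) == busy_port:
        candidate = next_mac(candidate)
    return candidate


print(rebalance("02:00:00:00:00:10", "02:00:00:00:00:12"))
```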
key: 1    ipackets  rbytes     opackets  obytes     %ipkts  %opkts
Total     176194    156273288  342248    441581280
e1000g1   74939     67362534   206674    269054155  42.5    60.4
e1000g2   102336    90181188   136617    173791907  58.1    39.9
Apparently it would be better if the switch could load balance at Layer 4 (the transport layer), which would allow traffic to be balanced based on network endpoints (i.e., IP address and port number), but it seems likely that our Linksys SRW2048 switch doesn't support this.
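To see why endpoint-based hashing helps even with only two clients, here is a toy model: hash each connection's (source IP, source port, destination IP, destination port) tuple and take it modulo the number of ports. CRC32 from Python's `zlib` is only a stand-in for whatever a switch would actually use, and the addresses (from the 192.0.2.0/24 documentation range) and port numbers are hypothetical.

```python
# Hypothetical Layer 4 port selection: hash the connection 4-tuple so that
# different TCP connections from the same client can use different LAG ports.
# ASSUMPTION: CRC32 stands in for a vendor-specific switch hash.
import zlib
from collections import Counter


def l4_port(src_ip, src_port, dst_ip, dst_port, n_ports=2):
    """Same 4-tuple -> same port, so packets within one connection stay in order."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % n_ports


# Two load generators, each opening 100 connections from ephemeral ports.
clients = ["192.0.2.10", "192.0.2.11"]
counts = Counter(
    l4_port(ip, sport, "192.0.2.1", 80)
    for ip in clients
    for sport in range(40000, 40100)
)
print(counts)
```

Because each agent opens many connections from different source ports, those connections spread across both links instead of all following the agent's MAC address, while any single connection still sticks to one link.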
There are useful entries on Nicolas Droux' blog showing the architecture of the Link Aggregation subsystem in Solaris 10 and on setting up a Link Aggregation.
Posted at 08:31PM Mar 29, 2008 by MandyWaite in Lighttpd