Monday Mar 10, 2008

Partitioning Network Resources for a Virtualized World (New MAC Driver Interfaces in Crossbow)

I don't want to say too much about the virtualization story, since many explanations have already been offered for why the unbalanced computing world should be virtualized. Whether or not you buy those arguments, once you see how unreasonably hardware resources are allocated and used in networking today, you may see the need to virtualize the network.

GLDv3 is the new network driver framework in Solaris, and it has been evolving for more than three years. Besides redesigning important data structures and reimplementing major modules, Crossbow introduced VNICs, flows, and the SRS (soft ring set) to GLDv3. Dynamic switching between interrupt mode and polling mode lets the framework enforce precise bandwidth control for NIC interfaces, flows, and even single connections. Indeed, the Crossbow project has re-architected GLDv3 over the last several months. That may not be the best excuse for the delay in publishing the GLDv3 interfaces, but we have been trying hard to finalize them and provide a stable framework, so that driver developers can easily write new drivers for virtualization-ready hardware.

I would like to call the part of Crossbow I'm talking about here Hardware Resources Management. It covers how drivers register hardware resources (rx/tx channels, interrupts, etc.), how the MAC layer organizes those resources for virtualization, and how we maintain performance while virtualizing them. Ethernet is the current focus.

Ever since the first 10Gb Ethernet NIC was developed, both hardware vendors and driver writers have been looking for ways to unleash the power of 10GbE. PCI-E eliminated the ~8.5Gbps bus throughput limitation of PCI-X 133, sophisticated hardware offloading technologies moved a lot of software work into hardware to save CPU time, and multiple hardware traffic channels can now operate in parallel with multiple CPU threads. Multiple parallel channels need packet classification to work: classification can be done in software on the transmit side, while the receive side requires hardware packet classification.

The hardware virtualization of a NIC starts with packet classification. Packets can be classified by many different rules, such as MAC addresses, VLAN tags, L3/L4 headers, or packet length, and the hierarchy of these classification types strongly affects the architecture of the driver. Because the MAC address has always been used as the "identifier" of a NIC, we consider MAC address classification the basic feature for hardware virtualization. Multiple MAC addresses have become a common feature of most modern NICs, so as soon as the hardware can steer inbound packets with different destination MAC addresses to different receive channels, each receive channel behaves just like an independent NIC with a unique MAC address. By utilizing these features, we can virtually build half of a NIC interface (the receive function only). A functional virtual NIC also needs transmit capability. Since NIC hardware doesn't care which MAC address is carried in outbound packets, the framework can simply allocate a number of transmit channels to the virtual NIC according to its configured transmit speed. Now, with Multiple Traffic Channels and Hardware MAC Address Classification, virtual NICs can be built on top of the same NIC hardware. In fact, actual hardware is more complex than this simple model, and the framework needs to be much more flexible. We'll discuss this further below.
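To make the receive-side steering concrete, here is a minimal sketch in C of what MAC address classification does conceptually. It is a software model for illustration only, not the Crossbow or GLDv3 interface: the types mac_slot_t and mac_classifier_t, the functions classifier_add() and classifier_rx_ring(), and the DEFAULT_RING fallback are all hypothetical names invented for this example. Real hardware does this matching in the NIC itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6
#define MAX_MAC_SLOTS  8
#define DEFAULT_RING   0   /* hypothetical: unmatched packets land here */

/* One MAC address slot steering traffic to one receive ring. */
typedef struct mac_slot {
    uint8_t addr[ETHER_ADDR_LEN];
    int     ring;      /* receive ring (channel) index */
    int     in_use;
} mac_slot_t;

/* Must be zero-initialized before use. */
typedef struct mac_classifier {
    mac_slot_t slots[MAX_MAC_SLOTS];
} mac_classifier_t;

/* Program a MAC address into a free slot; 0 on success, -1 if full. */
int
classifier_add(mac_classifier_t *c, const uint8_t *addr, int ring)
{
    assert(c != NULL && addr != NULL);
    for (int i = 0; i < MAX_MAC_SLOTS; i++) {
        if (!c->slots[i].in_use) {
            memcpy(c->slots[i].addr, addr, ETHER_ADDR_LEN);
            c->slots[i].ring = ring;
            c->slots[i].in_use = 1;
            return (0);
        }
    }
    return (-1);
}

/*
 * Pick the receive ring for an inbound frame. The destination MAC
 * address is the first 6 bytes of an Ethernet frame.
 */
int
classifier_rx_ring(const mac_classifier_t *c, const uint8_t *frame)
{
    assert(c != NULL && frame != NULL);
    for (int i = 0; i < MAX_MAC_SLOTS; i++) {
        if (c->slots[i].in_use &&
            memcmp(c->slots[i].addr, frame, ETHER_ADDR_LEN) == 0)
            return (c->slots[i].ring);
    }
    return (DEFAULT_RING);
}
```

With one slot per virtual NIC, each receive ring only ever sees frames addressed to "its" MAC address, which is what makes the ring look like an independent NIC to the layers above.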

From our point of view, all layer 2 classifications, including MAC address, VLAN tag, etc., should be used for virtualization, while L3/L4 classifications are useful for load balancing, which improves performance. When packets with the same destination MAC address have been classified, we also want them to be processed simultaneously on multiple CPU threads when one CPU thread is not powerful enough to handle all of the traffic. After L2 classification, the hardware load balancer can hash packets to multiple channels based on their L3/L4 information. So we have a two-level hardware classification model, in which we define the set of receive rings (channels) targeted by the load balancer as a Ring Group. A group is associated with one or more MAC addresses, depending on whether it is shared by multiple VNICs.
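The two-level model can be sketched in C as follows. Again, this is only an illustrative software model under stated assumptions, not the actual MAC layer data structures: ring_group_t, flow_key_t, group_lookup(), and group_select_ring() are hypothetical names, and the hash is a toy mixing function standing in for whatever the hardware load balancer implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6
#define MAX_GROUP_MACS 4

/* L3/L4 fields the load balancer hashes on (IPv4 + TCP/UDP ports). */
typedef struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
} flow_key_t;

/* A ring group: one or more MAC addresses plus a range of rx rings. */
typedef struct ring_group {
    uint8_t  macs[MAX_GROUP_MACS][ETHER_ADDR_LEN];
    unsigned nmacs;       /* more than 1 if VNICs share the group */
    unsigned first_ring;  /* index of the first ring in the group */
    unsigned nrings;      /* number of rings in the group */
} ring_group_t;

/* Level 1 (L2): find the group that owns the destination MAC. */
const ring_group_t *
group_lookup(const ring_group_t *groups, size_t ngroups,
    const uint8_t *dst_mac)
{
    for (size_t g = 0; g < ngroups; g++) {
        for (unsigned m = 0; m < groups[g].nmacs; m++) {
            if (memcmp(groups[g].macs[m], dst_mac,
                ETHER_ADDR_LEN) == 0)
                return (&groups[g]);
        }
    }
    return (NULL);
}

/* Toy hash for illustration; not any particular NIC's algorithm. */
static uint32_t
flow_hash(const flow_key_t *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip ^
        (((uint32_t)k->src_port << 16) | k->dst_port);
    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return (h);
}

/* Level 2 (L3/L4): spread flows across the rings of one group. */
unsigned
group_select_ring(const ring_group_t *g, const flow_key_t *k)
{
    assert(g != NULL && k != NULL && g->nrings > 0);
    return (g->first_ring + flow_hash(k) % g->nrings);
}
```

Because the ring is chosen from the flow's L3/L4 fields, all packets of one connection stay on the same ring and arrive in order, while different connections to the same MAC address can be processed on different CPU threads.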

(to be continued)
