Let it rip
Sunay Tripathi's blog on Solaris Networking, Network Virtualization, Crossbow, Cloud Computing etc.
Sun Distinguished Engineer's view on code, architecture and industry trends

Wednesday November 04, 2009

Crossbow paper wins best paper award at Usenix LISA09 and BOF schedule


We had submitted another paper to the USENIX LISA 2009 conference in Baltimore, MD, which is being held from Nov 3-5, 2009. The paper is titled Crossbow Virtual Wire: Network in a Box. Yesterday we were informed that our paper won the Best Paper award for the conference. Woohoo!!

I met many people here at LISA who are already using Crossbow in very interesting ways, and I got many requests to hold a BOF while we were here. So I hit up our marketing VP for some beer budget (can't have a BOF without drinks), and we are now having a Crossbow and Solaris Networking BOF on Nov 4th, 2009 from 10.30 to 11.30pm in the Dover AB conference room. The venue details can be found on the USENIX LISA site here. So people who are already at the conference or in the general area of Maryland, Virginia, DC, etc., please do come by. It would be good to put faces to names, and we will have chilled beer. We will also be showing the Virtual Wire Builder kit for building your own virtual network (all available in open source form).

Once again, BOF details are
  • Crossbow & Solaris Networking BOF at USENIX LISA 2009
  • Place: Marriott Waterfront Hotel, Baltimore, MD.
  • Date: Nov 4th, 2009
  • Time: 10.30-11.30pm
  • Agenda: Virtual Wire Builder kit, Open discussion, Beer

Hope to see you there.
(2009-11-04 12:10:39.0)

Sunday July 12, 2009

Crossbow Launch, Talk and BOF at Community One and Java One


On June 1, 2009, during Community One and Java One in San Francisco, California, Crossbow was formally launched as part of OpenSolaris 2009.06. The morning started with a keynote in which John Fowler, EVP of Sun's Systems group, formally announced OpenSolaris 2009.06 as the beta for the next enterprise release of Solaris, Solaris.Next (the release after Solaris 10). He and Greg Lavender then went on to show the Crossbow feature and the Virtual Wire demo. Later in the day I gave a talk on Crossbow, where Nicolas and Kais accompanied me and showed the Crossbow Virtual Wire demo in detail. Bill Franklin and some of his cohorts, dressed as Crossbow knights, charged into the room right after the talk. I think people got the shock of their lives. It was very entertaining.

The launch got a lot of visibility and very good press coverage, which can be seen on the Crossbow News page. The most notable ones were:

On the evening of June 2, 2009 we held the Crossbow BOF. There was a great turnout and great support from the community. All in all, great stuff and a good close to Phase 1 of the Crossbow project. The team members were pretty happy and relieved. Now we are trying to get the next intermediate phase going so we can complete the story for the next enterprise release of Solaris, which might or might not be called Solaris 11. The key items are more analytics (dlstat/flowstat), some security/anti-spoofing features, better usability, etc. More details are being discussed on the Crossbow Discussion page.
(2009-07-12 14:59:12.0)

Tuesday May 26, 2009

Crossbow Sigcomm09 papers are now online


Here are the details of the two Crossbow ACM SIGCOMM 2009 papers:
  • Crossbow: From Hardware Virtualized NICs to Virtualized Networks. Abstract: This paper describes a new architecture for achieving network virtualization using virtual NICs (VNICs) as the building blocks. The VNICs can be associated with dedicated and independent hardware lanes that consist of dedicated NIC and kernel resources. Hardware lanes support dynamic polling, which enables the fair sharing of bandwidth with no performance penalty. VNICs ensure full separation of traffic for virtual machines within the host. A collection of VNICs on one or more physical machines can be connected to create a Virtual Wire by assigning them a common attribute such as a VLAN tag. The full paper is available here.
  • Crossbow: A vertically integrated QoS stack. Abstract: This paper describes a new architecture which addresses Quality of Service (QoS) by creating unique flows for applications, services, or subnets. A flow is a dedicated and independent path from the NIC hardware to the socket layer in which the QoS layer is integrated into the protocol stack instead of being implemented as a separate layer. Each flow has dedicated hardware and software resources allowing applications to meet their specified quality of service within the host.
    The architecture efficiently copes with Distributed Denial of Service (DDoS) attacks by creating zero or limited bandwidth flows for the attacking traffic. The unwanted packets can be dropped by the NIC hardware itself at no cost.
    A collection of flows on more than one host can be assigned the same Differentiated Services Code Point (DSCP) label, which forms a path dedicated to a service across the enterprise network and enables end-to-end QoS within the data center. The full paper is available here.
Enjoy reading, and join us for the talk, the BOF and the party at Community One (see the previous entry) on June 1-2, 2009!!

(2009-05-26 19:14:47.0)

Monday May 18, 2009

Crossbow Research papers in SIGCOMM, Party, Community One/Java One etc


Last week was a very exciting one. Two of our research papers were accepted at SIGCOMM VISA09 and SIGCOMM WREN09. This year, SIGCOMM will be held in Barcelona, Spain, from August 17-21 and has four focus areas. Two of them are Virtualization and Enterprise Networking, which is where we submitted our papers on virtualization and on flows, respectively. We will make these papers available online soon, once we submit the camera-ready copies to the ACM editors.

So the next question is: where is the party? Well, the party is during Java One and Community One on June 1 and 2. Did I tell you that Community One is FREE and there is a big party in the evening? I think Crossbow gets formally announced as part of Community One itself, and we will have a talk on Crossbow titled Open Networking with Crossbow on June 1st at 2.40pm and a BOF on Crossbow on June 2nd at 5.30pm. We will also be hosting a demo pod during Java One.

Crossbow is the more visible initiative, but the last few months were pretty fruitful: not only did we deliver Crossbow, but also several parts of Clearview and Volo, amongst others.

So please come by, help if you can, or just enjoy the sessions, learn something and celebrate. Let me know if you are able to help out with the demo, staffing the booths and answering questions.
(2009-05-18 23:35:55.0)

Tuesday March 17, 2009

Crossbow: Virtualized switching and performance


Saw Cisco's unified fabric announcement. It seems like they are going after cloud computing, which pretty much promises to solve world hunger. Even if cloud computing can just solve the high data center cost problem and make compute, networking, and storage available on demand cheaply, I am pretty much sold on it. The interesting part is that the world needs to move towards enabling people to bring their networks to the cloud and have compute, bandwidth and storage available on demand. In terms of networking and network virtualization, this means that we need to move to open standards, open technology and off-the-shelf hardware. Users of the cloud will not accept vendor or provider lock-in. The cloud needs to be built in such a manner that users can take their physical network and migrate it to an operator's cloud, and at the same time have the ability to build their own clouds and migrate things between the two. Open Networking is the key ingredient here.

This essentially means that there is no room for custom ASICs and protocols, and the world of networking needs to change. This is, to a certain extent, what Jonathan was talking about around Open Networking and Crossbow. OpenSolaris with Crossbow makes things very interesting in this space. But it seems that people don't fully understand what Crossbow and OpenSolaris bring to the table. I saw posts from Scott Lowe and several others mentioning that Crossbow is pretty similar to VMware's network virtualization solutions and Cisco Nexus 1000v virtual switches.

Let me take some time to explain a few very important things about Crossbow:
  • It's open source and part of OpenSolaris. You can download it right here.
  • It leverages NIC hardware switching and features to deliver isolation and performance for virtual machines. Crossbow not only includes H/W and S/W based VNICs and switches, it also offers virtualized routers, load balancers, and firewalls. These virtual network machines can be created using Crossbow and Solaris Zones and have pretty amazing performance. All of them are connected together using the Crossbow Virtual Wire. You don't need to buy fancy, expensive virtualized switches to create and use a Virtual Wire.
  • Using hardware virtualized lanes, Crossbow scales to multiples of 10 Gig traffic on off-the-shelf hardware.

Hardware based VNICs and Hardware based Switching

A picture is always worth a thousand words. The figure shows how Crossbow VNICs are built on top of real NIC hardware and how we do switching in hardware where possible. Crossbow also has a full-featured S/W layer that can do S/W VNICs and switching as well; the hardware is leveraged when available. It's important to note that most NIC vendors ship with the necessary NIC classifiers and Rx/Tx rings, and these are pretty much mandatory for 10 Gig NICs, which form the backbone of a cloud.
Crossbow H/W based VNICs

Virtual Wire: The essence of virtualized networking

The Crossbow Virtual Wire technology allows a person to take a full-featured physical network (multiple subnets, switches and routers) and configure it within one or more hosts. This is the key to moving virtualized networks in and out of the cloud. The figure shows a two-subnet physical network with multiple switches and different link speeds, connected via a router, and how it can be virtualized in a single box. A full workshop on virtualized networking is available here.
Virtual Wire

Scaling and Performance

Crossbow leverages NIC features pretty aggressively to create virtualization lanes that help traffic scale across a large number of cores and threads. For people wanting to build real or virtual appliances using OpenSolaris, performance and scaling across 10 Gig NICs are pretty essential. The figure below shows an overview of hardware lanes.
Crossbow Virtualization Architecture

More Information

There is a white paper and more detailed documents (including how to get started) at the Crossbow OpenSolaris page.



(2009-03-17 17:30:06.0)

Monday March 02, 2009

Crossbow enables an Open Networking Platform


I came across this blog from Paul Murphy. You should read the second half of Paul's blog; what he says is pretty true. Crossbow delivered a brand new networking stack to Solaris, with scalability, virtualization, QoS, and better observability designed in (instead of patched in). The complete list of features delivered and in the works is here. Coupled with a full-fledged open source Quagga Routing Suite (RIP, OSPF, BGP, etc.), the IP Filter firewall, and a kernel load balancer, OpenSolaris becomes a pretty useful platform for building Open Networking appliances.

Apart from single-box functionality, imagine you want to deliver a virtual router or a load balancer: it would be pretty easy to do so. OpenSolaris offers Zones, so you can deliver a preconfigured zone as a router, load balancer, or firewall. The difference is that this zone would be fully portable to another machine running OpenSolaris and would have no performance penalty. After all, we, the Crossbow team, guarantee that our VNICs with Zones do not have any performance penalty. You can also build fully portable, preconfigured virtual networking equipment using a Xen guest, which can be made to migrate between any OpenSolaris or Linux hosts.

I noticed that a couple of folks on Paul's blog were asking why Crossbow NIC virtualization is different. Well, it's not just the NIC being virtualized but the entire data path along with it, which we call a Virtualization Lane. You can see the virtualization lane, all the way from the NIC to the socket layer and back, here. Not only is there one or more Virtualization Lanes per virtual machine, but bandwidth partitioning, Diffserv tagging, priority, CPU assignment etc. are also designed in as part of the architecture. The same concepts are used to scale the stack across multiples of 10gigE NICs over a large number of cores and threads (out-of-this-world forwarding performance, anyone?!).

And as mentioned before, Crossbow enables Virtual Wire: the ability to create a full-featured network without any physical wires. Think of running network simulations and testing in a whole new light!!
(2009-03-02 23:10:16.0)

Wednesday February 04, 2009

Ben's blog on Crossbow - great overview


Ben wrote a great blog on Crossbow. Thanks, Ben. It gives a good overview of the features. If you want more details on the internals, you can read more about the architecture, or you can build a Virtual Wire (a Network in a Box), which is explained with an example here.

(2009-02-04 23:46:55.0)

Sunday December 14, 2008

Crossbow - Network Virtualization Architecture Comes to Life

December 5th, 2008 was a joyous occasion and a humbling one at the same time. A vision that was created 4 years earlier was coming to life. I still remember the summer of 2004 when Sinyaw threw a challenge at me: can you change the world? And it was the fall of the same year when I unveiled the first set of Crossbow slides to him and Fred Zlotnik over a bottle of wine. There was a lot of planning, and we were finally ready to start, but there were still hurdles in the way. We were still trying to finish Nemo aka GLDv3, a high performance device driver framework which was absolutely required for Crossbow (we needed absolute control over the hardware). Nemo finished in mid 2005, but then Nicolas, Yuzo and others left Sun and went to a startup. Thiru was still trying to finish Yosemite (the FireEngine follow-on). So in short, 2005 was basically more planning and prototyping on my part (especially controlling the Rx rings and dynamic polling). I think it was early 2006 when work began on Crossbow in earnest. Kais moved over from the security group, Nicolas was back at Sun, and Thiru, Eric Cheng, Mike Lim (and of course me) came together to form the core team (which later expanded to 20+ people in early 2008). So it was a long-standing dream and almost three years of hard work that finally came to life when Crossbow Phase 1 integrated into Nevada Build 105 (it will be available in the OpenSolaris 2009.06 release).

Crossbow - H/W Virtualized Lanes that Scale (10gigE over multiple cores)

One of the key tenets of the Crossbow design was the concept of H/W Virtualization Lanes: essentially tying a NIC receive and transmit ring, DMA channel, kernel threads, kernel queues and processing CPUs together. There are no shared locks, counters or anything else. Each lane gets to schedule its packet processing individually by switching its Rx ring independently between interrupt mode and poll mode (Dynamic Polling). Now you can see why Nemo was so important: without it, the stack couldn't control the H/W, and without Nemo the NIC vendors wouldn't have played along with us in adding the features we wanted (stateless classification, Rx/Tx rings, etc.). Once a lane is created, we can program the classifier to spread packets between the lanes based on IP addresses and ports, for scaling. With multiple cores and multiple threads being the way of life going forward, and 10+ gigE of bandwidth (soon we will have IPoIB working as well), scaling really matters. And we are not talking about achieving line rate on 10 gigE with jumbo frames; we are talking about the real world: a mix of small and large packets, 10k connections and thousands of threads.
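The spreading step can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Crossbow or NIC firmware code: the names `Packet` and `select_ring` are hypothetical, and CRC32 over the packed address/port tuple stands in for whatever hash a given NIC classifier actually computes. What it shows is the property the lanes rely on: every packet of a connection maps to the same Rx ring, so the lanes never need to share state.

```python
import ipaddress
import zlib
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def select_ring(pkt: Packet, num_rings: int) -> int:
    """Pick an Rx ring (lane) for a packet by hashing its IP addresses and ports.

    Assumption: CRC32 is used only for illustration. Any deterministic hash
    keeps all packets of one connection on the same lane.
    """
    key = (
        ipaddress.ip_address(pkt.src_ip).packed
        + ipaddress.ip_address(pkt.dst_ip).packed
        + pkt.src_port.to_bytes(2, "big")
        + pkt.dst_port.to_bytes(2, "big")
    )
    return zlib.crc32(key) % num_rings


if __name__ == "__main__":
    NUM_RINGS = 8  # same ring count as the Oplin NIC in the measurements
    # 10,000 hypothetical web clients connecting to one server on port 80.
    flows_per_ring = Counter(
        select_ring(
            Packet(f"10.1.{i // 250}.{i % 250 + 1}", "10.0.0.1", 1024 + i, 80),
            NUM_RINGS,
        )
        for i in range(10_000)
    )
    for ring in range(NUM_RINGS):
        print(f"ring {ring}: {flows_per_ring[ring]} flows")
```

Running it prints a roughly even flow count per ring. Because the choice depends only on header fields, it is stateless and needs no per-connection table, which is what lets a NIC do it in hardware.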

To demonstrate the point, I captured a bunch of statistics while putting the final touches on the data path and getting ready to beat some world records. The table below shows mpstat output, along with packets per second serviced, for the Intel Oplin (10gigE) NIC on a Niagara2-based system. The NIC has all 8 Rx/Tx rings enabled and 8 interrupts enabled (one per Rx ring).
CPU minf mjf xcal  intr  ithr  csw icsw migr smtx srw syscl usr sys wt idl
 38    0   0    6    21     3   31    1    5   12   0    86   0   0  0  99
 39    0   0 2563  5506  3907 3282   28   34 1170   0   178   0  21  0  78
 40    0   0 2553  5117  3948 2410   38  150 1192   0   504   1  21  0  77
 41    0   0 2651  5221  4232 2011   25   53 1195   0   210   0  20  0  80
 42    0   0 3078  5700  4743 2069   21   28 1285   0   125   0  22  0  78
 43    0   0 3280  5837  4777 2118   19   24 1328   0   101   0  22  0  78
 44    0   0 3143 19566 18801 1773   50   44 1285   0    68   0  65  0  35
 45    0   0 4570  7748  6838 1984   23   27 1697   0   118   0  29  0  71

# netstat -ia 1
    input   e1000g    output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls 
4       0     1       0     0      61284   0     128820  0     0     
3       0     2       0     0      61015   0     129316  0     0     
4       0     2       0     0      60878   0     128922  0     0  

This link shows the interrupt binding, mpstat and intrstat output. You can see that the NIC is trying very hard to spread the load, but because the stack sees this as one NIC, there is one CPU (number 44) where all 8 threads collide. It's like an 8-lane highway becoming a single lane during rush hour.
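To put a number on that hotspot, here is a small Python sketch that parses the single-lane mpstat sample above and reports the busiest CPU and how far its `sys` time is above the average. The helper names and the max/mean "imbalance" metric are illustrative choices, not part of any existing tool.

```python
MPSTAT_SINGLE_LANE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
38 0 0 6 21 3 31 1 5 12 0 86 0 0 0 99
39 0 0 2563 5506 3907 3282 28 34 1170 0 178 0 21 0 78
40 0 0 2553 5117 3948 2410 38 150 1192 0 504 1 21 0 77
41 0 0 2651 5221 4232 2011 25 53 1195 0 210 0 20 0 80
42 0 0 3078 5700 4743 2069 21 28 1285 0 125 0 22 0 78
43 0 0 3280 5837 4777 2118 19 24 1328 0 101 0 22 0 78
44 0 0 3143 19566 18801 1773 50 44 1285 0 68 0 65 0 35
45 0 0 4570 7748 6838 1984 23 27 1697 0 118 0 29 0 71
"""


def parse_mpstat(text):
    """Turn mpstat text (header line + one line per CPU) into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, map(int, line.split()))) for line in lines[1:]]


def hottest_cpu(rows, column="sys"):
    """Return (cpu_id, value) for the CPU with the largest value in column."""
    row = max(rows, key=lambda r: r[column])
    return row["CPU"], row[column]


def imbalance(rows, column="sys"):
    """Busiest CPU divided by the mean across CPUs; 1.0 means perfectly even."""
    values = [r[column] for r in rows]
    return max(values) / (sum(values) / len(values))


if __name__ == "__main__":
    rows = parse_mpstat(MPSTAT_SINGLE_LANE)
    cpu, sys_pct = hottest_cpu(rows)
    print(f"busiest CPU: {cpu} at {sys_pct}% sys")
    print(f"ithr hotspot: CPU {hottest_cpu(rows, 'ithr')[0]}")
    print(f"sys imbalance (max/mean): {imbalance(rows):.2f}")
```

For this sample CPU 44 comes out on top for both `sys` and `ithr`, with a max/mean ratio of 2.6. Feeding it the per-lane mpstat sample that follows gives a much flatter profile (about 1.27).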

Now let's look at what happens when Crossbow enables a lane all the way up the stack for each Rx ring, and also enables dynamic polling for each one individually. If you look at the corresponding mpstat and intrstat output and the packets-per-second rate, you will see that the lanes really do work independently of each other, resulting in almost linear spreading and many more packets per second serviced. The benchmark represents a webserver workload, and needless to say, Crossbow with dynamic polling on a per-Rx-ring basis almost tripled the performance. The raw stats can be seen here.
CPU minf mjf xcal  intr  ithr  csw icsw migr smtx srw syscl usr sys wt idl
 37    0   0 2507 11906 10272 4267  265  326  489   0   776   4  28  0  68
 38    0   0 2111 11793  9840 6503  336  314  472   0   615   3  32  0  65
 39    0   0  500 10409 10164  565    7  125  174   0  1413   6  23  0  70
 40    0   1  660 10423  9982  950   23  288  272   0  3834   8  34  0  58
 41    0   1  658 10490 10108  847   16  238  237   0  2549   8  29  0  64
 42    0   0  584 10605 10299  708   12  181  207   0  1828   7  26  0  67
 43    0   0  732 10829 10559  598    9  141  193   0  1485   7  25  0  68
 44    0   1  306   487    25 1091   17  282  330   0  4083   9  17  0  74

# netstat -ia 1
     input   e1000g    output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls 
2       0     1       0     0      267619  0     522226  0     0     
2       0     2       0     0      275395  0     539920  0     0     
2       0     2       0     0      251023  0     482335  0     0     
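The two netstat runs can be compared directly. A minimal Python sketch that averages the three one-second `(Total)` samples from each run; note that this is only a ratio of packet rates, which is a different metric from the web server benchmark result quoted in the text.

```python
# "(Total)" input/output packets per one-second netstat sample.
SINGLE_LANE = [(61284, 128820), (61015, 129316), (60878, 128922)]
PER_RING_LANES = [(267619, 522226), (275395, 539920), (251023, 482335)]


def mean_pps(samples):
    """Average (input, output) packets per second over the samples."""
    n = len(samples)
    return (sum(i for i, _ in samples) / n, sum(o for _, o in samples) / n)


before_in, before_out = mean_pps(SINGLE_LANE)
after_in, after_out = mean_pps(PER_RING_LANES)

if __name__ == "__main__":
    print(f"input : {before_in:,.0f} -> {after_in:,.0f} pps ({after_in / before_in:.2f}x)")
    print(f"output: {before_out:,.0f} -> {after_out:,.0f} pps ({after_out / before_out:.2f}x)")
```

On these samples the input rate goes from about 61k to about 265k packets per second (roughly 4.3x) and the output rate from about 129k to about 515k (roughly 4.0x).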
And finally, below are some statistics from the MAC layer's per-Rx-ring data structure (mac_soft_ring_set_t). For each Rx ring, we track the number of packets received via the interrupt path, the number received via the poll path, and, each time we polled the Rx ring, whether the chain was shorter than 10 packets, between 10 and 50, or over 50. You can see that the polling path brings in a larger share of the packets, and in bigger chains.
Crossbow Virtualization Architecture
Keep in mind that for most OSes and most NICs, the interrupt path brings up one packet at a time. This makes the Crossbow architecture more efficient, both for scaling and for performance at higher loads on high-B/W NICs.
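To illustrate how counters like these come about, here is a toy Python model of one Rx ring with dynamic polling. It is a sketch under simplifying assumptions, not the mac_soft_ring_set_t code: the class and field names are hypothetical, the interrupt path delivers exactly one packet (as described above), and the poll thread drains the ring in chains of at most 64 packets (an assumed budget) before re-enabling interrupts.

```python
from dataclasses import dataclass, field


@dataclass
class RxRingStats:
    intr_pkts: int = 0     # packets delivered by the interrupt path
    poll_pkts: int = 0     # packets delivered by the poll thread
    chain_lt10: int = 0    # polls that returned fewer than 10 packets
    chain_10to50: int = 0  # polls that returned 10..50 packets
    chain_gt50: int = 0    # polls that returned more than 50 packets


@dataclass
class RxRing:
    """Toy model of one Rx ring switching between interrupt and poll mode."""
    poll_budget: int = 64  # assumed max chain length per poll
    backlog: int = 0       # packets sitting in the ring
    polling: bool = False  # False: interrupt mode, True: poll mode
    stats: RxRingStats = field(default_factory=RxRingStats)

    def arrive(self, n: int) -> None:
        """n packets land in the ring; in interrupt mode this raises an interrupt."""
        self.backlog += n
        if n and not self.polling:
            self._interrupt()

    def _interrupt(self) -> None:
        # The interrupt path hands up a single packet...
        self.backlog -= 1
        self.stats.intr_pkts += 1
        # ...and if more are waiting, interrupts are turned off and the
        # poll thread takes over the ring.
        if self.backlog:
            self.polling = True

    def poll(self) -> int:
        """Pull one chain of packets; go back to interrupt mode once drained."""
        if not self.polling:
            return 0
        chain = min(self.backlog, self.poll_budget)
        self.backlog -= chain
        self.stats.poll_pkts += chain
        if chain < 10:
            self.stats.chain_lt10 += 1
        elif chain <= 50:
            self.stats.chain_10to50 += 1
        else:
            self.stats.chain_gt50 += 1
        if self.backlog == 0:
            self.polling = False  # ring empty: re-enable interrupts
        return chain

    def drain(self) -> None:
        """Let the poll thread run until the ring is empty."""
        while self.polling:
            self.poll()


if __name__ == "__main__":
    ring = RxRing()
    for _ in range(100):  # light load: one packet at a time
        ring.arrive(1)
        ring.drain()
    for _ in range(100):  # heavy load: bursts of 150 packets
        ring.arrive(150)
        ring.drain()
    print(ring.stats)
```

Under light load every packet takes the interrupt path; under bursts almost everything arrives via the poll path in long chains, which is the same shape as the per-ring statistics described above.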

Crossbow and Network Virtualization

Once we have the ability to create these independent H/W lanes, programming the NIC classifier is easy. Instead of spreading the incoming traffic for scaling, we program the classifier to send packets for a given MAC address to an individual lane. The MAC addresses are tied to individual Virtual NICs (VNICs), which are in turn attached to guest virtual machines or Solaris Containers (Zones). The separation for each virtual machine is driven by the H/W, and the traffic is processed on the CPUs assigned to that virtual machine (the poll thread and interrupts for the VNIC's Rx ring are bound to the assigned CPUs). The picture looks roughly like this:
Crossbow Virtualization Architecture
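The MAC-based steering can be sketched as a lookup table from destination MAC address to lane. This is a toy Python model with hypothetical names (`Lane`, `MacClassifier`), not the NIC or Crossbow API. The two MAC addresses are the vnic1/vnic2 addresses from the show-vnic output in the Virtual Wire workshop further down this page, the CPU lists are made up, and `normalize_mac` handles the way Solaris prints MACs without leading zeros (2:8:20:...).

```python
from dataclasses import dataclass, field


def normalize_mac(mac: str) -> str:
    """'2:8:20:8d:de:b1' -> '02:08:20:8d:de:b1' so both spellings compare equal."""
    return ":".join(f"{int(octet, 16):02x}" for octet in mac.split(":"))


@dataclass
class Lane:
    vnic: str
    cpus: tuple  # CPUs the lane's poll thread and interrupt are bound to
    received: list = field(default_factory=list)


class MacClassifier:
    """Toy model: one classifier entry per VNIC MAC address, each pointing at a lane."""

    def __init__(self):
        self._table = {}

    def program(self, mac: str, lane: Lane) -> None:
        self._table[normalize_mac(mac)] = lane

    def deliver(self, dst_mac: str, frame: bytes):
        """Steer a frame to its lane; return the lane, or None if no entry matches."""
        lane = self._table.get(normalize_mac(dst_mac))
        if lane is not None:
            lane.received.append(frame)
        return lane


if __name__ == "__main__":
    clf = MacClassifier()
    clf.program("2:8:20:8d:de:b1", Lane("vnic1", cpus=(2,)))
    clf.program("2:8:20:4a:b0:f1", Lane("vnic2", cpus=(3,)))
    for mac in ("02:08:20:8d:de:b1", "2:8:20:4a:b0:f1", "2:8:20:0:0:1"):
        lane = clf.deliver(mac, b"payload")
        print(mac, "->", lane.vnic if lane else "no matching lane")
```

The design point the sketch mirrors is that isolation comes from the lookup itself: once a frame is steered, nothing in one lane touches another lane's queue or CPUs.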
Since we always do dynamic polling for NICs and VNICs, enforcing a bandwidth limit is pretty easy. One can create a VNIC by specifying the B/W limit, priority and CPU list in one shot, and the poll thread will enforce the limit by picking up only the packets that fit within it. Something as simple as:
freya(67)% dladm create-vnic -l e1000g0 -p maxbw=100,cpus=2 my_guest_vm
The above command creates a VNIC called my_guest_vm with a random MAC address and assigns it a B/W limit of 100 Mbps. All the processing for this VNIC is tied to CPU 2. It's features like this that make Crossbow an integral part of Sun's Cloud Computing initiative, which is due to roll out soon.
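The idea of the poll thread enforcing the limit can be modeled as a per-poll byte budget. The Python sketch below is hypothetical: the function names, the 1 ms poll interval and the rule of carrying over at most one interval of unused budget are assumptions for illustration, not how the Crossbow kernel code is written. It uses the maxbw=100 value from the command above.

```python
from collections import deque


def bytes_per_poll(maxbw_mbps: int, interval_us: int) -> int:
    """Byte budget for one poll interval at maxbw_mbps megabits per second."""
    return maxbw_mbps * 1_000_000 // 8 * interval_us // 1_000_000


def poll_cycle(ring: deque, budget: int):
    """Take packets (sizes in bytes) from the head of the ring while they fit.

    Packets that do not fit stay in the ring for a later poll.
    Returns (picked_sizes, unused_budget).
    """
    picked = []
    while ring and ring[0] <= budget:
        size = ring.popleft()
        budget -= size
        picked.append(size)
    return picked, budget


def drain_with_limit(ring: deque, maxbw_mbps: int, interval_us: int) -> int:
    """Poll until the ring is empty and return how many poll intervals it took."""
    quantum = bytes_per_poll(maxbw_mbps, interval_us)
    if any(size > quantum for size in ring):
        raise ValueError("packet larger than one poll budget")
    carry = cycles = 0
    while ring:
        _, left = poll_cycle(ring, quantum + carry)
        carry = min(left, quantum)  # never bank more than one interval of budget
        cycles += 1
    return cycles


if __name__ == "__main__":
    ring = deque([1500] * 1000)  # 1000 packets of 1500 bytes = 1.5 MB
    cycles = drain_with_limit(ring, maxbw_mbps=100, interval_us=1000)
    print(f"{cycles} polls of 1 ms -> {1500 * 1000 * 8 / (cycles / 1000) / 1e6:.0f} Mbps")
```

With a 100 Mbps limit each 1 ms poll may take 12,500 bytes, so the 1.5 MB backlog takes 120 polls, i.e. it is delivered at 100 Mbps no matter how fast it arrived. Packets beyond the budget are simply left in the ring, which is the "pick up only what meets the limit" behavior described in the text.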

Anyway, this should give you a flavour. There is a white paper and more detailed documents (including how to get started) at the Crossbow OpenSolaris page.



(2008-12-14 16:07:10.0)

Tuesday March 04, 2008

Virtual Wire: Network in a Box (Sun Tech Day in Hyderabad)


I did a session for developers during Sun Tech Day in Hyderabad. Raju Alluri had printed out 100 copies of the workshop, and we were carrying 100 DVDs with Crossbow ISO images (they are available on the web here). People just loved it. We had sooo underestimated the demand that the printouts and DVDs disappeared in less than a minute. I had a presentation with 30-odd slides, but I couldn't even get past slide 7 because people found the workshop so interesting. And between the Tech Day presentation and the user group meeting in the evening, people pointed out a lot of interesting uses and why this can be such a powerful thing.

The idea that you can create any arbitrarily complex physical network as a virtual wire and run your favorite workload, do performance analysis and debug it is very appealing to people. Remember that we are not simulating the network. This is the real thing, i.e. real applications running and real packets flowing. If your application runs on any OS, it will run on this virtual network and will send and receive real packets!!

The concept is pretty useful even to people like us, because now we don't need to pester our lab staff to create a network for us to test or experiment on. And the best part is, we can use xVM to run Linux and Windows as hosts as well.

We are thinking of writing a book which reinvents how you learn networking in schools and universities. And oh by the way, do people really care about CCNA now that they can do all this on their laptop :) If someone is interested in contributing real examples for this workshop module and the book, you are more than welcome. Just drop us a line.
(2008-03-04 18:05:48.0)

Friday February 29, 2008

Virtual Wire: Network in a Box (Creating a real Network on your Laptop)

Crossbow: Network Virtualization & Resource Control




Objective

Create a real network comprising hosts, switches and routers as a Virtual Network on a laptop. The Virtual Network (called a Virtual Wire) is created using OpenSolaris project Crossbow technology, and the hosts and router are created using Solaris Zones (a lightweight virtualization technology). All the steps necessary to create the virtual topology are explained.

Users can work through this hands-on demo/workshop, and the exercises at the end, to build expertise in
  • Configuring IPv4 and IPv6 networks
  • Working hands-on with OpenSolaris
  • Configuring and managing a real router
  • IP routing technologies, including RIP, OSPF and BGP
  • Debugging configuration and connectivity issues
  • Network performance and bottleneck analysis
The users of this module need not have access to a real network, router and switches. All they need is a laptop or desktop running OpenSolaris Project Crossbow snapshot 2/28/2008 or later which can be found at http://www.opensolaris.org/os/project/crossbow/snapshots.

Introduction

Crossbow (Network Virtualization and Resource Control) allows users to create a Virtual Wire with fixed link speeds in a box. Multiple subnets connected via a virtual router are pretty easy to configure. This allows network administrators to do a full network configuration and verify IP addresses, subnet masks, and router ports and addresses. They can test connectivity and link speeds and, when fully satisfied, instantiate the configuration on the real network.

Another great application is debugging problems by recreating a real network in a box. If network administrators are having issues with connectivity or performance, they can create a virtual network and debug their issues using snoop, kernel stats and DTrace. They don't need expensive H/W-based network analyzers.

The network developers and researchers working with protocols (like high speed TCP) can use OpenSolaris to write their implementation and then try it out with other production implementations. They can debug and fine tune their protocol quite a bit before sending even a single packet on the real network.

Note 1: Users can use Solaris Zones, Xen or LDom guests to create the virtual hosts, while Crossbow provides the virtual network building blocks. There is no simulation; real protocol code is at work. Users run real applications on the hosts and clients, which generate real packets.

Note 2: The Solaris protocol code executed for a virtual network is the same code that runs when Solaris acts as a real router or host, all the way down to the bottom of the MAC layer. In the case of virtual networks, the device driver code for a physical NIC is the only code that is not needed.

Try it Yourself

Let's do a simple exercise. As part of this exercise, you will learn
  • How to configure a virtual network having two subnets and connected via a Virtual Router using Crossbow and Zones
  • How to set the various link speeds to simulate multiple speed network
  • How to do some performance runs to verify connectivity
What you need:

A laptop or machine running Crossbow snapshot from Feb 28, 2008 or later http://www.opensolaris.org/os/project/crossbow/snapshots/

Virtual Network Example

Let's take a physical network. The example in Fig 1a represents the real network showing how my desktop connects to the lab servers. The desktop is on the 20.0.0.0/24 network, while the server machines (host1 and host2) are on the 10.0.0.0/24 network. In addition, host1 has a 10/100 Mbps NIC, limiting its connectivity to 100 Mbps.


Fig. 1a

We will represent the network shown in Fig 1a on my Crossbow enabled laptop as a Virtual Network. We use Zones to act as host1, host2 and the Router while the global zone (gz) acts as the client (as a user exercise, create another client zone and assign VNIC6 to it to act as a client).

Fig. 1a



Note 3: The Crossbow MAC layer itself does the switching between the VNICs. The etherstub is created as a dummy device to connect the various virtual NICs. You can think of an etherstub as a virtual switch: to visualize the virtual network, replace each physical switch in the physical network with a virtual switch (implemented by a Crossbow etherstub).

Create the Virtual Network

Let's start by creating the two etherstubs using the dladm command:
gz# dladm create-etherstub etherstub1
gz# dladm create-etherstub etherstub3
gz# dladm show-etherstub
LINK
etherstub1
etherstub3

Create the necessary virtual NICs. VNIC1 represents host1's link, which is limited to 100 Mbps, while the others have no limit.
gz# dladm create-vnic -l etherstub1 vnic1
gz# dladm create-vnic -l etherstub1 vnic2
gz# dladm create-vnic -l etherstub1 vnic3

gz# dladm create-vnic -l etherstub3 vnic6
gz# dladm create-vnic -l etherstub3 vnic9
gz# dladm show-vnic
LINK        OVER             SPEED  MACADDRESS         MACADDRTYPE       
vnic1       etherstub1      - Mbps  2:8:20:8d:de:b1    random            
vnic2       etherstub1      - Mbps  2:8:20:4a:b0:f1    random            
vnic3       etherstub1      - Mbps  2:8:20:46:14:52    random            
vnic6       etherstub3      - Mbps  2:8:20:bf:13:2f    random            
vnic9       etherstub3      - Mbps  2:8:20:ed:1:45     random            
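Since the whole Virtual Wire above is just a set of etherstubs with VNICs hanging off them, the topology can also be kept as data and turned into the same commands. A minimal Python sketch, assuming the names used in this workshop; `TOPOLOGY` and `dladm_commands` are hypothetical helpers, and the script only prints the commands (a dry run) rather than executing them.

```python
import shlex

# Each etherstub plays the role of one physical switch in Fig. 1a;
# each VNIC is a port on that switch.
TOPOLOGY = {
    "etherstub1": ["vnic1", "vnic2", "vnic3"],  # 10.0.0.0/24: host1, host2, vRouter
    "etherstub3": ["vnic6", "vnic9"],           # 20.0.0.0/24: client, vRouter
}


def dladm_commands(topology):
    """Return the dladm command lines that build the topology, in order."""
    cmds = [["dladm", "create-etherstub", stub] for stub in topology]
    for stub, vnics in topology.items():
        cmds += [["dladm", "create-vnic", "-l", stub, vnic] for vnic in vnics]
    return [shlex.join(cmd) for cmd in cmds]


if __name__ == "__main__":
    for line in dladm_commands(TOPOLOGY):
        print(line)
```

The output is the same two create-etherstub and five create-vnic commands used above, so it can be reviewed and then pasted into a root shell on a Crossbow-enabled machine. Adding another subnet then means adding one entry to the dictionary.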

Create the hosts and assign them the VNICs. Also create the Virtual Router and assign it VNIC3 and VNIC9 over etherstub1 and etherstub3 respectively. Both the Virtual Router and Hosts are created using Zones in this example but you can easily use Xen or logical domains.

Create a base zone which we can clone. The first part (creating the /vnm dataset for the zone roots) is necessary if you are on a ZFS file system.
gz# zfs create -o mountpoint=/vnm rpool/vnm
gz# chmod 700 /vnm

gz# zonecfg -z vnmbase
vnmbase: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vnmbase> create
zonecfg:vnmbase> set zonepath=/vnm/vnmbase
zonecfg:vnmbase> set ip-type=exclusive
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/opt
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> verify
zonecfg:vnmbase> commit
zonecfg:vnmbase> exit

The install step takes 15-20 minutes:
gz# zoneadm -z vnmbase install

Now let's create the two hosts and the virtual router as follows:
gz# zonecfg -z host1
host1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host1> create
zonecfg:host1> set zonepath=/vnm/host1
zonecfg:host1> set ip-type=exclusive
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/opt
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add net
zonecfg:host1:net> set physical=vnic1
zonecfg:host1:net> end
zonecfg:host1> verify
zonecfg:host1> commit
zonecfg:host1> exit

gz# zoneadm -z host1 clone vnmbase
gz# zoneadm -z host1 boot

gz# zlogin -C host1

Connect to the console and go through the sysid configuration. For this example, we assign 10.0.0.1/24 as the IP address for vnic1; you can specify this during sysidcfg. Specify 10.0.0.3 as the default route. You can say 'none' for the naming service, IPv6, Kerberos etc. for the purposes of this example.

Similarly, create host2 and configure it with vnic2:
gz# zonecfg -z host2
host2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host2> create
zonecfg:host2> set zonepath=/vnm/host2
zonecfg:host2> set ip-type=exclusive
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/opt
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add net
zonecfg:host2:net> set physical=vnic2
zonecfg:host2:net> end
zonecfg:host2> verify
zonecfg:host2> commit
zonecfg:host2> exit

gz# zoneadm -z host2 clone vnmbase
gz# zoneadm -z host2 boot

gz# zlogin -C host2

Connect to the console and go through the sysid configuration. For this example, we assign 10.0.0.2/24 as the IP address for vnic2; you can specify this during the sysid configuration. Specify 10.0.0.3 as the default route. You can answer 'none' for the naming service, IPv6, Kerberos, etc. for the purpose of this example.

Let's now create the Virtual Router:
gz# zonecfg -z vRouter
vRouter: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vRouter> create
zonecfg:vRouter> set zonepath=/vnm/vRouter
zonecfg:vRouter> set ip-type=exclusive
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/opt
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic3
zonecfg:vRouter:net> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic9
zonecfg:vRouter:net> end
zonecfg:vRouter> verify
zonecfg:vRouter> commit
zonecfg:vRouter> exit

gz# zoneadm -z vRouter clone vnmbase
gz# zoneadm -z vRouter boot

gz# zlogin -C vRouter

Connect to the console and go through the sysid configuration. For this example, we assign 10.0.0.3/24 as the IP address for vnic3 and 20.0.0.1/24 as the IP address for vnic9; you can specify these during the sysid configuration. Specify 'none' as the default route. You can answer 'none' for the naming service, IPv6, Kerberos, etc. for the purpose of this example. Let's enable forwarding on the Virtual Router to connect the 10.x.x.x and 20.x.x.x networks:
vRouter# svcadm enable network/ipv4-forwarding:default

Note 5: The above is done inside the virtual router. Make sure you are in the window where you ran zlogin -C vRouter above.
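The three zone configurations above differ only in the zone name and the VNICs, so they are easy to script. Below is a minimal sketch that generates a zonecfg command file per zone, assuming you then apply each file with zonecfg's -f option (for example zonecfg -z host1 -f host1.cfg) and clone and boot as before. The helper zonecfg_commands and the ZONES table are hypothetical and simply mirror this example's layout. It gives each inherited directory its own inherit-pkg-dir resource, on the assumption that a second set dir= inside the same resource would just overwrite the first.

```python
# Hypothetical helper: builds a zonecfg command file for one exclusive-IP zone,
# mirroring the interactive sessions above.
def zonecfg_commands(name, vnics, inherit_dirs=("/opt", "/etc/crypto"), base="/vnm"):
    lines = ["create", f"set zonepath={base}/{name}", "set ip-type=exclusive"]
    for d in inherit_dirs:
        # one inherit-pkg-dir resource per directory
        lines += ["add inherit-pkg-dir", f"set dir={d}", "end"]
    for vnic in vnics:
        # each VNIC becomes a net resource of the exclusive-IP zone
        lines += ["add net", f"set physical={vnic}", "end"]
    lines += ["verify", "commit"]
    return "\n".join(lines) + "\n"


# Zone-to-VNIC assignment used in this example
ZONES = {"host1": ["vnic1"], "host2": ["vnic2"], "vRouter": ["vnic3", "vnic9"]}

if __name__ == "__main__":
    for zone, vnics in ZONES.items():
        print(f"# save as {zone}.cfg, then run: zonecfg -z {zone} -f {zone}.cfg")
        print(zonecfg_commands(zone, vnics))
```

Adding a host3 client later is then a one-line change to ZONES.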

Now let's bring up VNIC6 and configure it, including setting up routes, in the global zone. You can easily create another host called host3 as the client on the 20.x.x.x network by creating a host3 zone and assigning it an unused address on 20.0.0.0/24 (20.0.0.1 is already taken by vnic9 on the vRouter).

Let's configure VNIC6. Open an xterm in the global zone:
gz# ifconfig vnic6 plumb 20.0.0.3/24 up
gz# route add 10.0.0.0 20.0.0.1
gz# ping 10.0.0.1
10.0.0.1 is alive
gz# ping 10.0.0.2
10.0.0.2 is alive

Similarly, log in to host1 and/or host2 and verify connectivity:
host1# ping 20.0.0.3
20.0.0.3 is alive
host1# ping 10.0.0.2
10.0.0.2 is alive
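The pings succeed because each node picks the route with the longest matching prefix: the global zone reaches the hosts through the vRouter, the hosts send everything off-subnet to their default route 10.0.0.3, and the vRouter is directly attached to both subnets. The sketch below models those routing tables with Python's ipaddress module. The tables are a hypothetical reconstruction of this example's configuration; in particular, the global zone's route to the hosts is modeled as 10.0.0.0/24, the subnet used for the hosts.

```python
import ipaddress

net = ipaddress.ip_network
addr = ipaddress.ip_address

# Hypothetical model of the routing tables in this example.
# Each entry: (destination network, next hop or None if directly connected, interface)
GZ_ROUTES = [
    (net("20.0.0.0/24"), None, "vnic6"),              # ifconfig vnic6 plumb 20.0.0.3/24
    (net("10.0.0.0/24"), addr("20.0.0.1"), "vnic6"),  # route to the hosts via the vRouter
]
VROUTER_ROUTES = [
    (net("10.0.0.0/24"), None, "vnic3"),
    (net("20.0.0.0/24"), None, "vnic9"),
]
HOST1_ROUTES = [
    (net("10.0.0.0/24"), None, "vnic1"),
    (net("0.0.0.0/0"), addr("10.0.0.3"), "vnic1"),    # default route from sysid config
]


def lookup(routes, dst):
    """Return the matching route with the longest prefix, or None if nothing matches."""
    ip = addr(dst)
    matches = [r for r in routes if ip in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen, default=None)


if __name__ == "__main__":
    for table, dst in [(GZ_ROUTES, "10.0.0.2"), (HOST1_ROUTES, "20.0.0.3"),
                       (VROUTER_ROUTES, "10.0.0.1")]:
        network, hop, ifname = lookup(table, dst)
        print(f"{dst}: matches {network} on {ifname}, next hop {hop or 'direct'}")
```

If a ping fails, walking the same lookup by hand on each hop (gz, vRouter, host) is usually the quickest way to find the missing route.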

Set up Link Speed

What we configured above are links with unlimited bandwidth. We can configure a link speed on any of the links. For this example, let's configure a link speed of 100 Mbps on VNIC1:
gz# dladm set-linkprop -p maxbw=100 vnic1

We could also have configured the link speed (or bandwidth limit) while creating the VNIC itself, by adding the option -p maxbw=100 to the create-vnic command.
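To get a feel for what a 100 Mbps cap means in practice, here is a generic token-bucket limiter simulated in Python. It is a sketch of a common rate-limiting technique, not Crossbow's implementation of maxbw; the packet size, burst allowance and the names TokenBucket and simulate are assumptions chosen for illustration.

```python
class TokenBucket:
    """Generic token-bucket limiter: tokens are bytes, refilled at the configured rate."""

    def __init__(self, rate_mbps, burst_bytes):
        self.rate = rate_mbps * 1_000_000 / 8   # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def allow(self, size, now):
        # Refill for the time elapsed since the last call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False   # over the limit: the packet has to wait (or be dropped)


def simulate(limit_mbps, offered_mbps, seconds=1.0, pkt=1500, burst=15000):
    """Offer fixed-size packets at offered_mbps; return the rate that got through, in Mbps."""
    bucket = TokenBucket(limit_mbps, burst)
    gap = pkt * 8 / (offered_mbps * 1_000_000)   # seconds between packet arrivals
    sent = sum(pkt for i in range(int(seconds / gap)) if bucket.allow(pkt, i * gap))
    return sent * 8 / seconds / 1_000_000


if __name__ == "__main__":
    print(f"{simulate(100, 1000):.1f} Mbps delivered of 1000 Mbps offered")
    print(f"{simulate(100, 50):.1f} Mbps delivered of 50 Mbps offered")
```

Offering 1000 Mbps yields roughly the 100 Mbps limit (slightly more at first because of the burst allowance), while traffic already below the limit passes untouched.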

Test the performance

Start 'netserver' (or a tool of your choice) in host1 and host2. You will have to install the tools in the relevant places first.
host1# /opt/tools/netserver &
host2# /opt/tools/netserver &

gz# /opt/tools/netperf -H 10.0.0.2
TCP STREAM TEST to 10.0.0.2 : histogram

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 49152  49152  49152    10.00    2089.87  

gz# /opt/tools/netperf -H 10.0.0.1
TCP STREAM TEST to 10.0.0.1 : histogram
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 49152  49152  49152    10.00     98.78   

Note 6: Since 10.0.0.2 is assigned to VNIC2, which has no limit, we get the maximum speed possible. 10.0.0.1 is configured over VNIC1, which is assigned to host1 and on which we just set a link speed of 100 Mbps; that's why we get only 98.78 Mbps.

Cleanup

gz# zoneadm -z host1 halt
gz# zoneadm -z host1 uninstall

Delete the zone:
gz# zonecfg -z host1
zonecfg:host1> delete
Are you sure you want to delete zone host1 (y/[n])? y
zonecfg:host1> exit

Delete the host2 and vRouter zones the same way. Make sure you don't delete vnmbase, since re-creating it takes time. Then unplumb vnic6 in the global zone:
gz# ifconfig vnic6 unplumb

After you have deleted the zones, you can delete the VNICs and etherstubs as follows:
gz# dladm delete-vnic vnic1			# delete the VNICs
gz# dladm delete-vnic vnic2
gz# dladm delete-vnic vnic3
gz# dladm delete-vnic vnic6
gz# dladm delete-vnic vnic9

gz# dladm delete-etherstub etherstub3		# delete the etherstubs
gz# dladm delete-etherstub etherstub1

Make sure the VNICs are unplumbed (ifconfig vnic6 unplumb) and not assigned to a zone (delete the zone first) before you delete them. You need to delete all the VNICs on an etherstub before you can delete the etherstub.
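These ordering rules form a small dependency graph, so a valid cleanup order is just a topological sort. The sketch below uses Python's standard graphlib module. The CLEANUP table is a hypothetical model of this example; the VNIC-to-etherstub placement is inferred from the addressing (10.0.0.0/24 on etherstub1, 20.0.0.0/24 on etherstub3), and "delete zone X" stands for the halt, uninstall and delete steps shown above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency model of the cleanup: each step maps to the steps
# that must be finished before it can run.
CLEANUP = {
    "dladm delete-etherstub etherstub1": {
        "dladm delete-vnic vnic1", "dladm delete-vnic vnic2", "dladm delete-vnic vnic3"},
    "dladm delete-etherstub etherstub3": {
        "dladm delete-vnic vnic6", "dladm delete-vnic vnic9"},
    "dladm delete-vnic vnic1": {"delete zone host1"},
    "dladm delete-vnic vnic2": {"delete zone host2"},
    "dladm delete-vnic vnic3": {"delete zone vRouter"},
    "dladm delete-vnic vnic9": {"delete zone vRouter"},
    "dladm delete-vnic vnic6": {"ifconfig vnic6 unplumb"},   # vnic6 lives in the global zone
}


def cleanup_order(deps):
    """Return the steps so that every prerequisite comes before the step that needs it."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step in cleanup_order(CLEANUP):
        print(step)
```

Any order the sorter returns satisfies the rules above; the one in the text (zones, then VNICs, then etherstubs) is one such order.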

User Exercises

Now that you are familiar with the concepts and technology, you are ready to do some experiments of your own. Clean up the machine as described above. The exercises below will help you master IP routing, configuring networks, and debugging performance bottlenecks.
  • Recreate the virtual network as shown in Fig 1b, but this time create an additional zone called client and assign vnic6 to that client zone.
    	client Zone		vRouter		host1		host2
    		|		  |  |		  |		  |
    		---- etherstub3 ---  -------- etherstub 1----------
    
    Run all your connectivity tests after logging in to the client with zlogin. Now change all IPv4 addresses to IPv6 addresses and verify that the client and hosts still have connectivity.
  • Leave the virtual network as in 1, but configure OSPF in vRouter instead of the default RIP. Verify that you still have connectivity. Note the steps needed to configure OSPF.
  • Configure the 20.0.0.0 and 10.0.0.0 networks as two separate autonomous systems, assign them unique AS numbers and configure unique BGP domains. Verify that connectivity still works. Note the steps needed to configure the BGP domains.
  • Clean up everything and recreate the virtual network in 1 above, but instead of statically assigning IP addresses to the hosts and clients, configure a DHCP server on the vRouter to hand out addresses on subnet 10.0.0.0/24 on vnic3 and on 20.0.0.0/24 on vnic9. While creating the hosts and clients, configure them to get their IP addresses through DHCP.
  • Clean up everything and recreate the virtual network in 1 above. Add an additional router, vRouter2, which has a VNIC on each of the two etherstubs.
    			 vRouter1
    			/ 	 \
    	    20.0.0.0/24 	  10.0.0.0/24
    			\	 /
    			 vRouter2
    
    
    This provides a redundant path from the client to the hosts. Experiment with running different routing protocols, assign a different weight to each path and see which path you take from client to host (use traceroute to find out). Now configure the routing protocol on both vRouters to be OSPF, play with the link speeds and see how the path changes. Note the configuration and observations.
  • Clean up. Let's now introduce another Virtual Router between the two subnets, i.e.
    client Zone		vRouter1	vRouter2	host1	     host2
    	|		  |  | 		 |    |		  |	       |
    	---- etherstub3 ---  -etherstub 2-    -----etherstub 1----------
    	    20.0.0.0/24	      30.0.0.0/24	   10.0.0.0/24
    
    Now set the link (VNIC) between vRouter1 and etherstub2 to 75 Mbps. Use SNMP from the client to retrieve the stats from vRouter1 and check where the packets are getting dropped when you run netperf from the client to host2.

    Remove the limit set earlier and instead set a link speed of 75 Mbps on the link between etherstub2 and vRouter2. Again use SNMP to get the stats from vRouter1. Do you see results similar to the previous run? If not, can you explain why?
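For the link-speed experiments with two vRouters, it helps to predict which path OSPF should pick. The sketch below assumes the common convention that an interface's OSPF cost is the reference bandwidth divided by the link bandwidth (never below 1) and runs Dijkstra over the redundant topology from the exercise. The reference bandwidths, the treatment of unlimited VNICs as 1000 Mbps, and all function names are assumptions; check your routing daemon's documentation for its actual reference bandwidth and rounding.

```python
import heapq


def ospf_cost(link_mbps, reference_mbps):
    # Assumed convention: cost = reference bandwidth / link bandwidth, never below 1.
    return max(1, reference_mbps // link_mbps)


def best_path(links, src, dst, reference_mbps):
    """Dijkstra over an undirected graph of (a, b, link_mbps); returns (total_cost, path)."""
    graph = {}
    for a, b, mbps in links:
        cost = ospf_cost(mbps, reference_mbps)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue, done = [(0, src, [src])], set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if node in done:
            continue
        done.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (dist + cost, nxt, path + [nxt]))
    return None


# Redundant topology from the exercise; unlimited VNICs treated as 1000 Mbps (assumption).
LINKS = [
    ("20.0.0.0/24", "vRouter1", 1000), ("vRouter1", "10.0.0.0/24", 1000),
    ("20.0.0.0/24", "vRouter2", 1000), ("vRouter2", "10.0.0.0/24", 1000),
]

if __name__ == "__main__":
    slow = [("20.0.0.0/24", "vRouter1", 75)] + LINKS[1:]   # 75 Mbps on one vRouter1 link
    for ref in (100, 1000):
        print(ref, best_path(slow, "20.0.0.0/24", "10.0.0.0/24", ref))
```

With a 100 Mbps reference, 75 Mbps and 1000 Mbps links both cost 1, so slowing one link does not move traffic; with a 1000 Mbps reference the slow link costs 13 and the path shifts to vRouter2.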

Conclusion and More resources

Work through the example above and configure the virtual network to get familiar with the techniques used. Then have a look at your own network and try to create a virtual version of it.

Get more details on the OpenSolaris Crossbow page http://www.opensolaris.org/os/project/crossbow

You can find high level presentations, architectural documents, man pages etc at http://www.opensolaris.org/os/project/crossbow/Docs

Join the crossbow-discuss@opensolaris.org mailing list at http://www.opensolaris.org/os/project/crossbow/discussions

Send in your questions or your configuration samples and we will put it in the use cases examples.

A similar Virtual Network example using global zone as a NAT can be found on Nicolas's blog at http://blogs.sun.com/droux

Kais has an example of dynamic bandwidth partitioning at http://blogs.sun.com/kais

Venu talks about some of the cool crossbow features at http://blogs.sun.com/iyer which allows virtualizing services with Crossbow technology using flowadm.

(2008-02-29 02:59:01.0) Permalink Comments [2]

20060824 Thursday August 24, 2006

CrossBow: Solaris Network Virtualization & Resource Control

1. CrossBow (the name):

It makes sense to explain the relationship between the technology (Network Virtualization and Resource Control) and the project name (CrossBow). The crossbow is believed to have been invented in China around 341 B.C., but its use became prevalent in the Middle Ages, especially once steel was used to make the weapon. More powerful crossbows could penetrate armour at 200 yards and gave the typical horse-mounted knight real nightmares. But the biggest differentiator was their simplicity of use: a crossbow could be used effectively after a week of training, while comparable single-shot skill with a longbow could take years of practice.

Similarly, if you look at the existing QOS mechanisms on an end host, they are very difficult to use and normally need a very skilled administrator to use them effectively. Even then, the existing QOS mechanisms come with heavy performance penalties, which is also pretty common with any kind of virtualization. In Solaris land, we have invented a new way of imposing bandwidth resource control as an attribute of a real or virtual NIC, such that it is built into the Solaris network stack and comes without any performance penalty. Since the virtualization and resource control aspects are just attributes of the NIC/VNIC (specified when a NIC or virtual NIC is created), a normal user can configure them without needing a doctorate in QOS or virtualization. "CrossBow" was the most suitable name for this project since we are trying to achieve results in the field of virtualization and resource control similar to what the weapon achieved on the battlefield in medieval times.

2. CrossBow (the background):

Crossbow provides the building blocks for network virtualization and resource control by creating virtual stacks around any service (HTTP, HTTPS, FTP, NFS, etc.), protocol (TCP, UDP, SCTP, etc.), or Virtual machines like Containers, Xen and ldoms.

The project allows the system administrator to carve any physical NIC into multiple virtual NICs, which are very similar to real NICs and are administered just like them. Each virtual NIC can be assigned its own priority and bandwidth on a shared NIC without causing any performance degradation. The virtual NICs can have their own NIC hardware resources (Rx/Tx rings, DMA channels), MAC addresses, kernel threads and queues, which are private to the VNIC and are not shared across all traffic. In the case of Solaris Containers, the Container can also be assigned a virtual stack instance along with one or more virtual NICs. As such, traffic for one VNIC can be totally isolated from other traffic and assigned any kind of limit or guarantee on the amount of bandwidth it can use.

3. Overview:

Project Crossbow extends Solaris reach in several markets.

3a. OS/Network/Server Consolidation:


These are the application, network and server consolidation environments, where both OS and network virtualization play a big role. This market is typically driven by the cost of owning and managing physical machines and physical networks. The sweet spot for these horizontally scaled environments is 2-4 socket machines, which appear as 4-8 CPU machines in the case of x86/x64 systems and as 32-64 CPU machines in the case of Sun's new Niagara based servers. From a total cost of ownership perspective, these blades have only one physical NIC (1Gb or 10Gb) but are trying to run multiple virtual machines (Xen, Containers, ldoms), which have to share the NIC resources and the available bandwidth.

The problem gets worse because for 3 decades we have been designing applications to go as fast as possible, and any congestion control is the job of the transport layer (if at all). So if one virtual machine is using UDP based traffic, other virtual machines on the same system using TCP traffic will suffer badly. Even within the same transport (TCP for instance), bulk throughput applications like ftp/http will have a very negative impact on interactive traffic and latency sensitive applications.

The goal of project Crossbow is to let different virtual machines share the common NIC in a fair manner and to allow system administrators to set preferential policies where necessary (e.g. an ISP selling limited bandwidth on a common pipe) without any performance impact.
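"Fair" can be made precise in several ways; one standard definition is max-min fairness, where no VM can get more bandwidth without taking it from a VM that already has less. The sketch below computes a max-min fair split of a shared link by water-filling. It illustrates the goal described above and is not a description of how Crossbow itself schedules packets; the VM names and demand figures are made up.

```python
def max_min_fair(capacity_mbps, demands):
    """Max-min fair split of a shared link: demands maps VM name -> wanted Mbps."""
    alloc = {vm: 0.0 for vm in demands}
    unsatisfied = dict(demands)
    left = float(capacity_mbps)
    while unsatisfied and left > 0:
        share = left / len(unsatisfied)
        small = [vm for vm, want in unsatisfied.items() if want <= share]
        if not small:
            # Everyone left wants more than an equal share: split what remains evenly.
            for vm in unsatisfied:
                alloc[vm] = share
            break
        for vm in small:
            # Fully satisfy the small demands and return their leftover to the pool.
            alloc[vm] = float(unsatisfied.pop(vm))
            left -= alloc[vm]
    return alloc


if __name__ == "__main__":
    # A UDP-heavy VM competing with a web VM and an interactive VM on a 1 Gb NIC.
    print(max_min_fair(1000, {"udp-vm": 900, "web-vm": 300, "ssh-vm": 10}))
```

Here the small web and ssh demands are met in full and the UDP-heavy VM gets what is left (690 Mbps) instead of starving the others.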

3b. Traditional QOS and application consolidation:

Existing host based QOS mechanisms are very complex to set up and typically come with a sizable performance penalty and an increase in latency. A big part of the problem is the interrupt based delivery mechanism for inbound packets and the QOS being implemented by a separate layer (typically between the NIC driver and IP). The network and transport layers of the host stack are unaware of the QOS layer. The packets have already been delivered to host memory by means of interrupts, and the QOS layer needs to classify them into various queues before it can apply the policies. If a packet cannot be processed because the bandwidth usage for its class has been exceeded, it sits in a queue while still consuming system memory.

Project Crossbow integrates stack virtualization and QOS into the stack architecture itself to offer a large subset of QOS type functionality at zero performance penalty and with simple administrative interfaces. It also integrates diffserv with the stack, so that a virtual NIC can set and read diffserv based labels. Since the Crossbow architecture differentiates traffic based on layer 2, 3 and 4 headers only, i.e. the VLAN tag, local MAC address, local IP address, protocol and ports, the functionality offered is a subset of existing QOS mechanisms, although it covers 90% of the use cases without any performance penalty. This is the prime reason why project Crossbow refers to the bandwidth related policies as 'Bandwidth resource control' instead of QOS.
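To make the limited header set concrete, here is a sketch of a classifier that matches inbound packets only on the fields listed above (VLAN tag, local MAC address, local IP address, protocol and local port). The Packet and FlowRule classes, the first-match-wins ordering and the "default" class are illustrative assumptions, not Crossbow's data structures; the rule names borrow the policy-ids from the netrcm examples in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Inbound packet: the 'local' side is the destination."""
    vlan: Optional[int]
    dst_mac: str
    dst_ip: str
    proto: str
    dst_port: Optional[int]


@dataclass(frozen=True)
class FlowRule:
    """A rule over L2/L3/L4 header fields only; None means 'any'."""
    name: str
    vlan: Optional[int] = None
    local_mac: Optional[str] = None
    local_ip: Optional[str] = None
    proto: Optional[str] = None
    local_port: Optional[int] = None

    def matches(self, p: Packet) -> bool:
        pairs = [(self.vlan, p.vlan), (self.local_mac, p.dst_mac),
                 (self.local_ip, p.dst_ip), (self.proto, p.proto),
                 (self.local_port, p.dst_port)]
        return all(want is None or want == got for want, got in pairs)


def classify(rules, packet, default="default"):
    """First matching rule wins, so list the most specific rules first."""
    for rule in rules:
        if rule.matches(packet):
            return rule.name
    return default


RULES = [
    FlowRule("https-1", proto="tcp", local_port=443),
    FlowRule("limit-udp-1", proto="udp"),
]

if __name__ == "__main__":
    pkt = Packet(None, "0:e0:81:27:d4:47", "10.0.0.1", "tcp", 443)
    print(classify(RULES, pkt))
```

Anything that would need payload inspection (URLs, application headers) cannot be expressed with these fields, which is the "subset of QOS" trade-off described above.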

3c. Horizontally scaled markets:

This is the market segment made up of low priced volume servers (typically 2-4 socket machines) offering services that require little or no sharing of data between them. The small servers can be standalone machines in a rack or blades in a chassis. Grids are another way to use volume servers to achieve the output of traditional large SMP machines or mainframes.

In the case of blades which share a common 10Gb NIC in the chassis, Crossbow again provides fair sharing of the bandwidth. In addition, the Crossbow APIs for network management, virtualization and bandwidth resource control can be used by 3rd party management software to propagate a common policy throughout the server farm or to all the blades in the chassis. In Solaris based homogeneous environments, it's very easy to mark an application or a virtual machine (based on port or IP address) as critical and propagate the same policy to all the machines. The diffserv labels can be added appropriately such that the policy is honoured by all machines and network elements in the data center.

4. Technical problems in existing architectures:

As mentioned earlier, host based QOS systems work as a separate layer between the NIC driver and the network stack, and as such are pretty inefficient at providing the QOS services required of them. But that is not all.

The existing interrupt driven packet delivery model precludes any kind of policy enforcement and fair sharing. When a NIC interrupt is raised, it runs at the highest priority and the CPU has to switch away from whatever it is processing to deal with the interrupt. Most of the time, the processing of a critical packet is interrupted to deal with the arrival of a non-critical packet.

The anonymous packet processing in the kernel is another major problem in virtualizing the stack and enforcing any kind of bandwidth resource control (including fairness). 80% of the work is already done for an incoming packet when the stack determines that no one is actually interested in the packet and it needs to drop it. In other words, the cost of dropping unwanted packets is too high.

Everything in the host flows through common queues and is processed by common threads which make enforcing policies based on traffic type very difficult. Recv or xmit of each packet impacts processing on any other packet on that particular CPU.

In most virtualized environments, the pseudo NIC in the virtual machine has no way of knowing the capabilities of the real hardware (even simple things like hardware checksum) because of the bridge in between, which ends up hurting performance. In addition, there is no mechanism to share the NIC in a fair manner. The transition of a typical packet from dom0 to domU also causes severe performance problems.

5. CrossBow Architecture:

The Crossbow architecture starts out by integrating network virtualization and resource control as part of the stack architecture. The Solaris 10 network stack has already been designed for the next decade where the connection to CPU affinity is maintained and the upper stack has tight control over the NIC resources.

Crossbow builds on top of that by pushing the classification of packets based on services, protocols or virtual machines as far down as possible. If the NIC hardware itself can divide its onboard memory into segments/queues (known as Rx and Tx rings), which preferably have their own DMA channels and MSI-X interrupts, the stack programs the NIC classifier to steer packets to different Rx rings based on the configured policies. Each Rx/Tx ring is owned by a CPU and by a separate kernel queue, known as a serialization queue (squeue), which controls the rate of packet arrival into the system based on the configured bandwidth.

The Rx/Tx ring, the associated DMA channel, MSI-X interrupt, the serialization queue, the CPU, and processing threads are all unique to the service, protocol or virtual machine in question and can be assigned a unique MAC address and a Virtual NIC, which becomes the administrative entity that can be administered like a normal NIC. The NIC classifier steers incoming packets to the correct Rx ring, from where the squeue owning the Rx ring (and VNIC) pulls the packets in polling mode based on fair sharing of resources or the configured bandwidth. Interrupt mode is used only when the squeue has no packets to process and the Rx ring is empty. Each individual Rx ring is dynamically switched between interrupt and polling mode. Incoming packets that exceed the configured bandwidth limit remain in the NIC itself, in their corresponding Rx ring, and are pulled into the system only when they are ready to be processed.
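The interplay between an Rx ring and its squeue can be sketched as a toy model: while packets are waiting, the squeue keeps the ring in poll mode and pulls at most a fixed budget per tick, so excess packets stay in the ring; once the ring is drained it falls back to interrupt mode. The classes, the tick loop and the per-tick packet budget (standing in for the configured bandwidth) are simplifying assumptions, not kernel code.

```python
from collections import deque


class RxRing:
    """Toy Rx ring: packets wait here until the owning squeue pulls them."""

    def __init__(self):
        self.packets = deque()
        self.mode = "interrupt"

    def arrive(self, pkt):
        # In interrupt mode this arrival would raise an interrupt; in poll mode
        # the packet simply waits in the ring (i.e. in NIC memory).
        self.packets.append(pkt)


class Squeue:
    """Toy squeue that drains its ring at most `budget` packets per tick."""

    def __init__(self, ring, budget):
        self.ring = ring
        self.budget = budget   # stands in for the configured bandwidth
        self.delivered = []

    def tick(self):
        if self.ring.packets:
            self.ring.mode = "poll"   # turn off interrupts and pull packets ourselves
            for _ in range(min(self.budget, len(self.ring.packets))):
                self.delivered.append(self.ring.packets.popleft())
        if not self.ring.packets:
            self.ring.mode = "interrupt"   # ring drained: fall back to interrupts


if __name__ == "__main__":
    ring = RxRing()
    sq = Squeue(ring, budget=4)
    for i in range(10):
        ring.arrive(i)
    while ring.packets:
        sq.tick()
        print(f"delivered={len(sq.delivered)} waiting={len(ring.packets)} mode={ring.mode}")
```

The point of the design is visible even in the toy: packets over the budget cost nothing on the host side until the squeue chooses to pull them.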

The creation of an administrative entity (VNIC) is optional and typically associated with a virtual machine like Solaris containers, Xen or ldoms. For application or protocol based resource control, a separate data path is created to provide the isolation and resource control but a VNIC is not configured.

As mentioned above, the VNIC is just an administrative entity. If the classification has already been done by the NIC to a particular Rx ring, the packets are delivered directly to the IP layer by means of function calls when the Rx ring is in interrupt mode, or the squeue residing in the IP layer pulls the packet chain directly from the Rx ring when in polling mode. In essence, the entire data link layer is bypassed, resulting in improved performance and lower latencies. If the VNIC is placed in promiscuous mode, the data link bypass is abandoned and the Rx ring delivers packets via the VNIC layer, which creates a copy of each packet for the promiscuous stream. Similarly, in polling mode, the squeue's poll entry point is changed to point at the VNIC, which in turn pulls the packets from the Rx ring, makes a copy and then gives the chain to the squeue poll thread.

The entire layered architecture is built on function pointers known as 'upcall_func' and 'downcall_func', with corresponding 'upcall_arg' and 'downcall_arg' for context. Every layer provides a pointer to its receive function as 'upcall_func' and a context as 'upcall_arg' to the layer below. Similarly, every layer provides a pointer to its transmit function as 'downcall_func' and a context cookie as 'downcall_arg' to the layer above. This is how the packet path is constructed. Any layer can short circuit itself out by providing the 'upcall_func' and 'upcall_arg' of the layer above to the layer below (and likewise on the transmit side if needed). All context cookies for a layer are reference counted: each layer pointing to a cookie holds a reference, which ensures the data structures don't get freed until all references are dropped.
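The func/arg wiring and the short circuit can be sketched in a few lines. The sketch below is in Python rather than the kernel's C, models only the receive side and leaves out reference counting. The names Layer, connect, build_path and ip_input are hypothetical: DLS either sits in the path, or hands the layer below (the Rx ring) IP's function and argument so that packets go straight to IP.

```python
from typing import Any, Callable, List, Optional


class Layer:
    """A receive-path layer that hands each packet to upcall_func(upcall_arg, pkt)."""

    def __init__(self, name: str, trace: List[str]):
        self.name = name
        self.trace = trace
        self.upcall_func: Optional[Callable[[Any, Any], None]] = None
        self.upcall_arg: Any = None

    def recv(self, pkt):
        self.trace.append(self.name)
        self.upcall_func(self.upcall_arg, pkt)


def connect(lower: Layer, func, arg):
    """The layer above gives the layer below its receive function and context."""
    lower.upcall_func, lower.upcall_arg = func, arg


def build_path(bypass_dls: bool):
    trace: List[str] = []
    delivered: List[Any] = []

    def ip_input(arg, pkt):          # stand-in for IP's receive entry point
        trace.append("ip")
        arg.append(pkt)

    ring, dls = Layer("rx_ring", trace), Layer("dls", trace)
    connect(dls, ip_input, delivered)
    if bypass_dls:
        # Nobody (e.g. snoop) needs DLS: it short-circuits itself by handing
        # the layer above's func/arg straight to the ring.
        connect(ring, dls.upcall_func, dls.upcall_arg)
    else:
        connect(ring, Layer.recv, dls)   # Layer.recv(dls, pkt) runs DLS processing
    return ring, trace, delivered


if __name__ == "__main__":
    for bypass in (False, True):
        ring, trace, _ = build_path(bypass)
        ring.recv("pkt")
        print(f"bypass={bypass}: {' -> '.join(trace)}")
```

Because each layer only ever calls "whatever func/arg it was given", removing a layer from the path requires no change to the layers on either side.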

In case the NIC hardware does not have classification capability (unlikely, since most Intel, Broadcom and Sun 1Gb NICs and pretty much all 10Gb NICs shipping for the past several years have this capability) or has run out of classification resources, the architecture provides a classifier in the MAC layer and employs soft rings, which are similar in functionality to the NIC hardware classifier and Rx rings. The NIC hardware layer, coupled with the lower MAC layer and soft rings, is termed the 'Pseudo Hardware layer'. A request by the administrator to create a new VNIC or flow will always succeed at the pseudo hardware layer. The pseudo hardware layer manages the hardware and software classification capability, Rx rings and soft rings transparently to the upper layers.
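The "always succeeds" behaviour of the pseudo hardware layer amounts to an allocation policy: use a hardware Rx ring while one is free, otherwise fall back to software classification and a soft ring. The toy model below shows that policy only. The class name, the ring counts and the decision not to migrate existing soft-ring flows back to hardware are assumptions made for illustration.

```python
class PseudoHardwareLayer:
    """Toy model: hand out hardware Rx rings while they last, then soft rings."""

    def __init__(self, hw_rings: int):
        self.free_hw = list(range(hw_rings))
        self.next_soft = 0
        self.assignment = {}

    def add(self, name: str) -> tuple:
        if self.free_hw:
            ring = ("hw", self.free_hw.pop(0))
        else:
            # Out of hardware classification: classify in the MAC layer instead
            # and deliver through a soft ring.
            ring = ("soft", self.next_soft)
            self.next_soft += 1
        self.assignment[name] = ring
        return ring   # the request never fails from the caller's point of view

    def remove(self, name: str) -> None:
        kind, index = self.assignment.pop(name)
        if kind == "hw":
            self.free_hw.append(index)   # a freed hardware ring can be reused


if __name__ == "__main__":
    phl = PseudoHardwareLayer(hw_rings=2)
    for vnic in ("vnic1", "vnic2", "vnic3"):
        print(vnic, phl.add(vnic))
```

The upper layers only ever see "a ring", which is what lets the same administrative model work on NICs with and without hardware classifiers.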

6. Crossbow layers, data structures and packet flow:

It's easier to illustrate this with two flows. The first one is for IP_addr = a.b.c.d && TCP and it goes through the normal path via upper DLS, etc. This assumes that either snoop (or someone else in DLS) is interested in this flow, so we can't bypass data link processing. The squeue poll function in this case is dls_poll_ring and its argument is a dls_impl_t.

The second flow is for IP_addr = m.n.o.p && port = 80 && TCP, which is unique and which no one is interested in snooping. In this case, the DLS layer allows itself to be bypassed by setting the upcall_func and upcall_arg for the soft_ring/Rx_rings to call directly into IP. The squeue polls the H/W Rx ring directly in this case.


Data Flow


7. The administrative model:

Crossbow introduces a new command called 'netrcm' and further augments 'dladm' which was introduced as part of the new high performance device driver framework (GLDv3) in Solaris 10.

'dladm (1M)' - This is primarily used to create, modify and destroy VNICs based on MAC or IP addresses. A created VNIC is visible to and managed by ifconfig just like any other NIC and can get its IP address assigned via DHCP if necessary.

The examples below can illustrate this better:
     Example 1: Configuring VNICs

     To create two VNIC interfaces with vnic-ids 1 and 2
     over a single physical device bge0, enter the following
     commands:

     # dladm create-vnic -d bge0 1
     # dladm create-vnic -d bge0 2
     The new links will be called vnic1 and vnic2.

     Example 2: Configuring VNICs and allocating bandwidth & priority


     To create two VNIC interfaces with vnic-ids 1 and 2
     over a single physical device bge0 and make vnic1 a higher
     priority VNIC using a factory assigned MAC address with a guarantee
     to use up to 90% of the bandwidth, and vnic2 a lower priority VNIC
     with a random MAC address and a hard limit of 100Mbps:

     # dladm create-vnic -d bge0 -m factory -b 90% -G -p high 1
     # dladm create-vnic -d bge0 -m random -b 100M -L -p low 2 

     Example 3: Configure a VNIC by choosing a factory MAC address

     To create a VNIC interface with vnic-id 1 by first
     listing the available factory MAC addresses and then using one
     of them:

     # dladm show-dev -d bge0 -m
     bge0     
            link: up        speed: 1000   Mbps       duplex: full
     MAC addresses:
slot-ident      Address                 In Use
1               0:e0:81:27:d4:47        Yes
2               8:0:20:fe:4e:a5         No

     # dladm create-vnic -d bge0 -m factory -n 2 1

     # dladm show-dev -d bge0 -m
     bge0     
            link: up        speed: 1000   Mbps       duplex: full
     MAC addresses:
slot-ident      Address                 In Use
1               0:e0:81:27:d4:47        Yes
2               8:0:20:fe:4e:a5         Yes

     Example 4: Configuring VNICs sharing a MAC address

     To create two VNICs with vnic-id 1 and 2 by first listing the
     available factory assigned MAC addresses and then picking one
     that will be shared by the newly created VNICs

     # dladm show-dev -d bge0 -m
     bge0     
            link: up        speed: 1000   Mbps       duplex: full
     MAC addresses:
slot-ident      Address                 In Use
1               0:e0:81:27:d4:47        Yes
2               8:0:20:fe:4e:a5         No

     # dladm create-vnic -d bge0 -m shared -n 2 1
     # dladm create-vnic -d bge0 -m shared -n 2 2

     Example 5: Creating a VNIC with user specified MAC address

     To create a VNIC with vnic-id 1 by providing a user specified
     mac address

     # dladm create-vnic -d bge0 -m 8:0:20:fe:4e:b8 1
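Example 2 expresses bandwidth either as a percentage of the link (-b 90%) or as an absolute rate (-b 100M). The hypothetical helper below resolves both forms to Mbps, assuming percentages are relative to the link speed reported by show-dev (1000 Mbps for bge0 in the output above). The accepted K/M/G suffixes are an assumption for illustration, not the dladm grammar.

```python
import re


def parse_bw(spec: str, link_mbps: float) -> float:
    """Resolve '90%', '100M', '1G' or '500K' to Mbps (hypothetical helper)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([%KMG])", spec.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized bandwidth spec: {spec!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "%":
        return link_mbps * value / 100
    return value * {"K": 0.001, "M": 1.0, "G": 1000.0}[unit]


if __name__ == "__main__":
    link = 1000   # bge0 speed from the show-dev output above
    vnic1, vnic2 = parse_bw("90%", link), parse_bw("100M", link)
    print(f"vnic1 guaranteed {vnic1:.0f} Mbps, vnic2 capped at {vnic2:.0f} Mbps")
    print("fits within link" if vnic1 + vnic2 <= link else "oversubscribed")
```

Resolving everything to one unit makes it easy to check that guarantees plus hard limits do not exceed the physical link.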


'netrcm (1M)' - This command is primarily used to provide isolation and private resources to an application's traffic or a protocol. In addition, we can also configure bandwidth limits and guarantees for the flows. Again, some examples illustrate the usage better:
     Example 1: Create a policy around mission critical port 443 traffic
     which is https service.

     To create a policy around inbound https traffic on an https server
     so that https gets its own dedicated NIC hardware and kernel TCP/IP
     resources. The policy-id specified is https-1, which is used to
     later modify or delete the policy.

     # netrcm add-policy -d bge0 -H transport = TCP local port = 443 https-1

     Example 2: Modify an existing policy to add bandwidth resource control

     To modify https-1 policy to add bandwidth control and give it a 
     high priority
     
     # netrcm modify-policy -d bge0 -b 90% -G -p high https-1

     Example 3: Limit the bandwidth usage of UDP protocol

     To create a policy for UDP protocol so that it can not consume more
     than 10% of available bandwidth. The policy-id is called limit-udp-1.

     # netrcm add-policy -d bge0 -H transport = UDP -b 10% -L -p low limit-udp-1


8. Crossbow Observability - Stats, history and APIs:

Apart from the functionality related to network virtualization and bandwidth resource control, Crossbow offers a whole range of new tools and mechanisms to understand bandwidth usage. Administrators can see real time bandwidth usage for various VNICs or configured flows (via 'netrcm') without causing any performance penalties.

The Rx rings and squeues dealing with a particular flow keep track of normal stats which are pulled by a userland daemon from time to time. The daemon also logs the information in special log files which allows users to see history at any given time. A user can request usage for a time period in past to understand the system behaviour.
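The history feature rests on simple bookkeeping: if the daemon records cumulative byte counters at increasing timestamps, the average rate over any past interval is the counter difference divided by the time difference. The sketch below shows that bookkeeping. The UsageHistory class and its methods are hypothetical, the sample data is made up, and writing the samples to log files is left out.

```python
from bisect import bisect_left, bisect_right


class UsageHistory:
    """Cumulative byte counters sampled over time; answers rate queries for past intervals."""

    def __init__(self):
        self.times = []    # sample timestamps in seconds, increasing
        self.totals = []   # cumulative bytes seen at each timestamp

    def record(self, t, total_bytes):
        self.times.append(t)
        self.totals.append(total_bytes)

    def mbps_between(self, t0, t1):
        """Average rate between the first and last samples inside [t0, t1]."""
        i = bisect_left(self.times, t0)
        j = bisect_right(self.times, t1) - 1
        if j <= i:
            raise ValueError("need at least two samples in the interval")
        delta_bytes = self.totals[j] - self.totals[i]
        return delta_bytes * 8 / (self.times[j] - self.times[i]) / 1e6


if __name__ == "__main__":
    hist, total = UsageHistory(), 0
    for t in range(0, 70, 10):            # one sample every 10 seconds
        hist.record(t, total)
        total += 125_000_000 if t < 30 else 25_000_000
    print(f"{hist.mbps_between(0, 30):.1f} Mbps in the first 30 s")
    print(f"{hist.mbps_between(30, 60):.1f} Mbps in the last 30 s")
```

Storing cumulative counters rather than per-interval rates means any interval can be answered with two lookups, at the resolution of the sampling period.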

Crossbow will provide more tools to help with capacity planning by allowing the system to be put in a capacity planning mode, where bandwidth usage for the top traffic is monitored and displayed.

All the observability and administrative interfaces can be accessed by APIs which allow other applications to use and manage the system.

9. Resources:

Crossbow project page on OpenSolaris is a good source of information http://www.opensolaris.org/os/project/crossbow

The Crossbow mailing list is where all the day to day business for the project is conducted. Anyone can join the mailing list crossbow-discuss@opensolaris.org.

Crossbow slide presentation can be found here.

Crossbow Team members are:
    * Kais Belgaied        
    * Stephanie Brucker    
    * Eric Cheng           
    * Nicolas Droux        
    * Markus Flierl        
    * Carol Gayo           
    * Mohan Iyer           
    * Darrin Johnson       
    * Michael Lim          
    * Rajagopal Kunhappan  
    * Erik Nordmark        
    * Ethan Solomita       
    * Thirumalai Srinivasan
    * Sunay Tripathi       
    * Nicky Veitch         
    * Bill Watson          
    * Roamer Lu             

Email: first.last@sun.com
(2006-08-24 02:26:02.0) Permalink Comments [1]

20060402 Sunday April 02, 2006

Project Crossbow: Network Virtualization and Resource Control going live

Project Crossbow going live on OpenSolaris

Hello and Welcome to project Crossbow!! We are going to add Network Virtualization and Resource Control to Solaris without degrading performance.
At this time, we are seeking members from the OpenSolaris community to become part of the Crossbow i-team. It's the charter of the i-team to gather requirements and deliver the project, including design, docs and testing. We would love to have members of the community get involved from day one. The participation opportunities include (but are not limited to):
  • helping define the project
  • gathering requirements
  • designing the project
  • writing code
  • creating demos
  • doing talks and evangelizing the project
Please send me an email if you are interested. We can promise you that this will be a thrilling adventure and you will be living on the bleeding edge of technology! Project Crossbow is brought to you by the same people who created project FireEngine (new stack architecture), project Nemo (GLDv3, the new high performance device driver framework), project Yosemite (UDP performance), etc., to name a few.
Apart from active participation, you can also participate via the mailing lists and discussion groups where we will be posting various documents for review and comments apart from day to day discussion.

The project Crossbow page is visible here
You can sign up for the discussion group here
(2006-04-02 20:41:15.0) Permalink Comments [1]

20051207 Wednesday December 07, 2005

Nemo based e1000g on T2000

Derek Morr points out that the T2000 uses e1000g controllers, which are still DLPI based, so they wouldn't (yet) get the advantages of Nemo (GLDv3). Very good observation. The T1000 already uses a Broadcom chip which comes up as bge, which is fully Nemo based. The T2000 indeed uses a DLPI based driver in the current Solaris 10 update. Without going into the why (it's not very pretty), the Nevada and OpenSolaris version of e1000g is already Nemo based (BTW, the DLPI driver comes up as ipge on the T2000, which tells you that it's not Nemo based). The Nemo based patches for e1000g (for S10) should be available soon, if they are not available already. Pretty soon the machine will ship with the patches already installed, and future updates will obviously have the Nemo version.


(2005-12-07 21:09:58.0) Permalink Comments [2]

20051206 Tuesday December 06, 2005

Niagara - Designed for Network Throughput


We have finally announced Niagara based servers to the public! They are billed as low cost, energy efficient processors with huge network throughput. Marketing mumbo jumbo, you think? Well, try it and you will see. I was privileged enough that one of the earliest prototypes landed on my desk (or in my lab, to be precise) so Solaris networking could be tailored to take advantage of the chip. And boy, together with Solaris, this thing rocks!!

As you know, Niagara is a multi-core, multi-threaded chip, and Solaris takes advantage of it in multiple ways. Let me highlight some of them.

Network performance

The load from the NIC is fanned out to multiple soft rings in the GLDv3 layer based on the source IP address and port information. Each soft ring in turn is tied to a Niagara thread and a Vertical Perimeter, such that packets from a connection have locality to a specific H/W thread on a core and the NIC has locality to a specific core. Think of this model as 4 H/W threads per core processing the NIC, such that if one thread stalls on a resource, the CPU cycles are not wasted. The result is amazing network performance for this beast: 5-6 times the network performance of your typical x86 based CPU.

Virtualization

Imagine you are an ISP or someone wanting to consolidate multiple machines onto one physical machine. Niagara based platforms lend themselves beautifully to this concept because there are so many H/W threads around, which appear as individual CPUs to Solaris. We have a project underway called Crossbow (details available on the Network Community page on OpenSolaris) which will allow you to carve the machine into multiple virtual machines (by creating virtual network stacks), tie specific CPUs to them, and control the bandwidth utilization of each virtual machine on a shared NIC.

Real Time Networking/Offload

With GLDv3 based drivers and the FireEngine architecture in Solaris 10, the stack controls the rate of interrupts and can dynamically switch the NIC between interrupt and polling mode. Coupled with the Niagara platform, Solaris can run the entire networking stack on one core and provide real time capabilities to the application. Meanwhile, the applications themselves run on different cores without worrying about networking interrupts pinning them down. You can get fairly bounded latencies, provided the application can do some admission control. We are also planning to hide the core running networking from the application, effectively getting TOE (TCP offload engine) for free without suffering from the drawbacks of offloading networking to a separate piece of hardware.


(2005-12-06 17:31:01.0) Permalink Comments [1]


