
Wednesday November 04, 2009
Crossbow paper wins best paper award at Usenix LISA09 and BOF schedule
We submitted another paper to the Usenix LISA 2009 conference in
Baltimore, MD, which is being held from Nov 3-5, 2009. The paper
is titled Crossbow Virtual Wire: Network in a Box. Yesterday we
were informed that our paper won the Best Paper award for the
conference. Woohoo!!
I met many people here at LISA who are already using Crossbow in
very interesting ways, and I got many requests to hold a BOF while we
are here. So I hit up our marketing VP for some beer budget (can't
have a BOF without drinks) and we are now having a Crossbow and
Solaris Networking BOF on Nov 4th, 2009 from 10.30 to 11.30pm in the
Dover AB conference room. The venue details can be found on the
Usenix LISA site
here. So people who are already at the conference or in the
general area of Maryland, Virginia, DC, etc., please do come by.
It would be good to attach faces to names and we will have chilled
beer. We will also be showing the Virtual Wire Builder kit to
build your own virtual network (all available in open source form).
Once again, the BOF details are:
- Crossbow & Solaris Networking BOF at Usenix LISA 2009
- Place: Marriott Waterfront Hotel, Baltimore, MD
- Date: Nov 4th, 2009
- Time: 10.30-11.30pm
- Agenda: Virtual Wire Builder kit, Open discussion, Beer
Hope to see you there.
(2009-11-04 12:10:39.0)

Sunday July 12, 2009
Crossbow Launch, Talk and BOF at Community One and Java One
On June 1, 2009, during Community One and Java One in San Francisco, California, Crossbow was formally launched as part of OpenSolaris 2009.06. The morning started with a keynote where John Fowler, EVP of the Sun Systems group, formally announced OpenSolaris 2009.06 as the beta for Solaris.Next, the next enterprise release after Solaris 10. He and Greg Lavender then went on to show the Crossbow feature and the Virtual Wire demo. Later in the day I gave a talk on Crossbow, where Nicolas and Kais accompanied me and showed the Crossbow Virtual Wire demo in detail. Bill Franklin and some of his cohorts were dressed as Crossbow knights and charged into the room right after the talk. I think people got the shock of their lives. It was very entertaining.
The launch got a lot of visibility and very good press coverage, which can be seen on the Crossbow News page. The most notable ones were:
On June 2, 2009 we held the Crossbow BOF in the evening. Great showing and great support from the Community.
So great stuff and a good closure for Phase 1 of the Crossbow project. The team members were pretty happy and relieved. We are now trying to get the next intermediate phase going so we can complete the story for the next enterprise release of Solaris, which might or might not be called Solaris 11. Key things are more analytics (dlstat/flowstat), some security/anti-spoofing features, more usability, etc. More details are being discussed on the Crossbow Discussion page.
(2009-07-12 14:59:12.0)

Tuesday May 26, 2009
Crossbow Sigcomm09 papers are now online
Here are the details of the two Crossbow ACM Sigcomm09 papers:
- Crossbow: From Hardware Virtualized NICs to Virtualized Networks
Abstract: This paper describes a new architecture for achieving network virtualization using virtual NICs (VNICs) as the building blocks. The VNICs can be associated with dedicated and independent hardware lanes that consist of dedicated NIC and kernel resources. Hardware lanes support dynamic polling, which enables the fair sharing of bandwidth with no performance penalty. VNICs ensure full separation of traffic for virtual machines within the host. A collection of VNICs on one or more physical machines can be connected to create a Virtual Wire by assigning them a common attribute such as a VLAN tag. The full paper is available here.
- Crossbow: A vertically integrated QoS stack
Abstract: This paper describes a new architecture which addresses Quality of Service (QoS) by creating unique flows for applications, services, or subnets. A flow is a dedicated and independent path from the NIC hardware to the socket layer in which the QoS layer is integrated into the protocol stack instead of being implemented as a separate layer. Each flow has dedicated hardware and software resources allowing applications to meet their specified quality of service within the host. The architecture efficiently copes with Distributed Denial of Service (DDoS) attacks by creating zero or limited bandwidth flows for the attacking traffic. The unwanted packets can be dropped by the NIC hardware itself at no cost. A collection of flows on more than one host can be assigned the same Differentiated Services Code Point (DSCP) label which forms a path dedicated to a service across the enterprise network and enables end-to-end QoS within the data center. The full paper is available here.
Enjoy reading and join us for the talk, BOF, and party at Community One (see the previous entry) on June 1-2, 2009!!
(2009-05-26 19:14:47.0)

Monday May 18, 2009
Crossbow Research papers in SIGCOMM, Party, Community One/Java One etc
Last week was a very exciting week. Two of our research papers got
accepted at
SIGCOMM VISA09 and
SIGCOMM WREN09. This year, SIGCOMM will be held in Barcelona, Spain from August 17-21
and has four focus areas. Two of them are Virtualization and Enterprise Networking, which
is where we had submitted our papers on virtualization and flows respectively. We
will make these papers available online as soon as we submit the camera-ready
copies to the ACM editors.
So comes the next question - where is the party? Well, the party is during
Java One
and Community One on June 1 and 2. Did I tell you that Community One is FREE and there
is a big party in the evening? I think Crossbow gets formally announced as part of
Community One itself, and we will have a talk on Crossbow titled
Open Networking with Crossbow on June 1st at 2.40pm and a
BOF on Crossbow on June 2nd at 5.30pm. We will also be hosting a Demo Pod during Java One.
Crossbow is the more visible initiative, but the last few months were
pretty fruitful since we delivered not only Crossbow
but also several parts of Clearview and
Volo, amongst others.
So please come by, help if you can, or just enjoy the sessions, enrich yourself and
celebrate. Let me know if you are able to help out with the demos, manning the booths and answering
questions.
(2009-05-18 23:35:55.0)

Tuesday March 17, 2009
Crossbow: Virtualized switching and performance
Saw
Cisco's unified fabric announcement. Seems like they are going
after Cloud computing, which pretty much promises to solve the world
hunger problem. Even if Cloud computing just solves the high data
center cost problem and makes compute, networking, and storage
available on demand in a cheap manner, I am pretty much sold on it.
The interesting part is that the world needs to move towards enabling
people to bring their network onto the cloud and have compute, bandwidth
and storage available on demand. As far as networking and network
virtualization go, this means that we need to move to open standards,
open technology and off the shelf hardware. The users of the cloud
will not accept vendor or provider lock-in. The cloud needs to be
built in such a manner that users can take their physical network and
migrate it to an operator's cloud and at the same time have the
ability to build their own clouds and migrate stuff between the
two. Open Networking is the key ingredient here.
This essentially means that there is no room for custom ASICs and
protocols, and the world of networking needs to change. This is what
Jonathan was talking about, to a certain extent, around Open Networking
and Crossbow. OpenSolaris with Crossbow makes things very
interesting in this space. But it seems like people don't fully
understand what Crossbow and OpenSolaris bring to the table. I saw a post from
Scott Lowe and several others mentioning that Crossbow is pretty
similar to VMware's
network virtualization solutions and
Cisco Nexus 1000v virtual switches.
Let me take some time to
explain a few very important things about Crossbow:
- It's Open Source and part of OpenSolaris. You can download it
right here.
- It leverages NIC hardware switching and features to deliver
isolation and performance for virtual machines. Crossbow not only
includes H/W & S/W based VNICs and switches, it also offers
Virtualized Routers, Load Balancers, and Firewalls. These Virtual Network
Machines can be created using Crossbow and Solaris Zones and have
pretty amazing performance. All these are connected together using the
Crossbow Virtual
Wire. You don't need to buy fancy and expensive virtualized switches to create
and use a Virtual Wire.
- Using hardware virtualized lanes, Crossbow technology scales to multiples of 10 gig
of traffic using off the shelf hardware.
Hardware based VNICs and Hardware based Switching
A picture is always worth a thousand words. The figure shows how
Crossbow VNICs are built on top of real NIC hardware and how we do
switching in hardware where possible. Crossbow also has a full
featured S/W layer where it can do S/W VNICs and switching as
well. The hardware is leveraged when available. It's important to note
that most NIC vendors do ship with the necessary NIC
classifiers and Rx/Tx rings, and they are pretty much mandatory for 10 gig
NICs, which do form the backbone for a cloud.

Virtual Wire: The essence of virtualized networking
The Crossbow Virtual
Wire technology allows a person to take a full
featured physical network (multiple subnets, switches and routers) and
configure it within one or more hosts. This is the key to moving
virtualized networks in and out of the cloud. The figure shows a
two subnet physical network with multiple switches and different link
speeds, connected via a router, and how it can be virtualized in a
single box. A full workshop on virtualized networking is available
here.

Scaling and Performance
Crossbow leverages the NIC's features pretty aggressively to create
virtualization lanes that help traffic scale across a large number of
cores and threads. For people wanting to build real or virtual
appliances using OpenSolaris, performance and scaling across 10
Gig NICs are pretty essential. The figure below shows an overview of
hardware lanes.

More Information
There is a white paper and more detailed
documents (including how to get started) at the
Crossbow
OpenSolaris page.
network
virtualization
crossbow
cloud computing
(2009-03-17 17:30:06.0)

Monday March 02, 2009
Crossbow enables an Open Networking Platform
I came across this blog from Paul Murphy. You
should read the second half of Paul's blog. What he says is pretty true. Crossbow delivered a brand new
networking stack to Solaris which has scalability, virtualization, QoS, and better observability
designed in (instead of patched in). The complete list of features delivered and in the works is
here. Coupled with a full
fledged open source Quagga Routing Suite (RIP, OSPF, BGP, etc),
IP Filter Firewall, and a kernel Load Balancer, OpenSolaris becomes a
pretty useful platform for building Open Networking appliances.
Apart from single box functionality, imagine you want to deliver a Virtual Router or a load balancer:
it would be pretty easy to do so. OpenSolaris offers Zones,
where you can deliver a preconfigured zone as a Router, Load Balancer, or Firewall. The difference would be
that this Zone would be fully portable to another machine running OpenSolaris and would have no performance
penalty. After all, we, aka the Crossbow team, guarantee that
our VNICs with Zones do not have any performance penalties.
You can also build fully portable and preconfigured virtual networking equipment using a Xen guest, which can be made to migrate between any OpenSolaris
or Linux host.
I noticed that a couple of folks on Paul's blog were asking why Crossbow NIC virtualization is
different. Well, it's not just the NIC being virtualized but actually
the entire data path along with it, called a Virtualization Lane. You can see the virtualization lane all the way from the NIC to the socket layer and back
here.
Not only is there one or more Virtualization Lanes per virtual machine,
but bandwidth partitioning, Diffserv tagging, priority, CPU assignment etc. are designed in as part of the
architecture. The same concepts are used to scale the stack across multiple 10gigE NICs over a large
number of cores and threads (out of this world forwarding performance, anyone!).
And as mentioned before, Crossbow enables
Virtual Wire: the ability to create a full featured network without any physical wires. Think of
running network simulations and testing in a whole new light!!
(2009-03-02 23:10:16.0)

Wednesday February 04, 2009
Ben wrote a pretty nice blog on Crossbow
Ben's blog on Crossbow - great overview
Ben wrote a great
blog on Crossbow. Thanks Ben. It gives a good overview of the features, and
if you want more detail on the internals, you can read
more about the architecture, or you can build a Virtual Wire - a Network in a Box, which is
explained with an example
here.
(2009-02-04 23:46:55.0)

Sunday December 14, 2008
Crossbow - Network Virtualization Architecture Comes to Life
December 5th, 2008 was a joyous occasion and a humbling one at the
same time. A vision that was created 4 years back was coming to life.
I still remember the summer of 2004 when Sinyaw
threw a challenge at me - can you change the world? And it was in the
Fall of the same year that I unveiled the first set of Crossbow slides to
him and Fred Zlotnik over a bottle of wine. After a lot of planning we were
finally ready to start, but there were still hurdles in the way. We were still
trying to finish
Nemo aka GLDv3 - a high performance device driver framework which
was absolutely required for Crossbow (we needed absolute control over
the hardware). Nemo finished in mid 2005, but then Nicolas, Yuzo etc. left
Sun and went to a startup. Thiru was still trying to finish Yosemite
(the FireEngine follow-on). So in short, 2005 was basically more
planning and prototyping (especially controlling the Rx rings and
dynamic polling) on my part. I think it was early 2006 when work
began on Crossbow in earnest. Kais moved over from the security group,
Nicolas was back at Sun, and Thiru, Eric Cheng, Mike Lim (and of course me)
came together to form the core team (which later expanded to 20+ people
in early 2008). So it was a long standing dream
and almost three years of hard work that finally came to life when Crossbow Phase
1 integrated into Nevada Build 105 (and will be available in the
OpenSolaris 2009.06 release).
Crossbow - H/W Virtualized Lanes that Scale (10gigE over multiple cores)
One of the key tenets of the Crossbow design was the concept of H/W Virtualization
Lanes: essentially tying a NIC Receive and Transmit ring, DMA channel,
kernel threads, kernel queues, and processing CPUs together. There are
no shared locks, counters or anything. Each lane gets to individually
schedule its packet processing by switching its Rx ring independently
between interrupt mode and poll mode (Dynamic Polling). Now
you can see why Nemo was so
important: without it, the stack couldn't control the H/W, and
the NIC vendors wouldn't have played along with us in
adding the features we wanted (stateless classification, Rx/Tx rings,
etc). Once the lanes are created, we can program the classifier to spread
packets between the lanes based on IP addresses and ports for scaling
reasons. With multiple cores and multiple threads seeming to be
the way of life going forward, and 10+ gigE of bandwidth (soon we will
have IPoIB working as well), scaling really matters (and we are not
talking about achieving line rates on 10 gigE with jumbo frames - we
are talking about the real world: a mix of small and large packets, 10k
connections and 1000s of threads).
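To make the classifier idea a bit more concrete, here is a small Python sketch of spreading flows across lanes by hashing their IP addresses and ports. It is only an illustration: the function name pick_lane and the use of CRC32 are my assumptions for the example, and a real NIC classifier uses whatever hash its hardware implements.

```python
import zlib


def pick_lane(src_ip, dst_ip, src_port, dst_port, num_lanes=8):
    """Map a flow (addresses and ports) to a lane index in [0, num_lanes).

    Hypothetical helper: CRC32 stands in for the hash that the NIC
    hardware would actually use.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_lanes


# Sixteen client connections to one web server, spread over 8 lanes.
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 80) for i in range(16)]
for flow in flows:
    print(flow, "-> lane", pick_lane(*flow))
```

The property that matters is that every packet of a given connection lands on the same lane, so the lane's Rx ring, kernel thread and CPU handle that connection without sharing state with the other lanes.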
To demonstrate the point, I captured a bunch of statistics while
putting the finishing touches on the data path and getting ready to beat
some world records. The table below shows mpstat output along with
packets per second serviced for the Intel Oplin (10gigE) NIC on a
Niagara2 based system. The NIC has all 8 Rx/Tx rings enabled and has 8
interrupts enabled (one for each Rx ring).
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
38 0 0 6 21 3 31 1 5 12 0 86 0 0 0 99
39 0 0 2563 5506 3907 3282 28 34 1170 0 178 0 21 0 78
40 0 0 2553 5117 3948 2410 38 150 1192 0 504 1 21 0 77
41 0 0 2651 5221 4232 2011 25 53 1195 0 210 0 20 0 80
42 0 0 3078 5700 4743 2069 21 28 1285 0 125 0 22 0 78
43 0 0 3280 5837 4777 2118 19 24 1328 0 101 0 22 0 78
44 0 0 3143 19566 18801 1773 50 44 1285 0 68 0 65 0 35
45 0 0 4570 7748 6838 1984 23 27 1697 0 118 0 29 0 71
# netstat -ia 1
input e1000g output input (Total) output
packets errs packets errs colls packets errs packets errs colls
4 0 1 0 0 61284 0 128820 0 0
3 0 2 0 0 61015 0 129316 0 0
4 0 2 0 0 60878 0 128922 0 0
This
link shows the interrupt binding, mpstat and intrstat output. You
can see that the NIC is trying very hard to spread the load, but
because the stack sees this as one NIC, there is one CPU (number 44)
where all 8 threads collide. It's like an 8 lane highway becoming
a single lane during rush hour.
Now let's look at what happens when Crossbow enables a lane all the way up
the stack for each Rx ring and also enables dynamic polling for each
individually. If you look at the corresponding mpstat and intrstat
output and the packets per second rate, you will see that the lanes
really do work independently of each other, resulting in almost
linear spreading and much higher packets per second serviced. The
benchmark represents a webserver workload and, needless to say,
Crossbow with dynamic polling on a per Rx ring basis almost tripled the
performance. The raw stats can be seen here.
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
37 0 0 2507 11906 10272 4267 265 326 489 0 776 4 28 0 68
38 0 0 2111 11793 9840 6503 336 314 472 0 615 3 32 0 65
39 0 0 500 10409 10164 565 7 125 174 0 1413 6 23 0 70
40 0 1 660 10423 9982 950 23 288 272 0 3834 8 34 0 58
41 0 1 658 10490 10108 847 16 238 237 0 2549 8 29 0 64
42 0 0 584 10605 10299 708 12 181 207 0 1828 7 26 0 67
43 0 0 732 10829 10559 598 9 141 193 0 1485 7 25 0 68
44 0 1 306 487 25 1091 17 282 330 0 4083 9 17 0 74
# netstat -ia 1
input e1000g output input (Total) output
packets errs packets errs colls packets errs packets errs colls
2 0 1 0 0 267619 0 522226 0 0
2 0 2 0 0 275395 0 539920 0 0
2 0 2 0 0 251023 0 482335 0 0
And finally, below we print some statistics from the MAC per Rx ring data
structure (mac_soft_ring_set_t). For each Rx ring, we track the number
of packets received via the interrupt path, the number received via the poll path,
and the number of chains under 10, between 10 and 50, and over 50 packets (counted each
time we polled the Rx ring). You can see that the polling path brings in
a larger share of the packets, and in bigger chains.

Keep in mind that for most OSes and most NICs, the interrupt path
brings one packet at a time. This makes the Crossbow architecture more
efficient for scaling as well as performance at higher loads on high
B/W NICs.
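For readers who want to play with the idea, here is a toy Python model of dynamic polling on a single Rx ring. It is a sketch under my own simplifying assumptions (discrete ticks, one packet per interrupt, a made-up poll_budget parameter) and is not the Solaris MAC layer code. It keeps the same counters described above: packets via interrupt, packets via poll, and chain sizes.

```python
from collections import Counter


def simulate_rx_ring(arrivals_per_tick, poll_budget=64):
    """Toy model of one Rx ring switching between interrupt and poll mode.

    arrivals_per_tick: number of packets arriving in each tick.
    poll_budget: hypothetical cap on packets picked up per poll.
    """
    stats = Counter()
    backlog = 0       # packets waiting in the Rx ring
    polling = False   # False: interrupt mode, True: poll mode
    for arrived in arrivals_per_tick:
        backlog += arrived
        if backlog == 0:
            polling = False  # ring is empty: back to interrupt mode
            continue
        if not polling:
            # Interrupt path: one packet is delivered per interrupt.
            backlog -= 1
            stats["intr_pkts"] += 1
        else:
            # Poll path: pick up a whole chain of packets in one go.
            chain = min(backlog, poll_budget)
            backlog -= chain
            stats["poll_pkts"] += chain
            if chain < 10:
                stats["chains_lt_10"] += 1
            elif chain <= 50:
                stats["chains_10_to_50"] += 1
            else:
                stats["chains_gt_50"] += 1
        # Stay in (or switch to) poll mode while packets are still queued.
        polling = backlog > 0
    return stats


print("light load:", dict(simulate_rx_ring([1] * 5)))
print("heavy load:", dict(simulate_rx_ring([20] * 4)))
```

Under light load every packet comes in via the interrupt path, which keeps latency low. Under heavy load the ring spends most of its time in poll mode and the packets arrive in chains, which is the behaviour the per Rx ring statistics are meant to show.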
Crossbow and Network Virtualization
Once we have the ability to create these independent H/W lanes,
programming the NIC classifier is easy. Instead of spreading the
incoming traffic for scaling, we program the classifier to send
packets for a MAC address to an individual lane. The MAC addresses are
tied to individual Virtual NICs (VNICs), which are in turn attached to
guest Virtual Machines or Solaris Containers (Zones). The separation
for each virtual machine is driven by the H/W, and its traffic is processed on the
CPUs attached to the virtual machine (the poll thread and interrupts
for the Rx ring of a VNIC are bound to the assigned CPUs). The
picture looks roughly like this:

Since we always do dynamic polling for NICs and VNICs, enforcing a bandwidth
limit is pretty easy. One can create a VNIC by simply specifying the
B/W limit, priority, and CPU list in one shot, and the poll thread will
enforce the limit by picking up only packets that meet the limit. Something
as simple as
freya(67)% dladm create-vnic -l e1000g0 -p maxbw=100,cpus=2 my_guest_vm
The above command will create a VNIC called my_guest_vm with a random MAC
address and assign it a B/W of 100Mbps. All the processing for this VNIC
is tied to CPU 2. It's features like this that make Crossbow an integral part
of the Sun Cloud Computing initiative due to roll out soon.
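As a rough illustration of how a poll thread can enforce a limit like maxbw=100, here is a minimal Python sketch. It assumes a fixed polling interval and a simple per-interval byte budget; the names poll_once and tick_ms are made up for this example and the real implementation in the MAC layer is more involved.

```python
from collections import deque


def poll_once(rx_ring, maxbw_mbps, tick_ms=1):
    """Pick up packets from rx_ring without exceeding the bandwidth limit.

    rx_ring: deque of packet sizes in bytes, oldest first.
    maxbw_mbps: limit in megabits per second (like the maxbw property).
    tick_ms: assumed polling interval in milliseconds (hypothetical).
    """
    # Bytes the VNIC may receive in one polling interval.
    budget = maxbw_mbps * 1_000_000 * tick_ms // (8 * 1000)
    picked = []
    while rx_ring and sum(picked) + rx_ring[0] <= budget:
        picked.append(rx_ring.popleft())
    return picked


ring = deque([1500] * 10)            # ten full-size Ethernet frames queued
picked = poll_once(ring, maxbw_mbps=100)
print(len(picked), "packets picked up,", len(ring), "left in the ring")
```

With a 100Mbps limit and a 1 ms interval the budget is 12500 bytes, so only eight of the ten 1500 byte packets are picked up and the rest wait in the Rx ring for the next poll. Packets over the limit are simply not pulled up the stack, so they cost the host nothing until their turn comes.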
Anyway, this should give you a flavour. There is a white paper and more detailed
documents (including how to get started) at the
Crossbow
OpenSolaris page.
network
virtualization
crossbow
cloud computing
(2008-12-14 16:07:10.0)

Tuesday March 04, 2008
Virtual Wire: Network in a Box (Sun Tech Day in Hyderabad)
I did a session for developers during the Sun Tech Day in Hyderabad. Raju Alluri
had printed out 100 copies of the workshop and we were carrying 100 DVDs with Crossbow ISO images (they are available
on the web here). People just loved it. We had sooo
underestimated the demand that the printouts and DVDs disappeared in less than a minute. I had a presentation that included
30 odd slides but I couldn't even get past slide 7 since the workshop was so interesting to people. And between the
tech day presentation and the user group meeting in the evening, people pointed out a lot of interesting uses and why
this can be such a powerful thing.
The idea that you can create any arbitrarily complex physical network as a virtual wire and run your favorite workload,
do performance analysis and debug it is very appealing to people. Remember that we are not simulating the network. This
is the real thing, i.e. real applications running and real packets flowing. If your application runs on any OS, it will
run on this virtual network and will send and receive real packets!!
The concept is pretty useful even to people like us, because now we don't need to pester our lab staff to create
a network for us to test or experiment on. And the best part is, we can use xVM and run Linux and Windows as hosts as well.
We are thinking of writing a book which reinvents how you learn networking in schools and universities. And oh, by the way,
do people really care about CCNA now that they can do all this on their laptop :) If someone is interested in contributing
real examples for this workshop module and the book, you are more than welcome. Just drop us a line.
networking
virtualization
crossbow
(2008-03-04 18:05:48.0)

Friday February 29, 2008
Virtual Wire: Network in a Box (Creating a real Network on your Laptop)
Crossbow: Network Virtualization & Resource Control
Objective
Create a real network comprising Hosts, Switches and Routers as a Virtual
Network on a laptop. The Virtual Network (called Virtual Wire) is created using OpenSolaris project
Crossbow technology, and the hosts etc. are created using Solaris Zones (a light
weight virtualization technology). All the steps necessary to create the
virtual topology are explained.
Users can work through this hands-on demo/workshop and the exercises at the end to
become an expert in
- Configuring IPv4 and IPv6 networks
- Working hands-on with OpenSolaris
- Configuring and managing a real Router
- IP Routing technologies including RIP, OSPF and BGP
- Debugging configuration and connectivity issues
- Network performance and bottleneck analysis
Users of this module need not have access to a real network, router or
switches. All they need is a laptop or desktop running OpenSolaris Project
Crossbow snapshot 2/28/2008 or later, which can be found at
http://www.opensolaris.org/os/project/crossbow/snapshots.
Introduction
Crossbow (Network Virtualization and Resource Control) allows users to create
a Virtual Wire with fixed link speeds in a box. Multiple subnets connected
via a Virtual Router are pretty easy to configure. This allows network
administrators to do a full network configuration and verify IP addresses, subnet
masks, and router ports and addresses. They can test connectivity and link
speeds and, when fully satisfied, instantiate the configuration on
the real network.
Another great application is debugging problems by recreating a real network in
a box. If network administrators are having issues with connectivity or
performance, they can create a virtual network and debug their issues using
snoop, kernel stats and dtrace. They don't need to use expensive H/W
based network analyzers.
Network developers and researchers working with protocols (like high
speed TCP) can use OpenSolaris to write their implementation and then try it
out against other production implementations. They can debug and fine tune their
protocol quite a bit before sending even a single packet on the real
network.
Note 1: Users can use Solaris Zones, Xen or ldom guests to create the virtual
hosts while Crossbow provides the virtual network building blocks. There is
no simulation, only real protocol code at work. Users run real applications
on the hosts and clients, which generate real packets.
Note 2: The Solaris protocol code executed for a virtual network or for Solaris
acting as a real router or host is common all the way to the bottom of the MAC layer. In
the case of virtual networks, the device driver code for a physical NIC is the
only code that is not needed.
Try it Yourself
Let's do a simple exercise. As part of this exercise, you will learn
- How to configure a virtual network having two subnets connected via a
Virtual Router using Crossbow and Zones
- How to set the various link speeds to simulate a multiple speed network
- How to do some performance runs to verify connectivity
What you need:
A laptop or machine running a Crossbow snapshot from Feb 28, 2008 or later
http://www.opensolaris.org/os/project/crossbow/snapshots/
Virtual Network Example
Let's take a physical network. The example in Fig 1a represents the
real network showing how my desktop connects to the lab servers. The desktop
is on the 20.0.0.0/24 network while the server machines (host1 and host2) are
on the 10.0.0.0/24 network. In addition, host1 has a 10/100 Mbps NIC,
limiting its connectivity to 100Mbps.

Fig. 1a
We will represent the network shown in Fig 1a on my Crossbow enabled laptop as
a Virtual Network. We use Zones to act as host1, host2 and the Router while
the global zone (gz) acts as the client (as a user exercise, create another
client zone and assign VNIC6 to it to act as a client).

Fig. 1b
Note 3: The Crossbow MAC layer itself does the switching between the
VNICs. The etherstub is created as a dummy device to connect the various
virtual NICs. Users can think of an etherstub as a Virtual Switch to help
visualize the virtual network as a replacement for a physical network where
each physical switch is replaced by a virtual switch (implemented by a
Crossbow etherstub).
Create the Virtual Network
Let's start by creating the 2 etherstubs using the dladm command
gz# dladm create-etherstub etherstub1
gz# dladm create-etherstub etherstub3
gz# dladm show-etherstub
LINK
etherstub1
etherstub3
Create the necessary Virtual NICs. VNIC1 will be limited to 100Mbps later on,
while the others have no limit
gz# dladm create-vnic -l etherstub1 vnic1
gz# dladm create-vnic -l etherstub1 vnic2
gz# dladm create-vnic -l etherstub1 vnic3
gz# dladm create-vnic -l etherstub3 vnic6
gz# dladm create-vnic -l etherstub3 vnic9
gz# dladm show-vnic
LINK OVER SPEED MACADDRESS MACADDRTYPE
vnic1 etherstub1 - Mbps 2:8:20:8d:de:b1 random
vnic2 etherstub1 - Mbps 2:8:20:4a:b0:f1 random
vnic3 etherstub1 - Mbps 2:8:20:46:14:52 random
vnic6 etherstub3 - Mbps 2:8:20:bf:13:2f random
vnic9 etherstub3 - Mbps 2:8:20:ed:1:45 random
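If you plan to build bigger topologies, typing every dladm command by hand gets tedious. As a small sketch, the Python snippet below describes the same topology as a dictionary and prints the create-etherstub and create-vnic commands used above. The TOPOLOGY layout and the dladm_commands helper are just illustrative names, and the script only prints the commands; it does not run them.

```python
# Etherstub name -> VNICs attached to it (same layout as the example above).
TOPOLOGY = {
    "etherstub1": ["vnic1", "vnic2", "vnic3"],
    "etherstub3": ["vnic6", "vnic9"],
}


def dladm_commands(topology):
    """Return the dladm commands that create the etherstubs and VNICs."""
    cmds = [f"dladm create-etherstub {stub}" for stub in topology]
    for stub, vnics in topology.items():
        cmds += [f"dladm create-vnic -l {stub} {vnic}" for vnic in vnics]
    return cmds


for cmd in dladm_commands(TOPOLOGY):
    print("gz#", cmd)
```

Keeping the topology in one place also makes it easy to see at a glance which VNICs share a virtual switch.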
Create the hosts and assign them the VNICs. Also create the Virtual
Router and assign it VNIC3 and VNIC9 over etherstub1 and etherstub3
respectively. Both the Virtual Router and Hosts are created using
Zones in this example but you can easily use Xen or logical domains.
Create a base Zone which we can clone. The first part is necessary if you are on a zfs filesystem.
gz# zfs create -o mountpoint=/vnm rpool/vnm
gz# chmod 700 /vnm
gz# zonecfg -z vnmbase
vnmbase: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vnmbase> create
zonecfg:vnmbase> set zonepath=/vnm/vnmbase
zonecfg:vnmbase> set ip-type=exclusive
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/opt
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> verify
zonecfg:vnmbase> commit
zonecfg:vnmbase> exit
This part takes 15-20 minutes
gz# zoneadm -z vnmbase install
Now let's create the 2 hosts and the Virtual Router as follows
gz# zonecfg -z host1
host1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host1> create
zonecfg:host1> set zonepath=/vnm/host1
zonecfg:host1> set ip-type=exclusive
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/opt
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add net
zonecfg:host1:net> set physical=vnic1
zonecfg:host1:net> end
zonecfg:host1> verify
zonecfg:host1> commit
zonecfg:host1> exit
gz# zoneadm -z host1 clone vnmbase
gz# zoneadm -z host1 boot
gz# zlogin -C host1
Connect to the console and go through the sysid config. For this example,
we assign 10.0.0.1/24 as the IP address for vnic1. You can specify this
during sysidcfg. Specify 10.0.0.3 as the default
route. You can say 'none' for naming service, IPv6, kerberos etc. for the
purpose of this example.
Similarly create host2 and configure it with vnic2 i.e.
gz# zonecfg -z host2
host2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host2> create
zonecfg:host2> set zonepath=/vnm/host2
zonecfg:host2> set ip-type=exclusive
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/opt
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add net
zonecfg:host2:net> set physical=vnic2
zonecfg:host2:net> end
zonecfg:host2> verify
zonecfg:host2> commit
zonecfg:host2> exit
gz# zoneadm -z host2 clone vnmbase
gz# zoneadm -z host2 boot
gz# zlogin -C host2
Connect to the console and go through the sysid config. For this example,
we assign 10.0.0.2/24 as the IP address for vnic2. You can specify this
during sysidcfg. Specify 10.0.0.3 as the default
route. You can say 'none' for naming service, IPv6, kerberos etc. for the
purpose of this example.
Let's now create the Virtual Router as follows
gz# zonecfg -z vRouter
vRouter: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vRouter> create
zonecfg:vRouter> set zonepath=/vnm/vRouter
zonecfg:vRouter> set ip-type=exclusive
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/opt
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic3
zonecfg:vRouter:net> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic9
zonecfg:vRouter:net> end
zonecfg:vRouter> verify
zonecfg:vRouter> commit
zonecfg:vRouter> exit
gz# zoneadm -z vRouter clone vnmbase
gz# zoneadm -z vRouter boot
gz# zlogin -C vRouter
Connect to the console and go through the sysid config. For this example, we
assign 10.0.0.3/24 as IP address for vnic3 and 20.0.0.1/24 as the IP address
for vnic9. You can specify this during sysidcfg. For the default route, specify
'none'. You can say 'none' for naming service, IPv6, kerberos etc for the
purpose of this example. Let's enable forwarding on the Virtual Router to
connect the 10.x.x.x and 20.x.x.x networks.
vRouter# svcadm enable network/ipv4-forwarding:default
Note 5: The above is done inside virtual router. Make sure you are in the
window where you did the zlogin -C vRouter above
Now let's bring up VNIC6 and configure it, including setting up routes in the
global zone. You can just as easily create another host called host3 as the
client on the 20.x.x.x network by creating a host3 zone and assigning it an
unused address on 20.0.0.0/24 (20.0.0.1 is already taken by vnic9 on the
vRouter).
Let's configure VNIC6. Open an xterm in the global zone:
gz# ifconfig vnic6 plumb 20.0.0.3/24 up
gz# route add 10.0.0.0 20.0.0.1
gz# ping 10.0.0.1
10.0.0.1 is alive
gz# ping 10.0.0.2
10.0.0.2 is alive
Similarly, log in to host1 and/or host2 and verify connectivity:
host1# ping 20.0.0.3
20.0.0.3 is alive
host1# ping 10.0.0.2
10.0.0.2 is alive
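The pings above work because of a plain longest-prefix-match route lookup. The sketch below models the global zone's routing table after the ifconfig and route commands, using Python's standard ipaddress module. The table layout and the /24 on the static route are assumptions made for illustration (the route command above gives no mask); this is not how the Solaris kernel stores routes.

```python
import ipaddress

# Hypothetical model of the global zone routing table after the commands above.
# Each entry: (destination network, gateway or None if directly connected, interface).
ROUTES = [
    (ipaddress.ip_network("20.0.0.0/24"), None, "vnic6"),        # ifconfig vnic6 ... 20.0.0.3/24
    (ipaddress.ip_network("10.0.0.0/24"), "20.0.0.1", "vnic6"),  # route add; the /24 is assumed
]

def lookup(dst):
    """Return (next_hop, interface) by longest-prefix match, or None if unreachable."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in ROUTES if addr in r[0]]
    if not matches:
        return None
    _net, gateway, ifname = max(matches, key=lambda r: r[0].prefixlen)
    # A directly connected destination is its own next hop.
    return (gateway or dst, ifname)

for dst in ("10.0.0.1", "10.0.0.2", "20.0.0.1"):
    print(dst, "->", lookup(dst))
```

Traffic for host1 and host2 is handed to the vRouter at 20.0.0.1, which forwards it because ipv4-forwarding was enabled there.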
Set up Link Speed
The links we configured above have no bandwidth limit. We can configure a link
speed on any of them. For this example, let's configure a link speed of
100Mbps on VNIC1:
gz# dladm set-linkprop -p maxbw=100 vnic1
We could have configured the link speed (or B/W limit) while we were creating
the vnic itself by adding the -p maxbw=100 option to create-vnic command.
Test the performance
Start 'netserver' (or a tool of your choice) in host1 and host2. You will have
to install the tools in the relevant places.
host1# /opt/tools/netserver &
host2# /opt/tools/netserver &
gz# /opt/tools/netperf -H 10.0.0.2
TCP STREAM TEST to 10.0.0.2 : histogram
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
49152  49152  49152   10.00   2089.87
gz# /opt/tools/netperf -H 10.0.0.1
TCP STREAM TEST to 10.0.0.1 : histogram
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
49152  49152  49152   10.00   98.78
Note 6: Since 10.0.0.2 is assigned to VNIC2, which has no limit, we get the max
speed possible. 10.0.0.1 is configured over VNIC1, which is assigned to host1
and whose link speed we just set to 100Mbps, and that's why we get only
98.78Mbps.
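If you want to script this check, the throughput can be pulled out of the netperf report and compared against the configured maxbw. The following is a minimal sketch that assumes the report layout shown above (five columns, throughput last); parse_throughput and within_limit are hypothetical helper names, not part of netperf.

```python
def parse_throughput(output):
    """Return the throughput (10^6 bits/sec) from a netperf TCP STREAM report."""
    for line in reversed(output.strip().splitlines()):
        fields = line.split()
        # The result row has five columns and ends with a number;
        # the header rows end with text and are skipped.
        if len(fields) == 5:
            try:
                return float(fields[-1])
            except ValueError:
                continue
    raise ValueError("no result row found")

def within_limit(measured_mbps, maxbw_mbps):
    """True if the measured rate does not exceed the configured link speed."""
    return measured_mbps <= maxbw_mbps

SAMPLE = """\
TCP STREAM TEST to 10.0.0.1 : histogram
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
49152  49152  49152   10.00   98.78
"""

rate = parse_throughput(SAMPLE)
print(rate, within_limit(rate, 100))
```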
Cleanup
gz# zoneadm -z host1 halt
gz# zoneadm -z host1 uninstall
delete the zone
gz# zonecfg -z host1
zonecfg:host1> delete
Are you sure you want to delete zone host1 (y/[n])? y
zonecfg:host1> exit
Delete the host2 and vRouter zones in the same way. Make sure you don't delete
vnmbase, since recreating it takes time.
gz# ifconfig vnic6 unplumb
After you have deleted the zone, you can delete vnics and etherstubs as
follows
# dladm delete-vnic vnic1              # delete the VNICs first
# dladm delete-vnic vnic2
# dladm delete-vnic vnic3
# dladm delete-vnic vnic6
# dladm delete-vnic vnic9
# dladm delete-etherstub etherstub3    # then delete the etherstubs
# dladm delete-etherstub etherstub1
Make sure that VNICs are unplumbed (ifconfig vnic6 unplumb) and not assigned
to a zone (delete the zone first) before you can delete them. You need to
delete all the vnics on the etherstub before you can delete the etherstub.
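The ordering rule above (VNICs before their etherstub) is easy to get wrong once the topology grows. Here is a small sketch that generates the teardown commands in a safe order. The mapping of VNICs to etherstubs is an assumption inferred from the addresses used in this example, and teardown_commands is a hypothetical helper; it only builds the command strings and does not run them.

```python
# Assumed layout, inferred from the addresses used in this example:
# etherstub name -> VNICs created on it.
TOPOLOGY = {
    "etherstub1": ["vnic1", "vnic2", "vnic3"],   # 10.0.0.0/24 side
    "etherstub3": ["vnic6", "vnic9"],            # 20.0.0.0/24 side
}

def teardown_commands(topology):
    """Return dladm commands that remove every VNIC before its etherstub."""
    cmds = []
    for stub, vnics in topology.items():
        for vnic in vnics:
            cmds.append(f"dladm delete-vnic {vnic}")
        # Only safe once all VNICs on this etherstub are gone.
        cmds.append(f"dladm delete-etherstub {stub}")
    return cmds

for cmd in teardown_commands(TOPOLOGY):
    print(cmd)
```

Remember that the zones must be deleted and vnic6 unplumbed before any of these commands will succeed.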
User Exercises
Now that you are familiar with the concepts and technology, you are ready to
do some experiments of your own. Clean up the machine as mentioned above. The
exercises below will help you master IP routing, configuring networks, and
debugging performance bottlenecks.
- Recreate the Virtual Network as shown in Fig 1b, but this time create
an additional zone called client and assign vnic6 to that client zone.
  client Zone    vRouter        host1    host2
       |         |     |          |        |
  ---- etherstub3 ---  -------- etherstub 1----------
Run all your connectivity tests after a zlogin into the client. Now
change all IPv4 addresses to be IPv6 addresses and verify that client
and hosts still have connectivity.
- Leave the Virtual Network as in 1, but configure OSPF in vRouter instead
of the default RIP. Verify that you still have connectivity. Note
the steps needed to configure OSPF.
- Configure the 20.0.0.0 and 10.0.0.0 networks as two separate autonomous
systems, assign them unique AS numbers and configure unique BGP domains.
Verify that connectivity still works. Note the steps needed to
configure BGP domains.
- Clean up everything and recreate the virtual network in 1 above, but
instead of statically assigning the IP addresses to hosts and clients,
configure a DHCP server on the vRouter to give out addresses on subnet
10.0.0.0/24 on vnic3 and addresses on 20.0.0.0/24 on vnic9. While creating the
hosts and clients, configure them to get their IP addresses through DHCP.
- Clean up everything and recreate the virtual network in 1 above. Add
an additional router, vRouter2, which has a vnic on each of the 2 etherstubs.
            vRouter1
           /        \
  20.0.0.0/24      10.0.0.0/24
           \        /
            vRouter2
This provides a redundant path from client to the hosts. Experiment
with running different routing protocols and assign different weight
to each path and see what path you take from client to host (use
traceroute to detect). Now configure the routing protocol on two
vRouters to be OSPF and play with link speeds and see how the path
changes. Note the configuration and observations.
- Clean up. Let's now introduce another Virtual Router between the two
subnets, i.e.
  client Zone    vRouter1       vRouter2        host1    host2
       |         |      |       |      |          |        |
  ---- etherstub3 ---  -etherstub 2-  -----etherstub 1----------
      20.0.0.0/24       30.0.0.0/24          10.0.0.0/24
Now set the link (VNIC) between vRouter1 and etherstub2 to be 75 Mbps.
Use SNMP from client to retrieve the stats from vRouter1 and check
where the packets are getting dropped when you run netperf from
client to host2.
Remove the limit set earlier and instead set a link speed of 75 Mbps
on the link between etherstub2 and vRouter2. Again use SNMP to get the
stats, this time from vRouter2. Do you see results similar to vRouter1? If
not, can you explain why?
Conclusion and More resources
Use the real example and configure the virtual network to get familiar with
the techniques used. At this point, have a look at your network and try to
create a virtual network.
Get more details on the OpenSolaris Crossbow page
http://www.opensolaris.org/os/project/crossbow
You can find high level presentations, architectural documents, man pages etc
at
http://www.opensolaris.org/os/project/crossbow/Docs
Join the crossbow-discuss@opensolaris.org mailing list at
http://www.opensolaris.org/os/project/crossbow/discussions
Send in your questions or your configuration samples and we will add them to
the use case examples.
A similar Virtual Network example using global zone as a NAT can be found on
Nicolas's blog at
http://blogs.sun.com/droux
Kais has an example of dynamic bandwidth partitioning at
http://blogs.sun.com/kais
Venu talks about some of the cool Crossbow features that allow
virtualizing services with Crossbow technology using flowadm at
http://blogs.sun.com/iyer
networking
virtualization
crossbow
(2008-02-29 02:59:01.0)
Permalink

Thursday August 24, 2006
CrossBow: Solaris Network Virtualization & Resource
Control
1. CrossBow (the name):
It makes some sense to explain the relationship between the technology
(Network Virtualization and Resource Control) and the project name
(CrossBow). It is believed that the crossbow was invented in 341 B.C. in
China, but its use became prevalent in the Middle Ages, especially when steel
was used to make the weapon. More powerful crossbows could penetrate
armour at 200 yards and gave the typical horse mounted knights
real nightmares. But the biggest differentiator was the simplicity of
their use. A crossbow could be used effectively after a week of
training, while a comparable single-shot skill with a longbow could
take years of practice.
Similarly, if you take a look at the existing QOS mechanisms on an end
host, they are very difficult to use and normally take a very skilled
administrator to use effectively. Even then, the existing QOS
mechanisms come with heavy performance penalties, which is also pretty
common with any kind of virtualization. In Solaris land, we
have invented a new way of imposing bandwidth resource control as an
attribute of a real or a virtual NIC such that it is built in as part
of the Solaris network stack and comes without any performance
penalties. Since the virtualization aspects and/or resource control
aspects are just attributes of the NIC/VNIC (specified when a NIC or
Virtual NIC is created), a normal user can configure them without
needing a doctorate in QOS or virtualization. "CrossBow" was the most
suitable name for this project since we are trying to achieve similar
results in the field of virtualization and resource control as the
weapon did on the battlefield in medieval times.
2. CrossBow (the background):
Crossbow provides the building blocks for network virtualization and
resource control by creating virtual stacks around any service (HTTP,
HTTPS, FTP, NFS, etc.), protocol (TCP, UDP, SCTP, etc.), or Virtual
machines like Containers, Xen and ldoms.
The project allows the system administrator to carve any physical
NIC into multiple virtual NICs which are pretty similar to real NICs
and are administered just like real NICs. Each Virtual NIC can be
assigned its own priority and bandwidth on a shared NIC without
causing any performance degradation. The virtual NICs can have their
own NIC hardware resources (Rx/Tx rings, DMA channels), MAC addresses,
kernel threads and queues which are private to the VNIC and are not
shared across all traffic. In the case of Solaris Containers, the
Container can be assigned a virtual Stack Instance as well, along with
one or more virtual NICs. As such, traffic for one VNIC can be
totally isolated from other traffic and assigned any kind of limits or
guarantees on the amount of bandwidth it can use.
3. Overview:
Project Crossbow extends Solaris reach in several markets.
3a. OS/Network/Server Consolidation:
These are the application, network and server consolidation environments
where both OS and network virtualization play a big role. This market is
typically driven by the cost of owning and managing physical machines
and physical networks. The sweet spot for these horizontally scaled
environments is the 2-4 socket machines which appear as 4-8 CPU
machines in the case of x86/x64 systems and 32-64 CPU machines in the case
of Sun's new Niagara based servers. From a total cost of ownership
perspective, these blades have only one physical NIC (1Gb or 10Gb) but
are trying to run multiple virtual machines (Xen, Containers, ldoms)
which have to share the NIC resources and the available bandwidth.
The problem gets worse because for 3 decades we have been designing
applications to go as fast as possible, and any congestion control is
the job of the transport layer (if at all). So if one virtual machine
is using UDP based traffic, then other virtual machines on the same
system using TCP traffic will suffer badly. Even within the same transport
(TCP for instance), bulk throughput applications like ftp/http etc will
have a very negative impact on interactive traffic and latency
sensitive applications.
The goal of project Crossbow is to let different virtual machines
share the common NIC in a fair manner and allow system administrators
to set preferential policies where necessary (e.g. the ISP selling
limited B/W on a common pipe) without any performance impact.
3b. Traditional QOS and application consolidation:
Existing host based QOS mechanisms are very complex to set up and
typically come with a sizable performance penalty and increase in
latency. A big part of the problem is the interrupt based delivery
mechanism for inbound packets and the QOS being implemented by a
separate layer (typically between the NIC driver and IP). The network and
transport layers of the host stack are unaware of the QOS layer. The
packets are already delivered to host memory by means of
interrupts, and the QOS layer needs to classify the packets to various
queues before it can apply the policies. In case a packet cannot be
processed because the bandwidth usage for that class is exceeded, it
sits in a queue while still consuming system memory.
Project Crossbow integrates stack virtualization and QOS as part of
the stack architecture itself to offer a large subset of QOS type
functionality at zero performance penalty and with simple administrative
interfaces. It also integrates diffserv with the stack where a virtual
NIC can set and read the diffserv based labels. Since the Crossbow
architecture is limited to differentiating traffic based on layer
2, 3, and 4 headers only (i.e. the VLAN tag, local MAC address, local
IP address, protocol, and ports), the functionality offered is a subset
of existing QOS mechanisms, although it covers 90% of the use cases
without any performance penalty. This is the prime reason why project
Crossbow refers to the bandwidth related policies as 'Bandwidth
resource control' instead of QOS.
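To make the "layer 2, 3, and 4 headers only" restriction concrete, here is a minimal sketch of a classifier that matches inbound packets on exactly the fields listed above. The Packet and FlowRule types, the first-match-wins order and the rule names are assumptions for illustration only; this is not the Crossbow classifier.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    """Header fields of an inbound packet (local == destination)."""
    vlan: Optional[int]
    dst_mac: str
    dst_ip: str
    protocol: str      # "TCP", "UDP", ...
    dst_port: int

@dataclass(frozen=True)
class FlowRule:
    """A rule matches when every field that is set equals the packet's field."""
    name: str
    vlan: Optional[int] = None
    local_mac: Optional[str] = None
    local_ip: Optional[str] = None
    protocol: Optional[str] = None
    local_port: Optional[int] = None

    def matches(self, pkt):
        checks = [
            (self.vlan, pkt.vlan),
            (self.local_mac, pkt.dst_mac),
            (self.local_ip, pkt.dst_ip),
            (self.protocol, pkt.protocol),
            (self.local_port, pkt.dst_port),
        ]
        # Unset fields (None) are wildcards.
        return all(want is None or want == got for want, got in checks)

def classify(pkt, rules, default="default"):
    """Return the name of the first matching rule (first match wins)."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.name
    return default

RULES = [
    FlowRule("https", protocol="TCP", local_port=443),
    FlowRule("udp", protocol="UDP"),
]
pkt = Packet(None, "0:e0:81:27:d4:47", "10.0.0.1", "TCP", 443)
print(classify(pkt, RULES))
```

Anything that needs to look deeper than these headers (for example, URL based policies) is outside what such a classifier can express, which is why the functionality is a subset of full QOS.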
3c. Horizontally scaled markets:
This is the market segment made up of low priced volume servers
(typically 2-4 socket machines) offering services which require
little or no sharing of data between them. The small servers can be
standalone machines in a rack or blades in a chassis. Grids are another
way to use volume servers to achieve the output of the traditional
large SMP machines or mainframes.
In the case of blades which share a common 10Gb NIC to the chassis,
Crossbow again provides the sharing of bandwidth in a fair manner. In
addition, the Crossbow provided APIs for network management,
virtualization and bandwidth resource control can be used by 3rd party
management software to propagate a common policy throughout the
server farm or all the blades in the chassis. In a Solaris based
homogeneous environment, it's very easy to mark an application or a
virtual machine (based on port or IP address) as critical and
propagate the same policy through all the machines. The diffserv
labels can be added appropriately such that the policy is honoured by
all machines and network elements in the center.
4. Technical problems in existing architectures:
As mentioned earlier, the host based QOS systems work as a layer
between the NIC driver and the network stack and as such are pretty
inefficient in providing the QOS services required of them. But that is
not all.
The existing interrupt driven packet delivery model precludes any kind
of policy enforcement and fair sharing. When a NIC interrupt is raised,
it is at the highest priority and the CPU has to context switch away from
whatever it was processing to deal with the interrupt. Most of the time, the
processing of a critical packet is interrupted to deal with the
arrival of a non critical packet.
The anonymous packet processing in the kernel is another major problem
in virtualizing the stack and enforcing any kind of bandwidth resource
control (including fairness). 80% of the work is already done for an
incoming packet when the stack determines that no one is actually
interested in the packet and it needs to drop it. In other words, the
cost of dropping unwanted packets is too high.
Everything in the host flows through common queues and is processed by
common threads which make enforcing policies based on traffic type
very difficult. Recv or xmit of each packet impacts processing on any
other packet on that particular CPU.
In most virtualized environments, the pseudo NIC in the virtual
machine has no way of knowing about the capabilities of the
real hardware (even simple things like hardware checksum) because of
the presence of the bridge in between, which ends up having a negative
performance impact. In addition, there is no mechanism to share the
NIC in a fair manner. The transition of a typical packet from dom0
to domU also causes severe performance problems.
5. CrossBow Architecture:
The Crossbow architecture starts out by integrating network
virtualization and resource control as part of the stack
architecture. The Solaris 10 network stack has already been designed
for the next decade where the connection to CPU affinity is maintained
and the upper stack has tight control over the NIC resources.
Crossbow builds on top of that by pushing the classification of
packets based on services, protocols or virtual machines as far down
as possible. If the NIC hardware itself has the ability to divide onboard
memory into segments/queues (known as Rx and Tx rings) which can
preferably have their own DMA channels and MSI-X interrupts, the stack
programs the NIC classifier to classify packets based on configured
policies to different Rx rings. Each Rx/Tx ring is owned by a CPU and
a separate kernel queue known as a serialization queue (squeue), which
controls the rate of packet arrival into the system based on configured
bandwidth.
The Rx/Tx ring, the associated DMA channel, MSI-X interrupt, the
serialization queue, the CPU, and processing threads are all unique
for the service, protocol or virtual machine in question and can be
assigned a unique MAC address and a Virtual NIC which becomes the
administration entity that can be administered like a normal NIC. The
NIC classifier drives the incoming packets to the correct RX ring from
where the Squeue owning the Rx ring (and VNIC) will pull the packets
via polling mode based on fair sharing of resources or configured
bandwidth. The interrupt mode is used only when the Squeue has no
packets to process and the Rx ring is empty. Each individual Rx ring
is dynamically switched between interrupt and polling mode. Incoming
packets that exceed the configured bandwidth limit remain in the NIC
itself in their corresponding Rx ring and are pulled in the system
only when they are ready to be processed.
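The interplay between interrupt mode, polling mode and the bandwidth limit can be illustrated with a toy model. In the sketch below, RxRing and Squeue are simplified stand-ins invented for this example, and the per-tick byte budget is an assumed way of expressing the configured bandwidth; the real kernel data structures are more involved. A packet larger than the per-tick budget would stall in this toy model.

```python
from collections import deque

class RxRing:
    """Toy model of one NIC Rx ring with a fixed number of slots."""

    def __init__(self, slots):
        self.slots = slots
        self.packets = deque()      # packet sizes in bytes, oldest first
        self.dropped = 0
        self.interrupt_mode = True  # True: interrupts on, False: being polled

    def arrive(self, size):
        """NIC places a packet in the ring; a full ring drops it in hardware."""
        if len(self.packets) >= self.slots:
            self.dropped += 1
            return
        self.packets.append(size)
        if self.interrupt_mode:
            # The first packet on an idle ring raises an interrupt; the
            # squeue then switches the ring to polling mode.
            self.interrupt_mode = False


class Squeue:
    """Pulls packets from its ring at no more than bytes_per_tick."""

    def __init__(self, ring, bytes_per_tick):
        self.ring = ring
        self.bytes_per_tick = bytes_per_tick

    def poll(self):
        budget = self.bytes_per_tick
        pulled = []
        # Packets over budget stay in the ring, not in host memory.
        while self.ring.packets and self.ring.packets[0] <= budget:
            size = self.ring.packets.popleft()
            budget -= size
            pulled.append(size)
        if not self.ring.packets:
            self.ring.interrupt_mode = True  # ring drained: back to interrupts
        return pulled


ring = RxRing(slots=8)
squeue = Squeue(ring, bytes_per_tick=3000)
for _ in range(5):
    ring.arrive(1500)
print(squeue.poll(), len(ring.packets), ring.interrupt_mode)
```

The point of the design is visible in the model: excess packets wait (or are dropped) in the NIC's own ring, so the host spends no memory or CPU on traffic that is over its limit.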
The creation of an administrative entity (VNIC) is optional and
typically associated with a virtual machine like Solaris containers,
Xen or ldoms. For application or protocol based resource control, a
separate data path is created to provide the isolation and resource
control but a VNIC is not configured.
As mentioned above, the VNIC is just an administrative entity. If the
classification has already been done by the NIC to a particular Rx
ring, the packets are delivered directly to the IP layer by means of
function calls when the Rx ring is in interrupt mode, or the squeue
residing in the IP layer pulls the packet chain directly from the Rx ring
when in polling mode. In essence, the entire data link layer is bypassed,
resulting in improved performance and lower latencies. If the VNIC is
placed in promiscuous mode, the data link bypass is abandoned and the
Rx ring delivers packets via the VNIC layer, which creates a copy of
the packet for the promiscuous stream. Similarly, in polling mode, the
squeue's poll entry point is changed to point at the VNIC, which in turn
pulls the packets from the Rx ring, makes a copy and then gives the chain
to the squeue poll thread.
The entire layered architecture is built on function pointers known as
'upcall_func' and 'downcall_func' with corresponding 'upcall_arg' and
'downcall_arg' for context. Every layer provides a pointer to its recv
function as 'upcall_func' and a context as 'upcall_arg' to the layer
below. Similarly, every layer provides a pointer to its transmit
function as 'downcall_func' and a context cookie as 'downcall_arg' to the
layer above. This is how the packet path is constructed. Any layer can
short circuit itself out by providing the 'upcall_func' and
'upcall_arg' of the layer above to the layer below (and the same for the
transmit side if needed). All context cookies for a layer work on a
reference based system where each layer pointing to it gets a reference,
ensuring that data structures don't get freed till all references are
dropped.
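The receive side of this upcall scheme, including the short circuit, can be sketched in a few lines. The Layer class and the connect and bypass helpers below are hypothetical names for this illustration; only 'upcall_func' and 'upcall_arg' come from the description above, and the reference counting of context cookies is left out.

```python
class Layer:
    """One layer in the receive path, holding the upcall of the layer above."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.upcall_func = None   # receive function of the layer above
        self.upcall_arg = None    # context cookie for that function

    def recv(self, arg, packet):
        # `arg` is the context cookie this layer handed to the layer below.
        self.log.append(self.name)
        if self.upcall_func is not None:
            self.upcall_func(self.upcall_arg, packet)


def connect(lower, upper):
    """The upper layer gives its recv function and context to the lower layer."""
    lower.upcall_func = upper.recv
    lower.upcall_arg = upper


def bypass(lower, middle):
    """The middle layer short circuits itself: it hands the lower layer the
    upcall of the layer above instead of its own."""
    lower.upcall_func = middle.upcall_func
    lower.upcall_arg = middle.upcall_arg


log = []
rx_ring, dls, ip = Layer("rx_ring", log), Layer("dls", log), Layer("ip", log)
connect(dls, ip)
connect(rx_ring, dls)
rx_ring.recv(None, "pkt")
print(log)          # path through all three layers

log.clear()
bypass(rx_ring, dls)
rx_ring.recv(None, "pkt")
print(log)          # dls no longer on the path
```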
In case the NIC hardware does not have classification capability
(unlikely, since most Intel, Broadcom and Sun 1Gb NICs and pretty
much all 10Gb NICs shipping for the past several years have this
capability) or has run out of classification capability, the
architecture provides a classification capability in the MAC layer and
employs soft rings, which are similar in functionality to the NIC hardware
classifier and Rx rings. The NIC hardware layer coupled with the lower MAC
layer and soft rings is termed the 'Pseudo Hardware layer'. A request
by the administrator to create a new VNIC or flow will always return
successfully from the pseudo hardware layer. The pseudo hardware layer
manages the hardware and software classification capability and Rx
rings and soft rings transparently to the upper layers.
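The "always succeeds" behaviour of the pseudo hardware layer amounts to a simple fallback: hand out a hardware Rx ring while one is free, otherwise create a soft ring. The PseudoHardware class below is a hypothetical sketch of that idea under the assumption of one ring per flow; it is not the actual MAC layer interface.

```python
class PseudoHardware:
    """Hands out hardware Rx rings first and soft rings once they run out."""

    def __init__(self, hw_rings):
        self.free_hw_rings = list(range(hw_rings))
        self.soft_rings = 0
        self.assignments = {}   # flow or VNIC name -> ("hw" | "soft", ring id)

    def add_flow(self, name):
        """Never fails: the caller cannot tell which kind of ring it got
        unless it inspects the returned tuple."""
        if self.free_hw_rings:
            ring = ("hw", self.free_hw_rings.pop(0))
        else:
            ring = ("soft", self.soft_rings)
            self.soft_rings += 1
        self.assignments[name] = ring
        return ring


nic = PseudoHardware(hw_rings=2)
for name in ("vnic1", "vnic2", "vnic3"):
    print(name, nic.add_flow(name))
```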
6. Crossbow layers, data structures and packet flow:
It's easier to illustrate this with 2 flows. The first one is for
IP_addr = a.b.c.d && TCP and it goes through the normal path via upper DLS
etc. This is under the assumption that either snoop (or someone else
in DLS) is interested in this flow and we can't bypass data link
processing. The squeue poll function in this case is dls_poll_ring and
the argument is dls_impl_t.
The 2nd flow is for IP_addr = m.n.o.p && port = 80 && TCP,
which is unique, and no one is interested in snooping it. In this case,
the DLS layer allows itself to be bypassed by setting the upcall_func
and upcall_arg for soft_ring/Rx_rings to directly call into IP.
The squeue is directly polling the H/W Rx ring in this case.

7. The administrative model:
Crossbow introduces a new command called 'netrcm' and further augments
'dladm' which was introduced as part of the new high performance
device driver framework (GLDv3) in Solaris 10.
'dladm (1M)' - This is primarily used to create, modify and destroy
VNICs based on MAC or IP addresses. The created VNIC is visible to and
managed by ifconfig just like any other NIC and can get its IP address
assigned via DHCP if necessary.
The examples below can illustrate this better:
Example 1: Configuring VNICs
To create two VNIC interfaces with vnic-ids 1 and 2
over a single physical device bge0, enter the following commands:
# dladm create-vnic -d bge0 1
# dladm create-vnic -d bge0 2
The new links will be called vnic1 and vnic2.
Example 2: Configuring VNICs and allocating bandwidth & priority
To create two VNIC interfaces with vnic-ids 1 and 2
over a single physical device bge0, making vnic1 a higher
priority VNIC using a factory assigned MAC address with a guarantee
of up to 90% of the bandwidth, and vnic2 a lower priority VNIC
with a random MAC address and a hard limit of 100Mbps:
# dladm create-vnic -d bge0 -m factory -b 90% -G -p high 1
# dladm create-vnic -d bge0 -m random -b 100M -L -p low 2
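The two bandwidth values above use different units: a percentage of the link and an absolute rate. As a rough illustration of how they relate on the 1000 Mbps bge0 link shown in the later examples, the sketch below converts both forms to Mbps. parse_bandwidth is a hypothetical helper and handles only the two forms that appear in this post ('%' and 'M'); it is not part of dladm.

```python
def parse_bandwidth(spec, link_mbps):
    """Convert a '-b' style value such as '90%' or '100M' to Mbps.

    Only the two forms used in the examples above are handled.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        return link_mbps * float(spec[:-1]) / 100.0
    if spec.upper().endswith("M"):
        return float(spec[:-1])
    raise ValueError(f"unsupported bandwidth value: {spec!r}")


LINK_MBPS = 1000  # bge0 link speed reported by show-dev in the examples
print(parse_bandwidth("90%", LINK_MBPS), parse_bandwidth("100M", LINK_MBPS))
```

On a 1000 Mbps link, the 90% guarantee for vnic1 and the 100 Mbps hard limit for vnic2 together account for the whole link.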
Example 3: Configure a VNIC by choosing a factory MAC address
To create a VNIC interface with vnic-id 1 by first
listing the available factory MAC addresses and then using one
of them:
# dladm show-dev -d bge0 -m
bge0
        link: up    speed: 1000 Mbps    duplex: full
        MAC addresses:
        slot-ident    Address             In Use
        1             0:e0:81:27:d4:47    Yes
        2             8:0:20:fe:4e:a5     No
# dladm create-vnic -d bge0 -m factory -n 2 1
# dladm show-dev -d bge0
bge0
        link: up    speed: 1000 Mbps    duplex: full
        MAC addresses:
        slot-ident    Address             In Use
        1             0:e0:81:27:d4:47    Yes
        2             8:0:20:fe:4e:a5     Yes
Example 4: Configuring VNICs sharing a MAC address
To create two VNICs with vnic-id 1 and 2 by first listing the
available factory assigned MAC addresses and then picking one
that will be shared by the newly created VNICs
# dladm show-dev -d bge0 -m
bge0
        link: up    speed: 1000 Mbps    duplex: full
        MAC addresses:
        slot-ident    Address             In Use
        1             0:e0:81:27:d4:47    Yes
        2             8:0:20:fe:4e:a5     No
# dladm create-vnic -d bge0 -m shared -n 2 1
# dladm create-vnic -d bge0 -m shared -n 2 2
Example 5: Creating a VNIC with user specified MAC address
To create a VNIC with vnic-id 1 by providing a user specified
mac address
# dladm create-vnic -d bge0 -m 8:0:20:fe:4e:b8 1
'netrcm (1M)' - This command is primarily used to provide isolation
and private resources to application traffic or a protocol. In
addition, we can also configure bandwidth limits and guarantees for
the flows. Again, some examples can illustrate the usage better:
Example 1: Create a policy around mission critical port 443 traffic
which is https service.
To create a policy around inbound https traffic on an https server
so that https gets its own dedicated NIC hardware and kernel TCP/IP
resources. The policy-id specified is https-1, which is used
later to modify or delete the policy.
# netrcm add-policy -d bge0 -H transport = TCP local port = 443 https-1
Example 2: Modify an existing policy to add bandwidth resource control
To modify https-1 policy to add bandwidth control and give it a
high priority
# netrcm modify-policy -d bge0 -b 90% -G -p high https-1
Example 3: Limit the bandwidth usage of UDP protocol
To create a policy for UDP protocol so that it can not consume more
than 10% of available bandwidth. The policy-id is called limit-udp-1.
# netrcm add-policy -d bge0 -b 10% -L -p low limit-udp-1
8. Crossbow Observability - Stats, history and APIs:
Apart from the functionality related to network virtualization and
bandwidth resource control, Crossbow offers a whole range of new
tools and mechanisms to understand bandwidth usage. Administrators
can see real time bandwidth usage for various VNICs or configured
flows (via 'netrcm') without causing any performance penalties.
The Rx rings and squeues dealing with a particular flow keep track of
normal stats which are pulled by a userland daemon from time to
time. The daemon also logs the information in special log files, which
allows users to see history at any given time. A user can request
usage for a time period in the past to understand the system behaviour.
Crossbow will provide more tools to help capacity planning by allowing
the system to be put in a capacity planning mode where bandwidth
usage for top traffic is monitored and displayed.
All the observability and administrative interfaces can be accessed by
APIs which allow other applications to use and manage the system.
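As an illustration of what a history query over such logged stats could look like, the sketch below computes the average bandwidth of one flow over a past time window from periodic samples of a cumulative byte counter. The sample format and the average_mbps helper are assumptions made for this example; the actual log format and APIs are not described here.

```python
def average_mbps(samples, start, end):
    """Average bandwidth in Mbps between `start` and `end` (seconds).

    `samples` is a list of (timestamp_sec, cumulative_bytes) tuples sorted
    by time, as a stats daemon might log them for one VNIC or flow.
    """
    window = [(t, b) for t, b in samples if start <= t <= end]
    if len(window) < 2:
        raise ValueError("need at least two samples in the window")
    (t0, b0), (t1, b1) = window[0], window[-1]
    return (b1 - b0) * 8 / (t1 - t0) / 1e6


# Hypothetical samples taken every 10 seconds.
SAMPLES = [(0, 0), (10, 125_000_000), (20, 250_000_000), (30, 250_000_000)]
print(average_mbps(SAMPLES, 0, 20))   # busy period
print(average_mbps(SAMPLES, 20, 30))  # idle period
```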
9. Resources:
Crossbow project page on OpenSolaris is a good source of information
http://www.opensolaris.org/os/project/crossbow
The Crossbow mailing list is where all the day to day business for the
project is conducted. Anyone can join the mailing list
crossbow-discuss@opensolaris.org.
Crossbow slide presentation can be found here
Crossbow Team members are:
* Kais Belgaied
* Stephanie Brucker
* Eric Cheng
* Nicolas Droux
* Markus Flierl
* Carol Gayo
* Mohan Iyer
* Darrin Johnson
* Michael Lim
* Rajagopal Kunhappan
* Erik Nordmark
* Ethan Solomita
* Thirumalai Srinivasan
* Sunay Tripathi
* Nicky Veitch
* Bill Watson
* Roamer Lu
Email: first.last@sun.com
networking
virtualization
crossbow
(2006-08-24 02:26:02.0)
Permalink

Sunday April 02, 2006
Project Crossbow: Network Virtualization and Resource Control going live
Project Crossbow - going live on OpenSolaris
Hello and Welcome to project Crossbow!! We are going to
add Network Virtualization and Resource Control to Solaris without degrading
performance.
At this time, we are seeking members from the OpenSolaris community to become
part of the Crossbow i-team. It's the charter of the i-team to gather
requirements and deliver the project including design, docs and testing. We
would love to have members of the community get involved from day one. The
participation opportunities include (but are not limited to):
- helping define the project
- gathering requirements
- designing the project
- writing code
- creating demos
- doing talks and evangelizing the project
Please send an email to me if you are interested. We can promise you
that this will be a thrilling adventure and you will be living on the
bleeding edge of technology! Project Crossbow is brought to you by the
same people who created project FireEngine (new stack architecture),
project Nemo (GLDv3 - new high performance device driver framework),
and project Yosemite (UDP performance), to name a few.
Apart from active participation, you can also participate via the
mailing lists and discussion groups where we will be posting various
documents for review and comments apart from day to day discussion.
The project Crossbow page is visible here
You can sign up for the discussion group here
(2006-04-02 20:41:15.0)
Permalink

Wednesday December 07, 2005
Nemo based e1000g on T2000
Derek Morr points out that the T2000
uses e1000g controllers, which are still DLPI based, so they wouldn't
(yet) get the advantages of Nemo (GLDv3). Very good observation. The
T1000 already uses a Broadcom chip which comes up as bge, which is fully
Nemo based. The T2000 indeed uses a DLPI
based driver in the current Solaris 10 update. Without going
into the why (it's not very pretty), the Nevada and OpenSolaris version
of e1000g is already Nemo based (BTW, the DLPI driver comes up as ipge
on the T2000, which tells you that it's not Nemo based). The Nemo based
patches for e1000g (for S10) should be available soon, if not available
already. Pretty soon
the machine will ship with the patches already installed, and future
updates will obviously have the Nemo version.
(2005-12-07 21:09:58.0)
Permalink

Tuesday December 06, 2005
Niagara - Designed for Network Throughput
We finally announced Niagara based servers to the public! Billed as the
low cost, energy efficient, huge network throughput processors -
marketing mumbo jumbo, you think?? Well, try it and you will see. I was
privileged enough that one of the earliest prototypes landed on my desk
(or in my lab to be precise) so Solaris networking could be tailored to
take advantage of the chip. And boy, together with Solaris, this thing
rocks!!
So you know that Niagara is a multi core, multi threaded chip and Solaris
takes advantage of that in multiple ways. Let me highlight some of them.
Network performance
The load from the NIC is fanned out to multiple soft rings in the GLDv3
layer based on the src IP address and port information. Each soft ring
in turn is tied to a Niagara thread and a Vertical
Perimeter such that packets from a connection have locality
to specific H/W thread on a core and the NIC has locality to specific
core. Think of this model as 4 H/W threads per core processing the NIC
such that if one thread stalls for resource, the CPU cycles are not
wasted. The result is amazing network performance for this beast.
It performs 5-6 times better than your typical x86 based CPU.
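The fanout described above boils down to hashing the source address and port of each packet to a soft ring, so that all packets of one connection land on the same ring (and therefore the same H/W thread). Here is a minimal sketch of that idea. The use of CRC32 and the soft_ring_for helper are assumptions for illustration; the hash actually used in the GLDv3 layer is not described here.

```python
import zlib

def soft_ring_for(src_ip, src_port, num_rings):
    """Pick a soft ring index from the source IP address and port.

    CRC32 is only a stand-in for whatever hash the real fanout uses; any
    deterministic hash keeps a connection on one ring.
    """
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % num_rings


NUM_RINGS = 4  # e.g. one soft ring per H/W thread of a core
for conn in [("192.168.1.10", 40001), ("192.168.1.10", 40002), ("192.168.1.11", 40001)]:
    print(conn, "-> soft ring", soft_ring_for(*conn, NUM_RINGS))
```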
Virtualization
Imagine you are an ISP or someone wanting to consolidate multiple
machines on one physical machine. Well, Niagara based platforms lend
themselves beautifully to this concept because there are so many H/W
threads around which appear as individual CPUs to Solaris. We have a
project underway called Crossbow
(details available on the Network
Community page on OpenSolaris) which will allow you to carve the
machine into multiple virtual machines (create virtual network stacks),
tie specific CPUs to them and control the B/W utilization for each
virtual machine on a shared NIC.
Real Time Networking/Offload
With GLDv3
based drivers and FireEngine
architecture in Solaris 10, the stack controls the rate of interrupts
and can dynamically switch the NIC between interrupt and polling mode.
Coupled with the Niagara platform, Solaris can run the entire networking
stack on one core and provide real time capabilities to the
application. Meanwhile, the applications themselves run on a different
core without worrying about networking interrupts pinning them down.
You can get pretty bounded latencies provided the application can do some
admission control. We are also planning to hide the core running
networking from the application, effectively getting TOE for free
without suffering from the drawbacks of offloading networking to a
separate piece of hardware.
[ T:
NiagaraCMT
]
(2005-12-06 17:31:01.0)
Permalink