
Sunday July 12, 2009
Crossbow Launch, Talk and BOF at Community One and Java One
On June 1, 2009, during Community One and Java One in San Francisco, California, Crossbow was formally launched as part of OpenSolaris 2009.06. The morning started with a keynote where John Fowler, EVP of the Sun Systems group, formally announced OpenSolaris 2009.06 as the beta for the next enterprise release of Solaris, Solaris.Next (the next release after Solaris 10). He and Greg Lavender then went on to show the Crossbow feature and the Virtual Wire demo. Later in the day I gave a talk on Crossbow, where Nicolas and Kais accompanied me and showed the Crossbow Virtual Wire demo in detail. Bill Franklin and some of his cohorts were dressed as Crossbow knights, and they charged into the room right after the talk. I think people got the shock of their lives. It was very entertaining.
The launch got a lot of visibility and very good press coverage, which can be seen on the Crossbow News page. The most notable ones were:
On June 2, 2009 we held the Crossbow BOF in the evening. There was a great showing and great support from the community.
So, great stuff and a good closure for Phase 1 of the Crossbow project. The team members were pretty happy and relieved. We are now trying to get the next intermediate phase going so we can complete the story for the next enterprise release of Solaris, which might or might not be called Solaris 11. Key items are more analytics (dlstat/flowstat), some security/anti-spoofing features, better usability, etc. More details are being discussed on the Crossbow Discussion page.
(2009-07-12 14:59:12.0)

Tuesday March 17, 2009
Crossbow: Virtualized switching and performance
I saw Cisco's unified fabric announcement. It seems like they are going
after cloud computing, which pretty much promises to solve the world
hunger problem. Even if cloud computing just solves the high data
center cost problem and makes compute, networking, and storage
available on demand cheaply, I am pretty much sold on it.
The interesting part is that the world needs to move towards enabling
people to bring their network onto the cloud and have compute, bandwidth,
and storage available on demand. For networking and network
virtualization, this means that we need to go to open standards,
open technology, and off-the-shelf hardware. The users of the cloud
will not accept vendor or provider lock-in. The cloud needs to be
built in such a manner that users can take their physical network and
migrate it to an operator's cloud and at the same time have the
ability to build their own clouds and migrate workloads between the
two. Open Networking is the key ingredient here.
This essentially means that there is no room for custom ASICs and
protocols, and the world of networking needs to change. This is what
Jonathan was talking about, to a certain extent, around Open Networking
and Crossbow. OpenSolaris with Crossbow makes things very
interesting in this space. But it seems like people don't fully
understand what Crossbow and OpenSolaris bring to the table. I saw a post from
Scott Lowe and several others mentioning that Crossbow is pretty
similar to VMware's network virtualization solutions and
Cisco Nexus 1000v virtual switches.
Let me take some time to explain a few very important things about Crossbow:
- It's open source and part of OpenSolaris. You can download it
right here.
- It leverages NIC hardware switching and features to deliver
isolation and performance for virtual machines. Crossbow not only
includes H/W and S/W based VNICs and switches, it also offers
virtualized routers, load balancers, and firewalls. These Virtual Network
Machines can be created using Crossbow and Solaris Zones and have
pretty amazing performance. All of these are connected together using the
Crossbow Virtual Wire. You don't need to buy fancy and expensive virtualized switches to create
and use a Virtual Wire.
- Using hardware virtualized lanes, Crossbow technology scales to multiples of 10 Gbps of
traffic using off-the-shelf hardware.
Hardware based VNICs and Hardware based Switching
A picture is always worth a thousand words. The figure shows how
Crossbow VNICs are built on top of real NIC hardware and how we do
switching in hardware where possible. Crossbow also has a full
featured S/W layer where it can do S/W VNICs and switching as
well. The hardware is leveraged when available. It's important to note
that most NIC vendors do ship with the necessary NIC
classifiers and Rx/Tx rings, and these are pretty much mandatory for 10 Gig
NICs, which form the backbone of a cloud.

Virtual Wire: The essence of virtualized networking
The Crossbow Virtual Wire technology allows a person to take a full
featured physical network (multiple subnets, switches, and routers) and
configure it within one or more hosts. This is the key to moving
virtualized networks in and out of the cloud. The figure shows a
two-subnet physical network with multiple switches and different link
speeds, connected via a router, and how it can be virtualized in a
single box. A full workshop on virtualized networking is available
here.

Scaling and Performance
Crossbow leverages the NIC features pretty aggressively to create
virtualization lanes that help traffic scale across a large number of
cores and threads. For people wanting to build real or virtual
appliances using OpenSolaris, performance and scaling across 10
Gig NICs is pretty essential. The figure below shows an overview of
hardware lanes.
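As a rough illustration of the lane idea, the sketch below models a NIC classifier that steers each packet to a per-VNIC hardware ring by destination MAC address and falls back to a software lane once the hardware rings run out. The class, the field names, and the ring count are hypothetical; this is not Crossbow's actual data structure or policy.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    """Toy NIC with a fixed number of hardware Rx rings (hypothetical model)."""
    hw_rings: int
    lanes: dict = field(default_factory=dict)  # MAC address -> lane name

    def add_vnic(self, mac: str) -> str:
        # Hand out a dedicated hardware ring while any are free,
        # otherwise fall back to software classification.
        used = sum(1 for lane in self.lanes.values() if lane.startswith("hw"))
        lane = f"hw-ring-{used}" if used < self.hw_rings else "sw-lane"
        self.lanes[mac] = lane
        return lane

    def classify(self, dst_mac: str) -> str:
        # Unknown destinations are handled by the software lane.
        return self.lanes.get(dst_mac, "sw-lane")


nic = Nic(hw_rings=2)
for mac in ("2:8:20:8d:de:b1", "2:8:20:4a:b0:f1", "2:8:20:46:14:52"):
    print(mac, "->", nic.add_vnic(mac))
```

The point of the toy is only that classification happens once, at the edge, and everything behind a lane can then be scheduled independently.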

More Information
There is a white paper and more detailed
documents (including how to get started) at the
Crossbow
OpenSolaris page.
(2009-03-17 17:30:06.0)

Monday March 02, 2009
Crossbow enables an Open Networking Platform
I came across this blog from Paul Murphy. You
should read the second half of Paul's blog. What he says is pretty true. Crossbow delivered a brand new
networking stack to Solaris which has scalability, virtualization, QoS, and better observability
designed in (instead of patched in). The complete list of features delivered and in the works is
here. Coupled with a full
fledged open source Quagga Routing Suite (RIP, OSPF, BGP, etc.),
IP Filter Firewall, and a kernel Load Balancer, OpenSolaris becomes a
pretty useful platform for building Open Networking appliances.
Apart from single box functionality, imagine you want to deliver a Virtual Router or a load balancer:
it would be pretty easy to do so. OpenSolaris offers Zones,
so you can deliver a preconfigured zone as a router, load balancer, or firewall. The difference is
that this Zone is fully portable to another machine running OpenSolaris and has no performance
penalty. After all, we (aka the Crossbow team) guarantee that
our VNICs with Zones do not have any performance penalties.
You can also build fully portable, preconfigured virtual networking equipment using a Xen guest, which can be made to migrate between any OpenSolaris
or Linux hosts.
I noticed that a couple of folks on Paul's blog were asking why Crossbow NIC virtualization is
different. Well, it's not just the NIC being virtualized but actually
the entire data path along with it, called a Virtualization Lane. You can see the virtualization lane all the way from the NIC to the socket layer and back
here.
Not only are there one or more Virtualization Lanes per virtual machine, but
bandwidth partitioning, Diffserv tagging, priority, CPU assignment, etc. are designed in as part of the
architecture. The same concepts are used to scale the stack across multiple 10GigE NICs over a large
number of cores and threads (out of this world forwarding performance, anyone?).
And as mentioned before, Crossbow enables
Virtual Wire: the ability to create a full featured network without any physical wires. Think of
running network simulations and testing in a whole new light!!

Tuesday March 04, 2008
Virtual Wire: Network in a Box (Sun Tech Day in Hyderabad)
I did a session for developers during the Sun Tech Day in Hyderabad. Raju Alluri
had printed out 100 copies of the workshop, and we were carrying 100 DVDs with Crossbow ISO images (they are available
on the web here). People just loved it. We had so
underestimated the demand that the printouts and DVDs disappeared in less than a minute. I had a presentation that included
30-odd slides, but I couldn't even get past slide 7 since the workshop was so interesting to people. And between the
tech day presentation and the user group meeting in the evening, people pointed out a lot of interesting uses and why
this can be such a powerful thing.
The idea that you can create any arbitrarily complex physical network as a virtual wire and run your favorite workload,
do performance analysis, and debug it is very appealing to people. Remember that we are not simulating the network. This
is the real thing, i.e. real applications running and real packets flowing. If your application runs on any OS, it will
run on this virtual network and will send and receive real packets!!
The concept is pretty useful even to people like us because now we don't need to pester our lab staff to create
a network for us to test or experiment on. And the best part is, we can use xVM and run Linux and Windows as hosts as well.
We are thinking of writing a book which reinvents how you learn networking in schools and universities. And oh, by the way,
do people really care about CCNA now that they can do all this on their laptop :) If someone is interested in contributing
real examples for this workshop module and the book, you are more than welcome. Just drop us a line.
(2008-03-04 18:05:48.0)

Friday February 29, 2008
Virtual Wire: Network in a Box (Creating a Real Network on Your Laptop)
Crossbow: Network Virtualization & Resource Control
Objective
Create a real network comprising Hosts, Switches and Routers as a Virtual
Network on a laptop. The Virtual Network (called Virtual Wire) is created using OpenSolaris project
Crossbow technology, and the hosts etc. are created using Solaris Zones (a lightweight
virtualization technology). All the steps necessary to create the
virtual topology are explained.
Users can use this hands-on demo/workshop and the exercises at the end to
become expert in
- Configuring IPv4 and IPv6 networks
- Working hands-on with OpenSolaris
- Configuring and managing a real Router
- IP Routing technologies including RIP, OSPF and BGP
- Debugging configuration and connectivity issues
- Network performance and bottleneck analysis
Users of this module need not have access to a real network, router, or
switches. All they need is a laptop or desktop running OpenSolaris Project
Crossbow snapshot 2/28/2008 or later, which can be found at
http://www.opensolaris.org/os/project/crossbow/snapshots.
Introduction
Crossbow (Network Virtualization and Resource Control) allows users to create
a Virtual Wire with fixed link speeds in a box. Multiple subnets connected
via a Virtual Router are pretty easy to configure. This allows the network
administrators to do a full network configuration, verify IP address, subnet
masks and router ports and addresses. They can test connectivity and link
speeds and when fully satisfied, they can instantiate the configuration on
the real network.
Another great application is to debug problems by simulating a real network in
a box. If network administrators are having issues with connectivity or
performance, they can create a virtual network and debug their issues using
snoop, kernel stats and dtrace. They don't need to use the expensive H/W
based network analyzers.
The network developers and researchers working with protocols (like high
speed TCP) can use OpenSolaris to write their implementation and then try it
out with other production implementations. They can debug and fine tune their
protocol quite a bit before sending even a single packet on the real
network.
Note1: Users can use Solaris Zones, Xen or ldom guests to create the virtual
hosts while Crossbow provides the virtual network building blocks. There is
no simulation but real protocol code at work. Users run real applications
on the host and clients which generate real packets.
Note2: The Solaris protocol code executed for a virtual network or Solaris
acting as a real router or host is common all the way to the bottom of the MAC layer. In
case of virtual networks, the device driver code for a physical NIC is the
only code that is not needed.
Try it Yourself
Let's do a simple exercise. As part of this exercise, you will learn
- How to configure a virtual network having two subnets connected via a
Virtual Router using Crossbow and Zones
- How to set the various link speeds to simulate a multiple speed network
- How to do some performance runs to verify connectivity
What you need:
A laptop or machine running Crossbow snapshot from Feb 28, 2008 or later
http://www.opensolaris.org/os/project/crossbow/snapshots/
Virtual Network Example
Let's take a physical network. The example in Fig 1a represents the
real network showing how my desktop connects to the lab servers. The desktop
is on the 20.0.0.0/24 network while the server machines (host1 and host2) are
on the 10.0.0.0/24 network. In addition, host1 has a 10/100 Mbps NIC,
limiting its connectivity to 100 Mbps.
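To make the two-subnet layout concrete, here is a small sketch using Python's standard ipaddress module that checks which pairs of machines share a subnet and which need the router. The specific host addresses are the ones assigned later in this walkthrough (20.0.0.3 for the desktop, 10.0.0.1 and 10.0.0.2 for the servers); the helper name is made up for illustration.

```python
import ipaddress

# Addresses used in this walkthrough; the desktop is the global zone.
hosts = {
    "desktop": ipaddress.ip_interface("20.0.0.3/24"),
    "host1": ipaddress.ip_interface("10.0.0.1/24"),
    "host2": ipaddress.ip_interface("10.0.0.2/24"),
}


def needs_router(a: str, b: str) -> bool:
    """Return True when the two machines sit on different subnets."""
    return hosts[a].network != hosts[b].network


for pair in (("host1", "host2"), ("desktop", "host1")):
    print(pair, "needs router:", needs_router(*pair))
```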

Fig. 1a
We will represent the network shown in Fig 1a on my Crossbow enabled laptop as
a Virtual Network. We use Zones to act as host1, host2 and the Router while
the global zone (gz) acts as the client (as a user exercise, create another
client zone and assign VNIC6 to it to act as a client).

Fig. 1b
Note 3: The Crossbow MAC layer itself does the switching between the
VNICs. The etherstub is created as a dummy device to connect the various
virtual NICs. Users can think of an etherstub as a Virtual Switch, which helps
visualize the virtual network as a replacement for a physical network where
each physical switch is replaced by a virtual switch (implemented by a
Crossbow etherstub).
Create the Virtual Network
Let's start by creating the 2 etherstubs using the dladm command
gz# dladm create-etherstub etherstub1
gz# dladm create-etherstub etherstub3
gz# dladm show-etherstub
LINK
etherstub1
etherstub3
Create the necessary Virtual NICs. All of them start with no bandwidth limit;
VNIC1 is limited to 100 Mbps later, in the Set up Link Speed step
gz# dladm create-vnic -l etherstub1 vnic1
gz# dladm create-vnic -l etherstub1 vnic2
gz# dladm create-vnic -l etherstub1 vnic3
gz# dladm create-vnic -l etherstub3 vnic6
gz# dladm create-vnic -l etherstub3 vnic9
gz# dladm show-vnic
LINK   OVER        SPEED   MACADDRESS       MACADDRTYPE
vnic1  etherstub1  - Mbps  2:8:20:8d:de:b1  random
vnic2  etherstub1  - Mbps  2:8:20:4a:b0:f1  random
vnic3  etherstub1  - Mbps  2:8:20:46:14:52  random
vnic6  etherstub3  - Mbps  2:8:20:bf:13:2f  random
vnic9  etherstub3  - Mbps  2:8:20:ed:1:45   random
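If you rebuild this topology often, it can help to keep it as data and generate the command lines from it. The following is only a sketch: it prints the same dladm invocations shown above and does not run them, and the dictionary layout and function name are my own choices.

```python
# Topology from this example: etherstub name -> VNICs attached to it.
topology = {
    "etherstub1": ["vnic1", "vnic2", "vnic3"],
    "etherstub3": ["vnic6", "vnic9"],
}


def build_commands(topo: dict) -> list:
    """Return dladm command lines: all etherstubs first, then their VNICs."""
    cmds = [f"dladm create-etherstub {stub}" for stub in topo]
    for stub, vnics in topo.items():
        cmds += [f"dladm create-vnic -l {stub} {vnic}" for vnic in vnics]
    return cmds


for line in build_commands(topology):
    print(line)
```

Creating the etherstubs before any VNIC matters because each VNIC names the etherstub it sits on.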
Create the hosts and assign them the VNICs. Also create the Virtual
Router and assign it VNIC3 and VNIC9 over etherstub1 and etherstub3
respectively. Both the Virtual Router and Hosts are created using
Zones in this example but you can easily use Xen or logical domains.
Create a base Zone which we can clone. The first part is necessary if you are on a zfs filesystem.
gz# zfs create -o mountpoint=/vnm rpool/vnm
gz# chmod 700 /vnm
gz# zonecfg -z vnmbase
vnmbase: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vnmbase> create
zonecfg:vnmbase> set zonepath=/vnm/vnmbase
zonecfg:vnmbase> set ip-type=exclusive
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/opt
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> verify
zonecfg:vnmbase> commit
zonecfg:vnmbase> exit
This part takes 15-20 minutes
gz# zoneadm -z vnmbase install
Now let's create the two hosts and the Virtual Router as follows
gz# zonecfg -z host1
host1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host1> create
zonecfg:host1> set zonepath=/vnm/host1
zonecfg:host1> set ip-type=exclusive
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/opt
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add net
zonecfg:host1:net> set physical=vnic1
zonecfg:host1:net> end
zonecfg:host1> verify
zonecfg:host1> commit
zonecfg:host1> exit
gz# zoneadm -z host1 clone vnmbase
gz# zoneadm -z host1 boot
gz# zlogin -C host1
Connect to the console and go through the sysid config. For this example,
we assign 10.0.0.1/24 as IP address for vnic1. You can specify this
during sysidcfg. For default route, specify 10.0.0.3 as the default
route. You can say 'none' for naming service, IPv6, kerberos etc for the
purpose of this example.
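Since host1, host2 and the vRouter differ only in zone name and VNICs, the zonecfg subcommands can be generated rather than typed. The sketch below builds that text for one zone; it assumes you would save it to a file and feed it to zonecfg with its command-file option (-f), and the function and parameter names are hypothetical.

```python
def zone_commands(name: str, vnics: list, base: str = "/vnm",
                  inherit: tuple = ("/opt", "/etc/crypto")) -> str:
    """Build the zonecfg subcommands used in this walkthrough for one zone."""
    lines = [
        "create",
        f"set zonepath={base}/{name}",
        "set ip-type=exclusive",
    ]
    # One inherit-pkg-dir resource per directory.
    for directory in inherit:
        lines += ["add inherit-pkg-dir", f"set dir={directory}", "end"]
    # One net resource per VNIC handed to the zone.
    for vnic in vnics:
        lines += ["add net", f"set physical={vnic}", "end"]
    lines += ["verify", "commit"]
    return "\n".join(lines) + "\n"


print(zone_commands("host1", ["vnic1"]))
```

For example, zone_commands("vRouter", ["vnic3", "vnic9"]) yields two net resources, matching the router zone that has one leg on each etherstub.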
Similarly create host2 and configure it with vnic2 i.e.
gz# zonecfg -z host2
host2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host2> create
zonecfg:host2> set zonepath=/vnm/host2
zonecfg:host2> set ip-type=exclusive
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/opt
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add net
zonecfg:host2:net> set physical=vnic2
zonecfg:host2:net> end
zonecfg:host2> verify
zonecfg:host2> commit
zonecfg:host2> exit
gz# zoneadm -z host2 clone vnmbase
gz# zoneadm -z host2 boot
gz# zlogin -C host2
Connect to the console and go through the sysid config. For this example,
we assign 10.0.0.2/24 as IP address for vnic2. You can specify this
during sysidcfg. For default route, specify 10.0.0.3 as the default
route. You can say 'none' for naming service, IPv6, kerberos etc for the
purpose of this example.
Let's now create the Virtual Router as follows
gz# zonecfg -z vRouter
vRouter: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vRouter> create
zonecfg:vRouter> set zonepath=/vnm/vRouter
zonecfg:vRouter> set ip-type=exclusive
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/opt
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic3
zonecfg:vRouter:net> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic9
zonecfg:vRouter:net> end
zonecfg:vRouter> verify
zonecfg:vRouter> commit
zonecfg:vRouter> exit
gz# zoneadm -z vRouter clone vnmbase
gz# zoneadm -z vRouter boot
gz# zlogin -C vRouter
Connect to the console and go through the sysid config. For this example, we
assign 10.0.0.3/24 as the IP address for vnic3 and 20.0.0.1/24 as the IP address
for vnic9. You can specify this during sysidcfg. For the default route, specify
'none'. You can say 'none' for naming service, IPv6,
kerberos etc for the purpose of this example. Let's enable forwarding on
the Virtual Router to connect the 10.x.x.x and 20.x.x.x networks.
vRouter# svcadm enable network/ipv4-forwarding:default
Note 5: The above is done inside the virtual router. Make sure you are in the
window where you did the zlogin -C vRouter above.
Now let's bring up VNIC6 and configure it, including setting up routes in the
global zone. You can just as easily create another host called host3 as the client
on the 20.x.x.x network by creating a host3 zone and assigning it an unused
address on 20.0.0.0/24 (20.0.0.1 is already taken by the vRouter).
Let's configure VNIC6. Open an xterm in the global zone
gz# ifconfig vnic6 plumb 20.0.0.3/24 up
gz# route add 10.0.0.0 20.0.0.1
gz# ping 10.0.0.1
10.0.0.1 is alive
gz# ping 10.0.0.2
10.0.0.2 is alive
Similarly, login into host1 and/or host2 and verify connectivity
host1# ping 20.0.0.3
20.0.0.3 is alive
host1# ping 10.0.0.2
10.0.0.2 is alive
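The reason the pings work is the route added in the global zone: anything for the 10.x network is handed to the vRouter at 20.0.0.1, while 20.x addresses are reached directly over vnic6. Here is a minimal sketch of that lookup as a longest-prefix match. It assumes the /24 prefixes from the figure, and the table layout and function name are illustrative rather than how Solaris stores routes.

```python
import ipaddress

# Global-zone routes from this example: (destination network, gateway).
# A gateway of None means the network is directly attached via vnic6.
routes = [
    (ipaddress.ip_network("20.0.0.0/24"), None),
    (ipaddress.ip_network("10.0.0.0/24"), ipaddress.ip_address("20.0.0.1")),
]


def next_hop(dst: str):
    """Longest-prefix match: return the gateway, 'direct', or None if unroutable."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        return None
    _, gateway = max(matches, key=lambda match: match[0].prefixlen)
    return "direct" if gateway is None else str(gateway)


for target in ("10.0.0.1", "10.0.0.2", "20.0.0.1", "30.0.0.1"):
    print(target, "->", next_hop(target))
```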
Set up Link Speed
What we configured above are links with unlimited bandwidth. We can configure a link
speed on any of the links. For this example, let's configure a link speed of
100 Mbps on VNIC1
gz# dladm set-linkprop -p maxbw=100 vnic1
We could have configured the link speed (or B/W limit) while we were creating
the vnic itself by adding the -p maxbw=100 option to the create-vnic command.
Test the performance
Start 'netserver' (or a tool of your choice) in host1 and host2. You will have
to install the tools in the relevant places
host1# /opt/tools/netserver &
host2# /opt/tools/netserver &
gz# /opt/tools/netperf -H 10.0.0.2
TCP STREAM TEST to 10.0.0.2 : histogram
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
49152  49152  49152   10.00   2089.87
gz# /opt/tools/netperf -H 10.0.0.1
TCP STREAM TEST to 10.0.0.1 : histogram
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
49152  49152  49152   10.00   98.78
Note6: Since 10.0.0.2 is assigned to VNIC2, which has no limit, we get the maximum
speed possible. 10.0.0.1 is configured over VNIC1, which is assigned to host1;
we just set its link speed to 100 Mbps, and that's why we get only
98.78 Mbps.
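To put the two netperf numbers in perspective, this short sketch works out how much of the configured cap was actually used and how long a 100 MB transfer would take at each measured rate. The 100 MB payload is an arbitrary figure picked for illustration; the throughput values are the ones printed above.

```python
LIMIT_MBPS = 100.0            # maxbw set on vnic1
MEASURED_LIMITED = 98.78      # netperf result to 10.0.0.1 (10^6 bits/sec)
MEASURED_UNLIMITED = 2089.87  # netperf result to 10.0.0.2 (10^6 bits/sec)


def utilization(measured: float, limit: float) -> float:
    """Fraction of the configured limit that was actually achieved."""
    return measured / limit


def transfer_seconds(megabytes: float, mbps: float) -> float:
    """Seconds needed to move the payload at the given rate (1 MB = 8 Mbit)."""
    return megabytes * 8 / mbps


print(f"{utilization(MEASURED_LIMITED, LIMIT_MBPS):.1%} of the cap is used")
print(f"100 MB capped:   {transfer_seconds(100, MEASURED_LIMITED):.2f} s")
print(f"100 MB uncapped: {transfer_seconds(100, MEASURED_UNLIMITED):.2f} s")
```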
Cleanup
gz# zoneadm -z host1 halt
gz# zoneadm -z host1 uninstall
Delete the zone
gz# zonecfg -z host1
zonecfg:host1> delete
Are you sure you want to delete zone host1 (y/[n])? y
zonecfg:host1> exit
In the same way, delete the host2 and vRouter zones. Make sure you don't delete
vnmbase, since recreating it takes time.
gz# ifconfig vnic6 unplumb
After you have deleted the zones, you can delete the VNICs first and then
the etherstubs as follows
gz# dladm delete-vnic vnic1
gz# dladm delete-vnic vnic2
gz# dladm delete-vnic vnic3
gz# dladm delete-vnic vnic6
gz# dladm delete-vnic vnic9
gz# dladm delete-etherstub etherstub3
gz# dladm delete-etherstub etherstub1
Make sure that VNICs are unplumbed (ifconfig vnic6 unplumb) and not assigned
to a zone (delete the zone first) before you delete them. You need to
delete all the vnics on an etherstub before you can delete the etherstub.
User Exercises
Now that you are familiar with the concepts and technology, you are ready to
do some experiments of your own. Clean up the machine as mentioned above. The
exercises below will help you master IP routing, configuring networks, and
debugging performance bottlenecks.
- Recreate the Virtual Network as shown in Fig 1b, but this time create
an additional zone called client and assign vnic6 to that client zone.
 client zone       vRouter        host1     host2
      |            |     |          |         |
 ---- etherstub3 ----   ------- etherstub1 -------
Run all your connectivity tests after doing a zlogin into the client. Now
change all IPv4 addresses to IPv6 addresses and verify that the client
and hosts still have connectivity.
- Leave the Virtual Network as in 1, but configure OSPF in vRouter instead
of the default RIP. Verify that you still have connectivity. Note
the steps needed to configure OSPF.
- Configure 20.0.0.0 and 10.0.0.0 networks as two separate autonomous
networks, assign them unique ASN numbers and configure unique BGP domains.
Verify that connectivity still works. Note the steps needed to
configure BGP domains.
- Cleanup everything and recreate the virtual network in 1 above but
instead of statically assigning the IP addresses to hosts and clients,
configure NAT on the vRouter to give out address on subnet 10.0.0.0/24
on vnic3 and address on 20.0.0.0/24 for vnic9. While creating the
hosts and clients, configure them to get their IP address through DHCP.
- Clean up everything and recreate the virtual network in 1 above. Add
an additional router, vRouter2, which has a vnic on each of the 2 etherstubs.
                 vRouter1
                /        \
     20.0.0.0/24          10.0.0.0/24
                \        /
                 vRouter2
This provides a redundant path from client to the hosts. Experiment
with running different routing protocols and assign different weight
to each path and see what path you take from client to host (use
traceroute to detect). Now configure the routing protocol on two
vRouters to be OSPF and play with link speeds and see how the path
changes. Note the configuration and observations.
- Clean up. Let's now introduce another Virtual Router between the two
subnets, i.e.
 client zone      vRouter1        vRouter2        host1   host2
      |           |      |        |      |          |       |
 ---- etherstub3 ---    - etherstub2 -   ------ etherstub1 ------
     20.0.0.0/24          30.0.0.0/24           10.0.0.0/24
Now set the link (VNIC) between vRouter1 and etherstub2 to 75 Mbps.
Use snmp from the client to retrieve the stats from vRouter1 and check
where the packets are getting dropped when you run netperf from
client to host2.
Remove the limit set earlier and instead set a link speed of 75 Mbps
on the link between etherstub2 and vRouter2. Again use snmp to get the
stats, this time from vRouter2. Do you see similar results to those on vRouter1? If
not, can you explain why?
Conclusion and More resources
Use the real example and configure the virtual network to get familiar with
the techniques used. At this point, have a look at your network and try to
create a virtual network.
Get more details on the OpenSolaris Crossbow page
http://www.opensolaris.org/os/project/crossbow
You can find high level presentations, architectural documents, man pages etc
at
http://www.opensolaris.org/os/project/crossbow/Docs
Join the crossbow-discuss@opensolaris.org mailing list at
http://www.opensolaris.org/os/project/crossbow/discussions
Send in your questions or your configuration samples and we will put it in
the use cases examples.
A similar Virtual Network example using global zone as a NAT can be found on
Nicolas's blog at
http://blogs.sun.com/droux
Kais has an example of dynamic bandwidth partitioning at
http://blogs.sun.com/kais
Venu talks about some of the cool Crossbow features, such as
virtualizing services with Crossbow technology using flowadm, at
http://blogs.sun.com/iyer
(2008-02-29 02:59:01.0)

Tuesday December 06, 2005
Niagara - Designed for Network Throughput
We have finally announced Niagara based servers to the public! Billed as
low cost, energy efficient, high network throughput processors:
marketing mumbo jumbo, you think? Well, try it and you will see. I was
privileged enough that one of the earliest prototypes landed on my desk
(or in my lab, to be precise) so Solaris networking could be tailored to
take advantage of the chip. And boy, together with Solaris, this thing
rocks!!
So you know that Niagara is a multi core, multi threaded chip, and Solaris
takes advantage of it in multiple ways. Let me highlight some of them.
Network performance
The load from the NIC is fanned out to multiple soft rings in the GLDv3
layer based on the src IP address and port information. Each soft ring
in turn is tied to a Niagara thread and a Vertical
Perimeter such that packets from a connection have locality
to a specific H/W thread on a core and the NIC has locality to a specific
core. Think of this model as 4 H/W threads per core processing the NIC
such that if one thread stalls for a resource, the CPU cycles are not
wasted. The result is amazing network performance for this beast:
5-6 times the performance of your typical x86 based CPU.
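The fanout itself can be pictured as hashing the source address and port to pick a soft ring, so that every packet of a connection lands on the same ring (and hence the same H/W thread). The sketch below uses CRC32 purely as a stand-in hash; the real hash used by the GLDv3 layer is not described here, and the function name is made up.

```python
import zlib


def pick_soft_ring(src_ip: str, src_port: int, num_rings: int) -> int:
    """Map a flow to a soft ring so one connection always hits the same ring."""
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % num_rings


# Four soft rings, e.g. one per H/W thread of a core.
flows = [("10.0.0.1", 40000), ("10.0.0.1", 40001), ("10.0.0.2", 40000)]
for ip, port in flows:
    print(f"{ip}:{port} -> soft ring {pick_soft_ring(ip, port, 4)}")
```

Because the mapping depends only on the flow identity, packets of one connection are never reordered across threads, while different connections spread over the available rings.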
Virtualization
Imagine you are an ISP or someone wanting to consolidate multiple
machines on one physical machine. Well, Niagara based platforms lend
themselves beautifully to this concept because there are so many H/W
threads around, which appear as individual CPUs to Solaris. We have a
project underway called Crossbow
(details available on the Network
Community page on OpenSolaris) which will allow you to carve the
machine into multiple virtual machines (by creating virtual network stacks),
tie specific CPUs to them, and control the B/W utilization for each
virtual machine on a shared NIC.
Real Time Networking/Offload
With GLDv3
based drivers and the FireEngine
architecture in Solaris 10, the stack controls the rate of interrupts
and can dynamically switch the NIC between interrupt and polling mode.
Coupled with the Niagara platform, Solaris can run the entire networking
stack on one core and provide real time capabilities to the
application. Meanwhile, the applications themselves run on different
cores without worrying about networking interrupts pinning them down.
You can get pretty bounded latencies provided the application can do some
admission control. We are also planning to hide the core running
networking from the application, effectively getting TOE for free
without suffering from the drawbacks of offloading networking to a
separate piece of hardware.
(2005-12-06 17:31:01.0)

Tuesday June 14, 2005
The world of Solaris Networking
D-Day has finally arrived. OpenSolaris is here. For me
personally, it's a very nice feeling since I can now talk about the
architecture and implementation openly with people and point them to
the code. Before coming to Sun, I had always been in research labs
where collaboration is the way of life. God, how much I missed that
part at Sun, and thankfully I am hoping to get it back.
One of the big changes in Solaris 10 was
project FireEngine, which allowed Solaris to perform and
scale. The important thing that I couldn't tell people before was
where the wins came from. The bulk of them came from a lockless design
called the Vertical Perimeter, implemented by means of a serialization
queue (squeue). This allows packets, once picked up for processing, to be taken
all the way up to the socket layer or all the way down to the device
driver. With the aid of the IP classifier, we bind connections to
squeues (which in turn are bound to CPUs), and this gives us
better locality and scaling. The squeues also allow us to track the
entire backlog per CPU. The GLDv3 based drivers allow IP to control
the interrupts, and based on the squeue backlog, the interrupts are
controlled dynamically to achieve even higher performance and avoid
the havoc caused by interrupts. Some day I will tell you stories about
how we dealt with 1Gb NICs when they arrived and CPUs were still
pretty slow.
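For readers who want a mental model before reading the code, here is a deliberately tiny, single-threaded sketch of the idea: a classifier binds each connection to one serialization queue, and whoever enters an idle queue drains it, so packets of a connection are processed in order by one owner. The class names and the round-robin binding policy are mine, not the names or policy used in the Solaris source.

```python
from collections import deque


class Squeue:
    """Toy serialization queue: one owner drains packets in arrival order."""

    def __init__(self, cpu: int):
        self.cpu = cpu
        self.backlog = deque()
        self.busy = False
        self.processed = []

    def enter(self, conn: str, packet: str) -> None:
        self.backlog.append((conn, packet))
        if self.busy:
            return  # the current owner will pick this packet up
        self.busy = True
        while self.backlog:  # drain: take each packet to completion
            self.processed.append(self.backlog.popleft())
        self.busy = False


class Classifier:
    """Binds each connection to one squeue (hypothetical round-robin policy)."""

    def __init__(self, squeues: list):
        self.squeues = squeues
        self.bindings = {}

    def lookup(self, conn: str) -> Squeue:
        if conn not in self.bindings:
            index = len(self.bindings) % len(self.squeues)
            self.bindings[conn] = self.squeues[index]
        return self.bindings[conn]


squeues = [Squeue(cpu=0), Squeue(cpu=1)]
classifier = Classifier(squeues)
for conn, packet in [("a", "p1"), ("b", "p1"), ("a", "p2")]:
    classifier.lookup(conn).enter(conn, packet)
print(squeues[0].processed, squeues[1].processed)
```

The length of each backlog is also the per-CPU load signal mentioned above, which is what lets the stack decide when to throttle interrupts.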
Coming back to collaboration, you will notice that the Solaris networking
architecture looks very different compared to the SVR4 STREAMS based
architecture or the BSD based architecture. It opens new doors for us and
it allows us to do stack virtualization and resource control (project
Crossbow) and tons of new things. We have set up a networking
community page which has a brief discussion of some of the new
projects we are doing, and we would love to hear what you think about
it. The discussion forum on the same page
would be an easy way to talk. We are open to suggestions on how
you would like to see this go forward.
Enjoy, just like I enjoyed Solaris for so many years!
(2005-06-14 09:07:13.0)

Thursday May 26, 2005
High Performance device driver framework (aka project Nemo)
A lot has happened since my last blog. I will talk about it one of these days. But the coolest thing we
finished is called project Nemo. It's a high performance device driver framework which makes writing a device
driver for Solaris a breeze. It's technically the GLDv3 framework but we like to call it Nemo instead :)
So what can Nemo do for you? Well, it switches dynamically between interrupt and polling mode (all controlled
by IP) to boost performance. Any device driver which supports turning the interrupt on/off can take
advantage and boost performance by 20-25% by cutting the number of interrupts in a more useful manner
and improving the latency at the same time. Way superior to interrupt coalescing etc.
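A simple way to picture the switching is a ring whose mode follows the backlog the stack sees: stay in interrupt mode when things are quiet, move to polling under load, and go back once the backlog drains. The sketch below is only that picture; the thresholds and names are invented and are not the values or interfaces used by GLDv3.

```python
class RxRing:
    """Toy receive ring whose mode follows the stack's backlog."""

    def __init__(self, high: int = 8, low: int = 2):
        # Made-up thresholds; two levels avoid flapping between modes.
        self.high, self.low = high, low
        self.mode = "interrupt"

    def update(self, backlog: int) -> str:
        if self.mode == "interrupt" and backlog >= self.high:
            # Under load: mask interrupts and let the stack pull packets.
            self.mode = "polling"
        elif self.mode == "polling" and backlog <= self.low:
            # Backlog drained: re-enable interrupts for low latency.
            self.mode = "interrupt"
        return self.mode


ring = RxRing()
for backlog in (1, 10, 5, 0):
    print(backlog, "->", ring.update(backlog))
```

Unlike fixed interrupt coalescing, the decision here is driven by how far behind the stack actually is, so a lightly loaded link keeps its low latency.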
Ben also finds it pretty useful here. Hey Ben,
as you mentioned, a lot of people are finding Ethernet pretty useful in storage as well. I'll have
some follow-on news on our iSCSI front soon. The initiator is already done and will be part of an S10 update
while we are seeing some pretty impressive numbers on a Solaris 10 iSCSI target with 10Gb.
Coming back to Nemo, it also does trunking for both 10Gb and 1Gb NICs in a pretty simple way. We demo'd
a trunk of 2 10Gb NICs on a 2 CPU machine during the Sun Labs open house in April and we ran over 12Gbps
over the trunk! There are some other cool things Nemo will allow us to do and one of these days I will tell you
the details.
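How Nemo spreads traffic over a trunk is not described in this post. A common approach, assumed here purely for illustration, is to hash each flow's addresses and ports to pick one member NIC, so that all packets of a connection stay in order on the same link. The `pick_link` helper and the flow tuple layout below are hypothetical.

```python
import zlib


def pick_link(flow, n_links):
    """Map a flow (src_ip, src_port, dst_ip, dst_port) to a trunk member index.

    Hypothetical helper: a plain hash-and-modulo policy, not Nemo's actual code.
    """
    key = "|".join(str(field) for field in flow).encode()
    # crc32 is stable across runs, unlike Python's built-in hash() on strings
    return zlib.crc32(key) % n_links


# Eight connections to the same target port, spread over a 2-NIC trunk
flows = [("10.0.0.1", 40000 + i, "10.0.0.2", 3260) for i in range(8)]
for flow in flows:
    print(flow, "-> link", pick_link(flow, 2))
```

A single connection never exceeds one link's bandwidth with this policy; the aggregate only goes up when there are many flows to distribute.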
(2005-05-26 20:11:56.0)
Permalink

Wednesday December 08, 2004
Solaris networking external page and discussion forum ready!
I would like to welcome you all to our External Solaris Networking
page on BigAdmin (yes I know it took this long but doing the code is
a lot easier :). Check it out.
It currently has FireEngine (the enhanced high performing TCP/IP stack in Solaris 10) related information (including the public white paper). We plan to
move this forward to include every networking related project.
It also has a discussion forum. I would like to encourage people
to ask any networking related questions there and the experts to answer them there.
This way, we build a kind of external FAQ for Solaris networking
that would be very useful.
(2004-12-08 16:33:47.0)
Permalink

Sunday October 17, 2004
Solaris vs Red Hat
Sorry guys, the heading is not mine. It's coming from the discussion at www.osnews.com
where Solaris 10 networking is being discussed. It is a pretty
interesting discussion if you filter out a few of the usual postings where
people don't really know the facts.
I was surprised to see a large number of people who know Solaris voicing
their positive opinions. Normally, people from Solaris world are not very
vocal on discussion groups and public forums. So that is a surprising
(and good) change. Guys keep it up!
Someone asked why we are not targeting Windows. Come on guys,
you have got to be serious. I am an engineer and do you think I design
networking architecture targeted to beat Windows :) As pointed out in
the comments, they are not even on my radar. Maybe in the next twenty years,
their technology will match our current stuff but then we would have
hopefully moved on :^) And yes, as I am told, we do beat Windows 2003 by
20-30% on a 2 CPU x86 box (Opteron 2x2.2GHz with 2 GB RAM) on webbench
(static, dynamic and ecommerce). There are probably more benchmarks but
frankly we haven't had time to compare or publish. Our sole aim right now
is to improve the real customer workloads and we are depending on
customers to tell us these numbers.
As for AIX and HP-UX (and I am going to get in trouble now with my bosses
for saying this), they just don't exist in any significant manner. I have
talked to a large number of customers in the past two years since part of
our approach is to understand what the customer is having trouble with
and what he will need going forward, and let me be really honest, I don't
see HP-UX at all and very little AIX. Yes I do see IBM and HP machines,
but they are all running Linux (please no flames, this is just my
experience).
Again, when we are designing/writing new code, we do like to set some
targets. When it comes to scaling across a large number of CPUs, we have
always done very well because that's where we focused. We never really
looked at 1-2 CPU performance before since it was always easy to add more
processors on SPARC platforms. Linux on the other hand has really simple
code that allowed it to perform very well on 1 CPU. So our challenge
was to come up with an architecture that could beat Linux on the low end and
still allow us to scale linearly on the high end and sure enough, we
created FireEngine. It's the same code that runs on SPARC platforms
scaling linearly and runs pretty fast on 2 CPU x86 platforms. And as you
add more CPUs on x86 (going to 4 and 8 and then dual core), we just start
becoming a very compelling architecture.
As for some people commenting about the validity of the numbers comparing
Solaris 10 and Apache with RHEL AS3 and Apache on www.sun.com, they are on
the same H/W. It's a 2x2.2 GHz Opteron box (V20z) with 6 GB RAM and 2
Broadcom Gig NICs. The numbers were done on webbench and the other major
web performance benchmark that we can't talk about since the numbers are
not published yet. These numbers are for out of box 32-bit Solaris 10 with no
tuning at all (the entire FireEngine focus was on out of box performance for
real customer workloads). And frankly, we are not really interested
in benchmarks because all the Linux web performance numbers (for instance
SPECweb99) are published using TUX or Red Hat Content Accelerator. I
haven't come across a single customer who is running TUX so far. So why
doesn't someone publish a Linux Apache number without any benchmark
specials and we will be sure to put resources to meet/beat those
numbers. That I think would be a fairer comparison. And that's why I am
far more impressed by customer quotes like the one from Bill
Morgan, CIO at Philadelphia Stock Exchange Inc., where he said that
Solaris 10 improved his trading capacity by 36%. Now we are not
talking about a micro benchmark here but a system level capacity. This
was on a 12 way E4800 (SPARC platform). Basically, they loaded Solaris 10
on the same H/W and were able to do 36% more stock transactions per
second.
And once again, I am not really anti Linux or anything. I just need
something to compete against in a good natured way (HP-UX, AIX, IRIX are
not around anymore, and I still can't bring myself down to compete with
Windows). Before FireEngine, it was Linux guys who used to pull my leg
asking when I would make Solaris perform as well as Linux on 1 CPU. Well,
Solaris does perform now and some of the guys who used to pull my leg
took me out for beer when they loaded Solaris Express on their
system. And knowing them, I might be buying the next round somewhere down
the line.
Oh, before I end, I wanted to just touch on why we are not comparing
against RHEL AS4beta. Well, it's not us who are doing the comparing but our
customers. And that is because although Solaris 10 is due to ship now,
things like FireEngine have been available and stable for almost a
year. If I were to do the comparison, I would pick the latest in Red Hat but
I would compare it against Solaris 10 Update (due out 3-6 months after
Solaris 10). And you know what, we haven't exactly been sitting around
for the past year. Solaris 10 update will improve performance over S10
FCS by another 20-25% on networking workloads.
(2004-10-17 00:32:26.0)
Permalink

Saturday October 16, 2004
More Solaris on x86 Performance data
www.sun.com is featuring performance on Solaris x86 platforms. BTW, a couple of you asked how the new TCP/IP stack can scale almost linearly. I am trying to understand how much I am allowed to say on blogs like this but hopefully sometime next week I will give a more technical rundown for the geeks out there.
(2004-10-16 13:34:12.0)
Permalink

Tuesday October 12, 2004
Solaris 10 on x86 really performs
Someone pointed me to this article from George Colony, CEO, Forrester Research and the real story from Tom Adelstein. Both are pretty interesting articles but one of the feedback comments to Tom, "Untrue... Learn the Facts first", kind of got me motivated to write this blog. "Solaris 10 on x86" can really match Linux in performance and, better yet, scale linearly over a large number of CPUs (remember that 8 CPU x86 blades are here already and then we will start seeing 8 CPU, dual core blades). The new network architecture (FireEngine) in S10 allows the same code to give a huge performance win on 1 and 2 CPU configurations and give linear scaling when more CPUs are added.
Take for instance web performance. We have improved 2 CPU performance by close to 50% (compared to Solaris 9) using a real web server like Apache, Sun One Web Server, Zeus, etc. without any gimmicks like kernel caching. It's just a plain web server with TCP/IP and a dumb NIC. Some of our Solaris Express customers are telling us that we are outperforming RHEL AS3 by almost 15-35% on the same hardware.
Interested in more numbers? On static and dynamic webbench, Solaris 10 is at par with RHEL AS3 on a 2 CPU v20z while it's ahead by 15% on the webbench Ecommerce benchmark. On the same box, we can saturate a 1Gb NIC using only 8-9% of a 2.2GHz Opteron processor but the real killer deal is that our 10Gb drivers are coming up and Alex Aizman from S2io just informed me that we are pushing close to 7.3Gbps traffic on a v20z (with 2 x 1.6GHz Opterons) with more than 20% CPU to spare. We haven't even ported the driver to the high performance Nemo framework or enabled any hardware features as yet. So I am expecting a huge upside in the next 2-3 months as the driver gets ported to Nemo (Paul and Yuzo should tell you more about Nemo sometime soon).
The improvements are not restricted to TCP only. We are doing a FireEngine followup for UDP which improves the Tibco benchmark by 130% and the Volano Mark benchmark by 30%. The customer tells us that we are outperforming RHEL AS3 by almost 15% on the same hardware. Adi et al. can add some more details about UDP performance.
And the big killer feature on Opterons: you can run 64-bit Oracle or a web server on 64-bit Solaris to take advantage of the bigger VM space but leave the bulk of your apps as 32-bit, which run unchanged.
I am not claiming the best performing OS title (at least not yet!) for Solaris 10, but guys, we are still ramping up! Every new project going into Solaris is now delivering double-digit performance improvements (the FireEngine architecture has opened the door) and soon I will claim that title :) I must add that all these gains come on the same hardware without the application needing to change at all. Just get the latest Solaris Express and see it for yourself.
And BTW, most of us at SUN are really pretty friendly towards Linux. Sure we compete in a good natured way. And Tom did hit the nail on the head regarding why people at SUN don't like Red Hat - it really has to do with them having transformed free Linux into a not so free Linux.
(2004-10-12 01:19:43.0)
Permalink

Thursday October 07, 2004
Thanks for the interest. More performance has been ordered!
Wow! Judging from the number of emails I received and the interest in network performance in general, it looks like real people also read these blogs (other than robots and crawlers). I really appreciate the interest. Keep those emails coming as well and I will be more diligent in updating these blogs frequently with interesting stuff and updates. As requested by most of you, I have placed a quarterly recurring order for more performance from getmemoreperf.com. The order tracking number is "sunay at sun dot com" :)
(2004-10-07 11:27:36.0)
Permalink

Monday October 04, 2004
When will you have enough performance?
For someone who never had a web-page, this blog business is really
frightening so bear with me if I seem like a novice. I wonder if
someone actually reads these pages or it's just the robots, crawlers
and zombies generating the hits ;^) Anyway since Carol (our PM) thinks
this is a useful medium to tell people outside Sun what I am thinking
instead of them finding out when the product actually ships, here
goes.
My name is Sunay Tripathi and I am a Sr. Staff Engineer in Solaris
Networking and security technologies. Yes, we are the people who make
the 'Net' work in 'Network is the computer'. I am also known as the
architect of FireEngine, the new TCP/IP stack in Solaris 10, for people
who have tried Solaris 10 already and are pretty happy with the
performance (which is most).
So what am I working on these days? Well, I hear 10Gb is
happening. And I also hear that 10Gb is not enough. People are wanting
20-30Gbps bandwidth coming into 4 CPU Opteron blades and still have
meaningful processing power left!! Well, you do that and watch the
interrupts go up like crazy and the system behave in more twisted ways
than you can imagine and trust me, it's not nice. But FireEngine comes
to the rescue. We can tame the interrupts and do exactly what people
want. I'll tell you the details some other day unless John Fowler can
beat me to it by blogging soon.
Fairness and security are what keep me awake these days. A
large section of customers tell me that they see 'http' literally
disappearing in the next 3-5 years and everything will be 'https' (SSL)
and they don't want to sacrifice CPU just doing crypto and they don't
want crypto to overwhelm the rest of the traffic. Well, OK, they said QoS
but what they actually meant was fairness without any guarantees. I am
hard pressed to see why CNN will go 'https' but they do have a point -
Yahoo mail should really be SSL protected by default!! So I am building
fairness in as part of the architecture instead of as another add-on
layer.
So let me tell you what else I do other than designing and writing
code. I like to hang out with my old Stanford and IIT buddies who keep
telling me how we can combine forces to build the next big thing
for the internet (some day). I also love watching my 11-month-old learn
to walk. He is already hooked on my workstation and has his own
desktop now. Not surprising given that he sees his Mom and Dad spend
80% of their waking hours on these things. But what he really wants is
my Acer Ferrari laptop running 64-bit Solaris and I tell him dream on
buddy :) My other passion is fast cars (after fast code) and
Taekwondo. I am a black belt and used to practice with Stanford
Taekwondo. Had a string of injuries last year which has kept me away
but I have started training again and will be back soon.
Well, that's who I am. But let me tell you the real reason why I am
doing this (apart from the fact that even Sin-yaw and the rest of the perf
team have a blog) - I actually want to hear back from you guys. Tell
me what latest and greatest thing you are working on or dreaming of
and how Solaris can make it happen for you. Not sure how the feedback
thing works on this blog but you can always drop me a direct
email. The address is pretty simple - sunay at sun dot com. I would
also love to hear your opinions if you have already tried Solaris 10.
And as for when will you have enough performance? The answer is never!
(2004-10-04 00:30:20.0)
Permalink