Friday Mar 21, 2008
Here's another example of Containers that can manage their own affairs.
Sometimes you want to closely manage the devices that a Solaris Container uses. This is easy to do from the global zone: by default a Container does not have direct access to devices. It does have indirect access to some devices, e.g. via a file system that is available to the Container.
By default, zones use NICs that they share with the global zone, and perhaps with other zones. In the past these were just called "zones." Starting with Solaris 10 8/07, these are now referred to as "shared-IP zones." The global zone administrator manages all networking aspects of shared-IP zones.
Sometimes it would be easier to give direct control of a Container's devices to its owner. An excellent example of this is the option of allowing a Container to manage its own network interfaces. This enables it to configure IP Multipathing for itself, as well as IP Filter and other network features. Using IPMP increases the availability of the Container by creating redundant network paths to the Container. When configured correctly, this can prevent the failure of a network switch, network cable or NIC from blocking network access to the Container.
As described at docs.sun.com, to use IP Multipathing you must choose two network devices of the same type, e.g. two Ethernet NICs. Those NICs are placed into an IPMP group with the ifconfig(1M) command. Usually this is done by placing the appropriate ifconfig parameters into files named /etc/hostname.<NIC-instance>, e.g. /etc/hostname.bge0.
An IPMP group is associated with an IP address. Packets leaving any NIC in the group have a source address of the IPMP group. Packets with a destination address of the IPMP group can enter through either NIC, depending on the state of the NICs in the group.
Delegating network configuration to a Container requires use of the new IP Instances feature. It's easy to create a zone that uses this feature, making this an "exclusive-IP zone." One new line in zonecfg(1M) will do it:
zonecfg:twilight> set ip-type=exclusive
Of course, you'll need at least two network devices in the IPMP group. Using IP Instances will dedicate these two NICs to this Container exclusively. Also, the Container will need direct access to the two network devices. Configuring all of that looks like this:
global# zonecfg -z twilight
zonecfg:twilight> create
zonecfg:twilight> set zonepath=/zones/roots/twilight
zonecfg:twilight> set ip-type=exclusive
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge1
zonecfg:twilight:net> end
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge2
zonecfg:twilight:net> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge1
zonecfg:twilight:device> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge2
zonecfg:twilight:device> end
zonecfg:twilight> exit
As usual, the Container must be installed and booted with zoneadm(1M):
global# zoneadm -z twilight install
global# zoneadm -z twilight boot
Now you can log in to the Container's console and answer the usual configuration questions:
global# zlogin -C twilight
<answer questions>
<the zone automatically reboots>
After the Container reboots, you can configure IPMP. There are two methods. One uses link-based failure detection and one uses probe-based failure detection.
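Before moving on to IPMP itself: the interactive zonecfg session shown earlier can also be kept in a command file and passed to zonecfg with its -f option. The following is only a sketch, assuming the same zone name, zonepath and NICs as in the example; the helper name write_zone_cmds and the temporary file location are hypothetical, and the zonecfg call is left as a comment because it must be run as root in the global zone.

```shell
#!/bin/sh
# Sketch: emit a zonecfg command file for an exclusive-IP zone.
# write_zone_cmds is a hypothetical helper, not part of Solaris.
# Usage: write_zone_cmds <zonepath> <nic> [<nic> ...]
write_zone_cmds() {
    zonepath=$1
    shift
    echo "create"
    echo "set zonepath=$zonepath"
    echo "set ip-type=exclusive"
    for nic in "$@"; do
        # Dedicate the NIC to the zone.
        echo "add net"
        echo "set physical=$nic"
        echo "end"
        # Mirror the device entries used in the interactive session.
        echo "add device"
        echo "set match=/dev/net/$nic"
        echo "end"
    done
}

cmdfile=${TMPDIR:-/tmp}/twilight.zonecfg.$$
write_zone_cmds /zones/roots/twilight bge1 bge2 > "$cmdfile"
cat "$cmdfile"
# In the global zone, as root, the file could then be applied with:
#   zonecfg -z twilight -f "$cmdfile"
```

Keeping the configuration in a file makes it easy to review, and to reuse for another Container by changing the zonepath and NIC names.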
Link-based detection requires the use of a NIC which supports this feature. Some NICs that support this are hme, eri, ce, ge, bge, qfe and vnet (part of Sun's Logical Domains). They are able to detect failure of the link immediately and report that failure to Solaris. Solaris can then take appropriate steps to ensure that network traffic continues to flow on the remaining NIC(s).
Other NICs do not support this link-based failure detection, and must use probe-based detection. This method uses ICMP packets ("pings") from the NICs in the IPMP group to detect failure of a NIC. This requires one IP address per NIC, in addition to the IP address of the group.
Regardless of the method used, configuration can be accomplished manually or via files /etc/hostname.<NIC-instance>. First I'll describe the manual method.
# ifconfig bge1 plumb
# ifconfig bge1 twilight group ipmp0 up
# ifconfig bge2 plumb
# ifconfig bge2 group ipmp0 up
Note that those commands only achieve the desired network configuration until the next time that Solaris boots. To configure Solaris to do the same thing when it next boots, you must put the same configuration information into configuration files. Inserting those parameters into configuration files is also easy:
/etc/hostname.bge1: twilight group ipmp0 up
/etc/hostname.bge2: group ipmp0 up
Those two files will be used to configure networking the next time that Solaris boots. Of course, an IP address entry for twilight is required in /etc/inet/hosts.
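If you have more than a couple of NICs or Containers, the two link-based files can be generated rather than typed. This is a rough sketch only: write_link_ipmp is a hypothetical helper, and it writes into a staging directory (an assumption made here so that nothing under /etc is touched) from which the files would be copied into the Container's /etc.

```shell
#!/bin/sh
# Sketch: write link-based IPMP hostname files into a staging directory.
# write_link_ipmp is a hypothetical helper; the staging directory stands
# in for /etc so the files can be inspected before being copied into place.
# Usage: write_link_ipmp <dir> <group> <hostname> <primary-nic> [<nic> ...]
write_link_ipmp() {
    dir=$1 group=$2 host=$3 primary=$4
    shift 4
    # The first NIC carries the data address of the group.
    echo "$host group $group up" > "$dir/hostname.$primary"
    # The remaining NICs only join the group.
    for nic in "$@"; do
        echo "group $group up" > "$dir/hostname.$nic"
    done
}

stage=${TMPDIR:-/tmp}/ipmp-stage.$$
mkdir -p "$stage"
write_link_ipmp "$stage" ipmp0 twilight bge1 bge2
for f in "$stage"/hostname.*; do
    echo "$f: $(cat "$f")"
done
```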
If you have entered the ifconfig commands directly, you are finished. You can test your IPMP group with the if_mpadm command, which can be run in the global zone, to test an IPMP group in the global zone, or can be run in an exclusive-IP zone, to test one of its groups:
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...
# if_mpadm -d bge1
# ifconfig -a
...
bge1: flags=289000842 mtu 0 index 4
        inet 0.0.0.0 netmask 0
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
bge2:1: flags=201000843 mtu 1500 index 5
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
...
# if_mpadm -r bge1
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...
If you are using link-based detection, that's all there is to it!
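When testing failover this way, it can be handy to check which interface is carrying the data address at any moment instead of reading the whole listing. Here is a small sketch that filters ifconfig -a output with awk; addr_owner is a hypothetical helper name, and the sample input is the post-failover listing from this entry.

```shell
#!/bin/sh
# Sketch: report which interface currently carries a given IPv4 address,
# reading `ifconfig -a` output on standard input.
# addr_owner is a hypothetical helper name.
addr_owner() {
    awk -v want="$1" '
        # Interface header lines start in column one: "bge1: flags=..."
        /^[A-Za-z]/ { ifname = $1; sub(/:$/, "", ifname) }
        # Address lines look like: "inet 129.152.2.72 netmask ..."
        $1 == "inet" && $2 == want { print ifname }
    '
}

# Sample input: the listing shown after "if_mpadm -d bge1".
addr_owner 129.152.2.72 <<'EOF'
bge1: flags=289000842 mtu 0 index 4
        inet 0.0.0.0 netmask 0
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
bge2:1: flags=201000843 mtu 1500 index 5
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
EOF
```

For the sample above this prints bge2:1, the logical interface that took over the address. On a live system the same function would be fed from the command directly, e.g. ifconfig -a | addr_owner 129.152.2.72.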
As mentioned above, using probe-based detection requires more IP addresses:
/etc/hostname.bge1: twilight netmask + broadcast + group ipmp0 up addif twilight-test-bge1 \
    deprecated -failover netmask + broadcast + up
/etc/hostname.bge2: twilight-test-bge2 deprecated -failover netmask + broadcast + group ipmp0 up
Three entries for hostname and IP address pairs will, of course, be needed in /etc/inet/hosts.
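The probe-based files follow a regular pattern: the first NIC gets the data address plus a test address, and every other NIC gets only a test address. A sketch of generating them, assuming the <hostname>-test-<NIC> naming used above; write_probe_ipmp and the staging directory are hypothetical, and each file is written as a single line rather than with a backslash continuation.

```shell
#!/bin/sh
# Sketch: write probe-based IPMP hostname files into a staging directory.
# write_probe_ipmp is a hypothetical helper; test addresses are assumed to
# be named <hostname>-test-<nic>, as in the files shown in this entry.
# Usage: write_probe_ipmp <dir> <group> <hostname> <primary-nic> [<nic> ...]
write_probe_ipmp() {
    dir=$1 group=$2 host=$3 primary=$4
    shift 4
    opts="netmask + broadcast +"
    test_opts="deprecated -failover $opts"
    # Primary NIC: the data address, then a non-failover test address.
    printf '%s %s group %s up addif %s-test-%s %s up\n' \
        "$host" "$opts" "$group" "$host" "$primary" "$test_opts" \
        > "$dir/hostname.$primary"
    # Other NICs: a test address only.
    for nic in "$@"; do
        printf '%s-test-%s %s group %s up\n' \
            "$host" "$nic" "$test_opts" "$group" > "$dir/hostname.$nic"
    done
}

stage=${TMPDIR:-/tmp}/ipmp-probe-stage.$$
mkdir -p "$stage"
write_probe_ipmp "$stage" ipmp0 twilight bge1 bge2
for f in "$stage"/hostname.*; do
    echo "$f: $(cat "$f")"
done
```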
All that's left is a reboot of the Container. If a reboot is not practical at this time, you can accomplish the same effect by using ifconfig(1M) commands:
twilight# ifconfig bge1 plumb
twilight# ifconfig bge1 twilight netmask + broadcast + group ipmp0 up addif \
    twilight-test-bge1 deprecated -failover netmask + broadcast + up
twilight# ifconfig bge2 plumb
twilight# ifconfig bge2 twilight-test-bge2 deprecated -failover netmask + \
    broadcast + group ipmp0 up
Whether link-based failure detection or probe-based failure detection is used, we have a Container with these network properties:
Surely you could get almost the same level of reliability by moving the network configuration outside the zones? Only problem I've run into was zones not figuring out that an interface had switched over when the zones were shut down when it happened. (I'm not saying that using ip-instances isn't cool, just saying that there's still hope for those of us with many zones and not a pile of spare nics)
Posted by Mads on March 22, 2008 at 08:23 AM EDT #
How does one get a listing of all interface types that support link-based failover detection?
Barring said listing, is there a command that indicates a physical interface's ability to support link-based failover?
Best,
Posted by John Kotches on March 24, 2008 at 12:36 PM EDT #
A list of NICs that support link-based detection in Solaris 10 can be found at: http://docs.sun.com/app/docs/doc/816-4554/6maoq0283?l=en&a=view . Unfortunately, there isn't a Solaris command which will determine if an existing NIC supports LD.
Posted by JeffV on March 24, 2008 at 02:25 PM EDT #
Mads is absolutely correct: IP Multipathing can be configured for a shared-IP zone from the global zone, or for an exclusive-IP zone by the zone itself. Shared-IP zones can share one or more NICs and the global zone performs all of the network configuration steps. The level of reliability is the same in either case.
I was focusing on exclusive-IP zones in my examples.
Also, look forward to Virtual NICs (VNICs). This feature will enable the global zone to create VNICs which can then be used in the global zone, for shared-IP zones, or exclusive-IP zones. An exclusive-IP zone can use its VNIC(s) in the same way that it uses physical NICs.
For more information on VNICs, see Project Crossbow at opensolaris.org: http://www.opensolaris.org/os/project/crossbow/ . That project is in beta test right now.
Posted by JeffV on March 24, 2008 at 05:38 PM EDT #
Awesome (said quite sarcastically)...
The common current NICs we're using (e1000g / nxge) aren't on the list. So either the list isn't current, or they aren't supported.
It's nice that you can do exclusive-ip in a zone with link-based failure, but there is the real possibility to run out of physical interfaces if you are following supported NIC specs for servers. This is especially true if you are using NAS for your big storage.
VNICs look promising, but when will the feature make it into the mainline Solaris code? No inside information here, but I'm guessing u6 or later.
Posted by John Kotches on March 25, 2008 at 11:14 AM EDT #