Solaris has had a feature to increase network availability called IP Multipathing (IPMP). Initially it required a test address on every data link in an IPMP group, where the test addresses were used as the source IP address to probe network elements for path availability. One of the benefits of probe-based failure detection is that it can extend beyond the directly connected link(s), and verify paths through the attached switch(es) to what typically is a router or other redundant element to provide available services.

Having one IP address (whether a public or a private, non routable) per data link and also the separate address(es) for the application(s) turns out to be a lot of addresses to allocate and administer. And since the default of five probes spaced two seconds apart meant a failure would take at least ten (10) seconds to be detected, something more was needed.

So in the Solaris 9 timeframe the ability to also do link based failure detection was delivered. It requires specific NICs whose driver has the ability to notify the system that a link has failed. The Introduction to IPMP in the Solaris 10 Systems Administrators Guide on IP Services lists the NICs that support link state notification. Solaris 10 supports configuring IPMP with only link based failure detection.

global# more /etc/hostname.bge[12]
::::::::::::::
/etc/hostname.bge1
::::::::::::::
10.1.14.140/26 group ipmp1 up
::::::::::::::
/etc/hostname.bge2
::::::::::::::
group ipmp1 standby up
On system boot, there will be an indication on the console that since no test addresses are defined, probe-based failure detection is disabled.

Apr 10 10:57:20 in.mpathd[168]: No test address configured on interface bge2; disabling probe-based failure detection on it
Apr 10 10:57:20 in.mpathd[168]: No test address configured on interface bge1; disabling probe-based failure detection on it
Looking at the interfaces configured,
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.1.14.140 netmask ffffffc0 broadcast 10.1.14.191
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
bge2: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 4
        inet 0.0.0.0 netmask 0
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
you will notice that two of the three interfaces have no address (0.0.0.0). Also, the data address is on a physical interface on bge1. At the same time bge2 has the 0.0.0.0 address. On the failure of bge1,
Apr 10 14:34:53 global bge: NOTICE: bge1: link down
Apr 10 14:34:53 global in.mpathd[168]: The link has gone down on bge1
Apr 10 14:34:53 global in.mpathd[168]: NIC failure detected on bge1 of group ipmp1
Apr 10 14:34:53 global in.mpathd[168]: Successfully failed over from NIC bge1 to NIC bge2


global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
bge2:1: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 10.1.14.140 netmask ffffffc0 broadcast 10.1.14.191
the data address is migrated onto bge2:1. I find this a little confusing. However, I don't know any way around it on Solaris 10. The IPMP Re-architecture makes this a lot easier!

Using Probe-based IPMP with non-global zones

Configuring a shared IP Instance non-global zone and utilizing IPMP managed in the global zone is very easy.

The IPMP configuration is very simple. Interface bge1 is active, and bge2 is in stand-by mode.

global# more /etc/hostname.bge[12]
::::::::::::::
/etc/hostname.bge1
::::::::::::::
group ipmp1 up
::::::::::::::
/etc/hostname.bge2
::::::::::::::
group ipmp1 standby up
My zone configuration is:
global# zonecfg -z zone1 info
zonename: zone1
zonepath: /zones/zone1
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address: 10.1.14.141/26
        physical: bge1
Prior to booting, the network configuration is:
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone zone1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
After booting, the network looks like this:
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone zone1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        zone zone1
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d

So a simple case for the use of IPMP, without the need for test addresses! Other IPMP configurations, such as more than two data links, or active-active, are also supported with link based failure detection. The more links involved, the more test addresses are saved with link based failure detection. Since writing this entry I was involved in a customer configuration where this is saving several hundred IP address and their management (such as avoiding duplicate address). That customer is willing to forgo the benefit of probes testing past the local switch port.

Steffen

Comments:

Does link based failure detection IPMP support Active/Active?

Posted by Herman on April 26, 2009 at 06:46 PM EDT #

Yes, active-active works whether using probe or link based failure detection.

Posted by Steffen Weiberle on April 27, 2009 at 07:37 AM EDT #

Does this link based IPMP work inside the openstorage (amber road) aktty -> shell? It's a underlying Solaris NV / OSOL mixture, this should be possible..

Posted by Tom Stocker on May 27, 2009 at 08:24 AM EDT #

Hi Tom, this is a core feature of IPMP in Solaris 10 and in Nevada (SX-CE) and OpenSolaris. As I understand it, the GUI for the Sun Storage 7000 Unifed Storage System only supports configuring with probe test addresses. If you do anything via the shell, I don't know the effect it has on the GUI. Note that with probe-based failure detection, the interface will also fail over if the link fails. So you get quick failover and failback, while also allowing probes to test beyond the local switch port.

Steffen

Posted by Steffen Weiberle on May 27, 2009 at 03:57 PM EDT #

I am also trying to get IPMP working on an amber road (7110). It is directly attached to a v440 and a v240. No switches involved so probe based will not work for me. I have done this with solaris boxes with link based with no problem, however like Steffen says it is not a feature of the GUI nor the CLI as far as i can tell.

Is there any known SUN supported way to accomplish this?

Thanks

-Pete

Posted by Pete Sorensen on July 01, 2009 at 11:59 AM EDT #

Post a Comment:
  • HTML Syntax: NOT allowed

This blog copyright 2009 by stw