Keep what I found in Solaris Xiang's Blog

Wednesday May 13, 2009

Since Clearview/IPMP was integrated into Solaris Nevada build 107 this January, we have had a new experience with IPMP configuration and management in Solaris. This article gives a brief introduction to IPMP concepts and to the new way of configuring IPMP in Solaris.

This article draws on the high-level design document written by Meem. If you have questions or need more information on the Clearview/IPMP project, the following links may help:

Clearview Project page at opensolaris.org
Answers to PSARC's "20 questions": Inception | Commitment
IPMP Internals Overview Presentation

Administrator documentation (for docs.sun.com): IPMP Administrative Overview | IPMP Configuration Tasks | Draft Manpages

IPMP Overview

IPMP is a popular Solaris-specific multipathing technology which operates at the IP layer. Specifically, IPMP attempts to insulate IP-based networking applications from changes to the underlying networking hardware on the system and the system's connectivity to the network as a whole. For instance, if two networking interfaces are put into an IPMP group, then the failure or removal of either interface from the system will not affect applications using the IP addresses hosted on those networking interfaces. Further, the inbound and outbound networking traffic using those IP addresses will be load-spread across the networking interfaces in the group, providing greater network utilization.

Why do we use IPMP?

I summarize the answer in three reasons:

- Avoid the limitations of an ordinary IP interface, which loses connectivity when it goes offline for hardware maintenance or fails unexpectedly.
- Make the network connection more available. With IPMP, network connectivity is always available, provided that at least one interface in the group is usable.
- Improve overall network performance by automatically spreading outbound network traffic across the set of interfaces in the IPMP group. IPMP also indirectly controls inbound load spreading by performing source address selection for packets whose IP source address was not specified by the application.

Active-active IPMP Configuration

Now, let's see how to configure IPMP interfaces. We use two machines, host_a and host_b, for the demonstration: host_a is the machine on which we create the IPMP interfaces, and host_b is the machine that hosts a probe target. What is a probe target? Let's hold that question; I will explain it later in this article.

Usually, we use an active-active IPMP configuration for network connections, which means that *all* interfaces in the IPMP group are active, to improve load spreading and connectivity. Let's see how to create one. First, create an IPMP interface:

host_a # ifconfig ipmp0 ipmp group a

This is the explicit way to create an IPMP group with ifconfig(1M). The interface name is "ipmp0" and the group name is "a". Next, I will add two underlying interfaces to the newly created IPMP interface.

See which interfaces are available on host_a for IPMP configuration:

host_a # dladm show-link
LINK        CLASS    MTU    STATE    OVER
bge2        phys     1500   unknown  --
bge0        phys     1500   up       --
bge1        phys     1500   unknown  --
bge3        phys     1500   unknown  --

Note that many types of link can be used in an IPMP group, for example ordinary physical links, VLAN links, and renamed links. However, the important thing to take care of is that all links assigned to an IPMP group must be connected to the same switch or be in the same LAN. Now, we put physical links bge1 and bge2 into group a.

host_a # ifconfig bge1 plumb group a
host_a # ifconfig bge2 plumb group a

Let's use "ipmpstat -g" to verify the state of ipmp0:

host_a # ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       a           failed    --        [bge2 bge1]

The ipmpstat(1M) tool was introduced as part of the Clearview project as the principal tool for obtaining information about IPMP groups. It provides information about all aspects of your IPMP group configuration. With the "-g" option, ipmpstat shows the status of each IPMP group that exists in the system. At this point, ipmp0 is marked "failed": both underlying interfaces are shown in brackets, meaning they are not yet usable, and the group has no active data addresses.
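
If you want to check the group state from a script, the "ipmpstat -g" output can be parsed with a few lines of code. The following is only a minimal sketch in Python: it assumes the human-readable column layout shown above, works on text pasted from this article instead of calling ipmpstat itself, and handles only the bracket notation seen here. The function names are made up for this example.

```python
# Sample text copied from the "ipmpstat -g" output in this article.
SAMPLE = """\
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       a           failed    --        [bge2 bge1]
"""


def _bracketed(field):
    """Collect the interface names that appear between '[' and ']'."""
    names, inside = [], False
    for token in field.replace("[", " [ ").replace("]", " ] ").split():
        if token == "[":
            inside = True
        elif token == "]":
            inside = False
        elif inside:
            names.append(token)
    return names


def parse_ipmpstat_g(text):
    """Return one dict per IPMP group from human-readable `ipmpstat -g` output."""
    rows = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        # The first four columns are single tokens; the rest is the interface list.
        parts = line.split(None, 4)
        parts += [""] * (5 - len(parts))  # tolerate a group with no interfaces
        group, groupname, state, fdt, interfaces = parts
        rows.append({
            "group": group,
            "groupname": groupname,
            "state": state,
            "fdt": fdt,
            # Every interface in the group, brackets stripped.
            "interfaces": interfaces.replace("[", " ").replace("]", " ").split(),
            # Interfaces shown inside [...] are the ones the group cannot use.
            "unusable": _bracketed(interfaces),
        })
    return rows


groups = parse_ipmpstat_g(SAMPLE)
for g in groups:
    print(g["group"], g["state"], "unusable:", g["unusable"])
```

A monitoring script could raise an alert whenever the state of a group is anything other than "ok".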

Now, let's add two data addresses to ipmp0 using ifconfig(1M):

host_a # ifconfig ipmp0 192.168.30.100/24 up
host_a # ifconfig ipmp0 addif 192.168.30.101/24 up
Created new logical interface ipmp0:1

Then bring up the two underlying interfaces to take over the data addresses, and verify them using "ipmpstat -a" (or use "ipmpstat -an" to skip hostname resolution):

host_a # ifconfig bge1 up
host_a # ifconfig bge2 up
host_a # ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.30.101            up     ipmp0       bge2        bge2 bge1
192.168.30.100            up     ipmp0       bge1        bge2 bge1


"ipmpstat -a" shows the status of the data addresses in the system. Please note that the outbound network load is shared across both bge1 and bge2. Let me explain more about the concept of a data address.

Each data address is an IP address that can be used as the source or destination address for data. Data addresses are part of an IPMP group and can be used continuously, provided that one interface in the group is functioning. For example, if we use one of the data addresses in the IPMP group for a telnet connection, the connection will not be broken as long as at least one of the group's underlying interfaces is active, no matter what happens to the other underlying interfaces in the same group (such as a hardware fault, or somebody cutting the cable connected to one of them).

In the previous IPMP implementation, data addresses were hosted on the underlying interfaces of an IPMP group. In the current implementation, data addresses are hosted on the IPMP interface.

Now we can use the data addresses to carry traffic. But in order to get the full functionality of IPMP, we should set test addresses as well. Let me introduce the concept of a test address here.

Each test address is an IP address that must be used as the source or destination address for probes, and must not be used as a source or destination address for data traffic. Here a probe is an ICMP packet sent by the in.mpathd daemon to test the send and receive paths of a given interface. A probe packet uses an IPMP test address as its source address and a probe target (usually a router) as its destination. A probe target is an endpoint that sends ICMP replies to probes. It is usually a router, so that a successful probe also verifies the interface's connection to the machine that forwards traffic from and to the IPMP interfaces, both within the same network and to other networks.

Remember what host_b is used for? OK, let's plumb a probe target on host_b. Note that the interface used as the probe target is typically connected to the same switch as the underlying interfaces of the IPMP group we just created.

host_b # ifconfig e1000g1 plumb 192.168.30.199/24 up
host_b # ifconfig e1000g1
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 6
        inet 192.168.30.199 netmask ffffff00 broadcast 192.168.30.255
        ether 0:14:4f:82:59:61

Let's return to the topic of test addresses. Test addresses are associated with an underlying interface (unlike data addresses!). These addresses are designated as NOFAILOVER so that they remain on the underlying interface even if the interface fails, which facilitates repair detection. Test addresses must also be designated as DEPRECATED to keep the system from using them as source addresses for data packets. To assign a test address to each underlying interface, we can execute the following commands:

host_a # ifconfig bge1 -failover
host_a # ifconfig bge2 -failover
host_a # ifconfig bge1 192.168.30.200/24
host_a # ifconfig bge2 192.168.30.201/24

We use the "-failover" option first to stop the test addresses from moving to the IPMP interface. Remember that test addresses can only be set on underlying interfaces. Let's verify the test addresses with "ipmpstat -t"; this command also shows the probe targets:

host_a # ipmpstat -t
INTERFACE   MODE      TESTADDR            TARGETS
bge2        multicast 192.168.30.201      192.168.30.199
bge1        multicast 192.168.30.200      192.168.30.199
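
Before typing the ifconfig commands, it can help to sanity-check the address plan. The sketch below uses Python's standard ipaddress module with the addresses from this article. The rules it checks are my reading of the concepts above (a test address must not double as a data address, and the probe target should sit on the same subnet as the test addresses); it is a planning aid under those assumptions, not something in.mpathd runs, and the function name is invented for this example.

```python
import ipaddress

# Addresses taken from the commands and outputs in this article.
DATA_ADDRESSES = ["192.168.30.100/24", "192.168.30.101/24"]
TEST_ADDRESSES = {"bge1": "192.168.30.200/24", "bge2": "192.168.30.201/24"}
PROBE_TARGET = "192.168.30.199"


def check_ipmp_plan(data_addrs, test_addrs, target):
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    data = [ipaddress.ip_interface(a) for a in data_addrs]
    tests = {ifname: ipaddress.ip_interface(a) for ifname, a in test_addrs.items()}
    target_ip = ipaddress.ip_address(target)

    data_ips = {d.ip for d in data}
    for ifname, t in tests.items():
        # A test address must not double as a data address.
        if t.ip in data_ips:
            problems.append(f"{ifname}: test address {t.ip} is also a data address")
        # Probes leave from the test address, so the target should be on its subnet.
        if target_ip not in t.network:
            problems.append(f"{ifname}: target {target_ip} is outside {t.network}")

    # Every address in the plan should be unique.
    all_ips = [d.ip for d in data] + [t.ip for t in tests.values()]
    if len(set(all_ips)) != len(all_ips):
        problems.append("duplicate address in plan")
    return problems


problems = check_ipmp_plan(DATA_ADDRESSES, TEST_ADDRESSES, PROBE_TARGET)
print(problems or "plan looks consistent")
```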


Failover and Failback Operations

Now the active-active IPMP configuration is complete. This is a typical and reliable setup for carrying network traffic. The connection is maintained as long as at least one interface is alive, which means that a failure of any single interface does not affect the connection. When an underlying interface of an IPMP group fails or is marked down, IPMP automatically performs a "failover" operation; once the failed interface is repaired, IPMP automatically performs a "failback" operation. To keep things simple, I will use "ifconfig <interface> down/up" to demonstrate failover and failback. Using the previously configured system, let's mark bge1 in ipmp0 down and see what happens:

host_a # ifconfig bge1 down
host_a # ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       a           degraded  10.00s    bge2 [bge1]

Now we can see that ipmp0 has been marked "degraded" because one of its underlying interfaces, bge1, is down (bge1 is enclosed in []). Since bge2 is still up, let's check with "ipmpstat -an" whether the data addresses of ipmp0 have failed over to bge2.

host_a # ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.30.101            up     ipmp0       bge2        bge2
192.168.30.100            up     ipmp0       bge2        bge2

Now, let me show you the "failback", which happens when an underlying interface is repaired (comes back to life):

host_a # ifconfig bge1 up
host_a # ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.30.101            up     ipmp0       bge2        bge2 bge1
192.168.30.100            up     ipmp0       bge1        bge2 bge1
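
The INBOUND column in the outputs above can be mimicked with a toy model. The sketch below spreads the data addresses round-robin over the usable interfaces. This is only an illustration of the failover and failback behaviour seen in the demonstration, written under the assumption of a simple round-robin policy; it is not the actual algorithm used by in.mpathd, and the function name is hypothetical.

```python
def spread_addresses(data_addresses, interfaces):
    """Bind each data address to a usable interface, round-robin.

    `interfaces` maps interface name -> True if the interface is usable.
    Returns a dict address -> interface name, or address -> None when
    no interface in the group is usable.
    """
    usable = [name for name, ok in interfaces.items() if ok]
    binding = {}
    for i, addr in enumerate(data_addresses):
        binding[addr] = usable[i % len(usable)] if usable else None
    return binding


addresses = ["192.168.30.100", "192.168.30.101"]

# Both interfaces up: each one receives traffic for one data address.
both_up = spread_addresses(addresses, {"bge1": True, "bge2": True})

# "ifconfig bge1 down": every data address fails over to bge2.
bge1_down = spread_addresses(addresses, {"bge1": False, "bge2": True})

# "ifconfig bge1 up": failback restores the original spread.
repaired = spread_addresses(addresses, {"bge1": True, "bge2": True})

for label, binding in [("both up", both_up), ("bge1 down", bge1_down), ("repaired", repaired)]:
    print(label, binding)
```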

Active-standby IPMP Configuration

Now, let's switch to the active-standby IPMP configuration.

An active-standby IPMP configuration gives you at least one backup underlying interface for connection reliability. When an active interface suffers damage such as a hardware fault, a backup interface is activated automatically and the data addresses are bound to it. Once the failed interface is repaired, the data addresses fail back to the repaired interface and the standby interface becomes inactive again. If you use data addresses for network traffic or other applications, the failover and failback operations are transparent, and activating or deactivating standby interfaces neither affects network applications nor degrades their performance. Now, let's continue the demonstration.

Now that we have bge1 and bge2 configured in ipmp0, let's make bge2 a standby (backup) interface:

host_a # ifconfig bge2 standby
host_a # ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge2        no      ipmp0       is-----   up        ok        ok
bge1        yes     ipmp0       --mb---   up        ok        ok
host_a # ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.30.101            up     ipmp0       bge1        bge1
192.168.30.100            up     ipmp0       bge1        bge1

Here the "-i" option of ipmpstat shows information about the underlying interfaces of IPMP groups. Now bge2 has been marked as i (inactive) and s (standby), and all data addresses have moved to bge1. Let's mark bge1 offline and see what happens:

host_a # if_mpadm -d bge1
host_a # ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge2        yes     ipmp0       -smb---   up        ok        ok
bge1        no      ipmp0       -----d-   up        disabled  offline
host_a # ipmpstat -an
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
192.168.30.101            up     ipmp0       bge2        bge2
192.168.30.100            up     ipmp0       bge2        bge2

Here if_mpadm(1M) is the tool for administering interfaces in an IP multipathing group; its "-d" option takes an interface offline. We can see that the i (inactive) flag on bge2 has been removed and all data addresses have moved to bge2.
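
The active-standby behaviour seen in the two "ipmpstat -i" outputs can be summarized in a few lines of Python. This is a deliberately simplified sketch that covers only the two-interface case from this demonstration: a standby interface stays inactive while a usable non-standby interface exists, and is activated otherwise. With more interfaces the real policy in in.mpathd may differ, and the function name is made up for this example.

```python
def active_interfaces(interfaces):
    """Pick which interfaces carry traffic in a toy active-standby group.

    `interfaces` maps name -> {"standby": bool, "usable": bool}.
    """
    primaries = [n for n, s in interfaces.items() if s["usable"] and not s["standby"]]
    if primaries:
        return primaries
    # No usable non-standby interface is left: fall back to usable standbys.
    return [n for n, s in interfaces.items() if s["usable"] and s["standby"]]


# After "ifconfig bge2 standby": only bge1 is active.
normal = active_interfaces({
    "bge1": {"standby": False, "usable": True},
    "bge2": {"standby": True, "usable": True},
})

# After "if_mpadm -d bge1": the standby interface bge2 takes over.
bge1_offline = active_interfaces({
    "bge1": {"standby": False, "usable": False},
    "bge2": {"standby": True, "usable": True},
})

print("normal:", normal)
print("bge1 offline:", bge1_offline)
```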

Now the IPMP configuration demonstration is finished. Let's destroy the IPMP interface we created in this demonstration. First, unplumb the underlying interfaces, then unplumb the IPMP interface:

host_a # ifconfig bge1 unplumb
host_a # ifconfig bge2 unplumb
host_a # ifconfig ipmp0 unplumb
host_a # ipmpstat -g

Remember that we cannot unplumb an IPMP interface until we have removed all the underlying interfaces from the IPMP group.

(END)

Comments:

Great article. I do have a question. Where in the active-active configuration did you specify that the 192.168.30.199 was your target probe ?? I can see you configured host_b with this IP address but I do not see the command used on host_a to configure host_b as the target probe. Your ipmpstat -t command shows that the target is configured, but the article does not show how the configuration is done. Please clarify.

Posted by Jairo Cardozo on August 28, 2009 at 02:32 AM CST #

To specify a probe target, you just plumb and bring up an interface (such as 192.168.30.199) in the same subnet on another machine. That machine (such as host_b) usually acts as a router, so that traffic through the IPMP interface can be forwarded out of the subnet by it.
For IPv6, the machine hosting the probe target should run a neighbor discovery server, which can assign a global address to the IPMP interface so that the machine (acting as a router) can forward traffic that uses the IPMP data address outside the subnet.

Posted by xiang zhou on September 10, 2009 at 11:49 PM CST #

Hi Xiang, thanks for the article. You write: "the important thing we need to care about is all links assigned to an IPMP group must be connected to the same switch or in the same LAN"
I have a blade enclosure, with switches on the same fabric, but not stacked. These switches are connected to one bigger switch outside of the enclosure. I want to connect a Sun Storage 7110 to the enclosure, with two NICs connected to two separate enclosure switches using IPMP (so that the enclosure switches are not a SPOF). Is that OK?
TIA, regards, Johan

Posted by Johan Hoeke on October 01, 2009 at 03:03 PM CST #
