Configuring a system for Virtual IO failover

  1. Configure the primary service domain
  2. A system configured for virtual IO failover consists of two service domains (primary and backup) with direct IO access, and one or more client domains configured to be serviced by both service domains. To create two service domains with direct access to physical IO, the PCI-E buses in the system have to be split between the two service domains. In this example, we start with the "factory-default" configuration and configure the 'primary' domain so that it uses only PCI-E bus_a. Like a typical service domain, the 'primary' service domain also runs a virtual disk server and a virtual network switch. These VIO services use the physical disk and network adapter available on PCI-E bus_a.

    Note: Before removing bus_b from the primary domain, ensure that both bus_a and bus_b contain disk and network devices for use by the respective service domains.
    # Specify CPU and Memory resources
    ldm set-vcpu 4 primary
    ldm set-mem 4G primary
    # Remove bus_b from Primary domain
    ldm rm-io bus_b primary
    # Add vds, vsw, vcc services
    ldm add-vds primary-vds0 primary
    ldm add-vsw net-dev=e1000g0 primary-vsw0 primary
    ldm add-vcc port-range=5000-5100 primary-vcc0 primary
    
    Save the new configuration to the SP, and power-cycle the system so that the new configuration takes effect.
    # Save configuration to SP
    ldm add-spconfig initial
    
    After the power cycle, ensure that the system is running the virtual network terminal server (vntsd) service, so that it provides access to the consoles of the other domains.
    # Start the virtual NTS service
    svcadm enable vntsd
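
Before moving on, it can help to confirm that the primary domain ended up as intended. The following is only a sketch: it assumes that the `ldm list-bindings`, `ldm list-services` and `ldm list-spconfig` subcommands are available in your Logical Domains Manager release, and the `run` wrapper and `DRY_RUN` variable are hypothetical helpers (not part of LDoms) that just print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

verify_primary() {
    run ldm list-bindings primary   # expect bus_a only, plus the vds/vsw/vcc services
    run ldm list-services primary   # expect primary-vds0, primary-vsw0, primary-vcc0
    run ldm list-spconfig           # expect the saved 'initial' configuration
    run svcs vntsd                  # expect vntsd to be online
}

verify_primary
```

Run it once as is to review the commands, then again with DRY_RUN=0 on the control domain.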
    
  3. Configure the alternate service domain
  4. Create the alternate service domain just like any other domain. However, instead of giving it virtual disk and network devices, configure it like the primary domain, with virtual disk server and virtual switch services that use the physical devices available on its PCI-E bus_b.
    # Create domain
    ldm create alternate
    # Specify CPU and Memory resources
    ldm set-vcpu 4 alternate
    ldm set-mem 4G alternate
    # Add bus_b to the alternate domain
    ldm add-io bus_b alternate
    # Add vds, vsw services
    ldm add-vds alternate-vds0 alternate
    ldm add-vsw net-dev=e1000g3 alternate-vsw0 alternate
    # Bind and start 'alternate' service domain
    ldm bind alternate
    ldm start alternate
    # Connect to the 'alternate' service domain console
    telnet localhost 5000
    
  5. Create the client ldg1 domain
  6. # Create domain
    ldm create ldg1
    # Specify CPU and Memory resources
    ldm set-vcpu 4 ldg1
    ldm set-mem 4G ldg1
    
    # Export disk image files as virtual disks from both virtual disk servers
    # Note: Ensure that the file image is available on both service domains
    ldm add-vdsdev /ldomspool/ldg1/bootdisk.img vol1@primary-vds0
    ldm add-vdsdev /ldomspool/ldg1/bootdisk.img vol1@alternate-vds0
    # Add two virtual disks - each one is connected to a different vDisk service
    # Note: Configure the vdisk with a timeout of 1 sec so that it will return
    # an error if it cannot establish a connection with the vdisk server
    # after the specified period.
    ldm add-vdisk timeout=1 vdisk1 vol1@primary-vds0 ldg1
    ldm add-vdisk timeout=1 vdisk2 vol1@alternate-vds0 ldg1
    # Add two vnet devices - each one is connected to a different service
    ldm add-vnet vnet1 primary-vsw0@primary ldg1
    ldm add-vnet vnet2 alternate-vsw0@alternate ldg1
    # Configure both disks as boot device, and enable auto-boot?
    ldm set-var boot-device="/virtual-devices@100/channel-devices@200/disk@0 /virtual-devices@100/channel-devices@200/disk@1" ldg1
    ldm set-var auto-boot\?=true ldg1
    # Bind and start 'ldg1' client domain
    ldm bind ldg1
    ldm start ldg1
    # Connect to the 'ldg1' client domain console
    telnet localhost 5001
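
The disk and network setup for the client is symmetric across the two service domains, which makes it easy to generate. The sketch below only prints the commands so that they can be reviewed first. The function name `gen_client_cmds` is hypothetical; the service, volume and device names are the ones used in this example.

```shell
#!/bin/sh
# Print (do not execute) the per-service-domain commands for client domain ldg1.
gen_client_cmds() {
    client=ldg1
    image=/ldomspool/ldg1/bootdisk.img
    n=1
    for svc in primary alternate; do
        # Export the image, then attach one vdisk and one vnet per service domain.
        echo "ldm add-vdsdev ${image} vol1@${svc}-vds0"
        echo "ldm add-vdisk timeout=1 vdisk${n} vol1@${svc}-vds0 ${client}"
        echo "ldm add-vnet vnet${n} ${svc}-vsw0@${svc} ${client}"
        n=$((n + 1))
    done
}

gen_client_cmds
```

If the printed commands look right, they can be piped to sh on the control domain.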

  7. Add support for boot disk mirroring using SVM
    • The client domain consists of two virtual disks backed by files exported by the virtual disk servers. Ensure that each virtual disk has at least three partitions: root (s0), swap (s1) and space for the SVM metadevice state databases (s3).
    • # prtvtoc /dev/dsk/c0d0s2
      * /dev/dsk/c0d0s2 partition map
      *  .....
      *  .....
      *  .....
      *                          First     Sector    Last
      * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
             0      2    00          0  14336400  14336399
             1      3    01   14336400   1051200  15387599
             2      5    00          0  16776000  16775999
             3      0    00   15387600   1386600  16774199
      
    • Make sure that both drives look the same. We can do this by slicing the second drive in the same way as the first drive, the master. There is no need to newfs the slices on the second drive here; that will be done automatically by the mirror sync later.
      prtvtoc /dev/rdsk/c0d0s2 | fmthard -s - /dev/rdsk/c0d1s2 
      
      Note: If your drives have different geometries, you need to create the slices on the second disk by hand instead of using the command above. In that case, make sure the slices on the second disk are the same size as, or larger than, the slices on the first disk.
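
To check that the two disks really ended up with the same slices, the prtvtoc output of both can be compared. This is a minimal sketch that works on saved prtvtoc output; the function names and the /tmp file names are hypothetical.

```shell
#!/bin/sh
# Print "slice first_sector sector_count" for each slice in saved prtvtoc output,
# skipping the '*' comment lines that prtvtoc emits.
slice_table() {
    awk '$1 !~ /^\*/ && NF >= 6 { print $1, $4, $5 }' "$1"
}

# Succeeds when both saved prtvtoc outputs describe the same slices.
same_layout() {
    [ "$(slice_table "$1")" = "$(slice_table "$2")" ]
}

# Usage inside the client domain (file names are hypothetical):
#   prtvtoc /dev/rdsk/c0d0s2 > /tmp/c0d0.vtoc
#   prtvtoc /dev/rdsk/c0d1s2 > /tmp/c0d1.vtoc
#   same_layout /tmp/c0d0.vtoc /tmp/c0d1.vtoc && echo "layouts match"
```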

    • Create the metadevice state database (-f creates the initial state database, -a attaches the new database device, -c3 tells it to keep 3 copies of the database, and we're creating the databases on slice 3 of both disks). You can see the results with metadb -i, which is a very handy tool for checking the state of your state database replicas.
    • metadb -a -f -c3 /dev/rdsk/c0d0s3 /dev/rdsk/c0d1s3
      
    • Now we set up the initial metadevices. Like metadb, metainit must be forced with -f, this time not because it's the initial creation, but because we are working on mounted filesystems. Here we create a one-way concatenation of each of our actual slices to form the needed submirrors.
    • metainit -f d10 1 1 c0d0s0
      metainit -f d11 1 1 c0d0s1
      metainit -f d20 1 1 c0d1s0   
      metainit -f d21 1 1 c0d1s1    
      
    • Initialize the mirroring using the metainit command. The -m tells SVM that we want to build a mirror with the name in the first column (d0 and d1), consisting of the submirror in the third column. We now have a one-way mirror of our system drive, but it's not active yet.
    • metainit d0 -m d10
      metainit d1 -m d11
      
    • Set up the root device. Save the current /etc/vfstab and /etc/system files so that they can be used to restore the system to a pre-SVM configuration if necessary.
    • cp /etc/vfstab /etc/vfstab.saved 
      cp /etc/system /etc/system.saved
      metaroot d0
      
    • Modify /etc/vfstab and change the default root and swap partitions to use the meta device.
    • #/dev/dsk/c0d0s1        -       -       swap    -       no      -
      /dev/md/dsk/d1          -       -       swap    -       no      -
      #/dev/dsk/c0d0s0  /dev/rdsk/c0d0s0 /    ufs     1       no      -
      /dev/md/dsk/d0  /dev/md/rdsk/d0 /       ufs     1       no      -
      
    • Optional: Making sure we can boot in case of disk failure.
      USE ONLY WHERE DATA INTEGRITY IS LESS IMPORTANT THAN SERVICE AVAILABILITY. THIS CAN LEAD TO FILE CORRUPTION.
      To make sure we can boot if a disk fails, we need to tell the kernel to ignore the quorum requirement on metadbs. Otherwise we cannot boot in a two-disk setup after losing a disk, because the surviving disk holds only half of the replicas and can never fulfill the majority requirement. To do that, we add the following to /etc/system:
      echo "set md:mirrored_root_flag=1" >> /etc/system
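
The reason for this setting comes down to simple arithmetic: SVM wants more than half of the state database replicas to be available at boot. A small sketch of that calculation for the layout created above (3 replicas on each of 2 disks); the function name is hypothetical.

```shell
#!/bin/sh
# $1 = replicas per disk, $2 = number of disks.
# Reports what is left of the replica set after losing one whole disk.
quorum_after_disk_loss() {
    per_disk=$1
    disks=$2
    total=$((per_disk * disks))
    left=$((total - per_disk))
    if [ $((left * 2)) -gt "$total" ]; then
        echo "majority: $left of $total replicas remain"
    elif [ $((left * 2)) -eq "$total" ]; then
        echo "exactly half: $left of $total replicas remain"
    else
        echo "minority: $left of $total replicas remain"
    fi
}

quorum_after_disk_loss 3 2
```

For the two-disk setup this reports exactly half, which is not a majority; with a third disk holding replicas, the loss of any one disk would still leave a majority.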
      
    • Reboot the system. Following the reboot, the system will come up for the first time on the mirror. If all went well, we are up and running in a few seconds.
    • Create the mirrors by attaching the submirrors on the second drive so that we have mirrored slices. This will take a considerable amount of time, as the process syncs both disks. Use metastat to check on the progress of the sync.
    • metattach d0 d20
      metattach d1 d21
      
    • Since swap is now located on a metadevice, we want to tell the system to use that metadevice as the dump device. Also, since resyncing swap at boot is just wasted time, we disable it by setting the pass number of the swap mirror (d1) to 0.
      dumpadm -d /dev/md/dsk/d1
      metaparam -p 0 /dev/md/dsk/d1
      
    • Run metastat to ensure that all the required mirroring is set up properly.
      # metastat
      
      d1: Mirror
          Submirror 0: d11
            State: Okay         
          Submirror 1: d21
            State: Okay         
          Pass: 1
          Read option: roundrobin (default)
          Write option: parallel (default)
          Size: 1051200 blocks (513 MB)
      
      d11: Submirror of d1
          State: Okay         
          Size: 1051200 blocks (513 MB)
          Stripe 0:
              Device   Start Block  Dbase        State Reloc Hot Spare
              c0d0s1          0     No            Okay   No  
      
      
      d21: Submirror of d1
          State: Okay         
          Size: 1051200 blocks (513 MB)
          Stripe 0:
              Device   Start Block  Dbase        State Reloc Hot Spare
              c0d1s1          0     No            Okay   No  
      
      
      d0: Mirror
          Submirror 0: d10
            State: Okay         
          Submirror 1: d20
            State: Okay         
          Pass: 1
          Read option: roundrobin (default)
          Write option: parallel (default)
          Size: 14336400 blocks (6.8 GB)
      
      d10: Submirror of d0
          State: Okay         
          Size: 14336400 blocks (6.8 GB)
          Stripe 0:
              Device   Start Block  Dbase        State Reloc Hot Spare
              c0d0s0          0     No            Okay   No  
      
      
      d20: Submirror of d0
          State: Okay         
          Size: 14336400 blocks (6.8 GB)
          Stripe 0:
              Device   Start Block  Dbase        State Reloc Hot Spare
              c0d1s0          0     No            Okay   No  
      
      
      Device Relocation Information:
      Device   Reloc  Device ID
      c0d1   No       -
      c0d0   No       -
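
Reading through the full metastat output by eye gets tedious, so the check can be scripted. The sketch below simply looks at the "State:" lines in output like the above; the function name is hypothetical.

```shell
#!/bin/sh
# Reads metastat output on stdin. Succeeds only if at least one "State:" line
# was seen and every one of them reports Okay.
all_states_okay() {
    awk '$1 == "State:" { seen = 1; if ($2 != "Okay") bad = 1 }
         END { if (seen && !bad) exit 0; exit 1 }'
}

# Usage inside the client domain:
#   metastat | all_states_okay && echo "mirrors are in sync"
```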
      
      
  8. Add support for IP failover using IPMP
    • The client domain consists of two virtual network devices. We will configure IP multipathing across these two devices.

    • Obtain 4 IP addresses in the same subnet. In multipathing there are 2 fixed (or private) addresses and 2 floating (or public) addresses. The 2 fixed addresses are referred to as internal; one is assigned directly to each network interface. The 2 floating addresses are the external ones. If one of the NICs detects link failure, the address tied to that NIC fails over to the working NIC. When the NIC comes back up, the address fails back to its original home. Decide now which will be your internal IPs and which will be your external ones, then add all 4 IPs to /etc/hosts. Example:
                10.6.90.10           ldg1-vnet0
                10.6.90.11           ldg1-vnet1
                10.6.90.100          ldg1-ext0 ldg1-dummy
                10.6.90.101          ldg1-ext1 ldg1.eng.sun.com
      
    • Set up the hostname.* files. You can pretty much copy these two files as is and just modify them slightly to fit your naming conventions, in the same way that you set up the /etc/hosts file above.
      /etc/hostname.vnet0
      ldg1-vnet0 netmask + broadcast + group production deprecated -failover up \
         addif ldg1-ext0 netmask + broadcast + failover up
      
      /etc/hostname.vnet1
      ldg1-vnet1 netmask + broadcast + group production deprecated -failover up \
         addif ldg1-ext1 netmask + broadcast + failover up
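
Since the two files differ only in the host names, they can also be generated. A minimal sketch that prints the file contents for one interface; the function name `ipmp_hostname_entry` is hypothetical, and the output should be redirected to the matching /etc/hostname.* file.

```shell
#!/bin/sh
# Print the contents of /etc/hostname.<if> for one interface in an IPMP group.
#   $1 = host name of the fixed (internal) address
#   $2 = host name of the floating (external) address
#   $3 = IPMP group name
ipmp_hostname_entry() {
    printf '%s netmask + broadcast + group %s deprecated -failover up \\\n' "$1" "$3"
    printf '   addif %s netmask + broadcast + failover up\n' "$2"
}

ipmp_hostname_entry ldg1-vnet0 ldg1-ext0 production
```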
      
    • Adjust failover detection timeouts. /etc/default/mpathd has a default failure detection time (FAILURE_DETECTION_TIME) of 10000 milliseconds. This means that it should take at most 10 seconds to detect a failure and successfully fail over an interface. I like to set this to 1000. In my experience with IP multipathing, lower values result in lots of syslog messages complaining that the value is too low. If you change this file, you will have to restart mpathd. Now is as good a time as any to either restart mpathd or start it for the first time if it is not already running.
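
The edit to /etc/default/mpathd can be scripted as well. This is a sketch under the assumption that the file already contains a FAILURE_DETECTION_TIME= line, as the stock file does; the function name is hypothetical. Instead of a full restart, the running in.mpathd daemon can be asked to reread its configuration with a SIGHUP.

```shell
#!/bin/sh
# Set FAILURE_DETECTION_TIME (milliseconds) in an mpathd configuration file.
# Goes through a temporary file because Solaris sed has no in-place option.
set_detection_time() {
    conf=$1
    ms=$2
    tmp="$conf.tmp.$$"
    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage inside the client domain, as root:
#   cp /etc/default/mpathd /etc/default/mpathd.saved
#   set_detection_time /etc/default/mpathd 1000
#   pkill -HUP in.mpathd    # ask the running daemon to reread its configuration
```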

    • Configure any additional routing or default gateways as appropriate. For the above IP addresses, the following router was added as the default gateway.
    • route add default 10.6.90.1
      
    • Make it active. This is the easy part. Copy and paste your /etc/hostname.vnet* files to ifconfig commands as below:
      /sbin/ifconfig vnet0 ldg1-vnet0 netmask + broadcast + \
         group production deprecated -failover up \
         addif ldg1-ext0 netmask + broadcast + failover up
      /sbin/ifconfig vnet1 ldg1-vnet1 netmask + broadcast + \
         group production deprecated -failover up \
         addif ldg1-ext1 netmask + broadcast + failover up
      
  9. All done - enjoy!
    The client domain is now resilient to the reboot of either the primary or the alternate service domain. Disk and network access in ldg1 should fail over on a service domain reboot. Since the metadisk in the client domain is mirrored, an explicit resync of the submirrors is required following each service domain reboot.
  10. # Repair the root and swap meta devices by adding the sub-mirrors
    metareplace -e d0 c0d0s0 
    metareplace -e d1 c0d0s1
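
The commands above repair the submirrors on c0d0. Which disk needs repairing depends on which service domain was rebooted, so a small helper can print the right pair of commands. This is only a sketch: the function name is hypothetical, and the mapping of c0d0 to the 'primary' and c0d1 to the 'alternate' service domain is an assumption based on the order in which the vdisks were added, so check it on your system first.

```shell
#!/bin/sh
# Print the metareplace commands to run in ldg1 after a service domain reboot.
# Assumed mapping: c0d0 is served by 'primary', c0d1 by 'alternate'.
resync_cmds() {
    case $1 in
        primary)   disk=c0d0 ;;
        alternate) disk=c0d1 ;;
        *) echo "usage: resync_cmds primary|alternate" >&2; return 1 ;;
    esac
    echo "metareplace -e d0 ${disk}s0"
    echo "metareplace -e d1 ${disk}s1"
}

resync_cmds alternate
```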