How 'suite' it is... - Jackie Gleason The "Availability Suite"

Tuesday Jun 12, 2007

A question was recently posted to zfs-discuss@opensolaris.org comparing AVS replication with ZFS send/receive for odd-sized volume pairs, and asking whether using AVS makes it all seamless. Yes, Availability Suite makes it all seamless, but only after AVS has been initially configured.


Unlike ZFS, which was designed and developed to be very easy to configure, Availability Suite requires explicit and somewhat overly detailed configuration information to be set up, and set up correctly, for it to work seamlessly.


Recently I worked with one of Sun's customers on the configuration of two Sun Fire X4500 servers, a remarkably capable system: a four-way x64 server with the highest storage density available, 24 TB in 4U of rack space. The customer's desired configuration was simple: two servers in an active-active, high-availability configuration, deployed 2000 km apart, with each system acting as the disaster recovery system for the other. Replication needed to be CDP (Continuous Data Protection), running 24/7, 365 days a year, in both directions. Once set up correctly, CDP would work seamlessly as a lights-out operation.


Each X4500, or Thumper, comes with 48 disks, two of which will be used as the SVM-mirrored system disk (there can't be a single point of failure), leaving 46 data disks. Since each system will also serve as the disaster recovery system for the other site, that leaves 23 disks on each system for its own data, with the other 23 receiving the replica from the other site. The choice of ZFS redundancy type, the number of volumes in each pool, and whether compression or encryption is enabled are not a concern to Availability Suite, since whatever vdevs are configured, the ZFS volume and file metadata will get replicated too.


To test this replicated ZFS-on-AVS scenario on my Thumper, I followed these steps:


1). Take one of the 46 disks that will eventually be placed in the ZFS storage pool. Use the ZFS zpool utility to format this disk correctly, an action that creates an EFI-labeled disk with all available blocks in slice 0. Then destroy the pool.


# zpool create -f temp c4t2d0; zpool destroy temp

2). Next, run the AVS 'dsbitmap' utility to determine the size of the SNDR bitmap needed to replicate this disk's slice 0, saving the results for later use.

# dsbitmap -r /dev/rdsk/c4t2d0s0 | tee /tmp/vol_size
Remote Mirror bitmap sizing

Data volume (/dev/rdsk/c4t2d0s0) size: 285196221 blocks
Required bitmap volume size:
  Sync replication: 1089 blocks
  Async replication with memory queue: 1089 blocks
  Async replication with disk queue: 9793 blocks
  Async replication with disk queue and 32 bit refcount: 35905 blocks

The selection here is either synchronous replication or asynchronous replication with a memory queue, both of which need the same 1089-block bitmap. Other replication types also work with ZFS, but synchronous replication is best if network latency is low.

3). To assure redundancy of the SNDR bitmaps, each will be mirrored via SVM. Hence we need to double the number of blocks needed, after rounding up to a multiple of 8 KB (16 blocks).

# VOL_SIZE="`cat /tmp/vol_size| grep 'size: [0-9]' | awk '{print $5}'`"
# BMP_SIZE="`cat /tmp/vol_size| grep 'Sync ' | awk '{print $3}'`"
# SVM_SIZE=$(( ( (BMP_SIZE + 16 - 1) / 16 ) * 16 * 2 ))
# ZFS_SIZE=$((VOL_SIZE - SVM_SIZE))
# SVM_OFFS=$((34 + ZFS_SIZE))
# echo "Original volume size: $VOL_SIZE, Bitmap size: $BMP_SIZE"
# echo "SVM soft partition size: $SVM_SIZE, ZFS vdev size: $ZFS_SIZE"
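The sizing arithmetic in step 3 can be checked off-line before touching any disks. The sketch below is a POSIX shell rendering of the rounding rule as described (round the bitmap up to a multiple of 16 blocks, then double it). It parses the two relevant lines of the step 2 dsbitmap output from a string rather than reading /tmp/vol_size. The helper name svm_size is hypothetical.

```shell
# Round a bitmap size (in 512-byte blocks) up to a multiple of 16 blocks
# (8 KB), then double it because the bitmaps are mirrored by SVM.
svm_size() {
    echo $(( ( ($1 + 16 - 1) / 16 ) * 16 * 2 ))
}

# The two lines of the step 2 dsbitmap output that the sizing needs.
DSBITMAP_OUT='Data volume (/dev/rdsk/c4t2d0s0) size: 285196221 blocks
  Sync replication: 1089 blocks'

VOL_SIZE=$(echo "$DSBITMAP_OUT" | grep 'size: [0-9]' | awk '{print $5}')
BMP_SIZE=$(echo "$DSBITMAP_OUT" | grep 'Sync ' | awk '{print $3}')
SVM_SIZE=$(svm_size "$BMP_SIZE")
ZFS_SIZE=$((VOL_SIZE - SVM_SIZE))
SVM_OFFS=$((34 + ZFS_SIZE))    # slice 0 starts at sector 34, as in step 6

echo "bitmap=$BMP_SIZE svm=$SVM_SIZE zfs=$ZFS_SIZE offset=$SVM_OFFS"
```

For the sample disk this prints `bitmap=1089 svm=2208 zfs=285194013 offset=285194047`: 1089 rounds up to 1104 blocks, which doubles to 2208.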

5). Use the 'find' utility below, adjusting its first parameter to produce the list of volumes that will be placed into the ZFS storage pool. Carefully examine this list. Then adjust the first search parameter and/or use 'egrep -v "disk|disk"' with one or more disk names to exclude any volumes that are not to be part of this ZFS storage pool configuration.

The resulting list produced by "find ..." is the key to reformatting all of the LUNs that will be part of a replicated ZFS storage pool.

# find /dev/rdsk/c[45]*s0
    or
# find /dev/rdsk/c[45]*s0 | egrep -v "c4t2d0s0|c4t3d0s0"
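Every later step re-runs the same find command, so it helps to see exactly which slices survive the exclusion filter. The sketch below applies the step 5 filter to a hypothetical list of slice names held in a string instead of /dev/rdsk. It uses grep -E -v, which behaves like the egrep -v shown above.

```shell
# Hypothetical stand-in for the output of: find /dev/rdsk/c[45]*s0
ALL_SLICES='/dev/rdsk/c4t0d0s0
/dev/rdsk/c4t1d0s0
/dev/rdsk/c4t2d0s0
/dev/rdsk/c4t3d0s0
/dev/rdsk/c5t0d0s0'

# Exclude the disks that must not join the pool (same pattern as step 5).
POOL_SLICES=$(echo "$ALL_SLICES" | grep -E -v "c4t2d0s0|c4t3d0s0")

echo "$POOL_SLICES"
echo "slices in pool: $(echo "$POOL_SLICES" | grep -c .)"
```

Saving the filtered list to a file once and feeding that file to each later step is one way to guarantee identical input and ordering everywhere. That is a suggestion, not part of the original procedure.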

6). Re-use the corrected find command from above as the driver to rewrite the VTOC of all of those volumes, splitting each into slice 0 for ZFS and slice 1 for the SVM bitmap space.

# find /dev/rdsk/c[45]*s0 | xargs -n1 fmthard -d 0:4:0:34:$ZFS_SIZE
# find /dev/rdsk/c[45]*s0 | xargs -n1 fmthard -d 1:4:0:$SVM_OFFS:$SVM_SIZE
# find /dev/rdsk/c[45]*s0 | xargs -n1 prtvtoc |egrep "^       [01]|partition map"
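The prtvtoc check in step 6 is easy to eyeball for one disk but tedious for 23. The following is a minimal sketch that assumes prtvtoc's usual partition-line columns (partition, tag, flags, first sector, sector count, last sector). The hypothetical check_layout function confirms that slice 1 begins on the sector right after slice 0 ends and that its last sector matches its count. The sample lines are illustrative values computed for the step 2 disk with a 2208-block slice 1, not captured output.

```shell
# Check that slice 1 starts on the sector right after slice 0 ends.
# Input: prtvtoc-style partition lines on stdin.
check_layout() {
    awk '
        $1 == 0 { s0_last = $6 }
        $1 == 1 { s1_first = $4; s1_count = $5; s1_last = $6 }
        END {
            if (s0_last + 1 == s1_first && s1_first + s1_count - 1 == s1_last)
                print "OK"
            else
                print "MISMATCH"
        }'
}

# Illustrative lines (not captured output) for the sample disk:
# slice 0 = sectors 34..285194046, slice 1 = 285194047..285196254.
SAMPLE='       0      4    00         34  285194013  285194046
       1      4    00  285194047       2208  285196254'

echo "$SAMPLE" | check_layout
```

On a live system it might be driven as `prtvtoc /dev/rdsk/c4t2d0s0 | check_layout`. Comment lines beginning with `*` never match partition 0 or 1, so they can stay in the input.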

7). Re-use the corrected find command from above, additionally selecting only the even-numbered disks (by target number), and place slice 1 of all selected disks into the SVM metadevice d101.

# find /dev/rdsk/c[45]t[02468]*s1 | xargs -I {} echo 1 {} | xargs metainit d101 `find /dev/rdsk/c[45]t[02468]*s1 | wc -l`

8). Re-use the corrected find command from above, additionally selecting only the odd-numbered disks (by target number), and place slice 1 of all selected disks into the SVM metadevice d102.

# find /dev/rdsk/c[45]t[13579]*s1 | xargs -I {} echo 1 {} | xargs metainit d102 `find /dev/rdsk/c[45]t[13579]*s1 | wc -l`
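The xargs pipelines in steps 7 and 8 are compact but hard to read. The dry-run sketch below prints the metainit command those pipelines are meant to build, so it can be reviewed before anything is created. The function name build_submirror and the slice names are hypothetical. The sketch picks even or odd disks by the target digit after "t". In names like c4t2d0s1, the digit just before "s1" is the d (LUN) number, which is 0 for every disk, so it cannot be used to split them.

```shell
# Print (do not run) the metainit command for one SVM submirror: a concat
# of one-slice stripes, "metainit <md> <count> 1 <slice> 1 <slice> ...".
# Slice paths arrive on stdin; $2 is a grep pattern picking the targets.
build_submirror() {
    md=$1
    slices=$(grep "$2")
    count=$(echo "$slices" | grep -c .)
    printf 'metainit %s %s' "$md" "$count"
    for s in $slices; do
        printf ' 1 %s' "$s"
    done
    printf '\n'
}

# Hypothetical slice names; the digit after "t" is the target number.
SLICES='/dev/rdsk/c4t0d0s1
/dev/rdsk/c4t1d0s1
/dev/rdsk/c4t4d0s1
/dev/rdsk/c5t3d0s1'

echo "$SLICES" | build_submirror d101 't[02468]d'   # even targets
echo "$SLICES" | build_submirror d102 't[13579]d'   # odd targets
```

The sketch assumes single-digit target numbers, as in the cXtYd0 names used throughout this walkthrough.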

9). Now mirror metadevices d101 and d102 into mirror d100, ignoring the WARNING that both sides of the mirror may not be the same. When the bitmap volumes are created, they will be initialized, at which time both sides of the mirror will be equal.

# metainit d100 -m d101 d102

10). Now, from the mirrored SVM storage pool, allocate a bitmap volume out of an SVM soft partition for each SNDR replica.

# OFFSET=1
# for n in `find /dev/rdsk/c[45]*s1 | grep -n s1 | cut -d ':' -f1 | xargs`
do
    metainit d$n -p /dev/md/rdsk/d100 -o $OFFSET -b $BMP_SIZE
    OFFSET=$(((OFFSET + BMP_SIZE + 1)))
done
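Before running step 10, it is worth confirming that all of the bitmap soft partitions fit inside mirror d100, which is only as large as its smaller submirror. The dry-run sketch below repeats the loop's offset arithmetic and prints the metainit commands instead of running them. It includes the one-block gap the loop leaves before each extent, presumably room for the soft partition's extent header. The name plan_softparts is hypothetical. The sketch assumes every slice 1 is exactly the per-disk size passed in and that the disks split as evenly as possible between d101 and d102.

```shell
# Dry run of the step 10 loop. Arguments: number of replicated disks,
# bitmap size (blocks), and slice 1 size per disk (blocks).
plan_softparts() {
    ndisks=$1; bmp=$2; svm=$3
    # The mirror is as large as its smaller submirror, which holds
    # floor(ndisks / 2) slices when the disks are split even/odd.
    mirror=$(( (ndisks / 2) * svm ))
    offset=1; end=0; n=1
    while [ "$n" -le "$ndisks" ]; do
        echo "metainit d$n -p /dev/md/rdsk/d100 -o $offset -b $bmp"
        end=$((offset + bmp - 1))          # last block of this extent
        offset=$((offset + bmp + 1))       # leave one block before the next
        n=$((n + 1))
    done
    if [ "$end" -lt "$mirror" ]; then echo "fits"; else echo "too big"; fi
}

plan_softparts 4 1089 2208
```

With four disks, the last extent ends at block 4359 of a 4416-block mirror, so the sketch prints `fits`. Under the same assumptions, 23 disks leave the smaller submirror with 11 slices (24288 blocks) while the last extent ends at block 25069, so the sketch prints `too big`. With an odd disk count, slice 1 may therefore need to be sized with the smaller submirror in mind.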

11). Repeat steps 1 - 10 on the SNDR remote system (NODE-B)

12). Enable the SNDR sets on NODE-A, one per ZFS disk, all in the I/O group zfs-pool

# DISK=1
# for ZFS_DISK in `find /dev/rdsk/c[45]*s0`
do
    sndradm -nE NODE-A $ZFS_DISK /dev/md/rdsk/d$DISK NODE-B $ZFS_DISK /dev/md/rdsk/d$DISK ip sync g zfs-pool
    DISK=$(((DISK + 1)))
done
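The step 12 loop pairs the n-th data slice with the n-th bitmap soft partition. The slice list must therefore come out in the same order as in step 10, and in the same order on both nodes. The dry-run sketch below prints the sndradm lines for review instead of running them. The function gen_sndr and the host names node-a and node-b are hypothetical.

```shell
# Print (do not run) one "sndradm -nE" line per data slice read from stdin,
# pairing the n-th slice with bitmap soft partition /dev/md/rdsk/d<n>.
gen_sndr() {
    primary=$1; secondary=$2; group=$3
    n=1
    while read -r slice; do
        [ -n "$slice" ] || continue     # skip blank lines
        echo "sndradm -nE $primary $slice /dev/md/rdsk/d$n" \
             "$secondary $slice /dev/md/rdsk/d$n ip sync g $group"
        n=$((n + 1))
    done
}

printf '%s\n' /dev/rdsk/c4t0d0s0 /dev/rdsk/c4t1d0s0 |
    gen_sndr node-a node-b zfs-pool
```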

13).  Repeat step 12 on NODE-B

14). Create the ZFS storage pool on NODE-A, the SNDR primary

# find /dev/rdsk/c[45]*s0 | xargs zpool create zfs-pool        

15). Enable SNDR replication, and take a look at what you have done!


# sndradm -g zfs-pool -nu
# sndradm -g zfs-pool -P
# metastat -P
# zpool status zfs-pool


The face of Sun StorageTek Availability Suite has changed quite a bit since June '06, when AVS 4.0 was released supporting Solaris 10 on SPARC and x64/x86 platforms. In February '07, Availability Suite became an OpenSolaris project. Now, in June '07, Availability Suite reaches yet another milestone as a product offering in the Try Sun Products Free for 60 Days program.

The Sun StorageTek Availability Suite 60-day trial provides a pair of Solaris host-based data services, supporting all Solaris file systems, most Solaris databases, plus Sun and third-party applications. Availability Suite works with and across all Solaris volume managers (SVM, ZFS zvols) and any block storage device, whether direct-attached, Fibre Channel, or iSCSI. It does so independent of the underlying level of redundancy, the physical device type, or the storage array.

Sun StorageTek Availability Suite Point-in-Time Copy software creates an instantly accessible (for both reading and writing) independent, dependent, or compact dependent copy (no clones needed) of ANY volume on ANY storage. Creation can be 1-to-1 or 1-to-many, and the many can be any of the supported copy types. The resulting shadow volumes can be used on the local host. If an independent copy was created on dual-host or SAN-accessible storage, it can also be used, for both reading and writing, on any other similarly connected host, without losing the ability to fast-resynchronize and so avoid a full copy later on.

Sun StorageTek Availability Suite Remote Mirror Copy software enables real-time synchronous or asynchronous data replication to local campus, metro, or remote data centers. Remote Mirror copies can be 1-to-1, 1-to-many, or multi-hop, as in A-to-B, then B-to-C. As part of its disaster recovery features, it supports both role reversal (primary and secondary nodes swap roles) and on-demand reverse synchronization. With reverse synchronization, the primary volume can be accessed for reading and writing immediately after a secondary-to-primary copy is invoked, with unreplicated blocks fetched on demand.

The Sun StorageTek Availability Suite Point-in-Time Copy and Remote Mirror Copy software are fully integrated, providing features such as point-in-time vs. real-time replication and data migration between Solaris 8, 9, 10 and OpenSolaris platforms. Availability Suite will also become part of Solaris 11 (Nevada) before the end of Q4FY07 (this month).

Sun StorageTek Availability Suite is fully integrated with Solaris Cluster, Solaris Cluster Geographic Edition and Netra High Availability Suite, as a key component in bringing high availability to these product offerings.