Friday Sep 26, 2008

I came across Mike La Spina's Ubiquitous Talk blog a while back and was stoked that people are taking advantage of the recent changes in OpenSolaris that support the software iSCSI initiator in VMware ESX. For a while we had compatibility issues, but those have been addressed and the code changes put back, so there are some cool things we can do now. I've seen Mike post about this in the VMware VMTN forums as well, which is great.

I wanted to take things a bit further in this post and show how you can use the snapshot and cloning features in ZFS to easily provision virtual machines in ESX. I'm using snv_98 for this, running on an X4500. Instead of creating a large iSCSI lun with vmfs3 on it to hold multiple VMs, I'm going to create a small lun and use it as an RDM (Raw Device Mapping) lun. This lets me take snapshots and clones at the virtual machine level, rather than at some coarser granularity.


So let's start by creating a simple 10G ZVOL (using -s for sparse) and turning it into an iSCSI target.

# zfs create -s -V 10g pool0/vmware/iscsi/lun0

Let's just take a look at how much space that uses. 34k! The power of the sparse!

# zfs list pool0/vmware/iscsi/lun0
NAME                      USED  AVAIL  REFER  MOUNTPOINT
pool0/vmware/iscsi/lun0  34.1K  13.0T  34.1K  -


As Mike also showed in his blog, I'm going to skip the quicker "shareiscsi=" property in ZFS and set up my iSCSI target manually. When I get to the clones, I'll use the same method to set the lun numbers.

# iscsitadm create target -b /dev/zvol/rdsk/pool0/vmware/iscsi/lun0 zvolt

I also want to create an ACL mapping to my software initiator.

# iscsitadm list initiator
Initiator: isv6220c
    iSCSI Name: iqn.1998-01.com.vmware:isv-6220c-5e4d4229
    CHAP Name: Not set

# iscsitadm modify target -l isv6220c zvolt

# iscsitadm list target -v
Target: zvolt
    iSCSI Name: iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt
    Connections: 0
    ACL list:
        Initiator: isv6220c
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size:   10G
            Backing store: /dev/zvol/rdsk/pool0/vmware/iscsi/lun0
            Status: online
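If you provision more than one of these, the three commands above (sparse ZVOL, manual target, ACL) can be wrapped in a small bash script. This is a minimal sketch, not the exact procedure from this post: the `run` helper and the `DRY_RUN` switch are added assumptions, so by default it only prints the commands for review. It also assumes the initiator entry (isv6220c here) already exists, as it does in the `iscsitadm list initiator` output above.

```shell
#!/bin/bash
# Sketch: create a sparse ZVOL and export it as lun 0 of a manually created
# iSCSI target, restricted to one initiator through the target's ACL.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_target DATASET SIZE TARGET INITIATOR
#   INITIATOR is the local name shown by "iscsitadm list initiator".
setup_target() {
    local dataset="$1" size="$2" target="$3" initiator="$4"
    run zfs create -s -V "$size" "$dataset"                           # sparse ZVOL
    run iscsitadm create target -b "/dev/zvol/rdsk/$dataset" "$target" # lun 0
    run iscsitadm modify target -l "$initiator" "$target"              # ACL entry
}

setup_target pool0/vmware/iscsi/lun0 10g zvolt isv6220c
```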

After rescanning my ESX server, here's what I see. There are two paths because I have two network ports on my X4500, and ESX sets up session multipathing.

Disk vmhba32:3:0 /dev/sdb (10239MB) has 2 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:3:0 On active preferred
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:4:0 On

Now it's time to create the VM in the VI3 client.

Right-Click on your ESX server -> New Virtual Machine. Choose "Custom"

Walk through the rest of the steps as you normally would and, because you chose "Custom", you will be able to choose your Virtual Disk method. Choose "Raw Device Mapping" and select the lun that is being presented from your OpenSolaris iSCSI target.

Follow your normal OS installation methods from here on out. I used Windows 2003 Enterprise x64 for a quick installation.

Now let's take a look at the filesystem usage:

# zfs list pool0/vmware/iscsi/lun0
NAME                      USED  AVAIL  REFER  MOUNTPOINT
pool0/vmware/iscsi/lun0  2.09G  13.0T  2.09G  -


So my 10G zvol is still only using about 2G of space. I'll make this my "golden" image. This is where I'd install VMware Tools, any needed patches, any custom apps, whatever. Once the image is "just right", I can move on to snapshotting and cloning.

Shut down the Virtual Machine in VI3.

Make a snapshot

# zfs snapshot pool0/vmware/iscsi/lun0@w2k3-golden

Initially, the snapshot takes no additional space.

pool0/vmware/iscsi/lun0              2.09G  13.0T  2.09G  -
pool0/vmware/iscsi/lun0@w2k3-golden      0      -  2.09G  -


Next we clone it to make a new read/write copy.

# zfs clone pool0/vmware/iscsi/lun0@w2k3-golden pool0/vmware/iscsi/clone


Now add it as a new target.

Note the -u flag in the command below: here I'm creating multiple luns under the same target (zvolt). The "shareiscsi" property only creates new targets, each with a single lun0. That isn't a problem for a few targets, but VMware ESX has a 64-target limit, and it counts each path to a multipathed target as a separate target, so you could run out of targets. That's why I like creating multiple luns under a single target.

# iscsitadm create target -u 1 -b /dev/zvol/rdsk/pool0/vmware/iscsi/clone zvolt

# iscsitadm list target -v
Target: zvolt
    iSCSI Name: iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt
    Connections: 2
        Initiator:
            iSCSI Name: iqn.1998-01.com.vmware:isv-6220c-5e4d4229
            Alias: isv-6220c.central.sun.com
        Initiator:
            iSCSI Name: iqn.1998-01.com.vmware:isv-6220c-5e4d4229
            Alias: isv-6220c.central.sun.com
    ACL list:
        Initiator: isv6220c
    TPGT list:
    LUN information:
        LUN: 0
            GUID: 600144f048dd3e090000144f2103c800
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size:   10G
            Backing store: /dev/zvol/rdsk/pool0/vmware/iscsi/lun0
            Status: online
        LUN: 1
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size:   10G
            Backing store: /dev/zvol/rdsk/pool0/vmware/iscsi/clone
            Status: online
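To make the target budget concrete, here's the arithmetic as a couple of bash functions, using the figures quoted above (a 64-target limit, with each path counted as a separate target). Treat the limit as an assumption to verify for your ESX version; the functions also ignore any per-target lun limit.

```shell
#!/bin/bash
# Target-budget arithmetic for the ESX software iSCSI initiator, using the
# figures quoted in this post: a 64-target limit, with each path to a
# multipathed target counted as a separate target.
ESX_TARGET_LIMIT=64

# targets_used MODE LUNS PATHS
#   per-lun : one target per lun, the way shareiscsi= creates them
#   shared  : every lun behind one target (iscsitadm create target -u N);
#             the lun count does not matter here (per-target limits ignored)
targets_used() {
    local mode="$1" luns="$2" paths="$3"
    case "$mode" in
        per-lun) echo $(( luns * paths )) ;;
        shared)  echo "$paths" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Most single-lun targets (one VM each) that fit under the limit.
max_per_lun_targets() {
    local paths="$1"
    echo $(( ESX_TARGET_LIMIT / paths ))
}

echo "per-lun, 3 VMs, 2 paths: $(targets_used per-lun 3 2) targets"
echo "shared,  3 VMs, 2 paths: $(targets_used shared 3 2) targets"
echo "per-lun ceiling with 2 paths: $(max_per_lun_targets 2) VMs"
```

With two paths per target, the one-target-per-lun approach tops out at 32 VMs, while luns under a single target keep using just two target slots.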



And here we can see 1 original lun, 1 snapshot and 1 clone, all together taking only about 2.1G of space.

pool0/vmware/iscsi/clone             55.5K  13.0T  2.09G  -
pool0/vmware/iscsi/lun0              2.09G  13.0T  2.09G  -
pool0/vmware/iscsi/lun0@w2k3-golden      0      -  2.09G  -


Rescan for the new lun in ESX. It shows up as vmhba32:3:1/vmhba32:4:1, indicating it is seen as a lun under the same target as before.


Disk vmhba32:3:0 /dev/sdb (10239MB) has 2 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:3:0 On active preferred
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:4:0 On

Disk vmhba32:3:1 /dev/sdc (10239MB) has 2 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:3:1 On active preferred
 iScsi sw iqn.1998-01.com.vmware:isv-6220c-5e4d4229<->iqn.1986-03.com.sun:02:941a05d7-e46d-4986-df97-f6b371e94281.zvolt vmhba32:4:1 On

Now we can follow the same method for virtual machine creation as above, only this time using the new lun for the Raw Device. It will boot up as a true clone, so you will probably need to change the hostname and network settings to avoid conflicts.
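Once the golden snapshot exists, the clone-and-present steps repeat for every new VM, so they're easy to script. A minimal sketch, assuming the target (zvolt) and its ACL are already set up and that the lun numbers you pass aren't already in use; the `vm` dataset prefix is hypothetical. It only prints the commands unless DRY_RUN=0, and you still need to rescan in ESX and create each VM against its new RDM lun.

```shell
#!/bin/bash
# Sketch: clone a golden snapshot COUNT times and present each clone as the
# next lun under an existing iSCSI target. DRY_RUN=1 (default) only prints.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# provision_clones SNAPSHOT PARENT PREFIX TARGET FIRST_LUN COUNT
provision_clones() {
    local snap="$1" parent="$2" prefix="$3" target="$4" first="$5" count="$6"
    local i lun ds
    for (( i = 0; i < count; i++ )); do
        lun=$(( first + i ))
        ds="$parent/$prefix$lun"        # e.g. pool0/vmware/iscsi/vm2
        run zfs clone "$snap" "$ds"     # writable clone sharing the snapshot's blocks
        run iscsitadm create target -u "$lun" -b "/dev/zvol/rdsk/$ds" "$target"
    done
}

# luns 0 and 1 are already in use on zvolt in this post, so start at 2.
provision_clones pool0/vmware/iscsi/lun0@w2k3-golden pool0/vmware/iscsi vm zvolt 2 2
```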

I started playing around with the clone to see how it affected space. I booted it up and installed the recommended patches from Windows Update. You can see that it started using more space, because it has to track its own changed blocks. The snapshot still hasn't diverged from the master "lun0".

pool0/vmware/iscsi/clone              849M  13.0T  2.78G  -
pool0/vmware/iscsi/lun0              2.09G  13.0T  2.09G  -
pool0/vmware/iscsi/lun0@w2k3-golden   0      -  2.09G  -


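Next I booted the VM on lun0 and patched it. Here we can see the snapshot's USED column grow. ZFS is copy-on-write, so nothing is actually copied: when the VM overwrites blocks on lun0, the new data goes to new blocks, and the original blocks are kept because the snapshot still references them. That retained space is what gets charged to the snapshot. lun0 itself grows as we patch the VM on top of it.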

pool0/vmware/iscsi/clone              850M  13.0T  2.78G  -
pool0/vmware/iscsi/lun0              2.93G  13.0T  2.79G  -
pool0/vmware/iscsi/lun0@w2k3-golden   144M      -  2.09G  -


Just for fun, I created one more clone, added it to the iSCSI target, and created a third VM using the methods above.

After booting it up, you can see that three virtual machines, each thinking it has a 10G boot disk with a fully patched OS, are using only 3.79G of space in the pool0/vmware/iscsi filesystem. That's quite a bit of space savings compared to giving each VM its own discrete block of storage.

pool0/vmware/iscsi                   3.79G  13.0T  38.4K  /pool0/vmware/iscsi
pool0/vmware/iscsi/clone              851M  13.0T  2.78G  -
pool0/vmware/iscsi/clone2            20.6M  13.0T  2.09G  -
pool0/vmware/iscsi/lun0              2.94G  13.0T  2.79G  -
pool0/vmware/iscsi/lun0@w2k3-golden   152M      -  2.09G
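To put a number on the savings, here's a quick calculation using the figures from the listing above: three VMs that each see a 10G disk, and 3.79G used by pool0/vmware/iscsi. It compares against fully allocated 10G disks, which is an assumption; separate sparse ZVOLs per VM would use less than that, so the real saving versus that alternative is smaller.

```shell
#!/bin/bash
# Back-of-the-envelope space comparison: VMS disks of DISK_GB each, fully
# allocated, versus the USED_GB that zfs list reports for the parent dataset.
# Units are whatever zfs list prints (its "G" is binary gigabytes).
space_report() {
    local vms="$1" disk_gb="$2" used_gb="$3"
    awk -v n="$vms" -v d="$disk_gb" -v u="$used_gb" 'BEGIN {
        full = n * d
        printf "provisioned=%.2fG used=%.2fG saved=%.2fG (%.1f%% of provisioned)\n",
            full, u, full - u, 100 * u / full
    }'
}

space_report 3 10 3.79
```

For the numbers above this prints `provisioned=30.00G used=3.79G saved=26.21G (12.6% of provisioned)`.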

Friday Jun 13, 2008

One of my jobs here at Sun is to get various devices, arrays, HBAs on the VMware Hardware Compatibility List. This gives everyone the warm fuzzies that our products are tested and compatible with the ever popular VMware ESX.

In the early days, I did the FibreChannel arrays (Sun's 6x00, 2x00 & 3x00 lines) as well as the NAS arrays (our 5x00 products). After a re-org or two, I now mainly focus on the NAS line as well as what we could call our Storage Servers, which brings me to the X4500.

A post or two ago, I gave some pointers on how to set up the X4500 as a big disk cache for backups (NetBackup, EBS/Legato). That would be a somewhat traditional use of the X4500: it's a Solaris server, it's got a lot of disk (48 SATA drives from 500GB to 1TB in size), and ZFS makes volume management a snap.

Another way to think of the X4500 is as a storage target. It has network ports, it can talk NFS, iSCSI, Samba. So why not make it look like a BIG NAS array and see what happens!

Now, let's bring VMware ESX into the mix. Everyone knows that you can connect FibreChannel storage to ESX and create your vmfs3 filesystems/datastores. Most people also know that you can use iSCSI luns as well. What some people aren't aware of is that NFSv3-mounted filesystems can also be used as datastores.

VMware Configuration

In order to connect NFSv3 mountpoints to the vmkernel, the vmkernel needs its own IP. This is set up in the Configuration -> Network screen of your VI3 client application. It's a best practice to put this "data network" vmkernel port on a separate interface from your main Service Console and Virtual Machine Network.

You might also want to bump the maximum number of NFS mountpoints from the default of 8 up to 32. This is done in the Advanced Settings -> NFS section.
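The same VMkernel setup and NFS tweak can also be scripted from the service console. This sketch assumes the ESX 3.x tools esxcfg-vswitch, esxcfg-vmknic and esxcfg-advcfg; the vSwitch name, uplink NIC, port group name and IP address are hypothetical, so substitute your own. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/bash
# Sketch (ESX 3.x service console assumed): dedicated vSwitch and VMkernel
# port for NFS traffic, plus a higher NFS mount limit.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# configure_nfs_vmkernel VSWITCH UPLINK PORTGROUP IP NETMASK MAX_NFS
configure_nfs_vmkernel() {
    local vsw="$1" nic="$2" pg="$3" ip="$4" mask="$5" maxnfs="$6"
    run esxcfg-vswitch -a "$vsw"                    # new vSwitch for the data network
    run esxcfg-vswitch -L "$nic" "$vsw"             # attach a dedicated physical NIC
    run esxcfg-vswitch -A "$pg" "$vsw"              # port group for the VMkernel port
    run esxcfg-vmknic -a -i "$ip" -n "$mask" "$pg"  # VMkernel IP used for NFS
    run esxcfg-advcfg -s "$maxnfs" /NFS/MaxVolumes  # raise NFS mount limit (default 8)
}

configure_nfs_vmkernel vSwitch1 vmnic1 NFS 192.168.50.10 255.255.255.0 32
```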

X4500 Configuration


To start with, I've been testing the X4500 with Solaris 10u4 and Solaris 10u5 installed. I usually jumpstart with the SUNWCall group (everything except OEM). Nothing special otherwise. Most people find it prudent to mirror the boot drives using Solaris Volume Manager. Check out the installation guide for those procedures. This will also work using any of the OpenSolaris flavors.

Configuring the X4500 is a matter of setting up the zpool and ZFS filesystems and sharing them out via NFS. You can choose a mirrored config or RAIDZ/RAIDZ2 depending on your specific needs around capacity, performance and RAS. For a whitepaper I'm working on, I chose a mirrored configuration of 22 mirrored pairs and 2 hot spares, but your needs and configuration may vary.

I chose mirroring because I thought the random nature of VMware I/O patterns would make it the obvious choice. In further testing, however, I'm seeing very little difference in IOPS and response time between any of the configs. My guess is that the I/Os, having to go through the various layers and the NFS protocol, are cached and abstracted enough to negate any performance advantage by the time they hit the spinning disk. Given that, I'm leaning toward a RAIDZ2 recommendation to maximize capacity and RAS. I hope to show some numbers in my whitepaper soon.
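For a 22-mirror-pair layout, typing out 44 disk names by hand is error prone, so here's a small helper that builds the `zpool create` command from a disk list. It's a sketch, not the configuration from the whitepaper: it pairs consecutive disks into mirrors and uses the last N disks as hot spares, and the c#t#d0 names in the example are hypothetical. Order the list so each pair spans two controllers, and leave out the boot disks; passing the 46 non-boot disks with 2 spares gives 22 mirrored pairs. For RAIDZ2 you would group disks under the `raidz2` keyword instead of `mirror`.

```shell
#!/bin/bash
# build_mirror_pool POOL NSPARES DISK...
# Pairs consecutive disks into mirrors and uses the last NSPARES disks as
# hot spares. Prints the zpool command; it does not run it.
build_mirror_pool() {
    local pool="$1" nspares="$2"
    shift 2
    local disks=("$@")
    local ndata=$(( ${#disks[@]} - nspares ))
    if [ "$ndata" -lt 2 ] || [ $(( ndata % 2 )) -ne 0 ]; then
        echo "need an even number (>= 2) of data disks" >&2
        return 1
    fi
    local cmd="zpool create $pool" i
    for (( i = 0; i < ndata; i += 2 )); do
        cmd+=" mirror ${disks[i]} ${disks[i+1]}"
    done
    if [ "$nspares" -gt 0 ]; then
        cmd+=" spare ${disks[*]:ndata}"   # everything after the data disks
    fi
    echo "$cmd"
}

build_mirror_pool nfspool 2 c0t1d0 c1t1d0 c0t2d0 c1t2d0 c2t1d0 c3t1d0
```

The example prints `zpool create nfspool mirror c0t1d0 c1t1d0 mirror c0t2d0 c1t2d0 spare c2t1d0 c3t1d0`.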

Once you've settled on a zpool configuration, you can create a few ZFS mountpoints. That's fairly straightforward.

# zfs create -o sharenfs=anon=0 nfspool/nfs1
# zfs set mountpoint=/nfs1 nfspool/nfs1

I use anon=0 because the vmkernel writes to the NFS filesystem as root. If anon=0 raises security concerns for you, ESX has an experimental "Virtual Machine Delegate" feature under the Security Profile screen in the GUI that lets you change the user that reads and writes the VM files to something else.
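If you want several datastores, the two commands above can be collapsed into one `zfs create` per filesystem by setting the mountpoint at creation time with a second `-o`. A minimal sketch; the count and the nfsN naming are illustrative. Without `rw=` or `root=` host lists the share is open to any NFS client, so you may want to restrict it to your ESX hosts.

```shell
#!/bin/bash
# Sketch: create COUNT NFS-shared filesystems nfs1..nfsN in POOL, each
# mounted at /nfsN. DRY_RUN=1 (default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_nfs_shares POOL COUNT
create_nfs_shares() {
    local pool="$1" count="$2" i
    for (( i = 1; i <= count; i++ )); do
        # anon=0 maps unauthenticated/root access to uid 0, as the vmkernel expects
        run zfs create -o sharenfs=anon=0 -o mountpoint="/nfs$i" "$pool/nfs$i"
    done
}

create_nfs_shares nfspool 3
```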

Once that's done, it's a simple matter of adding the mountpoint to ESX via Configuration -> Storage -> Add Storage -> Network Filesystem. Give ESX the X4500 name or IP, the filesystem being shared (in this case /nfs1) and give your datastore a name.
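If you'd rather not click through the VI client, ESX 3.x also has a service console command for this, esxcfg-nas. This sketch assumes that tool is available in your ESX version; the server name x4500 and the datastore label are hypothetical. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/bash
# Sketch (ESX 3.x service console assumed): mount an NFS export as a datastore.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_nfs_datastore NFS_SERVER EXPORT_PATH DATASTORE_LABEL
add_nfs_datastore() {
    local server="$1" share="$2" label="$3"
    run esxcfg-nas -a -o "$server" -s "$share" "$label"  # add the NAS datastore
    run esxcfg-nas -l                                     # list NAS datastores to confirm
}

add_nfs_datastore x4500 /nfs1 x4500-nfs1
```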

That's it. You are now using the X4500 as a big NAS server for VMware ESX. There are some very interesting uses for this configuration. VMware's Lab Manager, for one: it uses NFS and really wants something big and low cost, which fits the X4500 to a tee. Or maybe you need a large system to store test VMs. Perhaps you need a lab environment that doesn't need screaming performance, but really wants to keep space and costs low.

Since you are using ZFS on the X4500, you get all of the fancy features like snapshots, cloning and checksums. Snapshots are reachable through the hidden .zfs directory under your mountpoint, so it's a simple matter of logging onto the X4500, cd'ing to the directory you want and copying any files from your snapshot right back over. Cloned filesystems can be added by cloning a snapshot, sharing the new clone via the "sharenfs" property and mounting it on ESX just like a normal filesystem. Add the cloned VMs in that new filesystem to the Inventory and you are good to go.
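The snapshot workflow above, sketched as commands on the X4500. The dataset names, snapshot name and VM file path are hypothetical; power the VM off before copying its files back. The clone gets its `mountpoint` and `sharenfs` set explicitly because a clone inherits properties from its parent dataset in the hierarchy, not from the filesystem it was cloned from. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/bash
# Sketch: snapshot an NFS datastore, restore one file from the snapshot,
# and publish a clone of the snapshot as a new NFS share.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_fs DATASET SNAPNAME
snapshot_fs() {
    run zfs snapshot "$1@$2"
}

# restore_file MOUNTPOINT SNAPNAME RELATIVE_PATH
# Copies the file back from the read-only .zfs/snapshot view.
restore_file() {
    local mnt="$1" snap="$2" rel="$3"
    run cp -p "$mnt/.zfs/snapshot/$snap/$rel" "$mnt/$rel"
}

# clone_and_share SNAPSHOT NEW_DATASET MOUNTPOINT
clone_and_share() {
    local snap="$1" ds="$2" mnt="$3"
    run zfs clone "$snap" "$ds"
    run zfs set mountpoint="$mnt" "$ds"
    run zfs set sharenfs=anon=0 "$ds"
}

snapshot_fs nfspool/nfs1 before-patch
restore_file /nfs1 before-patch testvm/testvm.vmx
clone_and_share nfspool/nfs1@before-patch nfspool/nfs1-clone /nfs1-clone
```

After that, mount /nfs1-clone on ESX as a new datastore and add the cloned VMs to the Inventory.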

There are a couple of things you need to keep in mind, however. One, as with all NFS implementations with ESX, you cannot do Raw Device Mapping. Another is that the X4500 itself could represent a single point of failure: the motherboard is not a redundant component and, because the storage is internal, the box can't be clustered. There are multiple network ports and redundant power supplies, but if the motherboard dies, the box dies. Again, depending on your usage, this may or may not be a problem.

Hopefully this information will be enough to get the curious started. Stay tuned for my upcoming whitepaper.

In a future post, I hope to talk about using the X4500 as an iSCSI target array. Yes... we are fixing the iSCSI compatibility problem between ESX's initiator and our Solaris iSCSI target.

*edit* Added a clarifying paragraph on the Solaris versions used.

Saturday May 17, 2008

As I mentioned a post or two ago, I also test our various storage products with VMware ESX. One of the things I wanted to tinker with was Project COMSTAR and how, even at this early stage of development, it looked with VMware ESX.

COMSTAR stands for Common Multiprotocol SCSI Target and while eventually it will encompass iSCSI, SAS and FibreChannel target modes, for this testing I was going to focus on the FC target.

I first grabbed a SunBlade x8420 server running build 87 of OpenSolaris. This server had a dual-port FC Express Module (our QLogic version; the current COMSTAR bits need a QLogic chipset), and I was going to use one port in target mode and the other in traditional initiator mode to connect to an FC array in my lab. This is not how you would do things in the real world: if you already had an FC array, why not just connect to it directly? But this is just a test. I would expect a normal installation to take something like a cheap JBOD, connected in some manner, and serve it out via COMSTAR. I just needed some disk space and the FC array was handy. I re-zoned my switch to put one HBA port into a zone with the FC array, to grab a 500GB lun, and the other HBA port into a zone with my ESX server, to act as the target port.

So, once the hardware was cabled up right, I followed the installation instructions on the COMSTAR project page. They're pretty straightforward, so I'll just point to that page. When I was finished, I ended up with my desired HBA port bound to the qlt driver and in target mode, as shown below:

# mdb -k
Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs mpt ip hook neti sctp arp usba qlc fctl nca stmf lofs emlxs fcp md cpc random crypto zfs smbsrv nfs fcip logindmux ptm nsctl sdbc sv ii sppp nsmb rdc ]
> ::devbindings -q qlt
ffffff02d47726e0 /pci@1,0/pci10de,5d@d/pci111d,8018@0/pci111d,8018@2/pci1077,14b@0, instance #0 (driver name: qlt)

# fcinfo hba-port

...

HBA Port WWN: 2100001b320a61b4
Port Mode: Target
Port ID: 10100
OS Device Name: Not Applicable
Manufacturer: QLogic Corp.
Model: d59e69e0
Firmware Version: 4.3.1
FCode/BIOS Version: N/A
Type: F-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2000001b320a61b4

Now that I had the driver set up, it was time to actually create the target lun. First I had to create my backing store. I decided to use a ZVOL from a ZFS file system residing on my FC array lun. Just to get it quickly going, I took a single 500GB FC lun and built a single zpool from it. Again, not optimal, but this was just supposed to be a quick test.

Now to build the ZVOL.

# zfs create -V 50g p1/fs/t1

# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
p1        50.0G   438G    18K  /p1
p1/fs     50.0G   438G    18K  /comstar
p1/fs/t1    50G   488G    24K  -


Now to create the COMSTAR lun with my ZVOL as the backing store

# sbadm create-lu -s 50g /dev/zvol/rdsk/p1/fs/t1

Created the following LU:

GUID                              DATA SIZE            SOURCE
--------------------------------  -------------------  ----------------
6000ae40030000000000482ddd0b0001  53687091200          /dev/zvol/rdsk/p1/fs/t1


Next I wanted to add the appropriate mapping/masking to present this lun to my ESX server's HBA ports. I wanted to connect it to two HBA ports, to see if ESX would handle any server side multi-pathing. So I had to set up a HostGroup entry with the WWNs from my ESX server HBAs.

# stmfadm create-host esx2
# stmfadm add-hg-member -g esx2 wwn.210100E08BB0C5C2 wwn.210000E08B90C5C2


Finally, I added the lun to the host I created.

# stmfadm add-view -h esx2 6000ae40030000000000482ddd0b0001


I thought I was done at that stage, so I performed a rescan of the HBA in ESX. But nothing appeared. Strange. I checked the FC switch and noticed that my target port wasn't even logged into the switch.

A quick check with svcs showed that the stmf service was disabled. Oops.

# svcadm enable stmf
# svcs -a | grep stmf
online 13:38:39 svc:/system/device/stmf:default
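Putting the steps in the order that actually works (stmf enabled first), the whole sequence looks roughly like this. It's a sketch that reuses the command names exactly as they appear in the build 87 transcript above; later COMSTAR releases use `sbdadm` and `stmfadm create-hg`, so check the man pages for your build. The LU GUID has to be copied from the create-lu output, so the mapping step takes it as an argument. It only prints the commands unless DRY_RUN=0.

```shell
#!/bin/bash
# Sketch of the COMSTAR FC target setup from this post, in working order.
# Command names follow the build 87 transcript; newer builds may differ.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# comstar_create_lu ZVOL SIZE
comstar_create_lu() {
    local zvol="$1" size="$2"
    run svcadm enable stmf                               # the step missed above
    run zfs create -V "$size" "$zvol"                    # backing store
    run sbadm create-lu -s "$size" "/dev/zvol/rdsk/$zvol"
}

# comstar_map_lu GROUP GUID WWN...
# Groups the initiator WWNs and makes the LU visible to that group only.
comstar_map_lu() {
    local group="$1" guid="$2"
    shift 2
    run stmfadm create-host "$group"
    run stmfadm add-hg-member -g "$group" "$@"
    run stmfadm add-view -h "$group" "$guid"
}

comstar_create_lu p1/fs/t1 50g
# The GUID comes from the create-lu output.
comstar_map_lu esx2 6000ae40030000000000482ddd0b0001 wwn.210100E08BB0C5C2 wwn.210000E08B90C5C2
```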


Now, a rescan picked up the lun. Here's a brief line from the ESX vmkernel log to show what it looks like.

May 16 13:38:17 esx2 vmkernel: 10:21:46:00.382 cpu1:1034)ScsiScan: 395: Path 'vmhba3:C0:T2:L0': Vendor: 'SUN ' Model: 'COMSTAR ' Rev: '1.0 '
May 16 13:38:17 esx2 vmkernel: 10:21:46:00.382 cpu1:1034)ScsiScan: 396: Type: 0x0, ANSI rev: 5
May 16 13:38:17 esx2 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba3:C0:T2:L0 : 0x0 0x83 0x86


ESX set up multipathing across the two HBA ports with the Fixed policy. ESX doesn't really load-balance, though: I/O only goes down the 'active' path.

Disk vmhba3:2:0 /dev/sdh (51200MB) has 2 paths and policy of Fixed
FC 6:1.0 210000e08b90c5c2<->2100001b320a61b4 vmhba3:2:0 On active preferred
FC 6:1.1 210100e08bb0c5c2<->2100001b320a61b4 vmhba4:2:0 On


From there I was able to create a vmfs3 filesystem on the lun and clone a VM into it. I fired up IOzone in my Win2003 virtual machine and let it run. The I/O looked a bit choppy as measured on the Solaris server side, but my guess is that this is a result of the many layers of kernel and caching the I/Os pass through (VM -> ESX file system -> ESX kernel -> Solaris ZFS -> Solaris kernel -> array cache). Like I said, this wouldn't be considered an optimal setup. I would probably swap the FC array for a cheap JBOD connected via SAS. You could also present the COMSTAR lun as a VMware RDM lun to further cut down on caching. Further testing and investigation will be needed to really hash out the best deployment strategies. Stay tuned for that.

So there you have it: VMware ESX connected via the COMSTAR FibreChannel target.

This blog copyright 2008 by rarneson