The X4500+ZFS and VMware ESX 3.5 via NFS
One of my jobs here at Sun is to get various devices (arrays, HBAs) onto the VMware Hardware Compatibility List. This gives everyone the warm fuzzies that our products have been tested and are compatible with the ever-popular VMware ESX.
In the early days, I did the Fibre Channel arrays (Sun's 6x00, 2x00 and 3x00 lines) as well as the NAS arrays (our 5x00 products). After a re-org or two, I now mainly focus on the NAS line as well as what we could call our Storage Servers... which brings me to the X4500.
A post or two ago, I gave some pointers on how to set up the X4500 as a big disk cache for backups (NetBackup, EBS/Legato). That is a fairly traditional use of the X4500: it's a Solaris server with a lot of disk (48 SATA drives, 500GB to 1TB each), and ZFS makes volume management a snap.
Another way to think of the X4500 is as a storage target. It has network ports and can speak NFS, iSCSI and Samba. So why not make it look like a BIG NAS array and see what happens!
Now, let's bring VMware ESX into the mix. Everyone knows that you can connect Fibre Channel storage to ESX and create VMFS3 datastores on it. Most people also know that you can use iSCSI LUNs as well. What some people aren't aware of is that NFSv3 exports can also be used as datastores.
VMware Configuration
In order to mount NFSv3 exports, the vmkernel needs its own IP address. This is set up in the Configuration -> Networking screen of your VI3 client. Best practice is to put this "data network" vmkernel port on a separate physical interface from your Service Console and Virtual Machine Network.
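If you'd rather script this from the ESX 3.5 service console than click through the VI client, something like the following should do it. Treat it as a sketch: the helper names (`run`, `add_nfs_vmkernel`), the DRYRUN switch, and the example vSwitch, uplink NIC, port group and 192.168.10.x addresses are my own placeholders, not anything ESX defines. With DRYRUN=1 it only prints the esxcfg commands it would run.

```shell
#!/bin/sh
# Sketch: give the ESX 3.5 vmkernel its own IP on a dedicated vSwitch for NFS.
# All names/addresses are caller-supplied placeholders; run/DRYRUN are
# hypothetical helpers, not part of ESX.

# Print the command instead of running it when DRYRUN=1.
run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_nfs_vmkernel VSWITCH UPLINK_NIC PORTGROUP IP NETMASK
add_nfs_vmkernel() {
  vswitch=$1 uplink=$2 portgroup=$3 ip=$4 netmask=$5
  run esxcfg-vswitch -a "$vswitch"               # create a vSwitch for the data network
  run esxcfg-vswitch -L "$uplink" "$vswitch"     # link a physical NIC as its uplink
  run esxcfg-vswitch -A "$portgroup" "$vswitch"  # add a port group for the vmkernel port
  run esxcfg-vmknic -a -i "$ip" -n "$netmask" "$portgroup"  # vmkernel IP address
}
```

For example: `add_nfs_vmkernel vSwitch1 vmnic1 "VMkernel NFS" 192.168.10.5 255.255.255.0`. If the X4500 lives on a different subnet you would also need a vmkernel default gateway (esxcfg-route), but a flat, isolated storage network keeps things simpler.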
Another tweak you might want to make is to raise the maximum number of NFS mounts (NFS.MaxVolumes) from the default of 8 to 32. This is done under Configuration -> Advanced Settings -> NFS.
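The same setting can be changed from the service console with esxcfg-advcfg, where it appears as /NFS/MaxVolumes. A minimal sketch, again with a hypothetical `run` helper that prints instead of executing when DRYRUN=1; `set_nfs_max_volumes` is likewise my own name:

```shell
#!/bin/sh
# Sketch: raise the NFS datastore limit (Advanced Settings -> NFS ->
# NFS.MaxVolumes) from the ESX 3.5 service console.
# run/DRYRUN and set_nfs_max_volumes are hypothetical helpers, not part of ESX.

# Print the command instead of running it when DRYRUN=1.
run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: set_nfs_max_volumes [COUNT]   (defaults to 32)
set_nfs_max_volumes() {
  run esxcfg-advcfg -s "${1:-32}" /NFS/MaxVolumes  # set the new limit
  run esxcfg-advcfg -g /NFS/MaxVolumes             # read it back to confirm
}
```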
X4500 Configuration
To start with, I've been testing the X4500 with Solaris 10u4 and Solaris 10u5. I usually JumpStart with the SUNWCall software group (everything except OEM support); nothing special otherwise. Most people find it prudent to mirror the boot drives using Solaris Volume Manager; check out the installation guide for those procedures. This will also work with any of the OpenSolaris flavors.
Configuring the X4500 is a matter of setting up the zpool and ZFS filesystems and sharing them out via NFS. You can choose a mirrored configuration or RAIDZ/RAIDZ2, depending on your specific needs around capacity, performance and RAS.
For a whitepaper I'm working on, I chose a mirrored configuration of 22 mirrored pairs and 2 hot spares, but your needs and configuration may vary. I chose mirroring because I thought the random nature of VMware I/O patterns would make it the obvious choice. However, in further testing I'm seeing very little difference in IOPS and response time between any of the configurations. My guess is that by the time the I/Os have passed through the various layers and the NFS protocol, they have been cached and abstracted enough to negate any performance advantage once they actually hit the spinning disks. Given that, I'm leaning towards recommending RAIDZ2 to maximize capacity and RAS. I hope to show some numbers in the whitepaper soon.
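To sanity-check a mirrored layout before committing to it, here is a small sketch that only prints a `zpool create` command (mirrored pairs plus hot spares) rather than running it. The function name `mirror_pool_cmd` is my own, and the disk names you feed it are whatever `format` reports on your X4500; it pairs disks in the order given, so list them so that each consecutive pair spans two different controllers.

```shell
#!/bin/sh
# Sketch: print (never run) a "zpool create" command that builds mirrored
# pairs from a list of disks, keeping the last NSPARES disks as hot spares.
# mirror_pool_cmd is a hypothetical helper; disk names are placeholders.

# Usage: mirror_pool_cmd POOL NSPARES DISK...
mirror_pool_cmd() {
  pool=$1 nspares=$2
  shift 2
  ndata=$(( $# - nspares ))
  if [ "$ndata" -lt 2 ] || [ $(( ndata % 2 )) -ne 0 ]; then
    echo "need an even number (>= 2) of non-spare disks" >&2
    return 1
  fi
  cmd="zpool create $pool"
  i=0
  for disk in "$@"; do
    if [ "$i" -lt "$ndata" ]; then
      # Every even-numbered data disk starts a new two-way mirror vdev.
      [ $(( i % 2 )) -eq 0 ] && cmd="$cmd mirror"
    else
      # The first leftover disk opens the hot-spare section.
      [ "$i" -eq "$ndata" ] && cmd="$cmd spare"
    fi
    cmd="$cmd $disk"
    i=$(( i + 1 ))
  done
  printf '%s\n' "$cmd"
}
```

Fed the 46 non-boot disks (presumably 48 minus the two SVM-mirrored boot drives) with 2 spares, this yields 22 mirror vdevs plus 2 spares, which leaves 22 disks' worth of usable capacity. For comparison, the RAIDZ2 layout described in the comments below (2 vdevs of 5 disks plus 6 vdevs of 6) leaves 2 x 3 + 6 x 4 = 30 data disks, which is where the capacity argument for RAIDZ2 comes from. Review the printed command, then paste it into a root shell.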
Once you've settled on a zpool configuration, you can create a few ZFS filesystems. That's fairly straightforward:
# zfs create -o sharenfs=anon=0 nfspool/nfs1
# zfs set mountpoint=/nfs1 nfspool/nfs1
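I use anon=0 because the vmkernel reads and writes the NFS filesystem as root. By default, the Solaris NFS server maps a client's root user to the anonymous user (nobody); anon=0 maps it to UID 0 instead, which also means root on any client that can mount the share gets root access to it. If that is a security concern, ESX has an experimental "Virtual Machine Delegate" feature under the Security Profile screen in the GUI that lets you change the user that reads and writes the VM files to something else.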
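If you want several datastores, the two commands above can be folded into a single `zfs create` per filesystem by passing both properties with -o. The loop below is a sketch; `create_nfs_filesystems`, `run` and DRYRUN are hypothetical helpers, and the /nfsN naming just follows the example above.

```shell
#!/bin/sh
# Sketch: create COUNT filesystems POOL/nfs1..nfsN, each shared with
# anon=0 and mounted at /nfsN. run/DRYRUN are hypothetical helpers.

# Print the command instead of running it when DRYRUN=1.
run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_nfs_filesystems POOL COUNT
create_nfs_filesystems() {
  pool=$1 count=$2
  n=1
  while [ "$n" -le "$count" ]; do
    # Set the share options and mountpoint at creation time.
    run zfs create -o sharenfs=anon=0 -o mountpoint="/nfs$n" "$pool/nfs$n"
    n=$(( n + 1 ))
  done
}
```

Running `create_nfs_filesystems nfspool 4` creates nfspool/nfs1 through nfspool/nfs4, mounted at /nfs1 through /nfs4. Afterwards, `zfs get sharenfs,mountpoint nfspool/nfs1` or a bare `share` on the X4500 shows what is actually being exported.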
Once that's done, it's a simple matter of adding the export to ESX via Configuration -> Storage -> Add Storage -> Network File System. Give ESX the X4500's hostname or IP address, the exported path (in this case /nfs1) and a name for your datastore.
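From the service console, esxcfg-nas does the same job as the Add Storage wizard. A sketch, with `add_nfs_datastore`, `run` and DRYRUN being hypothetical helpers and x4500/x4500-nfs1 placeholder names:

```shell
#!/bin/sh
# Sketch: service-console equivalent of Add Storage -> Network File System.
# run/DRYRUN and add_nfs_datastore are hypothetical helpers.

# Print the command instead of running it when DRYRUN=1.
run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: add_nfs_datastore NFS_SERVER EXPORT_PATH DATASTORE_NAME
add_nfs_datastore() {
  run esxcfg-nas -a -o "$1" -s "$2" "$3"  # mount the export as a datastore
  run esxcfg-nas -l                       # list NFS datastores to confirm
}
```

For example: `add_nfs_datastore x4500 /nfs1 x4500-nfs1`.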
That's it. You are now using the X4500 as a big NAS server for VMware ESX. There are some very interesting uses for this configuration. VMware Lab Manager, for one: Lab Manager uses NFS and is really looking for storage that is big and low cost, which fits the X4500 to a tee. Or maybe you need a large system to store test VMs, or a lab environment that doesn't need screaming performance but needs lots of space at a low cost.
Since you are using ZFS on the X4500, you get all of its fancy features, such as snapshots, clones and checksums. Snapshots are exposed under the hidden .zfs/snapshot directory at the root of each filesystem, so restoring is a simple matter of logging onto the X4500, cd'ing into the snapshot you want and copying files right back over. A cloned filesystem can be added by cloning a snapshot, sharing the new clone via the "sharenfs" property and mounting it on ESX just like a normal filesystem. Add the cloned VMs in that new filesystem to the Inventory and you are good to go.
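Here is roughly what the snapshot, restore and clone steps look like on the X4500. This is a sketch under a few assumptions: the helper names and DRYRUN switch are mine, the snapshot name pre-patch and the paths are placeholders, and the VM whose files you copy back should be powered off. Note that a clone takes its properties from its parent dataset, not from the filesystem it was cloned from, so sharenfs and mountpoint are set explicitly.

```shell
#!/bin/sh
# Sketch of the snapshot / restore / clone workflow on the X4500.
# run/DRYRUN and the function names are hypothetical helpers.

# Print the command instead of running it when DRYRUN=1.
run() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: snapshot_fs DATASET SNAPNAME   -> DATASET@SNAPNAME
snapshot_fs() { run zfs snapshot "$1@$2"; }

# Usage: restore_file MOUNTPOINT SNAPNAME RELATIVE_PATH
# Copies a file out of the hidden .zfs/snapshot directory back into place.
restore_file() {
  run cp -p "$1/.zfs/snapshot/$2/$3" "$1/$3"
}

# Usage: clone_fs DATASET@SNAPNAME NEW_DATASET NEW_MOUNTPOINT
clone_fs() {
  run zfs clone "$1" "$2"
  run zfs set sharenfs=anon=0 "$2"     # share the clone like the original
  run zfs set mountpoint="$3" "$2"     # give it its own mountpoint
}
```

For example, `snapshot_fs nfspool/nfs1 pre-patch`, and later `clone_fs nfspool/nfs1@pre-patch nfspool/nfs1clone /nfs1clone`. Once the clone is shared, mount it on ESX like any other export and add the cloned VMs to the inventory.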
There are a couple of things you need to keep in mind, however. First, as with any NFS datastore on ESX, you cannot use Raw Device Mapping (RDM). Second, the X4500 itself is a single point of failure: the motherboard is not a redundant component, and because its storage is internal, the box cannot be clustered. There are multiple network ports and redundant power supplies, but if the motherboard dies, the box dies. Again, depending on your usage, this may or may not be a problem.
Hopefully this is enough to get the curious started. Stay tuned for my upcoming whitepaper.
In a future post, I hope to talk about using the X4500 as an iSCSI target. Yes, we are fixing the compatibility problem between ESX's iSCSI initiator and our Solaris iSCSI target.
*edit* Added a clarifying paragraph on the Solaris versions used.
Have you been running into any of the NFS problems plaguing ZFS in your testing, or do you find it adequate on the Thumper?
With Solaris 10u5, I'm still only averaging about 50MB/s of random-write throughput on a Thumper configured as five raidz2 vdevs of 9 drives each. (By comparison, I'm easily seeing three or four times that with streaming I/O, which makes sense for SATA drives.)
Posted by Travis Campbell on June 18, 2008 at 01:51 PM MDT
Hi Travis,
In my testing I've found that you really have to drive large block sizes (on the order of 128k-512k) in random-write scenarios to get 50MB/sec or more of throughput. I've measured 512k random writes at 95MB/sec or above for both cache hits and cache misses. I was testing with an OpenSolaris virtual machine running vdbench against a 20GB virtual disk. My setup was RAIDZ2 with 2 vdevs of 5 disks each and 6 vdevs of 6 disks each in a single zpool, for 46 disks in total. Interestingly, it didn't really matter whether I used RAIDZ2, RAIDZ or mirroring (22 mirrored pairs).
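The reason block size matters so much is simple arithmetic: throughput is the number of operations per second multiplied by the size of each operation. Treating 1MB as 1024k, the 95MB/sec figure above at 512k blocks works out to only about 190 writes per second, and reaching 50MB/sec with 128k blocks takes 400 writes per second:

```latex
\text{throughput} = \text{IOPS} \times \text{block size}
\quad\Longrightarrow\quad
\text{IOPS} = \frac{\text{throughput}}{\text{block size}}

\frac{95\ \text{MB/s}}{0.5\ \text{MB}} = 190\ \text{IOPS},
\qquad
\frac{50\ \text{MB/s}}{0.125\ \text{MB}} = 400\ \text{IOPS}
```

By the same formula, 50MB/sec at an 8k block size would take 6400 writes per second, so small-block random writes will tend to run into an IOPS ceiling long before a bandwidth ceiling.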
I'm waiting to get a system with SSDs to see how that affects performance.
Posted by RyanArneson on June 19, 2008 at 02:36 PM MDT