Useful stuff for your blog-reading pleasure.

Tuesday February 19, 2008

VirtualBox and ZFS: The Perfect Team

I've never installed Windows in my whole life. My computer history includes systems like the Dragon 32, the Commodore 128, then the Amiga, Apple PowerBook (68k and PPC) etc. plus the occasional Sun system at work. Even the laptop my company provided me with only runs Solaris Nevada, nothing else. Today, this has changed. 

A while ago, Sun announced the acquisition of Innotek, the makers of the open-source virtualization software VirtualBox. After having played with it for a while, I'm convinced that this is one of the coolest innovations I've seen in a long time. And I'm proud to see another innovative German company join the Sun family. Welcome, Innotek!

Here's why this is so cool.

Windows XP running on VirtualBox on Solaris Nevada

After having upgraded my laptop to Nevada build 82, I had VirtualBox up and running in a matter of minutes. OpenSolaris Developer Preview 2 (Project Indiana) runs fine on VirtualBox, so does any recent Linux (I tried Ubuntu). But Windows just makes for a much cooler VirtualBox demo, so I did it:

After 36 years of Windows freedom, I ended up installing it on my laptop, albeit on top of VirtualBox. Safer XP, if you will. Above, you see my VirtualBox running Windows XP in all its Tele-Tubby-ish glory.

As you can see, this is a plain vanilla install; I just took the liberty of installing a virus scanner on top. Well, you never know...

So far, so good. Now let's do something others can't. First of all, this virtual machine uses a .vdi disk image to provide hard disk space to Windows XP. On my system, the disk image sits on top of a ZFS filesystem:

# zfs list -r poolchen/export/vm/winxp
NAME                                                          USED  AVAIL  REFER  MOUNTPOINT
poolchen/export/vm/winxp                                     1.22G  37.0G    20K  /export/vm/winxp
poolchen/export/vm/winxp/winxp0                              1.22G  37.0G  1.05G  /export/vm/winxp/winxp0
poolchen/export/vm/winxp/winxp0@200802190836_WinXPInstalled   173M      -   909M  -
poolchen/export/vm/winxp/winxp0@200802192038_VirusFree           0      -  1.05G  -

Cool thing #1: You can do snapshots. In fact I have two snapshots here. The first is from this morning, right after the Windows XP installer went through, the second has been created just now, after installing the virus scanner. Yes, there has been some time between the two snapshots, with lots of testing, day job and the occasional rollback. But hey, that's why snapshots exist in the first place.
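For the curious, here is a minimal sketch of the snapshot and rollback steps behind that listing. The dataset name comes from the output above. The snapname helper and the dry-run ZFS variable are conveniences I made up for this example, not part of ZFS. By default the script only prints the commands.

```shell
#!/bin/sh
# Dataset taken from the listing above; adjust for your own pool.
DATASET=poolchen/export/vm/winxp/winxp0

# Dry run by default: commands are printed, not executed.
# Set ZFS=zfs in the environment to run them for real.
ZFS=${ZFS:-"echo zfs"}

# Hypothetical helper: builds a name like 200802190836_WinXPInstalled
# (timestamp to the minute, then a free-form label).
snapname() {
    printf '%s_%s\n' "$(date +%Y%m%d%H%M)" "$1"
}

# Freeze the current state of the filesystem holding the disk image.
SNAP="$DATASET@$(snapname VirusFree)"
$ZFS snapshot "$SNAP"

# Later: throw away everything written since that snapshot.
# Rolling back past newer snapshots additionally needs -r,
# which destroys those newer snapshots.
$ZFS rollback "$SNAP"
```

It is probably wise to shut the virtual machine down before rolling back, so the guest never sees its disk change underneath it.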

Cool thing #2: This is a compressed filesystem:

# zfs get all poolchen/export/vm/winxp/winxp0
NAME                             PROPERTY         VALUE                    SOURCE
poolchen/export/vm/winxp/winxp0  type             filesystem               -
poolchen/export/vm/winxp/winxp0  creation         Mon Feb 18 21:31 2008    -
poolchen/export/vm/winxp/winxp0  used             1.22G                    -
poolchen/export/vm/winxp/winxp0  available        37.0G                    -
poolchen/export/vm/winxp/winxp0  referenced       1.05G                    -
poolchen/export/vm/winxp/winxp0  compressratio    1.53x                    -
...
poolchen/export/vm/winxp/winxp0  compression      on                       inherited from poolchen

ZFS has already saved me more than half a gigabyte of precious storage capacity!
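Here is the back-of-the-envelope arithmetic behind that claim, as a small sketch. It assumes the compressratio applies to the "used" figure of 1.22G from the output above. The variable names are mine.

```shell
#!/bin/sh
# Figures copied from the zfs get output above.
USED_G=1.22      # space actually consumed on disk
RATIO=1.53       # compressratio

# Estimated size of the same data without compression.
logical_g=$(LC_ALL=C awk -v u="$USED_G" -v r="$RATIO" \
    'BEGIN { printf "%.2f", u * r }')

# Estimated space saved by compression.
saved_g=$(LC_ALL=C awk -v u="$USED_G" -v r="$RATIO" \
    'BEGIN { printf "%.2f", u * r - u }')

echo "uncompressed: about ${logical_g}G, saved: about ${saved_g}G"
```

With these inputs the estimate comes out at roughly 0.65G saved, which fits "more than half a gigabyte".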

Next, we'll try out Cool thing #3: Clones. Let's clone the virus free snapshot and try to create a second instance of Win XP from it:

# zfs clone poolchen/export/vm/winxp/winxp0@200802192038_VirusFree poolchen/export/vm/winxp/winxp1
# ls -al /export/vm/winxp
total 12
drwxr-xr-x   5 constant staff          4 Feb 19 20:42 .
drwxr-xr-x   6 constant staff          5 Feb 19 08:44 ..
drwxr-xr-x   3 constant staff          3 Feb 19 18:47 winxp0
drwxr-xr-x   3 constant staff          3 Feb 19 18:47 winxp1
dr-xr-xr-x   3 root     root           3 Feb 19 08:39 .zfs
# mv /export/vm/winxp/winxp1/WindowsXP_0.vdi /export/vm/winxp/winxp1/WindowsXP_1.vdi

The clone has inherited the mountpoint from the upper level ZFS filesystem (the winxp one) and so we have everything set up for VirtualBox to create a second Win XP instance from. I just renamed the new container file for clarity. But hey, what's this?

VirtualBox Error Message 

Damn! VirtualBox didn't fall for my sneaky little clone trick. Hmm, where is this UUID stored in the first place?

# od -A d -x WindowsXP_1.vdi | more
0000000 3c3c 203c 6e69 6f6e 6574 206b 6956 7472
0000016 6175 426c 786f 4420 7369 206b 6d49 6761
0000032 2065 3e3e 0a3e 0000 0000 0000 0000 0000
0000048 0000 0000 0000 0000 0000 0000 0000 0000
0000064 107f beda 0001 0001 0190 0000 0001 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
*
0000336 0000 0000 0200 0000 f200 0000 0000 0000
0000352 0000 0000 0000 0000 0200 0000 0000 0000
0000368 0000 c000 0003 0000 0000 0010 0000 0000
0000384 3c00 0000 0628 0000 06c5 fa07 0248 4eb6
0000400 b2d3 5c84 0e3a 8d1c 8225 aae4 76b5 44f5
0000416 aa8f 6796 283f db93 0000 0000 0000 0000
0000432 0000 0000 0000 0000 0000 0000 0000 0000
0000448 0000 0000 0000 0000 0400 0000 00ff 0000
0000464 003f 0000 0200 0000 0000 0000 0000 0000
0000480 0000 0000 0000 0000 0000 0000 0000 0000
*
0000512 0000 0000 ffff ffff ffff ffff ffff ffff
0000528 ffff ffff ffff ffff ffff ffff ffff ffff
*
0012544 0001 0000 0002 0000 0003 0000 0004 0000

Ahh, it seems to be stored at byte 392, with varying degrees of byte and word-swapping. Some further research reveals that you'd better leave the first part of the UUID alone (I'll spare you the details...). Instead, the last 6 bytes, 845c3a0e1c8d, sitting at bytes 402-407, look like a great candidate for an arbitrary serial number. Let's try changing them (this is a hack for demo purposes only; don't do this in production, please):

# dd if=/dev/random of=WindowsXP_1.vdi bs=1 count=6 seek=402 conv=notrunc
6+0 records in
6+0 records out
# od -A d -x WindowsXP_1.vdi | more
0000000 3c3c 203c 6e69 6f6e 6574 206b 6956 7472
0000016 6175 426c 786f 4420 7369 206b 6d49 6761
0000032 2065 3e3e 0a3e 0000 0000 0000 0000 0000
0000048 0000 0000 0000 0000 0000 0000 0000 0000
0000064 107f beda 0001 0001 0190 0000 0001 0000
0000080 0000 0000 0000 0000 0000 0000 0000 0000
*
0000336 0000 0000 0200 0000 f200 0000 0000 0000
0000352 0000 0000 0000 0000 0200 0000 0000 0000
0000368 0000 c000 0003 0000 0000 0010 0000 0000
0000384 3c00 0000 0628 0000 06c5 fa07 0248 4eb6
0000400 b2d3 2666 6fbb c1ca 8225 aae4 76b5 44f5
0000416 aa8f 6796 283f db93 0000 0000 0000 0000
0000432 0000 0000 0000 0000 0000 0000 0000 0000
0000448 0000 0000 0000 0000 0400 0000 00ff 0000
0000464 003f 0000 0200 0000 0000 0000 0000 0000
0000480 0000 0000 0000 0000 0000 0000 0000 0000
*
0000512 0000 0000 ffff ffff ffff ffff ffff ffff
0000528 ffff ffff ffff ffff ffff ffff ffff ffff
*
0012544 0001 0000 0002 0000 0003 0000 0004 0000
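A side note on reading these dumps: od -x prints 16-bit words in host byte order, so on x86 every pair of bytes appears swapped. Asking od for single bytes (-t x1) and jumping straight to the offset (-j, -N) avoids the mental gymnastics. The sketch below builds a scratch file with the 16 UUID bytes from the first dump at offset 392; the scratch file is my stand-in, not a real .vdi image.

```shell
#!/bin/sh
# Scratch file standing in for the header of a disk image.
img=$(mktemp)

# 392 zero bytes, then the 16 UUID bytes from the first dump,
# written in on-disk order using octal escapes.
dd if=/dev/zero of="$img" bs=392 count=1 2>/dev/null
printf '\305\006\007\372\110\002\266\116\323\262\204\134\072\016\034\215' >> "$img"

# Dump the 16 bytes at offset 392 one byte at a time: no swapping.
od -A d -t x1 -j 392 -N 16 "$img"

# Extract just bytes 402-407 as one hex string.
serial=$(od -A n -t x1 -j 402 -N 6 "$img" | tr -d ' \n')
echo "bytes 402-407: $serial"

rm -f "$img"
```

Run against the real image, the second od call would be a quick way to confirm which six bytes a patch actually touched.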

Who needs a hex editor if you have good old friends od and dd on board? The trick is in the "conv=notrunc" part. It tells dd to leave the rest of the file as is and not truncate it after doing its patching job. Let's see if it works:

VirtualBox with two Windows VMs, one ZFS-cloned from the other.

Eureka, it works! Notice that the second instance is running with the freshly patched hard disk image as shown in the window above.

Windows XP booted without any problem from the ZFS-cloned disk image. There was just the occasional popup message from Windows saying that it found a new hard disk (well observed, buddy!).
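If you want to see what conv=notrunc does without risking a real disk image, here is a small experiment on a scratch file. The file and variable names are made up for the demo, and I use /dev/urandom instead of /dev/random so the example never blocks waiting for entropy.

```shell
#!/bin/sh
# Scratch file standing in for a disk image: 1024 zero bytes.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=1 2>/dev/null
size_before=$(wc -c < "$img" | tr -d ' ')

# Patch 6 bytes at offset 402 and keep the rest of the file.
dd if=/dev/urandom of="$img" bs=1 count=6 seek=402 conv=notrunc 2>/dev/null
size_after=$(wc -c < "$img" | tr -d ' ')

# Same patch without conv=notrunc: dd keeps the bytes it seeks over,
# but cuts the file off right after the last byte it wrote.
cp "$img" "$img.trunc"
dd if=/dev/urandom of="$img.trunc" bs=1 count=6 seek=402 2>/dev/null
size_trunc=$(wc -c < "$img.trunc" | tr -d ' ')

echo "before=$size_before patched=$size_after without_notrunc=$size_trunc"
rm -f "$img" "$img.trunc"
```

The patched copy keeps its full size, while the copy written without notrunc ends at byte 408. On a real .vdi file that would chop off the entire guest disk.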

Thanks to ZFS clones we can now create new virtual machine clones in just seconds without having to wait a long time for disk images to be copied. Great stuff. Now let's do what everybody should be doing to Windows once a virus scanner is installed: Install Firefox:

Cloned WinXP instance, running Firefox

I must say that the performance of VirtualBox is stunning. It sure feels like the real thing. You just need to make sure to have enough memory in your real computer to support both OSes at once, otherwise you'll run into swapping hell...

BTW: You can also use ZFS volumes (called ZVOLs) to provide storage space to virtual machines. You can snapshot and clone them just like regular file systems, plus you can export them as iSCSI devices, giving you the flexibility of a SAN for all your virtualized storage needs. The reason I chose files over ZVOLs was just so I can swap pre-installed disk images with colleagues. On second thought, you can dump/restore ZVOL snapshots with zfs send/receive just as easily...

Anyway, let's see how we're doing storage-wise:

# zfs list -rt filesystem poolchen/export/vm/winxp
NAME                              USED  AVAIL  REFER  MOUNTPOINT
poolchen/export/vm/winxp         1.36G  36.9G    21K  /export/vm/winxp
poolchen/export/vm/winxp/winxp0  1.22G  36.9G  1.05G  /export/vm/winxp/winxp0
poolchen/export/vm/winxp/winxp1   138M  36.9G  1.06G  /export/vm/winxp/winxp1

Watch the "USED" column for the winxp1 clone. That's right: Our second instance of Windows XP only cost us a meager 138 MB on top of the first instance's 1.22 GB! Both filesystems (and their .vdi containers with Windows XP installed) represent roughly a Gigabyte of storage each (the REFER column), but the actual physical space our clone consumes is just 138MB.
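To put a number on "meager", here is the same comparison as a quick calculation. The figures come from the zfs list output above. Treating 1G as 1024M is my assumption about how zfs rounds its output.

```shell
#!/bin/sh
# Figures for the winxp1 clone, copied from the zfs list output above.
CLONE_USED_MB=138     # USED: space the clone really occupies
CLONE_REFER_GB=1.06   # REFER: data the clone can see

# Share of the clone's visible data that needed new disk space.
pct=$(LC_ALL=C awk -v u="$CLONE_USED_MB" -v r="$CLONE_REFER_GB" \
    'BEGIN { printf "%.1f", u / (r * 1024) * 100 }')

# Space a full copy would have cost on top of what the clone uses.
saved_mb=$(LC_ALL=C awk -v u="$CLONE_USED_MB" -v r="$CLONE_REFER_GB" \
    'BEGIN { printf "%.0f", r * 1024 - u }')

echo "clone uses ${pct}% of its visible size, saving about ${saved_mb} MB"
```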

Cool thing #4: ZFS clones save even more space, big time!

How does this work? Well, when ZFS creates a snapshot, it only creates a new reference to the existing on-disk tree-like block structure, indicating where the entry point for the snapshot is. If the live filesystem changes, only the changed blocks need to be written to disk, the unchanged ones remain the same and are used for both the live filesystem and the snapshot.

A clone is a snapshot that has been marked writable. Again, only the changed (or new) blocks consume additional disk space (in this case Firefox and some WinXP temporary data), everything that is unchanged (in this case nearly all of the WinXP installation) is shared between the clone and the original filesystem. This is de-duplication done right: Don't create redundant data in the first place!
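The block sharing described above can be illustrated with a toy model. This is only an accounting sketch, not how ZFS is implemented: each "filesystem" is a plain list of made-up block IDs, and comm tells us which blocks are shared and which belong to the clone alone.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)

# Origin filesystem at snapshot time: four blocks.
printf '%s\n' b1 b2 b3 b4 | sort > "$tmp/origin"

# Clone after some writes: b4 was rewritten (the new copy is c4)
# and one brand new block c5 was added. b1-b3 are untouched.
printf '%s\n' b1 b2 b3 c4 c5 | sort > "$tmp/clone"

# Blocks referenced by both: stored once, shared.
shared=$(comm -12 "$tmp/origin" "$tmp/clone" | wc -l | tr -d ' ')

# Blocks only the clone references: the clone's real cost.
clone_only=$(comm -13 "$tmp/origin" "$tmp/clone" | wc -l | tr -d ' ')

# Blocks on disk in total, versus two independent full copies.
on_disk=$(sort -u "$tmp/origin" "$tmp/clone" | wc -l | tr -d ' ')
full_copies=$(cat "$tmp/origin" "$tmp/clone" | wc -l | tr -d ' ')

echo "shared=$shared clone_only=$clone_only on_disk=$on_disk full_copies=$full_copies"
rm -rf "$tmp"
```

In this toy case the clone costs two blocks instead of five, and the disk holds six blocks where two separate copies would need nine.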

That was only one example of the tremendous benefits Solaris can bring to the virtualization game. Imagine the power of ZFS, FMA, DTrace, Crossbow and whatnot for providing the best infrastructure possible to your virtualized guest operating systems, be they Windows, Linux, or Solaris. It works in the SPARC world (through LDOMs), and in the x86/x64 world through xVM server (based on the work of the Xen community) and now joined by VirtualBox. Oh, and it's free and open source, too.

So with all that: Happy virtualizing, everyone. Especially to everybody near Stuttgart.

"VirtualBox and ZFS: The Perfect Team" has been brought to you by Constantin's Blooog.
This entry was created on 2008-02-19 13:18:18.0 PST.



Thursday December 06, 2007

X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage

Digital video editing is one of those applications that tend to be very data hungry. At SD PAL resolution, we're talking about 720 pixels x 576 lines x 3 bytes of color x 25 full frames per second = about 30 MB/s of data. That's about 224 GB for a 2 hour feature film. Not counting audio (that would only be around 3-4 GB). And we (in Germany) haven't looked at HD or Digital Cinema a lot yet...
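For those who like to check the numbers, here is the calculation as a small sketch. It uses decimal units (1 MB = 1,000,000 bytes); the variable names are mine.

```shell
#!/bin/sh
# SD PAL, uncompressed: width x lines x bytes per pixel x frames/s.
bytes_per_sec=$((720 * 576 * 3 * 25))

# Data rate in MB/s and size of a 2 hour (7200 s) film in GB.
rate_mb=$(LC_ALL=C awk -v b="$bytes_per_sec" \
    'BEGIN { printf "%.1f", b / 1e6 }')
film_gb=$(LC_ALL=C awk -v b="$bytes_per_sec" \
    'BEGIN { printf "%.1f", b * 7200 / 1e9 }')

echo "rate: ${rate_mb} MB/s, 2 hour film: ${film_gb} GB"
```

That gives 31.1 MB/s and 223.9 GB, which is where the rounded "about 30 MB/s" and "about 224 GB" come from.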

During the last couple of weeks I worked with a customer who bought a Sun Fire X4500 server (you know, Thumper). The plan is to run Solaris ZFS on it, then provide big iSCSI volumes to the video editing systems, which tend to be specialized Windows or Mac OS X machines. Wonderful idea: Just use zpool create to combine a number of disks with some RAID level into a pool, then zfs create -V to create a ZVOL. Thanks to zfs set shareiscsi=on, sharing the volume over iSCSI is dead easy.

But it didn't work.

First, Windows wouldn't mount the iSCSI volume. After some trying, we discovered that there must be an upper limit of 2 TB to the size of iSCSI volumes that Windows can mount (we initially tried something like 5 or 10 TB). So be it: zfs create -V 2047G videopool/videovolume.
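Put together, the setup so far might look like the sketch below. The pool and volume names are the ones used above. The disk names and the choice of raidz are placeholders I picked for illustration, not the customer's actual layout. The script only prints the commands unless you override the ZPOOL and ZFS variables.

```shell
#!/bin/sh
# Dry run by default: print the commands instead of running them.
# Set ZPOOL=zpool and ZFS=zfs in the environment to execute for real.
ZPOOL=${ZPOOL:-"echo zpool"}
ZFS=${ZFS:-"echo zfs"}

provision_video_volume() {
    # Combine some disks into a pool (hypothetical disk names).
    $ZPOOL create videopool raidz c0t0d0 c1t0d0 c4t0d0 c5t0d0

    # Carve out a ZVOL just below the 2 TB limit Windows imposed.
    $ZFS create -V 2047G videopool/videovolume

    # Export the ZVOL as an iSCSI target.
    $ZFS set shareiscsi=on videopool/videovolume
}

provision_video_volume
```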

Now it mounted ok, we formatted the disk with NTFS (yuck!) and started the editing system's speed test. Then came the real issue: The test reported a write performance of 8-10 MB/s, but the editing system needs something like 30 MB/s sustained to be able to record reliably!

After some trying, we started the systematic approach:

  • A simple dd from one disk to another yielded >39 MB/s.
  • dd'ing from one small ZFS pool to another exceeded 120 MB/s (I later learned that cp is a better benchmark because it works asynchronously with large chunks of data vs. dd's synchronous block approach), so that was again more than we needed.
  • We tried re-attaching our ZVOL through iscsiadm to test the iSCSI stack's performance and ran into a TCP fusion issue. OK, I've always wanted to play with mdb, so we followed the workaround instructions and were able to attach our own ZVOL over the loopback interface. Performance was slightly lower (due to up-the-stack, down-the-stack effects, I presume) but still way more than we needed. So it was neither the X4500's nor ZFS's fault.

Finally, Danilo pointed me in the right direction: Nagle's algorithm. What usually helps maximize network bandwidth turns out to be a killer for iSCSI performance. For Solaris iSCSI clients, we know this already, but how do we turn off Nagle on Windows?

The answer is buried deep inside Microsoft's iSCSI Initiator user guide: The "Addressing Slow Performance with iSCSI Clusters" chapter mentions a similar issue (although it talks about read, not write, performance) and mentions RFC 1122's delayed ACK feature, which is related to Nagle's algorithm. The Microsoft document suggests a workaround which involves setting a variable in the registry, so it was worth a try (and my vengeance for having to use mdb before).

And lo and behold, the speed test now yielded 90-100 MB/s (close to a GbE link's raw performance)! Yippee, that was it! One little registry entry on the client side gave us a 10x improvement in iSCSI performance!

Now, can someone explain to me why on Windows 2000 you need to set "TcpDelAckTicks=0" while on Windows 2003 the same thing is accomplished by saying "TcpAckFrequency=1" (which is the same thing, only seen from the other side of the division sign)?

So, to all you storage hungry video editors out there: The Sun Fire X4500 with Solaris ZFS and iSCSI is a great solution for reliable, fast, easy to use and inexpensive video storage. You just need to know how to tell your TCP/IP stack to not delay ACKs...
 

"X4500 + Solaris ZFS + iSCSI = Perfect Video Editing Storage" has been brought to you by Constantin's Blooog.
This entry was created on 2007-12-06 13:31:53.0 PST.





This is Sun employee Constantin Gonzalez' personal blog.
All opinions expressed herein are solely of the author and do not necessarily reflect those of his employer.
If you want to contact the author, please send email to constantin (dot) gonzalez (at) sun (dot) com.
Thank you for reading this blog!