The dot in ... --- ...

Chris Gerhard's Weblog


Friday July 21, 2006

Why not use a RAID controller to do mirroring or another kind of RAID underneath a zpool?

I got asked this today.

The ZFS manual says it is not recommended:


ZFS works best when given whole physical disks. Although constructing logical devices using a volume manager, such as Solaris Volume Manager (SVM), Veritas Volume Manager (VxVM), or a hardware volume manager (LUNs or hardware RAID) is possible, these configurations are not recommended. While ZFS functions properly on such devices, less-than-optimal performance might be the result.

Disks are identified both by their path and by their device ID, if available. This method allows devices to be reconfigured on a system without having to update any ZFS state. If a disk is switched between controller 1 and controller 2, ZFS uses the device ID to detect that the disk has moved and should now be accessed using controller 2. The device ID is unique to the drive's firmware. While unlikely, some firmware updates have been known to change device IDs. If this situation happens, ZFS can still access the device by path and update the stored device ID automatically. If you inadvertently change both the path and the ID of the device, then export and re-import the pool in order to use it.

So why not use that RAID controller?


One reason is that doing so prevents ZFS from recovering when a disk returns wrong or bad data. Because ZFS is presented with a single device, when it detects a checksum error it has no way to ask the RAID controller to read the other side of the mirror, or to recalculate the XOR parity from a RAID 5 array. If, on the other hand, you give ZFS direct control of the disks, then when one of them supplies bad data for any reason, ZFS can read from the redundant copy.
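
To make the difference concrete, here is a minimal Python sketch, assuming a toy model rather than ZFS's real I/O path: each block's checksum is kept apart from the data (ZFS stores it in the parent block pointer), and SHA-256 stands in for whichever checksum the pool uses. The class names `Mirror` and `SingleLun` are hypothetical. With a mirror that ZFS controls, a read that fails its checksum falls back to the other side and repairs the bad copy; with a single hardware RAID LUN, the same corruption can only be reported as an error.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in for the pool's checksum; ZFS keeps it in the parent block
    # pointer, i.e. separate from the data it protects.
    return hashlib.sha256(data).hexdigest()


class Mirror:
    """Hypothetical two-way mirror whose disks ZFS can address directly."""

    def __init__(self, data: bytes):
        self.sides = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        # Use the first copy whose contents match the stored checksum.
        good = None
        for side in self.sides:
            if checksum(bytes(side)) == self.expected:
                good = bytes(side)
                break
        if good is None:
            raise IOError("checksum error: no good copy left")
        # Self-heal: overwrite any copy that failed the checksum.
        for i, side in enumerate(self.sides):
            if checksum(bytes(side)) != self.expected:
                self.sides[i] = bytearray(good)
        return good


class SingleLun:
    """Hypothetical hardware RAID LUN: ZFS sees exactly one copy."""

    def __init__(self, data: bytes):
        self.block = bytearray(data)
        self.expected = checksum(data)

    def read(self) -> bytes:
        data = bytes(self.block)
        if checksum(data) != self.expected:
            # The controller may hold a good mirror copy, but ZFS has
            # no way to ask for it, so all it can do is report the error.
            raise IOError("checksum error: cannot recover")
        return data


mirror = Mirror(b"payroll")
mirror.sides[0][0] ^= 0xFF                        # silent corruption on one disk
assert mirror.read() == b"payroll"                # served from the other side...
assert mirror.sides[0] == bytearray(b"payroll")   # ...and the bad side repaired

lun = SingleLun(b"payroll")
lun.block[0] ^= 0xFF
try:
    lun.read()
except IOError as err:
    print(err)  # checksum error: cannot recover
```

The point of the sketch is the fallback loop in `Mirror.read`: it only works because ZFS can see and address each copy individually, which is exactly what a RAID controller hides.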

I'm left wondering about controllers with write cache, where performance would be a factor; however, that will have to wait.




(Jul 21 2006, 05:46:21 PM BST)

   
Comments:

Given the large amount of HW RAID storage out there, that sounds like a large negative for ZFS. You're saying that ZFS basically works best with JBOD, which is great, but that's not how most enterprise storage is deployed. It's certainly an interesting observation; I hadn't considered it.

Posted by David Lehenky on July 21, 2006 at 09:25 PM BST #

I think your inference is slightly misplaced.

Given HW RAID storage, is ZFS better in terms of data integrity than another file system?

Yes.

If you had a blank sheet of paper and could allow ZFS to do the replication, rather than getting the RAID hardware to do it, would you get better data integrity?

Yes.

So would you reconfigure RAID devices to present non-redundant fast storage to ZFS, and rely on your ZFS configuration to provide redundancy?

Quite probably. I would certainly let ZFS have control of the redundant backup data so it can deliver good data to my application rather than an error. Even that error is better than what any other file system would give you: it would just return bad data, which breaks the fundamental contract between an application and a file system.

Posted by Chris Gerhard on July 21, 2006 at 09:44 PM BST #

I guess I didn't phrase it very well. I was trying to say that two of the key features of ZFS, guaranteed data integrity and automatic error recovery, are compromised when using HW RAID storage, or even host-based volume manager RAID. The storage implementation that gets you 100% of what ZFS has to offer is JBOD.

Posted by David Lehenky on July 21, 2006 at 10:03 PM BST #

This is where ZFS is missing the boat. Most major corporations have sunk their money into large SAN subsystems (some of which Sun sells). Letting the subsystem handle the RAID and its huge cache is good for performance. Now Sun says don't let the subsystem (which Sun sells) handle RAID... I see the benefit of ZFS in data protection, but Sun really needs to make it fit their customers' environments. We have a HUGE base of HDS subsystems already. So if ZFS can't provide a new benefit based on what we already have (or a cheaper benefit with the storage we have), then it's really just a theoretically nice file system.

Posted by Jon Hamlin on July 21, 2006 at 11:50 PM BST #

ZFS seems to be a great fit for the new "Thumper" hybrid server (X4500). Personally I'm not too fussed that ZFS works best with JBOD arrays, since my employer doesn't use SANs right now and we have only one hardware RAID controller: an ageing A1000, two thirds full. The main database volume is four 36GB 10,000 RPM SCSI disks configured as a RAID 10 array. It stores 5 Informix SE databases, along with some user backup folders shared out via Samba; the host is an E450 with a single internal disk, plus one on "warm" standby. Our "next generation" server will likely be a V490 with 36GB of RAM running Informix Online 9.x, with dual DAT 72 drives and dual internal disks, connected to a 3510 array with separate volumes for each of the main Informix Online databases (there are actually 12 Informix SE databases on the current system, but only 5 really need to be optimized for on-disk performance). Unfortunately our application vendor doesn't support Solaris 10 and has no plans to, so we are stuck with either UFS or "raw" volumes for the database storage. Given that Informix Online will do "live" streaming backup to a DAT-72 tape drive, this probably won't be an issue for us.

Posted by Andrew Pattison on July 22, 2006 at 02:51 PM BST #

I don't really think ZFS is missing any boat.

To have a system recover from data corruption, you first have to detect the error. If you detect the error, then you can recover *if* you have another source of the data. If there were a way to ask a LUN to deliver another copy of the data, which there is not, then ZFS could recover from data corruption on a hardware RAID device.

However, even without recovery, ZFS still provides detection, which is far more than any other file system offers. It still keeps the implied contract with the application that it will provide either the correct data or an error (unless the corruption is not detected by the checksum algorithm you use).
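
The "correct data or an error" contract, and its caveat about undetected corruption, can be illustrated with a small Python sketch. `weak_sum` is a deliberately weak, hypothetical checksum (a plain byte sum, not one ZFS offers) that cannot see two bytes being swapped; SHA-256, one of the checksums ZFS can use, does catch it. The `read_block` helper is likewise hypothetical.

```python
import hashlib


def weak_sum(data: bytes) -> int:
    # Deliberately weak toy checksum: reordering bytes does not change it.
    return sum(data) % 65536


def strong_sum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_block(data: bytes, stored, algo) -> bytes:
    """Return the data only if it matches its stored checksum."""
    if algo(data) != stored:
        # Correct data or an error: never hand back data that failed.
        raise IOError("checksum mismatch")
    return data


original = b"ABCD"
corrupted = b"BACD"  # two bytes swapped somewhere below the file system

# The toy checksum misses the swap, so bad data reaches the application.
assert read_block(corrupted, weak_sum(original), weak_sum) == corrupted

# SHA-256 catches it, so the application gets an error instead.
try:
    read_block(corrupted, strong_sum(original), strong_sum)
except IOError:
    print("error returned to application")
```

Even on a single hardware RAID LUN, this detection still works; what is lost is only the second copy to recover from.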

It is true that, given a SAN, you may well want to build your LUNs differently, i.e. without redundancy, and then use ZFS to provide that redundancy.
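
Letting ZFS provide parity across non-redundant LUNs also covers the RAID 5 case raised in the post. The Python sketch below is a simplified model, assuming single XOR parity across equal-sized columns and one checksum per stripe; it is not RAID-Z's actual on-disk layout, and `write_stripe` and `read_stripe` are hypothetical names. Because no LUN reports an I/O error, the reader does not know which column is wrong, so it tries rebuilding each column from parity in turn and keeps the candidate whose checksum matches, roughly the combinatorial approach RAID-Z takes.

```python
import hashlib
from functools import reduce


def xor_bytes(*chunks: bytes) -> bytes:
    # Bytewise XOR of equal-length chunks (single-parity, RAID 5 style).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_stripe(data: bytes, ncols: int):
    """Split data across ncols LUNs, plus one parity column and a checksum."""
    size = -(-len(data) // ncols)               # ceiling division
    padded = data.ljust(size * ncols, b"\0")    # equal-length columns
    cols = [padded[i * size:(i + 1) * size] for i in range(ncols)]
    return cols, xor_bytes(*cols), checksum(padded)


def read_stripe(cols, parity, expected) -> bytes:
    """Return the (padded) stripe, rebuilding one bad column if needed."""
    if checksum(b"".join(cols)) == expected:
        return b"".join(cols)
    # No LUN reported an error, so we don't know which one lied:
    # assume each column in turn is bad and rebuild it from the others.
    for bad in range(len(cols)):
        others = [c for i, c in enumerate(cols) if i != bad]
        rebuilt = list(cols)
        rebuilt[bad] = xor_bytes(parity, *others)
        candidate = b"".join(rebuilt)
        if checksum(candidate) == expected:
            return candidate
    raise IOError("unrecoverable checksum error")


cols, parity, expected = write_stripe(b"customer ledger!", 4)
cols[2] = b"XXXX"  # one LUN silently returns garbage, no I/O error
assert read_stripe(cols, parity, expected) == b"customer ledger!"
```

The checksum is what makes this work: a hardware RAID 5 controller can rebuild a column it knows is missing, but without an end-to-end checksum it cannot tell which column silently returned wrong data.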

Posted by Chris Gerhard on July 24, 2006 at 10:53 AM BST #


Except where otherwise noted, this site is
licensed under a Creative Commons License 2.0

This is a personal weblog, I do not speak for my employer.