Now that I have a working OpenSolaris build 128 system, I just had to take ZFS deduplication for a spin, to see if it was worth all of the hype.
Here is my test case: I have 2 directories of photos, about 90MB each. And here's the trick: they are almost complete duplicates of each other, because I downloaded all of the photos from the same camera on 2 different days. How many of you do that? Yeah, me too.
Let's see what ZFS can figure out about all of this. If it is super smart, we should end up with a total of 90MB of used space. That's what I'm hoping for.
The first step is to create the pool and turn on deduplication from the beginning.
# zpool create -f scooby -O dedup=on c2t2d0s2
This uses sha256 to decide whether 2 blocks are the same. Since sha256 has such a low collision probability (something like 1x10^-77), we will not turn on automatic verification. If we were using an algorithm like fletcher4, which has a higher collision rate, we should also perform a complete block comparison before treating 2 blocks as duplicates (dedup=fletcher4,verify).
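To make the checksum-versus-verify trade-off concrete, here is a toy model of block-level dedup in Python. It is not ZFS's dedup table code: the class name, the table layout, and the byte-sum `weak_key` checksum are all hypothetical. `weak_key` is a deliberately terrible stand-in for "a weaker checksum" (it is not fletcher4), chosen so that a collision is easy to show.

```python
import hashlib


class ToyDedupTable:
    """Toy model of block-level dedup (hypothetical, not ZFS's DDT code).

    checksum: function mapping a block (bytes) to a table key.
    verify:   if True, compare full block contents on a key match
              instead of trusting the checksum alone.
    """

    def __init__(self, checksum, verify=False):
        self.checksum = checksum
        self.verify = verify
        self.table = {}      # key -> list of [stored_block, refcount]
        self.allocated = 0   # bytes physically stored

    def write(self, block):
        """Record one block write; return True if it was deduplicated."""
        entries = self.table.setdefault(self.checksum(block), [])
        for entry in entries:
            # Without verify, any checksum match is trusted as a duplicate.
            if not self.verify or entry[0] == block:
                entry[1] += 1
                return True
        entries.append([block, 1])   # no match: allocate a new physical copy
        self.allocated += len(block)
        return False


def sha256_key(block):
    return hashlib.sha256(block).digest()


def weak_key(block):
    # Hypothetical weak checksum (NOT fletcher4): a plain byte sum,
    # so b"ab" and b"ba" collide.
    return sum(block) % 256


strong = ToyDedupTable(sha256_key)
strong.write(b"A" * 4096)
strong.write(b"A" * 4096)
print(strong.allocated)              # 4096: the second write was deduplicated

unsafe = ToyDedupTable(weak_key)
unsafe.write(b"ab")
print(unsafe.write(b"ba"))           # True: different data wrongly treated as a duplicate

checked = ToyDedupTable(weak_key, verify=True)
checked.write(b"ab")
print(checked.write(b"ba"))          # False: the byte compare catches the collision
```

With a strong hash the compare step almost never changes the outcome, which is the reasoning behind skipping verify for sha256; with a weak checksum it is the only thing standing between a collision and silently wrong data.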
Now copy the first set, 180MB in all (remember, this is 2 sets of 90MB which are nearly identical sets of photos).
# zfs create scooby/doo
# cp -r /pix/Alaska* /scooby/doo
And the second set.
# zfs create scooby/snack
# cp -r /pix/Alaska* /scooby/snack
And finally the third set.
# zfs create scooby/dooby
# cp -r /pix/Alaska* /scooby/dooby
Let's make sure there are in fact three copies of the photos.
# df -k | grep scooby
scooby 74230572 25 73706399 1% /scooby
scooby/doo 74230572 174626 73706399 1% /scooby/doo
scooby/snack 74230572 174626 73706399 1% /scooby/snack
scooby/dooby 74230572 174625 73706399 1% /scooby/dooby
OK, so far so good. But I can't quite tell if the deduplication is actually doing anything. With all that free space, it's sort of hard to see. Let's look at the pool properties.
# zpool get all scooby
NAME PROPERTY VALUE SOURCE
scooby size 71.5G -
scooby capacity 0% -
scooby altroot - default
scooby health ONLINE -
scooby guid 5341682982744598523 default
scooby version 22 default
scooby bootfs - default
scooby delegation on default
scooby autoreplace off default
scooby cachefile - default
scooby failmode wait default
scooby listsnapshots off default
scooby autoexpand off default
scooby dedupratio 5.98x -
scooby free 71.4G -
scooby allocated 86.8M -
Now this is telling us something.
First, notice the allocated space: 86.8M, just shy of 90MB. There are about 522MB of data in the three file systems (174MB x 3), yet only 87MB is allocated in the pool. That's a good start.
Now take a look at the dedupratio: 5.98x, almost 6. That's exactly what we would expect if ZFS is as good as we are led to believe. 3 copies of 2 nearly duplicate directories is 6 total copies of the same set of photos, and ZFS caught nearly every one of them.
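Why "almost 6" rather than exactly 6? A simplified model, treating the ratio as logical bytes referenced divided by bytes actually allocated, shows one plausible reason: any photo that exists in only one of the two downloads is stored once but referenced 3 times, not 6. The photo counts, the 4K toy block size, and the single differing photo below are made-up assumptions, not Bob's actual data, and ZFS's own accounting also involves metadata this sketch ignores.

```python
import hashlib
import os

BLOCK = 4096  # toy block size for the simulation (ZFS's default recordsize is 128K)


def blocks(data, size=BLOCK):
    """Split one file's contents into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_ratio(files):
    """Logical bytes referenced divided by bytes allocated with sha256 dedup."""
    seen = set()
    logical = allocated = 0
    for data in files:
        for blk in blocks(data):
            logical += len(blk)
            digest = hashlib.sha256(blk).digest()
            if digest not in seen:   # first time this block is seen: allocate it
                seen.add(digest)
                allocated += len(blk)
    return logical / allocated


# Hypothetical data: 100 "photos" of random bytes, 3 blocks each.
photos = [os.urandom(3 * BLOCK) for _ in range(100)]
day1 = photos                                  # first download
day2 = photos[:99] + [os.urandom(3 * BLOCK)]   # second download: one photo differs
one_set = day1 + day2                          # both directories, copied once
print(round(dedup_ratio(one_set * 3), 2))      # three copies -> 5.94 (600/101)
```

Six copies of 100 photos would give exactly 6.0; the one extra unique photo pulls the ratio just under it.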
So if you want to do this yourself, point your OpenSolaris package manager at the dev repository and wait for build 128 packages to show up. If you need instructions on using the OpenSolaris dev repository, point the browser of your choice at http://pkg.opensolaris.org/dev/en/index.shtml. And if you can't wait for the packages to show up, you can always
.
Greetings, Bob. Thanks for trying this; I appreciate the early reporting on real-world results. =-) I'm curious about your thoughts on auto-ditto.
According to PSARC 2009/571, there's a default threshold to induce the creation of an additional copy (dedupditto=100) once you add so many references to a block.
Your test pool has a single vdev, so overall resilience is limited. Obviously this is just a scratch testbed, but for scenarios such as this (perhaps embedded devices) would you consider setting dedupditto=2 or 3? This would protect dedup'ed blocks a bit more, but still give you some space advantage.
I imagine that if you had mirror or raidz vdevs, then you could rely entirely upon that instead, even without auto-ditto. I'm not sure just where the benefits of dedupditto kick in, or whether that would actually help with resiliency.
Also, your data may not benefit from compression, but if you plan any more experimentation, I would be interested to see how enabling compression changes the reported values, and how easy it is to piece it all together from reading "zfs list -o space".
Thanks... -cheers, CSB
Posted by Craig S. Bell on November 23, 2009 at 12:51 PM CST #
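The space side of Craig's dedupditto question can be sketched with a toy model that follows only the behavior described in the comment above: one stored copy per deduplicated block, plus an extra copy once the reference count reaches the threshold. The function name is hypothetical, the exact comparison and any further copies in ZFS may differ, and whether ZFS accepts thresholds as small as 2 or 3 is not something this sketch checks.

```python
def physical_copies(refcount, dedupditto):
    """Toy model of auto-ditto (hypothetical, not ZFS code): copies stored for
    one deduplicated block. One copy normally; one extra copy once the
    reference count reaches the threshold (0 means the feature is off)."""
    if refcount == 0:
        return 0
    if dedupditto and refcount >= dedupditto:
        return 2
    return 1


# In Bob's test each photo block is referenced about 6 times.
refs = 6
for threshold in (0, 2, 3, 100):
    copies = physical_copies(refs, threshold)
    print(f"dedupditto={threshold}: {copies} copies, {refs / copies:.1f}x savings")
```

Under this model a low threshold halves the savings on heavily shared blocks (6x becomes 3x) in exchange for a second copy of each, which is still a space win over 6 full copies.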
Thanks for sharing, Bob. Since ZFS dedup works at the block level, did you ever try a single file that contains redundant data within itself? I'm curious to know whether we can get some dedup from that. My customer has one big file (VMware) and is wondering how dedup can help them lower the cost of storage.
Posted by Paisit Wongsongsarn on November 24, 2009 at 09:30 AM CST #
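On Paisit's question: because dedup compares blocks rather than files, repeated data inside one file can dedup, but only when the repeats line up with block boundaries. A minimal sketch, assuming fixed-size blocks at the default 128K recordsize and random test data (both assumptions); how much a real VMware image gains depends on how its repeated content is laid out, which this does not model.

```python
import hashlib
import os

RECORD = 128 * 1024  # assumed block size: the ZFS default recordsize


def block_stats(data, size=RECORD):
    """Return (total blocks, distinct blocks) for one file split into records."""
    digests = [hashlib.sha256(data[i:i + size]).digest()
               for i in range(0, len(data), size)]
    return len(digests), len(set(digests))


chunk = os.urandom(RECORD)
aligned = chunk + chunk            # repeat starts on a block boundary
misaligned = chunk + b"x" + chunk  # same repeat, pushed one byte off the boundary
print(block_stats(aligned))        # (2, 1): the second copy dedups
print(block_stats(misaligned))     # (3, 3): nothing dedups
```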
Dear Bob, could you let me know when this dedupe feature will appear in a standard production release?
Posted by mike on November 27, 2009 at 09:52 AM CST #
Hey Mike,
I've seen the question asked quite a few times on Jeff Bonwick's blog (http://blogs.sun.com/bonwick/entry/zfs_dedup), so far without an answer. At this point I would not want to speculate about future Solaris 10 features because I just don't know. I would keep an eye on Jeff's blog or follow some of the discussions at zfs-discuss@opensolaris.org. I will certainly post something about this (and other Solaris 10 features) as soon as they become public.
Posted by Bob Netherton on November 27, 2009 at 01:01 PM CST #