When asked about Sun Microsystems, one word will always spring to the top of my mind: innovation.
There is such a fantastic DNA in this company that looks to push boundaries and make things better - ok, we often do not get the message across well, but the effort and dedication shown by employees always make me proud.
To emphasise this point again, there is great news from Jeff Bonwick earlier this week: "ZFS now has built-in deduplication"
Deduplication is a process to remove duplicate copies of data, whether it's files, blocks or bytes.
It's probably easier to explain with an example: suppose you have a database of company addresses. The location 'London' will exist for quite a few customers, so instead of storing this entry 100 times, there will be one entry and 99 references to it. This saves space, and it can save lookup time too, as the original entry is likely to be loaded in cache already.
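To make the 'London' example concrete, here is a minimal sketch in Python of the general idea: each piece of data is identified by a checksum, stored once, and every further copy just bumps a reference count. This is only a toy to show the principle, not how ZFS does it internally; the class name DedupStore and its methods are made up for illustration.

```python
import hashlib


class DedupStore:
    """Toy store (hypothetical): identical blocks are kept once and reference-counted."""

    def __init__(self):
        self.blocks = {}    # checksum -> block data, stored only once
        self.refcount = {}  # checksum -> how many writes point at that block

    def write(self, data: bytes) -> str:
        # Identify the block by its SHA-256 checksum.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data  # first copy: actually store it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key  # the caller keeps this reference instead of the data

    def read(self, key: str) -> bytes:
        return self.blocks[key]


store = DedupStore()

# 100 customers in London, one in Paris.
refs = [store.write(b"London") for _ in range(100)]
refs.append(store.write(b"Paris"))

# Bytes the customers think they wrote vs. bytes actually stored.
logical = sum(len(store.read(r)) for r in refs)
physical = sum(len(block) for block in store.blocks.values())

print(f"unique blocks: {len(store.blocks)}")
print(f"logical bytes: {logical}, physical bytes: {physical}")
```

Here the 101 writes end up as just two stored blocks, with the 'London' block carrying a reference count of 100.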
How easy is it to set up?
Assuming you have a storage pool named 'tank' and you want to use dedup, just type this:
zfs set dedup=on tank
There is more to it, so read Jeff's blog for the whole story.
I'm guessing this should appear shortly in the OpenSolaris /Dev builds, which will feed into the next OpenSolaris release (2010.02) and into Solaris 10 Update 9. Once it's released, I'll try to run some tests to see the savings I get.
This should also feed into the FreeBSD project. Such a shame OS X has dumped its ZFS project.
