Trond Norbye's Weblog

« Previous day (Apr 1, 2009) | Main | Next day (Apr 2, 2009) »

http://blogs.sun.com/trond/date/20090402 Thursday April 02, 2009

Pluggable hashing algorithm in memcached?

In my blog post How well is your hash table working for you?, I pointed out that your keys could give you a bad distribution in the internal hash table inside memcached. Right now there is not much you can do apart from using another algorithm to generate your keys, but that may not be the easiest thing to do. Wouldnt it be cooler if you could just use another hashing algorithm instead?

I talked with Brian Aker on IRC the other day, and he pointed out that libmemcached contains a handfull of different algorithms (and is covered by the same license as the memcached server) so we could actually use the hashing routines from libmemcached in the server. The first thing we should do is probably to create a "hashing benchmark tool" in libmemcached that reads an input file of keys and determines the best hashing algorithm to use based upon speed and distribution. With the benchmark in place we could add a new configure option to memcached --with-hashing-algorithm=algorithm (this would of course require that we have libmemcached installed).

With the posibility to change the hashing algoritm in the server, I would love to take this one step further (I hate compiletime settings, because it makes life hard for people shipping binaries). What if we could dynamically change the hashing algorithm on the server without invalidating the existing cache? Wouldn't that be cool? Since memcached supports dynamic hash expansion, it shouldn't be hard to change the hash function as well. If you take a quick look in the function assoc_find located in assoc.c you will see that if expanding is set, we need to search in the old hash-table instead of the new. This is the place where we should add our logic that if the hash function changed (and we haven't repopulated the complete hash yet), we need to recompute the hash with the old hash function.

Anyone up for the challenge of implementing:

  • A key hashing benchmark program (in libmemcached)
  • Add configure option to memcached that detects and links with libmemcached, and overrides the default hashing algorithm
  • Create a new command to set the hashing algorithm runtime


Valid HTML! Valid CSS!

This is a personal weblog, I do not speak for my employer.