Friday June 26, 2009
My disks crashed!
Yesterday I found out the hard way that you are not 100% safe even if you have a mirror. Both of my disks started to fail, so I wasn't able to boot the computer anymore. I am pretty sure that someone with more admin experience would have been able to get the machine back up and running (I must admit that I haven't paid much attention to how grub etc. works, so my knowledge about system startup is a bit outdated ;-)). Because I have a big hole in my knowledge here, I booted a live CD and imported the ZFS mirror. Luckily for me I had no problems on the filesystems where I had all of my work, so I was able to copy all of it out to another disk (I don't have more than 4 SATA connectors in my machine, so I had to disconnect the SSD cache and plug in an extra disk there). With all my work safe on another hard drive on my desk, I decided to do the easy thing: just buy two new disks and reinstall OpenSolaris on those. My original plan was to get two 650GB disks, but when I arrived at the store they didn't have them anymore. Luckily for me they gave me two 1TB disks for the same price, so now I've got plenty of space for ZFS snapshots :-)
Personally I just love ZFS. During the installation of OpenSolaris, I just opened a terminal and typed in:
jack@opensolaris> pfexec zpool attach -f rpool c7d0s0 c9d0s0
And my mirror was up and running :-) I have been upgrading the old machine for a long time, so I have a lot of old "zombies" lying around. Instead of restoring all of them, I decided to just restore my data (/home), and recreate all of the configuration.
Hopefully I'll be done with everything tonight :-) I'm just so glad that I didn't lose a single bit of my data :-)
Posted at 01:55PM Jun 26, 2009 by trond in Personal | Comments[1]
Thursday June 25, 2009
Replicate your keys to multiple memcached servers.
If you look at how the (community version of) memcached works, all servers are completely isolated from each other. They don't know (or care) about the existence of other servers, and all advanced logic is implemented by the clients. This removes a lot of complexity from the server, resulting in a small clean source base with few bugs. You will also find this simple design in the client-server protocol, reducing what you can try to implement in the server.
If you scan the mailing lists you will find that requests for replication seem to pop up at regular intervals, so I decided to give it a shot. Personally I am not too interested in a fully replicated scenario (where you have all of your keys stored on multiple machines), because I think you would be wasting too much space. I think a mixed mode is more interesting, where you store only a few of the items on multiple servers; and this is what I implemented.
If you look at the design for the replication from 1000 ft, it is dead simple. When we store a key on the server, we will also store it on the n next servers. If we encounter a problem when we try to send the GET request to the server, we try to fetch the replica instead. We will however not try to fetch the replica if:
GET-request so that we can send it to the next replica server.)
If you want to try it out you need to grab at least revision 539, but you should be aware of some design choices / limitations:
It is only supported with the binary protocol, so you cannot use a memcached server from the 1.2 series (you need the 1.4 branch).
Why? Well, the replication code uses the "noreply" mode to store the replicas, and the "noreply" mode in the ASCII protocol is just one big hack ;-)
SET is the only command that will store multiple replicas.
The replication code does not implement any kind of transactions / consistency, so I wanted to expose this fact to the user. Allowing ADD or REPLACE could confuse the users and introduce strange bugs in their application. INCR and DECR raise the same inconsistency problems. If you have an atomic counter (at least if it doesn't get evicted from the cache) you don't want it to behave strangely because of race conditions updating the replicas.
The CAS identifier is generated on the server, so the master item and all replicas will have different CAS identifiers. If you enable replication you can't use CAS
memcached_st instance, so the API stays the same (and adds no extra costs if you don't use it
Well, I guess a lot of you don't like reading text that doesn't end each statement with a semicolon, so I should probably add some code. First you should locate the code where you create your memcached_st handle. You probably have something like (I removed the error checking to keep the example small, but you don't want to do that in your code!!!!):
memcached_st *memc = memcached_create(NULL);
memcached_server_st *servers = memcached_servers_parse(server_list);
memcached_server_push(memc, servers);
memcached_server_list_free(servers);
The first thing we need to do is to enable the binary protocol:
memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_BINARY_PROTOCOL, 1);
As I mentioned above, I don't think you really want to replicate all of your keys, so let's create a new memcached_st instance and enable replication there (num_replicas contains the number of replicas I want):
memcached_st *repl = memcached_clone(NULL, memc);
memcached_behavior_set(repl, MEMCACHED_BEHAVIOR_NUMBER_OF_REPLICAS, num_replicas);
And that's all you need to do! If you want to store a key with multiple replicas, you would go ahead and store it using the repl instance. For "normal" items, you would use the memc instance:
size_t vlen;
uint32_t flags;
memcached_return rc;
void *value;
/* Store a key with replicas: */
memcached_set(repl, "replicated", 10, "foo", 3, 0, 0);
/* Try to get the item (or the replicas if we have problems talking to the master) */
value = memcached_get(repl, "replicated", 10, &vlen, &flags, &rc);
/* Store a key without replicas */
memcached_set(memc, "single", 6, "foo", 3, 0, 0);
/* Try to get the item */
value = memcached_get(memc, "single", 6, &vlen, &flags, &rc);
/* We can also get the master of a replicated item: */
value = memcached_get(memc, "replicated", 10, &vlen, &flags, &rc);
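The "n next servers" rule from the design description can be sketched in a few lines of C. This is only an illustration: replica_server_index is a hypothetical helper, not a libmemcached function, and it assumes that "next" means the next entry in the client's server list, wrapping around at the end of the list.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libmemcached).
 * master:      index of the server the key normally hashes to
 * replica:     0 for the master copy, 1..n for the replicas
 * num_servers: number of servers in the client's list (must be > 0)
 */
size_t replica_server_index(size_t master, size_t replica, size_t num_servers)
{
    /* Assumption: replicas go to the following servers, wrapping around. */
    return (master + replica) % num_servers;
}
```

With three servers and the master at index 2, the first replica would land on index 0 and the second on index 1 under this assumption.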
Posted at 11:50AM Jun 25, 2009 by trond in Memcached | Comments[0]
Sunday June 07, 2009
Compiling Drizzle on OpenSolaris 2009.06
I thought it would be appropriate to write a new and updated blog post on how to compile Drizzle now that OpenSolaris 2009.06 is released. To make the blog more copy'n'paste friendly I have removed the prompt from all of the commands I am displaying :-)
The first thing we need to do is to install a compiler, and all of the common tools used to build open source projects. Drizzle also requires libevent and gperf, and there are precompiled packages for them. So let's go ahead and install the software with the following command:
pfexec pkg install ss-dev SUNWlibevent SUNWgnu-gperf
I like to put the software I compile in separate ZFS filesystems, so let's go ahead and create:
/opt/dscm - To hold the scm systems
/opt/drizzle - This is where we want our Drizzle installation
/opt/gearman - This is where we want our Gearman installation
"Why not just put everything in /usr/local?" you may ask. Well, I don't like that because then I have a hard time figuring out what files to remove when I want to uninstall a package. "This must turn into a long and complex path?" would probably be your next question. The answer is no. Just create the appropriate symbolic links and you are good to go :-)
So let's go ahead and create the ZFS filesystems:
for f in dscm drizzle gearman google
do
  pfexec zfs create -o mountpoint=/opt/$f rpool/$f
  pfexec chown `/usr/bin/id -u`:`/usr/bin/id -g` /opt/$f
done
Drizzle, Gearman and libmemcached all use Bazaar for development, and there isn't a package available for OpenSolaris, so we need to install it ourselves. The Bazaar team is really active and uses the "release early, release often" model, and I want an easy way to keep up with the versions. Instead of having zombie files / versions lying around, I ended up with a model where I install each version into its own directory, and I have a symbolic link to the version I want to use. Because we install in a "nonstandard" location, we need to create a startup script so that Python can find the modules. So let's go ahead and install Bazaar (1.15 is the latest stable version right now):
wget --no-check-certificate http://launchpad.net/bzr/1.15/1.15final/+download/bzr-1.15.tar.gz
gtar xfz bzr-1.15.tar.gz
cd bzr-1.15
python setup.py install --prefix=/opt/dscm/bazaar-1.15
mkdir /opt/dscm/bin
cat > /opt/dscm/bin/bzr <<EOF
#! /bin/ksh
export PYTHONPATH=/opt/dscm/bazaar/lib/python2.4/site-packages
exec /opt/dscm/bazaar/bin/bzr "\$@"
EOF
chmod a+x /opt/dscm/bin/bzr
ln -s bazaar-1.15 /opt/dscm/bazaar
cd ..
rm -rf bzr-1.15.tar.gz bzr-1.15
The next time you want to upgrade Bazaar, all you need to do is to move the symbolic link /opt/dscm/bazaar to point to the new version. You can now either put /opt/dscm/bin into your path, or you can create something like /opt/local/bin and create a symbolic link to /opt/dscm/bin/bzr from there (and then put /opt/local/bin in your path). To avoid path problems, I'll keep on referring to bzr with its absolute path throughout the example.
For some reason OpenSolaris doesn't contain a prebuilt 64-bit version of GNU readline, so we need to compile that ourselves (it is scheduled for an upcoming build AFAIK). To keep the example simple, I'll just install the readline library into /opt/drizzle. So just execute the following commands to download, build and install:
wget http://ftp.gnu.org/gnu/readline/readline-6.0.tar.gz
gtar xfz readline-6.0.tar.gz
cd readline-6.0
./configure --disable-static --prefix=/opt/drizzle
gmake all install
gmake clean
./configure --disable-static --prefix=/opt/drizzle --libdir=/opt/drizzle/lib/`isainfo -k` CFLAGS="-m64"
gmake all install
ln -s `isainfo -k` /opt/drizzle/lib/64
ln -s . /opt/drizzle/lib/32
cd ..
rm -rf readline-6.0.tar.gz readline-6.0
"Stop! Why do you build it two times?" If you look at the options, you will see that I compile one version with "-m64", and that option will create a 64-bit binary. Most people would probably not care for the 32-bit binary, but I like to build both versions when I build a library (so that I don't have problems later on if I want to build a 32-bit or 64-bit binary using the library). The reason for the two symbolic links I create at the end is explained in chapter 32-bit and 64-bit Libraries.
Drizzle uses Google Protocol Buffers in the communication protocol, so let's go ahead and compile them. I don't use the latest version, because there is a compilation error in that version (and I haven't had the time to look at that yet):
wget http://protobuf.googlecode.com/files/protobuf-2.0.3.tar.gz
gtar xfz protobuf-2.0.3.tar.gz
cd protobuf-2.0.3
./configure --disable-static --with-zlib --prefix=/opt/google CPPFLAGS="-fast -m32" LDFLAGS="-fast" \
--bindir=/opt/google/bin/i86
gmake all install
gmake clean
./configure --disable-static --with-zlib --prefix=/opt/google CPPFLAGS="-fast -m64" LDFLAGS="-fast -m64" \
--libdir=/opt/google/lib/`isainfo -k` --bindir=/opt/google/bin/`isainfo -k`
gmake all install
cd ..
ln -s `isainfo -k` /opt/google/lib/64
ln -s . /opt/google/lib/32
cp /usr/lib/isaexec /opt/google/bin/protoc
rm -rf protobuf-2.0.3.tar.gz protobuf-2.0.3
With all the dependencies installed, we can go ahead and grab the source for libmemcached, libdrizzle, Gearman and Drizzle:
for f in libdrizzle gearmand libmemcached drizzle
do
  /opt/dscm/bin/bzr branch lp:$f
done
So let's go ahead and start building them. libdrizzle is first up:
cd libdrizzle
./config/autorun.sh
./configure --disable-static --prefix=/opt/drizzle CFLAGS="-fast -m32" LDFLAGS="-fast"
gmake all install
./configure --disable-static --prefix=/opt/drizzle --libdir=/opt/drizzle/lib/`isainfo -k` CFLAGS="-fast -m64" LDFLAGS="-fast"
gmake clean
gmake all install
cd ..
The next one on the list is libmemcached:
cd libmemcached
./config/bootstrap
PATH=$PATH:/usr/perl5/bin ./configure --disable-static --prefix=/opt/drizzle CFLAGS="-fast -m32" LDFLAGS="-fast" \
--without-memcached --bindir=/opt/drizzle/bin/i86
gmake all install
PATH=$PATH:/usr/perl5/bin ./configure --enable-64bit --disable-static --prefix=/opt/drizzle \
--libdir=/opt/drizzle/lib/`isainfo -k` CFLAGS="-fast" LDFLAGS="-fast" --without-memcached --bindir=/opt/drizzle/bin/`isainfo -k`
gmake clean
gmake all install
for f in memcat memrm memcp memerror memflush memslap memstat
do
cp /usr/lib/isaexec /opt/drizzle/bin/$f
done
cd ..
There is a problem with the configure script for Gearman: it is not able to create a 32-bit binary on a machine capable of running in 64-bit mode. From now on we will therefore only create 64-bit binaries (I will work on a patch for this):
cd gearmand
./config/bootstrap
./configure --prefix=/opt/gearman --disable-static --sbindir=/opt/gearman/sbin/`isainfo -k` --libdir=/opt/gearman/lib/`isainfo -k` \
--bindir=/opt/gearman/bin/`isainfo -k` CFLAGS="-fast -I/opt/drizzle/include -m64" \
LDFLAGS="-L/opt/drizzle/lib/64 -R/opt/drizzle/lib/64"
gmake clean
gmake all install
cd ..
cp /usr/lib/isaexec /opt/gearman/sbin/gearmand
cp /usr/lib/isaexec /opt/gearman/bin/gearman
Before we can start compiling Drizzle we need to make sure that Drizzle can detect our PCRE installation. OpenSolaris ships with a version that is too new for the Drizzle configure script, so we need to create a symbolic link to make sure it detects it properly:
pfexec ln -s pcre/pcre.h /usr/include/pcre.h
Now all is set for compiling Drizzle:
cd drizzle
PATH=$PATH:/opt/dscm/bin ./config/autorun.sh
PATH=$PATH:/opt/google/bin ./configure CPPFLAGS="-I/opt/google/include -I/opt/gearman/include -I/opt/drizzle/include" \
LDFLAGS="-L/opt/google/lib/64 -L/opt/gearman/lib/64 -L/opt/drizzle/lib/64 -R/opt/drizzle/lib/64:/opt/gearman/lib/64:/opt/google/lib/64" \
--prefix=/opt/drizzle --libdir=/opt/drizzle/lib/`isainfo -k`
PATH=$PATH:/opt/google/bin gmake all install
Now you should have Drizzle installed in /opt/drizzle. If you look in some of my previous blog posts you should be able to find out how to install it as an SMF service :-)
Cheers
Posted at 08:41PM Jun 07, 2009 by trond in OpenSolaris | Comments[0]
Monday May 25, 2009
Manage Gearmand and Drizzle with SMF
If you are running Gearman or Drizzle on Solaris, you may want to let SMF start and monitor the services. I pushed service definitions and scripts to install them a couple of days ago.
If you look in the scripts directory in Gearman, you will see a script named smf_install.sh. If you run this script it will define a user and group named gearmand, create the Gearman authorizations and profile before a service named gearman is defined. To start the Gearman service all you need to do is to run:
trond@storm ~> svcadm enable gearman
For Drizzle you need to look in the support-files/smf directory for a script named install.sh. To start Drizzle all you need to do is to run:
trond@storm ~> svcadm enable drizzle
Posted at 06:06PM May 25, 2009 by trond in OpenSolaris | Comments[0]
Tuesday May 12, 2009
Connection pooling libmemcached
A while back I looked at the Memcached UDF for MySQL, and noticed that it didn't use libmemcached in an optimal way. In order to work in a multithreaded environment it used the following pattern:
memcached_st* clone = memcached_clone(NULL, memc);
/* ... memcached operations using the clone ... */
memcached_free(clone);
Well, that doesn't look bad, does it? Well, it isn't that bad, but if you look at the network traffic you will see that we end up connecting / disconnecting to the involved memcached servers every time, and memcached is not optimized for "single-shot" connections.
So how should you solve this? Well, you should reuse your clones! And luckily for you, you don't have to reinvent the wheel. Yesterday I pushed a patch to libmemcached introducing a new library: libmemcachedutil. The intention of that library is to put utility functions built on top of libmemcached that you might want to use in your application, and the first routine there is the pool functionality.
So let's write some code using the new library:
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include "libmemcached/memcached_util.h"

/* Cleared by the SIGINT handler to make the worker threads exit */
static volatile bool run = true;

static void sig_handler(int sig) {
  assert(sig == SIGINT);
  run = false;
}

static void* my_application_thread(void *arg)
{
  memcached_pool_st* pool = arg;
  while (run) {
    memcached_return rc;
    /* Block until an instance is available in the pool */
    memcached_st* mem = memcached_pool_pop(pool, true, &rc);
    if (mem != NULL) {
      /* ... use the memcached handle for whatever you want! ... */

      /* Return the instance to the pool */
      if (memcached_pool_push(pool, mem) != MEMCACHED_SUCCESS) {
        fprintf(stderr, "Failed to release the memcached instance!\n");
      }
    } else {
      fprintf(stderr, "Failed to get the memcached instance from pool!\n");
    }
  }
  return NULL;
}

int main(int argc, char** argv)
{
  memcached_st* memc = memcached_create(NULL);
  if (memc == NULL) {
    fprintf(stderr, "Failed to create memcached instance\n");
    return 1;
  }
  if (memcached_server_add(memc, "localhost", 11211) != MEMCACHED_SUCCESS) {
    fprintf(stderr, "Failed to add localhost to the server pool\n");
    memcached_free(memc);
    return 1;
  }
  memcached_pool_st* pool= memcached_pool_create(memc, 5, 10);
  if (pool == NULL) {
    fprintf(stderr, "Failed to create connection pool\n");
    memcached_free(memc);
    return 1;
  }
  signal(SIGINT, sig_handler);
  /* create 10 threads to use the pool */
  pthread_t tid[10];
  for (int x= 0; x < 10; ++x) {
    pthread_create(&tid[x], NULL, my_application_thread, pool);
  }
  /* wait for all of the threads to terminate */
  for (int x= 0; x < 10; ++x) {
    pthread_join(tid[x], NULL);
  }
  /* Release allocated resources */
  memcached_pool_destroy(pool);
  memcached_free(memc);
  return 0;
}
Posted at 10:26AM May 12, 2009 by trond in Memcached | Comments[0]
Thursday April 23, 2009
Presentation at the MySQL Users Conference
Earlier today I did the presentation Memcached Meet Flash, the pluggable engine interface, and if you missed it you can download the slides. It is kind of fun to think back on the hackathon at the users conference the last year when Toru shared his ideas about a storage interface, followed by the interesting discussion I had with Matt during the OpenSolaris summit down in Santa Clara. I didn't know back then that I would present this at the users conference this year :-)
My brother came down for my presentation and took the following picture with his iPhone during the session:
If you have any questions regarding the slides, come look me up at the hackathon tonight :)
Posted at 12:38AM Apr 23, 2009 by trond in Memcached | Comments[0]
Sunday April 19, 2009
Using CVS with pserver access with OpenGrok
Today I pushed two fixes into OpenGrok so that you may use OpenGrok on sources you checked out via the pserver protocol in CVS. From a performance perspective I would not recommend that you use this configuration, but it might be good enough for you if you just want to search your own projects.
With the latest development build of OpenGrok installed into /var/opengrok/bin and /var/tomcat6/webapps/source.war I was able to index and browse the PostgreSQL sources. I checked out PostgreSQL into /var/opengrok/source/pgsql, and executed the following commands:
trond@opensolaris> cd /var/opengrok
trond@opensolaris> java -jar bin/opengrok.jar -c /var/opengrok/bin/ctags \
-v -s /var/opengrok/source -d /var/opengrok/data -S -P \
-p /opengrok -n -r on -W /etc/opengrok/configuration.xml
trond@opensolaris> java -Xmx2g -jar bin/opengrok.jar -R /etc/opengrok/configuration.xml -U localhost:2424
Please let me know if you have problems getting this to work (or even better, admit that it is 2009 and move on to a more modern SCM system ;-) )
Posted at 09:28PM Apr 19, 2009 by trond in OpenGrok | Comments[2]
Wednesday April 15, 2009
Using Subversion with OpenGrok
In my previous blog entry Using CVS with OpenGrok I showed the steps needed to configure OpenGrok with CVS, and in this entry I will extend that example to include a project using Subversion.
The first thing we need to do is to install a Subversion client and check out the source code. I don't use Subversion for any of my projects, but Knut Anders is working on Apache Derby (hosted in a Subversion repository) so let's use that in this example.
trond@opensolaris> pfexec pkg install SUNWsvn
trond@opensolaris> cd /var/opengrok/source
trond@opensolaris> svn co https://svn.apache.org/repos/asf/db/derby/code/trunk derby
If we use the browser to navigate to http://localhost:8080/source/xref we will see a new directory named derby. The history links and the selection box in http://localhost:8080/source/ do not work for Derby yet, however, so let's go ahead and update the configuration:
trond@opensolaris> cd /var/opengrok
trond@opensolaris> java -jar /var/opengrok/bin/opengrok.jar -c /var/opengrok/bin/ctags \
-v -s /var/opengrok/source -d /var/opengrok/data -S -P \
-p /opengrok -n -r on -W /etc/opengrok/configuration.xml
(Look at the man page for a description of the different options.)
With the new configuration in place, we can start the index generation:
trond@opensolaris> cd /var/opengrok
trond@opensolaris> java -Xmx2g -jar /var/opengrok/bin/opengrok.jar -R /etc/opengrok/configuration.xml -H
With the new index database in place it is time to update the web application to use the new configuration:
trond@opensolaris> java -Xmx2g -jar /var/opengrok/bin/opengrok.jar -R /etc/opengrok/configuration.xml -n\
-U localhost:2424
Or you could just restart the Tomcat web server:
trond@opensolaris> svcadm restart tomcat6
If you navigate to http://localhost:8080/source/history/derby/README you should get the history for the README file and the annotate link should be available. Subversion supports changesets so you should be able to request history for directories, but the directory information is not cached so this is a potentially slow operation (if you have remote SCM repositories).
Posted at 03:45AM Apr 15, 2009 by trond in OpenGrok | Comments[1]
Friday April 10, 2009
Using CVS with OpenGrok
If you look at the mail archives for OpenGrok it seems that the most popular question out there right now is how to configure OpenGrok with CVS. Personally I have extremely limited experience with cvs, but I guess there are some old projects out there that haven't converted to a distributed scm system yet (check out http://www.selenic.com/mercurial/wiki/index.cgi/RepositoryConversion ;-)). In this blog entry I'll show you how to configure a project using cvs in OpenGrok.
I don't use cvs, so the first thing we need to do is to install cvs and create a cvs repository for our source. An empty cvs repository doesn't help us, so let's import the OpenGrok sources and use them in the example:
trond@opensolaris> pfexec pkg install SUNWcvs
trond@opensolaris> pfexec zfs create -o mountpoint=/cvsroot rpool/cvsroot
trond@opensolaris> pfexec chown trond:staff /cvsroot
trond@opensolaris> cd /cvsroot
trond@opensolaris> export CVSROOT=`pwd`
trond@opensolaris> cvs init
trond@opensolaris> cd /tmp
trond@opensolaris> hg clone ssh://anon@hg.opensolaris.org/hg/opengrok/trunk opengrok
trond@opensolaris> cd opengrok
trond@opensolaris> rm -rf .hg
trond@opensolaris> cvs import -m "Initial import of OpenGrok" opengrok opengrok-trunk start
trond@opensolaris> cd ..
trond@opensolaris> rm -rf opengrok
I have my OpenGrok installation in /var/opengrok with the sources in /var/opengrok/source, so let's check out the sources:
trond@opensolaris> cd /var/opengrok/source
trond@opensolaris> cvs co opengrok
The next thing we need to do is to update the configuration with the knowledge of the new project (and its repository):
trond@opensolaris> cd /var/opengrok
trond@opensolaris> java -jar /var/opengrok/bin/opengrok.jar -c /var/opengrok/bin/ctags \
-v -s /var/opengrok/source -d /var/opengrok/data -S -P \
-p /opengrok -n -W /etc/opengrok/configuration.xml
(Look at the man page for a description of the different options.)
With the new configuration in place, we can start the index generation:
trond@opensolaris> cd /var/opengrok
trond@opensolaris> java -Xmx2g -jar /var/opengrok/bin/opengrok.jar -R /etc/opengrok/configuration.xml
So let's install tomcat and try it out:
trond@opensolaris> pfexec pkg install SUNWtcat
trond@opensolaris> pfexec cp /var/opengrok/source.war /var/tomcat6/webapps
trond@opensolaris> svcadm enable tomcat6
If you navigate to http://localhost:8080/source/history/opengrok/LICENSE.txt you should get the history for the LICENSE file and the annotate link should be available. You should be able to navigate around and look at the change history for the files in your repository. Please note that cvs operates on a per-file basis, so you cannot request history for a directory.
Posted at 02:44PM Apr 10, 2009 by trond in OpenGrok | Comments[3]
Thursday April 02, 2009
Pluggable hashing algorithm in memcached?
In my blog post How well is your hash table working for you?, I pointed out that your keys could give you a bad distribution in the internal hash table inside memcached. Right now there is not much you can do apart from using another algorithm to generate your keys, but that may not be the easiest thing to do. Wouldn't it be cooler if you could just use another hashing algorithm instead?
I talked with Brian Aker on IRC the other day, and he pointed out that libmemcached contains a handful of different algorithms (and is covered by the same license as the memcached server), so we could actually use the hashing routines from libmemcached in the server. The first thing we should do is probably to create a "hashing benchmark tool" in libmemcached that reads an input file of keys and determines the best hashing algorithm to use based upon speed and distribution. With the benchmark in place we could add a new configure option to memcached --with-hashing-algorithm=algorithm (this would of course require that we have libmemcached installed).
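To give an idea of what the "distribution" half of such a benchmark could measure, here is a minimal sketch in C. Everything in it is an assumption made for illustration: longest_chain and hash_first_byte are hypothetical names, the 32-bit FNV-1a function is written out by hand rather than taken from libmemcached, and the longest bucket chain is just one possible quality metric.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t (*hash_fn)(const char *key, size_t len);

/* 32-bit FNV-1a, written out here for illustration. */
uint32_t hash_fnv1a_32(const char *key, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Deliberately poor hash: only looks at the first byte of the key. */
uint32_t hash_first_byte(const char *key, size_t len)
{
    return len > 0 ? (unsigned char)key[0] : 0;
}

/*
 * Hypothetical metric: hash every key into nbuckets buckets and return the
 * length of the longest chain. Returns 0 if nbuckets is 0 or if the
 * allocation fails.
 */
size_t longest_chain(const char **keys, size_t nkeys, hash_fn hash, size_t nbuckets)
{
    if (nbuckets == 0) {
        return 0;
    }
    size_t *counts = calloc(nbuckets, sizeof(*counts));
    if (counts == NULL) {
        return 0;
    }
    size_t longest = 0;
    for (size_t i = 0; i < nkeys; i++) {
        size_t bucket = hash(keys[i], strlen(keys[i])) % nbuckets;
        if (++counts[bucket] > longest) {
            longest = counts[bucket];
        }
    }
    free(counts);
    return longest;
}
```

With the poor hash, keys such as "user:1" to "user:4" all start with the same byte and end up in one bucket, so longest_chain reports 4 for them. A real tool would also have to time each function, which this sketch leaves out.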
With the possibility to change the hashing algorithm in the server, I would love to take this one step further (I hate compile-time settings, because they make life hard for people shipping binaries). What if we could dynamically change the hashing algorithm on the server without invalidating the existing cache? Wouldn't that be cool? Since memcached supports dynamic hash expansion, it shouldn't be hard to change the hash function as well. If you take a quick look at the function assoc_find located in assoc.c you will see that if expanding is set, we need to search in the old hash table instead of the new one. This is the place where we should add our logic: if the hash function has changed (and we haven't repopulated the complete hash yet), we need to recompute the hash with the old hash function.
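The lookup rule described above could look roughly like the sketch below. It is not the memcached source: the table struct, table_find, table_migrate_one and the two toy hash functions are hypothetical, the table is tiny and fixed-size, and the sketch assumes that migration moves one old bucket at a time in index order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8                  /* tiny table, illustration only */

typedef unsigned int (*key_hash_fn)(const char *key);

typedef struct item {
    struct item *next;              /* next item in the same bucket */
    const char *key;
} item;

typedef struct {
    item *old_table[NBUCKETS];      /* filled using old_hash */
    item *new_table[NBUCKETS];      /* filled using new_hash */
    key_hash_fn old_hash;
    key_hash_fn new_hash;
    bool migrating;                 /* true while both tables are in use */
    size_t migrate_bucket;          /* old buckets below this are already moved */
} table;

/* Two toy hash functions so the example has something to switch between. */
unsigned int hash_first_char(const char *key) { return (unsigned char)key[0]; }
unsigned int hash_sum(const char *key)
{
    unsigned int h = 0;
    while (*key != '\0') h += (unsigned char)*key++;
    return h;
}

static item *search(item *it, const char *key)
{
    while (it != NULL && strcmp(it->key, key) != 0) it = it->next;
    return it;
}

/* Use the old hash function for buckets that have not been migrated yet. */
item *table_find(const table *t, const char *key)
{
    if (t->migrating) {
        size_t ob = t->old_hash(key) % NBUCKETS;
        if (ob >= t->migrate_bucket) return search(t->old_table[ob], key);
    }
    return search(t->new_table[t->new_hash(key) % NBUCKETS], key);
}

/* Move one old bucket into the new table, rehashing with new_hash. */
void table_migrate_one(table *t)
{
    if (!t->migrating) return;
    item *it = t->old_table[t->migrate_bucket];
    while (it != NULL) {
        item *next = it->next;
        size_t nb = t->new_hash(it->key) % NBUCKETS;
        it->next = t->new_table[nb];
        t->new_table[nb] = it;
        it = next;
    }
    t->old_table[t->migrate_bucket] = NULL;
    if (++t->migrate_bucket == NBUCKETS) t->migrating = false;
}
```

The point of the sketch is that a lookup stays correct at every step of the migration, because the old hash function decides which table to search until the key's old bucket has been moved.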
Anyone up for the challenge of implementing:
Posted at 07:31PM Apr 02, 2009 by trond in Memcached | Comments[1]
Saturday March 28, 2009
How well is your hash table working for you?
If you look at the internals of memcached you will find a large hash table where all of the items are stored (we hash to a bucket before we do a linear search in a linked list to try to locate the item). Memcached is supposed to grow the hash table automatically to avoid having too long lists in each bucket (that would kill the performance), but wouldn't it be cool to know how well the hash table works for you? Well, that's really easy to find out with dtrace!
So let's look at the user-supplied-bugs in libmemcached as an example. I already have memcached running on my server, so I just open up a new terminal and start the following dtrace one-liner:
trond@opensolaris> pfexec dtrace -n ':::assoc-find { @a = quantize(arg2);}'
dtrace: description ':::assoc-find ' matched 1 probe
In another terminal I just type the following commands:
trond@opensolaris> export MEMCACHED_SERVERS=localhost:11211
trond@opensolaris> ./testapp user
servers localhost:11211
localhost : 11211
user
Testing user_supplied_bug1 0.993 [ ok ]
Testing user_supplied_bug2 0.009 [ ok ]
Testing user_supplied_bug3 0.041 [ ok ]
Testing user_supplied_bug4 0.000 [ ok ]
Testing user_supplied_bug5 0.192 [ ok ]
Testing user_supplied_bug6 0.053 [ ok ]
Testing user_supplied_bug7 0.031 [ ok ]
Testing user_supplied_bug8 0.000 [ ok ]
Testing user_supplied_bug9 0.004 [ ok ]
Testing user_supplied_bug10 2.572 [ ok ]
Testing user_supplied_bug11 2.555 [ ok ]
Testing user_supplied_bug12 0.000 [ ok ]
Testing user_supplied_bug13 0.008 [ ok ]
Testing user_supplied_bug14 2.607 [ ok ]
Testing user_supplied_bug15 0.000 [ ok ]
Testing user_supplied_bug16 0.008 [ ok ]
Testing user_supplied_bug18 0.003 [ ok ]
Testing user_supplied_bug19 0.000 [ ok ]
Testing user_supplied_bug20 0.010 [ ok ]
All tests completed successfully
So let's terminate dtrace and look at the results:
^C
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 245648
1 | 589
2 | 35
4 | 0
So what does this tell us? In almost all of our requests the interesting item is the first item in the bucket :-)
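To make the numbers concrete, the sketch below shows the kind of lookup being measured. It assumes that the value being quantized (arg2 of the assoc-find probe) is the number of items skipped in the bucket's chain before the lookup ended; that is my reading of the output above, not something taken from the probe definition. bucket_find and the item struct are simplified stand-ins, not the memcached source.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a hash table entry; not the real memcached item. */
typedef struct item {
    struct item *next;   /* next item in the same bucket */
    const char *key;
} item;

/*
 * Walk one bucket's chain looking for key. *depth receives the number of
 * items that were skipped before the search ended (0 = first item matched).
 */
item *bucket_find(item *head, const char *key, int *depth)
{
    int skipped = 0;
    item *it = head;
    while (it != NULL && strcmp(it->key, key) != 0) {
        it = it->next;
        ++skipped;
    }
    *depth = skipped;
    return it;
}
```

Under that assumption, a depth of 0 corresponds to the row with 245648 hits in the histogram, and the few hundred lookups in the rows for 1 and 2 had to walk past one or more items first.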
Posted at 09:48AM Mar 28, 2009 by trond in Memcached | Comments[2]
Friday March 27, 2009
Debugging memcached UDF for MySQL
I got an email yesterday about a user experiencing problems when using memcached UDF for MySQL, so today I spent some time trying to recreate the problem. It turned out that the bug was caused by using uninitialized memory, so I thought it would make a good blog entry to document how I found it...
The first thing I did was to compile and install libmemcached and the udf:
trond@opensolaris> ./config/bootstrap
trond@opensolaris> ./configure --prefix=/opt/memcached --enable-dependency-tracking --enable-debug --without-memcached
trond@opensolaris> gmake all install
trond@opensolaris> ./config/bootstrap
trond@opensolaris> ./configure --prefix=/opt/memcached --with-mysql=/usr/mysql/bin/mysql_config --with-libmemcached=/opt/memcached CFLAGS=-g
trond@opensolaris> gmake all install
The next thing to do is to instruct MySQL to look in /opt/memcached/lib for dynamic libraries by adding the following line in my.cfg (trond@opensolaris> pfexec vi /etc/mysql/my.cfg):
plugin_dir=/opt/memcached/lib
I wanted to use libumem.so to do memory checking in MySQL/libmemcached/UDFs, and the easiest way to do this is to just replace the mysql binary with the following script:
trond@opensolaris> cd /usr/mysql/bin
trond@opensolaris> pfexec mv mysqld mysqld.bin
trond@opensolaris> cat /tmp/mysqld
#! /bin/ksh
export UMEM_DEBUG=default
export UMEM_LOGGING=transaction
export LD_PRELOAD=libumem.so
exec /usr/mysql/bin/mysqld.bin --skip-stack-trace "$@"
trond@opensolaris> mv /tmp/mysqld .
trond@opensolaris> chmod +x mysqld
So let's start MySQL, verify that it uses libumem, and locate the current directory (so we know where to look for the corefiles).
trond@opensolaris> svcadm enable mysql
trond@opensolaris> pfexec pldd `pgrep -x mysqld.bin`
8790: /usr/mysql/bin/mysqld.bin --skip-stack-trace --user=mysql --datadir=/v
/lib/libumem.so.1
/usr/lib/libmtmalloc.so.1
/usr/lib/libCrun.so.1
/lib/librt.so.1
/lib/libz.so.1
/lib/libdl.so.1
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libgen.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libm.so.2
/usr/lib/libCstd.so.1
/usr/lib/libc/libc_hwcap1.so.1
trond@opensolaris> pfexec pwdx `pgrep -x mysqld.bin`
8790: /var/mysql/5.0/data
So let's install the UDFs:
trond@opensolaris> /usr/mysql/bin/mysql -u root < install_functions.sql
trond@opensolaris> /usr/mysql/bin/mysql -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.67 Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> select memc_servers_set('localhost:11211');
mysql> select memc_get("foobar");
ERROR 2006 (HY000): MySQL server has gone away
So the server went away? I guessed it dumped core, so let's look for a corefile:
trond@opensolaris> pfexec su -
root@razor:/var/mysql/data# ls -l
total 81452
-rw------- 1 mysql mysql 62261434 2009-03-27 20:44 core.mysqld.bin.8790
-rw-rw---- 1 mysql mysql 10485760 2009-03-27 20:38 ibdata1
-rw-rw---- 1 mysql mysql 5242880 2009-03-27 20:44 ib_logfile0
-rw-rw---- 1 mysql mysql 5242880 2009-02-16 10:58 ib_logfile1
drwx------ 2 mysql mysql 53 2009-02-16 10:58 mysql
-rw-rw---- 1 mysql mysql 5 2009-03-27 20:44 razor.pid
drwx------ 2 mysql mysql 125 2009-02-16 11:24 test
So let's start debugging the corefile to try to figure out what happened:
root@razor:/var/mysql/data# dbx - core.mysqld.bin.8790
Corefile specified executable: "/usr/mysql/5.0/bin/mysqld.bin"
For information about new features see `help changes'
Reading mysqld.bin
core file header read successfully
Reading ld.so.1
Reading libumem.so.1
Reading libmtmalloc.so.1
Reading libCrun.so.1
Reading librt.so.1
Reading libz.so.1
Reading libdl.so.1
Reading libpthread.so.1
Reading libthread.so.1
Reading libgen.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libm.so.2
Reading libCstd.so.1
Reading libc.so.1
Reading libmemcached_functions_mysql.so.0.0.0
Reading libmemcached.so.2.0.0
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd.so.1
Reading libmp.so.2
t@13 (l@13) terminated by signal SEGV (no mapping at the fault address)
Current function is memcached_quit_server
   14   if (ptr->fd != -1)
It seems that the pointer is invalid, so let's take a look at it:
(dbx) print ptr
ptr = 0xbaddcafe
According to the libumem documentation, with debugging enabled it fills memory with the pattern 0xbaddcafe when you allocate it, and with 0xdeadbeef when you free it. To me this sounds like we are using uninitialized memory. Let's take a look at the callstack:
(dbx) where
current thread: t@13
=>[1] memcached_quit_server(ptr = 0xbaddcafe, io_death = '\0'), line 14 in "memcached_quit.c"
  [2] memcached_quit(ptr = 0x9e2d450), line 65 in "memcached_quit.c"
  [3] memcached_free(ptr = 0x9e2d450), line 41 in "memcached.c"
  [4] memc_get_deinit(initid = 0x8963edc), line 82 in "get.c"
  [5] Item_udf_func::cleanup(0x8963e40, 0x1, 0x871f2b9, 0x822958e), at 0x81bbfa4
  [6] THD::cleanup_after_query(0x9e34008, 0x9e34008, 0x8963d18, 0x19), at 0x82295f3
  [7] dispatch_command(0x3, 0x9e34008, 0x9e41589, 0x1a), at 0x825ab32
  [8] handle_one_connection(0x9e34008, 0xce76f000, 0xce4fefec, 0xce6dca5e), at 0x8256f96
  [9] _thrp_setup(0xce804a00), at 0xce6dca96
  [10] _lwp_start(0xcdc8eaff, 0xce6d540a, 0x40, 0x64, 0x40, 0xce804a00), at 0xce6dcd20
I think we should start by looking at frame 4 (frames 1, 2 and 3 are inside libmemcached, but frame 4 is the first "external" call into libmemcached):
(dbx) frame 4
Current function is memc_get_deinit
   82   memcached_free(&container->memc);
(dbx) print container
container = 0x9e2d448
(dbx) examine container/10
0x09e2d448: 0xbaddcafe 0xbaddcafe 0xbaddcafe 0xbaddcafe
0x09e2d458: 0xbaddcafe 0xbaddcafe 0xbaddcafe 0xbaddcafe
0x09e2d468: 0xbaddcafe 0xbaddcafe
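When you stare at dumps like this one, it helps to have the two fill patterns at your fingertips. Here is a minimal sketch of a helper that classifies a 32-bit word from a dump. The names classify_word, PATTERN_UNINITIALIZED and PATTERN_FREED are made up for this illustration and are not part of libumem; only the two pattern values come from the text above.

```c
#include <stdint.h>

/* Fill patterns mentioned above (names are made up for this sketch). */
#define PATTERN_UNINITIALIZED 0xbaddcafeU  /* written on allocation */
#define PATTERN_FREED         0xdeadbeefU  /* written on free */

enum word_kind {
    WORD_UNINITIALIZED,  /* allocated, but never written by the program */
    WORD_FREED,          /* belongs to memory that was already freed */
    WORD_OTHER           /* no fill pattern, probably real data */
};

/* Hypothetical helper: say what a 32-bit word from a dump most likely is. */
enum word_kind classify_word(uint32_t word)
{
    if (word == PATTERN_UNINITIALIZED) {
        return WORD_UNINITIALIZED;
    }
    if (word == PATTERN_FREED) {
        return WORD_FREED;
    }
    return WORD_OTHER;
}
```

Every word in the dump above would come back as WORD_UNINITIALIZED.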
It looks like the complete container-structure isn't initialized at all. Luckily we can find out where it was allocated:
root@razor:/var/mysql/data# mdb core.mysqld.bin.8790
Loading modules: [ libumem.so.1 libuutil.so.1 ld.so.1 ]
> $G
C++ symbol demangling enabled
> ::umalog
T-0.000000000 addr=9e2d440 umem_alloc_896
libumem.so.1`umem_cache_alloc_debug+0x144
libumem.so.1`umem_cache_alloc+0x19a
libumem.so.1`umem_alloc+0xcd
libumem.so.1`malloc+0x2a
libmemcached_functions_mysql.so.0.0.0`memc_get_init+0x73
bool udf_handler::fix_fields+0x66c
bool Item_udf_func::fix_fields+0x2a
bool setup_fields+0xe8
int JOIN::prepare+0x1e0
bool mysql_select+0x33e
bool handle_select+0xf7
int mysql_execute_command+0x4ac3
bool dispatch_command+0x2b9b
handle_one_connection+0x516
libc_hwcap1.so.1`_thrp_setup+0x7e
[ ... cut ... ]
Luckily for us this was the most recent memory allocation (you might have to search a loooong list to find the allocation you looked for), and we can see that the allocation came from memc_get_init. It was pretty easy to spot the allocation, since there is only one in the function:
[... cut ...]
  container= (memc_function_st *)malloc(sizeof(memc_function_st));

  rc= memc_get_servers(&container->memc);
  memcached_result_create(&container->memc, &container->results);

  initid->ptr= (char *)container;

  return 0;
}
Here we spot bug number 1: we don't check the return value from memc_get_servers, and if you read on you will see that this is exactly what bites us. So let's look at the function memc_get_servers:
int memc_get_servers(memcached_st *clone)
{
  int retval;
  memcached_st *test;

  pthread_mutex_lock(&memc_servers_mutex);
  test= memcached_clone(clone, master_memc);
  pthread_mutex_unlock(&memc_servers_mutex);

  retval= test ? 1 : 0;
  return retval ;
}
This seems correct enough, so let's look at the memcached_clone function:
memcached_st *memcached_clone(memcached_st *clone, memcached_st *source)
{
  memcached_return rc= MEMCACHED_SUCCESS;
  memcached_st *new_clone;

  if (source == NULL)
    return memcached_create(clone);

  if (clone && clone->is_allocated)
  {
    return NULL;
  }
[ ... cut ... ]
Now here is something interesting. We check the is_allocated member of the clone to decide if we should abort the cloning process. Remember that we pass in a memory chunk we allocated with malloc, so we don't know the value of this member (==> undefined behavior). In this run libumem filled the chunk with 0xbaddcafe, so the member is non-zero and memcached_clone bails out. We do return the "error" to the caller, but the caller doesn't check the return code, so we end up with a structure of uninitialized memory (and all use of it will be "undefined").
The manual section for memcached_clone isn't clear on the above fact, so I am going to update the documentation. It is not difficult to fix the source: just replace the call to (memc_function_st *)malloc(sizeof(memc_function_st)) with calloc(1, sizeof(memc_function_st)) (no need for the cast there). I'll be pushing a patch to the author of the UDFs.
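Here is a minimal sketch of what the corrected allocation could look like. It deliberately does not use libmemcached: handle_st, container_st, clone_into and container_init are stand-ins invented for this example, and clone_into only mimics the is_allocated check shown above (unlike the real memcached_clone it simply refuses a NULL clone). It shows the two changes together: zeroed memory from calloc, and a return value that is actually checked.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for memcached_st; only the flag discussed above is modelled. */
typedef struct {
    bool is_allocated;
    int  number_of_hosts;
} handle_st;

/* Stand-in for memc_function_st. */
typedef struct {
    handle_st memc;
} container_st;

/*
 * Stand-in for memcached_clone(): refuses to fill in a caller-supplied
 * structure whose is_allocated flag is set, like the excerpt above.
 */
static handle_st *clone_into(handle_st *clone, const handle_st *source)
{
    if (clone == NULL || clone->is_allocated) {
        return NULL;
    }
    clone->number_of_hosts = source->number_of_hosts;
    return clone;
}

/*
 * Allocate and initialize a container. Returns NULL on any failure, so
 * the caller never gets a half-initialized structure.
 */
container_st *container_init(const handle_st *master)
{
    /* calloc returns zeroed memory, so is_allocated starts out false */
    container_st *container = calloc(1, sizeof(*container));
    if (container == NULL) {
        return NULL;
    }

    /* bug number 1: check the result of the clone instead of ignoring it */
    if (clone_into(&container->memc, master) == NULL) {
        free(container);
        return NULL;
    }
    return container;
}
```

With malloc in place of calloc, the flag would hold whatever happened to be in memory, and clone_into could refuse the structure just like memcached_clone did in the core dump.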
Posted at 10:02PM Mar 27, 2009 by trond in Memcached | Comments[3]
Wednesday March 25, 2009
Drizzle Developer Day 2009
The Drizzle Developer Day 2009 is scheduled for the day after the MySQL users conference. I am really looking forward to talking to you, so please sign up today :-)
Posted at 10:14PM Mar 25, 2009 by trond in Personal | Comments[0]
Bazaar plugin for Hudson
I have been using Hudson to build various software projects I am working on for some time and I really like it, so one of the first things I did when I started compiling Drizzle on OpenSolaris was to configure a new build target on my server.
Google pointed me to a Bazaar plugin, but unfortunately it didn't work well in the "master-slave" configuration. I looked at the source of the plugin to try to make it work, but I soon found out that it was a lot faster to just clone the Mercurial plugin and adapt it to Bazaar.
I pushed the plugin earlier today, so you may browse the source code and compile it yourself if you like (or wait for it to be listed in the available plugins for Hudson).
We are using this plugin on Drizzle to monitor the builds on one server while compiling Drizzle on other machines.
Posted at 10:07PM Mar 25, 2009 by trond in OpenSolaris | Comments[1]
Thursday March 05, 2009
Adding debugging functions to your dbx session
Have you ever been sitting in a debugging session thinking: "arg, why don't I have a function doing xyz"? I know I have! Luckily for us we don't have to terminate the debugging session, build a new version of the application with the function available and try to recreate the debugging session. There is an easy way to extend our debugging session with new functions :-)
Talking is one thing, but developers don't believe anything before they see the code. So let's go ahead and create an example.
trond@opensolaris> nl -ba main.c
     1  #include <stdio.h>
     2
     3  struct item {
     4     /* the interesting data */
     5     struct item* next;
     6  };
     7
     8  int main(int argc, char** argv) {
     9     struct item items[10];
    10     for (int ii = 0; ii < 10; ++ii) {
    11        items[ii].next = items + ii + 1;
    12     }
    13     /* terminate the list */
    14     items[9].next = NULL;
    15
    16     /* let's create a loop */
    17     items[9].next = &items[7];
    18
    19     return 0;
    20  }
trond@opensolaris> cc -o testprogram -g main.c -ldl
So let's start a debugging session, and set a breakpoint at line 17 in main.c:
trond@opensolaris> dbx testprogram
Reading testprogram
Reading ld.so.1
Reading libdl.so.1
Reading libc.so.1
(dbx) stop at 17
(2) stop at "main.c":17
(dbx) run
Running: testprogram
(process id 17090)
stopped in main at line 17 in file "main.c"
   17   items[9].next = &items[7];
(dbx)
So how do we verify that we don't have a loop in this list??? The first thing we need to do is to create a small C-function to do loop detection and compile it into a shared object:
trond@opensolaris> nl -ba looptest.c
     1  #include <stdio.h>
     2
     3  struct item {
     4     /* the interesting data */
     5     struct item* next;
     6  };
     7
     8  int looptest(struct item* root) {
     9     struct item* lookahead = root;
    10
    11     while (root != NULL) {
    12        if (lookahead != NULL && lookahead->next != NULL) {
    13           lookahead = lookahead->next->next;
    14        } else {
    15           lookahead = NULL;
    16        }
    17        if (root == lookahead) {
    18           /* loop detected */
    19           return 1;
    20        } else {
    21           root = root->next;
    22        }
    23     }
    24
    25     /* no loop */
    26     return 0;
    27  }
trond@opensolaris> cc -o looptest.so -G -KPIC -g looptest.c
The trick is that we can use the call command in dbx to call a function from within the process we are debugging, and the function we want to call is dlopen. Why? dlopen will load the functions in the shared object into the address space of the process so that we can call them. Let's jump back to the debugging session:
(dbx) call dlopen("./looptest.so", 0x102)
Reading looptest.so
stopped in main at line 17 in file "main.c"
17 items[9].next = &items[7];
So what is 0x102? Well, that is the result of RTLD_NOW | RTLD_GLOBAL (check the dlopen manual page for more info). Now we can call the looptest function from our debugging session:
(dbx) print looptest(items)
looptest(items) = 0
So let's continue the debugging and execute the next line that creates a loop in the list:
(dbx) next
stopped in main at line 19 in file "main.c"
   19   return 0;
(dbx) print looptest(items)
looptest(items) = 1
One small caveat is that you have to link your application with -ldl for this to work....
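For completeness, the same trick works from regular C code too. The sketch below is roughly what we asked dbx to do for us: load a shared object with dlopen and look up a function in it with dlsym. load_function and looptest_fn are names made up for this example; dlopen, dlsym and the RTLD_ flags are the standard interfaces from <dlfcn.h>. The numeric value 0x102 is what the flags work out to on the system used here, and other platforms may use different values, so stick to the symbolic names in code.

```c
#include <dlfcn.h>
#include <stddef.h>

struct item {
    /* the interesting data */
    struct item *next;
};

/* Signature of the looptest function from looptest.c */
typedef int (*looptest_fn)(struct item *);

/*
 * Hypothetical helper: load the shared object at `path` and return a
 * pointer to the function named `symbol`, or NULL if either step fails.
 * The handle is intentionally never closed, so the code stays mapped.
 */
looptest_fn load_function(const char *path, const char *symbol)
{
    looptest_fn fn = NULL;

    /* same flags as the 0x102 passed to dlopen in the dbx session */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        return NULL;
    }

    /* the usual idiom for storing dlsym's void* in a function pointer */
    *(void **)(&fn) = dlsym(handle, symbol);
    return fn;
}
```

Calling load_function("./looptest.so", "looptest") and then invoking the returned pointer with items would give the same answers as the print commands in the dbx session. As with the debugger trick, you may have to link with -ldl.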
Posted at 12:04PM Mar 05, 2009 by trond in OpenSolaris | Comments[0]