Trond Norbye's Weblog


http://blogs.sun.com/trond/date/20091028 Wednesday October 28, 2009

SASL support in libmemcached

In my previous blog entry I announced SASL support in the memcached server provided by Dustin Sallings. SASL support in the server is a good thing to have, but you also need support for it in your driver in order to make use of it. Being a contributor to libmemcached, I decided to add SASL support there.

So how does the SASL code work? Well, you enable it by calling memcached_set_sasl_callbacks with a number of callbacks (I'll get back to them shortly). Whenever libmemcached successfully connects to a server (and SASL is enabled), it starts to authenticate to the server, which causes multiple packets to be exchanged between the client and the memcached server. This means that you should use persistent connections, but you knew that already... didn't you?

Let's skip the internals and look at how you as a user should implement this. The first thing you need to do is to initialize libsasl, and create (and initialize) an instance of libmemcached. Before you terminate your application you should also call the cleanup function in libsasl:

...
int main(int argc, char **argv)
{
  if (sasl_client_init(NULL) != SASL_OK)
  {
    fprintf(stderr, "Failed to initialize sasl library!\n");
    return 1;
  }

  memcached_st *memc = memcached_create(NULL);
  memcached_server_st *servers = memcached_servers_parse(servers_list);
  memcached_server_push(memc, servers);
  memcached_server_list_free(servers);
  /* SASL is only available with the binary protocol */
  memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_BINARY_PROTOCOL, 1);


  [ .... cut ... ]

  sasl_done();
}

The next thing you need to do is to create a callback structure where you specify functions that libsasl can call when it wants authentication data from you (like username / password etc):

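/* get_username and get_password must be declared before this table.
   The casts are needed with recent sasl.h headers, which declare the
   callback member as int (*)(void). */
static sasl_callback_t sasl_callbacks[] = {
  {
    SASL_CB_USER, (int (*)(void))&get_username, NULL
  }, {
    SASL_CB_AUTHNAME, (int (*)(void))&get_username, NULL
  }, {
    SASL_CB_PASS, (int (*)(void))&get_password, NULL
  }, {
    SASL_CB_LIST_END, NULL, NULL
  }
};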

And we associate the callback structure with the memcached instance by calling:

memcached_set_sasl_callbacks(memc, sasl_callbacks);

So what do get_username and get_password look like? They may be as simple as:

static char *username = "username";
static char *passwd = "secret";

static int get_username(void *context, int id, const char **result,
                        unsigned int *len)
{
  if (!result || (id != SASL_CB_USER && id != SASL_CB_AUTHNAME)) {
    return SASL_BADPARAM;
  }

  *result= username;
  if (len) {
     *len= (username == NULL) ? 0 : (unsigned int)strlen(username);
  }

  return SASL_OK;
}

static int get_password(sasl_conn_t *conn, void *context, int id,
                        sasl_secret_t **psecret)
{
  static sasl_secret_t* x;

  if (!conn || ! psecret || id != SASL_CB_PASS) {
    return SASL_BADPARAM;
  }

  if (passwd == NULL) {
     *psecret = NULL;
     return SASL_OK;
  }

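  size_t len = strlen(passwd);
  /* Grow via a temporary so the old buffer isn't leaked if realloc fails */
  sasl_secret_t *tmp = realloc(x, sizeof(sasl_secret_t) + len);
  if (!tmp) {
    return SASL_NOMEM;
  }
  x = tmp;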

  x->len = len;
  strcpy((void *)x->data, passwd);

  *psecret = x;
  return SASL_OK;
}
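Which mechanism is actually used is negotiated between libsasl and the server. If it ends up being PLAIN (RFC 4616), the values returned by the callbacks above travel to the server as one NUL-separated blob: an optional authorization id, the authentication id and the password. The sketch below builds that blob by hand purely to show what goes over the wire; sasl_plain_payload is a hypothetical helper, not part of libmemcached or libsasl (libsasl does this for you). As far as I know, the memcached binary protocol carries it as the value of the SASL auth request, with the mechanism name as the key.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a SASL PLAIN initial response (RFC 4616):
 *   [authzid] NUL authcid NUL passwd
 * An empty/NULL authzid means "act as the authenticated user".
 * Returns the number of bytes written, or 0 if buf is too small.
 * No trailing NUL is written; the length travels separately.
 */
size_t sasl_plain_payload(char *buf, size_t buflen,
                          const char *authzid, const char *authcid,
                          const char *passwd)
{
  size_t zlen = authzid ? strlen(authzid) : 0;
  size_t clen = strlen(authcid);
  size_t plen = strlen(passwd);
  size_t total = zlen + 1 + clen + 1 + plen;

  if (total > buflen) {
    return 0;
  }

  char *p = buf;
  if (zlen > 0) {
    memcpy(p, authzid, zlen);   /* optional authorization id */
    p += zlen;
  }
  *p++ = '\0';
  memcpy(p, authcid, clen);     /* the user name to authenticate as */
  p += clen;
  *p++ = '\0';
  memcpy(p, passwd, plen);      /* the password, in the clear */
  return total;
}
```

Keep in mind that PLAIN sends the password in the clear, so it only makes sense on a trusted network or over an encrypted transport.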

In order to try this out you need either libsasl or libsasl2 on your machine. The functionality has not yet been merged into the development branch of libmemcached, so you will have to grab my development branch and compile from that. If you would like to follow the status of this feature you should monitor RFE 462250. You will find the branch that implements this feature at: https://code.launchpad.net/~trond-norbye/libmemcached/sasl_rfe_462250.

Happy hacking

http://blogs.sun.com/trond/date/20091025 Sunday October 25, 2009

SASL support in Memcached!

I got a merge request for adding SASL support to memcached from Dustin Sallings before the weekend. Luckily for me he had already gone a couple of rounds back and forth with dormando fixing some details, so reviewing the code was pretty easy, and there was only one minor detail I wanted him to fix before I applied and pushed the patch. We need more documentation of the SASL support, so feel free to submit contributions!

The SASL support requires the binary protocol, so you cannot telnet to the port to test it out. If you enable SASL support, memcached will disable the ASCII protocol.
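Every binary protocol request starts with a fixed 24-byte header in network byte order, which is one reason telnet won't get you far. A minimal sketch of encoding such a header, here for the SASL "list mechanisms" command (opcode 0x20 in the binary protocol's SASL extension); encode_request_header and the BIN_* constants are illustrative names, not taken from memcached's sources:

```c
#include <stdint.h>
#include <string.h>

#define BIN_REQ_MAGIC 0x80            /* magic byte of every request */
#define BIN_CMD_SASL_LIST_MECHS 0x20  /* SASL: list mechanisms */
#define BIN_HEADER_LEN 24

/* Store v in big-endian (network) byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
  p[0] = (uint8_t)(v >> 8);
  p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
  for (int i = 3; i >= 0; i--) {
    p[i] = (uint8_t)(v & 0xff);
    v >>= 8;
  }
}

/* Encode a request header; datatype, reserved and CAS are left at 0. */
void encode_request_header(uint8_t hdr[BIN_HEADER_LEN], uint8_t opcode,
                           uint16_t keylen, uint8_t extlen,
                           uint32_t bodylen, uint32_t opaque)
{
  memset(hdr, 0, BIN_HEADER_LEN);
  hdr[0] = BIN_REQ_MAGIC;
  hdr[1] = opcode;
  put_be16(hdr + 2, keylen);   /* key length */
  hdr[4] = extlen;             /* extras length */
  put_be32(hdr + 8, bodylen);  /* total body: extras + key + value */
  put_be32(hdr + 12, opaque);  /* echoed back in the response */
}
```

A server started with -S should answer such a request with the supported mechanisms in the response body; the authoritative definitions live in memcached's protocol_binary.h.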

To build a memcached server with SASL support you need to pass --enable-sasl as an option to configure, and add -S as a parameter to memcached:

trond@storm > ./configure --enable-sasl
[ ... cut ... ]
checking sasl/sasl.h usability... yes
checking sasl/sasl.h presence... yes
checking for sasl/sasl.h... yes
checking for library containing sasl_server_init... -lsasl2
[ ... cut ... ]
trond@storm > ./memcached -S

Right now the only ways to play with the SASL support are to test out a development build of the spymemcached client, or to use the memcached-test program. I would suggest that you start bugging the maintainers of your favorite memcached driver asking for SASL support :-)

http://blogs.sun.com/trond/date/20091020 Tuesday October 20, 2009

Testing libmemcached on EC2

Someone pinged me yesterday about a problem he was seeing when he tried to run the test suite on Ubuntu Jaunty. The tests failed almost immediately on the following assertion:

  value= memcached_behavior_get(memc, MEMCACHED_BEHAVIOR_SOCKET_SEND_SIZE);
  assert(value > 0);

I guess I'm an old-school developer, because I want to use a debugger to hunt down bugs. For some reason the default setting in the shell was to disallow creation of corefiles, so I had to execute the following command to allow the corefiles to be written:

$ ulimit -c unlimited

Now that I was able to generate coredumps I wanted to create a "debug build" of libmemcached, because the optimizer may remove local variables etc. If I'm not able to reproduce the bug with a debug build, well then we have to debug the optimized binary. Why make life harder than it already is ;-) To create a debug build, simply invoke:

$ ./configure --with-debug

This didn't work however :-( Disabling the optimization (-O3) caused the compiler to spit out some new warnings, and we treat warnings as errors in libmemcached. It turns out that the gcc installed on the machine was the older gcc 4.3.3, and not one of the more recent 4.4 series. I've been struggling with different problems with gcc lately (mostly that it generates bogus warnings on C99 struct initializers), so I cannot say I was too happy about "yet another compiler problem". The code it complained about was:

  unlikely (ptr->flags & MEM_USE_UDP)

With the following warning:

error: conversion to ‘long int’ from ‘uint32_t’ may change the sign of the result [-Wsign-conversion]

MEM_USE_UDP is an enum, and that's an integer according to C99 (see section 6.4.4.3), and flags is defined as a uint32_t. So yes, we are doing a bitwise and on an unsigned and a signed 32 bit word. But we are only testing if the value is 0 or not, so the sign doesn't matter at all!!! Just for the fun of it I decided to replace unlikely with a normal if (you might have had fun with the broken ntohX macros on Linux generating warnings all of the time, so I guessed this could be a similar problem), and guess what: the warning is gone :-) So I went ahead and replaced all occurrences of unlikely with if... Not the thing you would like to do at 1:30AM :(
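A plausible explanation (not verified against that exact gcc build) is that unlikely wraps gcc's __builtin_expect, whose arguments are of type long. The uint32_t result of the bitwise and is then converted to long, and on a 32-bit build, where long is 32 bits as well, that conversion can change the sign. Comparing against zero first yields an int, which converts to long without a warning. A sketch with a hypothetical macro name and flag value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the libmemcached flag value. */
enum { MEM_USE_UDP = 1 << 3 };

#if defined(__GNUC__)
/* "!= 0" turns the uint32_t into an int (0 or 1) before it is passed to
 * __builtin_expect(long, long), so -Wsign-conversion has nothing to flag. */
#define my_unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define my_unlikely(x) ((x) != 0)
#endif

bool uses_udp(uint32_t flags)
{
  if (my_unlikely(flags & MEM_USE_UDP)) {
    return true;
  }
  return false;
}
```

This keeps the branch hint instead of dropping it, at the cost of an explicit comparison that the compiler folds away anyway.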

With the debug build available I could return to the original problem. I had been looking at the code, and my guess was that it was failing in getsockopt in the following snippet:

      int sock_size;
      socklen_t sock_length= sizeof(int);

      /* REFACTOR */
      /* We just try the first host, and if it is down we return zero */
      if ((memcached_connect(&ptr->hosts[0])) != MEMCACHED_SUCCESS)
        return 0;

      if (getsockopt(ptr->hosts[0].fd, SOL_SOCKET, 
                     SO_SNDBUF, &sock_size, &sock_length))
        return 0; /* Zero means error */

      return (uint64_t) sock_size;

I set a breakpoint on the line containing return 0; (so that I could look at errno) and ran the program, but guess what: it didn't fail! So it had to be a problem with memcached_connect. It turned out to be a race condition in the test suite: the test program starts up the memcached servers and begins using them immediately. The memcached servers aren't done initializing themselves (and binding to the specified port) yet, so the test fails to connect to the servers.
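One way to rule out getsockopt on its own is to query SO_SNDBUF on a fresh socket: the option can be read whether or not the socket is connected. A minimal stand-alone check using plain POSIX sockets (default_send_buffer_size is a hypothetical helper, not libmemcached code):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Return the kernel send buffer size of a new TCP socket, or -1 on error.
 * No connection is needed: SO_SNDBUF can be read on any socket. */
int default_send_buffer_size(void)
{
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd == -1) {
    return -1;
  }

  int size = 0;
  socklen_t len = sizeof(size);
  int rc = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len);
  close(fd);
  return rc == 0 ? size : -1;
}
```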

There are a number of small bugs I am going to fix in the test framework as a result of this:

  • Don't leave behind running memcached servers if the test suite fails
  • Let the memcached server choose an available port itself (so that we can run make test from multiple users at the same time)
  • Wait for the servers to initialize themselves before utilizing them
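The last item could be approached by polling the server port until it accepts connections before running any tests. A sketch using plain POSIX sockets, assuming the servers listen on the loopback interface; wait_for_port, the retry count and the delay are illustrative choices, not code from the libmemcached test framework:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for nanosleep in strict C modes */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Try to connect to 127.0.0.1:port up to `attempts` times, sleeping
 * delay_ms between tries. Returns 0 once a connect() succeeds (the
 * server is listening), or -1 if every attempt failed.
 */
int wait_for_port(uint16_t port, int attempts, unsigned int delay_ms)
{
  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

  for (int i = 0; i < attempts; i++) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
      return -1;
    }
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    if (rc == 0) {
      return 0;
    }

    /* Not up yet: back off before the next attempt. */
    struct timespec ts;
    ts.tv_sec = (time_t)(delay_ms / 1000);
    ts.tv_nsec = (long)(delay_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
  }
  return -1;
}
```

For example, wait_for_port(11211, 50, 100) gives a server started on the default port roughly five seconds to come up.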

I guess it's no secret that I really prefer software development using Solaris (and all of the great tools there), so if you are planning to do development on your EC2 image I would suggest that you start off with an OpenSolaris image instead (check out http://blogs.sun.com/ec2/). That will give you easy access to a lot of great tools I cannot live without (dtrace, dbx, cc etc), and using the right tool for the task saves a lot of time!!! As an extra bonus you can use the DTrace probes I added to memcached to collect more information on what your memcached server is doing. Matt Ingenthron took this a step further in a demo by using the output from DTrace as an input feed to a browser. I don't remember the link, but you should be able to Google it :-)

http://blogs.sun.com/trond/date/20091007 Wednesday October 07, 2009

memcapable, part two

Today I added support for the ASCII protocol to memcapable so that it can be used to test both the binary and the ASCII protocol. By running it on the example server I added with the protocol parser in libmemcached, I discovered that the server failed all tests (mostly due to incorrect handling of noreply, but that is another story). It is not merged into trunk yet, so if you want to play with it today you need to branch lp:~trond-norbye/libmemcached/bugparade.

This is the result from running it on the server in my sandbox:

trond@storm> ./memcapable
ascii quit                              [pass]
ascii version                           [pass]
ascii verbosity                         [pass]
ascii set                               [pass]
ascii set noreply                       [pass]
ascii get                               [pass]
ascii gets                              [pass]
ascii mget                              [pass]
ascii flush                             [pass]
ascii flush noreply                     [pass]
ascii add                               [pass]
ascii add noreply                       [pass]
ascii replace                           [pass]
ascii replace noreply                   [pass]
ascii cas                               [pass]
ascii cas noreply                       [pass]
ascii delete                            [pass]
ascii delete noreply                    [pass]
ascii incr                              [pass]
ascii incr noreply                      [pass]
ascii decr                              [pass]
ascii decr noreply                      [pass]
ascii append                            [pass]
ascii append noreply                    [pass]
ascii prepend                           [pass]
ascii prepend noreply                   [pass]
ascii stat                              [pass]
binary noop                             [pass]
binary quit                             [pass]
binary quitq                            [pass]
binary set                              [pass]
binary setq                             [pass]
binary flush                            [pass]
binary flushq                           [pass]
binary add                              [pass]
binary addq                             [pass]
binary replace                          [pass]
binary replaceq                         [pass]
binary delete                           [pass]
binary deleteq                          [pass]
binary get                              [pass]
binary getq                             [pass]
binary getk                             [pass]
binary getkq                            [pass]
binary incr                             [pass]
binary incrq                            [pass]
binary decr                             [pass]
binary decrq                            [pass]
binary version                          [pass]
binary append                           [pass]
binary appendq                          [pass]
binary prepend                          [pass]
binary prependq                         [pass]
binary stat                             [pass]
binary illegal                          [pass]
All tests passed

While reading the spec one more time today, I discovered that some of the tests don't follow the spec, so I am going to submit bug reports for the community server and fix the tests:

  • verbosity should always return OK, but does not if it encounters an illegal number of options
  • flush_all does not fail for illegal options
  • delete a b does not fail, but assumes that the second option is "noreply"
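As a sketch of the stricter behaviour the last item asks for, assuming the delete <key> [noreply] syntax: any third token other than noreply is an error instead of being treated as noreply. parse_delete is a hypothetical helper, and it uses strtok, so it modifies its input and is not thread safe:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Validate an ASCII "delete <key> [noreply]" command line (without the
 * trailing \r\n). On success *noreply says whether the reply should be
 * suppressed. Anything else, such as "delete a b", is rejected.
 */
bool parse_delete(char *line, bool *noreply)
{
  char *tokens[3];
  int ntokens = 0;

  for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
    if (ntokens == 3) {
      return false;               /* too many tokens */
    }
    tokens[ntokens++] = tok;
  }

  if (ntokens < 2 || strcmp(tokens[0], "delete") != 0) {
    return false;                 /* missing key or wrong command */
  }
  if (ntokens == 3 && strcmp(tokens[2], "noreply") != 0) {
    return false;                 /* unknown option: error, not noreply */
  }
  *noreply = (ntokens == 3);
  return true;
}
```

A server would answer the rejected forms with a CLIENT_ERROR line rather than silently deleting the key.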

Do you see any other bugs in the ASCII protocol handling in the community server?

http://blogs.sun.com/trond/date/20090921 Monday September 21, 2009

memcapable

Earlier today Matt Ingenthron blogged about the new tool memcapable I wrote a while back. In his blog Matt mentions some of the reasons why we want such a tool, but he didn't mention the reason why I actually sat down to create it.

If you follow my blog you might remember my entry "Callback based protocol parser in libmemcached?". Before I could start implementing the parser, I really needed a tool to:

  1. send all of the defined packet structures to the server
  2. verify the response packet generated from the server
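The second step can be sketched as a header check: the response magic is 0x81, and the opcode and opaque must echo the request. A minimal version with hypothetical names, using the header offsets from the binary protocol (status in bytes 6 and 7, total body length in bytes 8 to 11, opaque in bytes 12 to 15):

```c
#include <stddef.h>
#include <stdint.h>

#define BIN_RES_MAGIC 0x81
#define BIN_HEADER_LEN 24

/* Read big-endian values from the wire. */
static uint16_t get_be16(const uint8_t *p)
{
  return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Check that a response header answers the request we sent: right magic,
 * same opcode, same opaque, and a body long enough to hold the extras and
 * key it announces. Returns the status field (0 == success), or -1 if the
 * header is malformed.
 */
int check_response_header(const uint8_t *hdr, size_t len,
                          uint8_t opcode, uint32_t opaque)
{
  if (len < BIN_HEADER_LEN || hdr[0] != BIN_RES_MAGIC || hdr[1] != opcode) {
    return -1;
  }
  uint16_t keylen = get_be16(hdr + 2);
  uint8_t extlen = hdr[4];
  uint32_t bodylen = get_be32(hdr + 8);
  if (get_be32(hdr + 12) != opaque || bodylen < (uint32_t)keylen + extlen) {
    return -1;
  }
  return get_be16(hdr + 6);
}
```

A status of 0x0001 (key not found), for instance, is still a well-formed reply, so the function returns it and leaves it to the caller to decide whether that was the expected outcome.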

I didn't need the textual protocol at the time I wrote the initial version of memcapable, so right now memcapable can only be used to test the binary protocol (and not all variants of all commands are implemented in the initial version).

So how does it work? In its simplest form you can use it to test the memcached server running on the default port on the same computer:

trond@storm> ./memcapable
noop		[pass]
quit		[pass]
quitq		[pass]
set		[pass]
setq		[pass]
flush		[pass]
flushq		[pass]
add		[pass]
addq		[pass]
replace		[pass]
replaceq		[pass]
delete		[pass]
deleteq		[pass]
get		[pass]
getq		[pass]
getk		[pass]
getkq		[pass]
incr		[pass]
incrq		[pass]
decr		[pass]
decrq		[pass]
version		[pass]
append		[pass]
appendq		[pass]
prepend		[pass]
prependq		[pass]
stat		[pass]
illegal		[pass]
All tests passed

Now this looks really nice, doesn't it? But let's try to run it on a 1.2.8 version:

trond@storm> ./memcapable
noop		[FAIL]
quit		[FAIL]
quitq		[FAIL]
set		[FAIL]
setq		[FAIL]
flush		[FAIL]
flushq		[FAIL]
add		[FAIL]
addq		[FAIL]
replace		[FAIL]
replaceq		[FAIL]
delete		[FAIL]
deleteq		[FAIL]
get		[FAIL]
getq		[FAIL]
getk		[FAIL]
getkq		[FAIL]
incr		[FAIL]
incrq		[FAIL]
decr		[FAIL]
decrq		[FAIL]
version		[FAIL]
append		[FAIL]
appendq		[FAIL]
prepend		[FAIL]
prependq		[FAIL]
stat		[FAIL]
illegal		[FAIL]
28 of 28 tests failed

This shouldn't come as a big surprise, because the binary protocol isn't implemented in 1.2.8. Getting [FAIL] isn't really that informative, because it doesn't help you as a developer figure out what's wrong. I have added a couple of options to the program that may help you track down the real problem: -v and -c.

-v
Print out the assertion that failed
-c
Create a coredump when an assertion fails

Let's start the memcached server with CAS disabled and re-run memcapable with -v -c:

trond@storm> ./memcapable -v -c
noop		[pass]
quit		[pass]
quitq		[pass]
set		memcapable.c:493: rsp->plain.message.header.response.cas != 0
zsh: IOT instruction (core dumped)  ./memcapable -v -c

As you can see, it expects the response packet to have a CAS value set for the operation. If you would like to inspect the response packet, you can load the core file into your debugger and poke around:

trond@storm> dbx - core
Corefile specified executable: "/source/libmemcached/memcapable/clients/memcapable"
Reading memcapable
core file header read successfully
Reading ld.so.1
Reading libm.so.2
Reading libnsl.so.1
Reading libsocket.so.1
Reading libpthread.so.1
Reading libthread.so.1
Reading libc.so.1
t@1 (l@1) program terminated by signal ABRT (Abort)
0xfffffd7fff2842aa: _lwp_kill+0x000a:	jae      _lwp_kill+0x18	[ 0xfffffd7fff2842b8, .+0xe ]
Current function is ensure
  212         abort();
(dbx) where
current thread: t@1
  [1] _lwp_kill(0x1, 0x6, 0xffffff01cdf92ae0, 0xfffffd7fff284c0e, 0x12, 0x0), at 0xfffffd7fff2842aa 
  [2] thr_kill(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff2788cd 
  [3] raise(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff227511 
  [4] abort(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff1fda41 
=>[5] ensure(val = 0, expression = 0x408498 "rsp->plain.message.header.response.cas != 0", file = 0x408198 "memcapable.c", line = 493), line 212 in "memcapable.c"
  [6] do_validate_response_header(rsp = 0xfffffd7fffdfeed8, cc = '\001', status = 0), line 493 in "memcapable.c"
  [7] test_binary_set_impl(key = 0x4087b0 "test_binary_set", cc = '\001'), line 621 in "memcapable.c"
  [8] test_binary_set(), line 651 in "memcapable.c"
  [9] main(argc = 3, argv = 0xfffffd7fffdff798), line 1196 in "memcapable.c"
(dbx) frame 6
Current function is do_validate_response_header
  493         verify(rsp->plain.message.header.response.cas != 0);
(dbx) print rsp->plain.message.header.response
rsp->plain.message.header.response = {
    magic    = '�'
    opcode   = '\001'
    keylen   = 0
    extlen   = '\0'
    datatype = '\0'
    status   = 0
    bodylen  = 0
    opaque   = 3735928559U
    cas      = 0
}

I noticed earlier today that there are some issues with the -v flag if you don't include -c (it just prints out the assertion and reports everything back up as success ;-)). I'll try to address that in the next release.
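The -v problem boils down to the failure not being propagated back to the test. The following is a hypothetical sketch of an ensure/verify pair that prints the assertion with -v, dumps core with -c, and still reports the failure in both cases; the names mirror the frames in the backtrace above, but the bodies are assumptions, not memcapable's actual code:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the -v and -c command line options. */
static bool verbose = false;
static bool do_core = false;

/* Report a failed check and hand the result back to the caller, so that
 * printing the assertion (-v) never turns a failure into a pass. */
static bool ensure(bool val, const char *expression, const char *file, int line)
{
  if (!val) {
    if (verbose) {
      fprintf(stderr, "%s:%d: %s\n", file, line, expression);
    }
    if (do_core) {
      abort();   /* dump core so the state can be inspected in a debugger */
    }
  }
  return val;
}

/* Leave the current test with a failure if the expression is false. */
#define verify(expr) \
  do { if (!ensure((expr), #expr, __FILE__, __LINE__)) return false; } while (0)

/* Example check in the spirit of the failing set test: CAS must be set. */
static bool check_cas(unsigned long long cas)
{
  verify(cas != 0);
  return true;
}
```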

http://blogs.sun.com/trond/date/20090810 Monday August 10, 2009

Callback based protocol parser in libmemcached?

I have been working on designing a small library for memcached protocol handling. The intention of the library is to parse the memcached protocol for you and invoke callbacks in your application so that you can implement the functionality. By using this library you should be able to add support for the memcached binary protocol to your application. Please note that the intention of this library is not to create a replacement for / yet another fork of memcached!

Before I go ahead and spend time implementing the stuff, I would like us to agree upon an API. I don't care much about the ASCII protocol, so I added _binary_ to all the function names in case anyone wants to implement something similar for the ASCII protocol.

So how does it look? Instead of just throwing my proposal for the API at you, I'll try to describe when and how each function is used. Please note that the current API proposal doesn't contain any factory methods where you can specify the memory area to use (to avoid calling malloc). Why? Well, I don't want to waste the best function names on such methods, because I don't see that version of the factory methods as the main usage of the library.

The first thing you need to do in your application is to create a handle to the protocol handler. To avoid locking inside the library you can _only_ access the protocol handler from one thread at a time (unless you add synchronization yourself). If you want to run with multiple threads, you should give each thread its own instance of the protocol handler instead. The function looks like this:

/**
* Create and initialize an instance of the protocol handler.
* Please note that the library does not copy the
* callback structure, and you may use the same callback structure
* for all of your instances of the library. You must
* ensure that the memory is valid throughout the use of the instances.
*
* @param callback The callbacks to use from this protocol handler.
* @return NULL if allocation of an instance fails
*/
LIBMEMCACHED_API
struct memcached_binary_protocol_st *memcached_binary_protocol_create_instance(struct memcached_binary_protocol_callback_st *callback);

We will get back to a description of the callback structure later on. You release the instance when you are done using the library with the following function:

/**
* Destroy an instance of the protocol handler
*
* @param instance The instance to destroy
*/
LIBMEMCACHED_API
void memcached_binary_protocol_destroy_instance(struct memcached_binary_protocol_st *instance);

With a handle to the protocol library you can listen to a server socket and accept new clients. When a new client connects to the socket, you need to create a client structure and associate it with the socket:

/**
* Create a new client instance and associate it with a socket
* @param instance the protocol instance to bind the client to
* @param sock the client socket
* @return NULL if allocation fails, otherwise an instance
*/
LIBMEMCACHED_API
struct memcached_binary_protocol_client_st *memcached_binary_protocol_create_client(struct memcached_binary_protocol_st *instance, int sock);

With the client connection in hand, we can tell the protocol library to start to work on the client by calling:

enum MEMCACHED_BINARY_PROTOCOL_EVENT { ERROR_EVENT, READ_EVENT, WRITE_EVENT, READ_WRITE_EVENT };

/**
* Let the client do some work. This might involve reading / sending data
* to/from the client, or perform callbacks to execute a command.
* @param client the client structure to work on
* @return The next event the protocol handler will be notified for
*/
LIBMEMCACHED_API
enum MEMCACHED_BINARY_PROTOCOL_EVENT memcached_binary_protocol_client_work(struct memcached_binary_protocol_client_st *client);

This function will try to read data from the network, fire the callbacks for the received commands, and return the events it is interested in being notified on. If ERROR_EVENT is returned, you should close the socket and destroy the client handle with:

/**
* Destroy a client handle.
* The caller needs to close the socket associated with the client
* before calling this function. This function invalidates the
* client memory area.
*
* @param client the client to destroy
*/
LIBMEMCACHED_API
void memcached_binary_protocol_client_destroy(struct memcached_binary_protocol_client_st *client);

That's all you _need_ to know, but there are also some utility functions:

/**
* Get the socket attached to a client handle
* @param client the client to query
* @return the socket handle
*/
LIBMEMCACHED_API
int memcached_binary_protocol_client_get_socket(struct memcached_binary_protocol_client_st *client);

/**
* Get the error code for the socket attached to a client handle
* @param client the client to query for an error code
* @return the OS error code from the client
*/
LIBMEMCACHED_API
int memcached_binary_protocol_client_get_errno(struct memcached_binary_protocol_client_st *client);
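The value returned by memcached_binary_protocol_client_work maps naturally onto poll(2). A sketch of that mapping; poll_flags_for is a hypothetical helper, and the enum is copied from the proposal above so that the block compiles on its own:

```c
#include <poll.h>

/* Copied from the API proposal. */
enum MEMCACHED_BINARY_PROTOCOL_EVENT { ERROR_EVENT, READ_EVENT, WRITE_EVENT, READ_WRITE_EVENT };

/*
 * Translate the result of memcached_binary_protocol_client_work() into
 * poll(2) event flags. Returns 0 for ERROR_EVENT: the client should then
 * be closed and destroyed instead of being polled again.
 */
short poll_flags_for(enum MEMCACHED_BINARY_PROTOCOL_EVENT event)
{
  switch (event) {
  case READ_EVENT:
    return POLLIN;
  case WRITE_EVENT:
    return POLLOUT;
  case READ_WRITE_EVENT:
    return (short)(POLLIN | POLLOUT);
  case ERROR_EVENT:
  default:
    return 0;
  }
}
```

In a poll() loop you would call memcached_binary_protocol_client_work when the client's socket becomes ready and store poll_flags_for(result) in the pollfd's events field. On 0 you would close the descriptor (memcached_binary_protocol_client_get_socket gives it to you) before calling memcached_binary_protocol_client_destroy.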

Earlier I told you that I would come back to the callback structures, so let's start describing them. The memcached_binary_protocol_callback_st is the _only_ structure in the protocol handler library whose internals you are allowed to touch, and it is used to specify the callbacks you are interested in:

struct memcached_binary_protocol_callback_st {
 /**
  * The interface version used (set to 0 if you don't have any specialized
  * command handlers).
  */
 uint64_t interface_version;

 /**
  * Callback fired just before the command will be executed.
  *
  * @param cookie id of the client receiving the command
  * @param header the command header as received on the wire. If you look
  *               at the content you must ensure that you don't
  *               try to access beyond the end of the message.
  */
 void (*pre_execute)(const void *cookie,
                     protocol_binary_request_header *header);
 /**
  * Callback fired just after the command was executed (please note
  * that the data transfer back to the client is not finished at this
  * time).
  *
  * @param cookie id of the client receiving the command
  * @param header the command header as received on the wire. If you look
  *               at the content you must ensure that you don't
  *               try to access beyond the end of the message.
  */
 void (*post_execute)(const void *cookie,
                      protocol_binary_request_header *header);

 /**
  * Callback fired if no specialized callback is registered for this
  * specific command code.
  *
  * @param cookie id of the client receiving the command
  * @param header the command header as received on the wire. You must
  *               ensure that you don't try to access beyond the end of the
  *               message.
  * @param response_handler The response handler to send data back.
  */
 protocol_binary_response_status (*unknown)(const void *cookie,
                                            protocol_binary_request_header *header,
                                            memcached_binary_protocol_response_handler response_handler);

 /**
  * The different interface levels we support. A pointer is used so the
  * size of the structure is fixed. You must ensure that the memory area
  * passed as the pointer is valid as long as you use the protocol handler.
  */
 union {
    /**
     * The first version of the callback struct containing all of the
     * documented commands in the initial release of the binary protocol
     * (aka. memcached 1.4.0).
     */
    struct memcached_binary_protocol_callback_v1_st *v1;
 } interface;
};

The memcached_binary_protocol_response_handler is a function you need to call to send data back to the client:

/**
* Each command-callback will supply a response-handler so that you can
* send data back to the client.
*
* @param cookie Just pass along the cookie supplied in the callback
* @param status The status code for your reply (see protocol_binary.h
*               for legal values)
* @param key What to insert as key in the reply (may be NULL)
* @param keylen The length of the key (should be 0 if key is NULL)
* @param body What to store in the body of the package (may be NULL)
* @param bodylen The number of bytes of the body (should be 0 if
*                body is NULL)
* @param cas The CAS value to insert into the response (should be 0
*            if you don't care)
* @param datatype Should be PROTOCOL_BINARY_RAW_BYTES
*
*/
typedef void (*memcached_binary_protocol_response_handler)(const void *cookie,
                                                         protocol_binary_response_status status,
                                                         const void *key,
                                                         uint16_t keylen,
                                                         const void *body,
                                                         uint32_t bodylen,
                                                         uint64_t cas,
                                                         protocol_binary_datatypes datatype);

So the simplest example for you would be:

static struct memcached_binary_protocol_callback_st callback= { .unknown= my_function_callback };

struct memcached_binary_protocol_st *handle;
handle= memcached_binary_protocol_create_instance(&callback);

It wouldn't help you much if you had to read the spec to get all the juicy details on how the protocol looks for all of the different commands, and that's what the interface union is for. My proposal for v1 looks like this:

/**
* The first version of the callback struct containing all of the
* documented commands in the initial release of the binary protocol
* (aka. memcached 1.4.0).
*
* You might miss the Q commands (addq etc) but the response function
* knows how to deal with them so you don't need to worry about that :-)
*/
struct memcached_binary_protocol_callback_v1_st {
 /**
  * Add an item to the cache
  * @param cookie id of the client receiving the command
  * @param key the key to add
  * @param keylen the length of the key
  * @param val the value to store for the key (may be NULL)
  * @param vallen the length of the data
  * @param flags the flags to store with the key
  * @param exptime the expiry time for the key-value pair
  * @param response_handler to send the result back to the client.
  */
 protocol_binary_response_status (*add)(const void *cookie,
                                        const void *key,
                                        uint16_t keylen,
                                        const void* val,
                                        uint32_t vallen,
                                        uint32_t flags,
                                        uint32_t exptime,
                                        memcached_binary_protocol_response_handler response_handler);

 /**
  * Append data to an existing key-value pair.
  *
  * @param cookie id of the client receiving the command
  * @param key the key to add data to
  * @param keylen the length of the key
  * @param val the data to append to the value
  * @param vallen the length of the data
  * @param cas the CAS in the request
  * @param response_handler to send the result back to the client
  *
  */
 protocol_binary_response_status (*append)(const void *cookie,
                                           const void *key,
                                           uint16_t keylen,
                                           const void* val,
                                           uint32_t vallen,
                                           uint64_t cas,
                                           memcached_binary_protocol_response_handler response_handler);

 /**
  * Decrement the value for a key
  *
  * @param cookie id of the client receiving the command
  * @param key the key to decrement the value for
  * @param keylen the length of the key
  * @param delta the amount to decrement
  * @param initial initial value to store (if the key doesn't exist)
  * @param expiration expiration time for the object (if the key doesn't exist)
  * @param response_handler to send the result back to the client
  *
  */
 protocol_binary_response_status (*decrement)(const void *cookie,
                                              const void *key,
                                              uint16_t keylen,
                                              uint64_t delta,
                                              uint64_t initial,
                                              uint32_t expiration,
                                              memcached_binary_protocol_response_handler response_handler);

 /**
  * Delete an existing key
  *
  * @param cookie id of the client receiving the command
  * @param key the key to delete
  * @param keylen the length of the key
  * @param cas the CAS in the request
  * @param response_handler to send the result back to the client
  */
 protocol_binary_response_status (*delete)(const void *cookie,
                                           const void *key,
                                           uint16_t keylen,
                                           uint64_t cas,
                                           memcached_binary_protocol_response_handler response_handler);


 /**
  * Flush the cache
  *
  * @param cookie id of the client receiving the command
  * @param when when the cache should be flushed (0 == immediately)
  * @param response_handler to send the result back to the client
  */
 protocol_binary_response_status (*flush)(const void *cookie,
                                          uint32_t when,
                                          memcached_binary_protocol_response_handler response_handler);



 /**
  * Get a key-value pair
  *
  * @param cookie id of the client receiving the command
  * @param key the key to get
  * @param keylen the length of the key
  * @param response_handler to send the result back to the client
  */
 protocol_binary_response_status (*get)(const void *cookie,
                                        const void *key,
                                        uint16_t keylen,
                                        memcached_binary_protocol_response_handler response_handler);

 /**
  * Increment the value for a key
  *
  * @param cookie id of the client receiving the command
  * @param key the key to increment the value on
  * @param keylen the length of the key
  * @param delta the amount to increment
  * @param initial initial value to store (if the key doesn't exist)
  * @param expiration expiration time for the object (if the key doesn't exist)
  * @param response_handler to send the result back to the client
  *
  */
 protocol_binary_response_status (*increment)(const void *cookie,
                                              const void *key,
                                              uint16_t keylen,
                                              uint64_t delta,
                                              uint64_t initial,
                                              uint32_t expiration,
                                              memcached_binary_protocol_response_handler response_handler);

 /**
  * The noop command was received. This is just a notification callback (the
  * response is automatically created).
  *
  * @param cookie id of the client receiving the command
  */
 protocol_binary_response_status (*noop)(const void *cookie);

 /**
  * Prepend data to an existing key-value pair.
  *
  * @param cookie id of the client receiving the command
  * @param key the key to prepend data to
  * @param keylen the length of the key
  * @param val the data to prepend to the existing value
  * @param vallen the length of the data
  * @param cas the CAS in the request
  * @param response_handler to send the result back to the client
  *
  */
 protocol_binary_response_status (*prepend)(const void *cookie,
                                            const void *key,
                                            uint16_t keylen,
                                            const void* val,
                                            uint32_t vallen,
                                            uint64_t cas,
                                            memcached_binary_protocol_response_handler response_handler);

 /**
  * The quit command was received. This is just a notification callback (the
  * response is automatically created).
  *
  * @param cookie id of the client receiving the command
  */
 protocol_binary_response_status (*quit)(const void *cookie);


 /**
  * Replace an existing item in the cache
  *
  * @param cookie id of the client receiving the command
  * @param key the key to replace the content for
  * @param keylen the length of the key
  * @param val the value to store for the key (may be NULL)
  * @param vallen the length of the data
  * @param flags the flags to store with the key
  * @param exptime the expiry time for the key-value pair
  * @param cas the cas id in the request
  * @param response_handler to send the result back to the client.
  */
 protocol_binary_response_status (*replace)(const void *cookie,
                                            const void *key,
                                            uint16_t keylen,
                                            const void* val,
                                            uint32_t vallen,
                                            uint32_t flags,
                                            uint32_t exptime,
                                            uint64_t cas,
                                            memcached_binary_protocol_response_handler response_handler);


 /**
  * Set a key-value pair in the cache
  *
  * @param cookie id of the client receiving the command
  * @param key the key to insert
  * @param keylen the length of the key
  * @param val the value to store for the key (may be NULL)
  * @param vallen the length of the data
  * @param flags the flags to store with the key
  * @param exptime the expiry time for the key-value pair
  * @param response_handler to send the result back to the client.
  */
 protocol_binary_response_status (*set)(const void *cookie,
                                        const void *key,
                                        uint16_t keylen,
                                        const void* val,
                                        uint32_t vallen,
                                        uint32_t flags,
                                        uint32_t exptime,
                                        memcached_binary_protocol_response_handler response_handler);

 /**
  * Get status information
  *
  * @param cookie id of the client receiving the command
  * @param key the key to get status for (or NULL to request all statistics).
  *            Remember to insert the terminating packet if multiple
  *            packets should be returned.
  * @param keylen the length of the key
  * @param response_handler to send the result back to the client, but
  *                         don't send reply on success!
  *
  */
 protocol_binary_response_status (*stat)(const void *cookie,
                                         const void *key,
                                         uint16_t keylen,
                                         memcached_binary_protocol_response_handler response_handler);

 /**
  * Get the version information
  *
  * @param cookie id of the client receiving the command
  * @param response_handler to send the result back to the client, but
  *                         don't send reply on success!
  *
  */
 protocol_binary_response_status (*version)(const void *cookie,
                                            memcached_binary_protocol_response_handler response_handler);
};

Comments?

http://blogs.sun.com/trond/date/20090710 Friday July 10, 2009

Memcached 1.4.0 released!

We released memcached 1.4.0 earlier today! It has been more than two years since we started working on one of the most important parts of this release, the binary protocol. I guess a lot of users don't really care about the binary protocol, but it makes other features easier to implement (the replication in libmemcached, for example, is only available if you use the binary protocol).

Other highlights in 1.4.0:

  • Performance improvements (removed lock contention)
  • Run without privileges on Solaris
  • More statistics information

Check out the full release notes at http://code.google.com/p/memcached/wiki/ReleaseNotes140.

http://blogs.sun.com/trond/date/20090706 Monday July 06, 2009

Scale beyond 8 cores?

If you benchmark a memcached server, you will see that it scales relatively well up to 4 threads, and if you go beyond 8 threads the throughput will start to drop. People have been talking about the scalability problems of the memcached server on the mailing lists, on IRC and at the hackathons, with various solutions to the problems. If you look at the implementation of the memcached server, you will see that it uses only a limited set of locks:

  • slabs_lock: The internal memory management of memcached uses a single lock to protect access to the internal memory allocator. People tend to want to split this mutex into a separate mutex per slab class, but I don't think that will give you any measurable performance win. Why? Well, the _only_ time we try to acquire this lock is during an operation that modifies an entry in the cache (add, set, replace, append, prepend, incr, decr). If this lock really is a performance bottleneck, your GET load is lower than your SET load, and that is not the typical use pattern of memcached (and I would like to optimize for the common case...)
  • stats_lock: In 1.2.x all modifications of statistical information were protected by this single lock. plockstat revealed contention on this mutex, so in 1.4 most of the statistics are collected per thread and aggregated when you call the stats command.
  • conn_lock: Access to the list of connection structures is guarded by this lock. If this lock shows up when you benchmark your memcached server, you would be much better off reusing your connections to the memcached server.
  • cache_lock: All access to the internal hash table is protected by this lock, and this is the lock I'm going to talk a bit more about!

If you look at the implementation of the memcached server, it stores all of the items in one large hash map. Every time we need to insert, delete or search the hash table, we will try to get exclusive access to the entire hash table. And this is pretty much all that memcached does ;-) (You send a request, it looks up the item, and sends it back to you).

So how can we easily fix this problem without rewriting everything? Well, we could split the hash table into multiple partitions and lock only one partition instead of the complete hash table. To avoid extra locking when updating the LRU list (if it spans two partitions), I decided to give each partition its own LRU list. This sounded like a pretty easy fix to implement, so I just grabbed my laptop and implemented it while watching CSI on TV with my girlfriend one night :-)
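The partitioning idea can be illustrated with a small lock-striped hash table. This is a minimal sketch under stated assumptions, not the code from the partition branch: the partition count, the fixed bucket count, the FNV-1a hash and the names (partition, table_set, table_get) are all hypothetical, and the per-partition LRU list is left out. The point is that an operation only locks the partition that owns the key, so threads working on keys in different partitions no longer serialize on one global lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NPARTITIONS 16            /* hypothetical number of partitions */
#define BUCKETS_PER_PARTITION 64  /* hypothetical; no hash expansion here */

typedef struct item {
  struct item *next;
  char *key;                      /* points into the same allocation */
  int value;
} item;

typedef struct {
  pthread_mutex_t lock;           /* protects only this partition */
  item *buckets[BUCKETS_PER_PARTITION];
} partition;

static partition partitions[NPARTITIONS];

/* 32-bit FNV-1a, chosen for brevity (not memcached's own hash). */
static uint32_t hash_key(const char *key)
{
  uint32_t h = 2166136261u;
  for (; *key != '\0'; ++key) {
    h ^= (uint8_t)*key;
    h *= 16777619u;
  }
  return h;
}

static void table_init(void)
{
  for (int i = 0; i < NPARTITIONS; ++i) {
    pthread_mutex_init(&partitions[i].lock, NULL);
    memset(partitions[i].buckets, 0, sizeof(partitions[i].buckets));
  }
}

/* Low bits of the hash pick the partition, the next bits pick the bucket. */
static partition *locate(uint32_t h, item ***bucket)
{
  partition *p = &partitions[h % NPARTITIONS];
  *bucket = &p->buckets[(h / NPARTITIONS) % BUCKETS_PER_PARTITION];
  return p;
}

/* Insert or update a key; only the owning partition is locked. */
static bool table_set(const char *key, int value)
{
  item **bucket;
  partition *p = locate(hash_key(key), &bucket);
  bool ok = true;

  pthread_mutex_lock(&p->lock);
  item *it = *bucket;
  while (it != NULL && strcmp(it->key, key) != 0)
    it = it->next;
  if (it != NULL) {
    it->value = value;
  } else {
    size_t len = strlen(key) + 1;
    it = malloc(sizeof(*it) + len);  /* item header followed by the key */
    if (it == NULL) {
      ok = false;
    } else {
      it->key = (char *)(it + 1);
      memcpy(it->key, key, len);
      it->value = value;
      it->next = *bucket;
      *bucket = it;
    }
  }
  pthread_mutex_unlock(&p->lock);
  return ok;
}

/* Look up a key, copying the value out while the partition is locked. */
static bool table_get(const char *key, int *value)
{
  item **bucket;
  partition *p = locate(hash_key(key), &bucket);
  bool found = false;

  pthread_mutex_lock(&p->lock);
  for (const item *it = *bucket; it != NULL; it = it->next) {
    if (strcmp(it->key, key) == 0) {
      *value = it->value;
      found = true;
      break;
    }
  }
  pthread_mutex_unlock(&p->lock);
  return found;
}
```

Taking the partition from the low bits of the hash and the bucket from the following bits keeps the two choices independent, so keys spread across partitions and across buckets within a partition.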

So how did it scale? Zoran Radovic wrote the following blog entry: Scaling Memcached: 500,000+ Operations/Second with a Single-Socket UltraSPARC T2. If you are interested in the code, you can get it from http://github.com/trondn/memcached/tree/partition.

http://blogs.sun.com/trond/date/20090625 Thursday June 25, 2009

Replicate your keys to multiple memcached servers.

If you look at how the (community version of) memcached works, all servers are completely isolated from each other. They don't know (or care) about the existence of other servers, and all advanced logic is implemented by the clients. This removes a lot of complexity from the server, resulting in a small, clean source base with few bugs. You will also find this simple design in the client-server protocol, which limits what you can implement in the server.

If you scan the mailing lists you will find that requests for replication seem to pop up at regular intervals, so I decided to give it a shot. Personally I am not too interested in a fully replicated scenario (where all of your keys are stored on multiple machines), because I think you would be wasting too much space. I think a mixed mode is more interesting, where you store only some of the items on multiple servers, and this is what I implemented.

If you look at the design of the replication from 1000 ft, it is dead simple. When we store a key on the server, we also store it on the next n servers. If we encounter a problem when we try to send the GET request to the server, we try to fetch a replica instead. We will, however, not try to fetch a replica if:

  • The server crashes before sending the response back. This will result in a cache miss (because we don't keep any state within libmemcached to recreate the GET request so that we can send it to the next replica server).
  • The server doesn't have the item. A cache miss will be returned to the caller immediately, because trying the replica servers would cause long delays for real cache misses.
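The placement and read-fallback rules above can be sketched as follows. This assumes the master is picked with a plain modulo of the key hash (libmemcached supports other distributions as well), and fetch_fn, copies_for, nth_copy and get_with_replicas are hypothetical names, not libmemcached API. Capping the number of copies at the number of servers is this sketch's own choice. Note how a clean miss ends the search immediately, while only a communication error moves on to the next replica.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for a single fetch attempt. */
enum fetch_result { FETCH_HIT, FETCH_MISS, FETCH_ERROR };

/* Hypothetical callback that tries to GET the key from one server. */
typedef enum fetch_result (*fetch_fn)(size_t server, void *ctx);

/* Number of copies (master + replicas); never more than one per server. */
static size_t copies_for(size_t nservers, size_t nreplicas)
{
  return nreplicas + 1 < nservers ? nreplicas + 1 : nservers;
}

/* Server index holding copy i of a key: copy 0 is the master, copy i is the
 * i'th next server in the list, wrapping around at the end. */
static size_t nth_copy(uint32_t hash, size_t nservers, size_t i)
{
  return (hash % nservers + i) % nservers;
}

/* Try the master first and fall back to the replicas only on errors. */
static enum fetch_result get_with_replicas(uint32_t hash, size_t nservers,
                                           size_t nreplicas,
                                           fetch_fn fetch, void *ctx)
{
  size_t copies = copies_for(nservers, nreplicas);
  for (size_t i = 0; i < copies; ++i) {
    enum fetch_result r = fetch(nth_copy(hash, nservers, i), ctx);
    if (r != FETCH_ERROR)
      return r;   /* a hit, or a real miss: don't probe the replicas */
  }
  return FETCH_ERROR;
}
```

With four servers, two replicas and a key hash of 7, the master is server 3 and the replicas are servers 0 and 1.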

If you want to try it out you need to grab at least revision 539, but you should be aware of some design choices / limitations:

  • It is only supported with the binary protocol, so you cannot use a memcached server from the 1.2 series (you need the 1.4 branch).

    Why? Well, the replication code uses the "noreply" mode to store the replicas, and the "noreply" mode in the ASCII protocol is just one big hack ;-)

  • SET is the only command that will store multiple replicas.

    The replication code does not implement any kind of transactions / consistency, so I wanted to expose this fact to the user. Allowing ADD or REPLACE could confuse the users and introduce strange bugs in their application. INCR and DECR raise the same inconsistency problems. If you have an atomic counter (at least if it doesn't get evicted from the cache) you don't want it to behave strangely because of race conditions updating the replicas.

  • The CAS identifier is generated on the server, so the master item and all replicas will have different CAS identifiers. If you enable replication you can't use CAS.
  • We don't detect (or handle) network partitions.
  • If you run several memcached instances on the same server, you don't want to list them next to each other in the server list. The replication works by hashing the key to locate the server the object belongs to, and then storing it on the n next servers in the list. If you list memcached instances on the same server next to each other, you might end up having the master and all of the replicas on the same server.
  • If you use consistent hashing you can grow your pool without blowing the complete cache.
  • The replication works on a per memcached_st instance basis, so the API stays the same (and adds no extra cost if you don't use it).
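A quick way to catch the server-ordering pitfall is to check the server list before you use it. The function below is a hypothetical helper (not part of libmemcached) that assumes the replicas of a key live on the next nreplicas entries of the server list, as described above, and reports whether any such window contains the same host twice. The hosts array format is also an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if any window of (master + replicas) consecutive servers,
 * wrapping around the list, contains the same host more than once.
 * hosts[i] is the host name of the i'th server in the list. */
static bool replicas_share_host(const char *const *hosts, size_t nservers,
                                size_t nreplicas)
{
  size_t copies = nreplicas + 1 < nservers ? nreplicas + 1 : nservers;

  for (size_t start = 0; start < nservers; ++start)
    for (size_t a = 0; a < copies; ++a)
      for (size_t b = a + 1; b < copies; ++b)
        if (strcmp(hosts[(start + a) % nservers],
                   hosts[(start + b) % nservers]) == 0)
          return true;
  return false;
}
```

With two hosts running two instances each, the ordering host1, host2, host1, host2 is safe for one replica, while host1, host1, host2, host2 is not. With two replicas, no ordering of those four instances avoids a shared host.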

Well, I guess a lot of you don't like reading text that doesn't end each statement with a semicolon, so I should probably add some code. First you should locate the code where you create your memcached_st handle. You probably have something like this (I removed the error checking to keep the example small, but you don't want to do that in your code!!!!):

   memcached_st *memc = memcached_create(NULL);
   memcached_server_st *servers = memcached_servers_parse(server_list);
   memcached_server_push(memc, servers);
   memcached_server_list_free(servers);

The first thing we need to do is to enable the binary protocol:

   memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_BINARY_PROTOCOL, 1);

As I mentioned above, I don't think you really want to replicate all of your keys, so let's create a new memcached_st instance and enable replication there (num_replicas contains the number of replicas I want):

   memcached_st *repl = memcached_clone(NULL, memc);
   memcached_behavior_set(repl, MEMCACHED_BEHAVIOR_NUMBER_OF_REPLICAS, num_replicas);

And that's all you need to do! If you want to store a key with multiple replicas, you would go ahead and store it using the repl instance. For "normal" items, you would use the memc instance:

  size_t vlen;
  uint32_t flags;
  memcached_return rc;
  char *value;

  /* Store a key with replicas: */
  memcached_set(repl, "replicated", 10, "foo", 3, 0, 0);
  /* Try to get the item (or a replica if we have problems talking to the master) */
  value = memcached_get(repl, "replicated", 10, &vlen, &flags, &rc);
  free(value); /* memcached_get returns a malloc'ed copy (or NULL) */
  /* Store a key without replicas */
  memcached_set(memc, "single", 6, "foo", 3, 0, 0);
  /* Try to get the item */
  value = memcached_get(memc, "single", 6, &vlen, &flags, &rc);
  free(value);
  /* We can also get the master copy of a replicated item: */
  value = memcached_get(memc, "replicated", 10, &vlen, &flags, &rc);
  free(value);

http://blogs.sun.com/trond/date/20090512 Tuesday May 12, 2009

Connection pooling libmemcached

A while back I looked at the Memcached UDF for MySQL, and noticed that it didn't use libmemcached in an optimal way. In order to work in a multithreaded environment it used the following pattern:

   memcached_st* clone = memcached_clone(NULL, memc);

   /* ... memcached operations using the clone ... */

   memcached_free(clone);

Well, that doesn't look bad, does it? Well, it isn't that bad, but if you look at the network traffic you will see that we end up connecting to and disconnecting from the involved memcached servers every time, and memcached is not optimized for "single-shot" connections.

So how should you solve this? Well, you should reuse your clones! And luckily for you, you don't have to reinvent the wheel. Yesterday I pushed a patch to libmemcached introducing a new library: libmemcachedutil. The library is intended to hold utility functions built on top of libmemcached that you might want to use in your application, and the first routine there is the pool functionality.

So let's write some code using the new library:

#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

#include "libmemcached/memcached_util.h"


static volatile sig_atomic_t run = 1;

static void sig_handler(int sig) {
    assert(sig == SIGINT);
    run = 0;
}

static void* my_application_thread(void *arg) 
{
  memcached_pool_st* pool = arg;

  while (run) {
    memcached_return rc;
    memcached_st* mem = memcached_pool_pop(pool, true, &rc);

    if (mem != NULL) {
      /* ... use the memcached handle for whatever you want! ... */

      /* Return the instance to the pool */
      if (memcached_pool_push(pool, mem) != MEMCACHED_SUCCESS) {
        fprintf(stderr, "Failed to release the memcached instance!\n");
      }
    } else {
      fprintf(stderr, "Failed to get the memcached instance from pool!\n");
    }
  }

  return NULL;
}

int main(int argc, char** argv)
{
  memcached_st* memc = memcached_create(NULL);
  if (memc == NULL) {
    fprintf(stderr, "Failed to create memcached instance\n");
    return 1;
  }

  if (memcached_server_add(memc, "localhost", 11211) != MEMCACHED_SUCCESS) {
    fprintf(stderr, "Failed to add localhost to the server pool\n");
    memcached_free(memc);
    return 1;
  }
 
  memcached_pool_st* pool= memcached_pool_create(memc, 5, 10);
  if (pool == NULL) {
    fprintf(stderr, "Failed to create connection pool\n");
    memcached_free(memc);
    return 1;
  }

  signal(SIGINT, sig_handler);

  /* create 10 threads to use the pool */
  pthread_t tid[10];
  for (int x= 0; x < 10; ++x) {
    pthread_create(&tid[x], NULL, my_application_thread, pool);
  }

  for (int x= 0; x < 10; ++x) {
    pthread_join(tid[x], NULL);
  }

  /* Release allocated resources */
  memcached_pool_destroy(pool);
  memcached_free(memc);

  return 0;
}
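If you are curious what a pool like this does conceptually, here is a minimal sketch of a blocking handle pool built on a mutex and a condition variable. It is not the libmemcachedutil implementation: handle_pool, pool_init, pool_pop and pool_push are hypothetical names, the capacity is fixed, and it never creates new clones on demand. It only shows why reusing handles avoids the connect/disconnect cycle: a handle (and its open connections) goes back into the pool instead of being freed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX 16   /* hypothetical fixed capacity */

typedef struct {
  pthread_mutex_t lock;
  pthread_cond_t available;   /* signalled when a handle is returned */
  void *items[POOL_MAX];
  size_t count;               /* number of idle handles in items[] */
} handle_pool;

static void pool_init(handle_pool *p)
{
  pthread_mutex_init(&p->lock, NULL);
  pthread_cond_init(&p->available, NULL);
  p->count = 0;
}

/* Take an idle handle. If block is true, wait until one is returned;
 * otherwise return NULL immediately when the pool is empty. */
static void *pool_pop(handle_pool *p, bool block)
{
  void *h = NULL;
  pthread_mutex_lock(&p->lock);
  while (p->count == 0 && block)
    pthread_cond_wait(&p->available, &p->lock);
  if (p->count > 0)
    h = p->items[--p->count];
  pthread_mutex_unlock(&p->lock);
  return h;
}

/* Return a handle to the pool and wake one waiting thread. */
static bool pool_push(handle_pool *p, void *h)
{
  bool ok = false;
  pthread_mutex_lock(&p->lock);
  if (p->count < POOL_MAX) {
    p->items[p->count++] = h;
    ok = true;
    pthread_cond_signal(&p->available);
  }
  pthread_mutex_unlock(&p->lock);
  return ok;
}
```

The while loop around pthread_cond_wait matters: a woken thread re-checks the count, because another thread may have grabbed the handle first (and condition variables can wake spuriously).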

http://blogs.sun.com/trond/date/20090423 Thursday April 23, 2009

Presentation at the MySQL Users Conference

Earlier today I did the presentation Memcached Meet Flash, the pluggable engine interface, and if you missed it you can download the slides. It is kind of fun to think back on the hackathon at the users conference last year, when Toru shared his ideas about a storage interface, followed by the interesting discussion I had with Matt during the OpenSolaris summit down in Santa Clara. I didn't know back then that I would present this at the users conference this year :-)

My brother came down for my presentation and took the following picture with his iPhone during the session:

If you have any questions regarding the slides, come look me up at the hackathon tonight :)

http://blogs.sun.com/trond/date/20090402 Thursday April 02, 2009

Pluggable hashing algorithm in memcached?

In my blog post How well is your hash table working for you?, I pointed out that your keys could give you a bad distribution in the internal hash table inside memcached. Right now there is not much you can do apart from using another algorithm to generate your keys, but that may not be the easiest thing to do. Wouldn't it be cooler if you could just use another hashing algorithm instead?

I talked with Brian Aker on IRC the other day, and he pointed out that libmemcached contains a handful of different algorithms (and is covered by the same license as the memcached server), so we could actually use the hashing routines from libmemcached in the server. The first thing we should do is probably to create a "hashing benchmark tool" in libmemcached that reads an input file of keys and determines the best hashing algorithm to use based upon speed and distribution. With the benchmark in place we could add a new configure option to memcached, --with-hashing-algorithm=algorithm (this would of course require that we have libmemcached installed).
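The distribution half of such a benchmark could start out as simple as the sketch below, which hashes a set of keys into a power-of-two table and reports the longest chain and the number of empty buckets. It is an assumption-laden sketch, not libmemcached code: the two candidate functions (FNV-1a and Bob Jenkins' one-at-a-time hash) are written out inline instead of calling libmemcached, measure and distribution are hypothetical names, and timing (the "speed" half) is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t (*hash_fn)(const char *key, size_t len);

/* 32-bit FNV-1a */
static uint32_t fnv1a_32(const char *key, size_t len)
{
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < len; ++i) {
    h ^= (uint8_t)key[i];
    h *= 16777619u;
  }
  return h;
}

/* Bob Jenkins' one-at-a-time hash */
static uint32_t one_at_a_time(const char *key, size_t len)
{
  uint32_t h = 0;
  for (size_t i = 0; i < len; ++i) {
    h += (uint8_t)key[i];
    h += h << 10;
    h ^= h >> 6;
  }
  h += h << 3;
  h ^= h >> 11;
  h += h << 15;
  return h;
}

typedef struct {
  size_t max_chain;      /* items in the fullest bucket */
  size_t empty_buckets;  /* buckets that received no items */
} distribution;

/* Hash every key into nbuckets (must be a power of two) and summarize how
 * evenly they spread. counts must have room for nbuckets entries. */
static distribution measure(hash_fn fn, const char *const *keys, size_t nkeys,
                            size_t *counts, size_t nbuckets)
{
  distribution d = {0, 0};
  memset(counts, 0, nbuckets * sizeof(*counts));
  for (size_t i = 0; i < nkeys; ++i)
    counts[fn(keys[i], strlen(keys[i])) & (nbuckets - 1)]++;
  for (size_t b = 0; b < nbuckets; ++b) {
    if (counts[b] == 0)
      d.empty_buckets++;
    if (counts[b] > d.max_chain)
      d.max_chain = counts[b];
  }
  return d;
}
```

A tool built on this would read the keys from the input file, call measure once per candidate function, and prefer the function with the shortest longest chain, breaking ties on speed.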

With the possibility to change the hashing algorithm in the server, I would love to take this one step further (I hate compile-time settings, because they make life hard for people shipping binaries). What if we could dynamically change the hashing algorithm on the server without invalidating the existing cache? Wouldn't that be cool? Since memcached supports dynamic hash expansion, it shouldn't be hard to change the hash function as well. If you take a quick look at the function assoc_find in assoc.c, you will see that if expanding is set (and the item's bucket hasn't been migrated yet), we search the old hash table instead of the new one. This is the place where we should add our logic: if the hash function changed (and we haven't repopulated the complete hash yet), we need to recompute the hash with the old hash function.
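The lookup rule could mirror what assoc_find already does during expansion. In this sketch (hypothetical names, a fixed table size, keys not copied, no locking), switching hash functions turns the current buckets into an old table. Each lookup first computes the key's bucket under the old function: if that bucket has not been migrated yet the key is searched there, otherwise it is searched in the primary table using the new function. Migration walks the old buckets one at a time, as a background thread might. The two example hash functions are assumptions chosen only to make the switch visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8                  /* hypothetical fixed table size */

typedef uint32_t (*hash_fn)(const char *key);

typedef struct node {
  struct node *next;
  const char *key;                  /* not copied: must outlive the table */
} node;

typedef struct {
  node *primary[NBUCKETS];          /* buckets under the current hash */
  node *old[NBUCKETS];              /* buckets under the previous hash */
  hash_fn cur_hash, old_hash;
  bool migrating;
  size_t migrate_bucket;            /* old buckets below this are moved */
} table;

/* Example hash functions: a deliberately poor one and 32-bit FNV-1a. */
static uint32_t first_byte(const char *key) { return (uint8_t)key[0]; }
static uint32_t fnv1a(const char *key)
{
  uint32_t h = 2166136261u;
  for (; *key != '\0'; ++key)
    h = (h ^ (uint8_t)*key) * 16777619u;
  return h;
}

/* Same idea as assoc_find during expansion: if the key's bucket under the
 * old hash has not been migrated yet, the key is still in the old table. */
static node **bucket_for(table *t, const char *key)
{
  if (t->migrating) {
    size_t ob = t->old_hash(key) % NBUCKETS;
    if (ob >= t->migrate_bucket)
      return &t->old[ob];
  }
  return &t->primary[t->cur_hash(key) % NBUCKETS];
}

static node *table_find(table *t, const char *key)
{
  for (node *n = *bucket_for(t, key); n != NULL; n = n->next)
    if (strcmp(n->key, key) == 0)
      return n;
  return NULL;
}

static bool table_insert(table *t, const char *key)
{
  node *n = malloc(sizeof(*n));
  if (n == NULL)
    return false;
  node **b = bucket_for(t, key);
  n->key = key;
  n->next = *b;
  *b = n;
  return true;
}

/* Start using new_hash; the current buckets become the old table. */
static bool table_change_hash(table *t, hash_fn new_hash)
{
  if (t->migrating)
    return false;                   /* finish the previous switch first */
  memcpy(t->old, t->primary, sizeof(t->primary));
  memset(t->primary, 0, sizeof(t->primary));
  t->old_hash = t->cur_hash;
  t->cur_hash = new_hash;
  t->migrating = true;
  t->migrate_bucket = 0;
  return true;
}

/* Rehash one old bucket into the primary table; returns true while more
 * buckets remain. */
static bool table_migrate_step(table *t)
{
  if (!t->migrating)
    return false;
  node *n = t->old[t->migrate_bucket];
  t->old[t->migrate_bucket] = NULL;
  while (n != NULL) {
    node *next = n->next;
    node **b = &t->primary[t->cur_hash(n->key) % NBUCKETS];
    n->next = *b;
    *b = n;
    n = next;
  }
  if (++t->migrate_bucket == NBUCKETS)
    t->migrating = false;
  return t->migrating;
}
```

Inserts during the migration go through the same bucket_for rule, so a new key lands in the old table if its old bucket has not been moved yet and is picked up by a later migration step.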

Anyone up for the challenge of implementing:

  • A key hashing benchmark program (in libmemcached)
  • Add configure option to memcached that detects and links with libmemcached, and overrides the default hashing algorithm
  • Create a new command to set the hashing algorithm at runtime

http://blogs.sun.com/trond/date/20090328 Saturday March 28, 2009

How well is your hash table working for you?

If you look at the internals of memcached you will find a large hash table where all of the items are stored (we hash the key to a bucket and then do a linear search through a linked list to locate the item). Memcached is supposed to grow the hash table automatically to avoid overly long lists in each bucket (that would kill performance), but wouldn't it be cool to know how well the hash table works for you? Well, that's really easy to find out with dtrace!

So let's use the user_supplied_bug tests in the libmemcached test suite as an example. I already have memcached running on my server, so I just open a new terminal and start the following dtrace one-liner:

trond@opensolaris> pfexec dtrace -n ':::assoc-find { @a = quantize(arg2);}'
dtrace: description ':::assoc-find ' matched 1 probe

In another terminal I just type the following commands:

trond@opensolaris> export MEMCACHED_SERVERS=localhost:11211
trond@opensolaris> ./testapp user
servers localhost:11211
        localhost : 11211


user

Testing user_supplied_bug1                                       0.993 [ ok ]
Testing user_supplied_bug2                                       0.009 [ ok ]
Testing user_supplied_bug3                                       0.041 [ ok ]
Testing user_supplied_bug4                                       0.000 [ ok ]
Testing user_supplied_bug5                                       0.192 [ ok ]
Testing user_supplied_bug6                                       0.053 [ ok ]
Testing user_supplied_bug7                                       0.031 [ ok ]
Testing user_supplied_bug8                                       0.000 [ ok ]
Testing user_supplied_bug9                                       0.004 [ ok ]
Testing user_supplied_bug10                                      2.572 [ ok ]
Testing user_supplied_bug11                                      2.555 [ ok ]
Testing user_supplied_bug12                                      0.000 [ ok ]
Testing user_supplied_bug13                                      0.008 [ ok ]
Testing user_supplied_bug14                                      2.607 [ ok ]
Testing user_supplied_bug15                                      0.000 [ ok ]
Testing user_supplied_bug16                                      0.008 [ ok ]
Testing user_supplied_bug18                                      0.003 [ ok ]
Testing user_supplied_bug19                                      0.000 [ ok ]
Testing user_supplied_bug20                                      0.010 [ ok ]

All tests completed successfully

So let's terminate dtrace and look at the results:

^C


           value  ------------- Distribution ------------- count    
              -1 |                                         0        
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 245648   
               1 |                                         589      
               2 |                                         35       
               4 |                                         0

So what does this tell us? In almost all of our requests the interesting item is the first item in the bucket :-)
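To make the measurement concrete, here is a small model of it, assuming (as the output suggests) that arg2 of the assoc-find probe is the position of the item in its bucket's chain. The sketch is not memcached's assoc.c: entry, lookup_depth and quantize_bucket are hypothetical names for a chained hash lookup that returns the depth of the match, plus the power-of-two grouping that quantize() applies to those depths.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chained hash table entry. */
typedef struct entry {
  struct entry *next;
  const char *key;
} entry;

/* Position of key in its bucket's chain (0 == first item), or -1 if the key
 * is not there. nbuckets must be a power of two. */
static int lookup_depth(entry *const *buckets, size_t nbuckets,
                        uint32_t hash, const char *key)
{
  int depth = 0;
  for (const entry *e = buckets[hash & (nbuckets - 1)]; e != NULL;
       e = e->next, ++depth) {
    if (strcmp(e->key, key) == 0)
      return depth;
  }
  return -1;
}

/* Lower bound of the power-of-two bucket that DTrace's quantize() uses for
 * a non-negative value: 0, 1, 2, 4, 8, ... */
static uint64_t quantize_bucket(uint64_t v)
{
  if (v == 0)
    return 0;
  uint64_t b = 1;
  while (b <= v / 2)
    b <<= 1;
  return b;
}
```

Read this way, the histogram above says that nearly every lookup matched the first entry of its bucket, and only about six hundred lookups had to walk past one or more other items.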

http://blogs.sun.com/trond/date/20090327 Friday March 27, 2009

Debugging memcached UDF for MySQL

I got an email yesterday about a user experiencing problems when using the memcached UDFs for MySQL, so today I spent some time trying to recreate the problem. It turned out that the bug was caused by using uninitialized memory, so I figured it would make a good blog entry to document how I found it...

The first thing I did was to compile and install libmemcached and the udf:

trond@opensolaris> ./config/bootstrap
trond@opensolaris> ./configure --prefix=/opt/memcached --enable-dependency-tracking --enable-debug --without-memcached
trond@opensolaris> gmake all install
trond@opensolaris> ./config/bootstrap
trond@opensolaris> ./configure --prefix=/opt/memcached --with-mysql=/usr/mysql/bin/mysql_config --with-libmemcached=/opt/memcached CFLAGS=-g
trond@opensolaris> gmake all install

The next step is to instruct MySQL to look for the UDF shared libraries in /opt/memcached/lib by adding the following line to my.cnf (trond@opensolaris> pfexec vi /etc/mysql/my.cnf):

plugin_dir=/opt/memcached/lib

I wanted to use libumem.so to do memory checking in MySQL/libmemcached/UDFs, and the easiest way to do this is to just replace the mysqld binary with the following wrapper script:

trond@opensolaris> cd /usr/mysql/bin
trond@opensolaris> pfexec mv mysqld mysqld.bin
trond@opensolaris> cat /tmp/mysqld
#! /bin/ksh
export UMEM_DEBUG=default
export UMEM_LOGGING=transaction
export LD_PRELOAD=libumem.so
exec /usr/mysql/bin/mysqld.bin --skip-stack-trace "$@"
trond@opensolaris> mv /tmp/mysqld .
trond@opensolaris> chmod +x mysqld

So let's start MySQL, verify that it uses libumem, and find its current working directory (so we know where to look for the core files).

trond@opensolaris> svcadm enable mysql
trond@opensolaris> pfexec pldd `pgrep -x mysqld.bin`
8790:   /usr/mysql/bin/mysqld.bin --skip-stack-trace --user=mysql --datadir=/v
/lib/libumem.so.1
/usr/lib/libmtmalloc.so.1
/usr/lib/libCrun.so.1
/lib/librt.so.1
/lib/libz.so.1
/lib/libdl.so.1
/lib/libpthread.so.1
/lib/libthread.so.1
/lib/libgen.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/lib/libm.so.2
/usr/lib/libCstd.so.1
/usr/lib/libc/libc_hwcap1.so.1
trond@opensolaris> pfexec pwdx `pgrep -x mysqld.bin`
8790:   /var/mysql/5.0/data

So let's install the UDFs:

trond@opensolaris> /usr/mysql/bin/mysql -u root < install_functions.sql
trond@opensolaris> /usr/mysql/bin/mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.67 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select memc_servers_set('localhost:11211');
mysql> select memc_get("foobar");
ERROR 2006 (HY000): MySQL server has gone away

So the server went away? I guessed it dumped core, so let's look for a corefile:

trond@opensolaris> pfexec su -
root@razor:/var/mysql/data# ls -l 
total 81452
-rw------- 1 mysql mysql 62261434 2009-03-27 20:44 core.mysqld.bin.8790
-rw-rw---- 1 mysql mysql 10485760 2009-03-27 20:38 ibdata1
-rw-rw---- 1 mysql mysql  5242880 2009-03-27 20:44 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 2009-02-16 10:58 ib_logfile1
drwx------ 2 mysql mysql       53 2009-02-16 10:58 mysql
-rw-rw---- 1 mysql mysql        5 2009-03-27 20:44 razor.pid
drwx------ 2 mysql mysql      125 2009-02-16 11:24 test

So let's start debugging the corefile to try to figure out what happened:

root@razor:/var/mysql/data# dbx - core.mysqld.bin.8790 
Corefile specified executable: "/usr/mysql/5.0/bin/mysqld.bin"
For information about new features see `help changes'
Reading mysqld.bin
core file header read successfully
Reading ld.so.1
Reading libumem.so.1
Reading libmtmalloc.so.1
Reading libCrun.so.1
Reading librt.so.1
Reading libz.so.1
Reading libdl.so.1
Reading libpthread.so.1
Reading libthread.so.1
Reading libgen.so.1
Reading libsocket.so.1
Reading libnsl.so.1
Reading libm.so.2
Reading libCstd.so.1
Reading libc.so.1
Reading libmemcached_functions_mysql.so.0.0.0
Reading libmemcached.so.2.0.0
Reading libscf.so.1
Reading libuutil.so.1
Reading libmd.so.1
Reading libmp.so.2
t@13 (l@13) terminated by signal SEGV (no mapping at the fault address)
Current function is memcached_quit_server
   14     if (ptr->fd != -1)

It seems that the pointer is invalid, so let's take a look at it:

(dbx) print ptr
ptr = 0xbaddcafe

If you look at the documentation for libumem, you will see that it fills newly allocated memory with the pattern 0xbaddcafe and freed memory with 0xdeadbeef. To me this looks like we are using uninitialized memory. Let's take a look at the call stack:

(dbx) where
current thread: t@13
=>[1] memcached_quit_server(ptr = 0xbaddcafe, io_death = '\0'), line 14 in "memcached_quit.c"
  [2] memcached_quit(ptr = 0x9e2d450), line 65 in "memcached_quit.c"
  [3] memcached_free(ptr = 0x9e2d450), line 41 in "memcached.c"
  [4] memc_get_deinit(initid = 0x8963edc), line 82 in "get.c"
  [5] Item_udf_func::cleanup(0x8963e40, 0x1, 0x871f2b9, 0x822958e), at 0x81bbfa4 
  [6] THD::cleanup_after_query(0x9e34008, 0x9e34008, 0x8963d18, 0x19), at 0x82295f3 
  [7] dispatch_command(0x3, 0x9e34008, 0x9e41589, 0x1a), at 0x825ab32 
  [8] handle_one_connection(0x9e34008, 0xce76f000, 0xce4fefec, 0xce6dca5e), at 0x8256f96 
  [9] _thrp_setup(0xce804a00), at 0xce6dca96 
  [10] _lwp_start(0xcdc8eaff, 0xce6d540a, 0x40, 0x64, 0x40, 0xce804a00), at 0xce6dcd20 

I think we should start by looking at frame 4 (frames 1, 2 and 3 are inside libmemcached, but frame 4 is the first "external" call into libmemcached):

(dbx) frame 4
Current function is memc_get_deinit
   82     memcached_free(&container->memc);
(dbx) print container
container = 0x9e2d448
(dbx) examine container/10
0x09e2d448:      0xbaddcafe 0xbaddcafe 0xbaddcafe 0xbaddcafe
0x09e2d458:      0xbaddcafe 0xbaddcafe 0xbaddcafe 0xbaddcafe
0x09e2d468:      0xbaddcafe 0xbaddcafe

It looks like the complete container-structure isn't initialized at all. Luckily we can find out where it was allocated:

root@razor:/var/mysql/data# mdb core.mysqld.bin.8790 
Loading modules: [ libumem.so.1 libuutil.so.1 ld.so.1 ]
> $G
C++ symbol demangling enabled
> ::umalog

T-0.000000000  addr=9e2d440  umem_alloc_896
         libumem.so.1`umem_cache_alloc_debug+0x144
         libumem.so.1`umem_cache_alloc+0x19a
         libumem.so.1`umem_alloc+0xcd
         libumem.so.1`malloc+0x2a
         libmemcached_functions_mysql.so.0.0.0`memc_get_init+0x73
         bool udf_handler::fix_fields+0x66c
         bool Item_udf_func::fix_fields+0x2a
         bool setup_fields+0xe8
         int JOIN::prepare+0x1e0
         bool mysql_select+0x33e
         bool handle_select+0xf7
         int mysql_execute_command+0x4ac3
         bool dispatch_command+0x2b9b
         handle_one_connection+0x516
         libc_hwcap1.so.1`_thrp_setup+0x7e
[ ... cut ... ]

Luckily for us this was the most recent memory allocation (you might have to search a loooong list to find the allocation you are looking for), and we can see that the allocation came from memc_get_init. It was pretty easy to spot the allocation, since there is only one in that function:

  [... cut ...]
  container= (memc_function_st *)malloc(sizeof(memc_function_st));
  rc= memc_get_servers(&container->memc);
  memcached_result_create(&container->memc, &container->results);
  initid->ptr= (char *)container;
  return 0;
}

Here we spot bug number 1: we don't check the return value from memc_get_servers, and as you will see if you read on, that call does fail here. So let's look at the function memc_get_servers:

int memc_get_servers(memcached_st *clone)
{
  int retval;
  memcached_st *test;

  pthread_mutex_lock(&memc_servers_mutex);
  test= memcached_clone(clone, master_memc);
  pthread_mutex_unlock(&memc_servers_mutex);
  retval= test ? 1 : 0;

  return retval ;
}

This seems correct enough, but let's look at the memcached_clone function:

memcached_st *memcached_clone(memcached_st *clone, memcached_st *source)
{
  memcached_return rc= MEMCACHED_SUCCESS;
  memcached_st *new_clone;

  if (source == NULL)
    return memcached_create(clone);

  if (clone && clone->is_allocated)
  {
    return NULL;
  }
[ ... cut ... ]

Now here is something interesting. We check the is_allocated member of clone to decide whether we should abort the cloning process. Remember that we passed in a memory chunk allocated with malloc, so we don't know the value of this member (==> undefined behavior). If the garbage happens to be nonzero, memcached_clone returns NULL to signal an error; memc_get_servers passes that failure on, but memc_get_init never checks it, so we are left with a structure of uninitialized memory (and any use of it is "undefined").

The manual section for memcached_clone isn't clear about this, so I am going to update the documentation. Fixing the source isn't difficult: just replace the call to (memc_function_st *)malloc(sizeof(memc_function_st)) with calloc(1, sizeof(memc_function_st)) (in C there is no need for the cast). I'll be pushing a patch to the author of the UDFs.
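A minimal sketch of the corrected initialization pattern, combining the calloc fix with the missing return-value check. To keep it self-contained it uses hypothetical, heavily simplified stand-ins (demo_memc_st, demo_container_st, demo_clone, demo_get_init) instead of the real libmemcached and UDF types, so treat it as an illustration rather than the actual patch:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for memcached_st; the real structure has many
 * more members. */
typedef struct {
  bool is_allocated;    /* set when the library malloc'ed the object itself */
  int number_of_hosts;  /* example state copied from the source object */
} demo_memc_st;

/* Hypothetical stand-in for memc_function_st. */
typedef struct {
  demo_memc_st memc;
  int results;          /* placeholder for the memcached_result_st member */
} demo_container_st;

/* Mirrors the guard in memcached_clone(): a caller-supplied object whose
 * is_allocated flag is set is rejected with NULL. Unlike the real
 * function, this sketch neither allocates when clone is NULL nor falls
 * back to creating a fresh object when source is NULL. */
static demo_memc_st *demo_clone(demo_memc_st *clone, const demo_memc_st *source)
{
  if (clone == NULL || source == NULL || clone->is_allocated)
    return NULL;
  clone->number_of_hosts = source->number_of_hosts;
  return clone;
}

/* Corrected version of the memc_get_init() pattern: calloc() zero-fills
 * the container, so is_allocated starts out false, and a failed clone is
 * reported instead of silently leaving uninitialized memory behind.
 * Returns NULL on failure. */
static demo_container_st *demo_get_init(const demo_memc_st *master)
{
  demo_container_st *container = calloc(1, sizeof(*container));
  if (container == NULL)
    return NULL;

  if (demo_clone(&container->memc, master) == NULL) {
    free(container);
    return NULL;
  }
  return container;
}
```

Note that calloc alone only makes the is_allocated check deterministic; it is the return-value check that stops a failed clone from leaving a half-initialized structure behind.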

http://blogs.sun.com/trond/date/20090304 Wednesday March 04, 2009

Use Wireshark to look at your Memcached traffic!

A couple of days ago I got an IM from Stig Bjørlykke (a co-worker from when I worked at Thales Norway) asking about the traffic sent on port 11211. He was looking over the port numbers assigned by IANA and spotted my name there (I applied for port 11211 for memcached traffic some time ago :-).

I pointed Stig to the wiki, and I promised to send him a little test program so that he could look at the traffic. It took me a few days to send the email, and a couple of days ago I got a new IM from Stig: "Hey, do you want to test the memcached dissector?".

Unfortunately for me, their build farm doesn't build OpenSolaris packages, so I had to test it on my MacBook. I don't use the Mac for development, so instead of spending time creating a setup there, I just ran my test program on my OpenSolaris boxes and captured the network traffic with snoop:

trond@opensolaris> pfexec snoop -o /tmp/snoop -d iprb0

I must admit that I was excited when I started Wireshark and loaded the capture file, and I think the result looks great.

If you'd like to try it yourself, you should download the latest build from http://www.wireshark.org/download/automated/ (or check out the source and build it yourself).
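If you don't have a test program handy, here is a minimal sketch of what a memcached binary-protocol GET request looks like on the wire: a fixed 24-byte header followed by the key. The function name mc_encode_get is hypothetical (it is not part of libmemcached), and opening the TCP connection to port 11211 and writing the bytes is left out; sending them to a running server should give you traffic to capture with snoop as shown above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the fixed binary-protocol header that precedes every packet. */
#define MC_HEADER_LEN 24

/* Encodes a binary-protocol GET request for `key` into `buf`.
 * Returns the number of bytes written, or 0 if `buf` is too small or
 * the key does not fit in the 16-bit key-length field. */
static size_t mc_encode_get(uint8_t *buf, size_t buflen,
                            const char *key, uint32_t opaque)
{
  size_t keylen = strlen(key);
  if (keylen > UINT16_MAX || buflen < MC_HEADER_LEN + keylen)
    return 0;

  /* GET has no extras and no value, so the body is just the key. */
  uint32_t bodylen = (uint32_t)keylen;

  memset(buf, 0, MC_HEADER_LEN);       /* data type, vbucket and CAS stay 0 */
  buf[0] = 0x80;                        /* magic: request */
  buf[1] = 0x00;                        /* opcode: GET */
  buf[2] = (uint8_t)(keylen >> 8);      /* key length, big-endian */
  buf[3] = (uint8_t)(keylen & 0xff);
  buf[4] = 0;                           /* extras length */
  buf[8]  = (uint8_t)(bodylen >> 24);   /* total body length, big-endian */
  buf[9]  = (uint8_t)(bodylen >> 16);
  buf[10] = (uint8_t)(bodylen >> 8);
  buf[11] = (uint8_t)(bodylen & 0xff);
  buf[12] = (uint8_t)(opaque >> 24);    /* opaque, echoed back by the server */
  buf[13] = (uint8_t)(opaque >> 16);
  buf[14] = (uint8_t)(opaque >> 8);
  buf[15] = (uint8_t)(opaque & 0xff);
  /* bytes 16..23 hold the CAS value, left at zero for a GET request */

  memcpy(buf + MC_HEADER_LEN, key, keylen);
  return MC_HEADER_LEN + keylen;
}
```

The opaque field is handy when reading a capture: the server copies it into the response, so you can match each reply to its request in Wireshark.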


This is a personal weblog, I do not speak for my employer.