Trond Norbye's Weblog


http://blogs.sun.com/trond/date/20081217 Wednesday December 17, 2008

Adding a backdoor to memcached

I have been using libumem and LD_PRELOAD to track down memory allocation problems in a lot of applications over the years, and I just love the runtime linker on Solaris (AFAIK you will find some of the features on Linux as well). The fact that I can load other libraries that replace or extend the functionality of a program is just great. If you haven't read it already, I would encourage you to read the man page for ld.so.1. If you are a developer using Solaris and haven't used libumem to hunt down memory bugs, you should read this blog by Adam Leventhal.

Last week I spent an evening trying to track down yet another memory allocation problem, so my head was spinning with all these crazy ideas when I got to bed. While I was lying there I got this cool idea for how you could use the runtime linker to add a “backdoor” to memcached, providing the functionality you've been missing all these years. And by backdoor I actually mean a Solaris door, as described in the libdoor(3LIB) man page. So how are we going to do this? Well, we are going to create a shared object, add some code to its init section, and let the runtime linker do the rest of the work. Sounds easy, doesn't it? The best part is that it is easy as well :-)

Since I have no idea what all of you are missing from memcached (and I want the blog entry to be simple enough to describe how it works, and not describe internal details of memcached), I'll just create a small example that lets you "bulk-load" data stored in your MySQL database. It should be fairly simple to modify the code to do other things, like dumping all the data stored in the cache to disk and reading it back in.

Enough talk, let's look at the source!!!

backdoor.c:
     1	#include <sys/types.h>
     2	#include <sys/stat.h>
     3	#include <stdio.h>
     4	#include <pthread.h>
     5	#include <door.h>
     6	#include <fcntl.h>
     7	#include <errno.h>
     8	#include <string.h>
     9	#include <stdlib.h>
       
    10	static const char* doorfile = "/var/run/memcached_backdoor";
       
    11	#include "config.h"
    12	#include "memcached.h"
       
    13	static void door_server(void *cookie, char *argp, size_t arg_size, door_desc_t *dp, uint_t n_desc) {
    14	    if (n_desc == 1) {
    15	        /* I prefer to use fgets() instead of read(), so let's open a stream
    16	        ** for the descriptor passed in the dp argument
    17	        */
    18	        FILE *fp = fdopen(dp->d_data.d_desc.d_descriptor, "rb");
    19	        if (fp == NULL) {
    20	            char buffer[1024];
    21	            int len = sprintf(buffer, "Failed to reopen stream: %s", strerror(errno));
    22	            /* Return to the client with an error message */
    23	            door_return(buffer, len, NULL, 0);
    24	        } else {
    25	            char buffer[1024];
    26	            while (fgets(buffer, sizeof (buffer), fp) != NULL) {
    27	                /* buffer contains one line of input with the following format
    28	                ** "key tab value". I don't do any error checking here to avoid
    29	                ** cluttering the code with extra tests to see that the key is
    30	                ** valid, the tab is there etc etc..
    31	                */
    32	                char *key = buffer;
    33	                char *value = strchr(buffer, '\t');
    34	                *value = '\0';
    35	                ++value;
       
    36	                int len = strlen(value);
    37	                value[len - 1] = '\0';
       
    38	                /* Allocate memory to store the item */
    39	                item* it = item_alloc(key, strlen(key), 0, 0, strlen(value) + 2);
    40	                if (it != NULL) {
    41	                    conn c;
    42	                    if (settings.verbose) {
    43	                        printf("Key: [%s] value [%s]\n", key, value);
    44	                    }
    45	                    /* Insert the value into the item. The memcached server
    46	                    ** stores the data with a terminating \r\n so we need
    47	                    ** to add those as well
    48	                    */
    49	                    memcpy(ITEM_data(it), value, len);
    50	                    *(ITEM_data(it) + it->nbytes - 2) = '\r';
    51	                    *(ITEM_data(it) + it->nbytes - 1) = '\n';
    52	                    if (store_item(it, NREAD_SET, &c) != 1) {
    53	                        char msg[1024];
    54	                        sprintf(msg, "Failed to store %s\n", key);
    55	                        door_return(msg, strlen(msg), NULL, 0);
    56	                    }
    57	                    /* Release our reference */
    58	                    item_remove(it);
    59	                }
    60	            }
    61	            (void) fclose(fp);
    62	        }
    63	    }
    64	    /* Return the control back to the client */
    65	    door_return(NULL, 0, NULL, 0);
    66	}
       
    67	void init(void) {
    68	    /* Create a filesystem entry for our door so that clients may find us */
    69	    mode_t mask = umask(0);
    70	    int fd = open(doorfile, O_CREAT | O_TRUNC, 0444);
    71	    (void) umask(mask);
    72	    if (fd < 0) {
    73	        perror("Failed to open door");
    74	    } else {
    75	        (void) close(fd);
    76	        /* Detach any existing services from the file */
    77	        (void) fdetach(doorfile);
    78	        /* Create a door id for our door function */
    79	        int did = door_create(door_server, NULL, DOOR_NO_CANCEL);
    80	        if (did >= 0) {
    81	            /* Associate our door with our door id */
    82	            if (fattach(did, doorfile) < 0) {
    83	                perror("fattach door failed");
    84	            }
    85	        } else {
    86	            (void) perror("door_create failed");
    87	        }
    88	    }
    89	}
       
    90	#pragma init (init)
        

So how does this work? First, look at line 90. The #pragma init (init) directive instructs the compiler to add the function named init to the init section. This means that during the initialization of the object, the function init is called. Clients need a filesystem entry to be able to find our door, so lines 68-75 create one. Since there might be other services already attached to the file we want to use as a door, we call fdetach in line 77 to remove all such associations. Line 79 creates a door descriptor associated with the function named door_server, and line 82 attaches that door to the door file.
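The #pragma init directive is specific to the Sun Studio compiler. With GCC or Clang, the same effect is usually achieved with the constructor function attribute. The following is only a minimal sketch of that mechanism, assuming a GCC-compatible compiler; the names backdoor_init, backdoor_ready and backdoor_is_ready are made up for the illustration and are not part of memcached or of the listing above:

```c
#include <stdio.h>

/* Hypothetical flag, only here so the effect of the initializer is visible. */
static int backdoor_ready = 0;

/*
 * GCC/Clang counterpart of "#pragma init (init)": the function is placed
 * among the object's initializers, so it runs when the object is loaded,
 * before main() is entered.
 */
__attribute__((constructor))
static void backdoor_init(void) {
    backdoor_ready = 1;
    fprintf(stderr, "backdoor_init: object initialized\n");
}

/* Returns 1 if the initializer has run, 0 otherwise. */
int backdoor_is_ready(void) {
    return backdoor_ready;
}
```

In a real preloaded object, the body of backdoor_init would contain whatever setup code the object needs, just like init does in backdoor.c.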

So what does the door_server function do? This function is called whenever someone invokes door_call on our door. It expects dp->d_data.d_desc.d_descriptor to contain a file descriptor from which it can read the input data, and it stores each line it reads as an item in our cache. The server expects the data on the input stream to be in the following format:

key tab value
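The listing deliberately skips all error checking when it splits a line, so a line without a tab would crash the server. A more defensive split could look roughly like the sketch below. The function name split_line and the constant MAX_KEY_LEN are hypothetical; the 250-character limit is the key length limit of the memcached text protocol:

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit: the memcached text protocol caps keys at 250 characters. */
#define MAX_KEY_LEN 250

/*
 * Split one input line of the form "key<TAB>value\n" in place.
 * On success, *key and *value point into `line` and 0 is returned.
 * Returns -1 if the tab is missing or the key is empty, too long,
 * or contains a space. Note that `line` is modified either way.
 */
int split_line(char *line, char **key, char **value) {
    char *tab = strchr(line, '\t');
    if (tab == NULL) {
        return -1;                      /* no separator at all */
    }
    *tab = '\0';                        /* terminate the key */

    size_t klen = strlen(line);
    if (klen == 0 || klen > MAX_KEY_LEN || strchr(line, ' ') != NULL) {
        return -1;                      /* not usable as a key */
    }

    char *val = tab + 1;
    size_t vlen = strlen(val);
    if (vlen > 0 && val[vlen - 1] == '\n') {
        val[vlen - 1] = '\0';           /* strip the newline left by fgets() */
    }

    *key = line;
    *value = val;
    return 0;
}
```

Inside the fgets() loop, a line for which split_line returns -1 could simply be skipped instead of being stored.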
        

Well, we should be ready to compile and start our memcached server:

trond@razor:> cc -o backdoor.so -mt -m64 -G -g -KPIC backdoor.c
trond@razor:> pfexec ksh -c "LD_PRELOAD=./backdoor.so ./memcached -u noaccess" &
trond@razor:> ls -l /var/run/memcached_backdoor
Dr--r--r--   1 root     root           0 Dec 17 15:21 /var/run/memcached_backdoor
        

The capital D in the filesystem listing identifies this as a door. See man ls for more details.

You can telnet to port 11211 and try sending commands to the server if you like, or you could execute the following command:

trond@razor:> echo stats | nc localhost 11211
STAT pid 15396
STAT uptime 4
STAT time 1229524715
STAT version 1.3.1
STAT pointer_size 32
STAT rusage_user 0.009059
STAT rusage_system 0.029021
STAT curr_connections 6
STAT total_connections 7
STAT connection_structures 7
STAT cmd_get 0
STAT cmd_set 0
STAT get_hits 0
STAT get_misses 0
STAT bytes_read 6
STAT bytes_written 0
STAT limit_maxbytes 67108864
STAT threads 5
STAT bytes 0
STAT curr_items 0
STAT total_items 0
STAT evictions 0
END
        

Now that we have the server set up, let's create a client application that uses the door. I want to get my data from a MySQL database, so I want the client program to process the data from standard input:

client.c:
     1	#include <stdio.h>
     2	#include <door.h>
     3	#include <sys/types.h>
     4	#include <fcntl.h>
     5	#include <unistd.h>
     6	#include <errno.h>
     7	#include <stdlib.h>
     8	#include <sys/mman.h>

     9	int main(int argc, char** argv) {
    10	    int doorfd = open("/var/run/memcached_backdoor", O_RDONLY);
    11	    if (doorfd == -1) {
    12	        perror("Failed to open door file");
    13	        return EXIT_FAILURE;
    14	    }

    15	    door_desc_t descr;
    16	    descr.d_data.d_desc.d_descriptor = STDIN_FILENO;
    17	    descr.d_attributes = DOOR_DESCRIPTOR;

    18	    door_arg_t door_args = {
    19	        .desc_ptr = &descr,
    20	        .desc_num = 1
    21	    };

    22	    if (door_call(doorfd, &door_args) == -1) {
    23	        perror("door_call failed");
    24	    } else if (door_args.data_size > 0) {
    25	        write(STDOUT_FILENO, door_args.data_ptr, door_args.data_size);
    26	        if (munmap(door_args.rbuf, door_args.rsize) == -1) {
    27	            perror("Failed to unmap memory");
    28	        }
    29	    }

    30	    return (EXIT_SUCCESS);
    31	}
        

This program should be easy to understand without any comments, but I would like to point out a few lines. Line 16 stores the standard input file descriptor of this process in the door descriptor, which is passed through the door by the door_call in line 22. The kernel makes sure that the descriptor is available as a valid file descriptor in the memcached process when it invokes my door_server function.

Now it's time for us to compile the client:

trond@razor:> cc -o client client.c
        

Let's use the data stored in our database to test the thing:

trond@razor:>  /usr/mysql/bin/mysql -u root -D memcached \
   -e "SELECT CONCAT('user_', id), bio FROM user" --skip-column-names \
   |  ./client
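If you don't have a database at hand, any program that prints lines in the "key tab value" format can feed the client. The helper below is a small sketch of such a generator; the function name write_mock_items and the contents of the records are invented for the example:

```c
#include <stdio.h>

/*
 * Write `count` mock records to `out`, one per line, in the
 * "key<TAB>value" format that door_server() reads.
 * Returns the number of lines written, or -1 on a write error.
 */
int write_mock_items(FILE *out, int count) {
    for (int i = 1; i <= count; ++i) {
        if (fprintf(out, "user_%d\tmock bio number %d\n", i, i) < 0) {
            return -1;
        }
    }
    return count;
}
```

Calling write_mock_items(stdout, 14) from a small main() and piping the output into ./client would load 14 mock items through the door.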
        

Let's look at the stats and try to get one of the objects to verify that it works:

trond@razor:> echo stats | nc localhost 11211
STAT pid 15396
STAT uptime 190
STAT time 1229524901
STAT version 1.3.1
STAT pointer_size 32
STAT rusage_user 0.014419
STAT rusage_system 0.038423
STAT curr_connections 6
STAT total_connections 8
STAT connection_structures 7
STAT cmd_get 0
STAT cmd_set 0
STAT get_hits 0
STAT get_misses 0
STAT bytes_read 12
STAT bytes_written 463
STAT limit_maxbytes 67108864
STAT threads 5
STAT bytes 1019
STAT curr_items 14
STAT total_items 15
STAT evictions 0
END
trond@razor:> echo get user_1 | nc localhost 11211
VALUE user_1 0 61
Trond spends his evenings in front of the computer... bla bla
END
        

Please note that I don't think this is something you should do in a real-world scenario, but rather something you could do in the development phase of your application. (Or it can be used to preload "mock" objects for testing various parts of memcached ;)



This is a personal weblog, I do not speak for my employer.