Trond Norbye's Weblog


http://blogs.sun.com/trond/date/20080902 Tuesday September 02, 2008

Memcached UDF for MySQL on OpenSolaris

I have been hearing about the Memcached UDFs for MySQL for a while now, so I decided to spend some time playing with them. Being the geek I am, playing means getting my hands dirty with the code, so I cloned the source repository from http://hg.tangent.org/memcached_functions_mysql/.

With the source code and a cup of coffee available I ran:

trond@opensolaris:compile> ./config/bootstrap
trond@opensolaris:compile> ./configure CC=cc --with-mysql=/usr/mysql/bin/mysql_config
[... cut ...]
checking for DEPS... configure: error: Package requirements (libmemcached >= 0.17) were not met:
[... cut ...]
    

I know that there is a version of libmemcached in OpenSolaris (I was involved during the integration), so the version is either too old or configure isn't picking it up.

trond@opensolaris:compile> pkginfo -l SUNWlibmemcached | grep DESC
      DESC:  memcached C API 0.16
    

It turns out that the version of libmemcached integrated in OpenSolaris is too old, so I filed 6743510 to get it upgraded.

I have been contributing to libmemcached (adding support for the binary protocol), so I have a "fresh-from-the-oven" version installed in /opt/memcached. All I needed to do was to get the configure script to pick it up... There were at least two different roads I could take:

  • I could just comment out the test in configure.ac, regenerate the configure script and run it as:
    trond@opensolaris:compile> ./configure CC=cc CFLAGS=-I/opt/memcached/include \
           LDFLAGS="-L/opt/memcached/lib -R/opt/memcached/lib" \
           --with-mysql=/usr/mysql/bin/mysql_config
          
  • I could do it the "clean" way, be nice to the community and add a --with-libmemcached option

For unknown reasons I decided to do it the clean way and submitted a patch back to the project (use hg import if you would like to use it before it is included in the upstream repository).

With the lib installed as /opt/memcached/lib/libmemcached_functions_mysql.so, I tried to figure out how to load it in MySQL. The documentation told me to copy the library to /usr/local/mysql/lib/mysql/plugins/, but my OpenSolaris delivers MySQL in /usr/mysql so I tried to copy the file to /usr/mysql/lib/mysql/plugins/ without success.

A quick truss on mysqld revealed that it did not search any directories other than the ones the runtime linker searches. Being a MySQL novice I didn't know whether my setup was misconfigured, so I returned to the MySQL documentation and found the following page. I don't want to expose the library to everything on my system, so creating a link from /usr/lib or using crle was out of the question. Instead I modified /lib/svc/method/mysql and exported LD_LIBRARY_PATH.

I am now able to use the Memcached UDF for MySQL on my OpenSolaris box, but I don't like the LD_LIBRARY_PATH hack, so I am looking forward to the MySQL 5.1 release. Perhaps I should try to compile a version myself and test it out.

http://blogs.sun.com/trond/date/20080616 Monday June 16, 2008

Open position in the Memcached team

The last few months have been an interesting period for me. I used to work in the Database Technology Group, and as you may guess that group was affected by the acquisition of MySQL. The merge of the two teams is now complete, and we are now all located in the Database Group.

During this merge it was natural to look at the staffing in the different projects, and I am extremely glad to see that the project I am working on gets additional headcount.

I am therefore extremely excited that we have the following open positions in my team:

http://blogs.sun.com/trond/date/20080609 Monday June 09, 2008

Memcached source code

Finding the source code for Memcached can be a challenge. Right now there are multiple source repositories, so you might want to look into several of them. You will find a link to a Subversion repository in the download section on the project homepage, but the ongoing development is performed in various git repositories.

I have created my own git repository at http://github.com/trondn/memcached/commits/binary where I do my development. Other repositories you might find useful are:

http://blogs.sun.com/trond/date/20080529 Thursday May 29, 2008

Support for Memcached

During the webinar "Highly scalable solutions with MySQL and Memcached" Ivan Zoratti announced that Sun will support Memcached as a part of MySQL Enterprise Support.

You may want to check out the white paper "Designing and Implementing Scalable Applications with Memcached and MySQL".

There is another webinar later today (Designing and Implementing Scalable Applications with Memcached and MySQL), so if you missed yesterday's webinar (or liked it) you should sign up for the event!

http://blogs.sun.com/trond/date/20080524 Saturday May 24, 2008

Free webinars on MySQL and Memcached

This week Sun will host the first two free webinars on MySQL and Memcached. Ivan Zoratti will kick off on Wednesday with "Highly scalable solutions with MySQL and Memcached".

On Thursday Monty Taylor, Jimmy Guerrero and Frank Mash will talk about "Designing and Implementing Scalable Applications with Memcached and MySQL".

You should do as I did and register for the event as soon as possible :-)

Memcached and customized storage engines

Most forks of memcached exist because people would like a storage engine other than the default memory-based engine in Memcached. For the last two weeks I have been working with Toru Maesaka on designing an API to allow users to plug their own storage engine into Memcached. The current version of the API consists of just 14 functions, so it should be an easy task to implement your own storage engine. The storage engine API is defined in engine.h.

With a draft of the engine specification in place, I started to refactor the source code so that the memory-based backend uses this API. I finished a prototype a couple of days ago, and pushed it to my git repository at http://github.com/trondn/memcached/commits/binary. As you can see there are still some loose ends, and I would like to clean up the implementation of this engine a bit. In this prototype I have just moved the code out of the internal server without trying to be smart ;-)

When I talked about the work at lunch the other day, the guy sitting in the office next to me got excited and wanted a demo. He is working in the PostgreSQL team, so he asked me how difficult it would be to create a small backend that stored the items in a database. I guess others may be interested in how to create their own backend, so I decided to "document" it as a blog post.

To create your own storage engine, you must create a shared library that exports the following function:

ENGINE_HANDLE* create_instance(int version, ENGINE_ERROR_CODE* error);
        

The memcached server will call this function to ask your library to create an instance of the storage engine. When the core server calls a function in the API, this handle is passed as the first argument. That makes the handle a perfect place to store extra engine-specific data. The little example engine we created uses the following struct:

struct pg_engine {
   /**
    * The handle defined in the API
    */
   ENGINE_HANDLE engine;

   /**
    * Is the engine initialized or not
    */
   bool initialized;

   /**
    * A single lock is used for accessing the cache ;-)
    */
   pthread_mutex_t cache_lock;

   /**
    * Connection to the postgres database
    */
   PGconn *psql;
};
        

The implementation of create_instance looks like:

ENGINE_HANDLE* create_instance(int version, ENGINE_ERROR_CODE* error) {
   struct pg_engine *handle;
   if (version != 1) {
      if (error != NULL) {
         *error = ENGINE_ENOTSUP;
      }
      return 0;
   }

   if ((handle = calloc(1, sizeof(*handle))) == NULL) {
      if (error != NULL) {
         *error = ENGINE_ENOMEM;
      }
      return 0;
   }

   handle->engine.interface_level = version;
   handle->initialized = false;
   handle->engine.get_info = pg_get_info;
   handle->engine.initialize = pg_initialize;
   handle->engine.destroy = pg_destroy;
   handle->engine.item_size_ok = pg_item_size_ok;
   handle->engine.item_allocate = pg_item_allocate;
   handle->engine.item_delete = pg_item_delete;
   handle->engine.item_release = pg_item_release;
   handle->engine.get = pg_get;
   handle->engine.get_not_deleted = pg_get_not_deleted;
   handle->engine.get_stats = pg_get_stats;
   handle->engine.store = pg_store;
   handle->engine.arithmetic = pg_arithmetic;
   handle->engine.flush = pg_flush;
   handle->engine.update_lru_time = pg_update_lru_time;

   return &handle->engine;
}            
        

If we look at the structure and the create_instance function, there are some small interesting details. First of all you see that the pg_engine structure contains all of the variables needed, so that we don't use global variables. Secondly we verify that we support the version that the memcached server requests.

When the memcached server has created the instance, it will try to initialize the engine by calling initialize. This is where the engine should initialize its internal data structures. For our server, we initialize the mutex and create the database connection:

static inline struct pg_engine* get_handle(struct engine_handle* handle) {
   /*
    * We can cast the pointer to a pg_engine because the engine_handle
    * is the first member in the pg_engine struct.
    */
   return (struct pg_engine*)handle;
}

static ENGINE_ERROR_CODE pg_initialize(struct engine_handle* handle,
                                       const char* config_str) {
   struct pg_engine* engine = get_handle(handle);

   if (engine->initialized) {
      return ENGINE_EINVAL;
   }

   if (pthread_mutex_init(&engine->cache_lock, NULL) != 0) {
      return ENGINE_EINVAL;
   }

   engine->psql = PQconnectdb("hostaddr='127.0.0.1' port='' dbname='postgres' user='postgres' password='' connect_timeout='10'");
   if (engine->psql == NULL || PQstatus(engine->psql) != CONNECTION_OK) {
      pthread_mutex_destroy(&engine->cache_lock);
      return ENGINE_EINVAL;
   }
   
   engine->initialized = true;
   return ENGINE_SUCCESS;
}            
        

In our little engine we hardcoded the database connection information, but as you see from the prototype memcached will provide a configuration string you could use.
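
A minimal sketch of how the configuration string could replace the hardcoded values, assuming that memcached hands over a string that is already in libpq's conninfo format (that format is my assumption, the API only says that a string is provided). The names choose_conninfo, default_conninfo and check_choose_conninfo are hypothetical helpers and not part of the engine API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical default, based on the hardcoded values in pg_initialize. */
static const char default_conninfo[] =
    "hostaddr='127.0.0.1' dbname='postgres' user='postgres' "
    "connect_timeout='10'";

/*
 * Pick the connection string to hand to PQconnectdb().
 * Assumption: config_str, when present, is a libpq conninfo string.
 */
static const char *choose_conninfo(const char *config_str) {
    if (config_str == NULL || config_str[0] == '\0') {
        return default_conninfo;
    }
    return config_str;
}

/* Small self-check showing the fallback behaviour. */
static void check_choose_conninfo(void) {
    const char *custom = "dbname='cache' user='memcached'";

    assert(choose_conninfo(NULL) == default_conninfo);
    assert(choose_conninfo("") == default_conninfo);
    assert(choose_conninfo(custom) == custom);
}
```

In pg_initialize the result of choose_conninfo(config_str) would then be passed to PQconnectdb in place of the literal string.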

Ok. Now the server is ready to accept clients. The first functions we would like to implement are probably those needed to support insertion of data into the server. The first function we must implement is item_allocate. The server will call this function to allocate memory where it can spool the data from the client. Since we don't try to be smart and implement a fast server, we just use calloc to allocate a memory chunk. Our implementation looks like:

static item* pg_item_allocate(struct engine_handle* handle, const void* key,
                              const size_t nkey, const int flags,
                              const rel_time_t exptime,
                              const int nbytes) {
   item *it;
   char suffix[40];
   size_t nsuffix = snprintf(suffix, sizeof(suffix),
                             " %d %d\r\n", flags, nbytes - 2);
   size_t ntotal = nsuffix + sizeof(item) + nkey + 1 + nbytes;
      
   if ((it = calloc(1, ntotal)) == NULL) {
      return NULL;
   }
   
   it->it_flags = 0;
   it->nkey = nkey;
   it->nbytes = nbytes;
   memcpy(ITEM_key(it), key, nkey);
   it->exptime = exptime;
   memcpy(ITEM_suffix(it), suffix, nsuffix);
   it->nsuffix = nsuffix;

   return it;
}            
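
The suffix formatting in pg_item_allocate stores the result of snprintf directly in a size_t, so a negative or truncated result would go unnoticed. The following is a small sketch of a more defensive variant. format_suffix is a hypothetical helper, not something from the engine API or from the prototype:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the " <flags> <length>\r\n" suffix into buf.
 * Returns the suffix length, or 0 if formatting failed or was truncated.
 */
static size_t format_suffix(char *buf, size_t buflen, int flags, int nbytes) {
    assert(buf != NULL);

    /* nbytes counts the trailing "\r\n" of the value, hence the - 2. */
    int n = snprintf(buf, buflen, " %d %d\r\n", flags, nbytes - 2);
    if (n < 0 || (size_t)n >= buflen) {
        return 0;
    }
    return (size_t)n;
}
```

With a 40 byte buffer, as used above, two int values always fit, so the check mainly protects against later changes to the format string or the buffer size.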
        

When the server is done with an item, it will call item_release to notify the backend that it may release the allocated resources. In our little example this function just releases the memory:

static void pg_item_release(struct engine_handle* handle, item* item) {
    free(item);
}            
        

When we have received all of the data from the client, the memcached server will try to store the item in the engine by calling store. This function is used for all types of store access (add, set, replace, append and prepend). In our little server, we only support add, set and replace. Add is implemented with an INSERT SQL statement while replace is implemented with an UPDATE. Unfortunately we don't have an "insert-or-update" in SQL, so our set command is implemented by first trying an add and, if that fails, trying an update. The source looks like:

const char* add_query = "INSERT INTO memcached (key, header_size, data) VALUES ( $1, $2, $3)";

const char* update_query = "UPDATE memcached SET header_size = $2, data = $3 where key = $1";

static ENGINE_ERROR_CODE do_db_store_item(const char *sql,
                                          struct pg_engine* engine, item *it) {
   uint32_t hl = htonl(it->nsuffix);
   
   const char *params[3];
   params[0] = (char*)ITEM_key(it);
   params[1] = (char*)&hl;
   params[2] = (char*)ITEM_suffix(it);
   
   int sizes[3] = { it->nkey , 4, it->nbytes + it->nsuffix };
   int formats[3] = {1, 1, 1};

   PGresult *result;
   result = PQexecParams(engine->psql, sql, 3, NULL,
                         params, sizes, formats, 1);

   ENGINE_ERROR_CODE ret = ENGINE_SUCCESS;
   
   if (result == NULL || PQresultStatus(result) != PGRES_COMMAND_OK) {
      ret = ENGINE_EINVAL;
   }

   PQclear(result);
   return ret;
}

static ENGINE_ERROR_CODE do_store_item(struct pg_engine* engine,
                                       item *it, enum operation comm) {
   switch (comm) {
   case NREAD_ADD:
      return do_db_store_item(add_query, engine, it);
   case NREAD_REPLACE:
      return do_db_store_item(update_query, engine, it);
   case NREAD_SET:
      if (do_db_store_item(add_query, engine, it) != ENGINE_SUCCESS) {
         return do_db_store_item(update_query, engine, it);
      }
      return ENGINE_SUCCESS;
      
   default:
      return ENGINE_ENOTSUP;
   }
}

static ENGINE_ERROR_CODE pg_store(struct engine_handle* handle,
                                  item* item, enum operation operation) {
   int ret;
   struct pg_engine* engine = get_handle(handle);

   pthread_mutex_lock(&engine->cache_lock);
   if (do_store_item(engine, item, operation) == ENGINE_SUCCESS) {
      ret = 1;
   } else {
      ret = 0;
   }
   pthread_mutex_unlock(&engine->cache_lock);
   return ret;
}            
        

If you look at the assignments of ret in pg_store you will most likely spot a bug. We don't use the correct return values!! I noticed that when I wrote this plugin and I haven't had the time to fix it yet.
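
One possible fix, sketched under the assumption that the server expects the ENGINE_ERROR_CODE itself (which is what the declared return type suggests), is to pass the result of do_store_item straight through. The enum values, struct demo_engine and the demo_ functions below are stand-ins so the sketch compiles on its own. The real definitions live in engine.h:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in definitions; the numeric values here are arbitrary. */
typedef enum {
    ENGINE_SUCCESS,
    ENGINE_EINVAL,
    ENGINE_ENOTSUP
} ENGINE_ERROR_CODE;

enum operation { NREAD_ADD, NREAD_SET, NREAD_REPLACE, NREAD_APPEND };

struct demo_engine {
    pthread_mutex_t cache_lock;
};

/*
 * Stub for do_store_item(): pretends the database accepted everything
 * except append, which the example engine does not support.
 */
static ENGINE_ERROR_CODE demo_do_store_item(struct demo_engine *engine,
                                            void *it, enum operation comm) {
    (void)engine;
    (void)it;
    return comm == NREAD_APPEND ? ENGINE_ENOTSUP : ENGINE_SUCCESS;
}

/* Same locking as pg_store, but the error code is returned unchanged. */
static ENGINE_ERROR_CODE demo_store(struct demo_engine *engine, void *it,
                                    enum operation comm) {
    ENGINE_ERROR_CODE ret;

    assert(engine != NULL);
    pthread_mutex_lock(&engine->cache_lock);
    ret = demo_do_store_item(engine, it, comm);
    pthread_mutex_unlock(&engine->cache_lock);
    return ret;
}
```

This way an unsupported operation such as append reaches the caller as ENGINE_ENOTSUP instead of being collapsed into 0.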

Now that we have data in the cache we can try to get the data back to the client. When the client tries to get an element from the cache, the memcached server will call get. In our little server this function will try to get the item from the database. Because the server may call this function from multiple threads, we need to synchronize the access to our PostgreSQL handle:

static item* do_pg_get(struct pg_engine* engine, const void* key,
                       const int nkey) {
   const char* query = "SELECT header_size, data FROM memcached WHERE key=$1";
   const char *params[1];
   params[0] = (char*)key;
   int paramLengths[1] = { nkey };
   
   PGresult *result;
   result = PQexecParams(engine->psql, query, 1, NULL, params,
                         paramLengths, NULL, 1);
   if (result == NULL) {
      return NULL;
   }
   
   int tup = PQntuples(result);
   if (tup == 0) {
      PQclear(result);
      return NULL;
   }

   char *sptr = PQgetvalue(result, 0, 0);
   uint32_t size = ntohl(*((uint32_t*)sptr));
   int datalen = PQgetlength(result, 0, 1);
   size_t ntotal = sizeof(item) + nkey + 1 + size + datalen;
   
   item *it;
   if ((it = calloc(1, ntotal)) == NULL) {
      PQclear(result);
      return NULL;
   }
   
   it->nkey = nkey;
   it->nsuffix = size;
   it->nbytes = datalen - size;
   memcpy(ITEM_key(it), key, nkey);
   memcpy(ITEM_suffix(it), PQgetvalue(result, 0, 1), datalen);
   
   PQclear(result);
   return it;
}

static item* pg_get(struct engine_handle* handle, const void* key,
                    const int nkey) {
   item *it;
   struct pg_engine* engine = get_handle(handle);
   pthread_mutex_lock(&engine->cache_lock);
   it = do_pg_get(engine, key, nkey);
   pthread_mutex_unlock(&engine->cache_lock);
   return it;
}            
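
The header_size column is written with htonl in do_db_store_item and read back with ntohl in do_pg_get. The round trip can be sketched in isolation as follows. encode_header_size and decode_header_size are hypothetical helper names, and the use of memcpy instead of a pointer cast is my own choice to avoid reading through a possibly unaligned pointer:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write len as a 4-byte big-endian (network order) value. */
static void encode_header_size(unsigned char *out, uint32_t len) {
    uint32_t be = htonl(len);

    assert(out != NULL);
    memcpy(out, &be, sizeof(be));
}

/* Read the 4-byte value back into host byte order. */
static uint32_t decode_header_size(const unsigned char *in) {
    uint32_t be;

    assert(in != NULL);
    memcpy(&be, in, sizeof(be));
    return ntohl(be);
}
```

A suffix length of 6 is stored as the bytes 00 00 00 06 regardless of the byte order of the host, so the stored rows stay readable if the server moves between machines with different endianness.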
        

Well, that covers the basic functions you need to implement in order to create a minimalistic memcached storage engine. So please go ahead and play with the API and let us know if the API is usable or not.

http://blogs.sun.com/trond/date/20080515 Thursday May 15, 2008

I'm back..

I am finally home after staying four weeks at my brother's place in California. First I attended the MySQL users conference in Santa Clara where I met parts of the Memcached community at the Hackathon. It was great to finally meet the people I have been exchanging emails with :-). It was also nice to meet my new co-workers from MySQL, so I hope we get a chance to hang out more in the future.

I don't get to see my brother's family that often due to the distance between Norway and California, so it was really nice to spend so much time with his wife and the kids. I felt that I got really good contact with the kids (at least the boys worshiped me for being better than my brother at playing video games ;-))

I also attended the OpenSolaris summit in Santa Cruz while I was there, before I went back up to attend Community ONE and Java ONE. Java ONE was definitely the highlight of the trip, and it was fun to see my brother on stage during the keynotes. I was able to take a look "back stage" and that was a pretty impressive sight.

Being the brother of one of the Java Posse has both pros and cons. I was able to hang out with the Posse and meet a lot of interesting people, but they also pulled up a slide of me during their BOF :-S

Well, it is good to be back home with my family, but unfortunately I got sick on the trip back home :-(

http://blogs.sun.com/trond/date/20080317 Monday March 17, 2008

Bazaar support in OpenGrok

I have just added support for Bazaar repositories in OpenGrok, and thought that I should give you a warning before you start to use it...

First of all I would like to say that I have never used Bazaar in a real project, so I might have done everything totally wrong.

I am not aware of an API that lets me access Bazaar from Java, so I just created a small class that wraps the command line interface. This is the same way the Mercurial support is implemented, and most of the projects available at http://src.opensolaris.org/source use that back-end. Wrapping the binary does have a runtime penalty, and that is the startup cost of the binary. To reduce the number of times the binary is executed, OpenGrok already has a cache layer for the history log (the cache layer is not used if you try to get history information for a directory).

The biggest problem with Bazaar is that the bzr log -v command is unbelievably slow, and that is the command I need to run to get the history information (I need the files in the changeset). When I tried it on my computer, it took 13 MINUTES on the Bazaar source code itself. I got the repository with the following command:

$ bzr branch http://bazaar-vcs.org/bzr/bzr.dev bzr.dev

As a comparison, hg log -v took about 2 seconds on:

$ hg clone ssh://anon@hg.opensolaris.org/hg/onnv/onnv-gate

I would therefore not recommend that you use the Bazaar support on an OpenGrok server that serves multiple users. If you use it yourself, you should avoid accessing the directory history if you don't need it ;-)

http://blogs.sun.com/trond/date/20080312 Wednesday March 12, 2008

OpenGrok v0.6 is out

I am glad to announce the release of OpenGrok 0.6. This release contains a lot of bug-fixes and some new features. Please see http://src.opensolaris.org/source/history/opengrok/trunk/ for the full change history, and for the list of contributors. The following is just a summary.

New Features:

  • Analyzer-support for Tcl/Tk
  • Analyzer-support for SQL
  • Support for TeamWare repositories

http://blogs.sun.com/trond/date/20080303 Monday March 03, 2008

Get it while it's hot!

Memcached 1.2.5 is being released today, so you should go ahead and download it. We are currently working on integrating this version into Solaris, but it is easy to compile it yourself if you don't want to wait for us ;-)

If you are running OpenSolaris build 79 (or newer), building 1.2.5 should be as easy as:

./configure --enable-threads --enable-64bit CC=cc CFLAGS=-O

I have added support for large memory pages in this version, but it is disabled by default. To enable the use of large memory pages you need to add -L to the command line. When started with -L memcached will also preallocate all memory up front, which reduces the number of locks it must acquire when the slab allocator needs to allocate more memory for a given slab class. By using large memory pages memcached could reduce the number of TLB misses (depending on the access pattern), and hence improve performance. See http://en.wikipedia.org/wiki/Translation_lookaside_buffer for a description of TLB.

http://blogs.sun.com/trond/date/20080209 Saturday February 09, 2008

Improve the performance on your Memcached server

I recently came across this interesting blog about tweaking the TCP stack in Solaris for improved latency, and with a small test I noticed a significant latency improvement. Since low latency is important for memcached servers, you might want to try it yourself.

http://blogs.sun.com/trond/date/20080129 Tuesday January 29, 2008

Heat pump

This weekend I got help from my friend to install a heat pump in my house, and so far it has been a huge success! We have a relatively large room in the basement (~50m^2) where I have my home office, my guitars and the TV. I have just finished renovating this room (replaced the carpet floor with tiles and painted the walls), but it is usually very cold down there since no one remembers to light the fireplace before we want to watch TV.

With the heat pump mounted in the basement, it's been a pure pleasure coming home from work and going down to the basement to watch TV.

http://blogs.sun.com/trond/date/20080128 Monday January 28, 2008

SunRay @ home

The fans and the disks in my desktop computer are driving my girlfriend crazy, so I decided to go ahead and try to configure my good old SunRay 1G. My desktop machine is an old 3GHz Intel P4 with 1.3GB of RAM and two 300GB disks spinning all the time (they are used in a ZFS mirror), so it sounds like an airplane just before takeoff... My desktop is located in the basement and is connected to a wireless router upstairs with a D-Link DWL-G520. I also have a wired network adapter in the machine connected to a switch, so I just connected the SunRay to the switch and ran the software installer. After running utconfig and utadm I got the login screen up on the SunRay!!! Now I just need to find a room in the basement where I can put my server :-)

http://blogs.sun.com/trond/date/20080125 Friday January 25, 2008

Memcached source repository

I am currently working in a team here at Sun that focuses on improving Memcached performance.

The official Memcached source repository is a Subversion repository located at http://code.sixapart.com/svn/memcached/. Since Subversion is not well suited for distributed development, we need a place to store our changes while waiting for them to be accepted into the official repository.

I asked the community how we should do it, and they responded that we should set up a repository to incubate our changes. I have created a Mercurial repository in the Web Stack project. It contains two Mercurial branches:

You may clone the repository with the following command:

$ hg clone ssh://anon@hg.opensolaris.org/hg/webstack/memcached-incubator

To select the branch you would like to see, execute the following command:

$ hg update branch

To see the difference between the two branches, just select the default branch and execute:

$ hg diff -r memcached

Please note that bugs should be reported to "memcached at lists dot danga dot org" unless they only apply to our branch. In that case you can send them to "webstack-discuss at opensolaris dot org".

http://blogs.sun.com/trond/date/20080124 Thursday January 24, 2008

OpenGrok and SMF

I have had SMF controlling my OpenGrok server for a long time, but up until today I have always performed the SMF management as root.

When I upgraded my server today I decided to try to figure out what I needed to do in order to create a new profile that I could use to start and stop OpenGrok, and it turned out to be quite easy.

The first thing you need to do is to create the authorizations and the profile by adding them to /etc/security/auth_attr and /etc/security/prof_attr:

/etc/security/auth_attr:
solaris.smf.value.opengrok:::Change OpenGrok value properties::
solaris.smf.manage.opengrok:::Manage OpenGrok service states::

/etc/security/prof_attr:
OpenGrok Administration::::auths=solaris.smf.manage.opengrok,solaris.smf.value.opengrok

The next thing you should do is to add this profile to the users you trust by updating /etc/user_attr:

username::::profiles=OpenGrok Administration

(If you don't trust them that much you could give them just one of the authorizations)

You should now be ready to import the OpenGrok SMF description file (tools/smf/opengrok.xml in the OpenGrok source repository) and modify the environment-section to match your local configuration. (Note: you need the one I committed in changeset 228:175ea847bf89)

Import the service by executing the following command:

# svccfg import /path/to/opengrok.xml

Users should now be able to start and stop the service as long as they have the appropriate authorizations.



This is a personal weblog, I do not speak for my employer.