/var/adm/blog

Wednesday Nov 04, 2009

@deprecated

So I am "deprecating" this blog, which, I guess, is my not-so-clever way of saying I won't be updating it anymore. Instead, I will be setting up a new blog at http://www.ricoconbico.com/blog in the very near future and all of my future updates will be made there ... adios :-)

Friday Oct 30, 2009

The Gearman Replicator For Drizzle (Episode IV: A New Hope)

THE GEARMAN REPLICATOR FOR DRIZZLE

EPISODE IV: A NEW HOPE

Recently, Jay Pipes has been blogging about the new Drizzle Replication system. Jay has done a great job of describing how the system works and why it behaves the way it does. While Jay has been doing a bang-up job putting together the internals of the Drizzle replication system, I’ve been working on the “Gearman Replicator for Drizzle” which, as the name suggests, is a Gearman-based Drizzle replicator. It dovetails very nicely with the work that Jay has been doing: it is not only a proof of concept for the replication system and the principles it is based on, but also an example of how Drizzle can leverage the tools and environment around it, in this case Gearman.

Describing the entire “Gearman Replicator for Drizzle” would probably make for a lengthy blog post that I suspect many people would not see all the way through, so I’ve decided to follow in the footsteps of Mr. George Lucas and make this a trilogy (and yes, I am aware that the Star Wars saga eventually had six episodes; don't tempt me, I may go that far too):

  • The first episode (A New Hope), which you are reading now, will provide an overview of the system, the components that make it up, and how those components interact with each other. 
  • The second episode (The Applier Strikes Back) will focus mainly on the behavior in the master database and go over the role the Gearman Replication Applier Drizzle plugin plays.
  • The third episode (Return of the Gearman Job Result) will look at the “slave side” behavior and focus on describing how the Java-based Gearman Replicator receives and applies transactions to the slave database.

WHAT IS GEARMAN


As may be obvious by now, the Gearman framework plays a central role in the “Gearman Replicator for Drizzle”. For those of you that don’t know what Gearman is, Gearman is essentially a distributed and scalable job scheduling framework that allows any number of clients to submit jobs to any number of workers, where a worker is some process that is capable of executing one or more particular requests or jobs. The architecture of Gearman looks something like the following:

THE GEARMAN REPLICATOR OVERVIEW


From 10,000 feet (or 3,048 meters for the rest of the world), the Gearman Replicator for Drizzle system looks a little like this.

As you can see, the system consists of four major pieces:


  1. The Drizzle “Master” Database: This is the database that is being replicated. As transactions are applied to the “Master” they are placed into the replication stream and eventually applied to the “Slave” database.
  2. The Gearman Job Server: The Gearman Job Server(s) act as traffic cop or matchmaker by accepting job requests from Gearman Clients (in this case the Gearman Replication Applier), matching each request with an appropriate worker (in this case, a worker that has the Applier Function) and then submitting the request to that worker. In our scenario, the job request is a wrapper around a Transaction that has been placed into the replication stream by the Master database. Note that the diagram above shows two job servers; this number was used for demonstration purposes. The number of job servers is up to the user and can range from one to many.
  3. The Gearman Worker: The Gearman Worker, as its name suggests, is a process that executes requests on behalf of a Gearman Client. The worker registers the functions it can execute with the Gearman Job Server(s), then executes any Gearman Job Requests passed on to it by a Gearman Job Server. In the case of the Gearman Replicator for Drizzle, the worker registers an applier function that can take a Drizzle Transaction and apply it to a target database.
  4. The Drizzle “Slave” Database: This database is the target database in our replication workflow. Transactions that are applied to the “Master” database will be applied to the “Slave” database as well. For the time being, the Slave is also a Drizzle database, but in theory with very little change it should be possible to use non-Drizzle databases (MySQL, PostgreSQL, etc.) as a “Slave” database.
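The register/submit/match cycle described in items 2 and 3 can be sketched with a toy, in-process job server. This is only an illustration under loud assumptions: the class ToyJobServer, the function name "apply_transaction" and the round-robin choice of worker are all made up for this example, and real Gearman clients, servers and workers are separate processes talking over the Gearman protocol rather than Python objects calling each other.

```python
from collections import defaultdict, deque


class ToyJobServer:
    """In-memory stand-in for a Gearman job server (hypothetical, not the real protocol)."""

    def __init__(self):
        # function name -> queue of worker callables that registered for it
        self._workers = defaultdict(deque)

    def register(self, function_name, worker):
        """Called by a worker to announce that it can execute function_name."""
        self._workers[function_name].append(worker)

    def submit(self, function_name, payload):
        """Called by a client; hands the job to one capable worker and returns its result."""
        candidates = self._workers.get(function_name)
        if not candidates:
            raise LookupError(f"no worker registered for {function_name!r}")
        worker = candidates[0]
        candidates.rotate(-1)  # assumption: simple round-robin between capable workers
        return worker(payload)


def make_worker(name):
    """Build a fake applier function that only reports what it was given."""
    def apply_transaction(payload):
        return f"{name} applied {len(payload)} bytes"
    return apply_transaction


server = ToyJobServer()
server.register("apply_transaction", make_worker("worker-1"))
server.register("apply_transaction", make_worker("worker-2"))

first = server.submit("apply_transaction", b"txn-1")
second = server.submit("apply_transaction", b"txn-2")
```

The point of the sketch is the decoupling: the client only names a function, and the job server decides which registered worker runs it.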


While the diagram above does provide a good overview of the pieces of the system and how data and messages flow among them, it does leave open a few questions, including:


1) What types of messages are being passed around the system?


2) What does it mean that the Gearman Replication Applier and Applier Function are encased in an external entity (the Master Database and Gearman Worker, respectively)?


MESSAGES IN GEARMAN REPLICATOR FOR DRIZZLE


With regard to the types of messages that are passed between the pieces of the system, the diagram does provide a hint about what is going on. The arrows between the entities represent messages going to and from those entities. You may have noticed that some of the arrows are light blue while others are red, and one is even grey. The color coding here has meaning.


Those arrows that are light blue represent a Gearman Job Request being sent. A full description of a Gearman Job Request can be found in the Gearman documentation, but suffice it to say that the request contains some meta-data about the job (a handle, the name of the function to be executed, etc.) and a payload to be used by the Gearman Function. In this case, the payload consists of a Google Protobuffer message that contains the Transaction to be replicated into the slave database (for more details of the Transaction message, see Jay’s blog). As it turns out, encapsulating the Transaction in a Google Protobuffer pays big dividends here. One of the nice advantages of Gearman is that Workers and Functions can be written in a variety of languages (C, Java, Python, Perl, etc.). Since Google Protobuffers provides bindings for several different languages, having the Transaction as a Protobuffer means clients need not create their own way of parsing the Transaction message. Instead, that functionality is provided by the Google Protobuffers library. Allowing the Gearman Workers/Functions to use the Google Protobuffers library to deconstruct the Transaction message results in cleaner and less buggy workers/functions and also makes their implementation more robust against version changes in the Transaction message.


The arrows that are red represent a Gearman Job Result message. Again, the exact structure of a Gearman Job Result message can be found in the Gearman documentation, but it essentially consists of a job status as well as data, and any error messages generated by the function as it executed the job.


The arrow that is grey represents a data transformation message used to modify the Slave database to be ‘in sync’ with the Master database. The exact format of that message depends on the implementation of the Applier Function but in the case of the Gearman Replicator for Drizzle, this is a JDBC call containing the statements that make up the transaction being replicated.
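To make the two Gearman message types a little more concrete, here is a minimal sketch of them as plain records. Everything in it is an assumption for illustration: the class and field names, the "apply_transaction" function name and the handle string are invented, and the payload bytes are just placeholder data standing in for a serialized Transaction. The real wire formats are defined by the Gearman protocol and by Drizzle's Protobuffer definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical stand-in for a Gearman Job Request (the light blue arrows)."""
    handle: str          # identifies the job
    function_name: str   # name of the function the worker should run
    payload: bytes       # opaque to Gearman; here, a serialized Transaction


@dataclass(frozen=True)
class JobResult:
    """Hypothetical stand-in for a Gearman Job Result (the red arrows)."""
    handle: str                  # ties the result back to its request
    succeeded: bool              # job status
    data: bytes = b""            # any data returned by the function
    error: Optional[str] = None  # error message, if the function produced one


def wrap_transaction(handle: str, serialized_transaction: bytes) -> JobRequest:
    """Wrap an already-serialized Transaction as a job for the applier function."""
    return JobRequest(handle, "apply_transaction", serialized_transaction)


# Placeholder bytes only; a real payload would come from the Protobuffer library.
request = wrap_transaction("H:master:1", b"\x08\x01")
result = JobResult(handle=request.handle, succeeded=True)
```

Note that the job server never needs to look inside payload, which is what lets the worker side be written in any language with a Protobuffers binding.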


THE GEARMAN REPLICATION APPLIER AND APPLIER FUNCTION


One of the main components of the system is the Gearman Replication Applier. A detailed description of the Gearman Replication Applier will follow in a future blog, but it is worth noting that in the diagram above, this entity is completely contained within the Master database. This was done intentionally as the Gearman Replication Applier is an applier Drizzle plugin and as such actually runs within the Drizzle process. The job of the Gearman Replication Applier is to “listen” to the replication stream of the Master Database and to wrap each Transaction into a Gearman Job Request and pass it on to the Gearman Job Server.


Like the Gearman Replication Applier, the Applier Function also exists within the scope of another entity. It resides within a Gearman Worker process. While the Gearman Worker acts as a conduit between the Gearman Job Server and the Applier Function, it is the Applier Function that performs all the real work. It is the responsibility of the Applier Function to validate the Transaction it has received, apply the Transaction to the slave database (if you look carefully in the diagram, you’ll note that the grey arrow which represents the application of the transaction to the slave database originates from the Applier Function) and then generate the appropriate Gearman Job Result.
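The three responsibilities of the Applier Function (validate, apply, report) can be sketched as follows. This is not the actual Java implementation: to keep the example self-contained it assumes a JSON payload with a "statements" list instead of a Protobuffer Transaction, an in-memory SQLite database instead of a Drizzle slave reached over JDBC, and a plain dict instead of a Gearman Job Result.

```python
import json
import sqlite3


def apply_transaction(slave, payload):
    """Validate a transaction payload, apply it atomically, and report a result."""
    # Step 1: validate. The JSON shape used here is an assumption of this sketch.
    try:
        statements = json.loads(payload)["statements"]
        if not statements or not all(isinstance(s, str) for s in statements):
            raise ValueError("transaction has no usable statements")
    except (ValueError, KeyError, TypeError) as exc:
        return {"status": "FAIL", "error": f"invalid transaction: {exc}"}

    # Step 2: apply. The connection context manager commits if every statement
    # succeeds and rolls back if any of them raises.
    try:
        with slave:
            for statement in statements:
                slave.execute(statement)
    except sqlite3.Error as exc:
        return {"status": "FAIL", "error": str(exc)}

    # Step 3: report success back to the caller (the worker).
    return {"status": "OK", "error": None}


slave = sqlite3.connect(":memory:")
slave.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

good = json.dumps({"statements": ["INSERT INTO t VALUES (1)",
                                  "INSERT INTO t VALUES (2)"]}).encode()
outcome = apply_transaction(slave, good)
```

Applying the statements inside one database transaction matters: if the second statement of a replicated transaction fails, the slave should not be left with only the first one applied.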


CONCLUSION


So far we have gone over the major pieces of the Gearman Replicator For Drizzle system and described how these pieces communicate with each other. In upcoming blogs I will go into more detail to describe the individual pieces and even show some code. For the time being if you want to see some code, look in the following places



Also, if you have any comments or questions about this, feel free to make a comment below or drop me a line (eric.d.lambert@gmail.com).


Tuesday Sep 15, 2009

Dell Dvd Store for Drizzle

A while back, the good folks at Dell created an e-commerce benchmarking application called the "DVD Store" that emulates, of all things, a web-based DVD store. The test application contains a backend database component, a web application layer, as well as a series of driver programs to drive load against the application. The DS2, as it is known, is built in such a way that the database backend can be implemented with a variety of databases, including Oracle, SQL Server, and MySQL. Last week, I spent some time porting the application to work with Drizzle.

You can find the fruits of my labor at http://launchpad.net/ds2drizzle/trunk/0.01/+download/ds2-drizzle-0.01.tar.gz. This port not only includes a Drizzle-based backend (the backend currently uses the default INNODB storage engine) but also supports the web-based DS2 driver, which can be run against either the JSP or the PHP based implementation of the DVD Store Web Application. Not included in this port is the ASP-based Web Application, and the direct (non-web-based) driver has not been ported to work with Drizzle either.

The database schema used by this port looks very similar to the MySQL DS2 schema with one major exception. The original MySQL schema contained FULLTEXT indices. Since the INNODB storage engine does not support FULLTEXT indices, these indices have been removed from the schema. The effect of this is that the queries which relied on the FULLTEXT indices needed to be reworked. For the time being, these queries have been changed from MATCH type queries that took advantage of the FULLTEXT index to LIKE queries. This is obviously not an optimal solution and should be considered a hack (which I chose solely out of expedience). A better solution would have Drizzle work in conjunction with a full-text search engine such as Lucene or Sphinx.
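As a rough illustration of the MATCH-to-LIKE rework, the sketch below shows the two query shapes side by side and runs the LIKE version. The table and column names (products, prod_id, title), the sample titles and the helper like_pattern are assumptions made for this example rather than a copy of the port's queries, and SQLite stands in for Drizzle so that the snippet is self-contained.

```python
import sqlite3

# Original MySQL-style full-text query, shown for comparison only (never executed here).
MATCH_QUERY = "SELECT prod_id, title FROM products WHERE MATCH (title) AGAINST (%s)"

# Reworked query: a substring match that needs no FULLTEXT index.
LIKE_QUERY = ("SELECT prod_id, title FROM products "
              "WHERE title LIKE ? ESCAPE '\\' ORDER BY prod_id")


def like_pattern(term):
    """Escape LIKE wildcards in term and wrap it for a substring search."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (prod_id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "ACADEMY ACE"), (2, "ACE GOLDFINGER"), (3, "ADAPTATION HOLES")],
)

rows = db.execute(LIKE_QUERY, (like_pattern("ACE"),)).fetchall()
# rows == [(1, 'ACADEMY ACE'), (2, 'ACE GOLDFINGER')]
```

Two things make this a hack rather than a replacement: a pattern with a leading % generally cannot use an ordinary index, so every row gets scanned, and LIKE matches substrings rather than words, so a search for "ACE" would also hit a title containing "PLACE".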

Tuesday Sep 01, 2009

Drizzle in the Snow (how to build Drizzle on OS X 10.6, aka Snow Leopard)

So these days I do most of my development on my MacBook Pro and for the most part it works just fine. In fact, things have been so smooth that I've lulled myself into a false sense of security that things will "just work". That is, until this morning, when I pulled down a fresh version of the Drizzle trunk and tried to build it. Not more than a few seconds after kicking off the build, I noticed the cursor blinking at me below an error indicating that my build had failed. At this point it dawned on me that I had installed Snow Leopard (OS X 10.6) on the machine over the weekend and that this was most likely the culprit.

As it turned out, there were some issues building Drizzle on OS X 10.6, but nothing too difficult to overcome.

ISSUE #1: FDATASYNC

This issue manifests itself with the following build failure:

libtool: compile:  /usr/bin/g++-4.2 -DHAVE_CONFIG_H -I. -I. -isystem ./gnulib -isystem ./gnulib -ggdb3 -I/Users/elambert/dev/drizzle/include -D_THREAD_SAFE -pipe -O3 -Werror -pedantic -Wall -Wextra -Wundef -Wshadow -fdiagnostics-show-option -fvisibility=hidden -Wformat -fno-strict-aliasing -Wno-strict-aliasing -Woverloaded-virtual -Wnon-virtual-dtor -Wctor-dtor-privacy -Wno-long-long -Wno-redundant-decls -std=gnu++98 -MT mysys/my_sync.lo -MD -MP -MF mysys/.deps/my_sync.Tpo -c mysys/my_sync.cc  -fno-common -DPIC -o mysys/.libs/my_sync.o
mysys/my_sync.cc: In function ‘int my_sync(File, myf)’:
mysys/my_sync.cc:59: error: ‘fdatasync’ was not declared in this scope
make[2]: *** [mysys/my_sync.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

The problem here is that 'configure' is being fooled into thinking that the fdatasync() system call is available on the system when in reality Drizzle should be using fsync() instead. Unfortunately, the fix for this problem requires changes to the build system. Fortunately, those changes should already be in the Drizzle trunk by the time you read this. So if you are seeing this error, do a fresh pull from the trunk. If, for some reason, the changes have not made it to the trunk yet or pulling from the trunk is not an option for you, just apply the diff listed at the bottom of this post.
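The underlying idea, "use fdatasync() where it really works, otherwise fall back to fsync()", is easy to show in miniature. Drizzle makes this decision at build time in C++ through the HAVE_FDATASYNC define; the sketch below makes the same decision at runtime in Python purely for illustration. The helper name write_durably and the file name are made up for this example.

```python
import os
import tempfile

# Python only exposes os.fdatasync on platforms that provide it (its documentation
# lists it as unavailable on macOS), so probe for it and fall back to os.fsync,
# which is always present.
sync_data = getattr(os, "fdatasync", os.fsync)


def write_durably(path, data):
    """Write data to path and ask the OS to push it to stable storage."""
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()                # drain Python's own buffer first
        sync_data(handle.fileno())    # then sync at the OS level


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "example.bin")
    write_durably(target, b"replicated bytes")
    with open(target, "rb") as handle:
        written = handle.read()
```

The fallback is safe because fsync() does at least as much as fdatasync(): it flushes file metadata as well as the data itself, so the only cost is some extra I/O.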

ISSUE #2: READLINE 'INCOMPATIBILITY'?

This issue manifests itself with the following build failure:

g++ -DHAVE_CONFIG_H   -I. -I. -isystem ./gnulib -isystem ./gnulib -ggdb3  -I/Users/elambert/dev/drizzle/include -D_THREAD_SAFE  -pipe  -O3 -Werror -pedantic -Wall -Wextra -Wundef -Wshadow  -fdiagnostics-show-option -fvisibility=hidden -Wformat -fno-strict-aliasing -Wno-strict-aliasing -Woverloaded-virtual -Wnon-virtual-dtor -Wctor-dtor-privacy -Wno-long-long  -Wno-redundant-decls    -std=gnu++98 -MT client/drizzle.o -MD -MP -MF $depbase.Tpo -c -o client/drizzle.o client/drizzle.cc &&\
        mv -f $depbase.Tpo $depbase.Po
client/drizzle.cc:109: error: conflicting declaration ‘typedef int (rl_compentry_func_t)(const char*, int)’
/usr/include/readline/readline.h:44: error: ‘rl_compentry_func_t’ has a previous declaration as ‘typedef char* (rl_compentry_func_t)(const char*, int)’
client/drizzle.cc: In function ‘void initialize_readline(char*)’:
client/drizzle.cc:2348: error: invalid conversion from ‘char* (*)(const char*, int)’ to ‘int (*)(const char*, int)’
make[2]: *** [client/drizzle.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

The problem here appears to be that the readline implementation that ships with Snow Leopard (which I believe is just a wrapper around the editline lib) is 'incompatible' with Drizzle. The readline.h file in /usr/include/readline defines some of the functions expected by Drizzle but not all of them, making it difficult to use the header file. To work around this you will need to build your own version of readline and compile Drizzle against it. The steps to do so are listed below:

1 - Download the readline 6.0 source tarball (other versions of readline may work, but I've only tested 6.0)

$ wget ftp://ftp.cwru.edu/pub/bash/readline-6.0.tar.gz

2 - Unpack the tarball

$ gtar xfz readline-6.0.tar.gz 

3 - Configure the readline build scripts. Note that I decided not to install the new version of readline in the default location but in a location of my choosing instead. My reason for this was that I did not want to upset any dependencies other parts of my system may have on the current version.

$ cd readline-6.0
$ ./configure --prefix=/Users/elambert/dev/readline-6.0

4 - Build and install readline. This step is fairly straightforward.

$ make all && make install

5 - Configure Drizzle. If you are reading this blog, you should be fairly familiar with configuring Drizzle by now, but of note here is the fact that I specify where to find readline with the --with-lib-prefix option. If you installed readline into the default location, you do not need to include this flag.

$ cd <DRIZZLE_HOME>
$ ./config/autorun.sh 
$ ./configure --with-lib-prefix=/Users/elambert/dev/readline-6.0 \
--with-libdrizzle-prefix=/Users/elambert/dev/drizzle \
--prefix=/Users/elambert/dev/drizzle

6 - Build Drizzle

$ make -j2

7 - Test it

$ cd test
$ ./test-run

And that's it ... see, not so bad, was it?

DIFF FOR FDATASYNC FIX

=== modified file 'configure.ac'
--- configure.ac        2009-08-20 16:14:47 +0000
+++ configure.ac        2009-09-02 02:44:43 +0000
@@ -451,11 +451,27 @@
   AC_MSG_ERROR("Drizzle requires fcntl.")
 fi

+AC_CACHE_CHECK([working fdatasync],[ac_cv_func_fdatasync],[
+  AC_LANG_PUSH(C++)
+  AC_RUN_IFELSE([AC_LANG_PROGRAM([[
+#include <unistd.h>
+    ]],[[
+fdatasync(4);
+    ]])],
+  [ac_cv_func_fdatasync=yes],
+  [ac_cv_func_fdatasync=no])
+  AC_LANG_POP()
+])
+AS_IF([test "x${ac_cv_func_fdatasync}" = "xyes"],
+  [AC_DEFINE([HAVE_FDATASYNC],[1],[If the system has a working fdatasync])])
+
+
+
 AC_CONFIG_LIBOBJ_DIR([gnulib])

 AC_CHECK_FUNCS( \
   cuserid fchmod \
-  fdatasync fpresetsticky fpsetmask fsync \
+  fpresetsticky fpsetmask fsync \
   getpassphrase getpwnam \
   getpwuid getrlimit getrusage index initgroups isnan \
   localtime_r log log2 gethrtime gmtime_r \

=== modified file 'm4/pandora_ensure_gcc_version.m4'
--- m4/pandora_ensure_gcc_version.m4    2009-07-08 07:09:13 +0000
+++ m4/pandora_ensure_gcc_version.m4    2009-09-01 21:21:13 +0000
@@ -8,12 +8,15 @@
 AC_DEFUN([PANDORA_MAC_GCC42],
   [AS_IF([test "$GCC" = "yes"],[
     AS_IF([test "$host_vendor" = "apple" -a "x${ac_cv_env_CC_set}" = "x"],[
-      AS_IF([test -f /usr/bin/gcc-4.2],
+      host_os_version=`echo ${host_os} | perl -ple 's/^\D+//g;s,\..*,,'`
+      AS_IF([test "$host_os_version" -lt 10],[
+        AS_IF([test -f /usr/bin/gcc-4.2],
         [
           CPP="/usr/bin/gcc-4.2 -E"
           CC=/usr/bin/gcc-4.2
           CXX=/usr/bin/g++-4.2
         ])
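The second file in the diff hinges on one small piece of string handling: the perl one-liner strips everything before the first digit of ${host_os} and everything from the first dot onward, leaving only the major version, and the hard-coded gcc-4.2 paths are then used only when that number is less than 10. Here is a minimal Python sketch of the same transformation. The sample strings "darwin10.0.0" and "darwin9.8.0" are assumptions about what configure reports on Snow Leopard and Leopard respectively, and both function names are made up for this example.

```python
import re


def host_os_major_version(host_os):
    """Mimic the diff's perl one-liner: s/^\\D+//g followed by s,\\..*,,"""
    without_prefix = re.sub(r"^\D+", "", host_os)   # drop the leading non-digits ("darwin")
    return re.sub(r"\..*", "", without_prefix)      # drop everything from the first dot on


def should_force_gcc42(host_os):
    """Mirror the new AS_IF test: force gcc-4.2 only for major versions below 10."""
    return int(host_os_major_version(host_os)) < 10


snow_leopard = host_os_major_version("darwin10.0.0")
leopard = host_os_major_version("darwin9.8.0")
```

Under those assumptions the check leaves the compiler alone on Snow Leopard while keeping the old behavior of preferring /usr/bin/gcc-4.2 on earlier releases.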
