/var/adm/blog
Two wrongs do not make a right ... but two 'set's can make a 'delete'
I've spent some time the past couple days looking through the memcached server code and I noticed something interesting in how we process set commands when the server is configured not to evict items when low on memory(-M). For a split second, the behavior I was seeing in the code seemed wrong to me, but after giving it a bit of thought I realized that the behavior made perfect sense. Without getting into the details upfront, let me show you an example.
Lets start a memcached server. The behavior in question happens when the server is not able to fulfill a set request due to memory constraints, so to make it easier to achieve this condition, lets limit our cache size to 1MB.
smacky:memcached elambert$ ./memcached -m 1 -M |
Now that we have our server up and running, I want to add some data. On my system, I have file called 'afile', its 524,288 byte file that consists of nothing but the letter A. Lets store the contents of this file and associate it with the key 'key1'.
smacky:~ elambert$ ls -ld afile
-rw-r--r-- 1 elambert staff 524288 Mar 25 14:58 afile
smacky:~ elambert$ val=`cat ./afile`; printf "set key1 0 0 524288\r\n$val\r\n" | nc 127.0.0.1 11211
STORED
smacky:~ elambert$
Now, some time passes, and I want to insert the content of file 'bfile' (a 524,288 byte file filled with the letter B) into the cache, again under the key 'key1' (essentially replacing the old entry for 'key1').
smacky:~ elambert$ ls -ld bfile
-rw-r--r-- 1 elambert staff 524288 Mar 25 14:59 bfile
smacky:~ elambert$ val=`cat ./bfile`; printf "set key1 0 0 524288\r\n$val\r\n" | nc 127.0.0.1 11211
SERVER_ERROR out of memory storing object
smacky:~ elambert$
Well, it looks like we don't have enough room in the cache and since we told the server not to evict entries we get an out-of-memory error. So what does this mean for our 'key1' entry. Well, lets retrieve it and find out.
smacky:~ elambert$ printf "get key1\r\n" | nc 127.0.0.1 11211 |
Wow, thats interesting. The cache says it does not have an entry for 'key1', yet we were able to store the 'afile' contents under the key 'key1' and we told the server not to evict an items. Shouldn't the original entry still be in the cache? Well, actually, no. The server did exactly what it should have, which is delete the entry when the second set operation failed.
For some of you this may be obvious, but I suspect others are finding this behavior a little counter-intuitive and are wondering why a failed set would result in delete, especially when we tell the server not to evict items.
But to understand this behavior, we need to realize that memcached is ... well ... a cache. It is not designed to be the sole source of truth in your system. Memcached is just an 'optimization' that aides you in scaling your system and that it should always operate in tandem with a reliable store (might I suggest MySQL for this purpose :-) ). It is that store which acts as 'truth' for your system. And in a well designed system, updates must first be applied to this backend store before being applied to the cache. So even though the second set operation fails, the mere fact that an attempt was made to modify the value of the entry tells the server that the entry for 'key1' is most likely no longer in sync with the our backend --otherwise, why else would we attempt to change the value. Armed with this knowledge, the worst thing the server could do is leave the entry for 'key1' as is. If it were to leave the original entry, than any proceeding get request would get a stale value. A better strategy is to delete the item. By deleting the entry, clients that request the key will incur a cache-miss and simply go to the backend for the correct value.
Another point of confusion is the presence of the -M (don't evict) flag. Some people may interpret this flag to instruct the server to never evict an item. If you read the description of the flag, you'll see that is not the case. This flag only has meaning when the cache has exhausted its memory. If the server is out of space and we have specified the -M flag, the server will not evict an entry to make room, but it is still free to evict entries for many other reasons, including the example discussed above.
Posted at 04:32PM Mar 25, 2009 by Eric Lambert in Memcached | Comments[0]
basic UDP support in libmemcached
Yesterday I submitted a patch to libmemcached, the popular Memcached client library. The patch provides basic, yet limited, support for using the User Data Protocol (UDP) when communicating with a Memcached server. For those of you interested, you can get the source and documentation at http://hg.tangent.org/libmemcached .
In this post, I will describe how this patch behaves and describe some of the rational for how it was implemented and how it should be used.
UDP SUPPORT IN MEMCACHED SERVER
The Memcached Protocol Specification (http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt), does allow for UDP communication with the Memcached server. The Memcached UDP specification is a super-set of the TCP specification, it follows the same format as the Memcached server TCP protocol with the following additions:
- each request must be entirely contained within a single UDP data-gram.
- each data-gram request must include an 8 byte frame header (see specification for details). Components of this header provide basic session management as well as a means to ensure that the client is not violating the Memcached server UDP protocol, for example by sending multi-data-gram requests.
- When responding to a request, the server is not limited to a single data-gram. The response may be a multi-datagram request.
UDP SUPPORT IN LIBMEMCACHED
Limited Operations
Each of the constraints placed upon UDP communications with the Memcached server (see above), combined with the stateless/connectionless nature of the UDP protocol, ended up having an impact on how UDP support was implemented in libmemcached.
The most significant impact was the decision to limit the type of operations which could be executed when running in UDP mode. There exists a set of operations in libmemcached that issue a request of the server and then require a response from the server, for example memcached_get(). The fact that the nature of these types of operations require a response from the server posses a difficulty since UDP gives us no assurance that either the request or the response will be delivered. That the server may respond to a request with a multi-datagram message further complicates the matter since UDP does not ensure proper ordering of data-grams that have been received. Since ordering and delivery assurance is the purview of TCP and not wanting to rewrite TCP with in libmemcached, we decided to simply not support these types of operations in UDP mode. As such the following functions are not available when using UDP communication:
memcached_version(), memcached_stat(), memcached_get(), memcached_get_by_key(), memcached_mget(), mem-cached_mget_by_key(), memcached_fetch(), memcached_fetch_result(), memcached_value_fetch(). All other operations are supported.
Fire-and-Forget
As is hinted to in the section above, the UDP implementation does not attempt to handle responses from the server, the rationale being that since we can not be assured to receive the response, we should assume that we wont receive it. Because of this assumption, all supported operations are executed in a "fire-and-forget" mode, by which it means that once the operation has been executed by the client, no attempt is made to ensure that the operation was received or executed by the server. Since no attempt will be made to handle the server's response, when executing an operation in UDP mode the 'noreply' option is sent for all operations which support noreply.
Limited Size
In UDP mode, the current implementation limits the size of cache entries that can be sent to the server to 1KB. Technically, the limit of user supplied data is above 1300 bytes, the exact limit depending on whether the binary or ASCII protocol is being used and if the ASCII protocol is being used then the limit depends on which command is being executed (set, replace, cas, etc.). For simplicity's sake the user should consider the limit to be 1KB, although the client will allow operations above 1KB as long as the entire message --include UDP header and command overhead, does not exceed 1400 bytes. This limit is a consequence of the fact that the server does not support multi-datagram requests, which implies that no cache entry may be larger than a data-gram. Currently, the server defines the maximum data-gram size as 1400 bytes and to be consistent with the server, our implementation abides by this limit (it should be noted that the server currently only uses this limit when crafting UDP responses --which we ignore-- so it is conceivable that we could raise the client-side size limit to a larger value).
No Mixing
The primary data-structure in the libmemcached library is memcached_st, which acts a handle for the client. The typical use pattern is to create an 'instance' of this structure (by calling memcached_create() ) and then adding server 'instances' to this structure for each Mecached server we wish to communicate with (via memcached_add_servers()). Our UDP implementation does not allow for UDP and TCP servers to be added to the same memcached_st client instance. When we consider the fact that UDP server instances will not be able to support all operations (such as get, see above) we can see how mixing UDP and TCP servers in the same client would be problematic. As an example, let us assume we have added ServerA and ServerB to the same client instance. If ServerA were using UDP while serverB used TCP, than any get operation who's key hashed to ServerA would fail, since UDP get is not supported, while those that hash to Server B would pass (assuming the key exists in the cache). This inconsistent behavior is undesirable and as such the mixing of UDP and non-UDP servers in the same client 'instance' is not allowed. Should an application need to use both UDP and TCP to communicate with its Memcached servers, this can be accomplished by using two separate memcached_st client 'instances', one for each transport protocol (see examples below).
EXAMPLES
The example below shows how to set up a libmemcached client which communicates with the server via UDP
memcached_st *memc;
//creates handle for client instance
memc= memcached_create(NULL);
//turns on udp behavior for the memc client handle
memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_USE_UDP, 1);
//add the server 127.0.0.1:11211 to the client
memcached_server_add_udp(memc, "127.0.0.1", 11211);
//store some data
char *key= "foo";
char *value= "when we sanitize";
memcached_return rc= memcached_set(memc, key, strlen(key),value, strlen(value),(time_t)0, (uint32_t)0);
This example shows how to set up two clients, one using UDP and one using TCP
memcached_st *udp_client;
memcached_st *tcp_client;
//creates udp handle for client instance
udp_client= memcached_create(NULL);
//creates tcp handle for client instance
tcp_client= memcached_create(NULL);
//turns on udp behavior for the udp client handle
memcached_behavior_set(udp_client, MEMCACHED_BEHAVIOR_USE_UDP, 1);
//add the server 127.0.0.1:11211 to the clients
memcached_server_add_udp(udp_client, "127.0.0.1", 11211);
memcached_server_add(tcp_client, "127.0.0.1", 11211);
//store some data
char *key= "foo";
char *value= "when we sanitize";
memcached_return rc= memcached_set(udp_client, key, strlen(key),value, strlen(value),(time_t)0, (uint32_t)0);
//get some data; note we use the tcp client
size_t vlen;
uint32_t flags;
char * value= memcached_get(tcp_client, key, strlen(key),&vlen,&flags,&rc);
FUTURE
We recognize this is a fairly basic implementation of UDP support in libmemcached; the idea was to start small and use feedback from this implementation to determine what a more fully formed implementation should look like or if one is even needed at all. So if you are interested, please download the bits, kick the tires and tell us what ya think.
Posted at 09:14PM Mar 10, 2009 by Eric Lambert in Memcached | Comments[0]
Memcached Java Client Performance on OpenSolaris: Part II
Happy New Year! I hope 2009 will be a great year for all of you!
As for me, I am starting out 2009 the same way I finished 2008 ... looking at the performance of the Spy and Whalin Memcached clients on OpenSolaris. Today I re-ran nearly the same scenario that I discussed in my Dec 22nd blog entry (you'll have to read the blog for all the details) but with one major difference, I increased the size of the cache entries being placed into the cache. In my earlier tests, the entries I was storing into the cache were pretty small --varying from 768 bytes to 1,280 bytes. This time, I increased the size of the cache entries to 30,800 bytes. While a 30KB file is not exactly large, it is large enough to ensure that both clients will compress the data before sending it to the Memcache server.
The data for this new scenario is captured in the table below. The workload used in this scenario consisted of 90% get operations and 10% set operations. All get operations were 'bulkGet' operations that each requested 100 entries from the cache.
| Whalin 2.0.1 | Spy 2.2 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Threads | Ops/Sec | setLatency | getLatency | cpu % busy | NIC Saturation | Ops/Sec | setLatency | getLatency | cpu % busy | NIC Saturation |
| 1 | 43 | 1.3 | 25.6 | 26 | 0.4% | 45 | 1.6 | 24.0 | 26 | 0.4% |
| 10 | 91 | 6.1 | 121.2 | 99 | 0.9% | 115 | 9.5 | 95.3 | 98 | 1.2% |
| 20 | 91 | 12.3 | 242.9 | 98 | 0.9% | 114 | 18.9 | 192.8 | 98 | 1.2% |
| 30 | 89 | 26.3 | 371.9 | 99 | 0.9% | 101 | 44.4 | 321.2 | 90 | 0.9% |
| 40 | 87 | 37.8 | 511.9 | 98 | 0.8% | 14 | 140.7 | 1913.4 | 14 | 0.1% |
| 50 | 86 | 56.2 | 645.0 | 98 | 0.8% | 9 | 307.0 | 4235.3 | 9 | 0.5% |
As we can see, in this scenario, the performance of the Whalin client is much closer to that of Spy than in the non-compression/small-data scenario. When using small cache entries, the Whalin client performed at approximately 75% of the rate of the Spy client, where as in this scenario, we see that the clients are fairly equally matched in terms of ops/second as well as resource utilization and that somewhere between 30 and 40 threads the Whalin client exceeds the performance of the Spy Client (although, since both clients appear to plateau around 20 threads, not sure there is much gained by running over 30 threads).
Posted at 10:05PM Jan 06, 2009 by Eric Lambert in Memcached | Comments[0]
Memcached Java Client Performance on OpenSolaris
So I have spent some time in the last month taking a look at how the two main Java-based Memcached clients (Whalin and Spy) perform when run on OpenSolaris. The point of this exercise was to generate some meaningful and useful data that we could use to understand the behavior of these two clients when compared with each other.
I should point out that no attempt was made to optimize either client nor the environment in which the clients were run. The idea was to simulate, as much as possible, an 'out-of-the-box' experience. So clearly, the data below should not be taken as the optimal performance for each of the clients --as I am sure that there are means to squeeze more performance out of them. I also want to state that the results of this comparison should not be considered a repudiation or endorsement of a particular client.
So, with all these caveats out of the way, let's get to it ....
THE EXECUTION ENVIRONMENT
The environment used to measure the performance consisted of Four Dual-Core 2.2 GHZ Sun Fire x2200 M2 with Four GB of memory running OpenSolaris build 90 (snv_90). All of the x2200s were on the same sub-net and each had GigE NICs. Three of the x2200s each hosted a Memcached server instance. The server instance was running version 1.2.5 and was allocated 1 GB of memory. The fourth x2200 hosted the client under test.
THE BENCHMARK
I used the Faban benchmarking framework to run these benchmarks. The really nice thing about this framework is that it allows you to focus on creating the workload logic, while it takes care of data collection and process/service management.
The workload pattern consisted of the following steps
- Start the 1.2.5 Memcached Server on the 3 server hosts.
- Preload 1,000,000 objects into the cache. The size of the objects in the cache varied from 768 bytes to 1,280 bytes with an average size of 1,024. The entries were distributed equally so that each server node was holding approximately 367 MB of cache data.
- Start the client load. The client load came from a single Java VM (1.6.0_6 in 64bit mode) running on a single host. The number of client threads varied from 1 - 50.
- Load was applied for 330 seconds, with a ratio of 90% get operations to 10% set operations. Gets were performed in a bulk fashion, with each get asking for 100 cache entries.
- Shutdown the servers and collect the data.
It should be noted that the Spy Client provides the user with the ability to perform non-blocking I/O while Whalin does not. In order to provide a more "apples-to-apples" comparison, the Spy Client was used in a blocking manner. The Spy client was also configured to use the WhalinTranscoder.
The following data was collected while executing this benchmark
- Operations-per-second (Ops/Sec): Total number of operations completed divided by number of seconds during which the benchmark ran.
- meanSetLatency (setLatency): Arithmetic mean for how long, in milliseconds, individual set operations took to complete.
- meanBulkGetLatency (getLatency): Arithmetic mean for how long, in milliseconds, individual getBulk operations took to complete.
- CPU utilization (cpu % busy): Percentage of time the CPU was busy as measured by the vmstat utility.
- NIC Saturation: Network Card Saturation as captured by the nicstat utility
THE RESULTS
| Whalin 2.0.1 | Spy 2.2 | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Threads | Ops/Sec | setLatency | getLatency | cpu % busy | NIC Saturation | Ops/Sec | setLatency | getLatency | cpu % busy | NIC Saturation |
| 1 | 272 | 0.5 | 4 | 29 | 20 | 581 | 0.4 | 2 | 34 | 45 |
| 10 | 548 | 5 | 19 | 92 | 45 | 764 | 7 | 13 | 48 | 60 |
| 20 | 544 | 7 | 40 | 92 | 45 | 759 | 13 | 28 | 48 | 60 |
| 30 | 538 | 8 | 61 | 92 | 45 | 763 | 18 | 42 | 49 | 60 |
| 40 | 534 | 9 | 83 | 92 | 45 | 702 | 25 | 61 | 45 | 58 |
| 50 | 541 | 11 | 102 | 92 | 45 | 746 | 28 | 72 | 48 | 60 |
FINDINGS/OBSERVATIONS/NEXT STEPS
The data above indicates that the Spy client can achieve a higher throughput while using less CPU than Whalin and that Spy achieves a greater saturation of the NIC than Whalin. With this in mind, it is worth again noting that no attempt was made to optimize either client and that it is entirely conceivable that each client could be configured to achieve greater performance.
Both clients appear to reach a performance/resource utilization plateau between 1 and 10 threads. Further benchmarking (not included in the data above) has shown the plateau occurs between 1 and 5 threads.
The lower setLatency averages achieved by the Whalin client indicate that in a more 'set-intensive' environment, Whalin may achieve higher performance than Spy.
The fact that the Whalin client has such a high CPU utilization rate is fertile ground for investigation. Is this something that can be addressed via a change in the runtime environment/configuration?
The benchmark used very small cache entries that were below both clients' compression threshold. What will the results look like when the cache entries exceed the compression threshold?
The benchmarking was performed using the ASCII protocol, how will things change if we run against a 1.3.X server and enable the binary protocol?
Posted at 11:16PM Dec 22, 2008 by Eric Lambert in Memcached | Comments[0]