cn=Directory Manager
All about Directory Server

Monday March 05, 2007

Read-Only Replicas Considered Harmful

Back in the dark ages, the Netscape/iPlanet Directory Server 4.x only supported single-master replication. No matter how many directories you had, only one instance was writable and all the others had to be read-only. If you tried to write to one of those read-only directories, you'd get a referral redirecting you to the master. In simple deployments, you'd have what was basically a star topology, where each of the read-only replicas was directly updated by the master. In more complicated environments, you might see replication hubs that accepted changes from the master and forwarded them on to consumers, so that the master itself wasn't directly responsible for updating all of the read-only replicas. While the topology was simple, it didn't lend itself well to highly available deployments, and it caused problems for applications that didn't handle referrals (although Directory Proxy Server, if it was installed, could hide a lot of that from clients).

When Directory Server 5.0 came out, we added the ability to have two masters (as long as they were both in the same data center). This was a big step forward for high availability, and for many deployments where two servers were enough to handle all the load, you didn't need to have any read-only replicas. However, if you wanted to have more than two servers, or if you wanted to have servers in multiple data centers, then you still had to have read-only replicas.

When Directory Server 5.2 was released, we added support for up to four masters, and support for masters in different data centers. This was an even bigger leap forward. For the vast majority of single data center deployments, four servers is more than enough to handle all the client load, and in many two data center environments, two servers per data center was fine as well. However, you still needed those pesky read-only servers if you wanted to have more than two data centers with high availability in each one.

Now that Directory Server 6.0 is available, there's no longer any limit on the number of masters that you can have. You can make every server a master, and in the vast majority of environments that's exactly what you should do. No matter how many data centers you have or how many servers per data center, it's just plain easier if they're all masters. Note that you don't have to have them all directly connected to each other -- in larger environments spanning multiple data centers it's probably nice to have all of the local servers fully-interconnected but only a couple of cross-WAN links into other data centers -- but you can if you want. Some of the benefits of having only masters include:
  • Binary copy becomes much easier since you only have one type of server to manage. You can't use binary initialization from a master to a consumer (or vice versa), but if you have only masters then you can use binary copy between them as long as they meet the other constraints (e.g., they have the same system architecture, filesystem layout, and indexing configuration).

  • Password policy works better across the environment. If you have account lockout enabled and you try to authenticate to a read-only replica, then the failure counter will only be updated on that one replica but other servers in the environment won't know about it. But if it happens on a master, then that login failure will be replicated throughout the environment so that other servers will see it as well. The same is also true for the last login time functionality if you want to enable that.

  • If there are read-only replicas in the environment, then clients that try to modify them will need to be able to handle referrals. Even when a client does follow the referral and sends its write somewhere else, it may still have to deal with propagation delay if it immediately reads the entry back from the read-only server. As I mentioned before, Directory Proxy Server can follow referrals on behalf of the client, and in fact the new release has features like server affinity that help avoid problems with propagation delay. However, if there are any clients that communicate directly with the server instead of going through a proxy, then it's a lot easier if all the servers are masters.

  • In the past, there were cases in which you were able to get higher modify performance if you constrained yourself to only sending writes to one server. That's not true anymore, and in fact you'll probably find that you get better overall write performance by spreading the writes out across multiple servers than you do if you just send them to one instance.
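
The password policy point is easy to see in a toy model. The Python sketch below is not Directory Server code. It assumes a made-up `Server` class in which every server counts a failed bind locally, but only a writable master passes that count on to its peers. It also assumes replication is instantaneous, which real replication is not. The lockout limit of 3 and the user DN are invented for illustration.

```python
class Server:
    """Toy directory server that tracks failed-bind counts per user."""

    def __init__(self, name, writable):
        self.name = name
        self.writable = writable   # True for a master, False for a read-only replica
        self.failures = {}         # user DN -> failed-bind count known to this server
        self.peers = []            # every other server in the topology

    def record_failed_bind(self, user):
        # Every server counts the failure locally.
        self.failures[user] = self.failures.get(user, 0) + 1
        # Model assumption: only a master replicates the update to the others.
        if self.writable:
            for peer in self.peers:
                peer.failures[user] = peer.failures.get(user, 0) + 1


def build(writable_flags):
    """Create a fully connected set of servers, one per flag."""
    servers = [Server(f"ds{i}", w) for i, w in enumerate(writable_flags)]
    for s in servers:
        s.peers = [p for p in servers if p is not s]
    return servers


def max_guesses_before_lockout(servers, limit=3, user="uid=jdoe"):
    """Count the wrong passwords that can be tried by rotating across
    servers until every server considers the account locked."""
    guesses = 0
    progress = True
    while progress:
        progress = False
        for s in servers:
            if s.failures.get(user, 0) < limit:
                s.record_failed_bind(user)
                guesses += 1
                progress = True
    return guesses
```

Under these assumptions, `max_guesses_before_lockout(build([True, True, True]))` returns 3, while the same call against three read-only replicas, `build([False, False, False])`, returns 9. Each replica keeps its own counter, so the effective lockout limit is multiplied by the number of replicas.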


I have seen Directory Server 5.2 deployments that included read-only replicas simply because the people who set things up assumed that was the way it was always done, without thinking about whether it was the right approach. I have already seen a couple of cases with Directory Server 6 where people planning a deployment were thinking about including read-only servers. Certainly it's still an option if you really do have a legitimate need for read-only servers, but don't feel like there's any need to do it that way simply because that's the way things were done in the past.

Note that with OpenDS, we're taking even more steps to help eliminate the last few potential arguments against making all servers masters. We're introducing an architecture where it's possible to separate the changelogs from the server instances (where only some of the servers need to have changelogs, or you can put the changelogs on completely separate machines), so you can have masters without changelogs if you're concerned about the extra disk space associated with the changelog. We'll also be adding support for writable partial replicas (containing a subset of the attributes and/or a subset of the entries). If there are still other reasons that you think might tie you into a scenario that requires read-only replicas, then let us know so we can think about ways to eliminate those roadblocks as well.

Posted by cn_equals_directory_manager (Mar 05 2007, 07:43:13 PM CST)

Thursday March 01, 2007

Data Distribution in DSEE 6

The latest version of the Sun Java Enterprise System was officially released today, and included in it is the 6.0 release of our Directory Server Enterprise Edition suite. There are some great changes in the Directory Server itself (no more limit on the number of masters, new graphical and command-line administrative interfaces, security improvements, added 64-bit platform support for Solaris x86/x64, etc.), and they'll make great fodder for future posts. However, I want to shift the focus of this entry to Directory Proxy Server. I haven't talked about it much in the past, but it has always offered very useful features like transparent load balancing and failover, improved compatibility for clients, data translation, and added security features. But Directory Proxy Server 6 takes a huge leap forward from its predecessor. Not only are there a lot of improvements in the core proxy functionality (e.g., operation-based load balancing, improved connection pooling, support for SASL EXTERNAL, etc.), but it also adds two major new categories of features: virtual directory operations and data distribution. In this post, I want to focus on data distribution.

The new data distribution capabilities in Directory Proxy Server 6 make it possible to dramatically scale the size and performance of your directory environment. On its own, the Directory Server is able to take advantage of large amounts of memory and large numbers of CPUs. However, eventually you're going to hit a limit on the amount of data you can put in a single box and still get acceptable performance. Some of our largest customers (both in terms of the number of entries in their directory environment and in the size of those entries) also have very strict response time requirements (often single-digit milliseconds). To meet those requirements, you don't have the luxury of going to disk, so you've got to serve all of the data from memory (in some cases a solid-state disk solution may be a possibility, but that's a topic for another time). Sun has some big machines: for Directory Server, it's going to be hard to find anything available right now that can beat the Sun Fire X4600 with 16 Opteron cores and up to 128GB of memory, and if you've got to go monolithic then the E25K can hold over a terabyte of memory. But eventually there's a limit to what one box can hold.

Data distribution changes the game by allowing you to split up your data across multiple sets of servers. If one server can hold 25 million entries but you need to support 100 million, then you can break up the data into four sets. This is done in a manner that is virtually transparent to clients, so there's no need to artificially create hierarchy in your data or perform other kinds of transformations. When a client request comes into the Distribution Server, it figures out which set(s) of backend servers might need to be involved in processing that request, and then forwards it on to one of the servers in each of the sets (most of the time, only one set is involved, but some kinds of searches may need to involve multiple sets). You can customize how the data gets split up by specifying which distribution algorithm you want to use (or if you don't like any of them that are provided with the server, you can write your own), and you can customize the way that the Distribution Server picks the actual backend server within that set through a pluggable load balancing algorithm.
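
To give a feel for the moving parts, here is a minimal Python sketch of the routing idea. It is only an illustration, not how Directory Proxy Server is implemented. The class names (`BackendSet`, `Distributor`), the CRC32 hash on the lowercased DN, and the round-robin picker are all my own assumptions, standing in for whichever distribution and load balancing algorithms you actually configure.

```python
import itertools
import zlib


def sets_needed(total_entries, per_server_capacity):
    """Number of backend sets required, rounding up."""
    return -(-total_entries // per_server_capacity)


class BackendSet:
    """A group of replicated servers that all hold the same slice of the data."""

    def __init__(self, name, servers):
        self.name = name
        self.servers = list(servers)
        # Stand-in load balancing algorithm: plain round robin.
        self._next = itertools.cycle(self.servers)

    def pick_server(self):
        return next(self._next)


class Distributor:
    """Maps each request to the backend set(s) that might hold the data."""

    def __init__(self, backend_sets):
        self.backend_sets = list(backend_sets)

    def set_for_dn(self, dn):
        # Stand-in distribution algorithm: hash the lowercased DN.
        # crc32 is used because, unlike hash(), it is stable across runs.
        key = dn.strip().lower().encode("utf-8")
        return self.backend_sets[zlib.crc32(key) % len(self.backend_sets)]

    def route_entry_operation(self, dn):
        """An operation on one entry goes to one server in exactly one set."""
        return [self.set_for_dn(dn).pick_server()]

    def route_broad_search(self):
        """A search that could match entries in any set goes to one server per set."""
        return [s.pick_server() for s in self.backend_sets]
```

With the numbers from the text, `sets_needed(100_000_000, 25_000_000)` gives 4. An operation on a single DN always lands in the same set, and which server within that set handles it is left to the load balancing step.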

Another benefit that data distribution can provide is improved write performance. In the past, it's been easy to get improved read performance by simply adding more replicas, but that doesn't work for write operations because in a standard replicated environment, all of the changes have to go everywhere. With data distribution, replication only needs to occur between the servers in a backend set, so if you've got five sets of servers, then you've got the potential for five times the aggregate write performance. We've demonstrated this technology to a number of customers over the last couple of years, and we've seen some very impressive results.
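
The "five times" figure is an upper bound that assumes writes are spread evenly across the sets. A rough Python sketch of that arithmetic follows. The function names and the figure of 1,000 writes per second are hypothetical, not measurements.

```python
def replicated_write_capacity(per_set_rate, num_replicas):
    """In a fully replicated environment every change is applied on every
    server, so adding replicas does not raise the write ceiling."""
    return per_set_rate


def distributed_write_capacity(per_set_rate, shares):
    """Upper bound on aggregate writes/sec when `shares` gives the fraction
    of all writes that lands on each backend set. The busiest set
    saturates first, so it determines the ceiling."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return per_set_rate / max(shares)
```

With five evenly loaded sets, `distributed_write_capacity(1000, [0.2] * 5)` comes to about 5,000 writes per second, five times the single-set figure. If one set receives 40% of the writes, the bound drops to about 2,500, which is one reason it pays to think carefully about how the data is split.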

I'll be the first to admit that data distribution isn't for everyone. It really is targeted at those environments with large amounts of data that can't fit on a single system, or for those cases in which the single-server write performance isn't adequate. There is a bit of a learning curve, and it's wise to put some thought into how best to split up the data. We've already got improvements lined up for when this functionality gets integrated into OpenDS that we hope will make it easier to use and lower the barrier to entry, but we're also making improvements that we hope will allow for more effective use of single-server (or single replicated set) deployments. If you're doing fine in your current environment and don't expect to grow a lot in terms of amount of data or performance requirements, then the traditional approach is probably still the best. But if you expect to see a lot of new data being added to the server, or much more stringent performance requirements, then data distribution might be right up your alley.

Posted by cn_equals_directory_manager (Mar 01 2007, 03:03:10 PM CST)

