Monday May 01, 2006 | cn=Directory Manager: All about Directory Server
Breaking Up (Directory Data) is Hard to Do

In the 4.x version of the Directory Server, all data was organized into a single database. With the 5.0 release, it became possible to create multiple backend databases. Each suffix must be in a separate database, but it is also possible to define sub-suffixes in their own backends. There can be benefits to having sub-suffixes, including the ability to define different kinds of indexes or different replication topologies, but is there any performance benefit to be had? The answer (which is a common one when talking about performance) is "it depends".

Many directory vendors recommend creating lots of branches so that the data can be split, but that's often because their servers don't scale beyond two or three processors, and the only way they can handle large numbers of entries is to break them up across lots of different systems, or at least multiple instances on the same system. On the other hand, we're constantly testing our server on 4-way, 8-way, 12-way, 24-way, and 32-way systems with memory sizes up to a couple of hundred gigabytes, and we're always looking to scale even higher, so there's rarely an absolute need to break up the data just to get the scalability that you need. Of course, we're also working on scaling down, to make it possible to get better performance out of larger data sets on existing hardware, so this will help even further.

I should point out that even if it is possible to run the server with a big monolithic database, that may not always be the best choice. It is certainly the easiest case in terms of keeping all the data together, ensuring the best compatibility for client applications, and giving you the flexibility to use whatever DIT you want. However, really big databases can cause headaches when it comes to things like backup and restore, and even more so for LDIF import and export.
We are doing things in the Directory Server itself to help combat this in future releases, and external technologies like ZFS snapshots will also dramatically reduce the pain associated with these kinds of operations. Nevertheless, there may be legitimate cases in which splitting the directory contents is beneficial or even necessary.

Historically, the way to achieve a split like this has been to introduce new hierarchy into the DIT or to leverage existing hierarchy. These branches would then be split into separate databases in the same instance, or even placed on separate instances with chaining to link them together. With the upcoming Directory Proxy Server 6 release (which is now in beta), a new option will be available in the form of data distribution. Distribution will make it possible to split the contents of a flat DIT across multiple instances, on the same or separate machines, without the need to introduce any hierarchy. This will be much more palatable to existing applications, since introducing new hierarchy is almost always a bad idea.

There are both benefits and drawbacks to splitting the data. First, let's address the case where the data is split into multiple databases in the same instance:
Most of this remains the same if the data is split across multiple instances, whether on the same or different systems. The backup and restore time does get reduced, since each individual server has less data, and if you use the data distribution features coming in Directory Proxy Server 6 then you can avoid adding unnecessary hierarchy. However, splitting across instances brings new problems and benefits of its own.

The first is that in some or all cases, the overall latency (i.e., the length of time that elapses between the client sending the request and receiving the response) may be increased. If all requests are forced to go through a proxy (which will be the case with distribution), or at least some of them need to be chained to another server, then some time will be required for the additional processing and network communication. Even though the overall throughput (in terms of operations that can be processed in a given amount of time) may be higher, the latency will be higher as well, and that may adversely impact clients that are sensitive to response time.

The increased latency may be even more evident if there are requests that need to be sent to multiple instances. If a request doesn't contain anything specific enough to limit it to just one instance, then it may need to be broadcast to multiple instances, which can increase the total load against the directory environment.

Another issue is that splitting the data among multiple systems means that you need to have more systems running the Directory Server, and potentially others running Directory Proxy Server. This can create additional work for administrators, who have to ensure that all systems are kept up to date and running properly.
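
To make the difference between a single-instance request and a broadcast request concrete, here is a minimal sketch in Python. It assumes entries in a flat DIT are distributed by a hash of the uid attribute. The host names, the function names, the hashing scheme, and the dictionary standing in for a search filter are all assumptions made for illustration; none of this is taken from Directory Proxy Server 6 itself.

```python
import hashlib

# Hypothetical back-end instances; the host names are illustrative only.
INSTANCES = ["ds-a.example.com", "ds-b.example.com", "ds-c.example.com"]


def instance_for(uid: str) -> str:
    """Pick one instance from a stable hash of the (case-folded) uid value."""
    digest = hashlib.sha256(uid.lower().encode("utf-8")).digest()
    return INSTANCES[int.from_bytes(digest[:8], "big") % len(INSTANCES)]


def targets_for_search(filter_attrs: dict) -> list:
    """Return the instances that a search has to visit.

    filter_attrs maps attribute names to the values required by equality
    components of the search filter. This is a deliberate simplification
    of a real LDAP filter.
    """
    uid = filter_attrs.get("uid")
    if uid is not None:
        # The request names the distribution key, so one instance suffices.
        return [instance_for(uid)]
    # Nothing ties the request to one instance: broadcast to all of them.
    return list(INSTANCES)


if __name__ == "__main__":
    print(targets_for_search({"uid": "jdoe"}))               # one instance
    print(targets_for_search({"mail": "jdoe@example.com"}))  # every instance
```

In this sketch a search on uid is answered by one instance, while a search on mail has to visit all three, which is where the extra latency and load described above come from.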
However, spreading the data across more systems can have some benefits as well, because it is generally possible to use smaller, cheaper machines to run the Directory Server for each portion of the data than would be required to run a large monolithic instance. It can also make it feasible to cache the entire data set across many smaller systems when caching it as a single large data set isn't an option.

Ultimately, the decision to split the data into multiple chunks isn't one that should be taken lightly. In some cases it may be the best option (or the only one that is feasible), but most of the time there will be other strategies that work out better. In general, I wouldn't recommend seriously considering it unless your database holds at least tens of millions of entries, and even then it's probably something that we should look at on a case-by-case basis. We work with customers all the time to help determine the best course of action, and if you are considering splitting your data, either in the same instance or across multiple instances, then it's probably a good idea to have someone take a look at it to see if that is the best choice.

Posted by cn_equals_directory_manager (May 01 2006, 08:34:58 AM CDT)