cn=Directory Manager
All about Directory Server

Monday May 08, 2006

Tips for Developing LDAP Applications

So far in my posts I've tended to focus on Directory Server performance by tuning the server itself. However, the design of the client also plays an important role in the overall performance of the environment. A poorly designed client can cause problems by consuming a lot of server resources and interfering with the server's ability to process other requests. Configuring resource limits in the server can contain the damage, but only by cutting the inefficient requests short before they are fully processed. Therefore, it is important to ensure that clients are designed properly so that they can perform the appropriate operations as efficiently as possible.

The following is a collection of tips that I've lifted from a presentation I gave last year at a Product Masters Event. They're not all performance-oriented, but hopefully they will be helpful. Some of these probably deserve their own posts, so I'll try to expand on them in the future.
  • Make sure to use LDAPv3 rather than LDAPv2. Some APIs still default to LDAPv2, but LDAPv2 doesn't support features like controls, extended operations, referrals, SASL authentication, and multiple binds on the same connection.

  • Use at least minimal caching to avoid repeating the same queries. If you include a list of attributes to return, then make sure that you include all attributes you may need rather than performing different queries to retrieve the same entry with different attribute lists.

  • Design your application to allow for loose consistency in replication and the possibility that reads and writes may happen on different systems without the application's knowledge. Avoid read-after-write behavior because it can have inconsistent results.

  • Don't treat the Directory Server like a relational database. Avoid splitting data into separate pieces so that you need to retrieve multiple entries to get all the information about a given entity.

  • If you generate search filters, then do so intelligently. If you have compound filters, then use a form like "(&(a=b)(c=d)(e=f))" rather than "(&(&(&(a=b))(c=d))(e=f))" to avoid unnecessary nesting.

  • Unbind connections when they're no longer needed. It's generally best to re-use connections as much as possible, but whenever you're done with a connection make sure it gets closed.

  • Know the standards. There are a lot of them, but RFCs 2251 through 2256 will give a good overview. The specifications define what's legal and what isn't -- just because something happens to work doesn't mean that you should depend on it if it isn't a standard behavior.

  • Learn to capture and interpret LDAP communication over the network. If your application is misbehaving then it will help dramatically if you can see the actual requests that it's sending and the responses that it is receiving. The LDAPDecoder tool provided with SLAMD has been designed specifically for this task, although other utilities like Ethereal or even Solaris snoop may be sufficient.

  • Be directory-independent. Even though we'd prefer that you use Sun's directory, it's always best to design applications that are based completely on standard behavior. Even Sun's directory may change the way that certain features are implemented between releases, and as such you should be wary of proprietary features. If you do use features that are implementation-specific, then try to compartmentalize them into pluggable code that can be easily replaced if necessary.

  • Use controls and extended operations wisely. Realize that not all servers support the same sets of extensions, and take that into account when designing the application. You can use the root DSE to determine whether the server supports a given control or extended operation. In some cases, it may be possible to provide a client-side implementation (e.g., client-side sort instead of server-side sort) as an alternative.

  • Don't litter your code with hard-coded attribute/objectclass names, base DNs, server addresses/ports, usernames/passwords, etc. If you need to change something later, it can be hard to make sure that everything gets updated properly. You should centralize all such values in a constants class or a properties file so that they are simple to change if necessary.

  • Where possible, maintain a set of persistent connections to the server (i.e., connection pools) rather than connecting and disconnecting for each operation. This will be much more efficient, especially when using SSL. In order to avoid leaking connections and duplicating large amounts of code, it may be a good idea to code the various types of operations into the connection pool itself so that those operations will check out a connection, perform the operation and any necessary error handling, and make sure the connection is put back into the pool.

  • If you do use pooled connections, then use the proxied authorization control (if the server supports it) to avoid the need to constantly rebind as a given user in order to perform operations.

  • Design your application to be able to handle the different kinds of failures that may arise: server down, network outage, DS backlogged or unresponsive, DS returning unexpected responses (e.g., unavailable or busy). Don't assume that a lost connection means the server is down -- it could be that the connection was closed due to the idle timeout or some other constraint.

  • In some environments, it may be necessary to allow for the possibility of failing over to a read-only server. In this case, your application should be able to follow referrals, and also to handle cases where none of the referred servers are available. Also, consider adding a read-only mode that could allow your application to continue with at least partial functionality in the event that no writable servers are available.

  • Consider the authentication mechanisms you might need to support. Simple authentication (bind DN and password) is virtually guaranteed to be supported, but may require SSL/StartTLS. In some cases (depending on access control configuration), SASL authentication may be required for some operations. You should never try to retrieve the hashed password and compare it against what the user provided because this can introduce significant security holes in your application and can bypass password policy and account lockout constraints.

  • When binding, make sure that the user actually specified a password, since simple binds that don't contain a password will be treated as anonymous. Consider using the authorization identity controls (RFC 3829) or the LDAP "Who Am I?" extended operation (draft-zeilenga-ldap-authzid) to ensure that the authentication was actually performed as the appropriate user and isn't anonymous.

  • When binding, make sure to check for password policy controls to see if the password has expired or will expire in the near future. Don't design your application to expect hard-coded result codes or error messages to figure out the reason for the bind failure.

  • Work within the access control constraints of the underlying Directory Server. Don't perform all operations as an administrator, as that may open security holes and can make auditing difficult or impossible. Avoid programmatic interaction with the ACIs of the underlying server because they are non-standard and the syntaxes may vary between servers or even between releases of the same server. Keep access controls simple and avoid using too many of them to reduce performance impact and preserve clarity. You may be able to use the GetEffectiveRights control to determine what rights the client has.

  • When designing your application, make sure to document the kinds of operations that may be performed, both individually and in sequences of operations. Consider developing a SLAMD job that can simulate the access patterns of your application to help administrators better understand the impact that changes to the server configuration might have on the applications using it.

  • Talk with the server administrators to discuss any schema or indexing changes that your application may require. Let them know about any controls or extended operations you might want to use to ensure they are supported and that they won't significantly hurt server performance. Also indicate anticipated usage levels to ensure that the administrators can prepare for the increased load that the application may cause.

  • Use groups effectively. Prefer dynamic groups over static groups since they are much easier to maintain and faster to work with. Avoid roles altogether, since they are non-standard functionality and don't really provide much benefit. If you must use static groups, then let the server determine the membership with a filter like "(|(member=userDN)(uniqueMember=userDN))" instead of retrieving the entire member list and trying to make the determination on the client side.
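
As a rough illustration of the filter-generation tip above, here is a minimal Python sketch of a builder that merges nested AND components into a single flat filter. The helper names (escape_value, equality, and_filter) are made up for this example and do not come from any LDAP library. The sketch assumes that every component passed in is already a well-formed filter string whose values have had parentheses escaped.

```python
# Sketch only: these helper names are hypothetical, not part of any LDAP API.

_ESCAPES = {"\\": "\\5c", "*": "\\2a", "(": "\\28", ")": "\\29", "\x00": "\\00"}


def escape_value(value):
    """Escape the characters that are special inside a filter value."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def equality(attribute, value):
    """Build a single "(attribute=value)" component."""
    return "(%s=%s)" % (attribute, escape_value(value))


def _and_children(component):
    """Return the components of an AND filter, unwrapping nested ANDs."""
    if not component.startswith("(&"):
        return [component]
    inner = component[2:-1]  # drop the leading "(&" and the trailing ")"
    children, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                children.extend(_and_children(inner[start:i + 1]))
    return children


def and_filter(*components):
    """Combine components with AND, avoiding unnecessary nesting."""
    flat = []
    for component in components:
        flat.extend(_and_children(component))
    if not flat:
        raise ValueError("at least one filter component is required")
    if len(flat) == 1:
        return flat[0]  # a one-element AND adds nothing
    return "(&" + "".join(flat) + ")"
```

With this in place, code that builds a filter up incrementally, such as and_filter(and_filter(equality("a", "b"), equality("c", "d")), equality("e", "f")), still yields the flat form rather than the nested one.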

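The tip about controls and extended operations could look something like the following sketch. It assumes the application has already read the root DSE (a base-level search against the empty DN requesting the supportedControl attribute) and holds the result as a plain dictionary of attribute names to lists of values. How that dictionary is obtained depends on your LDAP API and is not shown. The function names are hypothetical; the OID is the one commonly published for the server-side sort request control.

```python
# Sketch only: root_dse is assumed to be a dict of attribute name -> list of
# string values that the application filled in from a root DSE search.

SERVER_SIDE_SORT_OID = "1.2.840.113556.1.4.473"


def root_dse_values(root_dse, name):
    """Look up an attribute by name, ignoring case as LDAP does."""
    for key, values in root_dse.items():
        if key.lower() == name.lower():
            return list(values)
    return []


def supports_control(root_dse, oid):
    """Return True if the server advertises the given control OID."""
    return oid in root_dse_values(root_dse, "supportedControl")


def sort_entries_client_side(entries, attribute):
    """Fallback sort: order entries by the first value of an attribute.

    Each entry is a dict of attribute name -> list of values. Comparison is
    case-insensitive, and entries without the attribute are placed last.
    """
    def sort_key(entry):
        values = entry.get(attribute, [])
        return (not values, values[0].lower() if values else "")

    return sorted(entries, key=sort_key)
```

The application would call supports_control(root_dse, SERVER_SIDE_SORT_OID) once after connecting, attach the sort control to its searches only if that returns True, and otherwise pass the results through sort_entries_client_side.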

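For the connection pool tip, the idea of coding the operations into the pool itself can be sketched as below. This is not tied to any particular LDAP library: connect is a caller-supplied function that returns an already-bound connection, and the only thing the sketch assumes about a connection object is that it has a close() method (for LDAP that would send the unbind). It also assumes that a lost connection surfaces as Python's built-in ConnectionError, which a real LDAP API may not do.

```python
import queue


class ConnectionPool:
    """Sketch of a pool that checks connections out and back in itself."""

    def __init__(self, connect, size=4):
        self._connect = connect
        # Each slot holds an open connection, or None if it must be opened.
        # LIFO order reuses the most recently returned connection first.
        self._slots = queue.LifoQueue()
        for _ in range(size):
            self._slots.put(None)

    def run(self, operation, timeout=None):
        """Check out a connection, call operation(conn), and check it back in.

        Raises queue.Empty if no connection becomes free within the timeout.
        """
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._connect()  # opened lazily on first use
            return operation(conn)
        except ConnectionError:
            # Discard the broken connection; the slot is reopened next time.
            self._close_quietly(conn)
            conn = None
            raise
        finally:
            self._slots.put(conn)  # the slot always goes back to the pool

    @staticmethod
    def _close_quietly(conn):
        if conn is not None:
            try:
                conn.close()
            except Exception:
                pass  # the connection is already unusable
```

Because every operation goes through run(), callers never hold a connection directly, so they cannot forget to return it. Note that a failure here only discards the one connection; it does not conclude that the server is down.
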
Posted by cn_equals_directory_manager ( May 08 2006, 08:43:48 AM CDT )

Comments:

Got any tips how to set up LDAP on Fedora or Sol ?

Posted by Joseph on May 08, 2006 at 10:30 AM CDT #

If you're asking about configuring the OS to authenticate users against a Directory Server, then no I don't. That's not really an area that I get into. I have done both in the past (well, not with Fedora but with an old version of Red Hat, probably around version 6.2) and didn't find either to be particularly hard when using the vendor-supplied documentation.

Posted by Neil Wilson on May 08, 2006 at 11:04 AM CDT #

Any comment on how schema design might or might not impact client applications? For example, my understanding is that it's preferable for client applications to try to work with existing schema elements before extending it. Also, I believe the use of single-valued attributes, instead of multi-valued, is preferred. I frequently find developers who want me to sign a blood oath that multivalued entries will always be returned in the same order they were written - while I THINK they will, I thought the standards didn't specify that they had to be. In addition, I thought that multi-valued attributes could have negative performance implications for any attribute that is frequently modified. And lastly, do the syntaxes defined for schema elements have any impact? Thanks for posting so much useful information here.

Posted by Bill McClintock on May 10, 2006 at 12:22 PM CDT #

I would agree that it is a good idea for applications to re-use existing attributes wherever possible. I have seen several cases in which customers have directories with several attributes in the same entry with the same values simply because the applications were designed to look for hard-coded names. From the server side, this may be possible to work around using attribute aliases (i.e., multiple names for the same attribute), but if you're designing an application then you should try to make the attribute names flexible.

As for single-valued versus multivalued attributes, it is primarily a function of how it will be needed. If you need to have multiple values, then using multivalued attributes makes sense, and is more elegant and flexible than trying to come up with some awkward syntax for storing multiple items in a single value. You are correct that the order of values cannot be guaranteed (RFC 2251 section 4.1.8 covers that). There is a proposed Internet Draft that discusses the possibility of adding ordering, but this is a very recent draft (it was only first published last Friday) and while I think it's interesting it is probably too early to seriously consider implementing it.

In the past, our replication mechanism has required potentially storing a lot of internal state information for multivalued attributes whenever they are modified, and that could cause performance issues, but the set of cases in which that is required has been reduced dramatically in the 5.2 patch 4 release so it is no longer an issue in most instances.

I'm not entirely sure what you mean by your last question regarding the impact of syntaxes. If you're asking about a performance impact, then no there isn't really a big performance impact as a result of what syntax you choose. There can certainly be an impact on whether the server performs operations as you would expect. For example, if an attribute value is a DN but the attribute type has the directory string syntax, then searches targeted at that attribute may not yield the expected results because the directory string syntax wouldn't know that it should do things like ignore spaces around commas or properly handle certain kinds of escaping that DNs might require. Your best bet is to always define attribute types with the most appropriate syntax in order to ensure that you get the expected results when interacting with those values.

Posted by Neil Wilson on May 10, 2006 at 03:22 PM CDT #

The specific syntax situation I have in mind is a migration from a WebLogic Portal embedded LDAP server to Sun's DS, in which the existing attributes are internally defined to use syntaxes the DS does not support. For example, 1.3.6.1.4.1.1466.115.121.1.9 - Certificate List syntax. The best bet I could come up with for a replacement was binary (.5). If we don't need to worry about performance, I guess we'll just set it up in a test environment and see if it works. Thanks again.

Posted by Bill McClintock on May 10, 2006 at 03:47 PM CDT #
