cn=Directory Manager
All about Directory Server

Tuesday December 20, 2005

Directory Server disk layout: the "new" way

In my last post, I provided details about the conventional way of laying out Directory Server components on disk. This works well for traditional filesystems like UFS and VxFS, but it is also somewhat wasteful because you're not necessarily getting the most out of the underlying storage. During a checkpoint, the DB disks go to 100% utilization, but are usually idle the rest of the time. The transaction logs don't generate all that much write traffic but can do a lot of expensive fsyncs. It might be helpful if you could pool all the underlying disks together and spread the load across all of them, but you'd still pay for all the seeks as the disk heads move back and forth from the DB files to the access logs to the transaction logs.

Fortunately, there is a way to get the best of both worlds. You can get the performance benefit of pooling all the disks together and harnessing their collective throughput while at the same time performing only sequential writes so that the disk heads aren't spending all their time going from one place to another. And in fact, you can even further improve I/O performance by reducing the amount of data that actually needs to be written to disk. Then couple that with unprecedented error detection and correction capabilities, and for good measure add in the ability to do instantaneous zero-downtime backups and the ability to restore just as quickly.

If you're thinking that something like this sounds too good to be true, then you're right (at least for now). If you look in Solaris 10, you won't find anything like what I've described. If you look in Linux or AIX or HP-UX or Windows, then you won't find it there either. There is no officially-supported operating system that can offer this capability. However, it's on its way into a future version of Solaris in a fully supported manner, and you can get it today through either OpenSolaris or Solaris Express. I am, of course, talking about ZFS.

All of the things that I said above are absolutely true:
  • ZFS provides many ways in which you can pool all your disks together. You can stripe multiple disks together to make one big volume with no redundancy, or you can mirror disks to make them fully redundant, or you can stripe mirrors or mirror stripes if you want to. And of course, there is RAID-Z, which is kind of like RAID-5 in that it provides striping with parity so that you only lose the capacity of a single disk (no matter how many disks you have in the pool) without giving up redundancy and fault-tolerance.

  • ZFS uses a copy-on-write (COW) approach to I/O, which makes all writes sequential, so the disks don't need to seek all over the place when performing writes.

  • ZFS offers the ability to use compression in order to even further improve performance in many cases. This may sound counter-intuitive, but most of the time the CPU overhead required to perform the compression is pretty handily outweighed by the performance gains that you get from having to put less data on the disk. And as an added bonus you get increased storage capacity at the same time (and directory data generally compresses very well).

  • ZFS provides end-to-end data integrity using 256-bit message digests with a few options for the digest algorithm. You can even use the 256-bit SHA-2 variant if you're really paranoid. This checksumming mechanism goes beyond what you can get from storage hardware because it can catch errors that happen outside the hardware. See this post for more details.

  • ZFS provides an instantaneous point-in-time snapshot mechanism that allows you to create atomic views of the data that you can instantly roll back to if necessary, or even clone to have multiple divergent branches. And because of its copy-on-write nature, the only extra disk space consumed by a snapshot is the amount taken up by the blocks that have changed after the snapshot was taken (so a snapshot doesn't take any disk space initially, but the amount of space consumed grows over time as changes occur). Because you can create them in an instant and there's very little overhead in having them, you can take lots of snapshots over the course of a day and roll back to any one of them if the need arises. Of course, that assumes that your redundant storage is intact, but if you want to copy a snapshot to another system then you can easily do that as well.

I could go on, but there are a lot of people far more qualified than I am who can tell you all about what ZFS has to offer and how it's implemented. So let's get back to the business of how it can help you with your Directory Server deployment.

Last time, I mentioned how you can split up the various Directory Server components onto separate disk subsystems for better performance. Today, I'm going to talk about how you can keep all those components together and combine all the disks that you were using into a single pool for better results and easier administration. As a simple example, let's assume that you took my advice and you have three different disk subsystems that you had previously been using for the DB, transaction logs, and everything else, respectively, and you want to use them in a single ZFS pool. Let's assume that the device IDs for those disks are c0t0d0, c1t0d0, and c2t0d0. You can create a RAID-Z pool covering all of them with the command:
zpool create directory-pool raidz c0t0d0 c1t0d0 c2t0d0

Once that's done, you will have a ZFS pool named "directory-pool" that is conveniently mounted at "/directory-pool". If you want it mounted somewhere else (e.g., "/export/ds"), then you can change that by setting a value for the "mountpoint" option. While we're at it, let's also enable compression and turn off access time tracking. That can be done with the commands:
zfs set mountpoint=/export/ds directory-pool
zfs set compression=on directory-pool
zfs set atime=off directory-pool

And there you go. You now have a fully-functional, high-performance, redundant, checksummed-out-the-wazoo storage pool in which you can install and run your Directory Server. And the only files that you need to relocate for better performance are the DB cache backing files (which should always go on a tmpfs filesystem).

So what about creating and restoring backups? We'll start with snapshots, since they are the fastest and cheapest way to do it, and all the other mechanisms are based on them. To create a snapshot of our ZFS filesystem, we can use the command:
zfs snapshot directory-pool@snap1

This will create a snapshot of the "directory-pool" filesystem named "snap1" comprised of whatever happened to be on the disk at that time. At any time after that, we can roll back to that snapshot using the command:
zfs rollback directory-pool@snap1

I should point out that this is a completely safe backup and recovery mechanism that will work as long as the underlying storage is OK. You can take a snapshot in the middle of your heaviest period of write activity, and if you need to restore it later, you'll end up with a database that has exactly the same contents as it did when you took the snapshot. The DB recovery time (the time that the database will spend replaying transaction logs when it is started) will also be much shorter than if you had used db2bak. Because the snapshot is instantaneous, there is no need to temporarily prevent transaction log removal while a copy is in progress, and therefore there won't be that many outstanding transactions to replay.
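Since snapshots are this cheap, it can be handy to name them by timestamp so that you can take several over the course of a day and still tell them apart. What follows is only a rough sketch: the snapshot_name helper and the timestamp format are hypothetical choices made for illustration, and the script prints the zfs command instead of running it.

```shell
#!/bin/sh
# Sketch only: snapshot_name is a hypothetical helper, not a ZFS command.
# It builds "<filesystem>@<timestamp>" so that repeated snapshots taken
# during the day get distinct, sortable names.
snapshot_name() {
    printf '%s@%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Print the command instead of executing it; remove the "echo" to
# actually take the snapshot on a system that has ZFS.
echo zfs snapshot "$(snapshot_name directory-pool)"
```

A line like the last one could be run from cron at whatever interval suits your change rate.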

If you want to play it safe and back up your data to a remote system (which is always a good idea), then you can easily do this by first taking a snapshot and then using the following command to create a file containing a backup of that filesystem:
zfs backup directory-pool@snap1 > /backup/snap1.backup

Note that the "zfs backup" command will actually write the data to standard output, so you can send it to a file or pipe it to another process or whatever you want to do to get it where it needs to go. In this case, we can assume that "/backup" is an NFS-mounted volume or some other safe, remote repository.

The command given above will create a full backup based on the specified snapshot. You can create incremental backups as well by specifying two snapshots. For example:
zfs backup -i directory-pool@snap1 directory-pool@snap2 > /backup/snap2.incremental

Note that you can technically use this backup to initialize a ZFS filesystem on another system, so theoretically this could be used as a faster means of performing binary copy initialization for replicas. However, the mechanism that I've described here won't work well as-is because the backup will contain the entire Directory Server installation, including things like the configuration and logs that you don't want to put on the other system. It would be nice to be able to just copy the database files over to the other system, and in fact if we plan ahead we can allow for that when we first set up our filesystem and use a separate ZFS filesystem for the database. This is pretty easy to do, but it will take a few steps to describe and this post is already getting long enough, so I'll save that one for a future post. Or if you can't wait, then you should be able to figure it out for yourself using the zpool(1M) and zfs(1M) man pages and using cpio(1) to transfer the files.

Posted by cn_equals_directory_manager ( Dec 20 2005, 11:55:10 PM CST )

Sunday December 18, 2005

Directory Server disk layout: the "old" way

When you deploy Directory Server in a production environment, you have some decisions to make about the way that you lay out the components on disk. By default, all of the components are placed under the instance root, but when using a traditional filesystem like UFS, QFS, or VxFS, there may be a significant advantage to splitting these components out onto different disk subsystems.

Note that I refer to this as the "old" way because ZFS is going to change a lot of these recommendations, and therefore my next post will talk about it as the "new" way. Of course, ZFS has only recently been released through OpenSolaris and Solaris Express, so it may not be considered quite ready for production use (although ZFS has been used internally for Sun home directory servers and other critical components for over a year and a half with no corruption or data loss), so the "old" way is going to be the most common way for now.

First, let's discuss which Directory Server components are candidates for being split out and the type of disk I/O generally associated with each:
  • The Main Database Files -- These are all the *.db3 files that comprise the Directory Server data and index files. With ideal caching, there will not be any read activity against these files, but if that's not the case, then it will be random access, and even a little activity in this area can saturate the underlying storage subsystem. Writes to these files will occur only during checkpoints, and the DB will try to optimize for sequential writes, but if there have been a lot of changes since the last checkpoint, then this can still be very disk intensive and saturate the underlying storage.

  • The Transaction Logs -- These are all the log.* files, which hold a record of all changes that have been made for use when updating the main database files during a checkpoint. Reads from these files should only occur during checkpoints, and they should only be sequential, but generally they will be held in the filesystem cache so there shouldn't be much actual disk I/O there. Writes to these files are also sequential and the write rate by itself does not generally saturate the disks, but the frequent fsync(3C) operations to ensure that the information is on-disk can be very costly. As I wrote earlier, you can use transaction batching to share these fsync calls among multiple writes, but under heavy write load it can still be pretty expensive.

  • The Changelog DB Files -- The Directory Server changelog (and I'll throw the retro changelog in here too because its access patterns are about the same) keeps a record of all changes that occur for the purpose of replicating them to other systems. Both reads and writes here are sequential, and generally are not enough to saturate the underlying storage.

  • The Server Log Files -- The server log files include the access, error, and audit logs, as well as the referential integrity log file (if this plugin is enabled and configured for asynchronous mode, which it should be). The Directory Server itself doesn't read from these files (except possibly for the referential integrity log file, but it will usually be held in the filesystem cache), but if they are read by an external process (e.g., a log analysis tool), then it will be sequential. Writes to these files are always sequential, and are generally not enough to saturate the underlying storage. The main exception to this is the case in which audit logging is enabled (and it is not by default) because writes to it are not buffered and therefore each write would require an fsync, putting it into the same category as the transaction logs. The same is also true for the error log, but unless you've got some kind of debugging enabled or a significant problem of some kind, writes to the error log will be negligible.

  • The DB Cache Backing Files -- By default, the database cache uses mmap(2) for its memory, which means that changes to the content of the DB cache will ultimately be reflected on disk. The use of mmap makes it possible for multiple processes to access the database concurrently (e.g., allowing you to use db2bak or db2ldif while the server is running), but if there are a lot of changes to the DB cache content, then that can cause a lot of disk thrashing. On Solaris, the way to mitigate this problem is to store the DB cache backing files on a tmpfs volume (e.g., in a directory under /tmp). On other operating systems, you should use a ramdisk or whatever equivalent will allow the files to be held in memory. The DB cache is only used while the Directory Server is running, and it doesn't need to be re-used if the server is stopped and then restarted, so there isn't a problem if these files are lost if the system is rebooted. The only potential penalty is that if the files are lost then they need to be recreated when the server is restarted, and for a large DB cache that can take a noticeable amount of time.

  • Backup Files -- Directory Server backups involve sequential disk I/O, both when reading the current DB and when writing the backup files. Binary backups (those created by db2bak or db2bak.pl) can saturate the underlying storage both for reads and for writes. For LDIF backups, the act of reading the DB will generally be sequential and should only cause problems if there's a lot of disk I/O in the DB to support other operations that may be going on, but in that case it can be disruptive. Writing to the LDIF file will always be sequential and usually not enough to saturate the underlying storage.

Based on all of this, it's not too difficult to put together a pretty simple set of recommendations:
  • You should always use the nsslapd-db-home-directory configuration attribute to relocate the database cache backing files to tmpfs or a memory-backed filesystem. This is a completely free optimization (because the mmap process won't cause the files to consume the memory twice) and there's just no reason to not do it.

  • If you have the ability to use at least two different disk subsystems for Directory Server components, then you should dedicate one of them for use by the main database files with the nsslapd-directory attribute. Note that this needs to be set not only in the "cn=config,cn=ldbm database,cn=plugins,cn=config" entry, but also in the configuration entry for each backend (e.g., "cn=userRoot,cn=ldbm database,cn=plugins,cn=config").

  • If you have the ability to use at least three different disk subsystems for Directory Server components, then you should dedicate the second for use by the transaction log files with the nsslapd-db-logdirectory attribute.

  • If you have audit logging enabled, then it's probably a good idea to try to isolate the server log files onto their own storage. This can be done using the nsslapd-accesslog, nsslapd-errorlog, and nsslapd-auditlog attributes in cn=config, and also if you've enabled the referential integrity plugin then the log for it may be specified using the nsslapd-pluginarg1 attribute of the cn=Referential Integrity Postoperation,cn=plugins,cn=config entry.
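As an illustration of the second recommendation above, here is a rough sketch that writes an LDIF change setting nsslapd-directory in both of the entries mentioned. The /ds/db path, the per-backend /ds/db/userRoot value, and the relocate-db.ldif file name are all assumptions made for the example, so check them against your own instance before using anything like this.

```shell
#!/bin/sh
# Sketch only: /ds/db is a hypothetical mount point for the dedicated
# database disks, and the LDIF file name is arbitrary.
DB_DIR=/ds/db
LDIF="${TMPDIR:-/tmp}/relocate-db.ldif"

# Write an LDIF change that sets nsslapd-directory in both entries
# named above: the global ldbm config entry and the userRoot backend.
# The per-backend subdirectory is an assumption, not taken from the post.
cat > "$LDIF" <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-directory
nsslapd-directory: $DB_DIR

dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-directory
nsslapd-directory: $DB_DIR/userRoot
EOF

# Show the result so it can be reviewed before it is applied.
cat "$LDIF"
```

The resulting file could then be fed to ldapmodify while bound as the directory manager. Expect to have to stop the server and move the existing database files yourself as part of a change like this.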

Normally, when we are performing a Directory Server benchmark, we use three storage arrays (in the past, it was the StorEdge T3B array, but now we're using the StorEdge 3510 array) for the server components. Each of them is configured with RAID 1+0 and uses UFS with the logging and noatime options. One of these arrays is used for the Directory Server database, one for the transaction logs, and the third for the server logs, changelog, and backups. We also put the DB cache backing files in a subdirectory under /tmp.

Of course, all of this talk about disk layout brings up a lot of questions. Some of the most frequently asked questions in this area are:
  • We've got an EMC storage array, so this doesn't apply to me, right? -- This question comes up pretty frequently. It is occasionally asked about storage solutions from other vendors too, but EMC customers in general seem to think that their storage is some kind of magical device with the ability to circumvent the laws of physics (maybe it's because of how expensive they are). But the fact is that because of the way the server interacts with each of its components, the various types of accesses in a busy directory do have the ability to saturate almost any kind of underlying storage. Unless you can guarantee that all writes will be sequential (like ZFS can), you'll probably find that it's better to isolate these components so that I/O targeted at one component won't interfere with I/O for another.

  • Why should I use UFS instead of VxFS or QFS or {other filesystem here}? -- If you're on Solaris (at least, Solaris 9 after update 2 or any version of Solaris 10), then UFS with logging is generally faster than VxFS for Directory Server operations (and in fact, for many common access patterns). Sun has spent a lot of time optimizing UFS and in most cases it is now faster than the alternatives. It's also quite a bit faster than QFS, but that's more of a specialty filesystem for clusters and not really ideal for use with Directory Server.

    On Linux, the question of which filesystem to use may also have some relevance. The default filesystem on Red Hat (and often the only one available by default) is EXT3, but our testing has shown that both JFS and XFS are usually quite a bit faster. Although we haven't tested Reiser4, earlier versions of ReiserFS were found to be quite a bit slower than EXT3 in most cases.

  • Can I use NFS? -- No, you can't. In many cases, NFS doesn't support the appropriate type of locking required by our underlying database. There is some question as to whether or not Solaris NFS does offer the appropriate locking, but the bottom line is that it is not supported and if you care about the integrity of your data, then you should stay away from it. The fact that we don't support NFS or other network filesystems for use with the Directory Server is documented here in the Directory Server deployment guide. Technically, there shouldn't be any problems backing up to or restoring from NFS, but generally your best bet is to avoid it.

  • What about using the forcedirectio mount option? -- The forcedirectio mount option basically turns off the filesystem cache for any volume on which it is enabled. If you've got caching configured such that everything fits into both the entry cache and the DB cache, then there shouldn't be any actual disk reads for the main database. However, this doesn't apply to the transaction logs or the referential integrity log file. Further, you'll find that performance is degraded after the server is restarted and before the caches have been primed. As a result, it's almost always a good idea to not use the forcedirectio option. If you are benchmarking and want to start with empty filesystem caches, then you can simply unmount and remount all the filesystems, which will invalidate anything from those volumes that may have been in the filesystem cache.


Posted by cn_equals_directory_manager ( Dec 18 2005, 03:03:44 PM CST )

Thursday December 15, 2005

Sun T2000 vs Dell 6850: LDAP AuthRate

As I mentioned in my last post, at the Sun Fire™ T2000 launch last week, we had our own demo at the Austin campus to help show it off. We pitted the T2000 against the Dell PowerEdge™ 6850 server, which we believe to be the best system that Dell has to offer. Here's a side-by-side comparison of the system specs:

                                   | Dell PowerEdge™ 6850                                 | Sun Fire™ T2000
CPU Type                           | Intel® Xeon® EM64T                                   | UltraSPARC® T1 with CoolThreads™ technology
Total CPU Sockets                  | 4                                                    | 1
Total CPU Cores                    | 8                                                    | 8
Total Hardware Threads             | 8                                                    | 32
CPU Clock Rate (GHz)               | 3.2                                                  | 1.0
DDR2 Memory Available (GB)         | 32                                                   | 32
System Height (Rack Units)         | 4                                                    | 2
Approx. Idle Power Draw (Watts)    | 320                                                  | 220
Approx. Loaded Power Draw (Watts)  | 600                                                  | 267
Operating System (Pre-Installed)   | Microsoft® Windows® Server 2003 Standard x64 Edition | Sun Solaris™ 10 3/05 HW 2
List Price (US Dollars)            | $33,652                                              | $25,995


A few notes on the information in this table:
  • The Sun Fire™ T2000 server usually ships with a 1.2 GHz UltraSPARC T1 processor. However, as this was a pre-release system we only had a 1.0 GHz processor.
  • The Dell PowerEdge™ 6850 was actually observed to draw in excess of 640 Watts under load. However, for the purpose of the SWaP calculation, a value of 600 Watts was used. It should also be noted that this system requires 200-240VAC power.
  • The Sun Fire™ T2000 was actually observed to draw around 240 Watts under load. However, the system spec sheet lists a maximum consumption of 267 Watts, and therefore that value was used for our SWaP calculations.
  • The provided list price for the Sun Fire™ T2000 server is for a system with a 1.2 GHz CPU whereas the system we were testing had only a 1.0 GHz processor.

In order to compare the performance of these two systems, we measured LDAP authentication performance running the Sun Java™ System Directory Server 5.2 patch 4. The workload for this test was very similar to that used by the SLAMD LDAP Weighted AuthRate job, but because this was a public demo, we used a custom client to show the relative performance of these systems in real time (I may be able to post a Shockwave Flash recording of this demo later this week). In this workload, each authentication consists of a subtree equality LDAP search operation on an indexed attribute (in order to locate the user entry based on a login ID) followed by a bind as that user. The login ID value for each authentication was selected at random from the entire data set, with a weighted access pattern such that 80% of the search operations were targeted at a set of 20% of the user entries, which reflects many measured real-world access patterns. Two Sun Fire™ V20z servers were used to generate the load against the Directory Server instances (1 V20z for each server).

The Directory Server instances were installed and optimally tuned for each system. A total of 250,000 user entries (generated using the MakeLDIF tool provided with SLAMD using the default example.template template file) were loaded into each directory. We would have used a much larger number of entries, but this is near the maximum cacheable amount for our Directory Server on a Windows system, and exceeding that would have given the T2000 system a large unfair advantage. In fact, tests with a data set of 1 million users showed that the server running on the T2000 system was able to achieve even higher authentication performance than with a set of 250,000 users, whereas the server on the Dell system exhibited severely degraded performance compared with that measured with a set of 250,000 users.

Each server was asked to process a total of 250,000 user authentications as quickly as possible. Both total length of time required to process these operations and the average number of authentications per second were measured. The average number of authentications per second was used as the "performance" component of the SWaP (space, Watts, and performance) metric. The SWaP value for each system was calculated by dividing the average number of authentications per second by the product of the space consumed (in rack units) and the power consumption under load (in Watts).

The maximum LDAP authentication performance that we were able to achieve from each system is as follows:

                                 | Dell PowerEdge™ 6850 | Sun Fire™ T2000
Performance (LDAP auths/second)  | 2853.34              | 8067.58
Total Processing Time (seconds)  | 87.65                | 31.00
SWaP                             | 1.19                 | 15.11
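
The SWaP figures can be reproduced from the formula described earlier (authentications per second divided by the product of rack units and Watts under load), using the numbers from the two tables in this post. This is just a quick sketch; the swap function name is made up for the example.

```shell
#!/bin/sh
# Sketch only: swap is a hypothetical helper name. Its arguments are
# auths/second, height in rack units, and Watts under load.
swap() {
    awk -v perf="$1" -v ru="$2" -v watts="$3" \
        'BEGIN { printf "%.2f\n", perf / (ru * watts) }'
}

swap 2853.34 4 600   # Dell PowerEdge 6850: prints 1.19
swap 8067.58 2 267   # Sun Fire T2000:      prints 15.11
```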


As can be seen from this information, when it comes to LDAP authentication performance the Sun Fire™ T2000 server beats the Dell PowerEdge™ 6850 server in all areas that we compared:
  • The Sun Fire™ T2000 server delivers over 2.8 times better LDAP authentication performance than the Dell PowerEdge™ 6850, even when you consider that the Sun system only had a 1.0 GHz processor whereas the T2000 normally comes with a 1.2 GHz processor.
  • The list price for the Sun Fire™ T2000 server is $7657 less than the list price for the Dell PowerEdge™ 6850 server, even when you consider that the price for the Sun system includes a faster CPU than the one that we were able to test.
  • The Dell PowerEdge™ 6850 requires twice as much rack space as the Sun Fire™ T2000 server.
  • The Dell PowerEdge™ 6850 consumes nearly 1.5 times as much power (measured in Watts) as the Sun Fire™ T2000 server when both systems are idle.
  • The Dell PowerEdge™ 6850 consumes 2.25 times as much power (measured in Watts) under load as the Sun Fire™ T2000 server, even with an optimistic estimate for the Dell system and a pessimistic estimate for the Sun system.
  • The Dell PowerEdge™ 6850 system requires 200-240VAC power, which is less commonly-available than 110V power, particularly in environments that typically run on x86/x64 systems. The Sun Fire™ T2000 server accepts 100-240VAC power and therefore can use standard 110V circuits.

When you look at any one of these metrics, the Sun Fire™ T2000 certainly looks attractive. However, the real value of this system is even more apparent if you compare what you would need in order to meet a given level of performance. For example, if your directory environment needs to be able to handle 10,000 authentications per second under peak load, then a solution with two Sun Fire™ T2000 systems would be over $82,000 cheaper to buy up front than the four Dell PowerEdge™ 6850 systems that would be required, and you'd also save more than 1.8 kilowatts of power and 12 rack units of space with the Sun solution.
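
The arithmetic behind that comparison can be sketched as follows, using the measured rates, list prices, loaded power figures, and heights from the tables above. It assumes that throughput scales linearly with the number of systems (that is, the load is spread evenly across them), and the size_for function name is invented for the example.

```shell
#!/bin/sh
# Sketch only: size_for is a hypothetical helper. Given a target rate and
# one system's auths/second, list price, Watts under load, and rack units,
# it prints: systems needed, total price, total Watts, total rack units.
# Assumes throughput scales linearly with the number of systems.
size_for() {
    awk -v target="$1" -v perf="$2" -v price="$3" -v watts="$4" -v ru="$5" 'BEGIN {
        n = int(target / perf)
        if (n * perf < target) n++          # round up to whole systems
        printf "%d %d %d %d\n", n, n * price, n * watts, n * ru
    }'
}

size_for 10000 2853.34 33652 600 4   # Dell: prints 4 134608 2400 16
size_for 10000 8067.58 25995 267 2   # Sun:  prints 2 51990 534 4
```

Subtracting the two lines gives a difference of $82,618, 1866 Watts, and 12 rack units.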

Posted by cn_equals_directory_manager ( Dec 15 2005, 11:46:33 PM CST )

Wednesday December 14, 2005

I need to follow my own advice

Last week, when the Sun Fire T2000 system was officially launched in New York and London, we had our own mini-event in Austin, where there are a lot of chip designers working on Niagara technology. As part of that, we demonstrated the Sun Fire T2000 running side-by-side with a Dell PowerEdge 6850, which we believe to be about the best system that Dell has to offer. It was fully loaded with four 64-bit Xeon processors at 3.2GHz and 32GB of memory (not to mention a list price over $7000 higher than a fully-loaded 8-core T2000, also with 32GB of memory). Since I put the demo together, we were demonstrating our Directory Server using LDAP authentication performance as our metric. Although we do have a Shockwave Flash version of that demo, I haven't yet gotten permission to post it here, so I'll play it safe and avoid posting exact details from those tests. But I will say that the Directory Server on the T2000 was able to achieve well over two and a half times the performance of the server on the Dell system. And since we also had the systems running on power meters, we were also able to show that the Dell PowerEdge 6850 was drawing about two and a half times the power of the Sun Fire T2000. Plus, with a height of 2 rack units, the T2000 only uses half the space of the 4U 6850. If you put all those together, you've got a pretty wide SWaP (space, Watts, and performance) gap between them.

Earlier today, I was asked to update the test so that it would also include results from the Sun Fire V40z. The version we have in our lab was the top of the line at the time we got it, with four Opteron 850 CPUs. Since then the dual-core versions have come out, but we haven't gotten our hands on them yet, much less the new Galaxy systems. Nevertheless, they're still pretty impressive systems so we wanted to include them in the mix to see how they fared against the top-of-the-line system from that other company.

I installed the server, gave it our usual tuning, imported a test data set, and started it up. After priming the caches, I threw it into the mix but was disappointed to see it delivering quite a bit less than half the performance of the Dell system. Sure, on the surface it might seem reasonable given that the Dell system had eight cores rather than four, and its Xeon CPUs were running at 3.2 GHz rather than the 2.4 GHz Opterons, but it just didn't feel right.

And then it hit me. In my haste I forgot to update the start-slapd script so that the server would use the libumem memory manager and the fixed-priority scheduling class. It didn't take that long for me to realize it, but given that I wrote about it just last week it is a little embarrassing. Nevertheless, I quickly corrected the problem and was much happier with the results. It was now delivering more than double the performance of the Dell PowerEdge 6850, and it was actually a six-fold improvement over what it had been just moments before.

So now our demo shows two Sun systems that are faster, cheaper, use less power, and take up less space than the best that you can get from Dell. The T2000 does beat out the V40z in all of those categories, but we weren't testing the top-of-the-line V40z and perhaps the dual-core version could give the T2000 a run for its money. But come to think of it, our T2000 only had a 1GHz processor (whereas the version shipping to customers is running at 1.2GHz). At any rate, I guess that's a pretty good problem to have.

Posted by cn_equals_directory_manager ( Dec 14 2005, 01:21:35 AM CST )

Tuesday December 13, 2005

Real-time Directory Server performance monitoring

As I've mentioned before, I spend a lot of my time measuring and analyzing the performance of the Directory Server. To help with that, I've developed tools like SLAMD that you can use to stress the Directory Server (or other applications) and the hardware it's running on to the limits and show you what kind of performance you can get out of it. That's really useful in a lot of ways because it can help us find hot spots in the code, and it can help customers understand the limits of what they can achieve.

Of course, unless your production Directory Server instances are all under peak load 100% of the time, benchmarking won't give you a very good idea of the kind of demand the server is under at any given time. Benchmarks can tell you the best that you might be able to achieve, but they can't tell you what you're getting right now. There are a few ways that you can get that kind of information, like analyzing the server access logs, but that's not really well-suited for real-time analysis.

To be honest, we don't really have anything inside the Directory Server (at least, not in current versions) that can provide you with information about the kind of load that the server is under in real time. But as it turns out, with Solaris 10 you don't need to wait for us to add something like that to the server because you can get it for yourself using the new Dynamic Tracing (DTrace) framework. You just need to know where to look. Or at least, you would need to know if I hadn't already done it for you. Using this simple DTrace script you can see what the Directory Server is doing in real time and with minimal impact on performance.

The dsstat.d script is a simple DTrace script that operates in the style of vmstat or iostat to show you what's going on inside the server. It prints a line of output once every second showing the total number of binds, searches, compares, adds, deletes, modifies, modify DNs, and unbinds that occurred over that one-second period (customizing it to use an interval other than 1 second is left as an exercise to the reader). It accomplishes this by using the PID provider to increment a counter every time the server enters a function used to process one of these kinds of operations, and then using the profile provider to print out a summary of this information once every second.

To use the script, simply use chmod to make it executable, and then run it using the PID of the Directory Server process as the only argument. For example, if the PID of your Directory Server process is "1234", then the command you would use is:
./dsstat.d 1234

The output of this tool looks like the following:
# ./dsstat.d 2906
TOTAL   BIND    SEARCH  COMPARE ADD     DELETE  MODIFY  MODDN   UNBIND
92      0       69      0       4       1       18      0       0
454     0       369     0       11      10      65      0       0
441     0       352     0       22      6       60      0       0
485     0       405     0       18      10      52      0       0
391     0       317     0       18      6       50      0       0
529     0       433     0       17      9       68      0       0
528     0       432     0       17      15      60      0       0
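
If you redirect that output to a file, a few lines of shell and awk can boil it down afterwards. The following is only a sketch: the summarize_dsstat function name is made up for this example, and it assumes the column order shown above (TOTAL first, SEARCH third).

```sh
# Summarize dsstat.d output read from standard input.
# Lines whose first field is not a number (the header, a shell prompt)
# are skipped, so the raw output can be fed in unchanged.
summarize_dsstat() {
    awk '$1 ~ /^[0-9]+$/ { n++; total += $1; search += $3 }
         END {
             if (n > 0)
                 printf "samples=%d total_ops=%d searches=%d avg_ops_per_sec=%d\n",
                        n, total, search, total / n
         }'
}
```

For example, after running "./dsstat.d 2906 > dsstat.out" for a while and stopping it, "summarize_dsstat < dsstat.out" prints a single summary line. For the seven sample rows above it reports samples=7 total_ops=2920 searches=2377 avg_ops_per_sec=417 (the average is truncated to a whole number).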

I should point out that these numbers were obtained on my laptop under intentionally light levels of load for demonstration purposes only and are not representative of the actual performance you can achieve from the server. I should also point out that this script is provided "as-is" and without any warranty of any kind, so use it at your own risk. It should be entirely safe, but if for some reason it should have some undesirable side effect like horribly degrading performance or causing the server system to catch fire, then I'm not responsible.

I will say that under certain circumstances when I've been running this against large directory server processes (e.g., those consuming tens or hundreds of gigabytes of memory), DTrace can take a significant amount of time to start up, as I believe it has to walk the memory map associated with the Directory Server process and during this time the execution of the server is essentially suspended. So if you are running your server with big caches and are consuming lots of memory, you'll definitely want to see what kind of impact it has in a non-production environment before unleashing it on a live system. Since we've got more than a few customers that run their server under those conditions, it's something you'll want to watch out for.

Posted by cn_equals_directory_manager ( Dec 13 2005, 12:38:03 AM CST )

20051211 Sunday December 11, 2005

Forget your roots

It's a very common security practice to run server processes as a non-root user with limited capabilities. Unfortunately, you often need certain capabilities that are normally available only to the root user, including the ability to bind to privileged ports and the ability to increase the number of available file descriptors that may be used by a single process.

The way that the Directory Server and other daemons commonly deal with this is to require that they are started by the root user so they can create the listen sockets and do any other root-required processing, but then they call setuid to drop to an unprivileged user. This is effective, and certainly much safer than running as root, but there are a couple of problems with it:
  • It requires that the user have root access in order to be able to start the server. This is possible when a system boots through startup scripts (or an SMF profile in Solaris 10), but if you need to start the server at a time other than system boot then you need to be root. There is a way to work around this problem using RBAC, but I'll describe a better alternative below (and RBAC still doesn't solve the next issue).
  • If you want to do something in the server process that would require root access after the startup has completed and setuid is called (e.g., start listening on another privileged port), then that operation would fail.

The ideal solution to this problem would be to have some way of granting a normal user the ability to do things that are normally only allowed for the root user. Solaris 10 provides exactly this capability through process rights management, also called least privilege. Using this mechanism, you can grant a normal user the ability to do things like bind to privileged ports or increase the number of available file descriptors. In fact, you can even take away the ability to do things that a normal user is allowed to do, like see what processes others are running on the system. In the Solaris 10 GA release, there are 48 privileges that have been identified and can be granted to or removed from a user. For a description of all of these privileges, issue the command:
ppriv -lv

To see what privileges you currently have, issue the command:
ppriv $$

By default, when this command is run as a normal user, you will see that you have the "basic" privilege set, which expands to the combination of file_link_any, proc_exec, proc_fork, proc_info, and proc_session (these are the capabilities that have historically been granted to unprivileged users -- you can get this list using "ppriv -l basic").

For the purposes of Directory Server, the most interesting privileges include:
  • net_privaddr -- Controls the ability to bind to privileged ports (i.e., ports below 1024).
  • sys_resource -- Controls the ability to modify resource limits, including the number of available file descriptors.
  • dtrace_proc -- Controls the ability to use DTrace against processes that the user can control.
  • dtrace_user -- Controls the ability to use DTrace for things outside the kernel (i.e., "user space" code).
  • proc_info -- Controls the ability to see any processes on the system other than those that the user can control.
  • file_link_any -- Controls the ability to create hard links to files owned by anyone other than the current user.

When I'm configuring an account for use by the Directory Server, I will usually take away the proc_info and file_link_any privileges and grant the other four. You can do this with a regular user account, but as I mentioned yesterday, it's better to use a role. When you're creating the role, you can do this by adding the "-K" option and specifying a value for the defaultpriv property. For example:
roleadd -d /export/home/dirsrv -m -s /usr/bin/bash -K defaultpriv=basic,net_privaddr,sys_resource,dtrace_proc,dtrace_user,-proc_info,-file_link_any dirsrv

If the role already exists, then you can use rolemod with the same -K argument string to accomplish the same effect. And of course if you want to use a normal user account rather than a role, then you can do it with useradd and usermod rather than roleadd and rolemod.
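
If you adjust the privilege list often, it can help to assemble the rolemod command and look at it before running it. This is just a sketch: build_defaultpriv and print_rolemod are hypothetical helper names, and the script only prints the command rather than executing it.

```sh
# Build the value for the defaultpriv property from two comma-separated
# lists: privileges to add to "basic" and privileges to take away from it.
build_defaultpriv() {
    grant=$1
    revoke=$2
    value="basic"
    if [ -n "$grant" ]; then
        value="$value,$grant"
    fi
    # Each revoked privilege gets a "-" prefix, as in the roleadd example.
    if [ -n "$revoke" ]; then
        value="$value,-$(printf '%s' "$revoke" | sed 's/,/,-/g')"
    fi
    printf '%s\n' "$value"
}

# Print the rolemod command for review instead of running it.
#   $1 role name   $2 privileges to grant   $3 privileges to revoke
print_rolemod() {
    printf 'rolemod -K defaultpriv=%s %s\n' "$(build_defaultpriv "$2" "$3")" "$1"
}

print_rolemod dirsrv \
    net_privaddr,sys_resource,dtrace_proc,dtrace_user \
    proc_info,file_link_any
```

The printed line uses the same -K argument string as the roleadd example above, so once it looks right you can paste it into a root shell.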

Once this has been done, you should be able to start the server while logged into the system as that role or user, and you will see that the server can start and operate without requiring root access. Further, if the server attempts to do anything that might require those privileges once it's running, then those operations will be allowed as well. With current versions of the server, this shouldn't be needed, but we do have improvements in the works that could benefit from this.

One additional point that I should make in this discussion is that process rights management has been integrated with the Service Management Facility (SMF), another new capability in Solaris 10 that is intended for use in starting or stopping the services configured for use on a system. If you do put the Directory Server under SMF control (which is possible, but outside the scope of this discussion), then you can grant or revoke privileges in the SMF manifest itself so that they would be available for use by the Directory Server but not by any other processes started by that user or role. Perhaps that's a good topic for a future post.

Posted by cn_equals_directory_manager ( Dec 11 2005, 02:35:59 PM CST )

20051210 Saturday December 10, 2005

The role Directory Server was meant to play

As a Directory Server administrator, you've got a bit of a quandary at install time. For security reasons, you want the server to run using a restricted account that can't directly log into the system and has limited capabilities. But also for security reasons, you probably don't want to need root access to look at the log files, update the configuration, or take backups. Unfortunately, the Directory Server creates these files owned by the user specified in the nsslapd-localuser attribute with 0600 access permissions. And what's worse, changing the permissions or adding filesystem ACLs or setting your umask won't help because the files are explicitly created with those permissions rather than inheriting them, and all of them have files that will be rotated (the DB has transaction logs that are periodically created and removed, and whenever a config change is made, a new file is written and renamed rather than overwriting the existing dse.ldif) so the existing permissions won't be retained.

Fortunately, you can make nearly all configuration changes over LDAP (and the few remaining gaps should be filled in with the DS 6 release), so you don't even have to be logged into the system to do that. Similarly, the db2bak.pl and db2ldif.pl scripts can cause the server to initiate backups and LDIF exports over protocol as well, or you can use the nsslapd-mode configuration attribute to make them group-readable if you wish. But what about the log files? If you're willing to wait for Directory Server 6, then you will be happy to see that we've added a configuration option that will allow you to define the permissions used for the log files. But what can you do until then?

The best way of addressing these and other issues, at least on a Solaris system, is to make the Directory Server account a role rather than a standard user account. A role is a special type of account that works like a normal user in many ways, but with two notable differences:
  • A role cannot be used to directly log into the system. Rather, you must first log in as a normal user and then use su to assume the role (after providing the appropriate password). Not only does this provide a first line of defense, it can also help provide an audit trail because you can see exactly who is assuming that role and what they are doing.
  • Only authorized users are allowed to assume a role. If you're not a member of a given role, then you won't be allowed to assume that role even if you know the password for it.


This addresses many of the security concerns about running the server under an account that people can actually use: the role can't log in directly, the set of users that can assume it can be tightly controlled, and the role itself can be configured with tight restrictions. And because it is safe to assume the role and be able to use the shell as that account, you can do things like look at the log files or touch the configuration as necessary. There's no need for convoluted cron jobs that periodically change file permissions, nor do you need to beg the system administrators for root access if something out of the ordinary comes up. In fact, "role" is the "R" in "RBAC", a mechanism that can be used to grant roles certain capabilities that might normally need to be run as root (e.g., starting the server so that it can listen on privileged network ports). I won't get into the details of that in this post (although my next one will provide information on using least privilege in Solaris 10 to accomplish this and other things).

So how do you go about creating a role? If you're just getting ready to install the server and don't yet have an account on the system for it to use, then you can create a new role using the roleadd command. This command is virtually identical to the useradd command, with the exception that it creates a role instead of a normal user. For example, the command:
roleadd -d /export/home/dirsrv -m -s /usr/bin/bash dirsrv
will create a new role called "dirsrv" and create a home directory for it and configure it to use the bash shell by default.

On the other hand, if you've already got a normal user account that you're currently using to run the Directory Server, then you can convert it to a role with a single usermod command. For example:
usermod -K type=role dirsrv
will convert the "dirsrv" user account to a role.

Once you have a role, you need to assign it a password, which you can do with the passwd command just like any other user account (of course, if you converted the role from an existing user account and that user already had a password, then it will be retained). Then, you can add the users that need to be able to administer the server to that role, also via the usermod command. For example, the following will give the user "john" the ability to assume the "dirsrv" role:
usermod -R dirsrv john
After this, whenever the user "john" is logged into the system and needs to perform some administrative task with the Directory Server, he can just issue the command "su - dirsrv", type in the password for that role, and have the access he needs to manage the server.

Posted by cn_equals_directory_manager ( Dec 10 2005, 04:42:59 PM CST )

20051209 Friday December 09, 2005

Decoding LDAP communication

If you work with directory servers or directory-enabled applications long enough, chances are you'll run into cases in which you need to see exactly what the client is sending to the server, or vice versa. You could look in the server's access and/or audit logs, and they will often provide you with all the information that you need. However, they don't include all details of the requests and responses, and there are times that you need more in-depth information than the logs can provide.

The next place that you can turn is to use a network sniffer. While this may be able to meet your needs, it still won't work in all cases. For example, if the communication is encrypted using SSL, or if the client and server are on the same system and the underlying OS won't let you snoop over the loopback interface, then you won't be able to get anything useful. And even if you do capture the clear-text information, how will you be able to interpret it? Some tools (like Solaris snoop and Ethereal) can even try to decode some of the communication but they are often lacking in some areas, particularly when it comes to things like controls and extended operations.

Fortunately, even if the logs aren't helpful and it isn't feasible to use a network sniffer (or the sniffer doesn't provide the level of detail you would like), then there is another option: the LDAPDecoder tool, which is provided with the SLAMD Distributed Load Generation Engine, in the tools/LDAPDecoder directory. This is a very useful tool that can help give you a better understanding of exactly what is going on between the client and the server, and as a bonus it can even help you with your performance testing. More on that later.

The LDAPDecoder has two primary modes of operation. The first is proxy mode, in which you configure it to operate as a very simple LDAP proxy. You point it at the Directory Server you want to target, and then you point your client(s) at the LDAPDecoder. Whenever the LDAPDecoder receives a request from the client, it will print it to standard output or write it to a file and then forward it on to the Directory Server. When it receives the response message(s) from the server, it will write them out before sending them on to the client. It has the intelligence to be able to decode all types of LDAP operations, and even has special support for several kinds of controls so it can decode them as well. In this mode, it offers support for several things you can't get with a sniffer, including:

  • It can handle SSL-based communication (using server authentication only -- client authentication is not available).
  • It can handle cases where the client and server are on the same machine and the underlying OS does not allow packet captures over the loopback interface.
  • It does not require root or otherwise privileged access to the machine if you configure the proxy to listen on an unprivileged port (1024 or above). Packet captures virtually always require root access.
  • It is easier to handle cases in which the payload of the network packets does not contain exactly one complete LDAP message. If an LDAP message is split into multiple packets, or if multiple complete or partial messages are placed into the same packet, then it can be more difficult to interpret them.

In order to start the LDAPDecoder in proxy mode, the basic usage is:
java -jar LDAPDecoder.jar -h {serverAddress} -p {serverPort} -L {listenPort}

This will start listening for requests on the specified listen port, and will decode any communication that it receives before forwarding to or from the directory server. There are a number of other options that can be used as well -- see the documentation either online or in the tools/LDAPDecoder directory under the SLAMD server installation, or you can simply invoke the tool as above but adding the "-H" argument for usage information.

The second mode of operation is called offline mode. In this case, the LDAPDecoder isn't directly involved with the communication between the client and the server, but rather is provided with a binary capture file in the format used by either snoop or tcpdump. It will parse the capture and attempt to identify and decode any LDAP communication contained in it and either display it to the terminal or send it to a file. Running in offline mode, it can't interpret SSL-encrypted communication and it can't deal with packets that are fragmented or contain multiple LDAP messages or message fragments, but otherwise functionality should be identical.

To start the LDAPDecoder in offline mode, the basic usage is:
java -jar LDAPDecoder.jar -p {serverPort} -i {captureFile}

This will read the capture file and pick out any packets with a source or destination port equal to the specified server port and attempt to decode them as LDAP. As with proxy mode, other options exist, so check the documentation for details.

So how can the LDAPDecoder help you with your performance testing? There are a couple of ways. The first is that it can simply let you see exactly what's going on between the client and the server. Once you know what the client is requesting and how the server is responding, you may be able to see ways that you can better tune the server to handle those requests. Another possibility is that you can take the information that you learned about what the client is doing and use it to write a SLAMD job or script to simulate that workload so that you can use it to stress the directory server.

It is this last use case where another very useful LDAPDecoder feature comes into play: it can automatically write a SLAMD job script for you based on the LDAP communication that it captures. If you add the "-F {scriptFile}" argument when running in either proxy or offline mode, then the LDAPDecoder will write information about any requests that it receives from the client into a SLAMD job script that you can use to help reproduce the same load against the server. Note that you will probably want to edit the resulting script to make it more generic (e.g., to target a range of users rather than always using the exact details that were captured), and also that the SLAMD scripting language doesn't currently include support for more advanced components of the protocol like controls or SASL binds or extended operations. However, it is still capable of handling many of the most common cases to make your job a little easier.

Posted by cn_equals_directory_manager ( Dec 09 2005, 08:42:57 AM CST )

20051208 Thursday December 08, 2005

Little-known performance enhancements #2: Solaris tuning

This set of tips is kind of a cop-out since I'm not really talking about tuning the Directory Server itself all that much. However, if you're running the Directory Server on Solaris (and you should be), then there are a couple of simple things that you might not know about that can provide notable performance improvements. And unlike my last post, which was about write performance, this one should provide an all-around benefit. However, these improvements will be most noticeable for CPU-intensive operations like searches, binds, and compares.

The first thing that you can do is to make the server run under the fixed-priority scheduler rather than the default time-sharing scheduler. In my experience, this generally gives you about a 5% increase in maximum search performance. And it's really simple. Just add the following near the top of your start-slapd script (and I also usually put it in ldif2db as well, since it can help out there too):
# Use the fixed-priority scheduler.
priocntl -s -c FX $$

In this case, the "$$" is a special shell variable that expands to the PID of the current process (i.e., the start-slapd script itself), and the scheduler for that process will be inherited by anything that it spawns, including the Directory Server. Note that while you can provide additional configuration like setting a specific priority, you often need special privileges for doing that and even scheduling with the maximum priority doesn't seem to make much difference versus the default.
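
A slightly more defensive variant of that addition is sketched below. It assumes that you would rather have the server start under the default scheduler than not start at all, so the hypothetical use_fixed_priority function reports what happened instead of aborting the script.

```sh
# Try to move this script (and anything it starts afterwards) into the
# fixed-priority class, and report the outcome instead of aborting.
# "$$" is the PID of the script itself, even inside the function.
use_fixed_priority() {
    if ! command -v priocntl >/dev/null 2>&1; then
        echo "skipped: priocntl not found"
    elif priocntl -s -c FX $$ >/dev/null 2>&1; then
        echo "fixed-priority"
    else
        echo "failed: still using the default scheduler"
    fi
}

use_fixed_priority
```

On Solaris you can confirm the result once the server is up by asking ps for the "class" field, for example "ps -o pid,class,args -p 1234" for a server with PID 1234.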

The second tip applies primarily if you're running the server on Solaris x86 (which is an excellent choice for our Opteron-based systems). By default, the Directory Server will use the mtmalloc memory manager, which was intended to help provide better memory allocation performance than the standard single-threaded malloc. Unfortunately, it doesn't quite live up to its intended purpose. However, Solaris 10 introduces the new libumem memory manager, and it has been backported to most versions of Solaris 9 (I believe update 3 or later). It is far better than mtmalloc in virtually every respect, and it will generally allow the server to go faster and consume less memory. To configure the server to use libumem rather than mtmalloc, edit the start-slapd script to add the following:
# Use the libumem memory manager.
LD_PRELOAD=/usr/lib/libumem.so
LD_PRELOAD_64=/usr/lib/64/libumem.so
export LD_PRELOAD LD_PRELOAD_64

Note that you can do this with Solaris on SPARC-based systems as well, but in general this is not necessary because on that platform we use a third-party allocator that accomplishes the same thing.
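
If the same start-slapd script is shared between machines that may or may not have libumem installed, you can guard the preload. This is a minimal sketch: preload_if_present is a hypothetical helper, and the library paths are simply the ones used above.

```sh
# Echo the library path only if the file is readable, so that a missing
# library never ends up in LD_PRELOAD.
preload_if_present() {
    if [ -r "$1" ]; then
        printf '%s\n' "$1"
    fi
}

umem32=$(preload_if_present /usr/lib/libumem.so)
umem64=$(preload_if_present /usr/lib/64/libumem.so)

# Only touch the environment when the library was actually found.
if [ -n "$umem32" ]; then
    LD_PRELOAD=$umem32
    export LD_PRELOAD
fi
if [ -n "$umem64" ]; then
    LD_PRELOAD_64=$umem64
    export LD_PRELOAD_64
fi
```

Once the server is running, the Solaris pldd command (given the server's PID) lists the libraries loaded into the process, which should let you confirm that libumem was picked up.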

For those of you that may still be running Solaris 8, there's another potential boost that you can take advantage of (until, of course, you decide to upgrade to Solaris 10 for even better performance and tons of great new features like least privilege and zones). You can configure the server to use the alternate thread library, which became the default in Solaris 9. To do this, add the following to your start-slapd:
# Use the alternate thread library.
LD_LIBRARY_PATH=/usr/lib/lwp
LD_LIBRARY_PATH_64=/usr/lib/lwp/64
export LD_LIBRARY_PATH LD_LIBRARY_PATH_64

Note that in this case, though, you'll also need to remove another line already in the start-slapd script that unsets LD_LIBRARY_PATH, so that these settings don't get undone before they can be of any use.

Posted by cn_equals_directory_manager ( Dec 08 2005, 12:08:57 AM CST )

20051207 Wednesday December 07, 2005

Little-known performance enhancements #1: Transaction batching

This is the first of what I hope to make a series of posts about things you can do to help improve Directory Server performance, particularly with regard to settings that you may not be aware of. Some of these may touch on much more prominent configuration options where there is a lot of room for discussion (e.g., cache sizing), but I'll probably save those for when I have a good bit of time to cover them in appropriate detail.

The first topic I want to touch on is that of transaction batching. Historically, directory servers have been used much more heavily for read operations than for writes, with searches, binds, and compares vastly outnumbering adds, modifies, and deletes. In those cases, write performance isn't necessarily all that important. However, that is changing for a few reasons:

  • Directories are becoming more heavily used. As they need to handle higher and higher numbers of requests, even a relatively low percentage of write operations can start to add up.
  • Directory data is becoming more dynamic. The percentage of write operations is increasing, and some popular directory-enabled apps include a write in most sequences of operations.
  • The amount of data synchronization is increasing. Products like Identity Synchronization for Windows, Identity Manager, and home-grown applications can merge data from multiple repositories, so a change in one is reflected in the other.
  • Although it happens less frequently, if large companies or organizations merge together, they've often got a flurry of activity combining their data into a single repository.

All of this adds up to the need for higher write performance rates than may have been required in the past. The out-of-the-box tuning may be sufficient for many environments, but when it isn't, there are a few things that can be done to get some pretty significant boosts. Transaction batching is one of the less well-known configuration options, but it can have a very significant impact on performance. In our performance testing, enabling transaction batching can often double your write rate (of course, individual mileage may vary).

Transaction batching is basically a means of grouping multiple write operations together. In its default configuration, the Directory Server uses full durability, which means that by the time the response is sent to the client the server has verified that the data is committed to disk. It does this using the fsync(3C) call, which has historically been pretty heavyweight. Even though recent versions of Solaris have made improvements, it can still be an expensive operation, so reducing its frequency can dramatically improve performance.

One way to accomplish this is to disable transaction durability in the server by setting the nsslapd-db-durable-transactions attribute to "off". However, this is somewhat risky because if the worst should happen (e.g., an application or system crash) before the record of the write operation(s) gets committed, there is basically no limit to the amount of change information that could be lost. The integrity of the database itself will remain intact, but recent changes (possibly any updates since the last checkpoint) may be lost. This is not a very desirable behavior, and even though it can help performance a lot it is generally not seen as a viable option for use by customers.

Transaction batching serves as a middle ground between full durability and no durability. In particular, it gives you control over the exposure that you might have in the event of a failure. To enable transaction batching, provide a nonzero value for the nsslapd-db-transaction-batch-val configuration attribute, where the value of that attribute controls the number of changes that may be grouped together under a single fsync. We typically use a transaction batch value of 5, which provides a significant performance improvement at a relatively low risk. In this case, you're at risk of losing information from at most four changes if an unexpected outage should occur. However, this risk can be further limited by other factors. The decision about when to commit changes to disk is based primarily on the following factors:

  • The configured transaction batch value. If you've configured a batch value of N, then the server will call fsync whenever the Nth outstanding change is performed, meaning that at most N-1 changes may be at risk.
  • An internal timer that will commit any outstanding changes if it has been 1000 milliseconds since the last fsync.
  • The transaction log buffer size. This buffer is used to hold the set of outstanding changes before they are committed. If this buffer gets full, then the changes will be immediately flushed to disk.
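
Those three rules can be written down as a small decision function. This is only a model of the behavior described above, not the server's actual code, and the flush_due name and its argument order are invented for the illustration.

```sh
# Return success (0) when a flush would be due under the three rules above.
#   $1 outstanding changes      $2 configured batch value
#   $3 ms since the last fsync  $4 bytes currently buffered
#   $5 transaction log buffer size in bytes
flush_due() {
    [ "$1" -ge "$2" ] && return 0     # the Nth outstanding change arrived
    [ "$3" -ge 1000 ] && return 0     # the internal timer expired
    [ "$4" -ge "$5" ] && return 0     # the log buffer is full
    return 1
}

# Four outstanding changes with a batch value of 5, 200ms since the last
# fsync, and a mostly empty 512KB buffer: nothing forces a flush yet.
if flush_due 4 5 200 4096 524288; then
    echo "flush"
else
    echo "keep batching"
fi
```

With a batch value of 5, the fifth change, a one-second wait, or a full buffer each make the function return success, which is why at most four changes are ever exposed.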

Most of the customers that I've talked to about transaction batching are willing to accept this relatively small risk of change loss because of the large performance benefits that it can provide. However, if this risk is not acceptable in your environment, then you may still be able to benefit from this capability by enabling transaction batching only when you need a little extra boost and keeping it off the rest of the time to minimize exposure. To do that, change the transaction batch value from its default of zero to one. These values have the same semantic behavior (all changes will be immediately committed to disk before returning the result to the client), but the latter causes the server to use a different code path where it is possible to apply these changes on the fly without the need to restart, whereas a batch value of zero can only be changed by shutting down and restarting the Directory Server. Then, if you know that you have a lot of write operations that need to be performed and want the extra performance, you can use a tool like ldapmodify to make the configuration change over LDAP to temporarily enable batching, and then change it back to one when it is no longer needed.
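
To make that toggle repeatable, you could generate the LDIF from a small function. Treat this as a sketch under assumptions: batch_ldif is a made-up name, and the entry DN shown is an assumption about where this attribute lives in your release, so check your own configuration before using it.

```sh
# Print LDIF that sets the transaction batch value to $1.
# ASSUMPTION: the DN below is the entry that holds this attribute in
# your version of the server; verify it before applying the change.
batch_ldif() {
    case "$1" in
        ''|*[!0-9]*)
            echo "batch value must be a non-negative integer" >&2
            return 1
            ;;
    esac
    cat <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-transaction-batch-val
nsslapd-db-transaction-batch-val: $1
EOF
}

# Show the change that would enable a batch value of 5.
batch_ldif 5
```

You would then pipe the output into ldapmodify with your own connection details, for example "batch_ldif 5 | ldapmodify -h {serverAddress} -p {serverPort} -D "cn=Directory Manager" -w {password}", and run it again with a value of 1 when the bulk update is finished.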

Another quick note that I should make in this case is about the transaction log buffer size. As I mentioned above, the server will automatically commit any outstanding changes if the transaction log buffer gets full, so you will generally want to make this big enough to hold all the outstanding changes to prevent flushing too frequently. The transaction log buffer size is controlled by the nsslapd-db-logbuf-size attribute, and its value is specified in bytes. Earlier versions of the server used a default of 32KB, but as of 5.2 patch 3 and later, it now uses 512KB. This should be OK in most cases, but if you have particularly large entries then you may want to increase it even further. You can make the log buffer size up to 25% of the transaction log file size (which is 10MB by default, so that means a maximum transaction log buffer size around 2.5MB).
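
Since the attribute takes a value in bytes, it is worth doing the arithmetic once. The max_logbuf_bytes helper below is a hypothetical name used only to show the 25% rule with integer math.

```sh
# Largest log buffer allowed for a given transaction log file size
# (both in bytes): 25% of the file size, using integer arithmetic.
max_logbuf_bytes() {
    echo $(( $1 / 4 ))
}

# Default 10MB transaction log file.
max_logbuf_bytes $((10 * 1024 * 1024))   # prints 2621440
```

So with the default log file size, any value up to 2621440 fits under the limit, compared with 524288 for the current 512KB default.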

Posted by cn_equals_directory_manager ( Dec 07 2005, 12:38:24 AM CST )

20051206 Tuesday December 06, 2005

I've put it off long enough

Hi. I'm Neil Wilson and I work with Sun's LDAP Directory Server. My primary responsibility is writing code, but I spend a lot of time doing performance analysis and benchmarking, and also working with customers to help them with deployment advice or performance issues or whatever problems they might have. Despite the "cn=Directory Manager" moniker, I'm not in management, nor am I particularly interested in doing a lot of managerial things.

I've avoided registering for blogs.sun.com for quite a while now, but I've finally been guilted into it by the deluge of high-quality posts that have been showing up recently, particularly those around the recent release of ZFS and all the (entirely justified) buzz on the UltraSPARC T1 (aka Niagara) and the Sun Fire T1000 and T2000 systems that use it. Both of those technologies, by the way, are incredible complements to the Directory Server, and maybe I'll talk about one or both of them sometime soon. But for now, I'll just try to match the enthusiastic and informative nature of those posts.

If you've heard of me before (which isn't all that likely), then it may be in conjunction with my work on the SLAMD Distributed Load Generation Engine. SLAMD is a tool that we started developing a few years ago to help us benchmark Directory Server, although it is very flexible and also works quite well for all kinds of other applications like mail servers, databases, and Web applications. It is distributed in nature and allows you to harness the power of multiple client systems to drive load against your servers, and it can also help you measure system resources like CPU utilization, disk I/O and network load while your tests are running. We use it heavily within Sun for testing Directory Server, as well as a number of other products. It's been Open Source (under the Sun Public License) for a little over a year now and there are some pretty significant improvements in the works for the hopefully not-too-distant 2.0 release.

There's a lot of really exciting stuff coming up in the area of directory services in our upcoming versions, and I may get into that more as the releases draw near. But for at least the short term I'll probably focus on what you can do to help you get the most performance and scalability out of the server. I should of course provide the disclaimer that you should take any advice with a grain of salt because there are always special cases and exceptions to the rule. For best results, test changes thoroughly in a non-production environment before applying them on mission-critical servers.

Posted by cn_equals_directory_manager ( Dec 06 2005, 10:33:06 PM CST )

