Monday January 23, 2006 | cn=Directory Manager All about Directory Server |
Making the Directory Server Dump Core

In general, the Directory Server has proven to be quite reliable. We've had customers report the server running in a production environment for over a year between crashes or restarts. While it's generally a good idea to apply upgrades and patches as they come out, hopefully there's no other reason to have to interrupt its operation. But from time to time a customer will run into a bug that could cause the server to crash. Nobody wants this to happen, but even worse than having the server crash due to a previously undiscovered bug would be for it to crash multiple times before the cause can be identified and resolved.

On UNIX systems, a process that crashes can leave behind a core file. A core file is essentially a snapshot of all the memory associated with the process at the time of its demise. Using tools like pstack, dbx, or mdb, engineers can look into the core file and use the information it provides to determine the cause of the crash. Of course, this will only be helpful if a core file is actually generated, and there are a few reasons that this may not happen. It's in everyone's best interests to ensure that those reasons are addressed and resolved so that if a crash does occur it will provide enough information to give us a chance to figure out what went wrong.

The first thing that might prevent the server from dumping core is that the process can't actually write the file. This can happen if either the process doesn't have write permission to the target directory, or if there isn't enough disk space available. The first of these used to be a problem with early versions of the 4.x server (it would try to place the core file in the filesystem's root directory), but all recent versions will attempt to write it into the same directory as the error log file. To address the second concern, you should make sure that there is enough disk space available to hold the core file.
The core file size will be basically equal to the resident size of the Directory Server process, which you can find with the command:
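One way to get that number, as a sketch: the invocation below assumes a ps that supports the rss and vsz output fields (Solaris and Linux both do), and proc_mem_kb is a hypothetical helper name. It prints the resident and virtual sizes in kilobytes. The virtual size is the more conservative figure to budget for, since a core file can include memory that isn't resident at the moment of the crash.

```shell
# Print "<resident KB> <virtual KB>" for the given process ID.
proc_mem_kb() {
    ps -o rss= -o vsz= -p "$1"
}

# Demonstrated here on the current shell; for the Directory Server,
# pass the PID of the ns-slapd process instead.
proc_mem_kb $$
```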
On Solaris systems, if you don't have enough space there and don't want to move your error log to another location, or if you just want the core file written somewhere else, then you can change the location using the coreadm command, which is discussed in more detail later in this post.

Another reason that a core file might not be written is a resource limit that constrains the core file size. This is particularly common on many Linux systems, which for some reason seem to consider core files undesirable. I suppose that might make sense if you're resigned to the idea that something will crash from time to time and you'll just restart it without any investigation, but it's a bad thing if you actually care about figuring out why the crash occurred. To determine whether your system will prevent core files from being generated, issue the command:
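A minimal sketch of that check, assuming the shell's standard ulimit builtin (the core_limit wrapper is only a hypothetical name for illustration). Run it as the user, and from the kind of shell, that will start the server:

```shell
# Report the soft limit on core file size for the current shell.
# Prints "unlimited" or a block count (0 means no core files at all).
core_limit() {
    ulimit -c
}

core_limit
```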
If the output of this command is anything other than "unlimited", then that value is the maximum allowed size of a core file in 512-byte blocks (some shells, such as bash, report the limit in 1024-byte units instead). You can fix that by updating the start-slapd script so that the following appears near the beginning:
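As a sketch, assuming start-slapd is a Bourne-style shell script: the essential line is "ulimit -c unlimited". The fallback and the raise_core_limit function name are my own additions for the case where a hard limit prevents an unprivileged user from going all the way to unlimited.

```shell
# Lift the core file size limit for this script and everything it starts.
# If the hard limit forbids "unlimited", fall back to the highest value allowed.
raise_core_limit() {
    ulimit -c unlimited 2>/dev/null || ulimit -S -c "$(ulimit -H -c)"
}

raise_core_limit
```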
Another very common reason that the Directory Server may not be allowed to dump core if it crashes is that it often runs as a setuid process. If you want the server to listen on a privileged port (e.g., 389 and/or 636) and you don't have the ability to use process rights management, then you will need to start it as the root user. For security reasons, if you do this then the server will later call setuid to drop root privileges and run as a different user. However, once that happens the application will not be allowed to dump core, because the core file would be written as the unprivileged user but may still contain sensitive information from memory that was obtained while it was running as root (this isn't true for the Directory Server but may be true for other applications that operate in this manner).

Solaris provides a way of circumventing this problem through the coreadm(1M) command. This command provides all kinds of useful capabilities for controlling the way that core files are handled, and one of its options can be used to specifically allow setuid/seteuid/setgid processes to dump core. For security reasons, the core files will only be readable by the root user, but this is a minor inconvenience when compared with not being able to get a core file at all. To enable set*id processes to dump core, issue the following commands:
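A sketch of those commands, based on the coreadm(1M) options "-e proc-setid" and "-u". They must be run as root on Solaris; the wrapper function name is hypothetical and is there only so the two steps stay together.

```shell
# Solaris only; run as root.
enable_setid_cores() {
    # Allow set*id processes to dump core using the per-process core pattern.
    coreadm -e proc-setid
    # Bring the running settings and /etc/coreadm.conf into agreement.
    coreadm -u
}

# Usage:
#   enable_setid_cores
```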
This is a global setting and only needs to be done once per system (the "coreadm -u" command updates a configuration file so that the setting will be applied automatically when the system boots).
As I mentioned above, the coreadm command has all kinds of features for tweaking core file generation, and I won't get into all of them here, but you can read the man page for all the details. However, I will address one more important option that you may want to consider using. By default, if the Directory Server crashes and dumps core, then it will create a file named "core" in the same directory as the error log file. If there's already a previous core file there, then it will be overwritten and any useful information it may have contained will be lost (this assumes that the existing file had permissions such that it could be overwritten; if not, then the old core file would be kept and the details of the new crash would be lost). This is generally considered a bad thing, so you probably want to avoid it. Further, depending on how much disk space you have available on that volume, you may not want to allow writing a core file in the logs directory but would rather have it appear somewhere else. Both of these can be accomplished on a per-process level using the "-p" option to coreadm. For example, when I'm setting up the Directory Server on Solaris, I always add the following near the beginning of the start-slapd script:
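A sketch of that line, assuming the "core.%f.%p" pattern (%f is the executable name, %p is the process ID) and a Bourne-style start-slapd script. The set_core_pattern function is a hypothetical wrapper; in the script itself the single coreadm line is all that's needed.

```shell
# Solaris only. Apply a per-process core file name pattern to this shell;
# processes started from it afterwards inherit the setting.
set_core_pattern() {
    coreadm -p "$1" $$
}

# Usage, near the top of start-slapd:
#   set_core_pattern core.%f.%p
```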
This will cause core files generated by the Directory Server to contain both the executable name and the process ID (strictly speaking, "$$" is the PID of the current process, which is the start-slapd script itself, but the setting is inherited by child processes). So if the Directory Server was running with PID 1234, then the name of the core file generated if it crashes would be "core.ns-slapd.1234". The value of the -p argument can also contain path information, so if you wanted core files to appear in the "/export/cores" directory instead of the server logs directory, then you would use a pattern like "/export/cores/core.%f.%p" instead. Just make sure that the target directory has enough space to hold the core file and that permissions are configured correctly to allow the server to dump core.
Once you've got your system configured properly, you can test it by trying to make the Directory Server crash and dump core. Hopefully you won't come across any cases in which the server does this accidentally, but you can intentionally cause it to happen by issuing the command:
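A minimal sketch, assuming the standard kill utility and its SEGV signal name. The send_segv wrapper is a hypothetical name, and the PID to pass is that of the running ns-slapd process.

```shell
# Send SIGSEGV (signal 11) to the given process ID.
send_segv() {
    kill -s SEGV "$1"
}

# Usage (never against a production server that is handling requests):
#   send_segv <pid-of-ns-slapd>
```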
This will send a signal to the Directory Server that is identical to what it would have received from the OS if it had been guilty of a segmentation fault. If everything is set up properly, then the server should immediately stop and dump core. Obviously, this is not something you want to try on a production system that is actively handling requests, but it is a good practice to follow when you're first setting up the server. That way, if you do encounter a crash at some point in the future, it will at least leave behind a core file that can be used to help figure out what went wrong so it doesn't need to happen again.

Posted by cn_equals_directory_manager (Jan 23 2006, 07:59:12 AM CST)