Reflections on OS integration Eric Schrock's Weblog
Musings about Fishworks, Operating Systems, and the software that runs on them.

Sunday Jun 27, 2004

One of the most powerful but least understood aspects of the Solaris /proc implementation is what's known as the 'agent LWP'. The agent is a special thread that can be created on demand by external processes. There are a few limitations: only one agent thread can exist in a process, the process must be fully stopped, and the agent cannot perform fork(2) or exec(2) calls. See proc(4) for all the gory details. So what's its purpose?

Consider the pfiles command. It's pretty easy to get the number of file descriptors for the process, and it's pretty easy to get their path information (in Solaris 10). But there's a lot of information there that can only be found through stat(2), fcntl(2), or getsockopt(3SOCKET). In this situation, we generally have three choices:

  1. Create a new system call. System calls are fast, but there aren't many of them, and they're generally reserved for something reasonably important. Not to mention the duplicated code and hairy locking problems.
  2. Expose the necessary information through /proc. This is marginally better than the above. We still have to write a good chunk of kernel code, but we don't have to dedicate a system call for it. On the other hand, we have to expose a lot of information through /proc, which means it's a public interface and will have to be supported for eternity. This is only done when we believe the information is useful to developers at large.
  3. Using the agent LWP, execute the necessary system call in the context of the controlled process.
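
To see why the third option needs to run inside the target at all, note that calls like fstat(2) and fcntl(2) take a descriptor number that only means something in the calling process's own descriptor table. The following is a minimal sketch in portable POSIX C of the kind of per-descriptor query involved; `struct fd_info` and `describe_fd` are hypothetical names made up for illustration and are not taken from pfiles or libproc.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of what a pfiles-like tool might want per descriptor. */
struct fd_info {
    mode_t mode;   /* file type and permission bits, from fstat(2) */
    int    flags;  /* open flags, from fcntl(F_GETFL) */
};

/*
 * Query a descriptor in the CALLING process. Returns 0 on success,
 * or -1 if fd is not open here. The same number in another process
 * may name a different file, or nothing at all.
 */
int describe_fd(int fd, struct fd_info *out)
{
    struct stat st;
    int flags;

    if (fstat(fd, &st) == -1)
        return -1;
    if ((flags = fcntl(fd, F_GETFL)) == -1)
        return -1;

    out->mode = st.st_mode;
    out->flags = flags;
    return 0;
}
```

Called from an ordinary program, this only ever describes that program's own files. Getting the same answers about another process means having that process make the calls, which is the job the agent LWP does.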

For debugging utilities and various tools that are not performance critical, we typically opt for the third option above. Using the agent LWP and a borrowed stack, we do the following: First, we reserve enough stack space for all the arguments and throw in the necessary syscall instructions. We use the trace facilities of /proc to set the process running and wait for it to hit the syscall entry point. We then copy in our arguments, and wait for it to hit the syscall exit point. We then extract any altered values that we may need, clean up after ourselves, and get the return value of the system call.
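
The ordering of those steps, along with the two preconditions mentioned earlier (the process must be stopped, and only one agent may exist), can be sketched as a toy state machine. This is a model of the sequence only: it makes no /proc calls, and every name in it (`struct target`, `agent_create`, `agent_advance`, the `phase` values) is hypothetical rather than taken from proc(4) or libproc.

```c
#include <stdbool.h>

/* Phases of one agent system call, in the order described above. */
enum phase {
    NO_AGENT,          /* no agent LWP exists */
    STACK_PREPARED,    /* stack space reserved, syscall instructions placed */
    AT_SYSCALL_ENTRY,  /* process ran and stopped at syscall entry */
    ARGS_COPIED,       /* arguments copied into the target */
    AT_SYSCALL_EXIT,   /* process ran and stopped at syscall exit */
    RESULTS_READ       /* altered values and return value extracted */
};

/* Toy stand-in for a controlled process. */
struct target {
    bool       stopped;  /* is the whole process stopped? */
    enum phase phase;    /* where the (single) agent currently is */
};

/* Create the agent: only legal on a stopped process with no agent yet. */
int agent_create(struct target *t)
{
    if (!t->stopped || t->phase != NO_AGENT)
        return -1;
    t->phase = STACK_PREPARED;
    return 0;
}

/* Move to the next phase; skipping or repeating a step is rejected. */
int agent_advance(struct target *t, enum phase next)
{
    if (t->phase == NO_AGENT || (int)next != (int)t->phase + 1)
        return -1;
    t->phase = next;
    return 0;
}

/* Clean up: only after the results have been read back. */
int agent_destroy(struct target *t)
{
    if (t->phase != RESULTS_READ)
        return -1;
    t->phase = NO_AGENT;
    return 0;
}
```

A real implementation has far more to worry about than ordering, but the model shows why the steps cannot be reshuffled: arguments can only be copied in once the agent is sitting at syscall entry, and results only exist once it has reached syscall exit.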

If all of this sounds complicated, it's because it is. When you throw everything into the mix, it's about 450 lines of code to perform a basic system call, with many subtle factors to consider. To make our lives easier, we have created libproc, which includes a generic function to make any system call. libproc is extremely powerful, and provides many useful functions for dealing with ELF files and the often confusing semantics of /proc. Things like stepping over breakpoints and watchpoints can be extremely tricky when using the raw proc(4) interfaces. Unfortunately, the libproc APIs are private to Sun. Hopefully, one of my first tasks after the Solaris 10 crunch will be to clean up this library and present it to the world.

There are those among us who have other nefarious plans for the agent LWP. It's a powerful tool that I find interesting (and sometimes scary). Hopefully we can make it more accessible in the near future.