|
Sendmail Died in a Two SIGALRM Fire
In honor of today's full release of
OpenSolaris
I thought I would retell the story of one of my favorite bugs that I've
debugged while working at Sun on Solaris. This problem is Sun bug
4278156 and was originally debugged during development of Solaris 8 during
the first week of October, 1999.
The investigation itself began with a single core file found on one of our
internal test servers from a sendmail daemon that was running there.
At the time, we were testing out the use of a new Solaris facility
called coreadm(1M) that Roger Faulkner and I designed. If you
haven't used it before, coreadm permits administrators and developers
to create different core file patterns (i.e. other than ./core) to
be used when creating core files, and to create a global core file repository
where a copy of all core dumps will appear on a system. This is an essential
valuable part of our strategy of first-fault debugging at Sun,
how we try to ensure a single failure can be debugged to root-cause.
Aside from illustrating how we do this with a good story which ultimately
benefited everyone using Sendmail, I'll also go into some details about:
- How we debug core files at a low-level with mdb
- How signal handling works between the kernel and user processes
- How sometimes one bug helps you find another
The problem began simply enough: a single core file from sendmail
sitting in our global core file directory /var/core on a machine
running build 31 of Solaris 8. The stack trace was as follows:
$ mdb core
> $c
sfgets+0x80(0, 0, 0, 0, f0878, d1800)
That is, the stack was truncated at a single frame in sfgets().
Before launching into a detailed analysis of how exactly sendmail died, here
are a few pieces of background information to help understand the analysis:
- Our sendmail is configured (in conf.h) such that in sendmail's source
code, jmp_bufs are #defined to be sigjmp_bufs, setjmp() is #defined to
be sigsetjmp(env, 1), and longjmp() is #defined to be siglongjmp().
This is important to keep in mind when reading the sendmail code snippets.
- This bug report presumes a reasonably detailed knowledge of how signal
delivery (in this case on SPARC 32-bit) is implemented by the kernel
and libc. To understand how this works and how a debugger can locate
signal frames on the stack in Solaris, refer to my comments in libproc
which are found
here.
- To debug the problem, I wrote several new mdb dcmds and walkers.
It's always worth the effort to do so: as part of this debugging effort,
we not only got sendmail fixed, we also shipped these new debugging
features, thereby benefitting all future development efforts. You can
learn more about mdb by reading the
Solaris Modular Debugger Guide. I'll go into further details about
mdb and how to write commands for it in a future blog.
By looking at the core file I was able to observe the immediate cause of death:
the process died from a SIGBUS at sfgets+0x80 because it
attempted to store to %fp - 0xc, and somehow its frame pointer has
been corrupted (it was set to 0x1, a clearly bogus value):
> $c
sfgets+0x80(0, 0, 0, 0, f0878, d1800)
> <pc/i
sfgets+0x80: st %o0, [%fp - 0xc]
> <fp=K
1
Specifically, sendmail was attempting to store the pointer to the string
"local" to the stack for a call to sm_syslog(), as show below:
char *
sfgets(buf, siz, fp, timeout, during)
{
/* ... */
if (timeout != 0)
{
if (setjmp(CtxReadTimeout) != 0)
{
if (LogLevel > 1)
sm_syslog(LOG_NOTICE, CurEnv->e_id,
"timeout waiting for input from %.100s during %s",
CurHostName ? CurHostName : "local",
during);
/* ... */
}
You can observe that it was executing code following a read timeout, which
caused a siglongjmp back to the CtxReadTimeout sigjmp_buf, and caused the
setjmp (remember, really a sigsetjmp) to return non-zero. Somehow when sendmail
returned from an interrupt (a SIGALRM used to timeout sfgets()), the fram
e
pointer ended up as 0x1. The invalid frame pointer is also the reason why the
stack backtrace only shows the single sfgets() frame. The readtimeout()
routine used as a SIGALRM callback is quite simple, and I disassembled
it to determine the address of CtxReadTimeout (since the binary was stripped).
Here's the source code for readtimeout():
static void
readtimeout(time_t timeout)
{
longjmp(CtxReadTimeout, 1);
}
mdb permits developers to add a custom symbol missing from a stripped
symbol table to a private symbol table so you can use its
name for later debugger queries. So I added a symbol for CtxReadTimeout
at 0xcc800+304 with size 76 bytes. Once a symbol is added, the
address-to-symbol conversions elsewhere in the debugger can reference it.
> 0xcc800+304::nmadd -s 0t76 CtxReadTimeout
added CtxReadTimeout, value=ccb04 size=4c
> ::nm -P
Value Size Type Bind Other Shndx Name
0x000ccb04|0x0000004c|NOTY |GLOB |0x0 |UNDEF |CtxReadTimeout
In order to confirm that sendmail came through readtimeout(), I decided to
verify this hypothesis in two ways: (1) by confirming that the current
%sp matched the %sp saved inside of CtxReadTimeout, and
(2) by confirming that %o7 was still set to somewhere inside of
siglongjmp(). Here is the annotated debugger output:
> CtxReadTimeout/3p
CtxReadTimeout:
CtxReadTimeout: 1 0xffbe8630 sfgets+0x50
^ flags ^ saved %sp ^ saved %pc <--[ for reference ]
> <sp=K
ffbe8630 <-- same as the saved %sp above
> <o7=p
libc.so.1`_libc_siglongjmp+0x1c
With this confirmed, my next move was to figure out what the %sp was
before we made the siglongjmp of death, so I could figure out what the
stack looked like before sendmail died. To do this, I wrote a new mdb
walker called oldcontext to produce each lwp's
lwpstatus_t.pr_oldcontext value, which is the address of the last
ucontext_t saved on the stack by a signal handler. Again, to learn about
how signal frames are saved on a stack, look
here
in the Solaris source code.
> ::walk oldcontext
ffbe82c0
Using the knowledge of how signal frames are set up, I knew that
prior to sendmail's death, its stack basically looked like this:
ucontext_t \__ signal ^ high addresses
struct frame / frame |
struct frame <-- frame saved by libc`sigacthandler |
struct frame <-- frame saved by siglongjmp v low addresses
In 32-bit SPARC Solaris, sizeof (struct frame) = 0x60, so the old
stack pointer should have been located at
the address ffbe82c0 - (60 * 3) = ffbe81a0. And sure enough:
> 0xffbe81a0$C
00000000 0(0, 0, 0, 0, 0, 0)
ffbe81a0 0(d0800, 654c6, 6386c, 0, 0, 0)
ffbe8200 libc.so.1`sigacthandler+0x28(e, 0, ffbe82c0, cfb60, 21858, 24cf4)
ffbe8578 libc.so.1`_wait+0xc(654d7, f07c8, d00c8, f07c8, 654d7, 0)
ffbe85e0 endmailer+0x7c(188c6c, d0ae0, 0, d0ae0, a9c08, 5)
ffbe8640 smtpquit+0x88(f0878, d1800, 1, d0ae0, 188c6c, 54)
ffbe86a0 reply+0x3fc(cf400, cfc00, cfc00, 1, 0, 188c6c)
ffbe9700 smtpquit+0x64(f0878, d1800, 0, d0ae0, 188c6c, 78)
ffbe9760 mci_flush+0x58(2, 4, cfb28, 0, 1, 0)
ffbe97c0 finis+0xdc(cec00, cfabe, ffbe98e8, ff1ba000, 0, 0)
ffbe9828 libc.so.1`sigacthandler+0x28(f, ffbe9ba0, ffbe98e8, ff1ba000, 0, 0)
ffbe9c20 libc.so.1`sigacthandler+0x18(1, ffbe9f98, ffbe9ce0, 0, 21d2c, 245e0)
ffbea018 libc.so.1`_read+0xc(cfba0, 18a0f4, cfba0, 0, 213a8, 21b94)
ffbea078 libc.so.1`fgets+0x24c(ff1c164c, 0, cfba0, ff1ba000, 18a0f4, 7ff)
ffbea0d8 sfgets+0x164(d0000, cfba0, cfba0, e10, c44a0, a9b8f)
ffbea148 reply+0xbc(cf400, 1, ce0d9, d019c, 0, 188c6c)
ffbeb1a8 smtpgetstat+0x20(f0878, 188c6c, cf5dc, 162, 188c6c, cf5dc)
ffbeb208 deliver+0x2490(c1de0, 1000, 130208, ffbecb3c, c1cf0, d00c8)
ffbecff8 sendenvelope+0x1e0(cf5dc, 69, d98e8, a03b0, cfb60, 0)
ffbed170 sendall+0xddc(cf5dc, cfac9, ce6d0, 0, 0, 0)
ffbed1d8 dowork+0x200(ce400, 6d, 0, cfae4, 1, d98e8)
ffbed238 smtp+0x1a14(1, d98e8, 0, f1b70, cf4f8, ff00)
ffbee338 main+0x2ffc(cec00, cfc00, d0ae0, cf4f8, ce400, cfc00)
ffbefd68 _start+0xb8(0, 0, 0, 0, 0, 0)
Notice that there are 3 (!) observed instances of libc's
sigacthandler()
function on the stack, with signals 0x1 (HUP), 0xf (TERM),
and 0xe (ALRM) pending respectively. To walk backward through the list
of signals that should have been delivered, I wrote another mdb
command: a ucontext walker which follows the uc_link chain.
Here is the list of signal ucontext_t's I found by following the
ucontext_t.uc_link field and my notes on each:
> ::walk oldcontext | ::walk ucontext
ffbe82c0 <-- SIGALRM that caused siglongjmp of death [1]
ffbe82b8 <-- ? [2]
ffbe98e8 <-- siginfo ffbe9ba0 = SIGTERM from PID 424186 [3]
ffbe9ce0 <-- siginfo ffbe9f98 = SIGHUP from PID 424186 [4]
I've annotated each ucontext_t address with what I knew so far, and a bit
more. I knew that ucontext 1 was the most recent, put there by the
SIGALRM that caused sendmail's death. Looking back on the stack, I
saw that there were siginfo_t's (the 2nd parameter to sigacthandler())
corresponding to the third and fourth ucontext_t's. I then decoded the contents
of each one and determined that they were externally generated
SIGTERM and SIGHUP signals respectively, both from process 424186.
Interestingly, the SIGTERM was received at sigacthandler+0x18;
that is, while we were still in libc setting up to handle the SIGHUP,
but before we called sendmail's SIGHUP handler, we took a SIGTERM.
Finally, the second ucontext_t was a mystery to me at this point: it didn't
have a corresponding sigacthandler() frame on the stack.
In investigating this oddity, I discovered an interesting (and useful) bug in
the Solaris SPARC siglongjmp implementation: normally, at the completion
of a signal handler routine, a setcontext(2) to the interrupted
context is performed. This setcontext() call effectively restores the
interrupted context's uc_link value to lwp_oldcontext in the
kernel, thus "deleting" the ucontext_t from the "list" of saved signal contexts.
However, at that time, Solaris did not store the uc_link obtained from
a getcontext(2) call in sigsetjmp() into the sigjmp_buf.
Furthermore, libc's siglongjmp() performed another
getcontext(), modified the resulting context, and then performed a
setcontext() after updating the %pc and %sp, all without
touching uc_link. Thus, at this time, Solaris SPARC callers of
siglongjmp from a signal handler never delete ucontexts from the
uc_link list, even when these signals are retired. (After the
sendmail problem had been fixed, a bug on this was also filed and fixed).
At first glance this uc_link bug seemed bad to me, until I considered
that it gave me a critical piece of data I wouldn't have had otherwise: I knew
that a signal was delivered and retired after the SIGTERM but before the
SIGALRM that caused sendmail's death, and I knew that its ucontext_t
was saved at ffbe82b8 on the stack. So I was now ready to make a couple of
observations and a hypothesis:
-
The initial signal (SIGHUP) was delivered while sendmail
was blocked inside of read(2), as called from sfgets(),
and thus the sfgets() SIGALRM had not yet fired. I also
knew that much later, after sendmail decided to quit
because of the subsequent SIGTERM, that reply()
called smtpquit():
...
ffbe8640 smtpquit+0x88(f0878, d1800, 1, d0ae0, 188c6c, 54)
ffbe86a0 reply+0x3fc(cf400, cfc00, cfc00, 1, 0, 188c6c)
ffbe9700 smtpquit+0x64(f0878, d1800, 0, d0ae0, 188c6c, 78)
...
Notice, using the saved frame pointers, that the mystery ucontext_t pointer
(ffbe82b8) was placed on the stack prior to reply()'s call to smtpquit().
I also knew (from groveling inside of CtxReadTimeout) that the signal
mask (sigset_t) restored by our siglongjmp of death was:
0x00004001 0x00000000 0x00000000 0x00000000
which is the mask consisting of SIGTERM and SIGHUP blocked.
This told me that SIGTERM and SIGHUP had not yet completed
processing (as I had guessed from the stack trace), but that SIGALRM
was unblocked (that is, its processing was done). I also knew from
sendmail's source code that reply() itself calls sfgets() before
calling smtpquit(), thus installing another SIGALRM event.
-
I now hypothesized that the mystery ucontext_t represented another
SIGALRM, namely the SIGALRM corresponding to the first
sfgets(), the one still pending on the stack. This SIGALRM was
handled as if it was the timeout for the second sfgets (called
from reply(), no longer on the stack) because sfgets always uses (and
thus overwrites) the same global sigjmp_buf, and caused reply() to abort
its sfgets() and call back to smtpquit().
At this point, a second SIGALRM (the one corresponding to reply()'s
sfgets() call) was delivered, and this caused a second siglongjmp()
back to the same stack location, a stack location we had now re-used and
overwritten with local variable information. I then attempted to
prove this hypothesis using sendmail's event queue and free event list.
When sendmail wishes to set up an sfgets() timeout, it uses an internal
setevent() function that keeps a linked list of pending event structures.
alarm() is then used to install a SIGALRM event for the time of the next
event that requires a timeout. The event structure looked like this:
struct event {
time_t ev_time; /* time of the function call */
void (*ev_func)__P((int)); /* function to call */
int ev_arg; /* argument to ev_func */
int ev_pid; /* pid that set this event */
struct event *ev_link; /* link to next item */
};
At the time of death, the EventQueue list was empty:
> EventQueue/K
0xd09ec: 0
but luckily, retired events were not immediately freed; instead, they were
placed on the head of the FreeEventList (linked again using
ev_link). Dumping out this list showed
that two events had been retired:
> FreeEventList/K
0xcf4ec: f1cd0
> f1cd0/YpnXDX
0xf1cd0: 1999 Sep 15 15:04:10 readtimeout
0 414918 f1b50
> f1b50/YpnXDX
0xf1b50: 1999 Sep 15 15:02:20 readtimeout
0 414918 0
Notice that (1) they both were set up to call readtimeout(), the aforementioned
SIGALRM handler, and (2) sendmail conveniently records the time_t of
expiration, which we have formatted using /Y in the output above.
The code for reply() leading up to smtpquit() looks like this (trimmed for
simplicity):
int
reply(m, mci, e, timeout, pfunc)
{
/* ... */
p = sfgets(bufp, MAXLINE, mci->mci_in, timeout, SmtpPhase);
mci->mci_lastuse = curtime();
/* ... */
if (p == NULL)
{
if (errno == 0)
errno = ECONNRESET;
mci->mci_errno = errno;
mci->mci_state = MCIS_ERROR;
smtpquit(m, mci, e);
}
/* ... */
}
Notice that when sfgets() returns, sendmail records the current time in an mci
structure. Then if p is NULL, indicating sfgets() failure, it also records
errno, and calls smtpquit(). Again from the stack, I was able retrieve the
pointer to the struct mci in question and dump it out:
> 188c6c/6xnXXnppDnXXXXXnYp
0x188c6c: 2048 83 0 0 0 0
6c 0
0 0 414935
c44a0 f0878 130920
0 0
1999 Sep 15 15:02:20 0
If you haven't debugged enough core files and crash dumps to have format
characters wired into your brain and fingers, use ::formats in
mdb to list them out. In the output above I observed two things:
(1) the 0x83 at the top is mci_errno, which is decimal 131 which is
ECONNRESET, and (2) the formatted time_t at the
bottom (mci_lastuse) exactly matches the expiry time of the first
SIGALRM event structure (0xf1b50 shown above). (1) told me
that errno was zero when we came back from sfgets() with p == NULL, which
means sendmail timed out, as opposed to a read() error. (2) told me
that indeed the earlier SIGALRM event (last on the
FreeEventList) triggered the later siglongjmp back to sfgets(),
confirming my hypothesis.
So the complete chain of events leading to the core dump was:
- sfgets() was pending, SIGHUP interrupted
- SIGHUP interrupted by SIGTERM, causing a call to finis()
- reply() called sfgets(), overwriting global CtxReadTimeout with new
sigsetjmp
- sfgets() was pending, SIGALRM from earlier sfgets() interrup
ted
- readtimeout() called siglongjmp(), sfgets() returned to reply()
instead of to caller of earlier sfgets()
- reply() called into smtpquit(), overwriting the portion of stack used by
second sfgets()'s saved register window (including %fp) and pointed
to by
CtxReadTimeout's saved %sp value
- SIGALRM from second sfgets() now delivered, causing another
siglongjmp() to the same CtxReadTimeout sigjmp_buf values
- %sp was restored to ffbe8630, but now this pointed to more recent s
tack
data instead of a saved register window, so %fp was restored to 0x1
- sfgets() now managed to run along from sfgets+0x50 to sfgets+0x80, the first
instruction following return from the sigsetjmp call at sfgets+0x48 that
references %fp, and then died from a SIGBUS trying to perform a stor
e.
Essentially the bug illustrated a pretty serious design flaw in sendmail's
timeout mechanism -- with an sfgets() timeout pending, it responded to a
SIGTERM and begin to execute a long sequence of cleanup code with
other calls back into sfgets(). Since every sfgets() timeout re-used
the same global sigjmp_buf, this buffer was then overwritten and the earlier
timeout triggers a logical return to the wrong stack location. The sendmail
code needed to potentially clear previously pending timeouts once it decided
it was on a one-way path to quit (or block HUP and TERM while
waiting on read in sfgets()), and perhaps the setevent() mechanism needed
to have a per-event sigjmp_buf. In any case, the designers of this code
needed to rethink the mechanism based on the evidence uncovered here. And
they did just that: fixing the problem in the 8.9.3 release.
Looking back,
this problem is a great illustration of how much can be discerned from a
single core dump, whether it be from the kernel or a complex user application,
and the value in investing the time in root-causing every failure,
even those that happen once. Without coreadm this core file would
likely never have been seen. And without root-causing the first time this
fault was seen, this type of subtle signal condition would likely have gone
unreproducible and undebugged for months or even years, until biting
one of our customers. Sun's investment in a first-fault debugging strategy
helps to ensure that Solaris and other OpenSource software we leverage like
Sendmail are of the highest possible quality. As we move into the
OpenSolaris era, I know we'll benefit
from the debugging skills and brains of those who join the effort, and I hope
that our ideas, tools, and techniques for debugging will spread to other Open
efforts where developers believe in and value a high-quality product.
$q
Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: MDB
( Jun 10 2005, 06:15:34 PM PDT / Jun 14 2005, 08:00:00 AM PDT )
Permalink
Trackback: http://blogs.sun.com/mws/entry/sendmail_died_in_a_two
|