Buffer overflow, register window and register allocation.
I work on Sun's compiler, especially the SPARC code generator. An inevitable (sometimes boring, sometimes the most interesting) part of my job is to evaluate bugs and (of course) fix them if I can. But as any engineer working on complex software knows, more often than not a bug turns out to be a user error - in the compiler's case, that could mean the user's code has a bug. This is a story of one recent case of not-a-bug.
One of our largest ISVs filed a bug where their application received SIGSEGV when the program was compiled at -xO4 or above with our S1S8 compiler. The program worked just fine with WS6U2 at the same optimization level, so the customer naturally thought this was a compiler bug. I can't fault them for that, since they had experienced quite a few compiler bugs in the past.
Because the bug went away whenever the global register allocator was turned off, it was sent to me (since I was the author of the register allocator). This particular ISV's application was one of the most difficult to deal with, because this ISV, like most other large ISVs, does not allow their code to be shipped to us, so we have to rely on either their engineer or our support engineer working on their site.
Since there's always a possibility of a user error, running a tool like dbx's rtc or purify is one way to rule out some of the most common programming errors. Unfortunately, this application was too large and complex for dbx rtc or purify to handle correctly and produce a useful report.
The symptom was quite simple - the program got a SEGV, and at the time of the SEGV the stack trace showed that one pointer parameter had the upper 32 bits of its 64-bit value zeroed. So obviously the caller of the function was the first suspect. Upon manual inspection of the disassembly, the caller's code looked quite correct. It looked like the following:
add %fp,1xxx,%l0
...bunch code including many calls...
call problematic_func
mov %l0,%o0
In dbx, %l0 contained the correct value right after the add, but somehow the upper 32 bits of %l0 had been zeroed out by the time control reached the problematic call. Subsequent dbx printouts showed that %l0 changed after a call to a certain function.
Assuming the save and restore instructions are correctly placed, the only other way to modify %l0 is to change the register window save area. It just so happens that %l0 is the first entry in the register window save area. Since SPARC is big-endian, the upper 32 bits (the most significant half) are stored at the lower address. This all suggested that the function in question was overwriting the first 32 bits of the register window save area. This can happen, among other causes, if there's a buffer overflow on a local array. Because the compiler allocates stack space for local variables from the higher address to the lower address in the order of appearance in the source, the first variable is usually placed at the top, right below the current %fp (or the %sp of the caller). Of course, optimization can move things around and get rid of variables, and most scalar variables are allocated in registers, so there's no guarantee that the above rule holds.
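To make the byte-order argument concrete, here is a minimal sketch in portable C. It models one saved 64-bit register as an 8-byte slot written in big-endian order, then simulates a stray 32-bit store of zero to the lowest address of that slot. The helper names (store_be64, load_be64, clobber_first_word) are made up for this illustration and are not part of any compiler or ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 64-bit value into an 8-byte slot, most significant byte first,
   which is how a big-endian machine such as SPARC lays it out in memory. */
void store_be64(uint8_t slot[8], uint64_t value)
{
    for (size_t i = 0; i < 8; i++)
        slot[i] = (uint8_t)(value >> (56 - 8 * i));
}

/* Read the 8-byte slot back as a 64-bit value (big-endian). */
uint64_t load_be64(const uint8_t slot[8])
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value = (value << 8) | slot[i];
    return value;
}

/* Simulate a stray 32-bit store of zero to the lowest address of the slot
   and return what the register would hold once it is reloaded. */
uint64_t clobber_first_word(uint64_t saved)
{
    uint8_t slot[8];
    store_be64(slot, saved);
    memset(slot, 0, 4);          /* the first 4 bytes are the upper half */
    return load_be64(slot);
}
```

Under this model, zeroing the first four bytes leaves the lower 32 bits intact and clears the upper 32 bits, which matches the corrupted pointer seen in dbx.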
The preprocessed source code for the function in question looked like the following:
returntype func(something *ptr,...) {
    wchar_t a[81];
    wchar_t b[81];
    ...initialize b by calling some initfunc...
    for(i = 0;i < wslen(b);i++) {
        ...do some operation on b[i]...
    }
    b[i] = 0;
    ...more code...
}
The array "a" wasn't used in the function, so the compiler didn't bother to allocate it on the stack. Thus "b" was at the top of the stack. If b were to overflow, the window save area could be overwritten - i.e. since the valid indices of b are 0 through 80 and wchar_t is 32 bits wide here, b[81] = 0 would overwrite the upper 32 bits of the %l0 save area.
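The index arithmetic can be checked with a small model. The sketch below is not the customer's code: it assumes a 4-byte element, assumes the 81-element array ends flush against the 8-byte slot holding the saved %l0, and uses a flat byte buffer so the out-of-range store stays well defined in C. The names elem_offset and saved_l0_after_store are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELEM_SIZE  ((size_t)4)    /* assumed size of one wchar_t element */
#define ELEM_COUNT ((size_t)81)   /* wchar_t b[81] */
#define SLOT_SIZE  ((size_t)8)    /* one saved 64-bit register */
#define FRAME_SIZE (ELEM_SIZE * ELEM_COUNT + SLOT_SIZE)

/* Byte offset of b[index] from the start of b. */
size_t elem_offset(size_t index)
{
    return index * ELEM_SIZE;
}

/* Model frame: bytes [0, 324) hold b, bytes [324, 332) hold the saved %l0
   in big-endian order. Performs the equivalent of b[index] = 0 and returns
   the saved register value as it would be reloaded afterwards. */
uint64_t saved_l0_after_store(uint64_t saved, size_t index)
{
    uint8_t frame[FRAME_SIZE];
    size_t slot = elem_offset(ELEM_COUNT);   /* first byte past the array */
    size_t off = elem_offset(index);
    uint64_t reloaded = 0;

    assert(off + ELEM_SIZE <= FRAME_SIZE);   /* stay inside the model */

    memset(frame, 0xAA, sizeof frame);       /* arbitrary array contents */
    for (size_t i = 0; i < SLOT_SIZE; i++)
        frame[slot + i] = (uint8_t)(saved >> (56 - 8 * i));

    memset(frame + off, 0, ELEM_SIZE);       /* the b[index] = 0 store */

    for (size_t i = 0; i < SLOT_SIZE; i++)
        reloaded = (reloaded << 8) | frame[slot + i];
    return reloaded;
}
```

In this model a store to index 80 (the last valid element) leaves the saved value alone, while a store to index 81 lands at byte offset 324, the first word of the slot, and clears the upper half of the saved value.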
After hearing the above analysis, our support engineer looked at the code of the initfunc and found a bug there, as expected, and the bug report was closed as not a bug.
One may wonder why this code worked fine in the past. That's because %l0 wasn't live across that particular function call, so the corrupted value in the save area was never used. The moral of the story is that any slight change in register assignment can reveal a user error.
( Jul 16 2004, 03:44:06 PM PDT )