Solaris tip of the week: debug native code with disassembler
I recently had to track down a segv in some JNI code from a core file. I'm not sure whether there is an easier way to do this, but here's what I came up with:
The program crashes with a segv, and the error is reported in the following routine:
Java_com_sun_zfsGet+0x18d
I use the Solaris 'dis' command to disassemble the JNI library:
# dis /usr/lib/libtest.so
Java_com_sun__zfsGet+0x184: 83 c4 10 addl $0x10,%esp
Java_com_sun__zfsGet+0x187: 83 ec 08 subl $0x8,%esp
Java_com_sun__zfsGet+0x18a: 8b 45 08 movl 0x8(%ebp),%eax
Java_com_sun__zfsGet+0x18d: 8b 00 movl (%eax),%eax
Java_com_sun__zfsGet+0x18f: 68 01 00 00 00 pushl $0x1
Java_com_sun__zfsGet+0x194: ff 75 08 pushl 0x8(%ebp)
But it's not obvious to which line of C code this segv corresponds ...
I decided to use nanosleep() to place markers in the code, and recompile. For example:
struct timespec ns;
ns.tv_sec = 0;
ns.tv_nsec = 1;
nanosleep(&ns,(struct timespec *)NULL);
/* a bit of code */
ns.tv_nsec = 2;
nanosleep(&ns,(struct timespec *)NULL);
/* more code */
ns.tv_nsec = 3;
nanosleep(&ns,(struct timespec *)NULL);
/* etc, etc */
Now when I recompile, I have an identifiable marker in the disassembled code that I use to match up with my source code.
The new 'dis' output now contains:
movl $0x1,-0xc(%ebp) /* This is the value of ns.tv_nsec */
subl $0x8,%esp
pushl $0x0
leal -0x10(%ebp),%eax
pushl %eax
call -0x4 <Java_com_sun_zfsGet+0x1e>
for each call to nanosleep().
By incrementing the number of nsec by one with each call, I can now easily track down the bug in my code.
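The marker pattern can be wrapped in a small macro so that each marker takes one line. This is only a sketch: MARK and sum_with_markers are names made up for illustration, not part of the original library, and it assumes a POSIX nanosleep() is available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* ask for the nanosleep() declaration */
#endif
#include <assert.h>
#include <time.h>

/*
 * MARK(n) is a hypothetical helper: it stores the constant n in tv_nsec
 * and calls nanosleep(), so n appears as an immediate operand right
 * before a recognizable call in the disassembly.
 */
#define MARK(n)                                  \
    do {                                         \
        struct timespec mark_ts_;                \
        mark_ts_.tv_sec = 0;                     \
        mark_ts_.tv_nsec = (n);                  \
        (void) nanosleep(&mark_ts_, NULL);       \
    } while (0)

/* Hypothetical stand-in for the routine being debugged. */
int sum_with_markers(const int *values, int count)
{
    int total = 0;
    int i;

    assert(count >= 0);

    MARK(1);
    if (values == NULL)
        return -1;

    MARK(2);
    for (i = 0; i < count; i++)
        total += values[i];

    MARK(3);
    return total;
}
```

Compiled without optimization, each MARK(n) should show up in the 'dis' output as a store of the constant n followed by a call to nanosleep(), which is the pattern shown above. The markers cost a short sleep each, so they should come out again once the bug is found.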
With this instrumentation, I see that a segv occurring at Java_com_sun__zfsGet+0x18d corresponds to the
code following my nanosleep() marker with an ns.tv_nsec value of 6.
Java_com_sun_zfsGet+0x16f: c7 45 f4 06 00 00 movl $0x6,-0xc(%ebp)
Java_com_sun_zfsGet+0x176: 83 ec 08 subl $0x8,%esp
Java_com_sun_zfsGet+0x179: 6a 00 pushl $0x0
Java_com_sun_zfsGet+0x17b: 8d 45 f0 leal -0x10(%ebp),%eax
Java_com_sun_zfsGet+0x17e: 50 pushl %eax
Java_com_sun_zfsGet+0x17f: e8 fc ff ff ff call -0x4 <Java_com_sun_zfsGet+0x180>
Java_com_sun_zfsGet+0x184: 83 c4 10 addl $0x10,%esp
Java_com_sun_zfsGet+0x187: 83 ec 08 subl $0x8,%esp
Java_com_sun_zfsGet+0x18a: 8b 45 08 movl 0x8(%ebp),%eax
Java_com_sun_zfsGet+0x18d: 8b 00 movl (%eax),%eax
Instead of changing your code, you can simply recompile it with the -S option.
The CC compiler then generates assembly code output, and you can easily
find which line of the C code caused the segv. For example:
! 1245 ! else if (foo() && (oldBar != newBar)) {
/* 0x00fc 1245 */ ldub [%i0+2913],%o1 ! volatile
/* 0x0100 */ cmp %o1,0
/* 0x0104 */ be,pn %icc,.L77000954
/* 0x0108 */ cmp %i3,%i4
/* 0x010c 1245 */ be,pn %icc,.L900004769
The segv occurred at offset 0xfc.
foo() was a simple getter. The compiler inlined it instead
of emitting a function call.
Posted by Majki on February 27, 2009 at 03:49 AM EST #