Greg Nakhimovsky recently told me about an interpose library that he wrote to track down small-sized allocations (which can cause heap bloat since the minimum allocation from libc's malloc is 8 bytes for 32-bit binaries and 16 bytes for 64-bit binaries). I suggested that a DTrace script could do much of the same work and offered the following:
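#!/usr/sbin/dtrace -s

/* Count malloc() calls that request 16 bytes or fewer. */
pid$target:libc:malloc:entry
/ arg0 <= 16 /
{
    /* Key on the two innermost user frames and the requested size. */
    @s[ustack(2), arg0] = count();
}

END
{
    /* Keep only the 10 most frequent (stack, size) pairs. */
    trunc(@s, 10);
}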
This tracks all calls to malloc() for 16 or fewer bytes and prints
out the 10 most frequently executed call sites. Passing the
argument "2" to ustack() keeps only the top two frames of each
stack (one for malloc() and one for the call site). Each entry in the
output ends with the requested size and the number of calls:
libc.so.1`malloc
a.out`epsilon+0x18
12 109485
libc.so.1`malloc
libtestd.so`delta+0x235
16 160086
libc.so.1`malloc
libtestc.so`chi+0x451
8 250510
Usually this kind of stack trace is sufficient because methods/functions
are often short and the thing I'm looking for is distinct. For example,
to find the first location in the previous listing, I'd just look near
the beginning of function epsilon() for a call to malloc().
However, sometimes the location isn't obvious because the method might be
quite large or there might be multiple call sites for the function in question.
In that case, what's the easiest way to convert function+offset into a source line location?
If the code is compiled with -g (either for use with dbx during development or with optimization for use with collect), I just use dbx to do the mapping. I could try to run the program under dbx up to the point that triggered the DTrace probe, but that's not always easy. An inelegant but often successful shortcut is to:
- Invoke the debugger on the binary.
$ dbx a.out
- Execute up to _start() (in order to load the shared libraries).
(dbx) stop in _start
(dbx) run
- Set the program counter to the function+offset address.
(dbx) assign $pc=epsilon+0x18
- Have the debugger print out the current location.
(dbx) where
=>[1] epsilon(sz = 0), line 1012 in "testprog.c"
(dbx) list +1
1012 int_p = (int*) malloc(3*sizeof(int));
Unfortunately, this doesn't work consistently. However, if I advance the PC by one machine instruction (using "stepi") before asking for the location, then it does. I admit that this is not a particularly reasonable thing to do: execute up to the beginning of _start() and then execute a single instruction in an arbitrary method of the application. Despite the illogic of it all, it generally provides the source line information that I want.
When I try this on a SPARC system, the initial where command almost never works; however, if I set $npc (as opposed to $pc), and perform the machine level single-step, then it does provide the source information that I want.
The whole thing is a kludge, and the extra stepi command is a kludge on top of a kludge; however, I still find it sufficiently useful to have it encapsulated in a short script (called "lineinfo"):
$ lineinfo testprog chi+0x27c
=>[1] chi(sz = ), line 219 in "libtc.c"
219 p = (char*)malloc(sz);
This isn't exactly ready for prime time, but I'd like to hear if anyone has a better solution (either more robust or more elegant).
The script is:
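#!/bin/sh
# lineinfo: map symbol+offset to a source line using dbx.
if [ $# -lt 2 ] || [ ! -x "$1" ] ; then
    echo "Usage: $0 executable symbol+offset" >&2
    exit 1
fi
executable=$1
shift
# On SPARC, set $npc rather than $pc before the single step.
case `uname -p` in
    sparc) PC='$npc';;
    *) PC='$pc';;
esac
# Run to _start so the shared libraries are loaded, move the PC, step one
# instruction, then report the location. Only "where" and "list" print.
dbx -q "$executable" 2> /dev/null <<%
>/dev/null stop in _start
>/dev/null run
>/dev/null assign $PC=$*
>/dev/null stepi
where
list +1
%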
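To avoid copying each call site out of the DTrace output by hand, a small filter can pull the frames out first. The following is only a sketch: the function name callsites is made up for illustration, and it assumes every frame is printed as module`symbol+offset with no spaces inside the symbol name (demangled C++ names may break that assumption).

```shell
# callsites: read ustack() output on stdin and print each distinct
# call-site frame as "module symbol+offset", skipping the malloc frame.
callsites() {
    awk '
        # Lines without a backquote (size/count lines, blanks) are not frames.
        index($0, "`") == 0 { next }
        {
            # $1 is the frame with surrounding whitespace removed.
            split($1, parts, "`")
            # Drop the malloc() frame itself; only the caller is of interest.
            if (parts[2] ~ /^malloc([+]|$)/) next
            pair = parts[1] " " parts[2]
            # Print each call site once, in first-seen order.
            if (!(pair in seen)) { seen[pair] = 1; print pair }
        }
    '
}
```

Fed the sample output shown earlier, this prints "a.out epsilon+0x18", "libtestd.so delta+0x235" and "libtestc.so chi+0x451", one per line. With the DTrace output saved to a file (dtrace.out is a placeholder name), the second column can then be handed to lineinfo in a loop, for example: callsites < dtrace.out | while read module site; do lineinfo ./a.out "$site"; done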