Darryl Gove's blog
Calling libraries
I've previously blogged about measuring the performance of calling library code. Let's quickly cover where the costs come from, and what can be done about them.
The most obvious cost is that of making the call. This is probably a straightforward call instruction, although calls through indirection can involve loading the address from memory first. There's also a linkage table to negotiate; let's take a look at that:
#include <stdio.h>
void f()
{
    printf("Hello again\n");
}
int main(void)
{
    printf("Hello World\n");
    f();
    return 0;
}
There are two calls to printf in the code. Calls into libc are bound lazily, so the first call does the set-up, and we can then see what happens more generally on the second call.
% cc -g p.c
% dbx a.out
Reading ld.so.1
Reading libc.so.1
(dbx) stop in f
(2) stop in f
(dbx) run
Running: a.out
(process id 63626)
Reading libc_psr.so.1
Hello World
stopped in f at line 4 in file "test.c"
4 printf("Hello again\n");
(dbx) stepi
stopped in f at 0x00010bc0
0x00010bc0: f+0x0008: bset 48, %l0
0x00010bc4: f+0x000c: call printf [PLT] ! 0x20ca8
0x00010bc8: f+0x0010: or %l0, %g0, %o0
0x00020ca8: printf [PLT]: sethi %hi(0x15000), %g1
0x00020cac: printf+0x0004 [PLT]: sethi %hi(0xff31c400), %g1
0x00020cb0: printf+0x0008 [PLT]: jmp %g1 + 0x00000024
0x00020cb4: _get_exit_frame_monitor [PLT]: sethi %hi(0x18000), %g1
0xff31c424: printf : save %sp, -96, %sp
So the call to printf actually jumps to an entry in the procedure linkage table (PLT), which then jumps to the actual start address of the library code.
So those are the additional costs of libraries. But just doing a call instruction also has some costs:
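As a rough illustration of the "first call does the set-up, later calls just go through the table" behaviour, here is a minimal sketch in C. It imitates the idea with a function pointer and is not how the runtime linker actually implements the PLT; the names puts_slot, first_call_stub, real_puts, call_via_slot, and resolve_count are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical names throughout: this only imitates lazy binding with a
   function pointer, it is not the runtime linker's real mechanism. */

int resolve_count = 0;                 /* how many times the set-up ran */

static int real_puts(const char *s)    /* stands in for the library routine */
{
    return puts(s);
}

static int first_call_stub(const char *s);

/* The "table slot": every call goes through this pointer. */
static int (*puts_slot)(const char *) = first_call_stub;

static int first_call_stub(const char *s)
{
    resolve_count++;                   /* one-off set-up cost on the first call */
    puts_slot = real_puts;             /* patch the slot for later calls */
    return real_puts(s);
}

int call_via_slot(const char *s)
{
    return puts_slot(s);               /* load the address, then call indirectly */
}
```

Calling call_via_slot twice runs the set-up only once. Every later call still pays for the extra load and indirect jump, which is the recurring cost visible in the disassembly above.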
- For SPARC processors, there's the possibility of hitting a register windows spill/fill trap.
- The other issue with call instructions is that the compiler does not know whether the routine being called will read or write memory. So any variables that the routine might access need to be stored back to memory before the call, and read from memory afterwards. This can get quite ugly, particularly for floating point codes where there may be quite a few active registers at any one time. This behaviour can be avoided using the pragmas does_not_read_global_data, does_not_write_global_data, and no_side_effect. The no_side_effect pragma also means that the compiler can eliminate the call to the routine if the return value is not used.
- There are also ABI issues. For example, the SPARC V8 ABI requires floating point parameters to be passed in the integer registers. Doing this requires storing the fp registers to the stack and then loading the values into the integer registers, and doing the opposite on the other side of the call!
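A minimal sketch of how those pragmas might be applied, assuming the Sun Studio C compiler. The routine scale and the global total are hypothetical, and the pragmas are guarded so that other compilers simply skip them.

```c
/* Hypothetical routine standing in for a library call. It must be declared
   before the pragmas that name it. */
double scale(double x);

#if defined(__SUNPRO_C)
/* Promises to the compiler about scale(); they must actually be true. */
#pragma does_not_read_global_data(scale)
#pragma does_not_write_global_data(scale)
#pragma no_side_effect(scale)
#endif

/* Global that would otherwise have to be stored before each call to
   scale() and reloaded after it. */
double total = 0.0;

double accumulate(const double *v, int n)
{
    int i;
    for (i = 0; i < n; i++)
        total += scale(v[i]);   /* with the pragmas, total can stay in a register */
    return total;
}

double scale(double x)
{
    return 2.0 * x;             /* touches no global data, has no side effects */
}
```

The pragmas are assertions rather than checks: if the routine does read or write global data, the generated code can be wrong, so they are only safe for routines whose behaviour is known.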
So generally calling routines can be time consuming, but what can be done?
- Check to see whether you might use intrinsics such as fsqrt rather than calling sqrt in libm (-xlibmil).
- Compiling with -xO4 enables the compiler to avoid calls by inlining within the same source file.
- Compiling and linking with -xipo enables the compiler to do cross-file inlining.
- Make sure that every call that is made does substantial work, not just a handful of instructions.
- Profile the application to confirm that there is real work being done in library code, and that the library routines called do execute a substantial number of instructions on every invocation.
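To make the "substantial work per call" point concrete, here is a small sketch with hypothetical routines. The first version pays the call overhead once per element; the second does the whole array in one call.

```c
#include <stddef.h>

/* Hypothetical fine-grained routine: only a handful of instructions per call. */
double square(double x)
{
    return x * x;
}

/* One call to square() per element, so the call overhead is paid n times
   unless the compiler manages to inline it. */
double sum_squares_per_call(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += square(v[i]);
    return sum;
}

/* Coarse-grained alternative: the caller makes one call and the loop
   body contains no further calls. */
double sum_squares_batched(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += v[i] * v[i];
    return sum;
}
```

With both routines in one source file an optimising compiler may well inline square and make the two versions equivalent. The difference matters when the small routine lives in a separate library where it cannot be inlined.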
Posted at 03:15PM Jun 26, 2008 by Darryl Gove in Sun | Comments[2]



"But just doing a call instruction also has some costs:
For SPARC processors, there's the possibility of hitting a register windows spill/fill trap."
I know you're simplifying, but there's no way for a call instruction to cause a spill trap, let alone a fill trap. Those are caused by save/restore instructions and happen after the call instruction.
Posted by Valued Reader on June 26, 2008 at 04:14 PM PDT #
Yes, that's quite correct, the call instruction just changes the pc, the save and restore instructions in the called routine change the register windows (and not all routines need the save and restore instructions).
Posted by Darryl Gove on June 26, 2008 at 04:49 PM PDT #