
Friday June 02, 2006
The real story on the cc command and the -xregs=frameptr option
Alas, the Sun Studio 11 C User's Guide is incorrect, to the point of
being thoroughly confusing, about -xregs=frameptr and about whether the
-fast expansion includes -xregs=frameptr.
http://docs.sun.com/source/819-3688/cc_ops.app.html Appendix B 2.16,
Table B-5: the -fast option description incorrectly lists -fast on x86 as
expanding to -xregs=no%frameptr. That is not correct. The expansion of
the -fast macro option on x86 includes -xregs=frameptr.
Later in Appendix B, at 2.129, near the end of the description of -xregs, the following incorrect sentence appears:
The x86 default is -xregs=no%frameptr unless you specify -fast or an optimization of -xO5 in which case -xregs=frameptr.
-xO5 has no effect whatsoever on the -xregs option. The sentence should read:
The x86 default is -xregs=no%frameptr unless you specify the -fast macro option whose expansion includes -xregs=frameptr.
Unfortunately, the cc man page has the equivalent errors: it incorrectly
shows -fast expanding to -xregs=no%frameptr, and it claims that -xO5
implies -xregs=frameptr.
So to recap, here is an accurate description of -xregs=frameptr, and how the -fast macro expands for cc on x86/64:
-xregs=frameptr Tells the compiler it is allowed to use the frame-pointer
register as an unallocated callee-saves register for purposes
of optimization.
-xregs=no%frameptr Tells the compiler it is NOT allowed to use the frame-pointer
register for purposes of optimization.
The x86 default is -xregs=no%frameptr unless you specify the -fast macro option whose expansion includes -xregs=frameptr.
-xregs=frameptr allows the compiler to use the frame-pointer register
(%ebp on IA32, %rbp on AMD64) as an unallocated callee-saves
register. Using this register as an unallocated callee-saves
register may improve program run time. However, it also reduces the
capacity of some tools to inspect and follow the stack. This stack
inspection capability is important for system performance measurement
and tuning. Therefore, using this optimization may improve local
program performance at the expense of global system performance.
- Tools, such as the Performance Analyzer, that dump the stack for
  postmortem diagnosis will not work.
- Debuggers (e.g. adb, mdb, dbx) will not be able to dump the stack or
  directly pop stack frames.
- The dtrace performance analysis facility will be unable to collect
  information on any frames on the stack before the most recent frame
  missing the frame pointer.
- POSIX pthread_cancel will fail trying to find cleanup handlers.
- C++ exceptions cannot propagate through C functions.
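To make the tool failures above more concrete, here is a rough simulation in portable C of a frame-pointer-based stack walk. It is only a sketch and not how any of the tools named above is implemented: the struct frame record and the walk_frames function are hypothetical names, and the records are ordinary C objects rather than real stack memory. The assumption is the conventional x86 layout, in which each frame stores its caller's frame pointer, so a walker can hop from frame to frame until the chain ends.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated frame record (hypothetical type). In the conventional x86
   layout, each frame saves the caller's frame pointer next to the
   return address; "function" stands in for that return address here. */
struct frame {
    const struct frame *saved_fp;  /* caller's record, NULL when the chain ends */
    const char *function;          /* name of the function owning this frame */
};

/* Follow the saved_fp links starting at fp, the way a frame-pointer-based
   walker would. Stores at most max function names in out and returns the
   number of frames visited. */
size_t walk_frames(const struct frame *fp, const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (fp != NULL && n < max) {
        out[n++] = fp->function;
        fp = fp->saved_fp;   /* a lost link ends the walk here */
    }
    return n;
}
```

Linking three records as main, middle, leaf and walking from leaf visits all three. If the register no longer held a frame pointer when leaf saved it (modeled here, as a simplification, by setting leaf's saved_fp to NULL), the walk sees only leaf and everything older is invisible, which is the situation the dtrace item describes.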
The failures in C++ exceptions occur when a C function that has lost
its frame pointer calls a C++ function that throws an exception through
the C function. Such calls typically occur when a function accepts a
function pointer (for example, qsort) or when a global function, such
as malloc, is interposed upon. The last two effects listed above may
impact the correct operation of applications. Most application code
will not encounter these problems.
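For readers who have not used them, the cleanup handlers mentioned in the pthread_cancel item are the ones registered with pthread_cleanup_push. The sketch below shows the mechanism working as intended. It does not reproduce the failure, which per the description above depends on code built with -xregs=frameptr. The names mark_cleaned, worker and run_cancel_demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical cleanup handler: records that it was called. */
static void mark_cleaned(void *arg)
{
    assert(arg != NULL);
    *(int *)arg = 1;
}

/* Thread body: registers the handler, then waits at a cancellation point. */
static void *worker(void *arg)
{
    pthread_cleanup_push(mark_cleaned, arg);
    for (;;)
        sleep(1);            /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);  /* never reached; must pair with the push */
    return NULL;
}

/* Returns 1 if the thread was canceled and its handler ran,
   0 if not, and -1 if a pthread call failed. */
int run_cancel_demo(void)
{
    int cleaned = 0;
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, &cleaned) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (result == PTHREAD_CANCELED && cleaned) ? 1 : 0;
}
```

With default (deferred) cancellation, the request is acted on only at a cancellation point, so the handler is already registered by the time the thread is canceled. Some toolchains need a threading option such as -pthread or -lpthread to build this.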
Some readers might also be confused by a bullet item at the top of the cc man page.
What it says about -xregs=no%frameptr is incorrect. It currently reads:
o A new x86-only flag for the -xregs option,
-xregs=[no%]frameptr, lets you use the frame-pointer regis-
ter as an unallocated callee-saves register to increase the
run-time performance of applications.
This needs to be corrected to say:
o A new x86-only flag for the -xregs option,
-xregs=frameptr, lets you use the frame-pointer regis-
ter as an unallocated callee-saves register to increase the
run-time performance of applications.
Note that the Sun Studio 10 C User's Guide does not have these errors; it correctly describes -xregs=frameptr and -fast.
( Jun 02 2006, 02:59:24 PM PDT )