Sherry Q. Moore's Weblog
Tuesday May 17, 2005
Compilation Options for Best Performance
| Target | Hardware | Compilation Options |
|---|---|---|
| 32-bit | x86, no SSE | -xtarget=pentium3 or -xtarget=pentium4 |
| 32-bit | x86, SSE | -xtarget=pentium3 (or pentium4) -xarch=sse |
| 32-bit | amd64 | -xtarget=opteron |
| 64-bit | amd64 | -xtarget=opteron -xarch=amd64 |
* -xtarget=opteron implies -xarch=sse2, -xchip=opteron, and -xcache=64/64/2:1024/64/16
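A quick way to confirm which target a given set of options actually produced is to print the pointer width from a small C program. This is only a minimal sketch; the helper name pointer_bits is made up for illustration and is not part of any Sun Studio tooling.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: width of a pointer, in bits, for the current build.
 * A 32-bit target should report 32 and a 64-bit target 64.
 */
static unsigned
pointer_bits(void)
{
    return (unsigned)(sizeof (void *) * CHAR_BIT);
}
```

Going by the table above, a build with -xtarget=opteron alone should report 32, and adding -xarch=amd64 should report 64.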
( May 17 2005, 05:28:17 PM PDT / May 17 2005, 05:26:33 PM PDT )
Trackback: http://blogs.sun.com/sherrym/entry/compilation_options_for_best_performance
Friday May 13, 2005
Obtaining Function Arguments on AMD64
Now that you have experienced enough pain debugging on AMD64 platforms
without arguments, you will be delighted to hear that there is an
option out there to help you!
The patched Studio 10 compilers (minimum patch number 117846-03; use
ube -V to verify) offer an option, -Wu,-save_args, on amd64 that saves
INTEGER-type function arguments passed in registers onto the stack.
When this option is specified, up to 6 arguments are saved on the
stack on function entry, and they are not modified throughout the
life of the routine (the checkpoint effect we have all dreamed about).
For example,
    void
    foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7)
    {
        ...
    }
Disassembled code will look something like the following:
    pushq  %rbp
    movq   %rsp, %rbp
    subq   $0x30, %rsp          **
    movq   %rdi, -0x8(%rbp)
    movq   %rsi, -0x10(%rbp)
    movq   %rdx, -0x18(%rbp)
    movq   %rcx, -0x20(%rbp)
    movq   %r8, -0x28(%rbp)
    movq   %r9, -0x30(%rbp)
    ...
**: The space being reserved is in addition to what the current
function prolog already reserves.
The resulting stack frame, from higher to lower addresses:
    return PC
    %rbp
    %rdi
    %rsi
    %rdx
    %rcx
    %r8
    %r9
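To make the layout concrete, here is a minimal sketch of how a debugger-like tool could locate a saved argument once it knows the frame pointer. It assumes exactly the layout in the disassembly above (argument 0 at -0x8(%rbp), argument 5 at -0x30(%rbp)); the names saved_arg_offset and read_saved_arg are hypothetical and not taken from mdb or any other real debugger.

```c
#include <stddef.h>
#include <stdint.h>

#define SAVE_ARGS_MAX 6    /* only the first 6 integer arguments are saved */

/*
 * Byte offset of saved argument idx (0-based) relative to %rbp,
 * following the disassembly: %rdi at -0x8, %rsi at -0x10, ... %r9 at -0x30.
 */
static ptrdiff_t
saved_arg_offset(int idx)
{
    return -(ptrdiff_t)8 * (idx + 1);
}

/*
 * Read saved argument idx from a frame. fp points at the slot holding the
 * caller's %rbp, viewed as an array of 64-bit words. The caller is assumed
 * to pass 0 <= idx < SAVE_ARGS_MAX.
 */
static uint64_t
read_saved_arg(const uint64_t *fp, int idx)
{
    return fp[-(idx + 1)];
}
```

For a function compiled with -save_args, read_saved_arg(fp, 0) would then yield the value %rdi held on entry, regardless of what the function has since done to the register.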
Nothing special is done for arguments beyond the first 6. If a function
has an odd number of arguments, one additional slot is reserved on the
stack to maintain 16-byte alignment. For example,
argc == 0: no argument saving.
argc == 3: save 3, but reserve space for 4 to maintain stack alignment.
argc == 7: save 6.
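The rule in these three cases can be written down as a small calculation. The sketch below is only an illustration of the arithmetic; the function names are made up, and it assumes each saved argument takes one 8-byte slot, as in the disassembly above.

```c
#include <assert.h>

#define SAVE_ARGS_MAX  6    /* at most 6 register arguments are saved */
#define SLOT_BYTES     8    /* one 64-bit slot per saved argument */

/* Number of arguments actually saved for a function taking argc arguments. */
static int
saved_args(int argc)
{
    assert(argc >= 0);
    return argc < SAVE_ARGS_MAX ? argc : SAVE_ARGS_MAX;
}

/* Slots reserved: the saved count rounded up to an even number. */
static int
reserved_slots(int argc)
{
    return (saved_args(argc) + 1) & ~1;
}

/* Extra stack bytes reserved; always a multiple of 16. */
static int
reserved_bytes(int argc)
{
    return reserved_slots(argc) * SLOT_BYTES;
}
```

With 6 or more arguments, reserved_bytes() gives 0x30, which matches the subq $0x30, %rsp in the disassembly.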
The -save_args flag has no direct association with the
optimization level. In other words, you can use various optimization
levels along with -save_args.
A new DWARF attribute has been introduced to indicate whether a function
has been compiled with -save_args:
DW_AT_SUN_amd64_parmdump = 0x2224
The attribute takes the value 1 or 0, but it is only emitted when the
value is 1. It is attached to the DW_TAG_subprogram tag.
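A consumer of the debug information could test for the attribute roughly as follows. This is a sketch under stated assumptions: struct die_attr is a hypothetical, simplified stand-in for whatever a real DWARF reader returns, and no actual DWARF library API is used. Only the attribute number and the standard DW_TAG_subprogram value are real.

```c
#include <stddef.h>

#define DW_TAG_subprogram         0x2e      /* standard DWARF tag value */
#define DW_AT_SUN_amd64_parmdump  0x2224    /* value quoted in this post */

/* Hypothetical, simplified view of one attribute on a debug entry. */
struct die_attr {
    unsigned      name;     /* DW_AT_* code */
    unsigned long value;    /* constant value of the attribute */
};

/*
 * Returns 1 if a debug entry with the given tag and attributes describes a
 * function compiled with -save_args, 0 otherwise. A missing attribute is
 * treated as 0, since the attribute is only emitted when its value is 1.
 */
static int
compiled_with_save_args(unsigned tag, const struct die_attr *attrs,
    size_t nattrs)
{
    size_t i;

    if (tag != DW_TAG_subprogram)
        return 0;
    for (i = 0; i < nattrs; i++) {
        if (attrs[i].name == DW_AT_SUN_amd64_parmdump)
            return attrs[i].value == 1;
    }
    return 0;
}
```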
You might wonder about the following:
- How does the extra argument saving affect performance?
With a 20-deep call stack of small functions, each taking 6 arguments
(to cause maximum argument saving), the extra saving costs
18 nanoseconds, around a 10% hit.
#define FUNC(i, j) \
static int \
func##i(int i1, int i2, int i3, int i4, int i5, int i6) \
{ \
    i3 = i1 + i2; \
    i4 = i2 + i3; \
    i5 = i3 + i4; \
    i6 = func##j(i1, i2, i3, i4, i5, i6); \
    return (i3 + i4 + i5 + i6); \
}
This was measured with a hot cache, where the first store to the stack
won't suffer a page fault. Since real functions do something more
complicated, the actual hit should be much smaller.
If it turns out that the -save_args option does affect the performance
of your particular application, you can always turn it off in
production code.
- Why was it implemented as callee-saved instead of caller-saved?
- Smaller code size when functions are called by many callers.
- Avoids useless argument saving when calling assembly functions.
- Can be enabled only on the module that's being debugged.
- So what does the output look like?
Ha, I thought you would never ask!
stack pointer for thread fffffe8123debe80: fffffe80006296c0
[ fffffe80006296c0 unix`_resume_from_idle+0xde() ]
fffffe8000629700 unix`swtch+0x241()
fffffe8000629730 genunix`cv_wait+0x83(ffffffff82a44ed8, ffffffff82a44ed0)
fffffe80006297a0 ufs`ufs_check_lockfs+0x14c(ffffffff82a44e00, ffffffff82a44eb0, 80000030)
fffffe8000629800 ufs`ufs_lockfs_begin+0x14e(ffffffff82a44e00, fffffe8000629840, 80000030)
fffffe8000629920 ufs`ufs_readlink+0x7e(ffffffff90377300, fffffe8000629980, ffffffff832e9428)
fffffe8000629950 genunix`fop_readlink+0x24(ffffffff90377300, fffffe8000629980, ffffffff832e9428)
fffffe80006299d0 genunix`pn_getsymlink+0x66(ffffffff90377300, fffffe8000629b20, ffffffff832e9428)
fffffe8000629bc0 genunix`lookuppnvp+0x3f5(fffffe8000629ca0, 0, 1, 0, fffffe8000629e10, ffffffff8c907b80)
fffffe8000629c60 genunix`lookuppnat+0x13e(fffffe8000629ca0, 0, 1, 0, fffffe8000629e10, 0)
fffffe8000629d40 genunix`lookupnameat+0x88(805bd38, 0, 1, 0, fffffe8000629e10, 0)
fffffe8000629dd0 genunix`cstatat_getvp+0x17d(ffd19553, 805bd38, 1, 1, fffffe8000629e10, fffffe8000629e18)
fffffe8000629e60 genunix`cstatat32+0x68(ffd19553, 805bd38, 1, fcfdbef8, 0, 10
fffffe8000629e80 genunix`stat32+0x33(805bd38, fcfdbef8)
fffffe8000629eb0 genunix`xstat32+0x26(2, 805bd38, fcfdbef8)
fffffe8000629f00 unix`sys_syscall32+0x1ff()
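Going back to the performance question: if you want to repeat the measurement yourself, the FUNC macro shown earlier can be expanded into a 20-deep chain and timed. The harness below is a sketch, not the original benchmark. The terminating function func20, the timing helper ns_per_chain, and the use of clock_gettime are all assumptions added for illustration. Build it once with and once without -Wu,-save_args and compare the two numbers.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

#define FUNC(i, j) \
static int \
func##i(int i1, int i2, int i3, int i4, int i5, int i6) \
{ \
    i3 = i1 + i2; \
    i4 = i2 + i3; \
    i5 = i3 + i4; \
    i6 = func##j(i1, i2, i3, i4, i5, i6); \
    return (i3 + i4 + i5 + i6); \
}

/* Assumed end of the chain: just sums its arguments. */
static int
func20(int i1, int i2, int i3, int i4, int i5, int i6)
{
    return (i1 + i2 + i3 + i4 + i5 + i6);
}

/* Callees must be defined before their callers, so build from the bottom. */
FUNC(19, 20) FUNC(18, 19) FUNC(17, 18) FUNC(16, 17) FUNC(15, 16)
FUNC(14, 15) FUNC(13, 14) FUNC(12, 13) FUNC(11, 12) FUNC(10, 11)
FUNC(9, 10)  FUNC(8, 9)   FUNC(7, 8)   FUNC(6, 7)   FUNC(5, 6)
FUNC(4, 5)   FUNC(3, 4)   FUNC(2, 3)   FUNC(1, 2)   FUNC(0, 1)

/*
 * Average wall-clock nanoseconds for one trip down the 20-deep chain.
 * The volatile inputs and sink keep the compiler from folding the calls
 * into a constant; an optimizer may still inline the chain.
 */
static double
ns_per_chain(long iters)
{
    struct timespec t0, t1;
    volatile int a = 1, b = 2;
    volatile int sink = 0;
    long n;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (n = 0; n < iters; n++)
        sink = func0(a, b, 0, 0, 0, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void) sink;

    return (((t1.tv_sec - t0.tv_sec) * 1e9 +
        (t1.tv_nsec - t0.tv_nsec)) / (double)iters);
}
```

Calling ns_per_chain() with a few million iterations after a warm-up call should approximate the hot-cache conditions described above. If the optimizer inlines the whole chain, there are no prologs left to measure, so check the disassembly before trusting the result.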
( Dec 09 2008, 11:33:44 AM PST / May 13 2005, 10:31:05 AM PDT )
Trackback: http://blogs.sun.com/sherrym/entry/obtaining_function_arguments_on_amd64
Friday May 06, 2005
I currently work in Solaris Kernel Development at Sun Microsystems. My
projects over the last 1 1/2 years include:
- Solaris port to AMD64 platforms, for which we won the 2005
Chairman's Award.
- Improved write performance by 80-120% on AMD64 as measured by
libMicro.
- Got the -save_args option implemented by the Sun Studio compilers for
AMD64 so that function arguments passed via registers are available
to the debugger (more on this later).
- Improved debuggability on AMD64 in general.
Prior to this new adventure in x86 land, I spent 6 1/2 years working in
Sun's Enterprise Server Group, mostly on the SunFire 4800-6800 product
line (code name Serengeti). I designed and implemented:
- POST (Power On Self Test)
- Parts of the System Controller software (test sequencer, domain
console and domain communication channel)
- The Solaris driver for communicating with the system controller
- The Solaris drivers for DR (Dynamic Reconfiguration).
Prior to Sun I worked at Intel, and still have fond memories of the
Pentium II launch party held at OMSI.
In addition to my day job, I also play mommy for two wonderful young
children. When at times I exclaimed, "I found the bug!", my son would
respond with the same enthusiasm, "Did you kill it?".
( Feb 20 2007, 02:44:06 PM PST / May 06 2005, 10:14:16 AM PDT )
Trackback: http://blogs.sun.com/sherrym/entry/welcome