Sherry Q. Moore's Weblog


Tuesday May 17, 2005

 Compilation Options for Best Performance

Compilation Options
Target Hardware         Compilation Options
32-bit x86, no SSE      -xtarget=pentium{3|4}
32-bit x86, SSE         -xtarget=pentium{3|4} -xarch=sse
32-bit amd64            -xtarget=opteron
64-bit amd64            -xtarget=opteron -xarch=amd64

* -xtarget=opteron implies -xarch=sse2, -xchip=opteron, and -xcache=64/64/2:1024/64/16



( May 17 2005, 05:28:17 PM PDT / May 17 2005, 05:26:33 PM PDT ) Permalink
Trackback: http://blogs.sun.com/sherrym/entry/compilation_options_for_best_performance

Friday May 13, 2005

 Obtaining Function Arguments on AMD64

Now that you have experienced enough pain debugging on AMD64 platforms without arguments, you will be delighted to hear that there are options out there to help you!

The Studio 10 patch compilers (minimum patch number is 117846-03; use ube -V to verify) offer an option, -Wu,-save_args, on amd64 that saves INTEGER type function arguments passed via registers onto the stack. When this option is specified, up to 6 arguments are saved on the stack on function entry, and they are not modified throughout the life of the routine (the checkpoint effect we have all dreamed about). For example,
        void
        foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7)
        {
        ...
        }
Disassembled code will look something like the following:
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $0x30, %rsp                     **
        movq    %rdi, -0x8(%rbp)
        movq    %rsi, -0x10(%rbp)
        movq    %rdx, -0x18(%rbp)
        movq    %rcx, -0x20(%rbp)
        movq    %r8, -0x28(%rbp)
        movq    %r9, -0x30(%rbp)
        ...
**: The space being reserved is in addition to what the current function prolog already reserves.

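Resulting stack layout, highest address first:
        0x8(%rbp)       return PC
        0x0(%rbp)       saved %rbp
        -0x8(%rbp)      %rdi
        -0x10(%rbp)     %rsi
        -0x18(%rbp)     %rdx
        -0x20(%rbp)     %rcx
        -0x28(%rbp)     %r8
        -0x30(%rbp)     %r9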
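
Given the offsets in the disassembly above, a debugger-style lookup of a saved argument might look like the following. This is only a sketch: saved_arg_offset() and saved_arg() are hypothetical names, not part of mdb or any Sun tool, and the code assumes the frame belongs to a function compiled with -save_args.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not a real debugger API).
 * Byte offset from %rbp of saved integer argument `index` (0-based, 0..5):
 * argument 0 (%rdi) lives at -0x8(%rbp), argument 1 (%rsi) at -0x10(%rbp),
 * and so on down to argument 5 (%r9) at -0x30(%rbp).
 */
int
saved_arg_offset(int index)
{
	return (-8 * (index + 1));
}

/*
 * Fetch saved argument `index` from a frame, where `rbp` points at the
 * slot holding the caller's saved %rbp.
 */
uint64_t
saved_arg(const uint64_t *rbp, int index)
{
	/* rbp[-1] is 8 bytes below the frame pointer, rbp[-2] is 16, ... */
	return (rbp[-(index + 1)]);
}
```

Only indexes 0 through 5 are meaningful here, since nothing special is done for arguments beyond the first 6.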

Nothing special is done for arguments beyond the first 6. If a function has an odd number of arguments to save, one additional slot is reserved on the stack to maintain 16-byte alignment. For example,
        argc == 0: no argument saving.
        argc == 3: save 3, but reserve space for 4 to maintain stack alignment.
        argc == 7: save 6.
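
The saving and alignment rule in these examples can be written down as a small calculation. This is a sketch of the arithmetic only, not the compiler's actual code; the names SAVE_ARGS_MAX, save_args_count() and save_args_reserved_bytes() are made up for illustration, and argc is assumed to be non-negative.

```c
#define	SAVE_ARGS_MAX	6	/* integer arguments passed in registers */

/* Number of arguments actually stored on the stack. */
int
save_args_count(int argc)
{
	return (argc < SAVE_ARGS_MAX ? argc : SAVE_ARGS_MAX);
}

/*
 * Bytes reserved for the save area: one 8-byte slot per saved argument,
 * rounded up to an even number of slots so the total stays a multiple
 * of 16 bytes.
 */
int
save_args_reserved_bytes(int argc)
{
	int slots = save_args_count(argc);

	slots += slots & 1;	/* odd count: add one padding slot */
	return (slots * 8);
}
```

For the 7-argument foo() shown earlier this gives 6 slots, or 0x30 bytes, which matches the subq $0x30, %rsp in the disassembly.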
The -save_args flag has no direct association with the optimization level. In other words, you can use any optimization level along with -save_args.

A new DWARF attribute has been introduced to indicate whether a function has been compiled with -save_args:
        DW_AT_SUN_amd64_parmdump        = 0x2224
The attribute has the value of 1 or 0. The attribute is only added when the value is 1. The attribute is attached to the DW_TAG_subprogram tag.
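
A consumer of the debug information could apply that rule roughly as follows. This is a sketch under stated assumptions: die_attr_t and has_saved_args() are hypothetical and stand in for whatever DWARF reader you actually use; only the attribute value 0x2224 comes from the text above, and 0x2e is the standard DWARF value for DW_TAG_subprogram.

```c
#include <stddef.h>
#include <stdint.h>

#define	DW_TAG_subprogram		0x2e	/* standard DWARF tag value */
#define	DW_AT_SUN_amd64_parmdump	0x2224	/* value given above */

/* Hypothetical, simplified view of one attribute on a DWARF entry. */
typedef struct die_attr {
	uint16_t	da_name;	/* attribute code, e.g. 0x2224 */
	uint64_t	da_value;	/* attribute value as a constant */
} die_attr_t;

/*
 * Return 1 if the entry is a subprogram carrying the parmdump attribute
 * with value 1, and 0 otherwise.  A missing attribute is treated as 0,
 * since the attribute is only emitted when its value is 1.
 */
int
has_saved_args(uint16_t tag, const die_attr_t *attrs, size_t nattrs)
{
	size_t i;

	if (tag != DW_TAG_subprogram)
		return (0);
	for (i = 0; i < nattrs; i++) {
		if (attrs[i].da_name == DW_AT_SUN_amd64_parmdump)
			return (attrs[i].da_value == 1);
	}
	return (0);
}
```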

You might wonder about the following:
  • How does the extra argument saving affect performance?

    With a call stack 20 deep of small functions, each taking 6 arguments (to cause maximum argument saving), the extra saving costs 18 nanoseconds, around a 10% hit.
            #define FUNC(i, j) \
                    static int      \
                    func##i(int i1, int i2, int i3, int i4, int i5, int i6) \
                    {                                                       \
                            i3 = i1 + i2;                                   \
                            i4 = i2 + i3;                                   \
                            i5 = i3 + i4;                                   \
                            i6 = func##j(i1, i2, i3, i4, i5, i6);           \
                            return (i3 + i4 + i5 + i6);                     \
                    }
        
    This is with a hot cache, where the first store to the stack won't suffer a page fault. Since real functions do something more complicated than this, the actual hit should be much smaller. If it turns out that the -save_args option does affect the performance of your particular application, you can always turn it off in production code.

  • Why was it implemented as callee-saved instead of caller-saved?

    • Smaller code size when functions are called by many callers.
    • Avoids useless argument saving when calling assembly functions.
    • Can be enabled only on the module that's being debugged.


  • So what does the output look like?

    Ha, I thought you would never ask!
    
    stack pointer for thread fffffe8123debe80: fffffe80006296c0
    [ fffffe80006296c0 unix`_resume_from_idle+0xde() ]
      fffffe8000629700 unix`swtch+0x241()
      fffffe8000629730 genunix`cv_wait+0x83(ffffffff82a44ed8, ffffffff82a44ed0)
      fffffe80006297a0 ufs`ufs_check_lockfs+0x14c(ffffffff82a44e00, ffffffff82a44eb0, 80000030)
      fffffe8000629800 ufs`ufs_lockfs_begin+0x14e(ffffffff82a44e00, fffffe8000629840, 80000030)
      fffffe8000629920 ufs`ufs_readlink+0x7e(ffffffff90377300, fffffe8000629980, ffffffff832e9428)
      fffffe8000629950 genunix`fop_readlink+0x24(ffffffff90377300, fffffe8000629980, ffffffff832e9428)
      fffffe80006299d0 genunix`pn_getsymlink+0x66(ffffffff90377300, fffffe8000629b20, ffffffff832e9428)
      fffffe8000629bc0 genunix`lookuppnvp+0x3f5(fffffe8000629ca0, 0, 1, 0, fffffe8000629e10, ffffffff8c907b80)
      fffffe8000629c60 genunix`lookuppnat+0x13e(fffffe8000629ca0, 0, 1, 0, fffffe8000629e10, 0)
      fffffe8000629d40 genunix`lookupnameat+0x88(805bd38, 0, 1, 0, fffffe8000629e10, 0)
      fffffe8000629dd0 genunix`cstatat_getvp+0x17d(ffd19553, 805bd38, 1, 1, fffffe8000629e10, fffffe8000629e18)
      fffffe8000629e60 genunix`cstatat32+0x68(ffd19553, 805bd38, 1, fcfdbef8, 0, 10)
      fffffe8000629e80 genunix`stat32+0x33(805bd38, fcfdbef8)
      fffffe8000629eb0 genunix`xstat32+0x26(2, 805bd38, fcfdbef8)
      fffffe8000629f00 unix`sys_syscall32+0x1ff()
    
        


( Dec 09 2008, 11:33:44 AM PST / May 13 2005, 10:31:05 AM PDT ) Permalink
Trackback: http://blogs.sun.com/sherrym/entry/obtaining_function_arguments_on_amd64

Friday May 06, 2005

 Welcome

I currently work in Solaris Kernel Development at Sun Microsystems. My projects over the last 1 1/2 years include:
  • Solaris port to AMD64 platforms, for which we won the 2005 Chairman's Award.
  • Improved write performance by 80-120% on AMD64 as measured by libMicro.
  • Got the -save_args option implemented by the Sun Studio compilers for AMD64 so that function arguments passed via registers are available to the debugger (more on this later).
  • Improved debuggability on AMD64 in general.
Prior to this new adventure in x86 land, I spent 6 1/2 years working in Sun's Enterprise Server Group, mostly on the SunFire 4800-6800 product line (code name Serengeti). I designed and implemented
  • POST (Power On Self Test)
  • Parts of the System Controller software (test sequencer, domain console and domain communication channel)
  • The Solaris driver for communicating with the system controller
  • The Solaris drivers for DR (Dynamic Reconfiguration).
Prior to Sun I worked at Intel, and still have fond memories of the Pentium II launch party held at OMSI.

In addition to my day job, I also play mommy for two wonderful young children. When at times I exclaimed, "I found the bug!", my son would respond with the same enthusiasm, "Did you kill it?".


( Feb 20 2007, 02:44:06 PM PST / May 06 2005, 10:14:16 AM PDT ) Permalink Comments [22]
Trackback: http://blogs.sun.com/sherrym/entry/welcome

