Alfred Huang's Weblog

Disclaimer: Whatever I suggest in my blog is what I would do; it is not necessarily the only way to do it.


Tuesday Feb 21, 2006

A look into the AMD64 Aggregate Argument Passing

Recently there have been questions on argument passing for AMD64. As part of the calling convention, argument passing and return values are described in detail in the AMD64 ABI. The portion on passing scalar arguments is clear and straightforward, but the description for passing aggregates is pretty algorithmic and rather obscure. Maybe I can help by explaining it with examples.

Generally speaking, all objects that can be accommodated in registers are passed in registers until the designated registers run out; the memory stack is used after that. Regardless of the actual object size, every argument occupies a multiple of 8 bytes.

Before getting to aggregates, let me recap argument passing for the most common scalar types, namely integer and floating point types. The first six integer arguments (class INTEGER) are passed in %rdi, %rsi, %rdx, %rcx, %r8, %r9, and any further ones go on the memory stack. Likewise, the first eight float and double arguments (class SSE) are passed in %xmm0 to %xmm7, then on the memory stack.
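
As a rough illustration, here is how the scalar rules play out for a call with seven integer arguments and two doubles. The function `mix` is made up for this post and is not taken from the ABI. The register assignments appear as comments; compiling with `gcc -S` and reading the call site is one way to check them on your own toolchain.

```c
/* Hypothetical function, used only to illustrate register assignment.
 * Under the AMD64 calling convention a caller places:
 *   a -> %rdi, b -> %rsi, c -> %rdx, d -> %rcx, e -> %r8, f -> %r9
 *   g -> memory stack (it is the seventh INTEGER argument)
 *   x -> %xmm0, y -> %xmm1
 * INTEGER and SSE arguments draw from separate register pools, so the
 * doubles still land in %xmm0 and %xmm1 even though they come last. */
long mix(long a, long b, long c, long d, long e, long f, long g,
         double x, double y)
{
    return a + b + c + d + e + f + g + (long)(x + y);
}
```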

Aggregate arguments larger than 16 bytes (2 EightBytes) are always passed on the stack; it is the aggregate of 16 bytes or less that is the most interesting. First of all, you have to figure out which fields of the aggregate belong to the 1st and 2nd EightBytes. This can be worked out from the padding that may be inserted between the fields of a struct.

Example 1:

     struct S { short i;
                float f1;
                short j;
                float f2;
              } s;

     Since the alignment of f1 and f2 is 4, there is a padding of 2 bytes
     between i and f1, and between j and f2. So we have

      ---------
      |   i   |  2 bytes  ---
      ---------              |
      |  pad  |  2 bytes     |---  1st EightByte 
      ---------              |
      |   f1  |  4 bytes  ---
      ---------
      |   j   |  2 bytes  ---
      ---------              |
      |  pad  |  2 bytes     |---  2nd EightByte 
      ---------              |
      |   f2  |  4 bytes  ---
      ---------

     Now the AMD64 ABI says to consider the fields within an EightByte
     two at a time, recursively, in a merge step. I will not repeat all
     the rules here, but one of them is that if either class is INTEGER,
     the result class is INTEGER.  In this case, since i is of class
     INTEGER and f1 is of class SSE, the result is INTEGER.  Hence the
     1st EightByte has class INTEGER, and by the same reasoning so does
     the 2nd EightByte.  If object 's' is passed as the first argument,
     it will be passed in %rdi and %rsi: i and f1 are contained in
     %rdi, while j and f2 are contained in %rsi.

     Surprised?
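
The layout in Example 1 can also be checked by asking the compiler directly instead of drawing boxes. This is a minimal sketch assuming a C11 compiler targeting AMD64; the macro `EIGHTBYTE_OF` is my own hypothetical helper, not something defined by the ABI.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct S {
    short i;
    float f1;
    short j;
    float f2;
};

/* Index of the EightByte (0 = 1st, 1 = 2nd) in which a field starts.
 * Hypothetical helper for illustration only. */
#define EIGHTBYTE_OF(type, field) (offsetof(type, field) / 8)

/* These hold on AMD64, where float is 4 bytes with 4-byte alignment.
 * Compilation fails if the layout differs from the diagram. */
static_assert(sizeof(struct S) == 16, "struct spans two EightBytes");
static_assert(offsetof(struct S, f1) == 4, "2 bytes of padding after i");
static_assert(offsetof(struct S, j) == 8, "j starts the 2nd EightByte");
static_assert(offsetof(struct S, f2) == 12, "2 bytes of padding after j");
```

With this in place, `EIGHTBYTE_OF(struct S, f1)` evaluates to 0 and `EIGHTBYTE_OF(struct S, f2)` to 1, which is the grouping used in the merge step.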

Example 2:

     struct S { float f[4]; } s;

     Since 's' is exactly 128 bits in size, which is exactly the width
     of an xmm register, would 's' be passed in a single xmm register?
     The answer is no. In this case we have:

        ---------
        | f[0]  |   4 bytes  ---
        ---------               |---  1st EightByte
        | f[1]  |   4 bytes  ---
        ---------
        | f[2]  |   4 bytes  ---
        ---------               |---  2nd EightByte
        | f[3]  |   4 bytes  ---
        ---------

     Since f[0] and f[1] are both of class SSE, the result class for the
     1st EightByte is SSE, and likewise for the 2nd EightByte.  Hence if
     object 's' is passed as the first argument, it will be passed in
     %xmm0 and %xmm1, where %xmm0 contains the values of f[0] and f[1],
     while %xmm1 contains the values of f[2] and f[3].
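
To see what "one register contains two floats" means at the bit level, the sketch below copies one EightByte of the struct into a 64-bit integer. It assumes a little-endian target such as AMD64, so the lower-addressed float ends up in the low 32 bits. The helper names `eightbyte_bits` and `float_bits` are invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

struct S { float f[4]; };

/* Copy EightByte number n (0 or 1) of s into a 64-bit integer. This is
 * the bit pattern that the low 64 bits of the matching xmm register
 * would carry when s is passed by value. */
uint64_t eightbyte_bits(const struct S *s, int n)
{
    uint64_t bits;
    memcpy(&bits, (const unsigned char *)s + 8 * n, sizeof bits);
    return bits;
}

/* Bit pattern of a single float, for comparison. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

For the 1st EightByte, the low 32 bits of the result match `float_bits(s.f[0])` and the high 32 bits match `float_bits(s.f[1])`.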

Example 3:

     Should the entire aggregate reside in one single class of register
     when being passed? Again the answer is no. Consider the following 
     case:

     struct S { int i;
                float f1;
                float f2;
                float f3;
              } s;

        ---------
        |  i    |   4 bytes  ---
        ---------               |---  1st EightByte
        |  f1   |   4 bytes  ---
        ---------
        |  f2   |   4 bytes  ---
        ---------               |---  2nd EightByte
        |  f3   |   4 bytes  ---
        ---------

     Note that i is of class INTEGER and f1 is of class SSE. Since one of
     the merge rules says that if either class is INTEGER the result is
     INTEGER, the 1st EightByte is of class INTEGER, while the 2nd
     EightByte is of class SSE.  Hence if object 's' is passed as the
     first argument, its first 8 bytes containing i and f1 are passed in
     %rdi, whereas the remaining 8 bytes containing f2 and f3 are passed
     in %xmm0.
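
The merge step used in the three examples can be sketched in a few lines of C. This is a simplified model, not the full ABI algorithm: it knows only the INTEGER and SSE classes, leaves out MEMORY, X87 and the rest, and assumes every field sits entirely inside one EightByte. The names `classify`, `merge` and `struct field` are invented for this sketch.

```c
#include <stddef.h>

enum arg_class { CLASS_NONE, CLASS_INTEGER, CLASS_SSE };

/* One scalar field of an aggregate: its byte offset and its own class. */
struct field {
    size_t offset;
    enum arg_class cls;
};

/* Merge two classes, covering only the rules needed for these examples. */
static enum arg_class merge(enum arg_class a, enum arg_class b)
{
    if (a == b) return a;
    if (a == CLASS_NONE) return b;
    if (b == CLASS_NONE) return a;
    if (a == CLASS_INTEGER || b == CLASS_INTEGER) return CLASS_INTEGER;
    return CLASS_SSE;   /* fallback; not reached with only two classes */
}

/* Classify an aggregate of at most 16 bytes. out[0] receives the class
 * of the 1st EightByte and out[1] the class of the 2nd. */
void classify(const struct field *fields, size_t n, enum arg_class out[2])
{
    out[0] = out[1] = CLASS_NONE;
    for (size_t k = 0; k < n; k++) {
        size_t eb = fields[k].offset / 8;   /* which EightByte */
        if (eb < 2)
            out[eb] = merge(out[eb], fields[k].cls);
    }
}
```

Feeding it the fields of Example 3 (offsets 0, 4, 8, 12 with classes INTEGER, SSE, SSE, SSE) yields INTEGER for the 1st EightByte and SSE for the 2nd, matching the %rdi and %xmm0 split described above.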

I hope these little examples provide some insight into how to interpret the aggregate argument passing rules in the AMD64 ABI.

Comments:

Really helpful. I'm writing a kernel core dump analysis slides these days.

Posted by Miles Xu on July 13, 2008 at 11:25 PM PDT #
