
Friday June 01, 2007
GCC-style asm inlining support in Sun Studio 12 compilers
Sun Studio 12 Asm Statements
Introduction
In order to support developers used to Gcc's inline assembly feature, Sun Studio 12 has implemented a compatible interface that allows the C and C++ programmer to insert assembly instructions into the code stream generated by the compiler. There are several advantages to this feature above and beyond those of the inline assembly feature supported by prior Sun Studio releases. These include allowing the routine containing the inline assembly to be optimized, compatibility with Gcc, and more flexibility in the compiler's ability to choose registers efficiently.
In this new scheme the inline assembly takes the form of an asm statement in the source language that has the following form:
asm("<inst> %0, %1\n" : <outputs> : <inputs> : <clobber list>);
Where <inst> is an assembly-language opcode, <outputs> is a comma-separated list of outputs; likewise with <inputs>. Each input or output consists of a constraint string and an expression from the source language enclosed in parentheses. These expressions provide the inputs to pass to the asm statement or the outputs to store the results of the asm statement to. The clobber list is a comma-separated list of strings that name machine registers (other than inputs or outputs) that one or more of the instructions in the asm statement are known to write to. A typical function in C containing an asm statement might look like this:
#include <stdio.h>

void foo() {
    int result, source = 3;

    /* The output operand carries the "=" modifier because it is only written. */
    asm("movl %1, %0\n" : "=m" (result) : "r" (source));
    printf("result = %d (expected 3)\n", result);
}
The %0 and %1 in the above example are placeholders for "result" and "source", respectively. The compiler will evaluate "source" and load it into a free register, denoted by %1. It then generates the movl instruction to move that register into the memory location corresponding to the variable "result", denoted by %0.
There is an alternative notation for placeholders that users may find more readable. Rather than using %0, %1, %2, etc. to denote positional arguments, the user may refer to arguments symbolically:
#include <stdio.h>

void foo() {
    int result, source = 3;

    /* The output operand carries the "=" modifier because it is only written. */
    asm("movl %[input], %[output]\n" : [output] "=m" (result) : [input] "r" (source));
    printf("result = %d (expected 3)\n", result);
}
In the above example, the names "input" and "output" have no special meaning. They could be any names, but each must match a corresponding square-bracketed name in the input or output lists of the asm statement.
These are very simple examples. In actuality, an asm statement may have more than one instruction and the constraints can get quite complex. With flexibility of expression comes some degree of complexity which we will try to demystify in the sections that follow.
The Instruction String
The instruction(s) to be executed are contained in one or more quoted strings which precede the first colon in an asm statement. The compiler does not parse the contents of these strings except to scan for placeholders that it needs to replace with the arguments of the asm statement. So, the compiler knows nothing of the semantics of the instructions in an asm statement other than what it is told via the constraints on the input and output arguments and the contents of the clobber list. Within the instruction strings, any percent sign that does not introduce a placeholder must be doubled. For example, in the following asm statement, the %eax register must be written as "%%eax" but in the clobber list, no percent sign is needed:
asm("movl %0, %%eax\n" : : "r" (foo) : "eax");
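As a minimal sketch of the doubling rule, the function below routes a value through a hard-coded %eax. It assumes a Gcc-compatible compiler targeting x86; the name copy_via_eax is hypothetical, the keyword is spelled __asm__ so that it also compiles in strict ISO C modes, and the #else branch is a portable fallback that is not part of the article.

```c
/* Copy x through %eax. The hard-coded register is written "%%eax" inside the
   instruction string and plain "eax" in the clobber list. */
int copy_via_eax(int x) {
    int out;
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    __asm__("movl %1, %%eax\n\t"   /* %1 is a placeholder, %%eax is literal */
            "movl %%eax, %0\n"
            : "=r" (out)
            : "r" (x)
            : "eax");              /* tell the compiler %eax is overwritten */
#else
    out = x;                       /* fallback for non-x86 targets (assumption) */
#endif
    return out;
}
```

Because "eax" is in the clobber list, the compiler will not pick %eax for either %0 or %1.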
Inputs and Outputs
For an asm statement to affect a program, it most often must be able to receive information from expressions in the source language and be able to assign to variables (or other lvalues) in the program. This is accomplished by passing outputs and inputs into the asm statement in a manner similar to the arguments to a function call.
Expressions
The source language expressions for inputs may be rvalues or lvalues. Outputs must be lvalues. Expressions may be of arbitrary complexity and are enclosed in parentheses following the constraint string.
Unused inputs and outputs
If there is no use of an input or output in an asm statement's instruction string, then no loads from or stores to that variable are generated. This saves registers for those arguments that are used in the asm instruction string. There is one exception to this rule: if an input or output is constrained to a specific hardware register (as opposed to a register class), then it must be loaded or stored even if it is not referred to in the instruction string. This is because its value may be used implicitly by the instructions in the asm statement.
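The exception for specific registers can be sketched with an instruction whose operands are entirely implicit. The example assumes a Gcc-compatible compiler on x86; sign_word is a hypothetical name and the #else branch is a fallback added for illustration only.

```c
/* Return the sign word of x: -1 if x is negative, 0 otherwise.
   cltd sign-extends %eax into %edx and names neither register, so the
   "a" and "d" constraints are the only way the compiler learns about them. */
int sign_word(int x) {
    int high;
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    __asm__("cltd" : "=d" (high) : "a" (x));
#else
    high = (x < 0) ? -1 : 0;       /* fallback for non-x86 targets (assumption) */
#endif
    return high;
}
```

Neither %0 nor %1 appears in the instruction string, yet x must still be loaded into %eax and the result stored from %edx.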
Constraints
Register Constraints
Integer
In the descriptions that follow, only one size of register is listed in the tables, but in most cases the size of the register actually chosen depends on the type of the source expression being loaded into or stored from it. See "Matching register types to input and output types" below for more details about how the compiler chooses the size of register to use.
Register classes
The following constraints specify a class of integer register that the compiler may choose from when it needs a register within an asm statement:
Constraint    Register class
g or r        rax, rbx, rcx, rdx, rbp, rsi, rdi, rsp, r8 - r15
R             eax, ebx, ecx, edx, ebp, esi, edi, esp (legacy registers)
q             al, bl, cl, dl
Q             ah, bh, ch, dh
A             eax or edx (used for returning 64-bit values)
Specific registers
The following constraints may be used to lock a source variable or expression to a specific hardware register:
Constraint    64-bit register    32-bit register
a             rax                eax
b             rbx                ebx
c             rcx                ecx
d             rdx                edx
di            rdi                edi
si            rsi                esi
Floating point
XMM and MMX registers
The following constraints are used to specify that the source variable or expression should occupy an XMM or MMX register:
Constraint    Register class
x             xmm0 - xmm15
y             mm0 - mm7
Note: Be sure to specify -xarch=sse2 when using these constraints if compiling in 32-bit mode.
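A small sketch of the "x" constraint follows, assuming a Gcc-compatible compiler with SSE2 available (the default on 64-bit x86). The name xmm_add is hypothetical, the "+" modifier marks an operand that is both read and written, and the #else branch is a fallback that is not from the article.

```c
/* Add two doubles with the SSE2 addsd instruction. */
double xmm_add(double a, double b) {
    double sum = a;
#if defined(__SSE2__) || defined(__x86_64__) || defined(__amd64)
    /* "x" places each operand in an XMM register; addsd adds %1 into %0. */
    __asm__("addsd %1, %0" : "+x" (sum) : "x" (b));
#else
    sum += b;                      /* fallback without SSE2 (assumption) */
#endif
    return sum;
}
```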
x87 Floating point stack
The following constraints are used to refer to variables or expressions loaded on the x87 floating point stack:
Constraint    Register
f             ST(0) - ST(7)
t             ST(0) (top of the FP stack)
u             ST(1) (register just below the top of the FP stack)
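The "t" constraint can be sketched with a one-instruction x87 operation. This assumes a Gcc-compatible compiler on x86; x87_abs is a hypothetical name, the "0" input constraint ties the input to the same location as output 0, and the #else branch is an illustrative fallback. Sun Studio 12's x87 support under optimization is described as incomplete later in this article, so treat this as a Gcc-style illustration rather than a tested Sun Studio example.

```c
/* Absolute value computed on the x87 stack with the fabs instruction. */
double x87_abs(double x) {
    double result;
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    /* "=t" pins the result to ST(0); "0" loads x into that same stack slot. */
    __asm__("fabs" : "=t" (result) : "0" (x));
#else
    result = (x < 0.0) ? -x : x;   /* fallback for non-x86 targets (assumption) */
#endif
    return result;
}
```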
Memory Constraints
A memory constraint has the form "<m>" where <m> is one of the following letters:
Constraint    Description
m             Memory operand of any general addressing mode
o             Offsettable addressing mode
V             Non-offsettable addressing mode
<             Autodecrement addressing mode
>             Autoincrement addressing mode
These constraints instruct the compiler to generate a memory reference wherever this argument's placeholder occurs in the instruction string.
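As a brief sketch of a memory operand, the function below updates an integer in place. It assumes a Gcc-compatible compiler on x86; add_in_place is a hypothetical name, "+m" marks a memory operand that is both read and written, and the #else branch is a fallback added for illustration.

```c
/* Add delta to *counter directly in memory and return the new value. */
int add_in_place(int *counter, int delta) {
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    /* %0 expands to a memory reference for *counter, %1 to a register. */
    __asm__("addl %1, %0" : "+m" (*counter) : "r" (delta) : "cc");
#else
    *counter += delta;             /* fallback for non-x86 targets (assumption) */
#endif
    return *counter;
}
```

The "cc" entry records that addl changes the condition codes.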
Immediate Constraints
An immediate constraint has the form "<i>" where <i> is one of the following letters:
Constraint    Description
i             Any sized constant
e             Constant in range -2147483648 - 2147483647
n             A constant less than a word wide
I             Constant in range 0 - 31
J             Constant in range 0 - 63
K             0xff
L             0xffff
M             Constant in range 0 - 3
N             Constant in range 0 - 255
Z             Constant in range 0 - 0xffffffff
E             Floating point operand (native const double)
F             Floating point operand (const double)
G             Standard 80387 floating point constant
s             Constant not known at compile time (symbolic)
These constraints instruct the compiler to generate an immediate operand wherever this argument's placeholder occurs in the instruction string.
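A short sketch of an immediate operand, using the "I" constraint for a shift count, is shown below. It assumes a Gcc-compatible C compiler on x86; times_eight is a hypothetical name and the #else branch is a fallback that is not from the article.

```c
/* Multiply by eight with a left shift whose count is an immediate. */
unsigned int times_eight(unsigned int v) {
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    /* "I" accepts a constant in 0 - 31; %1 is emitted as the immediate $3. */
    __asm__("shll %1, %0" : "+r" (v) : "I" (3) : "cc");
#else
    v <<= 3;                       /* fallback for non-x86 targets (assumption) */
#endif
    return v;
}
```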
Digit Constraints
Digit constraints are of the form "<n>" where <n> is a number which corresponds to the position of an output. This constraint is only allowed on an input and the digit must refer to an output. The effect is to bind the constrained input to the same location that the indicated output uses.
The example below illustrates the use of digit constraints.
asm ("addl %1,%0 \n\t"
:"=r"(foo)
:"r"(bar),"0"(foo) );
The simple example above essentially implements foo = foo + bar; The "0" in the input constraint indicates that variable foo needs to be loaded into the same register which will also contain the output result. It is also possible to specify a particular register as shown below:
asm ("addl %1,%0 \n\t"
:"=a"(foo)
:"b"(bar),"0"(foo) );
In this case, the compiler will generate code to load variable foo into register %eax (since that input is constrained to output 0 and output 0 is constrained to %eax by the "=a" constraint) and bar will be loaded into register %ebx and the result foo will be available in register %eax.
Here is another example of using digit constraints to shift a value by a given shift count:
int shift_count = 5; int shifted_value = 37;
asm ("sarl %1, %0\n\t" : "=r" (shifted_value) : "c" ((char) shift_count), "0" (shifted_value) );
In this example, the variable "shift_count" is loaded into the %cl register (note that the cast is required to convert the 32-bit integer "shift_count" to an 8-bit value as required by the sarl instruction). The variable "shifted_value" is loaded into a register chosen by the compiler, with the proviso that the compiler will choose the same register to hold the result of the sarl instruction, as requested by the "0" digit constraint.
Multiple Constraints
More than one constraint letter may be used in a constraint string. When this occurs, the compiler looks at the input or output to determine which constraint is the best match for the given expression. If the constraint string contains an immediate constraint, and the input is a constant of the correct type, then the input will be treated as an immediate. Otherwise, if the constraint string contains a memory constraint and the input or output is an lvalue, then a memory reference will be generated. Failing this, if the constraint string contains a register constraint then the input will be loaded into or the output will be written to a register. The example below illustrates usage of multiple constraints:
asm ("mulq %3"
: "=a"(low),"=d"(high)
: "a"(word),"rm"(foo) );
The mulq instruction multiplies the contents of a 64-bit memory location or register by the contents of %rax. The result is available in the %rdx, %rax register pair, with the high 64 bits in %rdx and the low 64 bits in %rax.
One of the operands of the multiply, the variable foo in the example above, can be available in either memory or in a register. The "rm" constraint used in the example allows the compiler to choose the most appropriate location.
The example above also shows an interesting instance of constraints usage. Although there is no explicit reference to %0 or %1 in the asm template, the mulq instruction implicitly returns the results in %rax and %rdx, therefore "=a" and "=d" must be indicated as output constraints. Similarly, the first input operand (word) is expected to be available in the %rax register.
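The same pattern can be sketched as a complete function using the 32-bit mull instruction, which behaves analogously with %eax and %edx and assembles in both 32-bit and 64-bit mode. This assumes a Gcc-compatible compiler on x86; mul32_wide is a hypothetical name and the #else branch is a fallback added for illustration.

```c
#include <stdint.h>

/* Full 32 x 32 -> 64 bit unsigned multiply. */
uint64_t mul32_wide(uint32_t a, uint32_t b) {
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    uint32_t low, high;
    /* mull multiplies %eax by %3 and leaves the product in %edx:%eax.
       "rm" lets the compiler keep b in a register or in memory. */
    __asm__("mull %3"
            : "=a" (low), "=d" (high)
            : "a" (a), "rm" (b)
            : "cc");
    return ((uint64_t) high << 32) | low;
#else
    return (uint64_t) a * b;       /* fallback for non-x86 targets (assumption) */
#endif
}
```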
Modifiers
Certain modifier characters may be included in a constraint string to control how the compiler applies that constraint. They are:
Modifier    Description
=           Operand is only written
+           Operand is read and written
&           Operand is clobbered early
%           This operand and the following one are commutative
#           Ignore all characters up to the next comma as constraints
*           Ignore the following character when choosing register preferences
Note: If = or + are specified in a constraint string, they must be the first character in the string.
The following example shows a use of the "+" modifier:
asm ("sarl %1, %0\n\t" : "+r" (shifted_value) : "c" ((char) shift_count) );
The variable "shifted_value" in the example above is both an input and an output. The compiler generates code to load "shifted_value" into a general purpose register and ensures that "shifted_value" is available as an output in that same register. The same effect can be achieved using digit constraints (see the example above). However, if there is no explicit reference to the input parameter in the asm template, it is more concise to use the "+" modifier instead.
The compiler normally makes the assumption that all inputs to an asm statement are consumed before any outputs are written to in the instructions which constitute the asm's instruction string. If this is not the case for a particular instruction sequence, the user must inform the compiler which outputs are written early (i.e. before the last input is used). This rule allows the compiler to use registers efficiently by choosing the same register for an input and an output under normal conditions, but allows the user to override this behavior when it would be semantically incorrect to do so. The use of the early clobber ("&") modifier provides the means to communicate this information to the compiler. A register chosen for an operand marked as early clobber may not be used to hold any of the input operands. The following example illustrates the use of early clobber:
asm (
    "    subq %2,%2          \n"
    ".align 16               \n"
    "1:  movq (%4,%2,8),%0   \n"
    "    adcq (%5,%2,8),%0   \n"
    "    movq %0,(%3,%2,8)   \n"
    "    leaq 1(%2),%2       \n"
    "    loop 1b             \n"
    "    sbbq %0,%0          \n"
    : "=&a"(ret),"+c"(n),"=&r"(i)
    : "r"(rp),"r"(ap),"r"(bp)
    : "cc" );
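A much smaller sketch shows why early clobber matters. It assumes a Gcc-compatible compiler on x86; add_early is a hypothetical name and the #else branch is an illustrative fallback. The first instruction writes %0 while %2 is still unread, so without "&" the compiler would be free to place the output and the second input in the same register.

```c
/* Two-instruction add whose output is written before the last input is read. */
int add_early(int a, int b) {
    int sum;
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    __asm__("movl %1, %0\n\t"      /* %0 is written here ...               */
            "addl %2, %0"          /* ... while %2 is still needed here    */
            : "=&r" (sum)          /* "&" keeps %0 apart from every input  */
            : "r" (a), "r" (b)
            : "cc");
#else
    sum = a + b;                   /* fallback for non-x86 targets (assumption) */
#endif
    return sum;
}
```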
Matching register types to input and output types
The register chosen by the compiler must match the type of the input or output in the source code. There are two ways for the user to affect what type of register the compiler will choose for any given input or output. The first is to insert a size letter between the "%" and the digit in the placeholder in the instruction string, such as:
asm("movl %l1, %l0\n" : "=r" (result) : "r" (source));
This will choose a 32-bit register for each of the registers chosen to hold "result" and "source". The supported types are:
Type letter    Register size
b              8 bits
h              16 bits
l              32 bits
q              64 bits
The second way to affect the type of the register chosen is by changing the type of the source expression passed to the asm statement. By default the type of register is chosen based on the type of the input or output expression. Casting this expression will also influence the size of register chosen to hold that expression in the code generated for the asm statement.
The Clobber List
Some instructions implicitly modify a register, or the user may insert a specific register name in the instruction string, such as:
asm("movl %0, %%eax\n" : : "r" (var) : "eax");
In such cases the modified register should be placed in the clobber list (the comma-separated list of strings following the third colon) to inform the compiler that this register is written to by the asm statement. This allows the compiler to keep enough information about the liveness of registers around an asm statement to continue to do normal optimizations. Without this information, the compiler would have to forgo many optimizations in any routine that contained asm statements. Note that outputs need not be placed in the clobber list. The compiler knows that they are written to already.
The following example shows a use of clobber lists:
__asm__("movl %0,%%ecx \n\t"
        "movl %1,%0    \n\t"
        "movl %%ecx,%1 \n\t"
        : "=a"(bar),"=b"(foo)
        : "0"(bar),"1"(foo)
        : "ecx" );
The values of variable foo and variable bar are swapped in the example above, using %ecx as an intermediate place holder. Any value held in the register %ecx earlier will be lost after executing the asm template; therefore, "ecx" must be mentioned in the clobber list.
Current Limitations and Known Bugs
No alternative constraints
Gcc allows an operand's constraint string to have more than one series of constraint letters in a comma-separated list from which the best matching constraint is chosen based on the cost of loading that operand for each legal alternative constraint. Sun Studio 12 currently implements only the simpler multiple constraint syntax described above.
Assembler is not operand sensitive
At present, the Sun Studio 12 assembler requires that the type of the opcode for any given instruction matches the types of its operands. Gcc's assembler, by contrast, can infer the suffix required for an opcode from the types of the operands of the instruction. This is a limitation when writing asm statements intended to work interchangeably on 32-bit and 64-bit platforms. Most often such asm statements must be split into 32-bit and 64-bit versions surrounded by appropriate #ifdefs, as in the following example:
void f () {};

int main () {
    void (*fptr)() = 0;
#ifdef __amd64
    asm ("movq %[f], %[fptr]"
#else
    asm ("movl %[f], %[fptr]"
#endif
         : [fptr] "=m" (fptr) : [f] "r" (f));
    if ( fptr != f ) return 1;
    return 0;
}
As another example of operand sensitivity, the following program will fail to assemble because of type mismatches between the opcode and one of its operands:
int main() {
    int a, res;
    char b;

    /* The input argument "c" is of the wrong type. The movl
       instruction expects 32-bit integers as its operands. */
    asm("movl %1, %0\n\t" : "=r" (res): "c" (b));

    /* The sete instruction requires an 8-bit result register,
       but res is a 32-bit integer. */
    asm("sete %0\n\t" : "=r" (res));

    /* Variable "a" is an int, but the shrl instruction requires
       an 8-bit shift count in register %cl. */
    asm ("shrl %1, %0\n\t" : "+r" (res) : "c" (a));
}
The user will see assembly errors such as the following:
Assembler:
"/tmp/srscott/yabeAAAJqaGsx", line 14 : Syntax error
Near line: "movl %cl, %edx"
"/tmp/srscott/yabeAAAJqaGsx", line 18 : Syntax error
Near line: "sete %eax"
"/tmp/srscott/yabeAAAJqaGsx", line 23 : Syntax error
Near line: "shrl %ecx, %eax"
The following modifications will allow it to compile without errors:
int main() {
    int a, res;
    char b;

    /* Cast the second argument to the required type. */
    asm("movl %1, %0\n\t" : "=r" (res): "c" ((int) b));

    /* Use an 8-bit lvalue for the output argument. */
    asm("sete %0\n\t" : "=r" (b));

    /* Cast the second argument to the required type. */
    asm ("shrl %1, %0\n\t" : "+r" (res) : "c" ((char) a));
}
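The 8-bit output rule for sete can be sketched in a function that does something observable. This assumes a Gcc-compatible compiler on x86; ints_equal is a hypothetical name, "q" is the byte-addressable register class from the table earlier in this article, and the #else branch is a fallback that is not from the article.

```c
/* Return 1 if a == b, else 0, using cmpl followed by sete. */
int ints_equal(int a, int b) {
    unsigned char eq;              /* 8-bit lvalue, as sete requires */
#if defined(__x86_64__) || defined(__i386__) || defined(__amd64) || defined(__i386)
    __asm__("cmpl %2, %1\n\t"      /* compare the two 32-bit inputs    */
            "sete %0"              /* store the equal flag in one byte */
            : "=q" (eq)
            : "r" (a), "r" (b)
            : "cc");
#else
    eq = (unsigned char) (a == b); /* fallback for non-x86 targets (assumption) */
#endif
    return eq;
}
```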
Inefficiency of memory constraints
Memory constraints lead to an extra level of indirection which requires an extra register to hold the address. This will not impact correctness, but is less efficient than the user intended when the address is simple enough to fit one of the addressing modes supported for that instruction.
Immediate constraints do not work in C++
The following program will compile and execute correctly when compiled using the Sun Studio 12 C compiler, but C++ has a bug relating to the "i" constraint that prevents successful compilation:
int main() {
    int res=0, inp=3;
    asm("\tmovl %1, %0\n": "=m" (res) : "i" (4));
    if (inp == 3 && res == 4) return 0;
    return 1;
}
This problem can be worked around by storing the immediate value in a variable and using that variable with an "r" constraint:
int main() {
int res=0, inp=3;
const int imm = 4;
asm("\tmovl %1, %0\n": "=m" (res) : "r" (imm));
if (inp == 3 && res == 4) return 0;
return 1; }
Support for x87 floating point constraints when optimizing
When optimizing, support for x87 floating point constraints is incomplete. We intend to solidify this area in a future patch to Sun Studio 12.
Conclusion
This article has attempted to explain the syntax and semantics of Sun Studio 12's new Asm Statement and provide examples of how to work around known differences from the Gcc Asm Statement. This article reflects the current state of Sun Studio 12 with respect to this feature as of the SS12 patch 1 release. Some of what is described here may not work with the Sun Studio 12 FCS release. We intend to improve our compatibility with Gcc in future patches of Sun Studio 12. As we do so, many of the limitations and known bugs described above will be removed. We hope that you have found this article useful. Any comments are welcomed.
Posted by x86be
( Jun 01 2007, 06:47:43 PM PDT )
Posted by Roland Mainz on June 03, 2007 at 09:59 PM PDT #
Posted by Anoop Kumar on July 02, 2007 at 03:36 PM PDT #
Any idea why having a bsfq instruction in inline amd64 assembler under sun studio 12 would prevent iropt from inlining the whole function?
Posted by Andy on August 29, 2007 at 12:38 AM PDT #