Nikolay Igotti
Fast interpreter using gcc's computed goto
Writing interpreter could be fun, but many people end up doing them in assembly, for performance reasons. Gcc provides neat "computed goto" feature, very useful in writing interpreters. Consider following simple bytecode, containing 3 instructions:0x1 - takes following byte in insn stream and prints "first <byte>" 0x2 - prints "second" 0x3 - terminates programWe can write an interpreter for such a language in following manner:
#include <stdio.h>
void interp(char* start) {
void* table[4] = {NULL, &&inst1, &&inst2, &&inst3};
#define DISPATCH(n) start += n; goto *(table[*start]);
DISPATCH(0);
do {
inst1:
printf("first %d\n", *(start+1));
DISPATCH(2);
inst2:
printf("second\n");
DISPATCH(1);
inst3:
break;
} while (1);
}
int main() {
char test[] = {0x1, 0x5, 0x1, 0x2, 0x2, 0x3};
interp(test);
return;
}
For x86 gcc -O4 -fomit-frame-pointer generates following machine code
80483c6: 0f be 03 movsbl (%ebx),%eax 80483c9: 8b 04 84 mov (%esp,%eax,4),%eax 80483cc: ff e0 jmp *%eax 80483ce: 89 f6 mov %esi,%esi 80483d0: 83 ec 08 sub $0x8,%esp 80483d3: 0f be 43 01 movsbl 0x1(%ebx),%eax 80483d7: 50 push %eax 80483d8: 68 0c 85 04 08 push $0x804850c 80483dd: e8 0a ff ff ff call 80482ecwhich is pretty effective.80483e2: 83 c3 02 add $0x2,%ebx 80483e5: 0f be 03 movsbl (%ebx),%eax 80483e8: 8b 44 84 10 mov 0x10(%esp,%eax,4),%eax 80483ec: 83 c4 10 add $0x10,%esp 80483ef: ff e0 jmp *%eax 80483f1: 8d 76 00 lea 0x0(%esi),%esi 80483f4: 83 ec 0c sub $0xc,%esp 80483f7: 68 16 85 04 08 push $0x8048516 80483fc: e8 cb fe ff ff call 80482cc 8048401: 43 inc %ebx 8048402: 0f be 03 movsbl (%ebx),%eax 8048405: 8b 44 84 10 mov 0x10(%esp,%eax,4),%eax 8048409: 83 c4 10 add $0x10,%esp 804840c: ff e0 jmp *%eax 804840e: 89 f6 mov %esi,%esi 8048410: 83 c4 10 add $0x10,%esp 8048413: 5b pop %ebx 8048414: 5e pop %esi 8048415: 5f pop %edi 8048416: c3 ret 8048417: 90 nop
Posted at 03:33PM Jun 07, 2007 by nike in Sun | Comments[12]
Thursday Jun 07, 2007
Posted by Ashish Shukla on June 07, 2007 at 05:52 PM MSD #
Posted by Andrew on June 07, 2007 at 05:58 PM MSD #
Posted by nike on June 07, 2007 at 06:02 PM MSD #
Posted by nike on June 07, 2007 at 06:02 PM MSD #
Posted by Ashish Shukla on June 07, 2007 at 06:18 PM MSD #
Posted by fatcatair on June 07, 2007 at 10:02 PM MSD #
Posted by DP on June 08, 2007 at 08:31 PM MSD #
Posted by Nico on June 09, 2007 at 04:11 AM MSD #
Posted by Nico on June 09, 2007 at 04:11 AM MSD #
Posted by nike on June 09, 2007 at 09:58 AM MSD #
[...] movl -1176(%ebp), %edx movl .L333(,%edx,4), %eax jmp *%eax .section .rodata .align 4 .align 4 .L333: .long .L41 .long .L78 [ ... ]Or am I missing something here?Posted by Philip Kendall on July 30, 2007 at 05:26 PM MSD #
I have the same problem as Philip. With G++ the compiler seems to go though incredible contortions to preserve a single indirect jump. Even going so far as to combine jumps from separate jump tables - with a series of direct jumps. This seems utterly bewildering behaviour as it specially breaks the performance gain having a jmp *%eax for each interpreter leg.
The performance gain comes bout because the branch prediction logic has many separate jmp instructions - not one. So there is a far better chance of a good predict. Funnelling the execution pretty much guarantees a branch mispredict, and the dire penalty that comes with it.
Posted by Francis Vaughan on September 07, 2007 at 03:45 PM MSD #