Nick Stephen's blog
Wednesday November 22, 2006

Some Assembly Required - down memory lane

I'm writing this blog article about how a simple idea in some code I wrote in 1995 seems to have propagated itself, and shows the power of open-source... it's not a revolutionary idea, it's no doubt been invented by others too, but one can still trace my implementation and I found that fun :-)

At the time, I was one of the two developers who started the MkLinux project at the Open Software Foundation. We were porting Linux to the Mach microkernel, and porting the whole lot to the PowerPC processor, and in particular, to Apple's PowerMac hardware.

The problem I faced was one of bootstrapping the development. There wasn't a suitable development environment on the PowerMac, so we were forced to use cross-compilation from an x86 box.

So far so good: gcc is great at cross-compilation, so building the bits for the PowerMac wasn't a big deal. But the boundary between C and assembler was a problem.

The problem statement

Down in the guts of the kernel, there is a little bit of assembler that's required to interact with the hardware in specific ways. Things like interrupt handlers and virtual memory managers require small amounts of assembly code to operate correctly. And this assembly code needs to read and write data that is also manipulated via data structures in C.

Here's where there's a catch - it's necessary to be able to correctly access the different elements inside a C data structure from assembler code, which requires understanding the layout of the C data structure in memory.

Typically this is done via a header file that defines a bunch of constants giving the offsets of all the elements of C structures that need to be accessed from assembler, so that the assembly file can include the header file and just refer to the constants.
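To make that concrete, here is a minimal sketch of such a hand-maintained header next to the structure it describes. The layout of struct pcb is hypothetical (the field names ksp and sr0 are borrowed from the example further down, the uint32_t types are an assumption), and pcb_offsets_match and check_pcb_offsets are illustrative helpers, not part of any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real process control block. */
struct pcb {
    uint32_t ksp;
    uint32_t sr0;
};

/* What an offsets header handed to the assembler might contain. */
#define PCB_KSP   0
#define PCB_SR0   4
#define PCB_SIZE  8

/* Returns 1 if the constants agree with the layout this compiler chose. */
int pcb_offsets_match(void)
{
    return offsetof(struct pcb, ksp) == PCB_KSP
        && offsetof(struct pcb, sr0) == PCB_SR0
        && sizeof(struct pcb) == PCB_SIZE;
}

/* Abort loudly if the header has drifted away from the structure. */
void check_pcb_offsets(void)
{
    assert(pcb_offsets_match());
}
```

The assembler only ever sees the three constants, so nothing stops them from going stale when someone edits the structure. That is the whole reason for wanting them generated rather than typed in.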

But how do you know how the compiler is going to lay out the structures in memory? Wouldn't it be a good idea to write some C code that you could compile and execute, and which would generate the header file for use by the assembly, with all the correct offsets guaranteed? Absolutely! And this is what the OSF MK microkernel and things like FreeBSD already had in their build logic.
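A generator of that kind could look roughly like the sketch below. This is not the OSF MK or FreeBSD code, just an illustration of the compile-and-execute approach; struct pcb and its uint32_t fields are assumptions, and emit_pcb_offsets is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the real process control block. */
struct pcb {
    uint32_t ksp;
    uint32_t sr0;
};

/* Write one "#define NAME value" line to the header being generated. */
static void emit(FILE *out, const char *name, size_t value)
{
    assert(out != NULL && name != NULL);
    fprintf(out, "#define\t%s\t%zu\n", name, value);
}

/*
 * Write the body of the offsets header, using whatever layout the
 * compiler that built THIS program chose. Returns the number of lines.
 */
int emit_pcb_offsets(FILE *out)
{
    emit(out, "PCB_KSP",  offsetof(struct pcb, ksp));
    emit(out, "PCB_SR0",  offsetof(struct pcb, sr0));
    emit(out, "PCB_SIZE", sizeof(struct pcb));
    return 3;
}
```

Calling emit_pcb_offsets(stdout) from main and redirecting the output gives the header body. The catch is in the comment: the numbers describe the machine the generator runs on, which is only useful when that is also the machine the kernel is built for.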

The difficulty I had was that this generator was being executed on my host machine, not on my target machine (on which I could not execute code yet). My host was an x86 box, which has a different endianness and different structure layout rules to my powerpc target, so the generated offsets were all completely bogus!
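A small probe shows how the layout can differ between machines. This is only a sketch: struct probe and the function names are made up for illustration. As far as I know, the 32-bit x86 System V ABI aligns a double inside a structure on 4 bytes, while the 32-bit PowerPC ELF ABI aligns it on 8, so the same source gives different offsets depending on which compiler target built it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical structure whose padding depends on the target ABI. */
struct probe {
    char   tag;     /* 1 byte, followed by ABI-dependent padding */
    double value;   /* alignment requirement differs between ABIs */
};

/* Offset of 'value' as laid out by the compiler that built this code. */
size_t probe_value_offset(void)
{
    return offsetof(struct probe, value);
}

/* Print the layout seen on the machine running this program. */
void print_probe_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "offsetof(value) = %zu, sizeof(struct probe) = %zu\n",
            probe_value_offset(), sizeof(struct probe));
}
```

Run natively, this only ever reports the host's answer, which is exactly the problem when the host and the target disagree.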

An intermediate fix

Initially, my fix included manually editing the header file to reflect the offsets inside the data structures as I understood them from reading the PowerPC ELF ABI specifications. But this was prone to errors, hard to debug, and it wasn't easy to add new constant definitions! But while cross-compiling, I couldn't see another way to get those offsets for powerpc when running on x86.

A eureka moment

(No, I wasn't in my bath when I saw how to fix this, and I didn't run down the street naked - but I was in my bed and fast asleep when suddenly I woke up with the solution in mind).

The answer was staring me in the face all along - I already had the gcc cross-compiler, which generated powerpc code for my target machine, code that I couldn't execute on my host to create the header file... but I could also persuade the cross-compiler to output the information in a format that WAS usable directly on the host machine to create the header file in question.

The secret wasn't to try and execute code to generate the header file, but to force the cross-compiler itself to emit the header contents, directly on the host machine.

But it's a cross-compiler! It's not meant to do that! However, it can be done by clever use of the intermediate assembly file for the powerpc target... the C compilation phase can be used to generate an assembly file containing all of the structure offset information needed. In particular, by adding fake 'asm' statements to some C code, and by compiling just to the assembly file, I was able to create the header file almost as-is by a simple string extraction from the intermediate assembly file.

Here's a little example C program that can be compiled to assembly, and from which a few constants providing offset information can be extracted:
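/* TYPE is a pointer type here, e.g. offsetof(struct pcb *, ksp).
 * size_t and u_int are assumed to come from pcb.h or the headers it pulls in. */
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE)0)->MEMBER)

/* Plant a marked "#define SYM VAL" line in the compiler's assembly output */
#define DECLARE(SYM,VAL) \
	__asm("#DEFINITION##define\t" SYM "\t%0" : : "n" ((u_int)(VAL)))

#include "pcb.h"

int main(int argc, char *argv[]) {

	DECLARE("PCB_KSP",	offsetof(struct pcb *, ksp));
	DECLARE("PCB_SR0",	offsetof(struct pcb *, sr0));

	DECLARE("PCB_SIZE",	sizeof(struct pcb));

	return 0;
}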
 
If you put code like this in your program, don't expect it to compile fully! But you can compile it to assembly, and in the assembly you'll find lines containing the string '#DEFINITION#'. If you extract these lines and strip everything up to and including that marker, you will produce the body of the header file that we had so much trouble generating. On a Unix box this can be a simple sed or awk expression.
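The extraction step was a sed or awk one-liner in practice (something like sed -n 's/.*#DEFINITION#//p' would do), but the logic is easy to spell out. Here is the same idea as a C sketch, to stay in one language; extract_definition and filter_definitions are hypothetical names, and the sample line in the comment is an assumption about what the compiler emits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Marker planted by the DECLARE macro in the example above. */
#define MARKER "#DEFINITION#"

/*
 * If 'line' contains the marker, copy everything after it into 'out'
 * (minus the line ending) and return 1; otherwise return 0.
 * Assumed input: "#DEFINITION##define<TAB>PCB_KSP<TAB>0"
 */
int extract_definition(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);

    const char *hit = strstr(line, MARKER);
    if (hit == NULL)
        return 0;

    hit += strlen(MARKER);
    size_t len = strcspn(hit, "\r\n");  /* stop before any line ending */
    if (len >= outsz)
        len = outsz - 1;                /* truncate rather than overflow */
    memcpy(out, hit, len);
    out[len] = '\0';
    return 1;
}

/* Copy every extracted definition from 'in' to 'out'; return the count. */
int filter_definitions(FILE *in, FILE *out)
{
    char line[512], def[512];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (extract_definition(line, def, sizeof def)) {
            fprintf(out, "%s\n", def);
            count++;
        }
    }
    return count;
}
```

Point filter_definitions at the assembly file produced by running the cross-compiler with -S, and what comes out is the header body. Nothing here ever executes target code, so it works just as well on the x86 host.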

Spreading of the meme

Well, this idea was coded up and worked fine in the OSF MK port to powerpc, and was used in MkLinux.

Soon after it was available in MkLinux, it got used to solve the same problem for the native port of Linux to powerpc... great!

Now I can see that this technique has spread across the Linux kernel to lots of architectures. Even better!

From there it's gone into other projects around Linux, such as a couple of virtualization projects.

On a parallel branch, the OSF MK codebase also got used in the GNU Hurd project for powerpc.

Similarly, the OSF MK kernel used in MkLinux was used as the initial base of Apple's Darwin project, used as the heart of Mac OS X, and you can see the same idea present in their kernel, for the powerpc and now also for i386.

All the above can be fairly easily shown to originate from the original code I'd written for Mach, which was completely free of any licensing terms for reuse (free as in BSD), so that shows a tiny example of the power of open-source to share ideas and reuse them.

I can't wait for our current project here to go open-source :-)

( Nov 22 2006, 11:07:29 AM CET )
