20060623 Friday June 23, 2006

Calling Conventions

Well, I actually got a comment from someone interested in the arcane reason -Xbatch works slightly differently in Mustang. This is kind of long, so it will take multiple entries before it is complete.

So the reason is tied to the calling conventions we use when a Java method calls another method. Now it is not too surprising, if you think about it, that the interpreter uses a stack-based calling convention. Since the operations in the Java Virtual Machine are defined in terms of stack operations, it makes sense. One of the other key points is how "locals" are created and used. The JVM spec says that the N parameters we pass to the called method (the callee) become the first N locals of the callee (Locals[0] .. Locals[N-1]). Now a method can, and usually will, have more locals than it has parameters. So for efficiency in accessing locals, the interpreter wants the locals contiguous in memory. Imagine how slow the interpreter would be if every local access looked like:

   if (local_num >= num_of_parms)
       ... locals[local_num - num_of_parms] ...
   else
       ... parms[local_num] ...

It would be pathetic. So clearly we want this in contiguous space. This implies two things: the interpreter wants to see parameters in memory, and all locals must be contiguous. The first condition has an impact on the compilers.
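To make the contiguous layout concrete, here is a minimal sketch in Python. It is only an illustration, not how the VM is written: the class `InterpreterFrame` and its `load`/`store` methods are hypothetical names made up for this example.

```python
class InterpreterFrame:
    """Hypothetical interpreter frame: one contiguous array holds the
    parameters followed by the method's other locals."""

    def __init__(self, args, max_locals):
        if max_locals < len(args):
            raise ValueError("max_locals must cover all parameters")
        # Parameters become locals[0] .. locals[N-1]; the rest start empty.
        self.locals = list(args) + [None] * (max_locals - len(args))

    def load(self, index):
        # Every access is a single indexed read: no parameter/local branch.
        return self.locals[index]

    def store(self, index, value):
        self.locals[index] = value


# A method with two parameters and four locals in total.
frame = InterpreterFrame(args=[10, 20], max_locals=4)
frame.store(2, frame.load(0) + frame.load(1))
print(frame.locals)
```

The point is that `load` and `store` never need to ask whether an index names a parameter or an ordinary local.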

Now it is clear that the interpreter wants parameters passed in memory. It is also reasonably clear that the compiler wants to pass parameters in registers if at all possible. Compilers try to avoid memory accesses because memory is just slow. So how does a compiled Java method call an interpreted Java method, or vice versa? Well, your first thought might be that the compiler knows it is calling an interpreted method, so it should just put the parameters where the interpreter expects them. Wrong!

This is a dynamic environment. Even if the compiler knew the callee was interpreted at the point it was creating the caller's code, by the time the call executes we may have compiled code for that callee. We sure don't want to always have that path run interpreted, so we'd have to recompile the caller to now use the compiled calling convention. Oh wait, it's a dynamic environment: by the time we execute the call, the system has decided for whatever reason that the callee's compiled code is invalid, and now we must interpret (at least until a recompile is complete). Now what do we do? Clearly we're getting nowhere with this approach.

Here's where adapters come in. We produce small pieces of code that convert from the compiled convention to the interpreted one (the C2I adapter) or the reverse (the I2C adapter). One thing to realize is that we don't really need a separate piece of code for each method. We need a unique one for each signature we see, so two methods with the identical signature can share the same adapter. As it turns out, prior to Mustang the server compiler would actually produce little code blobs this way and share them. The client compiler would actually embed the I2C adapter code in the code for the method. So for the rest of this entry I'm going to be talking about how things looked when using -server.
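Here is a rough sketch of the sharing idea in Python. Everything in it is an assumption for illustration: the `AdapterCache` class, the use of a tuple of argument type names as the signature key, and the choice of two argument registers are all made up, and real adapters are generated machine code rather than functions that shuffle lists.

```python
class AdapterCache:
    """Hypothetical cache holding one (C2I, I2C) adapter pair per signature."""

    def __init__(self, num_arg_registers=2):
        self.num_arg_registers = num_arg_registers  # assumed register count
        self._pairs = {}

    def lookup(self, signature):
        # Methods with an identical signature share the same adapter pair.
        if signature not in self._pairs:
            self._pairs[signature] = self._make_pair(len(signature))
        return self._pairs[signature]

    def _make_pair(self, nargs):
        nregs = min(nargs, self.num_arg_registers)

        def c2i(registers, stack_args):
            # Compiled -> interpreted: spill register arguments so that
            # every argument ends up in memory, in order.
            return list(registers[:nregs]) + list(stack_args)

        def i2c(memory_args):
            # Interpreted -> compiled: load the leading arguments into
            # registers and leave the rest in memory.
            return list(memory_args[:nregs]), list(memory_args[nregs:])

        return c2i, i2c


cache = AdapterCache()
c2i, i2c = cache.lookup(("int", "int", "long"))
print(c2i([1, 2], [3]))
print(i2c([1, 2, 3]))
print(cache.lookup(("int", "int", "long"))[0] is c2i)
```

Two different methods that both take `(int, int, long)` get the same pair back from `lookup`, which is the whole reason to key on the signature instead of the method.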

So prior to Mustang, when we would make one of these kinds of call transitions, the adapter code would actually leave behind a frame. If this is a new concept to you then you might want to stop reading now and wait for another episode of airplane building, but I'll try to make it clear enough for the intrepid.

So every time a method is called, the runtime environment gets modified so that we can allocate stack space for local variables, save registers the caller expects us not to destroy (like the stack pointer the caller was using), and record what program counter we need to return to for execution in the caller. This space is the "activation frame", or frame for those of us that don't type well. Now for many different reasons (garbage collection being a very frequent one) the virtual machine needs to be able to walk the stack and identify all the frames. So a piece of the calling convention is the notion of how frames are laid out, so that if I have the stack pointer ( SP ) and the program counter ( PC ) of the youngest frame (the method currently executing) I should be able to find the SP and PC for every older frame ( [ SP(1), PC(1) ], [ SP(2), PC(2) ], ... [ SP(n), PC(n) ] ), where n grows by one for each older frame.
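As a toy model of that walk, here is a short Python sketch. The `Frame` class, its `sender` link, and the SP/PC values are all hypothetical; a real VM computes the older frame's SP and PC from the frame layout rather than following a ready-made pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    kind: str                          # e.g. "compiled", "interpreted"
    sp: int                            # stack pointer for this frame
    pc: int                            # program counter for this frame
    sender: Optional["Frame"] = None   # next older frame (the caller)


def walk(youngest):
    """Yield (sp, pc, kind) for the youngest frame, then each older one."""
    frame = youngest
    while frame is not None:
        yield frame.sp, frame.pc, frame.kind
        frame = frame.sender


# Made-up values: a compiled caller, a C2I adapter frame, an interpreted callee.
compiled = Frame("compiled", sp=0x3000, pc=0x400)
c2i = Frame("c2i adapter", sp=0x2000, pc=0x500, sender=compiled)
interp = Frame("interpreted", sp=0x1000, pc=0x600, sender=c2i)

kinds = [kind for _, _, kind in walk(interp)]
print(kinds)
```

Anything that walks the stack, such as the garbage collector, has to understand every kind of frame it can meet along the way.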

So a question to ask here is this, "When we call from a compiled method to an interpreted method and clearly execute an adapter does the adapter leave a frame behind?" Well the answer is that prior to Mustang adapters would leave behind a frame. So here's a picture of what the stack would look like prior to Mustang.

So here we see that the youngest frame is an interpreted frame and that we have left behind the C2I adapter frame in between the compiled frame and the interpreted frame.

So what is bad about this, and why would we bother changing it? (Believe me, before I was done with this change I asked myself this question a lot.)

Well, the biggest thing is that the stack walker has to be able to identify this frame and process it.

From the point of view of the frame handling and stack walking code in the VM, adapter frames are just a nuisance. Code has to be aware of them, and that costs us time in stack walking. For most of the system they really provide no benefit.

So let's look at a different picture. If you remember, the reason I started this entry was to answer the question of why -Xbatch works differently. So imagine that a thread executing in a compiled Java method goes to call a method and there is no compiled code for that method. So we'll execute a C2I adapter and then go to enter the interpreter. Now imagine that just as we reach the interpreter, we install compiled code for the callee. So we want to call the compiled code but now have the parameters in interpreter format on the stack. So what to do? Well, we call an I2C adapter of course.

So here's what the stack looks like after we finally make it to the method we wanted to call. Pretty ugly. Now we have a C2I frame and an I2C frame that don't really have a big benefit, and while this is rare, the system (frame code, stack walker) has to deal with it. Things are really bad when we put deoptimization in the picture, but that's a topic for much later.

Now those of you really paying attention might ask, "So what happens if just as you go to start executing the callee's compiled code the system decides that code is invalid and wants us to interpret?".
 
Now that's a truly evil question. Well we could just call the C2I adapter and go interpreted after all. But wait, what if just as we get to the interpreter there is newly recompiled code available? We're not getting anywhere (fast) and worse we could do this forever or at least until we run out of stack space while creating useless C2I/I2C transition pairs.

Well, fortunately that isn't what we do (did). In this rare case we can unwind the I2C frame, and then since there is a C2I frame ready for the interpreter we can proceed (modulo some register juggling). That way we can lose the race forever and accomplish no useful work, but at least we won't blow out the stack. Worst case we'll see a single useless C2I/I2C pair.
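Here is a small Python simulation of that race, just to show why the stack stays bounded. It is a sketch under made-up assumptions: the function `enter_callee` and its `races_lost` parameter are hypothetical, frames are plain strings, and each "lost race" stands for the callee's code state changing just as we arrive.

```python
def enter_callee(races_lost):
    """Simulate a compiled caller invoking a callee that has no compiled
    code at call time. Returns (final_stack, max_depth_seen)."""
    stack = ["compiled caller", "C2I adapter"]
    max_depth = len(stack)
    heading_to = "interpreter"
    while races_lost > 0:
        races_lost -= 1
        if heading_to == "interpreter":
            # Compiled code was installed just as we arrived: go via I2C.
            stack.append("I2C adapter")
            heading_to = "compiled code"
        else:
            # Compiled code was invalidated just as we arrived: unwind the
            # I2C frame and reuse the C2I frame already on the stack.
            stack.pop()
            heading_to = "interpreter"
        max_depth = max(max_depth, len(stack))
    if heading_to == "interpreter":
        stack.append("interpreted callee")
    else:
        stack.append("compiled callee")
    return stack, max(max_depth, len(stack))


for n in (0, 1, 2, 1001):
    final_stack, depth = enter_callee(n)
    print(n, final_stack, depth)
```

However many times the race is lost, the model never holds more than one C2I/I2C pair, because losing on the compiled side pops the I2C frame instead of pushing another C2I frame.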

Well that sets the stage for the changes in Mustang and why -Xbatch works differently but that's an entry for another day...





Jun 23 2006, 03:49:15 PM EDT
