Friday June 23, 2006
Calling Conventions
Well, I actually got a comment from someone interested in the arcane reason why -Xbatch works slightly differently in Mustang. The explanation is fairly long, so it will take multiple entries to complete.
So the reason is tied to the calling conventions we use when a Java method calls another method. If you think about it, it is not too surprising that the interpreter uses a stack-based calling convention: the operations of the Java Virtual Machine are defined in terms of stack operations, so it makes sense. Another key point is how "locals" are created and used. The JVM spec says that the N parameters we pass to the called method (the callee) become the first N locals of the callee (Locals[0] .. Locals[N-1]). A method can, and usually will, have more locals than it has parameters, so for efficient access the interpreter wants all of the locals to be contiguous in memory. Imagine how slow the interpreter would be if every local access looked like:
    if (local_num >= num_of_parms)   /* not a parameter: look in the locals area */
        value = locals[local_num - num_of_parms];
    else                             /* one of the incoming parameters */
        value = parms[local_num];
It would be pathetic. So clearly we want all of this in one contiguous space. That implies two things: the interpreter wants to see the parameters in memory, and all the locals (parameters included) must be contiguous. The first condition has an impact on the compilers.
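To make the contiguity argument concrete, here is a minimal C sketch of a toy interpreter stack, assuming every local occupies one word-sized `intptr_t` slot and that the caller has already pushed the arguments. The names (`ToyStack`, `push_arg`, `enter_method`, `local_at`, `set_local`) are hypothetical and are not HotSpot's; the point is only that when the arguments become Locals[0..N-1] in place, every local access is a single indexed load with no parameter test.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SLOTS 256

typedef intptr_t Slot;   /* hypothetical: one word-sized slot per local */

/* Toy interpreter stack that grows toward higher indices. */
typedef struct {
    Slot slots[STACK_SLOTS];
    int  top;            /* index of the next free slot */
} ToyStack;

/* The caller pushes each argument, so the arguments end up in memory. */
static void push_arg(ToyStack *s, Slot v) {
    assert(s->top < STACK_SLOTS);
    s->slots[s->top++] = v;
}

/* Enter a method whose num_params arguments were just pushed and which has
   max_locals locals in total. Locals[0] is the first argument, so the
   parameters need no copying; the remaining locals are reserved (and zeroed)
   directly after them. Returns the slot index of Locals[0]. */
static int enter_method(ToyStack *s, int num_params, int max_locals) {
    int base  = s->top - num_params;      /* first argument == Locals[0] */
    int extra = max_locals - num_params;  /* non-parameter locals */
    assert(base >= 0 && extra >= 0 && s->top + extra <= STACK_SLOTS);
    memset(&s->slots[s->top], 0, (size_t)extra * sizeof(Slot));
    s->top += extra;
    return base;
}

/* Fast path: one indexed load, whether or not the local is a parameter. */
static Slot local_at(const ToyStack *s, int base, int local_num) {
    return s->slots[base + local_num];
}

static void set_local(ToyStack *s, int base, int local_num, Slot v) {
    s->slots[base + local_num] = v;
}
```

Because the callee simply extends the region that already holds the arguments, entering a method costs nothing per parameter. The price is that the arguments must already be in memory, which is exactly the constraint the compilers run into.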
Now it is clear that the interpreter wants parameters passed in memory. It is also reasonably clear that the compiler wants to pass parameters in registers if at all possible; compilers try to avoid memory accesses because memory is just slow. So how does a compiled Java method call an interpreted Java method, or vice versa? Your first thought might be that the compiler knows it is calling an interpreted method, so it should just put the parameters where the interpreter expects them. Wrong! This is a dynamic environment. Even if the callee was interpreted when the compiler generated the caller's code, by the time the call executes we may have compiled code for that callee. We sure don't want that path to always run interpreted, so we'd have to recompile the caller to use the compiled calling convention. Oh wait, it's a dynamic environment: by the time we execute the call, the system may have decided, for whatever reason, that the callee's compiled code is invalid, and now we must interpret (at least until a recompile is complete). Now what do we do? Clearly we're getting nowhere with this approach.
Here's where adapters come in. We produce small pieces of code that convert from the compiled convention to the interpreted one (a C2I adapter) or the reverse (an I2C adapter).
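A minimal C sketch of what the two adapters do to the arguments, assuming a toy compiled convention that passes up to six word-sized arguments in "registers" (modeled as an array) and a toy interpreted convention that keeps them in memory in local-slot order. `RegArgs`, `MemStack`, `c2i_adapter`, `i2c_adapter`, and the register count are all hypothetical. Real adapters are generated machine code and also have to handle arguments that overflow the registers, two-slot long/double values, and frame bookkeeping, none of which this sketch attempts.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ARG_REGS 6   /* hypothetical: arguments passed in registers */
#define MEM_SLOTS    64

typedef intptr_t Word;

/* Toy compiled convention: arguments travel in registers. */
typedef struct { Word regs[NUM_ARG_REGS]; int argc; } RegArgs;

/* Toy interpreted convention: arguments live in memory, in the order
   they will become Locals[0..argc-1]. */
typedef struct { Word mem[MEM_SLOTS]; int top; } MemStack;

/* C2I: compiled caller -> interpreted callee.
   Spill each register argument to memory where the interpreter expects it. */
static void c2i_adapter(const RegArgs *in, MemStack *out) {
    assert(in->argc <= NUM_ARG_REGS && out->top + in->argc <= MEM_SLOTS);
    for (int i = 0; i < in->argc; i++)
        out->mem[out->top++] = in->regs[i];
}

/* I2C: interpreted caller -> compiled callee.
   Load the argc arguments the interpreter left in memory into registers. */
static void i2c_adapter(MemStack *in, int argc, RegArgs *out) {
    assert(argc <= NUM_ARG_REGS && argc <= in->top);
    int base = in->top - argc;
    for (int i = 0; i < argc; i++)
        out->regs[i] = in->mem[base + i];
    out->argc = argc;
    in->top = base;      /* the arguments have been consumed */
}
```

Neither adapter knows or cares which method is being called; all it needs is how many arguments there are and where each one goes.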
One thing to realize is that we don't really need a separate adapter for every method. We only need a unique one for each signature we see, so two methods with identical signatures can share the same adapter. As it turns out, prior to Mustang the server compiler actually produced little code blobs this way and shared them, while the client compiler embedded the I2C adapter code in the code for the method itself. So for the rest of this entry I'm going to talk about how things looked when using -server.
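Sharing by signature amounts to a cache keyed on the method signature. The C sketch below assumes a hypothetical `Adapter` record standing in for the generated code blob, and uses a linear scan where a real VM would more likely use a hash table; `AdapterCache` and `adapter_for` are illustrative names, not HotSpot's.

```c
#include <assert.h>
#include <string.h>

#define MAX_ADAPTERS 64

/* Hypothetical stand-in for a generated adapter blob; a real VM would
   hold machine code here, not just an id. */
typedef struct {
    char signature[64];   /* JVM method descriptor, e.g. "(II)I" */
    int  id;
} Adapter;

typedef struct {
    Adapter entries[MAX_ADAPTERS];
    int     count;
} AdapterCache;

/* Return the adapter for this signature, "generating" it only the first
   time the signature is seen, so identical signatures share one adapter. */
static const Adapter *adapter_for(AdapterCache *c, const char *sig) {
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->entries[i].signature, sig) == 0)
            return &c->entries[i];

    assert(c->count < MAX_ADAPTERS);
    assert(strlen(sig) < sizeof c->entries[0].signature);
    Adapter *a = &c->entries[c->count];
    strcpy(a->signature, sig);
    a->id = c->count++;
    return a;
}
```

For example, hypothetical methods `int add(int, int)` and `int max(int, int)` both have the descriptor `(II)I`, so the second lookup returns the adapter created for the first.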
So prior to Mustang, when we made one of these call transitions, the adapter code would actually leave behind a frame. If this is a new concept to you, you might want to stop reading now and wait for another episode of airplane building, but I'll try to make it clear enough for the intrepid.
Every time a method is called, the runtime environment gets modified so that we can allocate stack space for local variables, save the registers the caller expects us not to destroy (like the stack pointer the caller was using), and record the program counter we need to return to for execution in the caller. This space is the "activation frame", or just "frame" for those of us who don't type well.
Now, for many different reasons (garbage collection being a very frequent one), the virtual machine needs to be able to walk the stack and identify all the frames. So part of the calling convention is how frames are laid out, so that given the stack pointer (SP) and program counter (PC) of the youngest frame, [SP(0), PC(0)] (the method currently executing), I can find the SP and PC of every older frame: [SP(1), PC(1)], [SP(2), PC(2)], ..., [SP(n), PC(n)], where the index grows by one with each older frame.
So here is a question to ask: "When we call from a compiled method to an interpreted method and execute an adapter, does the adapter leave a frame behind?" Prior to Mustang, the answer is yes. Here's a picture of what the stack looked like prior to Mustang.
Here we see that the youngest frame is an interpreted frame, and that we have left behind the C2I adapter frame between the compiled frame and the interpreted frame. So what is bad about this, and why would we bother changing it? (Believe me, before I was done with this change I asked myself this question a lot.) The biggest thing is that the stack walker has to be able to identify this frame and process it.
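[Figure: the stack prior to Mustang, oldest to youngest: compiled frame, C2I adapter frame, interpreted frame]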
From the point of view of the frame handling and stack walking code in the VM, these adapter frames are just a nuisance. Code has to be aware of them, and that costs us time in stack walking. For most of the system they provide no real benefit.
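To show where the nuisance comes from, here is a C sketch of a walker that reports only the frames belonging to Java methods, assuming a hypothetical `FrameKind` tag on each frame of a toy stack. In this model an adapter frame belongs to no Java method, yet the walker still has to recognize it and step past it; every such special case is extra code, and extra time in every walk.

```c
#include <assert.h>

/* Hypothetical frame kinds in a toy stack model. */
typedef enum { FRAME_COMPILED, FRAME_INTERPRETED, FRAME_C2I, FRAME_I2C } FrameKind;

typedef struct {
    FrameKind kind;
    int       sender;   /* index of the next older frame, or -1 */
} Frame;

static int is_adapter(FrameKind k) {
    return k == FRAME_C2I || k == FRAME_I2C;
}

/* Collect the indices of frames that belong to Java methods, youngest
   first. Adapter frames must be recognized and skipped: the walker still
   visits them to find their senders, it just has nothing to report. */
static int java_frames(const Frame *stack, int youngest, int *out, int max) {
    int n = 0;
    for (int i = youngest; i >= 0 && n < max; i = stack[i].sender) {
        if (is_adapter(stack[i].kind))
            continue;   /* extra case every stack walk has to handle */
        out[n++] = i;
    }
    return n;
}
```

With the pre-Mustang layout (compiled caller, C2I adapter, interpreted callee) the walker visits three frames to report two.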
So let's look at a different picture. If you remember, the reason I started this entry was to answer the question of why -Xbatch works differently. Imagine that a thread executing in a compiled Java method goes to call a method, and there is no compiled code for that method. So we execute a C2I adapter and then go to enter the interpreter. Now imagine that just as we reach the interpreter, compiled code for the callee gets installed. We want to call the compiled code, but the parameters are now on the stack in interpreter format. So what do we do? Well, we call an I2C adapter, of course.
So here's what the stack looks like after we finally make it to the method we wanted to call. Pretty ugly. Now we have a C2I frame and an I2C frame that provide no real benefit, and while this is rare, the system (frame code, stack walker) has to deal with it. Things get really bad when we put deoptimization in the picture, but that's a topic for much later. Now those of you really paying attention might ask, "So what happens if, just as you go to start executing the callee's compiled code, the system decides that code is invalid and wants us to interpret?"
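[Figure: the stack after the race, oldest to youngest: compiled caller frame, C2I adapter frame, I2C adapter frame, compiled callee frame]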
Well, fortunately we don't (didn't) respond by stacking yet another C2I adapter on top. In this rare case we can unwind the I2C frame, and since there is a C2I frame with the arguments already laid out for the interpreter, we can proceed (modulo some register juggling). That way we could lose the race forever and accomplish no useful work, but at least we won't blow out the stack. In the worst case we'll see a single useless C2I/I2C pair.
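A toy C simulation of that race, assuming a hypothetical stack of frame kinds and a compiled caller that loses the race `races_lost` times before the callee finally runs. Each lost race pushes an I2C frame and then unwinds it, reusing the C2I frame's interpreter-format arguments, so the stack never holds more than one C2I/I2C pair no matter how often the race is lost. The names (`Kind`, `Stack`, `compiled_call`, and the push/pop helpers) are illustrative, not HotSpot's.

```c
#include <assert.h>

#define MAX_FRAMES 64

/* Hypothetical frame kinds in a toy model of the pre-Mustang stack. */
typedef enum { F_COMPILED, F_INTERPRETED, F_C2I, F_I2C } Kind;

typedef struct {
    Kind frames[MAX_FRAMES];   /* frames[0] is the oldest */
    int  depth;
    int  max_depth;            /* deepest the stack ever got */
} Stack;

static void push_frame(Stack *s, Kind k) {
    assert(s->depth < MAX_FRAMES);
    s->frames[s->depth++] = k;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
}

static void pop_frame(Stack *s) {
    assert(s->depth > 0);
    s->depth--;
}

static Kind top_frame(const Stack *s) {
    assert(s->depth > 0);
    return s->frames[s->depth - 1];
}

/* A compiled caller calls a callee that had no compiled code at call time.
   races_lost: times compiled code appears just as we reach the interpreter
   and is invalidated just as we are about to run it.
   callee_compiled_at_end: whether the callee finally runs compiled. */
static void compiled_call(Stack *s, int races_lost, int callee_compiled_at_end) {
    assert(top_frame(s) == F_COMPILED);
    push_frame(s, F_C2I);            /* arguments now in interpreter format */
    for (int i = 0; i < races_lost; i++) {
        push_frame(s, F_I2C);        /* code appeared: convert back */
        pop_frame(s);                /* code invalidated: unwind the I2C frame,
                                        the C2I frame's arguments are still there */
    }
    if (callee_compiled_at_end) {
        push_frame(s, F_I2C);
        push_frame(s, F_COMPILED);   /* compiled callee finally runs */
    } else {
        push_frame(s, F_INTERPRETED);/* interpret using the C2I frame's arguments */
    }
}
```

With `races_lost` set to 0 and an interpreted callee, this reproduces the first picture (compiled, C2I, interpreted); with the callee compiled at the end, it reproduces the second.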
Well, that sets the stage for the changes in Mustang and why -Xbatch works differently, but that's an entry for another day...
Jun 23 2006, 03:49:15 PM EDT

