Thursday June 22, 2006
More tiered compilation
So I got this message embedded in the bug (RFE) for tiered compilation,
where a developer wanted to give us some suggestions about how the JVM
should work. I've included it here so I can respond and also clear up a possible misconception.
> I am concerned about start-up time before HotSpot gathers enough information to determine that a method
> will be (ie is already) hot, especially if that routine is going to be called 10,001 times, ie just
> once after it happens to have been compiled! For start-up, HotSpot is often *just-too-late*
> compilation, especially for heavy work done in <clinit> and <init> methods once only. I really
> do have routines that are run ~18,000 times at start-up and never again as it happens!
> May I suggest that you add to the tiered compilation the possibility to save in the current
> working dir the full signatures of methods hot enough for C1 or C2 optimisation on the previous
> run(s). On subsequent runs those methods are compiled *before* their first call with cheap (C1)
> optimisation so that (1) if needed at startup they will run better than interpreted and (2) if
> not actually needed for this run then not too much effort has been expended. This is also
> safe as it does not (for example) store native code that could be tampered with or inappropriate
> for the next JVM to run the system. It should also catch things like <clinit> code that might
> normally never be compiled.
A repository of information collected from one run and used on
another run is on the list of things we want to do in Dolphin. It's
actually on the runtime group's list, but we will certainly take
advantage of it. There has also been talk of using annotations to give
the JIT a hint. This isn't too popular, since it is too much
like "register" declarations in C: it's only a hint, and it is too often
abused.
The other thing in this message I wanted to clear up is the idea
about how we transition from interpreting a method to running a compiled
version of it. There is an implicit assumption here that if the
compiler threshold for invocations is set at 10,000, then when the method
is called for the 10,001st time it will run the compiled version. This is
most likely not true, and is even less true in Mustang.
When we decide to compile a method, the thread that hit the counter
overflow causes a compile event to be queued. The compiler runs in one or
more separate threads (-server normally has two compiler threads; -client
can only use one). Until the compilation is complete and
the compiled code is installed, every invocation of the method that occurs
after the compile is queued will still run interpreted. So,
depending on whether the compile is a server or a client compile, you
may execute many or only a few more invocations in the interpreter.
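To make the timing concrete, here is a toy model of that handoff. It is a sketch and not HotSpot code: the class, the counters, and the latch that stands in for "the compile takes a while" are all made up for illustration. The only thing it tries to show is that hitting the threshold queues work for another thread, and that calls keep running "interpreted" until the result is installed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: none of these names exist in HotSpot.
public class CompileQueueSketch {
    static final int THRESHOLD = 10_000;   // hypothetical invocation threshold

    // Stands in for the separate compiler thread.
    private final ExecutorService compilerThread = Executors.newSingleThreadExecutor();
    // Held closed to simulate a compile that takes a while.
    private final CountDownLatch compileCanFinish = new CountDownLatch(1);
    private final CountDownLatch codeInstalled = new CountDownLatch(1);
    private volatile boolean compiled = false;

    private int invocations = 0;
    int interpretedCalls = 0;
    int compiledCalls = 0;

    void invoke() {
        if (compiled) {            // compiled code is installed: use it
            compiledCalls++;
            return;
        }
        interpretedCalls++;        // otherwise we are still interpreting
        if (++invocations == THRESHOLD) {
            // Counter overflow: queue the compile and keep going.
            compilerThread.execute(() -> {
                try {
                    compileCanFinish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                compiled = true;   // "install" the compiled code
                codeInstalled.countDown();
            });
        }
    }

    // Returns { interpreted calls, compiled calls }.
    static int[] run(int callsDuringCompile, int callsAfterInstall) {
        CompileQueueSketch m = new CompileQueueSketch();
        try {
            for (int i = 0; i < THRESHOLD + callsDuringCompile; i++) m.invoke();
            m.compileCanFinish.countDown();   // let the "compile" complete
            m.codeInstalled.await();
            for (int i = 0; i < callsAfterInstall; i++) m.invoke();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            m.compilerThread.shutdown();
        }
        return new int[] { m.interpretedCalls, m.compiledCalls };
    }

    public static void main(String[] args) {
        int[] r = run(500, 100);
        System.out.println("interpreted=" + r[0] + " compiled=" + r[1]);
        // prints "interpreted=10500 compiled=100"
    }
}
```

The 500 calls made while the compile is pending all land in the interpreted count, which is the point: the 10,001st call does not magically run compiled code.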
Now there is a switch, -Xbatch,
that sort of used to almost give you what was described above. This
switch says that the thread that initiates the compile request will block and
wait for the compile to finish. Even with this switch set, if
another thread went to execute the method we're compiling, it would
run it interpreted, even though the requesting thread would be
blocked awaiting completion of the compilation. This is the "almost" side
of the statement.
The reason I said "used to" is that the behavior changed in
Mustang. Prior to Mustang, the waiting thread would execute the compiled
code as soon as the compilation was complete. In Mustang that is not the
case: when the compilation is complete the waiting thread is released,
but it resumes executing in the interpreter. The -Xbatch
switch only ever meant that the thread would block waiting for the
compile; there was never a promise to block and then execute the
compiled code. The reason for this behavior change is actually somewhat
related to work I did for tiered compilation. I won't go into it here,
as it is somewhat arcane, but if you're interested leave a comment and
I'll do an entry that will probably tell you more than you care to know.
So the -Xbatch switch is not a
particularly useful switch, and it wasn't even in the pre-Mustang days. It is somewhat
useful for JVM developers, since it tends to make a run more predictable
and reproducible. One of the great debugging adventures of the JVM is
the fact that things are not as repeatable as you might like.
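If you want to poke at this yourself, something like the following is enough to experiment with. It is only a sketch: the class and method names are made up, and -XX:+PrintCompilation is a HotSpot diagnostic flag not mentioned above whose output format varies between releases. Running it as `java -XX:+PrintCompilation BatchDemo` and again as `java -Xbatch -XX:+PrintCompilation BatchDemo` lets you compare when compiles are reported relative to the program's own output.

```java
// Hypothetical test program; the names are for illustration only.
public class BatchDemo {
    // A small method that is called often enough to become hot.
    static int mix(int x) {
        return (x * 31) ^ (x >>> 3);
    }

    // Calls mix() 'calls' times and returns the sum of the results.
    static long callManyTimes(int calls) {
        long sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += mix(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 20,000 calls is comfortably past a 10,000 invocation threshold.
        System.out.println(callManyTimes(20_000));
    }
}
```

Don't expect the two runs to differ in the result they print; what changes is only whether the requesting thread waits for the compile.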
This entry brings up one other topic that could be misunderstood. HotSpot does something we call On Stack Replacement (OSR).
This can cause a thread that is executing in an interpreted method to
switch to executing compiled code. You might mistakenly be led to believe
that when we compile a method and install the compiled code, we OSR all the threads that are currently executing the method. This does NOT happen.
An OSR requires a very special compile to take place. We decide to do an OSR
when we observe a thread execute the back branch of a loop a specified
number of times. When this happens, we queue a request for
a specialized compile of the method. The compile will treat the head
of the loop body as the entry point of the method. In a sense, the state
of the method at that point becomes the arguments to the method, at
least as far as the compiler is concerned. We are likely not to even
generate the code that leads from the normal method entry to the loop,
so clearly this OSR compile is not useful for the general method call. Similarly, the normal compile is not useful for the OSR
case (we can't predict the entry points beforehand). Even if we could
predict them, you wouldn't want to, because of the possible impact on
optimization.
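The classic shape of code that ends up needing OSR is a method that is called once but loops for a long time, so the invocation counter never overflows while the back-branch counter does. The program below is a minimal sketch of that shape; the class and method names are made up. If you run it with the HotSpot diagnostic flag -XX:+PrintCompilation (not discussed above, and its output format varies by release), OSR compiles are flagged with a `%` in the output.

```java
// Hypothetical example; the names are for illustration only.
public class OsrDemo {
    // Invoked once from main, so the invocation counter stays tiny.
    // The loop's back branch, however, is taken n times in that one call.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        // One call, one hundred million trips around the loop.
        System.out.println(sumTo(100_000_000));
    }
}
```

The thread is in the middle of sumTo when the compile is requested, which is why the compiled code has to be enterable at the loop head instead of at the normal method entry.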
Jun 22 2006, 12:33:41 PM EDT