Thursday June 22, 2006

More tiered compilation

So I got this message embedded in the bug (RFE) for tiered compilation, where a developer wanted to give us some suggestions about how the JVM should work. I've included it here so I can respond, and also to clear up a possible misconception in it.

> I am concerned about start-up time before HotSpot gathers enough information to determine that a method
> will be (ie is already) hot, especially if that routine is going to be called 10,001 times, ie just
> once after it happens to have been compiled!  For start-up, HotSpot is often *just-too-late*
> compilation, especially for heavy work done in <clinit> and <init> methods once only.  I really
> do have routines that are run ~18,000 times at start-up and never again as it happens!

> May I suggest that you add to the tiered compilation the possibility to save in the current
> working dir the full signatures of methods hot enough for C1 or C2 optimisation on the previous
> run(s).  On subsequent runs those methods are compiled *before* their first call with cheap (C1)
> optimisation so that (1) if needed at startup they will run better than interpreted and (2) if
> not actually needed for this run then not too much effort has been expended.  This is also
> safe as it does not (for example) store native code that could be tampered with or inappropriate
> for the next JVM to run the system.  It should also catch things like <clinit> code that might
> normally never be compiled.

So a repository of information collected on one run and used on a later run is on the list of things we want to do in Dolphin (JDK 7). It's actually on the runtime group's list, but we will certainly take advantage of it. There has also been talk of using annotations to give the JIT a hint. That idea isn't very popular because it is too much like "register" declarations in C: it's only a hint, and it is too often abused.
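To make the suggestion concrete, here is a minimal Java sketch of such a cross-run repository. It is purely hypothetical: HotSpot has no such class, and `HotMethodProfile`, the file format (one method signature per line) and the threshold are assumptions made for illustration. One run counts invocations and saves the hot signatures; the next run loads them as candidates for a cheap compile before their first call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the suggested feature; this is not a HotSpot API.
// One run counts invocations and saves the hot method signatures; the next
// run loads them so they could be given a cheap (C1-style) compile up front.
class HotMethodProfile {
    private final Map<String, Integer> counts = new HashMap<>();

    // Called on each (simulated) invocation of the method with this signature.
    void record(String signature) {
        counts.merge(signature, 1, Integer::sum);
    }

    // Writes every signature whose count reached the threshold, one per line,
    // sorted so the file is stable from run to run.
    void save(Path file, int threshold) throws IOException {
        Set<String> hot = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= threshold) {
                hot.add(e.getKey());
            }
        }
        Files.write(file, hot);
    }

    // Reads the hints left by a previous run; no file simply means no hints.
    static Set<String> load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return new TreeSet<>();
        }
        return new TreeSet<>(Files.readAllLines(file));
    }
}

class ProfileDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("hot-methods", ".txt");
        file.toFile().deleteOnExit();

        // "Previous run": one routine is called 18,000 times, another once.
        HotMethodProfile previousRun = new HotMethodProfile();
        for (int i = 0; i < 18_000; i++) {
            previousRun.record("Startup.parseEntry(Ljava/lang/String;)V");
        }
        previousRun.record("Startup.rarelyUsed()V");
        previousRun.save(file, 10_000);

        // "Next run": these would be compiled before their first call.
        System.out.println(HotMethodProfile.load(file));
        // prints [Startup.parseEntry(Ljava/lang/String;)V]
    }
}
```

Storing only signatures, not native code, keeps the property the suggestion emphasizes: nothing in the file can be executed or go stale against a different JVM.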

The other thing in this message I wanted to clear up is how we transition from interpreting a method to running a compiled version of it. There is an implicit assumption here that if the invocation threshold for compilation is set at 10,000, then the 10,001st call will run the compiled version. This is most likely not true, and it is even less true in Mustang (JDK 6).

When we decide to compile a method, the thread that hit the counter overflow causes a compile request to be queued. The compiler runs as a separate thread (or threads): -server normally has two compiler threads, while -client can only use a single one. Until the compilation is complete and the compiled code is installed, every invocation of the method that occurs after the request is queued will still run interpreted. So depending on whether it is a server or a client compile, you may execute a lot (server) or only a few (client) more invocations in the interpreter.
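The gap between "queued" and "installed" is easier to see in a small model. The Java sketch below is a toy under stated assumptions, not HotSpot code: the class name, the threshold constant and the 20 ms sleep standing in for compile time are all made up. The invocation that trips the counter only submits a task to a separate single-threaded executor, so the caller keeps taking the "interpreted" path until the compiler thread sets the installed flag.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model (not HotSpot code) of background compilation. The invocation that
// overflows the counter only queues a compile request; a separate "compiler
// thread" does the work, and calls keep running in the "interpreter" until the
// compiled code has been installed.
class BackgroundCompileModel {
    static final long COMPILE_THRESHOLD = 10_000; // illustrative value

    // Daemon thread so the model never keeps the JVM alive.
    private final ExecutorService compilerThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-compiler");
        t.setDaemon(true);
        return t;
    });
    private volatile boolean compiledCodeInstalled = false;
    private boolean compileQueued = false;
    private long invocationCounter = 0;
    long interpretedCalls = 0;
    long compiledCalls = 0;

    // Called by a single application thread in this sketch.
    void invoke() {
        if (compiledCodeInstalled) {
            compiledCalls++;
            return;
        }
        interpretedCalls++;
        if (++invocationCounter >= COMPILE_THRESHOLD && !compileQueued) {
            compileQueued = true;
            compilerThread.submit(this::compile); // enqueue and carry on interpreting
        }
    }

    private void compile() {
        try {
            Thread.sleep(20); // stand-in for the time the compiler needs
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        compiledCodeInstalled = true; // install: later calls take the compiled path
    }
}

class BackgroundCompileDemo {
    public static void main(String[] args) {
        BackgroundCompileModel m = new BackgroundCompileModel();
        while (m.compiledCalls == 0) {
            m.invoke();
        }
        // Always at least COMPILE_THRESHOLD, and usually far more, because the
        // caller never waits for the compiler.
        System.out.println("interpreted calls before the first compiled call: " + m.interpretedCalls);
    }
}
```

The printed count depends on timing, so it typically lands well past the threshold rather than at 10,001, and it varies from run to run.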

Now there is a switch, -Xbatch, that sort of used to, almost, give you what was described above. This switch says that the thread that initiates the compile request will block and wait for the compile to finish. But even with this switch set, if another thread executes the method while it is being compiled, that thread runs it interpreted, even though the requesting thread is blocked waiting for the compilation to complete. This is the "almost" side of the statement.

The reason I said "used to" is that the behavior changed in Mustang. Prior to Mustang, the waiting thread would execute the compiled code as soon as the compilation was complete. In Mustang that is not the case: when the compilation is complete the waiting thread is released, but it resumes executing in the interpreter. The -Xbatch switch only ever meant that the thread would block waiting for the compile; there was never a promise to block and then execute the compiled code. The reason for this behavior change is actually somewhat related to work I did for tiered compilation. I won't go into it here as it is somewhat arcane, but if you're interested leave a comment and I'll do an entry that will probably tell you more than you care to know.
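The same kind of toy model can show the blocking behavior. Again this is a hypothetical Java sketch, not HotSpot code: the requesting thread waits on the compile task's `Future`, and the `preMustang` flag (an assumption that mirrors the description above) only decides whether that blocked invocation then runs compiled code or resumes in the interpreter. Other threads, which would still run interpreted during the compile, are left out so the counts stay deterministic.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (not HotSpot code) of -Xbatch-style compilation: the thread that
// overflows the counter blocks until the compile finishes. preMustang decides
// what that blocked invocation does once it is released.
class BatchCompileModel {
    static final int COMPILE_THRESHOLD = 10_000; // illustrative value

    private final ExecutorService compilerThread = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "toy-compiler");
        t.setDaemon(true);
        return t;
    });
    private final boolean preMustang;
    private volatile boolean compiledCodeInstalled = false;
    private int invocationCounter = 0;
    int interpretedCalls = 0;
    int compiledCalls = 0;

    BatchCompileModel(boolean preMustang) {
        this.preMustang = preMustang;
    }

    void invoke() {
        if (compiledCodeInstalled) {
            compiledCalls++;
            return;
        }
        if (++invocationCounter == COMPILE_THRESHOLD) {
            Future<?> compile = compilerThread.submit(() -> { compiledCodeInstalled = true; });
            waitFor(compile); // block the requesting thread until the compile is done
            if (preMustang) {
                compiledCalls++; // pre-Mustang: continue straight into compiled code
                return;
            }
            // Mustang: the thread is released but this call resumes in the interpreter.
        }
        interpretedCalls++;
    }

    private static void waitFor(Future<?> f) {
        try {
            f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}

class BatchCompileDemo {
    public static void main(String[] args) {
        for (boolean pre : new boolean[] {true, false}) {
            BatchCompileModel m = new BatchCompileModel(pre);
            for (int i = 0; i < 20_000; i++) {
                m.invoke();
            }
            System.out.println((pre ? "pre-Mustang: " : "Mustang: ")
                    + m.interpretedCalls + " interpreted, " + m.compiledCalls + " compiled");
        }
        // prints:
        // pre-Mustang: 9999 interpreted, 10001 compiled
        // Mustang: 10000 interpreted, 10000 compiled
    }
}
```

The difference is a single invocation, which is part of why -Xbatch was never much of a startup tool; its value is that, with one thread, the counts come out the same every run.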

So the -Xbatch switch isn't particularly useful, and it wasn't even in the pre-Mustang days. It is somewhat useful for JVM developers since it tends to make a run more predictable and reproducible. One of the great debugging adventures of the JVM is that things are not as repeatable as you might like.

Now this entry brings up one other topic that could be misunderstood. HotSpot does something we call On Stack Replacement (OSR), which can cause a thread that is executing an interpreted method to switch to compiled code. You might be led to believe that when we compile a method and install the compiled code, we OSR all the threads that are currently executing the method. This does NOT happen.

An OSR requires a very special compile. We decide to do an OSR when we observe a thread execute the back branch of a loop a specified number of times. When this happens we queue a request for a specialized compile of the method. The compile treats the head of the loop body as the entry point of the method. In a sense, the state of the method at that point becomes the arguments to the method, at least as far as the compiler is concerned. We are likely not to even generate the code that leads from the normal method entry to the loop, so clearly this OSR compile is not useful for a general method call. Similarly, the normal compile is not useful for the OSR case (we can't predict the entry points beforehand). Even if we could predict them, you wouldn't want to, because of the possible impact on optimization.
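A hand-written Java sketch can show what "the state of the method becomes the arguments" means. It is only a model under stated assumptions: there is no real interpreter or compiler here, the back-branch threshold of 1,000 is an illustrative number, and `sumToOsrCompiled` is a hypothetical stand-in for the specialized OSR compile. The "interpreted" loop counts back branches, and when the count hits the threshold it passes its live locals to a version of the loop whose entry point is the loop head.

```java
// Toy model (not HotSpot code) of On Stack Replacement for one method:
//   long sumTo(int n) { long sum = 0; for (int i = 0; i < n; i++) sum += i; return sum; }
class OsrModel {
    static final int BACK_BRANCH_THRESHOLD = 1_000; // illustrative value
    static int osrEntryIndex = -1; // loop index where the switch happened, or -1

    // The "interpreted" version: it counts loop back branches as it goes.
    static long sumToInterpreted(int n) {
        osrEntryIndex = -1;
        long sum = 0;
        int backBranches = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
            if (++backBranches == BACK_BRANCH_THRESHOLD) {
                // Hot loop: hand the live state (next i, sum, n) to the OSR
                // version and finish there instead of in this frame.
                osrEntryIndex = i + 1;
                return sumToOsrCompiled(i + 1, sum, n);
            }
        }
        return sum;
    }

    // Stand-in for the OSR compile: its entry point is the loop head, so the
    // loop's live locals arrive as arguments and the normal method entry
    // (sum = 0, i = 0) simply does not exist in this version.
    static long sumToOsrCompiled(int i, long sum, int n) {
        for (; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumToInterpreted(5_000) + " (OSR at i = " + osrEntryIndex + ")");
        // prints 12497500 (OSR at i = 1000)
    }
}
```

Because `sumToOsrCompiled` has no code for `sum = 0` or `i = 0`, it cannot serve an ordinary call, and an ordinary compiled `sumTo(int n)` has no way to accept a frame that is already halfway through the loop.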
Jun 22 2006, 12:33:41 PM EDT