Wednesday November 09, 2005

Safepoints

So I was corresponding with someone who was reporting a problem using the JVM on the forums, and it dawned on me that a portion of what I wrote to him would make a good blog entry. Well, you'll have to decide how good it turns out.

In the vm we will at various times have to bring the threads to a stopping point where all of them are in a state where it is safe to walk their execution stacks and do things like garbage collection. We need to do this for other reasons too, but garbage collection is the most common one. We call this situation a safepoint.

For simplicity, think of a thread that is executing Java code as being in one of three states: in_Java, in_VM and in_native. The simplest of these, as far as what the vm has to do, is the in_native state. Basically when a thread is in that state we just leave it alone. The thread can continue to execute. Its stack is consistent and walkable. We have things arranged so that if the thread wants to transition to a new state (in_Java, in_VM) we cause it to block. So those kinds of threads are simple. In a similar vein, threads in_VM (think some runtime service like, say, a slow path allocation) are blocked either when they attempt to acquire a lock in the vm that enforces a safepoint or when they attempt to return from the vm. So a thread in this state is assured of blocking on its own in a very short period of time.
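To make the three-state picture concrete, here is a minimal sketch in Java. It is only a model of the description above, not code from the vm: the class `SafepointStates` and the helpers `alreadySafe` and `threadsToWaitFor` are made-up names for illustration. It shows that a thread in_native needs no action, while threads in the other two states still have to reach a stopping point.

```java
import java.util.Arrays;
import java.util.List;

class SafepointStates {
    // The three simplified thread states named in the text.
    enum ThreadState { IN_JAVA, IN_VM, IN_NATIVE }

    // Hypothetical helper: a thread in_native can be left alone, because it
    // will block by itself if it tries to transition to another state.
    static boolean alreadySafe(ThreadState s) {
        return s == ThreadState.IN_NATIVE;
    }

    // Hypothetical helper: count the threads that still have to reach a
    // stopping point before the safepoint is complete.
    static int threadsToWaitFor(List<ThreadState> states) {
        int n = 0;
        for (ThreadState s : states) {
            if (!alreadySafe(s)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<ThreadState> states = Arrays.asList(
                ThreadState.IN_NATIVE, ThreadState.IN_VM,
                ThreadState.IN_JAVA, ThreadState.IN_NATIVE);
        System.out.println("threads to wait for: " + threadsToWaitFor(states));
    }
}
```

Threads in_VM are counted as "wait for" here only because they have not blocked yet; as described above, they get there on their own very quickly.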

The other state, and the more problematic one, is in_Java. A thread that is in this state can be executing either in the interpreter or in compiled code. It would be in compiled code if the method was deemed hot enough that we compiled it. In the case of threads executing in the interpreter, we simply switch the bytecode dispatch table so that on the next bytecode (or so) the thread will automatically block itself.
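The dispatch table switch can be pictured with a toy interpreter. This is a sketch under heavy assumptions, not the vm's interpreter: the two opcodes, the `Handler` interface and the `NORMAL`/`SAFEPOINT` tables are invented for illustration, and the "block" is just a log entry followed by a switch back to the normal table, since the toy runs on a single thread.

```java
import java.util.ArrayList;
import java.util.List;

class DispatchTableSketch {
    interface Handler { void run(Interp in); }

    // Hypothetical two-instruction bytecode set.
    static final int OP_INC = 0, OP_HALT = 1;

    static class Interp {
        int acc = 0;
        boolean running = true;
        final List<String> log = new ArrayList<>();
    }

    // Normal table: one handler per opcode.
    static final Handler[] NORMAL = {
        in -> in.acc++,
        in -> in.running = false
    };

    // The table the interpreter loop dispatches through.
    static Handler[] active = NORMAL;

    // Safepoint table: every entry first "blocks" (here: records the stop and
    // restores the normal table), then performs the normal work of the opcode.
    static final Handler[] SAFEPOINT = new Handler[NORMAL.length];
    static {
        for (int i = 0; i < NORMAL.length; i++) {
            final Handler normal = NORMAL[i];
            SAFEPOINT[i] = in -> {
                in.log.add("blocked with acc=" + in.acc);
                active = NORMAL;
                normal.run(in);
            };
        }
    }

    // Runs the program; the table swap before the given step stands in for
    // the vm requesting a safepoint from another thread.
    static Interp run(int[] program, int requestBeforeStep) {
        Interp in = new Interp();
        active = NORMAL;
        for (int pc = 0; in.running && pc < program.length; pc++) {
            if (pc == requestBeforeStep) {
                active = SAFEPOINT;
            }
            active[program[pc]].run(in); // no safepoint check in the loop itself
        }
        return in;
    }

    public static void main(String[] args) {
        Interp in = run(new int[] {OP_INC, OP_INC, OP_INC, OP_HALT}, 2);
        System.out.println(in.log + " final acc=" + in.acc);
    }
}
```

The point of the design is that the interpreter loop never tests a flag: swapping the table is enough to make the very next dispatched bytecode notice the request.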

So the interesting case is the situation with compiled code. Interesting in the painful sense. Now if we did absolutely nothing, we can expect that almost any real application will either return from compiled code to interpreted code and then block, or need some vm service, call from compiled code into the vm, and once again block. So the case we have to worry about is the situation where we stay in compiled code forever (ok, a long time) and never leave it.

The way we handle this situation has changed over the years. Prior to Java 5.0 (1.5) we used a non-polling technique. In these earlier vms we would notice that a thread was in this situation and suspend it. We would then copy the code it was executing into a temporary buffer and patch all the calls out to another Java method and any place the code might return from the method. (We didn't have to patch calls to the runtime since they would block on their own.) We would then reposition the thread's pc into this temporary buffer and let it go. In short order it would hit one of these patches and the patch would cause it to block. Sounds painful, and it was to some degree. The advantage was that code executing in compiled code would not have to poll to see if we wanted it to stop. We stopped doing it this way in 5.0. The reason might surprise you. We found that doing the thread suspension was always problematic. The thread libraries on virtually every OS always seemed to have some obscure bug, and this would cause some strange vm failure. Every release would have some new bandaid in the vm to cover the next bad thread library behavior we found. We gave up in 5.0.

In 5.0 we decided to convert to polling. The original fear was that polling would have a bad performance impact, and it certainly could if you weren't smart about how and where you did the polling. So the important places to poll are in loops without calls, loops that can't be determined to be finite, and also at the return from a method.

Well, it is obviously trivial to see if a loop has calls, so neither compiler (client or server) has a problem with that. Determining if a loop might execute for too long is another matter. The client compiler, not being as smart as the server compiler, is much more conservative and so will place polling instructions in more loops than the server compiler. Finally, the other trick is that we don't want to add an extra branch in the code path (in other words, we don't want the poll to add a compare and branch). Branches are just too expensive.

So we made the poll a simple read of a word in a special page in the vm process. When we want to bring the system to a safepoint, we simply change the protections on the page such that a read on that page will cause a fault (signal). From the signal handler we can then bring the thread to a stopped state. So using this scheme, polling works out to be pretty cheap. It is more expensive than the previous method but not by very much. The big win is in reliability. Because of this change we no longer have to do forced suspension of threads, and as a result we've noticed a definite increase in robustness of the vm.
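The overall shape of polling can be sketched in plain Java, with one big caveat: Java code cannot change page protections or catch the resulting fault, so the sketch below polls a volatile flag instead. That flag check is exactly the compare and branch the real scheme avoids, so treat this only as an illustration of where the poll sits (inside a loop that makes no calls) and of the stop/resume handshake. The names `PollingSketch`, `safepointRequested` and `demo` are made up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PollingSketch {
    // Stand-in for the special page: the vm side flips this to request a stop.
    static volatile boolean safepointRequested = false;

    // Returns true if the worker reached the poll and stopped, and then
    // finished cleanly after being released.
    static boolean demo() {
        safepointRequested = false;
        CountDownLatch reached = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            long sum = 0;
            for (long i = 0; ; i++) {      // a loop with no calls in its body
                sum += i;
                if (safepointRequested) {  // the poll (a flag check in this sketch)
                    reached.countDown();   // tell the requester we have stopped
                    try {
                        resume.await();    // stay stopped until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    return;                // end the demo after one safepoint
                }
            }
        });
        worker.setDaemon(true);
        worker.start();

        try {
            safepointRequested = true;     // ask for the safepoint
            boolean stopped = reached.await(5, TimeUnit.SECONDS);
            safepointRequested = false;    // "safepoint" work would happen before this
            resume.countDown();            // let the worker go
            worker.join(5000);
            return stopped && !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("worker stopped at poll: " + demo());
    }
}
```

In the scheme described above, the `if` disappears: the loop just reads a word from the page, and the stop is driven from the signal handler when that read faults.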
Nov 09 2005, 01:30:55 PM EST

Comments:

Hi Steve, How's it going :) Actually I stumbled upon your web page looking for answers on locks, and figured you might have an answer. In Java when the VM transitions from a fat lock to a thin lock (or the other way) is there a safepoint inserted in between? Can you see the deflation or inflation taking too long? Ideas?

Posted by Azeem Jiva on April 05, 2006 at 07:24 PM EDT #
