Wednesday November 09, 2005
Safepoints
So I was corresponding with someone who was reporting a problem using the JVM on the forums
and it dawned on me that a portion of what I wrote to him would make a
good blog entry. Well, you'll have to decide how good it turns out.
In the vm we will at various times have to bring the threads to a
stopping point where all the threads are in a state where it is safe to
walk their execution stacks and do things like garbage collection. We
need to do this for other reasons too, but garbage collection is the most
common one. We call this situation a safepoint.
For simplicity, think of a thread that is executing Java code as being in
one of three states: in_Java, in_VM and in_native. The simplest of these,
as far as what the vm has to do, is the in_native state. Basically when a
thread is in that state we just leave it alone. The thread can continue to
execute. Its stack is consistent and walkable. We have things arranged so
that if the thread tries to transition to a new state (in_Java, in_VM) we
cause it to block. So those kinds of threads are simple. In a similar vein,
threads in_VM (think some runtime service like, say, a slow path
allocation) are blocked either when they attempt to acquire a lock in the
vm that enforces a safepoint or when they attempt to return from the vm.
So a thread in this state is assured of blocking on its own in a very
short period of time.
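To make the in_native rule concrete, here is a toy sketch in Java. It is not how the vm is implemented, and every name in it (ToySafepoint, leaveNative, demo) is made up for illustration. It only models the idea that a thread in_native is left alone and blocks at the moment it tries to change state while a safepoint is in progress.

```java
// Toy model only: every name here is hypothetical, not HotSpot code.
class ToySafepoint {
    enum State { IN_JAVA, IN_VM, IN_NATIVE }

    private boolean safepointRequested = false;

    // Called by the "vm thread" to start a safepoint.
    synchronized void begin() { safepointRequested = true; }

    // Called by the "vm thread" to end the safepoint and release waiters.
    synchronized void end() {
        safepointRequested = false;
        notifyAll();
    }

    // Threads in_native run freely. They only come through here when they
    // try to change state, and they block while a safepoint is in progress.
    synchronized State leaveNative(State next) throws InterruptedException {
        while (safepointRequested) {
            wait();
        }
        return next;
    }

    // Runs one "native" thread against a safepoint and reports whether it
    // was held at the transition and then let go.
    static boolean demo() {
        ToySafepoint sp = new ToySafepoint();
        State[] reached = { State.IN_NATIVE };
        sp.begin();
        Thread t = new Thread(() -> {
            try {
                reached[0] = sp.leaveNative(State.IN_JAVA);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        try {
            // Wait until the thread is parked inside leaveNative().
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
            boolean heldDuringSafepoint = (reached[0] == State.IN_NATIVE);
            sp.end();
            t.join();
            return heldDuringSafepoint && reached[0] == State.IN_JAVA;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("held, then released: " + demo());
    }
}
```

The point of the design is that the "native" thread never checks anything while it runs. The only check sits on the transition, which is where the post says the blocking happens.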
The other state and the more problematic one is in_Java.
Now a thread that is in this state can either be executing in the
interpreter or in compiled code. It would be in compiled code if the
method was deemed hot enough that we compiled it. Well in the case of
threads executing in the interpreter we simply switch the bytecode
dispatch table so that on the next bytecode (or so) the thread will
automatically block itself.
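The dispatch table switch can be sketched with a toy interpreter. This is an illustration under assumptions, not the real interpreter: the opcodes, the Handler interface and the ToyInterpreter class are all invented here, and "blocking" is reduced to recording where the thread would have stopped.

```java
// Toy interpreter: all names are hypothetical, not HotSpot code.
class ToyInterpreter {
    interface Handler { void run(ToyInterpreter vm); }

    static final int OP_INC = 0, OP_DOUBLE = 1;

    // Normal table: one handler per opcode.
    static final Handler[] NORMAL = {
        vm -> vm.acc += 1,
        vm -> vm.acc *= 2,
    };

    // Safepoint table: same opcodes, but every entry stops at the
    // safepoint first and only then does the normal work.
    static final Handler[] SAFEPOINT = new Handler[NORMAL.length];
    static {
        for (int i = 0; i < NORMAL.length; i++) {
            final Handler normal = NORMAL[i];
            SAFEPOINT[i] = vm -> { vm.blockAtSafepoint(); normal.run(vm); };
        }
    }

    // Requesting a safepoint is just swapping this reference.
    volatile Handler[] table = NORMAL;
    int acc = 0;
    int pc = 0;
    int blockedAtPc = -1;

    void requestSafepoint() { table = SAFEPOINT; }

    void blockAtSafepoint() {
        blockedAtPc = pc;   // a real vm would park the thread here
        table = NORMAL;     // toy: pretend the safepoint ends straight away
    }

    // Execute one bytecode: a plain indexed dispatch with no safepoint check.
    boolean step(int[] code) {
        if (pc >= code.length) return false;
        table[code[pc]].run(this);
        pc++;
        return true;
    }

    public static void main(String[] args) {
        int[] code = { OP_INC, OP_DOUBLE, OP_INC, OP_DOUBLE };
        ToyInterpreter vm = new ToyInterpreter();
        vm.step(code);
        vm.step(code);
        vm.requestSafepoint();          // arrives between two bytecodes
        while (vm.step(code)) { }
        System.out.println("blocked at pc " + vm.blockedAtPc + ", acc = " + vm.acc);
    }
}
```

Note that step() never tests a flag. The interpreted thread stops only because the table it dispatches through has been swapped underneath it.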
So the interesting case is the situation with compiled code.
Interesting in the painful sense. Now if we did absolutely nothing, we
could expect that almost any real application would either return from
compiled code to interpreted code and then block, or it would need
some vm service, call from compiled code into the vm, and once again
block. So the case we have to worry about is the situation
where a thread stays in compiled code forever (ok, a long time) and never
leaves it.
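As a rough picture of the kind of code that causes the trouble, consider a loop like the one below. The class and method names are made up, and whether a given compiler treats this exact loop the way described is an assumption. The point is only that the loop body gives the thread no natural reason to leave compiled code.

```java
// Illustration only: the class and method names are made up.
class HotLoopExample {
    // Nothing in this loop calls another method, allocates, or takes a
    // lock, so once it is compiled the thread has no reason to enter the
    // vm or return to the interpreter until the loop finishes.
    static long mix(long n) {
        long x = 0;
        for (long i = 0; i < n; i++) {
            x += i ^ (x >>> 3);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(mix(1_000_000L));
    }
}
```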
The way we handle this situation has changed over the years. Prior to
Java 5.0 (1.5) we used a non-polling technique. In these earlier
vms we would notice that a thread was in this situation and suspend it.
We would then copy the code it was executing into a temporary buffer
and patch all the calls out to another Java method or any place the
code might return from the method. (We didn't have to patch calls to
the runtime since they would block on their own). We would then
reposition the thread's pc into this temporary buffer and let it go. In
short order it would hit one of these patches and the patch would cause
it to block. Sounds painful, and it was to some degree. The advantage
was that a thread executing compiled code would not have to poll
to see if we wanted it to stop. We stopped doing it this way in 5.0. The
reason might surprise you. We found that doing the thread suspension
was always problematic. The thread libraries on virtually every OS
always seemed to have some obscure bug and this would cause some
strange vm failure. Every release would have some new band-aid in the vm
to cover the next bad thread library behavior we found. We gave up in
5.0.
In 5.0 we decided to convert to polling. The original fear was that
polling would have a bad performance impact, and it certainly could
if you weren't smart about how and where you did
the polling. So the important places to poll are in loops without
calls, loops that can't be determined to be finite, and also at the
return from a method.
Well, it is trivial to see if a loop has calls, so neither
compiler (client or server) has a problem with that. Determining if a
loop might execute for too long is another matter. The client compiler,
not being as smart as the server compiler, is much more conservative and
so will place polling instructions in more loops than the server
compiler. Finally, the other trick is that we don't want to add an extra
branch in the code path (in other words we don't want the poll to add a
compare and branch). Branches are just too expensive.
So we made the poll be a simple read of a word in a special page in
the vm process. When we want to bring the system to a safepoint we
simply change the protections on the page such that a read on that page
will cause a fault (signal). From the signal handler we can then bring
the thread to a stopped state. So using this scheme polling works out
to be pretty cheap. It is more expensive than the old non-polling method but
not by very much. The big win is in reliability. Because of this change
we no longer have to do forced suspension of threads and as a result
we've noticed a definite increase in robustness of the vm.
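Java code can't change page protections, so the sketch below is only a loose analogy for the polling scheme, not the vm's mechanism. A reference that is normally non-null plays the part of the polling page, setting it to null plays the part of protecting the page, and the resulting NullPointerException plays the part of the fault. All the names (ToyPollingPage, arm, poll, sumTo) are invented for illustration.

```java
// Analogy only: a null reference stands in for the protected page and
// NullPointerException stands in for the fault. All names are hypothetical.
class ToyPollingPage {
    static final class Page { int word; }

    private final Page readable = new Page();

    // Non-null: polls succeed. Null: the next poll "faults".
    private volatile Page page = readable;

    int faults = 0;

    // What the vm would do to request a safepoint (protect the page).
    void arm() { page = null; }

    // What the vm would do when the safepoint is over (unprotect the page).
    void disarm() { page = readable; }

    // Stand-in for the signal handler.
    private void onFault() {
        faults++;    // a real vm would bring the thread to a stop here
        disarm();    // toy: pretend the safepoint ends straight away
    }

    // The poll itself is just a read. There is no "if (flag)" in the
    // source; the slow path is reached only through the fault.
    void poll() {
        try {
            int unused = page.word;
        } catch (NullPointerException fault) {
            onFault();
        }
    }

    // A loop with a poll on each iteration. poll() is written as a method
    // for readability; think of it as inlined into the loop.
    long sumTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
            poll();
        }
        return total;
    }

    public static void main(String[] args) {
        ToyPollingPage p = new ToyPollingPage();
        p.arm();
        System.out.println("sum = " + p.sumTo(5) + ", faults = " + p.faults);
    }
}
```

The loop computes the same result whether or not a safepoint was requested. The request only shows up as a detour through the fault handler.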
Nov 09 2005, 01:30:55 PM EST