20060426 Wednesday April 26, 2006

Tiered compilation again

Well I can see that people are paying attention to the changes going into mustang. In the last 2 days people on the Java gaming forum and Java Lobby have noticed that the changes for tiered compilation went into the source base for mustang. People should not get too excited. We're not building a tiered jvm in mustang. It's much too late in the release cycle unfortunately. The code was pretty safe to go into mustang without disrupting it too much and it makes it a lot simpler to get the code in now rather than later should it become apparent it would be good to have in an update release. Obviously though it will make the dolphin schedule :-)

If you are so inclined you can certainly get the mustang source and build your own tiered jvm to play with. The makefile support is only available on solaris (sparc/x86). Obviously it is only 32bit since the client compiler is only 32bit at this point. Given the performance I've seen from the tiered system (which isn't remotely tuned at this point) I wouldn't run out and do that experiment at home but it's up to you. Of course if you get some great performance numbers I want to know.
Apr 26 2006, 11:36:43 AM EDT Permalink

20060313 Monday March 13, 2006

Tiered Compilation Limps

So as I've previously written I've been working on getting tiered compilation to work in the HotSpot JVM. Unfortunately it isn't going to make the Mustang release. However last Friday I managed to get the client and server compilers to coexist in the same JVM (SPARC only at this point).

I'm missing a lot of the infrastructure to actually have it work the way it is supposed to; compile first with client then if the compiled code is hot enough recompile it with the server compiler. So at the moment I'm happy just to give the runtime systems a good workout.

So to accomplish that I have the JVM alternate compilers on every other request. This gives the system a good workout and things are working out remarkably well. Now I'm off getting it to work on x86 and then I'll begin getting it to function something like we expect it to. I can't wait to start seeing some benchmark results.
Mar 13 2006, 02:20:32 PM EST Permalink

20060307 Tuesday March 07, 2006

Sun SPOT

Sun announced this really cool gadget yesterday: Project Sun SPOT. For someone like me that has this home weather station and all this X10 equipment hooked up to my Solaris 10 AMD64 server running misterhouse, this looks like the next thing I need to have. The development kits are supposed to be available in May. I sure wish the cost for the kit was lower. At $499 it is kind of steep for hobbyists to get on board. Especially one with all his money tied up in airplanes. I think it ought to be like $100 and we can make it up on volume. :-) Hmm maybe they'll be making a special deal on them at Java One...
Mar 07 2006, 11:00:04 AM EST Permalink

20060221 Tuesday February 21, 2006

RTP Java User Group

Last night I went to my first meeting of the local Java Users Group. It is sort of strange that although I work on the jvm day in and day out I'm not really a user like the typical JUG member. I rarely write code in Java; in fact the vast majority of the Java code I wrote was a result of taking Sang's J2EE course.

I went to the meeting to see what it was like and also to offer to give a talk to the group. I was planning to give the talk I'll be part of at Java One, which describes recent optimizations in Hotspot. Compared to what these guys usually hear about it'll be like a talk on assembly language programming.

Last night's talk was about Aspect Oriented Programming (Aspect/J in particular). I knew absolutely nothing about AOP so this was all new to me. It was actually quite interesting. I can see how this can be a great tool during development for testing or debugging. I'm far less convinced about it as a general way of writing code for production use. Too much magical stuff happening where you can't really see what effect actually happens at any particular line of code. All in all though it was a quite interesting talk.
Feb 21 2006, 12:02:56 PM EST Permalink

20060112 Thursday January 12, 2006

J2EE class - done!

What with the holidays and all it's been quite a while since I've posted anything. I'm going to try and get a rhythm going now.
As I'd written before I've been taking Sang Shin's J2EE course. I tried taking it a couple of years ago and just got so overwhelmed with other stuff I gave up. This time I made it all the way thru. It was a really different experience for me. My day job is working on the Java virtual machine and I tend to spend some of every day in a debugger working at the assembly level. So to take this course was very strange from that perspective.

This time around the course was a lot easier. The first time I took it, everything was done by hand. You had to build the ant scripts to compile and deploy on your own. You had to install Tomcat, JWSDP and Pointbase individually. It was not much fun. This time you could just get the NetBeans and S1 App. server combination and the installation side was painless. I did virtually all of the homework using NetBeans and it was a nice experience.

In the class we still ended up doing some things by hand that could be painful. There was editing of xml files that was not much fun (struts or JSF config files for instance). In the real world I'd expect you'd use some tools like Java Studio Creator or StrutsConsole but I think the intention in the class was to let you see what was happening behind the scenes. I think you really want to use the higher level tools because when you have a mistake in a hand edited xml file and something goes wrong with the app figuring out what you did wrong is not much fun.

I don't expect I'll really have much use for what I learned but at least now I know what is going on in J2EE whereas before it was just a lot of acronyms to me. The experience in NetBeans though was useful since now when I have to write some test program to try and reproduce some failure in the JVM I'm comfortable enough with the ide to use it instead.
Jan 12 2006, 05:47:25 PM EST Permalink

20051109 Wednesday November 09, 2005

Safepoints

So I was corresponding with someone that was reporting a problem using the JVM on the forums and it dawned on me that a portion of what I wrote to him would make a good blog entry. Well you'll have to decide on how good it turns out.

In the vm we will at various times have to bring the threads to a stopping point where all the threads are in a state where it is safe to walk their execution stacks and do things like garbage collection. We need to do this for other reasons too but that reason is the most common one. We call this situation a safepoint.

For simplicity purposes think of a thread that is executing Java code as being in one of three states: in_Java, in_VM and in_native. The simplest of these as far as what the vm has to do is the in_native state. Basically when a thread is in that state we just leave it alone. The thread can continue to execute. Its stack is consistent and walkable. We have things arranged so that if the thread wants to transition to a new state (in_Java, in_VM) we cause it to block. So those kinds of threads are simple. In a similar vein, threads in_VM (think some runtime service like, say, a slow path allocation) are blocked either when they attempt to acquire a lock in the vm that enforces a safepoint or when the thread attempts to return from the vm. So a thread in this state is assured of blocking on its own in a very short period of time.

The other state and the more problematic one is in_Java. Now a thread that is in this state can either be executing in the interpreter or in compiled code. It would be in compiled code if the method was deemed hot enough that we compiled it. Well in the case of threads executing in the interpreter we simply switch the bytecode dispatch table so that on the next bytecode (or so) the thread will automatically block itself.

So the interesting case is the situation with compiled code. Interesting in the painful sense. Now if we did absolutely nothing we can expect that almost any real application will either return from compiled code to interpreted code and then block, or it will need some vm service and call from compiled code into the vm and once again it would block. So the case we have to worry about is the situation where we stay in compiled code forever (ok, a long time) and never leave it.

The way we handle this situation has changed over the years. Prior to Java 5.0 (1.5) we used a non-polling technique. In these earlier vms we would notice that a thread was in this situation and suspend it. We would then copy the code it was executing into a temporary buffer and patch all the calls out to another Java method or any place the code might return from the method. (We didn't have to patch calls to the runtime since they would block on their own). We would then reposition the thread's pc into this temporary buffer and let it go. In short order it would hit one of these patches and the patch would cause it to block. Sounds painful and it was to some degree. The advantage was that code executing in compiled code would not have to poll looking to see if we wanted it to stop. We stopped doing it this way in 5.0. The reason might surprise you. We found that doing the thread suspension was always problematic. The thread libraries on virtually every OS always seemed to have some obscure bug and this would cause some strange vm failure. Every release would have some new bandaid in the vm to cover the next bad thread library behavior we found. We gave up in 5.0.

In 5.0 we decided to convert to polling. Now the original fear was that polling would have a bad performance impact. Now it certainly could have a bad impact if you weren't too smart about how and where you did the polling. So the important places to poll are in loops without calls, loops that can't be determined to be finite, and also at the return from a method.

Well it is obviously trivial to see if a loop has calls so neither compiler (client or server) has a problem with that. Determining if a loop might execute for too long is another matter. The client compiler, not being as smart as the server compiler, is much more conservative and so will place polling instructions in more loops than the server compiler. Finally the other trick is that we don't want to add an extra branch in the code path (in other words we don't want the poll to add a compare and branch). Branches are just too expensive.

So we made the poll be a simple read of a word in a special page in the vm process. When we want to bring the system to a safepoint we simply change the protections on the page such that a read on that page will cause a fault (signal). From the signal handler we can then bring the thread to a stopped state. So using this scheme polling works out to be pretty cheap. It is more expensive than the previous method but not by very much. The big win is in reliability. Because of this change we no longer have to do forced suspension of threads and as a result we've noticed a definite increase in robustness of the vm.
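To make the polling idea concrete, here is a minimal sketch in Java. It is an illustration under loud assumptions, not how the vm does it: the poll below is a volatile flag plus a branch, which is exactly the compare and branch the real vm avoids by reading a protected page. Every name here (SafepointPollSketch, poll, beginSafepoint and so on) is made up for the example.

```java
// Hypothetical sketch: a cooperative safepoint built on a volatile flag.
// HotSpot's real poll is a read of a protected page, not a compare and branch.
class SafepointPollSketch {
    private static final Object lock = new Object();
    private static volatile boolean safepointRequested = false;
    private static volatile boolean stop = false;
    private static int threadsAtSafepoint = 0;   // guarded by lock
    private static long counter = 0;             // written only by the worker

    // The worker calls this at every loop back-edge.
    static void poll() {
        if (!safepointRequested) return;         // fast path: one volatile read
        synchronized (lock) {
            threadsAtSafepoint++;
            lock.notifyAll();                    // tell the requester we have stopped
            boolean interrupted = false;
            while (safepointRequested) {
                try { lock.wait(); } catch (InterruptedException e) { interrupted = true; }
            }
            threadsAtSafepoint--;
            if (interrupted) Thread.currentThread().interrupt();
        }
    }

    // Request a safepoint and wait until the given number of workers are parked.
    static void beginSafepoint(int workers) throws InterruptedException {
        synchronized (lock) {
            safepointRequested = true;
            while (threadsAtSafepoint < workers) lock.wait();
        }
    }

    // Release the parked workers.
    static void endSafepoint() {
        synchronized (lock) {
            safepointRequested = false;
            lock.notifyAll();
        }
    }

    // Returns how far the worker advanced while the safepoint was held.
    static long progressDuringSafepoint() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) { counter++; poll(); } // a loop without calls needs a poll
        });
        worker.setDaemon(true);
        worker.start();
        try {
            beginSafepoint(1);
            long first = counter;                // worker is parked, safe to look
            Thread.sleep(20);
            long second = counter;
            endSafepoint();
            stop = true;
            worker.join();
            return second - first;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("progress while stopped: " + progressDuringSafepoint());
    }
}
```

The point of the sketch is only the shape: the running thread checks cheaply and often, and it parks itself rather than being suspended from the outside.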
Nov 09 2005, 01:30:55 PM EST Permalink

20051102 Wednesday November 02, 2005

J2EE Class

So as I've previously mentioned I'm taking Sang Shin's online J2EE course. I tried to take this course quite a while ago and got completely swamped with it and other things so I had to give up about halfway. It's apparently a lot longer ago than I thought since that was I think the 2nd offering and now is the 9th session.

When I took it the first time the thing that impressed me was what a pain it was to build and deploy J2EE apps. In that incarnation everything was done by hand. I had to locate and install Pointbase and JWSDP on my (now) old linux server and get them functional. All the ant scripts were done by hand. It was a lot of junk that didn't seem that important to the task at hand.

What a difference it is this time! This time I downloaded three packages: the Sun Appserver, Netbeans 4.1 and the J2EE tutorial. I could probably have downloaded the combined appserver package from the netbeans site but I didn't. In any case it was simple to get it up and running. Starting and stopping the appserver and pointbase is right in the netbeans ide. I've completely avoided doing any manual ant scripts and ant runs while doing the homework. I've done every bit from the ide. Netbeans has made this attempt at the class much easier and more enjoyable. I'm still wondering why I'm learning J2EE since I'm a vm guy but it has been interesting to see how all this stuff fits together and what is going on behind the scenes at all the various websites.
Nov 02 2005, 02:23:32 PM EST Permalink

20050913 Tuesday September 13, 2005

Russia - Day 9

Back to work today. Still angry about the camera lens. Hopefully I can do something  about a replacement. I get to the office kind of early, 10:30 or so. I'm the first one there in the office space I'm sharing since everyone comes in late and works late to try and have some overlap with EST/EDT. So I work on posting to the blog and catching up on the news. I'm astounded to see that the Miami Dolphins won their first game. Given last year and the preseason I wasn't expecting much this year. I sure hope I didn't miss seeing the best game they'll play all year.

Everyone else comes in an hour or so later. I'm supposed to give another presentation today. This one will be on deoptimization in the JVM. I also think I'll be doing an interview of an applicant for a position on the runtime team. The interviewee comes in and Nikolay and Andrey do a group interview session. Eventually Philip joins in too. It seems quite spirited and I'm thinking I'm glad these guys didn't interview me. After an hour it is time to do my presentation and I never got to talk to the candidate. I'm not sure what was up with that.

We begin the talk and for the first hour or so I explain how we perform what is called an uncommon trap. In the server vm the compiler can put code in paths that it never expects to occur (because of profiling) or that it can't handle (some exception paths). On those paths we perform an uncommon trap. When this occurs we replace the frame for compiled code with one or more interpreter frames (more than one because the compiled code might represent inlined method calls). An uncommon trap can be viewed as cooperative deoptimization. It is deoptimization because we go from an optimized frame to an unoptimized frame. It is cooperative because the thread that is running initiates the sequence.

Conceptually the conversion is simple but the actual conversion process is pretty complicated and there is a point where the compiled frame is removed and we're creating the new frame(s) that I also describe as "balancing the world while standing on one toe". There is much agreement in the room.  
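For readers who want something to poke at, here is a small Java sketch of the kind of code where a profiling compiler could plausibly treat a branch as never taken. Whether a given vm actually compiles the rare branch as an uncommon trap is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical example; whether a given VM turns the rare branch into an
// uncommon trap depends on its profiling and flags.
class UncommonTrapSketch {
    static int safeDivide(int a, int b) {
        if (b == 0) {
            return Integer.MAX_VALUE;   // rare path, never taken during warm-up
        }
        return a / b;                   // common path
    }

    // Calls safeDivide many times with a non-zero divisor, so the profile
    // never sees the rare branch.
    static long warmUp(int iterations) {
        long sum = 0;
        for (int i = 1; i <= iterations; i++) {
            sum += safeDivide(i, 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(warmUp(100_000));
        // First time down the rare path: compiled code that left it out
        // would have to fall back to the interpreter here.
        System.out.println(safeDivide(7, 0));
    }
}
```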

After an hour of this people ask for a break. We decide to go to lunch while the food place downstairs is still selling food. This results in probably the most embarrassing part of the trip. One of the things that Nikolay has realized is that while he lived in the States for 3 years and his English is quite good for technical discussions, his vocabulary of English words for food is not at the same level. I decided to have the same chicken dish I had last Friday (a fried chicken breast and rice). So I ask him to order the chicken dish.

Over lunch the mystery of the interview candidate is answered. Apparently his knowledge of systems programming was such that they decided it was a waste to have me talk to him.

We sit down with Andrey and a couple of others and they bring our food one at a time. They eventually bring a chicken dish like I wanted and they give it to me. I start to eat when they bring a second chicken dish, a leg and thigh and rice. Apparently I got what Andrey ordered and the second dish is what Nikolay ordered for me. Since I had already started to eat Andrey takes the meal but I feel guilty the rest of the meal. :-(

After lunch we resume the presentation with vanilla deoptimization. This version of deoptimization is not cooperative. In this case the compiled code we are running is made invalid for some reason. A simple example is loading a new class where the method in question has inlined a method because at compilation time there was only a single implementor. When a class loads with an alternative implementation a dependency is violated and we must remove the code. So the offending code is patched so that no new calls can enter it, and all threads currently executing in it must be evicted. Note that they may not be directly executing in the method; they may have called out, but they must also be stopped from returning to the now invalid method.
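The single implementor case can be set up in a few lines of Java. This is a sketch with hypothetical names (Shape, Square, Rect); whether the compiler really inlines area() during the first loop, and so has to throw the code away when Rect shows up, depends on the vm and its flags.

```java
// Hypothetical names; whether the JIT actually inlines here depends on the VM.
class DeoptOnClassLoadSketch {
    interface Shape { int area(); }

    // The only implementor loaded while the first loop runs.
    static final class Square implements Shape {
        final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    // Not loaded until first use, which happens after the warm-up loop.
    static final class Rect implements Shape {
        final int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        public int area() { return w * h; }
    }

    // Hot loop with a virtual call that has a single target at first.
    static long total(Shape s, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        long warm = total(new Square(3), 1_000_000);   // one implementor so far
        long after = total(new Rect(2, 5), 1_000_000); // second implementor loads
        System.out.println(warm + " " + after);
    }
}
```

Running this with -XX:+PrintCompilation may show total being marked "made not entrant" around the time Rect is first used, though that is not guaranteed.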

Prior to the tiger (aka 1.5/5.0) release the JVM performed what we called eager deoptimization; in tiger we switched to lazy deoptimization. In eager deoptimization, at the same time we make the method non-entrant we also examine all frames, and the frames that represent activations of the method have their JVM state (Java locals, Java expression stack, Java locks) extracted immediately and stored into a side storage area. This was always complicated because the GC code needed to be aware of these side areas and keep them up to date. Meanwhile the original frames stay on the runtime stack until control returns to them and we can replace them.

In lazy deoptimization we leave the JVM state just where it was. The GC code examines it just like nothing has happened at all. This makes it seem so much simpler you'd wonder why it was ever done the other way. Me too. Actually lazy deoptimization does have a drawback. In eager deoptimization, in order to get the invalid frame off the stack we patch the return address that is stored in the callee. [I'm sure this is not clear without a picture]. So this involves "merely" patching an address on the stack to cause a return to a special piece of VM code. In lazy deoptimization instead of patching a return address we must patch the code of the method itself so that it will jump/call into that special piece of VM code. Patching code is always problematic so it isn't something taken lightly. Overall the benefits of lazy deoptimization are such that even though I really hate patching code it was worth it.

In another couple of hours we finish the presentation (so you can see how easy you just got off with only a few paragraphs). I'm not sure how much of it really sank in. There are some Monty engineers that seemed to have gotten it and Nikolay seems to have gotten most of it; with the others it was hard to tell since they were pretty quiet.

After work I head back to Nevsky. Today is the first day it has really rained while I was out walking. This is sort of what I imagined the weather to be like around here. On Nevsky I check photo shops to see if my lens is now being sold as second hand. So far no luck. It is late enough I can't check many shops so I think tomorrow I'll try before work. I can hope anyway...

Sep 13 2005, 09:06:06 AM EDT Permalink

20050826 Friday August 26, 2005

Tiered Compilation

I've had it in the back of my mind for a while to talk about the progress we have been making on putting tiered compilation into the Hotspot Java VM. Recently a VM member posted a link to a discussion of it on a java.net forum. There is a bit of incorrect (and out of date) information on this thread but the overall gist is that they want tiered compilation and they want it now. Me too. You will get it but it won't be as soon as I'd like.

Tiered compilation is something I've been wanting to do for quite some time (at least 4 years). I'm not convinced it will solve all the problems the gamers were talking about but I definitely think it will help.

So for those of you that don't know what this tiered compilation beast is I'll give an overview. There is basically a tradeoff in the way we compile in the VM. We actually have two compilers which we call C1 and C2. As it turns out C1 was actually written after C2. So much for good counting skills. The original hope was for C2 to be able to compile bytecodes to machine code fast enough to give both good client (startup) performance and excellent peak performance (server). Unfortunately that didn't happen and C1 was born. What we'd like to do now, at least for the applications where footprint isn't so critical, is to run both compilers in the same VM. When methods first get hot we compile them with C1; as we learn more about them and their heat increases we'd recompile them with C2. So you end up with a mixture of code, some compiled with little optimization and some with a high level of optimization. Some people liken this to adding an automatic transmission to the VM, whereas -client or -server is akin to a manual transmission.
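The "compile with C1 first, then C2 as the heat increases" idea can be sketched as a tiny policy in Java. This is only an illustration: the thresholds and names below are invented for the example and say nothing about what the real VM counts or when it recompiles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a tiered policy driven by invocation counts.
class TieredPolicySketch {
    enum Tier { INTERPRETED, C1, C2 }

    // Made-up thresholds, chosen only for illustration.
    static final int C1_THRESHOLD = 100;
    static final int C2_THRESHOLD = 10_000;

    private final Map<String, Integer> invocationCounts = new HashMap<>();

    // Pick a tier purely from how often the method has been invoked.
    static Tier tierFor(int invocations) {
        if (invocations >= C2_THRESHOLD) return Tier.C2;  // hot: optimize heavily
        if (invocations >= C1_THRESHOLD) return Tier.C1;  // warm: compile quickly
        return Tier.INTERPRETED;                          // cold: just interpret
    }

    // Record one invocation and report the tier the method should now run in.
    Tier recordInvocation(String method) {
        int count = invocationCounts.merge(method, 1, Integer::sum);
        return tierFor(count);
    }

    public static void main(String[] args) {
        TieredPolicySketch policy = new TieredPolicySketch();
        Tier tier = Tier.INTERPRETED;
        for (int i = 0; i < 20_000; i++) {
            Tier next = policy.recordInvocation("Foo.bar");
            if (next != tier) {
                System.out.println("invocation " + (i + 1) + ": " + tier + " -> " + next);
                tier = next;
            }
        }
    }
}
```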

It doesn't seem like it should be that big of a deal to get them both to live in the same VM. It probably shouldn't be as big a deal as it is but for reasons we don't want to get into here the runtime interfaces of the two compilers are not entirely compatible. So the very first step in getting tiered compilation to work is to reduce this incompatibility or at least mitigate it.

This work is a lot of what I've been doing for the mustang (6.0) release. This has turned out to be a lot more work than I expected. Let's look at what these changes involved. The two compilers used different calling ABIs. C1 used a very simple calling sequence, no register based parameters. C2 used a convention that included passing arguments in registers. So right off there is a problem if code compiled by C1 wants to call code compiled by C2. It is more involved than that though. For reasons that will probably be obvious if you think about it the interpreter uses a stack based calling convention. This convention is really optimized for interpreted code calling interpreted code. Ok, maybe designed is a better word than optimized when we're talking about the interpreter. In any case we are pretty much stuck with two calling conventions at best, interpreted and compiled. This isn't as awful as you might think because if the Hotspot strategy works out the paths that cross these boundaries are not particularly hot so we don't spend a lot of time doing the conversions.

So because of the need for supporting interpreted and compiled calling conventions we need to be able to convert arguments across this boundary because the caller doesn't know what type of code it might call. If you only had to worry about megamorphic virtual calls you could see how a call site might need both types of argument passing at the same site. It is actually way more complicated than that in Hotspot but I'll leave that for another discussion. So obviously you need some sort of mechanism for crossing the compiled->interpreted and interpreted->compiled barrier. There are obviously a lot of different ways of doing this. If you've been paying any attention at all it probably won't come as a big surprise to learn that the two compilers did it in different ways. What can I say.

So the mechanism for crossing these two boundaries uses a thing we call an adapter. So we have c2i adapters and i2c adapters. In C2 the adapters are actually separate pieces of code (separate from the compiled Java bytecodes), so they can be shared by methods that have the same signature (sequence of parameters). This is a good thing. These adapters will create a new stack frame (think frame pointer) as they do the conversion. This stack frame is visible to all the code in the VM that must walk the stacks of Java threads. This leads to lots of tests like: if (frame->is_interpreted()) ... else if (frame->is_compiled()) ... else if (frame->is_adapter()) ... This was seen as a bad thing.

The client compiler (C1) did things differently. The adapter code was actually generated as part of the code for the Java method so it wasn't shared. A not so good thing. However this code didn't leave behind a stack frame that the stack walkers would have to worry about.

So we have two distinct mechanisms for the arguments conversion. So when you have that situation what do you do?

Naturally you invent a new third way of doing things. :-) Ok it sounded like a good idea at the time. So in the forthcoming release we have replaced all this conversion code. I didn't mention it, but as you probably could have guessed, in the previous system the compilers were responsible for generating this conversion code. If we wanted them to share the code in a tiered system and we wanted non-tiered systems to still be possible we had to pull this code from the compilers and make it separate and sharable. That is exactly what we've done to date. [If you've been picking up snapshots of the builds of mustang these changes went into build 28. The dreaded build 28... ]

So the new scheme uses a data structure that describes the particular calling signature of a method and spits out a blob of code that contains both the c2i version and the i2c version of that parameter conversion code. The code that we now generate for these adapters no longer produces a stack frame during the process. While this was good for the stack walkers it isn't such a great thing for some other pieces of the system. In the next rambling episode I'll talk about the downsides of not having a frame left behind and other changes on the road to tiered compilation.
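The sharing part of this scheme (one blob per signature, holding both directions) can be sketched as a cache in Java. This is a loose analogy with hypothetical names: the strings stand in for generated machine code, and the signature key is just a string here rather than the VM's real data structure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one adapter blob per calling signature, shared by
// every method with that signature.
class AdapterCacheSketch {
    // Stand-in for the generated blob holding both conversion directions.
    static final class AdapterPair {
        final String i2c;   // interpreted caller -> compiled callee
        final String c2i;   // compiled caller -> interpreted callee
        AdapterPair(String signature) {
            this.i2c = "i2c" + signature;
            this.c2i = "c2i" + signature;
        }
    }

    private final Map<String, AdapterPair> bySignature = new HashMap<>();
    int generated = 0;      // how many blobs were actually built

    // Returns the shared pair for this signature, generating it on first use.
    AdapterPair adapterFor(String signature) {
        return bySignature.computeIfAbsent(signature, s -> {
            generated++;
            return new AdapterPair(s);
        });
    }

    public static void main(String[] args) {
        AdapterCacheSketch cache = new AdapterCacheSketch();
        AdapterPair a = cache.adapterFor("(II)");   // e.g. a method taking two ints
        AdapterPair b = cache.adapterFor("(II)");   // another method, same signature
        AdapterPair c = cache.adapterFor("(J)");    // a different signature
        System.out.println((a == b) + " " + (a == c) + " " + cache.generated);
    }
}
```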

Aug 26 2005, 02:38:57 PM EDT Permalink

20050809 Tuesday August 09, 2005

More Racy JVM Bug...

Well this entry is embarrassing. In the previous episode I detailed a race that seemed to cause a vm failure. I've learned more since then and figured I have to correct the previous entry since it is out there in full view. Next time I'll be a little slower about such stories to avoid being red-faced. Here goes...

The race I described earlier, where the patching of an inline cache site that originally looks like:

movl   eax,-1
call   resolve_invoke

and is patched to look like:

movl   eax,<address_of_ICHolder>
call   c2i_uvep

doesn't really occur, precisely because of the possible race. What actually happens when compiled code must call an interpreted method at an inline cache site is that, until the next safepoint (the world stopped), we insert a transition stub. So the call sequence looks like:

movl   eax,-1
call   transition_stub

transition_stub:
movl   eax,<address_of_ICHolder>
jmp    c2i_uvep

Since we first build the transition stub (an ICBuffer) before we patch the call no race actually exists.
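The "build it completely, then redirect with one write" trick has a familiar cousin in Java, sketched below. This is an analogy only: an AtomicReference stands in for the call instruction's destination, the strings stand in for machine code, and all the names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical analogy: a reader sees either the old target or the complete
// stub, never a half-patched mixture.
class TransitionStubSketch {
    // Immutable pair: the value loaded into eax and where control goes next.
    static final class Stub {
        final String eaxValue;
        final String jumpTarget;
        Stub(String eaxValue, String jumpTarget) {
            this.eaxValue = eaxValue;
            this.jumpTarget = jumpTarget;
        }
    }

    // Plays the role of the call instruction's destination.
    private final AtomicReference<Stub> callTarget =
            new AtomicReference<>(new Stub("-1", "resolve_invoke"));

    void patchToInterpreted(String icHolderAddress) {
        Stub stub = new Stub(icHolderAddress, "c2i_uvep"); // step 1: build the whole stub
        callTarget.set(stub);                              // step 2: a single write redirects the call
    }

    // What a thread running through the call site observes.
    String execute() {
        Stub s = callTarget.get();
        return s.eaxValue + "->" + s.jumpTarget;
    }

    public static void main(String[] args) {
        TransitionStubSketch site = new TransitionStubSketch();
        System.out.println(site.execute());
        site.patchToInterpreted("0x1234");
        System.out.println(site.execute());
    }
}
```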

So what is the cause of the fault? It turns out that due to recent changes by someone that will remain anonymous the code for the unverified entry of native wrappers (this is the code Java methods use to call JNI methods) was completely wrong. Basically it was accidentally a copy of the unverified entry for c2i adapters, which has the dereference of eax, instead of the normal unverified entry for compiled code. The code should have looked like:

uvep:  cmpl   klass_offset(ecx), eax    // expected klass match?
       jne    ic_miss_handler           // no, find correct method
vep:   <setup_frame>

As you can see, even if we get the incorrect value for eax because of the race in patching, we don't get a fault; the worst we get is a false miss. Now if only I wasn't in such a race to do the previous blog entry...


Aug 09 2005, 09:18:35 AM EDT Permalink

20050805 Friday August 05, 2005

Racy JVM bug

I was actually going to talk about tiered compilation a feature we are working on for Hotspot but I got involved in a bug yesterday I think was kind of cool. Ok maybe not cool but I think it might be interesting for people to see the kind of things that happen inside the Hotspot virtual machine. Just remember that quote about making sausage...

So these days I work in the compiler group (that's the jit compiler not javac) but for quite a while I worked on the runtime system. I've debugged quite a number of evil bugs. Today's was actually quite simple but it gives the flavor of some of the bugs we see. One of the things to realize is that in order to get the performance we want and others expect out of Java the vm does a lot of things that they warn you not to do back in school. As they say, don't try this at home. In the current virtual machines everyone expects that when you run, machine code is generated dynamically on the fly for the "hot spots" in the code. This is somewhat unusual on its own but what makes it truly scary is that not only is the executed code generated on the fly but we're patching it on the fly while threads may be executing it! Getting this to work correctly is a challenge; debugging it when it fails can be truly painful, as even reproducing the failure may take weeks or months.

So as part of qualifying a release of Java we go thru a lot of testing. One of the places that is the most stressful is a thing we call "big apps". As the name implies these are typically large applications. During the weekly builds we typically run these apps for several days to see where we stand. During the later stages of a release the times get longer. When I first started out, working on 1.3, the criterion was pretty low, 24 hours. On some platforms that was hard. In the tiger (5.0) release the criterion was much longer and we basically could run them until we got tired of it. The bug I'm going to describe was found during this testing, running a webserver on an x86 after about 22 hours.

In order to understand this bug you'll need a bit of background on how things are accomplished in the vm on several fronts. The vm uses a technique we call "inline-caches" in order to speed up virtual calls. [I've never really understood the name because they don't seem to be inlined]. As you probably know, in an OO language when you do a method dispatch at a particular spot in the source code, most of the time the call is to the same method. In other words the type of the object you use to do the dispatch (the receiver) is almost always the same at a particular site. So doing a full vtable dispatch is more expensive than what you would like (at least on some cpu implementations). These "inline-caches" attempt to solve this problem.

When we generate code for a call site where we can't prove that only one type is possible we must be prepared to do a full vtable dispatch. The code we generate looks something like this on an x86 (note: I'm placing destinations as the left operand in these examples).

movl  eax,-1
call  resolve_invoke
...


The routine "resolve_invoke" is a call into the runtime system. This particular routine is responsible for figuring out what method we should call given the particular receiver type we have now. When it figures that out, it patches the two instructions in this order: the call and then the movl. The call is hopefully patched to another piece of dynamically generated code but it might take us back to the interpreter if we haven't generated code for the particular target method. The movl is patched so that eax will be loaded with the class of the Java object being used as the receiver (in Hotspot parlance that is the klass of the receiver oop).

Now while we are patching this code the system is running normally. Some other thread may very well be executing this same path. We need to be resilient in the face of those races. For the moment we'll ignore that and see what happens at the patched call target. At the target code we will enter thru what we call the "unverified entry point" (the uvep). Every method has an unverified and a verified (vep) entry point. When we can prove that only one target can exist we use the vep and not the uvep. The uvep conceptually looks like:

uvep:   cmpl  klass(ecx),eax      // did we hit?
        je    vep                 // yes we did
        call  handle_wrong_method
vep:    <prolog code>
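Here is a toy Python model of the call site and the uvep check, just to make the moving parts concrete. Every name in it (CallSite, Method, UNRESOLVED and so on) is made up for illustration and none of it is HotSpot code; the only things taken from the text are the -1 placeholder, the uvep/vep split, and the patch order (call first, then the movl).

```python
# Toy model of a monomorphic inline cache. All names are illustrative only.
UNRESOLVED = -1   # stands in for the -1 the call site initially loads into eax


def handle_wrong_method(receiver, name):
    """Slow path: look the method up again from the receiver's actual class."""
    return f"slow:{type(receiver).__name__}.{name}"


class Method:
    def __init__(self, klass, name):
        self.klass = klass    # receiver class this target was resolved for
        self.name = name

    def uvep(self, receiver, cached_klass):
        """Unverified entry point: check the caller's guess before running."""
        if type(receiver) is cached_klass:       # cmpl klass(ecx),eax ; je vep
            return self.vep(receiver)
        return handle_wrong_method(receiver, self.name)

    def vep(self, receiver):
        """Verified entry point: the receiver type is known to be right."""
        return f"fast:{self.klass.__name__}.{self.name}"


class CallSite:
    def __init__(self, name):
        self.name = name
        self.cached_klass = UNRESOLVED   # the movl immediate
        self.target = None               # None plays the role of resolve_invoke

    def invoke(self, receiver):
        if self.target is None:
            self.resolve(receiver)
        return self.target.uvep(receiver, self.cached_klass)

    def resolve(self, receiver):
        # Same patch order as in the text: the call first, then the movl.
        self.target = Method(type(receiver), self.name)
        self.cached_klass = type(receiver)


class Circle:
    pass


class Square:
    pass


site = CallSite("area")
first = site.invoke(Circle())    # resolves the site, then hits in the uvep
second = site.invoke(Square())   # different receiver type: miss, slow path

# A racing thread that sees the patched call but still the old -1:
half_patched = CallSite("area")
half_patched.target = Method(Circle, "area")
racing = half_patched.invoke(Circle())
```

The half-patched site is the interesting case: the comparison against -1 can never succeed, so the racing thread simply takes the slow path and still ends up at the right method.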

So looking back at our call site, what might a racing thread see? Because we patch the call and then the move it might see:

          movl   eax,-1
          call     uvep

If this happens we'll get a miss. So the worst that will happen is that the racing thread will end up in the runtime call for handle_wrong_method and slowly do the right thing. It's a rare race and the right thing happens. Now let's see what happens if we don't actually happen to have compiled code for the target method. In that case we have to call the interpreter. This is more complicated for several reasons.

Calling the interpreter is more complicated because compiled code uses a different calling convention than interpreted code. Compiled code uses a convention much like you'd expect for any compiler (and in many cases uses the native ABI). This might very well involve passing arguments in registers. The observant reader will have noticed from the code I showed for the uvep that we pass the first argument ("this" or the receiver) in ecx. For reasons that aren't difficult to understand the interpreter uses a stack based argument convention for the java arguments. It also takes one register arg that is a pointer to the internal object that describes the exact method to run (the methodOop in hotspot parlance). So in order to call from compiled code to the interpreter we call thru what we call a c2i adapter. (Not surprisingly i2c adapters exist for the other direction.)

Now since the call site we resolved in the compiled code wants to call a specific method based on the receiver class, and it knows it will be calling the interpreter, it does the patching a little differently. It patches the code so it looks like this:

movl  eax, <address_of_ICholder>
call  c2i_adapter_for_call_signature

As you might guess there is an adapter for each argument pattern or signature. Now it turns out that like compiled code c2i adapters have verified and unverified entry points. In this case we'll be going to the unverified entry point because we need to load up a methodOop for the interpreter to use. So what is this "ICholder" item that gets its address loaded into eax? It is a small data structure that looks like:

             methodOop _holder_method;    // the methodOop we need for the interpreter
             klassOop _holder_klass;            // the expected receiver klass

So now the uvep of the c2i adapter looks a bit different from the compiled entry. It looks like:

c2i_uvep:   movl  ebx, klass_offset(ecx)          // get the receiver's klass
            cmpl  holder_klass_offset(eax), ebx   // did we hit?
            movl  ebx, holder_method_offset(eax)  // load the methodOop where the interpreter expects it
            je    c2i_vep                         // yes we did (the movl leaves the flags alone)
            call  handle_wrong_method

Ok now let's see what can go wrong here with the same race. We assume that the racing thread doesn't see the patched move instruction but does see the call change. So when it gets to the c2i_uvep it will attempt to execute:
cmpl holder_klass_offset(-1), ebx

Now holder_klass_offset is a small value (12 in fact) so this code will attempt to load the 32 bit value at address 12 + -1 or 11. This will cause a trap because the first page of the process is not mapped in. We use this kind of thing all the time to implicitly check for NULL pointer dereferences. Rather than checking for NULL in a pointer to a Java object every time we use it for the first time, if the offset is such that we are certain it is within the first unmapped page we just let the code fault and generate the null pointer exception via runtime magic.
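The address arithmetic is easy to check by hand, but here it is spelled out with 32 bit wraparound. The offset of 12 comes from the text above; the 4096 byte page size and the helper names are assumptions made only for this sketch.

```python
MASK32 = 0xFFFFFFFF
HOLDER_KLASS_OFFSET = 12   # from the text
PAGE_SIZE = 4096           # assumed size of the unmapped first page


def effective_address(eax, offset=HOLDER_KLASS_OFFSET):
    """Address touched by cmpl holder_klass_offset(eax), ebx, with 32 bit wrap."""
    return (eax + offset) & MASK32


def faults_in_first_page(eax):
    """True if the access lands in the first page, where the fault is recoverable."""
    return effective_address(eax) < PAGE_SIZE


unpatched_eax = -1 & MASK32          # the -1 placeholder as an unsigned value
address = effective_address(unpatched_eax)
```

With the unpatched -1 in eax the access wraps around to address 11, which is well inside the first page, so the fault handler gets its chance to fix things up.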

Using that same kind of magic, when we take the fault we see that it happened at an address within the first unmapped page and that it was in the range of instructions in an adapter for verifying the klass, and we just fix things up and continue on. Once again the racing thread sees this call taking a long time but the race is rare and the usual path is quite quick.

Well now we're finally ready to see the bug. Remember the bug I was going to talk about many paragraphs ago? :-)

In this instance the program crashed and it did so at the instruction

cmpl  holder_klass_offset(eax), ebx

In the debugger the value of eax was 0xffffffa0. What a strange value. So looking further I looked back at the call site and it looked like:

movl    eax, $0x43e071a0
call    0x40b5c560

and at 0x40b5c560 we find

0x40b5c560:     mov   eax, $0x43c053d0
0x40b5c565:     jmp    0x40bc4150

Now I'm not showing it here but that jmp to 0x40bc4150 isn't actually taking us to the c2i_uvep where we crashed. It is actually taking us to a true vtable dispatch. So this call site was busy indeed. Not only did it get patched to a monomorphic site (single receiver type expected) but by now it has gone megamorphic and will do full vtable dispatches from now on. So the question is, how did we end up with the strange value in eax?

Ok I cheated a little here and left out some information on purpose. Here's the actual disassembly of the call site:

0x40bfc27e:     movl    eax, $0x43e071a0
0x40bfc283:     call   0x40b5c560

It might be obvious now. Notice the 0xa0 in the low byte of the move instruction and notice the address of the mov instruction itself. If we dump the instructions at the natural 32bit boundaries we see:

0x40bfc27c:     0xa0b89090      0xe843e071

Aha! So the original -1 (0xffffffff) was in two memory words and those words straddled a cache line. So when the patch occurred converting the 0xffffffff to 0x43e071a0, we actually wrote two words and the fact that they straddled a cache line meant that a racing thread could see the writes individually. In this case that is exactly what happened, and the racing thread got the 0xffffffa0 value as a result of the mishmash.
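To see where 0xffffffa0 comes from, the two dumped words can be laid out as bytes and the torn write replayed. The dump values and addresses are the ones shown above; the 64 byte cache line size and the variable names are assumptions for the sketch, and so is the reading that the racing thread saw the first word after the patch and the second word before it.

```python
import struct

DUMP_BASE = 0x40BFC27C
DUMP_WORDS = (0xA0B89090, 0xE843E071)   # the two 32 bit words from the dump
MOV_ADDR = 0x40BFC27E                   # address of the movl instruction
CACHE_LINE = 64                         # assumed cache line size in bytes

# x86 is little-endian, so in memory the words read 90 90 b8 a0 | 71 e0 43 e8.
code = struct.pack("<2I", *DUMP_WORDS)

# The movl is one opcode byte (0xb8) followed by a 4 byte immediate.
imm_start = MOV_ADDR + 1 - DUMP_BASE
new_imm = code[imm_start:imm_start + 4]
old_imm = b"\xff\xff\xff\xff"           # the original -1

# Only the part of the immediate inside the first dumped word has been updated.
bytes_in_first_word = 4 - imm_start
torn = new_imm[:bytes_in_first_word] + old_imm[bytes_in_first_word:]

patched_value = struct.unpack("<I", new_imm)[0]
torn_value = struct.unpack("<I", torn)[0]

# The immediate occupies MOV_ADDR+1 .. MOV_ADDR+4; check whether it crosses a line.
first_byte, last_byte = MOV_ADDR + 1, MOV_ADDR + 4
straddles_line = first_byte // CACHE_LINE != last_byte // CACHE_LINE
```

Only one byte of the immediate (the 0xa0) lives in the first word, so a thread that sees that word patched and the next one stale reads 0xffffffa0.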

So this takes just the right alignment of the instructions and even to some degree the right (or wrong) values being written. To see why the value matters, remember we can tolerate a bad value as long as the value in eax + 12 ends up being an address in the first page. I think you can see why there are some classes of bugs that can be very difficult to reproduce. At least in this instance the cause was pretty simple to detect, just a few minutes in the debugger. Now if only I knew how best to fix it. There are several choices but which is the best isn't clear. If you're the slightest bit interested, and I suspect if you've read this far you might very well be, then watch for further developments by checking on bug 6306102.
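A quick sweep shows how much the written value matters for this particular tear, where only the low byte of the immediate has been updated and the upper three bytes are still 0xff. The offset of 12 is from the text; the 4096 byte page size and the names are assumptions for the sketch.

```python
MASK32 = 0xFFFFFFFF
HOLDER_KLASS_OFFSET = 12   # from the text
PAGE_SIZE = 4096           # assumed size of the unmapped first page


def lands_in_first_page(eax):
    """True if eax + 12 wraps around into the first page (a recoverable fault)."""
    return ((eax + HOLDER_KLASS_OFFSET) & MASK32) < PAGE_SIZE


# Try every possible low byte with the upper three bytes still 0xff.
tolerated = [low for low in range(256) if lands_in_first_page(0xFFFFFF00 | low)]

crash_eax = 0xFFFFFFA0
crash_address = (crash_eax + HOLDER_KLASS_OFFSET) & MASK32
```

Under these assumptions only low bytes 0xf4 through 0xff would have wrapped into the first page and been quietly fixed up. The 0xa0 that was actually written gives 0xffffffac, which is nowhere near the first page, so the fault was not recognized as the benign race.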

Aug 05 2005, 11:58:24 AM EDT Permalink

20050718 Monday July 18, 2005

Reflections on JavaOne

Well I just went to my first JavaOne conference and I thought I'd write about my experiences. This actually was my first trip to what could be labeled as a trade show. I'd been to conferences like PLDI before but nothing like JavaOne. After more than thirty years in the business I thought my record might make it to the end of my career but the streak has finally ended.

I was asked to do a presentation for Licensee Day which happens the day before JavaOne. Since I was travelling all the way from North Carolina it made sense to stay for the rest of the week and get the whole environment. After all, I've been working on the Hotspot Java Virtual Machine (insert many TM's here) for over six years, so it was probably time to see what people were really doing with stuff other than submitting bugs :-).


Saturday

I arrived on Saturday, the day before Licensee Day, so I took advantage to meet up with one of my college roommates that I hadn't seen in probably twenty years. At Carnegie-Mellon I was the only engineer living in a rented house with a bunch of architects and an artist and it helped keep my geek tendencies somewhat in check. Wayne (the artist) has lived in the Bay area since college and we've caught up a few times when he's been on the east coast. He's living north of Berkeley so I took BART (also a first for me) out to see him. He's doing ok but could use more work, so if you're in need of a mural you should check him out. He does great work.

Sunday

Sunday was Licensee Day and it was also the day of the Gay Pride parade. I thought I made a big mistake leaving the hotel because I had to cross Market Street to get from my hotel to the Marriott where Licensee Day was held. There were already barricades up when I left and I thought at first I'd have a long way around but luckily that wasn't the case.

For quite some time we've been trying to get the client and server JIT compilers in the VM to coexist in order to perform what we call tiered compilation. Think of it as like an automatic transmission. You start out with lightly optimized code but as you need it some code gets recompiled at higher levels of optimization. We thought that this was going to make it into Mustang (6.0) but unfortunately it didn't make the cut. My talk was going to be about some of the changes that were made to support this feature. Specifically the changes that made it so the compilers shared a calling convention and also shared the same code that allows passing arguments from a compiled frame to an interpreted frame and vice-versa. This is remarkably involved code and explaining it in the 20 minutes allotted limited how deep I could go. Getting the correct balance is something I was worried about.

The talk went ok, I only caught one guy dozing off. I'm sure it was jet lag and not the topic. Afterwards several people came up and said they thought it was pretty good. So that was ok. By the time all the talks had finished the parade was all over so I missed the extravaganza but then there was no trouble getting back to the hotel.

Monday

The next morning was the start of JavaOne. I was surprised walking up to Moscone at how long the line was for the general session. Since I hadn't even picked up my conference badge yet I expected I'd be sitting in outer Mongolia. As it happened I was registered as a speaker and the speaker badge let me bypass the big line. So I got a decent seat.

I didn't really expect the session to take on the rock concert feel but the first day certainly had that air. It was interesting to see people like Jonathan Schwartz and James Gosling in person. The first day's T-shirt delivery system was a disappointment. It looked like a nice bit of machinery but I think only one shirt made it off the stage and only a couple of rows into the audience. I think they broke it during testing before the event.

I worked at one of the Sun booths for three hours in the afternoon. It was the Performance booth. Most of the people that came by were looking for the DTrace booth. The rest had GC questions at which I'm only moderately competent. The sessions I did manage to get to on Monday were not up my alley. Working on the vm like I do (compilers/runtime) I hardly even see Java, and most of the stuff I went to this day was Java EE (formerly known as J2EE but they announced s/2/ava /g today). I don't know much at that level even though I attempted to take Sang Shin's J2EE course before being swamped by the homework involved. After the first day of JavaOne I thought "well this was nice, but once is enough".

Tuesday

The second day I actually waited in line for the general session since I felt a little guilty about yesterday's episode. I got there early enough that I still got an ok seat even though I wasn't in the line for alumni. Scott talked at this session and it was pretty good, but not as funny as he used to be. I didn't see the T-shirt delivery system this day as I had to leave since I was scheduled to open the performance booth. Apparently I didn't miss much. Although it was a pretty high tech system, apparently the lights and flashes confused the infrared control system so that they couldn't deliver any shirts.

When I went to the pavilion I found I couldn't get in because I was missing a special blue sticker. This was bad since I was supposed to work the opening 3 hour slot. I ran across the street and got the sticker. Then I got into the pavilion only to find that the machine had been turned off overnight and I didn't know the password. After a few anxious minutes of running around I found someone that had the password. Whew. The booth work was pretty straightforward. I had gotten the spiel down ("DTrace is at booth 800") and could demo the various tools. We did have one person who came up pretty irate complaining that garbage collection was destroying inodes on his Linux system. Of course GC is only a very heavy user of the virtual memory system and so only has a very indirect connection to any inodes. After getting a picture of his setup it was clear that the only thing he ever did that caused any amount of paging was this one particular simulation he was doing with a Java program. I explained that it was more likely that he either had a bad disk driver or a hardware problem (disk controller or memory). He seemed to calm down and eventually believe that the vm wasn't the real culprit. I sure hope he found his problem but at least I don't get GC bugs. :-)

The first talk I went to this day after getting off booth duty was truly awful. I'm not identifying it here but I'm still sorry about that waste of time. I hope they got the worst evaluation of any talk at JavaOne because I'd hate to think there was anything worse. Things got much better though. The next one was "Profiling in the Real World". This was pretty good. It showed how NetBeans can do profiling using the new features in JVMTI.

From the computer science point of view the next talk I attended was my favorite. "Finalization, Threads, and the Java Technology Memory Model" by Hans Boehm. I have never met Hans although I exchanged quite a bit of email with him while working on the port of Hotspot to Itanium. I was sure it was going to be a good talk but I was afraid it might give people ideas about using finalizers when the advice has been to avoid finalizers. This talk was about if you really, really, have to use finalizers (and I mean really) this is how to do it.

I suspect quite a number of people went to this thinking "I'll learn the cookbook and then I'll be able to use finalizers after all". I'm sure they went out disappointed. While there was something of a cookbook the issues Hans talked about were for the most part so subtle I know that in the group I was sitting with most were just lost. Especially the issue that I think of as the "vanishing this pointer". I wish I could find his slides on the net but so far no luck. If I do I'll be sure to post the link. Suffice it to say I think more than anything else Hans scared most of the audience so much that they'll never be tempted to use them.

Tuesday night was a series of great BOFs ("Six Ways to Meet an Out of Memory Error", "Java Technology on Linux: Tips and Tricks", "Using the Tools in JDK (tm) 5.0 to Diagnose Problems and Monitor Applications") done by the serviceability group and also Hui Huang from the runtime team. Hui had a number of tips for people having issues with various linux distributions and kernels. I learned a lot about all the new tools the serviceability group added in 5.0. When you work at such a low level like I do every day it's nice once in a while to get a glimpse of what it's like at the higher levels. Before the BOFs there was a reception for the JDK Community. I got to meet two developers from Kentucky that had the first two bug fixes accepted as part of the Peabody process. Alan Bateman and I got dragged over because they were having a vm crash and they thought we might be able to diagnose it via our psychic powers. We tried but we couldn't.

Wednesday

Wednesday I skipped the general session. The first session I went to was "Nine Ways to Hack a Web Application". It was pretty entertaining and very well attended. In fact they repeated it the last day. I think there were a lot of people hoping to hack into Citibank and be able to quit their day job. I was surprised though to see that almost everything on the list was stuff I knew. I guess I did learn some stuff in Sang Shin's class. The "Evolving the Java (tm) Language" session was ok (sorry Mark) but it seemed more of a fishing for feedback session than anything else. While I can see the rationale for adding direct support for XML in the Java Language I can't say I was enthused about the proposed syntax. It really is kind of ugly.

I finally got to go to the DTrace booth (800) and watch Jarod and Adam at work. They had a challenge going on that they could find significant performance improvements in your app or you'd get an iPod. I think they amazed quite a few people with what they could do with it. Only one iPod was "won" and that one barely met the standard they had established.

Even if I was in a shop running only linux or windows I'd find a way to dual boot Solaris 10 or run it in a VMWare partition. DTrace is so cool and so good at finding performance problems it is worth it. Especially since Solaris 10 is free, all it's going to cost you is some disk space. Even with the limited support for DTrace in 5.0 you can still do a lot of useful performance analysis and when mustang is done (or now if you're adventuresome) it will be excellent.

I do have a somewhat funny DTrace story though. While I was standing at the booth a guy from the Space Telescope Institute came up and gave them an app for the challenge. This app is used by astronomers that have a proposal for using the Hubble telescope. So they loaded up the app and started using DTrace on it. When they started to look at Java execution stacks (jstack) they were getting traces that were pretty meaningless. Little bits of names and then long sections where they just got hex numbers. So they were sort of at a loss for what to do, since if you can't see what code the vm is running it's hard to find the performance problems. Jarod was just playing around and disassembled some of the code at one of the addresses and made some silly comment about it to the crowd. So I said "I recognize that code, it's the dispatch code in the interpreter from the return from a Java call". (I told you I spent too much time at a really low level.) A little further looking showed these were all interpreter frames. The question was why didn't DTrace show the frames as usable method names?
Either Adam or Jarod decided to take a look at it using pstack. It decoded the frames correctly but what we noticed was that the method names were like 200 characters long because the class hierarchy was so deep. This was the clue they needed and they figured out that there was a buffer limitation in jstack but they could work around it using ustack. Now they could see what the app was doing and they were off to the races. At that point I had to go back for my final shift at the performance booth. I don't know what they found in that app but I know he didn't get an iPod. The funny part was that after this I had several people come up to me and say something like "you're the guy that read that Java hex stack dump aren't you?".

That night was the "Java Hotspot Virtual Machine Q&A" BOF which I attended. There were a lot of good questions and I think we helped a number of people. Maybe next year I'll be listed as a real speaker rather than the impersonator I went as this year. (I'll explain that remark when the statute of limitations expires).

Thursday

Thursday came and I was back waiting in the line for the general session. You could definitely tell that a fair number of people had already left (or were recovering from the party). I got to sit right up front near the stage. They were going to have a discussion by a number of famous people. It was funny at one point before it started: James Gosling and Bill Joy were standing together talking. Lots of people came up in order to take a photo of the two of them talking. This was truly the geekiest point of the conference. It probably wasn't as geeky as when I asked David Patterson to autograph my copy of "Computer Architecture" but it was up there.

In the Thursday session t-shirts were actually delivered. This was a trebuchet so it was definitely low tech. I since one was done before you could do it again but apparently not. Good thing, otherwise maybe none of them would have worked. So when the audience got to vote there was no surprise: the one that worked won.

The first session I went to Thursday was "The Apache Harmony Project". I was curious to see what they had to say and what the response would be. I personally don't see the point but I know I might be biased. After seeing the slides I think they must get better hallucinogens than the rest of us. Although they have slides acknowledging what a difficult problem it is to reproduce ("jvm hard, libraries big") I think they are still underestimating it. Especially when they talk about the blue sky proposals for how it will all be built with interchangeable pieces. They compared it to gcc and linux. I don't see that as a good comparison. gcc and linux both satisfied a market: gcc for a free compiler that ran everywhere and linux for a free os that ran on your pc. They also provided an answer for the market that wanted to hack on their own on a compiler or os. The former market is much larger than the latter. In the case of Java the former market already has Java at the price (free) they want to pay so it is only the smaller market that is missing what they want. So like I said I don't get it, maybe it'll work and maybe it won't. I think it is quite a ways off in any case unless someone just gives them a lot of code (which they are hoping for).

The surprising part of this was the Q&A session. I really didn't know what to expect and truthfully one of the main reasons I came to the talk was to see what other people thought of it. Maybe it would give me a different view point and change my mind. Nope. There were quite a few people that got up (none from Sun btw) and said this was a bad idea. They were really concerned about forking Java and while the Harmony folks say they have no interest in forking Java they did admit that if they did this someone could try and do it. They felt like no one would do such a thing but the questioners didn't seem convinced. After the first few negative questioners got up I left so maybe the tide turned after that.

The next session I attended was "Increasing the Robustness of the Java (tm) Virtual Machine" by folks from SAP. I missed the beginning of this because I went and wolfed down lunch after the Harmony presentation. I'm sorry I missed it now. I really liked this talk. This is really cool technology for getting what people think of as multi-tasking VMs. I like to think of this version as "Green VMs" as they multiplex VM state on top of a process context. It definitely has its limitations but it is great for their application. It is a neat way to get both isolation and operating system level protection. It contrasts with the approach taken by MVM from Sun Labs. I think MVM has a place in the universe and in a lot of ways is more general than the SAP work but I think that an isolation technique where isolated apps all crash when the vm crashes is an odd way to be isolated. Or as someone else put it "One VM to lead them, one VM to run them, and in the darkness one VM to crash them all".

Next was the DTrace session. Needless to say I thought it was great. The amazing thing about this was that as the session ended Adam mentioned that Bryan Cantrill was in the room. Since Bryan didn't interject anything during the whole talk all I can figure is that they had him duct taped to a chair with a gag.

The final session I went to was "Scripting in the Java Platform". I went to this because I wanted to see what this was all about. I have to say I was disappointed in this talk. It was ok but it wasn't anything like I expected.

Well this is the end of my first blog entry. I'm impressed if you've gotten this far. It would have been better if I had done this as it was happening to break things up but I kept procrastinating at getting my blog set up and so now I had to do it all at once.


Jul 18 2005, 11:36:22 AM EDT Permalink