Monday, January 08, 2007

Tiered compilation - where is it?

It's about time I wrote about what I've been working on (although I've been getting requests for airplane-building updates too). I've been trying to get the tiered jvm built as the standard vm, replacing the server vm. My intention was to make the tiered system visible so we could get feedback on it sooner rather than later. To do that I needed to show that people wouldn't have a heart attack over performance, so I'd been running benchmarks.

The good news is that on our internal benchmarking system (Alacrity) the server results for the tiered vm were pretty much identical to those for the server vm. The client results, which track startup performance, weren't quite so good. That wasn't really a problem for me, because at this point I only wanted to replace the server vm with the tiered vm. In the future I want the client vm to go away too, but given the current state of the tiered vm, weaker startup numbers weren't too surprising.

The Alacrity startup results were pretty bad, though, so I spent some time investigating them. The obvious suspicion was that the cost of the profile-tracking code the client compiler now generates would explain it. So I built a client vm that could collect profiles under the control of some new switches (several switches, because I wanted to see which kinds of profile-tracking code were the most expensive) and ran the tests again. That definitely showed that profiling hurt client performance, but nowhere near to the extent I was seeing with the tiered vm. The impact was fairly modest, 6-10% or so for most things, although one benchmark was hit pretty drastically.
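To make that profiling cost concrete, here is a Java-level analogy of the bookkeeping a profiling compiler adds. It is only an illustration: the real client compiler emits these counter updates directly in machine code, and `BranchProfile`, `ProfilingDemo` and their fields are hypothetical names. The profiled method computes the same result as the plain one, but on every call it also bumps an invocation counter and a per-branch counter, the kind of data a later optimizing compile can use (for example, to favor the common side of a branch).

```java
// Java-level analogy of profile-collecting compiled code. All names here are
// hypothetical; a real JIT emits equivalent counter updates in machine code.
final class BranchProfile {
    long invocations; // times the method was entered
    long taken;       // times the (x < 0) branch was taken
    long notTaken;    // times it fell through
}

final class ProfilingDemo {
    // What a non-profiling compiler would produce: just the computation.
    static int abs(int x) {
        return x < 0 ? -x : x;
    }

    // Same computation, plus counter updates on every call.
    static int absProfiled(int x, BranchProfile p) {
        p.invocations++;
        if (x < 0) {
            p.taken++;
            return -x;
        }
        p.notTaken++;
        return x;
    }

    public static void main(String[] args) {
        BranchProfile p = new BranchProfile();
        int[] inputs = {-1, 2, -3, 4, 5};
        int plainSum = 0;
        int profiledSum = 0;
        for (int x : inputs) {
            plainSum += abs(x);
            profiledSum += absProfiled(x, p);
        }
        System.out.println("sums: " + plainSum + " " + profiledSum);
        System.out.println("invocations=" + p.invocations
                + " taken=" + p.taken + " notTaken=" + p.notTaken);
    }
}
```

Every profiled call does extra memory writes that the plain version skips; that per-call overhead is the kind of cost the profiling switches were isolating.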

Another suspicion was that we didn't have code that allowed a thread executing a hot loop in client-compiled code to OSR (on-stack replace) its way into server-compiled code. So I added that capability. It had no real impact, so although we'll need it at some point, adding it didn't get me any closer to explaining the gap.
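For readers unfamiliar with OSR, here is a small program that only OSR can speed up: the loop's method is entered once and then spends almost all its time in a single loop, so waiting for it to be invoked again would never get it compiled. `HotLoop` and `sumBelow` are hypothetical names. Running it with HotSpot's `-XX:+PrintCompilation` flag should show an OSR compilation, which the HotSpot builds I know of mark with a `%` (the output format varies between releases). This shows the familiar interpreter-to-compiled case; the client-to-server transition described above applies the same idea to code that is already client-compiled.

```java
// Hypothetical example: a loop hot enough to trigger on-stack replacement.
final class HotLoop {
    // Sums 0 + 1 + ... + (n - 1). The loop back-edge is taken n times within
    // a single invocation, which is the situation OSR is designed to catch.
    static long sumBelow(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        // sumBelow is called only once, so an invocation-count trigger never
        // fires for it; only a back-edge-triggered OSR compile can help.
        System.out.println(sumBelow(n));
    }
}
```

Usage: `java -XX:+PrintCompilation HotLoop` and look for the OSR entry for `sumBelow`.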

The odd thing was that when I ran tests locally on my workstation, I wasn't seeing the same kind of results that Alacrity showed. It finally dawned on me that the difference was probably class data sharing (CDS). The client vm supports class data sharing: data for the classes used during vm initialization is pregenerated into a shared archive when the vm is installed, and later client runs can simply load that archive with a good bit less work. This helps startup time. The server vm, because it uses a different garbage collector, doesn't support CDS. And since I rarely install the vm the way a normal user would when I'm working on it, my client vm didn't have a shared archive to load. So locally I was comparing two vms that both ran without CDS, while the client vm that Alacrity tested had its archive.
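One quick way to tell whether a given run actually mapped a shared archive: HotSpot includes the word "sharing" in its vm info string (the parenthetical at the end of `java -version` output, also readable as the `java.vm.info` system property) when CDS is active. The class below is a small sketch that relies on that marker; `CdsCheck` and `sharingReported` are hypothetical names.

```java
// Hypothetical helper: reports whether this run appears to be using a CDS
// archive, based on the "sharing" marker HotSpot puts in java.vm.info.
final class CdsCheck {
    static boolean sharingReported(String vmInfo) {
        return vmInfo != null && vmInfo.contains("sharing");
    }

    public static void main(String[] args) {
        String info = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + info);
        System.out.println("shared archive in use: " + sharingReported(info));
    }
}
```

Running it once normally and once with `-Xshare:off` should show the difference; `-Xshare:dump` regenerates the archive for the installed vm (details vary by JDK release).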

So I modified the tiered vm so that when it was run as a client vm (as it is when Alacrity tests it), it would use the same garbage collector as the normal client vm and could therefore dump and load the shared archive. This was pretty much the missing piece of the performance puzzle. Between the lack of CDS in the tiered vm and the cost of collecting profile data in the client compiler's generated code, I could account for the performance gap I was seeing compared to the vanilla client vm.
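The rule in that change can be modeled in a few lines. This is a hypothetical sketch, not the vm's actual code (the real vm makes this choice in its own startup logic): `Collector`, `VmConfig` and `choose` are invented names, and the two collectors are reduced to a single "supports CDS" flag. It captures the two conditions from the text: the collector has to be one that supports CDS, and an archive has to exist, which it didn't on an uninstalled development build.

```java
// Hypothetical model of the collector/CDS decision; all names are invented.
enum Collector {
    CLIENT_COLLECTOR(true),   // the client vm's collector: can use a CDS archive
    SERVER_COLLECTOR(false);  // the server vm's collector: no CDS support (as of this post)

    final boolean supportsCds;

    Collector(boolean supportsCds) {
        this.supportsCds = supportsCds;
    }
}

final class VmConfig {
    final Collector collector;
    final boolean useSharedArchive;

    VmConfig(Collector collector, boolean useSharedArchive) {
        this.collector = collector;
        this.useSharedArchive = useSharedArchive;
    }

    // Running as a client vm selects the client collector so that a shared
    // archive, if one was generated at install time, can be loaded.
    static VmConfig choose(boolean runAsClient, boolean archiveAvailable) {
        Collector c = runAsClient ? Collector.CLIENT_COLLECTOR : Collector.SERVER_COLLECTOR;
        return new VmConfig(c, c.supportsCds && archiveAvailable);
    }

    public static void main(String[] args) {
        for (boolean client : new boolean[] {true, false}) {
            VmConfig cfg = choose(client, true);
            System.out.println((client ? "client" : "server") + " mode -> "
                    + cfg.collector + ", shared archive: " + cfg.useSharedArchive);
        }
    }
}
```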

So the next important things to attack are obvious:

    1: get the garbage collector the server vm uses to support CDS. That is not something I have any expertise in, so someone else will be doing that work. (My understanding is that it is doable, but it just wasn't seen as that important for the server vm, where startup performance isn't that critical.)

    2: see what I can do about the cost of profiling in the code the client compiler generates. That will be a topic for another day...


Jan 08 2007, 10:06:09 AM EST