Monday, January 08, 2007

Tiered compilation - where is it?

It's about time I wrote about what I've been working on (although I've been getting requests for airplane building updates too). I've been trying to get the tiered vm built as the standard vm, replacing the server vm. My intention was to make the tiered system visible so we could get feedback on it sooner rather than later. To do that I needed to show that people wouldn't have a heart attack over performance, so I've been running benchmarks.

The good news is that on our internal benchmarking system (Alacrity) the server results were pretty much identical for the tiered vm and the server vm. The client results, which track startup performance, weren't quite so good. That wasn't really a problem for me, because at this point I only wanted to replace the server vm with the tiered vm. In the future I want the client vm to go away too, but given the state of the tiered vm the weaker startup numbers weren't too surprising.

The Alacrity startup results were pretty bad, though, so I spent some time investigating them. The obvious suspect was that the client compiler now generates code to track execution profiles. So I built a client vm that collects profiles under new switches (several switches, because I wanted to see which kind of profile-tracking code was the most expensive) and ran the tests again. It definitely showed that profiling hurt client performance, but nowhere near as much as what I was seeing with the tiered vm. The impact was fairly modest, 6-10% or so for most benchmarks, although one benchmark was hit pretty drastically.
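To make "code to track execution profiles" a bit more concrete, the Java below hand-writes the kind of bookkeeping a profiling tier adds to a method: an entry counter, taken/not-taken counts for a branch, and a histogram of receiver types at a call site. This is only a conceptual sketch under that assumption; HotSpot keeps its profile data in its own per-method structures, not in fields like these, and every name here is hypothetical. What it illustrates is that each call pays for several extra memory updates even when the method's real work is tiny, which is the kind of overhead the profiling switches were measuring.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual illustration only: hand-written counters standing in for the
// profile-collection code a profiling compiler tier inserts. HotSpot stores
// this data in its own per-method structures; all names here are hypothetical.
public class ProfiledMethodSketch {
    static long invocations = 0;     // how many times the method was entered
    static long nullBranch = 0;      // times the "s == null" branch was taken
    static long nonNullBranch = 0;   // times it fell through
    // Receiver-type histogram for the s.length() call site.
    static final Map<Class<?>, Long> receiverTypes = new HashMap<>();

    // Returns the length of s, or 0 for null, updating the profile as it goes.
    static int lengthOrZero(CharSequence s) {
        invocations++;                                      // profile: entry count
        if (s == null) {
            nullBranch++;                                   // profile: branch taken
            return 0;
        }
        nonNullBranch++;                                    // profile: branch not taken
        receiverTypes.merge(s.getClass(), 1L, Long::sum);   // profile: receiver type
        return s.length();                                  // the method's real work
    }

    public static void main(String[] args) {
        CharSequence[] inputs = {"abc", null, new StringBuilder("hello"), "x"};
        int total = 0;
        for (CharSequence s : inputs) {
            total += lengthOrZero(s);
        }
        System.out.println("total=" + total + " invocations=" + invocations
                + " nullBranch=" + nullBranch
                + " receiverTypes=" + receiverTypes.keySet().size());
    }
}
```

Running `main` prints `total=9 invocations=4 nullBranch=1 receiverTypes=2`: four useful calls, each carrying two or three extra counter updates.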

Another suspicion was that we had no code allowing a thread that was executing a hot loop in client-compiled code to OSR (on-stack replace) its way into server-compiled code. So I added that capability. It had no real impact. We'll need that capability at some point, but adding it didn't get me any closer to explaining the gap.
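On-stack replacement is easiest to picture with a toy. In the sketch below (hypothetical names and threshold; HotSpot does this at the frame level, migrating the running frame into a specially compiled OSR entry, not with an ordinary Java call), a loop starts in a slow "tier" that counts back-edges, and once the count crosses a threshold it hands its live state, the loop index and the accumulator, to a faster version that resumes in the middle of the loop. The result is the same either way; only where the remaining iterations run changes.

```java
// Toy model of on-stack replacement (OSR). The threshold and method names are
// hypothetical; this only mimics the idea of handing live loop state to a
// different version of the code partway through the loop.
public class OsrSketch {
    static final int BACKEDGE_THRESHOLD = 1_000; // hypothetical value
    static boolean didOsr = false;

    // "Slow tier": the loop as it runs before OSR, counting back-edges.
    static long sumTo(int n) {
        long acc = 0;
        int backedges = 0;
        for (int i = 0; i < n; i++) {
            acc += i;
            if (++backedges >= BACKEDGE_THRESHOLD) {
                didOsr = true;
                // Transfer the live locals and resume at the next iteration.
                return osrContinuation(i + 1, n, acc);
            }
        }
        return acc;
    }

    // "Fast tier": entered at the loop header with the slow tier's state.
    static long osrContinuation(int start, int n, long acc) {
        for (int i = start; i < n; i++) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) {
        long r = sumTo(100_000);
        System.out.println(r + " osr=" + didOsr);
    }
}
```

`main` prints `4999950000 osr=true`; a short call such as `sumTo(10)` never reaches the threshold and simply returns 45.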

The odd thing was that when I ran tests locally on my workstation I wasn't seeing the same kind of results that Alacrity showed. It finally dawned on me that the difference was probably class data sharing (CDS). The client vm supports class data sharing: data for the classes used during vm initialization is pregenerated into a shared archive when the vm is installed, and later client runs can simply load that data with a good bit less work. This helps startup time. The server vm, because it uses a different garbage collector, doesn't support CDS. And since I rarely install the vm the way a normal user would when I'm working on it, my client vm didn't have a shared archive to load.
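One way to see whether a particular run actually loaded a shared archive, which is exactly what the workstation builds described above were missing, is to ask the VM how it describes itself. The sketch assumes HotSpot's convention of including "sharing" in the `java.vm.info` system property (the same text `java -version` shows, e.g. "mixed mode, sharing") when CDS is active; other VMs and versions may word this differently, so treat it as a heuristic.

```java
// Heuristic check for class data sharing. Assumes HotSpot lists "sharing"
// as one of the comma-separated items in java.vm.info when a shared
// archive is in use; other VMs may report this differently or not at all.
public class CdsCheck {
    static boolean reportsSharing(String vmInfo) {
        if (vmInfo == null) {
            return false;
        }
        for (String part : vmInfo.split(",")) {
            if (part.trim().equals("sharing")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String info = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + info);
        System.out.println("class data sharing reported: " + reportsSharing(info));
    }
}
```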

So I modified the tiered vm so that when it runs as a client vm (which is how Alacrity tested it) it uses the same garbage collector as the normal client vm, and can therefore dump and load the shared archive. That was pretty much the missing piece of the performance puzzle. Between the lack of CDS in the tiered vm and the cost of collecting profile data in the client compiler's generated code, I could account for the performance gap I was seeing compared to the vanilla client vm.

So the next important things to attack are obvious:

    1: get the garbage collector the server vm uses to support CDS. That is not something I have any expertise in, so someone else will be doing that work. (My understanding is that it is doable; it just wasn't seen as that important for the server vm, where startup performance was not that key.)

    2: see what I can do about the cost of profiling in the code the client compiler generates. That will be a topic for another day...


Jan 08 2007, 10:06:09 AM EST

Comments:

Really glad to hear that you are plugging away on this! All sounds very interesting...

Start-up time *can* be important on a server. For example, if I have to restart a JSP server I want it up and running as fast as reasonably possible to minimise user-visible down-time, and then I want that code to be C2-optimised to bits by HotSpot!

Hence my previous comments to you about remembering, from one run to the next, which routines turned out to be hot at startup, so that on subsequent runs they can be C1-plus-instrumentation-compiled on their very first use to help start-up performance...

Rgds

Damon

Posted by Damon Hart-Davis on January 08, 2007 at 11:41 AM EST #
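The suggestion above can be sketched as a small bookkeeping class: count invocations during one run, write out the names of methods that crossed a threshold, and on the next run treat those methods as hot from their very first call. This is a hypothetical illustration of the idea in plain Java, not a HotSpot feature or API; the class name, the threshold, and the one-name-per-line file format are all assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: remember which methods got hot in one run and treat
// them as hot immediately in the next run, instead of re-warming counters.
public class HotMethodMemory {
    private final int threshold;
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> hot = new TreeSet<>();

    HotMethodMemory(int threshold, Set<String> previouslyHot) {
        this.threshold = threshold;
        this.hot.addAll(previouslyHot);
    }

    // Records one invocation; returns true if the method should be compiled now.
    boolean onInvoke(String method) {
        if (hot.contains(method)) {
            return true;                                 // known hot from a previous run
        }
        int c = counts.merge(method, 1, Integer::sum);
        if (c >= threshold) {
            hot.add(method);
            return true;
        }
        return false;
    }

    // Persists the hot set, one method name per line.
    void save(Path file) throws IOException {
        Files.write(file, hot);
    }

    static Set<String> load(Path file) throws IOException {
        return Files.exists(file) ? new TreeSet<>(Files.readAllLines(file)) : new TreeSet<>();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("hot-methods", ".txt");
        HotMethodMemory run1 = new HotMethodMemory(3, Set.of());
        for (int i = 0; i < 3; i++) {
            run1.onInvoke("Foo.bar");
        }
        run1.save(file);

        HotMethodMemory run2 = new HotMethodMemory(3, load(file));
        System.out.println("Foo.bar hot on first call of run 2: " + run2.onInvoke("Foo.bar"));
        Files.delete(file);
    }
}
```

In a real VM the "compile now" answer would trigger a C1-plus-instrumentation compile; here it is just a boolean.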

I don't doubt that startup time is important on a server. It is certainly one of the reasons we are doing tiered compilation. All I meant was that when it came time to allocate resources, getting the garbage collector to support CDS wasn't as important as other things at the time. It is certainly more important now.

Posted by fatcatair on January 08, 2007 at 04:06 PM EST #

An important thing to remember (or realize) is that if you're using Java on x86_64, there's no client compiler. So you're using the "server" compiler for very non-server tasks. This hits most of my applications quite hard (because even with the client compiler, start-up time is still Java's biggest performance weakness). Notice that pretty much every new computer being sold today is x86_64. It might be a while before the 64-bit builds of the various OSes become the default, but it's now a realistic possibility, and it's starting to happen. Many of the Linux users I know have switched or are switching. Even some Windows users have switched. I raised a bug late in JDK6, because I didn't realize that Sun wasn't working on this as a priority. I'd never realized that we 64-bit users would have to live without a client compiler so long. Now it's looking like the client compiler might die before we get one. Which is fine by me, as long as start-up time improves. I should have realized the CDS thing myself; I'd noticed the effect, but never explained it. Thanks for shedding some light on that!

Posted by Elliott Hughes on January 09, 2007 at 01:48 AM EST #

So I guess I have to ask: if you aren't running server-type tasks (i.e. large heaps), why would you run the 64bit vm on x86_64 instead of the 32bit client vm? The 32bit client and server vms both run on x86_64 (on Solaris and Linux anyway; I'm not much of a Windows user), and unless you really need large heaps you may well be better off. On SPARC this is always true; on x86_64 it isn't always the case, because the register set is larger on x86_64 than on x86_32, and that can help compensate for the increased memory traffic generated by 64bit pointers. That being said, we aren't forgetting the client compiler for 64bit. Just because I haven't talked about it doesn't mean it isn't in the plans. In the past the client compiler has run internally in 64bit mode on SPARC, and I expect it still does, although bit rot (since we don't build and test it) could make that untrue at the moment. The intention is to do the same on x86_64 so that a tiered 64bit system will have both compilers. The client compiler is not going away.

Posted by fatcatair on January 09, 2007 at 09:16 AM EST #

Why aren't I running the 32-bit JVM?

1. because the user gets to choose which JVM to install, not me, and because 64-bit sounds shiny and new, there are plenty of users who're installing 64-bit stuff whether they need the extra address space or not, and without measuring whether the extra registers (I'm assuming x86_64, because that's the world we live in) help more than the over-sized pointers hurt performance.

1a. why aren't I shipping a JRE as part of the app? Because that means much larger downloads, in the main. There are trade-offs both ways with including versus not including a JRE, but for now "too much bandwidth" and "enormous downloads put users off" will do.

2. JNI. Strictly, the fact that it's a real PITA to build multi-arch on Linux, and the JVM needs an appropriate-"width" shared library.

2a. why aren't I building a .so for i386 and one for amd64 and shipping them both? ...

3. ".deb"/".rpm". It's awkward, strictly incorrect (since you can't specify exactly what your i386 *and* amd64 package is, and have to lie that it's arch-independent, which annoys Linux/ppc and Linux/sun4 users), and against the packaging rules to stick multiple architectures' binaries in the same package.

4. You end up being forced to outlaw the running of a 32-bit JVM on a 64-bit OS, or break the packaging rules. You can't correctly specify your package's architecture, because when you say "amd64" you mean "...and your JVM damn well better be the 64-bit one too". So your amd64 package will install but it may or may not work, depending on whether the user has a 64-bit or 32-bit JVM, and that's why developers really don't want people going around recommending the use of 32-bit JVMs on 64-bit OSes. Though I've done so myself, for exactly the reasons you mention above. But I only recommend it to people like us, who know what we're doing and would recognize and understand what's gone wrong when our JNI-using Java applications fail.

This idea that 64-bit is "high-end" or "server" is so 1980s. Every computer my parents own is now dual-core and 64-bit. Every computer they could realistically buy this year is at least dual-core and 64-bit. It looks like 64-bit Windows won't be the default this cycle, but Mac OS 10.5 might be (Apple have been making a lot of noise about it being fully 64-bit), and Linux cycles are much shorter still, and there's no cost to trying it out, which is probably the reason why I'm seeing the most 64-bit users there. (On Mac OS the JNI thing isn't a problem because Apple's development tools are all geared towards cross-compilation anyway, and you can generate a fat .so that contains all architectures.)

How does the server compiler with CDS compare to the client compiler in terms of start-up time?

Posted by Elliott Hughes on January 09, 2007 at 12:15 PM EST #
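The runtime half of points 2 and 2a, loading a native library whose width matches the JVM, can be sketched as below, assuming a 32-bit and a 64-bit build ship side by side. It relies on the HotSpot-specific `sun.arch.data.model` property ("32" or "64") and falls back to a rough guess from `os.arch`; the base name `myapp` and the `-32`/`-64` suffixes are hypothetical. It does nothing about the packaging objections in points 3 and 4.

```java
// Sketch: choose a JNI library matching the running JVM's pointer width.
// Assumes the HotSpot-specific property sun.arch.data.model; the fallback
// ("64" appears in os.arch) is only a rough heuristic. The library base
// name "myapp" and the -32/-64 suffix scheme are hypothetical.
public class NativeLibChooser {
    static String dataModel(String sunDataModel, String osArch) {
        if ("32".equals(sunDataModel) || "64".equals(sunDataModel)) {
            return sunDataModel;
        }
        return (osArch != null && osArch.contains("64")) ? "64" : "32";
    }

    static String libraryName(String base, String model) {
        return base + "-" + model;
    }

    public static void main(String[] args) {
        String model = dataModel(System.getProperty("sun.arch.data.model"),
                                 System.getProperty("os.arch"));
        String lib = libraryName("myapp", model);
        // mapLibraryName shows the platform file name, e.g. libmyapp-64.so on Linux.
        System.out.println("would load " + System.mapLibraryName(lib));
        // System.loadLibrary(lib);  // left commented out: the library is hypothetical
    }
}
```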

Ok, I get why you don't use the 32bit vm. I assumed you could in fact choose and weren't bound up in packaging. As for the idea that 64bit is "high-end", I think that is still a misunderstanding of what I meant. 64bit apps are pretty much always slower than 32bit apps, so using a 64bit app just because it is new and shiny means giving up performance. People truly concerned about high-end performance would run whatever gets the job done quickest, not whatever is new and shiny. Server with CDS is pretty much the same speed as server, so it isn't the answer you're hoping for. You really want tiered (and I really want it out there). The problem is that server compiles are so expensive that we execute a method 10k times in the interpreter before compiling it, whereas client compiles at 1k invocations. Even if we knew up front what to compile, doing that with the server compiler would make you unhappy: all those compiles would add so much load that startup would look pretty awful.

Posted by fatcatair on January 09, 2007 at 12:56 PM EST #
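The threshold argument above can be written down as a toy policy. The 1k and 10k invocation counts are the figures quoted in the reply; the tier names and the simple "compare the count against a fixed threshold" rule are simplifying assumptions, not HotSpot's actual policy code. It shows why a server-only vm leaves a method in the interpreter ten times longer than a tiered vm that compiles it with the client compiler first.

```java
// Toy model of the compile thresholds quoted above (client ~1k, server ~10k
// invocations). The tier names and the fixed-threshold rule are assumptions.
public class TieredPolicySketch {
    enum Tier { INTERPRETED, CLIENT_PROFILED, SERVER }

    static final int CLIENT_THRESHOLD = 1_000;
    static final int SERVER_THRESHOLD = 10_000;

    // Tiered vm: client compile (with profiling) first, server compile later.
    static Tier tieredTierFor(int invocations) {
        if (invocations >= SERVER_THRESHOLD) return Tier.SERVER;
        if (invocations >= CLIENT_THRESHOLD) return Tier.CLIENT_PROFILED;
        return Tier.INTERPRETED;
    }

    // Server-only vm: interpret until the expensive server compile is worth it.
    static Tier serverOnlyTierFor(int invocations) {
        return invocations >= SERVER_THRESHOLD ? Tier.SERVER : Tier.INTERPRETED;
    }

    public static void main(String[] args) {
        int[] counts = {1, 999, 1_000, 5_000, 9_999, 10_000};
        for (int n : counts) {
            System.out.println(n + ": tiered=" + tieredTierFor(n)
                    + " serverOnly=" + serverOnlyTierFor(n));
        }
    }
}
```

Between 1,000 and 9,999 invocations the tiered column already shows compiled (and profiled) code, while the server-only column is still interpreting.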
