Brian Doherty's Weblog

Thursday December 22, 2005

Stacking the deck

I'm not one to typically take a competitor's bait, but I'm really getting tired of BEA making baseless claims on JRockit being the 'world's fastest JVM', when there's plenty of evidence to the contrary.

BEA's latest tactic is the JRockit versus Sun Java™ runtime challenge. According to the anonymous blogger (how droll), there's only one rule:

This rule is ambiguous. What do they consider a "line"? If I take it literally, it seems that I could submit any program, including a 1 million line program made up of many classes and methods, as long as main() is fewer than 20 lines. I suppose I could submit some industry or de facto standard benchmark subtly modified, if necessary, to have a main() of fewer than 20 lines of code. I could just rename main() to oldmain() and introduce a new main() that calls oldmain(), and now I have a main() that's easily under 20 lines of code. I suppose I could also submit a program with a main() composed of thousands of Java statements all concatenated together on a single line. Somehow, though, I don't think that's what they are after. I think they are looking for little micro benchmarks. Otherwise, why would they even mention a line count limit?
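To make the loophole concrete, here is a minimal sketch of the rename trick. The class name and the body of oldMain() are hypothetical stand-ins invented for illustration; in practice oldMain() would be the entry point of whatever large benchmark you wanted to smuggle in.

```java
// Hypothetical illustration of the "wrapper main" loophole.
// WrappedBenchmark and oldMain are invented names, not part of any real benchmark.
public class WrappedBenchmark {

    // Stands in for the original main() of an arbitrarily large program.
    // Here it just sums the lengths of its arguments so it has an observable result.
    static int oldMain(String[] args) {
        int checksum = 0;
        for (String arg : args) {
            checksum += arg.length();
        }
        return checksum;
    }

    // The new main() is three lines long no matter how much code oldMain() pulls in.
    public static void main(String[] args) {
        System.out.println(oldMain(args));
    }
}
```

A literal reading of a line limit on main() says nothing about the size of the code reachable from it, which is why the rule only makes sense if the intent is micro benchmarks.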

Why do you think they want 20 line programs? Because they want to stack the deck in their favor. How's that, you say? Well, there are some significant differences in the JIT designs between HotSpot™ and JRockit. JRockit compiles methods as soon as it sees them, whereas HotSpot runs methods in the interpreter for a while, collects profiling data on them, and then uses that profiling data to compile the methods based on dynamic runtime information. For simple little benchmarks, HotSpot will typically run interpreted for a while, and quite possibly for the entire run. JRockit, however, will compile the method out of the gate and run the entire method in compiled code. Who do you think is going to win with most of these 20 line programs?
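One way to see this effect for yourself is to time the same small method in consecutive batches rather than once. The sketch below is an illustration only: the workload, class name, and batch sizes are arbitrary assumptions, and the timings depend entirely on the JVM, its options, and the machine. On a JVM that interprets first and compiles later, the early batches would be expected to take noticeably longer than the later ones; on HotSpot, the -XX:+PrintCompilation option logs methods as they get compiled.

```java
// Hypothetical sketch: names, workload, and batch sizes are assumptions for illustration.
public class BatchTimings {

    // A small arithmetic loop standing in for a "20 line" benchmark body.
    static long work(int n) {
        long h = 17;
        for (int i = 0; i < n; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Times `batches` consecutive batches of `callsPerBatch` calls to work(n)
    // and returns the elapsed nanoseconds for each batch.
    static long[] timeBatches(int batches, int callsPerBatch, int n) {
        long[] nanos = new long[batches];
        long checksum = 0;
        for (int b = 0; b < batches; b++) {
            long start = System.nanoTime();
            for (int c = 0; c < callsPerBatch; c++) {
                checksum += work(n);
            }
            nanos[b] = System.nanoTime() - start;
        }
        // Print the checksum so the calls cannot be discarded as dead code.
        System.out.println("checksum: " + checksum);
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeBatches(10, 2_000, 1_000);
        for (int b = 0; b < nanos.length; b++) {
            System.out.println("batch " + b + ": " + nanos[b] + " ns");
        }
    }
}
```

A benchmark that only runs long enough to cover the first batch or two is measuring startup policy, not steady-state performance.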

Why do these two platforms have different JIT policies? In some respects, that's part of the power of the Java™ technology platform. The Java Virtual Machine specification and the API specification provide for different but compatible implementations of the Java platform. At Sun, we designed the HotSpot JVM to meet the needs of both client and server Java applications, and our JIT policies reflect that design decision. By putting off compilation until we're sure there's a win, we can provide good startup time and memory utilization for client applications while still providing industry leading performance for all kinds of applications (not just one outdated benchmark). This design leverages well known trade-offs in computer science. Remember the 80:20 rule? 80% of your performance comes from 20% of your code. How about Donald Knuth's 'premature optimization is the root of all evil'? By not wasting time compiling code that your application rarely uses, we reduce startup time and reduce memory utilization. These are not just client issues. Reduced startup time also means reduced recovery times, important for high availability applications, and reduced memory utilization leaves room for larger heaps, more threads, or other memory needs.

We talk about micro benchmarks every year at JavaOne. It would be nice if we didn't have to, but we regularly see poorly written micro benchmarks that don't measure what the author intended them to measure. In this challenge, the micro benchmarks are likely to be comparing the results from JRockit JIT compiled code to the results of HotSpot interpreted code. Is that what your real applications are going to see? Does that really sound like an 'Apples* to Apples' comparison?

Micro benchmarks can sometimes be useful, but you have to be very careful when writing the code and when interpreting the results. What you might think is a clear indication of one system being better than another may just be artifacts of default configurations, subtle timing differences or implementation differences that skew the results in ways only observable in the context of the micro benchmark. Changes in command line options (tuning) or subtle changes to the micro benchmark code to properly warm up the JVM or preserve the computation of some loop can produce dramatically different results. Ignoring such matters is likely to cause you to make decisions based on results that don't model the realistic conditions seen by real applications.
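As a rough illustration of the two fixes mentioned above (warming up the JVM and preserving the computation of a loop), here is a minimal sketch. Everything in it is an assumption made for illustration: the workload, the names, and the iteration counts are arbitrary, and the right amount of warm-up depends on the JVM and its settings. It is not a substitute for a proper benchmark harness.

```java
// Hypothetical sketch: workload, names, and iteration counts are assumptions.
public class WarmupSketch {

    // Accumulates results so the timed computation stays observable
    // and is harder for a compiler to eliminate as dead code.
    static long sink;

    // The workload under test: sum of i*i for i in [0, n).
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Calls the workload `iterations` times and returns the elapsed nanoseconds.
    static long timeRun(int iterations, int n) {
        long acc = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        sink += acc; // preserve the result of the loop
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 10_000;
        // Warm-up pass: same code path, result ignored for timing purposes.
        timeRun(20_000, n);
        // Measured pass: taken only after the warm-up has run.
        long nanos = timeRun(20_000, n);
        System.out.println("measured: " + nanos + " ns (sink=" + sink + ")");
    }
}
```

Dropping either the warm-up pass or the use of sink can change the reported number substantially, which is exactly the kind of artifact that makes naive micro benchmark results hard to trust.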

If you really want to see the performance difference between HotSpot and JRockit, run your own benchmarks on your own real applications. Make sure that you run them long enough that both environments are sufficiently warmed up, as both perform various dynamic optimizations. I'm confident that you'll find that HotSpot has very competitive performance compared to any JVM in the market. But don't forget to put both systems through your suite of reliability tests as well, as performance without reliability is worthless. If you find a case where JRockit outperforms HotSpot, then please file a bug against HotSpot, as we consider any realistic performance difference to be a bug. If you don't want to file a bug, then please post something about your benchmark in the discussions section of the Java Performance Community web site.

* HotSpot does run on Apple thanks to the excellent port done by the engineers at Apple. It also runs on HP's PA-RISC and Itanium based HP-UX systems thanks to the work by HP's engineers. Not to mention all the Sun supported platforms, Sun's real time Java implementation, also based on HotSpot, and countless other ports to various embedded systems.

Posted by briand (Dec 22 2005, 10:34:36 AM CST)

Trackback URL: http://blogs.sun.com/briand/entry/stacking_the_deck
Comments:

Brian, I've posted my comments in my blog at: http://dev2dev.bea.com/blog/sla/archive/2005/12/misguided_chall.html

Posted by Staffan Larsen on December 23, 2005 at 03:31 AM CST #

Brian says the challenge will deliver nothing of value in terms of a performance comparison. Probably not. In the light-hearted spirit it was offered, maybe it was never intended to. Note that it was not written by an anonymous user - clicking through to the blog's home page provides clear authorship (indicating that it's written by our enthusiastic CodeShare community manager, not by a BEA employee).

Posted by Jon Mountjoy on December 23, 2005 at 04:51 AM CST #

Indeed, I did intend that the rules just be 20 lines because that is small enough to see a small bit of code that would let you see a bit of performance difference. It is in the same spirit as a writing contest of 1000 words or less. The intent is to eke out as much as you can in 20 lines. I also wanted to be able to 'see' these improvements in performance, and thus 20 lines is a good round amount to code. If you will also note, I wanted to have people just post to the blog comments and any more would be silly. In summary, no hype, no marketing, just a challenge to get people thinking and participating in the BEA community. The silly thing here is that Brian could have emailed me and I could have clarified all this :o)

Posted by Daniel Brookshier on December 23, 2005 at 02:42 PM CST #

The rules have been changed in the original post to reflect the complaint on code size restrictions. I did however make it clear that I want to see code. This is not a game of blind man's bluff.

Posted by Daniel Brookshier on December 23, 2005 at 03:05 PM CST #

Daniel says in the above comment, "I also wanted to be able to 'see' these improvements in performance, and thus 20 lines is a good round amount to code." Later, it was revised to "no holds barred". It is not about code size. It is clear that BEA's JRockit and Sun's HotSpot work differently, and it will be very difficult to draw statistically sound conclusions unless there are more rules. It would be good to hear from real customers who have tried both JVMs, with more details. The point to note here is that making a benchmark an apples-to-apples comparison is not an easy job. For example, SPEC spends a lot of time making sure that benchmarking is done in a fair manner, and it defines a lot of rules for each benchmark. Without those rules, this can become subjective.

Posted by Madhu Konda on January 02, 2006 at 06:23 PM CST #
