Monday December 11, 2006 Java 6, powered by the recently open-sourced HotSpot JVM, is impressive. Here's a summary:
Out-of-box performance is the right goal for JVM development, and future Java benchmarks should reflect that goal. Delivering optimizations quickly to allow high benchmark results is fun, but it doesn't help customers unless those optimizations become part of the default runtime behavior of the JVM.
Just to be clear and to reiterate once again, the intention of the data charts below is to highlight the importance of customer experience and out-of-box performance to Sun Java Engineering. These are not meant to be high performance benchmark results. Hand tuning can change the results significantly.
The following is an out-of-box performance comparison on a Dell 2950 and a Sun Fire X4200. The Dell system is configured with 2 dual-core Intel 5160 processors (2 CPUs, 4 cores @ 3.0 GHz) and 16GB of RAM. The Sun system is configured with 2 dual-core Opteron 280 processors (2 CPUs, 4 cores @ 2.4 GHz) and 8GB of RAM. The Operating System installed on both systems is Red Hat EL 4.0 AS Update 4. The kernel version is unmodified from the base install, which is 2.6.9-42.ELsmp. The only variable in this configuration is the JVM.
The JVM distributions and versions tested were the latest versions publicly available at the time of testing. The BEA JRockit JVMs tested were downloaded from their main GA website and their 64-bit performance update website. The IBM JVM is the latest available on the IBM developer website.
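Because the JVM is the only variable and no tuning flags are applied, it can be useful to record what each JVM chose on its own before a run. The following is a minimal sketch of that idea, not the harness used for these results. The class name `JvmDefaults` is hypothetical; it relies only on standard system properties and the `java.lang.management` API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmDefaults {

    /** Names of the garbage collectors the running JVM selected. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** One-line-per-item summary of the defaults the JVM picked for this host. */
    public static String describe() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder();
        sb.append("VM: ").append(System.getProperty("java.vm.name"))
          .append(' ').append(System.getProperty("java.vm.version")).append('\n');
        sb.append("Arch: ").append(System.getProperty("os.arch")).append('\n');
        sb.append("Processors: ").append(rt.availableProcessors()).append('\n');
        sb.append("Max heap (MB): ").append(rt.maxMemory() / (1024L * 1024L)).append('\n');
        sb.append("Collectors: ").append(collectorNames()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running this once per JVM under test, with no command-line options, gives a record of the out-of-box configuration to keep alongside the scores.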
The first set of charts reflects performance on Intel's latest Core 2 micro-architecture. The results below, particularly the SPECjbb2005 results, strongly highlight a core difference in philosophy between Sun HotSpot and its competitors. The highly tuned benchmark submissions of our competitors, BEA JRockit in particular, show impressive numbers on the new chip. Our competitors have chosen to quickly deliver platform-specific performance optimizations for the purpose of benchmark submissions, but they require the use of several tuning parameters to achieve that level of performance. Unfortunately, this is quite misleading for customers. Yes, the benchmark numbers are good, but can a customer jump right in and use these features? If they were thoroughly tested and ready for prime time, shouldn't they be enabled by default on the platforms that require them? We think so, and we have chosen differently. That's the difference with HotSpot.
The first chart is SPECjbb2005. SPECjbb2005 is SPEC's benchmark for evaluating the performance of server-side Java. It evaluates server-side Java by emulating a three-tier client/server system (with emphasis on the middle tier). It extensively stresses Java collections, BigDecimal, and XML processing. The cool thing about SPECjbb2005 is that optimizations targeted for it also show performance gains in other competitive benchmarks, such as SPECjAppServer2004, and in a broad range of customer workloads. The benchmark results below are run in single instance mode.
SciMark 2.0 is a Java benchmark for scientific and numerical computing, and it is a benchmark where Sun's JVMs have continued to shine. It's a decent test of generated code, particularly for tight computational loops. However, it is particularly sensitive to alignment issues and can show some variance from run to run, mostly in a bimodal fashion. The test has three modes of execution: small, large, and default. These refer to the size of the data under test; more details can be found at the SciMark website. All in all, it's a good set of microbenchmarks.
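To give a feel for the kind of tight computational loop involved, here is a small sketch of a successive over-relaxation (SOR) sweep, one of the kernel types SciMark 2.0 includes. This is not SciMark's source code. The class name `SorSketch`, the flop count of 6 per interior point, and the grid size are my own assumptions for illustration.

```java
public class SorSketch {

    /** Runs the given number of in-place SOR sweeps over the interior of the grid. */
    public static void relax(double omega, double[][] g, int sweeps) {
        int rows = g.length;
        int cols = g[0].length;
        double omegaOverFour = omega * 0.25;
        double oneMinusOmega = 1.0 - omega;
        for (int s = 0; s < sweeps; s++) {
            for (int i = 1; i < rows - 1; i++) {
                double[] up = g[i - 1];
                double[] row = g[i];
                double[] down = g[i + 1];
                // The inner loop is the hot path the JIT compiler has to optimize.
                for (int j = 1; j < cols - 1; j++) {
                    row[j] = omegaOverFour * (up[j] + down[j] + row[j - 1] + row[j + 1])
                           + oneMinusOmega * row[j];
                }
            }
        }
    }

    /** Assumed cost model: 3 adds + 2 multiplies + 1 add per interior point. */
    public static double flops(int rows, int cols, int sweeps) {
        return 6.0 * (rows - 2) * (cols - 2) * sweeps;
    }

    /** Converts a flop count and elapsed seconds into MFlops. */
    public static double mflops(double flops, double seconds) {
        return flops / (seconds * 1.0e6);
    }

    public static void main(String[] args) {
        int n = 1000;      // hypothetical grid size, larger than typical caches
        int sweeps = 50;
        double[][] grid = new double[n][n];
        for (double[] row : grid) {
            java.util.Arrays.fill(row, 1.0);
        }
        long start = System.nanoTime();
        relax(1.25, grid, sweeps);
        double seconds = (System.nanoTime() - start) / 1.0e9;
        System.out.println("MFlops: " + mflops(flops(n, n, sweeps), seconds));
    }
}
```

A single timed run like this includes JIT warm-up and is subject to the run-to-run variance described above, so treat any number it prints as a rough indication only.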
Note that on the Intel Core system the Sun JVMs are faster in 32-bit mode than in 64-bit mode (the disclosure table below shows IBM as an exception and JRockit roughly even). This is quite different from the AMD Opteron system further down the page, where the 64-bit Sun JVMs are significantly faster on this test. Since the SciMark 2.0 test is using the large dataset, it's likely that the added pressure of 64-bit pointers on the memory subsystem increases bandwidth demand enough to impede performance. However, this is just a hypothesis.
Volano is a popular Java chat server. The benchmark is quick and involves both a client and a server instance. From a JVM perspective the workload is heavily dominated by classic Java socket I/O, which is a bit long in the tooth; an NIO version would be quite interesting. That being said, some customers have found this benchmark quite useful, so we continue to test it. It is by no means our favorite benchmark, despite what my friends at BEA have suggested. Running Volano, the performance gaps are not as large, most likely because this benchmark has very little garbage collection overhead. BEA JRockit is showing good performance here with a result that's 19% over the baseline. Sun Java SE 6 shines as well with a result that's nearly 22% over baseline.
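For readers unfamiliar with the classic (blocking) socket style mentioned above, here is a minimal loopback echo sketch using `java.net.ServerSocket` and `java.net.Socket`. It is not VolanoMark code and makes no claim about how Volano is implemented. The class name `LoopbackEcho` and the single-connection design are assumptions chosen to keep the example short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LoopbackEcho {

    // Server side: accept one connection and echo each line back, blocking on every read.
    private static void serve(ServerSocket server) {
        try (Socket peer = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(peer.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(peer.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends each single-line message over loopback and returns the echoed replies. */
    public static String[] roundTrip(String[] messages) {
        String[] replies = new String[messages.length];
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> serve(server));
            echo.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                for (int i = 0; i < messages.length; i++) {
                    out.println(messages[i]);   // auto-flushes
                    replies[i] = in.readLine(); // blocks until the echo arrives
                }
            }
            echo.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return replies;
    }

    public static void main(String[] args) {
        for (String reply : roundTrip(new String[] {"hello", "world"})) {
            System.out.println(reply);
        }
    }
}
```

In this style every connection ties up a thread that blocks in `readLine()`, which is why a workload built this way mostly exercises the JVM's threading and socket paths rather than the garbage collector.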
The second set of charts was run on a Sun Fire X4200 with AMD Opteron 280 CPUs. This is the identical system used in my previous blog articles on this subject, this time with updated JVM releases from Sun and IBM. I'm sure someone will be curious why I didn't compare the Intel and AMD based systems directly. The primary reason is simple: I'm writing about JVM performance, not CPU performance. That being said, I didn't have the latest AMD CPUs readily available. In short, Intel is faster when running some of these benchmarks, while AMD is faster on others. In general, the memory subsystem differences between these platforms are evident when comparing the performance of Java benchmarks. Sun Java 6 is showing impressive results running SPECjbb2005 with a result 30% over baseline and ~15% faster than J2SE 5.0_10.
SciMark 2.0 is impressive on AMD Opteron as well. The large dataset is an interesting workload, as its effect on cache can highlight memory subsystem limitations. If your application crunches a large dataset, take a look at the large dataset of SciMark when comparing JVMs and system architectures.
Last but not least is Volano on AMD Opteron (and again, no, this is not our favorite benchmark!). Java 6 shows a strong improvement, with results more than 20% greater than 5.0_10, pulling ahead of 64-bit BEA JRockit. Nice.
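The percentages quoted in this post are simple ratios of scores. As a worked example, the sketch below recomputes the Volano gain of Java SE 6 over J2SE 5.0_10 from the 32-bit scores in the second VolanoMark table of the disclosure (80,592 and 66,955), which I am assuming is the Opteron run. The class and method names are hypothetical.

```java
public class RelativeScore {

    /** Percentage by which score exceeds reference (negative if it is lower). */
    public static double percentOver(double score, double reference) {
        if (reference <= 0.0) {
            throw new IllegalArgumentException("reference score must be positive");
        }
        return (score / reference - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double javaSe6 = 80592.0;   // Sun Java SE 6, 32-bit VolanoMark score
        double j2se5 = 66955.0;     // Sun J2SE 5.0_10, 32-bit VolanoMark score
        // Prints a gain of roughly 20.4%, consistent with "more than 20%".
        System.out.printf("Java SE 6 over J2SE 5.0_10: %.1f%%%n", percentOver(javaSe6, j2se5));
    }
}
```

The same helper can be applied to any pair of rows in the disclosure tables, as long as both scores come from the same system and benchmark.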
SPECjbb2005 Result Disclosure
Single Instance Run. SPECjbb2005 bops = SPECjbb2005 bops/JVM
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0 GHz), 16GB of RAM.
| JVM Version | 32-bit SPECjbb2005 bops | 64-bit SPECjbb2005 bops |
|---|---|---|
| IBM 5.0 SR3 | 43,575 | 32,617 |
| BEA JRockit 5.0_06 R26.4 | 26,071 | 26,092 |
| Sun J2SE 5.0_10 | 49,308 | 46,080 |
| Sun Java SE 6 | 62,246 | 56,488 |
System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4 GHz), 8GB of RAM.
| JVM Version | 32-bit SPECjbb2005 bops | 64-bit SPECjbb2005 bops |
|---|---|---|
| IBM 5.0 SR3 | 30,500 | 23,998 |
| BEA JRockit 5.0_06 R26.4 | 19,309 | 19,185 |
| Sun J2SE 5.0_10 | 35,297 | 31,096 |
| Sun Java SE 6 | 39,973 | 34,975 |
SciMark 2.0 Result Disclosure
Large Dataset. Score is in SciMark MFlops
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0 GHz), 16GB of RAM.
| JVM Version | 32-bit Score | 64-bit Score |
|---|---|---|
| IBM 5.0 SR3 | 171.49 | 207.21 |
| BEA JRockit 5.0_06 R26.4 | 278.15 | 276.37 |
| Sun J2SE 5.0_10 | 321.85 | 292.89 |
| Sun Java SE 6 | 357.72 | 336.58 |
System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4 GHz), 8GB of RAM.
| JVM Version | 32-bit Score | 64-bit Score |
|---|---|---|
| IBM 5.0 SR3 | 175.02 | 180.46 |
| BEA JRockit 5.0_06 R26.4 | 230.85 | 231.53 |
| Sun J2SE 5.0_10 | 300.23 | 332.23 |
| Sun Java SE 6 | 320.42 | 343.74 |
VolanoMark 2.5.0.9 Result Disclosure
Loopback performance test
System: Dell 2950, 2 X Intel 5160 (2 CPUs, 4 cores @ 3.0 GHz), 16GB of RAM.
| JVM Version | 32-bit Score | 64-bit Score |
|---|---|---|
| IBM 5.0 SR3 | 121,747 | 111,826 |
| BEA JRockit 5.0_06 R26.4 | 128,185 | 146,012 |
| Sun J2SE 5.0_10 | 120,048 | 116,959 |
| Sun Java SE 6 | 149,198 | 142,602 |
System: Sun Fire X4200, 2 X AMD Opteron 280 (2 CPUs, 4 cores @ 2.4 GHz), 8GB of RAM.
| JVM Version | 32-bit Score | 64-bit Score |
|---|---|---|
| IBM 5.0 SR3 | 64,218 | 60,802 |
| BEA JRockit 5.0_06 R26.4 | 73,627 | 76,675 |
| Sun J2SE 5.0_10 | 66,955 | 64,316 |
| Sun Java SE 6 | 80,592 | 75,156 |
Nice stuff!
Presumably the deployer will still have to explicitly specify -client/-server for now if their host's parameters don't match the "ergonomics" automatic choice for that flag, e.g. a powerful client machine that would force -server by default for an interactive app.
Also, presumably it is still necessary to choose (say) CMS GC explicitly if running a highly-interactive app, i.e. where absolute performance is trumped by the need to keep pauses short?
Rgds
Damon
Posted by Damon Hart-Davis on December 11, 2006 at 09:57 AM EST #
Posted by dagastine on December 11, 2006 at 11:19 AM EST #
Posted by Michael Slattery on December 13, 2006 at 12:01 PM EST #