I've been responsible for several online systems at Sun over the years, systems that are meant to have 99.9%+ uptime. For the last several years these systems have been primarily Java based. As I have trudged through tuning problems with these systems, I have built up a small list of JVM options that have been very useful to me, and here they are.
-Xsqnopause - When you send the JVM a SIGQUIT, it normally suspends execution and presents a menu of several debugging options on the tty. This can be a bad thing while debugging an online system. For example, a web server like Tomcat would just pause and leave all the users hanging, waiting for a response. This option tells the JVM to simply output a thread dump and then continue executing. The thread dump can then be analysed later. When searching for deadlocks, it is often useful to take another thread dump at a later time and compare the two.
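The compare-two-dumps idea can also be sketched in code. This is a rough illustration only, assuming a JVM recent enough to have Thread.getAllStackTraces() (Java 5 or later, with Java 7 syntax); the class and method names (ThreadDumpDiff, snapshot, unchanged) are made up for the example and the two second delay is arbitrary. A thread whose stack is identical in both snapshots is not necessarily stuck, it is just a candidate worth a closer look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two in-process "thread dumps".
class ThreadDumpDiff {

    /** Captures the current stack trace of every live thread, keyed by thread id. */
    public static Map<Long, StackTraceElement[]> snapshot() {
        Map<Long, StackTraceElement[]> result = new HashMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            result.put(e.getKey().getId(), e.getValue());
        }
        return result;
    }

    /** Returns ids of threads whose non-empty stack is identical in both snapshots. */
    public static List<Long> unchanged(Map<Long, StackTraceElement[]> before,
                                       Map<Long, StackTraceElement[]> after) {
        List<Long> candidates = new ArrayList<>();
        for (Map.Entry<Long, StackTraceElement[]> e : before.entrySet()) {
            StackTraceElement[] later = after.get(e.getKey());
            if (later != null && later.length > 0 && Arrays.equals(e.getValue(), later)) {
                candidates.add(e.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, StackTraceElement[]> first = snapshot();
        Thread.sleep(2000); // arbitrary gap between the two dumps
        Map<Long, StackTraceElement[]> second = snapshot();
        System.out.println("Threads with identical stacks: " + unchanged(first, second));
    }
}
```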
-XX:+JavaMonitorsInStackTrace - One week, while debugging a particularly nasty problem where the JVM would seem to stop doing real work, I found that the normal thread dump output was insufficient to determine what was going on. After hours of scouring the net, I found this option, which adds monitor information to thread dumps. Monitors are the mechanism that provides synchronization, so on a hunch that it might be a deadlock, I decided to try it. After producing a few thread dumps with this option, it became clear that it was not a deadlock issue but rather a starvation condition. Many threads were waiting for their turn, and spinning until they got it. More on this later.
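Newer JVMs expose similar monitor information programmatically. The sketch below is not what the flag does internally; it assumes Java 6 or later, where java.lang.management.ThreadMXBean offers dumpAllThreads, and the class name MonitorReport is invented for the example. It lists every thread that is blocked waiting to enter a monitor, along with the thread that currently holds it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports threads blocked on a monitor.
class MonitorReport {

    /** Describes every thread currently BLOCKED waiting to enter a monitor. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Only ask for lock details the running JVM says it can provide.
        ThreadInfo[] infos = mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(),
                                               mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

Many threads queued on the same lock with a constantly changing owner would point toward contention or starvation rather than a deadlock.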
-XX:+PrintGCDetails and -verbose:gc - These are nearly the same thing, although PrintGCDetails is only available on 1.4.0 and newer JVMs. They both provide helpful displays of how garbage collection is occurring, and help you tune garbage collection appropriately for your application with the millions of other garbage collection options. Essentially they show how often GCs occur and how long they take. In older JVMs, the JVM would have to pause all running threads during a GC to do its thing, so making GCs happen less often and take less time is essential in an online system. A lot of work has been done on GCs, and my understanding is that the garbage collectors in the newer JVMs can do much more work in parallel.
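The same two numbers, how often and how long, can also be read from inside a running application. This is a small sketch assuming Java 5 or later, where java.lang.management.GarbageCollectorMXBean is available; the class name GcStats is made up, and collector names in the output vary by JVM and by the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: summarizes GC frequency and accumulated GC time.
class GcStats {

    /** Total collections across all collectors; undefined (-1) counts are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated GC time in milliseconds; undefined (-1) values are skipped. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total: " + totalCollections() + " collections, "
                + totalCollectionMillis() + " ms");
    }
}
```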
-XX:MaxNewSize - Out of those millions of other garbage collection options, this is one I have found particularly useful. It caps the size of the young, or new, generation, and I set it much higher than the default. Doing this is particularly useful with applications that create large amounts of temporary or short-lived objects. If the new generation isn't large enough, some of these objects are prematurely promoted to the old generation, which eventually causes full GCs to occur more frequently. Nip those objects in the bud if you can.
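To check what the sizing options actually produced, the heap pools can be listed at runtime. This is only a sketch, assuming Java 5 or later for java.lang.management.MemoryPoolMXBean; the class name HeapPools is invented here, and pool names (an eden or survivor pool, for instance) depend on the JVM and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists heap memory pools with current use and maximum size.
class HeapPools {

    /** Returns one line per heap pool: name, used KB, and max KB (or "undefined"). */
    public static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as code caches
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            String maxText = (max < 0) ? "undefined" : (max / 1024) + " KB";
            lines.add(pool.getName() + ": used " + (usage.getUsed() / 1024)
                    + " KB, max " + maxText);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Running it once with and once without a -XX:MaxNewSize setting (for example 256m, a value chosen purely for illustration) should show whether the young generation pools picked up the new limit.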
-XX:+UseLWPSynchronization - The starvation issue I mentioned earlier ended up being a problem with the threading model the JVM was using. I encountered it in an application running on Solaris 8, where the JVM defaulted to a many-LWPs-to-many-threads model. The problem was that the number of LWPs was only a fraction of the number of actual threads, and in order for a thread to run, it needed to be on an LWP. The JVM started spending all its time trying to schedule threads on LWPs. With the UseLWPSynchronization flag, the JVM changed the number of LWPs it created to match the number of threads, which simplified scheduling. The JVM could essentially just let the kernel schedule LWPs to run on CPUs, without the intermediate step of scheduling threads on LWPs.