Nandini's Weblog

Tuesday Feb 10, 2009

Performance Advisor : VM Alerts

GlassFish Enterprise Server 2.1 was released today as part of the GlassFish Portfolio.

I would like to add a few details about one particular feature: Performance Advisor Run-time Alerts.

The main goal of this feature was to introduce a configurable alerting mechanism for monitoring key performance indicators. Users can configure their own thresholds based on the behavior of their applications and set up email alerts for when these thresholds are crossed.

An example would be setting up thresholds to monitor CPU usage. When CPU usage keeps growing over a prolonged period, performance is adversely affected. A user can set a threshold to be notified when the CPU usage trend shows that usage has stayed high and has crossed the user's tolerance boundary.

Alerts are helpful during the benchmarking phase of performance tuning as well as in production environments, where, in addition to notifications, audit trails can be recovered from the instance's server.log files.

This release enables Runtime Alerts for memory and CPU resources, using five alerts in all. Three of them monitor "trends" in a performance metric. Let's first look at the two alerts that do not analyze trends. For these, the metric in question is sampled and immediately compared with the threshold.

Physical Memory Alert
This alert monitors the available free physical memory. When free memory runs low, swapping activity can rise to levels that adversely affect the performance of the GlassFish server running on that system.

GC Pause Time Alert
This alert monitors how long the last GC pause took. A long GC pause indicates that the garbage collector on the instance is tuned to collect less often and to wait until memory fills up, at which point GC runs in one fell swoop. Such tuning, while fine for certain kinds of apps, is a poor fit for apps that handle user requests. The downtime during a GC run could cause request failures, leading to an unacceptable user experience. In some cases the user has tuned the GC with a -XX:MaxGCPauseMillis goal and would like to know whether the latest GC run times are approaching that goal.
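To make the idea concrete, here is a minimal sketch of a pause check built on the standard java.lang.management beans. It is not the GlassFish implementation. The standard beans expose only cumulative counts and times, so the sketch approximates "the last pause" by the mean time per collection between two samples. The class name GcPauseSampler and the 200 ms threshold are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sampler: mean GC time per collection between two samples. */
class GcPauseSampler {
    private long lastCount = 0;
    private long lastTimeMs = 0;

    /** Mean milliseconds per collection for the given deltas; 0 when no collection ran. */
    static double meanPauseMs(long timeDeltaMs, long countDelta) {
        return countDelta > 0 ? (double) timeDeltaMs / countDelta : 0.0;
    }

    /** Sums all collectors and returns the mean time per collection since the previous call. */
    double sample() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        double mean = meanPauseMs(timeMs - lastTimeMs, count - lastCount);
        lastCount = count;
        lastTimeMs = timeMs;
        return mean;
    }

    public static void main(String[] args) {
        double thresholdMs = 200.0; // assumed user threshold, for illustration only
        GcPauseSampler sampler = new GcPauseSampler();
        double mean = sampler.sample();
        System.out.printf("mean GC time per collection: %.1f ms, over threshold: %b%n",
                mean, mean > thresholdMs);
    }
}
```

In a real monitor, sample() would be called on a timer so that each call covers only the most recent interval.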

Now let's turn to the three trend-analyzing alerts.

Why do we need trend analysis anyway?

Trend analysis helps damp the noise in the measurements. For example, there could be a brief spike in CPU usage or memory usage. That is not really "harmful" from a performance perspective as long as usage returns to normal levels. So to indicate REAL growth in resource usage over a prolonged period, it is necessary to take AVERAGES. That is exactly what the following alerts do: they calculate a MOVING AVERAGE over multiple samples of the metric in question and then compare that value with the user-defined threshold.
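The moving-average idea can be sketched in a few lines of Java. This is only an illustration of the technique, assuming a simple (unweighted) moving average over a fixed number of samples. The class name MovingAverageAlert, the window size, and the threshold are hypothetical, and the actual alerts may compute their averages differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: simple moving average over the last `window` samples. */
class MovingAverageAlert {
    private final Deque<Double> samples = new ArrayDeque<>();
    private final int window;
    private final double threshold;
    private double sum = 0.0;

    MovingAverageAlert(int window, double threshold) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
        this.threshold = threshold;
    }

    /** Adds a sample; returns true once the window is full and its average exceeds the threshold. */
    boolean addSample(double value) {
        samples.addLast(value);
        sum += value;
        if (samples.size() > window) {
            sum -= samples.removeFirst(); // drop the oldest sample
        }
        return samples.size() == window && average() > threshold;
    }

    /** Average of the samples currently in the window; 0 when empty. */
    double average() {
        return samples.isEmpty() ? 0.0 : sum / samples.size();
    }

    public static void main(String[] args) {
        MovingAverageAlert alert = new MovingAverageAlert(3, 80.0);
        double[] cpuPercent = {20, 95, 25, 85, 90, 92}; // made-up samples
        for (double c : cpuPercent) {
            boolean fired = alert.addSample(c);
            System.out.printf("sample=%.0f avg=%.1f alert=%b%n", c, alert.average(), fired);
        }
    }
}
```

With these made-up samples, the lone spike to 95 does not raise an alert because the average stays low; only the sustained run of high values at the end pushes the average over the threshold.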

Memory Leak Alert
The JVM's HEAP and NON-HEAP memory is divided into smaller spaces based on the age of the objects residing in them. From youngest to oldest, the generation spaces are: (HEAP) Eden Space, Survivor, Tenured and (NON-HEAP) Perm Gen. The younger generations fill up fast but are also collected very often, so the chance of a memory leak showing up in them is very low. The older generations, however, fill slowly, and a typical GC run does not empty them the way it does the younger generations. A memory leak in these generations is therefore a serious matter and can eventually lead to out-of-memory (OOM) errors, not to mention the crawling performance of the apps on that instance before that happens.

A user can set up the Memory Leak alert to watch the usage levels of these older generations and be warned of unwelcome growth in these spaces well in advance.
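For readers who want to look at these spaces themselves, the standard java.lang.management API lists each memory pool and its current usage. The sketch below only shows where such readings can come from and is not how the alert itself is implemented. The class name PoolUsageSampler is hypothetical, and pool names vary with the JVM version and the collector in use (for example, newer JVMs report Metaspace instead of Perm Gen).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: prints how full each JVM memory pool is. */
class PoolUsageSampler {
    /** Returns used/max as a percentage, or -1 when the pool reports no maximum. */
    static double percentUsed(long usedBytes, long maxBytes) {
        return maxBytes > 0 ? 100.0 * usedBytes / maxBytes : -1.0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getType() distinguishes heap pools from non-heap pools.
            System.out.printf("%-28s %-16s %6.1f%%%n",
                    pool.getName(), pool.getType(),
                    percentUsed(usage.getUsed(), usage.getMax()));
        }
    }
}
```

Feeding the percentage for an old-generation pool into a moving average, sample after sample, gives the kind of trend a leak alert can compare against a threshold.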

CPU Usage Trend Alert
As the name suggests, this alert lets the user set a threshold on CPU usage. If the CPU time consumed by a user app increases steadily, it may indicate an activity that is degrading performance. The alert samples CPU usage over a window of time and compares the average to the predefined threshold.

JVM Throughput Alert
JVM throughput is gauged here by the time spent (or rather, wasted) on the VM's housekeeping activities such as garbage collection. In other words, it is the non-application time. This alert records the time spent on such non-application activities over a user-configurable period and sends a notification when the predefined threshold is crossed. A user can statically tune a VM for a target throughput using the -XX:GCTimeRatio option.

Such static tunings are only HINTS for the VM. Whether the goals set statically are really honored can be determined only through monitoring.

The VM alerts thus also provide a way to check reality against desired goals when it comes to VM's behavior!
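As a rough illustration of checking reality against the goal: HotSpot documents -XX:GCTimeRatio=N as a target of 1/(1+N) of total time spent in garbage collection, so N=99 asks for at most 1%. The sketch below compares that goal with the GC time the standard management beans report since JVM start. The class name GcOverhead and the ratio of 99 are assumptions for the example. The reported collection time is approximate and can include concurrent work, and the alert described above uses a configurable period rather than the whole uptime.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: observed GC time fraction versus a GCTimeRatio goal. */
class GcOverhead {
    /** Goal implied by -XX:GCTimeRatio=N: GC may take 1/(1+N) of total time. */
    static double goalFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    /** Fraction of elapsed time spent in GC; 0 when no time has elapsed. */
    static double observedFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    /** Sum of the cumulative collection time reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "not reported"
        }
        return total;
    }

    public static void main(String[] args) {
        int gcTimeRatio = 99; // assumed value passed as -XX:GCTimeRatio=99
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        double observed = observedFraction(totalGcMillis(), uptimeMs);
        System.out.printf("goal=%.2f%% observed=%.2f%% goal met: %b%n",
                100 * goalFraction(gcTimeRatio), 100 * observed,
                observed <= goalFraction(gcTimeRatio));
    }
}
```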
