Thursday June 30, 2005

As previously posted, I like analogies when describing how best to approach managing the performance of a computer system running a business application. One analogy I tend to use is to compare performance management to insurance.
In today's world we accept that we need insurance for all kinds of things. Apart from the fact that some of these are mandatory (car insurance if you drive a car in the UK), most people understand that the consequences of not having insurance when tragedy strikes far outweigh the costs of obtaining that insurance in the first place, even though you never intend or expect to make a claim against it.
So today we understand that we need separate insurance policies to cover many different aspects of our lives.
In fact the list of different types of insurance you can buy today just goes on and on.
I see proactive performance management as a form of insurance. By paying some extra money up front to instrument your systems, and to monitor and record what is happening on them, you will be in the best possible situation to respond when something starts misbehaving.
If something ever goes wrong, then you will already have in place all the information you need to analyse what is happening, identify the root cause, and decide on the most appropriate form of action to remedy it.
But this is not what most computer departments do. They wait until something goes wrong, and then take an iterative approach, trying different fixes until one of them succeeds. Often these fixes involve either downtime of the application for each change, or significant monetary outlay to obtain extra resources (typically CPUs, memory or disks).
But the key point is that without proper information about how the application and systems are behaving, you cannot identify the true cause of the problem. Often you are just applying "rules of thumb" and tackling anything that seems unusual, whether or not it is related to the cause of the performance problem.
Some of the performance management and analysis tools out there can record where time is being spent by the application, and how much of each resource it is using. With this information you can easily identify what has changed when a performance problem is reported. Having identified the cause, you can determine what effect any changes you propose might have on the overall performance of the application and the systems it is running on. Knowing where the application is spending its time during each transaction will help you focus on the areas that would give the greatest payback.
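To make the "what has changed" idea concrete, here is a minimal sketch in Python. It assumes you already have a per-transaction time breakdown recorded from a normal period and another from the period when the problem was reported. The component names, the numbers and the `time_shifts` helper are all made up for illustration; they do not come from any particular tool.

```python
def time_shifts(baseline, current):
    """Rank components by how much their time per transaction has changed.

    Both arguments map a component name to milliseconds per transaction.
    A component missing from one side is treated as taking 0 ms there.
    """
    components = set(baseline) | set(current)
    deltas = {
        name: current.get(name, 0.0) - baseline.get(name, 0.0)
        for name in components
    }
    # Largest absolute change first: that is where to look first.
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical figures (ms per transaction), purely illustrative.
baseline = {"cpu": 40.0, "disk_io": 25.0, "network": 10.0, "lock_wait": 5.0}
current = {"cpu": 42.0, "disk_io": 70.0, "network": 10.0, "lock_wait": 6.0}

shifts = time_shifts(baseline, current)
for name, delta in shifts:
    print(f"{name:10s} {delta:+.1f} ms")
```

With these invented figures the disk I/O component stands out immediately, which is the kind of pointer you cannot get from rules of thumb alone. The sketch only works if the baseline was recorded before the problem appeared, which is the whole point of paying the premium up front.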
Furthermore, these performance management tools will let you easily identify any change in the performance behaviour of the systems and the application. So you can identify changes in normal behaviour before they grow to the level of impacting the observed performance of the application. Even if the degree of change is very small, you can still use trend analysis to estimate when in the future there could be a noticeable impact on performance.
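As a rough sketch of what such trend analysis could look like, the Python below fits a straight line (ordinary least squares) to daily response time measurements and estimates how many days remain before the line crosses a threshold. The function names, the sample values and the 150 ms threshold are assumptions for illustration only, and a straight line is just the simplest possible model; real tools may well use something more sophisticated.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, values, threshold):
    """Estimate days from the last sample until the trend reaches threshold.

    Returns None when the fitted trend is flat or improving.
    """
    slope, intercept = fit_line(days, values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    # If the trend has already crossed the threshold, report zero days left.
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily average response times in milliseconds.
days = [0, 1, 2, 3, 4]
response_ms = [100.0, 102.0, 104.0, 106.0, 108.0]

remaining = days_until_threshold(days, response_ms, threshold=150.0)
print(remaining)  # 21.0 for these made-up samples
```

A growth of 2 ms a day is far too small for any user to notice, yet the projection says the invented 150 ms limit would be reached in about three weeks. That gives you time to act before anyone reports a problem.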
And all this for some extra money up front. The alternative is to keep teams of troubleshooters around, just in case, and then to experience lengthy periods of degraded performance and service levels whenever a performance problem occurs, while you try different fixes until one of them works. And then you are left hoping that you have finally fixed it all, and that it doesn't happen again.