John Brady's Weblog

Oracle and System Performance

Thursday June 30, 2005

Performance is like ... (2)

As previously posted, I like analogies when describing how best to approach managing performance of a computer system running a business application. One analogy I have tended to use is to compare performance management to insurance.

In today's world we accept that we need insurance for all kinds of things. Apart from the fact that some of these are mandatory (car insurance if you drive a car in the UK), most people understand that the consequences of not having insurance when tragedy strikes far outweigh the costs of obtaining that insurance in the first place, even though they never intend or expect to make a claim against it.

So today we understand that we need separate insurance policies to cover many different aspects of our life:

In fact the list of different types of insurance you can buy today just goes on and on.

I see proactive performance management as a form of insurance. By paying some extra money up front to instrument your systems, and to monitor and record what is happening on them, you will be in the best possible situation to respond when something starts misbehaving.

If something ever goes wrong, then you will already have in place all the information you need to analyse what is happening, identify the root cause, and decide on the most appropriate form of action to remedy it.

But this is not what most computer departments do. They wait until something goes wrong, and then take an iterative approach to trying different fixes until one of them succeeds. Often these fixes either involve down time of the application for each change, or significant monetary outlay to obtain extra resources (typically CPUs, memory or disks).

But the key point is that without proper information about how the application and systems are behaving, you cannot identify the true cause of the problem. Often you are just applying "rules of thumb" and tackling anything that seems unusual, whether or not it is related to the cause of the performance problem.

Some of the performance management and analysis tools out there can record where time is being spent by the application, and how much of which resources it is using. With this information you can easily identify what has changed when a performance problem is reported. Having identified the cause, you can determine what effect any changes you propose might have on the overall performance of the application and the systems it is running on. Knowing where the application is spending its time during each transaction will help you focus on the areas that would give the greatest payback.
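To make the idea of "identifying what has changed" concrete, here is a minimal Python sketch. It assumes you already have two recorded profiles of time spent per transaction, one from a normal period and one from the problem period. The function name, the component names and the figures are all made up for illustration and do not come from any particular tool.

```python
from typing import Dict, List, Tuple


def largest_changes(
    baseline: Dict[str, float], current: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (component, change in seconds) pairs, biggest change first.

    A component missing from one profile is treated as taking 0 seconds there.
    """
    components = set(baseline) | set(current)
    deltas = [
        (name, current.get(name, 0.0) - baseline.get(name, 0.0))
        for name in components
    ]
    # Sort by size of change (largest first), then by name for a stable order.
    return sorted(deltas, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical seconds per transaction, recorded before and during a problem.
baseline = {"cpu": 0.20, "disk_io": 0.10, "lock_wait": 0.01}
current = {"cpu": 0.21, "disk_io": 0.10, "lock_wait": 0.25}

for name, delta in largest_changes(baseline, current):
    print(f"{name:10s} {delta:+.2f}s")
```

With a recorded baseline, the component whose time has grown the most stands out immediately, and that is where the investigation should start. Without the baseline there is nothing to compare against.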

Furthermore, these performance management tools will let you easily identify any change in the performance behaviour of the systems and the application. So you can identify changes in normal behaviour before they grow to the level of impacting the observed performance of the application. Even if the degree of change is very small, you can still use trend analysis to estimate when in the future there could be a noticeable impact on performance.
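As a rough illustration of the trend analysis mentioned above, the following Python sketch fits a straight line (ordinary least squares) to a series of evenly spaced measurements and estimates how long until the line reaches a chosen limit. The function name, the sample response times and the 120 ms limit are assumptions for the sake of the example. Real tools may use more sophisticated models, and a straight line is only a reasonable guess when the change really is gradual.

```python
from typing import Optional, Sequence


def days_until_threshold(
    samples: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate days after the last sample until the fitted trend hits threshold.

    samples holds one measurement per day, oldest first.
    Returns None when the fitted trend is flat or improving.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Least-squares slope and intercept for x = 0, 1, ..., n-1.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Days remaining, counted from the most recent sample.
    return max(0.0, crossing_day - (n - 1))


# Hypothetical average response times in milliseconds, one per day.
response_ms = [100, 102, 104, 106, 108]
print(days_until_threshold(response_ms, threshold=120))
```

A growth of 2 ms per day is far too small for any user to notice, yet the recorded history is enough to say roughly when it will start to matter.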

And all this for some extra money up front, instead of having to keep teams of troubleshooters around, just in case. Without it, you experience lengthy periods of degraded performance and service levels whenever a performance problem occurs, while you try different fixes until one of them works. And then you are left hoping that you have finally fixed it all, and that it doesn't happen again.

( Jun 30 2005, 03:29:47 PM BST )

Performance is like ...

I am a big believer in what I call "Proactive Performance Management". In other words, doing something about the performance of an application on a computer system before it becomes a problem, because once it has become a problem it is, of course, too late.

One of the problems I have is persuading people that this is something worth spending time, effort and money on today. Most people take the approach of "If it ain't broke, don't fix it", and so do not see the benefit of spending money on addressing a problem that doesn't yet exist. So I am always on the lookout for any good descriptions of the dangers of not addressing performance properly, and of the benefits when you do.

I also like analogies, as they stop us getting stuck in a set of specialised terminology related to computers. A good analogy will get the point across, and show that the principle applies to other scenarios too, which should increase the strength of the argument being put forward.

So, while reading Adrian Cockcroft's blog I came across a posting comparing fighting house fires to managing performance ( Playing with Fire ). And this made a lot of sense to me. No one would prefer to live in a building that was not well designed, and had not taken the consequences of fire into account. If you did, you would end up spending a lot of your time dealing with spontaneous fires. Given the choice, most people would choose a well designed, safe building.

So why do we not design performance into the environments in which we deploy software applications? Why do we continue to presume that nothing needs to be done about performance, and end up spending significant amounts of time and effort "fighting fires" when some system or other starts behaving badly?

The analogy to avoiding fires brings out another point. You do not add performance or performance management in at the end, when the system has been built and deployed. Performance is not something you can just bolt on to an existing system, just as you cannot bolt fire safety on to an inadequately designed building after it has been built. Good performance management needs to be designed in from the very start of the system.

( Jun 30 2005, 10:37:14 AM BST )
