John Brady's Weblog

Oracle and System Performance

Wednesday August 10, 2005

DTrace is great!

DTrace is really good. I'm sure most of you know that by now, so I'm not telling you anything new. But I attended an internal presentation on DTrace the other day, and I was blown away by the ease with which the presenter wrote D scripts (D being the DTrace language) to find out what was happening on a system, and then drilled down further.

Up to now I have always viewed Solaris Containers as the most useful feature of Solaris 10, i.e. the feature you would benefit from most, and the one that would make you migrate to Solaris 10. But after this presentation I have changed my mind. DTrace gives you so much observability into what is happening on your system. And it does so immediately, non-intrusively (no need to recompile your application), in real time (as things happen, with no special set-up), with as much detail as you want (including access to all the arguments to functions and system calls), and it is totally safe and always ready to use in Solaris 10.

Given that most application environments have some inefficiency or other in them, DTrace opens everything up so that you can find out where your bottlenecks are and identify their root causes. And you don't need the source code of any application to do this.

Want to see what is consuming all the I/O on the system? Then write a small DTrace script to count the calls to the read and write system calls. Better still, look at the size of each I/O as well, show the size distribution, and count the I/Os by process.
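DTrace does this aggregation in the kernel: in D, the `syscall::read:entry` and `syscall::write:entry` probes can feed a `count()` aggregation keyed on `execname`, and `quantize()` on the byte-count argument (`arg2`) gives a power-of-two size distribution. As a hedged illustration of what those aggregations compute (not a replacement for DTrace), here is a Python sketch that applies the same logic to a list of hypothetical `(process, syscall, size)` events; the event data and the function names `quantize_bucket` and `summarise_io` are invented for the example.

```python
from collections import Counter, defaultdict


def quantize_bucket(value: int) -> int:
    """Power-of-two bucket for a non-negative value, in the style of DTrace quantize()."""
    if value <= 0:
        return 0
    return 1 << (value.bit_length() - 1)


def summarise_io(events):
    """Count read/write calls per (process, syscall) and build per-process size histograms."""
    calls = Counter()
    sizes = defaultdict(Counter)
    for process, syscall, size in events:
        if syscall not in ("read", "write"):
            continue  # only the two system calls we are interested in
        calls[(process, syscall)] += 1
        sizes[process][quantize_bucket(size)] += 1
    return calls, sizes


if __name__ == "__main__":
    # Hypothetical traced calls: (process name, system call, bytes requested)
    events = [
        ("oracle", "read", 8192), ("oracle", "read", 8192), ("oracle", "write", 1024),
        ("tar", "read", 65536), ("tar", "write", 65536), ("sshd", "read", 48),
    ]
    calls, sizes = summarise_io(events)
    for (process, syscall), n in calls.most_common():
        print(f"{process:8s} {syscall:5s} {n}")
    for process, hist in sizes.items():
        print(process, dict(sorted(hist.items())))
```

The point of the power-of-two buckets is the same as in DTrace: a few lines of output show whether a process is doing many small I/Os or a few large ones.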

I know that this is all standard DTrace stuff, and that it has been described many times in the DTrace documentation and in other articles and blogs, so I won't waste time describing what DTrace is capable of. I just wanted to say that if I were an administrator or developer I would be begging for Solaris 10, so that I could use DTrace to investigate any anomaly on the system.

Adopting Containers, on the other hand, takes time: you have to design the final configuration, set it all up, migrate the applications from other systems and environments into each container, and then monitor them to ensure that each application's performance is acceptable. But DTrace I could use immediately on Solaris 10, on every application, regardless of its configuration and set-up.

(Aug 10 2005, 10:57:55 AM BST)

Tuesday August 02, 2005

System Activity Snapshot or Performance Profile Baseline

Taking a snapshot of the activity on a system, or a baseline profile of its performance, is a useful tool for dealing with future performance problems. By establishing a baseline, and recording all of its associated details, we have a reference point for comparison at any point in the future. Should any performance problems be reported with the system later, we can compare its current profile to that baseline. Any differences will help us identify what has changed, and from that the cause of the change in the system's behaviour. How else are we to find out what has changed and is causing the change in performance? Without a baseline for reference, you literally do not know what has changed since the system was last working properly.

A performance profile consists of information about the utilisation of all of the resources on the system (CPU, memory, disk, network, etc.), and about all of the processes on the system (the consumers of those resources). This information is normally captured as a series of snapshots over a period of time, ranging from one every hour throughout a 24-hour day to one every minute during the peak hour of the day.

If you have some kind of performance management or monitoring tool, then it should be capable of capturing this data for you over your representative period, and then saving it away somewhere permanently for later use. If you don't have such a tool, then you can achieve something similar using standard UNIX tools like sar, ps, System Accounting and maybe even top (or prstat on Solaris), saving the outputs to a set of files. Of course it will require some manipulation to turn this raw data into a profile of the system and the applications running on it. But it must be better to have some data, no matter how raw it is, than no data at all.
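If you are building your own collection from standard tools, the core of it is just running a command at each interval and saving its output to a timestamped file. The Python sketch below is one way to do that; the function names, the file naming scheme and the output directory are assumptions for illustration, and in practice you would schedule it from cron rather than loop in a long-running process.

```python
import datetime
import pathlib
import subprocess


def snapshot_path(out_dir, label, when):
    """Timestamped file name such as ps_20050802_1400.txt (naming scheme is an assumption)."""
    return pathlib.Path(out_dir) / f"{label}_{when:%Y%m%d_%H%M}.txt"


def take_snapshot(command, out_dir, label, when=None):
    """Run one collection command and save its standard output for later analysis."""
    when = when or datetime.datetime.now()
    path = snapshot_path(out_dir, label, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    path.write_text(result.stdout)
    return path
```

For example, `take_snapshot(["ps", "-eo", "pid,user,pcpu,vsz,time,comm"], "/var/tmp/perf", "ps")` run every 15 minutes would build up a day's series of process listings next to the sar data.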

Once this snapshot of the system behaviour has been established, we now have a point of reference for what we consider to be normal activity on the system. Should anything appear to be wrong at any point in the future, then we can compare it to this baseline snapshot, and find out what is different.
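The comparison step itself is simple once both profiles exist. A minimal Python sketch, assuming each profile has already been reduced to a dictionary of named metrics; the metric names and the 20% tolerance are illustrative choices, not recommendations.

```python
def compare_to_baseline(baseline, current, tolerance=0.20):
    """Return {metric: (before, after)} for metrics that moved more than `tolerance`.

    Metrics present in only one profile are reported too, since a new or vanished
    metric (for example a new process) is itself a change worth investigating.
    """
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        before, after = baseline.get(name), current.get(name)
        if before is None or after is None:
            changes[name] = (before, after)
        elif before == 0:
            if after != 0:
                changes[name] = (before, after)
        elif abs(after - before) / abs(before) > tolerance:
            changes[name] = (before, after)
    return changes
```

With a baseline of `{"cpu_busy_pct": 45.0, "disk_reads_per_s": 300.0, "run_queue": 1.2}` and a current profile of `{"cpu_busy_pct": 80.0, "disk_reads_per_s": 310.0, "run_queue": 4.0}`, only the CPU and run queue figures are flagged, which is where the investigation should start.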

Performing this baseline does not require a lot of effort, yet has enormous potential benefits:

The cost of this is very little in real terms – some disk space to store the performance profile data, and some software tools to capture that data. Note that these tools would be needed to undertake any performance problem analysis anyway. So if you are serious about performance management on your systems, and have such tools, then there is no real extra expense to using them to baseline your systems.

(Aug 02 2005, 02:03:08 PM BST)

Monday July 25, 2005

Niagara: Measuring Performance and Sharing

Sun's future Niagara processor has many implications when it comes to measuring the performance of a system, and the behaviour of the applications running on it. A single threaded processor and its execution core are either utilised or not at any given point in time – all or nothing. With a multi-threaded processor like Niagara, utilisation of the execution core can now have values between 0% and 100% - there are shades of grey between black and white.

Our goal is to maximise the performance of our applications on the computer hardware configuration we have available to us. One aspect of this is to understand how utilised the system capacity is. Another aspect is to understand how the resources are being used by each of the applications running on it. With single threaded processors this simplistic on/off view is close enough to the truth, and we have been used to it for many years. With multi-threaded processors we can either retain this simplistic view, or we can enhance it and be honest about the existence of the shades of grey.

Until now a processor has had a single execution core capable of executing instructions from only one thread at a time. Switching between threads required intervention by the operating system, which in turn meant executing kernel instructions to save the current state of the running thread and then choose another thread to schedule on the processor. So only one thread was ever executing on that processor at any moment.

The operating system, such as Solaris, records the elapsed time that the thread spends running on the processor, and treats the processor as busy (utilised) for that period. These times are added to the running total of the thread's CPU consumption and to the processor's busy time, from which the processor's overall utilisation is calculated.
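In other words, thread CPU time and processor utilisation are both derived from the same scheduling records. A hedged Python sketch of that bookkeeping, using a hypothetical list of `(thread, cpu, start, end)` run intervals in seconds rather than any real Solaris kernel structure:

```python
from collections import defaultdict


def account(intervals, elapsed):
    """Sum per-thread CPU time and per-processor busy time from scheduling intervals.

    intervals: iterable of (thread_id, cpu_id, start, end), in seconds (hypothetical format).
    elapsed: length of the measurement window in seconds.
    Returns (cpu_time_by_thread, utilisation_by_cpu).
    """
    cpu_time = defaultdict(float)
    busy = defaultdict(float)
    for thread, cpu, start, end in intervals:
        duration = end - start
        cpu_time[thread] += duration  # the thread's CPU consumption
        busy[cpu] += duration         # the processor counts as busy for this period
    utilisation = {cpu: b / elapsed for cpu, b in busy.items()}
    return dict(cpu_time), utilisation
```

On a single threaded processor this is the whole story. On a Niagara core, as the next paragraph explains, this busy figure no longer says how much of the core's capacity was actually used.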

With a Niagara processor the operating system can still record the same pieces of data about each thread it schedules on each execution core, and they are still valid (i.e. they reflect the scheduling performed by the operating system). However, they no longer tell the full story. Some things are missing which could be very relevant for a multi-threaded processor such as Niagara. We will now also be interested in the efficiency of the processor – how much of its potential capacity has been used and how much is spare - and how effectively a thread has been able to execute on it.

A Niagara processor will have 8 execution cores. Each of these execution cores will be able to concurrently deal with 4 threads. So Solaris will see 32 separate processors for scheduling threads on. However, there are in reality only 8 execution cores, not 32. Each execution core is being shared by 4 separate threads, with the execution core switching between their instruction streams during relatively long memory accesses by one thread to keep itself efficiently utilised. But at any instant in time the execution core is only executing one instruction at a time, not four instructions at a time.

Four threads executing concurrently on a Niagara processor can behave differently depending on which execution cores the operating system schedules them on. If the four threads are each scheduled onto a different execution core, and presuming that there are no other active threads, then there is no interference between them. This is similar to having four separate single threaded processors.

If, instead, the four threads are all scheduled onto the same execution core, then they will share that execution core and interfere with each other to some degree, as each in turn has its instructions executed while the other threads wait for data from memory.
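To make the placement difference concrete, assume purely for illustration that the 32 virtual processors are numbered so that IDs 0 to 3 belong to core 0, 4 to 7 to core 1, and so on (the real numbering is platform-specific). This Python sketch maps a placement of threads onto cores and reports how many threads share each one; `core_of` and `threads_per_core` are hypothetical helpers.

```python
from collections import Counter

THREADS_PER_CORE = 4  # Niagara: 4 threads per execution core
CORES = 8             # Niagara: 8 execution cores, so Solaris sees 32 processors


def core_of(cpu_id):
    """Map a virtual processor ID to its execution core (assumes contiguous numbering)."""
    if not 0 <= cpu_id < CORES * THREADS_PER_CORE:
        raise ValueError(f"no such virtual processor: {cpu_id}")
    return cpu_id // THREADS_PER_CORE


def threads_per_core(placement):
    """Given {thread: cpu_id}, count how many threads are sharing each execution core."""
    return Counter(core_of(cpu) for cpu in placement.values())
```

Placing four threads on processors 0, 4, 8 and 12 puts one thread on each of four cores, while placing them on processors 0 to 3 puts all four on core 0. Both look like four busy processors to conventional per-processor statistics.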

Due to this sharing of the execution core, I would now like to know about the utilisation of the physical execution cores in the Niagara processor, as well as the normal thread level scheduling. I would like to know about the Niagara's utilisation both in terms of number of concurrently executing threads, and overall efficiency (how many potential execution cycles were successfully used and how many were not due to lack of a ready instruction to execute).

Obviously, if an execution core is not at maximum efficiency and has unused execution cycles due to long memory accesses, then it is worth adding an extra thread of execution to it. If it is at maximum efficiency already then we would be better off adding in extra, separate execution cores to handle extra threads.
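That sizing rule can be written down directly. This Python sketch assumes we could obtain, per core, the number of cycles in which an instruction was issued and the total cycles available; both counters are hypothetical, since how Solaris will expose such data is exactly the open question, and the 90% threshold for "maximum efficiency" is arbitrary.

```python
def core_efficiency(issued_cycles, total_cycles):
    """Fraction of a core's available cycles that actually executed an instruction."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return issued_cycles / total_cycles


def sizing_advice(issued_cycles, total_cycles, threshold=0.90):
    """Suggest whether extra work should go onto this core or onto additional cores."""
    if core_efficiency(issued_cycles, total_cycles) < threshold:
        return "spare cycles: another thread on this core can use them"
    return "core saturated: add separate execution cores for extra threads"
```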

Knowing this information would help us make the right decisions on how to size our systems, and on how much extra load they could handle. So the big question is: how is Solaris actually going to measure and report both CPU utilisation and thread CPU usage (consumption) on multi-threaded Niagara processors? Another interesting set of challenges arising from the new design principles behind Niagara.

(Jul 25 2005, 02:33:44 PM BST)

Tuesday July 12, 2005

Collecting System Activity Statistics - sar

Having preached the benefits of monitoring your systems by measuring what is happening on them, the next question is: “How do I actually measure system activity?” Well, there are a number of third party products out there that will do this for you. They are worth having because they are designed to continually collect data about the system and then let you report on it and analyse it in a number of ways later. Also, they can help you manage performance across a large number of systems. But we'll get to those later. Presuming you don't have access to these kinds of tools and want to be doing something rather than nothing, what can you do with Solaris out of the box?

I'm a big fan of sar, the System Activity Report package, although it is by no means a perfect tool. Sar and its associated package of commands come standard with Solaris (and most UNIXes in general), and you can even get it for Linux. Sar gathers most system activity statistics at once, has a low overhead, and can store its data in a binary file for later analysis. You can use sar to collect system activity data either by using the associated commands that come with it (sa1 and sa2) or by running the data collector itself (sadc) directly.

I like sar because it can be set up very quickly, and will have minimal impact on the system if you do not collect data too often. The Solaris kernel is always gathering and calculating these statistics - all we are doing is telling it to save them to disk every now and then. So the only real cost is the disk space required to save the daily data files away.

Solaris comes with some cron entries for the sys user, ready for you to enable sar to collect data. The default entries in the crontab for sys that do this are:

# 0 * * * 0-6 /usr/lib/sa/sa1
# 20,40 8-17 * * 1-5 /usr/lib/sa/sa1
# 5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A

The net effect of these entries is to collect data every hour, every day, plus every 20 minutes during the working day (8am to 6pm, Monday to Friday), into a data file named /var/adm/sa/sadd, where dd is the day of the month. At 6:05pm each weekday sa2 processes the day's data and produces a text file with all the sar reports in it, called /var/adm/sa/sardd. It also deletes sar data files older than 7 days.

To enable sar data collection at this default frequency, all you need to do is edit the sys crontab (as the root user, run 'crontab -e sys') and remove the leading '# ' from these 3 lines. Do not edit the crontab file directly; you must use the crontab command to make changes.

However, many systems are now 24 x 7, so these defaults may be inappropriate. To change the collection frequency to every 15 minutes every day (which would now give us 96 sample points per day), edit the crontab as before, but replace the two sa1 lines with the following:

0,15,30,45 * * * * /usr/lib/sa/sa1
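To check what a given set of entries actually collects, you can expand the minute, hour and day-of-week fields and count the sample times. The Python sketch below handles only the field syntax used in these entries (`*`, comma lists and `a-b` ranges), ignores the day-of-month and month fields, and is not a full cron parser; the helper names are hypothetical.

```python
def expand(field, low, high):
    """Expand a cron field written with '*', comma-separated lists and 'a-b' ranges."""
    if field == "*":
        return set(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    return values


def samples_on_day(entries, weekday):
    """Count distinct (hour, minute) collection times on a weekday (0 = Sunday, as in cron)."""
    times = set()
    for entry in entries:
        minute, hour, _dom, _month, dow = entry.split()[:5]
        if weekday in expand(dow, 0, 6):
            times.update((h, m) for h in expand(hour, 0, 23) for m in expand(minute, 0, 59))
    return len(times)


DEFAULT_SA1 = ["0 * * * 0-6 /usr/lib/sa/sa1", "20,40 8-17 * * 1-5 /usr/lib/sa/sa1"]
EVERY_15_MIN = ["0,15,30,45 * * * * /usr/lib/sa/sa1"]
```

For the default sa1 entries this gives 44 samples on a weekday and 24 at the weekend, while the replacement entry gives 96 samples every day.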

Sa1 and sa2 are actually just shell scripts. It would be very easy to customise these. However, you should not change these scripts in place, as they might get overwritten by future upgrades to the operating system. It would be better to copy these scripts to another location, such as /usr/local/lib/sa, creating it if it does not exist, and change the sys crontab entries to refer to the files in these locations instead. Then you can edit and customise these local scripts as desired.

One possible customisation is to extend the retention period for the sar data files, and either stop producing the sar report files every day or produce only the CPU utilisation report. Using a naming convention of sayymmdd instead of sadd, we can safely retain the data files for over a year (keeping 400 days allows year-on-year comparisons).
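A customised local copy of sa1 and sa2 could name each day's file and select old files for deletion along these lines. This Python sketch only computes names; the function names are hypothetical, and actually passing the new name to sadc and removing the files is left to the local scripts. A side benefit of sayymmdd is that the names also sort in date order.

```python
import datetime
import re

# Matches sayymmdd data files only; old-style sadd files and sardd reports are ignored.
DATA_FILE = re.compile(r"^sa(\d{6})$")


def data_file_name(day):
    """Name a day's data file using the sayymmdd convention, e.g. sa050802."""
    return f"sa{day:%y%m%d}"


def files_to_purge(names, today, retention_days=400):
    """Return, sorted, the sayymmdd data files older than the retention period."""
    cutoff = today - datetime.timedelta(days=retention_days)
    old = []
    for name in names:
        match = DATA_FILE.match(name)
        if not match:
            continue
        try:
            day = datetime.datetime.strptime(match.group(1), "%y%m%d").date()
        except ValueError:
            continue  # six digits that are not a valid date: leave the file alone
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

For example, on 2 August 2005 `data_file_name` gives `sa050802`, and with 400-day retention `sa030101` would be selected for deletion while `sa050801` is kept.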

Sa1 and sa2 use sadc, the SAR Data Collector. You can use sadc directly instead of using sa1 and sa2, to perform regular collection of activity data to a file. Sadc can be run to collect data and save it to a file, at a regular collection interval for a number of sample points. You can use sadc manually to gather more detailed data for any specific period of time you want, and analyse the data later using sar.

sar takes a number of options. One option is to specify the data sets you want to see. The default is -u for CPU utilisation. Another option (-f) allows you to specify the data file to read from and report on. And you can also specify the time periods you want reported (-s and -e).
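As a sketch of how these pieces fit together, the Python helpers below only build the argument lists; the helper names, paths, interval and report window are illustrative assumptions. They follow the usage described above: sadc takes an interval in seconds, a number of samples and an output file, and sar takes a report option (default -u), -f for the data file, and -s and -e for the start and end times.

```python
def sadc_command(interval_s, samples, outfile, sadc="/usr/lib/sa/sadc"):
    """sadc t n ofile: write `samples` snapshots, `interval_s` seconds apart, to `outfile`."""
    return [sadc, str(interval_s), str(samples), outfile]


def sar_command(datafile, start=None, end=None, report="-u"):
    """Report on a saved data file, optionally restricted to a start/end time window."""
    cmd = ["sar", report, "-f", datafile]
    if start:
        cmd += ["-s", start]
    if end:
        cmd += ["-e", end]
    return cmd
```

On a Solaris system these lists can be passed to `subprocess.run`, for example `sadc_command(60, 60, "/var/tmp/peak.sadc")` to record an hour of one-minute samples, followed later by `sar_command("/var/tmp/peak.sadc", "09:00", "10:00")` to report on part of it.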

Sar gives you one half of the performance monitoring picture: how busy is each of the resources in the system? The other half, what is using the resources, is much more difficult to capture, for a number of reasons. There is no standard utility that captures process level data at regular intervals to a data file for later reporting. There are some utilities that give per-process information, but they all work separately and tend to produce simple text files. Given the number of processes on most systems, these files can become large and awkward to navigate.

For a simple list of all the processes on a system you can use ps. For a list of the most active processes, you can use prstat, which is really meant to be a real time analysis tool, rather than a data capture and reporting tool. System Accounting can also be used to report on process resource consumption (acct), but suffers from the flaw that it is only updated when a process finishes. Therefore you never get any information about a process until it terminates.
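To get from a saved ps snapshot to "what is using the resources", you have to aggregate it yourself. A minimal Python sketch, assuming output from something like `ps -eo pid,pcpu,comm` (columns in that order, one header line); bear in mind that ps's %CPU is not a precise measure of usage over your collection interval, and how it is calculated varies between platforms. The function name is hypothetical.

```python
from collections import defaultdict


def top_consumers(ps_output, n=5):
    """Aggregate %CPU by command name from `ps -eo pid,pcpu,comm` output and rank it."""
    totals = defaultdict(float)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:            # skip the header line
        fields = line.split(None, 2)  # pid, pcpu, then the rest of the line as the command
        if len(fields) < 3:
            continue
        _pid, pcpu, comm = fields
        totals[comm.strip()] += float(pcpu)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Grouping by command name rather than by PID makes many short-lived or multi-process applications, such as a database with dozens of server processes, show up as a single consumer.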

Third Party Tools

There are a number of third party tools, which I shall not begin to attempt to list or describe or compare. You can either use specialised performance management products, or use more generic system management products that have a module for collecting and reporting on system performance.

I use TeamQuest quite a lot, and am happy with it for what I use it for. I can install TeamQuest Manager easily and have it immediately start collecting data on what the system is up to. Using View I can then report on both the resource utilisation and the processes that are consuming them. If necessary I can use Model to perform 'what-if' analysis on a problem system to see the effect of any configuration changes.

(Jul 12 2005, 11:38:27 AM BST)

Monday July 11, 2005

Scalability, hiding memory latency, and Niagara

Most people probably want a 'scalable' system when they buy a computer. 'Scalable' can be taken to mean a whole bunch of different things, but for what I am talking about here we can presume it means both the ability for the system to support increases in workload while maintaining the same transaction response time, and that by doubling the system configuration (the resources) it can support twice the workload.

Scalability of a system is not really about what is there in the system, but rather what is 'not' there. Scalability becomes limited or constrained when something gets in the way. Less is more. Good scalable design is about either avoiding the things that can get in the way, or explicitly designing around them when they are unavoidable. Examples of things that have limited system scalability in the past include:

The classic description of how these kinds of constraints limit scalability is Amdahl's Law. However, computer hardware and software vendors have spent many years developing techniques that let them build and deploy large systems that largely avoid the serial bottlenecks Amdahl's Law describes. So, generally, today's systems are more balanced designs that scale well within their stated capacity.
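For reference, Amdahl's Law in its usual form: if a fraction $p$ of the work can be spread across $N$ processors and the remaining $1-p$ is serialised, the best achievable speedup $S(N)$ and its limit are as follows, with a worked case for $p = 0.95$ and $N = 16$:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

S(16)\Big|_{p = 0.95} = \frac{1}{0.05 + 0.059375} = \frac{1}{0.109375} \approx 9.14,
\qquad \lim_{N \to \infty} S(N)\Big|_{p = 0.95} = \frac{1}{0.05} = 20
```

So even with 95% of the work parallelised, 16 processors give only about a nine-fold speedup, which is why removing the things that get in the way matters more than adding resources.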

With today's generation of computers, it is actually the memory sub-system that limits scalability. This is because processors have become at least 1,000 times faster over the past 10 years, whereas memory has probably only become 100 times faster. It is the ratio between these two that matters, not the absolute size of the increases. The network, at the other end of the computer system where it connects to the outside world, and where work comes from and results go back to, has also become a lot faster over the past 10 years: we have moved from 10 Mb/sec through 100 Mb/sec to 1 Gb/sec (Gigabit Ethernet) as standard for many networks.

Modern processors work internally at multiples of the external system bus frequency – anything from 4 to 10 times would be possible. So a 2 GHz processor may be interfaced to a 400 MHz bus, for a multiple of 5. This already shows that any memory access is going to waste multiple internal cycles of the processor. On top of this, modern memory sub-systems do not respond with the data within a single system cycle. It is several system cycles after being given an address that they respond with the data.

How does this affect a CPU? If a CPU had an internal clock speed of 1 GHz, then one CPU cycle is 1 ns (nanosecond). If the total time to obtain data from memory was 100 ns, then the CPU has been idle for most of the 100 CPU cycles. (It will not have been totally idle, as modern CPUs use an internal pipeline of sub-tasks to execute each instruction in parts. Stages of the pipeline already executing other instructions will be able to finish them during the memory access).
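The arithmetic in the last two paragraphs, as a tiny Python sketch (the 2 GHz/400 MHz and 1 GHz/100 ns figures are the examples from the text; the function names are hypothetical):

```python
def bus_multiplier(cpu_mhz, bus_mhz):
    """How many internal CPU cycles fit into one external bus cycle."""
    return cpu_mhz / bus_mhz


def stall_cycles(memory_latency_ns, cpu_ghz):
    """CPU cycles that elapse during one memory access (at 1 GHz, 1 cycle per ns)."""
    return memory_latency_ns * cpu_ghz
```

`bus_multiplier(2000, 400)` gives 5.0 and `stall_cycles(100, 1.0)` gives 100.0. Doubling the clock to 2 GHz without improving the 100 ns memory latency doubles the cycles lost per access to 200, which is why the ratio between processor and memory speed matters more than either figure on its own.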

So in modern computers the CPU is often idle, wasting cycles waiting for data from memory, and it is the memory sub-system that limits how well the system scales as more work engines (CPUs) are added to it. Clearly there is an imbalance between a processor's ability to do work and the speed at which that work can be supplied to it, so the processor spends much of its time waiting on memory. This is where the key focus of good, balanced system design should be.

A good system design principle is therefore about hiding this difference between real memory speed and CPU speed, so that the impact on the CPU of the much slower memory is minimised. Most CPUs today have areas of silicon on them dedicated to this memory interface, doing things to try and offset the relative cost of memory access. This is where you will find things like data and instruction caches running at the same speed as the CPU, branch prediction, pre-fetch buffers and write behind buffers. Many of these are aimed at trying to get the data before the CPU needs it, which is not always possible due to the variations in how programs behave.

Sun's future Niagara processor takes a new approach to this 'memory speed hiding' principle, by having four threads co-exist within the CPU's execution core at the same time. The CPU will only ever be executing one of these threads at any moment, as other current CPUs do. However, when the currently executing thread needs an external memory access, the CPU simply switches to another thread while the access is in progress. Thus the delay incurred by one thread's memory access is actively used to execute instructions from another thread. This has a number of benefits:

As someone who spends a lot of time concerned with the performance of computer systems, and the actual performance achieved by customers with their applications running on real hardware, the Niagara processor looks like a great win-win deal to me. It uses a simpler CPU execution core design, has a zero-cost switch between threads, hides memory access times, and increases overall system throughput and utilisation. And with less hardware (just one processor) than current systems.
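A toy model helps show why switching threads during a memory access raises utilisation. The Python sketch below simulates one core cycle by cycle under simplifying assumptions made purely for illustration (and repeated in the comments): every thread alternates a fixed number of compute cycles with a fixed memory stall, switching costs nothing, and memory serves all outstanding requests in parallel. It is not a model of Niagara's actual pipeline.

```python
def simulate_core(threads, compute, stall, cycles):
    """Fraction of cycles in which the core executed an instruction.

    Toy assumptions: each thread runs `compute` cycles, then waits `stall` cycles
    for memory; switching threads is free; memory serves all requests in parallel.
    """
    ready_at = [0] * threads         # cycle at which each thread can run again
    remaining = [compute] * threads  # compute cycles left before the next memory access
    current = 0                      # thread to try first (round robin)
    busy = 0
    for t in range(cycles):
        chosen = None
        for k in range(threads):
            i = (current + k) % threads
            if ready_at[i] <= t:
                chosen = i
                break
        if chosen is None:
            continue  # every thread is waiting on memory: an idle cycle
        busy += 1
        remaining[chosen] -= 1
        if remaining[chosen] == 0:
            # This thread now issues a memory access; move on to the next thread.
            remaining[chosen] = compute
            ready_at[chosen] = t + 1 + stall
            current = (chosen + 1) % threads
        else:
            current = chosen
    return busy / cycles
```

With 10 compute cycles followed by a 30-cycle stall, `simulate_core(n, 10, 30, 4000)` gives 0.25 for one thread, 0.5 for two and 1.0 for four, matching min(1, N*C/(C+M)) for N threads, C compute cycles and M stall cycles. Real workloads, with cache hits, variable latencies and memory contention, will fall short of this ideal.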

I believe that once Sun ships an actual system using the CPU that results from Niagara, we will see radically different behaviour profiles from systems and their applications. We will have to learn to interpret CPU utilisation and application throughput in different ways. An existing application could behave differently on a Niagara based system, and achieve a greater throughput, yet with only a single CPU. In this case, less is truly more.

As the saying goes – “May you live in interesting times”.

(Jul 11 2005, 02:31:19 PM BST)

Monday July 04, 2005

Performance Monitoring – Measuring Activity

I'm a big believer in proactive performance management, which means lots of different things. One aspect of it is doing proper performance monitoring of your systems.

Performance monitoring a system really means measuring and recording what is happening on that system. In simple terms – if you don't measure it, how can you know what is going on? If you don't know what is happening on each of your systems, how are you ever going to be able to diagnose the real cause of a performance problem?

And if you don't record it somewhere, how can you do any kind of analysis on the data being gathered? You need to have reliable data on what the system was doing, both recently and in the past.

Reliability is about both recording the data and measuring it accurately. You need to record the data so that you are looking at a consistent set of related values, rather than a set of constantly changing values. Accuracy is important, as inaccurate data will not help you identify the real cause of a performance problem, and so will not help you find a solution. You need to be able to trust the performance data being gathered about your system.

However, it is not enough to know that CPU utilisation has risen to 80%. You also need to know what is consuming those CPU cycles. Only then can you focus in on the culprit(s), and try to find out what they are doing.

So performance monitoring needs to collect data from a system perspective (a collection of resources) and a workload perspective (a collection of processes), to get both sides of the system activity equation:

(Jul 04 2005, 12:07:29 PM BST)

Thursday June 30, 2005

Performance is like ... (2)

As previously posted, I like analogies when describing how best to approach managing performance of a computer system running a business application. One analogy I have tended to use is to compare performance management to insurance.

In today's world we accept that we need insurance for all kinds of things. Apart from the fact that some of these are mandatory (car insurance if you drive a car in the UK), most people understand that the consequences of not having insurance when tragedy strikes far outweigh the costs of obtaining that insurance in the first place, even though you never intend or expect to make a claim against it.

So today we understand that we need separate insurance policies to cover many different aspects of our life:

In fact the list of different types of insurance you can buy today just goes on and on.

I see proactive performance management as a form of insurance. By paying some extra money up front to instrument your systems, and to monitor and record what is happening on them, you will be in the best possible situation to respond when something starts misbehaving.

If something ever goes wrong, then you will already have in place all the information you need to analyse what is happening, identify the root cause, and decide on the most appropriate form of action to remedy it.

But this is not what most computer departments do. They wait until something goes wrong, and then take an iterative approach to trying different fixes until one of them succeeds. Often these fixes either involve down time of the application for each change, or significant monetary outlay to obtain extra resources (typically CPUs, memory or disks).

But the key point is that without proper information about how the application and systems are behaving, you cannot identify the true cause of the problem. Often you are just using "rules of thumb" and tackling anything that seems unusual, whether or not it is related to the cause of the performance problem.

Some of the performance management and analysis tools out there can record where time is being spent by the application, and how much of each resource it is using. With this information you can easily identify what has changed when a performance problem is reported. Having identified the cause, you can determine what effect any changes you propose might have on the overall performance of the application and the systems it is running on. Knowing where the application is spending its time during each transaction will help you focus on the areas that would give the greatest payback.

Furthermore, these performance management tools will let you easily identify any change in the performance behaviour of the systems and the application. So you can identify changes in normal behaviour before they grow to the level of impacting the observed performance of the application. Even if the degree of change is very small, you can still use trend analysis to estimate when in the future there could be a noticeable impact on performance.
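Trend analysis can be as simple as fitting a straight line to a series of regular measurements and projecting when it will cross a threshold. A hedged Python sketch using ordinary least squares; the function names, the weekly figures and the 80% threshold are made up for illustration, and real workloads are rarely this linear (seasonal peaks in particular need more care).

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def when_threshold_reached(xs, ys, threshold):
    """Project the x value at which the fitted trend reaches `threshold` (None if not rising)."""
    a, b = linear_fit(xs, ys)
    if b <= 0:
        return None
    return (threshold - a) / b
```

With weekly CPU utilisation of 50, 52, 54, 56 and 58% for weeks 0 to 4, the trend grows two points a week from 50%, so `when_threshold_reached` projects 80% at around week 15, long before anyone would notice a problem from response times alone.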

And all this for some extra money up front. Instead of having to keep teams of troubleshooters around, just in case. And then experiencing lengthy periods of degraded performance and service levels when any performance problem occurs, while you try different fixes until one of them works. And then hoping that you have finally fixed it all, and that it doesn't happen again.

(Jun 30 2005, 03:29:47 PM BST)

Performance is like ...

I am a big believer in what I call "Proactive Performance Management". In other words, doing something about performance of an application on a computer system before it becomes a problem. By which point, of course, it is too late.

One of the problems I have is persuading people that this is something worth spending time, effort and money on today. Most people take the approach of "If it ain't broke, don't fix it", and so do not see the benefit of spending money on addressing a problem that doesn't yet exist. So I am always on the lookout for any good descriptions of the dangers of not addressing performance properly, and of the benefits when you do.

I also like analogies, as they stop us getting stuck in a set of specialised terminology related to computers. And a good analogy will get the point over, and show that the principle applies to other scenarios too. Which should increase the strength of the argument being put forward.

So, while reading Adrian Cockcroft's blog, I came across a posting comparing fighting house fires to managing performance (Playing with Fire). And this made a lot of sense to me. No one would choose to live in a building that was not well designed and had not taken the consequences of fire into account. Otherwise, you would end up spending a lot of your time dealing with spontaneous fires. Given the choice, most people would choose a well designed, safe building.

So why do we not design performance into the environments in which we deploy software applications? Why do we continue to presume that nothing needs to be done about performance, and end up spending significant amounts of time and effort "fighting fires" when some system or other starts behaving badly?

The analogy to avoiding fires brings out another point. You do not add performance or performance management in at the end, when the system has been built and deployed. Performance is not something you can just bolt on to an existing system. Just like you cannot bolt on fire safety to an inadequately designed building after it has been built. Good performance management needs to be designed in from the very start of the system.

(Jun 30 2005, 10:37:14 AM BST)
