Tuesday July 12, 2005
Having preached the benefits of monitoring your systems by measuring what is happening on them, the next question is “How do I actually measure system activity?” Well, there are a number of third party products out there that will do this for you. They are worth having because they are designed to continually collect data about the system and then let you report on it and analyse it in a number of ways later. Also, they can help you manage performance across a large number of systems. But we'll get to those later. Presuming you don't have access to these kinds of tools and want to be doing something rather than nothing, what can you do with Solaris out of the box?
I'm a big fan of sar, the System Activity Report package, although it is by no means a perfect tool. Sar and its associated package of commands come standard with Solaris (and most UNIXes in general), and you can even get it for Linux. Sar gathers most system activity statistics at once, has a low overhead, and can store its data in a binary file for later analysis. You can use sar to collect system activity data either by using the associated commands that come with it (sa1 and sa2) or by directly running the data collector itself (sadc).
I like sar because it can be set up very quickly, and will have minimal impact on the system if you do not collect data too often. The Solaris kernel is always gathering and calculating these statistics - all we are doing is telling it to save them to disk every now and then. So the only real cost is the disk space required to save the daily data files away.
Solaris comes with some cron entries for the sys user, ready for you to enable sar to collect data. The default entries in the crontab for sys that do this are:
# 0 * * * 0-6 /usr/lib/sa/sa1
# 20,40 8-17 * * 1-5 /usr/lib/sa/sa1
# 5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
The net effect of this would be to collect data once an hour, every day of the week, and every 20 minutes during the working day (8am to 6pm, Monday to Friday), to a data file named /var/adm/sa/sadd, where dd is the day of the month. At 6:05pm on each working day it would process the day's data, and produce a text file with all the sar reports in it, called /var/adm/sa/sardd. It also deletes sar data files older than 7 days.
To enable sar data collection at this default frequency, all you need to do is edit this crontab (as the root user, run 'crontab -e sys') and remove the leading '# ' on these 3 lines. Do not edit the crontab file directly; you must use the crontab command to make changes.
However, many systems are now 24 x 7, so these defaults may be inappropriate. To change the collection frequency to every 15 minutes every day (which would now give us 96 sample points per day), edit the crontab as before, but replace the two sa1 lines with the following:
0,15,30,45 * * * * /usr/lib/sa/sa1
Sa1 and sa2 are actually just shell scripts. It would be very easy to customise these. However, you should not change these scripts in place, as they might get overwritten by future upgrades to the operating system. It would be better to copy these scripts to another location, such as /usr/local/lib/sa, creating it if it does not exist, and change the sys crontab entries to refer to the files in these locations instead. Then you can edit and customise these local scripts as desired.
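As a rough sketch of that copy step, something like the following would do. The function name copy_sa_scripts is my own invention for illustration; the default paths are the ones mentioned above, and both can be overridden.

```shell
# Copy sa1 and sa2 to a local directory so that customisations
# are not overwritten by operating system upgrades.
# copy_sa_scripts is a hypothetical helper name, not part of Solaris.
copy_sa_scripts() {
    src=${1:-/usr/lib/sa}          # where the standard scripts live
    dest=${2:-/usr/local/lib/sa}   # local copies to customise
    mkdir -p "$dest" || return 1   # create the directory if it does not exist
    for script in sa1 sa2; do
        # -p keeps the original permissions and timestamps
        cp -p "$src/$script" "$dest/$script" || return 1
    done
}
```

Run as root, copy_sa_scripts with no arguments copies the scripts to /usr/local/lib/sa. The sys crontab entries then need to point at /usr/local/lib/sa/sa1 and /usr/local/lib/sa/sa2 instead of the originals.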
One possibility would be to extend the retention period for the sar data files, and to not bother producing the sar report files every day or only produce the CPU utilisation report. Using a naming convention of sayymmdd instead of sadd we can safely retain the data files for over a year (400 days allows year on year comparisons).
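A minimal sketch of that naming and retention change might look like this. It assumes a POSIX shell (ksh or bash rather than the old Bourne /bin/sh), and the function names sa_data_file and prune_sa_files are hypothetical; the standard sa1 and sa2 do not contain them.

```shell
# Build a data file name using sayymmdd instead of sadd.
# sa_data_file is a hypothetical helper for a customised sa1.
sa_data_file() {
    dir=${1:-/var/adm/sa}
    echo "$dir/sa$(date +%y%m%d)"
}

# Delete data files older than the retention period (default 400 days).
# Only files matching the six-digit sayymmdd pattern are touched.
# prune_sa_files is a hypothetical helper for a customised sa2.
prune_sa_files() {
    dir=${1:-/var/adm/sa}
    days=${2:-400}
    find "$dir" -type f -name 'sa[0-9][0-9][0-9][0-9][0-9][0-9]' \
        -mtime +"$days" -exec rm -f {} \;
}
```

In a customised sa1, the output file handed to sadc would become the result of sa_data_file. One thing to bear in mind is that sar with no -f option looks for the standard sadd file name, so reports against the renamed files need the file given explicitly with -f.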
Sa1 and sa2 use sadc, the SAR Data Collector. You can use sadc directly instead of using sa1 and sa2, to perform regular collection of activity data to a file. Sadc can be run to collect data and save it to a file, at a regular collection interval for a number of sample points. You can use sadc manually to gather more detailed data for any specific period of time you want, and analyse the data later using sar.
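For a manual collection run, sadc takes the interval in seconds, the number of samples, and the output file. The sketch below wraps that in two small functions; sample_count and collect_detail are names I have made up, the SADC variable is only there so the collector path can be overridden, and a POSIX shell (ksh or bash) is assumed.

```shell
# Number of samples needed to cover DURATION seconds
# at one sample every INTERVAL seconds (integer division).
sample_count() {
    echo $(( $1 / $2 ))
}

# Run the data collector directly: sadc INTERVAL COUNT OUTFILE.
# collect_detail is a hypothetical wrapper; SADC defaults to the
# Solaris path used by the sys crontab entries.
collect_detail() {
    interval=$1
    duration=$2
    outfile=$3
    "${SADC:-/usr/lib/sa/sadc}" "$interval" \
        "$(sample_count "$duration" "$interval")" "$outfile"
}
```

For example, collect_detail 10 3600 /var/tmp/sa_detail would sample every 10 seconds for an hour (the file name is just an example). The resulting file can be read back later with sar -f /var/tmp/sa_detail.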
Sar takes a number of options. One option is to specify the data sets you want to see. The default is -u for CPU utilisation. Another option (-f) allows you to specify the data file to read from and report on. And you can also restrict the report to a time period by giving a start time (-s) and an end time (-e).
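Putting those options together, a CPU report for part of one day could be wrapped up as below. This is only a sketch: cpu_report is a hypothetical function name, and the SAR variable exists purely so that a different sar binary can be substituted.

```shell
# CPU utilisation (-u) from one day's data file (-f),
# limited to a time window (-s start, -e end).
# DAY is the two-digit day of the month, matching the sadd file names.
# cpu_report is a hypothetical wrapper around the real sar options.
cpu_report() {
    day=$1
    start=$2
    end=$3
    "${SAR:-sar}" -u -f "/var/adm/sa/sa$day" -s "$start" -e "$end"
}
```

So cpu_report 12 09:00 17:00 would report CPU utilisation between 9am and 5pm from the data collected on the 12th of the month.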
Sar gives you one half of the performance monitoring picture: how busy is each of the resources in the system? The other half, what is using the resources, is much more difficult, for a number of reasons. There is no direct utility that captures process level data at regular time intervals to a data file for later reporting. There are some utilities that give per process information, but they all work separately and tend to produce simple text files. Given the number of processes on most systems, these can run to large files, and could make navigation awkward.
For a simple list of all processes on a system you can use ps. For a list of the most active processes, you can use prstat, which is really meant to be a real time analysis tool, rather than a data capture and reporting tool. System Accounting can also be used to report on process resource consumption (acct), but suffers from the flaw that it is only updated when a process finishes. Therefore you never get any information about a process until it terminates.
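If you want something crude in the meantime, ps output can at least be saved to a timestamped text file from cron. The sketch below assumes a POSIX shell and a ps that accepts the standard -o format names; the directory /var/adm/ps and the function names are my own invention, not anything that ships with Solaris.

```shell
# Path for a timestamped snapshot file: DIR/ps.YYYYMMDD-HHMM
# /var/adm/ps is a hypothetical location chosen for illustration.
snapshot_file() {
    dir=${1:-/var/adm/ps}
    echo "$dir/ps.$(date +%Y%m%d-%H%M)"
}

# Write one text file per run, listing every process on the system.
snapshot_processes() {
    out=$(snapshot_file "$1")
    mkdir -p "$(dirname "$out")" || return 1
    ps -e -o pid,user,pcpu,vsz,time,args > "$out"
}
```

Run from cron at the same interval as sa1, this gives a process listing to set alongside each sar sample, though with all the drawbacks of plain text files mentioned above.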
There are a number of third party tools, which I shall not attempt to list, describe or compare. You can either use specialised performance management products, or use more generic system management products that have a module for collecting and reporting on system performance.
I use TeamQuest quite a lot, and am happy with it for what I use it for. I can install TeamQuest Manager easily and have it immediately start collecting data on what the system is up to. Using View I can then report on both the resource utilisation and the processes that are consuming them. If necessary I can use Model to perform 'what-if' analysis on a problem system to see the effect of any configuration changes.
( Jul 12 2005, 11:38:27 AM BST )