Managing Fault Management Log Files
I chose to post the section on FMA log files. Hopefully you'll find this interesting, both from an FMA perspective, and as a taste of some good material to come in the new book series.
The Fault Manager daemon (fmd) maintains two persistent log files of events: the error log and the fault log. The error log persistently records inbound telemetry information (ereports), and the fault log persistently records diagnosis and repair events. Both log files are in the Extended Accounting format associated with libexacct(3LIB). The log files reside in /var/fm/fmd. These log files are viewed by using the fmdump command.
# fmdump -?
fmdump: illegal option -- ?
Usage: fmdump [-efvV] [-c class] [-R root] [-t time] [-T time] [-u uuid] [-n name[.name]*[=value]] [file]
        -c  select events that match the specified class
        -e  display error log content instead of fault log content
        -f  follow growth of log file by waiting for additional data
        -R  set root directory for pathname expansions
        -t  select events that occurred after the specified time
        -T  select events that occurred before the specified time
        -u  select events that match the specified uuid
        -n  select events containing named nvpair (with matching value)
        -v  set verbose mode: display additional event detail
        -V  set very verbose mode: display complete event contents
With no options, fmdump displays the contents of the fault log. The -e option instructs fmdump to examine the error log. Various options provide more detailed and granular scrutiny of the log files. However, the commonly used options are -v for more verbose output and -u to list only those events associated with a UUID.
Automatic Log Rotation
Both the error and fault log files have historical recording, similar to /var/adm/messages. By default, up to 10 historical error and fault log files are kept. Because historical files accumulate, the logs must be rotated. The rotation of fmd log files is managed by the logadm command. By default, logadm is run from the root user's crontab each day at 03:10 a.m. The logadm.conf entries for fmd log files are as follows:
# grep /var/fm/fmd /etc/logadm.conf
/var/fm/fmd/errlog -M '/usr/sbin/fmadm -q rotate errlog && mv /var/fm/fmd/errlog.0- $nfile' -N -s 2m
/var/fm/fmd/fltlog -A 6m -M '/usr/sbin/fmadm -q rotate fltlog && mv /var/fm/fmd/fltlog.0- $nfile' -N -s 10m
The errlog file is rotated when the active file grows larger than 2 megabytes. The fltlog log threshold for rotation is 10 megabytes. Also note the use of -A on the fltlog file, which means that fault log files older than 6 months are deleted, irrespective of size.
Also note that after the fmadm rotate command, an mv command renames the file to a final archived name. So, automatic rotation is a two-step process:
- fmadm rotate creates a *log.0- file.
- logadm renames the *log.0- file to *log.[0-9].
# cd /var/fm/fmd ; ls -l errlog*
-rw-r--r-- 1 root root 2014185 Jun 25 16:32 errlog
-rw-r--r-- 1 root root 2049327 Jun 10 16:30 errlog.0
-rw-r--r-- 1 root root 3123843 May 28 16:30 errlog.1
-rw-r--r-- 1 root root 2174873 May 19 16:30 errlog.2
-rw-r--r-- 1 root root 2049173 May  7 16:30 errlog.3
-rw-r--r-- 1 root root 2293094 Apr 22 16:30 errlog.4
-rw-r--r-- 1 root root 2583748 Apr  9 16:30 errlog.5
-rw-r--r-- 1 root root 2867374 Mar 10 16:30 errlog.6
-rw-r--r-- 1 root root 2187465 Feb  8 16:30 errlog.7
-rw-r--r-- 1 root root 2211937 Jan 25 16:30 errlog.8
-rw-r--r-- 1 root root 2328587 Jan  2 16:30 errlog.9
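The second step of the rotation can be pictured as a numbered shift. The following is a minimal sketch of that idea, not logadm's actual implementation: shift_and_archive is a hypothetical helper, and it assumes that the existing numbered files each move up by one before the *log.0- file becomes *log.0, with 10 files kept as described above.

```shell
# Hypothetical helper: shift_and_archive DIR NAME [KEEP]
# Illustrates the renaming step only; it does not call fmadm or logadm.
shift_and_archive() {
    dir=$1
    name=$2
    keep=${3:-10}                 # assumed default: 10 historical files
    staged="$dir/$name.0-"        # file left behind by "fmadm rotate"

    # Nothing to archive if the first step has not run.
    [ -f "$staged" ] || return 1

    # Drop the oldest file, then move NAME.(i-1) to NAME.i, oldest first.
    i=$((keep - 1))
    rm -f "$dir/$name.$i"
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$dir/$name.$prev" ]; then
            mv "$dir/$name.$prev" "$dir/$name.$i"
        fi
        i=$prev
    done

    # Finally, give the staged file its archived name.
    mv "$staged" "$dir/$name.0"
}
```

Try a helper like this only on a scratch copy of the directory. On a live system, logadm should remain the tool that performs the renaming.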
Manual Log Rotation
The Fault Manager daemon error and fault log files can also be rotated manually. The logadm.conf entries show that the fmadm rotate <logname> command is used for an on-demand log rotation, followed by some post processing. The following output shows what happens if just the fmadm rotate <logname> command is used:
# ls -l /var/fm/fmd
total 54
drwx------ 3 root sys    512 May 13 14:55 ckpt
-rw-r--r-- 1 root root 13049 May 13 15:00 errlog
-rw-r--r-- 1 root root 11013 May 13 15:01 fltlog
drwx------ 2 root sys    512 May 13 15:01 rsrc
drwx------ 2 root sys    512 May 13 02:04 xprt
# fmadm rotate errlog
fmadm: errlog has been rotated out and can now be archived
# fmadm rotate fltlog
fmadm: fltlog has been rotated out and can now be archived
# ls -l /var/fm/fmd
total 58
drwx------ 3 root sys    512 May 13 14:55 ckpt
-rw-r--r-- 1 root root   330 May 13 15:01 errlog
-rw-r--r-- 1 root root 13049 May 13 15:00 errlog.0-
-rw-r--r-- 1 root root   330 May 13 15:01 fltlog
-rw-r--r-- 1 root root 11013 May 13 15:01 fltlog.0-
drwx------ 2 root sys    512 May 13 15:01 rsrc
drwx------ 2 root sys    512 May 13 02:04 xprt
Note that manual rotation leaves a *log.0- file. When rotated automatically, logadm summarily renames this file to the next historical log file. Manual rotation executes the rotation steps only within fmd, which creates the *log.0- file. The result is that the next manual rotation will overwrite the previous *log.0- file. For example:
# ls -l /var/fm/fmd/errlog*
-rw-r--r-- 1 root root   330 May 18 11:01 errlog
-rw-r--r-- 1 root root 13049 May 13 15:00 errlog.0-
# fmadm rotate errlog
fmadm: errlog has been rotated out and can now be archived
# ls -l /var/fm/fmd/errlog*
-rw-r--r-- 1 root root   329 Jul 25 18:35 errlog
-rw-r--r-- 1 root root   330 May 18 11:01 errlog.0-
Note that errlog.0- has been overwritten. Any information in the log file from May 13 15:00 is gone. Recall that automatic log rotation is a two-step process. Using the fmadm rotate command directly only performs the first step.
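One way to avoid this kind of accidental overwrite is to check for a leftover *log.0- file before rotating by hand. The wrapper below is a minimal sketch under stated assumptions: safe_rotate is a hypothetical function, and the FMD_DIR and FMADM variables are illustrative overrides that are not part of fmadm itself.

```shell
# Hypothetical wrapper: safe_rotate LOGNAME   (LOGNAME is errlog or fltlog)
# Refuses to run "fmadm rotate" while a previous LOGNAME.0- is still present.
safe_rotate() {
    log=$1
    dir=${FMD_DIR:-/var/fm/fmd}   # assumed override, defaults to the fmd log directory

    if [ -e "$dir/$log.0-" ]; then
        echo "refusing to rotate: $dir/$log.0- exists; archive it first" >&2
        return 1
    fi

    # FMADM is an assumed override so the wrapper can be exercised off-box.
    ${FMADM:-fmadm} rotate "$log"
}
```

With this in place, `safe_rotate errlog` fails with a message instead of silently discarding the earlier errlog.0- file.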
A cleaner on-demand log rotation method is to use logadm to process the logadm.conf file, but to override the default rotation periods and sizes. This method has the advantage of ensuring that the historical log files are preserved. For example:
# ls -l errlog*
-rw-r--r-- 1 root root   330 May 13 15:01 errlog
-rw-r--r-- 1 root root 13049 May 13 15:00 errlog.0-
# logadm -p now -s 1b /var/fm/fmd/errlog
# ls -l errlog*
-rw-r--r-- 1 root root   330 Sep 11 10:17 errlog
-rw-r--r-- 1 root root   330 May 13 15:01 errlog.0
And similarly for the fault log:
# ls -l fltlog*
-rw-r--r-- 1 root root   330 May 13 15:01 fltlog
-rw-r--r-- 1 root root 11013 May 13 15:01 fltlog.0-
# logadm -p now -s 1b /var/fm/fmd/fltlog
# ls -l fltlog*
-rw-r--r-- 1 root root   330 Sep 11 10:22 fltlog
-rw-r--r-- 1 root root   330 May 13 15:01 fltlog.0
Log Rotation Failures
The rotation of a log file can fail. If a rotation request is made while an ereport is being written to the log file, fmd will wait 200 milliseconds and then retry the rotation. If after 10 attempts the rotation is still not successful, fmd will abandon the operation and report the following error:
# fmadm rotate errlog
fmadm: failed to rotate errlog: log file is too busy to rotate (try again later)
Such a condition can persist if a steady stream of errors is occurring on a system, such as a “storm” of correctable errors. Even with rotation failures, ereports are still persistently logged to the errlog file.
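Because the error message suggests trying again later, a manual rotation can be wrapped in a simple retry loop that waits longer than fmd's own internal retries. This is only a sketch: retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary values chosen for illustration.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2

    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0              # command succeeded, stop retrying
        fi
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"        # pause before the next attempt
        fi
        n=$((n + 1))
    done
    return 1                      # every attempt failed
}

# Example usage (values are illustrative):
#   retry 5 60 fmadm rotate errlog
```

If every attempt fails during an error storm, it is usually better to wait for the storm to subside than to keep retrying, since the ereports continue to be logged in the meantime.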
Examining Historical Log Files
Once log files have been rotated, you can examine historical information by passing the name of the archived file to the fmdump command as its final argument. (The -f option does not name a file; it follows the growth of a log file.) For example:
# fmdump -v -u 04837324-f221-e7dc-f6fa-dc7d9420ea76
TIME                 UUID                                 SUNW-MSG-ID
fmdump: /var/fm/fmd/fltlog is empty
# fmdump -v -u 04837324-f221-e7dc-f6fa-dc7d9420ea76 fltlog.0
TIME                 UUID                                 SUNW-MSG-ID
May 13 15:00:02.2409 04837324-f221-e7dc-f6fa-dc7d9420ea76 AMD-8000-AV
  100%  fault.cpu.amd.dcachedata
        Problem in: hc://:product-id=Sun-Ultra-20-Workstation:chassis-id=0604FK401F:server-id=hexterra/motherboard=0/chip=0/cpu=0
           Affects: cpu:///cpuid=0
               FRU: hc://:product-id=Sun-Ultra-20-Workstation:chassis-id=0604FK401F:server-id=hexterra/motherboard=0/chip=0
          Location: CPU 0
The fmdump command displays any events in the fltlog.0 file associated with UUID 04837324-f221-e7dc-f6fa-dc7d9420ea76.
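When it is not known which historical file holds a given UUID, the same query can be repeated over the active fault log and each archived file. The loop below is a minimal sketch: find_uuid is a hypothetical function, FMD_DIR and FMDUMP are illustrative overrides rather than anything fmdump defines, and the file naming follows the fltlog.[0-9] pattern shown earlier.

```shell
# Hypothetical helper: find_uuid UUID
# Runs "fmdump -v -u UUID FILE" against fltlog and each fltlog.N in turn.
find_uuid() {
    uuid=$1
    dir=${FMD_DIR:-/var/fm/fmd}   # assumed override, defaults to the fmd log directory

    for f in "$dir"/fltlog "$dir"/fltlog.[0-9]; do
        # Skip names that do not exist (including an unmatched glob).
        [ -f "$f" ] || continue
        echo "== $f"
        # FMDUMP is an assumed override so the loop can be tested off-box.
        ${FMDUMP:-fmdump} -v -u "$uuid" "$f"
    done
}
```

For example, `find_uuid 04837324-f221-e7dc-f6fa-dc7d9420ea76` prints a header line for each fault log file followed by whatever fmdump reports for that file.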