Peerapong Kunasirirat's weblog

Solaris FMA Quick Note - Part 2 (CLI Commands)

Sunday Sep 16, 2007

In FMA Quick Note - Part 1, I covered the FMA architecture, event types, event names and some examples. In this part I will show the FMA CLI commands and how to use them.

FMA Quick Note (Part 2) CLI Commands

fmd (1M) – the fault manager daemon, which receives telemetry information relating to problems, diagnoses them and initiates proactive self-healing activities
          The fmd daemon works with plug-in modules (diagnosis engines and agents). A diagnosis engine plug-in examines the error reports fired by a particular piece of hardware to determine whether that hardware is really faulty. If it is, a fault event is raised and a case (a set of correlated events) is opened. The plug-ins are located in these directories:
          /usr/platform/`uname -i`/lib/fm/fmd/plugins
          /usr/platform/`uname -m`/lib/fm/fmd/plugins
          /usr/lib/fm/fmd/plugins (generic)
          The directories are searched in the ORDER above, so platform-specific modules are loaded before generic modules.
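The search order can be illustrated with a small shell function. This is a minimal sketch, assuming (as described above) that the first directory containing a module wins and that each module is a <module>.so file; find_plugin is a hypothetical helper name, not an FMA command.

```shell
#!/bin/sh
# find_plugin MODULE [DIR...]
# Hypothetical helper (not part of FMA): print the path of the first
# MODULE.so found, searching the DIRs in order. With no DIRs it uses the
# platform -> machine -> generic order described above.
find_plugin() {
    module=$1
    shift
    if [ $# -eq 0 ]; then
        set -- "/usr/platform/$(uname -i)/lib/fm/fmd/plugins" \
               "/usr/platform/$(uname -m)/lib/fm/fmd/plugins" \
               /usr/lib/fm/fmd/plugins
    fi
    for dir in "$@"; do
        if [ -f "$dir/$module.so" ]; then
            echo "$dir/$module.so"   # first match wins
            return 0
        fi
    done
    return 1                         # not found in any directory
}
```

On the T2000 shown below, find_plugin cpumem-retire should report the copy under /usr/platform/sun4v/lib/fm/fmd/plugins, because that directory is searched before the generic one.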

fmadm (1M) – utility to view and modify system configuration parameters maintained by the fmd daemon
fmstat (1M) – utility to report statistics associated with the fmd daemon and its modules
fmdump (1M) – utility to display the contents of the fault and error logs

   Plug-in example for the Sun Fire T2000
The Sun Fire T2000 does not have the platform-specific directory /usr/platform/`uname -i`/lib/fm/fmd/plugins.
Instead, it has /usr/platform/`uname -m`/lib/fm/fmd/plugins (machine-specific plug-ins) and /usr/lib/fm/fmd/plugins (generic plug-ins).

Note:

uname -i   shows   SUNW,Sun-Fire-T2000
uname -m   shows   sun4v

   fmd plug-ins on T2000 (specific) 

# ls -l /usr/platform/sun4v/lib/fm/fmd/plugins
total 456
-rw-r--r--   1 root     bin          195 Aug 23  2006 cpumem-diagnosis.conf
-r-xr-xr-x   1 root     bin       109480 Oct 14  2006 cpumem-diagnosis.so
-rw-r--r--   1 root     bin          473 Oct  4  2006 cpumem-retire.conf
-r-xr-xr-x   1 root     bin        31584 Oct 14  2006 cpumem-retire.so
-rw-r--r--   1 root     bin          240 Oct  4  2006 etm.conf
-r-xr-xr-x   1 root     bin        75368 Oct 14  2006 etm.so

   fmd plug-ins on T2000 (generic)

# ls -l /usr/lib/fm/fmd/plugins
total 1008
-rw-r--r--   1 root     bin          411 Jan 22  2005 cpumem-retire.conf
-r-xr-xr-x   1 root     bin        30772 Oct 14  2006 cpumem-retire.so
-rw-r--r--   1 root     bin         1679 Aug 23  2006 eft.conf
-r-xr-xr-x   1 root     bin       311588 Sep  6  2006 eft.so
-rw-r--r--   1 root     bin          315 Jan 22  2005 io-retire.conf
-r-xr-xr-x   1 root     bin        15924 Sep  6  2006 io-retire.so
-rw-r--r--   1 root     bin          149 Aug 23  2006 ip-transport.conf
-r-xr-xr-x   1 root     bin        36768 Sep  6  2006 ip-transport.so
-rw-r--r--   1 root     bin          190 Jul 11  2006 snmp-trapgen.conf
-r-xr-xr-x   1 root     bin        32356 Jul 22  2006 snmp-trapgen.so
-rw-r--r--   1 root     bin          864 Jan 22  2005 syslog-msgs.conf
-r-xr-xr-x   1 root     bin        21356 Sep  6  2006 syslog-msgs.so
-rw-r--r--   1 root     bin          309 Aug 23  2006 zfs-diagnosis.conf
-r-xr-xr-x   1 root     bin        19984 Sep  6  2006 zfs-diagnosis.so
-rw-r--r--   1 root     bin          234 Oct 17  2006 zfs-retire.conf
-r-xr-xr-x   1 root     bin        18844 Oct 19  2006 zfs-retire.so

Note that the cpumem-retire plug-in appears in both the specific and the generic plug-in directories. Since specific plug-ins are loaded before generic ones, ONLY the cpumem-retire module in /usr/platform/sun4v/lib/fm/fmd/plugins is loaded in this case.
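To spot such duplicates without comparing listings by eye, a short loop can print the plug-ins present in both directories. A minimal sketch; shadowed_plugins is a hypothetical helper name and it only compares .so file names.

```shell
#!/bin/sh
# shadowed_plugins SPECIFIC_DIR GENERIC_DIR
# Hypothetical helper: print the names of plug-ins that exist in both
# directories, i.e. generic copies that the specific copy takes
# precedence over under the search order described above.
shadowed_plugins() {
    for f in "$1"/*.so; do
        [ -e "$f" ] || continue      # glob matched nothing
        name=${f##*/}                # strip the directory part
        if [ -e "$2/$name" ]; then
            echo "${name%.so}"       # print the module name without .so
        fi
    done
    return 0
}
```

With the two T2000 listings above, shadowed_plugins /usr/platform/sun4v/lib/fm/fmd/plugins /usr/lib/fm/fmd/plugins would print only cpumem-retire.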

      Viewing the FMA data using fmadm

  fmadm   (FM stands for Fault Manager)
     config – show the fault manager configuration
     faulty – list faulty resources
     flush <fmri> - flush the cached state of resource <fmri>
     load <path> - load the plug-in module at <path>
     repair <fmri> - record a repair to resource <fmri>
     reset [-s serd] <module> - reset <module> or its SERD engine
     rotate <logname> - rotate log file <logname>
     unload <module> - unload <module>

      Show all modules that are currently loaded
# fmadm config
MODULE                   VERSION STATUS  DESCRIPTION
cpumem-diagnosis         1.5     active  CPU/Memory Diagnosis
cpumem-retire            1.1     active  CPU/Memory Retire Agent
eft                      1.16    active  eft diagnosis engine
etm                      1.0     active  FMA Event Transport Module
fmd-self-diagnosis       1.0     active  Fault Manager Self-Diagnosis
io-retire                1.0     active  I/O Retire Agent
snmp-trapgen             1.0     active  SNMP Trap Generation Agent
sysevent-transport       1.0     active  SysEvent Transport Agent
syslog-msgs              1.0     active  Syslog Messaging Agent
zfs-diagnosis            1.0     active  ZFS Diagnosis Engine
zfs-retire               1.0     active  ZFS Retire Agent

     Show faulty components 
# fmadm faulty
   STATE RESOURCE / UUID
------------------------------------------------------------------
 faulted mem:///component=Slot%20B%3A%20J3000
         dbdc7f15-848c-cbdc-b47f-deb9d9fff5c9
------------------------------------------------------------------

     Record the repair (after the faulty component has been replaced)
# fmadm repair mem:///component=Slot%20B%3A%20J3000
fmadm: recorded repair to mem:///component=Slot%20B%3A%20J3000

# fmadm faulty
   STATE RESOURCE / UUID
------------------------------------------------------------------

     Show the FMA statistics
# fmstat
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-diagnosis         4       4  0.0  262.8   0   0     1     1   308b   356b
cpumem-retire            1       0  0.0   66.6   0   0     0     0      0      0
fmd-self-diagnosis       0       0  0.0    0.0   0   0     0     0      0      0
syslog-msgs              1       0  0.0    0.1   0   0     0     0    32b      0
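In the fmstat output, the open column is, as I understand the fmstat(1M) man page, the number of cases a module currently has open, and solve the number it has solved. A one-line awk filter can flag modules with open cases. This sketch assumes the column order shown above (open is the 8th field); open_cases is a hypothetical name.

```shell
#!/bin/sh
# open_cases: read `fmstat` output on stdin and print "module open" for
# every module whose open column (8th field, per the header) is non-zero.
open_cases() {
    awk 'NR > 1 && $8 > 0 { print $1, $8 }'
}
```

Run as fmstat | open_cases; with the output above it would print cpumem-diagnosis 1.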

     Show module statistics 
# fmstat -m cpumem-retire
                NAME VALUE            DESCRIPTION
           auto_flts 0                auto-close faults received
            bad_flts 0                invalid fault events received
         cpu_blfails 0                failed cpu blacklists
          cpu_blsupp 0                cpu blacklists suppressed
           cpu_fails 0                cpu faults unresolveable
            cpu_flts 0                cpu faults resolved
            cpu_supp 0                cpu offlines suppressed
            nop_flts 0                inapplicable fault events received
          page_fails 0                page faults unresolveable
           page_flts 0                page faults resolved
         page_nonent 0                retires for non-existent fmris
           page_supp 0                page retires suppressed
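Most module statistics are zero on a healthy system, so it can help to print only the counters that have moved. A minimal sketch, assuming the NAME VALUE DESCRIPTION layout shown above; nonzero_stats is a hypothetical name.

```shell
#!/bin/sh
# nonzero_stats: read `fmstat -m <module>` output on stdin and print
# "NAME VALUE" for every statistic whose value is not zero.
nonzero_stats() {
    awk 'NR > 1 && NF >= 2 && $2 != 0 { print $1, $2 }'
}
```

Run as fmstat -m cpumem-retire | nonzero_stats; with the all-zero output above it prints nothing.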

     FMA Logging Explained

When the fmd daemon receives an error event, it writes the event to the error log before acknowledging receipt to the FMA event transport. The event is initially logged with a header marking it as NOT yet processed by any module. When a module accepts the event, the header is updated to indicate that the event does not need to be replayed if fmd fails and restarts.
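The log-before-acknowledge and replay rule can be modelled with a plain text file. This is only a toy model of the idea, NOT fmd's implementation: fmd keeps binary logs such as /var/fm/fmd/errlog, and the PENDING/ACCEPTED states and function names below are hypothetical.

```shell
#!/bin/sh
# Toy model of the logging rule (not fmd's real log format).
# Each line of the toy log is "STATE EVENT-ID", STATE = PENDING|ACCEPTED.
LOG=${LOG:-./toy-errlog}

# Step 1: record the event as PENDING before acknowledging it.
receive_event() {
    echo "PENDING $1" >> "$LOG"
}

# Step 2: a module accepted the event, so it no longer needs replay.
accept_event() {
    awk -v id="$1" '$2 == id && $1 == "PENDING" { $1 = "ACCEPTED" } { print }' \
        "$LOG" > "$LOG.tmp" && mv "$LOG.tmp" "$LOG"
}

# Step 3: after a restart, replay every event that was never accepted.
replay_events() {
    awk '$1 == "PENDING" { print $2 }' "$LOG"
}
```

Because the event is written before it is acknowledged, a crash between steps 1 and 2 loses nothing: the event is still PENDING and gets replayed.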

    The CLI commands for examining the error and fault logs are:

     fmdump   (display fault log)

     fmdump -e (display error log)

     fmdump -eV  (display error log in verbose mode)
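When the error log is long, counting fmdump -e lines per event class shows which ereports dominate. A minimal sketch, assuming a TIME CLASS header line and the event class as the last field of each entry; class_summary is a hypothetical name.

```shell
#!/bin/sh
# class_summary: read `fmdump -e` output on stdin and print
# "COUNT CLASS" lines, most frequent class first.
class_summary() {
    awk 'NR > 1 && NF > 0 { count[$NF]++ }
         END { for (c in count) print count[c], c }' | sort -rn
}
```

Run as fmdump -e | class_summary, then use fmdump -eV to look at the full detail of the interesting events.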
================================================================================

Part 3 will show how to identify system errors and faults.
