Slava Leanovich
Monday July 10, 2006
Unreasonably high system load problem

The symptom of this problem is a system load that is high for no apparent reason. Sometimes this behaviour is caused by a storm of interrupts from the hardware thermal monitor. If so, this note describes how to track it down and apply an interim fix.

First, check the system load:
# vmstat 1 3
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s5 -- --   in   sy   cs us sy id
 0 0 0 1601976 74896 20 263 16  7  8  0  9  0  4  0  0 8933 2443 9441  4 29 68
 0 0 0 1119220 17572 25 323 295 0  0  0  0  0 68  0  0 16122 2446 16800 1 53 46
 0 0 0 1119220 17376  0   1  4  0  0  0  0  0  1  0  0 15758 2364 16321 2 51 47

# mpstat 1 3
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  136   1   15  4573  311 8538   98  127   13    0   709    3  47   0  51
  1  128   1   16  4359 4147  903  121  127   26    0  1734    5  11   0  84
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0  8005  304 15329  175  164    5    0   260    1  85   0  14
  1    8   0    0  7931 7776 1099  130  174   17    0  2474    2  18   0  80
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0  8006  375 15272  269  208   13    0   427    0  85   0  15
  1    1   1    1  7789 7552 1442  154  205   11    0  3159    7  20   0  74
OK, the system takes up to 85% of CPU0 and there are thousands of interrupts.
We need to find out where this load comes from.
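On longer mpstat runs it can help to filter on the sys column instead of reading every sample by eye. The following is only a sketch: the function name flag_busy_cpus and the default threshold of 50 are made up for illustration, and it assumes the 16-column mpstat layout shown above. On Solaris, nawk may be needed in place of awk for the -v option.

```shell
# Print CPU id, interrupt count and sys% for every mpstat sample
# whose system time exceeds a threshold.
# Assumed column layout (as in the output above):
#   field 1 = CPU, field 5 = intr, field 14 = sys
flag_busy_cpus() {
    # $1: sys% threshold; 50 is an arbitrary, hypothetical default
    awk -v limit="${1:-50}" '
        $1 == "CPU" { next }            # skip the repeated header lines
        NF >= 16 && $14 + 0 > limit {
            printf "cpu=%s intr=%s sys=%s\n", $1, $5, $14
        }
    '
}
```

Usage would be something like `mpstat 1 3 | flag_busy_cpus 50`, which keeps only the samples where a CPU spends more than half of its time in the kernel.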

Let's use the dtrace(1M) [1] facility, which lets us specify the "probes" we are interested in and the actions to perform when a particular probe fires.
A probe is specified in the form "provider:module:function:name"; a field left empty matches any value. So, list the probes related to the ACPI [2] modules:
# dtrace -n :acpi*:: -l | head
   ID  PROVIDER  MODULE                      FUNCTION NAME
16160       fbt  acpica    AcpiEvFixedEventInitialize entry
16161       fbt  acpica    AcpiEvFixedEventInitialize return
16162       fbt  acpica      AcpiEvFixedEventDispatch entry
16163       fbt  acpica      AcpiEvFixedEventDispatch return
16164       fbt  acpica  AcpiEvAsynchExecuteGpeMethod entry
16165       fbt  acpica  AcpiEvAsynchExecuteGpeMethod return
16166       fbt  acpica          AcpiEvSaveMethodInfo entry
16167       fbt  acpica          AcpiEvSaveMethodInfo return
16168       fbt  acpica          AcpiEvMatchPrwAndGpe entry
Basically, fbt is the "function boundary tracing" provider, which traces "entry" and "return" events for almost all kernel functions.
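To make the four-field probe form concrete, here is a small sketch that splits a description such as ":acpi*::entry" into its parts and prints "*" for every field left empty. The helper name describe_probe is hypothetical, and the sketch assumes that all four fields (three colons) are written out; it does not try to reproduce how dtrace itself parses shorter forms.

```shell
# Split a probe description "provider:module:function:name" into its
# four fields and show an empty field as "*" (matches anything).
# Assumes the full four-field form with three colons.
describe_probe() {
    printf '%s\n' "$1" | awk -F: '{
        printf "provider=%s module=%s function=%s name=%s\n",
            ($1 == "" ? "*" : $1), ($2 == "" ? "*" : $2),
            ($3 == "" ? "*" : $3), ($4 == "" ? "*" : $4)
    }'
}
```

For example, `describe_probe ':acpi*::entry'` shows that only the module and the probe name are constrained, which is why that description selects the entry probes of every function in the acpi* modules.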

Now trace entries into the ACPI-related functions and, for each "entry" probe, count the number of firings per function name (the probefunc variable).
Also, exit dtrace after 5 seconds (the tick-5sec probe).
# dtrace -n "
    :acpi*::entry { @[probefunc]=count() }
    tick-5sec { exit(0) }
    "

  FUNCTION                        COUNT
  ----------------------         ------  
  AcpiOsGetThreadId               83507
  AcpiUtGetMutexName             164032
  AcpiNsGetNextValidNode         321200
  AcpiUtDebugPrint               529397
  AcpiNsGetNextNode              673705
  AcpiUtTrackStackPtr            978657
These statistics show 978657 calls of the AcpiUtTrackStackPtr function in 5 seconds, which is far too many, so this looks like an ACPI event storm [3].
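The raw counts are easier to judge as rates. As a rough sketch, the helper below divides each count by the sampling interval; the name calls_per_second is made up for illustration, and the input is assumed to be "function count" pairs like the lines above. On Solaris, nawk may be needed in place of awk for the -v option.

```shell
# Turn "function count" lines into whole calls per second.
# $1: sampling interval in seconds (5 matches the tick-5sec probe above).
calls_per_second() {
    awk -v secs="${1:-5}" '
        # only lines that look like "name number"; headers are skipped
        NF == 2 && $2 ~ /^[0-9]+$/ {
            printf "%s %d\n", $1, $2 / secs
        }
    '
}
```

Fed the last line of the output above, `printf 'AcpiUtTrackStackPtr 978657\n' | calls_per_second 5` gives 195731 calls per second for that one function.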

The OS must actually handle ACPI events; otherwise the hardware assumes that the OS has not received them and keeps sending notifications.
This problem is fixed in recent releases; as an interim fix, however, ACPI can be switched off altogether.
The safety of such a fix is questionable. Most likely, though, the thermal monitor storms because of a bug in the monitor itself rather than because of overheating, so switching it off most probably will not burn your machine up.

ACPI can be switched off either with the eeprom(1M) command:
# eeprom acpi-user-options=0x2
or with the corresponding kernel parameter at boot, e.g.:
grub> kernel ... -B acpi-user-options=0x2
By the way, acpi-user-options=0 enables ACPI.

References

[1] Solaris 10 Software Developer Collection: DTrace User Guide
[2] Advanced Configuration & Power Interface (ACPI)
[3] acpica: Metropolis SMB Alerts result in high background system load

posted by leanovich Jul 10 2006, 11:30:37 AM CEST
