A symptom of this problem is an unreasonably high system load. Sometimes this behaviour is caused by a storm of interrupts from the hardware thermal monitor. If so, the following note describes how to track it down and apply an interim fix.
First, check the system load:
# vmstat 1 3
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s5 -- -- in sy cs us sy id
0 0 0 1601976 74896 20 263 16 7 8 0 9 0 4 0 0 8933 2443 9441 4 29 68
0 0 0 1119220 17572 25 323 295 0 0 0 0 0 68 0 0 16122 2446 16800 1 53 46
0 0 0 1119220 17376 0 1 4 0 0 0 0 0 1 0 0 15758 2364 16321 2 51 47
# mpstat 1 3
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 136 1 15 4573 311 8538 98 127 13 0 709 3 47 0 51
1 128 1 16 4359 4147 903 121 127 26 0 1734 5 11 0 84
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 8005 304 15329 175 164 5 0 260 1 85 0 14
1 8 0 0 7931 7776 1099 130 174 17 0 2474 2 18 0 80
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 0 8006 375 15272 269 208 13 0 427 0 85 0 15
1 1 1 1 7789 7552 1442 154 205 11 0 3159 7 20 0 74
OK, CPU0 spends up to 85% of its time in system mode and handles thousands of interrupts per second, so we need to find out where this load comes from.
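For longer mpstat runs, a small filter can pick out the busy samples. This is only a sketch: the function name flag_busy_cpus and the 50% default threshold are illustrative choices, and it assumes the 16-column mpstat layout shown above (intr in column 5, sys in column 14) and a POSIX awk (on Solaris 10 that would probably be nawk or /usr/xpg4/bin/awk).

```shell
# Print CPU id, interrupt count and sys% for every mpstat sample whose
# sys% is at or above a threshold (first argument, default 50).
# flag_busy_cpus is a hypothetical helper name, not a system command.
flag_busy_cpus() {
    awk -v limit="${1:-50}" '
        $1 == "CPU" { next }    # skip the repeated header rows
        NF >= 16 && $14 + 0 >= limit + 0 {
            print "CPU" $1, "intr=" $5, "sys=" $14 "%"
        }
    '
}
```

It would be used as "mpstat 1 3 | flag_busy_cpus 50"; on the samples above only the CPU0 rows with 85% system time would be reported.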
Let's use the dtrace(1M) [1] facility, which lets us specify the "probes" we are interested in and the actions to perform when a particular probe fires.
A probe is specified in the form "provider:module:function:name", where:
- provider - the component that makes the probes available to the DTrace framework;
- module - the particular module instrumented by the provider;
- function - the particular function instrumented by the provider;
- name - the probe name, e.g. "entry" or "return".
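To make the four fields concrete, here is a minimal sketch that splits a probe description on colons and labels each part. The helper name describe_probe is hypothetical, and the sketch only handles the full four-field form; an empty field is shown as "(any)" because DTrace treats an empty field as matching anything.

```shell
# Split a "provider:module:function:name" description into labelled fields.
# describe_probe is an illustrative helper, not part of DTrace.
describe_probe() {
    printf '%s\n' "$1" | awk -F: '{
        printf "provider=%s module=%s function=%s name=%s\n",
            ($1 == "" ? "(any)" : $1),
            ($2 == "" ? "(any)" : $2),
            ($3 == "" ? "(any)" : $3),
            ($4 == "" ? "(any)" : $4)
    }'
}
```

For example, describe_probe ':acpi*::entry' reports any provider, modules matching acpi*, any function, and the "entry" probe name.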
So, list the probes related to the ACPI [2] modules:
# dtrace -n :acpi*:: -l | head
ID PROVIDER MODULE FUNCTION NAME
16160 fbt acpica AcpiEvFixedEventInitialize entry
16161 fbt acpica AcpiEvFixedEventInitialize return
16162 fbt acpica AcpiEvFixedEventDispatch entry
16163 fbt acpica AcpiEvFixedEventDispatch return
16164 fbt acpica AcpiEvAsynchExecuteGpeMethod entry
16165 fbt acpica AcpiEvAsynchExecuteGpeMethod return
16166 fbt acpica AcpiEvSaveMethodInfo entry
16167 fbt acpica AcpiEvSaveMethodInfo return
16168 fbt acpica AcpiEvMatchPrwAndGpe entry
Basically, fbt is the "function boundary tracing" provider, which traces "entry" and "return" events for almost all kernel functions.
Now trace entries into the ACPI-related functions, and for each "entry" probe count the number of times it fires, keyed by function name (the probefunc variable).
Also, exit dtrace after 5 seconds (the tick-5sec probe).
# dtrace -n "
:acpi*::entry { @[probefunc]=count() }
tick-5sec { exit(0) }
"
FUNCTION COUNT
---------------------- ------
AcpiOsGetThreadId 83507
AcpiUtGetMutexName 164032
AcpiNsGetNextValidNode 321200
AcpiUtDebugPrint 529397
AcpiNsGetNextNode 673705
AcpiUtTrackStackPtr 978657
These statistics show 978657 calls to the AcpiUtTrackStackPtr function in 5 seconds, which is far too many, so this looks like an ACPI event storm [3].
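The raw counts are easier to judge as rates. The following sketch divides each count by the sampling interval; the helper name calls_per_second is hypothetical, and it assumes the two-column "name count" layout printed above and a POSIX awk.

```shell
# Convert "FUNCTION COUNT" lines into whole calls per second.
# First argument: sampling interval in seconds (default 5, matching tick-5sec).
# calls_per_second is an illustrative helper name.
calls_per_second() {
    awk -v secs="${1:-5}" '
        # only lines whose second column is a plain number are data rows
        $2 ~ /^[0-9]+$/ { printf "%s %d\n", $1, $2 / secs }
    '
}
```

Fed the output above with an interval of 5, it reports roughly 195731 calls per second for AcpiUtTrackStackPtr.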
The OS must handle ACPI events; otherwise the hardware assumes that the OS has not received them and keeps sending more and more notifications.
This problem is fixed in recent releases; however, as an interim fix it is possible to switch ACPI off altogether.
Whether this fix is safe is an open question. However, the thermal monitor most likely storms because of a bug in the monitor itself rather than because of overheating, so switching it off most probably will not burn your machine up.
ACPI can be switched off either with the eeprom(1M) command:
# eeprom acpi-user-options=0x2
or by corresponding kernel parameter during boot, e.g.:
grub> kernel ... -B acpi-user-options=0x2
By the way, acpi-user-options=0 enables ACPI.
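As a small convenience, the two values mentioned in this note can be turned into a readable status. This is a sketch only: describe_acpi_option is a hypothetical helper, it knows just the two values discussed here and reports anything else as unknown, and it assumes the setting is passed either bare ("0x2") or in "name=value" form.

```shell
# Map an acpi-user-options value to a human-readable status.
# Only the two values covered in this note are recognised.
describe_acpi_option() {
    value=${1#*=}    # drop a leading "name=" part if present
    case "$value" in
        0|0x0) echo "ACPI enabled" ;;
        2|0x2) echo "ACPI disabled" ;;
        *)     echo "unknown value: $value" ;;
    esac
}
```

Assuming eeprom prints the parameter as "acpi-user-options=0x2", it could be called as describe_acpi_option "$(eeprom acpi-user-options)".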
References
[1] Solaris 10 Software Developer Collection: DTrace User Guide
[2] Advanced Configuration & Power Interface (ACPI)
[3] acpica: Metropolis SMB Alerts result in high background system load