Lockstat(1M) is the Solaris tool of choice for identifying contended kernel locks that need to be optimized to improve system efficiency and scalability. However, until recently, you could not easily use lockstat to quantify the impact of hot locks on system utilization, or to state with certainty that a hot lock is limiting your performance. This could lead to wasted effort spent fixing a lock, with no performance benefit. For example, here is the output of lockstat showing a hot lock:
# lockstat sleep 10 Adaptive mutex spin: 9227825 events in 10.302 seconds (895755 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------- 569235 6% 6% 0.00 17 0x6009a065000 timeout_common+0x4 ... |
This says that some thread(s) could not immediately acquire the mutex 569235 times, so they had to busy-wait, and took an average of 17 retries (spins) to acquire the mutex. In absolute terms, is this bad? You cannot tell. Higher spin is worse than lower spin, but you cannot easily map this to time, and further, not all spins are created equal because of the exponential backoff algorithm in the mutex implementation.
The good news is that we have enhanced lockstat to show the spin time in Solaris 10 10/08 and OpenSolaris 2008.11. Now the output of lockstat looks like this:
# lockstat sleep 10 Adaptive mutex spin: 9181733 events in 10.135 seconds (905943 events/sec) Count indv cuml rcnt nsec Lock Caller ------------------------------------------------------------------------- 557331 6% 6% 0.00 88454 0x60033047000 timeout_common+0x4 ... |
The new nsec column shows the average time spent busy-waiting and retrying to acquire a mutex. From this data, you can compute the CPU utilization spent busy-waiting using the formula:
-
CPU_util = (Count * nsec) / (elapsed * ncpu * 1e9)
The elapsed time is shown in the header above, and you need to know the number of CPUs on your platform. In this example, ncpu=256, so we have:
-
CPU_util = (557331 * 88454) / (10.135 * 256 * 1e9)
= .019
Thus, 1.9% of all CPU time on the system was spent busy-waiting to acquire this lock, and if we eliminate this lock, we can recover 1.9% of the CPU cycles and perhaps increase our throughput by that much.
I have written a handy script called lockutil to do this calculation for you. lockutil reads a file produced by lockstat, and computes the CPU utilization spent busy-waiting for every lock in the file, including adaptive mutexes, spin locks, and thread locks. It can parse output produced by all of lockstat's myriad command line options, except for the -I profiling mode, which is not relevant. You must supply the CPU count. Example:
# ... Start running a test ...
# lockstat -kWP sleep 30 > lockstat_kWP.out
# mpstat | wc -l
513 # includes one header line
# cat lockstat_kWP.out
Adaptive mutex spin: 1713116 events in 30.214 seconds (56700 events/sec)
Count indv cuml rcnt nsec Hottest Lock Caller
-------------------------------------------------------------------------------
1170899 68% 68% 0.00 11444141 pcf page_create_wait
202679 12% 80% 0.00 332996 pcf+0x380 page_free
86959 5% 85% 0.00 8840242 pcf page_reclaim
64565 4% 89% 0.00 25301 0x604ec13c650 ufs_lockfs_begin_getpage
55914 3% 92% 0.00 24010 0x604ec13c650 ufs_lockfs_end
29505 2% 94% 0.00 1284 0x70467650040 page_trylock
24687 1% 95% 0.00 1845 0x70467650040 page_unlock
...
#
# lockutil -n 512 lockstat_kWP.out
CPU-util Lock Caller
0.866 pcf page_create_wait
0.004 pcf+0x380 page_free
0.050 pcf page_reclaim
0.000 0x604ec13c650 ufs_lockfs_begin_getpage
0.000 0x604ec13c650 ufs_lockfs_end
0.000 0x70467650040 page_trylock
0.000 0x70467650040 page_unlock
0.920 TOTAL TOTAL
|
Ouch, busy-waiting for the pcf lock consumes 92% of the system! There's a lock that needs fixing if ever I saw one. We fixed it.
This begs the question: what can you do if you find a hot lock? If you are a kernel developer and the lock is in your code, then change your locking granularity or algorithm. If you are an application developer, then study the results carefully, because hot kernel locks sometimes indicate a non-scalable application architecture, such as too many processes using an IPC primitive such as a message queue or a fifo. Use the lockstat -s option to see the callstack for each hot lock and trace back to your system calls, or use the dtrace lockstat provider to aggregate lock spin time by system call made by your application. The lockstat provider has also been modified to report spin time rather than spin count in its probe arguments.
If you are an end-user, then search for the lock in the
support databases, to see if a fix or tuning advice is
available. A few resources are
http://sunsolve.sun.com
http://bugs.opensolaris.org
and public search
engines. For example a google search for "lockstat timeout_common"
finds this:
http://bugs.opensolaris.org/view_bug.do?bug_id=6311743
6311743 callout table lock contention in timeout and untimeout
which my group recently fixed, and will appear in coming Solaris
releases. Advice to folks who file CRs: to make these searches
more successful, include lockstat snippets in the publicly
readable sections of a CR, such as "Description" and "Public Comments".