bufhwm on large systems
Friday Jan 04, 2008
I was asked yesterday to look at a busy system with high system time. It's Solaris 9 on a large 25K configuration. This was the top of the lockstat -C -s 50 output.
-------------------------------------------------------------------------------
Count indv cuml rcnt spin Lock Hottest Caller
132614 59% 59% 1.00 199 blist_lock[8] bio_recycle+0x224
spin ------ Time Distribution ------ count Stack
1 | 186 bio_recycle+0x224
2 | 2335 bio_getfreeblk+0x4
4 | 4247 getblk_common+0x2bc
8 |@ 7190 bread_common+0x80
16 |@@ 11570 bmap_read+0x20c
32 |@@@@ 18285 ufs_directio_read+0x2e
64 |@@@@@ 25634 rdip+0x198
128 |@@@@@@ 28613 ufs_read+0x17c
256 |@@@@@ 22918 pread+0x28c
512 |@@ 9707
1024 | 1761
2048 | 157
4096 | 11
A bit of Solaris code reading led me from the stack above to question the value of bufhwm. I checked it again on docs.sun.com to really understand what this value does. It's the high water mark, in KB, of the memory allocated to buffers used for UFS indirect blocks, directories and other bits of metadata.
I went back to check some basic assumptions (always a good plan) and did an Explorer review. The following line is set in /etc/system:
set bufhwm=8000
I have no idea why it was set to 8000 on this system. I have seen it set many times on many systems and have never paid it much attention. 8000 is proposed in many places as a reasonable value. I must admit I have never needed to suggest that this value be tuned; my unconscious just assumed it was a good idea because common wisdom said so, and I never commented when other people tuned it.
By default this value would be 2% of memory. This system had > 200 GB, so the default would be around 4 GB. I expect 4 GB would waste some memory, but then it's a high water mark. 8 MB is far too small on this size of server, given that the buffer cache is used to store indirect blocks, directories, etc. from a set of filesystems near 2 TB!
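To make the arithmetic concrete, here is a minimal sketch of the comparison. It assumes the 2% default described above and uses 200 GB as a round lower bound for the memory size; the function name default_bufhwm_kb is made up for this illustration.

```python
def default_bufhwm_kb(physical_memory_gb, percent=2.0):
    """Return the default bufhwm in KB, assuming it is `percent` of physical memory."""
    memory_kb = physical_memory_gb * 1024 * 1024
    return int(memory_kb * percent / 100)


configured_kb = 8000   # from /etc/system: set bufhwm=8000
memory_gb = 200        # the post says "> 200 GB"; 200 is an assumed round figure

default_kb = default_bufhwm_kb(memory_gb)
ratio = default_kb / configured_kb

print(f"default bufhwm   : {default_kb} KB ({default_kb / 1024 / 1024:.1f} GB)")
print(f"configured bufhwm: {configured_kb} KB ({configured_kb / 1024:.1f} MB)")
print(f"the default is roughly {ratio:.0f}x the configured value")
```

On those assumptions the configured high water mark is several hundred times smaller than what the system would have picked for itself.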
We can observe whether buffer recycling is causing an issue using the following:
echo "bfreelist$<buf" | mdb -k
echo "v::print -t struct var" | mdb -k
kstat -p -n biostats
and sar -b might also give some insight.
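As a rough aid for reading the kstat output, the sketch below parses text in the name/value layout that kstat -p prints and derives a buffer cache hit rate. The counter values are invented for illustration and are not from the system discussed here. On a real host you would paste in the output of kstat -p -n biostats instead. The helper names parse_kstat and summarize are hypothetical.

```python
# Made-up sample in the "module:instance:name:statistic<TAB>value" layout
# that kstat -p prints. These numbers are NOT from the system in this post.
SAMPLE_KSTAT_OUTPUT = """\
unix:0:biostats:buffer_cache_hits\t9800000
unix:0:biostats:buffer_cache_lookups\t10000000
unix:0:biostats:buffers_locked_by_someone\t1200
unix:0:biostats:class\tmisc
unix:0:biostats:crtime\t60.5
unix:0:biostats:duplicate_buffers_found\t30
unix:0:biostats:new_buffer_requests\t250000
unix:0:biostats:snaptime\t123456.7
unix:0:biostats:waits_for_buffer_allocs\t4000
"""


def parse_kstat(text):
    """Map each statistic name to its integer value, skipping non-integer fields."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts
        key = name.rsplit(":", 1)[-1]
        try:
            stats[key] = int(value)
        except ValueError:
            continue  # e.g. class, crtime, snaptime
    return stats


def summarize(stats):
    """Return the hit rate plus the counters that hint at buffer recycling pressure."""
    lookups = stats["buffer_cache_lookups"]
    hits = stats["buffer_cache_hits"]
    hit_rate = 100.0 * hits / lookups if lookups else 0.0
    return {
        "hit_rate_pct": hit_rate,
        "new_buffer_requests": stats["new_buffer_requests"],
        "waits_for_buffer_allocs": stats["waits_for_buffer_allocs"],
    }


stats = parse_kstat(SAMPLE_KSTAT_OUTPUT)
summary = summarize(stats)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A low hit rate together with a growing waits_for_buffer_allocs count between two samples would be my cue to look harder at bufhwm, though what counts as "low" is a judgment call rather than a fixed threshold.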
So the morals to repeat to myself include:
- Turn off your unconscious mind when examining /etc/system. Don't assume any /etc/system setting is valid
- Never carry /etc/system tunables forward
- Put a comment in /etc/system if you set a value based on an attribute, like memory size, that has the potential to change, citing the assumption.
Various customers I have visited over the years put comments of the following form in /etc/system:
# clive.king@sun.com 4/1/2008
# bufhwm value of 8000 assumes a memory size of 4gb and 600GB of UFS filesystem. revisit if size changes
# Check with kstat -p -n biostats before changing
set bufhwm=8000
At least if something goes wrong, then I can be emailed in capital letters.
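The commenting habit can also be checked mechanically. The following is only a sketch: it assumes comment lines in /etc/system begin with * or #, treats a setting as documented when the line directly above it is a comment, and runs against a made-up fragment rather than a real file. The function name uncommented_settings and the maxphys value are illustrative.

```python
# Hypothetical /etc/system fragment for illustration only.
SAMPLE_ETC_SYSTEM = """\
* Example fragment, not taken from a real system
# clive.king@sun.com 4/1/2008
# bufhwm value of 8000 assumes a memory size of 4gb and 600GB of UFS filesystem.
# Check with kstat -p -n biostats before changing
set bufhwm=8000

set maxphys=1048576
"""


def uncommented_settings(text):
    """Return the 'set' lines that are not immediately preceded by a comment line."""
    flagged = []
    previous_was_comment = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("*", "#")):
            previous_was_comment = True
        elif stripped.startswith("set "):
            if not previous_was_comment:
                flagged.append(stripped)
            previous_was_comment = False
        else:
            # Blank or other lines break the link between a comment and a setting.
            previous_was_comment = False
    return flagged


flagged = uncommented_settings(SAMPLE_ETC_SYSTEM)
for setting in flagged:
    print(f"no explanatory comment above: {setting}")
```

Anything it flags is a candidate for the "why is this here?" conversation before the setting gets carried forward to the next system.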










