Friday April 01, 2005
Memory page coalescing update and Solaris 10
Well, some of you may remember my December technical brief
about Solaris memory page coalescing on high-end servers.
If you don't know what I'm talking about, feel free to send me an email at
benoit@sun.com and I will give you the link.
Since that technical brief, I have received a lot of requests on this topic
that I would like to answer today :
--> Request 1 : What is the list of issues linked to this one ?
Here is the list with bugIds so you have a complete picture :
4802594 - Idle loop degrades IO performance on large psets
5059920 - Idle loop is not scalable on large systems
5054052 - disp_getwork() is greedy and negatively impacts dispatch latency
5050686 - Solaris mutexes should be made more efficient under contention
5095432 - Oracle startup takes too long due to memory fragmentation
5046939 - kcage_freemem grows too large when large ISM segments assigned on SF15k
4904187 - page_freelist_coalesce() holds the page freelist locks for too long
--> Request 2 : You mentioned that the fixes for these issues are available in
IDRs for Solaris 8 and Solaris 9. What is the status of the patches ?
Good news here. Solaris 8 patches have just been released. The fixes for these issues
are available in the following kernel update patches (KUP) :
         Solaris 8    Solaris 9
Sparc    117350-23    117171-17
x86      117351       117172
Now, you are ready to ask me : what about Solaris 10 ?
The answer for Solaris 10 (and Nevada) is : in progress...
Last month I tested some of these issues on Solaris 10, and while the
problems are still there (the page_freelist_coalesce() routine is in the common Solaris
code), the impact is much, much lower. As an example, the 10G Oracle startup testcase we
built took 50s on a normal system. With no 4M pages available, it took up to 2 hours on
Solaris 8, up to 15 minutes on Solaris 9 and up to 3 minutes on Solaris 10.
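To put those timings on one scale, here is a small sketch that expresses each worst case as a multiple of the 50s baseline. The helper name slowdown is mine (it is not part of any tool), and I simply take the "up to" figures above as upper bounds converted to seconds.

```shell
# slowdown <seconds> <baseline_seconds>
# Prints how many times slower than the baseline, with one decimal,
# using integer arithmetic only (hypothetical helper, for illustration).
slowdown() {
    tenths=$(( $1 * 10 / $2 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

baseline=50    # 10G Oracle startup on a normal system, in seconds

echo "Solaris 8  : $(slowdown 7200 $baseline)x"    # up to 2 hours
echo "Solaris 9  : $(slowdown 900 $baseline)x"     # up to 15 minutes
echo "Solaris 10 : $(slowdown 180 $baseline)x"     # up to 3 minutes
```

So the worst case goes from roughly 144 times the baseline on Solaris 8, to 18 times on Solaris 9, to under 4 times on Solaris 10.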
--> Request 3 : It is very complicated to get a memory picture on our system.
vmstat or sar data are not detailed enough. Can you help ?
No need here for complex packaged tools. The best kept secret of Solaris is
the numerous options of mdb. So if you write a little script like :
#!/bin/ksh
#
# Displaying the memory map...
#
echo Browsing memory...
echo
mdb -k 2>/dev/null <<!
::memstat
!
echo
date
You will get this output :
Browsing memory...
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      36480               285    1%
Anon                        12891               100    0%
Exec and libs                5106                39    0%
Page cache                 208799              1631    5%
Free (cachelist)           139913              1093    3%
Free (freelist)           3780231             29533   90%

Total                     4183420             32682
Physical                  4116397             32159
Fri Apr 1 10:18:16 PST 2005
Cool !
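If you want to track a single number over time rather than read the whole table, a small filter can pull one row out of that output. This is only a sketch: memstat_mb is a name I made up, and it assumes the row labels and column order are exactly the ones shown above.

```shell
# memstat_mb <row label>
# Reads ::memstat output on stdin and prints the MB column of one row.
# Rows with a %Tot column have MB as the next-to-last field;
# the Total and Physical rows have it as the last field.
memstat_mb() {
    grep "^$1 " | awk '{ if ($NF ~ /%$/) print $(NF-1); else print $NF }'
}

# Sample rows copied from the output above, so this can be tried without mdb.
sample='Kernel 36480 285 1%
Free (cachelist) 139913 1093 3%
Free (freelist) 3780231 29533 90%
Total 4183420 32682'

echo "$sample" | memstat_mb "Free (freelist)"
echo "$sample" | memstat_mb "Total"
```

On a live system (as root), something like echo ::memstat | mdb -k | memstat_mb "Free (freelist)" should give the same kind of result, provided the labels match.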
--> Request 4 : Solaris 10 provides updated memory structures and the page freelist is now available. Can we use it to get the amount of free 4M pages ?
This request came last week from the VOS escalation team. And the answer is : yes, but it requires a close look at how page_freelists is implemented to get the right number.
We worked on this question with my good friend Mike C. in December and here is
the updated script for Solaris 10 (yes, mdb again) :
#!/bin/ksh
#
# Walking the page_freelist in Solaris 10 to get the amount of 4M pages...
#
mdb -k 1>/tmp/1 2>&1 <<!
page_freelists+30::array uintptr_t 1 | \
::print uintptr_t | ::array uintptr_t 0t18 | \
::print uintptr_t | ::array uintptr_t 0t2 | \
::print uintptr_t | ::grep ".!=0" | ::list page_t p_vpnext
!
grep -v failed /tmp/1 | wc -l
And on my v490, I have :
v490 # ./4m_s10.sh
5168
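To turn that count into a size, multiply by the 4M page size. A tiny sketch, with a helper name (pages_4m_to_mb) that is mine and the assumption that every line counted by the script is one free 4M page:

```shell
# pages_4m_to_mb <count>
# Size in MB of <count> free 4M pages (hypothetical helper, for illustration).
pages_4m_to_mb() {
    echo $(( $1 * 4 ))
}

count=5168    # the number reported on my v490 above
echo "$count free 4M pages = $(pages_4m_to_mb $count) MB"
```

For the v490 above, that is 20672 MB available as 4M pages.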
That's it for now...
Apr 01 2005, 10:51:39 AM PST