Friday April 01, 2005

Memory page coalescing update and Solaris 10


Well, some of you may remember my December technical brief
about Solaris memory page coalescing on high-end servers.
If you don't know what I'm talking about, feel free to send me an email at
benoit@sun.com and I will give you the link.

Since that technical brief, I have received a lot of requests on this topic
that I would like to answer today :

--> Request 1 : What is the list of issues linked to this one ?

Here is the list with bugIds so you have a complete picture :

4802594 - Idle loop degrades IO performance on large psets
5059920 - Idle loop is not scalable on large systems
5054052 - disp_getwork() is greedy and negatively impacts dispatch latency
5050686 - Solaris mutexes should be made more efficient under contention
5095432 - Oracle startup takes too long due to memory fragmentation
5046939 - kcage_freemem grows too large when large ISM segments assigned on SF15k
4904187 - page_freelist_coalesce() holds the page freelist locks for too long



--> Request 2 : You mentioned that the fixes for these issues are available in
IDRs for Solaris 8 and Solaris 9. What is the status of the patches ?

Good news here. Solaris 8 patches have just been released. The fixes for these issues
are available in the following kernel update patches (KUP) :

            Solaris 8        Solaris 9
SPARC       117350-23        117171-17
x86         117351           117172
            
Now, you are ready to ask me : what about Solaris 10 ?
The answer for Solaris 10 (and Nevada) is : in progress....
Last month I tested some of these issues on Solaris 10 and, while the
problems are still there (the page_freelist_coalesce() routine is in the common Solaris
code), the impact is much, much lower. As an example, the 10G Oracle startup testcase we
built took 50s on a normal system. With no 4M pages available, it took up to 2 hours on
Solaris 8, up to 15 minutes on Solaris 9 and up to 3 minutes on Solaris 10.
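
To put those numbers side by side, here is a small sketch that turns each worst-case startup time into a slowdown factor against the 50s baseline. The function name slowdown is just something I made up for the illustration, and the inputs are the "up to" figures quoted above converted to seconds :

```shell
#!/bin/sh
#
# Slowdown factor of an observed startup time versus a baseline.
# "slowdown" is a hypothetical helper name, not a Solaris tool.
#   $1 = observed time in seconds
#   $2 = baseline time in seconds
#
slowdown() {
    awk -v t="$1" -v base="$2" 'BEGIN { printf "%.1f\n", t / base }'
}

# Worst-case figures from the testcase, in seconds, against the 50s baseline.
echo "Solaris 8  : $(slowdown 7200 50)x"
echo "Solaris 9  : $(slowdown 900 50)x"
echo "Solaris 10 : $(slowdown 180 50)x"
```

So the worst case is roughly 144 times the baseline on Solaris 8, 18 times on Solaris 9 and 3.6 times on Solaris 10.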

--> Request 3 : It is very complicated to get a memory picture on our system.
    vmstat or sar data are not detailed enough. Can you help ?

No need here for complex packaged tools. The best kept secret of Solaris is
the numerous options of mdb. So if you write a little script like :

#!/bin/ksh
#
# Displaying the memory map...
#
echo Browsing memory...
echo
mdb -k 2>/dev/null <<!
::memstat
!
echo
date

You will get this output :

    Browsing memory...

    Page Summary                Pages                MB  %Tot
    ------------     ----------------  ----------------  ----
    Kernel                      36480               285    1%
    Anon                        12891               100    0%
    Exec and libs                5106                39    0%
    Page cache                 208799              1631    5%
    Free (cachelist)           139913              1093    3%
    Free (freelist)           3780231             29533   90%

    Total                     4183420             32682
    Physical                  4116397             32159

    Fri Apr  1 10:18:16 PST 2005
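
If you only care about the free memory, a few lines of awk can add up the two "Free" rows for you. This is only a sketch : free_summary is a name I picked for the example, and it assumes the ::memstat column layout shown above (page count and MB just before the percentage column), which may differ on other releases.

```shell
#!/bin/sh
#
# Sum the "Free (cachelist)" and "Free (freelist)" rows of ::memstat
# output read from stdin. "free_summary" is a hypothetical helper name.
# Assumes each row ends with : <pages> <MB> <%Tot>
#
free_summary() {
    awk '$1 == "Free" { pages += $(NF-2); mb += $(NF-1) }
         END { printf "free pages: %d, free MB: %d\n", pages, mb }'
}

# Demo on the two "Free" rows from the output above.
free_summary <<EOF
Free (cachelist)           139913              1093    3%
Free (freelist)           3780231             29533   90%
EOF
```

On the sample above this prints "free pages: 3920144, free MB: 30626". On a live system you would pipe the output of the memory map script into free_summary instead of the here-document.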


Cool !

--> Request 4 : Solaris 10 provides updated memory structures and the page freelist is now available. Can we use it to get the number of free 4M pages ?

This request came last week from the VOS escalation team. And the answer is : yes, but it requires a close look at how page_freelists is implemented to get the right number.
I worked on this question with my good friend Mike C. in December and here is
the updated script for Solaris 10 (yes, mdb again) :

#!/bin/ksh
#
# Walking the page_freelist in Solaris 10 to get the amount of 4M pages...
#
mdb -k 1>/tmp/1 2>&1 <<!
page_freelists+30::array uintptr_t 1 | \
::print uintptr_t | ::array uintptr_t 0t18 | \
::print uintptr_t | ::array uintptr_t 0t2 | \
::print uintptr_t | ::grep ".!=0" | ::list page_t p_vpnext
!
grep -v failed /tmp/1 | wc -l

And on my v490, I have :

v490 # ./4m_s10.sh
    5168
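
To read that count as an amount of memory, multiply by the page size. A tiny sketch, assuming each entry counted by the script really is one free 4M page (pages_4m_to_mb is a name invented for the example) :

```shell
#!/bin/sh
#
# Convert a count of 4M pages into megabytes.
# "pages_4m_to_mb" is a hypothetical helper name.
#   $1 = number of 4M pages
#
pages_4m_to_mb() {
    echo $(( $1 * 4 ))
}

# The count reported on the v490 above.
echo "$(pages_4m_to_mb 5168) MB available as free 4M pages"
```

For the 5168 pages above, that gives 20672 MB.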

That's it for now...

Apr 01 2005, 10:51:39 AM PST