
Tuesday June 05, 2007
getting all your heap into 4mbyte pages using mpss.so.1 or ppgsz
A couple of times recently people have asked how to get mpss to supply
large pages for all the heap of their application.
Having read the mpss.so.1 man page, they do the following:
LD_PRELOAD=$LD_PRELOAD:mpss.so.1
export LD_PRELOAD
MPSSHEAP=4m
export MPSSHEAP
sleep 20 &
pmap -xs $! | grep heap
00024000       8       8       8       -   8K rwx--    [ heap ]
00026000    3944       -       -       -    - rwx--    [ heap ]
So why is the heap not made from 4Mbyte pages? The NOTES section of the
ppgsz man page hints that the default heap alignment is not suitable for
a 4M page. So you either wait for the heap to grow to a 4Mb boundary,
after which you get 4Mb pages but nearly 3 Mb of your heap is in 8k pages
(4k on x86), or you recompile your application with a mapfile that
specifies the required alignment.
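The size of that small-page region is simple address arithmetic, which can be sketched in C. This is only an illustration: `gap_to_boundary` is a name made up for this example and is not part of mpss.so.1, ppgsz or any Solaris library.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes between addr and the next
 * multiple of align. Returns 0 if addr is already aligned.
 * align must be a non-zero power of two (e.g. 4M).
 */
uint64_t
gap_to_boundary(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((align - (addr & (align - 1))) & (align - 1));
}
```

For the sleep process above, whose heap starts at 0x00024000, `gap_to_boundary(0x24000, 4096ull * 1024)` gives 0x3DC000 bytes, which is 3952K and matches the 8K + 3944K of heap that pmap reports below the 4M boundary.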
But if you are prepared to lose about 4Mbytes of address space you can
do the following. Note this is primarily for sparc.
1) Compile yourself a small preload library that mallocs up to 4Mbytes
in its _init() routine. This will waste up to 4M of your process's
virtual address space (out of about 3.8GB for a 32 bit process, vast for
a 64 bit process). You can then use madvise to throw away the mappings
in that 4M range so that the physical pages are quickly reused.
In my example the source and Makefile live in /home/timu/src/preload.
The Makefile consists of:
heap: heap.c
	cc -g -G -Kpic -o libs/heap.so heap.c
	cc -xarch=v9 -g -G -Kpic -o libs/sparcv9/heap.so heap.c
And here is heap.c. It is supplied as a working example and is not an
example of good programming style ;-)
#include <sys/types.h>
#include <dlfcn.h>
#include <stdlib.h>
#include <sys/mman.h>

/* run init() when the library is loaded */
#pragma init (init)

#define roundup(x, rnd) ((((x) + ((rnd) - 1)) / (rnd)) * (rnd))

static void
init()
{
	uint64_t base;
	/* first get an idea of our current heap addresses */
	char *a = malloc(1);
	if (a == NULL) {
		return;
	}
	/* nothing to do if the heap is already at or past 4Mb */
	if ((uint64_t)(uintptr_t)a >= 4096ull * 1024) {
		return;
	}
	/* work out how much to get that to a 4Mb boundary */
	uint64_t sz = (4096ull * 1024) - (uint64_t)(uintptr_t)a;
	/* round it up to next 8k */
	sz = roundup(sz, 8192ull);
	/* grab the space to the 4mb boundary */
	base = (uint64_t)(uintptr_t)malloc(sz);
	if (base == 0) {
		return;
	}
	/* start from the first 8k page boundary inside the block */
	base = roundup(base, 8192ull);
	/* then throw away all those pages */
	(void)madvise((char *)(uintptr_t)base, sz - 8192, MADV_FREE);
}
2) Set up the mpss and preload library environment variables thus:
MPSSHEAP=4m
LD_PRELOAD=mpss.so.1:heap.so
LD_LIBRARY_PATH_64=/home/timu/src/preload/libs/sparcv9
LD_LIBRARY_PATH=/home/timu/src/preload/libs
export LD_PRELOAD LD_LIBRARY_PATH_64 LD_LIBRARY_PATH MPSSHEAP
cmd arg
Or use ppgsz thus:
LD_PRELOAD=heap.so
LD_LIBRARY_PATH_64=/home/timu/src/preload/libs/sparcv9
LD_LIBRARY_PATH=/home/timu/src/preload/libs
export LD_PRELOAD LD_LIBRARY_PATH_64 LD_LIBRARY_PATH
ppgsz -o heap=4m cmd args
3) Now see what happens.
firefox before libraries:
estale ksh: pgrep -u timu firefox
16707
estale ksh: pmap -xs 16707 | grep heap
00048000      32      32      32       -   8K rwx--    [ heap ]
00050000      24       -       -       -    - rwx--    [ heap ]
00056000       8       8       8       -   8K rwx--    [ heap ]
00058000      24       -       -       -    - rwx--    [ heap ]
0005E000       8       8       8       -   8K rwx--    [ heap ]
00060000      24       -       -       -    - rwx--    [ heap ]
00066000    3688    3688    3688       -   8K rwx--    [ heap ]
00400000   12288   12288   12288       -   4M rwx--    [ heap ]
We can see that Nevada's aggressive use of large pages has already put
the last chunk in 4Mbyte pages, but there are 3688k of 8k pages. So let's
see what happens when we use our libraries:
estale ksh: pgrep -u timu firefox
17229
estale ksh: pmap -xs 17229 | grep heap
00048000    3808       8       -       -    - rwx--    [ heap ]
00400000   16384   16384   16384       -   4M rwx--    [ heap ]
C41B0000       8       8       -       -   8K r-x--  heap.so
C41C0000       8       8       8       -   8K rwx--  heap.so
So we have 3808k of essentially empty heap (only 8k of it resident) and
then 16384k in 4 x 4M pages, with all the active heap in the big pages.
Hopefully this reduces the number of TLB misses and increases the ratio
of useful instructions per cycle for the program.
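As a rough illustration of why fewer TLB misses are plausible: each TLB entry maps one page, so the number of entries needed to cover a region is its size divided by the page size. A minimal sketch follows; `pages_needed` is a hypothetical name for this example, and no real TLB sizes are assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages (and so TLB entries) needed to
 * map `bytes` of memory with pages of `pagesize` bytes, rounding up.
 */
uint64_t
pages_needed(uint64_t bytes, uint64_t pagesize)
{
	assert(pagesize != 0);
	return ((bytes + pagesize - 1) / pagesize);
}
```

For the 16384k of active heap above, `pages_needed(16384ull * 1024, 8192)` is 2048 with 8k pages, while `pages_needed(16384ull * 1024, 4096ull * 1024)` is 4 with 4M pages.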
Posted by Roland Mainz on June 10, 2007 at 01:13 AM GMT+00:00 #
Posted by tim uglow on June 10, 2007 at 03:45 PM GMT+00:00 #