Tuesday Jun 14, 2005

Dynamic segkp for 32-bit x86 systems
OpenSolaris is here, as promised. We are all really excited about that. Now everyone who is
interested can look at the source and understand how Solaris works.

To assist with that, I would like to describe a change I introduced that concerns a
difference in the virtual memory layout between 32-bit and 64-bit x86 systems. If you look
at the virtual memory layout in startup.c for x86, you will notice that 'segkp' is not a
separate segment on 32-bit systems, whereas it is a separate segment on 64-bit systems.

/*
 *              32-bit Kernel's Virtual memory layout.
 *              +-----------------------+
 *              .
 *              .
 * 0xFDFFE000  -|-----------------------|- ekernelheap, ptable_va
 *              |                       |  (segkp is an arena under the heap)
 *              |                       |
 *              |         kvseg         |
 *              |                       |
 *              |                       |
 * ---         -|-----------------------|- kernelheap (floating)
 *              |        Segkmap        |
 * 0xC3002000  -|-----------------------|- segkmap_start (floating)
 *              |       Red Zone        |
 * 0xC3000000  -|-----------------------|- kernelbase / userlimit (floating)
 *              .
 *              .
 *              .
 * 0x08048000  -|-----------------------|
 *              |      user stack       |
 *              :                       :
 *              |        invalid        |
 * 0x00000000   +-----------------------+
 */


On 32-bit x86 systems, the virtual address space is shared between user and kernel space
for performance reasons. The kernel address space is therefore small, and as a result
32-bit x86 systems can run into memory exhaustion problems, mainly in the kernel heap.
See Kit Chow's blog entry on memory exhaustion issues.

In an attempt to provide a little more breathing space for the kernel heap, I was searching
for ways to free up some kernel address space and looked at the segkp segment. It was not
being used effectively on 32-bit systems, where we are tight on kernel address space.

The segkp segment is used for allocating pageable kernel memory. The kernel stacks for
threads are allocated from the segkp segment. One necessary characteristic of this segment
driver is that it provides a redzone, which is placed at the end of the stack to protect
against stack overflows. It also provides non-pageable memory.
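
To make the redzone idea concrete, here is a minimal userland sketch. It is not the segkp
code. It assumes a POSIX system with mmap and mprotect, and it assumes the stack grows
downward, so the redzone is the lowest page of the mapping. The names guarded_stack_t,
guarded_stack_alloc and guarded_stack_free are made up for this illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userland stand-in for a stack with a redzone below it. */
typedef struct guarded_stack {
    void   *base;        /* start of the mapping; this page is the redzone */
    void   *usable;      /* first usable byte, one page above base */
    size_t  usable_len;  /* bytes that may be read and written */
    size_t  total_len;   /* redzone plus usable bytes */
} guarded_stack_t;

/* Returns 0 on success, -1 on failure. */
int guarded_stack_alloc(guarded_stack_t *gs, size_t usable_pages)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (gs == NULL || pagesize <= 0 || usable_pages == 0)
        return -1;

    size_t total = (usable_pages + 1) * (size_t)pagesize;

    /* Reserve the whole range with no access rights at all. */
    void *base = mmap(NULL, total, PROT_NONE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Open up everything except the lowest page, which stays the redzone. */
    char *usable = (char *)base + pagesize;
    if (mprotect(usable, total - (size_t)pagesize,
        PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return -1;
    }

    gs->base = base;
    gs->usable = usable;
    gs->usable_len = total - (size_t)pagesize;
    gs->total_len = total;
    return 0;
}

void guarded_stack_free(guarded_stack_t *gs)
{
    munmap(gs->base, gs->total_len);
}
```

A stack that overflows past gs->usable touches the inaccessible page and faults right away,
instead of silently overwriting whatever happens to live below it.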

The default segkp segment size was a whopping 200MB. Yes, that is quite large when the
kernel heap size can be less than 500MB.

In most cases on 32-bit x86 systems, the segkp segment space is not fully used, while
the system runs short of kernel heap space. On the other hand, a small segkp size
limits the number of threads that can be created on a system, because each thread
that is created requires a kernel stack, which is allocated from the segkp segment.
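
The trade-off is easy to put into numbers. The sketch below computes an upper bound on the
number of kernel stacks that fit in a segkp of a given size. The function name
max_kernel_stacks is made up, and the per-thread figures in the usage note (4KB pages, a
two-page stack and a one-page redzone) are assumptions for illustration, not values taken
from the Solaris source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on the number of kernel stacks that fit in segkp_bytes,
 * when each thread needs stack_pages plus redzone_pages of address space.
 * All parameters are supplied by the caller; none are Solaris constants.
 */
uint64_t max_kernel_stacks(uint64_t segkp_bytes, uint64_t page_size,
    uint64_t stack_pages, uint64_t redzone_pages)
{
    assert(page_size > 0 && stack_pages > 0);

    uint64_t per_thread = (stack_pages + redzone_pages) * page_size;
    return segkp_bytes / per_thread;
}
```

With the assumed figures, max_kernel_stacks(200ULL << 20, 4096, 2, 1) returns 17066, so a
fixed 200MB segkp reserves address space for roughly seventeen thousand stacks whether or
not the system ever creates that many threads.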

The solution, therefore, was to make segkp dynamic and combine its space with the kernel
heap. The system can then use the limited kernel address space more effectively: the heap
is larger, and segkp consumes only as much virtual address space as it actually needs.

So, how was this done? Well, the vmem allocator in Solaris makes this simple to implement.
The vmem allocator is used to manage virtual address space in the kernel. I will not go
into how the vmem allocator works, as that is beyond the scope of this blog entry. Since we
wanted to eliminate the segkp segment and add the freed space to the kernel heap, all that
was needed was to make segkp a subset of the heap_arena instead of a separate segment. Now
segkp imports memory from its source, the heap_arena, dynamically as and when required.
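
For readers who have not met the idea of one arena importing from another, here is a toy
model. It is not the vmem allocator and does not use its API: toy_heap_t, toy_arena_t and
both functions are made up for this sketch, there is no free path, and the unused tail of
an old span is simply abandoned when a new span is imported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy parent arena: hands out address ranges from [next, end) by bumping. */
typedef struct toy_heap {
    uintptr_t next;
    uintptr_t end;
} toy_heap_t;

/* Toy child arena: starts empty and imports spans from its source. */
typedef struct toy_arena {
    toy_heap_t *source;       /* where spans are imported from */
    size_t      import_size;  /* minimum bytes imported at a time */
    uintptr_t   span_next;    /* next free address in the current span */
    uintptr_t   span_end;     /* end of the current span */
    size_t      imported;     /* total bytes imported so far */
} toy_arena_t;

/* Returns the start address of the range, or 0 if the heap is exhausted. */
uintptr_t toy_heap_alloc(toy_heap_t *h, size_t size)
{
    if (size == 0 || h->end - h->next < size)
        return 0;
    uintptr_t addr = h->next;
    h->next += size;
    return addr;
}

/* Returns the start address of the range, or 0 on failure. */
uintptr_t toy_arena_alloc(toy_arena_t *a, size_t size)
{
    if (size == 0)
        return 0;

    /* Current span cannot satisfy the request: import a new one. */
    if (a->span_end - a->span_next < size) {
        size_t want = size > a->import_size ? size : a->import_size;
        uintptr_t span = toy_heap_alloc(a->source, want);
        if (span == 0)
            return 0;
        a->span_next = span;
        a->span_end = span + want;
        a->imported += want;
    }

    uintptr_t addr = a->span_next;
    a->span_next += size;
    return addr;
}
```

The point to notice is that the child arena takes nothing from the parent until its first
allocation, and afterwards takes only what demand requires. Everything else stays available
to other users of the parent.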

Here is the part of the code change in seg_kp.c that makes segkp a subset of the heap_arena
(bug 4983788).

        /*
         * Allocate the virtual memory for segkp and initialize it
         */
        if (segkp_fromheap) {
                /* segkp lives inside the heap: one bitmap bit per kvseg page. */
                np = btop(kvseg.s_size);
                segkp_bitmap = kmem_zalloc(BT_SIZEOFMAP(np), KM_SLEEP);
                /* No initial span; the arena imports from heap_arena. */
                kpsd->kpsd_arena = vmem_create("segkp", NULL, 0, PAGESIZE,
                    vmem_alloc, vmem_free, heap_arena, 5 * PAGESIZE, VM_SLEEP);
        } else {
                /* segkp is a separate segment: the arena spans all of it. */
                segkp_bitmap = NULL;
                np = btop(seg->s_size);
                kpsd->kpsd_arena = vmem_create("segkp", seg->s_base,
                    seg->s_size, PAGESIZE, NULL, NULL, NULL, 5 * PAGESIZE,
                    VM_SLEEP);
        }

With this change, segkp is dynamic on 32-bit x86 systems while retaining all of the
characteristics described above. There is no longer a separate segkp segment: the space it
originally occupied is now merged into the kernel heap, making the heap larger by 200MB.
The vmem allocator provides the necessary caching.
