Jeff Bonwick's Blog |
Tuesday Jun 14, 2005
Now It Can Be Told
For thirty years, people have speculated on the identity of Deep Throat. Last week the mystery was finally revealed: the man called Deep Throat was not John Dean, not G Gordon Liddy, not Al Haig -- no, it was none other than Mark Felt. That's right, Mark Felt. I'm sorry -- who? Who the f*** is Mark Felt? Felt, F-E-L-T? Let me think for a minute... Felt... Felt... nope, never heard of him. Ever. You've got to be kidding me. I've waited thirty years for this? Well, if you think that revelation was a letdown, just keep scrolling: because while the chattering classes in Washington were playing the Deep Throat parlor game, an even less compelling question was nagging at far fewer people: where did the slab allocator get its name? About 20 years ago Kellogg's ran a commercial for Special K cereal with the tag line, "Can you pinch an inch?" The implication was that you were overweight if you could pinch more than an inch of fat on your waist -- and that hoovering a bowl of corn flakes would help. Back in high school I was watching TV with some friends, and this commercial came on. The voice intoned: "Can you pinch an inch?" Without missing a beat, Tommy, who weighed about 250 pounds, reached for his midsection and offered his response: "Hell, I can grab a slab!" Fast forward 10 years. I was debugging a heap corruption bug in the Solaris 2.2 kernel. The kernel had actually crashed in the allocator itself, as often happens when something corrupts the heap. Trying to disentangle the state, I had the crash dump in one window and the source code for the Solaris 2.2 kernel memory allocator in another. The code had all the hallmarks of what we in the industry call "swill": mysterious hard-coded constants, global locks, fixed-size arrays, unions, macros that implicitly reference global state, and variable names that are at once long and uninformative. A small taste:
mutex_enter(&kma_lock);
if ((listp->fl_slack >= 2) || (tmpsiz == max)) {
listp->fl_slack -= 2;
HEADFREE(listp, bufp);
if (tmpsiz < max)
bufp->fb_union.fb_state = DELAYED;
else {
LOOKUP(addr, bufp->fb_union.fb_poolp);
if (--(bufp->fb_union.fb_poolp->bp_inuse) == 0 &&
bufp->fb_union.fb_poolp != Km_NewSmall &&
bufp->fb_union.fb_poolp != Km_NewBig) {
if (Km_IdleI < NIDLEP) {
if ((bufp->fb_union.fb_poolp->bp_status &
BP_IDLE) == 0) {
Km_Idlep[Km_IdleI++] = ...
and on and on it went. All of this might have been tolerable if the allocator performed well, or was space-efficient, or had some other redeeming virtue. But no. It was big, fat, slow, and had to die. I went home that night determined to burn it to the ground. (The allocator, not my home.) The basic idea for the new allocator was to grab a page of memory, carve it up into equal-size chunks, and use a reference count to keep track of how many chunks were allocated. When the reference count went to zero, the page could be freed. That worked fine for small buffers (say 1/8 of a page or smaller), but wouldn't quite work for larger buffers: those would require more than one page. So I couldn't just grab a page; I'd have to grab a ... something. "Grab" must have been a mental trigger: all of a sudden I remembered Tommy, and how much I loved that line, and that sealed it: we would grab a slab, and this would be called the slab allocator. Now you know. And Mark Felt is looking pretty dang interesting. Technorati Tag: OpenSolaris Technorati Tag: Solaris Posted at 12:00PM Jun 14, 2005 by bonwick in Slab Allocator | Comments[3] |
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Posted by RJ Davis on September 10, 2005 at 08:27 AM PDT #
Posted by kimo on October 03, 2005 at 10:47 PM PDT #
http://www.dragonflybsd.org/
http://www.shiningsilence.com/dbsdlog/archives/000004.html
Following this point is Matt Dillon's explanation of the Slab Allocator, taken right from the cvs entry:
SLAB ALLOCATOR FEATURES
The slab allocator breaks allocations up into approximately 80 zones based on their size. Each zone has a chunk size (alignment). For example, all allocations in the 1-8 byte range will allocate in chunks of 8 bytes. Each size zone is backed by one or more blocks of memory. The size of these blocks is fixed at ZoneSize, which is calculated at boot time to be between 32K and 128K. The use of a fixed block size allows us to locate the zone header given a memory pointer with a simple masking operation.
The slab allocator operates on a per-cpu basis. The cpu that allocates a zone block owns it. free() checks the cpu that owns the zone holding the memory pointer being freed and forwards the request to the appropriate cpu through an asynchronous IPI. This request is not currently optimized but it can theoretically be heavily optimized ('queued') to the point where the overhead becomes inconsequential. As of this commit the malloc_type information is not MP safe, but the core slab allocation and deallocation algorithms, non-inclusive the having to allocate the backing block, ARE MP safe. The core code requires no mutexes or locks, only a critical section.
Each zone contains N allocations of a fixed chunk size. For example, a 128K zone can hold approximately 16000 or so 8 byte allocations. The zone is initially zero'd and new allocations are simply allocated linearly out of the zone. When a chunk is freed it is entered into a linked list and the next allocation request will reuse it. The slab allocator heavily optimizes M_ZERO operations at both the page level and the chunk level.
The slab allocator maintains various undocumented malloc quirks such as ensuring that small power-of-2 allocations are aligned to their size, and malloc(0) requests are also allowed and return a non-NULL result. kern_tty.c depends heavily on the power-of-2 alignment feature and ahc depends on the malloc(0) feature. Eventually we may remove the malloc(0) feature.
Posted by zuzu on December 14, 2005 at 05:56 AM PST #