Weblog

All | General | Solaris
20050614 Tuesday June 14, 2005

kernel memory corruption detected !! and root caused kernel memory corruption detected !! and root caused

I worked the following really interesting bug (rather, root caused) a while back. This bug was reported in one of the late builds of Solaris 10 and it came with a reproducible test case (I love bugs that are reproducible :-)).
This results in system panic and panics/hangs are not anyway palatable. So this had to be fixed before FCS (First Customer Shipment).

Solaris has the facility to collect the crash file  (even though  I take this for granted,  it  came as a surprise to me that some contemporary unix clones don't have this facility) - so whenever a solaris system panics/hangs,  it takes a snapshot of the kernel pages (dumpadm(1)) onto the swap device, which is later extracted into /var/crash/vmcore.? while coming up.

This is what is spewed on the console when the problem happened:


kernel memory allocator: buffer modified after being freed
modification occurred at offset 0x748 (0xdeadbeefdeadbeef replaced by
0xdeadbed7deadbeef)
buffer=c9d701f8 bufctl=cb9ebce8 cache: process_cache
previous transaction on buffer c9d701f8:
thread=d8994000 time=T-0.006178876 slab=cbdef500 cache: process_cache
kmem_cache_free_debug+f5
kmem_cache_free+2c
pid_exit+cb
freeproc+471
waitid+22a
waitsys+19

panic[cpu0]/thread=d86ce800: kernel heap corruption detected

cd2a9ea4 genunix:kmem_error+3e6 (0, c8a0b6f0, c9d701)
cd2a9ed4 genunix:kmem_cache_alloc_debug+2cf (c8a0b6f0, c9d701f8,)
cd2a9f00 genunix:kmem_cache_alloc+129 (c8a0b6f0, 0)
cd2a9f34 genunix:getproc+5b (cd2a9f6c, 0)
cd2a9f70 genunix:cfork+8f (0, 1, cd2a9fac, fe8)
cd2a9f80 genunix:fork1+c (1cce90, fec1f3d4, 3)

Solaris kernel memory allocator provides excellent debug features and one such is, it fills up a freed data structure with 'deadbeef' pattern such that when reallocating the same data structure later on, it can verify that contents are still 'deadbeef' and if not, it is an indication that some thread accessed the freed memory - sign for trouble.

The same thing happened above, first stack points out the path leading to the freeing of the buffer, along with when it was freed, what cache the buffer belongs to.

The 2nd stack points out, what problem it detected with the buffer when it tried allocating that. Ok, so its clear that process structure (proc_t of usr/src/uts/common/sys/proc.h) got modified after being freed, at offset 0x74c and this corresponds to:

  > ::print -a proc_t
[.]
748 p_server_threads {
748 dp_threads
74c dp_cv {
74c _opaque <--- this one.
}
}

As the problem is reproducible, I collected multiple crash files and all showed the same symptom - i.e corruption at the same offset in the proc_t - So this confirms that it is not some random memory corruption but some thread which knows what it's modifying.

Referring to the test case, what it does is, it creates a door server and forks off a few children which will bombard the server with door calls. Doors is an interesting/faster IPC mechanim available in solaris. All the server threads which serve a door hang off from p_server_threads (refer to the door_create(3DOOR) man page for details on private door for which a separate pool of server threads is maintained).

Given that the problem is
            - happening at offset 0x74c
            - test case creates a door server and slams it hard
            - and we now know what 'p_server_threads' is all about,
it is clear that this process structure once was a door server process, and door clients could potentially be doing a cv_signal()/cv_broadcast() after the server is gone.

But how is this possible? Given that as part of exit, server revokes the doors in proc_exit()

        /*
* revoke any doors created by the process.
*/
if (p->p_door_list)
door_exit();

As part of door_exit():

        
for (dp = p->p_door_list; dp != NULL; dp = dp->door_list) {
dp->door_target = NULL;
dp->door_flags |= DOOR_REVOKED;
if (dp->door_flags & DOOR_PRIVATE)
cv_broadcast(&dp->door_servers.dp_cv);
}
cv_broadcast(&p->p_server_threads.dp_cv);
/* Clear the list */
p->p_door_list = NULL;



it is setting the DOOR_REVOKED flag and doing a broadcast on the dp_cv to wake up any client threads waiting for server threads in door_get_server():
        while ((server_t = pool->dp_threads) == NULL) {
if (!cv_wait_sig_swap(&pool->dp_cv, &door_knob))
return (NULL); /* Got a signal */
if (DOOR_INVALID(dp))
return (NULL); /* Target is invalid now */

So once the clients are woken up from cv_wait_sig_swap() because of broadcast from door_exit(), they check for the validity of door and return NULL as the door is just revoked.

So on the surface, this doesn't look possible - but it is happening !!

To confirm that it is the cv_signal() from the clients that is modifying the server's proc structure after it is freed, I built a kernel with process_cache, aligned on page boundary. It is quite simple in solaris - kmem_cache_create() accepts alignment as an option, so just pass on the page size to the following process_cache create in main()

        /*
* Create kmem cache for proc structures
*/
process_cache = kmem_cache_create("process_cache", sizeof (proc_t),
PAGESIZE , NULL, NULL, NULL, NULL, NULL, 0);

and added a call to hat_unload() in pid_exit() for the process structure.

The new kernel panicked with:

0xc7a4fca8:     misc40: 
0xca529ae8: Page fault (#pf)
0xca5295a8: Bad kernel fault at addr=0xd112674c
0xca529e68: pid=112566, pc=0xfe88fceb, sp=0xd0e67000, eflags=0x10202
0xca529768:
0xca536b2b: d14eedc4 unix:die+b1 (e, d14eee60, d112674c, 3)
0xca5367ab: d14eee4c unix:trap+a33 (d14eee60, d112674c, 3)
0xca53626b: d14eee60 unix:_cmntrap+c9 ()
0xc8c79cab: d14eeeb0 genunix:cv_signal+9 (d112674c)
0xc8c7976b: d14eeed8 genunix:cv_wait_sig_swap+228 (d112674c, c8d92fe0)
0xc84225eb: d14eeef8 doorfs:door_get_server+4d (ceba9080)
0xc84227ab: d14eef70 doorfs:door_call+c8 (3, c27f17a8, d14eefac, fe801925, 3, c27f17a8)
0xc8422ceb: d14eef80 doorfs:doorfs+9f (3, c27f17a8, 0, 0, 0, 3)


Aha, All along I was looking for cv_signal() calls from within doorfs code, but here is a  cv_signal() coming from cv_wait_sig_swap() - interesting. Referring to the code, cv_wait_sig_swap() does not call cv_signal() unless TS_SIGNALLED flag is set - and this is getting set only in sleepq_wakeone_chan() and cv_signal()  is the only caller of sleepq_wakeone_chan() - panic thread has this flag set :

> d1604400::print kthread_t t_schedflag
t_schedflag = 0x13 (TS_LOAD|TS_DONT_SWAP|TS_SIGNALLED)
>

and the only cv_signal() on dp_cv, is coming from door_release_server() !!

So after more code reading, it turns out that this is what is happening;

1- door server threads in door_release_server() signal the waiting clients.
2- client threads get onto the run Q. At this stage door server doesn't not have any clue that there are some client threads in door_call on run Q (this is a stress test, where hundreds of processes are running on a 2way system, so things change very fast).
3- Server calls door_revoke_all() to revoke all the doors and one of the server threads through door_slam() sets the door_target to NULL. cv_broadcast() in door_revoke_all() sets, any other waiting client threads to run; and exits normally.
4- If any of the client threads, made runnable at step-3, get to run, it results in door_call() to fail, and as per the test case, if a door_call() of a thread fails, it issues an exit() rather than a thr_exit().
5- exit() results in setting SEXITLWPS flag being set on client proc.
6- Now if threads made runnable at step-2, get onto the cpu, they will execute the following piece of code in cv_wait_sig_swap() :

	signalled = (t->t_schedflag & TS_SIGNALLED);
t->t_flag &= ~T_WAKEABLE;
/* TS_DONT_SWAP set by disp() */
ASSERT(curthread->t_schedflag & TS_DONT_SWAP);
mutex_enter(mp);
if (ISSIG_PENDING(t, lwp, p)) {
mutex_exit(mp);
if (issig(FORREAL))
rval = 0;
mutex_enter(mp);
}
if (lwp->lwp_sysabort || MUSTRETURN(p, t))
rval = 0;
lwp->lwp_asleep = 0;
lwp->lwp_sysabort = 0;
if (rval == 0 && signalled) /* avoid consuming the cv_signal() */
cv_signal(cvp);



signalled is TRUE as these threads are cv_signal()ed; 'rval' is '0' as issig(FORREAL) returns TRUE becasue of SEXITLWPS. The last cv_signal()  results in modifying the proc structure which is long gone. So depending on how many threads were cv_signal()ed in door_release_server() 'deadbeef' will come down by that number. Changing the test case to call thr_exit() instead of exit() upon failure of door_call() is preventing this problem...

Here is the test case that reproduces the above problem.
host> cat misc40.c
#include <stdio.h>
#include <errno.h>
#include <sys/door.h
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>

#define NPROC 10
#define NTHREAD 100
#define NITERS 100
#define PASS 0
#define FAIL -1

int nthreads = NTHREAD;
int nproc = NPROC;
int niter = NITERS;
int did_lock;

void * convert(void *);
void lock_server_stuff(void *, char *, int, door_desc_t *, int);
void do_threads();
pid_t childs[NPROC];

/*
* Slam a server as hard as possible
*/
int
main(int argc, char *argv[])
{
char buf[1024];
int ret;
int i;
int was_fail;
int childstat;

if(argc > 1)
niter = atoi(argv[1]);
if(argc > 2)
nproc = atoi(argv[2]);
if(argc > 3)
nthreads = atoi(argv[3]);

sprintf(buf," NPROC %d NTHREAD %d NITERS %d", NPROC, NTHREAD, NITERS);

if ((did_lock = door_create(lock_server_stuff, 0xdeadbeef, DOOR_UNREF)) < 0) {
sprintf(buf," did_lock %d errno %d", did_lock, errno);

return FAIL;
}

for (i = 0; i < nproc; i++) {
if ((childs[i] = fork1()) == 0) {
do_threads();
}
}

/* Wait for an unref to be delivered */
was_fail = 0;
for (i = 0; i < nproc; i++) {
ret = waitpid(childs[i], &childstat, 0);
if (ret != childs[i] ||
!WIFEXITED(childstat) ||
WEXITSTATUS(childstat) != PASS) {

was_fail++;
}
}

/*
* Wait 10 seconds. The server should get argp == DOOR_UNREF_DATA
* and exit(PASS)
*/
sleep(10);

/* Server wasn't exit */
if (was_fail) {
return FAIL;
}
return PASS;
}

char *s1 =
"After Seven Days, He was Quite Tired, So God Said, \
Let There be Day, Just for Picnics, with Wine and Bread,\
He Gathered Up Some People he Had Made,\
Created Blankets and laid back in the Shade.\
The People Sipped their wine , And what with God there,\
Asked him Questions, Like:\
Do you have to Eat? Or get your Hair Cut,\
In Heaven?\
And if your Eye got poked Out in This Life,\
Would it be waiting there in Heaven with your Wife?\
God Shuffled his Feet,\
Glanced around, at them\
The People Cleared their throats,\
And stared right back,\
At him";

/* Client code */
void *
convert(void * frog)
{
char buf[1024];
int ret;
int i;
char buffer[1024];

for (i = 0; i < niter; i++) {
door_arg_t da;

da.rbuf = buffer;
da.rsize = sizeof(buffer);
da.data_ptr = s1;
da.data_size = strlen(s1) + 1;
da.desc_ptr = 0;
da.desc_num = 0;

/* After 'lock_server_stuff: exit(PASS)' door_call return -1 */
/* (errno is not change) */
errno = 0;
if ((ret = door_call(did_lock, &da)) < 0 && errno != 0) {
sprintf(buf," ret %d errno %d (i %d)", ret, errno, i);

exit (FAIL);
}
}
}

int invocations;

/* Server code */
void
lock_server_stuff(void *c, char *argp, int a_size, door_desc_t *dp, int n_did)
{
char *p = argp;

if (argp == DOOR_UNREF_DATA) {

exit(PASS);
}

invocations++; /* Not protected */
/* Convert the string to upper case */
while (*p) {
if (*p >= 'a' && *p <= 'z')
*p -= ('a' - 'A');
p++;
}
door_return(argp, a_size, 0, 0);
}


void
do_threads()
{
char buf[1024];
int max_threads;
int i;

thr_setconcurrency(20);
/*
* Create a bunch of threads to pound on the server
*/
for (i = 0; i < nthreads; i++) {
if (thr_create(NULL, 8192, convert, NULL, 0, 0) != 0) {
sprintf(buf," max_threads %d ", i);
break;
}
}
max_threads = i;
for (i = 0; i < max_threads; i++) {
thr_join(0, 0, 0);
}

exit(PASS);
}
host>

This is fixed in build_71 of s10 as part of bug# 5105724.

Technorati Tag:
Technorati Tag:
( Jun 14 2005, 08:44:22 AM PDT ) Permalink Comments [3]

20050510 Tuesday May 10, 2005

Joining the party Ok, with this post, I am joining the bandwagon of bloggers. My name is Surya Prakki and I work in Solaris sustaining - this job involves, mostly, fielding escalations (bugs which customers hit get escalated) and providing relief to the customers. More later. ( May 10 2005, 03:17:25 AM PDT ) Permalink Comments [0]

Calendar

RSS Feeds

Search

Links

Navigation

Referers