
Monday June 13, 2005
Dispatcher locks and Bug 5017148
As part of the OpenSolaris release, I'm going to describe dispatcher
locks, thread locks and a bug which I root-caused last year. The investigation
didn't take much time, but it was an interesting one because doors do some magic
in the kernel at the time of handoff to another thread (client to server
or server to client). So let me begin with what a dispatcher lock is:
1. What's a dispatcher lock
A dispatcher lock is a one-byte spin lock (disp_lock_t) which is acquired
at a high pil (DISP_LEVEL). DISP_LEVEL is the interrupt level at which
dispatcher operations should be performed. There are other symbolic
interrupt levels, viz. CLOCK_LEVEL and LOCK_LEVEL, in machlock.h.
The following are the interfaces for dispatcher locks, which are described
in disp_lock.c:
disp_lock_init()
initializes dispatcher lock.
disp_lock_destroy()
destroys dispatcher lock.
disp_lock_enter()
acquires dispatcher lock.
disp_lock_exit()
releases dispatcher lock and checks for kernel preemption.
disp_lock_exit_nopreempt()
releases dispatcher lock without checking for kernel preemption.
disp_lock_enter_high()
acquires another dispatcher lock when the thread is already holding
a dispatcher lock.
disp_lock_exit_high()
releases the top level dispatcher lock.
Here are the facts about dispatcher locks:
(a) Being spin locks which are acquired at a high interrupt level, dispatcher
locks should be held only for a short duration, and code holding one
shouldn't make blocking calls.
(b) While releasing a dispatcher lock, you can be preempted if cpu_kprunrun
(kernel preemption) is set. You can use disp_lock_exit_nopreempt()
if you don't want to be preempted.
(c) While holding a dispatcher lock, you are not preemptible.
(d) Since acquiring a dispatcher lock raises the pil to DISP_LEVEL, the old
pil is saved in t_oldspl of the thread structure (kthread_t).
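To make facts (b) and (d) concrete, here is a minimal user-space sketch, not the kernel code. Every sim_* name is hypothetical, the pil is just an integer field, the value of SIM_DISP_LEVEL is an assumed placeholder, and "preemption" is only a counter. It shows a one-byte lock that records the old pil on entry, and the difference between the two exit flavours.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
#define SIM_DISP_LEVEL 11               /* assumed value, illustration only */

typedef atomic_uchar sim_disp_lock_t;   /* one byte, like disp_lock_t */

typedef struct sim_thread {
    int cur_pil;        /* simulated current interrupt priority level */
    int t_oldspl;       /* pil saved while a dispatcher lock is held */
    bool kprunrun;      /* simulated pending kernel preemption request */
    int preemptions;    /* how many times we "yielded" on lock exit */
} sim_thread_t;

static void sim_disp_lock_enter(sim_thread_t *t, sim_disp_lock_t *lp)
{
    assert(t != NULL && lp != NULL);
    int old = t->cur_pil;
    t->cur_pil = SIM_DISP_LEVEL;        /* raise pil before spinning */
    while (atomic_exchange(lp, 0xff) != 0)
        ;                               /* spin; never block here */
    t->t_oldspl = old;                  /* remember the pil to restore */
}

static void sim_disp_lock_exit_nopreempt(sim_thread_t *t, sim_disp_lock_t *lp)
{
    atomic_store(lp, 0);                /* release the lock byte */
    t->cur_pil = t->t_oldspl;           /* drop back to the saved pil */
}

static void sim_disp_lock_exit(sim_thread_t *t, sim_disp_lock_t *lp)
{
    sim_disp_lock_exit_nopreempt(t, lp);
    if (t->kprunrun) {                  /* this is the preemption point */
        t->kprunrun = false;
        t->preemptions++;
    }
}
```

In this model, a thread that exits through sim_disp_lock_exit_nopreempt() keeps running even with kprunrun set, while sim_disp_lock_exit() acts on the request right after the lock is dropped.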
2. What's a thread lock
A thread lock is a per-thread entity which protects t_state
and the state-related flags of a kernel thread. The thread lock hangs off
kthread_t as t_lockp. t_lockp is a pointer to the thread's dispatcher lock,
and the pointer is changed whenever the state of the kernel thread changes.
One acquires the thread lock using the thread_lock() routine, passing the
kernel thread pointer. thread_lock() is responsible for getting the correct
dispatcher lock for the thread. The dance done by thread_lock() is
interesting because t_lockp is a pointer and can change while we are
spinning for a dispatcher lock. Hence thread_lock() saves the t_lockp
pointer, acquires that lock, and then checks that t_lockp still points to
it, retrying otherwise, which ensures that we end up with the right
thread lock.
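The dance described above can be sketched in user-space C as follows. This is only an illustration of the snapshot, acquire, recheck and retry pattern, using C11 atomics; the sim_* types and functions are hypothetical and the real thread_lock() also deals with interrupt levels, which this sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_flag sim_lock_t;     /* hypothetical stand-in for disp_lock_t */

typedef struct sim_kthread {
    sim_lock_t *_Atomic t_lockp;    /* which lock currently protects us */
} sim_kthread_t;

static void sim_lock_enter(sim_lock_t *lp)
{
    while (atomic_flag_test_and_set(lp))
        ;                           /* spin until the flag was clear */
}

static void sim_lock_exit(sim_lock_t *lp)
{
    atomic_flag_clear(lp);
}

/* Acquire whatever lock protects t right now; returns the lock taken. */
static sim_lock_t *sim_thread_lock(sim_kthread_t *t)
{
    for (;;) {
        sim_lock_t *lp = atomic_load(&t->t_lockp);  /* snapshot pointer */
        assert(lp != NULL);
        sim_lock_enter(lp);
        if (lp == atomic_load(&t->t_lockp))
            return lp;              /* still the thread's lock: done */
        sim_lock_exit(lp);          /* thread moved meanwhile: retry */
    }
}
```

The recheck matters because another CPU holding the old lock may have pointed t_lockp somewhere else while we were spinning; holding a lock that no longer protects the thread would be useless.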
Now let's take a look at the thread lock interfaces in the Solaris kernel,
which are described in disp_lock.c
and thread.h:
thread_lock() is called to acquire the thread lock.
thread_unlock() is called to release the thread lock and it checks
for kernel preemption.
thread_lock_high() is called to acquire another thread lock
while holding one.
thread_unlock_high() is called to release a thread lock while
still holding another one.
thread_unlock_nopreempt() is called to release the thread lock without
checking for kernel preemption.
3. Various types of thread locks in Solaris Kernel
Now that I've described the thread lock, it's very important
for us to understand which dispatcher lock is acquired depending upon
the state of the thread. In order to find this out, you first need to
understand the one-to-one mapping between the state of the thread and
its corresponding dispatcher lock:
TS_RUN (runnable)    ---> disp_lock of the dispatch queue in a CPU (cpu_t)
                          or of the global preemption queue of a CPU partition
TS_ONPROC (running)  ---> cpu_thread_lock in a CPU (cpu_t)
TS_SLEEP (sleep)     ---> sleepq bucket lock or turnstile chain lock
TS_STOPPED (stopped) ---> stop_lock (a global dispatcher lock) for stopped threads
There are two more global dispatcher locks in the Solaris kernel:
shuttle_lock and transition_lock. When the thread lock of a thread points
to shuttle_lock, it means that the thread is sleeping on a door, and when
it points to transition_lock, it means that the thread is in transition to
another state (for instance when the state of a thread sleeping on a
semaphore is changed from TS_SLEEP to TS_RUN, or during yield()).
transition_lock is always held and is never released.
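Why an always-held lock is useful may be easier to see in a small model. The sketch below is a hypothetical user-space illustration (all sim_* names are made up): once t_lockp points at the transition lock, nobody else can take the thread lock until the owner points t_lockp at a real lock again. A non-spinning try-lock is used so the example cannot hang.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef atomic_flag sim_lock_t;

typedef struct sim_kthread {
    sim_lock_t *_Atomic t_lockp;
} sim_kthread_t;

/* Set once at startup and never cleared, so it is "always held". */
static sim_lock_t sim_transition_lock = ATOMIC_FLAG_INIT;

static void sim_init(void)
{
    (void)atomic_flag_test_and_set(&sim_transition_lock);
}

/* Caller holds the thread's current lock; that lock is dropped here. */
static void sim_thread_transition(sim_kthread_t *t)
{
    sim_lock_t *old = atomic_load(&t->t_lockp);
    assert(old != NULL && old != &sim_transition_lock);
    atomic_store(&t->t_lockp, &sim_transition_lock);
    atomic_flag_clear(old);     /* thread stays protected: new lock is held */
}

/* Caller already holds new_lock; it becomes the thread lock. */
static void sim_thread_set_lock(sim_kthread_t *t, sim_lock_t *new_lock)
{
    atomic_store(&t->t_lockp, new_lock);
}

/* One attempt at taking the thread lock, without spinning. */
static bool sim_thread_trylock(sim_kthread_t *t)
{
    sim_lock_t *lp = atomic_load(&t->t_lockp);
    if (atomic_flag_test_and_set(lp))
        return false;           /* busy (always so for the transition lock) */
    if (lp == atomic_load(&t->t_lockp))
        return true;
    atomic_flag_clear(lp);      /* thread moved under us */
    return false;
}
```

In this model the order of the two stores in sim_thread_transition() is the interesting part: the pointer is switched first and the old lock is dropped second, so there is no window in which the thread is unprotected.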
4. Examples of thread lock
Now let's look at which thread locks are involved from
wakeup (or unsleep) to onproc (running) of a thread. Let's assume
that T1 (thread 1) is blocked on a condition variable CV1 and T2 (thread
2) signals CV1 to wake T1 up. First, cv_signal()
grabs the sleepq bucket lock and decrements the waiters count on CV1. It
then calls sleepq_wakeone_chan() to wake up T1. sleepq_wakeone_chan()'s
responsibility is to unlink T1 from the sleepq list (using t_link of
kthread_t) and call CL_WAKEUP
(the scheduling class specific wakeup routine). Assuming T1 is in the time
sharing class (TS), ts_wakeup() gets called. ts_wakeup()
in turn calls a dispatcher enqueue routine (setfrontdq() or
setbackdq()), which changes the state of T1 to TS_RUN and changes t_lockp
to point to the disp_lock of the chosen CPU. Finally, sleepq_wakeone_chan()
drops the disp_lock of the dispatch queue, and the sleepq bucket
lock is released in cv_signal().
Once T1 is chosen to run, disp()
removes T1 from the dispatch queue of the CPU and changes its state
to TS_ONPROC and its t_lockp to the cpu_thread_lock of that CPU.
void
cv_signal(kcondvar_t *cvp)
{
    condvar_impl_t *cp = (condvar_impl_t *)cvp;

    /* make sure the cv_waiters field looks sane */
    ASSERT(cp->cv_waiters <= CV_MAX_WAITERS);
    if (cp->cv_waiters > 0) {
        sleepq_head_t *sqh = SQHASH(cp);
        disp_lock_enter(&sqh->sq_lock);
        ASSERT(CPU_ON_INTR(CPU) == 0);
        if (cp->cv_waiters & CV_WAITERS_MASK) {
            kthread_t *t;
            cp->cv_waiters--;
            t = sleepq_wakeone_chan(&sqh->sq_queue, cp);
            /*
             * If cv_waiters is non-zero (and less than
             * CV_MAX_WAITERS) there should be a thread
             * in the queue.
             */
            ASSERT(t != NULL);
        } else if (sleepq_wakeone_chan(&sqh->sq_queue, cp) == NULL) {
            cp->cv_waiters = 0;
        }
        disp_lock_exit(&sqh->sq_lock);
    }
}
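The wakeup path can also be modelled in a few lines. This is a single-threaded, hypothetical sketch (sim_* names are invented, locks are plain bytes because nothing runs concurrently, and the scheduling class layer is folded into one function). It shows the two things the walkthrough stresses: the thread is unlinked from the sleep queue through t_link, and its t_lockp moves from the sleepq bucket lock to the disp_lock of the chosen CPU.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SIM_SLEEP, SIM_RUN, SIM_ONPROC } sim_state_t;
typedef unsigned char sim_lock_t;       /* 0 = free, 0xff = held */

typedef struct sim_kthread {
    struct sim_kthread *t_link;         /* next thread on the same queue */
    const void *t_wchan;                /* what the thread is sleeping on */
    sim_state_t t_state;
    sim_lock_t *t_lockp;
} sim_kthread_t;

typedef struct { sim_kthread_t *sq_first; sim_lock_t sq_lock; } sim_sleepq_t;
typedef struct { sim_lock_t disp_lock; sim_kthread_t *runq; } sim_cpu_t;

/*
 * Caller holds sq->sq_lock. Wakes the first thread sleeping on chan,
 * queues it on cp, and returns it (or NULL if nobody was waiting).
 */
static sim_kthread_t *
sim_wakeone_chan(sim_sleepq_t *sq, const void *chan, sim_cpu_t *cp)
{
    assert(sq->sq_lock == 0xff);
    for (sim_kthread_t **pp = &sq->sq_first; *pp != NULL;
        pp = &(*pp)->t_link) {
        sim_kthread_t *t = *pp;
        if (t->t_wchan != chan)
            continue;
        *pp = t->t_link;                /* unlink from the sleep queue */
        t->t_wchan = NULL;
        cp->disp_lock = 0xff;           /* enqueue takes the run queue lock */
        t->t_state = SIM_RUN;
        t->t_lockp = &cp->disp_lock;    /* thread lock is now disp_lock */
        t->t_link = cp->runq;           /* push onto the run queue */
        cp->runq = t;
        cp->disp_lock = 0;              /* drop the thread's new lock */
        return t;
    }
    return NULL;
}
```

The caller still owns the bucket lock when this returns, which matches the order in the walkthrough: the dispatch queue lock is dropped first and the sleepq bucket lock last.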
The second example is from the preemption path. There are two types
of preemption in the Solaris kernel, viz. user preemption
(cpu_runrun) and kernel preemption (cpu_kprunrun). Assume that T1 is
being preempted in favour of a higher priority thread. T1
will call preempt()
once it realizes that it has to give up the CPU (there are hooks in the
Solaris kernel to determine this). preempt()
first grabs the thread lock on itself, effectively cpu_thread_lock, and calls
THREAD_TRANSITION()
to change t_lockp to transition_lock. Note that the state of T1
is still TS_ONPROC while t_lockp is pointing to transition_lock, because
T1 is in a transition phase (from TS_ONPROC -> TS_RUN). THREAD_TRANSITION()
also releases the previous dispatcher lock, which is safe because
transition_lock is always held. preempt()
then calls CL_PREEMPT(), the scheduling class specific preemption routine,
to enqueue T1 on a particular CPU. From here on it's the same as described
in the first example.
void
preempt()
{
    kthread_t *t = curthread;
    klwp_t *lwp = ttolwp(curthread);

    if (panicstr)
        return;

    TRACE_0(TR_FAC_DISP, TR_PREEMPT_START, "preempt_start");

    thread_lock(t);

    if (t->t_state != TS_ONPROC || t->t_disp_queue != CPU->cpu_disp) {
        /*
         * this thread has already been chosen to be run on
         * another CPU. Clear kprunrun on this CPU since we're
         * already headed for swtch().
         */
        CPU->cpu_kprunrun = 0;
        thread_unlock_nopreempt(t);
        TRACE_0(TR_FAC_DISP, TR_PREEMPT_END, "preempt_end");
    } else {
        if (lwp != NULL)
            lwp->lwp_ru.nivcsw++;
        CPU_STATS_ADDQ(CPU, sys, inv_swtch, 1);
        THREAD_TRANSITION(t);
        CL_PREEMPT(t);
        DTRACE_SCHED(preempt);
        thread_unlock_nopreempt(t);

        TRACE_0(TR_FAC_DISP, TR_PREEMPT_END, "preempt_end");

        swtch();    /* clears CPU->cpu_runrun via disp() */
    }
}
5. An example of a dispatcher lock and Bug 5017148.
Apart from illustrating dispatcher locks, I'll also describe
a problem which I found a while back. It involves the kernel door
implementation too.
I usually begin by looking at what the CPUs are doing whenever
I take a look at a crash dump from a system hang:
> ::cpuinfo
 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  0 0001041d2b0  1b    1    0  60   no    no    t-0 3001ba04900 cluster
  1 30019fe4030  1d    2    0 101   no    no    t-0 3003d873a40 rgmd
  2 3001a38aab8  1d    1    0 165  yes   yes    t-0 2a1003ebd20 sched
  3 0001041b778  1d    2    0  60  yes   yes    t-0 3004fac3c80 cluster
CPU 0 is spinning for a mutex 0x30001d7cae0 which is held by
thread 0x3004fac3c80 running on CPU 3. Please note that a thread spins
for a mutex only when the owner is running, and in this case the owner of
the mutex happens to be onproc on CPU 3.
> 0x30001d7cae0$<mutex
0x30001d7cae0: owner/waiters
3004fac3c80
>
CPU 3 is our clock interrupt CPU (run ::cycinfo -v and figure
out where the clock handler is registered), and thread 0x3004fac3c80
on CPU 3 seems to be spinning in cv_block()
for a sleepq bucket lock (sleepq_head[]).
In order to find out which sleepq bucket this thread is looking for,
we can look at the wait channel t_wchan
(t_lwpchan.lc_wchan) and apply the hash function SQHASH() to it,
which is how I found the right bucket. Since the thread is already holding
its thread lock (effectively the cpu_thread_lock of CPU 3) while spinning
for the sleepq bucket lock, it is blocking clock interrupts too. This can
be verified from the pending clock interrupts in ::cycinfo -v.
Let's disassemble cv_block() at the point where thread 3004fac3c80 is stuck:
cv_block+0x9c:  add   %i2, 8, %i0
cv_block+0xa0:  call  -0x460e0 <disp_lock_enter_high>
cv_block+0xa4:  mov   %i0, %o0
> 0x3004fac3c80::print kthread_t t_lockp
t_lockp = cpu0+0xb8
> cpu0=J
1041b778        // CPU 3
> 0x3004fac3c80::print kthread_t ! grep wchan
lc_wchan = 0x3006fc52d20
And the sleepq bucket happens to be:
> 0x10471d88::print sleepq_head_t
{
    sq_queue = {
        sq_first = 0x3001b476ee0
    }
    sq_lock = 0xff        <----- dispatcher lock is held
}
Thread 3003d873a40 running on CPU 1 is spinning in thread_lock_high().
> 3003d873a40::findstack
stack pointer for thread 3003d873a40: 2a1025964a1
[ 000002a1025964a1 panic_idle+0x1c() ]
000002a102596551 prom_rtt()
000002a1025966a1 thread_lock_high+0xc()
000002a102596751 sema_p+0x60()
000002a102596801 kobj_open+0x84()
000002a1025968d1 kobj_open_file+0x44()
[.]
000002a102597011 xdoor_proxy+0x20c()
000002a1025971f1 door_call+0x204()
000002a1025972f1 syscall_trap32+0xa8()
>
Now this is an interesting stack. Looking at the sema_p() code,
we see that it first grabs the sleepq bucket lock and then tries
to grab the thread lock.
Since the hashing function SQHASH() returns the same index
for 0x3006fc52d20 and 0x300819f3118, sema_p() is stuck
on the thread lock which is held by the thread running on CPU 3, and the
thread running on CPU 3 is stuck because the sleep queue bucket lock is
held by the thread running on CPU 1.
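The collision between the two wait channels can be checked with a few lines of C. The hash below is written from my recollection of the sleep queue hash in the OpenSolaris headers (a 512-entry table indexed by the sum of the address shifted right by 2 and by 9), so please treat both the formula and the table size as assumptions and compare them against sleepq.h for your release. The sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NSLEEPQ 512     /* assumed table size (a power of two) */

/* Assumed form of the sleep queue hash; returns the bucket index. */
static unsigned sim_sqhash_index(uint64_t wchan)
{
    return (unsigned)(((wchan >> 2) + (wchan >> 9)) & (SIM_NSLEEPQ - 1));
}
```

Under these assumptions, sim_sqhash_index(0x3006fc52d20) and sim_sqhash_index(0x300819f3118) both come out as 0x1de, which agrees with the two threads contending for the same bucket lock.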
> 0x3003d873a40::print kthread_t t_lockp
t_lockp = cpu0+0xb8
> cpu0+0xb8/x
cpu0+0xb8: ff00
Now let's find the real cause of this deadlock. Let's
look at t_cpu of thread 0x3003d873a40: we see that thread 0x3003d873a40,
running on CPU 1, has t_lockp pointing to CPU 3's cpu_thread_lock. This is
really nasty, as we would expect it to point to CPU 1's cpu_thread_lock.
> 0x3003d873a40::print kthread_t ! grep cpu
t_bound_cpu = 0
t_cpu = 0x30019fe4030
t_lockp = cpu0+0xb8        // CPU 3's cpu_thread_lock
t_disp_queue = cpu0+0x78
The cause of this problem is that door_get_server(),
while doing the handoff to the server thread, can get preempted, because
disp_lock_exit() checks for kernel preemption.
static kthread_t *
door_get_server(door_node_t *dp)
{
    [.]
    /*
     * Mark the thread as ONPROC and take it off the list
     * of available server threads. We are committed to
     * resuming this thread now.
     */
    disp_lock_t *tlp = server_t->t_lockp;
    cpu_t *cp = CPU;

    pool->dp_threads = server_t->t_door->d_servers;
    server_t->t_door->d_servers = NULL;
    /*
     * Setting t_disp_queue prevents erroneous preemptions
     * if this thread is still in execution on another processor
     */
    server_t->t_disp_queue = cp->cpu_disp;
    CL_ACTIVE(server_t);
    /*
     * We are calling thread_onproc() instead of
     * THREAD_ONPROC() because compiler can reorder
     * the two stores of t_state and t_lockp in
     * THREAD_ONPROC().
     */
    thread_onproc(server_t, cp);
    disp_lock_exit(tlp);
    return (server_t);
    [.]
As a result, the server thread's t_lockp points to the wrong cpu_thread_lock,
because the client thread had started running on a different CPU by the time
it did shuttle_resume()
to the server thread. door_return()
(which returns the results to the caller) releases its dispatcher lock without
getting preempted, so we didn't notice this problem in door_return().
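The stale pointer can be reproduced in a toy model. The following is a hypothetical, single-threaded sketch of my reading of the race (sim_* names are invented and the preemption is passed in as a parameter rather than actually happening): the CPU is captured before the lock is dropped, but the server thread ends up running wherever the client is when it finally resumes it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sim_cpu {
    int id;
    unsigned char cpu_thread_lock;
} sim_cpu_t;

typedef struct sim_kthread {
    sim_cpu_t *t_cpu;           /* CPU the thread is really running on */
    unsigned char *t_lockp;     /* lock that is supposed to protect it */
} sim_kthread_t;

/*
 * Model of the handoff: 'captured' is the CPU sampled before the lock
 * is dropped, 'resumed_on' is the CPU the client is on when it resumes
 * the server. They differ only if the client was preempted in between.
 */
static void
sim_handoff(sim_kthread_t *server, sim_cpu_t *captured, sim_cpu_t *resumed_on)
{
    assert(server != NULL && captured != NULL && resumed_on != NULL);
    server->t_lockp = &captured->cpu_thread_lock;   /* ONPROC on 'captured' */
    /* ... a preempting lock exit may migrate the client here ... */
    server->t_cpu = resumed_on;                     /* server runs here */
}

/* Invariant for an onproc thread: t_lockp is its own CPU's lock. */
static int sim_lock_matches_cpu(const sim_kthread_t *t)
{
    return t->t_lockp == &t->t_cpu->cpu_thread_lock;
}
```

Calling sim_handoff() with the same CPU twice keeps the invariant, while capturing CPU 3 and resuming on CPU 1 leaves exactly the state seen in the dump: a thread on CPU 1 whose t_lockp is CPU 3's cpu_thread_lock.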
Off to crack another problem now... In fact, we don't get any sleep
if we don't take a look at a crash dump :-)