« November 2009
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today
XML

Neat blogs

Navigation

Editing

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

20090511 Monday May 11, 2009
Lock bug in DS teardown code

I've hit a nasty little bug which requires an orderly shutdown of the DS as the client is pounding it with traffic:

panic[cpu1]/thread=ffffff01d16d38e0: rw_destroy: lock still active, lp=ffffff01e60c6e08 wwwh=10 thread=ffffff01d16d38e0

ffffff00090a0b50 unix:rw_panic+6f ()
ffffff00090a0b70 unix:rw_destroy+33 ()
ffffff00090a0ba0 nfssrv:dserv_mds_instance_destroy+6d ()
ffffff00090a0bf0 genunix:kmem_cache_free_debug+29c ()
ffffff00090a0c50 genunix:kmem_cache_free+90 ()
ffffff00090a0c90 nfssrv:dserv_mds_instance_teardown+2b8 ()
ffffff00090a0cd0 unix:stubs_common_code+51 ()
ffffff00090a1eb0 nfs:nfssys+73 ()
ffffff00090a1f00 unix:brand_sys_syscall32+292 ()

Lock still active, I take that to mean that we are trying to destroy it as another thread has it held.

If we look at dserv_mds_instance_destroy, it only helps as far as getting us in the right source file. Strike that, as there is only one read/write lock, it also tells us which one. The issue is this snippet in dserv_instance_enter:

 220         if (inst == NULL)
 221                 return (ESRCH);
 222 
 223         rw_enter(&inst->dmi_inst_lock, lock_type);
 224         /*
 225          * dmi_teardown_in_progress is only set in one place,
 226          * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
 227          * is held as a WRITER, therefore, it is safe to check it without
 228          * holding the dmi_content_lock.
 229          */
 230         if (inst->dmi_teardown_in_progress == B_TRUE) {
 231                 rw_exit(&inst->dmi_inst_lock);
 232                 if (lock_type == RW_READER)
 233                         return (EIO);
 234                 else if (lock_type == RW_WRITER)
 235                         /*
 236                          * This will protect from receiving multiple teardown
 237                          * commands happening at once.
 238                          */
 239                         return (EBUSY);
 240         }

I thought the issue was that if we got past here, we had multiple references held to the lock by threads processing compounds. I.e., I've been dealing with refcounts and all problems look like refcounts will solve them. I had the refcount code halfway implemented and was looking to send a wakeup to try and get the writer going when I realized what the code was really trying to do.

If the instance had been removed from memory totally, then we would bail out on the first check at line 220. If it was in the process of tearing down, then the lock would be held by the WRITER, which would mean that all READERs had exited. I.e., the refcount of READERs had to be 0. No use tracking something that does not matter.

The problem had to be in that window between starting to tear down and removing the instance from the avl tree (look at the code in usr/src/uts/common/fs/nfs/dserv_mds.c for more context). And the most obvious thing is that a READER which comes along to check has to grab the lock to check. So, this code solves that:

 216         bool_t grab_lock = FALSE;
...
 223         if (inst == NULL)
 224                 return (ESRCH);
 225 
 226         /*
 227          * If dmi_teardown_in_progress is set, then we can't grab the
 228          * lock. I.e., we are in the midst of either tearing it
 229          * down or we have torn it down.
 230          */
 231 retry_with_lock:
 232         if (grab_lock) {
 233                 /*
 234                  * Now we have to grab the lock and make sure that it is not
 235                  * true!
 236                  */
 237                 rw_enter(&inst->dmi_inst_lock, lock_type);
 238         }
 239 
 240         /*
 241          * dmi_teardown_in_progress is only set in one place,
 242          * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
 243          * is held as a WRITER, therefore, it is safe to check it without
 244          * holding the dmi_content_lock.
 245          */
 246         if (inst->dmi_teardown_in_progress == B_TRUE) {
 247                 if (grab_lock)
 248                         rw_exit(&inst->dmi_inst_lock);
 249                 if (lock_type == RW_READER)
 250                         return (EIO);
 251                 else if (lock_type == RW_WRITER)
 252                         /*
 253                          * This will protect from receiving multiple teardown
 254                          * commands happening at once.
 255                          */
 256                         return (EBUSY);
 257         }
 258 
 259         if (!grab_lock) {
 260                 grab_lock = TRUE;
 261                 goto retry_with_lock;
 262         }

We will try to check inst->dmi_teardown_in_progress twice. The first time we will do it without the lock. The only way it can be 1 is if a tear down is in progress. We aren't going to keep the lock in that case, we are going to exit right away. But if it is 0 here, we have no way of knowing whether another thread just modified it. So in that case we have to grab the lock and check again.

I thought this would solve the problem, but I ended up with the same panic. The new issue can be found in rwlock(9F):

The rw_enter() function acquires the lock, and blocks if necessary. If enter_type is RW_READER, the caller blocks if there is a writer or a thread attempting to enter for writ- ing. If enter_type is RW_WRITER, the caller blocks if any thread holds the lock.

So say a WRITER comes along, it will wait until all of the READERS drain. At that point, we know the refcnt is 0. The problem is that if another READER comes along, it will now block until the WRITER is done. It will keep the lock active:

...
    dmi_inst_lock = {
        _opaque = [ 0x10 ]
    }
...
[1]> ffffff01e60c6dc8::print dserv_mds_instance_t dmi_inst_lock|::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffff01e60c6e08        READERS=2  B000

What we have to do as a READER is try to grab the lock. If we have to block, then in this case only, we know tear-down has started!

[th199096@aus-build-x86 nfs]> grep dserv_instance_enter *c
dserv_mds.c:dserv_instance_enter(krw_t lock_type, boolean_t create_instance,
dserv_mds.c: * This function frees any of the locks taken by dserv_instance_enter
dserv_mds.c:    error = dserv_instance_enter(RW_WRITER, B_FALSE, &inst);
dserv_mds.c:    error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c:    error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c:    error = dserv_instance_enter(RW_READER, B_TRUE, &inst);
dserv_mds.c:    error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_mds.c:    error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
dserv_server.c:         error = dserv_instance_enter(RW_READER, B_FALSE, &inst);

So we could do this:

 238                 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
 239                         if (lock_type == RW_READER)
 240                                 return (EIO); 
 241                         else if (lock_type == RW_WRITER)
 242                                 rw_enter(&inst->dmi_inst_lock, lock_type);
 243                 }               

But I really hate doing this as it makes an assumption about there only ever being one reason to grab as a WRITER and no way to programmatically enforce it. Would a comment suffice?

What we really want to do is sleep and when awoken, retry to grab the lock. We would also have to get the instance pointer fresh.

I think a quick comment and this change will accomplish all I want:

 238                 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
 239                         if (lock_type == RW_READER)
 240                                 return (EAGAIN); 
 241                         else if (lock_type == RW_WRITER)
 242                                 rw_enter(&inst->dmi_inst_lock, lock_type);
 243                 }               

I.e., let the caller try again if it wants to!

So code, build, test!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/lock_bug_in_ds_teardown
Comments:

Thanks.

Posted by porno izle on June 28, 2009 at 05:38 PM CDT #

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed