I've hit a nasty little bug which requires an orderly shutdown of the DS as the client is pounding it with traffic:
panic[cpu1]/thread=ffffff01d16d38e0: rw_destroy: lock still active, lp=ffffff01e60c6e08 wwwh=10 thread=ffffff01d16d38e0 ffffff00090a0b50 unix:rw_panic+6f () ffffff00090a0b70 unix:rw_destroy+33 () ffffff00090a0ba0 nfssrv:dserv_mds_instance_destroy+6d () ffffff00090a0bf0 genunix:kmem_cache_free_debug+29c () ffffff00090a0c50 genunix:kmem_cache_free+90 () ffffff00090a0c90 nfssrv:dserv_mds_instance_teardown+2b8 () ffffff00090a0cd0 unix:stubs_common_code+51 () ffffff00090a1eb0 nfs:nfssys+73 () ffffff00090a1f00 unix:brand_sys_syscall32+292 ()
Lock still active, I take that to mean that we are trying to destroy it as another thread has it held.
If we look at dserv_mds_instance_destroy, it only helps as far as getting us in the right source file. Strike that, as there is only one read/write lock, it also tells us which one. The issue is this snippet in dserv_instance_enter:
220 if (inst == NULL)
221 return (ESRCH);
222
223 rw_enter(&inst->dmi_inst_lock, lock_type);
224 /*
225 * dmi_teardown_in_progress is only set in one place,
226 * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
227 * is held as a WRITER, therefore, it is safe to check it without
228 * holding the dmi_content_lock.
229 */
230 if (inst->dmi_teardown_in_progress == B_TRUE) {
231 rw_exit(&inst->dmi_inst_lock);
232 if (lock_type == RW_READER)
233 return (EIO);
234 else if (lock_type == RW_WRITER)
235 /*
236 * This will protect from receiving multiple teardown
237 * commands happening at once.
238 */
239 return (EBUSY);
240 }
I thought the issue was that if we got past here, we had multiple references held to the lock by threads processing compounds. I.e., I've been dealing with refcounts and all problems look like refcounts will solve them. I had the refcount code halfway implemented and was looking to send a wakeup to try and get the writer going when I realized what the code was really trying to do.
If the instance had been removed from memory totally, then we would bail out on the first check at line 220. If it was in the process of tearing down, then the lock would be held by the WRITER, which would mean that all READERs had exited. I.e., the refcount of READERs had to be 0. No use tracking something that does not matter.
The problem had to be in that window between starting to tear down and removing the instance from the avl tree (look at the code in usr/src/uts/common/fs/nfs/dserv_mds.c for more context). And the most obvious thing is that a READER which comes along to check has to grab the lock to check. So, this code solves that:
216 bool_t grab_lock = FALSE;
...
223 if (inst == NULL)
224 return (ESRCH);
225
226 /*
227 * If dmi_teardown_in_progress is set, then we can't grab the
228 * lock. I.e., we are in the midst of either tearing it
229 * down or we have torn it down.
230 */
231 retry_with_lock:
232 if (grab_lock) {
233 /*
234 * Now we have to grab the lock and make sure that it is not
235 * true!
236 */
237 rw_enter(&inst->dmi_inst_lock, lock_type);
238 }
239
240 /*
241 * dmi_teardown_in_progress is only set in one place,
242 * dserv_mds_teardown_instance() and when doing so the dmi_inst_lock
243 * is held as a WRITER, therefore, it is safe to check it without
244 * holding the dmi_content_lock.
245 */
246 if (inst->dmi_teardown_in_progress == B_TRUE) {
247 if (grab_lock)
248 rw_exit(&inst->dmi_inst_lock);
249 if (lock_type == RW_READER)
250 return (EIO);
251 else if (lock_type == RW_WRITER)
252 /*
253 * This will protect from receiving multiple teardown
254 * commands happening at once.
255 */
256 return (EBUSY);
257 }
258
259 if (!grab_lock) {
260 grab_lock = TRUE;
261 goto retry_with_lock;
262 }
We will try to check inst->dmi_teardown_in_progress twice. The first time we will do it without the lock. The only way it can be 1 is if a tear down is in progress. We aren't going to keep the lock in that case, we are going to exit right away. But if it is 0 here, we have no way of knowing whether another thread just modified it. So in that case we have to grab the lock and check again.
I thought this would solve the problem, but I ended up with the same panic. The new issue can be found in rwlock(9F):
The rw_enter() function acquires the lock, and blocks if necessary. If enter_type is RW_READER, the caller blocks if there is a writer or a thread attempting to enter for writ- ing. If enter_type is RW_WRITER, the caller blocks if any thread holds the lock.
So say a WRITER comes along, it will wait until all of the READERS drain. At that point, we know the refcnt is 0. The problem is that if another READER comes along, it will now block until the WRITER is done. It will keep the lock active:
...
dmi_inst_lock = {
_opaque = [ 0x10 ]
}
...
[1]> ffffff01e60c6dc8::print dserv_mds_instance_t dmi_inst_lock|::rwlock
ADDR OWNER/COUNT FLAGS WAITERS
ffffff01e60c6e08 READERS=2 B000
What we have to do as a READER is try to grab the lock. If we have to block, then in this case only, we know tear-down has started!
[th199096@aus-build-x86 nfs]> grep dserv_instance_enter *c dserv_mds.c:dserv_instance_enter(krw_t lock_type, boolean_t create_instance, dserv_mds.c: * This function frees any of the locks taken by dserv_instance_enter dserv_mds.c: error = dserv_instance_enter(RW_WRITER, B_FALSE, &inst); dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst); dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst); dserv_mds.c: error = dserv_instance_enter(RW_READER, B_TRUE, &inst); dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst); dserv_mds.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst); dserv_server.c: error = dserv_instance_enter(RW_READER, B_FALSE, &inst);
So we could do this:
238 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
239 if (lock_type == RW_READER)
240 return (EIO);
241 else if (lock_type == RW_WRITER)
242 rw_enter(&inst->dmi_inst_lock, lock_type);
243 }
But I really hate doing this as it makes an assumption about there only ever being one reason to grab as a WRITER and no way to programmatically enforce it. Would a comment suffice?
What we really want to do is sleep and when awoken, retry to grab the lock. We would also have to get the instance pointer fresh.
I think a quick comment and this change will accomplish all I want:
238 if (rw_tryenter(&inst->dmi_inst_lock, lock_type) == 0) {
239 if (lock_type == RW_READER)
240 return (EAGAIN);
241 else if (lock_type == RW_WRITER)
242 rw_enter(&inst->dmi_inst_lock, lock_type);
243 }
I.e., let the caller try again if it wants to!
So code, build, test!
Thanks.
Posted by porno izle on June 28, 2009 at 05:38 PM CDT #