I've got a bug for which I do not have a reproducible test case. But I'm pretty confident I found what is going wrong. I can run regression tests to show I haven't broken anything - but those same regression tests never tripped the bug in the first place.
The fix is:
------- usr/src/uts/common/fs/nfs/nfs4_stub_vnops.c -------
Index: usr/src/uts/common/fs/nfs/nfs4_stub_vnops.c
23c23
< * Copyright 2007 Sun Microsystems, Inc. All rights reserved.
---
> * Copyright 2008 Sun Microsystems, Inc. All rights reserved.
27c27
< #pragma ident "@(#)nfs4_stub_vnops.c 1.3 07/10/25 SMI"
---
> #pragma ident "%Z%%M% %I% %E% SMI"
1751a1757,1769
> * Someone is already working on it. We
> * need to back off and let them proceed.
> *
> * We return EBUSY so that the caller knows
> * something is going on. Note that by that
> * time, the umount in the other thread
> * may have already occurred.
> */
> if (was_locked) {
> return (EBUSY);
> }
>
> /*
1762,1763c1780
< if (was_locked == FALSE &&
< !mutex_tryenter(&net->net_tree_lock)) {
---
> if (!mutex_tryenter(&net->net_tree_lock)) {
1814c1831
< } else if (was_locked == FALSE) {
---
> } else {
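In English, the old code only handled the case where the lock was not already held. When was_locked was true, it skipped the mutex_tryenter() and carried on anyway. With the fix, it backs off and returns EBUSY instead.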
What I want to do is force was_locked to be true at this point in both the original code (to verify I can trigger the panic at will) and also in my fix (to verify I have fixed the correct bug).
I can do that by adding the following code:
# wx diffs
------- usr/src/uts/common/fs/nfs/nfs4_stub_vnops.c -------
122a123,124
> int nfsv4_mm_was_locked = FALSE;
>
1750a1753,1755
> if (nfsv4_mm_was_locked)
> was_locked = TRUE;
This is a global in the nfs module which by default does not force was_locked to be set. I can use mdb to change it on the fly:
# mdb -kw
Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs mpt ip hook neti sctp arp usba fctl nca lofs cpc random zfs nfs fcip logindmux ptm sppp ]
> nfsv4_mm_was_locked::print
0
> nfsv4_mm_was_locked/W 1
nfsv4_mm_was_locked: 0 = 0x1
> nfsv4_mm_was_locked::print
0x1
> $q
Note that I do not want to add special code to check the environment, add something in /etc/default/nfs, or anything else which requires changing anything on a system. I leave it entirely in the kernel and I use mdb to control it.
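For anyone who wants to play with the pattern outside the kernel, here is a minimal user-space sketch of it. This is not the real nfs4_stub_vnops.c code: debug_force_was_locked and try_unmount() are hypothetical stand-ins for nfsv4_mm_was_locked and the patched routine, and the actual lock handling is left out. It only shows how a global that defaults to off can push execution down the "already locked" path.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for nfsv4_mm_was_locked. It defaults to 0,
 * so normal runs are unaffected until a debugger flips it.
 */
int debug_force_was_locked = 0;

/*
 * Hypothetical stand-in for the patched routine. The was_locked
 * argument plays the role of whatever the real detection logic
 * reported about the lock.
 */
int
try_unmount(bool was_locked)
{
	/* Test hook: pretend the lock was already held. */
	if (debug_force_was_locked)
		was_locked = true;

	/* The fix: someone else is working on it, so back off. */
	if (was_locked)
		return (EBUSY);

	/* Otherwise carry on with the normal path (elided here). */
	return (0);
}
```

With the flag at its default, try_unmount(false) takes the normal path. Once the flag is set to 1, the same call returns EBUSY. That is the behavior I want to provoke in the kernel by writing to the global from mdb.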