« November 2009
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today
XML

Neat blogs

Navigation

Editing

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

20090505 Tuesday May 05, 2009
Debugging a NFSv4.1 server database reference count problem

I'm in the process of making sure the MDS cleans up memory when a DS reboots. As such, I've been trying to figure out where a refcnt is held or released for a rfs4_dbe. I added two circular buffers to keep track of these events.

I just did 10 back-to-back cthon test runs and turned my head away. I was trying to figure out if mds_gather_devs leaked ds_addrlist_ts. The answer is no, but I did get a panic.

[root@pnfs-17-24 ~]> 
panic[cpu1]/thread=ffffff01d16dee80: assertion failed: e->refcnt > 1, file: ../../common/fs/nfs/nfs4_db.c, line: 193

ffffff00094836d0 genunix:assfail+7e ()
ffffff0009483700 nfssrv:rfs4_dbe_rele+98 ()
ffffff0009483720 nfssrv:rfs41_session_rele+1a ()
ffffff0009483740 nfssrv:rfs41_compound_state_free+8a ()
ffffff00094837d0 nfssrv:rfs41_dispatch+1bd ()
ffffff0009483820 nfssrv:rfs4_minor_version_dispatch+5c ()
ffffff0009483b20 nfssrv:common_dispatch+7a6 ()
ffffff0009483b40 nfssrv:rfs_dispatch+2d ()
ffffff0009483c30 rpcmod:svc_getreq+20d ()
ffffff0009483ca0 rpcmod:svc_run+197 ()
ffffff0009483cd0 rpcmod:svc_do_run+81 ()
ffffff0009484eb0 nfs:nfssys+9f1 ()
ffffff0009484f00 unix:brand_sys_syscall32+292 ()

panic: entering debugger (continue to save dump)

And this is probably something I added - not something already existing.

So I have these arrays, how do I figure out what is going on?

[1]> $C
fffffffffbc65740 kmdb_enter+0xb()
fffffffffbc65760 debug_enter+0x38(fffffffffb961ea0)
fffffffffbc65830 panicsys+0x40e(fffffffffbf50518, ffffff0009483660, 
fffffffffbc65840, 1)
ffffff00094835a0 vpanic+0x15d()
ffffff0009483690 panic+0x94()
ffffff00094836d0 assfail+0x7e(fffffffff81e4910, fffffffff81e48f0, c1)
ffffff0009483700 nfssrv`rfs4_dbe_rele+0x98(ffffff01e678fad8)
ffffff0009483720 nfssrv`rfs41_session_rele+0x1a(ffffff01e678fb68)
ffffff0009483740 nfssrv`rfs41_compound_state_free+0x8a(ffffff027ffb7c20)
ffffff00094837d0 nfssrv`rfs41_dispatch+0x1bd(ffffff0009483bb0, ffffff01f18ab940
, ffffff0009483880)
ffffff0009483820 nfssrv`rfs4_minor_version_dispatch+0x5c(ffffff0009483bb0, 
ffffff01f18ab940, ffffff0009483880)
ffffff0009483b20 nfssrv`common_dispatch+0x7a6(ffffff0009483bb0, ffffff01f18ab940
, 2, 4, fffffffff81e1e40, ffffffffc011e3e0)
ffffff0009483b40 nfssrv`rfs_dispatch+0x2d(ffffff0009483bb0, ffffff01f18ab940)
ffffff0009483c30 rpcmod`svc_getreq+0x20d(ffffff01f18ab940, ffffff01d63f2f60)
ffffff0009483ca0 rpcmod`svc_run+0x197(ffffff01e4a0bbb8)
ffffff0009483cd0 rpcmod`svc_do_run+0x81(1)
ffffff0009484eb0 nfs`nfssys+0x9f1(e, fe580fc0)
ffffff0009484f00 sys_syscall32+0x1fc()

The rfs4_dbe_t is at ffffff01e678fad8:

[1]> ffffff01e678fad8::print struct rfs4_dbe
{
    lock = [
        {
            _opaque = [ 0xffffff01d16dee80 ]
        }
    ]
    refcnt = 0x1
    skipsearch = 0x1
    invalid = 0x1
    reserved = 0x3addcafe
    time_rele = 2009 May  5 16:17:41
    inval_hint = nfssrv`mds_session_inval+0x9e
    id = 0x64
    cv = [
        {
            _opaque = 0
        }
    ]
    data = 0xffffff01e678fb68
    rtr = {
        rtr_count = 0x3
        rtr_rele_idx = 0x1
        rtr_hold_idx = 0x2
        rtr_rele = 0xffffff01e7a9c1b0 
        rtr_hold = 0xffffff01e7a9c180 
    }
    table = 0xffffff01d9494700
    indices = [
        {
            next = 0
            prev = 0
            entry = 0xffffff01e678fad8
        }
    ]
}

The rtr structure holds the refcnt tracking data. We are tracking 3 items per buffer and we can see the last ones recorded would have been rele=0 and hold=1. (I.e., we are going to write at rele=1 and hold=2, so we subtract 1 to get the previous record.)

I can use a dcmd to see each array:

[1]> 0xffffff01e7a9c1b0::array caddr_t 3 |::print caddr_t
nfssrv`mds_destroysession+0x3e
nfssrv`rfs41_compound_state_free+0x8a
nfssrv`mds_session_inval+0xa7
[1]> 0xffffff01e7a9c180::array caddr_t 3 |::print caddr_t
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154
nfssrv`rfs4_dbsearch+0x154

I need to script this up to run from the command line, but for now this will suffice. Hmm, the rfs4_dbsearch information isn't too useful. It would be better to have recorded the caller of it. I have that capability, I'll just need to make sure to call it in rfs4_dbsearch.

Okay, the buffer size is too small here to help. But I could bump it up and rerun the tests. I'm pretty confident that it will trip again. But, I also happen to know I have made changes here and I can go revisit those changes to see where I made a mistake.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/debugging_a_nfsv4_1_server
Comments:

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed