« December 2009
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today
XML

Neat blogs

Navigation

Editing

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

20090728 Tuesday July 28, 2009
Tracking down a nasty refcnt bug with rfs4_file_t, or am I?

I'm interested in that layout and mpd entries get harvested as they become stale. Because of the prior assumption that there was only ever 1 of each, it never was an issue.

If we recall our mappings, each mpd can belong to several layouts, each layout can belong to several files, etc. So when I discovered that mpds were sticking in memory with a refcnt of 2, I checked the layouts, saw that they had a refcnt of 2, and then checked the files:

[root@pnfs-17-24 ~]> more rlays.sh 
#!/bin/sh -x
echo '::walk mds_Layout_entry_cache|::print  struct rfs4_dbe' | mdb -k
[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2

Now, some of these would go away, but in the end, several would stick in memory. I actually found 2 nasty rfs4_file_t rele bugs and fixed them, but the issue stayed.

I thought my last fix did the trick, I turned my attention away for a bit as I worked on my MacBook Pro, and the file references went away. I rebooted and tried again. No such luck, that was when I got the above.

I unmounted the mds from my two clients and went away to study the code, trying to find out how an mds_op_close() did a file rele which would lead to the layout rele.

I was going to reboot, without any luck, and try the case where I created the files and did nothing else - the remove code is suspect to me still. Well, before I did that, I checked the file refcnts:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k+  
echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1

My guess is that somehow the clients are causing state to be clung to on the server. It is only on the unmount that a final rele takes place. It has been some time since I did that last check:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k + echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 

[root@pnfs-17-24 ~]> ./rlays.sh 
+ mdb -k 
+ echo ::walk mds_Layout_entry_cache|::print  struct rfs4_dbe 
[root@pnfs-17-24 ~]> ./rmpd.sh 
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  
{
    lock = [
        {
            _opaque = [ 0 ]
        },
    ]
    refcnt = 0x1

Whoops, I expected the mpds to be out of memory as well and after a bit of patience, they are. :->

[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  

A quick test of creating 4 files sees:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x5
    refcnt = 0x5
    refcnt = 0x5
    refcnt = 0x5

If we then unmount, the snoop is mainly the following:

[root@pnfs-4-02 ~]> grep RETURN\) xxx
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)

And indeed, we see the refcnts get decremented by two for each file:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x3
    refcnt = 0x3
    refcnt = 0x3
    refcnt = 0x3

And after a while:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 

I'm actually happy that this strange behavior occurred. I wouldn't have taken the time to painstakingly poured over every file rele and I wouldn't have thus found the two bugs. The first was in error handling in the remove case and the second was in file delegation recalls. The first was probably hit or miss to being found. The second was a timebomb.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/tracking_down_a_nasty_refcnt
Comments:

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed