Saturday August 01, 2009
memleaks plugged!

The memleaks on the DSes are plugged:

> ::findleaks
CACHE             LEAKED           BUFCTL CALLER
ffffff01d459d5a0       1 ffffff01eae60088 cralloc_flags+0x21
ffffff01c682a860       1 ffffff01d75d65e8 dserv_mds_do_reportavail+0x210
ffffff01c68262e0       4 ffffff01f3c3f1c0 mds_compound+0x54
ffffff01c682b2e0       3 ffffff01ecc0e6e8 mds_compound+0x193
ffffff01c6828020       2 ffffff02032b4d30 mds_get_server_impl_id+0x30
ffffff01c68262e0       2 ffffff01f3c3f0e8 mds_get_server_impl_id+0x58
ffffff01c6826b20       2 ffffff01dfb46cc0 mds_get_server_impl_id+0x8a
ffffff01c6828860       1 ffffff01d7ba0740 modinstall+0x129
ffffff01c682b2e0       1 ffffff01ddfd1b60 modinstall+0x129
ffffff01c6828860       1 ffffff01d7ba0c50 modinstall+0x129
ffffff01c68265a0       2 ffffff01e159f890 tohex+0x32
ffffff01c6828020       2 ffffff01e3e6e398 xdr_array+0xae
ffffff01c68282e0       1 ffffff0201571818 xdr_bytes+0x70
ffffff01c68285a0       1 ffffff02590b2538 xdr_bytes+0x70
------------------------------------------------------------------------
           Total      24 buffers, 2356 bytes

And on the MDS, we see the same:

> ::findleaks
CACHE             LEAKED           BUFCTL CALLER
ffffff01c68262e0       4 ffffff01d6e497b8 mds_compound+0x54
ffffff01c682b2e0       3 ffffff01d69cdcb0 mds_compound+0x193
ffffff01c6828020       1 ffffff0202d71db8 mds_get_server_impl_id+0x30
ffffff01c6828020       1 ffffff03ea11d4c0 mds_get_server_impl_id+0x30
ffffff01c68262e0       2 ffffff01ebc1a008 mds_get_server_impl_id+0x58
ffffff01c6826b20       2 ffffff0202808588 mds_get_server_impl_id+0x8a
ffffff01c68265a0       2 ffffff01f92d2458 tohex+0x32
ffffff01c6828020       2 ffffff0202d71c08 xdr_array+0xae
ffffff01c68285a0       1 ffffff07c864e128 xdr_bytes+0x70
ffffff01c68282e0       1 ffffff02027ff390 xdr_bytes+0x70
------------------------------------------------------------------------
           Total      19 buffers, 1496 bytes

Besides that nasty interaction in the DS XDR code, we can see that I've fixed the rpc_init_taglist() leak on both systems. BTW, I thought that one might have been in Nevada, but I checked and it is only in the nfs41-gate. Sweet, that means I don't have to backport it into a gate, which would cause another week of testing.

The code is ready to go. I have a code walkthrough next week, where I need to iron out the following issues:

  1. Do we leave in the print routines for spe debugging in the kernel?
    • Small footprint
    • But not needed
      • Could use dtrace to debug now that bulk of unit testing is done.
      • No need to print from kernel, may eventually want to write a file.
      • All printing can be done with sped.
  2. Is sped a loader or a daemon right now?
    • Only loads policies and nppols
    • Would have to re-run to change them
    • Not SMF-ized
  3. Biggest issue - is kspe part of nfssrv or nfs?
    • Put in nfssrv module because of need to get at mds_sids.
    • Not sure if client-side spe will ever work, because of issue with getting path from vnode.
    • Want to do this right, the first integration!

Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

One of those memory leaks is still there on the DS

Looks like my new code is not complete:


> ::findleaks
CACHE             LEAKED           BUFCTL CALLER
ffffff01c682a860       1 ffffff01d79523d0 dserv_mds_do_reportavail+0x210
ffffff01c68262e0       4 ffffff01ee2a9118 mds_compound+0x54
ffffff01c682b2e0       3 ffffff01e8721738 mds_compound+0x193
ffffff01c6828020       1 ffffff01f274dc00 mds_get_server_impl_id+0x30
ffffff01c68262e0       1 ffffff01e96acb40 mds_get_server_impl_id+0x58
ffffff01c6826b20       1 ffffff01e128de70 mds_get_server_impl_id+0x8a
ffffff01c6828860       1 ffffff01d87c66d8 modinstall+0x129
ffffff01c682b2e0       1 ffffff01ddb51748 modinstall+0x129
ffffff01c6828860       1 ffffff01d7f8f9b0 modinstall+0x129
ffffff01c6828020       1 ffffff01dd7db798 rpc_init_taglist+0x25
ffffff01c6828020   12741 ffffff01f180fdf8 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff01e3ede5e8 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff09ba2e4da0 rpc_init_taglist+0x25
ffffff01c6828020   23152 ffffff01e2e37cc8 rpc_init_taglist+0x25
ffffff01c68265a0       1 ffffff01ee74bd30 tohex+0x32
ffffff01c6828020       2 ffffff01d632b880 xdr_array+0xae
ffffff01c6828020       1 ffffff01fe11ec00 xdr_array+0xae
ffffff01c68282e0       1 ffffff01e5e4c2b0 xdr_bytes+0x70
ffffff01c68262e0    1659 ffffff01eaae3700 xdr_bytes+0x70
ffffff01c68285a0       1 ffffff01fe136de0 xdr_bytes+0x70
ffffff01c68262e0 1571800 ffffff01e7dde3b8 xdr_bytes+0x70
------------------------------------------------------------------------
           Total 1609375 buffers, 26900440 bytes
> ffffff01e7dde3b8$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffff01e7dde3b8 ffffff01ea091bc8     4d6b30e7ab91 ffffff01d8245b40
                 ffffff01c68262e0 ffffff01c6b37000 ffffff01cc88be60
                 kmem_cache_alloc_debug+0x283
                 kmem_cache_alloc+0xa9
                 kmem_alloc+0xa3
                 xdr_bytes+0x70
                 xdr_mds_sid+0x21
                 xdr_ds_fh_v1+0x68
                 xdr_ds_fh+0x3f
                 xdr_decode_nfs41_fh+0xdd
                 xdr_snfs_argop4+0x5e
                 xdr_COMPOUND4args_srv+0xf4
                 svc_authany_wrap+0x22
                 svc_cots_kgetargs+0x41
                 dispatch_dserv_nfsv41+0x5d
                 svc_getreq+0x20d
                 svc_run+0x197

By the way, those leaks of 1 or 2 buffers are probably memory that was still in active use when I forced the core.
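To make that triage less of an eyeball exercise, here is a minimal userland sketch that parses saved ::findleaks rows and totals the LEAKED column per caller. It assumes the four-column layout shown above; the names leak_row, parse_leak_row, leaked_by_caller, and suspect_rows are made up for illustration, and the threshold is a judgment call, not something mdb provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One data row of ::findleaks output: CACHE LEAKED BUFCTL CALLER. */
struct leak_row {
	unsigned long long cache;
	unsigned long leaked;
	unsigned long long bufctl;
	char caller[64];
};

/*
 * Parse one line. Returns 1 for a data row, 0 for anything else
 * (column header, dashed separator, "Total" line, blank line).
 */
int
parse_leak_row(const char *line, struct leak_row *row)
{
	assert(line != NULL && row != NULL);
	return (sscanf(line, "%llx %lu %llx %63s", &row->cache,
	    &row->leaked, &row->bufctl, row->caller) == 4);
}

/*
 * Sum LEAKED over every row whose caller is func+offset, so that
 * "xdr_bytes" matches "xdr_bytes+0x70" but not "xdr_bytes_foo+0x10".
 */
unsigned long
leaked_by_caller(const char *const *lines, size_t nlines, const char *func)
{
	unsigned long total = 0;
	size_t flen = strlen(func);

	for (size_t i = 0; i < nlines; i++) {
		struct leak_row row;

		if (!parse_leak_row(lines[i], &row))
			continue;
		if (strncmp(row.caller, func, flen) == 0 &&
		    row.caller[flen] == '+')
			total += row.leaked;
	}
	return (total);
}

/*
 * Count rows whose LEAKED value exceeds threshold. The threshold is
 * an assumption: small counts may just be buffers in flight.
 */
size_t
suspect_rows(const char *const *lines, size_t nlines, unsigned long threshold)
{
	size_t count = 0;

	for (size_t i = 0; i < nlines; i++) {
		struct leak_row row;

		if (parse_leak_row(lines[i], &row) && row.leaked > threshold)
			count++;
	}
	return (count);
}
```

Feeding it the output above with a threshold of 2 would single out the big rpc_init_taglist and xdr_bytes rows, though it would also flag the steady 3 and 4 counts under mds_compound, so the result is a starting point rather than a verdict.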

So this is the second bug I claimed to have fixed earlier today. Of note is that we never saw a panic, so something at least is correct. And, I decided to fix the rpc_init_taglist bug while I am at it.

I'm going to need to add some DTrace to track down what is happening here...

Aargh! I say, aargh! nfs4_xdr.c belongs to the nfs module and not the nfssrv module. For quick turnaround, I've only been rebuilding nfssrv and not the whole kernel. It was only when I changed just nfs4_xdr.c and tried a dmake in src/uts/intel/nfssrv that I noticed nothing happened. My code may be golden after all! If it compiles, that is.

Okay, I did some other changes, but here is my compiling code:

case OP_PUTFH: {
        nfs_fh4 *obj = &array[i].nfs_argop4_u.opputfh.object;

        if (obj->nfs_fh4_val == NULL)
                continue;

        DTRACE_NFSV4_1(xdr__i__op_putfh_version, uint32_t,
            minorversion);
        if (minorversion != 0) {
                struct mds_ds_fh        *dsfh =
                    (struct mds_ds_fh *)obj->nfs_fh4_val;

                DTRACE_NFSV4_1(xdr__i__op_putfh_type,
                    nfs41_fh_type_t, dsfh->type);

                /*
                 * Is it really a DS filehandle?
                 */
                if (dsfh->type == FH41_TYPE_DMU_DS) {
                        mds_sid *sid = &dsfh->fh.v1.mds_sid;

                        DTRACE_NFSV4_1(xdr__i__op_putfh_sid,
                            mds_sid *, sid);

                        if (sid->val) {
                                kmem_free(sid->val, sid->len);
                        }
                }
        }

        kmem_free(obj->nfs_fh4_val, obj->nfs_fh4_len);
        continue;
}
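
The pattern in that free path is "release the nested buffer before its container". Here is a rough userland analogue, using malloc/free in place of kmem_alloc/kmem_free. The structs and the fh_decode/fh_free helpers are hypothetical stand-ins, not the real nfs41-gate types, and FH_TYPE_DS being 2 is only inferred from the type check in my DTrace script.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define	FH_TYPE_DS	2	/* assumption: matches "type == 2" in ds.d */

/* Hypothetical stand-in for mds_sid: a counted, separately allocated blob. */
struct sid {
	unsigned int len;
	char *val;
};

/* Hypothetical stand-in for the decoded DS filehandle. */
struct ds_fh {
	int type;
	struct sid sid;
};

/*
 * Mimic the decode side: one allocation for the filehandle and a
 * second, nested one for the sid bytes.
 */
struct ds_fh *
fh_decode(const void *sid_bytes, unsigned int len)
{
	struct ds_fh *fh = calloc(1, sizeof (*fh));

	if (fh == NULL)
		return (NULL);
	fh->type = FH_TYPE_DS;
	if (len != 0) {
		assert(sid_bytes != NULL);
		fh->sid.val = malloc(len);
		if (fh->sid.val == NULL) {
			free(fh);
			return (NULL);
		}
		memcpy(fh->sid.val, sid_bytes, len);
		fh->sid.len = len;
	}
	return (fh);
}

/*
 * Free the nested buffer first. Once fh itself is freed, sid.val can
 * no longer be reached and would leak. Returns the bytes released.
 */
size_t
fh_free(struct ds_fh *fh)
{
	size_t released = 0;

	if (fh == NULL)
		return (0);
	if (fh->type == FH_TYPE_DS && fh->sid.val != NULL) {
		released += fh->sid.len;
		free(fh->sid.val);
	}
	released += sizeof (*fh);
	free(fh);
	return (released);
}
```

Dropping the inner free from fh_free would still run without a crash, which matches what I saw: no panic, just a steadily growing xdr_bytes row in ::findleaks.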

And I added this simple DTrace script:

[root@pnfs-17-22 ~]> more ds.d 
#!/usr/sbin/dtrace -s

nfsv4:::xdr-i-op_putfh_version
{
        printf("xdr decode a FH -- version == %u",
            (uint32_t)arg0);
}

nfsv4:::xdr-i-op_putfh_type
{
        printf("xdr decode a FH -- type == %s",
            (int)arg0 == 2 ? "DS" : "regular");
}

nfsv4:::xdr-i-op_putfh_sid
{
        sid = (mds_sid *)arg0;

        printf("xdr decode a FH -- sid == %s",
            sid == NULL ? "(null)" : "valid");
}

Which shows:

[root@pnfs-17-22 ~]> ./ds.d
dtrace: script './ds.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0   2834 xdr_snfs_argop4_free:xdr-i-op_putfh_version xdr decode a FH -- version == 1
  0   2833 xdr_snfs_argop4_free:xdr-i-op_putfh_type xdr decode a FH -- type == DS
  0   2832 xdr_snfs_argop4_free:xdr-i-op_putfh_sid xdr decode a FH -- sid == valid
  0   2834 xdr_snfs_argop4_free:xdr-i-op_putfh_version xdr decode a FH -- version == 1
  0   2833 xdr_snfs_argop4_free:xdr-i-op_putfh_type xdr decode a FH -- type == DS
  0   2832 xdr_snfs_argop4_free:xdr-i-op_putfh_sid xdr decode a FH -- sid == valid

But I still have to check back later to see if there are memory leaks!

I've been trying to show how you would use kmdb and ::findleaks to track down memory leaks. You need to do this with XDR code, even the machine-generated stuff. You also need to do it before you integrate and not after. I've fixed two leaks that were pre-existing. They would probably have gone unnoticed until either someone had a regression test session flunk because of accumulated memory leaks (the mds_sid leaks would do it) or we sat down to find them before shipping code.

The other thing about memory leaks is that you have to test again after you fix them: you might find more, find out your fix didn't work, or find out your fix uncovered others.
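
One cheap way to make "test again after the fix" repeatable outside the kernel is a counting allocator: run the operation many times and check that nothing is left outstanding. This is only a sketch of that idea in userland C. Everything here (tracked_alloc, tracked_free, leak_check, and the two sample operations) is invented for illustration; it is not how ::findleaks works.

```c
#include <assert.h>
#include <stdlib.h>

static long outstanding;	/* buffers allocated but not yet freed */

void *
tracked_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		outstanding++;
	return (p);
}

void
tracked_free(void *p)
{
	if (p != NULL) {
		outstanding--;
		free(p);
	}
}

/*
 * Run op the given number of times and report how many buffers it
 * left behind. Zero means no leak was observed on this path.
 */
long
leak_check(void (*op)(void), int iterations)
{
	long before = outstanding;

	assert(op != NULL);
	for (int i = 0; i < iterations; i++)
		op();
	return (outstanding - before);
}

/* Hypothetical buggy path: frees the container, forgets the inner buffer. */
void
leaky_op(void)
{
	void *outer = tracked_alloc(32);
	void *inner = tracked_alloc(16);

	(void) inner;
	tracked_free(outer);
}

/* Hypothetical fixed path: inner buffer first, then the container. */
void
fixed_op(void)
{
	void *outer = tracked_alloc(32);
	void *inner = tracked_alloc(16);

	tracked_free(inner);
	tracked_free(outer);
}
```

A one-buffer-per-call leak shows up as a count equal to the iteration count, which is the same shape as the 1571800 buffers hanging off xdr_bytes+0x70.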

And perhaps it is time to remind you of my other disclaimer: I don't hide my braindead mistakes. I show them in hopes that someone can learn from them, even if it is just me. :->

