I've been spamming the MDS with file creates and deletes for about a day, and now I want to see whether the new code is leaking memory:
> ::findleaks
CACHE LEAKED BUFCTL CALLER
ffffff01c68262e0 1 ffffff0203ff9e70 mds_compound+0x54
ffffff01c68262e0 3 ffffff01eb07ada0 mds_compound+0x54
ffffff01c682b2e0 3 ffffff01de7f1818 mds_compound+0x193
ffffff01c6828020 1 ffffff020136ba50 mds_get_server_impl_id+0x30
ffffff01c68262e0 1 ffffff0203ff9a38 mds_get_server_impl_id+0x58
ffffff01c6826b20 1 ffffff01eb624ca8 mds_get_server_impl_id+0x8a
ffffff01c6828860 1 ffffff01e3fa6898 modinstall+0x129
ffffff01c6828860 1 ffffff01d82cd960 modinstall+0x129
ffffff01c6828860 1 ffffff01d61493a0 modinstall+0x129
ffffff01c682b2e0 1 ffffff02f1f72da8 modinstall+0x129
ffffff01c6828860 1 ffffff01e3fa6970 modinstall+0x129
ffffff01c682b2e0 1 ffffff01de7f13e0 modinstall+0x129
ffffff01c6828020 1 ffffff01fed32cd8 rpc_init_taglist+0x25
ffffff01c6828020 2 ffffff01e6bb68c0 rpc_init_taglist+0x25
ffffff01c6828020 1 ffffff020136bdb0 rpc_init_taglist+0x25
ffffff01c6828020 1311 ffffff01d742e4a0 rpc_init_taglist+0x25
ffffff01c68265a0 1 ffffff01f6a1ecb0 tohex+0x32
ffffff01c6828020 2 ffffff020136b390 xdr_array+0xae
ffffff01c68285a0 1 ffffff01d67cc068 xdr_bytes+0x70
ffffff01c68262e0 1154050 ffffff01eb07c7c8 xdr_bytes+0x70
ffffff01c68282e0 1 ffffff0200f380e0 xdr_bytes+0x70
ffffff01c68262e0 3979053 ffffff01f9f569b0 xdr_bytes+0x70
------------------------------------------------------------------------
Total 5134439 buffers, 82195096 bytes
I knew about the rpc_init_taglist() leak from before my changes, so that leaves the xdr_bytes() entries, and yes, those are leaking:
> ffffff01f9f569b0$<bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffff01f9f569b0 ffffff01fa04fc68 2a7a9936fecc ffffff01d84574e0
ffffff01c68262e0 ffffff01c70614c0 0
kmem_cache_alloc_debug+0x283
kmem_cache_alloc+0x164
kmem_alloc+0xa3
xdr_bytes+0x70
xdr_mds_sid+0x21
xdr_ds_fh_v1+0x68
xdr_ds_fh+0x3f
xdr_ds_fh_fmt+0x3b
get_mds_ds_fh+0x46
ds_checkstate+0x3f
nfs_ds_cp_dispatch+0x9e
svc_getreq+0x20d
svc_run+0x197
svc_do_run+0x81
nfssys+0xa0e
BTW: I had enabled detailed accounting earlier by setting this in /etc/system:
set kmem_flags=0xf
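For readers wondering what 0xf turns on, here is a small sketch that splits the value into its individual debug bits. The KMF_* values are the ones from the illumos sys/kmem_impl.h header as far as I can tell, so treat them as an assumption and check your own headers; kmem_flags_describe() is a hypothetical helper written only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed values, matching illumos sys/kmem_impl.h; verify locally. */
#define KMF_AUDIT    0x00000001 /* transaction auditing (feeds bufctl_audit) */
#define KMF_DEADBEEF 0x00000002 /* deadbeef checking of freed buffers */
#define KMF_REDZONE  0x00000004 /* redzone checking past the buffer end */
#define KMF_CONTENTS 0x00000008 /* freed-buffer content logging */

struct flag_name {
    unsigned int bit;
    const char *name;
};

/*
 * Hypothetical helper: write the names of the set bits into out,
 * separated by '|', and return how many known bits were set.
 */
static int
kmem_flags_describe(unsigned int flags, char *out, size_t outlen)
{
    static const struct flag_name names[] = {
        { KMF_AUDIT, "KMF_AUDIT" },
        { KMF_DEADBEEF, "KMF_DEADBEEF" },
        { KMF_REDZONE, "KMF_REDZONE" },
        { KMF_CONTENTS, "KMF_CONTENTS" },
    };
    size_t used = 0;
    size_t i;
    int count = 0;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int n;

        if ((flags & names[i].bit) == 0)
            continue;
        n = snprintf(out + used, outlen - used, "%s%s",
            count ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= outlen - used)
            break; /* out of room; stop rather than truncate silently */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With flags set to 0xf all four bits are on. The audit bit is the one that matters here, since it is what records the allocating stack for each buffer.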
I think what happened is that I assumed the space in the ds_filehandle was fixed and that nothing was allocated for the mds_sid when it was XDR-decoded. There are several complications to this:
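As background, decode-side xdr_bytes() allocates a buffer on the caller's behalf when it is handed a NULL pointer, and that buffer then belongs to the caller. The following is a minimal userland sketch of that ownership rule, not the kernel code: sid_sketch_t, decode_sid_sketch(), release_sid_sketch() and the outstanding_allocs counter are all hypothetical names, and malloc() stands in for kmem_alloc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a variable-length ID; not the real mds_sid. */
typedef struct sid_sketch {
    unsigned int len;
    char *val; /* NULL on entry tells the decoder to allocate */
} sid_sketch_t;

/* Buffers handed out by the decoder and not yet released. */
static long outstanding_allocs = 0;

/*
 * Mimics the decode direction of xdr_bytes(): if the caller passes a
 * NULL val, allocate room for the bytes. A non-NULL val is assumed to
 * be large enough (the real routine is bounded by a max size argument).
 */
static int
decode_sid_sketch(sid_sketch_t *sid, const char *wire, unsigned int wire_len)
{
    if (sid->val == NULL) {
        sid->val = malloc(wire_len ? wire_len : 1);
        if (sid->val == NULL)
            return 0;
        outstanding_allocs++;
    }
    memcpy(sid->val, wire, wire_len);
    sid->len = wire_len;
    return 1;
}

/* The step that is easy to forget when the struct looks fixed-size. */
static void
release_sid_sketch(sid_sketch_t *sid)
{
    if (sid->val != NULL) {
        free(sid->val);
        outstanding_allocs--;
    }
    sid->val = NULL;
    sid->len = 0;
}
```

Every successful decode into a zeroed struct bumps the counter by one, and only the release brings it back down. A caller that decodes on each request and returns without releasing loses one buffer per request, which matches the shape of the xdr_bytes+0x70 counts in the ::findleaks output.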
Okay, from code inspection, the DS would have the same issue. I created a free routine and went through and applied it everywhere. Now when we add a new field with a memory allocation, we only have to change one place to clean it up.
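The idea of a single free routine can be sketched roughly as follows. This is an illustration under assumptions, not the routine from the actual change: ds_fh_sketch_t, its fields, and both functions are hypothetical, and the real code would presumably release memory with kmem_free() or xdr_free() rather than free().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length field, allocated at decode time. */
typedef struct var_bytes_sketch {
    unsigned int len;
    char *val;
} var_bytes_sketch_t;

/* Hypothetical file handle with more than one allocated member. */
typedef struct ds_fh_sketch {
    unsigned int flags;
    var_bytes_sketch_t sid;   /* stands in for the mds_sid */
    var_bytes_sketch_t extra; /* stands in for a field added later */
} ds_fh_sketch_t;

/* Stand-in for decoding: copy len bytes into a fresh allocation. */
static int
var_bytes_sketch_set(var_bytes_sketch_t *vb, const char *src, unsigned int len)
{
    char *copy = malloc(len ? len : 1);

    if (copy == NULL)
        return 0;
    memcpy(copy, src, len);
    free(vb->val); /* drop any previous buffer; free(NULL) is a no-op */
    vb->val = copy;
    vb->len = len;
    return 1;
}

/*
 * The one place that knows which members own memory. Callers never free
 * individual fields, so a new allocated member only needs a line here.
 * Pointers are reset so that a second call is harmless.
 */
static void
ds_fh_sketch_free(ds_fh_sketch_t *fh)
{
    if (fh == NULL)
        return;
    free(fh->sid.val);
    fh->sid.val = NULL;
    fh->sid.len = 0;
    free(fh->extra.val);
    fh->extra.val = NULL;
    fh->extra.len = 0;
}
```

Each code path that decodes a handle then ends with one call to ds_fh_sketch_free(), regardless of how many members the handle grows.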
Also, the leak was already happening before; we just never bothered to check for it or track it down.