Looks like my new code is not complete:
> ::findleaks
CACHE LEAKED BUFCTL CALLER
ffffff01c682a860 1 ffffff01d79523d0 dserv_mds_do_reportavail+0x210
ffffff01c68262e0 4 ffffff01ee2a9118 mds_compound+0x54
ffffff01c682b2e0 3 ffffff01e8721738 mds_compound+0x193
ffffff01c6828020 1 ffffff01f274dc00 mds_get_server_impl_id+0x30
ffffff01c68262e0 1 ffffff01e96acb40 mds_get_server_impl_id+0x58
ffffff01c6826b20 1 ffffff01e128de70 mds_get_server_impl_id+0x8a
ffffff01c6828860 1 ffffff01d87c66d8 modinstall+0x129
ffffff01c682b2e0 1 ffffff01ddb51748 modinstall+0x129
ffffff01c6828860 1 ffffff01d7f8f9b0 modinstall+0x129
ffffff01c6828020 1 ffffff01dd7db798 rpc_init_taglist+0x25
ffffff01c6828020 12741 ffffff01f180fdf8 rpc_init_taglist+0x25
ffffff01c6828020 1 ffffff01e3ede5e8 rpc_init_taglist+0x25
ffffff01c6828020 1 ffffff09ba2e4da0 rpc_init_taglist+0x25
ffffff01c6828020 23152 ffffff01e2e37cc8 rpc_init_taglist+0x25
ffffff01c68265a0 1 ffffff01ee74bd30 tohex+0x32
ffffff01c6828020 2 ffffff01d632b880 xdr_array+0xae
ffffff01c6828020 1 ffffff01fe11ec00 xdr_array+0xae
ffffff01c68282e0 1 ffffff01e5e4c2b0 xdr_bytes+0x70
ffffff01c68262e0 1659 ffffff01eaae3700 xdr_bytes+0x70
ffffff01c68285a0 1 ffffff01fe136de0 xdr_bytes+0x70
ffffff01c68262e0 1571800 ffffff01e7dde3b8 xdr_bytes+0x70
------------------------------------------------------------------------
Total 1609375 buffers, 26900440 bytes
> ffffff01e7dde3b8$<bufctl_audit
ADDR BUFADDR TIMESTAMP THREAD
CACHE LASTLOG CONTENTS
ffffff01e7dde3b8 ffffff01ea091bc8 4d6b30e7ab91 ffffff01d8245b40
ffffff01c68262e0 ffffff01c6b37000 ffffff01cc88be60
kmem_cache_alloc_debug+0x283
kmem_cache_alloc+0xa9
kmem_alloc+0xa3
xdr_bytes+0x70
xdr_mds_sid+0x21
xdr_ds_fh_v1+0x68
xdr_ds_fh+0x3f
xdr_decode_nfs41_fh+0xdd
xdr_snfs_argop4+0x5e
xdr_COMPOUND4args_srv+0xf4
svc_authany_wrap+0x22
svc_cots_kgetargs+0x41
dispatch_dserv_nfsv41+0x5d
svc_getreq+0x20d
svc_run+0x197
By the way, the entries showing only 1 or 2 leaked buffers are probably memory that was still in active use when I forced the core.
So the big xdr_bytes leak, the one reached through xdr_mds_sid, is the second bug I claimed to have fixed earlier today. Of note is that we never saw a panic, so something at least is correct. I also decided to fix the rpc_init_taglist bug while I am at it.
I'm going to need to add some DTrace to track down what is happening here...
Aargh! I say, aargh! nfs4_xdr.c belongs to the nfs module and not the nfssrv module. For quick turnaround, I've only been rebuilding nfssrv and not the whole kernel. It was only when I changed nothing but nfs4_xdr.c and ran a dmake in src/uts/intel/nfssrv that I noticed nothing happened. My code may be golden after all! If it compiles, that is.
Okay, I did some other changes, but here is my compiling code:
    case OP_PUTFH: {
        nfs_fh4 *obj = &array[i].nfs_argop4_u.opputfh.object;

        if (obj->nfs_fh4_val == NULL)
            continue;

        DTRACE_NFSV4_1(xdr__i__op_putfh_version, uint32_t,
            minorversion);
        if (minorversion != 0) {
            struct mds_ds_fh *dsfh =
                (struct mds_ds_fh *)obj->nfs_fh4_val;

            DTRACE_NFSV4_1(xdr__i__op_putfh_type,
                nfs41_fh_type_t, dsfh->type);

            /*
             * Is it really a DS filehandle?
             */
            if (dsfh->type == FH41_TYPE_DMU_DS) {
                mds_sid *sid = &dsfh->fh.v1.mds_sid;

                DTRACE_NFSV4_1(xdr__i__op_putfh_sid,
                    mds_sid *, sid);

                if (sid->val) {
                    kmem_free(sid->val, sid->len);
                }
            }
        }

        kmem_free(obj->nfs_fh4_val, obj->nfs_fh4_len);
        continue;
    }
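To see why the inner kmem_free matters, here is a minimal userland sketch of the same pattern. It is not the kernel code: toy_fh, toy_sid, track_alloc, track_free, and toy_leaks_after are all hypothetical names, calloc/free stand in for kmem_alloc/kmem_free, and a simple counter stands in for what ::findleaks reports. The decoder makes two allocations per filehandle, so a free routine that only releases the outer buffer leaks one buffer per call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real mds_sid / mds_ds_fh layouts. */
struct toy_sid {
    size_t len;
    char *val;              /* separately allocated, like sid->val */
};

struct toy_fh {
    struct toy_sid sid;     /* embedded in the outer buffer */
};

/* Count of live allocations; a crude stand-in for leak detection. */
static long live_allocs;

static void *
track_alloc(size_t n)
{
    void *p = calloc(1, n);

    if (p != NULL)
        live_allocs++;
    return (p);
}

static void
track_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* "Decode": allocates the outer filehandle and the inner sid bytes. */
static struct toy_fh *
toy_decode(const char *bytes, size_t len)
{
    struct toy_fh *fh = track_alloc(sizeof (*fh));

    if (fh == NULL)
        return (NULL);
    fh->sid.val = track_alloc(len);
    if (fh->sid.val == NULL) {
        track_free(fh);
        return (NULL);
    }
    memcpy(fh->sid.val, bytes, len);
    fh->sid.len = len;
    return (fh);
}

/* Buggy free: releases only the outer buffer, so sid.val leaks. */
static void
toy_free_shallow(struct toy_fh *fh)
{
    track_free(fh);
}

/* Fixed free: inner buffer first, then the outer one. */
static void
toy_free_deep(struct toy_fh *fh)
{
    if (fh == NULL)
        return;
    track_free(fh->sid.val);
    track_free(fh);
}

/* Decode and free `rounds` filehandles; return how many buffers leaked. */
long
toy_leaks_after(int deep, int rounds)
{
    long before = live_allocs;
    int i;

    for (i = 0; i < rounds; i++) {
        struct toy_fh *fh = toy_decode("abcd", 4);

        if (fh == NULL)
            continue;
        if (deep)
            toy_free_deep(fh);
        else
            toy_free_shallow(fh);
    }
    return (live_allocs - before);
}
```

Calling toy_leaks_after(0, n) exercises the shallow free and toy_leaks_after(1, n) the deep one. The order inside toy_free_deep matters for the same reason it does in the kernel code: once the outer buffer is gone, the pointer to the inner one is gone with it.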
And I added this simple DTrace script:
[root@pnfs-17-22 ~]> more ds.d
#!/usr/sbin/dtrace -s

nfsv4:::xdr-i-op_putfh_version
{
    printf("xdr decode a FH -- version == %u",
        (uint32_t)arg0);
}

nfsv4:::xdr-i-op_putfh_type
{
    printf("xdr decode a FH -- type == %s",
        (int)arg0 == 2 ? "DS" : "regular");
}

nfsv4:::xdr-i-op_putfh_sid
{
    sid = (mds_sid *)arg0;
    printf("xdr decode a FH -- sid == %s",
        sid == NULL ? "(null)" : "valid");
}
Which shows:
[root@pnfs-17-22 ~]> ./ds.d
dtrace: script './ds.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0   2834 xdr_snfs_argop4_free:xdr-i-op_putfh_version xdr decode a FH -- version == 1
  0   2833 xdr_snfs_argop4_free:xdr-i-op_putfh_type xdr decode a FH -- type == DS
  0   2832 xdr_snfs_argop4_free:xdr-i-op_putfh_sid xdr decode a FH -- sid == valid
  0   2834 xdr_snfs_argop4_free:xdr-i-op_putfh_version xdr decode a FH -- version == 1
  0   2833 xdr_snfs_argop4_free:xdr-i-op_putfh_type xdr decode a FH -- type == DS
  0   2832 xdr_snfs_argop4_free:xdr-i-op_putfh_sid xdr decode a FH -- sid == valid
But I still have to check back later to see if there are memory leaks!
I've been trying to show how you would use kmdb and ::findleaks to track down memory leaks. You need to do this with XDR code, even the machine-generated stuff. You also need to do it before you integrate, not after. I've fixed two leaks that were pre-existing. They would probably have gone unnoticed until either someone had a regression test session flunk because of accumulated memory leaks (the mds_sid leaks would do it) or we sat down to find them before shipping code.
The other thing about memory leaks is that you have to test again after you fix them: you might find more, find out your fix didn't work, or find out your fix uncovered others.
And perhaps it is time to remind you of my other disclaimer: I don't hide my braindead mistakes. I show them in the hope that someone can learn from them, even if it is just me. :->