I finally have the XDR flowing and my debug code barfed:
kl -- 10, 16, 64000, 35865356092523570
spe_print_leaf: Unknown type -17958194 == spe_print_leaf: Unknown type -456718248
kl -- 20, 32, 2000, 35865356092523570
spe_print_leaf: Unknown type -17958194 == spe_print_leaf: Unknown type -696433200
kl -- 30, 64, 1000, 35865356092523570
spe_print_leaf: Unknown type -17958194 == spe_print_leaf: Unknown type -696433160
kl -- 35, 2, 2000, 35865356092523570
(spe_print_leaf: Unknown type -17958194
panic[cpu0]/thread=ffffff01d717ee20: BAD TRAP: type=e (#pf Page fault) rp=ffffff00084fa5d0 addr=fffffffd97030238
The output is pretty interesting: everything is good until the attribute expression, and even that kinda has the right shape.
After a reboot and turning off the dump of the policies, mdb shows us some interesting things:
> Spe_HQ::print
{
    sc_rwlock = {
        _opaque = [ 0 ]
    }
    sc_policies = 0xffffff01d92f9780
}
> Spe_HQ::print struct spe_control sc_policies | ::print struct spe_policy
{
    sp_id = 0xa
    sp_stripe_count = 0x10
    sp_interlace = 0xfa00
    sp_attr_expr = 0xffffff01e5a859d0
    sp_name = 0
    sp_guuids = [ 0x7f6b59f1a34432, 0, 0, 0, 0, 0, 0, 0 ]
    next = 0xffffff01d4a81008
}
> 0xffffff01e5a859d0::print struct spe_interior
{
    si_op = 3 (SPE_OP_EQUAL)
    si_parens = 0
    si_children = 0x2
    si_branches = 0xffffff01e80bd2b0
}
> 0xffffff01e80bd2b0::print spe_thunk_t
{
    st_is_interior = 0
    st_node = 0xffffff01e4b059c0
}
The spe_interior looks good. What started to confuse me was that I knew si_branches was an array and I wanted to look at the 2nd element.
I checked the header:
typedef struct spe_interior {
	spe_operators_t si_op;
	bool_t si_parens;
	uint_t si_children;
	spe_thunk_t *si_branches;
} spe_interior_t;
And I checked the XDR:
bool_t
xdr_spe_interior_t(XDR *xdrs, spe_interior_t *objp)
{
	if (!xdr_spe_operators_t(xdrs, &objp->si_op))
		return (FALSE);
	if (!xdr_bool(xdrs, &objp->si_parens))
		return (FALSE);
	if (!xdr_uint_t(xdrs, &objp->si_children))
		return (FALSE);
	if (!xdr_pointer(xdrs, (char **)&objp->si_branches,
	    sizeof (spe_thunk_t), (xdrproc_t)xdr_spe_thunk_t))
		return (FALSE);
	return (TRUE);
}
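And the fact that I went from C to XDR and back to C is biting me here. xdr_pointer() only serializes a single object of the given size, so at most the first thunk goes over the wire. si_branches should probably be handled with xdr_vector() instead. The userland XDR code is pulling apart the data the same way I tried to in the debugger.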
At first blush, I want to do this:
if (!xdr_vector(xdrs, (char *)objp->si_branches,
    SPED_MAX_BRANCHES, sizeof (spe_thunk_t),
    (xdrproc_t)xdr_spe_thunk_t))
	return (FALSE);
But that won't work. si_branches is allocated and not a real array. I can make it an array, but that will waste space. And actually, I'm going to want to know how to handle the pool ids...
I've tried this:
if (xdrs->x_op == XDR_DECODE) {
	objp->si_branches = (spe_thunk_t *)
	    kmem_zalloc(sizeof (spe_thunk_t),
	    KM_SLEEP);
	if (!xdr_vector(xdrs,
	    (char *)objp->si_branches,
	    objp->si_children,
	    sizeof (spe_thunk_t),
	    (xdrproc_t)xdr_spe_thunk_t))
		return (FALSE);
} else if (xdrs->x_op == XDR_ENCODE) {
	if (!xdr_vector(xdrs,
	    (char *)objp->si_branches,
	    objp->si_children,
	    sizeof (spe_thunk_t),
	    (xdrproc_t)xdr_spe_thunk_t))
		return (FALSE);
} else {
	spe_thunk_t *st = objp->si_branches;
	if (!xdr_vector(xdrs,
	    (char *)objp->si_branches,
	    objp->si_children,
	    sizeof (spe_thunk_t),
	    (xdrproc_t)xdr_spe_thunk_t))
		return (FALSE);
	kmem_free(st, sizeof (spe_thunk_t));
}
But that cores in the kernel during decode:
...
trap+0x160f(ffffff00087e0490, 0, 1)
0xfffffffffb8002c0()
xdr_bool+0x5a(ffffff00087e0810, 0)
nfs`xdr_spe_thunk_t+0x25(ffffff00087e0810, 0)
nfs`xdr_vector+0x4f(ffffff00087e0810, 0, 3, 10, fffffffff804da50)
nfs`xdr_spe_interior_t+0x79(ffffff00087e0810, ffffff01e635a938)
...
[1]> ffffff01e635a938::print struct spe_interior
{
    si_op = 3 (SPE_OP_EQUAL)
    si_parens = 0
    si_children = 0x2
    si_branches = 0
}
I was leery about the xdr_vector calls, but at least in the kernel, I think it is correct so far.
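One thing worth keeping in mind with an allocated si_branches: the buffer has to hold si_children thunks, not just one, and the decode branch above only asks kmem_zalloc() for a single spe_thunk_t. Here is a minimal userland sketch of the counted-array pattern (write the count, then the elements; on decode, read the count, bound it, allocate that many elements, fill them in). It does not use the real XDR routines. buf_t, put_u32(), get_u32(), MAX_BRANCHES and the cut-down thunk_t are all made up for illustration, and the toy buffer holds host-order words rather than real XDR encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define	MAX_BRANCHES	8	/* hypothetical sanity cap on the count */

/* Toy stand-in for an XDR stream: 32-bit words, write and read cursors. */
typedef struct buf {
	uint32_t words[64];
	size_t len;		/* words written so far */
	size_t pos;		/* words read so far */
} buf_t;

/* Cut-down thunk: only the flag, no st_node (an assumption). */
typedef struct thunk {
	uint32_t is_interior;
} thunk_t;

typedef struct interior {
	uint32_t children;	/* number of entries in branches */
	thunk_t *branches;	/* heap array of `children` thunks */
} interior_t;

static int
put_u32(buf_t *b, uint32_t v)
{
	if (b->len >= sizeof (b->words) / sizeof (b->words[0]))
		return (0);
	b->words[b->len++] = v;
	return (1);
}

static int
get_u32(buf_t *b, uint32_t *v)
{
	if (b->pos >= b->len)
		return (0);
	*v = b->words[b->pos++];
	return (1);
}

/* Encode: the count goes first, then every element. */
static int
encode_interior(buf_t *b, const interior_t *in)
{
	uint32_t i;

	assert(in->children == 0 || in->branches != NULL);
	if (!put_u32(b, in->children))
		return (0);
	for (i = 0; i < in->children; i++)
		if (!put_u32(b, in->branches[i].is_interior))
			return (0);
	return (1);
}

/* Decode: read and bound the count, then allocate count elements. */
static int
decode_interior(buf_t *b, interior_t *out)
{
	uint32_t i, n;

	out->children = 0;
	out->branches = NULL;
	if (!get_u32(b, &n) || n > MAX_BRANCHES)
		return (0);
	if (n == 0)
		return (1);
	out->branches = calloc(n, sizeof (thunk_t));	/* n elements, not 1 */
	if (out->branches == NULL)
		return (0);
	for (i = 0; i < n; i++) {
		if (!get_u32(b, &out->branches[i].is_interior)) {
			free(out->branches);
			out->branches = NULL;
			return (0);
		}
	}
	out->children = n;
	return (1);
}
```

As far as I recall, the stock xdr_array() follows the same outline, but it puts its own count on the wire, so it would take over from the separate xdr_uint_t() of si_children rather than sit beside it.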
Looking at the userland code without calling the kernel, we see a 40 byte jump in the size of the XDR buffer:
ul - 16 bytes big -- XDR buffer is 620 big!
I'm going to go with the fixed array sizes to at least get something working that I can debug...
And that looks like what I was getting at the start. Hmm, a spe_thunk_t can either be a spe_leaf_t or a spe_interior_t. And I bet we are decoding it incorrectly!
I've fixed that and I guess I'm running into newer issues. More later, maybe!
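For the record, the general shape of decoding an either/or thunk is to read the flag first and let it pick which body to decode. The sketch below is not the actual fix. It is a self-contained userland toy in which buf_t, put_u32(), get_u32(), MAX_BRANCHES and the leaf_t fields are all hypothetical, the union is stored inline instead of behind st_node, and the buffer holds host-order words rather than real XDR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	MAX_BRANCHES	8	/* hypothetical cap on children */

typedef struct buf {		/* toy stream, not real XDR */
	uint32_t words[64];
	size_t len, pos;	/* words written, words read */
} buf_t;

typedef struct thunk thunk_t;
typedef struct leaf { uint32_t type, value; } leaf_t;	/* made-up fields */
typedef struct interior {
	uint32_t op, children;
	thunk_t *branches;
} interior_t;
struct thunk {
	uint32_t is_interior;	/* discriminant, like st_is_interior */
	union { leaf_t leaf; interior_t interior; } u;
};

static int
put_u32(buf_t *b, uint32_t v)
{
	if (b->len >= sizeof (b->words) / sizeof (b->words[0]))
		return (0);
	b->words[b->len++] = v;
	return (1);
}

static int
get_u32(buf_t *b, uint32_t *v)
{
	if (b->pos >= b->len)
		return (0);
	*v = b->words[b->pos++];
	return (1);
}

/* The flag is written first so the decoder knows what follows. */
static int
encode_thunk(buf_t *b, const thunk_t *t)
{
	uint32_t i;

	if (!put_u32(b, t->is_interior))
		return (0);
	if (!t->is_interior)
		return (put_u32(b, t->u.leaf.type) &&
		    put_u32(b, t->u.leaf.value));
	assert(t->u.interior.children == 0 || t->u.interior.branches != NULL);
	if (!put_u32(b, t->u.interior.op) ||
	    !put_u32(b, t->u.interior.children))
		return (0);
	for (i = 0; i < t->u.interior.children; i++)
		if (!encode_thunk(b, &t->u.interior.branches[i]))
			return (0);
	return (1);
}

/* Read the flag, then decode only the arm it selects. */
static int
decode_thunk(buf_t *b, thunk_t *t)
{
	uint32_t i, n;

	memset(t, 0, sizeof (*t));
	if (!get_u32(b, &t->is_interior))
		return (0);
	if (!t->is_interior)
		return (get_u32(b, &t->u.leaf.type) &&
		    get_u32(b, &t->u.leaf.value));
	if (!get_u32(b, &t->u.interior.op) || !get_u32(b, &n) ||
	    n > MAX_BRANCHES)
		return (0);
	if (n == 0)
		return (1);
	t->u.interior.branches = calloc(n, sizeof (thunk_t));
	if (t->u.interior.branches == NULL)
		return (0);
	t->u.interior.children = n;
	for (i = 0; i < n; i++)
		if (!decode_thunk(b, &t->u.interior.branches[i]))
			return (0);	/* caller cleans up via free_thunk() */
	return (1);
}

/* Release whatever decode_thunk() allocated, even after a failure. */
static void
free_thunk(thunk_t *t)
{
	uint32_t i;

	if (!t->is_interior)
		return;
	for (i = 0; i < t->u.interior.children; i++)
		free_thunk(&t->u.interior.branches[i]);
	free(t->u.interior.branches);
}
```

Decoding a leaf body as an interior (or the other way around) reads the wrong number of words and leaves the stream out of step, which would be one way to end up with garbage type values like the ones my debug print complained about.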
I don't like kernel debugging in Solaris. Look at the following example:
> Spe_HQ::print
{
    sc_rwlock = {
        _opaque = [ 0 ]
    }
    sc_policies = 0xffffff01ee346880
}
> Spe_HQ.sc_policies::print
mdb: failed to dereference symbol: unknown symbol name
First of all, ::print is cumbersome. And to get at what I actually want to see, I have to spell out every type:
> Spe_HQ::print struct spe_control sc_policies | ::print struct spe_policy
{
    sp_id = 0
    sp_stripe_count = 0
    sp_interlace = 0
    sp_attr_expr = 0
    sp_name = 0
    sp_guuids = [ 0, 0, 0, 0, 0, 0, 0, 0 ]
    next = 0
}