Monday, October 20, 2008
snoop doesn't want to decode the DS_EXIBIargs

My build of snoop dies with a signal when I run it on a DS to MDS packet trace:

[th199096@jhereg snoop]> ./snoop -V -i ~/ds2tmds.snoop > xxx
WARNING: received signal 11 from packet 4

And packet 4 is:

  4   0.00596    pnfs-9-25 -> pnfs-9-26.Central.Sun.COM CTL-DS C DS_EXIBI

I've gone from hand-coding the XDR to generating it automatically, and both versions hit this error. Time to see what is going on. I generated the packet trace with '-x0,2000' so I can see the raw bytes:

           0: 001b 242d e629 001b 242d e641 0800 4500    ..$-.)..$-.A..E.
          16: 00a8 d3dd 4000 4006 0000 0a01 e943 0a01    ....@.@......C..
          32: e944 03fc 0801 1f9e 8f80 17cc 1d01 5018    .D............P.
          48: c1e8 0000 0000 8000 007c d058 4d4c 0000    .........|.XML..
          64: 0000 0000 0002 0001 9641 0000 0001 0000    .........A......
          80: 0002 0000 0001 0000 0020 48f4 fa0b 0000    ......... H.....
          96: 0009 706e 6673 2d39 2d32 3500 0000 0000    ..pnfs-9-25.....
         112: 0000 0000 0000 0000 0000 0000 0000 0000    ................
         128: 0000 ffff ff02 eef0 3b00 0000 0027 706e    ........;....'pn
         144: 6673 2d39 2d32 353a 2073 7663 3a2f 6e65    fs-9-25: svc:/ne
         160: 7477 6f72 6b2f 6473 6572 763a 6465 6661    twork/dserv:defa
         176: 756c 743a 0000                             ult:..

I'm going to go back and forth in the code to look at this. First, I've tracked down where the FMRI is appearing:

        case NFS4_SETPORT:
                uaddr = get_uaddr(nconf, addr);
                if (uaddr == NULL) {
                        dserv_log(do_all_handle, LOG_INFO,
                            gettext("NFS4_SETPORT: get_uaddr failed"));
                        return (1);
                }
                (void) strlcpy(setportargs.dsa_uaddr, uaddr,
                    sizeof (setportargs.dsa_uaddr));
                (void) strlcpy(setportargs.dsa_proto, nconf->nc_proto,
                    sizeof (setportargs.dsa_proto));
                (void) strlcpy(setportargs.dsa_name, getenv("SMF_FMRI"),
                    sizeof (setportargs.dsa_name));

This is in usr/src/cmd/dserv/dservd/tbind_sup.c, where dservd calls into the kernel. The call ends up in usr/src/uts/common/dserv/dserv_mds.c (minus some unpacking):

int
dserv_mds_addport(const char *uaddr, const char *proto, const char *aname)
{
...
        (void) sprintf(in, "%s: %s:", uts_nodename(), aname);

        inst->dmi_name = dserv_strdup(in);
        bzero(&res, sizeof (res));

        args.ds_ident.boot_verifier = inst->dmi_verifier;
        args.ds_ident.instance.instance_len = strlen(inst->dmi_name) + 1;
        args.ds_ident.instance.instance_val = inst->dmi_name;

The defaults are also set:

dserv_mds_instance_init(dserv_mds_instance_t *inst)
{
        inst->dmi_ds_id = 0;
        inst->dmi_mds_addr = NULL;
        inst->dmi_mds_netid = NULL;
        inst->dmi_verifier = (uintptr_t)curthread;
        inst->dmi_teardown_in_progress = B_FALSE;
}

So if we knew curthread, we could spot-check that the verifier went across okay. We also need to know whether the verifier has to be unique. If so, could we get a duplicate here?

So how does this data go across the wire? We need to look in the XDR (usr/src/head/rpcsvc/ds_prot.x):

struct identity {
        ds_verifier     boot_verifier;
        opaque          instance<MAXPATHLEN>;
};

/*
 * DS_EXIBI - Exchange Identity and Boot Instance
 *
 *  ds_ident  : An identifier that the MDS can use to distinguish
 *             between data-server instances.
 */
struct DS_EXIBIargs {
        identity        ds_ident;
};

So we see the boot_verifier followed by the instance. BTW: MAXPATHLEN might be too small here, since we prepend the nodename to the FMRI.
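For reference, here is a minimal sketch of what that instance looks like on the wire, going by the XDR standard: a 4-byte big-endian length, then the bytes, zero-padded out to a multiple of four. The names sketch_xdr_pad and sketch_encode_opaque are made up for illustration; they are not part of any RPC library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4, as XDR requires. */
static size_t
sketch_xdr_pad(size_t n)
{
        return ((n + 3) & ~(size_t)3);
}

/*
 * Hypothetical helper: encode a variable-length opaque into out as
 * a 4-byte big-endian length, the data bytes, then zero padding up
 * to a 4-byte boundary.  Returns the number of bytes written, or 0
 * if out is too small.
 */
static size_t
sketch_encode_opaque(uint8_t *out, size_t outsz, const void *data,
    uint32_t len)
{
        size_t total = 4 + sketch_xdr_pad(len);

        if (outsz < total)
                return (0);
        out[0] = (uint8_t)(len >> 24);
        out[1] = (uint8_t)(len >> 16);
        out[2] = (uint8_t)(len >> 8);
        out[3] = (uint8_t)len;
        memcpy(out + 4, data, len);
        memset(out + 4 + len, 0, total - 4 - len);
        return (total);
}
```

Fed the 39-byte instance name from the trace above (38 characters plus the NUL), this would write 44 bytes: 4 for the length, 39 of data, and 1 of padding.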

A variable-length opaque is a length followed by an array of bytes. Hmm, the hand-coded usr/src/cmd/cmd-inet/usr.sbin/snoop/nfs4_xdr.c calls xdr_opaque, while the machine-generated code calls xdr_bytes:

bool_t
xdr_identity(XDR *xdrs, identity *objp)
{

        rpc_inline_t *buf;

        if (!xdr_ds_verifier(xdrs, &objp->boot_verifier))
                return (FALSE);

        if (!xdr_bytes(xdrs, (char **)&objp->instance.instance_val,
            (u_int *) &objp->instance.instance_len, MAXPATHLEN))
                return (FALSE);
        return (TRUE);
}

And that makes a difference:

[th199096@jhereg snoop]> ./snoop -v -i ~/ds2tmds.snoop > xxx
[th199096@jhereg snoop]> 

But wait: we no longer get the signal, but we do see:

CTL-DS:  ----- Sun CTL-DS -----
CTL-DS:
CTL-DS:  Proc = 2 (Exchange Identity and Boot Instance)
CTL-DS:  ----  short frame ---

And a debug statement shows that the length looks off:

CTL-DS:  ----- Sun CTL-DS -----
CTL-DS:
CTL-DS:  Proc = 2 (Exchange Identity and Boot Instance)
CTL-DS:  xdr_identity bombed, len = 0
CTL-DS:  ----  short frame ---

Hmm, I manually set the length before the call to xdr_opaque. So back to the raw data: right before the nodename, we should find the length.
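Here is a rough sketch of that spot check against the hex dump. By my reading of the dump, the arguments start at byte 130 (14 bytes of Ethernet, 20 of IP, 20 of TCP, the 4-byte record mark, and 72 bytes of RPC call header with its AUTH_SYS credential). I'm also assuming ds_verifier is an 8-byte quantity, which is what the bytes suggest. The exibi_* names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network order) integer from p. */
static uint32_t
be32(const uint8_t *p)
{
        return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
            ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Bytes 130..181 of the dump: assumed start of DS_EXIBIargs. */
static const uint8_t exibi_args[] = {
        0xff, 0xff, 0xff, 0x02, 0xee, 0xf0, 0x3b, 0x00, /* verifier */
        0x00, 0x00, 0x00, 0x27,                         /* length */
        0x70, 0x6e, 0x66, 0x73, 0x2d, 0x39, 0x2d, 0x32, /* pnfs-9-2 */
        0x35, 0x3a, 0x20, 0x73, 0x76, 0x63, 0x3a, 0x2f, /* 5: svc:/ */
        0x6e, 0x65, 0x74, 0x77, 0x6f, 0x72, 0x6b, 0x2f, /* network/ */
        0x64, 0x73, 0x65, 0x72, 0x76, 0x3a, 0x64, 0x65, /* dserv:de */
        0x66, 0x61, 0x75, 0x6c, 0x74, 0x3a, 0x00, 0x00  /* fault: + NUL + pad */
};

/* The (assumed 8-byte) boot verifier at the front of the args. */
static uint64_t
exibi_verifier(const uint8_t *args, size_t n)
{
        if (n < 8)
                return (0);
        return (((uint64_t)be32(args) << 32) | be32(args + 4));
}

/* The instance length that follows the verifier. */
static uint32_t
exibi_instance_len(const uint8_t *args, size_t n)
{
        if (n < 12)
                return (0);
        return (be32(args + 8));
}

/* Pointer to the instance bytes, or NULL if they do not fit in args. */
static const char *
exibi_instance_val(const uint8_t *args, size_t n)
{
        uint32_t len = exibi_instance_len(args, n);

        if (len == 0 || n - 12 < len)
                return (NULL);
        return ((const char *)(args + 12));
}
```

If that reading is right, the length on the wire is 0x27 (39), which matches strlen() + 1 of the instance name, so the packet itself looks fine and the "len = 0" is coming from the decode side.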

Hmm, my allergies are killing my thought process. A signal 11 is SIGSEGV.

I'm back after a night's rest. I recompiled snoop with gcc and I think I've found the problem after staring at it in gdb:


127             switch (xdrs->x_op) {
128             case XDR_DECODE:
129                     if (nodesize == 0)
130                             return (TRUE);
131                     if (sp == NULL)
(gdb) 
132                             *cpp = sp = (char *)mem_alloc(nodesize);
133                     /* FALLTHROUGH */
134     
135             case XDR_ENCODE:
136                     sprintf(get_line(0, 0), "tdh_xdr_bytes calling xdr_opaque with %d!", nodesize);
137                     return (xdr_opaque(xdrs, sp, nodesize));
138     
139             case XDR_FREE:
140                     if (sp != NULL) {
141                             mem_free(sp, nodesize);
(gdb) p sp
$9 = 0x80c74a6 "\203�\020\203}\020"
(gdb) 

We should be allocating memory here, but sp is not NULL, so the allocation is skipped. And whatever sp is pointing to is junk:

bool_t
xdr_identity(XDR *xdrs, identity *objp)
{

        rpc_inline_t *buf;

        if (!xdr_ds_verifier(xdrs, &objp->boot_verifier)) {
                sprintf(get_line(0, 0), "xdr_identity bombed for verifier = %d", objp->boot_verifier);
                return (FALSE);
        }
        sprintf(get_line(0, 0), "xdr_identity okay for verifier = %lx", objp->boot_verifier);
        if (!tdh_xdr_bytes(xdrs, (char **)&objp->instance.instance_val,
           (u_int *) &objp->instance.instance_len, MAXPATHLEN)) {
                sprintf(get_line(0, 0), "xdr_identity bombed, len = %d", objp->instance.instance_len);
                return (FALSE);
        }
        return (TRUE);
}

And we can see that eargs is just an uninitialized variable on the stack:


static void
ds_exibi_sa(char *line)
{
        DS_EXIBIargs    eargs;

        if (!xdr_DS_EXIBIargs(&xdrm, &eargs))
                longjmp(xdr_err, 1);
        sprintf(line, "V = %d I = (%.20s)", eargs.ds_ident.boot_verifier,
            utf8localize((utf8string *)&eargs.ds_ident.instance));

        xdr_free(xdr_DS_EXIBIargs, (char *)&eargs);
}

A quick memset and retest:

[th199096@jhereg snoop]> ./snoop -v -i ~/ds2mds2.snoop > zzz
WARNING: received signal 11 from packet 4
[th199096@jhereg snoop]> ./snoop -v -i ~/ds2mds2.snoop > zzz
[th199096@jhereg snoop]> 

And we can see the difference!
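To spell out why the memset matters, here is a small stand-alone sketch of the decode convention seen in the gdb listing: the decoder only allocates when the pointer it is handed is NULL, and otherwise trusts it as a caller-supplied buffer. This is a hypothetical stand-in, not the real xdr_bytes(); sketch_decode_bytes, sketch_identity, and sketch_decode_identity are names invented for the example, and the 1024-byte limit simply stands in for MAXPATHLEN.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for the decode side: read a 4-byte big-endian length,
 * allocate only if *cpp is NULL, then copy the bytes.  Returns 1 on
 * success and 0 on failure.
 */
static int
sketch_decode_bytes(const uint8_t *wire, size_t wirelen, char **cpp,
    uint32_t *sizep, uint32_t maxsize)
{
        uint32_t len;

        if (wirelen < 4)
                return (0);
        len = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
            ((uint32_t)wire[2] << 8) | (uint32_t)wire[3];
        if (len > maxsize || wirelen - 4 < len)
                return (0);
        *sizep = len;
        if (len == 0)
                return (1);
        if (*cpp == NULL) {
                *cpp = malloc(len);
                if (*cpp == NULL)
                        return (0);
        }
        /* A non-NULL *cpp is trusted; stack junk here means a crash. */
        memcpy(*cpp, wire + 4, len);
        return (1);
}

/* Hypothetical mirror of the length/pointer pair in the identity. */
struct sketch_identity {
        uint32_t        instance_len;
        char            *instance_val;
};

static int
sketch_decode_identity(const uint8_t *wire, size_t wirelen,
    struct sketch_identity *out)
{
        /* The fix: zero the struct so the decoder sees a NULL pointer. */
        memset(out, 0, sizeof (*out));
        return (sketch_decode_bytes(wire, wirelen, &out->instance_val,
            &out->instance_len, 1024));
}
```

Without the memset, instance_val holds whatever was on the stack, the decoder copies into that address, and the later free of the same junk pointer is just as unsafe.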


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/snoop_doesn_t_want_to
Comments:

A Classic XDR issue for first time programmers.. err but wait.. heh :-P

Posted by biteme@xdr.com on October 25, 2008 at 08:38 AM CDT #

> A Classic XDR issue for first time programmers.. err but wait.. heh :-P

You miss the point as to why I blog. This was what, a minute of my day? Why bother blogging about it at all?

The reason I blog is to help "first time programmers" with issues like this. I get *a lot* of thank you email / comments from people hitting the same problems I blog about.

Posted by Thomas Haynes on October 29, 2008 at 10:33 AM CDT #
