« June 2008
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today
XML

Tom Haynes

loghyr.com
excfb.com

Blogs to Gander At

Navigation

Editing

AllMarks

Referers

Today's Page Hits: 3364

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

www.alesti.org

Add to Alesti RSS Reader

South Park as I was 10 years ago

South Park Fantasy

South Park today

South Park Reality

I have more hair and it isn't so grey. :->

10 years ago, really

Toon Tom

Today, literally

Tom Today

Site notes

This page validates as XHTML 1.0, and will look much better in a browser that supports web standards, but it is accessible to any browser or Internet device. It was created using techniques detailed at glish.com/css/.

20080403 Thursday April 03, 2008
Putting that throw away code to good use

I'm going to narrate a massive debug session I just had, in which I used my XDR dump routine to flush out bugs. I'm not going to give all of the code, and my approach is very free form. But that is very much like a massive debug session. Enjoy or not!

Some my upcall from the kernel to the sped daemon is working fine, but the return path is not. My 64 bit gpool-id is off:

# Apr  3 16:13:29 pnfs-3-15 ./sped[100781]: spedit: |35865356092523570| gives |35865356092523570| and |35865356092523570|
10, 16, 64000, 35865356092523570 uid == 200096
20, 32, 2000, 35865356092523571 uid == 1066
30, 64, 1000, 35865356092523572 uid == 0
40, 2, 2000, 35865356092523573 subnet ==  10.1.233.0/24
Apr  3 16:13:29 pnfs-3-15 ./sped[100781]: spedit: |35865356092523571| gives |35865356092523571| and |35865356092523571|
Apr  3 16:13:29 pnfs-3-15 ./sped[100781]: spedit: |35865356092523572| gives |35865356092523572| and |35865356092523572|
Apr  3 16:13:29 pnfs-3-15 ./sped[100781]: spedit: |35865356092523573| gives |35865356092523573| and |35865356092523573|
We got a policy 64, -17890057, 18444853557553841283 for /nippy/bogey4
Apr  3 16:13:34 pnfs-3-15 nfs: WARNING: sped: 2 0 2 200096 10 33554432 /nippy/bogey4 0 0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:    7:   0  0  0  2  0  0  0  0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   15:   0  0  0  2  0  3  D A0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   23:   0  0  0  A  2  0  0  0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   31:   0  0  0  D 2F 6E 69 70
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   39:  70 79 2F 62 6F 67 65 79
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   47:  34  0  0  0  0  0  0  0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:   52:   0  0  0  0
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]: We got a policy 64, -17890057, 18444853557553841283 for /nippy/bogey4
Apr  3 16:13:34 pnfs-3-15 ./sped[100781]:    4:   0  0  0  5

We can see it change from 35865356092523570 to 18444853557553841283. Hmm, it used to have one more change. Anyway, the value of -17890057 looks off. It should be one of 1000, 2000, or 64000. And perhaps the 64 is off as well.

Anyway, we can see that the input XDR looks okay. I've even added a byte counter for it.

And that lets us see that the last line must be the reply. The counter has started again and 4 bytes as a reply looks too small. Oh crud, what does a 5 mean as a status code?

/*
 * Return status codes
 */
#define SPED_OK         0

/*
 * Caller provided invalid arguments
 */
#define SPED_INVALID    1

/*
 * Internal error in daemon e.g. out of memory, can't return result
 */
#define SPED_INTERNAL   2

/*
 * Were layouts revoked?
 */
#define SPED_REVOKED    3

/*
 * Was there a matching rule, but not enough available gpools?
 */
#define SPED_MATCH_NOT_ENOUGH   4

/*
 * Was there no matching rules?
 */
#define SPED_NO_MATCH   5

Grin, so I thought we were going to find that the buffer was too small and something in the encoding was way off. Looks like the buffer is just right and now I need to debug why my input is not matching a policy. But, my debug code is correct here! What is being printed is the garbage in the local variable.

I added some code to zap the local variable and to also dump the input parameters:

Apr  3 16:37:20 pnfs-3-15 ./sped[100790]:   52:   0  0  0  0
Apr  3 16:37:20 pnfs-3-15 ./sped[100790]: Matching against: 200096 10 0 0 0 0 (N/A) (N/A) (N/A) (N/A) (N/A) (N/A) /nippy/bogey5 (N/A)

Hmm, no XDR dump of the output. That means we are erroring out earlier, probably due to zapping the local variable with a memset!

It is coring now somewhere in here:


                if (sar.gpoolids)
                        u = sar.gpoolids[0];
                else
                        u = (uint64_t)0xe4e5c7c8;

                printf("We got a policy %d, %d, %llu for %s\n",
                    sar.lh_unit_size, sar.lh_stripe_count,
                    sar.gpoolids[0], saa.path);

                syslog(LOG_ERR, "We got a policy %d, %d, %llu for %s",
                    sar.lh_unit_size, sar.lh_stripe_count,
                    sar.gpoolids[0], saa.path);

We should be concerned that we generate u but never use it. My guess is that sar.gpoolids is now NULL instead of random junk. If we fix that, we get back to:

Apr  3 21:38:41 pnfs-3-15 ./sped[100841]: Matching against: 200096 10 0 0 33554432 0 (N/A) (N/A) (N/A) (N/A) (N/A) (N/A) /nippy/bogey8 (N/A)
Apr  3 21:38:41 pnfs-3-15 ./sped[100841]: We got a policy 0, 0, 3840264136 for /nippy/bogey8

Ignoring the need to fix the receiving code to detect a NULL array, what can we do to get a match?

Another unitialized variable in going from the stand alone policy debugger to the sped daemon. How did I find it?

(gdb) n
spe_eval_thunk (st={st_is_interior = B_FALSE, st_node = 0x8067af0}, 
    pat=0x8047144, sa=0x8047078, prc=0x80470d8, bServer=B_TRUE)
    at spedaemon.c:827
827		return (b);
(gdb) p b
$19 = B_TRUE
(gdb) n
spe_eval_attribute (si=0x80674e8, pat=0x8047144, prc=0x80470d8, bServer=B_TRUE)
    at spedaemon.c:864
864			if (*prc != 0) {
(gdb) p *prc
$21 = -16785880

Why was *prc such an odd number, no error cases were hit.

int
spe_global_eval(policy_attributes_t *pat, uint64_t **gooies,
    uint32_t *n_gooies, uint32_t *unit_size, uint32_t *stripes)
{
        spe_policy_t    *sp;
        boolean_t       b = FALSE;
        int             rc;

If we set rc = 0, it works.

Apr  3 22:54:24 pnfs-3-15 ./sped[100938]: Matching against: 200096 10 0 0 33554432 0 (N/A) (N/A) (N/A) (N/A) (N/A) (N/A) /nippy/bogeya (N/A)
Apr  3 22:54:24 pnfs-3-15 ./sped[100938]: Considering rule 0
Apr  3 22:54:24 pnfs-3-15 ./sped[100938]: Considering 200096 vs 200096
Apr  3 22:54:24 pnfs-3-15 ./sped[100938]:    7:   0  0  0  0  8  6 F0 38

An 8 byte reply looks wrong. And we have another braindead problem. (Please realize this is the first time I am testing code I wrote over a month ago. I had to task switch to a P1 bug.):

                result = spe_global_eval(&pat, &sar.gpoolids,
                    &sar.n_gooies, &sar.lh_unit_size, &sar.lh_stripe_count);
                if (!result) {
                        sped_door_return(result, &sar, proc);
                }

0 is a valid match, everything else is an error. Either I planned to make it a boolean (in which case spe_global_eval is wrong) or I messed up the check here. I'll assume the check:


Hey, the policy match info is correct, but why no payload?

        sped_door_return(result, NULL, NULL);
        /* NOTREACHED */

The two sped_door_return instances are mixed up. Flip them. Okay, back to a result and a 4 byte payoff.

And proc has a bad initialization:<

        switch (command) {
        case (SPED_DS_ZPOOL):
                proc = xdr_spe_alloc_caller_t;

Should be: xdr_spe_alloc_res_t

I think we finally have good results:

Apr  3 23:14:02 pnfs-3-15 ./sped[100957]: Matching against: 200096 10 0 0 33554432 0 (N/A) (N/A) (N/A) (N/A) (N/A) (N/A) /nippy/bogeyd (N/A)
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]: We got a policy 64000, 16, 35865356092523570 for /nippy/bogeyd
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]:    7:   0  0  0  0  0  0  0  1
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]:   15:   0 7F 6B 59 F1 A3 44 32
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]:   23:   0  0  0  0  0  0  0  0
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]:   31:   0  0  0  0  0  0 FA  0
Apr  3 23:14:02 pnfs-3-15 ./sped[100957]:   39:   0  0  0 10  0  0  0  0
Apr  3 23:14:02 pnfs-3-15 nfssrv: WARNING: sped: 64000 16 0 35865356092523570

There are some take home points here:

  1. Try to unit test/debug your code when you write it.
    1. You'll remember shortcuts you took.
    2. You'll remember the code.
  2. Write debugging tools if necessary.
    1. If I had to step through a debugger every time, I would have gone crazy.
    2. The XDR dump routine allowed me to focus on problems.
  3. Remember, sometimes the debugging tools introduce other bugs (see the stuff with u above.
  4. You didn't see it, but I created small focused test cases when necessary.
    1. Small one-off files allow you to eliminate large chunks of code.
    2. Allowed me to use gdb, in which case I fought the code and not the debugger.

Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily
Debugging fat code

I've been working on a kernel to user space door call which encodes the data in an XDR stream. Normally, I view XDR output as it goes across the wire with snoop. As this stuff is not exposed, I ended up writing a little piece of code to dump the bits on the console. I know I had some problems when I wrote it, but I'm just going to give you the current code:

void
sped_xdr_dump(XDR *x)
{
        int     i;
        int     j;
        char    buf[100];
        char    str[10];
        char    cbuf[4];

        j = 0;
        memset(&cbuf, '\0', 4);

        buf[0] = '\0';
        for (i = 0; i < x->x_handy; i++) {
                cbuf[0] = x->x_private[i];
                sprintf(str, " %X", cbuf[0]);
                strcat(buf, str);
                j++;
                if (j == 8) {
                        syslog(LOG_ERR, "%s", buf);
                        j = 0;
                        buf[0] = '\0';
                }
        }

        if (j != 0) {
                syslog(LOG_ERR, "%s", buf);
        }
}

The crap with cbuf is where I consider this code to be fat. It is a lazy hack because I felt the bytes were running together. It is lazy because I don't need it. And it is where I should be looking for bugs. Anyway, with the following input:

> fec3fce8::print XDR 
{
    x_op = 1 (XDR_DECODE)
    x_ops = 0xfee45bd8
    x_public = 0
    x_private = 0xfec3fdd0
    x_base = 0xfec3fdd0
    x_handy = 0x30
}
> 0xfec3fdd0/48B
0xfec3fdd0:     0       0       0       2       0       0       0       0
                0       0       0       2       0       3       d       a0
                0       0       0       a       2       0       0       0
                0       0       0       a       2f      6e      69      70
                70      79      2f      68      32      62      0       0
                0       0       0       0       0       0       0       0

As shown by mdb, which I don't like, I get the following output:

Apr  3 12:04:06 pnfs-3-15 last message repeated 1 time
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  0 0 0 2 0 0 0 0
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  0 0 0 2 0 3 D FFFFFFA0
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  0 0 0 A 2 0 0 0
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  0 0 0 A 2F 6E 69 70
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  70 79 2F 68 32 62 0 0
Apr  3 12:04:22 pnfs-3-15 sped[100920]:  0 0 0 0 0 0 0 0

Where is FFFFFFA0 coming from? Okay, I took the code and stuck it in a driver:

> cat y.c
#include <sys/systm.h>
#include <sys/sdt.h>
#include <rpc/types.h>
#include <rpc/svc.h>
#include <rpc/xdr.h>

#include <rpc/auth_sys.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

void
sped_xdr_dump(char *jack, int len)
{
        int     i;
        int     j;
        char    buf[100];
        char    str[10];
        char    cbuf[4];

        j = 0;
        memset(&cbuf, '\0', 4);

        buf[0] = '\0';
        for (i = 0; i < len; i++) {
                cbuf[0] = jack[i];
                sprintf(str, " %X", cbuf[0]);
                strcat(buf, str);
                j++;
                if (j == 8) {
                        printf("%s\n", buf);
                        j = 0;
                        buf[0] = '\0';
                }
        }

        if (j != 0) {
                printf("%s\n", buf);
        }
}

int
main(int argc, char *argv[])
{
	char	jack[] = {
            0x0,  0x0,  0x0,  0x2,  0x0,  0x0,  0x0,  0x0,
            0x0,  0x0,  0x0,  0x2,  0x0,  0x3,  0xd,  0xa0,
            0x0,  0x0,  0x0,  0xa,  0x2,  0x0,  0x0,  0x0,
            0x0,  0x0,  0x0,  0xa,  0x2f, 0x6e, 0x69, 0x70,
            0x70, 0x79, 0x2f, 0x68, 0x32, 0x62, 0x0,  0x0,
            0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0,  0x0

	};

	sped_xdr_dump(jack, sizeof(jack));
	return (0);
}
>  ./a.out 
 0 0 0 2 0 0 0 0
 0 0 0 2 0 3 D FFFFFFA0
 0 0 0 A 2 0 0 0
 0 0 0 A 2F 6E 69 70
 70 79 2F 68 32 62 0 0
 0 0 0 0 0 0 0 0

So, I've captured the problem. Using gdb, I can get to the heart of the issue:

Breakpoint 3, sped_xdr_dump (jack=0x8047134 "", len=48) at x.c:32
32	                cbuf[0] = jack[i];
5: str = " D\000\000\000\000\000\000X\003"
4: j = 7
3: i = 15
2: jack[i] = -96 '?'
1: cbuf = "\r\000\000"
(gdb) p jack[i-1]
$1 = 13 '\r'
(gdb) q

As tempting as it is to say that I'm grabbing a bit from the previous word, it is clear that the fat code is really working. And while I could do shenanigans with bit shifting, the problem isn't in the way I am explicitly pulling out the data. Nor is the " %X" an issue.

The real issue is that I've got the wrong type of data, I'm trying to print out signed data, and 'A0' has a high order bit set.

> diff y.c y1.c	
21c21
<         char    cbuf[4];
---
>         unsigned char    cbuf[4];
> ./a.out 
 0 0 0 2 0 0 0 0
 0 0 0 2 0 3 D A0
 0 0 0 A 2 0 0 0
 0 0 0 A 2F 6E 69 70
 70 79 2F 68 32 62 0 0
 0 0 0 0 0 0 0 0

Okay, we can do better than that. We need to get rid of cbuf:

> diff y.c y2.c
21d20
<         char    cbuf[4];
24d22
<         memset(&cbuf, '\0', 4);
28,29c26
<                 cbuf[0] = jack[i];
<                 sprintf(str, " %X", cbuf[0]);
---
>                 sprintf(str, " %X", (unsigned char)jack[i]);
> ./a.out
 0 0 0 2 0 0 0 0
 0 0 0 2 0 3 D A0
 0 0 0 A 2 0 0 0
 0 0 0 A 2F 6E 69 70
 70 79 2F 68 32 62 0 0
 0 0 0 0 0 0 0 0

And we need to clean-up the output:

> diff y2.c y3.c
26c26
<                 sprintf(str, " %X", (unsigned char)jack[i]);
---
>                 sprintf(str, " %2X", (unsigned char)jack[i]);
> ./a.out
  0  0  0  2  0  0  0  0
  0  0  0  2  0  3  D A0
  0  0  0  A  2  0  0  0
  0  0  0  A 2F 6E 69 70
 70 79 2F 68 32 62  0  0
  0  0  0  0  0  0  0  0

I might be able to make it even more either smarter or compact, but it is now throw away code. But even thow away code has to be functionally correct. When it isn't, look for areas where you are doing more work than you need to. If you remember you tried something and weren't convinced it would work, and it only worked partially, go back and question that code.


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Copyright (C) 2007, Kool Aid Served Daily