« November 2009
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today
XML

Neat blogs

Navigation

Editing

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

20080912 Friday September 12, 2008
Decoding data transfer across nfssys() syscall

I've got this new code to XDR encode in userland and XDR decode in the kernel. I can run the kernel and debug issues live, but that is bulky. So, instead, I'm going to write some tools to help me. First off, I already have routines for printing a policy, so I've already got the hard part out of the way.

Or do I?

You can't just call printf() in the kernel. And while you can use syslog() or cmn_err(), they have the nasty habit of inserting a line feed after every call. So this

        printf("Hello ");
        printf("Nurse!\n");

Would yield

Hello
Nurse!

On a strict translation. So, the easiest way to proceed is to collect it all in a buffer:

void
spe_print_policy_list(spe_policy_t *policies)
{
        spe_policy_t    *sp;
        char            *buf;
        int             offset;

        buf = kmem_zalloc(MAX_PRINT_BUF + 1, KM_SLEEP);
        offset = 0;
        for (sp = policies; sp; sp = sp->next) {
                spe_print_policy(sp, buf, &offset);
                cmn_err(CE_WARN, "%s", buf);
                buf[0] ='\0';
                offset = 0;
        }
        free(buf, MAX_PRINT_BUF + 1);
}

Note that this is debug code and probably will be thrown away. So, I still make sure to do garbage cleanup correctly, but I don't care about whether it should be CE_WARN or CE_NOTE.

Why am I passing an offset?

Well, we have the following functions, most of which emit information:

spe_bitcount(unsigned int n)
spe_print_address(uint_t addr, char *buf)
spe_print_netmask(uint_t mask, char *buf)
spe_print_leaf(spe_leaf_t *sl, char *buf)
spe_print_thunk(spe_thunk_t st, char *buf)
spe_print_attribute(spe_interior_t *si, char *buf)
spe_print_policy(spe_policy_t *sp, char *buf)
spe_print_policy_list(spe_policy_t *policies)

If I pass in just the buffer, I could advance it every time I wrote:

                len = sprintf(buf, "%c%u", sep, rev);
                spe_print_fool(fool, &buf[len + 1]);

Note, I wrote that second line off the top of my head, don't trust it!

But that doesn't help you during a tree traversal. You print out the left side, come back out and you don't know much (or if at all) spe_print_fool changed the offset. So, instead I also pass in the offset to be used:

void
spe_print_netmask(uint_t mask, char *buf, int *poffset)
{
        int     t;
        int     len;

        t = spe_bitcount(mask);
        len = sprintf(&buf[*poffset], "/%u", t);
        *poffset = len + 1;
}

This way, the caller of spe_print_netmask will know where to write.

Now, this may be embarrassing, but we really need to debug this debug code before we add it into the kernel. I.e., we want to make sure we are focused on debugging our real code and not the support code. Luckily, we can write some stand-alone code and test it out there.

Since I know the print code used to dump out a policy correctly, I know know that this hunch was correct:

[th199096@aus1500-home ~/tmp]> ./a.out 
10, 200
20, 1
30
40,/

And here was the input:

10, 16, 64k, 35865356092523570, uid == 200096
20, 32, 2k, 35865356092523570, uid == 1066
30, 64, 1k, 35865356092523570, uid == 0
40, 2, 2k, 35865356092523570, subnet == 10.1.233.0/24

It is easy to see in the debugger that I reset poffset instead of incrementing it:

96                                      len = sprintf(&buf[*poffset], "%s",
8: *poffset = 28
7: len = 0
(gdb) 
98                                      *poffset = len + 1;
8: *poffset = 28
7: len = 3
(gdb) 
94                      for (i = 0; i < Attribute_count; i++) {
8: *poffset = 4
7: len = 3

That should be *poffset += len + 1.

Better, but not a go:

[th199096@aus1500-home ~/tmp]> ./a.out
10, 16, 64000, 4054008882l 
20, 32, 2000, 4054008882l 
30, 64, 1000, 4054008882l 
40, 2, 2000, 4054008882l 

The offending line starts with:

        len = sprintf(&buf[*poffset], "%u, %u, %u, %ul ", sp->sp_id, sp->sp_stripe_count,
            sp->sp_interlace, sp->sp_guuids[0]);

I suspect that the %ul is coming out as %ul. I.e., print an unsigned followed by an 'l'. We see that the data is correct in memory:

(gdb) p *sp
$2 = {sp_id = 10, sp_stripe_count = 16, sp_interlace = 64000,
    sp_attr_expr = 0x80687e0, sp_name = 0x0, sp_guuids = {35865356092523570, 0, 
    0, 0, 0, 0, 0, 0}, next = 0x80699f0}

Crud, that is a bad formatting and I've already fixed it once! Looks like I lost some code. :-< (see here for prior head banging Putting that throw away code to good use). And I'll bet the remaining problem has something to do with code changes I made due to the move to xdr!

And I would be wrong. Things look okay, but no change:

96                                      len = sprintf(&buf[*poffset], "%s",
5: spe_attribute_table[i].at_attr = SPE_ATTR_UID
4: buf = 0x8066748 "10, 16, 64000, 35865356092523570 "
3: sl->sl_attr = SPE_ATTR_UID
1: i = 13

I now suspect I am off by 1!


(gdb) p &buf[33]
$5 = 0x8066769 ""
(gdb) p &buf[34]
$6 = 0x806676a "uid"
(gdb) p spe_attribute_table[i].at_name
$7 = 0x8054ded "uid"

Yes, and now Thunderbirds are go!

[th199096@aus1500-home ~/tmp]> ./a.out
10, 16, 64000, 35865356092523570 uid == 200096
20, 32, 2000, 35865356092523570 uid == 1066
30, 64, 1000, 35865356092523570 uid == 0
35, 2, 2000, 35865356092523570 subnet ==  10.1.233.0/24 && gid == 55
40, 2, 2000, 35865356092523570 subnet ==  10.1.233.0/24

Note I added a new test case to handle a more complex rule!

Lessons Learned

  1. Test when you write code.
  2. Test your test code.
  3. Add new test cases, even when you think things are okay.
  4. Build up from simple tested code to more complex tested code.
    1. Assume your prior code works, but don't be afraid to question that when you stress the code.
    2. Building block approach to debugging
  5. If your environment is too complex for easy testing, write a stand-alone test suite.
  6. Don't be afraid to throw away your test code!
  7. Don't be afraid to reuse it.
    1. Which also means write it as well as you can.
    2. You might end up putting it in production and you don't want an assumption to become a customer bug.

Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/decoding_data_transfer_across_nfssys
Comments:

Hi Tom,

good job! About 'Lessons Learned':

>If your environment is too complex for easy testing, write a stand-alone test suite.

In our project we spend two years with writing our own test suite and finally adopted existing one for a month.

It's hard to get the code to be testable, but if it's the case try to use existing frameworks. There are lot of advanced staff available. 'Delegate' part of your work to some one else :)

Tigran.

Posted by Tigran Mkrtchyan on September 13, 2008 at 02:58 AM CDT #

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed