Sometimes you just get busy and can't recall where your time went. I just finished up the spedebugger. I wish I had blogged more about it, but it isn't going to happen - I need to turn it into a real spe daemon in two weeks time. The Austin Bake-A-Thon is coming up and I need a demo.
Anyway, here is the finished product, which still does not support IPv6: spedebug.tar.bz2
Besides reading the README.txt, here is a quick howto get it up and running:
> bzcat spedebug.tar.bz2 | tar xf - > ./make.sh spedebug.c:27: warning: ignoring #pragma ident In file included from spedebug.c:49: speadm.h:27: warning: ignoring #pragma ident > ./spedebug tests/domain.txt Looking for a matching server policy: No matching policy, default would apply. > ./spedebug -v tests/domain.txt Here are the server policies: 1, 25, 34000, uid != 1066 && domain == centrl.sun.com 10, 25, 34000, uid == 1066 && weekday == sun 20, 25, 34000, domain == excfb.com Looking for a matching server policy: No matching policy, default would apply. > ls -la total 260 drwxr-xr-x 4 tdh staff 10 Jan 30 12:49 . drwxr-xr-x 12 tdh staff 16 Jan 30 12:41 .. -r--r--r-- 1 tdh staff 3583 Jan 30 12:38 README.txt drwxr-xr-x 2 tdh staff 5 Jan 30 12:46 SCCS -rwxr-xr-x 1 tdh staff 65 Jan 30 12:02 make.sh -r--r--r-- 1 tdh staff 2641 Jan 30 12:38 speadm.h -rwxr-xr-x 1 tdh staff 44156 Jan 30 12:46 spedebug -r--r--r-- 1 tdh staff 49826 Jan 30 12:46 spedebug.c -rw-r--r-- 1 tdh staff 21778 Jan 30 12:48 spedebug.tar.bz2 drwxr-xr-x 2 tdh staff 17 Jan 29 23:04 tests
I sponsor OpenSolaris contributions. I recently asked some contributers to provide a webrev on cr.opensolaris.org such that I could get a review done much in the same manner as we do internally. And I found there wasn't much usage documentation for it. And then the same subject just came up here: Shooting my mouth off.
I was wondering what it would take to generate a webrev. So I decided on a little experiment. Note, I chose to do this without Mercurial because at least one poster couldn't get to a repository. This should be easy enough to do with 'hg' as well.
First, lets get the ON source and install it in two locations:
[tdh@silver ~]> mkdir os [tdh@silver ~]> cd os [tdh@silver ~/os]> wget http://dlc.sun.com/osol/on/downloads/current/on-src.tar.bz2 ... [tdh@silver ~/os]> bzcat on-src.tar.bz2 | tar xf - ... [tdh@silver ~/os]> ls -la total 132535 drwxr-xr-x 3 tdh staff 5 Jan 30 17:25 . drwxr-xr-x 54 tdh staff 101 Jan 31 11:12 .. -rw-r--r-- 1 tdh staff 67779916 Jan 30 17:43 on-src.tar.bz2 -rw-r--r-- 1 tdh staff 10420 Jan 30 17:25 README.opensolaris drwxr-xr-x 3 tdh staff 3 Jan 30 17:24 usr [tdh@silver ~/os]> mv usr README.opensolaris head/ [tdh@silver ~/os]> bzcat on-src.tar.bz2 | tar xf - ... [tdh@silver ~/os]> mkdir tail [tdh@silver ~/os]> mv usr README.opensolaris tail
And now set up our environment:
[tdh@silver ~/os]> cd tail/ [tdh@silver tail]> cp usr/src/tools/env/opensolaris.sh . [tdh@silver tail]> chmod +w opensolaris.sh [tdh@silver tail]> vi opensolaris.sh [tdh@silver tail]> diff opensolaris.sh usr/src/tools/env/opensolaris.sh 43c43 < NIGHTLY_OPTIONS="-FNnaCDlmr"; export NIGHTLY_OPTIONS --- > NIGHTLY_OPTIONS="-FNnaCDlmrt"; export NIGHTLY_OPTIONS 47c47 < GATE=tail; export GATE --- > GATE=testws; export GATE 50c50 < CODEMGR_WS="/home/tdh/os/$GATE"; export CODEMGR_WS --- > CODEMGR_WS="/export/$GATE"; export CODEMGR_WS 93c93 < STAFFER=tdh; export STAFFER --- > STAFFER=nobody; export STAFFER
Note, I took out -t because I had earlier installed SUNWonbld:
[tdh@silver ~/os]> which webrev /opt/onbld/bin/webrev
Okay, really get the environment:
[tdh@silver tail]> bldenv -d opensolaris.sh Build type is DEBUG RELEASE is VERSION is tail RELEASE_DATE is The top-level 'setup' target is available to build headers and tools. Using /bin/tcsh as shell.
And a simple test (which shows why I should use 'hg' by the way - i.e., chmod is not an acceptable change tracking mechanism.):
[tdh@silver tail]> cd usr/src/prototypes/ [tdh@silver prototypes]> chmod +w prototype.c [tdh@silver prototypes]> cp prototype.c xxx [tdh@silver prototypes]> vi prototype.c [tdh@silver prototypes]> diff prototype.c xxx 23c23 < * Copyright 2008 Sun Microsystems, Inc. All rights reserved. --- > * Copyright 2007 Sun Microsystems, Inc. All rights reserved. 51,55d50 < < /* < * Get up, stand up, stand up for your rights! < * Get up, stand up, don't give up your rights! < */
Yes, Bob Marley is playing in the background.
[tdh@silver prototypes]> cd ../../.. [tdh@silver ~/os]> pwd /home/tdh/os/tail [tdh@silver tail]> vi changes.txt [tdh@silver tail]> more !$ more changes.txt usr/src/prototypes/prototype.c
And this does not work:
[tdh@silver tail]> webrev -p ../head -i changes.txt
Unable to determine SCM type currently in use.
For teamware: webrev looks for $CODEMGR_WS either in
the environment or in the file list.
And I'll try this:
[tdh@silver tail]> vi changes.txt
[tdh@silver tail]> cat changes.txt
CODEMGR_WS=/home/tdh/os/tail
CODEMGR_PARENT=/home/tdh/os/head
usr/src/prototypes/prototype.c
[tdh@silver tail]> webrev changes.txt
SCM detected: teamware
File list from: changes.txt
Workspace: /home/tdh/os/tail
Compare against: /home/tdh/os/head
Output to: /home/tdh/os/tail/webrev
Output Files:
usr/src/prototypes/prototype.c
patch cdiffs udiffs wdiffs sdiffs frames ps old new
Generating PDF: Done.
index.html: Done.
By the way, thanks to Frank Hoffman for a link to the man page, which is here:
[tdh@silver tail]> nroff -man usr/src/tools/scripts/webrev.1 | more
And we see we have output:
[tdh@silver tail]> ls -la webrev/ total 42 drwxr-xr-x 4 tdh staff 11 Jan 31 11:35 . drwxr-xr-x 4 tdh staff 8 Jan 31 11:35 .. -rw-r--r-- 1 tdh staff 2 Jan 31 11:35 TotalChangedLines -rw-r--r-- 1 tdh staff 4983 Jan 31 11:35 ancnav.html -rw-r--r-- 1 tdh staff 3053 Jan 31 11:35 ancnav.js -rw-r--r-- 1 tdh staff 93 Jan 31 11:35 file.list -rw-r--r-- 1 tdh staff 3760 Jan 31 11:35 index.html drwxr-xr-x 4 tdh staff 4 Jan 31 11:35 raw_files -rw-r--r-- 1 tdh staff 489 Jan 31 11:35 tail.patch -rw-r--r-- 1 tdh staff 2518 Jan 31 11:35 tail.pdf drwxr-xr-x 3 tdh staff 3 Jan 31 11:35 usr
At this point, you need to follow the instructions at cr.opensolaris.org to post your changes for review. Note that you could also tar and bzip the webrev to send out.
But I can finish this off with a simple example:
[tdh@silver tail]> scp -r webrev/ tdh@cr.opensolaris.org:example The authenticity of host 'cr.opensolaris.org (72.5.123.19)' can't be established. RSA key fingerprint is fa:ab:0c:a4:78:75:0b:bd:3d:eb:74:f1:b5:e1:98:a8. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added 'cr.opensolaris.org,72.5.123.19' (RSA) to the list of known hosts. Enter passphrase for key '/home/tdh/.ssh/id_dsa': ...
Which means that the following link should work: http://cr.opensolaris.org/~tdh/example
Given a directory path entry to a file, I want to find:
I can probably find a standard function to do this, but where is the fun in that?
You might notice I am being vague with the extension and basename, I'm giving the definition we know from common usage, not a precise one. We'll see why in a bit. And I am assuming we are passing in a file and not a directroy.
I've got the code sitting in the policy prototype, but I pulled it out to focus on and to do some fun unit testing. So, here is the code:
#include <stdio.h>
#include <stdarg.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
typedef struct {
char *ext;
char *path;
char *base;
char *file;
} policyAttributes_t;
int
main (int argc, char *argv[])
{
policyAttributes_t pat;
int rc = 0;
char *junk;
char *s;
/*
* Do some initialization.
*/
memset(&pat, '\0', sizeof(pat));
pat.path = strdup(argv[1]);
if (pat.path == NULL) {
...
}
/*
* Now we need the file.
*/
s = strrchr(argv[1], '/');
if (s) {
junk = strdup(s+1);
if (junk == NULL) {
...
}
pat.file = strdup(junk);
if (pat.file == NULL) {
...
}
s = strrchr(junk, '.');
if (s) {
pat.ext = strdup(s+1);
if (junk == NULL) {
...
}
*s = '\0';
}
pat.base = junk;
}
cleanup:
if (pat.path) {
printf("path = <%s>\n", pat.path);
free(pat.path);
}
if (pat.ext) {
printf("ext = <%s>\n", pat.ext);
free(pat.ext);
}
if (pat.base) {
printf("base = <%s>\n", pat.base);
free(pat.base);
}
if (pat.file) {
printf("file = <%s>\n", pat.file);
free(pat.file);
}
return (rc);
}
You can see I pulled it out of the other code - I preserved the pat variable. Anyway, I was mostly worried about the pointer manipulation, but I shouldn't have:
% ./a.out /foo/bar.c path = </foo/bar.c> ext = <c> base = <bar> file = <bar.c> % ./a.out /foo/grabba/yabba/bar.c path = </foo/grabba/yabba/bar.c> ext = <c> base = <bar> file = <bar.c> % ./a.out /foo/grabba/yabba/bar path = </foo/grabba/yabba/bar> base = <bar> file = <bar>
Now, in a traditional CIFS only world, everything would be hunky-dory. I could probably even do the 8+3 filename stuff. But we are wearing big boy pants now, so another test yields a result I do not agree with:
% ./a.out /foo/grabba/yabba/.cshrc path = </foo/grabba/yabba/.cshrc> ext = <cshrc> base = <> file = <.cshrc>
I would argue that the base name is '.cshrc' and there is no extension.
Anyone have any different views?
I knew I was close to having a lot more work last night. I spent a little bit of time this morning getting the error handling done the way I wanted for the evaluation (does a return of FALSE mean an error or not to match?). I also spliced the network parsing into the command line. Still needs to be reworked a bit, a network address is not a subnet after all. But, it now handles evaluating addresses correctly:
% /a.out -r tests/hulk.txt -d 12 -u 1167 -a 192.168.3.211 4, 25, 34000, uid == 1066 5, 30, 32000, uid == 1067 || uid == 1065 6, 15, 2000, uid == 1068 && gid == 500 7, 40, 8000, gid == 500 9, 40, 8000, day == 10 15, 3, 15000, subnet == 192.168.3.0/24 16, 3, 15000, subnet == 10.10.20.0/24 25, 2, 48000, ip == 192.168.2.211 1114, 30, 32000, !(uid == 1167 || uid == 1165) 1115, 40, 8000, !(day == 11) The matching policy is: 15, 3, 15000, subnet == 192.168.3.0/24 % ./a.out -r tests/hulk.txt -d 12 -u 1167 -a 192.168.2.211 ... The matching policy is: 25, 2, 48000, ip == 192.168.2.211 % ./a.out -r tests/hulk.txt -d 12 -u 1167 -a 10.10.20.5 ... The matching policy is: 16, 3, 15000, subnet == 10.10.20.0/24
Looks like I should have an option to dump the policies and by default not do it.
What I am struggling with now is how to differentiate between essentially testing this on the server and on any other machine. If it is the server, then I can state that the only command line parameters for attributes are those that I can pull out of a NFS request. I don't need anything else.
But if this is being run on a different computer, then I might need to provide some mock-up ability. E.g., I might want to set a netmask, a domain name, etc. In short, any of the secondary attributes.
I'm thinking of taking these off of the command line and putting them in a .speadm file. I want to focus on the command line looking like a NFS packet, but still allow for flexibility when debugging.
The latest code is at: speadm.c and speadm.h.
I'm not going to walk through the code right now. It is incomplete, mostly the command line parsing, and has XXXs all over the place. I pushed ahead just enough to get some simple integers working:
% cat tests/hulk.txt 4, 25, 34k, uid == 1066 6, 40, 8k, gid == 500 6, 40, 8k, day == 10 % ./a.out -r tests/hulk.txt -d 10 -u 1066 4, 25, 34000 uid == 1066 6, 40, 8000 gid == 500 6, 40, 8000 day == 10 The matching policy is: 4, 25, 34000 uid == 1066 % ./a.out -r tests/hulk.txt -d 10 4, 25, 34000 uid == 1066 6, 40, 8000 gid == 500 6, 40, 8000 day == 10 The matching policy is: 6, 40, 8000 day == 10
BTW: Notice what I meant about duplicate ids. I still need to fix that.
If you are trying the current code out, avoid the strings for right now. That would also include the networking tests.
What is in there now is the evaluation code, parsing of network and single machine addresses in the policies file, etc. I think I added 1k of code today and I can't figure out when. Well, 650 lines came in since the last blog entry.
The evaluation code stole the framework from the printing code. I'd love to abstract out the looping from the action to be performed. But, I think that may end up being more trouble than it is worth.
Hmm, I beefed up the tests just a bit and found that both '||' and '&&' are working:
% cat tests/hulk.txt 4, 25, 34k, uid == 1066 5, 30, 32k, uid == 1067 || uid == 1065 6, 15, 2k, uid == 1068 && gid == 500 7, 40, 8k, gid == 500 9, 40, 8k, day == 10 % ./a.out -r tests/hulk.txt -u 1067 4, 25, 34000 uid == 1066 5, 30, 32000 uid == 1067 || uid == 1065 6, 15, 2000 uid == 1068 && gid == 500 7, 40, 8000 gid == 500 9, 40, 8000 day == 10 The matching policy is: 5, 30, 32000 uid == 1067 || uid == 1065 % ./a.out -r tests/hulk.txt -u 1065 4, 25, 34000 uid == 1066 5, 30, 32000 uid == 1067 || uid == 1065 6, 15, 2000 uid == 1068 && gid == 500 7, 40, 8000 gid == 500 9, 40, 8000 day == 10 The matching policy is: 5, 30, 32000 uid == 1067 || uid == 1065 % ./a.out -r tests/hulk.txt -u 1068 -g 500 4, 25, 34000 uid == 1066 5, 30, 32000 uid == 1067 || uid == 1065 6, 15, 2000 uid == 1068 && gid == 500 7, 40, 8000 gid == 500 9, 40, 8000 day == 10 The matching policy is: 6, 15, 2000 uid == 1068 && gid == 500 % ./a.out -r tests/hulk.txt -u 1068 -g 501 4, 25, 34000 uid == 1066 5, 30, 32000 uid == 1067 || uid == 1065 6, 15, 2000 uid == 1068 && gid == 500 7, 40, 8000 gid == 500 9, 40, 8000 day == 10 No matching policy, default would apply.
What about '!'?
% ./a.out -r tests/hulk.txt -d 12 -u 1167 4, 25, 34000, uid == 1066 5, 30, 32000, uid == 1067 || uid == 1065 6, 15, 2000, uid == 1068 && gid == 500 7, 40, 8000, gid == 500 9, 40, 8000, day == 10 10, 30, 32000, !(uid == 1167 || uid == 1165) 15, 40, 8000, !(day == 11) The matching policy is: 10, 30, 32000, !(uid == 1167 || uid == 1165)
Looks wrong (notice I fixed a missing ',' before the policy attribute-expression). Hmm, here is the bug:
bLHS = spe_eval_thunk(si->si_branches[0], pat, &sa);
/*
* Lazy, but only 1 op - which is '!'.
*/
if (b == TRUE) {
b = FALSE;
} else {
b = TRUE;
}
That should have been a test on bLHS. Again, quick coding leads to bugs. With the fix:
% ./a.out -r tests/hulk.txt -d 12 -u 1167 4, 25, 34000, uid == 1066 5, 30, 32000, uid == 1067 || uid == 1065 6, 15, 2000, uid == 1068 && gid == 500 7, 40, 8000, gid == 500 9, 40, 8000, day == 10 10, 30, 32000, !(uid == 1167 || uid == 1165) 15, 40, 8000, !(day == 11) The matching policy is: 15, 40, 8000, !(day == 11)
The latest code is at: speadm.c and speadm.h.
theShepler asked me offline why I wasn't looking at inet_ntoa(3SOCKET) for Reversing a network address from an uint_t. His email was dry, but I think there was a "D'oh!" in there somewhere.
I had considered those routines, but I went with the stuff in mountd because for the standalone, we don't have sockets being attached. In the finished product, I'll be looking at this. Also, since I will need to support IPv6, I'll have more need to not be doing my own parsing. But for right now, I'm going to proceed without IPv6 and without inet_aton(3SOCKET).
Okay, I did two things with the little network parsing code:
And here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
/*
* http://infolab.stanford.edu/~manku/bitcount/bitcount.html
*/
int
spe_bitcount (unsigned int n)
{
int count = 8 * sizeof(int) ;
n ^= (unsigned int) -1 ;
while (n) {
count-- ;
n &= (n - 1) ;
}
return (count);
}
int
main (int argc, char *argv[])
{
char *p = argv[1];
char *junk;
uint_t addr = 0, mask = 0;
uint_t rev;
int i;
int shift;
int t;
char sep = ' ';
unsigned char b = 0;
for (i = 0; i < 4; i++) {
t = strtol(p, &junk, 10);
if (junk && junk[0] == '/' && i == 3) {
b = 1;
} else if (junk && junk[0] != '.' && junk[0] != '\0') {
fprintf(stderr, "%s is wrong in the %d octet,"
" %s has non-digits as |%s|\n", argv[1],
i+1, p, junk);
exit (-1);
} else if (t > 255 || t < 0) {
fprintf(stderr, "%s is wrong in the %d octet,"
" %d is out of range\n", argv[1],
i+1, t);
exit (-1);
}
addr |= t << ((3-i) * 8);
p = strchr(p, '.');
if (p == NULL)
break;
p++;
}
if (b) {
p = &junk[1]; /* advance over '/' */
t = strtol(p, &junk, 10);
if (junk && junk[0] != '\0') {
fprintf(stderr, "%s is wrong in the netmask,"
" %s has non-digits as |%s|\n", argv[1],
p, junk);
exit (-1);
} else if (t < 0 || t > 32) {
fprintf(stderr, "%s is wrong in the netmask,"
" %d is out of range\n", argv[1], t);
exit (-1);
}
mask = t ? ~0 << ((sizeof (struct in_addr) * NBBY) - t) : 0;
} else {
if ((addr & 0x00ffffff) == 0)
mask = 0xff000000;
else if ((addr & 0x0000ffff) == 0)
mask = 0xffff0000;
else if ((addr & 0x000000ff) == 0)
mask = 0xffffff00;
else
mask = 0xffffffff;
}
printf("%s yields %u (%u) and that reverse maps to ", argv[1],
addr, mask);
for (i = 0; i < 4; i++) {
shift = ((3-i) * 8);
rev = (addr & (0xff << shift)) >> shift;
printf("%c%u", sep, rev);
sep = '.';
}
sep = '(';
t = 0;
for (i = 0; i < 4; i++) {
shift = ((3-i) * 8);
rev = (mask & (0xff << shift)) >> shift;
printf("%c%u", sep, rev);
sep = '.';
}
t = spe_bitcount(mask);
printf(" - %u)\n", t);
return (0);
}
And some test runs:
% ./a.out 192.168.2.0/23 192.168.2.0/23 yields 3232236032 (4294966784) and that reverse maps to 192.168.2.0(255.255.254.0 - 23) % ./a.out 192.168.2.0/24 192.168.2.0/24 yields 3232236032 (4294967040) and that reverse maps to 192.168.2.0(255.255.255.0 - 24) % ./a.out 192.168.2.0/31 192.168.2.0/31 yields 3232236032 (4294967294) and that reverse maps to 192.168.2.0(255.255.255.254 - 31) % ./a.out 192.168.2.0/32 192.168.2.0/32 yields 3232236032 (4294967295) and that reverse maps to 192.168.2.0(255.255.255.255 - 32) % ./a.out 192.168.2.0/0 192.168.2.0/0 yields 3232236032 (0) and that reverse maps to 192.168.2.0(0.0.0.0 - 0) % ./a.out 192.168.2.0/1 192.168.2.0/1 yields 3232236032 (2147483648) and that reverse maps to 192.168.2.0(128.0.0.0 - 1) % ./a.out 192.168.2.0/8 192.168.2.0/8 yields 3232236032 (4278190080) and that reverse maps to 192.168.2.0(255.0.0.0 - 8) % ./a.out 192.168.2.0/33 192.168.2.0/33 is wrong in the netmask, 33 is out of range
I just need to stuff this code into my prototype.
Okay, the amount of delta turns out to be small to getting working code. I spent more time thinking about things. I tried putting the '(' code handling into the RHS and struggled with what node do I attach it to? I understood why for the LHS case it went to si, but after I created some new storage, I could not figure out how I knew si already had 2 children allocated.
Consider (path == /db) && (ext == jpg). From just doing the '(' code from the LHS, I knew there was no way that I had a correct si to attach the LHS ((ext == jpg)). Which is when I looked for where I had the correct code, which was in the get_compound state. And once I realized that, I realized I could keep the get_lhs focused on just handling data. The changes I had to make were in the OP code:
case ('|') :
/*
* Rewind so that the compound operator can handle this correctly.
*/
if (*expression-- != '|') {
fprintf(stderr,
...
}
goto get_compound;
case ('&') :
/*
* Rewind so that the compound operator can handle this correctly.
*/
if (*expression-- != '&') {
..
}
goto get_compound;
This also means that I only have to worry about '!' and '(' in the get_lhs state. This code may not be what a generator would have produced and it may make me groan when I come back to it, but it works.
Anyway, the current set of tests pass:
1, 16, 64000 (path == /db) 2, 16, 64000 !(path == /db) 3, 16, 64000 path == /db && ext == jpg 4, 16, 64000 path == /db && ext == jpg || ext == jpeg 5, 16, 64000 (path == /db && ext == jpg) || ext == jpeg 6, 16, 64000 path == /db && (ext == jpg || ext == jpeg) 7, 16, 64000 path == /db 8, 16, 64000 (path == /db && ext == jpg) || (path != /db && ext == jpeg) 9, 16, 32000 (path == /db && ext == jpg) 10, 16, 32000 !(path == /db && ext == jpg) % cat tests/parens.txt 1, 16, 64k, (path == /db) 2, 16, 64k, !(path == /db) 3, 16, 64k, path == /db && ext == jpg 4, 16, 64k, path == /db && ext == jpg || ext == jpeg 5, 16, 64k, (path == /db && ext == jpg) || ext == jpeg 6, 16, 64k, path == /db && (ext == jpg || ext == jpeg) 7, 16, 64k, path == /db 8, 16, 64k, (path == /db && ext == jpg) || (path != /db && ext == jpeg) 9, 16, 32k, (path == /db && ext == jpg) 10, 16, 32k, !(path == /db && ext == jpg)
I need to go back and strip out all of the debug printfs, fix the storage of data in the get_rhs state, finish off the command line processing, etc. All of which needs to be done before I write the evaluation code. And that will be the real test case. I also need to add a new command line option to just verify the input. I've basically done that with the attribute-expression parsing, but I'd like a mode which does not load into the "global" set of policies. Note, I am thinking about the final product here.
Also, the code assumes that all id's are unique. If I were to have multiple ids in my rules-files, they would just load. BTW: I need to start calling it my policies-file. I think what I want to do in the final product is treat the loading of a policies-file to be atomic. To that end, they would load into a local list (which now has to become a parameter) and a warning would be spit out if there were duplicate ids. Only when the policies are all loaded correctly would the local copy be merged with the global one. And at that point, there would be no warnings about duplicates.
The latest code is at: speadm.c and speadm.h.
Turns out test case 4 is also horked, so let us work on it first:
4, 16, 64000 path == /db && ext == jpg && ext == jpeg
versus the expected
4, 16, 64000 path == /db && ext == jpg && ext == jpeg
5 and 6 also have this problem... Pretty easy to find once we noticed it:
case ('|') :
if (*expression++ != '|') {
...
}
sop = SPE_OP_AND;
break;
case ('&') :
if (*expression++ != '&') {
...
}
sop = SPE_OP_AND;
break;
...
/*
* Both children are interior nodes...!
*/
si_new->si_op = sop;
Cutting and pasting when in a hurry or tired is going to lead to bugs. Fixed and working.
Back to 5 and 6:
5, 16, 64000 (path == /db) && ext == jpg || ext == jpeg 6, 16, 64000 path == /db && (ext == jpg) || ext == jpeg
I did the expressions out on paper and I think the basic mechanism may be correct here. It might be that the recording of the parentheses is the problem. I.e., they look to be in the correct area, but on the wrong interior node. Hard to explain what I mean, let me see if I can find the bug and illustrate what I suspect.
First I'll add two new test cases:
8, 16, 64k, (path == /db && ext == jpg) || (path != /db && ext == jpeg) 9, 16, 32k, (path == /db && ext == jpg)
And look at the results:
8, 16, 64000 (path == /db) && ext == jpg || (path != /db) && ext == jpeg 9, 16, 32000 (path == /db) && ext == jpg
So my parentheses handling only works for the simple case of !(path == /db). It does not work for compound operators at all. As a matter of fact, a new test case suggests itself:
10, 16, 32k, !(path == /db && ext == jpg)
And we get:
10, 16, 32000 !(path == /db) && ext == jpg
I think what we should be doing is that when we encounter a '(' we recursively parse the subexpression and when that comes back, we throw the parentheses over the root of the subtree. And that reminds me, we are only handling match detecting for when we hit '!'.
We can test this claim pretty quickly as the '!' almost does this exact algorithm. A quick test shows with this:
if (!(si->si_branches[0].st_node =
spe_parse_attribute_expression(si->si_branches[0].st_node, sub_expr,
pf, piLine, prc))) {
goto cleanup;
}
{
spe_interior_t *sip;
if (si->si_branches[0].st_is_interior == FALSE) {
...
}
sip = (spe_interior_t *)si->si_branches[0].st_node;
sip->si_parens = TRUE;
}
bears fruit:
2, 16, 64000 !(path == /db) 10, 16, 32000 !(path == /db && ext == jpg)
What I'm trying to get at is that I believe the parse trees are correct, but it is just the annotation on them to produce parentheses which is incorrect. I.e., they would evaluate file creations correctly. And this is a very dangerous claim to make - it may simply be that my test cases support it and new test cases would tear it apart. I'm not going to investigate this angle, I have other fish to fry.
And while this latest change appears to work, what I really need to do is push the parentheses handling out of the '!' case and make it a first class citizen.
The big difference here is that we need to consume the parentheses first when getting the sub-expression:
t = expression;
iLen = strlen(t) + 1;
b = FALSE;
iParens = 1;
for (i = 0; i < iLen - 1; i++) {
If we don't consume it, then we will never have a base case to halt recursion. The new code, works, kinda. I actually was predicting the failure to myself.
spe_parse_attribute_expression: EOS when parsing operator for Line 10 in 1, 16, 64000 (path == /db) 2, 16, 64000 !(path == /db) 3, 16, 64000 path == /db && ext == jpg 4, 16, 64000 path == /db && ext == jpg || ext == jpeg 5, 16, 64000 (path == /db && ext == jpg) 6, 16, 64000 path == /db && (ext == jpg || ext == jpeg) 7, 16, 64000 path == /db 8, 16, 64000 (path == /db && ext == jpg) 9, 16, 32000 (path == /db && ext == jpg) 10, 16, 32000 !(path == /db && ext == jpg)
Note the warning I've included. The problem is the the code was structured to handle LHS OP RHS, where the allowed ops were '==' and '!='. This worked when we had the latest parentheses bug. You can see it fail in policies 5 and 8 where the op is '||'.
We need to allow the ops '||' and '&&' to now appear in the OP slot. We also need to allow a '(' to appear in the RHS slot.
I'm trying to decide if the code is about to become too kludgey or not. I think I put too much emphasis in trying to parse it without recursion. You might have noticed my hesitation about some things earlier (like the labels).
I think the best thing is to continue down the path of finishing the prototype. Then I can decide if I need to rewrite it for the final product.
I'm going to cut this entry off here - this is a big enough of an issue. The final code is at: speadm.c and speadm.h.
I quickly wrote a recursive print routine for an attribute-expression and also stuffed all RHS data into the string data type. Here is the resulting code:
char *spe_ops_list[] = {
"&&",
"||",
"!",
"==",
"!=",
NULL
};
...
void
spe_print_leaf (spe_leaf_t *sl)
{
if (sl->sl_is_attr == TRUE) {
} else {
switch (sl->sl_type) {
case (SPE_DATA_STRING) :
fprintf(stdout, "%s", sl->sl_data.sz);
break;
...
default :
fprintf(stderr,
"spe_print_leaf: Unknown type %d\n", sl->sl_type);
break;
}
}
}
void
spe_print_thunk (spe_thunk_t st)
{
if (st.st_is_interior == TRUE) {
spe_print_attribute((spe_interior_t *)st.st_node);
} else {
spe_print_leaf((spe_leaf_t *)st.st_node);
}
}
void
spe_print_attribute (spe_interior_t *si)
{
int i;
if (!si) {
return;
}
if (si->si_children == 1) {
fprintf(stdout, " %s", spe_ops_list[si->si_op]);
spe_print_thunk(si->si_branches[0]);
} else {
if (si->si_parens) {
fprintf(stdout, "(");
}
spe_print_thunk(si->si_branches[0]);
fprintf(stdout, " %s ", spe_ops_list[si->si_op]);
spe_print_thunk(si->si_branches[1]);
if (si->si_parens) {
fprintf(stdout, ")");
}
}
}
void
spe_print_policy (spe_policy_t *spe)
{
if (!spe) {
return;
}
fprintf(stdout, "%u, %u, %u ", spe->sp_id, spe->sp_stripe_count,
spe->sp_interlace);
spe_print_attribute(spe->sp_attr_expr);
fprintf(stdout, "\n");
}
void
spe_print_policy_list (void)
{
spe_policy_t *spe;
for (spe = spe_policy_list; spe; spe = spe->next) {
spe_print_policy(spe);
}
}
/*
* When we parse an attribute-expression...
*/
spe_interior_t *
spe_parse_attribute_expression (spe_interior_t *si, char *expression, FILE *pf,
int *piLine, int *prc)
{
...
/*
* The sub_expr is basically the tested.
*/
printf("RHS is %s\n", sub_expr);
/*
* XXX: Hack to get to debugging!
*/
sl = (spe_leaf_t *)si->si_branches[0].st_node;
sl->sl_is_attr = FALSE;
sl->sl_type = SPE_DATA_STRING;
sl->sl_data.sz = sub_expr;
if (sub_expr) {
#if 0
free(sub_expr);
#endif
sub_expr = NULL;
}
}
And here is some output:
% cat !$ cat tests/parens.txt 1, 16, 64k, (path == /db) 2, 16, 64k, !(path == /db) 3, 16, 64k, path == /db && ext == jpg 4, 16, 64k, path == /db && ext == jpg || ext == jpeg 5, 16, 64k, (path == /db && ext == jpg) || ext == jpeg 6, 16, 64k, path == /db && (ext == jpg || ext == jpeg) % ./a.out -r tests/parens.txt 1, 16, 64000 (/db == (null)) 2, 16, 64000 !(/db == (null)) 3, 16, 64000 jpg == (null) 4, 16, 64000 jpeg == (null) 5, 16, 64000 jpeg == (null) 6, 16, 64000 jpeg == (null)
I've already spotted a bug in spe_print_leaf(), a quick fix of:
void
spe_print_leaf (spe_leaf_t *sl)
{
int i;
if (!sl) {
return;
}
if (sl->sl_is_attr == TRUE) {
for (i = 0; i < attribute_count; i++) {
if (spe_attribute_table[i].at_attr == sl->sl_attr) {
fprintf(stdout, "%s", spe_attribute_table[i].at_name);
}
}
} else {
Hmm, sl_attr could come out of the table, the index should suffice. But that is for later. What results do we get?
1, 16, 64000 (/db == (null)) 2, 16, 64000 !(/db == (null)) 3, 16, 64000 jpg == (null) 4, 16, 64000 jpeg == (null) 5, 16, 64000 jpeg == (null) 6, 16, 64000 jpeg == (null)
I'm actually quite happy with that, I didn't think that should have been the issue. Okay, what do we know? Well, first off, the attribute expression looks backwards. Hmm, the parentheses in 1 and the negation in 2 appear correct.
A little exploration in gdb shows that the first node is storing the path:
(gdb) print si
$1 = (spe_interior_t *) 0x100160
(gdb) print *si
$2 = {
si_op = SPE_OP_EQUAL,
si_parens = 1 '\001',
si_children = 2,
si_branches = 0x100170
}
(gdb) print si->si_branches[0]
$3 = {
st_is_interior = 0 '\0',
st_node = 0x100180
}
(gdb) print (spe_leaf_t *)si->si_branches[0].st_node
$4 = (spe_leaf_t *) 0x100180
(gdb) print *(spe_leaf_t *)si->si_branches[0].st_node
$5 = {
sl_is_attr = 0 '\0',
sl_attr = SPE_ATTR_PATH,
sl_type = SPE_DATA_STRING,
sl_data = {
uid = 1049088,
gid = 1049088,
i = 1049088,
sz = 0x100200 "/db",
net = {
sn_netname = 0x100200 "/db",
sn_network = 0,
sn_netmask = 0
}
}
}
Okay, the bug is in the quick hack to get printing to work, wrong index here:
/* * XXX: Hack to get to debugging! */ sl = (spe_leaf_t *)si->si_branches[0].st_node;
Fix it and:
1, 16, 64000 (path == /db) 2, 16, 64000 !(path == /db) 3, 16, 64000 ext == jpg 4, 16, 64000 ext == jpeg 5, 16, 64000 ext == jpeg 6, 16, 64000 ext == jpeg 7, 16, 64000 path == /db
Great, I added a new simple test 7 and it with the first 2 tests are working correctly. As thought though, the more complex parentheses tests are failing. Just not the way I expected. Sheesh! By the way, I'm leaking memory like crazy as well.
Walking through the code, I think the bug is here, where I only allocate 1 branch and not 2:
/*
* Both children are interior nodes...!
*/
si_new->si_op = sop;
si_new->si_children = 2;
si_new->si_branches = (spe_thunk_t *)calloc(1, sizeof(spe_thunk_t));
Crap, my fix was this:
/*
* Both children are interior nodes...!
*/
si_new->si_op = sop;
si_new->si_children = 2;
si_new->si_branches = (spe_thunk_t *)calloc(si_new->si_children, sizeof(spe_thunk_t));
And in applying it everywhere I did the calloc, I found where I must have cut and paste the bug from:
/*
* We do not yet know what the operator is...
*/
si->si_children = 2;
si->si_branches = (spe_thunk_t *)calloc(1, sizeof(spe_thunk_t));
While nasty, those are not the bug which is killing me. I think it is this code:
/*
* The first child is the previous root of the parse tree.
*/
si_new->si_branches[0].st_is_interior = TRUE;
si_new->si_branches[0].st_node = si;
/*
* This is temporary until we get the second child allocated.
* This is done to allow proper cleanup in the case of a
* memory allocation when getting the second child.
*/
si = si_new;
si_new->si_branches[1].st_is_interior = TRUE;
si_new->si_branches[1].st_node = (spe_interior_t *)calloc(1,
sizeof(spe_interior_t));
if (!si_new->si_branches[1].st_node) {
fprintf(stderr, "spe_parse_attribute_expression:"
" Not enough memory\n");
*prc = ENOMEM;
goto cleanup;
}
/*
* Now we get the correct subtree to fill out.
*/
si = (spe_interior_t *)si_new->si_branches[1].st_node;
/*
* Finally, we realize that we have more input to process.
*/
szError = expression;
goto get_lhs;
The problem is that what I want to be returning (si_new) and what I want to be working on (si) are not the same. This may be where not using recursion is really going to bite me! Hmm, is it as simple as making a recursive call right here?
Yes, that is a good thing:
/*
* The first child is the previous root of the parse tree.
*/
si_new->si_branches[0].st_is_interior = TRUE;
si_new->si_branches[0].st_node = si;
/*
* We hoist the new node into place.
*/
si = si_new;
si_new->si_branches[1].st_is_interior = TRUE;
si_new->si_branches[1].st_node = (spe_interior_t *)calloc(1,
sizeof(spe_interior_t));
if (!si_new->si_branches[1].st_node) {
fprintf(stderr, "spe_parse_attribute_expression:"
" Not enough memory\n");
*prc = ENOMEM;
goto cleanup;
}
/*
* Handle the rest by recursion.
*/
if (!(si_new->si_branches[1].st_node =
spe_parse_attribute_expression(si_new->si_branches[1].st_node, expression,
pf, piLine, prc))) {
goto cleanup;
}
/*
* XXX: Do we want to check to make sure we've exhausted input? And how?
*/
cleanup:
Which yields:
1, 16, 64000 (path == /db) 2, 16, 64000 !(path == /db) 3, 16, 64000 path == /db && ext == jpg 4, 16, 64000 path == /db && ext == jpg && ext == jpeg 5, 16, 64000 (path == /db) && ext == jpg && ext == jpeg 6, 16, 64000 path == /db && (ext == jpg) && ext == jpeg 7, 16, 64000 path == /db
Only 5 and 6 are horked. But those are the ones I suspected from the start would be that way.
And I'm okay with that for right now. I'm getting tired and I almost zapped my source while making a backup! Here is the current speadm.c and speadm.h.
Okay, I had to grind out some code and do a bunch of unit testing. I couldn't chart my journey, I had to really focus on what was going on. If I wrote about what I was doing as I did it, it would have tripled the development time, and I'm already on a tight self-imposed deadline.
Anyway, I have the attribute-expression parsing "done". By that I mean everything is mostly loaded in memory (except the data from the RHS), but I don't have a clue as to whether it is correctly loaded in memory. I have a sneaky suspicion that if I had the policy:
6, 16, 64k, path == /db && (ext == jpg || ext == jpeg)
Then if printed, it would be incorrect. I.e., I suspect my parentheses handling is very bogus.
It is getting to the point where my deltas between posts is getting huge. The raw source is at speadm.c and speadm.h. I'm going to version these so that you get to see changes. Note that the CDDL has been applied. :->
What I do want to do is go over the ~400 lines of parsing code that I have and annotate parts of it. I'm not happy with some things and others have tripped me up.
The first thing which I tried to do was have a simple while loop processing the input string. That wasn't going to work - I need to be parsing out LHS OP RHS and repeating when I find a compound. I know you are not supposed to use goto with a target before the goto, but this seemed the natural way to express that I was processing a token. These labels form a state table and you will see me jumping backwards at least once from the end of the RHS to the start of an OP.
/*
* When processing the LHS, we know that the only recursion we have
* to face is when we encounter a '!'. Otherwise, it has to be the case
* that we are eventually consuming an attribute followed by an
* operation.
*/
get_lhs:
switch (*expression++) {
case (' ') :
case ('\t') :
goto get_lhs;
/*
* Whoa, we did not expect this, did we?
*/
case ('\0') :
...
goto cleanup;
And I'll look at the ! handling again, you can contrast it against the earlier post on Parsing the ! operator:
case ('(') :
si->si_parens = TRUE;
goto get_lhs;
case ('!') :
/*
* Scan ahead and make sure the next token is '('.
*/
t = expression;
iLen = strlen(t) + 1;
b = FALSE;
We need to both make sure that we find the start token for the parentheses:
for (i = 0; i < iLen - 1; i++) {
if (t[i] == ' ' || t[i] == '\t') {
continue;
} else if (t[i] == '(') {
b = TRUE;
break;
}
break;
}
But we have to scan ahead to make sure that we find a balanced ending token. I tripped up here by starting on the '(' and counting it twice.
/*
* Okay, now we need to scan ahead and find the end ')'.
* Note that we must be sitting on a '(' from above, so we
* will need to advance past it.
*/
b = FALSE;
iParens = 0;
for (; i < iLen - 1; i++) {
if (t[i] == '(') {
iParens++;
} else if (t[i] == ')') {
iParens--;
if (iParens == 0) {
b = TRUE;
sub_expr = (char *)calloc(i+i, 1);
if (!sub_expr) {
...
}
strncpy(sub_expr, expression, i+1);
expression = &t[i+1];
break;
}
}
}
I know this part is how I want it. I send in the new expression to parse and attach it below the operator for the !.
if (!(si->si_branches[0].st_node =
spe_parse_attribute_expression(si->si_branches[0].st_node, sub_expr,
pf, piLine, prc))) {
goto cleanup;
}
And we have a case to end the recursion. I think I caught this one before I hit it.
/*
* Note that we can be done with this attribute-expression
* if it were "!(path == /db)".
*/
if (*expression == '\0') {
/*
* We could not have changed the parent interior node,
* so leave it alone.
*/
goto cleanup;
}
The next thing I want to cover is the handling for what happens if you process an attribute-expression and discover you have more:
3, 16, 64k, path == /db && ext == jpg
I.e., after we handle /db, we are done with the RHS of an attribute-expression. But we know that if we have anymore text, then we must already be in the middle of another attribute-expression:
get_compound:
/*
* We are done processing the RHS. That means if there is anything
* left, then we are part of a compound statement. The first thing
* in the remaining expression better be either && or ||.
*/
switch (*expression++) {
case (' ') :
case ('\t') :
goto get_compound;
Note: I invariably caused a cut and paste bug with the above, it would go to get_lhs and the input would be really screwed up. I don't like this use of labels, but I find it very elegant and I like avoiding as many function calls here as possible.
case ('\0') :
/*
* Not an error, done with input.
*/
goto cleanup;
case ('|') :
if (*expression++ != '|') {
...
}
sop = SPE_OP_AND;
break;
case ('&') :
if (*expression++ != '&') {
...
}
sop = SPE_OP_AND;
break;
default :
...
goto cleanup;
}
Now that we have verified that we have a valid compound operator, we need to glue it into the tree.
Another huge difference from earlier postings is that you will start to see comments appearing in the code. This isn't just for reviewers, I need it to understand what my intent was. I also use it as times as a prod for come back here later and see if a change is needed.
/*
* Now we know we have a valid operation. So we need to
* allocate a new internal node and place it above the
* existing one.
*/
si_new = (spe_interior_t *)calloc(1, sizeof(*si));
if (!si_new) {
...
}
/*
* Both children are interior nodes...!
*/
si_new->si_op = sop;
si_new->si_children = 2;
si_new->si_branches = (spe_thunk_t *)calloc(1, sizeof(spe_thunk_t));
if (!si->si_branches) {
...
}
/*
* The first child is the previous root of the parse tree.
*/
si_new->si_branches[0].st_is_interior = TRUE;
si_new->si_branches[0].st_node = si;
/*
* This is temporary until we get the second child allocated.
* This is done to allow proper cleanup in the case of a
* memory allocation when getting the second child.
*/
si = si_new;
si_new->si_branches[1].st_is_interior = TRUE;
si_new->si_branches[1].st_node = (spe_interior_t *)calloc(1,
sizeof(spe_interior_t));
if (!si_new->si_branches[1].st_node) {
...
}
/*
* Now we get the correct subtree to fill out.
*/
si = (spe_interior_t *)si_new->si_branches[1].st_node;
And here is where I use the labels as a state transition:
/*
* Finally, we realize that we have more input to process.
*/
szError = expression;
goto get_lhs;
To recap, I'm at the point where I need to add the RHS data into the parse trees, which means all of the validity checking. Hmm, I could put it in as all strings for right now. :->
I also have to write a real routine to print out my parse trees. It is only by comparing the input versus the output that I will see if I am constructing the trees correctly.
So I've got the untested code in place for parsing a '!' out of an attribute-expression. It is the simplest attribuite-expression to handle. The key to it is to realize that it recursively fills in the child.
typedef enum {
SPE_LHS,
SPE_OP,
SPE_RHS
} spe_state_t;
spe_interior_t *
spe_parse_attribute_expression(spe_interior_t *si, char *expression, FILE *pf,
int *piLine, int *prc)
{
char *p, *t;
char *w;
char *lasts;
char *junk;
spe_state_t ss = SPE_LHS;
if (!si) {
fprintf(stderr,
"spe_parse_attribute_expression: No interior node passed in\n");
*prc = ENOENT;
goto cleanup;
}
while (expression) {
w = " ";
t = strtok_r(p, w, &lasts);
if (!t) {
fprintf(stderr, "spe_parse_attribute_expression: Line %d"
" - No token for %s\n", *piLine, expression);
*prc = EINVAL;
goto cleanup;
}
/*
* What do we have here?
*/
switch (*t) {
/*
* Push it?
*/
case ('(') :
si->si_parens = TRUE;
break;
/*
* Next thing better be a '('!
*/
case ('!') :
At the point we realize that we have the negation operator, we can look at our test cases (for this one, look at the second ascii diagram in Finally, parsing policy attribute-expressions) to see we need to fill in the interior node:
It was only in annotating the code that I realized that my comment is indeed true and we still need to scan ahead to make sure the next token is a '('. Note that when we recursively call ourselves below, we would allow !path == /db to become valid. I read this as !path, which does not make sense. I want to force the '('...
/* * Fill in the interior node and create the child. */ si->si_op = SPE_OP_NOT; si->si_children = 1; si->si_branches = (spe_thunk_t *)calloc(1, sizeof(spe_thunk_t));
And then we need to create a thunk with 1 child pointer:
si->si_branches[0].st_is_interior = TRUE;
si->si_branches[0].st_node = (spe_interior_t *)calloc(1,
sizeof(spe_interior_t));
if (!si->si_branches[0].st_node) {
fprintf(stderr, "spe_parse_attribute_expression:"
" Not enough memory\n");
*prc = ENOMEM;
goto cleanup;
}
And finally we need to parse out what we are negating. Note the while we pass in an interior node to spe_parse_attribute_expression as si-->si_branches[0].st_node, there is no guarantee that interior node will be the one at the root of this branch of the parse tree when we return. This will clear up as we add more of the code.
if (!(si->si_branches[0].st_node =
spe_parse_attribute_expression(si->si_branches[0].st_node, t,
pf, piLine, prc))) {
goto cleanup;
}
break;
/*
* Start of a function name.
*/
default :
break;
}
}
cleanup:
return (si);
}
int
spe_load_policies_from_file (char *szFile)
{
...
t = strtok_r(NULL, w, &lasts);
if (!t) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - No attribute-expression for %s\n",
iLine, szLine);
rc = EINVAL;
goto cleanup;
}
spe->sp_attr_expr = (spe_interior_t *)calloc(1, sizeof(*si));
if (!spe->sp_attr_expr) {
fprintf(stderr, "spe_load_policies_from_file:"
" Not enough memory\n");
rc = ENOMEM;
goto cleanup;
}
if (!(spe->sp_attr_expr = spe_parse_attribute_expression(spe->sp_attr_expr,
t, pf, &iLine, &rc))) {
goto cleanup;
}
Again, note what we pass in as the root may not be what returns as the root. Note: we better not lose anything down there. Also, we could make this clear by having a local pointer for the calloc() and have the root start out as NULL. I may end up doing that if it makes sense for cleaning things up.
Another note is that you may see the format of the code changing slightly. I am adding more complex code and solving style issues by using the Sun standard. I may not agree with that standard in all places, but it is a prerequisite before I can check the code in. I might as well start getting it is shape cstyle-wise.
Back to the need to read ahead to make sure the next token is a '(', this change will do:
case ('!') : {
/*
* Scan ahead and make sure the next token is '('.
*/
int i;
int iLen = strlen(lasts);
boolean b = FALSE;
for (i = 0; i < iLen - 1; i++) {
if (lasts[i] == ' ' || lasts[i] == '\t') {
continue;
} else if (lasts[i] == '(') {
b = TRUE;
break;
}
break;
}
if (b == FALSE) {
fprintf(stderr, "spe_parse_attribute_expression: Line %d"
" - '!' not followed by a '(' %s\n", *piLine, expression);
*prc = EINVAL;
goto cleanup;
}
But I think it points out a much more subtle bug, one which was going to cause me much grief. In this chunk:
while (expression) {
w = " ";
t = strtok_r(p, w, &lasts);
p is uninitialized - I just copied this code over from the calling function. A quick fix is:
t = strtok_r(expression, w, &lasts);
And this leads to a bug down here:
if (!(si->si_branches[0].st_node =
spe_parse_attribute_expression(si->si_branches[0].st_node, t,
pf, piLine, prc))) {
goto cleanup;
}
We need to advance past the '!' character in t. But even more, we need to ask are what should we be sending into the recursion here? t or lasts? I'm not worried about the first call, the one in spe_load_policies_from_file(), I know it is right since we exhausted input. A quick test with a printf will show the answer.
case ('!') :
fprintf(stdout, "t = |%s|, lasts = |%s|\n", t, lasts);
return (si);
And the test case of:
% cat tests/sanity.txt 1, 16, 64k, !path == /db % ./a.out -r tests/sanity.txt t = |!path|, lasts = |== /db| 1, 16, 64000
Note that I'll reuse this test case in a moment to test my look ahead. But for now, we see what we are grabbing. Hmm, let's change the code and do the look ahead test:
% ./a.out -r tests/sanity.txt
t = |!path|, lasts = |== /db|
spe_parse_attribute_expression: Line 1 - '!' not followed by a '(' !path
1, 16, 64000
% ./a.out -r tests/jjj.txt
t = |!(path|, lasts = |== /db)|
spe_parse_attribute_expression: Line 1 - '!' not followed by a '(' !(path
1, 16, 64000
Where jjj.txt has the correct format. Here is another bug, the check should be based on t and not lasts.
int iLen = strlen(t);
boolean b = FALSE;
for (i = 0; i < iLen - 1; i++) {
if (t[i] == ' ' || t[i] == '\t') {
continue;
} else if (t[i] == '(') {
b = TRUE;
break;
}
break;
}
And that still gives the error:
./a.out -r tests/jjj.txt t = |!(path|, lasts = |== /db)| 1, 16, 64000
Yes, I did recompile. It turns out that the for loop needs to start at 1 and not 0. Because t[0] is '!', right? Recompile and run a new test:
% cat tests/jjj.txt
1, 16, 64k, !(path == /db)
2, 16, 64k, ! (path == /db)
% ./a.out -r tests/jjj.txt
t = |!(path|, lasts = |== /db)|
t = |!|, lasts = |(path == /db)|
spe_parse_attribute_expression: Line 2 - '!' not followed by a '(' !
1, 16, 64000
2, 16, 64000
Instead of flailing about, we need to step back and realize that while using strtok_r(3C) is no longer appropriate. If we were to continue down this path, we would have to be fixing EOSes, etc. We should step back and do manual pointer manipulation here.
What I want to impress again is to unit test as early as possible and come up with as many unit tests as possible. Make new test cases in your tests directory and you'll have them later when you want to automate regression testing. Do not count on either yourself of someone else coming back later - do the tests while they are fresh in your mind.
And do not be afraid to rip code out if it is getting in your way. I could continue down this path, but I've already shown it to be a pain in the rear and I can see more issues.
Okay, we know that attribute-expressions are actually quite simple to enter in a file and that they are comprised of space separated tokens. For right now, we do not support '\' in the file to split them across lines, so that makes it pretty simple as well.
Before we start, we need to look at some variants of a simple test case:
1) path == /db 2) (path == /db) 3) ( path == /db ) 4) !(path == /db) 5) ! (path == /db) 6) !path == /db 7) path != /db 8) path == /db && uid == 500
Note that test case 6 is an error condition. And test case 8 shows us the first complex case. In looking at these, it is evident that we need to examine 3 tokens to get an attribute-extension into memory. A '(' or a '!' will signal we will need to look at more tokens.
But wait, we also need to consider expressions of the form:
9) is_subnet_of(192.168.2.0)
It is a valid attribute-expression in its own right. And right now we should consider changing the syntax to be:
subnet == network subnet != network
I.e., get rid of the extra length in the function name and also overload the meaning of '==' and '!='. The typing is still strong in this case, we know exactly what is on the RHS. If we do this, we make the parsing easier, but we make flexibility later be an issue. Or do we? Do these both make sense?
$network = 192.178.5.17/27 $ip = 192.178.5.18 is_subnet_of(ip, $network) versus subnet(ip) == $network is_subnet_of($ip, $network) versus subnet($ip) == $network
Not even sure if it makes sense to ask this type of question in a policy. But I think that the new form will follow what the other functions will have to do. Time to make a change to make parsing consistent.
When working with new data structures, it is a good idea to draw some examples out by hand. In some cases you might make a box with each field getting a slot. But you can abstract it quite a bit if needed. Following are some ascii art examples that I had just doodled out on the back of some scratch paper. I might be missing one big example, but the following provide enough information to verify (and also debug) the code.
The first example:
policy -+
|
| leaf
interior +--------+
+----------+ thunk +--->| 1 |
| == | +--------+ | +--------+
+----------+ | +---+ | | + path +
| 2 | ---> | | 0 |-----+ +--------+
+----------+ | +---+ | +--------+
+--------+ +--------+
| +---+ |
| | 0 |-----+
| +---+ | | leaf
+--------+ | +--------+
+--->| 0 |
+--------+
+--------+
+ string +
+--------+
+ /db +
+--------+
Note that this can represent either attribute-expression:
path == /db (path == /db)
I.e., since we lose the parenthesis information, we can not map this back to the original statement.
The next example:
policy -+
|
|
interior
+----------+ thunk
| ! | +--------+
+----------+ | +---+ |
| 1 | --->| | 1 |-----+
+----------+ | +---+ | |
+--------+ |
/
/
+
|
| leaf
interior +--------+
+----------+ thunk +--->| 1 |
| == | +--------+ | +--------+
+----------+ | +---+ | | + path +
| 2 | ---> | | 0 |-----+ +--------+
+----------+ | +---+ | +--------+
+--------+ +--------+
| +---+ |
| | 0 |-----+
| +---+ | | leaf
+--------+ | +--------+
+--->| 0 |
+--------+
+--------+
+ string +
+--------+
+ /db +
+--------+
Note that this can only represent:
!(path == /db)
Which in turn means that '!' implicitly causes parenthesis to be output.
Okay, there could also be any number of parenthesis arround it.
The next example ramps it up a bit:
leaf
interior +--------+
+----------+ thunk +--->| 1 |
| == | +--------+ | +--------+
+----------+ | +---+ | | + path +
| 2 | ---> | | 0 |-----+ +--------+
+----------+ | +---+ | +--------+
^ +--------+ +--------+
| | +---+ |
| | | 0 |-----+
| | +---+ | | leaf
| +--------+ | +--------+
| +--->| 0 |
| +--------+
| +--------+
| + string +
| +--------+
| + /db +
| +--------+
policy -+ thunk |
| +--------+ |
| | +---+ | |
interior | | 1 |-----+
+----------+ | +---+ |
| && | +--------+
+----------+ | +---+ |
| 2 | --->| | 1 |-----+
+----------+ | +---+ | |
+--------+ |
/
/
+
|
| leaf
interior +--------+
+----------+ thunk +--->| 1 |
| == | +--------+ | +--------+
+----------+ | +---+ | | + uid +
| 2 | ---> | | 0 |-----+ +--------+
+----------+ | +---+ | +--------+
+--------+ +--------+
| +---+ |
| | 0 |-----+
| +---+ | | leaf
+--------+ | +--------+
+--->| 0 |
+--------+
+--------+
+ uid +
+--------+
+ 500 |
+--------+
And it can represent any of the following:
path == /db && uid == 500 (path == /db) && uid == 500 path == /db && (uid == 500) (path == /db && uid == 500) (path == /db) && (uid == 500)
The point is that we typically do not go back from a parse tree to the original input. It matters mainly because some human spent some time carefully crafting the policies.
We can fix this by embedding in each spe_interior_t whether or not to output a parenthesis when printing.
I meant to get to code, but all of this groundwork has chewed up this entry. I'm going to fork a new one and use this for reference.
This one is going to be more of a dump than an explanation. I wanted to start reading in the policies from the rules-file. I've now got everything in place except for the parsing of the attribiute-expressions. I did it in two passes - I first got my line parsing down pat and then I did the insert into the global policy list: spe_policy_list.
To get my line parsing down, I used extensive fprintf()s until I was satisfied with the result. I avoided the parsing of the attribute-expressions because I really wanted to focus on seeing if the data structures did flow at a high level.
One thing that came out in the unit testing was that I would like a switch to say just verify the rules. I.e., do not stop on the first error. Which is another design question, if we have a file with N policies, and the Kth one was an error, do we treat the entire loading as a transaction or do we just kick the one policy out?
If we treat it as an atomic transaction, then it will be pretty obvious that nothing was loaded. If we only do the ones in error, it can be hard for an admin to discover some policies were not loaded. Also, you can think of policy K+1 as being (if !K && K+1). So without K being loaded, the admin might see behavior not expected.
The problem with it being atomic is that you basically have to load it all into a temporary policy list and then merge it with the global one at the very end. Not too hard to do and perhaps the best approach in any case. Consider the other extreme, you may be replacing policies in the global list and while processing the next one, a request comes in and gets inconsistent results.
Anyway, here is the code fragments that are of interest.
void
spe_insert_policy_list (spe_policy_t *spe)
{
spe_policy_t *s, *p = NULL;
if (!spe) {
return;
}
for (s = spe_policy_list; s; p = s, s = s->next) {
if (s->sp_id > spe->sp_id) {
spe->next = s;
if (p) {
p->next = spe;
} else {
spe_policy_list = spe;
}
return;
}
}
if (p) {
p->next = spe;
} else {
spe_policy_list = spe;
}
}
void
spe_print_policy (spe_policy_t *spe)
{
fprintf(stdout, "%u, %u, %u\n", spe->sp_id,
spe->sp_stripe_count, spe->sp_interlace);
}
void
spe_print_policy_list (void)
{
spe_policy_t *spe;
for (spe = spe_policy_list; spe; spe = spe->next) {
spe_print_policy(spe);
}
}
int
spe_load_policies_from_file (char *szFile)
{
FILE *pf = NULL;
int rc = 0;
char *p, *t;
char *w;
int iTemp;
char *lasts;
char *junk;
char *szLine = NULL;
int iLine = 1;
spe_policy_t *spe = NULL;
if (!szFile) {
fprintf(stderr, "spe_load_policies_from_file: No filename passed in\n");
rc = ENOENT;
goto cleanup;
}
szLine = (char *)malloc(MAXBUFSIZE+1);
if (szLine == NULL) {
fprintf(stderr, "spe_load_policies_from_file: Not enough memory\n");
rc = ENOMEM;
goto cleanup;
}
pf = fopen(szFile, "r");
if (!pf) {
fprintf(stderr, "spe_load_policies_from_file: File %s not present\n",
szFile);
rc = ENOENT;
goto cleanup;
}
/*
* Read a line and try to enter it.
* Do we quit if just one line is in error?
*
* Perhaps we should have a verify mode?
*/
while ((p = fgets(szLine, MAXBUFSIZE, pf))) {
szLine[strlen(szLine) - 1] = '\0';
spe = (spe_policy_t *)calloc(1, sizeof(*spe));
if (!spe) {
fprintf(stderr, "spe_load_policies_from_file: Not enough memory\n");
rc = ENOMEM;
goto cleanup;
}
w = ", ";
t = strtok_r(p, w, &lasts);
if (!t) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - No id for %s\n", iLine, szLine);
rc = EINVAL;
goto cleanup;
}
iTemp = strtol(t, &junk, 10);
if (junk && *junk != '\0') {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - id is non-numeric %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
} else if (iTemp < 0) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - id is out of range %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
}
spe->sp_id = iTemp;
t = strtok_r(NULL, w, &lasts);
if (!t) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - No stripe-count for %s\n", iLine, szLine);
rc = EINVAL;
goto cleanup;
}
iTemp = strtol(t, &junk, 10);
if (junk && *junk != '\0') {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - stripe-count is non-numeric %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
} else if (iTemp < 0) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - stripe-count is out of range %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
}
spe->sp_stripe_count = iTemp;
t = strtok_r(NULL, w, &lasts);
if (!t) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - No interlace for %s\n", iLine, szLine);
rc = EINVAL;
goto cleanup;
}
iTemp = strtol(t, &junk, 10);
if (junk && *junk != '\0') {
switch (*junk) {
/*
* Limit ourselves to believable sizes for now.
*/
case ('k') :
iTemp *= 1000;
break;
case ('m') :
iTemp *= 1000000;
break;
default :
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - interlace is non-numeric %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
}
/*
* Handle stuff like kB...
*/
if (junk[1] != '\0') {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - interlace is non-numeric %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
}
}
if (iTemp < 0) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - stripe-count is out of range %s\n", iLine, t);
rc = EINVAL;
goto cleanup;
}
spe->sp_interlace = iTemp;
w = "";
t = strtok_r(NULL, w, &lasts);
if (!t) {
fprintf(stderr, "spe_load_policies_from_file: Line %d"
" - No attribute-expression for %s\n", iLine, szLine);
rc = EINVAL;
goto cleanup;
}
/*
* Insert and forget.
*/
spe_insert_policy_list(spe);
spe = NULL;
iLine++;
}
cleanup:
if (szLine) {
free(szLine);
}
if (pf) {
fclose(pf);
}
/*
* We only have a valid pointer if it is not
* in the global list.
*/
if (spe) {
spe_free_attr_expr(spe->sp_attr_expr);
free(spe);
}
return (rc);
}
Note that I don't even try to store or print the attribute-expression. And here are some simple test cases and results:
% cat tests/simple.txt 1, 16, 64k, path == /db 2, 24, 32k, base == foo 3, 8, 64, ext == jpg || ext == jpeg 4, 25, 34k, uid == 1066 && ip == 192.168.2.108 5, 32, 16k, ip == 192.168.2.108 6, 40, 8k, network == 192.168.2.0/24 % ./a.out -r tests/simple.txt 1, 16, 64000 2, 24, 32000 3, 8, 64 4, 25, 34000 5, 32, 16000 6, 40, 8000 % cat tests/unordered.txt 2, 24, 32k, base == foo 1, 16, 64k, path == /db 6, 40, 8k, network == 192.168.2.0/24 4, 25, 34k, uid == 1066 && ip == 192.168.2.108 5, 32, 16k, ip == 192.168.2.108 3, 8, 64, ext == jpg || ext == jpeg % ./a.out -r tests/unordered.txt 1, 16, 64000 2, 24, 32000 3, 8, 64 4, 25, 34000 5, 32, 16000 6, 40, 8000
Note that the code is incomplete in that I don't convert the interlace back to the original units. Meh, I can do that later. The important thing is that I am doing list manipulation.
And my head gets too big. I decided to show off how I could delete a single policy by hard-coding the following:
spe_load_policies_from_file(rules);
spe_print_policy_list();
fprintf(stdout, "Delete rule 5\n");
spe_clear_policy_from_list(5);
spe_print_policy_list();
The first 4 lines were in the original. The result is not what I expected:
% ./a.out -r tests/unordered.txt 1, 16, 64000 2, 24, 32000 3, 8, 64 4, 25, 34000 5, 32, 16000 6, 40, 8000 Delete rule 5 6, 40, 8000
What's up? Whoops, I can figure it out by inpsection:
void
spe_clear_policy_from_list (uint_t id)
{
spe_policy_t *spe, *prev = NULL;
for (spe = spe_policy_list; spe != NULL; spe = spe->next) {
if (spe->sp_id == id) {
if (prev) {
prev->next = spe->next;
} else {
spe_policy_list = spe->next;
}
spe_free_attr_expr(spe->sp_attr_expr);
free(spe);
return;
}
}
}
I never advance prev if there is no match! A quick fix
for (spe = spe_policy_list; spe != NULL; prev = spe, spe = spe->next) {
and:
./a.out -r tests/unordered.txt 1, 16, 64000 2, 24, 32000 3, 8, 64 4, 25, 34000 5, 32, 16000 6, 40, 8000 Delete rule 5 1, 16, 64000 2, 24, 32000 3, 8, 64 4, 25, 34000 6, 40, 8000
The lesson is to unit test as you develop. I didn't have the framework earlier to do this type of testing. Now that I do, I need to test as much as possible. I don't show it here, but I have mucked with the rules file to test my parsing. I really need a verify flag to walk down a unit test file with a bunch of errors.
I also need to work on the attribute-expression parser next. I need to be able to unit test the cleanup code. I also need to be able to print the thing back out.
So mountd maps network addresses into unsigned integers. I want to map them back. My first pass at it fails:
% cat a.c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
int
main (int argc, char *argv[])
{
char *p = argv[1];
uint_t addr = 0;
uint_t rev;
int i;
char sep = ' ';
for (i = 0; i < 4; i++) {
addr |= atoi(p) << ((3-i) * 8);
p = strchr(p, '.');
if (p == NULL)
break;
p++;
}
printf("%s yields %u and that reverse maps to ", argv[1], addr);
for (i = 0; i < 4; i++) {
rev = addr >> ((3-i) * 8);
printf("%c%u", sep, rev);
sep = '.';
}
printf("\n");
return (0);
}
% ./a.out 192.168.1.2
192.168.1.2 yields 3232235778 and that reverse maps to 192.49320.12625921.3232235776
We see the idea works in that it gets the first octet correct. If we were to mask each octet again, it would probably also work.
The next simple approach also fails:
printf("%u.%u.%u.%u\n", addr & 0xff000000, addr & 0x00ff0000,
addr & 0x0000ff00, addr & 0x000000ff);
as this shows
./a.out 192.168.1.256 192.168.1.256 yields 3232235776 and that reverse maps to 3221225472.11010048.256.0
This gets us the right number, but it keeps it too far out. We need to get the right number and then shift it back down. (Note my earlier comment about remasking in the first approach):
for (i = 0; i < 4; i++) {
shift = ((3-i) * 8);
rev = (addr & (0xff << shift)) >> shift;
printf("%c%u", sep, rev);
sep = '.';
}
We are close, oh so close:
./a.out 192.168.1.256 192.168.1.256 yields 3232235776 and that reverse maps to 192.168.1.0
Perhaps we are really there, look at the ip I gave, it is invalid. :-> What does valid input yield?
% ./a.out 192.168.1.255 192.168.1.255 yields 3232236031 and that reverse maps to 192.168.1.255 % ./a.out 192.168.1.2 192.168.1.2 yields 3232235778 and that reverse maps to 192.168.1.2 % ./a.out 192.168.1.3 192.168.1.3 yields 3232235779 and that reverse maps to 192.168.1.3 % ./a.out 192.168.1.4 192.168.1.4 yields 3232235780 and that reverse maps to 192.168.1.4 % ./a.out 192.168.1.0 192.168.1.0 yields 3232235776 and that reverse maps to 192.168.1.0
So, even when prototyping, you should be doing some validity checking. I might have spent a chunk of time trying to fix code that was really working. I grabbed that test case from my command line history. It was the first that looked correct. We can add some simple checking in the first loop:
for (i = 0; i < 4; i++) {
t = strtol(p, &junk, 10) << ((3-i) * 8);
if (junk && junk[0] != '.') {
fprintf(stderr, "%s is wrong in the %d octet,"
" %s has non-digits as %s\n", argv[1],
4-i, p, junk);
exit (-1);
} else if (rev > 255 || t < 0) {
fprintf(stderr, "%s is wrong in the %d octet,"
" %d is out of range\n", argv[1],
4-i, t);
exit (-1);
}
addr |= t;
Which yields this incorrect result:
% ./a.out 192.168.1.1 192.168.1.1 is wrong in the 1 octet, -1073741824 is out of range
Took me a while to realize I was shifting too early. This fix:
t = strtol(p, &junk, 10);
if (junk && junk[0] != '.' && junk[0] != '\0') {
...
addr |= t << ((3-i) * 8);
works:
% ./a.out 192.168.1.1 192.168.1.1 yields 3232235777 and that reverse maps to 192.168.1.1 % ./a.out 192.168.1aa.1 192.168.1aa.1 is wrong in the 3 octet, 1aa.1 has non-digits as |aa.1| % ./a.out 192.168.1.256 192.168.1.256 is wrong in the 4 octet, 256 is out of range
Note that I had to change the junk check because strtol(3C) will return a pointer to the EOS. Which I had never hit before in coding - it seems counterintuitive.
In editing this, I noticed that the first approach was actually very close to being correct and a simple change would have sufficed:
for (i = 0; i < 4; i++) {
rev = (addr >> ((3-i) * 8)) & 0xff;
printf("%c%u", sep, rev);
sep = '.';
}
This change makes use of the fact we know exactly which bits we want to mask off. It removes another shifting operation and would be preferred over the other final solution.
So I now have the ability to store the attribute-expressions as parse trees and convert those back to human readable rules. This will allow me to not store the string representation. As a bonus, I've also got most of the network error checking ready to add to the prototype.
I need to finalize the rules-file format and to do that, I want to stabilize the operators in the policy attribute expressions. I.e., I can simply say that the rules-file format is:
id-num, stripe-count, interlace, attribute-expression
And be done with it, but I really want to nail down the attribute-expression first. I've decided to ignore both the component() and regexp handling for now. Hence, this is not the final pass at the expressions. With that in mind, the operators are:
And the built in functions (and only ones) are:
Note that the last 3 are not really functions (since they do not work on input) but really are variants of attributes. I'm going to leave them here in case I ever expand to include variables.
In short, an attribute-expression is a logical expression built on top of the file attributes.
attribute = file | path | uid | user | guid | group | hour | day | weekday | ip
function = extension | base | fqdn | domain | host | is_subnet_of
binary-operator = && | ||
unary-operator = !
test = == | !=
expression = function | attribute test function | attribute
attribute-expression = attribute-expression binary-operator attribute-expression |
'(' attribute-expression ')' |
unary-operator attribute-expression |
expression
I think I captured the recursion correctly.
What am I glossing over here? A clue would be how do we identify machines? E.g., take is_subnet_of(ip, network), what is the format of network? I'm glossing over it here because I think my expression language will end up being more strongly typed than the access lists for shares. As an example, we can only enter a network name or address for is_subnet_of(). We can not enter a domain name prefix. If we want to test against ip, it would be something like ip == 192.168.2.129, whereas to test a name, it would be like host(ip) == fred. Note I want to avoid '"'s wherever possible.
I'm not sure that I am capturing that a big issue for me in this design has been how machine identification is verified, parsed, and handled uniformly. The interplay between the command line, the rules-file, and the mountd implementation has slowed everything down. The strong typing shown above (i.e., the type is known before compilation) really makes this problem look simple. I believe that is in retrospect. One of the functional requirements was to borrow heavily from mountd. It takes a lot of effort sometimes to discard a functional requirement. The playing with code and analysis of mountd code lets us do that in what is still the design phase of the overall project.
My next task is to either write the code to parse and verify the attribute-expressions or to design the data structures needed to store them in memory. Either one can be done first - it is just a personal choice on approach. I'll probably do the data structures first - I like working with parse trees.