I've got an old Perl script that I have gotten a lot of mileage from:
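package read_txtfile_format;

# Read lines until the header (the first line starting with '!'), then
# build a string such as '$started, $ended, ...' from its field names.
sub main'read_txtfile_format {
    local(*file, *format) = @_;
    local($first_line, $first_char) = ('', '');
    do {
        $first_line = <file>;
        $first_line =~ /(.)(.*)/;
        $first_char = $1;
        $first_line = $2;
    } until ($first_char eq "!" || eof(file));
    # Test the character itself, so a header on the last line still counts.
    if ($first_char ne "!") {
        die "There is no ! header line in the input file";
    }
    $format = '$' . join(', $', split(/,/, $first_line));
}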
I didn't write it; I think either Mark Lawrence or Walt Gaber did while I was at DRD Corporation, or they got it somewhere. I know they called it a data dictionary, which I'm not sure is how I would use that term these days. By the way, I still have and use my very beaten-up copy of Programming Perl from back then, a 1991 printing.
What it does is allow you to read in another file, generate variable names based on a line starting with a '!' and then use those names per line. It is a cheap database laid out in a flat file.
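To make the mechanism concrete, here is a minimal, self-contained sketch of the two steps: building the variable list from the '!' line, then binding one record to those names with a string eval. The header and record are inline strings rather than a file, and the strict/warnings pragmas and the our declaration are additions for this sketch only; the original script predates them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline stand-ins for the first two interesting lines of a data file.
my $header = "!started,ended,title,company\n";
my $record = "1/05,present,Staff Engineer Software,Sun Microsystems\n";

# Same regexp as the original: first character, then the rest of the line
# ('.' does not match the newline, so it is dropped).
my ($first_char, $names) = $header =~ /(.)(.*)/;
die "no ! header line\n" unless $first_char eq '!';

# Build the text '$started, $ended, $title, $company'.
my $format = '$' . join(', $', split(/,/, $names));
print "$format\n";

# Declared here only to satisfy strict; the eval assigns to them by name.
our ($started, $ended, $title, $company);

# Bind the record's fields to the named variables.
$_ = $record;
eval "($format) = split(/[,\\n]/)";
die $@ if $@;

print "$title at $company\n";
```

The first print shows the generated string; the second shows that $title and $company were filled in purely from the names in the header line.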
I think we called this basic script and its associated text files data dictionaries because we would use it to quickly prototype and change data structures in C. I know I used it in my Genetic Programming research to describe the operators used in a new problem set.
Perhaps an example will show the power.
I want to quickly take my resume and reformat it as needed. Perhaps I need it in HTML format, as a plain text file, etc.
I can keep my data in a file, I can have a skeleton script to process it, and I can quickly change it to adapt to new styles...
I'm picking an example which looks like I'm pimping myself out because I thought it was quirky and fun to code. It is also not an example I would ever have thought to build with this piece of code.
!started,ended,title,company,description
1/05,present,Staff Engineer Software,Sun Microsystems,NFS development
6/01,12/05,File System Engineer,Network Appliance,WAFL and NFS development
4/01,6/01,Manager,Network Appliance,Manager of Engineering Internal Test
10/99,4/01,System Administrator,Network Appliance,Perl hacker and filer administrator
#! /usr/bin/perl

do 'getthead.pl';

open(LNG_FILE, $ARGV[0]) || die "Can't open LNG_FILE: $!\n";

# Determine the Column Names
do main'read_txtfile_format(*LNG_FILE, *languages);

lang: while (<LNG_FILE>) {
    next lang if (/^#/ || /^!/);
    eval "($languages) = split(/[,\n]/)";
    print "$started - $ended: $title for $company\n\t$description\n\n";
}
And here I have it dumping out a format much like the resume.txt file I have been updating as I change job functions:
> ./resume.pl resume.txt
1/05 - present: Staff Engineer Software for Sun Microsystems
        NFS development

6/01 - 12/05: File System Engineer for Network Appliance
        WAFL and NFS development

4/01 - 6/01: Manager for Network Appliance
        Manager of Engineering Internal Test

10/99 - 4/01: System Administrator for Network Appliance
        Perl hacker and filer administrator
#! /usr/bin/perl

open(LNG_FILE, $ARGV[0]) || die "Can't open LNG_FILE: $!\n";

lang: while (<LNG_FILE>) {
    next lang if (/^#/ || /^!/);
    ($started, $ended, $title, $company, $description) = split(/[,\n]/);
    print "$started - $ended: $title for $company\n\t$description\n\n";
}
This version, with the field names hard-coded in the split, does the same thing, and in less code as well.
But it isn't as dynamic. I have to edit both the data file and the script to make a change. If I were to add a new field, say location after company, I would have to change the split in the script. Also, what if I have many scripts manipulating the same data? During my research, I had two different data files and six different scripts per problem set.
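As a sketch of why the header-driven approach copes with that change, here is a variant that swaps the eval for a hash slice keyed by the header names. This is a different technique from the original script, and the location column and its value are hypothetical, added purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: the resume file with a 'location' column added
# after 'company'. The location value is made up.
my @lines = (
    "# comment lines are skipped\n",
    "!started,ended,title,company,location,description\n",
    "1/05,present,Staff Engineer Software,Sun Microsystems,Anytown,NFS development\n",
);

my @names;      # column names taken from the '!' header line
my @records;    # one hash reference per data line

for my $line (@lines) {
    chomp(my $text = $line);
    next if $text =~ /^#/;
    if ($text =~ /^!(.*)/) {
        @names = split /,/, $1;
        next;
    }
    my %row;
    @row{@names} = split /,/, $text;    # hash slice: names => values
    push @records, \%row;
}

for my $r (@records) {
    print "$r->{title} for $r->{company} ($r->{location})\n";
}
```

The parsing loop never mentions a column by name, so only the final print had to learn about location; a script that does not care about the new field needs no edit at all.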
For clique detection, 5-6 lines of data dictionary entries resulted in about 400 lines of C code. For predator/prey, about 50 lines of data dictionary entries resulted in about 890 lines of C code.
An example set of data dictionary entries for the predator/prey would be:
!fId:fSymbol:fType:fArity:fMacro:fDifGen:fChild1:fChild2:fChild3:fChild4:fChild5:fDescription
Agent:Ag:Agent:1:True:False:Agent:NO_CHILD:NO_CHILD:NO_CHILD:NO_CHILD:Returns A's predatorId.
And:&&:Boolean:2:True:False:Boolean:Boolean:NO_CHILD:NO_CHILD:NO_CHILD:A AND B.
CellOf:CellOf:Cell:2:False:False:Agent:Tack:NO_CHILD:NO_CHILD:NO_CHILD:The (X,Y) coordinate of agent A if it moves from its current cell to the one in the Tack B.
(Note I've changed the separator to a ':' for clarity.)
Part of the processing script would be:
# Put in all Caps
( $capId = 'GP_L' . $LBranch . '_F_' . $fId ) =~ tr/a-z/A-Z/;
...
print INI_FP ' /*';
print INI_FP ' * ' . $fDescription;
print INI_FP ' */';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].iId = ' . $capId . ';';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].psSymbol = "' . $fSymbol . '";';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].ftType = ' . $capType . ';';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].arity = ' . $fArity . ';';
...
And some resulting code would be:
/*
 * Branch 0 - Main language for the system
 */
/*
 * Branch 0 - Functions
 */
pgps->als[0].afs = (FunctionsStruct *)calloc( GP_L0_MAX_FUNCTIONS,
                                              GP_FUNCTIONS_SIZE );
if ( !pgps->als[0].afs ) {
    GU_logError( stderr, "%s(%d): Out of Memory!\n",
                 __FILE__, __LINE__ );
    GU_exit ( -1 );
}
/*
 * Returns A's predatorId.
 */
pgps->als[0].afs[i].iId = GP_L0_F_AGENT;
pgps->als[0].afs[i].psSymbol = "Ag";
pgps->als[0].afs[i].ftType = FT_L0_E_AGENT;
pgps->als[0].afs[i].arity = 1;
pgps->als[0].afs[i].bMacro = TRUE;
pgps->als[0].afs[i].bActive = TRUE;
pgps->als[0].afs[i].bDifGeneric = FALSE;
pgps->als[0].afs[i].pftChildren = (FunctionTypes *)calloc( pgps->als[0].afs[i].arity, FT_TYPES_SIZE );
if ( !pgps->als[0].afs[i].pftChildren ) {
    GU_logError( stderr, "%s(%d): Out of Memory!\n",
                 __FILE__, __LINE__ );
    GU_exit ( -1 );
}
pgps->als[0].afs[i].pftChildren[0] = FT_L0_E_AGENT;
pgps->als[0].afs[i].pFct = gpf_L0_Agent;
i++;
/*
 * A AND B.
 */
pgps->als[0].afs[i].iId = GP_L0_F_AND;
pgps->als[0].afs[i].psSymbol = "&&";
By the way, the same script f_types.pl would process all of the language data dictionaries without being modified. If I happened to change the underlying data structures in the C code, i.e., FunctionsStruct, I could change that one script and rebuild all of the different languages.
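A rough, self-contained sketch of that kind of generator follows. It is not f_types.pl: it keeps only three of the columns, reads the dictionary from an inline list, and uses a hash slice instead of eval. It emits the iId, psSymbol, and arity assignments in the shape shown earlier and skips everything the excerpt elides, such as how $capType is derived.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $LBranch = 0;    # branch number, as in the generated code shown earlier

# Cut-down dictionary: only the columns this sketch uses.
my @dictionary = (
    "!fId:fSymbol:fArity\n",
    "Agent:Ag:1\n",
    "And:&&:2\n",
);

my (@names, @out);
for my $line (@dictionary) {
    chomp(my $text = $line);
    if ($text =~ /^!(.*)/) {
        @names = split /:/, $1;     # column names from the header
        next;
    }
    my %f;
    @f{@names} = split /:/, $text;  # bind this record to the names

    # Upper-cased identifier, e.g. GP_L0_F_AGENT.
    my $capId = uc "GP_L${LBranch}_F_$f{fId}";
    my $slot  = "pgps->als[$LBranch].afs[i]";

    push @out,
        "$slot.iId = $capId;",
        "$slot.psSymbol = \"$f{fSymbol}\";",
        "$slot.arity = $f{fArity};",
        "i++;";
}

print "$_\n" for @out;
```

Changing the layout of the emitted C means touching only the strings pushed onto @out; the dictionary entries stay as they are.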
Why the walk down memory lane?
Well, I still use this script. I've used it to do volunteer scheduling at AAAI, generate Java opcodes for a simple JVM implementation, plan a new company, check for sibling conflicts during a recreational soccer season, implement testbeds for QA efforts for multiple companies, etc. I don't have to have a database on my system. I can suck data out of a database on a Windows box, store it in a CSV datafile on OpenSolaris, and play with the data. I don't have to know SQL and/or care too much about the data. I can generate "reports" and such from the CLI.
And it is the power of Perl (well, the eval() it offers) which lets me get away with this. One of the selling points of Perl for me was rapid prototyping, especially with respect to strings. I could have written C programs to do all of this, but why?
If I'm going to learn Python, I need to be able to replace this piece of functionality. Or else I'll be back with Perl before you know it.
And honestly, even if I learn to make Python bark for me, I'll pick up the tool I need when I have to. :->
Well, off to sleep. I'll pick this up tomorrow when I start playing with Python.
Hi,
So is there a way to post code as comments to your blog without leading whitespace being clobbered?
Regards
Neil
Posted by Neil McCallum on October 07, 2008 at 09:05 PM CDT #
Neil,
Probably not. I'm not always too fond of the comment system we use.
Tom
Posted by Thomas Haynes on October 07, 2008 at 09:18 PM CDT #