My first pass at a Python version of an old Perl script reveals my inner C programmer. I've restricted the program to the simple version, which does not generate the column names as local variables. First I want to get my proof of concept correct:
#!/usr/bin/python
for line in open("resume.txt"):
    line.lstrip()
    if line.startswith("!") or line.startswith("#"): continue
    (started, ended, title, company, description) = line.split(",")
    print "%s - %s: %s for %s\n\t%s\n\n", started, ended, title, company, description
It looks like it will reformat, but I've messed up the print statement:
> ./simple_1.py
%s - %s: %s for %s
%s
1/05 present Staff Engineer Software Sun Microsystems NFS development
%s - %s: %s for %s
%s
6/01 12/05 File System Engineer Network Appliance WAFL and NFS development
%s - %s: %s for %s
%s
4/01 6/01 Manager Network Appliance Manager of Engineering Internal Test
%s - %s: %s for %s
%s
10/99 4/01 System Administrator Network Appliance Perl hacker and filer administrator
I.e., I treated print like a C printf. Okay, I can try again with this one:
#!/usr/bin/python
for line in open("resume.txt"):
    line.lstrip()
    if line.startswith("!") or line.startswith("#"): continue
    (started, ended, title, company, description) = line.split(",")
    print "%s - %s: %s for %s\n\t%s\n\n" % (started, ended, title, company, description)
And get more of what I want to see:
> ./simple.py
1/05 - present: Staff Engineer Software for Sun Microsystems
	NFS development
6/01 - 12/05: File System Engineer for Network Appliance
	WAFL and NFS development
4/01 - 6/01: Manager for Network Appliance
	Manager of Engineering Internal Test
10/99 - 4/01: System Administrator for Network Appliance
	Perl hacker and filer administrator
I'm getting an extra line I don't want, and I have to hard-code the file to process. I can easily fix both:
#!/usr/bin/python
import sys
for line in open(sys.argv[1]):
    line.lstrip()
    if line.startswith("!") or line.startswith("#"): continue
    (started, ended, title, company, description) = line.split(",")
    print "%s - %s: %s for %s\n\t%s\n" % (started, ended, title, company, description)
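Two details in the scripts so far are easy to miss. First, line.lstrip() returns a new string and leaves line itself untouched, so the call has no effect as written. Second, each line read from a file still carries its trailing newline, which ends up in the last field and contributes to the extra blank output. Here is a minimal sketch of one way to handle both. It is written for Python 3 rather than the Python 2 used above, and format_entry is a name made up for illustration:

```python
def format_entry(line):
    """Format one comma-separated resume line, or return None to skip it."""
    # String methods return new strings, so the result must be assigned back.
    # strip() also removes the trailing newline read from the file.
    line = line.strip()
    if not line or line.startswith(("!", "#")):
        return None
    started, ended, title, company, description = line.split(",")
    return "%s - %s: %s for %s\n\t%s\n" % (
        started, ended, title, company, description)


# Sample lines modeled on resume.txt; each ends in "\n" as a file read would.
sample = [
    "!started,ended,title,company,description\n",
    "1/05,present,Staff Engineer Software,Sun Microsystems,NFS development\n",
]
for raw in sample:
    entry = format_entry(raw)
    if entry is not None:
        print(entry)
```

With the newline stripped before the split, the format string alone decides how many blank lines separate the entries.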
Okay, with this simple example, I could get rid of the names in Perl and make it really simple. Can I do so in Python?
#!/usr/bin/python
import sys
for line in open(sys.argv[1]):
    line.lstrip()
    if line.startswith("!") or line.startswith("#"): continue
    print "%s - %s: %s for %s\n\t%s\n\n" % line.split(",")
No, not as I have tried:
> ./simple2.py resume.txt
Traceback (most recent call last):
  File "./simple2.py", line 8, in <module>
    print "%s - %s: %s for %s\n\t%s\n\n" % line.split(",")
TypeError: not enough arguments for format string
I've got a type error, hmm, I'm going to try this by hand:
> python
>>> st1 = "This is the radio clash!"
>>> st1.split()
['This', 'is', 'the', 'radio', 'clash!']
>>>
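So I have a '[]' instead of a '()'. What does that mean? It means I have a list instead of a tuple, and the % operator only spreads a tuple across the format's placeholders; a list counts as a single argument. And I find a converter called, strangely enough, tuple: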
print "%s - %s: %s for %s\n\t%s\n\n" % tuple(line.split(","))
And that works.
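To see the difference in isolation, here is a small sketch in Python 3 (the scripts above are Python 2, but the % operator behaves the same way here). The sample line is copied from resume.txt; the variable names are only for illustration:

```python
fmt = "%s - %s: %s for %s\n\t%s\n"
fields = "1/05,present,Staff Engineer Software,Sun Microsystems,NFS development".split(",")

# A list counts as ONE argument, so four of the five %s have nothing to consume.
try:
    fmt % fields
    list_error = None
except TypeError as err:
    list_error = str(err)

# A tuple is unpacked, one element per placeholder.
formatted = fmt % tuple(fields)

print(list_error)
print(formatted)
```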
The next meeting of the Oklahoma City OpenSolaris Users Group (OKCOSUG) will be on Thursday, October 30th, at Oklahoma City University. Said Syed, Sun Staff Engineer, will be the featured speaker. The meeting will run from 5:30 to 7:30 PM, with light refreshments beginning at 5:00 PM.
Pizza and refreshments will be provided, so please RSVP by pointing your favorite browser to OKCOSUG Event to help us get an estimate of attendance. It will also help speed up the sign-in process.
!! FREE OpenSolaris Back to School Kits CDs and Giveaways. !!
Agenda
And, as always, for the latest updates don't forget to check our web page at http://opensolaris.org/os/project/okcosug/. Our Users Group email is ug-okcosug@opensolaris.org.
Thank you for your time. We look forward to seeing you at this and any future OpenSolaris meetings.
Sincerely,
I've got an old perl script that I have gotten a lot of mileage from:
package read_txtfile_format;
sub main'read_txtfile_format {
    local(*file,*format) = @_;
    local($first_line, $first_char) = '';
    do {
        $first_line = <file>;
        $first_line =~ /(.)(.*)/;
        $first_char = $1;
        $first_line = $2;
    } until ($first_char eq "!" || eof(file));
    if (eof(file)) {
        die "There is no ! header line in $file";
    }
    $format = '$' . join(', $', split(/,/, $first_line));
}
I didn't write it; I think either Mark Lawrence or Walt Gaber did while I was at DRD Corporation, or they got it somewhere. I know they called it a data dictionary, which I'm not sure is how I would use that term these days. By the way, I still have and use my very beaten-up copy of Programming Perl from back then, a 1991 printing.
What it does is allow you to read in another file, generate variable names based on a line starting with a '!' and then use those names per line. It is a cheap database laid out in a flat file.
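For comparison, the same idea can be sketched in Python without eval by keeping each row in a dictionary keyed by the header names instead of creating variables. This is only a rough illustration in Python 3, not a port of the Perl script; read_records and the inline sample data are made up for the example:

```python
def read_records(lines, sep=","):
    """Yield one dict per data line, keyed by the names on the '!' header line."""
    names = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line.startswith("!"):
            names = line[1:].split(sep)   # header: drop the '!' and split
            continue
        if names is None:
            raise ValueError("There is no ! header line before the data")
        yield dict(zip(names, line.split(sep)))


# Inline stand-in for a flat data file.
sample = """\
!started,ended,title,company,description
# most recent first
1/05,present,Staff Engineer Software,Sun Microsystems,NFS development
6/01,12/05,File System Engineer,Network Appliance,WAFL and NFS development
""".splitlines()

for rec in read_records(sample):
    # %(name)s looks each column up by name in the dict.
    print("%(started)s - %(ended)s: %(title)s for %(company)s\n\t%(description)s\n" % rec)
```

The column names live only in the data file; the script refers to them by name when it formats a row.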
I think we called this basic script and its associated text files data dictionaries because we would use it to quickly prototype and change data structures in C. I know I used it in my Genetic Programming research to describe the operators used in a new problem set.
Perhaps an example will show the power.
I want to quickly take my resume and reformat it as needed. Perhaps I need it in html format, a plain text file, etc.
I can keep my data in a file, I can have a skeleton script to process it, and I can quickly change it to adapt to new styles...
I'm picking an example which looks like I'm pimping myself out because I thought it was quirky and fun to code. It is also a use of this piece of code that I never would have thought of before.
!started,ended,title,company,description
1/05,present,Staff Engineer Software,Sun Microsystems,NFS development
6/01,12/05,File System Engineer,Network Appliance,WAFL and NFS development
4/01,6/01,Manager,Network Appliance,Manager of Engineering Internal Test
10/99,4/01,System Administrator,Network Appliance,Perl hacker and filer administrator
#! /usr/bin/perl
do 'getthead.pl';
open(LNG_FILE, $ARGV[0]) || die "Can't open LNG_FILE: $!\n";
# Determine the Column Names
do main'read_txtfile_format(*LNG_FILE, *languages);
lang: while (<LNG_FILE>) {
    next lang if (/^#/ || /^!/);
    eval "($languages) = split(/[,\n]/)";
    print "$started - $ended: $title for $company\n\t$description\n\n";
}
And here I have it dumping out a format much like the resume.txt file I have been updating as I change job functions:
> ./resume.pl resume.txt
1/05 - present: Staff Engineer Software for Sun Microsystems
	NFS development
6/01 - 12/05: File System Engineer for Network Appliance
	WAFL and NFS development
4/01 - 6/01: Manager for Network Appliance
	Manager of Engineering Internal Test
10/99 - 4/01: System Administrator for Network Appliance
	Perl hacker and filer administrator
Of course, I could skip the data dictionary and hard-code the column names in the script:
#! /usr/bin/perl
open(LNG_FILE, $ARGV[0]) || die "Can't open LNG_FILE: $!\n";
lang: while (<LNG_FILE>) {
    next lang if (/^#/ || /^!/);
    ($started, $ended, $title, $company, $description) = split(/[,\n]/);
    print "$started - $ended: $title for $company\n\t$description\n\n";
}
It does the same thing, and with less code.
But it isn't as dynamic. I have to edit both the data file and the script to make a change. If I were to add a new field, location, after company, I would have to change the split in the script. Also, what if I have many scripts manipulating the same data? During my research, I had two different data files and six different scripts per problem set.
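The location case can be sketched in a few lines of Python 3. When fields are looked up by the names on the '!' header line, a new column in the data file leaves an existing report untouched. The parse helper, the TEMPLATE string, and the location value "Anytown" are all invented for this illustration:

```python
def parse(text, sep=","):
    """Return a list of dicts keyed by the '!' header names (hypothetical helper)."""
    names, rows = [], []
    for line in text.splitlines():
        if line.startswith("!"):
            names = line[1:].split(sep)
        elif line and not line.startswith("#"):
            rows.append(dict(zip(names, line.split(sep))))
    return rows


# The report refers to columns by name, never by position.
TEMPLATE = "%(started)s - %(ended)s: %(title)s for %(company)s"

before = (
    "!started,ended,title,company,description\n"
    "1/05,present,Staff Engineer Software,Sun Microsystems,NFS development\n"
)
# Same data with a location column added after company ("Anytown" is a placeholder).
after = (
    "!started,ended,title,company,location,description\n"
    "1/05,present,Staff Engineer Software,Sun Microsystems,Anytown,NFS development\n"
)

line_before = TEMPLATE % parse(before)[0]
line_after = TEMPLATE % parse(after)[0]
print(line_before)
print(line_after)
```

Both print calls produce the same line, whereas a positional split into five names would fail on the six-column file.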
For clique detection, 5-6 lines of data dictionary entries resulted in about 400 lines of C code. For predator/prey, about 50 lines of data dictionary entries resulted in about 890 lines of C code.
An example set of data dictionary entries for the predator/prey would be:
!fId:fSymbol:fType:fArity:fMacro:fDifGen:fChild1:fChild2:fChild3:fChild4:fChild5:fDescription
Agent:Ag:Agent:1:True:False:Agent:NO_CHILD:NO_CHILD:NO_CHILD:NO_CHILD:Returns A's predatorId.
And:&&:Boolean:2:True:False:Boolean:Boolean:NO_CHILD:NO_CHILD:NO_CHILD:A AND B.
CellOf:CellOf:Cell:2:False:False:Agent:Tack:NO_CHILD:NO_CHILD:NO_CHILD:The (X,Y) coordinate of agent A if it moves from its current cell to the one in the Tack B.
(Note I've changed the separator to a ':' for clarity.)
Part of the processing script would be:
# Put in all Caps
( $capId = 'GP_L' . $LBranch . '_F_' . $fId ) =~ tr/a-z/A-Z/;
...
print INI_FP ' /*';
print INI_FP ' * ' . $fDescription;
print INI_FP ' */';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].iId = ' . $capId . ';';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].psSymbol = "' . $fSymbol . '";';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].ftType = ' . $capType . ';';
print INI_FP ' pgps->als[' . $LBranch . '].afs[i].arity = ' . $fArity . ';';
...
And some resulting code would be:
/*
 * Branch 0 - Main language for the system
 */
/*
 * Branch 0 - Functions
 */
pgps->als[0].afs = (FunctionsStruct *)calloc( GP_L0_MAX_FUNCTIONS,
                                              GP_FUNCTIONS_SIZE );
if ( !pgps->als[0].afs ) {
    GU_logError( stderr, "%s(%d): Out of Memory!\n",
                 __FILE__, __LINE__ );
    GU_exit ( -1 );
}
/*
 * Returns A's predatorId.
 */
pgps->als[0].afs[i].iId = GP_L0_F_AGENT;
pgps->als[0].afs[i].psSymbol = "Ag";
pgps->als[0].afs[i].ftType = FT_L0_E_AGENT;
pgps->als[0].afs[i].arity = 1;
pgps->als[0].afs[i].bMacro = TRUE;
pgps->als[0].afs[i].bActive = TRUE;
pgps->als[0].afs[i].bDifGeneric = FALSE;
pgps->als[0].afs[i].pftChildren = (FunctionTypes *)calloc( pgps->als[0].afs[i].arity, FT_TYPES_SIZE );
if ( !pgps->als[0].afs[i].pftChildren ) {
    GU_logError( stderr, "%s(%d): Out of Memory!\n",
                 __FILE__, __LINE__ );
    GU_exit ( -1 );
}
pgps->als[0].afs[i].pftChildren[0] = FT_L0_E_AGENT;
pgps->als[0].afs[i].pFct = gpf_L0_Agent;
i++;
/*
 * A AND B.
 */
pgps->als[0].afs[i].iId = GP_L0_F_AND;
pgps->als[0].afs[i].psSymbol = "&&";
By the way, the same script, f_types.pl, would process all of the language data dictionaries without being modified. If I happened to change the underlying data structures in the C code, i.e., FunctionsStruct, I could change that one script and rebuild all of the different languages.
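To give a feel for that kind of generation step, here is a rough Python 3 sketch that turns one ':'-separated entry into a few of the C assignment lines shown above. It is not f_types.pl; emit_function_init is a made-up name, and it covers only the iId, psSymbol, and arity fields, since the rules for the other fields are not shown in this post:

```python
HEADER = ("!fId:fSymbol:fType:fArity:fMacro:fDifGen:"
          "fChild1:fChild2:fChild3:fChild4:fChild5:fDescription")
ENTRY = ("Agent:Ag:Agent:1:True:False:"
         "Agent:NO_CHILD:NO_CHILD:NO_CHILD:NO_CHILD:Returns A's predatorId.")


def emit_function_init(entry, header=HEADER, branch=0, sep=":"):
    """Return C source lines for one data dictionary entry (illustrative subset)."""
    names = header.lstrip("!").split(sep)
    # Limit the split so a separator inside the last field (the description) survives.
    field = dict(zip(names, entry.split(sep, len(names) - 1)))
    # Same idea as the tr/a-z/A-Z/ line: build the constant name in all caps.
    cap_id = ("GP_L%d_F_%s" % (branch, field["fId"])).upper()
    prefix = "pgps->als[%d].afs[i]" % branch
    return [
        "/*",
        " * " + field["fDescription"],
        " */",
        "%s.iId = %s;" % (prefix, cap_id),
        '%s.psSymbol = "%s";' % (prefix, field["fSymbol"]),
        "%s.arity = %s;" % (prefix, field["fArity"]),
    ]


lines = emit_function_init(ENTRY)
print("\n".join(lines))
```

Changing the shape of the generated C then means editing this one function, while the data dictionary files stay as they are.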
Why the walk down memory lane?
Well, I still use this script. I've used it to do volunteer scheduling at AAAI, generate Java opcodes for a simple JVM implementation, plan a new company, check for sibling conflicts during a recreational soccer season, implement testbeds for QA efforts for multiple companies, etc. I don't have to have a database on my system. I can suck data out of a database on a Windows box, store it in a CSV datafile on OpenSolaris, and play with the data. I don't have to know SQL and/or care too much about the data. I can generate "reports" and such from the CLI.
And it is the power of Perl (well, the eval() it offers) which lets me get away with this. One of the selling points of Perl for me was rapid prototyping, especially with respect to strings. I could have written C programs to do all of this, but why?
If I'm going to learn Python, I need to be able to replace this piece of functionality. Or else I'll be back with Perl before you know it.
And honestly, even if I learn to make Python bark for me, I'll pick up the tool I need when I have to. :->
Well, off to sleep. I'll pick this up tomorrow when I start playing with Python.