Wednesday October 08, 2008
Finally, the Python version of the old Perl script

I played about in the interactive Python shell, trying to understand the data and how to tie it together. I learned the difference between exec and eval in Python. I learned about capturing stdio and stdout for exec, but I couldn't figure out a way to automatically create variables in the proper scope.

I even finally found a good quote on this at http://mail.python.org/pipermail/tutor/2005-January/035253.html:

> This is something I've been trying to figure out for some time.  Is
> there a way in Python to take a string [say something from a
> raw_input] and make that string a variable name?  I want to to this so
> that I can create class instances on-the-fly, using a user-entered
> string as the instance name.

This comes up regularly from beginners and is nearly always a bad
idea!

The easy solution is to use a dictionary to store the instances.
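
As a minimal sketch of that advice (Python 3 syntax; the Job class and the sample names are made up for illustration and are not part of my script), a dictionary keyed by the user-entered string can stand in for dynamically named variables:

```python
class Job:
    """Hypothetical class standing in for whatever is being instantiated."""

    def __init__(self, title):
        self.title = title


# Map user-supplied names to instances instead of creating variables.
instances = {}

# In real use the names might come from input(); they are hard-coded here.
for name, title in [("first", "Manager"), ("second", "System Administrator")]:
    instances[name] = Job(title)

# Look an instance up by the string that named it.
print(instances["second"].title)
```

The string never has to become a variable name; it only has to be a dictionary key.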

Nice to know I'm not the first to want to do this. But it did get me thinking: I have been calling this set of Perl scripts 'data dictionaries' for longer than I care to remember, and the code is not very legible at times. So I decided to redo the script as:

#!/usr/bin/python

import sys

first_line = True

lang = []
iCounter = 0
for line in open(sys.argv[1]):
        line2 = line.lstrip()
        iCounter += 1

        if line2.startswith("!") or line2.startswith("#"):
                if first_line:
                        lang = line2[1:].split(",")
                        first_line = False
                continue
        splity = line2.split(",")
        dtemp = {}

        if len(splity) != len(lang):
                print "Error - args do not match header on line %d" % (iCounter)
                continue

        for i in range(len(splity)):
                dtemp[lang[i]] = splity[i]

        print "%s - %s: %s for %s\n\t%s\n" % (
                dtemp['started'],
                dtemp['ended'],
                dtemp['title'],
                dtemp['company'],
                dtemp['description'])

dtemp['started'] is more verbose than $started, but it is clearer how I am generating the data. And I have more error checking (which I have yet to sanity check :->).

Anyway, this fails and I knew why almost right off the bat:

> ./r3.py r2.txt
Traceback (most recent call last):
  File "./r3.py", line 33, in <module>
    dtemp['description'])
KeyError: 'description'

I was suspicious about that extra newline I mentioned way back in "The simple version of the old perl script". I suspected that the line still had a trailing one that I needed to remove, i.e., the data dictionary has a key for 'description\n' and not 'description'.

The following change proved that:

for line in open(sys.argv[1]):
        line1 = line.lstrip()
        line2 = line1.rstrip()
        iCounter += 1
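
The effect is easy to reproduce in isolation. Here is a minimal sketch in Python 3 syntax; the header line is made up, and I'm assuming it has this shape, with names that mirror the keys used above:

```python
# A made-up header line in the same shape as the flat file's first line.
header = "!started,ended,title,company,description\n"

# lstrip() only removes leading whitespace, so the newline survives
# and ends up glued to the last column name.
keys_lstrip = header.lstrip()[1:].split(",")

# strip() removes whitespace at both ends, including the trailing newline.
keys_strip = header.strip()[1:].split(",")

print(repr(keys_lstrip[-1]))  # 'description\n'
print(repr(keys_strip[-1]))   # 'description'
```

With the newline still attached, a lookup of 'description' has nothing to match, which is the KeyError above.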

And some quick sanity checking of removing a column in one row and adding one in another row shows that my error checking works:

> ./r3.py r3.txt
Error - args do not match header on line 2
Error - args do not match header on line 3
4/01 - 6/01: Manager for Network Appliance
        Manager of Engineering Internal Test

10/99 - 4/01: System Administrator for Network Appliance
        Perl hacker and filer administrator

So I learned what I set out to do. I may never use this script, but it helped me learn some things the hard way. I didn't show all of the little syntax errors I had to fix (forgetting the ':', not indenting in the interactive shell, etc.). But hopefully, I'll remember them.

I'll also claim that the script does meet my needs as did the old one. If I add a new field to the flat file, I won't have to change the script to get the current output! And yes, I just tried that and I didn't have a problem.
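
To illustrate why an extra column is harmless, here is a rough sketch in Python 3 syntax. The parse_records helper, the 'location' column, and the sample data are all made up for this example, and unlike my script it skips mismatched lines silently instead of printing an error:

```python
def parse_records(lines):
    """Yield one dict per data line, keyed by the names in the header line."""
    fields = None
    for line in lines:
        line = line.strip()
        if line.startswith(("!", "#")):
            # Only the first marker line is treated as the header.
            if fields is None:
                fields = line[1:].split(",")
            continue
        values = line.split(",")
        if fields is None or len(values) != len(fields):
            continue  # skip lines that do not match the header
        yield dict(zip(fields, values))


# Made-up sample with an extra 'location' column added to the flat file.
sample = [
    "!started,ended,title,company,description,location",
    "4/01,6/01,Manager,Network Appliance,Manager of Engineering Internal Test,RTP",
]

for record in parse_records(sample):
    # The format string only names the keys it needs; extras are ignored.
    print("%(started)s - %(ended)s: %(title)s for %(company)s\n\t%(description)s\n" % record)
```

Because every value is looked up by name rather than by position, the new column just becomes one more key that the print never asks for.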

I could do some more error checking (i.e., don't access an entry unless it is set), but I've already gone above the error checking in the Perl script.
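
If I did add that check, one possible approach (a sketch only, in Python 3 syntax, with a made-up record and a placeholder string I picked arbitrarily) would be dict.get with a fallback value:

```python
# A made-up record that is missing its 'description' entry.
record = {
    "started": "10/99",
    "ended": "4/01",
    "title": "System Administrator",
    "company": "Network Appliance",
}

wanted = ["started", "ended", "title", "company", "description"]

# dict.get returns the fallback instead of raising KeyError.
values = tuple(record.get(key, "(not set)") for key in wanted)

print("%s - %s: %s for %s\n\t%s\n" % values)
```

A missing field then shows up as the placeholder in the output rather than as a traceback.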

Final Copy

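#!/usr/bin/python

import sys

# True until the header line has been read
first_line = True

lang = []       # column names taken from the header line
iCounter = 0    # current line number, used in error messages
for line in open(sys.argv[1]):
        # Strip leading and trailing whitespace, including the newline
        line1 = line.lstrip()
        line2 = line1.rstrip()
        iCounter += 1

        # The first line starting with "!" or "#" is the header;
        # any later ones are skipped as comments
        if line2.startswith("!") or line2.startswith("#"):
                if first_line:
                        lang = line2[1:].split(",")
                        first_line = False
                continue
        splity = line2.split(",")
        dtemp = {}

        # Every data line must have one value per header column
        if len(splity) != len(lang):
                print "Error - args do not match header on line %d" % (iCounter)
                continue

        # Build the data dictionary: column name -> value
        for i in range(len(splity)):
                dtemp[lang[i]] = splity[i]

        print "%s - %s: %s for %s\n\t%s\n" % (
                dtemp['started'],
                dtemp['ended'],
                dtemp['title'],
                dtemp['company'],
                dtemp['description'])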

Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/finally_the_python_version_of
Comments:

...
for line in open(sys.argv[1]):
        line = line.strip()
        iCounter += 1
...

Posted by Anonymous on October 08, 2008 at 06:45 AM CDT #

for i in range(len(splity)):
        dtemp[lang[i]] = splity[i]

could also be

for index, value in enumerate(splity):
        dtemp[lang[index]] = value

I think this would be considered more pythonic, as it avoids iterating on an integer and iterates on the data instead.

Posted by Neil McCallum on October 15, 2008 at 04:49 PM CDT #
