Wednesday October 08, 2008
A reader suggestion on how to solve the Perl script

Neil doesn't like that our comment section wipes out whitespace. His concern is certainly valid when it comes to the way Python uses indentation.

He suggested the following implementation:

#!/usr/bin/env python

import csv

def main(dfile,format,delimiter=","):
        db=open(dfile,'U')
        start=0
        for line in db:
                if line.startswith(format):
                        db.seek(start+len(format))
                        return csv.DictReader(db,delimiter=delimiter)
                else:
                        start+=len(line)+(len(db.newlines)==2) #windows hackery
        raise ValueError("There is no %s header line in %s" % (format,dfile))


if __name__ == "__main__":
        for row in main('data.txt','!'):
                print "%s - %s: %s for %s\n\t%s\n\n" % \
                                tuple([row[column] for column in ['started','ended','title','company','jobdesc']])

And he provided the following note:

So what about something like this?

The csv module should take care of delimiters within columns
Simplification is possible if you don't need to deal with Windows or Unix style line terminators
Changing delimiters is easy too.
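
Neil's first and third points are easy to check with a small sketch. This one is written for Python 3 (the script above is Python 2), the sample rows are made up for illustration, and `read_rows` is a hypothetical helper rather than anything from his script:

```python
import csv
import io

# Hypothetical sample data: the company field contains the delimiter,
# so the comma version wraps it in quotes the way csv writers emit it.
comma_data = 'started,ended,company\n1/05,present,"Sun Microsystems, Inc."\n'
pipe_data = 'started|ended|company\n1/05|present|Sun Microsystems, Inc.\n'

def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

comma_rows = read_rows(comma_data)
pipe_rows = read_rows(pipe_data, delimiter="|")

# Both parses keep the embedded comma inside the company column.
print(comma_rows[0]["company"])
print(pipe_rows[0]["company"])
```

Switching the separator only touches the `delimiter` argument, which is the portability win of making it a parameter.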

I like that he caught on to making the separator an argument; it makes the code much more portable. I'm not sure it is as robust with respect to error handling, but in all fairness that could easily be added, and I only added those checks to the Perl script after posting it. Oh, and it does easily handle the addition of a new column in the data file.

I like the use of raise; I'm certainly not used to exception handlers any more.
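
One thing worth knowing about raise: Python wants an exception class or instance there, since string exceptions were deprecated in Python 2.5 and dropped afterward. Here is a minimal sketch of raising and trapping such an error, assuming Python 3. The names `HeaderNotFound`, `find_header`, and `describe` are hypothetical and not part of Neil's script:

```python
class HeaderNotFound(ValueError):
    """Hypothetical error type for a data file with no header line."""

def find_header(lines, marker="!"):
    """Return the index of the first line that starts with marker."""
    for index, line in enumerate(lines):
        if line.startswith(marker):
            return index
    raise HeaderNotFound("There is no %s header line" % marker)

def describe(lines):
    """Show how a caller can trap the error instead of crashing."""
    try:
        return "header on line %d" % find_header(lines)
    except HeaderNotFound as err:
        return "bad file: %s" % err

print(describe(["# comment", "!started,ended", "1/05,present"]))
print(describe(["# comment", "1/05,present"]))
```

Subclassing `ValueError` means callers that already catch `ValueError` still see the failure, while new code can catch the narrower type.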

I can see part of what is going on here:

>>> for row in neil.main("r4.txt",'!'):
...     print row
...
{'description': 'NFS development', 'title': 'Staff Engineer Software', 'started': '1/05', 'company': 'Sun Microsystems', 'ended': 'present', 'mad': 'money'}
{'description': 'WAFL and NFS development', 'title': 'File System Engineer', 'started': '6/01', 'company': 'Network Appliance', 'ended': '12/05', 'mad': 'honey'}
{'description': 'Manager of Engineering Internal Test', 'title': 'Manager', 'started': '4/01', 'company': 'Network Appliance', 'ended': '6/01', 'mad': 'scot'}
{'description': 'Perl hacker and filer administrator', 'title': 'System Administrator', 'started': '10/99', 'company': 'Network Appliance', 'ended': '4/01', 'mad': 'dam'}

And I think the stuff with 'start' is what gets over the '!' in the first line?

>>> import csv
>>> help(csv.DictReader)

>>> for row in csv.DictReader(file("r4.txt")):
...     print row
...
{'!started': '1/05', 'description': 'NFS development', 'title': 'Staff Engineer Software', 'company': 'Sun Microsystems', 'ended': 'present', 'mad': 'money'}
{'!started': '6/01', 'description': 'WAFL and NFS development', 'title': 'File System Engineer', 'company': 'Network Appliance', 'ended': '12/05', 'mad': 'honey'}
{'!started': '4/01', 'description': 'Manager of Engineering Internal Test', 'title': 'Manager', 'company': 'Network Appliance', 'ended': '6/01', 'mad': 'scot'}
{'!started': '10/99', 'description': 'Perl hacker and filer administrator', 'title': 'System Administrator', 'company': 'Network Appliance', 'ended': '4/01', 'mad': 'dam'}

But I haven't figured out yet how the result is being built up. Okay, yes I have. I was fixated on the 'if' and 'else', thinking they were handling the header line versus the data lines. But no, all the loop does is get you to the header line (i.e., there are comments in the file); then the 'db.seek' moves to the start of the header line plus 1 (via 'len(format)') to skip the format character. Then, just as in my interactive example, 'csv.DictReader' does the magic for you!
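
To convince myself of the offset arithmetic, here is a minimal sketch of the same loop, assuming Python 3 and an in-memory file with made-up contents. Because `io.StringIO` counts positions in characters, the Windows line-ending adjustment is not needed here; a real file on disk may behave differently. The name `open_at_header` is hypothetical:

```python
import csv
import io

# Hypothetical stand-in for the data file: two comment lines, then the
# header line marked with '!', then the data rows.
sample = (
    "# job history\n"
    "# exported by hand\n"
    "!started,ended,title\n"
    "1/05,present,Staff Engineer Software\n"
    "6/01,12/05,File System Engineer\n"
)

def open_at_header(db, marker="!"):
    """Position db just past the marker and hand it to csv.DictReader."""
    start = 0
    for line in db:
        if line.startswith(marker):
            # start is the offset of the header line; adding the marker
            # length lands on the first character of the first column name.
            db.seek(start + len(marker))
            return csv.DictReader(db)
        start += len(line)  # offset of the next line
    raise ValueError("There is no %s header line" % marker)

reader = open_at_header(io.StringIO(sample))
rows = list(reader)
print(reader.fieldnames)
print(rows[0]["started"], rows[0]["title"])
```

The comment lines only ever advance `start`; nothing is parsed until `csv.DictReader` takes over from the repositioned file.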

Sweet, Neil's code does what I had the Perl script doing!

It also shows I'm not yet used to the Python way of doing things. But it was fun to figure out what his script was doing!


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/a_reader_suggestion_on_how
Comments:

try

template = """%(started)s - %(ended)s: %(title)s for %(company)s
%(jobdesc)s"""

print template % row

Posted by Justin on October 08, 2008 at 08:37 PM CDT #
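
For anyone following along, here is a minimal sketch of what dictionary-based formatting does, assuming Python 3. The row is made up, shaped like the dicts that csv.DictReader yields, and the tab before the description mirrors the print statement in the post:

```python
# Hypothetical row, shaped like the dicts that csv.DictReader yields.
row = {
    "started": "1/05",
    "ended": "present",
    "title": "Staff Engineer Software",
    "company": "Sun Microsystems",
    "jobdesc": "NFS development",
}

# Each %(name)s placeholder is filled from the matching dictionary key.
template = """%(started)s - %(ended)s: %(title)s for %(company)s
\t%(jobdesc)s"""

formatted = template % row
print(formatted)
```

The column names live in the template itself, so there is no separate list of columns to keep in sync with the format string.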

I think it's also worth noting that if you remove the ! from the first line, you could just use csv.DictReader directly.

Posted by Justin on October 09, 2008 at 09:20 AM CDT #

Yes, but I've abstracted the original data files, which would have comment lines describing the data.

The original Perl code could have been simplified or built on other packages as well if I had changed the specifications of the data files.

Thanks again for the contribution.

Posted by Thomas Haynes on October 10, 2008 at 03:18 PM CDT #
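
A middle ground that keeps the comment lines in the data file is to filter the lines before csv.DictReader sees them. This is a different technique from the seek-based script in the post, sketched here for Python 3 with made-up data; `data_lines` is a hypothetical helper:

```python
import csv

def data_lines(lines, marker="!"):
    """Yield the header line (minus marker) and every line after it."""
    found = False
    for line in lines:
        if found:
            yield line
        elif line.startswith(marker):
            found = True
            # Drop the marker so the first column name comes out clean.
            yield line[len(marker):]

# Hypothetical file contents; an open file object would work the same way.
sample_lines = [
    "# comment describing the data\n",
    "!started,ended,company\n",
    "1/05,present,Sun Microsystems\n",
]

records = list(csv.DictReader(data_lines(sample_lines)))
print(records[0]["started"], records[0]["company"])
```

This works because csv.DictReader accepts any iterable of strings, so no offsets or line-ending adjustments are involved.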
