Neil doesn't like that our comment section wipes out whitespace. His concern is certainly valid when it comes to the way Python uses indentation.
He suggested the following implementation:
#!/usr/bin/env python
import csv
def main(dfile,format,delimiter=","):
    db=open(dfile,'U')
    start=0
    for line in db:
        if line.startswith(format):
            db.seek(start+len(format))
            return csv.DictReader(db,delimiter=delimiter)
        else:
            start+=len(line)+(len(db.newlines)==2) #windows hackery
    raise ValueError("There is no %s header line in %s" % (format,dfile))
if __name__ == "__main__":
    for row in main('data.txt','!'):
        print "%s - %s: %s for %s\n\t%s\n\n" % \
            tuple([row[column] for column in ['started','ended','title','company','jobdesc']])
And he provided the following note:
So what about something like this? The csv module should take care of delimiters within columns. Simplification is possible if you don't need to deal with Windows or Unix style line terminators. Changing delimiters is easy too.
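Neil's point about delimiters inside columns is easy to see with a quoted field. The sketch below is written for Python 3 (unlike the Python 2 scripts in this post) and uses made-up sample data; it compares a naive str.split(",") with csv.reader, and shows that the delimiter is just a keyword argument.

```python
import csv
import io

# Hypothetical sample data: the title field contains a comma, so it is quoted.
sample = 'started,ended,title\n1/05,present,"Engineer, Staff"\n'

# Naive splitting breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(",")

# csv.reader honors the quotes and keeps the field intact.
rows = list(csv.reader(io.StringIO(sample)))

print(naive)    # ['1/05', 'present', '"Engineer', ' Staff"']
print(rows[1])  # ['1/05', 'present', 'Engineer, Staff']

# Changing the delimiter is just a parameter.
semi = list(csv.reader(io.StringIO('a;"b;c"\n'), delimiter=";"))
print(semi[0])  # ['a', 'b;c']
```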
I like that he caught on to making the separator an argument. It makes the code much more portable. I'm not sure it is as robust with respect to error handling, but in all fairness that could easily be handled and I did add those after posting the Perl script. Oh, and it does easily handle the addition of a new column in the data file.
I like the use of raise; I'm certainly not used to exception handlers anymore.
I can see part of what is going on here:
>>> for row in neil.main("r4.txt",'!'):
...     print row
...
{'description': 'NFS development', 'title': 'Staff Engineer Software', 'started': '1/05', 'company': 'Sun Microsystems', 'ended': 'present', 'mad': 'money'}
{'description': 'WAFL and NFS development', 'title': 'File System Engineer', 'started': '6/01', 'company': 'Network Appliance', 'ended': '12/05', 'mad': 'honey'}
{'description': 'Manager of Engineering Internal Test', 'title': 'Manager', 'started': '4/01', 'company': 'Network Appliance', 'ended': '6/01', 'mad': 'scot'}
{'description': 'Perl hacker and filer administrator', 'title': 'System Administrator', 'started': '10/99', 'company': 'Network Appliance', 'ended': '4/01', 'mad': 'dam'}
And I think the stuff with 'start' is what gets past the '!' in the first line???
>>> import csv
>>> help(csv.DictReader)
>>> for row in csv.DictReader(file("r4.txt")):
...     print row
...
{'!started': '1/05', 'description': 'NFS development', 'title': 'Staff Engineer Software', 'company': 'Sun Microsystems', 'ended': 'present', 'mad': 'money'}
{'!started': '6/01', 'description': 'WAFL and NFS development', 'title': 'File System Engineer', 'company': 'Network Appliance', 'ended': '12/05', 'mad': 'honey'}
{'!started': '4/01', 'description': 'Manager of Engineering Internal Test', 'title': 'Manager', 'company': 'Network Appliance', 'ended': '6/01', 'mad': 'scot'}
{'!started': '10/99', 'description': 'Perl hacker and filer administrator', 'title': 'System Administrator', 'company': 'Network Appliance', 'ended': '4/01', 'mad': 'dam'}
But I haven't figured out yet how the result is being built up. Okay, yes I have. I was fixated on the 'if' and 'else', thinking they were handling the header line versus the data lines. But no, all the loop does is get you to the header line (i.e., past any comments in the file), and then 'db.seek' puts you at the start of the header line plus 'len(format)', which skips over the format character. Then, just as in my interactive example, 'csv.DictReader' does the magic for you!
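The same idea can be sketched without the seek() arithmetic, which also sidesteps the Windows line-ending hackery. This is my own Python 3 variation, not Neil's code; the function name read_with_marker and the sample data are hypothetical. It walks the lines until it sees the marker, strips the marker from that line, and hands the header plus the remaining lines to csv.DictReader, which accepts any iterable of strings.

```python
import csv
import io
import itertools

def read_with_marker(lines, marker="!", delimiter=","):
    """Hypothetical helper: return a DictReader over the rows after the first
    line that starts with `marker`; earlier lines are treated as comments."""
    it = iter(lines)
    for line in it:
        if line.startswith(marker):
            # Drop the marker, then put the header back in front of the rest.
            header = line[len(marker):]
            return csv.DictReader(itertools.chain([header], it),
                                  delimiter=delimiter)
    # Python 3 needs an exception object here; string exceptions are gone.
    raise ValueError("There is no %s header line" % marker)

data = io.StringIO(
    "# a comment line\n"
    "!started,ended,title\n"
    "1/05,present,Staff Engineer Software\n"
)
rows = list(read_with_marker(data))
print(rows[0]["started"], rows[0]["title"])
```

An open file works the same way as the StringIO here; the csv documentation recommends opening it with newline=''.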
Sweet, Neil's code does what I had the Perl script doing!
It also shows I'm not used to all of the Python way of doing things. But it was fun to figure out what his script was doing!
I need to do a branch merge between nfs41-gate and onnv-clone. Specifically, I don't want the 'tip', but rather the tag for release 100. I found a good reference: Chapter 8, Managing releases and branchy development.
So I'll follow along with it. I need the tag:
[th199096@jhereg onnv-play]> hg tags
tip                            7782:716c23b2ce2e
onnv_100                       7757:bf4a45ecb669
onnv_99                        7613:e49de7ec7617
onnv_98                        7473:fad192e9bc57
It turns out I don't need much more:
[th199096@jhereg nfs41-100]> hg reparent ssh://onnv.eng//export/onnv-clone
[th199096@jhereg nfs41-100]> hg tags | more
tip                            7744:763bfa203d1a
closedv1                       7742:9fab48a31a4a
onnv_99                        7652:e49de7ec7617
So I haven't merged yet:
[th199096@jhereg nfs41-100]> hg pull -u -r onnv_100
pulling from ssh://onnv.eng//export/onnv-clone
searching for changes
adding changesets
adding manifests
adding file changes
added 64 changesets with 475 changes to 462 files (+1 heads)
not updating, since new heads added
(run 'hg heads' to see heads, 'hg merge' to merge)
The tag can be used as a revision!
With the introduction of Mercurial, we have a need to keep our tools directory up to date. We could simply NFS mount the one in Menlo Park, but for WAN and build performance, that sucks. So, the Austin Labs have a local copy. And it is not being kept up to date. We've all been bitten by an old copy of the BFU script.
To get around this, we've built our own local repository and made sure that our paths all take this into account. Well, that just failed for me:
[th199096@jhereg mms]> hg outgoing -v
running ssh onnv.eng "hg -R /export/onnv-clone serve --stdio"
comparing with ssh://onnv.eng//export/onnv-clone
searching for changes
abort: style not found: /ws/onnv-tools/onbld/etc/hgstyle
I know the 'hgstyle' stuff is new, I saw Flag Day info on it. And sure enough:
[th199096@jhereg mms]> df -k /ws/onnv-tools/onbld/etc
Filesystem            kbytes    used   avail capacity  Mounted on
mool-ha1-nfs.central:/export/ds01/d531/tools/01/elpaso.eng/opt/onbld
                     140454588 109105801 29944242    79%    /ws/onnv-tools/onbld

I don't want to hack on the script, which I think shouldn't be using the full path. So I'll have to change where I'm getting my copy of the tools in /ws.
Okay, I don't have permissions on the NIS server, but I can get the map:
[th199096@jhereg ~]> ypcat -k auto.ws | grep onnv-tool
onnv-tools /SUNWspro -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/slug-17.eng/export/$CPU/opt/SUNWspro /teamware -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/slug-17.eng/export/$CPU/opt/SUNWspro/SOS8 /onbld -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/elpaso.eng/opt/onbld
And I can add it to my local /etc/auto_ws:
#
# Local copies of /ws workspaces
#
# For /ws/on10-clone use:
# /ws/on10-patch-clone-auspen or on10-feature-clone-auspen
#
on10-clone-aus          iquad:/pool/ws/on10-clone
on10-patch-clone-aus    iquad:/pool/ws/on10-patch-clone
onnv-clone-aus          iquad:/pool/ws/onnv-clone
on10-test-aus           iquad:/pool/ws/on10-test
onnv-test-aus           iquad:/pool/ws/onnv-test
onnv-stc2-aus           iquad:/pool/ws/onnv-stc2
on10-tools-aus  -ro     iquad:/pool/ws/on10-tools-$CPU
onnv-tools-aus  -ro     aus1500-home:/pool/ws/onnv-tools-$CPU
onnv-tools      /SUNWspro -ro /opt/SUNWspro /teamware -ro /opt/SUNWspro/SOS8 /onbld -ro /opt/onbld
And no go:
[th199096@jhereg /etc]> sudo svcadm restart autofs
...
[th199096@jhereg th199096]> ls -la /ws/onnv-tools
/ws/onnv-tools: Permission denied
total 1
[th199096@jhereg th199096]> dmesg
...
Oct 8 16:54:23 jhereg automountd[883428]: [ID 406441 daemon.error] parse_entry: mapentry parse error: map=auto_ws key=onnv-tools
Oct 8 16:55:55 jhereg automountd[883477]: [ID 406441 daemon.error] parse_entry: mapentry parse error: map=auto_ws key=onnv-tools
I turn spaces into tabs; no luck. I check other machines, and they do the hierarchy locally for other things. So I convert the pathnames from /opt/SUNWspro to localhost:/opt/SUNWspro, and that does the trick:
[th199096@jhereg th199096]> ls -la /ws/onnv-tools
total 5
dr-xr-xr-x 4 root root 4 Oct 8 17:04 .
dr-xr-xr-x 2 root root 2 Oct 8 17:04 ..
dr-xr-xr-x 1 root root 1 Oct 8 17:04 SUNWspro
dr-xr-xr-x 1 root root 1 Oct 8 17:04 onbld
dr-xr-xr-x 1 root root 1 Oct 8 17:04 teamware
I probably need to put a real fix into our jumpstart servers and make the path dependent on $CPU, but I was in the middle of something else when this happened.
I'm the volunteer webmaster for my son's soccer club: Blitz United Soccer Club. We occasionally get logos and such from sponsors. We want jpeg images for the website and they want high quality pdf for printing. Until now, I've simply asked them for the images in a format we can handle.
I got tired of doing that and googled 'pdf to jpg'. Most of the hits were sites that either wanted to install something on my Windows box or wanted my email address. I added 'linux' to my search and found a nice hit: Batch converting PDF to JPG/JPEG using free software.
Having heard of ImageMagick vaguely in the past, and since they had many download sites, I installed it on my WinXP desktop. And it didn't convert for me:
C:\Documents and Settings\thud\Desktop\Downloads\97red>convert cooper.pdf cooper.jpg
convert: `%s': %s "gswin32c.exe" -q -dQUIET -dPARANOIDSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dEPSCrop -dAlignToPixels=0 -dGridFitTT=0 "-sDEVICE=pnmraw" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=C:/DOCUME~1/thud/LOCALS~1/Temp/magick-UtqkGDcw" "-fC:/DOCUME~1/thud/LOCALS~1/Temp/magick-MpE4YxWI" "-fC:/DOCUME~1/thud/LOCALS~1/Temp/magick-z6ByBicB".
convert: Postscript delegate failed `cooper.pdf': No such file or directory.
convert: missing an image filename `cooper.jpg'.
Well, I solved that fairly quickly by switching to my Fedora box:
[thud@adept ~/tmp]> sudo yum install ImageMagick
Setting up Install Process
Parsing package install arguments
Package ImageMagick-6.3.5.9-1.fc8.i386 already installed and latest version
Nothing to do
[thud@adept ~/tmp]> convert -density 600 cooper.pdf cooper.jpg
Which is probably what I should have tried in the first place.
Helen Chao, a colleague who had never really used Linux, asked me to help configure a kernel. I asked why, and she said she needed to test RDMA over NFSv4. It turns out that the stock 2.6.25 kernel with Fedora Core 9 already had the support in it. We followed the directions in nfs-rdma.txt but were not able to get it running.
Helen (a great test engineer) proceeded to investigate from there and couldn't get a simple loopback or NFS mount to succeed.
So I exported the root to all hosts and went to work debugging this issue. A 'rpcinfo -p' on the server showed the expected registered services. The same call from a client failed, but a ping worked:
[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> rpcinfo -p pnfs-9-30
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
^C
[th199096@jhereg ~]> sudo mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
^C
[th199096@jhereg ~]> ping pnfs-9-30
pnfs-9-30 is alive
I thought that perhaps it was a firewall issue and disabled IPTABLES.
No luck, and I knew the mount should succeed: I tried it with my home Core 8 box and an OpenSolaris server, and it worked. But then again, that Linux box has been configured for ages. Long story short, I asked Chuck Lever for help.
His only suggestion was to turn off selinux or as he puts it:
Also disable selinux, just so your systems behave like normal Unix.
So I followed the directions I found here: How to Disable SELinux and now the mount works:
# mount -o vers=3 pnfs-9-30:/ /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: retrying: /mnt
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: pnfs-9-30: : RPC: Rpcbind failure - RPC: Timed out
nfs mount: /mnt: mounted OK
#
Most of the help I found with google on the RPC messages wasn't informative. Either the suggestion was to turn off IPTABLES or there was no reply.
I played about in the interactive Python shell trying to understand the data and how to tie it together. I learned about the difference between exec and eval for Python. I learned about capturing stdio and stdout for exec, but I couldn't figure out a way to automatically create variables in the proper scope in Python.
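For the record, here is the exec/eval distinction in a small sketch. It is written for Python 3, where exec is a function rather than a statement, and the variable names are just examples. It also hints at the answer quoted below: names assigned by exec end up in whatever dictionary you hand it.

```python
# eval() evaluates a single expression and returns its value.
total = eval("2 + 3")

# exec() runs statements and returns None; the names it assigns land in
# the namespace dictionary passed to it, not in our local variables.
namespace = {}
exec("started = '1/05'\nended = 'present'", namespace)

print(total)                 # 5
print(namespace["started"])  # 1/05
```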
I even finally found a good quote on this at http://mail.python.org/pipermail/tutor/2005-January/035253.html:
> This is something I've been trying to figure out for some time. Is
> there a way in Python to take a string [say something from a
> raw_input] and make that string a variable name? I want to to this so
> that I can create class instances on-the-fly, using a user-entered
> string as the instance name.

This comes up regularly from beginners and is nearly always a bad idea!
The easy solution is to use a dictionary to store the instances.
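A minimal Python 3 sketch of that advice; the Job class and the names are hypothetical stand-ins for the class instances in the question.

```python
class Job:
    """Hypothetical class standing in for the 'class instances' in the quote."""
    def __init__(self, title):
        self.title = title

# Instead of turning a user-supplied string into a variable name,
# use the string as a dictionary key.
instances = {}
for name in ["sun", "netapp"]:  # e.g. names a user typed in
    instances[name] = Job(name.upper())

print(instances["netapp"].title)  # NETAPP
```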
Nice to know I'm not the first to want to do this. But it got me thinking: I have been calling this set of Perl scripts 'data dictionaries' for longer than I care to remember, and the code is not very legible at times. So, I decided to redo the script as:
#!/usr/bin/python
import sys
first_line = True
lang = []
iCounter = 0
for line in open(sys.argv[1]):
    line2 = line.lstrip()
    iCounter += 1
    if line2.startswith("!") or line2.startswith("#"):
        if first_line:
            lang = line2[1:].split(",")
            first_line = False
        continue
    splity = line2.split(",")
    dtemp = {}
    if len(splity) != len(lang):
        print "Error - args do not match header on line %d" % (iCounter)
        continue
    for i in range(len(splity)):
        dtemp[lang[i]] = splity[i]
    print "%s - %s: %s for %s\n\t%s\n" % (
        dtemp['started'],
        dtemp['ended'],
        dtemp['title'],
        dtemp['company'],
        dtemp['description'])
dtemp['started'] is more verbose than $started, but it is clearer how I am generating the data. And I have more error checking (which I have yet to sanity check :->).
Anyway, this fails and I knew why almost right off the bat:
> ./r3.py r2.txt
Traceback (most recent call last):
  File "./r3.py", line 33, in <module>
    dtemp['description'])
KeyError: 'description'
I was suspicious about that extra newline I mentioned way back in The simple version of the old perl script. I suspected that the lines still had a trailing newline that I needed to remove. I.e., the data dictionary has a key for 'description\n' and not 'description'.
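A quick Python 3 check of that theory, using a header string shaped like the one in my data file:

```python
header = "!started,ended,title,company,description\n"

# Splitting without stripping leaves the newline on the last column name.
raw_keys = header[1:].split(",")

# rstrip() removes trailing whitespace, including the newline.
keys = header.rstrip()[1:].split(",")

print(repr(raw_keys[-1]))  # 'description\n'
print(repr(keys[-1]))      # 'description'
```

For what it's worth, line.strip() does the lstrip() and the rstrip() in a single call.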
The following change proved that:
for line in open(sys.argv[1]):
    line1 = line.lstrip()
    line2 = line1.rstrip()
    iCounter += 1
And some quick sanity checking of removing a column in one row and adding one in another row shows that my error checking works:
> ./r3.py r3.txt
Error - args do not match header on line 2
Error - args do not match header on line 3
4/01 - 6/01: Manager for Network Appliance
	Manager of Engineering Internal Test
10/99 - 4/01: System Administrator for Network Appliance
	Perl hacker and filer administrator
So I learned what I set out to do. I may never use this script, but it helped me learn some things the hard way. I didn't show all of the little syntax errors I had to fix (forgetting the ':', not indenting in the interactive shell, etc). But hopefully, I'll remember them.
I'll also claim that the script does meet my needs as did the old one. If I add a new field to the flat file, I won't have to change the script to get the current output! And yes, I just tried that and I didn't have a problem.
I could do some more error checking (e.g., don't access an entry unless it is set), but I've already gone beyond the error checking in the Perl script.
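One way to do that extra checking, sketched in Python 3 with a made-up row that is missing its description column: dict.get() returns a default instead of raising KeyError.

```python
row = {"started": "4/01", "ended": "6/01", "title": "Manager",
       "company": "Network Appliance"}  # hypothetical row with no 'description'

# row["description"] would raise KeyError; .get() falls back to a default.
description = row.get("description", "(no description)")

line = "%s - %s: %s for %s\n\t%s" % (
    row.get("started", "?"), row.get("ended", "?"),
    row.get("title", "?"), row.get("company", "?"), description)
print(line)
```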
#!/usr/bin/python
import sys
first_line = True
lang = []
iCounter = 0
for line in open(sys.argv[1]):
    line1 = line.lstrip()
    line2 = line1.rstrip()
    iCounter += 1
    if line2.startswith("!") or line2.startswith("#"):
        if first_line:
            lang = line2[1:].split(",")
            first_line = False
        continue
    splity = line2.split(",")
    dtemp = {}
    if len(splity) != len(lang):
        print "Error - args do not match header on line %d" % (iCounter)
        continue
    for i in range(len(splity)):
        dtemp[lang[i]] = splity[i]
    print "%s - %s: %s for %s\n\t%s\n" % (
        dtemp['started'],
        dtemp['ended'],
        dtemp['title'],
        dtemp['company'],
        dtemp['description'])
Guess I have to understand that script to rewrite it in Python. :->
First, gethead.pl reads through the file until it finds a line that starts with a '!'. At that point it builds a list of names of the form '$' plus the column name:
$format = '$' . join(', $', split(/,/, $first_line));
print $format . "\n";
Yields:
> ./r.pl r.txt
$started, $ended, $title, $company, $description
The magic really occurs in the main processing loop:
do main'read_txtfile_format(*LNG_FILE, *languages);
lang: while (<LNG_FILE>) {
    next lang if (/^#/ || /^!/);
    eval "($languages) = split(/[,\n]/)";
    print "$started - $ended: $title for $company\n\t$description\n\n";
}
The first line gets '$languages' set up with the 'variable' names. Each time through the while loop, the eval splits the line and assigns each column to the corresponding variable.
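In Python, a close equivalent to that eval is pairing the header names with the fields using zip() and building a dictionary. A Python 3 sketch with one line of sample data taken from my data file:

```python
header = "started,ended,title,company,description"
record = "1/05,present,Staff Engineer Software,Sun Microsystems,NFS development"

# Pair each column name with the field in the same position. This plays
# the role of the Perl eval that assigns $started, $ended, and so on.
row = dict(zip(header.split(","), record.split(",")))

# %(name)s pulls values out of the dictionary by key.
print("%(started)s - %(ended)s: %(title)s for %(company)s\n\t%(description)s" % row)
```

One caveat: zip() stops at the shorter sequence, so a row with a missing or extra column is silently truncated rather than rejected, which is why a length check against the header is still worth doing.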
So I added some code to my simple script that wasn't in the Perl:
for line in open(sys.argv[1]):
    line.lstrip()
And my intent was to strip out all of the leading spaces. I didn't have to, but I created a simple test case with the first line pushed over by a tab and the second line pushed over by 8 spaces. The first one worked correctly and the second did not:
Update: I wasn't thinking correctly here. I knew I had two bad lines and I didn't know why. After solving the coding problem, I can see that both lines of input failed: the header line is being treated as if it were a normal line and being processed.
> ./simple2.py r2.txt
!started - ended: title for company
	description
1/05 - present: Staff Engineer Software for Sun Microsystems
	NFS development
Well d'oh, even in Perl I just told it to strip out one character. I've got to tell it that while there is whitespace, strip it out:
for line in open(sys.argv[1]):
    while line.isspace(): line.lstrip()
And this doesn't work either. At which point I realize it must be because strings are immutable, right? I mean it is never changing! Note I get to the right conclusion, but for the wrong reasons. If it were immutable and the string had whitespace at the start, I should be stuck in an endless loop here. See the ending section for that analysis.
It also points out that I never did anything with that line.lstrip(). It never changes line; it returns a new string, which I promptly threw away. We can see that here:
>>> st2 = " This is the radio clash!"
>>> print st2.lstrip()
This is the radio clash!
>>> print st2
 This is the radio clash!
>>>
See, st2.lstrip() actually works!
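To make the immutability concrete, here is a Python 3 sketch (print is a function there) that keeps a second name pointing at the original string:

```python
st2 = "   This is the radio clash!"
original = st2          # a second name for the same string object

st2.lstrip()            # builds a new string, which is thrown away
print(repr(st2))        # '   This is the radio clash!'

st2 = st2.lstrip()      # rebind st2 to the new, stripped string
print(repr(st2))        # 'This is the radio clash!'
print(repr(original))   # '   This is the radio clash!'
```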
I've fixed up the script (in a boring way) and it works:
for line in open(sys.argv[1]):
    line2 = line.lstrip()
    if line2.startswith("!") or line2.startswith("#"): continue
    print "%s - %s: %s for %s\n\t%s\n\n" % tuple(line2.split(","))
Okay, to try to understand this, I did the following in the shell:
>>> st2 = " This is the radio clash!"
>>> while st2.isspace():
...     print st2
...     st2.lstrip()
...
Which should be an endless loop according to what I know now. But nothing gets done, which means that st2.isspace() is False. And a help(st2.isspace) shows why:
Help on built-in function isspace:

isspace(...)
    S.isspace() -> bool

    Return True if all characters in S are whitespace
    and there is at least one character in S, False otherwise.
I.e., my misunderstanding of what st2.lstrip() does also made me think that st2.isspace() tested only the first character of the string. Actually, I made a bad assumption based on what I thought C would do. My bad.
So I don't ever want to do that while loop on a string which is really all whitespace.
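A Python 3 sketch contrasting the two tests, with a hypothetical input line indented by a tab and eight spaces like my test case. Testing line[0] is the per-character check I had in mind, and rebinding line is what makes the loop actually terminate.

```python
line = "\t        !started,ended"  # hypothetical: tab plus eight spaces

print("   ".isspace())     # True: every character is whitespace
print(line.isspace())      # False: the whole string is not whitespace
print(line[0].isspace())   # True: this tests only the first character

# Strip one leading character at a time. The loop ends because `line` is
# rebound each pass, and the `line and` guard stops it on an empty string.
while line and line[0].isspace():
    line = line[1:]
print(repr(line))          # '!started,ended'

# lstrip() does the same job in a single call.
assert "\t        !started,ended".lstrip() == line
```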
All the reading in the world about Python strings will not help me understand the immutability of them as much as this simple example.