1001 Books You Must Read Before You Die - Part 2
I just couldn't leave it alone. I had to keep scratching at it, like a festering wound.
I knew from the Amazon reviewers that Ian McEwan had eight books on the list. I was curious to know who the other multi-book authors were. First of all, I grabbed the list, cleaned it up and saved it as a text file.
I then wrote a small Python script to read in that text file and determine two things:
- The number of authors that had more than one book listed.
- For each of those authors, exactly how many books they had listed.
Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Change to False to print a book list alphabetically by author.
printSummary = True

authors = {}
bookFile = open("1001Books.txt", "r")
while 1:
    line = bookFile.readline()
    if not line:
        break
    tokens = line.split("–")
    authorString = tokens[1].lstrip().rstrip()
    title = tokens[0].split(".", 1)[1].lstrip().rstrip()

    # Adjust author string so last name comes first.
    tokens = authorString.split()
    author = tokens[len(tokens)-1]
    if len(tokens)-1 > 0:
        author = author + ", " + " ".join(tokens[0:len(tokens)-1])

    if authors.has_key(author):
        authors[author].append(title)
    else:
        authors[author] = [ title ]

if printSummary:
    # ------ Print summary of multi-book authors ------
    noOfBooks = {}
    for author in authors.keys():
        total = len(authors[author])
        if total > 1:
            if noOfBooks.has_key(total):
                noOfBooks[total].append(author)
            else:
                noOfBooks[total] = [ author ]

    print "Number of authors: ", len(authors), "\n\n"
    keys = noOfBooks.keys()
    keys.sort()
    for key in keys:
        print "Authors with ", key, " books: "
        noOfBooks[key].sort()
        for i in range (0, len(noOfBooks[key])):
            print " ", noOfBooks[key][i]
        print
else:
    # ------ Print book list alphabetically by author -----
    keys = authors.keys()
    keys.sort()
    for author in keys:
        print author
        # sort() works in place and returns None, so don't assign its result.
        authors[author].sort()
        for title in authors[author]:
            print " %s" % title
Here are the results.
I actually took it one step further. If you set printSummary = False near the top of the script, it'll generate an alphabetical book list by author, which you can then print out and easily use when you visit a bookstore. As they say on all the best cooking shows, and to save you the trouble, I already have one prepared.
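For anyone on Python 3, where has_key() and the print statement no longer exist, the same idea can be sketched roughly as below. This is not the script above, only a loose equivalent: the function names (parse_line, group_by_author, multi_book_summary) are made up for illustration, the sample lines are invented, and it assumes every line looks like "NNN. Title – Author" with an en dash as the separator.

```python
from collections import defaultdict


def parse_line(line):
    """Split one 'NNN. Title – Author' line into (author, title)."""
    left, author = line.split("–", 1)
    title = left.split(".", 1)[1].strip()
    # Move the last word of the name to the front: "Ann Example" -> "Example, Ann".
    names = author.split()
    if len(names) > 1:
        author = names[-1] + ", " + " ".join(names[:-1])
    else:
        author = names[0]
    return author, title


def group_by_author(lines):
    """Map each author to the list of their titles, skipping blank lines."""
    authors = defaultdict(list)
    for line in lines:
        if line.strip():
            author, title = parse_line(line)
            authors[author].append(title)
    return authors


def multi_book_summary(authors):
    """Map each book count above 1 to the sorted authors with that many books."""
    summary = defaultdict(list)
    for author, titles in authors.items():
        if len(titles) > 1:
            summary[len(titles)].append(author)
    return {count: sorted(names) for count, names in sorted(summary.items())}


if __name__ == "__main__":
    # Invented sample lines in the assumed format, not the real list.
    sample = [
        "1. First Title – Ann Example",
        "2. Second Title – Ann Example",
        "3. Third Title – Bob Sample",
    ]
    for count, names in multi_book_summary(group_by_author(sample)).items():
        print("Authors with", count, "books:", ", ".join(names))
```

Passing an open file object to group_by_author() instead of the sample list should work the same way, since it only needs something that yields lines.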
If I were to become even more obsessed with this and wanted to take it further, I'd wonder:
- How many of the authors here have had their books nominated for the Man Booker prize.
- How much correlation there is between the 1001 Books publisher and the publishers of the authors who have the most books on this list.
But that's enough for now.
( Sep 16 2007, 10:09:59 AM PDT ) Comments [6]
Certainly more ambitious than me. I generally go for quick and dirty shell; e.g.
while read BOOK; do echo ${BOOK#*– }; done < 1001Books.txt | sort | uniq -c | sort -n
Formatting isn't especially pretty, of course.
Posted by Matthew Berg on September 16, 2007 at 11:48 AM PDT #
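For comparison, roughly the same count can be sketched in Python with collections.Counter. The function name count_authors and the sample lines are invented for illustration. One difference from the shell version is assumed here: lines with no en dash are skipped rather than counted as-is.

```python
from collections import Counter


def count_authors(lines):
    """Count how many lines name each author, smallest count first."""
    counts = Counter()
    for line in lines:
        # Keep whatever follows the first en dash, much like ${BOOK#*– }.
        _, sep, author = line.partition("–")
        if sep:
            counts[author.strip()] += 1
    # Order by count, then by name, roughly like `sort | uniq -c | sort -n`.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Invented sample lines, not the real list.
sample = [
    "1. First Title – Ann Example",
    "2. Second Title – Bob Sample",
    "3. Third Title – Ann Example",
]
for author, count in count_authors(sample):
    print(count, author)
```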
Nice. Thanks Matthew.
Posted by Rich Burridge on September 16, 2007 at 12:11 PM PDT #
I re-wrote your script to use many modern python features.
http://paste.lisp.org/display/47809
see if you can spot all the changes :-)
Posted by Justin on September 16, 2007 at 12:29 PM PDT #
Hi Justin.
I knew about def functions and __main__ (see some of my other recent Python hacks), but sorted() and setdefault() were new to me.
I can think of several places in Orca where we should be using those. :-)
Thanks.
Posted by Rich Burridge on September 16, 2007 at 12:39 PM PDT #
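As a rough illustration of the two calls mentioned above (not a copy of the linked paste), setdefault() can stand in for the has_key()/else pair, and sorted() for the separate keys.sort() step. The (author, title) pairs are invented sample data.

```python
authors = {}

# Invented (author, title) pairs, purely for illustration.
pairs = [
    ("Sample, Bob", "Third Title"),
    ("Example, Ann", "Second Title"),
    ("Example, Ann", "First Title"),
]
for author, title in pairs:
    # setdefault() returns the existing list, or stores and returns a new
    # empty one, so no has_key()/else branch is needed.
    authors.setdefault(author, []).append(title)

# sorted() returns a new sorted list and leaves the original untouched.
lines = []
for author in sorted(authors):
    lines.append(author)
    for title in sorted(authors[author]):
        lines.append("    " + title)

print("\n".join(lines))
```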
After having been through the alphabetical book listing by author, I should point out -- before some eagle-eyed reader does -- that there are a few things that my original script didn't handle. Lines like:
120. Mr. Vertigo – Paul Auster
614. Out of Africa – Isak Dineson (Karen Blixen)
161. Asphodel – H.D. (Hilda Doolittle)
990. The Princess of Clèves – Marie-Madelaine Pioche de Lavergne, Comtesse de La Fayette
...
This started out as a simple hack. I don't intend to take it any further. If I did, I'd probably end up adding a ChangeLog and a README, generating a .tar.gz and submitting it to FreshMeat.
That's just way too far.
I intend to fix up the list by hand.
Posted by Rich Burridge on September 16, 2007 at 04:28 PM PDT #
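Presumably the trouble with the last three of those lines is the "last word is the surname" rule. A small sketch, with the reordering step copied into a hypothetical helper named last_name_first, shows what that rule does to those author strings:

```python
def last_name_first(author):
    """Reorder a name the way the script does: last word first."""
    tokens = author.split()
    name = tokens[-1]
    if len(tokens) > 1:
        name = name + ", " + " ".join(tokens[:-1])
    return name


problem_authors = [
    "Isak Dineson (Karen Blixen)",
    "H.D. (Hilda Doolittle)",
    "Marie-Madelaine Pioche de Lavergne, Comtesse de La Fayette",
]
for author in problem_authors:
    print(last_name_first(author))

# Prints:
# Blixen), Isak Dineson (Karen
# Doolittle), H.D. (Hilda
# Fayette, Marie-Madelaine Pioche de Lavergne, Comtesse de La
```

Parenthesised alternate names and multi-word surnames both end up filed under the wrong key, which is the kind of thing that is probably easier to fix by hand in the list than in code.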
I ended up fixing up a bug:
I replaced:
title = tokens[0].split(".")[1].lstrip().rstrip()
with:
title = tokens[0].split(".", 1)[1].lstrip().rstrip()
That fixes up any book titles with a period in them.
Justin, you might want to update your version too.
Posted by Rich Burridge on September 16, 2007 at 06:09 PM PDT #
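A quick way to see the difference is to run both expressions on the text to the left of the dash in the "Mr. Vertigo" line. The variable names here are only for illustration.

```python
# The part of the line before the en dash.
left = "120. Mr. Vertigo "

# With no limit, every period splits, so element 1 is only " Mr".
broken = left.split(".")[1].strip()

# With a limit of 1, only the period after the number splits.
fixed = left.split(".", 1)[1].strip()

print(broken)  # Mr
print(fixed)   # Mr. Vertigo
```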