
Sunday September 16, 2007

1001 Books You Must Read Before You Die - Part 2

I just couldn't leave it alone. I had to keep scratching at it, like a festering wound.

I knew from the Amazon reviewers that Ian McEwan had eight books on the list. I was curious to know who the other multi-book authors were. First of all, I grabbed the list, cleaned it up and saved it as a text file.

I then wrote a small Python script to read in that text file and determine two things:

Here's the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Change to False to print a book list alphabetically by author.
printSummary = True

authors = {}
bookFile = open("1001Books.txt", "r")
while 1:
    line = bookFile.readline()
    if not line:
        break
    tokens = line.split("–")
    authorString = tokens[1].lstrip().rstrip()
    title =  tokens[0].split(".", 1)[1].lstrip().rstrip()

    # Adjust author string so last name comes first.
    tokens = authorString.split()
    author = tokens[len(tokens)-1]
    if len(tokens)-1 > 0:
        author = author + ", " + " ".join(tokens[0:len(tokens)-1])

    if authors.has_key(author):
        authors[author].append(title)
    else:
        authors[author] = [ title ]

if printSummary:
    # ------ Print summary of multi-book authors ------
    noOfBooks = {}
    for author in authors.keys():
        total = len(authors[author])
        if total > 1:
            if noOfBooks.has_key(total):
                noOfBooks[total].append(author)
            else:
                noOfBooks[total] = [ author ]

    print "Number of authors: ", len(authors), "\n\n"

    keys = noOfBooks.keys()
    keys.sort()
    for key in keys:
        print "Authors with ", key, " books: "
        noOfBooks[key].sort()
        for i in range (0, len(noOfBooks[key])):
            print "    ", noOfBooks[key][i]
        print
else:
    # ------ Print book list alphabetically by author -----
    keys = authors.keys()
    keys.sort()
    for author in keys:
        print author
        authors[author].sort()  # sort() works in place and returns None
        for title in authors[author]:
            print "    %s" % title

Here are the results.

I actually took it one step further. If you set

printSummary = False

near the top of the script, it'll generate a book list sorted alphabetically by author, which you can print out and take along when you visit a bookstore. To save you the trouble, and as they say on all the best cooking shows, I already have one prepared.

If I were to become even more obsessed with this and wanted to take it further, I'd wonder:

But that's enough for now.


( Sep 16 2007, 10:09:59 AM PDT )

Comments:

Certainly more ambitious than me. I generally go for quick and dirty shell; e.g.

while read BOOK; do echo ${BOOK#*– }; done < 1001Books.txt | sort | uniq -c | sort -n

Formatting isn't especially pretty, of course.

Posted by Matthew Berg on September 16, 2007 at 11:48 AM PDT #
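
For comparison, here is a rough Python 3 sketch of what the shell one-liner above does: keep the text after the first en dash on each line, count how often each author appears, and list the counts in ascending order. The function name `count_authors` and the sample lines are made up for illustration; they are not taken from 1001Books.txt.

```python
from collections import Counter


def count_authors(lines):
    """Count books per author for lines shaped like 'N. Title – Author'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Keep what follows the first en dash, much as ${BOOK#*– } does.
        author = line.split("–", 1)[-1].strip()
        counts[author] += 1
    # Ascending by count, then by name, roughly like `uniq -c | sort -n`.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical sample lines in the same format as the book list.
sample = [
    "1. Title A – Author One",
    "2. Title B – Author Two",
    "3. Title C – Author One",
]

for author, total in count_authors(sample):
    print("%7d %s" % (total, author))
```

Like the shell version, this counts the raw author string, so it does no last-name-first reordering.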

Nice. Thanks Matthew.

Posted by Rich Burridge on September 16, 2007 at 12:11 PM PDT #

I re-wrote your script to use many modern Python features.

http://paste.lisp.org/display/47809

See if you can spot all the changes :-)

Posted by Justin on September 16, 2007 at 12:29 PM PDT #

Hi Justin.

I knew about def functions and __main__ (see some
of my other recent Python hacks), but sorted() and
setdefault() were new to me.

I can think of several places in Orca where we
should be using that. :-)

Thanks.

Posted by Rich Burridge on September 16, 2007 at 12:39 PM PDT #
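
For readers who have not met them either, here is a minimal Python 3 sketch of how `setdefault()` and `sorted()` can replace the `has_key()` checks and the separate `keys()`/`sort()` steps in the original script. This is not the code from the paste linked above; the helper name `group_titles` and the sample pairs are placeholders.

```python
def group_titles(pairs):
    """Group titles by author from (author, title) pairs."""
    authors = {}
    for author, title in pairs:
        # setdefault returns the existing list for this author,
        # or stores a new empty list and returns that.
        authors.setdefault(author, []).append(title)
    return authors


# Hypothetical data standing in for the parsed book list.
pairs = [
    ("Author B", "Title 2"),
    ("Author A", "Title 3"),
    ("Author B", "Title 1"),
]

authors = group_titles(pairs)
# sorted() returns a new sorted list, so no separate keys()/sort() step is needed.
for author in sorted(authors):
    print(author)
    for title in sorted(authors[author]):
        print("    %s" % title)
```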

After having been through the alphabetical
book listing by author, I should point out --
before some eagle-eyed reader does -- that there
are a few things that my original script
didn't handle. Lines like:

120. Mr. Vertigo – Paul Auster
614. Out of Africa – Isak Dineson (Karen Blixen)
161. Asphodel – H.D. (Hilda Doolittle)
990. The Princess of Clèves – Marie-Madelaine Pioche de Lavergne, Comtesse de La Fayette
...

This started out as a simple hack. I don't intend
to take it any further. If I did, I'd probably
end up adding a ChangeLog and a README, generating
a .tar.gz and submitting it to FreshMeat.

That's just way too far.

I intend to fix up the list by hand.

Posted by Rich Burridge on September 16, 2007 at 04:28 PM PDT #
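
To see what goes wrong with the lines quoted in the comment above, here is a small Python 3 sketch that mirrors the parsing steps of the script as shown in the post (split on the en dash, strip the entry number, move the last whitespace-separated word to the front). The function name `parse_line` is an assumption made for illustration.

```python
def parse_line(line):
    """Return (author, title), with the author's last word moved to the front."""
    tokens = line.split("–")
    author_string = tokens[1].strip()
    title = tokens[0].split(".", 1)[1].strip()

    # Treat the last whitespace-separated word as the last name.
    names = author_string.split()
    author = names[-1]
    if len(names) > 1:
        author = author + ", " + " ".join(names[:-1])
    return author, title


# The four lines quoted in the comment above.
lines = [
    "120. Mr. Vertigo – Paul Auster",
    "614. Out of Africa – Isak Dineson (Karen Blixen)",
    "161. Asphodel – H.D. (Hilda Doolittle)",
    "990. The Princess of Clèves – Marie-Madelaine Pioche de Lavergne, Comtesse de La Fayette",
]

for line in lines:
    print(parse_line(line))
```

The parenthesised names and the comma in the last entry defeat the "last word is the last name" rule: the second line, for example, comes out filed under "Blixen), Isak Dineson (Karen".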

I ended up fixing up a bug:

I replaced:

title = tokens[0].split(".")[1].lstrip().rstrip()

with:

title = tokens[0].split(".", 1)[1].lstrip().rstrip()

That fixes up any book titles with a period in them.

Justin, you might want to update your version too.

Posted by Rich Burridge on September 16, 2007 at 06:09 PM PDT #
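
A quick Python 3 illustration of why the second argument to `split()` matters in the fix above, using the "Mr. Vertigo" line from the earlier comment. The helper names `title_naive` and `title_fixed` are made up for this sketch.

```python
def title_naive(prefix):
    """Split on every period and take the second piece."""
    return prefix.split(".")[1].strip()


def title_fixed(prefix):
    """Split only on the first period, the one after the entry number."""
    return prefix.split(".", 1)[1].strip()


# The part of the line before the en dash.
prefix = "120. Mr. Vertigo "

print(title_naive(prefix))   # Mr
print(title_fixed(prefix))   # Mr. Vertigo
```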
