Sunday February 25, 2007

Problem Automatically Downloading Files With Wget

A question for you wget experts.

I'm trying to put together a script that will automatically download all the user supplied custom levels for the Professor Fizzwizzle game.

You can see them if you go to their web page. Each one is of the form:

http://grubbygames.com/pf_levels/download.php?id=NNN
where "NNN" is the unique ID of a level.

Now if I do (say):

  % wget http://grubbygames.com/pf_levels/download.php?id=852

the download succeeds, but the file is saved with a name of "download.php?id=852".
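That default name seems to come straight from the URL: the last path component, plus the query string. As a rough illustration only (this is Python 3, it is not wget's actual code, and the function name `default_local_name` is made up for this sketch):

```python
from urllib.parse import urlsplit


def default_local_name(url):
    """Approximate the local filename a URL-based default would produce.

    Hypothetical helper for illustration; it mimics the behaviour seen
    above, not wget's real implementation.
    """
    parts = urlsplit(url)
    # Last component of the path, e.g. "download.php".
    name = parts.path.rsplit("/", 1)[-1] or "index.html"
    # The query string is kept as part of the name.
    if parts.query:
        name += "?" + parts.query
    return name


if __name__ == "__main__":
    print(default_local_name("http://grubbygames.com/pf_levels/download.php?id=852"))
```

Nothing in the URL mentions the level's real filename, so any name built this way can't match it.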

If I do:

  % wget --spider --debug "http://grubbygames.com/pf_levels/download.php?id=852"

I can see from the debug messages that the actual filename I want it saved as is in the "Content-Disposition:" line:

DEBUG output created by Wget 1.10.2 on linux-gnu.

--11:55:57--  http://grubbygames.com/pf_levels/download.php?id=852
           => `download.php?id=852.1'
Resolving grubbygames.com... 69.20.54.231
Caching grubbygames.com => 69.20.54.231
Connecting to grubbygames.com|69.20.54.231|:80... connected.
Created socket 3.
Releasing 0x00000000005576a0 (new refcount 1).

---request begin---
HEAD /pf_levels/download.php?id=852 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: grubbygames.com
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 200 OK
Date: Sun, 25 Feb 2007 20:00:08 GMT
Server: Apache/2.0.46 (Red Hat)
Accept-Ranges: bytes
X-Powered-By: PHP/4.3.2
Pragma: public
Cache-Control: must-revalidate, post-check=0, pre-check=0
Content-Disposition: attachment; filename="duncan1.lvl"
Connection: close
Content-Type: lvl

---response end---
200 OK
Length: unspecified [lvl]
Closed fd 3
200 OK

Are there any command options that I can give to wget, to get it to save into the filename given on the "Content-Disposition:" line?

I'm using wget 1.10.2.

Update:

Thanks to everybody who commented. As I wanted to write a Python script to grab all the user supplied custom levels within a given range, the solution by Stephen English was just perfect.

Here's my complete script using a slight variation of his code:

#!/usr/bin/env python
#
# Script to automatically download a range of Professor Fizzwizzle custom
# levels from the Grubbygames website.
#

import urllib2
import sys

startLevel = 1
endLevel = 853
baseUrl = "http://grubbygames.com/pf_levels/download.php?id="

def main():
    for i in range(startLevel, endLevel+1):
        url = baseUrl + str(i)
        print "url: `%s`" % url

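        level = urllib2.urlopen(url)
        # The real filename is the quoted part of the Content-Disposition
        # header, e.g.: attachment; filename="duncan1.lvl"
        filename = level.headers["Content-Disposition"].split("\"")[1]
        # Binary mode, so the level data is written byte-for-byte.
        f = open(filename, "wb")
        f.write(level.read())
        f.close()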

if __name__ == "__main__":
    main()
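One caveat with the script: the `split("\"")[1]` step assumes the header is always there and the filename is always quoted. Here is a slightly more tolerant way to pull the name out, as a sketch only. It is Python 3 (the script above is Python 2), and the helper name `filename_from_disposition` and its `fallback` argument are made up for illustration. It leans on the standard library's `email.message.Message.get_filename()`, and keeps only the last path component of whatever the server sends.

```python
import os.path
from email.message import Message


def filename_from_disposition(header_value, fallback):
    """Return the filename named in a Content-Disposition value, or fallback.

    header_value is the raw header text, or None if the header was absent.
    Hypothetical helper for illustration only.
    """
    if not header_value:
        return fallback
    # Let the standard library parse the header's parameters.
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if not name:
        return fallback
    # Keep only the final path component, so a server-supplied name
    # cannot point outside the current directory.
    name = os.path.basename(name.replace("\\", "/"))
    return name or fallback


if __name__ == "__main__":
    header = 'attachment; filename="duncan1.lvl"'
    print(filename_from_disposition(header, "level-852.lvl"))
```

In Python 3 the `headers` object returned by `urllib.request.urlopen()` is itself a subclass of `email.message.Message`, so calling `get_filename()` on it directly should also work. The standalone helper is just easier to try out without touching the network.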


( Feb 25 2007, 12:42:12 PM PST )

Comments:

Wget's Development Page lists among the features for wget 1.11: "Fixed parsing of HTTP Content-disposition header". Perhaps this will provide the functionality you're looking for? (I haven't tested it myself...)

Posted by George Skuse on February 25, 2007 at 01:37 PM PST #

Not what you were looking for but this is how I would do it.... with an ugly hack. :)

To download a bunch just use a for loop and seq....

for id in $(seq 832 853); do ./grubbydownload.sh $id ; done

No idea if this is possible to do with any nice switches to wget....

grubbydownload.sh:

#!/bin/sh
if [ "$1" = "" ]; then
    echo "usage: $0 <id-number>" >&2
    exit 1
fi
ID=$1
wget --quiet "http://grubbygames.com/pf_levels/download.php?id=$ID"
FILENAME=$(wget --spider --debug "http://grubbygames.com/pf_levels/download.php?id=$ID" 2>&1 1>/dev/null | awk -Ffilename= '/Content-Disposition/{print $2}' | cut -d'"' -f2)
mv download.php*$ID "$FILENAME"
if [ -f "$FILENAME" ]; then
    echo "$FILENAME saved."
else
    echo "Download failed."
fi

# as a bonus, you get to fix up the linebreaks, since your input form didn't respect my formatting (newlines). :)

Posted by no name on February 25, 2007 at 01:52 PM PST #

stephen@sam:/tmp$ cat moo.py
import urllib2
import sys

game = urllib2.urlopen(sys.argv[1])
f = open(game.headers["Content-Disposition"].split("\"")[1], "w")
f.write(game.read())
f.close()
stephen@sam:/tmp$ python moo.py http://grubbygames.com/pf_levels/download.php?id=852

Posted by Stephen English on February 25, 2007 at 01:53 PM PST #

I would do it this way:

wget -O $(wget --debug -O /dev/null http://grubbygames.com/pf_levels/download.php?id=853 2>&1|grep Content-Disposition|cut -d'"' -f2) http://grubbygames.com/pf_levels/download.php?id=853

greets

Posted by MaoP on February 25, 2007 at 02:36 PM PST #

Thanks everybody.

There's always more than one way to skin a cat. I went with a variation of the solution by Stephen English, and updated the blog post accordingly.

Posted by Rich Burridge on February 25, 2007 at 03:34 PM PST #
