BeautifulSoup - Get A 10 Day Weather Forecast For Your Zip Code
After Matt Harrison mentioned BeautifulSoup in a comment to an old Python script post of mine, I've been looking for somewhere to use it.
BeautifulSoup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.
I initially played around with it, seeing if I could use it to get listings of when new episodes of my favorite TV programs were appearing, now that Zap2It Labs are no longer making their listings available for free. The problem there (I think) is that, because of the dynamically generated content on their TV listings website, I can't find a URL that BeautifulSoup can parse.
So I picked something different to cut my teeth on.
I often go to Weather.com and get a 10 day forecast for the city where I live. Easy to do, but I used this as an example of something to extract from a web page and then also email it to me so I have it handy.
This script does this. You will also need to get a copy of BeautifulSoup.py for it to work properly. I've simply put them both in the same directory and run it with:
$ python ./get_weather.py
If others are interested in running this, then there are two variables that you will need to change in the script to meet your needs:
# Zip code to get 10 day forecast for.
#
zipCode = "94024"
# Email address to send results to.
#
emailAddr = "someone@somewhere.com"
Just like my early attempts with using XPath in some of my JavaScript scripts, I suspect that I'm not doing it the best way. I predict that there are much nicer ways of writing the extractForecast() routine.
Still it works and that's the first step in programming.
[Technorati Tag: BeautifulSoup]
( Apr 30 2008, 08:58:05 AM PDT ) Comments [5]
Another Python Library Script
My library is part of the Santa Clara County library system. Several branches work together. If the book is in the county library system, but not available at my local branch, I can put in a request, and they'll ship a copy to me as soon as one is available.
It's also possible that my local branch has a copy and it's out. What I really want to know is which books I'm interested in are available now in my local branch, so I can grab them if I immediately visit the library.
This script helps me do that. Here's how it works. For each of the books on each of the given Amazon Wish List IDs, it'll extract the ISBN and use that to query my library's online catalog.
It'll first check the HTML reply, looking for the string "Sorry, could not find anything matching". If it finds that, it'll go on to the next book. If it doesn't find it, then the county library has at least one copy of the book. It then looks for the string "Los Altos Library". If it finds that, it'll then grab the reply from that point up to the sub-string "Add Copy to MyList" and divide it into tokens, using "<" as a separator. It'll then look for tokens that start with "a class" and grab the sub-string from the ">" character to the end of the string. When that's complete for all the tokens, it'll check to see if the last one is "In". If it is, we've found a book that's in at my local library, and the book results are written to standard out. The script also writes a few messages to stderr that give information on the books that are in the county library system and/or are available from my local branch but are not currently in.
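As a rough illustration of the checks just described, here is a minimal Python sketch. It is not the actual checkLibrary() routine: the function name check_library, the status labels it returns, and the sample markup in the usage note are all assumptions made up for this example. Only the three search strings come from the description above.

```python
# Search strings taken from the description of the catalog reply.
NOT_FOUND = "Sorry, could not find anything matching"
BRANCH = "Los Altos Library"
END_MARKER = "Add Copy to MyList"


def check_library(reply):
    """Classify one catalog reply (hypothetical helper, for illustration).

    Returns (status, fields) where status is one of:
      "missing" - the county system has no copy
      "county"  - the county has a copy, but not the local branch
      "branch"  - the local branch has a copy that is not currently in
      "in"      - the local branch has a copy and it is in
    """
    if NOT_FOUND in reply:
        return "missing", []
    start = reply.find(BRANCH)
    if start == -1:
        return "county", []
    # Keep only the text between the branch name and the end marker.
    end = reply.find(END_MARKER, start)
    section = reply[start:end] if end != -1 else reply[start:]
    fields = []
    for token in section.split("<"):
        # Tokens such as 'a class="...">Some text' carry the fields we want.
        if token.startswith("a class") and ">" in token:
            fields.append(token.split(">", 1)[1].strip())
    if fields and fields[-1] == "In":
        return "in", fields
    return "branch", fields
```

Fed a reply containing something like <a class="status">In</a> after the branch name, this returns "in" along with the collected fields. A real catalog page will use different markup, so the token test would need adjusting for each library.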
Here's a partial listing of what the program output to stderr looks like as it's running:
$ python ./check_los_altos.py >booklist.txt
...
Found in County Library: Watercolor: Painting Smart
Found in County Library: Chaos and Fractals: New Frontiers of Science
Los Altos Library has a copy
Found in County Library: Ships-In-Bottles: A Step-By-Step Guide to a Venerable Nautical Craft
Los Altos Library has a copy
Currently IN
Found in County Library: Complete Stories of Robert Bloch: Final Reckonings (Complete Stories of Robert Bloch)
Los Altos Library has a copy
Currently IN
Found in County Library: Looking for Jake: Stories
Los Altos Library has a copy
Found in County Library: 123 Robotics Experiments for the Evil Genius (TAB Robotics)
Los Altos Library has a copy
Found in County Library: The Art and Craft of Paper Sculpture: A Step-By-Step Guide to Creating 20 Outstanding and Original Paper Projects
Found in County Library: Bad Science: The Short Life and Weird Times of Cold Fusion
...
$
The booklist.txt entries for those two books above that are found in my local library look like:
Ships-In-Bottles: A Step-By-Step Guide to a Venerable Nautical Craft Nonfiction Section 745.5928 HUBBARD In
Complete Stories of Robert Bloch: Final Reckonings (Complete Stories of Robert Bloch) Science Fiction Section SF BLOCH ROBERT In
As you can see, the output even tells me which section of the library to look in for each book.
For anybody who wants to use the script as the basis for doing something similar for their library, you are going to have to make small changes in four areas. Three are trivial. The fourth will take a little Python programming.
You'll need to replace the "KKKKKKKKKKKKKKKKKKKK" on the line:
amazonAccessKey = "KKKKKKKKKKKKKKKKKKKK"
with your own Amazon Access key. If you don't already have one, then click the "Sign up now" link on the right side of their Amazon Web Services web page.
You'll need to replace the "XXXXXXXXXXXXX", "YYYYYYYYYYYYY" on the line:
amazonWishListIDs = [ "XXXXXXXXXXXXX", "YYYYYYYYYYYYY" ]
with the IDs of the Amazon wish lists you are interested in. One way to find an Amazon wish list ID is to view the wish list, then click on the "Edit list information" link near the top left of the web page. You'll be taken to another web page with a URL of the form:
https://www.amazon.com/gp/registry/wishlist/XXXXXXXXXXXXX/settings.html?ie=UTF8&ref%5F=cm%5Fwl%5Fedit&in-sign-in=1&page=settings.m
The "XXXXXXXXXXXXX" part is the wish list ID.
You'll need to set the URL of your library's online catalog. This old post of mine and Jon Udell's LibraryLookup Bookmarklet Generator should help with that.
And the last part is where you'll need to cut some Python code. The checkLibrary() routine will need to be rewritten to search for the appropriate strings on the HTML web page that your library catalog generates.
For the Python naming pedants, I've found that I simply like CamelCase variable names better than the current Python naming "standard", so you're going to have to just deal with it. Other constructive Python criticisms are appreciated.
As I was writing this, I realized I didn't take into consideration the case where my local library might have multiple copies of the same book, and one or more of them might be in, even though the first one wasn't.
But that'll be a fix for the next version.
( Feb 22 2008, 08:10:32 AM PST ) Comments [3]
Take 3 - LifeHacker Category Viewer GreaseMonkey Script Working Again
You may remember a post from last November that described a GreaseMonkey script that would display the list of categories of the LifeHacker web site and allow you to dynamically display all the posts associated with each category.
Tyler Trafford (who gave me lots of help getting that working) emailed me today with a change I would need to make because of the new security enhancements in the latest version of the GreaseMonkey Firefox add-on.
In testing it, we discovered that the script no longer worked with the LifeHacker site, with or without the suggested change.
Before I could get to it (darn RealWork getting in the way again), Tyler went and worked out what other changes had to be made to the GreaseMonkey script and sent them to me (thank you!).
For you conspiracy theorists, I should let you know that when I posted the previous version back in November, I sent an email to the LifeHacker folks telling them about it. I was under the naïve impression that they might want to let their users know. Hah! Not a dickie bird. Nothing posted to their web site. No acknowledgment whatsoever.
And now we find that the old script doesn't work anymore because they've changed their website layout!
Coincidence? I think not. Let this be just our little secret this time.
[Technorati Tag: GreaseMonkey]
( Jan 24 2008, 01:21:12 PM PST ) Comments [3]
Your Ultimate Hacking Tools
Hack a Day have an interesting post today. It's a contest.
Here's the challenge: Given a budget of $600, put together the best hacking workbench you can. Don't include computers or the actual bench in your budget. Oh, and you have to spend it all.
This is for hardware hacking. See the comments to their post, for the replies so far. I should probably wait until they pick the five winners to see which tools I should add to my collection.
This post got me thinking about the ultimate tools for software hacking. Hacking in the nice sense of the word. If they are open source and/or freely available, then the cost would just be time not money.
So if you have any recommendations on the essential tools for your hacking arsenal (especially if you code in Python), please feel free to comment. If I get a sufficient response, I'll summarise in a future post.
( Jan 23 2008, 02:32:11 PM PST )
Roly Poly Pot Redux
I saw this post on Hackszine about a flower pot that will tilt over when it needs water.
That's cute, but why stop there? One article below it shows how you can use an Arduino board for helicopter control to stabilize the roll and pitch.
Why not combine the two? When the pot tips, the Arduino detects this and waters the plant. Hopefully the pot goes back to vertical and the water is turned off.
Now that would be a neat hack!
( Jan 16 2008, 09:42:32 AM PST )
Wii Remote and Nunchuck Projects
If you can pry the Wii Remote and/or Nunchuck from your child's hands, then you could possibly use it for one of these interesting projects:
- Using the Boarduino with a Wii Nunchuck
Makes use of one of the accelerometers in the Nunchuck, and Lady Ada's Arduino clone, plus a small servo motor. There's the potential for a lot more than this interesting proof-of-concept prototype.
- Wiimote projector whiteboard
No need to buy one of those expensive interactive whiteboards. Build your own!
- Wiimote head tracking desktop VR display
More amazing stuff from Johnny Chung Lee. Also check out his projects page.
Let's hope the Wii parts all go back together again afterwards.
( Jan 08 2008, 12:04:34 PM PST ) Comments [2]
ListsofBests List of Lists GreaseMonkey Script
Friday is hacking day, so here's another little GreaseMonkey hack. Previously I'd created a GreaseMonkey script that would take one of the ListsofBests lists and turn it into a plain text list, making it easier to read.
This new script will take their list of lists, for the awards, definitive or personal lists categories, for Books, Music, Movies, Places, People or More, and turn it into a simple list of links. No having to click through numerous pages to get to something you might be interested in. No web site bling to distract you. Just the list.
If you're like me, you'll then save away a copy and bookmark it, because these lists do take a while to regenerate, especially for the personal categories.
If I get enthused, the next step is to adjust the script so that clicking on a list entry will expand that list inline, rather than going off to the actual list web page and then using my other GreaseMonkey script.
But that's for another day. Back to RealWork™.
[Technorati Tag: Greasemonkey]
( Dec 14 2007, 09:02:54 AM PST )
LifeHacker Category Viewer GreaseMonkey Script
Another GreaseMonkey script, this time to list out all the posts under all the categories at the LifeHacker web site.
If you are running Firefox, have installed GreaseMonkey and this script, and have it enabled, then if you visit their archives web page, it'll do its thing.
Note that there are a lot of posts there and most of them have been cross-categorized, so this will take a long time. It generates a new web page that's over 3.8MB when saved. It also loads a lot of extra web pages very quickly, which must be disruptive to their web servers.
If anybody can tell me how I can adjust this script to "throttle back", I'd very much appreciate it.
[Technorati Tag: GreaseMonkey]
( Nov 26 2007, 01:02:20 PM PST )
Coloring Your Own Roller Blog Comments
Somebody was asking today on our internal blog users alias how they could color their own comments to make them stick out more. Seems that this is a built-in feature for WordPress. Here's an example (see comment #11).
GreaseMonkey to the rescue. If you are running Firefox and have GreaseMonkey installed, then install this script.
Unfortunately this isn't one of those scripts that'll "just work". Each blog owner is going to have to customize it for their blog. There are two lines to change:
Change richb on this line:
@include http://blogs.sun.com/richb/*
to be the name of your blog.
Change Rich Burridge on this line:
var userName = "Rich Burridge";
to be the user name that is automatically inserted in the comment form for you by the roller software.
Now (hopefully), when you display one of your blog entries and there are comments by you, then you should see them with a light blue background and black text. If somebody can come up with some better CSS (which shouldn't be too hard), then they need to just adjust these lines:
div.style.backgroundColor = "#CEEBEB";
div.style.color = "black";
But this was a thirty minute hack so that's what you get for a first version.
Hopefully a future version of the roller software will have a feature that just does this automatically.
[Technorati Tag: GreaseMonkey]
( Nov 19 2007, 09:02:41 PM PST ) Comments [5]
Working GreaseMonkey Script For Expanding All Hackszine Categories
With great help again from Tyler Trafford (thanks!), there is now a GreaseMonkey script that will automatically expand all the categories on the Hackszine website, even those that have several pages worth.
This is useful to see all the posts they've created in the past, if you aren't exactly sure what you are looking for.
If you are running Firefox and have GreaseMonkey installed, then just install this script.
Now if you go to the Hackszine website, it'll automatically reconfigure the page for you. Note that this is doing the equivalent of loading over 120 more web pages, so be patient with it.
If you just want the normal behavior when you visit the Hackszine site, then just disable the script (right click on the monkey icon on the right hand side of the Firefox status bar).
[Technorati Tag: GreaseMonkey]
( Nov 07 2007, 03:58:06 PM PST )
GreaseMonkey Script To Improve Blogs.sun.com Recent Posts Display
Like several other people, I find the current design of the blogs.sun.com home page frustrating when it comes to trying to read the summaries of recently posted entries.
It initially only shows you ten entries, but you can click on a "See All" link, then you get the most recent 25 entries. If you want to see (say) the most recent 500 entries, you have to page through 19 more pages.
GreaseMonkey to the rescue. If you are running Firefox and have GreaseMonkey installed, then install this script.
Now when you click on "See All", you will see the most recent 500 entries. Note that it'll take a few moments to construct the new page. It's interactively adding in the other 19 pages.
I'm sure this can be improved, but it was a quick-n-dirty hack.
Hopefully the blogs.sun.com people will come up with a proper fix.
[Technorati Tag: GreaseMonkey]
( Oct 16 2007, 09:44:43 AM PDT )
Improved Lists Of Bests Book List To HTML Script
See yesterday's post for the background on this.
There is now an improved version of the script. See the Change Log at the end of the script for the changes made.
I've used it to regenerate the Pulitzer list. I then set it loose on the 1001 Books You Must Read Before You Die list. I've no idea who put this together, but it really needs work. Entries were incomplete or incorrect. Most promoted a specific version of the book. It's Persuasion by Jane Austen, not Persuasion (Penguin Modern Classics). And so on...
After I did extensive editing of the list, and multiple runs of the script to adjust the entries to make it quicker to select the correct title, I ended up with this text version, which created this HTML version. Even now, it's not always picking the best Amazon entry. Sometimes it selects one that doesn't have a book cover image.
It can process this list in about 32 minutes. Most of that time is spent doing the Amazon lookups to get the ISBNs. Even now there are still seven books on the list that I can't find:
- Adjunct: An Undigest - Peter Manson
- The Taebek Mountains - Jo Jung-Rae
- Disobedience - Alberto Moravia
- A Day Off - Storm Jameson
- The Last Days of Humanity
- The Stechlin - Theodore Fontane
- On the eve - Ivan S. Turgenev
I'll need to borrow Boxall's book from the library again to see if I can work out what they really should be.
[Technorati Tag: GreaseMonkey]
( Oct 07 2007, 02:40:39 PM PDT ) Comments [7]
Convert Your Lists Of Bests Book Lists To HTML Web Pages With Amazon Links
I've created a Python script that will take one of the Lists of Bests book lists generated by my GreaseMonkey script (see a previous post for more details on this), and convert it to an HTML web page with Amazon links for each book.
To use it you will need to have installed the Lists of Bests GreaseMonkey script. Then go to the Lists Of Bests book list you are interested in. The GreaseMonkey script will automatically convert it to a simple text-like list. You should then cut and paste it into a text file and save it.
You will also need to edit the make_HTML_list.py script and adjust the amazonAccessKey line to your Amazon Access License Key.
To use it, simply run:
% python make_HTML_list.py < <your-book-list.txt> > <your-book-list.html>
The script uses the Amazon web services to try to work out the book's ISBN from the title and the author. It tries to do as much as it can to make sure this works, but the simple fact is that some of the entries in these lists are incorrect or incomplete. If it can't find the ISBN, it uses the standard Amazon "no image available" image and doesn't generate a link.
Here's an example. Taking the Pulitzer prize list in text form, it generates this HTML web page.
There is a debug flag near the beginning of the script that, if set to True, will generate copious debug messages that should help you determine what the book title in the list should have been.
I have a problem with this script that I haven't been able to solve yet. Pointers on how to fix it would be most appreciated.
- Given a book's ISBN, there is usually a corresponding image of it at
http://ec1.images-amazon.com/images/P/XXXXXXXXXX.01.MZZZZZZZ.jpg
where XXXXXXXXXX is the ISBN. This is certainly true for recent books, but less so the older the book is. I'd like to detect whether one of those image files exists. I've written a checkImage routine that I'd like to return an indication of whether that image file exists. I haven't worked out how to do it yet, and for now this routine always returns True. This results in a small blue "dot" for non-existent images. I really want to replace this with the Amazon "image not available" image.
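One possible approach to the image check, offered as a sketch rather than a confirmed fix: fetch the image and look at how big the reply is. The assumption here, which would need verifying against real replies, is that the placeholder "dot" is only a few dozen bytes while a genuine cover is several kilobytes. The names image_url, looks_like_cover and check_image, and the 1024 byte threshold, are invented for this example, and the code targets Python 3 rather than whatever version the original script uses.

```python
import urllib.error
import urllib.request

# URL pattern quoted above; the ISBN replaces the run of X's.
IMAGE_URL = "http://ec1.images-amazon.com/images/P/%s.01.MZZZZZZZ.jpg"

# Assumption: anything smaller than this is a placeholder, not a cover.
MIN_IMAGE_BYTES = 1024


def image_url(isbn):
    """Build the cover image URL for an ISBN."""
    return IMAGE_URL % isbn


def looks_like_cover(status, body):
    """Return True if an HTTP reply plausibly holds a real cover image."""
    return status == 200 and len(body) >= MIN_IMAGE_BYTES


def check_image(isbn, timeout=10):
    """Fetch the image and report whether it looks like a real cover."""
    try:
        with urllib.request.urlopen(image_url(isbn), timeout=timeout) as reply:
            return looks_like_cover(reply.status, reply.read())
    except (urllib.error.URLError, OSError):
        # Network failure or HTTP error status: treat as "no image".
        return False
```

When check_image returns False, the calling code could substitute the standard "no image available" picture. Downloading every image does add a network round trip per book, so the results may be worth caching.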
Tips on how to improve the script and/or the Python code are most welcome too.
[Technorati Tag: GreaseMonkey]
( Oct 06 2007, 12:07:59 PM PDT ) Comments [4]
An Alternate Del.icio.us Export / Backup Webpage
del.icio.us allows you to export / back up your bookmarks to an HTML file. That file is just one long list of links.
What I wanted was a web page with my tagcloud at the top, and each tag in the tagcloud taking me to a section of the same web page which contained all my bookmarks with that tag. I realize that del.icio.us sort of gives you this with the tag cloud on the righthand side of their web site, but having a local copy would be much faster.
So I wrote a small Python script that does this. It writes the new web page to standard out. You can then just view it in your web browser.
If others want to use it, you'll need to adjust the delBookmarkFile definition on line 33 to point to the location of your exported del.icio.us bookmark file before you run it.
It doesn't save or use all the information that del.icio.us exports, but it does generate a smaller, faster web page.
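To give a feel for the idea, here is a minimal sketch of grouping exported bookmarks by tag and writing a page with a tag list at the top. It is not the script itself. It assumes the export is a list of <A> elements with the URL in an HREF attribute and a comma-separated TAGS attribute, and the names BookmarkParser and build_page are made up for this illustration.

```python
from collections import defaultdict
from html import escape
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url) pairs under each tag found on <a> elements."""

    def __init__(self):
        super().__init__()
        self.by_tag = defaultdict(list)
        self._current = None  # (url, tags) while inside an <a> element
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            tags = [t for t in (attrs.get("tags") or "").split(",") if t]
            self._current = (attrs.get("href") or "", tags)
            self._title = []

    def handle_data(self, data):
        if self._current is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            url, tags = self._current
            title = "".join(self._title).strip()
            for t in tags:
                self.by_tag[t].append((title, url))
            self._current = None


def build_page(export_html):
    """Return an HTML page: tag links on top, one section per tag below."""
    parser = BookmarkParser()
    parser.feed(export_html)
    tags = sorted(parser.by_tag)
    out = ["<html><body>", "<p>"]
    for t in tags:
        out.append('<a href="#%s">%s</a>' % (escape(t, quote=True), escape(t)))
    out.append("</p>")
    for t in tags:
        out.append('<h2 id="%s">%s</h2>' % (escape(t, quote=True), escape(t)))
        out.append("<ul>")
        for title, url in parser.by_tag[t]:
            out.append('<li><a href="%s">%s</a></li>'
                       % (escape(url, quote=True), escape(title)))
        out.append("</ul>")
    out.append("</body></html>")
    return "\n".join(out)
```

Printing build_page(open(delBookmarkFile).read()) would write the page to standard out. A real tag cloud would also scale each tag's font size by how many bookmarks it has, which this sketch leaves out.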
The look & feel could be improved with some nicer CSS.
Suggestions welcome.
( Jun 14 2007, 07:45:08 AM PDT )
New Version Of the Get TV Listings Script
For background on this, see a previous post. I've adjusted the Python script to now automatically email you the results rather than pump the results out to standard output which then had to be piped to the mailx program.
If you are interested in using this script, you will first need to do the
setup as described in the previous post. You will also need to adjust the
emailAddr variable in the script (as well as the TV_GRAB,
XMLTV_FILE and programs ones), to suit your needs.
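For anyone curious what the emailing step might look like, here is a small sketch using Python's standard smtplib and email modules. It is not taken from the script. The helper names build_message and send_results, the subject text, and the use of an SMTP server on localhost are all assumptions for the example; only the emailAddr variable name comes from the script.

```python
import smtplib
from email.message import EmailMessage

emailAddr = "someone@somewhere.com"  # placeholder address; change to suit


def build_message(body, subject, to_addr=emailAddr):
    """Wrap the results text in a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = to_addr
    msg["To"] = to_addr
    msg.set_content(body)
    return msg


def send_results(body, subject="TV listings", host="localhost"):
    """Send the results; assumes an SMTP server is listening on `host`."""
    with smtplib.SMTP(host) as server:
        server.send_message(build_message(body, subject))
```

Building the message separately from sending it makes it easy to inspect what would be mailed without needing a mail server.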
( Apr 18 2007, 11:29:10 AM PDT )



















