First Cut At A GreaseMonkey Script For Expanding All Hackszine Categories
One of my favorite sites is Hackszine. I've been keeping up with the new entries by monitoring their RSS feed, but I also wanted to see what other interesting posts they'd created in the past.
The search functionality on their web site is fine if you know what you are looking for, but I wanted a way to quickly summarize everything they've ever done.
Again, GreaseMonkey to the rescue. I've created a first cut at a script that does this. Currently the script extracts a list of all the categories from their main web page, follows each of those URLs, and extracts a list of all the posts on the first page of each category. It then generates a page of lists, one for each category.
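Below is a minimal sketch of that extract, follow and render flow. It is written in TypeScript, a typed superset of the JavaScript that GreaseMonkey runs, and it is not the script described above. Several assumptions apply:

- Category links are recognised by the "/blog/archive/" path that Hackszine's archive URLs use.
- `postMarker` stands for whatever substring identifies a post permalink.
- `fetchText`, `extractLinks`, `summarize` and `renderLists` are hypothetical names. In a real userscript, `fetchText` would wrap GM_xmlhttpRequest, and a DOM query would replace the regex.

```typescript
// A link scraped from a page: its URL and visible text.
interface Link {
  href: string;
  text: string;
}

// One generated list: a category name and the posts found on its first page.
interface CategoryList {
  category: string;
  posts: Link[];
}

// Hypothetical helper type: fetches a URL and resolves with the page's HTML.
// In a userscript this would wrap GM_xmlhttpRequest (or fetch() today).
type FetchText = (url: string) => Promise<string>;

// Collect <a href="...">text</a> links whose href contains `marker`.
// Regex scraping is fragile; it is used here only to keep the sketch DOM-free.
function extractLinks(html: string, marker: string): Link[] {
  const re = /<a\s[^>]*href="([^"]+)"[^>]*>([^<]*)<\/a>/gi;
  const links: Link[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (m[1].indexOf(marker) >= 0) {
      links.push({ href: m[1], text: m[2].trim() });
    }
  }
  return links;
}

// Step 1: find the category links on the main page.
// Step 2: fetch every category's first page in parallel and pull out its post links.
async function summarize(
  mainHtml: string,
  fetchText: FetchText,
  categoryMarker: string, // e.g. "/blog/archive/"
  postMarker: string // hypothetical: whatever identifies a post permalink
): Promise<CategoryList[]> {
  const categories = extractLinks(mainHtml, categoryMarker);
  const pages = await Promise.all(categories.map((c) => fetchText(c.href)));
  return categories.map((c, i) => ({
    category: c.text,
    posts: extractLinks(pages[i], postMarker),
  }));
}

// Step 3: render one heading and <ul> per category for the generated page.
function renderLists(lists: CategoryList[]): string {
  return lists
    .map(
      (l) =>
        "<h2>" + l.category + "</h2><ul>" +
        l.posts.map((p) => '<li><a href="' + p.href + '">' + p.text + "</a></li>").join("") +
        "</ul>"
    )
    .join("\n");
}
```

Promise.all returns the pages in the same order as the category links, so each page can be matched back to its category by index.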
There is still the problem that some of the Hackszine categories have multiple pages of entries. In such a case, a paragraph just before the entries says "Page 1 of <N>". From that we could work out how many URLs of the form "http://hackszine.com/blog/archive/<category>/<N>.html" we would need to fetch and process. What I can't currently work out is how to easily read those pages and embed the results into the new page being created, because the number of such category sub-pages is initially unknown and the results are returned asynchronously.
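One possible way around the unknown page count, sketched in TypeScript under stated assumptions:

1. Fetch the first page.
2. Read N from the "Page 1 of <N>" text.
3. Request pages 2 through N together, and let Promise.all return them in page order however the responses arrive.

The sketch assumes that page 1 is the category URL itself and that later pages follow the archive URL pattern quoted above. `fetchText`, `parsePageCount`, `subPageUrls` and `fetchAllPages` are hypothetical names. Promises postdate 2007-era GreaseMonkey. With plain callbacks, you could instead keep a counter of outstanding requests and render the page when it reaches zero.

```typescript
// Hypothetical helper type: fetches a URL and resolves with the page's HTML.
type FetchText = (url: string) => Promise<string>;

// Read N from the "Page 1 of <N>" paragraph; a category without it has one page.
function parsePageCount(html: string): number {
  const m = /Page\s+\d+\s+of\s+(\d+)/i.exec(html);
  return m ? parseInt(m[1], 10) : 1;
}

// Build URLs for pages 2..n from the archive pattern.
// Assumption: page 1 is the category page already fetched, so it is skipped here.
function subPageUrls(category: string, n: number): string[] {
  const urls: string[] = [];
  for (let i = 2; i <= n; i++) {
    urls.push("http://hackszine.com/blog/archive/" + category + "/" + i + ".html");
  }
  return urls;
}

// Fetch page 1, learn N, then fetch the remaining pages in parallel.
// Promise.all resolves with results in request order, not arrival order.
async function fetchAllPages(firstUrl: string, category: string, fetchText: FetchText): Promise<string[]> {
  const first = await fetchText(firstUrl);
  const n = parsePageCount(first);
  const rest = await Promise.all(subPageUrls(category, n).map((u) => fetchText(u)));
  return [first].concat(rest);
}
```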
Several of their posts are also filed under multiple categories, so there will be duplication, but I don't see an easy way out of that unless we do extensive processing of each entry.
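Suppose each post has a single permalink that is identical on every category page. That is an assumption, not something checked here. In that case, duplicates could be collapsed by URL alone, without processing the entries themselves. The TypeScript sketch below does this. It keeps the first title seen and records every category a post appears under. `Post`, `MergedPost` and `mergeByPermalink` are hypothetical names.

```typescript
// A post link as scraped from one category page.
interface Post {
  href: string;
  title: string;
}

// A post after merging: one entry per permalink, with all its categories.
interface MergedPost {
  href: string;
  title: string;
  categories: string[];
}

// Collapse duplicates across category lists, keyed on the permalink.
// Assumption: the same post always has the same href on every page.
function mergeByPermalink(lists: Array<{ category: string; posts: Post[] }>): MergedPost[] {
  const byHref: { [href: string]: MergedPost } = Object.create(null);
  const ordered: MergedPost[] = []; // first-seen order, for stable output
  for (const list of lists) {
    for (const p of list.posts) {
      let entry = byHref[p.href];
      if (!entry) {
        entry = { href: p.href, title: p.title, categories: [] };
        byHref[p.href] = entry;
        ordered.push(entry);
      }
      entry.categories.push(list.category);
    }
  }
  return ordered;
}
```

The generated page could then list each post once, with its categories shown next to it.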
Still, it's useful now, and this kind of approach should also work for extracting a summary of other web sites (like LifeHacker or 43 Folders). Once I've got it fully working for Hackszine, it should be trivial to generate similar GreaseMonkey scripts for those other interesting sites.
[Technorati Tag: GreaseMonkey]
( Nov 06 2007, 01:39:25 PM PST )