
Tuesday January 02, 2007

Turn Your Roller Blog Into A Book

Overall, I'm very pleased with the Roller software that we use here at Sun to host all our blogs. From a company perspective, it clearly scales well: it currently serves 2762 weblogs for 3055 users.

From an individual point of view, there are a couple of places where it's not as simple and straightforward as I'd like.

  1. The handling of resources: all those files I've uploaded (mostly small images that appear at the beginning of each post). There are over a thousand of them now. They are managed in a flat file system, which makes page updates slow and makes it hard to find the exact file I'm looking for. You can't tag or annotate them. It doesn't even prompt you if you are about to overwrite an existing file. There is definitely room for improvement here, although this isn't a problem for most Sun bloggers, who don't use many resources.

  2. Seeing all of your blog as a single entity: there is a limit on the number of blog entries that can appear on a single page. What I'd like to be able to do is view (and/or save) the whole of my blog as a single file. I don't know of a way to do that.

Until now.

Over the Christmas break I wrote a very simple Python script that takes all of the blog entries I'd backed up locally using Grabber and turns them into a single HTML file. Currently it's 2.7 MB for 916 blog entries over 2 ½ years.

(If you want to use the script, you'll need to adjust the blogPostsDir and title definitions near the beginning. Note that there is minimal bullet-proofing in the script. If you find any problems, please let me know.)
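The script itself isn't reproduced here, so what follows is only a minimal sketch of the same idea, not the original. It assumes Grabber saved each entry as its own `.html` file inside `blogPostsDir`, that the filenames sort in posting order, and that keeping just the contents of each page's `<body>` is good enough. The `blogPostsDir` and `title` values and all of the function names are hypothetical.

```python
import glob
import html
import os
import re

# Hypothetical values: the post names these two settings but not their contents.
blogPostsDir = "blog-posts"  # assumed layout: one saved HTML file per entry
title = "My Roller Blog"

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(page):
    """Return what sits inside <body>...</body>, or the whole page if absent."""
    match = BODY_RE.search(page)
    return match.group(1) if match else page


def combine_pages(pages, book_title):
    """Join a list of HTML pages into one document, separated by <hr>."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(book_title)}</title></head><body>",
        f"<h1>{html.escape(book_title)}</h1>",
    ]
    for page in pages:
        parts.append(extract_body(page))
        parts.append("<hr>")
    parts.append("</body></html>")
    return "\n".join(parts)


def combine_posts(posts_dir, book_title):
    """Read every *.html file in posts_dir (sorted by name) and combine them."""
    pages = []
    for path in sorted(glob.glob(os.path.join(posts_dir, "*.html"))):
        with open(path, encoding="utf-8", errors="replace") as f:
            pages.append(f.read())
    return combine_pages(pages, book_title)
```

Writing the result out is then a matter of `out.write(combine_posts(blogPostsDir, title))` on a file opened for writing, for example a made-up `blog.html`. Splitting the file reading from the joining keeps the joining logic easy to check on its own.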

This means I can now easily find broken links. I've already fixed the broken image links in my blog and I'll regenerate the single HTML file shortly. Over the next few days, I also plan to find out how many broken web page links there are and see how easy it would be to fix the important ones.
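With everything in one file, finding broken links comes down to pulling out every URL and testing it. Here is a minimal sketch using only the standard library; it is an assumption about how one might do it, not the approach used for this blog. It only checks absolute `http(s)` URLs in `<a href>` and `<img src>`, and it sends HEAD requests, so a server that rejects HEAD will show up as a false positive. The function names are hypothetical.

```python
import http.client
import urllib.request
from html.parser import HTMLParser

# Which attribute holds the URL for each tag we care about.
LINK_ATTRS = {"a": "href", "img": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href> and <img src> attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value.startswith(("http://", "https://")):
                self.links.append((tag, value))


def collect_links(page):
    """Return (tag, url) pairs in document order."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


def is_broken(url, timeout=10):
    """Treat a URL as broken if a HEAD request errors out or returns 4xx/5xx."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return False
    except (OSError, ValueError, http.client.HTTPException):
        # urllib.error.HTTPError and URLError are both OSError subclasses.
        return True


def report_broken(page):
    """Check each distinct URL once and return the ones that look broken."""
    broken = []
    seen = set()
    for tag, url in collect_links(page):
        if url not in seen:
            seen.add(url)
            if is_broken(url):
                broken.append((tag, url))
    return broken
```

Passing the text of the combined HTML file to `report_broken` gives a list of `(tag, url)` pairs to work through, with image links (`img`) and web page links (`a`) kept apart so they can be fixed separately.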

I also wanted to see what the blog looked like as a PDF file. I've no intention of self-publishing it, but I was curious to see if I'd written something that was novel length yet.

I converted it two ways:

It's not War and Peace, but it's getting up there. It certainly has more laughs.
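Page counts depend heavily on fonts and on how the converter lays out images, so a word count of the combined HTML file is a steadier way to ask whether a blog has reached novel length. The sketch below is a rough counter built on the standard library's `HTMLParser`. It ignores `<script>` and `<style>` contents, and the set of tags treated as word breaks is an assumption; the names are hypothetical.

```python
import re
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style"}
# Tags that separate words; inline tags like <b> or <a> are assumed not to.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "th", "hr", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Gather visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_count(page):
    """Count whitespace-separated words in the visible text of an HTML page."""
    extractor = TextExtractor()
    extractor.feed(page)
    extractor.close()
    return len(re.findall(r"\S+", "".join(extractor.chunks)))
```

For example, `word_count("<p>Hello <b>wor</b>ld</p><p>Second para</p>")` counts 4 words, because the inline `<b>` does not split "world" while the paragraph boundary does separate "world" from "Second".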


(Jan 02 2007, 08:18:44 AM PST)

Comments:

Neat!

Posted by John Clingan on January 02, 2007 at 09:00 AM PST #

To generate nice PDF from HTML, I tend to use HTMLDOC. It's pretty good at generating a table of contents by parsing the HTML headings (H1 to H6) to build the hierarchy. It also preserves links.

Posted by Martin-Éric on January 02, 2007 at 10:42 AM PST #

Hi Martin-Éric. Thanks! I'll give it a try.

Posted by Rich Burridge on January 02, 2007 at 02:25 PM PST #

FYI, I just tried htmldoc on the single HTML file I'd created (see link in blog post). It stopped after generating 113 pages of PDF. It didn't handle the tables very well and failed to include many of the images I'd used.

Maybe I've just got bogus HTML, but that's not a great excuse. OOo (OpenOffice.org) does a much better job of the conversion.

Posted by Rich Burridge on January 02, 2007 at 05:47 PM PST #

Hi, another tool for converting HTML that preserves links and so on is html2ps (http://user.it.uu.se/~jan/html2ps.html). It doesn't always work, but it may be worth a try! Rgds, Daniel

Posted by Daniel on January 03, 2007 at 12:41 AM PST #
