Monday December 20, 2004

Analysis of my first 300 posts

This is my 301st post to this blog, so I thought I'd do a quick-n-dirty analysis of the first 300 posts, using a simple Perl script I found on the web. It isn't perfect: I had to use the "Save As..." feature in Mozilla to save all my entries as a text file, and that insists on writing out the URLs of all the links, so those have been analysed as well.

I'll give you the totals first:

Total characters are: 692329
Total words: 113612
Average words per sentence is: 9.26310640032613
Average length of words is: 6.09380171108686
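
The average word length above matches total characters divided by total words (692329 / 113612 is about 6.0938), so spaces and punctuation seem to be counted in it. Totals like these could be computed roughly as below. This is a Python sketch, not the Perl script used for the numbers above; the function name `text_totals` and the rules for what counts as a word or a sentence are assumptions made for illustration.

```python
import re

def text_totals(text):
    """Return character and word totals plus two simple averages."""
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Assumption: a sentence ends at a run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "characters": len(text),
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # Characters divided by words, so spaces and punctuation are included.
        "chars_per_word": len(text) / len(words) if words else 0.0,
    }

totals = text_totals("This is my blog. It has posts!")
print(totals)
```

A splitter like this treats every full stop as the end of a sentence, so each URL in the saved text would count as several very short sentences, which may be part of why the words-per-sentence figure is so low.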

Here are the top 20 word counts (no real surprises here):

The word the was seen 4737 times
The word to was seen 2732 times
The word a was seen 2383 times
The word http was seen 1989 times
The word of was seen 1984 times
The word and was seen 1822 times
The word i was seen 1528 times
The word com was seen 1462 times
The word in was seen 1325 times
The word that was seen 1282 times
The word www was seen 1244 times
The word roller was seen 1118 times
The word it was seen 1086 times
The word richb was seen 1080 times
The word this was seen 1015 times
The word for was seen 965 times
The word is was seen 878 times
The word on was seen 811 times
The word page was seen 797 times
The word was was seen 787 times

Looking at the report it generated, I noticed that I've used 12803 different words in writing the blog.
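
The word counts and the distinct-word total can be reproduced in spirit with a frequency table. The sketch below is in Python rather than the Perl script used above, and it assumes that words are lowercased and split on anything that isn't a letter, which is a guess based on "i", "http", "www" and "com" showing up as separate words in the list. The name `word_counts` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def word_counts(text):
    """Count case-insensitive occurrences of each word in text."""
    # Assumption: a word is a run of letters; digits and punctuation split words,
    # so a URL breaks up into pieces such as "http", "www" and "com".
    return Counter(re.findall(r"[a-z]+", text.lower()))

counts = word_counts("The cat saw the dog. The dog saw http://www.example.com")

# Report the most frequent words in the same style as the list above.
for word, n in counts.most_common(3):
    print(f"The word {word} was seen {n} times")

# The number of keys in the table is the number of different words.
print(f"Different words: {len(counts)}")
```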

A better approach would be to first remove all the URLs, then analyse only words of (say) five letters or more, after also removing common words like "roller", "comment", "richb" etc. Maybe something for another day. Or another 300 posts.
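
That cleaned-up analysis might look something like the following. It is a minimal Python sketch under stated assumptions, not a finished tool: the URL pattern is deliberately simple, the stop list holds only the three example words named above, and `filtered_counts`, `STOP_WORDS` and `URL_PATTERN` are hypothetical names.

```python
import re
from collections import Counter

# Assumption: the stop list is just the example words named in the post.
STOP_WORDS = {"roller", "comment", "richb"}

# Assumption: a URL starts with http://, https:// or www. and runs to the next space.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def filtered_counts(text, min_length=5, stop_words=STOP_WORDS):
    """Count words after dropping URLs, short words and stop words."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    words = re.findall(r"[a-z]+", without_urls)
    return Counter(
        w for w in words if len(w) >= min_length and w not in stop_words
    )

sample = (
    "Roller comment by richb: see http://www.example.com/page "
    "about puzzles and puzzles"
)
print(filtered_counts(sample).most_common(5))
```

Removing the URLs before splitting into words matters here: done the other way round, pieces such as "http" and "example" would already be in the word list and much harder to tell apart from ordinary text.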


( Dec 20 2004, 01:11:02 AM PST )
