Analysis of my first 300 posts
This is my 301st post to this blog. I thought I'd do a quick-n-dirty analysis of the first 300 posts, using a simple Perl script I found on the web. It isn't perfect: I had to use the "Save As..." feature in Mozilla to save all my entries as a text file, and that insists on including the URLs of all the links, so those have been analysed as well.
I'll give you the totals first:
Total characters are: 692329
Total words: 113612
Average words per sentence is: 9.26310640032613
Average length of words is: 6.09380171108686
Here are the top 20 word counts (no real surprises here):
The word the was seen 4737 times
The word to was seen 2732 times
The word a was seen 2383 times
The word http was seen 1989 times
The word of was seen 1984 times
The word and was seen 1822 times
The word i was seen 1528 times
The word com was seen 1462 times
The word in was seen 1325 times
The word that was seen 1282 times
The word www was seen 1244 times
The word roller was seen 1118 times
The word it was seen 1086 times
The word richb was seen 1080 times
The word this was seen 1015 times
The word for was seen 965 times
The word is was seen 878 times
The word on was seen 811 times
The word page was seen 797 times
The word was was seen 787 times
Looking at the report it generated, I noticed that I've used 12803 different words in writing the blog.
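For anyone curious how numbers like these can be produced, here is a minimal Python sketch. It is not the Perl script I used; the function name `text_stats`, the letters-only word rule and the sentence split on `.`, `!` and `?` are all my own assumptions. One thing the figures do suggest: 692329 / 113612 comes out at the reported average word length, so the sketch also divides total characters by total words.

```python
import re
from collections import Counter


def text_stats(text, top_n=20):
    """Rough text statistics; a sketch, not the original Perl script."""
    # Assumption: a "word" is a run of letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    # Assumption: a sentence ends at one or more of . ! ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "total_chars": len(text),
        "total_words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        # Total characters over total words, matching the ratio in the totals.
        "avg_word_length": len(text) / len(words) if words else 0.0,
        "distinct_words": len(counts),
        "top": counts.most_common(top_n),
    }
```

Note that splitting sentences on every full stop also breaks up URLs such as www.example.com, which would pull the words-per-sentence figure down in a file full of links.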
A better approach would be to remove all the URLs first, then analyse only words of (say) five letters or more, after dropping common words like "roller", "comment", "richb" etc. Maybe something for another day. Or another 300 posts.
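As a rough idea of what that improved pass might look like, here is a small Python sketch. The names `URL_PATTERN`, `STOP_WORDS` and `filtered_word_counts` are made up for illustration, the URL pattern is a simple approximation rather than a full URL parser, and the stop list only holds the three example words mentioned above.

```python
import re
from collections import Counter

# Assumption: anything starting with http://, https:// or www. and running
# up to the next whitespace counts as a URL.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Hypothetical stop list; a real one would need to be much longer.
STOP_WORDS = {"roller", "comment", "richb"}


def filtered_word_counts(text, min_length=5, stop_words=STOP_WORDS, top_n=20):
    """Count words after stripping URLs, short words and stop words."""
    without_urls = URL_PATTERN.sub(" ", text)
    words = re.findall(r"[a-z]+", without_urls.lower())
    kept = [w for w in words if len(w) >= min_length and w not in stop_words]
    return Counter(kept).most_common(top_n)
```

Calling `filtered_word_counts(text)` on the saved text file's contents would return a list of (word, count) pairs, most frequent first.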
( Dec 20 2004, 01:11:02 AM PST )











