Friday August 17, 2007

Identifying Words in HTML Documents

If you study Tim's Wicket tag parser, which underlies the navigator mentioned in yesterday's blog entry, you'll find that it's not very hard to adapt to your own purposes. For example, I modified the same parser slightly so that, instead of looking for Wicket tags, it finds every word in the document and lists those words in the navigator.
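Tim's parser isn't reproduced here, but the core idea of pulling the words out of an HTML document can be sketched without any NetBeans API. The class below is a minimal, hypothetical sketch: the name `HtmlWordExtractor` and its rules are my own assumptions, not taken from the Wicket parser. It skips everything between `<` and `>` and treats each run of letters as one word, allowing an apostrophe inside a word so that "don't" stays whole.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-alone sketch: collects the words in an HTML string,
 * ignoring everything between '<' and '>' (tag names and attributes).
 * It does not use Tim's Wicket parser or any NetBeans API.
 */
public class HtmlWordExtractor {

    public static List<String> extractWords(String html) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;
                flush(current, words); // a tag always ends the current word
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag && Character.isLetter(c)) {
                current.append(c);
            } else if (!insideTag && c == '\'' && current.length() > 0
                    && i + 1 < html.length() && Character.isLetter(html.charAt(i + 1))) {
                current.append(c); // apostrophe between letters, e.g. "don't"
            } else {
                flush(current, words); // any other character ends the word
            }
        }
        flush(current, words);
        return words;
    }

    // Moves the word being built (if any) into the result list.
    private static void flush(StringBuilder current, List<String> words) {
        if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p class=\"intro\">Hello, NetBeans world!</p></body></html>";
        System.out.println(extractWords(html)); // prints [Hello, NetBeans, world]
    }
}
```

This is deliberately naive: entities such as `&nbsp;` are not decoded, so `nbsp` would come out as a word, and the contents of `<script>` and `<style>` elements are treated as ordinary text. A more robust version would probably lean on a proper HTML lexer instead.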

Why might this be useful? Because from here it's a small step to a spell checker. The user would specify a file containing a list of words, each word in the document would be looked up in that file, and every word not found there would be printed to the navigator. Because those words don't match anything in the file, they are the ones that are likely to be incorrect in one way or another: misspelled, mistyped, or simply missing from the word list. And that's all a spell checker should tell you: which words are incorrect.

I've made a spell checker before, using annotations in the editor, but I haven't been able to find the code. Besides, I prefer this navigator approach to adding yet more annotations to the editor. So, watch this space for developments on an HTML spell checker.
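The comparison step described above is simple enough to sketch in plain Java. This is a hypothetical sketch, not code from the navigator: the class name `UnknownWordFinder`, the one-word-per-line file format, and the case-insensitive matching are all my own assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the comparison step: report every document word
 * that does not appear in a user-supplied word list. Not NetBeans API code.
 */
public class UnknownWordFinder {

    /** Loads a word list with one word per line, lower-cased for comparison. */
    public static Set<String> loadDictionary(Path file) throws IOException {
        Set<String> dictionary = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                dictionary.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return dictionary;
    }

    /** Returns the words missing from the dictionary, once each, in document order. */
    public static Set<String> findUnknownWords(List<String> documentWords, Set<String> dictionary) {
        Set<String> unknown = new LinkedHashSet<>();
        for (String word : documentWords) {
            if (!dictionary.contains(word.toLowerCase(Locale.ROOT))) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny word list to a temporary file, standing in for the user's file.
        Path dictFile = Files.createTempFile("words", ".txt");
        Files.write(dictFile, Arrays.asList("hello", "the", "world"), StandardCharsets.UTF_8);
        Set<String> dictionary = loadDictionary(dictFile);

        List<String> words = Arrays.asList("Hello", "teh", "world", "Navigatr", "teh");
        System.out.println(findUnknownWords(words, dictionary)); // prints [teh, Navigatr]
        Files.delete(dictFile);
    }
}
```

A `LinkedHashSet` keeps each unknown word only once while preserving the order in which it first appears, which seems like a reasonable shape for a navigator list. Lower-casing both sides lets "Hello" at the start of a sentence match "hello" in the list, at the cost of no longer flagging proper nouns typed in lower case.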

Aug 17 2007, 10:02:57 AM PDT