Friday Jun 13, 2008

I've just posted a new piece of Minion documentation about how search results highlighting works.

It's kind of complicated, but then again getting the highlighting that you want is kind of complicated. The short version is: if you have a set of query terms and a document that you want to highlight that contains (some of) those terms, then:


  1. Tell the passage retrieval API what fields you want to highlight and how to treat the passages in that field.

  2. Use the passage retrieval algorithm to find a set of passages.

  3. Pull out the highlighted passages and display them.

Using the passage retrieval algorithm to find the passages has some handy side effects. For example, it easily handles finding morphological variations of the query terms.
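To make the three steps a bit more concrete, here's a toy sketch in Python. To be clear, this is not Minion's API: the function names, the `field_config` dictionary and the crude suffix-stripping "stemmer" are all made up for illustration, and the real passage retrieval algorithm does a lot more than this.

```python
import re

def crude_stem(word):
    """Hypothetical stand-in for real morphological analysis: strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def find_hits(text, query_terms):
    """Step 2: find character spans of words that match a query term or a variation of it."""
    stems = {crude_stem(t) for t in query_terms}
    return [(m.start(), m.end())
            for m in re.finditer(r"\w+", text)
            if crude_stem(m.group()) in stems]

def highlight(text, hits, open_tag, close_tag):
    """Step 3: wrap each hit in the configured markers."""
    out, pos = [], 0
    for start, end in hits:
        out.append(text[pos:start])
        out.append(open_tag + text[start:end] + close_tag)
        pos = end
    out.append(text[pos:])
    return "".join(out)

def highlight_fields(document, field_config, query_terms):
    """Step 1 is the field_config argument: which fields to highlight, and how."""
    result = {}
    for field, cfg in field_config.items():
        text = document.get(field, "")
        hits = find_hits(text, query_terms)
        if hits:
            result[field] = highlight(text, hits, cfg["open"], cfg["close"])
    return result
```

Calling `highlight_fields` with a document that has a `body` field, a config for `body`, and the query terms `highlight` and `result` would mark up "highlights" and "results" in the body, even though neither word appears in the query in exactly that form.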

A major improvement in this version over previous ones is that the process of figuring out how to build a passage of a particular size (e.g., you want to display a 500 character passage from the body of an email message) is a lot more robust.
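For a feel of what "building a passage of a particular size" involves, here's a minimal sketch in Python. It's only an illustration under my own assumptions (center the window on the hit, clamp it to the text, then pull the edges in to word boundaries); the `passage_window` function is hypothetical and is not how Minion actually does it.

```python
def passage_window(text, hit_start, hit_end, size):
    """Return a passage of at most `size` characters around a hit.

    The window is centered on the hit, clamped to the ends of the text,
    and then shrunk so that it does not start or end in the middle of a word.
    """
    hit_len = hit_end - hit_start
    width = max(size, hit_len)  # never cut the hit itself

    # Center the window on the hit, then clamp to the text.
    start = max(hit_start - (width - hit_len) // 2, 0)
    end = min(start + width, len(text))
    start = max(end - width, 0)  # reuse the slack if we ran off the end

    # Move the edges inward to the nearest word boundary.
    while 0 < start < hit_start and not text[start - 1].isspace():
        start += 1
    while hit_end < end < len(text) and not text[end].isspace():
        end -= 1

    return text[start:end].strip()
```

Even in this toy version you can see where the edge cases come from: hits near the start or end of the field, hits longer than the requested size, and windows that would otherwise chop a word in half.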

This blog copyright 2009 by searchguy