Out of the Woodwork


http://blogs.sun.com/woodjr/date/20070126 Friday January 26, 2007

A quick word about "A quick word about Googlebombs"

Google has just announced that they have tweaked their search algorithm in a way which "has begun minimizing the impact of many Googlebombs." I'm not sure whether I think that's a good thing or not. On one hand, susceptibility to any artificial manipulation of search results is probably bad. On the other hand, a little light-heartedness is one way that Google has always stood out as a company.

I have no such mixed feelings in looking at how Google announced this change, however. I think it's pathetic. Their blog entry essentially just says that the change is algorithmic and "very limited in scope and impact." Good intro, but how about some details?

Google bombs worked in the first place because Google's search algorithm assumes that what people say when they link to a page can be used to better understand that page. That idea is an important piece in the search puzzle, and I'd like to understand how their new algorithm changes impact it. Presumably, being "very limited in scope and impact" means that they somehow detect and ignore "bad" context in some links (which match some Google bomb profile) while still paying attention to "good" context in other links? Again, that sounds good (if my presumption is correct), but why not be more forthcoming with exactly what's being done? We all deserve to know if and how the wording around hyperlinks affects the target URL's status in search results.
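To make that mechanism concrete, here is a toy Python sketch of the idea that link text describes its target. It is not Google's algorithm: the function names, the example URLs, and the plain link-counting are all assumptions for illustration. Each link contributes its anchor text as a "vote" for what the target page is about, which is exactly the lever a Google bomb pulls.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Map each normalized anchor phrase to a Counter of the URLs it links to.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples.
    Hypothetical helper for illustration only.
    """
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        # Every link counts as one "vote" that `anchor` describes `target`.
        index[anchor.lower().strip()][target] += 1
    return index

def rank_by_anchor(index, query):
    """Return target URLs, most-linked first, for links whose text matches `query`."""
    votes = index.get(query.lower().strip(), Counter())
    return [url for url, _count in votes.most_common()]

# Hypothetical links: two pages use the same phrase to point at one target.
links = [
    ("http://a.example.com/", "miserable failure", "http://target.example.com/bio"),
    ("http://b.example.com/", "Miserable Failure", "http://target.example.com/bio"),
    ("http://c.example.com/", "miserable failure", "http://other.example.org/"),
]
index = build_anchor_index(links)
print(rank_by_anchor(index, "miserable failure"))
# ['http://target.example.com/bio', 'http://other.example.org/']
```

A real engine would also weigh each link by the authority of the page it comes from, and presumably apply filters like the ones Google alludes to; this sketch does neither.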

I realize that Google is in a very competitive space. Keeping a lead over the likes of Microsoft and Yahoo (if you believe they're leading) requires that Google keep some technical secrets to itself. But the key word is some. There is value in allowing everyone to understand the basics of how a key service such as Google search works. Their core PageRank technology fundamentally depends on us all "voting" with our hyperlinks. And as I've mentioned before, I think that there is an obligation to allow its "electorate" to learn how to best use those votes. That can certainly be accomplished without giving away every detail of their technology. But I think it requires more detail than just telling us that something is algorithmic and low-impact.
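The "voting" metaphor comes from how PageRank treats links. Below is a minimal Python sketch of the textbook power-iteration form of PageRank as originally published, assuming a damping factor of 0.85 and a fixed iteration count; Google's production ranking is far more elaborate and undisclosed, which is the point of this post. Each page splits its current score evenly among the pages it links to, so a link from a highly ranked page counts for more than one from an obscure page.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    `graph` maps each page to the list of pages it links to (its "votes").
    A page with no outgoing links spreads its score evenly over all pages.
    Returns a dict of scores that sum to 1.
    """
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score (the "random jump" term).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = graph.get(page, [])
            # Dangling pages (no outlinks) vote for every page equally.
            recipients = targets if targets else list(pages)
            share = damping * rank[page] / len(recipients)
            for target in recipients:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this example, page "c" collects votes from three pages and ends up with the highest score, "a" inherits much of that through its single vote from "c", and "b" and "d", which nobody links to, keep only the baseline (1 - 0.85) / 4.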

Comments:

The argument in favor of disclosure is less clear-cut to me than to you.

Referring first to your "other hand": I see no evidence to suggest that Google's lightheartedness is any less now that they've found a way to muffle the impact of some of the few hundred extant Google bombs. So far as I can tell, they tweaked their search algorithm so that it better detects what people say when they link to a page, and successfully weeds out or downplays what non-people (such as mechanically generated copies) say about the page. Isn't this simply a mild improvement in the search engine, akin to a better spam filter?

For example, a certain well-known prank played on a former senator has not been defused. Whether one finds the prank humorous or appalling, the fact is that it still stands, while the "miserable failure" prank does not. Because of Google's stellar reputation, I infer that the primary page devoted to the prank enjoys more actual net currency than do any pages devoted to the hapless ex-senator.

But to address your main point: I'm guessing that a team of intelligent humans at Google went through several hundred phrases that are Google bombs and several thousand similar phrases that are not, and found certain differences between the factitious pages constituting the bombs (apparently generated by automated processes) and the real pages constituting the non-bombs (apparently written by individual people). Suppose the differences found were any of the following:

  • For a Google-bombed phrase, referring pages in aggregate have a mutual-refer-to coefficient of 0.003 to 0.012, whereas for a non-bombed phrase, referring pages in aggregate have a mutual-refer-to coefficient of 0.21 to 2.75. If Google publishes this fact, Google bombers simply design their referring pages to have more mutual references, defeating the rule.
  • For a Google-bombed phrase, the sizes of the referring pages have a standard deviation of 0.13 to 0.16 times their mean, whereas for a non-bombed phrase, the standard deviation is 0.35 to 0.55 times the mean. If Google publishes this fact, Google bombers simply add varying amounts of meaningless verbiage to their pages, defeating the rule.
  • For a Google-bombed phrase, the distribution of domains amongst the referring pages, when normalized against the known domain traffic, resembles a Poisson distribution, whereas for a non-bombed phrase, the normalized distribution resembles a Gaussian distribution. If Google publishes this fact, then makers of other search engines might realize that calculating and caching statistics on domain distribution greatly increases performance and reliability, and Google would lose the equivalent of a trade secret.

I'm not suggesting that any of the above three is a serious possibility of how the algorithm was tweaked; I only mean to show plausible reasons for not disclosing the exact nature of the tweakage.
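The second hypothetical rule is easy to state in code, and doing so shows why publishing it would let bombers route around it. The following Python sketch is purely illustrative: it uses the comment's made-up thresholds (0.13 to 0.16), invented page sizes, and hypothetical function names, and it measures spread as the coefficient of variation (population standard deviation divided by the mean).

```python
import statistics

# Illustrative thresholds taken from the hypothetical example above;
# they are not real values from any search engine.
BOMB_CV_RANGE = (0.13, 0.16)

def coefficient_of_variation(sizes):
    """Population standard deviation of page sizes divided by their mean."""
    return statistics.pstdev(sizes) / statistics.fmean(sizes)

def looks_like_bomb(sizes, cv_range=BOMB_CV_RANGE):
    """Hypothetical rule: flag a phrase whose referring pages vary too little in size."""
    low, high = cv_range
    return low <= coefficient_of_variation(sizes) <= high

# Invented sizes (bytes) of machine-generated referring pages for one phrase.
generated_pages = [1000, 1200, 800, 1000]   # CV is about 0.141
# A bomber who knows the rule pads each page with a different amount of filler.
padding = [0, 900, 0, 1500]
padded_pages = [s + p for s, p in zip(generated_pages, padding)]  # CV is about 0.449

print(looks_like_bomb(generated_pages))  # True
print(looks_like_bomb(padded_pages))     # False
```

Adding uneven filler moves the padded pages into the 0.35 to 0.55 band that the hypothetical rule associates with genuine pages, so the flag no longer fires.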

Posted by Paul Carpenter on January 26, 2007 at 05:17 PM MST

Hi Paul. Thanks for the very thoughtful comments.

With regard to the "lightheartedness" of the company, I just get the feeling that their attitude used to be that Google bombs weren't a big deal because they only impacted obscure terms and were just another form of information from valid web pages. For example, in a 2004 USA TODAY story, Google seemed a lot less "uptight" about the issue:

But Craig Silverstein, Google's technology director, says, "Our philosophy is that there's no need (for Google) to do anything." Even search results fueled by "Google bomb" campaigns "are appropriate," Silverstein says. "There's an association that people make from the word or phrase to the results."

But again, my main concern is not about the life or death of Google bombs. It's about whether or not Google tells us all the basics of how their search service works.

You're right that full disclosure of their new algorithm will be used by some to search for a way to defeat it and produce new Google bombs. They're going to do that anyway. By avoiding the disclosure of any substantive details, Google is at best just delaying them a bit. Some may say that's a good thing, but others would just say it's akin to "security through obscurity" (which is generally looked upon with low regard). You either have good protection or you don't.

In this case, good protection would mean a system where "bombing" a search result would require so much quantity and authority in references that it could no longer be considered "manipulation" and would instead be "the will of the web." After all, no one complains that "legitimate" links are manipulating search results (even though they do affect them).

Of course, one can argue that the absence of a reason for avoiding disclosure isn't the same as having a reason to explicitly pursue disclosure. True. But I think there is such a reason here. Google has, in many ways, taken a central position as the Web's election commissioner. Through technologies like PageRank, we all have a voice in determining which pages are "best." But in the end, it's Google (and to a lesser degree, Microsoft and Yahoo) which makes the final tally. When there is some change which can impact how votes are counted, it's only fair that they give the public some reasonable level of guidance about the change (and how to make sure that their legitimate votes will be counted).

Posted by Jamey Wood on January 30, 2007 at 10:31 AM MST


This is a personal weblog; I do not speak for my employer.