Out of the Woodwork


http://blogs.sun.com/woodjr/date/20070321 Wednesday March 21, 2007

Sun Related Trends

Dan Farber has a post talking about some Q&A with Jonathan Schwartz at a recent "Mashup Event" at a Sun campus.

His last paragraph is what caught my attention:

Sun's stock price has been trending upwards, and would seem to correlate with what Schwartz said he found in checking Google Trends for keywords associated with Sun–such as NetBeans, GlassFish and Niagara–are up and to the right. "Word of mouth is a way more efficient than buying ad words," he concluded. Based on the Google Trends chart below, it's unclear just high and to the right the keywords are trending.
Google Trends Graph for Sun-related Terms in 2006

(The chart that Farber references.)

I think his chart is a bit misleading. For one thing, it only covers 2006. We're almost a full quarter into 2007. So I think it's worth looking at that data:

Google Trends Graph for Sun-related Terms in 2007

(Same chart, but covering 2007.)

If you look closely at the 2007 picture, you'll see that the Sun-related terms (NetBeans, GlassFish, and Niagara) do all trend up. That's better, but this chart still suffers from a second problem: combining so many terms with such different search volumes makes it hard to see the trends. In other words, you shouldn't have to look so closely.

So let's look at charts for each of the Sun-related terms individually. (Also note that these charts use Google's "All Years" time period, since I don't want to run into the aforementioned issues with just seeing 2006 or 2007 data.)

Google Trends Graph for "NetBeans"

(Google Trends Chart for the "NetBeans" term.)

Google Trends Graph for "GlassFish"

(Google Trends Chart for the "GlassFish" term.)

Google Trends Graph for "Niagara"

(Google Trends Chart for the "Niagara" term.)

That's better. Both NetBeans and GlassFish clearly do demonstrate "up and to the right" trend growth in these pictures. The Niagara picture might be seen as showing a stagnant overall trend, but I think that too can be addressed if we dig a little deeper.

The "Niagara" term is too ambiguous. While we at Sun (and hopefully any of you reading) think of it as the code name for our UltraSPARC T1 processors, most of the rest of the world thinks of it as a waterfall (or, as the dictionary tells us: a river, a fort, or a variety of grape). Searches from people seeking those kinds of "Niagara" are going to clutter up the trend chart (from our perspective).

So, let's instead look at some less ambiguous terms related to this Sun product. How about the actual server models which use the Niagara processor--the Sun Fire T1000 and T2000?

Google Trends Graph for "T1000"

(Google Trends Chart for "T1000" term.)

Google Trends Graph for "T2000"

(Google Trends Chart for "T2000" term.)

A bit better, perhaps. I realize that any true skeptics out there will argue that these show stagnant or even declining trends since 2006. But I think there are a few reasons to give this product line the benefit of the doubt. For one thing, people's searches for it are likely to be spread across many different terms (such as "Niagara," "CoolThreads," "UltraSPARC T1," "T1000," and "T2000"). Second, a throughput server such as this might appeal most to people who are already familiar enough with Sun that they search for it directly on sun.com sites more often than on general search engines such as Google. And finally, Sun has publicly released sales figures which do demonstrate a lot of traction and momentum for these servers. For a hardware product, revenue probably trumps Google Trends as a momentum indicator.

By digging deeper, we have seen that there definitely is an up trend for two of these offerings (NetBeans and GlassFish). You may or may not feel that we've also established momentum for the Niagara offering. But even if you do discount that one, two out of three isn't bad. I'd say it's pretty supportive of Jonathan's original statement that we're seeing good momentum for Sun-related offerings.

http://blogs.sun.com/woodjr/date/20070320 Tuesday March 20, 2007

I'm a Mac. I'm a PC. And I'm Linux.

There's no denying the brilliance of Apple's "I'm a Mac ... and I'm a PC" commercials. Ted Haeger does a nice job of explaining how they put the Mac in the best possible light by playing off of our existing perceptions, "framing" the conversation in favorable either-or terms, and by just being funny and clever. Whether you like the product or not, you've got to appreciate its marketing.

Ted goes on to look at attempts to redirect the popularity and momentum of the ads, such as with spoofs inserting a Linux character. As he notes, these probably haven't done a very good job of making Linux look its best.

(Though in all fairness, I think the above was clearly intended just to be funny--not as an attempt to mold Linux's public image.)

Ted's clearly an optimist, though, and has set out to create his own spoofs which do make Linux look good. He describes in great detail how he and others at Novell tried to break the "either-or" framing of Apple's original commercials with a spoof casting Linux as a sexy female (though not too sexy--see his blog for the full reasoning).

The results are interesting, as is Ted's description of the thought process behind them. But I walked away thinking about one detail he didn't address. This was the work of Novell? As in the company which is well on its way to destroying any credibility it may have once had with the Linux community?

I could be wrong, but... Don't they have more immediate concerns than trying to sell Linux to the masses?

Does Google Track Search Result Clicks?

A lot of bloggers are talking about Google's patent application for a method of ranking blog search results. As Bill Slawski and Alex Chitu have noted, the method breaks down into a set of factors which provide positive and negative scoring influences. I won't repeat them all here, but I did find one of the positive factors particularly interesting: the implied popularity of a blog, as determined from click stream analysis in search results.

In other words, if users consistently click on a result from Blog A more often than one from Blog B when both show up in the results for a given search (such as on blogsearch.google.com), it can be seen as an indication that Blog A is more popular and/or of higher quality than Blog B. Pretty obvious stuff. Right?

Sure. And it's also pretty obvious that the same idea can be applied to non-blog resources (such as general web results returned by www.google.com or image results from images.google.com).

The question is... How would Google actually obtain this data?

Normally, the page which presents a hyperlink isn't notified when it's clicked. There are ways around this (such as using special JavaScript or pointing the hyperlink to an intermediate "redirector" service), but I don't see any evidence in Google's pages that they're employing these mechanisms in their regular search results (though paid ads are a different matter).

So when you click on a Google search result, Google should never know it.

But wait... There is a good chance that they do know it. If you use Google's toolbar and enable the "PageRank Display" feature, they'll know about this click (and all of your others, for that matter). Or if the final destination happens to use certain of Google's server-side services (such as AdSense or Google Analytics), they'll likewise know about it (and all other access to that site).

So does this imperfect but growing view of users' behavior on non-Google sites provide enough data to plug into their search ranking algorithms? Probably. And it's one more example of how a web giant such as Google is gaining a "moat" of data which guards against smaller competitors.
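To make the "redirector" workaround and the click-comparison idea from this entry a bit more concrete, here is a minimal Python sketch. It is purely illustrative: the redirector address (search.example), the parameter names q and url, and the functions tracking_link, record_click, and click_share are all made up for this example and say nothing about how Google or any other search engine actually works.

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical redirector endpoint (an assumption for this sketch).
REDIRECTOR = "https://search.example/redirect"


def tracking_link(query, target):
    """Build a result link that passes through the redirector first."""
    return REDIRECTOR + "?" + urlencode({"q": query, "url": target})


def record_click(link, log):
    """What the redirector would do: log the click, then return the
    real destination so the browser can be sent on to it."""
    params = parse_qs(urlparse(link).query)
    query = params["q"][0]
    target = params["url"][0]
    log[(query, target)] += 1
    return target


def click_share(log, query, url_a, url_b):
    """Fraction of clicks that went to url_a when comparing only
    url_a and url_b for the same query (None if neither was clicked)."""
    a = log[(query, url_a)]
    b = log[(query, url_b)]
    total = a + b
    return a / total if total else None


if __name__ == "__main__":
    log = Counter()
    blog_a = "http://blog-a.example/post?id=1"
    blog_b = "http://blog-b.example/entry"
    for target in (blog_a, blog_a, blog_a, blog_b):
        record_click(tracking_link("java ide", target), log)
    print(click_share(log, "java ide", blog_a, blog_b))
```

The point of the sketch is only that the click has to pass through something the search engine controls before it can be counted. Without a redirector (or script) in the path, the log above never gets written.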

http://blogs.sun.com/woodjr/date/20070316 Friday March 16, 2007

Product Quality Heatmaps

Read/WriteWeb has an interesting look at heatmap visualizations. In particular, they focus on Summize, a site specializing in product reviews.

Summize allows users to vote on the quality of products. What's new and interesting is how they present those voting results--with heatmaps. Here is an example:

The colored stripe is a heatmap showing what percentage of users think the iPod Nano is great (the green: 44%), what percentage think it's wretched (the red: 12%), and those in between (the orange, yellow, and yellowish green). One nice thing about this visualization is that it works well even when the heatmap image is small. So, for example, they use a scaled-down version of the stripe next to each item in search results.

The end result is a nice way to see and understand a lot of information packed into a small space--the very definition of a good visualization.
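For the curious, the basic idea behind such a stripe can be sketched in a few lines of Python. This is only a guess at the general technique, not Summize's actual code: the function names are made up, single letters stand in for colors, and the vote counts are hypothetical (only the 12% red and 44% green figures come from the example above; the middle buckets are invented to fill out the total).

```python
# Bucket labels from worst to best; letters stand in for colors:
# red, orange, yellow, yellowish green (L), green.
BUCKETS = ["R", "O", "Y", "L", "G"]


def stripe_widths(counts, width):
    """Split `width` cells among the buckets in proportion to their
    vote counts, using integer math so the widths always sum to `width`."""
    total = sum(counts)
    if total == 0:
        return [0] * len(counts)
    widths = [c * width // total for c in counts]
    remainders = [c * width % total for c in counts]
    leftover = width - sum(widths)
    # Hand the cells lost to rounding to the largest remainders.
    by_remainder = sorted(
        range(len(counts)), key=lambda i: remainders[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        widths[i] += 1
    return widths


def render_stripe(counts, width):
    """Render the stripe as text, one character per cell."""
    widths = stripe_widths(counts, width)
    return "".join(label * w for label, w in zip(BUCKETS, widths))


if __name__ == "__main__":
    # Hypothetical votes per bucket, worst to best (100 votes total).
    votes = [12, 10, 14, 20, 44]
    print(render_stripe(votes, 50))  # full-size stripe
    print(render_stripe(votes, 10))  # scaled-down version for result lists
```

Because the widths are recomputed for each size, the same votes produce both the full stripe and the small one used in search results.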

http://blogs.sun.com/woodjr/date/20070315 Thursday March 15, 2007

Privacy and the Private Sector

Big Brother Is Watching

How would you feel if you saw this headline on a search form? I bet the "I'm Feeling Lucky" button would take on a whole new light, for one thing.

In many ways, it's already happening. Major search engines keep records of every one of your searches. Tracing these records back to you depends on many factors: whether you've received a tracking cookie by logging into other services from that company, whether your ISP has assigned you a static IP address, whether you use a large or small ISP, and more. But the core point is this: by retaining search logs, these companies place your privacy at risk.

Google recently announced that they will be anonymizing search logs after 18-24 months. It's better than their old approach (retaining all information indefinitely). But is it good enough? Your searches in the last 18-24 months probably add up to a pretty interesting picture. It can be scary to think how accurate that picture might be. Even scarier is thinking about where its accuracy would be an illusion.

Take the case of Thelma Arnold, for example. She is the 62-year-old widow who was identified from "anonymized" search records which AOL deliberately exposed in 2006. She's not a terrorist, a drug dealer, or a sex addict. So she shouldn't have anything to hide. Right?

Maybe.

As the NY Times article reports, "Her search history includes 'hand tremors,' 'nicotine effects on the body,' 'dry mouth' and 'bipolar.'" Yikes. Hope Thelma isn't looking for health insurance... Or life insurance... Or a job with a company wanting to minimize the cost of insuring employees... Or anything else where this picture of her health could be held against her.

The worst part? It isn't a picture of her health at all. It's her friends' health. As the Times article continues: "Ms. Arnold said she routinely researched medical conditions for her friends to assuage their anxieties. Explaining her queries about nicotine, for example, she said: 'I have a friend who needs to quit smoking and I want to help her do it.'"

But aren't Ms. Arnold and the foolish release of AOL's search records a special situation? No company would follow in those footsteps after seeing the grilling AOL took. Right? Maybe. But why do they leave the possibility open by retaining these logs? Could one disgruntled employee expose the logs to harm the company? Could a failing company sell off the logs as a final way to salvage assets? Could one company become so large and involved in so many different fields that the Big Brother scenarios we fear could occur entirely within its own corporate boundaries?

Or could widespread tracking and sharing of online activity data just become a standard part of business? Look no further than our all-important credit reports to see how the monitoring of our personal information can become deeply ingrained into the private sector. Is it really so far-fetched to imagine a similar system built on information culled from our online activities?

George Orwell was brilliant in highlighting the importance of privacy to everyone (not just "bad guys" with something to hide). He was brilliant in foreseeing the clash between technology and privacy. Did his one error come in choosing a villain? Maybe the government isn't the primary threat.

Maybe Big Brother will be born out of Big Business.

http://blogs.sun.com/woodjr/date/20070309 Friday March 09, 2007

How Much is the "Blog Worth" Meme Worth?

Sun is currently experiencing an outbreak of "How Much is Your Blog Worth" references (1, 2, 3, 4, 5, 6). They aren't the first, though... Rich Burridge got into the act back in November, and Rich Sharples was truly ahead of his time with a mention back in October 2005.

It's a fun (though meaningless) way to look at your blog's popularity. But I wonder... What happens if we ask the tool about its own worth? Will it devour itself in an infinite loop of self-examination?

Guess not...


The "blog worth" page is worth $3,916,213.98.
How much is your blog worth?

And given that this guy has over 35,000 pages linking to him (and plasters his page with advertising), maybe this is one case where the number really does have some meaning.

So providing bloggers with a Monopoly-money valuation of themselves turns out to be a $4M idea? Scary thought.

http://blogs.sun.com/woodjr/date/20070308 Thursday March 08, 2007

The Scientific Strengths of Nations

Thumbnail Image of "The Strengths of Nations Data" Visualization

W. Bradford Paley has created some fascinating and beautiful "Map of Science" visualizations. They show how scientific fields (or "paradigms") relate to one another, based on how often academic papers in each area reference one another. There is even an offer to provide a nice poster of their work for just the cost of shipping and handling.

The latter provides the best version of the base image, in my opinion. It includes category labels (such as "Quantum Physics" and "Biochemistry"), which make it easier to see high-level trends. For example, it appears that their algorithm places Computer Science in closer relation to Brain Research than to Math--something which I find interesting (assuming I'm reading the chart correctly).

But I find a second picture even more interesting. "The Strengths of Nations" visualization uses the same technique, but creates a separate image representing the scientific work of each country. By comparing different countries' images, we can see where countries over-weight or under-weight work in different scientific fields. One example, as Paley explains, is:

Even at this gross reduction, you can see image variations relating to how the US treats science (the large map: heavy in the Medical Sciences at the lower left) and, say, China (top of the rightmost column: heavy in Physics, the nodes at the upper right).

Interesting, isn't it?

By the way, when looking at the high-res versions of these images, be sure your browser isn't scaling them down (or you won't see much). To avoid scaling, you may need to click on the image a second time once it comes up in your browser. And to give credit where it's due... I found these via a post on infosthetics.com.

http://blogs.sun.com/woodjr/date/20070307 Wednesday March 07, 2007

Google Maps Super-Zoom

Yes, that really is a screenshot of a Google Maps view showing a couple of guys, their camels, and a yak. Philipp Lenssen of Google Blogoscoped has details showing how it's possible to zoom in beyond the normal limit for the satellite views of certain areas in Google Maps. (If you're really impatient or skeptical, here is a direct link to the Google Maps view.)

Unfortunately, it doesn't look like they have similar high-res imaging for the Java logo on Sun's SCA14 Building. If anyone in the area happens to be a pilot (or has some other way of obtaining such an aerial image), let me know. I could probably weave it into our own map.

http://blogs.sun.com/woodjr/date/20070306 Tuesday March 06, 2007

Is NoFollow Misnamed or Not?

Conventional wisdom is that the rel="nofollow" mechanism is misnamed. As the current version of the NoFollow Wikipedia article says:

rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document."

But... Recently Matt Cutts (a Google specialist in SEO issues) has contradicted that. Specifically, a forum participant asked:

...does nofollow really prevent Google from crawling a page?

And Matt responded:

...if a page would have been found anyway via other links, it doesn't prevent crawling of that page. But I believe that if the only link to a page is a nofollow link, Google won't follow that link to the destination page.

So he's saying that rel="nofollow" really does mean "don't follow" (at least to Google), and that the conventional wisdom (and Wikipedia article) are wrong?

Is that right? It'd be nice to have a definitive answer, given the "I believe" opening in Matt's statement.
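In the meantime, the behavior Matt describes can at least be modeled. The Python sketch below is a toy crawler over an in-memory set of pages. It is an illustration of the "don't follow" reading only, not a description of Google's crawler. The page names, the LinkCollector class, and the discover function are all hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, is_nofollow) pairs from the <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated list of tokens, e.g. rel="external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def discover(pages, start):
    """Return the set of pages a crawler reaches from `start` if it
    refuses to follow nofollow links (the "don't follow" reading).
    `pages` maps a URL to its HTML; unknown URLs count as empty pages."""
    seen = {start}
    queue = [start]
    while queue:
        url = queue.pop(0)
        parser = LinkCollector()
        parser.feed(pages.get(url, ""))
        for href, nofollow in parser.links:
            if nofollow or href in seen:
                continue
            seen.add(href)
            queue.append(href)
    return seen


if __name__ == "__main__":
    # Hypothetical site: "b" is linked with nofollow from "a",
    # but also with a normal link from "c".
    pages = {
        "a": '<a href="b" rel="nofollow">b</a> <a href="c">c</a>',
        "c": '<a href="b">b</a>',
        "b": "",
    }
    print(sorted(discover(pages, "a")))
```

Under this model, a page whose only inbound link is a nofollow link is never discovered, while a page that is also reachable through a normal link still gets crawled. That matches both halves of Matt's answer.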

http://blogs.sun.com/woodjr/date/20070305 Monday March 05, 2007

World's Oldest Blogger

Olive Riley is 107 years old. She started a blog last month (with some help on the typing and technical details).

So far, readers have left 450 comments on her blog. Technorati shows that 300 other blogs now link to it. And it's been a very popular feature on the front page of Digg.

Not bad for just over two weeks of blogging.

And please do actually read the entries. They're quite interesting. I learned about Shandy (a drink made of half beer, half lemonade), why zoning restrictions are prohibiting lettuce farming in an agricultural area near Ms. Riley, why she voluntarily had her teeth removed in her early thirties, and why she already has fresh olives planned for her menu in 2010.

http://blogs.sun.com/woodjr/date/20070304 Sunday March 04, 2007

The 100 Oldest Currently-Registered .COM Domains

Digg just featured a list of the 100 oldest currently-registered dot-com domains. Sun and IBM are tied for #11 (having both registered on March 19, 1986). HP's domain is older by just over two weeks. And Microsoft doesn't even make the list.

http://blogs.sun.com/woodjr/date/20070302 Friday March 02, 2007

NoFollow Considered Harmful?

I've noticed a fair number of people recently calling the rel="nofollow" mechanism a failure and calling for its end. Loren Baker is one such voice, with a post called "13 Reasons Why NoFollow Tags Suck". Andy Beal is another, with a post entitled "Google’s Lasnik Wishes 'NoFollow Didn’t Exist'".

I'm on the opposite side of this argument. As I mentioned a while back, I think that web pages need even more control over the "voting intent" of hyperlinks. So instead of sending NoFollow to its grave, I'd like to see it extended (though probably with a new name and format, such as the Vote Links microformat).

I don't want to re-hash that discussion today. Instead, I want to examine the most prominent argument from the anti-NoFollow crowd: that it just doesn't work. Comment spam has increased in blogs since the time when NoFollow was introduced. Because of that, these people argue that NoFollow is an outright failure and isn't needed in the first place because any good blogger is vigilant in moderating comments.

Again, I disagree. Of course comment spam has increased. Blogging and spamming both have little barrier to entry and high growth. It was inevitable that comment spam would increase, even if the benefit to the spammer for each instance was reduced (which NoFollow ensures, by eliminating any PageRank bonus). But that growth alone doesn't mean that NoFollow is a failure. If a disease grows, do we assume that all related medical treatments and research are failures and should be stopped?

Comment spam would be even worse if the NoFollow mechanism didn't exist. Its practitioners would be multiplied because every shady marketing guide around would be touting "amazing benefits" of using blog comments to increase one's standing in Google.

Even if I'm wrong and NoFollow has done nothing to reduce comment spam, at least it has protected the quality of search results. Google isn't the only one with a vested interest in maintaining quality search results. We would all suffer if we had to go back to the "bad old days" of low-quality web search.

What about the idea that any good blog will have vigilantly moderated comments and make NoFollow irrelevant? Good moderation of blog comments is very important. But the argument that it can displace NoFollow assumes that blatant spam is the only threat. As I mentioned in my "Hyperlinks as Votes" entry, a PageRank-style system in part depends upon us each voting in our own "name" (URL). Without NoFollow, that system breaks down with hyperlinks coming from your URL which aren't spam but also aren't something you would intend to positively endorse.

Suppose I post a comment on your blog with a link back to an entry of my own which is completely relevant but disagrees with you at every turn. It isn't spam. And unless you're particularly thin-skinned, you probably shouldn't exercise your moderation power to delete it. But should search engines interpret that link to be your positive vote for the quality or importance of my page? And even if you think it should, would you want that vote to be of the same strength as one given to something which you directly referenced in the body of your post?

It isn't time for NoFollow to go away. It's time for it to grow up into something more powerful and expressive.
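As a rough illustration of the "links as votes" argument, here is a heavily simplified PageRank-style calculation in Python in which nofollow links simply cast no vote. It is a textbook-style sketch under stated assumptions, not Google's actual algorithm: the page names, the damping value of 0.85, and the pagerank function itself are all just for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated iteration.
    `links` maps a page to a list of (target, is_nofollow) pairs.
    Nofollow links are ignored when a page's vote is split up."""
    pages = set(links)
    for outs in links.values():
        pages.update(target for target, _ in outs)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            votes = [t for t, nofollow in links.get(p, []) if not nofollow]
            if votes:
                share = damping * rank[p] / len(votes)
                for t in votes:
                    new[t] += share
            else:
                # A page casting no votes spreads its rank evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical blog post: one link in the body, one in a comment.
    comment_counts = pagerank(
        {"blog": [("cited", False), ("commenter", False)]}
    )
    comment_ignored = pagerank(
        {"blog": [("cited", False), ("commenter", True)]}
    )
    print(comment_counts["commenter"], comment_ignored["commenter"])
```

In the first run the comment link gets the same share of the blog's vote as the link in the post body. In the second it gets none. What the sketch cannot express is anything in between, which is exactly the kind of graded or negative vote a richer mechanism would need to carry.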


This is a personal weblog, I do not speak for my employer.