Friday Sep 19, 2008
I owe the web analytics community an apology. I was doing some analysis yesterday that made me realize that my Omniture formulas for the hard bounce rate and the soft bounce rate didn't match my definitions. So here are my corrected formulas:
Originally, I had the soft bounce rate as ( [exits] - [single access] ) / [visits], but then I realized that the definition requires the denominator to include only the multi-page visits. In light of this, the formula can now be stated as the number of exits that were not part of single-page visits divided by the number of visits that were not single-page visits. Here we are trying to answer the question, "Of visitors who have visited at least one other page before this one, what percentage end their visit on this page?"
I also incorrectly had visits in the denominator for the hard bounce rate, when it should be entries. So the formula in plain English is the number of single-page visits divided by the number of entries through that page. Here we are trying to answer the question, "Of visitors who start on this page, what percentage don't visit any other pages?"
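To make the two corrected formulas concrete, here is a minimal Python sketch. The function and parameter names are my own illustration, not Omniture syntax; the inputs are assumed to be the page-level counts of visits, entries, exits and single access for the same reporting period, and the sample numbers are made up.

```python
def hard_bounce_rate(single_access: int, entries: int) -> float:
    """Single-page visits divided by entries through the page."""
    if entries == 0:
        return 0.0  # no entries through this page, so nothing to bounce
    return single_access / entries


def soft_bounce_rate(exits: int, single_access: int, visits: int) -> float:
    """Exits that were not single-page visits divided by multi-page visits."""
    multi_page_visits = visits - single_access
    if multi_page_visits <= 0:
        return 0.0  # every visit to this page was a single-page visit
    return (exits - single_access) / multi_page_visits


# Hypothetical counts for one page (illustration only).
visits, entries, exits, single_access = 1000, 400, 300, 100
hard = hard_bounce_rate(single_access, entries)        # 100 / 400
soft = soft_bounce_rate(exits, single_access, visits)  # 200 / 900
```

With these made-up counts, a quarter of the visitors who started on the page left without seeing anything else, and roughly 22% of the visitors who arrived from another page ended their visit here.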
One caveat I have thought of is the visitor who lands on a page, refreshes it, then leaves. Consider the refresh rate when interpreting the soft bounce rate, because such a visit adds to the soft bounce rate when you would probably consider it a hard bounce. That said, not every refresh is a two-page visit.
Friday Sep 12, 2008
I introduce you to the moving average. The concept is simple: replace each data point with the average of the last X data points (including that data point). No more looking at graphs like the following and trying to understand the overall trend...
Instead, use a moving average graph like the following twelve-month moving average to filter out seasonal noise and get a clear picture of what is going on...
Omniture SiteCatalyst has a built-in feature to do this.
I have a strong warning to give about an ugly mistake you can only make once: make sure you have enough data points preceding your starting point. For example, in the above graph, the first data point with a calculated moving average is September. For this graph to be correct, the preceding eleven months need to have complete data. For this data set, the month preceding the first October is when tracking began, so the twelve-month moving average can only be calculated twelve months from October. If I had started my moving average at the first June, then the moving average would be very low because I would be missing three months of data in my computation (Jul-Sep).
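For anyone computing this outside SiteCatalyst, here is a minimal Python sketch of a trailing moving average that builds the warning in. This is my own illustration, not how SiteCatalyst does it; the function name and the convention of using None for months without complete data are assumptions. A point only gets an average when the full window of preceding data exists.

```python
from typing import List, Optional


def trailing_moving_average(
    values: List[Optional[float]], window: int
) -> List[Optional[float]]:
    """Average of the last `window` points, including the current one.

    Returns None wherever the window is incomplete, so a missing month
    never silently drags the average down.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough preceding points yet
            continue
        chunk = values[i + 1 - window : i + 1]
        if any(v is None for v in chunk):
            result.append(None)  # window reaches into untracked months
        else:
            result.append(sum(chunk) / window)
    return result


# Hypothetical series: tracking starts in the fourth period.
series = [None, None, None, 10, 20, 30, 40]
smoothed = trailing_moving_average(series, window=3)
```

Treating the untracked months as zeros instead of None is exactly the mistake described above: the early averages would come out far too low.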
Wednesday Aug 20, 2008
I'm kicking myself for taking so long to try out del.icio.us. I've been hooked overnight. If you don't use it, all I can say is that you are missing a huge opportunity!
Benefits:
It's these last two items that make del.icio.us an indispensable web analytics tool. So go get a del.icio.us account and listen to your customers.
Tips:
Monday Aug 11, 2008
Now that we've been tracking feeds with FeedBurner for quite a while, the big question is, "How many total subscribers do I actually have?" Don't get me wrong, FeedBurner is great, but I'm astounded that they haven't addressed this question. It's all over the Google FeedBurner group. It's nice to see the daily number of subscribers, but seriously, what's the total? Here's my attempt at estimating the total number of subscribers to your feed with FeedBurner that I posted in my response to the group:
1) Take the average subscribers over the last 7 days, or the last 30 days, depending on how often you post.
2) Multiply the average subscribers for this time period by the number of days in this time period.
3) Divide the number from step 2) by the total number of posts in the same time period.
My assumption is that most subscribers will read each post one time. Those subscribers who read some posts more than once will cause this number to be a little high. This depends a lot on your content too. It also assumes that when they re-read they have to access your feed again, but this differs between feed readers. I would guess that web-based feed readers won't cache all feeds. However, it looks like Google Reader does, because I've tried modifying a feed and the changes don't appear. Feed readers could also decide whether to cache a feed based on how many of their own users subscribe to it, using a minimum-subscriber requirement. For example, Google Reader may not cache a feed unless over 50 of its users subscribe to that feed (this is not necessarily true, just an example). In reality, this calculation may be more accurate if most feed readers (web-based or otherwise) do cache feeds.
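The three steps above can be sketched in a few lines of Python. This is only an illustration of my estimate, not anything FeedBurner provides; the function name and the sample numbers are made up, and the input is assumed to be the daily subscriber counts copied from the FeedBurner report.

```python
from typing import List


def estimate_total_subscribers(
    daily_subscribers: List[float], posts_in_period: int
) -> float:
    """Estimate total subscribers from daily counts and post volume."""
    if not daily_subscribers:
        raise ValueError("need at least one day of subscriber counts")
    if posts_in_period <= 0:
        raise ValueError("need at least one post in the period")
    days = len(daily_subscribers)
    average = sum(daily_subscribers) / days  # step 1
    subscriber_days = average * days         # step 2
    return subscriber_days / posts_in_period  # step 3


# Hypothetical week: seven daily counts and two posts.
week = [100, 120, 80, 100, 110, 90, 100]
estimate = estimate_total_subscribers(week, posts_in_period=2)
```

For this made-up week the average is 100 subscribers a day, so the estimate is 100 x 7 / 2 = 350 total subscribers, subject to the caching caveats above.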
Tuesday Jul 29, 2008
How about a little insight into web analysts' behaviour instead of visitor behaviour today?
The web analytics vendor (Omniture) we use provides us with four different tools that offer varying degrees of complexity. From a "big picture" tool, ClickMap, all the way to a deep dive tool, Data Warehouse.
In discussion with a couple of other Sun web analysts recently I made the comment that there are "too many tools. I guess you can't build a house with just a hammer." The tool you need really depends on how much precision, data confidence and expense you want. We do this decision making continually and think nothing of it.
It is very important to consider your confidence level with each tool's output. Most of my work is done somewhere in between, with SiteCatalyst and, to some extent, Discover. I just don't trust ClickMap very much. I hardly use Data Warehouse because of time constraints, which is another factor web analysts consider when choosing a tool. I usually don't want to wait days for information, so I usually prefer a lower level of confidence over a long wait time.
Another factor is cost. The smaller the level of confidence, the shorter the wait and the smaller the cost. Cost is really a factor of wait time and is not determined by the level of confidence. The level of confidence is a by-product of the wait time. Quick is cheap and dirty. Lengthy waits beget precision and are more expensive. You can see that precision and level of confidence are intertwined. Cost begets precision which begets confidence level.
What I find interesting is that we usually skip the first step mentally - Cost. We just automatically assume that we can't wait and can't spend money. While the latter is usually true, wait time isn't always as expensive as we think it is. In this fast-paced world, faster is assumed to be better even if precision is lost, so long as our level of confidence doesn't run dry. I guess this is just how business works. We strive for a balance between precision and confidence level to get answers as quickly as possible at the smallest cost.
I'm curious about how much you think about this on a day to day basis. Do you prefer low cost, high precision or high confidence levels in your analysis? In which situations would you prefer one over the others?
Happy analyzing your analysis behaviour!