Web Analytics Analyzed
Strupp's Weblog

Monday August 24, 2009

Cookie Retention Rates

I've decided that I no longer care about cookie deletion.

I now care about cookie retention. It's more valuable from a business standpoint to know how long users keep their cookies than to know how frequently they delete them. After all, if you are trying to say something about long-term conversion rates, or other customer behavior that drags on for a while, it's easier to think in the positive. At least, the cow playing the ukulele in my head has been much happier since I started looking at the data this way.

Below is a chart that shows what percent of users returning after X days on the same computer still have the same cookie.

The data shows a very sharp drop-off in the first few days (72% retention after 7 days), with the rate of decay slowing as the time span gets longer. After a year the cookie retention rate levels out at around 20%.

This means that if a user is going to delete their cookie, they do it fairly quickly. After about 30 days the rate of decay becomes much slower, suggesting a different mechanism is at play. One can suppose that the daily, weekly, and monthly deletion mechanisms are more deliberate, probably cookie deletion software and manual deletion. The longer-term decay could come from less deliberate, nonsystematic mechanisms, like accidental deletion or deletion for some isolated specific cause, such as resetting one's browser settings to repair a technical problem.

This data is important to understand when considering business processes that take a long time. Consider, for example, a B2B conversion process that takes six months from initial click-through to closing a deal. The highest conversion rate you could hope to measure would be around 35%. In other words, if 100% of the users who clicked on your ad on January 1 made a purchase six months later on July 1, your measured conversion rate would be around 35%, because that's how many users still have the same cookie.

Conversely, if your measured conversion rate after 180 days was 10%,  then the actual conversion rate was more like 10%/0.35 = 29%. 
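
The scaling above can be written as a tiny helper. This is only a sketch of the arithmetic in this post: the function name is made up, and capping the result at 100% is my own assumption rather than part of the original calculation.

```python
def corrected_conversion_rate(measured_rate, retention_rate):
    """Scale a cookie-measured conversion rate up by the cookie retention rate."""
    if not 0 < retention_rate <= 1:
        raise ValueError("retention_rate must be in (0, 1]")
    # Only users who kept their cookie can be matched to a conversion,
    # so the measured rate is roughly actual_rate * retention_rate.
    # The cap at 1.0 is an assumption, to keep the estimate a valid rate.
    return min(measured_rate / retention_rate, 1.0)


# 10% measured after 180 days, with roughly 35% cookie retention
print(round(corrected_conversion_rate(0.10, 0.35), 2))  # 0.29
```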

Now, you would have to be pretty brave to stand up in front of a room and actually scale up your conversion rates based on cookie retention rates, but the point is that you can at least point in the right direction as to what your real conversion rates might be.

This finding is also important for optimizing intelligent systems which deliver personalized content to users based on anonymous (non-logged-in) cookie behavior. Basically, any learning mechanism which takes more than 30 days to fine-tune recommended content for the end user is largely shooting in the dark.

How Did We Make this Calculation?

The data is based on a first party metrics cookie on a software developer forums site.  It is the same data that I've been discussing in my last few blog posts.

Each data point in the chart is calculated as follows. 

For 1 day elapsed time, we compared data from January 2 to January 1. We identified user names that had logged in on both Jan 1 and Jan 2. We kept just the return user names that had the same IP, OS, and browser on both dates, to try to limit the sample to return users on the same computer.

Then, for the two dates, we compared the visitor IDs for each return user name and counted how many were unchanged.

We repeat that calculation for another 1 day elapsed time increment of Jan 3-Jan 2. Repeat again for the segment of Jan 4-Jan 3, and so on, to create 364 one-day segments for the year.

We sum the total number of return user names and of return visitor IDs (those that were unchanged), respectively, across the 364 individual segments. Cookie retention for 1 day elapsed time is then calculated as the ratio of the number of return visitor IDs to the number of return user names.

Can you see where this is going?

We start over to calculate the retention rate for 2 day elapsed time. Take the data from Jan 3 and compare to Jan 1. Count return user names and visitor IDs. Take data from Jan 4 and Jan 2. Count. Repeat this 363 times, add up the segments and take the ratio. Record that as the cookie retention for 2 day elapsed time.

Etc.
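
For the curious, the procedure above can be sketched in a few lines of Python. To be clear, this is not the code Garrett used. The `Visit` record layout, the field names, and the sample rows are all made up for illustration, and the sketch assumes at most one visit per user name per day (if there are several, the last one wins).

```python
from collections import namedtuple
from datetime import date, timedelta

# Hypothetical record layout: one row per logged-in visit.
Visit = namedtuple("Visit", "day user_name visitor_id ip os browser")


def retention_for_lag(visits, lag_days):
    """Return (retention, return_user_count) for one elapsed time in days."""
    # Index visits by (day, user name); assumes one visit per user per day.
    by_day_user = {(v.day, v.user_name): v for v in visits}
    return_users = 0
    unchanged_ids = 0
    # Looping over every later visit covers all day-pair segments at once,
    # which is the same as summing the counts of the individual segments.
    for (day, user), later in by_day_user.items():
        earlier = by_day_user.get((day - timedelta(days=lag_days), user))
        if earlier is None:
            continue  # user did not log in on both dates
        # Keep only users who appear to be on the same computer.
        if (earlier.ip, earlier.os, earlier.browser) != (later.ip, later.os, later.browser):
            continue
        return_users += 1
        if earlier.visitor_id == later.visitor_id:
            unchanged_ids += 1
    retention = unchanged_ids / return_users if return_users else None
    return retention, return_users


# Made-up sample rows: bob's visitor ID changes between Jan 1 and Jan 2.
visits = [
    Visit(date(2009, 1, 1), "ann", "c1", "10.0.0.1", "Linux", "Firefox"),
    Visit(date(2009, 1, 2), "ann", "c1", "10.0.0.1", "Linux", "Firefox"),
    Visit(date(2009, 1, 1), "bob", "c2", "10.0.0.2", "Mac", "Safari"),
    Visit(date(2009, 1, 2), "bob", "c9", "10.0.0.2", "Mac", "Safari"),
    Visit(date(2009, 1, 3), "bob", "c9", "10.0.0.2", "Mac", "Safari"),
]

for lag in (1, 2):
    print(lag, retention_for_lag(visits, lag))
```

In the sample rows, the 1 day elapsed time has three return users, two of whom kept their visitor ID. Running the function for every elapsed time from 1 to 364 days would produce the points on a curve like the one in the chart.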

Sounds like a lot of calculations, and it is.  But that's what student interns are for.  I was fortunate to have a brilliant young intern majoring in applied math, Garrett Clark, who did this work and deserves most of the credit for the results in this posting as well as the earlier cookie deletion data I posted in my blog.  He tells me that he was up past midnight every night doing these calculations by hand, but I think he's lying.  I'm pretty sure he used some clever programming, MySQL, and spreadsheet tricks to automate the calculations.   He may have indeed been up past midnight every night, but it was more likely at the local brew pub.

As we (well, Garrett) iterated the calculation for longer elapsed times, the sample sizes decreased, which led to increased scatter in the data. For example, for 1 day elapsed time the sample size was 3372 return user names, but after 180 days the sample size had dropped to 19.

The sample size of return users gets smaller as you analyze longer elapsed times because fewer users are likely to return after the longer elapsed times. In addition, there are fewer segments available to provide data. A 10 day elapsed time has 365-10=355 data segments, while the 360 day elapsed time only has 365-360=5 segments.
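
To get a feel for how much scatter a small sample brings, here is a rough sketch. The segment count is the subtraction from the paragraph above. The standard error is my own back-of-the-envelope addition: it treats each return user as an independent keep-or-lose-the-cookie trial (a simple binomial model), which is an assumption and not something we actually calculated for the chart.

```python
import math


def segment_count(lag_days, days_in_year=365):
    """Number of day-pair segments available for a given elapsed time."""
    return days_in_year - lag_days


def binomial_standard_error(retention, sample_size):
    """Rough standard error of a retention rate, assuming a binomial model."""
    return math.sqrt(retention * (1 - retention) / sample_size)


print(segment_count(10), segment_count(360))  # 355 5

# Roughly 35% retention at 180 days, measured on only 19 return user names
print(round(binomial_standard_error(0.35, 19), 2))  # 0.11
```

Under that assumption, the 180 day point could easily be off by ten percentage points or so in either direction, which is in line with the scatter at the right-hand side of the chart.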

You'll also notice that the fit line superimposed on the data does not look like a smooth function.  It isn't.  We weren't able to find a single function which fit the data well across the entire time frame, so the line is mainly there just to guide the eye rather than imply a true fit.   It makes sense, though, that more than one function would be needed to fit the data because, as supposed above, there are likely different mechanisms corresponding to different user behavior modes underlying the data.

It would be interesting to see how this curve might be different for different audiences.  This audience is extremely technical.  It's often been supposed that technical and consumer audiences behave differently and this method would be a good way to compare them.

( Aug 24 2009, 11:02:03 AM MDT )

