Web Analytics Analyzed
Strupp's Weblog

Monday August 24, 2009

Cookie Retention Rates

I've decided that I no longer care about cookie deletion.

I now care about cookie retention. It's more valuable from a business standpoint to know how long users keep their cookies than to know how frequently they delete them. After all, if you are trying to say something about long-term conversion rates, or other customer behavior that drags on for a while, it's easier to think in the positive. At least, the cow playing the ukulele in my head has been much happier since I started looking at the data this way.

Below is a chart that shows what percent of users returning after X days on the same computer still have the same cookie.

The data shows a very sharp drop-off after just a few days (72% retention after 7 days), with the rate of decay slowing as the time span gets longer. After a year the cookie retention rate levels out at around 20%.

This means that if users are going to delete their cookies, they do it fairly quickly. After about 30 days the rate of decay becomes much slower, suggesting a different mechanism is at play. One can suppose that the daily, weekly, and monthly deletion mechanisms are more deliberate, probably cookie deletion software and manual deletion. The longer-term decay could come from less deliberate, nonsystematic mechanisms, like accidental deletion or deletion for some isolated, specific cause such as resetting one's browser settings to repair a technical problem.

This data is important to understand when considering business processes that take a long time. Consider, for example, a B2B conversion process that takes six months from initial click-through to closing a deal. The highest conversion rate you could hope to measure would be around 35%. In other words, if 100% of the users who clicked on your ad on January 1 made a purchase six months later on July 1, your measured conversion rate would be around 35%, because that's how many users still have the same cookie.

Conversely, if your measured conversion rate after 180 days was 10%, then the actual conversion rate was more like 10% / 0.35 ≈ 29%.
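The scaling above can be written as a tiny helper. This is only a sketch of the arithmetic: the function name and the cap at 100% are my own assumptions, and the 35% figure is the approximate six month retention quoted above.

```python
def estimate_actual_conversion(measured_rate: float, retention_rate: float) -> float:
    """Scale a measured conversion rate by the cookie retention rate
    for the same elapsed time (both expressed as fractions, e.g. 0.10)."""
    if not 0 < retention_rate <= 1:
        raise ValueError("retention_rate must be in (0, 1]")
    # Assumption: cap at 1.0, since noisy data could push the ratio past 100%.
    return min(measured_rate / retention_rate, 1.0)


# 10% measured after 180 days, roughly 35% of cookies retained
print(round(estimate_actual_conversion(0.10, 0.35), 2))  # 0.29
```

Treat the result as a direction rather than a number to report: the retention rate at long elapsed times comes from very small samples.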

Now, you would have to be pretty brave to stand up in front of a room and actually scale up your conversion rates based on cookie retention rates, but the point is that you can at least point in the right direction as to what your real conversion rates might be.

This finding is also important for optimizing intelligent systems which deliver personalized content to users based on anonymous-cookie (not logged in) behavior. Basically, any learning mechanism which takes more than 30 days to fine-tune recommended content for the end user is largely shooting in the dark.

How Did We Make this Calculation?

The data is based on a first party metrics cookie on a software developer forums site.  It is the same data that I've been discussing in my last few blog posts.

Each data point in the chart is calculated as follows. 

For 1 day elapsed time, we compared data from January 2 to January 1. We identified user names that had logged in on both Jan 1 and Jan 2. We kept just the return user names that had the same IP, OS and browser on both dates, to try to limit the sample to return users on the same computer.

Then, for each return user name, we compared the visitor IDs on the two dates and counted how many were unchanged.

We repeat that calculation for another 1 day elapsed time increment of Jan 3-Jan 2.  Repeat again for the segment of Jan 4-Jan 3, and so on to create 364 one day segments for the year. 

We sum the number of return user names and the number of unchanged visitor IDs, respectively, over the 364 individual segments. Cookie retention for 1 day elapsed time is then calculated as the total number of unchanged visitor IDs divided by the total number of return user names.
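Written out, the 1 day figure is a pooled ratio rather than an average of 364 daily ratios. The symbols here are my own shorthand, not anything from the original spreadsheets: U_d is the number of return user names in the segment that starts on day d, and V_d is the number of those whose visitor ID was unchanged.

```latex
R(1) = \frac{\sum_{d=1}^{364} V_d}{\sum_{d=1}^{364} U_d}
```

Pooling this way means that segments with more return users carry more weight in the result.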

Can you see where this is going?

We start over to calculate the retention rate for 2 day elapsed time. Take the data from Jan 3 and compare to Jan 1. Count return user names and unchanged visitor IDs. Take data from Jan 4 and Jan 2. Count. Repeat this 363 times, add up the segments and take the ratio. Record that as the cookie retention for 2 day elapsed time.

Etc.
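For anyone who wants to try this on their own logs, the procedure described above can be sketched in a few lines of Python. This is not the code that was actually used for the chart. The `Visit` record and its field names are hypothetical, and the sketch assumes one record per user name per day (if there are several, the last one wins).

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Visit(NamedTuple):
    """One logged-in visit. Field names are hypothetical, not the site's schema."""
    day: int          # day of the year, 1..365
    user_name: str
    ip: str
    os: str
    browser: str
    visitor_id: str   # value of the first party metrics cookie


def retention_by_elapsed_days(
    visits: Iterable[Visit], max_elapsed: int
) -> Dict[int, Tuple[float, int]]:
    """Return {elapsed_days: (retention_rate, return_user_sample_size)}."""
    # Index the visits by day, then by user name.
    by_day: Dict[int, Dict[str, Visit]] = defaultdict(dict)
    for v in visits:
        by_day[v.day][v.user_name] = v
    days = sorted(by_day)

    results: Dict[int, Tuple[float, int]] = {}
    for elapsed in range(1, max_elapsed + 1):
        return_users = 0    # summed over all segments for this elapsed time
        unchanged_ids = 0
        for day in days:
            later = by_day.get(day + elapsed)
            if later is None:
                continue
            for name, first in by_day[day].items():
                second = later.get(name)
                if second is None:
                    continue
                # Keep only return users who appear to be on the same computer.
                if (first.ip, first.os, first.browser) != (
                    second.ip, second.os, second.browser
                ):
                    continue
                return_users += 1
                if first.visitor_id == second.visitor_id:
                    unchanged_ids += 1
        if return_users:
            results[elapsed] = (unchanged_ids / return_users, return_users)
    return results


# Made-up sample data, just to exercise the function.
visits = [
    Visit(1, "alice", "10.0.0.1", "Linux", "Firefox", "c1"),
    Visit(2, "alice", "10.0.0.1", "Linux", "Firefox", "c1"),
    Visit(3, "alice", "10.0.0.1", "Linux", "Firefox", "c2"),  # cookie changed
    Visit(1, "bob", "10.0.0.2", "Mac", "Safari", "c9"),
    Visit(2, "bob", "10.0.0.7", "Mac", "Safari", "c9"),       # different IP
    Visit(3, "bob", "10.0.0.2", "Mac", "Safari", "c9"),
]
print(retention_by_elapsed_days(visits, 2))  # {1: (0.5, 2), 2: (0.5, 2)}
```

Returning the sample size next to each rate makes it easy to see which points on the curve rest on only a handful of return users.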

Sounds like a lot of calculations, and it is.  But that's what student interns are for.  I was fortunate to have a brilliant young intern majoring in applied math, Garrett Clark, who did this work and deserves most of the credit for the results in this posting as well as the earlier cookie deletion data I posted in my blog.  He tells me that he was up past midnight every night doing these calculations by hand, but I think he's lying.  I'm pretty sure he used some clever programming, MySQL, and spreadsheet tricks to automate the calculations.   He may have indeed been up past midnight every night, but it was more likely at the local brew pub.

As we (well, Garrett) iterated the calculation for longer elapsed times, the sample sizes decreased, which led to increased scatter in the data. For example, for 1 day elapsed time the sample size was 3372 return user names, but after 180 days the sample size had dropped to 19.

The sample size of return users gets smaller as you analyze longer elapsed times because fewer users are likely to return for the longer elapsed times. In addition, there are fewer segments available to provide data. A 10 day elapsed time has 365 - 10 = 355 data segments, while the 360 day elapsed time has only 365 - 360 = 5 segments.
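The segment arithmetic is simple enough to check in a couple of lines. A minimal sketch, assuming one year (365 days) of data; the function name is just for illustration.

```python
def segment_count(elapsed_days: int, days_in_period: int = 365) -> int:
    """Number of (start day, start day + elapsed) pairs that fit in the period."""
    return max(days_in_period - elapsed_days, 0)


for elapsed in (1, 10, 180, 360):
    print(elapsed, segment_count(elapsed))
```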

You'll also notice that the fit line superimposed on the data does not look like a smooth function.  It isn't.  We weren't able to find a single function which fit the data well across the entire time frame, so the line is mainly there just to guide the eye rather than imply a true fit.   It makes sense, though, that more than one function would be needed to fit the data because, as supposed above, there are likely different mechanisms corresponding to different user behavior modes underlying the data.

It would be interesting to see how this curve might be different for different audiences.  This audience is extremely technical.  It's often been supposed that technical and consumer audiences behave differently and this method would be a good way to compare them.

Posted Aug 24 2009, 11:02:03 AM MDT

Comments:

Great info, Paul (and Garrett). I will share this with the SMILE team shortly. In reading it, I kept thinking about the point you actually made in the last paragraph. Isn't a developer audience much more likely to delete cookies than a non-developer audience? I hope you will have an opportunity to try this with a "consumer" audience, as you suggest.

Personally, I run the "ccleaner" program once-a-week as part of my computer maintenance routine. I tell it which cookies to keep, and it deletes all other ones that have been set over the course of the week.

And Paul, I'm especially interested in the cow that plays the ukulele in your head. What songs does it play??

Posted by Gary Zellerbach on August 25, 2009 at 11:22 AM MDT #

The conventional wisdom is that developers would be more likely to delete their cookies, but some have argued the other way, suggesting that those with technical experience have less to fear and so don't bother.

Regarding the cow in my head, well, here's what it's playing...
http://www.youtube.com/watch?v=oGdlJWfx1GA

Posted by Paul Strupp on August 25, 2009 at 01:43 PM MDT #

Nice post Paul/Garrett

Are your cookies first party, third party or both?

Posted by Brian Clifton on August 25, 2009 at 06:00 PM MDT #

Excellent, and extremely useful data. Has immediate implications for nearly every area of online marketing, affiliate marketing in particular. Thank you for some great food for thought today.

Posted by Geno Prussakov on August 25, 2009 at 06:52 PM MDT #

Brian, this is a first party cookie. And Geno, your eyes are pretty good that you estimated the retention rate after one day to be 84%. It's actually 85%.

Those first few data points are kinda hard to read, so here's the data...
1 day = 85%
2 days = 75%
3 days = 70%
4 days = 63%
5 days = 55%

Posted by Paul Strupp on August 26, 2009 at 08:23 AM MDT #

That drop off seems unusually steep for 1st-party cookies... Just thinking out loud - could it be users behind a NAT (single ip address for entire org), using different machines?

i.e. people sharing their login details with others that appear as a single ip/same username/no cookies....

I can understand visitors deleting cookies after day 1 i.e. sessionizing all persistent cookies. But for this to continue over the course of a week just seems odd behaviour.

The more gradual decay makes sense as over time there is a greater increase of machines crashing, new browser/OS being installed, new computers being purchased etc.

Once again, great data...

Posted by Brian Clifton on August 26, 2009 at 09:40 AM MDT #

Hi Paul & Garrett,

Awesome study! Very nice to see mathematics used this intensively in web analytics. I would love to see more of this going on in our industry. We need more mathematicians!

Posted by Dustin Wallace on August 27, 2009 at 10:12 AM MDT #
