Why Downloads Fail (Part 1)
We find that product managers look long and hard at the download completion percentages for their products. We track this data carefully for every product on the Sun Download Center, java.com, and a few other venues that supply Sun software, and we provide self-service reporting tools for viewing the results. The formula is pretty simple (at least on the surface): (downloads completed / downloads initiated) * 100 = download completion percentage (or success rate). For example, if 100 customers start a download and 75 complete it, the completion rate is 75%.
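As a minimal sketch of that arithmetic (the function name is made up for illustration and is not part of our reporting tools), the completion percentage is completed downloads divided by initiated downloads, times 100:

```python
def completion_rate(initiated: int, completed: int) -> float:
    """Return completed downloads as a percentage of initiated downloads."""
    if initiated <= 0:
        raise ValueError("initiated must be a positive count")
    return completed / initiated * 100


# The example from the post: 100 customers start, 75 finish.
print(completion_rate(100, 75))  # 75.0
```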
Of course, product managers want to see 100% and often come knocking on our doors if the percentages are significantly below that. As a result, we've investigated this subject in extensive detail and for a number of years now, and I think we've got some interesting and useful answers. I'm going to start sharing our findings, as I hope this'll be of interest to anyone involved with ESD (electronic software distribution). Due to time limitations, however, I'm not in a position to write it all up at once, so I'll have to post as time allows (thus, this is Part 1!).
One of the first steps is to define and gauge the measurement system, making sure the results accurately represent what we think they do. From discussions with other ESD specialists, we know that these results are neither defined nor measured the same way from one company to the next. In fact, because we make our download infrastructure available to other applications at Sun via web services, we don't always measure the same way even within our own company. So here's how it works with the Sun Download Center:
A download initiation is captured when the user clicks a download link. We attach a unique identifier to this action and to the link. As a result, we can track and aggregate multiple clicks on the same link by the same user, so they count as one initiation. (Note the difference from other systems that might count every click on the same link as a separate initiation.)
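Here is a rough sketch of the difference between the two counting styles. It assumes, purely for illustration, that each click can be reduced to a (user, link) pair; the identifiers and the function name are hypothetical and do not reflect how the Sun Download Center actually stores this data.

```python
def count_initiations(clicks):
    """Count unique (user_id, link_id) pairs, so repeat clicks collapse into one."""
    return len({(user_id, link_id) for user_id, link_id in clicks})


clicks = [
    ("user-a", "link-1"),
    ("user-a", "link-1"),  # same user clicks the same link again
    ("user-b", "link-1"),
    ("user-a", "link-2"),
]

print(count_initiations(clicks))  # 3 initiations when repeat clicks are merged
print(len(clicks))                # 4 initiations if every click is counted
```

With the same number of completions, the per-click count gives a lower completion percentage simply because its denominator is larger.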
A download completion is when our download servers report back to our application that the user has received (in one end-to-end download) a number of bytes equal to the size of the file.
Now if you know how download managers (DLM) work, you'll realize they really complicate things when it comes to tracking download completions. Because with a DLM the user can pause or stop a download and come back later, we can under-report downloads when users don't complete in one end-to-end pull. (We are currently putting a fix in place on the SDLC reports to mitigate this issue.)
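To illustrate the under-reporting, here is a small sketch that compares the "one end-to-end pull" rule with a hypothetical cumulative rule. The cumulative rule is only an assumption for comparison (it is not the fix mentioned above), and it assumes resumed sessions never re-fetch bytes that were already delivered.

```python
def completed_single_pull(file_size, session_bytes):
    """True only if one session on its own delivered the whole file."""
    return any(n == file_size for n in session_bytes)


def completed_cumulative(file_size, session_bytes):
    """True if all sessions together delivered the whole file.

    Hypothetical rule; assumes sessions do not overlap.
    """
    return sum(session_bytes) >= file_size


# A user pauses after 400 bytes of a 1000-byte file, then resumes later.
sessions = [400, 600]
print(completed_single_pull(1000, sessions))  # False: counted as incomplete
print(completed_cumulative(1000, sessions))   # True: the user does have the file
```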
Further complicating things is that download managers can break a file into separate threads and may not download a file in order. If you measure completion by "last byte delivered" (another popular method), this isn't necessarily accurate either. That's because a DLM can deliver the last byte to a user but have an intermediate byte range fail, thus the user never completes the download!
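The "last byte delivered" problem can be sketched as follows. This assumes we have a list of successfully delivered byte ranges as inclusive (start, end) pairs, which is an illustrative data shape and not something our servers necessarily report.

```python
def last_byte_delivered(file_size, ranges):
    """The 'last byte' rule: some delivered range ends at the final byte."""
    return any(end == file_size - 1 for _, end in ranges)


def fully_covered(file_size, ranges):
    """True only if the delivered ranges cover every byte from 0 to file_size - 1."""
    next_needed = 0
    for start, end in sorted(ranges):
        if start > next_needed:
            return False  # gap: an intermediate range never arrived
        next_needed = max(next_needed, end + 1)
    return next_needed >= file_size


# Four threads split a 1000-byte file; the 250-499 range failed.
delivered = [(0, 249), (500, 749), (750, 999)]
print(last_byte_delivered(1000, delivered))  # True
print(fully_covered(1000, delivered))        # False
```

The last-byte rule reports this download as complete even though the user cannot reassemble the file.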
We've brainstormed ways to achieve exact precision, even with DLMs, but the expense is unreasonable. Fortunately, because we do millions of downloads a month, small fluctuations don't have a measurable impact on the big picture. The way we do it today meets the business requirements: tracking the overall success rate, identifying problem products, and occasionally surfacing other issues for us to fix.
Lastly, I wanted to point out that completion rate differs if measured by file or by product. The above formula works fine for files but doesn't work for products that consist of multiple files. I'll explain...
A large product may be broken up into smaller segments to make downloading easier. (For example, the Solaris 10 Operating System DVD ISO image is over 2 GB in size, so it is broken into smaller segments for downloading.) The user recombines the segments after download, then installs the product. So if we want to know the success rate for a product (as opposed to a file), we need to base it on the lowest completion percentage among all the required files that make up the product. Continuing with the Solaris 10 example, let's say there are 5 segments to download. The customer must have all 5 segments in order to install the product, or the "product download" can't be successful. So if the success rate for each of segments 1-4 is 80% but only 65% for segment 5, the product download success rate is 65%.
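In code, the product-level rule is just the minimum across the required segments. A minimal sketch, with a made-up function name and the Solaris 10 numbers from the example:

```python
def product_success_rate(segment_rates):
    """Product success rate: the lowest completion percentage among required segments."""
    if not segment_rates:
        raise ValueError("a product needs at least one required file")
    return min(segment_rates)


# Segments 1-4 complete at 80%, segment 5 at 65%.
print(product_success_rate([80, 80, 80, 80, 65]))  # 65
```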
OK, so now we understand (hopefully) some basic definitions and concepts about how we do this type of reporting. In the next post, I'll start to drill down into the causes that affect these success rates.
The (a?) customer side of this is...
Posted by Mike Gerdts on November 10, 2005 at 07:38 PM PST #
Usually I do not have an X session open to the machine where I really want the download to go to. As such, a nice GUI tool that supports drag and drop does me little good. I typically access the remote machines via ssh (Cygwin with OpenSSH).
If I am likely to lose my session before it completes, I will use "nohup wget -O file.zip 'url'". Since Sun doesn't offer something like Windows terminal services (hint - offer SunRay server with Solaris, provide a Java SunRay client), I would guess that most people don't have a good way to "nohup" a GUI.
Posted by Mike Gerdts on November 11, 2005 at 03:51 PM PST #
Posted by Anthony Bryan on November 17, 2005 at 06:46 PM PST #
Some download managers support downloading a single file in "segments", which is really multi-threaded downloading. I know this is useful for open source downloads that may be hosted on multiple servers. The download manager can then pull different segments (byte ranges) of the same file from different servers -- in theory (and I'm sure it happens in practice as well) this speeds up the overall download.
You are correct that SDM doesn't support multi-threaded downloads at this time, and we appreciate the suggestion. It's actually already on our list of enhancements, though I don't have a timetable for a new release. Fortunately, as you mention, our servers are running very fast nowadays, so this isn't an issue with Sun downloads.
Posted by Gary Zellerbach on November 18, 2005 at 08:37 AM PST #