We find product managers look long and hard at the download completion percentages for their products. We track this data carefully for every product on the Sun Download Center, java.com, and a few other venues that supply Sun software, and we provide self-service reporting tools for viewing the results. The formula is pretty simple (at least on the surface): (downloads completed / downloads initiated) * 100 = download completion percentage (or success rate). For example, if 100 customers start a download and 75 complete it, the completion rate is 75%.
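As a quick illustration of the 100-start, 75-complete example, here is a minimal Python sketch. The function name and the error handling for zero initiations are my own assumptions for illustration, not part of the Sun Download Center reporting tools.

```python
def completion_rate(initiated: int, completed: int) -> float:
    """Return the download completion percentage (success rate).

    Hypothetical helper: completions divided by initiations, times 100.
    """
    if initiated <= 0:
        # Assumption: a rate is undefined when nothing was initiated.
        raise ValueError("initiated must be a positive count")
    return 100 * completed / initiated


# 100 customers start a download and 75 complete it.
rate = completion_rate(initiated=100, completed=75)
print(f"{rate:.0f}%")
```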

Of course, product managers want to see 100% and often come knocking on our doors if the percentages are significantly below that. As a result, we've investigated this subject in extensive detail and for a number of years now, and I think we've got some interesting and useful answers. I'm going to start sharing our findings, as I hope this'll be of interest to anyone involved with ESD (electronic software distribution). Due to time limitations, however, I'm not in a position to write it all up at once, so I'll have to post as time allows (thus, this is Part 1!).

One of the first steps is to define and gauge the measurement system, making sure the results accurately represent what we think they do. From discussions with other ESD specialists, we know that these results are neither defined nor measured the same way everywhere. In fact, because we make our download infrastructure available to other applications at Sun via web services, we don't always measure the same way even within our company. So here's how it works with the Sun Download Center:

A download initiation is captured when the user clicks a download link. We attach a unique identifier to this action and to the link. As a result, we can track and aggregate multiple clicks on the same link by the same user, so they count as one initiation. (Note the difference from other systems that might count every click on the same link as a separate initiation.)
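To make the difference between the two counting methods concrete, here is a small Python sketch. The click records and the (user, link identifier) tuple layout are hypothetical; the actual identifier scheme is not described in this post.

```python
def count_initiations(clicks):
    """Count initiations, collapsing repeat clicks on the same link by the same user.

    `clicks` is an iterable of (user_id, link_id) pairs (an assumed layout).
    """
    return len({(user_id, link_id) for user_id, link_id in clicks})


def count_every_click(clicks):
    """The alternative: every click counts as a separate initiation."""
    return len(list(clicks))


clicks = [
    ("user-1", "link-A"),
    ("user-1", "link-A"),  # repeat click, same user and link
    ("user-2", "link-A"),
    ("user-1", "link-B"),
]
print(count_initiations(clicks), count_every_click(clicks))
```

With the same number of completions, the second method produces a larger denominator and therefore a lower completion percentage.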

A download completion is when our download servers report back to our application that the user has received (in one end-to-end download) a number of bytes equal to the size of the file.

Now if you know how download managers (DLMs) work, you'll realize they really complicate things when it comes to tracking download completions. Because a DLM lets the user pause or stop a download and come back later, we can under-report downloads when users don't complete in one end-to-end pull. (We are currently putting a fix in place on the SDLC reports to mitigate this issue.)

Further complicating things, download managers can break a file into separate threads and may not download a file in order. If you measure completion by "last byte delivered" (another popular method), this isn't necessarily accurate either. That's because a DLM can deliver the last byte to a user but have an intermediate byte range fail, so the user never completes the download!
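The gap between "last byte delivered" and a truly complete file can be sketched in Python as follows. This is only an illustration under assumptions: the byte ranges are half-open (start, end) pairs that I made up, and it is not how the Sun download servers report completion.

```python
def last_byte_delivered(ranges, file_size):
    """True if any delivered range reaches the end of the file."""
    return any(end >= file_size for _, end in ranges)


def fully_covered(ranges, file_size):
    """True only if the delivered ranges cover every byte from 0 to file_size."""
    covered = 0  # bytes [0, covered) are known to be delivered
    for start, end in sorted(ranges):
        if start > covered:
            return False  # gap: an intermediate byte range is missing
        covered = max(covered, end)
    return covered >= file_size


# A 1000-byte file fetched in three threads; the middle range failed.
delivered = [(0, 400), (600, 1000)]
print(last_byte_delivered(delivered, 1000))  # the last byte arrived
print(fully_covered(delivered, 1000))        # but the file is incomplete
```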

We've brainstormed ways to achieve exact precision, even with DLMs, but the expense is unreasonable. Fortunately, because we do millions of downloads a month, small fluctuations don't have a measurable impact on the big picture. The way we do it today meets the business requirements in terms of tracking overall success rate, identifying problem products, and occasionally surfacing other issues we'll discover and fix.

Lastly, I wanted to point out that completion rate differs if measured by file or by product. The above formula works fine for files but doesn't work for products that consist of multiple files. I'll explain...

A large product may be broken up into smaller segments to make downloading easier. (For example, the Solaris 10 Operating System DVD ISO image is over 2 GB in size, so it is broken into smaller segments for downloading.) The user recombines the segments after download, then installs the product. So if we want to know the success rate for a product (as opposed to a file), we need to base it on the lowest completion percentage among all the required files that make up the product. Continuing with the Solaris 10 example, let's say there are 5 segments to download. The customer must have all 5 segments in order to install the product, or the "product download" can't be successful. So if the success rate for segments 1-4 is 80% but only 65% for segment 5, the product download success rate is 65%.
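The Solaris 10 example can be written as a short Python sketch. The function name and the dictionary of per-segment rates are hypothetical, used only to illustrate the lowest-percentage rule.

```python
def product_success_rate(segment_rates):
    """Product success rate: the lowest completion percentage
    among the required files that make up the product."""
    if not segment_rates:
        raise ValueError("a product needs at least one required file")
    return min(segment_rates.values())


# Segments 1-4 complete at 80%, segment 5 at only 65%.
solaris_segments = {
    "segment-1": 80.0,
    "segment-2": 80.0,
    "segment-3": 80.0,
    "segment-4": 80.0,
    "segment-5": 65.0,
}
print(product_success_rate(solaris_segments))
```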

OK, so now we understand (hopefully) some basic definitions and concepts about how we do this type of reporting. In the next post, I'll start to drill down into the causes that affect these success rates.

Comments:

The (a?) customer side of this is...

  • It is common not to complete multi-segment downloads (such as DVD images) in one session. In the first session I may get segments 1 and 2, then come back an hour later or the next business day to get the rest of the segments.
  • The machine that my browser runs on is a Windows laptop on an oversubscribed wireless LAN (ok for email and ssh, not ok for DVD images) that is nowhere near where I really need the image. There is no way that I want to download a couple of gigabytes to my laptop and then transfer it to the place it really needs to be. As such, a text-based tool (like wget) needs to work well. Because of long URLs, ampersands, and multiple redirections, it is non-trivial to copy a link from a browser into a terminal window and form the appropriate wget command.
  • Very few of the .iso images that I download ever make it onto a CD or DVD. It would actually be easier for me if I had a netinstall image for each OS release. After using wget to transfer them to a lab machine in a data center, I use lofiadm (and occasionally dd to extract slices... ugh!) to mount the images and create netinstall images.
  • Lately (this is a huge compliment) I don't think that many of the download problems have been on the Sun side. There have been times where I tried to download a file from Sun into my company's intranet and found it incredibly slow. I then fired up a download of the same product to a machine in my home and it was pretty much at the full rate that my cable modem can do. When "the rest of the internet isn't slow" at work, I commonly get over 1 mbit/sec from Sun downloads. Very impressive. This is a very different story than it was a while ago (pre S10 launch?).

Posted by Mike Gerdts on November 10, 2005 at 07:38 PM PST #

Usually I do not have an X session open to the machine where I really want the download to go to. As such, a nice GUI tool that supports drag and drop does me little good. I typically access the remote machines via ssh (Cygwin with OpenSSH).

If I am likely to lose my session before it completes, I will use "nohup wget -O file.zip 'url'". Since Sun does not offer something like Windows terminal services (hint - offer SunRay server with Solaris, provide a Java SunRay client), I would guess that most people don't have a good way to "nohup" a GUI.

Posted by Mike Gerdts on November 11, 2005 at 03:51 PM PST #

I know the SDM supports segmented downloading, but it doesn't seem to support downloading segments from alternate locations (i.e., a segment from each server). While Sun seems to have really fast downloads from their servers, other places aren't so speedy. This could be really useful for multi-gigabyte ISO files.

Posted by Anthony Bryan on November 17, 2005 at 06:46 PM PST #

I think it's a matter of how you define "segments". Sun Download Manager (SDM) supports downloading of whole files. A large DVD image, for example, may be broken into smaller files that are "segments". You can load up SDM with all the segments at once and it will download them sequentially.

Some download managers support downloading a single file in "segments", which is really multi-threaded downloading. I know this is useful for open source downloads that may be hosted on multiple servers. The download manager can then pull different segments (byte ranges) of the same file from different servers -- in theory (and I'm sure it happens in practice as well) this speeds up the overall download.

You are correct that SDM doesn't support multi-threaded downloads at this time, and we appreciate the suggestion. Actually it's already on our list for enhancements, though I don't have a time table for a new release. Fortunately, as you mention, our servers are running very fast nowadays so this isn't an issue with Sun downloads.

Posted by Gary Zellerbach on November 18, 2005 at 08:37 AM PST #


This blog copyright 2009 by Gary Zellerbach