Moving A Petabyte of Data
(With apologies for the headfake of posting this entry then taking it down - my fingers were working faster than my brain, and I accidentally posted the entry without completing it. Or proofreading it.)
I gave a speech last week in which I asserted that it is faster to send a petabyte of data from San Francisco to Hong Kong by sailboat than over the internet.
I got quite a few "how can that possibly be true?" kinds of questions, so here's the math. (Full disclosure, I am a mathematician by training, which guarantees me a lifetime of small "off by one" errors in all subsequent calculations - so if I get something wrong, be gentle).
A petabyte is a thousand terabytes, which is a million gigabytes, or a billion megabytes. Or 8 billion megabits. With me so far?
So if you had a half megabit per second internet connection, which is relatively high in the US (and relatively low compared to residential bandwidth available in, say, Korea), it'd take you 16 billion seconds, or 266 million minutes, or 507 years to transmit the data. Can you sail to Hong Kong faster than that? At a full megabit, just divide the time in half. Even at a hundred megabits (about the highest generally available rate I've seen from any carrier), it's about two and a half years.
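For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It assumes decimal units (a petabyte is 10^15 bytes), a perfectly sustained link with no protocol overhead, and a 365-day year; the function names are made up for illustration.

```python
# Decimal units, as in the post: 1 PB = 10**15 bytes = 8 * 10**15 bits.
PETABYTE_BITS = 8 * 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def transfer_seconds(rate_mbit_per_s, bits=PETABYTE_BITS):
    """Seconds needed to move `bits` at a sustained rate given in megabits per second."""
    return bits / (rate_mbit_per_s * 10**6)


def transfer_years(rate_mbit_per_s, bits=PETABYTE_BITS):
    """Same transfer time, expressed in 365-day years."""
    return transfer_seconds(rate_mbit_per_s, bits) / SECONDS_PER_YEAR


# The three link speeds mentioned in the post.
for rate in (0.5, 1, 100):
    print(f"{rate:>5} Mbit/s: {transfer_years(rate):,.1f} years")
```

Real links lose some capacity to overhead and contention, so these figures are best read as lower bounds on the transfer time.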
As Hal Stern once said to me, "Never underestimate the bandwidth of a station wagon full of storage driving down the [New] Jersey Turnpike" - and now you understand why tape-based storage has such a lasting appeal to so many enterprises recording, compiling, transporting or just plain archiving very large quantities of data, from video surveillance to trading data. Standard tapes are 500GB each (currently), and fit nicely into cardboard boxes with overnight express labels.
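The station wagon point can be made concrete with a rough sketch: how many 500GB tapes hold a petabyte, and what sustained network rate a shipment is equivalent to. The transit times below (one day for overnight express, 60 days for a sailboat crossing) are illustrative assumptions, not figures from this post, and the sketch ignores the time needed to write and read the tapes at either end.

```python
PETABYTE_BYTES = 10**15
TAPE_BYTES = 500 * 10**9      # 500 GB per tape, the figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def tapes_needed(total_bytes=PETABYTE_BYTES, tape_bytes=TAPE_BYTES):
    """Number of tapes required, rounded up to a whole tape."""
    return -(-total_bytes // tape_bytes)  # ceiling division on integers


def shipment_gbit_per_s(transit_days, total_bytes=PETABYTE_BYTES):
    """Effective bandwidth, in gigabits per second, of physically moving the data."""
    seconds = transit_days * SECONDS_PER_DAY
    return total_bytes * 8 / seconds / 10**9


print("tapes for one petabyte:", tapes_needed())
# Hypothetical transit times, chosen only for illustration.
for label, days in (("overnight express", 1), ("sailboat (assumed 60 days)", 60)):
    print(f"{label}: {shipment_gbit_per_s(days):.2f} Gbit/s effective")
```

Swap in whatever transit time you believe; the effective rate scales inversely with it.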
One other big benefit to tape as an archive format? When the data's at rest, it consumes no electricity - just imagine a petabyte of data spinning on even the most power efficient disk storage (for reference, a petabyte of active disk-based storage is the equivalent of more than 40 Thumpers, each drawing more than a kilowatt - and tipping the scales at something north of 150 lbs, slightly tougher to put on a sailboat, or in an overnight envelope). For data to be available, disks have to be kept spinning and cool (tape has no equivalent requirement).
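As a back-of-the-envelope sketch, the figures quoted above (more than 40 Thumpers, each more than a kilowatt and more than 150 lbs) can be turned into floor values for power and weight. Every input is a lower bound taken from the paragraph, cooling is not counted, and the function name is hypothetical.

```python
# Lower bounds quoted in the post; real figures would be higher.
THUMPERS = 40        # "more than 40 Thumpers"
KW_EACH = 1.0        # "more than a kilowatt" each
LBS_EACH = 150       # "north of 150 lbs" each
HOURS_PER_YEAR = 24 * 365


def disk_farm_floor(units=THUMPERS, kw_each=KW_EACH, lbs_each=LBS_EACH):
    """Minimum draw, yearly energy, and weight for a petabyte kept on spinning disk."""
    power_kw = units * kw_each
    return {
        "power_kw": power_kw,
        "kwh_per_year": power_kw * HOURS_PER_YEAR,  # excludes cooling
        "weight_lbs": units * lbs_each,
    }


for name, value in disk_farm_floor().items():
    print(f"{name}: {value:,.0f}")
```

A tape at rest contributes zero to the first two numbers, which is the point of the comparison.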
Now there is no one hammer for all nails, and tape isn't perfect for a lot of applications (near-line storage, e.g.) - but it plays a prominent role in some remarkably cutting-edge high performance computing applications, along with social networking and content aggregation sites (which think nothing of gathering terabytes of data every day) - tape archive isn't just for banks or telcos running mainframes (although we're good there, too).
So yes, at least for now, it's faster to send a petabyte of data via a sailboat than the internet (at least defined by the bandwidth to which most of us have access).
Which btw, is another reason we're refreshing our Solaris on DVD program - it's more efficient for many folks to get a 4 Gigabyte DVD in the mail (for FREE) than nurse our download centers, a megabit at a time. (And I apologize for how slow the DVD deliveries have been - we haven't exactly executed perfectly here, but hopefully it's getting better as I type.)
And I don't want to even think about moving a zettabyte.
Posted on 05:48PM Mar 12, 2007 | Comments[36]

Posted by Azrul on March 12, 2007 at 07:09 PM PDT #
Posted by Vasu Vattipalli on March 12, 2007 at 07:24 PM PDT #
Posted by Hanh Nguyen on March 12, 2007 at 10:16 PM PDT #
Posted by John Birrell on March 12, 2007 at 10:43 PM PDT #
Posted by Peter on March 13, 2007 at 12:29 AM PDT #
How long will it take to put one petabyte on tape? Or are you planning to send the only copy on that boat? What if the boat sinks??!!
Just to make my position clear: I agree that tape is currently the best long-term storage, at least until another feasible solution emerges. Perhaps if holographic storage delivers on its promises, it can be a good alternative.
Posted by Moby Dick on March 13, 2007 at 01:42 AM PDT #
Posted by Evgeny Kibalko on March 13, 2007 at 04:29 AM PDT #
Posted by Jim H on March 13, 2007 at 05:10 AM PDT #
Posted by Anonymous Coward on March 13, 2007 at 09:35 AM PDT #
Posted by Laxman on March 13, 2007 at 11:44 AM PDT #
Posted by old stk guy on March 13, 2007 at 02:15 PM PDT #
Posted by Gerald Wise on March 13, 2007 at 02:25 PM PDT #
Posted by Serge on March 13, 2007 at 03:35 PM PDT #
Posted by John McLaughlin on March 13, 2007 at 05:17 PM PDT #
Posted by Steve on March 13, 2007 at 05:58 PM PDT #
Posted by Dakshina on March 13, 2007 at 11:43 PM PDT #
Posted by md on March 14, 2007 at 02:57 AM PDT #
Posted by apt on March 14, 2007 at 03:09 AM PDT #
Posted by Jimmy Lin on March 14, 2007 at 08:39 AM PDT #
Posted by SMS on March 14, 2007 at 10:11 AM PDT #
Posted by Zsolt Horváth on March 14, 2007 at 11:51 AM PDT #
Posted by Stephen Rossi on March 14, 2007 at 02:15 PM PDT #
Posted by Serge on March 14, 2007 at 02:44 PM PDT #
Posted by Neil Davis on March 15, 2007 at 03:59 AM PDT #
Posted by Kevin Hutchinson on March 15, 2007 at 05:44 AM PDT #
Posted by Anonymous on March 15, 2007 at 08:29 AM PDT #
And why a sailboat when one of your BlackBoxes looks so much better? ;-D
Posted by Gianni on March 15, 2007 at 08:38 AM PDT #
Posted by Dustin Wallace on March 15, 2007 at 12:51 PM PDT #
Posted by Tim Cook on March 15, 2007 at 05:00 PM PDT #
Posted by Kimberly King on March 16, 2007 at 07:32 AM PDT #
Posted by Serge on March 16, 2007 at 10:29 AM PDT #
The sailboats are the computer?
The Internet2 is currently faster than sailboats, and commercial Internet providers are upgrading their networks to handle today's and tomorrow's data as fast as customers and investors are willing to pay.
The current verified record for transferring bulk data over a very long distance (Tokyo -> Pacific -> US -> Atlantic -> Amsterdam) is about 8.8 gigabits per second (search: Internet2 "land speed record"), which could transfer a marketing petabyte in 909090 seconds or about 10.5 days. A plane full of tapes is still faster, but sailboats can't even come close.
If someone builds a terabit per second network across the Pacific (search: "Trans-Pacific Express"), and fully dedicates the bandwidth to the transfer of a petabyte of data, it would take 8000 seconds, or a little over two hours, at an initial cost of about $500 million for a new cable and computing and storage equipment on each end.
When I think of fast sailboats, I somehow think of Larry Ellison. While he may never have the technology to build a wind-blown sailboat this fast, he has the financial means to buy a company that has a boat that will install a fast cable for him, or at least lease a 10Gbps wave off of an installed cable for far less.
Posted by Eric Ziegast on March 16, 2007 at 11:47 AM PDT #
Posted by Wesley Parish on March 17, 2007 at 04:55 AM PDT #
Posted by Anshu Sharma on March 17, 2007 at 07:07 PM PDT #
I received my Solaris 10 DVD kit yesterday morning. Download time has been a bit of an issue for operating systems, which is why I stuck with Solaris 9 for such a long time. I am not sure if I will upgrade my home web server to Solaris 10, as it runs on an Ultra 5 and I am not sure it will run very efficiently, but I will give it a go on my i386 machine. Could you maybe recommend anywhere I could get some newer SPARC hardware within the UK at a lower price? I have tried several sites including ebay and ITSupplies.net; however, I question the credibility of ebay, and ITSupplies.net seems to be overpriced. Unfortunately I have very little money, as I am only in college and working a weekend job.
Anyway, I tend to agree that we are somewhat limited by the bandwidth that we can fit through our intercontinental connections. But I think in the UK and USA the main problem is the speed of client connections. Here in the UK most of us are stuck on 2 Mbit unless we are lucky enough to live in London or Manchester. I guess in some parts of the US the situation is worse, as exchanges and cabling to more remote areas could be limited. The limitations on the bandwidth of data lines limit the technology that could evolve over the internet. For instance, if we could subscribe to 100 Mbit lines cheaply, the possibilities would be nearly endless, especially for the media industries to deliver content. We wouldn't be restricted to the same TV channels, and unrecognized publishers and artists could become more recognized, simply because downloading one of their movies would be so much faster, and you would therefore be more likely to watch it rather than wait for a download to complete. So overall I think that the limitations of internet connections are holding us back, and that ISPs and telecommunications providers should be more concerned with technological advances, rather than trying to upgrade existing and often out-of-date systems.
Posted by Rob Putt on March 18, 2007 at 12:23 AM PDT #
Posted by Wesley Parish on March 19, 2007 at 03:52 AM PDT #