Friday February 11, 2005 | SysBlog Notes from Storage R&D |
OOO
Well, I have no hair left. OpenOffice paragraph numbering is completely frustrating. I have one document where no combination of numbering features will produce a basic nested numbered outline (1., 1.1., 1.1.1., etc.). In fact there are three different ways to set numbering (format an outline, change a paragraph style, format a paragraph), and they all seem completely disconnected.
What's Up With EMC Centera??
A couple of interesting articles just came out...

1) We've known from the beginning that Centera was pretty slow, but that was typically OK for the archival-oriented applications it was serving. We think this has mostly to do with the asymmetry of the clustered architecture, and is something they can only incrementally improve through the use of faster Linux boxes.

2) Then we started hearing about hash collisions in the field causing data loss. The statisticians have always reported that the odds of a hash collision are extremely small (like 1/(2^big number)), but that assumes the data is random. In fact the applications are extremely homogeneous (e.g. 100 million 10KB check images), and now collisions are happening. Hash collisions can affect the unwary in different ways, so it's not clear what mechanism is causing the data loss. Perhaps it's the de-duplication process that deletes redundant versions of what are perceived to be identical files. Perhaps it's a read operation that opens the wrong file. Either way, this is a black eye for EMC that will continue to get blacker, and it is causing many vendors to scramble back to their designs. Fortunately our product is built assuming there will be collisions, and thus handles them nicely.

3) Now they admit that the namespace only supports up to 400M objects. Boy, two black eyes in one day. Anthony had a good calculation... "240,000 emails/hour or 5.76M emails/day or 11.52M Objects/day will exceed Centera 400M limit in 34.72 days". Now it should be clear why EMC is asking customers to containerize their email records to reduce file count.

If you really want a glimpse into the day-to-day issues of Centera, check out the CAS Yahoo group at http://groups.yahoo.com/group/CASTechGroup/.
( Feb 09 2005, 03:05:59 PM PST ) Permalink Comments [1]
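The two numbers in the Centera post above are easy to sanity-check. Here is a rough Python sketch, not anything from EMC or from Anthony. The function names are made up for illustration, the factor of 2 objects per email is only inferred from the 5.76M to 11.52M step in the quote, and the collision estimate uses the standard birthday approximation, which assumes hash outputs are uniformly random (exactly the assumption questioned in point 2).

```python
import math


def days_until_limit(emails_per_hour, objects_per_email, object_limit):
    """Days of ingest before the object count reaches the namespace limit."""
    objects_per_day = emails_per_hour * 24 * objects_per_email
    return object_limit / objects_per_day


def birthday_collision_probability(num_objects, hash_bits):
    """Approximate chance of at least one collision among num_objects
    uniformly random hash values of hash_bits bits:
    p ~= 1 - exp(-n(n-1) / (2 * 2^bits))."""
    exponent = num_objects * (num_objects - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


# Anthony's figures: 240,000 emails/hour, 2 objects per email (inferred), 400M limit
print(round(days_until_limit(240_000, 2, 400_000_000), 2))  # 34.72

# The "100 million check images" case with a 128-bit hash, uniform-output assumption
print(birthday_collision_probability(100_000_000, 128))
```

Under the uniform assumption the second number comes out vanishingly small, which is the "1/(2^big number)" figure the statisticians quote; it says nothing about what happens when that assumption fails.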
Content Addressable Storage (CAS)
CAS is an interesting new approach to storage [well, actually it's been around a while, just not broadly commercialized] where a crypto hash "checksum" is calculated for every file that's stored. The hash algorithm might be the lately maligned 128-bit MD5, a 160-bit SHA-1, or something different. The adjective "Content Addressable" is somewhat misleading since it implies that the hash is used as a file handle. Yet there are no systems that work like that today, including Centera, for which the CAS acronym was invented. Filepool, Scale8, and a variety of other researchy SSP-oriented storage designs did use hashes for internally facing file handles, but none of those things are commercially deployed today afaik.
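For a concrete picture of the per-file "checksum", here is a small Python sketch using the standard hashlib module. It only shows what computing such a digest looks like and how long MD5 and SHA-1 digests are; the helper names are hypothetical and this is not how Centera or any other product actually does it.

```python
import hashlib


def digest_bits(algorithm):
    """Length in bits of the digest produced by a hashlib algorithm."""
    return hashlib.new(algorithm).digest_size * 8


def content_digest(data, algorithm="sha1"):
    """Hex digest of the stored bytes; identical content gives an identical digest."""
    return hashlib.new(algorithm, data).hexdigest()


print(digest_bits("md5"))   # 128
print(digest_bits("sha1"))  # 160

# Two copies of the same bytes hash to the same value, whatever the file is called
print(content_digest(b"check image 0001") == content_digest(b"check image 0001"))  # True
```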
Why use hashes? Well, the real reason is that it simplifies the design of a storage system intended to be scalable, by eliminating the need for distributed lock management across a clustered system. The serendipitous benefit is that it typically means objects are immutable, something useful for SEC-regulated applications.
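To make the "no locks, immutable objects" point concrete, here is a toy in-memory sketch in Python. Everything in it (ToyContentStore, CollisionError, the pluggable digest function) is hypothetical and is not a description of any vendor's design, ours included. The idea it illustrates: when the key is derived from the content, writing the same bytes twice is harmless, nothing is ever updated in place, and a store that compares bytes on a digest match can notice a collision instead of silently treating two different files as one.

```python
import hashlib


class CollisionError(Exception):
    """Two different byte strings produced the same digest."""


class ToyContentStore:
    """In-memory store keyed by a digest of the content (illustration only)."""

    def __init__(self, digest=lambda data: hashlib.sha1(data).hexdigest()):
        self._digest = digest
        self._objects = {}

    def put(self, data):
        key = self._digest(data)
        existing = self._objects.get(key)
        if existing is None:
            # First time this content is seen: store an immutable copy
            self._objects[key] = bytes(data)
        elif existing != data:
            # Same digest, different bytes: refuse rather than lose data
            raise CollisionError(key)
        # Same digest, same bytes: the write is a no-op
        return key

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = ToyContentStore()
first = store.put(b"check image 0001")
second = store.put(b"check image 0001")
print(first == second, len(store))  # True 1
```

There is deliberately no update method: a changed file is simply a new object with a new key, which is where the immutability comes from.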
So the term CAS is not particularly accurate, is aligned pretty strongly with EMC, and is not particularly benefits-oriented, so the term will die a slow death over the next couple of years. In fact SNIA has already changed the name of its corresponding committee away from CAS.

Airplane wrecks
Spend some time researching local aviation disasters. There's something very emotional about standing on the site of an aviation disaster. It's amazing how little can be left of a large airliner that hits the ground at 200 mph. The fragility of those craft is something to respect. The biggest Bay Area disaster was the United DC-6 that hit Tolman Peak near Fremont in the 60's. I was up there but couldn't find any trace.
I have 2 friends who have crashed their aircraft, and a third who died. Dave used his parachute over Altamont and broke a few vertebrae. Lynn stall-spun on the test flight of an experimental racer. Wayne was filming a movie and turned his Ag-Cat into the wrong canyon. Several other friends have landing-light souvenirs in their hangars. Bill cartwheeled spectacularly at the Moffett show last year and walked away from it (god bless Curtis Pitts). My closest call was aileron flutter over TCY that broke the wing-attach, bent 2 pushrods, and cracked 2 spars. Good thing the runway was 3000 ft away ;-)
( Feb 07 2005, 09:00:54 AM PST ) Permalink Comments [2]

In the beginning
So here we go with our first shot at blogging.... Hmmm, I don't feel any different.
( Feb 07 2005, 08:56:31 AM PST ) Permalink Comments [0]