SysBlog
Notes from Storage R&D

20050314 Monday March 14, 2005

CAS is dead, long live CAS!

There's lots of interest among customers and suppliers about CAS and how to address the CAS market. All that interest is not misdirected, because the CAS systems that have been marketed to-date have interesting properties and solve customer problems. However, their interesting properties have nothing to do with the fact that they calculate crypto-hash checksums for the files being stored.

Here are the interesting properties of today's so-called* CAS systems:
  • They are clustered designs that can be scaled horizontally
  • Physical location of files is abstracted, allowing transparent migration & healing.
  • They tend to heal themselves, reducing the need for service
  • When a file is stored, they generate their own unique name for the file.
  • The primary access model is via API, although secondary access via file systems is often supported
  • They are immutable (files cannot be modified)

There is nothing in these valuable benefits about hashing or hash algorithms. These are the properties of Object Archival Storage Systems, which is a far more appropriate way of describing the breed. Dare I propose a new acronym OASS? Well the SNIA committee charged with standardizing these things is working hard on their own answer, and I'll defer to them. They used to call themselves CAS Solutions Initiative (CASSI), but they too have seen the light.

*As a point of fact, none of the CAS systems on the market today are actually CAS. CAS implies that the stored objects are accessed using a hash value computed from the file's contents, but there are no commercially available systems that do this today. For example, EMC's Centera uses a "C-clip" as the object handle, which is an amalgamation of a metadata record and the object hash. Other CAS/OS systems may use other more reliable ways of creating unique object identifiers that have nothing to do with the hash value.
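To make the distinction concrete, here is a minimal sketch of what "pure" content addressing would look like: the hash of the bytes is the only handle. This is a toy in-memory illustration, not how Centera or any shipping product works; the class name and the choice of SHA-1 are my own assumptions.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store: the SHA-1 digest of the bytes is the only handle."""

    def __init__(self):
        self._objects = {}  # hex digest -> stored bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha1(data).hexdigest()
        # Write-once: an address that already exists is never overwritten.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentAddressedStore()
handle = store.put(b"check image #1")
print(handle, store.get(handle))
```

Note that the handle carries no metadata at all, and that two different objects with the same digest would silently share one slot, which is exactly why real systems build their handles differently.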

So it would seem that the term "CAS" is meaningless, and we all hope it dies. But Object Storage is here to stay due to its propensity to solve unsolved problems of scale, reliability, and TCO. Somewhere in there is a role for computing hash values, but that feature will be less and less visible to customers, especially as Object Storage moves into a primary storage role.

( Mar 14 2005, 11:22:11 AM PST ) Permalink Comments [1]

20050307 Monday March 07, 2005

Different Models for strategic asset archival

I spent last Tuesday near London participating in a very interesting meeting of the PrestoSpace organization. PrestoSpace is organized mainly by the good folks at BBC, but it is intended as a knowledge sharing forum for a variety of government and commercial owners of "legacy" media collections. They are putting their heads together to solve some of the challenges of how to take old collections of media (film canisters for example) and convert them into archives, both for preservation and to improve access. What should be made clear is that not all archive applications are the same.

In fact, I describe the various implementations as belonging to 4 categories, each of which imposes slightly different requirements on the storage and serving infrastructure:

1) Heritage archives:
Digital versions of historic or culturally important artifacts including video, images, documents, audio, etc. The data in the heritage archive represents in many cases the primary data of history. There is an increasing expectation that our children should be able to randomly surf this data as they form their own opinions of historic events, uncolored by the author of a textbook. Unfortunately many of these collections are already eroding to dust, unrecoverable from their original celluloid, silver nitrate, wax, or papyrus forms. Curators are now in a race to preserve these artifacts in digital form to protect them for future generations. Customers building heritage archives are particularly worried about cost, long-term data integrity (beyond the life of the media), and changes to their organizations required to support new digital workflows.

2) Compliance archives:
Compliance archives are digital repositories that are employed to support mandated data management policies handed down by various regulatory agencies. Regulations are imposed to ensure well-documented business processes or to ensure responsible and legal handling of individuals' personal information. Examples include SEC17a, which requires immutability and auditability of trading records, and HIPAA, which requires quality records retention but also works to protect individuals' data privacy. Sarbanes-Oxley does not mandate requirements for storage systems, but encourages good archival and audit practices that benefit from archive deployments. In Compliance applications, specific storage features such as immutability (WORM) and retention policies may be required to guarantee regulatory compliance.

3) Repurposing Archives:
These archives are typically found in media markets where an online archive provides the distribution leverage for generating incremental revenue or for lowering the cost of existing workflows. For some media companies with aging collections of master assets, preservation may also be a consideration. In many heritage archives, repurposing eventually turns out to provide tangible ROI as unpredicted distribution models provide opportunities for monetization. An excellent example is the GM Media Archives which were scanned and put online in a revenue generating asset management system. Another example is commonly found in film production where images, characters, and other assets are far easier to leverage for promotional materials or even future film productions if placed online in digital form.

4) Digital Distribution repositories:
Sometimes called origin servers, these are large scale storage systems that are built to facilitate more widespread digital distribution of assets that are otherwise available on library or enterprise shelves. Although digital library initiatives (see diglib.org) often concern themselves with preservation, the bulk of their material is simply books and journals that deserve the benefits of wide-area electronic distribution. It's no surprise, for example, that Reed-Elsevier, the leading publisher of scientific journals, deploys massive storage repositories. The digital distribution repository is also the core for next generation video-on-demand repositories being planned by broadcasters to provide random access into millions of hours of broadcast video or audio.

If we are to derive a specification for a storage system that is going to serve all of these applications, we need to tackle the following problems:
1) Initial deployments may be small, but grow significantly over time as ROI models are proven.
2) Capital budgets are the most significant barrier to archive deployment, and can vary from year-to-year.
3) The customer cannot afford sophisticated specialists for managing and maintaining the storage.
4) The collection may be too large to back up.
5) The life of the archive is longer than the life of the media. Migration should be easy and painless.
6) The life of the archive may be longer than the life of the data format.
7) Metadata is critical to finding and organizing data.
8) Cost and throughput of scanning efforts are continuing headaches.

So isn't the answer obvious?
We think so.

( Mar 07 2005, 03:18:57 PM PST ) Permalink Comments [0]

20050223 Wednesday February 23, 2005

Hash cracking and storage

Announcement of SHA-1 crypto hash cracking here.
Bruce Schneier has a nice dialogue on hash cracking here.

Ancient Chinese proverb: "It's not about the algorithm, it's about how you use it."

Storage vendors continue to discount market concerns about hash collisions by saying "the odds of hash collision are infinitesimal". Well, I know a customer with 2B objects in storage. Is 1/2,000,000,000 small enough? Yeah I know, 1/(2^80) or something of that form is the statistical answer. The point is that storage systems have to do better. If there is a non-zero probability of hash collision, then the system must accept and welcome hash collisions!

Hashes cannot be used in exclusivity to validate uniqueness of a data object.
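For the curious, the "statistical answer" can be roughed out with the standard birthday (union) bound: with n objects and a b-bit hash, the chance of at least one collision is at most n(n-1)/2 divided by 2^b. The sketch below assumes hash outputs are uniformly random, which is precisely the assumption vendors lean on; the function name is mine.

```python
from fractions import Fraction


def collision_probability_bound(num_objects: int, hash_bits: int) -> Fraction:
    """Upper bound on P(at least one collision), assuming uniformly random digests."""
    pairs = num_objects * (num_objects - 1) // 2  # number of distinct object pairs
    return Fraction(pairs, 2 ** hash_bits)


objects_in_store = 2_000_000_000  # the 2B-object customer mentioned above
md5_bound = collision_probability_bound(objects_in_store, 128)
sha1_bound = collision_probability_bound(objects_in_store, 160)

print(f"MD5   (128-bit): at most {float(md5_bound):.2e}")
print(f"SHA-1 (160-bit): at most {float(sha1_bound):.2e}")
```

The numbers come out tiny but never zero, and they say nothing about deliberately constructed collisions, so the design conclusion stands.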

( Feb 23 2005, 09:03:35 AM PST ) Permalink Comments [0]

20050216 Wednesday February 16, 2005

Messaging to the world on Honeycomb

Honeycomb is a storage project that started under the management of Bill Joy and Greg Papadopoulos in the CTO organization. It started with the assumption that even the so-called next-generation storage systems being proposed still don't solve the underlying problems in the large-scale file-storage marketplace.

Here are some highlights from an exhaustive market engagement program that began at the same time we began our design.
  • Customers need more economical storage - to buy and own.
    The growth of on-line data, especially “fixed content” data, is explosive. Many large scale customers have already passed the Petabyte boundary. For these environments, it's important to reduce the cost of the storage platform by using commodity components. Furthermore, large-scale storage today is more expensive to maintain than it is to purchase. Our customers simply cannot continue to add sysadmins as their data grows, and we need to dislocate conventional wisdom on "how many TB can be managed by a single Sysadmin?".
  • Customers need improved reliability, availability and serviceability.
    In today's systems, these characteristics add to storage acquisition cost and to TCO. In the future, RAS needs to be improved while lowering both ownership and acquisition cost. That means a system that will tolerate lots of failures and heal itself appropriately without anyone needing to show up at 2am on a Saturday.
  • Customers need transparent and non-disruptive scalability.
    Our customers are demanding “just in time” storage provisioning. Customers should only have to pay for storage as it's needed, and when it's deployed, scaling must not cause any disruption to the customer's application or clients. The problem is particularly acute for archival applications that scale steadily over time. Utility pricing (the ability to charge monthly just for the GB used), helps in this regard by eliminating the capital budgeting process for customers.
  • Customers need to more easily organize and find data.
    It's clear that when we're talking about millions or hundreds of millions of files, the management and protection of metadata (data about the data) is as important as the management and protection of data. All storage systems today ignore application metadata, and in order to find their data, customers deploy external databases with search capability that carry substantial costs and management burden. If these data attributes are damaged or lost, the data itself is effectively lost. In addition to application attribute metadata, customers increasingly need to track whether the data is obsolete, current, ownerless, mission-critical, or in need of regulatory treatment. Today, expensive humans manage this, but the right system architecture can greatly simplify the process.

What is Honeycomb?

Honeycomb is a collection of hardware and software technologies that solve problems around next generation large-scale “data hungry” applications. That includes better methods for reliability, availability, scaling, and even searching and organizing data. Honeycomb's features were explicitly designed to address the customer problems articulated above. Currently, no technology solution on offer addresses these customer pain points effectively. Honeycomb is being designed to fill this void in the market. Honeycomb can be deployed as technology components that complement existing NAS products, or even as a standalone storage system.


Why is Honeycomb being developed?

Honeycomb demonstrates Sun's dedication to solving next generation data storage and management problems. It's not about simply beating competition, it's about giving customers strategically powerful data management solutions.


Why is Sun better positioned to lead in this marketplace?

LAN-attached storage, including NAS, CAS, HSM, and other file-based services, calls upon the ultimate convergence of CPUs, OS, protocols, and networks. All of these things are core competencies of Sun. If we think towards next generation devices, we look to clustering, cryptography, consolidation and grid capabilities, load balancing, database, utility models, and a host of other areas, again all core competencies of Sun. The challenge is to make them all work together to solve unsolved customer problems. From Jonathan down we are committed to making that happen and that's why I work here.

I know what you're thinking..."the devil is in the details". Well, the details above are all I can provide until later this year when the NDA covers can be lifted a bit. Stay tuned for more.

( Feb 16 2005, 02:11:19 PM PST ) Permalink Comments [2]

20050209 Wednesday February 09, 2005

What's Up With EMC Centera??

A couple interesting articles just came out...
EMC dodges question on Centera performance
Security flaw could put EMC Centera users at risk
Scalability Hampers Large Email Archives

1) We've known from the beginning that Centera was pretty slow, but that was typically ok for the archival-oriented applications it was serving. We think this has mostly to do with the asymmetry of the clustered architecture, and something that they can only incrementally improve through the use of faster Linux boxes.

2) Then we started hearing about hash collisions in the field causing data loss. The statisticians have always reported that the odds of a hash collision are extremely small (like 1/(2^big number)), but that assumes that data is random. In fact the applications are extremely homogeneous (eg 100 million 10KB check images) and now collisions are happening. Hash collisions can affect the unwary in different ways, so it's not clear what mechanism is causing the data loss. Perhaps it's the de-duplication process that deletes redundant versions of what are perceived to be identical files. Perhaps it's a read operation that opens the wrong file. Either way, this is a black eye for EMC that will continue to get blacker, and is causing many vendors to scramble back to their designs. Fortunately our product is built assuming there will be collisions, and thus handles them nicely.
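What "built assuming there will be collisions" can mean in practice is sketched below. To be clear, this is not our product's design, just one generic way to do it: treat the hash as a hint, and only declare two objects identical after a byte-for-byte comparison. The hash is deliberately truncated to 8 bits so that collisions actually show up in a small demo; every name here is hypothetical.

```python
import hashlib
from collections import defaultdict


def weak_hash(data: bytes) -> str:
    # Deliberately truncated to 8 bits so collisions are easy to provoke.
    return hashlib.sha1(data).hexdigest()[:2]


class CollisionTolerantStore:
    """De-duplicating store that never trusts the digest alone."""

    def __init__(self, hash_fn=weak_hash):
        self._hash_fn = hash_fn
        self._buckets = defaultdict(list)  # digest -> list of distinct payloads

    def put(self, data: bytes):
        digest = self._hash_fn(data)
        bucket = self._buckets[digest]
        for index, existing in enumerate(bucket):
            if existing == data:  # true duplicate, verified byte for byte
                return (digest, index)
        bucket.append(data)  # same digest, different bytes: keep both
        return (digest, len(bucket) - 1)

    def get(self, handle):
        digest, index = handle
        return self._buckets[digest][index]


store = CollisionTolerantStore()
payloads = [f"object-{i}".encode() for i in range(1000)]
handles = [store.put(p) for p in payloads]  # 1000 objects, only 256 digests
```

With 1000 objects and 256 possible digests, collisions are guaranteed, yet every object keeps its own handle and reads back intact. The price is that the handle is no longer just the hash, which is the same conclusion the handle schemes of real products point to.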

3) Now they admit that the namespace only supports up to 400M objects. Boy, two black eyes in one day. Anthony had a good calculation..."240,000 emails/hour or 5.76M emails/day or 11.52M Objects/day will exceed Centera 400M limit in 34.72 days". Now it should be clear why EMC is asking customers to containerize their email records to reduce file count.
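Anthony's arithmetic is easy to check. The only thing not stated outright in the quote is the two-objects-per-email ratio, which I'm inferring from 5.76M emails turning into 11.52M objects.

```python
EMAILS_PER_HOUR = 240_000
OBJECTS_PER_EMAIL = 2  # assumption: implied by 5.76M emails -> 11.52M objects
NAMESPACE_LIMIT = 400_000_000

emails_per_day = EMAILS_PER_HOUR * 24
objects_per_day = emails_per_day * OBJECTS_PER_EMAIL
days_to_limit = NAMESPACE_LIMIT / objects_per_day

print(f"{emails_per_day:,} emails/day")
print(f"{objects_per_day:,} objects/day")
print(f"{days_to_limit:.2f} days to reach the namespace limit")
```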

If you really want a glimpse into the day-to-day issues of Centera, check out the CAS Yahoo group at http://groups.yahoo.com/group/CASTechGroup/.

( Feb 09 2005, 03:05:59 PM PST ) Permalink Comments [1]

Content Addressable Storage (CAS)

CAS is an interesting new approach to storage [well actually it's been around a while, just not broadly commercialized] where a crypto hash "checksum" is calculated for every file that's stored. The hash algorithm might be the lately maligned 128-bit MD5 or a 160-bit SHA1 or something different. The adjective "Content Addressable" is relatively misleading since it implies that the hash is used as a file handle. Yet there are no systems that work like that today, including Centera for which the CAS acronym was invented. Filepool, Scale8, and a variety of other researchy SSP-oriented storage designs did use hashes for internally-facing file handles, but none of those things are commercially deployed today afaik.
For file handles...

  • Centera uses something called a "C-clip", which combines the object hash with a metadata identifier.
  • Sun's CIS uses MD5's for auditing and compliance, but a conventional file system and filename scheme for naming files.
  • Honeycomb uses yet something else.

So CAS is a loose term for things that
a) are WORM systems and
b) calculate a hash somewhere along the way.

Why use hashes? Well the real reason is that it simplifies the design for a storage system intended to be scalable by eliminating the need for distributed lock management across a clustered system. The serendipitous benefit is that it typically means objects are immutable, something useful for SEC-regulated applications.
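One way to picture the lock-management argument: if an object's name is derived purely from its bytes, every node in the cluster can compute the same name, and the same placement, without asking anyone else. The sketch below is my own illustration under that assumption, with a hypothetical four-node cluster and naive modulo placement; it is not a description of any particular product.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster members


def object_name(data: bytes) -> str:
    """Name derived only from content, so no central name allocator is needed."""
    return hashlib.sha1(data).hexdigest()


def owner_node(name: str, nodes=NODES) -> str:
    """Deterministic placement: any node computes the same owner for a name."""
    return nodes[int(name, 16) % len(nodes)]


# Two front-end nodes receive the same bytes at the same time.
payload = b"trade record 2005-02-09 #000123"
name_seen_by_first = object_name(payload)
name_seen_by_second = object_name(payload)

print(name_seen_by_first == name_seen_by_second)
print(owner_node(name_seen_by_first))
```

Since the bytes behind a name never change, there is nothing to lock for updates either, which is where the immutability falls out.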

So the term CAS is not particularly accurate, is aligned pretty strongly with EMC, and is not particularly benefits-oriented, so the term will die a slow death over the next couple of years. In fact SNIA has already changed the name of the relevant committee away from CAS.

( Feb 09 2005, 02:48:43 PM PST ) Permalink Comments [0]

