I spent last Tuesday near London at a very interesting meeting of the PrestoSpace organization. PrestoSpace is organized mainly by the good folks at the BBC, but it is intended as a knowledge-sharing forum for a variety of government and commercial owners of "legacy" media collections. They are putting their heads together to solve the challenge of taking old media collections (film canisters, for example) and converting them into archives, both for preservation and for improved access. One thing should be made clear: not all archive applications are the same.
In fact, I group the various implementations into four categories, each of which imposes slightly different requirements on the storage and serving infrastructure:
1) Heritage archives:
Digital versions of historic or culturally important artifacts, including video, images, documents, audio, etc. In many cases, the data in a heritage archive represents the primary data of history. There is an increasing expectation that our children should be able to freely browse this data as they form their own opinions of historic events, uncolored by the author of a textbook. Unfortunately, many of these collections are already eroding to dust, unrecoverable from their original celluloid, silver nitrate, wax, or papyrus forms. Curators are now in a race to preserve these artifacts in digital form to protect them for future generations. Customers building heritage archives are particularly worried about cost, long-term data integrity (beyond the life of the media), and the organizational changes required to support new digital workflows.
2) Compliance archives:
Compliance archives are digital repositories employed to support mandated data management policies handed down by various regulatory agencies. Regulations are imposed to ensure well-documented business processes or to ensure responsible and legal handling of individuals' personal information. Examples include SEC Rule 17a-4, which requires immutability and auditability of trading records, and HIPAA, which requires records retention but also works to protect individuals' data privacy. Sarbanes-Oxley does not mandate requirements for storage systems, but it encourages good archival and audit practices that benefit from archive deployments. In compliance applications, specific storage features such as immutability (WORM) and retention policies may be required to guarantee regulatory compliance.
3) Repurposing Archives:
These archives are typically found in media markets, where an online archive provides the distribution leverage for generating incremental revenue or for lowering the cost of existing workflows. For some media companies with aging collections of master assets, preservation may also be a consideration. Even in many heritage archives, repurposing eventually turns out to provide tangible ROI as unforeseen distribution models create opportunities for monetization. An excellent example is the GM Media Archives, which were scanned and put online in a revenue-generating asset management system. Another example is common in film production, where images, characters, and other assets are far easier to leverage for promotional materials, or even for future film productions, once they are online in digital form.
4) Digital Distribution repositories:
Sometimes called origin servers, these are large-scale storage systems built to enable wider digital distribution of assets that are otherwise available only on library or enterprise shelves. Although digital library initiatives (see diglib.org) often concern themselves with preservation, the bulk of their material is simply books and journals that deserve the benefits of wide-area electronic distribution. It's no surprise, for example, that Reed-Elsevier, the leading publisher of scientific journals, deploys massive storage repositories. The digital distribution repository is also the core of the next-generation video-on-demand repositories that broadcasters are planning, to provide random access into millions of hours of broadcast video or audio.
If we are to derive a specification for a storage system that will serve all of these applications, we need to tackle the following problems:
1) Initial deployments may be small, but grow significantly over time as ROI models are proven.
2) Capital budgets are the most significant barrier to archive deployment, and they can vary from year to year.
3) The customer cannot afford specialized staff to manage and maintain the storage.
4) The collection may be too large to back up.
5) The life of the archive is longer than the life of the media. Migration should be easy and painless.
6) The life of the archive may be longer than the life of the data format.
7) Metadata is critical to finding and organizing data.
8) Cost and throughput of scanning efforts are continuing headaches.
So isn't the answer obvious?
We think so.