Preservation and Archiving
It was an exciting week for Preservation and Archiving activities.
First, I was introduced to several key academic researchers (and software developers) in the digital repository space who had come early to get the inside scoop on the latest ST 5800 (Honeycomb) developments. Dave Tarrant (University of Southampton), Ben O’Steen (University of Oxford), and Neil Jefferies (R&D Project Manager, Oxford University Libraries) met with us in the Honeycomb development facility in San Francisco. I have discussed Open Archives before, but these folks are working on some great software that starts to fulfill that promise. They are also customers of the ST 5800, and they are thrilled that we just open sourced the latest (1.1.1) product code, so they can contribute some of their ideas to that code base in the Honeycomb community.
Which brings us to the event I attended this past week. Sun sponsors a regular set of meetings of the Preservation and Archiving Special Interest Group (PASIG). The latest one was held in San Francisco and drew over 130 non-Sun attendees from across the industry. There were lots of great talks, and I highlight a few below:
Neil Jefferies talked about the Oxford Digital Asset Management System (slides). They use FEDORA-Commons in combination with the ST 5800 and other open source projects. They are ingesting the output of the Google Library Project into DAMS, something they call Mass Digitisation Ingest Components (MDICS).
David Tarrant talked about his work with EPrints at the University of Southampton (slides). His emphasis is on supporting small science as well as big science projects. He also talked about some of the requirements for long-term storage. He has combined the open source EPrints repository software with the ST 5800, using its self-managed storage to extend the value of the repository. EPrints has an extensible plug-in architecture that allows any storage device or database to be plugged in. He also discussed the Preserv interoperability project, which is working towards standards that separate the repository software from the storage controller module and from the actual physical storage.
Ray Clarke gave a great talk (slides) on some of the best practices for long-term data retention. He also gave a talk (slides) on some of the activities at the SNIA in this area, including XAM, the Data Management Forum (DMF), and the new Long Term Retention Technical Work Group (LTR TWG). SNIA is working on a format for encapsulating data along with its metadata that will allow it to be preserved in a storage-system-independent manner over its lifetime. Most of the attendees of the conference expect to retain and preserve their data essentially forever.
There were lots of other great talks, which you can find here. Many thanks to Art Pasquinelli (and many others), who did a bang-up job as usual organizing the event. The next meeting is tentatively scheduled for November in Europe. If you want to join the Sun PASIG community, join here.

