Following the success of OpenURL, the Open Archives Initiative has been one of the most promising developments in the digital library world. Tools like OAIster (pronounced oyster), the National Science Digital Library, and the IMLS Digital Collections Registry show there has been a dramatic uptake in the number of libraries and tools that have implemented its harvesting protocol, OAI-PMH.
This relatively lightweight protocol was designed to make sharing of metadata as simple as RSS aggregation. As the number of adopters has risen, the aggregators have seen a few XML-related snags.
In short, metadata is user input. First law of programming: Never trust user input.
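To make "never trust user input" concrete, here is a minimal sketch of how an aggregator might screen harvested XML before ingesting it. The function name `partition_records`, the repository labels, and the sample records are all hypothetical, invented for illustration. Note that it only checks well-formedness; Python's standard library does not validate against a schema, so real schema validation would need an additional tool.

```python
import xml.etree.ElementTree as ET


def partition_records(responses):
    """Split (source, xml_text) pairs into accepted sources and rejected ones.

    A record is accepted only if it parses as well-formed XML.
    This is NOT schema validation, just a first line of defense.
    """
    accepted, rejected = [], []
    for source, xml_text in responses:
        try:
            ET.fromstring(xml_text)
        except ET.ParseError as err:
            # Keep the parser message so the repository owner can be told why.
            rejected.append((source, str(err)))
        else:
            accepted.append(source)
    return accepted, rejected


# Hypothetical sample data: the second record contains a bare ampersand,
# a common way for hand-entered metadata to break XML.
sample = [
    ("repo-a", "<record><title>Tides &amp; Currents</title></record>"),
    ("repo-b", "<record><title>Tides & Currents</title></record>"),
]

accepted, rejected = partition_records(sample)
for source, reason in rejected:
    print(f"rejected {source}: {reason}")
```

Collecting the rejects along with the parser's message, rather than failing on the first bad record, lets one broken repository be flagged without stalling the rest of the harvest.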
Many papers at library conferences are designed to showcase a particular implementation that went better than expected. That’s great; it’s always good to see libraries succeeding. However, it takes much more courage to share lessons learned, so that others can avoid the same pitfalls.
The winning paper at JCDL 2006 was written by Carl Lagoze, one of the original architects of the OAI protocol. In the paper, “Metadata aggregation and ‘automated digital libraries’: A retrospective on the NSDL experience,” he shares his rude awakening that many OAI archives are stuck with XML that doesn’t validate, which makes aggregators like the NSDL subject to truckloads of autogenerated emails.
As Dorothea’s commentary put it:
“The winning non-student paper both amused and frustrated me. Carl Lagoze talked about the National Science Digital Library, and how it was believed that the Magic Metadata Fairy would use OAI-PMH to build a beautiful searchable garden of science, and how everyone ended up with an ugly, weed-choked, cracked-asphalt vacant lot instead.”
She goes on to say what few technologists want to say: people still matter.
“I’ll be blunt. The solution for NSDL’s problem is hiring cataloguers, or metadata librarians, or indexers/abstracters, or whatever you want to call ’em, to clean up the incoming garbage. Ideally, OAI-PMH would be a two-way protocol, so that nice cleaned-up metadata made its way back to the repository that had spewed the garbage in the first place. That, however (despite all the jaw-flapping about frameworks that went on during JCDL) does not seem to be in the offing. It should be.”
Catalogers still matter. Especially the new breed of catalogers.