Open Libraries “… are signs of life and hope: They are the cornerstone of democracy”

Repurposing Metadata

As the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has become a central component of digital library projects, increased attention has been paid to the ways metadata can be reused. As every computer project has had occasion to learn, the data available for harvesting is only as good as the data entered. Beyond these quality issues, there are larger questions about how to reuse valuable metadata once it has been described, cataloged, annotated, and abstracted.
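To make harvesting concrete, here is a minimal sketch of how a harvester might build an OAI-PMH ListRecords request. The `verb`, `metadataPrefix`, `set`, and `from` arguments come from the protocol; the function name and the repository address are hypothetical and stand in for whatever a real project would use.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (helper name is illustrative)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("https://repository.example.org/oai", from_date="2008-01-01")
print(url)
```

Whatever comes back from such a request is exactly what the catalogers typed in, which is why the quality question matters before any reuse can happen.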

Squeezing metadata into a juicer
As is often the case, the standards and library communities were out in front in thinking about how to make metadata accessible in a networked age. On the understanding that most metadata creators would be professionals, the Dublin Core standard left choices such as whether to repeat elements up to the cataloger.
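A small sketch shows why repeatable elements matter to anyone consuming the records. The sample record below is invented for illustration, and `collect_dc_elements` is a hypothetical helper; the point is only that a consumer has to treat every Dublin Core element as a list, because any of them may appear zero, one, or many times.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented record: two creators and two subjects, no date at all.
SAMPLE_RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Annual Report</dc:title>
  <dc:creator>Smith, Jane</dc:creator>
  <dc:creator>Jones, Alex</dc:creator>
  <dc:subject>Libraries</dc:subject>
  <dc:subject>Metadata</dc:subject>
</record>
"""


def collect_dc_elements(xml_text):
    """Group Dublin Core values by element name, keeping repeats in document order."""
    root = ET.fromstring(xml_text.strip())
    prefix = "{" + DC_NS + "}"
    values = defaultdict(list)
    for child in root.iter():
        if child.tag.startswith(prefix) and child.text:
            values[child.tag[len(prefix):]].append(child.text.strip())
    return dict(values)


record = collect_dc_elements(SAMPLE_RECORD)
print(record["creator"])
```

Code that assumes a single creator or a mandatory date would stumble on this record, which is the kind of local choice validators tend to dislike.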

This has proved to be an interesting choice, since validators and computers tend to look unfavorably on choices that make sense only locally. Meanwhile, the weblog revolution that started around 2000 had reached even the largest publications by 2006, and these tools could no longer be ignored as a mass source of metadata creation.

Reusing digital objects
In the original 2006 proposal to the Mellon Foundation, Carl Lagoze wrote that “Terms like cyberinfrastructure, e-scholarship, and e-science all describe a concept of data-driven scholarship where researchers access shared data sets for analysis, reuse, and recombination with other network-available resources. Interest in this new scholarship is not limited to the physical and life sciences. Increasingly, social scientists and humanists are recognizing the potential of networked digital scholarship. A core component of this vision is a new notion of the scholarly document or publication.

“Rather than being static and text-based, this scholarly artifact flexibly combines data, text, images, and services in multiple ways regardless of their location and genre.”

After being funded, this proposal has turned into something interesting, with representatives from Microsoft, Google, and other large companies joining the digital library community in the effort. Since Atom feeds have garnered much interest and Atom has become an IETF proposed standard (RFC 4287), there is community interest in bringing these worlds together. The project is now known as Open Archives Initiative Object Reuse and Exchange (OAI-ORE), and its alpha release is drawing interesting reference implementations as well as criticism of the methods used to develop it.

Resource maps everywhere
Using existing web tools is a good example of working to extend rather than to invent. As Herbert van de Sompel noted in his Fall NISO Forum presentation, “Materials from repositories must be re-usable in different contexts, and life for those materials starts in repositories, it does not end there.” And as the Los Alamos National Laboratory Library experiments have shown, the amount of reuse that’s possible when you have journal data in full-text is extraordinary.

Another potential use of OAI-ORE beyond the repositories it was meant to assist can be found in the Flickr Commons project. With pilot implementations from the Library of Congress, the Powerhouse Museum and the Brooklyn Museum, OAI-ORE could play an interesting role in aggregating user-contributed metadata for evaluation, too. Once tags have been assigned, this metadata could be collected for further curation. In this same presentation, van de Sompel showed a Flickr photoset as an example of a compound information object.
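A photoset treated as a compound object can be sketched as a handful of triples. This is an illustration under stated assumptions, not the presentation's actual example: the URLs are invented, `resource_map_triples` is a hypothetical helper, and the term names (`ResourceMap`, `Aggregation`, `describes`, `aggregates`) are taken from the published ORE vocabulary, which may differ from the alpha release discussed here.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map_triples(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) triples tying separate URLs into one aggregation."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One "aggregates" statement per member resource, here one per photo.
    triples += [(aggregation_uri, ORE + "aggregates", uri) for uri in resource_uris]
    return triples


# Invented URLs standing in for a photoset and its photos.
photos = ["https://photos.example.org/1", "https://photos.example.org/2"]
triples = resource_map_triples(
    "https://photos.example.org/set/42.rem",
    "https://photos.example.org/set/42#aggregation",
    photos,
)
for triple in triples:
    print(triple)
```

Once the set is expressed this way, tags attached to each photo could be collected against the same URLs for later curation.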

Anything but lack of talent
A great way to understand the standard is to see it in use. Michael Giarlo of the Library of Congress developed a plugin for WordPress, a popular content management system that generates Atom. His plugin generates a resource map that is valid Atom and that contains some Dublin Core elements, including title, creator, publisher, date, language, and subject. This resource map can be transformed into RDF triples via GRDDL, which again facilitates reuse by the linked data community.

This turns metadata creation on its head, since the Dublin Core elements are taken directly from what the weblog author enters: the title, the name of the weblog author, the subjects that were assigned, and the date and time of the entry. One problem OAI-ORE promises to solve is how to connect disparate URLs into one unified object, a task that the use of Atom simplifies.
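The mapping described above can be sketched in a few lines. This is not the plugin's code or its actual output; `entry_from_post`, the post fields, and the sample values are all hypothetical, and the result is only an Atom-style entry carrying Dublin Core elements, not a complete ORE resource map.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("atom", ATOM)
ET.register_namespace("dc", DC)


def entry_from_post(post):
    """Copy what the weblog author already entered into Atom and Dublin Core elements."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = post["url"]
    ET.SubElement(entry, f"{{{ATOM}}}title").text = post["title"]
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = post["published"]
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = post["author"]
    # Dublin Core: title, creator, and date come straight from the post.
    ET.SubElement(entry, f"{{{DC}}}title").text = post["title"]
    ET.SubElement(entry, f"{{{DC}}}creator").text = post["author"]
    ET.SubElement(entry, f"{{{DC}}}date").text = post["published"]
    # Each assigned tag becomes one repeated dc:subject element.
    for tag in post["tags"]:
        ET.SubElement(entry, f"{{{DC}}}subject").text = tag
    return entry


# Invented post, for illustration only.
post = {
    "url": "https://blog.example.org/2008/repurposing-metadata",
    "title": "Repurposing Metadata",
    "author": "A. Blogger",
    "published": "2008-03-01T12:00:00Z",
    "tags": ["NISO", "OAI-ORE"],
}
entry = entry_from_post(post)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Nobody filled in a cataloging form here; the metadata falls out of the act of writing the post.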

As the OAI-ORE specification moves into beta, it will be interesting to see if the constraints of the wider web world will breathe new life into carefully curated metadata. I certainly hope they will.



What SERU Solves

Good faith has powered collaboration between libraries and publishers for over 100 years. When books are ordered and purchased from publishers, libraries enter a long-term relationship with the object. In the world of atoms, it is understood that the publisher’s relationship with the object stops when the library’s check clears. In the world of bits, diffusion happens at a different pace.

Then as now, the publisher gives the library implicit and explicit rights. The library rarely turns around and sells purchased books at a markup, and as needs shift, books may be deaccessioned or sold at a book sale or in the gift shop. All rights belong to the library, and no contracts other than common law govern the publisher relationship.

This has worked out well for both parties. Libraries get to offer information and knowledge to all comers, and publishers get to extend their reach to even non-paying customers. Because the usual customer rights are upheld, infringing uses are rare—not many people copy entire books at a copy machine—and the rare trope of doing well by doing good is upheld.

In the digital age
A few years ago, I was involved in a project to digitize medical reference books. Previously, the highly valuable books were chained to hospital library desks to prevent theft. As the software evolved to allow full text searching, natural language processing on queries, and cross searching with journals and databases, a developer raised an important question. “How are we going to get paid?” Enter the simultaneous use license. Exit simplicity. Enter negotiations. Exit the accustomed rights attached to print books. Enter simultaneous uses.

And of course, this isn’t a new problem. Books were chained to desks from the 15th to 18th centuries until it became attractive to display them spine out. In time, the risk of theft receded due to multiple copies. In the early 20th century, the German literary critic Walter Benjamin predicted that technology would change printing and writing: “With the woodcut graphic art became mechanically reproducible for the first time, long before script became reproducible by print. The enormous changes which printing, the mechanical reproduction of writing, has brought about in literature are a familiar story.” CNI collected a list of circulation policies that ALA has compiled over the years, but it doesn’t cover how the freedom to read is made different in the age of mechanical reproduction.

Enter SERU
As my eminent colleague K. Matthew Dames points out, mistrust does characterize the licensing landscape. This is in part what standards are meant to address—adding clarity to new and sometimes bewildering territory, which licensing certainly is.

As a recommended working practice, NISO’s Shared Electronic Resource Understanding (SERU) offers radical common sense. In part, it says, “Both publishers and subscribing institutions will make reasonable efforts to prevent the misuse of the subscribed content. The subscribing institution will employ appropriate measures to ensure that access is limited to authorized users and will not knowingly allow unauthorized users to gain access. While the subscribing institution cannot control user behavior, an obligation to inform users of appropriate uses of the content is acknowledged, and the subscribing institution will cooperate with the publisher to resolve problems of inappropriate use.”

New circulation policies
This goes some way towards creating a circulation policy for the digital age. Dames correctly points out that the current licensing process is broken, and the stakes are high. But without lawyers being reminted as librarians en masse, this impedance mismatch is likely to continue. Given this logjam, SERU was created to set reasonable terms as a starting point.

Thus, SERU offers a solution for “particularly smaller publishers who perhaps do not have in-house lawyers or rights departments that can handle them.” Since there is no lack of mechanisms for restricting access to content in exchange for new business models, isn’t now the time to start setting terms before they are set for both libraries and publishers by larger interests?

Though SERU doesn’t claim to answer every possible scenario, it does offer a better, faster, and cheaper method for protecting the rights of libraries and publishers in the age of mechanical reproduction.