Evolution not Revolution

Swimming in salt water is wonderful; drinking it is not. Four hundred years ago, the first settlers at Jamestown, Virginia, struggled through their first five years because the fresh water they depended on for drinking turned brackish in the summer. Suddenly, besides the plagues, angry Indians, and crop difficulties, they had to find new sources of fresh water inland. Libraries and publishers face a similar challenge, as the hybrid world of print and online publications has changed the economic certainties that kept both healthy.

The past five years in the information world have been full of revolutionary promise, but the new reality has not yet matched the promise of a universal library. Google Scholar promised universal access to scholarly information, yet after its dynamic start in 2004 it has seen few evolutionary changes. In fact, the addition of Library Links, built on OpenURL support, is the most recent major feature Scholar has gained. That NISO standard, which enables seamless full-text access, has shown its value.
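OpenURL works by encoding a citation as key/value pairs appended to an institution’s link resolver address, which then routes the reader to licensed full text. A minimal sketch of building such a link follows; the resolver base URL is hypothetical, and the field set is a small illustrative subset of the standard.

```python
from urllib.parse import urlencode

def build_openurl(resolver_base, **citation):
    """Build a simple OpenURL 1.0 link in KEV (key-encoded value) form.

    The resolver base URL is institution-specific; the example below is
    made up. Only a few journal-article fields are shown.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Citation fields carry the "rft." (referent) prefix.
    params.update({f"rft.{k}": v for k, v in citation.items()})
    return resolver_base + "?" + urlencode(params)

link = build_openurl(
    "https://resolver.example.edu/openurl",  # hypothetical resolver
    jtitle="Nature", volume="455", spage="1", date="2008",
)
```

A library’s resolver receives this context and decides, from its holdings, where to send the reader, which is exactly the seam Library Links exploits.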

For years it has been predicted that the Google Books project would revolutionize scholarship, and in some respects it has. But in balancing the goals of cornering Amazon’s search-inside-the-book market, respecting authors’ rights, finding the rights holders of so-called orphan works, and solving metadata and scanning quality problems, its early promise is not yet fulfilled.

The Information Bomb and Activity Streams

In 1993, Yale computer science professor David Gelernter opened a package he thought was a dissertation in progress. Instead, it was a bomb from the Unabomber, who had written in his manifesto that “Technological society is incompatible with individual freedom and must therefore be destroyed and replaced by primitive society so that people will be free again.” Though Kaczynski’s point was lost when attached to violence, it is ironic that his target was a computer science professor who professed not to like computers, the tool of a technological society.

In one of the dissertations Gelernter supervised, Eric Thomas Freeman proposed a new direction for information management. Freeman argued that “In an attempt to do better we have reduced information management to a few simple and unifying concepts and created Lifestreams. Lifestreams is a software architecture based on a simple data structure, a time-ordered stream of documents, that can be manipulated with a small number of powerful operators to locate, organize, summarize, and monitor information.” Thus, the stream was born of a desire to answer information overload.

While Freeman anticipated freedom from common desktop computing metaphors, the Web had not reached ubiquity 12 years ago. His lifestream principles live on in the interfaces of Twitter, Delicious, Facebook, and FriendFeed. But have you tried to find a tweet from three months ago? How about something you wrote on Facebook last year? And FriendFeed discussions have no obvious URL, so there’s no easy way to return to the past. This obsolescence is by design, and the stream comes and goes like an information bomb.
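Freeman’s model is easy to picture in code: a time-ordered list of documents plus a handful of operators over it. The sketch below is a loose paraphrase of that idea; the class and method names are my own, not Freeman’s.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Document:
    created: datetime
    text: str

class Lifestream:
    """A time-ordered stream of documents with a few simple operators,
    loosely after Freeman's lifestream model (names are invented here)."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)
        self._docs.sort(key=lambda d: d.created)  # keep time order

    def locate(self, term):
        """Substream of documents mentioning a term."""
        return [d for d in self._docs if term.lower() in d.text.lower()]

    def monitor(self, since):
        """Documents that entered the stream after a point in time."""
        return [d for d in self._docs if d.created > since]

stream = Lifestream()
stream.add(Document(datetime(2009, 1, 5), "Notes on OpenURL resolvers"))
stream.add(Document(datetime(2009, 3, 2), "Draft column on activity streams"))
found = stream.locate("openurl")
```

The point of the exercise is the contrast: a stream you own supports `locate` and `monitor` indefinitely, where the commercial streams above quietly let the past drop away.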

In The Anxiety of Obsolescence, Pomona College English professor Kathleen Fitzpatrick says that “The Internet is merely the latest of the competitors that print culture has been pitted against since the late nineteenth century. Threats to the book’s presumed dominance over the hearts and minds of Americans have arisen at every technological turn—or so the rampant public discourse of print’s obsolescence would lead one to believe.” Fitzpatrick goes on to say that her work is dedicated to demonstrating the “peaceable coexistence of literature and television, despite all the loud claims to the contrary.” This objective is a useful response to the usual kvetching about the utter uselessness of the activity stream of the day.

A Standard for Sharing
Now popularized as activity streams, the flow of information has gained appeal because it gives users a way to curate their own information. Yet there is no standard way for this information to be recast by the user or data providers in a way that preserves privacy or archival access.

Chris Messina has advocated for social network interoperability, and suggests that “with a little effort on the publishing side, activity streams could become much more valuable by being easier for web services to consume, interpret and to provide better filtering and weighting of shared activities to make it easier for people to get access to relevant information from people that they care about, as it happens.” Messina points out that activity streams “provide what all good news stories provide: the who, what, when, where and sometimes, how.”
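That who/what/when/where framing maps naturally onto a structured record. The sketch below builds such an entry; the field names loosely echo the Activity Streams drafts, but the exact shape here is illustrative, not normative.

```python
import json
from datetime import datetime, timezone

def make_activity(actor, verb, obj, target=None):
    """Build a minimal activity entry: who did what, to which object, when.
    Field names loosely follow the Activity Streams drafts; treat the
    precise shape as an illustration rather than the standard itself."""
    activity = {
        "actor": actor,                                       # who
        "verb": verb,                                         # what action
        "object": obj,                                        # acted upon
        "published": datetime.now(timezone.utc).isoformat(),  # when
    }
    if target is not None:
        activity["target"] = target                           # where it landed
    return activity

entry = make_activity(
    actor={"displayName": "A. Reader"},
    verb="bookmark",
    obj={"url": "http://example.org/article/42", "displayName": "An article"},
)
print(json.dumps(entry, indent=2))
```

Because every entry carries the same named parts, a consuming service can filter and weight activities without guessing at the prose of a feed item.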

In the digital age, activity streams could be used as a way to record interactions with scholarly materials. Just as COUNTER and MESUR (MEtrics from Scholarly Usage of Resources) record statistics about how journal articles are viewed, an activity stream standard could be used to provide context around browsing.

For example, Swarthmore holds a fascinating collection of W.H. Auden ephemera. You can see which books he checked out, the books he placed on reserve for his students, and even his unauthorized annotations, including an exasperated response to his own work: “Oh God, what rubbish.” What seemed ephemeral becomes a fascinating exercise in tracing the thought of a poet in America at a crucial period in his scholarly development. If we had also captured what Auden was listening to, reading, and attending, what a treasure trove it would be for biographers and scholars.

The Appeal of Activity Streams
In 2007, Dan Chudnov wrote in Social Software: You Are an Access Point, “There’s a downside to all of this talk of things “social.” As soon as you become an access point, you also become a data point. Make no mistake: Facebook and MySpace wouldn’t still be around if they couldn’t make a lot of money off of each of us, so remember that while your use of these services makes it all seem better for everybody else, the sites’ owners are skimming profit right off the top of that network effect.” How then can users access and understand their own streams and data points?

Maciej Ceglowski, former Mellon Foundation grant officer and Yahoo engineer, has founded an antisocial bookmarking service called Pinboard which safeguards user privacy over monetization and sharing features. One of its appealing features is placing the user at the center of what they choose to share, without presuming that the record is open by default. In fact, bookmarks can be made private with ease.

In The Information Bomb, Paul Virilio wrote that “Digital messages and images matter less than their instantaneous delivery: the shock effect always wins out over the consideration of the informational content. Hence the indistinguishable and unpredictable character of the offensive act and the technical breakdown.” Users can manage or drown in the stream. To safeguard this information, users should push for their own data to be made available so that they can make educated choices.

With the well-founded Department of Justice inquiry into the Google Books project over monopoly pricing and privacy, libraries can now ask for book usage information. Just as position information enables HathiTrust to provide full-text searchability, usage information would give libraries a way to better serve patrons, and would give special collections a treasure trove of information.

The Information Bomb, Paul Virilio; Verso Books, 2000.

Annotating Video

It seems that everything’s available online, except the ability to search for particular video scenes. Recently, I was searching for an actress I’d last seen in a film 15 years ago and imdb.com was no help. I eventually found Lena Olin by watching the credits, but the experience made me wonder if video standards could aid the discovery process.

In a conversation last year, Kristen Fisher Ratan of HighWire Press wondered whether there was a standards-based way to jump to a particular place in a video, something YouTube currently offers through URL parameters. This is an obvious first step for citation, much as the page number is the lingua franca of academic citations and footnotes. And once a naming convention is established, the ability to retrieve and search for particular passages becomes a basic requirement for any video application.
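Two conventions already point the way: YouTube’s start-time URL parameter, and the W3C Media Fragments draft, which addresses a time range with a `#t=start,end` fragment. A small sketch of both follows; the URLs are placeholders.

```python
def youtube_start_url(video_url, seconds):
    """Append a start-time parameter of the kind YouTube accepts (t=90)."""
    sep = "&" if "?" in video_url else "?"
    return f"{video_url}{sep}t={seconds}"

def media_fragment(video_url, start, end=None):
    """Address a time range with a Media-Fragments-style #t= fragment."""
    frag = f"#t={start}" if end is None else f"#t={start},{end}"
    return video_url + frag

clip = media_fragment("http://example.org/lecture.ogv", 90, 120)
# clip -> "http://example.org/lecture.ogv#t=90,120"
```

The fragment form is the more citation-friendly of the two: it names the passage itself rather than instructing one particular player where to begin.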

Josh Bernoff, a Forrester researcher, is quite skeptical about video standards, saying, “Don’t expect universal metadata standards. Standards will develop around discrete applications, driven primarily by distributors like cable and satellite operators.” While this is likely true of the present, use of established markup languages like RDF using relevant subsets of Dublin Core extensions could enable convergence. As John Toebes, Cisco chief architect, wrote for the W3C Video on the Web workshop, “Industry support for standards alignment, adoption, and extension would positively impact the overall health of the content management and digital distribution industry.”

Existing Models
It’s useful to examine the standards that have formed around still images, since there is a mature digital heritage for comparison. NISO’s Data Dictionary: Technical Metadata for Digital Still Images, whose XML expression is known as MIX, is a comprehensive guide to the fields in use for managing images.

IPTC and EXIF standards for images have the secondary benefit of embedding metadata, so that information is added at the point of capture in a machine-readable format. However, many images, particularly historical ones, need metadata added after the fact. Browsing Flickr images gives an idea of the model—camera information comes from the EXIF metadata, and IPTC can be used to capture rights information. However, tags and georeferences are typically added after the image has been taken, which requires a different standard.

Fotonotes is one of the best annotation technologies going, and has been extended by Flickr and others to give users and developers the ability to add notes to particular sections of an image. The annotations are saved in an XML file, and are easily readable, if not exactly portable.
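A Fotonotes-style note is just text attached to a rectangular region of an image. The sketch below serializes such notes to XML; the element and attribute names here are invented for illustration, not the Fotonotes schema itself.

```python
import xml.etree.ElementTree as ET

def annotate(image_url, notes):
    """Serialize image-region notes to XML, in the spirit of Fotonotes.
    Element names are invented for this sketch; each note carries the
    x/y position and width/height of the region it describes."""
    root = ET.Element("annotations", {"image": image_url})
    for n in notes:
        note = ET.SubElement(
            root, "note", {k: str(n[k]) for k in ("x", "y", "w", "h")}
        )
        note.text = n["text"]
    return ET.tostring(root, encoding="unicode")

xml = annotate(
    "http://example.org/photo.jpg",
    [{"x": 10, "y": 20, "w": 100, "h": 50, "text": "A marginal note"}],
)
```

Readable, yes; portable only to the extent that other tools agree on the vocabulary, which is exactly the standards gap the column is circling.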

The problem
For precise retrieval, video requires either a text transcript or complete metadata. Jane Hunter and Renato Iannella did an excellent job of proposing a model system for news video indexing using RDF and Dublin Core extensions in a proposal that is now ten years old. There has been some standardization around the use of Flash and MPEG for web display of video, which narrows the questions, just as PDF adoption standardized journal article display.

With renewed interest in Semantic Web technologies from the Library of Congress and venture capital investors, the combination of Dublin Core extensions for video and the implementation of SMIL (pronounced smile) may be prime territory for mapping to an archival standard for video.

Support is being built into Firefox and Safari, but the exciting part of SMIL is that it can reference metadata from markup. So, if you have a video file, metadata about the object, a transcript, and various representations (archival, web, and mobile encodings of the file), SMIL can contain the markup for all of these things. Simply stated, SMIL is a text file that describes a set of media files and how they should be presented.
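Since SMIL is simply a text file describing media and presentation, it is easy to generate. The sketch below emits a minimal SMIL document pairing a video with a text track; the file names are placeholders, and the layout is deliberately bare.

```python
import xml.etree.ElementTree as ET

def build_smil(video_src, caption_src):
    """Emit a minimal SMIL presentation pairing a video with a text track.
    Element names follow SMIL; a real presentation would add layout,
    metadata, and alternate encodings."""
    smil = ET.Element("smil", {"xmlns": "http://www.w3.org/ns/SMIL"})
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # <par>: play children in parallel
    ET.SubElement(par, "video", {"src": video_src})
    ET.SubElement(par, "text", {"src": caption_src})
    return ET.tostring(smil, encoding="unicode")

doc = build_smil("lecture-web.mp4", "lecture-transcript.txt")
```

The same `<par>` block could equally reference the archival master and a mobile encoding, which is what makes SMIL attractive as a wrapper for a whole bundle of representations.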

Prototypes on the horizon
Another way of obtaining metadata is through interested parties or scholars collaborating to create a shared pool of information to reference. The Open Annotation Collaboration, just now seeking grant funding, and featuring Herbert van de Sompel and Jane Hunter as investigators, seeks to establish a mechanism for client-side integration of video snippets and text as well as machine-to-machine interaction for deeper analysis and collection.

And close by is a new Firefox add-on, first described in D-Lib as NeoNote, which promises a similar option for articles and videos. One attraction it offers is the ability for scholars to capture their annotations, share them selectively, and use a WebDAV server for storage. This assumes a certain level of technical proficiency, but the distributed approach to storage has been a proven winner in libraries for many years now.

The vision
Just as the DOI revolutionized journal article URL permanence, I hope for a future where a video URL can be passed to an application and all related annotations can be retrieved, searched, and saved for further use. Then, my casual search for the actress in The Reader and The Unbearable Lightness of Being will be a starting point for retrieval instead of a journey down the rabbit hole.

Are You Paying Attention?

Not for the first time, the glut of incoming information threatens to dilute useful knowledge into a mere cloud of data. And there’s no doubt that activity streams and linked data are two of the more interesting aids to research in this onrushing surge of information. In this screen-mediated age, the rewards of deep focus and hyper attention are mixed up like never before, since the advantage accrues to the company that can collect the most data, aggregate it, and repurpose it for willing marketers.

N. Katherine Hayles does an excellent job of distinguishing between the uses of hyper and deep attention without privileging either. Her point is simple: “Deep attention is superb for solving complex problems represented in a single medium, but it comes at the price of environmental alertness and flexibility of response. Hyper attention excels at negotiating rapidly changing environments in which multiple foci compete for attention; its disadvantage is impatience with focusing for long periods on a noninteractive object such as a Victorian novel or complicated math problem.”

Does data matter?
The MESUR project is one of the more interesting research projects going, now living on as an Ex Libris product called bX. Under the hood, MESUR looks at the patterns of research searches, not simply the number of hits, and stores the information as triples: subject-predicate-object statements in RDF, the Resource Description Framework. RDF triple stores can put the best of us to sleep, so one way to think of them is as smart filters. Having semantic information available allows computers to distinguish between Apple the fruit and Apple the computer.
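The triple idea is small enough to show whole. Below is a toy subject-predicate-object store with pattern matching; real systems use RDF libraries and query languages like SPARQL, and the `ex:` identifiers are invented for the example.

```python
class TripleStore:
    """A toy subject-predicate-object store. Production systems use RDF
    libraries and SPARQL; this only illustrates the data model."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Match triples against a pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

store = TripleStore()
# Typed subjects are what let a machine tell the two Apples apart.
store.add("ex:Apple_Inc", "rdf:type", "ex:Company")
store.add("ex:Apple_fruit", "rdf:type", "ex:Fruit")
store.add("ex:Apple_Inc", "ex:makes", "ex:Macintosh")
companies = store.query(p="rdf:type", o="ex:Company")
```

A query for things typed as companies returns only `ex:Apple_Inc`, which is the whole trick behind the “smart filter” framing.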

In use, semantic differentiation gives striking information gains. I picked up the novel Desperate Characters, by Paula Fox. While reading it, I remembered that I first heard it mentioned in an essay by Jonathan Franzen, who wrote the foreword to the edition I purchased. That essay was published in Harper’s, and the RDF framework in use on harpers.org gave me a way to see articles both by Franzen and about him. This semantic disambiguation is the obverse of the firehose of information monetized through advertisements.

Since MESUR pulls information from the SFX link resolver logs of Caltech and Los Alamos National Laboratory, there’s an immediate relevance filter applied, given the scientists doing research at those institutions. Using the information contained in the logs, it’s possible to see whether a given IP address (belonging to a faculty member or a department) goes through an involved research process or a short one. The researcher’s clickstream is captured and fed back for better analysis. Any subsequent researcher who clicks on a similar SFX link has a recommender system seeded with ten billion clickstreams. This promises researchers a smarter works-cited list, so that they can see what’s relevant in their field prior to publication. Competition just got smarter.

Standards based way of description
Attention.xml, first proposed in 2004 as an open standard by Technorati technologist Tantek Çelik and journalist Steve Gillmor, promised to give priority to the items users want to see. The problem, articulated five years ago, was that feed overload is real, and the need to see new items, and what friends are also reading, requires a standard that allows for collaborative reading and organizing.

The standard seems to have been absorbed into Technorati, but the concept lives on in the latest beta of Apple’s Safari browser, which lists Top Sites by usage and recent history, as does Opera’s Speed Dial. And of course, Google Reader has Top Recommendations, which tries to turn the enormous corpus of data it collects into useful information.

Richard Powers’ novel Galatea 2.2 describes an attempt to train a neural network to recognize the Great Books, but finds socializing online to be a failing project: “The web was a neighborhood more efficiently lonely than the one it replaced. Its solitude was bigger and faster. When relentless intelligence finally completed its program, when the terminal drop box brought the last barefoot, abused child on line and everyone could at last say anything to everyone else in existence, it seemed to me we’d still have nothing to say to each other and many more ways not to say it.” Machine learning has its limits, including whether the human chooses to pay attention to the machine in a hyper or deep way.

Hunch, a web application designed by Caterina Fake, best known as a co-founder of Flickr, is a new example of machine learning. The site offers to “help you make decisions and gets smarter the more you use it.” After signing up, you’re given a list of preference questions to answer. Some are standard marketing questions, like how many people live in your household, but others are clever or winsome. The answers are used to construct a probability model, which is consulted when you declare, “Today, I’m making a decision about…” As the application is a work in progress, it’s not yet a replacement for a clever reference librarian, even if its model is quite similar to the classic reference interview. It turns out that machines are best at giving advice about other machines, and if the list of results someday incorporates something larger than the open Web, the technology could represent a leap forward. Already, it does a brilliant job of applying deep attention to the hypersprawling web of information.
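The mechanic is easy to caricature: answers to preference questions shift weights on candidate outcomes, and the highest-weighted outcome is recommended. The sketch below is my own toy version of that idea; the questions, outcomes, and weights are all invented, and Hunch’s actual model is surely richer.

```python
from collections import defaultdict

class DecisionHelper:
    """Toy preference model in the spirit of Hunch: each answered question
    shifts weights on candidate outcomes. Rules and weights are invented."""

    def __init__(self):
        self.scores = defaultdict(float)
        self.rules = []  # (question, answer, outcome, weight)

    def add_rule(self, question, answer, outcome, weight):
        self.rules.append((question, answer, outcome, weight))

    def answer(self, question, given):
        for q, a, outcome, w in self.rules:
            if q == question and a == given:
                self.scores[outcome] += w

    def recommend(self):
        return max(self.scores, key=self.scores.get)

helper = DecisionHelper()
helper.add_rule("portable?", "yes", "laptop", 2.0)
helper.add_rule("portable?", "no", "desktop", 2.0)
helper.add_rule("budget tight?", "yes", "desktop", 1.0)
helper.answer("portable?", "yes")
choice = helper.recommend()
```

Even this caricature shows why the flow resembles a reference interview: each question narrows the probability mass rather than filtering results outright.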

How to Achieve True Greatness
Privacy has returned to norms first seen in small-town America before World War II, and our sense of self is next on the block. This is as old as the Renaissance courts described in Baldesar Castiglione’s The Book of the Courtier and as new as Twitter, the new party line, which gives ambient awareness of people and events.

In this age of information overload, it seems like a non sequitur that technology could solve what it created. And yet, since the business model of the 21st century is based on data and widgets made of code, not things, there is plenty of incentive to fix the problem of attention. Remember, Google started as a way to assign importance based on who was linking to whom.

This balance is probably best handled by libraries, with their obsessive attention to user privacy and reader needs, and librarians are the frontier between the machine and the person. The open question is, will the need to curate attention be overwhelming to those doing the filtering?
Galatea 2.2, Richard Powers; Farrar, Straus and Giroux, 1995.

Lock-in leads to lockdown

What goes up must come down. This simple law of gravity can be seen in baseball and, these days, the stock market.

As I attended the Web 2.0 conference in New York recently, I had occasion to ask Tim O’Reilly what he thought about libraries. “Well, OCLC’s doing some good things,” he said. I encouraged him to continue looking at library standards, as the 2006 Reading 2.0 conference pulled together a number of interesting people who have been poking at the standards that knit libraries and publishers together. 

But the phrase Web 2.0, coined by O’Reilly, was showing signs of age. From the halcyon days, when every recently funded website showed off rounded corners and artful form-submission fades, the new companies were a shadow of their former booth size. Sharing space with the Interop conference, Web 2.0 was the bullpen to the larger playing field.

Interoperability
What helps companies to grow and expand? Some posit that the value of software is estimated by lock-in, that is, the number of users who would incur switching costs by moving to a competitor or another platform.

In the standards world, lock-in is antithetical to good functioning. Certainly proprietary products and features play a role to keep innovation happening, but cultural institutions are too important to risk balkanization of data for short-term profits.

Trusted peers
It seems to me that curation has moved to the network level, and a certain amount of democratization is now possible. The cautions about privacy and users as access points are true and useful, but librarians and publishers have a role in recommending information, and doing so well is directly correlated with expert use of recommender systems. Web 2.0 applications like del.icio.us for bookmarks, Last.fm for music, and Twitter and Facebook for social networks provide a level of personal guidance that was algorithmically impossible before data was easily collectible.

Prior to Last.fm’s 2007 purchase by CBS, public collective data about listening habits was deemed “too valuable” to be mashed up by programmers any longer. In the library world, there’s a unique opportunity to give users the ability to see recommendations from trusted people. Though del.icio.us does this quite well for Internet-accessible sources, there’s an opportunity for scholarly publishers to standardize on a method. Elsevier’s recent Article 2.0 contest shows encouraging signs of moving toward a release of control back to the authors and institutions that originally wrote and sponsored the work.

In the end, though, companies forced to choose between opening up their data and paying their employees are unlikely to choose the long-term reward. Part of this difficulty, however, has been tied to the lack of available legal options, standards, or licenses for releasing data into the public domain. The Creative Commons project has pointed many people to defined choices for releasing their works into the public domain or licensing them for reuse.

Jonathan Rochkind of Johns Hopkins University points out that “A Creative Commons license is inappropriate for cataloging records, precisely because they are unlikely to be copyrightable. The whole legal premise of Creative Commons (and open source) licenses is that someone owns the copyright, and thus they have the right to license you to use it, and if you want a license, these are the terms. If you don’t own a copyright in the first place, there’s no way to license it under Creative Commons.”

The Open Data Commons has released a set of community norms for sharing data. This is a great step towards a standard way of separating profit concerns from the public good, and also frees companies from agonizing legal discussions about liability and best practices. 

Standard widgets
If sharing entire data sets isn’t feasible, one practice that was nearly universal in Web 2.0 companies was the use of widgets to embed data and information.

In his prescient entry, “Blogs, widgets, and user sloth,” Stu Weibel describes the difficulty he had installing a widget, a reality that remains depressing today.

Netvibes, a company that provides personalized start pages, has proposed a standard for a universal widget API. The jOPAC, an “integrated web widget,” uses this suggestion to make its library catalog embeddable in several online platforms and operating systems. Since widgets are still being used for commercial ventures, there seems to be an opportunity to define a clear method of data exchange. The University of Pennsylvania’s Library Portal is a good example of where this future could lead, as its portal page is flexible and customizable.

Perhaps a widget standard would give emerging companies and established ventures a method to exchange information in a way that promotes profits, privacy, and potential.


Mining for Meaning

In David Lodge’s 1984 novel, Small World, a character remarks that literary analysis of Shakespeare and T.S. Eliot “would just lend itself nicely to computerization….All you’d have to do would be to put the texts on to tape and you could get the computer to list every word, phrase and syntactical construction that the two writers had in common.”

This brave new world is upon us, but the larger question for Google and OCLC, among other purveyors of warehoused metadata and petabytes of information, is how to achieve meaning. One of the brilliant insights derived from Terry Winograd’s research and mentoring is that popularity in the form of inbound links does matter, for web pages at least. In the case of all the world’s books turned into digitized texts, it’s a harder question to assign meaning without popularity, a canon, or search queries as a guide.

Until recently, text mining wasn’t possible at great scale. And as the great scanning projects continue along their bumpy road, what will come out of them has yet to resolve into meaning for users.

Nascent standards
Bill Kasdorf pointed out several XML models for books in his May NISO presentation, including NISO/ISO 12083, TEI, DocBook, the NLM Book DTD, and DTBook. These existing models have served publishers well, though they have been employed for particular uses and have not yet found common ground across the breadth of book types. The need for a standard has never been clearer, but it will require vision and a clear understanding of solved problems to push forward.

After the professor in Small World gains access to a server, he grows giddy with the possibilities of finding “your own special, distinctive, unique way of using the English language….the words that carry a distinctive semantic content.” While we may be delighted about the possibilities that searching books afford, there is the distinct possibility that the world of the text could be changed completely.

Another mechanism for assigning meaning to full text has been opened up by web technology and science. The Open Text Mining Interface (OTMI) is a method championed by Nature Publishing Group as a way to share the contents of its archives in XML for the express purpose of text mining, while preserving intellectual property concerns. Now in its second revision, OTMI is an elegant method of enabling sharing, though it remains to be seen whether the initiative will spread to a larger audience.
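The balancing act OTMI attempts can be illustrated with derived data: if a publisher shares only term frequencies, a miner can compute with the article while no one can read it back. The sketch below is a simplification of that idea, not the OTMI format itself, and the stopword list is arbitrary.

```python
import re
from collections import Counter

def term_vector(text, stopwords=frozenset({"the", "a", "of", "and", "to"})):
    """Reduce an article to term frequencies: the kind of derived data an
    OTMI-like feed can share for mining without exposing readable full text.
    The stopword list here is a tiny arbitrary sample."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)

vec = term_vector("The mining of the text reveals the shape of the argument.")
# Word order is discarded, so the article cannot be reconstructed from vec.
```

Sharing vectors rather than prose is what lets a publisher serve the miners while keeping faith with its rights holders.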

Sense making
As the corpus lurches toward the cloud, one interesting example of semantic meaning comes from the Open Calais project, an open platform from the reconstituted Thomson Reuters. When raw text is fed into the Calais web service, terms are extracted and fed into existing taxonomies. Thus persons, countries, and categories are first identified and then made available for verification.

This experimental service has proved its value for unstructured text, whether extracting meaning from the most recent weblog posting or from historic newspapers newly converted to text via optical character recognition (OCR). Since human-created metadata and indexing services are among the most expensive things libraries and publishers create, any mechanism that optimizes human intelligence by using machines to create meaning is a useful way forward.
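At its crudest, entity extraction is matching known names against raw text and filing the hits under taxonomy headings. The sketch below is that crude version, with an invented two-category gazetteer; a service like Calais works from vastly richer taxonomies and statistical models.

```python
# A hypothetical gazetteer; a real service draws on far larger taxonomies.
GAZETTEER = {
    "person": {"W.H. Auden", "Paula Fox"},
    "country": {"France", "Mexico"},
}

def extract_entities(text):
    """Match known names in raw text and file them under taxonomy headings,
    a drastically simplified stand-in for an entity-extraction service."""
    found = {}
    for category, names in GAZETTEER.items():
        hits = sorted(n for n in names if n in text)
        if hits:
            found[category] = hits
    return found

entities = extract_entities("Paula Fox lived for a time in Mexico.")
```

Even this toy shows the shape of the output that matters to libraries: typed entities ready for human verification, rather than undifferentiated keywords.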

Calais shows promise for metadata enhancement, since full text can be mined for its word properties and fed into taxonomic structures. This could be the basis for future search engines that understand natural language queries, but it could also be a mechanism for accurate and precise concept browsing.

Glimmers of understanding
One method of gaining new understanding is to examine solved problems. Melvil Dewey understood vertical integration: he helped with innovations around 3×5 index cards and cabinets, as well as the classification system that bears his name. Some even say he was the first standards bearer for libraries, though anyone familiar with standards will find it hard to believe that one person could have been entirely responsible.

Another solved problem is how to make information about books and journals widely available. This has been done twice in the past century: first with the printed catalog card, distributed by the Library of Congress for the greater good, and then with the distributed catalog record, provided at great utility (and cost) by OCLC, the Online Computer Library Center.

Pointers are no longer entirely sufficient, since the problem is not only how to find information but how to make sense of it once it has been found. Linking from catalog records has been a partial solution, but the era of complete books online is now entering its second decade. The third stage is upon us.

Small World: An Academic Romance, David Lodge; Penguin, 1985.

Repurposing Metadata

As the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has become a central component of digital library projects, increased attention has been paid to the ways metadata can be reused. As every computer project since the beginning of time has had occasion to learn, the data available for harvesting is only as good as the data entered. Given these quality issues, there are larger questions about how to reuse valuable metadata once it has been described, cataloged, annotated, and abstracted.

Squeezing metadata into a juicer
As is often the case, the standards and library communities were out in front in thinking about how to make metadata accessible in a networked age. With the understanding that most creators of the metadata would be professionals, the Dublin Core standard left choices open about repeating elements and the like.

This has proved to be an interesting choice, since validators and computers tend to look unfavorably on unique choices that may make sense only locally. And as the weblog revolution that started in 2000 reached even the largest publications by 2006, these tools could not be ignored as a mass source of metadata creation.

Reusing digital objects
In the original 2006 proposal to the Mellon Foundation, Carl Lagoze wrote that “Terms like cyberinfrastructure, e-scholarship, and e-science all describe a concept of data-driven scholarship where researchers access shared data sets for analysis, reuse, and recombination with other network-available resources. Interest in this new scholarship is not limited to the physical and life sciences. Increasingly, social scientists and humanists are recognizing the potential of networked digital scholarship. A core component of this vision is a new notion of the scholarly document or publication.

Rather than being static and text-based, this scholarly artifact flexibly combines data, text, images, and services in multiple ways regardless of their location and genre.”

After being funded, this proposal turned into something interesting, with participation from the digital library community augmented by representatives of Microsoft, Google, and other large companies. Since Atom feeds have garnered much interest and become an IETF proposed standard (RFC 4287), there is community interest in bringing these worlds together. Now known as Open Archives Initiative Object Reuse and Exchange (OAI-ORE), the alpha release is drawing interesting reference implementations as well as criticism of the methods used to develop it.

Resource maps everywhere
Using existing web tools is a good example of working to extend rather than to invent. As Herbert van de Sompel noted in his Fall NISO Forum presentation, "Materials from repositories must be re-usable in different contexts, and life for those materials starts in repositories, it does not end there." And as the Los Alamos National Laboratory Library experiments have shown, the amount of reuse possible when you have journal data in full text is extraordinary.

Another potential use of OAI-ORE, beyond the repositories it was meant to assist, can be found in the Flickr Commons project. With pilot implementations from the Library of Congress, the Powerhouse Museum, and the Brooklyn Museum, OAI-ORE could play an interesting role in aggregating user-contributed metadata for evaluation, too. Once tags have been assigned, this metadata could be collected for further curation. In the same presentation, van de Sompel showed a Flickr photoset as an example of a compound information object.

Anything but lack of talent
A great way to understand the standard is to see it in use. Michael Giarlo of the Library of Congress developed a plugin for WordPress, a popular content management system that already generates Atom feeds. His plugin produces a resource map that is valid Atom and contains several Dublin Core elements, including title, creator, publisher, date, language, and subject. This resource map can be transformed into RDF triples via GRDDL, which in turn facilitates reuse by the linked data community.

This turns metadata creation on its head, since the Dublin Core elements are taken directly from what the weblog author enters: the title of the post, the author's name, the subjects assigned, and the date and time of the entry. One problem OAI-ORE promises to solve is how to connect disparate URLs into a single unified object, a task the use of Atom simplifies.
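To make the idea concrete, here is a minimal sketch of what such a resource map might look like when built from a weblog post's own fields. This is an illustration of the general pattern, not Giarlo's actual plugin: the function name, the example values, and the use of an ORE "aggregates" link relation to tie the component URLs together are all my assumptions.

```python
# Sketch: a minimal Atom entry acting as an OAI-ORE-style resource map,
# with Dublin Core elements drawn directly from a weblog post's fields.
# Illustrative only; not the output format of any particular plugin.
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("", ATOM)
ET.register_namespace("dc", DC)

def resource_map(title, author, subjects, updated, urls):
    """Bundle a post's metadata and its component URLs into one Atom entry."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # Dublin Core elements come straight from what the author typed in.
    ET.SubElement(entry, f"{{{DC}}}creator").text = author
    for s in subjects:
        ET.SubElement(entry, f"{{{DC}}}subject").text = s
    # Each aggregated resource (the post's HTML, images, data) becomes a
    # link, which is how disparate URLs cohere into one compound object.
    for url in urls:
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      rel="http://www.openarchives.org/ore/terms/aggregates",
                      href=url)
    return ET.tostring(entry, encoding="unicode")

print(resource_map("Resource maps everywhere", "Example Author",
                   ["OAI-ORE", "Atom"], "2008-05-01T12:00:00Z",
                   ["http://example.org/post.html",
                    "http://example.org/figure1.png"]))
```

The resulting XML carries both the Atom structure a feed reader expects and the Dublin Core elements a harvester can extract, which is exactly the dual citizenship the resource-map approach trades on.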

As the OAI-ORE specification moves into beta, it will be interesting to see if the constraints of the wider web world will breathe new life into carefully curated metadata. I certainly hope it does.

What SERU Solves

Good faith has powered collaboration between libraries and publishers for over 100 years. When books are ordered and purchased from publishers, libraries enter a long-term relationship with the object. In the world of atoms, it is understood that the publisher's relationship with the object stops when the library's check clears. In the world of bits, diffusion happens at a different pace.

Then as now, the publisher gives the library implicit and explicit rights. The library rarely turns around and sells purchased books at a markup, and as needs shift, books may be deaccessioned or sold at a book sale or in the gift shop. All rights belong to the library, and nothing beyond common law governs the relationship with the publisher.

This has worked out well for both parties. Libraries get to offer information and knowledge to all comers, and publishers get to extend their reach even to non-paying customers. Because the usual customer rights are upheld, infringing uses are rare (not many people copy entire books at a copy machine), and the trope of doing well by doing good holds.

In the digital age
A few years ago, I was involved in a project to digitize medical reference books. Previously, these highly valuable books were chained to hospital library desks to prevent theft. As the software evolved to allow full-text searching, natural language processing on queries, and cross-searching with journals and databases, a developer raised an important question: "How are we going to get paid?" Enter the simultaneous-use license. Exit simplicity. Enter negotiations. Exit the accustomed rights attached to print books.

And of course, this isn't a new problem. Books were chained to desks from the 15th to the 18th century, until it became attractive to display them spine out and multiple copies reduced the risk of theft. In the early 20th century, the German literary critic Walter Benjamin observed how technology was changing printing and writing: "With the woodcut graphic art became mechanically reproducible for the first time, long before script became reproducible by print. The enormous changes which printing, the mechanical reproduction of writing, has brought about in literature are a familiar story." CNI has collected a list of circulation policies that ALA compiled over the years, but it doesn't cover how the freedom to read changes in the age of mechanical reproduction.

Enter SERU
As my eminent colleague K. Matthew Dames points out, mistrust does characterize the licensing landscape. This is part of what standards are meant to address: adding clarity to new and sometimes bewildering territory, which licensing certainly is.

As a recommended working practice, NISO’s Shared Electronic Resource Understanding (SERU) offers radical common sense. In part, it says, “Both publishers and subscribing institutions will make reasonable efforts to prevent the misuse of the subscribed content. The subscribing institution will employ appropriate measures to ensure that access is limited to authorized users and will not knowingly allow unauthorized users to gain access. While the subscribing institution cannot control user behavior, an obligation to inform users of appropriate uses of the content is acknowledged, and the subscribing institution will cooperate with the publisher to resolve problems of inappropriate use.”

New circulation policies
This goes some way toward creating a circulation policy for the digital age. Dames correctly points out that the current licensing process is broken, and the stakes are high. But without lawyers being reminted as librarians en masse, this impedance mismatch is likely to continue. Given this logjam, SERU was created to set reasonable terms as a starting point.

Thus, SERU offers a solution for "particularly smaller publishers who perhaps do not have in-house lawyers or rights departments that can handle them." Since there is no shortage of mechanisms for restricting access to content in exchange for new business models, isn't now the time to start setting terms, before they are set for both libraries and publishers by larger interests?

Though SERU doesn’t claim to answer every possible scenario, it does offer a better, faster, and cheaper method for protecting the rights of libraries and publishers in the age of mechanical reproduction.

Illuminations, Walter Benjamin; Schocken Books, 1969.