Open Libraries “… are signs of life and hope: They are the cornerstone of democracy”

Posts from April 2007

NetConnect Spring 2007 podcast episode 3

In Requiem for a Nun, William Faulkner famously wrote, “The past is never dead. It’s not even past.” With the advent of new processes, the past can survive and be retrieved in new ways and forms. The new skills needed to preserve digital information are the same ones that librarians have always employed to serve users: selection, acquisition, and local knowledge.

The print issue of NetConnect is bundled with the April 15th issue of Library Journal, or you can read the articles online.

Jessamyn West of librarian.net says in Saving Digital History that librarians and archivists should preserve digital information, starting with weblogs. Tom Hyry advocates using extensible processing in Reassessing Backlogs to make archives more accessible to users. And newly appointed Digital Library Federation executive director Peter Brantley covers the potential of the rapidly evolving world of print on demand in A Paperback in 4 Minutes. Melissa Rethlefsen describes the new breed of search engines in Product Pipeline, including those that incorporate social search. Gail Golderman and Bruce Connolly compare databases’ pay-per-view options in Pay by the Slice, and Library Web Chic Karen Coombs argues that librarians should embrace a balancing act in the debate between Privacy vs Personalization.

Jessamyn and Peter join me in a far-ranging conversation about some of the access challenges involved for readers and librarians in the world of online books, including common APIs for online books and how to broaden availability for all users.

Books
New Downtown Library
Neal Stephenson
Henry Petroski

Software
Greasemonkey User Scripts
Twitter
Yahoo Pipes
Dopplr

Outline
0:00 Music
0:10 Introduction

1:46 DLF Executive Director Peter Brantley
2:30 California Digital Library

4:13 Jessamyn West
5:08 Ask Metafilter
6:17 Saving Digital History
8:01 What Archivists Save
12:02 Culling from the Firehose of Information
12:34 API changes
14:15 Reading 2.0
15:13 Common APIs and Competitive Advantage
17:15 A Paperback in 4 Minutes
18:36 Lulu
19:06 On Demand Books
21:24 Attempts at hacking Google Book Search
22:30 Contracts change?
23:17 Unified Repository
23:57 Long Tail Benefit
24:45 Full Text Book Searching is Huge
25:08 Impact of Google
27:08 Broadband in Vermont
29:16 Questions of Access
30:45 New Downtown Library
33:21 Library Value Calculator
34:07 Hardbacks are Luxury Items
35:47 Developing World Access
37:54 Preventing the Constant Gardener scenario
40:21 Book on the Bookshelf
40:54 Small Things Considered
41:53 Diamond Age
43:10 Comment that spurred Brantley to read the book
43:40 Marketing Libraries
44:15 Pimp My Firefox
45:45 Greasemonkey User Scripts
45:53 Twitter
46:25 Yahoo Pipes
48:07 Dopplr
50:25 Software without the Letter E
50:45 DLF Spring Forum
52:00 OpenID in Libraries
53:40 Outro
54:00 Music

Listen here or subscribe to the podcast feed


Open Data: What Would Kilgour Think?

The New York Public Library has reached a settlement with iBiblio, the public’s library and digital archive at the University of North Carolina at Chapel Hill, over iBiblio’s harvesting of records from the NYPL Research Libraries catalog, which NYPL claims is copyrighted.

Heike Kordish, director of the NYPL Humanities Library, said a cease and desist letter was sent because of a 1980s incident in which an Australian harvesting effort turned around and resold the NYPL records.

Simon Spero, iBiblio employee and technical assistant to the assistant vice chancellor at UNC-Chapel Hill, said NYPL requested that its library records be destroyed, and the claim was settled with no admission of wrongdoing. “I would characterize the New York Public Library as being neither public nor a library,” Spero said.

It is a curious development that the NYPL, while making arrangements under private agreements to allow Google to scan its book collection into full text, feels free to threaten other research libraries over MARC records.

The price of open data
This follows a similar string of disagreements about open data between OCLC and the MIT Simile project. The Barton Engineering Library catalog records were made widely available via BitTorrent, a decentralized peer-to-peer file-sharing protocol.

This has since been resolved by making the Barton data available again, though in RDF and MODS, not MARC, under a Creative Commons license for non-commercial use.

OCLC CEO Jay Jordan said the issues around sharing data had their genesis in concerns about the Open WorldCat project and sharing records with Microsoft, Google, and Amazon. Other concerns about private equity firms entering the library market also drove recent revisions to the data sharing policies.

OCLC quietly revised its policy about sharing records, which had not been updated since 1987, following numerous debates in the 1980s about the legality of copyrighting member records.

The new WorldCat policy reads, in part, “WorldCat® records, metadata and holdings information (“Data”) may only be used by Users (defined as individuals accessing WorldCat via OCLC partner Web interfaces) solely for the personal, non-commercial purpose of assisting such Users with locating an item in a library of the User’s choosing… No part of any Data provided in any form by WorldCat may be used, disclosed, reproduced, transferred or transmitted in any form without the prior written consent of OCLC except as expressly permitted hereunder.”

The most recent board minutes indicate that concerns have been raised about “the risk to FirstSearch revenues from OpenWorldCat,” and that management incentive plans have been approved.

What is good for libraries?
Another project initiated by Simon Spero, named Fred 2.0 after the recently deceased Fred Kilgour of OCLC, Yale, and Chapel Hill fame, recently released Library of Congress authority file and subject information, which was gathered by means similar to those used for the NYPL records.

Spero said the project is “dedicated to the men and women at the Library of Congress and outside, who have worked for the past 108 years to build these authorities, often in the face of technology seemingly designed to make the task as difficult as possible.”

Since Library of Congress data, as free government information, by definition cannot be copyrighted, the project was more collaborative in nature and has received acclaim for its help in pointing out cataloging irregularities in the records. OCLC also offers a linked authority file as a research project.

Firefox was born from open source
While there is not yet consensus about what will be built from released library data, the release can be compared to Netscape open-sourcing the Mozilla code in 1998, which eventually brought Firefox and other open source projects to light. It also shows that the financial motivations of library organizations by necessity dictate the legal mechanisms of protection.