Open Libraries “… are signs of life and hope: They are the cornerstone of democracy”

Posts Tagged Google

Repurposing Metadata

As the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has become a central component of digital library projects, increased attention has been paid to the ways metadata can be reused. As every computer project since the beginning of time has had occasion to understand, the data available for harvesting is only as good as the data entered. Given these quality issues, there are larger questions about how to reuse the valuable metadata once it has been originally described, cataloged, annotated, and abstracted.

Squeezing metadata into a juicer
As is often the case, the standards and library communities were out in front in thinking about how to make metadata accessible in a networked age. On the understanding that most creators of metadata would be professionals, the Dublin Core standard left choices such as whether to repeat elements up to the cataloger.

This has proved to be an interesting choice, since validators and computers tend to look unfavorably on unique choices that may make sense only locally. Meanwhile, the weblog revolution that started in 2000 had reached even the largest publications by 2006, and these tools could no longer be ignored as a mass source of metadata creation.

Reusing digital objects
In the original 2006 proposal to the Mellon Foundation, Carl Lagoze wrote that “Terms like cyberinfrastructure, e-scholarship, and e-science all describe a concept of data-driven scholarship where researchers access shared data sets for analysis, reuse, and recombination with other network-available resources. Interest in this new scholarship is not limited to the physical and life sciences. Increasingly, social scientists and humanists are recognizing the potential of networked digital scholarship. A core component of this vision is a new notion of the scholarly document or publication.

Rather than being static and text-based, this scholarly artifact flexibly combines data, text, images, and services in multiple ways regardless of their location and genre.”

After being funded, this proposal has turned into something interesting, with digital library participation augmented by representatives from Microsoft, Google, and other large companies joining the digital library community. Since Atom feeds have garnered much interest and have become an IETF recommended standard, there is community interest in bringing these worlds together. Now known as Open Archives Initiative Object Reuse and Exchange (OAI-ORE), the alpha release is drawing interesting reference implementations as well as criticism for the methods used to develop it.

Resource maps everywhere
Using existing web tools is a good example of working to extend rather than to invent. As Herbert van de Sompel noted in his Fall NISO Forum presentation, “Materials from repositories must be re-usable in different contexts, and life for those materials starts in repositories, it does not end there.” And as the Los Alamos National Laboratory Library experiments have shown, the amount of reuse that’s possible when you have journal data in full-text is extraordinary.

Another potential use of OAI-ORE beyond the repositories it was meant to assist can be found in the Flickr Commons project. With pilot implementations from the Library of Congress, the Powerhouse Museum and the Brooklyn Museum, OAI-ORE could play an interesting role in aggregating user-contributed metadata for evaluation, too. Once tags have been assigned, this metadata could be collected for further curation. In this same presentation, van de Sompel showed a Flickr photoset as an example of a compound information object.

Anything but lack of talent
A great way to understand the standard is to see it in use. Michael Giarlo of the Library of Congress developed a plugin for WordPress, a popular content management system that generates Atom. His plugin generates a resource map that is valid Atom and contains some Dublin Core elements, including title, creator, publisher, date, language, and subject. This resource map can be transformed into RDF triples via GRDDL, which in turn facilitates reuse by the linked data community.

This turns metadata creation on its head, since the Dublin Core elements are taken directly from what the weblog author enters as the title, the name of the weblog author, subjects that were assigned, and the date and time of the entry. One problem OAI-ORE promises to solve is how to connect disparate URLs into one unified object, which the use of Atom simplifies.
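To make the mapping concrete, here is a minimal sketch in Python of how the fields a weblog author types could become Dublin Core elements inside an Atom entry, and how those elements could then be read back as simple subject-predicate-object triples. It is not the WordPress plugin's code and it does not use GRDDL (which relies on XSLT transformations); the extraction below is a hand-rolled stand-in. The post data, the example.org URLs, the function names, and the use of rel="related" for the linked resource are all assumptions for illustration, not the vocabulary the OAI-ORE specification defines.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical weblog entry; every value here is made up for illustration.
post = {
    "url": "http://example.org/blog/repurposing-metadata",
    "title": "Repurposing Metadata",
    "author": "Example Author",
    "published": "2008-03-01T12:00:00Z",
    "tags": ["OAI-ORE", "Atom"],
    "resources": ["http://example.org/blog/repurposing-metadata/figure1.png"],
}


def build_entry(post):
    """Map what the author typed into an Atom entry carrying Dublin Core elements."""
    ET.register_namespace("", ATOM)
    ET.register_namespace("dc", DC)
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = post["url"]
    ET.SubElement(entry, f"{{{ATOM}}}title").text = post["title"]
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = post["published"]
    # Dublin Core values come straight from the post, with no separate cataloging step.
    ET.SubElement(entry, f"{{{DC}}}title").text = post["title"]
    ET.SubElement(entry, f"{{{DC}}}creator").text = post["author"]
    ET.SubElement(entry, f"{{{DC}}}date").text = post["published"]
    for tag in post["tags"]:
        ET.SubElement(entry, f"{{{DC}}}subject").text = tag
    # Placeholder link relation tying other URLs to this entry (an assumption).
    for href in post["resources"]:
        ET.SubElement(entry, f"{{{ATOM}}}link", rel="related", href=href)
    return entry


def dc_triples(entry):
    """Yield (subject, predicate, object) for each Dublin Core child of the entry."""
    subject = entry.findtext(f"{{{ATOM}}}id")
    for child in entry:
        if child.tag.startswith(f"{{{DC}}}"):
            yield (subject, DC + child.tag.split("}", 1)[1], child.text)


entry = build_entry(post)
triples = list(dc_triples(entry))
for triple in triples:
    print(triple)
```

Each tag the author assigned becomes its own repeated subject element, which is exactly the kind of repetition Dublin Core leaves open.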

As the OAI-ORE specification moves into beta, it will be interesting to see if the constraints of the wider web world will breathe new life into carefully curated metadata. I certainly hope it does.



ALA 2007: Online Books, Copyright, and User Preferences

Ben Bunnell, Google library partnership manager, and Cliff Guren, Microsoft director of publisher evangelism, presented their view of the future to reference publishers June 22 during ALA at the Independent Reference Publishers Group meeting.

Google moves into reference

Bunnell said it was his first time presenting to publishers instead of librarians, and he gave a brief overview of the Google Books program. It has now digitized one million of 65 million books worldwide, and has added Spanish language books to its collections via partnerships with the University of Texas at Austin and the University of Madrid. Google is finding that librarians have been using Book Search for acquisitions, which is a somewhat unexpected use.

Microsoft innovates behind

Cliff Guren said Microsoft’s goal is to turn web search into information search. “The reality is that 5 percent of the world’s information is digitized, less than 1 percent of the National Archives and less than 5 percent of the Library of Congress.”

Guren described new initiatives within Live Search, first launched in April 2006, including a partnership with Ingram to store copies of digitized texts, agreements with CrossRef, HighWire, ERIC, and JSTOR for metadata, and Books in Print data. Live Academic Search currently has 40 million articles from 30,000 journals, and includes books from “out of copyright content only.” Library partners include the University of California, the University of Toronto, Cornell University, the New York Public Library, and the British Library. Technology partners include Kirtas Technologies and the Internet Archive, recently declared a library in its own right by the State of California.

New features in Live Book Search include options for publishers to retain control, including displaying percent viewable, image blocking, pages forward and back, and a page range exclusion modifier which also shows the user the number of pages allotted. The most distinctive feature shown was a view of the book page with a highlighted snippet.

Libraries negotiate collaboratively

Mark Sandler, director of CIC library initiatives, followed the sales presentations with some “inconvenient truths.” Sandler said library print legacy collections are deteriorating, some content has been lost in research libraries, and that “users prefer electronic access.”

Stating the obvious, Sandler said “we can’t sustain hybridity,” referring to overlapping print and electronic collection building. More controversially, he made the claim that “Maybe we’re not in the book business after all.”

Sandler said books take many shapes in libraries, including ebooks, database content, and audiobooks, and that pricing models have shifted to include aggregate collections and “by the drink.”

With legacy collections digitized, including the American Memory Project, Making of America, Documenting the American South, Valley of the Shadow, and Wright’s American Fiction, libraries had an early start with these types of projects. But with Google’s mission of organizing all the world’s information and making it universally accessible, Sandler claimed libraries are at the point of no return vis-à-vis change.

With library partnerships with not only Google and Microsoft, but also Amazon, the Million Book Project (MBP), and new royalty arrangements, Sandler said there’s a world of new work for libraries to do, including using digitized texts to make transformative works with math, chemical equations, and music to archive, integrate and aggregate content.

Millennials

Lynn Silipigni Connaway, OCLC Research, and Marie Radford, Rutgers University associate professor, described their IMLS-funded grant on millennials’ research patterns. Using a somewhat ill-conceived reproduction of a chat reference interaction gone awry, Connaway and Radford talked about “screenagers” and described user frustration with current reference tools.

“Libraries need to build query share,” Connaway said. Their research intends to study non-users, as well as experiential users and learners. One of the initial issues is that, since students have been taught to guard privacy online, librarians can be viewed as “psychos and internet stalkers” when they enter online environments like Facebook and MySpace.

What’s in it for us?

Reference publishers asked Google and Microsoft representatives, “What’s in it for us to collaborate with you?”

Cliff Guren said, “If I were in your business, I would be scared–your real competition is Wikipedia.” Bunnell deflected the question, saying “librarians use Google Book Search” and advised publishers to “try a few books and see what happens.” Bunnell said he had been surprised to see thesaurus content and other reference books added by publishers, as he had thought they would be outside the scope. “Yet Merriam-Webster added their synonyms dictionary, and they seem to be pleased.”

Guren said, “We think we’re adding value for independent publishers,” but “if there are 400 reference works on the history of jazz, perhaps there will only be 5 or 10 needed in the future because of the inefficiencies of the print system.” Bunnell countered this point with an example, saying, “Cambridge University Press is using Google Book stats to determine what backlist books to bring back into print.”

John Dove, Credo CEO (formerly xRefer), spoke about the real difference between facts and knowledge, and that “facts should be open to all.” Connaway said OCLC is finding that WorldCat.org referral traffic stats show 50 percent of users come from Google Book Search, 40 percent from libraries, and 9 percent from blogs and wikis.

Future of print?

Gale Reference said they are seeing declining profits from print reference, and asked, “What’s the life of a reference book? Does it have 5 or 10 years left?” Radford answered by saying “I think the paper reference book will be disappearing.” She said all New Jersey universities will share reference collections because of lack of space and funds. Guren was more encouraging, saying “There’s still a need for what you [reference publishers] do. Reference information is needed, though perhaps a reference book is not.”



ALA 2007: Top Tech Trends

At the ALA Top Tech Trends Panel, panelists including Marshall Breeding, Roy Tennant, Karen Coombs, and John Blyberg discussed RFID, open source adoption in libraries, and the importance of privacy.

Marshall Breeding, director for innovative technologies and research at Vanderbilt University Libraries (TN), started the Top Tech Trends panel by referencing his LJ Automation Marketplace article, “An Industry Redefined,” which predicted “unprecedented disruption” in the ILS market. Breeding said 60 percent of the libraries in one state are facing a migration due to the SirsiDynix product roadmap being changed, but he said “not all ILS companies are the same.”

Breeding said open source is new to the ILS world as a product, even though it’s been used as infrastructure in libraries for many years. Interest has now expanded to the decision makers. The Evergreen PINES project in Georgia, with 55 of 58 counties participating, was “mostly successful.” With the recent decision to adopt Evergreen in British Columbia, there is movement to open source solutions, though Breeding cautioned it is “still minuscule compared to most libraries.”

Questioning whether the switch can be compared to an avalanche, Breeding said several commercial support companies have sprung up to serve the open source ILS market, including LibLime, Equinox, and CARe Affiliates. Breeding predicted an era of “new decoupled interfaces.”

John Blyberg, head of technology and digital initiatives at Darien Public Library (CT), said the “back end [in the ILS] needs to be shored up because it has a ripple effect” on other services. Blyberg said RFID is coming, and it makes sense for use in sorting and book storage, echoing Lori Ayre’s point that libraries “need to support the distribution demands of the Long Tail.” Feeling that “privacy concerns are non-starters, because RFID is essentially a barcode,” he said the RFID information is stored in a database, which should be the focus of security concerns.

Finally, Blyberg said that vendor interoperability and a democratic approach to development are needed in the age of Innovative’s Encore and Ex Libris’ Primo, both of which can be used with different ILS systems and can decouple the public catalog from the ILS. With the eXtensible Catalog (XC) and Evergreen coming along, Blyberg said there was a need for funding and partners to further enhance their development.

Walt Crawford of OCLC/RLG said the problem with RFID is the potential of having patron barcodes chipped, which could “lead to the erosion of patron privacy.” Intruders could datamine who’s reading what, which Crawford said is a serious issue.

Joan Frye Williams countered that both Blyberg and Crawford were “insisting on using logic on what is essentially a political problem.” Breeding agreed, saying that airport security could scan chips, and “my concern is that third generation RFID chips may not be readable in 30 years, much less the hundreds of years that we expect barcodes to be around for.”

Karen Coombs, head of web services at the University of Houston (TX), listed three trends:
• The end user as content contributor, which she cautioned was an issue. “What happens if YouTube goes under and people lose their memories?” Coombs pointed to the project with the National Library of Australia and its partnership with Flickr as a positive development.
• Digital as format of choice for users, pointing out iTunes for music and Joost for video. Coombs said there is currently “no way for libraries to provide this to users, especially in public libraries.” Though companies like Overdrive and Recorded Books exist to serve this need, perhaps her point was that the consumer adoption has superseded current library demand.
• A blurred line between desktop and web applications, which Coombs demonstrated with YouTube remixer and Google Gears, “which lets you read your feeds when you’re offline.”

John Blyberg responded to these trends, saying that he sees academic libraries pursuing semantic web technologies, including developing ontologies. Coombs disagreed with this assessment, saying that “libraries have lots of badly-tagged HTML pages.” Roy Tennant agreed, “If the semantic web arrives, buy yourself some ice skates, because hell will have frozen over.”

Breeding said that he longs for “SOA [services-oriented architecture] but I’m not holding my breath.” And Walt Crawford said, “Roy is right—most content providers don’t provide enough detail, and they make easy things complicated and don’t tackle the hard things.” Coombs pointed out, “People are too concerned with what things look like,” but Crawford interjected, “not too concerned.”

Roy Tennant, OCLC senior program manager, listed his trends:
• Demise of the catalog, which should push the OPAC into the back room where it belongs and elevate discovery tools like Primo and Encore, as well as OCLC WorldCat Local.
• Software as a Service (SaaS), formerly known as ASP and hosted services, which means librarians “don’t have to babysit machines, and is a great thing for lots of librarians.”
• Intense marketplace uncertainty due to the private equity buyouts of Ex Libris and SirsiDynix and the rise of Evergreen and Koha as looming options. Tennant also said he sees “WorldCat Local as a disruptive influence.” Aside from the ILS, the abstract and indexing (A&I) services are being disintermediated as Google and OCLC are going direct to publishers to license content.
Someone asked if libraries should get rid of local catalogs, and Tennant said “only when it fits local needs.”

Walt Crawford said:
• Privacy still matters. Crawford questioned if patrons really wanted libraries to turn into Amazon in an era of government data mining and inferences which could track a ten-year patron borrowing pattern.
• The slow library movement, which argues that locality is vital to libraries, mindfulness matters, and open source software should be used “where it works.”
• The role of the public library as publisher. Crawford pointed out libraries in Charlotte-Mecklenburg County, libraries in Vermont that Jessamyn West works with, and Wyoming as farther along this path, and said the “tools are good enough that it’s becoming practical.”

Blyberg said that systems “need to be more open to the data that we put in there.” Williams said that content must be “disaggregatable and remixable,” and Coombs pointed out the current difficulty of swapping out ILS modules, and said ERM was a huge issue. Tennant referenced the Talis platform, and said one of Evergreen’s innovations is its use of the XMPP (Jabber) protocol, which is “easier than SOAP web services, which are too heavyweight.”

Marshall Breeding responded to a question asking if MARC was dead, saying “I’m married to a cataloger, but we do need things in addition to MARC, which is good for books, like Dublin Core and ONIX.” Coombs pointed out that MARCXML is a mess because it’s retrofitted and doesn’t leverage the power of XML. Crawford said, “I like to give Roy [Tennant] a hard time about his phrase ‘MARC is dead,’” and noted that, for a dying format, the Moen panel was full at 8 a.m.

Questioners asked what happens when “the one server” goes down, and Blyberg responded, “What if your T-1 line goes down?” Joan Frye Williams exhorted the audience to “examine your consciences when you ask vendors how to spend their time.” Coombs agreed, saying that her experience on user groups had exposed her to “crazy competing needs that vendors are faced with—[they] are spread way too thin.” Williams said there are natural transition points and she spoke darkly of a “pyramid scheme” and that you “get the vendors you deserve.” Coombs agreed, saying, “Feature creep and managing expectations is a fiercely difficult job, and open source developers and support staff are different people.”

Joan Frye Williams, information technology consultant, listed:
• New menu of end-user focused technologies. Williams said she worked in libraries when the typewriter was replaced by an OCLC machine, and libraries are still not using technology strategically. “Technology is not a checklist,” Williams chided, saying that the 23 Things movement of teaching new skills to library staff was insufficient.
• Ability for libraries to assume development responsibility in concert with end-users
• Have to make things more convenient, adopting artificial intelligence (AI) principles of self-organizing systems. Williams said, “If computers can learn from their mistakes, why can’t we?”

Someone asked why libraries are still using the ILS. Coombs said it’s a financial issue, and Breeding responded sharply, saying, “How can we not automate our libraries?” Walt Crawford agreed, saying, “Are we going to return to index cards?”
When the panel was asked if library home pages would disappear, Crawford and Blyberg both said they would be surprised. Williams said “the product of the [library] website is the user experience.” She said Yorba Linda Public Library (CA) is enhancing their site with a live book feed that updates “as books are checked in, a feed scrolls on the site.”

And another audience member asked why the panel didn’t cover toys and protocols. Crawford said “outcomes matter,” and Coombs agreed, saying “I’m a toy geek but it’s the user that matters.” Many participants talked about their use of Twitter, and Coombs said portable applications on a USB drive have the potential to change public computing in libraries. Tennant recommended viewing the Photosynth demo, first shown at the TED conference.
Finally, when asked how to keep up with trends, especially for new systems librarians, Coombs said, “It depends what kind of library you’re working in. Find a network—ask questions on the code4lib [IRC] channel.”

Blyberg recommended constructing a “well-rounded blogroll” that includes sites from the humanities, sciences, and library and information science, which “will help you be a well-rounded feed reader.” Tennant recommended a “gasp, dead tree magazine, Business 2.0.” Coombs said the Gartner website has good information about technology adoptions, and Williams recommended trendwatch.com.

Links to other trends:
Karen Coombs’ Top Technology Trends
Meredith Farkas’ Top Technology Trends
3 Trends and a Baby (Jeremy Frumkin)
Some Trends from the LiB (Sarah Houghton-Jan)
“Sum” Top Tech Trends for the Summer of 2007 (Eric Lease Morgan)

And other writeups and podcast:
Rob Styles
Ellen Ward
Chad Haefele



Presenting at ALA panel on Future of Information Retrieval

The Future of Information Retrieval

Ron Miller, Director of Product Management, HW Wilson, hosts a panel of industry leaders including:
• Mike Buschman, Program Manager, Windows Live Academic, Microsoft
• R. David Lankes, PhD, Director of the Information Institute of Syracuse, and Associate Professor, School of Information Studies, Syracuse University
• Marydee Ojala, Editor, ONLINE, and contributing feature and news writer to Information Today, Searcher, EContent, and Computers in Libraries, among other publications
• Jay Datema, Technology Editor, Library Journal

Monday, 25 June 2007, 8-10 a.m., Room 103b

Preliminary slides and audio attached.


 

NetConnect Spring 2007 podcast episode 3

In Requiem for a Nun, William Faulkner famously wrote, “The past is never dead. It’s not even past.” With the advent of new processes, the past can survive and be retrieved in new ways and forms. The new skills needed to preserve digital information are the same ones that librarians have always employed to serve users: selection, acquisition, and local knowledge.

The print issue of NetConnect is bundled with the April 15th issue of Library Journal, or you can read the articles online.

Jessamyn West of librarian.net says in Saving Digital History that librarians and archivists should preserve digital information, starting with weblogs. Tom Hyry advocates using extensible processing in Reassessing Backlogs to make archives more accessible to users. And newly appointed Digital Library Federation executive director Peter Brantley covers the potential of the rapidly evolving world of print on demand in A Paperback in 4 Minutes. Melissa Rethlefsen describes the new breed of search engines in Product Pipeline, including those that incorporate social search. Gail Golderman and Bruce Connolly compare databases’ pay-per-view in Pay by the Slice, and Library Web Chic Karen Coombs argues that librarians should embrace a balancing act in the debate between Privacy vs Personalization.

Jessamyn and Peter join me in a far-ranging conversation about some of the access challenges involved for readers and librarians in the world of online books, including common APIs for online books and how to broaden availability for all users.

Books
New Downtown Library
Neal Stephenson
Henry Petroski

Software
Greasemonkey User Scripts
Twitter
Yahoo Pipes
Dopplr

Outline
0:00 Music
0:10 Introduction

1:46 DLF Executive Director Peter Brantley
2:30 California Digital Library

4:13 Jessamyn West
5:08 Ask Metafilter
6:17 Saving Digital History
8:01 What Archivists Save
12:02 Culling from the Firehose of Information
12:34 API changes
14:15 Reading 2.0
15:13 Common APIs and Competitive Advantage
17:15 A Paperback in 4 Minutes
18:36 Lulu
19:06 On Demand Books
21:24 Attempts at hacking Google Book Search
22:30 Contracts change?
23:17 Unified Repository
23:57 Long Tail Benefit
24:45 Full Text Book Searching is Huge
25:08 Impact of Google
27:08 Broadband in Vermont
29:16 Questions of Access
30:45 New Downtown Library
33:21 Library Value Calculator
34:07 Hardbacks are Luxury Items
35:47 Developing World Access
37:54 Preventing the Constant Gardener scenario
40:21 Book on the Bookshelf
40:54 Small Things Considered
41:53 Diamond Age
43:10 Comment that spurred Brantley to read the book
43:40 Marketing Libraries
44:15 Pimp My Firefox
45:45 Greasemonkey User Scripts
45:53 Twitter
46:25 Yahoo Pipes
48:07 Dopplr
50:25 Software without the Letter E
50:45 DLF Spring Forum
52:00 OpenID in Libraries
53:40 Outro
54:00 Music

Listen here or subscribe to the podcast feed



 
