ALA 2006: Top Tech Trends

In yet another crowded ballroom, the men (and woman) of LITA prognosticated on the future of libraries and technology.

Walt Crawford moderated the panel and spoke for the absent Sarah Houghton. Her trends were:

  • Returning power to content owners
  • An OCLC ILS with RedLightGreen as the front-end
  • Outreach online

Karen Schneider listed four:

  • Faceted Navigation, from Endeca and others
  • eBooks–the Sophie project from the Institute for the Future of the Book
  • The Graphic Novel–Fun Home
  • Net Neutrality

Eric Lease Morgan listed several, and issued a call for a Notre Dame Perl programmer throughout his trends:

  • VoIP, which he thought would cure email abuse
  • Web pages are now blogs and wikis, which may cause preservation issues since they are dynamically generated from a database
  • Social Networking tools
  • Open Source
  • Metasearch, which he thought might be dead given its lack of de-duplication
  • Mass Digitization, and the future services libraries can provide against it
  • Growing Discontent with Library Catalogs
  • Cataloging is moving to good enough instead of complete
  • OCLC is continuing to expand and refine itself
  • LITA 40 year anniversary–Morgan mentioned how CBS is just celebrating their 55th anniversary of live color TV broadcasting

Tom Wilson noted two things: “Systems aren’t monolithic, and everything is an interim solution.”

Roy Tennant listed three trends:

  • Next generation finding tools, not just library catalogs. Though the NGC4Lib mailing list is a necessary step, metasearch still needs to be done, and it’s very difficult to do. Some vendors are introducing products which attempt to solve this problem, like Innovative’s Encore and Ex Libris’ Primo.
  • The rise of filtering and selection. Tennant said, “The good news is everyone can be a publisher. And the bad news is, everyone can be a publisher.”
  • The rise of microcommunities, like code4lib, which give rise to ubiquitous and constant communication.

Discussion after the panelists spoke raised interesting questions, including Clifford Lynch’s recommendation of Microsoft’s Stuff I’ve Seen. Marshall Breeding recommended tagging WorldCat, not local catalogs, but Karen Schneider pointed out that the user reviews on Open World Cat were deficient compared to Amazon.

When asked how to spot trends, Eric Lease Morgan responded, “Read and read and read–listservs, weblogs; Listen; Participate.” Roy Tennant said, “Look outside the library literature–Read the Wall Street Journal, Fast Company, and Business 2.0. Finally, look for patterns.”

More discussion, and better summaries:
LITA Blog » Blog Archive » Eric Lease Morgan’s Top Tech Trends for ALA 2006; Sum pontifications
LITA Blog » Blog Archive » The Annual Top 10 Trends Extravaganza
Hidden Peanuts » ALA 2006 – LITA Top Tech Trends
ALA TechSource | Tracking the Trends: LITA’s ALA Annual ’06 Session
Library Web Chic » Blog Archive » LITA Top Technology Trends

ALA 2006: Future of Search

This oversubscribed session (I sat on the floor, as did many others) featured Stephen Abram of SirsiDynix (and SLA president) and Joe Janes of the University of Washington debating the future of search, moderated by LJ columnist Roy Tennant.

Abram asked a pointed question, which decided the debate early: “Were libraries ever about search? Search was rarely the point…unless you wanted to become a librarian.” In Abram’s view, the current threat to libraries comes from user communities like Facebook and MySpace, since MySpace is now the 6th largest search engine. Other threats to libraries include the Google patent on quality.

Abram said the problem of the future is winnowing, and that you cannot teach people to search. “Boolean doesn’t work,” he said. Abram felt it was a given that more intelligence needs to be built into the interface.

In more sociological musings, Abram said “Facts have a half-life of 12 years,” and social networks matter since “teens and 20s live through their social networks. The world is ahead of us, and teams are contextual. People solve problems in groups.”

Joe Janes asked, “What would happen if you made WorldCat open source? Would the fortress of metadata in Dublin, OH crumble?” When asked if libraries should participate in OpenWorldCat, Abram said, “Sure, why not? Our competitor is ignorance, not access. Libraries transform lives.”

Janes pointed out that none of the current search services (Google Answers, Yahoo Answers, and the coming Microsoft Answers) have worked well, and Tennant said “While Google and Yahoo may have the eyeballs of users, libraries have the feet of users.”

In an interesting digression from the question at hand, Abram asked why libraries aren’t creating interesting tools like LibraryThing and LibraryELF (look for a July NetConnect feature about the ELF by Liz Burns). Janes said it comes back to privacy concerns, since this is the “looking over your shoulder decade. Hi, NSA!” With the NSA and TSA examining search, banking, and phone records, library privacy ethics are being challenged like no recent time in history.

Roy Tennant asked if libraries should incorporate better interface design, relevance ranking, spelling suggestions, and faceted browsing. Abram said it’s already happening at North Carolina State University with the Endeca catalog project. The Grokker pilot at Stanford is another notable example, and the visual contents and tiled results set mirror how people learn. “Since the search engines are having problems putting ads in visual search, it’s good for librarians.”

Abram got the most laughter by pointing out that the thing that killed Dialog was listening to their users. As librarian requests made Dialog even more precise, “At the end of a Dialog search, you could squeeze a diamond out of your ass.” Janes said the perfect search is “no search at all, one that has the lightest cognitive load.”

Since libraries are, in Janes’ words, “a conservation organization because the human record is at stake, the worst nightmare is that nothing changes and libraries die. The finest vision is to put Google out of business.” Abram’s view was libraries must become better at advocacy and trust users to lay paths through catalog tagging and other vendor initiatives.

The question of the future of search turned into the future of libraries, and Joe Janes concluded that “Libraries are in the business of vacations we enabled, cars we helped fix, businesses we started, and helping people move.” Abram ended with a pithy slogan for libraries, the place of “Bricks, Clicks, and Tricks.”

Other commentary here:
The Shifted Librarian: 20060624 Who Controls the Future of Search?
Library Web Chic » Blog Archive » The Ultimate Debate : Who Controls the Future of Search
LITA Blog » Blog Archive » The Ultimate Debate: Who Controls the Future of Search
AASL Weblog – The Ultimate Debate: Who Controls the Future of Search?

ALA 2006: Google Book Search

Ben Bunnell, Manager of Google Book Search and author of an upcoming Last Byte column in the July NetConnect (no link yet), described how Google cofounder Larry Page and Marissa Mayer originally conceived of the book scanning project while they were in graduate school at Stanford. Using a metronome, they estimated that a 300-page book would take 40 minutes to digitize. Though it wasn’t answered at the session, other panels mentioned that the entire University of Michigan library collection of 7 million books is slated for completion in six to seven years. Libraries will be interesting places in 2010.

Google’s intended goal is to “digitize all books,” and Bunnell said “Google is not focused on author, genre, or time period.” Lawsuits from the Authors Guild and others have slowed progress.

There are three areas of digitization: publisher agreements for recently published books (except for Elsevier; one panelist quipped that Google should buy Elsevier); books currently in the public domain (published before 1923); and what Tim O’Reilly calls the “twilight zone” (75 percent of what has been published).

The easy part is scanning books in the public domain (before 1923 in the US). This includes Jane Austen, Charles Dickens, Emily Dickinson, and Shakespeare. Other digital projects have started with this, including Project Gutenberg, Early English Books Online, and the Making of America project. The public domain content makes up 20 percent of all available books.

Google already has agreements from all US major publishers, and they are getting digital copies directly for books in print, which are 5 percent of the total.

The controversy comes with books published from 1923 to 2000. Currently, Google is continuing to scan these books and display their contents in “snippet view” or, for some titles, a selected number of pages. Searches show three snippets.

Following the presentation, discussion revealed that Google now has an agreement with the Library of Congress as well as the other five libraries in the Google Print project (University of Michigan, Oxford, Harvard, Stanford, and NYPL). The Find It in a Library links are live for some books that were originally scanned via libraries and Bunnell said they “are close to linking all books.” Google wants users to alert them to the copyright status of a book, so it seems reasonable to expect a contact link to show up soon. Finally, the link syntax of Google Books is static, so one audience member asked if it would be possible to link from a library catalog to the online copy. This is possible, but requires that the patron have a Google Account, which raises privacy concerns.
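Since the link syntax is static, a catalog could generate such a link from an identifier alone. A minimal sketch, assuming the vid-style identifier pattern Google Book Search has described (treat the exact syntax as an assumption and verify it against Google’s own linking documentation):

```python
def google_books_link(identifier, scheme="ISBN"):
    """Build a static deep link to Google Book Search from a standard
    identifier. The vid= pattern (scheme prefix plus number) is an
    assumption based on Google's linking examples, not a guarantee."""
    return f"http://books.google.com/books?vid={scheme}{identifier}"

# A catalog record with an ISBN could emit:
print(google_books_link("0596000278"))
```

Because the link carries no patron information itself, the privacy question above arises only once the user follows it and Google asks for an account.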

Recommendation for Google: If you’re going to have a panel from noon to one, bring along your Googleplex chef and feed the hungry librarians.

Open Archives Initiative

Following the success of Open URL, the Open Archives Initiative has been one of the most promising developments in the digital library world. Tools like OAIster (pronounced “oyster”), the National Science Digital Library, and the IMLS Digital Collections Registry show there has been a dramatic uptake in the number of libraries and tools that have implemented it.

This relatively lightweight protocol was designed to make sharing of metadata as simple as RSS aggregation. As the number of adopters has risen, the aggregators have seen a few XML-related snags.

In short, metadata is user input. First law of programming: Never trust user input.
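One way an aggregator can apply that law is to refuse any harvested response that is not even well-formed XML before it enters the pipeline. A minimal Python sketch (the harvest URL handling and error format here are illustrative, not any aggregator’s actual code):

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_oai_response(raw_bytes):
    """Well-formedness check: the cheapest 'never trust user input'
    gate. Returns (parsed_root, None) on success, or (None, reason)
    when the repository served broken XML."""
    try:
        root = ET.fromstring(raw_bytes)
    except ET.ParseError as err:
        return None, f"rejected: {err}"
    return root, None

def harvest_records(base_url, metadata_prefix="oai_dc"):
    """Fetch one page of records from an OAI-PMH repository and vet it.
    Repositories often emit byte streams that are not well-formed XML,
    and an aggregator that trusts them blindly fails downstream."""
    url = f"{base_url}?verb=ListRecords&metadataPrefix={metadata_prefix}"
    with urllib.request.urlopen(url) as resp:
        return parse_oai_response(resp.read())

# A malformed response (mismatched tag) is rejected rather than aggregated:
root, err = parse_oai_response(b"<OAI-PMH><record></OAI-PMH>")
```

Validating against the OAI-PMH schema would catch still more garbage, but even this well-formedness gate stops the worst input at the door instead of in an autogenerated error email.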

Many papers at library conferences are designed to showcase a particular implementation that went better than expected. That’s great–it’s always good to see libraries succeeding. However, it takes much more courage to share lessons learned, so that pitfalls can be avoided.

The winning paper at JCDL 2006 was written by Carl Lagoze, one of the original architects of the OAI protocol. In the paper, “Metadata aggregation and ‘automated digital libraries’: A retrospective on the NSDL experience,” he shares his rude awakening that many OAI archives serve XML that doesn’t validate, which makes aggregators like the NSDL subject to truckloads of autogenerated emails.

As Dorothea’s commentary put it:

“The winning non-student paper both amused and frustrated me. Carl Lagoze talked about the National Science Digital Library, and how it was believed that the Magic Metadata Fairy would use OAI-PMH to build a beautiful searchable garden of science, and how everyone ended up with an ugly, weed-choked, cracked-asphalt vacant lot instead.”

She goes on to say what few technologists want to say. People still matter.

“I’ll be blunt. The solution for NSDL’s problem is hiring cataloguers, or metadata librarians, or indexers/abstracters, or whatever you want to call ’em, to clean up the incoming garbage. Ideally, OAI-PMH would be a two-way protocol, so that nice cleaned-up metadata made its way back to the repository that had spewed the garbage in the first place. That, however (despite all the jaw-flapping about frameworks that went on during JCDL) does not seem to be in the offing. It should be.”

Catalogers still matter. Especially the new breed of catalogers.

Open World Cat

In typical OCLC style, a quiet revolution is brewing. Formerly a subscription-only database, WorldCat has begun to propagate into search engines–Google, Yahoo, and Ask in particular–and with the merger of RLG, it looks like a truly spectacular interface could be created for the union catalog.

In the meantime, it’s curious that OCLC chose to use an ISBN-based permalink structure instead of OpenURL. It does showcase FRBR, but beyond that it’s not very interoperable.
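To make the contrast concrete, here is a sketch of the two link styles. The permalink follows the ISBN-based pattern Open WorldCat exposes to search engines; the resolver address is hypothetical, and the OpenURL query is a minimal 0.1-style book request:

```python
from urllib.parse import urlencode

def worldcat_permalink(isbn):
    # ISBN-based permalink: stable and crawlable, but it can only
    # point at one destination.
    return f"http://www.worldcat.org/isbn/{isbn}"

def openurl_query(resolver_base, isbn, sid="example:blog"):
    # An OpenURL request names the thing, not the destination; any
    # institution's resolver can answer it with its own copy.
    params = {"genre": "book", "isbn": isbn, "sid": sid}
    return f"{resolver_base}?{urlencode(params)}"

print(worldcat_permalink("0596000278"))
print(openurl_query("http://resolver.example.edu/openurl", "0596000278"))
```

The permalink wins on simplicity and search-engine friendliness; the OpenURL wins on interoperability, which is exactly the trade-off at issue.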

The real question is, will OCLC enter the SEO (search engine optimization) business so that library results show on the first page?

Open URL

Open URL solves the appropriate-copy problem, but many other questions have sprung up for library discussion.

You can learn more in Roy Tennant and Carol Tenopir’s forthcoming July columns.

  • Should Google have a list of resolvers? What about Microsoft?
  • Is it useful for OCLC to be developing a registry?
  • Why is the usability so poor? Pop-up window after pop-up window…
  • Do users want a limit to full-text programmed for them?
  • Should it be as easy as writing a weblog entry to link to library subscription resources? The inventors of COinS think so.
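COinS itself is simple enough to sketch: an empty span whose title attribute carries a KEV-encoded OpenURL ContextObject, which a resolver-aware browser tool can rewrite into a link to the reader’s own library. The title and ISBN below are just sample values:

```python
from urllib.parse import urlencode
from html import escape

def coins_span(btitle, isbn):
    """Build a COinS span for embedding an OpenURL ContextObject in
    ordinary HTML, such as a weblog entry. The key-encoded-value
    string is HTML-escaped so the ampersands are legal in the
    title attribute."""
    kev = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.btitle": btitle,
        "rft.isbn": isbn,
    })
    return f'<span class="Z3988" title="{escape(kev)}"></span>'

# Drop this into a blog post; the page itself shows nothing extra.
print(coins_span("The Ambassadors", "0140432337"))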

Open Source

When Linux and Apache were familiar only to the sysadmin community, Dan Chudnov had the foresight to develop a site for librarians to swap tips about open source software that would be useful in libraries.

Now, with libraries contemplating the advantages of home-grown open source library catalogs anew, open source expertise has become even more valuable.

The State of Georgia has made a commitment to developing an open-source catalog, so things are starting to get interesting.

Open Search

Z39.50 has been a useful technology for searching library catalogs and individual databases for many years, but it presents certain implementation challenges–Bath or SUTRS?

The Open Search specification is interesting, since it promises much the same thing. It was initiated by a commercial entity–A9–and some libraries are starting to pay attention to it as a supplement to SRU and Z39.50.
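The core of Open Search is a description document that advertises a URL template, which clients fill in with the user’s query. A minimal sketch of the client side, with a hypothetical catalog template:

```python
from urllib.parse import quote_plus

def fill_template(template, terms, start_page=1):
    """Expand an OpenSearch URL template: substitute values for
    placeholders like {searchTerms}, leaving the rest of the
    template untouched."""
    return (template
            .replace("{searchTerms}", quote_plus(terms))
            .replace("{startPage?}", str(start_page)))

# Hypothetical template from a library catalog's description document:
template = "http://catalog.example.edu/search?q={searchTerms}&page={startPage?}"
print(fill_template(template, "open archives"))
```

Compared with negotiating Z39.50 profiles, publishing one template line is a strikingly low bar for making a catalog searchable by outside tools.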

Open Content

Brewster Kahle and the Open Content Alliance are doing some interesting and credible things.

It’s especially interesting to see the open source software being made available from it, like Dojo.

Some of the scans are quite beautiful, like this Henry James book.

Open Access

It’s laudable to make the entirety of human knowledge, especially scientific, available for free. But what about that free lunch?

news @ nature.com – Open-access journal hits rocky times – Financial analysis reveals dependence on philanthropy.

The Public Library of Science (PLoS), the flagship publisher for the open-access publishing movement, faces a looming financial crisis. An analysis of the company’s accounts, obtained by Nature, shows that the company falls far short of its stated goal of quickly breaking even. In an attempt to redress its finances, PLoS will next month hike the charge for publishing in its journals from US$1,500 per article to as much as $2,500.

In the beginning, libraries were excited about the open access movement because it promised to save them money from the serials budget. However, as Phil Davis pointed out last year, libraries still face the price of print subscriptions, plus membership fees, as well as having to subsidize author submission fees. From this angle, open access looks less like a bargain and more like a mechanism to subsidize research and development for new publications.