Evolution not Revolution

Swimming in salt water is wonderful; drinking it is not. Four hundred years ago, the first American settlers in Jamestown, Virginia, ran into trouble during their first five years because the fresh water they depended upon for drinking turned brackish in the summer. Suddenly, besides the plagues, angry Indians, and crop difficulties, they had to find new sources of fresh water inland. Libraries and publishers are facing a similar challenge as the hybrid world of print and online publications has changed the economic certainties that have kept both healthy.

The past five years in the information world have been full of revolutionary promise, but the new reality has not yet matched the promise of a universal library. Google Scholar promised universal access to scholarly information, yet its dynamic start in 2004 has not brought forth many evolutionary changes since. In fact, the addition of Library Links using OpenURL support is the last major feature Scholar has seen. OpenURL, the NISO standard that enables seamless full-text access, has shown its value.

For years, it’s been predicted that the Google Books project would revolutionize scholarship, and in some respects it has done so. But in seeking a balance between cornering Amazon’s market for searching inside books, respecting authors’ rights, finding the rights holders of so-called orphan works, and solving metadata and scanning quality issues, its early promise is not yet fulfilled.

Jhumpa Lahiri • Unaccustomed Earth

Unaccustomed Earth: Stories. Jhumpa Lahiri; Knopf 2008.

Sometimes, a short story sticks with you until you find it, with pleasure, living in a larger collection. In 1991, standing in a Chicago bookstore, I read a short story by Tobias Wolff that I kept looking for until it was included in The Night in Question.

Jhumpa Lahiri’s new book of short stories, Unaccustomed Earth, contains another haunting story, “Nobody’s Business,” first published in The New Yorker in 2001. It gives a stark account of graduate student despair: first at life delayed due to years of study, then postponed because of deferred relationships left to explode into messy life. Paul, the narrator, gives an outsider account of Indian courtship rituals drawn into housemate drama. Desperate to prove his innocence of what he learns, he provides telephonic evidence of how his housemate is being betrayed.

Lahiri isn’t afraid to show life as it is. Painful, entangled with family obligations and academic aspirations, the stories show adult parents and children reaching accommodations with hidden truths and adjustments to immigrant life. Her stories show how second-generation Bengali immigrants draw pleasure from their Harvard and MIT PhDs, just as their accomplishments push them away from their families of origin. When the characters marry outside their connections, as in “Only Goodness,” they feel guilt and relief in equal measure.

The final three stories, linked through the characters Hema and Kaushik, give a tragic account of a family left to reconstitute itself after a mother’s early death rips it asunder. Though Lahiri leaves a narrative option for easy closure, the devastating ending feels, well, like life in the midst of death.

Mining for Meaning

In David Lodge’s 1984 novel, Small World, a character remarks that literary analysis of Shakespeare and T.S. Eliot “would just lend itself nicely to computerization….All you’d have to do would be to put the texts on to tape and you could get the computer to list every word, phrase and syntactical construction that the two writers had in common.”
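What Lodge’s character imagines is now a few lines of code. The following is only a rough sketch under simple assumptions: it counts single words rather than phrases or syntactic constructions, and two short public-domain lines stand in for the complete works. The function names are illustrative, not taken from any particular text-mining tool.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def shared_vocabulary(text_a, text_b):
    """Return the words used in both texts, with the count from each text."""
    a, b = word_counts(text_a), word_counts(text_b)
    return {word: (a[word], b[word]) for word in sorted(a.keys() & b.keys())}


# Two short lines stand in for the full corpora (an assumption for brevity).
shakespeare = "To be, or not to be, that is the question"
eliot = "April is the cruellest month"

print(shared_vocabulary(shakespeare, eliot))
```

Listing the overlap is the easy part. Deciding which of the shared words means anything is the harder question.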

This brave new world is upon us, but the larger question for Google and OCLC, among other purveyors of warehoused metadata and petabytes of information, is how to achieve meaning. One of the brilliant insights derived from Terry Winograd’s research and mentoring is that popularity in the form of inbound links does matter for web pages, at least. In the case of all the world’s books turned into digitized texts, it’s a harder question to assign meaning without popularity, a canon, or search queries as a guide.

Until recently, text mining wasn’t possible at great scale. And as the great scanning projects continue on their bumpy road, the mysteries of what will come out of them have yet to emerge into meaning for users.

Nascent standards
Bill Kasdorf pointed out several XML models for books in his May NISO presentation, including NISO/ISO 12083, TEI, DocBook, NLM Book DTD, and DTBook. These existing models have served publishers well, though they have been employed for particular uses and have not yet found common ground across the breadth of book types. The need for a standard has never been clearer, but it will require vision and a clear understanding of solved problems to push forward.

After the professor in Small World gains access to a server, he grows giddy with the possibilities of finding “your own special, distinctive, unique way of using the English language….the words that carry a distinctive semantic content.” While we may be delighted about the possibilities that searching books affords, there is the distinct possibility that the world of the text could be changed completely.

Another mechanism for assigning meaning to full text has been opened up by web technology and science. The Open Text Mining Interface is a method championed by Nature Publishing Group as a way to share the contents of their archives in XML for the express purpose of text mining while respecting intellectual property concerns. Now in a second revision, the OTMI is an elegant method of enabling sharing, though it remains to be seen if the initiative will spread to a larger audience.

Sense making
As the corpus lurches towards the cloud, one interesting example of semantic meaning comes in the Open Calais project, an open platform by the reconstituted Thomson Reuters. When raw text is fed into the Calais web service, terms are extracted and fed into existing taxonomies. Thus, persons, countries, and categories are first identified and then made available for verification.
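The general idea of matching raw text against an existing taxonomy can be shown with a toy example. This sketch does not call the Calais web service or imitate its interface; the tiny taxonomy, the function name, and the sample sentence are all hypothetical, and a real service would use far larger vocabularies and statistical methods instead of exact phrase matching.

```python
import re

# Hypothetical mini-taxonomy, invented for illustration only.
TAXONOMY = {
    "Person": ["Melvil Dewey", "Walter Benjamin"],
    "Country": ["Germany", "United States"],
    "Organization": ["Library of Congress"],
}


def extract_entities(text):
    """Return (category, term) pairs for every taxonomy term found in the text."""
    found = []
    for category, terms in TAXONOMY.items():
        for term in terms:
            # Whole-phrase, case-insensitive match.
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                found.append((category, term))
    return found


sample = "Walter Benjamin was a literary critic from Germany."
for category, term in extract_entities(sample):
    # Each match is only a candidate until a person verifies it.
    print(f"{category}: {term} (unverified)")
```

Even in this simplified form, the output is a list of candidates that a person can check rather than create from scratch.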

This experimental service has proved its value for unstructured text, and it works on everything from the most recent weblog posting to historic newspapers newly scanned into text via Optical Character Recognition (OCR). Since human-created metadata and indexing services are among the most expensive things libraries and publishers create, any mechanism to optimize human intelligence by using machines to create meaning is a useful way forward.

Calais shows promise for metadata enhancement, since full text can be mined for its word properties and fed into taxonomic structures. This could be the basis for search engines that understand natural language queries in the future, but could also be a mechanism for accurate and precise concept browsing.

Glimmers of understanding
One method of gaining new understanding is to examine solved problems. Melvil Dewey understood vertical integration, as he helped with innovations around 3×5 index cards and cabinets, as well as the classification system that bears his name. Some even say he was the first standards bearer for libraries, though it’s hard to believe that anyone familiar with standards can imagine that one person could have actually been entirely responsible.

Another solved problem is how to make information about books and journals widely available. This has been done twice in the past century: first with the printed catalog card, distributed by the Library of Congress for the greater good, and then with the distributed catalog record, at great utility (and cost) by the Online Computer Library Center.

Pointers are no longer entirely sufficient, since the problem is not only how to find information but how to make sense of it once it has been found. Linking from catalog records has been a partial solution, but the era of complete books online is now entering its second decade. The third stage is upon us.

Small World: An Academic Romance. David Lodge; Penguin 1985.

What SERU Solves

Good faith has powered collaboration between libraries and publishers for over 100 years. When books are ordered and purchased from publishers, libraries enter a long-term relationship with the object. In the world of atoms, it is understood that the publisher’s relationship with the object stops with the check clearing from the library. In the world of bits, diffusion happens at a different pace.

Then as now, the publisher gives the library implicit and explicit rights. The library rarely turns around and sells purchased books at a markup, and as needs shift, books may be deaccessioned or sold at a book sale or in the gift shop. All rights belong to the library, and no contracts other than common law govern the publisher relationship.

This has worked out well for both parties. Libraries get to offer information and knowledge to all comers, and publishers get to extend their reach to even non-paying customers. Because the usual customer rights are upheld, infringing uses are rare—not many people copy entire books at a copy machine—and the rare trope of doing well by doing good is upheld.

In the digital age
A few years ago, I was involved in a project to digitize medical reference books. Previously, the highly valuable books were chained to hospital library desks to prevent theft. As the software evolved to allow full text searching, natural language processing on queries, and cross searching with journals and databases, a developer raised an important question. “How are we going to get paid?” Enter the simultaneous use license. Exit simplicity. Enter negotiations. Exit the accustomed rights attached to print books. Enter simultaneous uses.

And of course, this isn’t a new problem. Books were chained to desks from the 15th to 18th centuries until it became attractive to display them spine out. In time, the risk of theft receded due to multiple copies. In the early 20th century, the German literary critic Walter Benjamin predicted that technology would change printing and writing: “With the woodcut graphic art became mechanically reproducible for the first time, long before script became reproducible by print. The enormous changes which printing, the mechanical reproduction of writing, has brought about in literature are a familiar story.” CNI collected a list of circulation policies that ALA has compiled over the years, but it doesn’t cover how the freedom to read is made different in the age of mechanical reproduction.

Enter SERU
As my eminent colleague K. Matthew Dames points out, mistrust does characterize the licensing landscape. This is in part what standards are meant to address—adding clarity to new and sometimes bewildering territory, which licensing certainly is.

As a recommended working practice, NISO’s Shared Electronic Resource Understanding (SERU) offers radical common sense. In part, it says, “Both publishers and subscribing institutions will make reasonable efforts to prevent the misuse of the subscribed content. The subscribing institution will employ appropriate measures to ensure that access is limited to authorized users and will not knowingly allow unauthorized users to gain access. While the subscribing institution cannot control user behavior, an obligation to inform users of appropriate uses of the content is acknowledged, and the subscribing institution will cooperate with the publisher to resolve problems of inappropriate use.”

New circulation policies
This goes some way towards creating a circulation policy for the digital age. Dames correctly points out that the current licensing process is broken, and the stakes are high. But without lawyers being reminted as librarians en masse, this impedance mismatch is likely to continue. Given this logjam, SERU was birthed to set reasonable terms as a starting point.

Thus, SERU offers a solution for “particularly smaller publishers who perhaps do not have in-house lawyers or rights departments that can handle them.” Since there is no lack of mechanisms for restricting access to content in exchange for new business models, isn’t now the time to start setting terms before they are set for both libraries and publishers by larger interests?

Though SERU doesn’t claim to answer every possible scenario, it does offer a better, faster, and cheaper method for protecting the rights of libraries and publishers in the age of mechanical reproduction.

Illuminations. Walter Benjamin; Schocken Books 1969.

ALA 2007: Online Books, Copyright, and User Preferences

Ben Bunnell, Google library partnership manager, and Cliff Guren, Microsoft director of publisher evangelism, presented their view of the future to reference publishers June 22 during ALA at the Independent Reference Publishers Group meeting.

Google moves into reference
Bunnell said it was his first time presenting to publishers instead of librarians, and he gave a brief overview of the Google Books program. It has now digitized one million of 65 million books worldwide, and has added Spanish language books to its collections via partnerships with the University of Texas Austin and the University of Madrid. Google is finding that librarians have been using Book Search for acquisitions, which is a somewhat unexpected use.

Microsoft innovates behind
Cliff Guren said Microsoft’s goal is to turn web search into information search. “The reality is that 5 percent of the world’s information is digitized, less than 1 percent of the National Archives and less than 5 percent of the Library of Congress.”

Guren described new initiatives within Live Search, first launched in April 2006, including a partnership with Ingram to store copies of digitized texts, and agreements with CrossRef, HighWire, ERIC, and JSTOR for metadata, plus Books in Print data. Live Academic Search currently has 40 million articles from 30,000 journals, and includes books from “out of copyright content only.” Library partners include the University of California, the University of Toronto, Cornell University, the New York Public Library, and the British Library. Technology partners include Kirtas Technologies and the Internet Archive, recently declared a library in its own right by the State of California.

New features in Live Book Search include options for publishers to retain control, including displaying percent viewable, image blocking, pages forward and back, and a page range exclusion modifier which also shows the user the number of pages allotted. The most distinctive feature shown was a view of the book page with a highlighted snippet.

Libraries negotiate collaboratively
Mark Sandler, director of CIC library initiatives, followed the sales presentations with some “inconvenient truths.” Sandler said library print legacy collections are deteriorating, some content has been lost in research libraries, and that “users prefer electronic access.”

Stating the obvious, Sandler said “we can’t sustain hybridity,” referring to overlapping print and electronic collection building. More controversially, he made the claim that “Maybe we’re not in the book business after all.”

Sandler said books take many shapes in libraries, including ebooks, database content, and audiobooks, and that pricing models have shifted to include aggregate collections and “by the drink.”

With legacy collections digitized, including the American Memory Project, Making of America, Documenting the American South, Valley of the Shadow, and Wright’s American Fiction, libraries had an early start with these types of projects. But with Google’s mission of organizing all the world’s information and making it universally accessible, Sandler claimed libraries are at the point of no return vis-à-vis change.

With library partnerships with not only Google and Microsoft, but also Amazon, the Million Book Project (MBP), and new royalty arrangements, Sandler said there’s a world of new work for libraries to do, including using digitized texts to make transformative works with math, chemical equations, and music to archive, integrate and aggregate content.

Millennials
Lynn Silipigni Connaway, OCLC Research, and Marie Radford, Rutgers University associate professor, described their IMLS-funded grant on millennials’ research patterns. Using a somewhat ill-conceived reproduction of a chat reference interaction gone awry, Connaway and Radford talked about “screenagers” and described user frustration with current reference tools. “Libraries need to build query share,” Connaway said. Their research intends to study non-users, as well as experiential users and learners. One initial issue is that, since students have been taught to guard privacy online, librarians can be viewed as “psychos and internet stalkers” when they enter online environments like Facebook and MySpace.

What’s in it for us?
Reference publishers asked Google and Microsoft representatives, “What’s in it for us to collaborate with you?”

Cliff Guren said, “If I were in your business, I would be scared–your real competition is Wikipedia.” Bunnell deflected the question, saying “librarians use Google Book Search” and advised publishers to “try a few books and see what happens.” Bunnell said he had been surprised to see thesaurus content and other reference books added by publishers, as he had thought they would be outside the scope. “Yet Merriam-Webster added their synonyms dictionary, and they seem to be pleased.”

Guren said, “We think we’re adding value for independent publishers,” but “if there are 400 reference works on the history of jazz, perhaps there will only be 5 or 10 needed in the future because of the inefficiencies of the print system.” Bunnell countered this point with an example, saying, “Cambridge University Press is using Google Book stats to determine what backlist books to bring back into print.”

John Dove, Credo CEO (formerly xRefer), spoke about the real difference between facts and knowledge, and that “facts should be open to all.” Connaway said OCLC is finding that WorldCat.org referral traffic stats show 50 percent of users come from Google Book Search, 40 percent from libraries, and 9 percent from blogs and wikis.

Future of print?
Gale Reference said they are seeing declining profits from print reference, and asked, “What’s the life of a reference book? Does it have 5 or 10 years left?” Radford answered by saying “I think the paper reference book will be disappearing.” She said all New Jersey universities will share reference collections because of lack of space and funds. Guren was more encouraging, saying “There’s still a need for what you [reference publishers] do. Reference information is needed, though perhaps a reference book is not.”

Presenting at ALA panel on Future of Information Retrieval

The Future of Information Retrieval

Ron Miller, Director of Product Management, HW Wilson, hosts a panel of industry leaders including:
Mike Buschman, Program Manager, Windows Live Academic, Microsoft.
R. David Lankes, PhD, Director of the Information Institute of Syracuse, and Associate Professor, School of Information Studies, Syracuse University.
Marydee Ojala, Editor, ONLINE, and contributing feature and news writer to Information Today, Searcher, EContent, Computers in Libraries, among other publications.
Jay Datema, Technology Editor, Library Journal

Add to calendar:
Monday, 25 June 2007
8-10 a.m., Room 103b
Preliminary slides and audio attached.

IDPF: Google and Harvard

Libraries And Publishers
At the 2007 International Digital Publishing Forum (IDPF) in New York May 9th, publishers and vendors discussed the future of ebooks in an age increasingly dominated by large-scale digitization projects funded by the deep pockets of Google and Microsoft.

In a departure from the other panels, which discussed digital warehouses and repositories, both planned and in production, from Random House and HarperCollins, Peter Brantley, executive director of the Digital Library Federation, and Dale Flecker of Harvard University Library made a passionate case for libraries in an era of information as a commodity.

Brantley began by mentioning the Library Project on Flickr, and led with a slightly ominous series of slides:
“Libraries buy books (for a while longer),” followed by “Libraries don’t always own what’s in the book, just the book (the ‘thing’ of the book).”



He then reiterated the classic rights that libraries protect: The Right to Borrow, Right to Browse, Right to Privacy, and Right to Learn, and warned that “some people may become disenfranchised in the digital world, when access to the network becomes cheaper than physical things.” Given the presentation that followed from Tom Turvey, director of the Google Book Search project, this made sense.

Brantley made two additional points, saying “Libraries must permanently hold the wealth of our many cultures to preserve fundamental Rights, and Access to books must be either free or low-cost for the world’s poor.”

 He departed from conventional thinking on access, though, when he argued that this low-cost access didn’t need to include fiction. Traditionally, libraries began as subscription libraries for those who couldn’t afford to purchase fiction in drugstores and other commercial venues.

Finally, Brantley said that books will become communities as they are integrated, multiplied, fragmented, collaborative, and shared, and publishing itself will be reinvented. Yet his conclusion contained an air of inevitability, as he said, “Libraries and publishers can change the world, or it will be transformed anyway.”



A podcast recording of his talk is available on his site.

Google Drops A Bomb
Google presented a plan to entice publishers to buy into two upcoming models for making money from Google Book Search, including a weekly rental “that resembles a library loan” and a purchase option, “much like a bookstore,” said Tom Turvey, director of Google Book Search Partnerships.

 The personal library would allow search across the books, expiration and rental, and copy and paste. No pricing was announced. Google has been previewing the program at events including the London Book Fair.

Turvey said Google Book Search is live in 70 countries and eight languages. Ten years ago, zero percent of consumers clicked before buying books online, and now $4 billion of books are purchased online. “We think that’s a market,” Turvey said, “and we think of ourselves as the switchboard.”

Turvey, who previously worked at bn.com and ebrary, said publishers receive the majority of the revenue share as well as free marketing tools, site-brandable search inside a book with restricted buy links, and fetch and push statistical reporting.

He said an iTunes for Books was unlikely, since books don’t have one device, model or user experience that works across all categories. Different verticals like fiction, reference, and science, technology and medicine (STM), require a different user experience, Turvey said.

Publishers including SparkNotes requested a way to make money from enabling a full view of their content on Google Books, as did many travel publishers. Most other books are limited to 20 percent visibility, although Turvey said there is a direct correlation between the number of pages viewed and subsequent purchases.

This program raises significant privacy questions. If Google has records that can be correlated with all the other information it stores, this is the polar opposite of what librarians have espoused about intellectual freedom and the privacy of circulation records. Additionally, the quality control questions are significant and growing, voiced by historian Robert Townsend and others.

Libraries are a large market segment for publishers. It seems reasonable to voice concerns about this proposal at this stage, especially for those libraries that haven’t already been bought and sold.

 Others at the forum were skeptical. Jim Kennedy, vice president and director at the Associated Press, said, “The Google guy’s story is always the same: Send us your content and we’ll monetize it.”

Ebooks Ejournals And Libraries
Dale Flecker of the Harvard University Library gave a historical overview of the challenges libraries have grappled with in the era of digital information.



Instead of talking about ebooks, which he said represent only two percent of usage at Harvard, Flecker described eight challenges about ejournals, which are now “core to what libraries do” and have been in existence for 15-20 years. Library consultant October Ivins challenged this statistic about ebook usage as irrelevant, saying “Harvard isn’t typical.” She said there were 20 ebook platforms demonstrated at the 2006 Charleston Conference, though discovery is still an issue.

First, licensing is a big deal. There were several early questions: Who is a user? What can they do? Who polices behavior? What about guaranteed performance and license lapses? Flecker said that in an interesting shift, there is a move away from licenses to “shared understandings,” where content is acquired via purchase orders.



Second, archiving is a difficult issue. Harvard began in 1636, and has especially rich 18th century print collections, so it has been aware that “libraries buy for the ages.” The sticky issues come with remote and perpetual access, and what happens when a publisher ceases publishing.

Flecker didn’t mention library projects like LOCKSS or Portico in his presentation, though they do exist to answer those needs. He did say that “DRM is a bad actor” and it’s technically challenging to archive digital content. Though there have been various initiatives from libraries, publishers, and third parties, he said “Publishers have backed out,” and there are open questions about rights, responsibilities, and who pays for what. In the question and answer period that followed, Flecker said Harvard “gives lots of money” to Portico.



Third, aggregation is common. Most ejournal content is licensed in bundles, and consortia and buying clubs are common. Aggregated platforms provide useful search options and intercontent functionality.

Fourth, statistics matter, since they show utility and value for money spent. Though the COUNTER standard is well-defined and SUSHI gives a protocol for exchange of multiple stats, everyone counts differently.

Fifth, discovery is critical. Publishers have learned that making content discoverable increases use and value. At first, metadata was perceived to be intellectual property (as it still is, apparently), but then there was a grudging acceptance and finally, enthusiastic participation. It was unclear which metadata Flecker was describing, since many publisher abstracts are still regarded as intellectual property. He said Google is now a critical part of the discovery process.

Linkage was the sixth point. Linking started with citations, when publishers and aggregators realized that many footnotes contained links to articles that were also online. Bilateral agreements came next, and finally, the Digital Object Identifier (DOI) generalized the infrastructure and helped solve the “appropriate copy” problem, along with OpenURL. With this solution came true interpublisher, interplatform, persistent and actionable links which are now growing beyond citations.
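To make the “appropriate copy” idea concrete, here is a small sketch of how the same citation might be turned into an OpenURL 1.0 (Z39.88-2004) link for two different institutions. The resolver addresses, the DOI, and the article and journal titles are placeholders invented for illustration, and the function name is hypothetical. Only the key names (url_ver, rft_val_fmt, rft_id, rft.atitle, rft.jtitle) come from the standard’s key/encoded-value format.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, doi, article_title, journal_title):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft_id", f"info:doi/{doi}"),
        ("rft.atitle", article_title),
        ("rft.jtitle", journal_title),
    ]
    return f"{resolver_base}?{urlencode(params)}"


# Placeholder citation and hypothetical resolvers (assumptions, not real services).
citation = {
    "doi": "10.1000/example",
    "article_title": "An Example Article",
    "journal_title": "Journal of Examples",
}
resolvers = (
    "https://resolver.library-a.example/openurl",
    "https://resolver.library-b.example/openurl",
)

for base in resolvers:
    print(build_openurl(base, **citation))
```

The citation data stays the same while the resolver address changes, so each library can route its own users to the copy it has licensed.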

Seventh, there are early glimpses of text mining in ejournals. Text is being used as fodder for computational analysis, not just individual reading. This has required somewhat different licenses geared for computation, and also needs a different level of technical support.

Last, there are continuing requirements for scholarly citation that is:
• Unambiguous
• Persistent
• At a meaningful level

Article-level linking in journals has proven to be sufficient, but the equivalent for books (the page? chapter? paragraph?) has not been established in an era of reflowable text.

In the previous panel, Peter Brantley asked the presenters on digital warehouses about persistent URLs to books, and if ISBNs would be used to construct those URLs. There was total silence, and then LibreDigital volunteered that redirects could be enabled at publisher request.

As WorldCat.org links have also switched from ISBN to OCLC number for permalinks, this seems like an interesting question to solve and discuss. Will the canonical URL for a book point to Amazon, Google, OCLC, or OpenLibrary?

NetConnect Spring 2007 podcast episode 3

In Requiem for a Nun, William Faulkner famously wrote, “The past is never dead. It’s not even past.” With the advent of new processes, the past can survive and be retrieved in new ways and forms. The new skills needed to preserve digital information are the same ones that librarians have always employed to serve users: selection, acquisition, and local knowledge.

The print issue of NetConnect is bundled with the April 15th issue of Library Journal, or you can read the articles online.

Jessamyn West of librarian.net says in Saving Digital History that librarians and archivists should preserve digital information, starting with weblogs. Tom Hyry advocates using extensible processing in Reassessing Backlogs to make archives more accessible to users. And newly appointed Digital Library Federation executive director Peter Brantley covers the potential of the rapidly evolving world of print on demand in A Paperback in 4 Minutes. Melissa Rethlefsen describes the new breed of search engines in Product Pipeline, including those that incorporate social search. Gail Golderman and Bruce Connolly compare databases’ pay-per-view in Pay by the Slice, and Library Web Chic Karen Coombs argues that librarians should embrace a balancing act in the debate between Privacy vs Personalization.

Jessamyn and Peter join me in a far-ranging conversation about some of the access challenges involved for readers and librarians in the world of online books, including common APIs for online books and how to broaden availability for all users.

Books
New Downtown Library
Neal Stephenson
Henry Petroski

Software
Greasemonkey User Scripts
Twitter
Yahoo Pipes
Dopplr

Outline
0:00 Music
0:10 Introduction

1:46 DLF Executive Director Peter Brantley
2:30 California Digital Library

4:13 Jessamyn West
5:08 Ask Metafilter
6:17 Saving Digital History
8:01 What Archivists Save
12:02 Culling from the Firehose of Information
12:34 API changes
14:15 Reading 2.0
15:13 Common APIs and Competitive Advantage
17:15 A Paperback in 4 Minutes
18:36 Lulu
19:06 On Demand Books
21:24 Attempts at hacking Google Book Search
22:30 Contracts change?
23:17 Unified Repository
23:57 Long Tail Benefit
24:45 Full Text Book Searching is Huge
25:08 Impact of Google
27:08 Broadband in Vermont
29:16 Questions of Access
30:45 New Downtown Library
33:21 Library Value Calculator
34:07 Hardbacks are Luxury Items
35:47 Developing World Access
37:54 Preventing the Constant Gardener scenario
40:21 Book on the Bookshelf
40:54 Small Things Considered
41:53 Diamond Age
43:10 Comment that spurred Brantley to read the book
43:40 Marketing Libraries
44:15 Pimp My Firefox
45:45 Greasemonkey User Scripts
45:53 Twitter
46:25 Yahoo Pipes
48:07 Dopplr
50:25 Software without the Letter E
50:45 DLF Spring Forum
52:00 OpenID in Libraries
53:40 Outro
54:00 Music

Listen here or subscribe to the podcast feed

Dreaming in Code (review)

Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software. Scott Rosenberg; Crown Publishers 2007.

Salon’s Scott Rosenberg has written an elegant bird’s eye view of modern software development by observing the development of Chandler, an open source calendaring project. It was originally publicized as a way to kill the Exchange server hegemony in much the same way that Apache has dominated Microsoft’s IIS.

Yet as the subtitle says, “two dozen programmers, three years, 4,732 bugs, and one quest for transcendent software” have not yet resulted in a product ready for general consumption.

The detours have been interesting. We witness the birth of PyLucene, as developers seek a full-text indexing solution that works with their unified repository. And perhaps CalDAV, soon to ship with OS X’s Leopard, will be the project’s legacy.

It’s a compelling vision: a type-agnostic program to manage email, calendar events, and contacts. Yet Google chose dis-integration with its calendar and Gmail. And Apple has made backend data integration possible, but has kept the individual applications separate.

As the project enters its third year, Rosenberg takes a detour into the history of software development. After surveying the hilltop, he makes a modest recommendation. Computer science programs should be more like MFA programs, which require students to study great works, share work, and revise constantly.

During this chapter, 37signals’ Getting Real methodology is held up, along with The Joel Test for software development, as possible signposts on the road ahead. Since Ruby on Rails came from a simple task list, perhaps there is some life in Getting Real for complicated projects, too.

In fact, the scenery is often as enjoyable as the narrative. I was happy to learn that CivicSpace, a Drupal module/modification, came from Chandler’s benevolent dictator-for-life, Mitch Kapor. An excerpt from the book is up at Technology Review that delves into the history of Hungarian notation.

As the Chandler project continues to take shape, one ponders the irony that if the developers had been using a completed program that fulfilled the dream, their project might be done already. The hardest software to finish may be that which measures time. Perhaps we need the next Proust to reinvent computer science. Until then, Dreaming in Code will have to suffice.

NetConnect Winter 2007 podcast episode 2

This is the second episode of the Open Libraries podcast, and I was pleased to have the opportunity to talk to some of the authors of the Winter NetConnect supplement, entitled Digitize This!

The issue covers how libraries can start to digitize their unique collections. K. Matthew Dames and Jill Hurst-Wahl wrote an article about copyright and practical considerations in getting started. They join me, along with Lotfi Belkhir, CEO of Kirtas Technologies, to discuss the important issue of digitization quality.

One of the issues that has surfaced recently is exactly what libraries are receiving from the Google Book Search project. As the project grows beyond the initial five libraries into more university and Spanish libraries, many of the implications have become more visible.

The print issue of NetConnect is bundled with the January 15th issue of Library Journal, or you can read the articles online.

Recommended Books:
Kevin
Knowledge Diplomacy

Jill
Business as Unusual

Lotfi
Free Culture
Negotiating China
The Fabric of the Cosmos

Software
SuperDuper
Google Documents
Arabic OCR

0:00 Music and Intro
1:59 Kevin Dames on his weblog Copycense
2:48 Jill Hurst-Wahl on Digitization 101
4:16 Jill and Kevin on their article
4:34 SLA Digitization Workshop
5:24 Western NY Project
6:45 Digitization Expo
7:43 Lotfi Belkhir
9:00 Books to Bytes
9:26 Cornell and Microsoft Digitization
11:00 Scanning vs Digitization
11:48 Google Scanning
15:22 Michael Keller’s OCLC presentation
16:14 Google and the Public Domain
17:52 Authors Guild sues Google
21:13 Quality Issues
24:10 MBooks
26:56 Public Library digitization
27:14 Incorporating Google Books into the catalog
28:49 CDL contract
30:22 Microsoft Book Search
31:15 Double Fold
39:20 Print on Demand and Digitization
39:25 Books@Google
43:14 History on a Postcard
45:33 iPRES conference
45:46 LOCKSS
46:45 OAIS