Are You Paying Attention?

Not for the first time, the glut of incoming information threatens to reduce useful knowledge to a mere cloud of data. There is no doubt that activity streams and linked data are two of the more interesting aids to research in this onrushing surge of information. In this screen-mediated age, the advantages of deep focus and hyper attention are mixed up like never before, since the advantage accrues to the company that can collect the most data, aggregate it, and repurpose it for willing marketers.

N. Katherine Hayles does an excellent job of distinguishing between the uses of hyper and deep attention without privileging either. Her point is simple: deep attention is superb for solving complex problems represented in a single medium, but it comes at the price of environmental alertness and flexibility of response. Hyper attention excels at negotiating rapidly changing environments in which multiple foci compete for attention; its disadvantage is impatience with focusing for long periods on a noninteractive object such as a Victorian novel or a complicated math problem.

Does data matter?

The MESUR project is one of the more interesting research projects going, now living on as a product from Ex Libris called bX. Under the hood, MESUR looks at the research patterns behind searches, not simply the number of hits, and stores the information as triples: subject-predicate-object statements in RDF, the Resource Description Framework. RDF triple stores can put the best of us to sleep, so one way to think of them is as smart filters. Having semantic information available allows computers to distinguish between Apple the fruit and Apple the computer.
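To make the triple idea concrete, here is a minimal sketch using Python's rdflib; the namespace, resources, and predicates are invented for illustration and have nothing to do with MESUR's actual schema.

```python
# Minimal sketch of subject-predicate-object triples with Python's rdflib.
# The example.org namespace and resource names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()

# Two different things that share the label "Apple", disambiguated by type.
g.add((EX.Apple_fruit, RDF.type, EX.Fruit))
g.add((EX.Apple_fruit, RDFS.label, Literal("Apple")))
g.add((EX.Apple_Inc, RDF.type, EX.Company))
g.add((EX.Apple_Inc, RDFS.label, Literal("Apple")))

# The "smart filter": ask only for the Apples that are companies.
for subject in g.subjects(RDF.type, EX.Company):
    print(subject, g.value(subject, RDFS.label))
```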

In use, semantic differentiation gives striking information gains. I picked up the novel Desperate Characters, by Paula Fox. While reading it, I remembered that I first heard it mentioned in an essay by Jonathan Franzen, who wrote the foreword to the edition I purchased. That essay was published in Harper’s, and the RDF framework in use on harpers.org gave me a way to see both articles by Franzen and articles about him. This semantic disambiguation is the obverse of the firehose of information monetized through advertising.

Since MESUR pulls information from the SFX link resolver service logs at Caltech and Los Alamos National Laboratory, there’s an immediate relevance filter applied, given the scientists doing research in those institutions. Using the information contained in the logs, it’s possible to see whether a given IP address (belonging to a faculty member or department) goes through an involved research process or a short one. The researcher’s clickstream is captured and fed back for better analysis. Any subsequent researcher who clicks on a similar SFX link has a recommender system seeded with ten billion clickstreams. This promises researchers a smarter Works Cited, so that they can see what’s relevant in their field prior to publication. Competition just got smarter.
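The recommender idea itself is easy to sketch: count which items show up together in the same research session and suggest the strongest co-occurring neighbors. The toy below uses invented session data and says nothing about the actual MESUR or bX algorithms.

```python
# Toy co-occurrence recommender over link-resolver clickstreams.
# Sessions and article identifiers are invented; this is not MESUR's algorithm.
from collections import defaultdict
from itertools import combinations

sessions = [
    ["doi:10.1000/a", "doi:10.1000/b", "doi:10.1000/c"],
    ["doi:10.1000/b", "doi:10.1000/c"],
    ["doi:10.1000/a", "doi:10.1000/c", "doi:10.1000/d"],
]

co_counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for x, y in combinations(set(session), 2):
        co_counts[x][y] += 1
        co_counts[y][x] += 1

def recommend(item, k=3):
    """Return up to k items most often clicked in the same session as `item`."""
    neighbors = co_counts.get(item, {})
    return sorted(neighbors, key=neighbors.get, reverse=True)[:k]

print(recommend("doi:10.1000/a"))
```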

A standards-based way of description

Attention.xml, first proposed in 2004 as an open standard by Technorati technologist Tantek Celik and journalist Steve Gillmor, promised to give priority to the items users most want to see. The problem, as articulated five years ago, was that feed overload is real, and seeing new items, and what friends are also reading, requires a standard that allows for collaborative reading and organizing.

The standard seems to have been absorbed into Technorati, but the concept lives on in the latest beta of Apple’s browser Safari, which lists Top Sites by usage and recent history, as does Firefox’s Speed Dial. And of course, Google Reader has Top Recommendations, which tries to turn the enormous corpus of data it collects into useful information.
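None of these products publishes its exact formula, but the underlying ranking idea, weighting what you visit often and recently, can be sketched in a few lines; the visit log and half-life below are invented.

```python
# Toy "top sites" ranking that combines visit frequency with recency decay.
# The visit log and half-life are invented; browsers don't publish their formulas.
import math

HALF_LIFE_DAYS = 7.0

# (url, days_since_visit) pairs standing in for a browser history log.
visits = [
    ("https://example.org/news", 0.04),
    ("https://example.org/news", 1.0),
    ("https://example.org/mail", 0.007),
    ("https://example.org/blog", 7.0),
]

scores = {}
for url, age_days in visits:
    # Each visit contributes less the older it is (half weight every HALF_LIFE_DAYS).
    weight = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)
    scores[url] = scores.get(url, 0.0) + weight

for url, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.2f}  {url}")
```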

Richard Powers’ novel Galatea 2.2 describes an attempt to train a neural network to recognize the Great Books, but finds socializing online to be a failing project: “The web was a neighborhood more efficiently lonely than the one it replaced. Its solitude was bigger and faster. When relentless intelligence finally completed its program, when the terminal drop box brought the last barefoot, abused child on line and everyone could at last say anything to everyone else in existence, it seemed to me we’d still have nothing to say to each other and many more ways not to say it.” Machine learning has its limits, including whether the human chooses to pay attention to the machine in a hyper or deep way.

Hunch, a web application designed by Caterina Fake, best known as a co-founder of Flickr, is a newer example of machine learning. The site offers to “help you make decisions and gets smarter the more you use it.” After signing up, you’re given a list of preference questions to answer. Some are standard marketing questions, like how many people live in your household, but others are clever or winsome. The answers are used to construct a probability model, which is consulted when you answer “Today, I’m making a decision about…” As the application is a work in progress, it’s not yet a replacement for a clever reference librarian, even if its model is quite similar to the classic reference interview. It turns out that machines are best at giving advice about other machines, and if the list of results came to incorporate something larger than the open Web, the technology could represent a leap forward. Already, it does a brilliant job of applying deep attention to the hypersprawling web of information.
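Hunch hasn’t published its model, but turning preference answers into a probability over outcomes can be sketched with a naive-Bayes-style score. The questions, answers, and history below are entirely made up; this is an illustration of the general technique, not Hunch’s algorithm.

```python
# Hypothetical sketch of scoring decision outcomes from preference answers,
# in the spirit of a naive Bayes model. All data and categories are invented.
from collections import Counter, defaultdict

# Training data: (answers, chosen_outcome) pairs from past users.
history = [
    ({"household": "1", "likes_travel": "yes"}, "city break"),
    ({"household": "4", "likes_travel": "yes"}, "theme park"),
    ({"household": "1", "likes_travel": "no"}, "staycation"),
    ({"household": "4", "likes_travel": "yes"}, "theme park"),
]

outcome_counts = Counter(outcome for _, outcome in history)
feature_counts = defaultdict(Counter)
for answers, outcome in history:
    for question, answer in answers.items():
        feature_counts[outcome][(question, answer)] += 1

def score(answers):
    """Unnormalized probability of each outcome given the user's answers."""
    scores = {}
    for outcome, n in outcome_counts.items():
        p = n / len(history)
        for qa in answers.items():
            # Laplace smoothing so unseen answers don't zero out an outcome.
            p *= (feature_counts[outcome][qa] + 1) / (n + 2)
        scores[outcome] = p
    return scores

scores = score({"household": "1", "likes_travel": "yes"})
print(max(scores, key=scores.get))
```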

How to Achieve True Greatness

Privacy has long since returned to the norms of small-town America before World War II, and our sense of self is next up on the block. This is as old as the Renaissance court described in Baldesar Castiglione’s The Book of the Courtier and as new as Twitter, the new party line, which gives ambient awareness of people and events.

In this age of information overload, it seems like a non sequitur that technology could solve the problem it created. And yet, since the business model of the 21st century is based on data and widgets made of code, not things, there is plenty of incentive to fix the problem of attention. Remember, Google started as a way to assign importance based on who was linking to whom.
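That link-based notion of importance is itself simple to sketch: iteratively let each page share its score with the pages it links to. The tiny graph and damping factor below follow the textbook PageRank recipe, not Google’s production system.

```python
# Toy PageRank-style iteration over a tiny invented link graph.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

damping = 0.85
rank = {page: 1 / len(links) for page in links}

for _ in range(50):
    new_rank = {}
    for page in links:
        # Sum the share of score flowing in from every page that links here.
        incoming = sum(
            rank[other] / len(out)
            for other, out in links.items()
            if page in out
        )
        new_rank[page] = (1 - damping) / len(links) + damping * incoming
    rank = new_rank

print(rank)  # pages that attract more links end up with higher scores
```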

This balance is probably best handled by libraries, with their obsessive attention to user privacy and reader needs; librarians are the frontier between the machine and the person. The open question is: will the need to curate attention overwhelm those doing the filtering?

Galatea 2.2, by Richard Powers; Farrar, Straus and Giroux, 1995

ALA 2007: Top Tech Trends

At the ALA Top Tech Trends Panel, panelists including Marshall Breeding, Roy Tennant, Karen Coombs, and John Blyberg discussed RFID, open source adoption in libraries, and the importance of privacy.

Marshall Breeding, director for innovative technologies and research at Vanderbilt University Libraries (TN), started the Top Tech Trends panel by referencing his LJ Automation Marketplace article, “An Industry Redefined,” which predicted unprecedented disruption in the ILS market. Breeding said 60 percent of the libraries in one state are facing a migration due to changes in the SirsiDynix product roadmap, but he noted that not all ILS companies are the same.

Breeding said open source is new to the ILS world as a product, even though it’s been used as infrastructure in libraries for many years. Interest has now expanded to decision makers. The Evergreen PINES project in Georgia, with 55 of 58 counties participating, was mostly successful. With the recent decision to adopt Evergreen in British Columbia, there is movement toward open source solutions, though Breeding cautioned that adoption is still minuscule compared to the overall number of libraries.

Questioning whether the switch really amounts to an avalanche, Breeding noted that several commercial support companies have sprung up to serve the open source ILS market, including LibLime, Equinox, and CARe Affiliates. Breeding predicted an era of new decoupled interfaces.

John Blyberg, head of technology and digital initiatives at Darien Public Library (CT), said the back end [in the ILS] needs to be shored up because it has a ripple effect on other services. Blyberg said RFID is coming, and that it makes sense for use in sorting and book storage, echoing Lori Ayres’ point that libraries need to support the distribution demands of the Long Tail. He considers privacy concerns a non-starter, because an RFID tag is essentially a barcode; the linking information is stored in a database, which should be the focus of security concerns.
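Blyberg’s architectural point can be sketched simply: the tag carries only an opaque identifier, and everything sensitive lives in the library’s database, which is therefore where the security effort belongs. The tag IDs and records below are invented.

```python
# Minimal sketch of the "RFID tag as barcode" architecture: the tag holds only
# an opaque ID; the sensitive link to title and status lives in the database.
# All identifiers and records here are invented.
item_db = {
    "TAG-000417": {"title": "Desperate Characters", "status": "checked out"},
    "TAG-000988": {"title": "Galatea 2.2", "status": "on shelf"},
}

def lookup(tag_id: str) -> dict:
    """Resolve a scanned tag to an item record; access control belongs here."""
    return item_db.get(tag_id, {"title": "unknown", "status": "unknown"})

print(lookup("TAG-000417"))
```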

Finally, Blyberg said that vendor interoperability and a democratic approach to development are needed in the age of Innovative’s Encore and Ex Libris’ Primo, both of which can be used with different ILS systems and can decouple the public catalog from the ILS. With the eXtensible Catalog (XC) and Evergreen coming along, Blyberg said there is a need for funding and partners to further their development.

Walt Crawford of OCLC/RLG said the problem with RFID is the potential for patron cards to carry chips, which could lead to the erosion of patron privacy. Intruders could mine data on who’s reading what, which Crawford said is a serious issue.

Joan Frye Williams countered that both Blyberg and Crawford were insisting on applying logic to what is essentially a political problem. Breeding agreed, saying that airport security could scan chips, and added, “My concern is that third-generation RFID chips may not be readable in 30 years, much less the hundreds of years that we expect barcodes to be around.”

Karen Coombs, head of web services at the University of Houston (TX), listed three trends:
1. The end user as content contributor, which she cautioned is an issue: what happens if YouTube goes under and people lose their memories? Coombs pointed to the National Library of Australia’s partnership with Flickr as a positive development.
2. Digital as the format of choice for users, pointing to iTunes for music and Joost for video. Coombs said there is currently no way for libraries to provide this to users, especially in public libraries. Though companies like OverDrive and Recorded Books exist to serve this need, perhaps her point was that consumer adoption has outpaced current library demand.
3. A blurred line between desktop and web applications, which Coombs demonstrated with YouTube remixer and Google Gears, which lets you read your feeds when you’re offline.

John Blyberg responded to these trends, saying that he sees academic libraries pursuing semantic web technologies, including developing ontologies. Coombs disagreed with this assessment, saying that libraries have lots of badly tagged HTML pages. Tennant agreed: “If the semantic web arrives, buy yourself some ice skates, because hell will have frozen over.”

Breeding said that he longs for SOA [service-oriented architecture] “but I’m not holding my breath.” Walt Crawford said, “Roy is right: most content providers don’t provide enough detail, and they make easy things complicated and don’t tackle the hard things.” Coombs said, “People are too concerned with what things look like,” but Crawford interjected, “not too concerned.”

Roy Tennant, OCLC senior program manager, listed his trends:
1. Demise of the catalog, which should push the OPAC into the back room where it belongs and elevate discovery tools like Primo and Encore, as well as OCLC WorldCat Local.
2. Software as a Service (SaaS), formerly known as ASP and hosted services, which means librarians don’t have to babysit machines, and is a great thing for lots of librarians.
3. Intense marketplace uncertainty due to the private equity buyouts of Ex Libris and SirsiDynix and the rise of Evergreen and Koha as looming options. Tennant also said he sees WorldCat Local as a disruptive influence. Aside from the ILS, the abstracting and indexing (A&I) services are being disintermediated as Google and OCLC go directly to publishers to license content.
Someone asked if libraries should get rid of local catalogs, and Tennant said, “only when it fits local needs.”

Walt Crawford said:
1. Privacy still matters. Crawford questioned whether patrons really want libraries to turn into Amazon in an era of government data mining and the inferences that could be drawn from a ten-year patron borrowing history.
2. The slow library movement, which argues that locality is vital to libraries, mindfulness matters, and open source software should be used where it works.
3. The role of the public library as publisher. Crawford pointed to the Charlotte-Mecklenburg County library, the Vermont libraries that Jessamyn West works with, and Wyoming as farther along this path, and said the tools are good enough that it’s becoming practical.

Blyberg said that systems need to be more open to the data we put into them. Williams said that content must be disaggregatable and remixable. Coombs pointed out the current difficulty of swapping out ILS modules and said ERM was a huge issue. Tennant referenced the Talis platform and said one of Evergreen’s innovations is its use of the XMPP (Jabber) protocol, which is easier than SOAP web services, which are too heavyweight.

Marshall Breeding responded to a question asking if MARC was dead by saying, “I’m married to a cataloger,” but added that we do need things in addition to MARC, which is good for books, such as Dublin Core and ONIX. Coombs pointed out that MARCXML is a mess because it’s retrofitted and doesn’t leverage the power of XML. Crawford said, “I like to give Roy [Tennant] a hard time about his phrase ‘MARC is dead,’” and noted that for a dying format, the Moen panel was full at 8 a.m.

Questioners asked what happens when the one server goes down; Blyberg responded, “What if your T-1 line goes down?” Joan Frye Williams exhorted the audience to “examine your consciences” when asking vendors how to spend their time. Coombs agreed, saying that her experience in user groups had exposed her to the competing needs vendors face: “[they] are spread way too thin.” Williams said there are natural transition points, spoke darkly of a pyramid scheme, and warned that you get the vendors you deserve. Coombs agreed, saying that feature creep and managing expectations is a fiercely difficult job, and that open source developers and support staff are different people.

Joan Frye Williams, information technology consultant, listed:
1. A new menu of end-user-focused technologies. Williams said she worked in libraries when the typewriter was replaced by an OCLC machine, and libraries are still not using technology strategically. “Technology is not a checklist,” Williams chided, saying that the 23 Things movement of teaching new skills to library staff is insufficient.
2. Ability for libraries to assume development responsibility in concert with end-users
3. The need to make things more convenient, adopting artificial intelligence (AI) principles of self-organizing systems. Williams asked, “If computers can learn from their mistakes, why can’t we?”

Someone asked why libraries are still using the ILS. Coombs said it’s a financial issue, and Breeding responded sharply, “How can we not automate our libraries?” Walt Crawford agreed: “Are we going to return to index cards?”

When the panel was asked if library home pages would disappear, Crawford and Blyberg both said they would be surprised. Williams said the product of the [library] website is the user experience. She said Yorba Linda Public Library (CA) is enhancing its site with a live feed of books that scrolls across the page as items are checked in.

Another audience member asked why the panel didn’t cover toys and protocols. Crawford said outcomes matter, and Coombs agreed: “I’m a toy geek, but it’s the user that matters.” Many participants talked about their use of Twitter, and Coombs said portable applications on a USB drive have the potential to change public computing in libraries. Tennant recommended viewing the Photosynth demo, first shown at the TED conference.

Finally, when asked how to keep up with trends, especially for new systems librarians, Coombs said, “It depends what kind of library you’re working in. Find a network and ask questions on the code4lib [IRC] channel.”

Blyberg recommended constructing a blogroll that includes sites from the humanities, sciences, and library and information science to become a well-rounded feed reader. Tennant recommended a (gasp) dead-tree magazine, Business 2.0. Coombs said the Gartner website has good information about technology adoption, and Williams recommended trendwatch.com.

Links to other trends:
Karen Coombs Top Technology Trends
Meredith Farkas Top Technology Trends
3 Trends and a Baby (Jeremy Frumkin)
Some Trends from the LiB (Sarah Houghton-Jan)
Sum Tech Trends for the Summer of 2007 (Eric Lease Morgan)

And other writeups and podcast:
Rob Styles
Ellen Ward
Chad Haefele

IDPF: Google and Harvard

Libraries And Publishers
At the 2007 International Digital Publishing Forum (IDPF) in New York on May 9, publishers and vendors discussed the future of ebooks in an age increasingly dominated by large-scale digitization projects funded by the deep pockets of Google and Microsoft.

In a departure from the other panels, which discussed digital warehouses and repositories, both planned and in production, from Random House and HarperCollins, Peter Brantley, executive director of the Digital Library Federation, and Dale Flecker of Harvard University Library made a passionate case for libraries in an era of information as a commodity.

Brantley began by mentioning the Library Project on Flickr, and led with a slightly ominous series of slides: “Libraries buy books (for a while longer),” followed by “Libraries don’t always own what’s in the book, just the book (the ‘thing’ of the book).”

He then reiterated the classic rights that libraries protect: the Right to Borrow, Right to Browse, Right to Privacy, and Right to Learn, and warned that “some people may become disenfranchised in the digital world, when access to the network becomes cheaper than physical things.” Given the presentation that followed from Tom Turvey, director of the Google Book Search project, this made sense.

Brantley made two additional points, saying “Libraries must permanently hold the wealth of our many cultures to preserve fundamental Rights” and “Access to books must be either free or low-cost for the world’s poor.” He departed from conventional thinking on access, though, when he argued that this low-cost access didn’t need to include fiction. Traditionally, though, libraries began as subscription libraries for those who couldn’t afford to purchase fiction in drugstores and other commercial venues.

Finally, Brantley said that books will become communities as they are integrated, multiplied, fragmented, collaborative, and shared, and publishing itself will be reinvented. Yet his conclusion contained an air of inevitability, as he said, “Libraries and publishers can change the world, or it will be transformed anyway.”

A podcast recording of his talk is available on his site.

Google Drops A Bomb
Google presented a plan to entice publishers to buy into two upcoming models for making money from Google Book Search: a weekly rental “that resembles a library loan” and a purchase option, “much like a bookstore,” said Tom Turvey, director of Google Book Search Partnerships. The personal library would allow search across the books, expiration and rental, and copy and paste. No pricing was announced. Google has been previewing the program at events including the London Book Fair.

Turvey said Google Book Search is live in 70 countries and eight languages. Ten years ago, zero percent of consumers clicked before buying books online, and now $4 billion of books are purchased online. “We think that’s a market,” Turvey said, “and we think of ourselves as the switchboard.”

Turvey, who previously worked at bn.com and ebrary, said publishers receive the majority of the revenue share as well as free marketing tools, site-brandable search inside a book with restricted buy links, and fetch-and-push statistical reporting. He said an “iTunes for Books” was unlikely, since books don’t have one device, model, or user experience that works across all categories. Different verticals, like fiction, reference, and science, technology, and medicine (STM), require different user experiences, Turvey said.

Publishers including SparkNotes requested a way to make money from enabling a full view of their content on Google Books, as did many travel publishers. Most other books are limited to 20 percent visibility, although Turvey said there is a direct correlation between the number of pages viewed and subsequent purchases.

This program raises significant privacy questions. If Google has records that can be correlated with all the other information it stores, this is the polar opposite of what librarians have espoused about intellectual freedom and the privacy of circulation records. Additionally, the quality control questions are significant and growing, as voiced by historian Robert Townsend and others.

Libraries are a large market segment for publishers, and it seems reasonable to voice concerns about this proposal at this stage, especially for those libraries that haven’t already been bought and sold. Others at the forum were skeptical. Jim Kennedy, vice president and director at the Associated Press, said, “The Google guy’s story is always the same: Send us your content and we’ll monetize it.”

Ebooks, Ejournals, And Libraries
Dale Flecker of the Harvard University Library gave a historical overview of the challenges libraries have grappled with in the era of digital information.

Instead of talking about ebooks, which he said represent only two percent of usage at Harvard, Flecker described eight challenges about ejournals, which are now “core to what libraries do” and have been in existence for 15-20 years. Library consultant October Ivins challenged this statistic about ebook usage as irrelevant, saying “Harvard isn’t typical.” She said there were 20 ebook platforms demonstrated at the 2006 Charleston Conference, though discovery is still an issue.

First, licensing is a big deal. There were several early questions: Who is a user? What can they do? Who polices behavior? What about guaranteed performance and license lapses? Flecker said that in an interesting shift, there is a move away from licenses to “shared understandings,” where content is acquired via purchase orders.

Second, archiving is a difficult issue. Harvard began in 1636 and has especially rich 18th-century print collections, so it has long been aware that “libraries buy for the ages.” The sticky issues come with remote and perpetual access, and with what happens when a publisher ceases publishing.

Flecker didn’t mention library projects like LOCKSS or Portico in his presentation, though they exist to address those needs. He did say that “DRM is a bad actor” and that it’s technically challenging to archive digital content. Though there have been various initiatives from libraries, publishers, and third parties, he said “publishers have backed out,” and there are open questions about rights, responsibilities, and who pays for what. In the question-and-answer period that followed, Flecker said Harvard “gives lots of money” to Portico.

Third, aggregation is common. Most ejournal content is licensed in bundles, and consortia and buying clubs are common. Aggregated platforms provide useful search options and inter-content functionality.

Fourth, statistics matter, since they show utility and value for money spent. Though the COUNTER standard is well defined and SUSHI provides a protocol for exchanging usage statistics, everyone counts differently.

Fifth, discovery is critical. Publishers have learned that making content discoverable increases use and value. At first, metadata was perceived to be intellectual property (as it still is, apparently), but then there was a grudging acceptance and finally, enthusiastic participation. It was unclear which metadata Flecker was describing, since many publisher abstracts are still regarded as intellectual property. He said Google is now a critical part of the discovery process.

Linkage was the sixth point. Linking started with citations, when publishers and aggregators realized that many footnotes referred to articles that were also online. Bilateral agreements came next, and finally the Digital Object Identifier (DOI) generalized the infrastructure and, along with OpenURL, helped solve the “appropriate copy” problem. With this solution came true inter-publisher, inter-platform, persistent, and actionable links, which are now growing beyond citations.
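For readers who haven’t seen the plumbing, an OpenURL is essentially a key-value query string that an institution’s link resolver turns into the appropriate copy for that reader. The resolver address and citation below are hypothetical; only the general OpenURL 1.0 key names are real.

```python
# Sketch of building an OpenURL-style link from a citation.
# The resolver base URL and the article metadata are hypothetical.
from urllib.parse import urlencode

resolver = "https://resolver.example.edu/openurl"

citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.atitle": "An Example Article",
    "rft.jtitle": "Journal of Examples",
    "rft.volume": "12",
    "rft.spage": "34",
    "rft_id": "info:doi/10.1000/example",  # DOI carried as a stable identifier
}

link = resolver + "?" + urlencode(citation)
print(link)
```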

Seventh, there are early glimpses of text mining in ejournals. Text is being used as fodder for computational analysis, not just individual reading. This requires somewhat different licenses geared toward computation, as well as a different level of technical support.

Last, there are continuing requirements for scholarly citation that is unambiguous, persistent, and at a meaningful level. Article-level linking in journals has proven to be sufficient, but the equivalent for books (the page? chapter? paragraph?) has not been established in an era of reflowable text.

In the previous panel, Peter Brantley asked the presenters on digital warehouses about persistent URLs to books, and whether ISBNs would be used to construct those URLs. There was total silence, and then LibreDigital volunteered that redirects could be enabled at publisher request.

As WorldCat.org links have also switched from ISBN to OCLC number for permalinks, this seems like an interesting question to solve and discuss. Will the canonical URL for a book point to Amazon, Google, OCLC, or OpenLibrary?