
Using Drupal to put Endnote online

There is still no easy way to manage a library of references on a personal or institutional site. Librarians who want to put up a list of institutional publications, and researchers who want to share references, are held back by software limitations, privacy concerns, or technical roadblocks. This problem has been mitigated by an open source CMS with a handy bibliographic data module.

The Drupal content management system is attractive to many librarians and information scientists because of its deep support for taxonomy. Daniel Chudnov uses it to power Open Source Systems for Libraries and his personal weblog, One Big Library. Roy Tennant uses Drupal for TechEssence.info, and the Ann Arbor Public Library uses it for user registration, resource weblogs, and the overall site.

However, the state of the art in bibliographic management and collaboration is still stuck in 1990. A writer who wants to collect articles can choose among a number of client applications (Endnote, ProCite, and Reference Manager, plus WriteNote, all owned by Thomson ISI ResearchSoft) that do a nice job of saving references and integrating with word processors to format citations.

Endnote is the most commonly used program, but it was not designed for sharing references, and modern science is all about collaboration, from grant proposals to international research. In the worst case, sharing an Endnote library on a network server can corrupt it. In the best case, a shared Endnote library opens read-only whenever another person already has it open, which limits collaboration.

A version of EndnoteWeb has been in development for most of 2006, and is promised by January of next year. Early reports of integration with Web of Science tell of limited functionality and interoperability.

In 2002, a number of former Reference Manager employees waited for their non-compete agreements with ISI to expire, then founded RefWorks, an online version of the familiar bibliographic managers. In the last two years, applications including Connotea and CiteULike have added bibliographic manager capabilities to their social bookmarking services. Both allow RIS and BibTeX upload and download, and they are run by Nature Publishing Group and the University of Manchester, respectively.
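RIS, one of the interchange formats these services accept, is plain text with one tagged line per field. A minimal Python sketch of writing a single journal-article record follows; the input keys (`authors`, `title`, `journal`, and so on) are hypothetical, and only a handful of common RIS tags are shown.

```python
def to_ris(record):
    """Serialize one reference (a dict with hypothetical keys) as an RIS record."""
    lines = ["TY  - JOUR"]  # reference type: journal article
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    # (hypothetical dict key, RIS tag) pairs for a few common fields
    fields = [("title", "TI"), ("journal", "JO"), ("year", "PY"),
              ("volume", "VL"), ("start_page", "SP"), ("end_page", "EP")]
    for key, tag in fields:
        if record.get(key):  # skip fields the record does not have
            lines.append(f"{tag}  - {record[key]}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


print(to_ris({"authors": ["Example, A."], "title": "An example title",
              "journal": "Example Journal", "year": "1925"}), end="")
```

Missing fields are simply omitted, so a sparse historical record still produces a valid RIS entry.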

At Cold Spring Harbor Laboratory, the institution's annual reports have listed lab publications for over 100 years. These references have not been added to PubMed, which still goes back only to 1950. This unique information therefore needed to be put into a citable format so that scholars could reference the early history of genetics, and the tragic misfire of eugenics research.

Many approaches were tried. One early method was programmer-centric: the data was entered into a SQL database, and a web front end was scripted to add basic fields. While this was a promising start, it left out the rich data fields that let bibliographic managers capture complete citation information.

Since the library was examining digital asset management systems, Greenstone was assessed for its citation abilities. Ian Witten was able to jury-rig a solution that imported RIS citation data, but getting the records to display in full wasn’t simple.

As prototyping continued, the initial 1800 records were exported from the SQL database in comma-separated values (CSV) format and imported into Endnote. The archives clerk began assessing the reference types and adding new fields. For example, an Institution field was added so that records could be sorted by institution name, and a new reference type was added for non-standard reports.
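One way to move CSV rows into Endnote is to convert them to Endnote's tagged import format (`%0` reference type, `%A` author, `%T` title, `%J` journal, `%D` year, `%V` volume, `%P` pages). The Python sketch below does that; the column names, and the convention of separating multiple authors with semicolons, are assumptions about the exported table rather than the schema actually used.

```python
import csv
import io

# (hypothetical CSV column, Endnote tag) pairs for single-valued fields
ENW_FIELDS = [("title", "%T"), ("journal", "%J"), ("year", "%D"),
              ("volume", "%V"), ("pages", "%P")]


def csv_to_enw(csv_text):
    """Convert CSV rows into Endnote tagged-format records separated by blank lines."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines = ["%0 Journal Article"]  # assumption: every row is a journal article
        # Assumption: multiple authors share one column, separated by semicolons.
        for author in (row.get("authors") or "").split(";"):
            if author.strip():
                lines.append(f"%A {author.strip()}")
        for column, tag in ENW_FIELDS:
            value = (row.get(column) or "").strip()
            if value:  # omit empty fields rather than emitting bare tags
                lines.append(f"{tag} {value}")
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"
```

Records that need a different reference type, such as the non-standard reports, would need their own `%0` value.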

While this information was being added, Endnote’s OpenURL integration proved useful. Using the standard bibliographic fields, it was possible to launch a search that queried the library’s subscriptions to see whether a full-text version existed, and for many articles in Science magazine a full-text scan was available.
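A lookup of this kind can be expressed as an OpenURL 1.0 (Z39.88-2004) query in key/encoded-value form, built from the same standard fields. The Python sketch below is an illustration of the standard, not Endnote's own implementation; the resolver address and the input keys are placeholders.

```python
from urllib.parse import urlencode


def openurl_query(resolver_base, ref):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV query for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": ref.get("title", ""),    # article title
        "rft.jtitle": ref.get("journal", ""),  # journal title
        "rft.date": ref.get("year", ""),
        "rft.volume": ref.get("volume", ""),
        "rft.spage": ref.get("start_page", ""),
    }
    # Drop empty values so the resolver only receives fields we actually have.
    params = {k: v for k, v in params.items() if v}
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver address; a library would use its own link resolver's base URL.
print(openurl_query("https://resolver.example.edu/openurl",
                    {"title": "An example article", "journal": "Science", "year": "1910"}))
```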

In the short term, links to the JSTOR archive were added to Endnote. In the longer term, it would be useful to embed COinS (ContextObjects in Spans) in the web interface so that every citation could be queried via OpenURL.
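A COinS is an empty HTML `span` with class `Z3988` whose `title` attribute carries a URL-encoded OpenURL ContextObject. A minimal Python sketch that a web interface could use to emit one per citation, assuming hypothetical `title`, `journal`, and `year` keys on each record:

```python
import html
from urllib.parse import urlencode


def coins_span(ref):
    """Return a COinS span for one journal article (ref keys are hypothetical)."""
    context_object = urlencode({
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.atitle": ref["title"],
        "rft.jtitle": ref["journal"],
        "rft.date": ref["year"],
    })
    # The KEV string is already URL-encoded; escaping turns each '&' into '&amp;'
    # so the value is valid inside an HTML attribute.
    return f'<span class="Z3988" title="{html.escape(context_object)}"></span>'
```

Browser tools that recognize COinS can then hand the embedded metadata to the reader's own link resolver.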

Cold Spring Harbor Laboratory already had a site license for Endnote, so switching to RefWorks wasn’t feasible. In addition, a local installation of Connotea isn’t exactly lightweight to deploy, requiring two MySQL databases and memcached to handle the online load. And since Nature is currently funding the open-source project, questions were raised about its continued development.

The archives clerk finished the authority control work on the Endnote database, which included hand-checking the references against the print version of the annual reports. Once this was completed, a need was voiced to make these references available online.

Ron Jerome of the National Research Council Canada Institute for Chemical Process and Environmental Technology wrote a Bibliography module for Drupal that imports Endnote references in tagged (.enw) or XML format. The module is currently being extended to support Open Archives Initiative harvesting.
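To show what an Endnote tagged (.enw) import involves, here is a Python sketch that splits tagged text into records keyed by tag. It illustrates the file format only and is not code from the Bibliography module; for simplicity it ignores any line that does not begin with a `%` tag.

```python
def parse_enw(text):
    """Split Endnote tagged text into records of the form {tag: [values]}."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.rstrip()
        if not line.startswith("%") or len(line) < 2:
            continue  # skip blank lines and anything that is not a tag line
        tag, _, value = line.partition(" ")
        if tag == "%0" and current:  # "%0" (reference type) opens a new record
            records.append(current)
            current = {}
        current.setdefault(tag, []).append(value.strip())  # tags like %A repeat
    if current:
        records.append(current)
    return records
```

Storing each tag as a list keeps repeated fields, such as multiple `%A` authors, in their original order.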

The module was installed, and the 2200 Cold Spring Harbor Laboratory publications from 1890-1950 were imported into MySQL. The display is clean, and records appear in citation format by default. All other fields were imported as well and stay in the database for display on demand.

This module holds great promise for archive integration, since OAI harvesting would let libraries collect records from web resources that aren’t specifically built for archives management. Endnote format is also the lowest-barrier format for scientists and researchers.
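Harvesting of this kind uses the OAI-PMH `ListRecords` verb: the harvester requests records in a metadata format such as `oai_dc` (unqualified Dublin Core) and keeps following the `resumptionToken` until the repository returns an empty one. The Python sketch below covers the two pieces, building the request URL and parsing one response page; the base URL is a placeholder, and the HTTP fetch itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None):
    """Build a ListRecords request; a resumptionToken must be the only other argument."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = "oai_dc"
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token); next_token is None when the list is complete."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": ident, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)  # an empty token marks the last page
```

A harvester would call `list_records_url`, fetch that URL, pass the body to `parse_list_records`, and repeat with the returned token until it comes back as `None`.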

In the future, Cold Spring Harbor Laboratory hopes to integrate these early records with the other archives collections managed in DigiTool. For now, other laboratories and libraries can use Drupal and the Bibliography module for easy reference sharing.
