Digital archiving enhances Caltech research
Kimberly Douglas

The Caltech Sherman Fairchild Library of Engineering and Applied Science is actively contributing to the revolution in scientific publishing that got under way in the late ’90s with widespread use of the Internet. While engineering departments nationwide have a long history of documenting the research of their faculties, only in the last decade have such technical reports been available online, and only in the last several years has a concerted, worldwide effort been made to standardize how such reports are put online so that they can be easily found and accessed.

For decades, even as far back as the ’50s, computer science departments around the world have published technical reports documenting their research. Through a Defense Advanced Research Projects Agency (DARPA) grant in 1994, this effort was moved to a digital library environment called NCSTRL, the Networked Computer Science Technical Reference Library (www.ncstrl.org). Carnegie Mellon University, Cornell University, and MIT, among others, were early participants in this digital repository of technical reports. Over the years the list of participating organizations has expanded to more than 200, and the project has reduced the need for print runs of the reports. As long as each site keeps its collection current and its repository server up, everyone on the network has immediate access to the worldwide collection.

This model is now finding its way into almost every discipline, providing a means to communicate directly with colleagues, known and unknown. Several groups within the Engineering and Applied Science division at Caltech have a technical report series: the GALCIT (Graduate Aeronautical Laboratories) Reports in Aeronautics, the Environmental Quality Laboratory Technical Reports, and the Keck Laboratory Reports in Environmental Engineering, to name a few, in addition to the Institute’s Computer Science Technical Reports. Other series relate to specific research projects: Caltech’s Accelerated Strategic Computing Initiative (ASCI) Technical Reports is a recent series, while the Center for Advanced Computing Research Technical Reports were initiated as long ago as 1991.

The Caltech Library System digital repository. The Caltech Library System is working with Division of Engineering and Applied Science faculty to promote networked access to the division’s research for the national and international scientific communities. The library’s goal is to provide a digital collection service that will help disseminate faculty research across the Web and keep it accessible in perpetuity.

In collaboration with a group wishing to create a digital archive, the library establishes a policy document describing the scope of the collection and how the contents are to be certified. An author or other responsible party formally gives permission, allowing Caltech, via the library, to place the reports in the digital repository. The library converts any print-only reports to portable document format (PDF) files, adds the metadata (descriptive and identifying elements of the document) for searching, and then submits each report electronically to the local digital repository (http://library.caltech.edu/digital/) for archiving.
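As a rough illustration of what "adding the metadata for searching" can mean in practice, the sketch below models one report as a small record and searches a collection by term. The field names, identifiers, titles, and file paths are all hypothetical placeholders, not the library's actual schema or holdings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReportRecord:
    """Hypothetical metadata record for one technical report."""
    identifier: str          # assumed local identifier, not a real report number
    title: str
    creators: List[str]
    series: str
    year: int
    pdf_path: str            # assumed location of the converted PDF file
    keywords: List[str] = field(default_factory=list)


def matches(record: ReportRecord, term: str) -> bool:
    """Return True if the term appears in any descriptive field."""
    term = term.lower()
    haystack = [record.title, record.series, *record.creators, *record.keywords]
    return any(term in text.lower() for text in haystack)


def search(records: List[ReportRecord], term: str) -> List[str]:
    """Return identifiers of all records whose metadata mentions the term."""
    return [r.identifier for r in records if matches(r, term)]


# Invented sample records, for illustration only.
records = [
    ReportRecord("example-001", "A Sample Report on Turbulent Flow",
                 ["A. Author"], "Example Aeronautics Reports", 1998,
                 "reports/example-001.pdf", ["turbulence"]),
    ReportRecord("example-002", "A Sample Report on Parallel Computing",
                 ["B. Author", "C. Author"], "Example Computer Science Reports",
                 1999, "reports/example-002.pdf", ["parallel"]),
]

print(search(records, "turbul"))
```

The point of the sketch is only that the descriptive elements, not the PDF itself, are what a search runs against.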

The library has already extended the Computer Science Technical Report collection online back through 1987 and plans to continue the conversion process until the series is completely digitized. New technical reports are added to the repository as soon as the appropriate Caltech faculty member approves them and they are submitted to the library. The ASCI Technical Reports repository was launched just a few months ago and reports are added as authors make them available. Also in the works is an archive of all Caltech theses (see http://gwaihir.caltech.edu:8880/ETD-db/ for a demonstration).

Caltech’s repository and worldwide scholarly communication. One might well ask what is required of archiving in an environment as mutable and volatile as that of the digital network. The library’s commitment entails adhering to current and evolving national standards for protocols, file formats, and markup. To that end, the Caltech Library System has joined the Coalition for Networked Information (www.cni.org) and is an active participant in the Open Archives Initiative (OAi, at www.openarchives.org/).

Founded in 1990, the Coalition for Networked Information is supported by institutional members representing higher education, publishing, networks and telecommunications, information technology, and libraries and library organizations. Its objective is to advance the potential of networked information technology to increase scholarly communication and enrich intellectual productivity.

OAi, more specifically, aims to support archives of many different types, with an emphasis on allowing the harvesting of metadata describing diverse “records” of content stored in managed repositories. In the near future, it is quite possible that others will develop discovery services ultimately pointing to the repository maintained by the Caltech Library System. By being integrated within this larger context, the collections that the Caltech Library System maintains will be joined logically with others for discovery purposes. The aim is for these digital collections to eventually be available on distributed servers worldwide and permanently accessible to the scientific community.
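To make metadata harvesting a little more concrete, here is a minimal sketch of how a discovery service might form a harvesting request, assuming the repository speaks the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and exposes simple Dublin Core records under the `oai_dc` prefix. The base URL is a made-up placeholder, not the address of the Caltech repository, and the code only builds the request URL; it does not contact any server.

```python
from urllib.parse import urlencode


def harvest_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only ask for records added or changed since this date.
        params["from"] = from_date
    if set_spec:
        # Restrict the harvest to one collection, if the repository defines sets.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
url = harvest_url("https://repository.example.edu/oai", from_date="2001-01-01")
print(url)
```

A harvester would issue requests like this against many repositories and merge the returned records into one index, which is how separately maintained collections come to be "joined logically" for discovery.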

Kimberly Douglas is director of Caltech’s Sherman Fairchild Library of Engineering and Applied Science.