User:Mmwidder/sandbox/Digital Libraries Initiative

The Digital Libraries Initiative (DLI), funded from 1994 to 2004 by agencies of the United States federal government, coincided with transformational changes in the larger, global information and communications environment and was inspired by new research possibilities associated with the rapid growth of online digital content. Initial planning for the program began in the early 1990s, but the stage was set by developments in computing and networking technologies during prior decades. The DLI projects produced waves of technological innovation across the entire information lifecycle. Although grounded in the computer and information sciences, the funded projects were continually informed by domain research across disciplines and created innovative methodologies and resources applicable to a broad set of scientific and non-scientific problem domains, often through novel collaborative efforts.

Background
In the early 1990s, the National Science Foundation (NSF), the Defense Advanced Research Projects Agency (DARPA) and the National Aeronautics and Space Administration (NASA) were each supporting basic research in computing and communications and began to see the value of digital content organized into “digital libraries” as a broad, newly emerging research area of great potential.

The home program for DLI-1 and DLI-2 was the Special Projects Program in the Division of Information and Intelligent Systems at the National Science Foundation. Additional funding could be made available to existing and new projects to exploit opportunities via different award mechanisms. One of the most effective was the Small Grants for Exploratory Research (SGER) award mechanism, which allowed the Program Director to fund modestly sized projects (normally up to about $100K) without requiring peer review of the proposal. The SGER mechanism was used to great effect because the technology environment was so fluid and dynamic; many of the SGER awards were targeted at providing new tools and environments for users. In addition, several NSF-wide funding initiatives offered new sources of support. The Information Technology Research (ITR) initiative provided a considerable number of projects with funds to sustain their work, as well as funding for new DLI projects.

Throughout the DLI initiatives, Division managers Y.T. Chien and Michael Lesk, as well as their counterparts at the other funding agencies, provided leadership and light-handed oversight of the overall program, giving program managers considerable latitude to adjust and add to funded projects in response to changes in the larger technology, organizational and social environments in which the program operated.

During the same period, computing and communications technologies were undergoing exponential change, and a global information environment was rapidly developing, moving quickly across national, cultural and economic boundaries. Technology advances led to public demand for more digital content for use in many aspects of human activity. As the cost of digital components plummeted, the ability to create, manipulate, manage and share digital content became available to substantial segments of the world’s population, although less developed regions continued to be disadvantaged. Digital content conversion and creation proliferated and eventually outpaced available storage. Within a few years, universal access to information was becoming a reality but continued to pose major challenges related to practical usability.

The Digital Libraries Initiative intellectual agenda continued to be inspired and defined by a broad community of researchers and practitioners. The program encouraged domain researchers to broaden the boundaries of inquiry in many subject areas and to embrace the use of digital content, tools and infrastructure. In so doing they elucidated promising new directions for information technology research. The legacy of the DLI program is clearly evident today in the enduring scholarly resources that it initiated and in many other aspects of research practices, data management, education and social activities.

The initiative grew out of the agencies’ recognition of the increasing demand for network-based systems and services capable of providing diverse communities of users with coherent access to large, geographically distributed stores of many types of digital content. The technical viability for undertaking this effort was based on dramatic advances being made in computing and communications technologies, infrastructure and supporting services as well as an exceptionally high level of demand.

Informal working groups of agency managers were formed and met regularly to define programmatic goals and discuss alternative research agendas. NASA and DARPA were “mission” agencies with well-defined needs, and the initiative needed to fit their agendas as well as NSF’s traditional role as the nation’s primary source of support for academic basic research. Ideas brought forward by the agency program managers were the starting point for technical workshops organized by domain researchers, who then developed research agendas for program solicitations. Planning for the first Digital Libraries program consisted of a series of workshops held in 1992–1993, involving some of the most prominent researchers of the time. The workshops produced a series of reports compiled into a “Digital Libraries Source Book.”

At this time, NSF was taking the lead role in developing a National Research and Education Network (NREN) as part of the larger Federal High Performance Computing and Communications (HPCC) Program. The HPCC Program was put forward in a supplementary report to the President’s FY1992 budget as a framework for coordinating efforts in four general focus areas, and it requested additional funds for established programs in the eight participating Federal agencies. The program was the product of several years of planning and discussion within the Federal Coordinating Council for Science, Engineering and Technology (FCCSET). At the heart of the HPCC Program were teraflop-scale “Grand Challenge” computational science applications, viewed as the driving motivation for developing high performance parallel computing systems and high-bandwidth research networks.

The four focus areas of the HPCC Program were:

1) High Performance Computing Systems (supercomputers and scalable parallel systems);

2) Advanced Software Technology and Algorithms;

3) National Research and Education Network (NREN); and

4) Basic Research and Human Resources.

In 1994, the HPCC Program was expanded to include a fifth program component, Information Infrastructure Technology and Applications (IITA). IITA addressed research and development needed to develop an underlying technology base for the National Information Infrastructure (NII) and to address National Challenges. National Challenges were seen to be those applications that could have broad and beneficial impact on the Nation's economic competitiveness and contribute to the well-being of the citizenry. "Digital and electronic libraries” were identified as one of the National Challenges.

Although the Internet at the time of the Digital Libraries Initiative program announcement was of limited functionality and the bandwidth among sites varied dramatically, expectations were high. The number of nodes and the volume of traffic were growing at an unprecedented rate. NSFNet was the designated “backbone” of the NREN three-tiered design strategy.

The NSFNet story is one of singular success. It began in 1986 with a three-tiered design which in many ways still exists today: (1) the national backbone, (2) regional networks, and (3) campus networks. TCP/IP was adopted from the ARPANET design. The backbone quickly evolved from 56 kbit/s in 1986, to T1 (1.5 Mbit/s) in 1988, to T3 (45 Mbit/s) by the end of 1992. By then it connected over 6,000 networks with more than 1,000,000 host computers, reaching every U.S. state, every continent and 60 countries, although the bandwidth available to connected sites varied considerably. The final report for the NSFNet Program, “NSFNET: A Partnership for High-Speed Networking, Final Report, 1987–1995,” gives a comprehensive description of the program and its achievements.

The idea of internet-accessible digital collections had also gained credence due to the growing adoption of the World Wide Web (WWW) access framework introduced by Tim Berners-Lee of CERN in 1991, and a new WWW browser named Mosaic, developed by Marc Andreessen at the National Center for Supercomputing Applications (NCSA) in early 1993. Although NSFNet bandwidth and the number of connections were still limited, the potential was clear. Each digital libraries project was required to make its testbed accessible via the World Wide Web.

Program Announcement NSF93-141
The first program announcement, “Research in Digital Libraries (NSF93-141): A Joint Initiative of the National Science Foundation, (Defense) Advanced Research Projects Agency and the National Aeronautics And Space Administration,” was released in the fall of 1993 to select and fund up to six projects at up to $1.2M/yr for four years. The Digital Libraries Initiative was designed as a basic research initiative, broadly cast to advance the means to collect, store, organize and access information in digital form via the internet. It identified five broad areas of research:

•	capturing data of all forms (text, images, sound, speech, mixed and multimedia, etc.) and descriptive information about the data (metadata);

•	categorizing and organizing electronic information in a variety of formats;

•	creating advanced software and algorithms for browsing, searching, filtering;

•	abstracting, summarizing and combining large volumes of data, imagery, and all kinds of information; and

•	utilizing networked databases distributed around the nation and around the world.

The Internet was the main consideration driving the program, as stated in the program announcement introduction:

“Information sources accessed via the Internet are the ingredients of a digital library. Today, the network connects some information sources that… include reference volumes, books, journals, newspapers, phone directories, sound and voice recordings, images, video clips, scientific data (raw data streams from instruments and processed information), and private information services such as stock market reports and private newsletters. These information sources, when connected electronically through a network, represent important components of an emerging, universally accessible, digital library.”

Funded DLI-1 Projects
By the submission deadline in February 1994, 73 proposals had been received. Following a three-stage review process (including site visits to finalists), six university-led consortium projects were selected for funding in the fall of 1994. The six funded projects and their research foci were:

1. Carnegie Mellon University – Informedia: Digital Video Libraries. Research areas: integrated speech, image and natural language for digital video libraries

2. University of Michigan – Intelligent Agent Architectures. Research areas: software agents; resource federation; artificial service market economies; educational impact

3. Stanford University – Uniform Access to Distributed Internet-based Resources. Research areas: interoperability; protocols and standards; distributed object architectures; interface design for distributed information retrieval

4. University of California, Santa Barbara – Geographic Information Systems Project. Research areas: spatially-indexed data; content-based retrieval; image compression; metadata

5. University of Illinois – Intelligent Search and the Net. Research areas: large-scale information retrieval across knowledge domains; semantic search; SGML; user/usage studies

6. University of California, Berkeley – Media Integration and Access. Research areas: new models of “documents”; natural language processing; content-based image retrieval; innovative interface design

Once funded, the DLI-1 projects became exceptionally successful in building partnerships with other organizations, resulting in exchanges of technology and resources, shared expertise, collaborative activities and valuable interpersonal interactions, all of which brought fresh perspectives to their work. The partner organizations represented diverse interests in digital libraries technologies and included major US computer and communications companies, academic institutions at all levels, libraries, publishers, government and state agencies, professional associations, and other organizations. Partners included Xerox, IBM, Hewlett-Packard, Microsoft, Digital Equipment, Sun Microsystems, Ricoh, Hitachi, Bell Atlantic, Intel, Illustra, Oracle Corporation, SoftQuad and OpenText, among others. Content providers included Elsevier, John Wiley, MIT Press, Open University, Telecom Italia, numerous professional societies, and federal and state agencies.

The DLI program and its six funded projects became a focus of intense interest from the broader computer and information science and engineering communities. The program was clearly techno-centric; almost all of the funding supported information and computing research. The concerns of libraries and memory institutions were also addressed, but in a more peripheral manner. The role of university libraries in the initiative was small but very important. This imbalance continued to be actively debated within the larger digital libraries community. However, given the research orientation of DLI-1 and its role in the larger Federal High Performance Computing and Communications Program (HPCC), a clear funding path to libraries and other memory institutions could not be created. This would be resolved to a degree in the Digital Libraries Initiative – Phase 2 (DLI-2).

Collaboration Between Projects
It was seen as critical to the success of the program to establish a high level of interaction among the projects and to build a program culture in which the projects could freely share their ideas and findings. It was determined that “All Projects” meetings would be held at least once a year and that topical research workshops would be funded to bring together researchers from each of the projects to foster collaboration. In addition, a planning workshop was convened in May 1995 to establish a core research agenda for the projects that would accommodate individual project goals while simultaneously addressing the larger and exceptionally challenging problems inherent in creating internet-based, linked digital libraries. Hector Garcia-Molina (Stanford) and Cliff Lynch (University of California) organized the meeting. The meeting identified lasting research priorities, many of which guide efforts today, and at the same time made a strong case for a more open research culture that was already taking root as the internet grew and gained functionality.

The report from the meeting recommended that priority be placed on three categories of activities: Infrastructure; Research Agendas and Priorities; and Scaling of Digital Library Experiments. Infrastructure referred to the development of common schemes for the naming of digital objects, and the linking of these schemes to protocols for object transmission, metadata, and object type classifications, as well as a deployed public key cryptosystem infrastructure, including key servers and appropriate standards. Research Agendas and Priorities referred to research in digital library interoperability, research in describing objects and repositories, and research in collection management and organization. Collection management and organization research is the area where traditional library missions and practices are reinterpreted for the digital library environment. Finally, Scaling of Digital Library Experiments referred to a common vision where tens of thousands of repositories of digital information are autonomously managed yet integrated into what users view as a coherent digital library system. The report urged that the development of digital library research move rapidly towards an infrastructure that can support and facilitate this common vision.

The definition of “digital libraries” offered for consideration by the report was: "An organized collection of multimedia data with information management methods that represent the data as information and knowledge.”

The phrase "digital libraries" had been widely adopted over "electronic libraries," "virtual libraries," and other terms. "Electronic" referred primarily to the nature of the digital device technologies, and "virtual" implied a synthetic environment meant to resemble an original, physical environment. "Digital" referred to the encoding of information on electronic (and other) media. Digital representation enabled enormous functionality for information corpora: once digitally encoded, information could be processed and communicated in a multitude of ways through computing and networking tools and infrastructure.

Results of Projects
The DLI-1 projects produced a constant stream of new technologies and tools. They gathered volumes of diverse digital content from many sources – all without incurring cost to the funding agencies. The project testbeds contained a large variety of data objects: text in many scripts and formats, documents, images, aerial photos, sensor data, GIS data, maps, tables, audio, video, multimedia, mixed-media.

Carnegie Mellon
The Carnegie Mellon project began to bring to video and film the same functionality and capability that existed with text, including critical aspects of search, retrieval, categorization and summarization. The team worked with WQED/Pittsburgh to enable users to access, explore and retrieve science and mathematics materials from video archives. The project made significant strides in speech, image and language understanding for video library creation, including news-on-demand processing.

University of California, Berkeley
The University of California, Berkeley project pursued an exceptionally broad agenda of research related to supporting user-oriented access to large distributed collections of diverse data types. Their operational testbed amounted to more than 5 terabytes of data, including the entire collection of documents made available to the project by the California Department of Water Resources – a corpus of about 100,000 pages of scanned text. The project demonstrated important progress in developing innovative technologies for document analysis, content-based image retrieval and processes for integrating users' requirements into the design process.

University of California, Santa Barbara
The University of California, Santa Barbara project produced new GIS capabilities and integrated these with their large map collection in the Library. Users could search diverse holdings of spatially-indexed information, including aerial and satellite imagery, maps, gazetteers, geo-referenced photos and a wealth of linked information such as the history and locations of ancient Indian burial sites and other archaeological sites.

University of Illinois
The University of Illinois project addressed data indexing, semantic federation, and the information retrieval and analysis capabilities of the web. This involved research into search, navigation, structuring, selection, analysis, linking and publishing. The testbed consisted of journal and magazine articles from engineering and science publications, including figures and equations, and supported processing, indexing, searching and displaying the SGML-based collection; the collection was later migrated to an XML markup scheme. Other research outcomes of the project included new web analysis environments based on concept spaces, category maps, text region visualization and path correlations, with the goal of creating a new semantic web analysis environment called the Interspace.

University of Michigan
The University of Michigan research agenda was aimed at automated resource gathering using intelligent agents, as well as creating and evaluating a modular, scalable and extensible architecture for digital libraries capable of supporting information access and brokering in large-scale, heterogeneous, dynamic organizations of hybrid (digital and print-on-paper) collections and services. The research was focused on and informed by testbed construction, deployment, and evaluation of multimedia earth and space science collections. The library would support research in these domains as well as project-centered learning in high school science classes based upon enhanced access to primary resources. One outcome of the project was the testing of early JSTOR content, a project funded by the Mellon Foundation that would later become part of the not-for-profit organization Ithaka.

Stanford University
The Stanford University project addressed interoperability at multiple levels. Interoperable, federated distributed systems were seen as a critical milestone in digital libraries research, and one that would only be reached incrementally. Scalability in both the number and type of distributed objects was a problem of huge dimensions, and how it would be achieved – either via intelligent software or by building more intelligence into digital objects (metadata and context) – was a question yet to be answered. The technical concepts and vocabularies associated with this work were almost opaque to the lay person, a veritable soup of inscrutable notions: language independence, modularity, automated systems supervision, heterogeneous dynamic sources, digital objects, wrappers, ontologies, protocols, programming threads, service fault tolerance, metadata and wave indexes. The Stanford project addressed many of the core issues underlying the interoperability of distributed repositories.

The Stanford Project and the Beginnings of Google
In 1997 an agency site visit team went to Stanford, and each of the 15 graduate students working on the Stanford Digital Library project presented their work. The site review team expressed approval of the work being done on the project. Among the graduate students were Larry Page and Sergey Brin, who presented a new link-based ranking algorithm, PageRank, that would enable more accurate search of the web. Their search engine, named “BackRub” at the time, was renamed Google a few months later, and in 1998 Google Inc. opened for business. Google proved to offer a significant improvement over other web page ranking technologies and made web search faster, easier and more accurate. Today Google is one of the country’s largest and most profitable corporations.
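The core idea behind PageRank can be illustrated with a minimal power-iteration sketch. The code below is illustrative only, not the original BackRub implementation; the link graph, function name and parameters are hypothetical, with the damping factor of 0.85 taken from the published description of the algorithm.

```python
# Illustrative power-iteration PageRank sketch (hypothetical example graph,
# not the original BackRub implementation).

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with a uniform score
    for _ in range(iterations):
        # each page keeps a baseline (1 - d)/n, plus shares from its in-links
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # dangling page: spread its score evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# A page is ranked highly when highly-ranked pages link to it.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this toy graph, page C ends up with the highest score because it is linked to by both A and B, which is the intuition that made link-based ranking more accurate than text matching alone.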

Founding of D-Lib Magazine
During the course of the Digital Libraries Program (DLI-1) communities of researchers, practitioners and users continually inspired and informed the program direction and management. The Principal Investigators themselves were instrumental not only in skillfully guiding their own projects, but in working with each other to build a single, coordinated and balanced program. The agency managers provided oversight and endeavored to identify and secure additional funding to exploit opportunities as they arose. Project reporting and review was structured to fully satisfy agency requirements and simultaneously, to create informative materials for the expanding DL community. At the same time care was taken to avoid creating unnecessary project overhead and distraction for the researchers engaged in their work.

It was determined that capturing the intellectual output of the individual projects and disseminating results was particularly important. D-Lib Magazine was funded by DARPA in 1995 and produced by the Corporation for National Research Initiatives (CNRI). Bill Arms and Amy Friedlander quickly raised D-Lib to a highly professional and authoritative publication, and it continues to be an invaluable journal for the DL community today. Arms, then a Vice President at CNRI, also served as an “ex-officio” DLI program manager and consultant to the program, providing invaluable advice and guidance. A DLI coordination web site was also established and funded as part of the University of Illinois project. Through the efforts of Susan Harum and Ben Gross, who created and managed the site and gathered relevant material, it grew to become an accurate record of DLI-1 activities, including event information, papers, research and planning reports, and presentations from the projects. The web site was exceptionally comprehensive and received tens of thousands of visits each month. Both resources were integral to community building and program success.

Efforts of the projects anticipated and helped to create the arcs of technology research and development that have led to many of the capabilities being utilized today across the information technologies lifecycle, from content creation, to information access and retrieval, to preservation and archiving. New means of collaboration and new models for scholarly communication, better suited to research embedded in a digital environment, were posited; one such model was presented in 1999 by Robert Wilensky, lead of the University of California, Berkeley project. The projects catalyzed innovative research processes, directly supported graduate and undergraduate students, and inspired them to think in new terms. By June 1997, DLI-1 researchers had produced more than 300 papers and publications.

International Collaboration in DLI-1
By 1997 it became clear that international collaboration was instrumental to the larger goals of the program, and also simply the right thing to do in a global information environment. Dan Atkins, leading the University of Michigan project, had already envisioned the network as an emerging distributed knowledge work environment, based in part on his experience with the NSF-funded Upper Atmospheric Research Collaboratory (UARC). He proposed forming joint USA-Europe working groups to explore the “vision space” for distributed digital libraries and services. Costantino Thanos took the lead for the European side. Supplementary funds were made available to establish five working groups, each led by one person from the USA and another from Europe. Researchers from other countries participated as well; Shigeo Sugimoto of the University of Tsukuba, Japan provided important perspectives in the metadata discussions. The groups explored and made recommendations in the following areas:

•	Global Resource Discovery

•	Interoperability

•	Metadata

•	Multilingual Information Access

•	Intellectual Property And Economics

The final report for the collaborative working groups was produced by Peter Schäuble and Alan Smeaton in 1998.

Impact of DLI-1
The DLI-1 projects and partner activities successfully demonstrated that large amounts of heterogeneous information could be gathered from many sources and made into coherent collections in testbed-scale repositories. They demonstrated that digital objects of many types could be searched, retrieved and manipulated to yield useful knowledge and increase understanding across broad topical domains. DLI-1 research illuminated the complexity and difficulty of fundamental issues of functionality, scalability, interoperability, reliability and usability.

During that time, computing and communications technologies for managing and accessing digital content were undergoing transformational change, offering increasing capabilities at decreasing costs. Within a few short years, the ability to create, manipulate, manage and share digital content had come within the financial means of small organizations and individuals. New applications and services were becoming available on a daily basis.

Digital content had become a new driver for internet growth and the appeal and interest in digital libraries continued to grow. “Digital libraries” had become a transformational metaphor for thinking about information, systems, people and interactions between them, encouraging new perspectives on large-scale, distributed information environments and inventive communication and learning practices. Technology discourse had become increasingly sophisticated and there was an exceptionally large community of researchers and practitioners with the goal of making globally distributed information-of-value intellectually accessible to large, diverse user populations desiring knowledge for many purposes.

But the ability to access and use digital content was not keeping pace with technology and content development. The rate of information creation far surpassed the rate of development of technologies to access and use the material effectively. As the amount of information on the internet continued to grow, the number of research issues grew with it. A major unanswered question was how to increase functionality over distributed collections of diverse content. There were (at least) two camps debating the issue: those in support of developing more intelligent software, and those supporting broad-based efforts to create more intelligible content beginning at the earliest stages of the information lifecycle. This was just one of many questions being asked. It was clear that there was much more to be done, and planning for a second phase of the Digital Libraries Initiative began in earnest.

Formative Stages
The early brainstorming sessions for a follow-on program to DLI-1 were held by a small group of agency program managers at a French bistro within walking distance of NSF and DARPA. Weekly meetings were held over several months, along with continuing informal e-mail dialogue with leading researchers in the DL community. Notes were compiled and offered as input for a planning meeting organized by Dan Atkins of the University of Michigan, with the goal of constructing an intellectual agenda for the new program.

The planning meeting was held in Santa Fe, New Mexico in March of 1997 and attended by more than 50 experts from the broader digital libraries community. Paul Duguid, of the University of California, Berkeley, drafted the initial versions of the final report from the copious materials created by the breakout groups. The workshop produced an idealistic and ambitious vision for digital libraries research and urged the agencies to adopt it, stating:

“Work on digital libraries aims to help with generating, sharing and using knowledge. It aims to improve practices of communities so they are more effective, efficient, productive and better able to maximize the benefits of collaboration. It seeks to extend the content and utility of digital libraries to aid existing communities and to facilitate the emergence of new communities of discourse, research, and learning. Communities in this case are defined on multiple dimensions: geography, common interests, values, needs, culture, language, goals, etc.”

and that,

"...the concept of a "digital library" is not merely equivalent to a digitized collection with information management tools. It is rather an environment to bring together collections, services, and people in support of the full life cycle of creation, dissemination, use, and preservation of data, information, and knowledge."

The workshop encouraged a new program to frame future research around three central areas:

•	system-centered issues, including scalability, interoperability, adaptability and durability;

•	collection-centered issues, including support for many types of data objects and representations; and

•	user-centered issues, including research motivated by the information needs of a diverse user community (i.e., human-centered).

DLI-2 would support research across the entire information lifecycle including content creation, access, use and usability, preservation and archiving. High priority would be placed on interoperability and technology integration, content and collections development and management, applications and operational infrastructure, and understanding digital libraries in domain-specific, economic, social, and international contexts.

Significantly, the program would go beyond computing and communications disciplinary communities and invite proposals from domain scholars, practitioners, and users from many fields including the arts and humanities. By doing so, DLI-2 adopted a viewpoint that significant advances in information technologies research could be made by exploring the perspectives, methods and applications of non-science domains given their rich variety of information types, methodologies and complex research questions. More specifically, the DLI-2 research agenda acknowledged an interdependency between technologies development for science and non-science domains not addressed programmatically before. This would prove to have serious consequences for the DLI budget, addressed later, as not all computer science departments were supportive of this new twist in program funding. (Griffin, 1998)

Sponsorship by additional agencies was solicited and gained. In addition to NSF, (where 11 separate programs contributed funding), DARPA and NASA, the National Library of Medicine, National Endowment for the Humanities and the Library of Congress pledged support in various forms. The NSF Division of Undergraduate Education was a major contributor, using DLI-2 to begin exploring the resources for the National Science Digital Library (NSDL). Still other agencies participated as partners, joining in planning and working group discussions and All-Project meetings. These agencies included the Institute of Museum and Library Services, the Smithsonian Institution and the National Archives and Records Administration. It was becoming evident that a keen knowledge of leading-edge digital resources would be a primary part of these agencies' programs and assets.

Program Announcement NSF 98-63
The DLI-2 solicitation (NSF 98-63) involved two separate rounds of submissions: a) a July 15, 1998 proposal deadline; and b) a May 17, 1999 proposal deadline. More than 300 proposals were received, with 34 projects initially funded. The total base award amount for these projects was approximately $48M over the period FY1998 - FY2003. As the program progressed, other program funds allowed for more than 20 additional awards and numerous supplemental funding actions.

The projects represented a full spectrum of activities: fundamental research, content creation and collections development, tools, services, domain-specific applications, new testbeds and highly functional operational environments. The projects addressed issues over the entire information lifecycle. Altogether, DLI-2 projects had participating researchers from 35 different university departments and more than 200 organizations in more than 30 different countries.

Funded DLI-2 Projects
Major awards in the form of cooperative agreements were made to four of the original six DLI-1 projects to expand on their accomplishments. These were:

•	Stanford University

•	Carnegie Mellon University

•	University of California, Santa Barbara

•	University of California, Berkeley

The three California universities put forward a plan for a closely coordinated, collaborative effort to build integrated technologies from the basic connectivity of the Internet and to use web technologies as the framework for collaborative creation, access and use of knowledge resources. They would be joined in this effort by the San Diego Supercomputer Center and by a major new academic digital library then under development, the California Digital Library, an effort of the nine campuses of the University of California system.

In addition, there were five new large projects, each focusing on distinctly different aspects of digital content creation, management and use. These were:

•	Columbia University: A Patient Care Digital Library: Personalized Search and Summarization over Multimedia Information

•	Tufts University: A Digital Library for the Humanities

•	Michigan State/Northwestern University: Founding a National Gallery of the Spoken Word

•	Cornell University: Security and Reliability in Component-based Digital Libraries

•	Indiana University: Creating the Digital Music Library

The DLI-2 management model was one of a modular, open program structure that would allow for new sponsors and increased budget flexibility to fund new projects at any time, and to build on and enhance existing projects. Program intellectual goals were to enlarge the topical scope, and place more emphasis on content issues as well as technology development and applications to keep pace with advances in the development and the use of distributed, networked information resources from around the nation and the world. In many ways, the program culture reflected the positive aspects of the open culture of the internet by introducing new technologies, new forms of content, and new data management practices and making them freely available to all. DLI-2 managers and researchers also recognized that the internet had transcended national, cultural and language boundaries and actively promoted international collaborative work.

A great strength of the program was the interdisciplinary richness of the projects, the high levels of interaction between them and extensive partnering with private sector corporations and other organizations. On the agencies side, program management proceeded in an exceptional atmosphere of enthusiasm, good will, and cooperation.

The titles imply an overwhelming diversity. Yet the unifying principles articulated in the Santa Fe report are clearly evident – each project addressed at least one of the core issues of using digital information to advance knowledge making and infrastructure development.

Edward Fox of Virginia Tech, in a Fall 1999 ASIS&T Bulletin article entitled “The Digital Libraries Initiative: Update and Discussion”, gave an insightful summary of the DLI-1 and DLI-2 programs and their research coverage. The article contained a table illustrating the disciplinary scope of the DLI-2 awards.

DLI-2 and New Types of Scholarship
The posture of DLI-2 differed from that of DLI-1 in a very important way. In DLI-1, digital content was a focus of research – how to structure, access and connect it in a distributed environment. This was still the case in many DLI-2 projects, but in others, digital content became an instrument for research. In this case, the faint outlines of an altogether new type of scholarship were beginning to become visible. That mode of scholarship is emerging rapidly today and is termed cyberscholarship or digital scholarship. By developing transformative methodologies based on computation and data, DLI-2 opened new topical areas for inquiry, altered long-standing structures of disciplines and offered new tools and resources for domain informatics.

The DLI-2 projects began immediately to produce a cornucopia of groundbreaking, creative, innovative and transformative accomplishments. The scope of the program was unprecedented. However, given the rate at which types of new digital content were becoming available and linked, and the uptake of information technologies by researchers across disciplines, the response was not altogether unexpected. A few of the accomplishments are noted in the following paragraphs, categorized loosely according to the Santa Fe Workshop groupings, although many of the larger projects addressed research in all three areas. Additional projects were funded as part of the NSF-wide Knowledge and Distributed Intelligence Program and the Information Technology Research Program.

Systems-centered Research
The Stanford/UC Berkeley/UC Santa Barbara/California Digital Library/San Diego Supercomputer Center collaboration made significant progress through a spectrum of activities related to systems, content and user issues. Stanford addressed information processes and protocol issues, library and information services, interface clients and other processing services, including interfaces for hand-held devices. Substantial effort was put into CORBA distributed object technology. University of California, Berkeley developed tools and technologies that supported highly improved models of information access and dissemination and mechanisms for new publication models for scholarly communication. University of California, Santa Barbara continued to develop geospatial digital library technologies and tools for applying them in learning environments, with research on automatic geo-referencing of text and on gazetteer and thesaurus content standards. The Alexandria Digital Library (ADL) architecture was based on extensible web services and handled generic queries. The project offered library services that managed collections of item-level metadata and supported item-level search using a standardized query language. Collection registries provided collection-level discovery services.
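The two-level pattern described above — a registry answering collection-level discovery queries, with item-level search delegated to each collection's metadata — can be illustrated with a small sketch. All data, field names and function names below are invented for illustration; this is not the actual ADL API or query language.

```python
# Toy sketch of two-level digital-library search, loosely modeled on the
# collection-registry pattern described above. Records and names are
# hypothetical, not taken from the Alexandria Digital Library itself.

# Collection-level metadata held by the registry.
registry = [
    {"id": "maps", "title": "Historical Map Scans", "themes": {"geography", "history"}},
    {"id": "photos", "title": "Aerial Photographs", "themes": {"geography", "imagery"}},
]

# Item-level metadata held by each collection.
collections = {
    "maps": [
        {"title": "Santa Barbara 1898", "place": "Santa Barbara", "year": 1898},
        {"title": "Goleta 1903", "place": "Goleta", "year": 1903},
    ],
    "photos": [
        {"title": "Coastline 1952", "place": "Santa Barbara", "year": 1952},
    ],
}

def discover_collections(theme):
    """Collection-level discovery: which collections cover a theme?"""
    return [c["id"] for c in registry if theme in c["themes"]]

def search_items(collection_ids, place):
    """Item-level search across the discovered collections."""
    return [item["title"]
            for cid in collection_ids
            for item in collections[cid]
            if item["place"] == place]

hits = search_items(discover_collections("geography"), "Santa Barbara")
```

The design point the sketch captures is the separation of concerns: a lightweight registry makes heterogeneous collections discoverable without requiring them to share an item-level schema.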

SGER and LOCKSS
In 1999, a Small Grant for Exploratory Research (SGER) award was made to Stanford to fund work on a new model for electronic journal publishing, HighWire Press, that also involved a collaborative, easy-to-use, distributed web cache to allow libraries to recover lost materials. Subsequently, a new software architecture called LOCKSS (Lots of Copies Keep Stuff Safe) was designed and built by David Rosenthal, a distinguished software engineer. It enabled a web library system in which storage and management were at once localized, decentralized, distributed, highly replicated, easy to use, inexpensive to operate and hardware independent. It is in widespread use today.
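The core idea — many independently held, replicated copies that can detect and repair damage by comparison against the peer majority — can be illustrated with a toy sketch. This shows only the replication principle; the real LOCKSS system uses a far more sophisticated peer-polling protocol.

```python
import hashlib

# Toy illustration of the "lots of copies" idea behind LOCKSS: each library
# keeps its own copy of an item, and a damaged copy is detected and repaired
# by comparing content hashes against the version most peers agree on.
# This is NOT the actual LOCKSS polling protocol, just the principle.

def digest(content):
    """Content hash used to compare copies without shipping full content."""
    return hashlib.sha256(content).hexdigest()

def repair(copies):
    """Replace any minority (damaged) copies with the majority version."""
    tally = {}
    for c in copies:
        tally[digest(c)] = tally.get(digest(c), 0) + 1
    majority_hash = max(tally, key=tally.get)
    good = next(c for c in copies if digest(c) == majority_hash)
    return [good for _ in copies]

# Three libraries hold the same article; one copy has suffered bit rot.
copies = [b"article v1", b"article v1", b"artXcle v1"]
repaired = repair(copies)
```

With enough independent replicas, the probability that a majority of them are damaged in the same way becomes vanishingly small, which is what makes the scheme inexpensive yet durable.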

Carnegie Mellon University, Informedia
The Carnegie Mellon Informedia project produced new metadata extractors and summarizers for video content.

Cornell University
The Cornell project, with additional funding from the international program (described later in this paper), workshop supplements and SGER awards, made significant progress on digital repository technologies by developing the Fedora open source repository software and, later, DuraSpace. The project also partnered with Los Alamos National Laboratory and the University of Southampton, UK to create a sustained framework for open access and archives by initiating the Open Archives Initiative.
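The interoperability framework that grew out of the Open Archives Initiative, the OAI-PMH protocol, lets a harvester pull metadata records from any compliant repository with simple HTTP requests such as `?verb=ListRecords&metadataPrefix=oai_dc`. The sketch below parses such a response; the sample XML is invented, but the verb, namespaces and record structure follow the OAI-PMH v2.0 specification.

```python
import xml.etree.ElementTree as ET

# Minimal sketch of parsing an OAI-PMH ListRecords response. The sample
# response below is invented for illustration; a real harvester would
# fetch it over HTTP from a repository's OAI-PMH endpoint.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext(".//oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        out.append((ident, title))
    return out

records = parse_records(SAMPLE)
```

Because every repository exposes the same handful of verbs and a common Dublin Core baseline, a single harvester like this can aggregate metadata from archives it has never seen before — the property that made OAI-PMH a foundation for open-access infrastructure.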

Columbia University
The Columbia project, Personalized Search and Summarization over Multimedia Information (PERSIVAL), worked within hospital, medical research and clinical environments. It developed prototype system components for diagnostic procedures and for search of heterogeneous, patient-specific, multimedia medical records, and produced new forms of document summarization, term definition and video summarization resources. It employed machine learning and meta-search techniques to build advanced capabilities in this realm, and used formal concept models to organize and present digital library content and to visualize concepts and relationships between digital content.

Indiana University
The Indiana University Variations 2 Digital Music Library was the first of its kind. It undertook research to provide users access to a multimedia collection of music in a variety of formats and musical styles; to develop multiple user applications (e.g., music library services, music education) on a single foundation of content and technology; to develop a software system integrating music in multiple media and formats: audio, video, score images and score notation; and to provide a basis for digital library research (e.g., usability, intellectual property, metadata). The Johns Hopkins Lester Levy Sheet Music digital workflow management project was much smaller in scale, but provided critical leadership in this area. Together with a third music-focused award, the joint NSF/JISC Online Music Recognition and Search project led by Donald Byrd and Tim Crawford, these projects launched a new subfield of computer science – music information analysis and retrieval. The result is a large, vibrant community of music information researchers extending well beyond computer scientists. The annual conference of ISMIR – the International Society for Music Information Retrieval – was first held in October 2000 and has garnered a world-wide following. Additional awards continued to be made from program discretionary funds to nurture this innovative line of interdisciplinary research. J. Stephen Downie at the University of Illinois received an award to establish MIREX, an internationally accessible music information retrieval development and testing environment that has proven to be a valuable community resource. This work continues to be supported by the Andrew W. Mellon Foundation.

University of Pennsylvania
At the University of Pennsylvania, Peter Buneman and others began one of the first projects to address data provenance in digital collections. This encompassed a complex set of issues because one was interested in data at all levels of granularity – from a single pixel in a digital image to a whole database. The project began to provide a substrate for recording and tracking provenance by advancing new data models, new query languages and new storage techniques. Professor Buneman later became a senior member of the Digital Curation Centre at the University of Edinburgh and helped in establishing the research agenda there. Digital Curation has since become a prominent research area with yearly international conferences and meetings.

New Forms of Digital Objects and Collections
Data visualization had been an essential component of computational science since the mid-1980s. Computer simulations to model phenomena at the extremes of physical reality were an important part of research in physics, astronomy, biology, chemistry and other scientific disciplines. As simulations grew more complex and vast amounts of experimental data became part of the research process, the term eScience came into use. Somewhat later, “big data” was used to characterize processing of massive data sets. In all of these modalities, data visualization is an essential tool for understanding vast quantities of data.

The DLI-2 projects demonstrated conclusively the value of a related, but different type of data visualization - digital representation and enhancement of physical objects. Given the rapid advances in digital processing and device technologies, it had become feasible both technically and economically to create extremely accurate digital surrogates of a wide variety of physical objects and places. Many of the projects focused on artifacts of significant cultural and historical value, building 2D and 3D representations in exacting detail in a non-intrusive fashion, while leaving the originals intact and unaltered. In many cases the digital “copy” demonstrated increased scholarly value, not only because of its singular attributes (increased resolution, legibility, elucidation of 3-D structure), but also because the possibilities for linkages to other related objects or analog materials were greatly increased. Creating high-resolution digital models and representations allowed shared, collaborative use and non-destructive analysis not possible otherwise. Students and scholars around the world could then collaborate on the analysis of objects and materials for which direct access would have been practically impossible. This work offered memory institutions around the world new means to expand and complete diverse collections of artifacts.

Imaging of Manuscripts and Written Materials
DLI-2 projects made great advances in making aging and deteriorating manuscripts and records available. Projects used advanced imaging to recover invisible, illegible and incomplete inscriptions from all types of media. In so doing, they made accessible to scholars a large portion of the human record contained in previously inaccessible or unreadable media. The work employed innovative 2D and 3D illumination and reflectance technologies, back lighting, laser scanning and other creative approaches. In many cases, the work had to be done in situ, given the fragility of the original. Once imaging was completed, 2D and 3D digital image processing algorithms, such as non-planar flattening algorithms, made it possible to enhance, recover and restore information, in some cases to a state closely resembling that at the time the artifact was created. DLI-2 researchers also developed optical character recognition (OCR) technologies for non-Roman alphabets and glyphs, including Arabic, Sanskrit and Chinese. Multispectral illumination and planar flattening algorithms revealed markings invisible to the naked eye. Fluorescent imaging could reveal overwritten texts in palimpsests. DLI-2 projects demonstrated that many forms of human expression -- writings, texts, sounds, images, sites, places, objects -- could be captured and represented in digital form, at a resolution to suit the purpose at hand, organized into coherent collections and made widely available at a remarkably modest cost.

University of California, Los Angeles
At the University of California, Los Angeles, Professor Robert Englund created the Cuneiform Digital Library Initiative. For more than ten years he worked with scholars at the Max Planck Institute for the History of Science and other organizations to build a comprehensive library of images of cuneiform tablets at exceptional resolution, with a wealth of contextual information – work carried out while Iraq, home to many of the principal cuneiform sites, was engaged in strife. His efforts have saved many of these invaluable records from being lost forever. The project continues to receive support from a number of federal agencies, the Andrew W. Mellon Foundation and the Max Planck Society.

University of Kentucky
The work of the visualization group at the University of Kentucky, led by Professor Brent Seales, has captured and restored in stunning detail a large number of manuscripts, some more than a millennium old, that were severely damaged or deteriorated. Using a variety of innovative techniques, and by building close working relationships with the documents' holders, the group has made digital copies of the manuscripts generally available for study to all.

Digital Sanskrit Library
Other researchers looked to writings in non-Roman scripts. The Digital Sanskrit Library, originally a collaborative project between researchers at SUNY at Buffalo and Brown University developed a comprehensive library of materials in Sanskrit. Sanskrit has been the principal culture-bearing language of India for more than three millennia. The manuscripts in Sanskrit number one hundred times those in Greek and Latin combined. Yet despite the abundance of Sanskrit literature, there has been a dearth of material available for instruction besides introductory textbooks.

Imaging of Three-Dimensional Objects and Sites
Similar advances were made in the capture and digital reconstruction of three-dimensional objects. With the introduction of high-resolution laser scanning apparatus and x-ray computed tomography, the external and internal structures of physical objects could be modeled and highly accurate digital facsimiles constructed. At the University of Texas, the Digital Morphology Library, under the direction of Professor Timothy Rowe, has assembled imagery of more than 750 specimens contributed by nearly 150 collaborating researchers. The site allows viewing of 3-D objects and offers many other services, including three-dimensional printing of artifacts. The images have been featured in a multitude of professional and public print and broadcast media productions.

Cultural heritage and the arts benefitted from DLI-2 research and technologies developments. The Digital Michelangelo Project at Stanford, led by Professor Marc Levoy, set a standard for excellence in capturing, rendering and analyzing statuary.

Projects such as the one at Brown University led by David Cooper applied scanning and mathematical techniques to identify related fragments and reconstruct individual broken vessels from mounds of pieces found at archaeological sites. Archaeological digital site information could be also combined with field notes, maps, older photographic images and other evidence to create a more complete composite of the site.

Computational photography, an outgrowth of computer graphics research and camera arrays, gave rise to an entirely new way of constructing 3-D models of sites. By combining millions of photos of a site (such as a famous cathedral) posted on public web sites, researchers developed algorithms for rendering 3-D structures in astonishing detail.

Other sites were modeled from recorded knowledge and then reconstructed in virtual spaces. At the University of Virginia, Professor Bernard Frischer produced an extensive, highly detailed, award-winning virtual model of Rome in the year 400 A.D.

Imaging techniques were also applied to works of art, not only to gain insight into the special genius of gifted artists, but also to identify forgeries. James Wang, Jia Li and others at the Pennsylvania State University studied a number of Van Gogh’s works along with famous forgeries and were able to quickly ascertain the difference using brushstroke and texturing analyses. Imaging “into” the various layers of a work gave art historians new tools for studying the works of the masters and sharing the results widely.

Other subject areas were transformed and enriched through geographic information systems (GIS) technology, which, together with rapidly growing corpora of spatially and temporally indexed data, allowed accurate rebuilding of the historic record of times and places. These developments, together with ongoing, large-scale digitization of analog collections of library and museum holdings and open access to on-line books, scholarly journals, and other knowledge resources, gave new opportunities for scholarship and research in many domains (particularly the humanities) by greatly expanding the evidence base. For historians and other humanities scholars whose raw research material is the factual record, massive amounts of new data and tools for discovering relationships among seemingly disparate events became available on a regular basis.

The DLI-2 program also sponsored workshops dedicated to specific goals related to the fine arts. It co-sponsored with the Andrew W. Mellon Foundation and Harvard University a workshop on Digital Imagery for Works of Art in November, 2001, co-chaired by Kevin Kiernan, University of Kentucky, Charles Rhyne, Reed College and Ron Spronk, Harvard University. The final report identified promising areas for collaboration and research. Another workshop, organized in collaboration with the Beazley Archive, University of Oxford, UK was held in 2006 to build a vision of next-generation classical art repositories. The goal was to achieve a common basis for cooperation between leading European cultural heritage projects in classical art and digital library specialists based in the USA. In so doing, it provided subject specialists, museum curators, and computer scientists with the ability to plan for a technical architecture that would support advanced visualization tools and techniques.

Spoken Word Digital Libraries
The Michigan State and Northwestern University project was the first to address access to very large corpora of spoken-word audio content. It explored difficult technical and intellectual problems in the delivery of high-quality voice materials online via the World Wide Web. While considerable efforts had been directed at search and retrieval of textual content, this project developed algorithms for searching the acoustic data directly along with associated text. A remarkable effort led by one of the Principal Investigators, Jerry Goldman, was funded as a separate project beginning in 2002, under the name of the OYEZ Project. Oyez makes available more than 9000 hours of Supreme Court oral argument audio – a complete record over a period of more than 50 years, synchronized to the sentence level. The Oyez archive continues to be cited by national and international groups as an outstanding achievement and exceptional contribution to a large multidisciplinary audience.

The same technologies had the secondary effect of increasing the user base of memory institutions. In many instances enhancement of primary analog resources provided critical contextual information to born-digital data. The work had a broad impact. By working together, computer and information technologists, archaeologists, cultural historians and curators of cultural heritage materials were able to add to knowledge of the human record and learn to combine cutting-edge information technologies with domain expertise in addressing complex tasks for the good of all.

As the DLI-2 projects continued their work, new interdisciplines emerged and became established: new forms of cultural heritage informatics, digital humanities, digital archaeology, music information analysis and retrieval, computational photography to name a few. Integration of new data types into domain research led to organizing new scholarly associations and new venues for presenting results.

Many of the DLI projects became showcase efforts and captured the interest of the national and world press. A few of these are noted in the image sidebar as taken from the DLI-2 Web Site Home Page.

Tools and Resources for New Users
Other projects looked to new ways to engage new groups of users. The University of Maryland project, Digital Libraries for Children, led by Allison Druin of the Human-Computer Interaction Lab (HCIL), examined how very young children access, use and explore digital resources and learning materials. Children became participants in the design group for the project’s digital library. By 2010, the project had evolved into the International Children’s Digital Library (ICDL) and won the American Library Association President’s Award for International Library Innovation. The ICDL was also recognized as one of the 25 Best Websites for Teaching and Learning by the American Association of School Librarians.

Yet others undertook research to build new tools for humanities scholars working with the printed page. This included “Edition Production Technologies” research for image-based electronic editions to help reconstruct folios from lost or damaged manuscripts and “Virtual Variorum Editions,” such as the Cervantes Project at Texas A&M University undertaken by Richard Furuta. This contained important editions in image and text as well as annotation of variances among different editions of Cervantes’ works.

The Electronic Cultural Atlas Initiative (ECAI) focused on gathering data and building tools for time-enabled GIS viewing of historical and cultural gazetteer data. An early objective was to enable historical, cultural and social data found in toponym-rich digital resources to be geo-referenced and visualized on a map interface. However, this project has continued to expand in both content richness and new tools and is currently exploring visualization of very large Chinese and Korean character data sets in immersive environments.

Silk Road Superimposed on NASA Image of Lights of Earth

Throughout the funding period of DLI-2, the initiative also contributed funds to other large scale efforts in order to support transitions or upgrades to capabilities. Among these were:

•	The Visible Human Project (National Institutes of Health)

•	The National Virtual Observatory (http://www.us-vo.org/)

•	The arXiv Open Access E-prints Repository (http://arxiv.org/)

•	The Universal Digital Library Million Books Project (http://www.ulib.org/)

•	Survivors of the Shoah Visual History Foundation (http://dornsife.usc.edu/vhi/aboutus/)

Since that time, these projects have grown and expanded.

As in the first phase of the Initiative, All Projects Meetings were held at least once each year prior to merging with the Joint Conference on Digital Libraries series. At these meetings, disciplinary affiliations and university and departmental badges were put aside, and participants could watch and listen to presentations with the deep attention and appreciation characteristic of devoted scholars. As their colleagues presented their latest work demonstrating the power and scope of digital libraries research at the cutting edge, there were times when the audience would break into spontaneous applause. A meeting that exemplified the zeitgeist of the DLI program was the joint DLI and Coalition for Networked Information (CNI) meeting hosted by JISC at Stratford-upon-Avon in June 2000. This followed the first DLI-JISC collaborative program described below. Over a three-day period, all of the DLI-2 projects and a considerable number of JISC-funded projects made presentations and demonstrated their achievements. It was a hugely successful gathering and was noted in a special editorial in D-Lib Magazine.

DLI-2 International Collaborations
As the decade of the 1990s progressed, the internet was making national boundaries transparent and the faint outlines of a global knowledge environment were becoming visible as the internet spread and more organizations began to create and convert information into digital form and place these into open access institutional repositories. The Digital Libraries Initiative placed an exceptionally high value on international collaboration due to the rich possibilities both for interdisciplinary collaboration and the means to coordinate complementary activities with other international funding bodies.

The benefits of international collaborative work were clear and compelling. Coordinated international efforts could steer the development trajectories of distributed repository architectures, content representation, access frameworks and delivery services. In so doing, the program would help to ensure that the future would bring an international information environment that would be far more capable than simply a scaled-up version of the current one in which data of many types was increasingly abundant, but also uneven in description, representation and organizational schemes. Also, the environment was in a constant state of change, with the result that users were finding it difficult to locate, retrieve and put digital content to productive use. Coordinated implementation of technology frameworks and content development practices was necessary to achieve high degrees of collective functionality.

The Joint Information Systems Committee (JISC) of the United Kingdom was a natural first international partner for the Digital Libraries Initiative. Its activities and management approaches complemented those of the Digital Libraries Initiative sponsors. JISC funded not only research, but academic information infrastructure as well, and placed significant emphasis on project management and evaluation. JISC Secretary, Malcolm Read, O.B.E. was a recognized leader in strategies for building a global information environment. Norman Wiseman, JISC lead for international collaborative work, was exceptionally capable in designing joint programs that could meet multiple criteria imposed by different funding bodies. JISC had many seasoned project managers experienced with state-of-the-art practices.

The collaboration with JISC afforded new opportunities. The DLI projects were funded primarily as basic research grants and testbeds based on a broad set of review criteria. Funding was not provided to build fully operational systems at scale. DLI investigators had considerable license to explore new issues as they arose and could alter workplans to accommodate changes. The projects had considerable autonomy and were encouraged to be entrepreneurial. Oversight and evaluation were provided, but formal reporting requirements of the projects were very basic.

JISC-funded projects were expected to bring results that were more immediately beneficial and widely applicable. Project plans were more formal and progress was closely monitored. The projects were also expected to have exit strategies that leveraged JISC investment and institutional commitments. Significant financial and intellectual 'buy-in' by the institutions was viewed as vital to achieving success.

JISC Competition 1
The JISC/DLI partnership produced two coordinated calls for proposals. The first, in 1999, resulted in six projects, each receiving about $1 million over a three-year term. An article describing this joint program appeared in the June 1999 issue of D-Lib Magazine. Several of the projects received considerable publicity and had a lasting impact.

One project mentioned earlier was the Cornell/Southampton/Los Alamos “Integrating and Navigating ePrint Archives through Citation-Linking” project that was instrumental in launching the Open Archives Initiative and its associated activities, as well as developing digital repositories management software.

The Online Music Recognition and Searching (OMRAS) project, a partnership between the University of Massachusetts and King’s College, made pioneering advances in music information retrieval, as noted above. It received extensive coverage in an article in The Economist and, as noted earlier, initiated the remarkably popular and successful annual meeting series, the ISMIR international symposiums.

Project CAMiLEON, a collaboration between the University of Michigan and the University of Leeds, explored emulation options for digital preservation. Its work on recovering information from the BBC's Domesday Project was widely praised.

JISC Competition 2
The enthusiastic response to and success of the first collaborative program led to a second, entitled “Digital Libraries In The Classroom: Testbeds For Transforming Teaching And Learning". The program’s broad objective was to explore new ways to bring about significant improvement in learning and teaching at the undergraduate level using state-of-the-art digital and internet-based services, digital content of all forms and innovative approaches within particular topic domains. This call resulted in four projects selected for funding, each receiving about $3.0M over a five-year term.

Deutsche Forschungsgemeinschaft (DFG)
The success of the collaboration with JISC prompted additional successful collaborations with the Deutsche Forschungsgemeinschaft (DFG) of Germany (six joint projects, Ewald Brahms and Sigrun Eckelmann Program Leads) and the Cultural Heritage Applications Unit of the European Commission (Bernard Smith, Head of Unit, Patricia Manson, Program Official) as well as individual organizations from several Asian and African countries. For each joint proposal, separate reviews were performed by each funding organization. The highest ranked proposals were then reviewed by a joint review panel. As a result, the awards were highly selective and meritorious. The scope of international projects was broad, highly interdisciplinary and engaged a large group of stakeholders. In addition to having research components, many of the projects also focused on issues of immediate interest to libraries, museums and archives.

Other International Collaborative Projects
The DLI international collaborative programs were immensely popular and resulted in highly leveraged projects with exceptionally broad participation and positive impact. A new Program Announcement, NSF02-085, offered support for international collaborations (although NSF could only provide funds to US partners). The response was overwhelming, yielding many exciting proposals. Unfortunately, the program was terminated after only one round of competition.

There were also a large number of technical workshops and international working groups that continued to define critical emerging research areas and suggest specific topics for cooperation. Prominent among these were the NSF/EU working groups, co-sponsored by the DELOS Network of Excellence funded under the EC Fifth Framework Programme and led by Costantino Thanos from CNR-ISTI in Pisa, Italy. Each working group was co-chaired by a prominent member of the European and the US digital libraries research community. The groups met regularly over three years and produced authoritative reports on a variety of technical, social, cultural and legal aspects of digital libraries. The groups are noted below. Among the recommendations emerging from these extensive efforts was a significant increase in federal program investment. There was also unanimous advice to correct shortcomings of the earlier DLI project models, the most ardently recommended correction being sustained, stable support for building large-scale operational systems allowing evaluation along numerous technical and social dimensions. An NSF-EC All Projects Meeting was held in Rome in March 2002.

Summaries of the reports were published in a special issue of the International Journal of Digital Libraries.

NSF - DELOS Network of Excellence Working Groups

•	Spoken-Word Digital Audio Collections: Co-leaders: Steve Renals (University of Sheffield) - Jerry Goldman (Northwestern University)

•	Digital Libraries Information Infrastructures: Co-leaders: Yannis Ioannidis (University of Athens) - David Maier (Oregon Health and Science University)

•	Personalization and Recommender Systems in Digital Libraries: Co-leaders: Alan Smeaton (Dublin City University) - Jamie Callan (Carnegie Mellon University)

•	ePhilology: Emerging Language Technologies and Rediscovery of the Past: Co-leaders: Susan Hockey (University College London) - Gregory Crane (Tufts University)

•	Digital Imaging for Significant Cultural and Historical Materials: Co-leaders: Alberto del Bimbo (University of Florence) - Ching-chih Chen (Simmons College)

•	Digital Archiving and Preservation: Co-leaders: Seamus Ross (University of Glasgow) - Margaret Hedstrom (University of Michigan)

•	Actors in Digital Libraries: Co-leaders: Jose Borbinha (National Library of Portugal) - John Kunze (University of California, San Francisco)

•	Evaluation of Digital Libraries: Testbeds, Measurements, and Metrics: Leaders: Ron Larsen (University of Pittsburgh) - Christine Borgman (University of California, Los Angeles) - Ingeborg T. Sølvberg (NTNU, Norway) - Laszlo Kovacs (Hungarian Academy of Sciences) - Norbert Fuhr (University of Duisburg)

Initial Planning For Digital Libraries Initiative – Phase 3 (DLI-3)
A third digital libraries program was planned in response to recommendations and guidance offered in insightful reports such as the President’s Information Technology Advisory Committee report “Digital Libraries: Universal Access to Human Knowledge”; the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure’s report “Revolutionizing Science and Engineering through Cyberinfrastructure”; and the report "Knowledge Lost in Information - Research Directions For Digital Libraries" from the Chatham Workshop on Digital Libraries Futures, funded by the DLI, organized by Ron Larsen and Howard Wactlar, and held in June 2003. The Chatham report was sweeping and incisive, putting forward a bold new agenda for digital libraries research and infrastructure that soundly built on prior work while clearly identifying the new research needed to improve the semantic interoperability and functionality of distributed repositories containing diverse digital content. The report stated succinctly some of the most fundamental challenges arising from the “data deluge”:

“While major progress has been made in indexing, searching, streaming, analysis, summarization, and interpretation of multimedia data, the more that is accomplished exposes the more that remains to be done… Systems for information access, delivery, and presentation are in a continual state of catch-up as they scale to the ever-increasing generative capabilities of sensor networks and related information sources. Increasing demands are being placed on knowledge access, creation, use, and discovery across disciplines, and of content interpretation across linguistic, cultural, and geographic boundaries. The opportunities are unlimited, but they will remain only challenges unless a continued commitment by the National Science Foundation sustains and accelerates research into the most fundamental of our intellectual assets – information.”

NSF – Library of Congress Collaborative Program on Digital Archiving and Long-Term Preservation (DIGARCH)
In September 2004, a Digital Archiving and Long-Term Preservation (DIGARCH) Program Solicitation (NSF 04-592) was released. This was a jointly sponsored effort between the Division of Information and Intelligent Systems (IIS) and the Office of Strategic Initiatives (OSI) of the Library of Congress. Although modest in terms of investment ($2.3M), the scope was broad and the focus was of great importance. The projects funded would focus on: a) digital repository models; b) tools, technologies and processes; and c) organizational, economic and policy issues. A total of 25 projects were funded addressing these areas.

For years, many people believed that once converted into digital form, information would gain indefinite longevity and permanence. In fact, the opposite was becoming increasingly evident: digital media degraded faster than the analog media on which data had traditionally been recorded. In addition, the amount of digital information was estimated to nearly double every year. By the close of 2010, the size of the “Digital Universe” was estimated at over one million petabytes and had surpassed the available capacity to store it.

Several workshops had been convened to establish a research agenda for digital preservation and archiving and had identified long-term preservation of digital materials as a critical national issue. In addition, the joint NSF-EU working group coordinated with the DELOS Network of Excellence to produce a set of recommendations and justifications for investment in a report entitled “Invest to Save: Report and Recommendation of the NSF-DELOS Working Group on Digital Archiving and Preservation”.

The Library of Congress Office of Strategic Initiatives (OSI) had already initiated an exceptionally informed and far-reaching program for preservation of born-digital content, the National Digital Information Infrastructure and Preservation Program (NDIIPP). The program was remarkably effective, due in large measure to the leadership of Laura Campbell, Associate Librarian, and senior managers of OSI, and to the efforts of its staff in organizing a network of partners with stakeholders representing interests across a broad spectrum of organizations. The implementation strategy has been recognized as a model for excellence and productivity.

The Library of Congress continues to be a leader in developing processes and community building for the preservation and stewardship of born digital content through its National Digital Stewardship Alliance program.

Digital Libraries Research Program Ends
In the fall of 2003, planning for a new Digital Libraries Research Program had largely been completed, based on the findings and recommendations of the Chatham Workshop and in response to recommendations and guidance offered in reports including the PITAC report “Digital Libraries: Universal Access to Human Knowledge” and the January 2003 “Report of the National Science Foundation Blue Ribbon Advisory Panel on Cyberinfrastructure”, as well as to the large number of technical workshops and international working groups which continued to define new and dynamic areas of interdisciplinary research. The Chatham Workshop had called for a substantial investment over a five-year period in both research and infrastructure development. The NSF Budget Request to Congress also identified digital libraries research as an important area of investment.

However, implementation of the new digital libraries research program was abruptly terminated. With new management at NSF came reprogramming of budgets and restructuring of the research divisions. Investment in digital libraries research and infrastructure was determined to be of a lower priority. A limited number of digital libraries-related activities continued to be funded through the Information Technology Research (ITR) program and through discretionary funds in some programs. However, without a central, named, and separately budgeted digital libraries research program, the digital libraries research community lost a major cohesive component and an obvious program to which to submit proposals.

The Digital Libraries Initiative proceeded through more than 10 years of funding projects and activities inspired and articulated by the broad community of researchers and practitioners the program was meant to serve. Projects funded through the DLI were instrumental in inspiring new areas of information technology research, creative project models and new methods for scholarly work. This period coincided with one in which unprecedented developments in computing and communications technologies created a globally distributed information environment that proved to continually gain value and functionality as digital content grew, became structured and linked and new access tools and applications were developed.

The period was also one in which altogether new social networks and organizations were formed as a consequence of new technological environments, and diverse, cross-cultural communities found common ground and cause to pursue knowledge-making and management of digital resources. The ability to create, manipulate, manage, and share digital content is now taken for granted and has become a natural part of human activity in nearly all aspects of daily life, from work to leisure. New and creative forms of expression can be found on the World Wide Web, which itself has become a focus of academic research.

Digital libraries research has now spread into many named subfields, some of which are identified in the programs of such larger conferences as the Joint Conference on Digital Libraries (JCDL) and the Theory and Practice of Digital Libraries (TPDL). But there are others that now constitute specific research areas with strong affinity communities and their own professional associations, with regular conferences and meetings, such as the International Society for Music Information Retrieval conferences, the Open Repositories Conference, the Digital Curation Conference, the Digital Humanities Conference and many others. The Digital Libraries Initiative helped to bridge the analog and digital eras by funding projects that allowed researchers, scholars and practitioners to address key issues during this period of rapid change, and in so doing helped ensure that the overall outcome would be positive. The projects demonstrated the ever-growing importance of digital information for progress in nearly all academic fields and showed that transformational scholarship in science and in the humanities is often closely linked to the role of data and computation in inquiry.

There is still much to be done. Questions of how best to deal with the ever-increasing scale and complexity of information to make it most useful continue to be debated. Massive data stores are proliferating and being linked at the collection level. The number and complexity of the data objects they contain will only increase, and information objects must serve as contextual information for others as well as primary objects in order to form larger integrated corpora. Each addition of a new information item to an internet repository not only grows the repository but potentially increases the meaning of, and adds value to, existing data objects in other connected repositories. The linked open data model shows promise for achieving this.

Open Access and new models of scholarly communication, a theme present in many of the DLI projects, will continue to dramatically alter scholarship and the reporting and dissemination of results. The combination of open access repositories and massive amounts of actively formatted digital content dramatically increases the possibilities for transformative research by creating new means for combination, association, and establishment of relationships in the holdings of distributed repositories. Open access to scholarly publications accelerates activities and discovery across stages of the research lifecycle, making the entire global enterprise more productive, and fair. Today, there are approximately 28,000 active peer-reviewed journals; about 8,000 open access journals are listed in the Directory of Open Access Journals.

In the internet world, digital information can become part of an interactive, interlinked global system of knowledge resources, computational systems, virtual laboratories, major instrumentation, and the like. Information objects of all types and scale become highly operational resources, offering scholars who use them immediate access to other networked collections, data sets, software programs and tools, instruments, chains of cited document sources, and so on. This presages many changes. New organizational and individual practices will emerge, and the accompanying issues are complex. New conceptualizations and vocabularies will need to be adopted to describe new circumstances and information environments having semantic capabilities. It is likely that new community-driven policies will need to be adopted as well. Perhaps the greatest challenges will be found in inspiring change in disciplinary and academic cultures and practices that emanate from the core of the academy. How traditional organizations like universities and libraries will adapt, and what new hybrid organizations will emerge, are still largely unanswered questions.