Neotoma Paleoecology Database

The Neotoma Paleoecology Database (Neotoma) is an open international data resource that stores and shares multiple kinds of fossil, paleoecological, and paleoenvironmental data. Neotoma specializes in fossil data holdings at timescales covering the last several decades to the last several million years. Neotoma is organized and led by scientists and enhances data consistency through community curation by experts. Neotoma data are open to all and available to anyone with an internet connection.

Neotoma data are used by scientists and teachers (especially paleoecologists, biogeographers, and archaeologists) to study the responses of species and ecosystems to past environmental change and growing human activity. Paleoclimatologists use Neotoma data to help reconstruct past climates. Sample research questions addressed include: 1) How sensitive are ecosystems to past climate change? 2) Why were rates of tree range expansion so fast after the end of the last ice age, given that tree seed dispersal distances are usually so short (Reid's Paradox)? 3) Where and when did humans begin transforming ecosystems? 4) What were the causes and consequences of the widespread extinctions of large animals over the last 50,000 years? 5) Which ecosystems are characterized by abrupt change between alternate stable states, and what triggers these abrupt changes? 6) How have freshwater resources and aquatic ecosystems been affected by human land use and activity over the last several decades?

Data types and data volume
The species and taxa stored in Neotoma represent a breadth of terrestrial and aquatic organisms: plants (pollen and larger fossils), mammals and other vertebrates, insects and other invertebrates, diatoms, ostracodes, and testate amoebae. Neotoma also stores the age estimates provided by radiometric dating (e.g. radiocarbon, lead-210) and the age estimates derived from statistical models of age as a function of depth in a sediment column. The Neotoma data model is extensible to other types of paleoecological and paleoenvironmental variables.
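As an illustration of what an age model "as a function of depth" means, the sketch below estimates the age of an undated sample by linear interpolation between dated control points. This is only the simplest possible approach, chosen for clarity; it is not presented as the method used by Neotoma or by any particular dataset, and the function name and all depths and ages are hypothetical values invented for the example.

```python
from bisect import bisect_right


def interpolate_age(depth_cm, control_points):
    """Estimate the age at a depth by linear interpolation.

    control_points is a list of (depth_cm, age_years_bp) tuples with
    strictly increasing depths, e.g. from radiometric dates.
    """
    depths = [d for d, _ in control_points]
    ages = [a for _, a in control_points]

    # Refuse to extrapolate beyond the dated part of the sediment column.
    if not depths[0] <= depth_cm <= depths[-1]:
        raise ValueError("depth lies outside the dated interval")

    # Index of the control point at or directly above the requested depth.
    i = bisect_right(depths, depth_cm) - 1
    if i == len(depths) - 1:
        return float(ages[-1])

    d0, d1 = depths[i], depths[i + 1]
    a0, a1 = ages[i], ages[i + 1]
    return a0 + (a1 - a0) * (depth_cm - d0) / (d1 - d0)


# Hypothetical control points: (depth in cm, age in years before present).
controls = [(0.0, -50.0), (100.0, 2000.0), (250.0, 8000.0)]
print(interpolate_age(175.0, controls))
```

Published age models are typically more elaborate (for example, they propagate dating uncertainty), so this sketch should be read only as a picture of the depth-to-age mapping that such models provide.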

Data volume in Neotoma is growing rapidly, as are the data holdings in other paleontological and contemporary databases. As of May 2020, Neotoma held 7 million individual observations from over 38,700 datasets, 18,600 sites, 7,000 scientific papers, 6,000 authors, and 100 countries [1]. For comparison, on November 8, 2017, Neotoma held 3.8 million observations from 17,275 datasets and 9,269 sites.

History
The intellectual foundations of Neotoma trace back to efforts by early paleontologists and paleoecologists in the first half of the 20th century to assemble many individual records into larger mapped syntheses. As von Post wrote, paleoecologists must "think horizontally, work vertically," i.e. think across both time and space to understand the processes governing the ever-changing distribution of species, the associations among species, and the diversity of life.

These efforts accelerated in the 1970s and 1980s, when a number of scientific teams began assembling databases of fossil occurrences to study the distributions of species over space and time and the effects of past environmental variations on these distributions. These efforts were powered by advances in computing capabilities and the growing availability of radiocarbon and other radiometric dates to provide a common time framework for all fossil occurrences. Much of this work focused on environmental and ecological changes accompanying the glacial-interglacial cycles of the Quaternary. These databases were used both by paleoclimatologists to draw inferences about past climates that could be used to test the paleoclimatic simulations of earth system models, and by paleoecologists interested in how past community dynamics were driven by these environmental changes. For example, Margaret Davis demonstrated that tree species experienced large range shifts with the climate changes at the end of the last ice age and that species responded individualistically. As a result, many past communities were 'no analog,' i.e. their mixtures of species lack any close counterpart in modern communities. Some records and Constituent Databases in Neotoma extend deeper into the Cenozoic.

In parallel, other research teams were gathering fossil records from high-resolution sediment archives spanning the last few decades to centuries to study the effects of human activities upon communities and ecosystems. Examples include the effects of acid rain on ecosystems in the 1980s, or the eutrophication of many lake ecosystems due to increasing nutrient runoff into lakes and streams.

Many of these initial data-gathering efforts were led by individual pioneers (e.g. Margaret Davis, Tom Webb, Russ Graham, Bjorn Berglund, Jacques-Louis Beaulieu) or by small research teams. As these efforts have matured and the amount of data has grown, the volume and complexity of paleoecological data have exceeded the capacity of any single expert to manage or curate. At the same time, many smaller paleontological and paleoecological databases have been unable to keep up with current advances in informatics, or have gone offline as funding lapsed or lead investigators retired or moved on.

Hence, the fields of paleoecology and paleontology have developed data governance models based on community curation, in which data resources like Neotoma are managed by communities of scientists working together to curate and share their data. Neotoma follows a model of centralized informatics but distributed scientific governance, and is best viewed as a coalition of Constituent Databases that share a common set of database and software resources, while retaining separate rights to govern and curate the data in their Data Stewards' domains of expertise. For example, the European Pollen Database uses the Neotoma data model and software services, but is governed by its own board and community of expert data stewards.

Neotoma works closely with the Paleobiology Database, which has a similar intellectual history, but has focused on the entire history of life, at timescales of millions to hundreds of millions of years. Together, Neotoma and the Paleobiology Database have helped launch the EarthLife Consortium, a non-profit umbrella organization to support the easy and free sharing of paleoecological and paleobiological data.

Data curation and governance
Neotoma employs a model of distributed data curation and governance. In this model, Neotoma data are curated and governed by a community of Data Stewards, organized into Constituent Databases. These Constituent Databases can be organized by region, time, or taxonomic group. For example, FAUNMAP is a Constituent Database in Neotoma that manages Quaternary fossil vertebrate records in North America, while MioMap primarily emphasizes Miocene vertebrate records. For pollen data, Constituent Databases are organized geographically and include the European Pollen Database, the North American Pollen Database, and the Latin American Pollen Database. Other major Constituent Databases include the Testate Amoebae Database, the International Ostracode Database, and the Diatom Paleoecology Data Cooperative. All data in Neotoma are uploaded and curated by Data Stewards associated with one or more Constituent Databases. This model of distributed community curation is essential to ensuring data quality and consistency.

Neotoma is led by the Neotoma Leadership Council (NLC), which comprises 14 elected councilors, with 2 seats reserved for early career scientists (Bylaws). Elections are held annually, with roughly one-third of the NLC elected each cycle.

Neotoma is a recommended data facility for the Earth Sciences Division of the National Science Foundation, Past Global Changes, and the American Quaternary Association. Neotoma is a member of the ICSU World Data System and is registered with the COPDESS registry for scientific data sources adhering to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Neotoma has been supported by multiple sources, including the National Science Foundation and the Belmont Forum.

Data use and access
Use of data in Neotoma is governed by a Creative Commons NC-BY license, which permits unrestricted use as long as data sources are properly acknowledged and cited (Neotoma Data Use Policy). Proper full citation of data in Neotoma occurs at three levels: Neotoma itself, the governing Constituent Database(s), and the original authors.

Data can be retrieved from Neotoma in several ways. Neotoma Explorer is a map-based interface designed for quick-look searches and first-pass data exploration. It is well suited to researchers who want a rapid view of the data and to high school and college-level teachers and students. Teaching exercises using Neotoma Explorer have been prepared and are hosted by the Science Education Resource Center (SERC) at Carleton College. An R package (neotoma) supports exporting data from Neotoma into the R programming environment. Application Programming Interfaces (APIs) support access to Neotoma data by third-party software developers. Resources using Neotoma data include the Flyover Country app for travelers and the Global Pollen Project.
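For readers unfamiliar with programmatic access, the following minimal Python sketch shows the general pattern a third-party client might follow: build a query URL, then request and decode a JSON response. The base URL, the "sites" resource, and the "sitename" and "limit" parameters are assumptions made for illustration and should be checked against the current Neotoma API documentation; the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the Neotoma API; verify against the official docs.
BASE_URL = "https://api.neotomadb.org/v2.0/data"


def build_query_url(resource, **params):
    """Return a request URL for a resource with URL-encoded parameters."""
    # Sorting the parameters keeps the generated URL deterministic.
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical query: sites whose name matches "Devils Lake".
url = build_query_url("sites", sitename="Devils Lake", limit=5)
print(url)
# To run the query against the live service: result = fetch_json(url)
```

The URL construction is kept separate from the network call so that queries can be inspected or logged before any request is sent.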