Taxonomic database

A taxonomic database is a database created to hold information on biological taxa – for example groups of organisms organized by species name or other taxonomic identifier – for efficient data management and information retrieval. Taxonomic databases are routinely used for the automated construction of biological checklists such as floras and faunas, both for print publication and online; to underpin the operation of web-based species information systems; as a part of biological collection management (for example in museums and herbaria); as well as providing, in some cases, the taxon management component of broader science or biology information systems. They are also a fundamental contribution to the discipline of biodiversity informatics.

Goals
Taxonomic databases digitize scientific biodiversity data and provide access to taxonomic data for research. Taxonomic databases vary in breadth of the groups of taxa and geographical space they seek to include, for example: beetles in a defined region, mammals globally, or all described taxa in the tree of life. A taxonomic database may incorporate organism identifiers (scientific name, author, and – for zoological taxa – year of original publication), synonyms, taxonomic opinions, literature sources or citations, illustrations or photographs, and biological attributes for each taxon (such as geographic distribution, ecology, descriptive information, threatened or vulnerable status, etc.). Some databases, such as the Global Biodiversity Information Facility(GBIF) database and the Barcode of Life Data System, store the DNA barcode of a taxon if one exists (also called the Barcode Index Number (BIN) which may be assigned, for example, by the International Barcode of Life project (iBOL) or UNITE, a database for fungal DNA barcoding).

A taxonomic database aims to accurately model the characteristics of interest that are relevant to the organisms which are in scope for the intended coverage and usage of the system. For example, databases of fungi, algae, bryophytes and vascular plants ("higher plants") encode conventions from the International Code of Botanical Nomenclature while their counterparts for animals and most protists encode equivalent rules from the International Code of Zoological Nomenclature. Modelling the relevant taxonomic hierarchy for any taxon is a natural fit with the relational model employed in almost all database systems. Scientific consensus is not reached for all taxon groups, and new species continue to be described; therefore, another goal of taxonomic databases is to aid in resolving conflicts of scientific opinion and unify taxonomy.

History
Possibly the earliest documented management of taxonomic information in computerised form comprised the taxonomic coding system developed by Richard Swartz et al. at the Virginia Institute of Marine Science for the Biota of Chesapeake Bay and described in a published report in 1972. This work led directly or indirectly to other projects with greater profile including the NODC Taxonomic Code system which went through 8 versions before being discontinued in 1996, to be subsumed and transformed into the still current Integrated Taxonomic Information System (ITIS). A number of other taxonomic databases specializing in particular groups of organisms that appeared in the 1970s through to the present jointly contribute to the Species 2000 project, which since 2001 has been partnering with ITIS to produce a combined product, the Catalogue of Life. While the Catalogue of Life currently concentrates on assembling basic name information as a global species checklist, numerous other taxonomic database projects such as Fauna Europaea, the Australian Faunal Directory, and more supply rich ancillary information including descriptions, illustrations, maps, and more. Many taxonomic database projects are currently listed at the TDWG "Biodiversity Information Projects of the World" site.

Issues
The representation of taxonomic information in machine-encodable form raises a number of issues not encountered in other domains, such as variant ways to cite the same species or other taxon name, the same name used for multiple taxa (homonyms), multiple non-current names for the same taxon (synonyms), changes in name and taxon concept definition through time, and more. Non-standardized categories and metadata in taxonomic databases hampers the ability for researchers to analyze the data. One forum that has promoted discussion and possible solutions to these and related problems since 1985 is the Biodiversity Information Standards (TDWG), originally called the Taxonomic Database Working Group.

While online databases have great benefits (for example, increased access to taxonomic information), they also have issues such as data integrity risks due to on- and off-line versions and continuous updates, technical access issues due to server or internet outage, and differing capacities for complex queries to extract taxonomic data into lists. As the quantity of information in online taxonomic databases rapidly expands, data aggregation, and the integration and alignment of non-standardized data across databases, is a big challenge in taxonomy and biodiversity informatics.