User:Ilkka Nousiainen (EMÜ)/sandbox

DNA barcoding requires a reference library, which in turn requires agreement on DNA marker regions. These marker regions are used either to compare unknown species against the reference library or to add known species to it.

Workflow of DNA barcoding (only points)

Identification & taxonomy - KATARINA

Sampling & preservation - JUDIT

Marker selection - MAŠA

DNA extraction, amplification (PCR) & sequencing - ILKKA

Reference libraries & bioinformatics – KALMAN

Methodology
DNA barcoding is a molecular method for identifying unknown organisms based on a DNA sequence in their genomes.

DNA barcoding consists of ... steps:

1. Sampling (and identification) of specimens

2. DNA extraction

3. DNA amplification with PCR

4. Analysis of PCR products

5. Sequencing of PCR products

6. Analysis of results
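The final step, analysis of results, usually comes down to comparing the obtained sequence with a reference library. The following is a minimal sketch of that idea, not a production method: it assumes the sequences are already aligned to the same length, the toy sequences and species names are invented, and the function names and the 3% distance threshold are illustrative assumptions only.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def identify(query: str, library: dict, max_distance: float = 0.03):
    """Return (species, distance) for the closest reference sequence.

    If even the closest reference is further away than max_distance
    (an assumed, adjustable threshold), the species is reported as None.
    """
    species, dist = min(
        ((name, p_distance(query, ref)) for name, ref in library.items()),
        key=lambda item: item[1],
    )
    return (species if dist <= max_distance else None), dist


# Hypothetical toy reference library (invented 40-base sequences).
library = {
    "Species A": "ACGTACGTAC" * 4,
    "Species B": "ACGTTCGAAC" * 4,
}
# Query that differs from the Species A reference at a single site.
query = "ACGTACGTAC" * 3 + "ACGTACGTAA"

print(identify(query, library))
```

Real identification engines work on unaligned reads and much larger libraries, so an alignment or similarity search step would replace the simple site-by-site comparison used here.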

DNA extraction

Issues: contamination, PCR inhibitors, DNA quality/purity/concentration

PCR amplification

·     choosing marker gene (requirements)

·     choosing/designing primers (requirements)

·     Verification of amplicons (PCR product) on gel

Adding INDEX/TAG, KEY, ADAPTORS (beginning and end)

Sequencing

Bioinformatics analysis

Reference library submission

DNA extraction
In order to have a DNA sequence that can be compared with a DNA barcode reference library, first a DNA sample of the organism is needed. Using DNA extraction methods, pure DNA molecules (either fragments of DNA or whole genomes) can be obtained from the organisms.

Barcoding Metazoans; it all began with insects
DNA barcoding of animals is based on a relatively simple concept. All eukaryote cells contain mitochondria, and animal mitochondrial DNA (mtDNA) has a relatively fast mutation rate, resulting in the generation of diversity within and between populations over relatively short evolutionary timescales (thousands of generations). Typically, in animals, a single mtDNA genome is transmitted to offspring by each breeding female, and the genetic effective population size is proportional to the number of breeding females. This contrasts with the nuclear genome, which is around 100 000 times larger, where males and females each contribute two full genomes to the gene pool and effective size is therefore proportional to twice the total population size. This reduction in effective population size leads to more rapid sorting of mtDNA gene lineages within and among populations through time, due to variance in fecundity among individuals (the principle of coalescence). The combined effect of higher mutation rates and more rapid sorting of variation usually results in divergence of mtDNA sequences among species and a comparatively small variance within species.
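The difference in effective population size can be made concrete with a simple count of transmitted genome copies. This is an idealized sketch that assumes strictly maternal mtDNA inheritance and an equal sex ratio; the symbols $N_f$ (breeding females), $N_m$ (breeding males), $N = N_f + N_m$ and $G$ (number of genome copies contributed to the gene pool) are introduced here for illustration only.

```latex
\[
G_{\mathrm{mt}} = N_f, \qquad G_{\mathrm{nuc}} = 2\,(N_f + N_m) = 2N
\]
\[
\frac{G_{\mathrm{mt}}}{G_{\mathrm{nuc}}}
  = \frac{N_f}{2\,(N_f + N_m)}
  = \frac{N/2}{2N}
  = \frac{1}{4}
  \qquad \text{when } N_f = N_m = N/2 .
\]
```

Under these assumptions the mtDNA gene pool is about one quarter the size of the nuclear one, which is why mtDNA lineages are expected to sort more quickly.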

In a follow-up to his initial 2003 paper, Hebert and a different set of co-authors tested COI differences in congeneric species pairs (2,238 species) from 11 phyla of animals plus the four dominant orders of insects (Coleoptera, Diptera, Lepidoptera and Hymenoptera) as well as "other insects". They concluded that species-level discrimination using the proposed COI gene region was satisfactory in all the groups studied except Cnidaria, which they ascribed to the exceptionally low rates of mitochondrial evolution in that group. Since then, barcoding has succeeded with field and museum specimens alike, as in the Zahiri et al. (2014) study of 1541 species of Canadian Noctuoidea (Lepidoptera). Genetic identification of aquatic insects, especially Ephemeroptera, Trichoptera, and Plecoptera, has been successful; it is useful for distinguishing subtle differences among immature forms of each family and as an aid in bioassessment. Barcoding of insects and other organisms has significant potential as a conservation, biodiversity, and broad environmental tool.

Exceptions, where mtDNA fails as a test of species identity, can occur through occasional recombination (direct evidence for recombination in mtDNA is available in some bivalves such as Mytilus, but it is suspected to be more widespread) and through hybridization. Male-killing microorganisms, cytoplasmic incompatibility-inducing symbionts (e.g., Wolbachia), and heteroplasmy may affect patterns of mtDNA diversity within species, although these do not necessarily result in barcoding failure. Occasional horizontal gene transfer (such as via cellular symbionts), or other "reticulate" evolutionary phenomena in a lineage, can lead to misleading results (i.e., it is possible for two different species to share mtDNA). mtDNA also seems to be particularly prone to interspecific introgression, probably because of differences between the sexes in mate choice and dispersal. Additionally, some species may carry divergent mtDNA lineages segregating within populations, often due to historical geographic structure, where these divergent lineages do not reflect species boundaries.

A 2017 study by Rach et al. on Odonates, specifically dragonflies (Anisoptera) and damselflies (Zygoptera), a basal group of insects, found that the "standard" (Folmer) region of the COI gene was sub-optimal for species resolution in that group, and that a different portion of the same gene, which they termed COIB, showed higher success in discriminating sister taxa at different taxonomic levels. These authors therefore suggested a layered barcode approach, i.e. adding further markers to enhance the discrimination potential in metabarcoding studies where the taxonomic composition of the samples may not be known in advance.

In Cnidaria, where the COI gene has been found to be unsuitable on account of its slow rate of evolution in that group, more success has been reported in octocorals using a combination of COI, a short adjacent intergenic region (igr1) and a fragment of the octocoral-specific mitochondrial protein-coding gene msh1, and in pelagic forms using the 16S mitochondrial ribosomal RNA gene. In sponges, the other major non-Bilaterian animal group, congeneric species are difficult to amplify or separate with the standard COI barcoding fragment, and data compilation and study are presently focussed on the ribosomal RNA 28S C-Region.

Barcoding flowering plants
The COI sequence is not appropriate for plants because the cytochrome c oxidase I gene evolves more slowly in higher plants than in animals. A series of experiments was conducted to find a more suitable region of the genome for use in the DNA barcoding of flowering plants (or the larger group of land plants). Proposed regions included the nuclear internal transcribed spacer region and the plastid trnH-psbA intergenic spacer; other researchers advocated other regions such as matK.

The combination of two chloroplast genes, rbcL and matK, has been proposed as a barcode for plants. Adding the nuclear internal transcribed spacer ITS2 region was proposed to provide better resolution between species. The chloroplast region ycf1 may be a more suitable gene.

Barcoding fungi
The current, officially approved barcoding locus for fungi is the ITS region, chosen from a group of six candidates (SSU, LSU, ITS, RPB1, RPB2, MCM7) as the most broadly applicable across major fungal lineages. However, the ITS region has been noted as not working well in some highly speciose genera such as Aspergillus, Cladosporium, Fusarium, Penicillium and Trichoderma, since these taxa have narrow or no barcode gaps in their ITS regions. It may therefore be necessary to sequence one or more single-copy protein-coding genes as a secondary barcode marker for certain fungal genera and/or lineages in order to obtain the most precise identifications at the species level. Stielow et al. (2015) also discuss the applicability of a number of potential secondary fungal DNA barcodes, including TEF1α, TOPI, PGK and LNS2, in particular groups.
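A barcode gap is commonly understood as the difference between the smallest distance between species and the largest distance within a species. The sketch below shows one simple way to compute it, assuming pre-aligned sequences of equal length and an uncorrected p-distance; the sequences and species labels are invented toy data, not real ITS sequences, and the function names are illustrative.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """Return (max intraspecific distance, min interspecific distance, gap).

    records is a list of (species_label, aligned_sequence) pairs.
    A gap that is zero or negative means the marker cannot cleanly
    separate the species in this data set.
    """
    intra, inter = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(seq1, seq2))
    if not intra or not inter:
        raise ValueError("need at least one within-species and one between-species pair")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Hypothetical toy alignment: two specimens each of two invented species.
records = [
    ("Species X", "ACGTACGTACGTACGTACGT"),
    ("Species X", "ACGTACGTACGTACGTACGA"),
    ("Species Y", "ACGTTCGTACGAACGTACGT"),
    ("Species Y", "ACGTTCGTACGAACGTTCGT"),
]

max_intra, min_inter, gap = barcode_gap(records)
print(f"max intraspecific: {max_intra:.2f}, min interspecific: {min_inter:.2f}, gap: {gap:.2f}")
```

For a marker with a narrow or absent gap, the third value approaches zero or becomes negative, which is the situation described above for ITS in some speciose fungal genera.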

Barcoding protists
The Protist Working Group (ProWG) of the Consortium for the Barcode of Life (CBOL) reported that for protists (a "convenience" group of mainly single-celled eukaryotes representing many diverse lineages presently characterized as a range of "supergroups") a two-stage strategy is recommended. The first stage is a preliminary identification using a universal eukaryotic barcode, called the pre-barcode, proposed to be the ∼500 base pair variable V4 region of 18S rDNA. The second stage uses a group-specific barcode yet to be fully defined, for which stated possibilities include 28S rDNA, ITS rDNA, 18S rDNA, COI, rbcL, SL RNA and perhaps more.

Vouchered specimens
DNA sequence databases like GenBank contain many sequences that are not tied to vouchered specimens (for example, herbarium specimens, cultured cell lines, or sometimes images). This is problematic in the face of taxonomic issues such as whether several species should be split or combined, or whether past identifications were sound. Therefore, best practice for DNA barcoding is to sequence vouchered specimens.

Software
Software for DNA barcoding requires integration of a field information management system (FIMS), a laboratory information management system (LIMS), sequence analysis tools, workflow tracking to connect field data and laboratory data, database submission tools and pipeline automation for scaling up to ecosystem-scale projects. Geneious Pro can be used for the sequence analysis components, and two plugins made freely available through the Moorea Biocode Project, the Biocode LIMS and GenBank Submission plugins, handle integration with the FIMS, the LIMS, workflow tracking and database submission.

The Barcode of Life Data Systems (BOLD) is a web based workbench and database supporting the acquisition, storage, analysis, and publication of DNA barcode records. By assembling molecular, morphological, and distributional data, it bridges a traditional bioinformatics chasm. BOLD is the most prominently used barcoding software and is freely available to any researcher with interests in DNA barcoding. By providing specialized services, it aids the assembly of records that meet the standards needed to gain BARCODE designation in the global sequence databases. Because of its web-based delivery and flexible data security model, it is also well positioned to support projects that involve broad research alliances.