Algae DNA barcoding

DNA barcoding of algae is commonly used for species identification and phylogenetic studies. Because algae form a phylogenetically heterogeneous group, no single universal barcode marker is feasible for species delimitation; different markers are therefore applied in different algal groups.

Diatoms
Diatom DNA barcoding is a method for the taxonomic identification of diatoms, down to species level. DNA or RNA is extracted, specific, conserved regions of the diatom genome are amplified and sequenced, and the resulting sequences are assigned to taxa.

One of the main challenges in identifying diatoms is that they are often collected as a mixture of several species. DNA metabarcoding is the process of identifying the individual species in a mixed sample of environmental DNA (eDNA), which is DNA extracted directly from environmental material such as soil or water.

A newer application is diatom DNA metabarcoding for the ecological quality assessment of rivers and streams, which exploits the specific response of diatoms to particular ecological conditions. Because species identification by morphology is relatively difficult and requires considerable time and expertise, high-throughput sequencing (HTS) DNA metabarcoding is used to obtain taxonomic assignments for the complete sample, within the limits of the group-specific primers chosen for the preceding DNA amplification.

Several DNA markers have been developed so far, mainly targeting the 18S rRNA gene. Using the V4 hypervariable region of the ribosomal small subunit DNA (SSU rDNA), DNA-based identification was found to be more efficient than the classical morphology-based approach. Other conserved regions of the genome that are frequently used as marker genes are ribulose-1,5-bisphosphate carboxylase (rbcL), cytochrome oxidase I (cox1, COI), ITS and 28S. It has been shown repeatedly that the molecular data obtained by diatom eDNA metabarcoding reflect the morphology-based biotic diatom indices quite faithfully and therefore provide a similar assessment of ecosystem status. Diatoms are also routinely used to assess ecological quality in other freshwater ecosystems. Together with aquatic invertebrates, they are considered the best indicators of disturbance related to the physical, chemical or biological conditions of watercourses, and numerous studies use benthic diatoms for biomonitoring. Because no ideal diatom DNA barcode has been found, it has been proposed that different markers be used for different purposes: the highly variable cox1, ITS and 28S genes were considered more suitable for taxonomic studies, while the more conserved 18S and rbcL genes seem more appropriate for biomonitoring.

Advantages
Applying the DNA barcoding concept to diatoms has great potential to resolve the problem of inaccurate species identification and thus to facilitate biodiversity analyses of environmental samples.

Molecular methods based on next-generation sequencing (NGS) almost always lead to a higher number of identified taxa, whose presence can subsequently be verified by light microscopy. These results provide evidence that eDNA barcoding of diatoms is suitable for water quality assessment and could complement or improve traditional methods. Stoeck et al. also showed that eDNA barcoding provides more insight into the diversity of diatoms and other protist communities and could therefore be used for ecological projection of global diversity. Other studies showed different results: for example, inventories obtained with the molecular method were closer to those obtained with the morphology-based method when the focus was on abundant species.

DNA metabarcoding can also increase taxonomic resolution and comparability across geographic regions, which is often difficult to achieve using morphological characters alone. Moreover, DNA-based identification extends the range of potential bioindicators to inconspicuous taxonomic groups that could be highly sensitive or tolerant to particular stressors. Indirectly, molecular methods can also help fill gaps in the knowledge of species ecology, by increasing the number of samples processed while decreasing processing time (cost-effectiveness), and by increasing the accuracy and precision of correlations between environmental factors and the occurrence of species or molecular operational taxonomic units (MOTUs).

Challenges
There is currently no consensus on methods for DNA preservation and isolation, on the choice of DNA barcodes and PCR primers, or on the parameters for MOTU clustering and taxonomic assignment. Sampling and molecular steps need to be standardized through development studies. One of the major limitations is the availability of reference barcodes for diatom species. The reference database of bioindicator taxa is far from complete: despite the constant efforts of numerous national barcoding initiatives, many species still lack barcode information. Furthermore, most existing metabarcoding data are only locally available and geographically scattered, which hinders the development of globally useful tools. Visco et al. estimated that no more than 30% of European diatom species are currently represented in reference databases. For example, a number of species from Fennoscandian communities are missing (especially acidophilic diatoms, such as Eunotia incisa). It has also been shown that taxonomic identification with DNA barcoding is not accurate below species level, for example for discriminating varieties (reference missing).

Another well-known limitation of barcoding for taxonomic identification is the clustering step applied before taxonomic assignment, which often leads to a massive loss of genetic information. The only reliable way to assess the effects of different clustering and taxonomic assignment procedures would be to compare the species lists generated by different pipelines using the same reference database. This has yet to be done for the variety of pipelines used in the molecular assessment of diatom communities in Europe. Taxonomically validated databases that include accessible vouchers are also crucial for reliable taxon identification via NGS.
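
The loss of information caused by clustering can be illustrated with a minimal sketch. It assumes a simple greedy, seed-based clustering of pre-aligned reads, which is only one of many possible strategies and is not taken from any specific diatom pipeline; the reads and thresholds are invented for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads: list[str], threshold: float) -> list[list[str]]:
    """Put each read in the first cluster whose seed (first member) it matches
    at or above the identity threshold; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


# Hypothetical 20 bp reads, invented for illustration only.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the first read
    "ACGTACGTACTTACGTAGGA",  # 3 mismatches to the first read
    "TTTTACGTACGTGGGGACGT",  # 6 mismatches to the first read
]

# The number of MOTUs depends on the chosen threshold.
motu_counts = {t: len(greedy_cluster(reads, t)) for t in (0.97, 0.90, 0.80)}
```

With these invented reads, a stricter threshold keeps every variant as its own MOTU, while a looser one merges distinct sequences, which is the kind of pipeline-dependent difference described above.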

Additionally, primer bias is often found to be a major source of variation in barcoding: PCR primer efficiency can differ between diatom species, so that some primers preferentially amplify one taxon over another.
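
A rough way to see why small efficiency differences matter is the idealised exponential PCR model, in which the template count grows by a factor of (1 + efficiency) per cycle. The sketch below assumes that model, ignores plateau effects, and uses invented efficiencies and species names.

```python
def expected_read_share(
    templates: dict[str, float],
    efficiency: dict[str, float],
    cycles: int,
) -> dict[str, float]:
    """Share of amplicons per taxon after idealised exponential PCR:
    copies = starting_templates * (1 + efficiency) ** cycles."""
    amplified = {
        taxon: n0 * (1.0 + efficiency[taxon]) ** cycles
        for taxon, n0 in templates.items()
    }
    total = sum(amplified.values())
    return {taxon: copies / total for taxon, copies in amplified.items()}


# Hypothetical values: equal starting template numbers, slightly different
# per-cycle primer efficiencies (1.0 would mean perfect doubling).
templates = {"species_A": 1000.0, "species_B": 1000.0}
efficiency = {"species_A": 0.95, "species_B": 0.85}

shares = expected_read_share(templates, efficiency, cycles=30)
```

Under these assumptions, two taxa that start at equal abundance end up with clearly unequal amplicon shares after 30 cycles, with species_A accounting for more than 80% of the product.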

Inferring abundance from metabarcoding data is considered one of the most difficult issues in environmental applications. The number of sequences generated by HTS does not correspond directly to the number of specimens or to biomass, and different species can produce different numbers of reads (for example, with the rbcL marker, because of differences in chloroplast size). Vasselon et al. recently developed a biovolume correction factor for use with the rbcL marker. For example, Achnanthidium minutissimum has a small biovolume and will therefore generate fewer copies of the rbcL fragment (located in the chloroplast) than larger species. This correction factor, however, requires extensive calibration against each species' own biovolume and has so far been tested on only a few species. Fluctuations in gene copy number for other markers, such as 18S, do not seem to be species-specific, but this has not yet been tested.
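
The general idea of a per-taxon correction can be sketched as follows. This is not the published formula of Vasselon et al.; it simply assumes that each taxon's read count is divided by a taxon-specific factor and that proportions are then renormalised. The taxon names and factor values are hypothetical.

```python
def correct_read_proportions(
    read_counts: dict[str, float],
    correction_factor: dict[str, float],
) -> dict[str, float]:
    """Divide each taxon's reads by its correction factor, then renormalise
    so that the corrected proportions sum to 1."""
    corrected = {
        taxon: count / correction_factor[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: the large-celled taxon is assumed to yield four times
# as many marker copies per cell as the small-celled one.
read_counts = {"small_taxon": 200, "large_taxon": 800}
correction_factor = {"small_taxon": 1.0, "large_taxon": 4.0}

corrected = correct_read_proportions(read_counts, correction_factor)
```

In this invented case the raw reads suggest a 20:80 split, whereas the corrected proportions are equal, showing how strongly such a factor can change inferred relative abundances.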

Diatom target regions
Barcoding markers usually combine hypervariable regions of the genome (to allow distinction between species) with highly conserved regions (to ensure specificity to the target organism). Several DNA markers from the nuclear, mitochondrial and chloroplast genomes (rbcL, COI, ITS+5.8S, SSU, 18S...) have been designed and successfully used for diatom identification with NGS.
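
The combination of conserved and variable regions can be illustrated with a small in-silico PCR sketch: conserved primer sites are located in a template, and the variable stretch between them is returned as the barcode. The primers and templates below are invented and do not correspond to any real diatom marker; exact matching is assumed, whereas real tools tolerate mismatches.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_barcode(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region between the forward primer site and the binding site
    of the reverse primer, or None if either site is missing."""
    start = template.find(forward)
    if start == -1:
        return None
    inner_start = start + len(forward)
    end = template.find(reverse_complement(reverse), inner_start)
    if end == -1:
        return None
    return template[inner_start:end]


# Hypothetical primers and templates, invented for illustration only.
forward_primer = "GGATCCAA"
reverse_primer = "TTCGAACC"  # binds where the template reads GGTTCGAA

species_x = "CC" + "GGATCCAA" + "ACGTAC" + "GGTTCGAA" + "TT"
species_y = "CC" + "GGATCCAA" + "ACTTAC" + "GGTTCGAA" + "TT"

barcode_x = extract_barcode(species_x, forward_primer, reverse_primer)
barcode_y = extract_barcode(species_y, forward_primer, reverse_primer)
```

Both templates share the same conserved primer sites, so the same primer pair recovers a fragment from each, while the fragments themselves differ and can distinguish the two.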

18S and V4 region
The 18S gene region has been widely used as a marker in other protist groups, and Jahn et al. were the first to test it for diatom barcoding. Zimmermann et al. proposed a 390–410 bp long fragment of the 1800 bp long 18S rRNA gene locus as a barcode marker for the analysis of environmental samples with HTS, and discussed its use and limitations for diatom identification. This fragment includes the V4 region, which is the largest and most complex of the highly variable regions within the 18S locus. They highlighted that this hypervariable region of the 18S gene has great potential for studying protist diversity at large scale, but is of limited use for identification below species level or of cryptic species.

rbcL
The rbcL gene is used for taxonomic studies (Trobajo et al. 2009). Its benefits are that intragenomic variation is rare and that sequences are very easily aligned and compared. An open-access reference library, R-Syst::diatom, includes data for two barcodes (18S and rbcL) and is freely accessible through a website. Kermarrec et al. also successfully used the rbcL gene for the ecological assessment of diatoms.

Moniz and Kaczmarska investigated the amplification success of the SSU, COI and ITS2 markers and found that the 300–400 bp ITS-2 + 5.8S fragment provided the highest amplification success rate and good species resolution. This marker was subsequently used to separate morphologically defined species with a success rate of 99.5%. Despite this amplification success, Zimmermann et al. criticised the use of ITS-2 because of intra-individual heterogeneity. It has been suggested that the SSU or rbcL (Mann et al., 2010) markers are less heterogeneous between individuals and therefore more useful for distinguishing between species.

Genetic tool for biomonitoring and bioassessment
Diatoms are routinely used as part of the suite of biomonitoring tools required under the European Water Framework Directive. They are used as an indicator of ecosystem health in freshwaters because they are ubiquitous, are directly affected by changes in physico-chemical parameters, and show a better relationship with environmental variables than other taxa (e.g. invertebrates), giving a better overall picture of water quality.

In recent years, researchers have developed and standardised tools for the metabarcoding and sequencing of diatoms to complement traditional assessment by microscopy, opening up a new avenue of biomonitoring for aquatic systems. Applying next-generation sequencing of benthic diatoms to river biomonitoring has shown good potential. Many studies have shown that metabarcoding and high-throughput sequencing (HTS) can be used to estimate quality status and diversity in freshwaters. For the Environment Agency, Kelly et al. developed a DNA-based metabarcoding approach to assess diatom communities in rivers in the UK. Vasselon et al. compared morphological and HTS approaches for diatoms and found that HTS gave a reliable indication of quality status for most rivers in terms of the Specific Polluosensitivity Index (SPI). Vasselon et al. also applied DNA metabarcoding of diatom communities to the river monitoring network on the tropical island of Mayotte (French DOM-TOM).
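
Diatom indices of this kind are commonly computed as an abundance-weighted average of per-taxon sensitivity scores. The sketch below assumes that weighted-average form; the taxon names, sensitivity scores and indicator weights are hypothetical and are not taken from any published SPI table, and any rescaling to a reporting scale is left out.

```python
def weighted_average_index(
    abundance: dict[str, float],
    sensitivity: dict[str, float],
    indicator_value: dict[str, float],
) -> float:
    """Abundance-weighted mean sensitivity:
    sum(a * s * v) / sum(a * v) over all taxa in the sample."""
    numerator = sum(
        abundance[t] * sensitivity[t] * indicator_value[t] for t in abundance
    )
    denominator = sum(abundance[t] * indicator_value[t] for t in abundance)
    if denominator == 0:
        raise ValueError("sample contains no weighted abundance")
    return numerator / denominator


# Hypothetical sample: abundances can be valve counts (microscopy) or read
# counts (HTS); scores and weights are invented for illustration.
abundance = {"taxon_1": 600, "taxon_2": 300, "taxon_3": 100}
sensitivity = {"taxon_1": 5, "taxon_2": 3, "taxon_3": 1}
indicator_value = {"taxon_1": 1, "taxon_2": 2, "taxon_3": 3}

index = weighted_average_index(abundance, sensitivity, indicator_value)
```

Because the same function accepts either valve counts or read counts, it shows how an index computed from HTS data can be compared directly with its microscopy-based counterpart.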

Rimet et al. also explored the possibility of using HTS to assess diatom diversity and showed that diversity indices from HTS and microscopic analysis were well, although not perfectly, correlated.
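
A comparison of this kind can be sketched by computing a diversity index per site for each method and correlating the two series. The example assumes the Shannon index and Pearson correlation, which may differ from the indices and statistics used in the cited study; all counts are invented.

```python
import math


def shannon(counts: list[float]) -> float:
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-site taxon counts: valves counted by microscopy and reads
# obtained by HTS at the same three sites.
microscopy_counts = [[50, 50], [90, 10], [40, 30, 30]]
hts_counts = [[60, 40], [95, 5], [50, 25, 25]]

h_microscopy = [shannon(site) for site in microscopy_counts]
h_hts = [shannon(site) for site in hts_counts]
r = pearson(h_microscopy, h_hts)
```

In this invented data set the two methods rank the sites identically but give different absolute values, which corresponds to a strong yet imperfect correlation.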

DNA barcoding and metabarcoding can be used to establish molecular metrics and indices, which potentially provide conclusions broadly similar to those of the traditional approaches about the ecological and environmental status of aquatic ecosystems.

Forensics
Diatoms are used as a diagnostic tool for drowning in forensic practice. The diatom test is based on the principle that diatoms are inhaled with water into the lungs and are then distributed and deposited around the body. DNA methods can be used to confirm whether the cause of death was indeed drowning and to locate where the drowning occurred. Diatom DNA metabarcoding provides the opportunity to quickly analyse the diatom community present within a body, to locate the origin of drowning, and to investigate whether a body may have been moved from one place to another.
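
One way to compare the community recovered from a body with candidate water bodies is a community dissimilarity measure. The sketch below assumes Bray-Curtis dissimilarity on relative abundances, chosen here for illustration and not prescribed by the source; the site names, taxa and abundances are hypothetical.

```python
def bray_curtis(a: dict[str, float], b: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles
    (0 = identical composition, 1 = no shared taxa)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def closest_site(sample: dict[str, float], sites: dict[str, dict[str, float]]) -> str:
    """Name of the candidate site whose profile is least dissimilar to the sample."""
    return min(sites, key=lambda name: bray_curtis(sample, sites[name]))


# Hypothetical relative abundances from metabarcoding.
body_sample = {"taxon_1": 0.5, "taxon_2": 0.4, "taxon_3": 0.1}
candidate_sites = {
    "river_site": {"taxon_1": 0.6, "taxon_2": 0.3, "taxon_3": 0.1},
    "pond_site": {"taxon_3": 0.2, "taxon_4": 0.8},
}

best_match = closest_site(body_sample, candidate_sites)
```

A low dissimilarity to one site and a high dissimilarity to the place where the body was found would be consistent with the body having been moved, although any real casework would need validated reference data and statistics.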

Cryptic species and databasing
Diatom metabarcoding may help delimit cryptic species that are difficult to identify using microscopy and help complete reference databases by comparing morphological assemblages to metabarcoding data.  

Other Microalgae
Chlorophytes belong to an ancient and taxonomically very diverse lineage (Fang et al. 2014) that also includes the terrestrial plants. Even though more than 14 000 species have been described on the basis of structural and ultrastructural criteria (Hall et al. 2010), their morphological identification is often limited.

Several barcodes have been proposed for DNA-based identification of chlorophytes in order to bypass the problems of morphological identification. Although the cytochrome oxidase I (COI, COX) coding gene is a standard barcode for animals, it proved unsatisfactory for chlorophytes because the gene contains several introns in this algal group (Turmel et al. 2002). Nuclear marker genes that have been used for chlorophytes are SSU rDNA, LSU rDNA and rDNA ITS (Leliaert et al. 2014).

Macroalgae
Macroalgae (a morphological rather than taxonomic grouping) can be very challenging to identify because of their simple morphology, phenotypic plasticity and alternating life-cycle stages. Algal systematics and identification have therefore come to rely heavily on genetic and molecular tools such as DNA barcoding. The SSU rDNA gene is a commonly used barcode for phylogenetic studies of macroalgae. However, SSU rDNA is a highly conserved region and typically lacks the resolution needed for species identification.

Over the past two decades, certain standards for DNA barcoding aimed at species identification have been developed for each of the main groups of macroalgae. The cytochrome c oxidase subunit I (COI) gene is commonly used as a barcode for red and brown algae, while tufA (plastid elongation factor), rbcL (rubisco large subunit) and ITS (internal transcribed spacer) are commonly used for green algae. These barcodes are typically 600–700 bp long.

The barcodes typically differ between the three main groups of macroalgae (red, green and brown) because their evolutionary origins are very diverse. Macroalgae are a polyphyletic group, meaning that they do not all share a recent common ancestor, which makes it challenging to find a gene that is conserved across all of them yet variable enough for species identification.

Target regions