Microbial DNA barcoding



Microbial DNA barcoding is the use of DNA metabarcoding to characterize a mixture of microorganisms. DNA metabarcoding is a method of DNA barcoding that uses universal genetic markers to identify DNA of a mixture of organisms.

History
Using metabarcoding to assess microbial communities has a long history. In 1972, Carl Woese, Mitchell Sogin and Stephen Sogin first tried to detect several families within bacteria using the 5S rRNA gene. Only a few years later, Woese and colleagues proposed a new tree of life with three domains; they were the first to use the small subunit ribosomal RNA (SSU rRNA) gene to distinguish between bacteria, archaea and eukaryotes. Following this work, the SSU rRNA gene became the most frequently used genetic marker for both prokaryotes (16S rRNA) and eukaryotes (18S rRNA). The tedious process of cloning these DNA fragments for sequencing was sped up by steady improvements in sequencing technologies. With the development of high-throughput sequencing (HTS) in the early 2000s, and the ability to handle the resulting massive data sets with modern bioinformatics and clustering algorithms, investigating microbial life became much easier.

Genetic markers
Genetic diversity varies from species to species. It is therefore possible to identify distinct species by recovering a short DNA sequence from a standard part of the genome. This short sequence is called the barcode sequence. To serve as a barcode, a part of the genome should vary strongly between different species but differ little between individuals of the same species, which makes it easier to tell species apart. For both bacteria and archaea, the 16S rRNA/rDNA gene is used. It is a common housekeeping gene in all prokaryotic organisms and is therefore used as a standard barcode to assess prokaryotic diversity. For protists, the corresponding 18S rRNA/rDNA gene is used. To distinguish different species of fungi, the ITS (internal transcribed spacer) region of the ribosomal cistron is used.
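The requirement that a barcode varies more between species than within a species can be illustrated with a small calculation. The following Python sketch is only an illustration under simplifying assumptions: the sequences are invented 10-bp toy strings that are already aligned, distance is measured as the plain proportion of differing positions, and the names p_distance and barcode_gap are hypothetical helpers rather than part of any established tool.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(seqs_by_species: dict[str, list[str]]) -> tuple[float, float]:
    """Return (largest within-species distance, smallest between-species distance)."""
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        # Sort each pairwise distance into the within- or between-species list.
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=0.0)


# Invented toy "barcodes"; real markers such as 16S are far longer.
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["TTGTACCTAG", "TTGTACCTAG"],
}
max_intra, min_inter = barcode_gap(toy)

# A usable barcode shows a gap: between-species distances exceed within-species ones.
has_gap = min_inter > max_intra
```

In this toy data set the largest within-species distance is smaller than the smallest between-species distance, so the two species can be separated on this marker. Real analyses work on properly aligned sequences and often use model-corrected distances.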

Advantages
The diversity of the microbial world has not yet been fully unraveled, although it is known to be composed mainly of bacteria, fungi and unicellular eukaryotes. Taxonomic identification of microbial eukaryotes requires considerable expertise and is often difficult because of the small size of the organisms, fragmented individuals, hidden diversity and cryptic species. Furthermore, prokaryotes simply cannot be taxonomically assigned using traditional methods such as microscopy, because they are too small and morphologically indistinguishable. With DNA metabarcoding, organisms can be identified without taxonomic expertise by matching short gene fragments obtained by high-throughput sequencing (HTS) to a reference sequence database, e.g. NCBI. These qualities make DNA barcoding a cost-effective, reliable and less time-consuming method than traditional ones, helping to meet the increasing need for large-scale environmental assessments.

Applications
Many studies have followed the first use by Woese et al. and now cover a variety of applications. Metabarcoding is used not only in biological and ecological research. Bacterial barcodes are also used in medicine and human biology, e.g. to investigate the microbiome and bacterial colonization of the human gut in lean and obese twins, or to compare the gut bacterial composition of newborns, children and adults. Additionally, barcoding plays a major role in biomonitoring, e.g. of rivers and streams, and in grassland restoration. Conservation parasitology, environmental parasitology and paleoparasitology also rely on barcoding as a useful tool in disease investigation and management.

Cyanobacteria
Cyanobacteria are a group of photosynthetic prokaryotes. As in other prokaryotes, the taxonomy of cyanobacteria based on DNA sequences relies mostly on similarity within the 16S ribosomal gene. Thus, the most common barcode used for identifying cyanobacteria is the 16S rDNA marker. While it is difficult to define species within prokaryotic organisms, the 16S marker can be used to delimit individual operational taxonomic units (OTUs). In some cases, these OTUs can also be linked to traditionally defined species and can therefore be considered a reliable representation of evolutionary relationships. However, when analyzing the taxonomic structure or biodiversity of a whole cyanobacterial community (see DNA metabarcoding), it is more informative to use markers specific to cyanobacteria. Universal 16S bacterial primers have been used successfully to isolate cyanobacterial rDNA from environmental samples, but they also recover many other bacterial sequences. Cyanobacteria-specific or phyto-specific 16S markers are therefore commonly used to focus on cyanobacteria only. A few sets of such primers have been tested for barcoding or metabarcoding of environmental samples and gave good results, screening out the majority of non-photosynthetic or non-cyanobacterial organisms.
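How 16S reads are grouped into OTUs can be sketched with a simple greedy clustering. The Python example below is a minimal sketch, not the procedure of any particular pipeline. It assumes reads that are already aligned and of equal length, uses plain positional identity, and treats the first read of each cluster as its centroid. The 97% default threshold is a widely used convention for 16S OTUs, not a value taken from this article, and the function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each read to the first OTU whose centroid it matches, else open a new OTU."""
    otus: list[list[str]] = []  # the first member of each OTU acts as its centroid
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            # No existing centroid was similar enough, so this read founds a new OTU.
            otus.append([read])
    return otus


# Invented 10-bp toy reads; the threshold is lowered to 0.9 because a single
# mismatch in 10 positions already gives 90% identity.
reads = ["ACGTACGTAC", "ACGTACGTAT", "TTGTACCTAG", "TTGTACCTAG", "ACGTACGTAC"]
otus = greedy_otus(reads, threshold=0.9)
otu_sizes = [len(otu) for otu in otus]
```

With these toy reads the procedure yields two OTUs. The result of greedy clustering depends on the order of the reads, which is one reason why real tools usually sort reads (for example by abundance) before clustering.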

The number of sequenced cyanobacterial genomes available in databases is increasing. Besides the 16S marker, phylogenetic studies can therefore also include more variable sequences, such as protein-coding genes (gyrB, rpoC, rpoD, rbcL, hetR, psbA, rnpB, nifH, nifD), the internal transcribed spacer of the ribosomal RNA genes (16S-23S rRNA-ITS) or the phycocyanin intergenic spacer (PC-IGS). However, nifD and nifH can only be used to identify nitrogen-fixing cyanobacterial strains.

DNA barcoding of cyanobacteria can be applied in various ecological, evolutionary and taxonomic studies. Examples include the assessment of cyanobacterial diversity and community structure, the identification of harmful cyanobacteria in ecologically and economically important waterbodies, and the assessment of cyanobacterial symbionts in marine invertebrates. It has the potential to become part of routine monitoring programs for the occurrence of cyanobacteria and for the early detection of potentially toxic species in waterbodies. This could help detect harmful species before they start to form blooms and thus improve water management strategies. Species identification based on environmental DNA could be particularly useful for cyanobacteria, because traditional identification by microscopy is challenging: the morphological characteristics on which species delimitation is based vary with growth conditions. Identification under the microscope is also time-consuming and therefore relatively costly. Molecular methods can detect much lower concentrations of cyanobacterial cells in a sample than traditional identification methods.

Reference databases
A reference database is a collection of DNA sequences, each assigned to either a species or a function. It can be used to link sequences obtained from an organism by molecular methods to the existing taxonomy. General databases such as the NCBI platform include all kinds of sequences, either whole genomes or specific marker genes, from all organisms. There are also platforms that store only sequences from a distinct group of organisms, e.g. the UNITE database exclusively for fungal sequences or the PR2 database solely for protist ribosomal sequences. Some databases are curated, which allows taxonomic assignment with higher accuracy than when uncurated databases are used as a reference.
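Linking a sequence to a reference database can be illustrated with a toy lookup. The Python sketch below is only an assumption-laden illustration: the "database" is a small dictionary with invented sequences and placeholder taxon labels, matching uses plain positional identity on aligned, equal-length sequences instead of the alignment-based searches that real tools perform, and the function name assign_taxon and the min_identity cut-off are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_taxon(query: str, reference: dict[str, str],
                 min_identity: float = 0.9) -> tuple[str, float]:
    """Return the best-matching taxon label and its identity, or 'unassigned'."""
    best_taxon, best_identity = max(
        ((taxon, identity(query, seq)) for taxon, seq in reference.items()),
        key=lambda hit: hit[1],
    )
    if best_identity < min_identity:
        # No reference is similar enough, so no name is forced onto the query.
        return "unassigned", best_identity
    return best_taxon, best_identity


# Placeholder labels and invented sequences standing in for a curated database.
toy_reference = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "TTGTACCTAG",
}

known_hit = assign_taxon("ACGTACGTAT", toy_reference)
unknown_hit = assign_taxon("GGGGGGGGGG", toy_reference)
```

The first query is assigned to taxon_A, while the second stays unassigned because no reference sequence is similar enough. The quality of such assignments depends directly on how complete and how well curated the reference collection is.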