Microbial phylogenetics

Microbial phylogenetics is the study of how groups of microorganisms are genetically related, which helps to trace their evolution. To study these relationships, biologists rely on comparative genomics, because comparative anatomy and physiology are not practical methods for microorganisms.

1960s–1970s
Microbial phylogenetics emerged as a field of study in the 1960s, when scientists started to build genealogical trees from differences in the order of amino acids in proteins and of nucleotides in genes, rather than from comparative anatomy and physiology.

One of the most important figures in the early stage of the field was Carl Woese, whose research focused on Bacteria and looked at RNA instead of proteins. More specifically, he compared oligonucleotides from the small subunit ribosomal RNA (16S rRNA). Matching oligonucleotides from different bacteria could be compared to determine how closely the organisms were related. In 1977, after collecting and comparing 16S rRNA fragments for almost 200 species of bacteria, Woese and his team concluded that Archaebacteria were not part of Bacteria but a completely independent group of organisms.
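The idea of scoring relatedness from matching oligonucleotides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the minimum oligonucleotide length of six, the Dice-style coefficient, and the example sequences are all hypothetical and are not taken from the original studies.

```python
def association_coefficient(catalog_a, catalog_b, min_length=6):
    """Dice-style similarity between two oligonucleotide catalogs.

    Counts the nucleotides in oligonucleotides shared by both catalogs,
    relative to the nucleotides in each catalog. The min_length cutoff
    is an assumption made for this sketch.
    """
    a = {oligo for oligo in catalog_a if len(oligo) >= min_length}
    b = {oligo for oligo in catalog_b if len(oligo) >= min_length}
    n_a = sum(len(oligo) for oligo in a)
    n_b = sum(len(oligo) for oligo in b)
    n_ab = sum(len(oligo) for oligo in a & b)
    if n_a + n_b == 0:
        raise ValueError("no oligonucleotides of sufficient length")
    return 2 * n_ab / (n_a + n_b)


# Made-up catalogs for two hypothetical organisms.
organism_1 = ["AAUCCG", "CUAAUG", "UUACCG"]
organism_2 = ["AAUCCG", "CUAAUG", "CCUAAG"]
similarity = association_coefficient(organism_1, organism_2)
```

A value of 1 means the two catalogs are identical and 0 means they share no oligonucleotides; intermediate values can be read as a rough measure of relatedness.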

1980s–1990s
In the 1980s, microbial phylogenetics entered its golden age as techniques for sequencing RNA and DNA improved greatly. For example, comparing the nucleotide sequences of whole genes was facilitated by the development of DNA cloning, which made it possible to create many copies of a sequence from minute samples. The invention of the polymerase chain reaction (PCR) had an enormous impact on microbial phylogenetics. These new techniques led to the formal proposal of the three domains of life: Bacteria, Archaea (Woese himself proposed this name to replace the older name Archaebacteria), and Eukarya, arguably one of the key steps in the history of taxonomy.

One of the intrinsic problems of studying microbial organisms was the dependence of such studies on pure cultures grown in a laboratory. Biologists tried to overcome this limitation by sequencing rRNA genes obtained from DNA isolated directly from the environment. This technique made it possible to appreciate that bacteria not only have the greatest diversity but also constitute the greatest biomass on Earth.

In the late 1990s, sequencing of genomes from various microbial organisms began, and by 2005, 260 complete genomes had been sequenced: 33 from eukaryotes, 206 from eubacteria, and 21 from archaea.

2000s
In the early 2000s, scientists started creating phylogenetic trees based not on rRNA but on other genes with different functions (for example, the gene for the enzyme RNA polymerase). The resulting genealogies differed greatly from those based on rRNA. These gene histories differed so much from one another that the only hypothesis able to explain the divergences was a major influence of horizontal gene transfer (HGT), a mechanism that permits a bacterium to acquire one or more genes from a completely unrelated organism. HGT explains why similarities and differences in some genes have to be studied carefully before they are used as a measure of genealogical relationship among microbial organisms.
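One way to make "genealogies that differ" concrete is to count the clades found in one gene tree but not in the other, a rooted, clade-based variant of the Robinson-Foulds distance. The sketch below assumes trees written as nested tuples of leaf names; the function names and the four-taxon example trees are hypothetical.

```python
def collect_clades(tree):
    """Return (leaves, clades) for a rooted tree given as nested tuples.

    A leaf is a string; an internal node is a tuple of subtrees.
    Each clade is the frozenset of leaf names below an internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    clades = set()
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)
    return leaves, clades


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    leaves_a, clades_a = collect_clades(tree_a)
    leaves_b, clades_b = collect_clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    return len(clades_a ^ clades_b)


# Hypothetical gene trees for the same four taxa.
rrna_tree = (("A", "B"), ("C", "D"))
polymerase_tree = (("A", "C"), ("B", "D"))
conflict = clade_distance(rrna_tree, polymerase_tree)
```

A distance of zero means the two genes support the same groupings. A large distance between trees built from different genes of the same organisms is the kind of incongruence that HGT was invoked to explain, although other causes of incongruence are not ruled out by this measure alone.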

Studies aimed at understanding how widespread HGT is suggested that the ease with which genes are transferred among bacteria makes it impossible to apply the biological species concept to them.

Phylogenetic representation
Since Darwin, phylogenies have been represented in the form of a tree. Nonetheless, because of the great role that HGT plays in microbes, some evolutionary microbiologists have suggested abandoning this classical view in favor of a representation of genealogies that more closely resembles a web, also known as a network. However, the network representation has its own issues, such as the inability to establish precisely the donor organism of an HGT event and the difficulty of determining the correct path across organisms when multiple HGT events have occurred. There is therefore still no consensus among biologists on which representation is a better fit for the microbial world.

Methods for Microbial Phylogenetic Analysis
Most microbial taxa have never been cultivated or experimentally characterized. Taxonomy and phylogeny are essential tools for organizing the diversity of life. Microbial phylogenies are commonly estimated by collecting gene sequences, aligning them based on homology, and then using models of mutation to infer evolutionary history. The small subunit (SSU) rRNA gene has revolutionized microbial classification since the 1970s and has become the most sequenced gene. Phylogenetic inferences depend on the genes chosen: for example, the 16S rRNA gene is commonly used for Bacteria and Archaea, and the 18S rRNA gene for microbial eukaryotes.
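As a small illustration of using a model of mutation on aligned sequences, the sketch below computes an evolutionary distance under the Jukes-Cantor model, one of the simplest substitution models. It is not tied to any particular study; the function names and the short example sequences are hypothetical, and real analyses typically use richer models and dedicated software.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Expected substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # sequences are saturated; the distance is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned 16S rRNA gene fragments.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
```

The correction makes the distance slightly larger than the raw fraction of differing sites, because some sites may have changed more than once. A matrix of such pairwise distances can then be passed to a tree-building method.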

Phylogenetic comparative methods
Phylogenetic comparative methods (PCMs) are commonly used to compare multiple traits across organisms. PCMs are not yet common in microbiome studies; however, recent studies have successfully used them to identify genes associated with colonization of the human gut. This was done by measuring the statistical association between whether a species harbors a gene and the probability that the species is present in the gut microbiome. These analyses showcase the combination of shotgun metagenomics with phylogenetically aware models.

Ancestral state reconstruction
Ancestral state reconstruction is commonly used to estimate the genetic and metabolic profiles of extant communities from a set of reference genomes. In microbiome studies it is commonly performed with PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States). PICRUSt is a computational approach that predicts the functional composition of a metagenome from marker gene data and a database of reference genomes. To predict which gene families are present, PICRUSt uses an extended ancestral-state reconstruction algorithm and then combines the gene families to estimate the composite metagenome.
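The general idea of ancestral state reconstruction can be illustrated with Fitch parsimony on a small binary tree. This is a stand-in chosen for brevity, not the algorithm that PICRUSt uses; the tree, the taxon names, and the presence/absence values for the gene family are all hypothetical.

```python
def fitch(node, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree of nested tuples.

    Returns (candidate_states, changes): the set of most-parsimonious
    states at this node and the minimum number of state changes in the
    subtree below it.
    """
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, leaf_states)
    right_states, right_changes = fitch(right, leaf_states)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one change is needed on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


# 1 = gene family present, 0 = absent (made-up reference genomes).
tree = ((("A", "B"), "C"), ("D", "E"))
gene_present = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
root_states, n_changes = fitch(tree, gene_present)
```

In this example the gene family is inferred to be absent at the root, with a single gain on the lineage leading to A and B. Inferred ancestral states of this kind are what allow a prediction to be made for an organism whose genome has not been sequenced but whose position in the tree is known.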

Analysis of phylogenetic variables and distances
Phylogenetic variables are variables constructed from features of the phylogeny to summarize and contrast data on the species in a phylogenetic tree. They can simplify microbiome datasets by reducing the dimensions of the data to a few variables that carry biological information. Recent methods such as PhILR and phylofactorization address the challenges of analyzing phylogenetic variables. The PhILR transform combines a model of microbial evolution with the isometric log-ratio transform to overcome the challenges of compositional data. Phylofactorization is a dimensionality-reducing tool used to identify edges in the phylogeny from which putative functional ecological traits may have arisen.
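The isometric log-ratio idea can be sketched as a single "balance" that contrasts the two clades descending from one internal node of the tree. The code below shows only the basic unweighted balance; the function names and the abundance values are hypothetical, and the published PhILR method includes further options that are not shown here.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ilr_balance(abundances, clade_plus, clade_minus):
    """Unweighted isometric log-ratio balance between two sister clades.

    abundances maps taxon name to a strictly positive abundance; zeros
    must be replaced (for example with a pseudocount) before calling.
    """
    plus = [abundances[taxon] for taxon in clade_plus]
    minus = [abundances[taxon] for taxon in clade_minus]
    r, s = len(plus), len(minus)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(plus) / geometric_mean(minus))


# Made-up relative abundances for four taxa in one sample.
sample = {"A": 0.4, "B": 0.4, "C": 0.1, "D": 0.1}
balance = ilr_balance(sample, ["A", "B"], ["C", "D"])
```

A positive balance means the first clade is more abundant than the second. Because the balance depends only on ratios, it does not change when all abundances in a sample are multiplied by the same constant, which is why this kind of variable suits compositional data.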

Challenges
Phylogenetic inference requires the assumption of common ancestry, or homology; when this assumption is violated, the signal can be obscured by noise. Because of horizontal gene transfer, microbial traits may be unrelated to phylogeny, so that the taxonomic composition reveals little about the function of a system.