Community fingerprinting

Community fingerprinting is a set of molecular biology techniques that can be used to quickly profile the diversity of a microbial community. Rather than directly identifying or counting individual cells in an environmental sample, these techniques show how many variants of a gene are present. In general, it is assumed that each different gene variant represents a different type of microbe. Community fingerprinting is used by microbiologists studying a variety of microbial systems (e.g. marine, freshwater, soil, and human microbial communities) to measure biodiversity or track changes in community structure over time. The method analyzes environmental samples by assaying genomic DNA. This approach offers an alternative to microbial culturing, which is important because most microbes cannot be cultured in the laboratory. Community fingerprinting does not identify individual microbe species; instead, it presents an overall picture of a microbial community. These methods are now largely being replaced by high-throughput sequencing, such as targeted microbiome analysis (e.g. 16S rRNA sequencing) and metagenomics.

Use
A fingerprinting analysis begins with an environmental sample (e.g. seawater or soil), from which total DNA is extracted. (Total DNA contains a mix of genetic material from all the microbes present in the sample.) A particular gene or DNA region is then selected as a target for analysis, under the assumption that each microbe species will have a different gene variant (also called a "phylotype"). Different methods (see below) can be used to visualize the phylotypes present in a sample. Because the aim of community fingerprinting is to gain an overall understanding of community structure, it is a particularly useful technique for analyzing time-series data collected from the field. For example, one could study the pattern of microbial succession in a habitat, or one could examine the response of a microbial community to an environmental perturbation, such as the release of a pollutant. Depending on what information is desired, different genes may be targeted. The most common are small subunit ribosomal RNA (rRNA) genes, such as 16S rRNA. These genes are frequently used in microbial phylogenetic analyses, so well-established techniques exist for their study. Other genes of interest might be those that are key in various metabolic processes.

Advantages and disadvantages
The advantages of community fingerprinting are that it can be performed quickly and relatively cheaply, and the analyses can accommodate a large number of samples simultaneously. These properties make community fingerprinting especially useful for monitoring changes in microbial communities over time. Also, fingerprinting techniques do not require one to have a priori sequence data for organisms in a sample. A disadvantage of community fingerprinting is that it results in largely qualitative, not quantitative data. When using qualitative data, it can be difficult to compare patterns observed in different studies or between different investigators. Also, community fingerprinting does not directly identify taxa in an environmental sample, though the data output from certain techniques (e.g. DGGE) can be analyzed further if one desires identification. Some authors point to poor reproducibility of results for certain fingerprinting methods, while other authors have criticized the inaccuracy of abundance estimates and the inability of some techniques to capture the presence of rare taxa. For example, it is difficult for the DGGE method to detect microbes that comprise less than 0.5%-1% of a bacterial community.

Techniques
This section presents three methods of community fingerprinting.

Terminal restriction fragment length polymorphism (T-RFLP)
Terminal restriction fragment length polymorphism (T-RFLP) is a method that uses fluorescently-labeled DNA fragments to produce a community fingerprint. This section presents a brief explanation of T-RFLP in the specific context of community fingerprinting. For a more detailed explanation, refer to the T-RFLP article.

Procedure
To perform T-RFLP (Figure 1), one must select a target gene (e.g. the 16S rRNA gene) to amplify by PCR. At least one primer used in the PCR is fluorescently labeled at the 5′ end. After PCR amplification, each copied DNA segment carries the fluorescent label. Next, restriction enzymes are used to cut the amplified DNA at specific recognition sites. The underlying assumption is that each microbe in the sample will have a different sequence on the target gene, so a restriction enzyme will cut each microbe's DNA in a different place. (Each different restriction site is considered to represent a single operational taxonomic unit [OTU].) Thus, the enzyme will produce one fragment length for each type of microbe present in the sample. The result of digestion is a set of restriction fragments of different lengths, each of which is fluorescently labeled at one end. These are known as "terminal fragments" because they are labeled at the end where the PCR primer attached. (The unlabeled fragments are not recorded in the final analysis.) Next, the fragments are separated by size through either gel or capillary electrophoresis. Laser detection captures the size and fluorescence-intensity patterns of the terminal fragments. DNA standards of known size and fluorescence are included in the analysis as references. Setting a minimum threshold for fluorescence excludes background noise.
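The digestion step can be illustrated with a minimal in-silico sketch. It assumes toy amplicons (invented for illustration, not real 16S rRNA sequences) read 5′ to 3′ from the labeled primer, and uses the enzyme HaeIII, which recognizes GGCC and cuts between GG and CC. The function name and its parameters are hypothetical; the sketch only shows why sequence variants yield terminal fragments of different lengths.

```python
def terminal_fragment_length(amplicon: str, site: str, cut_offset: int) -> int:
    """Return the length of the 5'-labeled terminal fragment after digestion.

    amplicon   -- amplified sequence, read 5'->3' starting at the labeled primer
    site       -- recognition sequence of the restriction enzyme
    cut_offset -- number of bases of the site that remain on the 5' fragment
    """
    pos = amplicon.upper().find(site.upper())
    if pos == -1:
        # No recognition site: the amplicon stays uncut, so the labeled
        # fragment is the full-length amplicon.
        return len(amplicon)
    # Only the first site from the labeled end matters for the terminal fragment.
    return pos + cut_offset


# Hypothetical toy amplicons, one per microbe type.
amplicons = {
    "microbe_A": "ATGCGGCCTTAGACCA",
    "microbe_B": "ATGCATTAGGCCACCA",
    "microbe_C": "ATGCATTAGACCATTA",  # carries no GGCC site
}

# HaeIII: GG^CC, so two bases of the site stay on the labeled fragment.
fingerprint = {
    name: terminal_fragment_length(seq, "GGCC", 2)
    for name, seq in amplicons.items()
}

for name, length in fingerprint.items():
    print(f"{name}: terminal fragment of {length} bases")
```

In this toy case the three variants give terminal fragments of 6, 10, and 16 bases, so each would appear as a separate peak. Two variants whose first restriction site sits at the same position would give the same length and could not be told apart.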

Data output and interpretation
The output of the laser detection step is an electropherogram that shows a series of peaks, with fragment length on the horizontal axis and fluorescence intensity on the vertical axis (Figure 1). A second output is a data table that lists the migration time of the fragments, the size in base pairs of each peak, and the height of and area under each peak.

The theoretical basis of T-RFLP assumes that peaks at different positions along the horizontal axis represent different types of organisms (or OTUs). The area under each fluorescence intensity peak is a proxy for relative abundance of each phylotype in the community. However, a number of caveats must be taken into account. Different types of organisms may share a restriction site in the gene of interest; if that is the case, these organisms would not be distinguished as different peaks on the electropherogram. Furthermore, area under a peak represents relative abundance rather than absolute abundance, and there are biases in abundance measurement and PCR amplification. For example, organisms that are scarce in the original total DNA sample will not be amplified enough to be detected in the final analysis. This leads to underestimation of community diversity. Liu et al. cite other possible factors that may distort results, including "differences in gene copy number between species and biases introduced during cell lysis, DNA extraction, and PCR amplification" (p. 4521). For those who seek detailed technical information, Marsh provides a catalog of potential biases that could be introduced in each step of the T-RFLP process.
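As a rough sketch of how peak areas are turned into relative abundances, the following example filters out peaks below a minimum height and normalizes the remaining areas so that they sum to 1. The peak table, the threshold value, and the function name are hypothetical; real analysis software may use different noise filters and normalization schemes.

```python
def relative_abundance(peaks, min_height):
    """Convert a T-RFLP peak table into relative abundances.

    peaks      -- dict mapping fragment size (bp) to a (height, area) tuple
    min_height -- peaks below this fluorescence height are treated as noise
    """
    kept = {
        size: area
        for size, (height, area) in peaks.items()
        if height >= min_height
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    # Each value is a share of the retained signal, not an absolute cell count.
    return {size: area / total for size, area in kept.items()}


# Hypothetical peak table: fragment size -> (peak height, peak area).
peaks = {
    72: (40, 100.0),     # below threshold, discarded as background noise
    150: (900, 3000.0),
    231: (300, 1000.0),
}

profile = relative_abundance(peaks, min_height=50)
print(profile)  # {150: 0.75, 231: 0.25}
```

The discarded 72 bp peak shows how a scarce organism can drop out of the profile entirely, which is one way community diversity comes to be underestimated.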

Advantages and disadvantages
The major advantages of T-RFLP are that it is fast and can easily accommodate many samples. Also, the visual output simplifies comparison of community structure patterns across different samples. A disadvantage is the broad, qualitative nature of the data output, which must be interpreted with the above caveats in mind. Also, direct identification of microbes in a sample is not possible through T-RFLP.

Applications
Disayathanoowat et al. used T-RFLP to assess the microbial gut community, or microbiome, in two species of honeybees in Thailand. They found that the two species harbor different microbial communities and that the microbiome changes over the lifetime of the bees.

Joo et al. tested T-RFLP as a method for phytoplankton monitoring. The authors collected environmental water samples from reservoirs in a time series. After comparison of samples with known terminal restriction fragments (from a database built from cultures), they concluded that T-RFLP can be used effectively as a technique for monitoring changes in the phytoplankton community. However, diversity and abundance estimates were found to be less accurate than those found through other methods.

Denaturing gradient gel electrophoresis (DGGE)

Procedure
Denaturing gradient gel electrophoresis (DGGE) is a microbial fingerprinting technique that separates amplicons of roughly the same size based on sequence properties (Figure 2). These properties dictate the threshold at which DNA denatures. The DGGE gel uses a gradient of DNA denaturant (a mixture of urea and formamide) or a linear temperature gradient. When a fragment reaches its melting point (the threshold of sufficient denaturant), it stops moving, because partially melted double-stranded DNA can no longer migrate through the gel. A GC clamp (about 40 bases with high GC content) is attached to one primer to hold the two strands of each PCR fragment together once they have partially denatured.
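A crude way to see why same-size fragments separate is to compare their GC content, since GC-rich DNA generally needs more denaturant to melt. The sketch below ranks toy sequences (invented for illustration) by GC fraction. This is only an assumption-laden proxy: actual DGGE migration depends on the melting behavior of specific domains within each fragment, not on overall GC content alone.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical amplicons of identical length but different sequence.
variants = {
    "variant_1": "ATATGCAT",
    "variant_2": "GCGCATGC",
    "variant_3": "ATGCATGC",
}

# Lower GC content is assumed here to melt at a lower denaturant
# concentration, i.e. to stop earlier in the gradient.
melting_order = sorted(variants, key=lambda name: gc_fraction(variants[name]))

for name in melting_order:
    print(f"{name}: GC fraction {gc_fraction(variants[name]):.2f}")
```

All three toy fragments are eight bases long and would co-migrate on an ordinary sizing gel, yet their GC fractions (0.25, 0.50, 0.75) differ, which is the kind of sequence difference a denaturing gradient exploits.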

Data output and interpretation
Each lane on a gel represents one microbial community. Bands shared among the samples are the same size and sit at roughly the same position on the gel. Gene variants that are not shared among microbial community samples do not match up horizontally with others. For example, if the gene of interest is 16S rRNA, as it was when the technique was first described, the PCR-amplified fragments would all migrate to the same vertical location if separated by size alone, because they are all roughly the same size. Another target gene may have greater variation in length, but the denaturant gradient adds a second criterion, melting point, to further distinguish between the fragments. The DGGE gel thus separates genes of the same size based on base sequence.

This technique shows to what extent microbial communities are the same or different in taxonomic composition. Each band in a different location on the gel represents a different phylotype (one unique sequence of a phylogenetic marker gene). For microbial communities this method profiles many individual 16S rRNA sequences. The number of bands at differing horizontal positions can be used to estimate the level of biodiversity in that sample and infer phylogenetic affiliation. In order to know more about phylogenetic affiliation, one could excise those bands from the gel and then sequence them.
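One simple way to put numbers on such a comparison is to score each lane by band presence or absence. The sketch below counts bands per lane as a richness estimate and computes the Jaccard similarity between two lanes. The lane names and band positions are hypothetical, and Jaccard similarity is just one of several indices that could be used.

```python
def jaccard(lane_a, lane_b) -> float:
    """Jaccard similarity of two lanes, each given as a set of band positions."""
    a, b = set(lane_a), set(lane_b)
    union = a | b
    if not union:
        # Two empty lanes are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical gels: each band position is an arbitrary index along the gradient.
lanes = {
    "site_1": {1, 2, 3, 5},
    "site_2": {2, 3, 4, 5},
}

# Number of distinct bands per lane as a rough estimate of phylotype richness.
richness = {name: len(bands) for name, bands in lanes.items()}
similarity = jaccard(lanes["site_1"], lanes["site_2"])

print(richness)    # {'site_1': 4, 'site_2': 4}
print(similarity)  # 0.6
```

Here both lanes show four bands, three of which are shared, so the two communities have equal estimated richness but only partly overlapping composition.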

Advantages and disadvantages
The denaturing gradient provides a way to separate DNA fragments of similar size. This is beneficial in assessing microbial diversity because the 16S rRNA gene does not vary much in size across bacterial phyla. The DGGE gel provides a quick way of looking at biodiversity in a microbial sample and does not preclude the option of sequencing the bands of interest. This method does not require that the microbes be cultured in the lab, nor does it require the prior sequence data needed to design probes for hybridization methods. The main disadvantage is that this is a qualitative assessment of biodiversity, and one must sequence the genes in order to make inferences about phylogenetic relatedness. Another disadvantage is that the GC clamp can vary each time it is synthesized. This leads to the potential for different DGGE profiles for the same 16S rRNA sequence.

Applications
Stephen et al. used DGGE for a rapid analysis of Proteobacteria in soil. They obtained an initial assessment of microbial diversity in their environmental samples from soil maintained for 36 years at various pH values. They combined DGGE and hybridization techniques by probing the DNA fragments to obtain more detail about the natural populations. In this study, they were looking at a group of closely related bacterial types, all autotrophic β-proteobacterial ammonia oxidizers. The 16S rDNA samples yielded ambiguous overlapping bands when run out on a gel. These bands were resolved with cluster-specific radiolabelled probes, which yielded information on the relative abundance of the different genotypes in the samples.

Ward et al. examined cyanobacterial mat communities in a Yellowstone hot spring by using DGGE analysis of 16S rRNA gene segments of aerobic chemoorganotrophic populations. DGGE allowed them to profile the community and gain new insight into its diversity. They characterized the bands of interest by purifying and sequencing them, and detected many lineages previously unknown to be present.

Automated ribosomal intergenic spacer analysis (ARISA)

Procedure
(Automated) ribosomal intergenic spacer analysis, or (A)RISA (Figure 3), takes advantage of the fact that prokaryotic DNA encodes two highly conserved genes, the 16S rRNA gene and the 23S rRNA gene. These encode the small and large subunit rRNAs in the rRNA operon. Between these two genes, there is an internal transcribed spacer (ITS) region. Because it does not code for protein, this region is highly variable in both nucleotide sequence and length. Once DNA is isolated from a community, PCR amplifies this spacer region. The fragments can be run out on a gel (RISA), or, when fluorescent primers are used, the abundance of the different fragment lengths can be read as peaks on an electropherogram (ARISA).

Data output and interpretation
Due to variable non-coding regions, the output for RISA is a gel with different banding patterns, and output for ARISA is an electropherogram with different peaks (similar to T-RFLP).

The brightness of the fluorescently labeled primers correlates with how prevalent that bacterial type is in the community. The banding pattern on the gel can be interpreted as a community-specific profile. Each DNA band or peak indicates at least one representative of that organism. In RISA, the bands on the gel that do not match up in length represent different organisms in the community because they have different spacer regions between the two highly conserved genes. The electropherogram shows peaks corresponding to the relative abundance of each spacer region in the sample.

Advantages and disadvantages
ARISA can have a higher resolution in detecting microbial diversity than T-RFLP. This fingerprinting method is a quick and sensitive way to estimate microbial diversity. The observed length heterogeneities can be compared to databases for overlap with culturable organisms. One can design phylum-level oligonucleotide primers to address questions regarding phylogenetic groups. One disadvantage of ARISA is that a single organism may contribute more than one peak to the community profile. Unrelated organisms can also have similar spacer lengths, which leads to underestimates of community diversity. Because of these biases, researchers often use this method on multiple samples from each community in order to get an average assessment.
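A minimal sketch of such averaging follows, assuming each replicate is a table of spacer fragment lengths and peak areas (the values below are invented). Each replicate is first normalized to relative abundances, and the profiles are then averaged per fragment length; a length missing from a replicate counts as zero. The function name is hypothetical, and real pipelines typically also bin neighboring fragment lengths before averaging.

```python
from collections import defaultdict


def average_profile(replicates):
    """Average relative-abundance profiles across replicate ARISA runs.

    replicates -- non-empty list of dicts mapping fragment length (bp)
                  to peak area, each with a positive total area
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    totals = defaultdict(float)
    for rep in replicates:
        rep_total = sum(rep.values())
        for length, area in rep.items():
            # Normalize within the replicate before pooling.
            totals[length] += area / rep_total
    n = len(replicates)
    return {length: share / n for length, share in sorted(totals.items())}


# Hypothetical replicate runs from the same community.
replicates = [
    {400: 50.0, 520: 50.0},
    {400: 30.0, 520: 10.0},
]

mean_profile = average_profile(replicates)
print(mean_profile)  # {400: 0.625, 520: 0.375}
```

Normalizing within each replicate first keeps a run with a stronger overall signal from dominating the average.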

Applications
Ranjard et al. discuss several examples of types of studies in which RISA can be used. They cite several studies that have used this technique to fingerprint bacterial communities following such disturbances as antibiotic treatment, mercury stress and deforestation. They also demonstrated the successful use of ARISA for characterizing fungal communities, which is an aspect of microbial ecology that remains to be fully explored.

Schloss et al. conducted a study examining environmental variables and linking them to changes in the microbial ecology of a compost pile. They used ARISA to profile the community structure and look at microbial succession over stages in the composting process. They took DNA samples and sequenced the 16S rRNA gene to identify community members at the different phases of the process. Then they used ARISA to look at community-wide changes and then matched up the 16S rRNA gene sequence with the most common ARISA fragments to identify these microbial community members.