Terminal restriction fragment length polymorphism

Terminal restriction fragment length polymorphism (TRFLP or sometimes T-RFLP) is a molecular biology technique for profiling microbial communities based on the position of the restriction site closest to a labelled end of an amplified gene. The method is based on digesting a mixture of PCR-amplified variants of a single gene with one or more restriction enzymes and detecting the size of each resulting terminal fragment using a DNA sequencer. The result is a graph in which the x-axis represents fragment size and the y-axis represents fluorescence intensity.

Background
TRFLP is one of several molecular methods that aim to generate a fingerprint of an unknown microbial community. Similar methods include DGGE, TGGE, ARISA, ARDRA and PLFA. These relatively high-throughput methods were developed to reduce the cost and effort of analyzing microbial communities using a clone library. The method was first described by Avaniss-Aghajani et al. in 1994 and later by Liu in 1997, who amplified the 16S rDNA target gene from the DNA of several isolated bacteria as well as from environmental samples. Since then the method has been applied to other marker genes, such as the functional marker gene pmoA, which is used to analyze methanotrophic communities.

Method
Like most other community analysis methods, TRFLP is based on PCR amplification of a target gene. In TRFLP, the amplification is performed with one or both primers labeled at the 5’ end with a fluorescent molecule. If both primers are labeled, two different fluorescent dyes are required. Several common fluorescent dyes can be used for tagging, such as 6-carboxyfluorescein (6-FAM), ROX, carboxytetramethylrhodamine (TAMRA, a rhodamine-based dye) and hexachlorofluorescein (HEX); the most widely used is 6-FAM. The mixture of amplicons is then subjected to a restriction digest, normally using a four-cutter restriction enzyme (one that recognizes a four-base-pair site). Following digestion, the mixture of fragments is separated by capillary or polyacrylamide gel electrophoresis in a DNA sequencer, and the sizes of the terminal fragments are determined by the fluorescence detector. Because the digested mixture of amplicons is analyzed in a sequencer, only the terminal fragments (i.e. the labeled end or ends of the amplicon) are read, while all other fragments go undetected. T-RFLP thus differs from ARDRA and RFLP, in which all restriction fragments are visualized. In addition to these steps, the TRFLP protocol often includes a cleanup of the PCR products prior to digestion and, if capillary electrophoresis is used, a desalting step before the sample is run.
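
The relationship between a labeled amplicon and its terminal fragment can be illustrated with a minimal in silico sketch. The enzyme table, the toy amplicon and the function names below are assumptions made for illustration and are not part of any published protocol; the sketch also assumes complete digestion and palindromic recognition sites, so that a reverse-labeled primer can be handled by scanning the reverse complement.

```python
# Sketch: predict terminal restriction fragment (T-RF) lengths in silico.
# The enzyme table and the toy amplicon are illustrative assumptions.

# Recognition site and cut offset (bases from the start of the site,
# read 5'->3' on the labeled strand). Both sites are palindromic.
ENZYMES = {
    "MspI": ("CCGG", 1),  # cuts C^CGG
    "AluI": ("AGCT", 2),  # cuts AG^CT
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_fragment_length(labeled_strand, enzyme):
    """Length of the fragment that keeps the 5' label after digestion.

    Only the restriction site closest to the labeled 5' end matters,
    because every fragment beyond it carries no dye.
    """
    site, cut_offset = ENZYMES[enzyme]
    first_site = labeled_strand.find(site)
    if first_site == -1:
        # No site: the uncut amplicon is detected at full length.
        return len(labeled_strand)
    return first_site + cut_offset


# Toy amplicon with two MspI sites and no AluI site.
amplicon = "AAAAA" + "CCGG" + "TTTTTTTT" + "CCGG" + "AAA"

forward_trf = terminal_fragment_length(amplicon, "MspI")
reverse_trf = terminal_fragment_length(reverse_complement(amplicon), "MspI")
print(forward_trf, reverse_trf)
```

In this toy case the forward and reverse labels report different sites of the same amplicon, which is why labeling both primers yields two distinct profiles.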

Data format and artifacts
The result of T-RFLP profiling is a graph called an electropherogram, which is an intensity plot of an electrophoresis experiment (gel or capillary). In an electropherogram the x-axis marks the sizes of the fragments while the y-axis marks the fluorescence intensity of each fragment. Thus, what appears as a band on an electrophoresis gel appears on the electropherogram as a peak whose integral is its total fluorescence. In a T-RFLP profile each peak is assumed to correspond to one genetic variant in the original sample, while its height or area is assumed to correspond to the relative abundance of that variant in the community. Neither assumption is always met. Often, several different bacteria in a population give a single peak on the electropherogram because they share a restriction site for the enzyme used at the same position. To overcome this problem and increase the resolving power of the technique, a single sample can be digested in parallel with several enzymes (often three), resulting in three T-RFLP profiles per sample, each resolving some variants while missing others. Another modification that is sometimes used is to fluorescently label the reverse primer as well, using a different dye, which results in two parallel profiles per sample, each of which may resolve different variants.

In addition to the convergence of distinct genetic variants into a single peak, artifacts may also appear, mainly in the form of false peaks. False peaks are generally of two types: background “noise” and “pseudo” TRFs. Background (noise) peaks result from the sensitivity of the detector in use. These peaks are often low in intensity and are usually a problem when the total intensity of the profile is low (i.e. at low DNA concentration). Because they result from background noise, these peaks are normally irreproducible in replicate profiles, so the problem can be tackled by producing a consensus profile from several replicates or by eliminating peaks below a certain threshold. Several other computational techniques have also been introduced to deal with this problem. Pseudo TRFs, on the other hand, are reproducible peaks whose intensity is linearly proportional to the amount of DNA loaded. These peaks are thought to result from single-stranded DNA annealing to itself and forming double-stranded restriction sites at random positions, which are then recognized by the restriction enzyme, producing a terminal fragment that does not represent any genuine genetic variant. It has been suggested that treating the sample with a single-strand-specific nuclease such as mung bean nuclease prior to the digestion stage might eliminate this artifact.
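
The two simple remedies for background noise mentioned above, a threshold and a consensus of replicates, can be sketched as follows. This is only an illustration: the 1% cutoff, the rounding of fragment sizes to whole base pairs, the rule that a peak must appear in every replicate, and all names in the code are assumptions rather than an established standard.

```python
# Sketch: build a consensus T-RFLP profile from replicate runs.
# A profile is a dict mapping fragment size (bp) to peak height.
# Threshold, binning rule and example values are illustrative assumptions.


def relative_profile(peaks, min_fraction=0.01):
    """Convert heights to fractions of the total signal and drop
    peaks that fall below min_fraction of that total."""
    total = sum(peaks.values())
    return {
        size: height / total
        for size, height in peaks.items()
        if height / total >= min_fraction
    }


def consensus_profile(replicates, min_fraction=0.01):
    """Keep only peaks present in every replicate and average them."""
    binned = []
    for peaks in replicates:
        rounded = {}
        for size, height in peaks.items():
            # Bin sizes to the nearest base pair so that small sizing
            # differences between runs do not split one peak in two.
            key = round(size)
            rounded[key] = rounded.get(key, 0.0) + height
        binned.append(relative_profile(rounded, min_fraction))

    shared = set(binned[0]).intersection(*binned[1:])
    return {
        size: sum(profile[size] for profile in binned) / len(binned)
        for size in sorted(shared)
    }


replicate_1 = {72.1: 500, 148.9: 1500, 301.2: 40}
replicate_2 = {71.8: 450, 149.2: 1550, 410.0: 35}

consensus = consensus_profile([replicate_1, replicate_2])
print(sorted(consensus))
```

The small peaks near 301 bp and 410 bp each occur in only one replicate and are therefore dropped, while the two reproducible peaks are retained. Note that this approach cannot remove pseudo TRFs, since those are reproducible.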

Interpretation of data
The data resulting from the electropherogram is normally interpreted in one of the following ways.

Pattern comparison
In pattern comparison, the general shapes of the electropherograms of different samples are compared for changes such as the presence or absence of peaks between treatments, their relative sizes, etc.

Complementing with a clone library
If a clone library is constructed in parallel with the T-RFLP analysis, the clones can be used to assess and interpret the T-RFLP profile. In this method the TRF of each clone is determined either directly (i.e. by performing T-RFLP analysis on each single clone) or by in silico analysis of that clone’s sequence. By comparing the T-RFLP profile to a clone library it is possible to validate each peak as genuine and to assess the relative abundance of each variant in the library.

Peak resolving using a database
Several computer applications attempt to relate the peaks in an electropherogram to specific bacteria in a database. Normally this type of analysis is done by simultaneously resolving several profiles of a single sample obtained with different restriction enzymes. The software resolves the profiles by attempting to maximize the matches between the peaks in the profiles and the entries in the database, so that the number of peaks left without a matching sequence is minimal. The software retrieves from the database only those sequences whose predicted TRFs appear in all analyzed profiles.
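
The retrieval rule described above (keep a database entry only if its predicted TRF is found in every profile) can be illustrated with a short sketch. It does not reproduce any particular software package; the database layout, the taxon names, the fragment sizes and the 1 bp matching tolerance are all hypothetical.

```python
# Sketch: match database entries against several T-RFLP profiles.
# All names, sizes and the tolerance are hypothetical illustrations.


def matching_entries(profiles, database, tolerance=1.0):
    """Return entries whose predicted T-RF matches an observed peak
    in every profile (one profile per restriction enzyme)."""
    hits = []
    for name, predicted in database.items():
        found_in_all = all(
            enzyme in predicted
            and any(abs(predicted[enzyme] - peak) <= tolerance for peak in peaks)
            for enzyme, peaks in profiles.items()
        )
        if found_in_all:
            hits.append(name)
    return hits


def unexplained_peaks(profiles, database, hits, tolerance=1.0):
    """List, per enzyme, the peaks not accounted for by any hit."""
    leftover = {}
    for enzyme, peaks in profiles.items():
        leftover[enzyme] = [
            peak
            for peak in peaks
            if not any(
                abs(database[name][enzyme] - peak) <= tolerance for name in hits
            )
        ]
    return leftover


# Observed peak sizes (bp) for one sample digested with two enzymes.
profiles = {
    "MspI": [72.0, 149.0, 490.0],
    "HhaI": [205.0, 370.0],
}

# Predicted T-RF sizes (bp) for three hypothetical database entries.
database = {
    "taxon_A": {"MspI": 72, "HhaI": 205},
    "taxon_B": {"MspI": 149, "HhaI": 98},
    "taxon_C": {"MspI": 490, "HhaI": 371},
}

hits = matching_entries(profiles, database)
print(hits)
print(unexplained_peaks(profiles, database, hits))
```

Here the hypothetical taxon_B is rejected because its predicted HhaI fragment is absent from the HhaI profile, even though its MspI fragment matches a peak; that MspI peak is then reported as unexplained.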

Multivariate analysis
An increasingly common way to analyze T-RFLP profiles is to use multivariate statistical methods. The methods applied are usually those commonly used in ecology, especially in the study of biodiversity. Among them, ordination and cluster analysis are the most widely used. To perform multivariate statistical analysis on T-RFLP data, the data must first be converted to a table known as a “sample by species table”, which lists the different samples (T-RFLP profiles) against the species (T-RFs), with the height or area of the peaks as values.
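
As an illustration of this conversion, the sketch below builds a sample by species table from a few invented profiles and computes the Bray-Curtis dissimilarity, a measure commonly used in ecology as input to ordination and cluster analysis. The sample names, peak values and the choice to standardize each profile to relative abundance are assumptions made for the example.

```python
# Sketch: turn T-RFLP profiles into a sample-by-species table and
# compare samples. Sample names and peak values are invented.


def sample_by_species(profiles):
    """Return the sorted T-RF sizes (columns) and, for each sample,
    a row of relative abundances aligned to those columns."""
    sizes = sorted({size for peaks in profiles.values() for size in peaks})
    table = {}
    for sample, peaks in profiles.items():
        total = sum(peaks.values())
        # A T-RF missing from a sample gets an abundance of zero.
        table[sample] = [peaks.get(size, 0) / total for size in sizes]
    return sizes, table


def bray_curtis(row_a, row_b):
    """Bray-Curtis dissimilarity: 0 for identical rows, 1 when the
    two samples share no T-RF."""
    difference = sum(abs(a - b) for a, b in zip(row_a, row_b))
    total = sum(a + b for a, b in zip(row_a, row_b))
    return difference / total


# Each profile maps T-RF size (bp) to peak height.
profiles = {
    "soil_1": {72: 250, 149: 750},
    "soil_2": {72: 500, 149: 1500},
    "sediment": {205: 1000},
}

sizes, table = sample_by_species(profiles)
print(sizes)
for sample, row in table.items():
    print(sample, row)
print(bray_curtis(table["soil_1"], table["soil_2"]))
print(bray_curtis(table["soil_1"], table["sediment"]))
```

Because each row is standardized to relative abundance, soil_1 and soil_2 come out identical despite different total signal, which reflects the fact that peak heights support relative but not absolute quantification.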

Advantages and disadvantages
As T-RFLP is a fingerprinting technique, its advantages and drawbacks are often discussed in comparison with other similar techniques, mostly DGGE.

Advantages
The major advantage of T-RFLP is the use of an automated sequencer, which gives highly reproducible results for repeated samples. Although the genetic profiles are not completely reproducible, and several of the minor peaks that appear are irreproducible, the overall shape of the electropherogram and the ratios of the major peaks are considered reproducible. Because the automated sequencer outputs the results in a digital numerical format, the data are also easy to store and to compare across samples and experiments. The numerical format of the data can be, and has been, used for relative (though not absolute) quantification and statistical analysis. Although sequence data cannot be definitively inferred directly from the T-RFLP profile, in silico assignment of the peaks to existing sequences is possible to a certain extent.

Drawbacks
Because T-RFLP relies on DNA extraction methods and PCR, the biases inherent to both will affect the results of the analysis. In addition, because only the terminal fragments are read, any two distinct sequences that share a terminal restriction site will produce a single peak on the electropherogram and will be indistinguishable. Indeed, when T-RFLP is applied to a complex microbial community, the result is often a compression of the total diversity into typically only 20-50 distinct peaks, each representing an unknown number of distinct sequences. Although this phenomenon makes T-RFLP results easier to handle, it naturally introduces biases and oversimplifies the real diversity. Attempts to minimize (but not overcome) this problem often involve applying several restriction enzymes and/or labeling both primers with different fluorescent dyes. The inability to retrieve sequences from T-RFLP often leads to the need to construct and analyze one or more clone libraries in parallel with the T-RFLP analysis, which adds to the effort and complicates the analysis. The possible appearance of false (pseudo) T-RFs, as discussed above, is yet another drawback. To handle this, researchers often consider only peaks that can be affiliated with sequences in a clone library.