Epistasis and functional genomics

Epistasis refers to genetic interactions in which a mutation in one gene masks the phenotypic effects of a mutation at another locus. Systematic analysis of these epistatic interactions can provide insight into the structure and function of genetic pathways, because the phenotypes produced by pairs of mutations show how the functions of the genes involved intersect. Genetic interactions are generally classified as either positive (alleviating) or negative (aggravating). Fitness epistasis (an interaction between non-allelic genes) is positive (in other words, diminishing, antagonistic or buffering) when the double loss-of-function mutant has a higher fitness than predicted from the individual effects of the two deleterious mutations, and negative (that is, reinforcing, synergistic or aggravating) when its fitness is lower than predicted. Ryszard Korona and Lukas Jasnos showed that the epistatic effect is usually positive in Saccharomyces cerevisiae. Even when the interaction is positive, the double mutant usually has lower fitness than either single mutant. Positive interactions often occur when both genes lie within the same pathway. Conversely, negative interactions are characterized by a stronger defect than would be expected from the two single mutations, and in the most extreme cases (synthetic sick/lethal) the double mutation is lethal. This aggravated phenotype arises when genes in compensatory pathways are both knocked out.
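
As a rough illustration of the classification above, the following sketch scores a gene pair against a multiplicative null model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses. The multiplicative model, the function names, the tolerance and the fitness values are all assumptions made for illustration; the text does not say which null model is used to predict double-mutant fitness.

```python
def epistasis(w_x, w_y, w_xy):
    """Observed double-mutant fitness minus the multiplicative expectation.

    All fitness values are relative to wild type (wild type = 1.0).
    """
    return w_xy - w_x * w_y


def classify(w_x, w_y, w_xy, tol=0.05):
    """Label an interaction; `tol` is an arbitrary noise threshold (assumption)."""
    eps = epistasis(w_x, w_y, w_xy)
    if eps > tol:
        return "positive (alleviating)"
    if eps < -tol:
        return "negative (aggravating)"
    return "no interaction"


# Hypothetical fitness values for two single mutants (0.8 and 0.7).
print(classify(0.8, 0.7, 0.65))  # fitter than expected, yet below both singles
print(classify(0.8, 0.7, 0.56))  # matches the expectation 0.8 * 0.7
print(classify(0.8, 0.7, 0.00))  # synthetic lethal
```

The first call shows a positive interaction in which the double mutant is still less fit than either single mutant, the situation described above as typical.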

High-throughput methods of analyzing these types of interactions have been useful in expanding our knowledge of genetic interactions. Synthetic genetic arrays (SGA), diploid-based synthetic lethality analysis on microarrays (dSLAM), and epistatic miniarray profiles (E-MAP) are three important methods which have been developed for the systematic analysis and mapping of genetic interactions. This systematic approach to studying epistasis on a genome-wide scale has significant implications for functional genomics. By identifying the negative and positive interactions between an unknown gene and a set of genes within a known pathway, these methods can elucidate the function of previously uncharacterized genes within the context of a metabolic or developmental pathway.

Inferring function: alleviating and aggravating mutations
In order to understand how information about epistatic interactions relates to gene pathways, consider a simple example of vulval cell differentiation in C. elegans. Cells differentiate from Pn cells to Pn.p cells to VP cells to vulval cells. Mutation of lin-26 blocks differentiation of Pn cells to Pn.p cells. Mutants of lin-36 behave similarly, blocking differentiation at the transition to VP cells. In both cases, the resulting phenotype is marked by an absence of vulval cells as there is an upstream block in the differentiation pathway. A double mutant in which both of these genes have been disrupted exhibits an equivalent phenotype that is no worse than either single mutant. The upstream disruption at lin-26 masks the phenotypic effect of a mutation at lin-36 in a classic example of an alleviating epistatic interaction.

Aggravating mutations, on the other hand, give rise to a phenotype which is worse than the cumulative effect of each single mutation. This aggravated phenotype is indicative of two genes in compensatory pathways. In a single mutant, a parallel pathway is able to compensate for the loss of the disrupted pathway; in the double mutant, however, the action of this compensatory pathway is lost as well, resulting in the more dramatic phenotype observed. This relationship has been significantly easier to detect than the more subtle alleviating phenotypes and has been extensively studied in S. cerevisiae through synthetic sick/lethal (SSL) screens, which identify double mutants with significantly decreased growth rates.

These conclusions from double-mutant analysis, while they apply to many pathways and mutants, are not universal. For example, genes can act in opposite directions in pathways, so that knocking out both produces a near-normal phenotype, while each single mutant is severely affected (in opposite directions). A well-studied example occurs during early development in Drosophila, wherein gene products from the hunchback and nanos genes are present in the egg, and act in opposite directions to direct anterior-posterior pattern formation. Something similar often happens in signal transduction pathways, where knocking out a negative regulator of the pathway causes a hyper-activation phenotype, while knocking out a positively acting component produces the opposite phenotype. In linear pathways with a single "output", when knockout mutations in two oppositely acting genes are combined in the same individual, the phenotype of the double mutant is typically the same as the phenotype of the single mutant whose normal gene product acts downstream in the pathway.

SGA and dSLAM
Synthetic genetic arrays (SGA) and diploid-based synthetic lethality analysis on microarrays (dSLAM) are two key methods which have been used to identify synthetic sick/lethal mutants and characterize negative epistatic relationships. Sequencing of the entire yeast genome has made it possible to generate a library of knock-out mutants for nearly every gene in the genome. These molecularly bar-coded mutants greatly facilitate high-throughput epistasis studies, as they can be pooled and used to generate the necessary double mutants. Both SGA and dSLAM approaches rely on these yeast knockout strains, which are transformed or mated to generate haploid double mutants. Microarray profiling is then used to compare the fitness of these single and double mutants. In the case of SGA, the double mutants examined are haploid and are collected after mating with a mutant strain followed by several rounds of selection. In dSLAM, both single and double mutants originate from the same diploid heterozygote strain (hence the "diploid" in "dSLAM"), and their fitness is assessed by microarray analysis of a growth competition assay.

Epistatic miniarray profiles (E-MAPs)
In order to develop a richer understanding of genetic interactions, experimental approaches are shifting away from this binary classification of phenotypes as wild type or synthetic lethal. The E-MAP approach is particularly compelling because of its ability to highlight both alleviating and aggravating effects, and this capacity is what distinguishes it from other methods such as SGA and dSLAM. Furthermore, the E-MAP not only identifies both types of interactions but also recognizes gradations in these interactions and in the severity of the masked phenotype, represented by the interaction score assigned to each pair of genes.

E-MAPs exploit an SGA approach in order to analyze genetic interactions in a high-throughput manner. While the method has been developed primarily for examining epistasis in S. cerevisiae, it could be applied to other model organisms as well. An E-MAP collates data from the systematic generation of double mutant strains for a large, clearly defined group of genes. Each phenotypic response is quantified by imaging colony size to determine growth rate. The resulting fitness score for each double mutant is compared to the fitness predicted from the corresponding single mutants, yielding a genetic interaction score. Hierarchical clustering of these data, which groups genes with similar interaction profiles, allows for the identification of epistatic relationships between genes with and without known function. When the data are sorted in this way, genes known to interact cluster together alongside genes which exhibit a similar pattern of interactions but whose function has not yet been identified. E-MAP data can therefore assign new functions to genes within well-characterized pathways. Consider, for example, the E-MAP presented by Collins et al., which clusters the transcriptional elongation factor Dst1 alongside components of the mid region of the Mediator complex, which is involved in transcriptional regulation. This suggests a new role for Dst1, functioning in concert with Mediator.
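
The idea of grouping genes by the similarity of their interaction profiles can be sketched as follows. Note that this is a simplification: instead of full hierarchical clustering, it only looks up the characterized gene whose profile correlates best (Pearson correlation) with that of an uncharacterized gene. The gene names, the scores and the sign convention (negative for aggravating, positive for alleviating) are hypothetical and are not taken from any published E-MAP.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def most_similar(query, profiles):
    """Return (gene, correlation) for the profile most correlated with `query`."""
    others = [
        (gene, pearson(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    return max(others, key=lambda item: item[1])


# Hypothetical interaction scores of four query genes against the same
# five array genes (negative = aggravating, positive = alleviating).
profiles = {
    "geneA": [-2.0, 1.5, 0.1, -3.0, 0.2],  # characterized, pathway 1
    "geneB": [-1.8, 1.2, 0.0, -2.5, 0.4],  # characterized, pathway 1
    "geneC": [0.3, -2.2, 1.9, 0.1, -1.5],  # characterized, pathway 2
    "geneX": [-1.9, 1.4, 0.2, -2.8, 0.1],  # uncharacterized
}

gene, r = most_similar("geneX", profiles)
print(f"geneX profile is closest to {gene} (r = {r:.3f})")
```

In this toy data set the uncharacterized gene's profile tracks the two pathway 1 genes and anti-correlates with the pathway 2 gene, which is the kind of pattern that would suggest a shared function.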

The choice of genes examined within a given E-MAP is critical to achieving fruitful results. It is particularly important that a significant subset of the genes examined have been well established in the literature. These genes are thus able to act as controls for the E-MAP allowing for greater certainty in analyzing the data from uncharacterized genes. Clusters organized by sub-cellular localization and general cellular processes (e.g. cell cycle) have yielded profitable results in S. cerevisiae. Data from protein-protein interaction studies can also provide a useful basis for selecting gene groups for E-MAP data. We would expect genes which exhibit physical interactions to also demonstrate interactions at the genetic level and thus these can serve as adequate controls for E-MAP data. Collins et al. (2007) carried out a comparison of E-MAP scores and physical interaction data from large-scale affinity purification methods (AP-MS) and their data demonstrate that an E-MAP approach identifies protein-protein interactions with a specificity equal to that of traditional methods such as AP-MS.

High-throughput methods of examining epistatic relationships face difficulties, however, as the number of possible gene pairs is extremely large (~20 million in S. cerevisiae) and the estimated density of genetic interactions is quite low. These difficulties can be countered by examining all possible interactions within a single cluster of genes rather than examining pairs across the whole genome. If well chosen, these functional clusters contain a significantly higher density of genetic interactions than other regions of the genome and thus allow for a higher rate of detection while dramatically decreasing the number of gene pairs to be examined.
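
The scale of the problem follows from simple combinatorics: n genes give n(n-1)/2 unordered pairs. The short sketch below illustrates this, assuming a round figure of about 6,000 yeast genes (an assumption that gives roughly 18 million pairs, the same order as the figure quoted above) and a hypothetical functional cluster of 500 genes.

```python
from math import comb


def pair_count(n_genes):
    """Number of distinct unordered gene pairs among n_genes genes."""
    return comb(n_genes, 2)


genome_genes = 6000   # assumed approximate gene count for S. cerevisiae
cluster_genes = 500   # hypothetical size of a functional cluster

print(pair_count(genome_genes))   # 17997000
print(pair_count(cluster_genes))  # 124750
```

Restricting the screen to the cluster reduces the number of double mutants by more than two orders of magnitude.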

Generation of mutant strains: DAmP
Generating data for the E-MAP depends upon the creation of thousands of double mutant strains; a study of 483 alleles, for example, resulted in an E-MAP with ~100,000 distinct double mutant pairs. The generation of libraries of essential gene mutants presents significant difficulties, however, as null mutations in these genes are lethal. Thus, E-MAP studies rely upon strains with intermediate expression levels of these genes. The decreased abundance by messenger RNA perturbation (DAmP) strategy is particularly common for the high-throughput generation of the mutants necessary for this kind of analysis, and allows for the partial disruption of essential genes without loss of viability. DAmP relies upon the destabilization of mRNA transcripts by integrating an antibiotic selectable marker into the 3' UTR, downstream of the stop codon (figure 2). mRNAs with 3'-extended transcripts are rapidly targeted for degradation, and the result is a downregulation of the gene of interest while it remains under the control of its native promoter. In the case of non-essential genes, deletion strains may be used. Tagging the deletion sites with molecular barcodes, unique 20-bp sequences, allows each mutant strain to be identified and its relative fitness to be measured.
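
How barcodes can report relative fitness in a pooled culture can be sketched as below. This is only an illustration under several assumptions not found in the text: barcode abundance is read out as sequence reads rather than by microarray hybridization, each read starts with the barcode, matching is exact, and the barcode sequences, strain names and read counts are invented.

```python
from collections import Counter
from math import log2

BARCODE_LENGTH = 20  # barcodes are 20 bp, as stated in the text


def count_strains(reads, barcode_to_strain):
    """Count reads per strain by exact match on the first 20 bases."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[:BARCODE_LENGTH])
        if strain is not None:  # ignore reads with unknown barcodes
            counts[strain] += 1
    return counts


def log2_fold_change(before, after, strains, pseudocount=1):
    """log2 change in each strain's share of the pool between two time points.

    The pseudocount avoids taking the logarithm of zero.
    """
    n = len(strains)
    total_before = sum(before[s] for s in strains) + pseudocount * n
    total_after = sum(after[s] for s in strains) + pseudocount * n
    return {
        s: log2(
            ((after[s] + pseudocount) / total_after)
            / ((before[s] + pseudocount) / total_before)
        )
        for s in strains
    }


# Hypothetical barcodes and strains.
barcodes = {
    "ACGT" * 5: "strain_1",
    "TTGCA" * 4: "strain_2",
}
tail = "GGG"  # arbitrary sequence following the barcode in each read
reads_before = [bc + tail for bc in barcodes for _ in range(4)]
reads_after = [("ACGT" * 5) + tail] * 6 + [("TTGCA" * 4) + tail] * 2

before = count_strains(reads_before, barcodes)
after = count_strains(reads_after, barcodes)
changes = log2_fold_change(before, after, ["strain_1", "strain_2"])
for strain, change in changes.items():
    print(strain, round(change, 2))
```

A strain whose share of the pool falls over time (a negative value here) is growing more slowly than its competitors.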