Evolution by gene duplication

Evolution by gene duplication is an event by which a gene, or part of a gene, comes to exist as two identical copies that cannot be distinguished from each other. This phenomenon is understood to be an important source of novelty in evolution, providing for an expanded repertoire of molecular activities. The underlying mutational event of duplication may be a conventional gene duplication mutation within a chromosome, or a larger-scale event involving whole chromosomes (aneuploidy) or whole genomes (polyploidy). In a classic view owing to Susumu Ohno, known as the Ohno model, duplication creates redundancy, and the redundant copy accumulates beneficial mutations, which provide fuel for innovation. Knowledge of evolution by gene duplication has advanced more rapidly in the past 15 years than before, owing to new genomic data, more powerful computational methods of comparative inference, and new evolutionary models.

Theoretical models
Several models try to explain how new cellular functions of genes and their encoded protein products evolve through duplication and divergence. Although each model can explain certain aspects of the evolutionary process, the relative importance of each aspect is still unclear. This article only presents the theoretical models currently discussed in the literature.

In the following, a distinction will be made between explanations for the short-term effects (preservation) of a gene duplication and its long-term outcomes.

Preservation of gene duplicates
Because a gene duplication initially occurs in only one cell, either in a single-celled organism or in a germ cell of a multicellular organism, its carrier (i.e. the organism) usually has to compete against other organisms that do not carry the duplication. If the duplication disrupts the normal functioning of an organism, the organism has reduced reproductive success (lower fitness) compared to its competitors, and the duplication will most likely disappear from the population rapidly. If the duplication has no effect on fitness, it might be maintained in some proportion of the population. In some cases, the duplication of a particular gene might be immediately beneficial, providing its carrier with a fitness advantage.
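
The three outcomes (loss, neutral persistence, spread) can be made concrete with a standard population-genetics approximation that the text above does not itself use: Kimura's diffusion formula for the probability that a single new copy eventually spreads through the population. The sketch below is a minimal illustration, assuming a diploid population of constant effective size `n`, additive selection with coefficient `s`, and an initial frequency of `1/(2n)`; the function name is hypothetical.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for the probability that a single new copy
    (initial frequency p = 1/(2n)) eventually spreads through a diploid population
    of effective size n, under additive selection with coefficient s."""
    p = 1.0 / (2 * n)
    if s == 0:
        return p  # neutral: fixation probability equals the initial frequency
    x = 4 * n * s
    if x < -700:
        # Strongly deleterious: exp() would overflow; the true value is vanishingly small.
        return 0.0
    # (1 - e^{-x p}) / (1 - e^{-x}), written with expm1 for numerical accuracy.
    return math.expm1(-x * p) / math.expm1(-x)


for s in (-0.01, 0.0, 0.01):
    print(f"s = {s:+.2f}: P(fixation) = {fixation_probability(s, 1000):.3g}")
```

With `n = 1000`, a neutral duplicate spreads with probability 1/2000 = 0.0005, a beneficial one with `s = 0.01` with probability close to 2s (about 0.02), and a slightly deleterious one almost never.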

Dosage effect or gene amplification
The so-called 'dosage' of a gene refers to the amount of mRNA transcripts and subsequently translated protein molecules produced from a gene per unit time and per cell. If the amount of gene product is below its optimal level, two kinds of mutation can increase dosage: increases in gene expression through promoter mutations, and increases in gene copy number through gene duplication.

The more copies of the same (duplicated) gene a cell has in its genome, the more gene product can be produced simultaneously. Assuming that no regulatory feedback loops exist that automatically down-regulate gene expression, the amount of gene product (or gene dosage) will increase with each additional gene copy, until some upper limit is reached or sufficient gene product is available.
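
The copy-number argument can be sketched as a toy function, assuming (as the paragraph does) that there is no regulatory feedback, that each copy contributes the same amount of product, and that total output is capped at some upper limit. The function name and the numbers are hypothetical and only illustrate the shape of the relationship.

```python
def gene_dosage(copies: int, rate_per_copy: float, upper_limit: float) -> float:
    """Toy dosage model: with no regulatory feedback, each gene copy adds
    rate_per_copy units of product until a hypothetical upper_limit is reached."""
    if copies < 0:
        raise ValueError("copy number must be non-negative")
    return min(copies * rate_per_copy, upper_limit)


# Illustrative parameters only: 10 units per copy, saturating at 35 units.
print([gene_dosage(c, 10.0, 35.0) for c in range(1, 6)])  # [10.0, 20.0, 30.0, 35.0, 35.0]
```

Dosage rises linearly with copy number until the cap, after which further copies add nothing.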

Furthermore, under positive selection for increased dosage, a duplicated gene could be immediately advantageous and quickly increase in frequency in a population. In this case, no further mutations would be necessary to preserve (or retain) the duplicates. However, at a later time, such mutations could still occur, leading to genes with different functions (see below).
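
How quickly an immediately advantageous duplicate could rise in frequency can be sketched with the textbook deterministic selection recursion. This is a simplification not taken from the text above: it assumes haploid selection, a constant fitness advantage `s`, no genetic drift, and no further mutation. The function name and starting values are hypothetical.

```python
def generations_to_frequency(p0: float, s: float, target: float) -> int:
    """Generations for a variant with relative fitness 1 + s to rise from frequency p0
    to target under deterministic haploid selection (no drift, no further mutation)."""
    if not 0 < p0 < target < 1 or s <= 0:
        raise ValueError("need 0 < p0 < target < 1 and s > 0")
    p, t = p0, 0
    while p < target:
        # Standard haploid selection recursion: p' = p(1 + s) / (1 + p s)
        p = p * (1 + s) / (1 + p * s)
        t += 1
    return t


# Hypothetical: a duplicate starting at 0.1% with a 5% fitness advantage.
print(generations_to_frequency(0.001, 0.05, 0.99))
```

Doubling `s` to 0.1 roughly halves the number of generations, because under this recursion the odds p/(1 - p) grow by a factor of 1 + s each generation.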

Gene dosage effects after duplication can also be harmful to a cell, and the duplication might therefore be selected against. For instance, when the metabolic network within a cell is fine-tuned so that it can only tolerate a certain amount of a particular gene product, gene duplication would upset this balance.
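
Both the beneficial and the harmful dosage cases can be illustrated with a hypothetical fitness function that peaks at an optimal dosage. The Gaussian shape, the function names, and the parameter values are assumptions for illustration; the sketch also assumes that a duplication simply doubles dosage.

```python
import math


def dosage_fitness(dosage: float, optimum: float, tolerance: float) -> float:
    """Hypothetical Gaussian fitness: 1.0 at the optimal dosage, falling off as dosage
    deviates in either direction; a small tolerance models a finely tuned network."""
    return math.exp(-((dosage - optimum) ** 2) / (2 * tolerance ** 2))


def duplication_effect(dosage: float, optimum: float, tolerance: float) -> float:
    """Change in fitness when a duplication doubles dosage (no feedback regulation)."""
    return dosage_fitness(2 * dosage, optimum, tolerance) - dosage_fitness(dosage, optimum, tolerance)


print(duplication_effect(50.0, 100.0, 20.0))   # positive: dosage was below the optimum
print(duplication_effect(100.0, 100.0, 20.0))  # negative: dosage was already optimal
```

The same mutational event is beneficial or deleterious depending only on where the pre-duplication dosage sits relative to the optimum.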

Activity reducing mutations
When a gene duplication has no immediate fitness effect, the duplicate copy could still be retained if both copies accumulate mutations that, for instance, reduce the functional efficiency of the encoded proteins without abolishing their function altogether. In such a case, the molecular function (e.g. protein/enzyme activity) would still be available to the cell to at least the extent that was available before duplication (now provided by proteins expressed from two gene loci instead of one). However, the accidental loss of one gene copy might then be detrimental, since a single copy with reduced activity would almost certainly fall below the activity that was available before duplication.
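
The arithmetic behind this argument can be sketched as follows, assuming that activities of the two copies simply add and that both copies were degraded to the same fraction of the ancestral activity. The function name and the fraction 0.6 are hypothetical.

```python
def activity_after_duplication(per_copy_fraction: float, ancestral: float = 1.0) -> tuple[float, float]:
    """Return (activity with both partially degraded copies, activity if one copy is lost).
    per_copy_fraction is the hypothetical share of ancestral activity each copy retains."""
    both_copies = 2 * per_copy_fraction * ancestral
    one_copy = per_copy_fraction * ancestral
    return both_copies, one_copy


both, one = activity_after_duplication(0.6)
print(both >= 1.0, one < 1.0)  # True True
```

Any per-copy fraction from 0.5 up to (but not including) 1 reproduces the situation described: the pair still matches or exceeds the ancestral activity, while a single remaining copy falls short of it.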

Long-term fate of duplicated genes
If a gene duplication is preserved, the most likely fate is that random mutations in one duplicate gene copy will eventually cause the gene to become non-functional. Such non-functional remnants of genes, with detectable sequence homology, can sometimes still be found in genomes and are called pseudogenes.
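
Why loss of function is the most likely fate can be sketched with a simple waiting-time calculation, not taken from the text above: assume a redundant copy is under no selection and is hit by inactivating ("null") mutations at a constant per-generation rate. The rate used below is hypothetical.

```python
def prob_still_functional(null_rate: float, generations: int) -> float:
    """Probability that an unconstrained duplicate has escaped every inactivating
    mutation after the given number of generations, assuming a constant
    per-generation null mutation rate and no selection to maintain the copy."""
    return (1 - null_rate) ** generations


# Hypothetical null mutation rate of 1e-6 per copy per generation.
print(prob_still_functional(1e-6, 1_000_000))  # ≈ 0.368, i.e. about e**-1
```

Under these assumptions the expected time until the copy becomes a pseudogene is about 1/`null_rate` generations, short on evolutionary timescales.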

Functional divergence between the duplicate genes is another possible fate. There are several theoretical models that try to explain the mechanisms leading to divergence:

Neofunctionalization
The term neofunctionalization was coined by Force et al. 1999, but it refers to the general mechanism proposed by Ohno 1970. The long-term outcome of neofunctionalization is that one copy retains the original (pre-duplication) function of the gene, while the second copy acquires a distinct function. It is also known as the MDN model, "mutation during non-functionality". The major criticism of this model is the high likelihood of non-functionalization, i.e. the loss of all functionality of the gene through the random accumulation of mutations.
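
The criticism can be quantified with a simple competing-risks sketch, not taken from the text above: if gain-of-function and inactivating mutations arise as independent Poisson processes, the chance that a new function appears first is the ratio of their rates. This ignores population size and the probability that a beneficial mutation actually spreads; the rates and function names are hypothetical.

```python
import random


def p_neofunctionalization(beneficial_rate: float, null_rate: float) -> float:
    """Probability that a gain-of-function mutation arises before an inactivating (null)
    mutation, treating both as independent Poisson processes (competing exponentials)."""
    return beneficial_rate / (beneficial_rate + null_rate)


def simulate(beneficial_rate: float, null_rate: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo check: draw exponential waiting times for both mutation types and
    count how often the beneficial one comes first."""
    rng = random.Random(seed)
    wins = sum(
        rng.expovariate(beneficial_rate) < rng.expovariate(null_rate) for _ in range(trials)
    )
    return wins / trials


# Hypothetical rates: beneficial mutations 100 times rarer than null mutations.
ub, un = 1e-8, 1e-6
print(p_neofunctionalization(ub, un))  # about 0.0099
print(simulate(ub, un, 100_000))
```

With beneficial mutations much rarer than null mutations, roughly 99% of redundant copies are lost before any new function can arise.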

IAD model
IAD stands for 'innovation, amplification, divergence' and aims to explain the evolution of new gene functions while preserving existing ones. Innovation, i.e. the establishment of a new molecular function, can occur via side activities of genes and thus of the proteins they encode; this is called enzyme promiscuity. For example, enzymes can sometimes catalyse more than one reaction, even though they are usually optimised for catalysing just one. Such promiscuous protein functions, if they provide an advantage to the host organism, can then be amplified with additional copies of the gene. Such rapid amplification is best known from bacteria, which often carry certain genes on small non-chromosomal DNA molecules (called plasmids) that are capable of rapid replication. Any gene on such a plasmid is also replicated, and the additional copies amplify the expression of the encoded proteins, and with it any promiscuous function. After several such copies have been made and passed on to descendant bacterial cells, a few of these copies might accumulate mutations that eventually turn a side activity into the main activity.
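
The three IAD stages can be illustrated with a toy bookkeeping model that is not part of the original description: each gene copy is given a main activity and a side activity in arbitrary units, amplification creates identical copies, and divergence replaces one copy with a variant specialised for the side reaction. All class names, function names, and activity values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneCopy:
    main: float  # activity on the original substrate (arbitrary units)
    side: float  # promiscuous side activity (arbitrary units)


def amplify(gene: GeneCopy, copies: int) -> list[GeneCopy]:
    """Amplification: identical copies, e.g. on a multi-copy plasmid."""
    return [GeneCopy(gene.main, gene.side) for _ in range(copies)]


def totals(genes: list[GeneCopy]) -> tuple[float, float]:
    """Summed (main, side) activity across all copies in the cell."""
    return sum(g.main for g in genes), sum(g.side for g in genes)


# Innovation: hypothetical enzyme with a weak side activity (5% of its main activity).
ancestor = GeneCopy(main=1.0, side=0.05)
# Amplification: ten copies raise total side activity roughly tenfold.
cell = amplify(ancestor, 10)
# Divergence: one copy specialises on the side reaction (hypothetical values);
# the other nine copies keep the original function.
cell[0] = GeneCopy(main=0.1, side=1.0)
print(totals([ancestor]), totals(cell))
```

The remaining unmodified copies keep the original function available while one copy turns the former side activity into its main activity, which is the "preserving existing functions" aspect of the model.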

The IAD model has been tested in the lab using a bacterial enzyme with a dual function as a starting point. This enzyme is capable of catalyzing not only its original reaction but also a side reaction that can be carried out by another enzyme. By allowing bacteria carrying this enzyme to evolve for several generations under selection to improve both activities (original and side), it was shown that a single ancestral bifunctional gene with poor activities (innovation) first underwent gene amplification, increasing expression of the poor enzyme (amplification), and later accumulated further beneficial mutations that improved one or both of the activities and were passed on to subsequent generations (divergence).

Subfunctionalization
"Subfunctionalization" was also first coined by Force et al. 1999. This model requires the ancestral (pre-duplication) gene to have several functions (sub-functions), which the descendant (post-duplication) genes specialise on in a complementary fashion. There are now at least two different models that are labeled as subfunctionalization, "DDC" and "EAC".

DDC model
DDC stands for "duplication-degeneration-complementation". This model was first introduced by Force et al. 1999. The first step is gene duplication. The gene duplication in itself is neither advantageous nor deleterious, so it may remain at low frequency in a population in which most individuals do not carry the duplication. According to DDC, this period of neutral drift may eventually lead to the complementary retention of sub-functions distributed over the two gene copies. This comes about through activity-reducing (degenerative) mutations in both duplicates, accumulating over long time periods and many generations. Taken together, the two mutated genes provide the same set of functions as the ancestral gene (before duplication). However, if one of the genes were removed, the remaining gene would not be able to provide the full set of functions, and the host cell would likely suffer some detrimental consequences. Therefore, at this later stage of the process, there is strong selection pressure against losing either of the two gene copies that arose by gene duplication. The duplication becomes permanently established in the genome of the host cell or organism.
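
The logic of complementation can be sketched by treating sub-functions as set elements: right after duplication either copy is dispensable, whereas after complementary degeneration both copies become essential. The sub-function labels (expression in two tissues) and function names are hypothetical examples chosen for illustration.

```python
def covers(ancestral: set[str], copies: list[set[str]]) -> bool:
    """True if the copies together still provide every ancestral subfunction."""
    return set().union(*copies) >= ancestral


def essential_copies(ancestral: set[str], copies: list[set[str]]) -> list[int]:
    """Indices of copies whose loss would leave some ancestral subfunction uncovered."""
    return [
        i for i in range(len(copies))
        if not covers(ancestral, copies[:i] + copies[i + 1:])
    ]


# Hypothetical subfunctions, e.g. expression in two different tissues.
ancestral = {"expression_in_tissue_A", "expression_in_tissue_B"}

# Right after duplication: both copies are complete, so either one could be lost.
fresh = [set(ancestral), set(ancestral)]
# After complementary degenerative mutations: each copy has lost a different subfunction.
degenerated = [{"expression_in_tissue_A"}, {"expression_in_tissue_B"}]

print(essential_copies(ancestral, fresh))        # []
print(essential_copies(ancestral, degenerated))  # [0, 1]
```

Only in the degenerated state does removing either copy break coverage of the ancestral functions, which is the source of the selection pressure that keeps both copies.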

EAC model
EAC stands for "Escape from Adaptive Conflict". This name first appeared in a publication by Hittinger and Carroll 2007. The evolutionary process described by the EAC model actually begins before the gene duplication event. A singleton (not duplicated) gene evolves towards two beneficial functions simultaneously. This creates an "adaptive conflict" for the gene, since it is unlikely to execute each individual function with maximum efficiency. The intermediate evolutionary result could be a multi-functional gene; after a gene duplication, its sub-functions could be carried out by specialised descendants of the gene. The result would be the same as under the DDC model: two functionally specialised genes (paralogs). In contrast to the DDC model, the EAC model puts more emphasis on the multi-functional pre-duplication state of the evolving genes and gives a slightly different explanation of why the duplicated multi-functional genes would benefit from additional specialisation after duplication (namely, the adaptive conflict of the multi-functional ancestor needs to be resolved). EAC assumes that positive selection drives evolution after gene duplication, whereas the DDC model requires only neutral ("undirected") evolution, i.e. degeneration and complementation.
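
The idea of an adaptive conflict can be illustrated with a toy trade-off model; this is an assumption for illustration, not the formulation used by Hittinger and Carroll. A single gene must split a fixed capacity between two functions (e1 + e2 = 1), fitness requires both functions (efficiencies multiply), and after duplication each function is assumed to be performed by whichever copy does it best. All function names and values are hypothetical.

```python
def fitness(e1: float, e2: float) -> float:
    """Hypothetical fitness: both functions are needed, so efficiencies multiply."""
    return e1 * e2


def singleton_optimum(steps: int = 100) -> tuple[float, float, float]:
    """Best allocation for a single gene under a hypothetical trade-off e1 + e2 = 1."""
    x = max((i / steps for i in range(steps + 1)), key=lambda v: fitness(v, 1 - v))
    return x, 1 - x, fitness(x, 1 - x)


def paralog_fitness(copies: list[tuple[float, float]]) -> float:
    """Assume each function is carried out by the most efficient copy for that function."""
    e1 = max(c[0] for c in copies)
    e2 = max(c[1] for c in copies)
    return fitness(e1, e2)


print(singleton_optimum())                          # (0.5, 0.5, 0.25): compromise
print(paralog_fitness([(0.5, 0.5), (0.5, 0.5)]))   # 0.25: duplication alone does not help
print(paralog_fitness([(1.0, 0.0), (0.0, 1.0)]))   # 1.0: specialisation resolves the conflict
```

Under these assumptions, only specialisation of the duplicates, not the duplication itself, raises fitness, which mirrors the positive selection for specialisation that EAC invokes.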