User:Tobysiko/article draft on Evolution by Gene Duplication

Gene duplication is generally thought to be a major driving force of evolutionary innovation. It is a randomly occurring mutational event in which a stretch of DNA, together with any genes located on it, is duplicated within a cell. The two copies of a gene resulting from such a duplication are inherited by all descendants of the cell in which it occurred, and both are thus subject to random mutation and natural selection. A central theory of molecular evolution states that most genes are under strong selection pressure to conserve their existing function, and that only after a duplication is one copy freed from this pressure, so that it can accumulate mutations that may eventually cause it to assume a new function in the cell. This view was first established by Susumu Ohno.

Theoretical models
Several models exist that try to explain how new cellular functions of genes and their encoded protein products evolve through the mechanism of duplication and divergence. Although each model can explain certain aspects of the evolutionary process, the relative importance of each aspect is still unclear. This page presents only the theoretical models that are currently discussed in the literature. Review articles on this topic are listed at the bottom.

In the following, a distinction will be made between explanations for the short-term effects (preservation) of a gene duplication and its long-term outcomes.

Preservation of gene duplicates
Since a gene duplication occurs in only one cell, either in a single-celled organism or in the germ cell of a multi-cellular organism, its carrier (i.e. the organism) usually has to compete against other organisms that do not carry the duplication. If the duplication disrupts the normal functioning of an organism, the organism has reduced reproductive success (low fitness) compared to its competitors, and its lineage will most likely die out rapidly. If the duplication has no effect on fitness, it might be maintained in a certain proportion of a population. In some cases, the duplication of a particular gene might be immediately beneficial, providing its carrier with a fitness advantage.

Dosage effects
The so-called 'dosage' of a gene refers to the amount of mRNA transcripts, and subsequently of translated protein molecules, produced from that gene per unit time and per cell. If the amount of gene product is below its optimal level, two kinds of mutation can increase dosage: increases in gene expression through promoter mutations, and increases in gene copy number through gene duplication.

The more copies of the same (duplicated) gene a cell has in its genome, the more gene product can be produced simultaneously. Assuming that no regulatory feedback loops exist that automatically down-regulate gene expression, the amount of gene product (or gene dosage) will increase with each additional gene copy, until some upper limit is reached or sufficient gene product is available.
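The relationship described above can be written as a minimal sketch. It assumes that every copy contributes the same amount of product and that there is no regulatory feedback; the function name gene_dosage and the parameters per_copy_output and ceiling are illustrative choices, not established terminology.

```python
def gene_dosage(copies, per_copy_output=1.0, ceiling=None):
    """Total gene product for a given copy number.

    Assumes each copy contributes the same output and that no feedback
    loop down-regulates expression. `ceiling` is a hypothetical upper
    limit on the amount of product the cell can make.
    """
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    total = copies * per_copy_output
    if ceiling is None:
        return total
    return min(total, ceiling)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, gene_dosage(n, per_copy_output=1.0, ceiling=3.5))
```

In this toy model, dosage rises linearly with copy number and then stays flat once the assumed ceiling is reached.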

Furthermore, under positive selection for increased dosage, a gene duplication could be immediately advantageous and quickly increase in frequency in a population. In this case, no further mutations would be necessary to preserve (or retain) the duplicates. However, at a later time, such mutations could still occur, leading to genes with different functions (see below).
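How quickly an advantageous duplication could spread can be illustrated with a standard deterministic model of haploid selection. This is only a sketch under strong assumptions (an infinitely large population, no genetic drift, and a constant fitness advantage s for carriers); the function names and the parameter values in the example are hypothetical.

```python
def next_frequency(p, s):
    """One generation of haploid selection.

    p is the frequency of carriers of the duplication, whose relative
    fitness is 1 + s; non-carriers have relative fitness 1.
    """
    return p * (1 + s) / (1 + s * p)


def generations_to_reach(p0, s, target, max_generations=100_000):
    """Count generations until the carrier frequency first reaches target.

    Returns None if the target is not reached within max_generations,
    which is always the case for s <= 0 when p0 < target.
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_frequency(p, s)
    return None


if __name__ == "__main__":
    # Hypothetical values: 1% initial frequency, 10% fitness advantage.
    print(generations_to_reach(p0=0.01, s=0.1, target=0.99))
```

With s = 0 the frequency does not change in this model, which corresponds to the neutral case in which selection alone cannot explain the spread of a duplicate.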

Gene dosage effects after duplication can also be harmful to a cell, and the duplication might therefore be selected against. For instance, when the metabolic network within a cell is fine-tuned so that it can only tolerate a certain amount of a particular gene product, a gene duplication would upset this balance.

Activity reducing mutations
When a gene duplication has no immediate fitness effect, the duplicate copy can still be retained if both copies accumulate mutations that, for instance, reduce the functional efficiency of the encoded proteins without abolishing the function altogether. In such a case, the molecular function (e.g. protein or enzyme activity) would still be available to the cell to at least the extent that it was available before duplication, although it is now provided by proteins expressed from two gene loci instead of one. However, the accidental loss of one gene copy might then be detrimental, since the activity of the single remaining copy would almost certainly lie below the level that was available before duplication.
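The arithmetic behind this argument can be made explicit in a short sketch. It assumes that the activities of the two copies simply add up, and the activity values (0.6 and 0.5, relative to an ancestral activity of 1.0) are hypothetical numbers chosen for illustration.

```python
def total_activity(copy_activities):
    """Combined activity, assuming the contributions of all copies add up."""
    return sum(copy_activities)


def activity_after_loss(copy_activities, lost_index):
    """Combined activity that remains if the copy at lost_index is lost."""
    return sum(a for i, a in enumerate(copy_activities) if i != lost_index)


if __name__ == "__main__":
    ancestral = 1.0          # activity of the single pre-duplication gene
    duplicates = [0.6, 0.5]  # hypothetical reduced activities of the two copies

    # Both copies together still cover the ancestral level.
    print(total_activity(duplicates) >= ancestral)

    # Losing either copy drops the cell below the ancestral level.
    for i in range(len(duplicates)):
        print(i, activity_after_loss(duplicates, i) < ancestral)
```

Under these assumptions each copy has become indispensable, even though neither has gained a new function.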

Long-term fate of duplicated genes
If a gene duplication is preserved, the most likely fate is that random mutations in one duplicate gene copy will eventually cause the gene to become non-functional. Such non-functional remnants of genes, with detectable sequence homology, can sometimes still be found in genomes and are called pseudogenes.

Functional divergence between the duplicate genes is another possible fate. There are several theoretical models that try to explain the mechanisms leading to divergence:

Neofunctionalization
The term neofunctionalization was coined by Force et al. 1999, but it refers to the general mechanism proposed by Ohno 1970. The long-term outcome of neofunctionalization is that one copy retains the original (pre-duplication) function of the gene, while the second copy acquires a distinct function. The major criticism of this model is the high likelihood of non-functionalization, i.e. the loss of all functionality of the gene, due to the random accumulation of mutations.

Subfunctionalization
The term "subfunctionalization" was also coined by Force et al. 1999. This model requires the ancestral (pre-duplication) gene to have several functions (sub-functions), in which the descendant (post-duplication) genes specialise in a complementary fashion.

DDC model
DDC stands for "duplication-degeneration-complementation". This model was first introduced by Force et al. 1999. The first step is a gene duplication, followed by activity-reducing (degenerative) mutations in both duplicates. This eventually leads to the complementary retention of sub-functions distributed over the two gene copies.
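The sequence of duplication, degeneration and complementation can be illustrated with a toy simulation. This is a strongly simplified sketch and not the quantitative model of Force et al.: it assumes that each sub-function is either present or absent in a copy, that a degenerative mutation is only tolerated while the other copy still provides the affected sub-function, and that all tolerated mutations are equally likely. The function names and the sub-function labels are hypothetical.

```python
import random


def ddc_simulation(subfunctions, rng):
    """Let two duplicates degenerate until no redundant sub-function is left.

    A sub-function can only be lost from one copy while the other copy
    still carries it (the loss is then assumed to be neutral).
    """
    copies = [set(subfunctions), set(subfunctions)]  # duplication
    while True:
        # All (copy, sub-function) pairs whose loss would still be tolerated.
        redundant = [
            (i, f)
            for i in (0, 1)
            for f in sorted(copies[i])
            if f in copies[1 - i]
        ]
        if not redundant:
            break
        i, f = rng.choice(redundant)  # degeneration
        copies[i].discard(f)
    return copies


def classify(copies):
    """Name the outcome: one empty copy, or complementary retention."""
    if not copies[0] or not copies[1]:
        return "non-functionalization"
    return "subfunctionalization"


if __name__ == "__main__":
    rng = random.Random(1)
    result = ddc_simulation(["expression in tissue A", "expression in tissue B"], rng)
    print(result, classify(result))
```

In every run of this sketch, each sub-function ends up in exactly one copy. Either the sub-functions are split between the two copies (complementation), or one copy has lost all of them and resembles a pseudogene.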

EAC model
The evolutionary process described by the "Escape from Adaptive Conflict" (EAC) model begins before the gene duplication event. A singleton (not duplicated) gene evolves towards two beneficial functions simultaneously. This creates an "adaptive conflict" for the gene, since it is unlikely to perform each individual function with maximum efficiency. The intermediate evolutionary result could be a multi-functional gene, and after a gene duplication its sub-functions could be carried out by specialised descendants of the gene. The end result would be the same as under the DDC model: two functionally specialised genes (paralogs). In contrast to the DDC model, the EAC model puts more emphasis on the multi-functional pre-duplication state of the evolving genes and gives a slightly different explanation of why the duplicated multi-functional genes would benefit from additional specialisation after duplication, namely that the adaptive conflict of the multi-functional ancestor needs to be resolved. The EAC model assumes that positive selection drives evolution after gene duplication, whereas the DDC model only requires neutral ("undirected") evolution, i.e. degeneration and complementation.

IAD model
IAD stands for 'innovation, amplification, divergence'. The model aims to explain how new gene functions evolve while existing functions are preserved. Innovation, i.e. the establishment of a new molecular function, can occur via promiscuous side-activities of genes, and thus of proteins, for which they are not optimised. For example, enzymes can sometimes catalyse more than one reaction, even though they are usually optimised for just one. Such promiscuous protein functions, if they provide an advantage to the host organism, can then be amplified by additional copies of the gene. Such rapid amplification is best known from bacteria, which often carry certain genes on smaller non-chromosomal DNA molecules (called plasmids) that are capable of rapid replication. Any gene on such a plasmid is also replicated, and the additional copies amplify the expression of the encoded proteins and with it any promiscuous function. After several such copies have been made and passed on to descendant bacterial cells, a few of these copies might accumulate mutations that eventually lead to a side-activity becoming the main activity.