Genetic variation

Genetic variation is the difference in DNA among individuals or the differences between populations of the same species. The multiple sources of genetic variation include mutation and genetic recombination. Mutations are the ultimate source of genetic variation, but other mechanisms, such as genetic drift, contribute to it as well.

Among individuals within a population
Genetic variation can be identified at many levels. Identifying genetic variation is possible from observations of phenotypic variation in either quantitative traits (traits that vary continuously and are coded for by many genes, e.g., leg length in dogs) or discrete traits (traits that fall into discrete categories and are coded for by one or a few genes, e.g., white, pink, or red petal color in certain flowers).

Genetic variation can also be identified by examining variation at the level of enzymes using the process of protein electrophoresis. Polymorphic genes have more than one allele at each locus. Half of the genes that code for enzymes in insects and plants may be polymorphic, whereas polymorphisms are less common among vertebrates.

Ultimately, genetic variation is caused by variation in the order of nucleotide bases in genes. DNA sequencing now allows scientists to read this order directly, and it has identified even more genetic variation than was previously detected by protein electrophoresis. Examination of DNA has shown genetic variation in both the coding regions and the noncoding regions of genes, such as introns.

Genetic variation will result in phenotypic variation if variation in the order of nucleotides in the DNA sequence results in a difference in the order of amino acids in the proteins coded by that DNA sequence, and if the resultant differences in amino-acid sequence influence the shape, and thus the function, of the enzyme or other protein.

Between populations
Differences between populations resulting from geographic separation are known as geographic variation. Natural selection, genetic drift, and gene flow can all contribute to geographic variation.

Measurement
Genetic variation within a population is commonly measured as the percentage of polymorphic gene loci or the percentage of gene loci that are heterozygous in individuals. The results can be very useful in understanding the process of adaptation to the environment of each individual in the population.
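
As a rough illustration of these two measures, the sketch below computes the percentage of polymorphic loci and the percentage of heterozygous genotypes from a small, made-up genotype table. The sample data, the function names, and the choice to represent a diploid genotype as a pair of allele labels are all assumptions made for this example; they do not come from any particular study or software package.

```python
from typing import Dict, List, Tuple

# A diploid genotype at one locus: the two alleles an individual carries.
Genotype = Tuple[str, str]


def percent_polymorphic(genotypes: Dict[str, List[Genotype]]) -> float:
    """Percentage of loci at which more than one allele is seen in the sample."""
    polymorphic = 0
    for calls in genotypes.values():
        alleles = {allele for pair in calls for allele in pair}
        if len(alleles) > 1:
            polymorphic += 1
    return 100.0 * polymorphic / len(genotypes)


def percent_heterozygous(genotypes: Dict[str, List[Genotype]]) -> float:
    """Percentage of individual-by-locus genotypes that are heterozygous."""
    total = 0
    heterozygous = 0
    for calls in genotypes.values():
        for first, second in calls:
            total += 1
            if first != second:
                heterozygous += 1
    return 100.0 * heterozygous / total


# Hypothetical sample: four individuals scored at four loci.
sample = {
    "locus1": [("A", "A"), ("A", "a"), ("a", "a"), ("A", "a")],
    "locus2": [("B", "B"), ("B", "B"), ("B", "B"), ("B", "B")],
    "locus3": [("C", "c"), ("C", "C"), ("C", "C"), ("C", "C")],
    "locus4": [("D", "D"), ("D", "D"), ("D", "D"), ("D", "D")],
}

print(percent_polymorphic(sample))   # 2 of 4 loci vary -> 50.0
print(percent_heterozygous(sample))  # 3 of 16 genotypes -> 18.75
```

Because every individual is scored at every locus here, averaging over all genotypes gives the same value as averaging the per-individual proportions; with missing data the two would need to be distinguished.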

Forms
Genetic variation can be divided into different forms according to the size and type of genomic variation underpinning genetic change. Small-scale sequence variation (<1 kilobase, kb) includes base-pair substitutions and indels. Large-scale structural variation (>1 kb) can be either copy number variation (loss or gain) or chromosomal rearrangement (translocation, inversion, or segmental acquired uniparental disomy). Genetic variation and recombination by transposable elements and endogenous retroviruses is sometimes supplemented by a variety of persistent viruses and their defectives, which generate genetic novelty in host genomes. Numerical variation in whole chromosomes or genomes can be either polyploidy or aneuploidy.

Maintenance in populations
A variety of factors maintain genetic variation in populations. Potentially harmful recessive alleles can be hidden from selection in the heterozygous individuals in populations of diploid organisms (recessive alleles are only expressed in the less common homozygous individuals). Natural selection can also maintain genetic variation in balanced polymorphisms. Balanced polymorphisms may occur when heterozygotes are favored or when selection is frequency dependent.
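
Both mechanisms can be illustrated numerically. The sketch below first assumes Hardy-Weinberg proportions (random mating, which the paragraph above does not state) to show what share of a rare recessive allele sits in heterozygotes, and then iterates a textbook one-locus selection model in which the heterozygote is favored. The fitness scheme (1 - s, 1, 1 - t), the coefficient values, and the function names are illustrative assumptions, not measurements.

```python
def fraction_hidden_in_heterozygotes(q: float) -> float:
    """Share of copies of a recessive allele (frequency q) that are carried
    by heterozygotes, assuming Hardy-Weinberg genotype proportions."""
    p = 1.0 - q
    copies_in_heterozygotes = 2 * p * q  # one copy per heterozygote
    copies_in_homozygotes = 2 * q * q    # two copies per recessive homozygote
    return copies_in_heterozygotes / (copies_in_heterozygotes + copies_in_homozygotes)


def next_frequency(p: float, s: float, t: float) -> float:
    """Frequency of allele A after one generation of selection with
    genotype fitnesses AA: 1 - s, Aa: 1, aa: 1 - t."""
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * p * (1 - s) + p * q) / mean_fitness


def iterate(p0: float, s: float, t: float, generations: int) -> float:
    """Apply next_frequency repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


# A recessive allele at frequency 0.01 is carried almost entirely by heterozygotes.
print(fraction_hidden_in_heterozygotes(0.01))

# With heterozygote advantage, both alleles persist: starting from either
# extreme, the frequency of A approaches t / (s + t).
s, t = 0.1, 0.3
print(iterate(0.01, s, t, 500), iterate(0.99, s, t, 500), t / (s + t))
```

In this model neither allele is lost, whatever the starting frequency, which is one sense in which selection "balances" a polymorphism. Frequency-dependent selection would require a different fitness scheme and is not covered by this sketch.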

RNA viruses
A high mutation rate caused by the lack of a proofreading mechanism appears to be a major source of the genetic variation that contributes to RNA virus evolution. Genetic recombination also has been shown to play a key role in generating the genetic variation that underlies RNA virus evolution. Numerous RNA viruses are capable of genetic recombination when at least two viral genomes are present in the same host cell. RNA recombination appears to be a major driving force in determining genome architecture and the course of viral evolution among Picornaviridae ((+)ssRNA) (e.g. poliovirus). In the Retroviridae ((+)ssRNA) (e.g. HIV), damage in the RNA genome appears to be avoided during reverse transcription by strand switching, a form of genetic recombination. Recombination also occurs in the Coronaviridae ((+)ssRNA) (e.g. SARS). Recombination in RNA viruses appears to be an adaptation for coping with genome damage. Recombination can occur infrequently between animal viruses of the same species but of divergent lineages. The resulting recombinant viruses may sometimes cause an outbreak of infection in humans.

History of genetic variation
Evolutionary biologists are often concerned with genetic variation, a term which in modern times has come to refer to differences in DNA sequences among individuals. However, quantifying and understanding genetic variation has been a central aim of those interested in understanding the varied life on earth since long before the sequencing of the first full genome, and even before the discovery of DNA as the molecule responsible for heredity.

While today's definition of genetic variation relies on contemporary molecular genetics, the idea of heritable variation was of central importance to those interested in the substance and development of life even before the writings of Charles Darwin. The concept of heritable variation—the presence of innate differences between life forms that are passed from parents to offspring, especially within categories such as species—does not rely on modern ideas of genetics, which were unavailable to 18th- and 19th-century minds.

Pre-Darwinian concepts of heritable variation
In the mid-1700s, Pierre Louis Maupertuis, a French scholar now known primarily for his work in mathematics and physics, posited that while species have a true, original form, accidents during the development of nascent offspring could introduce variations that could accumulate over time. In his 1750 Essai de cosmologie, he proposed that the species we see today are only a small fraction of the many variations produced by "a blind destiny", and that many of these variations did not "conform" to their needs and thus did not survive. In fact, some historians even suggest that his ideas anticipated the laws of inheritance further developed by Gregor Mendel.

Simultaneously, French philosopher Denis Diderot proposed a different framework for the generation of heritable variation. Diderot borrowed Maupertuis' idea that variation could be introduced during reproduction and the subsequent growth of offspring, and thought that production of a "normal" organism was no more probable than production of a "monstrous" one. However, Diderot also believed that matter itself had lifelike properties and could self-assemble into structures with the potential for life. Diderot's ideas on biological transformation, introduced in his 1749 work Letter on the Blind, were thus focused on variability of spontaneously generated forms, not variability within existing species.

Both Maupertuis and Diderot built on the ideas of Roman poet and philosopher Lucretius, who wrote in De rerum natura that all the universe was created by random chance, and only the beings that were not self-contradictory survived. Maupertuis' work is distinguished from the work of both Lucretius and Diderot in his use of the concept of conformity in explaining differential survival of beings, a new idea among those who believed that life changed over time.

Like Diderot, two other influential minds of the 18th century—Erasmus Darwin and Jean-Baptiste Lamarck—believed that only very simple organisms could be generated by spontaneous generation, so another mechanism was necessary to generate the great variability of complex life observed on earth. Erasmus Darwin proposed that changes acquired during an animal's life could be passed to its offspring, and that these changes seemed to be produced by the animal's endeavors to meet its basic needs. Similarly, Lamarck's theory of the variability among living things was rooted in patterns of use and disuse, which he believed led to heritable physiological changes. Both Erasmus Darwin and Lamarck believed that variation, whether it arose during development or during the animal's life, was heritable, a key step in theories of change over time extending from individuals to populations.

In the subsequent century, William Herschel's telescopic observations of diverse nebulae across the night sky suggested to him that different nebulae could each be in different stages in the process of condensation. This idea, which came to be known as the nebular hypothesis, suggested that natural processes could both create order out of matter and introduce variation, and that these processes could be observed over time. While it may seem to the modern reader that astronomical theories are irrelevant to theories of organic variation, these ideas became significantly conflated with ideas of biological transformation—what we now know as evolution—in the mid-19th century, laying important groundwork for the work of subsequent thinkers such as Charles Darwin.

Darwin's concept of heritable variation
Charles Darwin's ideas of heritable variation were shaped by both his own scientific work and the ideas of his contemporaries and predecessors. Darwin ascribed heritable variation to many factors, but particularly emphasized environmental forces acting on the body. His theory of inheritance was rooted in the (now disproven) idea of gemmules: small, hypothetical particles that capture the essence of an organism and travel from all over the body to the reproductive organs, from which they are passed to offspring. Darwin believed that the causal relationship between the environment and the body was so complex that the variation this relationship produced was inherently unpredictable. However, like Lamarck, he acknowledged that variability could also be introduced by patterns of use and disuse of organs. Darwin was fascinated by variation in both natural and domesticated populations, and his realization that individuals in a population exhibited seemingly purposeless variation was largely driven by his experiences working with animal breeders. Darwin believed that species changed gradually, through the accumulation of small, continuous variations, a concept that would remain hotly contested into the 20th century.

Post-Darwinian concepts of heritable variation
In the 20th century, a field that came to be known as population genetics developed. This field seeks to understand and quantify genetic variation. The section below consists of a timeline of selected developments in population genetics, with a focus on methods for quantifying genetic variation.


 * 1866 - Heterozygosity: Gregor Mendel's hybridization experiments introduced the concept that in the 1950s came to be recognized as heterozygosity. In a diploid species, one that contains two copies of DNA within each cell (one from each parent), an individual is said to be a heterozygote at a particular location in the genome if its two copies of DNA differ at that site. Heterozygosity, the average frequency of heterozygotes in a population, became a fundamental measure of the genetic variation in a population by the mid-20th century. If the heterozygosity of a population is zero, every individual is homozygous; that is, every individual has two copies of the same allele at the locus of interest and no genetic variation exists.
 * 1918 - Variance: In a seminal paper entitled "The correlation between relatives on the supposition of Mendelian inheritance", R.A. Fisher introduced the statistical concept of variance: the average of squared deviations of a collection of observations from their mean, $\sigma^2=\frac{1}{I}\sum_{i=1}^I(x_i-\mu)^2$, where $\sigma^2$ is the variance and $\mu$ is the mean of the population from which the $I$ observations $x_i$ are drawn. Fisher's work was important not only to population genetics; these ideas would also form the foundations of modern statistics.
 * 1918, 1921 - Additive and dominance genetic variance: R.A. Fisher subsequently subdivided his general definition of variance into two components relevant to population genetics: additive and dominance genetic variance. An additive genetic model assumes that the genes affecting a phenotype do not interact, so that a trait value can be estimated simply by summing the effect of each gene on the trait. Under Fisher's model, the total genetic variance is the sum of the additive genetic variance (the variance in a trait due to these additive effects) and the dominance genetic variance (which accounts for interactions between alleles at the same locus).
 * 1948 - Entropy: Unlike variance, which was developed with the purpose of quantifying genetic variation, Claude Shannon's measure of diversity, now known as Shannon entropy, was developed as part of his work in communication theory as a way to quantify the amount of information contained in a message. However, the method quickly found use in population genetics, and was the central method used to quantify genetic diversity in a seminal paper by Richard Lewontin, "The Apportionment of Human Genetic Diversity".
 * 1951 - F-statistics: F-statistics, also known as fixation indices, were developed by population geneticist Sewall Wright to quantify differences in genetic variation within and between populations. The most common of these statistics, $F_{ST}$, considers in its simplest definition two different versions of a gene, or alleles, and two populations that contain one or both of these two alleles. $F_{ST}$ quantifies the genetic variability between these two populations by computing the average frequency of heterozygotes across the two populations relative to the frequency of heterozygotes if the two populations were pooled. F-statistics introduced the idea of quantifying hierarchical concepts of variance and would become the foundation of many important population genetic methods, including a set of methods that tests for evidence of natural selection in the genome.
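
The measures in the timeline above can be sketched for a single locus in a few lines. This example makes several simplifying assumptions that are not taken from the original papers: allele frequencies are treated as known, heterozygosity is computed as the expected value under random mating ($H = 1 - \sum_i p_i^2$), entropy is measured in bits, and the fixation index uses the simple two-population form $F_{ST} = (H_T - H_S)/H_T$ with equally weighted populations. The function names and example frequencies are hypothetical.

```python
import math
from typing import Sequence


def variance(xs: Sequence[float]) -> float:
    """Average squared deviation of the observations from their mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected frequency of heterozygotes under random mating: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_entropy(freqs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the allele-frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def fst(pop1: Sequence[float], pop2: Sequence[float]) -> float:
    """Two-population fixation index, (H_T - H_S) / H_T.

    H_S is the mean expected heterozygosity within the two populations;
    H_T is the expected heterozygosity after pooling them with equal weight.
    """
    h_s = (expected_heterozygosity(pop1) + expected_heterozygosity(pop2)) / 2
    pooled = [(a + b) / 2 for a, b in zip(pop1, pop2)]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical allele frequencies (allele 1, allele 2) in two populations.
pop_a = [0.8, 0.2]
pop_b = [0.2, 0.8]

print(variance([1.0, 2.0, 3.0, 4.0]))  # 1.25
print(expected_heterozygosity(pop_a))  # close to 0.32
print(shannon_entropy([0.5, 0.5]))     # 1.0 bit for two equally common alleles
print(fst(pop_a, pop_b))               # close to 0.36
```

When the two populations share the same allele frequencies, `fst` returns 0; when each is fixed for a different allele, it returns 1. Estimators used in practice also correct for sample size and unequal population sizes, which this sketch ignores.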