Allele age

Allele age (or mutation age) is the amount of time elapsed since an allele first appeared due to mutation. Estimating the time at which a certain allele appeared allows researchers to infer patterns of human migration, disease, and natural selection. Allele age can be estimated based on (1) the frequency of the allele in a population and (2) the genetic variation that occurs within different copies of the allele, also known as intra-allelic variation. While either of these methods can be used to estimate allele age, the use of both increases the accuracy of the estimation and can sometimes offer additional information regarding the presence of selection.

Frequency-based estimation relies on the expectation that, in the absence of selection, alleles at high frequency are on average older than alleles at low frequency. Many alleles of interest, however, are under some type of selection. Because alleles under positive selection can rise to high frequency very quickly, it is important to understand the mechanisms that underlie allele frequency change, such as natural selection, gene flow, genetic drift, and mutation.

Estimation based on intra-allelic variation relies on the fact that, with every generation, linkage between the allele and alleles at nearby sites (linkage disequilibrium) is disrupted by recombination, and new variation is introduced at linked sites by mutation. The analysis of intra-allelic variation to assess allele age depends on coalescent theory. Two approaches can be used. First, a phylogenetics approach extrapolates an allele’s age by reconstructing a gene tree and dating the root of the tree. This approach is best suited to ancient, as opposed to recent, mutations. Second, a population genetics approach estimates allele age using models of mutation, recombination, and demography instead of a gene tree. This type of approach is best suited to recent mutations.

Recently, Albers and McVean (2018) proposed a non-parametric method to estimate the age of an allele, using probabilistic, coalescent-based models of mutation and recombination. Specifically, their method infers the time to the most recent common ancestor (TMRCA) between hundreds or thousands of chromosomal sequence (haplotype) pairs. This information is then combined using a composite likelihood approach to obtain an estimate of the time of mutation at a single locus. This methodology was applied to more than 16 million variants in the human genome, using data from the 1000 Genomes Project and the Simons Genome Diversity Project, to generate the atlas of variant age.

History
In the 1970s, population geneticists Motoo Kimura and Tomoko Ohta were the first to analyze the association between an allele’s frequency and its age. They showed that the age of a neutral allele can be estimated (assuming a large, randomly mating population) by

$$E(t_1)=\frac{-2p}{1-p}\ln(p)$$

where $$ p $$ represents the allele frequency and $$ E(t_1) $$ is the expected age of the allele, measured in units of 2N generations (N being the population size).
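As a rough illustration of this formula, the following Python sketch evaluates the expected age for a given allele frequency. The function names and the `population_size` parameter are hypothetical, chosen only for this example. The conversion to generations simply multiplies by 2N, as implied by the stated units, and the sketch assumes a neutral allele in a large, randomly mating population.

```python
import math


def kimura_ohta_expected_age(p):
    """Expected age of a neutral allele at frequency p, in units of 2N generations."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must be strictly between 0 and 1")
    # E(t_1) = (-2p / (1 - p)) * ln(p)
    return (-2.0 * p / (1.0 - p)) * math.log(p)


def expected_age_in_generations(p, population_size):
    """Rescale the expected age from units of 2N generations to generations.

    population_size (N) is an assumed input, not a value given in the text.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return 2.0 * population_size * kimura_ohta_expected_age(p)
```

Under these assumptions, an allele at frequency 0.5 has an expected age of 2 ln 2, or about 1.39 in units of 2N generations, and rarer alleles have smaller expected ages.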

More recent studies, however, have focused on the analysis of intra-allelic variation. In 1990, Jean-Louis Serre and his team were the first to assess allele age in this way. Using a sample of 240 French families, they surveyed two restriction fragment length polymorphism (RFLP) sites (E1 and E2) that are closely linked to an allele (ΔF508) at the cystic fibrosis locus (CFTR). Recombination theory allows for the calculation of x(t), the expected frequency of E2 in association with the allele ΔF508 in generation t, and y, the frequency of E2 on chromosomes without the ΔF508 allele. The recombination rate, c, is assumed to be known, so the allele age can be estimated by solving for t:

$$t={1\over\ln(1-c)}\ln{x(t)-y \over 1-y}$$
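The calculation can be sketched in Python as follows. This is a minimal illustration of the formula, not the procedure used by Serre et al., and the function names are hypothetical. Rearranging the formula gives $$x(t)=y+(1-y)(1-c)^t$$, which the sketch includes as a forward model so that the two functions can be checked against each other.

```python
import math


def expected_association(t, y, c):
    """Forward model: x(t) = y + (1 - y) * (1 - c)**t."""
    return y + (1.0 - y) * (1.0 - c) ** t


def age_from_linkage_decay(x_t, y, c):
    """Allele age t (in generations) from the decay of marker association.

    x_t: frequency of the marker on chromosomes carrying the allele
    y:   frequency of the marker on chromosomes without the allele
    c:   recombination rate between marker and allele (assumed known)
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must be strictly between 0 and 1")
    if not 0.0 <= y < 1.0:
        raise ValueError("y must be in [0, 1)")
    if not y < x_t <= 1.0:
        raise ValueError("x_t must satisfy y < x_t <= 1")
    # t = ln((x(t) - y) / (1 - y)) / ln(1 - c)
    return math.log((x_t - y) / (1.0 - y)) / math.log(1.0 - c)
```

With purely illustrative values such as y = 0.2 and c = 0.001, feeding the output of `expected_association(100, 0.2, 0.001)` into `age_from_linkage_decay` recovers t = 100 up to floating-point error. The estimate is undefined when x(t) is at or below y, which corresponds to the association having fully decayed.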

Although Serre et al. (1990) were the first to employ this method, it became increasingly popular after the Risch et al. study in 1995, which analyzed alleles in an Ashkenazi Jewish population.

Examples of allele age estimations
Many intra-allelic variation studies suggest that disease-causing alleles arose rather recently in human history.

Cystic fibrosis
The Serre et al. (1990) study estimated that the ΔF508 allele, which causes cystic fibrosis, arose approximately 181.4 generations ago, corresponding to an allele age of between 3,000 and 6,000 years. However, other studies have obtained drastically different estimates. Morral et al. (1994) suggested a minimum age of 52,000 years. A reanalysis of the Morral et al. (1994) data by Slatkin and Rannala (2000) estimated an allele age of approximately 3,000 years, which is consistent with the Serre et al. (1990) results.

AIDS-resistance allele (CCR5)
A 32 base pair deletion at the CCR5 locus results in resistance to infection by HIV, the virus that causes AIDS. Individuals who are homozygous for the mutation experience complete resistance to the infection, while heterozygotes experience only partial resistance, resulting in a delayed onset of AIDS. A study by Stephens et al. in 1998 suggested that this allele originated approximately 27.5 generations, or about 688 years, ago. These results were obtained using intra-allelic variation analysis. The same study also used the allele frequency and the Kimura-Ohta model to estimate allele age. This method gave very different results, suggesting that the allele appeared more than 100,000 years ago. Stephens et al. (1998) argue that the discrepancy between these age estimates strongly suggests recent positive selection for the CCR5 mutation. Because the CCR5 mutation also offers resistance to smallpox, these results are consistent with the idea that the CCR5 mutation first rose to higher frequency due to positive selection during smallpox outbreaks in European history before being positively selected for due to its role in HIV resistance.
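The conversion between generations and years in the intra-allelic estimate is simple arithmetic, sketched below. The generation time of 25 years is an assumption: it is not stated above, but it is the value implied by the quoted figures (27.5 generations and roughly 688 years). The function name is hypothetical.

```python
def generations_to_years(generations, years_per_generation=25.0):
    """Convert an allele age in generations to years.

    years_per_generation defaults to 25, an assumed value inferred
    from the figures quoted in the text.
    """
    if generations < 0 or years_per_generation <= 0:
        raise ValueError("generations must be >= 0 and years_per_generation > 0")
    return generations * years_per_generation


ccr5_age_years = generations_to_years(27.5)
print(ccr5_age_years)  # 687.5
```

A different assumed generation time rescales the estimate proportionally, which is one reason ages reported in years can vary even when the estimate in generations is fixed.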

Lactase persistence
Many adults are lactose intolerant because their bodies cease production of the enzyme lactase after childhood. However, in certain African populations, mutations in the promoter region of the lactase gene (LCT) result in the continued production of lactase throughout adulthood, a condition known as lactase persistence. A study conducted by Sarah Tishkoff and her team shows that the mutation for lactase persistence has been under positive selection since its recent appearance approximately 3,000 to 7,000 years ago. These dates are consistent with the rise of cattle domestication and pastoralist lifestyles in these regions, making the lactase persistence mutation a strong example of gene-culture co-evolution.