Adaptive evolution in the human genome

Adaptive evolution results from the propagation of advantageous mutations through positive selection. This is the modern synthesis of the process which Darwin and Wallace originally identified as the mechanism of evolution. However, in the last half century, there has been considerable debate as to whether evolutionary changes at the molecular level are largely driven by natural selection or random genetic drift. Unsurprisingly, the forces which drive evolutionary changes in our own species’ lineage have been of particular interest. Quantifying adaptive evolution in the human genome gives insights into our own evolutionary history and helps to resolve this neutralist-selectionist debate. Identifying specific regions of the human genome that show evidence of adaptive evolution helps us find functionally significant genes, including genes important for human health, such as those associated with diseases.

Methods
The methods used to identify adaptive evolution are generally devised to test the null hypothesis of neutral evolution, which, if rejected, provides evidence of adaptive evolution. These tests can be broadly divided into two categories.

Firstly, there are methods that use a comparative approach to search for evidence of function-altering mutations. The dN/dS rates-ratio test estimates ω, the ratio of the rates at which nonsynonymous ('dN') and synonymous ('dS') nucleotide substitutions occur ('synonymous' nucleotide substitutions do not change the encoded amino acid, while 'nonsynonymous' ones do). In this model, neutral evolution is the null hypothesis, under which dN and dS approximately balance so that ω ≈ 1. The two alternative hypotheses are a relative absence of nonsynonymous substitutions (dN < dS; ω < 1), suggesting that the effect of such mutations on fitness (the 'fitness effect', or 'selection pressure') is negative and that purifying selection has operated over time; or a relative excess of nonsynonymous substitutions (dN > dS; ω > 1), indicating a positive effect on fitness, i.e. diversifying selection (Yang and Bielawski 2000).
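As a rough illustration of the three outcomes described above, the following sketch computes ω from already-estimated dN and dS values and labels the result. The function names and the tolerance band around ω = 1 are hypothetical choices made for this example; real analyses estimate ω with codon substitution models and test significance formally rather than applying a fixed cut-off.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS rate ratio (omega)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def classify_selection(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Label a gene by comparing omega with the neutral expectation of 1.

    The tolerance band is an illustrative assumption, not a statistical test.
    """
    w = omega(dn, ds)
    if w < 1 - tolerance:
        return "purifying"   # relative absence of nonsynonymous substitutions
    if w > 1 + tolerance:
        return "positive"    # relative excess of nonsynonymous substitutions
    return "neutral"         # dN and dS approximately balance


if __name__ == "__main__":
    # Hypothetical rate estimates for three genes.
    for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]:
        print(dn, ds, classify_selection(dn, ds))
```

A fixed tolerance is used only to keep the example short; it says nothing about whether a departure from ω = 1 is statistically meaningful.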

The McDonald-Kreitman (MK) test quantifies the amount of adaptive evolution occurring by estimating the proportion of nonsynonymous substitutions which are adaptive, referred to as α (McDonald and Kreitman 1991, Eyre-Walker 2006). α is calculated as: α = 1 − (ds × pn)/(dn × ps), where dn and ds are as above, and pn and ps are the number of nonsynonymous (fitness effect assumed neutral or deleterious) and synonymous (fitness effect assumed neutral) polymorphisms respectively (Eyre-Walker 2006).
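The basic form of the α calculation can be sketched in a few lines. The function name and the counts in the usage example are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from any of the studies cited here.

```python
def mk_alpha(dn: float, ds: float, pn: float, ps: float) -> float:
    """Basic McDonald-Kreitman estimate: alpha = 1 - (ds * pn) / (dn * ps).

    dn, ds: nonsynonymous and synonymous substitutions (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts: 20 and 40 substitutions, 10 and 40 polymorphisms.
    print(mk_alpha(dn=20, ds=40, pn=10, ps=40))
```

With these made-up counts the estimate is 0.5, i.e. half of the nonsynonymous substitutions would be attributed to adaptive evolution. The estimate becomes negative when nonsynonymous polymorphisms are in excess, which is one reason the basic form is usually modified in practice.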

Both tests are presented here in their basic forms. In practice they are normally modified considerably to account for other factors, such as the effect of slightly deleterious mutations.

The other methods for detecting adaptive evolution use genome-wide approaches, often to look for evidence of selective sweeps. Evidence of complete selective sweeps is shown by a decrease in genetic diversity, and can be inferred by comparing the observed Site Frequency Spectrum (SFS, i.e. the allele frequency distribution) with the SFS expected under a neutral model (Williamson et al. 2007). Partial selective sweeps provide evidence of the most recent adaptive evolution, and the methods identify adaptive evolution by searching for regions with a high proportion of derived alleles (Sabeti et al. 2006).
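To make the SFS comparison concrete, the sketch below tabulates an unfolded SFS from per-site derived-allele counts and computes the proportions expected under the standard neutral model with constant population size, where the expected number of sites with a derived allele in i sampled chromosomes is proportional to 1/i. The function names and the input counts are hypothetical; the methods cited above use composite-likelihood frameworks rather than this direct comparison.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Unfolded SFS for a sample of n chromosomes.

    Entry i-1 is the number of sites whose derived allele appears in
    exactly i chromosomes (i = 1 .. n-1). Monomorphic sites are ignored.
    """
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def neutral_expected_proportions(n):
    """Expected SFS proportions under the standard neutral model (weights 1/i)."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical derived-allele counts at eight sites in five chromosomes.
    observed = site_frequency_spectrum([1, 1, 2, 4, 4, 4, 0, 5], n=5)
    expected = neutral_expected_proportions(5)
    total = sum(observed)
    for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
        print(i, obs / total, round(exp, 3))
```

In this toy input the high-frequency class (derived allele in four of five chromosomes) is over-represented relative to the neutral expectation, which is the kind of skew that sweep scans look for.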

Examining patterns of Linkage Disequilibrium (LD) can locate signatures of adaptive evolution (Hawks et al. 2007, Voight et al. 2006). LD tests work on the basic principle that, assuming equal recombination rates, LD will rise with increasing natural selection. These genomic methods can also be applied to search for adaptive evolution in non-coding DNA, where putatively neutral sites are hard to identify (Ponting and Lunter 2006).
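For readers unfamiliar with how LD is measured, the following sketch computes two standard pairwise statistics, D and r², from phased two-locus haplotypes. The function name and the encoding of alleles as 0 and 1 are assumptions made for the example; the studies cited above use haplotype-based statistics built on extended LD rather than pairwise r² alone.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for a list of two-locus haplotypes.

    Each haplotype is a pair (a, b) with alleles coded 0 or 1.
    D = p_AB - p_A * p_B; r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always co-occur
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]     # alleles independent
    print(linkage_disequilibrium(linked))
    print(linkage_disequilibrium(unlinked))
```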

Another recent method used to detect selection in non-coding sequences examines insertions and deletions (indels), rather than point mutations (Lunter et al. 2006), although the method has only been applied to examine patterns of negative selection.

Coding DNA
Many different studies have attempted to quantify the amount of adaptive evolution in the human genome, the vast majority using the comparative approaches outlined above. Although there are discrepancies between studies, generally there is relatively little evidence of adaptive evolution in protein coding DNA, with estimates of adaptive evolution often near 0% (see Table 1). The most obvious exception to this is the 35% estimate of α (Fay et al. 2001). This comparatively early study used relatively few loci (fewer than 200) for its estimate, and the polymorphism and divergence data were obtained from different genes; both factors may have led to an overestimate of α. The next highest estimate is the 20% value of α (Zhang and Li 2005). However, the MK test used in this study was sufficiently weak that the authors state that this value of α is not statistically significantly different from 0%. The estimate by Nielsen et al. (2005a) that 9.8% of genes have undergone adaptive evolution also has a large margin of error associated with it, and it shrinks dramatically to 0.4% when they stipulate that the degree of certainty that there has been adaptive evolution must be 95% or more.

This raises an important issue, which is that many of these tests for adaptive evolution are very weak. Therefore, the fact that many estimates are at (or very near to) 0% does not rule out the occurrence of any adaptive evolution in the human genome, but simply shows that positive selection is not frequent enough to be detected by the tests. In fact, the most recent study mentioned states that confounding variables, such as demographic changes, mean that the true value of α may be as high as 40% (Eyre-Walker and Keightley 2009). Another recent study, which uses a relatively robust methodology, estimates α at 10-20% (Boyko et al. 2008). Clearly, the debate over the amount of adaptive evolution occurring in human coding DNA is not yet resolved.

Even if low estimates of α are accurate, a small proportion of substitutions evolving adaptively can still equate to a considerable amount of coding DNA. Many authors whose studies give small estimates of the amount of adaptive evolution in coding DNA nevertheless accept that there has been some adaptive evolution in this DNA, because these studies identify specific regions within the human genome which have been evolving adaptively (e.g. Bakewell et al. 2007).

The generally low estimates of adaptive evolution in human coding DNA can be contrasted with other species. Bakewell et al. (2007) found more evidence of adaptive evolution in chimpanzees than humans, with 1.7% of chimpanzee genes showing evidence of adaptive evolution (compared with the 1.1% estimate for humans; see Table 1). Comparing humans with more distantly related animals, an early estimate for α in Drosophila species was 45% (Smith and Eyre-Walker 2002), and later estimates largely agree with this (Eyre-Walker 2006). Bacteria and viruses generally show even more evidence of adaptive evolution; research shows values of α in a range of 50-85%, depending on the species examined (Eyre-Walker 2006). Generally, there does appear to be a positive correlation between (effective) population size of the species, and amount of adaptive evolution occurring in the coding DNA regions. This may be because random genetic drift becomes less powerful at altering allele frequencies, compared to natural selection, as population size increases.
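The suggested link between population size and the efficacy of selection can be illustrated with the widely used diffusion approximation for the fixation probability of a new advantageous mutation, (1 − e^(−2s)) / (1 − e^(−4Ns)). The sketch below is not taken from the studies cited here; it assumes a diploid population whose effective size equals its census size N, a mutation starting as a single copy, and a hypothetical selection coefficient s.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Diffusion approximation for a new mutation in a diploid population.

    Assumes effective size equals census size n and a starting frequency
    of 1 / (2n). For s == 0 this is the neutral value 1 / (2n).
    """
    if n <= 0:
        raise ValueError("population size must be positive")
    if s == 0:
        return 1.0 / (2 * n)
    # expm1 keeps precision when the exponents are close to zero.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


def relative_fixation(n: int, s: float) -> float:
    """Fixation probability relative to a neutral mutation in the same population."""
    return fixation_probability(n, s) * 2 * n


if __name__ == "__main__":
    s = 0.001  # hypothetical selective advantage
    for n in (100, 10_000):
        print(n, round(relative_fixation(n, s), 2))
```

With s = 0.001, the advantageous mutation is only about 1.2 times as likely to fix as a neutral one when N = 100, but about 40 times as likely when N = 10,000. This is consistent with drift dominating in small populations and selection dominating in large ones.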

Non-coding DNA
Estimates of the amount of adaptive evolution in non-coding DNA are generally very low, although fewer studies have been done on non-coding DNA. As with coding DNA, however, the methods currently used are relatively weak. Ponting and Lunter (2006) speculate that underestimates may be even more severe in non-coding DNA, because non-coding DNA may undergo periods of functionality (and adaptive evolution), followed by periods of neutrality. If this is true, current methods for detecting adaptive evolution are inadequate to account for such patterns. Additionally, even if low estimates of the amount of adaptive evolution are correct, this can still equate to a large amount of adaptively evolving non-coding DNA, since non-coding DNA makes up approximately 98% of the DNA in the human genome. For example, Ponting and Lunter (2006) detect a modest 0.03% of non-coding DNA showing evidence of adaptive evolution, but this still equates to approximately 1 Mb of adaptively evolving DNA. Where there is evidence of adaptive evolution (which implies functionality) in non-coding DNA, these regions are generally thought to be involved in the regulation of protein coding sequences.
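The step from 0.03% to roughly 1 Mb can be checked with simple arithmetic. The haploid genome size of about 3.1 billion base pairs used below is an assumption added for this illustration and does not come from the text; the other two figures are those quoted above.

```python
GENOME_SIZE_BP = 3.1e9        # assumed approximate haploid human genome size
NONCODING_FRACTION = 0.98     # share of the genome that is non-coding
ADAPTIVE_FRACTION = 0.0003    # 0.03% of non-coding DNA (Ponting and Lunter 2006)


def adaptive_noncoding_bp(genome_size_bp: float = GENOME_SIZE_BP,
                          noncoding_fraction: float = NONCODING_FRACTION,
                          adaptive_fraction: float = ADAPTIVE_FRACTION) -> float:
    """Base pairs of non-coding DNA implied to be evolving adaptively."""
    return genome_size_bp * noncoding_fraction * adaptive_fraction


if __name__ == "__main__":
    print(f"{adaptive_noncoding_bp() / 1e6:.2f} Mb")
```

Under these assumptions the result is a little over 0.9 Mb, in line with the approximate 1 Mb figure.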

As with humans, fewer studies have searched for adaptive evolution in non-coding regions of other organisms. However, where research has been done on Drosophila, there appear to be large amounts of adaptively evolving non-coding DNA. Andolfatto (2005) estimated that adaptive evolution has occurred in 60% of untranslated mature portions of mRNAs, and in 20% of intronic and intergenic regions. If this is true, it would imply that much non-coding DNA could be of more functional importance than coding DNA, dramatically altering the consensus view. However, this would still leave unanswered what function all this non-coding DNA performs, as the regulatory activity observed thus far is in just a tiny proportion of the total amount of non-coding DNA. Ultimately, significantly more evidence needs to be gathered to substantiate this viewpoint.

Variation between human populations
Several recent studies have compared the amounts of adaptive evolution occurring between different populations within the human species. Williamson et al. (2007) found more evidence of adaptive evolution in European and Asian populations than in African American populations. Assuming African Americans are representative of Africans, these results make sense intuitively, because humans spread out of Africa approximately 50,000 years ago (according to the consensus Out-of-Africa hypothesis of human origins (Klein 2009)), and these humans would have adapted to the new environments they encountered. By contrast, African populations remained in a similar environment for the following tens of thousands of years, and were therefore probably nearer their adaptive peak for the environment. However, Voight et al. (2006) found evidence of more adaptive evolution in Africans than in non-Africans (East Asian and European populations examined), and Boyko et al. (2008) found no significant difference in the amount of adaptive evolution occurring between different human populations. Therefore, the evidence obtained so far is inconclusive as to what extent different human populations have undergone different amounts of adaptive evolution.

Rate of adaptive evolution
The rate of adaptive evolution in the human genome has often been assumed to be constant over time. For example, the 35% estimate for α calculated by Fay et al. (2001) led them to conclude that there was one adaptive substitution in the human lineage every 200 years since human divergence from old-world monkeys. However, even if the original value of α is accurate for a particular time period, this extrapolation is still invalid. This is because there has been a large acceleration in the amount of positive selection in the human lineage over the last 40,000 years, in terms of the number of genes that have undergone adaptive evolution (Hawks et al. 2007). This agrees with simple theoretical predictions, because the human population size has expanded dramatically in the last 40,000 years, and with more people, there should be more adaptive substitutions. Hawks et al. (2007) argue that demographic changes (particularly population expansion) may greatly facilitate adaptive evolution, an argument that somewhat corroborates the positive correlation inferred between population size and amount of adaptive evolution occurring mentioned previously.

It has been suggested that cultural evolution may have replaced genetic evolution, and hence slowed the rate of adaptive evolution over the past 10,000 years. However, it is possible that cultural evolution could actually increase genetic adaptation. Cultural evolution has vastly increased communication and contact between different populations, and this provides much greater opportunities for genetic admixture between the different populations (Hawks et al. 2007). However, recent cultural phenomena, such as modern medicine and the smaller variation in modern family sizes, may reduce genetic adaptation as natural selection is relaxed, overriding the increased potential for adaptation due to greater genetic admixture.

Strength of positive selection
Studies generally do not attempt to quantify the average strength of selection propagating advantageous mutations in the human genome. Many models make assumptions about how strong selection is, and some of the discrepancies between the estimates of the amounts of adaptive evolution occurring have been attributed to the use of such differing assumptions (Eyre-Walker 2006). The way to accurately estimate the average strength of positive selection acting on the human genome is by inferring the distribution of fitness effects (DFE) of new advantageous mutations in the human genome, but this DFE is difficult to infer because new advantageous mutations are very rare (Boyko et al. 2008). The DFE may be exponential in shape in an adapted population (Eyre-Walker and Keightley 2007). However, more research is required to produce more accurate estimates of the average strength of positive selection in humans, which will in turn improve the estimates of the amount of adaptive evolution occurring in the human genome (Boyko et al. 2008).
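To show what an exponentially shaped DFE would imply, the sketch below draws selection coefficients from an exponential distribution and computes the fraction expected to exceed a given threshold. The mean selection coefficient of 0.01 is a hypothetical value chosen for illustration, not an estimate from the studies cited, and the function names are likewise invented for this example.

```python
import math
import random


def fraction_exceeding(threshold: float, mean_s: float) -> float:
    """Expected fraction of advantageous mutations with s above threshold
    when s is exponentially distributed with mean mean_s."""
    return math.exp(-threshold / mean_s)


def sample_selection_coefficients(mean_s: float, count: int, seed: int = 0):
    """Draw selection coefficients from an exponential DFE (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_s) for _ in range(count)]


if __name__ == "__main__":
    mean_s = 0.01  # hypothetical mean selective advantage
    draws = sample_selection_coefficients(mean_s, 100_000)
    print("sample mean:", sum(draws) / len(draws))
    print("fraction above 2 x mean:", fraction_exceeding(2 * mean_s, mean_s))
```

Under an exponential DFE most advantageous mutations have small effects: only about 13.5% are expected to have more than twice the mean advantage, whatever the mean is.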

Regions of the genome which show evidence of adaptive evolution
A considerable number of studies have used genomic methods to identify specific human genes that show evidence of adaptive evolution. Table 2 gives selected examples of such genes for each gene type discussed, but provides nowhere near an exhaustive list of the human genes showing evidence of adaptive evolution. Below are listed some of the types of gene which show strong evidence of adaptive evolution in the human genome.

 * Disease genes
Bakewell et al. (2007) found that a relatively large proportion (9.7%) of positively selected genes were associated with diseases. This may be because diseases can be adaptive in some contexts. For example, schizophrenia has been linked with increased creativity (Crespi et al. 2007), perhaps a useful trait for obtaining food or attracting mates in Palaeolithic times. Alternatively, the adaptive mutations may be the ones which reduce the chance of disease arising due to other mutations. However, this second explanation seems unlikely, because the mutation rate in the human genome is fairly low, so selection would be relatively weak.

 * Immune genes
417 genes involved in the immune system showed strong evidence of adaptive evolution in the study of Nielsen et al. (2005a). This is probably because the immune genes may become involved in an evolutionary arms race with bacteria and viruses (Daugherty and Malik 2012; Van der Lee et al. 2017). These pathogens evolve very rapidly, so selection pressures change quickly, giving more opportunity for adaptive evolution.

 * Testes genes
247 genes in the testes showed evidence of adaptive evolution in the study of Nielsen et al. (2005a). This could be partially due to sexual antagonism. Male-female competition could facilitate an arms race of adaptive evolution. However, in this situation you would expect to find evidence of adaptive evolution in the female sexual organs also, but there is less evidence of this. Sperm competition is another possible explanation. Sperm competition is strong, and sperm can improve their chances of fertilising the female egg in a variety of ways, including increasing their speed, stamina or response to chemoattractants (Swanson and Vacquier 2002).

 * Olfactory genes
Genes involved in detecting smell show strong evidence of adaptive evolution (Voight et al. 2006), probably due to the fact that the smells encountered by humans have changed recently in their evolutionary history (Williamson et al. 2007). Humans’ sense of smell has played an important role in determining the safety of food sources.

 * Nutrition genes
Genes involved in lactose metabolism show particularly strong evidence of adaptive evolution amongst the genes involved in nutrition. A mutation linked to lactase persistence shows very strong evidence of adaptive evolution in European and American populations (Williamson et al. 2007), populations where pastoral farming for milk has been historically important.

 * Pigmentation genes
Pigmentation genes show particularly strong evidence of adaptive evolution in non-African populations (Williamson et al. 2007). This is likely to be because those humans that left Africa approximately 50,000 years ago entered less sunny climates, and so were under new selection pressures to obtain enough Vitamin D from the weakened sunlight.

 * Brain genes?
There is some evidence of adaptive evolution in genes linked to brain development, but some of these genes are often associated with diseases, e.g. microcephaly (see Table 2). However, there is a particular interest in the search for adaptive evolution in brain genes, despite the ethical issues surrounding such research. If more adaptive evolution was discovered in brain genes in one human population than another, then this information could be interpreted as showing greater intelligence in the more adaptively evolved population.

 * Other
Other gene types showing considerable evidence of adaptive evolution (but generally less evidence than the types discussed) include: genes on the X chromosome, nervous system genes, genes involved in apoptosis, genes coding for skeletal traits, and possibly genes associated with speech (Nielsen et al. 2005a, Williamson et al. 2007, Voight et al. 2006, Krause et al. 2007).

Difficulties in identifying positive selection
As noted previously, many of the tests used to detect adaptive evolution have very large degrees of uncertainty surrounding their estimates. While there are many different modifications applied to individual tests to overcome the associated problems, two types of confounding variables are particularly important in hindering the accurate detection of adaptive evolution: demographic changes and biased gene conversion.

Demographic changes are particularly problematic and may severely bias estimates of adaptive evolution. The human lineage has undergone both rapid population size contractions and expansions over its evolutionary history, and these events will change many of the signatures thought to be characteristic of adaptive evolution (Nielsen et al. 2007). Some genomic methods have been shown through simulations to be relatively robust to demographic changes (e.g. Williamson et al. 2007). However, no tests are completely robust to demographic changes, and new genetic phenomena linked to demographic changes have recently been discovered. These include the concept of “surfing mutations”, where new mutations can be propagated with a population expansion (Klopfstein et al. 2006).

A phenomenon which could severely alter the way we look for signatures of adaptive evolution is biased gene conversion (BGC) (Galtier and Duret 2007). Meiotic recombination between homologous chromosomes that are heterozygous at a particular locus can produce a DNA mismatch. DNA repair mechanisms are biased towards repairing a mismatch to the GC base pair. This will lead allele frequencies to change, leaving a signature of non-neutral evolution (Galtier et al. 2001). The excess of AT to GC mutations in human genomic regions with high substitution rates (human accelerated regions, HARs) implies that BGC has occurred frequently in the human genome (Pollard et al. 2006, Galtier and Duret 2007). Initially, it was postulated that BGC could have been adaptive (Galtier et al. 2001), but more recent observations have made this seem unlikely. Firstly, some HARs show no substantial signs of selective sweeps around them. Secondly, HARs tend to be present in regions with high recombination rates (Pollard et al. 2006). In fact, BGC could lead to HARs containing a high frequency of deleterious mutations (Galtier and Duret 2007). However, it is unlikely that HARs are generally maladaptive, because DNA repair mechanisms themselves would be subject to strong selection if they propagated deleterious mutations. Either way, BGC should be further investigated, because it may force radical alteration of the methods which test for the presence of adaptive evolution.
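Why BGC can mimic positive selection can be seen in a simple deterministic sketch. It assumes, purely for illustration, that heterozygotes transmit the GC allele with probability (1 + b)/2 instead of 1/2, with random mating, no fitness differences and no drift. Under these assumptions the GC allele frequency p changes each generation to p + b·p·(1 − p). The conversion bias b, the starting frequency and the function name are hypothetical.

```python
def gc_allele_trajectory(p0: float, b: float, generations: int):
    """Frequency of the GC allele over time under biased gene conversion alone.

    Heterozygotes transmit the GC allele with probability (1 + b) / 2,
    which gives the recursion p' = p + b * p * (1 - p).
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a frequency between 0 and 1")
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be between 0 and 1")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p + b * p * (1.0 - p)
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical bias of 1% acting for 500 generations.
    trajectory = gc_allele_trajectory(p0=0.1, b=0.01, generations=500)
    print(round(trajectory[0], 3), round(trajectory[-1], 3))
```

The GC allele rises steadily towards fixation even though it confers no fitness advantage in this model, so its frequency trajectory resembles that of a positively selected allele.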

Table 1: Estimates of the amount of adaptive evolution in the human genome
(format of table and some data displayed as in Table 1 of Eyre-Walker (2006))