Soft selective sweep

In genetics, a soft sweep occurs when multiple copies of a beneficial mutation become established and fix together. Depending on the origin of these copies, linked variants may be retained and appear as distinct haplotype structures in the population. There are two major forms of soft sweeps:


 * 1) A beneficial mutation previously segregated neutrally in the population and therefore existed on multiple haplotypes at the time of the selective shift in which the mutation became beneficial. In this way, a single beneficial mutation may carry multiple haplotypes to intermediate frequency while it itself becomes fixed.
 * 2) Multiple beneficial mutations arise independently in short succession, so that a second copy arises by mutation before the first copy has been fixed by selection.

Soft sweeps can thus arise both from standing variation and from rapidly recurring beneficial mutations.

Overview
A selective sweep occurs when, due to strong positive natural selection, a beneficial allele quickly goes to fixation in a population, reducing or eliminating variation among the nucleotides near that allele. A sweep can occur when a rare or formerly absent allele that improves the fitness of its carrier relative to other members of the population increases in frequency quickly due to natural selection. As the frequency of such a beneficial allele increases, genetic variants that happen to be present in the DNA neighborhood of the beneficial allele also become more prevalent; this phenomenon is called genetic hitchhiking. A selective sweep arises when rapid changes in the frequency of a beneficial allele, driven by positive selection, distort the genealogical history of samples from the region around the selected locus. It is now recognized that not all sweeps reduce genetic variation in the same way; selective sweeps can be grouped into three main categories:


 * 1) The classic selective sweep, or hard sweep, is expected when beneficial mutations are rare: once such a mutation has occurred, it increases in frequency rapidly, drastically reducing genetic variation in the population.
 * 2) A soft sweep from standing genetic variation (SGV) occurs when a previously neutral mutation that was already present in a population becomes beneficial because of an environmental change. Such a mutation may be present on several genomic backgrounds, so that when it rapidly increases in frequency it does not erase all genetic variation in the population.
 * 3) A multiple-origin soft sweep happens when mutations are common, for example in a large population, so that the same or similar beneficial mutations occur on different genomic backgrounds and no single genomic background hitchhikes to high frequency.
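
The contrast between a hard sweep and the two soft categories can be illustrated with a small deterministic sketch. It assumes a haploid population with no drift, no recombination, and no new mutation, which is a strong simplification. The function name `sweep_backgrounds`, the background labels, the starting frequencies, and the selection coefficient are all hypothetical values chosen for illustration. Under these assumptions the model does not distinguish between carriers that come from standing variation and carriers that come from independent mutations; it only needs the beneficial allele to be present on several backgrounds when selection starts.

```python
def sweep_backgrounds(backgrounds, s, generations):
    """Iterate deterministic haploid selection over haplotype classes.

    backgrounds maps a label to (starting frequency, carries beneficial allele).
    Returns a dict of label -> frequency after the given number of generations.
    Assumption: no drift, recombination, or mutation (illustrative only).
    """
    freqs = {label: freq for label, (freq, _) in backgrounds.items()}
    fitness = {label: 1.0 + s if carrier else 1.0
               for label, (_, carrier) in backgrounds.items()}
    for _ in range(generations):
        # Each class grows in proportion to its fitness relative to the mean.
        mean_fitness = sum(freqs[k] * fitness[k] for k in freqs)
        freqs = {k: freqs[k] * fitness[k] / mean_fitness for k in freqs}
    return freqs


# Hard sweep: the beneficial allele (marked *) starts on one background only.
hard = sweep_backgrounds(
    {"A*": (0.001, True), "A": (0.399, False),
     "B": (0.300, False), "C": (0.300, False)},
    s=0.1, generations=400)

# Soft sweep: carriers already exist on three backgrounds when selection starts.
soft = sweep_backgrounds(
    {"A*": (0.004, True), "B*": (0.003, True), "C*": (0.003, True),
     "A": (0.396, False), "B": (0.297, False), "C": (0.297, False)},
    s=0.1, generations=400)

for name, result in (("hard", hard), ("soft", soft)):
    carriers = {k: round(v, 3) for k, v in result.items() if k.endswith("*")}
    print(name, carriers)
```

In the hard case a single background ends up near frequency 1, so linked variation is lost. In the soft case the beneficial allele also approaches fixation, but the three carrier backgrounds keep their starting 4:3:3 proportions, so several haplotypes remain at intermediate frequency.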

Whether a selective sweep has occurred can be investigated in various ways. One method is to measure linkage disequilibrium, that is, whether a given haplotype is overrepresented in the population. Under neutral evolution, genetic recombination reshuffles the different alleles within haplotypes, and no single haplotype dominates the population. During a selective sweep, however, selection for a positively selected gene variant also causes neighboring alleles to hitchhike, leaving less opportunity for recombination. The presence of strong linkage disequilibrium might therefore indicate that there has been a selective sweep, and it can be used to identify sites recently under selection. There have been many scans for selective sweeps in humans and other species using a variety of statistical approaches and assumptions.
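
As a minimal sketch of how linkage disequilibrium between two sites can be quantified, the following computes the standard coefficient D = p_AB − p_A·p_B and the squared correlation r² from a sample of two-locus haplotypes. The function name `linkage_disequilibrium` and the toy samples are assumptions made for illustration; real scans use many sites and more elaborate haplotype statistics.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for a sample of two-locus haplotypes.

    Each haplotype is a pair (a, b) of 0/1 allele states at two sites.
    D  = p_AB - p_A * p_B
    r2 = D**2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r2 is undefined when either site is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Toy sample after a sweep: one haplotype (1, 1) dominates.
swept = [(1, 1)] * 8 + [(0, 0)] * 2
# Toy sample with alleles fully reshuffled by recombination.
shuffled = [(1, 1), (1, 0), (0, 1), (0, 0)]

d_swept, r2_swept = linkage_disequilibrium(swept)
d_shuffled, r2_shuffled = linkage_disequilibrium(shuffled)
print(round(r2_swept, 3), round(r2_shuffled, 3))
```

In the swept sample the two sites are perfectly associated, so r² is at its maximum of 1, while the reshuffled sample shows no association.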

Differences between soft and hard sweeps
The main difference between soft and hard selective sweeps lies in the expected number of different haplotypes carrying the beneficial mutation or mutations, and therefore in the expected number of haplotypes that hitchhike to considerable frequency during the selective sweep and remain in the population at the time of fixation. This key difference leads to different expectations for both the site frequency spectrum and linkage disequilibrium, and consequently for the commonly used test statistics based on these patterns. If a hard sweep facilitates evolutionary rescue, then just a single ancestor is responsible for the spread of the advantageous variant, and genetic diversity is removed from the population as a consequence of adaptation as well as of demographic decline. A soft sweep, in which the beneficial allele is independently derived in multiple ancestors, instead retains some of the ancestral diversity that existed prior to the environmental shift that initiated the fitness changes.
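
The difference in the number of haplotypes that reach considerable frequency can be made concrete with a short sketch. It counts distinct haplotypes in a window around the selected site, reports how many exceed a frequency threshold, and computes haplotype homozygosity (the sum of squared haplotype frequencies). The function name `haplotype_summary`, the 10% threshold, and the toy samples are hypothetical choices for illustration, not a published test.

```python
from collections import Counter


def haplotype_summary(haplotypes, min_freq=0.1):
    """Summarise the haplotype frequency spectrum of a sample.

    Returns (sorted frequencies, number of haplotypes with frequency
    >= min_freq, haplotype homozygosity = sum of squared frequencies).
    The min_freq threshold is an arbitrary illustrative choice.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    freqs = sorted((c / n for c in counts.values()), reverse=True)
    n_common = sum(1 for f in freqs if f >= min_freq)
    homozygosity = sum(f * f for f in freqs)
    return freqs, n_common, homozygosity


# Toy hard sweep: one haplotype dominates, plus two rare variants.
hard_sample = ["AAAA"] * 18 + ["ACAA", "AAGA"]
# Toy soft sweep: three haplotypes at considerable frequency.
soft_sample = ["AAAA"] * 8 + ["CCTA"] * 7 + ["GATA"] * 4 + ["AAGA"]

hard_freqs, hard_common, hard_h = haplotype_summary(hard_sample)
soft_freqs, soft_common, soft_h = haplotype_summary(soft_sample)
print(hard_common, round(hard_h, 3))
print(soft_common, round(soft_h, 3))
```

In these toy samples the hard sweep leaves one common haplotype and high homozygosity, while the soft sweep leaves three common haplotypes and markedly lower homozygosity.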

Detecting soft sweeps
Can soft and hard sweeps be told apart? Only recent adaptive events, whether hard or soft, leave a measurable signal at all. Signals from the site frequency spectrum (such as the excess of rare alleles picked up by Tajima's D; Tajima 1989) usually fade on time scales of ~0.1Ne generations, while signals based on linkage disequilibrium or haplotype statistics last only ~0.01Ne generations. For a sweep to be detected easily, selection must be strong (4Nesb ≫ 100, where Ne is the effective population size and sb the selection coefficient of the beneficial allele). Even then, soft sweeps can be difficult to discriminate from neutrality if they are ‘super soft’, i.e., if there are numerous independent origins of the beneficial allele, or if its starting frequency in the SGV is high.
A robust test of selection against neutrality requires a statistic with reliably high power for both hard and soft sweeps. Based on the patterns described above, tests based on the site frequency spectrum (looking for low- or high-frequency derived alleles) have low power to reveal soft sweeps, whereas haplotype tests can detect both types of sweeps. In contrast to single-origin soft sweeps (which always leave a weaker footprint), multiple-origin soft sweeps can be easier to detect than completed hard sweeps because of the clear haplotype structure right at the selected site. Detecting soft sweeps with a single origin is difficult. Tests based on a combination of summary statistics have been developed by Peter, Huerta-Sanchez & Nielsen (2012) and by Schrider & Kern (2016). Both tests have reliable power to find soft sweeps when selection is strong and the starting frequency of the selected allele is high (5–20%).
In addition, well-documented empirical cases typically rely on evidence beyond the footprint itself: for example, a source population is known to carry the selected allele in its SGV (e.g., marine and freshwater sticklebacks), or a known and very recent selection pressure does not leave enough time for the allele to increase from a single copy to the frequency observed today (for example, CCR5 adaptation to HIV in humans). On the whole, soft sweeps with multiple origins have better chances of being detected.
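
The site-frequency-spectrum signal discussed in this section, an excess of rare alleles, is commonly summarised by Tajima's D (Tajima 1989). The following is a minimal sketch of that statistic for a small matrix of 0/1 haplotypes, using the standard constants from Tajima's formula. The function name `tajimas_d` and the toy data are assumptions for illustration. The sketch ignores missing data and multiallelic sites, and it requires at least four sequences because the variance term is zero for smaller samples.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded as lists of 0/1."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    n_sites = len(haplotypes[0])
    derived = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    segregating = [k for k in derived if 0 < k < n]
    s = len(segregating)
    if s == 0:
        return float("nan")  # undefined without segregating sites
    # Mean number of pairwise differences (nucleotide diversity, pi).
    pi = sum(2 * k * (n - k) for k in segregating) / (n * (n - 1))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy sample with only singletons (excess of rare alleles): 10 x 5 matrix.
rare = [[1 if i == j else 0 for j in range(5)] for i in range(10)]
# Toy sample with every variant at intermediate frequency (5 of 10).
intermediate = [[1] * 5] * 5 + [[0] * 5] * 5

d_rare = tajimas_d(rare)
d_intermediate = tajimas_d(intermediate)
print(round(d_rare, 2), round(d_intermediate, 2))
```

The singleton-only sample gives a negative D, the direction expected after a recent sweep, while the sample dominated by intermediate-frequency variants gives a positive D.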