Stepwise mutation model

The stepwise mutation model (SMM) is a mathematical model, developed by Motoo Kimura and Tomoko Ohta, for investigating the equilibrium distribution of allele frequencies in a finite population in which neutral alleles are produced in a stepwise fashion.

Description
The original model assumes that a mutation changes the state of an allele by a single repeat unit. Mutations in repetitive regions of the genome therefore increase or decrease the allele's length by the addition or subtraction of one repeat unit, at a fixed mutation rate, and allelic states can be expressed by integers (..., A-1, A, A+1, ...). The model also assumes random mating and that all alleles at each locus are selectively equivalent. The SMM is distinguished from the Kimura-Crow model, also known as the infinite alleles model (IAM), by its behavior as the population size increases to infinity while the product of Ne (the effective population size) and the mutation rate is held fixed: under the SMM, the mean number of different alleles in the population rapidly reaches a peak and plateaus, at which point its value is almost the same as the effective number of alleles.
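The contrast with the IAM can be made concrete with the standard equilibrium results for a diploid population, writing θ = 4Neμ for the scaled mutation rate: expected heterozygosity is θ/(1 + θ) under the IAM and 1 − 1/√(1 + 2θ) under the SMM, and the effective number of alleles is the reciprocal of expected homozygosity. The following is a minimal sketch under those assumptions; the function names are hypothetical.

```python
import math

def theta(ne: float, mu: float) -> float:
    """Scaled mutation rate theta = 4 * Ne * mu, assuming a diploid population."""
    return 4.0 * ne * mu

def heterozygosity_iam(t: float) -> float:
    """Equilibrium expected heterozygosity under the infinite alleles model."""
    return t / (1.0 + t)

def heterozygosity_smm(t: float) -> float:
    """Equilibrium expected heterozygosity under the stepwise mutation model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * t)

def effective_alleles(h: float) -> float:
    """Effective number of alleles: the reciprocal of expected homozygosity (1 - H)."""
    return 1.0 / (1.0 - h)

if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0):
        h_iam, h_smm = heterozygosity_iam(t), heterozygosity_smm(t)
        print(f"theta={t:5.1f}  H_IAM={h_iam:.3f}  H_SMM={h_smm:.3f}  "
              f"ne_IAM={effective_alleles(h_iam):.2f}  "
              f"ne_SMM={effective_alleles(h_smm):.2f}")
```

Because stepwise mutation only creates alleles adjacent to existing ones, the effective number of alleles grows as √(1 + 2θ) under the SMM rather than linearly (1 + θ) as under the IAM.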

Differences in the length of simple sequence repeats (SSRs) between individuals can thus be used to construct phylogenies (i.e. to determine the relatedness of individuals) or to estimate the genetic distance between groups of individuals. For example, more distantly related individuals would be expected to show larger differences in SSR length than more closely related individuals. Given its underlying assumptions, the SMM has been widely adopted for microsatellite markers, which consist of repeat regions, are co-dominant, and have high mutation rates.
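One distance measure built on this idea is the (δμ)² distance of Goldstein et al. (1995): the squared difference in mean repeat number between two samples, averaged over loci. Under the SMM its expectation increases linearly with the time since the two groups diverged, which is why squared rather than raw differences are used. The sketch below assumes allele sizes are already given as repeat counts; the data and names are hypothetical.

```python
from statistics import mean

# Allele sizes (repeat counts) sampled from one group, keyed by locus name.
Sample = dict[str, list[int]]

def delta_mu_squared(x: Sample, y: Sample) -> float:
    """(delta mu)^2: squared difference in mean repeat number between two samples,
    averaged over the loci typed in both samples."""
    loci = sorted(x.keys() & y.keys())
    if not loci:
        raise ValueError("the samples share no loci")
    return mean((mean(x[locus]) - mean(y[locus])) ** 2 for locus in loci)

# Hypothetical allele sizes for two groups at two loci.
pop_a = {"locus1": [10, 11, 11, 12], "locus2": [20, 20, 21, 21]}
pop_b = {"locus1": [13, 14, 14, 15], "locus2": [20, 21, 21, 22]}
print(delta_mu_squared(pop_a, pop_b))  # 4.625
```

The same function can be applied to a single diploid individual by passing its two alleles per locus.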

A number of summary statistics can be used to estimate genetic differentiation under the SMM. These include the number of alleles, observed and expected heterozygosity, and allele frequencies. The SMM also takes into account the distribution of pairwise differences in repeat number between alleles at microsatellite loci, that is, how often two alleles differ by zero, one, two or more repeat units. The variance in allele size is used to make inferences about the genetic distance between individuals or populations. By comparing summary statistics at different levels of organization, it is possible to make inferences about population history. For example, one can compare the variance in allele size within a subpopulation with the variance in the total population.
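The statistics above can be sketched for a single locus as follows. The last function is a simplified, unweighted form of Slatkin's (1995) R_ST, the share of allele-size variance lying between subpopulations, which compares within-subpopulation and total variance as described. This is an illustration only: expected heterozygosity is computed without a sample-size correction, and the data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pvariance

Genotype = tuple[int, int]  # two allele sizes (repeat counts) at one locus

def alleles(genotypes: list[Genotype]) -> list[int]:
    """Flatten diploid genotypes into a list of allele sizes."""
    return [a for g in genotypes for a in g]

def summary(genotypes: list[Genotype]) -> dict[str, float]:
    """Single-locus summary statistics for one (sub)population."""
    pool = alleles(genotypes)
    freqs = [c / len(pool) for c in Counter(pool).values()]
    return {
        "n_alleles": len(freqs),
        "h_obs": mean(1.0 if a != b else 0.0 for a, b in genotypes),
        "h_exp": 1.0 - sum(p * p for p in freqs),  # no small-sample correction
        "size_variance": pvariance(pool),
    }

def mismatch_distribution(genotypes: list[Genotype]) -> Counter:
    """Number of allele pairs differing by 0, 1, 2, ... repeat units."""
    return Counter(abs(a - b) for a, b in combinations(alleles(genotypes), 2))

def r_st(subpops: list[list[Genotype]]) -> float:
    """Simplified R_ST = (S_total - mean S_within) / S_total on allele-size variance."""
    s_total = pvariance([a for sub in subpops for a in alleles(sub)])
    s_within = mean(pvariance(alleles(sub)) for sub in subpops)
    return (s_total - s_within) / s_total

pop1 = [(10, 10), (10, 11), (11, 12)]
pop2 = [(14, 15), (15, 15), (15, 16)]
print(summary(pop1))
print(mismatch_distribution(pop1))  # Counter({1: 8, 0: 4, 2: 3})
print(round(r_st([pop1, pop2]), 3))
```

A value of R_ST near 1, as for these two hypothetical subpopulations, indicates that most allele-size variance lies between rather than within subpopulations.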

Construction of phylogenies under the SMM is, however, complicated by the fact that a repeat unit can be either gained or lost, so alleles that are identical in size are not necessarily identical by descent (i.e. they show size homoplasy). Therefore, the SMM cannot be used to determine the exact number of mutational events separating two individuals. For example, individual A might have gained a single repeat (from an ancestor with 9 repeats) whereas individual B might have lost a single repeat (from an ancestor with 11 repeats), so that both individuals carry the same number of repeats (10) at that locus.
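Size homoplasy can be illustrated by treating each mutation as a symmetric step of plus or minus one repeat unit and letting two lineages descend from a common ancestor. The lineage count, the number of mutations per lineage (k), the ancestral size and the random seed below are arbitrary assumptions chosen for illustration.

```python
import random

def mutate(size: int, rng: random.Random) -> int:
    """One SMM mutation: gain or lose a single repeat unit with equal probability."""
    return size + rng.choice((-1, 1))

def evolve(ancestor: int, n_mutations: int, rng: random.Random) -> int:
    """Apply n_mutations stepwise mutations to an ancestral repeat count."""
    size = ancestor
    for _ in range(n_mutations):
        size = mutate(size, rng)
    return size

rng = random.Random(1)
ancestor = 10
trials = 10_000
k = 6  # mutations accumulated along each of the two descendant lineages
same_size = 0
squared_diffs = []
for _ in range(trials):
    a, b = evolve(ancestor, k, rng), evolve(ancestor, k, rng)
    if a == b:
        same_size += 1  # identical in size despite 2k mutations between them
    squared_diffs.append((a - b) ** 2)

print(f"{2 * k} mutations separate the lineages, yet sizes match in "
      f"{same_size / trials:.1%} of trials")
print(f"mean squared size difference: {sum(squared_diffs) / trials:.2f} "
      f"(expected {2 * k})")
```

Even though twelve mutations separate the two lineages, their sizes coincide in roughly a fifth of trials. The squared size difference does, however, track the number of mutations on average, which is the property that SMM-based distance measures rely on.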

Limitations
Some important caveats and limitations to consider when choosing molecular markers for estimating the relatedness of individuals or distinguishing between populations include the following:
 * 1) Each marker type has its own limitations, and the number of markers used can strongly influence analytical results (a larger number of markers generally gives greater power to resolve genetic differences).
 * 2) Molecular markers provide only a "sample" of the genetic information with which to compare individuals or populations, and can differ from actual genetic differentiation. For example, two individuals may be identical at a given locus, having inherited the same allele from a common ancestor, yet differ at other loci that were not genotyped (or sequenced).
 * 3) Null alleles (alleles that fail to amplify in PCR, for example because of a mutation in a primer-binding site) are not accounted for by the plain SMM. Individuals heterozygous for a null allele are scored as homozygotes, which produces an apparent heterozygote deficit and can seriously bias results.
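Because a null allele makes heterozygotes look like homozygotes, one simple screen compares observed heterozygosity (Ho) with the Hardy-Weinberg expectation (He). The sketch below computes both for one locus, together with a null-allele frequency estimate of the form r = (He − Ho)/(1 + He), as in Brookfield (1996). It is only an illustration: a heterozygote deficit can also arise from inbreeding or population substructure, and the sample data are hypothetical.

```python
from collections import Counter

def heterozygosities(genotypes: list[tuple[int, int]]) -> tuple[float, float]:
    """Observed and expected (Hardy-Weinberg) heterozygosity at one locus."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    h_exp = 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())
    h_obs = sum(a != b for a, b in genotypes) / len(genotypes)
    return h_obs, h_exp

def null_allele_estimate(genotypes: list[tuple[int, int]]) -> float:
    """r = (He - Ho) / (1 + He); values well above 0 flag a heterozygote deficit."""
    h_obs, h_exp = heterozygosities(genotypes)
    return (h_exp - h_obs) / (1.0 + h_exp)

# Hypothetical scored genotypes: an excess of apparent homozygotes.
sample = [(10, 10), (12, 12), (14, 14), (10, 12), (12, 14), (10, 10)]
print(heterozygosities(sample))
print(round(null_allele_estimate(sample), 3))
```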

Extensions
The original SMM has been modified in multiple ways to deal with these shortcomings, including:
 * 1) taking into account the upper size limit of most microsatellites
 * 2) accounting for the tendency of larger alleles to mutate at higher rates than smaller alleles
 * 3) including variants in which mutations are split between point mutations that interrupt stretches of repeats and the addition or removal of repeat units. This last assumption helps explain why microsatellites do not evolve into enormous arrays of unlimited size.
 * 4) Piry et al. (1999) introduced Bottleneck, a program that uses the SMM and related mutation models to detect recent reductions in effective population size.
 * 5) Van Oosterhout et al. (2004) introduced micro-checker, which rapidly became widely used for identifying and correcting common microsatellite genotyping errors: null alleles, preferential dropout of large alleles, mis-scoring of stutter peaks, and typographical errors.
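The first two modifications can be combined in a simple simulation. In the sketch below, allele size is confined between a lower and an upper bound (steps that would leave the range are reflected back inside), and the per-generation mutation probability rises linearly with repeat count. The linear size dependence, the reflecting boundaries, and all parameter names and values (base_rate, slope, min_size, max_size) are illustrative assumptions, not the formulation of any particular published model or of Bottleneck or micro-checker.

```python
import random

def mutation_prob(size: int, base_rate: float, slope: float) -> float:
    """Hypothetical size-dependent mutation probability per generation:
    increases linearly with repeat count and is capped at 1."""
    return min(1.0, base_rate * (1.0 + slope * size))

def step(size: int, rng: random.Random, *, base_rate: float = 1e-3,
         slope: float = 0.0, min_size: int = 1, max_size: int = 50) -> int:
    """One generation of a bounded, size-dependent stepwise mutation model."""
    if rng.random() >= mutation_prob(size, base_rate, slope):
        return size  # no mutation this generation
    new_size = size + rng.choice((-1, 1))
    # Reflect a step that would leave the allowed range back inside it.
    if new_size > max_size:
        new_size = max_size - 1
    elif new_size < min_size:
        new_size = min_size + 1
    return new_size

rng = random.Random(42)
size = 20
trajectory = []
for _ in range(100_000):
    size = step(size, rng, base_rate=0.01, slope=0.05, min_size=5, max_size=30)
    trajectory.append(size)
print(min(trajectory), max(trajectory))
```

With the bounds in place, repeated stepwise mutation can no longer drive an allele to arbitrarily large sizes, and larger alleles within the range mutate more often than smaller ones.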