HKA test

The HKA test, named after Richard R. Hudson, Martin Kreitman, and Montserrat Aguadé, is a statistical test used in genetics to evaluate the predictions of the neutral theory of molecular evolution. By comparing the polymorphism within each species with the divergence between two species at two or more loci, the test assesses whether the observed pattern is consistent with neutral evolution or instead points to selection. Developed in 1987, the HKA test is a precursor to the McDonald-Kreitman test, which was published in 1991. The HKA test is best suited to detecting balancing selection, recent selective sweeps, or other forces that alter the level of variation at a locus.

Neutral Evolution
The neutral theory of molecular evolution, first proposed by Kimura in a 1968 paper and later set out in full in 1983, is the basis for many statistical tests that detect selection at the molecular level. Kimura noted that the rate of change in the genome, and the resulting level of polymorphism, was far too high to be driven strictly by directional selection. Furthermore, functionally less important regions of the genome evolve at a faster rate. Kimura therefore postulated that most changes to the genome are neutral or nearly neutral and spread by random genetic drift. Under the neutral model, polymorphism within a species and divergence between related species at homologous sites are therefore expected to be highly correlated. The neutral theory has become the null model on which tests for selection are based, and departures from this model can be explained by selection.

Formulae
The population-scaled mutation rate is θ = 4Neμ, where Ne is the effective population size and μ is the mutation rate (substitutions per site per unit of time). It can be estimated from the number of polymorphic sites in a sample using the Watterson estimator. Hudson et al. proposed using these quantities in a chi-squared goodness-of-fit test.
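
As a rough illustration of how θ is estimated in practice, the sketch below implements the Watterson estimator in its usual form, θ_W = S / a_n, where S is the number of polymorphic (segregating) sites in a sample of n sequences and a_n = 1 + 1/2 + ... + 1/(n-1). The function name and the input counts are hypothetical and chosen only for illustration.

```python
def watterson_theta(segregating_sites: int, sample_size: int) -> float:
    """Estimate theta = 4*Ne*mu for a locus from S segregating sites
    observed in a sample of n sequences (Watterson's estimator)."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    if segregating_sites < 0:
        raise ValueError("segregating_sites must be non-negative")
    # a_n is the (n-1)th harmonic number: 1 + 1/2 + ... + 1/(n-1)
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / a_n


# Hypothetical data: 10 polymorphic sites among 5 sampled sequences.
theta_hat = watterson_theta(10, 5)
print(f"Estimated theta for the locus: {theta_hat:.3f}")
```

This gives θ for the whole locus; dividing by the number of sites in the locus gives a per-site value.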

The test statistic proposed by Hudson et al., X², is:
 * $$X^2=\sum_{i=1}^L\frac{(S_i^A-\widehat{E}(S_i^A))^2}{\widehat{\mathrm{Var}}(S_i^A)}+\sum_{i=1}^L\frac{(S_i^B-\widehat{E}(S_i^B))^2}{\widehat{\mathrm{Var}}(S_i^B)}+\sum_{i=1}^L\frac{(D_i-\widehat{E}(D_i))^2}{\widehat{\mathrm{Var}}(D_i)}$$

Here L is the number of loci, of which there must be at least two. In the first sum, the observed number of polymorphic sites at locus i in sample A is compared with its estimated expectation: the difference is squared and divided by the estimated variance, and the results are summed over all loci. The second sum applies the same calculation to sample B (from the other species), and the third applies it to the divergence D_i between the two species. The total of these three sums is the test statistic X². If the polymorphism within species A, the polymorphism within species B, and the divergence between them are all independent, the test statistic approximately follows a chi-squared distribution with 2L − 2 degrees of freedom.
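
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Hudson et al.'s full procedure: it assumes the expected values and variances have already been estimated under the neutral model, and it takes the degrees of freedom to be 2L − 2 (3L observations minus L + 2 estimated parameters). All function names and numbers are hypothetical. Because 2L − 2 is always even, the upper-tail chi-squared probability has a closed form and needs only the standard library.

```python
import math


def chi_square_component(rows):
    """Sum (observed - expected)^2 / variance over loci.
    rows is a list of (observed, expected, variance) tuples, one per locus."""
    return sum((obs - exp) ** 2 / var for obs, exp, var in rows)


def hka_statistic(poly_a, poly_b, divergence):
    """Add the components for polymorphism in A, polymorphism in B,
    and divergence between the two species."""
    if not (len(poly_a) == len(poly_b) == len(divergence)):
        raise ValueError("all three inputs need one entry per locus")
    if len(poly_a) < 2:
        raise ValueError("the test needs at least two loci")
    return sum(chi_square_component(r) for r in (poly_a, poly_b, divergence))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical (observed, expected, variance) values for L = 2 loci.
poly_a = [(10, 8.0, 16.0), (3, 5.0, 8.0)]
poly_b = [(9, 8.0, 16.0), (4, 5.0, 8.0)]
divergence = [(20, 22.0, 32.0), (14, 12.0, 16.0)]

x2 = hka_statistic(poly_a, poly_b, divergence)
df = 2 * len(poly_a) - 2
p_value = chi_square_sf_even_df(x2, df)
print(f"X^2 = {x2:.4f}, df = {df}, p = {p_value:.4f}")
```

A large p-value, as with these made-up numbers, means the data give no reason to reject the neutral model. The hard part of a real analysis, estimating the expectations and variances from the data, is deliberately left out of this sketch.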

For a simple explanation, let D1 be the divergence between species at locus one, measured as the number of fixed differences, and let D2 be the divergence at locus two. Let P1 and P2 be the number of polymorphic sites at loci one and two, respectively (a measure of polymorphism within species). If both loci evolve neutrally, the two ratios are expected to be approximately equal: D1/D2 = P1/P2.
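
The ratio comparison can be made concrete with a small Python sketch. The counts below are hypothetical, and comparing the two ratios by eye is only an informal check; it does not replace the chi-squared test.

```python
def neutral_ratios(d1, d2, p1, p2):
    """Return (D1/D2, P1/P2) so the two ratios can be compared.
    D = fixed differences between species, P = polymorphic sites within species."""
    if d2 == 0 or p2 == 0:
        raise ValueError("locus two needs non-zero counts to form a ratio")
    return d1 / d2, p1 / p2


# Hypothetical counts where both ratios agree, as expected under neutrality.
div_ratio, poly_ratio = neutral_ratios(d1=30, d2=10, p1=12, p2=4)
print(f"D1/D2 = {div_ratio:.2f}, P1/P2 = {poly_ratio:.2f}")

# Hypothetical counts where locus two has equal divergence but far less polymorphism.
div_ratio, poly_ratio = neutral_ratios(d1=30, d2=30, p1=12, p2=3)
print(f"D1/D2 = {div_ratio:.2f}, P1/P2 = {poly_ratio:.2f}")
```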

Example
For these examples, the distance between the two species at a locus is measured as the number of substitutions per site when the sequences from the two species are compared. We can then calculate the rate of mutation (changes to the DNA sequence per unit of time) if we know the time since the two species diverged from their common ancestor.
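
As a worked illustration of that last step, the Python sketch below converts a distance into a rate. It assumes that substitutions accumulate at a constant rate along both lineages since the split, so the distance d reflects 2T units of evolutionary time and the rate is d / (2T). It also ignores polymorphism already present in the ancestor and multiple substitutions at the same site. The function name and the numbers are hypothetical.

```python
def substitution_rate(distance_per_site: float, divergence_time: float) -> float:
    """Rate per site per unit time, assuming a constant rate on both lineages.
    The two lineages together span 2 * divergence_time."""
    if divergence_time <= 0:
        raise ValueError("divergence_time must be positive")
    return distance_per_site / (2.0 * divergence_time)


# Hypothetical values: 0.02 substitutions per site, split 5 million years ago.
rate = substitution_rate(0.02, 5e6)
print(f"Estimated rate: {rate:.2e} substitutions per site per year")
```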

A test that suggests neutral evolution: Suppose that you have data from two loci (1 and 2) in two species (A and B). Locus 1 shows high divergence and high polymorphism in both species. Locus 2 shows low divergence and low polymorphism. This pattern can be explained by a neutral difference in mutation rate between the two loci.



A test that suggests selection: Suppose again that you have data as in the previous example, only this time locus 2 shows the same divergence as locus 1 and yet lower polymorphism in species B. Equal divergence implies that the mutation rate is the same at both loci, so a difference in mutation rate cannot account for the lower polymorphism. Instead, the pattern points to a reduction in the effective population size Ne at locus 2 in species B, which is interpreted as evidence of selection, for example a recent selective sweep.