User:Arbor/Sandbox

Lewontin’s argument
Lewontin based his argument on blood type samples, but it frequently appears in terms of genes. Edward's gives an example using a simple model where one of two “genes” can appear at a specified locus. In the example in Figure 1, the two populations (“left” and “right”, corresponding to stipulated races) contain either the “red gene” or the “blue gene”. The red gene appears somewhat more often in the left population (70% of the left individuals contain the red gene), and not as often in the right population (30% of the right individuals contain the red gene). Combining both populations into one, we see that the total population is split between red and blue genes (50% of the individuals have the red gene).

Lewontin’s argument basically says that this genetic locus is a poor indicator of which population an individual belongs to: If we use the rule that “the blue gene implies the left race” then we will misclassify 6 of the 20 individuals in the example—the 3 red-gene individuals in the left race and the 3 blue-green individuals in the right race. In general we have a chance of misclassification of 30%, too high to be of taxonomic value.

Another way to express this is by considering variability instead of the probability of misclassification. In the example, the gene varies strongly in the whole population. (Indeed, the population is exactly split into red and green individuals.) But it also varies a lot among individuals within the same population. Contrast this with a hypothetical situation like Figure 2, where race is determined by the gene: two randomly picked individuals from the same race always have the same gene.

Statisticians express this in terms of variance, a number between 0 and 1. The variance of the red gene in the total population is 1-1/2=0.25. The variance of the red gene in the left race is only slightly lower, 0.21. (The numbers are picked so that the variance is the same, 0.21, in the right race, for simplicity.) Expressing the within-population variance as a fraction of the total variance, we arrive at 0.21/0.25=84%. In other words, the model is an example of two populations where the within-population variance “accounts for” 84% of the total variability, just as in Lewontin’s study.

The fallacy
The underlying assumption in Lewontin’s analysis is that racial classifications are based on a single locus. However, the traditional classification into races is based on the correlation between several features. (Discussion from Dawkins.) Edwards:
 * The statistical problem has been understood at least since the discussions surrounding Pearson’s ‘coefficient of racial likeness’ (8) in the 1920s. It is mentioned in all editions of Fisher’s Statistical Methods for Research Workers (1) from 1925 (quoted above). A useful review is that by Gower (9) in a 1972 conference volume The Assessment of Population Affinities in Man. As he pointed out, ‘‘...the human mind distinguishes between different groups because there are correlated characters within the postulated groups.’’