Talk:Tajima's D

I wrote an introduction
I wrote an introduction; let me know if it provides sufficient context. Jlrflores 22:25, 31 May 2007 (UTC)

In the scientific explanation section, I think it would be clearer if you could define the variables in the equations. Thanks. Fnunes 18:06, 29 October 2007 (UTC)

I changed a lot about the significance and removed discussion of quantities that don't exist in this test, e.g. confidence intervals. —Preceding unsigned comment added by 140.247.90.35 (talk) 22:27, 28 October 2009 (UTC)

Violation source
The seminal paper by Tajima in 1989 (Genetics 123: 585-595) states in the ABSTRACT "... namely the genetic variation within population at the DNA level." This means sampled populations should be tested for the existence of structure. When structure is significant, the test should be independently applied to each subpopulation.

The Wikipedia article suggests structure is not a common or important problem, but it should stress the need to test for structure before applying the test.--Tribu13 (talk) 17:51, 16 January 2008 (UTC)

In many ways this is a test for structure, or for that matter for any deviation from the neutral theory. What do you think it is testing for? Evolvedmicrobe (talk) 03:41, 19 October 2011 (UTC)

I had to make a lot more changes: the standard deviation discussion and the description of how to find significance were way off. This article trivializes non-trivial details and is downright incorrect on how significance is assessed. I updated this. Evolvedmicrobe (talk) 03:40, 19 October 2011 (UTC)

very well written explanation
This is the way many of the articles on statistics and population genetics should be written. It makes so much more sense than just formulas! Thanks! Veryhuman (talk) 00:02, 12 February 2008 (UTC)

It was factually inaccurate, though. I updated it to correct things, possibly at the cost of readability. It needs another run-through, and the example of history needs to be connected with the article. —Preceding unsigned comment added by 140.247.90.35 (talk) 23:00, 28 October 2009 (UTC)

This article is not very readable. While examples are nice and necessary, the article lacks encyclopedic style and is not concise. The example should be moved to the end of the article. Sboehringer (talk) 09:49, 4 March 2008 (UTC)

Mathematical details
I agree that the style should be improved to make it more encyclopedic, but this problem seems to be endemic to articles on population genetical topics. However, I have a minor quibble about the mathematical details section. As far as I remember from reading Tajima's 1989 article,

$$e_1 S + e_2 S(S-1)$$ is not a precise expression for V(d), but rather an estimate, which is used because the precise expression,

$$c_1 M + c_2 M^2$$ requires that M = 4Nu is known. Since this is not really feasible with data from real populations (as opposed to simulations), the estimate is used. So shouldn't one use an approximately equals sign? Or at least mention that it is an estimate. It might not be a bad idea to include the precise expression. Dontnod (talk) 21:53, 2 August 2010 (UTC)
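
For reference, here is a sketch of how the two expressions in the comment above relate. It uses the same symbols; a_1, a_2 and n (the number of sequences sampled) are assumed to follow the usual definitions, and the whole thing should be checked against the 1989 paper before it goes into the article. The exact variance depends on the unknown M, while the estimate replaces M and M^2 with quantities computed from S:

$$V(d) = c_1 M + c_2 M^2, \qquad M = 4Nu$$

$$\hat{V}(d) = e_1 S + e_2 S(S-1), \qquad e_1 = \frac{c_1}{a_1}, \quad e_2 = \frac{c_2}{a_1^2 + a_2}$$

$$a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad a_2 = \sum_{i=1}^{n-1} \frac{1}{i^2}$$

The substitution works because S/a_1 has expectation M and S(S-1)/(a_1^2 + a_2) has expectation M^2 under the neutral model, which is why the second line is an estimate of V(d) rather than an identity.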

To do
This article could be useful to many, but I think it needs some fixin. Here is a list of things I noticed:
 * add inline citations
 * fix "scientific explanation" section which gives the wrong impression that D = pi - S
 * fix the introduction which gives the same wrong impression.
 * make the introduction more concise

I think the person who wrote this was confused about what S is. S is the number of segregating sites, which is 4 in the toy example. Tajima's D is based on the expectation that S = theta * x, where x is the sum of 1/i for i from 1 to N. Thus, we turn this into a method to estimate theta by noting that theta = E(S)/x. The current version suggests that the S/x part is a "normalized" version of segregating sites, and this leads to a mistake in the calculation of D in the example. You should take the 4 and divide it by x, which is the sum of 1/i where i goes from 1 to 10. —Preceding unsigned comment added by Dabs (talk • contribs) 18:56, 5 October 2010 (UTC)

As the above commenter points out, the toy example in the article is incorrect: S needs to be divided by x. For consistency with the rest of the article, however, I think that the variable should be called a1, not x. Also, the sum should be from 1 to N-1, not from 1 to N. I also don't think that N in this case is 10. While every individual sampled should have two chromosomes, in the example it appears as though only one sequence was obtained from each person (and thus N=5). It would be a bit more realistic if two sequences were obtained from each person (or perhaps the example should be described in terms of number of sequences sampled, not individuals). —Preceding unsigned comment added by 205.208.52.129 (talk) 22:50, 16 December 2010 (UTC)
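
To make the arithmetic in the two comments above concrete, here is a minimal Python sketch. It takes S = 4 from the toy example and tries both sample sizes under discussion (5 and 10 sequences), with the sum running from 1 to n-1. The function names are made up for illustration, and the constants in `tajimas_d` are the standard ones as I understand them, so treat this as a check on the numbers rather than a reference implementation.

```python
import math


def harmonic_terms(n):
    """Return (a1, a2) for n sampled sequences; both sums run from i = 1 to n - 1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    return a1, a2


def watterson_theta(S, n):
    """Estimate theta from the number of segregating sites: S / a1."""
    a1, _ = harmonic_terms(n)
    return S / a1


def tajimas_d(pi, S, n):
    """Tajima's D from mean pairwise differences (pi), segregating sites (S), and n sequences.

    Assumes the standard constants; the estimated variance is zero
    for n < 4, so those sample sizes are rejected.
    """
    if n < 4 or S < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1, a2 = harmonic_terms(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pi minus S/a1 (not pi minus S); denominator: estimated standard deviation.
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    S = 4  # segregating sites in the toy example
    for n in (5, 10):  # the two sample sizes debated above
        print(n, round(watterson_theta(S, n), 4))
```

With S = 4 this gives S/a1 = 1.92 for n = 5 and roughly 1.41 for n = 10, so the choice of n matters for the example. Passing pi to `tajimas_d` is left to the reader, because the pairwise-difference value depends on which version of the toy data is used.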