Talk:Exact test

Correctness?
I raise two doubts about this article:
 * Is it really appropriate to restrict "exact test" to nonparametric tests, or is this just some convention of those working in the field which it would be best not to impose on others?
 * Is it appropriate to phrase the test as summing over those outcomes with lowest probabilities, or should it be summing the probabilities of outcomes which are as extreme as, or more extreme than, the observed one?
 * Melcombe (talk) 10:34, 13 May 2008 (UTC)


 * Regarding (1): it should not be restricted to nonparametric tests. The current article, though, no longer has this issue. Tal Galili (talk) 08:11, 31 July 2019 (UTC)

Can someone check this detail?
I've added this paragraph:
 * A simple example of the occasion for this concept may be seen by observing that Pearson's chi-squared test is an approximate test. Suppose Pearson's chi-squared test is used to ascertain whether a six-sided die is "fair", i.e. gives each of the six outcomes equally often. If the die is thrown n times, then one "expects" to see each outcome n/6 times. The test statistic is
 * $$ \sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}} = \sum_{k=1}^6 \frac{(X_k - n/6)^2}{n/6}, $$
 * where $X_k$ is the number of times outcome k is observed. If the null hypothesis of "fairness" is true, then the probability distribution of the test statistic can be made as close as desired to the chi-squared distribution with 5 degrees of freedom by making the sample size n big enough. But if n is small, then the probabilities based on chi-squared distributions may not be very close approximations. Finding the exact probability that this test statistic exceeds a certain value then requires combinatorial enumeration of all outcomes of the experiment that result in such a large value of the test statistic. Moreover, it becomes questionable whether the same test statistic ought to be used. A likelihood-ratio test might be preferred as being more powerful, and the test statistic might not be a monotone function of the one above.

Notice that at the end I say "the test statistic might not be a monotone function of the one above". If I were well-versed in this particular problem, I'd know whether it is or is not. If someone knows that, could they add the appropriate information in place of that last sentence? Michael Hardy (talk) 20:06, 16 May 2008 (UTC)
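
The "combinatorial enumeration" mentioned in the quoted paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the article: the function names (`compositions`, `multinomial_coefficient`, `chi_squared`, `exact_p_value`) are hypothetical, the die is assumed fair under the null hypothesis, and the p-value is taken to be the probability that Pearson's statistic is at least as large as the observed value.

```python
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_coefficient(counts):
    """Number of throw sequences that produce the given count vector."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


def chi_squared(counts):
    """Pearson's statistic for equal expected counts, as an exact fraction."""
    n, k = sum(counts), len(counts)
    expected = Fraction(n, k)
    return sum((c - expected) ** 2 for c in counts) / expected


def exact_p_value(observed):
    """Exact P(statistic >= observed statistic) under a fair die (assumed null)."""
    n, k = sum(observed), len(observed)
    threshold = chi_squared(observed)
    # Count the equally likely throw sequences whose statistic reaches the threshold.
    favourable = sum(
        multinomial_coefficient(x)
        for x in compositions(n, k)
        if chi_squared(x) >= threshold
    )
    return Fraction(favourable, k ** n)
```

For example, `exact_p_value((2, 0, 0, 0, 0, 0))` gives 1/6: with two throws the statistic reaches its maximum of 10 exactly when both throws show the same face. Exact fractions are used so that ties in the statistic are compared without rounding error; the enumeration grows quickly with n, so this is only practical for small samples.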


 * I hope I got this right. Consider two possible outcomes for sample size n = 7:
 * X = (0, 0, 1, 1, 1, 4) – once a 3, a 4 and a 5 are observed each, and four times a 6;
 * X = (0, 0, 0, 2, 2, 3) – twice a 4 and a 5 are observed each, and thrice a 6.
 * The first has
 * $$\chi^2 = \frac{2\times(0-\tfrac{7}{6})^2+3\times(1-\tfrac{7}{6})^2+1\times(4-\tfrac{7}{6})^2}{\tfrac{7}{6}} = \frac{65}{7},$$
 * and the second
 * $$\chi^2 = \frac{3\times(0-\tfrac{7}{6})^2+2\times(2-\tfrac{7}{6})^2+1\times(3-\tfrac{7}{6})^2}{\tfrac{7}{6}} = \frac{53}{7}.$$
 * So the first outcome, having the larger $\chi^2$, is the one for which the null hypothesis of fairness is sooner rejected.
 * Under the null hypothesis of fairness, the first outcome has likelihood $C\times(1/6)^7$, where C is the number of different ways of getting this X with 7 throws. In the parameter space, the outcome X is most likely if $P(k) = X_k/n$, giving for the supremum $C\times(1/7)^3(4/7)^4$. So the likelihood ratio for the first is:
 * $$\Lambda(X) = \frac{\left(\tfrac{1}{6}\right)^7}{\left(\tfrac{1}{7}\right)^3\left(\tfrac{4}{7}\right)^4} = \frac{823543}{71663616}.$$
 * Likewise, we find for the second:
 * $$\Lambda(X) = \frac{\left(\tfrac{1}{6}\right)^7}{\left(\tfrac{2}{7}\right)^4\left(\tfrac{3}{7}\right)^3} = \frac{823543}{120932352}.$$
 * Using this test statistic, the second outcome, having the smaller likelihood ratio, is the one for which the null hypothesis is sooner rejected.
 * For n up to 6 I haven't found any such "crossovers", and this is one of two I could construct for n = 7. --Lambiam 22:41, 20 May 2008 (UTC)
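
The arithmetic in the n = 7 example above can be checked with exact rational arithmetic. The following is a minimal sketch, assuming a fair die under the null hypothesis; the helper names `chi_squared` and `likelihood_ratio` are hypothetical and chosen only for this illustration. The multinomial factor C is omitted because it cancels in the ratio.

```python
from fractions import Fraction


def chi_squared(counts):
    """Pearson's statistic for equal expected counts, as an exact fraction."""
    n, k = sum(counts), len(counts)
    expected = Fraction(n, k)
    return sum((c - expected) ** 2 for c in counts) / expected


def likelihood_ratio(counts):
    """Null likelihood divided by the supremum over all P(k); C cancels."""
    n, k = sum(counts), len(counts)
    null_likelihood = Fraction(1, k) ** n
    supremum = Fraction(1)
    for c in counts:
        if c:  # a zero count contributes a factor of 1
            supremum *= Fraction(c, n) ** c
    return null_likelihood / supremum


first = (0, 0, 1, 1, 1, 4)
second = (0, 0, 0, 2, 2, 3)

# Larger chi-squared means stronger evidence against fairness.
print(chi_squared(first), chi_squared(second))
# Smaller likelihood ratio means stronger evidence against fairness.
print(likelihood_ratio(first), likelihood_ratio(second))
```

The two statistics order the outcomes differently: `chi_squared(first) > chi_squared(second)`, while `likelihood_ratio(second) < likelihood_ratio(first)`, which is the "crossover" described above.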

Thank you. Just the sort of thing that was needed and that I was too lazy to work out the details of. Now we should think about how to incorporate this information into the article. Michael Hardy (talk) 18:20, 30 May 2008 (UTC)

Incorrect Reference
It appears that there is an incorrect reference. Based on my checking of the Journal, the following reference does not exist (or is incorrectly referenced):
 * Mehta, C. R. & Patel, N. R. 1997. "Exact inference in categorical data". Biometrics, 53(1), 112-117.

It is possible the reference should be as follows:
 * Mehta, C.R. and N.R. Patel, 1998. Exact Inference for Categorical Data. In P. Armitage and T. Colton, eds., Encyclopedia of Biostatistics (Chichester: John Wiley), pp. 1411-1422.

Can anyone confirm? Mari370 (talk) 14:37, 3 November 2010 (UTC) —Preceding unsigned comment added by Mari370 (talk • contribs)


 * I have updated the citations. Interestingly, the first form does appear in some publications, but other cites are for the preprint now replacing what was there. There seems to have been an updated version of the Encyclopedia of Biostatistics that has additional authors listed, so presumably the contents are different: presently all 3 are listed in the article. Melcombe (talk) 21:09, 14 April 2012 (UTC)

Error Rate of the Test
I believe the following sentence (from the opening paragraph) requires editing:


 * "This will result in a significance test that will have a false rejection rate always equal to the significance level of the test."

1) It's not clear what the writer meant by "false rejection rate". I presume s/he was referring to the Type I error rate. In this case, the appropriate change should be made, with a wiki-link.

2) However, I found a reference, cited below, which seems to claim that the type I error rate does not conceptually apply to the permutation test. I will research this further. Any comments appreciated.


 * Conceptual Distinction between the Critical p Value and the Type I Error Rate in Permutation Testing

Brad (talk) 22:45, 6 March 2016 (UTC)


 * I agree. I've made changes as you mention in item (1). I didn't go into item (2) though. Tal Galili (talk) 08:01, 31 July 2019 (UTC)