Minimum chi-square estimation

In statistics, minimum chi-square estimation is a method of estimation of unobserved quantities based on observed data. 

In certain chi-square tests, one rejects a null hypothesis about a population distribution if a specified test statistic is too large, when that statistic would have approximately a chi-square distribution if the null hypothesis is true. In minimum chi-square estimation, one finds the values of parameters that make that test statistic as small as possible.

Among the consequences of its use is that the test statistic actually does have approximately a chi-square distribution when the sample size is large. Generally, one reduces by 1 the number of degrees of freedom for each parameter estimated by this method.

Illustration via an example
Suppose a certain random variable takes values in the set of non-negative integers 1, 2, 3,. . . . A simple random sample of size 20 is taken, yielding the following data set. It is desired to test the null hypothesis that the population from which this sample was taken follows a Poisson distribution.



\begin{array}{cc} \text{value} & \text{frequency} \\ \hline 0 & 1 \\ 1 & 2 \\ 2 & 4 \\ 3 & 5 \\ 4 & 3 \\ 5 & 3 \\ 6 & 1 \\ 7 & 0 \\ 8 & 1 \\ >8 & 0 \end{array} $$

The maximum likelihood estimate of the population average is 3.3. One could apply Pearson's chi-square test of whether the population distribution is a Poisson distribution with expected value 3.3. However, the null hypothesis did not specify that it was that particular Poisson distribution, but only that it is some Poisson distribution, and the number 3.3 came from the data, not from the null hypothesis. A rule of thumb says that when a parameter is estimated, one reduces the number of degrees of freedom by 1, in this case from 9 (since there are 10 cells) to 8. One might hope that the resulting test statistic would have approximately a chi-square distribution when the null hypothesis is true. However, that is not in general the case when maximum-likelihood estimation is used. It is however true asymptotically when minimum chi-square estimation is used.

Finding the minimum chi-square estimate
The minimum chi-square estimate of the population mean λ is the number that minimizes the chi-square statistic



\sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}} = \sum_{k=0}^8 \frac{\left((\text{count in cell }k)- 20\left(\frac{\lambda^ke^{-\lambda}}{k!}\right)\right)^2}{20\left(\frac{\lambda^ke^{-\lambda}}{k!}\right)} + \frac{(0-a)^2}{a} $$

where a is the estimated expected number in the "> 8" cell, and "20" appears because it is the sample size. The value of a is 20 times the probability that a Poisson-distributed random variable exceeds 8, and it is easily calculated as 1 minus the sum of the probabilities corresponding to 0 through 8. By trivial algebra, the last term reduces simply to a. Numerical computation shows that the value of λ that minimizes the chi-square statistic is about 3.5242. That is the minimum chi-square estimate of λ. For that value of λ, the chi-square statistic is about 3.062764. There are 10 cells. If the null hypothesis had specified a single distribution, rather than requiring λ to be estimated, then the null distribution of the test statistic would be a chi-square distribution with 10 − 1 = 9 degrees of freedom. Since λ had to be estimated, one additional degree of freedom is lost. The expected value of a chi-square random variable with 8 degrees of freedom is 8. Thus the observed value, 3.062764, is quite modest, and the null hypothesis is not rejected.