User:Randy.l.goodrich/sandbox

Article Evaluation
The article obviously needs help, hence the exercise.


 * Read the original article published by Neyman in 1937 and try to expand upon what has been published.
 * What is the Neyman construction? Formal definition. - Frequentist confidence intervals
 * Can we find other citations not published by Jerzy Neyman to support his claim? - Cannot find many other publications to cite

Links
User:Randy.l.goodrich/citing sources

User:Reic2482/sandbox Kylie's Sandbox

Neyman Construction
'''Note to the reviewer: This obviously still needs a lot of work. The subject is turning out to be more difficult than I originally thought. Please add input/ideas on how we can improve it. Thank you!'''

In 1937 Jerzy Neyman proposed a frequentist method to construct an interval at a confidence level $$C$$, such that if we repeat the experiment many times the interval will contain the true value of some parameter a fraction $$C$$ of the time.

Theory
Assume $$X_{1},X_{2},\ldots,X_{n}$$ are random variables with joint pdf $$f(x_{1},x_{2},\ldots,x_{n} | \theta_{1},\theta_{2},\ldots,\theta_{k})$$, which depends on $$k$$ unknown parameters. For convenience, let $$\Theta$$ be the sample space defined by the $$n$$ random variables, and define a sample point in the sample space as $$X=(X_{1},X_{2},\ldots,X_{n})$$.

Neyman originally proposed defining two functions $$L(x)$$ and $$U(x)$$ such that for any sample point $$X$$,
 * $$L(X)\leq U(X)$$ $$\forall X\in\Theta$$
 * $$L$$ and $$U$$ are single valued and defined.

Given an observation $$X'$$, the probability that $$\theta_{1}$$ lies between $$L(X')$$ and $$U(X')$$ is $$P(L(X')\leq\theta_{1}\leq U(X') | X')$$, which can only equal $$0$$ or $$1$$. Under the frequentist construct the model parameters are unknown constants and not permitted to be random variables, so once the limits are computed from the observation nothing random remains. For example, if the computed limits are 2 and 10 and $$\theta_{1}=5$$, then $$P(2 \leq 5\leq 10)=1$$. Likewise, if $$\theta_{1}=11$$, then $$P(2 \leq 11 \leq 10)=0$$. Such a probability therefore fails to draw meaningful inference about $$\theta_{1}$$.

As Neyman describes in his 1937 paper, suppose instead that we consider all points in the sample space, that is, $$\forall X\in\Theta$$, as a system of random variables defined by the joint pdf described above. Since $$L$$ and $$U$$ are functions of $$X$$, they too are random variables, and one can examine the probability that $$L(X)$$ and $$U(X)$$ bracket the parameter. If $$\theta_{1}'$$ is the true value of $$\theta_{1}$$, we can define $$L$$ and $$U$$ such that the probability that $$L(X) \leq\theta_{1}'$$ and $$\theta_{1}'\leq U(X)$$ is equal to a pre-specified confidence level $$C$$. That is,

 * $$P(L(X)\leq\theta_{1}'\leq U(X) | \theta_{1}')=C$$

where $$0\leq C \leq1$$ and $$L(X)$$ and $$U(X)$$ are the lower and upper confidence limits for $$\theta_{1}$$.

Classic Example
Suppose $$X\sim N(\theta,\sigma^2)$$, where $$\theta$$ and $$\sigma^2$$ are unknown constants and we wish to estimate $$\theta$$. We can define two single-valued functions, $$L$$ and $$U$$, by the process above such that, given a pre-specified confidence level $$C$$ and a random sample $$X^*=(x_1,x_2,\ldots,x_n)$$,
 * $$L(X^*)=\bar{x} - \frac{ts}{ \sqrt{n}}$$
 * $$U(X^*)=\bar{x} + \frac{ts}{ \sqrt{n}}$$
 * where $$\bar{x}=\frac{1}{n} \sum_{i=1}^n x_i=\frac{1}{n}(x_1+x_2+\cdots+x_n)$$, $$s=\sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i- \bar{x})^2}$$
 * and $$t$$ is the upper $$(1-C)/2$$ critical value of the t distribution with $$n-1$$ degrees of freedom, $$t=t_{(1-C)/2,\,n-1}$$.
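The limits above can be computed directly from a sample. The following is only a minimal sketch: the function name `t_interval` and the sample values are made up for illustration, and the critical value is passed in by hand (2.776 is the tabulated upper 0.025 point of the t distribution with 4 degrees of freedom, which corresponds to $$C=0.95$$ and $$n=5$$) rather than computed.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(sample, t_crit):
    """Return (L, U) = x_bar -/+ t * s / sqrt(n) for one observed sample.

    t_crit is the upper (1 - C)/2 critical value of the t distribution
    with n - 1 degrees of freedom, supplied by the caller (assumption:
    it is looked up in a table; it is not computed here).
    """
    n = len(sample)
    x_bar = fmean(sample)      # sample mean
    s = stdev(sample)          # sample standard deviation, n - 1 denominator
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical data with n = 5, so n - 1 = 4 degrees of freedom.
lower, upper = t_interval([2, 4, 6, 8, 10], t_crit=2.776)
print(f"95% confidence limits for theta: L = {lower:.3f}, U = {upper:.3f}")
```

For a different sample size or confidence level, the critical value has to be replaced by the matching table entry.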

Another Example
Let $$ X_1, X_2, \ldots, X_n $$ be iid random variables with $$ X_i\sim N(\mu, \sigma^2) $$, where $$ \sigma $$ is known, and let $$ T = (X_1, X_2, \ldots, X_n) $$. We wish to construct a confidence interval with confidence level $$ 1-\alpha $$. We know $$ \bar{x} $$ is sufficient for $$ \mu $$ and that $$ \bar{x}\sim N(\mu, \sigma^2/n) $$. So, with $$ Z_\frac{\alpha}{2} $$ the upper $$ \alpha/2 $$ critical value of the standard normal distribution,
 * $$ P(-Z_\frac{\alpha}{2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le Z_\frac{\alpha}{2} ) = 1- \alpha $$
 * $$ P(-Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} ) = 1 - \alpha $$
 * $$ P(\bar{x} - Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} ) = 1- \alpha $$

This produces a $$ 100(1-\alpha)\% $$ confidence interval for $$ \mu $$ where,
 * $$ L(T) = \bar{x} - Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} $$
 * $$ U(T) = \bar{x} + Z_\frac{\alpha}{2} \frac{\sigma}{\sqrt{n}} $$.
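As a rough illustration, the limits $$ L(T) $$ and $$ U(T) $$ can be evaluated with Python's standard library. This sketch assumes that $$ \sigma $$ is known and that the sample mean is standardized by its standard error $$ \sigma/\sqrt{n} $$. The function name `z_interval` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """Return (L, U) for the mean of a normal sample with known sigma.

    Assumption: the observations are iid N(mu, sigma^2), so the sample
    mean has standard error sigma / sqrt(n).
    """
    n = len(sample)
    x_bar = fmean(sample)
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical measurements with an assumed known sigma of 0.2.
lower, upper = z_interval([4.9, 5.3, 5.1, 4.7, 5.0], sigma=0.2)
print(f"95% confidence interval for mu: ({lower:.3f}, {upper:.3f})")
```

The interval is centred on the sample mean, and its width shrinks in proportion to $$ 1/\sqrt{n} $$ as more observations are collected.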

Coverage probability
The probability that the interval contains the true value is called the coverage probability.
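The coverage probability can be estimated by simulation: repeat the experiment many times with a fixed true parameter and count how often the interval contains it. The sketch below does this for a normal mean with known $$ \sigma $$, using the interval $$ \bar{x} \pm Z_\frac{\alpha}{2}\,\sigma/\sqrt{n} $$. The function name `coverage`, the parameter values and the seed are all arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Estimate how often x_bar -/+ z * sigma / sqrt(n) contains mu.

    Each trial draws a fresh sample of size n from N(mu, sigma^2), so the
    interval limits are random while mu stays a fixed constant.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


# With alpha = 0.05 the estimate should land near 0.95.
print(coverage(mu=5.0, sigma=2.0, n=10, alpha=0.05, trials=4000))
```

The estimate fluctuates around the nominal level from run to run, with less scatter as the number of trials grows.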