User:Mikenaaman/sandbox

Almost Sure Hypothesis Testing, or A.S. hypothesis testing, utilizes almost sure convergence to determine the validity of a statistical hypothesis with probability one (w.p.1). That is to say, whenever the null hypothesis, $$\textstyle H_0$$, is true, an A.S. hypothesis test will fail to reject the null hypothesis w.p.1 for all sufficiently large samples. Similarly, whenever the alternative hypothesis, $$\textstyle H_1$$, is true, an A.S. hypothesis test will reject the null hypothesis w.p.1 for all sufficiently large samples. Along similar lines, an A.S. confidence interval eventually contains the parameter of interest w.p.1.

Description
For simplicity, assume we have a sequence of independent and identically distributed normal random variables, $$\textstyle x_i \sim N(\mu,1)$$, with mean $$\textstyle \mu $$ and unit variance. Suppose that nature or simulation has chosen the true mean to be $$\textstyle \mu_0 $$; then the distribution function of the mean, $$\textstyle \mu $$, is given by


 * $$ Pr\left(\mu\le t\right)=\left[ t\in\left[\mu_0,+\infty\right]\right] $$

where an Iverson bracket has been used. A naïve approach to estimating this distribution function would be to replace the true mean on the right-hand side with an estimate such as the sample mean, $$\textstyle \hat{\mu} $$, but
 * $$ E\left[ t\in \left[\hat{\mu},+\infty\right]\right ]=Pr\left(\hat{\mu}\le t\right)=\Phi(\sqrt{n}(t-\mu_0)) \rightarrow Pr\left(\mu\le t\right) -0.5\left[\mu_0=t\right] $$

which means the approximation to the true distribution function will be off by 0.5 at the true mean. However, $$\textstyle \left[\hat{\mu},+\infty\right]$$ is the 50% one-sided confidence interval. More generally, let $$\textstyle Z_{\alpha_{n}} $$ be the critical value of a one-sided hypothesis test with significance level $$\textstyle \alpha_n $$; then
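The 0.5 discrepancy at the true mean is easy to see in simulation. The following sketch (the seed, sample size, and repetition count are illustrative choices, not from the source) estimates $$\textstyle E\left[t\in\left[\hat{\mu},+\infty\right]\right]$$ at $$\textstyle t=\mu_0$$ by Monte Carlo:

```python
import random
import statistics

# Monte Carlo check of the naive plug-in estimator at t = mu_0: the
# indicator [t in [mu_hat, +inf)] averages to about 0.5 there, while the
# true distribution function Pr(mu <= mu_0) equals 1.
random.seed(0)
mu0, n, reps = 0.0, 100, 10_000

hits = 0
for _ in range(reps):
    mu_hat = statistics.fmean(random.gauss(mu0, 1.0) for _ in range(n))
    hits += (mu0 >= mu_hat)          # is t = mu0 inside [mu_hat, +inf)?
estimate = hits / reps
print(estimate)                      # close to 0.5, not 1
```

By the symmetry of the normal distribution, $$\textstyle Pr(\hat{\mu}\le\mu_0)$$ is exactly 0.5 at every sample size, so no amount of data fixes the plug-in estimator at this point.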


 * $$ E\left[t\in \left[\hat{\mu} -Z_{\alpha_n}/\sqrt{n},+\infty \right] \right ] \rightarrow Pr\left(\mu\le t\right) - \lim_{n\rightarrow+\infty} \alpha_n \left[\mu_0=t\right] .$$

If we set $$\textstyle \alpha_n=0.05 $$, then the error of the approximation is reduced by a factor of 10 around the true mean. Of course, if we let $$\textstyle \alpha_n \rightarrow 0$$, then
 * $$ E\left[t\in \left[\hat{\mu} -Z_{\alpha_n}/\sqrt{n},+\infty \right] \right ] \rightarrow Pr\left(\mu\le t\right)  $$

However, this only shows that the expectation is close to the limiting value. Naaman (2016) showed that setting the significance level at $$\textstyle \alpha_n=n^{-p}$$ with  $$\textstyle p>1 $$ results in a finite number of type I and type II errors w.p.1 under fairly mild regularity conditions. This means that for each $$\textstyle t$$, there exists an $$\textstyle N(t)$$,  such that for all $$\textstyle n>N(t)$$,
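This stronger claim can be probed numerically. The sketch below is a hypothetical setup (not the simulation design of Naaman (2016)): it tests the null after each new observation at level $$\textstyle \alpha_n=n^{-2}$$ and counts how often the test errs, once with the null true and once with it false.

```python
import math
import random

def z_upper(alpha):
    """Upper-tail normal critical value: P(Z > z) = alpha, via bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def count_errors(true_mu, mu0, N, p=2.0, seed=1):
    """Two-sided test of H0: mu = mu0 at level alpha_n = n^{-p} after each
    observation; return the number of erroneous decisions along the way."""
    rng = random.Random(seed)
    total = rng.gauss(true_mu, 1.0)   # n = 1 skipped: alpha_1 = 1 is degenerate
    errors = 0
    for n in range(2, N + 1):
        total += rng.gauss(true_mu, 1.0)
        stat = abs(total / n - mu0) * math.sqrt(n)
        reject = stat > z_upper(n ** -p / 2)   # two-sided critical value
        if reject != (true_mu != mu0):
            errors += 1
    return errors

type1 = count_errors(true_mu=0.0, mu0=0.0, N=5000)   # null true
type2 = count_errors(true_mu=1.0, mu0=0.0, N=2000)   # alternative true
print(type1, type2)   # both error counts stay small and stop growing
```

Under the null, the expected number of rejections is bounded by $$\textstyle \sum_n n^{-2}$$, which is finite, so errors of either type occur only finitely often.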


 * $$ \left[t\in \left[\hat{\mu} -Z_{\alpha_n}/\sqrt{n},+\infty \right] \right ] =Pr\left(\mu\le t\right)  $$

where the equality holds w.p.1. So the indicator function of a one sided A.S. confidence interval is a good approximation to the true distribution function.
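A small numerical illustration of this approximation (the grid, seed, and sample size are arbitrary choices for illustration): with $$\textstyle \alpha_n=n^{-2}$$, the indicator of the one-sided interval reproduces the step function $$\textstyle \left[t\in\left[\mu_0,+\infty\right]\right]$$ on a grid of points once $$\textstyle n$$ is large.

```python
import math
import random
import statistics

def z_upper(alpha):
    """Upper-tail normal critical value: P(Z > z) = alpha, via bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(2)
mu0, n = 0.0, 10_000
mu_hat = statistics.fmean(random.gauss(mu0, 1.0) for _ in range(n))
lower = mu_hat - z_upper(n ** -2.0) / math.sqrt(n)   # alpha_n = n^{-2}

matches = 0
for t in [-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0]:
    indicator = int(t >= lower)   # [t in [mu_hat - Z/sqrt(n), +inf)]
    true_cdf = int(t >= mu0)      # Pr(mu <= t) for the degenerate true law
    matches += (indicator == true_cdf)
print(matches)   # 7: agreement at every grid point
```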

Simulation
Under suitable conditions, the sample mean, $$\textstyle \overline{x}$$, can be used to construct a 95% confidence interval for the mean, $$\textstyle \mu $$, and as the sample size grows

 * $$ Pr\left(\left|\overline{x}-\mu\right| < 1.96\,\hat{\sigma}/\sqrt{n}\right) \rightarrow 0.95 $$

where $$\textstyle \hat{\sigma}^2 $$ is a consistent estimator of the variance. In this case, the probability that the 95% confidence interval contains $$\textstyle \mu $$ approaches 0.95. As a concrete simulation setting, one might test

 * $$ H_0: \mu=1 \quad \text{versus} \quad H_1: \mu=2 .$$

Many results in statistics focus on issues relating to the rejection of the null when it is false. However, Fisher, who introduced the term null hypothesis, did not even specify an alternative, instead focusing on a well-defined null (see \cite{r10}). The focus here is on hypothesis testing that performs well regardless of the validity of the null.
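The 95% coverage statement is easy to verify by simulation (the mean, sample size, seed, and repetition count below are arbitrary choices):

```python
import math
import random
import statistics

# Estimate the coverage of the interval  x_bar +/- 1.96 * sigma_hat / sqrt(n).
random.seed(3)
mu, n, reps = 1.0, 200, 2_000

covered = 0
for _ in range(reps):
    x = [random.gauss(mu, 1.0) for _ in range(n)]
    x_bar = statistics.fmean(x)
    sigma_hat = statistics.stdev(x)   # consistent estimator of sigma
    covered += abs(x_bar - mu) < 1.96 * sigma_hat / math.sqrt(n)
coverage = covered / reps
print(coverage)   # near 0.95
```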

Optional Stopping
For example, suppose a researcher performed an experiment with a sample size of 10 and found no statistically significant result. Then suppose she decided to add one more observation and retest, continuing this process until a significant result was found. Under this scenario (a similar process is considered by \cite{r25} for a simulation in the context of animal testing), given that the initial batch of 10 observations resulted in an insignificant result, the probability that the experiment will be stopped at some finite sample size, $$N_{s}$$, can be bounded using Boole's inequality:
 * $$ Pr\left(N_s<+\infty\right)<\sum\limits_{n=11}^{\infty}h <0.0952 $$

where $$h=n^{-2}$$. This compares favorably with fixed significance level testing, which has a finite stopping time with probability one. However, this bound will not be meaningful for all bandwidths, as the above sum can be greater than one (the bandwidth in Eq. (\ref{bandwidth}) would be one example). But even using that bandwidth, if the testing were done in batches of 10, then


 * $$ Pr\left(N_s<+\infty\right)<\sum\limits_{i=2}^{\infty}\left( 10i\right) ^{-1.2} <0.3 $$

which leaves a relatively large probability, at least 0.7, that the process will never end.
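Both bounds above can be checked numerically. The sketch below adds an integral bound on the neglected tail, so each total is a genuine upper bound on the corresponding infinite sum:

```python
# Numerical check of the two Boole-inequality bounds.
N = 1_000_000

# sum_{n=11}^inf n^{-2} < 0.0952
partial1 = sum(n ** -2.0 for n in range(11, N + 1))
bound1 = partial1 + 1.0 / N               # tail bound: integral of x^{-2} from N
print(bound1)                             # about 0.0952

# sum_{i=2}^inf (10 i)^{-1.2} < 0.3  (testing in batches of 10)
partial2 = sum((10 * i) ** -1.2 for i in range(2, N + 1))
bound2 = partial2 + 10 ** -1.2 * N ** -0.2 / 0.2   # tail bound: integral from N
print(bound2)                             # about 0.29
```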

Publication Bias
As another example of the power of this approach, if an academic journal only accepts papers with p-values less than 0.05, then roughly 1 in 20 independent studies of the same effect would find a significant result when there was none. However, if the journal required a minimum sample size of 100 and a maximum bandwidth given by $$h<n^{-1.2}$$, then one would expect roughly 1 in 250 studies would find an effect when there was none (if the minimum sample size was 30, it would still be 1 in 60). If the maximum bandwidth was given by $$h<n^{-2}$$ (which will have better small sample performance with regard to type I error when multiple comparisons are a concern), one would expect roughly 1 in 10000 studies would find an effect when there was none (if the minimum sample size was 30, it would be 1 in 900).
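The rates quoted above follow directly from the bandwidth bounds, e.g. $$100^{-1.2} \approx 1/251$$:

```python
# False-positive rates implied by a maximum bandwidth h < n^{-p}
# at the journal's minimum sample size n.
for n, p in [(100, 1.2), (30, 1.2), (100, 2.0), (30, 2.0)]:
    rate = float(n) ** -p
    print(f"n = {n}, h < n^-{p}: about 1 in {1 / rate:.0f} studies")
```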

Jeffreys-Lindley Paradox
Lindley's paradox occurs when
 * 1) The result $$\textstyle x$$ is "significant" by a frequentist test of $$\textstyle H_0$$, indicating sufficient evidence to reject $$\textstyle H_0$$, say, at the 5% level, and
 * 2) The posterior probability of $$\textstyle H_0$$ given $$\textstyle x$$ is high, indicating strong evidence that $$\textstyle H_0$$ is in better agreement with $$\textstyle x$$ than $$\textstyle H_1$$.

However, the paradox does not apply to A.S. hypothesis tests: because the significance level $$\textstyle \alpha_n \rightarrow 0$$ as the sample grows, the Bayesian and the frequentist will eventually reach the same conclusion.
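A numeric sketch of this agreement (the prior variance, the observed z-statistic, and the normal-normal Bayes factor formula are illustrative assumptions, not from the source): hold a "significant" z-statistic fixed while $$\textstyle n$$ grows. The fixed-level frequentist keeps rejecting while the Bayes factor eventually swings toward $$\textstyle H_0$$ (the paradox), but an A.S. test with $$\textstyle \alpha_n=n^{-2}$$ stops rejecting, matching the Bayesian conclusion.

```python
import math

def z_upper(alpha):
    """Upper-tail normal critical value: P(Z > z) = alpha, via bisection."""
    lo, hi = 0.0, 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Point null H0: mu = 0 versus H1: mu ~ N(0, tau2); with z = sqrt(n)*x_bar,
# the Bayes factor in favor of H0 is
#   BF01 = sqrt(1 + n*tau2) * exp(-(n*tau2 / (1 + n*tau2)) * z^2 / 2).
tau2, z_obs = 1.0, 2.5    # assumed prior variance and observed z-statistic

for n in [100, 10_000, 1_000_000]:
    bf01 = math.sqrt(1 + n * tau2) * math.exp(
        -(n * tau2 / (1 + n * tau2)) * z_obs ** 2 / 2)
    fixed_rejects = z_obs > z_upper(0.05)       # one-sided 5% test
    as_rejects = z_obs > z_upper(n ** -2.0)     # A.S. test, alpha_n = n^{-2}
    print(n, bf01 > 1, fixed_rejects, as_rejects)
```

At large $$\textstyle n$$ the Bayes factor favors $$\textstyle H_0$$ and the A.S. test also fails to reject, while the fixed-level test rejects forever.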