User:Noeer Alotaibi/sandbox

Outline for Wikipedia Article
== Outline ==
 * 1. Introduction
 * 2. What is the null distribution
 * 3. When do we use it
 * 4. How to choose the null hypothesis

== Introduction ==
There are many tools a scientist can use when forming a hypothesis and setting up an experiment. One of them helps the scientist determine whether the results of the experiment fall within the range expected under the hypothesis being tested. That tool is the null distribution. The null distribution does not confirm a hypothesis, but it shows whether the observed results fall outside the range expected if the null hypothesis were true. This article covers what the null distribution is, when it is used, and how to choose the null hypothesis.

The null hypothesis is always part of an experiment. It states that, between two sets of data, there is no statistical difference between the results of doing one thing and the results of doing another. For example, a scientist might want to show that people who walk two miles a day have healthier hearts than people who walk less than one mile a day. The scientist would test the null hypothesis by comparing the heart health of the people who walked two miles a day with that of the people who walked less than one mile a day. If no statistically significant difference were found between the two groups, the scientist would fail to reject the null hypothesis and could conclude that the study provides no evidence that walking by itself helps keep the heart healthy.
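The walking example can be sketched in code. The following is a minimal illustration, not a prescribed method: the heart-health scores are simulated (hypothetical) values, the function name <code>two_sample_z_test</code> is an assumption made for this sketch, and the standard normal null distribution is used as a large-sample approximation.

```python
import math
import random
import statistics
from statistics import NormalDist


def two_sample_z_test(a, b):
    """Return (statistic, two-sided p-value) for a difference in means.

    The statistic is the difference in sample means divided by its
    estimated standard error. Its null distribution is approximated by
    the standard normal, which is reasonable only for large samples.
    """
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    stat = (statistics.fmean(a) - statistics.fmean(b)) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


rng = random.Random(0)
# Hypothetical heart-health scores; both groups are drawn from the same
# distribution, so the null hypothesis is true by construction.
walk_two_miles = [rng.gauss(70.0, 10.0) for _ in range(200)]
walk_under_one = [rng.gauss(70.0, 10.0) for _ in range(200)]

stat, p_value = two_sample_z_test(walk_two_miles, walk_under_one)
print(f"statistic = {stat:.3f}, p-value = {p_value:.3f}")
```

A large p-value means the observed statistic is typical of the null distribution, so the null hypothesis is not rejected. It does not prove that the two groups are identical.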

== What is the null distribution ==
In hypothesis testing, one needs to form the joint distribution of the test statistics in order to conduct the test and control type I errors. However, the true distribution is often unknown, and a proper null distribution must be used in its place. For example, one-sample and two-sample tests of means can use t statistics, which have a Gaussian null distribution, while F statistics, which test the equality of k population means, have a Gaussian quadratic form null distribution.[1] The null distribution is defined as the asymptotic distribution of null quantile-transformed test statistics, based on the marginal null distributions.[2] Resampling procedures, such as the non-parametric or model-based bootstrap, can provide consistent estimators of the null distribution. An improper choice of null distribution can significantly affect the type I error and power properties of the test.
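As an illustration of how resampling can estimate a null distribution, the sketch below uses a non-parametric bootstrap for a two-sample statistic. It is a simplified variant that only shifts each bootstrap statistic so that the null hypothesis holds in the resampled world. It is not the quantile-transformed construction described above. The data are simulated, and the names <code>t_stat</code>, <code>bootstrap_null</code> and <code>bootstrap_p_value</code> are assumptions made for this example.

```python
import math
import random
import statistics


def t_stat(a, b, shift=0.0):
    """Studentized difference in means, optionally centered at `shift`."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.fmean(a) - statistics.fmean(b) - shift) / standard_error


def bootstrap_null(a, b, n_boot=2000, seed=0):
    """Estimate the null distribution of t_stat by resampling each group.

    Each bootstrap statistic is centered at the observed difference in
    means, so the resulting values are centered near zero, as they would
    be under the null hypothesis of equal means.
    """
    rng = random.Random(seed)
    observed_diff = statistics.fmean(a) - statistics.fmean(b)
    null_stats = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))  # sample with replacement
        resampled_b = rng.choices(b, k=len(b))
        null_stats.append(t_stat(resampled_a, resampled_b, shift=observed_diff))
    return null_stats


def bootstrap_p_value(a, b, n_boot=2000, seed=0):
    """Two-sided p-value of the observed statistic under the bootstrap null."""
    observed = t_stat(a, b)
    null_stats = bootstrap_null(a, b, n_boot=n_boot, seed=seed)
    exceed = sum(abs(s) >= abs(observed) for s in null_stats)
    return (exceed + 1) / (n_boot + 1)


rng = random.Random(42)
group_a = [rng.gauss(0.0, 10.0) for _ in range(50)]  # simulated data
group_b = [rng.gauss(0.0, 10.0) for _ in range(50)]  # simulated data
print(f"bootstrap p-value = {bootstrap_p_value(group_a, group_b):.3f}")
```

Adding one to both the numerator and the denominator keeps the estimated p-value strictly positive, which is a common convention for resampling-based tests.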

== When do we use it and how to choose the null hypothesis ==
Under a Bayesian framework, large-scale studies allow the null distribution to be put into a probabilistic context with its non-null counterparts. When the sample size n is large, say over 10,000, empirical nulls use the study's own data to estimate an appropriate null distribution. The key assumption is that, because the proportion of null cases is large (> 0.9), the data themselves reveal the null distribution. The theoretical null may fail in some cases. In large-scale data sets, it is easy to find deviations of the data from the ideal mathematical framework, e.g., independent and identically distributed (i.i.d.) samples. In addition, correlation across sampling units and cases, as well as unobserved covariates, may make the theoretical null distribution wrong.[3] Permutation methods are frequently used in multiple testing to obtain an empirical null distribution generated from the data. Permutation techniques and empirical methods can be combined by using the permutation null, rather than the standard normal, in the empirical algorithm.
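The idea of an empirical null can be illustrated with a rough sketch. Here the center and scale of the null are estimated from the median and interquartile range of the z-values, on the assumption that most cases are null. This robust estimate is a simple stand-in chosen for illustration and is not the estimation algorithm used in the literature. The z-values are simulated, and the function name <code>empirical_null</code> is hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def empirical_null(z_values):
    """Estimate (center, scale) of a normal null from the bulk of the data.

    Assumes the large majority of z-values come from null cases, so the
    median and interquartile range are dominated by the null distribution.
    """
    q1, median, q3 = statistics.quantiles(z_values, n=4)
    # Interquartile range of a standard normal, about 1.349.
    normal_iqr = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)
    return median, (q3 - q1) / normal_iqr


rng = random.Random(1)
# Simulated study: 95% null cases that are wider than the theoretical
# N(0, 1) null, plus 5% non-null cases centered at 4.
z = [rng.gauss(0.1, 1.3) for _ in range(9500)]
z += [rng.gauss(4.0, 1.0) for _ in range(500)]

center, scale = empirical_null(z)
empirical = NormalDist(center, scale)
theoretical = NormalDist(0.0, 1.0)

# Upper-tail two-sided p-value of z = 3 under each null distribution.
z_obs = 3.0
p_theoretical = 2.0 * (1.0 - theoretical.cdf(z_obs))
p_empirical = 2.0 * (1.0 - empirical.cdf(z_obs))
print(f"center = {center:.2f}, scale = {scale:.2f}")
print(f"p (theoretical) = {p_theoretical:.4f}, p (empirical) = {p_empirical:.4f}")
```

In this simulated setting the empirical null is wider than the theoretical one, so the same z-value is judged less extreme. This is the kind of discrepancy that correlation or unobserved covariates can produce.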