
Definition of false positive rate
Suppose we have $$m$$ hypothesis tests. Denote the null hypotheses tested by $$H_1, H_2, \ldots, H_m$$.

Summing the test results over the $$H_i$$ gives the following table and related random variables:

|                          | Declared non-significant | Declared significant | Total       |
|--------------------------|--------------------------|----------------------|-------------|
| True null hypotheses     | $$U$$                    | $$V$$                | $$m_0$$     |
| Non-true null hypotheses | $$T$$                    | $$S$$                | $$m - m_0$$ |
| Total                    | $$m - R$$                | $$R$$                | $$m$$       |


 * $$m_0$$ is the number of true null hypotheses, an unknown parameter
 * $$m - m_0$$ is the number of true alternative hypotheses
 * $$V$$ is the number of false positives (Type I error)
 * $$S$$ is the number of true positives
 * $$T$$ is the number of false negatives (Type II error)
 * $$U$$ is the number of true negatives
 * $$R$$ is the number of rejected null hypotheses


 * $$R$$ is an observable random variable, while $$S$$, $$T$$, $$U$$, and $$V$$ are unobservable random variables.
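The relationships between these counts can be illustrated with a small simulation. This is only a sketch: the values of $$m$$, $$m_0$$, the significance level, and the distribution of p-values under the alternative are all hypothetical choices made for the example.

```python
import random

random.seed(0)

m = 1000       # total number of hypotheses (hypothetical)
m0 = 800       # number of true nulls -- unknown in a real study
alpha = 0.05   # per-test significance level

# Simulate p-values: uniform under the null, skewed toward 0 under the alternative.
p_values = [random.random() for _ in range(m0)] + \
           [random.random() ** 4 for _ in range(m - m0)]
is_null = [True] * m0 + [False] * (m - m0)

V = sum(p < alpha and null for p, null in zip(p_values, is_null))      # false positives
S = sum(p < alpha and not null for p, null in zip(p_values, is_null))  # true positives
T = (m - m0) - S   # false negatives
U = m0 - V         # true negatives
R = V + S          # rejected nulls -- the only count observable in practice

print("V =", V, " S =", S, " T =", T, " U =", U, " R =", R)
```

In any such simulation the accounting identities of the table hold by construction: $$U + V = m_0$$, $$S + T = m - m_0$$, and $$V + S = R$$, while only $$R$$ could be reported without knowing which nulls are true.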

The significance level used to test each hypothesis is set according to the form of inference (simultaneous inference vs. selective inference) and its supporting criteria, as pre-determined by the researcher.

When performing multiple comparisons in a statistical framework such as above, the false positive ratio (as opposed to false positive rate) is the probability of falsely rejecting the null hypothesis for a particular test. Using the terminology suggested here, it is simply $$V/m$$.

Since $$V$$ is a random variable and $$m$$ is a known constant ($$V \leq m$$), the false positive ratio is also a random variable, ranging between 0 and 1.

The false positive ratio is also known as the false alarm ratio.

The false positive rate (or "false alarm rate") is the expectation of the false positive ratio, expressed by $$E(V/m)$$.
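The distinction between the ratio $$V/m$$ (a random variable) and the rate $$E(V/m)$$ (its expectation) can be sketched by Monte Carlo. In this hypothetical example all $$m$$ nulls are true and each test rejects with probability $$\alpha$$, so the rate should come out close to $$\alpha$$ even though the ratio varies from experiment to experiment.

```python
import random

random.seed(1)

m, alpha, n_reps = 200, 0.05, 2000

ratios = []
for _ in range(n_reps):
    # All m nulls true: p-values are uniform, so each test rejects with prob. alpha.
    V = sum(random.random() < alpha for _ in range(m))
    ratios.append(V / m)  # false positive ratio for this one experiment

fpr_estimate = sum(ratios) / n_reps  # averages the ratios -> estimates E(V/m)
print(round(fpr_estimate, 3))        # should be close to alpha = 0.05
```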

The difference between "false positive rate" and "type I error rate"
While the false positive rate is mathematically equal to the type I error rate, it is viewed as a separate term for the following reasons:


 * 1) The type I error rate is often associated with the a priori setting of the significance level by the researcher: the significance level represents an acceptable error rate assuming that all null hypotheses are true (the "global null" hypothesis). The choice of a significance level may thus be somewhat arbitrary (e.g. 10% (0.1), 5% (0.05), 1% (0.01), etc.).


 * In contrast, the false positive rate is associated with a post hoc result: the expected number of false positives divided by the total number of hypotheses under the real combination of true and non-true null hypotheses (disregarding the "global null" hypothesis). Since the false positive rate depends on this unknown combination and is not set by the researcher, it cannot be identified with the significance level.


 * 2) Moreover, "false positive rate" is a term usually used regarding a medical test or diagnostic device (e.g. "the false positive rate of a certain diagnostic device is 1%"), while "type I error" is a term associated with statistical tests, where the meaning of the word "positive" is not as clear (e.g. "the type I error of a test is 1%").

The false positive rate should also not be confused with the familywise error rate, which is the probability that at least one of the tests that are performed will result in a type I error. As the number of tests grows, the familywise error rate generally converges to 1 while the false positive rate remains fixed.
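For independent tests at level $$\alpha$$, the familywise error rate is $$1 - (1 - \alpha)^m$$, which makes the contrast with the fixed false positive rate easy to tabulate. The choice of $$\alpha = 0.05$$ and the values of $$m$$ below are illustrative assumptions.

```python
alpha = 0.05

# FWER for m independent tests: probability of at least one type I error.
fwer = {m: 1 - (1 - alpha) ** m for m in (1, 10, 100, 1000)}

for m, p in fwer.items():
    # The false positive rate stays at alpha; the FWER approaches 1.
    print(m, round(p, 4))
```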

Lastly, it is important to note the profound difference between the false positive rate and the false discovery rate: while the first is defined as $$E(V/m)$$, the second is defined as $$E(V/R)$$ (with $$V/R$$ taken to be 0 when $$R = 0$$).
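The difference between the two denominators can be made concrete with a simulation, estimating $$E(V/m)$$ and $$E(V/R)$$ side by side. All numbers here ($$m$$, $$m_0$$, $$\alpha$$, and the alternative p-value distribution) are hypothetical choices for illustration.

```python
import random

random.seed(2)

m, m0, alpha, n_reps = 100, 90, 0.05, 5000

fp_ratios, fd_props = [], []
for _ in range(n_reps):
    # True nulls: uniform p-values; alternatives: p-values pushed toward 0.
    V = sum(random.random() < alpha for _ in range(m0))
    S = sum(random.random() ** 6 < alpha for _ in range(m - m0))
    R = V + S
    fp_ratios.append(V / m)                       # false positive ratio
    fd_props.append(V / R if R > 0 else 0.0)      # convention: V/R = 0 when R = 0

fpr_hat = sum(fp_ratios) / n_reps  # estimates E(V/m)
fdr_hat = sum(fd_props) / n_reps   # estimates E(V/R)
print(round(fpr_hat, 3), round(fdr_hat, 3))
```

Because $$R$$ counts only the rejected hypotheses while $$m$$ counts all of them, the false discovery rate in this setup is substantially larger than the false positive rate.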