
The false discovery rate (FDR) is one way of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of rejected null hypotheses that were incorrect rejections ("false discoveries"). FDR-controlling procedures provide less stringent control of type I errors compared to familywise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of at least one type I error. Thus, FDR-controlling procedures have greater power, at the cost of an increased rate of type I errors.

History
The modern widespread use of the FDR is believed to stem from, and be motivated by, the development of technologies that allowed the collection and analysis of a large number of distinct variables in many individuals (e.g., the expression level of each of 10,000 different genes in 100 different persons). By the late 1980s and 1990s, the development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. This, coupled with the growth in computing power, made it possible to perform hundreds or thousands of statistical tests on a given data set. The technology of microarrays was a prototypical example, as it enabled thousands of genes to be tested simultaneously for differential expression between two biological conditions.

As high-throughput technologies became common, technological and/or financial constraints led researchers to collect datasets with relatively small sample sizes (e.g., few individuals being tested) and large numbers of variables being measured per sample (e.g., thousands of gene expression levels). In these datasets, too few of the measured variables showed statistical significance after classic correction for multiple tests with standard multiple comparison procedures. This created a need within many scientific communities to abandon FWER and unadjusted multiple hypothesis testing in favor of other ways to highlight and rank, in publications, those variables showing marked effects across individuals or treatments that would otherwise be dismissed as non-significant after standard correction for multiple tests. In response, a variety of error rates have been proposed, and have become commonly used in publications, that are less conservative than FWER in flagging possibly noteworthy observations.

Literature
The FDR concept was formally described by Yoav Benjamini and Yosef Hochberg in 1995 as a less conservative and arguably more appropriate approach for identifying the important few from the trivial many effects tested. The FDR has been particularly influential, as it was the first alternative to the FWER to gain broad acceptance in many scientific fields (especially in the life sciences, from genetics to biochemistry, oncology and plant sciences). In 2005, the Benjamini and Hochberg paper from 1995 was identified as one of the 25 most-cited statistical papers.

Definitions
Let $V$ denote the number of false discoveries (rejections of true null hypotheses), $S$ the number of true discoveries (rejections of false null hypotheses), and $R = V + S$ the total number of rejections. We can then define $Q$ as the proportion of false discoveries among the discoveries $$\left(Q = \frac{V}{R}\right)$$. The false discovery rate (FDR) is given by:


 * $$\mathrm{FDR} = Q_e = \mathrm{E}\!\left [Q \right ] = \mathrm{E}\!\left [\frac{V}{V+S}\right ] = \mathrm{E}\!\left [\frac{V}{R}\right ], $$

where $$ \frac{V}{R} $$ is defined to be 0 when $$ R = 0 $$. The goal is to keep the FDR below a pre-specified threshold $q$.
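The distinction between the realized proportion $Q$ and its expectation, the FDR, can be made concrete with a short simulation. The following Python sketch is purely illustrative (the function name, the fixed cutoff, and the toy alternative distribution are assumptions, not part of the definition); it averages $Q$ over repeated experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def realized_q(p_values, is_null, threshold=0.05):
    """Realized proportion of false discoveries, Q = V/R, at a fixed cutoff."""
    rejected = p_values <= threshold
    r = rejected.sum()                 # R: total number of rejections
    v = (rejected & is_null).sum()     # V: rejections of true nulls
    return v / r if r > 0 else 0.0     # V/R is defined to be 0 when R = 0

# Monte Carlo average of Q approximates FDR = E[Q] for this fixed-cutoff rule.
m, m0 = 100, 80                        # 80 true nulls, 20 true effects (assumed)
qs = []
for _ in range(10_000):
    p_null = rng.uniform(size=m0)           # null p-values are uniform on [0, 1]
    p_alt = rng.uniform(size=m - m0) ** 4   # toy alternative: skewed toward 0
    p = np.concatenate([p_null, p_alt])
    is_null = np.arange(m) < m0
    qs.append(realized_q(p, is_null))
print(np.mean(qs))                     # estimate of E[Q]
```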

Controlling procedures
The setting for many procedures is such that we have $$H_1 \ldots H_m$$ null hypotheses tested and $$P_1 \ldots P_m$$ their corresponding p-values. We sort these p-values in ascending order and denote them by $$P_{(1)} \ldots P_{(m)}$$. A procedure that goes from a small p-value to a large one will be called a step-up procedure. In a similar way, in a "step-down" procedure we move from a large corresponding test statistic to a smaller one.

Benjamini–Hochberg procedure
The Benjamini–Hochberg procedure (BH step-up procedure) controls the FDR at level $$\alpha$$. It works as follows:


 * 1) For a given $$\alpha$$, find the largest $k$ such that $$P_{(k)} \leq \frac{k}{m} \alpha.$$
 * 2) Reject the null hypothesis (i.e., declare discoveries) for all $$H_{(i)}$$ for $$i = 1, \ldots, k$$.
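A minimal Python sketch of these two steps (an illustrative implementation, not a reference one; in practice, libraries such as statsmodels expose this procedure, e.g. via `multipletests(..., method='fdr_bh')`):

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """BH step-up procedure: find the largest k with P_(k) <= (k/m) * alpha
    and reject the hypotheses with the k smallest p-values.
    Returns a boolean mask over the input, True = rejected (discovery)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    thresholds = np.arange(1, m + 1) / m * alpha   # (k/m) * alpha for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max() + 1         # largest k satisfying the bound
        reject[order[:k]] = True                   # reject H_(1), ..., H_(k)
    return reject

# Example: five p-values tested at level alpha = 0.05.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
# -> [ True  True False False False]
```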

The BH procedure is valid when the $m$ tests are independent, and also in various scenarios of dependence. It also satisfies the inequality:


 * $$E(Q) \leq \frac{m_0}{m}\alpha \leq \alpha$$

If an estimator of $$m_0$$ is inserted into the BH procedure, it is no longer guaranteed to achieve FDR control at the desired level. Adjustments may be needed in the estimator and several modifications have been proposed.

Note that the mean $$\alpha$$ for these $m$ tests is $$\frac{\alpha(m+1)}{2m}$$, the Mean(FDR $$\alpha$$) or MFDR: $$\alpha$$ adjusted for $m$ independent (or positively correlated; see below) tests. The MFDR calculation shown here is for a single value and is not part of the Benjamini and Hochberg method; see AFDR below.
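As a quick illustration of this formula (the numbers are chosen for the example only): with $m = 100$ tests at $$\alpha = 0.05$$, the mean of the BH thresholds $$\frac{k}{m}\alpha$$ over $$k = 1, \ldots, m$$ is

 * $$\mathrm{MFDR} = \frac{0.05 \times 101}{200} \approx 0.025.$$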

Benjamini–Hochberg–Yekutieli procedure
The Benjamini–Hochberg–Yekutieli procedure controls the false discovery rate under arbitrary dependence assumptions. This refinement modifies the threshold and finds the largest $k$ such that:


 * $$P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha $$


 * If the tests are independent or positively correlated: $$c(m)=1$$
 * Under arbitrary dependence: $$c(m) = \sum _{i=1} ^m \frac{1}{i}$$

In the case of negative correlation, $$c(m)$$ can be approximated by using the Euler–Mascheroni constant.


 * $$\sum _{i=1} ^m \frac{1}{i} \approx \ln(m) + \gamma + \frac{1}{2m}.$$
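In code, the BY procedure amounts to running the BH step-up rule of the previous section with $$\alpha / c(m)$$ in place of $$\alpha$$. A brief sketch reusing the illustrative `benjamini_hochberg` function from above, together with a check of the harmonic-sum approximation (statsmodels offers the same correction via `method='fdr_by'`):

```python
import numpy as np

def c_m(m, dependence="arbitrary"):
    """Correction factor c(m): 1 under independence or positive correlation,
    the harmonic sum 1 + 1/2 + ... + 1/m under arbitrary dependence."""
    if dependence == "arbitrary":
        return (1.0 / np.arange(1, m + 1)).sum()
    return 1.0

def benjamini_yekutieli(p_values, alpha=0.05):
    """BY procedure: the BH step-up rule at the deflated level alpha / c(m)."""
    return benjamini_hochberg(p_values, alpha / c_m(len(p_values)))

# Harmonic sum versus the Euler-Mascheroni approximation, e.g. for m = 1000:
m = 1000
gamma = 0.57721566  # Euler-Mascheroni constant
print(c_m(m), np.log(m) + gamma + 1 / (2 * m))  # both approx. 7.4855
```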

Using the MFDR and the formulas above, an adjusted MFDR (AFDR) is the minimum mean $$\alpha$$ for $m$ dependent tests: $$\mathrm{AFDR} = \frac{\mathrm{MFDR}}{c(m)}.$$

Another way to address dependence is by bootstrapping and rerandomization.

Estimating the FDR
Let $$\pi_0 $$ be the proportion of true null hypotheses, and $$\pi_1 = 1-\pi_0 $$ the proportion of true alternative hypotheses. Then $$m \pi_0 $$ times the average p-value of the rejected hypotheses, divided by the number of rejections, gives an estimate of the FDR (here $m$ is the total number of hypotheses tested).
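A direct transcription of this estimator in Python (an illustrative sketch; it assumes $$\pi_0$$ is known or has been estimated separately, e.g. by Storey's method):

```python
import numpy as np

def estimate_fdr(p_values, rejected, pi0):
    """Plug-in FDR estimate described above:
    m * pi0 * mean(p-values of rejected hypotheses) / number of rejections.
    `pi0` must be supplied (known or pre-estimated)."""
    p = np.asarray(p_values, dtype=float)
    rejected = np.asarray(rejected, dtype=bool)
    r = rejected.sum()
    if r == 0:
        return 0.0            # no rejections: FDR estimate is 0 by convention
    return p.size * pi0 * p[rejected].mean() / r
```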

Adaptive and scalable
Using a multiplicity procedure that controls the FDR criterion is adaptive and scalable: controlling the FDR can be very permissive (if the data justify it) or conservative (acting close to control of the FWER for sparse problems), depending on the number of hypotheses tested and the level of significance.

The FDR criterion adapts so that the same number of false discoveries (V) will have different implications, depending on the total number of discoveries (R). This contrasts with the familywise error rate criterion. For example, if inspecting 100 hypotheses (say, 100 genetic mutations or SNPs for association with some phenotype in some population):
 * If we make 4 discoveries (R), having 2 of them be false discoveries (V) is often unbearable, whereas
 * if we make 50 discoveries (R), having 2 of them be false discoveries (V) is often bearable.

The FDR criterion is scalable in that the same proportion of false discoveries out of the total number of discoveries (Q) remains sensible for different numbers of total discoveries (R). For example:
 * If we make 100 discoveries (R), having 5 of them be false discoveries ($$q=5\%$$) can be bearable.
 * Similarly, if we make 1000 discoveries (R), having 50 of them be false discoveries (as before, $$q=5\%$$) can still be bearable.

The FDR criterion is also scalable in the sense that the discoveries made in a combined study are (about) the same whether the correction is applied to the full set of hypotheses at once or the set is split into two and each half is corrected separately. For this to hold, the sub-studies should be large and contain some discoveries.

Dependency among the test statistics
Controlling the FDR using the linear step-up BH procedure, at level q, has several properties related to the dependency structure between the test statistics of the $m$ null hypotheses that are being corrected for. If the test statistics are:
 * Independent: $$\mathrm{FDR} \le \frac{m_0}{m}q$$
 * Independent and continuous: $$\mathrm{FDR} = \frac{m_0}{m}q$$
 * Positive dependent: $$\mathrm{FDR} \le \frac{m_0}{m}q$$
 * In the general case: $$\mathrm{FDR} \le \frac{m_0}{m} q \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m} \right) \approx \frac{m_0}{m} q \left(\ln(m) + \gamma + \frac{1}{2m}\right)$$, where $$\gamma$$ is the Euler–Mascheroni constant.
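For a rough sense of scale (numbers chosen for illustration): with $m = 100$ tests, the harmonic sum is $$1 + \tfrac{1}{2} + \cdots + \tfrac{1}{100} \approx 5.19$$, so the general-case guarantee is weaker than the independent-case bound by roughly a factor of five; this is precisely the inflation that the $$c(m)$$ factor in the Benjamini–Hochberg–Yekutieli procedure compensates for.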

Proportion of true hypotheses
If all of the null hypotheses are true ($$m_0=m$$), then controlling the FDR at level $q$ guarantees control over the FWER (this is also called "weak control of the FWER"): $$\mathrm{FWER}=P\left( V \ge 1 \right) = E\left( \frac{V}{R} \right) = \mathrm{FDR} \le q$$, simply because the event of rejecting at least one true null hypothesis $$ \{V \ge 1\} $$ is exactly the event $$ \{V/R = 1\} $$, and the event $$ \{V = 0\} $$ is exactly the event $$ \{V/R = 0\} $$ (when $$ V = R = 0 $$, $$ V/R = 0 $$ by definition). But if there are some true discoveries to be made ($$m_0<m$$), then $$\mathrm{FWER} \ge \mathrm{FDR}$$. In that case there is room for improving detection power. It also means that any procedure that controls the FWER will also control the FDR.