Concentration inequality

In probability theory, concentration inequalities provide mathematical bounds on the probability of a random variable deviating from some value (typically, its expected value).

The law of large numbers of classical probability theory states that sums of independent random variables, under mild conditions, concentrate around their expectation with high probability. Such sums are the most basic examples of random variables concentrated around their mean.

Concentration inequalities can be sorted according to how much information about the random variable is needed in order to use them.

Markov's inequality
Let $$X$$ be a random variable that is non-negative (almost surely). Then, for every constant $$a > 0$$,


 * $$\Pr(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.$$

Note the following extension to Markov's inequality: if $$\Phi$$ is a strictly increasing and non-negative function, then


 * $$\Pr(X \geq a) = \Pr(\Phi (X) \geq \Phi (a)) \leq \frac{\operatorname{E}(\Phi(X))}{\Phi (a)}.$$
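The inequality can be checked empirically. In the sketch below, the exponential distribution and the threshold are arbitrary illustrative choices, not part of the statement:

```python
import random

# Empirical sanity check of Markov's inequality for a non-negative
# random variable. X ~ Exponential(1) is an arbitrary illustrative
# choice; E[X] = 1.
random.seed(0)
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]
mean = sum(samples) / n

a = 3.0
empirical_tail = sum(1 for x in samples if x >= a) / n
markov_bound = mean / a  # Pr(X >= a) <= E[X] / a

# The bound holds but is loose here: the exact tail is
# e^{-3} ~ 0.0498, while the bound is ~ 1/3.
assert empirical_tail <= markov_bound
```

The looseness is typical: Markov's inequality uses only the mean, so it cannot see how thin the tail actually is.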

Chebyshev's inequality
Chebyshev's inequality requires the following information on a random variable $$X$$:


 * The expected value $$\operatorname{E}[X]$$ is finite.
 * The variance $$\operatorname{Var}[X] = \operatorname{E}[(X - \operatorname{E}[X] )^2]$$ is finite.

Then, for every constant $$a > 0$$,


 * $$\Pr(|X-\operatorname{E}[X]| \geq a) \leq \frac{\operatorname{Var}[X]}{a^2},$$

or equivalently,


 * $$\Pr(|X-\operatorname{E}[X]| \geq a\cdot \operatorname{Std}[X]) \leq \frac{1}{a^2},$$

where $$\operatorname{Std}[X]$$ is the standard deviation of $$X$$.

Chebyshev's inequality can be seen as a special case of the generalized Markov's inequality applied to the random variable $$|X-\operatorname{E}[X]|$$ with $$\Phi(x) = x^2$$.
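A small numerical sketch (the uniform distribution and the threshold are arbitrary illustrative choices) shows the bound in action:

```python
import random

# Chebyshev's inequality needs only the mean and variance. Check it
# empirically for X ~ Uniform(0, 1): E[X] = 1/2, Var[X] = 1/12.
random.seed(1)
n = 100_000
samples = [random.random() for _ in range(n)]

mu, var = 0.5, 1.0 / 12.0
a = 0.4
empirical = sum(1 for x in samples if abs(x - mu) >= a) / n
chebyshev_bound = var / a**2  # Pr(|X - E[X]| >= a) <= Var[X] / a^2

# Exact tail: Pr(|X - 1/2| >= 0.4) = 0.2; Chebyshev gives ~0.52.
assert empirical <= chebyshev_bound
```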

Vysochanskij–Petunin inequality
Let $$X$$ be a random variable with a unimodal distribution, mean $$\mu$$ and finite, non-zero variance $$\sigma^2$$. Then, for any $$\lambda > \sqrt{8/3} = 1.63299\ldots,$$


 * $$\text{Pr}(\left|X-\mu\right|\geq \lambda\sigma)\leq\frac{4}{9\lambda^2}.$$


One-sided Vysochanskij–Petunin inequality
For a unimodal random variable $$X$$ and $$r \geq 0$$, the one-sided Vysochanskij–Petunin inequality holds as follows:


 * $$\text{Pr}(X-E[X]\geq r)\leq \begin{cases} \dfrac{4}{9}\dfrac{\operatorname{Var}(X)}{r^2 + \operatorname{Var}(X)} & \text{for }r^2 \geq\dfrac{5}{3} \operatorname{Var}(X),\\[5pt] \dfrac{4}{3}\dfrac{\operatorname{Var}(X)}{r^2 + \operatorname{Var}(X)}-\dfrac{1}{3} & \text{otherwise.} \end{cases}$$
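The two branches agree at the crossover $$r^2 = \tfrac{5}{3}\operatorname{Var}(X)$$, so the bound is continuous in $$r$$. A quick numeric check (the variance value below is an arbitrary illustrative choice):

```python
# One-sided Vysochanskij-Petunin bound as a function of r^2;
# the two branches meet continuously at r^2 = (5/3) Var(X).
def vp_one_sided(r2, var):
    """Evaluate the one-sided VP bound for given r^2 and Var(X)."""
    if r2 >= (5.0 / 3.0) * var:
        return (4.0 / 9.0) * var / (r2 + var)
    return (4.0 / 3.0) * var / (r2 + var) - 1.0 / 3.0

var = 2.0               # arbitrary illustrative variance
r2 = (5.0 / 3.0) * var  # crossover point

# Both branches evaluate to 1/6 at the crossover:
#   (4/9) v / (5v/3 + v) = 1/6  and  (4/3) v / (5v/3 + v) - 1/3 = 1/6.
assert abs(vp_one_sided(r2, var) - 1.0 / 6.0) < 1e-12
```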

Paley–Zygmund inequality
In contrast to most commonly used concentration inequalities, the Paley–Zygmund inequality provides a lower bound on the deviation probability.
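Concretely, for a non-negative random variable $$Z$$ with finite variance and $$0 \leq \theta \leq 1$$, it states that $$\Pr(Z > \theta \operatorname{E}[Z]) \geq (1-\theta)^2 \operatorname{E}[Z]^2 / \operatorname{E}[Z^2]$$. A quick empirical sketch (the exponential distribution and $$\theta$$ are arbitrary illustrative choices):

```python
import random

# Paley-Zygmund lower bound, checked for Z ~ Exponential(1), where
# E[Z] = 1 and E[Z^2] = 2 (an arbitrary illustrative distribution).
random.seed(6)
n, theta = 100_000, 0.5
zs = [random.expovariate(1.0) for _ in range(n)]

empirical = sum(1 for z in zs if z > theta * 1.0) / n
pz_lower = (1 - theta) ** 2 * 1.0**2 / 2.0  # = 0.125

# Exact Pr(Z > 0.5) = e^{-0.5} ~ 0.607, comfortably above 0.125.
assert empirical >= pz_lower
```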

Chernoff bounds
The generic Chernoff bound requires the moment generating function of $$X$$, defined as $$M_X(t):=\operatorname{E}\!\left[e^{tX}\right].$$ It always exists, but may be infinite. From Markov's inequality, for every $$t>0$$:


 * $$\Pr(X \geq a) \leq \frac{\operatorname{E}[e^{tX}]}{e^{ta}},$$

and for every $$t<0$$:


 * $$\Pr(X \leq a) \leq \frac{\operatorname{E}[e^{tX}]}{e^{ta}}.$$

There are various Chernoff bounds for different distributions and different values of the parameter $$t$$; the tightest bound is obtained by optimizing over $$t$$.
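As a sketch of the optimization over $$t$$, consider a sum of independent Bernoulli variables, whose MGF is available in closed form (the parameters and the grid search below are arbitrary illustrative choices):

```python
import math

# Generic Chernoff bound for S_n = sum of n independent Bernoulli(p)
# variables: E[e^{t S_n}] = (1 - p + p e^t)^n, so the bound is
# min over t > 0 of exp(-t a) * (1 - p + p e^t)^n.
def chernoff_bound(n, p, a):
    """Minimize the Chernoff bound over a grid of t > 0."""
    t_grid = [0.01 * k for k in range(1, 500)]
    return min(math.exp(-t * a) * (1 - p + p * math.exp(t)) ** n
               for t in t_grid)

n, p = 100, 0.5
a = 70  # threshold well above the mean n*p = 50
bound = chernoff_bound(n, p, a)

# Exact tail probability via the binomial pmf, for comparison.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(a, n + 1))
assert exact <= bound < 1
```

Unlike Markov's or Chebyshev's inequalities, the optimized bound decays exponentially in the deviation, which is why Chernoff bounds dominate in large-deviation regimes.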

Bounds on sums of independent bounded variables
Let $$X_1, X_2,\dots,X_n$$ be independent random variables such that, for all i:
 * $$a_i\leq X_i\leq b_i$$ almost surely.
 * $$c_i := b_i-a_i$$
 * $$\forall i: c_i \leq C$$

Let $$S_n$$ be their sum, $$E_n$$ its expected value and $$V_n$$ its variance:
 * $$S_n := \sum_{i=1}^n X_i$$
 * $$E_n := \operatorname{E}[S_n] = \sum_{i=1}^n \operatorname{E}[X_i]$$
 * $$V_n := \operatorname{Var}[S_n] = \sum_{i=1}^n \operatorname{Var}[X_i]$$

It is often interesting to bound the difference between the sum and its expected value. Several inequalities can be used.

1. Hoeffding's inequality says that:
 * $$\Pr\left[|S_n-E_n|>t\right] \le 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right) \le 2 \exp \left(-\frac{2t^2}{n C^2} \right)$$
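Hoeffding's bound can be checked empirically; the uniform summands and all parameter values below are arbitrary illustrative choices:

```python
import math
import random

# Hoeffding's inequality for a sum of independent bounded variables.
# Here X_i ~ Uniform(0, 1), so a_i = 0, b_i = 1, c_i = 1 and E_n = n/2.
random.seed(2)
n, trials, t = 50, 20_000, 5.0

def deviation_exceeds(t):
    s = sum(random.random() for _ in range(n))
    return abs(s - n * 0.5) > t

empirical = sum(deviation_exceeds(t) for _ in range(trials)) / trials
hoeffding = 2 * math.exp(-2 * t**2 / n)  # sum of c_i^2 equals n here

assert empirical <= hoeffding
```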

2. The random variable $$S_n-E_n$$ is a special case of a martingale, and $$S_0-E_0=0$$. Hence, the general form of Azuma's inequality can also be used and it yields a similar bound:
 * $$\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2}\right)< 2 \exp \left(-\frac{2t^2}{n C^2} \right) $$

This is a generalization of Hoeffding's since it can handle other types of martingales, as well as supermartingales and submartingales. See Fan et al. (2015). Note that if the simpler form of Azuma's inequality is used, the exponent in the bound is worse by a factor of 4.

3. The sum function, $$S_n=f(X_1,\dots,X_n)$$, is a special case of a function of n variables. This function changes in a bounded way: if variable i is changed, the value of f changes by at most $$b_i-a_i=c_i$$. Hence, McDiarmid's inequality can also be used and it yields a similar bound:
 * $$\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{2t^2}{\sum_{i=1}^n c_i^2} \right)< 2 \exp \left(-\frac{2t^2}{n C^2} \right)$$

This is a different generalization of Hoeffding's since it can handle other functions besides the sum function, as long as they change in a bounded way.

4. Bennett's inequality offers some improvement over Hoeffding's when the variances of the summands are small compared to their almost-sure bounds C. It says that:
 * $$\Pr\left[|S_n-E_n| > t \right] \leq 2\exp\left[ - \frac{V_n}{C^2} h\left(\frac{C t}{V_n} \right)\right],$$ where $$h(u) = (1+u)\log(1+u)-u.$$

5. The first of Bernstein's inequalities says that:
 * $$\Pr\left[|S_n-E_n|>t\right] < 2 \exp \left(-\frac{t^2/2}{V_n + C\cdot t/3} \right)$$

This is a generalization of Hoeffding's, since it exploits not only the almost-sure bound on the random variables but also a bound on their variance.
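The gain from the variance information can be seen numerically. In the sketch below, the range, variance, and threshold are arbitrary illustrative values chosen so that the variance is much smaller than the square of the range:

```python
import math

# When the summands' variance is small compared to their range,
# Bernstein's inequality beats Hoeffding's. Sketch for n variables
# bounded in [-1, 1] (so C = 2 in the notation above) with
# per-variable variance 0.01.
n, t, C = 1000, 40.0, 2.0
V_n = n * 0.01  # total variance of the sum

hoeffding = 2 * math.exp(-2 * t**2 / (n * C**2))
bernstein = 2 * math.exp(-(t**2 / 2) / (V_n + C * t / 3))

assert bernstein < hoeffding  # variance information tightens the bound
```

Here Hoeffding's bound is close to 1 while Bernstein's is vanishingly small, because Hoeffding's inequality can only use the worst-case range.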

6. Chernoff bounds have a particularly simple form in the case of sum of independent variables, since $$\operatorname{E}[e^{t\cdot S_n}] = \prod_{i=1}^n {\operatorname{E}[e^{t\cdot X_i}]}$$.

For example, suppose the variables $$X_i$$ satisfy $$X_i \geq E(X_i)-a_i-M$$, for $$1 \leq i \leq n$$. Then we have the lower-tail inequality:
 * $$\Pr[S_n - E_n < -\lambda]\leq \exp\left(-\frac{\lambda^2}{2(V_n+\sum_{i=1}^n a_i^2+M\lambda/3)}\right)$$

If $$X_i$$ satisfies $$X_i \leq E(X_i)+a_i+M$$, we have the upper-tail inequality:
 * $$\Pr[S_n - E_n > \lambda]\leq \exp\left(-\frac{\lambda^2}{2(V_n + \sum_{i=1}^n a_i^2+M\lambda/3)}\right)$$

If the $$X_i$$ are i.i.d., $$|X_i| \leq 1$$ and $$\sigma^2$$ is the variance of $$X_i$$, a typical version of the Chernoff inequality is:
 * $$\Pr[|S_n| \geq k\sigma]\leq 2e^{-k^2/4n} \text{ for } 0 \leq k\leq 2\sigma.$$

7. Similar bounds for sums of Rademacher random variables can be found under Rademacher distribution.

Efron–Stein inequality
The Efron–Stein inequality (or influence inequality, or MG bound on variance) bounds the variance of a general function.

Suppose that $$X_1 \dots X_n$$, $$X_1' \dots X_n'$$ are independent with $$X_i'$$ and $$X_i$$ having the same distribution for all $$i$$.

Let $$X = (X_1,\dots, X_n), X^{(i)} = (X_1, \dots , X_{i-1}, X_i',X_{i+1}, \dots , X_n).$$ Then

 * $$\mathrm{Var}(f(X)) \leq \frac{1}{2} \sum_{i=1}^{n} E[(f(X)-f(X^{(i)}))^2].$$

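The bound can be estimated by Monte Carlo. In the sketch below, $$f = \max$$ and the uniform distribution are arbitrary illustrative choices:

```python
import random

# Monte Carlo illustration of the Efron-Stein bound for
# f(X) = max(X_1, ..., X_n) with X_i ~ Uniform(0, 1).
random.seed(3)
n, trials = 5, 50_000

f_values, es_terms = [], []
for _ in range(trials):
    x = [random.random() for _ in range(n)]
    fx = max(x)
    f_values.append(fx)
    # Resample one coordinate at a time to form X^(i).
    total = 0.0
    for i in range(n):
        x_i = list(x)
        x_i[i] = random.random()  # independent copy X_i'
        total += (fx - max(x_i)) ** 2
    es_terms.append(0.5 * total)

m = sum(f_values) / trials
variance = sum((v - m) ** 2 for v in f_values) / trials
es_bound = sum(es_terms) / trials  # (1/2) sum_i E[(f(X)-f(X^(i)))^2]

assert variance <= es_bound  # Var(f(X)) <= Efron-Stein bound
```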

Bretagnolle–Huber–Carol inequality
The Bretagnolle–Huber–Carol inequality bounds the difference between a vector of multinomially distributed random variables and a vector of expected values.

If a random vector $$(Z_1, Z_2, Z_3, \ldots, Z_n)$$ is multinomially distributed with parameters $$(p_1, p_2, \ldots, p_n)$$ and satisfies $$Z_1 + Z_2 + \dots + Z_n = M,$$ then
 * $$\Pr\left( \sum_{i=1}^n |Z_i -M p_i| \geq 2M \varepsilon \right) \leq 2^n e^{-2M\varepsilon^2}.$$

This inequality is used to bound the total variation distance.
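A small empirical check of the bound (the category probabilities and all parameter values below are arbitrary illustrative choices):

```python
import math
import random

# Bretagnolle-Huber-Carol bound on the L1 deviation of multinomial
# counts, checked empirically for a small example.
random.seed(5)
n_cats, M, trials = 3, 200, 5000
p = [0.2, 0.3, 0.5]
eps = 0.1

def l1_exceeds(eps):
    counts = [0] * n_cats
    for i in random.choices(range(n_cats), weights=p, k=M):
        counts[i] += 1
    l1 = sum(abs(counts[i] - M * p[i]) for i in range(n_cats))
    return l1 >= 2 * M * eps

empirical = sum(l1_exceeds(eps) for _ in range(trials)) / trials
bound = 2**n_cats * math.exp(-2 * M * eps**2)  # 2^n e^{-2 M eps^2}

assert empirical <= bound
```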

Mason and van Zwet inequality
The Mason and van Zwet inequality for multinomial random vectors concerns a slight modification of the classical chi-square statistic.

Let the random vector $$(N_1, \ldots ,N_k) $$ be multinomially distributed with parameters $$n $$ and $$(p_1,\ldots, p_k)$$ such that $$ p_i > 0$$ for $$ i < k.$$ Then for every $$C > 0 $$ and $$ \delta > 0 $$ there exist constants $$ a, b, c > 0,$$ such that for all $$ n \geq 1$$ and $$ \lambda ,p_1, \ldots, p_{k-1} $$ satisfying $$ \lambda > Cn \min \{p_i | 1 \leq i \leq k-1 \} $$ and $$ \sum_{i=1}^{k-1} p_i \leq 1 -\delta, $$ we have
 * $$\Pr\left( \sum_{i=1}^{k-1} \frac{(N_i-np_i)^2}{np_i}> \lambda \right) \leq a e^{bk-c\lambda}.$$

Dvoretzky–Kiefer–Wolfowitz inequality
The Dvoretzky–Kiefer–Wolfowitz inequality bounds the difference between the real and the empirical cumulative distribution function.

Given a natural number $$n$$, let $$X_1, X_2,\dots,X_n$$ be real-valued independent and identically distributed random variables with cumulative distribution function $$F(\cdot)$$. Let $$F_n$$ denote the associated empirical distribution function defined by
 * $$F_n(x) = \frac1n \sum_{i=1}^n \mathbf{1}_{\{X_i\leq x\}},\qquad x\in\mathbb{R}.$$

So $$F(x)$$ is the probability that a single random variable $$X$$ is at most $$x$$, and $$F_n(x)$$ is the fraction of the random variables that are at most $$x$$.

Then
 * $$\Pr\left(\sup_{x\in\mathbb R} \bigl(F_n(x) - F(x)\bigr) > \varepsilon \right) \le e^{-2n\varepsilon^2} \text{ for every } \varepsilon \geq \sqrt{\tfrac 1 {2n} \ln2}.$$
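In practice the bound is often inverted to obtain a confidence band for $$F$$. The sketch below does this inversion and computes the one-sided statistic for a uniform sample (the function name and all parameter values are illustrative):

```python
import math
import random

# Inverting the one-sided DKW bound Pr(sup_x (F_n(x) - F(x)) > eps)
# <= exp(-2 n eps^2) gives the band half-width for a target level.
def dkw_epsilon(n, alpha):
    """Smallest eps with exp(-2 * n * eps^2) <= alpha."""
    return math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

n, alpha = 1000, 0.05
eps = dkw_epsilon(n, alpha)

# alpha <= 1/2 guarantees the condition eps >= sqrt(ln(2) / (2n)).
assert eps >= math.sqrt(math.log(2) / (2 * n))
assert abs(math.exp(-2 * n * eps**2) - alpha) < 1e-12

# sup_x (F_n(x) - F(x)) for a Uniform(0, 1) sample, where F(x) = x:
# the supremum is attained just at an order statistic, since F_n
# jumps to (i + 1)/n at the i-th smallest observation.
random.seed(4)
xs = sorted(random.random() for _ in range(n))
sup_dev = max((i + 1) / n - x for i, x in enumerate(xs))
# With probability at least 1 - alpha, sup_dev <= eps.
```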

Anti-concentration inequalities
Anti-concentration inequalities, on the other hand, provide an upper bound on how much a random variable can concentrate around a quantity.

For example, Rao and Yehudayoff show that there exists some $$C > 0$$ such that, for most directions of the hypercube $$x \in \{\pm 1\}^n$$, the following is true:

 * $$\Pr\left(\langle x, Y\rangle = k\right) \le \frac{C}{\sqrt{n}},$$ where $$Y$$ is drawn uniformly from a subset $$B \subseteq \{\pm 1\}^n$$ of large enough size.

Such inequalities are of importance in several fields, including communication complexity (e.g., in proofs of the gap Hamming problem) and graph theory.

An interesting anti-concentration inequality for weighted sums of independent Rademacher random variables can be obtained using the Paley–Zygmund and the Khintchine inequalities.