Cantelli's inequality

In probability theory, Cantelli's inequality (also called the Chebyshev-Cantelli inequality and the one-sided Chebyshev inequality) is an improved version of Chebyshev's inequality for one-sided tail bounds. The inequality states that, for $$\lambda > 0,$$



$$\Pr(X-\mathbb{E}[X]\ge\lambda) \le \frac{\sigma^2}{\sigma^2 + \lambda^2}, $$

where


 * $$X$$ is a real-valued random variable,
 * $$\Pr$$ is the probability measure,
 * $$\mathbb{E}[X]$$ is the expected value of $$X$$,
 * $$\sigma^2$$ is the variance of $$X$$.

Applying the Cantelli inequality to $$-X$$ gives a bound on the lower tail,



$$\Pr(X-\mathbb{E}[X]\le -\lambda) \le \frac{\sigma^2}{\sigma^2 + \lambda^2}. $$
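As a small illustration of the two tail bounds, the following sketch evaluates them exactly for a fair six-sided die. The die, the choice $$\lambda = 5/2$$ and the helper name `cantelli_bound` are assumptions made for this example only.

```python
from fractions import Fraction


def cantelli_bound(variance, lam):
    """Cantelli's bound sigma^2 / (sigma^2 + lam^2) for a deviation lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return variance / (variance + lam * lam)


# Illustrative distribution (an assumption, not from the text): a fair die.
faces = [Fraction(k) for k in range(1, 7)]
prob = Fraction(1, 6)
mean = sum(faces) * prob
variance = sum((x - mean) ** 2 for x in faces) * prob

lam = Fraction(5, 2)
# Exact tail probabilities Pr(X - E[X] >= lam) and Pr(X - E[X] <= -lam).
upper_tail = sum(prob for x in faces if x - mean >= lam)
lower_tail = sum(prob for x in faces if x - mean <= -lam)
bound = cantelli_bound(variance, lam)

print("upper tail:", upper_tail)
print("lower tail:", lower_tail)
print("Cantelli bound:", bound)
```

Both tail probabilities stay below the same bound, as the two displayed inequalities require.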

While the inequality is often attributed to Francesco Paolo Cantelli, who published it in 1928, it originates in Chebyshev's work of 1874. When bounding the probability that a random variable deviates from its mean in only one direction (positive or negative), Cantelli's inequality gives an improvement over Chebyshev's inequality. The Chebyshev inequality has "higher moments versions" and "vector versions", and so does the Cantelli inequality.

Comparison to Chebyshev's inequality
For one-sided tail bounds, Cantelli's inequality is better, since Chebyshev's inequality only yields

$$\Pr(X - \mathbb{E}[X] \geq \lambda) \leq \Pr(|X-\mathbb{E}[X]|\ge\lambda) \le \frac{\sigma^2}{\lambda^2}. $$

On the other hand, for two-sided tail bounds, Cantelli's inequality gives



$$\Pr(|X-\mathbb{E}[X]|\ge\lambda) = \Pr(X-\mathbb{E}[X]\ge\lambda) + \Pr(X-\mathbb{E}[X]\le-\lambda) \le \frac{2\sigma^2}{\sigma^2 + \lambda^2}, $$

which is never better than Chebyshev's inequality: for $$\lambda > \sigma$$ it is strictly weaker, while for $$\lambda \leq \sigma$$ both inequalities bound a probability by a value of at least one, and so are trivial.
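The comparison can be checked numerically. The sketch below tabulates the three bounds for a few deviations; the variance $$\sigma^2 = 1$$, the chosen values of $$\lambda$$ and the function names are assumptions made for illustration.

```python
def chebyshev_bound(variance, lam):
    """Chebyshev's bound sigma^2 / lam^2 on Pr(|X - E[X]| >= lam)."""
    return variance / lam**2


def cantelli_one_sided(variance, lam):
    """Cantelli's bound on Pr(X - E[X] >= lam)."""
    return variance / (variance + lam**2)


def cantelli_two_sided(variance, lam):
    """Sum of the upper-tail and lower-tail Cantelli bounds."""
    return 2 * cantelli_one_sided(variance, lam)


variance = 1.0  # assumed value, so sigma = 1
rows = [
    (lam,
     cantelli_one_sided(variance, lam),
     chebyshev_bound(variance, lam),
     cantelli_two_sided(variance, lam))
    for lam in (0.5, 1.0, 2.0, 3.0)
]

print("lambda  Cantelli(1-sided)  Chebyshev  Cantelli(2-sided)")
for lam, one_sided, cheb, two_sided in rows:
    print(f"{lam:6.2f}  {one_sided:17.4f}  {cheb:9.4f}  {two_sided:17.4f}")
```

In each row the one-sided Cantelli value is below the Chebyshev value, while the doubled Cantelli value is at least as large as the Chebyshev value once $$\lambda \geq \sigma$$.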

Proof
Let $$X$$ be a real-valued random variable with finite variance $$\sigma^2$$ and expectation $$\mu$$, and define $$Y = X - \mathbb{E}[X]$$ (so that $$\mathbb{E}[Y] = 0$$ and $$\operatorname{Var}(Y) = \sigma^2$$).

Then, for any $$u\geq 0$$, we have

$$\Pr( X-\mathbb{E}[X]\geq\lambda) = \Pr( Y \geq \lambda) = \Pr( Y + u \geq \lambda + u) \leq  \Pr( (Y + u)^2  \geq (\lambda + u)^2 ) \leq \frac{\mathbb{E}[(Y + u)^2] }{(\lambda + u)^2} = \frac{\sigma^2 + u^2 }{(\lambda + u)^2}, $$

where the first inequality uses $$\lambda + u > 0$$, the last inequality is a consequence of Markov's inequality, and the final equality uses $$\mathbb{E}[Y] = 0$$. As the above holds for any choice of $$u \geq 0$$, we can apply it with the value that minimizes the function $$u \mapsto \frac{\sigma^2 + u^2 }{(\lambda + u)^2}$$ over $$u \geq 0$$. By differentiating, this can be seen to be $$u_\ast = \frac{\sigma^2}{\lambda}$$, leading to

$$\Pr( X-\mathbb{E}[X] \geq\lambda) \leq \frac{\sigma^2 + u_\ast^2 }{(\lambda + u_\ast)^2} = \frac{\sigma^2}{\lambda^2 + \sigma^2} $$

for every $$\lambda > 0$$.
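The optimization step of the proof can be sanity-checked with exact rational arithmetic. The sketch below assumes the illustrative values $$\sigma^2 = 2$$ and $$\lambda = 3/2$$, and the name `intermediate_bound` is hypothetical. It confirms that the intermediate bound evaluated at $$u_\ast = \sigma^2/\lambda$$ equals the closed form, and that a coarse grid search over $$u \geq 0$$ finds no smaller value.

```python
from fractions import Fraction


def intermediate_bound(variance, lam, u):
    """The bound (sigma^2 + u^2) / (lam + u)^2 from the proof, for u >= 0."""
    return (variance + u * u) / ((lam + u) ** 2)


variance = Fraction(2)  # assumed sigma^2
lam = Fraction(3, 2)    # assumed lambda > 0

u_star = variance / lam                       # claimed minimizer sigma^2 / lambda
closed_form = variance / (lam**2 + variance)  # Cantelli's bound
at_u_star = intermediate_bound(variance, lam, u_star)

# Coarse grid over u in [0, 10] with step 1/100 (a numerical check, not a proof).
grid = [Fraction(k, 100) for k in range(0, 1001)]
u_best = min(grid, key=lambda u: intermediate_bound(variance, lam, u))

print("u_star:", u_star, "grid minimizer:", u_best)
print("bound at u_star:", at_u_star, "closed form:", closed_form)
```

The grid minimizer lands next to $$u_\ast$$, and setting $$u = 0$$ recovers the weaker Chebyshev-type bound $$\sigma^2/\lambda^2$$.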

Generalizations
Various stronger inequalities can be shown. He, Zhang, and Zhang showed (Corollary 2.3) when $$\mathbb{E}[X]=0,\,\mathbb{E}[X^2]=1$$ and $$\lambda\ge0$$:



$$\Pr(X\ge\lambda) \le 1- (2\sqrt{3}-3)\frac{(1+\lambda^2)^2}{\mathbb{E}[X^4]+6\lambda^2+\lambda^4}. $$

In the case $$\lambda=0$$ this matches a bound in Berger's "The Fourth Moment Method",

$$\Pr(X\ge 0) \ge \frac{2\sqrt{3}-3}{\mathbb{E}[X^4]}. $$

This improves over Cantelli's inequality in that we can get a non-zero lower bound, even when $$\mathbb{E}[X]=0$$.
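To see how the fourth-moment bound behaves next to Cantelli's, the sketch below evaluates both under the normalization $$\mathbb{E}[X]=0,\,\mathbb{E}[X^2]=1$$. The fourth moment $$\mathbb{E}[X^4]=3$$ (that of a standard normal variable) is an assumed illustrative input, and the function names are hypothetical.

```python
import math

C = 2 * math.sqrt(3) - 3  # the constant 2*sqrt(3) - 3, roughly 0.4641


def fourth_moment_upper_bound(fourth_moment, lam):
    """Upper bound on Pr(X >= lam) quoted above.

    Assumes E[X] = 0, E[X^2] = 1 and lam >= 0.
    """
    return 1 - C * (1 + lam**2) ** 2 / (fourth_moment + 6 * lam**2 + lam**4)


def cantelli_bound_unit_variance(lam):
    """Cantelli's expression with sigma^2 = 1; it equals 1 at lam = 0."""
    return 1 / (1 + lam**2)


def berger_lower_bound(fourth_moment):
    """Lower bound (2*sqrt(3) - 3) / E[X^4] on Pr(X >= 0)."""
    return C / fourth_moment


fourth_moment = 3.0  # assumed: E[X^4] of a standard normal variable
for lam in (0.0, 0.5, 1.0):
    fm = fourth_moment_upper_bound(fourth_moment, lam)
    ca = cantelli_bound_unit_variance(lam)
    print(f"lambda={lam:.1f}  fourth-moment={fm:.4f}  Cantelli={ca:.4f}  best={min(fm, ca):.4f}")

print("lower bound on Pr(X >= 0):", round(berger_lower_bound(fourth_moment), 4))
```

At $$\lambda = 0$$ Cantelli's expression equals one, whereas the fourth-moment bound is strictly below one. Since both are valid upper bounds under these assumptions, taking the smaller of the two at each $$\lambda$$ is a reasonable way to combine them.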