User:InfoTheorist/Hoeffding's inequality proof

In this section, we give a proof of Hoeffding's inequality. For the proof, we require the following result, known as Hoeffding's lemma. Suppose $$X$$ is a real random variable with mean zero such that $$P(X\in [a,b])=1$$. Then

$$ \mathrm{E}[\exp(sX)]\leq \exp\left(\frac{s^{2}(b-a)^{2}}{8}\right). $$
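As a quick numerical sanity check (not part of the proof), the following sketch evaluates both sides of the lemma for a two-point zero-mean distribution: $$X$$ takes the value $$a$$ with probability $$p=\frac{b}{b-a}$$ and $$b$$ otherwise, which makes $$\mathrm{E}X=0$$. The particular values of $$a$$, $$b$$, and the grid of $$s$$ are illustrative choices.

```python
import math

# Two-point distribution: X = a with probability p, X = b with probability 1 - p.
# Choosing p = b / (b - a) makes E[X] = 0 while P(X in [a, b]) = 1.
a, b = -1.0, 2.0
p = b / (b - a)
assert abs(p * a + (1 - p) * b) < 1e-12  # mean is zero

# Compare E[exp(sX)] against exp(s^2 (b-a)^2 / 8) on a grid of s.
for s in [k / 10 for k in range(-30, 31)]:
    mgf = p * math.exp(s * a) + (1 - p) * math.exp(s * b)  # exact E[exp(sX)]
    bound = math.exp(s ** 2 * (b - a) ** 2 / 8)
    assert mgf <= bound + 1e-12
print("lemma bound holds on the grid")
```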

To prove this lemma, first note that if one of $$a$$ or $$b$$ is zero, then the assumptions $$\mathrm{E}X=0$$ and $$P(X\in [a,b])=1$$ force $$P(X=0)=1$$, and the inequality follows trivially. If both are nonzero, then $$a$$ must be negative and $$b$$ must be positive.

Next, recall that $$f(x)=\exp(sx)$$ is a convex function on the real line. Thus for any $$x\in [a,b]$$,

$$ \exp(sx)\leq \frac{b-x}{b-a}\exp(sa)+\frac{x-a}{b-a}\exp(sb). $$
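The chord bound above can be spot-checked numerically. The sketch below verifies that $$\exp(sx)$$ lies below the chord through $$(a,\exp(sa))$$ and $$(b,\exp(sb))$$ at sample points of $$[a,b]$$; the values of $$a$$, $$b$$, and $$s$$ are illustrative.

```python
import math

# Convexity chord bound: on [a, b], exp(sx) lies below the line through
# (a, exp(sa)) and (b, exp(sb)).
a, b, s = -1.0, 2.0, 0.7
for k in range(101):
    x = a + (b - a) * k / 100
    chord = (b - x) / (b - a) * math.exp(s * a) + (x - a) / (b - a) * math.exp(s * b)
    assert math.exp(s * x) <= chord + 1e-12
print("chord bound holds at all sample points")
```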

Combining the previous inequality with the fact that $$\mathrm{E}X=0$$ results in

$$ \begin{align} \mathrm{E}[\exp(sX)] & \leq \frac{b\exp(sa)-a\exp(sb)}{b-a} \\ & = (1-\theta+\theta \exp(s(b-a)))\exp(-s\theta(b-a)), \\ \end{align} $$

where $$\theta=-\frac{a}{b-a}$$; since $$a<0<b$$, we have $$0<\theta<1$$. Next let $$u=s(b-a)$$ and define $$\phi:\mathbb{R}\rightarrow\mathbb{R}$$ as

$$ \phi(u)=-\theta u+\log (1-\theta+\theta \exp(u)). $$

Note that $$\phi$$ is well defined on $$\mathbb{R}$$ as $$\theta>0$$ and for all $$u\in\mathbb{R}$$,

$$ \exp(u)>1-\frac{1}{\theta}=\frac{b}{a} $$

since $$\frac{b}{a}<0$$. The definition of $$\phi$$ implies $$\mathrm{E}[\exp(sX)]\leq \exp(\phi(u))$$. By Taylor's theorem, for every real $$u$$ there exists a $$v$$ between $$0$$ and $$u$$ such that

$$ \phi(u)=\phi(0)+u\phi'(0)+\frac{u^{2}}{2}\phi''(v). $$

Note that $$\phi(0)=0$$,

$$ \phi'(0)=\left.\left(-\theta+\frac{\theta \exp(u)}{1-\theta+\theta \exp(u)}\right)\right|_{u=0}=0, $$

and

$$ \begin{align} \phi''(v) &= \frac{\theta \exp(v)(1-\theta+\theta \exp(v))-\theta^{2}\exp(2v)}{(1-\theta+\theta \exp(v))^{2}}\\ &=\frac{\theta \exp(v)}{1-\theta+\theta \exp(v)}\left(1-\frac{\theta \exp(v)}{1-\theta+\theta \exp(v)}\right)\\ &\leq \frac{1}{4}, \end{align} $$

where the last inequality holds because $$0<\theta<1$$ guarantees that $$t=\frac{\theta \exp(v)}{1-\theta+\theta \exp(v)}$$ lies in $$[0,1]$$, and $$t(1-t)\leq \frac{1}{4}$$ for all $$t\in [0,1]$$. Therefore, $$\phi (u)\leq \frac{1}{8}u^{2}=\frac{1}{8}s^{2}(b-a)^{2}$$. This implies

$$ \mathrm{E}[\exp(sX)]\leq \exp\left(\frac{s^{2}(b-a)^{2}}{8}\right). $$
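The key analytic fact in the proof, $$\phi(u)\leq \frac{u^{2}}{8}$$, can also be checked directly. The sketch below evaluates $$\phi$$ on a grid of $$u$$ for a few illustrative values of $$\theta\in(0,1)$$.

```python
import math

# Verify phi(u) = -theta*u + log(1 - theta + theta*exp(u)) <= u^2 / 8
# for several theta in (0, 1) and a grid of u.
for theta in [0.1, 0.25, 0.5, 0.75, 0.9]:
    for k in range(-50, 51):
        u = k / 10
        phi = -theta * u + math.log(1 - theta + theta * math.exp(u))
        assert phi <= u * u / 8 + 1e-12
print("phi(u) <= u^2/8 on the grid")
```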

Using this lemma, we can prove Hoeffding's inequality. Suppose $$X_{1},\dots,X_{n}$$ are $$n$$ independent random variables such that for every $$i$$, $$1\leq i\leq n$$, we have $$P(X_{i}\in [a_{i},b_{i}])=1$$. Let $$S_{n}=\sum_{i=1}^{n}X_{i}$$. Then for $$s>0$$ and $$t\geq 0$$, Markov's inequality and the independence of the $$X_{i}$$'s imply

$$ \begin{align} P(S_{n}-\mathrm{E}S_{n}\geq t) &= P(\exp(s(S_{n}-\mathrm{E}S_{n}))\geq \exp(st))\\ &\leq \exp(-st)\mathrm{E}[\exp(s(S_{n}-\mathrm{E}S_{n}))]\\ &=\exp(-st)\prod_{i=1}^{n}\mathrm{E}[\exp(s(X_{i}-\mathrm{E}X_{i}))]\\ &\leq \exp\left(-st+\frac{s^{2}}{8}\sum_{i=1}^{n}(b_{i}-a_{i})^{2}\right). \end{align} $$

To get the best possible bound of this form, we minimize the right-hand side of the last inequality over $$s$$. Define $$g:\mathbb{R}\rightarrow\mathbb{R}$$ as

$$ g(s)=-st+\frac{s^{2}}{8}\sum_{i=1}^{n}(b_{i}-a_{i})^2. $$

Note that $$g$$ is a quadratic function with positive leading coefficient and thus achieves its minimum at

$$ s=\frac{4t}{\sum_{i=1}^{n}(b_{i}-a_{i})^{2}}. $$

Thus we get

$$ P(S_{n}-\mathrm{E}S_{n}\geq t)\leq \exp\left(-\frac{2t^{2}}{\sum_{i=1}^{n}(b_{i}-a_{i})^{2}}\right). $$
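As a final sanity check (a Monte Carlo sketch, with an illustrative sample size and distribution), the inequality can be tested empirically: take $$X_{i}$$ uniform on $$[0,1]$$, so that $$a_{i}=0$$, $$b_{i}=1$$, and $$\mathrm{E}S_{n}=\frac{n}{2}$$.

```python
import math
import random

# Monte Carlo check of Hoeffding's inequality for X_i ~ Uniform[0, 1]:
# here sum of (b_i - a_i)^2 = n and E[S_n] = n / 2.
random.seed(0)
n, trials, t = 20, 100_000, 3.0
exceed = 0
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n))
    if s_n - n / 2 >= t:
        exceed += 1
empirical = exceed / trials
bound = math.exp(-2 * t ** 2 / n)
assert empirical <= bound
print(f"empirical tail {empirical:.5f} <= Hoeffding bound {bound:.5f}")
```

The bound is loose here (as Chernoff-type bounds typically are): the empirical tail probability is far below $$\exp(-2t^{2}/n)$$.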