User:Aravind V R/Formulas/Information theory

Entropy
Entropy is defined as
 * $$\Eta(X) = \operatorname{E}[\operatorname{I}(X)] = \operatorname{E}[-\log(\mathrm{P}(X))] = -\sum_{i=1}^n {\mathrm{P}(x_i) \log_b \mathrm{P}(x_i)}$$
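
For illustration, with logarithms to base 2 a fair coin gives $$\Eta(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$$ bit, while a biased coin with $$\mathrm{P}(x_1)=\tfrac{3}{4}$$, $$\mathrm{P}(x_2)=\tfrac{1}{4}$$ gives $$\Eta(X) \approx 0.811$$ bits.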

Conditional entropy
 * $$ \Eta(X|Y)=-\sum_{i,j}p(x_{i},y_{j})\log\frac{p(x_{i},y_{j})}{p(y_{j})}$$

Joint entropy
 * $$\Eta(X,Y) = -\sum_{x\in\mathcal X} \sum_{y\in\mathcal Y} P(x,y) \log_2[P(x,y)]$$
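
As an illustrative example (used again below), let $$X$$ and $$Y$$ be binary with joint distribution $$P(0,0)=\tfrac{1}{2}$$, $$P(0,1)=P(1,1)=\tfrac{1}{4}$$, $$P(1,0)=0$$. The marginals give $$\Eta(Y)=1$$ bit and $$\Eta(X)\approx 0.811$$ bits, the joint entropy is $$\Eta(X,Y)=1.5$$ bits, and $$\Eta(X|Y)=0.5$$ bits, since $$X$$ is determined when $$Y=0$$ and uniform when $$Y=1$$.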

Properties
Nonnegativity
 * $$\Eta(X) \ge 0$$

Subadditivity
 * $$\max \left[\Eta(X),\Eta(Y) \right] \leq \Eta(X,Y) \leq \Eta(X) + \Eta(Y)$$
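
For the illustrative distribution above ($$\Eta(X)\approx 0.811$$, $$\Eta(Y)=1$$, $$\Eta(X,Y)=1.5$$), the bounds read $$1 \leq 1.5 \leq 1.811$$ bits; the upper bound holds with equality exactly when $$X$$ and $$Y$$ are independent.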

Relation between entropies
 * $$\Eta(Y,X) = \Eta(Y)+\Eta(X|Y)$$
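
Continuing the same example: $$\Eta(Y,X) = 1.5$$ bits, which equals $$\Eta(Y) + \Eta(X|Y) = 1 + 0.5$$ bits.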

Kullback–Leibler divergence or relative entropy is defined as
 * $$ D_{\mathrm{KL}}(P\|Q) \equiv \sum_{i=1}^n p_i \log_2 \frac{p_i}{q_i}$$
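
For example, with $$P=(\tfrac{1}{2},\tfrac{1}{2})$$ and $$Q=(\tfrac{1}{4},\tfrac{3}{4})$$,
 * $$D_{\mathrm{KL}}(P\|Q) = \tfrac{1}{2}\log_2 2 + \tfrac{1}{2}\log_2\tfrac{2}{3} \approx 0.208 \text{ bits},$$
whereas $$D_{\mathrm{KL}}(Q\|P) \approx 0.189$$ bits, so the divergence is not symmetric.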

Mutual information
 * $$\operatorname{I}(X;Y) = \sum_{y \in \mathcal Y} \sum_{x \in \mathcal X} { p(x,y) \log{ \left(\frac{p(x,y)}{p(x)\,p(y)} \right) }}$$ for discrete random variables $$X$$ and $$Y$$
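
For the illustrative joint distribution above ($$p(0,0)=\tfrac{1}{2}$$, $$p(0,1)=p(1,1)=\tfrac{1}{4}$$), the marginals are $$p_X=(\tfrac{3}{4},\tfrac{1}{4})$$ and $$p_Y=(\tfrac{1}{2},\tfrac{1}{2})$$, so
 * $$\operatorname{I}(X;Y) = \tfrac{1}{2}\log_2\tfrac{4}{3} + \tfrac{1}{4}\log_2\tfrac{2}{3} + \tfrac{1}{4}\log_2 2 \approx 0.311 \text{ bits}$$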

Conditional mutual information
 * $$\operatorname{I}(X;Y|Z) = \mathbb{E}_Z\big(\operatorname{I}(X;Y)|Z\big) = \sum_{z\in \mathcal{Z}} \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p_Z(z)\, p_{X,Y|Z}(x,y|z) \log\left[\frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)\,p_{Y|Z}(y|z)}\right] = \sum_{z\in \mathcal{Z}} \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p_{X,Y,Z}(x,y,z) \log \frac{p_{X,Y,Z}(x,y,z)\,p_{Z}(z)}{p_{X,Z}(x,z)\,p_{Y,Z}(y,z)}$$
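
For example, if $$X$$ and $$Y$$ are independent fair bits and $$Z = X \oplus Y$$, then $$\operatorname{I}(X;Y) = 0$$ but $$\operatorname{I}(X;Y|Z) = 1$$ bit, since conditioning on $$Z$$ makes $$Y$$ a deterministic function of $$X$$.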

Properties
Nonnegativity
 * $$\operatorname{I}(X;Y) \ge 0$$

Symmetry
 * $$\operatorname{I}(X;Y) = \operatorname{I}(Y;X)$$

Relation to entropy
 * $$\begin{align} \operatorname{I}(X;Y) &{} \equiv \Eta(X) - \Eta(X|Y) \\ &{} \equiv \Eta(Y) - \Eta(Y|X) \\ &{} \equiv \Eta(X) + \Eta(Y) - \Eta(X, Y) \\ &{} \equiv \Eta(X, Y) - \Eta(X|Y) - \Eta(Y|X) \end{align}$$
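
As a numerical check with the illustrative distribution above: $$\Eta(X) + \Eta(Y) - \Eta(X,Y) \approx 0.811 + 1 - 1.5 \approx 0.311$$ bits, matching the direct computation of $$\operatorname{I}(X;Y)$$.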

Relation to Kullback–Leibler divergence
 * $$\operatorname{I}(X; Y) = D_\text{KL}\left(p(x, y) \parallel p(x)p(y)\right)$$

Chain rule for mutual information
 * $$\operatorname{I}(X; Y, Z) = \operatorname{I}(X; Y) + \operatorname{I}(X; Z|Y)$$
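
With the XOR example above ($$X, Y$$ independent fair bits, $$Z = X \oplus Y$$): $$\operatorname{I}(X; Y, Z) = \operatorname{I}(X; Y) + \operatorname{I}(X; Z|Y) = 0 + 1 = 1$$ bit, consistent with the pair $$(Y, Z)$$ determining $$X$$.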

Gibbs' inequality

 * $$ - \sum_{i=1}^n p_i \log_2 p_i \leq - \sum_{i=1}^n p_i \log_2 q_i $$ with equality if and only if $$ p_i = q_i $$ for all $$i$$
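
For instance, with $$p = (\tfrac{1}{2},\tfrac{1}{2})$$ and $$q = (\tfrac{1}{4},\tfrac{3}{4})$$ the left-hand side is $$1$$ bit and the right-hand side is $$\approx 1.208$$ bits; the gap is exactly the divergence $$D_{\mathrm{KL}}(P\|Q) \approx 0.208$$ bits computed above.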

Corollary
 * $$ D_{\mathrm{KL}}(P\|Q) \equiv \sum_{i=1}^n p_i \log_2 \frac{p_i}{q_i} \geq 0.$$

Shearer's inequality
If X1, ..., Xd are random variables and S1, ..., Sn are subsets of {1, 2, ..., d} such that every integer between 1 and d lies in at least r of these subsets, then


 * $$ H[(X_1,\dots,X_d)] \leq \frac{1}{r}\sum_{i=1}^n H[(X_j)_{j\in S_i}]$$

where $$(X_{j})_{j\in S_{i}}$$ is the Cartesian product of the random variables $$X_{j}$$ with indices $$j$$ in $$S_{i}$$.
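
For example, with $$d = 3$$ and the subsets $$S_1 = \{1,2\}$$, $$S_2 = \{2,3\}$$, $$S_3 = \{1,3\}$$, every index is covered $$r = 2$$ times, so
 * $$H[(X_1,X_2,X_3)] \leq \tfrac{1}{2}\left(H[(X_1,X_2)] + H[(X_2,X_3)] + H[(X_1,X_3)]\right)$$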

Pinsker's inequality
If $$P$$ and $$Q$$ are two probability distributions on a measurable space $$(X, \Sigma)$$, then
 * $$\delta(P,Q) \le \sqrt{\frac{1}{2} D_{\mathrm{KL}}(P\|Q)},$$

where
 * $$\delta(P,Q)=\sup \bigl\{ |P(A) - Q(A)| \big| A \in \Sigma \text{ is a measurable event} \bigr\}$$ is called the total variation distance (or statistical distance) between $$P$$ and $$Q$$
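
As an illustrative check, computing the divergence with the natural logarithm (the convention under which this form of the bound is usually stated): for $$P = \mathrm{Bernoulli}(\tfrac{3}{4})$$ and $$Q = \mathrm{Bernoulli}(\tfrac{1}{2})$$, $$\delta(P,Q) = \tfrac{1}{4}$$ while $$\sqrt{\tfrac{1}{2} D_{\mathrm{KL}}(P\|Q)} \approx 0.256$$.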

Fano's inequality
If the random variables X and Y represent input and output messages with joint probability $$P(x,y)$$, and e represents the occurrence of an error, i.e. the event $$X\neq \tilde{X}=f(Y)$$ where $$\tilde{X}$$ is an approximate version of $$X$$, then
 * $$H(X|Y)\leq H(e)+P(e)\log(|\mathcal{X}|-1).$$
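
In particular, for a binary alphabet ($$|\mathcal{X}| = 2$$) the last term vanishes and the bound reduces to $$H(X|Y) \leq H(e)$$.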

Han's inequality
Han's inequality holds for any submodular set function. Let X1, ..., Xn be discrete random variables. Then
 * $$H(X_{[n]})\leq \frac{1}{n-1}\sum_{i=1}^n H(X_{[n]\setminus i})$$.
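
For $$n = 2$$ the inequality reads $$H(X_{[2]}) \leq H(X_1) + H(X_2)$$, recovering the subadditivity of joint entropy stated above.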