Data processing inequality

The data processing inequality is an information-theoretic concept stating that the information content of a signal cannot be increased by a local physical operation. Put concisely, 'post-processing cannot increase information'.

Statement
Let three random variables form the Markov chain $$X \rightarrow Y \rightarrow Z$$, meaning that the conditional distribution of $$Z$$ depends only on $$Y$$, so that $$Z$$ is conditionally independent of $$X$$ given $$Y$$. Specifically, we have such a Markov chain if the joint probability mass function can be written as
 * $$p(x,y,z) = p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y)$$
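
As a concrete illustration of the two factorizations, the following minimal Python sketch builds a joint pmf as $$p(x)p(y|x)p(z|y)$$ and checks that it also equals $$p(y)p(x|y)p(z|y)$$, with $$p(x|y)$$ recovered by Bayes' rule. The binary alphabets, the numerical probabilities, and the helper name `markov_joint` are hypothetical choices made only for this example.

```python
# Hypothetical binary distributions chosen only for illustration.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(y|x), indexed [x][y]
p_z_given_y = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}  # p(z|y), indexed [y][z]


def markov_joint(p_x, p_y_given_x, p_z_given_y):
    """Joint pmf p(x, y, z) = p(x) p(y|x) p(z|y) of the chain X -> Y -> Z."""
    return {
        (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
        for x in p_x
        for y in p_y_given_x[x]
        for z in p_z_given_y[y]
    }


joint = markov_joint(p_x, p_y_given_x, p_z_given_y)

# Marginal p(y), obtained by summing the joint over x and z.
p_y = {}
for (x, y, z), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# Posterior p(x|y) = p(x) p(y|x) / p(y) by Bayes' rule.
p_x_given_y = {y: {x: p_x[x] * p_y_given_x[x][y] / p_y[y] for x in p_x} for y in p_y}

# The second factorization p(y) p(x|y) p(z|y) should reproduce every joint entry.
max_gap = max(
    abs(p - p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z])
    for (x, y, z), p in joint.items()
)
print(f"largest difference between the two factorizations: {max_gap:.1e}")
```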

In this setting, no processing of $$Y$$, deterministic or random, can increase the information that $$Y$$ contains about $$X$$. In terms of mutual information, this can be written as:
 * $$ I(X;Y) \geqslant I(X;Z),$$

with equality $$I(X;Y) = I(X;Z)$$ if and only if $$I(X;Y\mid Z)=0$$. That is, $$Z$$ and $$Y$$ contain the same information about $$X$$, and $$X \rightarrow Z \rightarrow Y$$ also forms a Markov chain.
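
The inequality and its equality case can be checked numerically. The minimal sketch below assumes hypothetical binary distributions (not drawn from any source) and hypothetical helpers `mutual_information` and `pair_marginal`. It compares $$I(X;Y)$$ with $$I(X;Z)$$ for two ways of processing $$Y$$: a random channel, for which the inequality is strict with these particular numbers, and the relabeling $$Z = 1 - Y$$, a bijection for which $$I(X;Y\mid Z)=0$$ and equality holds.

```python
from math import log2


def mutual_information(p_ab):
    """I(A;B) in bits from a joint pmf {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in p_ab.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    # Zero-probability cells contribute nothing (0 log 0 = 0).
    return sum(p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items() if p > 0)


def pair_marginal(joint, i, j):
    """Marginal pmf of coordinates i and j of a joint pmf over (x, y, z)."""
    out = {}
    for key, p in joint.items():
        pair = (key[i], key[j])
        out[pair] = out.get(pair, 0.0) + p
    return out


# Hypothetical source and channel X -> Y.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# Two ways to process Y into Z: a random (noisy) map and the bijection Z = 1 - Y.
processors = {
    "noisy": {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}},
    "flip": {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}},
}

results = {}
for name, p_z_given_y in processors.items():
    joint = {
        (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
        for x in p_x for y in p_y_given_x[x] for z in p_z_given_y[y]
    }
    i_xy = mutual_information(pair_marginal(joint, 0, 1))
    i_xz = mutual_information(pair_marginal(joint, 0, 2))
    results[name] = (i_xy, i_xz)
    print(f"{name}: I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

Because the bijection can be inverted, $$Y$$ is recoverable from $$Z$$, which is why no information about $$X$$ is lost in that case.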

Proof
One can apply the chain rule for mutual information to obtain two different decompositions of $$I(X;Y,Z)$$:

 * $$I(X;Z) + I(X;Y\mid Z) = I(X;Y,Z) = I(X;Y) + I(X;Z\mid Y)$$

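By the Markov property of $$X \rightarrow Y \rightarrow Z$$, $$X$$ and $$Z$$ are conditionally independent given $$Y$$, so the conditional mutual information $$I(X;Z\mid Y)$$ is zero. The decomposition therefore reduces to $$I(X;Y) = I(X;Z) + I(X;Y\mid Z)$$, and the data processing inequality follows from the non-negativity of conditional mutual information, $$I(X;Y\mid Z)\ge0$$. The same identity gives the equality condition: $$I(X;Y) = I(X;Z)$$ exactly when $$I(X;Y\mid Z)=0$$.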
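
The two decompositions used in the proof can be verified numerically. This minimal sketch uses hypothetical helper names `marginal` and `cmi` and illustrative binary distributions chosen only for the example. It computes conditional mutual information directly from a joint pmf via $$I(A;B\mid C)=\sum p(a,b,c)\log_2\frac{p(c)\,p(a,b,c)}{p(a,c)\,p(b,c)}$$, then checks that both sides of the chain-rule identity equal $$I(X;Y,Z)$$ and that $$I(X;Z\mid Y)$$ vanishes up to floating-point rounding.

```python
from math import log2


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in the tuple idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits; a, b, c are disjoint tuples of coordinate indices.

    Uses I(A;B|C) = sum p(a,b,c) log2[p(c) p(a,b,c) / (p(a,c) p(b,c))];
    with c = () this reduces to the ordinary mutual information I(A;B).
    """
    p_abc = marginal(joint, a + b + c)
    p_ac, p_bc, p_c = marginal(joint, a + c), marginal(joint, b + c), marginal(joint, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
            total += p * log2(p_c[kc] * p / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


# Hypothetical chain X -> Y -> Z on binary alphabets.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
    for x in p_x for y in p_y_given_x[x] for z in p_z_given_y[y]
}

X, Y, Z = (0,), (1,), (2,)
i_x_yz = cmi(joint, X, Y + Z)                   # I(X;Y,Z)
left = cmi(joint, X, Z) + cmi(joint, X, Y, Z)   # I(X;Z) + I(X;Y|Z)
right = cmi(joint, X, Y) + cmi(joint, X, Z, Y)  # I(X;Y) + I(X;Z|Y)
print(f"I(X;Y,Z) = {i_x_yz:.6f}, left = {left:.6f}, right = {right:.6f}")
print(f"I(X;Z|Y) = {cmi(joint, X, Z, Y):.2e} (zero up to rounding)")
```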