
The relationship between mutual information and minimum mean-square error (MMSE) in additive Gaussian noise channels serves as a bridge connecting information theory and estimation theory. The results hold for input signals with arbitrary distributions and extend to the family of channels characterized by Lévy processes. They have found abundant applications in mismatched estimation, EXIT chart analysis of sparse-graph codes, and compressive sensing.

=Definition=

Consider two random variables $$X$$ and $$Y$$, where $$X$$ is the input signal and $$Y$$ is a noisy observation of $$X$$ given by $$Y = \sqrt{\gamma} X + N,$$ where $$N \sim \mathcal{N}(0,1)$$ is a standard Gaussian random variable independent of $$X$$. Note that if the average power of $$X$$ is 1, then the signal-to-noise ratio (SNR) is exactly $$\gamma$$. In the general case where $$X$$ does not have unit power, the SNR is proportional to $$\gamma$$.

The mutual information between $$X$$ and $$Y$$ is calculated as $$ I(X;Y) = E\left\{\ln \frac{p_{XY}(X,Y)}{p_X(X) p_Y(Y)} \right\}$$ in nats per channel use.

MMSE estimation seeks an estimate $$\hat{X}(Y)$$ of $$X$$ such that the mean-square error (MSE) $$ E\{(X-\hat{X})^2\}$$ is minimized. The MMSE estimator is the conditional mean $$\hat{X}(Y)= E\{X| Y; \gamma \}$$, and the achieved MMSE is given by $${\rm mmse}(\gamma) = E\left\{(X - E\{X| Y; \gamma \} )^2 \right\}. $$

An interesting finding is that a simple and elegant relationship exists between $$I(\gamma) = I(X; \sqrt{\gamma}X+N)$$ and $${\rm mmse}(\gamma)$$, two central quantities of information theory and estimation theory respectively, and that it holds for input signals with arbitrary distributions.

=Results for Discrete-time Gaussian Channels=

==Scalar Channel==
It can be seen from the simple Gaussian input case (Example 1 below) that there is a neat relationship between $$I(\gamma)$$ and $${\rm mmse}(\gamma)$$. Surprisingly, the relationship holds for input signals with arbitrary distributions. Let $$N$$ be a standard Gaussian random variable independent of $$X$$. For every input distribution $$P_X$$ that satisfies $$E\{X^2\} < \infty$$, the following identity holds: $$\frac{d}{d \gamma} I(X; \sqrt{\gamma}X+N) = \frac{1}{2} {\rm mmse}(\gamma) .$$
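
As a quick numerical sanity check (not part of the original text), the following Python sketch evaluates $$I(\gamma)$$ and $${\rm mmse}(\gamma)$$ by one-dimensional quadrature for an arbitrary discrete input and compares a finite-difference derivative of the mutual information with one half of the MMSE. NumPy is assumed to be available; the three-point input distribution, the integration grid and the SNR value are illustrative choices only.

```python
# Numerical check of d/dgamma I(gamma) = mmse(gamma)/2 for a discrete input.
import numpy as np

xs = np.array([-1.0, 0.5, 2.0])      # support of an (arbitrary) discrete input X
ps = np.array([0.3, 0.5, 0.2])       # its probability masses
y, dy = np.linspace(-15.0, 15.0, 40001, retstep=True)   # grid for the output Y

def phi(z):
    """Standard Gaussian density."""
    return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

def mutual_info(gamma):
    """I(X; sqrt(gamma) X + N) in nats, by numerical integration over y."""
    cond = phi(y[None, :] - np.sqrt(gamma) * xs[:, None])   # p(y | x_i)
    p_y = ps @ cond                                         # marginal density of Y
    integrand = ps[:, None] * cond * np.log(cond / p_y)
    return integrand.sum() * dy       # integrand vanishes at the grid edges

def mmse(gamma):
    """E{(X - E{X|Y})^2}, again by numerical integration over y."""
    cond = phi(y[None, :] - np.sqrt(gamma) * xs[:, None])
    p_y = ps @ cond
    x_hat = (ps * xs) @ cond / p_y                          # conditional mean E{X | Y=y}
    sq_err = (xs[:, None] - x_hat[None, :]) ** 2
    return (ps[:, None] * cond * sq_err).sum() * dy

gamma, d = 1.3, 1e-4
lhs = (mutual_info(gamma + d) - mutual_info(gamma - d)) / (2 * d)
print(lhs, 0.5 * mmse(gamma))  # the two values should agree to several decimals
```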

==Vector Channel==
Consider the vector Gaussian channel

$$\mathbf{Y} = \sqrt{\gamma} \mathbf{H} \mathbf{X} +\mathbf{N} ,$$

where $$\mathbf{H}$$ is a deterministic $$L \times K$$ matrix, $$\mathbf{Y}$$ is an $$L \times 1$$ random vector, $$\mathbf{X}$$ is a $$K \times 1$$ random vector, and the noise vector $$\mathbf{N}$$ consists of independent standard Gaussian entries. Specifically, the channel is specified by the conditional probability density $$p_{\mathbf{Y|X}} (\mathbf{y|x}) = (2 \pi)^{-L/2} \exp\left(-\frac{1}{2} ||\mathbf{y}-\sqrt{\gamma} \mathbf{Hx}||^2\right)$$, where $$||\cdot||$$ denotes the Euclidean norm. The MMSE estimator of $$\mathbf{X}$$ is $$\hat{\mathbf{X}}(\mathbf{Y}) = E\{\mathbf{X} | \mathbf{Y}; \gamma\}$$ and the MMSE for $$\mathbf{H} \mathbf{X}$$ is $${\rm mmse}(\gamma) = E\left\{ ||\mathbf{H} \mathbf{X} - \mathbf{H} \hat{\mathbf{X}}(\mathbf{Y})||^2 \right\} .$$ Then for every input distribution satisfying $$E\{||\mathbf{X}||^2\} < \infty$$, the following relation between MMSE and mutual information holds,

$$\frac{d}{d \gamma} I(\mathbf{X}; \sqrt{\gamma} \mathbf{H} \mathbf{X} + \mathbf{N}) = \frac{1}{2} {\rm mmse}(\gamma).$$
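
For the special case of a standard Gaussian input vector, both sides of this relation have closed forms, which makes a direct numerical check easy. The sketch below (not part of the original text; it assumes $$\mathbf{X} \sim \mathcal{N}(\mathbf{0},\mathbf{I}_K)$$ and that NumPy is available) uses $$I(\gamma) = \frac{1}{2}\ln\det(\mathbf{I} + \gamma \mathbf{H}\mathbf{H}^T)$$ and $${\rm mmse}(\gamma) = {\rm tr}\left(\mathbf{H}(\mathbf{I} + \gamma \mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\right)$$, and compares a finite-difference derivative of the former with one half of the latter.

```python
# Check of the vector I-MMSE relation for a standard Gaussian input.
import numpy as np

rng = np.random.default_rng(0)
L, K = 4, 3
H = rng.standard_normal((L, K))   # an arbitrary deterministic channel matrix

def mutual_info(gamma):
    """I(X; sqrt(gamma) H X + N) for X ~ N(0, I_K), in nats."""
    return 0.5 * np.linalg.slogdet(np.eye(L) + gamma * H @ H.T)[1]

def mmse(gamma):
    """E{||H X - H E{X|Y}||^2} for X ~ N(0, I_K)."""
    post_cov = np.linalg.inv(np.eye(K) + gamma * H.T @ H)  # error covariance of X
    return np.trace(H @ post_cov @ H.T)

gamma, d = 0.7, 1e-6
lhs = (mutual_info(gamma + d) - mutual_info(gamma - d)) / (2 * d)
print(lhs, 0.5 * mmse(gamma))  # the two numbers should match
```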

==Two examples==
Example 1: Consider the case where $$X \sim \mathcal{N}(0,1)$$, i.e., the input is standard Gaussian. The mutual information between $$X$$ and $$Y$$ is the well-known capacity of the Gaussian channel under a unit input power constraint, $$ I(X;Y) = \frac{1}{2} \ln (1+\gamma).$$

The MMSE estimator is a scaling of the observed signal $$Y$$, namely $$\hat{X}(Y) = \frac{\sqrt{\gamma}}{1+\gamma} Y $$, and hence the MMSE is $$ {\rm mmse}(\gamma) = \frac{1}{1+\gamma}.$$
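
For completeness, these expressions follow from the standard conditional-mean formulas for jointly Gaussian random variables (a short derivation sketch, not spelled out in the original text): since $${\rm Cov}(X,Y) = \sqrt{\gamma}$$ and $${\rm Var}(Y) = 1+\gamma$$, one has $$\hat{X}(Y) = E\{X|Y\} = \frac{{\rm Cov}(X,Y)}{{\rm Var}(Y)}\, Y = \frac{\sqrt{\gamma}}{1+\gamma}\, Y, \qquad {\rm mmse}(\gamma) = {\rm Var}(X) - \frac{{\rm Cov}^2(X,Y)}{{\rm Var}(Y)} = 1 - \frac{\gamma}{1+\gamma} = \frac{1}{1+\gamma}.$$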

It is then straightforward to verify that the relationship between mutual information and MMSE holds: $$ \frac{d I(\gamma)}{d \gamma} = \frac{1}{2(1+\gamma)} = \frac{1}{2} {\rm mmse}(\gamma) .$$

Example 2: Consider the case where $$X \in \{+1,-1\}$$ with equal probability. The mutual information for this binary-input additive Gaussian channel is

$$I(\gamma)= \gamma - \int_{- \infty}^{\infty} \frac{e^{-y^2/2}}{\sqrt{2 \pi}} \ln \cosh(\gamma -\sqrt{\gamma}y) d y$$

and the MMSE is

$${\rm mmse}(\gamma)= 1 - \int_{- \infty}^{\infty} \frac{e^{-y^2/2}}{\sqrt{2 \pi}} \tanh(\gamma -\sqrt{\gamma}y) d y$$.

By direct calculation, it can be verified that the mutual information and the MMSE satisfy the relationship $$\frac{d}{d \gamma} I(\gamma) = \frac{1}{2} {\rm mmse}(\gamma)$$.
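
A short numerical check of this claim (not part of the original text; NumPy and the closed-form integrals above are assumed) evaluates $$I(\gamma)$$ and $${\rm mmse}(\gamma)$$ by quadrature and compares a finite-difference derivative of the former with one half of the latter.

```python
# Numerical check of the I-MMSE relation for equiprobable binary input.
import numpy as np

y, dy = np.linspace(-12.0, 12.0, 20001, retstep=True)
phi = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

def mutual_info(gamma):
    """Mutual information of the binary-input Gaussian channel, in nats."""
    return gamma - np.sum(phi * np.log(np.cosh(gamma - np.sqrt(gamma) * y))) * dy

def mmse(gamma):
    """MMSE of the equiprobable binary input at SNR gamma."""
    return 1.0 - np.sum(phi * np.tanh(gamma - np.sqrt(gamma) * y)) * dy

gamma, d = 2.0, 1e-5
lhs = (mutual_info(gamma + d) - mutual_info(gamma - d)) / (2 * d)
print(lhs, 0.5 * mmse(gamma))   # the two values should agree
```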

=Proof=

There are several ways to prove the relationship between mutual information and MMSE in Gaussian channels. It can be proved directly from the definition of mutual information by showing that its derivative with respect to $$\gamma$$ is exactly one half of the MMSE. Another way of proving the result is the incremental-channel approach, which considers the difference between two channels whose SNRs differ by a vanishing amount. Applying the known relation between mutual information and MSE in the vanishing-SNR regime then yields the result. This technique exploits the fact that Gaussian noise has independent increments, so that a Gaussian channel can be divided into a cascade of degraded channels.

Proof using the incremental-channel approach:

Fix $$\gamma >0 $$ and $$\delta >0$$, and consider a cascade of two Gaussian channels $$Y_1 = X + \sigma_1 N_1$$ and $$Y_2 = Y_1 + \sigma_2 N_2$$, where $$X$$ is the input and $$N_1$$ and $$N_2$$ are independent standard Gaussian random variables. Let $$\sigma_1^2 = \frac{1}{\gamma+\delta}$$ and $$\sigma_1^2 + \sigma_2^2 = \frac{1}{\gamma}$$, so that the channel between $$X$$ and $$Y_2$$, with $$SNR=\gamma$$, is a degraded version of the channel between $$X$$ and $$Y_1$$, with $$SNR=\gamma+\delta$$. In fact, the output of the stronger channel (the one with the larger SNR) can be written in terms of the output of the degraded channel and the input as $$Y_1 = \frac{1}{\gamma+\delta} \left(\gamma Y_2+\delta X + \sqrt{\delta} N\right),$$ where $$N = \frac{1}{\sqrt{\delta}} (\delta \sigma_1 N_1 - \gamma \sigma_2 N_2)$$. Moreover, $$N$$ has zero mean and unit variance, and is independent of $$X$$ and $$Y_2$$; independence of $$Y_2$$ follows because $$N$$ and $$Y_2$$ are jointly Gaussian and uncorrelated, $$E\{N Y_2\} = E\{N(X+\sigma_1 N_1 + \sigma_2 N_2) \} = E\{N(\sigma_1 N_1 + \sigma_2 N_2)\} = 0.$$

To prove $$\frac{d}{d \gamma} I(X; \sqrt{\gamma}X+N) = \frac{1}{2} {\rm mmse}(\gamma) $$, it suffices to show that $$I(X;Y_1) - I(X;Y_2) = \frac{\delta}{2} {\rm mmse}(\gamma) + o(\delta)$$.

Since $$X-Y_1-Y_2$$ forms a Markov chain, the chain rule of mutual information gives $$I(X; Y_1) - I(X;Y_2) = I(X; Y_1 | Y_2)$$. Recalling the relationship between $$Y_1$$ and $$(X,Y_2)$$ above, $$I(X; Y_1 | Y_2= y_2) = I(X; \gamma Y_2 + \delta X + \sqrt{\delta} N| Y_2 = y_2) = I(X; \sqrt{\delta}X+N | Y_2 = y_2)$$. In other words, given a realization of $$Y_2$$, the channel between $$X$$ and $$Y_1$$ is equivalent to a Gaussian channel with vanishing SNR $$\delta$$. Utilizing the low-SNR expansion $$I(X; \sqrt{\delta}X+W) = \frac{\delta}{2} E\{ (X-E\{X\})^2\} + o(\delta)$$, where $$W \sim \mathcal{N}(0,1)$$ is independent of $$X$$, we obtain

$$ I(X; Y_1 | Y_2= y_2) = \frac{\delta}{2} E\{ (X - E\{X| Y_2 = y_2\})^2 | Y_2 = y_2 \} + o(\delta)$$.

Taking the expectation over $$Y_2$$ yields $$I(X; Y_1 | Y_2) = \frac{\delta}{2} E\{ (X - E\{X| Y_2 \})^2 \} + o(\delta)$$. Therefore, $$I(X; Y_1) - I(X;Y_2) = I(\gamma+\delta) - I(\gamma) = \frac{\delta}{2} {\rm mmse}(\gamma) + o(\delta)$$.

=Results for Continuous-time Gaussian Channels=

The continuous-time Gaussian channel with input process $$X_t$$ and output process $$Y_t$$ is characterized by the stochastic differential equation (SDE)

$$d Y_t = \sqrt{\gamma} X_t dt + d W_t,$$

where $$W_t$$ is a standard Wiener process.

Let $$\mu_{XY}$$, $$\mu_X$$ and $$\mu_Y$$ denote the probability measures induced by the pair $$\{X_t, Y_t\}$$, by $$\{X_t\}$$ and by $$\{Y_t\}$$, respectively, on the interval of interest. The input-output mutual information is defined by

$$I(X_0^T; Y_0^T) = \int \ln \Phi d \mu_{XY},$$

if the Radon–Nikodym derivative $$\Phi = \frac{d \mu_{XY}}{d (\mu_X \times \mu_Y)}$$ exists.

The mutual information rate is defined as $$I(\gamma) = \frac{1}{T} I(X_0^T; Y_0^T).$$ The causal and noncausal MMSE at any time $$t \in [0,T]$$ are defined as $${\rm cmmse}(t,\gamma) = E\left\{(X_t - E\{X_t | Y_0^t; \gamma \})^2 \right\}$$ and $${\rm mmse}(t,\gamma) = E\left\{ (X_t - E\{X_t | Y_0^T; \gamma \})^2 \right\} $$. The average causal and noncausal MMSE are defined as $$ {\rm cmmse}(\gamma) = \frac{1}{T} \int_0^T {\rm cmmse}(t, \gamma) dt $$ and $$ {\rm mmse}(\gamma) = \frac{1}{T} \int_0^T {\rm mmse}(t, \gamma) dt. $$

The relationships between noncausal MMSE, causal MMSE and mutual information are summarized as follows,

Noncausal MMSE and mutual information: $$ \frac{d}{d \gamma} I(\gamma) = \frac{1}{2} {\rm mmse}(\gamma) $$.

Causal MMSE and mutual information (due to Duncan): $$I(\gamma) = \frac{\gamma}{2} {\rm cmmse}(\gamma)$$.

Causal MMSE and noncausal MMSE: $${\rm cmmse}(\gamma) = \frac{1}{\gamma} \int_0^{\gamma} {\rm mmse} (\beta) d\beta $$. This result follows directly from the two relationships above. It shows that the causal MMSE at a given $${\rm SNR}=\gamma$$ is equal to the average of the noncausal MMSE with the SNR uniformly distributed between 0 and $$\gamma$$.
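
As an illustrative special case (not worked out in the original text), suppose the input is a single standard Gaussian random variable held constant over the observation window, i.e. $$X_t \equiv X \sim \mathcal{N}(0,1)$$ for $$t \in [0,T]$$. Observing $$Y_0^t$$ is then equivalent to observing $$Y_t = \sqrt{\gamma} X t + W_t$$, so the causal MMSE is the Gaussian posterior variance $${\rm cmmse}(t,\gamma) = \frac{1}{1+\gamma t}$$, giving $${\rm cmmse}(\gamma) = \frac{1}{T}\int_0^T \frac{dt}{1+\gamma t} = \frac{\ln(1+\gamma T)}{\gamma T}$$, while the noncausal estimator always uses the whole record, so $${\rm mmse}(t,\gamma) = {\rm mmse}(\gamma) = \frac{1}{1+\gamma T}$$. The mutual information rate is $$I(\gamma) = \frac{1}{2T}\ln(1+\gamma T)$$, and all three relations can be verified directly: $$\frac{d}{d\gamma} I(\gamma) = \frac{1}{2(1+\gamma T)} = \frac{1}{2}{\rm mmse}(\gamma)$$, $$\frac{\gamma}{2}{\rm cmmse}(\gamma) = \frac{\ln(1+\gamma T)}{2T} = I(\gamma)$$, and $$\frac{1}{\gamma}\int_0^{\gamma} \frac{d\beta}{1+\beta T} = \frac{\ln(1+\gamma T)}{\gamma T} = {\rm cmmse}(\gamma)$$.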

=Extensions on Other Channels=

The incremental-channel proof technique, which only requires the noise process to have independent increments, can be used to establish the relationship between mutual information and conditional mean estimation for channels characterized by Lévy processes. The following is such a result for Poisson channels.

==Poisson Random Transformation==
Consider an input signal $$X$$ following a distribution $$P_X$$ and an output signal $$Y = \mathcal{P}(X)$$ characterized by the conditional probability mass function

$$P_{Y|X}(k|x) = \frac{1}{k !} x^k e^{-x}$$.

The conditional mean estimate of $$X$$ given the output $$Y$$ is denoted by $$\langle X \rangle = E\{X|\mathcal{P}(X)\}$$.

For every $$\lambda >0$$ and positive random variable $$X$$ with $$E\{X \log X\} < \infty$$, the derivative of the input-output mutual information of the Poisson random transformation $$X \to \mathcal{P}(X+\lambda)$$ with respect to $$\lambda$$ satisfies

$$\frac{d}{d \lambda} I(X; \mathcal{P}(X+\lambda)) = E\left\{\ln (X+\lambda) - \ln \langle X+\lambda \rangle \right\} = E\left\{ \ln \frac{X+\lambda}{\langle X+\lambda \rangle} \right\},$$ where $$\langle X+\lambda \rangle = E\{X+\lambda \,|\, \mathcal{P}(X+\lambda)\}$$.

For every $$\alpha >0$$, $$\lambda \geq 0$$ and positive $$X$$ with $$E\{X \log X\} < \infty$$, the following relation between mutual information and conditional mean estimation holds,

$$\frac{\partial}{\partial \alpha} I(X; \mathcal{P}(\alpha X + \lambda)) = E \left\{X \ln \frac{\alpha X + \lambda}{\langle \alpha X + \lambda \rangle} \right\}, $$ where $$\langle \alpha X + \lambda \rangle = E\{\alpha X + \lambda \,|\, \mathcal{P}(\alpha X + \lambda)\}$$.
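
The scaling relation above can also be checked numerically. The sketch below (not part of the original text; NumPy and SciPy are assumed, and the two-point input, dark-current level and output truncation are illustrative choices) computes $$I(X; \mathcal{P}(\alpha X + \lambda))$$ for a discrete input by summing over the Poisson output and compares its finite-difference derivative in $$\alpha$$ with the conditional-mean expression on the right-hand side.

```python
# Numerical check of dI/dalpha for the Poisson channel Y ~ Poisson(alpha*X + lambda).
import numpy as np
from scipy.stats import poisson

xs = np.array([1.0, 3.0])        # two-point positive input
ps = np.array([0.5, 0.5])
lam = 0.4                        # dark current
ks = np.arange(0, 200)           # truncation of the Poisson output alphabet

def pmf_matrix(alpha):
    """p(k | x) for every input value and every retained output value k."""
    return poisson.pmf(ks[None, :], alpha * xs[:, None] + lam)

def mutual_info(alpha):
    pk_x = pmf_matrix(alpha)
    pk = ps @ pk_x                                     # output pmf p(k)
    # nansum guards against 0 * log 0 terms if the pmf tail underflows
    return np.nansum(ps[:, None] * pk_x * np.log(pk_x / pk))

def rhs(alpha):
    """E{ X * ln( (alpha X + lambda) / <alpha X + lambda> ) }."""
    pk_x = pmf_matrix(alpha)
    pk = ps @ pk_x
    cond_mean = (ps * (alpha * xs + lam)) @ pk_x / pk  # E{alpha X + lambda | Y = k}
    ratio = (alpha * xs[:, None] + lam) / cond_mean
    return np.nansum(ps[:, None] * pk_x * xs[:, None] * np.log(ratio))

alpha, d = 2.0, 1e-4
lhs = (mutual_info(alpha + d) - mutual_info(alpha - d)) / (2 * d)
print(lhs, rhs(alpha))   # the two values should agree closely
```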

==Continuous-time Poisson Channel==
Let $$X_0^T$$ denote the input process, and let the output process $$Y_0^T = \mathcal{P}_0^T (X_0^T)$$ be a Poisson process whose rate is modulated by $$X_0^T$$. Specifically, for any $$0 \leq t < s \leq T$$,

$$P\{Y_s - Y_t = k | X_0^T \} = \frac{1}{k !} \Lambda^k e^{-\Lambda},$$ where $$\Lambda = \int_t^s X_l d l$$.

The conditional mean of the input at time $$t$$ given the output process from time 0 to time $$s$$ is denoted by $$\langle X_t \rangle_s = E\{X_t | \mathcal{P}_0^s(X_0^s) \}$$. Note that $$\langle X_t \rangle_t$$ corresponds to the causal conditional mean estimate and $$\langle X_t \rangle_T$$ corresponds to the noncausal conditional mean estimate.

Suppose the input process satisfies $$E \int_0^T |X_t \ln X_t| dt < \infty$$, then for every $$\lambda >0$$,

$$\frac{d}{d \lambda} I \left(X_0^T; \mathcal{P}_0^T (X_0^T + \lambda) \right) = \int_0^T E \left\{ \ln \frac{X_t + \lambda}{\langle X_t + \lambda \rangle_T} \right\} dt $$

and

$$\frac{\partial}{\partial \alpha} I \left(X_0^T; \mathcal{P}_0^T(\alpha X_0^T + \lambda) \right) = \int_0^T E \left\{X_t \ln \frac{\alpha X_t + \lambda}{\langle \alpha X_t + \lambda \rangle_T} \right\} dt. $$

=Applications=

The relationship between MMSE and mutual information finds applications in various areas. It can be used to derive new representations of information measures in terms of MMSE. For example, the entropy of a discrete random variable $$X$$ can be expressed as $$H(X) = \frac{1}{2} \int_0^{\infty} E\left\{ (g(X) - E\{g(X)| \sqrt{\gamma} g(X) +N \})^2 \right\} d \gamma$$ for any one-to-one real-valued mapping $$g$$ and standard Gaussian random variable $$N$$ independent of $$X$$. Several properties of mismatched estimation have also been derived by harnessing the relationship between mutual information and MMSE. In compressive sensing, the relationship plays a critical role in connecting the MMSE to the Rényi information dimension, which is essential for establishing information-theoretically optimal compressive sensing.
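
The entropy representation above can be checked numerically in a simple case (a minimal sketch, not part of the original text; NumPy is assumed). For an equiprobable binary input $$X \in \{+1,-1\}$$ with $$g$$ the identity map, $${\rm mmse}(\gamma)$$ is given by the integral formula of Example 2, and one half of its integral over $$\gamma$$ should reproduce $$H(X) = \ln 2$$.

```python
# Numerical check of H(X) = (1/2) * integral of mmse(gamma) for binary X.
import numpy as np

y, dy = np.linspace(-12.0, 12.0, 20001, retstep=True)
phi = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)   # standard Gaussian density

def mmse(gamma):
    """MMSE of the equiprobable binary input at SNR gamma."""
    return 1.0 - np.sum(phi * np.tanh(gamma - np.sqrt(gamma) * y)) * dy

gammas, dg = np.linspace(0.0, 60.0, 6001, retstep=True)  # mmse decays fast; truncate
vals = np.array([mmse(g) for g in gammas])
integral = np.sum((vals[1:] + vals[:-1]) / 2) * dg       # trapezoidal rule in gamma
print(0.5 * integral, np.log(2))   # both numbers should be close to 0.693
```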
