Similarities between Wiener and LMS

The Least mean squares filter solution converges to the Wiener filter solution, assuming that the unknown system is LTI and the noise is stationary. Both filters can be used to identify the impulse response of an unknown system, knowing only the original input signal and the output of the unknown system. By relaxing the error criterion to reduce current sample error instead of minimizing the total error over all of n, the LMS algorithm can be derived from the Wiener filter.

Derivation of the Wiener filter for system identification
Given a known input signal $$s[n]$$, the output of an unknown LTI system $$x[n]$$ can be expressed as:

$$x[n] = \sum_{k=0}^{N-1} h_ks[n-k] + w[n]$$

where $$h_k$$ is an unknown filter tap coefficients and $$w[n]$$ is noise.

The model system $$\hat{x}[n]$$, using a Wiener filter solution with an order N, can be expressed as:

$$\hat{x}[n] = \sum_{k=0}^{N-1}\hat{h}_ks[n-k]$$

where $$\hat{h}_k$$ are the filter tap coefficients to be determined.

The error between the model and the unknown system can be expressed as:

$$e[n] = x[n] - \hat{x}[n]$$

The total squared error $$E$$ can be expressed as:

$$E = \sum_{n=-\infty}^{\infty}e[n]^2$$

$$E = \sum_{n=-\infty}^{\infty}(x[n] - \hat{x}[n])^2$$

$$E = \sum_{n=-\infty}^{\infty}(x[n]^2 - 2x[n]\hat{x}[n] + \hat{x}[n]^2)$$

Use the Minimum mean-square error criterion over all of $$n$$ by setting its gradient to zero:

$$\nabla E = 0$$ which is $$\frac{\partial E}{\partial \hat{h}_i} = 0$$ for all $$i = 0, 1, 2, ..., N-1$$

$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty}[x[n]^2 - 2x[n]\hat{x}[n] +  \hat{x}[n]^2 ]$$

Substitute the definition of $$\hat{x}[n]$$:

$$\frac{\partial E}{\partial \hat{h}_i} = \frac{\partial}{\partial \hat{h}_i} \sum_{n=-\infty}^{\infty}[x[n]^2 - 2x[n]\sum_{k=0}^{N-1}\hat{h}_ks[n-k] +  (\sum_{k=0}^{N-1}\hat{h}_ks[n-k])^2 ]$$

Distribute the partial derivative:

$$\frac{\partial E}{\partial \hat{h}_i} = \sum_{n=-\infty}^{\infty}[-2x[n]s[n-i] + 2(\sum_{k=0}^{N-1}\hat{h}_ks[n-k])s[n-i] ]$$

Using the definition of discrete cross-correlation:

$$R_{xy}(i) = \sum_{n=-\infty}^{\infty} x[n]y[n-i]$$

$$\frac{\partial E}{\partial \hat{h}_i} = -2R_{xs}[i] + 2\sum_{k=0}^{N-1}\hat{h}_kR_{ss}[i - k] = 0 $$

Rearrange the terms:

$$R_{xs}[i] = \sum_{k=0}^{N-1}\hat{h}_kR_{ss}[i - k] $$ for all $$i = 0, 1, 2, ..., N-1$$

This system of N equations with N unknowns can be determined.

The resulting coefficients of the Wiener filter can be determined by: $$W = R_{xx}^{-1} P_{xs}$$, where $$ P_{xs} $$ is the cross-correlation vector between $$x$$ and $$s$$.

Derivation of the LMS algorithm
By relaxing the infinite sum of the Wiener filter to just the error at time $$n$$, the LMS algorithm can be derived.

The squared error can be expressed as:

$$ E = (d[n] - y[n])^2 $$

Using the Minimum mean-square error criterion, take the gradient:

$$\frac{\partial E}{\partial w} = \frac{\partial}{\partial w}(d[n] - y[n])^2$$

Apply chain rule and substitute definition of y[n]

$$\frac{\partial E}{\partial w} = 2(d[n] - y[n]) \frac{\partial}{\partial w}(d[n] - \sum_{k=0}^{N-1}\hat{w}_kx[n-k])$$

$$\frac{\partial E}{\partial w_i} = -2(e[n])(x[n-i])$$

Using gradient descent and a step size $$\mu$$:

$$w[n+1] = w[n] - \mu\frac{\partial E}{\partial w}$$

which becomes, for i = 0, 1, ..., N-1,

$$w_i[n+1] = w_i[n] + 2\mu(e[n])(x[n-i])$$

This is the LMS update equation.