Talk:Least mean squares filter

Could use work
This article could use some work, but since I don't understand LMS, I can't do it. — Preceding unsigned comment added by 68.162.70.155 (talk) 05:15, 2005 October 13 (UTC)

Least Squares
Minimizing the sum of squared distances is the same as minimizing the average squared distance (provided the number of points does not change). I do not understand why this is a separate article. I therefore move that it be merged into Least squares. - JustinWick 18:05, 5 December 2005 (UTC)
 * LMS is an adaptive filter algorithm, so it belongs firmly in the area of digital signal processing, whereas Least squares describes a mathematical technique. There must be a link from Least squares to LMS and vice versa. Faust o 17:07, 22 January 2006 (UTC)
 * The adaptive filter is only one use of the term least mean squares; it is also frequently used in statistical minimization problems. I propose that this page be moved to Least mean squares (filter). --Salix alba (talk) 00:11, 4 February 2006 (UTC)
 * Done Faust o 15:36, 7 February 2006 (UTC)

Stochastic gradient descent
Is LMS a Stochastic gradient descent algorithm? --Memming 17:10, 12 November 2006 (UTC)
 * Yes, added the link there Jmvalin 06:13, 2 February 2007 (UTC)
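To make the connection concrete: the LMS update is one stochastic-gradient step per input sample, with the instantaneous product e(n)x(n) standing in for the expected gradient. A minimal NumPy sketch of system identification with LMS (the filter taps, step size, and signal length below are illustrative assumptions, not values from the article):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown FIR system h to be identified, driven by white noise x.
h = np.array([0.6, -0.3, 0.1])
p = len(h)
x = rng.standard_normal(2000)
d = np.convolve(x, h)[:len(x)]          # desired signal d(n) = (h * x)(n)

# LMS = stochastic gradient descent on C = E[|e(n)|^2], using the
# instantaneous gradient estimate -2 e(n) x(n) instead of the expectation.
mu = 0.05
h_hat = np.zeros(p)
for n in range(p, len(x)):
    x_vec = x[n - p + 1:n + 1][::-1]    # [x(n), x(n-1), ..., x(n-p+1)]
    e = d[n] - h_hat @ x_vec            # a-priori error e(n)
    h_hat = h_hat + mu * e * x_vec      # one SGD step per sample

print(np.round(h_hat, 2))               # close to h
```

With a noiseless desired signal and persistent excitation, the estimate settles essentially on h; with noise added, it would hover around h with a variance that grows with mu.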

NLMS and more
Just added NLMS and a sort of overview figure. I changed the notation for the filter; if anyone objects and wants something else, I can always update the figure. The main reason for having h and \hat{h} is that I want to add some derivations of the optimal rate, so I need to distinguish between the "real" filter and the adapted one. Jmvalin 06:07, 2 February 2007 (UTC)

Just did another batch of improvements. I'd be interested in feedback on that -- and on the rest of the article. Jmvalin 13:45, 7 February 2007 (UTC)
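For readers comparing the two algorithms: the only change NLMS makes to the plain LMS update is dividing the step size by the instantaneous input power ||x(n)||^2, which makes the stable step-size range independent of the input scaling. A minimal sketch, assuming a toy system-identification setup of my own (the function name nlms and all parameter values are illustrative):

```python
import numpy as np

def nlms(x, d, p, mu=0.5, eps=1e-8):
    """NLMS adaptation of a length-p FIR filter.

    x   : input signal
    d   : desired signal (output of the unknown filter)
    p   : number of adaptive taps
    mu  : normalized step size, mean-stable for 0 < mu < 2
    eps : small constant guarding against division by zero
    """
    h_hat = np.zeros(p)
    for n in range(p - 1, len(x)):
        x_vec = x[n - p + 1:n + 1][::-1]       # [x(n), ..., x(n-p+1)]
        e = d[n] - h_hat @ x_vec
        # NLMS: the step is normalized by the input power ||x(n)||^2
        h_hat = h_hat + mu * e * x_vec / (eps + x_vec @ x_vec)
    return h_hat

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, -0.25])                # toy "true" filter
x = rng.standard_normal(1000)
d = np.convolve(x, h)[:len(x)]
print(np.round(nlms(x, d, 3), 2))
```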

Undefined symbol.
In several places in the article, the symbol $$\mathbf{h}^{H}(n) \,$$ is used but not defined. I cannot tell from context what it is supposed to mean. Can someone who knows please update the article so that this symbol is well defined? 207.190.198.130 00:55, 7 November 2007 (UTC)


 * It's the Hermitian transpose. I've now added a mention of this to the article.  Oli Filth(talk) 09:23, 7 November 2007 (UTC)


 * The definition occurred long after the term was first used (in several places). Moved the reference to the Hermitian transpose to the "definition of symbols" section. Laugh Tough (talk) 17:35, 27 February 2012 (UTC)

The star symbol in $$e^*$$ is also undefined — Preceding unsigned comment added by 108.41.16.196 (talk) 21:47, 15 July 2020 (UTC)
 * It means complex conjugate. It is a common notation in signal processing. Still, it could be explained. Constant314 (talk) 22:17, 15 July 2020 (UTC)

Misleading figure and explanations
In an article such as this one, I expect to be able to grasp as much as possible about the subject simply by looking at the figure. Unfortunately, by inspecting the figure one is left with the impression that the impulse response of the filter is computed sample by sample. That is, for the n-th input and output samples it would only be possible to estimate the n-th sample of the impulse response, which is not the case. I think a more consistent notation would make n a sub/superscript, because n is not the input sample number but the iteration number.

Another thing: I can't find anywhere in the text any information about the length of the impulse response. I can see that x(n) is in fact a column vector consisting of the previous p samples of the input signal, and from this I can assume the same length for the transfer function, but I think this should be made clearer in the text. 89.136.41.31 (talk) 14:27, 19 February 2009 (UTC) Apass
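On the figure question above: in the usual formulation, each iteration n updates the entire length-p coefficient vector at once, not one coefficient per sample. A tiny sketch making that explicit (the toy numbers are mine, not from the article):

```python
import numpy as np

p = 4
h_hat = np.zeros(p)             # length-p adaptive filter
x = np.arange(1.0, 11.0)        # toy input, 10 samples
d = 2.0 * x                     # toy desired signal
mu = 0.001

for n in range(p - 1, len(x)):
    x_vec = x[n - p + 1:n + 1][::-1]   # the p most recent input samples
    e = d[n] - h_hat @ x_vec
    h_hat = h_hat + mu * e * x_vec     # the WHOLE length-p vector is updated

print(h_hat.shape)              # (4,): all p taps exist at every iteration
```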

Gradient computation
In the section Idea, in the discussion about the steepest descent method, shouldn't it be "to take the partial derivatives with respect to the CONJUGATES of the individual entries of the filter vector"?

Indeed we want to compute the gradient of the modulus square here, and the gradient of a real function $$f(z)$$ of the complex variable $$z$$ is given by

$$ \nabla f(z) = 2 \frac{d f(z)}{d z^*}. $$

I think it should be corrected; otherwise it can be misleading, and the equation providing $$ \nabla C(n)$$ may seem general whereas it is in fact a very special case (because here $$e(n)$$ is expressed in terms of the conjugate of the filter coefficients...)

Ivilo (talk) 13:22, 15 October 2009 (UTC)


 * This has to be explained in more detail. Where is the 2 coming from? The way it is written in the article itself, it just magically appears. Also, are you sure that it is the full derivative with respect to z*? Not a partial one? MxM (talk) 16:31, 24 April 2018 (UTC)
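For what it's worth, the factor of 2 is a convention from Wirtinger (CR) calculus, in which z and z* are treated as independent variables, and the derivative is indeed a partial one. A short derivation, assuming the article's definitions as I read them (real cost $$C(n) = |e(n)|^2$$ with $$e(n) = d(n) - \hat{\mathbf{h}}^{H}(n)\,\mathbf{x}(n)$$):

```latex
% Wirtinger calculus: for a real-valued cost, \hat{\mathbf{h}} and
% \hat{\mathbf{h}}^{*} are treated as independent, and the steepest-ascent
% direction is \nabla C = 2\, \partial C / \partial \hat{\mathbf{h}}^{*}.
% Since e(n) = d(n) - \hat{\mathbf{h}}^{H}(n)\,\mathbf{x}(n) depends on
% \hat{\mathbf{h}}^{*}(n) while e^{*}(n) does not:
\nabla C(n)
  = 2\,\frac{\partial}{\partial \hat{\mathbf{h}}^{*}(n)}
      \bigl[ e(n)\, e^{*}(n) \bigr]
  = 2\, e^{*}(n)\,
      \frac{\partial e(n)}{\partial \hat{\mathbf{h}}^{*}(n)}
  = -2\, e^{*}(n)\, \mathbf{x}(n).
```

This reproduces the update direction $$\mu\, e^{*}(n)\,\mathbf{x}(n)$$ and shows where the 2 is absorbed.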

Step Size Factor
Would anyone address the step size (μ) limitations?

One must take care to define the step size clearly.

I wish I could do it, yet I struggle to understand the contradictions about it from different sources. --Royi A (talk) 15:13, 30 December 2009 (UTC)
 * I added a section about mean stability and convergence. I omitted the proof, as it would clutter the page significantly, and anyone can look it up in the references. There should also be a section about mean-square performance (specifically, one should be informed that the steady-state mean square error increases monotonically with increasing $$\mu$$); maybe I'll add something later. BTW, this is my first edit on Wikipedia - does anyone know why the math looks different in the different equations, although the tags are identical? Can I do something about that? It looks weird that a Greek letter is italic in one place and plain in another.

128.97.90.235 (talk) 05:54, 18 May 2010 (UTC)
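Since the μ bound comes up repeatedly on this page, here is a numerical check of the mean-convergence condition 0 < μ < 2/λ_max, where λ_max is the largest eigenvalue of the input autocorrelation matrix R; the correlated test input below is my own arbitrary choice, not from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4

# Correlated input: first-order lowpass of white noise (illustrative choice,
# so that R is not a multiple of the identity).
w = rng.standard_normal(50_000)
x = np.empty_like(w)
x[0] = w[0]
for n in range(1, len(w)):
    x[n] = 0.9 * x[n - 1] + w[n]

# Estimate the p x p autocorrelation matrix R = E[x(n) x^T(n)].
X = np.lib.stride_tricks.sliding_window_view(x, p)
R = X.T @ X / len(X)

lam_max = np.linalg.eigvalsh(R).max()
mu_max = 2.0 / lam_max
print(f"largest eigenvalue {lam_max:.2f}; mean-stable for 0 < mu < {mu_max:.4f}")
```

The more strongly correlated the input, the larger λ_max relative to the average eigenvalue, and the tighter the resulting bound on μ.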

Relationship to the least squares filter section needs work
I've been trying to make sense of the equation:

$$ \boldsymbol{\hat\beta} = (\mathbf{X} ^\mathbf{T}\mathbf{X})^{-1}\mathbf{X}^{\mathbf{T}}\boldsymbol y. $$

I cannot relate it to anything in the article or figures. $$ \boldsymbol{\hat\beta}$$ is not defined anywhere. X and y are casually referred to as the "input matrix" and "output vector". Does "output vector" mean the entire history of the output? It says X is a matrix, but the input looks to me like a vector. From what is written, I gather that the "least squares filter" is something different from the Wiener filter. I have three books on adaptive filtering, including Widrow's, and none of them mentions a "least squares filter". There is no "least squares filter" article. The LMS algorithm converges under certain conditions to the Wiener filter solution. I propose to delete this equation and all mention of the "least squares filter"; discussion of the LMS algorithm vs. the Wiener filter covers the issue. Constant314 (talk) 04:38, 20 November 2016 (UTC)
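For comparison purposes (not as an endorsement of the section's terminology): if X is taken as the matrix whose n-th row is the regressor $$\mathbf{x}^T(n)$$ and y collects the desired samples d(n), the quoted normal-equation formula gives the batch least-squares estimate, and LMS approaches roughly the same coefficients on stationary data. A sketch with a toy setup of my own:

```python
import numpy as np

rng = np.random.default_rng(3)
p, N = 3, 5000
h = np.array([0.8, -0.4, 0.2])               # toy "true" filter

x = rng.standard_normal(N)
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)

# Data matrix: row n is [x(n), x(n-1), ..., x(n-p+1)]; y holds d(n).
X = np.lib.stride_tricks.sliding_window_view(x, p)[:, ::-1]
y = d[p - 1:]

# Batch least squares via the normal equations: beta = (X^T X)^{-1} X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# LMS on the same data, one pass.
mu, h_hat = 0.05, np.zeros(p)
for x_vec, d_n in zip(X, y):
    h_hat = h_hat + mu * (d_n - h_hat @ x_vec) * x_vec

print(np.round(beta_hat, 2), np.round(h_hat, 2))   # both near h
```

Whether that equivalence deserves its own section here, or just a pointer to the Wiener filter article, is of course the editorial question above.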

Convergence/stability section is incorrect
Started a [discussion](https://dsp.stackexchange.com/questions/87091/for-which-values-of-step-size-is-lms-filter-stable) on dsp.SE about ideas for a fix. — Preceding unsigned comment added by 2601:645:8800:5c70:b019:9d63:3e17:e7d3 (talk) 00:16, 2023 March 18 (UTC)


 * Fix it yourself or make the case here and maybe someone else will fix it. Constant314 (talk) 00:22, 18 March 2023 (UTC)