Talk:Gauss–Markov theorem

In the beginning the beginning confused me
But as I remembered variance as a squared thing too, it became evident. If the variances don't vary wildly, then the sum of squares doesn't vary wildly either. And the rest of the conditions are not hard to agree upon. We don't need the Gaussian (normal) distribution to make it "optimal". — Preceding unsigned comment added by 2001:4643:E6E3:0:B99A:4E6A:45C0:A343 (talk) 23:04, 7 January 2019 (UTC)

Untitled
The introduction is confusing to me. It is not clear what is assumed and what is not, and there is no further reference explaining why.

I don't understand the definition of the least squares estimators: they are supposed to make the sum of the squares as small as possible, but this sum of squares appears to be a random variable. Are they supposed to make the expected values of those random variables as small as possible? AxelBoldt 18:31 Jan 24, 2003 (UTC)

For fixed values of xi and Yi the sum of squares is a function of βi for i=1,2. I'll add some material clarifying that. Michael Hardy 19:40 Jan 24, 2003 (UTC)
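
To illustrate the reply above, here is a minimal Python sketch. The data values `xs` and `ys` and the function names are made up for illustration and are not taken from the article. With the data held fixed, the sum of squares is an ordinary (non-random) function of the two coefficients, and the least-squares estimates are the values that minimize it.

```python
# Hypothetical fixed data: xs and ys are illustrative, not from the article.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

def sum_of_squares(b1, b2):
    """Residual sum of squares for the line y = b1 + b2 * x, data held fixed."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_fit():
    """Closed-form minimizer of sum_of_squares for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

b1_hat, b2_hat = least_squares_fit()
```

Evaluating `sum_of_squares` at `(b1_hat, b2_hat)` and at nearby coefficient values shows the minimum directly. The randomness only enters when the `ys` are regarded as realizations of random variables, which makes the estimates random too.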

What does homoscedastic mean?

The term is explicitly defined in the article, but I will make it more conspicuous. It means: having equal variances. Michael Hardy 21:29 Jan 24, 2003 (UTC)

Should it be made explicit that the variance $$\sigma^2$$ is unknown? Albmont 13:17, 27 February 2007 (UTC)

What is the meaning of $${\rm cov}\left(\varepsilon_i,\varepsilon_j\right)=0$$? $$\varepsilon_i,\varepsilon_j$$ are scalars. Covariance is defined only for vectors, isn't it? Sergivs-en 06:05, 16 September 2007 (UTC)


 * Usually the concept of covariance is initially introduced as the covariance between two scalar-valued random variables. See covariance. Michael Hardy 13:58, 16 September 2007 (UTC)

linear vs nonlinear least squares
A suggestion is for someone with an interest in this to review the treatment under least squares. The basic Gauss-Markov apparently applies to the linear case. It would be interesting if there are nonlinear generalizations. Dfarrar 14:10, 11 April 2007 (UTC)

Nice article. Assuming that the β's and x's are not random (only the ε's are), would it be more accurate to say, then, that Ŷ is a stochastic process? - just a thought Ernie shoemaker 02:20, 26 July 2007 (UTC)

Very confusing. Also, should be stated in general vector form of theorem
This article is very confusing and does not explain what it sets out to. Saying that the least squares estimator is the "best" one means nothing. How is "best" defined? Later in the article, it seems to imply that "best" means the one with smallest MSE, and so obviously the least squares estimate is "best".

Also, why not generalise to estimation of a vector of quantities, where the errors have a given correlation structure? —Preceding unsigned comment added by 198.240.128.75 (talk) 16:16, 5 February 2008 (UTC)

best?
What does best estimator mean exactly? Best as in consistent estimator? --217.83.22.80 (talk) 01:19, 14 March 2008 (UTC)


 * At least in the context of "best linear unbiased estimator" (BLUE), "best" means "minimum mean squared error within the class of linear unbiased estimators" (according to this paper by G.K. Robinson). As we're restricting to unbiased estimators, minimum mean squared error implies minimum variance. So BLUE might less confusingly be "minimum variance linear unbiased estimator" (MVLUE?). Perhaps BLUE caught on simply as it's shorter and more pronounceable. Google gives this book preview which points out that the Gauss-Markov theorem states that "$$\hat\beta$$ is BLUE, not necessarily MVUE" as the MVUE estimator may in general be non-linear, but we can't know what it is without specifying a parametric probability distribution. The article needs editing to explain this better but estimation theory is really not my strong point. Qwfp (talk) 05:32, 14 March 2008 (UTC)


 * Can there be more than one unbiased estimator? Is it possible to find another estimator with less mean squared error (should be similar to standard error?) than the standard OLS approach if the assumptions of the Gauss–Markov theorem are violated? --217.83.23.127 (talk) 23:49, 16 March 2008 (UTC)


 * Yes, of course there are many unbiased estimators. There's a space of "linear unbiased estimators" (each one a linear combination of the response variables) and there are also non-linear ones (e.g. the average of the max and the min of a sample is not linear in the observations, and is an unbiased estimator of the population mean). Michael Hardy (talk) 01:08, 17 March 2008 (UTC)
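
The midrange example in the reply above can be checked numerically. This is only a sketch: the sample values, the uniform distribution on [0, 2], the seed, and the sample size are all assumptions chosen for illustration, and the unbiasedness check relies on the distribution being symmetric about its mean.

```python
import random

def midrange(sample):
    """Average of the smallest and largest observation."""
    return (min(sample) + max(sample)) / 2

# Not linear: the midrange of a sum is not the sum of the midranges.
a = [0.0, 0.0, 3.0]
b = [3.0, 0.0, 0.0]
summed = [u + v for u, v in zip(a, b)]
nonlinear = midrange(summed) != midrange(a) + midrange(b)  # 1.5 vs 3.0

# Monte Carlo check of unbiasedness for a symmetric distribution
# (assumed here: uniform on [0, 2], so the population mean is 1).
rng = random.Random(0)
reps, n = 20000, 5
avg_midrange = sum(
    midrange([rng.uniform(0.0, 2.0) for _ in range(n)]) for _ in range(reps)
) / reps
```

Since the midrange is not a linear function of the observations, it falls outside the class of estimators that the Gauss–Markov theorem compares, so the theorem says nothing about how its variance relates to that of the sample mean.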

Best? What about bounded PDF?
As I read this article (and linear least squares), it sounds like the least-squares solution should be the BLUE even in cases where the solution it gives is impossible. For example, suppose I measure a length three times and suppose that my measurement noise has a PDF of a uniform distribution within ±1mm of the true value. Suppose the length is actually 1mm and by chance I get 0.1mm, 0.1mm, and 1.9mm. All three measurements are off by 0.9mm, which is possible given my PDF. But the mean (least-squares estimate) of the samples is 0.7mm, which is 1.2mm away from the measurement of 1.9mm, which is impossible. My three measurements are consistent with a true value anywhere between 0.9mm and 1.1mm. I imagine the true likelihood function would be highest at 0.9mm and lowest at 1.1mm. This example seems to contradict the Gauss–Markov theorem. What am I missing? How do the assumptions of the Gauss–Markov theorem avoid this problem? —Ben FrantzDale (talk) 19:21, 6 December 2008 (UTC)


 * It can't contradict the Gauss–Markov theorem if it's not a linear function of the tuple of observed random variables, nor if it is biased. Maximum likelihood estimators are typically biased.  And it is well-known that unbiased estimation can result in "impossible" solutions, whereas maximum likelihood cannot. Michael Hardy (talk) 23:15, 6 December 2008 (UTC)


 * That makes sense; duah :-). I may add that clarification to this article and/or to linear least squares. Thanks. —Ben FrantzDale (talk) 15:51, 7 December 2008 (UTC)
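
The arithmetic of the example in this thread can be reproduced with a short Python sketch. The variable names are illustrative. The numbers are the ones given in the question: three measurements and noise bounded within ±1 mm.

```python
# Numbers from the example above: noise is uniform within +/- 1 mm.
measurements = [0.1, 0.1, 1.9]
half_width = 1.0

# The least-squares estimate of a single constant is the sample mean.
mean_estimate = sum(measurements) / len(measurements)

# True values consistent with every measurement under the bounded noise.
lower = max(measurements) - half_width
upper = min(measurements) + half_width

mean_is_feasible = lower <= mean_estimate <= upper
```

Here the sample mean (about 0.7 mm) lies outside the feasible interval (about 0.9 mm to 1.1 mm). That is compatible with the theorem, which only ranks linear unbiased estimators by variance and makes no promise that a particular estimate respects the support of the noise distribution.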

Use conditional on X, not to assume X deterministic
Suppose we don't assume Xi to be deterministic. Don't we then also need Xi and εi to be independent? 69.134.206.109 (talk) 16:54, 21 March 2009 (UTC)

Don't we need E(e|X)=0 (conditional), as well as E(e)=0 (unconditional), so that we can say E(X'e)=0? In the proof, I am having trouble seeing how the expected value distributes... --169.237.57.103 (talk) 16:27, 27 June 2011 (UTC)
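
A sketch of the step being asked about, writing ε for the error vector (e in the comment above) and assuming the conditional expectation exists. By the law of iterated expectations, and because X' is fixed once we condition on X:

$$ \operatorname{E}[X'\varepsilon] = \operatorname{E}\big[\operatorname{E}[X'\varepsilon \mid X]\big] = \operatorname{E}\big[X'\,\operatorname{E}[\varepsilon \mid X]\big] = 0. $$

So E(ε|X)=0 is enough to give E(X'ε)=0, whereas E(ε)=0 on its own is not enough in general, since it does not rule out correlation between X and ε.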

Typo in the expression of C' (C-transpose)
C' is equal to X (X X')^-1 + D'

At the time of my comment, C' was written as: X (X' X)^-1 + D'

And the final expression for V(β) is therefore: σ^2 (X X')^-1 + σ^2 DD' — Preceding unsigned comment added by Pathdependent1 (talk • contribs)


 * If
 * $$ C = (X'X)^{-1}X' \, $$
 * then
 * $$ C' = X(X'X)^{-1}. \, $$
 * The matrix in parentheses is symmetric, i.e. it is its own transpose. Note that
 * $$ (AB)' = B'A' \, $$
 * i.e. the order gets reversed and the two matrices get transposed. Applying this when
 * $$ A = X'\text{ and }B = X \, $$
 * means X gets changed to X' and vice-versa, and the order of multiplication then gets reversed. That means the transpose of (X'X) is (X'X).
 * Hence there was no typo; the expression was correct. Michael Hardy (talk) 22:06, 13 July 2011 (UTC)
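
The transpose identity in the reply above can also be checked numerically. The following is a minimal sketch in plain Python. The 3×2 matrix `X` and the helper functions are hypothetical and chosen only for illustration, assuming the convention that X has one row per observation.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical 3x2 design matrix (n = 3 observations, k = 2 coefficients).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]

XtX = matmul(transpose(X), X)
XtX_inv = inverse_2x2(XtX)
C = matmul(XtX_inv, transpose(X))  # C = (X'X)^{-1} X'
lhs = transpose(C)                 # C'
rhs = matmul(X, XtX_inv)           # X (X'X)^{-1}

symmetric = XtX == transpose(XtX)
max_diff = max(abs(u - v) for r1, r2 in zip(lhs, rhs) for u, v in zip(r1, r2))
```

In this toy case `XtX` is symmetric and `lhs` agrees with `rhs` up to rounding. Under the same convention, XX' would be 3×3 with rank at most 2 and so would have no inverse, though that depends on X having more rows than columns.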

Copyright problem removed
Prior content in this article duplicated one or more previously published sources. The material was copied from: http://wiki.answers.com/Q/When_are_OLS_estimators_BLUE. Infringing material has been rewritten or removed and must not be restored, unless it is duly released under a compatible license. (For more information, please see "using copyrighted works from others" if you are not the copyright holder of this material, or "donating copyrighted materials" if you are.) For legal reasons, we cannot accept copyrighted text or images borrowed from other web sites or published material; such additions will be deleted. Contributors may use copyrighted publications as a source of information, but not as a source of sentences or phrases. Accordingly, the material may be rewritten, but only if it does not infringe on the copyright of the original or plagiarize from that source. Please see our guideline on non-free text for how to properly implement limited quotations of copyrighted text. Wikipedia takes copyright violations very seriously, and persistent violators will be blocked from editing. While we appreciate contributions, we must require all contributors to understand and comply with these policies. Thank you. Voceditenore (talk) 14:04, 10 October 2011 (UTC)
 * Please note also that the content of wiki.answers.com is reader-generated and should never be used as a source. Voceditenore (talk) 14:04, 10 October 2011 (UTC)

Important points?
There are two remarks which -- if correct -- I think should be further emphasized in the article.

The first remark is explained here. It can be summed up as: OLS = argmin of residual = (X'X)^(-1)X'Y = argmin (among linear, unbiased estimators) of error = BLUE

The second remark pertains to linearity. As far as I can tell, there are two relations which are linear, which are used in the derivation. The first linear relation is the data-generating model Y = XB + err. The second is the requirement that the estimator be linear in the data. The fact that there are two linear relations is sometimes left out of the discussion. For example here. Only the linearity of the data-generating model is pointed out, even though they derive the BLUE.

Similarly, the OLS "happens to be linear", the BLUE has it as a requirement.

Any experts that can comment on these remarks? — Preceding unsigned comment added by 2001:700:3F00:0:189C:4003:11DF:E92E (talk) 17:22, 24 April 2013 (UTC)
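
On the second remark, the sense in which OLS is "linear in the data" can be illustrated with a short Python sketch. The function `ols_line` and all data values are hypothetical. It uses the closed form for simple regression with an intercept rather than the general matrix formula.

```python
def ols_line(xs, ys):
    """OLS intercept and slope for y = b1 + b2 * x (simple regression)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b2 * x_bar, b2

# Hypothetical regressors and two hypothetical response vectors.
xs = [0.0, 1.0, 2.0, 3.0]
y_a = [1.0, 3.0, 2.0, 5.0]
y_b = [0.5, -1.0, 4.0, 2.0]
y_sum = [u + v for u, v in zip(y_a, y_b)]

fit_a = ols_line(xs, y_a)
fit_b = ols_line(xs, y_b)
fit_sum = ols_line(xs, y_sum)

# Linearity in the data: fitting y_a + y_b gives the sum of the two fits.
linear_gap = max(abs(s - (p + q)) for s, p, q in zip(fit_sum, fit_a, fit_b))
```

With the regressors held fixed, the fitted coefficients for `y_sum` equal the sum of the coefficients for `y_a` and `y_b` up to rounding. This is the estimator's linearity in Y, which is separate from the linearity of the model Y = XB + err in the coefficients.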

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Gauss–Markov theorem. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20040213071852/http://emlab.berkeley.edu/GMTheorem/index.html to http://emlab.berkeley.edu/GMTheorem/index.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 21:11, 11 October 2017 (UTC)

Hessian Matrix Proof
(′ denotes transpose.) At the end of the proof, when we get H = 2X′X, we can just use the fact that if B=√2 X, then H=B′B. From the Wikipedia article on definite matrices, it follows that H is definite. In this article we have a different proof (a pretty nice one in my opinion, just long and maybe unnecessary). Should we remove it and replace it by the simpler theorem? Lainad27 (talk) 02:07, 9 November 2021 (UTC)

Never mind, I forgot that X is not always a square matrix... Lainad27 (talk) 07:40, 9 November 2021 (UTC)
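
For what it is worth, the B = √2 X idea does not seem to need X to be square. A sketch, assuming X is n×k with full column rank (an assumption not stated in the comments above): for any nonzero k-vector v,

$$ v'Hv = 2\,v'X'Xv = 2\,(Xv)'(Xv) = 2\,\|Xv\|^2 > 0, $$

where the strict inequality holds because, under full column rank, Xv = 0 only when v = 0. Without full column rank the same computation gives only v'Hv ≥ 0, i.e. H is positive semidefinite.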