Talk:Mean squared error

INCONSISTENT ARTICLE TITLE
There is another article published on Wikipedia and titled "Root mean square error". Notice the use of the adjective "square", rather than "squared".

For the sake of consistency, I suggest using "square" everywhere, including the title of this article, and indicating in the text that "squared" can also be used.

A disambiguation is also necessary.

It is also true that a google search yields about twice as many hits for "mean square error" as "mean squared error". Cazort (talk) 17:22, 23 December 2007 (UTC)


 * Even though nobody's responded in three years, I'm going to add my support that something should be changed. The title says "mean squared error", but the first line says "mean square error". Maybe there should be consistency within the same article, at least? 67.174.115.100 (talk) 10:02, 15 November 2010 (UTC)
 * I agree, so I'll move it. Qwfp (talk) 09:24, 16 November 2010 (UTC)

MSE examples wrong?
I suspect that the MSEs presented for $$S^2_{n-1}$$ and $$S^2_{n}$$ might be wrong. At least the formulas presented are completely different from those shown in Mood, Graybill and Boes (1974) Introduction to the Theory of Statistics (see pages 229 and 294). The formulas presented in MGB, which is a classic, are more complex and include terms in $$\mu_4$$. The derivation of the MSE for $$S^2_{n-1}$$, which is the same as the variance in this case, seems a rather tedious task. I think someone with some expertise in this area should have a look at this issue. --Bluemaster (talk) 21:06, 22 June 2008 (UTC)


 * μ4 implies they're looking at a multivariate estimator, most likely the James-Stein estimator, which is a different animal than the univariate ones. -- DanielPenfield (talk) 22:41, 22 June 2008 (UTC)
 * No, it is indeed the univariate estimator in this case: $$\mu_4$$ in this context is just $$E[(X-\mu_x)^4]$$, the fourth central moment. The MSE expression in Mood, Graybill and Boes for $$S^2_{n-1}$$ is $$ \frac{1}{n} [\mu_4-\frac{n-3}{n-1}\sigma^4] $$. A possible explanation is that this expression simplifies to the one presented in the article, but to be on the safe side, an expert review would be advisable. --Bluemaster (talk) 01:48, 23 June 2008 (UTC)
 * The formula presented as the MSE for $$S^2_{n}$$ is clearly wrong if the MSE for $$S^2_{n-1}$$ is correct (not sure). As $$S^2_{n}=\frac{n-1}{n} S^2_{n-1}$$, it is easy to show that $$\operatorname{MSE}(S^2_{n})=\frac{2n-1}{n^2}\sigma^4$$, accepting the result on $$\operatorname{MSE}(S^2_{n-1})$$ as correct. --Bluemaster (talk) 12:45, 24 June 2008 (UTC)
 * After some research I believe I've found out what is going on: the result $$\operatorname{MSE}(S^2_{n-1})=\frac{2}{n-1}\sigma^4$$, presented in the article, is not general but is correct under the assumption that $$(n-1)S^2_{n-1}/\sigma^2$$ has $$\chi^2$$ distribution with $$n-1$$ degrees of freedom, which has variance $$2(n-1)$$. This result follows, for instance, from the assumption of normality on the $$X_i$$s used in the computation of $$S^2_{n-1}$$. The general result, without distributional assumptions, is, I believe, the one presented in Mood, Graybill and Boes (1974, p. 229 and 294) that is $$\operatorname{MSE}(S^2_{n-1})= \frac{1}{n} [\mu_4-\frac{n-3}{n-1}\sigma^4]$$, with $$\mu_4=\operatorname{E}[(X-\mu)^4]$$. The result $$\operatorname{MSE}(S^2_{n})=\frac{2n+1}{n^2}\sigma^4$$ is clearly wrong and will be corrected. The correct result, derived from $$\operatorname{MSE}(S^2_{n-1})=\frac{2}{n-1}\sigma^4$$, is $$\operatorname{MSE}(S^2_{n})=\frac{2n-1}{n^2}\sigma^4$$.  The conclusion that $$\operatorname{MSE}(S^2_{n})<\operatorname{MSE}(S^2_{n-1})$$ is indeed correct for the Gaussian distributions, despite these problems. I will wait for comments during the next days and will proceed to make some changes in the article to reflect these findings if there is no disagreement or more insights. --Bluemaster (talk) 13:15, 25 June 2008 (UTC)
 * Corrections made. Bluemaster (talk) 03:03, 10 July 2008 (UTC)
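
The algebra in this thread can be checked mechanically. What follows is only a minimal sketch in Python with exact rational arithmetic. It assumes normality (so that $$\mu_4 = 3\sigma^4$$) and takes $$\sigma^2 = 1$$ purely for illustration. The function names are invented for this example and do not come from the article or from Mood, Graybill and Boes.

```python
from fractions import Fraction

def mse_unbiased_general(n, mu4, sigma2):
    """General MSE of S^2_{n-1} as quoted in the thread:
    (1/n) * (mu4 - (n-3)/(n-1) * sigma^4)."""
    return Fraction(1, n) * (mu4 - Fraction(n - 3, n - 1) * sigma2**2)

def mse_unbiased_normal(n, sigma2):
    """MSE of S^2_{n-1} under normality: 2 * sigma^4 / (n-1)."""
    return Fraction(2, n - 1) * sigma2**2

def mse_biased_normal(n, sigma2):
    """MSE of S^2_n = ((n-1)/n) * S^2_{n-1}, computed as variance plus squared bias."""
    scale = Fraction(n - 1, n)
    variance = scale**2 * mse_unbiased_normal(n, sigma2)
    bias = scale * sigma2 - sigma2  # equals -sigma^2 / n
    return variance + bias**2

sigma2 = Fraction(1)         # illustrative choice of sigma^2
mu4_normal = 3 * sigma2**2   # fourth central moment of a normal distribution

for n in (2, 5, 10, 100):
    print(n,
          mse_unbiased_general(n, mu4_normal, sigma2),
          mse_unbiased_normal(n, sigma2),
          mse_biased_normal(n, sigma2))
```

For each n, the first two printed values should coincide (the general formula reduces to $$\frac{2}{n-1}\sigma^4$$ under normality), and the third should equal $$\frac{2n-1}{n^2}\sigma^4$$, which is the smaller of the two MSEs.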

Also contradictory: In the section "regression"
"In regression analysis, the term mean squared error is sometimes used to refer to the unbiased estimate of error variance: the residual sum of squares divided by the number of degrees of freedom... Note that, although the MSE is not an unbiased estimator of the error variance, it is consistent, given the consistency of the predictor."

We have to choose; it cannot be both biased and unbiased.

--Livingthingdan (talk) 14:33, 25 January 2014 (UTC)


 * I've clarified it in the text. Loraof (talk) 18:21, 19 August 2016 (UTC)

MSE
The article does not give explicit formulae of the MSE for the estimators in the example. Could someone fill this in?

Someone has suggested that the page for Root mean square deviation (RMSD) be merged with mean squared error. I do not think that it makes sense to do this for several reasons: 1. MSE is a measure of error, whereas RMSD is a method for comparing two biological structures. 2. RMSD is used almost exclusively in the context of protein folding, whereas MSE is used to describe statistics. 3. Merging the articles would result in losing the meaning of the RMSD article.

Note that root mean squared deviation is different than root mean squared error.


 * My two cents: --DanielPenfield 17:40, 1 November 2006 (UTC)
 * RMSE = estimator of average error, RMSD = estimator of average distance. They're measuring the same thing:  differences or variation.
 * RMSD is used in disciplines other than bioinformatics/biostatistics; try googling RMSD and "electrical engineering", for example.
 * Merging the articles should preserve the RMSD tie-in.


 * My opinion: -- PdL -- January 11 2007 (UTC)
 * D in RMSD typically stands for "deviation", not for "distance". The distance and the difference between two scalar values are not exactly the same thing: the distance is the absolute value of the difference.
 * Deviation is the difference between the real value of a variable and its estimated or expected or predicted or "desired" value (for instance, the mean).
 * On the other hand, "error" is the difference between an estimated value of a variable and its real value. There are errors "of estimate" as well as errors "of measurement", and they are all with respect to the (often unknown) real value of the variable.
 * I conclude that a deviation is the additive opposite of an error. I agree that both words indicate differences, but they do not have exactly the same meaning, and it is inappropriate to use them as synonyms.

Content
Sweeping critique: This article is pretty useless to anyone but a math major.

Specific suggestion: If someone agrees with me on the following statement, then it would be helpful if added into the article--

"MSE is also sometimes called the variance; RMSE is also sometimes called the standard deviation."

I'm pretty sure that's correct, but I won't add it without confirmation.


 * Agreed that the article could be made more friendly to those of us who haven't studied statistical theory. BTW, MSE and RMSE estimate the variance and standard deviation. To equate them would be inaccurate. --DanielPenfield 17:40, 1 November 2006 (UTC)

Some indication of how MSE differs from the variance would be useful.


 * MSE has a lot in common with variance but they are not the same! As an example, suppose you are trying to estimate the mean of a random variable that has a normal distribution with mean m and variance 1. The mean, m, is a fixed number, but it is unknown. Now, suppose you take a sample from this random variable. If you try to estimate m, your estimator is taking the sample and using it to guess m. A simple case would be to take one sample and have your guess for m be whatever value is picked. But you could also pick many samples and use, say, the median of your sample as the estimator. Or you could just guess the value 0 no matter what (this would be a bad estimator but it would still be an estimator!). The variance of the estimator is going to be the amount by which the estimator varies about ITS mean, not the true mean. The MSE is the amount that the estimator varies about the TRUE value being estimated, which in this example is the number m. For an unbiased estimator, the MSE and the variance are the same. But often, it is not possible to find an unbiased estimator, or in some cases a biased estimator might be preferred. I hope this answers the questions given here. Cazort (talk) 17:18, 23 December 2007 (UTC)
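
To put real numbers on the three estimators described above, here is a small simulation sketch. It assumes m = 2, samples of 5 observations, and 10000 repetitions; these values, and all the names in the code, are arbitrary choices for illustration and are not taken from the article.

```python
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable
m = 2.0          # true mean; unknown in practice, fixed here for illustration
n = 5            # observations per sample
trials = 10000   # number of simulated samples

# Three estimators of m, each mapping a sample to a guess.
estimators = {
    "sample mean": lambda xs: sum(xs) / len(xs),
    "sample median": statistics.median,
    "always guess 0": lambda xs: 0.0,
}

def summarize(estimates, truth):
    """Return (variance about the estimator's own mean, bias, MSE about the truth)."""
    k = len(estimates)
    own_mean = sum(estimates) / k
    variance = sum((e - own_mean) ** 2 for e in estimates) / k
    bias = own_mean - truth
    mse = sum((e - truth) ** 2 for e in estimates) / k
    return variance, bias, mse

samples = [[random.gauss(m, 1.0) for _ in range(n)] for _ in range(trials)]
results = {}
for name, estimator in estimators.items():
    estimates = [estimator(xs) for xs in samples]
    results[name] = summarize(estimates, m)
    variance, bias, mse = results[name]
    print(f"{name:15s} variance={variance:.3f} bias={bias:+.3f} MSE={mse:.3f}")
```

In every row the MSE equals the variance plus the squared bias. The constant guess has zero variance but an MSE of m squared, while for the (nearly) unbiased sample mean the variance and the MSE are practically the same number.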

"MSE is also sometimes called the variance; RMSE is also sometimes called the standard deviation." Well, the MSE is a random variable itself that needs to be estimated. It's not just a number. If it has been estimated, it gives a measure of the variation of an estimator with respect to a known parameter. But it is not the variance as it also accounts for the bias of the estimator. Squim 10:59, 24 December 2006 (UTC)

In Examples, is it really true that $$S^2 = \frac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2$$ has a lower MSE than the unbiased estimator $$S^2 = \frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2$$? I agree that it has a lower variance, but this is offset when calculating the MSE by the bias term.

Providing a practical example with real numbers would be desirable. —Preceding unsigned comment added by 131.203.101.15 (talk) 22:14, 16 September 2007 (UTC)

I agree wholeheartedly with this sweeping critique. Like many mathematics articles on Wikipedia, it's written by experts for experts instead of by experts for laymen, but since laymen don't really understand where to start asking questions, the problem is never fixed. I speak English and if you tell me something in English, I can understand it. But if you write something in a mathematical equation using symbols that are by convention known only to those who have studied mathematics formally, I will not understand you. Hence, if you tell me "the mean squared error equals the average (mean) of the squares of the variance" I know what you mean. I don't know what
 * $$\operatorname{MSE}(\hat{\theta})=\operatorname{E}\big[(\hat{\theta}-\theta)^2\big].$$

means, except that I'm able to guess from the context. Please state all these equations in English. $$\operatorname{E}$$ is just a capital "E" where I'm from. 134.114.203.37 (talk) 18:40, 20 January 2011 (UTC)

It would have been nice if it was mentioned that $$\operatorname{E}$$ is the symbol for the expected value. MahdiEynian (talk) 08:39, 4 September 2012 (UTC)

I agree with the previous comment that this article is pretty useless to anyone but a math major. In my opinion, most people look up MSE to get a general idea what it is and how to calculate it - and not for the ultra-precise mathematical definition. If I don't know what the MSE is, I am very likely also not to know what all those other Greek symbols on the page mean... (and worse - I can't even google them). What about a simple paragraph for the layman first, along the lines of MSE = average((y-x)^2) / average((y^2 - x^2))... — Preceding unsigned comment added by 146.203.126.246 (talk) 21:58, 31 January 2012 (UTC)
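
For readers who want the "how to calculate it" version: in the everyday predictor sense, the MSE is the average of the squared differences between predicted and actual values. The sketch below assumes that sense (not the estimator definition with the expected value), and the function name and the four data points are made up for illustration.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between paired values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

actual = [3.0, -0.5, 2.0, 7.0]      # illustrative observed values
predicted = [2.5, 0.0, 2.0, 8.0]    # illustrative predictions

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)               # root mean squared error, in the units of the data
print(mse)   # 0.375
print(rmse)
```

The four errors are -0.5, 0.5, 0 and 1; their squares are 0.25, 0.25, 0 and 1, which sum to 1.5, and dividing by 4 gives 0.375.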

Squared error loss
Squared error loss redirects to this page, and appropriately so. However, because of this, and because that term is fairly commonly used, and also because this is a question of naming and definitions, I think that the remark about the term squared error loss should remain at the very top of the page, with the definition. Cazort (talk) 23:08, 26 December 2007 (UTC)

Accessibility of this Article
I first want to say that I am fully committed to making this page accessible, but there are some disagreements about how to do this. In particular, I have been editing the page in order to make it more concise, and removing the explanations/expositions of topics that are duplicated elsewhere. The way I look at things is this:


 * A concept like Mean squared error builds off a number of other topics. It would be hard to understand MSE without understanding concepts like expected value and variance.  Also, to understand the difference between MSE and variance requires understanding more subtle concepts like that of a random variable, a sample, an estimator, estimand, and estimate.


 * Using technical terms does not necessarily make the article less accessible, nor does replacing them with expanded explanations make it more so.


 * Defining technical terms within the article is not appropriate when it leads to duplication elsewhere on Wikipedia. Rather, these terms should be referenced on other pages. This is the whole point of Wikipedia! Wikipedia is based around the idea of a web of knowledge, not a more-or-less linear exposition of knowledge like that found in most textbooks.

Cazort (talk) 00:15, 28 December 2007 (UTC)

Defining terms, no. But is defining the variables used in the examples section possible? I would do it myself if only I had the knowledge. Inclusion of the variable names would, in one fell swoop, change this article's value to me from nearly useless to something frequently referenced. It is very difficult to apply currently. Cranhandler (talk) 22:31, 24 October 2009 (UTC)

Normative statements
I think that a statement "The error is phrased as a mean of squares ... because ..." is problematic because it does not specify what is meant by "because". I think that there are different things going on here, which is that MSE is used in some circumstances solely out of convenience and because the exact choice of loss function has little bearing on the result. However, in other situations, it is used because it approximates some loss function arising in utility theory. In other situations, it might be inappropriate. Still more, there are circumstances where there are compelling theoretical reasons to use it--such as its direct relationship to the expected value, whereas mean absolute error is a more natural way to measure the error of an estimate of a median. On these grounds, I would like to say that I don't think we should avoid normative statements about the loss functions, rather, I think we really ought to include them, and to discuss in more detail exactly why MSE is used in different situations. Cazort (talk) 00:28, 28 December 2007 (UTC)
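
The link between squared error and the mean, and between absolute error and the median, can be seen on a toy data set. This is only a sketch: the five data values and the integer grid of candidate guesses are arbitrary assumptions chosen so that the answers come out exactly.

```python
data = [1, 2, 3, 4, 100]  # illustrative sample with one large value

def total_squared_error(guess):
    """Sum of squared distances from the guess to each data point."""
    return sum((x - guess) ** 2 for x in data)

def total_absolute_error(guess):
    """Sum of absolute distances from the guess to each data point."""
    return sum(abs(x - guess) for x in data)

candidates = range(0, 101)  # integer guesses 0..100
best_squared = min(candidates, key=total_squared_error)
best_absolute = min(candidates, key=total_absolute_error)

print(best_squared)   # 22, the mean of the data
print(best_absolute)  # 3, the median of the data
```

The guess that minimizes squared error is the mean (22), which is pulled toward the large value, whereas the guess that minimizes absolute error is the median (3).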

multiple usages.
This article confuses two distinct usages of MSE. I've clarified the distinction in the first section, but other sections still have the same confusion.

Also, with regard to square vs squared: It's squared error, of which we then take either the expectation or an average (depending on usage). Therefore Mean Squared Error is correct, Mean Square Error is incorrect. (Similarly, chi-square distribution is incorrect. Chi-squared is correct).

--Zaqrfv (talk) 22:24, 13 August 2008 (UTC)


 * The second usage would be more correctly placed at residual sum of squares. See Errors and residuals in statistics —3mta3 (talk) 18:48, 26 October 2009 (UTC)

Removed link from SSD
Per WP:DABNOT SSD refers to Sum of Squared Differences and that redirects to this page. A DAB requires an article show the acronym for which it is linking. If someone can add SSD in reference to Sum of Squared Differences, then they can add the link back to SSD (disambiguation). § Music Sorter § (talk) 00:50, 3 July 2011 (UTC)

Very Poorly Written Article
This is a very poorly written article. I ran into MSE, and wanted to learn more. I have a strong math background. Yet I couldn't make heads or tails of the article. For the most part, the article simply spouts complex equations and obscure verbiage with little or no context. It fails to answer such basic questions as: What's the purpose of MSE? What are a few practical examples of MSE? This is probably less a criticism of Wikipedia, which is just a medium. It's just an observation of poor writing and communication skills. 97.126.59.10 (talk) 15:39, 9 August 2011 (UTC)


 * There have been attempts in the past to at least make sure the lede is in "plain English" (for example, see this version), but someone always comes along to rewrite it in the most incomprehensible way possible. When pressed, they'll claim to support the WP:Make technical articles accessible, but their actions never match their words.  -- DanielPenfield (talk) 15:56, 9 August 2011 (UTC)

The mean section (https://en.wikipedia.org/wiki/Mean_squared_error#Mean) is hard to understand; see https://stats.stackexchange.com/q/375101/99274, which I discuss here: https://stats.stackexchange.com/a/375227/99274. Can this be improved? CarlWesolowski (talk) 23:07, 4 November 2018 (UTC)

I have to agree. I'm looking at whether to use Root Mean Square Error to perform image difference calculation in a program. That article is unintelligible to me, but since RMSE is just sqrt(MSE), I thought this article might help me. It did not at all. What is the purpose of MSE? What are the pros/cons vs the Mean Absolute Error? When would it be useful to use MSE over other error formulas? Why are we squaring things? I understand that MSE is a statistics concept, but you shouldn't have to be a statistician to read the first 3 paragraphs of this article. Cowlinator (talk) 22:45, 6 August 2019 (UTC)
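
On the "why are we squaring things" question, one practical difference from the mean absolute error is that squaring penalizes a few large errors more than many small ones. The sketch below uses two made-up lists of per-pixel differences with the same total absolute error; the numbers and function names are assumptions for illustration only.

```python
import math

def mean_squared_error(errors):
    """Average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

def mean_absolute_error(errors):
    """Average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

spread_out = [1, 1, 1, 1]     # every pixel is off by a little
concentrated = [0, 0, 0, 4]   # one pixel is off by a lot

for name, errors in (("spread out", spread_out), ("concentrated", concentrated)):
    mae = mean_absolute_error(errors)
    mse = mean_squared_error(errors)
    print(name, "MAE =", mae, "MSE =", mse, "RMSE =", math.sqrt(mse))
```

Both lists have an MAE of 1.0, but the MSE is 1.0 for the spread-out errors and 4.0 for the concentrated ones. Whether that extra weight on large errors is desirable depends on the application.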

Link to maximum likelihood
I'm surprised that there is no link to maximum likelihood. Namely, if the errors are Gaussian distributed with fixed variance, the maximum likelihood estimate is the same as the one that minimizes the MSE. This justifies the MSE.

I find that Numerical Recipes has a good description of this.

I'm no statistician, but if people say I'm right, I'm happy to write something. — Preceding unsigned comment added by Cosine12 (talk • contribs) 22:46, 10 March 2021 (UTC)
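
A small numerical check of the claim, as a sketch only: it assumes independent Gaussian observations with a known standard deviation of 1 and searches a coarse grid of candidate means. The data values, the grid, and the function names are invented for this illustration.

```python
import math

data = [2.0, 3.0, 5.0, 6.0]  # illustrative observations
sigma = 1.0                  # assumed known standard deviation

def mean_squared_error(mu):
    """Average squared distance between the data and a candidate mean."""
    return sum((x - mu) ** 2 for x in data) / len(data)

def gaussian_log_likelihood(mu):
    """Log-likelihood of the data under independent N(mu, sigma^2) observations."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sum_sq / (2 * sigma ** 2)

candidates = [i / 10 for i in range(0, 81)]  # 0.0, 0.1, ..., 8.0
best_by_mse = min(candidates, key=mean_squared_error)
best_by_likelihood = max(candidates, key=gaussian_log_likelihood)

print(best_by_mse, best_by_likelihood)
```

Because the log-likelihood is a constant minus the sum of squared errors divided by $$2\sigma^2$$, the candidate that maximizes it is the one that minimizes the MSE; here both searches pick the sample mean, 4.0.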