Talk:Variance/Archive 1

Properties 4.1/4.2 are wrong / Verification of ALL properties required !
"In a finite population or sample, if the variable is extended with a number that is larger than all other numbers of the variable, then the variance will increase." This is not true, counter example: Given the samples 2,9,10 we have a mean of 7 and a variance of 38/3=12.7 resp. 38/2=19. Now we consider another sample 11, which is larger than the rest. New mean (of 2,9,10,11)is 8 and new variance is 50/4=12.5 resp. 50/3=16.7 The variance has decreased, because the new sample is closer to the mean than the average distance to the mean was before, furthermore the mean changes. The same applies to a smaller value (e.g. add negative signs to all numbers). This shows that care must be taken when using such informal statements, although I really appreciate them, because they provide quick intuition. However, dont add such a sentence if you cannot prove it.

Kevin 89.53.3.68 21:55, 15 March 2007 (UTC)
 * Well spotted. I've removed the offending points from the article. --Salix alba (talk) 00:21, 16 March 2007 (UTC)
 * Fine! I was quite surprised that there is a completely wrong statement (this is NOT a typing error but a wrong sentence, well meant of course, but obviously driven by intuition and not using mathematical reasoning as it should) in such a fundamental article as Variance. I have the bad feeling that if there is one error, it is not unlikely that there is another error in the other "informal" properties. Therefore I propose that someone should have a close look at all the informal properties (or better: provide a proof of each, e.g. in the "formal" section). Until each and every property is verified, we should keep the expert tag or a warning for the inexperienced reader. What do you think?
 * Kevin 89.53.27.38 10:12, 17 March 2007 (UTC)


 * I'm removing the tag. I reviewed the list, and every item is either clearly correct (the square of a real number is always positive – do we really need a reference for that?) or else a wiki-link is provided to an article (e.g., Chebyshev's inequality) that is presumably well enough referenced. I did see one computational error in the bit about Fahrenheit vs. Centigrade (extra factor of 10) ... I'll fix that, too. Oh – on Kevin's original complaint, the sentence probably could have been fixed by saying "...if the variable is extended with a number that is significantly farther from the mean than all other numbers of the variable, then the variance will increase." Or is that too imprecise? DavidCBryant 14:34, 20 April 2007 (UTC)


 * Obviously, you are right and I was wrong. My apologies. I would like some kind of rephrasing like David suggests, but I would avoid the word "significant" because that has a very different meaning in statistics. JulesEllis 00:41, 11 May 2007 (UTC)
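
Kevin's counterexample at the top of this thread is easy to check numerically. The following is only a small sketch using exact fractions; the helper name `variances` is made up for illustration and is not taken from the article.

```python
from fractions import Fraction

def variances(values):
    """Return (divide-by-n variance, divide-by-(n-1) variance) as exact fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / n, ss / (n - 1)

before = variances([2, 9, 10])      # sum of squared deviations is 38
after = variances([2, 9, 10, 11])   # sum of squared deviations is 50

print(before)  # (Fraction(38, 3), Fraction(19, 1))
print(after)   # (Fraction(25, 2), Fraction(50, 3))
```

Both versions of the variance go down after 11 is added, even though 11 is larger than every earlier value.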

Use of [] vs ()
I was wondering what rule was being used to determine whether E[X] or E(X) was used. Here are some rules I have come across:

My cosupervisor recommends using E[X] but Var(X).

"Probability and Statistics for Engineers and Scientists (5th ed)" (Walpole, Myers 1993) appears to use normally E(X), but uses E[f(X)] to distinguish outer brackets from inner brackets.

"Linear Regression Analysis" (Seber, 1984) uses E[X], var[X] and cov[X,Y]. Gmatht 10:02, 22 February 2006 (UTC)

Simplicity of the article
"looks as if any intelligent undergraduate would be able to follow it without much effort."

The required effort being to look somewhere other than Wikipedia for an entry level introduction to the concept.

And are high-schoolers not entitled to a Wikipedia entry they can understand?

How about unintelligent undergraduates?

How about PhD Molecular Biologists (myself)? --The preceding unsigned comment was added by 194.171.7.39 (talk • contribs) 14:51, 9 February 2006 (UTC)


 * I'm sorry you find it unclear. Did you understand expected value and standard deviation?  The page defines variance as "how far from the expected value [a random variable's] values typically are".  I can't think of a more simplified explanation.  Do you think an opening section, such as homomorphism's, would help? --Mgreenbe 16:07, 9 February 2006 (UTC)


 * I'm on a Biomedical undergraduate programme studying descriptive statistics for laboratory work and I have to say I can't make head nor tail of this article. --Iscariot 19:20, 7 November 2006 (UTC)
 * Use basic wikipedia then!

I'm an economics undergraduate and I already know what standard deviation is, but this article makes no sense at all. --Preceding unsigned comment added by 62.172.143.205 (talk) 02:24, 16 March 2010 (UTC)

Wow. This (and Mean, and Standard Deviation) are horrible. Undecipherable. Tons of variables being used without explanation. The grammar is awful. It reads like it was ripped from one of those 100 page math books from the turn of the 19th century--absolutely useless without the lectures. --The preceding unsigned comment was added by 68.100.26.175 (talk • contribs) 19:43, 5 April 2004 (UTC)


 * I was concerned when I read the words above, since I dislike bad grammar and overly complicated verbiage (see my recent editing of counterexample). But then I looked at the page, and it looks as if any intelligent undergraduate would be able to follow it without much effort.  No "variables" are unexplained (and I have often been upset to find Wikipedia articles in which mathematical notation is unexplained; I'm a stickler about such things).  But do go ahead and improve it if you can. And this article is quite light on the use of "variables"; I don't understand why you say "tons".  (I have not looked at mean and standard deviation today.) Michael Hardy 18:06, 5 Apr 2004 (UTC)

i have to agree with the above complaints (and i understand the subject of the article completely). there are too many symbols meaning the same quantity or concept. just a heads up that i'm gonna read through this and do symbol consolidation (and i'll try to make the symbols compatible with other related articles) and perhaps a little exposition regarding the difference between population variance (divide by N) and sample variance (divide by N-1) and why there are these two slightly different formulae for ostensibly the same quantity. Rbj 02:33, 7 May 2006 (UTC)

This article is impenetrable. Skipper per (talk) 15:54, 2 July 2010 (UTC)

Variance as analogous to moment of inertia!?!?
I removed this aside someone had put at the bottom, as it was just plain silly.


 * ... and I've put it back, since it OBVIOUSLY makes sense. The mathematical analogy between the two is clear.  Whether it is in some way fruitful is not clear to me at this moment; maybe someone can add something. Michael Hardy 22:48, 15 Apr 2005 (UTC)


 * I clarified the sentence.--Patrick 00:19, 16 Apr 2005 (UTC)


 * Funny, I tried to explain this relation for many years in one of my introductory statistics courses for psychology students, and it was always a futile attempt. That is, if you want to understand what variance is, then this relation helps no one. I have never had any benefit from this analogy. So on the one hand I agree that the analogy is fascinating, but on the other hand I'm pretty sure that it doesn't help anyone. Perhaps it makes sense on a page about moment of inertia, but not here. I don't vote for removal, but it should not receive more attention than it has now (which is little). The point is that the concept of variance is very widely used while moment of inertia is very specific. So if it has to be mentioned here, we can as well add a section about quantum uncertainty or investment risk, which are also operationalized as variance. Hmm, perhaps that would not be a bad idea indeed. JulesEllis 07:32, 14 January 2007 (UTC)


 * Investment risk? That's a practical application of variance in a particular circumstance, quite unlike the moment of inertia which has nothing to do with statistics. Fact is that these concepts are linked at a sufficiently fundamental level that statistics actually borrowed the term from mechanics. Which is why variance is also known as the second order central moment, and why we have moment generating functions. --Het 10:03, 18 January 2007 (UTC)


 * What is so fundamental about borrowing the term? It seems rather superficial to me. You say that statistics borrowed the term from mechanics, but this is only true for the term moment and not for the term variance. Furthermore, this borrowing pertains only to the word but not to the concept. I just checked the articles of De Moivre and Fisher and I do not see any mention of moment of inertia. Gauss called it mean error, so it seems he was not all too concerned about the similarity either. Well, I do not know everything, so perhaps you're right, but then please explain what is so fundamental about the relation with moment of inertia. Do some statisticians frequently use theorems or insights from mechanics when they reason about variance? That must be an area of statistics that totally escaped me. Otherwise I'm inclined to think that this is no more than a footnote to the history of the term moment. A relation that IMHO is fundamental is the similarity between the additivity of variances and the theorem of Pythagoras. JulesEllis 03:35, 10 February 2007 (UTC)

$$\operatorname{var}(s^2)$$
Can anybody provide an estimator for the variance of the variance estimator? If I calculate the population variance from a sample I do not only want to know how good the result is on average (e.g. unbiased), but also how much the estimates may change for different samples.


 * Certainly, if the population is normally distributed, then the usual unbiased variance estimator $$S^2$$ satisfies


 * $${(n-1)S^2 \over \sigma^2} \sim \chi^2_{n-1},$$


 * so the variance of that is the variance of the chi-square distribution with n − 1 degrees of freedom, i.e., it is 2(n − 1). Thus, the variance of $$S^2$$ itself is


 * $${2\sigma^4 \over n-1}.$$

Michael Hardy 23:28, 20 January 2006 (UTC)


 * Michael, I would also like to have this in the article. I found a formula for this at http://mathworld.wolfram.com/SampleVarianceDistribution.html, with reference to Kenney and Keeping 1951, p. 164; Rose and Smith 2002, p. 264. I suggest that we add it to the article. JulesEllis 00:33, 27 August 2007 (UTC)
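
For readers who want to see the normal-population result $$2\sigma^4/(n-1)$$ in action, here is a rough Monte Carlo sketch. The sample size, the value of sigma, the number of repetitions, and the seed are arbitrary choices made for illustration, and the simulation only approximates the formula; it does not prove it and says nothing about non-normal populations.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divide by n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def simulate(n, sigma, reps, seed=0):
    """Return the empirical mean and variance of S^2 over `reps` normal samples."""
    rng = random.Random(seed)
    s2_values = [
        sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    mean_s2 = sum(s2_values) / reps
    var_s2 = sum((v - mean_s2) ** 2 for v in s2_values) / (reps - 1)
    return mean_s2, var_s2

n, sigma = 5, 2.0                      # illustrative values only
mean_s2, var_s2 = simulate(n, sigma, reps=20000)
theory = 2 * sigma ** 4 / (n - 1)      # formula from the comment above

print("empirical mean of S^2:", mean_s2, "vs sigma^2 =", sigma ** 2)
print("empirical variance of S^2:", var_s2, "vs theory =", theory)
```

With these settings the empirical variance of $$S^2$$ should land close to the theoretical value, with the usual simulation noise.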

N vs N-1
"Intuitively, computing the variance by dividing by N instead of N − 1 underestimates the population variance. This is because we are using the sample mean as an estimate of the unknown population mean μ, and the raw counts of repeated elements in the sample instead of the unknown true probabilities."

I have several problems with these sentences. First, "intuitively" is an unfortunate word choice--math is not intuitive to many people. Second, it is not at all intuitive why dividing by N would underestimate the population variance. The explanation does not help: Why does it matter that we use the sample mean instead of the population mean? Furthermore, what precisely is meant by "raw counts of repeated elements"? Does this have something to do with sampling with replacement? At the very least, these terms need to be defined or linked, and the explanation should be clarified. If I understood what was being asserted here, I would do it--but I don't.

Danfrog 21:06, 22 March 2006 (UTC)


 * see my comment above. i will explain why the difference between dividing by N or N-1.  (for an intuitive "taste": think of a sample of 1 from an RV with a positive variance, the numerator will be zero, but the denominator will be 1 yielding a calculated variance of 0, without any problem.  we want a 0/0 there to indicate that there is a limit issue and the variance is not likely to be zero.  then consider a sample of just 2.  now, assuming they're not equal, you can get an idea of what the variance is and dividing by 1 rather than 2 will get you the right answer.  it's not a proof, just an intuitive hint.) Rbj 02:42, 7 May 2006 (UTC)
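
The underestimation discussed in this section can be seen exactly on a toy case. The sketch below assumes a made-up three-value population and independent draws with replacement; it enumerates every possible sample of size 2 and averages both estimators over all of them.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]   # hypothetical population, chosen only for illustration
n = 2                    # sample size

def sum_sq_dev(xs):
    """Sum of squared deviations from the mean of xs, as an exact fraction."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)

# True population variance (divide by the population size).
sigma2 = sum_sq_dev(population) / len(population)

# All equally likely samples of size n, drawn independently with replacement.
samples = list(product(population, repeat=n))

mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)              # 14/3
print(mean_div_n)          # 7/3
print(mean_div_n_minus_1)  # 14/3
```

Averaged over all samples, dividing by n gives (n − 1)/n times the population variance, while dividing by n − 1 gives the population variance itself. The shortfall arises because each sample's deviations are measured from its own mean rather than from the population mean.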

alternate proof of unbiasedness
To me, the current alternate proof is nearly as long as the original. A quicker way is to write $$S^2$$ as


 * $$S^2=\frac{1}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^n\left(x_i-x_j\right)^2$$

which, while not computationally efficient, serves to illustrate that variance is decomposed into exactly $${n}\choose{2}$$ pairs of distances. Then the desired result is a direct consequence of


 * $$\operatorname{E}\left[\left(x_i-x_j\right)^2\right]=2\sigma^2$$ for $$i\neq j$$ and $$ \sum_{i=1}^n\sum_{j=i+1}^n1={{n}\choose{2}}$$ Btyner 19:24, 26 April 2006 (UTC)
 * Hmmm, now it occurs to me that independence is required for that scratch shortcut to work, so never mind. Btyner 03:49, 17 May 2006 (UTC)


 * I used it in the introductory text that I added. I also added a shorter, more abstract proof. I didn't delete the older proofs because I'm not sure that mine is more readable.JulesEllis 00:33, 15 January 2007 (UTC)
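
The pairwise expression for $$S^2$$ given at the start of this section can be checked against the usual mean-based formula. This is just a numerical sanity check on one arbitrary data set, using exact fractions; the function names are invented for the example.

```python
from fractions import Fraction
from itertools import combinations

def s2_from_mean(xs):
    """Usual unbiased sample variance: squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def s2_from_pairs(xs):
    """Pairwise form: sum over all i < j of (x_i - x_j)^2, divided by n(n - 1)."""
    n = len(xs)
    total = sum((a - b) ** 2 for a, b in combinations(xs, 2))
    return Fraction(total, n * (n - 1))

data = [2, 9, 10, 11]   # arbitrary integer data
print(s2_from_mean(data), s2_from_pairs(data))  # 50/3 50/3
```

The pairwise version visits n(n − 1)/2 pairs, so, as noted above, it is illustrative rather than computationally efficient.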

R. A. Fisher developed it?
If so, as suggested by another article, should not it be mentioned in this one? I suggest only a short paragraph like this:
 * The term was first used by R. A. Fisher in his 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance", where he first showed that Mendelian inheritance was compatible with continuous variation of characters, contrary to what had previously seemed to be the case.

--Extremophile 06:10, 1 May 2006 (UTC)

Alternative formula
There is another formula that is slightly easier to calculate if the data is in a table or if the mean is an awkward number, and that is:


 * $$\sigma^2=\frac{\sum_{i=1}^n x_i^2}{n}-\bar{x}^2$$

But I don't know where to put it into the article. Any suggestions? x42bn6 Talk 07:22, 27 May 2006 (UTC)
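
A quick check that the shortcut formula agrees with the definition, on one arbitrary data set and with exact fractions so that rounding plays no role. The function names are hypothetical.

```python
from fractions import Fraction

def variance_by_definition(xs):
    """Mean of squared deviations from the mean (divide by n)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n

def variance_by_shortcut(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return Fraction(sum(x * x for x in xs), n) - mean ** 2

data = [2, 9, 10, 11]   # arbitrary integer data
print(variance_by_definition(data), variance_by_shortcut(data))  # 25/2 25/2
```

One caveat: in floating-point arithmetic the shortcut subtracts two numbers of similar size and can lose precision when the mean is large compared with the spread, so it is mainly convenient for hand calculation.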

True variance
There is a pretty substantial article at true variance which overlaps with much of this one. I really don't think we need two articles about this, but merging would be quite a task. Btyner 03:21, 31 May 2006 (UTC)
 * D'oh, now I see the whole discussion at Talk:True variance. What a shame ... Btyner 03:25, 31 May 2006 (UTC)

Technical template
I'm sorry, but I'm currently taking a Statistics class and most of this page is gobbledygook to me. My book says that "intuition" is right and using n instead of n-1 for the sample variance does indeed underestimate the population variance, and the article's explanation of why this is not so makes no sense to me. Please, let's use real, normal English explanations alongside the technical expositions. I came here trying to understand why n-1 makes the sample variance an unbiased estimator of the population variance; I leave knowing no more than when I came. -- Calion | Talk 03:54, 28 November 2006 (UTC)

Common sense introduction
I have added a more common sense introduction to the definition. The definitions in terms of expectations were mathematically correct, but it is obvious to me (a statistics teacher at a university for more than two decades) that they don't make sense to anyone who has not taken a course in mathematical statistics. I think that there is more to statistics than just that, as the concepts were used long before axiomatic probability theory developed. A strictly mathematical definition is fine for such technical concepts as the gamma distribution, which are most likely to be used by specialists, but not for such a basic concept as variance. So I have written a long introduction in an attempt to explain it really well without scaring people away with formulas. At the same time it also explains the n-1 versus n problem. JulesEllis 07:22, 14 January 2007 (UTC)

I'm what you might call an "amateur mathematician" (I'm into computer science and category theory) and statistics has not been my forte. I just wanted to say that this conversational intro by JulesEllis struck me as a very well-written intuitive introduction to some of the basic notions of variance. As someone who never paid much attention to statistics, I found this intro very clearly written and easy to understand, as well as providing an excellent motivation for the various ways to compute the variance. --Preceding unsigned comment added by 84.57.12.20 (talk) 15:55, 4 October 2007 (UTC)


 * I like the idea of what you're doing and I feel bad for deleting much of it. I think you could make the discussion much more concise however...what you're writing is more appropriate for a textbook or something.  The text is still there though...you may want to salvage a lot of it and put it back in the article in some form or another.  I think the article is a bit too dry and technical.  On the other hand I think we need to keep it short.  I think that if you could make your explanation concise it would be very valuable to put under the "Definition" section, and not in a separate heading.  Cazort 22:13, 4 October 2007 (UTC)

Properties
I have also tried to write a more introductory text for the properties. However, I don't know yet how to get square signs in it, and I didn't fill in the scale parameters of the Fahrenheit - Celsius transformation. This should be done. Also, I'm not sure how to make the integration with the more formal part of the text. I think that the introductory texts should pertain mostly to finite populations, even though that is a compromise to mathematical generality. Mathematical generality is desirable in the later parts of the article though.

JulesEllis 07:22, 14 January 2007 (UTC)Jules Ellis


 * Done. I also added some more general theory about the variance of sums. I also added the variance decomposition formula, as it is essential for analysis of variance. I consider this specialist information.
 * JulesEllis 20:17, 14 January 2007 (UTC)


 * The text reads "The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances", but shouldn't it read "The variance of a finite sum of positive uncorrelated random variables is equal to the sum of their variances"? Otherwise this statement would seem to contradict the law of large numbers. Vectro 18:18, 2 July 2007 (UTC)

I'm pretty sure the sum of variances under this section is wrong. It should sum over both i AND j, not just i. The author also mixes up the indices a couple of lines down. Correct me if I'm wrong. --Preceding unsigned comment added by 171.66.168.2 (talk) 17:59, 21 March 2012 (UTC)
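
On the point about summing over both i and j: for variables that may be correlated, the variance of a sum equals the double sum of all covariances, and it reduces to the plain sum of variances only when the cross terms vanish. The sketch below checks this on made-up data, treating each position in the lists as one equally likely outcome; it is an illustration, not the article's formula verbatim.

```python
from fractions import Fraction

def cov(xs, ys):
    """Population covariance of two equally long lists (divide by n)."""
    n = len(xs)
    mx = Fraction(sum(xs), n)
    my = Fraction(sum(ys), n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Three hypothetical variables observed on the same four outcomes.
variables = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [0, 5, 1, 2],
]

totals = [sum(outcome) for outcome in zip(*variables)]   # value of the sum per outcome

var_of_sum = cov(totals, totals)
double_sum = sum(cov(a, b) for a in variables for b in variables)  # over i AND j
sum_of_vars = sum(cov(a, a) for a in variables)                    # over i only

print(var_of_sum, double_sum, sum_of_vars)  # 11/2 11/2 6
```

Here the variables are correlated, so the sum over i alone (6) differs from the variance of the sum (11/2), while the double sum matches it.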

Suggestion for the definition section
I want to make these changes, but I'm not sure that others agree. Please comment:

1. Add to the definition section the two definitions of the sample variance (move some stuff of the other section to this place). For a reader who is unfamiliar with the topic it must be confusing that we give one definition and then talk about three different things.

2. Make the notation more consistent, and always use $$s^2$$ for the unbiased estimate and V for the other version. ($$\hat{\sigma}^2$$ would be possible too, but I think that V reads more easily, as it respects the convention of using Greek only for parameters.)

3. Reserve the name sample variance for $$s^2$$. I believe that this is the default meaning of the term, and that texts that use the term differently are rare and usually say explicitly that they use the term with another meaning.

4. Reserve another term for V. Any suggestions? Perhaps uncorrected sample variance? I believe that there is no name that is generally agreed upon, so it should be made clear that the term is only used locally in this article for ease of presentation.

5. Add that V is the ML estimator of the variance of a normal distribution.

6. Add that the asymptotic distribution of V and $$s^2$$ is a normal distribution as a consequence of the central limit theorem; specify its variance.

JulesEllis 19:53, 15 January 2007 (UTC)

Request for citation or clarification
Hi, is there any citation or corroboration for this statement?

", and the standard deviation that is obtained from the unbiased n-1 version of the variance is not unbiased."

71.198.188.12 07:44, 23 February 2007 (UTC)
 * I can verify this. However the bias is small, gets relatively smaller as n increases, and there is a constant known as $$c_4$$ in some circles which can be used to construct an unbiased estimator of $$\sigma$$ when the errors are normal. It's not too hard to prove. Btyner 12:18, 24 March 2007 (UTC)
 * This was really beyond the scope of this article so I've made a new one which for now lives at Unbiased estimation of standard deviation. Btyner 18:32, 24 March 2007 (UTC)
 * Looks good, much needed. A slight preference for calling unbiased estimation of variance as opposed to sd. --Salix alba (talk) 22:48, 24 March 2007 (UTC)
 * Why? The whole point of making it was to show how σ could be unbiasedly estimated. If we moved the stuff about unbiased estimation of $$\sigma^2$$ to that article, then we could call it "unbiased estimation of variance and standard deviation". Btyner 15:10, 25 March 2007 (UTC)

A simple explanation is that the standard deviation is a nonlinear function (square root) of the variance, so the property of being unbiased does not carry over. This is because the operations of taking the mean and applying a function do not in general commute unless that function is linear. By definition an estimate is unbiased when its mean equals the quantity being estimated.

About the variance itself, please see also the new introduction to Estimation of covariance matrices that I have just added. Jmath666 22:31, 25 March 2007 (UTC)
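
The claim in this section can be illustrated on a toy case. The sketch below assumes a made-up three-value population with independent draws with replacement and enumerates every sample of size 2. It shows that the n − 1 variance estimator averages to the population variance while its square root averages to less than the population standard deviation.

```python
from itertools import product
from math import sqrt

population = [1, 2, 6]   # hypothetical population, for illustration only
n = 2                    # sample size

def sample_variance(xs):
    """Unbiased sample variance (divide by n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

pop_mean = sum(population) / len(population)
sigma2 = sum((x - pop_mean) ** 2 for x in population) / len(population)
sigma = sqrt(sigma2)

# Every equally likely sample of size n.
samples = list(product(population, repeat=n))
mean_s2 = sum(sample_variance(s) for s in samples) / len(samples)
mean_s = sum(sqrt(sample_variance(s)) for s in samples) / len(samples)

print("sigma^2 =", sigma2, " average of s^2 =", mean_s2)
print("sigma   =", sigma, " average of s   =", mean_s)
```

The gap between the average of s and sigma is the bias that the square root introduces; in this tiny example it is large because n is only 2.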

squares vs. absolute values
The article now states: The squaring is done to get the negative signs of some differences away. In principle, you could also do that by taking the absolute values (i.e., just dropping the signs), but squaring is more convenient for mathematicians.

As I understand it, squaring is not an arbitrary choice as the article implies. Variances are additive (à la property #8) but “absolute value variances” are not. Can someone more mathematically knowledgeable confirm this? If so, it should be pointed out in the text. --75.15.152.144 08:31, 24 February 2007 (UTC)


 * You are right. That sort of thing is supposed to be covered by "more convenient for mathematicians", but in actuality that is a pretty meaningless phrase.  The introduction is in my opinion quite terrible and will not help a non-math-savvy reader to understand any better.  There are also serious errors in there: "The multiplication by 0.5 can be justified because if you consider all pairs, then you see each difference twice (namely as number 1 - number 2 and as number 2 - number 1)." This is numerology and wrong. The fact that 2 squares are formed is already compensated by dividing by 2 when the average square is formed.  The reason for the 0.5 is that it matches the mathematical definition of variance (which could easily be altered by a constant without major harm).  Also this way of computing the variance requires you to include the zero differences between a value and itself.  This is not at all intuitive and way harder to grasp than measuring the difference between each value and the mean. The later "explanation" of the n-1 factor is also pure numerology and has nothing whatever to do with the real reason the n-1 factor is used.  I propose to replace the introduction section entirely but will wait for objections. --McKay 02:51, 26 February 2007 (UTC)


 * Of course it is not arbitrary. When I wrote the sentence "because it is more convenient for mathematicians" I meant the additivity property, but I didn't want to become too technical at that stage. Feel free to replace it by something more clear. But if you refer to additivity, please explain why that would be a reason to prefer this definition. I think the reason is that it is mathematically convenient. That additivity is a nice property might be obvious for mathematicians, it is not obvious for other people. Someone else added the phrase about differentiability. That is convenient too, indeed. With respect to McKay's comment: I agree with your point about the factor 0.5. I wrote the original version of that intro, so I feel free to remove it directly after this post. I do not agree with you about n - 1. I know that the reason is the unbiasedness, but most lay persons will not understand that. This is so because they are often unable to imagine much more than the data at hand, let alone a (for them) fairly abstract concept as the sampling distribution of the mean. Understanding unbiasedness requires understanding the sampling distribution, however, because it is basically a statement about the expected value of that distribution. Furthermore, it can be argued that the unbiasedness is actually a poor reason to divide by n-1, because the ensuing standard deviation will still be biased and it will generally not entail the maximum likelihood estimate for the variance. The zeros on the diagonal provide just another way to understand why the division by n-1 instead of n yields a reasonable measure. You use the word "numerology", but in fact it provides an exact definition of the variance in a finite space with equal probabilities: The variance is the mean squared difference of distinct pairs, divided by 2. Where's the numerology in that? 
However, if the article contains a formulation that suggests that this is the only reason to divide by n-1, then I agree that this formulation should be changed. You also say that this way of introducing the variance is harder to grasp than the measuring the difference between each value and the mean. I disagree. When you ask people without statistical training to assess the variation in a row of numbers, they will start looking at pairwise differences, and not first compute the mean. JulesEllis 01:18, 11 May 2007 (UTC)


 * I agree that pairwise differences have intuitive appeal. I also like the bit about the diagonals as an explanation for the "n - 1", which I always find baffling whenever it pops up in statistics.  Instead of "convenient for mathematicians," how about "has some nice mathematical properties" or "has some convenient mathematical properties"?  --Coppertwig 19:44, 15 June 2007 (UTC)

Variance...not arbitrary? I think alas that it is arbitrary. There are some theorems that you can prove about optimality of variance under certain conditions but many of these theorems either use very strong assumptions of total normality (and fail under more reasonable assumptions), or they use circular reasoning, showing that variance makes sense when you are measuring your loss by something like mean squared error. I think that we need to strictly monitor this article so that we make sure that we treat variance as it is--as one way of measuring variation from the mean...a way that has certain enticing mathematical properties, and is widely used...but is not the only way of doing things and does not always have compelling physical, mathematical, or philosophical reasons for its use. Cazort 22:07, 4 October 2007 (UTC)
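
To make the additivity point raised in this section concrete, here is a minimal sketch with two independent variables that each take the values −1 and 1 with equal probability (a made-up example). Variance adds up over the sum; the mean absolute deviation does not.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Variance of a variable that takes each listed value with equal probability."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

def mean_abs_dev(values):
    """Mean absolute deviation from the mean, same equal-probability setting."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum(abs(v - mean) for v in values) / n

x = [-1, 1]
y = [-1, 1]
# Independence: every combination of an x value and a y value is equally likely.
sums = [a + b for a, b in product(x, y)]

print(variance(sums), variance(x) + variance(y))              # 2 2
print(mean_abs_dev(sums), mean_abs_dev(x) + mean_abs_dev(y))  # 1 2
```

This shows only that the absolute-value measure fails to be additive in one case; whether additivity is a compelling reason to prefer squaring is the question debated above.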

n?
"That is, the variance of the mean decreases with n." - Whats n? Fresheneesz 07:18, 21 March 2007 (UTC)
 * n means sample size (how many subjects). Herenthere (Talk) 00:44, 25 March 2007 (UTC)

clumsy
A lot of this looks a bit clumsily written. I'll be back. Michael Hardy 00:43, 25 March 2007 (UTC)


 * I agree entirely. The language is all over the place, and generally too conversational in tone. DRE 20:55, 27 July 2007 (UTC)
 * I totally agree. Cazort 22:04, 4 October 2007 (UTC)

Undefined variables
Every variable used in an expression should be defined. Consider the following extract:


 * If the random variable is discrete, this is the same as:
 * $$\sum_i (x_i - \mu)^2 p_i\,.$$

So, what's $$p_i$$? There are other examples of such "magic variables" pointed out above. EmmetCaulfield 12:38, 9 May 2007 (UTC)
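
Assuming that $$p_i$$ in the quoted formula stands for the probability that the variable takes the value $$x_i$$ (the usual reading, though the extract does not say so), the formula can be sketched as follows. The function name and the two example distributions are illustrative only.

```python
from fractions import Fraction

def discrete_variance(values, probs):
    """Sum of (x_i - mu)^2 * p_i, where p_i is assumed to be P(X = x_i)."""
    if len(values) != len(probs) or sum(probs) != 1:
        raise ValueError("need one probability per value, summing to 1")
    mu = sum(x * p for x, p in zip(values, probs))   # expected value
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# A fair six-sided die: every p_i equals 1/6.
die = discrete_variance([1, 2, 3, 4, 5, 6], [Fraction(1, 6)] * 6)

# A 0/1 variable that equals 1 with probability 1/4.
coin = discrete_variance([0, 1], [Fraction(3, 4), Fraction(1, 4)])

print(die, coin)  # 35/12 3/16
```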

Numbering system
The numbering in the Properties, Introduction section is supposed to match up with the numbering in Properties, formal. I think 10 matches but I suspect 9 doesn't. "6 and 8 jointly imply that" is suspicious, since 8 is what is being proven maybe? Actually, I'm suspicious about all the section numbers, because who knows when someone might have inserted a section, changing all the numbers. Besides, 8a 8b and 8c are mentioned but there is only a simple section 8 above. How about naming each section, instead (or in addition to numbering them) to help keep things straight. --Coppertwig 19:49, 15 June 2007 (UTC)
 * I think this needs to be changed: "Properties 6 and 8 jointly imply that..." in 8c in the formal section.  First of all, I think it would be more specific and clearer to say "6.3 and 8b".  Secondly, referring to 8b is confusing since usually the numbers refer to sections in the introduction, while here another section in the formal section needs to be referred to, suggesting the need to revamp the numbering system.  Thirdly, I don't think any of the preceding discussion establishes that cov(aX,bY) = ab cov(X,Y), which is needed here.  Suggestions on how to change it are welcome. --Coppertwig 19:15, 16 June 2007 (UTC)
 * Over two years later and these invalid references are still there... I'll try to fix some of them. —Keenan Pepper 19:41, 31 January 2010 (UTC)

Fotiable?
"A more understandable measure is the square root of the variance, called the standard deviation. As its name implies it gives in a standard fotiable for all real numbers..."

There are no definitions of the word fotiable available from Google, and Wikipedia itself doesn't have an article or definition for it. This word should either be defined in the article, or be replaced with a word that can be defined.

Also, the word in does not belong in that sentence. 130.195.5.7 21:23, 18 July 2007 (UTC)

Wrong/ambiguous formula -- divide by n-1
Shouldn't the variance be the sum of the squares divided by n-1? This page simply defines it as the sum of the squares of the differences. There is a page on Wolfram MathWorld that gives two different formulas, the "population variance", which does not divide by n-1, and the "sample variance", which does divide by n-1. This needs to be clarified in this article because only one method is presented. --Wykypydya 18:53, 29 July 2007 (UTC)


 * Sigh.... I don't know why I didn't immediately see a hundred copies of this question above this one on this talk page, since it seems as if we've gone through this that many times.
 * This page does NOT "simply define it as the sum of squares of the differences"; rather, it multiplies each difference by the corresponding probability. In case the probability distribution is uniform on a set of n points, then that probability is 1/n.
 * It is only when one is estimating a population variance by using a sample variance that one divides by n − 1. And the reasons for doing that are highly debatable.  But that gives an unbiased estimate. Michael Hardy 19:07, 29 July 2007 (UTC)

Unit Variance
Does not explain what Unit Variance actually is. --81.86.122.174 15:45, 2 August 2007 (UTC)

"Elementary description"
Could somebody please remove this section. I tried but my change was undone. I'm sorry, but the text is simply horrific. The definition of variance is very simple - use it. Here are some examples from that section:

"compute the difference between each possible pair of numbers; square the differences; compute the mean of these squares; divide this by 2. The resulting value is the variance."

I'm not saying this isn't correct, but why should this strange n^2 algorithm be presented as the first way to calculate variance?

"In principle, this can be done by taking the absolute values (i.e., just dropping the signs), but squaring is more convenient for mathematicians, as the squared function is differentiable for all real numbers, and the absolute value is non-differentiable at zero."

Why not to the power of four then, or something like that? Variance is defined as it is - end of story. The way it is defined gives it some interesting properties. For example, look up Chebyshev's inequality.

"So it could be argued that the diagonal should not be counted when computing the mean of the squares. "

AARGH!!!

"is done, then the variance would be 0.5 × (0 + 1 + ... + 1 + 0) /12 = 1.667"

No. The variance is the variance and the unbiased estimator for variance is another thing.

"generalized into a third definition of the variance:"

This isn't the definition of variance.

"The variance according to the definitions 3 or 4 is sometimes called the 'unbiased estimate'."

It is the unbiased estimate, which is not "another definition" of variance. This only confuses the reader.

130.188.8.12 10:45, 17 August 2007 (UTC)
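For readers following this dispute, the pairwise recipe quoted above can be checked numerically. The sketch below is only an illustration: it assumes the values 1, 2, 3, 4 (a guess that happens to reproduce the quoted 1.667; the article's own data are not given here), and the helper name `pairwise_variance` is made up for this example.

```python
from fractions import Fraction
from itertools import product
from statistics import pvariance, variance


def pairwise_variance(values, include_diagonal=True):
    """Half the mean squared difference over ordered pairs of values."""
    n = len(values)
    pairs = [
        (values[i], values[j])
        for i, j in product(range(n), repeat=2)
        if include_diagonal or i != j
    ]
    mean_sq_diff = sum((x - y) ** 2 for x, y in pairs) / Fraction(len(pairs))
    return mean_sq_diff / 2


data = [Fraction(v) for v in (1, 2, 3, 4)]  # assumed example values

with_diag = pairwise_variance(data, include_diagonal=True)
without_diag = pairwise_variance(data, include_diagonal=False)

print(with_diag, pvariance(data))    # both 5/4: the population variance
print(without_diag, variance(data))  # both 5/3, i.e. about 1.667
```

Under these assumptions, counting all n^2 pairs reproduces the variance with denominator n, while dropping the diagonal reproduces the n − 1 estimator, which is the distinction the comment above objects to calling "another definition".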


 * After the introduction the article gives the definition that you want. So why should an introduction for lay people be deleted? I would agree with you if the article was written exclusively for mathematicians and statisticians. This isn't the case. The concept of variance is sufficiently important to try to give lay people an intuition of what it is. Note that such lay people might not even understand the simplest math formula, like x + 1 = 2. For previous versions of the article, which were written as you suggest, many people complained that it was unreadable, and the article was considered too technical. JulesEllis 23:35, 26 August 2007 (UTC)


 * If one cannot understand that simple equation, how can one understand the following sentences?
 * "It can be defined in several ways such as the following algorithm: compute the difference between each possible pair of numbers; square the differences; compute the mean of these squares; divide this by 2. The resulting value is the variance."
 * 130.233.243.229 09:33, 6 September 2007 (UTC)
 * Because the latter explanation does not contain a formula. 90% of the people stop reading when they see a formula, simply because they expect that they will never understand it. Obviously, mathematicians are not among these 90%. See the many complaints above about the readability of the paper. JulesEllis 04:02, 29 October 2007 (UTC)

I would like to vote to have this section seriously re-done or removed. I may try my hand at editing it but in my opinion, it would be good to either delete it or move it lower in the article. What do others think? It seems to me that this section isn't defining the "elementary" way of looking at variance but is merely describing one equivalent way of looking at it. Personally, I find it an interesting way of looking at things...but the style of exposition doesn't seem appropriate to the rest of mathematical articles on wikipedia. Cazort 21:37, 4 October 2007 (UTC)
 * Before the section "Elementary description" was added, many non-mathematicians complained that they didn't understand a word of this article. See many comments above. Now, the section has been removed by someone and I have no doubt that it is again unreadable for anyone except mathematicians and statisticians. Frankly, I think the person who removed it did a disservice to all who want to know something about variance without having much math education. I would have no problem with this if it was a specialized math topic that most likely will be visited by mathematicians only. However, this is not the case. JulesEllis 03:52, 29 October 2007 (UTC)

Proof of the effect of a linear transformation on the Variance
Could the following proof perhaps be included in the section on Formal properties, right below "effect of a linear....."

$$Var(aX + b) = E([aX + b - aE(X) - b]^2)$$

$$= E(a^2[X - E(X)]^2)$$

$$= a^2Var(X)$$

I would do this myself, but I am not too confident about the formatting conventions, and don't want to disrupt anything. Thanks! 62.214.253.142 18:35, 7 September 2007 (UTC)
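As a sanity check on the proposed identity, here is a small numerical sketch. It assumes X is uniform on the three values 2, 9, 10 and picks the constants a and b arbitrarily; none of these choices come from the article.

```python
from fractions import Fraction
from statistics import pvariance

# X takes each listed value with equal probability, so the population
# variance of the list equals Var(X).
values = [Fraction(v) for v in (2, 9, 10)]  # assumed example distribution

a, b = Fraction(3), Fraction(-7)            # arbitrary constants for the check
transformed = [a * x + b for x in values]

var_x = pvariance(values)
var_ax_b = pvariance(transformed)

print(var_x)                        # 38/3
print(var_ax_b)                     # 114
print(var_ax_b == a ** 2 * var_x)   # True
```

The shift b drops out entirely and the scale a enters squared, as the three-line derivation says.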

Error in formula for Property 8.b
The first formula for Property 8.b does not match the similar formula on the Covariance page. I believe the one on the Covariance page is the correct one. Specifically, the formula is missing the sum of the variances.

70.251.113.146 17:02, 25 September 2007 (UTC)

Never mind. I neglected the fact that Cov(X, X) = Var(X), so the sum of the variances is, in fact, already included. —Preceding unsigned comment added by 70.251.113.146 (talk) 17:09, 25 September 2007 (UTC)

Problem with Style of This Article
I think that the style of this article is inappropriate for an encyclopedia article and is inconsistent with the style of the rest of mathematics articles on wikipedia. I find it ironic saying this because I usually advocate the other way around, but this page is too pedagogical. It reads like a textbook. I think we should delete much of the material, including the proofs. Wikipedia pages generally do not provide proofs of most mathematical results and I think that there is not a huge problem with this--this is what sites like PlanetMath are for. What does everyone else think? Cazort 22:01, 4 October 2007 (UTC)

By the way, I propose rewriting the "properties" section in mathematical notation, and removing most of the extended information from the "formal" section, removing some of the examples, and merging them into one section, much shorter. Cazort 22:03, 4 October 2007 (UTC)


 * I have the opposite opinion. About a year ago the page looked about the same as it is now, and then many people complained that they didn't understand it. This is the reason why the page was more like a textbook and of a different style than other mathematics articles. Most mathematics articles will be consulted only by people with some minimum math abilities, like being able to read an equation. This isn't true for variance. Frankly, I find it a disgrace that there are apparently so many mathematicians with so little respect for lay people's wish to grasp some important concepts at their own level, without having to go through a mathematics course first. E.g. how many non-mathematicians do you think will understand what the expectation operator means? My guess is that this is less than 1%, and for all others the present article will be unreadable. Is that what you want? I strongly urge you to undo the deletion of the "Elementary description" section. 82.93.234.194 03:33, 1 November 2007 (UTC)


 * I agree with you that this article needs to be made more accessible, but I think that we should try to make the whole article accessible, rather than having an "elementary description" section and then a separate section on properties, and then yet another section on "formal" aspects of those properties. That was one of the things I was objecting to.  As another example, I think the "Characteristic property" is one of the most important properties of variance but the way it's described makes it so arcane that no one could understand it.  Lastly, I also don't think that an expanded, chatty tone is necessarily the best way to make wikipedia accessible--wikipedia is a wiki and I think the best way to make it accessible is to make it concise and well cross-referenced: more words does not necessarily make it easier for people to obtain information.  Of course, I think a lot of sections right now DO need more words and more explanation.  The "Elementary description" wasn't exactly an elementary description so much as an elementary example.  How about making an "elementary examples" section, and including some images in addition to some simple examples?  Cazort (talk) 23:15, 31 January 2008 (UTC)


 * The problem is that there are two totally different potential reader groups for the article. One group consists of mathematicians and statisticians who already know the concept and just want to have an overview of some important facts. The other group consists of people who have no idea whatsoever about statistics, who do not know what a random variable is, do not know what "E" means, do not know what expectation is, and who may not even know what squaring is. You cannot address both groups at the same time. Mathematicians are trained to expect a rigorous, general definition at the outset, and exactly that will confuse and scare away most other people. The present article is totally useless for this last group. They will simply stop reading in the middle of the first formula, because they know that they will never understand anything of it. Adding a graph won't help them, because they won't understand the graph, simply because they never learned how to read such graphs. Regardless of the excellent explanations that you may add to the article, the mere fact that it starts with a formula will make it inaccessible for 90% of the people. Nevertheless, these people could have learned something from the old version of the article, if only you guys could accept that there exist people who do not eat formulas for breakfast. I can understand if you think the article was too conversational, but this could have been changed without replacing the bottom-up approach (from example to general formula) by the present top-down approach (from general formula to example). But have it your way, I'm done with it. It is clear that I am the only person with the opinion that it is important to make elementary concepts like variance understandable for lay people. I was always shocked if people proposed dramatic cuts to the finances of mathematics departments, but right now I understand it and I even agree with it. 
Scientists with this attitude do not deserve a single penny. JulesEllis (talk) 02:53, 7 May 2008 (UTC)


 * Also, note that the article is rated as too technical for a general audience. Replacing more text by math will make it worse. Frankly, I believe the present article is totally inadequate for non-mathematicians. This is a shame. The topic is too important - not only for mathematicians. But now I understand why so many are not interested in math. The present article is a showcase of how mathematicians tend to obscure easy concepts rather than clarify them. JulesEllis (talk) 06:46, 20 November 2007 (UTC)

Anyone have any decent graphs to post here that might explain variance visually? I'm imagining a graph of a sample with high variance vs. one with low variance. —Preceding unsigned comment added by 65.91.102.204 (talk) 19:55, 31 January 2008 (UTC)
 * I think that's a great idea! Cazort (talk) 23:15, 31 January 2008 (UTC)

When finite-population terms matter
"In the course of statistical measurements, sample sizes so small as to warrant the use of the unbiased variance virtually never occur... if the difference between n and n−1 ever matters to you, then you are probably up to no good anyway"

There are (fairly common) scenarios where the difference between n and n-1 is very important - in particular, multistage sampling can create a situation where large n at the first stage makes the sample large enough to be 'reputable' but small n at a later stage means finite-population corrections are important to the results.

Example: an acquaintance of mine is studying water pollution. Each measurement she takes is the sum of true pollutant level, systematic error, and random error. She wants to measure pollution levels at various times and places, but also needs to show that the random error in her measurements was within acceptable limits (i.e. that the population variance in the random-error component is less than some constant). Since the work takes place over an extended period of time, she can't just test against known samples at the start to demonstrate consistency; she needs to show that consistency is maintained throughout the work.

Over time she takes 500 water samples, each with its own level of pollutants, then divides each of these into three subsamples and measures the pollutant level in each. The difference between these three measurements is due to random error, and so their variance is a sample variance for random error, from which we can estimate the population variance for random error. That estimate on its own is very inaccurate - but as long as it's unbiased, we can combine it with the estimates from the other 499 samples to get a much more accurate estimate of population variance. In this case, using n instead of n-1 would result in dividing by 3 instead of 2, and so underestimating population variance by one-third.

This also applies when we're going the other way, and trying to use knowledge of population variance to estimate the sample variance (and hence, accuracy) of a given experimental design. In social research, for instance, we might easily end up visiting hundreds of households but only selecting a subset of the people in each household, and the variance associated with that selection is important to accuracy of the results. Given the number of people who live in a typical household, the difference between n and n-1 can be pretty important.

I'd edit the article, but frustratingly, I don't have citable sources handy. --144.53.251.2 (talk) 00:12, 6 February 2008 (UTC)
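The triplicate design described above can be simulated to see the size of the effect. This is a rough sketch under invented assumptions: the error standard deviation, the range of pollutant levels and the normal error model are all made up for illustration, and only the 500 samples of 3 subsamples come from the comment.

```python
import random
from statistics import mean, pvariance, variance

random.seed(0)  # fixed seed so the run is repeatable

TRUE_ERROR_SD = 0.5   # assumed standard deviation of the random error
N_SAMPLES = 500       # water samples, as in the comment above
N_SUBSAMPLES = 3      # measurements per water sample

biased, unbiased = [], []
for _ in range(N_SAMPLES):
    level = random.uniform(5.0, 50.0)  # assumed true pollutant level
    readings = [random.gauss(level, TRUE_ERROR_SD) for _ in range(N_SUBSAMPLES)]
    biased.append(pvariance(readings))   # divides by n = 3
    unbiased.append(variance(readings))  # divides by n - 1 = 2

pooled_biased = mean(biased)
pooled_unbiased = mean(unbiased)

print(f"true error variance : {TRUE_ERROR_SD ** 2:.4f}")
print(f"pooled, divide by 3 : {pooled_biased:.4f}")
print(f"pooled, divide by 2 : {pooled_unbiased:.4f}")
print(f"ratio               : {pooled_biased / pooled_unbiased:.4f}")
```

However many samples are pooled, the divide-by-3 figure stays at two-thirds of the divide-by-2 figure, so a large first-stage sample does not wash out the bias.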


 * I find the whole chunk of text leading up to this quotation to be inappropriate. In any particular situation, either n or n-1 is right and the other is wrong. Good practice is to use the correct formula and not use the incorrect formula. What is the point of going on at length about the effect of making a mistake? I suggest this part of the article be reduced to almost nothing. McKay (talk) 04:06, 2 March 2009 (UTC)

Simple examples needed
I think that adding some simple examples to the article would be good. For example, an important simple example is the variance of an indicator random variable, which would be very good to add. zermalo (talk) 23:14, 1 April 2008 (UTC)
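For reference, the indicator example suggested above could look something like the following sketch. The function name is hypothetical; the result p(1 − p) is the standard variance of a 0/1 random variable that equals 1 with probability p.

```python
from fractions import Fraction


def indicator_variance(p):
    """Variance of a random variable that is 1 with probability p, else 0."""
    mean = p              # E[X] = 1*p + 0*(1 - p)
    second_moment = p     # E[X^2] = E[X], because X^2 = X for 0/1 values
    return second_moment - mean ** 2  # equals p * (1 - p)


for p in (Fraction(1, 2), Fraction(1, 6), Fraction(9, 10)):
    print(p, indicator_variance(p))
```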

attention to characteristic property
The subsection "characteristic property" presently has
 * "The second moment of a random variable attains the minimum value when taken around the mean of the random variable, i.e. $$\mathrm{E} X = \mathrm{argmin}_a \mathrm{E} (X - a)^2$$. This property could be reversed, i.e. if the function $$\phi$$ satisfies $$\mathrm{E} X = \mathrm{argmin}_a \mathrm{E} \phi(X - a)$$ then it is necessary of the form $$\phi = a x^2 + b$$."

I think the conditions for the "reversed" result need to be firmed up. I think the stated condition needs to hold for all distributions of X, not just a single one? Melcombe (talk) 16:11, 14 April 2008 (UTC)
 * I've added "for all random variables X", and further tightened up this paragraph a bit. --Lambiam 20:35, 23 April 2008 (UTC)

Maths in Definition
Can someone look at the maths formatting in subsection "Discrete case" under definition? I don't know what notation is actually intended here, but the results look very odd ...the part dealing with probability masses. Melcombe (talk) 12:51, 15 May 2008 (UTC)

Bienaymé formula
It has no proof, and no separate entry. Furthermore, I can't find many references to it online, let alone a proof. Perhaps a proof/entry could be constructed? —Preceding unsigned comment added by 89.0.150.221 (talk) 19:41, 6 January 2009 (UTC)

There is another problem too. The formula here is for any finite number of random variables and cites Uncorrelated for the definition of that concept. However Uncorrelated states "Uncorrelatedness is a relation between only two random variables.". I will fix this problem. McKay (talk) 09:53, 29 January 2009 (UTC)

It looks like this website explains it, but I couldn't understand it. http://sepwww.stanford.edu/sep/prof/pvi/rand/paper_html/node16.html —Preceding unsigned comment added by 190.94.3.118 (talk) 16:48, 13 November 2009 (UTC)

sample variance
"One common source of confusion is that the term sample variance may refer to either the unbiased estimator $$s^2$$ of the population variance, or to the variance of the sample viewed as a finite population." -- As opined above, I question whether the obsolete and rare usage with denominator n should be included here at all under the name "sample variance". My impression (as a mathematician who is not a statistician) is that these days "sample variance" is a standard concept and other uses of the phrase would be regarded as wrong. Is there a modern significant reference that shows I'm wrong? McKay (talk) 09:08, 6 March 2009 (UTC)


 * (Referring to a response that was later deleted by the poster):


 * That page you linked has multiple issues, not the least of which is the fact that it only references your publications, whereas the concept of sample variance is clearly not your own invention. However, this page is not the right place to discuss the article on the Ukrainian wiki; the question raised here is whether the concept “sample variance” should be defined with the denominator n, or (n−1), or both. The linked page doesn't provide any reasonable resources to help with this question.  …  st pasha  »  07:41, 3 December 2009 (UTC)

Since no further discussion or references have appeared, I am removing use of "sample variance" for anything except what it means in almost all contemporary sources, i.e. the formula with denominator n-1. McKay (talk) 07:44, 10 February 2011 (UTC)

Expected Deviation
"Unlike expected deviation, variance has different units from the variable" -- expected deviation leads to this page... 66.168.1.178 (talk) 18:27, 2 November 2009 (UTC)


 * I've redirected expected deviation to absolute deviation. Michael Hardy (talk) 19:34, 2 November 2009 (UTC)

Die/Dice
I was about to correct the erroneous use of "dice" instead of "die" (the singular), but I see that


 * 1) this appears to be an on-going point of contention and
 * 2) someone has added the comment 'please do not change back to "die": "die" is historically correct, but "dice" is more comprehensible nowadays'.

It therefore seems useful to make the case for change explicit.

The argument that "dice" should be preferred because it is more comprehensible nowadays is, I believe, fallacious. Firstly, it seems more likely that its author is merely expressing his own opinion in implying that "die" is insufficiently comprehensible, as opposed to relying on some form of evidence. In contrast, there are a number of contributors who not only recognise the error but wish to correct it.

Secondly, it is of course quite true that the meaning, usage and spelling of English words all change over time. However, such evolution often begins in misuse, and it is reasonable to expect of a reference work that it perpetuate correct usage rather than yield to illiteracy.

ScotSez: Hear Hear! I concur. "Dice" is the plural form. In this example a single die is being thrown. I vote for "die", not "dice" —Preceding unsigned comment added by Zirconscot (talk • contribs) 22:53, 6 January 2010 (UTC)


 * I too was about to correct dice to die. Then I noticed the revert war and don't want to join in.
 * How about a note on the first "die" referring to this justification for "die" not "dice"? TrevMrgn (talk) 17:28, 29 January 2010 (UTC)

Missing Parenthesis
ChristianCHRR says on 16.01.10: the opening parenthesis after "expected absolute deviation 1.5" has no corresponding closing parenthesis. —Preceding unsigned comment added by ChristianCHRR (talk • contribs) 01:06, 17 January 2010 (UTC)

Can you please simplify the bullshit on the main page. —Preceding unsigned comment added by 194.66.72.76 (talk) 18:39, 24 January 2010 (UTC)

Intro paragraph
I did not expect to be greeted by a solid wall of incomprehensible text when I visited this page. I know this material, but trying to read the intro paragraph is impossible. I quote - "...is the expectation, or mean, of the deviation squared of that variable from its expected value or mean." Nice first sentence. Could someone clean it up? 193.60.90.97 (talk) 12:24, 24 May 2010 (UTC)


 * I have attempted an improvement by starting a new lead section, retaining the old stuff as a follow-on section. Melcombe (talk) 16:47, 25 May 2010 (UTC)

I'm seconding OP's comment. The first sentence should be a common sense definition of variance, not a bunch of stuff about what a variance means, where we talk about variance, etc. The first sentence of the cat article should say "A cat is a species of feline, found throughout the world, and commonly kept as a pet in many societies" not "Cats are naturally predatory, small animals that like to eat mice and milk, and resemble tigers in some respects." Likewise, the first thing I learn about variance should not be that it is a measure of dispersion, that it describes (in some wholly unspecified way) how far values lie from the mean, etc. 24.121.54.180 (talk) 15:35, 17 December 2010 (UTC)

Variance formula confusing
The variance formula is not something that an average person would know how to use, e.g. something like:

Var(X) = [ (X1 - x')^2 + (X2 - x')^2 + (X3 - x')^2 + ... + (Xn - x')^2 ] / n

This should then be followed by a simple example for a total population of about 5 numbers and described as a "population variance". This should then be compared to and contrasted with a "sample variance", along with an explanation of why this is necessary, and then the alternative formula provided (with the 'n-1' in the denominator) and the same example set worked out. —Preceding unsigned comment added by 208.54.192.186 (talk) 16:12, 6 September 2010 (UTC)
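A worked example along the lines requested above might look like the sketch below. The five numbers are an arbitrary illustrative choice, not taken from the article; `pvariance` and `variance` are the population and sample variance functions in Python's standard `statistics` module, used here only to cross-check the hand calculation.

```python
from statistics import mean, pvariance, variance

data = [1, 2, 3, 4, 5]  # assumed example values

m = mean(data)                               # 3
squared_devs = [(x - m) ** 2 for x in data]  # [4, 1, 0, 1, 4]
total = sum(squared_devs)                    # 10

population_var = total / len(data)       # 10 / 5 = 2.0
sample_var = total / (len(data) - 1)     # 10 / 4 = 2.5

print(population_var, pvariance(data))
print(sample_var, variance(data))
```

If the five numbers are the whole population, 2.0 is its variance. If they are a sample drawn from a larger population, 2.5 is the usual estimate of that population's variance.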


 * I agree. This article seems to be directed at people who have a graduate degree in mathematics and it isn't helpful for the typical wikipedia user.--70.160.112.254 (talk) 22:27, 15 January 2011 (UTC)


 * I hope my edits today in the Basic discussion section help in this regard. Duoduoduo (talk) 23:22, 16 January 2011 (UTC)

"why this is done" would be more accurate than "why this is necessary". Unbiasedness of statistical estimators is overrated. Michael Hardy (talk) 23:26, 16 January 2011 (UTC)

Bad definition
The definition given does not actually define anything. The single angle notation $$\langle X \rangle$$ is not defined anywhere nearby -- I can only assume it is an alternative for expectation -- nor does the article on random variables tell us how to add real numbers to random variables. In what sense are we subtracting the mean from the random variable X in the expression E[(X-mu)^2 ]? This definition does not even type check. —Preceding unsigned comment added by 75.145.77.185 (talk) 21:02, 11 January 2011 (UTC)


 * I hope my edit right after the definition has helped to answer what is meant by subtracting the mean from a random variable. As for the <> notation, I've deleted it because it's not defined.  It was put in by an anon on 24 March 2010 with no edit summary, so maybe it was vandalism.  If someone knows that it was right, please put it back in with notation defined.  Duoduoduo (talk) 23:07, 11 January 2011 (UTC)

The notation
 * $$ \langle X \rangle \, $$

for expected value of a random variable X, is used by physicists. Probably someone who didn't know that it's not universal put it there expecting it to be understood. Michael Hardy (talk) 23:25, 16 January 2011 (UTC)

"the whole distribution"
Regarding section: "Estimating the variance"

It says: "Instead one estimates the mean and variance of the whole distribution"

What does that mean? Is it supposed to say the "whole population"?

213.165.179.229 (talk) 22:29, 17 July 2011 (UTC)

Move Disambiguations To The Introduction
The term "variance" is ambiguous, even when confined to the context of statistics. Among its possible meanings are

1. The variance of a probability distribution (aka "population variance")

2. The variance of a sample, specified as a formula involving sample values (hence it is a random variable)

3. A specific numerical value of the above sample variance, as in "The variance was 25.34"

4. An estimator of the population variance, specified as a formula involving sample values (hence it is a random variable)

5. A specific numerical value of the above estimator, as in "The variance is 25.34".

Some of the above ambiguity can be avoided by adding adjectives to the word "variance", but readers looking for information about the term "variance" might well encounter the word in writings that are not precise. So it would be helpful to disambiguate the term "variance" in the introduction of the article. I agree that technically the current article does treat 1,2,4 since they all involve a random variable and the current article does describe estimators. This content is clear to specialists, but I think it would be best if distinct meanings for "variance" were mentioned in the introduction, so that non-specialists are warned about the complexity of the term.

An example of the type of confusion that can be cleared up by disambiguation is this: At each of times i = 1,2,3 suppose we measure samples of independent random variables A and B and obtain the vectors A = {-2,0,2}, B = {2,0,-2}. The random variable defined as C[i] = A[i] + B[i] has sample values {0,0,0}. So "C has less variance than the variance of A plus the variance of B".

There are also terms which may or may not be connected with one of the above ideas. For example, measurements and specifications for measuring devices often specify an "uncertainty" or simply plus-or-minus some numerical value. As far as I know there is no universal definition of what such terminology indicates. These tangential topics should not be treated in this article, but it would be useful to have a sentence mentioning their existence and perhaps some links to articles about them.

Tashiro (talk) 01:59, 24 July 2012 (UTC)
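The A, B, C example in the comment above can be worked through as follows. This is only a sketch; `sample_covariance` is a helper written for this illustration, using the n − 1 denominator to match `statistics.variance`.

```python
from statistics import mean, variance

A = [-2, 0, 2]
B = [2, 0, -2]
C = [a + b for a, b in zip(A, B)]  # [0, 0, 0]


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator (helper for this sketch)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


var_a, var_b, var_c = variance(A), variance(B), variance(C)
cov_ab = sample_covariance(A, B)

print(var_a, var_b, var_c)         # sample variances: 4, 4 and 0
print(cov_ab)                      # sample covariance: -4
print(var_a + var_b + 2 * cov_ab)  # 0, matching var_c
```

The additivity Var(A + B) = Var(A) + Var(B) is a statement about the distributions of independent variables. The sample quantities obey the same identity only once the sample covariance is included, and in a sample of three that term need not be near zero.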

New picture
I don't like the newly added picture. The red columns suggest some frequency or probability distribution. Nijdam (talk) 09:03, 29 August 2012 (UTC)

The only thing the pictures show is something like: .....x.xx.xxx....... and .x..x....x.....x..x..x..

Even presented this way it shows better the idea of variance than the pictures. Nijdam (talk) 10:05, 17 September 2012 (UTC)


 * I agree. I just took the image down. A scatter plot makes way more sense and should be less ambiguous. Also, numbers are not necessary so long as the scatter is plotted on the same graph (or stacked graphs with clearly the same scale). —Ben FrantzDale (talk) 11:48, 17 September 2012 (UTC)

var(x,w) in Matlab
What does var(x,w) mean in MATLAB?
 * w is an optional argument - if w=1 then you get the variance as defined by the 2nd moment
 * i.e. $$var(x,1) = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2$$,
 * but if w=0 you get the sample variance as discussed (somewhere) in the article
 * i.e. $$var(x,0) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$$


 * In other words, var(1:6,1) gives the correct variance of a six sided die, but for a random sample e.g. A=ceil(6*rand(1,10)), the sample variance is obtained from var(A,0), which is equivalent to var(A).
 * — Preceding unsigned comment added by 82.41.4.33 (talk) 23:04, 8 February 2010‎ (UTC)
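For readers without MATLAB, roughly the same distinction can be sketched with Python's standard `statistics` module. The mapping assumed here is that `pvariance` plays the role of var(x,1) and `variance` the role of var(x,0); the seed and the number of rolls are arbitrary.

```python
import random
from statistics import pvariance, variance

faces = list(range(1, 7))  # counterpart of MATLAB's 1:6

# Normalise by N, like var(x,1): the variance of a fair six-sided die.
print(pvariance(faces))    # 35/12, about 2.9167

# Normalise by N - 1, like var(x,0) or var(x): the sample variance.
random.seed(1)             # fixed seed, only for repeatability
rolls = [random.randint(1, 6) for _ in range(10)]  # like ceil(6*rand(1,10))
print(variance(rolls))
```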

Finite population correction
Am I right in thinking that there is a different formula for an unbiased estimate of the population variance when you are sampling from a finite population? Should this be included here?


 * Yes, you are right. This is discussed in the "Population variance and sample variance" section. Jasondet (talk) 07:49, 16 May 2013 (UTC)

— Preceding unsigned comment added by 217.172.65.183 (talk) 14:43, 18 September 2006‎ (UTC)

Variance and standard deviation merger proposal
Any cons? Fgnievinski (talk) 05:24, 10 April 2012 (UTC)
 * What arguments are there in favour of merging? Nijdam (talk) 07:16, 10 April 2012 (UTC)
 * Well, beyond the trivial fact that $$\sigma = \sqrt{\sigma^2}$$, all the rest is redundant, between the two articles. Fgnievinski (talk) 05:30, 11 April 2012 (UTC)
 * There is some overlapping, but not much. Nijdam (talk) 07:37, 11 April 2012 (UTC)
 * Everything that can be said about variance is true about standard deviation, and vice versa, modulo a square root. Fgnievinski (talk) 10:24, 11 April 2012 (UTC)


 * I am against it at this point. Obviously, the two notions are related, but the current article on the variance focuses more on population parameters, while the standard deviation article appears to deal more with sample characteristics and estimation (as is common in probability and statistics courses). FilipeS (talk) 11:08, 14 June 2012 (UTC)
 * I agree with your assessment of the disparity between the two articles, but this fact should be taken as one more reason for proceeding with the merger -- after all, these two concepts are exactly equivalent (modulo a square root), so they shouldn't be allowed to differentiate and grow apart. Fgnievinski (talk) 01:14, 24 February 2013 (UTC)


 * Having said this, the Standard deviation article is a bit disjointed and could do with some improvement. I would support a merger of Computational formula for the variance into Variance, and perhaps a merger of Algorithms for calculating variance into Variance or into Standard deviation. In my opinion, the Standard deviation article is currently a bit bloated, and could be trimmed down. FilipeS (talk) 11:08, 14 June 2012 (UTC)


 * I am against merging. For many users, the Wiki article is their first encounter with that particular topic.  Simplicity is a virtue in that context.  The effort of one added mouse click is irrelevant.  DaveC52 (talk) 12:17, 24 February 2013 (UTC)


 * Merging standard deviation and variance doesn't make sense to me. It is true that standard deviation is computed from variance, but so are many useful statistics, and if you look at the way the two statistics are used, the way they are used isn't so similar that they disappear into each other. Siberianmetal (talk) 11:36, 6 April 2013 (UTC)


 * I think that both terms deserve their own articles, and as that seems to be the consensus view I've removed the merge template. 16:32, 10 April 2013 (UTC)


 * While the population variance and standard deviation are related by a simple square root, the sample variance and standard deviation are quite different. The details should appear on separate pages in my opinion. Jasondet (talk) 07:54, 16 May 2013 (UTC)

Covariance
The variance is normally not defined in terms of the covariance. The variance is a property of univariate distributions, the covariance of bivariate distributions. Nijdam (talk) 06:18, 23 April 2013 (UTC)


 * I agree with this comment and updated the definition to refer simply to the second moment accordingly. I kept a reference to the relationship to the covariance though since it is a useful concept for some. Jasondet (talk) 07:56, 16 May 2013 (UTC)

Making notation consistent
I'm going to make some changes late in the article to establish consistent notation. Right now in the section Population variance and sample variance we have in the subsection Sample Variance:


 * Taking directly the variance of the sample gives:


 * $$\sigma_y^2 = \frac 1n \sum_{i=1}^n \left(y_i - \overline{y} \right)^2 $$

and then


 * Correcting for this bias yields the unbiased sample variance:
 * $$\sigma_S^2 = \frac{1}{n-1} \sum_{i=1}^n \left(y_i - \overline{y} \right)^2 $$

with no definition given for S, though $$\sigma_S^2$$ is usually (and subsequently here) notated as $$s^2$$. Then in Distribution of the sample variance we have an analysis of $$s^2$$ with no definition having been given for it. Then in Samuelson's inequality we have


 * Values must lie within the limits $$m \pm s\,(n-1)^{1/2}$$.

with m and s undefined (and s meaning something different from previously). Finally, in Relations with the harmonic and arithmetic means we have


 * ... where m is the minimum of the sample

in which m means something different from in the previous subsection.

So I'm about to rationalize the notation -- please give me a chance to finish before revising! Duoduoduo (talk) 16:34, 16 May 2013 (UTC)

Done! Duoduoduo (talk) 17:34, 16 May 2013 (UTC)
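For readers comparing the two formulas quoted above, here is a minimal sketch of the 1/n and 1/(n − 1) versions. The function names and the example sample are illustrative assumptions, not notation taken from the article.

```python
from fractions import Fraction


def biased_sample_variance(ys):
    """Variance taken directly on the sample: divide by n."""
    n = len(ys)
    mean = Fraction(sum(ys), n)
    return sum((y - mean) ** 2 for y in ys) / n


def unbiased_sample_variance(ys):
    """Bias-corrected sample variance: divide by n - 1."""
    n = len(ys)
    mean = Fraction(sum(ys), n)
    return sum((y - mean) ** 2 for y in ys) / (n - 1)


# Hypothetical integer sample, chosen only for illustration.
sample = [2, 9, 10]
print(biased_sample_variance(sample))    # divides the sum of squares by 3
print(unbiased_sample_variance(sample))  # divides the sum of squares by 2
```

The two values differ only by the factor n/(n − 1), so whichever symbol is used for each, it should be defined once and kept fixed.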

Poisson variance derivation
Right now the Poisson section says:


 * The variance is equal to:


 * $$ \operatorname{Var}(X) = \sum_{k=1}^{n} \frac{\lambda^k}{k!} e^{-\lambda} (k-\lambda)^2 = \lambda,$$

It seems to me that this can't be right, since the value of the undefined n would affect the sum. Is n supposed to be infinity? Duoduoduo (talk) 14:03, 10 July 2013 (UTC)


 * True, and easy to check that the formula given above is false. Also the zero term is missing. Since all terms in the sum are non-negative, and the zero term is positive,
 * $$ \sum_{k=1}^n \frac{\lambda^k}{k!} e^{-\lambda}(k-\lambda)^2 < \sum_{k=0}^\infty \frac{\lambda^k}{k!} e^{-\lambda} (k-\lambda)^2 = E(X-E[X])^2 = \operatorname{Var}(X) = \lambda.$$

Mathstat (talk) 20:14, 10 July 2013 (UTC)
 * Formula for binomial variance in same section is also wrong. Mathstat (talk) 20:19, 10 July 2013 (UTC)
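The inequality above can be checked numerically. The following is a rough sketch, assuming an arbitrary rate λ = 2 and an arbitrary cut-off n = 5 for the truncated sum; the helper name is hypothetical.

```python
import math


def poisson_variance_partial_sum(lam, k_start, k_end):
    """Sum of P(X = k) * (k - lam)^2 for k from k_start to k_end inclusive."""
    total = 0.0
    for k in range(k_start, k_end + 1):
        pmf = math.exp(-lam) * lam ** k / math.factorial(k)
        total += pmf * (k - lam) ** 2
    return total


lam = 2.0  # assumed rate, for illustration only

# Sum as written in the article: starts at k = 1 and stops at a finite n.
truncated = poisson_variance_partial_sum(lam, 1, 5)

# Sum starting at k = 0 and carried far enough that the tail is negligible.
full = poisson_variance_partial_sum(lam, 0, 100)

print(truncated, full, lam)
```

The truncated sum falls short of λ, while the sum from k = 0 over a long enough range agrees with λ to floating-point accuracy.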

Meaning (interpretation) of Variance
While reading the article on variance, I found that a paragraph on how variance should be interpreted (in an intuitive way) was somehow missing. I don't feel confident to write that bit myself, but if someone could add this part it would I believe make the article more interesting and complete. — Preceding unsigned comment added by Marc saint ourens (talk • contribs) 18:29, 1 September 2013 (UTC)


 * Done. Thanks very much for the suggestion! Duoduoduo (talk) 16:56, 2 September 2013 (UTC)

Variance for six-sided die
Isn't the variance given in the article only correct for an infinite number of rolls of a die? For one roll, the variance is zero. For two rolls, a quick calculation suggests the variance is 5/8. Grover cleveland (talk) 16:27, 4 June 2014 (UTC)
 * Variance is not dependent on an experiment, but only on properties of the die. You are confusing it with sample variance. Nijdam (talk) 20:06, 4 June 2014 (UTC)
 * Grover, you are correct. There are actually two distinct concepts, both called "variance".  One kind of variance is calculated from samples.  The other kind is derived from the equations of a theoretical probability distribution, and represents the variance that would be calculated from an infinite number of samples generated by that distribution.  More care should be taken within the community to be clear whether they mean distribution variance versus sample variance versus population variance.  The Wikipedia page is an especially important place to be careful, because its purpose is geared more towards learning, and less for convenience by those already familiar with the concepts. Seanhalle (talk) 17:05, 16 September 2015 (UTC)
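The distinction drawn in this thread can be made concrete with exact arithmetic. This is only a sketch: it assumes a fair six-sided die and enumerates every equally likely pair of rolls, and the variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Variance of the distribution itself: a property of the die, not of any experiment.
mean = Fraction(sum(faces), 6)
distribution_variance = sum((Fraction(f) - mean) ** 2 for f in faces) / 6


def sample_variance(rolls, ddof):
    """Sample variance of a tuple of rolls, dividing by n - ddof."""
    n = len(rolls)
    m = Fraction(sum(rolls), n)
    return sum((r - m) ** 2 for r in rolls) / (n - ddof)


# All 36 equally likely outcomes of two rolls.
pairs = list(product(faces, repeat=2))

# Average of the sample variance over every possible two-roll sample.
avg_unbiased = sum(sample_variance(p, 1) for p in pairs) / len(pairs)
avg_biased = sum(sample_variance(p, 0) for p in pairs) / len(pairs)

print(distribution_variance, avg_unbiased, avg_biased)
```

Any single pair of rolls gives its own sample variance, but averaged over all possible pairs the n − 1 version reproduces the distribution variance exactly, while the 1/n version gives half of it for n = 2.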

Too encyclopedic and mathematical
Normal people come here to look for a simple, applicable definition of variance and are overwhelmed by technical details.

I found this to be more useful to me: http://www.investopedia.com/terms/v/variance.asp

And I was a distinguished Science graduate 20 years ago. — Preceding unsigned comment added by 105.237.231.16 (talk) 03:06, 16 March 2015 (UTC)


 * I agree, this page is written in a very obtuse way. 73.222.107.229 (talk) 02:23, 12 July 2015 (UTC)


 * Agreed. http://mathworld.wolfram.com/Variance.html is far superior. I wonder if this stems from differences in opinion among editors about the nature and purpose of Wikipedia. Is it a record of knowledge only comprehensible to experts in the field, or is it a resource for learning about fields where the reader is not an expert? Currently, maths articles lean heavily towards being records of knowledge written by experts and comprehensible exclusively by those experts. I actually avoid Wikipedia for maths these days and this page is an example of why. I can't get the simple equation for the unbiased variance out of the reams of esoteric  written in this article. There is also no article for Sample variance so I can't search for it unless I write it in a talk comment. Instead it is hidden 3/4 of the way down a very long article (and still inferior to the mathworld version). Doug (talk) 18:56, 13 November 2015 (UTC)

Misleading text in intro: reality does not contain statistical distributions, they are models created by people and fit to observations
This text of the intro makes incorrect and misleading statements: "The variance is a parameter that describes, in part, either the actual probability distribution of an observed population of numbers, or the theoretical probability distribution of a not-fully-observed population from which a sample of numbers has been drawn. In the latter case, a sample of data from such a distribution can be used to construct an estimate of the variance of the underlying distribution; in the simplest cases this estimate can be the sample variance."

The text implies that an "actual distribution of an observed population" is something that exists in reality. In fact, this notion is a common mistake, driven by the way the human mind is constructed. For some people the world of numbers is real, in some sense, and is commonly conflated with what exists outside of our heads.

What can, though, be said to exist is a population of numbers that are generated according to a probability distribution model. Those numbers, when measured, fit to that same distribution model very well. But that same set of numbers can be fit to any distribution, with varying degrees of goodness of fit. The set of numbers has no inherent "actual distribution". It only has a particular distribution that it happens to fit best to. Even if one takes the case of an endless stream of observations generated from a distribution model, that stream of observations can still be fit to any distribution, it just is best predicted by the same model as the one used to generate the numbers.

This is a general concept that, in my experience in teaching statistics, has led to a high degree of confusion among students. It takes diligence to avoid this common misconception, because the human mind naturally wants to equate the models that our heads have inside them with the reality that is outside of our heads. When taking a step back, it is clear that the human mind has only models inside it, which it fits to observations coming in. When those models fit well, we experience that emotionally as the model IS the reality outside our heads. We feel that somehow that model we have somehow exists out there, generating the observations. In many cases, no harm is done by believing this.

However, in statistics we reach the point where we are at the boundary point, where we are examining this very bifurcation between models and reality. Allowing this emotional tendency to let us lose sight of the difference between a model and the reality that the model is fit to, I have found to be the cause of confusion for students who try to learn statistics. That conflation messes with their heads.

Therefore, I propose to alter these words to the following:

"The variance is a measure that is calculated from observations, and is independent of any particular probability distribution model. For each kind of probability distribution model, an equation can generally be derived that relates parameters of the model to the variance.  If the observed population consistently yields observations that match well to a particular distribution model, then a relatively small sample of observations can be used to construct a good estimate of the variance that would be obtained if all possible samples were taken."

(This carefully worded statement avoids many commonly taken for granted assumptions, such as that the generating process is stationary, and it highlights the difference between a model that we fit to observations versus the measured reality that is generating the observations. And it still gets the concept across that we need only a few samples to come close to the omniscient value -- in the case that the generator matches to the model well)

If there are no strenuous objections, I will come back in a week or two and make this change. Seanhalle (talk) 05:10, 11 September 2015 (UTC)


 * The existing text is an attempt to explain the difference between "population variance" and "sample variance". It doesn't do that very well, but it isn't talking about fitting models at all. Your text is not about the same thing and doesn't serve as a replacement. It will also confuse readers: "The variance is a measure that is calculated from observations" is not the whole truth. Both theoretical distributions and finite populations have variances that are independent of observation. Also the fact that the sample variance is an unbiased estimator of the population variance doesn't depend on the probability model except inasmuch as the population variance is finite. Your words seem to hint instead at the power of tests, but nobody will get it. McKay (talk) 07:52, 11 September 2015 (UTC)


 * Thank you for the response. I hear what you are saying regarding my proposed alternative, however the need remains for an improvement over what is there now.  I do, though, disagree that theoretical distributions have a variance.  That is, if by "theoretical distribution", you are referring to things such as the binomial distribution, Poisson distribution, and so forth.  Such things are better viewed as generators of observations.  Such generators do not have an inherent variance, but rather have a means to calculate the variance that would be obtained from an infinite set of observations generated by the distribution.  They are most commonly used as models, and one compares actual observations to observations that would be generated by the model.  If the fit is good, then one can use the equations of the distribution, which tell you characteristics that would be measured on an infinite set of observations generated by that model, and thereby predict that further real world observations will conform to those calculated characteristics.


 * With this view, such a distribution model has no inherent variance, just a way to predict the variance measured from observations generated from the model. This goes to the heart of the way of thinking that makes statistics difficult for students.  Making observations be the fixed point, and treating distributions as generators of observations clears up the confusion and has resulted in dramatic changes in the learning experience of students.


 * Your point is well taken, though, regarding population variance versus sample variance.


 * As a compromise, I would be willing to go with the following:


 * "The variance is a characteristic of a set of observations, which are either measured from a real world system or generated by a theoretical probability distribution or other generating model. In the ideal case, all possible observations of the system would be available, where the variance calculated from this set is called the population variance.  In most cases, however, only a finite sample size is available, and the variance calculated from this is called the sample variance and is considered an estimate of the full population variance.  Theoretical probability distributions can be viewed as generators of observations, and the variance of an infinite set of observations generated by them can be mathematically determined via an equation.  Such generators are used in thought experiments to generate a finite sample size, or real world observations are often fitted to these theoretical probability distributions.  In either case, the set of observations is considered a not-fully-observed sampling of the possible population. Because the sample sizes are incomplete, the variance calculated using the samples is an estimate of the variance of the full population; such an estimate can be calculated in several ways, the simplest of which is just the straight forward variance of the sample."

Seanhalle (talk) 09:29, 11 September 2015 (UTC)


 * I hope you had a good weekend. I like your attempt to distinguish population variance from sample variance. But I can't accept that theoretical distributions don't have variances; that is contrary to the standard definitions of probability theory. A distribution is just a mapping from an event space to real numbers satisfying some axioms, and the variance has a precise mathematical meaning. I know what you mean by "theoretical probability distributions can be viewed as generators of observations", but I'm a mathematician who dabbles in probability. I think most readers won't have a clue what you mean. Actually I think that the relationship between actual populations and abstract models of them belongs in some other article, perhaps statistics, and only adds unnecessary complexity here. McKay (talk) 05:10, 14 September 2015 (UTC)


 * This is an excellent discussion. Your point is well taken.  If this were a forum solely for mathematicians, then it would make sense to leave the wording in a form that is the most comfortable for mathematicians.  However, I would suggest that this forum is actually the opposite.  I would be surprised if very many mathematicians came to the Wikipedia page in order to learn what variance is.  And, if they did happen to do so, I expect that they would immediately see what the suggested wording is aiming at, and would feel uncomfortable with it, but would nonetheless be unfazed, and would move forthwith to the equations and the precision within the body of the article.  Hence, I expect that a quite small number of visitors will have the value of this article reduced by the wording suggested above for the summary section.


 * In contrast, the majority of visitors I would expect to be those with a typical background, from across a wide variety of fields. The article should be tailored to provide that audience the maximal benefit.  Given that, the wording should help people in that audience the most, even if it does make mathematicians a bit uncomfortable.  Hopefully we can avoid a situation where the tyranny of the minority causes harm to the large majority.  If the wording conveys concepts that give the reader the ability to successfully work with variance in whatever facet of life they apply it, then it is okay for the wording to choose alternatives to strict mathematical rigor.  The important point is that it does no harm to the majority, while providing them with measurable value.  In its current form, the wording is more tailored to mathematicians and as a result is impenetrable to normal people.  As such, insistence on strict rigor ends up harming the large majority of visitors to the page, in order to satisfy the sensibilities of the (few) mathematicians who visit.  Although legitimate, and I feel their pain, that pain of the few is measured against the pain of the page being impenetrable to the majority.  That harm should be avoided.


 * However, I do see your point. In the following, I have added some disclaimer words that make it clear that the wording is not meant to be mathematically rigorous, but rather to be valuable to the large majority of readers.  The concepts are sound, and useful, and the disclaimers alert the reader to look further for full rigor if they so desire.  In the end, no harm is done, but high value is gained, by many.


 * "There are two distinct concepts that are both called "variance". One variance is a characteristic of a set of observations.  The other is part of a theoretical probability distribution and is defined by an equation.  When variance is calculated from observations, those observations are either measured from a real world system or generated by a theoretical probability distribution or other generating model.  If all possible observations of the system are present then the calculated variance is called the population variance.  Normally, however, only a subset is available, and the variance calculated from this is called the sample variance.  The variance calculated from a sample is considered an estimate of the full population variance.  There are multiple ways to calculate an estimate of the population variance, as discussed in the section below.


 * The two kinds of variance are closely related. To see how, consider that a theoretical probability distribution can be used as a generator of observations.  If an infinite number of observations are generated using a distribution, then the sample variance of that infinite set will match the distribution's equation derived variance."

Seanhalle (talk) 16:56, 16 September 2015 (UTC)


 * 99.7263% (based on a sample size of one) visitors to this page are looking for an explanation of and equations to estimate the sample variance. The remaining visitors who are looking for a presentation of population variance have no need of this article. Please take this into consideration. Doug (talk) 19:09, 13 November 2015 (UTC)

Should in fact be 99.7264% Nijdam (talk) 10:58, 16 November 2015 (UTC)

Physics moments analogy
The diagram depicting the area of the squares for the distribution {2, 4, 4, 4, 5, 5, 7, 9} appears on my screen with the vertical height as simply '2'. When one hits the enlarge box, it is correctly displayed as 'σ^2'. Is this easily fixed when in normal view? Jerryfrog (talk) 03:59, 18 April 2017 (UTC)

Intro improvements:
The intro should include, or the first equation shown should be, the form of the variance that people usually encounter when learning it. Then the article should go on to explain how that relates to the actual formal definitions and more technical usage. That is, have this equation in the intro or right after it, and explain how it relates to the overall concept, because this is the equation that >95% of the people coming to the page will want to learn about:


 * $$\operatorname{\sigma^2} = \frac{1}{N-1} \sum_{i=0}^{N-1} (x_i - \mu)^2$$

That is the equation that the majority of visitors can relate to or will need to use if they are learning it. That equation should be the bridge into the more formal definitions. I'm not qualified to provide that bridge, but I am qualified to say this article would be far better if it led off with this more familiar equation and how it is used/relates to the other more formal stuff. — Preceding unsigned comment added by 172.250.254.17 (talk) 23:49, 23 August 2017 (UTC)


 * I think your claims that this is the first formula that people learn and that this is what ">95%" of people are looking for are without basis. I certainly didn't learn this first (and if I did, I would have been highly confused by the N &minus; 1, but that's a separate issue).  Nevertheless, I think leading off with a formula out of the blue is generally a bad idea.  And it is given fairly soon in the first section (at least the population version, not the sample – that's later).  Now, maybe some motivation about where the definition comes from could be a good addition, but that's a bit different.  --Deacon Vorbis (talk) 00:26, 24 August 2017 (UTC)

Definition
, I believe you are supposed to start a discussion here when your edit is challenged as opposed to reverting again. As to the definition, could you give sources, which support your version? Thank you. Retimuko (talk) 18:26, 7 December 2017 (UTC)


 * It is a fact which is well known all over the world. We have $$ \overline x $$ for descriptive statistics or samples respectively, and we have $$\mu$$ in probability theory. Both may be named 'mean', but only the latter may be named 'expectation value'. This is so fundamental and indisputable in mathematics that it need not be backed by any sources. --Karl24042017 (talk) 19:13, 7 December 2017 (UTC)


 * If this is so well known, it must be easy to give one reputable source, mustn't it? "I just know this to be true" is not a very persuasive argument. Retimuko (talk) 20:27, 7 December 2017 (UTC)


 * Sorry, I don't have the time to get out some schoolbooks that I, like everyone acquainted with mathematics, no longer need! And it is a rule of WP that things that are commonly undoubted need not be proved by citation. --Karl24042017 (talk) 21:29, 7 December 2017 (UTC)

Grammar/punctuation errors?
Shouldn’t the following sentence:

> Variance is an important tool in the sciences, where statistical analysis of data is common.

Instead be:

> Variance is an important tool in sciences where statistical analysis of data is common.

? behrangsa (talk) 01:30, 3 February 2018 (UTC)


 * Both versions make grammatical sense, but there's a slight difference in meaning between the two. The first is clarifying that statistical analysis of data is common in the sciences ("the sciences" as a common way to refer to different branches of science).  The second tends to indicate that variance is only important in those branches of science where statistical analysis of data is common.  But given that data analysis is pretty ubiquitous among all the sciences, I think it should remain the way it is.  –Deacon Vorbis (carbon &bull; videos) 01:39, 3 February 2018 (UTC)

Distribution of the sample variance
In the description "where κ is the kurtosis of the distribution and $$\mu_4$$ is the fourth central moment", the kurtosis part seems to be wrong. Probably "excess kurtosis" was meant. 5.179.31.80 (talk) 06:37, 8 June 2018 (UTC)
 * Nope, it's correct as it is; just simplify the expression, and you get the kurtosis formula. –Deacon Vorbis (carbon &bull; videos) 12:15, 8 June 2018 (UTC)

Indeed, thanks for the clarification. For some reason I thought that the kurtosis of the normal distribution is 0 (in reality it's 3). I tried to plug κ=0 into the formula for a general distribution and hoped it would simplify into the formula for the normal distribution. Using κ=3 fixes the discrepancy. By the way, is there a reason why for the normal distribution E(s^2) is in round brackets but for the general case it is in square brackets? 5.179.31.80 (talk) 20:17, 8 June 2018 (UTC)
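The κ = 3 check described above can be reproduced exactly. The thread does not quote the article's formula, so this sketch assumes the standard result Var(s²) = (σ⁴/n)(κ − 1 + 2/(n − 1)) with κ the (non-excess) kurtosis, and 2σ⁴/(n − 1) for the normal case; the function names are hypothetical.

```python
from fractions import Fraction


def var_of_sample_variance(n, sigma2, kurtosis):
    """Assumed general formula: sigma^4 / n * (kurtosis - 1 + 2 / (n - 1))."""
    sigma4 = Fraction(sigma2) ** 2
    return sigma4 / n * (Fraction(kurtosis) - 1 + Fraction(2, n - 1))


def var_of_sample_variance_normal(n, sigma2):
    """Assumed normal-distribution formula: 2 * sigma^4 / (n - 1)."""
    return 2 * Fraction(sigma2) ** 2 / (n - 1)


n, sigma2 = 10, 4  # arbitrary sample size and variance for illustration

# Kurtosis 3 (normal distribution) reproduces the normal-case formula.
print(var_of_sample_variance(n, sigma2, 3) == var_of_sample_variance_normal(n, sigma2))

# Kurtosis 0 (the excess-kurtosis value for a normal) does not.
print(var_of_sample_variance(n, sigma2, 0) == var_of_sample_variance_normal(n, sigma2))
```

Under these assumed formulas, plugging in 3 rather than 0 is what makes the general expression collapse to the normal one, which matches the resolution reached in the thread.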

Clarification for notation
"where κ is the kurtosis of the distribution and $$\mu_4$$ is the fourth central moment" sounds ambiguous. Is it meant for y(i) or s^2? 5.179.31.80 (talk) 09:08, 21 June 2018 (UTC)

Use of term "variation"
At the end of the section "Population variance and sample variance", the final phrase says "This is known as the biased sample variation". Shouldn't that last word be "variance"? "Biased sample variation" had 33 hits on Google when I checked (at least some of which are versions of this article) whereas "Biased sample variance" had some 27,000. RMGunton (talk) 20:58, 12 December 2018 (UTC)

Notation
Section "I.i.d. with random sample size" has terribly confusing notation, inconsistent with the rest of the page. It does not seem to belong there, if it even belongs there logically. 147.229.99.101 (talk) 16:45, 21 October 2019 (UTC)