Talk:Standard deviation/Archive 1

Real Life Examples??
Wording is terrible, changed it to Examples of Applications, maybe change it to Examples of Modern Applications... or better yet: applications or modern applications --132.198.188.47 (talk) 16:55, 20 October 2008 (UTC)

TI-83/84
Could someone add how to do it on a TI-83/84, thanks - posted 11/12/06
 * The calculator? Far too specific for an encyclopedic article. The manual would be a much better place to look :) Richard001 20:42, 27 January 2007 (UTC)

Yes, push [2nd], then [0], to bring up the catalog, then push [LN] to take you to the M's section. Scroll until you find stdDev(. Press enter. Enter the values desired, in this format: stdDev({1,3,3,6,2,34,2}), then press enter. Do not forget to enter the braces, as the calculator will not read the values without them. This will give you the standard deviation for any set of numbers.

-J-
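For anyone without the calculator to hand, the same computation can be sketched in Python; `statistics.stdev` uses the sample (n-1) denominator while `pstdev` uses the population (n) denominator (which of the two the calculator's stdDev( matches is worth checking against the manual):

```python
import statistics

data = [1, 3, 3, 6, 2, 34, 2]  # the same list as stdDev({1,3,3,6,2,34,2})

print(statistics.stdev(data))   # sample standard deviation (n-1 denominator)
print(statistics.pstdev(data))  # population standard deviation (n denominator)
```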

History
This should talk about where the concept of standard deviation came from as well (astronomy)

Yes, Gauss used it to find the positions of planets; hence the name Gaussian distribution.
 * Or was it comets? I forget.

A great reference is "The History of Statistics: The Measurement of Uncertainty before 1900" by Stephen M. Stigler, 1986, Belknap Press of Harvard University Press. —Preceding unsigned comment added by Zestforlife (talk • contribs) 14:31, 3 June 2009 (UTC)

Accuracy of this article?
This paragraph is incorrect:

For example, in the population {4, 8}, the mean is 6 and the standard deviation is 2. This may be written: {4, 8} ≈ 6±2. In this case 100% of the values in the population are within one standard deviation of the mean.

The variance is 2, but the standard deviation is the square root of 2. I would update it myself, but I don't know how to embed all those nifty symbols.

(Coppertwig) No, you're wrong. It's right. To calculate the S.D., you take the differences from the mean of 6, which are 2 and 2. You square these, getting 4 and 4. You take the average of them: that gives you 4. Take the square root and you get the S.D. of 2. The variance is 4, not 2. --Coppertwig 2006 Nov 3 18:27UT
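Coppertwig's arithmetic is easy to verify; a quick Python sketch of the same steps:

```python
data = [4, 8]

mean = sum(data) / len(data)                               # (4 + 8) / 2 = 6
variance = sum((x - mean) ** 2 for x in data) / len(data)  # (4 + 4) / 2 = 4
sd = variance ** 0.5                                       # sqrt(4) = 2

print(mean, variance, sd)
```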

The computer implementation under "rapid calculations" is wrong. I believe it's missing a minus sign between the new score and the old mean. I couldn't get it to work, and then found this page that has the same formula but with a minus sign instead of implying multiplication: S(k) = S(k-1) + (x(k) - M(k-1)) * (x(k) - M(k)) http://mathcentral.uregina.ca/QQ/database/QQ.09.02/carlos1.html —Preceding unsigned comment added by 128.252.233.244 (talk) 18:22, 27 March 2009 (UTC)
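For reference, the corrected recurrence from the linked page can be sketched as a one-pass routine (a sketch, not the article's actual implementation); note the minus sign the commenter is pointing at:

```python
def running_variance(xs):
    """One-pass variance via the recurrences
    M(k) = M(k-1) + (x(k) - M(k-1)) / k
    S(k) = S(k-1) + (x(k) - M(k-1)) * (x(k) - M(k))."""
    m = s = 0.0
    for k, x in enumerate(xs, start=1):
        m_prev = m
        m += (x - m_prev) / k          # updated running mean M(k)
        s += (x - m_prev) * (x - m)    # the corrected update, with the minus signs
    return s / len(xs)  # population variance; use len(xs) - 1 for the sample version
```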

Complaints about accessibility
I agree about the accessibility, it should be accessible in the intro. However, as a past PhD student, an MBA student, and student of statistics, I need all the detailed proofs, and deleting them in the name of accessibility is purely retarded.Aaronchall (talk) 04:53, 6 March 2009 (UTC)

AAAAAAAAAHHHHHHHHHHHHH Seriously - this is way too complicated. I am trying to use SD in a school project and none of these formulas make sense. Sure, for a more advanced maths student or a mathematician, these formulas are great - but as is a problem with many maths and science articles on wikipedia, this is totally inaccessible to your average person. I suggest that in each page you have a simple explanation that is understandable. For example:
 * 1. Determine the mean of a set of scores.
 * 2. Determine the difference between each score and the mean; this difference is called the deviation. (Score - Mean)
 * 3. Square each deviation.
 * 4. Calculate the mean of the squared deviations.
 * 5. This result is called the variance.
 * 6. Standard deviation equals the square root of the variance.
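The six steps above translate almost line for line into Python (a sketch using the population n denominator; the n-1 sample version is discussed elsewhere on this page):

```python
def standard_deviation(scores):
    mean = sum(scores) / len(scores)          # step 1: the mean
    deviations = [s - mean for s in scores]   # step 2: score minus mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    variance = sum(squared) / len(squared)    # steps 4-5: mean of squares = variance
    return variance ** 0.5                    # step 6: square root of the variance

print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```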

Another plea for this article to be more accessible, after all most people who look up "standard deviation" on wikipedia will be people who don't know what it is and want to know. They are not statistical hermits who will get a warm feeling inside about the elegance of the statistical proofs. The worked examples are a step in the right direction but there needs to be more accessible style from the beginning 79.75.23.211 (talk) 18:25, 7 December 2008 (UTC)

This is horribly, horribly written
As someone else pointed out, what is the point of writing it in the language of a statistics PhD student, when those people already know this very basic concept? This should be written in plain English in the introduction. The technical definitions could go later in the entry, but for the average encyclopedia user they are USELESS.

Other people have already suggested plain English definitions and explanations, and they should be the focus of this entry. I don't know why they were edited out. Aroundthewayboy 16:02, 25 February 2007 (UTC)

As a statistics PhD student, I can assure you that is NOT written in the language I would use - it's awful all around. Hadleywickham 02:58, 6 March 2007 (UTC)

---

Many thanks for stating my thoughts -- much more clearly than I've been able. I've found this article mostly useless for gaining much intuition as to what the std dev is, is not, what it tells me, the underlying assumptions, etc -- expressed in plain English. Put another way: if I have to "know" math to "understand" wiki entries about math -- well, then those wiki entries are, from my point of view, nearly useless. —Preceding unsigned comment added by 24.5.201.29 (talk) 23:35, 2 September 2007 (UTC)

I came to this page wanting a reasonable explanation of standard deviation written in the style and tone of other wikipedia entries. Instead I find a page full of formulae and theorems written for an audience that is already well versed in complex mathematics. Students without the ability to read formulae are now excluded from even a basic understanding of standard deviation. I believe that this article should be divided up into a layperson's explanation and a theoretical explanation of standard deviation, not mixed together. This article in no way will assist people studying maths or statistics at a basic or applied level.

I have also found a number of complaints and suggestions to this effect that I have grouped together to assist anyone who may be able to clarify the page.

Change formulae presentation
The formulae pictures in this article are too confusing; use simple basic formulae or diagrams, i.e. a=mx+(c/d), rather than those weird diagrams with symbols above and below the sigma total sign. I'm doing A-level maths and even I find those symbols confusing and unhelpful. --Globe01 12:58, 4 March 2007 (UTC) This was written a long time ago, and the current notation is widely used and understood; can we delete this comment? Aaronchall (talk) 05:06, 6 March 2009 (UTC)
 * No, we don't delete comments on Talk pages unless they are pure vandalism or off-topic. This page is a record of past discussion as well as current issues.  The dates on comments make it clear what is old and what is new. McKay (talk) 06:06, 6 March 2009 (UTC)


The opening paragraph states "it is a measure of the average difference between the values of the data in the set." Shouldn't this be "it is a measure of the average difference between the values of the data in the set from the mean"?

Correct me if I'm wrong, but I always thought the std dev equation had (n-1) in the denominator, not n: where n is the number of data pts.


 * This is kind of explained in the article, but to put it differently: The standard deviation of a (finite or infinite) population is defined as the square root of the population variance. The variance of a finite population is defined as the sum of the squared mean deviations, divided by the population size. If one is trying to estimate the population variance based on a (small) sample, then it would be wrong in general to equate the sample with the population. In particular, estimating the population variance as the sum of the squared mean deviations divided by the sample size yields a biased estimator of the population variance. Dividing by n−1 instead of n yields an unbiased estimator. However, in the extreme case where n equals the population size (of a finite population), it would be wrong to divide by n−1, since one has seen the entire population, hence there is no uncertainty about its variance and one can simply calculate it instead of estimating it. It's the distinction between the variance of a finite population and its unbiased estimator that leads to the terms n and n−1, respectively, in the denominator (and to widespread confusion). --MarkSweep 07:23, 19 May 2005 (UTC)
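MarkSweep's point about bias can be checked empirically; a quick simulation sketch, drawing many small samples from a standard normal population (true variance 1):

```python
import random

random.seed(0)

def var(xs, ddof):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

n, trials = 5, 20000
biased = unbiased = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]  # true variance is 1
    biased += var(sample, ddof=0) / trials    # divide by n
    unbiased += var(sample, ddof=1) / trials  # divide by n - 1

print(biased)    # tends toward (n-1)/n = 0.8, i.e. biased low
print(unbiased)  # tends toward the true variance, 1.0
```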


 * The std dev example and its answer in the article were incorrect, as was the equation. I edited it to reflect that indeed the std dev equation should have (n-1) in the denominator.  Therefore, now the equation, math and example answer are correct.  I'm not sure if the original creator of the example had it correct or not or whether it was edited.  I'm new to using wikipedia, and it's somewhat surprising that material can be easily edited.  Anyway, the example equation for std dev is currently correct (although I didn't take the time to go through the equation to derive std dev as the panel above the example did.  I don't know whether it is correct, but I did edit it as well since its derived equation for std dev also did NOT include n-1 in the denominator).  I'm no statistics expert, but I know how to use Excel and a calculator.  The answer was incorrect, but please correct me if I'm wrong. --renius  9 Sept 2005

I thought this deserved clarification, so I put some in there --Pdbailey 01:43, 16 September 2005 (UTC)

The denominator should be n-1; the SD stated is wrong, someone should fix this. Really, the example should include 5 numbers to make it easier to see the use of SD. There is almost no purpose in knowing the SD of a 2-number distribution. —Preceding unsigned comment added by 140.180.143.99 (talk) 15:12, 3 March 2008 (UTC)

Why don't you just use an online calculator??? :D It worked for me.

NOTE: Correct me if I am wrong; but doesn't "N" denote a population and "n" denote a sample size of a population? Meaning that the sample std deviation should contain "n" in the equation, not "N"?


 * I tend to think that any book or article can choose to define "n" and "N" and any other variables how they please, as long as they're consistent within that book or article. However, if there's a convention to define it a certain way, it's usually better to follow it.  In any case, this article should state clearly what "n" and "N" represent here.  I won't object if you choose to be bold and edit the article to consistently change how "n" and "N" are used.  Sounds to me like probably an improvement.  Just my opinion -- I can't speak for others of course. --Coppertwig 17:52, 8 November 2006 (UTC)

But what does the result of the standard deviation tell us?
Why perform the equation? What do the results say? Kingturtle 07:43, 6 Nov 2003 (UTC)
 * To say it differently.....this article needs a section explaining the application of standard deviation. When is it used? And what can the results tell us? Kingturtle 06:19, 10 Feb 2004 (UTC)


REQUEST: I'd like to see some discussion of the meaning of the relative sizes of the mean and the standard deviation. For example, if the mean is 5 and the standard deviation is 10 (i.e. twice as big as the mean), doesn't that tell you something important about the data set? But what? I'm not a statistician, so I can't answer my own question. But when you see data like this: the average cost of a certain medical procedure is $100 and the standard deviation is 150; given that the cost isn't going to go below $0, this data has to be highly skewed. Does it matter to anything? —Preceding unsigned comment added by 160.39.88.211 (talk) 19:43, 16 April 2008 (UTC)

Standard deviation is a statistical term that provides a good indication of the volatility and dispersion of a set of values. Anoopnair2050 (talk) 09:39, 9 July 2008 (UTC)

sigma level
When you read a quote like this: "To see a very light Higgs (say, 115 GeV) at the 5σ level will require a year of running", does that mean that the signal is within 5 standard deviations or something? I hear that usage a lot in experimental physics, and I am unsure what it means. Is a higher sigma confidence level good or bad? I once heard that things at the 3σ level are suspect, and one can't trust the result. So what's 5σ? Can anyone explain this? I think it would be a nice addition to this article... Lethe


 * If I understand correctly, such quotes mean "if we assume that this data is the result of random statistical fluctuation rather than an actual signal, this is how many sigmas away from the mean our results are." In other words, the sigma level gives a measure of how likely it is that you are just seeing a fluke.  In your Higgs example, the meaning is that if they ran the detector for a year, a false detection would require a statistical fluctuation of five sigmas away from the mean.  Higher sigma means more confidence, because it means your signal would be an increasingly large fluke if it weren't real.  Isomorphic 03:54, 12 Jul 2004 (UTC)
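To put numbers on this: under a Gaussian model, the one-sided tail probability of a fluctuation at least k sigma above the mean can be computed with the error function (a sketch; if I recall correctly, 5 sigma works out to roughly 3 in 10 million):

```python
from math import erfc, sqrt

def one_sided_p(k):
    """P(X >= mu + k*sigma) for a Gaussian X: the chance the 'signal' is a fluke."""
    return erfc(k / sqrt(2)) / 2

for k in (1, 2, 3, 5):
    print(k, one_sided_p(k))
```

The smaller this probability, the less plausible "it was just a fluctuation" becomes, which is why a higher sigma level means more confidence.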

Thanks for the explanation. I'm going to try adding it to the article. -Lethe

Std dev is a measure of spread. Knowing the average is all very well, but you need to know how close your data is to the average. The higher the std dev, the more spread out the data is. Anoopnair2050 (talk) 09:44, 9 July 2008 (UTC)

What about underlying assumptions?
This is quite a short and incomplete treatment of standard deviation.

It should be mentioned that a Normal distribution about the true mean is assumed. The article doesn't differentiate between the true mean and the sample mean, which in turn explains the difference between the true standard deviation and the sample standard deviation (and why the normalisation constant is n-1 and not n, i.e. the sample mean takes away one degree of freedom as it is computed from the data set and only estimates the true mean).

Nothing is said about confidence intervals, which is what standard deviations are useful for: for a normal distribution, 1-sigma is roughly the 68% interval, i.e. about 68% of the data is within 1 standard deviation, 2-sigma is about 95%, 3-sigma about 99.7%, and so on.

Again, the standard deviation can only be interpreted if the data is distributed normally about the mean. Before I add more criticism, let me have a look at the article(s) for mean and normal distributions.


 * The standard deviation exists for any population, and always has meaning regardless of whether the distribution is normal. The confidence interval interpretation you're giving only works for normal distributions, but nothing about the standard deviation itself assumes normality. Isomorphic 15:01, 26 Oct 2004 (UTC)


 * It is nonsense to say "a Normal distribution about the true mean is assumed", or that confidence intervals are the only thing that standard deviations are useful for, or that "the standard deviation can only be interpreted if the data is distributed normally about the mean". Perhaps normality must be assumed if you're doing certain things with confidence intervals; but if you're doing certain other things with confidence intervals, or many other things not involving confidence intervals at all but for which the standard deviation is useful, then normality should not be assumed. Michael Hardy 20:31, 26 Oct 2004 (UTC)

Making it easier to read
26 Oct 2004 : FROM

Simply put, the standard deviation tells us how far a typical member of a sample or population is from the mean value of that sample or population.

TO

Simply put, the standard deviation tells us how far a typical member of the population (or sample) is from the mean value of that population (or sample).

(Coppertwig) I suggest adding as the second sentence of the article, either the sentence above; or the sentence above with "(or sample)" deleted; or the following:  "It is a measure of how much the individual elements tend to deviate from the average." or something similar. Something in plain language to give an idea of what the standard deviation is. --Coppertwig 2006 Nov. 3 18:36UT

Thanks for making this article easier to read and understand. I don't know what it was like before. I was trying to understand the "bell curve" last night while reading something in a book. This helped a lot and it gave me insight on standard deviation as well. I recall seeing the bell curve in school and I thought I understood it until I tried to make one up. This made things really clear. Kudos. Ti-30X (talk) 02:45, 6 May 2009 (UTC)

Interpretation and application
An anon just changed the 5 to a 4 in the sentence "For example, the three samples (0, 0, 14, 14), (0, 6, 8, 14), and (6, 6, 8, 8) each have an average of 7. Their standard deviations are 7, 5 and 1, respectively" as his first edit. I'm paranoid against vandalism, so is he right? -- Kizor 10:46, 21 May 2005 (UTC)


 * No he is not - I have just reverted it.--Niels Ø 14:19, May 21, 2005 (UTC)


 * Thank you, my good fellow! -- Kizor 19:24, 21 May 2005 (UTC)


 * It is indeed 5, and I can imagine why the anon changed it to 4: he simply calculated the average absolute difference from the mean. Unfortunately this naive calculation coincidentally delivers the correct result for the first and the third data sets! Maybe you'd better choose different sets. BTW: you write "For example, the three samples (0, 0, 14, 14), ..." but those are not samples but complete data sets. -- Steve Miller 31 Aug 2005

Always begin with the simplest example
The mean of two numbers, A and B, is (A+B)/2. The standard deviation is |A-B|/2. The mean is center and the standard deviation is radius. The key to generalizing this definition to more than just two numbers is symmetry. The definitions are invariant against switching A and B: (A+B)/2=(B+A)/2 and |A-B|/2=|B-A|/2. Now, any symmetric function can be written in terms of the sums of powers: u=A^0+B^0, v=A^1+B^1, w=A^2+B^2, etc. The mean is simply (A+B)/2=v/u. The standard deviation is (sqrt(uw-v^2))/u, because |A-B|=sqrt((A-B)^2)=sqrt(A^2+B^2-2AB)=sqrt(2(A^2+B^2)-(A+B)^2), so sqrt((A-B)^2)=sqrt(uw-v^2)/... in terms of the power sums, uw-v^2=2(A^2+B^2)-(A+B)^2=(A-B)^2. The generalized definitions, mean=v/u, standard deviation=(sqrt(uw-v^2))/u, satisfy nice conditions: symmetry to permutation of the numbers by construction, and homogeneity to multiplying all numbers by the same factor. Bo Jacoby 09:50, 9 September 2005 (UTC)
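Bo Jacoby's two-number identities are easy to check numerically; a small sketch with A=4, B=8:

```python
from math import sqrt

A, B = 4.0, 8.0

# power sums, symmetric in A and B
u = A ** 0 + B ** 0   # count: 2
v = A + B             # sum: 12
w = A ** 2 + B ** 2   # sum of squares: 80

mean = v / u
sd = sqrt(u * w - v ** 2) / u

print(mean, sd)  # matches (A+B)/2 and |A-B|/2
```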

motivation for std deviation
There is a motivation for using this measure of dispersion instead of the mean deviation. Can somebody add this? --Pdbailey 01:41, 16 September 2005 (UTC)

I've wondered this myself, anyone considering adding this should check this link out:. Intangir 06:01, 24 November 2005 (UTC)


 * Thanks to Intangir for that link. It leads to a lengthy discussion, some of it is nonsense, much of it repetition, but it also contains some good points, of which I rephrase two below. So this is how I'd answer the question without becoming too technical:


 * The reason we use the standard deviation (squareroot of the mean of the squared deviations) rather than the mean deviation (mean of the absolute values of the deviations) is that the ensuing math is simpler and more useful if dispersion is measured by standard deviation (or by variance, the mean of the square deviations, i.e. the square of the standard deviation), instead of by mean deviation. The square function involved in the variance has nice mathematical properties, as compared to the absolute value function involved in the mean deviation.


 * A key example is this: Let $$A$$ and $$B$$ be independent random variates with unknown (or arbitrary) distributions, but with known dispersions. E.g., $$A$$ is the weight of a man, and $$B$$ is the weight of a woman. What is the dispersion of $$A+B$$, i.e., of the total weight of a couple (assuming independence)?

 * If dispersion is measured by mean deviation, this question cannot be answered in general.
 * If measured by variance, the answer is simple: $$Var(A+B) = Var(A)+Var(B)$$.
 * If measured by standard deviation, the answer is still fairly simple: $$\sigma_{A+B} = \sqrt{\sigma_A{}^2 + \sigma_B{}^2}$$.


 * Similarly, if $$X_1, X_2, X_3, \ldots, X_n$$ are independent variates, what is the dispersion of the mean of all the $$X_i$$'s, $$\overline{x} = \frac{X_1+X_2+X_3+\ldots+X_n}{n}$$?


 * Mean deviation: Cannot be answered in general.
 * Variance: $$Var(\overline{x}) = \frac{Var(X_1)+Var(X_2)+Var(X_3)+\ldots+Var(X_n)}{n} = \overline{Var(x)}$$.
 * The above is incorrect. You would need to divide by n². Michael Hardy 01:59, 21 March 2006 (UTC)
 * Standard deviation: $$\sigma_{\overline{x}}=\sqrt{\frac{\sigma_1{}^2+\sigma_2{}^2+\sigma_3{}^2+\ldots+\sigma_n{}^2}{n}}$$.
 * ...and the above is incorrect for the same reason. Michael Hardy 02:00, 21 March 2006 (UTC)
 * Here follows another, more technical, reason. We have hitherto assumed that deviations always are measured from the mean value, but if we allow ourselves to measure them from any other suggested central value, different values for our measures of dispersion will follow. The mean of the squared deviations is minimized by taking the central value to be the mean; the same is not the case for the mean deviation. Thus, the standard deviation is a natural companion to the mean. The mean deviation may be more naturally associated with the median.
 * --Niels Ø 13:03, 28 November 2005 (UTC)
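Both the additivity rule and Michael Hardy's correction (the variance of the mean of n independent variates is the sum of their variances divided by n²) can be checked by simulation; a sketch with made-up variances 9 and 16:

```python
import random

random.seed(1)
N = 200_000

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a = [random.gauss(0, 3) for _ in range(N)]   # Var(A) = 9
b = [random.gauss(0, 4) for _ in range(N)]   # Var(B) = 16
total = [x + y for x, y in zip(a, b)]
print(var(total))                            # close to 9 + 16 = 25

# variance of the mean of n = 5 independent variates, each with variance 9:
# sum of variances divided by n**2, i.e. 45 / 25 = 1.8
means = [sum(random.gauss(0, 3) for _ in range(5)) / 5 for _ in range(N)]
print(var(means))                            # close to 1.8
```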


 * This should be discussed in the article; the question of why standard deviation is used instead of a simple calculation of the mean deviation from the mean is one that many students have likely wondered about when first learning this concept. If this explanation can be moved into the article, in terms kept as simple as possible, it would make the underlying reason standard deviation is defined as it is much clearer. Richard001 08:23, 28 January 2007 (UTC)


 * I thought that one reason for using SD over the mean absolute deviation (MAD) is that it captures variation between the values. For example, {-2,-2, 2, 2} has both an SD and a MAD of 2. But {-3, -1, 1, 3}, which intuitively seems more "scattered", has an SD of 2.236, while the MAD is still 2. I thought this is why variance is called variance. Is this just an unwanted side-effect? Is SD really just supposed to approximate MAD with a more elegant formula?
 * —Forlornturtle (talk) 16:01, 28 May 2008 (UTC)
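Forlornturtle's numbers check out; a quick sketch comparing the two measures:

```python
def sd(xs):
    """Population standard deviation: root-mean-square deviation from the mean."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def mad(xs):
    """Mean absolute deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

print(sd([-2, -2, 2, 2]), mad([-2, -2, 2, 2]))  # both 2.0
print(sd([-3, -1, 1, 3]), mad([-3, -1, 1, 3]))  # SD rises to sqrt(5), MAD stays 2.0
```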

I would also be interested in this... the example I have in mind is:

Suppose you have road death statistics for 5 years n1..n5, then the Police introduce a new road safety measure, and you have the next year's (n6) statistics... how do you work out if there was a significant drop in road deaths? My first guess was that if it is a drop of more than the mean absolute deviation, this is 50% significance... could you do this with standard deviation?


 * That is a very important question. Let the death rate per year be the nonnegative real number M, and let the actual number of deaths in some year be the nonnegative whole number n. Knowing M, the probability distribution of n is called the Poisson distribution. The mean value of n is M, and the standard deviation of n is √M. Now actually the situation is the opposite one. Knowing n, the likelihood distribution of M is called a gamma distribution. The mean value of M is n+1, and the standard deviation of M is √(n+1). This is a mathematical fact, but it is not intuitively obvious. (The addition of 1 is important, but often forgotten. That no accident happened this year, n=0, does not imply that the road is completely safe, M=0, but rather that M is distributed with mean value 1 and standard deviation 1.) The first five years have the death rate Ma and the sixth year has the death rate Mb. The question is whether Ma is significantly greater than Mb based on the observations (n1, n2, n3, n4, n5, n6). Let's compute it. The mean value of 5Ma is n1+n2+n3+n4+n5+1, and the standard deviation of 5Ma is √(n1+n2+n3+n4+n5+1). The mean value of Ma is (n1+n2+n3+n4+n5+1)/5, and the standard deviation of Ma is √(n1+n2+n3+n4+n5+1)/5. The mean value of Mb is n6+1 and the standard deviation of Mb is √(n6+1). Now the mean value, μ, of the difference Ma−Mb is the difference of the mean values, μ = (n1+n2+n3+n4+n5+1)/5 − (n6+1), and the standard deviation, σ, of the difference Ma−Mb is the square root of the sum of squares of the standard deviations of the terms, which is σ = √((n1+n2+n3+n4+n5+1)/25 + (n6+1)). We need to compute the ratio U = (μ−0)/σ in order to determine whether the difference between the two death rates differs significantly from zero. If |U| > 3, then the difference is highly significant. If |U| < 2, then the observations do not indicate a difference. 
U has a normal distribution with mean value 0 and standard deviation 1. Bo Jacoby 15:04, 23 July 2007 (UTC).
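Bo Jacoby's U computation, with some hypothetical accident counts plugged in (the counts below are invented purely for illustration):

```python
from math import sqrt

# Hypothetical counts: deaths in the five years before the safety measure,
# then the count for the sixth year.
before = [12, 15, 11, 14, 13]    # n1 .. n5
n6 = 6

t = sum(before) + 1              # n1 + n2 + n3 + n4 + n5 + 1
mu = t / 5 - (n6 + 1)            # mean of Ma - Mb
sigma = sqrt(t / 25 + (n6 + 1))  # sd of Ma - Mb
U = mu / sigma

print(U)  # |U| > 3: highly significant; |U| < 2: no indicated difference
```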

Hello 140.110.227.89
What are you trying to do ? Your edits seem senseless to me, and some of them are wrong. (You cannot sum from 1 to N+1 when there are only N members in the set). Please explain your intentions here at the discussion page before editing the article. Otherwise it seems to be vandalism. Bo Jacoby 06:49, 20 October 2005 (UTC)

How to estimate the probability to be away from the mean.
We have Chebyshev's inequality if the distribution is unknown. In the article there is also info about that issue for the normal distribution.

I ask: how can we do the same for other distribution? Like Poisson distribution, binomial distribution, and so on... --Yochai Twitto 15:19, 31 December 2005 (UTC)
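For a specific named distribution one can compare the distribution-free Chebyshev bound with the exact tail probability; a sketch for a Poisson variate with mean 4 (so sigma = 2):

```python
from math import exp, factorial, sqrt

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2 for ANY distribution with finite
# variance.  For Poisson(lam), mean = lam and sigma = sqrt(lam), so the exact
# tail can be computed and compared with the bound.
lam = 4.0
mu, sigma = lam, sqrt(lam)

def pois_pmf(i):
    return exp(-lam) * lam ** i / factorial(i)

k = 2.0
# terms beyond i = 60 are negligible for lam = 4
exact = sum(pois_pmf(i) for i in range(60) if abs(i - mu) >= k * sigma)

print(exact, "<=", 1 / k ** 2)  # exact 2-sigma tail vs the Chebyshev bound 0.25
```

The exact tail is well under the bound, which illustrates how conservative Chebyshev's inequality is when more is known about the distribution.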

A mis-type mistake and N versus N-1
On the "Standard Deviation" article, I note that the last two equations in the first section are identical, where, as I read it, the second is supposed to be a simplification of the first. I'm not sure what simplification the author intended, but I doubt it's right the way it is.

The discussion commenters seem to be confusing standard deviation and standard *error*. The former is the tendency of a random process to deviate from its mean (average), while the latter is an *estimate* of the former from observed data. The standard deviation of a collection of observations is

sqrt(sum of squared deviations/N)

where N is the number in the entire set, while the standard error is

sqrt(sum of squared deviations/(N-1))

where N is the number of observations. In the standard-error case, N is presumed much smaller than the number of occurrences or potential occurrences (this last is where I'm on soft ground, but a real statistician could clear it up). In some mathematical-statistical sense, the extra one of the N-1 accounts for the difference between the observed sample mean and the true distribution mean, if that helps.

The Adam 20:53, 27 January 2006 (UTC) The Adam

important detail
In order to obtain the results shown in the page, there is one detail missing: the seven is missing from the set, so the length of the set should be five, not four. I don't know how to use formulas, but I am pretty sure of this. Try doing it with the seven and see the result you obtain; if you do it the way it is, the deviation will be sqrt(10/3) instead of sqrt(10/4).

Shortcut?
Regarding the first heading, "Definition and shortcut calculation of standard deviation"... What does the shortcut refer to? This is unclear.

Also, the equation $$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}$$ is repeated at the end of that section. Is it supposed to be different the second time?

And now the simple way without the formulas
Standard deviation can be calculated in five easy steps


 * 1. Calculate the average value
 * 2. Calculate the difference between each value and the average value
 * 3. Calculate the square of each difference
 * 4. Add them up and then calculate the average of that sum
 * 5. Now take the square root of that number and you are done

Simple enough

~ Booyabazooka 05:03, 27 February 2006 (UTC)
 * You are perfectly right. Go ahead and make improvements. Bo Jacoby 07:27, 27 February 2006 (UTC)


 * I applaud the "simple way" entry. I feel that Wikipedia has a problem with mathematics.  If it is possible to throw formulas into a page, the math guys will do everything they can to replace all the English text with math formulas.  In my opinion, it is a purposeful attempt to hide information from people who don't know how to read the formulas.  At least in this article, a person with barely a high school math background can read the "simple way" and calculate a standard deviation.  Without this section, they would be left wondering what the hell those formulas mean and how to enter them into their spreadsheet. --Kainaw (talk) 15:40, 7 March 2006 (UTC)
 * My gut reaction was: how is this proposed "simple" way any less a formula than any other way of expressing it, and how is it the least bit different from the way standard deviation is ALWAYS described? "A purposeful attempt"???  That is nonsense at best.  Where are these people who don't know how to read formulas?  One would have to think that they exist before one could possibly make any purposeful attempt to hide anything from them.  Why would one wish to hide something from such people? Michael Hardy 15:25, 9 September 2007 (UTC)
 * Careful. Assume good faith. I find it highly doubtful that people are deliberately trying to 'hide' knowledge by rendering it in formulas. More likely, mathematically literate people are trying to make articles more clear, concise and specific by rendering long explanations as simple (to them) formulae. For example, the 'simple way' given above is vague and unclear because it simply says "average" without specifying which one (mean, mode, median). Also, there are some points (such as averages) where the language conveys different mathematical concepts to different people. By contrast, the maths means only one thing. And if you wish to remember how to calculate the SD, and do it repeatedly, it is much easier and quicker with the formulae. So by all means include simple explanations and methods, but remember great care is needed. It isn't as black-and-white as all that. 84.43.94.121 17:37, 15 March 2006 (UTC)


 * I assume good faith until I am told "take more math you idiot" when asking for an English description of a formula. For example, in Zipf's law, I was treated as acting in bad faith for adding a simple description, regurgitating Zipf's own description of his law as a simple 1/f series (without tossing in a complicated formula).  I understand the warm and fuzzy feeling a person gets when they can wrap up three or four paragraphs in a single formula.  I won a $100 bet by writing a bubble sort in Java with one loop (instead of the 2 loops expected by the definition).  But there is no valid excuse in my mind for restricting access to knowledge by expecting a Wikipedia reader to be able to read any formula beyond high school math. --Kainaw (talk) 00:37, 21 March 2006 (UTC)


 * I understand that the so-called "simple" way will be easier to understand for those who do not know the mathematical notation. But I don't think it's any simpler than the other way, nor is it a different way; it's just expressed in a different language.  For those who do know the notation, the form involving the notation is easier to read, since you don't have to plow through that vast and complicated collection of words.  The notation is simpler, but not easier for those who don't know it. Michael Hardy 02:13, 21 March 2006 (UTC)


 * OK, I can see how "take more math you idiot" when asking for an English description of a formula would rile you up, and it wasn't appropriate behaviour of the other wikipedian. I do understand the need to include easily understandable explanations and methods for these things, but I really don't think it's true that people are trying to make it harder for others to access the information; more likely, they don't feel the average reader of the article knows as little maths as you feel they do. Whoever is right, this article now contains a simpler form of the method. Yay! Skittle 12:47, 31 March 2006 (UTC)

Re-inserted the simple part at the end. I took care of the vagueness which motivated the last edit. I argue that this section is not redundant, but merely explains the procedure of obtaining the standard deviation for a variable (albeit one of the most popular ways for small populations) in non-technical English.

std dev is the average distance of each bit of data from the arithmetic mean. Your explanation is good, but I've added a couple of changes -


 * 1. Calculate the average value (the arithmetic mean of the data)
 * 2. Calculate the difference between each value and the average value (by subtraction)
 * 3. Square each of those differences (some are positive and some negative; squaring makes them all positive, so the average distance comes out right)
 * 4. Add the squares and calculate their average (the arithmetic mean); this value is called the variance
 * 5. Take the square root of that number and you are done
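For readers who think in code rather than formulae, the five steps above can be sketched in Python (an illustration of this talk-page recipe, not text from the article):

```python
import math

def standard_deviation(data):
    """Population standard deviation, following the five steps above."""
    n = len(data)
    mean = sum(data) / n                      # step 1: the arithmetic mean
    diffs = [x - mean for x in data]          # step 2: difference from the mean
    squares = [d * d for d in diffs]          # step 3: square each difference
    variance = sum(squares) / n               # step 4: average of the squares (the variance)
    return math.sqrt(variance)                # step 5: take the square root

print(standard_deviation([4, 8]))  # the {4, 8} example discussed on this page: prints 2.0
```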

Yours is a good explanation. I think it should be added to the proper page. —Preceding unsigned comment added by 212.159.75.167 (talk) 19:42, 13 October 2007 (UTC)


 * Two things are wrong with this: a) I wouldn't say that "std dev is the average distance of each bit of data from the arithmetic mean." That would be the mean absolute deviation, and for that you don't need to do a root-mean-square. b) A related issue in point 3: we don't square to get rid of the sign, we do it to capture the true "dispersion" as opposed to just the average deviation. I explain this in this blog entry.
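The distinction in point (a) is easy to see numerically: for most data sets the mean absolute deviation and the standard deviation are different numbers. A small sketch (the data set is just an arbitrary example):

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                                    # 5.0
mad = sum(abs(x - mean) for x in data) / len(data)              # mean absolute deviation: 1.5
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))  # population standard deviation: 2.0
print(mad, sd)  # 1.5 2.0 -- the two measures of dispersion disagree
```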


 * More generally, I'm not sure a "recipe" like this is much more helpful than using formulae. It still won't help people understand what standard deviation really means and why it's calculated the way it is. Forlornturtle 10:40, 16 October 2007 (UTC)

A comment
Lord love us and save us! I've just put in a link to this article from the one on the Pyrometric cone, so I thought I'd better have a look at it. Gentlemen please, do try and write something that a reasonably bright non-statistician might understand (the statisticians already know all about this stuff; think about your intended audience). Regards, Nick. Nick 12:24, 29 March 2006 (UTC)

Question
Please include a simple explanation that lets me know the importance of the standard deviation in regards to its relationship to zero. The article states that a number of more than zero increases the variance. How high does the number have to go before the variance is significant?

It depends on the data you're looking at; you can only really compare standard deviations of similar data sets. For example, if you are looking at the age of school children in a single class, a standard deviation of a year is usually very large. If you're looking at the population of a country, a standard deviation of a year is very small. I'll try and add something to the article if it isn't there. Skittle 12:44, 31 March 2006 (UTC)

Formulas in 'human' readable form
Not being a TeX user, I can't parse the formulas given in my head. It would be a great help if there were bitmaps, SVGs or some other rendering of the formulas in addition to the nice ones already there.


 * Wikipedia can display TeX or graphical formulas. Click on "my preferences" at the top right, then "Math", and you can set if you want to see the TeX or the real formula. Billgordon1099 (talk) 19:39, 12 June 2008 (UTC)

Date Error
Presumably, regarding Leonardo da Vinci, the date should read 1494 not 1894?

Derivation
Could someone kindly add a section about the derivation of the SD formulae? I'm particularly interested in the meaning of N-1. Thank you.

N-1 indicates that you are evaluating only a sample of data Bsodmike (talk) 12:31, 24 January 2009 (UTC)

Here is a proof. I will assume $$E(X)=0$$ for convenience; the proof becomes somewhat more involved without this, but is essentially unchanged. Then, if the ith and jth draws are uncorrelated (but not necessarily independent), or more formally $$Cov(x_i,x_j) = \delta(i,j)\,\sigma^2$$

Many of these steps used $$E(x_i x_j) = E(x_i x_j) - E(x_i) E(x_j) = Cov(x_i,x_j) = 0$$ for i ≠ j. I would include this on the main page, but there isn't much of a precedent for including long proofs; perhaps at the end? And it would need to look nicer too. Note that this theorem was suggested to me by Michael Hardy on my talk page. Pdbailey 03:25, 1 June 2006 (UTC)

I would like to see this proof included. You would need to state clearly what you are proving. I think it needs some fixing up: in line 2, I think you need parentheses to indicate that the summation symbol applies to all three terms (I think); there may be other similar changes needed. The last few lines of the proof could possibly be shortened, but some of the earlier lines could be expanded to show more clearly how you're using that Cov=0 thingy you mention at the bottom; maybe a few English words to break up the derivation and explain things a bit, and/or including a term that works out to zero and then on the next line putting an actual zero where that term was. Where summation symbols are used, I prefer to see either parentheses that tell you that all the following terms are included in the summation, or parentheses that tell you they are not. --Coppertwig 2006 Nov 3 17:01UT
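What the proof establishes (that dividing the sum of squared deviations by N−1 gives an unbiased estimate of the variance σ²) can also be checked by simulation. A rough sketch, not a substitute for the proof; the sample size and trial count are arbitrary choices:

```python
import random

random.seed(0)
N, trials = 5, 200_000
# Draw samples from a normal distribution with mean 0 and true variance 4
sum_div_n, sum_div_nm1 = 0.0, 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(N)]
    m = sum(xs) / N
    ss = sum((x - m) ** 2 for x in xs)
    sum_div_n += ss / N            # biased: averages toward 4 * (N-1)/N = 3.2
    sum_div_nm1 += ss / (N - 1)    # unbiased: averages toward 4.0

print(sum_div_n / trials, sum_div_nm1 / trials)
```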

reason for article for standard deviation and one for variance
Does anybody know a good reason to have separate articles for standard deviation and variance? Pdbailey 04:44, 14 June 2006 (UTC)
 * No good reason. Merging the articles is a good idea. Bo Jacoby 08:02, 11 July 2006 (UTC)
 * And for that matter, pooled variance and pooled standard deviation should be merged. Btyner (talk) 19:53, 17 January 2009 (UTC)

An axiomatic approach
I find this section pretty useless. Does anyone like it? McKay 16:26, 5 July 2006 (UTC)
 * There is otherwise no motivation for the complex formula defining the standard deviation. It is a nice fact that the mean value μ and the standard deviation σ are completely characterized by the simple algebraic properties a+(μ±σ) = (a+μ)±σ and a(μ±σ) = aμ±aσ, together with the symmetry condition and the initial condition (+1,−1) ≈ ±1. But the section can certainly be improved. Bo Jacoby 07:50, 11 July 2006 (UTC)

Aren't those simple algebraic properties simply the distributive property of multiplication over addition etc. and are true of any set of real numbers, so that if S.D. were defined differently, those would still be true? Perhaps what is actually meant is not being clearly stated. I wasn't able to follow this section at all. It says "The two numbers" and then has a bunch of equations. If it has a meaningful message, it needs more words to explain the purpose of all those equations, i.e. what is supposedly being proven? --Coppertwig 2006 Nov. 3 18:55UT

notation confusion?
For the equations on this page the standard deviation is notated as S while on other wiki pages ( http://en.wikipedia.org/wiki/Bias_%28statistics%29 ) it is notated as S^2. I don't want to make any edits myself because I am not sure what the general notation for this is, but I believe that one of these pages may need to be changed. Also, as a side note, a more in-depth explanation as to how the standard deviation relates to the Gaussian curve (i.e. 68.27% within +/- 1 stdev) would be useful. 65.89.12.2 19:49, 11 July 2006 (UTC)
 * Statistics is for historical reasons a very messy branch of mathematics, unlike geometry which for historical reasons is a very clean branch of mathematics. Euclid was a greater mathematician than Ronald Fisher. That cannot be helped by a minor edit in wikipedia. The square σ2 of the standard deviation σ is called the variance. A statistical population is a multiset of numbers, and a statistical sample is a submultiset of the population. The mean and standard deviation of the population are denoted by the Greek letters μ and σ, and the mean and standard deviation of the sample are then often denoted by the corresponding Latin letters M and S. Deriving sample information from the population is called deductive reasoning, and deriving population information from the sample is called inductive reasoning or inferential statistics. The standard deviation of the Gaussian curve is described in the normal distribution article. Bo Jacoby 08:20, 12 July 2006 (UTC)

Why This Methodology?
I have always wondered this about the standard deviation (I assume the derivation would answer this question): Why not use a formula that takes the average of the (absolute value of (the differences between the samples and the mean)). Taking the square root of a sum of squares does not "undo" the original squares - it introduces some factor of difference. (unsigned comment by 205.228.12.194)


 * You could use any norm, really, to measure dispersion. There is the convenience aspect of squaring, but only with this definition can we use the standard form [(X - mu) / sigma]. Pdbailey 13:58, 31 August 2006 (UTC)

It's largely because of the additivity of variances: If X and Y are independent random variables, then the variance of their sum is just the sum of their variances. That doesn't work for any simple function of the mean absolute deviation. Michael Hardy 01:25, 10 September 2007 (UTC)
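This additivity is easy to demonstrate empirically; a quick simulation sketch (the two distributions are arbitrary choices for illustration):

```python
import random, statistics

random.seed(2)
n = 200_000
xs = [random.gauss(0, 3) for _ in range(n)]      # Var(X) = 3**2 = 9
ys = [random.uniform(-6, 6) for _ in range(n)]   # Var(Y) = 12**2 / 12 = 12
sums = [x + y for x, y in zip(xs, ys)]

# For independent X and Y, Var(X + Y) = Var(X) + Var(Y) = 21
print(statistics.pvariance(sums))
print(statistics.pvariance(xs) + statistics.pvariance(ys))
```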


 * http://amarsagoo.blogspot.com/2007/09/making-sense-of-standard-deviation.html answers the question nicely. —Preceding unsigned comment added by RyanCu (talk • contribs) 22:01, 13 September 2007 (UTC)

I don't think that page answers the question asked above. The question above is: why not use the mean absolute deviation? The page you link to just says the reason for squaring and then taking square roots is NOT to eliminate the sign. Say that that's NOT what the reason is, stops short of saying what the reason is. Michael Hardy 02:42, 14 September 2007 (UTC)

Error?
The first example where (4,8) is the population with mean 6. The standard deviation is 2. Isn't one standard deviation 100% of this population, not two standard deviations? 6 +/- z(std. dev) where z=# of standard deviations.

There is a more glaring error in this section. For the population (4,8), the mean is indeed 6 but the standard deviation is root(2) and not 2.

2 is the variance of the distribution (4,8) and the standard deviation is defined as being the square root of the variance.


 * Is there an error in my calculation?


 * Var = 1/2 [ 2^2 + 2^2] = 2^2


 * --Pdbailey 04:22, 22 September 2006 (UTC)

no error
The standard deviation of (4,8) is 2 as stated. The variance is 2^2=4 as computed above, and the standard deviation is the square root of the variance, which is 2. It is also correct that 100% of this population {4,8} is within one standard deviation of the mean value. This extreme case does not apply to every population. Bo Jacoby 16:04, 23 September 2006 (UTC)
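For anyone who wants to check this with a few lines of code, a quick sketch:

```python
data = [4, 8]
mean = sum(data) / len(data)                                # 6.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 4.0 (this is 2 squared)
std_dev = variance ** 0.5                                   # 2.0
print(mean, variance, std_dev)  # 6.0 4.0 2.0
```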

World Record
I do not know why there is a reference to world record values in the Interpretation and application section; this comment does not seem appropriate. If this creates a problem with the example, I would propose changing the text from "distances traveled by four athletes in 2 minutes" to "distances traveled by four athletes in 3 minutes" and dropping the reference to world records (I don't recall a 1000 m event; 1500 m, yes). Dcorrin 14:58, 6 October 2006 (UTC)

Example from larger population
Just above the heading "Interpretation and application" there is a comment about generalizing to the entire population by changing N to 3 for the example, which seems simple enough; however, the sum limit is also N, but the set is of size 4, so which 3 values should be taken from the set? By strict formula we would exclude x4, yet nothing was stated about the organization of the set, which just happens to be in ascending order. So I see that there are two formulas, which I didn't notice on first reading; I would therefore propose changing the text from "convention would replace the N (or 4) here with N−1 (or 3)." to "convention would replace the 1/N (or 1/4) with 1/(N−1) (or 1/3), giving a result of 1.8257." Dcorrin 15:03, 6 October 2006 (UTC)

N-1 sentence
Does anyone get the point of this: "The necessity of the N − 1 (instead of N) can be rationalized if one realizes that the vector lies in an N − 1 dimensional space."? It is true that these N points lie in an N-1 dimensional space, but what does that have to do with the denominator? The reason for using N-1 is that the estimator becomes unbiased for the variance, which is already stated. McKay 01:49, 30 October 2006 (UTC)


 * You're right, that's not a good explanation. Also I don't like the N-1 being referred to as a "convention" in the example with actual numbers: it's not just a "convention", there's a good reason for it (as shown in the derivation on the discussion page). I suggest adding a plain-language explanation for the reason for the N-1 rather than N, something like this: "If you knew the actual mean of the population, you could estimate the standard deviation of the population by seeing how much the sample values deviate from that mean. But you don't know the mean of the population; you only know the sample mean, and if you use that as an estimate of the population mean, it will tend to be slightly closer to each value in your sample, on average, since those are the values you calculated it from, so the standard deviation will seem to be smaller than it actually is. Using N-1 rather than N corrects for this by making your estimate of the standard deviation a little bigger again, and there is a proof that this is just the right amount, at least in the sense of making sure the estimate of variance is unbiased. For example, suppose the population mean is 100 and you take 3 samples which happen to be 99, 110 and 111. If you knew the population mean you could estimate the standard deviation based on how these samples differ from the mean, and get an estimate of sqrt((1² + 10² + 11²)/3), or about 8.6. But you don't know the population mean, so you would estimate the population mean by taking the mean of your samples, approx. 106.7. But this number is quite a bit closer to most of your samples -- after all, it's calculated from them. So for the standard deviation you'd only get about 5.44 if you use N rather than N-1 in the formula. In other words, the sample values tend to be closer to the sample mean, on average, than they are to the population mean." Maybe that's too long, just take the last sentence? --Coppertwig 2006 Nov 3 16:59UT
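Coppertwig's arithmetic can be reproduced directly (the sample 99, 110, 111 and the population mean 100 are taken from the comment above):

```python
import math

sample = [99, 110, 111]
pop_mean = 100                       # the true mean, assumed known for illustration
n = len(sample)

# RMS deviation from the true population mean:
sd_vs_true = math.sqrt(sum((x - pop_mean) ** 2 for x in sample) / n)

# RMS deviation from the sample mean, dividing by n rather than n - 1:
m = sum(sample) / n                  # about 106.67, noticeably closer to the data
sd_vs_sample = math.sqrt(sum((x - m) ** 2 for x in sample) / n)

print(round(sd_vs_true, 2), round(sd_vs_sample, 2))  # 8.6 5.44
```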


 * I agree an "elementary" explanation (without actual proof that N-1 is the right amount) should be given. But as far as I recall my stat courses 20 years ago, it's correct (but perhaps not very helpful) to say that it's related to the dimension of a vector space.--Niels Ø 06:54, 3 November 2006 (UTC)


 * It may be correct, but only if both the writer and the reader have some clue as to what the vector space has to do with it. For example, the writer should be able to explain, if asked, what vector space he/she is talking about, how exactly that vector space is defined and how that has anything to do with the N-1 in the formula.  The typical reader should not be left baffled, either.


 * I suppose the vector space could be defined as: the set of all ordered sets of deviations from the mean that you could get.  For example from the sample (8 3 7) you get the vector (2 -3 1) which is the deviations from the sample mean of 6.  Since these vectors necessarily have their elements adding to zero, they are not all possible N-dimensional vectors but an N-1-dimensional subset of them.  It is still not clear to me how this puts an N-1 into the formula.  How about just saying "The sample values tend to be closer to the sample mean, on average, than they are to the population mean, so using N-1 rather than N corrects for this, to give an estimate of the S.D. of the whole population, rather than just the actual S.D. of the sample." --Coppertwig 2006 Nov. 3 19:18UT


 * The article is definitely lacking a layman's explanation of the N-1 term. I like Coppertwig's version, but we could make it shorter, e.g.: "Since one typically only has data on a limited sample from a population, only the sample mean is known, not the true population mean. The root-mean-square deviation from that sample mean will, by definition, tend to be smaller than from the true mean, since the sample mean is derived from and therefore optimized for the sample. Dividing by N-1 rather than by N yields a slightly larger standard deviation estimate, and it can be proven that this adjustment is in fact correct, in the sense that it produces an unbiased estimate." I'm not sure if there's a simple way of explaining why the adjustment is correct. Any thoughts? Forlornturtle 14:50, 30 July 2007 (UTC)

Correct me if I'm wrong but isn't it n-1 not N-1. —Preceding unsigned comment added by Godiva Mustang (talk • contribs) 01:12, 16 July 2008 (UTC)

Random change to Chebyshev rule?
Someone came and changed "50%" to "0%" in the Chebyshev rule; I just changed it back. I don't know the Chebyshev rule, but the last line gives a general rule in terms of k, and according to that, it should be 50% since 1.4 is approximately the square root of 2. The person who had changed it to 0% seems to have some counterproductive edits in his/her contribution list. Coppertwig 23:56, 3 November 2006 (UTC)

Use of "degrees" in the "weather" example
It seems safe to assume that the degrees in this example are °F. However I'm more used to seeing temperatures in °C, and I'm probably not alone. I suggest that we could change "degrees" to "°F" to remove the ambiguity. Should I just make this change or should I wait for someone else to respond? Hnc14 13:47, 20 December 2006 (UTC)


 * Done Richard001 02:07, 28 January 2007 (UTC)

Chebychev rules
An anonymous user experimented with adding the idea of "percent" to this line and reverted the user's own edit. I've reinstated the percent version. I think the idea of "percent" or the idea of "fraction" needs to be included in this line; otherwise, it tends to imply that (1 − 1/k²) is an integer showing how many values meet the criterion, and some readers may waste time trying to figure out whether 1/k² is a negative integer. Another variation could be: --Coppertwig 11:53, 9 February 2007 (UTC)
 * At least 100 × (1 − 1/k²) percent of the values are within k standard deviations from the mean.
 * At least a fraction (1 − 1/k²) of the values are within k standard deviations from the mean.

Why at least 94% for 4 standard deviations from the mean? 100 × (1 − 1/4²) = 93.75, which is less than 94. —Preceding unsigned comment added by 212.179.145.66 (talk) 14:06, 23 June 2008 (UTC)
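The arithmetic here: 100 × (1 − 1/4²) = 93.75, so "at least 94%" slightly overstates the guarantee. The Chebyshev bound itself can be probed by simulation; a sketch (the exponential distribution is an arbitrary skewed example):

```python
import random, statistics

random.seed(1)
data = [random.expovariate(1.0) for _ in range(100_000)]  # a skewed distribution
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

for k in (2, 3, 4):
    within = sum(abs(x - mu) < k * sigma for x in data) / len(data)
    bound = 1 - 1 / k**2   # Chebyshev: at least this fraction lies within k sigma
    print(k, round(within, 4), ">=", round(bound, 4))
```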

population versus sample
Explaining the revert I just did: "We will show how to calculate the standard deviation of a population. Our example will use the ages of four young children: { 5, 6, 8, 9 }." The word "population" is better here, not "sample". This population could be the entire non-adult population of a particular household. The standard deviation is calculated using information about every member of the population. Someone else, who has only met three of the children, might make an estimate of the standard deviation based on information about those three. Those three would be a sample representing the entire population of four. Feel free to discuss. --Coppertwig 03:19, 13 February 2007 (UTC)

± notation
I have never seen the {4, 8} ≈ 6±2 notation used to denote the mean and standard deviation (I am a 3rd year PhD student). Could someone please provide a reference? —The preceding unsigned comment was added by Hadleywickham (talk • contribs) 02:56, 6 March 2007 (UTC).
 * See plus-minus sign. Bo Jacoby 23:19, 12 March 2007 (UTC).

I have never seen this used in statistics, and it seems likely to be confusing giving the mixture of mathematical and experimental meanings. Hadleywickham 00:39, 13 March 2007 (UTC)


 * Plus-minus does often express inaccuracy in an inaccurate way, meaning that the true value is probably in the stated neighbourhood of the stated value, but of course it can also express inaccuracy in an accurate way. What do you mean by "mixture of mathematical and experimental meanings"? Bo Jacoby 08:52, 13 March 2007 (UTC).

The notation is misleading and should disappear from the article. The only common usage like this is in indicating measurement or calculation error bounds in the physical sciences, engineering, etc. But then the range is usually 2 or 3 standard deviations, not 1 standard deviation. McKay 23:54, 13 March 2007 (UTC)


 * A result, which is either 4 or 8, is 6 ± 2 because 4=6−2 and 8=6+2. The mean value of 4 and 8 is μ = (4+8)/2 = 6, and the standard deviation is σ = sqrt( ( (4−6)^2+(8−6)^2 )/2 ) = 2. The plus-minus sign does not necessarily indicate the range, which is often not known or not defined, and usually not interesting because it gives a too pessimistic view of the situation. The range of a normally distributed variable is infinite rather than 2 or 3 standard deviations. The useful measure of inaccuracy is the standard deviation. Bo Jacoby 08:56, 14 March 2007 (UTC).

I don't know how that addresses the question. Show us examples in actual use where the value after the ± is intended to be the standard deviation. A case where it is accidentally the standard deviation because there are only two possibilities will not suffice. McKay 05:31, 15 March 2007 (UTC)


 * What else should it mean ? Why should it change meaning from one to two standard deviations when there are more than two possibilities? Bo Jacoby 11:53, 15 March 2007 (UTC).

It means different things in different circumstances. Unless you can provide some evidence to the contrary, it is NOT used for the standard deviation. It is, however, sometimes used for the standard error, perhaps you are confused between the two? Hadleywickham 18:09, 15 March 2007 (UTC)


 * Often the number behind the plus-minus is not used in a precise meaning other than to provide some clue about the inaccuracy. It may be an estimated half-range, or an estimated width between some quantiles, or it may just be an educated guess. This is perfectly legitimate and sufficient for many purposes. I just gave you an example where it means the standard deviation, namely when there are two possibilities. When there is only one possibility, it is 0, which is also the standard deviation, so now you have got a second example. That it is used in many meanings is no argument why it cannot in a specific situation be used in a specific meaning. Bo Jacoby 19:30, 15 March 2007 (UTC).

The fact that it can be used to represent the standard deviation doesn't mean it's a good idea to use it in an encyclopedia-style article. What would be lost if this confusing notation was removed? Hadleywickham 17:37, 16 March 2007 (UTC)

An axiomatic approach
Unless anyone has strong arguments for keeping it, I think the axiomatic approach section should be removed. It is basically a poor description (for discrete rv's only) that the mean and variance are the first two central moments of a distribution. Hadleywickham 17:14, 6 March 2007 (UTC)
 * Please improve it rather than remove it. The properties $$a+(\mu \pm \sigma ) = (a+\mu )\pm \sigma$$ and $$a\cdot (\mu\pm\sigma ) = a\cdot\mu\pm a\cdot\sigma$$ are important and not found elsewhere. Bo Jacoby 10:58, 8 March 2007 (UTC).

Well, they're fairly obvious from inspection of the standard deviation function, so I think a proof should take that approach. The current "proof" doesn't make much sense to me, unless it's trying to show that the $\pm$ operator is commutative under addition and multiplication. I think that is ill-advised given that the $\pm$ notation is non-standard. Hadleywickham 14:06, 8 March 2007 (UTC)


 * The point is that the two functions mean value and standard deviation may be defined by the requirement that they satisfy some simple conditions. Not that they, as already defined, do satisfy these conditions. So this section answers the question: "Why this definition and not some other definition?" Bo Jacoby 16:53, 8 March 2007 (UTC).

I don't think it does answer the question of why these definitions of mean and standard deviation are used. To do that, I would expect a historical retrospective of who first used them and why, and why they continued to be used. It is also not clear to me that it proves the requirements for the general case. Hadleywickham 19:48, 8 March 2007 (UTC)


 * Historically, other measures of central tendency and statistical dispersion were used, such as median and range, which do not depend analytically on the values of the population. 22:10, 8 March 2007 (UTC).

And they still are. I'm not sure what your point is. The mean and standard deviation happen to have some nice properties. So do other estimators. Hadleywickham 14:07, 9 March 2007 (UTC)


 * Why, then, introduce mean and std.dev. if the other estimators have such nice properties? That's the question. Bo Jacoby 16:48, 9 March 2007 (UTC).

Because they're useful! - they summarise the first two moments of the distribution. Median and MAD are also useful. So are skew and kurtosis (summarising the 3rd and 4th moments). Hadleywickham 20:11, 9 March 2007 (UTC)


 * The definition of the standard deviation seems pedagogically unmotivated. Generalizing stepwise, two observations x = (x1,x2) have a 'central tendency' defined by the midpoint μ = (x1+x2)/2, and a half-range σ = |x1−x2|/2. The observations are summarized by writing x ~ μ ± σ. Now the question is: how do we generalize μ and σ to more than two observations? Obviously μ and σ satisfy the nice properties $$a+(\mu \pm \sigma ) = (a+\mu )\pm \sigma$$ and $$a\cdot (\mu\pm\sigma ) = a\cdot\mu\pm a\cdot\sigma$$ because
 * μ(a+x) = μ(a+(x1,x2)) = μ(a+x1,a+x2) = (a+x1+a+x2)/2 = a+(x1+x2)/2 = a+μ(x)
 * and similar calculations show that σ(a+x) = σ(x) and μ(a·x) = a·μ(x) and σ(a·x) = |a|·σ(x). The subsection on the axiomatic approach derives analytical expressions which are readily generalized to more than two observations. These expressions are the mean and the standard deviation definitions, which are thus justified. Feel free to improve the section to improve clarity. Bo Jacoby 12:40, 10 March 2007 (UTC).

Except your "standard deviation" is the mean absolute deviation! I think this section should be removed or placed on another page. It is non standard and doesn't aid understanding of the standard deviation. Hadleywickham 15:12, 10 March 2007 (UTC)
 * No sir, that is not the mean absolute deviation. It is the standard deviation $$\sigma = s_0^{-1} (s_0s_2-s_1^2)^{\frac{1}{2}}$$ where s0 is the number of elements of the population, s1 is the sum and s2 is the square sum. This formula is derived rather than postulated. Bo Jacoby 14:43, 11 March 2007 (UTC).

How does that correspond to your statement that: &sigma; = |x1&minus;x2|/2 ? I still don't understand what you are trying to achieve with this section. Your notation is non-standard and the presentation is confusing. If you are trying to motivate it pedagogically, you will need to simplify the presentation dramatically. I think this section should be removed until you can do that. Hadleywickham 14:45, 12 March 2007 (UTC)


 * When there are only two observations, the half-range and the standard deviation are the same number. The number of observations is s0 = 2. The sum is s1 = x1 + x2, and the sum of squares is s2 = x1² + x2². So
 * $$\sigma = s_0^{-1} (s_0s_2-s_1^2)^{\frac{1}{2}} = 2^{-1} (2(x_1^2+x_2^2)-(x_1+x_2)^2)^{\frac{1}{2}} = 2^{-1} (2x_1^2+2x_2^2-(x_1^2+x_2^2+2x_1x_2))^{\frac{1}{2}} = 2^{-1} (x_1^2+x_2^2-2x_1x_2)^{\frac{1}{2}} = 2^{-1} |x_1-x_2| $$.
 * The point is that the formula for the standard deviation is the multi-observation generalization of the two-observation formula σ = |x1−x2|/2. If this is new to you, it might be new and useful to other readers too. Bo Jacoby 17:18, 12 March 2007 (UTC).
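The identity is easy to check numerically; a sketch of the s0, s1, s2 formula from the comment above:

```python
import math

def sd_from_sums(xs):
    """sigma = (1/s0) * sqrt(s0*s2 - s1**2), with s0 the count,
    s1 the sum and s2 the sum of squares."""
    s0 = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return math.sqrt(s0 * s2 - s1 * s1) / s0

# For two observations this coincides with the half-range |x1 - x2| / 2:
x1, x2 = 4.0, 8.0
print(sd_from_sums([x1, x2]), abs(x1 - x2) / 2)  # 2.0 2.0
```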

I don't think the point of an encyclopedia entry is to introduce "new and interesting" findings. I see that the half-range and standard deviation are the same for two numbers - I don't understand why you need to introduce the half range. Hadleywickham 04:17, 21 March 2007 (UTC)
 * I agree, but this is not a new finding. The naive meaning of a±b is of course the two-element set (or multiset, if b=0): {a+b, a−b}. The analytical generalization of this a±b to describe more than two numbers is that a is the mean and b is the standard deviation. One non-analytical generalization is that a is the mid-range and b is the half-range. Bo Jacoby 13:18, 21 March 2007 (UTC).

It is time to get serious about this ± stuff. Bo Jacoby, please identify a reliable published source where this notation is used in the manner you want us to use it here. That's what the rules require and at the moment they are not being obeyed. Arguments about whether this notation is wonderful or not or should be used or not are inadmissible. McKay 05:50, 17 April 2007 (UTC)


 * As advised, this material is now gone. Don't reinsert it without citation of a reliable published source. McKay 05:38, 23 April 2007 (UTC)

Quote from plus-minus sign:
 * The use of ± for an approximation is most commonly encountered for presenting the numerical value of a quantity together with its tolerance or its statistical margin of error. For example, "5.7 ± 0.2" denotes a quantity that is specified or estimated to be within 0.2 units of 5.7; it may be anywhere in the range from 5.7 − 0.2 to 5.7 + 0.2. More precisely, in scientific usage it usually comes with a probability of being within the interval, usually that of 2 standard deviations, or 95.4%.

This has no citation of a reliable published source either. Go ahead and clean it up. Bo Jacoby 22:03, 23 April 2007 (UTC).

Suggested wording
Current wording in introduction: "For a population, the standard deviation can be estimated by a modified standard deviation (s) of a sample."

My suggested wording: "The standard deviation of a population can be estimated by a modified standard deviation (s) of a sample of that population."


 * isn't that the std error? - Ik

Also: population standard deviation needs to be defined. This article only defines standard deviation for a random variable. A klunky way to fix it: after this sentence "If the random variable X takes on the values $$x_1,\cdots,x_N$$ (which are real numbers) with equal probability, then its standard deviation can be computed as follows."  insert "(Or, if those values are all the members of a population, the standard deviation of the population is also calculated in the same way.)". Alternatively, insert a definition at the very beginning of the section on estimating population standard deviation. --Coppertwig 16:53, 13 June 2007 (UTC)
 * Sorry -- maybe it's already adequately defined in the introduction. --Coppertwig 16:58, 13 June 2007 (UTC)

Numerically unstable
The second formula given for the standard deviation (the one-pass formula) is numerically unstable. The problem with it is that it takes the difference of two positive numbers that might be very close together. If the positive numbers themselves aren't exact but have been rounded off somehow, then taking their difference can magnify this rounding error. This is called cancellation. In fact, sometimes that formula can give a negative answer! Here are two references:

Tony F. Chan and Gene H. Golub and Randall J. LeVeque, "Algorithms for computing the sample variance: Analysis and recommendations", "The American Statistician", 37:3 (242-247), 1983.

Nicholas J. Higham, "Accuracy and Stability of Numerical Algorithms", 2nd ed., SIAM Press, 2002.

Higham gives an example of a small data set that breaks the one-pass algorithm but doesn't break the two-pass algorithm (which is much more stable). Let's say we are working with IEEE 754 single-precision ("float" in C) and have three numbers: 10000, 10001, and 10002. Then the two-pass formula gives a sample variance of 1.0, which is exactly right, but the one-pass formula gives you 0, which has a relative error of 100%.

Higham discusses a one-pass formula that gives better accuracy.

Hilbertastronaut 06:30, 26 August 2007 (UTC)
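For anyone who wants to see the cancellation concretely, here is a minimal Python sketch of Higham's three-number example, forcing single precision with NumPy (the data set is Higham's; the NumPy dtype mechanics are my own illustrative choice):

```python
import numpy as np

# Higham's example in IEEE 754 single precision ("float" in C).
x = np.array([10000, 10001, 10002], dtype=np.float32)
n = np.float32(len(x))
mean = x.sum() / n  # 10001, exact in float32

# Two-pass formula (stable): subtract the mean first, then square.
var_two_pass = ((x - mean) ** 2).sum() / (n - 1)

# One-pass "textbook shortcut": sum of squares minus n * mean^2.
# Both terms round to the same float32 value, so the true difference
# of 2 cancels away completely.
var_one_pass = ((x * x).sum() - n * (mean * mean)) / (n - 1)

print(var_two_pass)  # 1.0 -- exactly right
print(var_one_pass)  # 0.0 -- 100% relative error
```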

Incorrect Equation
The equation that follows "The above expression can also be replaced with", can be interpreted incorrectly very easily. As written it appears as:

sqrt( 1/(N-1) * sum_of_squares - N * mean^2 )

The derivation following this equation shows how the sum of (x_i - mean)^2 is equivalent to (sum_of_squares - N * mean^2). Using the plain old rules of arithmetic, multiplication has a higher precedence than subtraction. So that formula is easily interpreted as:

sqrt( ( 1/(N-1) * sum_of_squares )  -   N * mean^2 )

But the derivation shows that it should instead be interpreted as:

sqrt( 1/(N-1) *  ( sum_of_squares - N * mean^2 ) )

It is also possible that I am being dense, and if so, I'd like to understand where I've gone wrong.

MarkSMann 14:41, 8 October 2007 (UTC)
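A quick numeric check (on a hypothetical three-value data set) shows the two readings really do disagree, and that the misreading can even produce a negative radicand:

```python
import math

# Hypothetical data set; N = 3, mean = 4, sum of squares = 56.
x = [2.0, 4.0, 6.0]
N = len(x)
mean = sum(x) / N
sum_sq = sum(v * v for v in x)

# Intended reading: 1/(N-1) multiplies the WHOLE bracketed difference.
correct = math.sqrt((sum_sq - N * mean ** 2) / (N - 1))

# Misreading under ordinary precedence: 1/(N-1) applies only to sum_sq.
# The radicand goes negative, so the formula cannot even be evaluated.
misread_radicand = sum_sq / (N - 1) - N * mean ** 2

print(correct)           # 2.0 (matches the direct definition)
print(misread_radicand)  # -20.0
```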

Continuous case
How about including the formula for the standard deviation of a continuous random variable:
 * $$\sigma = \sqrt{\int (x-\mu)^2 \, p(x) \, dx}$$
 * where $$\mu = \int x \, p(x) \, dx$$
--Egriffin 15:59, 31 October 2007 (UTC)

-Good addition, a proof would be nice too. —Preceding unsigned comment added by 198.96.36.48 (talk) 01:48, 4 January 2008 (UTC)
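Not a proof, but a numerical sanity check of the continuous formula is easy: for the uniform density on [0, 1] the exact answer is 1/√12 ≈ 0.2887. A midpoint-rule sketch in Python (the density and grid size are illustrative choices):

```python
import math

# Midpoint-rule check of sigma = sqrt( integral of (x - mu)^2 p(x) dx )
# for the uniform density p(x) = 1 on [0, 1]; exact value is 1/sqrt(12).
def p(x):
    return 1.0

n = 100_000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]

mu = sum(x * p(x) * dx for x in xs)  # -> 0.5
sigma = math.sqrt(sum((x - mu) ** 2 * p(x) * dx for x in xs))

print(round(sigma, 4))  # 0.2887, matching 1/sqrt(12)
```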

Picture is butt
This picture is butt. The blue curve is supposed to be a random variable? That curve ain't random, dawgs. —Preceding unsigned comment added by 141.154.109.226 (talk) 18:57, 31 October 2007 (UTC)

I agree. The picture is butt. :) Why not have an image of the normal curve with tic marks indicating mu + 1 sigma, mu + 2 sigma, etc.  The normal curve is a commonly used example for teaching the concept of standard deviations, and is strongly associated with the concept of stddev for most students and adults.

Could someone find such an image? 209.131.62.113 (talk) 21:01, 17 April 2008 (UTC)

Geometric Interpretation
It's incorrect. By the end it states that the distance between P and R is $$\sigma\sqrt{N}$$, and it's not. The correct expression is $$\sigma\sqrt{N-1}$$.

I guess he/she meant the sample's standard deviation, not the population's. —Preceding unsigned comment added by 69.243.109.210 (talk) 07:03, 25 November 2007 (UTC)

± notation
Sorry to revive a previous discussion (do I re-add as I just did, or should I have added there and it would go to the bottom?), but people typically use the ± sign for stdev. Although it is not a rule written in stone, it is used, so it should be mentioned at least, as it is not wrong, just a bit ambiguous and technical. I doubt people will say "I have lots of pencils, they are 10 ± 2 cm long" after reading it anyway! --Squidonius (talk) 17:57, 29 November 2007 (UTC) (just realised: pencils start all the same size, so it will actually have a negative skew). Squidonius (talk) 17:57, 29 November 2007 (UTC)

Is the Finance section wrong with its example conclusion?
"For example, you have a choice between two stocks: Stock A historically returns 5% with a standard deviation of 10%, while Stock B returns 6% and carries a standard deviation of 20%. On the basis of risk and return, an investor may decide that Stock A is the better choice, because Stock B's additional percentage point of return generated (an additional 20% in dollar terms) is not worth double the degree of risk associated with Stock A. Stock B is likely to fall short of the initial investment more often than Stock A under the same circumstances..."

The example qualifies the standard deviation of B as a risk, which has a negative connotation. This is the reasoning behind the decision that stock B "is not worth" it. This is wrong, as the higher deviation means that the stock might produce 20% lower results but also 20% higher, whereas stock A can only produce 10% more, but also only 10% less. The only advantage stock A has over stock B is its stability, NOT its risk. The investor would only choose stock A if they preferred stability over higher returns. The wording should be changed.

It goes on to state: "Stock B is likely to fall short of the initial investment more often than Stock A under the same circumstances...". The conclusion that stock B will "fall short of the initial investment more often" is only half the story, because a higher standard deviation means it will also end up OVER the initial investment estimate by the same rate. --86.124.228.185 (talk) 17:28, 3 January 2008 (UTC)

Refer: 3 Interpretation and application_3.1.2 Sports_Standard deviation; a tool for betting
I am proposing to add the following paragraph in this section

'''The statistics of Team A have lower standard deviation than the statistics of Team B. If this was the only piece of information given to me, I would back Team A to win. Reason: Team A is consistent. Even if they have an off day, they will only perform a little worse than what they usually do. On the other hand, Team B will be setting records they don't want to set on their off day!'''

Finance example
Quite apart from other possible outright errors in this example, as suggested in someone else's earlier remarks, I don't feel the general drift of this example -- that a high standard deviation in a stock's price in and of itself constitutes a sensible measure of the "overall risk of the asset" -- should be left as is. If this example is to be preserved at all, it should be rewritten to acknowledge that many factors can contribute to the overall risk of an asset. Also, it should be rewritten to acknowledge the fact that many successful investors disagree with the essential premise that a high standard deviation is in and of itself _any_ significant measure of risk, for stocks that meet sufficient 'deep-value' criteria. Warren Buffett has discussed this topic in one or more of his annual letters to BRK shareholders, if I recall correctly. I don't plan to edit the article myself along those lines, but I did remove the closing sentence which was clearly non-NPOV in expression: it ordered the reader to use this metric! Publius3 (talk) 07:43, 22 April 2008 (UTC)

Don't like the diagram
The diagram ("Given a random variable (in blue)...") seems more confusing than useful to me. It's nicely done, but I don't think it helps. Billgordon1099 (talk) 19:41, 12 June 2008 (UTC)


 * I agree. Perhaps I'll try to knock up something clearer this weekend. —Forlornturtle (talk) 09:55, 13 June 2008 (UTC)


 * Done —Forlornturtle (talk) 17:51, 15 June 2008 (UTC)

The old sample vs. population SD thing...
Given the number of times the Example has had to be set back to the correct calculations for the population SD, and given that the hidden comments in the text go unnoticed, would it be best to rethink this bit of presentation? Perhaps have two columns in some form of table, one giving calculations for the population SD and the other those for the sample SD. There could also be a bit more to point out that there are two different things. Melcombe (talk) 09:04, 8 July 2008 (UTC)

another formula
$$\sigma = \sqrt {{(x_1-\bar {x})^2+(x_2-\bar {x})^2+...+(x_n-\bar {x})^2} \over {n}}$$

CommonMaster (talk) 23:49, 5 January 2009 (UTC)


 * Hasn't that formula been in this article for a long time already? Michael Hardy (talk) 23:34, 10 January 2009 (UTC)

This Article Sucks
This is Wikipedia, not a technical manual. Even Albert Einstein argued that the biggest problem with scientific and mathematical documents written for the public is that they are too Jargony. Will somebody who majored in Statistics but isn't a complete robot rewrite this? —Preceding unsigned comment added by MiriamKnight (talk • contribs) 03:48, 6 January 2009 (UTC)


 * I will look at this soon.
 * But I don't understand why people who express complaints of this kind always have to exaggerate so much. This article in its present form can be understood by mathematically inclined high-school pupils. Michael Hardy (talk) 22:39, 10 January 2009 (UTC)


 * That HS pupil, MiriamKnight, is angry Wikipedia won't do his homework for him. Don't take it personally --Kvng (talk) 12:18, 31 March 2009 (UTC)


 * I concur with Michael Hardy. Sure, I have the mathematical aptitude of an engineer with a Master's degree and have no problem understanding the contents of this article. Someone with a poor mathematical background may have trouble understanding most of this, but then again they would have similar issues with other mathematical definitions in any encyclopaedia.  This is not 'Wikipedia for Dummies'; however, it might be useful to present a very 'dumbed down' version at the beginning of the article, or possibly offer a simplistic entry and this *complete* entry for those more inclined? Bsodmike (talk) 12:38, 24 January 2009 (UTC)


 * All right, I tried to improve the introduction and first section. I'm not a statistician, so ideally someone who understands this material better would check what I've written to make sure it's roughly accurate.  Also, the first section could use some more work, since the two examples are probably a bit repetitive, and the formula for the standard deviation of a discrete data set is now given twice. Jim (talk) 18:21, 15 February 2009 (UTC)


 * Such detail is useful even for non-mathematically oriented users. I come from a not-too-strong mathematical background (up to multi-variable calculus) and find this article usable and understandable in many parts.  Moreover, having it all there makes it more flexible for understanding the various uses of standard deviation (and there are many--as you can see above--and many people clamoring for understanding).  Certainly could use some touchups, but keep the detail! The introduction must be better, as I found it very helpful for understanding SD. Yizzerin (talk) 17:11, 16 February 2009 (UTC)

BIG MISTAKE ON STANDARD DEVIATION BASIC EXAMPLE 1
There are 8 elements in the example, so everything in that end square root should be divided by 7, not 8. I checked a stats book for this. —Preceding unsigned comment added by 98.212.203.130 (talk) 02:16, 2 April 2009 (UTC)


 * It is correct. You are confusing standard deviation and sample standard deviation.  The former is when you have the whole population, the latter is when you have a sample and want to estimate the standard deviation of the whole population.  These are different things.  McKay (talk) 07:09, 2 April 2009 (UTC)

Sigh........ Someone shows up here every few weeks and claims that is a mistake. Sometimes they "correct" it in the article. Is this an argument against teaching statistics to idiots? Michael Hardy (talk) 14:59, 2 April 2009 (UTC)

...seriously: Should we include some comment within the article on this? Michael Hardy (talk) 15:02, 2 April 2009 (UTC)


 * I tried..... McKay (talk) 22:21, 2 April 2009 (UTC)
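A small Python sketch might head off some of these "corrections". The eight-value data set below is hypothetical (not necessarily the article's Example 1, but chosen so the arithmetic is clean), and it shows the two denominators side by side:

```python
import math

# Hypothetical eight-value population.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                     # 5.0
ss = sum((x - mean) ** 2 for x in data)  # 32.0

pop_sd = math.sqrt(ss / n)           # divide by n: the data IS the population
sample_sd = math.sqrt(ss / (n - 1))  # divide by n-1: the data is a sample

print(pop_sd)               # 2.0
print(round(sample_sd, 3))  # 2.138
```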

"Data set"
I reverted the change from "population" to "data set". A "data set" is usually derived from a sample, and that's when you use n − 1 rather than n in the denominator. As we see above, lots of people seem confused enough about that point already. Michael Hardy (talk) 00:55, 3 April 2009 (UTC)


 * I see your point. My concern was that "population" in this sense is a technical usage that readers might not be familiar with.  I don't have a better suggestion, though.  McKay (talk) 01:15, 3 April 2009 (UTC)

RMS
The following sentence is misleading, since the average deviation from the mean must always be zero; I would expect even a beginner in statistics or arithmetic to be aware of that! -

"It may be thought of as the average difference of the scores from the mean of distribution, how far they are away from the mean."

The correct comparison is "...RMS difference of the scores from the mean of distribution". The RMS (root mean square) function is familiar to many people from its use in electrical engineering, so is an appropriate, accurate and well known concept.

Andrew Smith —Preceding unsigned comment added by 82.32.50.77 (talk) 08:39, 18 October 2009 (UTC)
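To illustrate Andrew's point numerically, here is a tiny Python sketch on a hypothetical data set: the mean deviation is identically zero, while the RMS deviation equals the (population) standard deviation:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(data)
m = sum(data) / n

mean_dev = sum(x - m for x in data) / n                   # always 0
rms_dev = math.sqrt(sum((x - m) ** 2 for x in data) / n)  # the SD

print(mean_dev)  # 0.0
print(rms_dev)   # 2.0
```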

Error
How come both 73 and 76 inches are 190 centimeters?

73 inches (190 cm) inches), while almost all men (about 95%) have a height within 6 inches of the mean (64 inches (160 cm) – 76 inches (190 cm) —Preceding unsigned comment added by 60.240.91.73 (talk) 13:52, 20 June 2009 (UTC)


 * That was done be the "convert" template. I've gotten rid of that template and corrected the arithmetic. Michael Hardy (talk) 16:57, 20 June 2009 (UTC)

Estimating population standard deviation with interquartile range
The short subsection Standard deviation was added by an IP editor on 25 May 2009. I'm not sure how useful it is, or whether its final claim is correct. The factor of 1.35 surely assumes a normal distribution. The stated (but unsourced) asymptotic relative efficiency of 0.37 appears to be assuming a normal distribution too. I can't spot any justification in the reference given (10.1016/j.jspi.2005.08.028; requires subscription) for the last claim that IQR/1.35 can be more efficient than the sample SD as an estimator of the population SD when the data has thick tails. Maybe an estimator based on some multiple of the IQR could be more efficient than the sample SD for a given known distribution with thick tails, but the factor of 1.35 must surely depend on the particular distribution. And if you know the distribution it would be more efficient to fit its parameters by maximum likelihood and use them to calculate the population SD. Or am I missing something? If not, i'm tempted to remove this section completely as it seems of no practical value. Qwfp (talk) 13:18, 21 June 2009 (UTC)
 * I remember using this once when I was working by hand, with no computer at hand and a calculator capable only of the simplest arithmetic, and I needed a quick result. (I don't remember if 1.35 is the right number, but that was easy enough to work out by hand.)  So yes: maximum likelihood is more accurate, but not always easy to do fast. Michael Hardy (talk) 05:15, 3 August 2009 (UTC)
 * DAAG implies 1.33. See here. But they could be estimating 1.35 as they are approaching it from the other direction. -- Avi (talk) 06:08, 3 August 2009 (UTC)

The actual number should be 1.349, from a table of the Standard Normal, no? -- Avi (talk) 06:16, 3 August 2009 (UTC) (1.34897997243323 from Excel solver). -- Avi (talk) 06:21, 3 August 2009 (UTC)


 * The more I look at this the more I'm convinced Qwfp's solution is the right one. The 1.35/1.349 estimate for the normal distribution (AKA qnorm(0.75)-qnorm(0.25)) is, as the writer has admitted, less efficient than the usual sample standard deviation estimate, and Qwfp's argument for why the IQR is rarely best seems 100% right. If anyone wants to improve this, then please do, but if nothing has changed in a couple weeks I'll delete the section, and I hope no one is offended. --Eb Oesch (talk) 21:10, 11 January 2010 (UTC)
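For reference, the 1.349 factor and the behaviour of the IQR-based estimate are easy to check in Python; the simulated N(0, 2) sample and its size below are arbitrary choices:

```python
import random
import statistics

# The normal-theory conversion factor IQR -> sigma is the width of the
# standard normal's middle 50%: q(0.75) - q(0.25), about 1.349.
nd = statistics.NormalDist()
factor = nd.inv_cdf(0.75) - nd.inv_cdf(0.25)
print(round(factor, 3))  # 1.349

# Both estimators on a simulated N(0, 2) sample.
random.seed(0)
data = [random.gauss(0, 2) for _ in range(10_000)]
q1, _, q3 = statistics.quantiles(data, n=4)

iqr_estimate = (q3 - q1) / factor   # should land near sigma = 2
sample_sd = statistics.stdev(data)  # the usual, more efficient estimate
```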

Unbiased SD estimate
The following comment was added to the article by . --MarkSweep &#x270D; 13:18, 14 September 2005 (UTC)

Comment: I think the formula for unbiased estimate of the standard deviation is:

$$s = \sqrt{\frac{1}{n-1.5} \sum_{i=1}^n (x_i - \overline{x})^2}.$$ The (n − 1) denominator is unbiased for the variance. I don't know how to derive this, but a statistician should be able to do so.

This is an OK web site.

http://www.statistics-help-online.com/node55.html

Basically small samples tend to be distributed closer to the mean. To get a better estimate of the population std dev the sample std dev is scaled down.


 * Where does that 1.5 in the denominator come from? I found http://www.math.niu.edu/~rusin/known-math/01_incoming/update_stat using it without an explanation. Is it only an educated guess? (I hope it isn't a statistician's joke on unsuspecting newbies) I came here in search for an answer why a paper even used 2 in the denominator! Stevemiller (talk) 02:50, 26 February 2008 (UTC)


 * If an unbiased estimate of SD is required, see Unbiased estimation of standard deviation but note that it only applies for the normal distribution. The formulae disagree with the "1.5" correction. Melcombe (talk) 15:22, 14 October 2008 (UTC)


 * The denominator is certainly 1... no idea where the 1.5 comes from, but as mentioned before me, the standard form is for a normal distribution (which is accurate with large n; see the central limit theorem), and the denominator should be n − 1... —Preceding unsigned comment added by 67.159.74.87 (talk) 19:55, 3 March 2009 (UTC)
 * The comment above is confused. Certainly n − 1 gives an unbiased estimator of the variance.  Not of the standard deviation.  The square root of the variance is the SD, but the square root of an unbiased estimator of the variance is NOT an unbiased estimator of the SD. Michael Hardy (talk) 20:40, 3 March 2009 (UTC)


 * Well, if you consider an estimator of the form
 * $$s = \sqrt{\frac{1}{n-\kappa} \sum_{i=1}^n (x_i - \bar{x})^2}$$
 * with unknown κ, then by the reasoning from χ distribution, expected value of this estimator is
 * $$\operatorname{E}s = \sigma\cdot\sqrt{\frac{2}{n-\kappa}}\frac{\Gamma(n/2)}{\Gamma\big((n-1)/2\big)}$$
 * Thus the estimator is unbiased when
 * $$\kappa = n - 2\frac{\Gamma(n/2)^2}{\Gamma\big((n-1)/2\big)^2},$$
 * an expression which quickly converges to a constant κ = 1.5. For example already at n=5 the κ is ≈ 1.47. In general the “optimal” value of κ is equal to $$\scriptstyle 1.5 + \frac{\text{excess kurtosis}}{4}$$. ...  st pasha  » talk » 14:18, 2 August 2009 (UTC)


 * If there is a good citation for this result, it would be good to include it in the Unbiased estimation of standard deviation article. Melcombe (talk) 09:21, 22 October 2009 (UTC)
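The expression for κ above is at least easy to check numerically; a minimal Python sketch (using log-gamma to avoid overflow for large n):

```python
import math

# kappa(n) = n - 2 * Gamma(n/2)^2 / Gamma((n-1)/2)^2, the divisor shift
# that makes s = sqrt( sum (x_i - xbar)^2 / (n - kappa) ) unbiased for
# sigma under normality, per the chi-distribution argument above.
def kappa(n):
    r = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))
    return n - 2 * r * r

print(round(kappa(5), 2))  # 1.47, as stated above
print(kappa(100))          # close to the limiting value 1.5
```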

Error in article
I think that the equation in this line is incorrect. The equation for a mean standard deviation from the above is s = Σ(x − m)² / (n − 1), where s = mean standard deviation, x = the number in the set, m = mean of the number set, and n = the number of numbers in the set. 128.84.217.56 (talk) 03:02, 19 October 2009 (UTC)

Usage of the word sample
In some places the word "sample" is used instead of "observations", e.g.: "where N is the number of samples used to sample the mean" from the section "Relationship between standard deviation and mean". I would suggest changing this to: "where $$N$$ is the sample size", or: "where $$N$$ is the size of the sample used to estimate the mean", or: "the number of observations in the sample used to ...". This can be very confusing, since the idea is to estimate the mean over several samples from just one observed sample. Ppardal (talk) 15:20, 21 October 2009 (UTC)

Limitations vs Variance article
In the limitations section there is a claim that it is impossible to compute the standard deviation of a whole population given the standard deviations of its subgroups.

In the Variance article, though, in section 3 (Properties), there is this property of variance: "Suppose that the observations can be partitioned into equal-sized subgroups according to some second variable. Then the variance of the total group is equal to the mean of the variances of the subgroups plus the variance of the means of the subgroups"

And later: "In a more general case, if the subgroups have unequal sizes, then they must be weighted proportionally to their size in the computations of the means and variances. The formula is also valid with more than two groups, and even if the grouping variable is continuous."

And since the standard deviation equals sqrt(variance), it is possible to compute the standard deviation based on subgroups. Unless the Variance article has some errors there.

212.160.172.70 (talk) 15:42, 19 January 2010 (UTC)
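The decomposition quoted from the Variance article can be verified on a toy example; the two equal-sized subgroups below are arbitrary:

```python
# Toy check of: total variance = mean of subgroup variances
#               + variance of subgroup means (equal-sized subgroups).
def pvar(xs):
    """Population variance (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

a = [1, 2, 3]  # arbitrary equal-sized subgroups
b = [4, 5, 6]
total = a + b

mean_of_vars = (pvar(a) + pvar(b)) / 2
var_of_means = pvar([sum(a) / len(a), sum(b) / len(b)])

lhs = pvar(total)
rhs = mean_of_vars + var_of_means
print(lhs, rhs)  # both ~2.9167
# So the whole-population SD is indeed recoverable as sqrt(rhs).
```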

difference between 'mean difference from the mean' and standard deviation
can this be covered by the article? —Preceding unsigned comment added by 212.54.222.127 (talk) 14:55, 3 May 2010 (UTC)
 * The mean difference from the mean is zero. Perhaps you're thinking of mean absolute deviation? That is mentioned at the end of the 'Worked example' section. Qwfp (talk) 15:32, 3 May 2010 (UTC)

Tampering
I have removed the claim 'It helps detect tampering of data.' If it's true it needs justification. If anyone reinstates it can they please write 'tampering with' not 'tampering of'. 81.131.57.101 (talk) 10:07, 24 May 2010 (UTC)

Height as an example
The observed distribution of heights seems like a good example to illustrate standard deviation. However it suffers because of the apparent need to use two sets of units. I think it would work better without this constant need to skip over parenthetical additions. Obviously, the next point of discussion is which set of units to employ. Presumably this has been and is being discussed elsewhere within Wikipedia with no obvious resolution. My opinion is that since most of the world uses SI and metric prefixes and the few countries stubbornly clinging to other systems officially use SI and metric prefixes in their scientific communities it should be centimetres. Obviously there is going to be strong disagreement from a part of the community on this issue. Without looking at the history of this page I would guess that there have been battles over the units in the past. Still, I think the article would be better with an example using one unit or no units. —Preceding unsigned comment added by 142.104.154.108 (talk) 21:32, 18 August 2010 (UTC)

Chebyshev's inequality
The sentence in this section on the Standard deviation page makes no grammatical sense. It appears a line was edited out or something of that nature. Given that I don't know much about this topic (hence, my reviewing the Wiki), I'm requesting the correction here.

From the article:
 * "Chebyshev's inequality ensures, for all distributions for which the standard deviation is defined, the within a number of standard deviations is at least that as follows." —Preceding unsigned comment added by 65.241.18.25 (talk) 13:57, 28 August 2010 (UTC)
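While the sentence itself needs fixing, the intended content — Chebyshev's bound that at least 1 − 1/k² of the values lie within k standard deviations of the mean — can be illustrated numerically. A sketch on simulated exponential data (the distribution and sample size are arbitrary choices):

```python
import random
import statistics

# Chebyshev's inequality: for ANY distribution with finite sigma, at
# least 1 - 1/k^2 of the values lie within k standard deviations of
# the mean. Illustration on simulated (skewed) exponential data.
random.seed(2)
data = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

for k in (2, 3):
    frac = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
    print(k, round(frac, 3), ">=", round(1 - 1 / k ** 2, 3))
```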

Terminology
The article uses the expression "standard deviation of the sample" versus "sample standard deviation" and implies that this is standard terminology. I am a professional statistician and have never seen this. Can a citation be found for this usage? From my point of view I believe it is not common, but, of course, I cannot prove a negative. I just am worried that the term is not as unambiguous as the author may think that it is. Doctorambient (talk) 20:43, 11 September 2010 (UTC)

standard error (deviation) of the standard deviation
Can someone include a section on estimating the standard error of the standard deviation estimate? I found an approximate formula for large N (>~16): STDDEV(sigma) = sigma / sqrt(2N) at http://davidmlane.com/hyperstat/A19196.html, but no explanation or derivation. Someone with more knowledge than I should put this section in, and with more detail. A citation of an appropriate textbook is needed.Milliemchi (talk) 01:31, 30 December 2010 (UTC)
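Pending a proper citation, the approximation can at least be checked by simulation; a Python sketch (the values σ = 3, N = 50, and the repetition count are arbitrary choices):

```python
import math
import random
import statistics

# Monte Carlo check of SE(s) ~= sigma / sqrt(2N) for normal data.
random.seed(1)
sigma, N, reps = 3.0, 50, 4000
sds = [statistics.stdev(random.gauss(0, sigma) for _ in range(N))
       for _ in range(reps)]

observed_se = statistics.stdev(sds)   # spread of the SD estimates
approx_se = sigma / math.sqrt(2 * N)  # the quoted approximation

print(round(approx_se, 3))   # 0.3
print(round(observed_se, 3)) # close to approx_se
```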

Means of Two Sample Populations Shown in Graphic
Sorry; I'm confused re the mean of the red population shown. Surely it is less than 100? Stickeebeek (talk) 02:21, 3 February 2011 (UTC)

very confusing
"Standard deviation of the sample" vs. "sample standard deviation" to refer to different quantities is painfully confusing. Are those names correct? Can we change the names to something less confusing, or strongly emphasize the subtle difference? What is being referred to here is the standard error of the mean (SEM), which is the sample SD divided by the square root of the sample size. It is the standard deviation of the distribution of sample means. —Preceding unsigned comment added by 152.3.182.116 (talk) 18:46, 11 February 2011 (UTC)

Assessment comment
Substituted at 21:16, 4 May 2016 (UTC)