Talk:68–95–99.7 rule

Merging or keeping as own article
Shouldn't this just redirect to normal distribution? Richard001 06:30, 11 April 2007 (UTC)


 * No. This page gives a practical derived result of a normal distribution. It's also much more directly useful to most people than the math-heavy normal distribution article (which is linked to). --Nantonos (talk) 13:16, 29 June 2008 (UTC)


 * However, the only useful information is the percentages, which you can work out anyway from the Normal Distribution article. It's pretty misleading to wrap it up in an 'empirical rule', since it's not *empirical* by that definition. Darktachyon (talk) 09:24, 22 October 2010 (UTC)

It seems like the mention of rejecting the normality of the data based on a "6σ" event occurring more than once in 1.5 million years is subject to the Gambler's fallacy. The probability doesn't determine how often it happens over time, but rather how likely the event is to occur for each specific instance, which in the case of the example is a day. The expected time between events may be 1.5 million years, but this says nothing about the time between individual events. You can't outright reject a coin's fairness if it lands on heads ten times in a row, so likewise you can't really reject the normality of the data based on only a few samples, regardless of the magnitude of the probabilities involved. Of course this doesn't rule out that the initial prediction is incorrect, but it requires more than 2 or 3 samples to reach a statistically significant conclusion. --Styrofoamboots (talk) 05:06, 6 May 2009 (UTC)
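
The quantities being argued about here can be computed directly. The sketch below assumes one independent, normally distributed observation per day and a 365.25-day year; the function names are made up for illustration. It gives the per-day probability of a two-tailed 6σ event, the mean waiting time that follows from it, and the chance of seeing at least one such event in a century.

```python
import math

def two_tailed_prob(n_sigma: float) -> float:
    """P(|X - mu| > n_sigma * sigma) for a normally distributed X."""
    return math.erfc(n_sigma / math.sqrt(2))

def prob_at_least_one(p: float, n_days: int) -> float:
    """Chance of one or more events in n_days independent daily draws."""
    # Equivalent to 1 - (1 - p)**n_days, written with log1p/expm1
    # so that it stays accurate when p is tiny.
    return -math.expm1(n_days * math.log1p(-p))

p6 = two_tailed_prob(6)
mean_wait_years = 1 / p6 / 365.25  # mean of the geometric waiting time, in years

print(f"daily probability of a 6-sigma event: {p6:.3e}")
print(f"mean waiting time: {mean_wait_years:.3e} years")
print(f"chance of at least one in 100 years: {prob_at_least_one(p6, 36525):.3e}")
```

The mean waiting time is only an average of a geometric distribution, so individual gaps can be much shorter or longer; what the numbers show is how small the chance of even one event in a short window is under the normality assumption.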

"A description and illustration using Java Applets by Balasubramanian Narasimhan"
Appears to be an error in the example used in "A description and illustration using Java Applets by Balasubramanian Narasimhan"

Excerpt:

An Example

Let us apply the Empirical Rule to Example 1.17 from Moore and McCabe.

The distribution of heights of American women aged 18 to 24 is approximately normally distributed with mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that

68% of these American women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches,

95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 63 and 68 inches.

It should read:

95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches. --Hallbg (talk) 17:15, 21 July 2009 (UTC)
 * The demo linked to was reported earlier as not working, but it worked OK in Firefox 3.5.5 today for me. --Kay Dekker (talk) 21:14, 14 November 2009 (UTC)
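
The corrected interval can be checked with a few lines of arithmetic. This is a minimal sketch using the mean (65.5 inches) and standard deviation (2.5 inches) quoted in the excerpt; `sigma_interval` is a name chosen here for illustration.

```python
def sigma_interval(mean: float, sd: float, k: float) -> tuple[float, float]:
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)

# Rounded coverage figures of the rule for k = 1, 2, 3 standard deviations
for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = sigma_interval(65.5, 2.5, k)
    print(f"{pct} of heights between {low} and {high} inches")
```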

Why 68, 95, and 99.7
Where do the figures come from? What is the reasoning behind using these values? — Preceding unsigned comment added by 92.27.94.107 (talk) 10:32, 8 March 2012 (UTC)

Yeah, why not 69.2, or 96? Is this rule used to determine a std deviation, or does a std dev always follow this rule? Mang (talk) 04:04, 2 December 2012 (UTC)
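
For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2), and 68, 95 and 99.7 are simply those fractions for k = 1, 2, 3, rounded. The percentages are a property of the normal distribution only; other distributions give different figures for the same multiples of the standard deviation. A short sketch (the function name `coverage` is an assumption for illustration):

```python
import math

def coverage(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * coverage(k):.2f}%")
```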

three-sigma rule is a more elegant name
From my stat courses I rarely heard professors saying "68–95–99.7 rule". Don't you think "three-sigma rule" is a more elegant title for this page? 128.97.77.169 (talk) 02:25, 1 August 2014 (UTC)
 * I agree the title is unintuitive, as this is the page I have to look up when I want to know the value of the error function corresponding to "n" sigma (we list 1 to 7 sigma in half-sigma steps). I often need this for "back-of-the-envelope" estimates when I get into arguments, and it is hard to remember what exactly I need to google to get to this page directly (essentially, the page title is the information you are actually trying to look up, so for 1 to 3 sigma if you remembered the page title, you wouldn't need to look it up in the first place).
 * perhaps it would be more "wikilike" to choose a boring but straightforward title like "table of error-function values" and then do a huge table in steps of .1 or so, giving not just erf(x) but also the more useful information we have here, i.e. erf(x*2^-.5) and (1-erf(x*2^-.5))^-1.
 * as for "three sigma rule", idk, this sounds as if it was a rule dealing with a 3-sigma case, while "68-95-99.7" is actually a list of cases of n sigma, with a modest n=1..3. The page title actually helped me remember "68-95-99.7" by now, but as 4 or 5 sigma also occur in everyday considerations, I keep having to look it up anyway. Right, I could just try to remember more values, but then I'd just keep coming back to check half-sigma steps etc.
 * --dab (𒁳) 10:15, 5 January 2015 (UTC)
 * erm what the sigma… 202.28.251.61 (talk) 09:57, 16 July 2024 (UTC)
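
The two quantities mentioned above, erf(x*2^-.5) and (1-erf(x*2^-.5))^-1, can be generated for any step size rather than memorized. The following is only a sketch: `sigma_table` and its default range of 1 to 7 sigma in half-sigma steps are assumptions for illustration, and `math.erfc` is used for the "outside" fraction to avoid the loss of precision in 1 - erf(...) at large x.

```python
import math

def sigma_table(start: float = 1.0, stop: float = 7.0, step: float = 0.5):
    """Return rows (x, inside, one_in) for x = start, start + step, ..., stop."""
    rows = []
    n_steps = round((stop - start) / step)
    for i in range(n_steps + 1):
        x = start + i * step
        outside = math.erfc(x / math.sqrt(2))  # equals 1 - erf(x / sqrt(2))
        rows.append((x, 1.0 - outside, 1.0 / outside))
    return rows

for x, inside, one_in in sigma_table():
    print(f"{x:4.1f} sigma  inside = {inside:.10f}  outside = 1 in {one_in:,.3f}")
```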

I just noted that "three-sigma rule" may have a more subtle meaning besides simply "let's use 99.7% as 'near-certain'" -- I found the expression in the title of a paywalled paper, which has the abstract:
 * For random variables with a unimodal Legesgue [sic] density, the 3σ rule is proved by elementary calculus. It emerges as a special case of the Vysochanskiĭ-Petunin inequality, which in turn is based on the Gauss inequality.

Whatever this is about, it goes beyond assumption of normal distribution. --dab (𒁳) 10:46, 5 January 2015 (UTC)

I think I see what is going on: it's a case of an actual theorem trickling down into popular usage as a heuristic, although I still have to figure out the specifics of the actual theorem. Apparently you need a bunch of assumptions ("variables with a unimodal Legesgue density"), and it needs to be explained what these assumptions represent and how plausible they are for everyday non-normally distributed statistics. It seems this is a result of the early 1990s, so on one hand it cannot be extremely trivial, but on the other hand you should expect that there are pedestrian explanations in textbooks by now. --dab (𒁳) 10:59, 5 January 2015 (UTC)

German reference missing
There is no link to the German chapter that talks about the same thing: https://de.wikipedia.org/wiki/Standardabweichung#Streuintervalle I already tried to get it linked, but the script seems not to provide a chapter-specific link. If that's true, there is no way to link it, as the main article talks about standard deviation, which is already linked. Does anybody have an idea how to solve this, or should it just be ignored? Moedn (talk) 18:33, 13 February 2016 (UTC)

Contrasting to non-normal data
Hi, new here, so apologies in advance for bad form. From the intro: The "three sigma rule of thumb" is related to a result also known as the three-sigma rule, which states that even for non-normally distributed variables, at least 98% of cases should fall within properly-calculated three-sigma intervals. [2] This seems to contradict the Chebyshev's inequality page, which gives the bound as 88.9%. Should this be modified to "most distributions", with a reference to http://www.qualitydigest.com/inside/twitter-ed/are-you-sure-we-don-t-need-normally-distributed-data.html ? Doc-aj (talk) 16:28, 11 March 2016 (UTC)
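
The 88.9% figure follows from Chebyshev's inequality, which guarantees at least 1 - 1/k² of any distribution with finite variance within k standard deviations. For comparison, the sketch below also evaluates the Vysochanskij–Petunin bound, 1 - 4/(9k²), which applies only to unimodal distributions and only for k > √(8/3). The function names are hypothetical and chosen for illustration.

```python
import math

def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction within k sigma for any finite-variance distribution."""
    return 1 - 1 / k**2

def vp_lower_bound(k: float) -> float:
    """Vysochanskij-Petunin minimum fraction within k sigma (unimodal case)."""
    if k <= math.sqrt(8 / 3):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 1 - 4 / (9 * k**2)

k = 3
print(f"Chebyshev, k={k}:            at least {100 * chebyshev_lower_bound(k):.1f}%")
print(f"Vysochanskij-Petunin, k={k}: at least {100 * vp_lower_bound(k):.1f}%")
print(f"Normal distribution, k={k}:  {100 * math.erf(k / math.sqrt(2)):.2f}%")
```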

A compliment
Hi, also new here, just wanted to compliment the wikigeniuses who gave the Table of Numerical Values the "Approx. freq. for daily event" section. Should anyone ever try to argue that it doesn't belong on Wikipedia (math = original research?), here's my vote, as a user, that this was helpful, informative, and exactly the sort of information that belongs in Cyclopedies.

Oh, and I see that there was discussion almost a decade ago complaining about the title. I like the title. Gsnerd (talk) 01:05, 15 June 2016 (UTC)

Black Swan Paragraph
Is the paragraph about "The Black Swan" necessary? I'm not sure Taleb counts as a credible enough statistician to be mentioned here; even if he were, he might have a better place on the page for the normal distribution.

But on top of that, the un-cited critique of his ideas is, as far as I can tell, flat-out wrong. The gambler's fallacy doesn't say anything about high-probability events not undermining confidence in a theory. Second, with a sufficiently high prior against something, a single event can completely undermine a hypothesis. If every person on Earth hovered off the ground for twenty minutes tomorrow, you wouldn't say, "Well that's extremely unlikely according to the current laws of physics, but it only happened once. We'll revise our theories if it happens a few more times". You would start asking questions immediately.

I added a "citation needed" tag but I think the whole paragraph should be deleted. — Preceding unsigned comment added by Sam Jaques (talk • contribs) 22:19, 11 November 2016 (UTC)

Pr
What is the Pr in the equation? It would be helpful if someone would describe that. 198.213.89.136 (talk) 20:24, 17 July 2017 (UTC)
 * Done. Jamplevia (talk) 17:40, 8 January 2022 (UTC)

"histogram-based illustration"
I'm sorry, I don't see how this is an improvement. The values discussed are those of a normal distribution, so why would you show a sample of an "approximately normal distributed set", when all that does is give you random deviations forcing you to insert "approximately" in every statement? Just say these are the expectation values if the distribution is normal. --dab (𒁳) 05:32, 30 August 2018 (UTC)

Thanks
Very useful article when you need these. One thing, though. I would have made the "outside the range" column for each end. That is, half the probability and twice the number before it happens, as it seems that more often one needs to know that. But then again, it isn't so hard to multiply/divide by 2. Gah4 (talk) 11:09, 11 July 2021 (UTC)
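
The conversion described above (half the probability, twice the expected number of trials) can be sketched as follows. The function names are assumptions for illustration, and the symmetry of the normal distribution is what justifies the factor of 2.

```python
import math

def two_tailed(k: float) -> float:
    """P(|X - mu| > k*sigma): both ends of a normal distribution."""
    return math.erfc(k / math.sqrt(2))

def one_tailed(k: float) -> float:
    """P(X - mu > k*sigma): one end only, half of the two-tailed value."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in range(1, 7):
    print(f"{k} sigma: two-tailed 1 in {1 / two_tailed(k):,.1f}, "
          f"one-tailed 1 in {1 / one_tailed(k):,.1f}")
```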

Usage of sigma
I cannot find any article which covers usage of sigma probability. For example, five-sigma is required to prove the existence of a new particle such as the Higgs boson, whereas two-sigma is the standard in radiocarbon dating. Can anyone point me to an article covering this aspect? If not, that is a major gap in Wikipedia's coverage. Dudley Miles (talk) 11:02, 19 November 2022 (UTC)

Table of Numerical Values
This seems to be original research. It is, imho, poorly done, but useful. The assumptions underlying it should be declared. The term "expected" seems to have a technical meaning, rather than the common (English) one. Average is a better understood term. The 15 decimal digit precision is both absurd and wrong. There is no Normal population of which any random variable is Normal to 15 digit accuracy. There is no clarification that the "outside the range" is for BOTH tails (i.e. two-tailed), while many situations will be concerned with the % below X-sigma of the mean or above X-sigma of the mean, BUT NOT BOTH. I also believe the point that if an outlier is "expected" to occur once every "time period" that it will certainly (given a random sampling) occur more often some periods and less often others, and only average that frequency. 174.130.71.156 (talk) 15:27, 29 January 2023 (UTC)
 * It is probably copied from an unacknowledged source rather than being original research, but not properly explained. The more I look at it the less I understand it. It is also way beyond the scope of the article, which is about sigma 1 to 3. I would delete. Dudley Miles (talk) 17:51, 29 January 2023 (UTC)
 * Expected, as in expectation value, is a common term in statistics. It may or may not correspond to some users' meaning. I suspect that the table is within WP:CALC. As not so many calculators have erf, it is nice to have a quick reference.  Though I agree that 15 digits is unneeded.  It is pretty obvious that outside means two-tailed, as the first one is over 60%. But also is the common meaning of outside. I suspect I would rather have one-sided, though. Given the popularity of six-sigma, we should go at least that far.  I might not mind intermediate values, though.
 * Seems that I added this a year ago, but forgot to sign. It should be January 2023.  In any case, it would be nice to have one sided, or one tailed values. Gah4 (talk) 20:54, 22 January 2024 (UTC)

asymptotic limit
For values beyond the 8 sigma end of the table, there should be a simple asymptotic limit. I might find the reference, but if someone else finds it first, they can add it. Gah4 (talk) 20:56, 22 January 2024 (UTC)
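
One candidate is the standard leading-order term for the normal tail, which gives a two-tailed probability of roughly √(2/π)·exp(-x²/2)/x for large x, with a relative error of about 1/x². This may not be the reference the comment has in mind; the sketch below only compares that approximation against `math.erfc`, using function names chosen for illustration.

```python
import math

def two_tailed_exact(x: float) -> float:
    """P(|X - mu| > x*sigma) for a normal X, via the complementary error function."""
    return math.erfc(x / math.sqrt(2))

def two_tailed_asymptotic(x: float) -> float:
    """Leading-order large-x approximation: sqrt(2/pi) * exp(-x^2/2) / x."""
    return math.sqrt(2 / math.pi) * math.exp(-x * x / 2) / x

for x in (4, 6, 8, 10):
    exact = two_tailed_exact(x)
    approx = two_tailed_asymptotic(x)
    print(f"{x:2d} sigma: exact {exact:.4e}, asymptotic {approx:.4e}, "
          f"ratio {approx / exact:.4f}")
```

The approximation slightly overestimates the tail, and the ratio approaches 1 as x grows.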