Talk:Confidence interval/Archive 2


A question

Can anybody tell me why interval estimation is better than point estimation? (shamsoon) —Preceding unsigned comment added by Shamsoon (talkcontribs) 14:53, 23 February 2008 (UTC)

Inserted a little about this point in the intro to the article. Melcombe (talk) 10:00, 26 February 2008 (UTC)

Disputed point

This "dispute" relates to the section "How to understand confidence intervals" and was put in by "Hubbardaie" (28 February 2008)

I thought I would start a section for discussing this.

Firstly there is the general question about including comparisons of frequentist and Bayesian approaches in this article when they might be better off as articles specifically for this comparison. I would prefer it not appearing in this article. I don't know if the same points are included in the existing frequentist vs Bayesian pages. Melcombe (talk) 16:46, 28 February 2008 (UTC)

Secondly, it seems that some of the arguments here may be essentially the same as that used for the third type of statistical inference -- fiducial inference -- which isn't mentioned as such. So possibly the "frequentist vs Bayesian" idea doesn't stand up. Melcombe (talk) 16:46, 28 February 2008 (UTC)

That fiducial intervals in some cases differ from confidence intervals was proved by Bartlett in 1936. Michael Hardy (talk) 17:17, 28 February 2008 (UTC)
I agree that moving this part of the article somewhere might be appropriate. I'll review Bartlett but, as I recall, he wasn't making this exact argument. I believe the error here is to claim that P(a<x<b) is not a "real" probability because x is not a random variable. Actually, that is not a necessary criterion. Both a and b are random in the sense that they were computed from a set of samples which were selected from a population by a random process.Hubbardaie (talk) 17:27, 28 February 2008 (UTC)
Also, a frequentist holds that the only meaning of "probability" is in regards to the frequency of occurrences over a large number of trials. Consider a simple experiment for the measurement of a known parameter by random sampling (such as random sampling from a large population where we know the mean). Compute a 90% CI based on some sample size and repeat this until we have a large number of separate 90% CI's, each based on their own randomly selected samples. We will find, after sufficient trials, that the known mean of the population falls within 90% of the computed 90% CI's. So, even in a strict frequentist sense, P(a<x<b) is a "real" probability (at least the distinction, if there is a real distinction, has no bearing on observed reality).Hubbardaie (talk) 17:32, 28 February 2008 (UTC)
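To make that repeated-sampling check concrete, here is a minimal simulation sketch (an illustration only: it assumes normally distributed measurements with a known standard deviation and the usual z-based 90% interval, and all of the numbers are made up for the example):

  import numpy as np

  rng = np.random.default_rng(0)
  true_mean, sigma, n, trials = 100.0, 15.0, 25, 100_000
  z90 = 1.645  # two-sided 90% point of the standard normal

  hits = 0
  for _ in range(trials):
      sample = rng.normal(true_mean, sigma, n)
      half_width = z90 * sigma / np.sqrt(n)          # known-sigma 90% CI half-width
      a, b = sample.mean() - half_width, sample.mean() + half_width
      hits += (a < true_mean < b)

  print(hits / trials)  # close to 0.90: about 90% of the 90% CIs cover the known mean

The only point of the sketch is that, under the assumed model, the long-run coverage matches the stated confidence level.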
I think one version of a fiducial derivation is to argue that it is sensible to make the conversion from (X random, theta fixed) to (X fixed,theta random) directly without going via Bayes Theorem. I thought that was what was being done in this section of the article. Melcombe (talk) 18:05, 28 February 2008 (UTC)
Another thing we want to be careful of is that while some of the arguments were being made by respected statisticians like R. A. Fisher, these were views that were not widely adopted. And we need to separate out where the expert has waxed philosophically about a meaning and where he has come to a conclusion with a proof. At the very least, if a section introduces these more philosophical issues, it should not present them as if they were uncontested and mathematically or empirically demonstrable facts.Hubbardaie (talk) 18:13, 28 February 2008 (UTC)
P(a<x<b)=0.9 is (more or less) the definition of a 90% confidence interval when a and b are considered random. But once you've calculated a particular confidence interval you've calculated particular values for a and b. In the example on the page at present, a = 82 - 1.645 = 80.355 and b = 82 + 1.645 = 83.645. 80.355 and 83.645 are clearly not random! So you can't say that P(80.355<x<83.645)=0.9, i.e. it's not true that the probability that x lies between 80.355 and 83.645 is 90%. This is related to the prosecutor's fallacy, as it involves confusing P(parameter|statistic) with P(statistic|parameter). To relate the two you need a prior for P(parameter). If your prior is vague compared to the info from the data then the prior doesn't make a lot of difference, but that's not always the case. I'm sure I read a particularly clear explanation of this fairly recently and I'm going to rack my brains to try to remember where now.Qwfp (talk) 18:40, 28 February 2008 (UTC)
I think fiducial inference is a historical distraction and mentions of it in the article should be kept to a very minimum. I think the general opinion is that it was a rare late blunder by Fisher (much like vitamin C was for Pauling). That's why I added this to the lead of fiducial inference recently (see there for ref):

In 1978, JG Pederson wrote that "the fiducial argument has had very limited success and is now essentially dead."

Qwfp (talk) 18:40, 28 February 2008 (UTC)
I think that you would find on a careful read of the prosecutor's fallacy that it addresses a very different point and it would be misapplied in this case. I'm not confusing P(x|y) with P(y|x). I'm saying that experiments show that P(a<x<b) is equal to the frequency F(a < x < b) over a large number of trials (which is consistent with a "frequentist" position). On another note, no random number is random *once* it is chosen, but clearly a and b were computed from a process that included a random variable (the selection from the population). If I use a random number generator to generate a number between 0 and 1 with a uniform distribution, there is a 90% chance it will generate a value over .1. Once I have this number in front of me, it is an observed fact but *you* don’t yet know it. Would you say that P(x>.1) is not really a 90% probability because the actual number is no longer random (to me), or are you saying that it’s not really random because the “.1” wasn’t random? If we are going down the path of distinguishing what is “truly random” vs. “merely uncertain”, I would say we would have to solve some very big problems in physics, first. Even at a very fundamental level, is there a real difference between “random” and “uncertain” that can be differentiated by experiment? Apparently not. The “truly random” distinction won’t matter if nothing really is random or if there is no way to experimentally distinguish “true randomness” from observer-uncertainty. Hubbardaie (talk) 20:05, 28 February 2008 (UTC)

I've tracked down at least one source that I was half-remembering in my (rather hurried) contributions above, namely

  • Lindley, D.V. (2000), "The philosophy of statistics", Journal of the Royal Statistical Society: Series D (The Statistician), 49: 293–337, doi:10.1111/1467-9884.00238

On page 300 he writes:

"Again we have a contrast similar to the prosecutor's fallacy:

  • confidence—probability that the interval includes θ;
  • probability—probability that θ is included in the interval.

The former is a probability statement about the interval, given θ; the latter about θ, given the data. Practitioners frequently confuse the two."

I don't find his 'contrast' entirely clear as the difference between the phrases "probability that the interval includes θ" and "probability that θ is included in the interval" is only that of active vs passive voice; the two seem to mean the same to me. The sentence that follows that ("The former...") is clearer and gets to the heart of it I think. I accept it's not the same as the prosecutor's fallacy, but as Lindley says, it's similar.

I think one way to make the distinction clear is to point out that it's quite easy (if totally daft) to construct confidence intervals with any required coverage that don't even depend on the data. For instance, to get a 95% CI for a proportion (say, my chance of dying tomorrow) that I know has a continuous distribution with probability 0 of being exactly 1:

  • Draw a random number between 0 and 1.
  • If it's less than 0.95, the CI is [0,1].
  • If it's greater than 0.95, the CI is [1,1], i.e. the single value 1 (the "interval" is a single point)

In the long run, 95% of the CIs include the true value, satisfying the definition of a CI. Say I do this and the random number happens to be 0.97. Can I say "there's a 95% probability that I'll die tomorrow" ?
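For what it's worth, this is easy to check by simulation (a sketch only; the true proportion of 0.3 is an arbitrary made-up value):

  import numpy as np

  rng = np.random.default_rng(1)
  true_p, trials = 0.3, 100_000    # any true value with zero probability of being exactly 1

  covered = 0
  for _ in range(trials):
      u = rng.random()
      lo, hi = (0.0, 1.0) if u < 0.95 else (1.0, 1.0)   # the data-free "confidence interval"
      covered += (lo <= true_p <= hi)

  print(covered / trials)   # about 0.95, even though each individual interval is useless

The long-run coverage really is 95%, which is exactly why the coverage property alone says nothing about any one realised interval.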

Clearly no-one would use such a procedure to generate a confidence interval in practice. But you can end up with competing estimation procedures giving different confidence intervals for the same parameter, both of which are valid and have the same model assumptions, but the width of the CIs from one procedure varies more than those from the other procedure. (To choose between the two procedures, you might decide you prefer the one with the smaller average CI width). For example, say (0.1, 0.2) and (0.12, 0.18) are the CIs from the two procedures from the same data. But you can't then say both "there's 95% probability that the parameter lies between 0.10 and 0.20" and "there's 95% probability that the parameter lies between 0.12 and 0.18" i.e. they can't both be valid credible intervals.

Qwfp (talk) 22:46, 28 February 2008 (UTC) (PS Believe it or not after all that, I'm not in practice a Bayesian.)

Good, at least now we have a source. But I have two other very authoritative sources: A. A. Sveshnikov "Problems in Probability Theory, Mathematical Statistics and the Theory of Random Functions", 1968, Dover Books, pg 286 and a very good online source, Wolfram's MathWorld site http://mathworld.wolfram.com/ConfidenceInterval.html. The former source states on pg 286 the following (Note: I could not duplicate some symbols exactly as Sveshnikov shows them, but I replaced them consistently so that the meaning is not altered)
"A Confidence interval is an interval that with a given confidence a covers a parameter θ to be estimated. The width of a symmetrical confidence interval 2e is determined by the condition P{|θ - θ'|<=e}=a, where θ' is the estimate of the parameter θ and the probability {emphasis added} P{|θ - θ'|<=e} is determined by the distribution law for θ'"
Here Sveshnikov makes it clear that he is using the confidence interval as a statement of a probability. When we go to the MathWorld site it defines the confidence interval as:
"A confidence interval is an interval in which a measurement or trial falls corresponding to a given probability." {emphasis added}
I find other texts such as Morris DeGroot's "Optimal Statistical Decisions" pg 192-3 and Robert Serfling's "Approximation Theorems in Mathematical Statistics" pg 102-7 where confidence intervals are defined as P(LB<X<UB) {replacing their symbols with LB and UB}. It is made clear earlier in each text that the P(A) is the probability of A and the use of the same notation for the confidence interval apparently merits no further qualification.
On the other hand, even though none of my books within arm’s reach make the distinction Qwfp’s source makes, I found some other online sources that do attempt to make this distinction. When I search on ( “confidence interval”, definition, “probability that” ) I found that a small portion of the sites that come up make the distinction Qwfp’s source is proposing. Sites that make this distinction and sites that define confidence interval as a probability both include sources from academic institutions and what may be laymen. Although I see the majority of sites clearly defining a CI as a ‘’’probability’’’ that a parameter falls within an interval, I now see room for a possible dispute.
The problem is that the distinction made in this “philosophy of statistics” source and in other sources would seem to have no bearing on its use in practice. What specific decisions will be made incorrectly if this is interpreted one way or the other? Clearly, anyone can run an experiment on their own spreadsheet that shows that the true mean will fall within 90% of the 90% CI’s when such CI’s are computed a large number of times. So what is the pragmatic effect?
I would agree to a section that, instead of matter-of-factly stating one side of this issue as the “truth”, simply points out the difference in different sources. To do otherwise would constitute original research.Hubbardaie (talk) 05:05, 29 February 2008 (UTC)
One maxim of out-and-out Bayesians is that "all probabilities are conditional probabilities", so if talking about probabilities, one should always make clear what those probabilities are conditioned on.
The key, I think, is to realise that Sveshnikov's statement means consideration of the probability P{(|θ - θ'|<=e) | θ }, i.e. the probability of that difference given θ, read off for the value θ = θ'. This is a different thing to the probability P{(|θ - θ'|<=e) | θ' }.
I think the phrase "read off for the value θ = θ' " is correct for defining a CI. But an alternative (more principled?) might be to quote the maximum value of |θ - θ'| s.t. P{(|θ - θ'|<=e) | θ } < a.
Either way, the wheels would still fall off in the example of the next section. Jheald (talk) 15:09, 29 February 2008
Sveshnikov said what he said. And I believe you are still mixing some unrelated issues about the philosophical position of the Bayesian (which I prefer to call subjectivist) vs. frequentist view. Now we are going in circles, so just provide citations of the arguments you want to make. The only fair way to treat this is as a concept that lacks consensus even among authoritative sources. —Preceding unsigned comment added by Hubbardaie (talkcontribs) 29 February 2008

I have moved some of the section that I believe is correct to before the "Dispute" marker in the article, so that the "disputed bit" is more clearly identified. I hope the remaining is what was meant. Melcombe (talk) 18:02, 13 March 2008 (UTC)

Two questions: the first is just seeking clarification about the dispute. Is the issue in dispute analogous to this: I bought a lottery ticket yesterday and had a 1 in a million chance of winning last night. Today, I have heard nothing about what happened in the lottery last night. Some of you are saying that I can no longer assert that I have a 1 in a million chance of having won the lottery - I either have or I haven't; objective uncertainty no longer exists and therefore I cannot assign a probability. Others of you are saying that I can still say that there's a 1 in a million chance that I've won (on the basis that I have no information now that I didn't have yesterday). [For "Winning lottery", read "getting a confidence interval that really does contain the true value".] —Preceding unsigned comment added by 62.77.181.1 (talk) 16:21, 30 April 2008 (UTC)

For the first question ... strictly the "dispute" should be about the accuracy of what is in the article, or about the accuracy of how it represents different interpretations or viewpoints of things, where such differences exist. Unfortunately the discussion here has turned into a miasma with a different type of dispute ongoing. Your analogy is only partly related to the question here, since here there are three quantities involved ... the two end points of the interval, and the thing in the middle. In your analogy, you have something (the outcome of the lottery draw) that is at one stage random, at another stage fixed (once drawn) but unknown, and at another stage fixed and known. For CI's the endpoints are at one stage random, and at another fixed and known, while in the traditional CI, the thing in the middle is always fixed (and unknown). For a Bayesian credible interval, the end-points are first random then fixed as before, while the thing in the middle is either random and unknown, or fixed and unknown, and in both cases the unknown-ness is dealt with by allowing the thing to be represented by a probability distribution; however, between the two stages, there is a change in the distribution used to represent the thing in the middle. So the difference is that, for traditional CI's, the probability statement relates to the end-points at the stage when they are random, with the thing in the middle treated as fixed, while, for Bayesian credible intervals, the probability statement relates to a stage where the end-points are treated as fixed. I expect that is as clear as mud, which is why having proper mathematical statements about what is going on is important. Melcombe (talk) 09:05, 1 May 2008 (UTC)

Second question: is the assertion in the main article (under methods of derivation) about a duality that exists between confidence levels and significance testing universally accepted? That is, if a population parameter has a finite number of possible values, and if I can calculate, for each of these, the exact probability of the occurrence of the observed or a "more extreme" outcome, is it safe from dispute to assert that the 95% confidence interval consists precisely of those values of the population parameter for which this probability is not less than 0.05? Or does that duality just happen to hold in common circumstances? —Preceding unsigned comment added by 62.77.181.1 (talk) 16:21, 30 April 2008 (UTC)

For the second question ... the significance test inversion works in general but note that (i) you don't need to restrict yourself as in "if a population parameter has a finite number of possible values"; (ii) each different significance test would produce different confidence intervals, so you should not think of there being a "the" confidence interval. The "significance test inversion" justification, which is reasonably simple, is one of the things that still needs to be included in the article. Melcombe (talk) 16:39, 30 April 2008 (UTC)
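To illustrate the test-inversion idea mentioned above, here is a small sketch (an illustration only: the observed count is invented, the grid resolution is arbitrary, and the two-sided p-value convention of doubling the smaller tail is just one of several possible choices; as noted above, a different test produces a different confidence interval):

  import numpy as np
  from scipy.stats import binom

  k, n, alpha = 7, 20, 0.05              # observed successes and sample size (made up)

  def p_value(p):
      # exact two-sided p-value for H0: true proportion = p (double the smaller tail)
      lower = binom.cdf(k, n, p)
      upper = 1 - binom.cdf(k - 1, n, p)
      return min(1.0, 2 * min(lower, upper))

  grid = np.linspace(0.0, 1.0, 10001)
  mask = np.array([p_value(p) >= alpha for p in grid])
  print(grid[mask].min(), grid[mask].max())   # the 95% CI: all p the test does not reject

Up to the grid resolution, this particular inversion reproduces the exact (Clopper-Pearson) interval discussed at Binomial proportion confidence interval.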

I have some confusion too. You calculate confidence intervals such that 90% will contain the true mean, but you don't know the true mean. Thus, you don't know which of your intervals contain it and which don't. So looking at a single one of your intervals, why is it not accurate then to say that the probability that this interval contains the true mean is 0.9? Are there experiments that give evidence against this? It seems that at least the common sense would be satisfied: just create a confidence interval from samples, then reveal the true mean and see if it is in the interval. Repeat many times. This would give something besides 90%? -kd

"why is it not accurate then to say that the probability that this interval contains the true mean is 0.9" — because in the viewpoint related to confidence intervals, there are no random quantities around once you have evaluated the endpoints of the interval. "Are there experiments that give evidence against this?" — computer simulation experiments are done for confidence intervals to check the coverage (proportion of intervals containing the true value). But to make this relate to your question (which is about a single outcome of an interval), you would need to first get a single outcome of the interval and then do multple realisations of confidence intervals but count only those whose end-points agree with the initial interval ... the proportion of these intervals covering the true value will be either 0 or 1. There is a probability of 90% only when the outcome of the confidence interval is still random. Melcombe (talk) 16:39, 20 May 2008 (UTC)
It is my understanding that once you have constructed the confidence interval from the outcome of an experiment, you can no longer speak of probabilities because the interval has been determined; it is no longer random. So you cannot say that the particular interval has a 90 percent probability of covering the true value. But you can say that with 90 percent confidence the true value lies in the interval. Note the difference between "confidence" and "probability".Ty8inf (talk) 03:49, 22 May 2008 (UTC)
I know there are no random quantities once the interval is evaluated. The true mean is not a random variable, but since it is unknown, isn't the "revelation" of it in a way an RV? In a real-life situation where you never learn the true mean, the experiment I meant was closer to this: Let's say I know the true mean of a RV. You get samples, construct a 90% confidence interval. Now, I know that your confidence interval does or does not contain the true mean with total certainty. However, from your perspective, if you had to bet on whether or not your interval contains the true mean once I revealed it to you, wouldn't you call the odds 9-to-1 that it did? Isn't this really the position of somebody looking at data when they will never know the true mean? I do recognize that the definition makes sense, that an interval either does or does not contain the true mean. However, am I still applying that incorrectly with the odds example I gave? Or, given that one out of every three doors you open will have $30 behind it, and you have a door in front of you. That door certainly has or does not have $30. But not knowing, can't you treat it as an RV in the sense that your expected gain is $10, or you'd pay up to $10 to play this game, however you want to look at it? -kd —Preceding unsigned comment added by 151.190.254.108 (talk) 18:00, 20 May 2008 (UTC)
You can't do what you suggest within the established interpretation of confidence intervals, where unknown quantities are treated as fixed. There are other approaches to interval estimation where you can and do use probability to represent a state of knowledge about unknown quantities: see Bayesian inference, Bayesian probability, Probability interpretations, credible interval and interval estimation. Sometimes the resulting intervals are very similar in outcome but the interpretations are very different. More importantly, the steps required to construct interval estimates for new problems are substantially different and there are types of problems for which only confidence intervals are currently practicable, and some where only credible intervals are practicable. Thus it is important to maintain a good distinction between what is meant by any given type of interval estimate. Melcombe (talk) 08:57, 21 May 2008 (UTC)
I sort of see what you mean, but can't you still carry out a sort of experiment as I wrote with the calculations of the interval by the definition of confidence interval? Can you perhaps elaborate on what you mean by "you can't do what you suggest"? I'm not clear what part of what I suggested you can't do. The credible interval article on wikipedia says "It is widely accepted, especially in the decision sciences, that "credible interval" is merely the subjective subset of "confidence intervals"." Now, if I'm totally blowing it let me know and I'll read more before asking again, but so far I can't get my head around why confidence intervals can't be interpreted as credible intervals by the experimenter when the true mean of his data is unknown. It seems to me like confidence intervals have randomness in the interval but not the mean, while credible intervals are the opposite, but I can't think of an example where they don't give the same information.
This article centres on the non-subjective approach to statistical inference, which is why "you can't do what you suggest" by which I mean that you can't treat the unknown thing you are trying to estimate as random. The reason why they can't be interpreted sometimes one way and sometimes another, is that the means of constructing the intervals is different and done within inference frameworks which are different. The article contains a comparison at Confidence interval#Comparison to Bayesian interval estimates. Melcombe (talk) 17:41, 21 May 2008 (UTC)

Example of a CI calculation going terribly wrong

Here's an example I put up yesterday at Talk:Bayesian probability

Suppose you have a particle undergoing diffusion in a one-degree-of-freedom space, so the probability distribution for its position x at time t is given by p(x | t) = (1/sqrt(2 pi t)) exp(-x^2 / 2t) (taking units in which the variance of the position at time t is just t).
Now suppose you observe the position of the particle, and you want to know how much time has elapsed.
It's easy to show that t' = x^2 gives an unbiased estimator for t, since E[x^2] = t.
We can duly construct confidence limits, by considering for any given t what spread of values we would be likely (if we ran the experiment a million times) to see for t'.
So for example for t=1 we get a probability distribution for t' = x^2 (a chi-squared distribution with one degree of freedom), from which we can calculate lower and upper confidence limits a and b, such that P(a <= t' <= b | t=1) = 0.95.
Having created such a table, suppose we now observe a particular value of x. We then calculate t' = x^2, and report that we can state with 95% confidence that t lies in the corresponding interval, or that this is the "95% confidence range".
But that is not the same as calculating P(t|x).


The difference stands out particularly clearly, if we think what answer the method above would give, if the data came in that x = 0.
From x = 0 we find that t' = 0. Now when t=0, the probability distribution for x is a delta-function at zero. So the distribution for t' is also a delta-function at zero. So a and b are both zero, and so we must report a 100% confidence range, t = 0.
On the other hand, what is the probability distribution for t given x? The particle might actually have returned to x=0 at any time. The likelihood function, given x=0, is proportional to 1/sqrt(t), which is non-zero for every t > 0.

Conclusion: data can come in, for which confidence intervals decidedly do not match Bayesian credible intervals for θ given the data, even with a flat prior on θ. Jheald (talk) 15:26, 29 February 2008 (UTC)

What about a weaker proposition, that given a particular parameter value, t = t*, the CI would accurately capture the parameter 95% of the time? Alas, this also is not necessarily true.

What is true is that a confidence interval for the difference calculated for a correct value of t would accurately be met 95% of the time.
But that's not the confidence interval we're quoting. What we're actually quoting is the confidence interval that would pertain if the value of t were t'. But t almost certainly does not have that value; so we can no longer infer that the difference will necessarily be in the CI 95% of the time, as it would if t did equal t'.
This can be tested straightforwardly, as per the request above for something that could be tested and plotted on a spreadsheet. Simply work out the CIs as a function of t for the diffusion model above; and then run a simulation to see how well it's calibrated for t=1. If the CIs are calculated as above, it becomes clear that those CIs exclude the true value of t a lot more than 5% of the time. Jheald (talk) 15:38, 29 February 2008 (UTC)
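A minimal simulation sketch along those lines (it assumes the setup reconstructed above, i.e. x distributed N(0, t) so that t' = x^2 is unbiased for t; the "plug-in" interval reads the table at t = t', while the "inverted" interval collects every t whose band contains the observed t'):

  import numpy as np
  from scipy.stats import chi2

  rng = np.random.default_rng(2)
  t_true, trials = 1.0, 100_000
  q_lo, q_hi = chi2.ppf([0.025, 0.975], df=1)   # central 95% band for x**2 / t

  plug_in_hits = inverted_hits = 0
  for _ in range(trials):
      t_hat = rng.normal(0.0, np.sqrt(t_true)) ** 2
      # "read the table at t = t_hat": the band that would apply if t really were t_hat
      plug_in_hits += (t_hat * q_lo <= t_true <= t_hat * q_hi)
      # proper inversion: all t for which the observed t_hat lies inside that t's band
      inverted_hits += (t_hat / q_hi <= t_true <= t_hat / q_lo)

  print(plug_in_hits / trials)   # well below 0.95 (roughly two-thirds)
  print(inverted_hits / trials)  # about 0.95, as a correctly inverted CI should be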
You make the same errors here that you made in the Bayesian talk. You show an example of a calculation for a particular CI, then after that, simply leap to the original argument that the confidence of a CI is not a P(a<x<b). You don't actually prove that point and, as my citations show, you would have to contradict at least some authoritative sources in that claim. All we can do is write a balanced article that doesn't present the claim "A 95% CI does not have a 95% probability of containing the true value" as an undisputed fact. It seems, again, more like a matter of an incoherent definition that can't possibly have any bearing on observations. But, again, let's just present all the relevant citations.Hubbardaie (talk) 16:18, 29 February 2008 (UTC)
Let's review things. I show an example where there is a 100% CI that a parameter equals zero; but the likelihood function is proportional to 1/sqrt(t), non-zero for every t > 0.
Do you understand what that Likelihood function means? It means that the posterior probability P(t|data) will be different in all circumstances from a spike at t=0, unless you start off with absolute initial certainty that t=0.
That does prove the point that there is no necessity for the CI to have any connection with the probability P(a<t<b | data). Jheald (talk) 16:44, 29 February 2008 (UTC)
We are going in circles. I know exactly what the likelihood function means, but my previous arguments already refute your conclusion. I just can't keep explaining it to you. Just show a citation for this argument and we'll move on from there.Hubbardaie (talk) 18:03, 29 February 2008 (UTC)
A confidence interval will in general only match a Bayesian credible interval (modulo the different worldviews), if
  • (i) we can assume that we can adopt a uniform prior for the parameter;
  • (ii) the function P(θ'|θ) is a symmetric function that depends only on (θ'-θ), with no other dependence on θ itself;
  • and also (iii) that θ' is a sufficient statistic.
If those conditions are met (as they are, eg in Student's t test), then one can prove that P(a<t<b | data) = 0.95.
If they're not met, there's no reason at all to suppose P(a<t<b | data) = 0.95.
Furthermore, if (ii) doesn't hold, it's quite unlikely that you'll find, having calculated a and b given t', that {a<t<b} is true for 95% of cases. Try my example above, for one case in particular where it isn't. Jheald (talk) 18:45, 29 February 2008 (UTC)
Then, according to Sveshnikov and the other sources I cite, it is simply not a correctly computed CI, since the CI must meet the standard that the confidence IS the P(a<x<b). I repeat what I said above. Let's just lay out the proper citations and represent them all in the article. But I will work through your example in more detail. I think I can use an even more general way of describing possible solution spaces for CI's and whether, for the set of all CI's over all populations where we have no a priori knowledge of the population mean or variance, P(a<x<b) is identical to the confidence. Perhaps you have found a fundamental error in the writings of some very respected reference sources in statistics. Perhaps not.Hubbardaie (talk) 22:53, 29 February 2008 (UTC)
After further review of the previous arguments and other research, I think I will concede a point to Jheald after all. Although I believe as Sveshnikov states, that a confidence interval should express a probability, not all procedures for computing a confidence interval will represent a true P(a<x<b) but only for two reasons:
1) Even though a large number of confidence intervals will be distributed such that the P(a<x<b)=the stated confidence, there will, by definition, be some that contradict prior knowledge. But in this case we will still find that such contradictions will apply to a small and unlikely set of 95% of CI's (by definition)
2) There are situations, especially in small samples, where 95% confidence intervals contradict prior knowledge even to the extent that a large number of 95% CI's will not contain the answer 95% of the time. In these cases it seems to be because prior knowledge contradicts the assumptions in the procedure used to compute the CI. In effect, an incorrectly computed CI. For example, suppose I have a population distributed between 0 and 1 by a function F(X^3) where X is a uniformly distributed random variable between 0 and 1. CI's computed with small samples using the t-stat will produce distributions that allow for negative values even when we know the population can't produce that. Furthermore, this effect cannot be made up for by computing a large number of CI's with separate random samples. Less than 95% of the computed 95% CI's will contain the true population mean. (A simulation sketch of this point follows after this comment.)
The reason, I think, Sveshnikov and others still say that CI's do, in fact, represent a probability P(a<x<b) is because that is the objective, by definition, and where we choose sample statistics based on assumptions that a priori knowledge contradicts, we should not be surprised that we produced the wrong answer. So Sveshnikov et al would, I argue, just say we picked the wrong method to compute the CI and that the objective should always be to produce a CI that has a P(a<x<b) as stated. We have simply pointed out the key shortcoming of non-Bayesian methods when we know certain things about the population which don't match the assumptions of the sampling statistic. So, even though this concession doesn't exactly match everything Jheald was arguing, I can see where he was right in this area. Thanks for the interesting discussion.Hubbardaie (talk) 17:38, 1 March 2008 (UTC)
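Here is a minimal sketch of the small-sample experiment described in point 2) above (the sample size of 5 and the number of trials are arbitrary choices for illustration; it uses the ordinary t-based interval for the mean of X^3, whose true value is 0.25):

  import numpy as np
  from scipy.stats import t as t_dist

  rng = np.random.default_rng(3)
  n, trials = 5, 100_000
  true_mean = 0.25                        # E[X**3] for X uniform on (0, 1)
  t_crit = t_dist.ppf(0.975, df=n - 1)    # two-sided 95% critical value

  hits = negatives = 0
  for _ in range(trials):
      y = rng.random(n) ** 3
      half_width = t_crit * y.std(ddof=1) / np.sqrt(n)
      a, b = y.mean() - half_width, y.mean() + half_width
      hits += (a <= true_mean <= b)
      negatives += (a < 0)                # intervals reaching below 0, an impossible value

  print(hits / trials)       # typically a few points below the nominal 0.95
  print(negatives / trials)  # a sizeable fraction of intervals include impossible values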

Non-Statisticians

Another question: Can you easily calculate a confidence interval for a categorical variable? How is this done? For a percentage, a count...? There is an example in the opening of the article about CIs for political candidates, implying that it's no big deal. But I was reading somewhere else that CIs don't apply for categorical (ie which brand do you prefer?) type variables. SPSS is calculating them, but I'm having a hard time interpreting this information. Thanks for your help! —Preceding unsigned comment added by 12.15.60.82 (talk) 20:27, 21 April 2008 (UTC)

Easily? That depends on your background. I suggest you either look at the section of this article "Confidence intervals for proportions and related quantities", or go directly to Binomial proportion confidence interval for something more complete. Melcombe (talk) 08:53, 22 April 2008 (UTC)

I have to say that I came to this page to get an understanding of CIs and how they are calculated - but as the page presently stands it is simply too complex for anybody who doesn't have a good grounding in statistics to understand. Not a criticism per se, but I thought that needed to be pointed out.

I assume that this is being edited by statisticians but just as an example the opening paragraph: "In statistics, a confidence interval (CI) is an interval estimate of a population parameter. Instead of estimating the parameter by a single value, an interval likely to include the parameter is given. How likely the interval is to contain the parameter is determined by the confidence level or confidence coefficient. Increasing the desired confidence level will widen the confidence interval."

Is simply too complex and relies on too much previous knowledge of stats to make sense. WIKI is after all supposed to be written for the general reader. I know that to you guys this paragraph will seem easily understandable but trust me, for a non-statistician it is useless. Maras brother Ted (talk) 20:57, 27 March 2008 (UTC)

I agree completely and I'm a statistician. There appear to be some statistician-wannabes in here pontificating about the old Bayesian vs. Frequentist debate and whether the term "confidence interval" refers to a range that has a stated probability of containing the value in question. There is no such practical difference in the real world and the entire discussion is moot. As others have pointed out on this discussion page, many mathematical texts on statistics (as the discussions cite more than once) explicitly define a confidence interval in terms of a range that has a stated probability of containing the true value. Any discussion that a confidence interval is NOT really about a probability of a value falling in a range is entirely irrelevant to the practical application of statistics. Anyone who disagrees should just start an article to answer the question "How Many Angels Can Dance on the Head of a Pin?". It would be every bit as productive. I just wish I had more time to rewrite this article.ProfHawthorne (talk) 19:47, 28 March 2008 (UTC)
Hear, hear! Um, now.... who's going to fix it? Listing Port (talk) 20:30, 28 March 2008 (UTC)

...and I am a statistician and I think "ProfHawthorne" is wrong. I note that this talk page comment is (so far) his or her only edit to Wikipedia. Michael Hardy (talk) 21:00, 28 March 2008 (UTC)

I'm not a statistician and I think the intro is terrible. For one thing, it has seven paragraphs, violating WP:LEAD, and for another its first sentence (and many other sentences) infringes on WP:JARGON, which is "especially important for mathematics articles." Listing Port (talk) 21:15, 28 March 2008 (UTC)
I agree the lead section needs improvement. I do not agree with "ProfHawthorne", though. I hope we can develop something that gets the gist across clearly to the lay reader, without becoming technically inaccurate. -- Avenue (talk) 00:42, 29 March 2008 (UTC)
When did I argue that we can't develop something that makes the point without being technically inaccurate? Everything I said was entirely accurate. See my comments to Hardy below.ProfHawthorne (talk) 14:04, 29 March 2008 (UTC)

Just wanted to say thanks for all of the replies and hope this begins some useful debate and changes. I also had to add that I am not ungrateful to the work people have done on this but simply that it is not "understandable." To give you an idea I found the answer much easier to understand in a book entitled "Advanced statistics" in clinical trials" or something similar in my university library. Maras brother Ted (talk) 11:51, 29 March 2008 (UTC)

Hardy is correct that this talk page was my first comment in Wikipedia. But the conversation about confidence intervals is so misguided I could no longer sit on the sidelines. If Hardy thinks I'm wrong, point to the "error" specifically. Start with a general review of stats texts and note the overwhelming majority that define a confidence interval in terms of an interval having a stated probability of containing a parameter. Such a relationship is, in fact, the basis of Tchebysheff's inequality, a fundamental constraint on all probability distributions. If you are a statistician, you would already know this.ProfHawthorne (talk) 13:46, 29 March 2008 (UTC)

I make the following points:

  • This article has as much need to be technically correct as any other article on mathematics, so some complexity is necessary, although we can try for a simpler introduction;
  • The best way to promote misunderstanding is to use vaguely defined concepts and ill-informed versions of these, and a lot of supposed Bayesian-frequentist comparisons on these pages seem to suffer from this. "Confidence interval" does have a well-defined meaning in mathematical statistics and it is best not to cloud the issue of what "Confidence interval" means. If there is to be discussion and comparisons of other interpretations of interval estimates then this can most sensibly made in connection with either the interval estimation articles or the more general articles on statistical inference. There really is no point in having repeated "this is wrong because it isn't Bayesian" stuff put in every article that tries to give a description of what a particular statistical approach is doing;
  • I don't think that these pages should become a "cookbook" for statistics where people think they can find something to tell them how to analyse their data without actually thinking. There seems too much danger of just looking-up a summary table without remembering about all the assumptions that are built-in;
  • I wasn't aware that these pages were trying to replace existing text-books;
  • There needs to be some recognition of the fact that "statistics" is a difficult subject so that anything that implies that people can just look at these articles and do "statistics" for themselves should be avoided. If they haven't done a course in statistics at some appropriate level, then they should be encouraged to find a properly trained statistician, and preferably one with relevant experience.

Melcombe (talk) 09:58, 31 March 2008 (UTC)

Yes, yes, yes, yada yada, that's all fair and good, but the important fact is that your changes are excellent! Thanks! :) Listing Port (talk) 22:13, 31 March 2008 (UTC)
I think Melcombe's additions are very reasonable. However, it now contradicts the "meaning & interpretation" section. I see that someone has modified that section back to the old misconceived Bayesian vs. Frequentist distinction. The author removed the "no citations" and "controversy" flag but still offers no citations. It also contradicts the definition Melcombe aptly provided, so I think we can still call it controversial. Since no citations are provided, it's still only original research (even if it weren't controversial, which it is). I'll add the flags back to this section and we can all discuss this. Hubbardaie (talk) 15:25, 4 April 2008 (UTC)

Dispute tag back

I just added the dispute and citation flags back to the "meaning" section. Another point on the controversy is to note that the entire example given earlier in the article consistently uses the notation of the form P(a<X<b), indicating a confidence interval is actually a range with a probability of containing a particular value - contrary to the uncited claims of the disputed section. —Preceding unsigned comment added by Hubbardaie (talkcontribs) 15:31, 4 April 2008 (UTC)

I removed the dispute flag having deleted a block that I thought was not necessary. Since the dispute flag was put back I have rearranged the text so that stuff on similar topics is under a single section, meaning that there are now two dispute tags left in. I think we need to be clear about exactly what bits are of concern. I think that the subsection headed "How to understand confidence intervals" is/was OK, but the next subsection headed "Meaning of the term confidence" is newer and possibly not needed (or better off in some other article). I moved the stuff headed "Confidence intervals in measurement" into a separate section and left another dispute tag there. My immediate question is whether it is only these two chunks of text that there may be a problem with? Melcombe (talk) 16:30, 4 April 2008 (UTC)
Calling this a disputed issue would even be a bit too generous. The section about how to interpret the meaning of a confidence interval is simply wrong by a country mile. I work with a large group of statisticians and I mentioned this article to my peers, today. One thought it was a joke. I'm on the board for a large conference of statisticians and I can tell you that this idea that a confidence interval is not really a range with a given probability of containing a value would probably (no pun) be news to most of them. The confidence coefficient of a confidence interval is even defined as the probability that a given confidence interval contains the estimated quantity. Most of the article and the examples given are consistent with this accepted view of confidence intervals but this disputed section goes off on a different path. From my read of the other discussions, it appears that the only source anyone can find for the other side of this debate is a single philosophy book which was not quoted exactly. There are many other good sources provided and they all seem to agree with those who say the confidence coefficient really IS a probability. Or did I get taken by a late April Fool's joke?DCstats (talk) 19:43, 4 April 2008 (UTC)
Well said. I suggest the simplest fix is just the deletion of that entire section. It is inconsistent with the rest of the article, anyway.ERosa (talk) 13:22, 6 April 2008 (UTC)

I suggest a major rewrite

I have read the informal and the theoretical definitions of "confidence interval" in this article and find the informal definition to be way too vague, and the theoretical one to almost completely miss the point. I strongly urge a major rewrite of this article.

By the way, I took some time to google "confidence interval" to find out what is readily available on the web via course notes or survey articles. I spent less than 30 minutes on this, but I must say that there is a huge amount of poor and misleading exposition out there, which may explain in part where this article got its inspiration from. Then again, maybe not. Daqu (talk) 02:49, 6 April 2008 (UTC)

I agree. The part of this article that discusses how to "properly" interpret the meaning of the confidence interval is being refuted by many in here for good reason. Someone has some very wrong ideas about confidence intervals, statistics and just basic probability. The previously noted lack of sources in that section should be our first clue. ERosa (talk) 13:20, 6 April 2008 (UTC)

IMO one fundamental problem is that confidence intervals don't actually mean what most people (including many scientists who use them) actually believe them to mean, and explaining this issue is non-trivial, especially when the potential audience comes with erroneous preconceptions and may not even have any suspicion that they are fundamentally wrong.Jdannan (talk) 02:43, 24 April 2008 (UTC)

I agree there are a lot of misconceptions, some shown here in this article. But the body of the article uses language quite consistent with every mathematical statistics text I have on my shelves. The article correctly explains that for a 95% confidence interval the following must hold: P(a < X < b) = 0.95.
In other words, at least for this part of the article, the author correctly describes a confidence interval as an interval with a stated probability of containing the value. Others in this discussion that attempt to directly refute this use of the term end up contradicting most of the math in this article and every reputable text on the topic (only one side on this discussion has provided citations). —Preceding unsigned comment added by ERosa (talkcontribs) 14:11, 24 April 2008 (UTC) Oops, forgot to signERosa (talk) 14:15, 24 April 2008 (UTC)
Exactly. I think we have some problems with some flawed thinking about stats in here.DCstats (talk) 19:19, 25 April 2008 (UTC)
But even the language here is open to misinterpretation. Prior to making the observation and calculating the interval, one can say that a confidence interval will have a certain probability of containing the true value (ie, considering the as-yet-unknown interval as a sample from an infinite population - 95% of the confidence intervals will contain the parameter). But once you have the numbers, this is no longer true, and one cannot say that the parameter lies in the specific interval (once it has been calculated) with 95% probability. IME it is not trivial to explain this in simple but accurate terms. In fact numerous scientists in my field don't appear to understand it.Jdannan (talk) 00:29, 25 April 2008 (UTC)
But ... even "once it has been calculated" there is still the probability-linked interpretation that, (whatever the outcome) it is the outcome of a procedure that had a given probability of covering the "true" value". Melcombe (talk) 08:56, 25 April 2008 (UTC)
I suppose anything ever written is open to misinterpretation in the sense that some people will misinterpret it. But the expression used above by ERosa is entirely consistent with every credible reference used in universities (one in which I've taught). Whatever misinterpretation it might be open to by laymen, apparently the mathematical statisticians find it to be completely unambiguous.
Your claim that you only have a confidence interval "prior to making the observations and calculating the interval" but that it is no longer true "once you have the numbers" belies some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedures for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable.DCstats (talk) 19:19, 25 April 2008 (UTC)
Not sure how you got that from what I wrote. I (obviously) never made the absurd statement that one only has a confidence interval prior to making the observations. What I was trying to convey was that the statistical property (eg 90% coverage) can only ever apply to a hypothetical infinite population of CIs calculated in the same way from hypothetical observations that haven't been made. Saying that a given confidence interval has a 90% chance of containing the parameter is equivalent to saying that a given coin toss, once you have already seen it come up heads, has a 50% chance of landing tails. Are you really saying that you would be happy with a statement that the confidence interval [0.3,0.8] has a 50% chance of containing a parameter, when that parameter is known to be an integer? (eg the number of children of a given person)?Jdannan (talk) 22:55, 25 April 2008 (UTC)
I thought I was getting it from your direct quotes and, on further reading, it seems I quoted you accurately. Anyway, you are using an irrelevant example. When a 90% CI is computed, it is usually the case that all you know about it is the samples taken, not the true value. You are correct when you say that in an arbitrarily large series of 90% CI's computed from random samples, 90% will contain the estimated value. You are incorrect when you claim that saying a 90% CI has a 90% chance of containing the answer is like saying that a coin toss has a 50% chance of being tails once you have already seen it come up heads. The two are different because in the case of a 90% CI gathered from a random sample, you do not have certain knowledge of the parameter (e.g. the mean). That's why it's called a 90% CI and not a 100% certain point.
If you have additional information other than the samples themselves, then the methods that do not consider that knowledge do not apply and the answer they give wouldn't be the proper 90% CI. In those cases where other knowledge is available, 90% CI still means the same thing, but you have to use a different method (discrete Bayesian inversions for your integer example) to get an interval that is actually a 90% CI. You seem to be confusing situation where you somehow obtained the knowledge of true value and the situation where we are not certain of the value and can only estimate it after a random sample.
I would really like to see what source you are citing for this line of reasoning. It is truly contrary to all treatment of the concept in standard sources and is contrary even to the mathematical expressions used throughout this article where a confidence interval is described as a probability that a value is between two bounds.DCstats (talk) 02:07, 26 April 2008 (UTC)
"If you have additional information other than the samples themselves, then the methods that do not consider that knowledge do not apply and the answer they give wouldn't be the proper 90% CI." Why on earth not? If the (hypothetical, infinite) population of confidence intervals calculated by a specific method have a given probability of coverage, then they are valid confidence intervals. Can you give any reference to support your assertion that this is not true when a researcher has prior knowledge? There are numerous simple examples where perfectly valid methods for generating confidence intervals with a specific coverage will produce answers where the researcher knows (after looking at a given interval so generated) whether the parameter does, or does not, lie in that specific interval.Jdannan (talk) 03:06, 26 April 2008 (UTC)
Because "on earth" you haven't met the conditions required for the procedure to produce the right answer. If you already know the exact mean of a population, THEN take a random sample, the interval will not be a 90% CI, by definition. You just answered your own question when you asked "If the (hypothetical, infinite) population of confidence intervals calculated by a specific method have a given probability of coverage, then they are valid confidence intervals." If those conditions really apply and the probability is valid, then the CI is valid. If you know in each of those cases what the true mean is or that the answer must fit other constraints (like being an integer) then the the standard t-stat or z-stat procedure is NOT your 90% CI. The hypothetically infinite population thought experiment will prove it. If you take a random sample of a population where you already know the mean to be greater than 5, then use the t-statistic to compute a "90%" CI of 2 to 4, then we know that the actual probability the interval contains the answer is 0% (because of our prior knowledge). The meaning of 90% CI didnt change, but using a procedure that incorrectly ignores known constraints will, of course produce the wrong answer. You just use a different method to compute the PROPER 90% CI that would meet the test over a large number of samples. You have simply misunderstood the entire point. Just like in any physics problem, if you leave out a critical factor in computing, say, a force, you have not proven that force means other than it does. You simply proved that you get the wrong answer when you leave out something. If you need a source for the fact that a 90% CI *really* does mean that an interval X, Y has a probability P(X<m<Y) where m is the paremeter, then you only need to pick up any mathematical stats text. I see references offered earlier in this discussion that look perfectly valid. Can you offer any reference that a 90% CI is NOT defined in this way? —Preceding unsigned comment added by DCstats (talkcontribs) 13:12, 26 April 2008 (UTC)
I forgot to sign that. Also, if you have prior knowledge and wish to compute a 90% CI where there really is a 90% chance the interval contains the answer, then you have to use Bayesian methods. The t-stat and z-stat only apply when you don't have other knowledge of constraints. That's why they are called non-Bayesian. So if you set up a test where you took a large number of samples and 90% of the 90% CIs contain the true value, and you want to account for other known constraints, then the non-Bayesian methods will produce the wrong answer. The definition of 90% CI has not changed, mind you. But the procedures that fail to take into account known constraints will not meet the test of repeated sampling.DCstats (talk) 13:28, 26 April 2008 (UTC)
I think you might want to clarify that you are referring to prior knowledge; over a large number of samples, you do have to know the true mean of the population in order to know that 90% of the 90% CI's contain the true population mean. That aside, DCstats is right. If we know that the mean is not between 2 and 4, then 2 to 4 cannot be the 90% CI according to how the term is defined. So I don't understand what Jdannan is saying. Are you saying that if you have knowledge that makes it 100% certain that the mean is not between 2 and 4, that 2 and 4 are still the 90% CI as long as you used a method that (incorrectly) ignored prior knowledge? That makes no sense. If a researcher knows that the answer cannot be within the interval she just computed, then she just computed it incorrectly. What's the confusion?ERosa (talk) 13:50, 26 April 2008 (UTC)
Oh dear. I asked for a reference, not an essay. Never mind, you have (both) demonstrated quite conclusively that you are hopelessly confused over this, so this will be my last comment on this topic. One specific error in what you are saying: a method that has 90% coverage is indeed a valid method for generating confidence intervals even though it may generate stupid intervals on occasion. Here are two references that support my point (many supposed definitions are rather vague and blur the distinction between frequentist and Bayesian probability): [1] [2]. One of them even specifically makes the point that a CI does not generally account for prior information. Let's go back to my trivial example about integers: if X is an unknown integer, and x is an observation of X that has N(0,1) error then it is trivial to construct a 25% (say) confidence interval centred on the observed x - ie (x-0.32,x+0.32) and these intervals WILL have 25% coverage (that is, if one draws more x from the same distribution and creates these intervals, 25% will contain X) and yet these intervals will not infrequently contain no integers at all. The fact that one can create OTHER intervals, or do a Bayesian analysis, has no bearing on whether these intervals are valid 25% confidence intervals for X, and I do not believe you will find a single credible reference to say otherwise.Jdannan (talk) 02:44, 28 April 2008 (UTC)
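For what it's worth, that integer example is easy to simulate (a sketch only; 0.32 is roughly the half-width for which a standard normal falls within +/-0.32 of its mean about 25% of the time, and the particular integer chosen is arbitrary):

  import numpy as np

  rng = np.random.default_rng(4)
  X, trials = 3, 100_000                     # the unknown integer (any integer will do)

  covers = no_integer = 0
  for _ in range(trials):
      x = rng.normal(X, 1.0)                 # observation of X with N(0, 1) error
      lo, hi = x - 0.32, x + 0.32
      covers += (lo < X < hi)
      no_integer += (np.floor(hi) < np.ceil(lo))   # interval contains no integer at all

  print(covers / trials)      # about 0.25, so these are valid 25% confidence intervals
  print(no_integer / trials)  # yet a substantial fraction contain no integer whatsoever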
You are an excellent example of how a little knowledge (in this case, very little) can be a dangerous thing. First, it was already made clear that the citations provided previously in this discussion (Sveshnikov, MathWorld, etc.) are sufficient to make the point that if a and b are a 90% CI of x, then P(a<x<b)=.9, literally. That's why you will find that notation consistently throughout all (valid and widely accepted) sources on this topic. If an interval does not have a 90% chance of containing x, then it is simply not the 90% CI of x, period. Even the first of your own citations is consistent with this and it directly contradicts the definition of your second source. Did you even read these? Furthermore, the Bayesian vs. frequentist distinction has no bearing in practice and is only an epistemological issue. It can be (and has been) shown that where a Bayesian probability of 80% has been computed, over a large number of trials, the frequency will be such that 80% of the events will occur. The Bayesian vs. frequentist distinction simply has no relevance to the practical application here and your example is just as moot as it was before. I believe you are one of a group (hopefully a very small group) of laymen who continue to propagate a fundamental misconception about the nature of the Bayesian vs. frequentist debate and attempt to redefine what should be fairly straightforward concepts like confidence intervals with this confused concept. ERosa (talk) 13:29, 29 April 2008 (UTC)
You know, I think you wrote that first sentence without seeing even a scrap of irony! -- Avenue (talk) 15:09, 29 April 2008 (UTC)
There would only be irony if I were the one with the "very little knowledge". In this group, that is definitely not the case. I'm simply explaining that the literal meaning of the mathematical expression P(a<x<b) really is a probability that x will fall between a and b. This is contrary to one source provided by Jdannan and completely consistent with the other source provided by Jdannan. (a,b) is a confidence interval of x with C confidence if and only if P(a<x<b)=C. The second source Jdannan provided actually says (ironically, in complete disagreement with his first source) that a 95% CI does NOT actually have a 95% chance of containing the estimated parameter. In other words, that source is saying that where (a,b) is a 90% CI, P(a<x<b) does NOT equal .9. This is quite different from the Sveshnikov and Mathworld sources provided above as well as different from every graduate level text you will find on the topic. Period. There are sources already provided (one of which is Jdannan's own source) stating that P(a<x<b) is equal to the stated confidence of the interval. I think it is fair to ask for a graduate level text (not a website) that explicitly states that P(a<x<b)<>C. If you can produce a valid source, then we will have shown only that different valid sources take opposing positions (since the sources in favor of my position are certainly valid). If you can't produce one, then we will have settled who should be seeing which irony. Fair enough? ERosa (talk) 18:50, 29 April 2008 (UTC)
Certainly it is true that P(a<x<b) <> P(a<x<b) if the P,a,x or b on the left hand side are not the same as or do not have the same meaning as the P,a,x or b on the right hand side. Just because the same symbols are used does not make them the same. You do actually have to think about what these mean and how they are interpreted. Which is why the article here, and any other on statistics, needs to be specific about such things. Melcombe (talk) 09:20, 30 April 2008 (UTC)
Time for a little bit of actual stats knowledge in this conversation and, for that matter, a little bit of basic logic. When the values of a, x, and b are held constant, P(a<x<b) means the same in both situations. This is why identical notation is used throughout every textbook on the topic and it is used without any hint that somehow the meaning had changed from one chapter to the next. As ERosa pointed out, the second source provided by Jdannan is not consistent with most valid sources on this topic. This source stated that a confidence coefficient (that's the probability associated with a confidence interval) should not be interpreted as a probability. This is wrong. At least four citations have been given that support the definition that the confidence coefficient is actually a probability, and all notation used is consistent with this without ever explaining that P() on one page means probability and on another page it doesn't. It appears that the citation given by Avenue below was not in any peer-reviewed journal or text that would be used in any stats course. On a side note, I showed a petroleum-industry statistician this debate just a few minutes ago and we both agreed that Jdannan and Avenue (assuming they are different persons and not socks) must be some terribly confused armchair statisticians. I'll repeat the previous challenges in here to produce a source for the opposing argument that is not just an unpublished document or web page written by a community college teacher. DCstats (talk) 18:22, 30 April 2008 (UTC)
You have a genius for missing the point. The symbol P is the one there appears to be confusion over, not a, x, and b. We don't know who your oil industry friend is, or what you told them, so that doesn't help the debate here. Accusing someone with a thousand times more edits than you (and that's not an exaggeration) of being a sock puppet is laughable. Yes, the link I gave is not from a textbook. I offered it to help clear up your confusion, not as something to put directly in the article. A better source is quoted to the left. This is entirely consistent with Jdannan's second link above. -- Avenue (talk) 00:23, 1 May 2008 (UTC)
To DCstats, Avenue also has a genius for missing the point. And I think you may be on to something about the sock puppets. Apparently Avenue believes that the number of edits somehow insulates him or her from that suspicion. I think it is more of a measure of how much time someone has on their hands (I'm not trying to be discouraging to all of the useful contributors of wikiland).
To Avenue, as to whether or not you are getting the point, there are at least four widely accepted (and published) references that directly contradict your position and the position of this source (if it is valid, it would be the first). They were offered by Hubbardaie much earlier in the discussion. But I will make a gesture in the spirit of community. I'll review the source you just provided and if I can confirm it is valid then I will concede that there is a debate. We obviously can't just say that the debate is settled in favor of the single source you gave because the other sources - a couple of which I have confirmed - obviously contradict it. So, those are the choices: Either Hubbardaie's sources are right and a confidence interval is literally interpreted as a probability or there is a debate and neither can be held as a consensus agreement. Given the stature of the previously offered sources in favor of the "CI is a probability" position, there can be no other reasonable path. ERosa (talk) 01:06, 1 May 2008 (UTC)

(reset indent) If you want to pursue the sock puppet accusation, go ahead. From Jdannan's home page it seems we live in different countries, so checkuser should be clearcut.

It is not widely accepted that Hubbardaie's sources contradict the position taken by mine. The only comment he got after posting them suggested he didn't fully understand the main one he quotes, by Sveshnikov. If interpreted correctly, there is no apparent contradiction. He may easily have also misunderstood the last two, which he hasn't quoted. The Mathworld link uses odd language ("A confidence interval is an interval in which a measurement or trial falls corresponding to a given probability", my emphasis), and doesn't discuss interpretation at all. Not one to rely on, I think.

No, I don't believe I misunderstood the sources at all and, in fact, I made comments at the end of the "Example of a CI calculation going terribly wrong" section that are not inconsistent with yours. The problem is that the literal notation, the same notation used in this article, refers to a probability that a value is between two points (P(LB<x<UB)). As you have previously indicated, there are conditions where the 90% confidence interval may not actually have a 90% probability of containing the answer. But those cases are limited to situations where we know something about the underlying distribution which is not addressed in classical methods. Also, the Mathworld source goes on to describe an example that, again, consistently uses the notation that x has a probability of being in the interval. It addresses the issues as if the definition is sufficient and no other "interpretation" is required. This is true for the other texts I cite.
I think this is just a simple confusion. Avenue and Jdannan argue that, although the notation for a confidence interval may imply a probability, the application of classical methods (which do not allow us to use this additional knowledge of the underlying distribution) will produce an interval which does not represent the stated probability. This is true. ERosa and DCstats argue that, in this case, the computed interval would not be the "true" 90% confidence interval because there is not a 90% chance the interval contains the value, as all the notation implies.
NO, that is NOT what Avenue and Jdannan argue. You are misinterpreting them. There is no "additional knowledge" to use. As I state below, the important point is that the population mean is fixed. Whether we "know" its value is totally irrelevant. darin 69.45.178.143 (talk) 17:23, 2 May 2008 (UTC)
I'm curious about another distinction that is being made here. Avenue and Jdannan seem to agree that, over a large number of separate samplings, 90% of the confidence intervals will contain the population mean. But isn't that what probability means according to the frequentist? X has probability P if, over a large number of trials, P=number of times X occurs/number of trials.
YES, that is exactly what the frequentist interpretation of confidence interval means. So WHY ARE YOU SO CONFUSED?? darin 69.45.178.143 (talk) 17:23, 2 May 2008 (UTC)
In short, it seems that Wasserman would say that if a classical method is used to compute a 90% CI, and we know other information that means there is not a 90% chance it contains the value, then the 90% CI is still "correct" - it just doesn't mean anything about a 90% probability. DCstats and ERosa argue that the literal interpretation of the mathematical notation is correct and that if a 90% CI doesn't have a 90% chance of containing the answer, then it's not the real 90% CI. Is this a correct restatement of everyone's positions? Hubbardaie (talk) 16:42, 2 May 2008 (UTC)
Hubbardaie, you are obviously hopelessly confused. Below ("Dear DCStats and ERosa") I expound the correct frequentist interpretation of classical confidence intervals. PLEASE READ THIS CAREFULLY. The fundamental mistake you are making is that you are forgetting that a confidence interval is a random interval and that the population mean is a fixed number. And that probabilistic statements about the confidence level are probabilistic statements about that random interval where the population mean is a fixed constant, not probabilistic statements about a particular realization of that random interval where the population mean is a random variable. Again, it's the sampling that's random, not the population mean. Please read the 3 quotes I give below. darin 69.45.178.143 (talk) 17:23, 2 May 2008 (UTC)

The source I posted is on Google Books,[3] if that helps. (The fact that you consider reviewing the source I posted such a magnanimous gesture seems a bit bizarre.) If you do look at it, you'll find Wasserman defines confidence intervals in a way consistent with Sveshnikov. You seem to be seeing contradictions where none exist. -- Avenue (talk) 04:11, 1 May 2008 (UTC)

Another area where there seems to be confusion (as you noted before) is the distinction between statements derived from probabilities (confidence intervals) and statements of probabilities. See page 4 here for an explicit explanation of this point. -- Avenue (talk) 15:02, 28 April 2008 (UTC)
Thanks for that link - it looks like a nice clear description, and I like the challenge to bet on the results!Jdannan (talk) 08:17, 29 April 2008 (UTC)
Your standards for a "source" are not very high. Well, perhaps that's why you merely refer to it as a "link" and not a source. That's probably a good idea. This is not published in any peer reviewed journal nor widely accepted text. A quick review of ITS sources reveals that many of them are also just unpublished web sources. Please provide a proper source. This reads like bad freshman philosophy. ERosa (talk) 19:05, 29 April 2008 (UTC)

Regarding more general comments about the contents of this article:

  • those who want to see a description of Bayesian versions of interval estimation would do well to help to specify how this works formally in the article Credible interval, where there is a sad lack of specificity.
  • those who want more discussion of comparisons of procedures, and of interpretations and usefulness and of alternatives would do well to contribute to the article Interval estimation.
  • those who are interested in more general philosophical questions associated with other ways of thinking about interval estimation may also want to see Fiducial inference.

Melcombe (talk) 08:56, 25 April 2008 (UTC)

As you suggested, I checked out the articles on credible interval and fiducial inference. Both are also seriously lacking in verifiable citations for specific claims in text, but these articles seem to be more consistent with accepted thinking on the topic. Both correctly describe these distinctions as not widely accepted and, in the case of fiducial inference, even controversial. The credible interval article even correctly gives a definition of confidence interval that we should simply use for this article. The fiducial inference article uses the same mathematical expressions, describing a confidence interval as an interval that contains the value with the stated probability.
Since we can't use original research, we have to resort to claims based only on published citations. I suggest that we refrain from adding lengthy philosophical interpretations based on not a single in-text source, especially when the cited sources discussed elsewhere in this discussion directly contradict the position. DCstats (talk) 19:19, 25 April 2008 (UTC)

I don't know if this helps the discussion or not, but... If I do one test at 90% the chance that the interval will contain the real result is 90%. If I do 20 tests at 90%, and one returns a positive result, the chance that that interval contains the real result is about 12%. When multiple tests are done they are not independent of each other. Consequently, the real 90% confidence interval doesn't just depend on the result of one test, it depends on the number of tests done and the number of positive results found. This confusion of meanings over what the 90% means is I think at the heart of most of the world's dodgy statistics. It would be nice if this article could make these sorts of things clear. Actually, it would be nice if this and other mathematical articles in WP were actually meaningful to laymen. —Preceding unsigned comment added by 125.255.16.233 (talk) 22:26, 27 April 2008 (UTC)

I can answer your first question...it might have helped if you didn't make some mathematically incorrect (or not sufficiently explained) statements. Your second half of the first sentence is correct (If I do one test at 90%...). The second sentence will require more explanation because, even though I'm a mathematician and statistician, you will have to explain your assumptions before I can see how they are correct (that's a nice way for me to say they don't make sense). If you mean that when you conduct 20 trials of something that has a 90% probability, then the probability of getting exactly one success is on the order of 10 to the power of -18 (one millionth of one trillionth). This is a simple binomial distribution with a 90% success rate and 20 trials. So I will give you the benefit of the doubt and assume that you actually meant something other than that and that further explanation would clear it up. But one of your conclusions, oddly, is correct: that the meaning of a 90% CI must be that over a very large number of trials, the probability times the trials equals the frequency.DCstats (talk) 00:16, 28 April 2008 (UTC)

A confidence level of 90% means there's a 90% chance that you won't get a false +ve. In 20 trials, there is therefore a (.9)^20 chance that you won't get any false +ves. That's about 12%. So any of the remaining tests will have a real confidence level of 12%, 'real' meaning taking into account all the tests. —Preceding unsigned comment added by 125.255.16.233 (talkcontribs)
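(Just to pin down the arithmetic both sides are referring to in this exchange, here is a quick check. This is only a sketch of the numbers, assuming Python, and it takes no position on whether the multiple-testing framing is the right one for confidence intervals, which is the point disputed below.)

    import math

    # DCstats' reading: exactly one "success" in 20 trials, each with probability 0.9
    p_exactly_one = math.comb(20, 1) * 0.9**1 * 0.1**19
    print(p_exactly_one)    # ~1.8e-18, i.e. on the order of 10**-18

    # The unsigned contributor's figure: no false positive in any of 20
    # independent tests, each with a 90% chance of no false positive
    print(0.9 ** 20)        # ~0.12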

First, this article is about confidence intervals, not significance tests. The size of the interval gives more information than a test result. An analogue of the problem of multiple comparisons still applies, but thinking about the false discovery rate is probably more useful here than the familywise error rate. To some extent it's implicit in the treatment here already, but explicitly mentioning the issue might be useful. I don't think we need to get into numerical details though. -- Avenue (talk) 12:14, 28 April 2008 (UTC)
Avenue is correct that this is about confidence intervals and not significance tests. Even if it were about significance tests, the unsigned contributor is still getting his math wrong - or, more precisely, using the wrong math to answer the question. A significance test is simply not defined as the probability of getting all correct results (i.e. the true value being within the interval) after 20 trials. The probability of a 90% CI bounding the true value is 90% for each individual test, no matter how many are conducted, and that is the only probability relevant to the CI. Furthermore, the unsigned contributor mentioned nothing of this calculation being some sort of significance test in the original statement. ERosa (talk) 14:46, 28 April 2008 (UTC)
I saw language like "the number of tests" and "false +ve" and assumed that they were thinking of each confidence interval as having an associated significance test. If I've misinterpreted them, I'm sorry. -- Avenue (talk) 15:19, 28 April 2008 (UTC)
No, you are right, that is how I was thinking. It's because I see confidence intervals most commonly being used to offer a measure of the accuracy of statistical tests. I'm talking in the context of a scientist saying: "Here is the result I found. It's 1.63 (CI:1.43-1.89,95%). That means that there is a 95% chance that the claim I'm making is correct." And then the newspapers saying: "Dancing the polka increases your chance of contracting leprosy by 63%!" I think this article needs to convey to laymen that that sort of thing is rubbish.

If you look up confidence level you will see that it redirects to this article, meaning that the editors thought that information on both topics falls within the scope of this article. Further, the size of the interval is analogous to the smallness of the p value; a false positive with a really narrow interval is still a false positive. The width of the confidence interval is of no use in calculating the likelihood of a false positive, and so has no real meaning when judging whether a statistic is believable or not. —Preceding unsigned comment added by 125.255.16.233 (talk) 13:53, 28 April 2008 (UTC)

It seems natural to me that confidence level redirects here, and significance level takes you to Statistical significance, even though they might be equivalent in some mathematical sense. It certainly does not mean that everything covered there has to be covered here, and vice versa. I see you are now bringing in p values, another topic that needs only cursory mention here at most. All these things are related, but the focus here is on confidence intervals. -- Avenue (talk) 14:15, 28 April 2008 (UTC)
Maybe we should change things so that Wildcat strike points to the article on wildcats because, even though they mean different things, they do both have the word wildcat in them. If two terms mean essentially the same thing then they should refer to the same article. It makes no sense to redirect a term to an article other than the one that explains it.
The two terms do have distinct meanings, despite being mathematically equivalent. The current redirects make perfect sense to me. -- Avenue (talk) 08:27, 3 May 2008 (UTC)

Dear DCstats and ERosa

Starting a new section here because the preceding is hopelessly indented.

DCstats and ERosa, you are mistaken in your thinking. Jdannan and Avenue are quite correct. I know this might be very difficult for you to accept. You claim to have references supporting your interpretation. However, I think you are misinterpreting the statements from the textbooks you are reading. I can't speak for your "petroleum industry statistican" friend. My guess is that you conveyed your misinterpretation of the situation to him/her. I have a ph.d. in mathematics and after reading this talk page and thinking I was almost losing my mind, I consulted with several other ph.d. mathematician and statistician friends of mine who confirmed that Jdannan and Avenue are indeed correct. And the interpretation given in the article is basically correct as well. In fact, it's the interpretation I've seen given in every stats book I've looked at.

If you have a ph.d. in mathematics, then you should be able to retrieve the aforementioned references and can determine for yourself if they were misinterpreted instead of just speculating that they were misinterpreted. I checked out the Sveshnikov reference and the Mathworld reference is available online. I haven't seen the other references. My background is behavioral research so my understanding of stats may be limited to those specific research applications. But I can tell you that the references Hubbardaie provided seem to be consistent with how most people in my field use the terms. If we are mistaken that 90% CI of x is a to b means P(a<x<b)=.9,
You have to say whether a and b are fixed or random. Otherwise, it's impossible for me to say if you're "mistaken" or not. You also have to say whether the "CI" is a confidence interval or a credible interval. Part of the problem here is that a lot of Bayesians use the term "confidence interval" when they are really speaking of a "credible interval". darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
then it may be a widely held misconception in my field.
I don't know... I don't really know enough about Monte Carlo methods or decision science to say. It would help if people would start stating mathematical statements precisely, instead of cutting and pasting bits of expressions and notation from here and there. You can't just write down "P(a<x<b)" and say it means something. You have to tell me what P, x, a, and b mean. How is P defined? Where do the a and b come from? Where are the quantifiers (universal and existential, "for all", "there exists")?? It's really impossible to answer any of these questions unless people are precise in what they say.
It's important to note the differences between the "long term frequency thought experiments" of frequentists and Bayesians. However, it's even more important to note the similarities between them: Both "long term frequency thought experiments", for frequentist confidence or Bayesian credible intervals, are based on repeated sampling when the population mean is fixed but unknown: "It is often overlooked, but credible intervals do have a repeated sampling interpretation. As introduced with the notion of Bayes risk, consider repeated sampling whereby each sample is obtained by first generating θ from the prior distribution and then generating data x from the model distribution for X|θ. It is trivial to establish that with respect to such sampling there is probability (1 − α) that a 100(1 − α)% credible interval for θ based on the X will actually contain θ. In this sense, a credible interval is a weaker notion than confidence interval which has the repeated sampling interpretation regardless of how the true θ values are generated." — Paul Gustafson, "Measurement Error and Misclassification in Statistics and Epidemiology"
What some don't seem to understand is that the long term frequency interpretation of credible intervals is philosophically frequentist, the "commensurability" of the two "answers" takes place entirely within the frequentist interpretation. There is no "commensurability" between confidence intervals under the frequentist viewpoint and credible intervals under the Bayesian viewpoint. Yes, they give the same "answers", but only under frequentist interpretation. That's why you can't say that even if a 90% confidence interval coincides numerically with a 90% credible interval that the "probability" of the random confidence interval in the frequentist sense is even remotely the same as the "probability" of the credible interval in the Bayesian sense; the former is in terms of long-term frequency, the latter in terms of degree of belief. It's like comparing apples and oranges. darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
If you search the terms "subjective confidence interval" or "judgemental confidence interval" or "calibrated probability", it would appear that quite a lot of published researchers in decision psychology are also making this error. Often, these intervals are used in Monte Carlo simulations for management decisions where, of course, it is the basis for generating a random number. In that case, they are interpreting a 90% confidence interval as literally meaning a 90% probability of containing a value (in this case, the generated value is the random number, even though the value may represent a (simulated) mean). This may be why the Mathworld definition seems consistent with this interpretation, because Mathematica (Wolfram's software) also runs Monte Carlos.
Perhaps instead of the high emotion and, as you said, losing your mind, we could have a rational and civil discussion about why these two worlds are so far apart. First, why do this article and the references provided even use notation of the form P(a<x<b) to describe a confidence interval if they are not actually talking about a probability? I read your argument below that this is merely a shorthand but, if so, that would be a very confusing way to shorten it. Perhaps that is one source of the confusion.
As I said above, the confusion stems from the fact that many non-mathematicians don't realize that you have to be extremely precise when making definitions and statements, including definition of all notations, all quantifiers, everything. Otherwise, you're just talking gibberish. darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
Another source may be, as you seem to describe at one point, that there may be no practical difference between the interpretations. Both may produce the same answers and both should pass the long-term frequency criterion. If researchers in the behavioral sciences think that the rejection of a hypothesis with 95% confidence means that there is a 95% probability that the hypothesis is false, and they see that only about 5% of hypotheses rejected with 95% confidence are wrong, what would be the practical consequence of this misinterpretation? ERosa (talk) 14:18, 3 May 2008 (UTC)
That is another issue. Good lord, let's not go down that road. I'm surprised you brought it up, since Bayesians tend to deplore traditional p-value testing. darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)

Some references, since these seem to be very important to people:


  • "Warning! There is much confusion about how to interpret a confidence interval. A confidence interval is not a probability statement about θ since θ is a fixed quantity, not a random variable." — Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference, p. 92
  • "CAUTION! A 95% confidence interval does not mean that there is a 95% probability that the interval contains μ. Remember, probability describes the likelihood of undetermined events. Therefore, it does not make sense to talk about the probability that the interval contains μ, since the population mean is a fixed value. Think of it this way: Suppose I flip a coin and obtain a head. If I ask you to determine the probability that the flip resulted in a head, it would not be 0.5, because the outcome has already been determined. Instead the probability is 0 or 1. Confidence intervals work the same way. Because μ is already determined, we do not say that there is a 95% probability that the interval contains μ." — Michael Sullivan III, Statistics: Informed Decisions Using Data, p. 453
  • "The idea of interval estimation is complicated; an example is in order. Suppose that, for each λ, x is a real random variable normally distributed about λ with unit variance; then, as is very easy to see with the aid of a table of the normal distribution, if M(x) is taken to be the interval [x − 1.9600, x + 1.9600], then (1) P(λ in M(x) | λ) = α, where α is constant and almost equal to 0.95. It is usually thought necessary to warn the novice that such an equation as (1) does not concern the probability that a random variable λ lies in a fixed set M(x). Of course, λ is given and therefore not random in the context at hand; and given λ, α is the probability that M(x), which is a contraction of x, has as its value an interval that contains λ." — Leonard J. Savage, The Foundations of Statistics, p. 260


Your misunderstanding seems to be based on an inadequate grasp of a few facts:


  1. The population mean μ is a fixed constant, not a random variable.
  2. The endpoints of the confidence interval are random variables, not fixed constants.
  3. A statement about the level of confidence of a confidence interval is a probabilistic statement about the endpoints of the interval considered as random variables.


The issue about whether μ is "known" or not is a complete red herring. What matters is that μ is fixed, not whether it's "known". This also has absolutely nothing to do with probabilistic interpretations of quantum physics, good lord.

Let me go through an example which has the most simplifying assumptions. Suppose we have a normally distributed random variable X on a population with mean μ = 100 and standard deviation σ = 16, and suppose we select samples of size n = 100. Then the sample mean X-bar is a normally distributed sampling distribution defined on the space of all possible samples of size 100, and it has mean μ = 100 and standard deviation σ/sqrt(n) = 16/sqrt(100) = 16/10 = 1.6. A 95% confidence interval for μ based on this sampling is then given by (X-bar − z_{α/2}*σ/sqrt(n), X-bar + z_{α/2}*σ/sqrt(n)) = (X-bar − (1.96)*1.6, X-bar + (1.96)*1.6) = (X-bar − 3.136, X-bar + 3.136).

Now, note the following facts:


  1. The confidence interval is defined to be a random interval, i.e. the endpoints of the confidence interval are random variables defined in terms of the sample mean X-bar.
  2. We "know" what the population mean μ is, and yet we were still able to define the random interval that is the "95% confidence interval". Our "knowledge" of the value of μ has nothing to do with whether we can define this random interval.
  3. The probability that μ is contained in the random confidence interval is 95%. In other words, P(X-bar − 3.136 < μ < X-bar + 3.136) = 95%. Note, this is a probabilistic statement about the random sampling distribution X-bar.


Once we construct a particular confidence interval, the probability that μ is contained in that particular realization is either 0 or 1, not 95%. This is because our endpoints are no longer random variables, and our interval is no longer a random interval. For example, suppose we take a sample and get x-bar = 101. In that case, our particular realization of the confidence interval is (97.864, 104.136). The probability that μ = 100 is contained in this interval is one, not 95%. This is because 97.864, 100, and 104.136 are all just fixed constants, not random variables. If we take another sample and get x-bar = 91, our particular realization of the confidence interval is (87.864, 94.136), in which case the probability that μ = 100 is contained in this interval is zero, not 95%.
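(darin's numbers above are easy to check by simulation. The following is a minimal sketch assuming Python with NumPy; the seed and the number of repetitions are arbitrary, and the sample means are drawn directly from their N(μ, σ/sqrt(n)) sampling distribution rather than from raw samples of size 100.)

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n = 100, 16, 100
    half = 1.96 * sigma / np.sqrt(n)       # 3.136, as in the example above

    n_rep = 100_000
    xbar = rng.normal(mu, sigma / np.sqrt(n), n_rep)   # repeated sampling
    covered = (xbar - half < mu) & (mu < xbar + half)
    print(covered.mean())                  # ~0.95: the random interval covers mu 95% of the time

    # One particular realization, as in the example: x-bar = 101
    lo, hi = 101 - half, 101 + half        # (97.864, 104.136)
    print(lo < mu < hi)                    # True -- for this fixed interval the answer is 1, not 0.95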

"Your claim that you only have a confidence interval 'prior to making the observations and calculating the interval' but that is not longer true 'once you have the numbers' belies some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedure for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable."

Well, sure, just like it's experimentally testable by flipping a fair coin 1,000,000 times that it has 50% probability of landing heads. But once you've flipped the coin heads, the probability that particular coin toss is heads is 100%, not 50%. You seem to be very caught up on the fact that μ is "unknown". This is totally irrelevant. As I said above, what matters is that μ is fixed, not whether it's "known".

Here's a little thought experiment: Suppose I go into another room where you can't see me. I flip a coin in that room, and it comes up either heads or tails, but you can't see which happened. Now, you decide to flip a fair coin. It is true, the experiment of you flipping a fair coin can be modelled by a Bernoulli trial random variable, and the probability that the flip will agree with my flip is 50%. Now, suppose you actually flip the coin, and say it comes up heads. What is the probability that your heads flip agrees with my flip? It's not 50%, it's either 0% or 100%. The fact I don't have knowledge of whether your flip is heads or tails, and hence the fact I don't know which of the two cases is correct, whether the probability is 0% or 100%, is totally irrelevant. The two coins are specific physical objects. They either agree or they don't agree. There is no "50% probability" they agree.

Let's go one step further. Suppose the coin I flip that you can't see is a fair coin. Once I've flipped my fair coin, the probability you will flip a fair coin that agrees with mine is 50%. However, this has nothing to do with whether my coin is fair or biased. To see this, suppose my coin is biased and is weighted to flip heads 99% of the time and tails 1% of the time. Now, I flip the coin, and you can't see the result. The probability you will flip a fair coin that agrees with mine is still 50%. This is because my flip is a fixed constant, not a random variable. It's your flip that's random, not mine! Now, do you see why "knowledge" of μ is irrelevant?
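(The two-coin thought experiment can be simulated the same way. A minimal sketch, assuming Python with NumPy; the 99% head-bias is the figure used in the paragraph above, and the seed is arbitrary.)

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    hidden = rng.random(n) < 0.99      # my biased coin, flipped where you can't see it
    yours = rng.random(n) < 0.5        # your fair coin
    print(np.mean(hidden == yours))    # ~0.5, regardless of the hidden coin's bias

    # Condition on one particular pair of flips and the probability "collapses":
    print(float(hidden[0] == yours[0]))    # exactly 0.0 or 1.0 for that realized pair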

The fact that so many people who claim to be statisticians or scientists are confused on these points is a bit surprising to me. But then again, a world in which mathematical idiots like MarkCC spout complete nonsense all the time and everyone thinks he's a math genius, and a world in which every scientist "knows" HIV causes AIDS, then again, maybe it shouldn't surprise me. darin 69.45.178.143 (talk) 16:40, 1 May 2008 (UTC)

I'm not yet certain I want to claim the position of defending DCstats and ERosa on all points, but I think it's fair to correct some things you've said. I don't hold that your description of statistics is in error, but I don't think you are addressing the actual claims that everyone has made.
First, I don't think anyone is disputing the fact that the population mean is fixed.
But that's what people are implying. When you say, "the probability that the population mean lies in the confidence interval (.8, 1.2) is 90%", you're implying that you're interpreting the population mean as a random variable rather than a fixed constant. If you really understood that the population mean is a fixed constant, then you would have to conclude that once you have a particular realization of the confidence interval, the probability the population mean lies in that particular interval is either 0% or 100%. You're saying one thing and thinking another way. You can't have it both ways. darin 69.45.178.143 (talk) 21:12, 2 May 2008 (UTC)
No, I see no such implication and you can't tell what someone else is thinking vs. what they wrote.
That's why you have to be precise in what you write. So people don't misinterpret you! darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
I think the explanation was quite clear and is consistent with what you wrote below. Since confidence intervals are based on a random selection of values, the intervals will be random and there is no fundamental problem with saying that a random variable (like an upper bound of an interval) has a probability of being more or less than a fixed value. ERosa (talk) 13:36, 3 May 2008 (UTC)
I almost missed it!! You said, "Since confidence intervals are based on a random selection of values, the intervals will be random", this means you are considering a confidence interval with fixed endpoints, a particular construction. Which is just what I was saying. You can't say the "intervals" (collectively) "will be random" [sic] and then choose one of them and say it's random. Randomness is a property of all the intervals collectively, not a particular one.
Maybe it wasn't you, maybe it was DCstats, who said: "Your claim that you only have a confidence interval 'prior to making the observations and calculating the interval' but that is not longer true 'once you have the numbers' belies some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedure for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable."
Tell me, how else would you interpret that statement?? darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
But since the bounds of an interval are computed from, by definition, randomly selected samples, then the bounds are random and the common notation P(LB<X<UB) can still be applied without any violation of this concept. If I said P(A<B)=.8 it could be that A is random and B is fixed or the other way around or both are random.
NO!! The difference between the two is the difference between frequentist vs. Bayesian interpretation!!
But, like you said at some other point, you are no expert on Bayesian methods. So which part are you disagreeing with? Are you disagreeing that Hubbardaie's claim P(A<B)=.8 could be valid if A were random and B were fixed, if B were random and A were fixed, or if both were random? ERosa (talk) 13:36, 3 May 2008 (UTC)
I'm saying, if the CI is (a, b) and we are considering P(a < x < b), where x is the parameter, then a and b random with x fixed is the frequentist interpretation, and a and b fixed with x random is the Bayesian interpretation. It's not that one is "valid" and the other not. Both viewpoints are "valid". The problem is confusing the two. darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
Now we've hit the real issue where people are fundamentally confused. What does the notation P(X > 3) mean? You have to remember that X is really a random variable X(s) defined on a sample space S, and that the notation P(X > 3) has suppressed the dependence of X on s. In other words, P(X > 3) is really a shorthand way of writing P(X(s) > 3) = P({s in S : X(s) > 3}), i.e. it's a shorthand for the probability of the event which consists of all outcomes in the sample space S for which the random variable X takes on a value greater than 3. Once you choose a particular s0 in S, you have "performed the experiment", so to speak. You cannot then substitute the fixed value X(s0) (the "outcome of the experiment") into the notation P(X > 3) and say you have the same probability. There is a difference between
P(X(s) > 3) = P({s in S : X(s) > 3})
and
P(X(s0) > 3) = P({s in S : X(s0) > 3})
In the former, s is a bound variable, or so-called "dummy variable", whereas in the latter, s0 is a free variable. What everyone is doing here is changing the suppressed bound variable into a suppressed free variable and then claiming it's the same thing, provided we suddenly start interpreting the fixed population mean as a random variable. The reason people are making this mistake is because the notation is suppressing the argument of the random variable X. However, once you make the argument of X explicit, you can see that they're two very different quantities.
The same thing would happen if we were dealing with prediction intervals instead of confidence intervals. When we say we have a 90% PI, we are saying the probability the PI contains the future outcome according to the joint distribution over the random endpoints and future outcome is 90%. But when we fix a particular PI by actually constructing it, now we have a marginal distribution, where only the future outcome is random. And the probability that the specific, constructed PI contains the future outcome according to this marginal distribution will almost certainly not be equal to 90%. It's the same thing here with confidence intervals. darin 69.45.178.143 (talk) 21:12, 2 May 2008 (UTC)
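(The joint-versus-marginal distinction for prediction intervals can also be checked numerically. A minimal sketch, assuming Python with NumPy and SciPy, a normal model with known σ, the textbook known-variance 90% PI x-bar ± 1.645·σ·sqrt(1 + 1/n), and an arbitrary example value x-bar = 0.8.)

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    mu, sigma, n = 0.0, 1.0, 5
    half = 1.645 * sigma * np.sqrt(1 + 1/n)    # 90% PI half-width, sigma known

    # Joint experiment: draw a sample mean, build the PI, draw a future outcome
    n_rep = 100_000
    xbar = rng.normal(mu, sigma / np.sqrt(n), n_rep)
    future = rng.normal(mu, sigma, n_rep)
    print(np.mean((xbar - half < future) & (future < xbar + half)))   # ~0.90

    # Marginal view: fix one realized interval and let only the future outcome vary
    x0 = 0.8
    p = norm.cdf(x0 + half, mu, sigma) - norm.cdf(x0 - half, mu, sigma)
    print(p)    # ~0.84 here -- generally not 0.90 for a particular fixed interval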
Second, it is also true that once we know the true mean and the computed LB and UB, then, of course, the probability is either 0 or 1.
Your mistake here is the phrase "once we know the true mean". Again, as I said above several times, "knowledge" of the value of the population mean μ is totally irrelevant, the only thing that matters is that μ is fixed, not whether it's "known" or not. The whole point of the definition of confidence interval (have you even read the main article??) is that the stated probability holds for all values of the parameters. darin 69.45.178.143 (talk) 21:12, 2 May 2008 (UTC)
The problem is that we don't know which in a given instance.
Why is that a problem? It's only a problem because you think it's a problem. What people are saying is that "since we don't know whether the probability is 0% or 100%, then the only reasonable interpretation is to start interpreting the population mean as a random variable and say the probability this random population mean lies in the fixed interval is the level of confidence", which is a totally false statement to make.


According to your reasoning, it's impossible to make the following statement: "Let x0 be a fixed real number. Then exactly one of the following is true: (a) x0 is greater than zero; or (b) x0 is less than or equal to zero." For any given fixed x0, we don't know which is the case. But does that make the statement in quotation marks any less true? darin 69.45.178.143 (talk) 21:12, 2 May 2008 (UTC)
As people who have argued both sides of this debate have pointed out, if you compute a large number of 90% CI's then those intervals will contain the population mean 90% of the time. This is the only practical consequence of "probability" anyway and it is consistent with the frequentist point of view. In the case of your coin flip example, a person who guesses heads will be right about 50% of the time after a large number of trials, and in a way, that's all the probability really means.
So it seems that most of your arguments are not actually arguments against any positions I've seen here. I'll let ERosa and DCstats speak for themselves, but they don't appear to have said anything inconsistent with this (if you find something specific I'll concede that point).
There are numerous cases where ERosa and DCstats have said things inconsistent with what I have said:
DCstats said: "Your claim that you only have a confidence interval "prior to making the observations and calculating theinterval" but that is not longer true "once you have the numbers" belies some deep misunderstanding of the subject. The confidence interval based on samples can ONLY be computed after we have the observations and the calculation is made. What exactly do you think the confidence interval procedure for the z-stat and t-stat are about? The 90% confidence interval actually has a 90% probability of containing the estimated value. This is experimentally testable."
This is a specific case where DCstats demonstrates that he/she is confusing the confidence interval with one of its particular realizations, and then further confusing the probabilities between the two, by making the fallacy of switching from bound to free variable in the suppressed argument. And then on top of that, after switching to free variable, giving it an experimental interpretation (which only makes sense if the population mean is treated as a random variable, which is wrong) as if it were still a bound variable.
DCstats said: "When a 90% CI is computed, it is usually the case that all you know about it is the samples taken, not the true value. You are correct when you say that in an arbitarilly large series of 90% CI's computed from random samples, 90% will contain the estimated value. You are incorrect when you claim that saying a 90% CI has a 90% chance of containing the answer is like saying that a coin toss has a 50% chance of being tails once you have already seen it come up heads. The two are different because in the case of a 90% CI gathered from a random sample, you do not have certain knowledge of the parameter (eg. the mean). That's why its called a 90% CI and not a 100% certain point."
This is a specific case where DCstats is making several of the errors I mention above: (1) the fallacy of switching from bound to free variable in the suppressed argument; (2) the fallacy of thinking that "knowledge" of the population mean μ has anything to do with constructing or interpreting a confidence interval, i.e. not understanding that in the definition of confidence interval, the probability of the confidence interval containing μ is explicitly required not to depend on μ. Furthermore, it doesn't seem that DCstats understands that in Jdannan's (or Avenue's?) analogy, the coin toss that we see "already come up heads" is meant to correspond with the sampling, not the mean.
DCstats said: "I would really like to see what source you are citing for this line of reasoning. It is truly contrary to all treatment of the concept in standard sources and is contrary even to the meathematical expressions used throughout this article where a confidence interval is described as a probility that a value is between two bounds"
This is a specific case where DCstats makes the error I pointed out above of suddenly, against all conceptual foundation, interpreting the fixed population mean as a random variable and the random confidence interval as a fixed interval. Maybe this is not the interpretation DCstats had in mind, maybe he did mean the confidence interval to still be random and the population mean fixed. In any case, one of the 2 cases holds: (1) DCstats is interpreting the confidence interval as fixed and the population mean as random, which is simply incorrect; or (2) DCstats is correctly interpreting the confidence interval as random and the population mean as fixed, but then making the same fallacy above about thinking you can replace a bound with a free variable and that it still means the same thing.
DCstats said: "Because "on earth" you haven't met the conditions required for the procedure to produce the right answer. If you already know the exact mean of a population, THEN take a random sample, the interval will not be a 90% CI, by definition."
This directly contradicts the construction of confidence interval I actually performed above! It's a construction you can find in any undergraduate stats textbook. In that case, you can "know" the exact mean of the population, and still construct a random confidence interval. Again, DCstats seems to be very confused that a confidence interval is random, and if that's not the case, very confused about bound and free variables. Very confused about one or the other, I can't tell from his/her posts which it is.
DCstats said: "So if you set up a test where you took a large number of samples and 90% of the 90% CIs contain the true value, and you want to account for other known constraints, then the non-Bayesian methods will produce the wrong answer. The definition of 90% CI has not changed, mind you. But the procedures that fail to take into account known constraints will not meet the test of repeated sampling."
This is very confused thinking. DCstats just said it does meet the test of repeated sampling, because 90% of the outcomes of the random confidence interval contain the population mean. So how is it the "wrong answer"??
ERosa said: "I think you might want to clarify that you are refering to prior knowledge in a large number of samples, you do have to know the true mean of the population in order to know that 90% of the 90% CI's meet contain the true population mean. That aside, DCstats is right. If we know that the mean is not between 2 and 4, then 2 to 4 cannot be the 90% CI according to how the term is defined. So I don't understand what Jdannan is saying. Are you saying that if you have knowledge that makes it 100% certain that the mean is not between 2 and 4, that 2 and 4 are still the 90% CI as long as you used a method that (incorrectly) ignored prior knowledge? That makes no sense. If a researcher knows that the answer cannot be within the interval she just computed, then she just computed it incorrectly. What's the confusion?"
There's a lot of confused thinking here to sort through, which is directly opposed to many things I have said. Where to begin... "You do have to know the true mean of the population in order to know that 90% of the 90% CI's meet contain [sic] the true population mean." No, this is demonstrably false. In order to know that 90% of the 90% CI's will contain the population mean μ, we do not need to know the specific value of μ; the random 90% confidence interval has probability 90% of containing μ irrespective of the actual value of μ. This is stated right in the article, under a section which does not have a dispute tag!! I'm beginning to wonder if anyone has even read the article itself. "If we know that the mean is not between 2 and 4, then 2 to 4 cannot be the 90% CI according to how the term is defined." This is a very strange thing to say. Here, ERosa is forgetting that a confidence interval is defined to be a random interval. If we know the population mean μ = 5, then 10% of the constructed 90% CI's in a large number of such constructed CI's will be like (2, 4) and not contain μ = 5. Does that mean (2, 4) is not a 90% CI? The question has no meaning. A confidence interval is a random interval, so it doesn't make sense to say a fixed interval is a CI. A fixed interval can be a particular realization of a CI, but strictly speaking, a fixed interval is not a CI.
There are other quotes I could take where DCstats and ERosa make statements inconsistent with what I've said, but they all basically boil down to some form of one of the fallacies above. darin 69.45.178.143 (talk) 21:12, 2 May 2008 (UTC)
The bulk of your point seems to be about attempting to convince others of these basic concepts, but since they might not disagree with these points, you might just be preaching to the choir. I'm only disagreeing about whether you've identified the source of the disagreement (see the post I made above in the "I suggest a major rewrite" section)Hubbardaie (talk) 17:43, 2 May 2008 (UTC)
I don't claim to be an expert on Bayesian methods. But it seems to me a lot of people who use credible (or credibility) intervals in practice are coming here and thinking they can philosophically interpret frequentist confidence intervals as if they were credible intervals, and vice versa. The fact that a frequentist and a Bayesian might always get the same confidence/credible intervals resp. using the same sample data, if the Bayesian assumes no prior information, and hence that a frequentist can "get away with" interpreting confidence intervals subjectively, and a Bayesian can "get away with" interpreting credible intervals in terms of long-term frequency, does not mean the interpretations are philosophically the same at all. When you start to think like ProfHawthorne way above, that the distinction between frequentist and Bayesian interpretation is just something that amateur statisticians and first-year grad students are hung up on, you get rambling, confused discussions like this one. Are you still lurking, Michael Hardy? darin 69.45.178.143 (talk) 22:54, 2 May 2008 (UTC)
Your condescending remark about first-year grad students implies you are beyond that in your study of statistics. Yet you claim to be no expert on Bayesian methods. Where exactly do you finish graduate study in this area and not know more about Bayesian methods?ERosa (talk) 13:36, 3 May 2008 (UTC)
It wasn't a condescending remark. In fact, it wasn't even my remark! I was paraphrasing someone else (a "ProfHawthorne" above) and it turns out I was referring to a remark I vaguely remembered from my memory that was actually by Hubbardaie: "In effect, the frequentist would have to conclude that the Bayesian analysis was consistent with its own historical frequency distribution. This is why this debate is irrelevant to most real-world statisticians even while it seems to pre-occupy amateurs and first-year stats students." (see the Archive talk page and do a find for it). So, I guess you're accusing Hubbardaie of being condescending? And BTW, I think it was you who first made the "armchair statistician" remark (or maybe DCstats), which is really ironic. As for my education, I got a ph.d. from UC Santa Barbara in 2004, my research area was algebraic number theory. I took upper-division probability theory as an undergrad, measure theory as a graduate student, and have taught lower-division stats courses. Yes, I'm not expert in Bayesian methods, but I've read enough to understand the basic conceptual and philosophical basis for each approach, and to see that a lot of people here are very confused.
Hubbardaie's remark, that this is "only" something that occupies "amateurs and first-year stats students" just shows how little understood this issue really is. The "consistency" between Bayesian inference and frequentist inference e.g. when it comes to confidence and credible intervals is only that they are occasionally numerically identical, i.e. in some relatively restricted cases, the confidence and credible intervals coincide numerically. But the philosophical interpretation does not carry over. And many of the comments here demonstrate a lack of understanding of that fact. Maybe it bothers me because I was educated as a mathematician, and not as a statistician or a natural scientist, where I've been taught to worry incessantly over "how many angels dance on the head of a pin". I don't know. darin 69.45.178.143 (talk) 05:18, 4 May 2008 (UTC)
Well, we can agree you are no expert on Bayesian methods, but you hit on the key issue. When you say that frequentist and Bayesian methods will get the same answers given the same conditions ("robust Bayesian" with no prior knowledge) and both should still meet the long term frequency test, you are pointing out that your own distinction has no practical relevance. The "philosophical interpretation" means nothing if it has no observable consequences. I don't think it's so much Bayesian vs. frequentist as whether one is pragmatic, observation-based and scientific or not. What is the confidence interval for how many angels can dance on the head of a pin? ERosa (talk) 13:36, 3 May 2008 (UTC)
ERosa, it's fine to not worry about angels dancing on the head of a pin, but first you have to show you understand the nature of the angels and the pin in the first place. And your posts here show you don't. You've shown you understand neither the frequentist nor the Bayesian "long term frequency thought experiments".

Since my name is still being taken in vain occasionally :-) I just want to briefly add that I agree with everything darin has said. Also, Avenue and Melcombe seem to know what they are talking about - in fact I'm sure they know much more about it than me, I just stumbled into this area a couple of years ago when I was trying to get to the bottom of what was going on in my own field of climate science. In contrast DCStats, ERosa and Hubbardaie are still sadly confused. The underlying issue is just as darin has put it above, taking a (frequentist) CI, once it has been sampled and the end points are known, to be a (Bayesian) credible interval for the unknown parameter. Occasionally (often?) one can get away with this when your likelihood is a symmetric gaussian and a uniform prior on the parameter is considered reasonable, but this sometimes doesn't make much sense, and in slightly less trivial examples, this approach may just fall flat on its face. People seem to get suckered into this by the fact that the frequentist confidence interval isn't actually very useful in the real world, and the Bayesian one is really what people want to use. As I said at the start (and these exchanges illustrate), it is not easy to explain this clearly to people who have been making this mistake for years and who have absolutely no idea that there is anything at all dodgy in their understanding. It's quite a shock to the system! Jdannan (talk) 23:34, 2 May 2008 (UTC)

Suppose that you use something like a Neyman construction to compute 90 percent confidence intervals for the potential outcomes of an experiment, e.g., one that measures the mass of an electron. Then if you were to randomly select such an interval from the ensemble of 90 percent intervals, there is a 90 percent chance that you will have chosen one that covers the true value. The experiment itself is one way of randomly selecting such an interval. So while the experimentally-produced interval will either cover the true value or not, you can be 90 percent sure that you have produced an interval that covers the true value. If the above is correct, then it seems to me that this is really about all that you can ask for from a frequentist confidence interval or a Bayesian credible interval. Ty8inf (talk) 15:48, 13 May 2008 (UTC)
No - or at the very least, you have to be very careful about exactly what probability and what event you are talking about. This is precisely the confusion that has been discussed at such length here. Prior to choosing the interval, you can certainly say that you are 90% sure that you will create an interval that contains the true value. This is fundamentally a different type of statement to saying that the particular interval [0.5,1.2] that you have constructed (once you have run the experiment) contains the true value with probability 90%.
Right, and that is what I tried to say. The experiment produces one interval from the ensemble of confidence intervals where, by construction, 90 percent of those cover the true value and 10 percent do not. So the one interval that was randomly generated by the experiment either covers the true value or it does not. All you know is that there is a 90 percent chance that it is one of the ones from the ensemble that cover the true value. Isn't this analogous to selecting a light bulb from an ensemble of bulbs where it is known that 90 percent "work" and 10 percent do not? The bulb that was randomly selected does not work 90 percent of the time -- it either works or it does not. Nevertheless I believe one would say that the bulb has a 90 percent chance of functioning. Of course once you test it you will know for sure whether it does or does not work. But in keeping with the analogy, you never get to test it because that would be equivalent to knowing the "true value". But in the absence of that knowledge all you know is that the bulb has a 90 percent chance of functioning. (It just occurred to me that performing the test to get the true value is very similar to collapsing the wavefunction.)Ty8inf (talk) 05:55, 14 May 2008 (UTC)
Sorry, there still seems to be some confusion here. While it is true that 90% of CIs will contain the parameter, once you have calculated the endpoints of your particular CI you have this extra information, which must (at least in principle) be capable of changing your belief about the specific CI that is in your hands. Have you tried working through the example I provided where the parameter is known to be an integer and the particular 25% CI does not contain any integer? The issue becomes much clearer (and undeniable) where there is some solid prior information like this - people frequently tie themselves in knots by trying to define "prior ignorance", which can never truly exist in a formal sense (in Bayesian probability). As has already been mentioned on this page, there are some cases where the Bayesian and frequentist answers are numerically equivalent if one is a little sloppy about precise definitions (and this is where the confusion seems to have arisen from), but this is by no means a general rule.Jdannan (talk) 01:22, 15 May 2008 (UTC)
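(To make the integer example concrete, here is a minimal simulation sketch. The assumed setup is illustrative only: the parameter is an integer, there is a single observation x ~ N(theta, 1), and the central 25% interval is x plus or minus about 0.319, which is too narrow to be guaranteed to contain an integer.)
 import numpy as np
 from scipy.stats import norm
 rng = np.random.default_rng(1)
 theta = 3                                   # parameter known to be an integer
 half = norm.ppf(0.625)                      # half-width of a central 25% interval (sigma = 1, n = 1)
 xs = rng.normal(theta, 1.0, 200_000)
 lo, hi = xs - half, xs + half
 covers = (lo <= theta) & (theta <= hi)
 no_integer = np.floor(hi) < np.ceil(lo)     # interval too narrow to contain any integer at all
 print(covers.mean())                        # about 0.25, the stated confidence level
 print(covers[no_integer].mean())            # exactly 0.0: given that no integer is inside, coverage is zero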
Consider my example about measuring the mass of a particle where the mass cannot be negative (my version of your integer example). It is easy to imagine that the mass is experimentally measured as the difference of two values. On some occasions random fluctuations might cause the difference to be negative producing a 90 percent confidence interval such as [-5,-2] electron volts (eV). In this case I would not state that I am 90 percent sure that the interval covers the mass. But isn't it still a 90 percent confidence interval? It is just an example of one of the 10 percent that do not cover the true value. But at the same time, if the experiment produced a 90 percent confidence interval such as [1,3] eV and I had no reason to believe that the mass could not lie in that interval, then why shouldn't I feel justified in believing that the interval has a 90 percent chance of bracketing the mass? This seems especially true if I know that some fraction of the 10 percent that do not contain the true mass are unphysical. Ty8inf (talk) 04:34, 15 May 2008 (UTC)
Yes, that negative interval in your example is a 90% confidence interval. But if you want to calculate a belief that the parameter lies in a given interval, then you need to define a prior for the parameter. In some cases (basically if your prior is uniform) then you will often end up having a 90% belief that the parameter lies in the 90% confidence interval - but this is really a numerical coincidence (albeit a common one). In your example, if you believe that a relatively large proportion of the 10% misses will be demonstrably nonphysical, then you should on average be MORE than 90% sure that the parameter lies in your sampled confidence interval if you got a physically reasonable one, as this subset of CIs must have a >90% hit rate.Jdannan (talk) 06:42, 15 May 2008 (UTC)
In fact, this is a good example of a situation where a Bayesian approach would give different results. If you have a strong belief that the mass cannot be negative, your prior should reflect that belief and give zero probability to values below zero. Then the posterior distribution will also give zero probability to negative values, and credible intervals will not go below zero either. -- Avenue (talk) 07:36, 15 May 2008 (UTC)
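(As a rough numerical sketch of that contrast, assuming a single measurement with Gaussian error and a flat prior restricted to non-negative values, so that the posterior is a truncated normal; the numbers are made up.)
 from scipy.stats import truncnorm
 x_obs, sigma = -3.5, 2.0                              # a measurement that happens to come out negative
 ci = (x_obs - 1.645 * sigma, x_obs + 1.645 * sigma)   # naive 90% confidence interval: entirely unphysical here
 # posterior under a flat prior on [0, inf): a normal centred at x_obs, truncated at zero
 post = truncnorm(a=(0.0 - x_obs) / sigma, b=float("inf"), loc=x_obs, scale=sigma)
 cred = (post.ppf(0.05), post.ppf(0.95))
 print(ci)     # about (-6.8, -0.2)
 print(cred)   # entirely non-negative by construction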
FWIW, the recommended approach in particle physics in this case is to construct frequentist intervals using a maximum likelihood ratio ordering such as that popularized by Feldman and Cousins [4]. This avoids the unphysical intervals and guarantees that the coverage is at least the stated value (e.g., 90 percent). I say "at least" because it is not always possible to compute exact confidence intervals for discrete distributions such as the Poisson. The problem with the Bayesian approach in this situation is that the 90 percent intervals often have less than 90 percent coverage, and as such the Bayesian approach is not recommended. Since I am far from an expert in this area I would like to know your views on this approach.Ty8inf (talk) 15:25, 15 May 2008 (UTC)
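(For anyone curious, here is a rough and slow numerical sketch of that likelihood-ratio ordering for the Gaussian-measurement-with-a-physical-boundary case. The grids, step sizes and 90% level are arbitrary choices made here for illustration, not anything taken from Feldman and Cousins' paper.)
 import numpy as np
 from scipy.stats import norm
 sigma, CL = 1.0, 0.90
 mu_grid = np.linspace(0.0, 6.0, 601)            # physical parameter values only (mass >= 0)
 x_grid = np.linspace(-4.0, 10.0, 1401)
 dx = x_grid[1] - x_grid[0]
 def acceptance_region(mu):
     # x-values accepted for this mu, added in decreasing order of R = L(x|mu) / L(x|mu_best)
     like = norm.pdf(x_grid, loc=mu, scale=sigma)
     best = norm.pdf(x_grid, loc=np.maximum(x_grid, 0.0), scale=sigma)   # best physical fit mu_hat = max(x, 0)
     order = np.argsort(-(like / best))
     prob = np.cumsum(like[order] * dx)
     keep = x_grid[order[: np.searchsorted(prob, CL) + 1]]
     return keep.min(), keep.max()
 regions = [acceptance_region(mu) for mu in mu_grid]
 def fc_interval(x_obs):
     # the confidence interval is the set of mu whose acceptance region contains the observed x
     ok = [mu for mu, (lo, hi) in zip(mu_grid, regions) if lo <= x_obs <= hi]
     return (min(ok), max(ok)) if ok else None
 print(fc_interval(2.0))    # away from the boundary this resembles the usual x +/- 1.645*sigma interval
 print(fc_interval(-1.0))   # near the boundary the lower end is pinned at 0; never an empty or negative interval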
I can't pretend to give authoritative advice, but I find D'Agostini's work on Bayesian probability in physics readable and interesting in general.Jdannan (talk) 09:55, 16 May 2008 (UTC)
...The use of the word probability is different and incompatible between the two statements. The first refers to the frequency interpretation of (hypothetical) repeated sampling from an infinite population, the second to a Bayesian-type statement of a degree of belief concerning a fixed but unknown parameter. It may be a little unclear in the abstract when everything is unknown, but consider this analogy: prior to tossing a coin, it has a 50% chance of landing heads or tails. Once you have seen that it is a tail, do you still think it has a 50% probability of being heads? Or see my other example on this page about estimating a parameter which is known to be an integer. If your confidence interval happens to contain no integers (which is quite possible) then you KNOW that the probability of the parameter lying in that interval is zero (assuming you are even prepared to use probability to talk about degrees of belief, which a strict frequentist is not).04:30, 14 May 2008 (UTC) —Preceding unsigned comment added by Jdannan (talkcontribs)
So then we all agree that there is no practical difference in the Bayesian vs. the frequentist interpretation and this is a huge waste of bytes. Perhaps that's why people (like me) might, as you claim, have trouble with this concept. It simply has no bearing on the use of the term in reality. If we compute a 90% CI a large number of times, 90% of the CI's will contain the mean. It seems everyone has said that in here more than once and some can't take "yes" for an answer. Is this the part I'm "sadly confused" about? Do you disagree that 90% of the 90% CI's should contain the parameter? ERosa (talk) 13:36, 3 May 2008 (UTC)
No, Jdannan said "Occasionally (often?) one can get away with this [taking a (frequentist) CI to be a (Bayesian) credible interval]" but "sometimes [this] doesn't make much sense, and in slightly less trivial examples, this approach may just fall flat on its face". That's miles away from "agree[ing] that there is no practical difference in the Bayesian vs. the frequentist interpretation". It seems to me that one reason you have trouble with the concepts is not (as you suggest) that there's no practical difference between the two approaches, but that you are either unwilling or unable to work through the details and see how they differ. -- Avenue (talk) 22:46, 3 May 2008 (UTC)
Yes, I agree with darin's posts too. Darin, thanks for giving such a clear diagnosis of the problem and explanation of the issues. -- Avenue (talk) 08:35, 3 May 2008 (UTC)

Would it be fair to say the following? The 90% CL does not mean that there's a 90% chance that the statistic will be inside the interval, but rather that there is a 90% chance that the interval will contain the statistic. In the former, the inference is that if the statistic is not in the interval at 90%, then by expanding the interval you increase the chance that the statistic will be inside it. In the latter, if the interval is the one-in-ten dodgy one, there is no relationship between the position of the statistic and the interval. The statistic may actually be inside it or it may be in some totally different place, and there's no way to know which. Is this what you guys are arguing about? —Preceding unsigned comment added by 125.255.16.233 (talk) 04:48, 3 May 2008 (UTC)

No!! It's not a statistic; it's a parameter. Statistics are observable. Michael Hardy (talk) 14:54, 22 May 2008 (UTC)
No, most of that is wrong. There's a germ of truth in your "not this, but the converse" statement. You are putting greater emphasis on the interval, which is pedagogically correct; the interval is random, so that is what the probability statement is about. But both variations are logically equivalent, so their probabilities are identical. If one probability is 90%, so is the other. You have also confused "parameter" with "statistic" here. In most simple examples, the statistic is in the middle of the confidence interval, and it would be a very unusual confidence interval that did not contain the statistic. The rest seems to be nonsense, and a different sort of nonsense from what we've been arguing about. -- Avenue (talk) 05:49, 3 May 2008 (UTC)

By 'statistic' I mean the population parameter as opposed to the calculated parameter. —Preceding unsigned comment added by 125.255.16.233 (talk) 14:03, 4 May 2008 (UTC)

Statistic and parameter are words with clearly defined technical meanings, as explained in the linked Wikipedia articles. Trying to swap their meanings will only confuse everyone you talk to. Please don't do it. -- Avenue (talk) 15:26, 4 May 2008 (UTC)

Oh, my, what have I wrought by opening this discussion (in the last section)? Well, at least this conversation shows that, indeed, there is a lot of confusion about the topic. I have seen very little in this discussion that appears to me to be a clear explanation of how confidence intervals work (though I don't have the stomach to slog through all comments in detail). I don't have time at this moment, but sometime soon I will be chiming in with how I think confidence intervals should be explained.

(I will make one concrete comment: Some have claimed that whether or not something is known or unknown is irrelevant. That is usually very much not the case in statistics.)Daqu (talk) 16:37, 4 June 2008 (UTC)

table to be added

plus things could be centered and stuff... --Sigmundur (talk) 11:58, 6 September 2008 (UTC)

Confidence interval measured in standard deviations:

  Probability (%)   Std deviations
  50                0.676
  75                1.154
  90                1.653
  95                1.972
  99                2.601
  99.9              3.341
  99.99             3.973
  99.999            4.536

On what distribution is this based? Clearly not the normal distribution, for which correct figures are given at Normal_distribution#Standard_deviation_and_confidence_intervals. Seems to be close to a t distribution with around 200 degrees of freedom. I can't immediately see the relevance of that particular distribution to the current version of the article, however. Maybe Sigmundur could clarify? Qwfp (talk) 13:38, 6 September 2008 (UTC)
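(For anyone wanting to check, the multipliers for a standard normal and for a t distribution with 200 degrees of freedom can be compared directly; a quick sketch, purely to reproduce the comparison described above:)
 from scipy.stats import norm, t
 for p in [0.50, 0.75, 0.90, 0.95, 0.99, 0.999, 0.9999, 0.99999]:
     q = (1 + p) / 2                     # two-sided: central probability p
     # the t(200) column comes out close to the figures in the table above
     print(f"{p:>8}  normal: {norm.ppf(q):.3f}   t(200 df): {t.ppf(q, 200):.3f}")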

Confidence Level / Significance Level

(Message and first reply copied from users' talk pages for general info)

The latest change is better (though the link doesn't work). However, the article still needs a simple definition of confidence level (i.e. something a layman can understand), and more importantly an indication that, when used with statistical testing, it should be interpreted as indicating the significance level, preferably with a link to that article. The reason for this is simply the obvious fact that, for the average layman, the reason they would be looking up confidence levels or confidence intervals is that they have seen something in print reporting the results of statistics-based research and want to know what the terms mean.

We can do the standard wikipedia "who can hold their breath longest" thing over this, but I'd rather be sensible about it. Perhaps if you changed the article to include these changes in the way you want, we can avoid the baby stuff.

Jim 125.255.16.233 (talk) 12:44, 12 October 2008 (UTC)

The link to the section further down the article works for me, but it only jumps to a section containing the definition. Your earlier edits seemed to imply that it is standard to use "significance level" in the context of a confidence interval and "confidence level" in the context of a significance test. I don't think this is done, and it is only of use when developing one or the other via the significance-test inversion approach. You may be right that a more informal description of confidence levels is needed ... possibly a short separate article would be better than trying to fit it in within the article for confidence intervals. Melcombe (talk) 09:22, 14 October 2008 (UTC)


The problem is that, for laymen, the usual contact they have with statistics is of the kind where they see stated something like "scientists have discovered that eating onions increases your chance of getting brain cancer by 52%", which is inevitably nonsense. If they research this they will see the statistic printed as 0.52 (95% CI, 0.29-0.90). When they look up CI they will find it means confidence interval. When they look that up they will see that this means that 0.29-0.90 is the interval likely to contain the chance of getting brain cancer, with the confidence level 95% indicating how likely that is. And that's where they stop, "knowing" that the scientist is 95% likely to be right, i.e. that eating onions almost certainly causes brain cancer. If you look up significance level you will find all the warnings about what the 95% actually means and about multiple tests, but there is nothing here to guide them to that, and there are no warnings here. There needs to be something at the top of the article that leads to a layman's explanation of this problem. A definition of the confidence level in terms of the significance level is one way of doing that. Can you suggest another way? 125.255.16.233 (talk) 09:46, 15 October 2008 (UTC)

There's one proof-by-contradiction-type argument that I've seen somewhere (but unfortunately can't remember where) that brought home to me that you can't have it both ways, i.e. you can't interpret frequentist confidence intervals as meaning there's a 95% probability of the true parameter lying within the calculated values:
Say two samples from the same population result in non-overlapping 95% confidence intervals. Correctly interpreted, this is perfectly possible, just very unlikely (1 in 800??). If you misinterpret the intervals as probability statements about a random parameter and those two fixed intervals, however, you find that the probability that the parameter lies in either interval is 0.95 + 0.95 = 1.90 (using the axioms of probability and the fact that it can't be simultaneously in two non-overlapping intervals). It's simplest to think about if the top end of one CI happens to coincide exactly with the bottom of the other: e.g. if the 95% CI for the mean is [1.2, 1.5] in the first experiment and [1.5, 1.8] in the second, then the probability the true population mean lies in [1.2, 1.8] is 1.9. But probabilities can't exceed 1, so something's gone badly wrong. This should be pretty convincing to any student of mathematics, but I couldn't call it a layman's explanation of the problem. Qwfp (talk) 11:18, 15 October 2008 (UTC)
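(A quick simulation sketch of how often two independent 95% intervals from the same normal population fail to overlap at all, assuming a known standard deviation; all the numbers are illustrative.)
 import numpy as np
 rng = np.random.default_rng(2)
 mu, sigma, n, z95 = 0.0, 1.0, 30, 1.96
 se = sigma / np.sqrt(n)
 trials = 200_000
 m1 = rng.normal(mu, se, trials)          # sampling distribution of each sample mean
 m2 = rng.normal(mu, se, trials)
 # two intervals m +/- z95*se fail to overlap when the means differ by more than 2*z95*se
 disjoint = np.abs(m1 - m2) > 2 * z95 * se
 print(disjoint.mean())                   # roughly 0.005 to 0.006: rare, but it does happen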
If you go back to the underlying mathematics, and remember that the confidence interval is an expression of the significance level and the probability of the observed event, you will realise that, because 5% of the time the event is just a random event that happens to be rare enough to satisfy the test, the confidence interval it produces could be anything. There's no reason why it should overlap a genuine confidence interval. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)
I think that at one stage there were article versions (for CI's and significance tests) that attempted to define confidence level as one (or 100) minus a significance level and also to define significance level as one (or 100) minus a confidence level, which would be circular. I think that the notions of confidence intervals and significance tests need to be treated separately, as practical applications do not always make any attempt to deal with them jointly. The basic definition of a confidence level is in terms of the coverage probability. If a 2-, 3- or 4-sentence layman's explanation in terms of coverage can be added to the introduction, then OK; if it needs to be much longer than this then either a separate article is needed or a new subsection might be made somewhere. Unfortunately, I haven't seen a good example of an article divided into moderately long layman's and technical portions. I would agree that the CI article needs to have something about constructing confidence intervals by inverting significance tests, but it shouldn't imply that this is always done, which would be the danger in a poorly written layman's explanation in terms of significance levels. Melcombe (talk) 15:11, 15 October 2008 (UTC)

I'm not sure yet that I'm putting this right, so I'll try an analogy. If I were writing a book about the countries of the world I might include a section on Myanmar. But if someone looks for Burma they will be looking in the wrong place. So I would include a reference under Burma to Myanmar. If a layman looks up confidence in wp, seeking to understand what the result of a statistical test means, they will be taken to this article. There is nothing here to explain to them that they need to go look at the article on statistical significance. Either this article is divorced from statistical testing and so that reference needs to be here, or there needs to be a prominent section for laymen explaining the use of confidence intervals in understanding the results of statistical testing. 125.255.16.233 (talk) 16:02, 17 October 2008 (UTC)

You may be expecting too much of what is only one of about 1500 statistics-related articles. However, I have added a new section at Confidence interval#Relation to hypothesis testing that may be a start on what you are thinking of. Ideally it should have a formal proof of the equivalence. Melcombe (talk) 12:14, 21 October 2008 (UTC)

This misses the point in that it is hardly something that laymen are going to understand. You must be an academic because you don't seem to have any idea of how to write for the general public. :-) Might I suggest the following to replace the last statement in the first section: "When the result of a statistical test is presented with a confidence interval, the confidence level of the interval indicates the significance level of the test (significance level is 100% minus confidence level)." With significance level linking to the appropriate article. This addresses my concerns; would you be happy with it? Jim 125.255.16.233 (talk) 14:12, 21 October 2008 (UTC)

No. You want "When the result of a statistical test is presented with a confidence interval" ... but the result of a statistical test is either yes or no (accept/reject) and a confidence interval is a different thing. If both a test and a confidence interval are presented there need not be a connection between their associated probabilities. Possibly you could have the result of a statistical analysis being presented as a confidence interval, with the confidence interval either covering or not covering a particular value of special interest, and this could then be interpreted as providing a significance test. Note that "analysis" is not the same as "test" and that in statistics "test" has a particular meaning. Something like the following might do: "The parameter values inside a 100(1-α)% confidence interval can usually be regarded as being those values for which a significance test (of the null hypothesis that the true parameter value is the given value) would be accepted at a significance level of 100α%, while those outside would be values for which the test would be rejected." Melcombe (talk) 15:29, 21 October 2008 (UTC)
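(As a sketch of that inversion in practice, using a one-sample t-test and the matching 95% interval; the data are simulated purely for illustration.)
 import numpy as np
 from scipy import stats
 rng = np.random.default_rng(3)
 data = rng.normal(10.0, 3.0, 40)
 mean, sem = data.mean(), stats.sem(data)
 lo, hi = stats.t.interval(0.95, len(data) - 1, loc=mean, scale=sem)   # 95% confidence interval
 for mu0 in [lo - 0.5, lo + 0.1, mean, hi - 0.1, hi + 0.5]:
     # values inside the 95% interval are exactly those not rejected at the 5% level
     p = stats.ttest_1samp(data, popmean=mu0).pvalue
     print(f"mu0={mu0:6.2f}  inside CI: {lo <= mu0 <= hi}  two-sided p: {p:.3f}  rejected at 5%: {p < 0.05}")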

And what layman will ever have a hope of understanding that? A definition for laymen means no jargon and no mathematical formulae beyond simple arithmetic. That's why I spelt out the word minus. I regularly read published papers that say things like: "Total energy was associated with increased risk for both local and regional/distant stage disease. The adjusted odds ratios [95% confidence intervals (CIs)] contrasting highest to lowest quintile of energy intake were 2.15 (95% CI, 1.35–3.43) for local and 1.96 (95% CI, 1.08–3.56) for regional/distant disease." In other words, they calculated the odds ratio to be 2.15, generated a confidence interval of 1.35 to 3.43, and because it does not contain 1 are saying that the result is positive. So we see that the result of a statistical test (that there was an association) is being presented with a confidence interval (95% CI, 1.35–3.43). I want a layman seeing this in a paper and coming here to look up confidence interval to go from this article to the one on significance levels, where they can read about the pitfalls of accepting this as scientific fact. How would you suggest I phrase it? Without using jargon or "complex" formulae? 125.255.16.233 (talk) 13:00, 22 October 2008 (UTC)
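(For what it's worth, the kind of result quoted above can be reproduced in sketch form from a 2x2 table. A common 95% interval for an odds ratio is built on the log scale, and "the interval excludes 1" is then the same statement as "the two-sided Wald p-value is below 0.05". The counts below are made up purely for illustration.)
 import numpy as np
 from scipy.stats import norm
 # hypothetical 2x2 table: rows = exposed/unexposed, columns = cases/controls
 a, b, c, d = 60, 40, 45, 65
 log_or = np.log((a * d) / (b * c))
 se = np.sqrt(1/a + 1/b + 1/c + 1/d)          # standard error of the log odds ratio
 lo, hi = np.exp(log_or - 1.96 * se), np.exp(log_or + 1.96 * se)
 p = 2 * norm.sf(abs(log_or) / se)            # Wald test of odds ratio = 1
 print(f"OR = {np.exp(log_or):.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {p:.3f}")
 # for this Wald-type construction the interval excludes 1 exactly when p < 0.05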