Talk:Binomial distribution

Clarifications
If you go to previous versions and look at the first one, dated 02/15/2001 (which is yours?), you will see:

1). q (1-p), maybe a typo?

2). And the formula for the numbers of ways of picking X items out of N items was: N!/X!/(N-X)!. This is plain wrong. Yes, after requesting a change for a week, I changed it.

3). There were also wording problems. RoseParks.

I see the problem now. (1-p) was intended as a parenthetical definition. I guess N!/X!/(N-X)! worked in my programming code, so I couldn't see the ambiguity. How would you calculate N!/X!/(N-X)!? From right to left? On the other hand, today is 02/20/2001, so I think your "requesting a change for a week" is a bit off. Today is only the 20th by my calendar. In any case, the criticism has led to something better. Dick Beldin

In answer to your question on how you evaluate N!/X!/(N-X)!: this is ambiguous. Take an easy example. 2/4/12 is ambiguous since division, unlike multiplication, is not associative over the reals. If you look at division as the inverse operation of multiplication, i.e. 2/4/12 = 2*4^(-1)*12^(-1) = 1/24, you are okay. If you look at division in the ordinary sense, you must specify the order of operations. RoseParks
 * (2/4)/12= 2/48=1/24 while
 * 2/(4/12)= 24/4= 6.

I agree that an expression with successive divisions appears ambiguous. Most mathematicians I know do indeed consider division as the inverse of multiplication and many programming languages explicitly specify that multiplication and divisions are performed left to right. You are correct, it is not a universal convention. In addition, the vertical placement of numerator and denominator is clearer. Dick Beldin
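To make the left-to-right convention discussed above concrete, here is a minimal Python sketch. Python is only one example of a language that evaluates chained division left to right, and the helper name `ways_chained` is made up for this illustration.

```python
from math import comb, factorial

# Python evaluates chained division left to right: 2/4/12 means (2/4)/12.
left_to_right = 2 / 4 / 12
grouped_left = (2 / 4) / 12      # 1/24
grouped_right = 2 / (4 / 12)     # the other reading, about 6

def ways_chained(n, x):
    """N!/X!/(N-X)! read left to right, using exact integer division."""
    return factorial(n) // factorial(x) // factorial(n - x)

print(left_to_right == grouped_left)       # the chained form matches the left grouping
print(grouped_right)
print(ways_chained(10, 3), comb(10, 3))    # both give the binomial coefficient
```

Read left to right, the chained expression agrees with the usual N!/(X!(N-X)!); read right to left it would not, which is the ambiguity raised in this thread.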

Confidence Interval?
I was looking for information about confidence intervals on a binomial distribution, but was surprised not to find it here. I know this case isn't quite as simple as for normal distributions, but it would be nice to have here, if somebody would like to contribute the information.

You mean the CI of p, the success probability, as estimated from the data. If there are 70 successes in 100 trials, then p_est = 0.7, and your question is what the standard deviation of p_est is. It is sqrt(p_est(1-p_est)/n_trials). The 95% confidence interval is approximately p_est +/- 2 standard deviations. My question is what happens if the CI range is outside the allowed 0 to 1 range for a probability. This can happen if p_est is ~1 or ~0. The CI has to be asymmetric. Any ideas?


 * In the case where the confidence interval gets close to 0 or 1, the normal approximation of the binomial distribution is not accurate and rules like your "2 standard deviations" that are derived from the normal distribution are not accurate either. Depending on the circumstances, one can use a different approximation (such as the Poisson distribution) or the exact values of the binomial distribution. McKay 06:36, 27 October 2006 (UTC)


 * Another intuitive understanding of confidence intervals on the binomial distribution is this: say you observe S successes in N trials. We don't know what p really is, but let's make a guess p_guess.  You can use the binomial distribution to calculate the probability of seeing [S_observed or more successes] in N trials if p=p_guess.  If S~N (p~1), there will be a value for p_guess so that there is only a 5% chance of getting S>=S_observed.  At this value for p_guess, you would say you are 95% confident that p>=p_guess.  Similarly if S~0.  For example, if you did a trial with 45 samples (N=45) and all of them were successful (S=45), then with p_guess=.95 there is only a 0.099 chance of seeing all 45 successful, so you can be 1-0.099 => 90% confident that p>=95%. —The preceding unsigned comment was added by 64.122.234.42 (talk) 17:14, 11 April 2007 (UTC).


 * Yes, this is the kind of consideration I was shooting for with my original post. While I understand adequately how to create a hypothesis test for a one-sided alternative, I was hoping that someone would come forward with a good methodology for doing a two-sided alternative hypothesis, since this would imply some means of parameterizing the asymmetry of the distribution.  I had this come up in a real-world scenario, where the  question was whether or not we had a statistically significant result, and how close it might actually be, but the interval was not so critical or well-defined.
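The two approaches in this thread can be sketched in Python as follows. This is an illustration only: the function names are invented here, the root-finding is plain bisection, and the two-sided version simply splits the error probability equally between the two tails (the construction usually called the Clopper–Pearson exact interval), which is one common choice rather than the only one.

```python
from math import comb, sqrt

def binom_tail(s, n, p):
    """P(X >= s) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(s, n + 1))

def binom_cdf(s, n, p):
    """P(X <= s) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, s + 1))

def wald_interval(s, n, z=2.0):
    """Normal-approximation interval: p_est +/- z standard deviations."""
    p_est = s / n
    half_width = z * sqrt(p_est * (1 - p_est) / n)
    return p_est - half_width, p_est + half_width

def solve_monotone(f, target, increasing):
    """Bisection on [0, 1] for f(p) == target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies to the left of mid
    return (lo + hi) / 2

def lower_bound(s, n, alpha):
    """p_guess with P(X >= s | p_guess) = alpha (one-sided lower limit)."""
    if s == 0:
        return 0.0
    return solve_monotone(lambda p: binom_tail(s, n, p), alpha, True)

def upper_bound(s, n, alpha):
    """p_guess with P(X <= s | p_guess) = alpha (one-sided upper limit)."""
    if s == n:
        return 1.0
    return solve_monotone(lambda p: binom_cdf(s, n, p), alpha, False)

def two_sided_interval(s, n, alpha=0.05):
    """Equal-tailed exact interval: alpha/2 in each tail."""
    return lower_bound(s, n, alpha / 2), upper_bound(s, n, alpha / 2)

print(wald_interval(70, 100))        # the "+/- 2 standard deviations" rule
print(0.95 ** 45)                    # chance of 45/45 successes if p = 0.95
print(lower_bound(45, 45, 0.05))     # 95% one-sided lower limit for 45/45
print(two_sided_interval(70, 100))   # asymmetric interval, always inside [0, 1]
```

Because the exact limits are found by inverting binomial tail probabilities, they cannot leave the range 0 to 1, and they become asymmetric about p_est when S is near 0 or N.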

Simulation?
I was looking for a pointer to quickly simulate a Binomial trial. That is, given a p and an n, I want to randomly select a result with a Binomial distribution. I know I can approximate this with a normal distribution, but I would prefer an exact result if it can be calculated quickly for n < 10,000. I'm sure others have come here looking as well. Thanks.


 * I added two references to the article which describe binomial random variate generation. A modern C implementation of Kachitvichyanukul and Schmeiser's BTPE algorithm is available as part of the GNU Scientific Library. --MarkSweep ✍ 04:08, 8 October 2005 (UTC)
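For n below about 10,000, as in the question, the simplest exact method is to run the n Bernoulli trials and count successes. The Python sketch below does that. It is not the BTPE algorithm mentioned above, just the direct O(n) approach, and the function name `binomial_variate` is made up for this illustration.

```python
import random

def binomial_variate(n, p, rng):
    """Draw one Binomial(n, p) sample by counting successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(12345)   # fixed seed, only so the sketch is repeatable
samples = [binomial_variate(1000, 0.3, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)           # should land close to n*p = 300
```

The cost grows linearly in n per sample, which is why algorithms such as BTPE are preferred when n is very large or many samples are needed.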

HIV positive?
Is it me, or should the "A typical example is the following: assume 5% of the population is HIV-positive." part in the second paragraph be changed to something a little less... you know... The HIV part is just not encyclopedia-ish...


 * That might depend on which population. Michael Hardy 19:42, 22 October 2005 (UTC)


 * Spot on. I thought exactly the same and immediately looked at the discussion. All political correctness aside, I just don't think anyone would feel harassed if we wrote "assume 5% of the population carry a certain gene" or "are infected with a certain disease", while I am very sure that everyone with an HIV-infection or someone who knows someone closely who is infected will at least feel strange on reading this paragraph. I am all against political correctness for its own sake, but if there's no need whatsoever to use a certain formulation that might be considered inappropriate, why use it?

Probability mass function?
Okay, maybe this is standard jargon somewhere, but I've never come across it until today. I guess "mass" makes sense by the physical analogy to density. Honestly, I think it's stupid language. Should we also speak of cumulative mass distribution functions? Be consistent! I'm not going to change it, but a mathematician should. At the very least link it to the pmf page.


 * pmf is fairly standard. It is linked there now.  No, cumulative mass distribution function is not a phrase I have heard. --Richard Clegg 08:25, 6 February 2006 (UTC)

CDF Example Request
The article gives the following example: "A typical example is the following: assume 5% of the population is green-eyed. You pick 500 people randomly. How likely is it that you get 30 or more green-eyed people?".

This is a CDF example. Unfortunately, the expression given for CDF is not very clear to me. How about giving a worked example with the green-eyed people given in the article as a good example, please? --New Thought 15:12, 8 May 2006 (UTC)


 * I think the given CDF is really merely an introduction of notation. Perhaps there is no simple closed-form expression for the CDF, although there is an obvious algorithm for computing its values (just add up the appropriate values of the mass function). Michael Hardy 18:27, 8 May 2006 (UTC)


 * aha - that's the answer I was looking for! In that case, why not say something like, "The value can be computed with..."


 * $$cdf(k;n,p) = \sum_{k=1}^n {n\choose k}p^k(1-p)^{n-k}\,$$ --New Thought 09:14, 9 May 2006 (UTC)


 * Actually, in this case the CDF is
 * $$F(k;n,p) = \sum_{j=0}^k {n\choose j}p^j(1-p)^{n-j}$$
 * --MarkSweep (call me collect) 10:43, 9 May 2006 (UTC)
 * Good correction - I have added this expression to the article! --New Thought 13:04, 9 May 2006 (UTC)


 * That is correct only when k is an integer, and only when 0 ≤ k ≤ n. Michael Hardy 21:29, 9 May 2006 (UTC)


 * Why is it necessary to express the CDF's upper bound of summation in terms of the floor function? The binomial distribution support already indicates that the random variable must take on positive integer values, with the exception of zero (0, 1, ... n). Zane Dylanger (talk) 16:52, 5 June 2010 (UTC)


 * This is the binomial distribution - how can k not be either 0 or a positive integer? --New Thought 08:30, 10 May 2006 (UTC)
 * Just wanted to add - thanks for your help in getting to this article improvement! --New Thought 16:34, 10 May 2006 (UTC)


 * In that specific example, we have
 * $$\Pr[X \geq 30] = 1 - \Pr[X \leq 29] = 1 - F(29; 500, 0.05)$$
 * $$= 1 - I_{0.95}(471,30) = I_{0.05}(30,471) \approx 17.647\%.$$
 * You can compute this in terms of the incomplete Beta function, as indicated in the article, using your favorite numerical software. For example, in Mathematica this becomes . Direct summation is likely going to be less numerically stable than a carefully designed subroutine for evaluating the incomplete Beta function. --MarkSweep (call me collect) 06:11, 9 May 2006 (UTC)


 * Thanks very much for your response. I agree with you - and as it happens, I do use Maxima, which has a shed-load of distribution functions (load(distrib); followed by functions; will show them) - but I wanted to write the functions in Javascript for a web page. I went ahead and wrote the web page using the Poisson distribution - but I still think that this article should give expressions that people can use in normal languages and spreadsheets! I feel I've done my bit for Wikipedia maths clarity - in the Lottery_Mathematics article, mostly written by me, I did my best to make it clear exactly how to do each calculation! --New Thought 09:14, 9 May 2006 (UTC)
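For readers who want the "just add up the mass function" recipe from this thread in an ordinary language, here is a hedged Python sketch. The function names are invented for illustration, and the floor and clamping steps are one way to handle the non-integer and out-of-range k raised above. Direct summation is fine at this size, though, as noted in the thread, an incomplete Beta routine is likely to be more stable for very large n.

```python
from math import comb, floor

def binom_pmf(j, n, p):
    """P(X = j) for X ~ Binomial(n, p)."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def binom_cdf(k, n, p):
    """F(k; n, p) = P(X <= k); k may be any real number."""
    if k < 0:
        return 0.0
    top = min(floor(k), n)   # floor for non-integer k, clamp at n for large k
    return sum(binom_pmf(j, n, p) for j in range(top + 1))

# Green-eyed example from the article: n = 500, p = 0.05, 30 or more successes.
tail = 1 - binom_cdf(29, 500, 0.05)
print(f"{tail:.3%}")   # compare with the value of about 17.647% quoted above
```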

"nmemonic" section
I really dislike the "nmemonic" section. If anyone else agrees, please delete it. McKay 14:55, 11 June 2006 (UTC)


 * I agree. The mnemonic section is laughable. I'm deleting it. Rjmorris 14:44, 18 June 2006 (UTC)

How about putting it here, then with attention on it someone may come up with better. Tabby 03:44, 31 October 2007 (UTC)

Here is the diff:. But I agree with deleting it, it is unencyclopedic. Sander123 13:58, 31 October 2007 (UTC)

Relationship to Bezier curves?
The article currently states: The formula for Bézier curves was inspired by the binomial distribution.

Would someone care to source that statement? It seems rather dubious to me, but if it's true it's worthy of a proper explanation and not the vague description of being "inspired by". Certainly the Bernstein polynomials, which constitute the basis functions for Béziers, contain a Binomial coefficient. But binomial coefficients exist all over the place. It doesn't necessarily imply that they have much at all to do with the Binomial distribution.

From reading about Bézier curves I've always had the impression that the decision to use Bernsteins as their parametrization wasn't 'inspired' by anything, but merely chosen from a group of candidates on the merit of their desirable properties. (Being such properties as the fact that curve is guaranteed to be contained within the convex hull of the control points, that reversing the control points does not change the curve, that the tangents at the endpoints consist of the line between the endpoint and the neighboring control point, etc). --130.237.179.166 14:48, 3 September 2006 (UTC)


 * I'm deleting this since no justification has been offered. Zillions of things are "inspired" by the binomial distribution anyway and I don't see why this one is important enough to single out even if it is true.  McKay 04:31, 28 October 2006 (UTC)

Better Example
I feel like there could be a better example than picking 500 people out of a population "with replacement" and seeing how many were green-eyed. Perhaps a more sensical and applicable example could be: out of 50 web servers, each of which has a 1% chance of failing by the end of the day, how many failed servers do you have at the end of the day?

—The preceding unsigned comment was added by 18.216.0.100 (talk • contribs).


 * I agree. The current example suffers from the need to do sampling with replacement, which will seem unnatural to people unaccustomed to sampling theory. --McKay 05:52, 29 November 2006 (UTC)


 * I agree with both of you. How about simply the "toss a coin..."? Hackneyed, perhaps, for us, but surely we want the general reader to "get the picture" as easily as possible? Gerald Tros 01:41, 21 May 2007 (UTC)

The example with a die is OK in principle. Most people, I reckon, will have seen and used dice. But why change the well known configuration ( 1 thru 6 dots) with "5 blank and 1 black side"? This now makes a familiar object unfamiliar and thus more difficult to mentally latch onto. Furthermore, a lesser issue, 'blank' and 'black' are two very similar words possibly leading to misreading. Why not just simply use: "Roll a die ten times and count the number of sixes.", thus appealing to a general feeling of wishing to see the highest value side turn up? Gerald Tros 01:41, 21 May 2007 (UTC)
 * I agree I like the original better as well. Sander123 09:53, 21 May 2007 (UTC)

--Why not start with a coin example? Isn't this the most straight-forward? The one that everyone did in 4th grade??--128.135.96.223 (talk) 20:34, 9 March 2008 (UTC)

-- Why does the article use a biased coin in the introductory example? Why not a normal coin that people encounter in a normal life? Bosons (talk) 16:49, 19 January 2017 (UTC)

"Kitchen's theorem"
I deleted a new section on "Kitchen's theorem". It began by saying "...we can see by Kitchen's Theorem that..." without having first said what "Kitchen's theorem" is. That is not appropriate. Then, as far as I can tell, the theorem turned out to be a proposition found in many textbooks without the name "Kitchen's theorem". The notation in which it is written includes the use of the same letter for two different random variables in the same equality. Near the bottom it has some notation that is less than correct and that includes some very clumsy language. Then there is a signature---appropriate for a talk page but not for an article. It includes "Dr. William Kitchen PhD (Psychology)", apparently identifying that person as the one who added this material. It looks like an attempt to name after himself a proposition found in innumerable textbooks since before the births of most (or all?) people now living. Michael Hardy 20:11, 23 March 2007 (UTC)

Well Michael, it's nice to see a fellow  'Mathematician' scrutinising my work, labelling it a 'proposition'. Given the fact that my Theorom has went under rigorous investigation within a university, I fail to see how you can ever have seen it in "innumerable" textbooks. Perhaps you could name a few of them for my reference. And lets not get into a Mathematical jargon slanging match; whoever you are, I would be confident in my own Mathematical standing to stand before anyone and prove my Theorem/ lemma. And, if it is indeed in many textbooks, I'd urge you to publish a proof of my statement. I have it on good authority, from highly esteemed Mathematicians, that the Theorem I put online is indeed a new and may I add correct proposition. It wasn't a Theorem as such, hence why I referred to it as the Binomial Lemma. I trust you know what a Lemma is! In future, before you make such claims, ensure that the nature of your statements is true. Do that rather than correcting me. And in response to this, if you do indeed give one, I'd appreciate being referred to as Dr. William Kitchen. — Preceding unsigned comment added by 84.66.3.105 (talk • contribs) 18:00, 29 March 2007


 * From A First Course in Probability, Fourth Edition (1994) by Sheldon Ross, page 181, exercise 26, quoted verbatim:
 * Let X be a negative binomial random variable with parameters r and p, and let Y be a binomial random variable with parameters n and p. Show that
 * $$P\left\{X > n\right\} = P\left\{Y < r\right\}\,$$
 * If you want to attribute this result to yourself in a Wikipedia article, may I suggest that you cite some published paper that you've written in which you state it? What was the nature of this "rigorous investigation"?  Was it simply mathematicians confirming that the result is correct?  If so, that's hardly surprising.  Was it mathematicians with expertise in probability theory saying the result is new and was unknown before you introduced it?  If so, I would find that surprising and I would dispute it.  Or was it a professor saying he did not happen to have seen it before?  If he's not a probabilist, that's not too surprising and is not the same as saying that it is novel. Michael Hardy 20:14, 29 March 2007 (UTC)
 * ...oh, and since you emphasize that it's your own result, you should not put it in the article unless you also cite some place where you've published it in a journal, since otherwise it would be original research being presented here for the first time. Original research is contrary to Wikipedia policy. Michael Hardy 21:57, 29 March 2007 (UTC)
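The identity quoted from Ross above is easy to check numerically. The Python sketch below assumes the convention that the negative binomial X counts the number of trials needed to obtain the r-th success (an assumption about the textbook's definition, stated here because conventions differ), and the function names are invented for this illustration.

```python
from math import comb

def negbin_tail(n, r, p):
    """P(X > n), where X is the trial on which the r-th success occurs (assumed convention)."""
    return 1 - sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
                   for k in range(r, n + 1))

def binom_below(r, n, p):
    """P(Y < r) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(0, r))

for n, r, p in [(10, 3, 0.4), (20, 5, 0.25), (7, 7, 0.9), (4, 6, 0.5)]:
    print(n, r, p, negbin_tail(n, r, p), binom_below(r, n, p))
```

Both sides describe the same event: the r-th success has not yet happened after n trials exactly when fewer than r successes occurred in those n trials.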

What you quoted from this textbook isn't even the same as my Theorem. And do not quote Wikipedia policy to me - take me to court, sue me, do whatever you wish. I have this Theorem in a journal, and have had it copyrighted to my name, so that scavengers on internet sites cannot attribute a novel idea to a text book they happen to have read. I had it checked, along with a proof by a university Professor who specialises in the concepts of probability and statistics. It then underwent a stage of 'gaining plausibility', and under further rigorous proof. There was a work through proof, and a proof by induction which clearly shows that the NEW theorem works, for all the possible values it outlines. I think you'll find the quote you have from your book involves a different concept to what I outlined before. I'll tell you what: take a look at it, and as Fermat said before he published his last Theorem "prove me wrong": I've got a mortgage on it saying you can't!! All the best, Dr. William Kitchen — Preceding unsigned comment added by 84.66.3.105 (talk • contribs) 22:59, 29 March 2007


 * OK, I will go back and look carefully at what you added to the article. But if it is to be included, it should be written clearly, using standard notation (not, for example, using the same letter for two different random variables in the same breath), standard language and spelling (e.g. "theorem", not "theorom" as you wrote above) and following standard Wikipedia conventions (e.g. who wrote what is in the edit history, NOT in the article itself).  However, it would be a lot more efficient for you simply to tell me where to find your published article in the library than for you to go on at length about the whole history of your writing the article.  (Oh, and I trust when you mention the copyright, you mean copyright on the article you wrote rather than on the theorem itself.) Michael Hardy 23:34, 29 March 2007 (UTC)

Well, I appreciate that. Like all Mathematicians, I like recognition for my work. I had to have it rigorously checked and compared with similar Theorems and Lemmas, to ensure I wasn't putting my name to a piece of work that someone else had previously discovered. Notation is a blunder, I hold my hands up on that front, and I understand the elementary nature of my error. I can provide you with my proof for the Theorem as soon as I finish my textbook which is in finalisation at the moment. All my work is momentarily on hold because of that. I welcome any scrutiny of my work - I feel that Mathematics is best done when under pressure from other esteemed Mathematicians. The workings of Wikipedia, however, are something I am not aware of, and I appreciate any guidelines you offer me to follow. Again, however, as I have already said, I know I can stand before any Mathematician and prove my Theorem. Regards Dr. William Kitchen
 * Hello Dr. William Kitchen, please try to relax a bit, nobody is trying to discredit your work. But we are talking at cross purposes. What one wants for an encyclopedia article on the binomial distribution is the fact that it is related to the negative binomial distribution. Ideally such a statement should be sourced. If appropriate a proof can be added. There were a number of problems however with your contribution and Michael rightly reverted it. The notation is problematic (using X twice, using r both as an index and a parameter). The proof doesn't add to this article since it doesn't actually prove the theorem, it only gives some basic definitions and a referral. And finally, the theorem quoted is unknown to mathematicians, so it doesn't help one at all.


 * In my view the statement relating the two distributions can stay in the article. But the proof you supplied should either be replaced by a proper proof or by a reference to a published book or peer reviewed paper.


 * As one final point, please do not make legal threats. Also wp:nor is established wikipedia policy, and this is not the place to put it to discussion. Sander123 12:09, 30 March 2007 (UTC)

Dr. Kitchen, could you tell us the title of the paper and the name of the journal and which issue it's in? That would really be a whole lot more to the point than telling us how confident you are that everything about it is sound. Michael Hardy 20:31, 30 March 2007 (UTC)

Normal Approximation
Not sure about the statement
 * This approximation is a huge time-saver (exact calculations with large n are very onerous);

The exact calculations are only onerous if one doesn't have a computer. Considering that virtually all statistics is done over computers these days the above seems unimportant. 128.195.106.28 23:55, 31 March 2007 (UTC)


 * It is less important than it used to be, but if n is very large the exact computation can still be onerous. Perhaps more important is that the normal approximation means that a great many statistical tests designed for the normal distribution (such as the Student t-test, the F-test) can also be used for the binomial distribution under the right conditions. --McKay 05:38, 1 April 2007 (UTC)

Hmm, just a reader here, but I can't make a modern computer delay visually for any reasonable n (up to 9999999999) when using the exact solution. I advise my students to always use the exact test and that the normal approximation is a relic of a bygone era. However it is interesting and perhaps worth noting why the binomial becomes normal-ish. Also, I thought the ability to use the normal approximation was based on np not n - with a low enough p, even a huge n will be skewed.4.79.81.6 04:45, 1 November 2007 (UTC)

I experienced, that the normal approximation is indeed a time-saver if e.g. computing many different binomial distributions. In my case -- using octave -- computation speeded up a lot, especially since I was using quite large n's and always had to sum up about n/2 distributions (for only one point in the plot). So thank you for mentioning it in the article! --129.13.186.1 (talk) 10:13, 18 September 2009 (UTC)
 * But you might have been better off using the incomplete beta function result that is included. Melcombe (talk) 15:53, 18 September 2009 (UTC)

Your end result for the binomial approximation is incorrect. It should be N(np, (np(1-p))^1/2). You currently have N( np, (np(1-p))). —Preceding unsigned comment added by 24.29.95.138 (talk) 21:59, 24 January 2010 (UTC)
 * The formula as given in the article is correct. It matches the mean (np) and the variance (np(1−p)) of the binomial and the normal distributions.  …  st pasha  »  22:42, 24 January 2010 (UTC)
 * But normal distributions aren't given by mean and variance, they're given by mean and standard deviation.
 * It can be done both ways. In the Wikipedia article, the normal distribution is defined in terms of the variance, so to be consistent, it's probably best to do it that way here too. PAR (talk) 16:00, 25 January 2010 (UTC)
 * In words, one can say “a normal distribution with mean xxx and standard deviation yyy”. But when writing a formula, it is always the $$\mathcal{N}(\mu, \sigma^2)$$, and I've never seen it otherwise.  …  st pasha  »  09:49, 26 January 2010 (UTC)
 * Actually, come to think of it, neither have I. But there's no mathematical proof that says that's the way it has to be done, that's what I meant. PAR (talk) 16:01, 26 January 2010 (UTC)
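The mean/variance versus mean/standard-deviation confusion in this thread also shows up in software. As an illustration, Python's standard `statistics.NormalDist` takes the standard deviation as its second argument, so the formula's variance np(1−p) has to be square-rooted first. The sketch below compares the approximation (with a continuity correction of 0.5, a choice made here and not taken from the article) against the exact binomial CDF; n, p and k are arbitrary example values.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.5, 55
mean = n * p
variance = n * p * (1 - p)          # the second parameter in N(np, np(1-p))

# NormalDist expects the standard deviation, so take the square root.
approx = NormalDist(mu=mean, sigma=sqrt(variance))
# Passing the variance where a standard deviation is expected gives a different curve.
mistaken = NormalDist(mu=mean, sigma=variance)

exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))
approx_value = approx.cdf(k + 0.5)      # continuity correction
mistaken_value = mistaken.cdf(k + 0.5)

print(exact, approx_value, mistaken_value)
```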

Explicit derivations of mean and variance
This section is my first contribution. I sincerely hope it's sensible to have done so and that it is a (potential) boon to readers. I'm honing it, adding links, references, improving text etc. Please give me a couple of days, I'll post it in one single edit. I'd appreciate any advice you have for me regarding content choice, style etc. Thank you. Thanks already to Michael Hardy. Gerald Tros 01:34, 25 April 2007 (UTC). OK, a couple of weeks. It's almost ready :-) Gerald Tros 01:31, 11 May 2007 (UTC) Done. Gerald Tros 01:28, 16 May 2007 (UTC)

Might it not be a lot easier to demonstrate this proof using generating functions? I can easily do it this way, unless anyone can spot a good reason not to (it requires a lot less algebra...but does requires some GF results) Wrayal 20:45, 31 May 2007 (UTC)
 * I can see that it would be easier. But it would require more starting knowledge. I'd guess that anybody who knows about generating functions does not need to look up the derivation of the mean in wikipedia. Therefore I think the derivations should be kept as elementary as possible. Sander123 13:20, 5 June 2007 (UTC)

Incorrect cdf
The cdf of a discrete distribution must be piecewise constant. ПБХ 15:13, 21 September 2007 (UTC)

derivations
I hope somebody could help me in finding derivations or how to derive the skewness and kurtosis, even link to other sites will be much appreciated. —Preceding unsigned comment added by Student29 (talk • contribs) 19:29, 16 January 2008 (UTC)

bad language
After giving the expectation as np, the article states "This fact is easily proven as follows. Suppose first that we have exactly one Bernoulli trial. We have two possible outcomes, 1 and 0, with the first having probability p and the second having probability 1 − p; the mean for this trial is given by μ = p." This is not a proof. These sentences should really just be removed. —Preceding unsigned comment added by 68.50.194.132 (talk) 19:59, 16 February 2008 (UTC)

Variance
In the section entitled Mean, variance and mode, it isn't clear to me how the expression given follows from "Using the definition of variance, we have..." Should I try to find this in the entry for variance, figure it out from the problem statement, or use the definition of variance given just above? In any case I don't see how it follows.Telliott (talk) 11:43, 14 March 2008 (UTC)

Bad Figures
The figures have unlabeled axes, making them pretty much useless. Can someone either introduce new figures, or edit the existing ones to have axis labels? 209.94.128.119 (talk) 01:30, 13 November 2008 (UTC)

Sampling
The article for Hypergeometric distribution describes how it is used for sampling without replacement and states that Binomial Distribution is used for sampling with replacement. How is Binomial Distribution method used for sampling? Virgil H. Soule (talk) 18:11, 2 July 2009 (UTC)

Bad example
Removed:

"As another example, assume 5% of a very large population to be green-eyed. You pick 100 people randomly. The number of green-eyed people you pick is a random variable X which approximately follows a binomial distribution with n = 100 and p = 0.05 (strictly a hypergeometric distribution)."

If it isn't strictly a binomial distribution, then it is a bad example.

In lieu of misusing a hypergeometric distribution as an example of a binomial distribution, perhaps add a section detailing the relationship and how they are similar and yet different? Madkaugh (talk) 00:42, 13 October 2009 (UTC)
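To illustrate how the two distributions are "similar and yet different", here is a hedged Python sketch comparing sampling without replacement (hypergeometric) with the binomial for the 5% example. The population sizes are arbitrary values chosen for illustration, and the function names are invented here.

```python
from math import comb

def hypergeom_pmf(k, pop, marked, n):
    """P(k marked items in a sample of n drawn without replacement)."""
    return comb(marked, k) * comb(pop - marked, n - k) / comb(pop, n)

def binom_pmf(k, n, p):
    """P(k successes in n independent trials, i.e. sampling with replacement)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def max_gap(pop, n=100, p=0.05):
    """Largest pointwise difference between the two mass functions."""
    marked = round(pop * p)
    return max(abs(hypergeom_pmf(k, pop, marked, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

small_gap = max_gap(200)          # sample is half the population
large_gap = max_gap(1_000_000)    # "very large population"
print(small_gap, large_gap)
```

When the population is much larger than the sample, removing a person barely changes the remaining proportion, so the hypergeometric probabilities approach the binomial ones. For a small population the difference is clearly visible.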

Mode Expression Incorrect
The expression for the mode is incorrect. Imagine a Binomial distribution with p = 1.0 and n = 2: the expression for the mode will return 3 while the true value is 2. —Preceding unsigned comment added by 86.165.211.190 (talk) 21:52, 1 November 2009 (UTC)


 * I've changed it to this:
 * $$\text{mode} = \begin{cases}\lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ \lfloor (n+1)\,p\rfloor \text{ and } \lfloor (n+1)\,p\rfloor - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1.\end{cases} $$

Michael Hardy (talk) 06:17, 25 November 2009 (UTC)
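The three-case mode expression above can be checked by brute force. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) so that the test "(n+1)p is an integer" is not disturbed by floating-point rounding. The function names are made up for this illustration.

```python
from fractions import Fraction
from math import comb, floor

def binomial_modes(n, p):
    """Mode(s) of Binomial(n, p) from the three-case formula; p given exactly."""
    p = Fraction(p)
    m = (n + 1) * p
    if m == n + 1:                       # only happens when p == 1
        return [n]
    if m.denominator == 1 and m >= 1:    # (n+1)p in {1, ..., n}: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]                    # (n+1)p is 0 or a non-integer

def brute_force_modes(n, p):
    """All k that maximise the mass function, computed exactly."""
    p = Fraction(p)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    top = max(pmf)
    return [k for k, value in enumerate(pmf) if value == top]

print(binomial_modes(2, 1))                  # the p = 1, n = 2 case raised above
print(binomial_modes(3, Fraction(1, 2)))     # a case with two modes
```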

Parameter n
It doesn't make any sense for the parameter n to be 0. Most texts limit n to be a natural number. —Preceding unsigned comment added by 128.187.81.187 (talk) 21:03, 23 November 2009 (UTC)
 * The case n=0 gives a valid distribution, which is a natural part of the same family and which is required in more complicated manipulations of distributions such as compounding. Melcombe (talk) 10:36, 24 November 2009 (UTC)

Computing the cumulative distribution function (CDF)
It's common in Wikipedia math articles to discuss algorithms for computing quantities of interest. On this page it would be very helpful to have a discussion of computing the cumulative distribution function (CDF). The article does mention various methods that can be used in various circumstances, but this is an incomplete solution at best. For example, the article doesn't provide any guidance about choosing between (a) a combination of direct summation, the normal approximation and the Poisson approximation, (b) a method based on the incomplete beta function, and (c) something else. A discussion of computing the CDF would be useful to a lot of people. ATBS 22:28, 30 November 2009 (UTC)ATBS —Preceding unsigned comment added by ATBS (talk • contribs)

Normal approximations
The second rule of thumb for normal approximations looks suspicious. It can be written in the following form: use normal approximation whenever
 * n · |skewness| ≥ 3.33

In particular that rule claims normal approximation should not be used for any n when p = ½. So the sign should probably be reversed, and the factor n omitted? …  st pasha  »  22:41, 30 November 2009 (UTC)
 * Fixed. -12.7.202.2 (talk) 18:29, 14 May 2010 (UTC)

There is an error in an example: σ = (p(1 − p)/n)^(1/2). Should be σ = (np(1 − p))^(1/2). Please someone fix it or explain what I do not understand there. —Preceding unsigned comment added by 213.197.179.210 (talk) 10:34, 5 May 2011 (UTC)

Controlling the variance
Hi all,

I came up with a way to add variance to the binomial distribution. For this purpose I consider the history of successes compared to the expected value. Here is my development (I hope it is OK to have a link)

I would really like to know what you think. To me it looks very cool, as I use the expected value and the sum of a binomial series, and I didn't see anything like it anywhere.

What do you say? Ofermano (talk) 16:31, 28 June 2011 (UTC)

Accessibility
Would it be possible to write an introductory section that gives just a conceptual description of what the binomial distribution is about, before we enter the maths? Like tossing a coin, or drawing marbles from a box, and replacing the drawn marble each time (and mixing the box up again)? -- J N  466  02:18, 3 July 2011 (UTC)
 * Good idea. The lead sort of introduces it, but there should be room for a more detailed overview. Sources shouldn't be too tricky to find. Alzarian16 (talk) 04:20, 3 July 2011 (UTC)

Error in article
Hi, isn't the standard deviation calculated as : sqrt((p(1 − p) n)) ? In the article it is written as: sqrt((p(1 − p)/n)) — Preceding unsigned comment added by 213.55.184.169 (talk) 06:31, 15 March 2012 (UTC)

Cumulative distribution function -- Example
Is it my imagination or are only the first and last probabilities for the biased coin correct? I have run that in SAS

 data _null_ ;
   p = 0.3 ;
   do i = 0 to 6 ;
     prob = (p**i) * ((1-p)**(6-i)) ;
     put i= prob= ;
   end ;
 run ;

and I get

 i=0 prob=0.117649
 i=1 prob=0.050421
 i=2 prob=0.021609
 i=3 prob=0.009261
 i=4 prob=0.003969
 i=5 prob=0.001701
 i=6 prob=0.000729

Docsteve.518 (talk) 21:46, 15 April 2013 (UTC)


 * Yes, it is your imagination. Seriously, you have forgotten to include the combinatorial coefficient. Melcombe (talk) 21:51, 15 April 2013 (UTC)

Oh yes, thanks for that. That's what I get for trying to do it long hand.

The SAS functions exist for a reason!

 data _null_ ;
   do i = 0 to 6 ;
     x = pmf('Binomial',i,.3,6) ;
     put i= x= ;
   end ;
 run ;

And yes, there's the sequence

 i=0 x=0.117649
 i=1 x=0.302526
 i=2 x=0.324135
 i=3 x=0.18522
 i=4 x=0.059535
 i=5 x=0.010206
 i=6 x=0.000729

16:07, 16 April 2013 (UTC) — Preceding unsigned comment added by 72.43.218.26 (talk)
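
For readers without SAS, the same comparison can be sketched in Python. This is only an illustration of the point made above about the combinatorial coefficient; `binomial_pmf` is a name chosen for this example, and n = 6, p = 0.3 are taken from the thread.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p), including the coefficient C(n, k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 6, 0.3
for i in range(n + 1):
    # Probability of one particular sequence with i successes (no coefficient).
    single_sequence = p**i * (1 - p) ** (n - i)
    print(f"i={i} single_sequence={single_sequence:.6f} pmf={binomial_pmf(i, n, p):.6f}")
```

The two columns agree only at i = 0 and i = 6, where the coefficient equals 1, which is why only the first and last values looked correct in the hand calculation.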

Question about Cumulative Distribution Function
Firstly, I am wondering about the definition of the CDF
 * $$F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}$$

From my training (and looking at the graphs on THIS page), we should be defining F on the real numbers and writing < and not ≤, that is:


 * $$F(x) = \begin{cases} 0 & x < 0 \\ \sum\limits_{j=0}^{k-1} \binom{n}{j}\, p^j (1-p)^{n-j} & k-1 \le x < k,\ k \in \{1,2,\ldots,n\} \\ 1 & x \ge n \end{cases}$$

or indeed:
 * $$F(x) = \begin{cases} 0 & x < 0 \\ \sum\limits_{j=0}^{k-1} f(j) & k-1 \le x < k,\ k \in \{1,2,\ldots,n\} \\ 1 & x \ge n \end{cases}$$

(I also switched to j in place of i since so many applications now assume i is the corresponding complex number.)

I had already given this formula in my MK wikipedia page and was wanting to add an iterative "computer graphing formula" and so checked to see if there was one on this page and became confused with the above formula.

BTW: Here is the iterative formula I was getting ready to add. Any suggestions to make it clearer how to use this?
 * $$\begin{array}{ll} F_0(x) = 0 & x < 0 \\ F_{k+1}(x) = F_k(x) + f(k) & k \le x < k+1,\ k = 0,1,2,\ldots,n-1 \\ F_{n+1}(x) = 1 & x \ge n \end{array}$$

So if you make a sequence of the probability values (easy), you can easily turn this into a sequence of n+2 points and then draw segments as in the above graphs. Having worked this out a gazillion times for my kiddies, I finally wrote it down.

Lfahlberg (talk) 09:06, 22 November 2013 (UTC)
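
A small sketch may help compare the two ways of writing the CDF discussed above. It assumes the usual convention F(x) = Pr(X ≤ x) for real x; the function names are invented for this example. The first function sums up to ⌊x⌋, the second builds the running totals in the iterative style, one value per step of the graph.

```python
from math import comb, floor


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(x, n, p):
    """F(x) = Pr(X <= x) for any real x, as a sum up to floor(x)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    return sum(binomial_pmf(j, n, p) for j in range(floor(x) + 1))


def cdf_step_values(n, p):
    """Running totals F(0), F(1), ..., F(n): the heights of the steps on [k, k + 1)."""
    values = []
    total = 0.0
    for k in range(n + 1):
        total += binomial_pmf(k, n, p)  # F_{k+1} = F_k + f(k)
        values.append(total)
    return values


if __name__ == "__main__":
    steps = cdf_step_values(6, 0.3)
    for k, height in enumerate(steps):
        # On the interval [k, k + 1) both descriptions give the same value.
        print(k, round(height, 6), round(binomial_cdf(k + 0.5, 6, 0.3), 6))
```

Under this convention the piecewise form with k − 1 ≤ x < k and the floor form describe the same right-continuous step function; only the labelling of the index differs.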

Graph, n
Please label the axes on the graphs, and state the allowable range for parameter n. Does n include zero? 71.139.165.140 (talk) 19:11, 14 December 2014 (UTC)

We should add the moment generating function
Here is a reference to use: http://www.le.ac.uk/users/dsgp1/COURSES/MATHSTAT/5binomgf.pdf

Tal Galili (talk) 17:04, 25 March 2015 (UTC)


 * I took a quick look at other probability distribution pages on Wikipedia, and I'm not seeing any derivations of the mgfs there. I suppose that doesn't necessarily disqualify us from adding the mgf to this page, but considering that this page doesn't even derive the mean or variance of the binomial, I don't see adding this material to this page. Blahb31 (talk) 21:45, 25 March 2015 (UTC)


 * Thank you, fair point. Since MGFs are very basic for using these objects in various settings, I think this type of information should be available somewhere on Wikipedia. a) Would you agree? b) If so, where do you think it would fit?

Cheers, Tal Galili (talk) 22:28, 25 March 2015 (UTC)


 * I'm going to say no. This is material for a textbook, not an encyclopedia. Blahb31 (talk) 11:54, 26 March 2015 (UTC)

We should add a section on skew-normal approximation
The title says it all; see for example Ching-Hui Chang et al., "A Note on Improved Approximation of the Binomial Distribution by the Skew-Normal Distribution", The American Statistician. Kjetil B Halvorsen 13:34, 8 June 2015 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)

Mode proof doesn't define a_k
The proof of the mode doesn't define a_k. It's relatively clear that f(k) is meant, but that should be defined (or f used) — Preceding unsigned comment added by Andreas Mueller (talk • contribs) 00:21, 3 March 2016 (UTC)
 * fixed. McKay (talk) 03:40, 3 March 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Binomial distribution. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20140515145146/http://www.mbastats.net/Content/Basic_Prob/Binomial.html to http://www.mbastats.net/Content/Basic_Prob/Binomial.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.— InternetArchiveBot  (Report bug) 21:04, 2 November 2016 (UTC)

Spoilers!
Is the "Spoilers!" in the conditional binomial proof section a joke? :) Nicolas Perrault (talk) 15:38, 1 May 2017 (UTC)

Well... it is funny, but it is recently-added vandalism so I removed it. Bosons (talk) 02:31, 2 May 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Binomial distribution. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20160303182353/http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf to http://www3.stat.sinica.edu.tw/statistica/oldpdf/A3n23.pdf
 * Added archive https://web.archive.org/web/20150113082307/http://psych.stanford.edu/~jlm/pdfs/Wison27SingleProportion.pdf to http://psych.stanford.edu/~jlm/pdfs/Wison27SingleProportion.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 17:08, 20 July 2017 (UTC)

Lead readability
I think that in its current state, the lead section is very opaque for a layperson. It is so full of technical jargon that only a person already familiar with these statistical terms would care to understand what this concept means. I think the lead should be much more accessible. It should use very simple language or give a very simple plain-language definition for each opaque, jargon-y term. It should also contain concrete examples, preferably historically relevant ones, associated with the concept, to provide a very basic, graspable understanding of the topic. Zaheen (talk) 08:05, 14 September 2017 (UTC)

The Manual of Style/Lead section says: "The lead ... should ... establish context, explain why the topic is notable..." and "avoid difficult-to-understand terminology" "....with the goal of making the lead section accessible to as broad an audience as possible. Where uncommon terms are essential, they should be placed in context, linked and briefly defined. The subject should be placed in a context familiar to a normal reader." I don't think that is the case here at all. Zaheen (talk) 08:09, 14 September 2017 (UTC)

Mean, Mode, Median seem off by a factor of n
Isn't the mean not the sum, but the sum divided by n? Similar errors appear to be made with the mode, median, variance, etc. I think skewness, kurtosis, and entropy could also be affected, but I don't know. When we look at the definition of this distribution it is clearly between 0 and 1 for all allowed values of p, k, and n; and yet if the mean is truly given by p*n, then with p=0.5 and n=3 we get 1.5 which is outside all output values. StephenJohns00 (talk) 11:37, 31 August 2018 (UTC)


 * The support of a binomial random variable is $$\{0,1,\ldots,n\}.$$ So it makes sense for the mean, median, and mode to fall somewhere within this range. The values listed in the article are correct.  –Deacon Vorbis (carbon • videos) 11:44, 31 August 2018 (UTC)


 * But shouldn't the mean listed on the binomial distribution page refer to the mean of the distribution and not the mean of a binomial random variable, which has a different definition, causing this confusion? StephenJohns00 (talk) 12:25, 31 August 2018 (UTC)


 * No, those are the same things. –Deacon Vorbis (carbon • videos) 12:26, 31 August 2018 (UTC)
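
To make the distinction in this thread concrete, here is a minimal sketch using the n = 3, p = 0.5 case from the question. The helper names are made up for this example. The probabilities all lie between 0 and 1, while the mean is an average over the support {0, 1, ..., n}, so a value of 1.5 is not a contradiction.

```python
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean(n, p):
    """Mean of the distribution: sum over the support of k * Pr(X = k)."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


n, p = 3, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
print("support:      ", list(range(n + 1)))
print("probabilities:", probabilities)          # each value is in [0, 1]
print("mean:         ", binomial_mean(n, p))    # a point in [0, n], equal to n * p
```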

The random variate section is weird
Currently the random variate section has this paragraph, which is kind of weird:


 * One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability that P(X=k) for all values k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then by using a pseudorandom number generator to generate samples uniformly between 0 and 1, one can transform the calculated samples U[0,1] into discrete numbers by using the probabilities calculated in step one.

For one, you can just do n Bernoulli trials. You would need an absurdly large n for this to be inefficient. Second, we have closed-form expressions for the pmf that always add up to one, so if it does not, you miscalculated. Third, using the cumulative probability distribution would probably work better than the pmf. — Preceding unsigned comment added by TheKing44 (talk • contribs) 05:49, 1 February 2019 (UTC)


 * Correct about cumulative distribution. The "close to one" refers to the fact that in practice it will be necessary to approximate the probabilities. The problem with n Bernoulli trials is that you have to do them for each output value, while computing the cumulative probabilities only has to be done once followed by (say) a binary search of cost O(log n) for each output value. McKay (talk) 07:18, 1 February 2019 (UTC)


 * True, but there's also an initial cost of generating the table, which might be prohibitively high depending on the number of values needed. This section seems a little iffy; I'll try to look into it a bit more if I get a chance.  –Deacon Vorbis (carbon • videos) 14:35, 1 February 2019 (UTC)


 * After looking a bit, I found a paper which talks about the alias method, which we conveniently have an article on already. Maybe a brief mention could be made, but it's a fairly general method for any discrete distribution, so it probably wouldn't make much sense to go into great detail (and on second thought, I'm a bit skeptical that we should even be saying as much as we already are, given the generality of the process being described).  –Deacon Vorbis (carbon • videos) 14:46, 1 February 2019 (UTC)
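
The two approaches compared in this thread can be sketched side by side. This is only an illustration under stated assumptions: the function names are invented here, the cumulative table is built once with floating-point arithmetic (hence the forced final entry of 1.0), and Python's `bisect` module stands in for the binary search mentioned above. The alias method is not shown.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from math import comb


def build_cdf_table(n, p):
    """Cumulative probabilities F(0), ..., F(n); an O(n) one-off cost."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    table = list(accumulate(pmf))
    table[-1] = 1.0  # guard against rounding so every u in [0, 1) maps to some k
    return table


def sample_by_inversion(table, rng):
    """Return the smallest k with F(k) > u, found by binary search in O(log n)."""
    u = rng.random()  # uniform on [0, 1)
    return bisect_right(table, u)


def sample_by_bernoulli_trials(n, p, rng):
    """Alternative: simulate n Bernoulli(p) trials, an O(n) cost per sample."""
    return sum(rng.random() < p for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(2019)
    table = build_cdf_table(10, 0.3)
    print([sample_by_inversion(table, rng) for _ in range(10)])
    print([sample_by_bernoulli_trials(10, 0.3, rng) for _ in range(10)])
```

Which approach is cheaper depends on how many samples are drawn relative to n: the table costs O(n) once, after which each draw is a binary search.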

Entropy
Entropy is given as $$\frac{1}{2} \log_2 \left( 2\pi enp(1-p) \right) + O \left( \frac{1}{n} \right)$$. Is it possible to give a reference for that formula, given that it is not standard textbook knowledge? Also, it is unclear what the operator/function O of 1 over n stands for. Thanks. (Sorry for all breaches of etiquette.) — Preceding unsigned comment added by 92.217.250.44 (talk) 14:56, 12 July 2019 (UTC)

I'm new and do not know the etiquette. Hence I will not update the page. However, it just came to my attention that you do not have the formula for entropy of the binomial distribution. it is 2^-S= ((N-U)/N)^N * (U/(N-U))^U, where in your notation N=n and U=np. The unit of S is in bits. I hope this helps, Jens Adler Nielsen Jens Adler Nielsen (talk) 10:12, 25 August 2019 (UTC)
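
A numerical comparison may help with both comments in this section. The sketch below is an illustration only, with function names chosen for this example. It computes the entropy of the binomial count directly from the pmf, the leading term of the asymptotic expression quoted above, and the quantity S obtained by rearranging the 2^-S expression with N = n and U = np, which works out to n times the binary entropy of p (the entropy of the whole sequence of n independent trials rather than of the count, on this reading).

```python
from math import comb, e, log2, pi


def binomial_entropy_bits(n, p):
    """Exact entropy of the count X ~ Binomial(n, p), in bits."""
    total = 0.0
    for k in range(n + 1):
        q = comb(n, k) * p**k * (1 - p) ** (n - k)
        if q > 0:
            total -= q * log2(q)
    return total


def entropy_leading_term_bits(n, p):
    """Leading term (1/2) log2(2 pi e n p (1 - p)); the O(1/n) remainder is dropped."""
    return 0.5 * log2(2 * pi * e * n * p * (1 - p))


def sequence_entropy_bits(n, p):
    """S from 2^-S = (1 - p)^(n - np) * p^(np), i.e. n times the binary entropy of p."""
    return -n * (p * log2(p) + (1 - p) * log2(1 - p))


if __name__ == "__main__":
    n, p = 100, 0.3
    print("exact entropy of the count:", binomial_entropy_bits(n, p))
    print("leading asymptotic term:   ", entropy_leading_term_bits(n, p))
    print("n * binary entropy of p:   ", sequence_entropy_bits(n, p))
```

The O(1/n) notation means the difference between the first two numbers is bounded by a constant times 1/n for large n, so it shrinks as n grows.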

Issue with one of the tail bounds?
The current tail bound given for p=1/2, n/2>=k>=3n/8 fails for n=64, k=26. I believe, based on the Chernoff inequality immediately above it, the 16 in the exponent should be replaced with a 4. Also, I can't find a way to derive this apparent formula from anything in the provided source.

69.119.31.14 (talk) 14:28, 12 August 2019 (UTC)

Confirming the error. The closest formula in the source is in Proposition 7.3.2, page 46, and it provides the lower bound $$ F(k; n, \frac 1 2) \geq \frac{1}{15} \exp\left(\frac{-16 (\frac n 2 - k)^2}{n}\right) $$. — Preceding unsigned comment added by 176.150.242.62 (talk) 00:08, 6 November 2019 (UTC)
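
For reference, the case n = 64, k = 26 can be evaluated directly. The sketch below only checks the lower bound as quoted above from Proposition 7.3.2; the formula as it stood in the article at the time is not reproduced in this thread, so it is not tested here. The function names are made up for this example.

```python
from math import comb, exp


def cdf_half(k, n):
    """F(k; n, 1/2), summing integer binomial coefficients before dividing by 2^n."""
    return sum(comb(n, j) for j in range(k + 1)) / 2**n


def quoted_lower_bound(k, n):
    """(1/15) * exp(-16 (n/2 - k)^2 / n), the bound quoted from the source."""
    return exp(-16 * (n / 2 - k) ** 2 / n) / 15


if __name__ == "__main__":
    n, k = 64, 26
    print("F(k; n, 1/2)       =", cdf_half(k, n))
    print("quoted lower bound =", quoted_lower_bound(k, n))
```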

Fisher information is actually expected Fisher information
Hi all, it seems to me that what is described here as the Fisher information is actually the expected Fisher information, i.e. expectation of the Fisher information (where the expectation is taken with respect to the data)

The actual Fisher information is: $$g_n(p) = \frac{x}{p^{2}}+\frac{n-x}{(1-p)^{2}}$$

For a derivation of the Fisher information, see example 2.10 of this book, and for a derivation of how taking the expectation leads to $$\text{E}_X[g_n(p)] = \frac{n}{p(1-p)}$$ see example 4.1 of the same book.

Should we change that in the page? Best

Ddreif (talk) 18:44, 14 November 2021 (UTC)
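
The relationship between the two expressions above can be checked numerically. This is a minimal sketch with illustrative function names: it averages the data-dependent quantity x/p² + (n − x)/(1 − p)² over the binomial distribution of x and compares the result with n/(p(1 − p)).

```python
from math import comb


def observed_information(x, n, p):
    """The data-dependent expression g_n(p) = x / p^2 + (n - x) / (1 - p)^2."""
    return x / p**2 + (n - x) / (1 - p) ** 2


def expected_information(n, p):
    """Expectation of g_n(p) over x ~ Binomial(n, p), computed by direct summation."""
    return sum(
        comb(n, x) * p**x * (1 - p) ** (n - x) * observed_information(x, n, p)
        for x in range(n + 1)
    )


if __name__ == "__main__":
    n, p = 20, 0.3
    print("direct summation:", expected_information(n, p))
    print("n / (p (1 - p)): ", n / (p * (1 - p)))
```

The agreement follows from E[x] = np, which turns the expectation into n/p + n/(1 − p).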

Wiki Education Foundation-supported course assignment
This article was the subject of a Wiki Education Foundation-supported course assignment, between 27 August 2021 and 19 December 2021. Further details are available on the course page. Peer reviewers: Ziyanggod, C.Hua Wang, Jiang1725.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 15:44, 16 January 2022 (UTC)

Consistent capitalization required on same page
In some places the article uses Binomial (not at start of sentence) and in others binomial. Universemaster1 (talk) 14:01, 28 May 2022 (UTC)

India Education Program course assignment
This article was the subject of an educational assignment supported by Wikipedia Ambassadors through the India Education Program.

The above message was substituted from by PrimeBOT (talk) on 19:51, 1 February 2023 (UTC)

Why are equations suddenly not showing?
Something in Firefox? Found it: text color must be black. OveGjerlow (talk) 19:09, 20 September 2023 (UTC)

Interpretation
I think some parts should be emphasized more in the first sentences of the interpretation section. I mean the sentence:

The binomial distribution is concerned with the probability of obtaining any of these sequences, meaning the probability of obtaining one of them

which is important, so should be almost repeated at the beginning, for example:

This probability formula means the probability of obtaining k successes in n trials for all possible combinations.

Then you can leave the rest: The formula can be understood as follows ...

Also, I know that Wikipedia has a unique problem with writing in plain language, but I would add a simple descriptive example that is easy to grasp: If we have a fair coin (p = 0.5) and two trials (n = 2), then if we want to get 2 heads (k = 2), there is only one possibility to achieve this (2! / (2!*0!) = 1), and since we need to get heads twice, it means that P = 0.5*0.5 = 0.25 (which is the same by our formula: P = 1*0.5^2*0.5^(2-2) = 0.25). Pawel.jamiolkowski (talk) 16:36, 15 July 2024 (UTC)