Talk:Negative binomial distribution

Error in Method of Moments?
The method of moments expression is given as p=1-E[X]/V[X], even though 1-E[X]/V[X] clearly evaluates to 1-p if you use the expressions for mean and variance given just above. It seems to me that there's some confusion between p and 1-p. I'm not sure if any similar mixups were made anywhere else in this article, might be worth checking. — Preceding unsigned comment added by Freedmanari (talk • contribs) 00:17, 13 November 2022 (UTC)


 * I also think that's an error 8.33.1.254 (talk) 17:03, 24 April 2023 (UTC)
 * I agree that there is an error on the method of moments. With this parametrization, it should be simply E[X]/V[X]. 216.164.128.208 (talk) 19:33, 6 October 2023 (UTC)
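
The point raised in this thread can be checked with a short calculation. The sketch below assumes the parametrization in which X counts failures before the r-th success and p is the success probability, so that E[X] = r(1-p)/p and V[X] = r(1-p)/p^2. The helper name `nb_mean_var` is made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def nb_mean_var(r, p):
    """Mean and variance of the number of failures before the r-th success.

    Assumed parametrization: p is the success probability, q = 1 - p.
    """
    q = 1 - p
    return r * q / p, r * q / p**2

r, p = 3, Fraction(1, 4)
mean, var = nb_mean_var(r, p)

p_hat = mean / var                 # method-of-moments estimate of p
r_hat = mean**2 / (var - mean)     # method-of-moments estimate of r
one_minus_ratio = 1 - mean / var   # the expression questioned above

print(p_hat, r_hat, one_minus_ratio)
```

Under these assumed formulas E[X]/V[X] recovers p, while 1 - E[X]/V[X] gives 1 - p, which matches the comments above.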

Error in variance?
The variance should be rq/p^2, but is instead listed as rp/q^2, with q=1-p. Is this true for some specific convention? I would expect not. The simple proof for the variance is that each successive geometric random variable of r such variables has the same parameters, so the variance of their sums is the sum of their variances, and as such is r(q/p^2)=rq/p^2. Is this not correct? — Preceding unsigned comment added by Bargrum (talk • contribs) 00:11, 13 October 2022 (UTC)
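
Written out, the argument above looks as follows, assuming X counts failures before the r-th success, p is the success probability, q = 1 - p, and $$G_1, \dots, G_r$$ are independent geometric variables each counting failures before one success:

$$\operatorname{Var}(X)=\operatorname{Var}\left(\sum_{i=1}^{r} G_i\right)=\sum_{i=1}^{r}\operatorname{Var}(G_i)=r\cdot\frac{q}{p^{2}}=\frac{rq}{p^{2}}.$$

If instead X counts successes before the r-th failure (still with p the success probability), the same argument gives $$rp/q^{2}$$, so the listed formula may simply reflect that other convention.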

Negative binomial regression
There isn't yet an article on negative binomial regression in Wikipedia: maybe I'll write one if I get the time. This application of the distribution uses a reparameterization in terms of the mean and dispersion, so I have added a bullet point to clarify what this form is, and the various terms used, with some additional references. I moved Joe Hilbe's book into the references (updating to the second edition), and so deleted the section on additional reading Peterwlane (talk) 06:30, 30 May 2013 (UTC)

What's up with the pmf?
Wolfram mathworld, as well as the statistics textbooks I've consulted, list the pmf as having k+r-1 choose k-1, but in this page it is consistently choose r. Why is this? I can't find any difference in convention. In all cases p is probability of success and k is the desired number of failures. I would really like an explanation because as it stands I perceive it as an error. http://mathworld.wolfram.com/NegativeBinomialDistribution.html — Preceding unsigned comment added by Doublepluswit (talk • contribs) 18:17, 7 May 2013 (UTC)


 * On this page, k is the number of successes, not the number of failures, while on the Wolfram page, r-1 is the number of successes. Notice that k can be zero, but r cannot. This leads to the valid difference. I would be interested as to the reasoning for using the convention that is used on this page, however. Wolfram's/Ross's convention is more natural and convenient from my perspective at least. Machi4velli (talk) 09:31, 1 July 2013 (UTC)


 * This is really frustrating! I definitely think the Wolfram version should be mentioned / included in the alternate param sections. Ashmont42 (talk) 18:09, 19 June 2017 (UTC)

In the Wolfram version, r is the number of successes, NOT r-1. The coefficient is $${k + r - 1 \choose r - 1}$$ because we do not choose the last observation as a success. It must be a success, as it is the r-th success and ends the trial. We still must account for the probability of the r-th success occurring, hence we get $$p^r(1-p)^k$$. Note that choosing the k failures is equivalent to choosing the r-1 successes, so $${k + r - 1 \choose r - 1} = {k + r - 1 \choose k}$$. SuntzuisafterU-235 (talk) 18:52, 13 May 2020 (UTC)
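
A quick numerical check of the two claims in the comment above. It is only a sketch, assuming k counts failures, r counts successes and p is the success probability; the function name `nb_pmf` is hypothetical.

```python
from math import comb, isclose

def nb_pmf(k, r, p):
    """P(k failures before the r-th success), with p the success probability."""
    return comb(k + r - 1, r - 1) * p**r * (1 - p)**k

r, p = 3, 0.4

# Choosing positions for the r-1 earlier successes is the same as
# choosing positions for the k failures.
same_coefficient = all(
    comb(k + r - 1, r - 1) == comb(k + r - 1, k) for k in range(50)
)

# The probabilities should add up to 1 (the tail beyond 400 is negligible here).
total = sum(nb_pmf(k, r, p) for k in range(400))

print(same_coefficient, isclose(total, 1.0))
```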


 * There is still a discrepancy of definition between the side box pmf and the main text pmf... In the box "r > 0 — number of failures until the experiment is stopped", in the text "observing this sequence until a predefined number r of successes have occurred". So then the two pmf are correct but do not match, this will be confusing to beginners. Can we once for ever define it as in the text? However, the infobox does not seem editable... — Preceding unsigned comment added by Frederic Y Bois (talk • contribs) 09:05, 17 May 2020 (UTC)


 * I just want to confirm that I too think there is a problem with this page. For example, the PMF on the side box is not consistent with the formulae for the mean and variance.  — Preceding unsigned comment added by RichardSecond (talk • contribs) 23:26, 1 August 2020 (UTC)


 * The PMF is wrong. The r and the k need to be swapped on the product ... (1-p)^k * p^r Geoffrussell (talk) 00:11, 29 December 2020 (UTC)


 * I think the side infobox is causing a lot of problems. Can someone help us find how to edit it? If it cannot be changed, then the rest of the article has to be written to be consistent with it. Rscragun (talk) 22:25, 2 January 2021 (UTC)


 * I don't think the PMF is wrong, I think the problem is that it has defined the parameter as the number of failures rather than the number of successes resulting in the flipped exponents but the same binomial coefficient. However it is extremely confusing and I also think it should be changed.

Example with real-valued r
In the case of an integer valued r one may correctly write:

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs. For example, if one throws a die repeatedly until the third time “1” appears, then the probability distribution of the number of non-“1”s that had appeared will be negative binomial.

How would this example be in the case of a real valued r?

Beta negative binomial mixture
Would not it be a good idea to discuss also the beta negative binomial mixture (see among others Wang, 2011)?

Reference

Wang, Z. (2011). One Mixed Negative Binomial Distribution with Application. Journal of Statistical Planning and Inference, 141, 1153-1160. — Preceding unsigned comment added by Ad van der Ven (talk • contribs) 10:34, 17 August 2011 (UTC)

Wrong p in Sampling and point estimation of p?
The formula in the section "Sampling and point estimation of p" seems to give the probability of failure, which is not the definition we're using. For example, if you observe k=0, you saw no successes and k failures, so the probability of success (p) should be low. But the formula gives p=1. Should it be changed to k / (r + k) ? Martin (talk) 15:03, 28 November 2010 (UTC)

German version is better
I can barely read any German, yet the article here made more sense than this one... 74.59.244.25 (talk) 03:26, 18 February 2008 (UTC)


 * I'm just noticing this comment now. I'll look at the German version. Michael Hardy (talk) 17:48, 14 April 2010 (UTC)


 * The German version also has a different form for the CDF, even though it seems to give the same definition of p (as the probability of success). I believe that the German version of the CDF is accurate. 104.129.196.214 (talk) 14:49, 8 October 2018 (UTC)

request for introduction
This article needs a proper introduction that can help a layman understand what the term means and what it entails when used in text or conversation. Currently this is not feasible: if you had no previous knowledge of mathematics or statistics at all, you would have to scroll down a long way and start reading the examples to even begin to understand. I'm putting this at the top, as I think it's a more vital issue than any concerning the mathematical/statistical content of the page. Starting with "In probability and statistics the negative binomial distribution is a discrete probability distribution" does not explain what the negative binomial distribution is, or what separates it from other discrete probability distributions. I personally think this should be attempted as highest priority; obviously I'm not able to do it (or I wouldn't be writing here, eh). Assuming anyone (not to mention everyone) is able to understand mathematical formulas that incorporate Greek letters is IMHO pedagogically unsound --Asherett 12:17, 13 September 2007 (UTC)


 * Well, obviously the statement that "In probability and statistics the negative binomial distribution is a discrete probability distribution" does not say WHICH discrete probability distribution it is---that comes later in the article. As for making it clear to someone who knows NOTHING AT ALL about mathematics or statistics: that may not be so easy. Perhaps making it clear to a broader audience can be done, with some effort, though. Michael Hardy 19:15, 13 September 2007 (UTC)

The current introduction is puzzling, first defining the distribution in terms of number of trials before the rth success, then referring to the general case with non-integer r without defining it. It would be good to provide a definition that includes non-integer r before referring to them. --Nonstandard (talk) 02:12, 17 May 2016 (UTC)

reversion
I have reverted the most recent edit to negative binomial distribution for the following reason.


 * Sometimes one defines the negative binomial distribution to be the distribution of the number of failures before the rth success. In that case, the statement that the expected value is r(1 − p)/p is correct.


 * But sometimes, and in particular in the present article, one defines it to be the distribution of the number of trials needed to get r successes. In that case, the statement is wrong.

If you're going to edit one part of the article to be consistent with the former definition, you need to be consistent and change the definition. Michael Hardy 17:40, 7 Jul 2004 (UTC)

etymology
Shouldn't there be a sentence or two saying why this is named negative binomial and what it has to do with the binomial, especially for the layman? —Preceding unsigned comment added by 164.67.59.174 (talk) 19:17, 2 September 2009 (UTC)

Equivalence?
If X_r is the number of trials needed to get r successes, and Y_s is the number of successes in s trials, then


 * $$\operatorname{Pr}(X_r \leq s)=\operatorname{Pr}(Y_s \geq r).$$

The article went from there to say the following:


 * Every question about probabilities of negative binomial variables can be translated into an equivalent one about binomial variables.

I removed it. I tentatively propose this as a counterexample: Suppose W_r is the number of failures before the r successes have been achieved. Then W_r has a negative binomial distribution according to the second convention in this article, and it is clear that this distribution is just the negative binomial distribution according to the first convention, translated r units to the left. This probability distribution is infinitely divisible, a fact now explained in the article. That means that for any positive integer m, no matter how big, there is some probability distribution F such that if U_1, ..., U_m are independent random variables distributed according to F, then U_1 + ... + U_m has the same distribution that W_r has.

So how can the question of whether the negative binomial distribution is infinitely divisible be "translated into an equivalent one about binomial variables"? Michael Hardy 01:43, 27 Aug 2004 (UTC)


 * Removing the bit about "every question" seems OK to me; the important point is the relation between binomial and negative binomial probabilities. But Mike, it wasn't put in there for the purpose of annoying you. You might consider using the edit summary to say something about the edit rather than your state of mind -- how about rm questionable claim about "every question" instead of I am removing a statement that has long irritated me. Wile E. Heresiarch 15:23, 6 Nov 2004 (UTC)
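
For reference, the relation between binomial and negative binomial probabilities discussed in this thread, Pr(X_r ≤ s) = Pr(Y_s ≥ r), can be confirmed exactly for small cases. This is only a sketch: it assumes X_r is the number of trials needed for r successes and Y_s is binomial with s trials, and the function names are invented for illustration.

```python
from fractions import Fraction
from math import comb

def prob_trials_at_most(r, s, p):
    """P(X_r <= s): the r-th success arrives within the first s trials."""
    q = 1 - p
    return sum(comb(n - 1, r - 1) * p**r * q**(n - r) for n in range(r, s + 1))

def prob_successes_at_least(r, s, p):
    """P(Y_s >= r): at least r successes among s trials."""
    q = 1 - p
    return sum(comb(s, j) * p**j * q**(s - j) for j in range(r, s + 1))

p = Fraction(1, 3)  # exact arithmetic, so equality can be tested directly
checks = [
    prob_trials_at_most(r, s, p) == prob_successes_at_least(r, s, p)
    for r in range(1, 6)
    for s in range(r, 15)
]
print(all(checks))
```

This only illustrates the identity for cumulative probabilities; it says nothing about the broader "every question" claim that was removed.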

Major reorganization
Trying to be bold, I've just committed several major changes. I found the previous version somewhat confusing, since it talked about three slightly different but closely related "conventions" for the negative binomial, and it never became fully clear to me which convention was in use at which point in the subsequent discussion. I've replaced the definition with what I consider to be the most natural version (the previous convention #3). The reason that definition is "natural" is that it arises naturally as the Gamma-Poisson mixture, converges in distribution to the Poisson, etc. The shifted negative binomial (previous convention #1) can still be derived (see the worked example of the candy pusher). Now we have a single, consistent (hopefully!) definition of the negative binomial instead of three similar-yet-different conventions. I'm painfully aware that all of the previous three conventions are in use and sometimes referred to as the negative binomial; but then again, that doesn't even begin to exhaust the variations on this distribution that can be found in the wild, so why not pick one reasonable definition and stick to that here? --MarkSweep 12:04, 5 Nov 2004 (UTC)


 * Well, if we were writing a textbook, we would certainly want to pick one defn and stick to it. However, we're here to document stuff as it is used by others. If there are multiple defns in common use, I don't see that we have the option to pick and choose. Sometimes multiple defns can be collapsed by saying "#2 is a special case of #1 with A always a blurfle" and then describing only #1. I don't know if that's feasible here. Regards & happy editing, Wile E. Heresiarch 15:09, 6 Nov 2004 (UTC)


 * Yes, that was basically the case here. The previous "convention #2" was the Pascal distribution, which is a special case of the general negative binomial (previous "convention #3").  This didn't become fully clear in the previous revision, where the discussion of the Pascal distribution seemed more like an afterthought.  The previous "convention #1" appeared to be simply a Pascal distribution shifted by a fixed amount.  There is still a discussion of that in the worked example, but that could arguably be moved to the front and made more explicit. --MarkSweep 23:25, 6 Nov 2004 (UTC)


 * Hi, just found this page and I don't like that the starting point is the more general formula that has r being a strictly positive real. I think that 99% of the time somebody is interested in this distribution, r is going to be an integer. Which isn't to say that we should purge this more complete definition, just that there is a lot to be said for following the way the present article on the Binomial distribution is written (since this is closely related) and because that one is a heck of a lot clearer. I would suggest using one variable where r is an integer and a separate variable where it is a real (to keep them straight). Along the same lines, I also think that starting to talk about Bernoulli trials so far down the page is not a good idea--I'd like to see it up top. Is this what you two are talking about? Oh, wait, those dates are 2004! Oh well, I'll still wait to see if anyone cares b/c this is a big edit. --O18 07:13, 9 November 2005 (UTC)


 * I support the previous comment. I am a graduating maths/computer science student, but the first definition was absolutely non-intuitive for me and only the "Occurrence" section made it clear. I doubt whether the generalization is more important than the fact that this distribution is derived from the Pascal distribution. —The preceding unsigned comment was added by 85.206.197.19 (talk) 20:10, 4 May 2007 (UTC).

Plots?
Is it possible to get some plots of what this looks like? I got sent here from the mosquito page, and anyone reading that probably doesn't want to wade through many lines of math, just see a picture of what it means. --zandperl 04:10, 30 August 2005 (UTC)
 * One year later, exactly the same issue. Remarkably, the mosquito page still links here, but there's no plot. Anyone? Sketch-The-Fox 23:21, 19 August 2006 (UTC)

The datapoints and datalines in the animated plot are all mistakenly right-shifted by one. The support begins at k=0, not k=1. The bar charts in some of the other languages (Spanish, Arabic, French, Polish, Slovenian, Turkish, Chinese) are ambiguous, since each bar extends a full unit, so perhaps the creator of the animated plot misinterpreted which side of the bars to assign the values to. The correct values are:

μ=10, r=1, p=0.909091: {{0, 0.0909091}, {1, 0.0826446}, {2, 0.0751315}, {3, 0.0683013}, {4, 0.0620921}, {5, 0.0564474}, {6, 0.0513158}, {7, 0.0466507}, {8, 0.0424098}, {9, 0.0385543}, {10, 0.0350494}, {11, 0.0318631}, {12, 0.0289664}, {13, 0.0263331}, {14, 0.0239392}, {15, 0.0217629}, {16, 0.0197845}, {17, 0.0179859}, {18, 0.0163508}, {19, 0.0148644}, {20, 0.0135131}, {21, 0.0122846}, {22, 0.0111678}, {23, 0.0101526}, {24, 0.0092296}}

μ=10, r=2, p=0.833333: {{0, 0.0277778}, {1, 0.0462963}, {2, 0.0578704}, {3, 0.0643004}, {4, 0.0669796}, {5, 0.0669796}, {6, 0.0651191}, {7, 0.0620181}, {8, 0.058142}, {9, 0.0538352}, {10, 0.0493489}, {11, 0.0448627}, {12, 0.040501}, {13, 0.0363471}, {14, 0.0324527}, {15, 0.0288469}, {16, 0.0255415}, {17, 0.0225366}, {18, 0.0198239}, {19, 0.0173894}, {20, 0.0152157}, {21, 0.0132835}, {22, 0.0115728}, {23, 0.0100633}, {24, 0.0087355}}

μ=10, r=3, p=0.769231: {{0, 0.0122895}, {1, 0.0283604}, {2, 0.0436313}, {3, 0.0559376}, {4, 0.0645434}, {5, 0.0695082}, {6, 0.0712905}, {7, 0.0705071}, {8, 0.0677953}, {9, 0.0637391}, {10, 0.0588361}, {11, 0.0534874}, {12, 0.0480015}, {13, 0.0426049}, {14, 0.0374548}, {15, 0.0326529}, {16, 0.0282574}, {17, 0.0242937}, {18, 0.0207638}, {19, 0.0176534}, {20, 0.0149375}, {21, 0.0125847}, {22, 0.0105606}, {23, 0.00882994}, {24, 0.00735829}}

μ=10, r=4, p=0.714286: {{0, 0.00666389}, {1, 0.0190397}, {2, 0.0339994}, {3, 0.0485706}, {4, 0.0607133}, {5, 0.0693866}, {6, 0.0743428}, {7, 0.07586}, {8, 0.0745054}, {9, 0.0709575}, {10, 0.0658891}, {11, 0.0598992}, {12, 0.0534814}, {13, 0.0470166}, {14, 0.0407797}, {15, 0.034954}, {16, 0.0296485}, {17, 0.0249147}, {18, 0.0207623}, {19, 0.0171718}, {20, 0.0141054}, {21, 0.0115146}, {22, 0.00934628}, {23, 0.00754669}, {24, 0.0060643}}

μ=10, r=5, p=0.666667: {{0, 0.00411523}, {1, 0.0137174}, {2, 0.0274348}, {3, 0.0426764}, {4, 0.0569019}, {5, 0.0682823}, {6, 0.0758692}, {7, 0.079482}, {8, 0.079482}, {9, 0.0765382}, {10, 0.0714357}, {11, 0.0649415}, {12, 0.0577258}, {13, 0.0503251}, {14, 0.0431358}, {15, 0.0364258}, {16, 0.0303548}, {17, 0.0249981}, {18, 0.0203688}, {19, 0.016438}, {20, 0.0131504}, {21, 0.0104368}, {22, 0.00822294}, {23, 0.00643535}, {24, 0.00500527}}

μ=10, r=10, p=0.5: {{0, 0.000976562}, {1, 0.00488281}, {2, 0.0134277}, {3, 0.0268555}, {4, 0.0436401}, {5, 0.0610962}, {6, 0.0763702}, {7, 0.0872803}, {8, 0.0927353}, {9, 0.0927353}, {10, 0.0880985}, {11, 0.0800896}, {12, 0.0700784}, {13, 0.0592971}, {14, 0.0487083}, {15, 0.0389667}, {16, 0.0304427}, {17, 0.0232797}, {18, 0.0174598}, {19, 0.0128651}, {20, 0.0093272}, {21, 0.00666229}, {22, 0.00469388}, {23, 0.00326531}, {24, 0.0022449}}

μ=10, r=20, p=0.333333: {{0, 0.000300729}, {1, 0.00200486}, {2, 0.007017}, {3, 0.0171527}, {4, 0.032876}, {5, 0.0526015}, {6, 0.0730577}, {7, 0.0904524}, {8, 0.101759}, {9, 0.105528}, {10, 0.10201}, {11, 0.0927365}, {12, 0.0798564}, {13, 0.0655232}, {14, 0.0514825}, {15, 0.0388979}, {16, 0.0283631}, {17, 0.020021}, {18, 0.0137181}, {19, 0.00914539}, {20, 0.0059445}, {21, 0.00377429}, {22, 0.00234463}, {23, 0.00142717}, {24, 0.000852336}}

μ=10, r=40, p=0.2: {{0, 0.000132923}, {1, 0.00106338}, {2, 0.00435987}, {3, 0.0122076}, {4, 0.0262464}, {5, 0.0461937}, {6, 0.0692905}, {7, 0.0910675}, {8, 0.107004}, {9, 0.114138}, {10, 0.111855}, {11, 0.101687}, {12, 0.0864336}, {13, 0.0691469}, {14, 0.052354}, {15, 0.0376949}, {16, 0.0259153}, {17, 0.0170736}, {18, 0.0108133}, {19, 0.00660178}, {20, 0.00389505}, {21, 0.00222574}, {22, 0.00123428}, {23, 0.000665436}, {24, 0.000349354}}

This is visually confirmed in the last two frames of the animation, where the mean of the lopsided plot is obviously to the right of the intended mean of 10.

Note that the Italian version also uses this animated plot. AndreasWittenstein (talk) 17:30, 4 February 2011 (UTC)


 * .  //  st pasha  »  23:06, 4 February 2011 (UTC)

Wow, that was quick! Thanks, Pasha. AndreasWittenstein (talk) 00:07, 6 February 2011 (UTC)
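
For anyone who wants to regenerate the corrected values listed above, here is a minimal sketch. It assumes, as the listed numbers suggest, that the plot uses the pmf $${k+r-1 \choose k}(1-p)^r p^k$$ with p = μ/(μ+r). The function name `nb_pmf_mean_form` is hypothetical.

```python
from math import comb, isclose

def nb_pmf_mean_form(k, r, mu):
    """pmf at k for the mean parametrization assumed above: p = mu / (mu + r)."""
    p = mu / (mu + r)
    return comb(k + r - 1, k) * (1 - p)**r * p**k

mu = 10

# First three values (k = 0, 1, 2) for a few of the r values listed above.
first_values = {
    r: [round(nb_pmf_mean_form(k, r, mu), 7) for k in range(3)]
    for r in (1, 2, 10)
}
print(first_values)

# The support starts at k = 0, and the mean should come out as mu.
mean_r2 = sum(k * nb_pmf_mean_form(k, 2, mu) for k in range(2000))
print(isclose(mean_r2, mu))
```

The support starting at k = 0 is what makes the mean equal μ; shifting every point one unit to the right would move it to μ + 1.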

the mean is wrong
should be (1-p)r/p, surely

UM According to 'A First Course in Probability' by Sheldon Ross, the mean is r/p

Both are correct. The only difference is in the choice of random variable. If you choose X as the number of trials needed for the rth success, its mean is r/p; if Y denotes the number of failures before the rth success, its mean is rq/p. The two are related by X = Y + r, so by linearity of expectation E(X) = E(Y) + r = rq/p + r = r/p. — Preceding unsigned comment added by 103.37.200.103 (talk) 05:55, 7 June 2018 (UTC)


 * Wrong. Look, how many times do so many of us have to keep repeating this?  Sheldon Ross's book CORRECTLY gives the mean of what Sheldon Ross's book calls the negative binomial distribution.  But there are (as this article explains) at least two conventions concerning WHICH distribution should be called that.  Sheesh. Michael Hardy 21:42, 29 November 2006 (UTC)

Correct mean and variance. The mean for the distribution as defined on the page should be r*(1-p)/p, and the variance should be r*(1-p)/p^2. An easy way to verify these are correct is to plot them together with the pmf (using the same values for r and p). —Preceding unsigned comment added by Cstein (talk • contribs) 12:41, 15 June 2010 (UTC)


 * The mean and variance are incorrect as parametrized. Consider the case where r = 1. Then we have the geometric distribution with parameter 1-p. This has mean p/(1-p). Stats-reader121 (talk) 05:20, 10 October 2022 (UTC)

Please check again! Other sources, e.g., Wolfram Alpha and the German article, also say that the mean is r*(1-p)/p, but they use a different p. If you define

$$f(k) = {k+r-1 \choose r-1} (1-p)^r p^k$$ then the mean is r*p/(1-p), and the variance r*p/(1-p)^2. Gogol-Döring (talk) 09:44, 21 July 2010 (UTC)

If p is the positive probability, as the page states, then the mean is r*(1-p)/p. This needs to be fixed! — Preceding unsigned comment added by 71.163.43.88 (talk) 21:53, 13 March 2013 (UTC)

— The book I use is "Statistical Distributions, 2nd Edition" by Evans, Hastings, and Peacock. They define r as the number of successes, and p as P(success). They also define q=(1-p) which shortens all the formulas. They say the mean is rq/p, and the variance is mean/p. This makes sense to me. Suppose "success" is "being a genius". Suppose p is 10^-6 or one in a million. That means if you want r geniuses, you need about r/10^-6 = r × 10^6 = r million people. So the smaller p is, the bigger the mean has to be. And of course, the smaller p is, the less relevant q is, because it's basically one.

I can see that if you say you're looking for r failures, rather than r successes, you could get what this article says.

MikeDunlavey (talk) 14:01, 11 April 2015 (UTC)

Current mean is wildly wrong, if p is defined as the probability of success (as it is, currently). If I have an event with a success probability p of 10%, and I'm looking for the number of tries until 1 success, the current formula incorrectly reports that I need 0.1*1/(0.9) = 0.11... tries on average until I get a success. The actual mean in this situation should be r/p = 1/.1 = 10 tries. 24.127.190.251 (talk) 17:08, 7 August 2020 (UTC)draypresct

the mgf is wrong
The numerator should be pe^t instead of p. The following link can support this http://www.math.tntech.edu/ISR/Introduction_to_Probability/Discrete_Distributions/thispage/newnode10.html

The bottom of that page gives the mgf of negative binomial distribution. I verified it. —Preceding unsigned comment added by 136.142.163.158 (talk • contribs)


 * WRONG!!! This article has it right, and so does the web page you cited.  They're talking about TWO DIFFERENT DISTRIBUTIONS.  You did not read carefully.  The negative binomial distribution dealt with in this article is supported on the set


 * { 0, 1, 2, 3, ... }


 * whereas the one on the web page you cite is supported on the set


 * { r, r + 1, r + 2, .... }


 * Both articles are clear about this. You need to read more carefully. Michael Hardy 19:55, 13 September 2006 (UTC)


 * Both articles may be mathematically correct, but using the number of successes as the RV, the number of failures as the goal (one parameter), and the probability of success as the other parameter is to me less intuitive than using the number of failures as the RV, the number of successes as the goal parameter, and the probability of success as the other parameter. The introduction to the example of selling candy using the article's current convention seems unnatural and forced. There is much more to this than being mathematically correct. Lovibond (talk) 16:45, 11 October 2015 (UTC)

The MGF given in the side panel was inconsistent with the formulas for the mean and variance given in the same panel, and inconsistent with the MGF for the geometric distribution; the MGF for the negative binomial should be the same as the MGF for the geometric distribution taken to the r'th power since a negative binomial r.v. can be interpreted as a sum of geometric r.v's (assuming they are both counting the same thing but with different end points). The problem was that the old displayed formula was treating p as the probability of failure instead of the probability of success. I have now fixed this Vapniks (talk) 13:08, 16 January 2023 (UTC)


 * Hi Vapniks, thank you for your input on the mgf. I just found the domain of the mgf seems still inaccurate: it probably should be "t<-log(1-p)". However, personally I do not know how to edit it on Wiki... Lcat0718 (talk) 10:08, 14 April 2023 (UTC)
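
The "geometric MGF to the r-th power" argument and the domain t < -log(1-p) mentioned above can be checked numerically. The sketch below assumes X counts failures before the r-th success with success probability p, so the closed form is $$\left(\frac{p}{1-(1-p)e^{t}}\right)^{r}$$. The function names are hypothetical.

```python
from math import comb, exp, isclose, log

def nb_mgf_closed_form(t, r, p):
    """Geometric MGF p / (1 - (1-p) e^t), raised to the r-th power."""
    return (p / (1 - (1 - p) * exp(t)))**r

def nb_mgf_by_summation(t, r, p, terms=1000):
    """Truncated sum of e^{tk} * pmf(k); converges only if (1-p) e^t < 1."""
    ratio = (1 - p) * exp(t)
    return sum(comb(k + r - 1, k) * p**r * ratio**k for k in range(terms))

r, p = 3, 0.4
t_max = -log(1 - p)   # the MGF is finite only for t < t_max
t = 0.5 * t_max       # a point safely inside the domain

print(isclose(nb_mgf_by_summation(t, r, p), nb_mgf_closed_form(t, r, p)))
```

The series is a power series in (1-p)e^t, which is why it stops converging once t reaches -log(1-p).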

Use of gamma function for a discrete distribution
Is it the convention among probability literature to represent the negative binomial with the gamma function? In Sheldon Ross's introductory text, the distribution is introduced without it (although that is an alternative representation of the distribution). I am not objecting but as a beginner am curious why this is how it is represented. --reddaly

I think either adding this way of writing it, $$\binom{n-1}{r-1}p^{r}(1-p)^{n-r}$$, or specifying that $$\Gamma(x + 1) = x!$$, would be beneficial. Some people start running when they see the gamma function.
 * Good idea. It would be easier on the eyes for those who haven't yet discovered how to love the Γ function. Aastrup 22:24, 18 July 2007 (UTC)
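
To illustrate why the gamma function appears at all: it lets the same formula work when r is not an integer. The sketch below assumes the pmf $$\frac{\Gamma(k+r)}{k!\,\Gamma(r)}p^{r}(1-p)^{k}$$ with k counting failures and p the success probability. The name `nb_pmf_gamma` is hypothetical.

```python
from math import comb, exp, gamma, isclose, lgamma, log

def nb_pmf_gamma(k, r, p):
    """pmf written with the gamma function, valid for any real r > 0."""
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(p) + k * log(1 - p))

p = 0.4

# Gamma(x + 1) = x! for non-negative integers x, e.g. Gamma(6) = 5! = 120.
print(isclose(gamma(6), 120.0))

# For integer r the gamma form agrees with the binomial-coefficient form.
agrees = all(
    isclose(nb_pmf_gamma(k, 4, p), comb(k + 3, k) * p**4 * (1 - p)**k)
    for k in range(20)
)

# For non-integer r it still defines a probability distribution.
total = sum(nb_pmf_gamma(k, 2.5, p) for k in range(600))
print(agrees, isclose(total, 1.0))
```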

Expected Value derivation
The classic derivation of the mean of the NBD should be on this page, as it is on the binomial distribution page. --Vince undefined 04:44, 12 May 2007 (UTC)
 * I agree. Aastrup 22:24, 18 July 2007 (UTC)

MLE
This article lacks Maximum Likelihood, and especially Anscombe's Conjecture (which has been proven). Aastrup 22:24, 18 July 2007 (UTC)

overdispersed Poisson
I recently added a note about how the Poisson distribution with a dispersion parameter is more general than the negative binomial distribution and would make more sense when one is simply looking for a Poisson distribution with a dispersion parameter. I think it's important to realize that the Poisson distribution with a dispersion parameter described by M&N is more general in that its variance can take any positive value, rather than being restricted to values greater than the mean. There certainly are situations where the negative binomial distribution makes sense, but if one is just looking for a Poisson with a dispersion parameter, why beat around the bush with this other distribution and not just go for the real thing? O18 (talk) 17:38, 26 January 2008 (UTC)


 * There is no such thing as "overdispersed Poisson", because if it is overdispersed, then it is not Poisson. If "the Poisson distribution with a dispersion parameter described by M&N" is important, then go ahead and describe it in some other article, perhaps in a new article. This article is about the negative binomial distribution only. The (positive) binomial distributions have variance < mean, and the Poisson distribution has variance = mean, and the negative binomial distribution has variance > mean. Bo Jacoby (talk) 22:36, 26 January 2008 (UTC).
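
The ordering of variance and mean stated above can be read off the variance-to-mean ratios. For the negative binomial this assumes X counts failures before the r-th success with success probability p, and 0 < p < 1 throughout:

$$\text{Binomial}(n,p):\ \frac{np(1-p)}{np}=1-p<1,\qquad \text{Poisson}(\lambda):\ \frac{\lambda}{\lambda}=1,\qquad \text{Negative binomial}(r,p):\ \frac{r(1-p)/p^{2}}{r(1-p)/p}=\frac{1}{p}>1.$$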

First paragraph
Among several objections I have to the edits done on April 1st by 128.103.233.11, is this: the rest of the article is about the distribution of the number of failures before the rth success, not about the one that counts the number of trials up to and including the rth success. Thus, in the experiment that that user described, the distribution should have started at 0, not at 2. This matters because (1) we want to include the case where r is not an integer, because (2) we want to be able to see the infinite divisibility of this distribution. Michael Hardy (talk) 16:23, 4 April 2009 (UTC)

Trials up to rth success
This page should be updated to include a column in the side table for the version of the negative binomial for "number of trials to rth success". This is the most intuitive, if not the most common, version of this distribution. It answers the question "how many batches should I run if I want r successes?" I think the page on the geometric distribution handles this nicely; there is no reason the exact analog cannot be done here. Until this is done, I predict endless waves of people claiming that the mean is r/p. As it is, this page is currently unreadable. Formivore (talk) 22:05, 6 April 2009 (UTC)


 * Yeah, what situation is the distribution as described on this page useful for? I've only ever encountered the NB distribution that is "numbers of trials to rth success". O18 (talk) 05:39, 26 July 2009 (UTC)


 * Really, what's so difficult about this? The negative binomial distribution builds upon a sequence of Bernoulli trials. Each trial has a binary outcome: two possibilities. The words “success” and “failure” are just labels we arbitrarily attach to those 2 outcomes. Say, if your trials consist of flipping the coin, would you call Heads the “success” or Tails? If the trial consists of people voting for a democratic or a republican party, which one should be called the success (okay, you might have a personal opinion on this account :)? If the trial is a survey question with answers Yes/No — which one is success? and so on...
 * “Numbers of trials to rth failure” is just as valid interpretation as the opposite one. For example, suppose in a hospital a doctor gets fired after the 3rd patient who dies from his error. “A patient dying” we'll call the failure (well it would be awkward to call it a success). So how many patients will the doctor have until he gets fired — that would be our negative binomial distribution?  //  st pasha  »  20:47, 15 April 2010 (UTC)


 * stpasha, I think you just highlighted the point. The distribution on the page is the number of successes before the rth failure. But you said the number of patients. For the distribution on the page, it would be the number of patients that don't die until the doctor gets fired. But that is a much less natural parameterization. Think about a manufacturing process--you want to know how many widgets you have to make before you get, say, three that work. The distribution on the page would be how many bad widgets you have to make before you get 3 good ones, but you really want to know how many total widgets you have to make. 018 (talk) 23:12, 15 April 2010 (UTC)

The definition on this page ("the number of successes in a sequence of Bernoulli trials before a specified (non-random) number of failures") is not one I can recall seeing in any probability and statistics textbook. I have about 30 on my bookshelf; of the six I sampled, five defined NB(r,p) as the number of trials before the rth success, and one defined it as the number of failures before the rth success. None used the definition of this page. One of the "classic" probability texts (Feller, An Introduction to Probability Theory and its Applications, vol. 1, page 165) uses the number-of-failures definition. I too would suggest that this page describe those two competing definitions (similar to what is done on the Geometric distribution page). The current definition should either be scrapped, or (if someone can point out a source that uses that definition) perhaps retained as an alternate definition in a separate section. DarrylNester (talk) 16:30, 18 February 2013 (UTC)

Little match girl
Are we reading the same The Little Match Girl article? The one that is linked is about a girl who dies on a cold night after being too afraid to return home for fear of being punished, and instead dies in the cold. I really don't see the link between the Andersen story and this article, nor why it is a "classic example" of a negative binomial distribution. Considering what is in the linked article, I see no benefit (and a great deal of confusion) by putting that link there. The story is similar to the example of pat, but totally non mathematical, and adds no information about the neg. bin. dist. Just to make things clear, I am not the IP earlier, and I don't think the link should be there User A1 (talk) 02:57, 25 July 2009 (UTC)
 * In the setup of TLMG, Pat must empty her box of matches or face child abuse. In Dr. Evans' example, Pat must empty her box of candy bars or face child abuse. What is the probability that Pat freezes to death? --Damian Yerrick (talk | stalk) 13:10, 25 July 2009 (UTC)
 * So, in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping (were such an article to exist)? In one sense it is not apropos, in another, it is just part of the canon regardless of how interesting it looks when you don't know the history. O18 (talk) 20:12, 25 July 2009 (UTC)
 * I think that the link, firstly, is far too tenuous; both myself and an IP have no idea what you are on about w.r.t. the link. Secondly, I would remind you of WP:EGG (no easter egg links). Finally, should we then link integer to Hansel and Gretel; temperature, porridge and bed to little red riding hood; Follicle_(anatomy) to Rapunzel? I consider those links no more bizarre than this one. User A1 (talk) 01:45, 26 July 2009 (UTC)
 * Just to be clear, I believe Damian is being sarcastic? User A1 (talk) 01:47, 26 July 2009 (UTC)
 * Sorry, still going: in explaining about Fisher's exact test, do you think it would be inappropriate to add a link to the problem of adding the milk while the tea is still steeping. Sure that's fine, as it is a good example of the applicability of the mathematics, but I wouldn't then link that to waltzing matilda, on the pretext that the swagman steeps his tea. User A1 (talk) 01:49, 26 July 2009 (UTC)
 * After using Google for a while, it appears that the little match girl / negative binomial link was only on websites that use the text as it used to appear in this article. I don't understand why you couldn't see why the link was related, but given that it is 2 to 1, and the 1 doesn't really care, I say let's just ax it and be done. O18 (talk) 05:30, 26 July 2009 (UTC)

Major Changes
I have added the alternate formulation of the negative binomial that describes the probability of k **trials** given r successes to the side table and to the body section describing the pmf. This presentation of both formulations follows e.g. Casella, and I believe is justified both by the record of this talk page and by theoretical considerations. While the trials-to-r-successes formulation has some disadvantages (parameter-dependent support), it has the big advantage of actually being the waiting time distribution of a Bernoulli process. The two-column side table was taken from the page on the geometric distribution; in fact the two geometric distributions are just the cases of the two neg. binomials with r=1. If it's worth doing there (where the difference is a factor of (1-p) fer cryin' out loud), it's worth doing here.

I have not modified any other sections. I believe everything else on the page is still valid after this change (since the original pmf is still there). Some of it may now not be needed and could be removed. If anyone has a cleaner way of doing this presentation (which is a bit clunky), go ahead. However, I would appreciate it if this change was not reverted without a good argument against it. Formivore (talk) 23:44, 16 October 2009 (UTC)


 * I believe the second formulation (the number of trials before r-th success) should be removed as a second column of the infobox. A person who doesn’t know what the NB distribution is and comes to this page will likely get confused by the fact that there seem to be two different(?) distributions by the same name, and will never realize that they only differ by a shift of a constant r. Btw Casella starts with the informal description of what is called “2nd formulation” here, but later on redefines it into our “1st formulation” and says that “unless specified otherwise, later on we will always be understanding this definition when we use term ‘negative binomial’”.
 * If we leave only one definition (leaving the other one as a short subsection describing the differences in the alternative formulation), it has the following advantages: (1) the reader will never get confused regarding which definition is used on the page, (2) this definition can be properly generalized to the negative multinomial distribution, (3) this definition is infinitely divisible, arises as a gamma-Poisson mixture, and has the other properties mentioned on this talk page.
 * It will also be beneficial to recast the parameter p as the probability of failure, not of success (or alternatively, to swap around what we consider failures and what are successes here). E.g. we may define the NB distribution as “probability of having k=0,1,2,… successes before a fixed number r of failures occur”? That way the definition sounds more natural, and extends to the multinomial case gracefully. …  st pasha  » 10:44, 8 December 2009 (UTC)


 * Both of these are couched in terms of trials and r appears in the "choose" function, but r is stated to be a real. Maybe we should start simple and then get more complicated later? What loss is there in having r be an integer, and then having a section that allows for non-integer r and states the pdf with gamma functions (I'm assuming that is what takes the place of the choose)? 018 (talk) 14:50, 8 December 2009 (UTC)
 * I have to disagree Stpasha. If someone comes to this page not knowing what the NB is, there is a good chance they will have the wrong distribution in mind leading to more confusion, not less. Roughly half of this talk page is taken up with confusions of this sort. Maybe it's not best to have the double side table, but there should at least be a very clear explanation at the top of the article of the two formulations.
 * Successes and failures are defined as they are to generalize the geometric distribution. This is a more important analogy than the negative multinomial, which is fairly obscure. That said, I don't see how one way is more natural than the other for the NB. Formivore (talk) 07:42, 12 December 2009 (UTC)


 * Well, the “success” and “failure” are just arbitrary labels we assign to two possible outcomes of a Bernoulli trial. Say, if we consider an individual who has small chance p of dying in each day (so that the lifespan has geometrical distribution), then the event of his death will be called “success”.
 * In order to have consistency we might as well reparametrize the geometric distribution as well, so that its pmf is f(k) = p^(k−1)(1−p). This expression actually looks simpler than f(k) = p(1−p)^(k−1) (although of course they are quite the same). …  st pasha  » 12:36, 12 December 2009 (UTC)

Looking at this article is looking at social failure. All kinds of flotsam have accumulated. Regardless of whether the side table should have one or two columns, this article should be revised to remove redundancies and sections that are not notable. I'd propose the following changes: 1) Move the "Limiting Case" and "Gamma-Poisson mixture" subsections further down in the article. I don't know if the parameterization used to arrive at the limit is broadly applicable, or if it is only used for this derivation. If the former is true, then this should be explained. Otherwise this should be moved to the "Related Distributions" section. The mixture derivation does not describe a specification at all and should be moved to the "Occurrence" section. This section should just describe what this distribution is, that's it. 2) The "Relation to other distributions" subsection describes, in a derivation involving the incomplete gamma function, the k trials to r successes stuff that the wrangling has been about. A good explication at the beginning of the article will obviate this section. 3) The "Example" at the end of the article is unnecessary and poorly written. There is also a much shorter example in the "Waiting time in a Bernoulli process" subsection that does not involve candy bars. Formivore (talk) 08:59, 12 December 2009 (UTC)

The article titled geometric distribution has two columns: one for the number of trials before the first success, and one of the number of trials including the first success.

It was necessary to do that because before it appeared that way, irresponsible editors kept coming along saying "I CAN'T BELIEVE THIS ARTICLE MAKES SUCH A CLUMSY MISTAKE!!!! MY TEXTBOOK SAYS....." and then recording information that's correct for one of the two distributions and wrong for the other, and failing to notice that there are two of them, even though the article clearly said so.

We cannot omit the negative binomial distribution of the number of trials before the rth success because
 * That's the one that's infinitely divisible;
 * That's the one that arises as a compound Poisson distribution;
 * That's the one that allows r to be real rather than necessarily an integer.

Michael Hardy (talk) 19:55, 12 December 2009 (UTC)


 * OK, it seems that either I or Michael (or both) are confused here. Which only reinforces the point that the entire situation is utterly befuddling. The first column is not the “number of trials before the r-th success”, but rather the number of “failures” before the r-th success. So the difference between the two columns is not in before/including, but rather in whether we count only the failures, or both the failures and the successes. The two definitions differ by a shift constant r, so it's no biggie.
 * Oh, and I'm not saying we should omit the definition of negative binomial as the number of failures before the rth success, that's the one I'm suggesting to keep, while the other one to scratch out (the one whose support starts from r). …  st pasha  » 09:36, 13 December 2009 (UTC)

OK, I haven't looked at this discussion for a while. I was hasty with language; what I meant was:
 * One distribution is that of the number of trials needed to get a specified number of successes; and
 * One distribution is that of the number of failures before a specified number of successes.

The latter allows the "specified number" to be a non-integer, and is infinitely divisible. If we're going to keep only one, it should be that one. Michael Hardy (talk) 03:16, 21 December 2009 (UTC)
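For readers trying to keep the two distributions in this thread apart, here is a minimal sketch (not taken from the article) of both pmfs for integer r. The function names `pmf_failures` and `pmf_trials` are made up for illustration, and p is assumed to be the per-trial success probability. It only illustrates that the two formulations differ by a shift of r.

```python
from math import comb

def pmf_failures(k, r, p):
    """P(K = k): k failures before the r-th success (support 0, 1, 2, ...)."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def pmf_trials(n, r, p):
    """P(N = n): n total trials needed to get r successes (support r, r+1, ...)."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

r, p = 3, 0.4  # illustrative values only

# N = K + r, so the two pmfs agree once the argument is shifted by r.
max_gap = max(abs(pmf_failures(k, r, p) - pmf_trials(k + r, r, p)) for k in range(50))
print(max_gap)
```

Only the failures version extends to non-integer r, because its binomial coefficient can be rewritten with gamma functions while the support stays {0, 1, 2, ...}.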


 * Michael, I think that would be a great idea for a text book, but I would rather see the page be, well, encyclopedic in its coverage. One thing I think is certain, if we want to state the non-integer case, it should be in another section, not in the bar on the right. 018 (talk) 16:49, 21 December 2009 (UTC)

I never said we should have a "bar" for the specifically non-integer case. But we should have one for the case that's supported on {0, 1, 2, ...}. And it should state a parameter space that includes non-integers. Somewhere in the text of the article that should be explained (possibly in its own section). Michael Hardy (talk) 20:48, 21 December 2009 (UTC)

More dumbing down needed?
The recent edits by user:24.127.43.26 and by user:Phantomofthesea make me wonder if we need to dumb this down again to rid ourselves of irresponsible editors who edit without paying attention to what they're reading or what they're writing. Michael Hardy (talk) 02:52, 19 February 2010 (UTC)


 * Maybe we should reject the “success/failure” terminology altogether, and instead use something more neutral, like “0/1”. That way whenever a person reads this page he/she would have to stop for 3 seconds and think how our 0/1 maps to his/her textbook’s success/failure.  //  st pasha  »  20:51, 15 April 2010 (UTC)

a question
I don't want to mess with the entry, but according to Casella & Berger, the pmf listed here is incorrect. The p and (1-p) are switched. It should be (p^r)(1-p)^k. I haven't looked through to see how that mistake affects the rest of the article, if at all, so I'll leave it to someone with more knowledge of this article than me to correct. —Preceding unsigned comment added by 128.186.4.160 (talk • contribs)
 * Sigh............. not this comment again. The article says:
 * Different texts adopt slightly different definitions for the negative binomial distribution.
 * OK? You need to read what it says!. Michael Hardy (talk) 21:25, 13 April 2010 (UTC)

Ok, I read it more carefully and I concede that what is written here is technically correct. However why not just stick with the Casella Berger definition on here? I'd argue that the Casella/Berger book is the most widely used of its kind, so defining the pmf this way is just confusing to most people. —Preceding unsigned comment added by 68.42.50.243 (talk • contribs) 21:00, 13 April 2010


 * I don't see where this decision was discussed above. In the first instance that I see of such a discussion Michael Hardy is saying it has happened before, so I guess there must be an unlinked archive? 018 (talk) 02:58, 14 April 2010 (UTC)


 * Sorry, I've now looked at this talk page and see that it looks to me like MH pointed out that any change would require the entire page be changed in the section titled, " reversion". Since then in the section, "Trials up to rth success" Formivore correctly predicts endless waves of people correcting it because of the more intuitive interpretation of the alternative specification. There has also been a more lengthy discussion in the section titled "Major Changes" where Formivore tries to update the article and describes it as being in disarray and confusing. Formivore appears to have given up. In the end MH likes that one can make a (somehow useful?) change of support for one of the parameters for the less intuitive parameterization. 018 (talk) 04:47, 14 April 2010 (UTC)


 * To focus my ramble into a question, what is the value of being able to treat r as real and not just integer valued? Why do we care? Also, even if we do want this, might it make more sense to give that formulation a separate section that starts by reparameterizing, showing the new cdf/pdf and then explaining why it is useful. 018 (talk) 17:02, 14 April 2010 (UTC)
 * Another reason you might want r to be real is so that the negative binomial becomes an overdispersed generalization of the Poisson distribution. That is the only capacity in which I use the negative binomial distribution in my own work. That having been said I agree that the suggestion to have the more general parameterization in a separate section makes sense. 69.143.122.185 (talk) 16:12, 23 February 2023 (UTC)

We want to treat r as real because it shows that this is an infinitely divisible distribution and that there's a corresponding Levy process. Michael Hardy (talk) 17:46, 14 April 2010 (UTC)
 * Okay, so (1) why is there no mention of "Levy process", and (2) why does this trump the overall understandability of the article? Would you agree that this parameterization could be moved to its own section? 018 (talk) 18:09, 14 April 2010 (UTC)

an idea
We could put the whole box in a transcluded page to make it somewhat more difficult to quickly edit it. It is a bit extreme, but there have been many well intentioned (if somewhat inattentive) incorrect edits to it. 018 (talk) 17:07, 29 April 2010 (UTC)


 * Sounds great.  //  st pasha  »  18:02, 29 April 2010 (UTC)
 * And we need to do the same thing with dice/die...  //  st pasha  »


 * Okay, I did it here. Let's see how it goes. 018 (talk) 19:23, 30 April 2010 (UTC)

Error in definition section equations.
In the summary on the right hand side we have:

r - is the number of failures

pmf: $$c \times (1-p)^r p^k$$

which makes sense.

Then in the definition section we have, success is p, failure is (1-p) but then the pmf function is given as

$$c \times (1-p)^k p^r$$

which is the wrong way round: this gives the probability of k failures in r+k trials. This error continues throughout the definition section. However, in the related distributions section, when the more common version of the pmf is written using $$\lambda$$ (more commonly $$\theta$$ in my experience), we have (1-p)^r p^k, which is as it should be following the textual definitions given previously. —Preceding unsigned comment added by 193.63.46.63 (talk) 10:24, 4 October 2010 (UTC)

Very serious issues with File:Negbinomial.gif
I see there are several very serious issues with the article's picture Negbinomial.gif:



I am surprised no one has caught this after all the time this picture (or its previous incarnations) has been shown on the title page.

We know that the mean of the distribution is $$\mu = \frac{pr}{1-p}$$. To have a constant mean value of 10, then p and r have to be related as $$p=\frac{10}{10+r}$$, which is the wrong thing to do, as p and r should be completely independent of one another. For example, for r=10, then p=1/2. But if r=20, then p=1/3. Since p and r are the exogenous variables of the distribution, then we should show a picture of a distribution that keeps one or the other constant. We should NOT show a picture that varies both simultaneously, since this does not show the true behavior of the function.

Also, if p and r are set this way, then there is no way that the standard deviation will be constant. Since we know that $$\sigma^2=\frac{pr}{(1-p)^2}$$, then we find that $$\sigma^2=\frac{10(10+r)}{r}$$, a non-constant function of r. And it is odd that the author of the picture chose to show the standard deviation as a horizontal segment. It should be in the same domain as the mean (i.e., a vertical line).
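The arithmetic in the two paragraphs above can be checked exactly. This is a small sketch that assumes the article's formulas as quoted in this comment (mean pr/(1-p), variance pr/(1-p)^2); the helper name `mean_and_variance` is invented for illustration.

```python
from fractions import Fraction

def mean_and_variance(r, mean=10):
    """Pick p so that the mean p*r/(1-p) equals `mean`, then return (p, mean, variance)."""
    p = Fraction(mean, mean + r)       # p = 10/(10+r) when mean = 10
    mu = p * r / (1 - p)               # mean as quoted in the comment above
    var = p * r / (1 - p) ** 2         # variance as quoted in the comment above
    return p, mu, var

for r in (10, 20, 40):
    p, mu, var = mean_and_variance(r)
    # The variance should equal 10*(10+r)/r, which changes with r.
    print(r, p, mu, var, Fraction(10 * (10 + r), r))
```

With the mean pinned at 10, the variance falls from 20 at r=10 to 15 at r=20, so the standard deviation in such an animation cannot be constant.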

This picture has so many issues that I must recommend that it not be shown and that a new one (hopefully a correct one) be developed. In an earlier post, someone mentioned that the German language version of the article is a better one. Like that person, I don't read German either, but I can tell that the sample picture used there is a correct one. Perhaps an equivalent picture for the English language can be developed to replace Negbinomial.gif. If no one comes up with one in the next few days, I'll just bite the bullet and make my own and upload it.  Bruno  talk  15:58, 25 May 2011 (UTC)


 * This is not an error, just an expositional bit you don't like. I very much prefer the explanation on the page to one where the mean is changing, though it might be worth adding p to the graph as well as r to make it clear that both parameters are changing. One could, of course, reparameterize the NB so that the graphs shown were not only changing one parameter, in some sense it is arbitrary. In any case, the pmf graph should probably have p labeled on it even if it were not changing. 018 (talk) 16:33, 17 August 2011 (UTC)
 * Yes, while it may not be an error, I surely don't like it as you point out, and neither should you. I frankly do not see how presenting a function where its main parameters vary simultaneously contributes to clarity. While the animation looks pretty, the reader cannot get a clear understanding as to how the function actually works.  Either you vary one parameter at a time and do the animation that way, or you show a static picture like in the German language article.  Yes, it is not wrong (save for the standard deviation point), but it is not right either.
 * There is already a lot of fodder for confusion by presenting the material in the article in a way that deviates from the standard texts. The picture contributes to this.  Again, it is not wrong either, but it is certainly not right, in that readers are left wondering what is going on.  We see evidence of this elsewhere in this Talk page, where less-than-careful readers get into pitfalls, and other writers feel compelled to point out the deviations.  I don't see why the article should be rife with these problems, as there are much better ways to present the material.  This is a substandard article, starting from the picture.  I'd offer to rewrite the whole thing but I know I will certainly run into the same resistance I am running into by pointing out deficiencies in the graphic that could easily be corrected.  Expositional bit, my foot!  Bruno  talk  13:47, 18 August 2011 (UTC)
 * The mean being constant is what I think makes it clear. I've never gotten much out of drawings where the centrality parameter is changing. Do you agree that this figure could be made clearer by adding all parameters? 018 (talk) 19:29, 18 August 2011 (UTC)

I think the figure here does a poor job of providing a visual idiom for the negative binomial distribution that distinguishes it from monotone single-parameter distributions like the geometric. — Preceding unsigned comment added by 65.127.74.2 (talk) 15:41, 19 May 2013 (UTC)

Concrete outcomes vs subjective values
The outcomes "success" and "failure" are concrete and can be mapped to a set {1,0} by an indicator function. The values "good" and "bad" are subjective values based on social constructions, experiences and personal preferences (concepts that may not even exist in concrete form). The comparison here between "success"/"failure" and "good"/"bad" does not make any sense: "When applied to real-world situations, the words success and failure need not necessarily be associated with outcomes which we see as good or bad." The point of an experiment is *not* to have subjective biases, so why would the experimenter see the outcomes as good or bad? Runestone1 (talk) 00:44, 2 June 2011 (UTC)


 * The quote agrees with you... — Preceding unsigned comment added by Machi4velli (talk • contribs) 09:43, 1 July 2013 (UTC)

Correct equation?
In the "Extension to real-valued r" section, I see a denominator of "x! gamma(x)". Is that right, or should it be "x gamma(x)"?


 * It's correct. Note that if x were a positive integer, we would have gamma(x) = (x-1)!, and you'll see that it would reconstruct the binomial coefficient in the integer case Machi4velli (talk) 10:00, 1 July 2013 (UTC)
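As a quick numerical illustration of the reply above, the sketch below assumes the real-valued coefficient has the form Γ(k+r) / (k! Γ(r)). That form is an assumption about what the article's "Extension to real-valued r" section states, and `nb_coefficient` is a name made up here. For integer r it should reproduce the binomial coefficient C(k+r-1, k).

```python
from math import comb, exp, lgamma

def nb_coefficient(k, r):
    """Gamma(k + r) / (k! * Gamma(r)), computed via log-gamma; valid for real r > 0."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r))

# Integer r: the gamma form matches the ordinary binomial coefficient.
for k, r in [(0, 1), (4, 3), (7, 5)]:
    print(k, r, nb_coefficient(k, r), comb(k + r - 1, k))

# Non-integer r: the same expression is still defined.
print(nb_coefficient(4, 2.5))
```

The factorial in the denominator handles the k! part and the gamma function handles the (r-1)! part, which is why both appear.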

Mode does not appear to be correct
The current mode given doesn't seem to match other sources for the negative binomial mode. I've checked under the multiple parameterizations and haven't come up with that formula. Can someone else confirm or deny this? — Preceding unsigned comment added by 108.246.235.64 (talk) 23:40, 19 July 2013 (UTC)

Gamma-Poisson mixture parameter confused
I changed the second sentence of the section "Gamma–Poisson mixture", which had suggested that negativebinomial(r,p) ~ poisson(gamma(shape = r, scale = p/(1 - p))). Instead, it is the rate parameter that should have that value. Here's some R code that makes it obvious:


 * many = 10000
 * r = 15   # trunc(runif(1,2,20))
 * p = 0.56 # runif(1)
 * # reference sample drawn directly from R's negative binomial
 * x = rnbinom(many, r, p)
 * # negativebinomial(r,p) ~ poisson(gamma(shape = r, rate = p/(1 - p)))
 * lambda = rgamma(many, shape = r, rate = p/(1 - p))
 * y = rpois(many, lambda)
 * # negativebinomial(r,p) ~ poisson(gamma(shape = r, scale = p/(1 - p)))
 * lambda = rgamma(many, shape = r, scale = p/(1 - p))
 * z = rpois(many, lambda)
 * # empirical CDFs: green = rnbinom, blue = rate version, red = scale version
 * plot(sort(x), 1:many/many, xlim=range(c(x,y,z)), ylim=c(0,1), col='green', lwd=3, type='l')
 * lines(sort(y), 1:many/many, col='blue')
 * lines(sort(z), 1:many/many, col='red')

Scwarebang (talk) 00:24, 12 October 2013 (UTC)
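The simulation above can also be backed by a short derivation. This is a sketch that assumes the gamma density is written with shape $$r$$ and rate $$\beta$$ (the symbol $$\beta$$ is introduced here and does not appear in the comment above). Mixing a Poisson over that gamma gives

$$ \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!} \cdot \frac{\beta^r \lambda^{r-1} e^{-\beta\lambda}}{\Gamma(r)} \, d\lambda = \frac{\Gamma(k+r)}{k!\,\Gamma(r)} \left(\frac{\beta}{1+\beta}\right)^r \left(\frac{1}{1+\beta}\right)^k $$

Setting the rate to $$\beta = p/(1-p)$$ gives $$\beta/(1+\beta) = p$$ and $$1/(1+\beta) = 1-p$$, so the mixture has pmf $$\frac{\Gamma(k+r)}{k!\,\Gamma(r)} p^r (1-p)^k$$, the form sampled by rnbinom(many, r, p). Using $$p/(1-p)$$ as the scale instead amounts to $$\beta = (1-p)/p$$, which swaps $$p$$ and $$1-p$$ in the result.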

An error in PMF, Mean and Variance formulas
I have always assumed that the formulas given in Wikipedia were correct. I will not do that anymore... I have no idea why such a basic mistake prevailed for so long. Maybe because the mistake is in the sidebar, which is not straightforward to edit (I have no idea how to do it). Here are the correct entries:

$$ \mathrm{pmf}(k) = \begin{cases} \binom{k+r-1}{r-1} (1-p)^k p^r & k \geq 0 \\ 0 & k < 0 \end{cases} $$

Mean = $$ r \left(\frac{1}{p}-1\right) $$

Variance = $$ \frac{r}{p^2} \cdot \left(1-p\right) $$

The results were checked manually against Wolfram's Mathematica, and then checked again to rule out an error on Mathematica's side. — Preceding unsigned comment added by Adam Ryczkowski (talk • contribs) 10:21, 23 October 2013 (UTC)


 * You are right when $$p$$ is the probability of success and $$r$$ is the number of successes until the experiment is stopped. This is because in this case, $$X$$ is a sum of $$r$$ geometric random variables with probability $$p$$. The mean of each such variable is $$\left(\frac{1}{p}-1\right)$$ so the mean of their sum is $$ r \left(\frac{1}{p}-1\right) $$.
 * However, the article defined $$r$$ as the number of failures until the experiment is stopped. Then, $$X$$ is a sum of $$r$$ geometric random variables with probability $$1-p$$. The mean of each such variable is $$\frac{1}{1-p}-1=\frac{p}{1-p}$$, so the mean of their sum is $$ r \frac{p}{1-p} $$ as in the side bar. --Erel Segal (talk) 08:50, 8 December 2015 (UTC)
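Both means in the reply above can be confirmed by brute-force summation of the pmf. The following is a sketch under the assumption that r is an integer. `mean_by_summation` and its `stop_prob` argument are names invented here, where `stop_prob` is the per-trial probability of the outcome that counts toward the stopping threshold r.

```python
from math import comb

def mean_by_summation(r, stop_prob, terms=500):
    """Mean number of non-stopping outcomes seen before the r-th stopping outcome."""
    q = 1 - stop_prob  # probability of the outcome being counted
    return sum(k * comb(k + r - 1, k) * stop_prob**r * q**k for k in range(terms))

r, p = 5, 0.3  # illustrative values; p is the success probability

# Stop at the r-th success (count failures): mean is r*(1/p - 1).
print(mean_by_summation(r, p), r * (1 / p - 1))

# Stop at the r-th failure (count successes), as the article defines it: mean is r*p/(1-p).
print(mean_by_summation(r, 1 - p), r * p / (1 - p))
```

The same function serves both conventions because only the choice of which outcome stops the experiment changes.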

Isn't the CDF wrong?
CDF: $$1-I_p(k+1,\,r)$$, the regularized incomplete beta function.

According to the regularized incomplete beta function $$I_x(a,b) = \sum_{j=a}^\infty \binom{a+b-1}{j} x^j (1-x)^{a+b-1-j}$$.

So, $$CDF=1-\sum_{j=k+1}^\infty \binom{k+r}{j} p^j (1-p)^{k+r-j}$$. But we know that $$CDF=1-\sum_{j=k+1}^\infty \binom{j+r-1}{j} p^j (1-p)^{r}$$.


 * I agree that the CDF is wrong. Based on the way $$p$$ is defined (as the probability of success), I would have expected $$1-I_{1-p}(k+1,\,r)$$ or $$I_p(r,\,k+1)$$. I've been going in circles for a while trying to figure out why values calculated using this (incorrectly) presented CDF don't match R's pnbinom function. Fwiw, the German version of this page presents the CDF as $$1-I_{1-p}(k+1,\,r)$$, which matches numerical calculations. — Preceding unsigned comment added by 104.129.196.214 (talk) 14:55, 8 October 2018 (UTC)


 * Yes, the CDF is wrong. Based on the discussion above, and multiple references. I fixed the CDF in the article body, but I can not figure out how to fix the CDF in the article's table, so it is still wrong in the table. If anyone can fix it in the table that would be appreciated. 69.143.122.185 (talk) 16:14, 23 February 2023 (UTC)
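Since much of this thread comes down to which convention is in force, here is an exact check in rational arithmetic, restricted to integer r. The helper names are invented for illustration, and the regularized incomplete beta function is evaluated through its binomial-sum form for integer arguments, as quoted at the top of this section. It suggests that $$1-I_p(k+1,\,r)$$ matches the "successes before r failures" convention, while $$1-I_{1-p}(k+1,\,r) = I_p(r,\,k+1)$$ matches the "failures before r successes" convention that the comment above compares against R's pnbinom.

```python
from fractions import Fraction
from math import comb

def reg_inc_beta(x, a, b):
    """I_x(a, b) for positive integers a, b, via the binomial tail sum."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def cdf_by_summation(k, r, stop_prob):
    """P(X <= k), X = number of non-stopping outcomes before the r-th stopping outcome."""
    q = 1 - stop_prob
    return sum(comb(j + r - 1, j) * stop_prob**r * q**j for j in range(k + 1))

p, r, k = Fraction(2, 5), 3, 4  # illustrative values; p is the success probability

# Successes before r failures: the stopping outcome is a failure (probability 1 - p).
successes_cdf = cdf_by_summation(k, r, 1 - p)
# Failures before r successes: the stopping outcome is a success (probability p).
failures_cdf = cdf_by_summation(k, r, p)

print(successes_cdf == 1 - reg_inc_beta(p, k + 1, r))
print(failures_cdf == 1 - reg_inc_beta(1 - p, k + 1, r))
print(failures_cdf == reg_inc_beta(p, r, k + 1))
```

So whether a given CDF formula is "wrong" depends on which outcome p refers to and which outcome is being counted.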

MLE section uses wrong pmf?
Doesn't the MLE section use the wrong pmf (where r is the number of successes instead of failures)? This is inconsistent with the rest of the article. — Preceding unsigned comment added by Nicole.wp (talk • contribs) 23:35, 27 May 2014 (UTC)

I thought so, and changed it. Apologies if I should have discussed first. Clay Spence (talk) 20:45, 8 October 2014 (UTC)

Inconsistent introduction
The case with r real contradicts the initial definition "The number of successes before r failures in independent Bernoulli trials". It should be made clear at the outset that this case is a special case. The reason for putting this case first seems to have been that it's the most common in practice. I doubt this, although it might, perhaps, be the most common in elementary texts (and is all that the Wolfram Math World article discusses). The real case occurs often when the assumptions for the Poisson distribution are not quite met. For example, fatal road accidents tend to follow the Poisson distribution, while deaths do not because more than one may occur in a single accident. We get the negative binomial if the number of deaths per accident follows a logarithmic series distribution. Similarly, people can have different rates for non-fatal accidents, so that repeated accidents to the same person are not Poisson distributed. I remember the second case from my first statistics course in my first year at university (1964/65). Greenwood and Yule (1920) studied accidents to women manufacturing high explosive shells in World War I. Letting the Poisson parameters for individual workers (representing accident proneness) be gamma distributed gives the negative binomial. (In our course we started with the negative binomial and an accident proneness variable with an arbitrary distribution. This then gives rise to an integral equation for this distribution which the gamma fitted. High school calculus was a prerequisite for B.Sc. and the course discussed the beta and gamma distributions, but we would not have been able to solve the integral equation except by trying all the continuous distributions we knew.) I suggest starting with a description such as "the negative binomial is a distribution on the integers 0, 1, 2, 3, ... with two parameters often denoted by p and r where p is between 0 and 1 and r is positive." 
Then continue with "The special case when r is an integer is known as the Pascal distribution and represents the number of successes in independent Bernoulli trials before obtaining r failures." I also think the different definitions in the Pascal case should be mentioned here and not relegated to a side bar. Continue "The more general case (also known as the Polya distribution) occurs in at least two ways." Then list those ways as I described above. We should also mention in the article (but not in the introduction) that alternative parametrizations for the Polya case are often more useful, for example the mean = pr/(1-p) = $$\mu$$ and r or odds ratio = p/(1-p) = $$\theta$$ and r. TerryM--re (talk) 00:30, 6 October 2014 (UTC)

Incorrect Fisher information for the convention used in this article
The negative binomial distribution as defined in this article represents the number of successes (k) before r failures. The Fisher information given in the article is correct for the formulation representing the number of failures (k) before r successes. See the following WolframAlpha calculation:

Basically, p and (1-p) need to be switched.


 * I agree. I have calculated it myself but I don't know how to change the original.
 * Note that this only applies to known r (which usually means that r is an integer). When r and p are both unknown, the Fisher information becomes a matrix, which doesn't have a closed form.
 * TerryM--re (talk) 21:01, 3 December 2014 (UTC)

I also calculated it by myself and it is correct. I edited it now.

Polya naming convention - bioinformatics
The article head includes the statement, "There is a convention among engineers, climatologists, and others to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case." I'm wondering whether people think it would be useful to note that this convention is not followed in bioinformatics. Specifically, the negative binomial distribution plays a central role in interpreting the counts of Next Generation Sequencing reads as over-dispersed Poisson variates. See the deseq package homepage http://bioconductor.org/packages/release/bioc/html/DESeq.html. Note that this is among the top 5% of downloads in bioconductor, bioconductor being something of an industry standard distribution of analysis packages (here's a citation for that claim: http://www.nature.com/nmeth/journal/v12/n2/abs/nmeth.3252.html).

I might edit the sentence as follows: "There is a convention in fields such as engineering and climatology (notably excluding bioinformatics) to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case." or perhaps "There is a convention among engineers, climatologists, and others to reserve “negative binomial” in a strict sense or “Pascal” for the case of an integer-valued stopping-time parameter r, and use “Polya” for the real-valued case (notably, this convention is not observed in bioinformatics)."

So my question is, is the loss in clarity due to inserting a parenthetical element like that worth the added information? Having written this, I'm not so sure, because it seems like if you know enough to understand (or even wonder about) the relevance of the NBD to bioinformatics, the naming convention issue won't confuse you. And besides, is bioinformatics so important that the exception is worth putting in the head of the article? I mean, it's my field, so it's hard for me to judge its notability. Flies 1 (talk) 00:04, 31 January 2016 (UTC)

Median is hard to find
I realize the median of this distribution is hard to find (papers exist on the bounds), but that fact should at least be mentioned on the page Barry.carter (talk) 12:26, 24 December 2016 (UTC)

Mixed conventions
I don't have the stats background or time to go through and figure this out, but my stats professor was looking at this article this morning and told us in class today that, as warned about in §Alternative formulations, the article is using subtly different definitions of the NB distribution in different sections and is accordingly self-contradictory. This has evidently been controversial here in the past, but since there's now a stable set of standard and alternate formulations, it would be great if someone went through to check that everything is internally consistent with the given definition. FourViolas (talk) 20:27, 17 October 2018 (UTC)


 * I agree this is a very big issue. What we may need is to rewrite the entire page using a more common formulation. I haven't seen any other source use the formulation that this wiki page uses. Every time it is fixed, someone always comes and breaks it again. The problem is that there are many well-intentioned vandals who will come and make very subtle changes to the page, like simply adding a "-1", or swapping "r and k", or whatever. They don't realize that whatever source they are using has a different formulation than what this page uses, and they are unknowingly breaking the equations and making the page internally inconsistent.


 * Even though I drafted the alternative formulations section, I'm not confident I could rewrite the page in a different, more common formulation. I suggest perhaps a formulation where X is counting r failures given k successes, even though I don't personally find this formulation the most intuitive.  If someone wants to do this, I fully support them.  Ajnosek (talk) 19:29, 17 April 2019 (UTC)


 * Yes, this ("well-intended vandals") is a fundamental problem of math on Wikipedia. I really do not know how it could be solved. See also here. Boris Tsirelson (talk) 04:14, 18 April 2019 (UTC)


 * I think at least the self-contradiction between the main definition ("number r of successes") and the template ("r > 0 — number of failures") should be resolved. Which one should be standard? オオカワウソ (talk) 20:22, 11 July 2020 (UTC)

A diversity of conventions has seemingly always haunted the topic of the negative binomial distribution, and people who learned about the NB distribution in one course often don't know that the version they learned isn't the only one. Michael Hardy (talk) 17:18, 18 April 2019 (UTC)

Is the gif for the pmf wrong?
Is the gif for the pmf wrong? Because as there is a higher number of successes necessary, the random variable should also increase and thus mean too. It will start looking more normal but also shift right, correct? Plus, when r=40, there is no way that k could be so low. That doesn't make sense to me SwagmanJ (talk) 16:14, 19 June 2019 (UTC)

Mean does not agree with parameter definitions
Looking at this now, I'm sure that the formula for the mean in the sidebar doesn't agree with the definition of p in the very same sidebar. One seems to use p as probability of success, but the mean seems to use it as the probability of failure. It would be nice if it could be consistent within the same sidebar. 92.23.239.27 (talk) 16:11, 3 April 2020 (UTC)


 * Yeah, the use of p, k, and r seems inconsistent. On the sidebar, r is the number of failures and k is the number of successes. Elsewhere, r is the number of successes and k is the number of failures. In the pmf definition paragraph, p is the probability of success, but the formula gives $$Pr(X=k) = ...(1-p)^r...$$, which is (probability of failure)^(number of successes), which is surely wrong. 2607:FEA8:1CE0:D71B:4541:EA2D:A73A:C911 (talk) 21:34, 23 April 2020 (UTC)


 * The pmf in the sidebar is plain wrong. If k is the number of successes the binomial coefficient should be choosing k-1 since the last trial must be a success. But it chooses k instead which is nonsense! Silverhammermba (talk) 05:41, 25 April 2020 (UTC)


 * The mean in the sidebar is indeed incorrect; it should be (1-p)r/p. How do we fix this? It is causing lots of confusion and stealing man-hours from this world. — Preceding unsigned comment added by Rohitpandey576 (talk • contribs) 06:08, 3 August 2020 (UTC)

I think the characteristic function in the sidebar is wrong, too? Seems to be the same issue, p is treated as number of failures, which is inconsistent with the rest of the article. — Preceding unsigned comment added by 156.116.8.72 (talk) 09:06, 7 August 2020 (UTC)


 * An error was introduced into the PMF formula on 17-Jun-2020. I have fixed that. I think the more simple ones like mean, PMF etc are correct as defined in the sidebar; and match what was originally put there in 2010. The sidebar has a very strange convention: r (the number which is fixed) is defined as failures, but p is the probability of success. That is why the PMF has a (1-p)^r term (prob of failure ^ number of failures), when most of us are probably used to seeing p^r (prob of success ^ number of successes). Adpete (talk) 05:21, 24 August 2020 (UTC)
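The sidebar convention described in this thread (r failures fixed, p the probability of success, k the number of successes) can be checked directly. This is a rough sketch for integer r only; `sidebar_pmf` is a name made up here, and the sums are truncated at 400 terms. It checks that choosing k or r-1 in the binomial coefficient gives the same count, that the pmf sums to 1, and that its mean is pr/(1-p) under that convention.

```python
import math

def sidebar_pmf(k, r, p):
    """Sidebar form as quoted in this thread: k successes before the r-th
    failure, p = probability of success (integer r only in this sketch)."""
    return math.comb(k + r - 1, k) * (1 - p) ** r * p ** k

r, p = 4, 0.35

# The last trial is the r-th failure, so the k successes (equivalently the
# other r-1 failures) are placed among the first k+r-1 trials.
same_coefficient = all(
    math.comb(k + r - 1, k) == math.comb(k + r - 1, r - 1) for k in range(50)
)

total = sum(sidebar_pmf(k, r, p) for k in range(400))
mean = sum(k * sidebar_pmf(k, r, p) for k in range(400))

print(same_coefficient)
print(total)                   # should be close to 1
print(mean, p * r / (1 - p))   # mean under the sidebar convention
```

If r is instead taken to be the number of successes, with p still the probability of success, the same check gives (1-p)r/p, which is the other value proposed in this thread.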

Inconsistency in definition of parameter r
In the text the parameter r is described as the following:

[...] before a specified (non-random) number of successes (denoted r) occurs.

However in the infobox to the right the definition is:

r > 0 — number of failures until the experiment is stopped (integer, but the definition can also be extended to reals)

As a comparison, the German site has the following in the infobox:

r > 0 — Anzahl Erfolge bis zum Abbruch

which translates as "number of successes until stop", so the opposite of the English infobox, but in line with the English article.

--130.225.188.33 (talk) 13:28, 11 August 2020 (UTC)


 * I came here to say the same thing! The lead defines r as successes, while the infobox defines r as failures (and even more confusingly, k as successes). r is used for successes at Wolfram Mathworld and, from a quick google, seems more popular too. It is also used in both textbooks I have on hand (Sheldon Ross, Introduction to Probability Models; Wackerly, Mendenhall and Scheaffer, Mathematical Statistics with Applications). And as the above editor points out, German Wikipedia uses r for successes too. So I think we should change it to r successes and k failures throughout. Adpete (talk) 04:05, 24 August 2020 (UTC)
 * DONE. I have changed the definitions in Template:Negative binomial distribution to match the article (and most sources). I have not changed any formulae BUT I am pretty sure some of the formulae are wrong. e.g. the mean. I did not change it because all the Infobox formulas would have to change, and I don't have the time ATM. But I think it should be changed. Note also some formulae have been added to Infobox in recent years; they should all be checked. Adpete (talk) 04:56, 24 August 2020 (UTC)
 * I can take the Infobox changes on — the only thing is I couldn't find the Infobox when I looked at the article's source. I'm fairly inexperienced with Wikipedia so if somebody could point me in the right direction for where to look it would be much appreciated. --Data Boiiii (talk) 14:58, 11 October 2020 (UTC)

Mistake in the expression for the mean
Should be E(X)=r(1-p)/p — Preceding unsigned comment added by 2001:16B8:469B:1A00:B41D:FED1:BFF3:5A82 (talk) 13:43, 27 September 2020 (UTC)


 * That is because of the parametrisation here. Here the expected number of successes before stopping is an increasing function of the probability of succeeding. 17:33, 17 November 2020 (UTC)  — Preceding unsigned comment added by 2A00:23C6:1482:A100:B959:2BC3:7745:4842 (talk)

r - number of failures or success?
It seems like there is a mismatch in the way r is explained between the first paragraph and the definition.

In the first paragraph it says: "before a specified (non-random) number of failures (denoted r) "

In the definition: "until a predefined number r of successes"

While technically this is the same, it is confusing. I suggest changing the sentence in the first paragraph. — Preceding unsigned comment added by Amirubin87 (talk • contribs) 08:00, 13 December 2020 (UTC)


 * There has been much discussion of this. It seems that the two examples you gave got swapped between your comment and today (when I changed the intro back) but were still inconsistent. What a mess. Unfortunately, changing the first paragraph is an insufficient solution. The infobox needs to be updated if the first paragraph is changed, but I cannot figure out how to find the infobox for editing. Furthermore, any choice of convention must be accompanied by a complete check of the article for every use of the terms. For example, the Definition section also does not have consistent usage. Right now, it says that we *observe a sequence until r failures*, but all of the examples use “success” to refer to the limiting events.


 * I am in favor of avoiding the words “success” and “failure” completely (or at least making it very clear up front that these terms do not matter). I prefer terms like “limiting event” or “stopping event”, but I do not think they are in common use, and it might not be great for Wikipedia to invent terminology. If we stick with “success” and “failure” (which, again, are horrid terms), I think the best choice right now is to edit the infobox to have r be the number of successes until the experiment is stopped (because it seems to be the most common terminology in the body), but we have to find the infobox before we can edit it. As a temporary alternative, I am going to do some edits to the Definition section to make it clear that not only do success and failure not refer to good or bad outcomes but that these terms are utterly arbitrary and that the important idea is how much of some limiting event we stop after (or before, depending on other arguments happening elsewhere on the talk page).


 * Additional essential note: Choosing whether we want to say that we stop after r “successes” or “failures” does not solve all ambiguities. The choice of whether to parameterize the model with the probability of the limiting event (rather than with the non-limiting event) is just as arbitrary and is an orthogonal choice. I see a few cases where the text uses something like “p is the probability of success” as if that alone makes the definition clear, but it does not. Rscragun (talk) 21:37, 2 January 2021 (UTC)

We need to pick one, and it needs to be applied consistently throughout the article. Of course we should state the other parameterizations, but only in that section, and never come back to them elsewhere in the text. I would also suggest we borrow from https://www.vosesoftware.com/riskwiki/NegativeBinomial.php and use s for number of successes, r for number of failures, and n for the number of trials, in all instances. That would make it easier to tell which convention is being used simply by which variables are in the expression. --Ipatrol (talk) 20:28, 26 April 2021 (UTC)


 * I like the idea of using s for successes. r doesn't stand for anything though so I find it confusing and would prefer to get rid of it.


 * I also like the idea of avoiding the words "success" and "failure" entirely. E.g. something similar to but smarter than calling them "heads" and "tails", and writing $$p_H$$ and $$p_T$$. Or $$p_S$$ and $$p_C$$ for "stopping outcome probability" and "continuing outcome probability" (but much better than those since those are horrid phrases/terms too).


 * Really the main concern seems to be just choosing some notation where it is easy to both (1) remember what each symbol stands for and (2) discern whether an alternative definition is being used.


 * Also maybe using the mean-based parameterization as the default would largely avoid all of these problems? And then just add a section explaining how in special cases where the "dispersion" parameter is an integer the distribution has the intuitive interpretation in terms of successes and failures.


 * The general definition for real r using $$\Gamma$$ is currently buried in the article anyway, which is arguably problematic. And that's the more "natural" definition to use with the mean-based parameterization, at least with the idea of the negative binomial representing an "over-dispersed Poisson".

157.131.254.143 (talk) 22:19, 10 May 2021 (UTC)hasManyStupidQuestions

Infobox
Some comments want to edit the Infobox. It can be found at Template:Negative_binomial_distribution but if changed then it needs to be changed in its entirety, including things like CDF and skewness and characteristic function. --2A00:23C7:7B18:9600:895C:FB70:977B:A00A (talk) 00:47, 25 May 2021 (UTC)

Incorrect interpretation of meaning of negative-binomial distribution in the intro
I believe that the interpretation of the negative binomial distribution in the intro is wrong. The probability of having x successes before k failures is measured by the gamma distribution (or x-k successes, to be precise), not the negative binomial, because in the negative binomial the normalisation factor is missing. In the negative binomial you divide the number of ways to achieve x successes before k failures by the number of ways to have any number of successes and losses in a sequence of exactly (x+k) tosses.

Instead, what the formulation "probability to get x successes before k losses" implies, is that you divide the number of ways to achieve x successes before k losses by the sum of (ways to get 1 success before k losses + ways to get 2 successes before k losses + ... + ways to get infinity successes before k losses). This last probability is described by gamma-distribution, not negative-binomial. — Preceding unsigned comment added by 94.25.174.77 (talk • contribs)
 * I'm having trouble understanding what you're saying here. But the gamma distribution is a continuous distribution, and here we're dealing with a discrete distribution. Michael Hardy (talk) 01:33, 11 January 2022 (UTC)

selling candy example, again
There are several comments here in the talk page about the selling candy example, and how overwrought it is. Above and beyond the storyline's unsure footing, the reference of only having 30 houses in the neighbourhood begs confusion with the conventional binomial distribution, because it should be the fixed nature of the number of successes (or failures) that should take pride of place in an example of the negative binomial distribution. I got rid of the first mention of this but kept the second. — Preceding unsigned comment added by Nutria (talk • contribs) 15:41, 1 June 2022 (UTC)

Length of hospital stay example
This section needs elaboration. 68.134.243.51 (talk) 19:55, 21 August 2022 (UTC)

Error in the sidebar Variance formula
Posting this here seeking community approval before editing the sidebar, as requested in the sidebar page.

Given the pmf currently used in the sidebar

$$ f(k) = \binom{k+r-1}{k}(1-p)^r p^k $$

I believe the formula for $$V[k]$$ shown in the sidebar is incorrect.

The easiest way to see this is by noting:

1. Intuitively, the formula for the mean $$\big(E[k] = \frac{pr}{1-p}\big)$$ is correct (at least, $$p$$ belongs in the numerator there). You can convince yourself of this by realizing the pmf is formulated such that $$k$$ is the number of successes before the $$r^{th}$$ failure and $$p$$ is the success probability.

2. A property of the Negative Binomial distribution (mentioned in the article) is that $$V[k] = E[k] + \frac{1}{r}E[k]^2$$

So, for fixed $$r$$, the variance ought to increase as the expected value increases. For fixed $$r$$, the expected value increases with $$p$$ and so the variance must also increase with $$p$$, and the current variance formula cannot be correct.

I am pretty sure the variance of $$ k $$ should instead be given by $$V[k] = \frac{pr}{(1-p)^2}$$, though I imagine proving this involves frustrating manipulation of binomial coefficients.

Bernoulli18 (talk) 04:00, 27 September 2022 (UTC) Bernoulli18

Bernoulli18 (talk) 03:57, 27 September 2022 (UTC)
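The proposed variance does not need any manipulation of binomial coefficients if the mean and the mean-variance relation quoted above are both taken as given. A short worked check, assuming the sidebar convention used in that comment ($$k$$ successes before the $$r^{th}$$ failure, $$p$$ the success probability):

$$V[k] = E[k] + \frac{1}{r}E[k]^2 = \frac{pr}{1-p} + \frac{p^2 r}{(1-p)^2} = \frac{pr(1-p) + p^2 r}{(1-p)^2} = \frac{pr}{(1-p)^2}$$

This only shows that the two quoted properties imply the proposed formula under that convention; it does not settle which convention the sidebar should use.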


 * This is a symptom of the persistent issue here over the four possible definitions. Whether you think the variance is $$\frac{pr}{(1-p)^2}$$ or $$\frac{(1-p)r}{p^2}$$ depends on whether fixed r is the number of successes or failures (with p being the probability of a success). Whether you think the mean is $$\frac{pr}{1-p}$$ or $$\frac{(1-p)r}{p}$$ or $$\frac{r}{1-p}$$ or $$\frac{r}{p}$$ depends on that choice and whether the minimum possible result is 0 or r (so in the latter case measuring attempts as the random variable). It is important that the sidebar is self-consistent, so do not change part without changing the rest, and desirable that it is consistent with the article. --2A00:23C6:148A:9B01:A9DC:D2B8:5DC6:D0A (talk) 12:30, 20 November 2022 (UTC)
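The four definitions mentioned in this reply can be tabulated numerically. The following is a sketch under stated assumptions: p is always the probability of a success, r is the fixed count, the two flags `stop_on_success` and `count_trials` are names invented here for the two independent choices, and the infinite sums are truncated at `k_max`.

```python
import math

def count_pmf(k, r, p_stop):
    """P(K = k): k non-stopping outcomes before the r-th stopping outcome."""
    log_c = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_c + r * math.log(p_stop) + k * math.log(1 - p_stop))

def mean_var(r, p_success, stop_on_success, count_trials, k_max=5000):
    """Mean and variance under one of the four conventions.
    stop_on_success: the fixed r counts successes (else failures).
    count_trials: the variable is the total number of trials (minimum r)
    rather than the number of non-stopping outcomes (minimum 0)."""
    p_stop = p_success if stop_on_success else 1 - p_success
    offset = r if count_trials else 0
    probs = [count_pmf(k, r, p_stop) for k in range(k_max)]
    mean = sum((k + offset) * w for k, w in enumerate(probs))
    var = sum((k + offset - mean) ** 2 * w for k, w in enumerate(probs))
    return mean, var

r, p = 3, 0.4
for stop_on_success in (True, False):
    for count_trials in (False, True):
        print(stop_on_success, count_trials,
              mean_var(r, p, stop_on_success, count_trials))
```

Counting trials instead of non-stopping outcomes shifts the mean by r but leaves the variance unchanged, which is why the reply lists four means but only two variances.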

the mean-shape parametrization is non-arbitrary
For about a week, this page had the incorrect formula for $$P(k;r,m)$$, until I corrected it yesterday. When editing the page to arbitrarily match one definition of $$p$$ or another, please remember that the mean-shape parametrization should not change, because it does not depend on $$p$$. Instead, $$p$$ is one of the two terms in that parametrization; which one it is reflects a specific convention. Please do not change $$P(k;r,m)$$ due to issues with the parametrization of $$p$$. 2600:6C50:5300:F00:EC7D:7688:2015:DEA5 (talk) 19:41, 18 February 2023 (UTC)
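To illustrate the point that the mean-shape form does not depend on the convention for $$p$$, here is a small sketch. It assumes the usual mean-shape pmf with factors $$r/(r+m)$$ and $$m/(r+m)$$, which may differ in notation from what the article displays; all function names are made up for this example.

```python
import math

def log_coeff(k, r):
    """log of Gamma(k + r) / (k! * Gamma(r)); valid for real r > 0."""
    return math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)

def pmf_mean_shape(k, r, m):
    """Mean-shape form: depends only on the shape r and the mean m."""
    return math.exp(log_coeff(k, r)
                    + r * math.log(r / (r + m))
                    + k * math.log(m / (r + m)))

def pmf_p_stop(k, r, p):
    """Convention A: p is the probability of the stopping outcome."""
    return math.exp(log_coeff(k, r) + r * math.log(p) + k * math.log(1 - p))

def pmf_p_count(k, r, p):
    """Convention B: p is the probability of the counted outcome."""
    return math.exp(log_coeff(k, r) + r * math.log(1 - p) + k * math.log(p))

r, m = 2.5, 4.0
p_stop = r / (r + m)    # convention A: m = r(1-p)/p
p_count = m / (r + m)   # convention B: m = pr/(1-p)
for k in range(5):
    print(k, pmf_mean_shape(k, r, m),
          pmf_p_stop(k, r, p_stop), pmf_p_count(k, r, p_count))
```

Each row prints the same probability three times: switching conventions only changes which of the two factors is called $$p$$, as the comment above says.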

Kurtosis
Please see the template about the formula for the kurtosis. Per Westerlund (LTU) (talk) 06:39, 30 March 2023 (UTC)

Shouldn't the excess kurtosis be like this? $$\frac{6}{r} + \frac{p^2}{(1-p)r}$$ It has that form in other wikipedias and it agrees with Abramowitz and Stegun (after some calculations from page 929). What do you think? Per Westerlund (LTU) (talk) 16:28, 4 April 2023 (UTC)
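One quick sanity check on the proposed expression is the special case $$r = 1$$, assuming $$p$$ is the probability of success and the variable counts failures before the first success:

$$\left.\frac{6}{r} + \frac{p^2}{(1-p)r}\right|_{r=1} = 6 + \frac{p^2}{1-p}$$

This is the excess kurtosis of the geometric distribution with success probability $$p$$, so the proposed form is at least consistent with that special case. With $$p$$ read as the probability of failure, the second term would instead be $$\frac{(1-p)^2}{pr}$$.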

Fixed skewness
The skewness formula was out of sync with the definitions of $$r$$ and $$p$$, possibly because of previous changes. The current definitions match those in Wolfram MathWorld, but the formula was $$\frac{1+p}{\sqrt{p r}}$$ which has $$p$$ as the probability of failure instead of success (as it is here and in Wolfram MathWorld). I simply fixed it on the template page (by substituting $$1-p$$ for $$p$$ in the above formula)--I am too busy to first discuss, then get agreement, then fix it. To be completely sure, note also that the probability mass function is identical with Wolfram MathWorld (up to a different choice for whether to use $$k$$ or $$r-1$$ on the bottom of the choose function). And to be extra-extra sure, I ran a numerical simulation which checked out with the fixed (Wolfram MathWorld) form and not the one previously here.

Anyway, please think at least as carefully as I have before making any further changes! Ichoran (talk) 03:23, 26 July 2023 (UTC)
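For anyone wanting to repeat the check, here is a rough sketch that uses exact truncated sums rather than the simulation mentioned above. It assumes the convention stated in that comment ($$p$$ the probability of success, $$r$$ successes fixed, $$k$$ counting failures); the function names and the truncation point `k_max` are this sketch's own.

```python
import math

def nb_pmf(k, r, p):
    """P(K = k): k failures before the r-th success, p = P(success)."""
    log_c = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_c + r * math.log(p) + k * math.log(1 - p))

def skewness(r, p, k_max=5000):
    """Third standardized moment from the pmf, sum truncated at k_max."""
    probs = [nb_pmf(k, r, p) for k in range(k_max)]
    mean = sum(k * w for k, w in enumerate(probs))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(probs))
    third = sum((k - mean) ** 3 * w for k, w in enumerate(probs))
    return third / var ** 1.5

r, p = 6, 0.25
print(skewness(r, p))                     # computed from the pmf
print((2 - p) / math.sqrt((1 - p) * r))   # formula after substituting 1-p for p
print((1 + p) / math.sqrt(p * r))         # formula as it stood before the fix
```

Under this convention the first two printed values should agree and the third should not.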