Talk:Dirichlet-multinomial distribution

I have rewritten some of this article to address the following problem: the previous version did not clearly distinguish between the joint probability of a sequence of draws and the probability of the counts over such a sequence. See for example Tom Minka's paper at http://research.microsoft.com/en-us/um/people/minka/papers/multinomial.html, where this problem is highlighted. The introduction of the multinomial distribution article also mentions this problem.

See also http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/ by Minka, where he defines the Dirichlet compound multinomial (DCM), without the multinomial coefficient. In other words, Minka's DCM models particular sequences, while the version in the present Wikipedia article models counts.
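For reference, the two forms being contrasted can be written side by side. This is a sketch in notation assumed here rather than taken from the comments above: $K$ categories, parameters $\alpha_1,\dots,\alpha_K$ with $A=\sum_k \alpha_k$, and $n_k$ the number of the $N$ draws that fall in category $k$.

```latex
% Probability of one particular ordered sequence x_1, ..., x_N (no multinomial coefficient):
\Pr(x_1,\dots,x_N \mid \boldsymbol{\alpha})
  = \frac{\Gamma(A)}{\Gamma(N + A)}
    \prod_{k=1}^{K} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}

% Probability of the count vector (n_1, ..., n_K), summing over all orderings with those counts:
\Pr(n_1,\dots,n_K \mid \boldsymbol{\alpha})
  = \frac{N!}{\prod_{k=1}^{K} n_k!}\,
    \frac{\Gamma(A)}{\Gamma(N + A)}
    \prod_{k=1}^{K} \frac{\Gamma(n_k + \alpha_k)}{\Gamma(\alpha_k)}
```

The two expressions differ only by the multinomial coefficient $N!/\prod_k n_k!$, which counts the orderings that share the same counts.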

The above comment is confused. None of the Minka papers mentioned defines the Dirichlet-multinomial (DM). One of the papers gives the likelihood function of a sample without the normalizing constant (aka the "nuisance parameter" of the likelihood), because the normalizing constant was not pertinent to the estimation of the Dirichlet. But the likelihood function is not a distribution unless it is summed over all permutations of the sample, which is what the normalizing constant is for. That is, the above author wrote an expression that is not even a density, since a density must sum to 1. The DM is not two different distributions between which a distinction needs to be drawn, as the above author contends. It is a single, well-known distribution that comes with a normalizing constant. The DM is a multivariate generalization of the beta-binomial (which comes with a normalizing constant) and, as Minka points out, the DM approaches the multinomial distribution, which also has a normalizing constant, as both Wikipedia articles correctly state. — Preceding unsigned comment added by 192.152.134.249 (talk) 20:30, 7 April 2016 (UTC)
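The normalization question raised in this thread can be checked numerically. The following is a minimal sketch, not taken from any of the cited papers: the function names, the parameter values `alpha = (0.5, 1.0, 2.5)` and the choice of 4 draws are all assumptions made for illustration. It evaluates the expression without the multinomial coefficient and the expression with it, then sums each over ordered sequences and over count vectors.

```python
import itertools
import math


def log_seq_prob(counts, alpha):
    """Log of the expression WITHOUT the multinomial coefficient,
    i.e. the probability of one particular ordered sequence with these counts."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(a) - math.lgamma(a + n)
    for n_k, a_k in zip(counts, alpha):
        out += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return out


def log_count_prob(counts, alpha):
    """Log of the expression WITH the multinomial coefficient,
    i.e. the probability of the count vector itself."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(n_k + 1) for n_k in counts)
    return log_coef + log_seq_prob(counts, alpha)


def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


# Illustrative values only (assumptions, not from the discussion).
alpha = (0.5, 1.0, 2.5)
n_draws = 4
num_categories = len(alpha)

# Count form summed over all count vectors.
total_over_counts = sum(
    math.exp(log_count_prob(c, alpha))
    for c in compositions(n_draws, num_categories)
)

# Sequence form summed over all ordered sequences of category labels.
total_over_sequences = 0.0
for seq in itertools.product(range(num_categories), repeat=n_draws):
    counts = tuple(seq.count(k) for k in range(num_categories))
    total_over_sequences += math.exp(log_seq_prob(counts, alpha))

# Sequence form summed over count vectors only (coefficient omitted).
seq_form_over_counts = sum(
    math.exp(log_seq_prob(c, alpha))
    for c in compositions(n_draws, num_categories)
)

print(total_over_counts, total_over_sequences, seq_form_over_counts)
```

Under these assumptions, the form with the coefficient sums to 1 over count vectors and the form without it sums to 1 over ordered sequences, while the form without the coefficient falls short of 1 when summed over count vectors. Which sample space is intended therefore determines whether the coefficient belongs in the formula.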

The term "categorical distribution" is not used everywhere. Bishop does not use that name, but he does give different formulas in his book (Pattern Recognition and Machine Learning) for the multinomial and categorical distributions. (Minka simply calls both multinomial.) Since Wikipedia already has an article called categorical distribution, I thought it would help to use this terminology here and elsewhere in Wikipedia. CalvynkW (talk) 12:03, 14 April 2011 (UTC)

Hi, does anyone have the original Pólya reference on this topic? From some searching, it appears to date from the 1920s. — Preceding unsigned comment added by 131.175.28.134 (talk) 17:20, 15 December 2011 (UTC)

Despite the comment above, the distinction wasn't clearly made between the two forms of the DCM. I have almost totally rewritten and greatly expanded the page, and it should hopefully now make the distinction clear enough. The page now mostly focuses on the simpler form (without the multinomial constant). Benwing (talk) 21:35, 17 March 2012 (UTC)

I think the current exposition without the multinomial constant should specify exactly what the index runs over. If it is a categorical variable in Bishop's sense, namely a vector in which exactly one entry is 1 and the rest are 0, then this does NOT model counts. Feel free to correct my derivation: https://math.stackexchange.com/questions/709959/how-to-derive-the-dirichlet-multinomial/ Anne van Rossum (talk) 21:25, 21 March 2014 (UTC)
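To make the indexing point concrete, here is a small sketch of how a sequence of one-hot (categorical) vectors relates to a count vector. The helper names and the example draws are hypothetical, chosen only for illustration.

```python
def one_hot(category, num_categories):
    """Encode a single draw as a vector with exactly one entry equal to 1."""
    return [1 if k == category else 0 for k in range(num_categories)]


def counts_from_one_hot(vectors):
    """Sum the one-hot vectors component-wise to obtain per-category counts."""
    return [sum(column) for column in zip(*vectors)]


# Five draws over three categories (illustrative data, not from the discussion).
draws = [2, 0, 2, 1, 2]
vectors = [one_hot(c, 3) for c in draws]
counts = counts_from_one_hot(vectors)

print(vectors)
print(counts)
```

Each individual vector describes one draw, so a formula indexed over such vectors describes a sequence. Counts only appear after summing the vectors, at which point the order of the draws is discarded.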

Requested move

 * The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section. 

The result of the move request was: Move already made Mike Cline (talk) 12:48, 24 March 2012 (UTC)

Multivariate Pólya distribution → Dirichlet compound multinomial distribution – The term "Dirichlet compound multinomial distribution" (DCM dist) seems more common than "multivariate Pólya distribution".

Although Google tests aren't very accurate, here are the results from the Google test I did:


 * 20,000+ for "multivariate polya"
 * 46,000+ for "dirichlet compound multinomial"
 * 24,000+ for "dirichlet multinomial"
 * 30,000+ for "dirichlet multinomial distribution"

Furthermore, the term "multivariate Pólya" is much more confusing than DCM:


 * 1) DCM clearly describes where it comes from (compounding a multinomial over a Dirichlet).
 * 2) DCM is more parallel with the two-category case, where the corresponding distribution is the beta-binomial distribution.
 * 3) The term "Polya distribution" has been used for a large number of different distributions with divergent characteristics (e.g. a special case of the negative binomial; a generalization of the negative binomial; another parameterization of a negative binomial; a distribution that generalizes the binomial and hypergeometric; and sometimes for the DCM dist itself). So the term "multivariate Polya" might equally well refer to multivariate extensions of various different distributions.
 * 4) On top of this, the term "Polya distribution" does NOT seem to be used for the beta-binomial distribution. Logically, if the DCM is described as "Multivariate X distribution" for some X, that X ought to refer to the beta-binomial dist.

The second item above suggests that "Dirichlet-multinomial distribution" might be an alternative, and indeed that term does exist, but seems less common than DCM dist. Benwing (talk) 11:50, 17 March 2012 (UTC)

Personally, I think "Polya distribution" should stay, since "Dirichlet compound multinomial" is a very cumbersome name. It makes sense to me to call a beta-binomial a Polya distribution, since it is equivalent to drawing from a Polya urn. If you do change it though, "Dirichlet-multinomial" is less cumbersome than "Dirichlet compound multinomial" and is more directly analogous to "beta-binomial". These are also the two names for it used in Tom Minka's paper, referenced above. Disclaimer: I didn't contribute to this article. Satyr9 (talk) 21:46, 21 March 2012 (UTC)

I went ahead and changed it to "Dirichlet-multinomial": various statisticians told me that "Dirichlet-multinomial" was the most common name, and it fits the analogy with "beta-binomial" best. "Polya distribution" unfortunately has horrible problems with ambiguity, which is one main reason I don't want to use it. Benwing (talk) 09:02, 24 March 2012 (UTC)


 * The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.