Talk:Dirichlet distribution

Maximum Entropy
Since the article states that the Dirichlet distribution is a maximum entropy distribution, could someone more knowledgeable than me add the relevant constraints to the article?

See https://www.sciencedirect.com/science/article/pii/S0047259X05001004. Thanks! 2A02:1811:3707:B900:C4B:98F2:7DE6:7518 (talk) 18:45, 28 May 2020 (UTC)

Delta
Should that be a Dirac delta, not a Kronecker delta? If it were Kronecker, then the distribution would be finite everywhere, but nonzero only on a set of measure zero.

According to David J.C. MacKay and Linda C. Bauman Peto "A Hierarchical Dirichlet Language Model" it should be a Dirac delta function.

It is clearly a Dirac delta since it has to be defined over real numbers, and not integers.

Question regarding chained Dirichlet distributions
If I draw a probability distribution $$X\sim Dir(\alpha)$$, and then another distribution $$Y\sim Dir(rX)$$ for some constant r, is the marginal distribution of Y Dirichlet? A5 15:12, 20 April 2006 (UTC)

Dummy questions
Where it writes: "The Dirichlet distribution is conjugate to the multinomial distribution in the following sense: if", should the "X~Mult(X)" be "X~Mult(alpha)"? The notation Mult(X) doesn't make sense to me since X is a random variable, not a parameter. Chongman 01:28, 19 April 2007 (UTC)
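A small sketch of the conjugate update may help with the notation. It assumes the usual reading, in which the multinomial's parameter is the Dirichlet-distributed probability vector itself, so the observed counts are simply added to α. The function names `dirichlet_posterior` and `dirichlet_mean` are hypothetical and chosen only for this illustration.

```python
def dirichlet_posterior(alpha, counts):
    """Return the posterior Dirichlet parameters after observing multinomial counts.

    Assumes a Dir(alpha) prior on the probability vector and counts drawn
    from a multinomial with that probability vector.
    """
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + c for a, c in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): each alpha_i divided by the sum of all alpha_i."""
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha]


prior = [1, 1, 1]      # uniform prior over three categories
counts = [3, 0, 2]     # illustrative observed counts
posterior = dirichlet_posterior(prior, counts)
print(posterior, dirichlet_mean(posterior))
```

Under this reading the thing inside Mult(·) is the random probability vector, which is why a random variable appears where one would expect a parameter.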

I think the sum that equals 1 in the second paragraph should run over all alphas, not over all x. Munibert 16:29, 15 March 2007 (UTC) Am I right or wrong?
 * Since I now realise that the alphas stand for events (from N) and the x are the probabilities, I was wrong. But why does one use this terminology, which seems to reverse the usage from multinomial distributions? Munibert 16:35, 15 March 2007 (UTC)

Can someone please explain what this distribution reflects? For the normal distribution, the authors go to great lengths to cite examples of everyday values that follow a normal distribution. Can someone add an example like this for the Dirichlet distribution? --Maximilianh 05:18, 4 June 2006 (UTC)


 * The "cutting up strings" text recently added provides a minimal example of where this distribution would come up. Having others would be good, of course.  BSVulturis 20:47, 12 March 2007 (UTC)

Why is this distribution called a continuous distribution, when the cumulative distribution is not continuous? It should be neither continuous nor discrete. Albmont 18:58, 16 October 2006 (UTC)


 * It's now defined without the Dirac delta function that was there before. Someone will have to check, but I think it's now defined over a set that we integrate over and whose integral is continuous.  MisterSheik 21:36, 27 February 2007 (UTC)

What does this mean? "The characteristic function χ ensures that the density is zero unless..." The page doesn't define a characteristic function. I'm going to change it to say that the sum over the x's is defined as 1, which is what I think it means... MisterSheik 20:05, 26 February 2007 (UTC)

I just stumbled over the very first figure in this article, showing the probability densities. 1. I think it would be useful to explain $$\alpha$$ a bit more, like: $$\alpha=(\alpha_x, \alpha_y, \alpha_z)=(6,2,2), (3,7,5), \ldots$$ 2. I think the last two choices for $$\alpha$$ got twisted; from the pictures the order should be $$(2,3,4), (6,2,6)$$ rather than $$(6,2,6),(2,3,4)$$ (at least if my clock moves according to the standard). —Preceding unsigned comment added by 87.113.20.9 (talk) 10:25, 25 January 2008 (UTC)

The last sentence says "The variance around this mean varies inversely with $$\alpha_0$$." This seems to contradict $$\mathrm{Var}[X_i] = \frac{\alpha_i (\alpha_0-\alpha_i)}{\alpha_0^2 (\alpha_0+1)}$$. —Preceding unsigned comment added by 217.133.67.206 (talk) 15:05, 4 April 2008 (UTC)
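The two statements may be compatible if the mean is held fixed while $$\alpha_0$$ grows. Writing $$\alpha_i = \alpha_0 m_i$$ with $$m_i$$ the mean of $$X_i$$, the formula reduces to $$\mathrm{Var}[X_i] = m_i(1-m_i)/(\alpha_0+1)$$, which shrinks roughly like $$1/\alpha_0$$. The sketch below only evaluates the quoted variance formula; the helper name `dirichlet_variance` and the mean vector are assumptions made for illustration.

```python
def dirichlet_variance(alpha, i):
    """Variance of component i of Dir(alpha), using the formula quoted above."""
    alpha0 = sum(alpha)
    return alpha[i] * (alpha0 - alpha[i]) / (alpha0 ** 2 * (alpha0 + 1))


mean = [0.5, 0.3, 0.2]  # illustrative fixed mean vector (sums to 1)

for alpha0 in (1, 10, 100):
    # Scale the same mean vector by alpha0 so that only the concentration changes.
    alpha = [alpha0 * m for m in mean]
    print(alpha0, dirichlet_variance(alpha, 0))
```

With the mean fixed, each tenfold increase in $$\alpha_0$$ cuts the variance by roughly a factor of ten, which is presumably what "varies inversely" was meant to convey.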

Maybe I'm fighting windmills here, but I simply can't see where the Jacobian has gone during the derivation of the Dirichlet distribution from Gamma distributions. It seems that the author has simply transformed the independent variables from Y to X and plugged those new Xs into the formula of the density f(Y1,...,Yk). But for obtaining the density of the Xs, don't you also need to multiply f with the Jacobian? — Preceding unsigned comment added by 130.60.6.54 (talk) 09:44, 16 June 2011 (UTC)
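For what it is worth, here is a sketch of the change of variables with the Jacobian written out. It assumes independent $$Y_i \sim \mathrm{Gamma}(\alpha_i, 1)$$ for $$i = 1, \ldots, K$$; the auxiliary variable $$V$$ is notation introduced here, not taken from the article. Set $$V = \sum_{i=1}^K Y_i$$ and $$X_i = Y_i / V$$, with $$x_K = 1 - \sum_{i=1}^{K-1} x_i$$, so that the inverse map is $$y_i = v x_i$$.

$$f(y_1, \ldots, y_K) = \prod_{i=1}^K \frac{y_i^{\alpha_i - 1} e^{-y_i}}{\Gamma(\alpha_i)}$$

$$\left| \frac{\partial (y_1, \ldots, y_K)}{\partial (x_1, \ldots, x_{K-1}, v)} \right| = v^{K-1}$$

$$f(x_1, \ldots, x_{K-1}, v) = \left[ \prod_{i=1}^K \frac{(v x_i)^{\alpha_i - 1}}{\Gamma(\alpha_i)} \right] e^{-v} \, v^{K-1} = \left[ \prod_{i=1}^K \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)} \right] v^{\alpha_0 - 1} e^{-v}, \qquad \alpha_0 = \sum_{i=1}^K \alpha_i$$

$$f(x_1, \ldots, x_{K-1}) = \left[ \prod_{i=1}^K \frac{x_i^{\alpha_i - 1}}{\Gamma(\alpha_i)} \right] \int_0^\infty v^{\alpha_0 - 1} e^{-v} \, dv = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^K \Gamma(\alpha_i)} \prod_{i=1}^K x_i^{\alpha_i - 1}$$

So the Jacobian is needed: it contributes the factor $$v^{K-1}$$, which is absorbed when $$v$$ is integrated out and therefore does not show up in the final density of the $$X_i$$.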

Uniform Dirichlet Distribution
Can somebody please explain to me what it is? Diego Torquemada (talk) 07:08, 23 April 2008 (UTC)

First equation appears wrong
The first equation doesn't make sense. The product ranges from i=1 to i=K, which implies that there is an x[K] in the product. However, the left hand side makes it clear that x[K] is not present in the function. It's hard to have any confidence in the article as a whole when the defining equation is inconsistent: could someone who understands this please fix it?

213.162.107.11 (talk) 07:30, 12 June 2009 (UTC)

-- I am wondering about that myself. I think the author was trying to capture something about the fact that it is only defined on the K-1 dimensional "simplex" -- i.e. the vars x1 ... xK have to sum to 1. I am going to try to figure this out and update the article. --Ivan --74.56.167.228 (talk) 16:24, 19 November 2009 (UTC)


 * @Schmock: for 2 variables, (x) and (1-x) makes sense. Notice that the left-hand side of the Beta distribution function shows only x, so it is all kosher. In the Dirichlet definition, however, x_1 ... x_{K-1} appear on the left-hand side while x_K appears on the right-hand side. This equation is syntactically wrong and doesn't make sense to the reader until they read the following paragraph.

With the change I made, the equation stands on its own and we clarify that we are on a simplex and that x_1 ... x_K must sum to 1. --74.13.201.69 (talk) 19:52, 5 December 2009 (UTC)

math style
The agreed-upon math style is to not use math tags around things like $$K$$-dimensional vector. It's harder to read in certain browsers. Crasshopper (talk) 00:48, 23 January 2011 (UTC)

Scaled Dirichlet distribution
So. I think the assumptions in the Wiki page are not correct. I believe that the provided PDF is only valid when equal scaling is applied to all shape parameters, i.e. $$a_1= a_2=\cdots=a_k$$. In the case where you want to introduce biasing in your distribution towards different variables, then you will need to use a scaled PDF. See Compositional Data Analysis: Theory and Applications (Section 10.2) for more information. — Preceding unsigned comment added by Datahipster (talk • contribs) 22:42, 26 November 2011 (UTC)

Support
Should the support really be x1 ... xK, and not to xK-1? It makes more sense for xK to be understood and implied as 1-(x1+...+xK-1), just as in the special case of the beta distribution.

It is stated that each x_i must lie in (0,1) for x to be in the support. However, the support is, by definition, a closed set, so, for example, x = (0,1,0) should be in the support, right? Anyway, even if some definition of "support" allows for this use, it is inconsistent with the infobox, which may be confusing to nonmathematicians like me. Franknarf11 (talk) 06:23, 3 October 2013 (UTC)

This is necessarily confusing. The author of the support section takes the support to be the open simplex. I personally think this is a valid choice, but I view the closed simplex as equally valid. The article on the Beta distribution tackles this by saying the support is (0,1) or [0,1], but I don't think that approach is practical here. Either way, since this comment was made, the support section and infobox have been made consistent. SymplectoJim (talk) 15:16, 14 March 2019 (UTC)

error in right panel
In the right panel, under "support", it says: Σxi = 1. Shouldn't it be Σxi < 1? At least this is how it appears in the Mathematica help. — Preceding unsigned comment added by Olimak9000 (talk • contribs) 17:17, 6 January 2014 (UTC)

This is not an error. SymplectoJim (talk) 14:58, 14 March 2019 (UTC)

PDF plot
How is the new plot better than the old one?

New: https://en.wikipedia.org/wiki/Dirichlet_distribution#/media/File:Dirichleit_Probability_Distribution_for_Different_Alpha.png Old: https://en.wikipedia.org/w/index.php?title=Dirichlet_distribution&oldid=782668807#/media/File:Dirichlet-3d-panel.png

GrantZ (talk) 09:17, 28 July 2017 (UTC)

alpha_0
The box says: "with α_0 defined as for variance, above". But I can't find such definition. --89.204.135.44 (talk) 18:09, 13 May 2019 (UTC)

Concentration parameter section is wrong?
It is not clear what the concentration parameter section means. A symmetric Dirichlet with a very small concentration, say [0.01, 0.01, 0.01, 0.01], puts most of the mass in a single component (without knowing which one), whereas one with a large concentration, say [100, 100, 100, 100], spreads the mass almost equally over every component and is fairly sure that none wins over the others. This seems very different from what the section says.

Maybe the meaning is just flipped?
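The behaviour described above can be checked with a quick simulation. This is only a sketch: it samples a Dirichlet by normalizing independent Gamma draws from the standard library's `random.gammavariate`, and the helper names `sample_dirichlet` and `mean_largest_component`, the sample size, and the seed are all assumptions made for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        if total > 0.0:  # guard against underflow when every alpha_i is tiny
            return [d / total for d in draws]


def mean_largest_component(alpha, n_samples=2000, seed=0):
    """Average, over many samples, of the largest component of each sample."""
    rng = random.Random(seed)
    largest = (max(sample_dirichlet(alpha, rng)) for _ in range(n_samples))
    return sum(largest) / n_samples


sparse = mean_largest_component([0.01] * 4)   # small concentration
flat = mean_largest_component([100.0] * 4)    # large concentration
print(sparse, flat)
```

If the largest component averages close to 1 for the small concentration and close to 1/4 for the large one, that supports the reading given above: small values concentrate each sample on one component, while large values concentrate the distribution around the uniform vector.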

PDF plot
Recently a new graphic was proposed: https://upload.wikimedia.org/wikipedia/commons/2/2f/Page1-1125px-Dirichlet.png with the author claiming the original one is incorrect: https://upload.wikimedia.org/wikipedia/commons/7/74/Dirichlet.pdf

In my opinion, the labeling on the proposed one is confusing. For example, in the top right, x1 refers to the single dimension that has its axis shown in the graph. However, the placement of "x1" suggests it is a label for the bottom horizontal axis. Second, the new axis does not provide any additional information. The graphic only projects the axis labels of the old graphic onto a line in the middle, which obscures parts of the plot. Lastly, the quality of the graphic is below standard, especially the visible remains of the old axis, which apparently have only been whited out. Also, the axis labels do not use proper subscripts.

I suppose the main confusion here is that the plot does not have orthogonal dimensions, like a normal 2D plot, but axes that have a 120° angle between them. The ticks on the x2 and x3 axes indicate this. I agree that this is very subtle, but it is definitely not wrong in the original plot. As an improvement, I would suggest adding small grid lines to the original plot. 141.76.60.253 (talk) 15:51, 3 May 2023 (UTC)