Talk:Gibbs' inequality

The proof on this page seems to state that $$-\sum_{i\in I} p_i\,\textrm{ln}\,p_i = -\sum_{i=1}^n p_i\, \textrm{ln} \,p_i$$. However, the index set $$I$$ was defined so that $$p_i = 0$$ whenever $$i\notin I$$, so extending the sum to all indices introduces indeterminate terms of the form $$0\cdot \ln 0 = 0 \cdot (-\infty)$$.

Changing the indices of the sum may introduce similar indeterminate forms on the left-hand side of that inequality as well.

How can this proof be corrected?


 * The notation used on this page is lazy. The implicit understanding is that a new function is defined as:



$$\ln^{+}(x) = \begin{cases} \ln(x) & \mbox{ if } x > 0 \\ 0 & \mbox{ if } x \le 0 \end{cases}$$


 * but the author (me - reetep) didn't bother to specify this explicitly. My apologies - it's the sort of thing you do without thinking after having completed a mathematics degree; you become accustomed to writing mathematics for people with a similar background in maths and more often than not you find yourself demonstrating how something might be proved without actually doing it (because you have neglected to complete all of the technical details).

If you refer to the definition of mathematical entropy you should find there that a similar definition has been made for the log function. This is essentially what Gibbs' inequality is about and we are talking about the same function in this article.
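As a small illustration of the convention described above (a sketch only; the function names `ln_plus` and `entropy` are hypothetical, not from the article), summing over all $$n$$ indices with $$\ln^{+}$$ gives the same value as summing over $$I$$, because every extra term is $$0 \cdot \ln^{+}(0) = 0$$:

```python
import math
from typing import Sequence


def ln_plus(x: float) -> float:
    """ln(x) for x > 0 and 0 for x <= 0 (the ln^+ convention)."""
    return math.log(x) if x > 0 else 0.0


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, -sum p_i ln^+(p_i); terms with p_i = 0 contribute 0."""
    return -sum(pi * ln_plus(pi) for pi in p)


p = [0.5, 0.5, 0.0]
full_sum = entropy(p)                                        # sum over i = 1..n
sum_over_I = -sum(pi * math.log(pi) for pi in p if pi > 0)   # sum over I only
print(full_sum, sum_over_I)  # both equal ln 2
```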

Special case
Consider a two-state distribution with probabilities $$p_1=0.5,\ p_2=0.5$$ and alternative probabilities $$q_1=0,\ q_2=1$$. This is allowed.

The inequality will then look like this: $$-(p_1 \ln q_1 + p_2 \ln q_2)\geq -(p_1 \ln p_1 + p_2 \ln p_2)$$. This results in $$-0 \geq \ln 2$$, which is wrong.

What constraints did I not obey?

134.100.209.149 (talk) 16:01, 3 January 2008 (UTC)


 * The page currently states that the $$p_i$$ and $$q_i$$ are positive numbers, so that constraint was violated (it may not have been there when you wrote the comment). Since $$\ln q$$ is not defined at $$q = 0$$ but approaches $$-\infty$$ as $$q \to 0$$ (while $$\ln 1 = 0$$), I think you end up with $$\ln 2 \leq 0.5 \cdot \infty$$. RVS (talk) 02:56, 23 December 2008 (UTC)
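A quick numerical check of the special case, as a sketch assuming the usual convention that a term with $$p_i > 0$$ and $$q_i = 0$$ is $$+\infty$$ (the $$\ln^{+}$$ convention is only meant for terms where $$p_i = 0$$). The helper `cross_entropy` is a hypothetical name. As $$q_1 \to 0$$ the left-hand side grows without bound, so the inequality is not violated:

```python
import math


def cross_entropy(p, q):
    """-sum p_i ln q_i; returns +inf if some p_i > 0 has q_i == 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # 0 * ln q_i contributes nothing
        if qi == 0:
            return math.inf     # p_i > 0 but q_i = 0: -p_i ln 0 = +inf
        total -= pi * math.log(qi)
    return total


p = [0.5, 0.5]
h = -sum(pi * math.log(pi) for pi in p)   # right-hand side: ln 2

# Left-hand side for q = (eps, 1 - eps) as eps shrinks.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, cross_entropy(p, [eps, 1 - eps]), h)

print(cross_entropy(p, [0.0, 1.0]))  # inf
```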

Question about format
I'm familiar with the traditional way to write entropies in Shannon-Jaynes format, which may be a good reason not to fiddle, but a minus sign multiplying one side of an inequality sets off flags in my brain about the danger of reversals if anything gets multiplied by anything. Of course this danger doesn't exist if everything is positive, which at least in the first statement of the inequality could be the case. I've added a comment to that effect which I think solves the problem, but am wondering this: are there any objections to also writing, for example, $$-\sum_i p_i \ln q_i$$ as $$\sum_i p_i \ln(1/q_i)$$ where it might keep the question from coming up? Thermochap (talk) 19:27, 21 February 2008 (UTC)


 * $$0 < p_i, q_i < 1$$, so $$p_i$$ and $$q_i$$ are positive numbers and $$\log p_i$$, $$\log q_i$$ are negative numbers, so the full negated expressions are positive numbers. I think you're right that it's best to leave them in what you call the Shannon-Jaynes format, $$ - \sum_{i=1}^n p_i \log_2 p_i$$. RVS (talk) 05:41, 23 December 2008 (UTC)
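To illustrate that the two ways of writing it are the same number (a sketch with arbitrarily chosen example distributions `p` and `q`), the minus-sign form and the $$\ln(1/q_i)$$ form agree term by term, and each term $$p_i \log_2(1/q_i)$$ is non-negative because $$q_i \le 1$$:

```python
import math

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# Shannon-Jaynes form, with the leading minus sign
with_minus = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
# Equivalent form using 1/q_i, with no minus sign anywhere
with_inverse = sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q))
print(with_minus, with_inverse)  # 1.75 1.75

# Gibbs' inequality for this pair: cross-entropy >= entropy of p (1.5 bits)
entropy_p = -sum(pi * math.log2(pi) for pi in p)
print(with_minus >= entropy_p)   # True
```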

Notation error?
I believe the line $$ D_{\mathrm{KL}}(Q\|P) \equiv - \sum_{i=1}^n p_i \ln \frac{q_i}{p_i} \geq 0$$ should read $$ D_{\mathrm{KL}}(P\|Q) \equiv - \sum_{i=1}^n p_i \ln \frac{q_i}{p_i} \geq 0$$ (interchanging P and Q) to be correct and consistent with the Kullback–Leibler divergence page as well as other reference works. I also think it would be better to rewrite the equation as $$ D_{\mathrm{KL}}(P\|Q) \equiv \sum_{i=1}^n p_i \ln \frac{p_i}{q_i} \geq 0$$ to be simpler and more consistent with that page. I will make these changes unless someone disagrees. RVS (talk) 02:21, 23 December 2008 (UTC)


 * Changed. RVS (talk) 17:39, 23 December 2008 (UTC)
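A short numerical sketch of the change (the helper name `kl_divergence` and the example distributions are illustrative assumptions): the original form $$-\sum_i p_i \ln(q_i/p_i)$$ and the simplified form $$\sum_i p_i \ln(p_i/q_i)$$ give the same value, the value is non-negative, and swapping $$P$$ and $$Q$$ generally changes it, which is why the order in the label matters:

```python
import math


def kl_divergence(p, q):
    """D_KL(P||Q) = sum_i p_i ln(p_i / q_i).

    Terms with p_i = 0 are skipped (0 ln 0 := 0); assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]

# The original form and the simplified form agree.
original_form = -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
print(math.isclose(original_form, kl_divergence(p, q)))  # True

# Gibbs' inequality: D_KL(P||Q) >= 0, with equality when Q = P.
print(kl_divergence(p, q) >= 0)  # True
print(kl_divergence(p, p))       # 0.0

# Not symmetric: D_KL(P||Q) and D_KL(Q||P) differ here.
print(kl_divergence(p, q), kl_divergence(q, p))
```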

Proof of separation
The paragraph that starts with "For equality to hold, we require:" does not seem to prove anything, because if point 1 is true, then $$ \frac{q_i}{p_i} = 1$$. — Preceding unsigned comment added by Mdouze (talk • contribs) 21:52, 28 September 2011 (UTC)
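For reference, the equality conditions can be read off by writing the standard argument as a single chain. This is a sketch assuming the proof uses the bound $$\ln x \le x - 1$$ (with equality only at $$x = 1$$) and $$I = \{i : p_i > 0\}$$:

$$\begin{aligned}
-\sum_{i\in I} p_i \ln\frac{q_i}{p_i}
  &\ge -\sum_{i\in I} p_i\left(\frac{q_i}{p_i} - 1\right)
     && \text{equality iff } q_i = p_i \text{ for all } i \in I \\
  &= \sum_{i\in I} p_i - \sum_{i\in I} q_i = 1 - \sum_{i\in I} q_i \\
  &\ge 0
     && \text{equality iff } \sum_{i\in I} q_i = 1.
\end{aligned}$$

Once the first condition holds, the second follows automatically because the $$p_i$$ sum to 1 over $$I$$, so equality holds exactly when $$q_i = p_i$$ for all $$i \in I$$ (and hence $$q_i = 0$$ for $$i \notin I$$).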

s's
I note that the article has been moved from the usual spelling Gibbs' inequality. I agree that usage varies, but the spelling with s's seems to be the least common. I recommend that the move (and the spelling) be reversed or, if it would avoid an edit war, that the article be moved to a spelling without an apostrophe. What does anyone else think?  Dbfirs  11:21, 21 December 2017 (UTC)
 * I have moved the article back to Gibbs' inequality. Both of the sources in the article write this without an extra "s" after the apostrophe. GeoffreyT2000 (talk) 01:22, 22 December 2017 (UTC)
 * Thank you. I didn't want to revert without consensus.  I think  probably pronounces it "Gibbsez", hence the preference for s's, but following the references seems to be good Wikipedia policy per WP:Common name.    Dbfirs  08:25, 22 December 2017 (UTC)

, : In this case it boils down to the grammatical function of the noun "Gibbs": if it's an attributive-noun use then it's "Gibbs inequality", but if it's possessive then it's "Gibbs's inequality". Compare for example with "This house is Gibbs's" (whether you spell Gibbs' or Gibbs's is no matter for now – just consider the grammatical function): would you pronounce as "Gibbs" or "Gibbsez" in this case? The answer should tell you something about the grammatical form you implicitly use when you say either "Gibbs inequality" or "Gibbsez inequality". Once we establish the grammatical form used, we can debate the spelling. cherkash (talk) 01:39, 4 January 2018 (UTC)


 * I appreciate your point about grammatical function, except that you seem to reject a common third alternative ("s" apostrophe) for the possessive, but that doesn't come into the argument here because Wikipedia policy is to follow WP:Common name, so we just have to look at the best references.  Dbfirs  09:53, 4 January 2018 (UTC)


 * "Common name" policy doesn't really deal with spelling (or misspelling), but rather with "let's call the thing what it's usually called" (notice, not "how it's usually written" – and that's an important distinction). Once the common name is established, we are dealing with the nuances of spelling, esp. when there are different styles that may be acceptable among various sources consulted – and these nuances of spelling are in this case governed by the Wiki's MOS. Which part of the MOS is relevant depends on the grammar's nuances of the name: if it's an attributive-noun use, there's nothing to debate, but if it's a possessive, then the rules/recommendations about singular possessives are relevant. To make my point even clearer: it doesn't matter if the majority of sources spell the possessive in a particular way – once we established it's indeed a possessive that we want to use, then the MOS kicks in and determines the spelling. cherkash (talk) 22:24, 4 January 2018 (UTC)


 * You are applying a "rule" that no-one else recognises.  Dbfirs  22:29, 4 January 2018 (UTC)


 * Generalizations like this are rarely useful, and they don't lend you any credibility. cherkash (talk) 22:58, 4 January 2018 (UTC)


 * There is no misspelling according to the spelling rules that I was taught. It seems that you were taught differently.   Dbfirs  23:21, 4 January 2018 (UTC)


 * That's correct: I was taught differently. And most respectable real-world MoS's (with the exception mostly of some modern periodicals and news agencies) agree with the way I was taught. The comparative analysis among MoS's has been presented many times in different discussions here, I can look around and dig up the links if you are truly curious. cherkash (talk) 02:13, 5 January 2018 (UTC)


 * No need to dig up the links again; I can see that usage varies. We all tend to believe that what we were taught at primary school is the only truth, so we are never going to agree.  In the absence of a world standard, should we not follow our own Manual of Style and WP:Common name?   Dbfirs  09:51, 5 January 2018 (UTC)


 * Yes, following MOS:POSS is what I was advocating. It links the Wiki article that deals with these choices – which contains two sections that clearly state that the " s's " form is usually preferred to the " s' ". (See this section and this section.) cherkash (talk) 00:55, 13 January 2018 (UTC)

Proof using Lagrange multipliers
Should we add this proof?

We will find the minimum of $$-\sum_i p_i \log q_i$$ over all possible values of $$q_1, \ldots, q_n$$ subject to the restriction $$\sum_i q_i = 1$$, using a Lagrange multiplier for the restriction. We will show that the minimum is at $$q_i = p_i$$ for all $$i$$, thus proving Gibbs' inequality. That is, we want to find the stationary point of $$\Phi = -\sum_i p_i \log q_i - \lambda\left(1 - \sum_i q_i\right)$$ with respect to $$q_1, \ldots, q_n$$ and $$\lambda$$. For any $$j$$, $$0 = \frac{\partial\Phi}{\partial q_j} = -\frac{p_j}{q_j} + \lambda,$$ which tells us that $$q_j = p_j / \lambda$$. Since both the $$q_j$$ and the $$p_j$$ sum to 1, this gives $$1 = \sum_j q_j = 1/\lambda$$, so $$\lambda = 1$$ and thus $$q_j = p_j$$. A quick check shows that the matrix of second derivatives is diagonal and all its diagonal entries are positive, so the extremum we have found is necessarily a minimum: $$\frac{\partial^2}{\partial q_j^2}\left(-\sum_i p_i \log q_i\right) = \frac{p_j}{q_j^2} > 0.$$ — Quantling (talk | contribs) 14:34, 28 June 2024 (UTC)
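A numerical sanity check of this Lagrange-multiplier argument, offered as a sketch rather than a proof (the helper `random_simplex_point` is a hypothetical name for illustration, and natural logarithms are assumed): at $$q = p$$ with $$\lambda = 1$$ the gradient of $$\Phi$$ vanishes, the Hessian diagonal $$p_j/q_j^2$$ is positive, and no randomly sampled point of the simplex gives a smaller objective:

```python
import math
import random


def cross_entropy(p, q):
    """Objective -sum_i p_i ln q_i (all q_i assumed > 0)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


def random_simplex_point(n, rng):
    """Hypothetical helper: a random strictly positive probability vector."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]


rng = random.Random(0)
p = random_simplex_point(4, rng)

# From q_j = p_j / lambda and sum_j q_j = 1 we get lambda = 1.
lam = 1.0
grad_at_p = [-pj / qj + lam for pj, qj in zip(p, p)]   # dPhi/dq_j at q = p
print(grad_at_p)  # [0.0, 0.0, 0.0, 0.0]

# Diagonal of the Hessian, p_j / q_j^2, evaluated at q = p.
hessian_diag = [pj / qj ** 2 for pj, qj in zip(p, p)]

# Count sampled q with a strictly smaller objective than q = p.
best = cross_entropy(p, p)
violations = sum(
    1 for _ in range(10_000)
    if cross_entropy(p, random_simplex_point(4, rng)) < best - 1e-12
)
print(violations)  # 0
```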