Talk:Limiting density of discrete points

Untitled
I will eventually expand this to the more general concept of limiting density and move it to that title then. For the meantime, however, I just started with the limiting density in the application with which I am most familiar. Please feel free to contribute/correct. WDavis1911 (talk) 23:47, 1 May 2008 (UTC)


 * Should probably remove the comment on log dimensionality

You object that Shannon's formula cannot be correct because the argument of the log is not unitless, but I believe this objection is mistaken. For a continuous distribution, p(x) is not a probability; p(x)*delta is. Shannon's formula would then be the limit as delta->0 of -Sum(p(x)*delta*log(p(x)*delta)), where "p(x)*delta" is a probability in both places it appears. By the mean value theorem, we expect a valid Riemann integral from this, so at this point there's no reason to expect any problems.

This unfolds into two summations, -Sum(p(x)*delta*log(p(x))) - Sum(p(x)*delta*log(delta)), and taking the limit as delta->0 we get

-Integral(p(x)*log(p(x))dx) - Integral(p(x)*log(dx)*dx).

The 1st term then becomes Shannon's formula, but the 2nd term is problematic as log(0) is undefined.

One can safely say that 0*log(0) is zero by several methods: L'Hopital's rule, or replacing log(x) by a polynomial expansion, for example. The second term is therefore zero, and what remains is Shannon's original formula.

The derivation of Shannon's formula is therefore correct in both units and results, but still has all the problems pointed out by others.

The "units" of the log factor are not a problem, and probably should not be mentioned. Shannon's formula indeed has problems, but not due to this. — Preceding unsigned comment added by 64.223.129.132 (talk) 05:51, 2 January 2019 (UTC)


 * I'm puzzled as to what this article is about. There seems to be an abrupt change of subject, and the limit stated before that is true under any of various different sorts of assumptions, none of which are stated or even hinted at. Michael Hardy (talk) 05:00, 9 May 2008 (UTC)


 * Totally understandable, and thanks for the attention to the article. I originally started this stub because in my quest to understand Maximum entropy more, I ran into the term limiting density of discrete points, with "limiting density" redlinked. It wasn't clear to me whether this meant something special in the context of information theory, so I needed to delve into this more. I couldn't find much more information on how this related to the concept except for Jaynes' original paper (or at least what I think is his first mention of the term in this context) and a few other references discussing Jaynes' work.
 * Since starting this, I haven't been able to give this the full attention that it needed, but I haven't worried so much because I was sure that if I didn't do it, others in the community would. My feeling, however, is that Jaynes' use of this is not specific to his work at all, and just an application of limiting density to adjust Shannon's entropy measure for continuous distributions. In other words, I think this should be eventually converted to an article talking about limiting density from a probability theory perspective, and not this specific application to maximum entropy. I'm not an expert in probability theory however, so I'd rather let someone else be bold in that regard, at least for now. In the meanwhile, I am trying to supply as much information as I can from my limited perspective. WDavis1911 (talk) 18:47, 15 May 2008 (UTC)
 * The Integral(p(x)*log(dx)*dx) does not go to 0 as you suggest; it equals log(dx)*Integral(p(x)dx) = log(dx), which diverges to -infinity. 46.2.229.188 (talk) 18:45, 16 June 2024 (UTC)
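A quick numerical sketch of this divergence (illustrative only; the triangular density and the particular bin widths are my own arbitrary choices, not from Jaynes or the article):

```python
import math

def discrete_entropy(delta):
    # -Sum p(x)*delta*log(p(x)*delta) for the density p(x) = 2 - 2x on [0, 1]
    n = int(round(1 / delta))
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * delta          # cell midpoint
        q = (2 - 2 * x) * delta        # probability mass of the i-th cell
        if q > 0:
            total -= q * math.log(q)
    return total

# the sum behaves like h(p) - log(delta), so it grows without bound as delta -> 0;
# shrinking delta by a factor of 10 adds about log(10) each time
h1 = discrete_entropy(1e-2)
h2 = discrete_entropy(1e-3)
h3 = discrete_entropy(1e-4)
```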

I've added some extra background to try and make this article easier to understand - I hope that's ok.

However I would suggest that this article could/should maybe be merged into principle of maximum entropy where it's linked from -- these few lines of derivation are quite important in order to understand why Jaynes introduced a new formula for the continuous entropy and there are not many other pages that are likely to link to it. Nathanielvirgo (talk) 17:17, 17 December 2008 (UTC)

Having thought about it a bit more I think this page should be extended a bit and renamed to something more like "Jaynes' continuous entropy formula" ("limiting density of discrete points" sounds too general to me), as I quite often need to explain this to people and it would be good to have a decent page for it on Wikipedia. I will try to find time to do this some time soon. Nathanielvirgo (talk) 14:01, 18 December 2008 (UTC)

I added a small clarification on the relationship between this concept and the Kullback-Leibler divergence, because the article implied that they were the same concept. Although formally similar, they are conceptually different, and I hope this is now clear. I also removed a few comments stating that Jaynes' formula gives the "relative information entropy." I hope that doesn't annoy anyone; the reason is this: the term "relative information entropy" refers to the Kullback-Leibler divergence, and it's "relative" because there are two probability distributions involved. It's effectively the entropy of one probability distribution relative to another. In Jaynes' formula one of these probability distributions is replaced by a function defining a density of points, not probabilities. In the framework Jaynes gave for the use of his formula there is nothing "relative" about it: m(x) is an invariant that is defined by the problem under consideration and is not to be updated in the manner of a probability distribution. Jaynes saw his formula simply as the correct formula for the (absolute) entropy of a continuous distribution. Nathanielvirgo (talk) 14:20, 13 August 2009 (UTC)

Also, I think it would be inappropriate to merge this article with the one on the Kullback-Leibler divergence. The formulae are similar but the symbols stand for different quantities. The two definitions are used in different circumstances and motivated in entirely different ways.

Nathanielvirgo (talk) 14:29, 13 August 2009 (UTC)
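For comparison, the two formulas side by side (as I read them; the notation here may differ slightly from the articles in question):

```latex
% Kullback-Leibler divergence: both p and q are probability densities,
% and q is typically a prior distribution to be updated.
D_{\mathrm{KL}}(p \parallel q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx

% Jaynes' continuous entropy: m is an invariant measure fixed by the
% problem (it need not even be normalizable), not a prior distribution.
H(p) = -\int p(x)\,\log\frac{p(x)}{m(x)}\,dx
```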

Merging
I don't think this article should be merged with Kullback-Leibler divergence for reasons given by others elsewhere on this page. Right now, I think only one person tentatively supports the merging. I propose we keep this article separate, and remove the merge tag. Njerseyguy (talk) 16:27, 17 August 2009 (UTC)
 * I agree. There has been no further action on this for quite a while and not much in favour, so I will remove the merge tags. --mcld (talk) 12:41, 12 April 2010 (UTC)

Name of this page
The name of this page, "limiting density of discrete points" is not quite right, because it refers to a step in Jaynes' derivation of his version of the continuous entropy, rather than the quantity itself. I would like to change the page's name.

The problem is that Jaynes didn't give his quantity a name, since he saw it as simply a correction of Shannon's formula. So what should this page be called? I would like to call it "Jaynes continuous entropy", but I don't think it's been referred to as that in the literature. So I'm thinking of calling the page "differential entropy (Edwin Thompson Jaynes)". Does anyone have an opinion on this? Nathaniel Virgo (talk) 21:53, 22 August 2009 (UTC)


 * I agree that the name is a bit wrong. There are actually TWO ideas on this page:


 * (1) Introduction of m, with the understanding that it should transform the same as p.
 * (2) Passing points to limits.


 * Passing points to limits serves as a motivation of m but is not the core idea why the new entropy is sensible. The core idea is really the covariance of m and p. 178.38.142.81 (talk) 01:20, 2 February 2015 (UTC)

Negative Entropy
Another issue: the continuous entropy can still be negative, even after introducing m. Negativity was presented (in several entropy articles!) as a problem for differential entropy that requires handling, and this partially motivates Jaynes' entropy. Actually, it never goes away, and it's not a problem, just a difference from the discrete case. 178.38.142.81 (talk) 01:20, 2 February 2015 (UTC)

This is due to a quirk in the way Jaynes reports $$ H(X) $$, in that he drops a term of $$ log(N) $$. Generally speaking, $$H(X)$$ would always be negative, but $$ log(N) + H(X) $$ would always be positive. It is this latter formula which is actually the limit of the discrete entropy as the number of points tends towards infinity. Jaynes left out the $$ log(N) $$ so that finite values could be considered, and also because he was not interested in encodings, precisely, but rather in maximum entropy distributions. I have clarified the article a little bit to mention this, and also to try to motivate more clearly the connection between LDDP and discrete entropy. Vertigre (talk) 16:09, 8 April 2016 (UTC)
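To illustrate the $$ log(N) $$ offset numerically (a sketch only; the density and the value of N are my own arbitrary choices): the discrete entropy of the binned distribution is nonnegative, while subtracting log(N) leaves a negative limit.

```python
import math

N = 1024
# density p(x) = 2 on [0, 1/2] and 0 elsewhere; its differential entropy is -log 2
masses = [2.0 / N if (i + 0.5) / N < 0.5 else 0.0 for i in range(N)]
H_N = -sum(q * math.log(q) for q in masses if q > 0)  # discrete entropy, always >= 0
h = H_N - math.log(N)  # finite in the limit, and negative here
```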

What is wrong with the page that needs a mathematics expert
Re-order the emphasis and thrust of the page to deal with the issues under Name of this page. 178.38.142.81 (talk) 01:20, 2 February 2015 (UTC)

Need for clarification
I don't understand the relationship between the set of points x_i and p(x). There must be one. Are these points drawn according to p(x)? Thanks for any clarification. Renato (talk) —Preceding undated comment added 13:37, 27 February 2013 (UTC)

Assessment comment
Substituted at 03:13, 3 May 2016 (UTC)

m(x) is not necessarily a probability measure
One problem with the article is the statement that the invariant measure m(x) is a probability measure. I don't think this is true (but I will read the linked papers by Jaynes to see). That is also one reason this is **not** the KL-divergence, because the KL-divergence is between probability measures. The reason? Invariant measures need not be normalizable: for example, the invariant measure under the scale-invariant group on the half-line (the positive reals) is 1/x, which has infinite integral. Nor is it clear that an invariant measure always exists; it depends on the situation, the ground space. Kjetil B Halvorsen 18:42, 7 May 2017 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)
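A small numerical check of the 1/x example (a sketch; the intervals are arbitrary): the measure assigns the same mass to every interval rescaled by a constant, which is the scale invariance, while its total mass over (0, ∞) diverges, so it cannot be normalized.

```python
import math

def mass(a, b, n=100000):
    # midpoint-rule integral of m(x) = 1/x over [a, b]
    dx = (b - a) / n
    return sum(dx / (a + (i + 0.5) * dx) for i in range(n))

# equal mass on [1, 2], [10, 20], [1000, 2000]: invariance under x -> c*x;
# each is log 2, but mass(a, b) -> infinity as a -> 0 or b -> infinity,
# so m cannot be a probability measure
w1, w2, w3 = mass(1, 2), mass(10, 20), mass(1000, 2000)
```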

Corrected two errors
The original article used the expression $H(X)$ before defining it, and in a way that conflicted with the later definition, so I took that out. Also, I fixed an equation to show that it only holds asymptotically for large $N$. — Preceding unsigned comment added by Ted.tem.parker (talk • contribs) 23:18, 7 September 2017 (UTC)

Not understandable
One cannot understand the article in this form, especially the definition of the LDDP. What are the $$ x_{i} $$, and how are they chosen? What is $$ H_{N} $$? — Preceding unsigned comment added by 2003:E4:BBED:B900:3D04:BC4D:89BF:9B49 (talk) 11:09, 17 December 2017 (UTC)

I'll further point out that the article is ambiguous with regard to how the LDDP relates to discrete entropy in equations like $$ H_{N}(X)\sim \log(N)+H(X) $$. — Preceding unsigned comment added by HelloWorldItsMeDrG (talk • contribs) 02:50, 23 March 2021 (UTC)
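As I understand Jaynes' construction (a sketch of the definitions the questions above ask about; see his papers for the precise statement): the $$ x_{i} $$ are $$ N $$ discrete points whose empirical density tends to the invariant measure $$ m(x) $$, and $$ H_{N} $$ is the ordinary discrete entropy on those points.

```latex
% Limiting density of the N points x_1, ..., x_N:
\lim_{N\to\infty} \frac{1}{N}\,\#\{\,x_i \in (a,b)\,\} = \int_a^b m(x)\,dx

% Discrete entropy on the points, with p_i the probability assigned to x_i:
H_N(X) = -\sum_{i=1}^{N} p_i \log p_i

% The asymptotic relation the article writes as H_N ~ log N + H(X):
\lim_{N\to\infty}\bigl[\,H_N(X) - \log N\,\bigr]
  = -\int p(x)\,\log\frac{p(x)}{m(x)}\,dx = H(X)
```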