Talk:Information theory and measure theory

Entropy as a Measure Theoretic measure
Where is this in Reza? I could not find it.


 * pp. 106-108 198.145.196.71 20:43, 25 August 2007 (UTC)

Possible misstatement
I am uncomfortable with the phrase 'we find that Shannon's "measure" of information content satisfies all the postulates and basic properties of a formal measure over sets.' This may not be quite correct, as it is a signed measure, as explained below in the article. How should it be better worded? --130.94.162.64 21:18, 20 June 2006 (UTC)


 * signed measure is still a measure, so if that's the only objection, it should be ok. on the other hand, the section title suggests entropy is a measure, that doesn't seem right. Mct mht 02:49, 21 June 2006 (UTC)
 * Not as defined in Measure (mathematics). There a "measure" is defined clearly as non-negative.  The trouble is that two rvs that are unconditionally independent can become conditionally dependent given a third rv.  An example is given in the article. Maybe we should start calling it a signed measure right off the bat.  A measure is normally assumed positive if not specified otherwise.  --130.94.162.64 19:09, 21 June 2006 (UTC)


 * Measure (mathematics) does say signed measure is a measure, as it should. similarly, one can talk about complex measures or operator valued measures. but yeah, specifying that it is a signed measure is a good idea. Mct mht 19:44, 21 June 2006 (UTC)

in the same vein, i think the article confuses measure in the sense of information theory with measure in the sense of real analysis in a few places. Mct mht 03:02, 21 June 2006 (UTC)


 * There are two different senses of "measure" in the article. One is the abstract measure over sets which forms the analogy with joint entropy, conditional entropy, and mutual information.  The other is the measures over which one integrates in the various formulas of information theory.  Where is the confusion? --130.94.162.64 04:16, 21 June 2006 (UTC)

some language in the section is not clear:


 * If we associate the existence of sets $$\tilde X$$ and $$\tilde Y$$ with arbitrary discrete random variables X and Y, somehow representing the information borne by X and Y, respectively, such that: $$\mu(\tilde X \cap \tilde Y) = 0$$ whenever X and Y are independent, and...

Associate sets to random variables how? are they the supports of the random variables? what's the σ-algebra? what's meant by two random variables being independent? Mct mht 03:14, 21 June 2006 (UTC)


 * Just pretend that those sets exist. They are not the supports of the random variables.  The sigma-algebra is the algebra generated by the operations of countable set union and intersection on those sets.  See statistical independence. --130.94.162.64 03:34, 21 June 2006 (UTC)
 * I mean unconditionally independent. --130.94.162.64 03:49, 21 June 2006 (UTC)


 * so given a family of random variables, one associates, somehow, a family of sets. the σ-algebra is the one generated by this family (in the same way the open sets generate the Borel σ-algebra), or does one assume that, somehow, the family is already a σ-algebra? also the section seems to imply the Shannon entropy is a measure on the said σ-algebra, is that correct?
 * Yes. --130.94.162.64 04:32, 21 June 2006 (UTC)
 * then there seem to be, at least, two ways measure theory is applied in this context. first, in the sense that entropy is a measure on some, undefined, sets corresponding to random variables. second, one can talk about random variables on a fixed measure space, and define information theoretic objects in terms of the given measure. is that a fair statement? Mct mht 04:14, 21 June 2006 (UTC)


 * The σ-algebra is the one generated by the family of sets. (They are not already a σ-algebra.) And I believe that is a fairly reasonable statement if I understand it right. --130.94.162.64 04:29, 21 June 2006 (UTC)


 * There's still a lot of explaining to do; that's why the article has the expert tag. --130.94.162.64 04:32, 21 June 2006 (UTC)


 * thanks for the responses. Mct mht 04:35, 21 June 2006 (UTC)
 * We really do need expert help, though. --130.94.162.64 05:30, 21 June 2006 (UTC)

Kullback–Leibler divergence
Also, the Kullback–Leibler divergence should be explained here in a proper measure-theoretic framework. --130.94.162.64 21:27, 20 June 2006 (UTC)
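
For reference, one standard measure-theoretic formulation is sketched here; it is not text taken from the article, and the symbols $$P$$, $$Q$$, $$\mu$$, $$p$$ and $$q$$ are introduced only for illustration. For probability measures $$P$$ and $$Q$$ on the same measurable space, with $$P$$ absolutely continuous with respect to $$Q$$,

 * $$D_{\mathrm{KL}}(P\|Q)=\int \log\frac{dP}{dQ}\,dP$$

where $$dP/dQ$$ is the Radon–Nikodym derivative, and the divergence is taken to be $$+\infty$$ when $$P$$ is not absolutely continuous with respect to $$Q$$. If both have densities $$p$$ and $$q$$ with respect to a common reference measure $$\mu$$, this becomes

 * $$D_{\mathrm{KL}}(P\|Q)=\int p\,\log\frac{p}{q}\,d\mu$$

which reduces to a sum when $$\mu$$ is counting measure and to the familiar integral of densities when $$\mu$$ is Lebesgue measure.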

Mis-statement of entropy.
I am deeply suspicious of the definitions given here; I think they're wrong. Normally, given a collection of measurable sets $$\{X_i\}$$, the entropy is defined by


 * $$S=-\sum_i \mu(X_i) \log \mu(X_i)$$

It's critical to have the logarithm in there, otherwise things like the partition function (statistical mechanics) etc. just fail to work correctly. See, for example, information entropy.

Also, signed measures are bizarre, and you should avoid using them if you cannot explain why they are needed in the theory.

Also, when this article said "H(X)", I assumed it meant "entropy", but on second reading, I see that it does not actually say what the symbols H(X) and I(X) are, or what they mean. Without defining the terms, the article is unreadable and open to (mis-)interpretation, as perhaps I'm demonstrating.

Finally, I'm unhappy that this article blatantly contradicts the article on random variable as to the definition of what a random variable is. This article states that a random variable is a "set", and that is most certainly not how I understand random variables.

linas 23:54, 22 June 2006 (UTC)
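
The partition entropy formula quoted in the comment above can be checked numerically. The following is a minimal sketch, assuming a finite partition whose masses sum to 1 and logarithms to base 2; the function names `partition_entropy` and `product_partition` are hypothetical and chosen only for this illustration. It also shows one thing the logarithm provides: entropies of independent partitions add.

```python
from math import log2


def partition_entropy(masses):
    """Return S = -sum_i mu(X_i) * log2(mu(X_i)) in bits.

    `masses` holds the measures mu(X_i) of the cells of a finite partition.
    Cells of measure zero contribute nothing (0 * log 0 is taken as 0).
    """
    if any(m < 0 for m in masses):
        raise ValueError("masses must be non-negative")
    if abs(sum(masses) - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    return -sum(m * log2(m) for m in masses if m > 0)


def product_partition(masses_a, masses_b):
    """Masses of the joint partition when the two partitions are independent."""
    return [a * b for a in masses_a for b in masses_b]


uniform4 = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.25]

s_uniform = partition_entropy(uniform4)
s_skewed = partition_entropy(skewed)
s_joint = partition_entropy(product_partition(uniform4, skewed))
```

Here `s_joint` agrees with `s_uniform + s_skewed`, which is the additivity that the logarithm is responsible for.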


 * It's been over a year now and this article still needs a major rewrite to clear up the confusion. I need some help here. -- 198.145.196.71 19:37, 25 August 2007 (UTC)


 * Specifically, the "Other measures in information theory" section is actually more fundamental than the sections that come before it, so that could be one source of confusion. 198.145.196.71 20:40, 25 August 2007 (UTC)

Main ideas
Integration with respect to various measures is one of the main ideas of this article as it stands now. It ties together differential entropy, discrete entropy, and K–L divergence. The second main idea is from Reza pp. 106-108 where it is called a "set-theoretic" interpretation of mutual information, conditional entropy, joint entropy, and so forth. (But Reza very clearly discusses measure there in roughly the way discussed in that section of this article.) There might be yet more main ideas to discuss in this article as well as references to add and clarifications to make. 198.145.196.71 17:21, 7 September 2007 (UTC)
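
A compact way to state the first idea, given here as a sketch with notation that is assumed for illustration rather than quoted from the article: let $$X$$ have distribution $$P$$ with density $$f = dP/d\mu$$ relative to a reference measure $$\mu$$. Then

 * $$h_\mu(X) = -\int f \log f \, d\mu$$

Taking $$\mu$$ to be counting measure on a countable set gives the discrete entropy $$-\sum_x p(x)\log p(x)$$, and taking $$\mu$$ to be Lebesgue measure gives the differential entropy $$-\int f(x)\log f(x)\,dx$$.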

Misattribution of credit?
I strongly suggest that R. Yeung, and not Fazlollah M. Reza be the primary reference cited. Specifically, I recommend the reference "A new outlook on Shannon's information measures", Information Theory, IEEE Transactions on, 1991 vol. 37 (3) pp. 466 - 474

First, in Yeung's paper pg 467, he says "The use of diagrams to represent the relation among Shannon's information measures has been suggested by Reza [2], Abramson [3], Dyckman [5], and Papoulis [15]." Yeung's paper constitutes a proof of (good) intuition by previous authors (including Reza).

Second, Reza pg 108, makes a serious misstatement that seems to indicate that the graphical interpretation of Shannon's measures was not completely thought out, and certainly not thought out in terms of a (signed) measure. "When two variables $$X_k$$ and $$X_j$$ are independent, their representative sets will be mutually exclusive." Yeung says on pg 469 "It was incorrectly pointed out in [2] [Reza] that when two random variables are independent, the corresponding set variables are disjoint."

On pg 470-471, Yeung cites a "classical example (see Gallager [6])" where three random variables, $$X$$, $$Y$$, $$Z$$, have vanishing mutual information between each pair, and so are pairwise independent, yet cannot be represented as a trio without intersecting set variables. This is one of several illustrations that Yeung has mastered this material.

Further, Yeung discusses 2 variables, 3 variables, and Markov chains; bounds on certain quantities for some cases; and the "non-intuitive" quantity I(X;Y;Z), all in a rigorous yet transparent style. -- Preceding unsigned comment added by Mohnjahoney (talk • contribs) 22:00, 18 June 2009 (UTC)

My apologies for not signing. Mohnjahoney (talk) 22:04, 18 June 2009 (UTC)
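
The pairwise-independent example discussed above can be illustrated numerically. The sketch below assumes the usual XOR construction (X and Y independent fair bits, Z = X XOR Y); whether this is exactly the example Yeung attributes to Gallager is an assumption, and all function names are hypothetical. It uses the convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z).

```python
from itertools import product
from math import log2

# Assumed construction: X and Y are independent fair bits, Z = X XOR Y.
# Keys are outcomes (x, y, z); values are their probabilities.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def entropy(indices):
    """Shannon entropy in bits of the marginal on the given coordinates."""
    marginal = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy([a]) + entropy([b]) - entropy([a, b])


def conditional_mutual_information(a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return entropy([a, c]) + entropy([b, c]) - entropy([a, b, c]) - entropy([c])


X, Y, Z = 0, 1, 2
pairwise = [
    mutual_information(X, Y),
    mutual_information(X, Z),
    mutual_information(Y, Z),
]
i_xy_given_z = conditional_mutual_information(X, Y, Z)
# Triple term under the convention I(X;Y;Z) = I(X;Y) - I(X;Y|Z).
i_xyz = mutual_information(X, Y) - i_xy_given_z
```

Every pairwise mutual information is zero, yet I(X;Y|Z) is one bit, so the triple term comes out negative. That is why the three "information sets" cannot be drawn as disjoint, and why any set function matching these quantities has to be signed.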

measure versus content
The article claims Shannon entropy has the typical properties of a countably-additive measure. However, only the finite additivity property is ever touched upon. The difference between finite additivity (content) and countable additivity (measure) is very decisive in measure theory: almost nothing works with contents. Some justification needs to be given for the word "measure", or the entire section should be deleted. --87.146.22.162 (talk) 00:04, 13 May 2012 (UTC)
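
For readers comparing the two notions, the standard definitions are sketched here; the symbols $$\mu$$, $$A$$, $$B$$ and $$A_i$$ are generic and not taken from the article. A content only needs finite additivity on disjoint sets:

 * $$\mu(A \cup B) = \mu(A) + \mu(B) \quad \text{whenever } A \cap B = \emptyset$$

A measure additionally needs countable additivity over pairwise disjoint sequences in a σ-algebra:

 * $$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i) \quad \text{whenever } A_i \cap A_j = \emptyset \text{ for } i \neq j$$

Finite additivity follows from countable additivity (take all but finitely many $$A_i$$ empty), but not conversely.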

Reasons for deletion
This article claims that entropy is a measure in the sense of measure theory. There are two major issues with this claim that have already been mentioned here:

 * There is no sigma-algebra.
 * There is no proof that entropy is countably additive.

The comments that first raised these issues are very old. It is high time something is done to fix them. I don't doubt that you could get countable additivity somehow, but what is the sigma-algebra? It is total nonsense to say, as one user did, "Just pretend those sets exist."

Here is a possible direction we could go: Cover & Thomas tell us that the entropy-power, $$2^{nH(X)}$$ where the entropy is calculated with a base of 2, is the "effective support set size" of a random variable. This non-rigorous statement is based on the asymptotic equipartition property. (C&T also say the Fisher information is akin to the "effective surface area," although they explain it poorly.) If you could make this work, then your sigma-algebra is the same one from the probability space. This seems like the obvious way to go if you want a "set of information" and corresponding sigma-algebra. Maybe someone has already done this, and the article can be about that instead of what we have now. But if it doesn't work, and entropy isn't a measure, then this article has no reason to exist.

137.216.185.166 (talk) 22:04, 7 April 2021 (UTC)
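
The "effective support set size" reading of $$2^{nH(X)}$$ mentioned above can be explored numerically through the asymptotic equipartition property. This is only a sketch under stated assumptions: an i.i.d. Bernoulli(p) source, the weakly typical set with tolerance `eps`, and hypothetical function names. It counts typical sequences exactly and compares the growth exponent with H.

```python
from math import comb, log2


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, for 0 < p < 1."""
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def typical_set_size(n, p, eps):
    """Count length-n binary sequences x with |-(1/n) log2 P(x) - H| <= eps.

    All sequences with k ones share the same probability, so the count is a
    sum of binomial coefficients over the admissible values of k.
    """
    h = binary_entropy(p)
    size = 0
    for k in range(n + 1):
        rate = -(k * log2(p) + (n - k) * log2(1 - p)) / n
        if abs(rate - h) <= eps:
            size += comb(n, k)
    return size


n, p, eps = 200, 0.2, 0.05
h = binary_entropy(p)
size = typical_set_size(n, p, eps)
exponent = log2(size) / n  # compare with h; the full space has exponent 1
```

The exponent stays within `eps` of H and well below 1, so the typical set is a vanishing fraction of all $$2^n$$ sequences. This says something about set sizes in the sequence space; it does not by itself supply a σ-algebra on which entropy acts as a measure.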

Alternative using lattices
While entropy is evidently not a measure in the measure theory sense, you can define a "measure" or "valuation" on a lattice, and some people have done so. That is one other direction this article could go. 66.17.111.151 (talk) 07:39, 10 April 2021 (UTC)

Updated reasons for deletion
I thought about the lattice approach and I'm not convinced it would be a good idea. As for entropy being a measure of "effective support set size," I added some content to the Fisher information page related to that but I still don't see how entropy can become an actual measure that way. So I still think this page should be deleted. Here's why:

The article has three sections. The first section has two parts. The first part is a demonstration of how a Riemann integral and an infinite series can be turned into a Lebesgue integral. The second part is essentially the same idea but applied to Kullback-Leibler divergence. Nothing in this section is about the relationship between information theory and measure theory per se. If any of it should be retained, it would make more sense to move it to Entropy (information theory) or Kullback–Leibler divergence.

The second section centers around the untrue claim that "Shannon's 'measure' of information content satisfies all the postulates and basic properties of a formal signed measure over sets." This claim has been repeatedly disputed on the talk page and yet it has been part of the article since its creation nearly 15 years ago. In order for entropy to be a signed measure it must be countably additive and there must be a sigma-algebra of sets for it to measure. Entropy is a function of a probability distribution, not a set. No cited source offers any support for the claim that entropy is in fact a measure. This section seems to come from an analogy between sets, intersection, union, and measure on the one hand and concepts from information theory on the other hand. But you could just as well make an analogy with propositions, conjunction, disjunction, and probability. Anything that forms a lattice and has a kind of "lattice measure" with an inclusion-exclusion property would work. So again, no special relationship between information theory and measure theory is evident here.

The third section is a continuation of the second section and has the same issues. The parts of this section that should be retained are already part of Interaction information, where they belong. 66.17.111.151 (talk) 18:22, 14 April 2021 (UTC)
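
The analogy described in the comment above can be written out side by side. This is a sketch; $$A$$, $$B$$, $$a$$, $$b$$ and $$v$$ are generic symbols introduced for illustration. For sets with a finitely additive set function:

 * $$\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$$

For propositions with a probability:

 * $$P(a \lor b) = P(a) + P(b) - P(a \land b)$$

For random variables with Shannon's quantities:

 * $$H(X,Y) = H(X) + H(Y) - I(X;Y)$$

All three are instances of the valuation identity $$v(a \lor b) + v(a \land b) = v(a) + v(b)$$ on a lattice, which is the sense in which the inclusion-exclusion pattern is not specific to measure theory.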