Talk:Variation of information

Metric?
The given definition fails to comply with the first condition for being a metric: the identity of indiscernibles, which requires d(X,Y)>0 for X!=Y.

For instance, if X is 0 with prob 1/3 and 1 with prob 2/3, and Y is 1-X, then clearly X!=Y while VI(X,Y) = 0.
 * It is a metric on partitions, not on random variables. The variables X and Y induce the same partition and accordingly convey the same information. 2A02:1210:2642:4A00:8C85:4B63:B849:BF6D (talk) 08:09, 16 February 2024 (UTC)


 * Adding intermediate steps for the discussion. I think this is quite relevant, as there is quite an old citation [1] linked on Wikipedia in the mutual information page that claims VI(X,Y) induces a metric on the space of distributions. It is getting cited and used more and more, so it's a good time to try to clarify this.
 * VI(X,Y) = H(X) + H(Y) - 2I(X;Y)
 * As Y=1-X, the entropies are the same (H(X) = H(Y) = 1/3 log 3 + 2/3 log(3/2)). Shannon (discrete) entropy is invariant under an invertible function of the RV: you are calculating the same sum over a different index set.
 * so we have
 * VI(X,Y) = 2 H(X) - 2I(X;Y)
 * To make life easier, let's rewrite things a bit
 * I(X;Y) = H(X) - H(X|Y)
 * => VI(X,Y) = 2 H(X) - 2 (H(X) - H(X|Y)) = 2 H(X|Y) = 2 H(X|1-X) (at this point we can stop, as this is 0)
 * Looking at Wikipedia, the claim is that H(X|Y) is 0 only if the value of X is completely determined by Y, which is the case here, but I am going to go over the calculation for bookkeeping
 * To calculate the conditional entropy term we need the joint distribution:
 * p(y|x) = kronecker_delta_{y,1-x}
 * p(x,y) = p(y|x) p(x) = kronecker_delta_{y,1-x} p(x) = kronecker_delta_{x,1-y} p(y)
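 * H(X|Y) = - sum_{i,j} p(x=i,y=j) ln p(x=i|y=j) = - sum_{i,j} p(x=i,y=j) ln kronecker_delta_{i,1-j} = - sum_i p(x=i,y=1-i) ln kronecker_delta_{i,i} = - sum_i p(x=i) ln 1 = 0 (terms with j != 1-i have probability 0 and are dropped, using the convention 0 ln 0 = 0)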
 * => VI(X,Y) = 0
 * I guess it "forms" a metric space if you change your notion of equality to mean X =^d Y if X and Y are dependent RVs. I need to read [1] again to see if this is the case; this Rajski distance / VI metric is not very well defined either in the mutual information article or here.
 * [1] Rajski, C., 1961. A metric space of discrete probability distributions. Information and Control, 4(4), pp.371-377. Francisco Vargas Palomo (talk) 03:49, 14 June 2024 (UTC)
 * Actually just gave [1] a skim:
 * > Two distributions of discrete random variables will be said to be equal if and only if the matrix of their joint probabilities is quasi-diagonal.
 * So it's a metric on the space of RVs (in this work, discrete RVs), where equality is understood in the sense that the two RVs can be completely determined from one another. So this https://en.wikipedia.org/wiki/Mutual_information#Metric needs a bit more detail. Francisco Vargas Palomo (talk) 03:55, 14 June 2024 (UTC)
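
For anyone who wants to check the example numerically, here is a minimal Python sketch. It assumes the identity VI(X,Y) = 2 H(X,Y) - H(X) - H(Y), which is equivalent to H(X) + H(Y) - 2I(X;Y); the function names are made up for illustration and are not taken from any library.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def variation_of_information(joint):
    """VI(X,Y) = 2 H(X,Y) - H(X) - H(Y) for a joint dict {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return 2 * entropy(joint) - entropy(px) - entropy(py)

# X is 0 with probability 1/3 and 1 with probability 2/3, and Y = 1 - X.
p_x = {0: 1 / 3, 1: 2 / 3}
joint_flipped = {(x, 1 - x): p for x, p in p_x.items()}
vi_flipped = variation_of_information(joint_flipped)

# For contrast: Y is an independent copy with the same marginal as X.
joint_independent = {(x, y): p_x[x] * p_x[y] for x in p_x for y in p_x}
vi_independent = variation_of_information(joint_independent)

print(vi_flipped)      # zero up to floating-point rounding
print(vi_independent)  # equals 2 * H(X), strictly positive
```

The first case has X != Y as random variables but zero distance, which is the point raised at the top of this thread. The second case shows that the distance is not trivially zero for every pair with equal marginals.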

Universal?
Even more, it is a universal metric, in that if any other distance measures two items close-by, then the variation of information will also judge them close.

Seems doubtful. There must be some (or many) conditions on the "other" distance.

In any case, mathematics doesn't generally admit "universal" metrics. Topologies, Banach space norms, etc. can generally be coarser or finer without bound. The only "universal" metric in the sense defined above is the zero metric.

See also more extensive comments at Talk:Mutual_information where I've quoted from the original article. 129.132.211.9 (talk) 19:59, 22 November 2014 (UTC)

Probability measure
There was something wrong with the previous definition. It defined VI in terms of entropy concepts in other articles, but didn't draw attention to the fact that you have to use a uniform probability measure on the underlying set to make the other concepts useful. I wrote an independent definition in the spirit of the probability values p_i that were previously defined in the article, but never used. This required defining things for q_j as well and naming the overall set A. I hope the definition is okay. 178.38.151.183 (talk) 18:49, 29 November 2014 (UTC)
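
As a rough illustration of a definition in this style, here is a Python sketch that assumes the uniform measure on a finite set A, with p_i = |A_i|/n, q_j = |B_j|/n and r_ij = |A_i ∩ B_j|/n, and VI = - sum_ij r_ij [log(r_ij/p_i) + log(r_ij/q_j)]. The function name and the list-of-sets representation are my own choices for the example, not something the article prescribes.

```python
import math

def vi_partitions(part_a, part_b):
    """VI between two partitions of the same finite set, given as lists of disjoint sets.

    Uses the uniform measure: p_i = |A_i|/n, q_j = |B_j|/n, r_ij = |A_i & B_j|/n.
    """
    n = sum(len(block) for block in part_a)
    if n != sum(len(block) for block in part_b):
        raise ValueError("both partitions must cover the same set")
    total = 0.0
    for a in part_a:
        p = len(a) / n
        for b in part_b:
            r = len(a & b) / n
            if r > 0:  # empty intersections contribute nothing
                q = len(b) / n
                total -= r * (math.log(r / p) + math.log(r / q))
    return total

coarse = [{1, 2, 3, 4}]
halves = [{1, 2}, {3, 4}]
relabelled = [{3, 4}, {1, 2}]

print(vi_partitions(halves, relabelled))  # same partition listed in another order
print(vi_partitions(coarse, halves))      # log 2 in nats
```

Listing the same blocks in a different order gives distance zero, which matches the "metric on partitions, not on random variables" remark in the first thread.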

Strange reference
As far as I know, this "variation of information" is usually called the Crutchfield information metric, having been proposed by Jim Crutchfield in 1989 (http://csc.ucdavis.edu/~cmg/compmech/pubs/IAIMTitlePage.htm). This article doesn't include this reference and instead, for some reason, references a 2003 ArXiv preprint that is about a different quantity. (The reference only briefly mentions the Crutchfield metric, giving a reference to the standard textbook by Cover and Thomas, but then simply says it is "not appropriate for our purposes.") The page also references, without a citation, a 1973 paper that appears to be about a completely different topic.

In short, the references for this article seem to be a random selection of irrelevant articles, and the obvious primary reference is not included. I hope that an expert (possibly me when I have time) can address these issues.

Nathaniel Virgo (talk) 05:43, 19 March 2018 (UTC)

Zurek would seem to precede Crutchfield. See and Polnasam (talk) 21:13, 15 November 2019 (UTC)

In relation to the reference of 1973, that's actually an excellent reference: it corrects my previous comment on who's first. I added a subsection in the definition that bridges the gap with that paper. — Preceding unsigned comment added by Polnasam (talk • contribs) 22:15, 16 November 2019 (UTC)

Correction
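The proof that the VI distance satisfies the triangle inequality rests on
 * H(X|Z) ≤ H(X,Y|Z) = H(X|Y,Z) + H(Y|Z) ≤ H(X|Y) + H(Y|Z).

where the three steps are justified by
 * 1) Adding variables cannot decrease the entropy.
 * 2) This is the "chain rule" (an equality). Seeking a reference.
 * 3) Conditioning on fewer variables cannot decrease the entropy.

Adding the same bound with X and Z exchanged gives VI(X,Z) ≤ VI(X,Y) + VI(Y,Z), since VI(X,Z) = H(X|Z) + H(Z|X).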

2A02:1210:2642:4A00:8C85:4B63:B849:BF6D (talk) 08:15, 16 February 2024 (UTC)
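
A numerical sanity check of the triangle inequality, as a hedged Python sketch: it draws random joint distributions of (X, Y, Z) on a small alphabet and records the smallest value of VI(X,Y) + VI(Y,Z) - VI(X,Z). This is only a spot check and not a proof; the helper names, the alphabet sizes and the seed are arbitrary choices for the example.

```python
import itertools
import math
import random

def marginal_entropy(joint, axes):
    """Entropy (nats) of the marginal of `joint` on the given coordinate positions."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0)

def vi(joint, i, j):
    """VI between coordinates i and j: 2 H(Xi,Xj) - H(Xi) - H(Xj)."""
    return (2 * marginal_entropy(joint, (i, j))
            - marginal_entropy(joint, (i,))
            - marginal_entropy(joint, (j,)))

def random_joint(rng, sizes=(2, 3, 2)):
    """A random joint distribution of (X, Y, Z) on a small product alphabet."""
    outcomes = list(itertools.product(*(range(s) for s in sizes)))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}

rng = random.Random(0)  # fixed seed so the check is reproducible
worst_slack = min(
    vi(joint, 0, 1) + vi(joint, 1, 2) - vi(joint, 0, 2)
    for joint in (random_joint(rng) for _ in range(1000))
)
print(worst_slack)  # non-negative (up to rounding) if the triangle inequality holds
```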