Dual total correlation

In information theory, dual total correlation, information rate, excess entropy, or binding information is one of several known non-negative generalizations of mutual information. While total correlation is bounded above by the sum of the entropies of the n elements, the dual total correlation is bounded above by the joint entropy of the n elements. Although well behaved, dual total correlation has received much less attention than the total correlation. A measure known as "TSE-complexity" defines a continuum between the total correlation and dual total correlation.

Definition


For a set of n random variables $$\{X_1,\ldots,X_n\}$$, the dual total correlation $$D(X_1,\ldots,X_n)$$ is given by


 * $$ D(X_1,\ldots,X_n) = H\left( X_1, \ldots, X_n \right) - \sum_{i=1}^n H\left( X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n \right) ,$$

where $$H(X_{1},\ldots,X_{n})$$ is the joint entropy of the variable set $$\{X_{1},\ldots,X_{n}\}$$ and $$H(X_i \mid \cdots )$$ is the conditional entropy of variable $$X_{i}$$, given the rest.
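As an illustration, the following is a minimal Python sketch of this definition for a discrete joint distribution stored as a NumPy array; the function names are illustrative rather than taken from any library. For two variables the dual total correlation reduces to the mutual information, which the example confirms.

```python
import numpy as np

def joint_entropy(p):
    """Shannon entropy (in bits) of a joint pmf given as an n-dimensional array."""
    p = np.asarray(p)
    p = p[p > 0]                      # ignore zero-probability outcomes
    return float(-np.sum(p * np.log2(p)))

def dual_total_correlation(p):
    """D(X_1,...,X_n) = H(X_1,...,X_n) - sum_i H(X_i | all the others)."""
    p = np.asarray(p)
    h_joint = joint_entropy(p)
    d = h_joint
    for i in range(p.ndim):
        # H(X_i | rest) = H(X_1,...,X_n) - H(rest); marginalize out axis i.
        d -= h_joint - joint_entropy(p.sum(axis=i))
    return d

# Two perfectly correlated fair bits: D equals the mutual information, 1 bit.
p = np.array([[0.5, 0.0],
              [0.0, 0.5]])
print(dual_total_correlation(p))  # 1.0
```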

Normalized
The dual total correlation normalized to the interval [0,1] is simply the dual total correlation divided by its maximum value, the joint entropy $$H(X_{1}, \ldots, X_{n})$$,


 * $$ND(X_1,\ldots,X_n) = \frac{D(X_1,\ldots,X_n)}{H(X_1,\ldots,X_n)} .$$
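Continuing the sketch above, the normalized form is a one-line helper (again with illustrative names):

```python
# Normalized variant: divide by the joint entropy, reusing the helpers above.
def normalized_dual_total_correlation(p):
    return dual_total_correlation(p) / joint_entropy(p)

print(normalized_dual_total_correlation(p))  # 1.0 for the correlated-bit example
```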

Relationship with total correlation
Dual total correlation is non-negative and bounded above by the joint entropy $$H(X_1, \ldots, X_n)$$.


 * $$ 0 \leq D(X_1, \ldots, X_n) \leq H(X_1, \ldots, X_n) .$$

Second, dual total correlation is closely related to the total correlation $$C(X_1, \ldots, X_n)$$, and can be written as a difference between the total correlation of the whole and the total correlations of all subsets of size $$n-1$$:


 * $$ D(\textbf{X}) = (n-1)C(\textbf{X}) - \sum_{i=1}^{n} C(\textbf{X}^{-i}) $$

where $$\textbf{X} = \{X_1,\ldots,X_n\}$$ and $$ \textbf{X}^{-i} = \{X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n\}$$.
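Assuming discrete distributions represented as NumPy arrays, this identity can be checked numerically with a total correlation helper alongside the functions from the earlier sketch (all names are illustrative):

```python
import numpy as np

def total_correlation(p):
    """C(X_1,...,X_n) = sum_i H(X_i) - H(X_1,...,X_n)."""
    p = np.asarray(p)
    n = p.ndim
    h_marginals = sum(
        joint_entropy(p.sum(axis=tuple(j for j in range(n) if j != i)))
        for i in range(n))
    return h_marginals - joint_entropy(p)

# Random three-variable joint distribution.
rng = np.random.default_rng(0)
p = rng.random((2, 3, 2))
p /= p.sum()

n = p.ndim
lhs = dual_total_correlation(p)
rhs = (n - 1) * total_correlation(p) - sum(
    total_correlation(p.sum(axis=i)) for i in range(n))  # subsets of size n-1
print(np.isclose(lhs, rhs))  # True
```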

Furthermore, the total correlation and dual total correlation are related by the following bounds:


 * $$ \frac{C(X_1, \ldots, X_n)}{n-1} \leq D(X_1, \ldots, X_n) \leq (n-1) \; C(X_1, \ldots, X_n) .$$
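These bounds can be sanity-checked on the same random distribution, continuing the snippet above:

```python
# Sandwich bounds C/(n-1) <= D <= (n-1) C on the random pmf from above.
c = total_correlation(p)
d = dual_total_correlation(p)
print(c / (n - 1) <= d <= (n - 1) * c)  # True
```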

Finally, the difference between the total correlation and the dual total correlation defines a measure of higher-order information sharing known as the O-information:


 * $$\Omega(\textbf{X}) = C(\textbf{X}) - D(\textbf{X}) .$$

The O-information (first introduced as the "enigmatic information" by James and Crutchfield) is a signed measure that quantifies the extent to which the information in a multivariate random variable is dominated by synergistic interactions (in which case $$\Omega(\textbf{X}) < 0$$) or redundant interactions (in which case $$\Omega(\textbf{X}) > 0$$).
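The sign convention can be illustrated on two canonical examples, reusing `total_correlation` and `dual_total_correlation` from the sketches above: three copies of one fair bit are redundancy-dominated, while the exclusive-OR of two independent fair bits is synergy-dominated.

```python
import numpy as np

def o_information(p):
    """Omega(X) = C(X) - D(X), reusing the helpers sketched above."""
    return total_correlation(p) - dual_total_correlation(p)

# Redundancy: three identical fair bits (X = Y = Z).
copy = np.zeros((2, 2, 2))
copy[0, 0, 0] = copy[1, 1, 1] = 0.5

# Synergy: Z = X XOR Y with X, Y independent fair bits.
xor = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        xor[x, y, x ^ y] = 0.25

print(o_information(copy))  # +1.0: redundancy-dominated
print(o_information(xor))   # -1.0: synergy-dominated
```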

History
Han (1978) originally defined the dual total correlation as,

$$\begin{align} & D(X_1,\ldots,X_n) \\[10pt] \equiv {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] - (n-1) \; H(X_1, \ldots, X_n) \;. \end{align}$$

However, Abdallah and Plumbley (2010) showed its equivalence to the easier-to-understand form of the joint entropy minus the sum of conditional entropies via the following:



$$\begin{align} & D(X_1,\ldots,X_n) \\[10pt] \equiv {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] - (n-1) \; H(X_1, \ldots, X_n) \\ = {} & \left[ \sum_{i=1}^n H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) \right] + (1-n) \; H(X_1, \ldots, X_n) \\ = {} & H(X_1, \ldots, X_n) + \sum_{i=1}^n \left[ H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n ) - H(X_1, \ldots, X_n) \right] \\ = {} & H\left( X_1, \ldots, X_n \right) - \sum_{i=1}^n H\left( X_i \mid X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n \right)\;. \end{align}$$
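As a quick numerical confirmation of this equivalence, Han's original form can be evaluated directly and compared against the conditional-entropy form from the sketch above (the `han_form` name is illustrative):

```python
import numpy as np

def han_form(p):
    """Han's original expression: sum_i H(X^{-i}) - (n-1) H(X_1,...,X_n)."""
    p = np.asarray(p)
    n = p.ndim
    return sum(joint_entropy(p.sum(axis=i)) for i in range(n)) \
        - (n - 1) * joint_entropy(p)

rng = np.random.default_rng(1)
p = rng.random((3, 2, 4))
p /= p.sum()
print(np.isclose(han_form(p), dual_total_correlation(p)))  # True
```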