Transfer entropy

Transfer entropy is a non-parametric statistic measuring the amount of directed (time-asymmetric) transfer of information between two random processes. Transfer entropy from a process X to another process Y is the reduction in uncertainty about future values of Y obtained by knowing the past values of X, given the past values of Y. More specifically, if $$ X_t $$ and $$ Y_t $$ for $$ t\in \mathbb{N} $$ denote two random processes and the amount of information is measured using Shannon's entropy, the transfer entropy can be written as:



$$ T_{X\rightarrow Y} = H\left( Y_t \mid Y_{t-1:t-L}\right) - H\left( Y_t \mid Y_{t-1:t-L}, X_{t-1:t-L}\right), $$

where $$H(X)$$ is the Shannon entropy of X and $$L$$ is the number of past values (the history length) included for each process. The above definition of transfer entropy has been extended to other types of entropy measures, such as Rényi entropy.
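The definition above can be illustrated with a minimal plug-in (frequency-counting) estimator for discrete-valued sequences. This is a sketch rather than a reference implementation: the function name `transfer_entropy`, the `history` parameter (playing the role of L) and the toy data are assumptions made for illustration. Plug-in estimates are biased for short series.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target, history=1):
    """Plug-in estimate of T_{source -> target} in bits for discrete sequences."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    L = history
    # Count joint occurrences of (y_t, past of Y, past of X) over all valid t.
    joint = Counter()
    for t in range(L, len(target)):
        y_past = tuple(target[t - L:t])
        x_past = tuple(source[t - L:t])
        joint[(target[t], y_past, x_past)] += 1
    n = sum(joint.values())

    # Marginal counts needed for the two conditional distributions.
    count_pasts = Counter()   # (y_past, x_past)
    count_y_ypast = Counter() # (y_t, y_past)
    count_ypast = Counter()   # y_past
    for (y, y_past, x_past), c in joint.items():
        count_pasts[(y_past, x_past)] += c
        count_y_ypast[(y, y_past)] += c
        count_ypast[y_past] += c

    # T = sum p(y_t, y_past, x_past) * log2[ p(y_t | y_past, x_past) / p(y_t | y_past) ]
    te = 0.0
    for (y, y_past, x_past), c in joint.items():
        p_full = c / count_pasts[(y_past, x_past)]
        p_restricted = count_y_ypast[(y, y_past)] / count_ypast[y_past]
        te += (c / n) * log2(p_full / p_restricted)
    return te


# Toy data (hypothetical): Y copies X with a one-step delay.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
te_x_to_y = transfer_entropy(x, y, history=1)
te_y_to_x = transfer_entropy(y, x, history=1)
```

In this toy example, Y is fully determined by the previous value of X, so the estimate from X to Y should be close to one bit. The estimate in the reverse direction should be close to zero, which reflects the asymmetry of the measure.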

Equivalently, transfer entropy is a conditional mutual information, conditioned on the history of the influenced variable $$Y_{t-1:t-L}$$:



$$ T_{X\rightarrow Y} = I(Y_t ; X_{t-1:t-L} \mid Y_{t-1:t-L}). $$
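
As a brief check that the entropy form and the mutual-information form agree, the standard identity for conditional mutual information can be written out. The shorthand $$Y^- = Y_{t-1:t-L}$$ and $$X^- = X_{t-1:t-L}$$ is introduced here only for this derivation, and the sum runs over all values $$(y_t, y^-, x^-)$$ of the discrete variables:

$$ I(Y_t ; X^- \mid Y^-) = H(Y_t \mid Y^-) - H(Y_t \mid Y^-, X^-) = \sum_{y_t, y^-, x^-} p(y_t, y^-, x^-) \log \frac{p(y_t \mid y^-, x^-)}{p(y_t \mid y^-)} $$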

Transfer entropy reduces to Granger causality for vector auto-regressive processes. Hence, it is advantageous when the model assumption of Granger causality does not hold, for example, in the analysis of non-linear signals. However, it usually requires more samples for accurate estimation. The probabilities in the entropy formula can be estimated using different approaches (binning, nearest neighbors) or, in order to reduce complexity, using a non-uniform embedding. While it was originally defined for bivariate analysis, transfer entropy has been extended to multivariate forms, either conditioning on other potential source variables or considering transfer from a collection of sources, although these forms again require more samples.
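
The link to Granger causality can be sketched numerically. The sketch below assumes jointly Gaussian variables and a history length of 1, neither of which is stated in the text above. Under those assumptions the transfer entropy can be computed as half the log ratio of the residual variances of two linear regressions, one without and one with the past of X. The function name `gaussian_te_lag1` and the simulated model are hypothetical.

```python
import math
import random


def gaussian_te_lag1(source, target):
    """Linear-Gaussian transfer entropy source -> target (history length 1), in nats."""

    def centered(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    t = centered(target[1:])     # Y_t
    yp = centered(target[:-1])   # Y_{t-1}
    xp = centered(source[:-1])   # X_{t-1}

    s_tt, s_yy, s_xx = dot(t, t), dot(yp, yp), dot(xp, xp)
    s_ty, s_tx, s_yx = dot(t, yp), dot(t, xp), dot(yp, xp)

    # Restricted model: regress Y_t on Y_{t-1} only.
    rss_restricted = s_tt - s_ty ** 2 / s_yy

    # Full model: regress Y_t on Y_{t-1} and X_{t-1} (2x2 normal equations).
    det = s_yy * s_xx - s_yx ** 2
    b_y = (s_ty * s_xx - s_tx * s_yx) / det
    b_x = (s_tx * s_yy - s_ty * s_yx) / det
    rss_full = s_tt - b_y * s_ty - b_x * s_tx

    # Half the log ratio of residual sums of squares (equal sample sizes).
    return 0.5 * math.log(rss_restricted / rss_full)


# Simulated example (hypothetical model): X drives Y, but Y does not drive X.
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

te_x_to_y = gaussian_te_lag1(x, y)
te_y_to_x = gaussian_te_lag1(y, x)
```

For this simulated model the theoretical value from X to Y is 0.5 ln(1.64), roughly 0.25 nats, and the value from Y to X is zero. The estimates should land near those values, up to sampling error.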

Transfer entropy has been used for estimation of functional connectivity of neurons, social influence in social networks and statistical causality between armed conflict events. Transfer entropy is a finite version of the directed information, which was defined in 1990 by James Massey as $$I(X^n\to Y^n) =\sum_{i=1}^n I(X^i;Y_i|Y^{i-1})$$, where $$X^n$$ denotes the vector $$X_1,X_2,\dots,X_n$$ and $$Y^n$$ denotes $$Y_1,Y_2,\dots,Y_n$$. The directed information plays an important role in characterizing the fundamental limits (channel capacity) of communication channels with or without feedback and gambling with causal side information.