Tjøstheim's coefficient

Tjøstheim's coefficient is a measure of spatial association that attempts to quantify the degree to which two spatial data sets are related. It was developed by the Norwegian statistician Dag Tjøstheim. It is similar to rank correlation coefficients such as Spearman's rank correlation coefficient and the Kendall rank correlation coefficient, but it also explicitly takes the spatial relationship between the variables into account.

Consider two variables, $$F(x,y)$$ and $$G(x,y)$$, observed at the same set of $$N$$ spatial locations with coordinates $$(x_i, y_i)$$. The rank of $$F$$ at $$(x_i,y_i)$$ is


 * $$R_F(x_i,y_i) = \sum_j^N \theta(F(x_i,y_i) - F(x_j,y_j) )$$

with a similar definition for $$G$$. Here $$\theta$$ is a step function with $$\theta(0) = 1$$, so the formula counts how many values $$F(x_j,y_j)$$ are less than or equal to the value at the target point $$F(x_i,y_i)$$.
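The rank definition can be illustrated with a minimal Python sketch. The function name `ranks` and the sample values are hypothetical, chosen only for illustration; the comparison `<=` encodes the assumption that the step function equals 1 at zero.

```python
def ranks(values):
    """Rank of each value: the number of values less than or equal to it.

    This mirrors the sum over the step function, assuming theta(0) = 1.
    """
    return [sum(1 for other in values if other <= v) for v in values]


# Hypothetical observations of F at four locations.
f = [3.2, 1.5, 4.8, 2.0]
print(ranks(f))  # [3, 1, 4, 2]
```

When there are no tied values, the ranks are a permutation of 1 to $$N$$; with ties, tied observations share the same rank.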

Now define
 * $$X_F(i) = \sum_j^N x_j \delta(i, R_F(x_j,y_j))$$

where $$\delta$$ is the Kronecker delta. This is the $$x$$ coordinate of the location whose $$F$$ value has rank $$i$$. The quantities $$Y_F(i)$$, $$X_G(i)$$ and $$Y_G(i)$$ are defined similarly.

Tjøstheim's coefficient is defined by


 * $$ A = \frac{\sum_i^N \left[(X_F(i) - \bar{X}_F)(X_G(i) - \bar{X}_G) + (Y_F(i) - \bar{Y}_F)(Y_G(i) - \bar{Y}_G)\right] }{\left(\sum_i^N\left[(X_F(i) - \bar{X}_F)^2 + (Y_F(i) - \bar{Y}_F)^2\right] \sum_i^N\left[(X_G(i) - \bar{X}_G)^2 + (Y_G(i) - \bar{Y}_G)^2\right] \right)^{1/2}}$$
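The definition above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the names `coords_by_rank` and `tjostheim_coefficient` and the sample data are hypothetical, and the code assumes there are no tied values in $$F$$ or $$G$$, so that each rank corresponds to exactly one location.

```python
import math


def coords_by_rank(coords, values):
    """Reorder coords so that entry i belongs to the (i+1)-th smallest value."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    return [coords[k] for k in order]


def tjostheim_coefficient(x, y, f, g):
    """Tjostheim's A for f and g observed at locations (x, y); assumes no ties."""
    n = len(x)
    # The rank-ordered coordinates are permutations of x and y, so their
    # means equal the plain coordinate means.
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xf, yf = coords_by_rank(x, f), coords_by_rank(y, f)
    xg, yg = coords_by_rank(x, g), coords_by_rank(y, g)
    num = sum(
        (xf[i] - x_bar) * (xg[i] - x_bar) + (yf[i] - y_bar) * (yg[i] - y_bar)
        for i in range(n)
    )
    den_f = sum((xf[i] - x_bar) ** 2 + (yf[i] - y_bar) ** 2 for i in range(n))
    den_g = sum((xg[i] - x_bar) ** 2 + (yg[i] - y_bar) ** 2 for i in range(n))
    return num / math.sqrt(den_f * den_g)


# Hypothetical data: four locations on the corners of a unit square.
x = [0.0, 1.0, 0.0, 1.0]
y = [0.0, 0.0, 1.0, 1.0]
f = [1.0, 2.0, 3.0, 4.0]
print(tjostheim_coefficient(x, y, f, f))        # identical rankings
print(tjostheim_coefficient(x, y, f, f[::-1]))  # reversed rankings
```

In this example identical rankings give $$A = 1$$, while reversing the ranking pairs each location with the diagonally opposite corner and gives $$A = -1$$.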

Under the assumptions that the values of $$F$$ are independent and identically distributed, that the same holds for the values of $$G$$, and that $$F$$ and $$G$$ are independent of each other, it can be shown that $$E[A] = 0$$ and


 * $$\operatorname{var}(A) = \frac{\left(\sum_i^N x_i^2\right)^2 + 2 \left(\sum_i^N x_i y_i\right)^2 + \left(\sum_i^N y_i^2\right)^2 }{(N-1)\left(\sum_i^N x_i^2 + \sum_i^N y_i^2\right)^2 }$$

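Here the coordinates are measured relative to their means, so that $$\sum_i^N x_i = \sum_i^N y_i = 0$$; this is consistent with $$A$$ being unchanged when all locations are shifted by the same amount. The maximum variance of $$1/(N-1)$$ occurs when all points lie on a straight line, and the minimum variance of $$1/(2(N-1))$$ occurs for a symmetric cross pattern where $$x_i y_i = 0$$ for every $$i$$ and $$\sum_i^N x_i^2 = \sum_i^N y_i^2$$.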
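The two limiting cases can be checked numerically with a short Python sketch. The function name `tjostheim_variance` and the point sets are hypothetical; the sketch subtracts the coordinate means before applying the formula, on the assumption that the formula is meant for coordinates measured from their means.

```python
import math


def tjostheim_variance(x, y):
    """Variance of A under independence, for locations (x, y).

    Assumption: coordinates are centred on their means before the
    sums of squares and cross products are formed.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xc = [a - x_bar for a in x]
    yc = [b - y_bar for b in y]
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    return (sxx ** 2 + 2 * sxy ** 2 + syy ** 2) / ((n - 1) * (sxx + syy) ** 2)


# Five points on the straight line y = 2x + 1: variance 1/(N-1) = 1/4.
line_x = [0.0, 1.0, 2.0, 3.0, 4.0]
line_y = [2.0 * a + 1.0 for a in line_x]
print(math.isclose(tjostheim_variance(line_x, line_y), 1 / 4))

# Four points forming a symmetric cross: variance 1/(2(N-1)) = 1/6.
cross_x = [1.0, -1.0, 0.0, 0.0]
cross_y = [0.0, 0.0, 1.0, -1.0]
print(math.isclose(tjostheim_variance(cross_x, cross_y), 1 / 6))
```

Both checks print `True`: the collinear points attain the upper bound and the cross pattern attains the lower bound.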

Tjøstheim's coefficient is implemented as cor.spatial in the R package SpatialPack. Numerical simulations suggest that $$A$$ is an effective measure of correlation between variables but is sensitive to the degree of autocorrelation in $$F$$ and $$G$$.