Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) measures the cluster tendency of a data set. It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test in which the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed. With the definition given below, the value approaches 1 if the individuals are aggregated and tends to 0.5 if they are randomly distributed; values near 0 indicate regularly spaced data.

Preliminaries
A typical formulation of the Hopkins statistic follows.
 * Let $$X$$ be the set of $$n$$ data points.
 * Generate a random sample $$\overset{\sim}{X}$$ of $$m \ll n$$ data points sampled without replacement from $$X$$.
 * Generate a set $$Y$$ of $$m$$ data points distributed uniformly at random over the region occupied by $$X$$.
 * Define two distance measures:
 * $$u_i,$$ the distance (given some suitable metric) from $$y_i \in Y$$ to its nearest neighbour in $$X$$, and
 * $$w_i,$$ the distance from $$\overset{\sim}{x}_i \in \overset{\sim}{X}\subseteq X$$ to its nearest neighbour $$x_j \in X,\, \overset{\sim}{x}_i\ne x_j.$$

Definition
With the above notation, if the data are $$d$$-dimensional, then the Hopkins statistic is defined as:

$$ H=\frac{\sum_{i=1}^m{u_i^d}}{\sum_{i=1}^m{u_i^d}+\sum_{i=1}^m{w_i^d}} \, $$

Under the null hypothesis, this statistic follows a $$\mathrm{Beta}(m, m)$$ distribution, so its expected value is 0.5.
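The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes a Euclidean metric, takes the axis-aligned bounding box of $$X$$ as the region from which $$Y$$ is drawn, and finds nearest neighbours by brute force. The names `hopkins` and `nearest_distance` are hypothetical helpers introduced only for this example.

```python
import math
import random


def nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to the closest element of `data`.

    `skip_index` excludes one element, so that a real data point is not
    matched with itself.
    """
    return min(
        math.dist(point, other)
        for j, other in enumerate(data)
        if j != skip_index
    )


def hopkins(data, m, rng):
    """Hopkins statistic H for a list of equal-length coordinate tuples."""
    n, d = len(data), len(data[0])

    # Assumption: the sampling region is the bounding box of the data.
    lows = [min(p[k] for p in data) for k in range(d)]
    highs = [max(p[k] for p in data) for k in range(d)]

    # X~: m real points drawn without replacement (kept as indices).
    sample_idx = rng.sample(range(n), m)
    # Y: m artificial points drawn uniformly from the sampling region.
    uniform_pts = [
        tuple(rng.uniform(lows[k], highs[k]) for k in range(d))
        for _ in range(m)
    ]

    # Sums of u_i^d and w_i^d from the definition.
    u = sum(nearest_distance(y, data) ** d for y in uniform_pts)
    w = sum(
        nearest_distance(data[i], data, skip_index=i) ** d
        for i in sample_idx
    )
    return u / (u + w)


rng = random.Random(0)
# Two tight Gaussian blobs versus points scattered over the unit square.
clustered = [
    (rng.gauss(c, 0.02), rng.gauss(c, 0.02))
    for c in (0.2, 0.8)
    for _ in range(100)
]
scattered = [(rng.random(), rng.random()) for _ in range(200)]

h_clustered = hopkins(clustered, 20, rng)
h_scattered = hopkins(scattered, 20, rng)
print(f"clustered: {h_clustered:.3f}, scattered: {h_scattered:.3f}")
```

With these inputs the clustered set should give a value close to 1 and the scattered set a value in the neighbourhood of 0.5. Because the statistic depends on the random sample, repeated runs with different seeds give different values, which is why it is usually compared against the $$\mathrm{Beta}(m, m)$$ reference distribution rather than a fixed cutoff.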