Centerpoint (geometry)

In statistics and computational geometry, the notion of centerpoint is a generalization of the median to data in higher-dimensional Euclidean space. Given a set of points in d-dimensional space, a centerpoint of the set is a point such that any hyperplane that goes through that point divides the set of points in two roughly equal subsets: the smaller part should have at least a 1/(d + 1) fraction of the points. Like the median, a centerpoint need not be one of the data points. Every non-empty set of points (with no duplicates) has at least one centerpoint.

Related concepts
Closely related concepts are the Tukey depth of a point (the minimum number of sample points on one side of a hyperplane through the point) and a Tukey median of a point set (a point maximizing the Tukey depth). A centerpoint is a point of depth at least n/(d + 1), and a Tukey median must be a centerpoint, but not every centerpoint is a Tukey median. Both terms are named after John Tukey.

For a different generalization of the median to higher dimensions, see geometric median.

Existence
A simple proof of the existence of a centerpoint may be obtained using Helly's theorem. Suppose there are n points, and consider the family of closed half-spaces that contain more than dn/(d + 1) of the points. Fewer than n/(d + 1) points are excluded from any one of these halfspaces, so the intersection of any subset of d + 1 of these halfspaces must be nonempty. By Helly's theorem, it follows that the intersection of all of these halfspaces must also be nonempty. Any point in this intersection is necessarily a centerpoint.

Algorithms
For points in the Euclidean plane, a centerpoint may be constructed in linear time. In any dimension d, a Tukey median (and therefore also a centerpoint) may be constructed in time O(nd &minus; 1 + n log n).

A randomized algorithm that repeatedly replaces sets of d + 2 points by their Radon point can be used to compute an approximation to a centerpoint of any point set, in the sense that its Tukey depth is linear in the sample set size, in an amount of time that is polynomial in the dimension.