Metric k-center

In graph theory, the metric $k$-center problem is a combinatorial optimization problem studied in theoretical computer science. Given $n$ cities with specified distances, one wants to build $k$ warehouses in different cities so as to minimize the maximum distance from a city to its nearest warehouse. In graph-theoretic terms, this means finding a set of $k$ vertices for which the largest distance from any vertex to its closest vertex in the $k$-set is as small as possible. The vertices must lie in a metric space, which yields a complete graph whose edge weights satisfy the triangle inequality.

Formal definition
Let $$(\mathcal{X},d)$$ be a metric space, where $$\mathcal{X}$$ is a set and $$d$$ is a metric.

A set $$\mathbf{V}\subseteq\mathcal{X}$$ is provided together with a parameter $$k$$. The goal is to find a subset $$\mathcal{C}\subseteq \mathbf{V}$$ with $$|\mathcal{C}|=k$$ such that the maximum distance of a point in $$\mathbf{V}$$ to the closest point in $$\mathcal{C}$$ is minimized. The problem can be formally defined as follows:

For a metric space $$(\mathcal{X},d)$$:
 * Input: a set $$\mathbf{V}\subseteq\mathcal{X}$$ and a parameter $$k$$.
 * Output: a set $$\mathcal{C}\subseteq\mathbf{V}$$ of $$k$$ points.
 * Goal: Minimize the cost $$r^\mathcal{C}(\mathbf{V}) = \max_{v\in \mathbf{V}} d(v,\mathcal{C})$$, where $$d(v,\mathcal{C}) = \min_{c\in\mathcal{C}} d(v,c)$$ is the distance from $$v$$ to its closest center.

That is, every point in a cluster is at distance at most $$r^\mathcal{C}(\mathbf{V})$$ from its respective center.

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:

Given a complete undirected graph G = (V, E) with distances $$d(v_i, v_j) \in \mathbb{N}$$ satisfying the triangle inequality, find a subset $$C \subseteq V$$ with $$|C| = k$$ that minimizes:


 * $$\max_{v \in V} \min_{c \in C} d(v,c)$$
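The objective can be evaluated directly from its definition. The following is a minimal sketch, assuming the points are given as a Python list and the metric as a function; the name `kcenter_cost` and the one-dimensional example are illustrative choices, not part of the problem statement.

```python
def kcenter_cost(points, centers, dist):
    """Return the k-center cost: the largest distance from any point
    to its closest center, i.e. max over v of min over c of dist(v, c)."""
    return max(min(dist(v, c) for c in centers) for v in points)


# Illustrative instance: five points on a line with the absolute-difference metric.
points = [0, 1, 2, 10, 11]
dist = lambda a, b: abs(a - b)

print(kcenter_cost(points, [1, 10], dist))  # 1
print(kcenter_cost(points, [0, 2], dist))   # 9
```

On this instance the center set {1, 10} serves every point within distance 1, whereas {0, 2} leaves the point 11 at distance 9 from its nearest center.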

Computational complexity
In a complete undirected graph G = (V, E), sort the edges in non-decreasing order of distance, $$d(e_1) \le d(e_2) \le \dots \le d(e_m)$$, and let $$G_i = (V, E_i)$$, where $$E_i = \{e_1, e_2, \dots, e_i\}$$. The k-center problem is then equivalent to finding the smallest index i such that $$G_i$$ has a dominating set of size at most k.

The decision version of Dominating Set is NP-complete, whereas the k-center problem, being an optimization problem, is NP-hard. The reason is that the optimality of a given feasible solution for the k-center problem can be verified through the Dominating Set reduction only if the size of the optimal solution is known in the first place (i.e. the smallest index i such that $$G_i$$ has a dominating set of size at most k), which is precisely the difficult core of NP-hard problems. A Turing reduction can nevertheless get around this issue by trying all values of k.
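The equivalence with dominating sets in threshold graphs can be illustrated on tiny instances. The sketch below is a brute-force illustration only: it takes exponential time, and the helper names `has_dominating_set` and `optimal_radius_via_thresholds` are hypothetical. It assumes the vertices are sortable values such as integers.

```python
from itertools import combinations


def has_dominating_set(vertices, adj, k):
    """Brute force: does some set of at most k vertices dominate the graph?"""
    for size in range(1, min(k, len(vertices)) + 1):
        for cand in combinations(vertices, size):
            covered = set(cand)
            for c in cand:
                covered |= adj[c]
            if len(covered) == len(vertices):
                return True
    return False


def optimal_radius_via_thresholds(vertices, dist, k):
    """Add edges in non-decreasing order of distance and return the distance
    of the first edge after which a dominating set of size <= k exists."""
    adj = {v: set() for v in vertices}
    if has_dominating_set(vertices, adj, k):
        return 0  # k >= n: every vertex can be its own center
    edges = sorted((dist(u, v), u, v) for u, v in combinations(vertices, 2))
    for d, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        if has_dominating_set(vertices, adj, k):
            return d
    return None  # unreachable for k >= 1 on a complete graph


vertices = [0, 1, 2, 10, 11]
dist = lambda a, b: abs(a - b)
print(optimal_radius_via_thresholds(vertices, dist, 2))  # 1
print(optimal_radius_via_thresholds(vertices, dist, 1))  # 9
```

The inner dominating-set test is the expensive part; the surrounding loop over the sorted edges is polynomial.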

A simple greedy algorithm
A simple greedy approximation algorithm that achieves an approximation factor of 2 builds $$\mathcal{C}$$ using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:


 * Pick an arbitrary point $$\bar{c}_1$$ and set $$C_1 = \{\bar{c}_1\}$$.
 * For every point $$v\in \mathbf{V}$$, compute its distance $$d_1[v]$$ from $$\bar{c}_1$$.
 * Pick the point $$\bar{c}_2$$ with the largest distance from $$\bar{c}_1$$.
 * Add it to the set of centers and denote this expanded set of centers as $$C_2$$. Continue in the same way, each time adding the point farthest from the current set of centers, until k centers are found.
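The steps above can be sketched in Python as follows. This is a minimal sketch, assuming the points are given as a list and that k does not exceed the number of points; the function name `greedy_k_center` and the `first` parameter (the index of the arbitrary starting point) are illustrative choices.

```python
def greedy_k_center(points, k, dist, first=0):
    """Farthest-first traversal: return (centers, radius).

    d[j] always holds the distance from points[j] to its nearest chosen
    center, so each iteration costs O(n) distance evaluations.
    """
    centers = [points[first]]
    d = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The point farthest from the current set of centers becomes a center.
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(points[far])
        # Update every point's distance to its nearest center.
        d = [min(d[j], dist(points[j], points[far])) for j in range(len(points))]
    return centers, max(d)


points = [0, 1, 2, 10, 11]
dist = lambda a, b: abs(a - b)
print(greedy_k_center(points, 2, dist))  # ([0, 11], 2)
```

On this instance the greedy radius is 2, while the center set {1, 10} achieves radius 1, so the factor of 2 is attained here.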

Running time

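 * The ith iteration, which chooses the ith center, takes $$\mathcal{O}(n)$$ time, provided that the distance from each point to its nearest chosen center is stored and updated after every iteration.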
 * There are k such iterations.
 * Thus, overall the algorithm takes $$\mathcal{O}(nk)$$ time.

Proving the approximation factor
The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points $$\mathbf{V}\subseteq\mathcal{X}$$ belonging to a metric space $$(\mathcal{X},d)$$, the greedy k-center algorithm computes a set $$\mathbf{K}$$ of k centers such that $$\mathbf{K}$$ is a 2-approximation to the optimal k-center clustering of $$\mathbf{V}$$.

That is, $$r^{\mathbf{K}}(\mathbf{V})\leq 2r^{opt}(\mathbf{V},k)$$

This theorem can be proven by considering two cases:

Case 1: Every cluster of $$\mathcal{C}_{opt}$$ contains exactly one point of $$\mathbf{K}$$


 * Consider a point $$v\in \mathbf{V}$$
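 * Let $$\bar{c}$$ be the center of $$\mathcal{C}_{opt}$$ to which it is assigned
 * Let $$\bar{k}$$ be the center of $$\mathbf{K}$$ that is in $$\Pi(\mathcal{C}_{opt},\bar{c})$$, the cluster of $$\bar{c}$$, i.e. the set of points of $$\mathbf{V}$$ whose closest center in $$\mathcal{C}_{opt}$$ is $$\bar{c}$$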
 * $$d(v,\bar{c})=d(v,\mathcal{C}_{opt})\leq r^{opt}(\mathbf{V},k)$$
 * Similarly, $$d(\bar{k},\bar{c})=d(\bar{k},\mathcal{C}_{opt})\leq r^{opt}$$
 * By the triangle inequality: $$d(v,\bar{k})\leq d(v,\bar{c})+d(\bar{c},\bar{k})\leq 2r^{opt}$$

Case 2: There are two centers $$\bar{k}$$ and $$\bar{u}$$ of $$\mathbf{K}$$ that are both in $$\Pi(\mathcal{C}_{opt},\bar{c})$$, for some $$\bar{c}\in \mathcal{C}_{opt}$$ (by the pigeonhole principle, this is the only other possibility)

 * Assume, without loss of generality, that $$\bar{u}$$ was added later to the center set $$\mathbf{K}$$ by the greedy algorithm, say in the ith iteration.
 * But since the greedy algorithm always chooses the point furthest away from the current set of centers, we have that $$\bar{k}\in\mathcal{C}_{i-1}$$ and:

$$\begin{align} r^\mathbf{K}(\mathbf{V})\leq r^{\mathcal{C}_{i-1}}(\mathbf{V})&=d(\bar{u},\mathcal{C}_{i-1})\\ &\leq d(\bar{u},\bar{k})\\ &\leq d(\bar{u},\bar{c})+d(\bar{c},\bar{k})\\ &\leq 2r^{opt} \end{align}$$
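The bound can also be sanity-checked numerically on small random instances by comparing the greedy radius with the optimum found by exhaustive search. The following is only an illustrative sketch, not a substitute for the proof; the helper names, the use of Euclidean points in the unit square, and the instance sizes are all assumptions made for the example.

```python
import math
import random
from itertools import combinations


def radius(points, centers):
    """Largest distance from any point to its closest center."""
    return max(min(math.dist(v, c) for c in centers) for v in points)


def greedy(points, k):
    """Farthest-first traversal starting from the first point."""
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(farthest)
    return centers


def optimal_radius(points, k):
    """Exhaustive search over all k-subsets (feasible only for tiny inputs)."""
    return min(radius(points, c) for c in combinations(points, k))


rng = random.Random(0)
for trial in range(20):
    pts = [(rng.random(), rng.random()) for _ in range(8)]
    for k in (1, 2, 3):
        # Small tolerance guards against floating-point rounding.
        assert radius(pts, greedy(pts, k)) <= 2 * optimal_radius(pts, k) + 1e-12
print("bound held on all trials")
```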

Another 2-factor approximation algorithm
Another algorithm with the same approximation factor takes advantage of the fact that the k-Center problem is equivalent to finding the smallest index i such that $$G_i$$ has a dominating set of size at most k. For each i, it computes a maximal independent set in the square of $$G_i$$ (the graph in which two vertices are adjacent whenever they are joined by a path of at most two edges in $$G_i$$), looking for the smallest index i for which this maximal independent set has size at most k; that set is returned as the set of centers. It is not possible to find a polynomial-time approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.
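A threshold-based algorithm of this kind can be sketched as follows. The sketch follows the common textbook formulation, which builds the maximal independent set in the square of each threshold graph and stops at the first threshold where that set has at most k vertices; this specific formulation, the function names, and the greedy construction of the independent set are assumptions of the example. It favours clarity over efficiency and may return fewer than k centers, which can be padded arbitrarily.

```python
from itertools import combinations


def adjacent_in_square(u, v, vertices, dist, t):
    """True if u and v are joined by a path of at most two edges of length <= t."""
    if dist(u, v) <= t:
        return True
    return any(dist(u, w) <= t and dist(w, v) <= t for w in vertices)


def threshold_k_center(vertices, dist, k):
    """Return (centers, t), where t is the first threshold whose squared
    threshold graph has a greedily built maximal independent set of size <= k."""
    if k >= len(vertices):
        return list(vertices), 0  # every vertex can be its own center
    thresholds = sorted({dist(u, v) for u, v in combinations(vertices, 2)})
    for t in thresholds:
        mis = []
        for v in vertices:
            # Keep v only if it is independent of everything chosen so far.
            if not any(adjacent_in_square(v, c, vertices, dist, t) for c in mis):
                mis.append(v)
        if len(mis) <= k:
            return mis, t
    return list(vertices[:1]), 0  # unreachable for k >= 1 with at least two vertices


vertices = [0, 1, 2, 10, 11]
dist = lambda a, b: abs(a - b)
print(threshold_k_center(vertices, dist, 2))  # ([0, 10], 1)
```

Every vertex is within two edges of length at most t of some returned center, so by the triangle inequality the resulting radius is at most 2t; in the example the centers {0, 10} give radius 2 with t = 1.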

Parameterized approximations
It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0 when k is used as the parameter. The same inapproximability holds, unless P = NP, when parameterizing by the doubling dimension (in fact, by the dimension of a Manhattan metric). For the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard, but it is possible to obtain a parameterized approximation scheme. This is possible even for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.