Covering number

In mathematics, a covering number is the number of balls of a given size needed to completely cover a given space, with possible overlaps between the balls. The covering number quantifies the size of a set and can be applied to general metric spaces. Two related concepts are the packing number, the number of disjoint balls that fit in a space, and the metric entropy, the number of points that fit in a space when constrained to lie at some fixed minimum distance apart.

Definition
Let (M, d) be a metric space, let K be a subset of M, and let r be a positive real number. Let $$B_r(x)$$ denote the ball of radius r centered at x. A subset C of M is an r-external covering of K if:
 * $$K \subseteq \bigcup_{x \in C} B_r(x)$$.

In other words, for every $$y\in K$$ there exists $$x\in C$$ such that $$d(x,y)\leq r$$.

If furthermore C is a subset of K, then it is an r-internal covering.

The external covering number of K, denoted $$N^{\text{ext}}_r(K)$$, is the minimum cardinality of any external covering of K. The internal covering number, denoted $$N^{\text{int}}_r(K)$$, is the minimum cardinality of any internal covering.

A subset P of K is an r-packing if the balls $$\{B_r(x)\}_{x \in P}$$ are pairwise disjoint. The packing number of K, denoted $$N^{\text{pack}}_r(K)$$, is the maximum cardinality of any r-packing of K.

A subset S of K is r-separated if each pair of points x and y in S satisfies d(x, y) ≥ r. The metric entropy of K, denoted $$N^{\text{met}}_r(K)$$, is the maximum cardinality of any r-separated subset of K.
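
For a finite set of points, the definitions above can be explored numerically. The following is a minimal sketch, assuming K is a finite list of points in Euclidean space; the function name `greedy_separated_cover` and the sample points are illustrative and not part of the definitions. The greedy pass keeps a point only if it is at distance at least r from every point kept so far, so the result is r-separated, and every rejected point lies within distance r of a kept point, so the result is also an r-internal covering. Its size therefore lies between $$N^{\text{int}}_r(K)$$ and $$N^{\text{met}}_r(K)$$, but it need not equal either.

```python
import math

def greedy_separated_cover(points, r):
    """Greedily select points that are pairwise at distance >= r.

    The returned list is r-separated, and every input point is within
    distance r of some selected point (an r-internal covering).
    """
    centers = []
    for p in points:
        # Keep p only if it is at distance >= r from every chosen center.
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers

# Hypothetical sample set K in the plane.
K = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
centers = greedy_separated_cover(K, r=1.0)
print(len(centers))  # 4
```

The result depends on the order in which points are visited, so it gives bounds rather than exact values of the covering number or the metric entropy.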

Examples
1. The metric space is the real line $\mathbb{R}$, and $K\subset \mathbb{R}$ is a set of real numbers whose absolute value is at most $k$. Then there is an external covering of $K$ by $\left\lceil \frac{2k}{r} \right\rceil$ intervals of length $r$, which together cover the interval $[-k, k]$. Each such interval is contained in a ball of radius $r$. Hence:
 * $N^{\text{ext}}_r(K) \leq \left\lceil \frac{2 k}{r} \right\rceil$
2. The metric space is the Euclidean space $\mathbb{R}^m$ with the Euclidean metric, and $K\subset \mathbb{R}^m$ is a set of vectors whose length (norm) is at most $k$. If $K$ lies in a d-dimensional subspace of $\mathbb{R}^m$, then:
 * $N^{\text{ext}}_r(K) \leq \left(\frac{2 k \sqrt{d}}{r}\right)^d$.
3. The metric space is the space of real-valued functions, with the l-infinity metric. The covering number $N^{\text{int}}_r(K)$ is the smallest number $n$ such that there exist $h_1,\ldots,h_n \in K$ such that for all $h\in K$ there exists $i\in\{1,\ldots,n\}$ for which the supremum distance between $h$ and $h_i$ is at most $r$. The bound in example 2 does not apply, since the space is infinite-dimensional. However, when $K$ is a compact set, every open covering of it has a finite sub-covering, so $N^{\text{int}}_r(K)$ is finite.
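
The construction in the first example can be checked numerically. The sketch below assumes the intervals are laid end to end starting at $-k$; the function name `interval_cover_centers` and the values of `k` and `r` are illustrative choices. It tests coverage only on a finite grid of sample points, so it is a sanity check rather than a proof.

```python
import math

def interval_cover_centers(k, r):
    """Centers of ceil(2k/r) intervals of length r laid end to end from -k."""
    n = math.ceil(2 * k / r)
    return [-k + r / 2 + i * r for i in range(n)]

k, r = 1.0, 0.3
centers = interval_cover_centers(k, r)

# Sample [-k, k] on a grid and check each sample lies in a ball of
# radius r around some center (the intervals have half-length r / 2).
grid = [-k + 2 * k * j / 1000 for j in range(1001)]
covered = all(min(abs(y - c) for c in centers) <= r for y in grid)
print(len(centers), covered)  # 7 True
```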

Properties
1. The internal and external covering numbers, the packing number, and the metric entropy are all closely related. The following chain of inequalities holds for any subset K of a metric space and any positive real number r:
 * $N^{\text{met}}_{2 r}(K) \leq N^{\text{pack}}_r(K) \leq N^{\text{ext}}_r(K) \leq N^{\text{int}}_r(K) \leq N^{\text{met}}_r(K)$
2. Each function except the internal covering number is non-increasing in r and non-decreasing in K. The internal covering number is non-increasing in r but not necessarily monotone in K.
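
The two rightmost inequalities in the chain follow directly from the definitions. The following is a short sketch of the argument, assuming that $N^{\text{met}}_r(K)$ is finite; the set $S$ is introduced only for this derivation.

```latex
% Every internal covering is also an external covering, so the minimum
% over the larger family of external coverings can only be smaller:
N^{\text{ext}}_r(K) \leq N^{\text{int}}_r(K).

% Let S be an r-separated subset of K of maximum cardinality.
% For any y in K \setminus S, the set S \cup \{y\} is not r-separated, so
\exists\, x \in S : \; d(x, y) < r .

% Hence S is an r-internal covering of K, and therefore
N^{\text{int}}_r(K) \leq |S| = N^{\text{met}}_r(K).
```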

The following properties relate to covering numbers in the standard Euclidean space, $$\mathbb{R}^m$$:

1. If all vectors in $K$ are translated by a constant vector $k_0\in \mathbb{R}^m$, then the covering number does not change.
2. If all vectors in $K$ are multiplied by a nonzero scalar $k \in \mathbb{R}$, then:
 * for all $r$: $N^{\text{ext}}_{|k|\cdot r}(k\cdot K) = N^{\text{ext}}_{r}(K)$
3. If all vectors in $K$ are mapped by a Lipschitz function $\phi$ with Lipschitz constant $k$, then:
 * for all $r$: $N^{\text{ext}}_{|k|\cdot r}(\phi\circ K) \leq N^{\text{ext}}_{r}(K)$


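
The translation and scaling properties can be illustrated on a small finite example. In the sketch below, the point set `K`, the covering `C`, and the helper names are all hypothetical. It checks that translating or scaling a covering of K yields a covering of the transformed set with the correspondingly transformed radius. This shows only that the covering number cannot increase; the reverse direction follows by applying the inverse transformation.

```python
import math

def is_external_cover(K, C, r, tol=1e-12):
    """True if every point of K is within distance r of some point of C."""
    return all(any(math.dist(y, x) <= r + tol for x in C) for y in K)

def scale(points, k):
    """Multiply every vector by the scalar k."""
    return [tuple(k * coord for coord in p) for p in points]

def translate(points, k0):
    """Add the constant vector k0 to every vector."""
    return [tuple(a + b for a, b in zip(p, k0)) for p in points]

K = [(0.0, 0.0), (0.4, 0.3), (2.0, 1.0), (2.5, 1.5)]
C = [(0.2, 0.1), (2.2, 1.3)]  # an external covering of K for r = 0.5
r, k, k0 = 0.5, -3.0, (10.0, -4.0)

original_ok = is_external_cover(K, C, r)
translated_ok = is_external_cover(translate(K, k0), translate(C, k0), r)
scaled_ok = is_external_cover(scale(K, k), scale(C, k), abs(k) * r)
print(original_ok, translated_ok, scaled_ok)  # True True True
```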

Application to machine learning
Let $$K$$ be a space of real-valued functions, with the l-infinity metric (see example 3 above). Suppose all functions in $$K$$ are bounded by a real constant $$M$$. Then, the covering number can be used to bound the generalization error of learning functions from $$K$$, relative to the squared loss:

$$\operatorname{Prob}\left[ \sup_{h\in K} \big\vert\text{GeneralizationError}(h) - \text{EmpiricalError}(h)\big\vert \geq \epsilon \right] \leq N^\text{int}_r (K)\, 2\exp\left(-\frac{m\epsilon^2}{2M^4}\right)$$

where $$r = {\epsilon \over 8M}$$ and $$m$$ is the number of samples.
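
The bound can be evaluated numerically once a value for the covering number is available. The sketch below treats the covering number as a given input `N` and does not compute it; the function names and the sample values are illustrative. The function `samples_needed` is obtained by setting the right-hand side of the bound to at most a target probability `delta` and solving for $$m$$, which gives $$m \geq \frac{2M^4}{\epsilon^2}\ln\frac{2N}{\delta}$$.

```python
import math

def covering_radius(eps, M):
    """Radius r = eps / (8 M) at which the covering number is taken."""
    return eps / (8 * M)

def deviation_bound(N, m, eps, M):
    """Right-hand side of the bound: N * 2 * exp(-m eps^2 / (2 M^4))."""
    return N * 2 * math.exp(-m * eps**2 / (2 * M**4))

def samples_needed(N, eps, M, delta):
    """Smallest integer m for which deviation_bound(N, m, eps, M) <= delta."""
    return math.ceil(2 * M**4 / eps**2 * math.log(2 * N / delta))

# Hypothetical values: covering number 10, functions bounded by M = 1.
N, eps, M, delta = 10, 0.1, 1.0, 0.05
m = samples_needed(N, eps, M, delta)
print(covering_radius(eps, M), m, deviation_bound(N, m, eps, M) <= delta)
```

Because the number of samples enters the bound only through the exponent, the required $$m$$ grows logarithmically in the covering number and quadratically in $$1/\epsilon$$.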