Shattered set

A class of sets is said to shatter another set if it is possible to "pick out" any element of that set using intersection. The concept of shattered sets plays an important role in Vapnik–Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.

Definition
Suppose A is a set and C is a class of sets. The class C shatters the set A if for each subset a of A, there is some element c of C such that


 * $$a = c \cap A.$$

Equivalently, C shatters A when their intersection is equal to A's power set: P(A) = { c ∩ A | c ∈ C }.

We employ the letter C  to refer to a "class" or "collection" of sets, as in a Vapnik–Chervonenkis class (VC-class). The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points.

Example
We will show that the class of all discs in the plane (two-dimensional space) does not shatter every set of four points on the unit circle, yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle.

Let A be a set of four points on the unit circle and let C be the class of all discs. To test where C shatters A, we attempt to draw a disc around every subset of points in A. First, we draw a disc around the subsets of each isolated point. Next, we try to draw a disc around every subset of point pairs. This turns out to be doable for adjacent points, but impossible for points on opposite sides of the circle. As visualized below:

With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets (visualize connecting the dots).

Shatter coefficient
To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function). For a collection C of sets $$s \subset \Omega$$, $$\Omega$$ being any space, often a sample space,  we define the nth shattering coefficient of C as


 * $$ S_C(n) = \max_{\forall x_1,x_2,\dots,x_n \in \Omega } \operatorname{card} \{\,\{\,x_1,x_2,\dots,x_n\}\cap s, s\in C \}$$

where $$\operatorname{card}$$ denotes the cardinality of the set and $$x_1,x_2,\dots,x_n \in \Omega $$ is any set of n points,.

$$ S_C(n) $$ is the largest number of subsets of any set A of n points that can be formed by intersecting  A with the sets in collection C.

Here are some facts about $$S_C(n)$$: The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities.
 * 1) $$S_C(n)\leq 2^n$$ for all n because $$\{s\cap A|s\in C\}\subseteq P(A)$$ for any $$A\subseteq \Omega$$.
 * 2) If $$S_C(n)=2^n$$, that means there is a set of cardinality n, which can be shattered by C.
 * 3) If $$S_C(N)<2^N$$ for some $$N>1$$ then $$S_C(n)<2^n$$ for all $$n\geq N$$.

Vapnik–Chervonenkis class
The VC dimension of a class C is defined as
 * $$VC(C)=\underset{n}{\min}\{n:S_C(n)<2^n\}\,$$

or, alternatively, as
 * $$VC_0(C)=\underset{n}{\max}\{n:S_C(n)=2^n\}.\,$$

Note that $$VC(C)=VC_0(C)+1.$$

If for any n there is a set of cardinality n which can be shattered by C, then $$S_C(n)=2^n$$ for all n and the VC dimension of this class C is infinite.

A class with finite VC dimension is called a Vapnik–Chervonenkis class or VC class. A class C is uniformly Glivenko–Cantelli if and only if it is a VC class.