Concentration of measure

In mathematics, concentration of measure (about a median) is a principle that is applied in measure theory, probability and combinatorics, and has consequences for other fields such as Banach space theory. Informally, it states that "A random variable that depends in a Lipschitz way on many independent variables (but not too much on any of them) is essentially constant".

The concentration of measure phenomenon was put forth in the early 1970s by Vitali Milman in his works on the local theory of Banach spaces, extending an idea going back to the work of Paul Lévy. It was further developed in the works of Milman and Gromov, Maurey, Pisier, Schechtman, Talagrand, Ledoux, and others.

The general setting
Let $$(X, d)$$ be a metric space with a measure $$\mu$$ on the Borel sets with $$\mu(X) = 1$$. Let
 * $$\alpha(\epsilon) = \sup \left\{\mu( X \setminus A_\epsilon) \, | A \mbox{ is a Borel set and} \, \mu(A) \geq 1/2 \right\},$$

where
 * $$A_\epsilon = \left\{ x \, | \, d(x, A) < \epsilon \right\} $$

is the $$\epsilon$$-extension (also called $$\epsilon$$-fattening in the context of the Hausdorff distance) of a set $$A$$.

The function $$\alpha(\cdot)$$ is called the concentration rate of the space $$X$$. The following equivalent definition has many applications:
 * $$\alpha(\epsilon) = \sup \left\{ \mu( \{ F \geq M + \epsilon \}) \right\},$$

where the supremum is over all 1-Lipschitz functions $$F: X \to \mathbb{R}$$, and the median (or Lévy mean) $$ M = \mathop{\mathrm{Med}} F $$ is defined by the inequalities
 * $$\mu \{ F \geq M \} \geq 1/2, \, \mu \{ F \leq M \} \geq 1/2.$$
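
One way to see why the two definitions agree, sketched here in the notation above (the particular set $$A$$ and function $$F$$ below are chosen for illustration): if $$F$$ is 1-Lipschitz with median $$M$$ and $$A = \{ F \leq M \}$$, then $$\mu(A) \geq 1/2$$ and every $$x \in A_\epsilon$$ satisfies $$F(x) < M + \epsilon$$, so
 * $$\mu \{ F \geq M + \epsilon \} \leq \mu( X \setminus A_\epsilon).$$

Conversely, for a Borel set $$A$$ with $$\mu(A) \geq 1/2$$, the function $$F(x) = d(x, A)$$ is 1-Lipschitz, has $$0$$ as a median, and satisfies
 * $$\{ F \geq \epsilon \} = X \setminus A_\epsilon.$$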

Informally, the space $$X$$ exhibits a concentration phenomenon if $$\alpha(\epsilon)$$ decays very fast as $$\epsilon$$ grows. More formally, a family of metric measure spaces $$(X_n, d_n, \mu_n)$$ is called a Lévy family if the corresponding concentration rates $$\alpha_n$$ satisfy
 * $$\forall \epsilon > 0 \,\, \alpha_n(\epsilon) \to 0 {\rm \;as\; } n\to \infty,$$

and a normal Lévy family if
 * $$\forall \epsilon > 0 \,\, \alpha_n(\epsilon) \leq C \exp(-c n \epsilon^2)$$

for some constants $$c,C>0$$. For examples see below.

Concentration on the sphere
The first example goes back to Paul Lévy. According to the spherical isoperimetric inequality, among all subsets $$A$$ of the sphere $$S^n$$ with prescribed spherical measure $$\sigma_n(A)$$, the spherical cap
 * $$ \left\{ x \in S^n | \mathrm{dist}(x, x_0) \leq R \right\}, $$

for suitable $$R$$, has the $$\epsilon$$-extension $$A_\epsilon$$ of smallest measure (for any $$\epsilon > 0$$).

Applying this to sets of measure $$\sigma_n(A) = 1/2$$ (where $$\sigma_n(S^n) = 1$$), one can deduce the following concentration inequality:
 * $$\sigma_n(A_\epsilon) \geq 1 - C \exp(- c n \epsilon^2), $$

where $$C,c$$ are universal constants. Therefore the spheres $$(S^n)_n$$ satisfy the definition above of a normal Lévy family.
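
The decay can be observed numerically. The following is a minimal Monte Carlo sketch, not a proof: it takes the 1-Lipschitz function $$F(x) = x_1$$ on the unit sphere in $$\mathbb{R}^{dim}$$ (median $$0$$ by symmetry) and estimates the tail $$\sigma \{ F \geq \epsilon \}$$ for growing dimension. The function name `sphere_tail`, the sample size, and the reference curve $$\exp(-dim \cdot \epsilon^2 / 2)$$ are assumptions of this sketch; the reference curve is one commonly quoted form of the bound, and its constants are not taken from the text above.

```python
import numpy as np

def sphere_tail(dim, eps, samples=4000, seed=0):
    """Monte Carlo estimate of sigma{x : x_1 >= eps} on the unit sphere in R^dim."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, dim))
    # Normalising a standard Gaussian vector gives a uniformly distributed point on the sphere.
    x1 = g[:, 0] / np.linalg.norm(g, axis=1)
    return float(np.mean(x1 >= eps))

eps = 0.2
for dim in (10, 100, 1000):
    tail = sphere_tail(dim, eps)
    # Reference curve only; the constants 1 and 1/2 are an assumption of this sketch.
    reference = float(np.exp(-dim * eps**2 / 2))
    print(f"dim={dim:5d}  estimated tail={tail:.4f}  reference={reference:.4f}")
```

The estimated tail at fixed $$\epsilon$$ should shrink rapidly as the dimension grows, which is the behaviour required of a normal Lévy family.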

Vitali Milman applied this fact to several problems in the local theory of Banach spaces, in particular, to give a new proof of Dvoretzky's theorem.

Concentration of measure in physics
All of classical statistical physics is based on the concentration of measure phenomenon: the fundamental idea (‘theorem’) of the equivalence of ensembles in the thermodynamic limit (Gibbs, 1902, and Einstein, 1902–1904) is exactly the thin-shell concentration theorem. For each mechanical system, consider the phase space equipped with the invariant Liouville measure (the phase volume), with dynamics that conserve the energy E. The microcanonical ensemble is the invariant distribution on the surface of constant energy E, obtained by Gibbs as the limit of distributions in phase space with constant density in thin layers between the surfaces of states with energy E and with energy E+ΔE. The canonical ensemble is given by the probability density in phase space (with respect to the phase volume) $$\rho = e^{\frac{F - E}{k T}},$$ where the constants F and T are determined by the normalisation of probability and the given expectation of the energy E.

When the number of particles is large, the difference between the average values of the macroscopic variables for the canonical and microcanonical ensembles tends to zero, and their fluctuations can be evaluated explicitly. These results were proven rigorously by Khinchin (1943) under some regularity conditions on the energy function E. The simplest particular case, in which E is a sum of squares, was known in detail before Khinchin and Lévy, and even before Gibbs and Einstein: this is the Maxwell–Boltzmann distribution of the particle energy in an ideal gas.
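
The sum-of-squares case can be illustrated numerically. The sketch below assumes a hypothetical normalisation in which the energy surface is $$\sum_i x_i^2 = n$$ for $$n$$ degrees of freedom, so that each coordinate has mean square 1. It samples the microcanonical (uniform) distribution on that surface and compares the moments of a single coordinate with those of a standard Gaussian, which plays the role of the canonical distribution here. The function name `microcanonical_coordinate` and all parameter values are illustrative choices.

```python
import numpy as np

def microcanonical_coordinate(n_dof, samples=20000, seed=0):
    """First coordinate of uniform points on the surface sum(x_i^2) = n_dof."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, n_dof))
    # Rescale each Gaussian vector to radius sqrt(n_dof): uniform on the energy surface.
    x = np.sqrt(n_dof) * g / np.linalg.norm(g, axis=1, keepdims=True)
    return x[:, 0]

x1 = microcanonical_coordinate(n_dof=200)
second_moment = float(np.mean(x1**2))
fourth_moment = float(np.mean(x1**4))
# A standard Gaussian has second moment 1 and fourth moment 3.
print(f"second moment: {second_moment:.3f}, fourth moment: {fourth_moment:.3f}")
```

For a large number of degrees of freedom the two moments should be close to the Gaussian values, in line with the agreement of the two ensembles described above.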

The microcanonical ensemble is very natural from the naïve physical point of view: it is simply the natural equidistribution on the isoenergetic hypersurface. The canonical ensemble is very useful because of an important property: if a system consists of two non-interacting subsystems, i.e. if the energy E is the sum $$E=E_1(X_1)+E_2(X_2)$$, where $$X_1, X_2$$ are the states of the subsystems, then the equilibrium states of the subsystems are independent, and the equilibrium distribution of the system is the product of the equilibrium distributions of the subsystems with the same T. The equivalence of these ensembles is the cornerstone of the mechanical foundations of thermodynamics.
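
The product property of the canonical ensemble can be checked directly from its density. As a sketch, write $$F_1, F_2$$ (symbols introduced here for illustration) for the normalising constants of the two subsystems at the same temperature $$T$$. With $$F = F_1 + F_2$$,
 * $$e^{\frac{F - E}{k T}} = e^{\frac{F_1 - E_1(X_1)}{k T}} \, e^{\frac{F_2 - E_2(X_2)}{k T}},$$

so the joint density factorises into the canonical densities of the two subsystems, and it integrates to 1 because each factor does.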

Other examples

 * Borell–TIS inequality
 * Gaussian isoperimetric inequality
 * McDiarmid's inequality
 * Talagrand's concentration inequality
 * Asymptotic equipartition property