Disintegration theorem

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It gives a rigorous meaning to the idea of a non-trivial "restriction" of a measure to a measure-zero subset of the measure space in question. It is closely related to the existence of conditional probability measures. In a sense, "disintegration" is the reverse of the construction of a product measure.

Motivation
Consider the unit square $$S = [0,1]\times[0,1]$$ in the Euclidean plane $$\mathbb{R}^2$$, and let $$\mu$$ be the probability measure on $$S$$ given by the restriction of two-dimensional Lebesgue measure $$\lambda^2$$ to $$S$$. That is, the probability of a measurable event $$E\subseteq S$$ is simply the area of $$E$$.

Now consider a one-dimensional subset of $$S$$, such as the vertical line segment $$L_x = \{x\}\times[0, 1]$$. The segment $$L_x$$ has $$\mu$$-measure zero. Because Lebesgue measure is complete, every subset of $$L_x$$ is therefore a $$\mu$$-null set: $$E \subseteq L_{x} \implies \mu (E) = 0.$$

While true, this is somewhat unsatisfying. It would be nice to say that $$\mu$$ "restricted to" $$L_x$$ is the one-dimensional Lebesgue measure $$\lambda^1$$, rather than the zero measure. The probability of a "two-dimensional" event $$E$$ could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $$E\cap L_x$$: more formally, if $$\mu_x$$ denotes one-dimensional Lebesgue measure on $$L_x$$, then $$\mu (E) = \int_{[0, 1]} \mu_{x} (E \cap L_{x}) \, \mathrm{d} x$$ for any "nice" $$E\subseteq S$$. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
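The slicing formula can be checked numerically. The sketch below uses the hypothetical event $$E = \{(x,y)\in S : y \le x^2\}$$, whose slices $$E\cap L_x = \{x\}\times[0,x^2]$$ have one-dimensional measure $$x^2$$. It compares two quantities: the integral of the slice lengths over $$[0,1]$$, computed with the midpoint rule, and a brute-force two-dimensional grid count of the area. Both should approach the exact area $$1/3$$. The event and the function names are illustrative choices, not part of the theorem.

```python
def slice_length(x: float) -> float:
    """mu_x(E ∩ L_x) for the example event E = {(x, y) in S : y <= x**2}.

    The slice E ∩ L_x is {x} × [0, x**2], whose one-dimensional
    Lebesgue measure is x**2.
    """
    return x ** 2


def integrate_over_unit_interval(g, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))


def grid_area(indicator, m: int = 400) -> float:
    """Brute-force area estimate: count the m×m grid cells whose center lies in E."""
    h = 1.0 / m
    inside = sum(
        indicator((i + 0.5) * h, (j + 0.5) * h) for i in range(m) for j in range(m)
    )
    return inside * h * h


sliced = integrate_over_unit_interval(slice_length)  # integral of slice lengths
direct = grid_area(lambda x, y: y <= x ** 2)          # two-dimensional area estimate
print(f"{sliced:.6f} {direct:.6f}")  # both close to 1/3
```

The grid count has an error of order $$1/m$$. The midpoint rule is exact up to $$1/(12n^2)$$ for this quadratic integrand.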

Statement of the theorem
(Hereafter, $$\mathcal{P}(X)$$ will denote the collection of Borel probability measures on a topological space $$(X, T)$$.) The assumptions of the theorem are as follows:
 * Let $$Y$$ and $$X$$ be two Radon spaces, i.e. topological spaces on which every Borel probability measure is inner regular (for example, Polish spaces, that is, separable completely metrizable spaces). In particular, every Borel probability measure on such a space is a Radon measure.
 * Let $$\mu\in\mathcal{P}(Y)$$.
 * Let $$\pi : Y\to X$$ be a Borel-measurable function. One should think of $$\pi$$ as the map that "disintegrates" $$Y$$, in the sense of partitioning $$Y$$ into the fibers $$\{ \pi^{-1}(x)\ |\ x \in X\}$$. In the motivating example above, one can take $$\pi((a,b)) = a$$ for $$(a,b) \in [0,1]\times [0,1]$$. Then $$\pi^{-1}(a) = \{a\} \times [0,1]$$, which is exactly the vertical slice we want to capture.
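 * Let $$\nu \in\mathcal{P}(X)$$ be the pushforward measure $$\nu = \pi_{*}(\mu) = \mu \circ \pi^{-1}$$. It gives the distribution of the point $$x = \pi(y)$$ when $$y$$ is distributed according to $$\mu$$. In other words, $$\nu(A)$$ is the $$\mu$$-mass of the union of the fibers $$\pi^{-1}(x)$$, $$x\in A$$.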
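As a worked instance, take the motivating example with $$\pi((a,b)) = a$$. For every Borel set $$A\subseteq[0,1]$$, the pushforward is

$$\nu(A) = \mu\left(\pi^{-1}(A)\right) = \lambda^2\left(A\times[0,1]\right) = \lambda^1(A)\cdot\lambda^1([0,1]) = \lambda^1(A),$$

so $$\nu$$ is one-dimensional Lebesgue measure on $$[0,1]$$. This agrees with the $$\mathrm{d}x$$ appearing in the slicing formula of the motivation.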

The conclusion of the theorem: there exists a $$\nu$$-almost everywhere uniquely determined family of probability measures $$\{\mu_x\}_{x\in X} \subseteq \mathcal{P}(Y)$$, called a "disintegration" of $$\mu$$, such that:
 * the function $$x \mapsto \mu_{x}$$ is Borel measurable, in the sense that $$x \mapsto \mu_{x} (B)$$ is a Borel-measurable function for each Borel-measurable set $$B\subseteq Y$$;
 * $$\mu_x$$ "lives on" the fiber $$\pi^{-1}(x)$$: for $$\nu$$-almost all $$x\in X$$, $$\mu_{x} \left( Y \setminus \pi^{-1} (x) \right) = 0,$$ and so $$\mu_x(E) =\mu_x(E\cap\pi^{-1}(x))$$ for every Borel set $$E\subseteq Y$$;
 * for every Borel-measurable function $$f : Y \to [0,\infty]$$, $$\int_{Y} f(y) \, \mathrm{d} \mu (y) = \int_{X} \int_{\pi^{-1} (x)} f(y) \, \mathrm{d} \mu_x (y) \, \mathrm{d} \nu (x).$$ In particular, for any Borel set $$E\subseteq Y$$, taking $$f$$ to be the indicator function of $$E$$ gives $$\mu (E) = \int_X \mu_x (E) \, \mathrm{d} \nu (x).$$
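On a finite space the theorem reduces to elementary conditioning, so the three conclusions are easy to check by hand. The sketch below is a hypothetical illustration, not a general algorithm, since the theorem's real content lies in the case where fibers have measure zero. It takes a finitely supported $$\mu$$ on $$Y = \{0,1\}\times\{0,1,2\}$$ and the projection $$\pi(y) = y_1$$ onto the first coordinate. It then computes $$\nu = \pi_*(\mu)$$ and $$\mu_x(\{y\}) = \mu(\{y\})/\nu(\{x\})$$ on each fiber, and checks the integral formula for one test function. The helper names `pushforward`, `disintegrate`, `integrate` and `project`, and the weights in `mu`, are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, pi):
    """nu = pi_*(mu), i.e. nu({x}) = mu(pi^{-1}({x})), for a finitely supported mu."""
    nu = defaultdict(Fraction)
    for y, p in mu.items():
        nu[pi(y)] += p
    return dict(nu)


def disintegrate(mu, pi):
    """Return (nu, family) with family[x] = mu_x for every x of positive nu-mass.

    On a finite space mu_x({y}) = mu({y}) / nu({x}) for y in the fiber pi^{-1}(x).
    Points with nu({x}) = 0 are skipped: the family is only determined nu-a.e.
    """
    nu = pushforward(mu, pi)
    family = {x: {} for x, w in nu.items() if w > 0}
    for y, p in mu.items():
        x = pi(y)
        if x in family:
            family[x][y] = p / nu[x]
    return nu, family


def integrate(measure, f):
    """Integral of f against a finitely supported measure given as {point: mass}."""
    return sum(p * f(y) for y, p in measure.items())


def project(y):
    """The map pi: Y -> X, here the first coordinate (as in the unit-square example)."""
    return y[0]


# An illustrative probability measure on Y = {0, 1} × {0, 1, 2}.
mu = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 12),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 6),
}
nu, family = disintegrate(mu, project)


def f(y):
    return y[0] + y[1]


lhs = integrate(mu, f)
rhs = sum(nu[x] * integrate(mu_x, f) for x, mu_x in family.items())
print(lhs, rhs)  # 19/12 19/12
```

Exact `Fraction` arithmetic is used so that the two sides of the integral formula can be compared with `==` rather than up to rounding error.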

Product spaces
The motivating example above is a special case of the product-space setting, to which the disintegration theorem applies.

When $$Y$$ is written as a Cartesian product $$Y = X_1\times X_2$$ and $$\pi_i : Y\to X_i$$ is the natural projection, each fiber $$\pi_1^{-1}(x_1)$$ can be canonically identified with $$X_2$$. There then exists a Borel family of probability measures $$\{ \mu_{x_{1}} \}_{x_{1} \in X_{1}}$$ in $$\mathcal{P}(X_2)$$, uniquely determined $$(\pi_1)_*(\mu)$$-almost everywhere, such that $$\mu = \int_{X_{1}} \mu_{x_{1}} \, \mu \left(\pi_1^{-1}(\mathrm d x_1) \right)= \int_{X_{1}} \mu_{x_{1}} \, \mathrm{d} (\pi_{1})_{*} (\mu) (x_{1}).$$ Writing $$\mu(\cdot\mid x_1)$$ for $$\mu_{x_1}$$, this means in particular that, for every Borel-measurable $$f : X_1\times X_2 \to [0,\infty]$$, $$\int_{X_1\times X_2} f(x_1,x_2)\, \mu(\mathrm d x_1,\mathrm d x_2) = \int_{X_1}\left( \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2\mid x_1) \right) \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right),$$ and, for Borel sets $$A\subseteq X_1$$ and $$B\subseteq X_2$$, $$\mu(A \times B) = \int_A \mu\left(B\mid x_1\right) \, \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right).$$
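In the finite product case, the conditional measures $$\mu(\cdot\mid x_1)$$ are simply the normalized rows of the joint probability table, viewed as probability vectors on $$X_2$$. This reflects the canonical identification of each fiber with $$X_2$$. The sketch below checks the rectangle formula for $$\mu(A\times B)$$ on an illustrative $$2\times 3$$ table. The table entries and the helper names are assumptions made for the example.

```python
from fractions import Fraction

# Illustrative joint probabilities mu({(x1, x2)}) on X1 × X2 with
# X1 = {0, 1} (rows) and X2 = {0, 1, 2} (columns).
joint = [
    [Fraction(1, 10), Fraction(1, 5), Fraction(1, 10)],
    [Fraction(3, 10), Fraction(1, 10), Fraction(1, 5)],
]


def marginal(joint):
    """(pi_1)_*(mu) as a list indexed by x1: the row sums."""
    return [sum(row) for row in joint]


def conditionals(joint):
    """mu(. | x1) as a probability vector on X2, for each x1 of positive mass."""
    nu = marginal(joint)
    return {x1: [p / nu[x1] for p in row] for x1, row in enumerate(joint) if nu[x1] > 0}


def rectangle_direct(joint, A, B):
    """mu(A × B) summed straight from the joint table."""
    return sum(joint[i][j] for i in A for j in B)


def rectangle_disintegrated(joint, A, B):
    """mu(A × B) as the sum over x1 in A of mu(B | x1) times (pi_1)_*(mu)({x1})."""
    nu = marginal(joint)
    cond = conditionals(joint)
    return sum(nu[i] * sum(cond[i][j] for j in B) for i in A if i in cond)


print(rectangle_direct(joint, {0, 1}, {0, 2}),
      rectangle_disintegrated(joint, {0, 1}, {0, 2}))  # 7/10 7/10
```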

The relation to conditional expectation is given by the identities $$\operatorname E(f\mid \pi_1)(x_1)= \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2\mid x_1),$$ $$\mu(A\times B\mid \pi_1)(x_1)= 1_A(x_1) \cdot \mu(B\mid x_1),$$ which hold for $$(\pi_1)_*(\mu)$$-almost every $$x_1\in X_1$$, with $$f$$, $$A$$ and $$B$$ as above.
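For a finite joint table, $$\operatorname E(f\mid\pi_1)(x_1)$$ is a weighted average of $$f$$ along the row of $$x_1$$. Integrating it against $$(\pi_1)_*(\mu)$$ recovers $$\int f\,\mathrm d\mu$$, which is the law of total expectation and the disintegration formula read in the other direction. The sketch below checks this identity and the indicator identity for $$\mu(A\times B\mid\pi_1)$$ on an illustrative $$3\times 2$$ table. The table, the test function `f`, and the name `cond_expectation` are hypothetical.

```python
from fractions import Fraction

# Illustrative joint probabilities on X1 = {0, 1, 2} (rows) × X2 = {0, 1} (columns).
joint = [
    [Fraction(1, 8), Fraction(1, 8)],
    [Fraction(1, 4), Fraction(1, 8)],
    [Fraction(1, 8), Fraction(1, 4)],
]


def cond_expectation(joint, f):
    """E(f | pi_1)(x1) = sum over x2 of f(x1, x2) * mu({x2} | x1), for x1 of positive mass."""
    result = {}
    for x1, row in enumerate(joint):
        mass = sum(row)  # (pi_1)_*(mu)({x1})
        if mass > 0:
            result[x1] = sum(f(x1, x2) * p / mass for x2, p in enumerate(row))
    return result


def f(x1, x2):
    return x1 + 2 * x2


ef = cond_expectation(joint, f)
marginal = [sum(row) for row in joint]
tower = sum(marginal[x1] * v for x1, v in ef.items())  # integral of E(f | pi_1)
direct = sum(f(x1, x2) * p for x1, row in enumerate(joint) for x2, p in enumerate(row))
print(tower, direct)  # 17/8 17/8
```

Taking `f` to be the indicator of $$A\times B$$ makes each row average equal to $$1_A(x_1)\,\mu(B\mid x_1)$$, which is the second identity.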

Vector calculus
The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface $$\Sigma \subset \mathbb{R}^3$$, it is implicit that the "correct" measure on $$\Sigma$$ is the disintegration of three-dimensional Lebesgue measure $$\lambda^3$$ on $$\Sigma$$, and that the disintegration of this measure on $$\partial\Sigma$$ is the same as the disintegration of $$\lambda^3$$ on $$\partial\Sigma$$.

Conditional distributions
The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability. It is also closely related to the Borel–Kolmogorov paradox.