Polar factorization theorem

In optimal transport, a branch of mathematics, the polar factorization of vector fields is a basic result due to Brenier (1987), with antecedents in the work of Knott and Smith (1984) and Rachev (1985). It generalizes several earlier results, among them the polar decomposition of real matrices and the rearrangement of real-valued functions.

The theorem
 Notation. Denote by $$\xi_\# \mu$$ the image measure (pushforward) of $$\mu$$ through the map $$\xi$$.

 Definition: Measure preserving map.  Let $$(X,\mu)$$ and $$(Y,\nu)$$ be probability spaces and $$\sigma :X \rightarrow Y$$ a measurable map. Then $$\sigma$$ is said to be measure preserving iff $$\sigma_{\#}\mu = \nu$$, where $$\sigma_{\#}\mu$$ is the pushforward of $$\mu$$ by $$\sigma$$. Spelled out: for every $$\nu$$-measurable subset $$\Omega$$ of $$Y$$, $$\sigma^{-1}(\Omega)$$ is $$\mu$$-measurable, and $$\mu(\sigma^{-1}(\Omega))=\nu(\Omega )$$. The latter is equivalent to:
 * $$ \int_{X}(f\circ \sigma)(x) \mu(dx) =\int_X (\sigma^*f)(x) \mu(dx) =\int_Y f(y) (\sigma_{\#}\mu)(dy) = \int_{Y}f(y) \nu(dy)$$

where $$f$$ is $$\nu$$-integrable and $$ f\circ \sigma $$ is $$\mu$$-integrable.
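As an illustration of the integral identity above, the following minimal sketch checks it numerically for one classical example that is not taken from the text: the doubling map $$\sigma(t)=2t \bmod 1$$ on $$[0,1]$$, which preserves the Lebesgue measure. The test function `f`, the grid size `n` and the midpoint-rule discretization are illustrative assumptions.

```python
import numpy as np

def doubling_map(t):
    """sigma(t) = 2t mod 1, a standard Lebesgue-measure-preserving map of [0, 1]."""
    return (2.0 * t) % 1.0

def f(y):
    """Hypothetical test function; any integrable function would do."""
    return y ** 2

n = 10_000
t = (np.arange(n) + 0.5) / n          # midpoint grid on [0, 1]

lhs = np.mean(f(doubling_map(t)))     # approximates the integral of f(sigma(x)) against mu
rhs = np.mean(f(t))                   # approximates the integral of f(y) against nu
```

Both averages approximate $$\int_0^1 y^2\,dy = 1/3$$, as expected when $$\sigma_{\#}\mu=\nu$$ with $$\mu=\nu$$ the Lebesgue measure.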

 Theorem.  Consider a map $$\xi :\Omega \rightarrow R^{d}$$ where $$\Omega$$ is a convex subset of $$R^{d}$$, and $$\mu$$ a measure on $$\Omega$$ which is absolutely continuous with respect to the Lebesgue measure. Assume that $$\xi_{\#}\mu$$ is also absolutely continuous. Then there is a convex function $$\varphi :\Omega \rightarrow R$$ and a map $$\sigma :\Omega \rightarrow \Omega$$ preserving $$\mu$$ such that

$$ \xi =\left( \nabla \varphi \right) \circ \sigma $$

In addition, $$\nabla \varphi$$ and $$\sigma$$ are uniquely defined almost everywhere.

Dimension 1
In dimension 1, when $$\mu$$ is the Lebesgue measure (that is, the uniform distribution) over the unit interval $$\left[0,1\right]$$, the result specializes to Ryff's theorem, and the polar decomposition boils down to

$$ \xi \left( t\right) =F_{X}^{-1}\left( \sigma \left( t\right) \right) $$

where $$F_{X}$$ is the cumulative distribution function of the random variable $$\xi \left( U\right)$$ and $$U$$ has a uniform distribution over $$\left[ 0,1\right]$$. $$F_{X}$$ is assumed to be continuous, and $$\sigma \left( t\right)=F_{X}\left( \xi \left( t\right) \right)$$ preserves the Lebesgue measure on $$\left[ 0,1\right]$$.
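The one-dimensional factorization has a simple discrete analogue, sketched below under stated assumptions: $$\xi$$ is sampled on a uniform grid, the sorted samples play the role of the quantile function $$F_{X}^{-1}$$, and the rank of each sample plays the role of $$\sigma$$. The function name `polar_factorization_1d` and the example map are hypothetical and chosen only for illustration.

```python
import numpy as np

def polar_factorization_1d(xi_values):
    """Discrete analogue of xi = F^{-1} o sigma on a uniform grid.

    Returns (quantile, sigma_idx) with xi_values == quantile[sigma_idx], where
    quantile is nondecreasing and sigma_idx is a permutation of the grid indices.
    """
    xi_values = np.asarray(xi_values, dtype=float)
    order = np.argsort(xi_values, kind="stable")
    quantile = xi_values[order]                 # nondecreasing rearrangement of xi
    sigma_idx = np.empty_like(order)
    sigma_idx[order] = np.arange(len(order))    # rank of each sample
    return quantile, sigma_idx

n = 1000
t = (np.arange(n) + 0.5) / n                    # uniform grid on [0, 1]
xi = np.sin(2 * np.pi * t) + 0.3 * t            # hypothetical example map
quantile, sigma_idx = polar_factorization_1d(xi)
sigma_t = t[sigma_idx]                          # discrete sigma(t), a permutation of the grid
```

Because `sigma_idx` is a permutation, the discrete map preserves the uniform measure on the grid, and `quantile[sigma_idx]` reproduces `xi` exactly.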

Polar decomposition of matrices
When $$\xi$$ is a linear map and $$\mu$$ is the standard normal distribution, the result coincides with the polar decomposition of matrices. Assuming $$\xi \left( x\right) =Mx$$ where $$M$$ is an invertible $$d\times d$$ matrix and taking $$\mu$$ to be the $$\mathcal{N}\left( 0,I_{d}\right)$$ probability measure, the polar decomposition boils down to

$$ M=SO $$

where $$S$$ is a symmetric positive definite matrix, and $$O$$ an orthogonal matrix. The connection with the polar factorization is $$\varphi \left(x\right) =x^{\top }Sx/2$$ which is convex, and $$\sigma \left( x\right) =Ox$$ which preserves the $$\mathcal{N}\left( 0,I_{d}\right)$$ measure.
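The matrix case can be checked numerically. The sketch below builds $$S$$ and $$O$$ from a singular value decomposition, which is one standard way to obtain a polar decomposition and is not prescribed by the text. The helper name `left_polar` and the example matrix are illustrative assumptions.

```python
import numpy as np

def left_polar(M):
    """Return (S, O) with M = S @ O, S symmetric positive semidefinite, O orthogonal.

    Uses the SVD M = U diag(s) V^T, so that S = U diag(s) U^T and O = U V^T.
    S is positive definite when M is invertible.
    """
    U, s, Vt = np.linalg.svd(M)
    S = (U * s) @ U.T      # U diag(s) U^T
    O = U @ Vt
    return S, O

# Hypothetical invertible example matrix (determinant 7).
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
S, O = left_polar(M)

# grad(phi)(y) = S y for phi(y) = y^T S y / 2, and sigma(x) = O x.
x = np.array([1.0, -2.0, 0.5])
xi_x = S @ (O @ x)         # (grad phi)(sigma(x)), which should equal M @ x
```

Since $$O$$ is orthogonal, $$x\mapsto Ox$$ leaves the $$\mathcal{N}\left( 0,I_{d}\right)$$ density unchanged, which is why it is measure preserving in this setting.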

Helmholtz decomposition
The result also allows one to recover the Helmholtz decomposition. A smooth vector field $$x\mapsto V\left( x\right)$$ can be written in a unique way as

$$ V=w+\nabla p $$

where $$p$$ is a smooth real function defined on $$\Omega$$, unique up to an additive constant, and $$w$$ is a smooth divergence-free vector field, parallel to the boundary of $$\Omega$$.

The connection can be seen by assuming $$\mu $$ is the Lebesgue measure on a compact set $$\Omega \subset R^{n}$$ and by writing $$\xi$$ as a perturbation of the identity map

$$ \xi _{\epsilon }(x)=x+\epsilon V(x) $$

where $$\epsilon$$ is small. The polar decomposition of $$\xi _{\epsilon }$$ is given by $$\xi _{\epsilon }=(\nabla \varphi_{\epsilon })\circ \sigma_{\epsilon }$$. Then, for any test function $$f:R^{n}\rightarrow R$$ the following holds:

$$ \int_{\Omega }f(x+\epsilon V(x))dx=\int_{\Omega }f((\nabla \varphi _{\epsilon })\circ \sigma _{\epsilon }\left( x\right) )dx=\int_{\Omega }f(\nabla \varphi _{\epsilon }\left( x\right) )dx $$

where the second equality uses the fact that $$\sigma _{\epsilon }$$ preserves the Lebesgue measure.

In fact, as $$\textstyle \varphi _{0}(x)=\frac{1}{2}\Vert x\Vert ^{2}$$, one can expand $$\textstyle \varphi _{\epsilon }(x)=\frac{1}{2}\Vert x\Vert ^{2}+\epsilon p(x)+O(\epsilon ^{2})$$, and therefore $$\textstyle \nabla \varphi_{\epsilon }\left( x\right) =x+\epsilon \nabla p(x)+O(\epsilon ^{2})$$. As a result, $$\textstyle \int_{\Omega }\left( V(x)-\nabla p(x)\right) \cdot \nabla f(x)\,dx=0$$ for any smooth function $$f$$, which implies that $$w\left( x\right) =V(x)-\nabla p(x)$$ is divergence-free.
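The first-order expansion behind this last step can be spelled out as follows. This is a sketch, assuming that $$f$$ is smooth enough for a Taylor expansion and that the $$O(\epsilon ^{2})$$ remainders are uniform on $$\Omega$$. Expanding the left-hand and right-hand sides of the integral identity in $$\epsilon$$ gives

$$ \int_{\Omega }f(x+\epsilon V(x))dx=\int_{\Omega }f(x)dx+\epsilon \int_{\Omega }\nabla f(x)\cdot V(x)dx+O(\epsilon ^{2}) $$

$$ \int_{\Omega }f(\nabla \varphi _{\epsilon }(x))dx=\int_{\Omega }f(x)dx+\epsilon \int_{\Omega }\nabla f(x)\cdot \nabla p(x)dx+O(\epsilon ^{2}) $$

Since the two left-hand sides are equal for every small $$\epsilon$$, the terms of order $$\epsilon$$ must agree, so that $$V-\nabla p$$ is orthogonal in $$L^{2}(\Omega )$$ to the gradient of every smooth test function.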