Kramers–Moyal expansion

In stochastic processes, the Kramers–Moyal expansion refers to a Taylor series expansion of the master equation, named after Hans Kramers and José Enrique Moyal. In many textbooks, the expansion is used only to derive the Fokker–Planck equation and is never used again. In general, continuous stochastic processes are essentially all Markovian, and so Fokker–Planck equations are sufficient for studying them. The higher-order terms of the Kramers–Moyal expansion only come into play when the process has jumps, which usually means it is a Poisson-like process.

For a real stochastic process, one can compute its central moment functions from experimental data on the process, from which one can then compute its Kramers–Moyal coefficients, and thus empirically measure its Kolmogorov forward and backward equations. This procedure is implemented as a Python package.
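The estimation procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the package mentioned in the text: the function names `simulate_ou` and `estimate_km`, the Ornstein–Uhlenbeck test process, and the simple histogram binning are all assumptions made for the example. The idea is to group the samples by their current value $$x$$, average the $$n$$-th power of the next increment within each group, and divide by $$n!\,\tau$$.

```python
import math
import numpy as np

def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = -theta*X dt + sigma dW (synthetic test data)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(dt), size=n_steps - 1)
    x = np.empty(n_steps)
    x[0] = 0.0
    for i in range(n_steps - 1):
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]
    return x

def estimate_km(x, dt, order, bins):
    """Binned estimate of D_order(x): mean of (x_{t+dt} - x_t)^order / (order! * dt)."""
    start = x[:-1]
    increments = np.diff(x)
    edges = np.linspace(start.min(), start.max(), bins + 1)
    # Bin index of each starting point; the maximum is folded into the last bin.
    idx = np.clip(np.digitize(start, edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    sums = np.bincount(idx, weights=increments**order, minlength=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        moment = sums / counts  # NaN where a bin is empty
    return centers, moment / (math.factorial(order) * dt), counts
```

For data generated with `simulate_ou(theta=1.0, sigma=1.0, dt=0.01, n_steps=200_000)`, the first-order estimate should be close to a straight line of slope $$-\theta$$ and the second-order estimate close to the constant $$\sigma^2/2$$, up to sampling noise and the finite time step. Sparsely populated bins are noisy and are best ignored.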

Statement
Start with the master equation in its integral (Chapman–Kolmogorov) form


 * $$p(x,t) =\int p(x,t|x_0, t_0)p(x_0, t_0) dx_0$$

where $$p(x, t|x_0, t_0)$$ is the transition probability function, and $$p(x,t)$$ is the probability density at time $$t$$. The Kramers–Moyal expansion transforms the above to an infinite order partial differential equation


 * $$\partial_t p(x,t) = \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) p(x,t)]$$

and also

 * $$\partial_t p(x, t|x_0, t_0) = \sum_{n=1}^\infty (-\partial_x)^n [D_n(x, t) p(x, t|x_0, t_0) ] $$

where $$D_n(x, t)$$ are the Kramers–Moyal coefficients, defined by

 * $$D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau)$$

and $$\mu_n$$ are the central moment functions, defined by


 * $$\mu_n(t' | x, t) = \int_{-\infty}^\infty (x'-x)^n p(x', t'\mid x, t) \ dx'.$$

The Fokker–Planck equation is obtained by keeping only the first two terms of the series, in which $$D_1$$ is the drift and $$D_2$$ is the diffusion coefficient.

Also, the moments, assuming they exist, evolve as

 * $$\frac{\partial}{\partial t}\left\langle x^n\right\rangle=\sum_{k=1}^n \frac{n !}{(n-k) !}\left\langle x^{n-k} D_k(x, t)\right\rangle$$

where angled brackets denote the expectation: $$\left\langle f\right\rangle = \int f(x) p(x, t)dx$$.
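As a quick worked example of the moment equations, take $$D_1 = -\theta x$$, $$D_2 = \sigma^2/2$$ and $$D_n = 0$$ for $$n \geq 3$$. The linear drift, the constant diffusion, and the constants $$\theta > 0$$ and $$\sigma$$ are assumptions chosen for illustration only. The cases $$n=1$$ and $$n=2$$ then read

 * $$\frac{\partial}{\partial t}\left\langle x\right\rangle = \left\langle D_1\right\rangle = -\theta\left\langle x\right\rangle$$
 * $$\frac{\partial}{\partial t}\left\langle x^2\right\rangle = 2\left\langle x D_1\right\rangle + 2\left\langle D_2\right\rangle = -2\theta\left\langle x^2\right\rangle + \sigma^2$$

so the mean decays exponentially at rate $$\theta$$ and the second moment relaxes to the stationary value $$\sigma^2/(2\theta)$$.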

n-dimensional version
The above is the one-dimensional version. It generalizes to n dimensions (Section 4.7).

Proof
In ordinary probability, where the probability density does not change in time, the moments of a probability density function determine the density itself through a Fourier transform (details may be found at the characteristic function page):

 * $$ \tilde p(k) = \int e^{ikx} p(x) dx = \sum_{n=0}^\infty\frac{(ik)^n}{n!} \mu_n $$
 * $$p(x) = \frac{1}{2\pi} \int e^{-ikx}\tilde p(k)dk = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x)\mu_n $$

Similarly, for the transition probability,

 * $$p(x, t| x_0, t_0 ) = \sum_{n=0}^\infty \frac{(-1)^n}{n!}\delta^{(n)}(x-x_0) \mu_n(t|x_0, t_0)$$

Now we need to integrate away the Dirac delta function. Fixing a small $$\tau > 0$$, the Chapman–Kolmogorov equation gives

 * $$\begin{align} p(x, t) &= \int p(x,t|x', t-\tau) p(x', t-\tau) dx' \\ &= \sum_{n=0}^\infty \frac{(-1)^n}{n!}\int p(x', t-\tau) \delta^{(n)}(x-x') \mu_n(t|x', t-\tau) dx' \\ &= \sum_{n=0}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau)) \end{align} $$

where the last line uses $$\int \delta^{(n)}(x-x') f(x') dx' = \partial_x^n f(x)$$. The $$n=0$$ term is just $$p(x, t-\tau)$$, because $$\mu_0 = 1$$. Moving it to the left-hand side, dividing by $$\tau$$, and letting $$\tau \to 0^+$$ gives the time derivative:

 * $$\partial_t p(x, t) = \lim_{\tau \to 0^+}\frac 1\tau \sum_{n=1}^\infty \frac{(-1)^n}{n!} \partial_x^n (p(x, t-\tau) \mu_n(t|x, t-\tau)) = \sum_{n=1}^\infty (-\partial_x)^n (p(x, t) D_n(x, t)) $$

The same computation with $$p(x, t|x_0, t_0)$$ gives the other equation.

Forward and backward equations
The equation can be recast into a linear operator form, using the idea of an infinitesimal generator. Define the linear operator

 * $$\mathcal A f := \sum_{n=1}^\infty (-\partial_x)^n[D_n(x,t) f(x,t)] $$

Then the equation above states

 * $$\begin{align} \partial_t p(x, t) &= \mathcal{A} p(x, t) \\ \partial_t p(x, t|x_0, t_0) &= \mathcal{A} p(x, t|x_0, t_0) \end{align} $$

In this form, the equations are precisely in the form of a general Kolmogorov forward equation. The backward equation then states that

 * $$\partial_t p(x_1, t_1|x, t) = -\mathcal{A}^\dagger p(x_1, t_1|x, t) $$

where

 * $$\mathcal A^\dagger f := \sum_{n=1}^\infty D_n(x,t) \partial_x^n[f(x,t)] $$

is the Hermitian adjoint of $$\mathcal A$$.

Computing the Kramers–Moyal coefficients
By definition,

 * $$D_n(x, t) = \frac{1}{n!}\lim_{\tau\to 0} \frac{1}{\tau} \mu_n(t|x, t-\tau)$$

This definition works because $$\mu_n(t|x, t) = 0$$ for $$n \geq 1$$, as those are the central moments of the Dirac delta function. Since the even central moments are nonnegative, we have $$D_{2n} \geq 0$$ for all $$n\geq 1$$.

When the stochastic process is the Markov process $$dX = b\,dt + \sigma\,dW_t$$, the transition density $$p(x', t|x, t-\tau)$$ is, for small $$\tau$$, approximated by a normal distribution in $$x'$$ with mean $$x + b(x)\tau$$ and variance $$\sigma^2\tau$$. Its moments about $$x$$ are $$\mu_1 = b\tau$$, $$\mu_2 = \sigma^2\tau + b^2\tau^2$$, and $$\mu_n = O(\tau^2)$$ for $$n \geq 3$$, and so

 * $$D_1 = b, \quad D_2 = \frac 12 \sigma^2, \quad D_3=D_4=\cdots = 0$$

This then gives us the 1-dimensional Fokker–Planck equation:

 * $$\partial_t p = -\partial_x(bp) + \frac 12 \partial_x^2(\sigma^2 p)$$
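The limit in the definition can be checked numerically for the diffusion case. The sketch below is an illustration under the Gaussian short-time approximation described above; the helper names `gaussian_moment_about_start` and `km_coefficient` are made up for this example. It uses the closed-form moments of a normal variable with mean $$b\tau$$ and variance $$\sigma^2\tau$$, and divides by $$n!\,\tau$$ at a small but finite $$\tau$$.

```python
from math import factorial

def gaussian_moment_about_start(n, b, sigma, tau):
    """n-th moment (n = 1..4) of X' - x when X' ~ Normal(x + b*tau, sigma^2 * tau)."""
    m, v = b * tau, sigma**2 * tau  # mean and variance of the displacement X' - x
    closed_form = {
        1: m,
        2: m**2 + v,
        3: m**3 + 3 * m * v,
        4: m**4 + 6 * m**2 * v + 3 * v**2,
    }
    return closed_form[n]

def km_coefficient(n, b, sigma, tau):
    """Finite-tau approximation of D_n = mu_n / (n! * tau)."""
    return gaussian_moment_about_start(n, b, sigma, tau) / (factorial(n) * tau)
```

Evaluating `km_coefficient(n, b, sigma, tau)` for decreasing `tau` shows the first two values settling at $$b$$ and $$\sigma^2/2$$, while the third and fourth shrink in proportion to `tau`.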

Pawula theorem
The Pawula theorem states that either the sequence $$D_1, D_2, D_3, \ldots$$ vanishes from the third term onward, or all its even terms are positive.

Proof
By the Cauchy–Schwarz inequality, the central moment functions satisfy $$\mu_{n+m}^2 \leq \mu_{2n}\mu_{2m}$$. So, taking the limit, we have $$D_{n+m}^2 \leq \frac{(2n)!(2m)!}{(n+m)!^2}D_{2n}D_{2m}$$. If $$D_{2+n} \neq 0$$ for some $$n \geq 1$$, then $$D_2 D_{2+2n}> 0$$. Repeating the argument with $$2n$$ in place of $$n$$ gives $$D_{2+2n}, D_{2+4n}, D_{2+8n}, \ldots > 0$$. So the existence of any nonzero coefficient of order $$\geq 3$$ implies the existence of nonzero coefficients of arbitrarily large order. Also, if $$D_n \neq 0$$, then $$D_2D_{2n-2} > 0, D_4D_{2n-4} > 0, \ldots$$. So the existence of any nonzero coefficient of order $$n$$ implies all coefficients of order $$2, 4, \ldots, 2n-2$$ are positive.
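A concrete case where the second branch of the theorem applies is a pure jump process. As an assumed example (not taken from the text above), let the process jump by a fixed amount $$a$$ at the events of a Poisson process of rate $$\lambda$$. Then $$\mu_n = \lambda\tau a^n + O(\tau^2)$$, so $$D_n = \lambda a^n / n!$$, which is nonzero at every order. The sketch below, with hypothetical helper names, evaluates these coefficients and the right-hand side of the inequality used in the proof.

```python
from math import factorial

def km_poisson(n, rate, jump):
    """D_n = rate * jump^n / n! for jumps of fixed size `jump` arriving at rate `rate`."""
    return rate * jump**n / factorial(n)

def pawula_bound(n, m, coeff):
    """Right-hand side of D_{n+m}^2 <= (2n)!(2m)! / ((n+m)!)^2 * D_{2n} * D_{2m}."""
    prefactor = factorial(2 * n) * factorial(2 * m) / factorial(n + m) ** 2
    return prefactor * coeff(2 * n) * coeff(2 * m)

def check(n, m, rate=2.0, jump=0.5):
    """Return (left-hand side, right-hand side) of the inequality for this process."""
    coeff = lambda order: km_poisson(order, rate, jump)
    return coeff(n + m) ** 2, pawula_bound(n, m, coeff)
```

For this particular process the two sides returned by `check` agree up to rounding, so the inequality holds with equality, and every even coefficient is strictly positive, as the theorem requires.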

Interpretation
Let the operator $$\mathcal A_m $$ be defined by $$\mathcal A_m f := \sum_{n=1}^m (-\partial_x)^n[D_n(x,t) f(x,t)] $$. The probability density evolves by $$\partial_t\rho \approx \mathcal A_m \rho$$. Different choices of $$m$$ give different levels of approximation.
 * $$m = 0$$: the probability density does not evolve.
 * $$m=1$$: it evolves by deterministic drift only.
 * $$m=2$$: it evolves by drift and Brownian motion (Fokker–Planck equation).
 * $$m=\infty$$: the fully exact equation.

The Pawula theorem means that if truncating to the second term is not exact, that is, $$\mathcal A_2 \neq \mathcal A$$, then truncating to any term is still not exact. Usually, this means that for any truncation $$\mathcal A_m$$, there exists a probability density function $$\rho$$ that can become negative during its evolution $$\partial_t\rho \approx\mathcal A_m \rho$$ (and thus fail to be a probability density function). However, this doesn't mean that Kramers–Moyal expansions truncated at other choices of $$m$$ are useless. Though the solution must have negative values at least for sufficiently small times, the resulting approximate probability density may still be better than the $$m=2$$ approximation.
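The effect of truncation is easy to see when the coefficients are constant. With the convention $$\tilde\rho(k,t) = \int e^{ikx}\rho(x,t)dx$$, each $$(-\partial_x)^n$$ becomes multiplication by $$(ik)^n$$, so $$\partial_t \tilde\rho = \left(\sum_{n=1}^m D_n (ik)^n\right)\tilde\rho$$. The sketch below assumes, purely for illustration, a process that jumps by a fixed amount $$a$$ at Poisson rate $$\lambda$$, for which $$D_n = \lambda a^n/n!$$ and the exact exponent is $$\lambda(e^{ika}-1)$$. The function names are hypothetical.

```python
import cmath
from math import factorial

def exact_exponent(k, rate, jump):
    """Exact growth exponent of the Fourier mode k: rate * (exp(i*k*jump) - 1)."""
    return rate * (cmath.exp(1j * k * jump) - 1.0)

def truncated_exponent(k, m, rate, jump):
    """Exponent from the expansion truncated at order m: sum_{n<=m} D_n * (i*k)^n."""
    return sum(rate * jump**n / factorial(n) * (1j * k) ** n for n in range(1, m + 1))

def truncation_error(k, m, rate=1.0, jump=1.0):
    """Absolute difference between the exact and truncated exponents."""
    return abs(exact_exponent(k, rate, jump) - truncated_exponent(k, m, rate, jump))
```

In this example the truncation at order $$m$$ is the Taylor polynomial of the exact exponent, so for small $$ka$$ the error shrinks as $$m$$ grows. For large $$ka$$, however, the $$m=4$$ exponent has a positive real part (for instance at $$k=4$$, $$a=1$$), so those Fourier modes grow in time. The exact exponent never has a positive real part.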