User:Benlansdell/particlefilter

Particle filters or Sequential Monte Carlo (SMC) methods are a set of genetic-type particle Monte Carlo methodologies to solve the filtering problem with a set of particles (also called individuals, or samples) to represent the posterior distribution of some stochastic process given some noisy and/or partial observations. The state-space model can be nonlinear and the initial state and noise distributions can take any form required. Particle filter techniques provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to very high-dimensional systems.

The particle filter methodology is used to solve hidden Markov model (HMM) and nonlinear filtering problems arising in signal processing and Bayesian statistical inference. The filtering problem consists of estimating the internal states of dynamical systems when partial observations are made and random perturbations are present in the sensors as well as in the dynamical system. The objective is to compute the conditional probability distributions (a.k.a. posterior distributions) of the states of some Markov process, given some noisy and partial observations. With the notable exceptions of linear-Gaussian signal-observation models (Kalman filter) and wider classes of models (Benes filter), Mireille Chaleyat-Maurel and Dominique Michel proved in 1984 that the sequence of posterior distributions of the random states of the signal given the observations (a.k.a. the optimal filter) admits no finite-dimensional recursive form. Various numerical methods based on fixed grid approximations, Markov Chain Monte Carlo (MCMC) techniques, conventional linearization, extended Kalman filters, or determining the best linear system (in the expected cost-error sense) have never really coped with large-scale systems, unstable processes, or nonlinearities that are not sufficiently smooth.

Particle filters and Feynman-Kac particle methodologies find application in signal and image processing, Bayesian inference, machine learning, risk analysis and rare event sampling, engineering and robotics, artificial intelligence, bioinformatics, phylogenetics, computational science, economics and mathematical finance, molecular chemistry, computational physics, pharmacokinetics, and other fields.

Terminology
The term "particle filters" was first coined in 1996 by Del Moral in reference to mean field interacting particle methods used in fluid mechanics since the beginning of the 1960s. The terminology "sequential Monte Carlo" was proposed by Liu and Chen in 1998.

Heuristic-like algorithms
From the statistical and probabilistic point of view, particle filters can be interpreted as mean field particle interpretations of Feynman-Kac probability measures. These particle integration techniques were developed in molecular chemistry and computational physics by Theodore E. Harris and Herman Kahn in 1951, Marshall N. Rosenbluth and Arianna W. Rosenbluth in 1955, and more recently by Jack H. Hetherington in 1984. In computational physics, these Feynman-Kac type path particle integration methods are also used in Quantum Monte Carlo, and more specifically in Diffusion Monte Carlo methods. Feynman-Kac interacting particle methods are also strongly related to mutation-selection genetic algorithms currently used in evolutionary computing to solve complex optimization problems.

From this viewpoint, particle filters belong to the class of branching/genetic type algorithms and mean field type interacting particle methodologies. The interpretation of these particle methods depends on the scientific discipline. In evolutionary computing, mean field genetic type particle methodologies are often used as heuristic and natural search algorithms (a.k.a. metaheuristics). In computational physics and molecular chemistry they are used to solve Feynman-Kac path integration problems, or to compute Boltzmann-Gibbs measures, top eigenvalues and ground states of Schrödinger operators. In biology and genetics they also represent the evolution of a population of individuals or genes in some environment.

The origins of mean field type evolutionary computational techniques can be traced to 1950 and 1954, with the seminal work of Alan Turing on genetic type mutation-selection learning machines and the articles by Nils Aall Barricelli at the Institute for Advanced Study in Princeton, New Jersey. The first trace of particle filters in statistical methodology dates back to the mid-1950s: the 'Poor Man's Monte Carlo', proposed by Hammersley et al. in 1954, contained hints of the genetic type particle filtering methods used today. In 1963, Nils Aall Barricelli simulated a genetic type algorithm to mimic the ability of individuals to play a simple game. In the evolutionary computing literature, genetic type mutation-selection algorithms became popular through the seminal work of John Holland in the early 1970s, particularly his book published in 1975.

In biology and genetics, the Australian geneticist Alex Fraser also published in 1957 a series of papers on the genetic type simulation of artificial selection of organisms. The computer simulation of evolution by biologists became more common in the early 1960s, and the methods were described in books by Fraser and Burnell (1970) and Crosby (1973). Fraser's simulations included all of the essential elements of modern mutation-selection genetic particle algorithms.

From the mathematical viewpoint, the conditional distribution of the random states of a signal given some partial and noisy observations is described by a Feynman-Kac probability on the random trajectories of the signal, weighted by a sequence of likelihood potential functions. Quantum Monte Carlo, and more specifically Diffusion Monte Carlo methods, can also be interpreted as a mean field genetic type particle approximation of Feynman-Kac path integrals. The origins of Quantum Monte Carlo methods are often attributed to Enrico Fermi and Robert Richtmyer, who developed in 1948 a mean field particle interpretation of neutron-chain reactions, but the first heuristic-like and genetic type particle algorithm (a.k.a. Resampled or Reconfiguration Monte Carlo methods) for estimating ground state energies of quantum systems (in reduced matrix models) is due to Jack H. Hetherington in 1984. We also cite the earlier seminal work of Theodore E. Harris and Herman Kahn in particle physics, published in 1951, which used mean field but heuristic-like genetic methods for estimating particle transmission energies. In molecular chemistry, the use of genetic heuristic-like particle methodologies (a.k.a. pruning and enrichment strategies) can be traced back to 1955 with the seminal work of Marshall N. Rosenbluth and Arianna W. Rosenbluth.

The use of genetic particle algorithms in advanced signal processing and Bayesian inference is more recent. In 1993, Gordon et al. published in their seminal work the first application of a genetic type algorithm in Bayesian statistical inference. The authors named their algorithm 'the bootstrap filter', and demonstrated that, compared to other filtering methods, their bootstrap algorithm does not require any assumption about the state space or the noise of the system. Other pioneering articles in this field include Genshiro Kitagawa's work on a related "Monte Carlo filter", and the articles on particle filters by Pierre Del Moral and by Himilcon Carvalho, Pierre Del Moral, André Monin and Gérard Salut, published in the mid-1990s. Particle filters were also developed in signal processing between 1989 and 1992 by P. Del Moral, J.C. Noyer, G. Rigal, and G. Salut at the LAAS-CNRS, in a series of restricted and classified research reports with STCAN (Service Technique des Constructions et Armes Navales), the IT company DIGILOG, and the LAAS-CNRS (the Laboratory for Analysis and Architecture of Systems) on RADAR/SONAR and GPS signal processing problems.

Example
Particle filters implement the prediction-updating transitions of the filtering equation directly by using a genetic type mutation-selection particle algorithm. The samples from the distribution are represented by a set of particles; each particle has a likelihood weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms; however it can be mitigated by including a resampling step before the weights become too uneven. Several adaptive resampling criteria can be used, including the variance of the weights and the relative entropy w.r.t. the uniform distribution. In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights.
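The adaptive resampling step described above can be sketched as follows. This is a minimal illustration, assuming multinomial resampling and the effective sample size criterion $$\text{ESS}=1/\sum_i w_i^2$$ for normalized weights $$w_i$$; the ESS equals $$N/(1+N^2\,\text{Var}(w))$$, so it is a decreasing function of the empirical variance of the weights. The function names and the threshold $$N/2$$ are illustrative choices, not part of the original text.

```python
import random


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(weights):
    """ESS = 1 / sum_i w_i^2 for normalized weights: N for uniform weights, 1 for a single nonzero weight."""
    w = normalize(weights)
    return 1.0 / sum(wi * wi for wi in w)


def multinomial_resample(particles, weights, rng=random):
    """Draw N particles with replacement, each picked with probability proportional to its weight."""
    return rng.choices(particles, weights=weights, k=len(particles))


def maybe_resample(particles, weights, threshold=0.5, rng=random):
    """Resample only if ESS < threshold * N (threshold=0.5 is an illustrative choice).

    Returns (particles, normalized weights); after resampling all weights equal 1/N.
    """
    n = len(particles)
    if effective_sample_size(weights) < threshold * n:
        return multinomial_resample(particles, weights, rng), [1.0 / n] * n
    return list(particles), normalize(weights)
```

With weights `[0.97, 0.01, 0.01, 0.01]` the ESS is about 1.06, below the threshold of 2, so the four particles are resampled (mostly copies of the first one) and reset to equal weights; with uniform weights nothing is resampled.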

Mathematical foundations
From 1950 to 1996, all the publications on particle filters and genetic algorithms, including the pruning and resampled Monte Carlo methods introduced in computational physics and molecular chemistry, present natural and heuristic-like algorithms applied to different situations, without a single proof of their consistency, nor a discussion of the bias of the estimates or of genealogical and ancestral tree based algorithms.

The mathematical foundations and the first rigorous analysis of these particle algorithms are due to Pierre Del Moral in 1996. The article also contains a proof of the unbiasedness of particle approximations of likelihood functions and of unnormalized conditional probability measures. The unbiased particle estimator of the likelihood functions presented in this article is used today in Bayesian statistical inference.

Branching type particle methodologies with varying population sizes were also developed toward the end of the 1990s by Dan Crisan, Jessica Gaines and Terry Lyons, and by Dan Crisan, Pierre Del Moral and Terry Lyons. Further developments in this field were made in 2000 by P. Del Moral, A. Guionnet and L. Miclo. The first central limit theorems are due to Pierre Del Moral and Alice Guionnet in 1999 and Pierre Del Moral and Laurent Miclo in 2000. The first uniform convergence results with respect to the time parameter for particle filters were developed at the end of the 1990s by Pierre Del Moral and Alice Guionnet. The first rigorous analysis of genealogical tree based particle filter smoothers is due to P. Del Moral and L. Miclo in 2001.

The theory of Feynman-Kac particle methodologies and related particle filter algorithms was developed in books published in 2000 and 2004. These abstract probabilistic models encapsulate genetic type algorithms, particle and bootstrap filters, interacting Kalman filters (a.k.a. Rao–Blackwellized particle filters), importance sampling and resampling style particle filter techniques, including genealogical tree based and particle backward methodologies for solving filtering and smoothing problems. Other classes of particle filtering methodologies include genealogical tree based models, backward Markov particle models, adaptive mean field particle models, island type particle models, and particle Markov chain Monte Carlo methodologies.

Objective
The objective of a particle filter is to estimate the posterior density of the state variables given the observation variables. The particle filter is designed for a hidden Markov model, where the system consists of both hidden and observable variables. The observable variables (observation process) are related to the hidden variables (state process) by some known functional form. Similarly, the dynamical system describing the evolution of the state variables is also known probabilistically.

A generic particle filter estimates the posterior distribution of the hidden states using the observation measurement process. Consider the state-space model shown in the diagram below.


 * $$\begin{array}{cccccccccc} X_0&\to &X_1&\to &X_2&\to&X_3&\to &\cdots&\text{signal}\\ \downarrow&&\downarrow&&\downarrow&&\downarrow&&\cdots&\\ Y_0&&Y_1&&Y_2&&Y_3&&\cdots&\text{observation} \end{array}$$

The filtering problem is to estimate sequentially the values of the hidden states $$X_k$$, given the values of the observation process $$Y_0,\cdots,Y_k,$$ at any time step k.

All Bayesian estimates of $$X_k$$ follow from the posterior density $$p(x_k|y_0,\cdots,y_k)$$. The particle filter methodology provides an approximation of these conditional probabilities using the empirical measure associated with a genetic type particle algorithm. In contrast, the MCMC or importance sampling approach would model the full posterior $$p(x_0,\cdots,x_k|y_0,\cdots,y_k)$$.

The Signal-Observation Model
Particle methods often assume $$X_k$$ and the observations $$Y_k$$ can be modeled in this form:


 * $$X_0, X_1, \cdots$$ is a Markov process on $$\mathbb R^{d_x}$$ (for some $$d_x\geqslant 1$$) that evolves according to the transition probability density $$p(x_k|x_{k-1})$$. This model is also often written in a synthetic way as
 * $$X_k|X_{k-1}=x_{k-1} \sim p(x_k|x_{k-1})$$
 * with an initial probability density $$p(x_0)$$.

 * The observations $$Y_0, Y_1, \cdots$$ take values in some state space on $$\mathbb{R}^{d_y}$$ (for some $$d_y\geqslant 1$$) and are conditionally independent given $$X_0, X_1, \cdots$$. In other words, each $$Y_k$$ depends only on $$X_k$$. In addition, we assume that the conditional distribution of $$Y_k$$ given $$X_k=x_k$$ is absolutely continuous, and in a synthetic way we write
 * $$Y_k|X_k=x_k \sim p(y_k|x_k)$$

An example of a system with these properties is:

 * $$X_k = g(X_{k-1}) + W_k$$
 * $$Y_k = h(X_k) + V_k$$

where both $$W_k$$ and $$V_k$$ are mutually independent sequences with known probability density functions and g and h are known functions. These two equations can be viewed as state space equations and look similar to the state space equations for the Kalman filter. If the functions g and h in the above example are linear, and if both $$W_k$$ and $$V_k$$ are Gaussian, the Kalman filter finds the exact Bayesian filtering distribution. If not, Kalman filter based methods are a first-order approximation (EKF) or a second-order approximation (UKF in general, but if the probability distribution is Gaussian a third-order approximation is possible).
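As a concrete illustration of the model $$X_k = g(X_{k-1}) + W_k$$, $$Y_k = h(X_k) + V_k$$, the sketch below simulates one trajectory and evaluates the transition density $$p(x_k|x_{k-1})$$ and the likelihood $$p(y_k|x_k)$$. The specific scalar maps `g` and `h`, the Gaussian noises with standard deviations `sigma_w` and `sigma_v`, and the initial law $$X_0\sim N(0,1)$$ are hypothetical choices made only for illustration.

```python
import math
import random


def g(x):
    """Hypothetical nonlinear state map."""
    return 0.5 * x + 25.0 * x / (1.0 + x * x)


def h(x):
    """Hypothetical nonlinear observation map."""
    return x * x / 20.0


def simulate(n_steps, sigma_w=1.0, sigma_v=1.0, seed=0):
    """Simulate X_k = g(X_{k-1}) + W_k and Y_k = h(X_k) + V_k for k = 0..n_steps-1.

    W_k ~ N(0, sigma_w^2) and V_k ~ N(0, sigma_v^2) are independent; X_0 ~ N(0, 1) is assumed.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = rng.gauss(0.0, 1.0)
    for k in range(n_steps):
        if k > 0:
            x = g(x) + rng.gauss(0.0, sigma_w)
        xs.append(x)
        ys.append(h(x) + rng.gauss(0.0, sigma_v))
    return xs, ys


def transition_density(x_next, x_prev, sigma_w=1.0):
    """p(x_k | x_{k-1}) for the additive Gaussian state noise."""
    z = (x_next - g(x_prev)) / sigma_w
    return math.exp(-0.5 * z * z) / (sigma_w * math.sqrt(2.0 * math.pi))


def likelihood(y, x, sigma_v=1.0):
    """p(y_k | x_k) for the additive Gaussian observation noise."""
    z = (y - h(x)) / sigma_v
    return math.exp(-0.5 * z * z) / (sigma_v * math.sqrt(2.0 * math.pi))
```

Because `h` is not invertible (it only sees $$x^2$$), the posterior of $$X_k$$ can be bimodal, which is the kind of situation where a Kalman-based linearization tends to struggle.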

The assumption that the initial distribution and the transitions of the Markov chain are absolutely continuous with respect to the Lebesgue measure can be relaxed. To design a particle filter we simply need to assume that we can sample the transitions $$X_{k-1} \to X_k$$ of the Markov chain $$X_k,$$ and compute the likelihood function $$x_k\mapsto p(y_k|x_k)$$ (see for instance the genetic selection-mutation description of the particle filter given below). The absolute continuity assumption on the Markov transitions of $$X_k$$ is only used to derive, in an informal (and rather abusive) way, different formulae between posterior distributions using Bayes' rule for conditional densities.

Approximate Bayesian Computation models
In some important problems, the conditional distribution of the observations given the random states of the signal may fail to have a density or may be impossible or too complex to compute. In this situation, we need to resort to an additional level of approximation. One strategy is to replace the signal $$X_k$$ by the Markov chain $$\mathcal X_k=\left(X_k,Y_k\right)$$ and to introduce a virtual observation of the form


 * $$\mathcal Y_k=Y_k+\epsilon \mathcal V_k\quad\mbox{for some parameter}\quad\epsilon\in [0,1]$$

where $$\mathcal V_k$$ is a sequence of independent random variables with known probability density functions. The central idea is to observe that


 * $$\text{Law}\left(X_k|\mathcal Y_0=y_0,\cdots, \mathcal Y_k=y_k\right)\approx_{\epsilon\downarrow 0} \text{Law}\left(X_k|Y_0=y_0,\cdots, Y_k=y_k\right)$$

The particle filter associated with the Markov process $$\mathcal X_k=\left(X_k,Y_k\right)$$ given the partial observations $$\mathcal Y_0=y_0,\cdots, \mathcal Y_k=y_k,$$ is defined in terms of particles evolving in $$\mathbb R^{d_x+d_y}$$ with a likelihood function given, with some obvious abuse of notation, by $$p(\mathcal Y_k|\mathcal X_k)$$. These probabilistic techniques are closely related to Approximate Bayesian Computation (ABC). In the context of particle filters, these ABC particle filtering techniques were introduced in 1998 by P. Del Moral, J. Jacod and P. Protter. They were further developed by P. Del Moral, A. Doucet and A. Jasra.
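A minimal sketch of one ABC particle filter step, assuming scalar states and observations and a standard Gaussian $$\mathcal V_k$$, so that the likelihood of the virtual observation is a Gaussian kernel of width $$\epsilon$$ centred at each particle's simulated observation. The callables `sample_transition` and `sample_observation` are hypothetical user-supplied simulators; the point of the construction is that the observation density itself is never evaluated.

```python
import math
import random


def abc_weight(y_obs, y_sim, eps):
    """Likelihood of the virtual observation y_obs given a particle with simulated Y_k = y_sim.

    Assumes V_k ~ N(0, 1), so the virtual observation is N(y_sim, eps^2). Requires eps > 0.
    """
    z = (y_obs - y_sim) / eps
    return math.exp(-0.5 * z * z) / (eps * math.sqrt(2.0 * math.pi))


def abc_filter_step(xs, y_obs, sample_transition, sample_observation, eps, rng):
    """One mutation-weighting-selection cycle of an ABC particle filter on pairs (X_k, Y_k).

    sample_transition(x, rng) draws X_k given X_{k-1} = x; sample_observation(x, rng) draws Y_k
    given X_k = x. Returns the selected state components.
    """
    # Mutation: move each particle and simulate a pseudo-observation attached to it.
    pairs = []
    for x in xs:
        x_new = sample_transition(x, rng)
        pairs.append((x_new, sample_observation(x_new, rng)))
    # Weighting: the kernel replaces the (possibly intractable) density p(y_k | x_k).
    weights = [abc_weight(y_obs, y_sim, eps) for _, y_sim in pairs]
    # Selection: multinomial resampling proportional to the kernel weights.
    return rng.choices([x for x, _ in pairs], weights=weights, k=len(pairs))
```

Smaller values of `eps` bring the approximate conditional law closer to the exact one but make the kernel weights more uneven; if every weight underflows to zero, the selection step breaks down.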

The nonlinear filtering equation
The Bayes' rule for conditional probability gives:


 * $$p(x_0, \cdots, x_k|y_0,\cdots,y_k) =\frac{p(y_0,\cdots,y_k|x_0, \cdots, x_k) p(x_0,\cdots,x_k)}{p(y_0,\cdots,y_k)}$$

where


 * $$\begin{align} p(y_0,\cdots,y_k) &=\int p(y_0,\cdots,y_k|x_0,\cdots, x_k) p(x_0,\cdots,x_k) dx_0\cdots dx_k \\ p(y_0,\cdots, y_k|x_0,\cdots ,x_k) &=\prod_{l=0}^{k} p(y_l|x_l) \\ p(x_0,\cdots, x_k) &=p(x_0)\prod_{l=1}^{k} p(x_l|x_{l-1}) \end{align}$$

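Like the Kalman filter based approximations discussed above, particle filters are approximations, but with enough particles they can be much more accurate. The nonlinear filtering equation is given by the recursion

 * $$p(x_k|y_0,\cdots,y_{k-1}) \stackrel{\text{updating}}{\longrightarrow} p(x_k|y_0,\cdots,y_k)=\frac{p(y_k|x_k)\,p(x_k|y_0,\cdots,y_{k-1})}{\int p(y_k|x'_k)\,p(x'_k|y_0,\cdots,y_{k-1})\,dx'_k} \stackrel{\text{prediction}}{\longrightarrow} p(x_{k+1}|y_0,\cdots,y_k)=\int p(x_{k+1}|x_k)\,p(x_k|y_0,\cdots,y_k)\,dx_k$$

with the convention $$p(x_0|y_0,\cdots,y_{-1})=p(x_0)$$ for k = 0. The nonlinear filtering problem consists in computing this sequence of conditional distributions sequentially.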
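On a finite state space the integrals of the nonlinear filtering recursion become sums, so the updating (Bayes' rule) and prediction steps can be computed exactly. The sketch below, with hypothetical transition and likelihood tables, shows one such step; it also returns the normalizing constant $$p(y_k|y_0,\cdots,y_{k-1})$$, which reappears in the likelihood product formula later in the article.

```python
def filter_step(predictive, transition, likelihood_row):
    """One updating-prediction step of the exact filter on the finite state space {0, ..., S-1}.

    predictive[s]     = p(x_k = s | y_0..y_{k-1})
    transition[s][t]  = p(x_{k+1} = t | x_k = s)
    likelihood_row[s] = p(y_k | x_k = s)
    Returns (filtered, next_predictive, evidence) with evidence = p(y_k | y_0..y_{k-1}).
    """
    n_states = len(predictive)
    unnormalized = [likelihood_row[s] * predictive[s] for s in range(n_states)]
    evidence = sum(unnormalized)
    # Updating: Bayes' rule, normalized by the evidence.
    filtered = [u / evidence for u in unnormalized]
    # Prediction: push the filter through the Markov transition.
    next_predictive = [sum(filtered[s] * transition[s][t] for s in range(n_states))
                       for t in range(n_states)]
    return filtered, next_predictive, evidence
```

For example, `filter_step([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [0.8, 0.3])` gives the evidence 0.55, the filter `[8/11, 3/11]` and the next predictor `[7.8/11, 3.2/11]`.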

Feynman-Kac formulation
We fix a time horizon n and a sequence of observations $$Y_0=y_0,\cdots,Y_n=y_n$$, and for each k = 0, ..., n we set:


 * $$G_k(x_k)=p(y_k|x_k).$$

In this notation, for any bounded function F on the set of trajectories of $$X_k$$ from the origin k = 0 up to time k = n, we have the Feynman-Kac formula


 * $$\begin{align} \int F(x_0,\cdots,x_n) p(x_0,\cdots,x_n|y_0,\cdots,y_n) dx_0\cdots dx_n &= \frac{\int F(x_0,\cdots,x_n) \left\{\prod\limits_{k=0}^{n} p(y_k|x_k)\right\}p(x_0,\cdots,x_n) dx_0\cdots dx_n}{\int \left\{\prod\limits_{k=0}^{n} p(y_k|x_k)\right\}p(x_0,\cdots,x_n) dx_0\cdots dx_n}\\ &=\frac{E\left(F(X_0,\cdots,X_n)\prod\limits_{k=0}^{n} G_k(X_k)\right)}{E\left(\prod\limits_{k=0}^{n} G_k(X_k)\right)} \end{align}$$

These Feynman-Kac path integration models arise in a variety of scientific disciplines, including computational physics, biology, information theory and computer science. Their interpretations depend on the application domain. For instance, if we choose the indicator functions $$G_k(x_k)=1_A(x_k)$$, k = 0, ..., n, of some subset A of the state space, they represent the conditional distribution of a Markov chain given that it stays in a given tube; that is, we have:


 * $$E\left(F(X_0,\cdots,X_n) | X_0\in A, \cdots, X_n\in A\right) =\frac{E\left(F(X_0,\cdots,X_n)\prod\limits_{k=0}^{n} G_k(X_k)\right)}{E\left(\prod\limits_{k=0}^{n} G_k(X_k)\right)}$$

and
 * $$P\left(X_0\in A,\cdots, X_n\in A\right)=E\left(\prod\limits_{k=0}^{n} G_k(X_k)\right)$$

provided that the normalizing constant is strictly positive.
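The tube interpretation can be checked with crude Monte Carlo: simulate independent trajectories and average the products of the potentials $$G_k=1_A$$. The signal (a Gaussian random walk with step standard deviation 0.5 and $$X_0\sim N(0,0.25)$$), the set $$A=[-a,a]$$ and the test function $$F(x_0,\cdots,x_n)=x_n^2$$ are hypothetical choices. When the tube probability is small, most trajectories receive weight zero, which is the situation where the selection mechanism of particle methods pays off.

```python
import random


def feynman_kac_tube(n, n_samples, a=1.0, seed=0):
    """Crude Monte Carlo for the Feynman-Kac 'tube' example with G_k = 1_A and A = [-a, a].

    Returns estimates of P(X_0..X_n in A) = E(prod_k G_k(X_k)) and of
    E(X_n^2 | X_0..X_n in A) as the ratio of the two Feynman-Kac expectations.
    """
    rng = random.Random(seed)
    weight_sum = 0.0  # sum over trajectories of prod_k G_k(X_k)
    weighted_f = 0.0  # sum over trajectories of F(X_0..X_n) * prod_k G_k(X_k)
    for _ in range(n_samples):
        x = rng.gauss(0.0, 0.5)
        g_prod = 1.0 if abs(x) <= a else 0.0
        for _ in range(n):
            x += rng.gauss(0.0, 0.5)
            if abs(x) > a:
                g_prod = 0.0
        weight_sum += g_prod
        weighted_f += x * x * g_prod
    prob = weight_sum / n_samples
    cond = weighted_f / weight_sum if weight_sum > 0 else float("nan")
    return prob, cond
```

With `n = 0` the first output estimates $$P(|X_0|\leqslant 1)=P(|Z|\leqslant 2)\approx 0.9545$$ for a standard normal Z; it decreases as the horizon n grows.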

A Genetic type particle algorithm
Initially we start with N independent random variables $$\left(\xi^i_0\right)_{1\leqslant i\leqslant N}$$ with common probability density $$p(x_0)$$. The genetic algorithm selection-mutation transitions


 * $$\xi_k:=\left(\xi^i_{k}\right)_{1\leqslant i\leqslant N}\stackrel{\text{selection}}{\longrightarrow} \widehat{\xi}_k:=\left(\widehat{\xi}^i_{k}\right)_{1\leqslant i\leqslant N}\stackrel{\text{mutation}}{\longrightarrow} \xi_{k+1}:=\left(\xi^i_{k+1}\right)_{1\leqslant i\leqslant N}$$

mimic/approximate the updating-prediction transitions of the optimal filter evolution described by the nonlinear filtering equation:


 * During the selection-updating transition we sample N (conditionally) independent random variables $$\widehat{\xi}_k:=\left(\widehat{\xi}^i_{k}\right)_{1\leqslant i\leqslant N}$$ with common (conditional) distribution
 * $$\sum_{i=1}^N \frac{p(y_k|\xi^i_k)}{\sum_{j=1}^Np(y_k|\xi^j_k)} \delta_{\xi^i_k}(dx_k)$$


 * During the mutation-prediction transition, from each selected particle $$\widehat{\xi}^i_k$$ we sample independently a transition
 * $$\widehat{\xi}^i_k \longrightarrow\xi^i_{k+1} \sim p(x_{k+1}|\widehat{\xi}^i_k), \qquad i=1,\cdots,N.$$

In the above displayed formulae $$p(y_k|\xi^i_k)$$ stands for the likelihood function $$x_k\mapsto p(y_k|x_k)$$ evaluated at $$x_k=\xi^i_k$$, and $$p(x_{k+1}|\widehat{\xi}^i_k)$$ stands for the conditional density $$p(x_{k+1}|x_k)$$ evaluated at $$x_k=\widehat{\xi}^i_k$$.
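The selection-mutation transitions above can be sketched in Python as follows, assuming the user supplies `likelihood(y, x)` for $$p(y_k|x_k)$$, `sample_transition(x, rng)` for a draw from $$p(x_{k+1}|x_k)$$ and `sample_initial(rng)` for a draw from $$p(x_0)$$ (all hypothetical names). Selection uses multinomial sampling via `random.Random.choices`; other resampling schemes exist and are not shown.

```python
import random


def selection_mutation_step(particles, y, likelihood, sample_transition, rng):
    """One transition xi_k -> hat-xi_k -> xi_{k+1} of the genetic type particle algorithm.

    Returns (selected, mutated): selected approximates p(x_k | y_0..y_k),
    mutated approximates p(x_{k+1} | y_0..y_k).
    """
    weights = [likelihood(y, x) for x in particles]
    # Selection: N conditionally independent draws from sum_i w_i / sum_j w_j * delta_{xi^i_k}.
    selected = rng.choices(particles, weights=weights, k=len(particles))
    # Mutation: move each selected particle independently with the Markov transition.
    mutated = [sample_transition(x, rng) for x in selected]
    return selected, mutated


def bootstrap_filter(ys, n_particles, sample_initial, likelihood, sample_transition, seed=0):
    """Run the particle filter and return the filtered means (1/N) sum_i hat-xi^i_k."""
    rng = random.Random(seed)
    particles = [sample_initial(rng) for _ in range(n_particles)]
    means = []
    for y in ys:
        selected, particles = selection_mutation_step(
            particles, y, likelihood, sample_transition, rng)
        means.append(sum(selected) / n_particles)
    return means
```

Because selection only uses ratios of weights, the likelihood may be supplied up to a multiplicative constant; a Gaussian observation density, for instance, can omit its normalizing factor. On a scalar linear-Gaussian model the filtered means returned here should approach the Kalman filter means as `n_particles` grows.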

At each time k, we have the particle approximations


 * $$\widehat{p}(dx_k|y_0,\cdots,y_k):=\frac{1}{N} \sum_{i=1}^N \delta_{\widehat{\xi}^i_k} (dx_k) \approx_{N\uparrow\infty} p(dx_k|y_0,\cdots,y_k) \approx_{N\uparrow\infty} \sum_{i=1}^N \frac{p(y_k|\xi^i_k)}{\sum_{j=1}^N p(y_k|\xi^j_k)} \delta_{\xi^i_k}(dx_k)$$

and


 * $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1}):=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_k}(dx_k) \approx_{N\uparrow\infty} p(dx_k|y_0,\cdots,y_{k-1})$$

Detailed proofs of these convergence results can be found in the literature; see also the more recent developments provided in the books on the subject. In the genetic algorithms and evolutionary computing community, the mutation-selection Markov chain described above is often called the genetic algorithm with proportional selection. Several branching variants, including variants with random population sizes, have also been proposed.

Monte Carlo principles
Particle methods, like all sampling-based approaches (e.g., MCMC), generate a set of samples that approximate the filtering density


 * $$p(x_k|y_0, \cdots, y_k).$$

For example, we may have N samples from the approximate posterior distribution of $$X_k$$, where the samples are labeled with superscripts as


 * $$\widehat{\xi}_k^1, \cdots, \widehat{\xi}_k^{N}.$$

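Then, expectations with respect to the filtering distribution are approximated by

 * $$\int f(x_k)\,p(x_k|y_0,\cdots,y_k)\,dx_k \approx_{N\uparrow\infty} \frac{1}{N}\sum_{i=1}^N f\left(\widehat{\xi}^i_k\right)=\int f(x_k)\,\widehat{p}(dx_k|y_0,\cdots,y_k)$$

with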


 * $$\widehat{p}(dx_k|y_0,\cdots,y_k)=\frac{1}{N}\sum_{i=1}^N \delta_{\widehat{\xi}^i_k}(dx_k)$$

where $$\delta_a$$ stands for the Dirac measure at a given state a. In the usual way for Monte Carlo, suitable choices of the function f give all the moments etc. of the distribution, up to some degree of approximation. When this approximation holds for any bounded function f, we write


 * $$p(dx_k|y_0,\cdots,y_k):=p(x_k|y_0,\cdots,y_k) dx_k \approx_{N\uparrow\infty} \widehat{p}(dx_k|y_0,\cdots,y_k)=\frac{1}{N}\sum_{i=1}^N \delta_{\widehat{\xi}^{i}_k}(dx_k)$$
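Once the particles are available, such Monte Carlo averages are one-liners. The helper names below are hypothetical; the weighted variant corresponds to the self-normalized approximation with weights $$p(y_k|\xi^i_k)$$ computed before selection, and moments follow by choosing f accordingly.

```python
def particle_expectation(f, particles):
    """Estimate of E(f(X_k) | y_0..y_k) from equally weighted (selected) particles hat-xi^i_k."""
    return sum(f(x) for x in particles) / len(particles)


def weighted_expectation(f, particles, weights):
    """Same quantity from unselected particles xi^i_k with weights p(y_k | xi^i_k)."""
    total = sum(weights)
    return sum(w * f(x) for x, w in zip(particles, weights)) / total
```

For example, the posterior variance is estimated by `particle_expectation(lambda x: x * x, ps) - particle_expectation(lambda x: x, ps) ** 2`; for the particles `[1.0, 2.0, 3.0]` this gives 2/3.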

Particle filters can be interpreted as a genetic type particle algorithm evolving with mutation and selection transitions. We can keep track of the ancestral lines


 * $$\left(\widehat{\xi}^{i}_{0,k}, \widehat{\xi}^{i}_{1,k},\cdots,\widehat{\xi}^{i}_{k-1,k},\widehat{\xi}^i_{k,k}\right)$$

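of the particles $$i=1,\cdots,N$$. The random state $$\widehat{\xi}^{i}_{l,k}$$, with lower index l=0,...,k, stands for the ancestor of the individual $$\widehat{\xi}^{i}_{k,k}=\widehat{\xi}^i_k$$ at level l. In this situation, we have the approximation formula

 * $$\int F(x_0,\cdots,x_k)\,p(x_0,\cdots,x_k|y_0,\cdots,y_k)\,dx_0\cdots dx_k \approx_{N\uparrow\infty} \int F(x_0,\cdots,x_k)\,\widehat{p}(d(x_0,\cdots,x_k)|y_0,\cdots,y_k)=\frac{1}{N}\sum_{i=1}^N F\left(\widehat{\xi}^{i}_{0,k},\widehat{\xi}^{i}_{1,k},\cdots,\widehat{\xi}^{i}_{k,k}\right)$$

with the empirical measure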


 * $$\widehat{p}(d(x_0,\cdots,x_k)|y_0,\cdots,y_k):=\frac{1}{N}\sum_{i=1}^N \delta_{\left(\widehat{\xi}^{i}_{0,k},\widehat{\xi}^{i}_{1,k},\cdots,\widehat{\xi}^{i}_{k,k}\right)}(d(x_0,\cdots,x_k))$$

Here F stands for any bounded function on the path space of the signal. In a more synthetic form, this approximation is equivalent to


 * $$\begin{align} p(d(x_0,\cdots,x_k)|y_0,\cdots,y_k)&:=p(x_0,\cdots,x_k|y_0,\cdots,y_k) \, dx_0\cdots dx_k \\ &\approx_{N\uparrow\infty} \widehat{p}(d(x_0,\cdots,x_k)|y_0,\cdots,y_k) \\ &:=\frac{1}{N}\sum_{i=1}^N \delta_{\left(\widehat{\xi}^{i}_{0,k}, \cdots,\widehat{\xi}^{i}_{k,k}\right)}(d(x_0,\cdots,x_k)) \end{align}$$
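Keeping track of the ancestral lines only requires remembering, at each selection, which parent each particle was copied from. The sketch below copies whole lines for clarity, at a cost of $$O(Nk)$$ memory per step; storing parent indices and backtracking at the end is a common, cheaper alternative. The model callables are hypothetical user-supplied functions, and averaging a path functional F over the returned lines gives the path-space approximation above.

```python
import random


def run_with_genealogy(ys, n_particles, sample_initial, likelihood, sample_transition, seed=0):
    """Genetic type particle filter that keeps the ancestral line of every particle.

    Returns, for the final time k = len(ys) - 1, the N lines
    (hat-xi^i_{0,k}, ..., hat-xi^i_{k,k}) of selected states.
    """
    rng = random.Random(seed)
    particles = [sample_initial(rng) for _ in range(n_particles)]
    # lines[j] holds the selected ancestors of the current particle j (empty before any selection).
    lines = [[] for _ in range(n_particles)]
    for y in ys:
        weights = [likelihood(y, x) for x in particles]
        # Selection: draw parent indices rather than states so the genealogy is not lost.
        parents = rng.choices(range(n_particles), weights=weights, k=n_particles)
        # Each selected individual inherits its parent's line and appends the parent's current state.
        lines = [lines[j] + [particles[j]] for j in parents]
        # Mutation: move the newest state of every line with the Markov transition.
        particles = [sample_transition(line[-1], rng) for line in lines]
    return lines


def path_estimate(F, lines):
    """Average of a path functional F over the ancestral lines."""
    return sum(F(line) for line in lines) / len(lines)
```

After many selection steps, most lines typically share the same early ancestors, so the path-space estimates rely on few distinct early states; this is consistent with the bias and variance bounds for genealogical tree estimates growing with k.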

Particle filters can be interpreted in many different ways. From the probabilistic point of view they coincide with a mean field particle interpretation of the nonlinear filtering equation. The updating-prediction transitions of the optimal filter evolution can also be interpreted as the classical genetic type selection-mutation transitions of individuals. The sequential importance resampling technique provides another interpretation of the filtering transitions coupling importance sampling with the bootstrap resampling step. Last, but not least, particle filters can be seen as an acceptance-rejection methodology equipped with a recycling mechanism.

The general probabilistic principle
The nonlinear filtering evolution can be interpreted as a dynamical system in the set of probability measures of the form $$\eta_{n+1}=\Phi_{n+1}\left(\eta_{n}\right)$$, where $$\Phi_{n+1}$$ stands for some mapping from the set of probability distributions into itself. For instance, the evolution of the one-step optimal predictor $$ \eta_n(dx_n) =p(x_n|y_0,\cdots,y_{n-1})dx_n$$ satisfies a nonlinear evolution starting with the probability distribution $$\eta_0(dx_0)=p(x_0)dx_0$$. One of the simplest ways to approximate these probability measures is to start with N independent random variables $$\left(\xi^i_0\right)_{1\leqslant i\leqslant N}$$ with common probability distribution $$\eta_0(dx_0)=p(x_0)dx_0$$. Suppose we have defined a sequence of N random variables $$\left(\xi^i_n\right)_{1\leqslant i\leqslant N}$$ such that


 * $$\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_n}(dx_n) \approx_{N\uparrow\infty} \eta_n(dx_n)$$

At the next step we sample N (conditionally) independent random variables $$\xi_{n+1}:=\left(\xi^i_{n+1}\right)_{1\leqslant i\leqslant N}$$ with common law $$\Phi_{n+1}\left(\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_n}\right)$$, so that


 * $$\Phi_{n+1}\left(\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_n}\right) \approx_{N\uparrow\infty} \Phi_{n+1}\left(\eta_{n}\right)=\eta_{n+1}$$

A particle interpretation of the filtering equation
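We illustrate this mean field particle principle in the context of the evolution of the one-step optimal predictors

 * $$p(x_{k+1}|y_0,\cdots,y_k)=\int p(x_{k+1}|x'_k)\,\frac{p(y_k|x'_k)\,p(x'_k|y_0,\cdots,y_{k-1})\,dx'_k}{\int p(y_k|x_k)\,p(x_k|y_0,\cdots,y_{k-1})\,dx_k}$$

For k = 0 we use the convention $$p(x_0|y_0,\cdots,y_{-1}):=p(x_0)$$.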

By the law of large numbers, we have


 * $$\widehat{p}(dx_0)=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^{i}_0}(dx_0)\approx_{N\uparrow\infty} p(x_0)dx_0$$

in the sense that


 * $$\int f(x_0)\widehat{p}(dx_0)=\frac{1}{N}\sum_{i=1}^N f(\xi^i_0)\approx_{N\uparrow\infty} \int f(x_0)p(x_0)dx_0$$

for any bounded function $$f$$. We further assume that we have constructed a sequence of particles $$\left(\xi^i_k\right)_{1\leqslant i\leqslant N}$$ at some rank k such that


 * $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1}):=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^{i}_k}(dx_k)\approx_{N\uparrow\infty}~p(x_k~|~y_0,\cdots,y_{k-1})dx_k$$

in the sense that for any bounded function $$f$$ we have


 * $$\int f(x_k)\widehat{p}(dx_k|y_0,\cdots,y_{k-1})=\frac{1}{N}\sum_{i=1}^N f(\xi^i_k)\approx_{N\uparrow\infty} \int f(x_k)p(dx_k|y_0,\cdots,y_{k-1})$$

In this situation, replacing $$p(x_k|y_0,\cdots,y_{k-1})dx_k$$ by the empirical measure $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1})$$ in the evolution equation of the one-step optimal predictor stated above, we find that


 * $$p(x_{k+1}|y_0,\cdots,y_k)\approx_{N\uparrow\infty} \int p(x_{k+1}|x'_{k}) \frac{p(y_k|x_k') \widehat{p}(dx'_k|y_0,\cdots,y_{k-1})}{ \int p(y_k|x_k) \widehat{p}(dx_k|y_0,\cdots,y_{k-1})}$$

Notice that the right hand side in the above formula is a weighted probability mixture


 * $$\int p(x_{k+1}|x'_{k}) \frac{p(y_k|x_k') \widehat{p}(dx'_k|y_0,\cdots,y_{k-1})}{\int p(y_k|x_k) \widehat{p}(dx_k|y_0,\cdots,y_{k-1})}=\sum_{i=1}^N \frac{p(y_k|\xi^i_k)}{\sum_{j=1}^N p(y_k|\xi^j_k)} p(x_{k+1}|\xi^i_k)=:\widehat{q}(x_{k+1}|y_0,\cdots,y_k)$$

where $$p(y_k|\xi^i_k)$$ stands for the density $$p(y_k|x_k)$$ evaluated at $$x_k=\xi^i_k$$, and $$p(x_{k+1}|\xi^i_k)$$ stands for the density $$p(x_{k+1}|x_k)$$ evaluated at $$x_k=\xi^i_k$$ for $$i=1,\cdots,N.$$

Then, we sample N independent random variables $$\left(\xi^i_{k+1}\right)_{1\leqslant i\leqslant N}$$ with common probability density $$\widehat{q}(x_{k+1}|y_0,\cdots,y_k)$$ so that


 * $$\widehat{p}(dx_{k+1}|y_0,\cdots,y_{k}):=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^{i}_{k+1}}(dx_{k+1})\approx_{N\uparrow\infty} \widehat{q}(x_{k+1}|y_0,\cdots,y_{k}) dx_{k+1} \approx_{N\uparrow\infty} p(x_{k+1}|y_0,\cdots,y_{k})dx_{k+1}$$

Iterating this procedure, we design a Markov chain such that


 * $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1}):=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_k}(dx_k) \approx_{N\uparrow\infty} p(dx_k|y_0,\cdots,y_{k-1}):=p(x_k|y_0,\cdots,y_{k-1}) dx_k$$

Notice that the optimal filter is approximated at each time step k using the Bayes' formulae


 * $$p(dx_{k}|y_0,\cdots,y_{k}) \approx_{N\uparrow\infty} \frac{p(y_{k}|x_{k}) \widehat{p}(dx_{k}|y_0,\cdots,y_{k-1})}{\int p(y_{k}|x'_{k})\widehat{p}(dx'_{k}|y_0,\cdots,y_{k-1})}=\sum_{i=1}^N \frac{p(y_k|\xi^i_k)}{\sum_{j=1}^Np(y_k|\xi^j_k)}~\delta_{\xi^i_k}(dx_k)$$

The terminology "mean field approximation" comes from the fact that we replace at each time step the probability measure $$p(dx_k|y_0,\cdots,y_{k-1})$$ by the empirical approximation $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1})$$. The mean field particle approximation of the filtering problem is far from being unique. Several strategies are developed in the books.

Some convergence results
The analysis of the convergence of particle filters was started in 1996 and in 2000, in a book and a series of articles. More recent developments can be found in later books. When the filtering equation is stable (in the sense that it corrects any erroneous initial condition), the bias and the variance of the particle estimates


 * $$I_k(f):=\int f(x_k) p(dx_k|y_0,\cdots,y_{k-1}) \approx_{N\uparrow\infty} \widehat{I}_k(f):=\int f(x_k) \widehat{p}(dx_k|y_0,\cdots,y_{k-1})$$

are controlled by the non asymptotic uniform estimates


 * $$\sup_{k\geqslant 0}\left\vert E\left(\widehat{I}_k(f)\right)-I_k(f)\right\vert\leqslant \frac{c_1}{N}$$
 * $$\sup_{k\geqslant 0}E\left(\left[\widehat{I}_k(f)-I_k(f)\right]^2\right)\leqslant \frac{c_2}{N}$$

for any function f bounded by 1, and for some finite constants $$c_1,c_2.$$ In addition, for any $$x\geqslant 0$$:


 * $$\mathbf{P} \left ( \left| \widehat{I}_k(f)-I_k(f)\right|\leqslant c_1 \frac{x}{N}+c_2 \sqrt{\frac{x}{N}}\land \sup_{0\leqslant k\leqslant n}\left| \widehat{I}_k(f)-I_k(f)\right|\leqslant c \sqrt{\frac{x\log(n)}{N}} \right ) > 1-e^{-x}$$

for some finite constants $$c_1, c_2$$ related to the asymptotic bias and variance of the particle estimate, and some finite constant c. The same results are satisfied if we replace the one step optimal predictor by the optimal filter approximation.
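The $$1/N$$ rate in the variance bound can be observed numerically. The sketch below assumes a scalar linear-Gaussian model, for which the Kalman filter gives the exact filtered mean; it repeats the particle estimate of $$E(X_k|y_0,\cdots,y_k)$$ many times and reports its mean squared error, so multiplying N by 4 should divide the error by roughly 4. The model parameters and function names are hypothetical, and since the test function $$f(x)=x$$ is unbounded this is only an informal check of the rate.

```python
import math
import random


def filtered_mean_estimate(ys, n, rng, a=0.9):
    """Particle estimate of E(X_k | y_0..y_k) at the last k for the assumed model
    X_0 ~ N(0,1), X_k = a X_{k-1} + N(0,1), Y_k = X_k + N(0,1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for k, y in enumerate(ys):
        if k > 0:
            xs = [a * x + rng.gauss(0.0, 1.0) for x in xs]  # mutation
        ws = [math.exp(-0.5 * (y - x) ** 2) for x in xs]      # unnormalized likelihoods
        xs = rng.choices(xs, weights=ws, k=n)                 # selection
    return sum(xs) / n


def kalman_filtered_mean(ys, a=0.9):
    """Exact E(X_k | y_0..y_k) for the same linear-Gaussian model, used as the reference value."""
    m, p = 0.0, 1.0
    for k, y in enumerate(ys):
        if k > 0:
            m, p = a * m, a * a * p + 1.0                     # prediction
        gain = p / (p + 1.0)
        m, p = m + gain * (y - m), (1.0 - gain) * p           # update
    return m


def mean_squared_error(ys, n, runs, seed=0):
    """Average of (particle estimate - exact value)^2 over independent runs."""
    rng = random.Random(seed)
    exact = kalman_filtered_mean(ys)
    return sum((filtered_mean_estimate(ys, n, rng) - exact) ** 2 for _ in range(runs)) / runs
```

The ratio `mean_squared_error([1.0, 2.0], 100, 300) / mean_squared_error([1.0, 2.0], 400, 300)` is expected to be close to 4, up to Monte Carlo fluctuations.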

Genealogical tree based particle smoothing
The evolution of the genealogical tree coincides with a mean field particle interpretation of the evolution equations associated with the posterior densities of the signal trajectories. For more details on these path space models, we refer to the books on the subject.

Unbiased particle estimates of likelihood functions
We use the product formula


 * $$p(y_0,\cdots,y_n)=\prod_{k=0}^n p(y_k|y_0,\cdots,y_{k-1})$$

with


 * $$p(y_k|y_0,\cdots,y_{k-1})=\int p(y_k|x_k) p(dx_k|y_0,\cdots,y_{k-1})$$

and the conventions $$p(y_0|y_0,\cdots,y_{-1})=p(y_0)$$ and $$p(x_0|y_0,\cdots,y_{-1})=p(x_0),$$ for k = 0. Replacing $$p(x_k|y_0,\cdots,y_{k-1})dx_k$$ by the empirical approximation


 * $$\widehat{p}(dx_k|y_0,\cdots,y_{k-1}):=\frac{1}{N}\sum_{i=1}^N \delta_{\xi^i_k}(dx_k) \approx_{N\uparrow\infty} p(dx_k|y_0,\cdots,y_{k-1})$$

in the above displayed formula, we design the following unbiased particle approximation of the likelihood function


 * $$p(y_0,\cdots,y_n) \approx_{N\uparrow\infty} \widehat{p}(y_0,\cdots,y_n)=\prod_{k=0}^n \widehat{p}(y_k|y_0,\cdots,y_{k-1}) $$

with


 * $$\widehat{p}(y_k|y_0,\cdots,y_{k-1})=\int p(y_k|x_k) \widehat{p}(dx_k|y_0,\cdots,y_{k-1})=\frac{1}{N}\sum_{i=1}^N p(y_k|\xi^i_k)$$

where $$p(y_k|\xi^i_k)$$ stands for the density $$p(y_k|x_k)$$ evaluated at $$x_k=\xi^i_k$$. This particle estimate was designed, and its unbiasedness proved, in the 1996 article. Refined variance estimates can be found in subsequent work.
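The product formula above translates directly into code. The sketch below assumes a finite-state hidden Markov model (hypothetical tables `init`, `trans` and `emis`), so that the exact likelihood is also available through the forward recursion and the unbiasedness can be checked by averaging many independent particle estimates.

```python
import random


def particle_likelihood(ys, n, init, trans, emis, rng):
    """Particle estimate prod_k (1/N) sum_i p(y_k | xi^i_k) of p(y_0..y_n) for a finite-state HMM.

    init[s] = p(x_0 = s), trans[s][t] = p(x_{k+1} = t | x_k = s), emis[s][y] = p(y_k = y | x_k = s).
    """
    states = range(len(init))
    xs = rng.choices(states, weights=init, k=n)                  # xi_0 ~ p(x_0)
    estimate = 1.0
    for y in ys:
        ws = [emis[x][y] for x in xs]
        estimate *= sum(ws) / n                                  # hat-p(y_k | y_0..y_{k-1})
        xs = rng.choices(xs, weights=ws, k=n)                    # selection
        xs = [rng.choices(states, weights=trans[x])[0] for x in xs]  # mutation
    return estimate


def exact_likelihood(ys, init, trans, emis):
    """Exact p(y_0..y_n) from the same product formula, computed with the forward recursion."""
    pred = list(init)
    total = 1.0
    for y in ys:
        joint = [pred[s] * emis[s][y] for s in range(len(pred))]
        evidence = sum(joint)                                    # p(y_k | y_0..y_{k-1})
        total *= evidence
        filt = [j / evidence for j in joint]
        pred = [sum(filt[s] * trans[s][t] for s in range(len(filt))) for t in range(len(filt))]
    return total
```

A single run with few particles can be far from the exact value, but the average of many independent runs converges to it, as the unbiasedness property predicts.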

Some convergence results
We shall assume that the filtering equation is stable, in the sense that it corrects any erroneous initial condition.

In this situation, the particle approximations of the likelihood functions are unbiased and the relative variance is controlled by


 * $$E\left(\widehat{p}(y_0,\cdots,y_n)\right)= p(y_0,\cdots,y_n), \qquad E\left(\left[\frac{\widehat{p}(y_0,\cdots,y_n)}{p(y_0,\cdots,y_n)}-1\right]^2\right)\leqslant \frac{cn}{N},$$

for some finite constant c. In addition, for any $$x\geqslant 0$$:


 * $$\mathbf{P} \left ( \left\vert \frac{1}{n}\log{\widehat{p}(y_0,\cdots,y_n)}-\frac{1}{n}\log{p(y_0,\cdots,y_n)}\right\vert \leqslant c_1 \frac{x}{N}+c_2 \sqrt{\frac{x}{N}} \right ) > 1-e^{-x} $$

for some finite constants $$c_1, c_2$$ related to the asymptotic bias and variance of the particle estimate.

The bias and the variance of the particle estimates based on the ancestral lines of the genealogical trees


 * $$\begin{align}
I^{path}_k(F) &:=\int F(x_0,\cdots,x_k) p(d(x_0,\cdots,x_k)|y_0,\cdots,y_{k-1}) \\ &\approx_{N\uparrow\infty} \widehat{I}^{path}_k(F) \\ &:=\int F(x_0,\cdots,x_k) \widehat{p}(d(x_0,\cdots,x_k)|y_0,\cdots,y_{k-1}) \\ &=\frac{1}{N}\sum_{i=1}^N F\left(\xi^i_{0,k},\cdots,\xi^i_{k,k}\right) \end{align}$$

are controlled by the non-asymptotic uniform estimates


 * $$\left| E\left(\widehat{I}^{path}_k(F)\right)-I_k^{path}(F)\right|\leqslant \frac{c_1 k}{N}, \qquad E\left(\left[\widehat{I}^{path}_k(F)-I_k^{path}(F)\right]^2\right)\leqslant \frac{c_2 k}{N},$$

for any function F bounded by 1, and for some finite constants $$c_1, c_2.$$ In addition, for any $$x\geqslant 0$$:


 * $$\mathbf{P} \left ( \left| \widehat{I}^{path}_k(F)-I_k^{path}(F)\right | \leqslant c_1 \frac{kx}{N}+c_2 \sqrt{\frac{kx}{N}} \land \sup_{0\leqslant k\leqslant n}\left| \widehat{I}_k^{path}(F)-I^{path}_k(F)\right| \leqslant c \sqrt{\frac{xn\log(n)}{N}} \right ) > 1-e^{-x}$$

for some finite constants $$c_1, c_2$$ related to the asymptotic bias and variance of the particle estimate, and for some finite constant c. The same type of bias and variance estimates hold for the backward particle smoothers. For additive functionals of the form


 * $$\overline{F}(x_0,\cdots,x_n):=\frac{1}{n+1}\sum_{0\leqslant k\leqslant n}f_k(x_k)$$

with


 * $$I^{path}_n(\overline{F}) \approx_{N\uparrow\infty} \widehat{I}^{\flat, path}_n(\overline{F}):=\int \overline{F}(x_0,\cdots,x_n) \widehat{p}_{backward}(d(x_0,\cdots,x_n)|y_0,\cdots,y_{n-1})$$

with functions $$f_k$$ bounded by 1, we have


 * $$\sup_{n\geqslant 0}{\left\vert E\left(\widehat{I}^{\flat,path}_n(\overline{F})\right)-I_n^{path}(\overline{F})\right\vert} \leqslant \frac{c_1}{N}$$

and


 * $$E\left(\left[\widehat{I}^{\flat,path}_n(\overline{F})-I_n^{path}(\overline{F})\right]^2\right)\leqslant \frac{c_2}{nN}+ \frac{c_3}{N^2}$$

for some finite constants $$c_1,c_2,c_3.$$ More refined estimates, including exponentially small probabilities of error, are developed in the literature.

The bootstrap filter
Sequential importance resampling (SIR), the original bootstrap filtering algorithm (Gordon et al. 1993), is also a commonly used filtering algorithm, which approximates the filtering probability density $$p(x_k|y_0,\cdots,y_k)$$ by a weighted set of N samples


 * $$ \left \{ \left (w^{(i)}_k,x^{(i)}_k \right ) \ : \ i\in\{1,\cdots,N\} \right \}.$$

The importance weights $$w^{(i)}_k$$ are approximations to the relative posterior probabilities (or densities) of the samples such that


 * $$\sum_{i=1}^N w^{(i)}_k = 1.$$

SIR is a sequential (i.e., recursive) version of importance sampling. As in importance sampling, the expectation of a function f can be approximated as a weighted average


 * $$ \int f(x_k) p(x_k|y_0,\dots,y_k) dx_k \approx \sum_{i=1}^N w_k^{(i)} f(x_k^{(i)}).$$
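The weighted average can be written as a small helper that also rescales the weights, so unnormalized weights can be passed in. The sketch illustrates it with plain (non-sequential) importance sampling: a hypothetical target $$\mathcal{N}(0,1)$$ sampled through the wider proposal $$\mathcal{N}(0,4)$$, with weights equal to the ratio of the two densities. The names `normal_pdf` and `weighted_average` are assumptions for illustration.

```python
import math
import random
from typing import Callable, Sequence


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def weighted_average(weights: Sequence[float], samples: Sequence[float],
                     f: Callable[[float], float]) -> float:
    """sum_i w^(i) f(x^(i)), after rescaling the weights to sum to one."""
    total = sum(weights)
    return sum(w * f(x) for w, x in zip(weights, samples)) / total


# Hypothetical example: target N(0, 1), samples drawn from the proposal N(0, 4).
rng = random.Random(1)
xs = [rng.gauss(0.0, 2.0) for _ in range(20000)]
ws = [normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 4.0) for x in xs]  # target / proposal
second_moment = weighted_average(ws, xs, lambda x: x * x)  # estimates E[X^2] = 1
```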

For a finite set of samples, the algorithm performance is dependent on the choice of the proposal distribution


 * $$\pi(x_k|x_{0:k-1},y_{0:k})\, $$.

The "optimal" proposal distribution is given as the target distribution
 * $$\pi(x_k|x_{0:k-1},y_{0:k}) = p(x_k|x_{k-1},y_{k})=\frac{p(y_k|x_k)}{\int p(y_k|x_k)p(x_k|x_{k-1})dx_k}~p(x_k|x_{k-1}).$$
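As a worked illustration, consider a hypothetical scalar linear-Gaussian model (not one used elsewhere in this article): $$X_k = a X_{k-1} + V_k$$, $$Y_k = X_k + W_k$$, with independent $$V_k\sim\mathcal{N}(0,q)$$ and $$W_k\sim\mathcal{N}(0,r)$$. Completing the square in $$p(y_k|x_k)\,p(x_k|x_{k-1})$$ shows that the optimal proposal and its normalizing constant are both Gaussian:

 * $$\begin{align} p(x_k|x_{k-1},y_k) &= \mathcal{N}\left(x_k;\ \sigma^2\left(\frac{a\,x_{k-1}}{q}+\frac{y_k}{r}\right),\ \sigma^2\right), \qquad \sigma^2=\left(\frac{1}{q}+\frac{1}{r}\right)^{-1}, \\ \int p(y_k|x_k)\,p(x_k|x_{k-1})\,dx_k &= \mathcal{N}\left(y_k;\ a\,x_{k-1},\ q+r\right). \end{align}$$

With this proposal, the factor $$p(y_k|x_k)\,p(x_k|x_{k-1})/\pi(x_k|x_{0:k-1},y_{0:k})$$ in the SIR weight update reduces to $$\mathcal{N}(y_k;\,a\,x_{k-1},\,q+r)$$, which does not depend on the newly drawn $$x_k$$.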

This particular choice of proposal transition was proposed by P. Del Moral in 1996 and 1998. When it is difficult to sample transitions according to the distribution $$ p(x_k|x_{k-1},y_{k})$$, one natural strategy is to use the following particle approximation


 * $$\begin{align}
\frac{p(y_k|x_k)}{\int p(y_k|x_k)p(x_k|x_{k-1})dx_k} p(x_k|x_{k-1})dx_k &\simeq_{N\uparrow\infty} \frac{p(y_k|x_k)}{\int p(y_k|x_k)\widehat{p}(dx_k|x_{k-1})} \widehat{p}(dx_k|x_{k-1}) \\ &= \sum_{i=1}^N \frac{p(y_k|X^i_k(x_{k-1}))}{\sum_{j=1}^N p(y_k|X^j_k(x_{k-1}))} \delta_{X^i_k(x_{k-1})}(dx_k) \end{align}$$

with the empirical approximation


 * $$ \widehat{p}(dx_k|x_{k-1})= \frac{1}{N}\sum_{i=1}^{N} \delta_{X^i_k(x_{k-1})}(dx_k)~\simeq_{N\uparrow\infty} p(x_k|x_{k-1})dx_k $$

associated with N (or any other large number of) independent random samples $$X^i_k(x_{k-1}), i=1,\cdots,N$$ with the conditional distribution of the random state $$X_k$$ given $$X_{k-1}=x_{k-1}$$. The consistency of the particle filter resulting from this approximation, and other extensions, are developed in the literature. In the above display, $$\delta_a$$ stands for the Dirac measure at a given state a.
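Sampling from this particle approximation amounts to drawing $$X^i_k(x_{k-1})$$ from the transition and keeping one of them with probability proportional to $$p(y_k|X^i_k(x_{k-1}))$$. The sketch below assumes a hypothetical scalar model $$X_k = X_{k-1}+\mathcal{N}(0,1)$$, $$Y_k = X_k+\mathcal{N}(0,1)$$; for $$x_{k-1}=0$$ and $$y_k=2$$, completing the square gives the exact optimal proposal $$\mathcal{N}(1, 0.5)$$, which the draws should approach. `sample_approx_optimal` and `n_inner` (the inner sample size, N in the display) are illustrative names.

```python
import math
import random
from typing import Callable


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sample_approx_optimal(x_prev: float, y: float,
                          sample_transition: Callable[[float, random.Random], float],
                          likelihood: Callable[[float, float], float],
                          n_inner: int, rng: random.Random) -> float:
    """Draw x_k from the particle approximation of p(x_k | x_{k-1}, y_k)."""
    # n_inner independent draws X^i_k(x_{k-1}) from p(x_k | x_{k-1})
    candidates = [sample_transition(x_prev, rng) for _ in range(n_inner)]
    # select one with probability proportional to p(y_k | X^i_k(x_{k-1}))
    g = [likelihood(y, c) for c in candidates]
    return rng.choices(candidates, weights=g, k=1)[0]


# Hypothetical model X_k = X_{k-1} + N(0, 1), Y_k = X_k + N(0, 1);
# with x_{k-1} = 0 and y_k = 2 the exact optimal proposal is N(1, 0.5).
rng = random.Random(2)
draws = [sample_approx_optimal(0.0, 2.0,
                               lambda x, g: x + g.gauss(0.0, 1.0),
                               lambda y, x: normal_pdf(y, x, 1.0),
                               n_inner=100, rng=rng)
         for _ in range(2000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

For finite `n_inner` the draws follow the approximation rather than the exact optimal proposal, so small deviations in the sample mean and variance are expected.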

However, the transition prior probability distribution is often used as the importance function, since it is easier to draw particles (or samples) from it and to perform the subsequent importance weight calculations:
 * $$\pi(x_k|x_{0:k-1},y_{0:k}) = p(x_k|x_{k-1}).$$

Sequential importance resampling (SIR) filters that use the transition prior probability distribution as the importance function are commonly known as the bootstrap filter and the condensation algorithm.

Resampling is used to avoid the degeneracy problem of the algorithm, that is, the situation in which all but one of the importance weights are close to zero. The performance of the algorithm can also be affected by the choice of resampling method. The stratified sampling proposed by Kitagawa (1996) is optimal in terms of variance.
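Stratified resampling draws one uniform variable in each stratum $$[i/N,(i+1)/N)$$ and maps it through the cumulative weights, so that particle j is selected a number of times within 2 of $$N w^{(j)}_k$$. A minimal sketch, in which `stratified_resample` is a hypothetical helper name that returns ancestor indices:

```python
import bisect
import itertools
import random
from typing import List, Sequence


def stratified_resample(weights: Sequence[float], rng: random.Random) -> List[int]:
    """Return N ancestor indices, one uniform draw per stratum [i/N, (i+1)/N)."""
    N = len(weights)
    total = sum(weights)
    cdf = [c / total for c in itertools.accumulate(weights)]
    indices = []
    for i in range(N):
        u = (i + rng.random()) / N
        j = bisect.bisect_right(cdf, u)  # first j with cdf[j] > u
        indices.append(min(j, N - 1))    # guard against round-off in cdf[-1]
    return indices


rng = random.Random(5)
idx = stratified_resample([0.0, 2.0, 0.0, 2.0], rng)  # [1, 1, 3, 3]
```

Systematic resampling is a close variant that reuses a single uniform draw for all strata.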

A single step of sequential importance resampling is as follows:


 * 1) For $$i=1,\cdots,N$$ draw samples from the proposal distribution
 * $$x^{(i)}_k \sim \pi(x_k|x^{(i)}_{0:k-1},y_{0:k})$$


 * 2) For $$i=1,\cdots,N$$ update the importance weights up to a normalizing constant:
 * $$\hat{w}^{(i)}_k = w^{(i)}_{k-1} \frac{p(y_k|x^{(i)}_k) p(x^{(i)}_k|x^{(i)}_{k-1})} {\pi(x_k^{(i)}|x^{(i)}_{0:k-1},y_{0:k})}.$$
 * Note that when we use the transition prior probability distribution as the importance function,
 * $$ \pi(x_k^{(i)}|x^{(i)}_{0:k-1},y_{0:k}) = p(x^{(i)}_k|x^{(i)}_{k-1}),$$
 * this simplifies to:
 * $$ \hat{w}^{(i)}_k = w^{(i)}_{k-1} p(y_k|x^{(i)}_k), $$


 * 3) For $$i=1,\cdots,N$$ compute the normalized importance weights:
 * $$w^{(i)}_k = \frac{\hat{w}^{(i)}_k}{\sum_{j=1}^N \hat{w}^{(j)}_k}$$


 * 4) Compute an estimate of the effective number of particles as
 * $$\hat{N}_\mathit{eff} = \frac{1}{\sum_{i=1}^N\left(w^{(i)}_k\right)^2} $$
 * This criterion reflects the variance of the weights; other criteria, together with their rigorous analysis and central limit theorems, can be found in the literature.


 * 5) If the effective number of particles is less than a given threshold $$\hat{N}_\mathit{eff} < N_{thr}$$, then perform resampling:
 * a) Draw N particles from the current particle set with probabilities proportional to their weights. Replace the current particle set with this new one.
 * b) For $$i=1,\cdots,N$$ set $$w^{(i)}_k = 1/N.$$
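The five steps can be sketched for a scalar state as follows. Several simplifications are assumptions of this sketch, not part of the description above: the proposal depends only on $$x^{(i)}_{k-1}$$ and $$y_k$$ rather than on the full path and observation record, step 5a uses multinomial resampling via `random.Random.choices`, and `sir_step` and the model helpers are hypothetical names. The usage example is a bootstrap filter for $$X_k = X_{k-1}+\mathcal{N}(0,1)$$, $$Y_k = X_k+\mathcal{N}(0,1)$$ started at $$x_0=0$$; after observing $$y_1=1$$, the exact posterior mean is 0.5.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sir_step(particles: Sequence[float], weights: Sequence[float], y: float,
             sample_proposal: Callable, proposal_pdf: Callable,
             transition_pdf: Callable, likelihood: Callable,
             n_threshold: float, rng: random.Random
             ) -> Tuple[List[float], List[float], float]:
    """One SIR step; returns (particles, normalized weights, N_eff before resampling)."""
    N = len(particles)
    # 1) draw x_k^(i) from the proposal, given x_{k-1}^(i) and y_k
    new = [sample_proposal(x_prev, y, rng) for x_prev in particles]
    # 2) unnormalized weights: w_{k-1} p(y_k|x_k) p(x_k|x_{k-1}) / pi(x_k|x_{k-1}, y_k)
    w_hat = [w * likelihood(y, x) * transition_pdf(x, x_prev) / proposal_pdf(x, x_prev, y)
             for w, x, x_prev in zip(weights, new, particles)]
    # 3) normalize
    total = sum(w_hat)
    w_new = [v / total for v in w_hat]
    # 4) effective number of particles
    n_eff = 1.0 / sum(v * v for v in w_new)
    # 5) multinomial resampling when N_eff drops below the threshold
    if n_eff < n_threshold:
        new = rng.choices(new, weights=w_new, k=N)
        w_new = [1.0 / N] * N
    return new, w_new, n_eff


# Bootstrap choice: the proposal is the transition prior, so their ratio is 1.
def sample_prior(x_prev, y, rng):
    return x_prev + rng.gauss(0.0, 1.0)

def prior_pdf(x, x_prev, y):
    return normal_pdf(x, x_prev, 1.0)

def transition_pdf(x, x_prev):
    return normal_pdf(x, x_prev, 1.0)

def likelihood(y, x):
    return normal_pdf(y, x, 1.0)


rng = random.Random(3)
N = 10000
particles, weights, n_eff = sir_step([0.0] * N, [1.0 / N] * N, 1.0, sample_prior,
                                     prior_pdf, transition_pdf, likelihood,
                                     n_threshold=0.0, rng=rng)  # never resamples
posterior_mean = sum(w * x for w, x in zip(weights, particles))  # close to 0.5
```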

The term Sampling Importance Resampling is also sometimes used when referring to SIR filters.

Sequential importance sampling (SIS)

 * Is the same as sequential importance resampling, but without the resampling stage.
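Without the resampling stage the weights degenerate over time. A sketch of bootstrap SIS on a hypothetical random-walk model ($$X_k = X_{k-1}+V_k$$, $$Y_k = X_k+W_k$$, $$X_0=0$$) records the effective number of particles after each observation. Weights are kept in log space and shifted by their maximum before exponentiating, an implementation choice made here to avoid underflow. `sis_ess_history` is an illustrative name.

```python
import math
import random
from typing import List, Sequence


def sis_ess_history(ys: Sequence[float], N: int, q: float, r: float,
                    rng: random.Random) -> List[float]:
    """Bootstrap SIS without resampling on X_k = X_{k-1} + V_k, Y_k = X_k + W_k,
    V_k ~ N(0, q), W_k ~ N(0, r), X_0 = 0. Returns N_eff after each observation."""
    xs = [0.0] * N
    log_w = [0.0] * N  # log-weights
    history = []
    for y in ys:
        xs = [x + rng.gauss(0.0, math.sqrt(q)) for x in xs]  # propagate with the prior
        # add log p(y_k | x_k), dropping the constant common to all particles
        log_w = [lw - 0.5 * (y - x) ** 2 / r for lw, x in zip(log_w, xs)]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]  # shift by the max to avoid underflow
        total = sum(w)
        history.append(1.0 / sum((v / total) ** 2 for v in w))
    return history


rng = random.Random(4)
x, ys = 0.0, []
for _ in range(30):  # simulate a hypothetical observation record
    x += rng.gauss(0.0, 1.0)
    ys.append(x + rng.gauss(0.0, 1.0))
ess = sis_ess_history(ys, N=500, q=1.0, r=1.0, rng=rng)
```

After a few dozen steps the recorded effective number of particles is typically a small fraction of N, which is the degeneracy that the resampling stage of SIR is meant to counter.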

Other particle filters

 * Exponential Natural Particle Filter
 * Auxiliary particle filter
 * Regularized auxiliary particle filter
 * Gaussian particle filter
 * Unscented particle filter
 * Gauss–Hermite particle filter
 * Cost Reference particle filter
 * Hierarchical/Scalable particle filter
 * Rao–Blackwellized particle filter
 * Rejection-sampling based optimal particle filter
 * Feynman-Kac and mean field particle methodologies