Single-particle trajectory

Single-particle trajectories (SPTs) consist of a collection of successive discrete points ordered in time. These trajectories are reconstructed from images acquired in experiments. In cell biology, they are obtained by using a laser to transiently activate small dyes attached to a moving molecule.

Molecules can now be visualized using recent super-resolution microscopy, which allows the routine collection of thousands of short and long trajectories. These trajectories explore part of a cell, either on the membrane or in three dimensions, and their paths are critically influenced by the locally crowded organization and the molecular interactions inside the cell, as shown in various cell types such as neuronal cells, astrocytes, immune cells and many others.

SPTs allow observing moving molecules inside cells to collect statistics
SPTs make it possible to observe moving particles. These trajectories are used to investigate cytoplasm or membrane organization, as well as cell nucleus dynamics, remodeler dynamics or mRNA production. Owing to the constant improvement of instrumentation, the spatial resolution keeps improving and now reaches approximately 20 nm, while the acquisition time step is usually in the range of 10 to 50 ms to capture short events occurring in live tissues. A variant of super-resolution microscopy called sptPALM is used to detect the local and dynamically changing organization of molecules in cells, or events of DNA binding by transcription factors in the mammalian nucleus. Super-resolution image acquisition and particle tracking are crucial to guarantee high-quality data.

Assembling points into a trajectory based on tracking algorithms
Once points are acquired, the next step is to reconstruct a trajectory. This step is done using known tracking algorithms that connect the acquired points. Tracking algorithms are based on a physical model of trajectories perturbed by an additive random noise.
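
The linking step can be illustrated with a minimal sketch. It assumes a greedy nearest-neighbour rule with a maximal allowed jump, which is one of the simplest possible choices and not necessarily the rule used by the tracking algorithms referred to above; the function name `link_nearest` and the parameter `max_jump` are hypothetical.

```python
import math

def link_nearest(frames, max_jump):
    """Greedily link detections in consecutive frames into trajectories.

    frames: list of lists of (x, y) detections, one inner list per frame.
    max_jump: largest displacement allowed between consecutive frames.
    """
    finished = []                          # trajectories that have ended
    active = [[p] for p in frames[0]]      # trajectories still growing
    for detections in frames[1:]:
        # Candidate (distance, track index, detection index) links within range.
        pairs = []
        for ti, track in enumerate(active):
            for di, p in enumerate(detections):
                d = math.dist(track[-1], p)
                if d <= max_jump:
                    pairs.append((d, ti, di))
        pairs.sort()                       # accept the shortest links first
        used_tracks, used_dets = set(), set()
        for d, ti, di in pairs:
            if ti not in used_tracks and di not in used_dets:
                active[ti].append(detections[di])
                used_tracks.add(ti)
                used_dets.add(di)
        # Unmatched tracks end here; unmatched detections start new tracks.
        finished += [t for ti, t in enumerate(active) if ti not in used_tracks]
        active = [t for ti, t in enumerate(active) if ti in used_tracks]
        active += [[p] for di, p in enumerate(detections) if di not in used_dets]
    return finished + active

# Two particles in the first two frames, only one detected in the third.
frames = [[(0.0, 0.0), (10.0, 10.0)],
          [(0.5, 0.1), (10.2, 9.9)],
          [(1.0, 0.0)]]
tracks = link_nearest(frames, max_jump=2.0)
```

A greedy rule of this kind can mislink particles that cross paths or blink; it is shown only to make the reconstruction step concrete.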

Extract physical parameters from redundant SPTs
The redundancy of many short SPTs is a key feature for extracting biophysical parameters from empirical data at a molecular level. In contrast, long isolated trajectories have been used to extract information along trajectories, destroying the natural spatial heterogeneity associated with the various positions. The main statistical tool is the mean-square displacement (MSD), or second-order statistical moment:


 * $$\langle|X(t+\Delta t)- X(t)|^2\rangle \sim (\Delta t)^\alpha$$ (average over realizations), where $$\alpha$$ is called the anomalous exponent.

For a Brownian motion, $$\langle|X(t+\Delta t)- X(t)|^2\rangle=2 n D\,\Delta t$$, where D is the diffusion coefficient and n is the dimension of the space. Some other properties can also be recovered from long trajectories, such as the radius of confinement for a confined motion. The MSD has been widely used in early applications of long but not necessarily redundant single-particle trajectories in a biological context. However, the MSD applied to long trajectories suffers from several issues. First, it is not precise, in part because the measured points could be correlated. Second, it cannot be used to compute any physical diffusion coefficient when trajectories consist of switching episodes, for example alternating between free and confined diffusion. At low spatiotemporal resolution of the observed trajectories, the MSD behaves sublinearly with time, a process known as anomalous diffusion, which is due in part to the averaging of the different phases of the particle motion. In the context of cellular transport (amoeboid), high-resolution motion analysis of long SPTs in micro-fluidic chambers containing obstacles revealed different types of cell motion depending on the obstacle density: crawling was found at low obstacle density, and directed motion and random phases could even be differentiated.
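
As an illustration of the MSD, the following sketch computes it from an ensemble of short simulated two-dimensional Brownian trajectories, and estimates the diffusion coefficient and the anomalous exponent from it. The helper names, the parameter values and the units (µm²/s and s) are assumptions chosen for the example, and the exponent is obtained by a plain least-squares fit in log-log coordinates.

```python
import math
import random

def simulate_brownian_2d(n_steps, D, dt, rng):
    """Free 2D Brownian path: each coordinate gets N(0, 2*D*dt) increments."""
    sigma = math.sqrt(2.0 * D * dt)
    path = [(0.0, 0.0)]
    for _ in range(n_steps):
        x, y = path[-1]
        path.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return path

def msd(trajectories, lag):
    """Mean-square displacement at a lag given in frames, pooled over all trajectories."""
    total, count = 0.0, 0
    for traj in trajectories:
        for i in range(len(traj) - lag):
            dx = traj[i + lag][0] - traj[i][0]
            dy = traj[i + lag][1] - traj[i][1]
            total += dx * dx + dy * dy
            count += 1
    return total / count

def anomalous_exponent(trajectories, lags, dt):
    """Least-squares slope of log(MSD) against log(lag time)."""
    xs = [math.log(lag * dt) for lag in lags]
    ys = [math.log(msd(trajectories, lag)) for lag in lags]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

rng = random.Random(0)
D_true, dt = 0.1, 0.02                    # assumed values: um^2/s and s
trajs = [simulate_brownian_2d(20, D_true, dt, rng) for _ in range(2000)]
D_est = msd(trajs, 1) / (2 * 2 * dt)      # MSD = 2 n D dt with n = 2
alpha = anomalous_exponent(trajs, [1, 2, 3, 4], dt)
```

For this free Brownian motion the estimated diffusion coefficient should be close to the simulated one and the exponent close to 1; localization noise or confinement, which are not simulated here, would change both.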

Langevin and Smoluchowski equations as a model of motion
Statistical methods to extract information from SPTs are based on stochastic models, such as the Langevin equation or its Smoluchowski limit, and on associated models that account for additional noise in the identification of point localizations or for a memory kernel. The Langevin equation describes a stochastic particle driven by a Brownian force $$\Xi$$ and a field of force (e.g., electrostatic, mechanical, etc.) with an expression $$F(x,t)$$:


 * $$m\ddot x+\Gamma \dot x-F(x,t)=\Xi,$$

where m is the mass of the particle and $$\Gamma= 6\pi a \rho$$ is the friction coefficient of a diffusing particle of radius a, $$\rho$$ being the viscosity. Here $$\Xi$$ is the $$\delta$$-correlated Gaussian white noise. The force can be derived from a potential well U so that $$F(x,t)=- U'(x)$$ and in that case, the equation takes the form


 * $$m\frac{d^2 x}{dt^2} +\Gamma \frac{d x}{dt} +\nabla U(x)=\sqrt{2\varepsilon\Gamma}\,\frac{d\eta}{dt},$$

where $$\varepsilon=k_\text{B} T$$ is the thermal energy, $$k_\text{B}$$ the Boltzmann constant and T the temperature. Langevin's equation is used to describe trajectories where inertia or acceleration matters. For example, at very short timescales, when a molecule unbinds from a binding site or escapes from a potential well, the inertia term allows the particle to move away from the attractor and thus prevents immediate rebinding, which could otherwise plague numerical simulations.

In the large friction limit $$\gamma\to\infty$$, where $$\gamma$$ denotes the friction coefficient (written $$\Gamma$$ above), the trajectories $$x(t)$$ of the Langevin equation converge in probability to those of the Smoluchowski equation


 * $$\gamma \dot{x}+U^\prime (x)=\sqrt{2\varepsilon\gamma}\,\dot{w},$$

where $$\dot w(t) $$ is $$\delta$$-correlated. This equation is obtained when the diffusion coefficient is constant in space. When this is not the case, coarse-grained equations (at a coarse spatial resolution) should be derived from molecular considerations. The interpretation of the physical forces is not resolved by the choice between the Itô and Stratonovich integral representations, or any other.
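
The Smoluchowski equation can be simulated with a simple Euler-Maruyama scheme. The sketch below assumes a harmonic potential $$U(x)=kx^2/2$$ and nondimensional parameter values, both chosen only for illustration, and uses the relation $$D=\varepsilon/\gamma$$ between the diffusion coefficient and the friction coefficient.

```python
import math
import random

def simulate_smoluchowski(n_steps, dt, gamma, eps, k, rng, x0=0.0):
    """Euler-Maruyama scheme for gamma*dx/dt + U'(x) = sqrt(2*eps*gamma)*dw/dt,
    with the assumed harmonic potential U(x) = k*x**2/2."""
    D = eps / gamma                       # diffusion coefficient D = eps/gamma
    noise = math.sqrt(2.0 * D * dt)       # standard deviation of one noise step
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        drift = -k * x / gamma            # -U'(x)/gamma
        xs.append(x + drift * dt + noise * rng.gauss(0.0, 1.0))
    return xs

rng = random.Random(1)
gamma, eps, k, dt = 1.0, 1.0, 1.0, 0.01   # illustrative, nondimensional values
xs = simulate_smoluchowski(400_000, dt, gamma, eps, k, rng)
tail = xs[len(xs) // 10:]                 # discard the initial transient
mean = sum(tail) / len(tail)
var = sum((x - mean) ** 2 for x in tail) / len(tail)
```

For a harmonic well the stationary variance of the position is $$\varepsilon/k$$, which gives a simple consistency check on the simulated trajectory.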

General model equations
For a timescale much longer than that of elementary molecular collisions, the position of a tracked particle is described by a more general overdamped limit of the Langevin stochastic model. Indeed, if the acquisition timescale of the empirically recorded trajectories is much longer than that of thermal fluctuations, rapid events are not resolved in the data. Thus, at this coarser spatiotemporal scale, the motion is described by an effective stochastic equation


 * $$\dot{X}(t)={b}(X(t)) +\sqrt{2}{B}_e(X(t))\dot{w}(t), \qquad\qquad (1) $$

where $${b}(X) $$ is the drift field and $${B}_e $$ the diffusion matrix. The effective diffusion tensor, which can vary in space, is $$D(X)=\frac{1}{2}\bigl(\sqrt{2}B_e(X)\bigr)\bigl(\sqrt{2}B_e(X)\bigr)^T=B_e(X)B_e^T(X)$$ ($$B_e^T$$ denotes the transpose of $$B_e$$). This equation is not derived but assumed. However, the diffusion coefficient should be smooth enough: any discontinuity in D should be resolved by a spatial rescaling in order to analyse its source (usually inert obstacles or transitions between two media). The observed effective diffusion tensor is not necessarily isotropic and can be state-dependent, whereas the friction coefficient $$\gamma$$ remains constant as long as the medium stays the same, and the microscopic diffusion coefficient (or tensor) could remain isotropic.

Statistical analysis of these trajectories
The development of statistical methods is based on stochastic models, possibly combined with a deconvolution procedure applied to the trajectories. Numerical simulations can also be used to identify specific features that could be extracted from SPT data. The goal of building a statistical ensemble from SPT data is to observe local physical properties of the particles, such as velocity, diffusion, confinement or attracting forces, which reflect the interactions of the particles with their local nanometer-scale environment. Stochastic modeling can also be used to infer, from the diffusion coefficient (or tensor), the confinement or the local density of obstacles, reflecting the presence of biological objects of different sizes.

Empirical estimators for the drift and diffusion tensor of a stochastic process
Several empirical estimators have been proposed to recover the local diffusion coefficient, the vector field and even organized patterns in the drift, such as potential wells. These estimators serve to recover physical properties using parametric and non-parametric statistics. Retrieving the statistical parameters of a diffusion process from one-dimensional time series relies on the first moment estimator or on Bayesian inference.

The models and the analysis assume that processes are stationary, so that the statistical properties of trajectories do not change over time. In practice, this assumption is satisfied when trajectories are acquired for less than a minute, during which only a few slow changes may occur, for example on the surface of a neuron. Non-stationary behavior is observed using time-lapse analysis, with a delay of tens of minutes between successive acquisitions.

The coarse-grained model Eq. 1 is recovered from the conditional moments of the trajectory increments $$\Delta X= X(t+\Delta t)- X(t)$$, where the drift (denoted b in Eq. 1) is written a:


 * $$a( x)=\lim_{\Delta t \rightarrow 0} \frac{E[\Delta X(t)\mid X(t)= x]}{\Delta t},$$


 * $$D( x)=\lim_{\Delta t \rightarrow 0} \frac{E[\Delta X(t)^T\,\Delta X(t)\mid X(t)= x]}{2\,\Delta t}.$$

Here the notation $$E[\cdot\,|\, X(t)= x]$$ means averaging over all trajectories that are at point x at time t. The coefficients of the Smoluchowski equation can be statistically estimated at each point x from an infinitely large sample of its trajectories in the neighborhood of the point x at time t.
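
As a heuristic check of these two formulas, assuming the Itô interpretation of Eq. 1 and a one-step Euler discretization, the increment over a short time step can be written with a standard Gaussian vector $$\eta$$ (a symbol introduced here only for illustration), treating $$\Delta X$$ as a column vector:


 * $$\Delta X \approx b(x)\,\Delta t+\sqrt{2\,\Delta t}\,B_e(x)\,\eta,\qquad E[\eta]=0,\quad E[\eta\,\eta^T]=I.$$

Taking conditional moments then gives


 * $$E[\Delta X\mid X(t)=x]\approx b(x)\,\Delta t,\qquad E[\Delta X\,\Delta X^T\mid X(t)=x]\approx 2\,B_e(x)B_e^T(x)\,\Delta t+b(x)\,b(x)^T\,\Delta t^2,$$

so that dividing by $$\Delta t$$ and $$2\,\Delta t$$ respectively and letting $$\Delta t\to 0$$ yields $$a(x)=b(x)$$ and $$D(x)=B_e(x)B_e^T(x)$$.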

Empirical estimation
In practice, the expectations for a and D are estimated by finite sample averages and $$\Delta t$$ is the time resolution of the recorded trajectories. The formulas for a and D are approximated at the time step $$\Delta t$$, with tens to hundreds of points falling in each bin, which is usually enough for the estimation.

To estimate the local drift and diffusion coefficients, trajectories are first grouped within a small neighbourhood. The field of observation is partitioned into square bins $$S( x_k,r)$$ of side r and centre $$x_k$$, and the local drift and diffusion are estimated for each square. Considering a sample of $$N_t$$ trajectories $$\{X^j(t_1),\dots, X^j(t_{N_s}) \},$$ $$j=1,\dots,N_t,$$ where $$t_i$$ are the sampling times and $$X^j_i=X^j(t_i)=(x^j_i,y^j_i)$$, the discretization of the equation for the drift $$a(x_k)=(a_x(x_k),a_y(x_k))$$ at position $$x_k$$ is given for each spatial projection on the x and y axes by


 * $$a_x(x_k) \approx \frac{1}{N_k}\sum_{j=1}^{N_t} \sum_{i=1,\, X^j_i\in S(x_k,r)}^{N_s-1}\left(\frac{ x^j_{i+1}- x^j_i}{\Delta t} \right),$$


 * $$a_y(x_k) \approx \frac{1}{N_k}\sum_{j=1}^{N_t}\sum_{i=1,\, X^j_i\in S(x_k,r)}^{N_s-1} \left(\frac{ y^j_{i+1}- y^j_i}{\Delta t}\right),$$

where $$N_k$$ is the number of trajectory points $$X^j_i$$ (with $$i\le N_s-1$$) that fall in the square $$S( x_k,r)$$. Similarly, the components of the effective diffusion tensor $$D( x_k)$$ are approximated by the empirical sums


 * $$D_{xx}(x_k) \approx \frac{1}{N_k} \sum_{j=1}^{N_t} \sum_{i=1,\, X^j_i\in S(x_k,r)}^{N_s-1} \frac{(x^j_{i+1}-x^j_i)^2} {2\,\Delta t},$$


 * $$D_{yy}(x_k) \approx \frac{1}{N_k} \sum_{j=1}^{N_t} \sum_{i=1,\, X^j_i\in S(x_k,r)}^{N_s-1} \frac{(y^j_{i+1}-y^j_i)^2} {2\,\Delta t},$$


 * $$D_{xy}(x_k) \approx \frac{1}{N_k}\sum_{j=1}^{N_t}\sum_{i=1,\, X^j_i\in S(x_k,r)}^{N_s-1}\frac{(x^j_{i+1}-x^j_i)(y^j_{i+1}-y^j_i)}{2\,\Delta t}.$$

The moment estimation requires a large number of trajectories passing through each point, which matches the massive data sets generated by certain types of super-resolution techniques, such as sptPALM, on biological samples. The exact inversion of Langevin's equation requires, in theory, an infinite number of trajectories passing through any point x of interest. In practice, the drift and diffusion tensor are recovered after the region is subdivided by a square grid of side r or by moving sliding windows (of the order of 50 to 100 nm).
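
The empirical sums for the drift and the diffusion tensor can be sketched as follows. The example bins the increments of many short trajectories on a square grid and averages them per bin. The test data are simulated with an assumed linear drift $$-\lambda X$$ and an isotropic diffusion coefficient, in arbitrary units; the function names and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict

def estimate_drift_diffusion(trajectories, dt, r):
    """Average increments over the square bin (side r) containing their starting point.

    Returns {(kx, ky): (N_k, (a_x, a_y), (D_xx, D_yy, D_xy))} per bin index.
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])
    for traj in trajectories:
        for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
            dx, dy = x1 - x0, y1 - y0
            s = sums[(math.floor(x0 / r), math.floor(y0 / r))]
            s[0] += 1                      # N_k: increments starting in the bin
            s[1] += dx / dt                # drift, x component
            s[2] += dy / dt                # drift, y component
            s[3] += dx * dx / (2 * dt)     # D_xx
            s[4] += dy * dy / (2 * dt)     # D_yy
            s[5] += dx * dy / (2 * dt)     # D_xy
    return {k: (s[0], (s[1] / s[0], s[2] / s[0]),
                (s[3] / s[0], s[4] / s[0], s[5] / s[0]))
            for k, s in sums.items()}

def simulate(n_traj, n_steps, dt, D, lam, rng):
    """Short 2D trajectories with drift -lam*X and isotropic diffusion D (test data)."""
    sigma = math.sqrt(2 * D * dt)
    out = []
    for _ in range(n_traj):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        traj = [(x, y)]
        for _ in range(n_steps):
            x += -lam * x * dt + rng.gauss(0, sigma)
            y += -lam * y * dt + rng.gauss(0, sigma)
            traj.append((x, y))
        out.append(traj)
    return out

rng = random.Random(2)
dt, D_true, lam, r = 0.02, 0.05, 1.0, 0.5          # arbitrary units
trajs = simulate(20000, 10, dt, D_true, lam, rng)
field = estimate_drift_diffusion(trajs, dt, r)
n_k, (a_x, a_y), (D_xx, D_yy, D_xy) = field[(1, 0)]  # square [0.5, 1) x [0, 0.5)
```

In the chosen bin the estimated drift should point towards the origin, and the diagonal terms of the tensor should be close to the simulated diffusion coefficient. At a finite time step the second moment also contains a contribution of order $$|a|^2\Delta t/2$$ from the drift, so the diffusion estimate is slightly biased upwards where the drift is large.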

Automated recovery of the boundary of a nanodomain
Algorithms based on mapping the density of points extracted from trajectories reveal local binding and trafficking interactions and the organization of dynamic subcellular sites. The algorithms can be applied to study regions of high density revealed by SPTs. Examples are organelles such as the endoplasmic reticulum, or cell membranes. The method is based on spatiotemporal segmentation to detect the local architecture and boundaries of high-density regions for domains measuring hundreds of nanometers.
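
The idea of a density map can be illustrated with a deliberately simple sketch, which is not the segmentation method referred to above: points are counted on a square grid, bins whose count exceeds an assumed multiple of the mean count are marked as dense, and a dense bin is placed on the boundary if one of its four neighbours is not dense. The function name, the threshold rule and the synthetic data are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def high_density_boundary(points, r, factor):
    """Return (dense bins, boundary bins) of a density map with square bins of side r.

    A bin is dense if its count exceeds `factor` times the mean count of occupied
    bins; a dense bin is on the boundary if one of its 4 neighbours is not dense.
    """
    counts = Counter((math.floor(x / r), math.floor(y / r)) for x, y in points)
    threshold = factor * sum(counts.values()) / len(counts)
    dense = {b for b, c in counts.items() if c > threshold}
    boundary = {(i, j) for (i, j) in dense
                if any(n not in dense
                       for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)))}
    return dense, boundary

# Synthetic data: a uniform background plus a dense square region [4, 6] x [4, 6].
rng = random.Random(3)
background = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(2000)]
cluster = [(rng.uniform(4, 6), rng.uniform(4, 6)) for _ in range(4000)]
dense, boundary = high_density_boundary(background + cluster, r=0.5, factor=3.0)
```

With these synthetic points the dense bins cover the simulated square region and the boundary bins form its perimeter. Real data would require choosing the bin size and threshold with care, and this sketch ignores the temporal dimension entirely.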