Redundancy principle (biology)

The redundancy principle in biology expresses the need for many copies of the same entity (cells, molecules, ions) to fulfill a biological function. Examples are numerous: the disproportionate number of spermatozoa during fertilization compared to one egg, the large number of neurotransmitters released during neuronal communication compared to the number of receptors, the large number of calcium ions released during calcium transients in cells, and many more in molecular and cellular transduction, gene activation and cell signaling. This redundancy is particularly relevant when the sites of activation are physically separated from the initial position of the molecular messengers. The redundancy is often generated to resolve the time constraint of fast-activating pathways. It can be expressed in terms of the theory of extreme statistics to determine its laws and quantify how the shortest paths are selected. The main goal is to estimate these large numbers from physical principles and mathematical derivations.

When a large distance separates the source and the target (a small activation site), the redundancy principle explains that this geometrical gap can be compensated by large numbers. Had nature used fewer copies than normal, activation would have taken a much longer time, as finding a small target by chance is a rare event and falls into the class of narrow escape problems.

Molecular rate
The time for the fastest particles to reach a target in the context of redundancy depends on the number of particles and the local geometry of the target. In most cases, it sets the rate of activation. This rate should be used instead of the classical Smoluchowski rate, which describes the mean arrival time rather than the fastest. The statistics of the minimal time to activation set kinetic laws in biology, which can be quite different from the ones associated with average times.

Stochastic process
The motion of a particle located at position $$X_t$$ can be described by the Smoluchowski limit of the Langevin equation:

$$dX_t=\sqrt{2D} \, dB_t+\frac{1}{\gamma}F(x)dt,$$

where $$D$$ is the diffusion coefficient of the particle, $$\gamma$$ is the friction coefficient per unit of mass, $$F(x)$$ the force per unit of mass, and $$B_t$$ is a Brownian motion. This model is classically used in molecular dynamics simulations.
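The stochastic equation above can be integrated numerically with the Euler-Maruyama scheme. The following is a minimal sketch, not a reference implementation: the function name, the harmonic force F(x) = -x and all parameter values are illustrative assumptions that do not come from the article.

```python
import numpy as np

def simulate_overdamped_langevin(x0, force, D, gamma, dt, n_steps, rng=None):
    """Euler-Maruyama discretization of dX = (F(X)/gamma) dt + sqrt(2D) dB."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty((n_steps + 1,) + np.shape(x0), dtype=float)
    x[0] = x0
    for k in range(n_steps):
        # Independent Gaussian increment for every particle / coordinate
        noise = rng.standard_normal(np.shape(x0))
        x[k + 1] = x[k] + force(x[k]) / gamma * dt + np.sqrt(2.0 * D * dt) * noise
    return x

# Illustrative run (assumed setup): 5000 independent 1D particles
# in a harmonic well F(x) = -x, all starting at the origin.
rng = np.random.default_rng(0)
traj = simulate_overdamped_langevin(
    x0=np.zeros(5000), force=lambda x: -x, D=1.0, gamma=1.0,
    dt=1e-2, n_steps=500, rng=rng,
)
# For this linear force the stationary variance is D * gamma
stationary_variance = traj[-1].var()
```

For this assumed linear force the process is an Ornstein-Uhlenbeck process, so the empirical variance at the final time gives a quick sanity check of the discretization.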

Jump processes
A discrete jump process can be defined by the update rule $$\begin{align} x_{n+1}= \begin{cases} x_n-a, & \text{with probability } l(x_n) \\ x_n+b, & \text{with probability } r(x_n) \end{cases} \end{align}$$ which is, for example, a model of telomere length dynamics. Here $$r(x)=\frac{1}{1+\beta x}$$, with $$r(x)+l(x)=1$$.
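A direct simulation of this jump process can be sketched as follows. The parameter values are hypothetical, and clipping x at zero inside r(x) is an added assumption that keeps the probability in (0, 1] if the walk dips below zero; neither is taken from the article.

```python
import numpy as np

def simulate_jump_process(x0, a, b, beta, n_steps, rng=None):
    """Simulate x_{n+1} = x_n + b with prob. r(x_n), x_n - a with prob. l(x_n) = 1 - r(x_n)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        # r(x) = 1 / (1 + beta * x); x is clipped at 0 (assumption) so that r stays in (0, 1]
        r = 1.0 / (1.0 + beta * max(x[n], 0.0))
        x[n + 1] = x[n] + b if rng.random() < r else x[n] - a
    return x

# Hypothetical parameters, chosen only for illustration
a, b, beta = 1.0, 5.0, 0.1
path = simulate_jump_process(x0=50.0, a=a, b=b, beta=beta, n_steps=2000,
                             rng=np.random.default_rng(1))
# The mean step b*r(x) - a*l(x) vanishes at x = b / (a * beta)
x_balance = b / (a * beta)
```

Because r(x) decreases with x, long states tend to shorten and short states tend to lengthen, so the simulated path fluctuates around the balance point computed on the last line.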

Directed motion process
$$\dot{X}=v_0 \bf u,$$ where $$ \bf u$$ is a unit vector chosen from a uniform distribution. Upon hitting an obstacle at a boundary point $$X_0 \in \partial \Omega$$, the velocity changes to $$\dot{X}=v_0 \bf v,$$ where $$\bf v$$ is chosen on the unit sphere in the supporting half space at $$X_0$$ from a uniform distribution, independently of $$ \bf u$$. This rectilinear motion with constant velocity is a simplified model of spermatozoon motion in a bounded domain $$ \Omega$$. Other models include diffusion on a graph and active motion on a graph.
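As an illustration of the directed motion rule, the sketch below follows one particle inside a two-dimensional disk. The disk geometry, the function names and the rejection sampling of the new direction are assumptions made for this example; the article only specifies straight motion at speed v0 and a uniformly chosen direction in the inward half space after each hit.

```python
import numpy as np

def random_unit_vector(rng):
    """Uniform direction on the unit circle."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    return np.array([np.cos(theta), np.sin(theta)])

def simulate_directed_motion(x0, v0, radius, n_reflections, rng=None):
    """Return the boundary hit points and hit times of rectilinear motion in a disk."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    u = random_unit_vector(rng)
    t = 0.0
    points, times = [], []
    for _ in range(n_reflections):
        # Positive root s of |x + s*u| = radius (distance to the next boundary hit)
        xu = x @ u
        disc = max(xu**2 + radius**2 - x @ x, 0.0)  # guard against round-off
        s = -xu + np.sqrt(disc)
        x = x + s * u
        t += s / v0
        points.append(x.copy())
        times.append(t)
        # New direction: uniform on the unit circle, restricted to the inward half plane
        inward = -x / np.linalg.norm(x)
        u = random_unit_vector(rng)
        while u @ inward <= 0.0:
            u = random_unit_vector(rng)
    return np.array(points), np.array(times)

hits, hit_times = simulate_directed_motion(
    x0=[0.0, 0.0], v0=1.0, radius=2.0, n_reflections=50,
    rng=np.random.default_rng(2),
)
```

Rejecting directions that point outward leaves a uniform distribution on the inward half circle, which is the two-dimensional analogue of the rule stated above.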

Mathematical formulation: Computing the rate of arrival time for the fastest
The mathematical analysis of large numbers of molecules, which appear redundant in traditional activation theory, is used to compute the in vivo time scale of stochastic chemical reactions. The computation relies on asymptotics or probabilistic approaches to estimate the mean time for the fastest particle to reach a small target in various geometries.

With N non-interacting i.i.d. Brownian trajectories (ions) in a bounded domain Ω that bind at a site, the shortest arrival time is by definition

$$ \tau^{1}=\min (t_1,\ldots,t_N), $$ where $$t_i$$ are the independent arrival times of the N ions in the medium. The survival distribution of the arrival time of the fastest, $$\Pr\{\tau^{1}>t\}$$, is expressed in terms of a single particle: $$ \Pr\{\tau^{1}>t\}=\left[\Pr\{t_1>t\}\right]^N $$. Here $$\Pr\{t_{1}>t \}$$ is the survival probability of a single particle prior to binding at the target. This probability is computed from the solution of the diffusion equation in a domain $$\Omega$$:

$$ \frac{\partial p(x,t)}{\partial t} =D \Delta p(x,t) \hbox { for } x \in \Omega, t>0 $$

$$ \begin{align} p(x,0)=&p_0(x) \hbox{ for } x \in \Omega \\ \frac{\partial p}{\partial n}(x,t) &=0 \hbox{ for } x \in \partial \Omega_r\\ p(x,t)&=0 \hbox{ for } x \in \partial \Omega_a, \end{align} $$

where the boundary $$\partial \Omega$$ contains $$N_R$$ binding sites $$\partial \Omega_i\subset\partial \Omega$$ ($$\partial \Omega_a=\bigcup\limits_{i=1}^{N_R}\partial\Omega_i,\ \partial\Omega_r=\partial\Omega-\partial\Omega_a$$). The single particle survival probability is

$$ \Pr\{t_{1}>t \} =\int\limits_{\Omega} p(x,t)dx, $$ so that $$ \Pr\{\tau^{1}=t \} = \frac{d}{dt}\Pr\{\tau^{1}<t \} = N\left(\Pr\{t_{1}>t \}\right)^{N-1}\Pr\{t_{1}=t \}, $$ where

$$ \Pr\{t_{1}=t \}= {\oint_{\partial \Omega_a}} \frac{\partial p(x,t)}{\partial n}\, dS_{x} $$ and, when the $$N_R$$ binding sites contribute equally, $$\Pr\{t_{1}=t \}= N_R {\oint_{\partial \Omega_1}} \frac{\partial p(x,t)}{\partial n}\,dS_{x} $$.

The probability density function (pdf) of the arrival time is

$$ \Pr\{\tau^{1}=t \} =N N_R \left[\int\limits_{\Omega} p(x,t)dx \right]^{N-1}\oint\limits_{\partial \Omega_1} \frac{\partial p(x,t)}{\partial n} dS_{x}, $$ which gives the mean first passage time (MFPT)

$$ \bar{\tau}^{1}=\int\limits_0 ^{\infty}\Pr\{\tau^{1}>t\} dt = \int\limits_0 ^{\infty} \left[ \Pr\{t_{1}>t\} \right]^N dt. $$ The probability $$\Pr\{t_{1}>t \}$$ can be computed using short-time asymptotics of the diffusion equation as shown in the next sections.
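The relations above translate directly into a numerical recipe once the single-particle survival probability is known on a time grid. The sketch below is a hedged illustration: the function name is hypothetical, and the exponential survival law used in the example is an assumption chosen only because its answer is known in closed form (the minimum of N exponential times of rate λ has mean 1/(Nλ)); it is not the Brownian short-time law discussed in this article.

```python
import numpy as np

def fastest_arrival_stats(t, survival_single, n_particles):
    """Survival, density and mean of tau^1 = min(t_1, ..., t_N) from samples of Pr{t_1 > t}."""
    # Pr{tau^1 > t} = (Pr{t_1 > t})^N for independent, identically distributed particles
    survival_fastest = survival_single ** n_particles
    # pdf of the fastest arrival time: minus the time derivative of its survival probability
    density_fastest = -np.gradient(survival_fastest, t)
    # MFPT = integral of the survival probability (trapezoidal rule on the grid)
    mean_fastest = float(np.sum(
        0.5 * (survival_fastest[1:] + survival_fastest[:-1]) * np.diff(t)
    ))
    return survival_fastest, density_fastest, mean_fastest

# Assumed example: exponential single-particle survival with rate lam
lam, N = 2.0, 100
t = np.linspace(0.0, 1.0, 200_001)
_, density, mean_fastest = fastest_arrival_stats(t, np.exp(-lam * t), N)
# Closed form for this assumed law: 1 / (N * lam)
```

The time grid has to extend far enough that the survival probability of the fastest particle is negligible at its end, otherwise the integral is truncated.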

Explicit computation in dimension 1
The short-time asymptotics of the diffusion equation are based on the ray method approximation. For the semi-infinite interval $$[0,\infty)$$, the pdf is the solution of

$$\begin{align} \frac{\partial p(x,t)}{\partial t}& =D \frac{\partial^2 p(x,t)}{\partial x^2} \quad\mbox{ for } x>0,\ t>0 \\ p(x,0)&=\delta(x-a)\quad\mbox{ for }\ x>0,\quad p(0,t)=0\quad\mbox{ for } t>0, \end{align}$$

that is, $$p(x,t) =\frac{1}{\sqrt{4D \pi t}}\left[\exp\left\{ - \frac{(x-a)^2}{4Dt}\right\}- \exp\left\{ - \frac{(x+a)^2}{4Dt}\right\}\right]. $$

The survival probability with D=1 is $$\Pr\{t_{1}>t \}=\int\limits_{0}^{\infty} p(x,t)\,dx=1-\frac{2}{\sqrt{\pi}} \int\limits_{a/\sqrt{4t}}^{\infty}e^{-u^2}\,du $$. To compute the MFPT, we expand the complementary error function

$$\frac{2}{\sqrt{\pi}} \int\limits_{x}^{\infty}e^{-u^2}\,du =\frac{e^{-x^2}}{x\sqrt{\pi}}\left(1-\frac{1}{2x^2}+O(x^{-4})\right)\quad\mbox{for}\ x\gg1, $$ which gives $$\bar{\tau}^{1}=\int\limits_0 ^{\infty} \left[ \Pr\{t_{1}>t\} \right]^N dt \approx \int\limits_0 ^{\infty} \exp\left\{ N\ln\left(1-\frac{e^{-(a/\sqrt{4t})^2}}{(a/\sqrt{4t})\sqrt{\pi}}\right)\right\}\, dt \approx \frac{a^2}{4}\int\limits_0^{\infty} \exp \left\{ -N\frac{\sqrt{u}e^{-\frac{1}{u}}}{\sqrt{\pi}} \right\}du $$,

leading (the main contribution of the integral is near 0) to $$\bar{\tau}^{1} \approx \frac{a^2}{4D\ln \frac{N}{\sqrt{\pi}}}\quad\mbox{for}\ N\gg1. $$
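The one-dimensional result can be compared with a direct numerical evaluation. The sketch below is only an illustration under stated assumptions: the function names, the grid size and the cut-off of the time integral are choices made here, and the Monte Carlo sampler relies on the fact that, for the half-line survival probability erf(a/√(4Dt)), a single arrival time has the same law as a²/(2DZ²) with Z a standard normal variable.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def mfpt_fastest_quadrature(a, D, N, t_max_factor=10.0, n_grid=200_001):
    """Trapezoidal estimate of int_0^inf erf(a / sqrt(4 D t))^N dt.

    The integral is cut at t_max_factor * a^2 / D, which assumes N is large
    enough for the tail beyond the cut-off to be negligible.
    """
    t = np.linspace(0.0, t_max_factor * a**2 / D, n_grid)
    survival = np.ones_like(t)  # Pr{t_1 > 0} = 1
    survival[1:] = _erf(a / np.sqrt(4.0 * D * t[1:]))
    integrand = survival ** N
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

def mfpt_fastest_asymptotic(a, D, N):
    """Leading-order formula a^2 / (4 D ln(N / sqrt(pi)))."""
    return a**2 / (4.0 * D * math.log(N / math.sqrt(math.pi)))

def mfpt_fastest_monte_carlo(a, D, N, n_trials, rng):
    """Sample min over N arrival times, each distributed as a^2 / (2 D Z^2)."""
    z = rng.standard_normal((n_trials, N))
    fastest = a**2 / (2.0 * D * np.max(z**2, axis=1))
    return float(fastest.mean())

a, D, N = 1.0, 1.0, 1000
quad = mfpt_fastest_quadrature(a, D, N)
asym = mfpt_fastest_asymptotic(a, D, N)
mc = mfpt_fastest_monte_carlo(a, D, N, n_trials=2000, rng=np.random.default_rng(0))
```

The quadrature and the Monte Carlo estimate evaluate the same quantity and should agree closely. The asymptotic expression keeps only the leading logarithmic term, so a visible relative difference at moderate N is expected.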

This result is reminiscent of Gumbel's law. Similarly, escape from the interval [0,a] is computed from the infinite sum

$$p(x,t\,|\,y) =\frac{1}{\sqrt{ 4 D \pi t}}\sum\limits_{n=-\infty}^{\infty} \left[\exp \left\{ -\frac{(x-y+2na)^2}{4t} \right\} -\exp \left\{ -\frac{(x+y+2na)^2}{4t} \right\} \right] $$. The conditional survival probability is approximated by

$$\Pr\{t_{1}>t\,|\,y \}=\int\limits_{0}^{a} p(x,t\,|\,y)\,dx\sim1-\max\frac{2\sqrt{t}}{\sqrt{\pi}}\left[\frac{e^{-y^2/4t}}{y},\frac{e^{-(a-y)^2/4t}}{a-y}\right] \quad\mbox{as}\ t\to0 $$, where the maximum occurs at $$\delta=\min[y,a-y]$$ for $$0<y<a$$. This leads to $$\bar{\tau}^{1}=\int\limits_0 ^{\infty} \left[ \Pr\{t_{1}>t\} \right]^N dt \approx \int\limits_0 ^{\infty} \exp\left\{ N\ln\left(1-\frac{8\sqrt{t}}{\delta\sqrt{\pi}} e^{-\delta^2/16t}\right) \right\}dt \approx \frac{\delta^2}{16D\ln\frac{2N}{\sqrt{\pi}}}\quad\mbox{for}\ N\gg1. $$

Arrival times of the fastest in higher dimensions
The arrival times of the fastest among many Brownian motions are expressed in terms of the shortest distance from the source S to the absorbing window A, measured by the distance $$\delta_{min}=d(S,A),$$ where d is the associated Euclidean distance. Interestingly, the trajectories followed by the fastest particles are as close as possible to the optimal trajectories. In technical language, the trajectories of the fastest among N concentrate near the optimal trajectory (shortest path) as the number N of particles increases. For a diffusion coefficient D and a window of size a, the expected first arrival times of N independent and identically distributed Brownian particles initially positioned at the source S are given by the following asymptotic formulas:

$$ \bar\tau^{d1} \approx \frac{\delta^2_{min}}{4D\ln\left(\frac{N}{\sqrt{\pi}}\right)}, \hbox{ in dim 1, valid for } N \gg1 , $$

$$ \bar \tau^{d2} \approx \frac{\delta^2_{min}}{ 4 D \log \left(\frac{\pi \sqrt{2}N}{8\log\left(\frac{1}{a}\right)}\right)}, \hbox{ in dim 2, for } \frac{N}{\log (\frac{1}{a})}\gg1, $$

$$ \bar\tau^{d3} \approx \frac{\delta^2_{min}}{4D{\log\left( N\frac{4a^2}{\pi^{1/2}\delta^2_{min}}\right)}}, \hbox{ in dim } 3, \hbox{ for } \frac{Na^2}{\delta^2_{min}}\gg1. $$

These formulas show that the expected arrival time of the fastest particle in dimensions 1 and 2 is $$O(1/\log(N))$$. They should be used instead of the classical forward rate in models of activation in biochemical reactions. The method used to derive these formulas is based on short-time asymptotics and the Green's function representation of the Helmholtz equation. Note that other distributions could lead to other decays with respect to N.
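For quick estimates, the three asymptotic formulas can be collected in one helper. This is a minimal sketch: the function name and the validity check on the argument of the logarithm are assumptions added here, and the expressions are simply transcribed from the formulas above.

```python
import math

def fastest_arrival_time(delta_min, D, N, dim, a=None):
    """Leading-order expected arrival time of the fastest of N Brownian particles.

    delta_min: shortest distance from the source to the window
    D: diffusion coefficient, N: number of particles
    a: window size (required in dimensions 2 and 3; must satisfy 0 < a < 1 in dimension 2)
    """
    if dim == 1:
        arg = N / math.sqrt(math.pi)
    elif dim == 2:
        arg = math.pi * math.sqrt(2.0) * N / (8.0 * math.log(1.0 / a))
    elif dim == 3:
        arg = 4.0 * N * a**2 / (math.sqrt(math.pi) * delta_min**2)
    else:
        raise ValueError("dim must be 1, 2 or 3")
    if arg <= 1.0:
        # The formulas are asymptotic: they require a large argument of the logarithm
        raise ValueError("parameters are outside the asymptotic regime")
    return delta_min**2 / (4.0 * D * math.log(arg))

# Hypothetical values, for illustration only
t_1d = fastest_arrival_time(delta_min=1.0, D=1.0, N=1000, dim=1)
t_2d = fastest_arrival_time(delta_min=1.0, D=1.0, N=1000, dim=2, a=0.01)
t_3d = fastest_arrival_time(delta_min=1.0, D=1.0, N=10**6, dim=3, a=0.01)
```

In all three cases the time scales with the square of the shortest distance and decreases only logarithmically with N, so a tenfold increase in the number of particles shortens the time by a modest factor.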

Minimizing the optimal path for large N
The optimal paths for the fastest can be found using the Wentzell-Freidlin functional in large-deviation theory. These paths correspond to the short-time asymptotics of the diffusion equation from a source to a target. In general, the exact solution is hard to find, especially for a space containing various distributions of obstacles.

The Wiener integral representation of the pdf for a pure Brownian motion is obtained for a zero drift and diffusion tensor $$\sigma=D$$ constant, so that it is given by the probability of a sampled path until it exits at the small window $$\partial\Omega_a$$ at the random time T

$$Pr\{ x_M(t_{1,M})\in\Omega, x_M(t_{2,M})\in\Omega,\dots, x_M(t)=x, t\leq T\leq t+\Delta t \,|\, x(0)=y\}$$

$$=\int\limits_{\Omega} \cdots \int\limits_{\Omega}\prod_{j=1}^{M} \frac{d{y}_j}{\sqrt{(2\pi \Delta t)^n\det {\sigma}(x(t_{j-1,M}))}} \exp \left\{ -\frac{1}{2\Delta t} \left[{y}_j-x(t_{j-1,M})- {a}({x}(t_{j-1,M}))\Delta t \right]^T{\sigma}^{-1}(x(t_{j-1,M}))\left[{y}_j-x(t_{j-1,M})-{a}(x(t_{j-1,M}))\Delta t \right]\right\}, $$

where

$$\Delta t=t/M, t_{j,M}=j\Delta t,\ x(t_{0,M})=y \hbox{ and } {y}_j=x(t_{j,M})$$ in the product, $${a}$$ denotes the drift (zero here), and T is the exit time at the narrow absorbing window $$\partial\Omega_a.$$ Finally,

$$\langle\tau^{(n)}\rangle=\int\limits_0 ^{\infty}\exp \left\{ n \log \int\limits_{\Omega} p(x,t|y)\,dx\right\} dt =\int_0 ^{\infty} \tau_{\sigma} Pr\{ \hbox{ Path }\sigma \in S_n (y), \tau_{\sigma}=t \} dt,$$

where $$S_n(y)$$ is the ensemble of shortest paths selected among n Brownian trajectories, starting at point y and exiting between time t and t+dt from the domain $$\Omega$$. The probability $$Pr\{ \hbox{ Path }\sigma \in S_n \}$$ is used to show that the empirical stochastic trajectories of $$S_n$$ concentrate near the shortest paths starting from y and ending at the small absorbing window $$\partial \Omega_a$$, under the condition that $$\epsilon=\frac{|\partial \Omega_a|}{|\partial\Omega|} \ll 1$$. The paths of $$S_n(y)$$ can be approximated using discrete broken lines among a finite number of points and we denote the associated ensemble by $$\tilde S_n(y)$$. Bayes' rule leads to

$$Pr\{ \hbox{ Path }\sigma \in \tilde S_n(y)| t<\tau_{\sigma}<t+dt \}=\sum_{m=0}^{\infty} Pr\{ \hbox{ Path }\sigma \in \tilde S_n(y)|m, t<\tau_{\sigma}<t+dt \}Pr\{ m \mbox{ steps}\}$$

where $$Pr\{ m \mbox{ steps}\}=Pr\{ \mbox{the paths of }\tilde S_n(y) \mbox{ exit in m steps} \}$$ is the probability that a path of $$\tilde S_n(y)$$ exits in m discrete time steps. A path made of broken lines (random walk with a time step $$\Delta t$$) can be expressed using the Wiener path integral. The probability of a Brownian path x(s) can be expressed in the limit of a path integral with the functional:

$$Pr\{ x(s)| s\in[0,t] \} \approx \exp \left(-\int_{0}^t |\dot x|^2ds \right).$$

The survival probability conditioned on the starting point $$x_0$$ is given by the Wiener representation:

$$S(t|x_0)= \int_{x\in \Omega} dx \int_{x(0)}^{x(t)=x} {\mathcal D} (x)\exp \left(-\int_{0}^t |\dot x|^2ds \right),$$

where $${\mathcal D} (x)$$ is the limit Wiener measure: the exterior integral is taken over all end points x and the path integral is over all paths starting from x(0). When we consider n independent paths $$(\sigma_1,..\sigma_n)$$ (made of points with a time step $$\Delta t$$) that exit in m steps, the probability of such an event is

$$ Pr \{ \sigma_1,..\sigma_n \in S_n(y)|m, \tau_{\sigma}=m \Delta t \}= \left(\int\limits_{y_0=y} \cdots \int\limits_{{y}_j \in\Omega} \int\limits_{{y}_n\in \partial \Omega_a}  \frac{1}{(4D\Delta t)^{dm/2}}\prod_{j=1}^{m} \exp \left\{ -\frac{1}{4D\Delta t} |{y}_j-{y}_{j-1}|^2 \right\} \right)^n$$

$$\approx \left(\frac{1}{(4D\Delta t)^{dm/2}}\right)^n\int_{x} {\mathcal D} (x)\exp \left\{-n\int\limits_0^{m \Delta t} \dot{x}^2ds \right\}. $$

Indeed, when there are n paths of m steps, and the fastest one escapes in m steps, they should all exit in m steps. Using the limit of the path integral, we get heuristically the representation

$$Pr \{ \hbox{ Path }\sigma \in \tilde S_n(y)|m, \tau_{\sigma}=m \Delta t \}= \left(\int\limits_{{y}_0=y} \cdots \int\limits_{{y}_j \in\Omega}\int\limits_{{y}_n\in\partial\Omega_a} \frac{1}{(4D\Delta t)^{dm/2}}\prod_{j=1}^{m} \exp \left( -\frac{1}{4D\Delta t} |{y}_j-{y}_{j-1}|^2 \right)\right)^n $$

$$\approx \int_{ x \in \Omega } dx \int_{x(0)=y}^{x(t)=x} {\mathcal D} (x)\exp \left(-n \int\limits_0^{m \Delta t} \dot{x}^2ds \right) ,$$

where the integral is taken over all paths starting at y and exiting at time $$m\Delta t$$. This formula suggests that when n is large, only the paths that minimize the exponent contribute significantly. This allows selecting the paths for which the energy functional is minimal, that is

$$E=\min_{x\in \mathcal P_T}\int\limits_0^T \dot{x}^2ds,$$

where the minimum is taken over the ensemble of regular paths $$\mathcal P_T$$ inside $$\Omega$$ starting at y and exiting in $$\partial \Omega_a$$, defined as

$$\mathcal P_T=\{ P:\ P(0)=y,\ P(T)\in \partial \Omega_a \hbox{ and } P(s) \in \Omega \hbox{ for } 0\leq s\leq T\}.$$

This formal argument shows that the random paths associated with the fastest exit time are concentrated near the shortest paths. Indeed, the solutions of the Euler-Lagrange equations for the extremal problem are the classical geodesics between y and a point in the narrow window $$\partial \Omega_a$$.
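The role of the energy functional can be illustrated on discretized paths. The sketch below, with a hypothetical function name and an arbitrary choice of endpoints and perturbation, evaluates a finite-difference version of the integral of the squared velocity and compares a straight segment with a path that has the same endpoints but a detour. It assumes free space without obstacles, where the geodesic is the straight line.

```python
import numpy as np

def path_energy(points, T):
    """Finite-difference approximation of int_0^T |dx/ds|^2 ds for a broken line."""
    points = np.asarray(points, dtype=float)
    dt = T / (len(points) - 1)
    velocities = np.diff(points, axis=0) / dt
    return float(np.sum(velocities**2) * dt)

# Assumed setup: go from y = (0, 0) to the target point (1, 0) in time T = 2
T = 2.0
s = np.linspace(0.0, 1.0, 101)
straight = np.column_stack([s, np.zeros_like(s)])
# Same endpoints, with a sinusoidal detour in the second coordinate
detour = straight.copy()
detour[:, 1] += 0.1 * np.sin(np.pi * s)

energy_straight = path_energy(straight, T)  # equals |end - start|^2 / T for a uniform straight path
energy_detour = path_energy(detour, T)
```

The straight path traversed at constant speed has the lower energy, so in the weight exp(-n E) it dominates more and more strongly as n grows, which is the concentration effect described above.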

Fastest escape from a cusp in two dimensions
The formula for the fastest escape can be generalized to the case where the absorbing window is located in a funnel-shaped cusp and the particles are initially distributed outside the cusp. The cusp has an opening of size $$\epsilon$$ and a curvature R. The diffusion coefficient is D. The shortest arrival time, valid for large n, is given by $$\tau^{(n)} \approx  \frac{\pi^2 R^3}{4\epsilon D (\frac{1-\cos(c\sqrt{\tilde \epsilon})}{\tilde\epsilon })^2 \log(\frac{2n}{\sqrt{\pi}})}. $$ Here $$\tilde \epsilon=\frac{\epsilon}{R}$$ and c is a constant that depends on the diameter of the domain. The time taken by the first arrivers is proportional to the reciprocal of the size of the narrow target $$\epsilon$$. This formula is derived for fixed geometry and large n, and not in the opposite limit of large n and small $$\epsilon$$.
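The cusp formula can be evaluated with a few lines of code. In this sketch the function name is hypothetical and the constant c, which the article leaves unspecified, is passed as a parameter; the value c = 1 used below is an arbitrary placeholder, not a value from the source.

```python
import math

def fastest_cusp_escape_time(n, eps, R, D, c):
    """Evaluate the large-n formula for the fastest escape through a cusp.

    n: number of particles, eps: size of the cusp opening, R: curvature parameter,
    D: diffusion coefficient, c: geometry-dependent constant (unspecified in the text).
    """
    eps_tilde = eps / R
    geometric = (1.0 - math.cos(c * math.sqrt(eps_tilde))) / eps_tilde
    log_term = math.log(2.0 * n / math.sqrt(math.pi))
    return math.pi**2 * R**3 / (4.0 * eps * D * geometric**2 * log_term)

# Placeholder parameters: halving eps should roughly double the time
tau_wide = fastest_cusp_escape_time(n=1000, eps=1e-3, R=1.0, D=1.0, c=1.0)
tau_narrow = fastest_cusp_escape_time(n=1000, eps=5e-4, R=1.0, D=1.0, c=1.0)
ratio = tau_wide / tau_narrow
```

For a small opening the factor (1 - cos(c√ε̃))/ε̃ tends to c²/2, so the time scales like 1/ε, consistent with the statement that the time taken by the first arrivers is proportional to the reciprocal of the target size.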

Concluding remarks
How nature sets the disproportionate numbers of particles remains unclear, but these numbers can be estimated using the theory of diffusion. One example is the number of neurotransmitters released during synaptic transmission, around 2000 to 3000, which is set to compensate for the low copy number of receptors, so that the probability of activation is restored to one.

In natural processes these large numbers should not be considered wasteful: they are necessary for generating the fastest possible response and for making possible rare events that otherwise would never happen. This property is universal, ranging from the molecular scale to the population level.

Nature's strategy for optimizing the response time is not necessarily defined by the physics of the motion of an individual particle, but rather by the extreme statistics that select the shortest paths. In addition, the search for a small activation site selects the particle that arrives first: although these trajectories are rare, they are the ones that set the time scale. Estimates of the numbers found in nature may therefore need to be reconsidered in light of the redundancy principle, which quantifies how many copies are required to achieve a biological function.