Wikipedia:Reference desk/Archives/Mathematics/2013 June 19

= June 19 =

Trial Solution for Nonhomogeneous Differential Equation
I'm teaching myself differential equations and am a bit stuck. I'm trying to solve the following ODE:

$$ y'' + y' - 2y = e^{-2t}\cos(2t) $$

The complementary solution includes $e^{-2t}$, so I set my trial particular solution as:

$$ \Big((At + B)e^{-2t}\Big)\Big(C\cos(2t) + D\sin(2t)\Big) $$

I think this is right, but after taking the first and second derivatives of the trial solution, I have about two dozen terms to simplify. Should I expect this many terms for this type of problem? If so, is there some trick to make the calculations easier that I'm missing?OldTimeNESter (talk) 17:51, 19 June 2013 (UTC)
 * It looks as if there should be a lot of simplification by considering it to be the real part of $$ y'' + y' - 2y = e^{(i-1)2t} $$ MChesterMC (talk) 09:34, 20 June 2013 (UTC)


 * Actually, that's not right; I mean $$e^{-2t}$$ is a solution of
 * $$y'' + y' - 2y = 0,$$
 * but the exponent on the RHS is $$-2 \pm 2i$$, not $$-2$$. Hence, the particular solution to work with is
 * $$e^{-2t} (C \cos(2t) + D \sin(2t)).$$
 * MChesterMC's idea is a good one, also. — Arthur Rubin  (talk) 09:41, 21 June 2013 (UTC)
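Following up on the two replies above: substituting the corrected trial $$y_p = e^{-2t}(C\cos(2t) + D\sin(2t))$$ into the ODE and matching the coefficients of $$e^{-2t}\cos(2t)$$ and $$e^{-2t}\sin(2t)$$ gives C = -1/13 and D = -3/26 (my own arithmetic, not stated in the thread). A quick finite-difference sketch to sanity-check those values:

```python
import math

# Numerical sanity check (my own arithmetic, not from the thread) that
# y_p = e^{-2t} (C cos 2t + D sin 2t) with C = -1/13, D = -3/26
# satisfies y'' + y' - 2y = e^{-2t} cos 2t.

C, D = -1/13, -3/26

def y(t):
    return math.exp(-2*t) * (C*math.cos(2*t) + D*math.sin(2*t))

def rhs(t):
    return math.exp(-2*t) * math.cos(2*t)

def residual(t, h=1e-5):
    # central finite differences for y' and y''
    yp  = (y(t+h) - y(t-h)) / (2*h)
    ypp = (y(t+h) - 2*y(t) + y(t-h)) / h**2
    return ypp + yp - 2*y(t) - rhs(t)

# residual should be tiny everywhere (only finite-difference noise remains)
print(max(abs(residual(k/10)) for k in range(20)))
```

Because only $$e^{-2t}\cos(2t)$$ and $$e^{-2t}\sin(2t)$$ ever appear, the coefficient matching reduces to two linear equations rather than the two dozen terms the polynomial factor produces.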

Continuous-time Markov chain in the limit of infinitely many states
First of all, sorry for the really long post. I really cannot figure out a way to express my challenge more concisely.

My math is rusty and I could use a little push to pursue an idea that is driving me nuts. I have a physical real-valued feature x bound to an interval $$[x_\mathrm{min};x_\mathrm{max}]$$ for a system which evolves randomly and continuously over time, x(t), in a stochastic process with the Markov property. Given a value x0 at time t0, I would like to calculate the probability density of the feature some time interval τ later


 * $$P(x(t_0+\tau)|x_0)$$.

Assuming the process is also time-homogeneous, the exact start and end times do not matter: the distribution depends only on the time interval since the last known value of the feature, so I might write this as


 * $$P(x|x_0;\tau)$$

Without yet stating anything about how the feature evolves statistically over time, I can say something about how this distribution behaves in its limits. Obviously, as $$\tau \rightarrow 0$$ the distribution will narrow down around x0 and eventually become a Dirac delta function


 * $$P(x|x_0;\tau) \rightarrow \delta(x-x_0) \quad \mathrm{for} \quad \tau \rightarrow 0$$

In the other extreme case, if the time becomes much longer than any characteristic time in the system, the probability density will become independent of the value x0. All memory is lost over such long time intervals, and the distribution will just be the unconditional probability density P(x), the prior density; that is, the distribution of x accumulated over an infinite amount of time


 * $$P(x|x_0;\tau) \rightarrow P(x) \quad \mathrm{for} \quad \tau \rightarrow \infty$$

The system is complex, and P(x), although well-behaved, has in general a complicated multimodal structure that is not directly expressible as some simple distribution function, although it can be modelled fairly accurately by a Gaussian mixture truncated to the $$[x_\mathrm{min};x_\mathrm{max}]$$ interval. It is known experimentally.

Now, the challenge is to figure out how the Dirac delta spreads out to the unconditional distribution as the time interval evolves.

To figure that out I need to know more about the system, or at least how it behaves statistically, and for this purpose I have access to a large database of actual time evolutions of the system. But how to use that data? The most simple-minded brute-force approach would be to build a huge sample distribution of how the system has actually hopped between any two feature values over any time interval. But although my database is large, I do not have enough samples to estimate this three-dimensional (start value, end value and time interval) probability density function accurately enough, and in the end I also want an efficient way to calculate this in a concrete implementation, where perhaps a thousand such calculations are needed each second on a modest processing platform.

Now, the next idea I had was to bin my feature data into n feature-space intervals and model it as a continuous-time Markov chain. The chain is simple in the respect that a transition can only be made one state up or down, corresponding to a step up or down in feature space, and the states are limited in number by the known range of the feature space. From the database of feature data it is easy to find the average time the feature stays within an interval corresponding to a bin, and the probability of hopping up or down when the state is left can also be found. From this, the transition rate matrix Q for the discretized system can be found. Since a state only couples to its neighbours, this rate matrix is tridiagonal. From the rate matrix I can find the probability for the feature to jump from state i to state j over the time interval as the element in the ith row and jth column of the matrix exponential of the transition rate matrix multiplied by the elapsed time:
 * $$P(x_j|x_i;\tau) = \exp(\boldsymbol{Q}\tau)_{i,j}$$
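As a concrete sketch of this construction (all numbers invented for illustration; the real Q would come from the database), here is a tiny 3-bin chain: per-bin mean dwell times and up/down hop probabilities give a tridiagonal Q with zero row sums, and a home-rolled scaling-and-squaring matrix exponential gives P(τ) = exp(Qτ):

```python
# Toy illustration of the construction above; the dwell times and hop
# probabilities are invented, not real data.  Convention: rows of Q sum
# to zero, and exp(Q*tau)[i][j] = P(x_j | x_i; tau).

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k]*B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(Q, t, squarings=10, terms=12):
    # exp(Q t) by scaling and squaring with a truncated Taylor series;
    # adequate for tiny matrices like this toy example
    n = len(Q)
    s = 2**squarings
    A = [[Q[i][j]*t/s for j in range(n)] for i in range(n)]
    E = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in E]
    for k in range(1, terms):
        term = [[x/k for x in row] for row in mat_mul(term, A)]  # A^k / k!
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        E = mat_mul(E, E)
    return E

dwell = [0.5, 0.2, 0.4]   # mean time spent in each bin
p_up = [1.0, 0.6, 0.0]    # probability of leaving upward (edges are one-sided)

n = len(dwell)
Q = [[0.0]*n for _ in range(n)]
for i in range(n):
    rate = 1.0/dwell[i]           # total exit rate from bin i
    Q[i][i] = -rate
    if i+1 < n:
        Q[i][i+1] = rate*p_up[i]
    if i-1 >= 0:
        Q[i][i-1] = rate*(1.0 - p_up[i])

P = mat_exp(Q, 1.0)
for row in P:
    print([round(x, 4) for x in row], "row sum:", round(sum(row), 6))
```

Each row of exp(Qτ) sums to 1, as it must for conserved probability; for real data a library routine such as scipy.linalg.expm would replace this toy Taylor-series version.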

This discretized version has the right property in the limit where time goes to zero, since the matrix exponential of the null matrix is the identity matrix, whose (i, j) element is the Kronecker delta.
 * $$P(x_j|x_i;\tau = 0) = \delta_{i,j}$$

In the limit where the time interval goes to infinity, we can do an eigendecomposition of the transition rate matrix
 * $$\boldsymbol{Q} = \boldsymbol{P}\cdot\boldsymbol{\Lambda}\cdot\boldsymbol{P}^{-1}$$

where
 * $$\boldsymbol{P} = \begin{bmatrix}\boldsymbol{v}_1 & \boldsymbol{v_2} & \cdots \end{bmatrix}$$

contains the eigenvectors v1, v2, ... of Q as its columns, and Λ is a diagonal matrix containing the eigenvalues of the rate matrix (I think of them as eigenrates). P⁻¹ is the matrix inverse of P, which I write as a stack of row vectors


 * $$\boldsymbol{P}^{-1} = \begin{bmatrix} \boldsymbol{w}_1 \\ \boldsymbol{w}_2 \\ \vdots \end{bmatrix}$$

Since the sum of rates in each row of the rate matrix is zero by definition (to conserve probability) and all rate elements on the diagonal are negative, the largest eigenvalue will always be zero (λ1 = 0), followed by a spectrum of negative rates; the second-largest eigenvalue corresponds to the largest time scale in the system, τ2 = -1/λ2.

Using this eigendecomposition I can write the conditional probability as
 * $$P(x_j|x_i;\tau) = \left(\boldsymbol{P}\cdot\exp(\boldsymbol{\Lambda}\tau)\cdot\boldsymbol{P}^{-1}\right)_{i,j}$$

If we express P and P⁻¹ in terms of their column and row vectors, we get more generally


 * $$P(x_j|x_i;\tau) = \sum_{k} (\boldsymbol{v}_k \boldsymbol{w}_k)_{i,j}\exp(\lambda_k\tau)$$

In the limit where $$\tau \gg \tau_2$$, only the k=1 term survives, since λ1 = 0 and all other eigenvalues are negative.


 * $$P(x_j|x_i;\tau) \rightarrow (\boldsymbol{v}_1 \boldsymbol{w}_1)_{i,j} \quad \mathrm{for} \quad \tau \rightarrow \infty $$

So the prior distribution is just given by the eigenvector corresponding to the zero (steady-state) eigenvalue of the rate matrix times the first row of the inverted eigenvector matrix. A nice result. And it is also computable for other times, as long as there are not too many bins.
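This limit can be checked numerically on a toy 3-bin birth-death chain (my own invented rates). For such a chain the stationary distribution also follows from detailed balance, which gives an independent cross-check: every row of exp(Qτ) for τ much larger than τ2 converges to it.

```python
# Toy 3-bin rate matrix (numbers invented for illustration).
# Convention: rows of Q sum to zero, exp(Q*tau)[i][j] = P(x_j | x_i; tau).
dwell = [0.5, 0.2, 0.4]
p_up = [1.0, 0.6, 0.0]
n = len(dwell)
Q = [[0.0]*n for _ in range(n)]
for i in range(n):
    rate = 1.0/dwell[i]
    Q[i][i] = -rate
    if i+1 < n:
        Q[i][i+1] = rate*p_up[i]
    if i-1 >= 0:
        Q[i][i-1] = rate*(1.0 - p_up[i])

# Stationary distribution of a birth-death chain from detailed balance:
# pi_i * Q[i][i+1] = pi_{i+1} * Q[i+1][i]
pi = [1.0]
for i in range(n-1):
    pi.append(pi[-1]*Q[i][i+1]/Q[i+1][i])
z = sum(pi)
pi = [p/z for p in pi]

# Approximate exp(Q*tau) as (I + Q*tau/N)^N with N = 2^k, via repeated squaring.
def propagate(Q, tau, k=34):
    n = len(Q)
    h = tau/2**k
    A = [[(1.0 if i == j else 0.0) + Q[i][j]*h for j in range(n)] for i in range(n)]
    for _ in range(k):
        A = [[sum(A[i][m]*A[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return A

P_inf = propagate(Q, 20.0)   # tau much larger than any relaxation time here
print("stationary (detailed balance):", [round(p, 4) for p in pi])
for row in P_inf:
    print([round(x, 4) for x in row])
```

All rows of exp(Qτ) come out identical and equal to the detailed-balance stationary distribution, which is exactly the rank-one v1·w1 structure derived above.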

Now, I have tried to work with this Markov chain using real system data, but I find that the rate matrices are quite sensitive to the number of bins and to where the bin edges are placed. This matters especially because I am interested in looking at data from different systems, which have different time evolutions of the feature, and the ability to distinguish them is very sensitive to the binning: some systems have a larger prevalence for "riding on the edge" between two bins than others, which biases classification toward one system over another purely due to the binning.

As I make the bins smaller and smaller, I notice that the rates on the diagonal of the transition rate matrix vary smoothly with state number, and so do the off-diagonal elements in the increasingly large tridiagonal rate matrix. But the matrices also become unmanageably large, and the rates increase as the bins shrink. I can parametrize the rate change by fitting the observed rates with smoothly varying functions, and this makes me think it should be possible to work with a continuum of Markov states instead, each state spanning an infinitesimal amount of feature space. From the database I can estimate the distribution of the time derivative of the feature as a function of the value of the feature
 * $$P(\dot{x} \mid x)$$

which is in general a unimodal, smoothly varying probability density as a function of the feature, but with non-zero mean and skewness. This reminds me of a Wiener process, except that the noise is not Gaussian with zero mean; or some kind of "directed random walk", or a weird kind of asymmetric diffusion where the diffusion constant varies and is directed in a cross-section of the material. I am sure this must be a well-known problem, but I cannot figure out how to express it rigorously, nor can I find a way to calculate the hopping probability over some time interval in a model where the feature x is treated as a continuous real-valued parameter.

Any inputs to this would be highly appreciated. Thanks.

--Slaunger (talk) 20:49, 19 June 2013 (UTC)


 * What you have is a stochastic differential equation. I believe (as occurs on that page) the continuous-time Markov kernel can always be "linearized": you can treat the noise as a Wiener process (as you supposed) with merely a scale coefficient that depends on x.  Then, assuming the drift velocity and the noise scale vary slowly with x (relative to the spacing of your successive samples) and that the time to cross those variations is large (relative to your sample period), you should be able to simply take successive pairs $$((x_i,t_i),(x_{i+1},t_{i+1}))$$ (using every pair would not be invalid, but would tighten the "slowly varying" requirement excessively) and fit to that model.  There may be more sophisticated ways of doing so, but I would first try obtaining the velocity by fitting a smooth function $$\hat v(x)$$ to the pairs $$\left(x_i,\frac{\Delta x_i}{\Delta t_i}\right)$$ (where $$\Delta ?_i:=?_{i+1}-?_i$$; we of course don't try to hit every point because of the noise).  Then remove that velocity field and fit a similarly smooth function $$\hat\sigma^2(x)$$ to the pairs $$\left(x_i,\frac{(\Delta x_i-\hat v(x_i)\Delta t_i)^2}{\Delta t_i}\right)$$.
 * If you want to relate this to the stationary state of the system, we can consider a non-interacting fluid following these dynamics and write a time-independent advection-diffusion equation: $$\frac d{dx}\big(D(x)\rho'(x)\big)=\frac d{dx}\big(v(x)\rho(x)\big)$$, where there will of course be a normalization factor. ($$D\propto\sigma^2$$, but I don't remember the factor off-hand.)  --Tardis (talk) 14:51, 20 June 2013 (UTC)
 * Thanks for the "push" and for telling me what those terms are called! I now have some new subjects to read up on! I also think now that I can use my known steady-state distribution of the feature to put some constraints on the distribution of the velocity field as a function of x - especially the balance between up-going and down-going rates. I will fiddle a little more with the math on this. Actually, the feature I am looking at has been subjected to some pretty sophisticated fixed-interval smoothing, which means that sampling the distribution of the velocity field is quite robust, and I can calculate it quite accurately also in between sample points. Again, thanks for your help! --Slaunger (talk) 17:06, 20 June 2013 (UTC)
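A minimal sketch of the two-pass fit Tardis outlines, run on synthetic data (an Ornstein-Uhlenbeck process is my stand-in for the real feature; theta, sigma and the binning are all invented): bin Δx/Δt over x to estimate the drift v̂(x), then use the de-drifted squared increments to estimate the (here constant) noise scale σ²:

```python
import math
import random

random.seed(1)

# Synthetic stand-in for the feature data (an assumption of mine): an
# Ornstein-Uhlenbeck process dx = -theta*x*dt + sigma*dW, simulated with
# Euler-Maruyama steps, so the "true" drift and noise scale are known.
theta, sigma, dt, nsteps = 1.0, 0.5, 0.01, 200_000
xs = [0.0]
for _ in range(nsteps):
    x = xs[-1]
    xs.append(x - theta*x*dt + sigma*math.sqrt(dt)*random.gauss(0.0, 1.0))

nbins, lo, hi = 20, -1.0, 1.0
def bin_of(x):
    return math.floor((x - lo)/(hi - lo)*nbins)

# Pass 1: estimate the drift v(x) as the binned mean of dx/dt.
drift_sum = [0.0]*nbins
count = [0]*nbins
for x0, x1 in zip(xs, xs[1:]):
    b = bin_of(x0)
    if 0 <= b < nbins:
        drift_sum[b] += (x1 - x0)/dt
        count[b] += 1
v_hat = [drift_sum[b]/count[b] if count[b] else 0.0 for b in range(nbins)]
centers = [lo + (hi - lo)*(b + 0.5)/nbins for b in range(nbins)]

# Pass 2: remove the fitted drift and estimate sigma^2 from the residuals.
res_sum, res_count = 0.0, 0
for x0, x1 in zip(xs, xs[1:]):
    b = bin_of(x0)
    if 0 <= b < nbins:
        r = (x1 - x0) - v_hat[b]*dt
        res_sum += r*r/dt
        res_count += 1
s2_hat = res_sum/res_count

# Count-weighted least-squares slope of v_hat vs x; should come out near
# -theta, while s2_hat should come out near sigma^2 = 0.25.
num = sum(centers[b]*drift_sum[b] for b in range(nbins))
den = sum(count[b]*centers[b]**2 for b in range(nbins))
slope = num/den
print("drift slope:", round(slope, 3), " sigma^2:", round(s2_hat, 4))
```

On real data the same two passes apply, just with the smooth fits Tardis suggests in place of raw bin averages.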