Wikipedia:Reference desk/Archives/Mathematics/2011 June 14

= June 14 =

Software
WHAT IS THE COMPUTER SOFTWARE? — Preceding unsigned comment added by 117.197.165.191 (talk) 05:38, 14 June 2011 (UTC)
 * Try asking at The Computing Helpdesk. Perhaps a little more detail about your question would help. ― JuPitEer (talk) 06:50, 14 June 2011 (UTC)
 * Or read our article on computer software. Gandalf61 (talk) 09:20, 14 June 2011 (UTC)
 * I think the OP might be wondering how Wikipedia works. I don't know if Wikipedia has software of its own, per se. But, as JuPitEer said: the computing reference desk would be a better place to ask. Maybe try the Wikipedia article too. — Fly by Night  ( talk )  14:28, 14 June 2011 (UTC)
 * The software that runs Wikipedia is MediaWiki. Staecker (talk) 23:27, 15 June 2011 (UTC)

Evaluating the matrix exponential of a tridiagonal matrix
I am working on a classification problem based on Bayesian inference, where I observe the discrete state of an object at non-equidistant points in time (the evidence). My objective is to keep updated probabilities that the object belongs to different classes Ck, k=1,...m using Bayes' theorem. The evolution of the state can be modelled as a continuous-time Markov process. The states can be ordered such that the state only jumps to a neighboring state. The rates at which the jumps occur are a characteristic of the object class, and I have collected data from objects with known classes and based on this I have good estimates of the class specific rate matrices Rk. The rate matrices are tridiagonal since the state jumps are restricted to nearest neighbours.
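For concreteness, a tridiagonal rate matrix of this kind can be built as follows (a sketch only; the birth/death-style jump rates below are made up for illustration, not estimates from any actual data):

```python
import numpy as np

def rate_matrix(up, down):
    """Build a tridiagonal generator (rate) matrix from hypothetical
    nearest-neighbour jump rates.

    up[i]   : rate of jumping from state i to state i+1 (length n-1)
    down[i] : rate of jumping from state i+1 to state i (length n-1)
    Each column sums to zero, as required for a continuous-time
    Markov process generator.
    """
    n = len(up) + 1
    R = np.zeros((n, n))
    for i in range(n - 1):
        R[i + 1, i] += up[i]    # flow out of state i, into state i+1
        R[i, i + 1] += down[i]  # flow out of state i+1, into state i
    # diagonal entries balance each column so the column sums vanish
    R -= np.diag(R.sum(axis=0))
    return R

# hypothetical 4-state example
R = rate_matrix(up=[1.0, 0.5, 0.2], down=[0.3, 0.4, 0.6])
```

The column-sum-zero construction guarantees that one eigenvalue is exactly zero, matching the steady-state property discussed below.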

The number of states n is in the range 5-20 and the number of classes m is less than 10. For the Bayesian inference update I need to evaluate the state-transition probability conditioned on the class hypothesis Ck
 * $$P(s_j|s_i, \Delta t_{i,j}, C_k) = \left[\exp(\mathbf{R}_k \Delta t_{i,j})\right]_{i,j}$$

where
 * $$s_j$$: posterior state at time $$t_j$$
 * $$s_i$$: prior state at time $$t_i$$
 * $$\Delta t_{i,j}$$: time difference $$t_j - t_i$$

In my problem I need to evaluate the matrix exponential $$\exp(\mathbf{R}_k \Delta t_{i,j})$$ regularly for each of the m object classes but with different update intervals. One way to do that is to diagonalize the rate matrices as
 * $$\mathbf{R}_k = \mathbf{P}^{-1}_k\boldsymbol{\Lambda}_k \mathbf{P}_k$$

where $$\boldsymbol{\Lambda}_k$$ is a diagonal matrix containing the eigenvalues (eigenrates), since then
 * $$\exp(\mathbf{R}_k \Delta t_{i,j}) = \mathbf{P}^{-1}_k \exp(\boldsymbol{\Lambda}_k\Delta t_{i,j})\mathbf{P}_k$$

which makes recalculating the matrix exponential for different update times easier: calculating $$\mathbf{P}_k$$ and its inverse only needs to be done once per object class, so what I basically need to do is make n scalar exponential calculations and then form the matrix product, which scales as $$\mathcal{O}(n^2)$$. From the eigenrates I also know the characteristic times in the system, and I may even precalculate the matrix exponentials for some logarithmically distributed update times and simply pick the closest match in update time.
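The decompose-once, exponentiate-often idea can be sketched like this (illustrative only: the 3-state rate matrix is hypothetical, numpy's general `eig` stands in for a tridiagonal-specific solver, and `V` plays the role of $$\mathbf{P}^{-1}_k$$ in the notation above):

```python
import numpy as np

# hypothetical 3-state tridiagonal rate matrix (columns sum to zero)
R = np.array([[-1.0,  0.5,  0.0],
              [ 1.0, -1.5,  0.8],
              [ 0.0,  1.0, -0.8]])

# one-off O(n^3) work: eigendecomposition R = V diag(lam) V^{-1}
lam, V = np.linalg.eig(R)
V_inv = np.linalg.inv(V)

def transition_matrix(dt):
    """exp(R*dt) = V diag(exp(lam*dt)) V^{-1}; only n scalar
    exponentials plus matrix products once the decomposition
    is cached.  .real discards round-off-sized imaginary parts."""
    return ((V * np.exp(lam * dt)) @ V_inv).real

P = transition_matrix(0.5)
```

Because the columns of R sum to zero, every column of `transition_matrix(dt)` should sum to one for any dt, which gives a cheap sanity check on the numerics.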

However, I am concerned about the numerical stability of the diagonalization. The largest eigenvalue should always be zero (corresponding to the steady-state solution), but this is susceptible to round-off error. (Until now I have post-corrected the eigenvalues, setting the largest one to zero.) For the remaining eigenvalues the ratio between min and max can be assumed to be smaller than 1000. I read in the matrix exponential article that doing this right is far from trivial.

Can I use the fact that my rate matrices are tridiagonal to make a more robust or elegant computation of the matrix exponentials?

--Slaunger (talk) 09:06, 14 June 2011 (UTC)
 * Try the formula $$e^M=\lim_{n\rightarrow \infty} \left(I+2^{-n}M\right)^{2^n}$$ to reduce the matrix exponential to n squarings of the matrix $$I+2^{-n}M$$, without using the fact that M is tridiagonal. Bo Jacoby (talk) 15:56, 14 June 2011 (UTC).
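The suggested squaring scheme can be sketched as follows (a sketch only; the 2×2 matrix `M` is hypothetical and `n` is fixed rather than chosen adaptively, which, as discussed below, matters for accuracy):

```python
import numpy as np

def expm_squaring(M, n=20):
    """Approximate exp(M) as (I + 2^{-n} M)^(2^n):
    scale M down by 2^n, add the identity, then square n times."""
    A = np.eye(len(M)) + M / 2.0**n
    for _ in range(n):
        A = A @ A
    return A

# hypothetical 2-state generator (columns sum to zero)
M = np.array([[-1.0,  0.5],
              [ 1.0, -0.5]])
E = expm_squaring(M)
```

The truncation error is roughly proportional to $$2^{-n}$$, so for a well-scaled M a modest n already gives several correct digits.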
 * Thank you for your response. OK, I think I understand, but if I diagonalize once, I only have to do three matrix multiplications for each different $$\Delta t_{i,j}$$, whereas with the approach you suggest I need to make n multiplications, where n depends on how fast the expression converges, assuming I understand correctly? I realize it will converge very fast, though... Obviously n should be selected such that $$2^{-n}\boldsymbol{M} \ll \mathbf{I}$$, and this is mathematically correct, but will it be more numerically stable than the diagonalization approach? What I am concerned about is the numerical stability and reliability of the diagonalization approach. It is mentioned in Matrix exponential that


 * Finding reliable and accurate methods to compute the matrix exponential is difficult, and this is still a topic of considerable current research in mathematics and numerical analysis.


 * In mathematical libraries the Padé approximant is also often used as a robust way to compute the matrix exponential for general matrices (unless the matrix is ill-conditioned), but it seems like overkill for my case, where the rate matrices are constant and I only multiply them by a scalar prior to taking the exponential.
 * In the Octave documentation for expm I noticed a reference to Moler and Van Loan, Nineteen Dubious Ways to Compute the Exponential of a Matrix, SIAM Review, 1978, which seems very interesting in this context. Unfortunately I do not have access to that paper, but the abstract seems very interesting.
 * And I was thinking that maybe there was a nifty, reliable and elegant method for calculating the matrix exponential if the matrix was tridiagonal and multiplied by some scalar - at least I seem to recall that solving the eigenvalue problem for a tridiagonal matrix is much easier than for a general diagonalizable matrix. --Slaunger (talk) 19:12, 14 June 2011 (UTC)
 * Even when a scalar exponential $$e^M$$ is approximated by $$(1+2^{-n}M)^{2^n}$$, n should be chosen with care. If n is too small, then the approximation has insufficient precision, and if n is too big, then round-off errors make $$1 + 2^{-n}M$$ equal to 1. Floating-point numbers have fixed precision, and not everything can be done. Bo Jacoby (talk) 08:49, 15 June 2011 (UTC).
 * Yes, that is why the question is not so simple. It is more about the most efficient and reliable algorithm than about the maths (although you need some linear algebra to arrive at the best algorithm). Maybe the Computing or Science help desk is a better place for my question? I was in doubt whether numerical methods/algorithms are in scope for this help desk when I asked the question, but decided to try here as the question is pretty mathematically oriented... In my problem I need to do these calculations in real time for many objects on a computational platform with rather limited resources, so I need to understand how to find the best processing compromise. --Slaunger (talk) 10:11, 15 June 2011 (UTC)
 * As far as I'm concerned, asking at the math desk is fine. I think at the computing desk they're more used to answering questions about programming languages and systems administration than numerical algorithms. --Trovatore (talk) 10:30, 15 June 2011 (UTC)
 * In order to find the most efficient and reliable algorithm you need to test some algorithms. Have you tried my method before you think that something else is better? Bo Jacoby (talk) 15:51, 15 June 2011 (UTC).
 * A valid point. I have tried experimenting with the Padé approximant implementation in scipy, which seems to converge very fast for the examples I have tried. I have implemented the diagonalization approach in Python, which also seems to work and which is efficient, as the number of computations is fixed once the rate matrices have been diagonalized. I have not tried your approach yet, but my feeling is that it is not as stable as the Padé approximant. However, I need to go to a production language like C to properly profile the different options, and this takes considerable effort. I think I will try to get hold of the reference I mentioned earlier, where I have noticed there is also an updated review article from 25 years later, and study what others have done instead of spending a lot of time reinventing the wheel. Thanks for your time! --Slaunger (talk) 19:40, 15 June 2011 (UTC)
 * The 25 years later reference is also publicly available at http://www.cs.cornell.edu/cv/researchpdf/19ways+.pdf. I am reading it with interest - a lot of goodies there.... --Slaunger (talk) 07:45, 16 June 2011 (UTC)
 * To compute the formula $$\left(I+2^{-n}M\right)^{2^n}$$ to good numerical precision, start with $$2^{-n}M$$ rather than $$I+2^{-n}M$$ and use $$(I+X)^2=I+(2X+X^2)$$ to perform the squaring without adding in the $$I$$. At some point, unless $$M$$ is very small, the intermediate value being stored gets large and then the $$I$$ can be added in and ordinary squaring can be used for the remaining steps. McKay (talk) 04:32, 16 June 2011 (UTC)
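McKay's trick can be sketched like this (a sketch under the stated assumptions; the switch-over threshold `switch`, the fixed `n`, and the test matrix are arbitrary illustrative choices, not from the discussion):

```python
import numpy as np

def expm_mckay(M, n=30, switch=1.0):
    """Squaring that tracks X = (I + 2^{-n}M)^(2^k) - I exactly,
    using (I+X)^2 = I + (2X + X^2), so the tiny 2^{-n}M is never
    swamped by adding the identity.  Once X has grown past `switch`
    in norm, add the identity back and square normally."""
    X = M / 2.0**n              # start from 2^{-n}M, identity kept implicit
    k = 0
    I = np.eye(len(M))
    while k < n and np.linalg.norm(X) < switch:
        X = 2.0 * X + X @ X     # (I+X)^2 = I + (2X + X^2)
        k += 1
    A = I + X                   # intermediate value is large enough now
    for _ in range(k, n):
        A = A @ A               # ordinary squaring for the remaining steps
    return A

# hypothetical 2-state generator (columns sum to zero)
M = np.array([[-1.0,  0.5],
              [ 1.0, -0.5]])
E = expm_mckay(M)
```

After k implicit squarings, `I + X` equals $$(I+2^{-n}M)^{2^k}$$ exactly, so the remaining $$n-k$$ ordinary squarings complete the exponent $$2^n$$.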
 * After having studied the review article, it appears to me that series expansion and multiplicative methods as suggested here will not be stable as $$\Delta t_{i,j} \rightarrow \infty$$ for my rate matrix problem. In my case the highest eigenvalue is always exactly zero, whereas all others are negative real numbers, which implies that
 * $$\lim_{\Delta t_{i,j} \rightarrow \infty}\exp(\mathbf{R}_k \Delta t_{i,j}) = \mathbf{S}_k$$,
 * where $$\mathbf{S}_k$$ is the steady-state state-transition probability matrix, containing n identical column vectors. Any column vector in $$\mathbf{S}_k$$ represents the steady-state (or a priori) probability distribution over the states. As the time difference grows, the elements of the matrix to exponentiate also grow, and so will the number of multiplications and the round-off errors. In the specific case where I have this eigenvalue property of the rate matrices, a matrix decomposition method seems to be a good path. However, the decomposition does not necessarily have to be an eigenvalue decomposition (although that is very easy for a tridiagonal matrix). If the condition number of the matrix Pk containing the eigenvectors is too high (if it is nearly defective), one should be careful. I also learned that other decomposition methods, such as QR decomposition into an orthogonal and a triangular matrix, can be a useful alternative to eigenvalue decomposition, as calculating the matrix exponential of a triangular matrix should be fairly simple (I have not studied the details).
 * Another observation is that the decomposition methods require on the order of $$\mathcal{O}(n^3)$$ computations to make the decomposition, but only $$\mathcal{O}(n^2)$$ for each subsequent $$\Delta t_{i,j}$$. --Slaunger (talk) 09:48, 16 June 2011 (UTC)
 * McKay's trick above is a clever one. The review article is scholarly and inconclusive. High floating-point precision was not available in 1978. If you know that no eigenvalue has a positive real part, then use $$e^{Mt}=(e^{-Mt})^{-1}$$. Bo Jacoby (talk) 12:31, 16 June 2011 (UTC).
 * It is the updated 25-years-after article from 2003 I have been reading, albeit it still contains quite a lot of old material reminiscent of the single-precision computing possibilities of 1978. I agree it is not very conclusive; the bottom line is that all methods have their pros and cons, although the pro/con balance differs from method to method and depends to some extent on how well-behaved the matrix you want to exponentiate is.
 * Thanks for the inversion trick. But does the inversion trick help when the largest eigenvalue of the rate matrix is identically zero? The rate matrices fulfil the condition that the sum of the elements in each column is zero, the diagonals are negative, and the off-diagonals are positive. Does a sign reversal on all elements really make it easier to exponentiate, and is the matrix inversion needed afterwards not an $$\mathcal{O}(n^3)$$ process, or am I missing something? --Slaunger (talk) 12:54, 16 June 2011 (UTC)
 * The series expansion of $$e^{-5}$$ is unstable, due to round-off errors, while $$(e^5)^{-1}$$ is stable. $$e^0$$ is the easier case. Yes, matrix inversion is $$\mathcal{O}(n^3)$$, so don't do it if anything else works. The squaring method is stable, I think. The tridiagonal property is apparently not helpful. Good luck on the project! Bo Jacoby (talk) 16:34, 16 June 2011 (UTC).
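The scalar instability Bo Jacoby describes is easy to demonstrate (a sketch; `exp_taylor` is a deliberately naive series summation written only to illustrate the cancellation, and x = 30 is an arbitrary choice):

```python
import math

def exp_taylor(x, terms=100):
    """Naive Taylor series for e^x.  For large negative x the huge
    alternating terms cancel catastrophically in floating point."""
    s, t = 0.0, 1.0
    for k in range(1, terms + 1):
        s += t
        t *= x / k
    return s

x = 30.0
naive    = exp_taylor(-x)        # direct series for e^{-30}: cancellation
inverted = 1.0 / exp_taylor(x)   # the inversion trick: (e^{30})^{-1}
exact    = math.exp(-x)
```

The series for $$e^{+30}$$ has only positive terms, so no cancellation occurs and inverting the result is accurate, while the direct series for $$e^{-30}$$ loses essentially all significant digits.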
 * Ah, I see. Thanks for all the helpful advice, some of which helped me go in a reasonable direction myself. I think I will go for the $$\mathcal{O}(n^2)$$ eigenvalue decomposition approach first, and if that fails I have a couple of good $$\mathcal{O}(n^3)$$ ideas to pursue here. Thanks again. --Slaunger (talk) 18:42, 16 June 2011 (UTC)

Dimension and Cardinality
Let M be a smooth, real, n-dimensional manifold. I'd like to know the dimension and cardinality of C∞(M,R), i.e. the space of smooth functions from M to R. I believe that the dimension is at least ℵ1, and maybe even ℵ2. I have no idea what the cardinality might be, although that isn't of paramount importance. Can anyone confirm, or deny, that the dimension is at least ℵ1, and maybe even ℵ2? If someone could supply a reference then that'd be great. — Fly by Night  ( talk )  15:45, 14 June 2011 (UTC)
 * I may be being stupid, but isn't a disjoint union of as many open balls as you like a perfectly good smooth n-dimensional manifold, which has as many such functions as you like? Or is there an implicit connectedness assumption somewhere in that? 128.232.241.211 (talk) 16:23, 14 June 2011 (UTC)
 * You're not being stupid at all, in fact you raise an interesting point. I guess you're right. I should have mentioned that it's connected. Although I'm not sure that it matters. You'd get a Cartesian product of the form
 * $$ C^{\infty}(M_1 \sqcup \ldots \sqcup M_k,\R) = C^{\infty}(M_1,\R) \times \cdots \times C^{\infty}(M_k,\R) \, . $$
 * I'm not sure that that would change the dimension, because k⋅ℵ1 = ℵ1. Or at least I think so&hellip; — Fly by Night  ( talk )  16:42, 14 June 2011 (UTC)
 * I think (part of) 128's point was that k need not be finite. AndrewWTaylor (talk) 17:04, 14 June 2011 (UTC)
 * I was just thinking of a nice hypersurface like we use in differential geometry all the time. — Fly by Night  ( talk )  17:37, 14 June 2011 (UTC)
 * Okay, so that's at least &sigma;-compact, which still gets you separable (unless I was wrong about compact manifold => separable). --Trovatore (talk) 18:22, 14 June 2011 (UTC)
 * For an abstract topological space, compact does not imply separable; for example the so-called Lexicographic order topology on the unit square. It's compact and connected, but it's not separable. As for a manifold, I'm not sure. It's usual to assume connectedness and paracompactness (that makes it a metrizable space). — Fly by Night  ( talk )  18:47, 14 June 2011 (UTC)
 * It's actually pretty trivial; I just didn't see it right away. Given a compact manifold, each point has an open neighborhood that's homeomorphic to Rn, so that neighborhood has a countable dense subset.  By compactness, finitely many such neighborhoods cover the whole manifold. --Trovatore (talk) 19:04, 14 June 2011 (UTC)
 * Good work. It's easy when you know how&hellip; — Fly by Night  ( talk )  19:53, 14 June 2011 (UTC)


 * OK, first of all, I kind of suspect that you're using $$\aleph_1$$ and $$\aleph_2$$ to mean $$2^{\aleph_0}$$ and $$2^{2^{\aleph_0}}$$ respectively, is that true? That's a bad habit, unfortunately one encouraged by a lot of popularizations.  The GCH is not some harmless little notational thing but rather a strong combinatorial assertion.  You can write instead $$\beth_1$$ and $$\beth_2$$, although these are somewhat less universally recognized and suffer from the problem that $$\beth$$ and $$\gimel$$ look awfully similar.
 * Are we assuming M is compact? Then the cardinality of C∞(M,R) is $$2^{\aleph_0}$$.  The lower bound should be pretty trivial; for the upper bound, note that M is separable (I think that follows from "compact manifold"), and pick a countable dense subset; then any continuous function from M to R is determined by its values on that subset.  If you drop "compact" then you can find non-separable (topological) manifolds like the long line and things get a little more complicated, but if I recall correctly there's no way to make the long line into a smooth manifold.  But again that's more complicated.
 * For the dimension, what sort of dimension are you interested in? Topological dimension?  Dimension as a real vector space? --Trovatore (talk) 16:40, 14 June 2011 (UTC)
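Trovatore's counting argument above can be written out as a short derivation (a sketch, assuming M is compact, hence separable, so a countable dense subset D exists):

```latex
% Lower bound: the constant functions already inject \mathbb{R} into
% C^\infty(M,\mathbb{R}).
% Upper bound: a continuous f is determined by its restriction to a
% countable dense subset D \subseteq M.
2^{\aleph_0} = |\mathbb{R}|
  \;\le\; |C^\infty(M,\mathbb{R})|
  \;\le\; |\mathbb{R}^{D}|
  = \left(2^{\aleph_0}\right)^{\aleph_0}
  = 2^{\aleph_0 \cdot \aleph_0}
  = 2^{\aleph_0}.
```

Both bounds agree, so the cardinality is exactly the continuum, independently of CH.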
 * Ermm, errr. I wasn't assuming M to be compact; I didn't think that it would have mattered. But it obviously does. I'm thinking of it as a ring, with point-wise addition and multiplication as the operations. So (I guess) the vector space dimension would be more useful for me. You obviously know your stuff, so go easy. Try not to fry my mind. — Fly by Night  ( talk )  16:59, 14 June 2011 (UTC)

Assuming M to be connected and paracompact, $$C^\infty(M)$$, equipped with the topology of uniform convergence of all derivatives on compact subsets, is a separable Frechet space. That's probably more useful to know than the linear dimension (which is $$2^{\aleph_0}$$.) Sławomir Biały  (talk) 17:55, 14 June 2011 (UTC)
 * Is there an example of a connected infinitely differentiable manifold that's not separable? Did I remember correctly that there's no infinitely differentiable structure on the long line? --Trovatore (talk) 18:24, 14 June 2011 (UTC)
 * I think the Lexicographic order topology on the unit square gives a topology on a manifold (with boundary) that is compact and connected, but which is not separable. I'm not sure if that's what you were asking. — Fly by Night  ( talk )  18:53, 14 June 2011 (UTC)
 * There is a smooth structure on the long line (according to our article long line). The lexicographic order topology is not a manifold with boundary. No neighborhood of a point (x,1) for 0<x<1 is homeomorphic to an interval. Sławomir Biały  (talk) 19:21, 14 June 2011 (UTC)
 * Also, every connected paracompact manifold is separable. (Paracompactness and local compactness imply that each component is sigma-compact, hence Lindelöf, hence separable.)   Sławomir Biały  (talk) 19:46, 14 June 2011 (UTC)
 * Okay, that makes sense. Would it just be a (topological) manifold then? — Fly by Night  ( talk )  19:55, 14 June 2011 (UTC)
 * It's not any sort of manifold.  Sławomir Biały  (talk) 19:57, 14 June 2011 (UTC)
 * The article on topological manifold says that each point in the space must have a neighbourhood homeomorphic to an open subset of Rn. But to define a homeomorphism you need to know the topology in both the source and the target. What must the topology be on Rn to be able to talk about a homeomorphism? The unit square with the lexicographic order topology is compact, connected and Hausdorff. — Fly by Night  ( talk )  21:20, 14 June 2011 (UTC)
 * It's with the usual Euclidean topology. Sławomir Biały  (talk) 23:57, 14 June 2011 (UTC)

Back to the Question
My question has blossomed into a very interesting topological discussion, but could we bring it back to the question at hand? What are the main conclusions? It seems that the space of smooth functions from a manifold M to R is a Fréchet space. Does that tell us anything about the dimension? Have we decided that the dimension is 2ℵ0 (which may or may not be the same as ℵ1)? — Fly by Night  ( talk )  21:23, 14 June 2011 (UTC)
 * Any infinite-dimensional separable Fréchet space has dimension as a linear space equal to the continuum. However, it might be more natural to think of the dimension as countable, since there is a dense subspace whose linear dimension is countable. This is in much the same way that the space $$\ell^2$$ has continuum dimension as a linear space, but its "Hilbert dimension" is countable.   Sławomir Biały  (talk) 23:57, 14 June 2011 (UTC)