Talk:Square root of a matrix

Proof of unitary freedom
I think the proof of the unitary freedom of roots is not correct. It assumes that the rows of B, the positive root of T, span Cn. This is equivalent to saying B is invertible, which need not be the case. B won't be invertible if T isn't, and T will only be invertible if it is positive-definite. If it is positive-semidefinite, having some null eigenvalues, T will be singular. For example, pick T = 0; then B = 0, but the columns of B don't span anything at all. Daniel Estévez (talk) 12:46, 6 December 2008 (UTC)


 * in the finite dimensional setting, a partial isometry can be extended to a unitary. for the same reason, U can be taken to be unitary in the polar decomposition UP of a matrix. Mct mht (talk) 17:16, 6 December 2008 (UTC)


 * I really don't see how that helps. In the proof it is said that each column of A is a linear combination of the columns of B alone, because the columns of B span Cn. Moreover, I don't see how B can be an orthogonal basis for Cn. If T diagonalizes as T = UDU*, with eigenvalues real and nonnegative because T is positive semidefinite, then you pick B as B = UD½U*. U is an orthonormal basis for Cn, but B isn't. What I'm saying is that if T (and then B) is positive definite, then the proof is OK, but if T is only positive semidefinite, then B has a nontrivial kernel, and so its columns cannot span the whole Cn. Of course I may be completely wrong, as I am not really into linear algebra/functional analysis. Daniel Estévez (talk) 18:11, 6 December 2008 (UTC)


 * if A*A = B*B, the proof constructs a partial isometry whose initial and final subspaces are Ran(A) and Ran(B). when A (and therefore B) is not invertible, their ranges are not all of C^n, as you say. this is where one uses "in the finite dimensional setting, a partial isometry can be extended to a unitary." call this unitary U; then A = UB and A*A = B*U*UB = B*B. Mct mht (talk) 21:30, 6 December 2008 (UTC)


 * Thanks. Now I understand how the proof works, provided you can construct the partial isometry. But I think this is not stated very clearly in the current proof. I would be glad if you reread the proof, adding these details you have told me. Daniel Estévez (talk) 10:22, 7 December 2008 (UTC)
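The extension discussed in this thread can be illustrated numerically: in the finite-dimensional case, the polar decomposition computed from an SVD supplies a genuine unitary U with A = UB even when A is singular. The following NumPy sketch (the rank-deficient matrix is only illustrative) checks this:

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately singular (rank-2) 3x3 matrix.
A = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))

# B = (A*A)^(1/2), the unique positive-semidefinite square root of T = A*A.
T = A.T @ A
w, V = np.linalg.eigh(T)                  # T is Hermitian, eigenvalues >= 0
B = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T

# Polar decomposition via SVD: A = W diag(s) Vt, so U = W @ Vt is a
# unitary (orthogonal, in the real case) extending the partial isometry
# that maps Ran(B) onto Ran(A).
W, s, Vt = np.linalg.svd(A)
U = W @ Vt

print(np.allclose(U.T @ U, np.eye(3)))    # U is unitary
print(np.allclose(U @ B, A))              # A = U B even though A is singular
```

Here B is computed as the principal root of T, and U comes from the SVD rather than from an explicit partial-isometry construction; the two agree because the positive root is unique.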

Cholesky vs square root
This article states that Cholesky decomposition gives the square root. This is, I think, a mistake. B.B = A and L.L^T = A are not the same thing, and B will not equal L unless L is diagonal. Cf. http://yarchive.net/comp/sqrtm.html I have edited this article appropriately --Winterstein 11:03, 26 March 2007 (UTC)


 * well, the Cholesky decomposition gives a square root. the term "square root" is used in different senses in the two sections. Mct mht 11:09, 26 March 2007 (UTC)


 * At the beginning of the section Square root of positive operators the article defines the square root of A as the operator B for which A = B*B, and in the next sentence it denotes by T½ the matrix for which T = (T½)². This was very confusing until I realized that T½ is necessarily self-adjoint, that it is therefore also a square root in the former sense, and that only the self-adjoint square root is unique. Is this reasoning correct? --Drizzd (talk) 11:17, 9 February 2008 (UTC)


 * yes, more precisely the positive square root is unique. Mct mht (talk) 17:23, 6 December 2008 (UTC)


 * surely this should be changed then. Cholesky decomposition does not give a unique root. If L is considered a root because L.L^T = A, then L.R, where R is orthogonal, is also a root, since (L.R).(L.R)^T = L.R.R^T.L^T = L.L^T = A. —Preceding unsigned comment added by Parapunter (talk • contribs) 05:01, 3 March 2010 (UTC)
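The distinction discussed in this section is easy to see numerically: the Cholesky factor L satisfies L.L^T = A, the symmetric principal root B satisfies B.B = A, the two differ, and L.R for any orthogonal R is another root in the Cholesky sense. A NumPy sketch (the matrix is only illustrative):

```python
import numpy as np

# A small symmetric positive-definite matrix.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Cholesky factor: lower-triangular L with L @ L.T == A.
L = np.linalg.cholesky(A)

# Symmetric (principal) square root: B with B @ B == A, built from
# the eigendecomposition A = V diag(w) V.T.
w, V = np.linalg.eigh(A)
B = V @ np.diag(np.sqrt(w)) @ V.T

print(np.allclose(L @ L.T, A))   # True: a root in the Cholesky sense
print(np.allclose(B @ B, A))     # True: a root in the B.B = A sense
print(np.allclose(L, B))         # False: the two factors differ

# Parapunter's point: L @ R is another Cholesky-sense root for orthogonal R.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose((L @ R) @ (L @ R).T, A))   # True
```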

Calculating the square root of a diagonalizable matrix
The description of calculating the square root of a diagonalizable matrix could be improved.

Currently it takes the matrix of eigenvectors as given, then takes steps to calculate the eigenvalues from it. It is a very rare situation to have eigenvectors before you have eigenvalues. They are often calculated simultaneously, or for small matrices the eigenvalues are found first, by finding the roots of the characteristic polynomial.

I realize it is easier to describe the step from eigenvectors to eigenvalues in matrix notation than the other way around, but the description should decide whether it wants to be a recipe or a theorem. If it's a recipe, it should have practical input, and if it's a theorem, the eigenvalues should be given alongside the eigenvectors.

Please comment if you believe this to be a bad idea. I will fix the article in some weeks if no one stops me - if I remember. :-) -- Palmin


 * Good point. Please do fix it. As an alternative, perhaps consider diagonalization as one step (and refer to diagonalizable matrix), but if you think it's better to spell it out in more detail, be my guest! -- Jitse Niesen (talk) 00:34, 15 March 2007 (UTC)


 * As I didn't see any edits from you, I rewrote the section myself. -- Jitse Niesen (talk) 12:38, 20 May 2007 (UTC)


 * I'm disappointed to see that's been up for years ignoring the fact that matrices in general have negative eigenvalues. 175.45.147.38 (talk) 12:43, 1 September 2010 (UTC)  [spelling of "eigenvalues" 2.28.102.164 (talk) 14:10, 22 February 2021 (UTC)]
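The eigendecomposition recipe discussed in this section can be sketched as follows; as the comment above notes, negative eigenvalues force a complex square root, which np.sqrt handles once the eigenvalues are cast to complex (the matrix is only illustrative):

```python
import numpy as np

# A diagonalizable matrix whose eigenvalues are -1 and -2, so any
# square root is necessarily complex.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

w, V = np.linalg.eig(A)
# Principal square root of each eigenvalue; casting to complex makes
# np.sqrt return i*sqrt(|lambda|) for negative lambda instead of NaN.
B = V @ np.diag(np.sqrt(w.astype(complex))) @ np.linalg.inv(V)

print(np.allclose(B @ B, A))   # B is a square root of A
```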

There are at least 2 major mistakes in the second half of the article. The definition of the square root of a matrix is wrong: given A, A^(1/2) means that A^(1/2) times A^(1/2) = A, and not A^(1/2)* times A^(1/2) [e.g. see Horn & Johnson - Matrix Analysis]. The other mistake is in the polar decomposition definition: a positive definite (not simply positive) matrix M can be decomposed as M = HP, where H is a Hermitian (symmetric in the real case) matrix and P is a unitary (orthogonal) matrix [e.g. see Bhatia - Matrix Analysis]. The fact that M is nonsingular is not enough to prove the theorem: there could be negative eigenvalues whose effect would be an imaginary diagonal element in D^(1/2). [Giuseppe Cammarata] —Preceding unsigned comment added by 88.75.149.153 (talk) 19:44, 7 April 2011 (UTC)
 * I'm not so certain these are mistakes: they can probably all be explained by different conventions in different fields. (I seem to remember having the same reaction at first, but then seeing sources that supported the usage in the article.) That said, I agree that the article should be brought into accord with what seem to be the prevailing conventions. It is going to involve a bit of careful work and rewriting, but I would support you in your efforts to correct the article. Sławomir Biały  (talk) 20:41, 7 April 2011 (UTC)

There is an iteration for the square root of a matrix in the text, the Babylonian method iterating to the square root of A: X(k+1) = (1/2)(X(k) + A.X(k)^(-1)). Is it possible to replace A.X(k)^(-1) by (X(k)^(-1)).A? Matrix multiplication is not commutative, but since that part of the expression iterates towards the identity matrix, it might be possible. 2.28.102.223 (talk) 18:07, 22 January 2021 (UTC)

I have tried the above and they give very similar convergence. Motivated by considerations of symmetry, I tried another iteration. Using XA as shorthand for (X(k)^-1).A and AX as shorthand for A.(X(k)^-1), I found that significantly better convergence and robustness are achieved by X(k+1) = (1/2)X(k) + (1/4)XA + (1/4)AX. Unfortunately that comes from my own trials, which constitutes original research, so I can't (by Wikipedia rules) put it on the main page. 2.28.102.164 (talk) 14:07, 22 February 2021 (UTC)
 * If you start from an initial guess that commutes with A, it is easy to show that all matrices generated during the computation commute. So, the three methods compute the same matrix and are thus equivalent. On the other hand, replacing XA by (1/2)(XA+AX) may reduce the rounding errors, especially when the input is not well conditioned. The literature on numerical stability in matrix computation is very large, so the case of the square root of a matrix has certainly been deeply studied, and I would be very surprised if your result were original research, although I am unable to provide a reference.
 * The case of a non-commuting starting point seems more difficult, as it is not immediately clear whether the matrix square root is a fixed point of the iteration. D.Lazard (talk) 17:19, 22 February 2021 (UTC)
 * My initial guess does not commute with A. I have rotation matrices which do not commute (for sizes of 3x3 and larger; I have tested up to 10x10). In the limit with perfect convergence, A = X.X (by definition), so A commutes with X and A also commutes with X^-1. However, before it has converged, when X is still an approximation, A is not the same as X.X, so A and X do not commute.
 * Yes, a square root is a fixed point of the iteration. If X is a square root, i.e. A = X.X, then A.(X^-1) = (X^-1).A = X. Ergo, any linear combination, (once convergence is achieved) such as X/2 + A.(X^-1)/4 + (X^-1).A/4 is equal to X.
 * My iteration using (1/2)(XA+AX) takes about half the number of iterations compared to using just AX or XA. The difference is certainly a lot bigger than rounding error. The arithmetic I'm using has about 16 significant figures and a change of 10^-16 (i.e. just rounding error) would not remotely explain the number of iterations going from about 10 to about 5 (in easy cases) and from about 20 to about 10 (in difficult cases).
 * I am measuring the error by subtracting X.X from A and I get RMS values of error matrix elements of 10^-12 in most cases and about 10^-10 in difficult cases.
 * My experiments are original, (I thought of them) but not necessarily unique (others could have thought of the same thing before me, possibly going back to Newton). I don't claim to be the only person to discover this, but my attempt at using (XA+AX)/2 instead of just AX gave a substantial improvement, not a minor improvement. If there's anyone out there who can find a reference, that would be good because then I could put it on the main page instead of here in the comments.2.28.102.226 (talk) 17:25, 23 February 2021 (UTC)
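The iterations compared in this thread can be sketched as follows (NumPy; the test matrix and starting point X0 = I are only illustrative). With X0 = I, all iterates commute with A, which is D.Lazard's point about the variants being algebraically equivalent in that case:

```python
import numpy as np

def babylonian(A, variant, steps=50, tol=1e-12):
    """Iterate toward a square root of A starting from X0 = I.

    variant 'AX'  : X <- (X + A @ inv(X)) / 2             (as in the article)
    variant 'sym' : X <- X/2 + (inv(X)@A + A@inv(X)) / 4  (symmetrised form)
    """
    X = np.eye(len(A))
    for k in range(steps):
        Xinv = np.linalg.inv(X)
        if variant == 'AX':
            X = 0.5 * (X + A @ Xinv)
        else:
            X = 0.5 * X + 0.25 * (Xinv @ A + A @ Xinv)
        if np.linalg.norm(X @ X - A) < tol:   # Frobenius-norm residual
            break
    return X, k + 1

# A symmetric positive-definite test matrix.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

X1, n1 = babylonian(A, 'AX')
X2, n2 = babylonian(A, 'sym')

# Since X0 = I commutes with A, every iterate commutes with A, so both
# variants converge to the same (principal) square root here.
print(np.allclose(X1 @ X1, A), np.allclose(X2 @ X2, A))
```

This sketch does not reproduce the non-commuting rotation-matrix experiments described above; it only demonstrates the update rules themselves on a commuting starting point.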

Unitary freedom of square roots of positive operators
The current proof seems to make too many assumptions. For a strictly positive T = AA* = BB*, with B = T½ defined as the unique non-negative square root of T, it is true that B is invertible, and you can construct U = B⁻¹A. However, the given proof that U is unitary also seems to assume that A is invertible:

$$\begin{align} U^*U &= \left(A^*(B^*)^{-1}\right)\left(B^{-1}A\right) = A^*T^{-1}A \\ &= A^*\left((A^*)^{-1}A^{-1}\right)A = I. \end{align}$$

Is it necessarily the case that A is invertible? Wouldn't it be easier to use the equivalent definitions T = A*A = B*B (where again B = T½ is the unique non-negative square root of T) and construct U = AB⁻¹ instead? That way, the transformation A=UB can be proved unitary without assuming that A is invertible:

$$\begin{align} U^*U &= \left((B^*)^{-1}A^*\right)\left(AB^{-1}\right) = (B^*)^{-1}TB^{-1} \\ &= (B^*)^{-1}B^*BB^{-1} = I. \end{align}$$

Rundquist (talk) 07:41, 3 February 2012 (UTC)
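Rundquist's construction can be checked numerically under the assumption that A (and hence B) is invertible; the matrix below is a hypothetical invertible example:

```python
import numpy as np

# An invertible 3x3 matrix (determinant 7).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])

# T = A*A is strictly positive, so B = T^(1/2) is invertible.
T = A.T @ A
w, V = np.linalg.eigh(T)
B = V @ np.diag(np.sqrt(w)) @ V.T

# U = A B^(-1), as in the proposed proof.
U = A @ np.linalg.inv(B)

print(np.allclose(U.T @ U, np.eye(3)))   # U*U = I
print(np.allclose(U @ B, A))             # A = U B
```

When A is singular this direct construction breaks down, because B⁻¹ does not exist; that is the case addressed by the partial-isometry argument discussed earlier on this page.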


 * I updated the article to reflect the changes I suggested above. Rundquist (talk) 07:44, 9 February 2012 (UTC)


 * Of course it is not true in general that A is invertible. The original proof addressed this, and it was already discussed in the first section of this very talk page. And what is your B⁻¹? If A is not invertible, B won't be either. The statement "...transformation A=UB can be proved unitary without assuming that A is invertible" is just nonsense.


 * Unfortunately, this article has become a bit of a mess... Mct mht (talk) 21:28, 12 March 2012 (UTC)


 * Yes please take it in hand, if you're willing.  Sławomir Biały  (talk) 02:05, 13 March 2012 (UTC)

Proof at the beginning of the Properties section
I think the proof has some flaws. In fact it says that the property of having $2^n$ different square root matrices stems from the fact that the matrix is diagonalizable (i.e. it can be written as $P^{-1}DP$); but that is clearly wrong, as the identity matrix is diagonalizable but, as reported above, has infinitely many square roots. Maybe it should be made explicit in the proof that the key property is that every eigenvalue is distinct? (I didn't check if it would work that way, though. Anyhow it is wrong as currently stated.) — Preceding unsigned comment added by 80.180.2.4 (talk) 21:15, 11 December 2014 (UTC)
 * I moved your comment to the bottom of the list, as is customary, from the top. It must be midnight in Parabiago. What part of the up-front qualifying statement of having n distinct eigenvalues is not clear? Does any identity matrix have more than one distinct eigenvalue? Cuzkatzimhut (talk) 21:43, 11 December 2014 (UTC)

I am saying that the "proof" only uses the fact that the matrix is diagonalizable; nowhere does it use the fact that the eigenvalues must be distinct for the proof to work. So there are flaws in the way it is currently presented. That the distinct-eigenvalues condition is essential is easily seen by thinking about the identity matrix, which has an infinite number of roots, so the argument presented has no meaning. — Preceding unsigned comment added by 80.180.2.4 (talk) 21:46, 12 December 2014 (UTC)


 * The statement "where $D$½ is any square root matrix of $D$, which must be diagonal with diagonal elements equal to square roots of the diagonal elements of  $D$" fails for equal eigenvalues. So, e.g., you may convince yourself that, for 2×2, it holds for diag(1,a), as long as a≠1. Cuzkatzimhut (talk) 01:37, 13 December 2014 (UTC)

I know that! That is what I'm talking about. Don't you think that it should be explicitly stated in the proof that it does not work for equal eigenvalues, and why exactly that is? That's the whole point :) — Preceding unsigned comment added by 80.180.2.4 (talk) 15:39, 13 December 2014 (UTC)


 * I reiterated the condition, somewhat repetitively, to demarcate the correct statement provided. No, personally, the leading discussion of non-diagonal square roots for the identity suffices. If you wish to provide a pithy, compact footnote, or parenthetical statement, preferably with references, justifying the evident to your satisfaction, you may propose it here, on this page. Why don't you get a WP account? Cuzkatzimhut (talk) 17:09, 13 December 2014 (UTC)

Mine was just a suggestion. I made it because when I read the page the other day I was confused about this and had to sort it out. I decided to post a comment to signal the problem. I am not a regular contributor to Wikipedia; I just did it because I thought it was possible to improve the page. — Preceding unsigned comment added by 80.180.2.4 (talk) 16:50, 14 December 2014 (UTC)

I just read the edit you made. Do not take this the wrong way (maybe I have not been very clear), but I think it is still wrong. "This is because such a matrix, $A$, has a decomposition $VDV^{-1}$ where $V$ is the matrix whose columns are eigenvectors of $A$ and $D$ is the diagonal matrix whose diagonal elements are the corresponding $n$ distinct eigenvalues $λ_{i}$. Thus ..."

This is what bugs me. It is not because such a matrix has a decomposition $VDV^{-1}$ (otherwise one could restate the condition as "it is diagonalizable" instead of "it has distinct eigenvalues"). The property of having distinct eigenvalues is something more, a condition stronger than diagonalizability, which is needed for the proof! But this is not specified at the beginning (someone who is not very confident with algebra may think that distinct eigenvalues are necessary in order to write the matrix as $VDV^{-1}$), and it is not used in the proof (thus confirming to inattentive eyes that the distinct-eigenvalues condition is only there to ensure that the matrix can be written as $VDV^{-1}$). I hope it is clearer what I mean now! :) — Preceding unsigned comment added by 80.180.2.4 (talk) 17:00, 14 December 2014 (UTC)


 * I am not sure what you want... I disagree that it is wrong. I can't bring myself to imagine diagonalizability of D being confused with distinct eigenvalues. Maybe you can propose something that is less confusing to you. Do you want to skip "This is because" and just provide the correct statement? I cannot second-guess how one can get confused. I will be more repetitive, but it is going to get weird, I fear... Maybe you could propose what you want to see... Cuzkatzimhut (talk) 17:09, 14 December 2014 (UTC) I have now moved the "distinct eigenvalues" phrase out of the diagonalizing point, to the central (pun intended) fact that the square root of the diagonal matrix D with distinct eigenvalues must be diagonal. (If it were not, the diagonalizing transformation of the square root would have to commute with the diagonal D, which would require non-distinct eigenvalues. I would argue against blasting footnotes and matrix-book references on that, but show me if you are proposing something extra.) In any case, I believe the statement was and is correct. But I am one of many editors... propose how to improve it to your satisfaction, yourself. Cuzkatzimhut (talk) 17:27, 14 December 2014 (UTC)
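The point under discussion, that a matrix with $n$ distinct nonzero eigenvalues has exactly $2^n$ square roots obtained from the sign choices on the diagonal, can be illustrated numerically; this NumPy sketch uses a hypothetical 2×2 example with eigenvalues 1 and 6:

```python
import numpy as np
from itertools import product

# A 2x2 matrix with two distinct eigenvalues (1 and 6).
A = np.array([[5.0, 4.0],
              [1.0, 2.0]])

w, V = np.linalg.eig(A)
Vinv = np.linalg.inv(V)

# One square root per choice of sign for each sqrt(lambda_i): 2^2 = 4 roots.
roots = []
for signs in product([1.0, -1.0], repeat=len(w)):
    B = V @ np.diag(np.array(signs) * np.sqrt(w)) @ Vinv
    roots.append(B)

print(len(roots))                                  # 4 square roots
print(all(np.allclose(B @ B, A) for B in roots))   # each satisfies B.B = A
```

For repeated eigenvalues (e.g. the identity matrix), D½ need not be diagonal and this enumeration misses the extra continuum of roots, which is exactly the condition the thread above settles on.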