Matrix sign function

In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.

It was introduced by J. D. Roberts in 1971, in a technical report of Cambridge University, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.

Definition
The matrix sign function is a generalization of the complex signum function

$$ \operatorname{csgn}(z)= \begin{cases} 1 & \text{if } \mathrm{Re}(z) > 0, \\ -1 & \text{if } \mathrm{Re}(z) < 0, \end{cases} $$

to the matrix-valued analogue $$\operatorname{csgn}(A)$$. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalues on the imaginary axis; see, for example, the Jordan-form-based definition of a matrix function (where the derivatives of the sign function are all zero).

Properties
Theorem: Let $$A\in\C^{n\times n}$$ have no eigenvalues on the imaginary axis. Then $$\operatorname{csgn}(A)^2 = I$$.

Theorem: Let $$A\in\C^{n\times n}$$ have no eigenvalues on the imaginary axis. Then $$\operatorname{csgn}(A)$$ is diagonalizable with eigenvalues $$\pm 1$$.

Theorem: Let $$A\in\C^{n\times n}$$ have no eigenvalues on the imaginary axis. Then $$(I+\operatorname{csgn}(A))/2$$ is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously $$(I-\operatorname{csgn}(A))/2$$ projects onto the invariant subspace associated with the eigenvalues in the left half-plane.

Theorem: Let $$A\in\C^{n\times n}$$, and let $$A = P\begin{bmatrix}J_+ & 0 \\ 0 & J_-\end{bmatrix}P^{-1}$$ be a Jordan decomposition such that $$J_+$$ corresponds to eigenvalues with positive real part and $$J_-$$ to eigenvalues with negative real part. Then $$\operatorname{csgn}(A) = P\begin{bmatrix}I_+ & 0 \\ 0 & -I_-\end{bmatrix}P^{-1}$$, where $$I_+$$ and $$I_-$$ are identity matrices of sizes corresponding to $$J_+$$ and $$J_-$$, respectively.
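For a diagonalizable matrix the Jordan decomposition reduces to an eigendecomposition, which gives a direct, if numerically naive, way to evaluate $$\operatorname{csgn}(A)$$ and to verify the properties above. The following is a minimal NumPy sketch under that assumption; the function name matrix_sign and the test matrix are illustrative choices.

```python
import numpy as np

def matrix_sign(A):
    # Evaluate csgn(A) from an eigendecomposition; valid when A is
    # diagonalizable and has no eigenvalues on the imaginary axis.
    w, V = np.linalg.eig(A)
    return (V * np.sign(w.real)) @ np.linalg.inv(V)

A = np.array([[3.0, 1.0, 0.0],
              [0.0, -2.0, 1.0],
              [1.0, 0.0, -1.0]])
S = matrix_sign(A)
print(np.allclose(S @ S, np.eye(3)))      # csgn(A)^2 = I
P_right = (np.eye(3) + S) / 2             # projector for the right half-plane
print(np.allclose(P_right @ P_right, P_right))
```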

Computational methods
The function can be computed with generic methods for matrix functions, but there are also specialized methods.

Newton iteration
The Newton iteration can be derived by observing that $$\operatorname{csgn}(x) = \sqrt{x^2}/x$$, which in terms of matrices can be written as $$\operatorname{csgn}(A) = A^{-1}\sqrt{A^2}$$, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix $$A^2$$, that is, the iteration $X_{k+1} = \frac{1}{2} \left(X_k + A^2 X_k^{-1}\right)$, and define the new iterate $$Z_k = A^{-1}X_k$$, then, since every iterate commutes with $$A$$, we arrive at the iteration

$$ Z_{k+1} = \frac{1}{2}\left(Z_k + Z_k^{-1}\right) $$,

where typically $$Z_0=A$$. Convergence is global, and locally it is quadratic.

The Newton iteration uses the explicit inverse of the iterates $$Z_k$$.
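The following is a minimal NumPy sketch of the iteration above; the function name sign_newton, the stopping rule, and the iteration cap are illustrative choices rather than part of the method.

```python
import numpy as np

def sign_newton(A, tol=1e-12, max_iter=100):
    # Z_{k+1} = (Z_k + Z_k^{-1}) / 2 with Z_0 = A; stops when the
    # relative change between iterates drops below tol.
    Z = np.asarray(A, dtype=complex)
    for _ in range(max_iter):
        Z_new = 0.5 * (Z + np.linalg.inv(Z))
        if np.linalg.norm(Z_new - Z, 1) <= tol * np.linalg.norm(Z_new, 1):
            return Z_new
        Z = Z_new
    return Z
```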

Newton–Schulz iteration
To avoid the need for an explicit inverse in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, $$Z_k^{-1}\approx Z_k\left(2I-Z_k^2\right)$$, derived by Schulz in 1933. Substituting this approximation into the previous method, the new method becomes

$$ Z_{k+1} = \frac{1}{2}Z_k\left(3I - Z_k^2\right) $$.

Convergence is (still) quadratic, but only local (guaranteed for $$\|I-A^2\|<1$$).
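A corresponding inverse-free sketch, with the same caveats about illustrative names and stopping rule:

```python
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    # Inverse-free iteration Z_{k+1} = Z_k (3I - Z_k^2) / 2 with Z_0 = A;
    # convergence is only local (guaranteed when ||I - A^2|| < 1).
    Z = np.asarray(A, dtype=complex)
    I = np.eye(Z.shape[0], dtype=complex)
    for _ in range(max_iter):
        Z_new = 0.5 * Z @ (3.0 * I - Z @ Z)
        if np.linalg.norm(Z_new - Z, 1) <= tol * np.linalg.norm(Z_new, 1):
            return Z_new
        Z = Z_new
    return Z
```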

Solutions of Sylvester equations
Theorem: Let $$A,B,C\in\R^{n\times n}$$ and assume that $$A$$ and $$B$$ are stable, that is, all of their eigenvalues have negative real part. Then the unique solution of the Sylvester equation, $$AX + XB = C$$, is given by the $$X$$ satisfying

$$ \begin{bmatrix} -I &2X\\ 0 & I \end{bmatrix} = \operatorname{csgn} \left( \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} \right). $$

Proof sketch: The result follows from the similarity transform

$$ \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} A & 0\\ 0 & -B \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1}, $$

since

$$ \operatorname{csgn} \left( \begin{bmatrix} A &-C\\ 0 & -B \end{bmatrix} \right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} -I & 0\\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}, $$

due to the stability of $$A$$ and $$B$$.

The theorem is, naturally, also applicable to the Lyapunov equation, $$AX + XA^T = C$$. However, due to the block-triangular structure, the Newton iteration simplifies to involve only inverses of $$A$$ and $$A^T$$.
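The following is a minimal sketch of the resulting solver; it uses SciPy's signm for the matrix sign function, and the function name sylvester_via_sign as well as the random test data are illustrative.

```python
import numpy as np
from scipy.linalg import signm  # SciPy's matrix sign function

def sylvester_via_sign(A, B, C):
    # Assemble the block matrix of the theorem and read 2X off
    # the top-right block of its sign function.
    n = A.shape[0]
    M = np.block([[A, -C],
                  [np.zeros((n, n)), -B]])
    return signm(M)[:n, n:] / 2

# Check on random data with stable (Hurwitz) A and B.
rng = np.random.default_rng(0)
R1, R2 = rng.standard_normal((2, 3, 3))
A = -(np.eye(3) + R1 @ R1.T)  # symmetric negative definite, hence stable
B = -(np.eye(3) + R2 @ R2.T)
C = rng.standard_normal((3, 3))
X = sylvester_via_sign(A, B, C)
print(np.allclose(A @ X + X @ B, C))  # expected: True
```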

Solutions of algebraic Riccati equations
There is a similar result applicable to the algebraic Riccati equation, $$A^H P + P A - P F P + Q = 0 $$. Define $$V,W\in\Complex^{2n\times n} $$ as

$$ \begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn} \left( \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} \right) - \begin{bmatrix} I &0\\ 0 & I \end{bmatrix}. $$

Under the assumption that $$F,Q\in\Complex^{n\times n}$$ are Hermitian and that there exists a unique stabilizing solution, in the sense that $$A-FP$$ is stable, that solution is given by the over-determined, but consistent, linear system

$$ VP=-W. $$

Proof sketch: The similarity transform

$$ \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} -(A-FP) & -F\\ 0 & (A-FP)^H \end{bmatrix} \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1}, $$

and the stability of $$A-FP $$ implies that

$$ \left( \operatorname{csgn} \left( \begin{bmatrix} A^H &Q\\ F & -A \end{bmatrix} \right) - \begin{bmatrix} I &0\\ 0 & I \end{bmatrix} \right) \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I\\ I & 0 \end{bmatrix} \begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix}, $$

for some matrix $$Y\in\Complex^{n\times n} $$.
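A minimal sketch of the resulting Riccati solver, again using SciPy's signm; the function name care_via_sign is an illustrative choice.

```python
import numpy as np
from scipy.linalg import signm

def care_via_sign(A, F, Q):
    # Stabilizing solution of A^H P + P A - P F P + Q = 0 for
    # Hermitian F and Q, read off csgn of the block matrix above.
    n = A.shape[0]
    H = np.block([[A.conj().T, Q],
                  [F, -A]])
    VW = signm(H) - np.eye(2 * n)
    V, W = VW[:, :n], VW[:, n:]
    # V P = -W is over-determined but consistent; least squares recovers P.
    return np.linalg.lstsq(V, -W, rcond=None)[0]
```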

Computation of the matrix square root
The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noting that $$A - PIP = 0$$ is a degenerate algebraic Riccati equation whose solution $$P$$ is, by definition, a square root of $$A$$.
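The coupled form of the Denman–Beavers iteration is $$Y_{k+1} = \tfrac{1}{2}\left(Y_k + Z_k^{-1}\right)$$, $$Z_{k+1} = \tfrac{1}{2}\left(Z_k + Y_k^{-1}\right)$$ with $$Y_0 = A$$ and $$Z_0 = I$$, where $$Y_k \to A^{1/2}$$ and $$Z_k \to A^{-1/2}$$. A minimal NumPy sketch, assuming $$A$$ has no eigenvalues on the closed negative real axis (names and stopping rule are illustrative):

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    # Coupled iteration: Y_k -> sqrt(A) and Z_k -> inv(sqrt(A)),
    # for A with no eigenvalues on the closed negative real axis.
    Y = np.asarray(A, dtype=complex)
    Z = np.eye(Y.shape[0], dtype=complex)
    for _ in range(max_iter):
        Y_new = 0.5 * (Y + np.linalg.inv(Z))
        Z_new = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_new - Y, 1) <= tol * np.linalg.norm(Y_new, 1):
            return Y_new
        Y, Z = Y_new, Z_new
    return Y
```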