Schur–Horn theorem

In mathematics, particularly linear algebra, the Schur–Horn theorem, named after Issai Schur and Alfred Horn, characterizes the diagonal of a Hermitian matrix with given eigenvalues. It has inspired investigations and substantial generalizations in the setting of symplectic geometry. A few important generalizations are Kostant's convexity theorem, Atiyah–Guillemin–Sternberg convexity theorem, Kirwan convexity theorem.

Statement
The inequalities above may alternatively be written: $$\begin{alignat}{7} d_1 &\;\leq\;&& \lambda_1 \\[0.3ex] d_2 + d_1 &\;\leq&& \lambda_1 + \lambda_2 \\[0.3ex] \vdots &\;\leq&& \vdots \\[0.3ex] d_{N-1} + \cdots + d_2 + d_1 &\;\leq&& \lambda_1 + \lambda_2 + \cdots + \lambda_{N-1} \\[0.3ex] d_N + d_{N-1} + \cdots + d_2 + d_1 &\;=&& \lambda_1 + \lambda_2 + \cdots + \lambda_{N-1} + \lambda_N. \\[0.3ex] \end{alignat}$$

The Schur–Horn theorem may thus be restated more succinctly and in plain English:


 * Schur–Horn theorem: Given any non-increasing real sequences of desired diagonal elements $$d_1 \geq \cdots \geq d_N$$ and desired eigenvalues $$\lambda_1 \geq \cdots \geq \lambda_N,$$ there exists a Hermitian matrix with these eigenvalues and diagonal elements if and only if these two sequences have the same sum and for every possible integer $$n,$$ the sum of the first $n$ desired diagonal elements never exceeds the sum of the first $$n$$ desired eigenvalues.

Reformation allowing unordered diagonals and eigenvalues
Although this theorem requires that $$d_1 \geq \cdots \geq d_N$$ and $$\lambda_1 \geq \cdots \geq \lambda_N$$ be non-increasing, it is possible to reformulate this theorem without these assumptions.

We start with the assumption $$\lambda_1 \geq \cdots \geq \lambda_N.$$ The left hand side of the theorem's characterization (that is, "there exists a Hermitian matrix with these eigenvalues and diagonal elements") depends on the order of the desired diagonal elements $$d_1, \dots, d_N$$ (because changing their order would change the Hermitian matrix whose existence is in question) but it does depend on the order of the desired eigenvalues $$\lambda_1, \dots, \lambda_N.$$

On the right hand right hand side of the characterization, only the values of $$\lambda_1 + \cdots + \lambda_n$$ depend on the assumption $$\lambda_1 \geq \cdots \geq \lambda_N.$$ Notice that this assumption means that the expression $$\lambda_1 + \cdots + \lambda_n$$ is just notation for the sum of the $$n$$ largest desired eigenvalues. Replacing the expression $$\lambda_1 + \cdots + \lambda_n$$ with this written equivalent makes the assumption $$\lambda_1 \geq \cdots \geq \lambda_N$$ completely unnecessary:


 * Schur–Horn theorem: Given any $$N$$ desired real eigenvalues and a non-increasing real sequence of desired diagonal elements $$d_1 \geq \cdots \geq d_N,$$ there exists a Hermitian matrix with these eigenvalues and diagonal elements if and only if these two sequences have the same sum and for every possible integer $$n,$$ the sum of the first $$n$$ desired diagonal elements never exceeds the sum of the $$n$$ desired eigenvalues.

Permutation polytope generated by a vector
The permutation polytope generated by $$\tilde{x} = (x_1, x_2,\ldots, x_n) \in \Reals^n$$ denoted by $$\mathcal{K}_{\tilde{x}}$$ is defined as the convex hull of the set $$\{(x_{\pi(1)},x_{\pi(2)},\ldots,x_{\pi(n)}) \in \Reals^n : \pi \in S_n\}.$$ Here $$S_n$$ denotes the symmetric group on $$\{1, 2, \ldots, n\}.$$ In other words, the permutation polytope generated by $$(x_1, \dots, x_n)$$ is the convex hull of the set of all points in $$\Reals^n$$ that can be obtained by rearranging the coordinates of $$(x_1, \dots, x_n).$$ The permutation polytope of $$(1, 1, 2),$$ for instance, is the convex hull of the set $$\{(1, 1, 2), (1, 2, 1), (2, 1, 1)\},$$ which in this case is the solid (filled) triangle whose vertices are the three points in this set. Notice, in particular, that rearranging the coordinates of $$(x_1, \dots, x_n)$$ does not change the resulting permutation polytope; in other words, if a point $$\tilde{y}$$ can be obtained from $$\tilde{x} = (x_1, \dots, x_n)$$ by rearranging its coordinates, then $$\mathcal{K}_{\tilde{y}} = \mathcal{K}_{\tilde{x}}.$$

The following lemma characterizes the permutation polytope of a vector in $$\Reals^n.$$

Reformulation of Schur–Horn theorem
In view of the equivalence of (i) and (ii) in the lemma mentioned above, one may reformulate the theorem in the following manner.

Theorem. Let $$d_1, \dots, d_N$$ and $$\lambda_1, \dots, \lambda_N$$ be real numbers. There is a Hermitian matrix with diagonal entries $$d_1, \dots, d_N$$ and eigenvalues $$\lambda_1, \dots, \lambda_N$$ if and only if the vector $$(d_1, \ldots, d_n)$$ is in the permutation polytope generated by $$(\lambda_1, \ldots, \lambda_n).$$

Note that in this formulation, one does not need to impose any ordering on the entries of the vectors $$d_1, \dots, d_N$$ and $$\lambda_1, \dots, \lambda_N.$$

Proof of the Schur–Horn theorem
Let $$A = (a_{jk})$$ be a $$n \times n$$ Hermitian matrix with eigenvalues $$\{\lambda_i\}_{i=1}^n,$$ counted with multiplicity. Denote the diagonal of $$A$$ by $$\tilde{a},$$ thought of as a vector in $$\Reals^n,$$ and the vector $$(\lambda_1, \lambda_2, \ldots, \lambda_n)$$ by $$\tilde{\lambda}.$$ Let $$\Lambda$$ be the diagonal matrix having $$\lambda_1, \lambda_2, \ldots, \lambda_n$$ on its diagonal.

($$\Rightarrow$$) $$A$$ may be written in the form $$U \Lambda U^{-1},$$ where $$U$$ is a unitary matrix. Then $$a_{ii} = \sum_{j=1}^n \lambda_j |u_{ij}|^2, \; i = 1, 2, \ldots, n.$$

Let $$S = (s_{ij})$$ be the matrix defined by $$s_{ij} = |u_{ij}|^2.$$ Since $$U$$ is a unitary matrix, $$S$$ is a doubly stochastic matrix and we have $$\tilde{a} = S\tilde{\lambda}.$$ By the Birkhoff–von Neumann theorem, $$S$$ can be written as a convex combination of permutation matrices. Thus $$\tilde{a}$$ is in the permutation polytope generated by $$\tilde{\lambda}.$$ This proves Schur's theorem.

($$\Leftarrow$$) If $$\tilde{a}$$ occurs as the diagonal of a Hermitian matrix with eigenvalues $$\{\lambda_i\}_{i=1}^n,$$ then $$t\tilde{a} + (1-t)\tau(\tilde{a})$$ also occurs as the diagonal of some Hermitian matrix with the same set of eigenvalues, for any transposition $$\tau$$ in $$S_n.$$ One may prove that in the following manner.

Let $$\xi$$ be a complex number of modulus $$1$$ such that $$\overline{\xi a_{jk}} = - \xi a_{jk}$$ and $$U$$ be a unitary matrix with $$\xi \sqrt{t}, \sqrt{t}$$ in the $$j, j$$ and $$k, k$$ entries, respectively, $$-\sqrt{1-t}, \xi \sqrt{1-t}$$ at the $$j, k$$ and $$k, j$$ entries, respectively, $$1$$ at all diagonal entries other than $$j, j$$ and $$k, k,$$ and $$0$$ at all other entries. Then $$UAU^{-1}$$ has $$ta_{jj} + (1-t)a_{kk}$$ at the $$j, j$$ entry, $$(1-t)a_{jj} + ta_{kk}$$ at the $$k,k$$ entry, and $$a_{ll}$$ at the $$l,l$$ entry where $$l \neq j, k.$$ Let $$\tau$$ be the transposition of $$\{1, 2, \dots, n\}$$ that interchanges $$j$$ and $$k.$$

Then the diagonal of $$UAU^{-1}$$ is $$t\tilde{a} + (1-t)\tau(\tilde{a}).$$

$$\Lambda$$ is a Hermitian matrix with eigenvalues $$\{ \lambda_i \}_{i=1}^n.$$ Using the equivalence of (i) and (iii) in the lemma mentioned above, we see that any vector in the permutation polytope generated by $$(\lambda_1, \lambda_2, \ldots, \lambda_n),$$ occurs as the diagonal of a Hermitian matrix with the prescribed eigenvalues. This proves Horn's theorem.

Symplectic geometry perspective
The Schur–Horn theorem may be viewed as a corollary of the Atiyah–Guillemin–Sternberg convexity theorem in the following manner. Let $$\mathcal{U}(n)$$ denote the group of $$n \times n$$ unitary matrices. Its Lie algebra, denoted by $$\mathfrak{u}(n),$$ is the set of skew-Hermitian matrices. One may identify the dual space $$\mathfrak{u}(n)^*$$ with the set of Hermitian matrices $$\mathcal{H}(n)$$ via the linear isomorphism $$\Psi : \mathcal{H}(n) \rightarrow \mathfrak{u}(n)^*$$ defined by $$\Psi(A)(B) = \mathrm{tr}(iAB)$$ for $$A \in \mathcal{H}(n), B \in \mathfrak{u}(n).$$ The unitary group $$\mathcal{U}(n)$$ acts on $$\mathcal{H}(n)$$ by conjugation and acts on $$\mathfrak{u}(n)^*$$ by the coadjoint action. Under these actions, $$\Psi$$ is an $$\mathcal{U}(n)$$-equivariant map i.e. for every $$U \in \mathcal{U}(n)$$ the following diagram commutes,



Let $$\tilde{\lambda} = (\lambda_1, \lambda_2, \ldots, \lambda_n) \in \Reals^n$$ and $$\Lambda \in \mathcal{H}(n)$$ denote the diagonal matrix with entries given by $$\tilde{\lambda}.$$ Let $$\mathcal{O}_{\tilde{\lambda}}$$ denote the orbit of $$\Lambda$$ under the $$\mathcal{U}(n)$$-action i.e. conjugation. Under the $$\mathcal{U}(n)$$-equivariant isomorphism $$\Psi,$$ the symplectic structure on the corresponding coadjoint orbit may be brought onto $$\mathcal{O}_{\tilde{\lambda}}.$$ Thus $$\mathcal{O}_{\tilde{\lambda}}$$ is a Hamiltonian $$\mathcal{U}(n)$$-manifold.

Let $$\mathbb{T}$$ denote the Cartan subgroup of $$\mathcal{U}(n)$$ which consists of diagonal complex matrices with diagonal entries of modulus $$1.$$ The Lie algebra $$\mathfrak{t}$$ of $$\mathbb{T}$$ consists of diagonal skew-Hermitian matrices and the dual space $$\mathfrak{t}^*$$ consists of diagonal Hermitian matrices, under the isomorphism $$\Psi.$$ In other words, $$\mathfrak{t}$$ consists of diagonal matrices with purely imaginary entries and $$\mathfrak{t}^*$$ consists of diagonal matrices with real entries. The inclusion map $$\mathfrak{t} \hookrightarrow \mathfrak{u}(n)$$ induces a map $$\Phi : \mathcal{H}(n) \cong \mathfrak{u}(n)^* \rightarrow \mathfrak{t}^*,$$ which projects a matrix $$A$$ to the diagonal matrix with the same diagonal entries as $$A.$$ The set $$\mathcal{O}_{\tilde{\lambda}}$$ is a Hamiltonian $$\mathbb{T}$$-manifold, and the restriction of $$\Phi$$ to this set is a moment map for this action.

By the Atiyah–Guillemin–Sternberg theorem, $$\Phi(\mathcal{O}_{\tilde{\lambda}})$$ is a convex polytope. A matrix $$A \in \mathcal{H}(n)$$ is fixed under conjugation by every element of $$\mathbb{T}$$ if and only if $$A$$ is diagonal. The only diagonal matrices in $$\mathcal{O}_{\tilde{\lambda}}$$ are the ones with diagonal entries $$\lambda_1, \lambda_2, \ldots, \lambda_n$$ in some order. Thus, these matrices generate the convex polytope $$\Phi(\mathcal{O}_{\tilde{\lambda}}).$$ This is exactly the statement of the Schur–Horn theorem.