Von Neumann entropy

In physics, the von Neumann entropy, named after John von Neumann, is an extension of the concept of Gibbs entropy from classical statistical mechanics to quantum statistical mechanics. For a quantum-mechanical system described by a density matrix $ρ$, the von Neumann entropy is


 * $$ S = - \operatorname{tr}(\rho \ln \rho),$$

where $$\operatorname{tr}$$ denotes the trace and ln denotes the (natural) matrix logarithm. If the density matrix $ρ$ is written in a basis of its eigenvectors $$|1\rangle, |2\rangle, |3\rangle, \dots$$ as
 * $$ \rho = \sum_j \eta_j \left| j \right\rangle \left\langle j \right| ,$$

then the von Neumann entropy is merely
 * $$ S = -\sum_j \eta_j \ln \eta_j .$$

In this form, $S$ can be seen as the information-theoretic Shannon entropy of the eigenvalues $η_{j}$, regarded as a probability distribution.

The von Neumann entropy is also used in different forms (conditional entropies, relative entropies, etc.) in the framework of quantum information theory to characterize the entropy of entanglement.

Background
John von Neumann established a rigorous mathematical framework for quantum mechanics in his 1932 work Mathematical Foundations of Quantum Mechanics. In it, he provided a theory of measurement, where the usual notion of wave-function collapse is described as an irreversible process (the so-called von Neumann or projective measurement).

The density matrix was introduced, with different motivations, by von Neumann and by Lev Landau. The motivation that inspired Landau was the impossibility of describing a subsystem of a composite quantum system by a state vector. On the other hand, von Neumann introduced the density matrix in order to develop both quantum statistical mechanics and a theory of quantum measurements.

The density matrix formalism, thus developed, extended the tools of classical statistical mechanics to the quantum domain. In the classical framework, the probability distribution and partition function of the system allow us to compute all possible thermodynamic quantities. Von Neumann introduced the density matrix to play the same role in the context of quantum states and operators in a complex Hilbert space. Knowledge of the statistical density matrix operator allows us to compute all average quantum quantities in a conceptually similar, but mathematically different, way.

Let us suppose we have a set of wave functions $|\Psi\rangle$ that depend parametrically on a set of quantum numbers $n_{1}, n_{2}, \dots, n_{N}$. The natural variable which we have is the amplitude with which a particular wave function of the basis set participates in the actual wave function of the system. Let us denote the square of this amplitude by $p(n_{1}, n_{2}, \dots, n_{N})$. The goal is to turn this quantity $p$ into the classical density function in phase space. We have to verify that $p$ goes over into the density function in the classical limit, and that it has ergodic properties. After checking that $p(n_{1}, n_{2}, \dots, n_{N})$ is a constant of motion, an ergodic assumption for the probabilities $p(n_{1}, n_{2}, \dots, n_{N})$ makes $p$ a function of the energy only.

After this procedure, one finally arrives at the density matrix formalism when seeking a form where $p(n_{1}, n_{2}, \dots, n_{N})$ is invariant with respect to the representation used. In the form in which it is written, it yields the correct expectation values only for quantities that are diagonal with respect to the quantum numbers $n_{1}, n_{2}, \dots, n_{N}$. Expectation values of operators that are not diagonal involve the phases of the quantum amplitudes. Suppose we encode the quantum numbers $n_{1}, n_{2}, \dots, n_{N}$ into the single index $i$ or $j$. Then our wave function has the form


 * $$ \left| \Psi \right\rangle = \sum_i a_i \left| \psi_i \right\rangle . $$

The expectation value of an operator $B$ that is not diagonal in these wave functions is then


 * $$ \left\langle B \right\rangle = \sum_{i,j} a_i^{*}a_j \left\langle i \right| B \left| j \right\rangle .$$

The role which was originally reserved for the quantities $$ \left| a_i \right| ^2$$ is thus taken over by the density matrix $ρ$ of the system S, with matrix elements


 * $$ \left\langle j \right| \rho \left| i \right\rangle = a_j a_i^{*} .$$

Therefore, $\langle B \rangle$ reads
 * $$ \left\langle B \right\rangle = \operatorname{tr} (\rho B) .$$

The invariance of the above expression follows from matrix theory. The trace is invariant under cyclic permutations, and both matrices $ρ$ and $B$ can be transformed into whatever basis is convenient, typically a basis of eigenvectors. Under a change of basis by a unitary $U$, a cyclic permutation of the matrix product brings $U^{†}U$ together into an identity matrix, so the trace is not affected by the change of basis. This gives a mathematical framework in which the expectation value of a quantum operator, represented by a matrix, is obtained by taking the trace of the product of the density operator $$\hat{\rho}$$ and the operator $$\hat{B}$$ (the Hilbert scalar product between operators). The matrix formalism arises here in the statistical mechanics framework, but it applies equally to finite quantum systems, where the state of the system usually cannot be described by a pure state but only by a statistical operator $$\hat{\rho}$$ of the above form. Mathematically, $$\hat{\rho}$$ is a positive-semidefinite Hermitian matrix with unit trace.
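
As a minimal numerical sketch of the trace formula and its basis invariance, the following NumPy snippet compares the double sum over amplitudes with $\operatorname{tr}(\rho B)$. The amplitudes `a`, the observable `B`, and the random unitary `U` are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Hypothetical amplitudes a_i of |Psi> = sum_i a_i |psi_i>, normalized to 1.
a = np.array([1.0, 1.0j, -1.0]) / np.sqrt(3.0)

# Density matrix with elements <j|rho|i> = a_j a_i^*.
rho = np.outer(a, a.conj())

# An arbitrary Hermitian observable that is not diagonal in this basis.
B = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 1.0j],
              [0.0, -1.0j, -1.0]])

# Direct double sum: <B> = sum_{i,j} a_i^* a_j <i|B|j>.
expect_sum = np.vdot(a, B @ a)

# Trace form: <B> = tr(rho B).
expect_trace = np.trace(rho @ B)

# Change of basis with a unitary U taken from a QR decomposition
# of a random complex matrix (illustrative choice).
rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(M)
rho_rot = U @ rho @ U.conj().T
B_rot = U @ B @ U.conj().T
expect_rotated = np.trace(rho_rot @ B_rot)
```

All three numbers agree up to floating-point error, and the imaginary part vanishes because `B` is Hermitian.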

Definition
Given the density matrix ρ, von Neumann defined the entropy as


 * $$S(\rho) = -\operatorname{tr} (\rho \ln \rho),$$

which is a proper extension of the Gibbs entropy (up to a factor $k_{B}$) and the Shannon entropy to the quantum case. To compute $S(ρ)$ it is convenient (see logarithm of a matrix) to compute the eigendecomposition $$\rho = \sum_j \eta_j \left| j \right\rangle \left\langle j \right| $$. The von Neumann entropy is then given by


 * $$S(\rho) = - \sum_j \eta_j \ln \eta_j .$$
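
The eigenvalue formula translates directly into a few lines of NumPy. The sketch below assumes the input is a Hermitian, unit-trace matrix and uses the convention $0 \ln 0 = 0$; the function name `von_neumann_entropy` and the cutoff `tol` are illustrative choices.

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """Return S(rho) = -tr(rho ln rho) in nats.

    Assumes rho is Hermitian, positive semidefinite, and has unit trace.
    """
    # eigvalsh is meant for Hermitian matrices and returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(rho)
    # Drop numerically zero eigenvalues (convention: 0 ln 0 = 0).
    eigenvalues = eigenvalues[eigenvalues > tol]
    return float(-np.sum(eigenvalues * np.log(eigenvalues)))

# Pure state |0><0| of a qubit.
pure = np.array([[1.0, 0.0], [0.0, 0.0]])
# Mixed state with eigenvalues 3/4 and 1/4.
mixed = np.array([[0.75, 0.0], [0.0, 0.25]])

s_pure = von_neumann_entropy(pure)
s_mixed = von_neumann_entropy(mixed)
```

Here `s_pure` vanishes, while `s_mixed` equals the Shannon entropy of the distribution (3/4, 1/4).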

Properties
Some properties of the von Neumann entropy:


 * $S(ρ)$ is zero if and only if $ρ$ represents a pure state.
 * $S(ρ)$ is maximal and equal to $$\ln N$$ for a maximally mixed state, $N$  being the dimension of the Hilbert space.
 * $S(ρ)$ is invariant under changes in the basis of $ρ$, that is, $S(ρ) = S(UρU^{†})$, with $U$ a unitary transformation.
 * $S(ρ)$ is concave, that is, given a collection of positive numbers $λ_{i}$ which sum to unity ($$\Sigma_i \lambda_i = 1$$) and density operators $ρ_{i}$, we have
 * $$ S\bigg(\sum_{i=1}^k \lambda_i \rho_i \bigg) \geq \sum_{i=1}^k \lambda_i S(\rho_i). $$


 * $S(ρ)$ satisfies the bound
 * $$ S\bigg(\sum_{i=1}^k \lambda_i \rho_i \bigg) \leq \sum_{i=1}^k \lambda_i S(\rho_i) - \sum_{i=1}^k \lambda_i \ln \lambda_i, $$
 * where equality is achieved if the $ρ_{i}$ have orthogonal support and, as before, the $ρ_{i}$ are density operators and the $λ_{i}$ are positive numbers which sum to unity ($$\Sigma_i \lambda_i = 1$$).


 * $S(ρ)$ is additive for independent systems. Given two density matrices  $ρ_{A}, ρ_{B}$ describing independent systems A and B, we have
 * $$S(\rho_A \otimes \rho_B)=S(\rho_A)+S(\rho_B).$$


 * $S(ρ)$ is strongly subadditive for any three systems A, B, and C:
 * $$S(\rho_{ABC}) + S(\rho_{B}) \leq S(\rho_{AB}) + S(\rho_{BC}).$$
 * This automatically means that $S(ρ)$ is subadditive:
 * $$S(\rho_{AC}) \leq S(\rho_{A}) +S(\rho_{C}).$$
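
Several of the properties listed above can be spot-checked numerically. The following sketch draws random density matrices and evaluates the maximal value, unitary invariance, concavity with its upper bound, and additivity. The helper names `entropy` and `random_density_matrix`, the dimension `N = 4`, and the weights `lam` are illustrative assumptions; a numerical check of this kind is of course not a proof.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]  # convention: 0 ln 0 = 0
    return float(-np.sum(ev * np.log(ev)))

def random_density_matrix(dim, rng):
    # G G^dagger is positive semidefinite; dividing by its trace gives unit trace.
    G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
N = 4
rho_1 = random_density_matrix(N, rng)
rho_2 = random_density_matrix(N, rng)

# Maximally mixed state: S should equal ln N.
s_max = entropy(np.eye(N) / N)

# Unitary invariance: S(U rho U^dagger) = S(rho).
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
s_rotated = entropy(U @ rho_1 @ U.conj().T)

# Concavity and the mixing upper bound, with weights (0.3, 0.7).
lam = np.array([0.3, 0.7])
s_mix = entropy(lam[0] * rho_1 + lam[1] * rho_2)
average = lam[0] * entropy(rho_1) + lam[1] * entropy(rho_2)
shannon = float(-np.sum(lam * np.log(lam)))

# Additivity for a product state.
s_product = entropy(np.kron(rho_1, rho_2))
```

With these quantities, `average <= s_mix <= average + shannon` expresses concavity together with the upper bound, and `s_product` matches `entropy(rho_1) + entropy(rho_2)`.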

Below, the concept of subadditivity is discussed, followed by its generalization to strong subadditivity.

Subadditivity
If $ρ_{A}, ρ_{B}$ are the reduced density matrices of the general state $ρ_{AB}$, then
 * $$ \left| S(\rho_A) - S(\rho_B) \right| \leq S(\rho_{AB}) \leq S(\rho_A) + S(\rho_B) . $$

The right-hand inequality is known as subadditivity. The two inequalities together are sometimes known as the triangle inequality. They were proved in 1970 by Huzihiro Araki and Elliott H. Lieb. While in Shannon's theory the entropy of a composite system can never be lower than the entropy of any of its parts, in quantum theory this is not the case, i.e., it is possible that $S(ρ_{AB}) = 0$, while $S(ρ_{A}) = S(ρ_{B}) > 0$.

Intuitively, this can be understood as follows: in quantum mechanics, the entropy of the joint system can be less than the sum of the entropies of its components because the components may be entangled. For instance, the Bell state of two spin-½ particles,
 * $$ \left| \psi \right\rangle = \frac{1}{\sqrt{2}} \left( \left| \uparrow \downarrow \right\rangle + \left| \downarrow \uparrow \right\rangle \right) ,$$

is a pure state with zero entropy, but each spin has maximum entropy when considered individually in its reduced density matrix. The entropy in one spin can be "cancelled" by being correlated with the entropy of the other. The left-hand inequality can be roughly interpreted as saying that entropy can only be cancelled by an equal amount of entropy.
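
The Bell-state example can be reproduced with a short partial-trace computation. This sketch takes the state normalized by $1/\sqrt{2}$ and the basis order |↑↑⟩, |↑↓⟩, |↓↑⟩, |↓↓⟩; the helper `entropy` and the `einsum` index strings are illustrative choices.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]  # convention: 0 ln 0 = 0
    return float(-np.sum(ev * np.log(ev)))

# Basis order: |up up>, |up down>, |down up>, |down down>.
psi = np.array([0.0, 1.0, 1.0, 0.0]) / np.sqrt(2.0)
rho_ab = np.outer(psi, psi.conj())

# View rho_ab as a tensor with indices (a, b, a', b') and trace out one spin.
rho_4 = rho_ab.reshape(2, 2, 2, 2)
rho_a = np.einsum('abcb->ac', rho_4)  # partial trace over spin B
rho_b = np.einsum('abac->bc', rho_4)  # partial trace over spin A

s_ab = entropy(rho_ab)
s_a = entropy(rho_a)
s_b = entropy(rho_b)
```

The joint entropy `s_ab` vanishes while `s_a` and `s_b` both equal ln 2, so the left-hand inequality is saturated and the right-hand one is strict.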

If system $A$ and system $B$ have different amounts of entropy, the smaller can only partially cancel the greater, and some entropy must be left over. Likewise, the right-hand inequality can be interpreted as saying that the entropy of a composite system is maximized when its components are uncorrelated, in which case the total entropy is just the sum of the sub-entropies. This may be more intuitive in the phase space formulation, rather than the Hilbert space one, where the von Neumann entropy amounts to minus the expected value of the ★-logarithm of the Wigner function, $-\int f \star \log_{\star} f \, dx \, dp$, up to an offset shift. Up to this normalization offset shift, the entropy is majorized by that of its classical limit.

Strong subadditivity
The von Neumann entropy is also strongly subadditive. Given three Hilbert spaces, A, B, C,


 * $$S(\rho_{ABC}) + S(\rho_{B}) \leq S(\rho_{AB}) + S(\rho_{BC}).$$

This is a more difficult theorem and was proved first by J. Kiefer in 1959 and independently by Elliott H. Lieb and Mary Beth Ruskai in 1973, using a matrix inequality that Lieb proved in 1973. By using the proof technique that establishes the left side of the triangle inequality above, one can show that the strong subadditivity inequality is equivalent to the following inequality:


 * $$S(\rho_{A}) + S(\rho_{C}) \leq S(\rho_{AB}) + S(\rho_{BC})$$

where $ρ_{AB}$, etc. are the reduced density matrices of a density matrix $ρ_{ABC}$. If we apply ordinary subadditivity to the left side of this inequality, and consider all permutations of A, B, C, we obtain the triangle inequality for $ρ_{ABC}$: each of the three numbers $S(ρ_{AB}), S(ρ_{BC}), S(ρ_{AC})$ is less than or equal to the sum of the other two.
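
Both forms of the inequality can be checked numerically on a random mixed state of three qubits. In the sketch below, the partial-trace helper `reduced`, the random seed, and the choice of three qubits are illustrative assumptions; the check illustrates the inequalities for one state and proves nothing in general.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]  # convention: 0 ln 0 = 0
    return float(-np.sum(ev * np.log(ev)))

def reduced(rho, keep, dims):
    """Reduced density matrix of the subsystems in `keep` (hypothetical helper).

    `keep` must list subsystem indices in ascending order.
    """
    rho = rho.reshape(tuple(dims) * 2)
    # Trace out from the highest index down so lower axis numbers stay valid.
    traced_out = sorted(set(range(len(dims))) - set(keep), reverse=True)
    for k in traced_out:
        # Row axis k pairs with column axis k + (current number of subsystems).
        rho = np.trace(rho, axis1=k, axis2=k + rho.ndim // 2)
    d = int(np.prod([dims[k] for k in keep]))
    return rho.reshape(d, d)

# Random mixed state of three qubits A, B, C.
rng = np.random.default_rng(2)
dims = (2, 2, 2)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho_abc = G @ G.conj().T
rho_abc = rho_abc / np.trace(rho_abc).real

subsystems = {"A": [0], "B": [1], "C": [2],
              "AB": [0, 1], "BC": [1, 2], "ABC": [0, 1, 2]}
S = {name: entropy(reduced(rho_abc, keep, dims))
     for name, keep in subsystems.items()}

# Strong subadditivity: S(ABC) + S(B) <= S(AB) + S(BC).
ssa_gap = S["AB"] + S["BC"] - S["ABC"] - S["B"]
# Equivalent form: S(A) + S(C) <= S(AB) + S(BC).
equivalent_gap = S["AB"] + S["BC"] - S["A"] - S["C"]
```

Both gaps are non-negative for any valid density matrix.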

Canonical ensemble
Theorem. The canonical distribution is the unique maximum of the Helmholtz free entropy $f[\hat\rho] = S[\hat\rho] - \beta \langle \hat H \rangle$, which has the solution$$\hat\rho(\beta) = \frac{1}{Z} \sum_i e^{-\beta E_i}| i \rangle \langle i | = \frac{e^{-\beta \hat H}}{Z}$$in the eigenbasis of the Hamiltonian operator $\hat H$. This state has free entropy$$f = \ln Z$$where $Z = \sum_i e^{-\beta E_i} = \operatorname{tr}(e^{-\beta \hat H})$ is the partition function.

Equivalently, the canonical distribution is the unique maximum of the entropy under the constraint of fixed mean energy: $$\begin{cases} \max S[\hat \rho] \\ \langle H\rangle = E \end{cases}$$
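
The variational statement can be illustrated numerically. The sketch below assumes the free entropy is defined as $f[\hat\rho] = S[\hat\rho] - \beta \langle \hat H \rangle$ (with $k_B = 1$) and that the canonical state is normalized by $Z$. The energy levels, the value of `beta`, and the helper names are hypothetical choices made only for this example.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]  # convention: 0 ln 0 = 0
    return float(-np.sum(ev * np.log(ev)))

# Hypothetical three-level Hamiltonian, written in its own eigenbasis.
energies = np.array([0.0, 1.0, 2.5])
beta = 0.7
H = np.diag(energies)

# Canonical (Gibbs) state: rho = exp(-beta H) / Z.
weights = np.exp(-beta * energies)
Z = weights.sum()
rho_gibbs = np.diag(weights / Z)

def free_entropy(rho):
    # Assumed definition: f[rho] = S[rho] - beta * <H>, with k_B = 1.
    return entropy(rho) - beta * np.trace(rho @ H).real

f_gibbs = free_entropy(rho_gibbs)

# Compare against random trial density matrices of the same dimension.
rng = np.random.default_rng(3)
f_trials = []
for _ in range(200):
    G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    rho = G @ G.conj().T
    rho = rho / np.trace(rho).real
    f_trials.append(free_entropy(rho))
```

Under these assumptions `f_gibbs` equals `np.log(Z)`, and none of the random trial states exceeds it.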

Coarse graining
Since, for a pure state, the density matrix is idempotent, $ρ = ρ^{2}$, the entropy $S(ρ)$ for it vanishes. Thus, if the system is finite (finite-dimensional matrix representation), the entropy $S(ρ)$ quantifies the departure of the system from a pure state. In other words, it codifies the degree of mixing of the state describing a given finite system.

Measurement decoheres a quantum system into something noninterfering and ostensibly classical; so, e.g., the vanishing entropy of a pure state $$\Psi = ( \left| 0 \right\rangle + \left| 1 \right\rangle ) / \sqrt{2}$$, corresponding to a density matrix
 * $$\rho = {1\over 2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},$$

increases to $S = \ln 2 \approx 0.69$ for the measurement outcome mixture
 * $$\rho = {1\over 2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

as the quantum interference information is erased.

However, if the measuring device is also quantum mechanical and also starts in a pure state, then the joint device-system is just a larger quantum system. Since it starts in a pure state and evolves unitarily, it ends in a pure state as well, and so the von Neumann entropy never increases. The problem can be resolved by using the idea of coarse graining.

Concretely, let the system be a qubit, and let the measuring device be another qubit. The measuring device starts in the $$\left| 0 \right\rangle $$ state. The measurement process is a CNOT gate, so that we have $$\left| 0 \right\rangle \left| 0 \right\rangle \mapsto \left| 0 \right\rangle\left| 0 \right\rangle $$, $$\left| 1 \right\rangle \left| 0 \right\rangle \mapsto \left| 1 \right\rangle\left| 1 \right\rangle $$. That is, if the system starts in the pure $$\left| 1 \right\rangle $$ state, then after measuring, the measurement device is also in the pure $$\left| 1 \right\rangle $$ state.

Now if the system starts in the $$\Psi = ( \left| 0 \right\rangle + \left| 1 \right\rangle ) / \sqrt{2}$$ state, then after measurement, the joint system is in the Bell state $$( \left| 0 \right\rangle \left| 0 \right\rangle + \left| 1 \right\rangle \left| 1 \right\rangle) / \sqrt{2}$$. The von Neumann entropy of the joint system is still 0, since it is still a pure state. However, if we coarse-grain the system by computing the von Neumann entropy of just the device, then of just the qubit, and adding them together, we get $$2\ln 2$$.

By subadditivity, $$S(\rho_{AB}) \leq S(\rho_{A}) + S(\rho_{B}) $$, that is, any way of coarse-graining the entire system into parts either leaves the von Neumann entropy unchanged or increases it.
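
The CNOT measurement model above can be followed step by step in NumPy. This sketch takes the system qubit as the first tensor factor (control) and the device as the second (target), in the basis order |00⟩, |01⟩, |10⟩, |11⟩; the helper `entropy` and the variable names are illustrative choices.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy in nats, from the eigenvalues of rho."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]  # convention: 0 ln 0 = 0
    return float(-np.sum(ev * np.log(ev)))

# System qubit in (|0> + |1>)/sqrt(2), device qubit in |0>.
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
zero = np.array([1.0, 0.0])
psi_in = np.kron(plus, zero)

# CNOT with the system as control and the device as target.
cnot = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])
psi_out = cnot @ psi_in  # Bell state (|00> + |11>)/sqrt(2)

rho_joint = np.outer(psi_out, psi_out.conj())

# Coarse-grain: reduce to each qubit separately via partial traces.
rho_4 = rho_joint.reshape(2, 2, 2, 2)
rho_system = np.einsum('abcb->ac', rho_4)  # trace over the device
rho_device = np.einsum('abac->bc', rho_4)  # trace over the system

s_joint = entropy(rho_joint)
s_coarse = entropy(rho_system) + entropy(rho_device)
```

The joint entropy `s_joint` stays at zero, while the coarse-grained sum `s_coarse` equals 2 ln 2; the reduced state of the system is the measurement outcome mixture diag(1/2, 1/2).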