Holevo's theorem

Holevo's theorem is an important limitative theorem in quantum computing, an interdisciplinary field of physics and computer science. It is sometimes called Holevo's bound, since it establishes an upper bound to the amount of information that can be known about a quantum state (accessible information). It was published by Alexander Holevo in 1973.

Statement of the theorem
Suppose Alice wants to send a classical message to Bob by encoding it into a quantum state, and suppose she can prepare a state from some fixed set $$\{\rho_1,...,\rho_n\}$$, with the i-th state prepared with probability $$p_i$$. Let $$X$$ be the classical register containing the choice of state made by Alice. Bob's objective is to recover the value of $$X$$ from measurement results on the state he received. Let $$Y$$ be the classical register containing Bob's measurement outcome. Note that $$Y$$ is therefore a random variable whose probability distribution depends on Bob's choice of measurement.

Holevo's theorem bounds the amount of correlation between the classical registers $$X$$ and $$Y$$, regardless of Bob's measurement choice, in terms of the Holevo information. This is useful in practice because the Holevo information does not depend on the measurement choice, and therefore its computation does not require performing an optimization over the possible measurements.

More precisely, define the accessible information between $$X$$ and $$Y$$ as the (classical) mutual information between the two registers maximized over all possible choices of measurements on Bob's side:$$I_{\rm acc}(X:Y) = \sup_{\{\Pi^B_i\}_i } I(X:Y|\{\Pi^B_i\}_i),$$where $$I(X:Y|\{\Pi^B_i\}_i)$$ is the (classical) mutual information of the joint probability distribution given by $$p_{ij} = p_i \operatorname{Tr}(\Pi^B_j \rho_i)$$. There is currently no known formula to analytically solve the optimization in the definition of accessible information in the general case. Nonetheless, we always have the upper bound:$$I_{\rm acc} (X : Y) \leq \chi(\eta) \equiv S\left(\sum_i p_i \rho_i\right) - \sum_i p_i S(\rho_i),$$where $$\eta\equiv\{(p_i,\rho_i)\}_i$$ is the ensemble of states Alice is using to send information, and $$S$$ is the von Neumann entropy. This $$\chi(\eta)$$ is called the Holevo information or Holevo χ quantity.

Note that the Holevo information also equals the quantum mutual information of the classical-quantum state corresponding to the ensemble:$$\chi(\eta) = I\left(\sum_i p_i |i\rangle\!\langle i|\otimes \rho_i\right),$$with $$I(\rho_{AB}) \equiv S(\rho_A)+S(\rho_B) - S(\rho_{AB})$$ the quantum mutual information of the bipartite state $$\rho_{AB}$$. It follows that Holevo's theorem can be concisely summarized as a bound on the accessible information in terms of the quantum mutual information for classical-quantum states.

Proof
Consider the composite system that describes the entire communication process, which involves Alice's classical input $$X$$, the quantum system $$Q$$, and Bob's classical output $$Y$$. The classical input $$X$$ can be written as a classical register $$\rho^X := \sum\nolimits_{x=1}^n p_x |x\rangle \langle x|$$ with respect to some orthonormal basis $$\{|x\rangle\}_{x=1}^n$$. By writing $$X$$ in this manner, the von Neumann entropy $$S(X)$$ of the state $$\rho^X$$ corresponds to the Shannon entropy $$H(X)$$ of the probability distribution $$\{p_x\}_{x=1}^n$$:

S(X) = -\operatorname{tr}\left(\rho^X \log \rho^X \right) = -\operatorname{tr}\left(\sum_{x=1}^n p_x \log p_x  |x\rangle\langle x|\right) = -\sum_{x=1}^n p_x \log p_x = H(X). $$

The initial state of the system, where Alice prepares the state $$\rho_x$$ with probability $$p_x$$, is described by
 * $$\rho^{XQ} := \sum_{x=1}^n p_x |x\rangle \langle x|\otimes\rho_x.$$

Afterwards, Alice sends the quantum state to Bob. As Bob only has access to the quantum system $$Q$$ but not the input $$X$$, he receives a mixed state of the form $$\rho := \operatorname{tr}_X\left(\rho^{XQ}\right) = \sum\nolimits_{x=1}^n p_x \rho_x$$. Bob measures this state with respect to the POVM elements $$\{E_y\}_{y=1}^m$$, and the probabilities $$\{q_y\}_{y=1}^m$$ of measuring the outcomes $$y=1,2,\dots,m$$ form the classical output $$Y$$. This measurement process can be described as a quantum instrument
 * $$\mathcal{E}^{Q}(\rho_x) = \sum_{y=1}^m q_{y|x} \rho_{y|x} \otimes |y\rangle \langle y|,$$

where $$q_{y|x} = \operatorname{tr}\left(E_y\rho_x\right)$$ is the probability of outcome $$y$$ given the state $$\rho_x$$, while $$\rho_{y|x} = W\sqrt{E_y}\rho_x\sqrt{E_y}W^\dagger/q_{y|x}$$ for some unitary $$W$$ is the normalised post-measurement state. Then, the state of the entire system after the measurement process is
 * $$\rho^{XQ'Y} := \left[\mathcal{I}^{X}\otimes\mathcal{E}^{Q}\right]\!\left(\rho^{XQ}\right) = \sum_{x=1}^n\sum_{y=1}^m p_x q_{y|x} |x\rangle \langle x|\otimes\rho_{y|x}\otimes |y\rangle \langle y|.$$

Here $$\mathcal{I}^X$$ is the identity channel on the system $$X$$. Since $$\mathcal{E}^Q$$ is a quantum channel, and the quantum mutual information is monotonic under completely positive trace-preserving maps, $$S(X:Q'Y) \leq S(X:Q)$$. Additionally, as the partial trace over $$Q'$$ is also completely positive and trace-preserving, $$S(X:Y) \leq S(X:Q'Y)$$. These two inequalities give
 * $$S(X:Y) \leq S(X:Q).$$

On the left-hand side, the quantities of interest depend only on
 * $$\rho^{XY} := \operatorname{tr}_{Q'}\left(\rho^{XQ'Y}\right) = \sum_{x=1}^n\sum_{y=1}^m p_x q_{y|x} |x\rangle \langle x|\otimes |y\rangle \langle y|

= \sum_{x=1}^n\sum_{y=1}^m p_{x,y} |x,y\rangle \langle x,y|,$$ with joint probabilities $$p_{x,y}=p_x q_{y|x}$$. Clearly, $$\rho^{XY}$$ and $$\rho^Y := \operatorname{tr}_X(\rho^{XY})$$, which are in the same form as $$\rho^X$$, describe classical registers. Hence,
 * $$S(X:Y) = S(X)+S(Y)-S(XY) = H(X)+H(Y)-H(XY) = I(X:Y).$$

Meanwhile, $$S(X:Q)$$ depends on the term
 * $$\log \rho^{XQ} = \log\left(\sum_{x=1}^n p_x |x\rangle \langle x|\otimes\rho_x\right)

= \sum_{x=1}^n |x\rangle \langle x| \otimes \log\left(p_x\rho_x\right) = \sum_{x=1}^n \log p_x |x\rangle \langle x| \otimes I^Q + \sum_{x=1}^n |x\rangle \langle x| \otimes \log\rho_x,$$ where $$I^Q$$ is the identity operator on the quantum system $$Q$$. Then, the right-hand side is
 * $$\begin{aligned}

S(X:Q) &= S(X)+S(Q)-S(XQ) \\ &= S(X) + S(\rho) + \operatorname{tr}\left(\rho^{XQ}\log\rho^{XQ}\right) \\ &= S(X) + S(\rho) + \operatorname{tr}\left(\sum_{x=1}^n p_x\log p_x |x\rangle \langle x| \otimes \rho_x\right) + \operatorname{tr}\left(\sum_{x=1}^n p_x|x\rangle \langle x| \otimes \rho_x\log\rho_x\right)\\ &= S(X) + S(\rho) + \underbrace{\operatorname{tr}\left(\sum_{x=1}^n p_x\log p_x |x\rangle \langle x|\right)}_{-S(X)} + \operatorname{tr}\left(\sum_{x=1}^n p_x \rho_x\log\rho_x\right)\\ &= S(\rho) + \sum_{x=1}^n p_x \underbrace{\operatorname{tr}\left(\rho_x\log\rho_x\right)}_{-S(\rho_x)} \\ &= S(\rho) - \sum_{x=1}^n p_x S(\rho_x), \end{aligned}$$ which completes the proof.

Comments and remarks
In essence, the Holevo bound proves that given n qubits, although they can "carry" a larger amount of (classical) information (thanks to quantum superposition), the amount of classical information that can be retrieved, i.e. accessed, can be only up to n classical (non-quantum encoded) bits. It was also established, both theoretically and experimentally, that there are computations where quantum bits carry more information through the process of the computation than is possible classically.