Classical capacity

In quantum information theory, the classical capacity of a quantum channel is the maximum rate at which classical data can be sent over it error-free in the limit of many uses of the channel. Holevo, Schumacher, and Westmoreland proved the following least upper bound on the classical capacity of any quantum channel $$\mathcal{N}$$:



\chi(\mathcal{N}) = \max_{\rho^{XA}} I(X;B)_{\mathcal{N}(\rho)} $$

where $$\rho^{XA}$$ is a classical-quantum state of the following form:

\rho^{XA} = \sum_x p_X(x) \vert x \rangle \langle x \vert^X \otimes \rho_x^A , $$ $$p_X(x)$$ is a probability distribution, and each $$\rho_x^A$$ is a density operator that can be input to the channel $$\mathcal{N}$$.

Achievability using sequential decoding
We briefly review the HSW coding theorem (the statement of the achievability of the Holevo information rate $$I(X;B)$$ for communicating classical data over a quantum channel). We first review the minimal amount of quantum mechanics needed for the theorem. We then cover quantum typicality, and finally we prove the theorem using a recent sequential decoding technique.

Review of quantum mechanics
In order to prove the HSW coding theorem, we really just need a few basic things from quantum mechanics. First, a quantum state is a unit trace, positive operator known as a density operator. Usually, we denote it by $$\rho$$, $$\sigma$$, $$\omega$$, etc. The simplest model for a quantum channel is known as a classical-quantum channel: $$ x\mapsto \rho_{x}. $$ The meaning of the above notation is that inputting the classical letter $$x$$ at the transmitting end leads to a quantum state $$\rho_{x}$$ at the receiving end. It is the task of the receiver to perform a measurement to determine the input of the sender. If it is true that the states $$\rho_{x}$$ are perfectly distinguishable from one another (i.e., if they have orthogonal supports such that $$\mathrm{Tr}\,\left\{ \rho_{x}\rho_{x^{\prime}}\right\}  =0$$ for $$x\neq x^{\prime} $$), then the channel is a noiseless channel. We are interested in situations for which this is not the case. If it is true that the states $$\rho_{x}$$ all commute with one another, then this is effectively identical to the situation for a classical channel, so we are also not interested in these situations. So, the situation in which we are interested is that in which the states $$\rho_{x}$$ have overlapping support and are non-commutative.

The most general way to describe a quantum measurement is with a positive operator-valued measure (POVM). We usually denote the elements of a POVM as $$\left\{ \Lambda_{m}\right\}  _{m}$$. These operators should satisfy positivity and completeness in order to form a valid POVM:

\Lambda_{m}  \geq0\ \ \ \ \forall m$$
 * $$\sum_{m}\Lambda_{m} =I.

$$ The probabilistic interpretation of quantum mechanics states that if someone measures a quantum state $$\rho$$ using a measurement device corresponding to the POVM $$\left\{ \Lambda_{m}\right\}  $$, then the probability $$p\left( m\right) $$ for obtaining outcome $$m$$ is equal to

p\left( m\right)  =\text{Tr}\left\{  \Lambda_{m}\rho\right\}  , $$ and the post-measurement state is

\rho_{m}^{\prime}=\frac{1}{p\left( m\right)  }\sqrt{\Lambda_{m}}\rho \sqrt{\Lambda_{m}}, $$ if the person measuring obtains outcome $$m$$. These rules are sufficient for us to consider classical communication schemes over cq channels.

Quantum typicality
The reader can find a good review of this topic in the article about the typical subspace.

Gentle operator lemma
The following lemma is important for our proofs. It demonstrates that a measurement that succeeds with high probability on average does not disturb the state too much on average:

Lemma: [Winter] Given an ensemble $$\left\{ p_{X}\left(  x\right)  ,\rho_{x}\right\}  $$ with expected density operator $$\rho\equiv\sum_{x}p_{X}\left( x\right)  \rho_{x}$$, suppose that an operator $$\Lambda$$ such that $$I\geq\Lambda\geq0$$ succeeds with high probability on the state $$\rho$$: $$ \text{Tr}\left\{ \Lambda\rho\right\}  \geq1-\epsilon. $$ Then the subnormalized state $$\sqrt{\Lambda}\rho_{x}\sqrt{\Lambda}$$ is close in expected trace distance to the original state $$\rho_{x}$$: $$ \mathbb{E}_{X}\left\{ \left\Vert \sqrt{\Lambda}\rho_{X}\sqrt{\Lambda} -\rho_{X}\right\Vert _{1}\right\} \leq2\sqrt{\epsilon}. $$ (Note that $$\left\Vert A\right\Vert _{1}$$ is the nuclear norm of the operator $$A$$ so that $$\left\Vert A\right\Vert _{1}\equiv$$Tr$$\left\{ \sqrt{A^{\dagger} A}\right\}  $$.)

The following inequality is useful for us as well. It holds for any operators $$\rho$$, $$\sigma$$, $$\Lambda$$ such that $$0\leq\rho,\sigma,\Lambda\leq I$$: The quantum information-theoretic interpretation of the above inequality is that the probability of obtaining outcome $$\Lambda$$ from a quantum measurement acting on the state $$\rho$$ is upper bounded by the probability of obtaining outcome $$\Lambda$$ on the state $$\sigma$$ summed with the distinguishability of the two states $$\rho$$ and $$\sigma$$.

Non-commutative union bound
Lemma: [Sen's bound] The following bound holds for a subnormalized state $$\sigma$$ such that $$0\leq\sigma$$ and $$Tr\left\{ \sigma\right\}  \leq1$$ with $$\Pi_{1}$$, ... , $$\Pi_{N}$$ being projectors: $$ \text{Tr}\left\{ \sigma\right\}  -\text{Tr}\left\{  \Pi_{N}\cdots\Pi _{1}\ \sigma\ \Pi_{1}\cdots\Pi_{N}\right\} \leq2\sqrt{\sum_{i=1}^{N} \text{Tr}\left\{ \left(  I-\Pi_{i}\right)  \sigma\right\}  }, $$

We can think of Sen's bound as a "non-commutative union bound" because it is analogous to the following union bound from probability theory: $$ \Pr\left\{ \left(  A_{1}\cap\cdots\cap A_{N}\right)  ^{c}\right\} =\Pr\left\{ A_{1}^{c}\cup\cdots\cup A_{N}^{c}\right\}  \leq\sum_{i=1}^{N} \Pr\left\{ A_{i}^{c}\right\}  , $$ where $$A_{1}, \ldots, A_{N}$$ are events. The analogous bound for projector logic would be

\text{Tr}\left\{ \left(  I-\Pi_{1}\cdots\Pi_{N}\cdots\Pi_{1}\right) \rho\right\} \leq\sum_{i=1}^{N}\text{Tr}\left\{  \left(  I-\Pi_{i}\right) \rho\right\} , $$ if we think of $$\Pi_{1}\cdots\Pi_{N}$$ as a projector onto the intersection of subspaces. Though, the above bound only holds if the projectors $$\Pi_{1}$$, ..., $$\Pi_{N}$$ are commuting (choosing $$\Pi_{1}=\left\vert +\right\rangle \left\langle +\right\vert $$, $$\Pi_{2}=\left\vert 0\right\rangle \left\langle 0\right\vert $$, and $$\rho=\left\vert 0\right\rangle \left\langle 0\right\vert $$ gives a counterexample). If the projectors are non-commuting, then Sen's bound is the next best thing and suffices for our purposes here.

HSW theorem with the non-commutative union bound
We now prove the HSW theorem with Sen's non-commutative union bound. We divide up the proof into a few parts: codebook generation, POVM construction, and error analysis.

Codebook Generation. We first describe how Alice and Bob agree on a random choice of code. They have the channel $$x\rightarrow\rho_{x}$$ and a distribution $$p_{X}\left( x\right)  $$. They choose $$M$$ classical sequences $$x^{n}$$ according to the IID\ distribution $$p_{X^{n}}\left( x^{n}\right)  $$. After selecting them, they label them with indices as $$\left\{ x^{n}\left( m\right)  \right\}  _{m\in\left[  M\right]  }$$. This leads to the following quantum codewords: $$ \rho_{x^{n}\left( m\right)  }=\rho_{x_{1}\left(  m\right)  }\otimes \cdots\otimes\rho_{x_{n}\left( m\right)  }. $$ The quantum codebook is then $$\left\{ \rho_{x^{n}\left(  m\right)  }\right\} $$. The average state of the codebook is then where $$\rho=\sum_{x}p_{X}\left( x\right)  \rho_{x}$$.

POVM Construction. Sens' bound from the above lemma suggests a method for Bob to decode a state that Alice transmits. Bob should first ask "Is the received state in the average typical subspace?" He can do this operationally by performing a typical subspace measurement corresponding to $$\left\{ \Pi_{\rho,\delta} ^{n},I-\Pi_{\rho,\delta}^{n}\right\} $$. Next, he asks in sequential order, "Is the received codeword in the $$m^{\text{th}}$$ conditionally typical subspace?" This is in some sense equivalent to the question, "Is the received codeword the $$m^{\text{th}}$$ transmitted codeword?" He can ask these questions operationally by performing the measurements corresponding to the conditionally typical projectors $$\left\{ \Pi_{\rho_{x^{n}\left(  m\right) },\delta},I-\Pi_{\rho_{x^{n}\left( m\right)  },\delta}\right\}  $$.

Why should this sequential decoding scheme work well? The reason is that the transmitted codeword lies in the typical subspace on average:

\mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{  \Pi_{\rho,\delta}\ \rho_{X^{n} }\right\} \right\}    =\text{Tr}\left\{  \Pi_{\rho,\delta}\ \mathbb{E} _{X^{n}}\left\{ \rho_{X^{n}}\right\}  \right\}  $$
 * $$ =\text{Tr}\left\{ \Pi_{\rho,\delta}\ \rho^{\otimes n}\right\}  $$
 * $$ \geq1-\epsilon,$$

where the inequality follows from (\ref{eq:1st-typ-prop}). Also, the projectors $$\Pi_{\rho_{x^{n}\left( m\right)  },\delta}$$ are "good detectors" for the states $$\rho_{x^{n}\left( m\right) }$$ (on average) because the following condition holds from conditional quantum typicality: $$ \mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{  \Pi_{\rho_{X^{n}},\delta} \ \rho_{X^{n}}\right\} \right\}  \geq1-\epsilon. $$

Error Analysis. The probability of detecting the $$m^{\text{th}}$$ codeword correctly under our sequential decoding scheme is equal to $$ \text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi} _{\rho_{X^{n}\left( m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left(  m\right) }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta }\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\Pi_{\rho _{X^{n}\left( m\right)  },\delta}\right\}  , $$ where we make the abbreviation $$\hat{\Pi}\equiv I-\Pi$$. (Observe that we project into the average typical subspace just once.) Thus, the probability of an incorrect detection for the $$m^{\text{th}}$$ codeword is given by $$ 1-\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\hat{\Pi} _{\rho_{X^{n}\left( m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{x^{n}\left(  m\right) }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta }\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\Pi_{\rho _{X^{n}\left( m\right)  },\delta}\right\}  , $$ and the average error probability of this scheme is equal to $$ 1-\frac{1}{M}\sum_{m}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  m\right) },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\cdots\hat{\Pi }_{\rho_{X^{n}\left( 1\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho _{x^{n}\left( m\right)  }\ \Pi_{\rho,\delta}^{n}\ \hat{\Pi}_{\rho _{X^{n}\left( 1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right\}. $$ Instead of analyzing the average error probability, we analyze the expectation of the average error probability, where the expectation is with respect to the random choice of code:

Our first step is to apply Sen's bound to the above quantity. But before doing so, we should rewrite the above expression just slightly, by observing that

1  =\mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{ \rho_{X^{n}\left( m\right)  }\right\}  \right\}  $$
 * $$ =\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi

_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\right\}  +\text{Tr}\left\{ \hat{\Pi}_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\right\}  \right\} $$
 * $$ =\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi

_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} \right\} +\frac{1}{M}\sum_{m}\text{Tr}\left\{  \hat{\Pi}_{\rho,\delta} ^{n}\mathbb{E}_{X^{n}}\left\{ \rho_{X^{n}\left(  m\right)  }\right\} \right\} $$
 * $$ =\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi

_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} \right\} +\text{Tr}\left\{  \hat{\Pi}_{\rho,\delta}^{n}\rho^{\otimes n}\right\} $$
 * $$ \leq\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{

\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta} ^{n}\right\} \right\}  +\epsilon $$ Substituting into ($$) (and forgetting about the small $$\epsilon$$ term for now) gives an upper bound of

\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi _{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} \right\} $$

-\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi _{\rho_{X^{n}\left( m\right)  },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta }\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right)  }\ \Pi_{\rho,\delta }^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta}\cdots\hat{\Pi} _{\rho_{X^{n}\left( m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right) },\delta}\right\} \right\}. $$ We then apply Sen's bound to this expression with $$\sigma=\Pi_{\rho,\delta }^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}$$ and the sequential projectors as $$\Pi_{\rho_{X^{n}\left( m\right)  },\delta}$$, $$\hat{\Pi} _{\rho_{X^{n}\left( m-1\right)  },\delta}$$, ..., $$\hat{\Pi}_{\rho _{X^{n}\left( 1\right)  },\delta}$$. This gives the upper bound $$ \mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}2\left[  \text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i=1}^{m-1}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  i\right)  },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta} ^{n}\right\} \right]  ^{1/2}\right\}. $$ Due to concavity of the square root, we can bound this expression from above by

2\left[ \mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{ \left( I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i=1}^{m-1}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  i\right)  },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta} ^{n}\right\} \right\}  \right]  ^{1/2}$$
 * $$ \leq2\left[ \mathbb{E}_{X^{n}}\left\{  \frac{1}{M}\sum_{m}\text{Tr}\left\{

\left( I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho ,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\} +\sum_{i\neq m}\text{Tr}\left\{ \Pi_{\rho_{X^{n}\left(  i\right)  },\delta }\Pi_{\rho,\delta}^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta} ^{n}\right\} \right\}  \right]  ^{1/2}, $$ where the second bound follows by summing over all of the codewords not equal to the $$m^{\text{th}}$$ codeword (this sum can only be larger).

We now focus exclusively on showing that the term inside the square root can be made small. Consider the first term:

\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \left( I-\Pi_{\rho_{X^{n}\left(  m\right)  },\delta}\right)  \Pi_{\rho,\delta} ^{n}\rho_{X^{n}\left( m\right)  }\Pi_{\rho,\delta}^{n}\right\}  \right\}  $$
 * $$ \leq\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \left(

I-\Pi_{\rho_{X^{n}\left( m\right)  },\delta}\right)  \rho_{X^{n}\left( m\right) }\right\}  +\left\Vert \rho_{X^{n}\left(  m\right)  }-\Pi _{\rho,\delta}^{n}\rho_{X^{n}\left(  m\right)  }\Pi_{\rho,\delta} ^{n}\right\Vert _{1}\right\}  $$
 * $$ \leq\epsilon+2\sqrt{\epsilon}.

$$ where the first inequality follows from ($$) and the second inequality follows from the gentle operator lemma and the properties of unconditional and conditional typicality. Consider now the second term and the following chain of inequalities:

\sum_{i\neq m}\mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{  \Pi_{\rho _{X^{n}\left( i\right)  },\delta}\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right)  }\ \Pi_{\rho,\delta}^{n}\right\}  \right\}  $$
 * $$ =\sum_{i\neq m}\text{Tr}\left\{  \mathbb{E}_{X^{n}}\left\{  \Pi

_{\rho_{X^{n}\left( i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta} ^{n}\ \mathbb{E}_{X^{n}}\left\{ \rho_{X^{n}\left(  m\right)  }\right\} \ \Pi_{\rho,\delta}^{n}\right\} $$
 * $$ =\sum_{i\neq m}\text{Tr}\left\{ \mathbb{E}_{X^{n}}\left\{  \Pi

_{\rho_{X^{n}\left( i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta} ^{n}\ \rho^{\otimes n}\ \Pi_{\rho,\delta}^{n}\right\} $$
 * $$ \leq\sum_{i\neq m}2^{-n\left[  H\left(  B\right)  -\delta\right]

}\ \text{Tr}\left\{ \mathbb{E}_{X^{n}}\left\{  \Pi_{\rho_{X^{n}\left( i\right)  },\delta}\right\}  \ \Pi_{\rho,\delta}^{n}\right\} $$ The first equality follows because the codewords $$X^{n}\left( m\right)  $$ and $$X^{n}\left( i\right)  $$ are independent since they are different. The second equality follows from ($$). The first inequality follows from (\ref{eq:3rd-typ-prop}). Continuing, we have

\leq\sum_{i\neq m}2^{-n\left[ H\left(  B\right)  -\delta\right] }\ \mathbb{E}_{X^{n}}\left\{ \text{Tr}\left\{  \Pi_{\rho_{X^{n}\left( i\right)  },\delta}\right\}  \right\}  $$
 * $$ \leq\sum_{i\neq m}2^{-n\left[  H\left(  B\right)  -\delta\right]

}\ 2^{n\left[ H\left(  B|X\right)  +\delta\right]  }$$
 * $$ =\sum_{i\neq m}2^{-n\left[  I\left(  X;B\right)  -2\delta\right]  }$$
 * $$ \leq M\ 2^{-n\left[  I\left(  X;B\right)  -2\delta\right]  }.

$$ The first inequality follows from $$\Pi_{\rho,\delta}^{n}\leq I$$ and exchanging the trace with the expectation. The second inequality follows from (\ref{eq:2nd-cond-typ}). The next two are straightforward.

Putting everything together, we get our final bound on the expectation of the average error probability:

1-\mathbb{E}_{X^{n}}\left\{ \frac{1}{M}\sum_{m}\text{Tr}\left\{  \Pi _{\rho_{X^{n}\left( m\right)  },\delta}\hat{\Pi}_{\rho_{X^{n}\left( m-1\right)  },\delta}\cdots\hat{\Pi}_{\rho_{X^{n}\left(  1\right)  },\delta }\ \Pi_{\rho,\delta}^{n}\ \rho_{X^{n}\left( m\right)  }\ \Pi_{\rho,\delta }^{n}\ \hat{\Pi}_{\rho_{X^{n}\left( 1\right)  },\delta}\cdots\hat{\Pi} _{\rho_{X^{n}\left( m-1\right)  },\delta}\Pi_{\rho_{X^{n}\left(  m\right) },\delta}\right\} \right\}  $$
 * $$\leq\epsilon+2\left[ \left(  \epsilon+2\sqrt{\epsilon}\right)

+M\ 2^{-n\left[ I\left(  X;B\right)  -2\delta\right]  }\right]  ^{1/2}. $$ Thus, as long as we choose $$M=2^{n\left[ I\left(  X;B\right)  -3\delta \right] }$$, there exists a code with vanishing error probability.