Channel state information

In wireless communications, channel state information (CSI) refers to the known channel properties of a communication link. This information describes how a signal propagates from the transmitter to the receiver and represents the combined effect of, for example, scattering, fading, and power decay with distance. The process of acquiring this information is called channel estimation. The CSI makes it possible to adapt transmissions to current channel conditions, which is crucial for achieving reliable communication with high data rates in multi-antenna systems.

CSI needs to be estimated at the receiver and is usually quantized and fed back to the transmitter (although reverse-link estimation is possible in time-division duplex (TDD) systems). Therefore, the transmitter and receiver can have different CSI. The CSI at the transmitter and the CSI at the receiver are sometimes referred to as CSIT and CSIR, respectively.

Different kinds of channel state information
There are basically two levels of CSI, namely instantaneous CSI and statistical CSI.

Instantaneous CSI (or short-term CSI) means that the current channel conditions are known, which can be viewed as knowing the impulse response of a digital filter. This gives an opportunity to adapt the transmitted signal to the impulse response and thereby optimize the received signal for spatial multiplexing or to achieve low bit error rates.

Statistical CSI (or long-term CSI) means that a statistical characterization of the channel is known. This description can include, for example, the type of fading distribution, the average channel gain, the line-of-sight component, and the spatial correlation. As with instantaneous CSI, this information can be used for transmission optimization.

In practice, CSI acquisition is limited by how fast the channel conditions change. In fast fading systems, where channel conditions vary rapidly during the transmission of a single information symbol, only statistical CSI is reasonable. On the other hand, in slow fading systems, instantaneous CSI can be estimated with reasonable accuracy and used for transmission adaptation for some time before becoming outdated.

In practical systems, the available CSI often lies in between these two levels; instantaneous CSI with some estimation/quantization error is combined with statistical information.

Mathematical description
In a narrowband flat-fading channel with multiple transmit and receive antennas (MIMO), the system is modeled as
 * $$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$$

where $$ \mathbf{y}$$ and $$ \mathbf{x}$$ are the receive and transmit vectors, respectively, and $$\mathbf{H}$$ and $$\mathbf{n}$$ are the channel matrix and the noise vector, respectively. The noise is often modeled as circularly symmetric complex normal with
 * $$\mathbf{n} \sim \mathcal{CN}(\mathbf{0},\,\mathbf{S})$$

where the mean value is zero and the noise covariance matrix $$\mathbf{S}$$ is known.
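
As an illustration, the model above can be simulated numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation; the function name `simulate_mimo`, the example channel matrix, and the noise level are assumptions chosen here for illustration only.

```python
import numpy as np

def simulate_mimo(H, x, S, rng):
    """Return one received vector y = H x + n with n ~ CN(0, S).

    Hypothetical helper for illustration; S must be Hermitian positive definite.
    """
    n_rx = H.shape[0]
    # White circularly symmetric noise w ~ CN(0, I): independent real and
    # imaginary parts, each with variance 1/2.
    w = (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
    # Colour the noise with the Cholesky factor L of S (S = L L^H),
    # so that the covariance of n = L w equals S.
    L = np.linalg.cholesky(S)
    return H @ x + L @ w

rng = np.random.default_rng(0)
H = np.array([[1.0 + 0.5j, 0.2 - 0.1j],
              [0.3 + 0.0j, 0.8 + 0.4j]])   # assumed 2 receive x 2 transmit antennas
x = np.array([1.0 + 0.0j, -1.0 + 0.0j])     # transmitted vector
S = 0.01 * np.eye(2)                        # known noise covariance (assumed white)
y = simulate_mimo(H, x, S, rng)
```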

Instantaneous CSI
Ideally, the channel matrix $$\mathbf{H}$$ is known perfectly. Due to channel estimation errors, the channel information can be represented as
 * $$\mbox{vec} (\mathbf{H}_{\textrm{estimate}}) \sim \mathcal{CN}(\mbox{vec}(\mathbf{H}),\,\mathbf{R}_{\textrm{error}})$$

where $$\mathbf{H}_{\textrm{estimate}}$$ is the channel estimate and $$\mathbf{R}_{\textrm{error}}$$ is the estimation error covariance matrix. The vectorization $$\mbox{vec}$$ stacks the columns of $$\mathbf{H}$$ into a single vector, since multivariate random variables are usually defined as vectors.

Statistical CSI
In this case, the statistics of $$\mathbf{H}$$ are known. In a Rayleigh fading channel, this corresponds to knowing that
 * $$\mbox{vec} (\mathbf{H}) \sim \mathcal{CN}(\mathbf{0},\,\mathbf{R})$$

for some known channel covariance matrix $$\mathbf{R}$$.
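
To make the role of $$\mathbf{R}$$ concrete, the sketch below draws Rayleigh fading realizations with a given covariance and then recovers that covariance as a sample average. It is only an illustration under stated assumptions: the helper names `sample_rayleigh_vec` and `sample_covariance`, the 2×2 example covariance, and the sample count are all hypothetical choices, and the sample average is just one simple way to obtain such statistics.

```python
import numpy as np

def sample_rayleigh_vec(R, num_samples, rng):
    """Draw columns vec(H) ~ CN(0, R); returns an array of shape (dim, num_samples)."""
    dim = R.shape[0]
    W = (rng.standard_normal((dim, num_samples))
         + 1j * rng.standard_normal((dim, num_samples))) / np.sqrt(2)
    # R = L L^H, so each column L w has covariance R.
    return np.linalg.cholesky(R) @ W

def sample_covariance(H_vecs):
    """Average of vec(H) vec(H)^H over the columns (zero mean assumed)."""
    return H_vecs @ H_vecs.conj().T / H_vecs.shape[1]

rng = np.random.default_rng(0)
# Assumed example: 1 receive and 2 transmit antennas with correlation 0.5.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H_vecs = sample_rayleigh_vec(R, 20000, rng)
R_hat = sample_covariance(H_vecs)   # approaches R as the number of samples grows
```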

Estimation of CSI
Since the channel conditions vary, instantaneous CSI needs to be estimated on a short-term basis. A popular approach is to use a so-called training sequence (or pilot sequence), where a known signal is transmitted and the channel matrix $$\mathbf{H}$$ is estimated using the combined knowledge of the transmitted and received signals.

Let the training sequence be denoted $$ \mathbf{p}_1,\ldots,\mathbf{p}_N$$, where the vector $$\mathbf{p}_i$$ is transmitted over the channel as
 * $$\mathbf{y}_i = \mathbf{H}\mathbf{p}_i + \mathbf{n}_i.$$

By combining the received training signals $$\mathbf{y}_i$$ for $$i=1,\ldots,N$$, the total training signalling becomes
 * $$\mathbf{Y}=[\mathbf{y}_1, \ldots,\mathbf{y}_N] = \mathbf{H}\mathbf{P} + \mathbf{N}$$

with the training matrix $$ \mathbf{P}=[\mathbf{p}_1, \ldots,\mathbf{p}_N]$$ and the noise matrix $$ \mathbf{N}=[\mathbf{n}_1, \ldots,\mathbf{n}_N]$$.

With this notation, channel estimation means that $$ \mathbf{H}$$ should be recovered from the knowledge of $$\mathbf{Y}$$ and $$ \mathbf{P}$$.

Least-square estimation
If the channel and noise distributions are unknown, then the least-square estimator (also known as the minimum-variance unbiased estimator) is
 * $$\mathbf{H}_{\textrm{LS-estimate}} = \mathbf{Y} \mathbf{P}^H(\mathbf{P} \mathbf{P}^H)^{-1} $$

where $$^H $$ denotes the conjugate transpose. The estimation mean squared error (MSE) is proportional to
 * $$\mathrm{tr} (\mathbf{P} \mathbf{P}^H)^{-1}$$

where $$\mathrm{tr}$$ denotes the trace. The error is minimized when $$\mathbf{P} \mathbf{P}^H$$ is a scaled identity matrix. This can only be achieved when $$N$$ is equal to (or larger than) the number of transmit antennas. The simplest example of an optimal training matrix is to select $$\mathbf{P}$$ as a (scaled) identity matrix whose size equals the number of transmit antennas.
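
The least-square formula and the trace expression above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions, not a definitive implementation: the function names `ls_estimate` and `ls_mse_factor`, the antenna counts, the noise level, and the DFT-based training matrix are illustrative choices made here (the DFT rows are simply one convenient way to obtain a $$\mathbf{P}$$ with $$\mathbf{P} \mathbf{P}^H$$ equal to a scaled identity).

```python
import numpy as np

def ls_estimate(Y, P):
    """Least-square estimate Y P^H (P P^H)^{-1}; needs P P^H to be invertible."""
    PH = P.conj().T
    return Y @ PH @ np.linalg.inv(P @ PH)

def ls_mse_factor(P):
    """tr((P P^H)^{-1}), the quantity the estimation MSE is proportional to."""
    return np.trace(np.linalg.inv(P @ P.conj().T)).real

n_tx, n_rx, N = 2, 3, 4
# Assumed training design: the first n_tx rows of an N-point DFT matrix,
# whose rows are orthogonal so that P P^H = N * I.
k = np.arange(n_tx).reshape(-1, 1)
t = np.arange(N).reshape(1, -1)
P = np.exp(-2j * np.pi * k * t / N)

rng = np.random.default_rng(0)
H = (rng.standard_normal((n_rx, n_tx))
     + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
noise = 0.05 * (rng.standard_normal((n_rx, N))
                + 1j * rng.standard_normal((n_rx, N)))
Y = H @ P + noise            # received training signal
H_ls = ls_estimate(Y, P)     # close to H when the noise is small
```

Without noise, `ls_estimate(H @ P, P)` recovers `H` up to floating-point rounding, since $$\mathbf{H}\mathbf{P}\mathbf{P}^H(\mathbf{P}\mathbf{P}^H)^{-1} = \mathbf{H}$$.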

MMSE estimation
If the channel and noise distributions are known, then this a priori information can be exploited to decrease the estimation error. This approach is known as Bayesian estimation and for Rayleigh fading channels it exploits that
 * $$\mbox{vec} (\mathbf{H}) \sim \mathcal{CN}(\mathbf{0},\,\mathbf{R}), \quad \mbox{vec}(\mathbf{N}) \sim \mathcal{CN}(\mathbf{0},\,\mathbf{S}).$$

The MMSE estimator is the Bayesian counterpart to the least-square estimator and becomes
 * $$\mbox{vec}(\mathbf{H}_{\textrm{MMSE-estimate}}) = \left(\mathbf{R}^{-1} + (\mathbf{P}^T \, \otimes\, \mathbf{I})^H \mathbf{S}^{-1} (\mathbf{P}^T \, \otimes\, \mathbf{I}) \right)^{-1} (\mathbf{P}^T \, \otimes\, \mathbf{I})^H \mathbf{S}^{-1} \mbox{vec}(\mathbf{Y}) $$

where $$\otimes$$ denotes the Kronecker product and the identity matrix $$\mathbf{I}$$ has the dimension of the number of receive antennas. The estimation MSE is
 * $$ \mathrm{tr} \left(\mathbf{R}^{-1} + (\mathbf{P}^T \, \otimes\, \mathbf{I})^H \mathbf{S}^{-1} (\mathbf{P}^T \, \otimes\, \mathbf{I}) \right)^{-1}$$

and is minimized by a training matrix $$ \mathbf{P}$$ that in general can only be derived through numerical optimization, although heuristic solutions based on waterfilling exist and perform well. As opposed to least-square estimation, the estimation error for spatially correlated channels can be minimized even if $$N$$ is smaller than the number of transmit antennas. Thus, MMSE estimation can both decrease the estimation error and shorten the required training sequence. However, it additionally requires knowledge of the channel correlation matrix $$\mathbf{R}$$ and the noise correlation matrix $$\mathbf{S}$$. In the absence of accurate knowledge of these correlation matrices, robust choices need to be made to avoid MSE degradation.
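
The MMSE estimator and its MSE above translate fairly directly into NumPy, using the identity $$\mbox{vec}(\mathbf{H}\mathbf{P}) = (\mathbf{P}^T \otimes \mathbf{I})\,\mbox{vec}(\mathbf{H})$$. The sketch below is illustrative only: the helper names `vec` and `mmse_estimate` and all numerical values (identity $$\mathbf{R}$$, $$\mathbf{S}$$ and $$\mathbf{P}$$, the example $$\mathbf{Y}$$) are assumptions made here, and explicit matrix inverses are used for readability rather than numerical robustness.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def mmse_estimate(Y, P, R, S):
    """Return (H_mmse, mse) for vec(H) ~ CN(0, R) and vec(N) ~ CN(0, S)."""
    n_rx = Y.shape[0]
    n_tx = P.shape[0]
    # vec(Y) = A vec(H) + vec(N) with A = P^T kron I (I of size n_rx).
    A = np.kron(P.T, np.eye(n_rx))
    AhSinv = A.conj().T @ np.linalg.inv(S)
    # Error covariance of the estimate; its trace is the estimation MSE.
    C = np.linalg.inv(np.linalg.inv(R) + AhSinv @ A)
    h = C @ AhSinv @ vec(Y)
    return h.reshape((n_rx, n_tx), order="F"), np.trace(C).real

n_rx, n_tx = 2, 2
R = np.eye(n_rx * n_tx)          # assumed: uncorrelated unit-variance channel
P = np.eye(n_tx)                 # assumed: identity training, N = n_tx
S = np.eye(n_rx * P.shape[1])    # assumed: white unit-variance noise
Y = np.array([[2.0 + 2.0j, 0.0 + 4.0j],
              [-2.0 + 0.0j, 6.0 - 2.0j]])
H_mmse, mse = mmse_estimate(Y, P, R, S)
```

With these assumed values the estimate equals `Y / 2`: the least-square estimate for identity training would be `Y` itself, and the prior shrinks it toward the zero mean. The same function also accepts a training matrix with fewer columns than transmit antennas, because the term $$\mathbf{R}^{-1}$$ keeps the inverted matrix non-singular.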

Neural network estimation
With the advances in deep learning, there has been work showing that channel state information can be estimated using neural networks, such as 2D/3D convolutional neural networks (CNNs), achieving better performance with fewer pilot signals. The main idea is that the neural network can interpolate well in time and frequency.

Data-aided versus blind estimation
In a data-aided approach, the channel estimation is based on data known to both the transmitter and the receiver, such as training sequences or pilot data. In a blind approach, the estimation is based only on the received data, without any known transmitted sequence. The tradeoff is accuracy versus overhead: a data-aided approach consumes more bandwidth (it has a higher overhead) than a blind approach, but it can achieve better channel estimation accuracy.