Centering matrix

In mathematics and multivariate statistics, the centering matrix is a symmetric and idempotent matrix which, when multiplied with a vector, has the same effect as subtracting the mean of the components of the vector from every component of that vector.

Definition
The centering matrix of size n is defined as the n-by-n matrix
 * $$C_n = I_n - \tfrac{1}{n}J_n $$

where $$I_n\,$$ is the identity matrix of size n and $$J_n$$ is an n-by-n matrix of all 1's.

For example,

 * $$C_1 = \begin{bmatrix} 0 \end{bmatrix},$$
 * $$C_2 = \left[ \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right] - \frac{1}{2}\left[ \begin{array}{rr} 1 & 1 \\ 1 & 1 \end{array} \right] = \left[ \begin{array}{rr} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{array} \right],$$
 * $$C_3 = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] - \frac{1}{3}\left[ \begin{array}{rrr} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{array} \right] = \left[ \begin{array}{rrr} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{array} \right].$$
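The definition translates directly into code. The following is a minimal sketch using only the Python standard library, with exact rational arithmetic so that the entries can be compared with the examples above; the function name `centering_matrix` is chosen here for illustration and is not taken from any library.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


C3 = centering_matrix(3)
for row in C3:
    # Diagonal entries are 2/3, off-diagonal entries are -1/3.
    print([str(x) for x in row])
```

Each diagonal entry equals 1 − 1/n and each off-diagonal entry equals −1/n, in agreement with the matrices written out above.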

Properties
Given a column vector $$\mathbf{v}\,$$ of size n, the centering property of $$C_n\,$$ can be expressed as
 * $$C_n\,\mathbf{v} = \mathbf{v} - (\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v})J_{n,1}$$

where $$J_{n,1}$$ is a column vector of ones and $$\tfrac{1}{n}J_{n,1}^\textrm{T}\mathbf{v}$$ is the mean of the components of $$\mathbf{v}\,$$.
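The centering property can be checked numerically. The sketch below, which assumes an arbitrary example vector and uses the illustrative helper names `centering_matrix` and `matvec`, compares the product $$C_n\,\mathbf{v}$$ with the result of subtracting the mean directly.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matvec(A, v):
    """Multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


v = [Fraction(x) for x in (2, 4, 9)]   # example vector (an arbitrary choice)
mean = sum(v) / len(v)                 # mean of the components, here 5

centered = matvec(centering_matrix(len(v)), v)
direct = [x - mean for x in v]         # subtract the mean component-wise

print([str(x) for x in centered])      # ['-3', '-1', '4']
print(centered == direct)              # True
```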

$$C_n\,$$ is symmetric positive semi-definite.

$$C_n\,$$ is idempotent, so that $$C_n^k=C_n$$, for $$k=1,2,\ldots$$. Once the mean has been removed, it is zero and removing it again has no effect.
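
The idempotence can also be verified directly from the definition, using the fact that $$J_n^2 = nJ_n$$:
 * $$C_n^2 = \left(I_n - \tfrac{1}{n}J_n\right)^2 = I_n - \tfrac{2}{n}J_n + \tfrac{1}{n^2}J_n^2 = I_n - \tfrac{2}{n}J_n + \tfrac{1}{n}J_n = C_n.$$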

$$C_n\,$$ is singular. The effects of applying the transformation $$C_n\,\mathbf{v}$$ cannot be reversed.

$$C_n\,$$ has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

$$C_n\,$$ has a nullspace of dimension 1, along the vector $$J_{n,1}$$.

$$C_n\,$$ is an orthogonal projection matrix. That is, $$C_n\mathbf{v}$$ is the projection of $$\mathbf{v}\,$$ onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by $$J_{n,1}$$. (This is the subspace of all n-vectors whose components sum to zero.)

The trace of $$C_n$$ is $$n(n-1)/n = n-1$$.
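
Several of these properties can be spot-checked for a particular size. The sketch below assumes n = 4 (an arbitrary choice) and uses illustrative helper names; it is a numerical check for one value of n, not a proof.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


n = 4
C = centering_matrix(n)

# Symmetry: C equals its transpose.
is_symmetric = all(C[i][j] == C[j][i] for i in range(n) for j in range(n))

# Idempotence: C * C equals C.
is_idempotent = matmul(C, C) == C

# The all-ones vector lies in the nullspace: every row of C sums to zero.
null_image = [sum(row) for row in C]

# The trace is n * (1 - 1/n) = n - 1.
trace = sum(C[i][i] for i in range(n))

print(is_symmetric, is_idempotent, null_image == [0] * n, trace == n - 1)
```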

Application
Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it is a convenient analytical tool. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of an m-by-n matrix $$X$$.

Left multiplication by $$C_m$$ subtracts the corresponding column mean from each of the n columns, so that each column of the product $$C_m\,X$$ has zero mean. Similarly, right multiplication by $$C_n$$ subtracts the corresponding row mean from each of the m rows, so that each row of the product $$X\,C_n$$ has zero mean. Multiplication on both sides creates a doubly centered matrix $$C_m\,X\,C_n$$, whose row and column means are all equal to zero.
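
The three cases can be illustrated on a small matrix. The sketch below assumes an arbitrary 2-by-3 example and illustrative helper names; it relies only on the Python standard library.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


X = [[1, 2, 6],
     [3, 4, 2]]                 # example data: m = 2 rows, n = 3 columns
m, n = len(X), len(X[0])
Cm, Cn = centering_matrix(m), centering_matrix(n)

col_centered = matmul(Cm, X)            # every column has zero mean
row_centered = matmul(X, Cn)            # every row has zero mean
doubly = matmul(matmul(Cm, X), Cn)      # rows and columns have zero mean

col_sums = [sum(col) for col in zip(*col_centered)]
row_sums = [sum(row) for row in row_centered]
doubly_row_sums = [sum(row) for row in doubly]
doubly_col_sums = [sum(col) for col in zip(*doubly)]

print(col_sums, row_sums, doubly_row_sums, doubly_col_sums)
```

All four lists of sums contain only zeros, which is equivalent to the corresponding means being zero.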

In particular, the centering matrix provides a succinct way to express the scatter matrix, $$S=(X-\mu J_{n,1}^{\mathrm{T}})(X-\mu J_{n,1}^{\mathrm{T}})^{\mathrm{T}}$$, of a data sample $$X\,$$ whose n columns are the observations, where $$\mu=\tfrac{1}{n}X J_{n,1}$$ is the sample mean. Because $$C_n$$ is symmetric and idempotent, the scatter matrix can be written more compactly as
 * $$S=X\,C_n(X\,C_n)^{\mathrm{T}}=X\,C_n\,C_n\,X\,^{\mathrm{T}}=X\,C_n\,X\,^{\mathrm{T}}.$$
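
The equality of the two expressions for the scatter matrix can be checked on a small example. The sketch below assumes an arbitrary data matrix with two variables and three observations stored as columns, and uses illustrative helper names.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) J_n as a list of rows with exact entries."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


X = [[1, 2, 6],
     [3, 4, 2]]                 # example data: n = 3 observations as columns
n = len(X[0])

# Sample mean: the average of the columns, i.e. the mean of each row of X.
mu = [Fraction(sum(row), n) for row in X]

# Direct form: subtract the sample mean from every column, then form D D^T.
D = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = matmul(D, transpose(D))

# Compact form: S = X C_n X^T.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

print(S_direct == S_compact)    # True
```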

$$C_n$$ is the covariance matrix of the multinomial distribution, in the special case where the parameters of that distribution are $$k=n$$, and $$p_1=p_2=\cdots=p_n=\frac{1}{n}$$.
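
This follows from the general form of the multinomial covariance matrix, $$k\left(\operatorname{diag}(p) - p\,p^{\mathrm{T}}\right)$$, where $$p$$ denotes the column vector of probabilities. Substituting $$k=n$$ and $$p=\tfrac{1}{n}J_{n,1}$$ gives
 * $$n\left(\tfrac{1}{n}I_n - \tfrac{1}{n^2}J_{n,1}J_{n,1}^{\mathrm{T}}\right) = I_n - \tfrac{1}{n}J_n = C_n.$$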