Quadratic form (statistics)

In multivariate statistics, if $$\varepsilon$$ is a vector of $$n$$ random variables, and $$\Lambda$$ is an $$n \times n$$ symmetric matrix, then the scalar quantity $$\varepsilon^T\Lambda\varepsilon$$ is known as a quadratic form in $$\varepsilon$$.
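For example, writing $$\Lambda_{ij}$$ for the entries of $$\Lambda$$, the quadratic form is a double sum over all pairs of components. For $$n=2$$, where symmetry gives $$\Lambda_{12}=\Lambda_{21}$$, it reduces to two weighted squares and one cross term:

 * $$\varepsilon^T\Lambda\varepsilon=\sum_{i=1}^n\sum_{j=1}^n \Lambda_{ij}\,\varepsilon_i\varepsilon_j$$
 * $$\varepsilon^T\Lambda\varepsilon=\Lambda_{11}\varepsilon_1^2+2\Lambda_{12}\varepsilon_1\varepsilon_2+\Lambda_{22}\varepsilon_2^2 \quad (n=2)$$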

Expectation
It can be shown that


 * $$\operatorname{E}\left[\varepsilon^T\Lambda\varepsilon\right]=\operatorname{tr}\left[\Lambda \Sigma\right] + \mu^T\Lambda\mu$$

where $$\mu$$ and $$\Sigma$$ are the expected value and variance-covariance matrix of $$\varepsilon$$, respectively, and tr denotes the trace of a matrix. This result depends only on the existence of $$\mu$$ and $$\Sigma$$; in particular, normality of $$\varepsilon$$ is not required.
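Because the identity needs only the first two moments, it can be checked exactly on a non-Gaussian example. The sketch below assumes a hypothetical three-point distribution for a 2-vector $$\varepsilon$$ and an arbitrary symmetric $$\Lambda$$ (neither comes from the text), and uses exact rational arithmetic so both sides can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical discrete (non-Gaussian) distribution of a 2-vector eps:
# each entry is (value of eps, probability).
outcomes = [
    ((F(0), F(1)), F(1, 4)),
    ((F(1), F(-1)), F(1, 4)),
    ((F(2), F(3)), F(1, 2)),
]

# An arbitrary symmetric 2x2 matrix Lambda.
Lam = [[F(2), F(1)], [F(1), F(3)]]
n = 2

def quad_form(x, A):
    """Return x^T A x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Mean vector mu = E[eps].
mu = [sum(p * x[i] for x, p in outcomes) for i in range(n)]
# Covariance matrix Sigma = E[(eps - mu)(eps - mu)^T].
Sigma = [[sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in outcomes)
          for j in range(n)] for i in range(n)]

# Left-hand side: E[eps^T Lambda eps], computed exactly by enumeration.
lhs = sum(p * quad_form(x, Lam) for x, p in outcomes)

# Right-hand side: tr(Lambda Sigma) + mu^T Lambda mu.
trace_Lam_Sigma = sum(Lam[i][k] * Sigma[k][i] for i in range(n) for k in range(n))
rhs = trace_Lam_Sigma + quad_form(mu, Lam)

print(lhs, rhs)  # 25 25
```

Exact fractions are used so that the equality holds with `==` rather than only up to floating-point error.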

A book-length treatment of quadratic forms in random variables is given by Mathai and Provost.

Proof
Since the quadratic form is a scalar quantity, $$ \varepsilon^T\Lambda\varepsilon = \operatorname{tr}(\varepsilon^T\Lambda\varepsilon)$$.

Next, by the cyclic property of the trace operator,


 * $$ \operatorname{E}[\operatorname{tr}(\varepsilon^T\Lambda\varepsilon)] = \operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^T)]. $$

Since the trace is a linear function of the entries of a matrix, it follows from the linearity of the expectation operator that


 * $$ \operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^T)] = \operatorname{tr}(\Lambda \operatorname{E}(\varepsilon\varepsilon^T)). $$

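Since $$\operatorname{E}(\varepsilon\varepsilon^T)=\Sigma+\mu\mu^T$$, which follows from expanding $$\Sigma=\operatorname{E}\left[(\varepsilon-\mu)(\varepsilon-\mu)^T\right]$$, this is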


 * $$ \operatorname{tr}(\Lambda (\Sigma + \mu \mu^T)). $$

Using the linearity of the trace and then its cyclic property again, we get


 * $$ \operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\Lambda \mu \mu^T) = \operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\mu^T\Lambda\mu) = \operatorname{tr}(\Lambda\Sigma) + \mu^T\Lambda\mu.$$
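The two trace facts used in the proof can be verified on a concrete case. In this sketch the matrices and vector are hypothetical: `x` plays the role of $$\mu$$ and the diagonal matrix `S` plays the role of $$\Sigma$$. It checks the cyclic step $$\operatorname{tr}(x^T\Lambda x)=\operatorname{tr}(\Lambda x x^T)$$ and the final split $$\operatorname{tr}(\Lambda(S+xx^T))=\operatorname{tr}(\Lambda S)+x^T\Lambda x$$.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Matrix product of nested lists A (m x p) and B (p x q)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def outer(u, v):
    """Outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# Hypothetical symmetric Lambda, vector x (stands in for mu), and S (stands in for Sigma).
Lam = [[F(2), F(1), F(0)], [F(1), F(3), F(-1)], [F(0), F(-1), F(1)]]
x = [F(1), F(-2), F(3)]
S = [[F(1), F(0), F(0)], [F(0), F(2), F(0)], [F(0), F(0), F(3)]]

# x^T Lam x computed as a 1x1 matrix: row vector * Lam * column vector.
scalar_form = matmul(matmul([x], Lam), [[xi] for xi in x])[0][0]
# Cyclic property of the trace: tr(x^T Lam x) = tr(Lam x x^T).
trace_form = trace(matmul(Lam, outer(x, x)))

# Linearity, then the cyclic property: tr(Lam (S + x x^T)) = tr(Lam S) + x^T Lam x.
lhs = trace(matmul(Lam, matadd(S, outer(x, x))))
rhs = trace(matmul(Lam, S)) + scalar_form

print(scalar_form, trace_form)  # 31 31
print(lhs, rhs)                 # 42 42
```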

Variance in the Gaussian case
In general, the variance of a quadratic form depends on the distribution of $$\varepsilon$$ beyond its mean and covariance matrix, since it involves third and fourth moments. However, if $$\varepsilon$$ follows a multivariate normal distribution, the variance of the quadratic form becomes particularly tractable. Assume for the moment that $$\Lambda$$ is a symmetric matrix. Then,


 * $$\operatorname{var} \left[\varepsilon^T\Lambda\varepsilon\right] = 2\operatorname{tr}\left[\Lambda \Sigma\Lambda \Sigma\right] + 4\mu^T\Lambda\Sigma\Lambda\mu$$.
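A Monte Carlo comparison is one way to see the Gaussian variance formula at work. This is a minimal sketch under hypothetical choices of $$\mu$$, $$\Sigma$$ and a symmetric $$\Lambda$$ in two dimensions; Gaussian vectors are generated as $$\mu + Cz$$, where $$C$$ is the hand-written Cholesky factor of $$\Sigma$$ and $$z$$ has independent standard normal entries.

```python
import math
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def quad_form(x, A):
    """x^T A x for a vector x and a square matrix A."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def gaussian_quad_form_var(Lam, Sigma, mu):
    """2 tr(Lam Sigma Lam Sigma) + 4 mu^T Lam Sigma Lam mu (Lam symmetric, eps Gaussian)."""
    LS = matmul(Lam, Sigma)
    return 2 * trace(matmul(LS, LS)) + 4 * quad_form(mu, matmul(LS, Lam))

# Hypothetical parameters for a 2-dimensional Gaussian eps.
mu = [1.0, -0.5]
Sigma = [[1.0, 0.5], [0.5, 2.0]]
Lam = [[1.0, 0.3], [0.3, 0.5]]

# Cholesky factor of Sigma for the 2x2 case: Sigma = C C^T with C lower triangular.
c11 = math.sqrt(Sigma[0][0])
c21 = Sigma[1][0] / c11
c22 = math.sqrt(Sigma[1][1] - c21 ** 2)

random.seed(0)
N = 100_000
q = []
for _ in range(N):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    eps = [mu[0] + c11 * z1, mu[1] + c21 * z1 + c22 * z2]
    q.append(quad_form(eps, Lam))

mean_q = sum(q) / N
mc_var = sum((v - mean_q) ** 2 for v in q) / (N - 1)  # unbiased sample variance
exact = gaussian_quad_form_var(Lam, Sigma, mu)
print(f"Monte Carlo: {mc_var:.3f}  formula: {exact:.3f}")
```

For these parameters the formula gives $$7.71 + 3.08 = 10.79$$; the simulated variance should agree up to sampling error of roughly a percent.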

In fact, this can be generalized to find the covariance between two quadratic forms on the same $$\varepsilon$$ (once again, $$\Lambda_1$$ and $$\Lambda_2$$ must both be symmetric):


 * $$\operatorname{cov}\left[\varepsilon^T\Lambda_1\varepsilon,\varepsilon^T\Lambda_2\varepsilon\right]=2\operatorname{tr}\left[\Lambda_1\Sigma\Lambda_2 \Sigma\right] + 4\mu^T\Lambda_1\Sigma\Lambda_2\mu$$.
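Two consistency checks follow directly from the covariance formula: it is symmetric in $$\Lambda_1$$ and $$\Lambda_2$$, and since it is bilinear, $$\operatorname{var}[\varepsilon^T(\Lambda_1+\Lambda_2)\varepsilon]=\operatorname{var}_1+\operatorname{var}_2+2\operatorname{cov}_{12}$$. The sketch below checks both exactly with rational arithmetic; the matrices and mean vector are hypothetical, and the variance is obtained as the special case $$\Lambda_1=\Lambda_2$$.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def quad_form(x, A):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def gaussian_quad_form_cov(L1, L2, Sigma, mu):
    """2 tr(L1 Sigma L2 Sigma) + 4 mu^T L1 Sigma L2 mu, for symmetric L1, L2 and Gaussian eps."""
    L1S = matmul(L1, Sigma)
    L2S = matmul(L2, Sigma)
    return 2 * trace(matmul(L1S, L2S)) + 4 * quad_form(mu, matmul(L1S, L2))

# Hypothetical covariance matrix, mean, and symmetric matrices.
Sigma = [[F(2), F(1)], [F(1), F(3)]]
mu = [F(1), F(-2)]
L1 = [[F(1), F(0)], [F(0), F(2)]]
L2 = [[F(0), F(1)], [F(1), F(1)]]
L12 = matadd(L1, L2)

var1 = gaussian_quad_form_cov(L1, L1, Sigma, mu)    # variance = covariance with itself
var2 = gaussian_quad_form_cov(L2, L2, Sigma, mu)
cov12 = gaussian_quad_form_cov(L1, L2, Sigma, mu)
cov21 = gaussian_quad_form_cov(L2, L1, Sigma, mu)
var_sum = gaussian_quad_form_cov(L12, L12, Sigma, mu)

print(cov12 == cov21)                      # True
print(var_sum == var1 + var2 + 2 * cov12)  # True
```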

In addition, a quadratic form in a Gaussian vector, such as this one, follows a generalized chi-squared distribution.

Computing the variance in the non-symmetric case
The case for general $$\Lambda$$ can be derived by noting that, because a scalar equals its own transpose,


 * $$\varepsilon^T\Lambda^T\varepsilon=\varepsilon^T\Lambda\varepsilon$$

so


 * $$\varepsilon^T\tilde{\Lambda}\varepsilon=\varepsilon^T\left(\Lambda+\Lambda^T\right)\varepsilon/2$$

is a quadratic form in the symmetric matrix $$\tilde{\Lambda}=\left(\Lambda+\Lambda^T\right)/2$$ and equals $$\varepsilon^T\Lambda\varepsilon$$. The mean and variance expressions therefore carry over, provided $$\Lambda$$ is replaced by $$\tilde{\Lambda}$$ therein.
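The replacement matters for the variance but not for the form itself. This sketch uses a hypothetical non-symmetric $$\Lambda$$, confirms that $$\Lambda$$ and $$\tilde{\Lambda}$$ give the same quadratic form on a few vectors, and then evaluates the Gaussian variance expression (with $$\Sigma=I$$ and $$\mu=0$$, also assumed for illustration) with each matrix.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def quad_form(x, A):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def symmetrize(A):
    """Return (A + A^T) / 2."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]

def gaussian_quad_form_var(Lam, Sigma, mu):
    """2 tr(Lam Sigma Lam Sigma) + 4 mu^T Lam Sigma Lam mu; valid only for symmetric Lam."""
    LS = matmul(Lam, Sigma)
    return 2 * trace(matmul(LS, LS)) + 4 * quad_form(mu, matmul(LS, Lam))

Lam = [[F(1), F(4)], [F(0), F(2)]]  # hypothetical non-symmetric matrix
Lam_sym = symmetrize(Lam)           # [[1, 2], [2, 2]]

# Both matrices define the same quadratic form.
for x in ([F(1), F(1)], [F(3), F(-2)], [F(1, 2), F(5)]):
    assert quad_form(x, Lam) == quad_form(x, Lam_sym)

I2 = [[F(1), F(0)], [F(0), F(1)]]
zero = [F(0), F(0)]
naive = gaussian_quad_form_var(Lam, I2, zero)       # formula misapplied to Lam
correct = gaussian_quad_form_var(Lam_sym, I2, zero)
print(naive, correct)  # 10 26
```

The value 26 is the true variance here: with independent standard normal components the form is $$\varepsilon_1^2+4\varepsilon_1\varepsilon_2+2\varepsilon_2^2$$, whose three terms are uncorrelated with variances 2, 16 and 8.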

Examples of quadratic forms
In the setting where one has a vector of observations $$y$$ and an operator matrix $$H$$ that maps $$y$$ to its fitted values $$Hy$$, the residual sum of squares can be written as a quadratic form in $$y$$:


 * $$\textrm{RSS}=y^T(I-H)^T (I-H)y.$$
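As a small illustration, the sketch below takes the simplest operator matrix, the hypothetical intercept-only choice $$H=\tfrac{1}{n}\mathbf{1}\mathbf{1}^T$$ (every fitted value is the sample mean), and checks that the quadratic form reproduces the usual sum of squared deviations. The observations are made up for the example.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def quad_form(x, A):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

y = [F(2), F(3), F(7), F(4)]  # hypothetical observations
n = len(y)
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
H = [[F(1, n) for _ in range(n)] for _ in range(n)]            # intercept-only operator matrix
M = [[I[i][j] - H[i][j] for j in range(n)] for i in range(n)]  # I - H

rss_form = quad_form(y, matmul(transpose(M), M))  # y^T (I-H)^T (I-H) y
ybar = sum(y) / n
rss_direct = sum((yi - ybar) ** 2 for yi in y)

print(rss_form, rss_direct)  # 14 14
```

This particular $$H$$ is symmetric and idempotent, so it also fits the assumptions of the next paragraph.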

For procedures where the matrix $$H$$ is symmetric and idempotent, and the errors are Gaussian with covariance matrix $$\sigma^2I$$, $$\textrm{RSS}/\sigma^2$$ has a chi-squared distribution with $$k$$ degrees of freedom and noncentrality parameter $$\lambda$$, where


 * $$k=\operatorname{tr}\left[(I-H)^T(I-H)\right]$$
 * $$\lambda=\mu^T(I-H)^T(I-H)\mu/\sigma^2$$

These values may be found by matching the first two moments (mean and variance) of a noncentral chi-squared random variable, which under this convention has mean $$k+\lambda$$ and variance $$2(k+2\lambda)$$, to the expressions given in the first two sections. If $$Hy$$ estimates $$\mu$$ without bias, then $$(I-H)\mu=0$$, so the noncentrality $$\lambda$$ is zero and $$\textrm{RSS}/\sigma^2$$ follows a central chi-squared distribution.
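The moment matching can be checked by simulation. This sketch again assumes the intercept-only operator $$H=\tfrac{1}{n}\mathbf{1}\mathbf{1}^T$$, for which $$(I-H)y=y-\bar{y}$$, $$k=\operatorname{tr}(I-H)=n-1$$, and the noncentrality (in the convention with mean $$k+\lambda$$) is $$\lambda=\sum_i(\mu_i-\bar{\mu})^2/\sigma^2$$. The mean vector and $$\sigma$$ are hypothetical, with $$\mu$$ chosen non-constant so that $$\lambda>0$$.

```python
import random

random.seed(1)
mu = [1.0, 2.0, 4.0, 1.0, 2.0]  # hypothetical mean vector (not constant, so lambda > 0)
sigma = 1.5
n = len(mu)

mubar = sum(mu) / n
k = n - 1                                              # tr(I - H) for H = (1/n) 1 1^T
lam = sum((m - mubar) ** 2 for m in mu) / sigma ** 2   # mu^T (I-H)^T (I-H) mu / sigma^2

N = 100_000
draws = []
for _ in range(N):
    y = [m + random.gauss(0.0, sigma) for m in mu]
    ybar = sum(y) / n
    rss = sum((yi - ybar) ** 2 for yi in y)  # (I - H) y = y - ybar for this H
    draws.append(rss / sigma ** 2)

mc_mean = sum(draws) / N
mc_var = sum((d - mc_mean) ** 2 for d in draws) / (N - 1)
print(f"mean: {mc_mean:.3f} vs k + lambda = {k + lam:.3f}")
print(f"var:  {mc_var:.3f} vs 2(k + 2 lambda) = {2 * (k + 2 * lam):.3f}")
```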