Talk:Covariance matrix

Pooled within-group covariance matrix
I have found use for the "pooled within-group covariance matrix." See for example http://people.revoledu.com/kardi/tutorial/LDA/Numerical%20Example.html or "Analyzing Multivariate Data" by Lattin, Carroll, and Green. This page seems a natural place to put a definition for such a thing, but it doesn't exactly fit into the flow of the page. Suggestions? Otherwise I'll just jump in at a future date. dfrankow (talk) 16:54, 29 December 2008 (UTC)

New problem...
This article starts with a big mess of slashes and sigmas and brackets... not sure if this is just my browser rendering something in a strange manner (though I'm using firefox, so I expect I'm not the only one seeing this) or quite what it is. I don't know how to fix it either; maybe somebody else does?

—The preceding unsigned comment was added by 129.169.10.56 (talk) 16:32, 5 December 2006 (UTC).

Explanatory Formula
This formula looks very complicated at first sight. Actually its derivation is quite simple (for simplicity we assume &mu; to be 0; just replace X by X - &mu; everywhere if you want):
 * 1) fix a direction in n dimensions (unit vector), let's call it u
 * 2) project your data onto this direction (you get a number for each of your data vectors, or taking all together a set of scalar samples); you perform just a scalar product i.e.: $$ S(i) = (X(i),u) $$
 * 3) compute ordinary variance for this new set of scalar numbers

We are almost finished. Of course for every unit vector you get (in general) different values, so you do not have just one number like in the scalar case but a whole bunch of numbers (a continuum) parametrised by the unit vectors in n dimensions (actually only the direction counts; u and -u give the same value). Now comes the big trick: we do not have to keep this infinity of numbers, because, as you can see below, all the information is contained in the covariance matrix (wow!)

$$ {\rm var} {((u,X))} = E((u,X)^2) = E((u,X)(u,X)) = E(u^\top XX^\top u)$$

Now because u is a constant we have:

$$E(u^\top XX^\top u) = u^\top E(XX^\top)u$$

or

$$ {\rm var} {((u,X))} = u^\top E(XX^\top)u = u^\top \Sigma u$$

and we are done... (easy, isn't it :)
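For anyone who wants to check the derivation above numerically, here is a quick NumPy sketch (the data and the direction u are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10000))          # 3-dimensional samples, mean ~ 0
Sigma = X @ X.T / X.shape[1]                 # E[X X^T], with mu taken as 0

u = np.array([1.0, 2.0, -1.0])
u /= np.linalg.norm(u)                       # unit direction

S = u @ X                                    # scalar projections (u, X_i)
lhs = np.mean(S**2)                          # ordinary variance of the projections
rhs = u @ Sigma @ u                          # u^T Sigma u
```

The two numbers agree, which is exactly the identity var((u,X)) = u^T Sigma u.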

Comments moved here to Talk page
I have moved the comments above to this discussion page for several reasons. The assertion that this very simple formula looks "very complicated" seems exceedingly silly and definitely not NPOV. Then whoever wrote it refers to "its derivation". That makes no sense. It is a definition, not an identity or any sort of proposition. What proposition that author was trying to derive is never stated. The writing is a model of unclarity. Michael Hardy 22:52 Mar 12, 2003 (UTC)

I was attempting to explain covariance matrix

 * Okay I have written the stuff above and would like to reply...
 * 1) It may be true that for someone familiar with statistics the covariance matrix is immediately understandable... nonetheless for someone outside of the statistics world (like me) it looks complicated...
 * 2) by derivation I actually meant Motivation, or some hint how one can understand the meaning of covariance matrix and what it can be used for.
 * If you just throw the formula at people most of them do not understand its meaning and it is not clear at all why a formula like this makes any sense...
 * Of course you can let everyone find out by him/herself but this is just waste of time since the underlying concept is very simple...I did not try to prove any statement.
 * 3) what counts as clear writing is certainly debatable ;-) maybe it is not as watertight and correct as yours, but the idea is quite clear and that is the only thing that counts...

Idea is not clear

 * Except that the idea is not clear without considerable interpretation, and even then I'm not entirely sure of what was meant. I think that what was intended could certainly be said more clearly with far fewer words.  I'll make some attempt at this within a few days. Michael Hardy 20:48 Mar 13, 2003 (UTC)

Another explanation
Okay I try to explain the idea more clearly:

if you have some set of vector measurements you can consider it as a cloud of points in n dimensions. If you want to find something interesting about your data set you can look at the data from different directions, or, what is essentially the same, perform a projection into 1, 2, or 3 dimensions. But there are many possible projections; which one to take? Life is not long enough to try them all ;-) One criterion you can apply (not the best one, but it is better than nothing and it works sometimes...) is to look at directions in which the data have large variance (this makes sense if you want to find the most "energetic" components...). I tried to explain that the covariance matrix is a tool (at least it can be interpreted in this way) to represent the data variances in all possible directions in an effective and compact way. Once you understand this it is immediately clear why it is useful to look for the eigenvalues and eigenvectors of the covariance matrix.

I cannot wait to see your version of this! ;-)
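The geometric picture above can be sketched numerically: the direction of largest projected variance is exactly the top eigenvector of the covariance matrix (the matrix A and the seed below are arbitrary, just to make an anisotropic cloud):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 1.0], [0.0, 0.5]])
X = A @ rng.standard_normal((2, 20000))      # stretched 2-D point cloud

Sigma = np.cov(X)                            # 2x2 sample covariance
evals, evecs = np.linalg.eigh(Sigma)         # eigenvalues in ascending order
u_best = evecs[:, -1]                        # direction of largest variance

proj_var = np.var(u_best @ X, ddof=1)        # variance of data projected on u_best
```

The projected variance along u_best equals the top eigenvalue, which is why eigen-analysis of the covariance matrix finds the "most energetic" directions.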

Rephrased

 * I'll try my hand at the above:


 * If you have many measurements of similar things (such as GNP and airplane passengers for several years), such lists of information can be manipulated with linear algebra as vectors of numbers. If graphs are drawn of all useful combinations of these numbers, the result is many separate points on many graphs.  Comparisons between the numerous graphs can be done if all the numbers are placed within one graph with as many dimensions as there are data items, thus creating a "cloud" of points in this space.


 * For example, GNP of one country for several years can be shown on a page as a graph of years by values. That has only two dimensions.  Airline passenger numbers can be shown to the side of that graph in a three dimensional display.  Adding another measurement, such as electricity consumed, requires a fourth dimension which is often hard to visualise.


 * Finding interesting information about such combinations can be difficult. One issue to examine is which part of that cloud of points has the greatest difference.  The covariance matrix is the result of calculating distances between all the data points in all directions, and producing the direction through the cloud which has the greatest differences.  Recording the distances between different types of data within the cloud produces the covariance matrix, where larger numbers suggest greater relationships.


 * (should eigenvalues and eigenvectors be introduced here, or are those considered to be parts of calculations other than covariance?) SEWilco 18:55, 15 Jan 2004 (UTC)

Article adjusted
I have inserted into the article some language that I think addresses your point, which I still think was quite unclear as you wrote it originally. Michael Hardy 19:53 Mar 14, 2003 (UTC)

PS: "One criterion is ....", but some other criteria may exist. (My point here is that in standard English, "criterion" is the singular. Michael Hardy 19:54 Mar 14, 2003 (UTC)

Still not clear and lost illustration

 * Unfortunately I'm not very happy with your version of my explanation.
 * Such sort of explanations can be found in any book, and I never found them very helpful. You have completely lost the geometric picture which makes all clear and simple. To say that the matrix entries are covariances between the variables

does not explain anything, and actually makes it more complicated because as you said this depends on the basis...
 * I would prefer to let decide the people who visit this site by themselves

which version they find more illuminating... At the moment it is hardly possible because my version is quite hidden :-(


 * Ps.
 * With "criterion" you are right, I'm not very good in english...
 * (even though you understand almost everything, what a miracle!)

Still looks unclear

 * I can write a more leisurely explanation when I have time, but your version still looks unclear to me as it stands. Michael Hardy 22:43 Mar 17, 2003 (UTC)

Please explain what is defined
I think that it is better to explain the things one defines. The above explanations helped me more to understand the topic than the mathematically absolutely correct formulas one finds when looking for "covariance matrix". In my opinion most of the people who use Wikipedia are interested in both versions, so they should see them on the same page and not hidden in the discussion group.

What is unclear?

 * Maybe you could be more specific about "looks unclear to me"... I tried to write as clearly as I can, because I want everyone to understand it with ease. If there is something unclear to you maybe I could explain it better, but at the moment I do not know what is unclear.

Please label your comments
People, label yourself in your comments so we know who is talking. Also be a little more specific about what you are pointing at. There are too many "that", "I", and "you" for it to be clear who is talking about what. I suggest the four-tilde signature so the date is included. SEWilco 17:11, 15 Jan 2004 (UTC)

Yet Another Rephrasing
(SEWilco 08:39, 7 Jul 2004 (UTC)) Maybe something like this will be useful:
 * A vector is a list of numbers. The variance is the square of the difference between a number and an expected value; for example, the variance of two lengths is an area the size of the square of their difference.  The covariance matrix has a list of numbers along one side, and along the other side a list of the expected values for each listed value.  Each position of the matrix is filled in with the square of the difference between those two numbers.  The covariance matrix then contains the variance between each listed number and the expected values of all numbers in the list.  This shows how different all the numbers are from all the expected values.

Nonstandard notation?
I've never encountered the usage of $$var(\textbf{X})$$ for denoting the covariance matrix. I've always used $$\textbf{C}_X$$ for this (and $$\textbf{R}_X$$ for the autocorrelation matrix). This is standard notational practice in the field of signal processing. --Fredrik Orderud 12:26, 20 Apr 2005 (UTC)


 * How about $$\operatorname{cov}(X)$$? Cburnett 14:24, Apr 20, 2005 (UTC)


 * I've now changed the notation to "cov", which is in accordance with mathworld. I've also added a separate "signal processing" section containing the different notation used there. --Fredrik Orderud 21:02, 28 Apr 2005 (UTC)


 * Pardon me, but who wrote the comments immediately above??? Michael Hardy 20:07, 28 Apr 2005 (UTC)


 * Sorry. It was me, and I forgot to sign. --Fredrik Orderud 21:02, 28 Apr 2005 (UTC)

Standard notation:
 * $$\operatorname{var}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]$$

ALSO standard notation:
 * $$ \operatorname{cov}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]$$

ALSO standard notation:
 * $$\operatorname{cov}(\textbf{X},\textbf{Y}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{Y} - E[\textbf{Y}])^{T}]$$ (the "cross-covariance" between two random vectors)

Unfortunately the first two of these usages jar with each other. The first and third are in perfect harmony. The first notation is found in William Feller's celebrated two-volume book on probability, which everybody is familiar with, so it's surprising that some people are suggesting it first appeared on Wikipedia. It's also found in some statistics texts, e.g., Sanford Weisberg's linear regression text. Michael Hardy 18:05, 28 Apr 2005 (UTC)
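All three of the notations above denote the same computation; here is a small NumPy check of that (np.cov implements the sample analogue of var(X) = cov(X)):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 5000))
Y = rng.standard_normal((3, 5000))

Xc = X - X.mean(axis=1, keepdims=True)       # X - E[X], sample version
Yc = Y - Y.mean(axis=1, keepdims=True)

var_X = Xc @ Xc.T / (X.shape[1] - 1)         # var(X) = cov(X) = cov(X, X)
cov_XY = Xc @ Yc.T / (X.shape[1] - 1)        # cross-covariance cov(X, Y)
```

var_X matches np.cov(X), and the two-argument cov(X, Y) is simply the same formula with Y in the second slot.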

Horrible mess!!
This article was starting to become an exemplar of crackpothood. Someone who apparently didn't like the opening paragraphs, instead of replacing them with other material, simply put the other material above those opening paragraphs, so that the article started over again, saying, in a later paragraph, "In statistics, a covariance matrix is ..." etc., and giving the same elaborate definition again, with stylistic differences. And that eventually became the second of FOUR such iterations, with stylistic differences! Other things were wrong too. Why, for example, was there a "stub" notice?? There should have been a "cleanup" notice right at the top, instead of a "stub" notice at the bottom. Inline TeX often gets misaligned or appears far too big, or both, on many browsers, but it looks as if someone went through and replaced perfectly good-looking non-TeX inline mathematical notation with TeX (e.g., "an n &times; n matrix" ---> "an $$n \times n$$ matrix"). (TeX generally looks very good when "displayed", however. And when TeX is used in the normal way, as opposed to its use on Wikipedia, there's certainly no problem with inline math notation.)  Using lower-case letters for random variables is jarring, since in many cases one wants to write such things as


 * FX(x) = Pr(X &le; x)

and it is crucial to be careful about which of the "x"s above are capital and which are lower-case. The cleanup isn't finished yet .... Michael Hardy 19:16, 28 Apr 2005 (UTC)

Properties
I added the list of properties in the article. There were only two of them stated, and these properties should definitely be in an article about cov and var matrices. --Steffen Grønneberg 14:06, 5 October 2005 (UTC)

Somehow I find it difficult to understand the 5th property:
 * 5. $$ \operatorname{cov}(\mathbf{X},\mathbf{X}) = \operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$$

Shouldn't the correct formula be:
 * 5. $$ \operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$$

, since no relationship between "X" and "Y" is defined? Does anyone have a reference on this? --Fredrik Orderud 12:01, 11 October 2005 (UTC)

Yep, my bad. Fixed it now. The reference I used is Multivariate Analysis by K. V. Mardia, J. T. Kent, J. M. Bibby. It's in chap. 3. (http://www.amazon.com/gp/product/0124712525/103-2355319-3731041?v=glance&n=283155&n=507846&s=books&v=glance) Is there a place where it's usual to cite this? --Steffen Grønneberg 23:10, 12 October 2005 (UTC)
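A quick numerical check of the corrected property 5, cov(X, Y) = cov(Y, X)^T, using made-up data for two unrelated random vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 1000))
Y = rng.standard_normal((4, 1000))

def cross_cov(A, B):
    """Sample cross-covariance E[(A - EA)(B - EB)^T]."""
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    return Ac @ Bc.T / (A.shape[1] - 1)

lhs = cross_cov(X, Y)
rhs = cross_cov(Y, X).T
```

The identity holds with no relationship assumed between X and Y, which is why the extra cov(X, X) term in the earlier version was wrong.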

Before I go make a fool of myself, shouldn't A and B be q x p matrices instead of p x q matrices? --The imp 15:32, 15 March 2006 (UTC)

I think the A and B matrices have the proper dimensions, but I changed the description of $$\mathbf{Y}$$ to a p x 1 vector (from q x 1) and changed $$\mathbf{a}$$ to a q x 1 vector (from a p x 1). I did this based on:
 * Properties 4, 5, 8, etc. that seem to imply that $$\mathbf{X}$$ and $$\mathbf{Y}$$ have the same dimensions.
 * Property 3, where $$\mathbf{AX}$$ would be a q x 1 vector, so I don't think it makes sense to add $$\mathbf{a}$$ if $$\mathbf{a}$$ is a p x 1 vector.

If I'm wrong, my apologies - feel free to correct it back to the original. —Preceding unsigned comment added by 64.22.160.1 (talk) 21:37, 8 May 2008 (UTC)

Conflict of var and cov?
I don't see how var(X,Y) and cov(X,Y) "conflict". We don't say that gamma(x) and x! or asin and arcsin "conflict" or "jar". It is perfectly OK to have var(X)=cov(X)=cov(X,X). The fact that there is a one-argument function cov doesn't exclude there being a two-argument, related, function. Consider, say, Γ(x)=Γ(x,0). --Macrakis 21:51, 21 December 2005 (UTC)


 * Hear, hear!! "Conflicting" notation is things like "is 0 in $$R^+$$" and "where does the $$2\pi$$ go in the Fourier Transform?".  I've toned down the expression of conflict. LachlanA 00:47, 22 January 2007 (UTC)

Calculation of covariance matrix
It would be nice to have a section on computational methods for calculating covariance matrices. I am unfortunately not competent to write it.... Any volunteers? --Macrakis 21:51, 21 December 2005 (UTC)

public Matrix CovarianceMatrix(double[,] myArray)
{
    // For a number of TS, n, there are n*(n-1) covar's.
    // Cov(X, X) = Var(X)
    // Cov(P, Q) = Cov(Q, P)
    // vcvMatrix is symmetric square, with Var(i) on the leading diagonal.
    // vcvMatrix is positive semi-definite (should I include a safety test?)
    // Cov(P, Q) is NOT unitless; its units are those of P times those of Q.
    int nCols = myArray.GetLength(1);
    int nRows = myArray.GetLength(0);
    Matrix vcvMatrix = new Matrix(nCols, nCols);
    double[] u = mean(myArray);
    for (int i = 0; i < nCols; i++)      // rows of the vcvMatrix
    {
        for (int j = 0; j < nCols; j++)  // cols of the vcvMatrix
        {
            double temp = 0;
            for (int z = 0; z < nRows; z++)
            {
                temp += (myArray[z, i] - u[i]) * (myArray[z, j] - u[j]);
            }
            double covar = temp / (nRows - 1);
            vcvMatrix[i, j] = covar;
        }
    }
    if (!vcvMatrix.Symmetric)
    {
        throw new ApplicationException("VCV matrix is not symmetric");
    }
    return vcvMatrix;
}


 * Basically by computation, you mean estimation, so this belongs in estimation of covariance matrices or sample mean and sample covariance. Prax54 (talk) 14:43, 10 August 2012 (UTC)


 * However, it would be helpful to many readers if this article pointed out the difference between a covariance matrix for random variables and a sample covariance matrix (and perhaps a matrix of covariance estimators) and at least gave a link to an article on sample covariance matrices.  I think all the articles on statistical items should perform a similar service for their respective topics - e.g. sample variance vs variance of a random variable, sample mean vs mean of a random variable.  This would be extremely redundant and unnecessary if we regarded Wikipedia as a big textbook.  From that point of view, these distinctions should be made in one introductory chapter.  But given that most readers consult individual articles, I think it's a reasonable approach.

Tashiro (talk) 16:43, 30 September 2012 (UTC)

Possible covariance matrices
What are the restrictions on what matrices can be covariance matrices? I guess the matrix has to be symmetric; is any symmetric matrix a possible covariance matrix? --Trovatore 23:11, 19 June 2006 (UTC)


 * A square matrix with real entries is a covariance matrix if and only if it is non-negative definite. If X is a column vector-valued random variable (taken here to have mean zero), then the expected value of XXT is the covariance matrix of the scalar components of X, so it should be clear why that has to be non-negative definite.  By the spectral theorem in its finite-dimensional version, every non-negative definite real matrix M has a non-negative definite real square root, which let us call M1/2.  Then let X be any column vector of the right size whose entries are random variables the variance of each of which is 1 and the covariance between any two of which is 0.  Then the covariance matrix of the entries in M1/2X is M.  So any non-negative definite matrix is a covariance matrix. Michael Hardy 17:09, 20 June 2006 (UTC)
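The construction in the comment above can be sketched in NumPy (M below is an arbitrary non-negative definite example; with white noise pushed through M1/2, the sample covariance approaches M):

```python
import numpy as np

M = np.array([[2.0, 0.8], [0.8, 1.0]])       # symmetric, non-negative definite
evals, evecs = np.linalg.eigh(M)
M_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T   # symmetric square root

rng = np.random.default_rng(4)
X = rng.standard_normal((2, 200000))         # unit variances, zero covariances
Y = M_half @ X                               # entries of M^{1/2} X

Sigma_hat = np.cov(Y)                        # should be close to M
```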


 * Suppose you know all diagonal entries and some off-diagonal, and you want to generate a nonnegative-definite symmetric matrix having those entries. Is there an efficient way to generate such a thing, in the large-finite-dimensional case? --Trovatore 22:48, 23 June 2006 (UTC)

Expected value operator
In this definition, the 'expected value operator' (mu) is used. Per my Excel program's explanation of covariance, mu is just the average. Isn't it simpler to just say that mu is the average value of X rather than the 'expected value'? —The preceding unsigned comment was added by Steve 10-Jan-0771.121.7.79 (talk) 03:41, 11 January 2007 (UTC).

Yes, but "average" is quite ambiguous. It could be a weighted mean, the geometric mean, the median, or lots of other things. The "expected value" is a standard term for the equally-weighted arithmetic mean. LachlanA 00:30, 22 January 2007 (UTC)


 * Also, terms like mean and covariance can be used for the estimators as well as for the parameters they estimate, whereas expected value has no such ambiguity. Btyner (talk) 18:58, 13 January 2008 (UTC)

You need to add what the typical E function is that is used in practice. That is, you don't divide by N, but N-1 typically. I frankly think using expectation is a mistake, as it makes this article more difficult for the newbie than it needs to be. Sure, it may be more general, but that doesn't mean more helpful, clear, or useful. —Preceding unsigned comment added by 71.111.251.229 (talk) 05:53, 2 March 2008 (UTC)
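On the N versus N-1 point above: np.cov, for instance, divides by N-1 by default. A tiny example with numbers small enough to check by hand:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])           # mean is 3.5
n = len(x)
xc = x - x.mean()                            # deviations: -2.5, -1.5, 0.5, 3.5

var_unbiased = xc @ xc / (n - 1)             # 21 / 3 = 7.0  (what np.cov uses)
var_population = xc @ xc / n                 # 21 / 4 = 5.25
```

Dividing by N-1 gives the unbiased sample estimator; dividing by N gives the expectation over the sample itself.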

Introduction
"In statistics and probability theory, the covariance matrix is a matrix of covariances between elements of a vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.

If X is a column vector with n scalar random variable components, and μk is the expected value of the kth element of X, i.e., μk = E(Xk), then the covariance matrix is defined as:"

I guess many people, like me, come to visit this article because they want to do some kind of statistical analysis within their studies or other related work. Of course I can only speak for myself but the explanation that a covariance matrix is "a matrix of covariances" did not really help. Also the fact that it's a natural generalization to higher dimension of some concept didn't improve my understanding (and looking at the other comments I'm appearently not alone). Maybe somebody could add an introductory explanation or even a section in the text where this concept is explained for the uninitiated. 84.168.17.109 10:43, 7 February 2007 (UTC)

Agree, this article needs to be made more accessible.67.180.143.83 19:49, 9 February 2007 (UTC)

Extended References and Deeper Development
Although the Wolfram site can be useful, it seems as though this article should point to at least a few good texts which discuss covariance matrices, their properties, their applications, in full glory. I personally only have seen them in one text, which was not a very good text for the ins and outs of covariance matrices (van Kampen); I am unenthusiastically adding it to the Further Reading section. Does anyone have anything better?

Also, it seems as though at least a little bit more development on the Wikipedia page would be nice: e.g., the mathematical and physical meaning of the "generalized variance" (determinant of Cov Matrix).

Alex Roberts 19:46, 20 May 2007 (UTC)

The van Kampen is a very good book, but very advanced and not really about statistics, more about random processes in molecules. Much better for this topic would be "Mathematical Statistics and Data Analysis" by John A. Rice or "Introduction to Mathematical Statistics" by P. G. Hoel, which is old but excellent. —Preceding unsigned comment added by AidanTwomey (talk • contribs) 09:14, 28 March 2008 (UTC)

FYI: Moment-of-inertia discussion
You may be interested in this discussion on the relationship between the covariance matrix and the moment-of-inertia matrix. —Ben FrantzDale (talk) 20:29, 27 August 2008 (UTC)

Standard deviation matrix?
The section "Which matrices are covariance matrices" mentions the existence of the matrix square root of a covariance matrix. Is there a name for this? It seems like "standard deviation matrix" or something similar would be appropriate. That is, aren't the eigenvalues of $$M^{1/2}$$ the standard deviations in the principal directions and so if you wanted to draw a "confidence ellipsoid", you'd transform a sphere by $$M^{1/2}$$, just like you might draw confidence intervals at ±&sigma; for a 1-D distribution? —Ben FrantzDale (talk) 14:37, 4 September 2008 (UTC)


 * I don't think there is a general name. Note that in this context the square root is not unique, and there are names associated with the various ways of defining the square root being used. For example "Cholesky decomposition" or "LU decomposition" for one computationally convenient approach, or "symmetric square root" for one based on the eigenvalue decomposition. Melcombe (talk) 13:33, 8 September 2008 (UTC)


 * Interesting. I hadn't thought about the square root being non-unique. I was assuming the root based on eigenvalue decomposition. Would Cholesky decomposition or LU decomposition even produce a meaningful square root in this context? (What would their square root mean?) —Ben FrantzDale (talk) 16:01, 12 September 2008 (UTC)


 * The "standard deviation matrix" is commonly called a whitening matrix or whitening transformation. When the covariance matrix is positive definite the Cholesky decomposition is defined, and can be used as a square-root matrix. Otherwise, when the covariance matrix is positive semi-definite (PSD), using the Singular Value Decomposition (SVD) you can produce a symmetric (numerically almost symmetric) square root matrix. See the book "An Introduction to Multivariate Statistical Analysis". Prax54 (talk) 14:21, 10 August 2012 (UTC)

Tensor?
Is the covariance matrix a tensor? That is, does it transform as a tensor? It appear so. —Ben FrantzDale (talk) 19:08, 26 November 2008 (UTC)
 * I've never gotten accustomed to this language of "transforming" that physicists use when they talk about tensors. But it is an object for which the operation of matrix multiplication makes sense, so I suspect that may be the same thing. Michael Hardy (talk) 00:59, 30 December 2008 (UTC)
 * In as much as the covariance matrix is the expected outer product of the error, it is definitely a tensor. (I think the math folks would call it a (0,2) tensor....) That is, in indicial notation,
 * $$\Sigma = \sum_{\mathbf{v}\in V} v\otimes v = \sum_{\mathbf{v}\in V} \mathbf{e}^i v_i \mathbf{e}^j v_j$$
 * and
 * $$\Sigma_{ij} = \sum_{\mathbf{v}\in V} v_i v_j$$
 * and so if you express v in a different coordinate system, you have to transform both vs in the above. —Ben FrantzDale (talk) 06:05, 11 November 2009 (UTC)
 * I think I had it backward. Try this:
 * The covariance matrix is the expected outer product of the error. Assuming we are measuring vectors (not dual vectors), this makes it the expectation of an outer product of contravariant vectors, which would make it a type (2,0) tensor, and in indicial notation we would have
 * $$\Sigma = E[\sum_{\mathbf{v}\in V} v\otimes v] = E[\sum_{\mathbf{v}\in V} \mathbf{e}_i v^i \mathbf{e}_j v^j]$$
 * and so
 * $$\Sigma^{ij} = E[\sum_{\mathbf{v}\in V} v^i v^j]$$
 * and so if you express v in a different coordinate system, you have to transform both vs in the above. Similarly, this means that the inverse covariance tensor, S, used in the Mahalanobis norm is a type (0,2) tensor, which is consistent with it making for a bilinear form as in $$\sqrt{S_{ij} x^i x^j}$$. —Ben FrantzDale (talk) 14:56, 4 December 2009 (UTC)
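In coordinates, the "transforms as a tensor" statement comes down to this: if y = Ax, then the covariance of y is A Sigma A^T, i.e. both factors of v transform. A numerical check with an arbitrary invertible A:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 5000))
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])              # invertible change of coordinates

Sigma_x = np.cov(X)
Sigma_y = np.cov(A @ X)                      # covariance in the new coordinates
```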

Relation to Hessian of log likelihood?
How does the Hessian of the log likelihood function for a zero-mean multivariate normal distribution relate to the covariance matrix? They appear to be matrix inverses of each other, but Wikipedia has no mention of it. —Ben FrantzDale (talk) 22:11, 29 December 2008 (UTC)


 * What is stated explicitly in multivariate normal distribution is the way in which the inverse of the covariance matrix occurs in the probability density function. Michael Hardy (talk) 01:34, 30 December 2008 (UTC)
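They are indeed inverses up to sign: for a zero-mean multivariate normal, the Hessian of the log-density with respect to x is -Sigma^{-1}. A finite-difference sketch (the particular Sigma and evaluation point are arbitrary; the normalising constant drops out of the Hessian):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_pdf(x):
    # log N(x; 0, Sigma), up to the additive normalising constant
    return -0.5 * x @ Sigma_inv @ x

x0 = np.array([0.3, -0.7])
h = 1e-5
n = len(x0)
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        e_i = np.eye(n)[i] * h
        e_j = np.eye(n)[j] * h
        # central mixed second difference for the (i, j) Hessian entry
        H[i, j] = (log_pdf(x0 + e_i + e_j) - log_pdf(x0 + e_i - e_j)
                   - log_pdf(x0 - e_i + e_j) + log_pdf(x0 - e_i - e_j)) / (4 * h * h)
```

Minus the numerical Hessian recovers the inverse covariance matrix.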

Complex random vectors
Is there any reference to the fact that covariance matrices for complex random vectors are Hermitian positive definite? Could they be Hermitian positive semidefinite? Does anybody have a citation or at least an explanation of why this is? Most books on probability only deal with real vectors, and when they do talk about complex vectors they don't go into stating the properties of the covariance matrix; they just define the special case when the vectors are complex. So, please, anybody with some insight about this, please. Felipe (talk) 19:41, 7 July 2010 (UTC)

XXT
Hmmm.. As far as I know the product X * Transpose(X) is not defined if X is a column matrix. It should be the opposite: Transpose(X) * X, shouldn't it? Yet, throughout the article, the first "notation" is used. —Preceding unsigned comment added by 217.211.151.32 (talk) 14:19, 6 March 2011 (UTC)


 * The article is correct as is. If X is n×1 then XXT is n×n, and its ij element is xixj.  Duoduoduo (talk) 17:31, 6 March 2011 (UTC)
 * Duoduoduo is right. You just follow regular matrix-multiplication rules and wind up with a big square matrix. This operation is also known as an outer product, as opposed to $$X^\top X$$, which is an inner product or scalar product, which results in just a scalar result. —Ben FrantzDale (talk) 13:39, 7 March 2011 (UTC)
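The two products side by side, for a concrete 3x1 column vector:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0]])          # 3x1 column vector

outer = X @ X.T                              # 3x3 outer product, entry (i, j) = x_i * x_j
inner = (X.T @ X).item()                     # 1x1 inner product, i.e. the scalar 1 + 4 + 9
```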

Definition contains errors
Hi all,

Since I am not a registered user I write here:

The definition of a covariance matrix as it is now suggests that X1 etc. are single elements, as the article starts with a random-variable explanation. This is misleading in the rest of the article as well, as X1 being a single element is not true.

Note that COV(X,Y) is the sum over all i of (Xi - mean(X))*(Yi - mean(Y)). Please note that the numbers of elements within X and Y have to be the same.

In plain English: If you have n observations from 2 variables X and Y then there is just one covariance. If you have n observations of m variables (i.e. your data has m dimensions), you will have m!/(2(m-2)!) covariances.

To have variances and covariances conveniently in one matrix, the variance-covariance matrix displays all those covariances and variances in one matrix.

I think this is a very important article being looked up a lot and it should be corrected.

A good reference on the topic, with a crystal clear explanation, can be found here: http://users.ecs.soton.ac.uk/hbr03r/pa037042.pdf

Cheers Ben —Preceding unsigned comment added by 129.31.217.14 (talk) 23:05, 12 April 2011 (UTC)
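For what it's worth, the bookkeeping described above looks like this in NumPy: n observations of m variables give an m x m variance-covariance matrix, with the m variances on the diagonal and the m!/(2(m-2)!) distinct covariances off it:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 100, 3
data = rng.standard_normal((n, m))           # rows = observations, columns = variables

V = np.cov(data, rowvar=False)               # m x m variance-covariance matrix
n_distinct_covs = m * (m - 1) // 2           # m!/(2(m-2)!) = 3 distinct covariances here
```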


 * The article's definition is correct. Xi is indeed a scalar random variable. X1 is a different random variable from X2. Both these random variables, and all the other ones, have kmax observations. The expectations in the article are taken over the observations k = 1, ..., kmax.  Duoduoduo (talk) 00:10, 13 April 2011 (UTC)

If the article mentioned this, yes; unfortunately it does not mention that it's taken over k = 1, ..., kmax, and even if it did, it would remain very confusing to use the same letters for vectors and scalars. —Preceding unsigned comment added by 129.31.216.148 (talk) 16:07, 13 April 2011 (UTC)


 * With my edit to the lede earlier today, it does mention that each of the variables Xi has a certain number of either observations or potential observations. I think that should make it clear. As for using the same letters for vectors and scalars, it simply uses boldface unsubscripted X and Y to refer to vectors, and unboldfaced subscripted X and Y to refer to scalars. This is standard, and I don't think there's any non-standard alternative that would be workable. However, I will put in something alerting the reader to the notational scheme. Duoduoduo (talk) 19:30, 13 April 2011 (UTC)

Not sure if this is the same problem, but I find the definition very confusing. X is referred to as a "column vector." It would therefore seem to be a vector of scalars, e.g. [1 2 3 4 5]. Xi would then be the ith element of this vector. It would seem, then, that $$\mu_i$$ is the mean of this ith element. Which can't be right. The definition seems to be using Xi ambiguously, referring in one context to the ith scalar in a vector of scalars, and in another context as a vector of scalars, namely, the vector of (say) observed values of a variable quantity (e.g., height) in a given sample. Is this correct? If so, is there a way to re-write the definition in a standard notation that would remove the ambiguity? That would be helpful. — Preceding unsigned comment added by 76.100.128.83 (talk) 19:45, 2 December 2013 (UTC)

Matrix random variables
There are now some articles about matrix-valued random variables: e.g. matrix normal distribution, matrix t distribution. Some of these refer to a covariance matrix for these matrix random variables. It would be good to have something here about how such a thing is defined, and if there is a standard way of doing this. Clearly one can re-arrange the matrix into a vector and get a covariance matrix for this, but there must be a standard for working by rows or columns etc., and possibly a different way of treating symmetric=matrix-random-variables. Melcombe (talk) 12:26, 5 July 2012 (UTC)

Inverse precision matrix
The article states "The inverse of this matrix, Σ−1 is the inverse covariance matrix, also known as the concentration matrix or precision matrix;[1] see precision (statistics). The elements of the precision matrix have an interpretation in terms of partial correlations and partial variances.[citation needed]"

It's vague as to what is meant by "an interpretation". If the elements are the partial correlations, then please just say so. I'm not certain myself, but it seems that this indeed is the interpretation given by users of graphical models, as in Friedman, Hastie, & Tibshirani, 2007:

"The basic model for continuous data assumes that the observations have a multivariate Gaussian distribution with mean μ and covariance matrix Σ. If the ijth component of Σ−1 is zero, then variables i and j are conditionally independent, given the other variables."

161.130.188.94 (talk) 15:46, 14 April 2014 (UTC)Joe Hilgard
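For what it's worth, the precise relation is that the elements of the precision matrix are *scaled negatives* of the partial correlations: with P = Σ⁻¹, the partial correlation of X_i and X_j given all other variables is −P[i,j] / √(P[i,i] P[j,j]). A quick numerical sketch (the matrix below is made up for illustration) cross-checks this against the textbook three-variable formula:

```python
# Sketch: partial correlation from the precision matrix P = inv(Sigma),
#   rho_ij|rest = -P[i, j] / sqrt(P[i, i] * P[j, j])
import numpy as np

Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
P = np.linalg.inv(Sigma)
partial_12 = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# Cross-check against the classical three-variable partial-correlation formula
r12, r13, r23 = Sigma[0, 1], Sigma[0, 2], Sigma[1, 2]
check = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))
print(abs(partial_12 - check))
```

This also explains the quoted graphical-models statement: a zero off-diagonal entry of Σ⁻¹ means a zero partial correlation, hence (under Gaussianity) conditional independence.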

Properties of the sample covariance matrix
The current Wikipedia article on "Covariance Matrix" has a section on its "properties". It would be useful to know how many of these properties are relevant and correct for the sample covariance matrix. The sample covariance matrix could be regarded as the covariance matrix of a population consisting exactly of the sample, except for the 1/(n-1) factor in the estimator. Is that difference important for "properties"?

Tashiro (talk) 19:15, 30 November 2014 (UTC)


 * The properties still hold for the covariance matrix of the sample and for that matrix adjusted by the factor 1/(n–1). Loraof (talk) 17:44, 5 July 2018 (UTC)
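A quick numerical check of the point above (a sketch; data and sizes are arbitrary): symmetry and positive semi-definiteness hold whether the sample covariance uses 1/n or 1/(n−1), since the two matrices differ only by a positive scalar factor.

```python
# Sketch: structural properties are invariant under the 1/n vs 1/(n-1) choice.
import numpy as np

rng = np.random.default_rng(2)
n = 50
data = rng.normal(size=(n, 4))

S_unbiased = np.cov(data, rowvar=False, ddof=1)  # 1/(n-1) normalization
S_mle = np.cov(data, rowvar=False, ddof=0)       # 1/n normalization

for S in (S_unbiased, S_mle):
    assert np.allclose(S, S.T)                   # symmetric
    assert np.linalg.eigvalsh(S).min() >= -1e-12 # positive semi-definite
print(np.allclose(S_mle, S_unbiased * (n - 1) / n))
```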

Second paragraph in the introduction
It seems to me that the second paragraph in the introduction is about something other than what I understand by "covariance matrix". By "covariance matrix" I understand a matrix where the (i,j)th element is the covariance between variables X_i and X_j, but in this paragraph there are two sets of variables and the (i,j)th element is the covariance between X_i and Y_j. This is different from a covariance matrix in many ways, and in particular, it need not be symmetric, positive definite or even square.

From a brief glance the rest of the article seems to be in line with what I'd expect. This includes the first and third paragraphs of the introduction, which thus seem to contradict the second one. Is this use of multiple definitions intentional or should it be changed? Nathaniel Virgo (talk) 05:42, 10 October 2016 (UTC)
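The distinction raised above is easy to see numerically (a sketch with arbitrary data): the cross-covariance matrix between an X in R² and a Y in R³ is 2×3, so it cannot be symmetric or positive semi-definite, unlike the covariance matrix of a single vector.

```python
# Sketch: the cross-covariance matrix cov(X, Y) need not be square.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))
Y = rng.normal(size=(1000, 3))

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
K_XY = Xc.T @ Yc / (len(X) - 1)  # (i, j) entry estimates cov(X_i, Y_j)
print(K_XY.shape)                # 2x3, not square
```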

Notation in definition
I am not familiar with the E operator being used for expectation value, but that may follow from my background in physics rather than statistics. Even so, there's a structural problem with the E notation. It's not really an operator because nothing is actually being operated on. X_i itself doesn't contain any information about its own mean value, so structurally, E(X_i) is a single token variable that represents some known mean value of X_i. It certainly doesn't look like a single token, and that can lead to serious confusion on the part of the reader while peppering the page with numerous E's. Wolfram Alpha uses $\left\langle\mathrm{angle brackets}\right\rangle$, which I also find (slightly less) confusing for the same reason.

I suggest we use the $\overline{\mathrm{overline}}$, which clearly indicates that $\overline{X_i}$ is a different variable than $X_i$  and that you can't get $\overline{X_i}$  by doing $\sum_i^N\frac{X_i}{N}$ , which is tempting and wrong. It also greatly shortens several formulae. This was standard notation in my mathematical education, and I'm sure I could dig up a few textbooks that use it.

Please discuss before we make any changes. Acronymsical (talk) 15:57, 3 August 2017 (UTC)

Explanation for positive semi-definiteness
Section "Which matrices are covariance matrices?" reads:

"From the symmetry of the covariance matrix's definition it follows that only a positive-semidefinite matrix can be a covariance matrix."

This is false: symmetry (even with nonnegative diagonal entries) does not imply positive semi-definiteness, e.g.

$$\begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = -2 < 0.$$

The correct justification for the positive semi-definiteness of covariance matrices is:

$$ w^{\rm T} \operatorname{E} \left[(\mathbf{X} - \operatorname{E}[\mathbf{X}]) (\mathbf{X} - \operatorname{E}[\mathbf{X}])^{\rm T}\right] w = \operatorname{E} \left[w^{\rm T}(\mathbf{X} - \operatorname{E}[\mathbf{X}]) (\mathbf{X} - \operatorname{E}[\mathbf{X}])^{\rm T}w\right] = \operatorname{E} \left[(w^{\rm T}(\mathbf{X} - \operatorname{E}[\mathbf{X}]))^2\right] \geq 0 $$

I made this change. I found the proof in SE, but I don't know if I should cite SE. Also see 3.21 in Wasserman. — Preceding unsigned comment added by 2001:16A2:87EF:4900:782C:159B:2B2D:753B (talk) 14:48, 22 May 2018 (UTC)
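The quadratic-form argument above can also be illustrated numerically (a sketch with arbitrary data): for any vector w, wᵀKw equals the sample variance of the projected data wᵀX, which is nonnegative by construction.

```python
# Sketch: w^T K w = variance of the scalar projections data @ w, hence >= 0.
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=(200, 3))
K = np.cov(data, rowvar=False)       # default normalization is 1/(n-1)

w = rng.normal(size=3)
quad_form = w @ K @ w
proj_var = np.var(data @ w, ddof=1)  # matching 1/(n-1) variance of projections
print(np.isclose(quad_form, proj_var), quad_form >= 0)
```

This is exactly step 2) and 3) of the "Explanatory Formula" derivation earlier on this page: projecting onto a direction and taking the ordinary variance.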

Consistent typography
The typography of the Covariance matrix section is inconsistent with the other sections, e.g. Covariance matrix. I see three ways how to restore consistency:

1. Edit bold and italic fonts in Covariance matrix to be consistent with Eq. 1, which is

2. Unfortunately, the typography in above equation itself is inconsistent since K is a matrix and &mu; is a vector, so they should be bold and roman (and T also should be roman):

3. However, subscripts X should be regarded as names rather than vectors, since using a vector as a subscript is an ill-defined concept. Therefore these subscripts should not be bold:

I think convention 3 is the best but most Wikipedia articles on statistics use convention 2. Before I add a couple of sections to this article (see User_talk:Sandstein) I would like to correct the typography but do not want to make it inconsistent with other articles. What do you think? Is it important to keep it consistent across several articles? Or only within any given article? Or is it OK to mix conventions even within the same article? FizykLJF (talk) 18:59, 3 August 2019 (UTC)

Covariance mapping section
As described in the previous post, I have added a section on "Covariance mapping" and corrected some inconsistent typography and notation in other sections. I declare a possible conflict of interest: I am an expert in this research area and the author of reference 9. FizykLJF (talk) 14:05, 18 August 2019 (UTC)

Estimation, reference to "data matrices"
The section links to a definition of "data matrices" where a data matrix is defined to contain one experiment per row. But in the same sentence two data matrices are defined with one experiment per column. This is confusing. The easiest solution would be to just remove the link to the "data matrix" article. A better solution might be to actually _use_ the "one experiment per row" definition, and adapt the formulas in the section on estimation accordingly. Ernstkl (talk) 09:57, 12 July 2021 (UTC)
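The convention mismatch described above is concretely the rowvar question in numerical code (a sketch with arbitrary data): with one experiment per row, the covariance routine must be told so, and the two conventions give the same estimate once the data matrix is transposed accordingly.

```python
# Sketch: "one experiment per row" vs "one experiment per column" conventions.
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=(100, 4))     # one observation (experiment) per row

C_rows = np.cov(data, rowvar=False)  # rows are observations
C_cols = np.cov(data.T)              # default: rows are variables, columns observations
print(np.allclose(C_rows, C_cols))
```

Whichever convention the article settles on, the formulas and the linked "data matrix" definition should use the same one.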

Autocorrelation matrix: wrong definition
Happy to be corrected here, but what is the reference for the definition currently given (02 Mar 22) for the "autocorrelation matrix" as E[XX^T]? I have never seen this before. My understanding is that the autocorrelation matrix is the same as the "correlation matrix" given in the next subsection. E[XX^T] is just the non-centred (auto-)covariance matrix. I'm not aware it has a name, and it's not terribly useful by itself. I suggest "Relation to the autocorrelation matrix" is removed or completely replaced. I wanted to see reactions before I attempt this.
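Whatever name the article settles on for E[XXᵀ], its relation to the covariance matrix is simple and easy to verify numerically (a sketch; the mean values below are arbitrary): E[XXᵀ] = K + μμᵀ, i.e. the non-centred second-moment matrix is the covariance matrix plus the outer product of the mean with itself.

```python
# Sketch: E[X X^T] = covariance matrix + outer product of the mean.
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(loc=[1.0, -2.0], size=(100000, 2))

R = data.T @ data / len(data)           # sample estimate of E[X X^T]
mu = data.mean(axis=0)
K = np.cov(data, rowvar=False, ddof=0)  # matching 1/n covariance estimate
print(np.allclose(R, K + np.outer(mu, mu)))
```

The identity holds exactly for the sample quantities when both use the 1/n normalization, which is why ddof=0 is used above.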

Variable rho used without definition
The variable rho is used without definition in section "Inverse of the covariance matrix." Jaguarmountain (talk) 17:43, 30 July 2023 (UTC)