Talk:Covariance/Archive 1

Redundancy
After the latest edit, the article includes this:


 * If X and Y are independent, then their covariance is zero. This follows because under independence,


 * $$E(X \cdot Y)=E(X) \cdot E(Y)=\mu\nu,$$


 * The converse, however, is not true: it is possible that X and Y are not independent, yet their covariance is zero. This is because although under statistical independence,


 * $$E(X \cdot Y)=E(X) \cdot E(Y)=\mu\nu,$$


 * the converse is not true.

The second sentence seems to be just a repetition of the first. Why is the addition of the second sentence, repeating the content of the first, an improvement? Michael Hardy

18:15, 21 August 2006 (UTC)


 * Clarification: On second thought, what I meant was: the part that begins with "This is because..." and ends with "... is not true" seems to repeat what came before it. Michael Hardy 18:24, 21 August 2006 (UTC)

You're right! I introduced some (partial) redundancy - I'll go back to amend it.

{I'm new to Wikipedia & still haven't sorted this 'Talk' thing out yet - so please bear with me while I learn!} Johnbibby 16:59, 22 August 2006 (UTC)

algorithm to estimate
This article should include an (easily understandable) outline of an algorithm to estimate the covariance between two finite sets of N measurements of variables X and Y. A note should be included about maximum likelihood vs. unbiased estimators and how to convert between the two.

e.g. start with two random variables X and Y, each with N measured values:

X = { X_1, X_2, ..., X_N } = { X_n }, n = 1 ... N
Y = { Y_1, Y_2, ..., Y_N }

estimate their means

mu_X = sum(X_i)/N, i = 1 ... N
mu_Y = sum(Y_i)/N, i = 1 ... N

'centre' the values about their estimated mean

centreX = { X_i - mu_X }, i = 1 ... N
centreY = { Y_i - mu_Y }, i = 1 ... N

then estimate the covariance of X and Y:

Cov(X, Y) = sum( centreX_i * centreY_i ) / (N - 1), i = 1 ... N (unbiased)
Cov(X, Y) = sum( centreX_i * centreY_i ) / N, i = 1 ... N (maximum likelihood)

I'm not entirely sure that I've got unbiased / maximum likelihood correct, but this paper seems to agree at the bottom of page 1 and top of page 2.

142.103.107.125 23:34, 30 August 2006 (UTC)
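
A minimal Python sketch of the outline above (the function name and defaults are my own, not from any source; it assumes two equal-length sequences of numbers):

def sample_covariance(xs, ys, unbiased=True):
    # estimate the means
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # 'centre' the values about their estimated means and sum the products
    s = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    # unbiased divides by N - 1; maximum likelihood divides by N
    return s / (n - 1) if unbiased else s / n

To convert between the two estimators, multiply the unbiased estimate by (N - 1)/N to get the maximum-likelihood one, or divide by it to go the other way.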

outer product?
Outer_product mentions that the outer product can be used for computing the covariance and auto-covariance matrices for two random variables. How this is accomplished should be outlined on this page, or that page... somewhere. 142.103.107.125 00:53, 31 August 2006 (UTC)
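
For what it's worth, here is a hedged sketch of the connection (using NumPy; the variable names are my own): the sample covariance matrix of a random vector can be written as an average of outer products of the centred observations with themselves.

import numpy as np

# N observations of a d-dimensional random vector, one observation per row
X = np.random.default_rng(0).normal(size=(100, 3))
mu = X.mean(axis=0)

# average of outer products of the centred rows (unbiased normalization)
C = sum(np.outer(x - mu, x - mu) for x in X) / (len(X) - 1)

assert np.allclose(C, np.cov(X, rowvar=False))  # matches NumPy's estimator

The cross-covariance matrix of two random vectors works the same way, using np.outer(x - mu_x, y - mu_y) instead.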

How about a beginners definition
Don't get me wrong, I think it's great that Wikipedia has brainiacs who want to include all kinds of details. But how about a really good definition for math newbies? How about a metaphor or an example for folks who only want to understand it enough to complete a conversation and then go back to their relatively mathless lives? —The preceding unsigned comment was added by Tghounsell (talk • contribs) 01:09, 20 March 2007 (UTC).

Myu vs v thing
Why are we using a little v thing for the mean of y? Shouldn't we use $$\mu_y$$? Fresheneesz 07:28, 21 March 2007 (UTC)

ν is the Greek letter "nu", which comes after μ in the alphabet. 66.28.71.70 17:47, 29 March 2007 (UTC)

Inner product
The last part of the section on inner product is not clear. Perhaps someone could explain better why "random variables" is in quotes, for example. Of course X + K is not distributed the same as X; usually the mean will be different. That doesn't explain why "random variables" is in quotation marks. Maybe it should say it's because a constant K isn't really a random variable? Or just remove the quotation marks? Using quotation marks to indicate vagueness in a math article may not be a good idea; better to state it a different way, correctly. --Coppertwig 12:34, 17 June 2007 (UTC)
 * "It follows that covariance is an inner product over a vector space of "random variables", with a(X) = (aX) and X + Y = (X + Y). "Random variables" is in quotes because it is not true that X + K is distributed the same as X for any constant K; but as long as these three basic properties of covariance apply, the duals of theorems regarding inner products that depend only on those properties will be valid."
 * OK, I think I fixed it. --Coppertwig 12:46, 17 June 2007 (UTC)

Sample Covariance
Can we put a more prominent link to Sample Covariance? It is important to tell readers right away that if they are actually looking to construct a covariance matrix, they need to see the other page. daviddoria (talk) 18:45, 11 September 2008 (UTC)

Drop $$\mu$$ and $$\nu$$?
It seems to me that abbreviating $$\mu := E(X)$$ and $$\nu := E(Y)$$ adds, well, absolutely nothing to the article.

The definition section, for example, would read like this:

The covariance between two real-valued random variables X and Y, with expected values $$\scriptstyle E(X)$$ and $$\scriptstyle E(Y)$$, is defined as


 * $$\operatorname{Cov}(X, Y) = \operatorname{E}((X - \operatorname{E}(X)) (Y - \operatorname{E}(Y))), \,$$

where E is the expected value operator, as above. This can also be written:
 * $$\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y - \operatorname{E}(X) \cdot Y - X \cdot \operatorname{E}(Y) + \operatorname{E}(X) \cdot \operatorname{E}(Y)), \,$$
 * $$\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y) + \operatorname{E}(X) \cdot \operatorname{E}(Y), \,$$
 * $$\operatorname{Cov}(X, Y) = \operatorname{E}(X \cdot Y) - \operatorname{E}(X) \cdot \operatorname{E}(Y). \,$$

Random variables whose covariance is zero are called uncorrelated.

If X and Y are independent, then their covariance is zero. This follows because under independence,


 * $$E(X \cdot Y)=E(X) \cdot E(Y).$$

Recalling the final form of the covariance derivation given above, and substituting, we get


 * $$\operatorname{Cov}(X, Y) = E(X) \cdot E(Y) - E(X) \cdot E(Y) = 0.$$

The converse, however, is generally not true: Some pairs of random variables have covariance zero although they are not independent. Under some additional assumptions, covariance zero sometimes does entail independence, as for example in the case of multivariate normal distributions.

The units of measurement of the covariance Cov(X, Y) are those of X times those of Y. By contrast, correlation, which depends on the covariance, is a dimensionless measure of linear dependence.
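
As a quick numeric sanity check of the identity Cov(X, Y) = E(X·Y) − E(X)·E(Y) (a sketch only; the population moments are approximated by sample averages):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # built to covary with x; true Cov = 0.5

lhs = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X - E(X))(Y - E(Y))]
rhs = np.mean(x * y) - x.mean() * y.mean()      # E[XY] - E[X]E[Y]

assert np.isclose(lhs, rhs)  # both close to 0.5 here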


 * Using these symbols does add absolutely nothing to the article. However, it removes some of the clutter in the formulae, making them easier to see and understand ... therefore a good thing, I think. Melcombe (talk) 12:10, 15 January 2009 (UTC)

More Citations Please
I'd like to see more citations on this page, particularly having to do with the more abstract treatment toward the end. Wikipedia should act as a portal to more detailed information on the web. I am having a hard time finding treatments of the covariance function, but the person who wrote the information on inner products, Banach spaces, etc. should have the information (or else it shouldn't be there!). Trashbird1240 (talk) 20:24, 13 May 2009 (UTC)

citations and simplifications
I don't do this much, so pardon me if I inadvertently break rules. My copy of Papoulis defines the covariance of two random variables x and y as
 * $$\operatorname{Cov}(x, y) = E((x - \mu)(y - \nu))$$

using $$\mu = E(x)$$, etc., which simplifies to
 * $$\operatorname{Cov}(x, y) = E(xy) - E(x) E(y) $$

(see Papoulis, p. 152). This seems like a simpler formulation than the one that appears in the article. I also think that the concept of random variables is not emphasized and linked enough. It is really impossible to understand this discussion without understanding what random variables are. In Papoulis they are always shown in boldface, which is quite helpful, but I don't know how to do that here. -- Alanyoder (talk) 03:47, 2 September 2009 (UTC)
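
Spelling out the simplification step (using linearity of expectation, with $$\mu$$ and $$\nu$$ treated as constants):

 * $$\operatorname{E}((x - \mu)(y - \nu)) = \operatorname{E}(xy) - \nu\operatorname{E}(x) - \mu\operatorname{E}(y) + \mu\nu = \operatorname{E}(xy) - \operatorname{E}(x)\operatorname{E}(y).$$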

Covariance and Covariance matrix
In this article we use $$\operatorname{Cov}(X, Y)$$ to talk about both the covariance and the covariance matrix. I think that this is confusing. --Belchman (talk) 16:03, 31 October 2009 (UTC)

Covariance operator, etc.
I think the notation currently used in the article is quite confusing. We write Cov(X,Y) for the covariance of two random variables X and Y (which is pretty standard), but later on we also write Cov(x,y) for the “variance-covariance matrix” of a third random variable Z, which somehow isn't even present in the notation, whereas x and y here are just auxiliary quantities demonstrating how the Cov bilinear form acts. So, the covariance operator C: H → H of a random variable Z is defined as
 * $$C_Z(f) = \operatorname{E}\big[ (Z-\operatorname{E}Z, f) \cdot (Z-\operatorname{E}Z) \big]$$

Oh, and by the way, some of the definitions in the text are missing this $$-\operatorname{E}Z$$ part. Also, the definition of covariance for function-valued random elements follows directly from the definition for a generic Hilbert space H, since random processes are defined on the Hilbert space L² of square-integrable functions (or at least we'd need square integrability for the covariance to be defined). //  st pasha  » 09:31, 25 February 2010 (UTC)
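
For concreteness, in the finite-dimensional case H = R^n (a sketch, with the inner product written as a matrix product), the operator above reduces to the ordinary covariance matrix acting on f:

 * $$C_Z(f) = \operatorname{E}\big[(Z-\operatorname{E}Z)(Z-\operatorname{E}Z)^\top\big]\, f = \Sigma f,$$

where $$\Sigma$$ is the covariance matrix of Z.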

Definition, X,Y need to be integrable
The current version gives 4 definitions, but we should agree on the first one (in my understanding the most common), since the third and fourth are only equivalent to it if X and Y, as well as their product, are integrable. Quiet photon (talk) 18:58, 4 March 2010 (UTC) Actually, you could require square-integrability from the start. I will add that. Quiet photon (talk) 19:17, 4 March 2010 (UTC)

Incremental??
I very much doubt what is written in the section "incremental computation". An incremental form of computation exists for the sample covariance. 82.75.140.46 (talk) 08:29, 21 April 2010 (UTC)
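
For reference, a minimal single-pass sketch of such an incremental update for the sample covariance, in the spirit of Welford's algorithm for the variance (the function name is my own):

def running_covariance(pairs):
    # one pass over an iterable of (x, y) pairs
    n = 0
    mean_x = mean_y = c = 0.0  # c accumulates sum of (x - mean_x)(y - mean_y)
    for x, y in pairs:
        n += 1
        dx = x - mean_x          # deviation from the old mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        c += dx * (y - mean_y)   # old dx times deviation from the new mean of y
    return c / (n - 1)           # unbiased; use c / n for maximum likelihood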

Missing topic: Covariance in error estimation
What I am completely missing in this article is at least one section on how to use covariance in error estimation. For example, the plotting tool Gnuplot calculates the covariance matrix of a linear fit f(x)=a+b*x to an x-y data sample. The covariance of the parameters a, b may be -0.9...something, while the covariance of the data themselves is +0.6...something. To calculate the covariance of the errors one needs to know how to convert the information from this article into this "second order" problem: i.e., what is the expectation value of each parameter, how does one define the errors, etc.? How does one define a sample of (a,b)? Just by picking random values around the best-fit a, b and then applying the covariance formulae? Unfortunately, gnuplot.pdf doesn't say a word about this. If there is any expert on this class of (highly important) problems, his or her contribution to this article would be highly appreciated.--SiriusB (talk) 11:14, 8 April 2011 (UTC)
 * I don't think this article can realistically cover how to calculate covariance of parameter estimates for every statistical model. It's more appropriate to cover that in the article for each model. Sounds like you're thinking about simple linear regression which doesn't currently include this (but perhaps should); however, simple linear regression is a special case of ordinary least squares, and that article does include the formula for the variance-covariance matrix under ordinary least squares. Qwfp (talk) 19:21, 8 April 2011 (UTC)
 * No, I am talking about what gnuplot does for each fit: the correlation matrix (which is closely related to the covariance matrix) shows the correlation of the parameters (a,b), not of the x-y data. I would guess that a large fraction of users looking up this article want to know how to get this correlation. Meanwhile, I've found out, but have not yet found a citeable source: calculate an array for both parameters, scanned around the confidence ellipse with sufficient resolution. Take all (a,b) tuples with a chi-square corresponding to sigma < 3 (or thereabouts), and calculate the covariance matrix the same way described here for (x,y) points, but with the weights given by the corresponding (exclusion) probability (ca. 0.3% for 3 sigma, and larger inside the 3-sigma contour). A more sophisticated method could use Markov chains, or the ellipse axes (for near-Gaussian errors), instead.--SiriusB (talk) 07:52, 11 April 2011 (UTC)
 * If you're not talking about the correlation or covariance matrix of the estimates of the parameters of a simple linear regression model, I have no idea what you are talking about. Maybe a screenshot or text output sample from gnuplot would help. Qwfp (talk) 08:26, 11 April 2011 (UTC)
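
For anyone landing here with the same question: under ordinary least squares, the variance-covariance matrix of the fitted parameters (a, b) is estimated as s²(XᵀX)⁻¹, where s² is the residual variance, and this is presumably the matrix gnuplot reports. A hedged sketch (NumPy; the function name is my own):

import numpy as np

def ols_param_covariance(x, y):
    # covariance matrix of (a, b) for the fit y ≈ a + b*x
    X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept, slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates (a, b)
    resid = y - X @ beta
    s2 = resid @ resid / (len(x) - 2)             # residual variance, 2 parameters fitted
    return s2 * np.linalg.inv(X.T @ X)            # off-diagonal entry = Cov(a, b)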

Geometrical Interpretation
@Tfkhang: I don't understand what you mean to say with:

Geometrically, the covariance can be thought of as the sum of signed rectangular areas, where those lying to the upper right and lower left quadrants relative to the mean of X and Y are positively signed, and those lying elsewhere are negatively signed.

It is not correct anyway. Nijdam (talk) 10:59, 4 December 2011 (UTC)

@Nijdam: I have made a diagram and tried to explain things clearer.

Tfkhang (talk) 00:21, 5 December 2011 (UTC)


 * Sorry, but this does not contribute to any understanding of covariance. Besides, it is definitely original research, and it is not a correct way of describing it. If you want to improve the article with a geometrical interpretation, find a reliable source and discuss your idea here on the talk page first, before changing the article. Nijdam (talk) 10:05, 5 December 2011 (UTC)

@Nijdam: I would appreciate it if you could explain why it "does not contribute to any understanding of covariance", instead of dismissing it.

Tfkhang (talk) 11:36, 5 December 2011 (UTC)

In the first place, if there were a geometric interpretation, it should not be considered a property.

Let (X,Y) be a paired observation.

>>>if it's about observations, this is about the sample covariance and not the covariance

Geometrically, we can think of the covariance of X and Y as the average of the sum of signed rectangular areas induced by the relative position of (X,Y) to the coordinate for the mean of X and Y: $$(\mu_x, \mu_y)$$.

>>>what are signed areas, and in what way are they induced?

>>>The (sample) covariance is certainly not an average of areas.

>>>The sample covariance is a measure relative to the sample means, not the expected values

This means that, by centering around $$(\mu_x, \mu_y)$$, rectangles induced by coordinates to the upper right and bottom left would have positive signs, while those induced by coordinates to the upper left and bottom right would have negative signs.

>>>Something like this holds for the relative values, but not for areas.

Nijdam (talk) 15:48, 5 December 2011 (UTC)

@Nijdam: You have some constructive ideas here. I just wish you could engage my writing positively by adding/clarifying points instead of undoing it. At least, this is what I understand Wikipedia to be about.

>>>if it's about observations, this is about the sample covariance and not the covariance

Accepted.

>>>what are signed areas, and in what way are they induced?

I have a coordinate (x,y), and there's (mu_x, mu_y); I can induce a rectangle with these two coordinates as opposite corners by drawing a horizontal line from either coordinate towards the other, then connecting it to the other coordinate perpendicularly, and repeating this until the path returns to the original coordinate. "Signed areas" simply means that each area carries a positive or negative sign, depending on where the rectangle is induced. The diagram makes this clear.

>>>The (sample) covariance is certainly not an average of areas.

I just said it could be thought of as such. If a way of thinking is useful pedagogically, it should be made known. I don't think Wikipedia math/stats articles should end up reading like a Bourbakian treatise.

And I do not agree that this is "original research". The geometry is just begging to be seen from the formula for the sample covariance.

Tfkhang (talk) 09:22, 12 December 2011 (UTC)


 * There is no geometrical interpretation. Maybe you refer to a graphical understanding. The main thing a graphical presentation may clarify is the sign of the covariance as an indication of the type of relationship, in the sense that if the data show a positive linear tendency, the majority of the data will be found in the quadrants to the upper right and to the lower left of the central point. These data contribute positively to the covariance, and the data in the other quadrants contribute negatively. Is this what you want to explain? Nijdam (talk) 10:03, 12 December 2011 (UTC)
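
Whatever one calls the interpretation, the arithmetic behind the picture is easy to check numerically: each term (x_i − x̄)(y_i − ȳ) of the sample covariance is the signed area of the axis-aligned rectangle with opposite corners (x_i, y_i) and (x̄, ȳ). A minimal check (NumPy):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = x + rng.normal(size=50)  # positively related by construction

# signed rectangle areas: positive in the upper-right and lower-left
# quadrants relative to the mean point, negative in the other two
areas = (x - x.mean()) * (y - y.mean())

assert np.isclose(areas.sum() / (len(x) - 1), np.cov(x, y)[0, 1])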

Positive feedback
Just want to say that as a high school student working in a lab, I found this article (especially the first section) to be exceptionally well written. Most higher-level math pages are overly pedantic and throw terminology around far too much, only occasionally linking to equally poorly defined articles. As an outside observer, I think this article should be an example to other mathematical pages. Either that, or I've become far too familiar with this type of subject matter! Thanks for making my work easier to understand!

Thanks to the above - nice to have positive feedback! Johnbibby 19:38, 1 September 2006 (UTC)

Unclear line 3
I think the following is unclear; can someone clarify it? (Since it's not clear to me, I doubt I'm the right one to clarify it...) (3) positive definite: Var(X) = Cov(X, X) ≥ 0, and Cov(X, X) = 0 → X is a constant random variable (K). Pt314156 15:38, 27 April 2007 (UTC)
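
One way to unpack property (3): Cov(X, X) = E[(X − E(X))²] is the expectation of a nonnegative quantity, so it is nonnegative; and it is zero only if (X − E(X))² is zero with probability 1, i.e. X equals the constant K = E(X) almost surely. In symbols:

 * $$\operatorname{Cov}(X, X) = \operatorname{E}\big[(X - \operatorname{E}(X))^2\big] \ge 0, \qquad \operatorname{Cov}(X, X) = 0 \iff X = \operatorname{E}(X) \text{ almost surely}.$$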

e.g.?
An arithmetic example, especially one yielding a covariance value for an example population and then illustrating its use and usefulness, would help. 76.175.137.87 (talk) 21:47, 14 July 2010 (UTC)

Comment from the top
I agree with the comment below. How about a motivation at the start? What is it in principle? Does it have an easily understandable analogy? Is it a scalar number or a vector/matrix? Why use "expected value" when we can say it more simply as "average" (or mean)? And what does "For random vectors X and Y" mean? Does it mean vectors of random variables (a lot of random variables, so to speak), or is it an array of random values? Later in the text, from the properties onward, it's better. But the beginning is the most important part and should be as clear and as easy as possible. (Some notes about precision: the higher the precision you aim for, the more complex it is going to be. "Average" (see above) can be defined in many ways, but the idea stays the same. And what is "expected value"? Is it clearer? No. Is it more precise? Yes, since it has a stricter definition. But does it communicate well? No. It does not communicate well, because humans tend to put meaning into words, and in my experience one can "expect" a wide range of possibilities.) — Preceding unsigned comment added by 85.164.125.248 (talk) 16:02, 18 December 2011 (UTC)

This page is not helpful to anyone. If you understand a word it says, you have no reason to read it. If you want to understand how to calculate covariance, this doesn't help. I don't understand why Wikipedia in general must be so terrible at math, finance and statistics. I guess it is because whatever you write, it is incorrect in some special case, and thus phrases like "with finite moments" must be added - to make it 100% correct - but 100% incomprehensible to most people. — Preceding unsigned comment added by 130.226.45.152 (talk) 16:28, 17 December 2011 (UTC)

How is covariance related to other similar concepts like correlation? (Statistics needs a concept-wash; it uses too many similar concepts, in my opinion.) 88.89.12.108 (talk) 04:17, 22 December 2011 (UTC)