Talk:Gaussian process

Inadequate material for non-technical readers
The introduction is pretty much impenetrable for a lay reader. The attempt to define simply what a stochastic process is, for example, says "a collection of random variables indexed by time or space". But what if the reader is not familiar with random variables? Or does not instantly grasp what indexing by time or space means? A real-world example would go a long way here, especially one where the random variables and space/time-indexing can be concretely and intuitively linked to something in everyday experience.

This theme continues throughout the whole article, as the reader is assumed to have a strong mathematical or statistical background. There is a distinct lack of non-specialist, non-abstract examples. The introductory text in the "Applications" section fails to identify a single concrete example of a problem that Gaussian Processes might be applied to. "Given any set of N points in the desired domain of your functions..." OK, but what might those points and functions represent in the real world? "Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool." OK, but what kind of real-world problem might require non-linear multivariate interpolation?

Etc. — Preceding unsigned comment added by 2A02:6B6E:B8CD:0:7D5A:C359:DADB:4C38 (talk) 21:49, 30 December 2021 (UTC)

Gaussian Process vs. integral of Gaussian Process
Is the integral of a Gaussian process somehow also a Gaussian process? Or is this just a common abuse of terminology? I think it's the latter, and made some changes to reflect that... — Preceding unsigned comment added by 132.204.26.35 (talk) 21:33, 16 September 2014 (UTC)

An integral is a linear operator, and linear transformations of Gaussian distributions are Gaussian, so it is still a Gaussian process. Joanico (talk) 18:36, 23 May 2020 (UTC)
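The linearity argument above can be checked numerically. A minimal pure-Python sketch (the discretisation, sample counts, and seed are arbitrary choices, not from this thread): the time integral of a standard Wiener process over [0, 1] is itself Gaussian, with mean 0 and variance $$\int_0^1\int_0^1 \min(s,t)\,ds\,dt = 1/3$$, and a Monte Carlo estimate reproduces that.

```python
import random

random.seed(42)
n_steps, n_paths = 100, 5000
dt = 1.0 / n_steps

integrals = []
for _ in range(n_paths):
    w, area = 0.0, 0.0
    for _ in range(n_steps):
        w += random.gauss(0.0, dt ** 0.5)  # Wiener increment ~ N(0, dt)
        area += w * dt                     # Riemann sum approximating the integral of W(t) over [0, 1]
    integrals.append(area)

mean = sum(integrals) / n_paths
var = sum((x - mean) ** 2 for x in integrals) / n_paths
# Theory: the integral is Gaussian with mean 0 and variance 1/3
```

The sample variance lands near 1/3 (up to Monte Carlo and discretisation error), consistent with the integral being Gaussian rather than merely having a Gaussian-looking histogram by accident.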

Untitled
Added cleanup tag: this article does not give someone in the field an adequate overview of what a Gaussian process is, and goes off on a tangent involving undefined math. —Preceding unsigned comment added by Ninjagecko (talk • contribs)


 * Perhaps it could be made accessible to a somewhat broader audience, but where does it go off on a tangent or get into "undefined math"? It gives the definition and a simple characterization, and then it lists examples, with links. Michael Hardy 21:45, 4 December 2006 (UTC)


 * Of course, any article can be improved in many ways, and surely this one can also. However, I have no idea what you mean by undefined math. In addition, I would have thought that for someone in the field, this article is rather banal and uninteresting, since surely its contents would be already familiar to such an individual. Do you mean someone not in the field? --CSTAR 03:54, 5 December 2006 (UTC)


 * Somehow my reply never went through. Michael-- Yes, you're right. Technically the indices were previously defined way at the top, thus I removed the cleanup tag. Nevertheless it wasn't very clear I thought, so I improved the article lots by categorizing all the glomped-up text, and making the definition a bit clearer. CSTAR-- No, I meant what I said: "someone in the field". Even as a reference, it was hard to follow. I've already fixed it though. —The preceding unsigned comment was added by Ninjagecko (talk • contribs) 09:21, 6 December 2006 (UTC).


 * Also CSTAR, I personally find it rather haughty, to imagine the only people who have any business reading this entry are people who've been working with this material for 4+ years. The point of a reference is to be a reference for someone who wants to learn or brush up on the material. No offense. Ninjagecko 09:24, 6 December 2006 (UTC)


 * I don't think your statement (the only people who have any business reading this entry are people who've been working with this material for 4+ years) paraphrases in any way what I said. In any case, what I had intended to say was that the article was technically correct. --CSTAR 13:36, 6 December 2006 (UTC)

suggestions for clarification
I'm not in the field, and I have found some things I wish this article would clarify. Please feel free to say there is some other, introductory article to the topic that I should have read which would have explained the answers to my questions. 141.214.17.5 (talk) 19:46, 10 December 2008 (UTC)
 * 1. What is an easy, mathematical example of a Gaussian process?
 * 2. Does the definition imply that a Gaussian process is normally distributed? (I think the answer is obviously yes, but I have no experience to justify changing this article.)
 * 3. How does the definition imply the parenthetical remark "any linear functional applied to the sample function Xt will give a normally distributed result"? An example?  So integrating Xt yields a Gaussian process?
 * 4. What is a sample function? pdf?  cdf?  Other types?

After looking around some more, I can't tell why this doesn't redirect to the article for multivariate normal distributions. Any explanation? 141.214.17.5 (talk) 16:11, 11 December 2008 (UTC)


 * Gaussian processes are distributions over infinite-dimensional objects (i.e. functions), whereas multivariate normal distributions are defined over finite-dimensional objects or variables. In other words, GPs can be thought of as an extension of multivariate normal distributions to infinite dimensionality. appoose (talk)
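The connection to multivariate normals can be made concrete: restrict a GP to any finite set of inputs and you get an ordinary multivariate normal whose covariance matrix is filled in by the kernel. A pure-Python sketch (the squared-exponential kernel and the three input points are illustrative choices, not mandated by anything above):

```python
import math, random

def rbf(x1, x2, length=1.0):
    # Squared-exponential kernel (a common, though not the only, choice)
    return math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))

xs = [0.0, 0.5, 1.0]
K = [[rbf(a, b) for b in xs] for a in xs]  # finite-dimensional covariance matrix

def cholesky(K):
    # Cholesky factorisation K = L L^T, written out for a small matrix
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

L = cholesky(K)
random.seed(0)
z = [random.gauss(0, 1) for _ in xs]
# One draw from N(0, K): the GP's values at xs, jointly Gaussian
sample = [sum(L[i][k] * z[k] for k in range(len(xs))) for i in range(len(xs))]
```

Adding more points to `xs` just enlarges the matrix; the "infinite-dimensional" object is what all these finite slices are consistent marginals of.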


 * I do not know the proof, but for 3, integration of a GP results in a GP, as does any other linear operation (summing, differentiation, etc.). Aghez (talk) 20:52, 11 March 2012 (UTC)

I am in the field. The "definition" will be scrubbed and the "alternate definition" will take its place. Done. — Preceding unsigned comment added by Izmirlig (talk • contribs) 15:34, 17 August 2017 (UTC)

Link to the Gaussian Processes Research Group at the Australian Centre for Field Robotics
I have renamed the link to www.gaussianprocesses.com, to "The Gaussian Processes Research Group at the Australian Centre for Field Robotics". The web site has a very general sounding name, but the home page is currently recruiting students to a lab, rather than explaining the theory of Gaussian processes, as the link description previously claimed to do. I hope this avoids confusion. Mebden (talk) 08:26, 5 March 2009 (UTC)

Alternative definition
Is the $$i$$ that appears in the second display formula of the section the Imaginary unit? If it is an index, it is not bound to any summation sign. Maybe a real-valued variable? I do not have a reference for the formula with me, so I cannot fix it, but I guess that something is missing. I would be grateful if someone does fix it. Junkie.dolphin (talk) 15:49, 3 July 2012 (UTC)
 * The fact that it is the imaginary unit is confirmed/implied by the equation being part of a sentence starting "Using characteristic functions ....". Melcombe (talk) 16:55, 3 July 2012 (UTC)
 * Thanks for the clarification, I had somehow failed to notice that detail. Junkie.dolphin (talk) 15:45, 24 July 2012 (UTC)

"Process" is a "distribution"?
The current article says: "A Gaussian process is a statistical distribution Xt, t ∈ T, for which any finite linear combination of samples has a joint Gaussian distribution." I think a "process" is an indexed collection of a random variable while a "distribution" is a function associated with a single random variable. The notation apparently intends to convey the idea of "an indexed collection of distributions", so it would be better to use those words than the singular "a statistical distribution".

Tashiro~enwiki (talk) 18:15, 30 October 2015 (UTC)


 * Yes, this must be wrong and it's confusing. It means you have to look somewhere else for the actual definition (outside of Wikipedia). 76.118.180.76 (talk) 03:11, 15 December 2015 (UTC)


 * Hello. It is a distribution, but over an infinite dimensional space. Which makes it rather different from more common distributions, like e.g. the Gaussian distribution. I think the term "distribution" is more misleading than helpful here, so I have replaced it with plain "statistical model", since the text does then go on to define a GP. I hope that helps. — Preceding unsigned comment added by Winterstein (talk • contribs) 09:09, 11 June 2016 (UTC)


 * Agreed. This is the first time ever that my opinion of Wikipedia as the definitive source for mathematics has taken a big hit. I can appreciate that, from the writer's perspective, the first sentence, referenced just above in this discussion, looks simpler than the statement as written  — Preceding unsigned comment added by 156.40.216.3 (talk) 15:27, 17 August 2017 (UTC)

Lazy learning and Optimization
I noticed the addition on the page relating GPs to lazy learning and noting that they are usually fitted with optimization software. While I appreciate that your experience may have given you this practical insight, I am not sure that this is beneficial to someone trying to understand what a GP is.

Regarding lazy learning, I am not familiar enough with the concept to be able to tell if it applies here, but from the short wikipedia article and your blog I can see how it would apply to a GP used for kriging.

Regarding optimization software, what is really necessary is some matrix algebra, which includes a matrix inversion, to get the posterior mean (if you want a single value estimate) and some more to get the posterior variance if you want that too. While in certain cases (large matrices, etc.) optimization software may be used to find these, it is not something fundamental to the process that one reading this article would need to know about.

Finally, it can only be viewed as a machine learning algorithm when used for prediction (kriging) as you mention, so overall I think your comments would be more at home in the Applications section. It might also be more appropriate to give actual sources than a blog entry, despite how impressive your background is. Thank you. Webdrone (talk) 17:38, 7 June 2016 (UTC)


 * Actually it would be a great help if you could help fix the very first sentence which reads "[...] a Gaussian process is a statistical distribution, [...]". Webdrone (talk) 17:42, 7 June 2016 (UTC)


 * Hello Webdrone. Thank you for your thoughtful comments.

I think it is appropriate that the overview section should include notes on the uses of a technique as well as the technical definition -- otherwise it isn't an overview. Also, we'd like the overview to be readable by a range of people. As it was, the overview was not accessible to anyone other than probability theorists. Making it a little more accessible to the machine learning community is a good thing. I think there is more work to be done making this article accessible, both within these communities and to more communities, but I do believe my addition helps.

I also think that the infinite-dimensional distribution-based phrasing is a challenging way to introduce new people to this model (especially for the majority of those who use statistical methods but have not studied e.g. Hilbert spaces). Giving people a couple of ways to get their head around these ideas can only help.

Regarding the mention of "using optimisation software" -- thank you for the observation about matrix algebra being enough. Optimisation software is needed if you use a parameterised kernel (which opens up a wider range of applications beyond "traditional" kriging). I will amend the text now to give both.

Regarding sources for a paragraph that is an aid towards understanding -- academic papers go straight to the technical definitions by their very nature, and I don't know of a GP textbook yet which has an introduction for non-probability-theorists. Blog posts are the "natural" source for this kind of material. If you know of a better source, please do put one in. I don't think it would be appropriate to fully expand this paragraph within this article, as the explanation-for-machine-learning-people would then somewhat swamp the important technical matter.

Thank you again for your comments. I believe we're improving the article considerably through this. --winterstein (talk) 08:49, 11 June 2016 (UTC)


 * Hello WebDrone. Re. the first sentence -- I agree it could use work, but I can't think of a good re-phrasing. I've replaced the "stats distribution" phrase -- which other people have also complained about (see above) with the less confusing (if also less meaningful) phrase "stats model". — Preceding unsigned comment added by Winterstein (talk • contribs) 09:05, 11 June 2016 (UTC)


 * I guess you are right, including your comments might make it more accessible to people from different backgrounds. I hope we're improving the article -- it annoys me that it's not well-written, but I'm not sure how to improve it.
 * As for the infinite-dimensionality explanation, I feel like alternative explanations are always missing something. I come from physics where Hilbert spaces are often used so maybe that's why. Do you think an explanation along the following lines might help a reader visualise the infinite dimensionality setting?
 * "The function (f(x)=y) can be thought to exist as a single point in a (infinite-dimensional) space where each point x in the function's domain is a separate dimension in this new space. Values of y associated with each x point are coordinates of the function in that x dimension; think of f(x)=y as a very long vector, with an element for each possible x value -- since x is continuous it has infinite possible values and so the vector is infinitely long. We define a covariance kernel which relates an x dimension to another, and use it along with a mean function (m(x) which is usually taken to be 0) to set a multi-variate Gaussian prior over the infinite-dimensional space. We can then consider a set of observations (x, y) to be jointly Gaussian with non-observed points (x*, y*) with mean and covariance given by our prior. Conditioning on the observations, we can create a posterior Gaussian for y*|y, with a new mean and covariance which takes into account given points. Sampling points from this multi-variate Gaussian posterior gives possible functions which satisfy our conditions. Alternatively, just the posterior mean can be used as the MAP estimate of the function, with the new covariance used to find the uncertainty for each dimension (x value). In case of zero noise assumed for observed values (y), the new mean will go through the y values with 0 posterior variance (uncertainty), for the associated x dimensions."
 * Webdrone (talk) 19:30, 18 June 2016 (UTC)
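The conditioning story sketched above can be written out for the smallest nontrivial case: two noise-free observations and a 2x2 kernel matrix inverted by hand. This is only an illustrative sketch (the squared-exponential kernel and the training points are invented for the example); it shows the posterior mean passing through the data with zero posterior variance, exactly as described.

```python
import math

def rbf(a, b):
    # Squared-exponential kernel (an illustrative choice, not the only one)
    return math.exp(-0.5 * (a - b) ** 2)

# Two hypothetical noise-free observations (x, y), invented for the example
X = [0.0, 1.0]
y = [1.0, -1.0]

# 2x2 kernel matrix and its inverse, written out by hand
a, b = rbf(X[0], X[0]), rbf(X[0], X[1])
c, d = rbf(X[1], X[0]), rbf(X[1], X[1])
det = a * d - b * c
Kinv = [[d / det, -b / det], [-c / det, a / det]]

def posterior(xstar):
    # Posterior mean k*^T K^-1 y and variance k(x*,x*) - k*^T K^-1 k*
    ks = [rbf(xstar, X[0]), rbf(xstar, X[1])]
    alpha = [sum(Kinv[i][j] * y[j] for j in range(2)) for i in range(2)]
    mean = sum(ks[i] * alpha[i] for i in range(2))
    kv = [sum(Kinv[i][j] * ks[j] for j in range(2)) for i in range(2)]
    var = rbf(xstar, xstar) - sum(ks[i] * kv[i] for i in range(2))
    return mean, var

m0, v0 = posterior(0.0)  # at a training point: mean equals y[0], variance is 0
mh, vh = posterior(0.5)  # between the points: positive predictive variance
```

This is just the matrix algebra Webdrone mentioned earlier; optimisation only enters once the kernel itself has parameters to fit.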

Covariance function/Correlation function
The listed examples of covariance functions are really correlation functions (with the exception of the white-noise one), i.e. they should be multiplied by sigma^2 — Preceding unsigned comment added by 188.113.80.156 (talk) 20:55, 30 May 2017 (UTC)

Merge with Kriging article?
They are the same? — Preceding unsigned comment added by 143.159.115.78 (talk) 14:01, 6 March 2017 (UTC)


 * Gaussian process regression and kriging are very similar (maybe the same except for formalism, but I don't know enough to say so). Gaussian processes have uses outside of regression, though. — Preceding unsigned comment added by 188.113.80.156 (talk) 21:02, 30 May 2017 (UTC)

Integral of a white noise
About the recent edit by User:Kri: "Dubious|reason=The expected magnitude of a finite difference of a Wiener process divided by the step size approaches infinity as the step size approaches 0, but the expected magnitude of Gaussian noise is finite, so obviously this can't be true as is. So what is it that this (incorrect) statement actually means?"

The expected magnitude of a (usual) Gaussian process is finite, but the white noise is a generalized process; its expected magnitude (at a point) is infinite; only after integration does it become finite. I'll add a link to the white noise article.

Generalized processes are mentioned in : "Also the covariance $$\mathrm{E}(w(t_1)\cdot w(t_2))$$ becomes infinite when $$t_1=t_2$$; and the autocorrelation function $$\mathrm{R}(t_1,t_2)$$ must be defined as $$N \delta(t_1-t_2)$$, where $$N$$ is some real constant and $$\delta$$ is Dirac's "function"." See also : "it does not exist as a random height function. Instead, it is a random generalized function". Boris Tsirelson (talk) 06:55, 17 March 2019 (UTC)
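The generalized-process point can be illustrated with a crude discretisation (the step counts and seed are arbitrary): the sample variance of $$w = \Delta W/\Delta t$$ blows up as $$\Delta t$$ shrinks, matching the $$\delta$$-covariance, while its integral $$\sum w\,\Delta t$$ stays a perfectly ordinary finite Gaussian, namely $$W(1)$$.

```python
import random

random.seed(3)
results = {}
for n in (10, 1000):
    dt = 1.0 / n
    dW = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]  # Wiener increments ~ N(0, dt)
    w = [d / dt for d in dW]             # discretised "white noise"
    var_w = sum(x * x for x in w) / n    # roughly 1/dt: diverges as dt -> 0
    integral = sum(x * dt for x in w)    # equals W(1): finite, N(0, 1)
    results[n] = (var_w, integral)
```

Refining the grid makes the pointwise variance of `w` arbitrarily large while the integral's distribution does not change, which is exactly the "infinite at a point, finite after integration" behaviour described above.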


 * Okay, thank you for the clarification. Indeed, it makes more sense if you treat it as a generalized function. —Kri (talk) 15:10, 17 March 2019 (UTC)

Simple cos/sin example is bimodal?
The "simple example" given of

$$X_{t}=\cos(at)\xi_{1}+\sin(at)\xi_{2}$$ suggests that each variable X_t can be the sum of two Gaussian-distributed variables. But this can't be a Gaussian process, can it, because the sum of two Gaussians is not a Gaussian in general? What am I missing? Fyedernoggersnodden (talk) 13:44, 7 May 2021 (UTC)


 * A sum of jointly Gaussian variables is another Gaussian; here $$\xi_1$$ and $$\xi_2$$ are independent, so any linear combination of them is Gaussian. Abs xyz (talk) 04:13, 25 October 2022 (UTC)
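For the example in question this is easy to check by simulation, since $$\xi_1$$ and $$\xi_2$$ are independent: at any fixed t, X_t is Gaussian with variance $$\cos^2(at)+\sin^2(at)=1$$. A small sketch (the values of a and t and the sample count are arbitrary):

```python
import math, random

random.seed(1)
a, t = 2.0, 0.7
samples = []
for _ in range(20000):
    xi1, xi2 = random.gauss(0, 1), random.gauss(0, 1)
    # X_t = cos(at) xi1 + sin(at) xi2, a linear combination of independent normals
    samples.append(math.cos(a * t) * xi1 + math.sin(a * t) * xi2)

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# cos^2 + sin^2 = 1, so X_t ~ N(0, 1) at every t
```

The sample mean and variance come out near 0 and 1 for any choice of a and t, which is what makes this a (stationary-variance) Gaussian process rather than something bimodal.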

Collection?
A process is a family, not a set (mathematics)! "Collection" is too ambiguous. Sigma^2 (talk) 22:37, 28 July 2023 (UTC)

A word of caution regarding the "Wikibook"
The linked Wikibook has some mistakes, such as the claim "a stochastic process is a distribution". Another mistake in the Wikibook is, for example, in the section on operations on Gaussian variables. The user says: "For two correlated signals, the sum can be expressed by a scalar multiplication", which is false. The sum of two non-independent Gaussians is not necessarily Gaussian; it's only Gaussian if they are jointly normal.--Tensorproduct (talk) 13:27, 8 September 2023 (UTC)
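The joint-normality caveat has a standard counterexample that is easy to simulate: take X ~ N(0,1) and let Y equal X when |X| ≤ 1 and -X otherwise. Y is again N(0,1) by symmetry, yet X + Y vanishes whenever |X| > 1, so the sum has a point mass at zero and cannot be Gaussian. A quick sketch (the threshold 1 and the sample size are arbitrary):

```python
import random

random.seed(0)
n, zeros = 20000, 0
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    # y is marginally N(0, 1) by symmetry, but (x, y) is not jointly Gaussian
    y = x if abs(x) <= 1.0 else -x
    if x + y == 0.0:
        zeros += 1

frac = zeros / n  # roughly P(|X| > 1), about 0.317: an atom at 0, impossible for a Gaussian
```

Roughly a third of the draws of X + Y are exactly zero, so the sum is clearly non-Gaussian even though both summands are standard normal.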