Talk:Mercer's theorem

[Untitled]
Two pages link to here: kernel trick and James Mercer. Yet, when I click on "what links here" I am told that no pages link to this one. Michael Hardy 22:22, 24 Aug 2003 (UTC)


 * Kernel trick is showing up for me -- perhaps the changes just take a while to propagate? -- Tlotoxl 22:33, 24 Aug 2003 (UTC)

I dispute the last 2 days edits to this article: a kernel is not just defined to map two 1-D intervals (a square) to a real. In general, the arguments of the kernel can elements in a measurable set. For example, see.

I don't mind using a 1-D interval as an example, but the definitions and theorems in this article should not be limited to 1-D intervals. Right now, the generalization to measurable sets is deferred to the very end, which is very misleading. In general, the kernel trick for the vast majority of cases people are interested in needs to operate on measurable sets (typically R^n) ---hike395 00:33, 13 May 2004 (UTC)

Thanks for the article. The machinery you are talking about deaals with an entirely different subject altogether; that of reproducing kernel Hilbert spaces; the theorems you need are
 * Facts about reproducing kernel Hilbert spaces
 * The spectral theorem for compact operators applied to Hilbert Schmidt operators.

Examples of reproducing kernel spaces are spaces of analytic functions, not the only ones

Mercer's theorem is a much older theorem which existed much before reproducing kernel spaces were introduced by Aronszaijn (sp?) and Bergman.

Yes I have seen othre work on learning involvin reproducing kernel Hilbert spaces. We should start an article on that if something doesn't already exist.CSTAR 01:46, 13 May 2004 (UTC)

Reproducing kernel Hilbert space
is now an article. I will add more to it subsequently.CSTAR 02:08, 13 May 2004 (UTC)


 * I don't have access to the original paper by Mercer. Are you saying that he proved the theorem only for 1-D intervals? If so, we need to fix the kernel trick article and not attribute it to Mercer's theorem. This would mean that the article is still somewhat misleading --- the definition of kernel has changed since Mercer's day, and we should use the true, up-to-date definition. -- hike395 02:28, 13 May 2004 (UTC)

I added a representation for an arbitrary continuous kernel on a compact space. As a matter of expository style, I don't recommend putting it befor the more special form, particularly since that form is the one used in represntations of Stochastic processes via Karhunen Loeve. Yes th ekernel trick stuff needs to be fixed. I'll read the article you referenced.CSTAR 02:33, 13 May 2004 (UTC)


 * I respectfully disagree. Stochastic processes also are not limited to have index sets that are intervals. In Kriging, for example, geostatisticians use Gaussian processes with index sets that are typically in R^2. Machine learning researchers use Gaussian processes have index sets of quite high dimension.


 * I'll attempt to make the edit, let's see if we can get agreement.

Yes, of course, you are right about stochastic processes on higher index sets, but the more general case of the kernel representation (the correct generalization which is stated further down) is pedagogically more involved needing a measure witt full support. I still insist that pedagogically this approach is clearer.CSTAR 04:12, 13 May 2004 (UTC)

-

In fact we can consider kernels on locally comapct spaces and on and on. The generalizations are limitless (we can consider even kernels on non-commutatiive spaces a la Alain Connes ... but these articles also have to be approachable by a novice. And as far as stochastic processes go, the classic I. Guikhman and A. Skorokhod (at least in the old SOviet version translated into French that I have) deals with 1-dimensional index sets.CSTAR 04:17, 13 May 2004 (UTC)

The Theorem you need?
Here's the proof: If K is a p.d. continuos kernel on a compact space, then there is a Hilbert space H and a measurable function phi:X -> H such that

K(x,y) = 

Proof: Let H = l^2 and let

phi(x) = sum_i sqrt{lambda_i} e_i(x)

By the trace property of the associated operator, this thing is well defined.

I think this is the right statement of the Kernel Trick (I don't think you can get by with L^2 kernels, because you don't get fast enough convergence.CSTAR 04:36, 13 May 2004 (UTC)~

Further clarification
In fact, the kernel trick mentioned above has really nothing to do with Mercer's theorem. It results from a general procedure for constructing Hilbert spaces out of positive definite functions on arbitrary sets: If K is positive definite function, there is a Hilbert space H and a function phi : X --> H

K(x,y) = 

This is an old trick. Define a (possibly degenerate) inner product on functions of finite support on X using the kernel K as foolws

< a, b > = sum_x,y K(x,y) a(x) b(y)

Divide out by vectors of norm zero ancd complete. This is the oldest trick in the book of Hilbert space constructions. So you can forget about continuity., Mercer's theorem and everything else. CSTAR 05:23, 13 May 2004 (UTC)

More precise reference
A more precise reference of where to find the proofs of the theorems in the article would be nice.

130.238.5.7 18:47, 26 January 2006 (UTC)

The last two theorems (generalizations of Mercer's theorem) are very interesting but I haven't managed to find them in any of the books cited, does anyone have a reference?

Added a reference to the journal where the theorm was first published and cited the year which I added the other day and someone added the tag that it needed to be cited hope its all following the templates and guidelines I have tried to make sure everything is as it should be - sorry if I haven't done it the proper wiki way.

158.143.77.11 11:41, 7 December 2006 (UTC) Thanks for all the work on this topic. I haven't been able to find the last generalization of Mercer's theorem in any of the books cited, in particular, a proof that: if (X,M,μ) is a σ-finite measure space, and K a symmetric nonnegative definite kernel on (X,M,μ), then K has the given spectral expansion. This is a very nice extension of the unnecessarily restrictive condition on K which is usually given in textbooks, namely that K:[a,b]x[a,b]->R. But: how to prove this, or where to find a proof in the literature? Thanks.
 * This follows from the structure theorem for Hilbert-Schmidt operators. Note that it is required that K be a square-integrable kernel. I'm sure this is wel-known, but I'll have to dig up a reference.--CSTAR 14:02, 7 December 2006 (UTC)
 * Thanks for the quick reply, the reference would be very useful. 158.143.77.11 14:42, 7 December 2006 (UTC)

Article on Mercer's condition
There is a separate article on Mercer's condition. Should that be incorporated here? Kjetil Halvorsen 01:21, 25 March 2011 (UTC)


 * Yes. Suggested the merge. Qwertyus (talk) 09:43, 2 July 2012 (UTC)

The article on Mercer's condition is not correct without compactness (mentioned in some of the comments above) or some other restriction. g could be square integrable but not absolutely integral, so Fubini's theorem would not apply and the integrals in the example would be undefined because the positive & negative parts would both be infinite. We must also assume that g is real-valued.70.112.203.172 (talk) 21:01, 4 January 2013 (UTC)

Confusing grammar
The section Generalization has noun-verb disagreement and run-on sentences that make it difficult to figure out what is going on. Is there someone qualified who can clear it up? Thanks. 128.171.78.188 (talk) 09:52, 16 April 2012 (UTC)

Last generalization in the article
Shouldn't it say "If K is a symmetric non-negative definite L2 kernel ..." to be precise? 134.106.106.150 (talk) 10:39, 22 October 2014 (UTC)

Combining Mercer's Theorem and Mercer's Condition
Hello,

I believe that Mercer's Theorem and Mercer's Condition should be combined, as they are closely related and of the same works and time period of Mercer's life.

Further, I think that these two articles should be merged with the barren page of James Mercer, the mathematician that propelled these outcomes and has their namesake. — Preceding unsigned comment added by 2601:280:4A00:CF61:199A:2745:F0E:3CA8 (talk) 19:31, 17 May 2016 (UTC)
 * Don't merge: Mercer's theorem and Mercer's condition seem to be sufficiently distinct to warrant their own pages, and given the complexity of the topic for amateur readers, I feel that they are best kept separate. Klbrain (talk) 22:14, 18 June 2016 (UTC)
 * Reading the earlier correspondence, I have been inclined to changed my mind given the comments of subjcet experts ... suggesting merge. Klbrain (talk) 11:20, 20 June 2016 (UTC)

Product measure
In the second theorem in section on generalizations, the notation μ⊗μ is used. I assume this means the product measure - but the wikipedia article on product measure uses the notation μ×μ. Should the notation in this article be changed?

Spectral Theorem?
The article seems to imply that Mercer's theorem is a special case of what is now called the spectral theorem. Can this be tightened up? Does Mercer's theorem ever say anything more than what the spectral theorem might say? Or is it a pedagogical, historical curiosity that has been supplanted by more modern language, tools and techniques? Likewise, this article talks of "kernels", which, as far as I can tell, is identical to a Fredholm kernel. Is there any sense in which the so-called "kernel" in this article is not (a special case of) a Fredholm kernel? 84.15.182.62 (talk) 08:04, 28 June 2023 (UTC)