Talk:Convergence of measures

Boo
From the point of view of a general encyclopedia reader, this page is complete gibberish. If you have to be a statistician to understand an encyclopedia entry on a statistical item of interest, what's the point?

Why not a simple explanation using two six-sided dice - showing how while the result on each die is random, the sum of the dice converges on a mean of 7? —Preceding unsigned comment added by Tom Adams (talk • contribs) 23:14, 30 April 2009 (UTC)


 * This page is about the measure-theoretic concept. If you want a more lay-person explanation, see the “Convergence of random variables” article. …  st pasha  » 06:29, 21 November 2009 (UTC)


 * Although that page is also pretty lousy, from a lay-person perspective.--128.59.110.50 (talk) 00:29, 17 February 2010 (UTC)


 * Agreed. Even as a physicist I don't understand the contents of this entry. As a scientist, I found this explanation utterly useless as an every-day "pedestrian" type definition, and it doesn't belong in this form in an encyclopedia (Wikipedia is not an encyclopedia written by mathematicians to themselves ...). You should at least add an intuitive explanation of the concept. (BTW, so far I have not found any good mathematician who would be short of giving an intuitive explanation for a definition - even mathematicians need those at times!)

--134.76.219.87 (talk) 16:10, 15 June 2010 (UTC)

The definition of strong convergence was complete rubbish. I have now replaced it by something that is at least correct, but references are still missing. I agree that intuitive explanations are also missing. Hairer (talk) 18:33, 1 October 2011 (UTC)

I have made a (probably lousy) attempt to provide intuitive explanations. Please improve them. 98.99.129.198 (talk) 00:29, 15 February 2012 (UTC)
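A rough sketch of the two-dice suggestion at the top of this thread, offered as an illustration rather than as proposed article text. It compares the empirical distribution of the sum of two fair dice with the exact one, using the total variation distance on the finite set {2, ..., 12}. The function names, the seed, and the sample sizes are all illustrative assumptions.

```python
import random
from collections import Counter
from fractions import Fraction

def exact_two_dice():
    """Exact law of the sum of two fair six-sided dice, as {sum: probability}."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {s: Fraction(c, 36) for s, c in counts.items()}

def empirical_two_dice(n_rolls, rng):
    """Empirical law of the sum after n_rolls simulated throws of two dice."""
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return {s: counts[s] / n_rolls for s in range(2, 13)}

def tv_distance(p, q):
    """Total variation norm of p - q on {2, ..., 12}, i.e. the l1 distance."""
    return sum(abs(float(p[s]) - float(q[s])) for s in range(2, 13))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
    exact = exact_two_dice()
    print("mean of the exact law:", sum(s * p for s, p in exact.items()))
    for n in (100, 10_000, 100_000):
        print(n, tv_distance(empirical_two_dice(n, rng), exact))
```

The printed distances should tend to shrink as the number of rolls grows. On a finite set such as this one, the modes of convergence discussed in the article all coincide, so the example cannot show how they differ; it only illustrates what it means for a sequence of measures to approach a limit.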

Undefined symbol
In the list of equivalences, does E denote expectation with respect to P? This should be specified, I think. 98.99.129.198 (talk) 00:30, 15 February 2012 (UTC)
 * Yes, fixed now. (I am not sure how to find a reference for it though: this is basically the so-called Portmanteau lemma, sometimes called a theorem, which would be presented using different notation in different books.) Piyush (talk) 06:41, 23 July 2012 (UTC)


 * After many many years on WP, I have realized that one of the greatest things we can do here is to provide a 'rosetta stone' translating across different notations. I've had this problem, where theorems I might recognize, when written from a measure-theoretic point of view, become utterly foreign and strange looking when written using random variables. It gets even worse if there is some overlap with quantum stuff, or with projective varieties or whatever.  So repeating a claim, in alternative notations, is a good thing. linas (talk) 18:35, 24 July 2012 (UTC)

Total variation
I just added a "please clarify" tag to this formula:


 * $$\|\mu- \nu\|_{TV} = \sup \Bigl\{\int_X f\,d\mu - \int_X f\,d\nu \;\; \Big| \;\; f\colon X \to [-1,1] \Bigr\}.$$

Is this the Radon metric? Wasserstein metric? Does f need to be Lipschitz continuous or even continuous at all? Can it be a discontinuous-everywhere but still measurable function? (e.g. derivative of the Minkowski function) linas (talk) 18:35, 22 July 2012 (UTC)
 * In the definition of the Total Variation distance (often also referred to as the $$\ell_1$$ norm in the finite setting), there are no constraints on the function $$f$$ except that it should be measurable.  If you want the Wasserstein metric, you need to restrict $$f$$ to have Lipschitz constant 1.  Piyush (talk) 18:50, 22 July 2012 (UTC)
 * Thanks! So I see further down in the article that the Lipschitz version in fact gives weak convergence.  (This is correct, as far as you know?) Perhaps it is the case that the Radon distance is identical to strong convergence? (I think that is what the article used to say, but it was dropped when the article was rewritten.) Two more questions, if you are up to it:  Can anything be said with regard to the weak topology?  And also: can anything be said about a category-theoretic approach? I'm guessing these are all inverse limits on some kind of appropriately defined category, but my ability to guess stops there, and I have been too lazy to google, so far... linas (talk) 18:26, 24 July 2012 (UTC)
 * Actually the weak convergence version is slightly weaker than convergence in the Wasserstein metric. The difference is the following.  In weak convergence, you need that for every fixed $$f$$ with the appropriate properties and every $$\epsilon > 0$$, $$\left|\int f \, d\mu_n - \int f \, d\mu\right| < \epsilon$$ for $$n$$ large enough (depending upon both $$\epsilon$$ and $$f$$).


 * However, in the Wasserstein case, you have the stronger version that for any $$\epsilon > 0$$ and $$n$$ large enough (depending only upon $$\epsilon$$), $$\left|\int f \, d\mu_n - \int f \, d\mu\right| < \epsilon$$ for every $$f$$ with the appropriate properties.   The switching of quantifiers makes convergence in the Wasserstein metric stronger than weak convergence.  As for strong convergence, I have never really seen the definition before, but if it is what it is, then as User Hairer points out below, it is strictly weaker than TV (on all measurable spaces), for roughly the same quantifier switching reason as above.


 * This isn't completely true. In "nice" spaces (Polish will do), convergence in Wasserstein is equivalent to weak convergence + convergence of first moment. If you choose a bounded metric (or if you modify the definition of Wasserstein so that you force the test functions to also be bounded by 1, in addition to having Lipschitz constant 1), then convergence in Wasserstein is exactly the same as weak convergence, even though at first glance it would appear to be stronger. Hairer (talk) 07:33, 27 July 2012 (UTC)


 * I am not sure what you mean by "convergence of first moment" in your first sentence above.  Is this the first moment of all integrable functions? or the first moment of the measures themselves (that is, the measures are finite?).  Further, isn't it the case that the equivalence of the Bounded-Lipschitz version of the Wasserstein metric with weak convergence requires the measures themselves to be finite?  Piyush (talk) 17:51, 27 July 2012 (UTC)


 * Yes, I was implicitly assuming all measures are finite. Otherwise you're opening a whole can of worms... By convergence of first moment, I mean that the integral of the distance function (measured from an arbitrary fixed point) converges. Hairer (talk) 20:23, 28 July 2012 (UTC)


 * Also, aren't "nice" and "Polish" essentially the same things: complete separable spaces which either already come with a metric (in the "nice" case), or can be endowed with a metric (in the "Polish" case). Piyush (talk) 17:57, 27 July 2012 (UTC)


 * Also I need to apologize for an error in my original response: in the definition of TV, you probably also need to restrict f to being integrable with respect to both measures (for otherwise the definition does not make sense).  This restriction is not needed when µ and ν are probability measures, since then boundedness of f also implies integrability.   19:27, 26 July 2012 (UTC)

Continuity makes no sense in this context since $$X$$ is an arbitrary measurable space, so doesn't come with a topology and even less with a metric. TV and Radon are the same when $$X$$ is Polish, I've added that. TV convergence is strictly stronger than the strong convergence of measures, this is why I removed the corresponding incorrect statement from an earlier version. Hairer (talk) 11:51, 25 July 2012 (UTC)
 * Thanks for adding the clarification. I was wondering if you knew of a good source we could use for this article and the one on Radon metric?  The situation with respect to strong convergence, for example, seems to be a mess.  The article on Radon measure defines it in terms of the Radon metric, which would be strictly stronger than the definition on this page, as you point out, in the case of Polish spaces. (I confess to not being very familiar with measure theory in general, but I believe the proof that continuous functions are dense in the set of integrable functions generalizes to Polish spaces easily.)  Also, do you think we should add the caveat that in the definition of TV, f needs to be restricted further to be integrable wrt both µ and ν? (This is not a problem in the probabilistic setting, where bounded => integrable, and so I missed it in my comment above.)   Piyush (talk) 19:27, 26 July 2012 (UTC)


 * I like Bogachev's two volumes on measure theory. Hairer (talk) 07:33, 27 July 2012 (UTC)

Hmm, I agree with Piyush, I think this article could use a section of equivalences and counter-examples: so e.g. "on Polish spaces, blah is equivalent to blah, but a counter-example is provided by blah which shows that x is not equivalent to y when z." I admit that I'm too lazy to read two volumes on measure theory just at this very moment. linas (talk) 16:39, 29 July 2012 (UTC)
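As a concrete check of the points made in this thread, here is a minimal sketch for measures on a finite set. It assumes the convention of the formula quoted at the top of the section (sup over measurable f with values in [-1, 1], so two mutually singular probability measures are at distance 2). The function names are hypothetical. The formula used for the Wasserstein-1 distance, the integral of the absolute difference of the cumulative distribution functions, is the standard one on the real line and is not taken from the article.

```python
from itertools import product

def tv_distance(mu, nu):
    """Total variation norm of mu - nu on a finite space, computed as the l1 norm."""
    return sum(abs(m - n) for m, n in zip(mu, nu))

def tv_by_sup(mu, nu):
    """Same quantity via the sup over test functions f: X -> [-1, 1].

    The objective is linear in f, so it is enough to try f with values +1 or -1.
    No continuity or Lipschitz condition is imposed on f.
    """
    return max(
        sum(f_i * (m - n) for f_i, m, n in zip(f, mu, nu))
        for f in product((-1.0, 1.0), repeat=len(mu))
    )

def wasserstein1(points, mu, nu):
    """Wasserstein-1 distance on a finite subset of the real line.

    Uses the one-dimensional formula: integral of |F_mu - F_nu|.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    total, cdf_gap = 0.0, 0.0
    for a, b in zip(order, order[1:]):
        cdf_gap += mu[a] - nu[a]  # F_mu - F_nu on the interval [x_a, x_b)
        total += abs(cdf_gap) * (points[b] - points[a])
    return total

if __name__ == "__main__":
    # Dirac measure at 0 versus Dirac measure at 1/n.
    for n in (1, 10, 100):
        points = [0.0, 1.0 / n]
        delta_0, delta_1n = [1.0, 0.0], [0.0, 1.0]
        print(n, tv_distance(delta_0, delta_1n), wasserstein1(points, delta_0, delta_1n))
```

For the two Dirac measures the total variation distance stays at 2 for every n, while the Wasserstein-1 distance is 1/n. This matches the remarks in the thread: the TV definition uses only the measurable structure, whereas the Lipschitz restriction brings in the metric.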

dbar distance
Anyone care to start an article on the d-bar distance or d-bar metric? I'm reading about this now; it's used in the theory of dynamical systems, and it seems to be maybe related to the Wasserstein metric, if not maybe identical to it in some case or other. I'll see if I can figure this all out in the next few...days? months? It's needed to define a finitely-determined process. linas (talk) 04:19, 29 July 2012 (UTC)

Portmanteau thm
More clarification is needed. The article currently states that
 * lim P_n(A) = P(A) for all continuity sets A of measure P.

is a variant definition of weak convergence. But if I remove the word 'continuity' in the above, then I get the definition of strong convergence. I'm trying to think of a good example where the two are inequivalent, and where it's obvious that the continuity sets are the cause of the inequivalence... having this example in the text would be nice. linas (talk) 16:25, 29 July 2012 (UTC)
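One standard example that seems to fit this request: on the real line, take P_n to be the Dirac measure at 1/n and P the Dirac measure at 0. The set A = (-∞, 0] has boundary {0}, which carries all the mass of P, so A is not a continuity set of P, and P_n(A) = 0 for every n while P(A) = 1. For half-lines (-∞, a] with a ≠ 0 the boundary has P-measure zero and the masses do converge. The sketch below only tabulates these numbers; the function names are hypothetical.

```python
import math

def dirac_halfline_mass(x, a):
    """Mass that the Dirac measure at x gives to the half-line (-inf, a]."""
    return 1.0 if x <= a else 0.0

def masses_along_sequence(a, n_values=(1, 10, 100, 1000)):
    """Return ([P_n((-inf, a]) for each n], P((-inf, a])).

    Here P_n is the Dirac measure at 1/n and P is the Dirac measure at 0.
    """
    along = [dirac_halfline_mass(1.0 / n, a) for n in n_values]
    return along, dirac_halfline_mass(0.0, a)

if __name__ == "__main__":
    # a = 0: the boundary {0} carries all of P, so (-inf, 0] is not a continuity set.
    print(masses_along_sequence(0.0))
    # a = 0.5 and a = -0.5: the boundary {a} has P-measure zero (continuity sets).
    print(masses_along_sequence(0.5))
    print(masses_along_sequence(-0.5))
    # Integrating a bounded continuous f against the Dirac measure at 1/n evaluates f at 1/n.
    print([math.cos(1.0 / n) for n in (1, 10, 100, 1000)], math.cos(0.0))
```

So P_n converges to P weakly (the integrals of bounded continuous functions converge, and the condition on continuity sets holds) but not strongly, because the single set (-∞, 0] already fails.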

Metrizability, etc.
Questions for folks watching this page:


 * Is it customary to consider only sequences in probability? That's a bit funny. The weak-* topology is not metrizable in general.


 * Speaking of the weak-* topology, why specialize to the Borel sigma-algebra in the definition? For any measurable space, the dual of L∞ is the family of measures of finite total variation---the total variation norm should coincide with the operator norm on functionals, I believe. (L∞ is definitely not separable, so the weak-* topology is not metrizable.)


 * If one brings in a topology and insists on the Borel sigma-algebra, there is now a pairing between C0(S), say S LCH, and the Radon measures. So the definition "Enf → Ef for all bounded, continuous functions f" is again a little funny from the functional analytic point of view.


 * "If S is also separable, then P(S) is metrizable and separable, for example by the Lévy–Prokhorov metric...": If by P(S) you mean Radon probability measures, then this should be a trivial fact: C0(S) is separable so the weak-* topology on its dual is metrizable.


 * "...if S is also compact or Polish, so is P(S)": this should be true in general, without any assumption (in particular, any topological assumption) on S. The Banach-Alaoglu theorem says the unit ball of a dual is always weak-* compact.
 * "If S is separable, it naturally embeds into P(S) as the (closed) set of dirac measures, and its convex hull is dense.": Again, why the separability assumption? S, with just the Borel sigma-algebra, always embeds in the Radon probability measures. The weak-* density of its convex hull follows from the Krein-Milman theorem.

My impression is that too many special assumptions are being made, some of them unnecessary, and that the article is not clear on exactly why and how they strengthen the intended statements. Mct mht (talk) 12:10, 29 March 2013 (UTC)


 * I agree with you that this section isn't great. A number of your questions/remarks might however be answered by the following observation. While it is true that the space of probability measures consists of the positive elements of norm 1 in the dual of C0(S) (at least when S is a locally compact Polish space), what probabilists call "Weak convergence" is the weak-* convergence with respect to the pairing with Cb(S), the space of all bounded continuous functions. This is the reason why P(S) is not compact in general. Think of the case S = R, and take the sequence of Dirac measures located at the positive integers. When pairing with C0(S), this weak-* converges to 0, which is not a probability measure, this is why this is not a desirable notion of convergence. When pairing with Cb(S) on the other hand, this does not converge to any probability measure on R.


 * One could on the other hand consider the space of all positive elements of norm 1 in the dual of Cb(S). This space would give the space of all probability measures on the Stone-Cech compactification of R, which is then indeed compact. In this space, some subsequence of the above sequence converges to some probability measure concentrated on the various "points at infinity" added by the compactification procedure. Hairer (talk) 21:19, 30 March 2013 (UTC)


 * That makes sense, a little bit. Why don't we put what you said in the article, that this is the relative weak-* topology in the space of probability measures on the Stone-Cech compactification?


 * I am not sure this would be very useful to the casual reader, I wouldn't expect anyone to know what the Stone-Cech compactification is ;-) I suppose that it would however make sense to add a remark/warning stating that while weak convergence is defined by using the dual pairing with Cb(S), the set of positive elements of norm one of its dual is in general strictly larger than the set of probability measures. Hairer (talk) 14:56, 4 April 2013 (UTC)


 * It would be really helpful for the article to distinguish between the cases where S is compact and where it is non-compact. In the compact case, this is just the "obvious" weak-* topology and standard functional analytic results apply (like weak-* compactness by Banach-Alaoglu and weak-* density of the convex hull by Krein-Milman). In the noncompact case, the probability measures on the Stone-Cech compactification are tricky to identify concretely and the corresponding results are less trivial(?). Mct mht (talk) 10:35, 31 March 2013 (UTC)


 * Also, what is the metric structure doing besides adding one (Lipschitz) characterization? Is it necessary for the definition? Mct mht (talk) 15:17, 31 March 2013 (UTC)


 * Not sure, my abstract measure theory isn't that great, but if the topology isn't metrisable, it might break some of the other characterisations. Come to think of it, I am not even 100% sure that the equivalent characterisations stated on this page are really all equivalent without assuming separability... Hairer (talk) 14:56, 4 April 2013 (UTC)
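The example from earlier in this thread, the Dirac measures located at the positive integers, can be tabulated numerically. The sketch below pairs the Dirac measure at n with one test function from C0(R) and one from Cb(R); the two particular test functions and all names are illustrative choices, not taken from the article.

```python
import math

def c0_test_function(x):
    """A continuous function vanishing at infinity (an element of C0(R))."""
    return 1.0 / (1.0 + x * x)

def cb_test_function(x):
    """A bounded continuous function that does not vanish at infinity."""
    return math.cos(math.pi * x)

def pair_with_dirac(f, n):
    """Integral of f against the Dirac measure at n, which is just f(n)."""
    return f(n)

if __name__ == "__main__":
    for n in (1, 2, 3, 10, 11, 100, 101):
        against_c0 = pair_with_dirac(c0_test_function, n)
        against_cb = round(pair_with_dirac(cb_test_function, n), 6)
        print(n, against_c0, against_cb)
```

The pairings with the C0 function tend to 0, as they do for every element of C0(R), which is the weak-* convergence to the zero measure described above. The pairings with the chosen Cb function alternate between values near -1 and +1, so for this test function the sequence has no limit at all, in line with the statement that the sequence does not converge weakly to any probability measure on R.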

Section Weak convergence of measures as an example of weak-* convergence
I rewrote this section a bit to, hopefully, make it clearer, and moved it further down because it uses vague convergence. AnnZMath (talk) 14:21, 8 October 2023 (UTC)

Section Comparison of convergence
According to Bourbaki, vague convergence is convergence with respect to the space C_c, and not C_0. See Bourbaki, Integration, Ch. III. — Preceding unsigned comment added by 141.43.110.155 (talk) 20:12, 25 April 2024 (UTC)
 * I'm not very familiar with this, because I mostly work with probability measures (and in that case taking $$C_c$$ or $$C_0$$ makes no difference). But the definition using $$C_c$$ does indeed seem to be the most common one so I am going to fix the article. Malparti (talk) 16:39, 29 April 2024 (UTC)