Talk:Energy distance

Merge discussion
This article covers much the same stuff as the older E-statistic by the same originator/editor. The problems with that article have still not been resolved and there is no point in re-creating the same problems in yet another article. Melcombe (talk) 16:30, 6 January 2011 (UTC)


 * Agree with merging; and also can see that the bulk of E-statistic article has been copied into this article (Energy distance) in response to the proposal to merge. Mathstat (talk) 02:19, 17 January 2011 (UTC)


 * Go for it. You may also wish to comment at Conflict_of_interest/Noticeboard. SmartSE (talk) 11:50, 17 January 2011 (UTC)

Quadratic form
Energy statistics under a suitable null hypothesis converge to a quadratic form of standard normal RV's. The link to Quadratic form was incorrect, and the other option Quadratic form (statistics) is not general enough to cover this type of distribution. The asymptotic distribution has the form
 * $$ \sum_{i=1}^\infty \lambda_i Z_i^2 $$

where $$Z_i$$ are iid standard normal. This limit arises in the asymptotic theory of V-statistics. The definition of quadratic form in Quadratic form (statistics) is a special case, applied in linear models. The Quadratic form (statistics) article as written would need to start with a more general definition to cover this case. Any thoughts on creating a separate article Quadratic form (probability) vs revision of Quadratic form (statistics)? Mathstat (talk) 17:23, 24 January 2011 (UTC)
 * It seems that Quadratic form (statistics) covers cases that are more general than you need, not less, but it does not go on to cover details about the actual distribution which may be your concern, just the moments. The QF article might be extended to cover how to transform the general problem into the case where the weighting matrix is diagonal, which is what you want (?). But you might like also to consider the article Generalized chi-squared distribution. Melcombe (talk) 16:35, 25 January 2011 (UTC)


 * The definition begins with a sample of size n and an n by n matrix. "If ε is a vector of n random variables, and Λ is an n-dimensional symmetric square matrix, then the scalar quantity $$\varepsilon^T \Lambda \varepsilon$$ is known as a quadratic form in ε." This definition of Quadratic form (statistics) would not cover the case of the infinite sum, at least not the way it is currently written. In this sense it would not cover the usual definition of quadratic form in asymptotic theory. Mathstat (talk) 18:04, 25 January 2011 (UTC)
 * I had not spotted the infinite summation. While Quadratic form (statistics) might reasonably be extended to include the diagonalisation step and thus appear to lead on to the infinite case you want, I think the type of summation you want to consider is so important in its own right that it deserves its own article. However, it is not really a "quadratic form", just a sum of squares. The reference I have that uses such results a lot doesn't have a handy name for them, but just calls them "a weighted sum of independent chi-squared (1) rv's", which wouldn't really be suitable as an article title. The one term they do use for the context in which they arise is "principal component decomposition" (for a representation of some underlying process), which again may be too confusing here because of confusion with principal components analysis. But you mention "the usual definition of quadratic form in asymptotic theory" .... if the term really is used in relevant literature, then why not a title like "Quadratic form (asymptotic theory)", as I don't think there is a particularly strong distinction to be made here between probability and statistics. An alternative would be "Infinite quadratic form (statistics)". Melcombe (talk) 10:40, 26 January 2011 (UTC)
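
For readers following this thread, the limit discussed above (a weighted sum of independent chi-squared(1) variables) can be sketched numerically by truncating the infinite sum. This is only an illustration: the function name `sample_weighted_chisq`, the truncation at 50 terms, and the weights $$\lambda_i = 1/i^2$$ are assumptions chosen for the example and are not taken from any reference cited in this discussion.

```python
import numpy as np

def sample_weighted_chisq(lambdas, size, rng):
    """Draw `size` samples of sum_i lambdas[i] * Z_i**2 with Z_i iid N(0, 1).

    `lambdas` is a finite (truncated) list of weights, so this only
    approximates the infinite sum.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    z = rng.standard_normal((size, lambdas.size))  # one row per sample
    return (z ** 2) @ lambdas                      # weighted sum of squares

# Hypothetical weights for illustration: lambda_i = 1 / i**2, i = 1..50.
rng = np.random.default_rng(0)
lambdas = 1.0 / np.arange(1, 51) ** 2
draws = sample_weighted_chisq(lambdas, 20000, rng)

# Each Z_i**2 has mean 1 and variance 2, so the truncated sum has
# mean sum(lambdas) and variance 2 * sum(lambdas**2).
print(draws.mean(), lambdas.sum())
print(draws.var(), 2 * (lambdas ** 2).sum())
```

With a diagonal weighting matrix of finite size this is the finite quadratic form; the infinite case is what the truncation approximates.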

Formula clarification
The two sample E-statistic was originally defined here as
 * $$ T = \frac{2nm}{n+m} E_{n,m}(X,Y), $$

and although this is a valid definition, in the references and in the software implementation, the formula used is
 * $$ T = \frac{nm}{n+m} E_{n,m}(X,Y). $$

In the first formula, the statistic $$E_{n,m}(X,Y)$$ is multiplied by the harmonic mean of the sample sizes. However, the factor 2 is not needed. One reason to prefer the second formula is that under the null hypothesis (X and Y are identically distributed), for every n and m,
 * $$\mathbb E[T_{n,m}] = \mathbb E \left[\frac{nm}{n+m} E_{n,m}(X,Y)\right] = \mathbb E \|X-Y\|. $$

For the definition of the statistic, see e.g. or energy package manual. Mathstat (talk) 13:57, 30 January 2011 (UTC)
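
The expectation identity above can be checked by a small simulation. The sketch below assumes the V-statistic form $$E_{n,m}(X,Y) = \frac{2}{nm}\sum_{i,j}\|X_i-Y_j\| - \frac{1}{n^2}\sum_{i,j}\|X_i-X_j\| - \frac{1}{m^2}\sum_{i,j}\|Y_i-Y_j\|$$; the function names are hypothetical and this is not the code of the energy package. For X and Y iid standard normal on the real line, $$\mathbb E\|X-Y\| = 2/\sqrt{\pi}$$.

```python
import numpy as np

def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (a_i, b_j); rows are points."""
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=2).mean()

def energy_e(x, y):
    """E_{n,m}(X, Y) in V-statistic form (diagonal terms included)."""
    return (2.0 * mean_pairwise_distance(x, y)
            - mean_pairwise_distance(x, x)
            - mean_pairwise_distance(y, y))

def energy_t(x, y):
    """Second formula in this section: T = nm / (n + m) * E_{n,m}(X, Y)."""
    n, m = len(x), len(y)
    return n * m / (n + m) * energy_e(x, y)

def simulate_mean_t(n, m, reps, seed):
    """Monte Carlo average of T when X and Y are both iid N(0, 1)."""
    rng = np.random.default_rng(seed)
    values = [energy_t(rng.standard_normal((n, 1)),
                       rng.standard_normal((m, 1)))
              for _ in range(reps)]
    return float(np.mean(values))

# Under the null, the average of T should be near E|X - Y| = 2 / sqrt(pi).
print(simulate_mean_t(5, 8, 5000, seed=0), 2 / np.sqrt(np.pi))
```

With the factor 2 included, the simulated mean would instead be near twice that value.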

Energy distance: square root or not?
In almost every paper that describes the energy distance, it is defined as:


 * $$ D(F, G) = 2\operatorname E\|X - Y\| - \operatorname E\|X - X'\| - \operatorname E\|Y - Y'\| \geq 0,$$

not as the square root of that. It is clear to me that the square root of the energy distance is a metric, while the energy distance itself is not. However, I think that Wikipedia should use the same definition that is being used in scientific literature. Vnmabus (talk) 15:16, 8 October 2018 (UTC)
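
A small numerical illustration of the metric point, assuming discrete distributions on the real line (the function name and the three example distributions are chosen only for this sketch): with F a point mass at 0, G a point mass at 1, and H the equal mixture of the two, D violates the triangle inequality while its square root satisfies it.

```python
import numpy as np

def energy_distance_discrete(x, p, y, q):
    """D(F, G) = 2 E|X - Y| - E|X - X'| - E|Y - Y'| for discrete F and G.

    x, y are support points on the real line; p, q are their probabilities.
    """
    x, p = np.asarray(x, dtype=float), np.asarray(p, dtype=float)
    y, q = np.asarray(y, dtype=float), np.asarray(q, dtype=float)

    def expected_abs(a, pa, b, pb):
        # E|A - B| for independent A ~ (a, pa) and B ~ (b, pb)
        return pa @ np.abs(a[:, None] - b[None, :]) @ pb

    return (2.0 * expected_abs(x, p, y, q)
            - expected_abs(x, p, x, p)
            - expected_abs(y, q, y, q))

F = ([0.0], [1.0])            # point mass at 0
G = ([1.0], [1.0])            # point mass at 1
H = ([0.0, 1.0], [0.5, 0.5])  # equal mixture of the two

d_fg = energy_distance_discrete(*F, *G)
d_fh = energy_distance_discrete(*F, *H)
d_hg = energy_distance_discrete(*H, *G)

# Triangle inequality F -> H -> G, for D and for sqrt(D).
print(d_fg, d_fh + d_hg)
print(np.sqrt(d_fg), np.sqrt(d_fh) + np.sqrt(d_hg))
```

Here D(F, G) = 2 while D(F, H) + D(H, G) = 1, so the unrooted quantity is not a metric in this example.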