User:SantoshUCDavis/sandbox

In modern data analysis, non-Euclidean data have become increasingly common, driven especially by applications in the biosciences, medicine, and image analysis. Such data arise as realizations of random variables that take values in a metric space. For non-Euclidean data, classical statistical objects such as the population mean, sample mean, population variance, and sample variance are not readily available and need to be generalized. The Fréchet mean, introduced by Fréchet in 1948, generalizes the centroid to metric spaces. Analogously, the Fréchet variance generalizes the variance to a measure of dispersion around a Fréchet mean.

Fréchet Mean
Let $$Y$$ be a random variable taking values in a separable metric space $$(\Omega,d)$$. The population Fréchet mean $$\mu_F$$ of $$Y$$ is defined by

$$ \mu_F = \underset{\omega \in \Omega}{\arg\min}\,\mathbb{E}\{d^2(\omega,Y)\} $$.

Similarly, for a random sample $$ Y_1, Y_2, \dots, Y_n$$ drawn from $$ Y$$, the sample Fréchet mean $$\hat{\mu}_F$$ is defined as follows:

$$ \hat{\mu}_F = \underset{\omega \in \Omega}{\arg\min}\dfrac{1}{n}\sum_{i=1}^{n}d^2(\omega, Y_i). $$

The Fréchet mean includes the arithmetic mean, the median, and the geometric mean as special cases for different choices of metric. Note that the Fréchet mean and the sample Fréchet mean, if they exist, are elements of the metric space $$ \Omega $$. The sample Fréchet mean is an M-estimator. When $$(\Omega,d)$$ is a Hadamard space or the space of probability distributions on the real line $$\mathbb{R}\,$$ endowed with the $$2$$-Wasserstein metric, the Fréchet mean exists and is unique. Under some assumptions on $$ \mu_F $$, $$ \hat{\mu}_F $$, and the metric space, it can be shown that $$ \hat{\mu}_F $$ is a consistent estimator of $$\mu_F$$.
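As a minimal illustration of how the choice of metric changes the minimizer, the following Python sketch minimizes the sample Fréchet objective by brute force over a finite grid of candidates on the real line. The function names `frechet_objective` and `sample_frechet_mean`, the grid, and the toy data are illustrative assumptions and do not come from any library. With $$d(x,y)=|x-y|$$ the minimizer is the arithmetic mean; with $$d(x,y)=\sqrt{|x-y|}$$, which is also a metric, the squared distance is $$|x-y|$$ and the minimizer is a median.

```python
import math

def frechet_objective(omega, sample, dist):
    """Average squared distance from the candidate omega to the sample points."""
    return sum(dist(omega, y) ** 2 for y in sample) / len(sample)

def sample_frechet_mean(sample, dist, candidates):
    """Brute-force minimizer of the Frechet objective over a finite candidate set."""
    return min(candidates, key=lambda w: frechet_objective(w, sample, dist))

def abs_metric(x, y):
    """Usual metric on the real line."""
    return abs(x - y)

def root_metric(x, y):
    """Square root of the usual metric; its square is |x - y|."""
    return math.sqrt(abs(x - y))

# Toy data (assumed for illustration) with one outlying value.
sample = [1.0, 2.0, 3.0, 10.0, 4.0]

# Candidate grid 0.00, 0.01, ..., 10.00 (an assumption; it covers the data range).
grid = [i / 100 for i in range(1001)]

mean_hat = sample_frechet_mean(sample, abs_metric, grid)     # arithmetic mean
median_hat = sample_frechet_mean(sample, root_metric, grid)  # median

print(mean_hat, median_hat)
```

Grid search is used here only for transparency. In a general metric space the minimization is usually carried out with an algorithm tailored to the geometry of the space.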

Fréchet Variance
The Fréchet variance is a measure of dispersion of the random variable $$ Y $$ around $$ \mu_F $$. The population Fréchet variance $$ V_F $$ and the sample Fréchet variance $$ \hat{V}_F $$ are defined as

$$ V_F = \mathbb{E}\{d^2(\mu_F,Y)\}, \quad \hat{V}_F=\dfrac{1}{n}\sum_{i=1}^{n}d^2(\hat{\mu}_F,Y_i). $$

Note that $$V_F, \hat{V}_F \in \mathbb{R}$$.
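The following Python sketch computes a sample Fréchet mean and the corresponding sample Fréchet variance for angles on the unit circle, using the shorter-arc distance as the metric. The helper names, the toy angles, and the grid resolution are assumptions made for illustration, and the minimizer is only approximated by grid search. The example also shows why a generalization is needed: the data cluster around angle $$0$$, yet the naive arithmetic mean of the raw angles is $$\pi$$, the point opposite the data.

```python
import math

TWO_PI = 2 * math.pi

def arc_distance(a, b):
    """Geodesic (shorter-arc) distance between two angles on the unit circle."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)

def frechet_objective(omega, sample):
    """Average squared arc distance from the candidate omega to the sample."""
    return sum(arc_distance(omega, y) ** 2 for y in sample) / len(sample)

# Toy sample (assumed): four angles clustered around 0, i.e. around 2*pi.
angles = [0.1, TWO_PI - 0.1, 0.3, TWO_PI - 0.3]

# Approximate the sample Frechet mean by grid search over the circle.
grid = [k * TWO_PI / 3600 for k in range(3600)]
mu_hat = min(grid, key=lambda w: frechet_objective(w, angles))

# Sample Frechet variance: average squared distance to the sample Frechet mean.
v_hat = frechet_objective(mu_hat, angles)

# Naive arithmetic mean of the raw angles, which ignores the circular geometry.
naive_mean = sum(angles) / len(angles)

print(mu_hat, v_hat, naive_mean)
```

Here the sample Fréchet mean is a point of the circle, while the sample Fréchet variance is a real number.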

Metric Variance
Depending on the space, the distribution, and the data, the Fréchet mean may not be well defined. An alternative notion of variance that does not require a Fréchet mean is the metric variance. It is a different generalization of the variance in Euclidean spaces, with population and sample versions given as follows:

$$ V_{F}^{M}= \dfrac{1}{2}\mathbb{E}\{d^2(Y,Y')\}, \quad \hat{V}^{M}_F = \dfrac{1}{2n^2}\sum_{i,j}d^2(Y_i,Y_j), $$

where $$ {Y'} $$ is an independent copy of $$ Y $$, and the superscript $$ M $$ refers to metric space generalization.
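A short Python sketch of the sample metric variance is given below. It needs only pairwise distances, so no minimization is involved. The function name `metric_variance` and the toy points are assumptions for illustration. In Euclidean space the metric variance coincides with the Fréchet variance, which the example checks numerically for four points in the plane.

```python
import math

def metric_variance(sample, dist):
    """Sample metric variance: sum of squared pairwise distances over 2 * n^2."""
    n = len(sample)
    total = sum(dist(a, b) ** 2 for a in sample for b in sample)
    return total / (2 * n ** 2)

# Toy data (assumed): the corners of a square in the Euclidean plane.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]

# math.dist is the Euclidean distance between two points (Python 3.8+).
v_metric = metric_variance(points, math.dist)

# In Euclidean space the sample Frechet mean is the centroid, so the sample
# Frechet variance is the average squared distance to the centroid.
n = len(points)
centroid = tuple(sum(p[k] for p in points) / n for k in range(2))
v_frechet = sum(math.dist(centroid, p) ** 2 for p in points) / n

print(v_metric, v_frechet)
```

Both quantities equal $$2$$ here, up to floating-point rounding. In a general metric space the two notions of variance need not agree.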

Asymptotic Distribution of Sample Fréchet Variance
Under some assumptions, it can be shown that the sample Fréchet variance is a consistent estimator of the population Fréchet variance. Because $$V_F, \hat{V}_F \in \mathbb{R}$$, a central limit theorem can be stated for the sample Fréchet variance, and it holds under some technical assumptions:

$$\sqrt{n}(\hat{V}_F-V_F)\overset{D}\rightarrow N(0,\sigma_F^2),$$

where $$\sigma_F^2$$ is the variance of the random variable $$d^2(\mu_F,Y)$$.
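One way this limit could be used in practice is to form an approximate confidence interval for $$V_F$$. The Python sketch below assumes that $$\sigma_F^2$$ is estimated by the plug-in sample variance of the squared distances $$d^2(\hat{\mu}_F, Y_i)$$; this estimator, the function name `frechet_variance_ci`, and the toy data are assumptions for illustration and are not taken from the text above. The example uses real-valued data with the usual metric, where the sample Fréchet mean is the arithmetic mean.

```python
import math
from statistics import NormalDist

def frechet_variance_ci(sq_dists, level=0.95):
    """Normal-approximation interval for the Frechet variance.

    sq_dists holds the squared distances d^2(mu_hat, Y_i), i = 1, ..., n.
    sigma_F^2 is replaced by a plug-in estimate (an assumption of this sketch).
    """
    n = len(sq_dists)
    v_hat = sum(sq_dists) / n
    sigma2_hat = sum((s - v_hat) ** 2 for s in sq_dists) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(sigma2_hat / n)
    return v_hat, v_hat - half_width, v_hat + half_width

# Toy data (assumed); far too small for the asymptotics to be reliable.
sample = [1.0, 2.0, 3.0, 10.0, 4.0]
mu_hat = sum(sample) / len(sample)
sq_dists = [(y - mu_hat) ** 2 for y in sample]

v_hat, lower, upper = frechet_variance_ci(sq_dists)
print(v_hat, lower, upper)
```

With only five observations the interval is very wide and its lower end is negative, which illustrates that the normal approximation is intended for large $$n$$.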

Applications
The asymptotic distribution of the sample Fréchet variance can be used to construct tests that compare $$k$$ populations of metric-space-valued data objects in terms of their Fréchet means and variances. A bootstrap version of these tests can be used when sample sizes are relatively small. An application of the Fréchet variance to human mortality data suggested a systematic difference in age-at-death distributions between the Eastern European countries and the other countries in the data set over the period from 1960 to 2009. A version of this test has also been proposed for change-point analysis.