V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. V-statistics are closely related to U-statistics (U for "unbiased"), introduced by Wassily Hoeffding in 1948. A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions
Statistics that can be represented as functionals $$T(F_n)$$ of the empirical distribution function $$F_n$$ are called statistical functionals. Differentiability of the functional $$T$$ plays a key role in the von Mises approach; von Mises therefore restricts attention to differentiable statistical functionals.

Examples of statistical functions
  The k-th central moment is the functional $$T(F)=\int(x-\mu)^k \, dF(x)$$, where $$\mu = E[X]$$ is the expected value of X. The associated statistical function is the sample k-th central moment,



$$T_n=m_k=T(F_n) = \frac 1n \sum_{i=1}^n (x_i - \overline x)^k. $$

 The chi-squared goodness-of-fit statistic is a statistical function $$T(F_n)$$, corresponding to the statistical functional

$$T(F) = \sum_{i=1}^k \frac{(\int_{A_i} \, dF - p_i)^2}{p_i}, $$

where $$A_i$$ are the k cells and $$p_i$$ are the specified probabilities of the cells under the null hypothesis.

 The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional

$$T(F) = \int (F(x) - F_0(x))^2 \, w(x;F_0) \, dF_0(x), $$

where $$w(x; F_0)$$ is a specified weight function and $$F_0$$ is a specified null distribution. If $$w(x;F_0)\equiv 1$$ then $$T(F_n)$$ is the well-known Cramér–von Mises goodness-of-fit statistic; if $$w(x;F_0)=[F_0(x)(1-F_0(x))]^{-1}$$ then $$T(F_n)$$ is the Anderson–Darling statistic.
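The plug-in idea behind these three examples can be sketched in code. The following is a minimal illustration, not a reference implementation: the function names are hypothetical, the cells $$A_i$$ are assumed to be half-open intervals given by their edges, and the Cramér–von Mises functional is evaluated with constant weight 1 through its usual closed form, which assumes a continuous null distribution $$F_0$$.

```python
def central_moment(xs, k):
    """Plug-in estimate T(F_n) of the k-th central moment."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def chi_squared_functional(xs, edges, probs):
    """T(F_n) = sum_i (F_n(A_i) - p_i)^2 / p_i.

    Assumption: cell A_i is the half-open interval [edges[i], edges[i+1]).
    Multiplying the result by n gives Pearson's chi-squared statistic.
    """
    n = len(xs)
    total = 0.0
    for i, p in enumerate(probs):
        lo, hi = edges[i], edges[i + 1]
        freq = sum(1 for x in xs if lo <= x < hi) / n  # empirical mass of A_i
        total += (freq - p) ** 2 / p
    return total


def cvm_functional(xs, cdf0):
    """T(F_n) = integral of (F_n - F_0)^2 dF_0 with weight w = 1.

    Uses the closed form
        n * T(F_n) = 1/(12 n) + sum_i ((2 i - 1)/(2 n) - F_0(x_(i)))^2,
    where x_(i) are the order statistics; cdf0 is the null CDF F_0
    (assumed continuous).
    """
    n = len(xs)
    u = sorted(cdf0(x) for x in xs)
    s = sum(((2 * i - 1) / (2 * n) - ui) ** 2 for i, ui in enumerate(u, start=1))
    return (1 / (12 * n) + s) / n


if __name__ == "__main__":
    sample = [0.1, 0.2, 0.6, 0.9]
    print(central_moment(sample, 2))
    print(chi_squared_functional(sample, [0.0, 0.5, 1.0], [0.5, 0.5]))
    print(cvm_functional(sample, lambda x: x))  # null: uniform on [0, 1]
```

Each function takes only the sample (and the null specification), which reflects the fact that a statistical function depends on the data solely through $$F_n$$.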

Representation as a V-statistic
Suppose $$x_1, \dots, x_n$$ is a sample. In typical applications the statistical function has a representation as the V-statistic

$$V_{mn} = \frac{1}{n^m} \sum_{i_1=1}^n \cdots \sum_{i_m=1}^n h(x_{i_1}, x_{i_2}, \dots, x_{i_m}), $$

where h is a symmetric kernel function. Serfling discusses how to find the kernel in practice. $$V_{mn}$$ is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For a sample $$x_1, \dots, x_n$$, the corresponding V-statistic is defined as

$$V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n h(x_i, x_j). $$
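The definition translates directly into a brute-force computation. The sketch below is illustrative only (the name `v_statistic` is hypothetical); it averages the kernel over all $$n^m$$ index tuples, including those with repeated indices, so it is practical only for small n and m.

```python
from itertools import product


def v_statistic(xs, h, m):
    """Degree-m V-statistic: average of h over all n**m ordered tuples.

    Tuples with repeated indices (e.g. h(x_i, x_i)) are included, which is
    what distinguishes a V-statistic from the corresponding U-statistic.
    """
    n = len(xs)
    return sum(h(*t) for t in product(xs, repeat=m)) / n ** m


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Degree 1 with h(x) = x is the sample mean.
    print(v_statistic(data, lambda x: x, 1))
    # Degree 2 with a symmetric kernel h(x, y) = (x - y)^2 / 2.
    print(v_statistic(data, lambda x, y: (x - y) ** 2 / 2, 2))
```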

Example of a V-statistic
 An example of a degree-2 V-statistic is the second central moment $$m_2$$.

If $$h(x, y) = (x - y)^2/2$$, the corresponding V-statistic is

$$V_{2,n} = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2, $$

which is the maximum likelihood estimator of variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:


 * $$s^2 = {n \choose 2}^{-1} \sum_{i < j} \frac{1}{2}(x_i - x_j)^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2$$.
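These two identities can be checked numerically. The following sketch (function names and data are illustrative) compares the kernel averages with the population and sample variance from Python's `statistics` module, and also checks the relation $$V_{2,n} = \tfrac{n-1}{n}\, s^2$$ that follows from the two formulas.

```python
import math
from itertools import combinations, product
from statistics import pvariance, variance


def kernel(x, y):
    """Symmetric degree-2 kernel h(x, y) = (x - y)^2 / 2."""
    return (x - y) ** 2 / 2


def v_stat(xs):
    """V-statistic: average over all n^2 ordered pairs, diagonal included."""
    n = len(xs)
    return sum(kernel(x, y) for x, y in product(xs, repeat=2)) / n ** 2


def u_stat(xs):
    """U-statistic: average over the n-choose-2 pairs with i < j."""
    pairs = list(combinations(xs, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


data = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(data)

# V-statistic equals the 1/n (maximum likelihood) variance estimator.
print(math.isclose(v_stat(data), pvariance(data)))
# U-statistic equals the unbiased 1/(n-1) sample variance.
print(math.isclose(u_stat(data), variance(data)))
# The two differ only by the factor (n - 1) / n.
print(math.isclose(v_stat(data), (n - 1) / n * u_stat(data)))
```

The difference between the two statistics comes entirely from the diagonal terms $$h(x_i, x_i)$$, which are zero for this kernel but are still counted in the $$n^2$$ divisor.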

Asymptotic distribution
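In the three examples of statistical functions given above, the asymptotic distributions differ: for the central moment it is normal, for the chi-squared goodness-of-fit statistic it is chi-squared, and for the Cramér–von Mises and Anderson–Darling statistics it is a weighted sum of chi-squared variables.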

Von Mises' approach is a unifying theory that covers all of the cases above. Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics. Let A(m) be the property defined by:
 * A(m):
   1. $$\operatorname{Var}(h(X_1, \dots, X_k)) = 0$$ for k < m, and $$\operatorname{Var}(h(X_1, \dots, X_k)) > 0$$ for k = m;
   2. $$n^{m/2}R_{mn}$$ tends to zero (in probability), where $$R_{mn}$$ is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic is, up to an asymptotically negligible remainder, a sample mean, and the central limit theorem implies that $$T(F_n)$$ is asymptotically normal.

In the variance example above (the degree-2 V-statistic with kernel $$h(x,y)=(x-y)^2/2$$), $$m_2$$ is asymptotically normal with mean $$\sigma^2$$ and variance $$(\mu_4 - \sigma^4)/n$$, where $$\mu_4=E(X-E(X))^4$$.
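The variance stated here can be recovered from the first-order projection of the kernel. The following is a sketch using the standard projection argument for degree-m kernels; the symbols $$h_1$$ and $$\zeta_1$$ are notation introduced for this illustration only. With $$h(x,y)=(x-y)^2/2$$ and $$\mu = E[X]$$,

$$h_1(x) = E[h(x, X_2)] = \tfrac{1}{2}\left[(x-\mu)^2 + \sigma^2\right], \qquad \zeta_1 = \operatorname{Var}(h_1(X_1)) = \tfrac{1}{4}(\mu_4 - \sigma^4).$$

For a kernel of degree m the leading term of the variance is $$m^2\zeta_1/n$$, so with m = 2 one obtains $$4\cdot\tfrac{1}{4}(\mu_4-\sigma^4)/n = (\mu_4 - \sigma^4)/n$$. The kernel is non-degenerate as long as $$\mu_4 > \sigma^4$$.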

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and $$E[h^2(X_1,X_2)]<\infty, \, E|h(X_1,X_1)|<\infty, $$ and $$ E[h(x,X_1)]\equiv 0$$. Then $$nV_{2,n}$$ converges in distribution to a weighted sum of independent chi-squared variables:


 * $$ n V_{2,n} {\stackrel d \longrightarrow} \sum_{k=1}^\infty \lambda_k Z^2_k,$$

where $$Z_k$$ are independent standard normal variables and $$\lambda_k$$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic $$V_{2,n}$$ is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional (the third example of a statistical function above) is an example of a degenerate kernel V-statistic.
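A simple case in which the limit can be written down explicitly is the product kernel. This kernel is chosen here purely for illustration and is not one of the examples above; it is assumed that $$E[X]=0$$ and $$\operatorname{Var}(X)=\sigma^2<\infty$$. Taking $$h(x,y)=xy$$ gives $$E[h(x,X_1)] = x\,E[X_1] = 0$$ for every x, so the kernel is degenerate, and

$$V_{2,n} = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n x_i x_j = \bar x^{\,2}, \qquad nV_{2,n} = (\sqrt{n}\,\bar x)^2 \;{\stackrel d \longrightarrow}\; \sigma^2 Z_1^2 .$$

In the notation of the limit above this corresponds to $$\lambda_1=\sigma^2$$ and $$\lambda_k = 0$$ for $$k\ge 2$$, so the limit is a scaled chi-squared variable with one degree of freedom.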