Hájek projection

In statistics, the Hájek projection of a random variable $$T$$ on a set of independent random vectors $$X_1,\dots,X_n$$ is a particular measurable function of $$X_1,\dots,X_n$$ that, loosely speaking, captures the variation of $$T$$ in an optimal way. It is named after the Czech statistician Jaroslav Hájek.

Definition
Given a random variable $$T$$ and a set of independent random vectors $$X_1,\dots,X_n$$, the Hájek projection $$\hat{T}$$ of $$T$$ onto $$\{X_1,\dots,X_n\}$$ is given by


 * $$\hat{T} = \operatorname{E}(T) + \sum_{i=1}^n \left[ \operatorname{E}(T\mid X_i) - \operatorname{E}(T)\right] = \sum_{i=1}^n \operatorname{E}(T\mid X_i) - (n-1)\operatorname{E}(T)$$
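
As an illustration (the statistic chosen here is an assumption made for the example, not part of the definition), let $$X_1,\dots,X_n$$ with $$n\ge 2$$ be i.i.d. real random variables with mean $$\mu$$ and finite variance, and take the U-statistic $$T = \binom{n}{2}^{-1}\sum_{i<j} X_i X_j$$. Then $$\operatorname{E}(T)=\mu^2$$. Conditioning on $$X_k$$, each of the $$n-1$$ pairs containing $$k$$ contributes $$\mu X_k$$ and each of the remaining $$\binom{n-1}{2}$$ pairs contributes $$\mu^2$$, so

 * $$\operatorname{E}(T\mid X_k) = \frac{2}{n}\mu X_k + \frac{n-2}{n}\mu^2$$

Substituting into the definition gives

 * $$\hat{T} = \mu^2 + \frac{2\mu}{n}\sum_{k=1}^n (X_k-\mu)$$

In this example the projection is therefore a linear function of the sample mean.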

Properties

 * The Hájek projection $$\hat{T}$$ is the $$L^2$$ projection of $$T$$ onto the linear subspace of all random variables of the form $$\sum_{i=1}^n g_i(X_i)$$, where $$g_i:\mathbb{R}^d \to \mathbb{R} $$ are arbitrary measurable functions such that $$\operatorname{E}(g_i^2(X_i))<\infty $$ for all $$i=1,\dots,n$$.
 * $$\operatorname{E} (\hat{T}\mid X_i)=\operatorname{E}(T\mid X_i)$$ for every $$i$$, and hence $$\operatorname{E}(\hat{T})=\operatorname{E}(T)$$.
 * Under suitable conditions, a sequence of statistics $$T_n=T_n(X_1,\dots,X_n)$$ and the sequence of its Hájek projections $$\hat{T}_n = \hat{T}_n(X_1,\dots,X_n)$$ have the same asymptotic distribution. Specifically, if $$\operatorname{Var}(T_n)/\operatorname{Var}(\hat{T}_n) \to 1$$ as $$n\to\infty$$, then $$\frac{T_n-\operatorname{E}(T_n)}{\sqrt{\operatorname{Var}(T_n)}} - \frac{\hat{T}_n-\operatorname{E}(\hat{T}_n)}{\sqrt{\operatorname{Var}(\hat{T}_n)}}$$ converges to zero in probability.
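
The first two properties can be checked exactly on a small discrete example. The following is a minimal sketch, not a reference implementation: the helper name `hajek_projection`, the choice of three independent variables uniform on $$\{0,1,2\}$$, and the statistic $$T=(X_1X_2+X_2X_3+X_1X_3)/3$$ are all assumptions made for illustration. It enumerates the joint distribution, builds $$\hat{T}$$ from the definition, and verifies that the means agree, that the conditional expectations given each $$X_i$$ agree, and that the residual $$T-\hat{T}$$ is orthogonal to $$\hat{T}-\operatorname{E}(T)$$, as expected of an $$L^2$$ projection.

```python
from fractions import Fraction
from itertools import product


def hajek_projection(T, supports, probs):
    """Return (E(T), hat_T, outcomes) for independent discrete X_1..X_n.

    supports[i] lists the distinct values of X_i and probs[i] the matching
    (nonzero) probabilities. Hypothetical helper written for this example.
    """
    n = len(supports)
    # Independence: the joint pmf is the product of the marginal pmfs.
    outcomes = []
    for idx in product(*(range(len(s)) for s in supports)):
        x = tuple(supports[i][k] for i, k in enumerate(idx))
        p = Fraction(1)
        for i, k in enumerate(idx):
            p *= probs[i][k]
        outcomes.append((x, p))

    mean = sum(p * T(x) for x, p in outcomes)

    # cond[i][v] = E(T | X_i = v)
    cond = []
    for i in range(n):
        table = {}
        for k, v in enumerate(supports[i]):
            num = sum(p * T(x) for x, p in outcomes if x[i] == v)
            table[v] = num / probs[i][k]
        cond.append(table)

    def hat_T(x):
        # Definition: E(T) + sum_i [E(T | X_i) - E(T)]
        return mean + sum(cond[i][x[i]] - mean for i in range(n))

    return mean, hat_T, outcomes


def T(x):
    # Illustrative statistic: average of pairwise products.
    return Fraction(x[0] * x[1] + x[1] * x[2] + x[0] * x[2], 3)


supports = [[0, 1, 2]] * 3
probs = [[Fraction(1, 3)] * 3] * 3
mean, hat_T, outcomes = hajek_projection(T, supports, probs)


def cond_mean(f, i, v):
    # E(f(X) | X_i = v), computed directly from the joint distribution.
    num = sum(p * f(x) for x, p in outcomes if x[i] == v)
    den = sum(p for x, p in outcomes if x[i] == v)
    return num / den


# E(hat T) should equal E(T).
mean_hat = sum(p * hat_T(x) for x, p in outcomes)
# E(hat T | X_i) should equal E(T | X_i) for every i and every value.
cond_ok = all(
    cond_mean(hat_T, i, v) == cond_mean(T, i, v)
    for i in range(3)
    for v in supports[i]
)
# The residual should be orthogonal to the centred projection.
cross = sum(p * (T(x) - hat_T(x)) * (hat_T(x) - mean) for x, p in outcomes)

print(mean, mean_hat, cond_ok, cross)
```

Exact rational arithmetic is used so that the equalities hold without floating-point tolerance. The enumeration grows exponentially in $$n$$, so the sketch is only suitable for very small examples.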