
Large-scale regression analysis often has to deal with inhomogeneous data. When performing regression analysis on a large-scale sample, it is unreasonable to assume that all observations are drawn independently from the same distribution; the data may instead come from several distinct groups or sources. In previous studies, varying-coefficient models have been used for this setting.

Another approach is to perform a separate least-squares fit of the model in each group. The optimal regression coefficients of distinct groups may then be markedly different from each other. To estimate the effects common to all groups, the general method of bootstrap aggregating (bagging) is not suitable. Instead, a simple estimator of the maximin effect, which maximizes the explained variance in the worst group, has been introduced to solve this problem. This estimator is in fact a convex combination of the fitted regression coefficients of the individual groups, determined by the method of maximin aggregating, or magging.

In addition, the asymptotic behavior of this estimator of the maximin effect has been analyzed, in particular its asymptotic distribution and the construction of confidence regions.

Model and Definitions
Here we consider a linear regression model with general assumptions in each group. To be more specific, the samples come from several known groups $$ g = 1, \ldots, G $$. For each group, a linear regression model of the form $$ Y_g = \textbf{X}_g b_g^0 + \epsilon_g $$ is assumed, where $$ Y_g $$ is an n-dimensional response vector, $$ b_g^0 $$ is a p-dimensional deterministic vector of regression coefficients, $$\textbf{X}_g$$ is an $$n\times p$$ design matrix containing $$n$$ observations of $$p$$ predictor variables, and $$\epsilon_g$$ is an n-dimensional error vector distributed as $$\mathcal{N}_n(0, \sigma^2 \text{Id}_n)$$. Each group has the same sample size n. However, the parameter vectors $$b_g^0$$ differ across groups, which causes the inhomogeneity.

Each group's model is fitted by the method of least squares, yielding $$\hat{b}_g = \arg\underset{b\in \mathbf{R}^p}{\min} \left \| Y_g - \textbf{X}_g b \right \|_2^2$$.
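As a small illustration of the per-group fits, the following sketch simulates $$G$$ groups with different true coefficient vectors and computes each $$\hat{b}_g$$ by least squares (the data and all dimensions here are hypothetical, chosen only for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, G = 200, 3, 4  # sample size per group, number of predictors, number of groups

# Simulate G groups, each with its own true coefficient vector b_g^0.
B0 = rng.normal(size=(p, G))                      # columns are the true b_g^0
X = [rng.normal(size=(n, p)) for _ in range(G)]   # one design matrix per group
Y = [X[g] @ B0[:, g] + 0.5 * rng.normal(size=n) for g in range(G)]

# Least-squares fit in each group: b_hat_g = argmin_b || Y_g - X_g b ||_2^2.
B_hat = np.column_stack(
    [np.linalg.lstsq(X[g], Y[g], rcond=None)[0] for g in range(G)]
)
```

With this much data per group, each fitted column of `B_hat` lies close to the corresponding true column of `B0`.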

The simplest way to aggregate these coefficient vectors is bagging, i.e., simply averaging them. This is appropriate when the samples are homogeneous, but it is not applicable in the inhomogeneous case considered here.

Another way of aggregation has been proposed, called maximin aggregation, or magging, which aims to maximize the minimum explained variance across the groups.

The explained variance in group $$g$$ is defined as $$V(b,b_g^0):= \text{E}\left \| Y_g \right \|_2^2 - \text{E}\left \| Y_g - \textbf{X}_g b \right \|_2^2 = 2b^t\Sigma^0 b_g^0 - b^t\Sigma^0 b,$$ where $$\Sigma^0 := \text{E}\hat{\Sigma}$$ and $$\hat{\Sigma} := (nG)^{-1} \textbf{X}^t \textbf{X}$$ is the sample covariance matrix of the pooled design matrix.
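The quadratic form above is concave in $$b$$ and is maximized at $$b = b_g^0$$ itself (its gradient $$2\Sigma^0 b_g^0 - 2\Sigma^0 b$$ vanishes there). A minimal numeric sketch, with an arbitrary positive definite matrix standing in for $$\Sigma^0$$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
A = rng.normal(size=(p, p))
Sigma0 = A @ A.T + p * np.eye(p)   # a stand-in positive definite Sigma^0
b_g0 = rng.normal(size=p)          # a stand-in true coefficient vector b_g^0

def explained_variance(b, b_g0, Sigma0):
    # V(b, b_g^0) = 2 b^t Sigma^0 b_g^0 - b^t Sigma^0 b
    return 2 * b @ Sigma0 @ b_g0 - b @ Sigma0 @ b

# V is a concave quadratic in b, maximized at b = b_g^0:
v_max = explained_variance(b_g0, b_g0, Sigma0)
assert v_max >= explained_variance(0.5 * b_g0, b_g0, Sigma0)
assert v_max >= explained_variance(np.zeros(p), b_g0, Sigma0)
```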

To obtain the best common effect across all groups, the explained variance in the worst case (i.e., in the group with the minimum explained variance) should be maximized. The resulting effect is an effect shared by all groups, regardless of the additional group-specific effects.

Therefore, the maximin effect $$b_\text{maximin}$$ is defined as $$ b_\text{maximin} := \arg\underset{b\in \mathbf{R}^p}{\max}\ \underset{g=1, \ldots, G}{\min}\ V(b,b_g^0)$$.

It has been shown in Meinshausen (2014) that this definition is equivalent to $$ b_\text{maximin} := \arg\underset{b\in \text{CVX}(B^0)}{\min}b^t\Sigma^0 b$$, where $$B^0 = (b_1^0, \ldots, b_G^0) \in \mathbf{R}^{p \times G}$$ is the matrix of true parameter vectors and $$\text{CVX}(B^0)$$ is the closed convex hull of the $$G$$ column vectors of $$B^0$$.

This characterization leads to an estimator of the maximin effect based on aggregating the fitted coefficient vectors. It is a convex combination of the form $$\hat{b} := \sum_{g=1}^{G} \alpha_g \hat{b}_g$$, where $$ \alpha := \arg\underset{\alpha \in C_G}{\min}\left \| \sum_{g=1}^{G} \alpha_g \textbf{X} \hat{b}_g\right \|_2 \quad \text{and} \quad C_G:= \left \{ \alpha \in \mathbf{R}^G : \mathop{\min}_g \alpha_g \geq 0 \ \text{and} \ \sum_{g} \alpha_g = 1 \right \}.$$
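Since $$\left \| \sum_g \alpha_g \textbf{X}\hat{b}_g \right \|_2^2 \propto \alpha^t (\hat{B}^t \hat{\Sigma} \hat{B}) \alpha$$, the magging weights can be computed as a small quadratic program over the simplex $$C_G$$. A sketch using a generic constrained solver, with simulated design matrices and hand-picked stand-ins for the fitted $$\hat{b}_g$$ (all names and numbers here are illustrative, not from the source):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p, G = 100, 2, 3
X = [rng.normal(size=(n, p)) for _ in range(G)]
B_hat = np.column_stack([[1.0, 0.5],
                         [1.0, -0.5],
                         [2.0, 0.0]])       # stand-in fitted coefficients b_hat_g

Xall = np.vstack(X)                          # pooled (nG x p) design matrix
Sigma_hat = Xall.T @ Xall / (n * G)          # sample covariance matrix

# Magging: minimize || sum_g alpha_g X b_hat_g ||_2 over the simplex C_G,
# i.e. minimize alpha^t H alpha with H = B_hat^t Sigma_hat B_hat.
H = B_hat.T @ Sigma_hat @ B_hat
res = minimize(
    lambda a: a @ H @ a,
    x0=np.full(G, 1.0 / G),                  # start from uniform weights
    jac=lambda a: 2 * H @ a,
    bounds=[(0.0, 1.0)] * G,
    constraints=[{"type": "eq", "fun": lambda a: a.sum() - 1.0}],
    method="SLSQP",
)
alpha = res.x
b_magging = B_hat @ alpha                    # the magging estimator
```

In this toy configuration the two vectors $$(1, 0.5)$$ and $$(1, -0.5)$$ receive roughly equal weight, while the longer vector $$(2, 0)$$ is essentially left out, since the minimum-norm point of the convex hull lies near $$(1, 0)$$.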

In the following, the magging estimator $$\hat{b}$$ is denoted by $$M_{\hat{\Sigma}} (\hat{B})$$, where $$ M_{\Sigma} (B):=\arg\underset{b\in \text{CVX}(B)}{\min}b^t\Sigma b $$ denotes the maximin effect according to the original definition.

Confidence Regions and Asymptotic Distribution
The asymptotic behavior of the estimator of the maximin effect has been analyzed by Rothenhäusler et al. (2015). Let $$\textbf{X}$$ denote the matrix obtained by stacking $$\textbf{X}_1,\ldots,\textbf{X}_G$$ row-wise.

Consider the special case where the entries of each random design matrix $$\textbf{X}_g$$ are drawn from an identical distribution. It has been proved that, with $$G$$ fixed and $$n \to \infty$$, the limit distribution of $$\sqrt{n}(M_{\hat{\Sigma}} (\hat{B})-M_{\Sigma^0} (B^0))$$ is Gaussian. Let $$W(\hat{B},\hat{\Sigma})$$ be a consistent estimator of the variance of this limit distribution. A $$(1-\alpha)$$-confidence region is given by $$ \textbf{C}(\hat{B},\hat{\Sigma}):= \left \{M \in \mathbf{R}^p: (M_{\hat{\Sigma}} (\hat{B})-M)^t W(\hat{B},\hat{\Sigma})^{-1} (M_{\hat{\Sigma}} (\hat{B})-M)\leq \frac{\tau}{n} \right \},$$ where $$\tau$$ is the $$(1-\alpha)$$-quantile of the $$\chi_p^2$$-distribution.
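The membership test defining $$\textbf{C}(\hat{B},\hat{\Sigma})$$ is a simple quadratic-form comparison against a $$\chi_p^2$$-quantile. A minimal sketch with stand-in values (here `M_hat` plays the role of $$M_{\hat{\Sigma}}(\hat{B})$$ and the diagonal matrix `W` is an assumed, hypothetical variance estimate, not computed from data):

```python
import numpy as np
from scipy.stats import chi2

p, n, alpha_level = 3, 500, 0.05
M_hat = np.array([1.0, 0.5, -0.2])      # stand-in magging estimate
W = np.diag([2.0, 1.0, 1.5])            # stand-in variance estimator W(B_hat, Sigma_hat)

tau = chi2.ppf(1 - alpha_level, df=p)   # (1 - alpha)-quantile of the chi^2_p distribution

def in_confidence_region(M, M_hat, W, tau, n):
    # Test (M_hat - M)^t W^{-1} (M_hat - M) <= tau / n.
    d = M_hat - M
    return d @ np.linalg.inv(W) @ d <= tau / n

# The centre of the region always belongs to it; far-away points do not.
assert in_confidence_region(M_hat, M_hat, W, tau, n)
assert not in_confidence_region(M_hat + 1.0, M_hat, W, tau, n)
```

Note how the radius $$\tau/n$$ shrinks as $$n$$ grows, reflecting the $$\sqrt{n}$$-rate of the estimator.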

Theorem for Asymptotic Confidence Interval
Let $$\Sigma^0$$ be a positive definite matrix. Let $$M_{\Sigma^0} (B^0) = \sum_{g=1}^{G} \alpha_g b_g^0 $$ with $$\alpha_g \geq 0,\sum_{g} \alpha_g = 1$$, and suppose that this representation is unique. Let $$\left | \left \{ g: \alpha_g \neq 0 \right \} \right | > 1$$. Suppose that the hyperplane orthogonal to the maximin effect contains only those $$b_g^0$$ with nonvanishing coefficients $$\alpha_g$$, i.e. $$ \left \{b_g^0:g=1,\ldots, G\right \}\cap \left \{M \in \mathbf{R}^p: \left \langle M-M_{\Sigma^0} (B^0), M_{\Sigma^0} (B^0) \right \rangle_{\Sigma^0} = 0  \right \} \subset  \left \{b_g^0: \alpha_g \neq 0  \right \}.$$ Then $$\lim_{n \to \infty} \mathbf{P}[M_{\Sigma^0} (B^0) \in \textbf{C}(\hat{B},\hat{\Sigma})] = 1-\alpha.$$

This theorem was proved by Rothenhäusler et al. (2015). It states that, under the given assumptions, the region defined by the formula above is asymptotically a valid $$(1-\alpha)$$-level confidence region for the maximin effect.

Theorem for Weak Convergence Limit Distribution
Suppose the same assumptions as in the theorem above are satisfied. Then, for $$n \to \infty$$, $$\sqrt{n}(M_{\hat{\Sigma}} (\hat{B})-M_{\Sigma^0} (B^0))\rightharpoonup \mathcal{N}_p\left(0,\ \sigma^2 \sum_{g \in A(B^0,\Sigma^0 )} \bigl(D_g M_{\Sigma^0} (B^0)\bigr)^t (\Sigma^0)^{-1} D_g M_{\Sigma^0}(B^0) + V\bigl(B_{A(B^0,\Sigma^0 )}^0, \Sigma^0\bigr)\right). $$

This theorem gives the explicit form of the limit distribution. Here $$D_g$$ denotes the derivative in the direction $$b_g$$, and the set $$A(B^0,\Sigma^0 )\subset \left \{ 1,\ldots, G\right \}$$ contains the indices $$g$$ for which $$b_g^0$$ has a nonvanishing coefficient $$\alpha_g$$.

Furthermore, $$V(B_{A(B^0,\Sigma^0 )}^0,\Sigma^0)=0$$ if $$\Sigma^0$$ is known. Otherwise, $$V(\hat{B}_{A(\hat{B},\hat{\Sigma} )},\hat{\Sigma}) = \hat{D}(\hat{D}^t \hat{\Sigma} \hat{D})^{-1}\hat{D}^t \hat{C} \hat{D}(\hat{D}^t \hat{\Sigma} \hat{D})^{-1}\hat{D}^t$$ is a consistent estimator of the additional variance caused by substituting $$\hat{\Sigma}$$ for $$\Sigma^0$$. Here $$\hat{C}$$ is the sample covariance matrix of the vectors $$\frac{1}{\sqrt{G}}\textbf{X}_{k\cdot}^t \textbf{X}_{k\cdot}M_{\hat{\Sigma}} (\hat{B})$$, $$k=1,\ldots,nG$$, where $$\textbf{X}_{k\cdot}$$ denotes the $$k$$-th row of $$\textbf{X}$$, and $$\hat{D}$$ is defined as $$\hat{D}:=(\tilde{b}_2,\ldots,\tilde{b}_{G'})-(\tilde{b}_1,\ldots,\tilde{b}_1)$$, with $$ {G}'= \left|A(\hat{B},\hat{\Sigma})\right| $$ and $$\tilde{B}=\hat{B}_{A(\hat{B},\hat{\Sigma})}.$$

Related Sources
Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society, Series B, 55:757–796.

Category:Regression analysis