De-sparsified lasso

The de-sparsified lasso is a method for constructing confidence intervals and statistical tests for single or low-dimensional components of a large parameter vector in high-dimensional models.

High-dimensional linear model
$$ Y = X\beta^0 + \epsilon $$ with $$ n \times p $$ design matrix $$X =: [X_1, \dots, X_p]$$ ($$n \times 1$$ vectors $$X_j$$), error vector $$\epsilon \sim N_n(0, \sigma^2_\epsilon I)$$ independent of $$X$$, and unknown $$p \times 1$$ regression vector $$\beta^0$$.

The parameter is usually estimated by the lasso: $$ \hat{\beta}^n(\lambda) = \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \ \frac{1}{2n} \left\| Y - X\beta \right\|^2_2 + \lambda \left\| \beta \right\|_1 $$
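
A minimal sketch of this fit, assuming scikit-learn: its Lasso class minimizes exactly the objective above, $$\frac{1}{2n}\|Y - X\beta\|^2_2 + \alpha\|\beta\\|_1$$, with alpha playing the role of $$\lambda$$. The data-generating values below (dimensions, sparse coefficients, noise level) are illustrative choices, not from the source.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200                             # high-dimensional: p > n
beta0 = np.zeros(p)
beta0[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]     # sparse true coefficients
X = rng.standard_normal((n, p))
y = X @ beta0 + 0.5 * rng.standard_normal(n)  # eps ~ N(0, sigma^2 I)

lam = 0.1
lasso = Lasso(alpha=lam, fit_intercept=False)
lasso.fit(X, y)
beta_hat = lasso.coef_                      # the (sparse, biased) lasso estimate
```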

The de-sparsified lasso is obtained by modifying the lasso estimator, which fulfills the Karush–Kuhn–Tucker conditions, as follows:

$$\hat{\beta}^n(\lambda,M) = \hat{\beta}^n(\lambda) + \frac{1}{n} M X^T(Y- X \hat{\beta}^n (\lambda)) $$

where $$M \in \mathbb{R}^{p\times p}$$ is an arbitrary matrix. The matrix $$M$$ is generated using a surrogate inverse covariance matrix (for example, the nodewise lasso described below).
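
A minimal sketch of the de-sparsification step, continuing the example above. For illustration we take $$M$$ to be a ridge-regularized inverse of $$\hat{\Sigma} = X^T X / n$$; this particular surrogate is an assumption of the sketch, not the construction prescribed by the method (the nodewise lasso below is the standard choice).

```python
# Surrogate inverse of the empirical covariance (illustrative choice).
Sigma_hat = X.T @ X / n
M = np.linalg.inv(Sigma_hat + 0.1 * np.eye(p))

# De-sparsified lasso: add the correction term (1/n) M X^T (y - X beta_hat).
residual = y - X @ beta_hat
beta_desparse = beta_hat + M @ X.T @ residual / n
# beta_desparse is no longer sparse, but its components can be used for
# confidence intervals and tests.
```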

Generalized linear model
Desparsifying $$l_1$$-norm penalized estimators and the corresponding theory can also be applied to models with convex loss functions, such as generalized linear models.

Consider $$1 \times p$$ vectors of covariates $$x_i \in \mathcal{X} \subset \mathbb{R}^p$$ and univariate responses $$y_i \in \mathcal{Y} \subset \mathbb{R}$$ for $$i = 1, \dots, n$$.

We have a loss function $$\rho_\beta(y, x) = \rho(y, x\beta)$$ ($$\beta \in \mathbb{R}^p$$), which is assumed to be strictly convex in $$\beta \in \mathbb{R}^p$$.

The $$l_1$$-norm regularized estimator is $$ \hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left( P_n \rho_\beta + \lambda \left\| \beta \right\|_1 \right), $$ where $$P_n \rho_\beta := n^{-1} \sum_{i=1}^n \rho_\beta(y_i, x_i)$$ denotes the empirical average of the loss.
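
A minimal sketch of this estimator for one concrete GLM, assuming logistic loss $$\rho(y, x\beta) = \log(1 + e^{-y \, x\beta})$$ with labels $$y \in \{-1, +1\}$$. scikit-learn's liblinear solver minimizes $$\|\beta\|_1 + C \sum_i \rho_\beta(y_i, x_i)$$, so setting $$C = 1/(n\lambda)$$ recovers the objective $$P_n \rho_\beta + \lambda\|\beta\|_1$$ up to an overall scale factor.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 200, 50
beta0 = np.zeros(p)
beta0[:3] = [1.5, -1.0, 0.5]                # sparse true coefficients
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-X @ beta0))     # logistic success probabilities
y = np.where(rng.random(n) < prob, 1, -1)

lam = 0.05
glm = LogisticRegression(penalty="l1", C=1.0 / (n * lam),
                         solver="liblinear", fit_intercept=False)
glm.fit(X, y)
beta_hat = glm.coef_.ravel()                # l1-penalized GLM estimate
```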

Similarly, the lasso for nodewise regression with matrix input is defined as follows. Denote by $$\hat{\Sigma}$$ a matrix which we want to approximately invert using the nodewise lasso.

The nodewise lasso estimate for the $$j$$th row is $$ \hat{\gamma}_j := \underset{\gamma \in \mathbb{R}^{p-1}}{\operatorname{argmin}} \left( \hat{\Sigma}_{j,j} - 2 \hat{\Sigma}_{j,/j} \gamma + \gamma^T \hat{\Sigma}_{/j,/j} \gamma + 2 \lambda_j \left\| \gamma \right\|_1 \right), $$

where $$\hat{\Sigma}_{j,/j}$$ denotes the $$j$$th row of $$\hat{\Sigma}$$ without the diagonal element $$(j,j)$$, and $$\hat{\Sigma}_{/j,/j}$$ is the submatrix without the $$j$$th row and $$j$$th column.
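
A minimal sketch of the nodewise lasso, under the assumption $$\hat{\Sigma} = X^T X / n$$, in which case the objective above reduces (up to constants) to an ordinary lasso regression of column $$X_j$$ on the remaining columns. The residual-variance scaling $$\hat{\tau}^2_j = \hat{\Sigma}_{j,j} - \hat{\Sigma}_{j,/j}\hat{\gamma}_j$$ and the row construction follow the standard surrogate-inverse recipe; the function name and data below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_lasso(X, lam):
    """Build a surrogate inverse of Sigma_hat = X^T X / n row by row."""
    n, p = X.shape
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        reg = Lasso(alpha=lam, fit_intercept=False)
        reg.fit(X[:, idx], X[:, j])          # regress X_j on the other columns
        gamma_j = reg.coef_
        # tau_j^2 = Sigma_jj - Sigma_{j,/j} gamma_j = X_j^T (X_j - X_{/j} gamma_j) / n
        resid = X[:, j] - X[:, idx] @ gamma_j
        tau2_j = resid @ X[:, j] / n
        # j-th row of the surrogate inverse: 1 in position j, -gamma_j elsewhere,
        # scaled by 1 / tau_j^2
        row = np.zeros(p)
        row[j] = 1.0
        row[idx] = -gamma_j
        Theta[j] = row / tau2_j
    return Theta

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 30))
M = nodewise_lasso(X, lam=0.1)               # usable as M in the de-sparsified lasso
```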