Partially linear model

A partially linear model is a form of semiparametric model, since it contains both parametric and nonparametric components. Least squares estimators can be applied to a partially linear model if the nonparametric component is assumed to be known. Partially linear equations were first used by Engle, Granger, Rice and Weiss (1986) to analyze the relationship between temperature and electricity usage. A typical application in microeconomics was presented by Tripathi in 1997, in a study of the profitability of firms' production. Partially linear models have also been applied successfully in other academic fields. In 1994, Zeger and Diggle introduced the partially linear model into biometrics, and in environmental science, Prada-Sanchez et al. used a partially linear model to analyze collected data in 2000. Since then, estimation of the partially linear model has been developed with many other statistical methods. In 1988, Robinson applied the Nadaraya-Watson kernel estimator to estimate the nonparametric component and build a least squares estimator. Later, in 1997, a local linear method was proposed by Hamilton and Truong.

Model equation
The partially linear model is written as:

$$y_i=\delta_i^T\beta+f(T_i)+\mu_i$$
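As an illustration only, the sketch below draws data from the model with a scalar covariate $$\delta_i$$. The choices $$f(t)=\sin(2\pi t)$$, $$\delta_i\sim N(0,1)$$, $$T_i\sim U(0,1)$$ and normal errors with standard deviation `sigma`, as well as the function name `simulate_plm`, are assumptions made for the example and are not part of the model definition.

```python
import math
import random


def simulate_plm(n, beta=2.0, sigma=0.5, seed=0):
    """Draw n triples (delta_i, T_i, y_i) from y_i = delta_i*beta + f(T_i) + mu_i.

    Illustrative assumptions (not from the text): scalar delta_i ~ N(0, 1),
    T_i ~ Uniform(0, 1), f(t) = sin(2*pi*t) and mu_i ~ N(0, sigma^2).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        delta = rng.gauss(0.0, 1.0)   # covariate entering the linear part
        t = rng.random()              # covariate entering through f
        mu = rng.gauss(0.0, sigma)    # zero-mean random error
        y = delta * beta + math.sin(2 * math.pi * t) + mu
        data.append((delta, t, y))
    return data


sample = simulate_plm(5, seed=1)
```

The linear part $$\delta_i\beta$$ and the nonlinear part $$f(T_i)$$ are simply added, which is what makes the model "partially" linear.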

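Components of the equation
$$\delta_i$$ and $$T_i$$: explanatory variables; $$\delta_i=(\delta_{i1},\dots,\delta_{ip})^T$$ enters the model linearly and $$T_i$$ enters through $$f$$. They may be independent random variables (random design) or fixed, non-random values (fixed design).

$$\beta$$: unknown $$p$$-dimensional parameter vector to be estimated.

$$\mu_i$$: random error with mean zero.

$$f(T_i)$$: unknown function, the nonparametric part of the model, to be estimated.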

Assumptions
Wolfgang Härdle, Hua Liang and Jiti Gao state the assumptions of the partially linear model, with remarks, under both fixed and random design conditions.

In the random design case, define

$$L_j(T_i)=E(\delta_{ij}\mid T_i),\qquad \mu_{ij}=\delta_{ij}-E(\delta_{ij}\mid T_i),\qquad j=1,\dots,p.$$ (1)

It is assumed that $$\sup_{0\le t\le 1}E(\|\delta_1\|^3\mid T=t)<\infty$$ and that the covariance matrix $$\operatorname{Cov}\{\delta_1-E(\delta_1\mid T_1)\}$$ is positive definite. The random errors $$\mu_i$$ are independent of $$(\delta_i,T_i)$$.

In the fixed design case, $$\delta_i$$ and $$T_i$$ are non-random, the functions $$L_j$$ are defined on $$[0,1]$$, and $$\delta_{ij}=L_j(T_i)+\mu_{ij}$$ for $$1\le i\le n$$ and $$1\le j\le p$$, where $$\mu_i=(\mu_{i1},\dots,\mu_{ip})^T$$ satisfies $$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n\mu_i\mu_i^T=\Sigma$$ for a positive definite matrix $$\Sigma$$.

The least squares (LS) estimators
A precondition for applying the least squares estimators is that the parametric and nonparametric components are identifiable, in both the random and the fixed design cases.

Before applying the least squares estimators, we first introduce the smoothing model of Engle, Granger, Rice and Weiss (1986), which is expressed as $$Y=\delta^T\beta+f(T)$$ (2).

Härdle, Liang and Gao assume that the pair $$(\beta,f)$$ satisfies

$$\frac{1}{n}\sum_{i=1}^n E\{Y_i-\delta_i^T\beta-f(T_i)\}^2=\min_{(\alpha,m)}\frac{1}{n}\sum_{i=1}^n E\{Y_i-\delta_i^T\alpha-m(T_i)\}^2.$$ (3)

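If two pairs $$(\beta_1,f_1)$$ and $$(\beta_2,f_2)$$ both attain the minimum in (3), then for all $$1\leq i\leq n$$, $$\delta_i^T\beta_1+f_1(T_i)=\delta_i^T\beta_2+f_2(T_i)$$.

The model is identifiable if this forces $$f_1=f_2$$ and $$\beta_1=\beta_2$$, which is shown below for the random and the fixed design cases.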

In the random design case, Härdle, Liang and Gao assume that for all $$1\le i\le n$$,

$$E[Y_i\mid(\delta_i,T_i)]=\delta_i^T\beta_1+f_1(T_i)=\delta_i^T\beta_2+f_2(T_i).$$ (4)

It follows that

$$E\{Y_i-\delta_i^T\beta_1-f_1(T_i)\}^2=E\{Y_i-\delta_i^T\beta_2-f_2(T_i)\}^2+(\beta_1-\beta_2)^TE\{(\delta_i-E[\delta_i|T_i])(\delta_i-E[\delta_i|T_i])^T\}(\beta_1-\beta_2).$$

Since both pairs attain the minimum in (3), the last term must be zero. The matrix $$E\{(\delta_i-E[\delta_i|T_i])(\delta_i-E[\delta_i|T_i])^T\}$$ is positive definite by the assumption stated after (1), so $$\beta_1=\beta_2$$. Moreover, $$f_j(T_i)=E[Y_i|T_i]-E[\delta_i^T\beta_j|T_i]$$ holds for all $$1\le i\le n$$ and $$j=1,2$$, so $$\beta_1=\beta_2$$ implies $$f_1=f_2$$.

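In the fixed design case, parameterize the nonparametric part of the smoothing model (2) as $$\{f(T_1),\dots,f(T_n)\}^T=\omega_r$$, where $$\omega_r$$ lies in the column space of a full-rank matrix $$\omega$$ with $$n$$ rows. For a given $$\beta$$, the least squares choice is $$\omega_r=Q(Y-X\beta)$$, where $$Q=\omega(\omega^T\omega)^{-1}\omega^T$$ and $$X=(\delta_1,\dots,\delta_n)^T$$.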

Making the same assumption as (4), and using the fixed-design assumption stated after (1), $$\beta_1=\beta_2$$ and $$f_1=f_2$$ follow as in the random design case from the identity

$$\frac{1}{n}E\{(Y-X\beta_1-\omega_{r1})^T(Y-X\beta_1-\omega_{r1})\}=\frac{1}{n}E\{(Y-X\beta_2-\omega_{r2})^T(Y-X\beta_2-\omega_{r2})\}+\frac{1}{n}(\beta_1-\beta_2)^TX^T(I-Q)X(\beta_1-\beta_2).$$
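The fixed-design construction can be sketched numerically for a scalar covariate. For a given $$\beta$$ the nonparametric fit is $$Q(Y-X\beta)$$, and minimizing the remaining residual sum of squares $$\|(I-Q)(Y-X\beta)\|^2$$ gives $$\hat\beta=\{X^T(I-Q)X\}^{-1}X^T(I-Q)Y$$. The sketch assumes, hypothetically, that the columns of $$\omega$$ are the polynomials $$1,T,\dots,T^3$$; the helper names `solve`, `residualize` and `fixed_design_ls` are illustrative only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residualize(omega, v):
    """Return (I - Q) v, where Q = omega (omega^T omega)^{-1} omega^T."""
    q = len(omega[0])
    gram = [[sum(row[j] * row[k] for row in omega) for k in range(q)] for j in range(q)]
    rhs = [sum(row[j] * vi for row, vi in zip(omega, v)) for j in range(q)]
    coef = solve(gram, rhs)
    return [vi - sum(c * w for c, w in zip(coef, row)) for row, vi in zip(omega, v)]


def fixed_design_ls(x, t, y, degree=3):
    """LS estimate of a scalar beta and of {f(T_1), ..., f(T_n)} under a fixed design.

    Hypothetical choice: omega has polynomial columns 1, t, ..., t^degree.
    """
    omega = [[ti ** k for k in range(degree + 1)] for ti in t]
    xr, yr = residualize(omega, x), residualize(omega, y)
    # beta_hat = x^T (I-Q) y / x^T (I-Q) x, using that I - Q is symmetric and idempotent.
    beta = sum(a * b for a, b in zip(xr, yr)) / sum(a * a for a in xr)
    u = [yi - xi * beta for xi, yi in zip(x, y)]
    # omega_r = Q u = u - (I - Q) u is the fitted vector {f(T_1), ..., f(T_n)}^T.
    f_hat = [ui - ri for ui, ri in zip(u, residualize(omega, u))]
    return beta, f_hat
```

When the true $$f$$ is itself a cubic and there is no noise, the sketch should recover $$\beta$$ and $$f(T_i)$$ up to rounding error.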

Suppose that $$(\delta_i,T_i,Y_i)$$, $$i=1,\dots,n$$, satisfy $$Y_i=\delta_i^T\beta+f(T_i)+\mu_i$$, and let $$\psi_{ni}(t)$$ be positive weight functions. For every $$\beta$$, the function $$f(t)$$ is estimated by $$f_n(t;\beta)=\sum_{i=1}^n\psi_{ni}(t)(Y_i-\delta_i^T\beta)$$. Substituting this into the LS criterion $$\sum_{i=1}^n\{Y_i-\delta_i^T\beta-f_n(T_i;\beta)\}^2$$ gives the LS estimator

$$\beta_{LS}=(\tilde{\delta}^T\tilde{\delta})^{-1}\tilde{\delta}^T\tilde{Y},$$

where $$\tilde{\delta}=(\tilde{\delta}_1,\dots,\tilde{\delta}_n)^T$$ and $$\tilde{Y}=(\tilde{Y}_1,\dots,\tilde{Y}_n)^T$$, with $$\tilde{\delta}_i=\delta_i-\sum_{j=1}^n\psi_{nj}(T_i)\delta_j$$ and $$\tilde{Y}_i=Y_i-\sum_{j=1}^n\psi_{nj}(T_i)Y_j$$. The nonparametric estimator of $$f(t)$$ is $$\hat{f}_n(t)=\sum_{i=1}^n\psi_{ni}(t)(Y_i-\delta_i^T\beta_{LS})$$. When the random errors are identically distributed, the variance $$\sigma^2$$ is estimated by $$\hat{\sigma}_n^2=\frac{1}{n}\sum_{i=1}^n(\tilde{Y}_i-\tilde{\delta}_i^T\beta_{LS})^2$$.
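A minimal sketch of these formulas for a scalar covariate $$\delta_i$$ follows. Using Nadaraya-Watson weights with a Gaussian kernel and bandwidth `h` for $$\psi_{ni}(t)$$ is an assumption (the text only requires positive weight functions), and the function names are hypothetical.

```python
import math


def nw_weights(t, ts, h):
    """Nadaraya-Watson weights psi_{ni}(t), i = 1..n, with a Gaussian kernel (assumed)."""
    k = [math.exp(-0.5 * ((t - ti) / h) ** 2) for ti in ts]
    total = sum(k)
    return [ki / total for ki in k]


def plm_least_squares(delta, ts, y, h=0.1):
    """Scalar-beta sketch of beta_LS, f_hat_n(t) and sigma_hat_n^2."""
    n = len(y)
    w = [nw_weights(ti, ts, h) for ti in ts]  # w[i][j] = psi_{nj}(T_i)
    d_tilde = [delta[i] - sum(w[i][j] * delta[j] for j in range(n)) for i in range(n)]
    y_tilde = [y[i] - sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]
    beta = sum(a * b for a, b in zip(d_tilde, y_tilde)) / sum(a * a for a in d_tilde)

    def f_hat(t):
        # f_hat_n(t) = sum_i psi_{ni}(t) * (Y_i - delta_i * beta_LS)
        return sum(wi * (yi - di * beta)
                   for wi, yi, di in zip(nw_weights(t, ts, h), y, delta))

    # sigma_hat_n^2 = (1/n) * sum_i (Y~_i - delta~_i * beta_LS)^2
    sigma2 = sum((yt - dt * beta) ** 2 for yt, dt in zip(y_tilde, d_tilde)) / n
    return beta, f_hat, sigma2
```

Because the weights sum to one, a constant $$f$$ is removed exactly by the smoothing step, so noise-free data with constant $$f$$ give $$\beta_{LS}=\beta$$ and $$\hat\sigma_n^2=0$$ up to rounding.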

History and applications of partially linear model
Partially linear models were first applied to real data by Engle, Granger, Rice and Weiss in 1986.

In their view, the relationship between temperature and electricity consumption cannot be expressed by a linear model, because there are many confounding factors, such as average income, prices, consumer purchasing power and other economic activity. Some of these factors are correlated with one another and might influence the observed result. They therefore introduced a partially linear model, containing both parametric and nonparametric components, in which these factors enter linearly while the effect of temperature is modeled nonparametrically (Engle, Granger, Rice and Weiss, 1986). They also applied the smoothing spline technique in their research.

Zeger and Diggle applied the partially linear model in biometrics in 1994. Their paper studied the time course of CD4 cell counts in HIV (human immunodeficiency virus) seroconverters (Zeger and Diggle, 1994). CD4 cells play a significant role in the immune function of the human body, and Zeger and Diggle aimed to assess the progression of the disease by measuring changes in the number of CD4 cells. The CD4 cell count is associated with age, smoking behavior and other factors. To account for these covariates in their observational data, Zeger and Diggle applied a partially linear model. The model was used mainly to estimate the average time course of CD4 cell loss while adjusting for the time dependence of other covariates, which simplifies comparisons across the data. It also characterizes deviations from the typical curve of the observed group, in order to estimate the progression curve of the CD4 cell count. These deviations can help identify subjects whose CD4 cell counts decline slowly.

In 1999, Schmalensee and Stoker used a partially linear model in economics. Their research concerned the demand for gasoline in the United States, and its primary target was the long-run income elasticity of gasoline consumption in the U.S. Here too, there are many confounding variables that might affect one another. Hence, Schmalensee and Stoker applied a partially linear model, combining parametric and nonparametric components, to handle these data.

In environmental science, Prada-Sanchez used a partially linear model to predict sulfur dioxide pollution in 2000 (Prada-Sanchez, 2000), and in the following year, Lin and Carroll applied the partially linear model to clustered data (Lin and Carroll, 2001).

Development of partially linear model
According to Liang (2010), the smoothing spline technique was introduced into the partially linear model by Engle et al., Heckman and Rice, all in 1986. Robinson then proposed, in 1988, a least squares estimator based on kernel estimates of the nonparametric component, and in the same year Speckman recommended the profile least squares method.

Other econometrics tools in partially linear model
Kernel regression has also been applied to the partially linear model. Kernel regression methods include the local constant method developed by Speckman and local linear techniques, which were proposed by Hamilton and Truong in 1997 and revised by Opsomer and Ruppert in 1997. Green et al. and Opsomer and Ruppert found that a significant characteristic of kernel-based methods is that undersmoothing is required to obtain a root-n consistent estimator of $$\beta$$. However, research by Speckman in 1988 and by Severini and Staniswalis in 1994 showed that this restriction can be removed.

Bandwidth selection in partially linear model
Bandwidth selection in the partially linear model is a difficult issue. Liang proposed possible solutions based on profile-kernel and backfitting methods. He also justified why undersmoothing is necessary for the backfitting method and why the profile-kernel based method can attain the optimal bandwidth. Liang's work applies a general computation strategy for estimating the nonparametric function. Moreover, the penalized spline method for partially linear models was introduced, and intensive simulation experiments were carried out to compare the numerical properties of the penalized spline, profile and backfitting methods.

Kernel-based profile and backfitting method
Write the model as $$Y=X^T\beta+g(T)+\epsilon$$, where $$X$$ and $$g$$ play the roles of $$\delta$$ and $$f$$ above, assuming $$E(\epsilon|X,T)=0$$. Taking conditional expectations given $$T$$ gives

$$E(Y|T)=\{E(X|T)\}^T\beta+g(T),$$

and subtracting this from the model gives

$$Y-E(Y|T)=\{X-E(X|T)\}^T\beta+\epsilon.$$

An intuitive estimator of $$\beta$$ is therefore the LS estimator obtained after appropriately estimating $$E(Y|T)$$ and $$E(X|T)$$.

For any random vector $$\xi$$, let $$\hat{E}(\xi|T)$$ be a kernel regression estimator of $$E(\xi|T)$$. Let $$\tilde{\xi}=\xi-E(\xi|T)$$ and $$\Sigma_{X|T}=\operatorname{Cov}\{X-E(X|T)\}$$; for example, $$\tilde{X}_i=X_i-E(X_i|T_i)$$. Denote $$Y=(Y_1,\dots,Y_n)^T$$, and define $$X$$, $$g$$ and $$T$$ similarly. Let $$m_x(t)=E(X|T=t)$$ and $$m_y(t)=E(Y|T=t)$$, and define

$$\psi(m_x,m_y,\beta,Y,X,T)=\{X-m_x(T)\}[Y-m_y(T)-\{X-m_x(T)\}^T\beta].$$

The profile-kernel based estimator $$\hat{\beta}_p$$ solves

$$0=\sum_{i=1}^n\psi(\hat{m}_x,\hat{m}_y,\beta,Y_i,X_i,T_i),$$

where $$\hat{m}_x$$ and $$\hat{m}_y$$ are kernel estimators of $$m_x$$ and $$m_y$$.
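The estimating equation can be illustrated for a two-dimensional $$X$$. Because $$\psi$$ is linear in $$\beta$$, its root solves $$\sum_i\tilde X_i\tilde X_i^T\beta=\sum_i\tilde X_i\tilde Y_i$$ with $$\tilde X_i=X_i-\hat m_x(T_i)$$ and $$\tilde Y_i=Y_i-\hat m_y(T_i)$$. The sketch assumes Nadaraya-Watson estimators with a Gaussian kernel and bandwidth `h` for $$\hat m_x$$ and $$\hat m_y$$; the function names are hypothetical.

```python
import math


def nw_smooth(values, ts, t, h):
    """Nadaraya-Watson estimate of E(values | T = t); Gaussian kernel assumed."""
    k = [math.exp(-0.5 * ((t - ti) / h) ** 2) for ti in ts]
    return sum(ki * v for ki, v in zip(k, values)) / sum(k)


def profile_kernel_beta(X, Y, T, h=0.1):
    """Solve 0 = sum_i psi(m_x_hat, m_y_hat, beta, Y_i, X_i, T_i) for 2-dim X."""
    x1 = [row[0] for row in X]
    x2 = [row[1] for row in X]
    # Centered covariates X_i - m_x_hat(T_i) and responses Y_i - m_y_hat(T_i).
    xt = [(a - nw_smooth(x1, T, t, h), b - nw_smooth(x2, T, t, h))
          for a, b, t in zip(x1, x2, T)]
    yt = [y - nw_smooth(Y, T, t, h) for y, t in zip(Y, T)]
    # 2x2 system (sum Xt Xt^T) beta = sum Xt Yt, solved by Cramer's rule.
    s11 = sum(u * u for u, _ in xt)
    s12 = sum(u * v for u, v in xt)
    s22 = sum(v * v for _, v in xt)
    r1 = sum(u * y for (u, _), y in zip(xt, yt))
    r2 = sum(v * y for (_, v), y in zip(xt, yt))
    det = s11 * s22 - s12 * s12
    beta = ((s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det)
    return beta, xt, yt


def psi_sum(beta, xt, yt):
    """Evaluate sum_i psi_i at beta; it should be close to zero at beta_hat_p."""
    out = [0.0, 0.0]
    for (u, v), y in zip(xt, yt):
        resid = y - (u * beta[0] + v * beta[1])
        out[0] += u * resid
        out[1] += v * resid
    return out
```

Evaluating `psi_sum` at the returned estimate is a quick check that the estimating equation is satisfied.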

The penalized spline method
The penalized spline method was developed by Eilers and Marx in 1996. Brumback, Ruppert and Wand (1999) and Ruppert and Carroll (2000) employed this method in the linear mixed-effects (LME) model framework.

Assume that the function $$g(t)$$ can be approximated by

$$g(t,\tau)=\tau_0+\tau_1t+\dots+\tau_pt^p+\sum_{k=1}^K b_k(t-\xi_k)_+^p,$$

where $$p\geq1$$ is an integer, $$\xi_1<\dots<\xi_K$$ are fixed knots and $$a_+=\max(a,0)$$. Denote $$\tau=(\tau_0,\dots,\tau_p)^T$$ and consider the model $$Y=X^T\beta+g(T,\tau)+\epsilon$$. The penalized spline estimator $$(\hat{\beta}_{ps}^T,\hat{\tau}_{ps}^T)^T$$ of $$(\beta^T,\tau^T)^T$$ is defined as the minimizer, over $$\beta$$, $$\tau$$ and $$b_1,\dots,b_K$$, of

$$\sum_{i=1}^n [Y_i-X_i^T\beta-g(T_i,\tau)]^2+\alpha\sum_{k=1}^Kb_k^2,$$

where $$\alpha$$ is a smoothing parameter.
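The penalized criterion is a ridge-type least squares problem in which only the knot coefficients $$b_k$$ are penalized, so it can be solved through its normal equations. The sketch below assumes a scalar $$X$$ and uses illustrative function names; it is not taken from the cited papers.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def penalized_spline_plm(x, t, y, knots, p=1, alpha=1.0):
    """Minimize sum_i [Y_i - x_i*beta - g(T_i, tau)]^2 + alpha * sum_k b_k^2.

    Scalar x assumed. Design columns: x, 1, t, ..., t^p, (t - xi_k)_+^p.
    Returns (beta, [tau_0..tau_p], [b_1..b_K]).
    """
    rows = [[xi] + [ti ** j for j in range(p + 1)]
            + [max(ti - k, 0.0) ** p for k in knots]
            for xi, ti in zip(x, t)]
    m = len(rows[0])
    n_unpen = 2 + p  # beta plus tau_0, ..., tau_p are not penalized
    lhs = [[sum(r[a] * r[c] for r in rows) for c in range(m)] for a in range(m)]
    for a in range(n_unpen, m):
        lhs[a][a] += alpha  # ridge penalty on the knot coefficients b_k only
    rhs = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(m)]
    theta = solve(lhs, rhs)
    return theta[0], theta[1:n_unpen], theta[n_unpen:]
```

Setting `alpha=0` gives unpenalized least squares on the truncated power basis; increasing `alpha` shrinks the $$b_k$$ toward zero, pulling the fitted $$g$$ toward a global polynomial of degree $$p$$.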

As Brumback et al. (1999) noted, the estimator $$(\hat{\beta}_{ps}^T,\hat{\tau}_{ps}^T)^T$$ is the same as the estimator of $$(\beta^T,\tau^T)^T$$ based on the LME model

$$Y=\Lambda(\beta^T,\tau^T)^T+Zb+\epsilon,$$

where

$$\Lambda=\begin{pmatrix} x_{11} & \cdots & x_{1d} & 1 & T_1 & \cdots & T_1^p\\ x_{21} & \cdots & x_{2d} & 1 & T_2 & \cdots & T_2^p\\ \vdots & & \vdots & \vdots & \vdots & & \vdots\\ x_{n1} & \cdots & x_{nd} & 1 & T_n & \cdots & T_n^p \end{pmatrix},\qquad Z=\begin{pmatrix} (T_1-\xi_1)_+^p & \cdots & (T_1-\xi_K)_+^p \\ (T_2-\xi_1)_+^p & \cdots & (T_2-\xi_K)_+^p\\ \vdots & & \vdots \\ (T_n-\xi_1)_+^p & \cdots & (T_n-\xi_K)_+^p \end{pmatrix},$$

and $$b=(b_1,\dots,b_K)^T\sim(0,\sigma_b^2I_K)$$, $$\epsilon=(\epsilon_1,\dots,\epsilon_n)^T\sim(0,\sigma_\epsilon^2I_n)$$, with $$\alpha=\sigma_\epsilon^2/\sigma_b^2$$. This mixed-model representation yields the penalized spline smoother for the framework above.
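The equivalence can be checked numerically: the generalized least squares estimate of the fixed effects $$(\beta^T,\tau^T)^T$$ under $$\operatorname{Var}(Y)=\sigma_b^2ZZ^T+\sigma_\epsilon^2I_n$$ should coincide with the penalized spline estimate when $$\alpha=\sigma_\epsilon^2/\sigma_b^2$$. The sketch assumes a scalar $$X$$ (so $$d=1$$), known variance components and hypothetical helper names; in practice the variance components would be estimated (for example by REML), which is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design(x, t, knots, p):
    """Build Lambda (columns x, 1, t, ..., t^p) and Z (columns (t - xi_k)_+^p)."""
    lam = [[xi] + [ti ** j for j in range(p + 1)] for xi, ti in zip(x, t)]
    z = [[max(ti - k, 0.0) ** p for k in knots] for ti in t]
    return lam, z


def fixed_effects_gls(lam, z, y, sigma_b2, sigma_e2):
    """GLS estimate of (beta, tau) with V = sigma_b^2 Z Z^T + sigma_e^2 I."""
    n = len(y)
    v = [[sigma_b2 * sum(a * b for a, b in zip(z[i], z[j])) + (sigma_e2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    cols = list(zip(*lam))                         # columns of Lambda
    vinv_cols = [solve(v, list(c)) for c in cols]  # V^{-1} Lambda, column by column
    vinv_y = solve(v, y)
    q = len(cols)
    lhs = [[sum(a * b for a, b in zip(cols[r], vinv_cols[c])) for c in range(q)]
           for r in range(q)]
    rhs = [sum(a * b for a, b in zip(cols[r], vinv_y)) for r in range(q)]
    return solve(lhs, rhs)


def penalized_fit(lam, z, y, alpha):
    """Penalized LS over (beta, tau, b) with penalty alpha * sum b_k^2; returns (beta, tau)."""
    rows = [lr + zr for lr, zr in zip(lam, z)]
    m, q = len(rows[0]), len(lam[0])
    lhs = [[sum(r[a] * r[c] for r in rows) for c in range(m)] for a in range(m)]
    for a in range(q, m):
        lhs[a][a] += alpha  # only the random-effect coefficients b_k are penalized
    rhs = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(m)]
    return solve(lhs, rhs)[:q]
```

Both functions return the vector $$(\beta,\tau_0,\dots,\tau_p)$$, so the two results can be compared entry by entry.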