User:Afelton/IV

An instrumental variable (or instrument) can be used in regression analysis to deal with an independent variable that is correlated with the error term. This situation is known as endogeneity. When it exists, ordinary linear regression will not provide consistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the regression, that is correlated with the explanatory variable, and that is uncorrelated with the error term.

There are three main requirements for using an IV:
 * The instrument must be correlated with the model's predicting variable.
 * The instrument cannot be correlated with the error term in the second-stage model (that is, the instrument cannot suffer from the same problem as the original predicting variable).
 * The instrument must act on the outcome only through the predicting variable, not directly.

Mathematics
For the regression equation $$ y_i = x_i \beta + \epsilon_i,$$ the least squares estimator for $$\beta$$ is


 * $$ \hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{\sum_i x_i (x_i \beta + \epsilon_i)}{\sum_i x_i^2} = \beta + \frac{\sum_i x_i \epsilon_i}{\sum_i x_i^2}.$$

When x and $$ \epsilon$$ are uncorrelated, the second term goes to zero in the limit and the estimator is consistent. If they are correlated, however, the second term does not vanish as the sample grows, and the estimator is inconsistent.

An instrumental variable $$z$$ is one that is correlated with the independent variable but not with the error term. The instrumental variables estimator is


 * $$ \hat{\beta} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i} = \frac{\sum_i z_i (x_i \beta + \epsilon_i)}{\sum_i z_i x_i} = \beta + \frac{\sum_i z_i \epsilon_i}{\sum_i z_i x_i}.$$

When z and $$ \epsilon$$ are uncorrelated, and z and x are correlated so that the denominator does not itself vanish, the final term goes to zero in the limit, providing a consistent estimator.
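The contrast between the two estimators can be illustrated with a small simulation. The following is a minimal sketch rather than a reference implementation: the data-generating process, the coefficient values and the function names (`simulate`, `ols_slope`, `iv_slope`) are all assumptions made for illustration. The regressor is made endogenous through an unobserved confounder `u`, and the two ratio formulas above are applied directly.

```python
import numpy as np

def simulate(n, beta=2.0, seed=0):
    """Draw (y, x, z) where x is endogenous and z is a valid instrument.

    The whole data-generating process is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument, independent of the error
    u = rng.normal(size=n)                 # unobserved confounder
    eps = u + rng.normal(size=n)           # structural error, contains u
    x = 0.8 * z + u + rng.normal(size=n)   # x is correlated with eps through u
    y = beta * x + eps
    return y, x, z

def ols_slope(y, x):
    """Least squares estimator: sum(x*y) / sum(x^2)."""
    return np.sum(x * y) / np.sum(x * x)

def iv_slope(y, x, z):
    """Instrumental variables estimator: sum(z*y) / sum(z*x)."""
    return np.sum(z * y) / np.sum(z * x)

y, x, z = simulate(n=200_000)
beta_ols = ols_slope(y, x)
beta_iv = iv_slope(y, x, z)
print(f"true beta: 2.000  OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

Under these assumptions the least squares estimate settles near $$\beta + \operatorname{Cov}(x,\epsilon)/\operatorname{Var}(x) = 2 + 1/2.64$$ however large the sample, while the instrumental variables estimate approaches the true value of 2.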

Another method of implementing the technique is a two-stage least-squares (2SLS) approach. In the first stage, the question predictor is regressed on the instrument using OLS and the predicted value of the question predictor is computed for each observation in the dataset. If the instrument is exogenous, these predicted values will contain only the exogenous part of the original endogenous question predictor and can be used in place of the question predictor in a second model to examine the relationship between outcome and question predictor. In the second stage, therefore, the outcome is regressed not on the original question predictor but on its predicted values, and the corresponding regression slope parameter is estimated by OLS. The slope estimate obtained is then a consistent (though in finite samples not unbiased) estimate of the hypothesized relationship between outcome and question predictor.
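The two stages can be sketched in a few lines. This is an illustrative sketch on simulated data: the coefficients, the seed and the variable names are assumptions, and the model has no intercept so that it matches the equation $$ y_i = x_i \beta + \epsilon_i$$ used above. With a single instrument the two-stage slope coincides algebraically with the ratio estimator $$\sum_i z_i y_i / \sum_i z_i x_i$$, which the last line checks numerically.

```python
import numpy as np

# Simulated data (illustrative assumption): x is endogenous through u,
# z is an exogenous instrument, and the true slope is 2.
rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)

# Stage 1: regress the question predictor x on the instrument z by OLS
# and keep the predicted values.
pi_hat = np.sum(z * x) / np.sum(z * z)
x_hat = pi_hat * z

# Stage 2: regress the outcome y on the predicted values x_hat by OLS.
beta_2sls = np.sum(x_hat * y) / np.sum(x_hat * x_hat)

# Ratio form of the instrumental variables estimator, for comparison.
beta_ratio = np.sum(z * y) / np.sum(z * x)

print(beta_2sls, beta_ratio, np.isclose(beta_2sls, beta_ratio))
```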

Additional predictors are usually added to the model in order to improve the precision of the estimation. It is usually recommended that the same covariates be included in both the first- and second-stage models in order to avoid simultaneous equations bias. The standard errors reported by the second-stage OLS fit are not correct as they stand, because its residuals are computed from the predicted rather than the original values of the question predictor. The sum of squared residuals must be recomputed using the original values in order that the associated standard errors be computed correctly.
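A sketch of 2SLS with covariates and the residual correction might look as follows. It assumes one endogenous question predictor, homoskedastic errors and simulated data; the function name `two_stage_least_squares` and every coefficient value are hypothetical. The same covariate matrix `w` (which includes the constant) enters both stages, and the corrected standard errors use residuals formed from the original `x` rather than from the first-stage predictions.

```python
import numpy as np

def two_stage_least_squares(y, x, z, w):
    """2SLS for one endogenous regressor x (illustrative sketch).

    z: instrument, shape (n,). w: exogenous covariates including the
    constant, shape (n, k). Returns coefficients ordered as [x, w...],
    corrected standard errors, and the naive second-stage ones.
    """
    n = len(y)
    # Stage 1: regress x on the instrument and the covariates.
    Z = np.column_stack([z, w])
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress y on the predicted x and the same covariates.
    X_hat = np.column_stack([x_hat, w])
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    k = X_hat.shape[1]
    xtx_inv = np.linalg.inv(X_hat.T @ X_hat)
    # Naive residuals use x_hat; corrected residuals use the original x.
    resid_naive = y - X_hat @ coef
    resid = y - np.column_stack([x, w]) @ coef
    se_naive = np.sqrt(resid_naive @ resid_naive / (n - k) * np.diag(xtx_inv))
    se = np.sqrt(resid @ resid / (n - k) * np.diag(xtx_inv))
    return coef, se, se_naive

# Simulated data (assumption): true coefficient on x is 2.
rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=n)
c = rng.normal(size=n)                      # an exogenous covariate
u = rng.normal(size=n)                      # unobserved confounder
x = 0.8 * z + 0.5 * c + u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * c + u + rng.normal(size=n)
w = np.column_stack([np.ones(n), c])

coef, se, se_naive = two_stage_least_squares(y, x, z, w)
print("slope on x:", coef[0], "corrected SE:", se[0], "naive SE:", se_naive[0])
```

The two sets of standard errors share the same $$(\hat{X}'\hat{X})^{-1}$$ term and differ only in the residual variance, which is the correction described above. In this particular simulation the naive figure comes out larger, but the direction of the error is not fixed in general.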

Applications and Problems
The use of the instrumental variables estimation (IVE) technique often provides a useful, convenient and ethical alternative to the classical randomized experiment. In the randomized experiment, exogenous variation in treatment is provided by the random assignment of participants to the treatment and control conditions, which requires the investigator to deny the treatment to the control participants. Using IVE, participants can be permitted to self-select into treatment and control, and the investigator can subsequently tease out the exogenous component of the treatment variation using the instrument. Of course, one does not get anything for nothing -- the IVE technique is only as good as the instruments it employs.

The technique is useful for solving the errors-in-variables problem and for recovering structural parameters from simultaneous equations models such as supply and demand. Unfortunately, there is no way to prove that an instrument is uncorrelated with the error term, since the error is by definition unobservable. Consequently, one problem is the selection and defense of suitable instruments. Good instruments often arise where exogenous policy changes (e.g., the cancellation of a federal student aid scholarship program), geographic differences in the application of standards (e.g., different states implementing different passing standards for a common exam) or randomness (e.g., the Vietnam draft lottery) have led to exogenous disruptions in the values of the construct being measured by the selected instrument.

Another problem is caused by the selection of "weak" instruments. These are instruments that are very poor predictors of the endogenous question predictor in the first-stage equation. In this case, the prediction of the question predictor by the instrument will be poor and the predicted values will have very little variation. Consequently, they are unlikely to have much success in predicting the ultimate outcome when they replace the question predictor in the second-stage equation, and the resulting slope estimate will be very imprecise.
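The strength of an instrument can be gauged from the first-stage regression itself. The sketch below computes the first-stage F statistic for a single instrument, which with one instrument is the squared t statistic of its slope. The helper `first_stage_f`, the simulated data and the coefficients 0.8 and 0.02 are illustrative assumptions. A commonly cited rule of thumb treats a first-stage F below about 10 as a warning sign, though that threshold is a convention rather than a guarantee.

```python
import numpy as np

def first_stage_f(x, z):
    """F statistic for one instrument z in the OLS regression of x on [1, z].

    Assumes homoskedastic errors; with a single instrument this equals
    the squared t statistic of the slope on z.
    """
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1] ** 2 / var_slope

# Simulated first stages (assumption): one strong and one weak instrument effect.
rng = np.random.default_rng(3)
n = 2_000
z = rng.normal(size=n)
noise = rng.normal(size=n)
x_strong = 0.8 * z + noise     # instrument explains a sizeable share of x
x_weak = 0.02 * z + noise      # instrument explains almost nothing

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"strong instrument F: {f_strong:.1f}  weak instrument F: {f_weak:.1f}")
```

With the strong first stage the statistic is in the hundreds, whereas with the weak one it will typically be small, signalling that the predicted values carry little usable variation.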