Talk:Instrumental variables estimation

It looks like a hoax
This is fake technology. It is possible to improve the result of the least squares technique with a projection matrix, and it is elementary. The founding fathers of the method exploit an elementary algebraic trick and pretend that they bring science something new and powerful. Look at what we have


 * $$ y = X \beta + e $$

They use obscure notation. In linear algebra, matrices are denoted by upper-case letters and vectors by lower-case; bold type is used for matrices and vectors and regular type for scalars. An index on a vector or matrix means an iteration over the entire matrix or vector; when speaking about components, people use brackets []. Following this convention, everything becomes clear. We have no scalars in this explanation, so we use the same font throughout. The unobserved error vector $$e$$ is correlated with the known vector $$y$$, or with the columns of $$X$$, or both, so using classical least squares gives a biased estimate of $$\beta$$. The only thing they actually do is substitute for the matrix $$X$$ a sum of two matrices of the same size


 * $$ X = N + Z$$

We can do that, of course. Why not? And we can choose $$Z$$ such that the product of $$Z^T$$ and the error $$e$$ is less correlated with the vector $$y$$ than $$e$$ is. The error is not known, but it can be assumed. Now we plug the second expression into the first one and get


 * $$ y = N\beta + Z\beta + e$$

The next step is to multiply both sides from the left by $$Z^T$$; that operation is supposed to reduce the error or make it less correlated with $$y$$. Next come multiplication by the inverse $$(Z^T Z)^{-1}$$ (assuming it exists) and by $$Z$$ again. These factors form the projection matrix $$P$$. Taking into consideration that $$P = PP$$, we can multiply both sides by the projection again and come back to a slightly modified original equation


 * $$ P y = P X \beta $$

The sum $$N + Z$$ is replaced back by $$X$$. The error is assumed filtered or reduced. The rest is obvious. So the method is error filtering based on assumed properties of the error, conducted by introducing a filter matrix $$Z$$ and using it to construct a projection matrix. Constructing the projection matrix $$P$$ does not require the inverse of $$(Z^T Z)$$ to exist; it can be done anyway. The founding fathers of the method blow its significance out of proportion by introducing new terminology and an obscure explanation that violates standard notation, so readers can't understand what is going on. Read the questions that people have. The matrix $$Z$$ is not a set of instruments; it is a filter for the error vector. The improvement in the result depends on luck in how $$Z$$ is constructed, and most of the time the improvement is very insignificant. Now it becomes clear why the columns of the matrix $$Z$$ should be correlated with the columns of $$X$$ and uncorrelated with the error. Since the error is correlated with $$X$$ and not known, it is very challenging to construct such a $$Z$$, which explains why it can give only a slight improvement in the result. If the matrix $$Z$$ is not correlated with $$X$$, the projection built on it may kill the solution, that is, bring even more error into the equation by reducing its rank. But an insignificant improvement is, of course, possible.
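Whatever one makes of the argument above, the mechanics it describes are easy to check numerically. A minimal sketch with simulated data (all variable names and numbers are illustrative): the error is correlated with $$X$$ but not with $$Z$$, so plain least squares is biased while the $$Z$$-based estimate is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0
z = rng.normal(size=n)            # candidate Z: correlated with x, uncorrelated with e
v = rng.normal(size=n)            # shared disturbance that makes x endogenous
x = z + v                         # regressor correlated with the error below
e = v + 0.5 * rng.normal(size=n)  # error correlated with x (through v)
y = beta * x + e

beta_ols = (x @ y) / (x @ x)  # plain least squares: plim = 2 + cov(x,e)/var(x) = 2.5
beta_iv = (z @ y) / (z @ x)   # estimate using z: plim = 2
```

With $$n$$ this large, `beta_ols` settles near 2.5 (biased) while `beta_iv` settles near the true value 2, whichever vocabulary ("instrument" or "filter") one prefers for $$Z$$.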

Now imagine the founding fathers of the method had presented their concept in this clear and simple form. Would they be as respected, honored, and cited?

I can add how to construct the projection matrix in a quick and simple way when the matrix $$Z$$ is degenerate (not of full rank). It has been known since 1971 (A. Bjorck, C. Bowie, SIAM Journal). Divide $$Z$$ by its Euclidean norm (the square root of the sum of squared entries) and apply the iteration


 * $$ Z_{i+1} = 1.5 Z_i - 0.5 Z_i Z_i^T Z_i $$

When the matrix stops changing, it is ready: $$P = Z_n Z_n^T$$. You don't need to compute $$P$$ explicitly. Multiply the system by $$Z_n^T$$ and then by $$Z_n$$; the new system will yield a better estimate of $$\beta$$ if $$Z$$ is properly selected.

Isn't that simple? It is an idea at the student level, purposely made to look complicated and revolutionary. Laughable. Use it as a lesson: always be critical. Some researchers believe that multiplication by a projection matrix is a modification of the data and should not be applied. That is debatable. If $$P$$ is a full-rank matrix, it can't change the solution if your data are valid; it only filters the error. This is why 2SLS needs this justification for the introduction of $$Z$$: the so-called additional information, which is a very stupid argument. If you have any information, it should be used to identify the model in the first place, not applied after your identification has failed miserably. Normally, all known information is included in the model and no other information can be obtained.

Better and more examples needed
It's great that an example is provided, but it is not very clear and does not help someone who is still learning econometrics to understand this topic. Improving this example and adding others would be great. Thanks! ChelseaH (talk) 06:55, 4 April 2009 (UTC)

I agree, the effect of tobacco tax rates on smoking isn't clear to me. So I'm left guessing what they mean by tax rates: is it the country's tobacco tax income? —Preceding unsigned comment added by 85.233.233.166 (talk) 18:28, 25 May 2009 (UTC)

The example that is provided is highly problematic: the tobacco tax rate CANNOT be considered to act on health only through tobacco consumption, because increased tobacco tax rates push consumption into higher income brackets. In turn, higher income has been universally observed to be a protective health factor. In essence, a higher tobacco tax rate selects for a population with better determinants of health.

?
I think it would be a good idea to combine the two pages, as they are on the same topic, although one is a simple algebraic derivation of the IV estimator and the other is a more discursive presentation of the purposes and assumptions underpinning IV estimation. The two pages are quite complementary.

I agree, merge them. SCB

- There is an important subtlety that is missing from the discussion, and is even often missing in texts. OLS still provides an excellent predictor of the response under the current data generating process (DGP). In other words, if I know the market price and I want to predict the market quantity, OLS does a fine job. On the other hand, if I want to recover the structural equations, supply and demand, I need 'IV'. That allows me to predict quantity given an exogenous shock to price, such as a tax. So, when we say a consistent estimator, it's a little ambiguous. We ought to say what we want to estimate consistently.

Just to be clear, suppose we are trying to guess future earnings based on IQ. However, IQ is measured with an additive error. Suppose Jim's measured IQ is 100 and I want to guess his future earnings. I'll do just fine plugging his measured IQ into an OLS estimate, so long as the measurement error for him follows the same DGP as the data for the regression. On the other hand, if I want to know the impact of giving Jim a pill that increases his true IQ by 10 points, then I need an IV estimator. That's because the OLS slope coefficient is biased towards zero to adjust for the measurement error; but here I am adjusting the IQ by a known amount, and thus want the true slope coefficient for true (not measured) IQ.

This may be too fine a point for the article, but it's one I often see misunderstood in papers. The ambiguity being what we are trying to estimate: a structural model, or a good predictor under the current DGP. Derex @ 23:55, 8 October 2005 (UTC)
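Derex's measurement-error point can be illustrated with a small simulation, using a second independent noisy measurement as the instrument (all numbers here are made up for illustration, not from the article):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
true_iq = rng.normal(100, 15, size=n)                       # var(true_iq) = 225
earnings = 1000.0 * true_iq + rng.normal(0, 5000, size=n)   # true slope: 1000 per IQ point
iq_a = true_iq + rng.normal(0, 10, size=n)  # noisy measurement used as the regressor
iq_b = true_iq + rng.normal(0, 10, size=n)  # second noisy measurement, used as instrument

x = iq_a - iq_a.mean()
y = earnings - earnings.mean()
z = iq_b - iq_b.mean()

slope_ols = (x @ y) / (x @ x)  # attenuated: plim = 1000 * 225/(225+100) ~ 692
slope_iv = (z @ y) / (z @ x)   # consistent for the structural slope, 1000
```

OLS shrinks the slope by the reliability ratio $$225/(225+100)$$, which is exactly right for predicting earnings from a measured IQ drawn from the same DGP, but wrong for the pill thought experiment; IV recovers the structural slope.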

> The slope estimator thus obtained is unbiased.

I think the slope estimator is consistent, but biased. —Preceding unsigned comment added by 125.2.48.91 (talk • contribs)


 * I take it you're referring to the text about the simple IV? Both are right, depending on the experimental assumptions.  If X is viewed as fixed in repeated samples, then it is unbiased.  If X is viewed as random, then IV is consistent but likely biased.  The latter is the realistic case, though the former is often presented in introductory texts.  Probably should change it though, as "consistent" is always correct. Derex 02:54, 1 April 2006 (UTC)

2 stage?
What are the corresponding regression equations for each stage? This is very unclear.
 * stage 1: $$X_i = Zb + \text{residual}$$; we get $$\hat{X}_i = Z\hat{b} = Z(Z'Z)^{-1}Z'X_i$$.
 * $$\hat{X} = (\hat{X}_i \ldots)$$
 * stage 2: $$y = \hat{X}\beta + \text{residual}$$.

Jackzhp (talk) 04:30, 25 March 2008 (UTC)
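The two stages Jackzhp lists can be written out as a short simulation (variable names and numbers are illustrative, not from the article):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)  # two instruments
v = rng.normal(size=n)                           # confounder
x = z1 + z2 + v                                  # endogenous regressor
y = 2.0 * x + (v + rng.normal(size=n))           # structural error shares v with x

Z = np.column_stack([np.ones(n), z1, z2])  # instrument matrix (with constant)
X = np.column_stack([np.ones(n), x])       # regressor matrix (with constant)

# stage 1: regress X on Z; fitted values Xhat = Z (Z'Z)^{-1} Z' X
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# stage 2: regress y on Xhat
beta_2sls = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)  # [intercept, slope]
```

Here `beta_2sls[1]` settles near the structural slope 2.0, while OLS of `y` on `X` would not, because of the shared confounder.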

Hypothesis testing
The discussion under "hypothesis testing" is simply wrong. The first moment of the simple IV estimator doesn't exist; therefore, the estimator is neither biased nor unbiased. The normality result is asymptotic; in small samples the coefficients are not normally distributed, and the t-ratio does not follow a Student's t distribution. Since this section is superfluous, it would be wise to delete it.

A minor problem is that the second sentence is incoherent: "endogeneity" means that the error term and regressors are correlated, so the piece should not claim that one reason the error term and the covariates may be correlated is endogeneity! 68.146.25.175 (talk) 22:28, 20 April 2008 (UTC)

Major edit
I have substantially overhauled this page. I entirely rewrote the previous hypothesis testing and "testable implications" sections, which were, as noted above, simply wrong. An example has been added (this material could be expanded and moved to a new section), the bits on weak and invalid instruments have been largely rewritten, a misleading claim about nonlinear models has been modified, and finite sample properties and overidentification tests are discussed. Several references to influential papers have been added. Some of the previous material could be better referenced, and a few more examples would help illustrate ideas. —Preceding unsigned comment added by 68.146.25.175 (talk) 19:21, 22 July 2008 (UTC)

Estimation: describing the DGP
The "Estimation" section begins with a specification of where the data come from. Describing that equation as the "model" may be misleading, as we specifically do not mean that this is the approximation we are using to model the data, but rather the actual process generating the data. The best descriptor would be the technical term "data generating process." There is no WP page on that term, however. Somewhat more opaque would be referring to the equation as the "population process." A correct but more generic substitute that actually has a WP page would be "stochastic process." It is not "STANDARD" (I see no call to YELL at me) to fail to distinguish between the estimable model and the data generating process. Sked123 (talk) 16:00, 7 August 2008 (UTC)

Paragraph on limitations seems out of place in "Example" Section.
The section "Example" ends with a paragraph that is a discussion of the limitations of the IV approach.


 * Because demonstrating that the third variable 'z' is causally related to 'y' exclusively via 'x' is an experimental impossibility, and because the same limitations...

Should this be moved to a different section? If so, which?

Blossomonte (talk) 06:59, 15 January 2014 (UTC)

Never defined
The term "Instrumental variable" is either poorly defined, or else not defined at all.

The first sentence in the lede says what the method is used for. The second sentence says what the method allows you to do (and, worse, says this by jumping immediately into mostly incomprehensible technical jargon). Neither sentence says what an instrumental variable actually is.

There is, later, a section called "definition", but it starts out as history, and then moves to math that does not really define an instrumental variable (it starts by labeling an instrumental variable as "Z", which is then defined in terms of an equation that does not include Z)... then goes on to say that this definition is confusing and ambiguous. Yes, I agree.

In simple language: what is an instrumental variable? Is it a proxy? Is it what you read out from an instrument? Geoffrey.landis (talk) 17:03, 9 November 2015 (UTC)

Dr. Angrist's comment on this article
Dr. Angrist has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"This is weak. It's hard to tell what problem IV solves (the problem of selection bias), and it's impossible from this entry to see how IV solves this problem. The easiest way to explain selection bias is with potential outcomes and an empirical example using the simplest Wald-type IV estimator, which involves only four numbers and produces an intuitive calculation that a non-technical reader easily grasps.

As it stands the discussion is entirely abstract; the graphical causal framework only makes this worse.

In any case I would propose that a standard for econometric and statistical entries require empirical examples illustrating use of the technique."
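For editors deciding how to act on this comment: the Wald estimator Dr. Angrist mentions divides the difference in mean outcomes between the two instrument groups by the difference in mean treatment. A sketch with a binary instrument (simulated data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = rng.integers(0, 2, size=n)        # binary instrument (think: lottery number)
v = rng.normal(size=n)                # confounder
x = 0.5 * z + v                       # treatment shifted by the instrument
y = 2.0 * x + v + rng.normal(size=n)  # outcome; error shares v with x

# the "four numbers": two group means of y, two group means of x
wald = (y[z == 1].mean() - y[z == 0].mean()) / (x[z == 1].mean() - x[z == 0].mean())
```

The ratio of the two mean differences settles near the structural effect 2.0, even though a naive comparison of treated and untreated units would be confounded.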

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Angrist has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : Angrist, Joshua & Chen, Stacey, 2008. "Long-Term Economic Consequences of Vietnam-Era Conscription: Schooling, Experience and Earnings," IZA Discussion Papers 3628, Institute for the Study of Labor (IZA).

ExpertIdeasBot (talk) 03:01, 28 May 2016 (UTC)

Dr. Santos Silva's comment on this article
Dr. Santos Silva has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"This is generally a good article and I only have minor suggestions/corrections.

1 - The GMM estimator that is presented is just the two-stage least squares estimator presented later, so there is some duplication. I would delete the reference to GMM.

2 - If the reference to GMM stays, I would delete the proof that it collapses to the usual IV estimator in the just-identified case.

3 - In the interpretation of the two-stage least squares estimator, it is important to state that this approach is only valid in linear models; doing two-stage estimation in non-linear models leads to the so-called "forbidden regression".

4 - I would omit the proof of the computation of the 2SLS estimator.

5 - The section on testing instrument strength and overidentifying restrictions has a common error: The test for overidentifying restrictions is not a test for instrument validity; the two references below clarify this:

Parente, P.M.D.C. and Santos Silva, J.M.C. (2012), A Cautionary Note on Tests for Overidentifying Restrictions, Economics Letters, 115(2), pp. 314–317.

Guggenberger, Patrik, (2012), A note on the (in)consistency of the test of overidentifying restrictions and the concepts of true and pseudo-true parameters, Economics Letters, 117(3), pp. 901-904."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Santos Silva has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : J.A.F. Machado & J.M.C. Santos Silva, 2003. "Identification with averaged data and implications for hedonic regression studies," Econometrics 0303002, EconWPA.

ExpertIdeasBot (talk) 13:33, 11 June 2016 (UTC)

Dr. Anatolyev's comment on this article
Dr. Anatolyev has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"Description is very specific to one of the disciplines. For an econometrician like me, it sounds very subjective. Even some of the notation is unfamiliar.

The introduction of instruments and whole description leans on the causal linear model. However, there are other, different contexts when instruments are intensively used, primarily in time series where instruments are generated from past values of variables.

Missing are mentions of the optimal instrumentation theory, the theory of many instruments and many weak instruments, GMM as an embedding framework, etc."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

We believe Dr. Anatolyev has expertise on the topic of this article, since he has published relevant scholarly research:


 * Reference : Stanislav Anatolyev & Nikolay Gospodinov, 2008. "Specification Testing in Models with Many Instruments," Working Papers w0124, Center for Economic and Financial Research (CEFIR).

ExpertIdeasBot (talk) 20:36, 1 July 2016 (UTC)

Dr. Carrasco's comment on this article
Dr. Carrasco has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"This article gives a good overview of the topic. It is a bit long. I found a small typo:

""IV helps to fix this problem by identifying the parameters $$\vec{\beta}$$ not based on whether $$x$$ is uncorrelated with $$u$$, but based on whether another variable $$z$$ (or set of variables) is (are) uncorrelated with $$u$$." There should be no arrow on beta."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

We believe Dr. Carrasco has expertise on the topic of this article, since he has published relevant scholarly research:


 * Reference : Marine Carrasco & Rachidi Kotchoni, 2013. "Efficient Estimation Using the Characteristic Function," Working Papers hal-00867850, HAL.

ExpertIdeasBot (talk) 15:29, 24 August 2016 (UTC)

Close paraphrasing in the article? (plagiarism)
The smoking/general health example, and the proximity/tutoring example seem to be a superficial rewrite of the examples on this page:

http://www.statisticshowto.com/instrumental-variable/

That page uses smoking/depression and proximity/community centre.

Sentence after sentence matches. — Preceding unsigned comment added by 130.113.148.220 (talk) 21:33, 22 February 2017 (UTC)

3SLS
A group of PhD students was made to write a number of articles on econometrics topics. The encyclopedic quality is poor, and they were blocked for not disclosing that they made paid edits, but to make at least some use of their work, maybe someone can look at Draft:Three-Stage Least Squares Estimator and see what, if anything, can be used. Antimanipulator (talk) 16:28, 28 September 2017 (UTC)

Endogeneity with an exponential regression function
The same group also wrote Endogeneity with an exponential regression function, which contains salvageable material on instrumental variables estimation in Poisson regression. I suggest that the usable material be merged into a new section here. Wikiacc (¶) 14:57, 22 May 2019 (UTC)
 * Pretty ugly stuff. Nevertheless, support merge and suggest some refinement in situ. ✅ Klbrain (talk) 07:06, 6 July 2020 (UTC)

Another merge proposal
Similar to the previous section, Binary response model with continuous endogenous explanatory variables might be worth merging here, if not deleting. Thoughts? Klbrain (talk) 21:31, 19 July 2020 (UTC)
 * I created a header under which it could go. As with the Poisson section, a merge is worthwhile, though serious cleanup will be needed. Wikiacc (¶) 00:32, 26 July 2020 (UTC)


 * The article in question, Binary response model with continuous endogenous explanatory variables, describes a residual inclusion, not an IV method. If anything, it should be merged into control function (econometrics). --bender235 (talk) 22:51, 2 August 2020 (UTC)
 * Fair point. But isn't that also a residual inclusion method? I'm inclined to think the two should stick together, but that could be at the control function article. Wikiacc (¶) 23:12, 2 August 2020 (UTC)


 * You're right. Unfortunately, the definition of IV is very fuzzy. In the narrow sense, the term "IV estimation" describes only the special case of just-identified two-stage least squares (2SLS). But since the "idea" of an instrumental variable (a variable that is excluded from the main equation of a model, or "second" stage of a regression) can be extended to non-linear models, the term is now used rather loosely in the literature. I'm not sure whether we should follow that trend.
 * In the end, the "classic" projection version of 2SLS, of which IV is the aforementioned special case, can also be expressed as residual inclusion (i.e., both approaches give the same estimate). However, only the residual inclusion version also works in non-linear models. So again, calling this an "IV method" only leads to confusion, especially among those just learning econometrics. But then again, the excluded variable(s) from the first stage of residual inclusion methods are often referred to as "instrument(s)" nonetheless.
 * Long story short: I'm not quite sure whether it's best for us to adopt a narrow IV definition (with the risk of potentially contradicting some textbooks) or stick to the wide IV definition (with the risk of this article turning into a hodgepodge of vaguely related methods). --bender235 (talk) 01:53, 3 August 2020 (UTC)
 * This article is messy at present, but I agree that focusing on the linear model will help keep the article from sprawling further. (I'd keep overidentified IV here, though.) I support limiting this article to the linear model and developing a section at the control function article about residual inclusion in nonlinear models (both Poisson and probit). Wikiacc (¶) 02:20, 3 August 2020 (UTC)


 * I like the idea of splitting the topic along linear vs. non-linear models ("projection inclusion" vs. residual inclusion). But it's going to be some work to reorganize and rewrite these articles. Unfortunately, giving each of the technical terms being thrown around in the context of IV estimation at large its own article will lead to a lot of overlap and redundancy (endogeneity, exclusion restriction, identification, reduced form, etc.), not to mention the number of related techniques that are almost synonymous with IV (indirect least squares, limited information maximum likelihood). So I guess some detailed planning would be great. --bender235 (talk) 18:19, 3 August 2020 (UTC)
 * The tricky part will be separating "instrumental variables estimation" from simultaneous equations model. I'm not sure what the solution is there. But fortunately the linear/nonlinear split won't affect that at all. Wikiacc (¶) 01:36, 4 August 2020 (UTC)
 * Wow, I didn't even think of that one. But you're right, that's another article with overlap and redundancy potential. Unfortunately I won't have time to get more involved in any large-scale reorganization any time soon. --bender235 (talk) 02:20, 5 August 2020 (UTC)

Rebooted merge proposal
As a result of the discussion above, I have revised the merge proposal to the following: Binary response model with continuous endogenous explanatory variables and Endogeneity with an exponential regression function both get merged to a new section in control function (econometrics) about residual inclusion methods in nonlinear models. Wikiacc (¶) 02:44, 5 August 2020 (UTC)
 * At the binary response model article, there is a method estimated purely by conditional MLE. Separating that from the residual inclusion method seems silly. Therefore, I suggest the following revision. Instead of control function (econometrics), how about making the target a new article? Say, nonlinear simultaneous equations model (by analogy with simultaneous equations model). This new page could also contain material on the "forbidden regression" (see above) and other related topics, focusing more on the setting than on the particular method. Wikiacc (¶) 03:33, 14 August 2020 (UTC)
 * Scratch that new proposal—too ambitious. I have moved the Poisson regression section to control function (econometrics). Not exactly sure what to do with Binary response model with continuous endogenous explanatory variables. Wikiacc (¶) 04:29, 11 December 2020 (UTC)