User:Dylangroves/sandbox

= Heterogeneous Treatment Effects =

Treatment effect heterogeneity refers to variability in the causal effects of a treatment across a population of subjects. Most experimental research in the social sciences focuses on identifying the average treatment effect (ATE) of an intervention, but it is often implausible to assume that all subjects are equally affected by an intervention. Researchers often have strong theoretical reasons for investigating conditional average treatment effects (CATEs) for sub-groups and contexts of theoretical interest. Research on heterogeneous treatment effects (HTEs) can clarify why interventions are effective (or ineffective) and help implementers target interventions towards the populations most likely to benefit from them. However, standard approaches to the study of heterogeneous treatment effects are beset by challenges related to multiple comparisons and causal inference. These obstacles motivate approaches to HTEs based on experimental design (treatment-by-treatment interactions) and machine learning.

Definition
Given a binary treatment $$Z_i$$ and continuous outcome $$Y_i$$, we can define the treatment effect $$\tau_i$$ for any subject $$i$$ as $$\tau_i = Y_i(1) - Y_i(0)$$, where $$Y_i(1)$$ and $$Y_i(0)$$ are the subject's potential outcomes with and without treatment. The constant effects assumption implies that $$Y_i(1) - Y_i(0) = \tau \ \forall \ i$$. Heterogeneity in treatment effects, by contrast, implies that $$Var(\tau_i) > 0$$. Treatment effect heterogeneity is therefore defined as the variance of the treatment effect across a given population.
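The definition can be illustrated with a small simulation. The sketch below is a minimal illustration, assuming simulated (not real) data in which both potential outcomes are visible for every subject, which is never the case in an actual experiment. The function names, the average effect of 1.0, and the `effect_sd` parameter are hypothetical choices made for this example.

```python
import random
import statistics

def simulate_potential_outcomes(n, effect_sd, seed=0):
    """Return lists (y0, y1) of simulated potential outcomes for n subjects."""
    rng = random.Random(seed)
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Each subject's effect tau_i is drawn around an assumed average effect of 1.0.
    # Setting effect_sd = 0 reproduces the constant effects assumption.
    tau = [rng.gauss(1.0, effect_sd) for _ in range(n)]
    y1 = [a + t for a, t in zip(y0, tau)]
    return y0, y1

def effect_variance(y0, y1):
    """Population variance of tau_i = Y_i(1) - Y_i(0)."""
    return statistics.pvariance([b - a for a, b in zip(y0, y1)])

# Constant effects: the variance is zero up to floating-point error.
constant = effect_variance(*simulate_potential_outcomes(1000, effect_sd=0.0))
# Heterogeneous effects: the variance is clearly positive.
varying = effect_variance(*simulate_potential_outcomes(1000, effect_sd=0.5))
```

Because real data reveal only one potential outcome per subject, `effect_variance` cannot be computed directly in practice.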

Examples
Treatment-covariate interactions have received increased attention from experimental researchers. In a 2012 review of field experimental research published in top-10 economics journals, Fink et al. find that 76% of articles estimated sub-group treatment effects and 47% explicitly tested treatment-by-covariate interactions.

Prominent findings of treatment effect heterogeneity in the social sciences include:


 * Anderson (2008) re-analyzes three interventions on early childhood education and finds substantial short- and long-term benefits for girls but no benefits for boys.
 * Bjorkman and Svensson (2011) find that treatment effects of a community health-clinic monitoring program were significantly higher in communities with lower levels of income inequality and ethnic fractionalization.
 * Hastings, Kang, and Staiger (2006) find that a lottery for school choice had positive effects for white female students but not other sub-groups.
 * Kling and Liebman (2004) find that a randomized government program giving parents the opportunity to move to non-poverty neighborhoods had substantial positive effects on education, mental health, and crime outcomes for girls but negative effects for boys.

Conversely, a number of studies have observed substantial homogeneity of treatment effects across sub-populations:


 * Ashraf, Karlan, and Yin (2006) find no evidence of heterogeneity in the effect of a commitment savings program in the Philippines across a wide array of demographic and economic covariates.
 * Coppock, Leeper, and Mullinix (2018) find substantial homogeneity of survey experiment treatments across demographic sub-populations.
 * Duflo, Kremer, and Robinson (2008) find no evidence of heterogeneity in the effects of a fertilizer intervention in Kenya across a range of socio-economic and environmental covariates.

Testing for Heterogeneity
The null hypothesis of no treatment effect heterogeneity implies that $$Var(\tau_i) = 0$$, where $$Var(\tau_i) = Var(Y_i(1) - Y_i(0)) = Var(Y_i(1)) + Var(Y_i(0)) - 2Cov(Y_i(1), Y_i(0))$$. However, because we never observe $$Y_i(1)$$ and $$Y_i(0)$$ at the same time for any subject $$i$$, we do not know the joint distribution of the potential outcomes and therefore cannot directly estimate $$Cov(Y_i(1), Y_i(0))$$.
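As a short worked step, consider what the decomposition looks like under the constant effects assumption $$Y_i(1) = Y_i(0) + \tau$$. Adding a constant changes neither the variance nor the covariance, so:

$$Var(Y_i(1)) = Var(Y_i(0)), \qquad Cov(Y_i(1), Y_i(0)) = Var(Y_i(0))$$

$$Var(\tau_i) = Var(Y_i(0)) + Var(Y_i(0)) - 2Var(Y_i(0)) = 0$$

The first equality involves only the marginal distributions of the two potential outcomes and can be examined with experimental data. The second involves their joint distribution, which is never observed.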

Given the above obstacle, there are three prominent strategies for testing for treatment effect heterogeneity.


 * 1) Comparison of treatment and control group outcome variances: we can test the null hypothesis that $$Var(Y_i(0)) = Var(Y_i(1))$$ by comparing the difference in variances in the observed data against the differences in variances generated under many re-randomizations of treatment assignment (see randomization inference). However, the test is under-powered, cannot identify heterogeneity when treatment effects vary but the variance of outcomes is the same in both groups, and may be biased if the estimate of the ATE is biased.
 * 2) Comparison of Cumulative Distribution Functions (CDFs): Ding, Feller, and Miratrix (2016) recommend testing for changes to the cumulative distribution function, which should shift by a constant term under the null hypothesis of homogeneous treatment effects.
 * 3) Bounding: Heckman, Smith, and Clements (1997) propose a strategy for estimating bounds on $$Var(\tau_i)$$. Although we cannot observe the true pairing of $$Y_i(1)$$ and $$Y_i(0)$$, we can estimate $$Var(\tau_i)$$ under the most conservative pairing (the lowest observed value in the control condition is matched to the lowest observed value in the treatment condition, the second-lowest to the second-lowest, etc.), which gives the lower bound, and under the most extreme pairing (the lowest control value to the highest treatment value, the second-lowest control value to the second-highest treatment value, etc.), which gives the upper bound. This tends to produce very wide bounds.
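The first and third strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from any of the cited papers: the function names are hypothetical, the randomization test assumes complete random assignment and imposes a constant effect equal to the estimated ATE, and the bounds function assumes equally sized treatment and control groups.

```python
import random
import statistics

def variance_ri_pvalue(outcomes, assignment, n_draws=2000, seed=0):
    """Randomization-inference p-value for a difference in group variances."""
    rng = random.Random(seed)
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    ate_hat = statistics.mean(treated) - statistics.mean(control)
    # Under the null of a constant effect equal to ate_hat, subtracting it
    # from treated outcomes recovers an implied Y_i(0) for every subject.
    y0 = [y - ate_hat * z for y, z in zip(outcomes, assignment)]

    def abs_var_diff(z_vec):
        t = [y for y, z in zip(y0, z_vec) if z == 1]
        c = [y for y, z in zip(y0, z_vec) if z == 0]
        return abs(statistics.variance(t) - statistics.variance(c))

    observed = abs_var_diff(assignment)
    shuffled = list(assignment)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # one simulated re-randomization
        if abs_var_diff(shuffled) >= observed:
            extreme += 1
    return extreme / n_draws

def effect_variance_bounds(treated, control):
    """Bounds on Var(tau_i) from rank-preserving and rank-reversing pairings."""
    if len(treated) != len(control):
        raise ValueError("this sketch assumes equally sized groups")
    t_sorted = sorted(treated)
    c_sorted = sorted(control)
    # Lowest matched to lowest: the smallest effect variance consistent with the data.
    lower = statistics.pvariance([t - c for t, c in zip(t_sorted, c_sorted)])
    # Lowest control matched to highest treated: the largest effect variance.
    upper = statistics.pvariance(
        [t - c for t, c in zip(t_sorted, reversed(c_sorted))]
    )
    return lower, upper
```

Because the randomization test plugs in the estimated ATE, any error in that estimate carries over into the test.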

Conditional Average Treatment and Interaction Effects
Assuming the presence of treatment effect heterogeneity, researchers may be interested in exploring treatment effects in particular sub-populations (Conditional Average Treatment Effects) or testing for differences in treatment effects across several sub-populations (interaction effects).

Conditional Average Treatment Effects (CATE)
The CATE is the average treatment effect among a specific sub-population or treatment context. Estimating the CATE is straightforward as long as the variable used to identify the sub-population or context was generated before the treatment was implemented. However, hypothesis testing of CATEs is complicated by the multiple comparisons problem (see below).

Treatment-by-Covariate Interactions
An interaction effect is the change in the treatment effect between sub-populations, that is, the difference between two CATEs. We can evaluate interaction effects using regression. If $$Y_i$$ is an outcome of interest (for example, agricultural output), $$Z_i$$ is a treatment variable (for example, receipt of fertilizer), and $$X_i$$ is a pre-treatment covariate (for example, gender of farmer), we can estimate the following equation:

$$Y_i = \alpha + \beta Z_i + \gamma X_i + \delta Z_i X_i + \epsilon_i$$

where $$\delta$$ is the difference between the average treatment effect of receiving fertilizer for men and for women.
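When both $$Z_i$$ and $$X_i$$ are binary, the regression above is saturated, so its least-squares coefficients coincide with differences in the four cell means. The sketch below uses that equivalence to avoid a regression library. The function name and the eight data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import statistics

def interaction_from_cell_means(y, z, x):
    """Return (alpha, beta, gamma, delta) for binary treatment z and covariate x."""
    def cell_mean(z_val, x_val):
        return statistics.mean(
            [yi for yi, zi, xi in zip(y, z, x) if zi == z_val and xi == x_val]
        )
    alpha = cell_mean(0, 0)                 # control mean when x = 0
    beta = cell_mean(1, 0) - alpha          # CATE when x = 0
    gamma = cell_mean(0, 1) - alpha         # covariate gap among controls
    cate_x1 = cell_mean(1, 1) - cell_mean(0, 1)
    delta = cate_x1 - beta                  # difference between the two CATEs
    return alpha, beta, gamma, delta

# Hypothetical outcomes: four subjects with x = 0 followed by four with x = 1.
y = [10, 12, 14, 16, 11, 13, 18, 20]
z = [0, 0, 1, 1, 0, 0, 1, 1]
x = [0, 0, 0, 0, 1, 1, 1, 1]
alpha, beta, gamma, delta = interaction_from_cell_means(y, z, x)
```

In this toy data the CATE is 4 when `x = 0` and 7 when `x = 1`, so `delta` is 3. With a continuous covariate the cell-mean shortcut no longer applies and the equation would need to be fit by least squares.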

Estimating treatment-by-covariate interaction effects suffers from two important complications:

Causal Inference: Treatment-by-covariate interactions are helpful descriptive and predictive exercises, but they cannot be interpreted as causal effects of the covariate because the covariate itself is not randomly assigned. For example, if a treatment effect varies between old and young subjects, the difference may be driven by age or by any number of characteristics correlated with age, like employment, education, and marital status.

Multiple Comparisons Problem: Especially when there are a large number of covariates to use for testing treatment-by-covariate interaction effects, the probability of finding at least one statistically significant interaction by chance alone is high. To account for the risk of identifying interaction effects generated by chance alone, researchers can:


 * Reduce and pre-register the number of tests. To avoid the risk of cherry-picking sub-groups, researchers can pre-register tests for heterogeneous treatment effects using a Pre-Analysis Plan.
 * Adjust p-values for multiple hypotheses. There are a number of recommended strategies for correcting for multiple hypotheses, including family-wise error rate controls such as the Bonferroni correction and the Westfall-Young stepdown procedure, and False Discovery Rate control methods.
 * Re-phrase the null hypothesis. The null hypothesis can be restated as whether the array of CATEs varies more than one would expect by chance.
 * Await replication. If the number of interaction tests is large or unknown, researchers can treat interaction effects as exploratory and await replication in similar studies.
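Two of the p-value adjustments can be written out directly. The sketch below implements the Bonferroni correction and, as one common example of False Discovery Rate control, the Benjamini-Hochberg step-up procedure. That particular procedure is an assumption of this example, since the text does not name a specific FDR method. The function names and the four p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so the running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four treatment-by-covariate interaction tests.
raw = [0.01, 0.04, 0.03, 0.20]
fwer_adjusted = bonferroni(raw)
fdr_adjusted = benjamini_hochberg(raw)
```

In this example only the first test stays below 0.05 under either adjustment, although three of the four raw p-values are below 0.05.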

Treatment-by-Treatment Interactions
One strategy for overcoming obstacles related to drawing causal inferences from treatment-by-covariate interactions is to experimentally manipulate the covariate of interest. A factorial design crosses one set of randomly assigned treatment conditions with another set of randomly assigned treatments. Because both treatments are randomly assigned, their interaction can be interpreted causally.
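As a rough illustration of the assignment step, the sketch below randomly allocates subjects to the four cells of a 2x2 factorial design. It assumes two binary treatments and a balanced allocation. The function name is hypothetical, and real designs may use blocking or unequal cell sizes.

```python
import random
from collections import Counter

def assign_factorial(n_subjects, seed=0):
    """Randomly assign subjects to the four cells of a 2x2 factorial design."""
    rng = random.Random(seed)
    # Each cell is a pair (first treatment, second treatment).
    cells = [(a, b) for a in (0, 1) for b in (0, 1)]
    # Cycle through the cells so they are as balanced as possible, then
    # shuffle so each subject's pair of treatments is set by chance alone.
    assignment = [cells[i % 4] for i in range(n_subjects)]
    rng.shuffle(assignment)
    return assignment

cell_counts = Counter(assign_factorial(40))
```

With 40 subjects each of the four cells receives 10. The interaction between the two treatments can then be estimated as a difference in differences of the four cell means.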

Machine Learning Methods
To deal with large numbers of covariates and minimize researcher discretion, many researchers have turned to machine learning methods for identifying and testing heterogeneous treatment effects. The key benefit of machine learning techniques is that they are automated, which reduces the ad hoc nature of the search for interactions. Common machine learning techniques include Bayesian Additive Regression Trees (BART), recursive partitioning, support vector machines, classification and regression trees, random forests, and kernel regularized least squares (KRLS), as well as ensemble methods, which estimate weighted averages of multiple machine learning approaches.
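To give a flavor of how an automated search works, the sketch below performs a single greedy split on one covariate, choosing the threshold whose two subgroups differ most in their estimated treatment effects. It is a deliberately simplified illustration of the idea behind recursive partitioning, not an implementation of any of the methods listed above. The function names, the `min_arm` parameter, and the toy data are hypothetical. Published methods typically add safeguards that are omitted here, such as estimating effects on data that were not used to choose the split.

```python
import statistics

def diff_in_means(y, z):
    """Estimated treatment effect: treated mean minus control mean."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return statistics.mean(treated) - statistics.mean(control)

def best_split(y, z, x, min_arm=2):
    """Return (threshold, gap, left_effect, right_effect) or None."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [i for i, xi in enumerate(x) if xi <= threshold]
        right = [i for i, xi in enumerate(x) if xi > threshold]
        effects = []
        for idx in (left, right):
            ys = [y[i] for i in idx]
            zs = [z[i] for i in idx]
            # Skip splits that leave too few treated or control subjects.
            if min(zs.count(0), zs.count(1)) < min_arm:
                break
            effects.append(diff_in_means(ys, zs))
        else:
            gap = abs(effects[0] - effects[1])
            if best is None or gap > best[1]:
                best = (threshold, gap, effects[0], effects[1])
    return best

# Toy data: the treatment adds 5 to the outcome only when x is above 2.
x = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
z = [0, 1] * 8
y = [xi + (5 * zi if xi > 2 else 0) for xi, zi in zip(x, z)]
split = best_split(y, z, x)
```

On this toy data the search recovers the threshold of 2, with an estimated effect of 0 below it and 5 above it. Applying the same search recursively within each subgroup would grow a tree.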