Fay–Herriot model

The Fay–Herriot model is a statistical model that allows for distinct variation in each of several subgroups of observations. It is an area-level model, meaning that some input data are associated with sub-aggregates such as regions, jurisdictions, or industries, and the model produces estimates for those subgroups. It is applied in small area estimation, where there is a large amount of data overall but little for each subgroup.

The subgroups are determined before estimation and are built into the model structure. The model combines, by averaging, estimates of the fixed-effects type and of the random-effects type. It is typically used to adjust for group-related differences in a dependent variable.

In random effects models such as the Fay–Herriot model, estimation rests on the assumption that the effects associated with subgroups are drawn independently from a normal (Gaussian) distribution, whose variance is estimated from the data on each subgroup. When there are many systematically different groups, it is more common to use a fixed-effects model instead. A mixed model with random effects, such as the Fay–Herriot model, is preferred if there are not enough observations per group to estimate the fixed effects reliably, or if the fixed effects would not be consistently estimated for some other reason.

The Fay–Herriot model is a two-stage hierarchical model. The parameters of the distributions within the groups are often assumed to be independent, or they are assumed to be correlated with those measured for another variable.

Model structure and assumptions
In the classical Fay–Herriot (FH) model, the data used for estimation are survey-based aggregate estimates for the subgroups.
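
In one common notation, which is an assumption here because the text above does not fix any symbols, the area-level model for areas $$i = 1, \dots, m$$ can be written in two stages:

 * $$ \hat{\theta}_i = \theta_i + e_i, \qquad e_i \sim N(0, \psi_i) $$
 * $$ \theta_i = x_i^{\top} \beta + v_i, \qquad v_i \sim N(0, \sigma_v^2) $$

Here $$\hat{\theta}_i$$ is the survey-based direct estimate for area i, $$\psi_i$$ is its sampling variance (usually treated as known), $$x_i$$ is a vector of area-level predictors, and $$v_i$$ is the area random effect with variance $$\sigma_v^2$$.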

The model can also be applied to microdata. Consider observations indexed by $$j = 1, \dots, J$$ within groups indexed by $$i = 1, \dots, I$$, with predictor data $$X_{ij}$$ for the dependent variable $$Y_{ij}$$. If the only group-level effects in the model are random effects, it can be expressed as:


 * $$ Y_{ij} = \mu + \beta X_{ij} + U_i + \epsilon_{ij} $$

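Here $$\mu$$ is an intercept, $$\beta$$ is the coefficient on the predictor, $$U_i$$ is the random effect for group i, and $$\epsilon_{ij}$$ is an observation-level error term. A probability distribution is assumed for the random effects $$U_i$$, typically a normal distribution. A different distribution can be assumed, for example if the sample distribution is known to have heavy tails.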
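
As a rough illustration of the unit-level equation, the following sketch simulates data from it using only the Python standard library. The function name `simulate_unit_level`, the parameter values, the uniform distribution of the predictor, and the normal distribution of the error term are all assumptions made for the example, not part of the model's definition.

```python
import random


def simulate_unit_level(I, J, mu, beta, sd_u, sd_eps, seed=0):
    """Draw Y_ij = mu + beta * X_ij + U_i + eps_ij for I groups of J rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, I + 1):
        u_i = rng.gauss(0.0, sd_u)  # one normal random effect per group
        for j in range(1, J + 1):
            x_ij = rng.uniform(0.0, 10.0)     # hypothetical predictor values
            eps_ij = rng.gauss(0.0, sd_eps)   # observation-level error
            y_ij = mu + beta * x_ij + u_i + eps_ij
            rows.append((i, j, x_ij, y_ij, u_i))
    return rows


# Illustrative parameter values only.
rows = simulate_unit_level(I=5, J=4, mu=2.0, beta=0.5, sd_u=1.0, sd_eps=0.3)

# The mean of Y - (mu + beta * X) within a group approximates that group's U_i.
for i in range(1, 6):
    resid = [y - (2.0 + 0.5 * x) for (g, _, x, y, _) in rows if g == i]
    u_true = next(u for (g, _, _, _, u) in rows if g == i)
    print(i, round(sum(resid) / len(resid), 3), round(u_true, 3))
```

With only a few rows per group, the within-group mean residual is a noisy estimate of the group effect, which is the situation the model is designed for.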

Often fixed effects are included, making it a mixed model, with auxiliary data and economic or probability assumptions that make it possible to identify these effects separately from one another and from sampling variation $$\epsilon_{ij}$$.

Estimation
The parameters of interest, including the random effects, are estimated together iteratively. Methods include maximum likelihood estimation, the method of moments, and Bayesian approaches.
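
The iterative character of estimation can be sketched with one method-of-moments variant for the area-level model. The sketch assumes direct estimates `y`, known sampling variances `psi`, and a single predictor `x` with an intercept. It searches for the random-effect variance at which the weighted residual sum of squares equals its degrees of freedom, refitting the regression at every step. The function names and the data are hypothetical, and bisection is used here only because it is simple.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def weighted_rss(sigma2, x, y, psi):
    """Residual sum of squares with weights 1 / (sigma2 + psi_i)."""
    w = [1.0 / (sigma2 + d) for d in psi]
    b0, b1 = wls_line(x, y, w)
    return sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))


def fh_moment_variance(x, y, psi, iterations=200):
    """Solve weighted_rss(sigma2) = m - p for sigma2 >= 0 by bisection."""
    target = len(y) - 2  # m areas minus p = 2 regression coefficients
    if weighted_rss(0.0, x, y, psi) <= target:
        return 0.0  # data show no variation beyond sampling error
    lo, hi = 0.0, 1.0
    while weighted_rss(hi, x, y, psi) > target:
        hi *= 2.0  # the weighted RSS decreases in sigma2, so this terminates
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if weighted_rss(mid, x, y, psi) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up direct estimates and sampling variances for eight areas.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 2.4, 5.9, 4.2, 8.3, 6.1, 9.8, 7.5]
psi = [0.5, 0.8, 0.4, 1.0, 0.6, 0.9, 0.3, 0.7]

sigma2_v = fh_moment_variance(x, y, psi)
b0, b1 = wls_line(x, y, [1.0 / (sigma2_v + d) for d in psi])
print(sigma2_v, b0, b1)
```

Maximum likelihood and Bayesian fits follow the same pattern of alternating between the variance and the regression coefficients, but with different criteria.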

Fay–Herriot models can be characterized as mixed models, as hierarchical models, or as multilevel regression with poststratification.

The resulting estimate for each area (subgroup) is a weighted average of the direct estimate and the indirect, model-based estimate, with weights determined by the estimated variances.
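
A minimal sketch of this weighting follows, assuming the standard shrinkage weight, in which the weight on the direct estimate is the random-effect variance divided by the sum of that variance and the area's sampling variance. The function name `fh_composite` and all numbers are hypothetical.

```python
def fh_composite(direct, indirect, psi, sigma2_v):
    """Combine direct and indirect estimates area by area.

    direct   -- survey-based estimates for each area
    indirect -- model-based (regression) predictions for each area
    psi      -- sampling variances of the direct estimates
    sigma2_v -- variance of the area random effects
    """
    weights, estimates = [], []
    for d, s, p in zip(direct, indirect, psi):
        gamma = sigma2_v / (sigma2_v + p)  # weight on the direct estimate
        weights.append(gamma)
        estimates.append(gamma * d + (1.0 - gamma) * s)
    return weights, estimates


# Two areas with the same model prediction but different survey precision.
weights, estimates = fh_composite(
    direct=[10.0, 20.0],
    indirect=[12.0, 12.0],
    psi=[1.0, 9.0],
    sigma2_v=3.0,
)
print(weights)    # weight on the direct estimate in each area
print(estimates)  # composite estimates
```

The area with the noisier direct estimate (sampling variance 9) is pulled much closer to the model prediction than the area with the precise one (sampling variance 1).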

Tests of consistency
For random effects models to produce consistent estimates, the subgroup-specific effects must be uncorrelated with the other predictor variables in the model. If the subgroup-specific effects are correlated with those predictors, random effects estimation is biased, whereas fixed effects estimation is not.

That correlation can be tested by fitting both the fixed effects and the random effects models and then applying the Hausman specification test. The test may fail to reject the hypothesis of no correlation even when that hypothesis is false (a Type II error), so a failure to reject does not definitively establish that random effects estimation is unbiased.
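
For a single slope coefficient, the Hausman statistic reduces to the squared difference between the two estimates divided by the difference of their variances, compared against a chi-squared distribution with one degree of freedom. The sketch below assumes the two models have already been fitted elsewhere. The function name `hausman_single` and the input numbers are hypothetical.

```python
import math


def hausman_single(b_fe, se_fe, b_re, se_re):
    """Hausman statistic and p-value for one coefficient (1 degree of freedom)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0.0:
        # Under the null hypothesis the random effects estimator is the more
        # efficient one, so this difference should be positive.
        raise ValueError("variance difference is not positive")
    stat = (b_fe - b_re) ** 2 / var_diff
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up fixed effects and random effects results for one predictor.
stat, p_value = hausman_single(b_fe=0.52, se_fe=0.10, b_re=0.40, se_re=0.08)
print(stat, p_value)
```

A small p-value is evidence against the assumption that the subgroup effects are uncorrelated with the predictor. With several coefficients the statistic becomes a quadratic form in the vector of differences, which this sketch does not cover.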

History
Robert Fay and Roger Herriot of the U.S. Census Bureau developed the model to make estimates for populations in each of many geographic regions. The authors referred to the method as a James–Stein procedure and did not use the term "random effects." It is an area-level model. The model has been used for the same purpose, called small area estimation, by other U.S. government agencies.

Rao and Molina's text on small area estimation is sometimes characterized as a definitive source on the FH model.

Applications
The FH model is used extensively in the Small Area Income and Poverty Estimates (SAIPE) program of the U.S. Census Bureau.