User:Michael W. Morrison/sandbox

= Coverage Error =

In statistical sampling, coverage error is a type of non-sampling error that occurs when there is not a one-to-one correspondence between the target population and the sampling frame, the data frame from which a sample is drawn.

Description
Coverage error occurs when the data frame and the target population are misaligned. For example, a researcher may wish to study the opinions of registered voters (the target population) by calling residences listed in a telephone directory (the data frame). Some registered voters' phone numbers are likely to be missing from the directory, which results in undercoverage of the target population and introduces bias if the voters listed in the directory differ in their characteristics from those who are unlisted. Bias can also arise if some registered voters are over-represented in the directory, for instance because their residences have more than one listed telephone number. Finally, bias can arise from overcoverage, when some of the phone numbers listed in the directory do not belong to registered voters.
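
The undercoverage bias described above can be illustrated with a little arithmetic. The sketch below uses entirely hypothetical numbers and a hypothetical helper, `coverage_bias`; it assumes the target population splits cleanly into a covered group (listed voters) and an uncovered group (unlisted voters). It relies on the standard identity that the bias of the covered-group mean equals the uncovered share multiplied by the difference between the covered and uncovered means.

```python
def coverage_bias(n_covered, mean_covered, n_uncovered, mean_uncovered):
    """Return the target-population mean and the bias of the covered-group mean."""
    n_total = n_covered + n_uncovered
    target_mean = (n_covered * mean_covered + n_uncovered * mean_uncovered) / n_total
    # Closed form: (uncovered share) * (covered mean - uncovered mean)
    bias = (n_uncovered / n_total) * (mean_covered - mean_uncovered)
    return target_mean, bias


# Hypothetical figures: 8,000 listed voters (60% support a measure)
# and 2,000 unlisted voters (40% support it).
target_mean, bias = coverage_bias(8000, 0.60, 2000, 0.40)
print(f"target mean = {target_mean:.2f}, bias of listed-only mean = {bias:+.2f}")
```

The bias vanishes either when no one is left uncovered or when the listed and unlisted groups happen to have the same mean, which is why undercoverage alone does not guarantee a biased estimate.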

Ideally, the sampling frame would list only members of the target population, and it would list each member exactly once so that every member has an equal chance of being selected. In practice, developing and maintaining such a sampling frame is difficult. For example, the U.S. Census Bureau prepares and maintains a Master Address File of some 144.9 million addresses that it uses as a sampling frame for the U.S. decennial census and other surveys. Despite the efforts of some 111,105 field representatives and an expenditure of nearly half a billion dollars, the Census Bureau still found a significant number of addresses that had not made their way into the Master Address File.
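
The two properties of an ideal frame (only target members, each listed exactly once) can be checked mechanically when both lists are available, which is rarely the case in practice. The following is a minimal sketch under that assumption; the function `audit_frame` and the toy names are hypothetical and are not drawn from any Census Bureau procedure.

```python
from collections import Counter


def audit_frame(target_population, frame_listings):
    """Compare a frame (a list that may contain repeats) with the target population (a set)."""
    counts = Counter(frame_listings)
    listed = set(counts)
    return {
        # Members of the target population that the frame misses
        "undercovered": target_population - listed,
        # Frame entries that do not belong to the target population
        "ineligible": listed - target_population,
        # Target members that appear in the frame more than once
        "duplicated": {u for u, c in counts.items() if c > 1 and u in target_population},
    }


# Toy data: "cho" and "dee" are missing, "bob" is listed twice, "eve" is not a target member.
target = {"ann", "bob", "cho", "dee"}
frame = ["ann", "bob", "bob", "eve"]
report = audit_frame(target, frame)
print(report)
```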

Quantifying and Correcting for Coverage Error
Many different methods have been used to quantify and correct for coverage error, and they are often specific to the agency or organization that uses them. For example, the United States Census Bureau has used the U.S. Postal Service's Delivery Sequence File, IRS 1040 address data, commercially available foreclosure counts, and other data to develop models capable of predicting undercount by census block. The Census Bureau has reported some success fitting such models to zero-inflated negative binomial (ZINB) or zero-inflated Poisson (ZIP) distributions.
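
To show why a zero-inflated distribution suits block-level undercounts, where most blocks may have no missed addresses at all, the sketch below evaluates the zero-inflated Poisson probability mass function. It illustrates only the distributional form: the parameter values are hypothetical, and it does not reproduce the Census Bureau's covariates or fitting procedure.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from an ordinary Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Hypothetical parameters: 70% of blocks never have missed addresses,
# and the remaining blocks miss 2 addresses on average.
probs = [zip_pmf(k, lam=2.0, pi=0.7) for k in range(50)]
print(f"P(0 missed) = {probs[0]:.3f}, P(1 missed) = {probs[1]:.3f}")
```

Compared with a plain Poisson of the same mean, the extra mass at zero lets the model describe many blocks with no undercount alongside a few with several missed addresses.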

Another method for estimating the degree of coverage error adapts the capture/recapture method used to estimate the size of wildlife populations. In capture/recapture, a sample is taken directly from the population, marked, and released back into the population. A second sample is then taken, and the proportion of previously marked individuals in it is used to estimate the actual population size. The method can be extended to assess a sampling frame by taking one sample directly from the target population ("capture") and another from the data frame ("re-capture"), then using the overlap between the two to estimate undercoverage.
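
The capture/recapture logic can be sketched with the Lincoln-Petersen estimator, the simplest estimator of this kind. The example below is one simplified reading of the approach, not a documented agency procedure: it assumes the whole frame plays the role of one list, that an independent sample is drawn directly from the target population, and that the frame contains no duplicates or ineligible entries. The function names and all counts are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Estimate total population size from two samples and their overlap."""
    if n_both == 0:
        raise ValueError("no overlap between the samples; the estimate is undefined")
    return n_first * n_second / n_both


def frame_coverage(n_frame, n_sample, n_matched):
    """Estimate population size and frame coverage from an independent sample.

    n_frame   -- number of units listed in the frame
    n_sample  -- size of the sample drawn directly from the target population
    n_matched -- sampled units that were also found in the frame
    """
    population = lincoln_petersen(n_frame, n_sample, n_matched)
    coverage = n_matched / n_sample  # share of the population the frame reaches
    return population, coverage, 1 - coverage


# Hypothetical counts: a frame of 9,000 units; 450 of 500 independently sampled units are on it.
population, coverage, undercoverage = frame_coverage(9000, 500, 450)
print(f"estimated population = {population:.0f}, undercoverage = {undercoverage:.0%}")
```

The estimate depends on the two lists being independent and on accurate matching between them; if people missing from the frame are also harder to reach in the field sample, undercoverage will tend to be underestimated.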

See Also:
Sampling Error
