User:The Jedi Math Squirrel/sandbox

Coverage Error
Coverage error is one type of total survey error. It is a non-sampling error that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn. In survey sampling, a sampling frame is used to draw a random sample from the population. In a census, a sampling frame is still used, but the intent is to include the entire population. Differences between the target population of the survey and the sampling frame result in coverage error.

For example, suppose a researcher is using Twitter to determine the opinion of U.S. voters on a recent action taken by the U.S. President. In this example, Twitter users are the frame, and voters are the target population. Because not all voters are Twitter users, there is a misalignment between the target population and the frame that could result in undercoverage of voters. If the demographics and opinions of Twitter-using voters are not representative of the target population of voters, then the results of the Twitter poll are likely to be biased.

Overcoverage results when some members of the target population are overrepresented in the sampling frame, or when the frame includes units that do not belong to the target population. In the previous example, it is possible that some Twitter users have more than one Twitter account, so they are more likely to be included in the poll than Twitter users with only one account, which is another potential source of bias. It is also likely that many Twitter users are not voters. Including their opinions in the Twitter poll may also bias the results of the researcher's poll.

A longitudinal study is particularly susceptible to undercoverage because populations change over time, so a frame built at the start of the study may no longer cover the population later on. Also, in the Twitter example, not all accounts belong to an individual, and one individual might have multiple accounts, so the data source introduces overcoverage. Overcoverage is the error that results when data exist for entities that should not be counted, or when entities are counted more than once. The result of overcoverage and undercoverage is sampling bias.

As another example, a researcher may wish to study the opinions of registered voters (target population) by calling residences listed in a telephone directory (sampling frame). It is likely that the phone numbers of some registered voters are not listed in the directory, resulting in undercoverage of the target population, and potential bias if the characteristics of voters listed in the directory differ from those who are unlisted. Bias can also occur if some registered voters are overrepresented in the directory, as could happen if some residences have more than one telephone listed in the directory. Bias might also occur if some of the phone numbers listed in the directory do not belong to registered voters.
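The kinds of mismatch in the examples above can be made concrete with a small sketch. The Python below is illustrative only: the function name coverage_summary, the identifiers, and the toy data are hypothetical. It also assumes that every unit has an identifier that can be matched exactly between the target population and the frame, which is rarely the case in practice.

```python
from collections import Counter


def coverage_summary(target, frame):
    """Compare a target population with a sampling frame (illustrative only).

    target: iterable of identifiers for units in the target population
    frame:  iterable of identifiers listed in the frame (repeats allowed)
    """
    target_set = set(target)
    if not target_set:
        raise ValueError("target population must not be empty")

    frame_counts = Counter(frame)  # how many times each unit is listed
    frame_set = set(frame_counts)

    # Undercoverage: target units that the frame misses entirely.
    missed = target_set - frame_set
    # Overcoverage: frame units that are not in the target population ...
    ineligible = frame_set - target_set
    # ... and target units that are listed more than once.
    duplicated = {u for u, n in frame_counts.items() if n > 1 and u in target_set}

    return {
        "missed": missed,
        "ineligible": ineligible,
        "duplicated": duplicated,
        "undercoverage_rate": len(missed) / len(target_set),
    }


# Hypothetical data: four registered voters and a five-line directory.
voters = ["ann", "bob", "cy", "dee"]
directory = ["ann", "ann", "bob", "eve", "bob's deli"]
summary = coverage_summary(voters, directory)
```

In this toy data two of the four voters are missing from the directory, one voter is listed twice, and two listings do not belong to voters at all, so the frame shows undercoverage and both forms of overcoverage at once.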

Ways to Quantify Coverage Error
Many different methods have been used to quantify and correct for coverage error. Methods used in mathematical statistics to identify a plausible statistical model can be applied. Often, the methods employed are unique to specific agencies and organizations. For example, the United States Census Bureau has used the U.S. Postal Service's Delivery Sequence File, IRS 1040 address data, commercially available foreclosure counts, and other data to develop models capable of predicting undercount by census block. The Census Bureau has reported some success fitting such models using Zero Inflated Negative Binomial or Zero Inflated Poisson (ZIP) distributions. See Zero-inflated model.
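For readers unfamiliar with the zero-inflated Poisson distribution mentioned above, the following sketch shows its textbook probability mass function. It is not the Census Bureau's model; the function name zip_pmf and the parameter values are assumptions chosen for illustration. The distribution mixes a point mass at zero (probability pi) with an ordinary Poisson count (mean lam), which lets it describe data with more zeros than a plain Poisson model would predict.

```python
import math


def zip_pmf(k, lam, pi):
    """Probability of count k under a zero-inflated Poisson distribution.

    k:   non-negative integer count
    lam: mean of the Poisson component (lam > 0)
    pi:  probability of a structural ("extra") zero, 0 <= pi <= 1
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the point mass or from the Poisson component.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Illustrative parameters only: 30% structural zeros, Poisson mean of 2.
probs = [zip_pmf(k, lam=2.0, pi=0.3) for k in range(60)]
```

With these parameters the probability of a zero count is noticeably higher than the plain Poisson value of exp(-2), while the probabilities still sum to one.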

Another method to quantify coverage error is to perform an evaluation study. This approach is similar to capture-recapture methodology. In capture-recapture methods, a sample is taken directly from the population, marked, and re-introduced to the population. Another sample is then taken from the population, and the proportion of marked individuals in the second sample is used to estimate the actual population size. This method can be extended to assess the validity of the sampling frame by taking a sample directly from the target population ("capture") and then taking another sample from the sampling frame ("re-capture") in order to estimate undercoverage. For example, suppose a census was conducted. After the completion of the census, random samples from the frame are drawn and counted again. The difference between the two counts for the same sampled area is used to estimate coverage error.
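The capture-recapture idea can be sketched with the Lincoln-Petersen estimator, a standard simple estimator that the text above does not name. The sketch assumes the two counts are independent, that units can be matched without error, and that the population does not change between counts. The function name and all figures are hypothetical.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Estimate population size from two independent counts.

    n_first:  units found by the first count (e.g. the census)
    n_second: units found by the second count (e.g. the recount)
    n_both:   units found by both counts
    """
    if n_both <= 0:
        raise ValueError("need at least one unit found by both counts")
    if n_both > min(n_first, n_second):
        raise ValueError("matched units cannot exceed either count")
    return n_first * n_second / n_both


# Hypothetical area: the census lists 900 housing units, an independent
# recount finds 950, and 855 units appear in both lists.
estimated_total = lincoln_petersen(900, 950, 855)  # 1000.0
estimated_coverage = 900 / estimated_total         # share of units the census found
```

Under these assumptions the census would have covered about 90 percent of the estimated 1,000 housing units in the area, which implies undercoverage of roughly 10 percent.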

Ways to Reduce Coverage Error
One way to reduce coverage error is to rely on multiple sources to either build a sampling frame or solicit information. This is called a mixed-mode approach. For example, Washington State University students conducted Student Experience Surveys by building a sampling frame using both street addresses and email addresses. In another example, the 2010 U.S. Census primarily relied on residential mail responses, and field interviewers were then deployed to follow up with nonrespondents. This approach had the added benefit of reducing cost, as the majority of people responded by mail and did not require a field visit.

Another way to reduce coverage error is to use paradata. For example, paradata can be used to produce a sampling frame of telephone numbers. Suppose the target population is households. Since telephone numbers can belong to businesses, overcoverage is a concern. One method assigns each phone number a score that indicates how likely the number is to belong to a person rather than a business.

2010 Census
The U.S. Census Bureau prepares and maintains a Master Address File of some 144.9 million addresses that it uses as a sampling frame for the U.S. decennial census and other surveys. Despite the efforts of some 111,105 field representatives and an expenditure of nearly half a billion dollars, the Census Bureau still found a significant number of addresses that had not been included in the Master Address File.

Coverage Follow-Up (CFU) and Field Verification (FV) were United States governmental operations in the 2010 Census that were formed to improve upon the 2000 Census. The types of coverage error these operations were intended to address were as follows: not counting someone who should have been counted; counting someone who should not have been counted; and counting someone who should have been counted, but whose identified location was in error. Coverage errors in the U.S. Census can cause groups of people to be underrepresented by the government. Of particular concern are "differential undercounts", in which some demographic groups are undercounted more than others. Although the efforts of the CFU and FV improved the accuracy of the 2010 Census, more study was recommended to address the question of differential undercounts.

See Also:
Sampling Error