Survey data collection

With the application of probability sampling in the 1930s, surveys became a standard tool for empirical research in the social sciences, marketing, and official statistics. Survey data collection methods are the ways in which information is gathered systematically from a sample of individuals for a statistical survey. These methods have changed over time. First came the change from traditional paper-and-pencil interviewing (PAPI) to computer-assisted interviewing (CAI). Now, face-to-face surveys (CAPI), telephone surveys (CATI), and mail surveys (CASI, CSAQ) are increasingly being replaced by web surveys. In addition, remote interviewers may be able to keep respondents engaged at a lower cost than in-person interviewers.

Modes of data collection
The choice between administration modes is influenced by several factors, including 1) costs, 2) coverage of the target population (including group-specific preferences for certain modes), 3) flexibility of asking questions, 4) respondents’ willingness to participate, and 5) response accuracy. Different modes create mode effects that change how respondents answer. The most common modes of administration are listed under the following headings.

Mobile surveys
Mobile data collection, or mobile surveying, is an increasingly popular method of data collection. Over 50% of surveys today are opened on mobile devices. The survey, form, app or collection tool runs on a mobile device such as a smartphone or a tablet. These devices offer innovative ways to gather data and eliminate the laborious "data entry" step (keying paper form data into a computer), which delays data analysis and understanding. By eliminating paper, mobile data collection can also dramatically reduce costs: one World Bank study in Guatemala found a 71% decrease in cost when using mobile data collection, compared to the previous paper-based approach.

Apart from high mobile phone penetration, further advantages are quicker response times and the possibility of reaching previously hard-to-reach target groups. In this way, mobile technology allows marketers, researchers and employers to create real and meaningful mobile engagement in environments other than the traditional one in front of a desktop computer. However, even when they use mobile devices to answer web surveys, most respondents still answer from home.

SMS/IM surveys
SMS surveys can reach any handset, in any language and in any country. Because they do not depend on internet access and answers can be sent whenever it is convenient, they are a suitable mobile survey data collection channel for many situations that require fast, high-volume responses. SMS surveys can deliver 80% of responses in less than 2 hours, often at much lower cost than face-to-face surveys because travel and personnel costs are eliminated. IM is similar to SMS, except that a mobile number is not required. IM functions are available in standalone software, such as Skype, or embedded in websites such as Facebook and Google.

Online surveys
Online (Internet) surveys are becoming an essential research tool for a variety of research fields, including marketing, social and official statistics research. According to ESOMAR, online survey research accounted for 20% of global data-collection expenditure in 2006. Online surveys offer capabilities beyond those available for any other type of self-administered questionnaire. Online consumer panels are also used extensively for carrying out surveys, but their quality is considered inferior because the panelists are regular contributors and tend to be fatigued. However, when estimating measurement quality (defined as the product of reliability and validity) using a multitrait-multimethod (MTMM) approach, some studies found quite reasonable quality, and even that the quality of a series of questions in an online opt-in panel (Netquest) was very similar to the measurement quality for the same questions asked in the European Social Survey (ESS), which is a face-to-face survey.
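
The definition of measurement quality as the product of reliability and validity can be illustrated with a short calculation. The following is a minimal sketch, assuming both quantities are expressed as proportions between 0 and 1; the function name and the input values are hypothetical and are not estimates from any of the studies mentioned.

```python
def measurement_quality(reliability: float, validity: float) -> float:
    """Return measurement quality as the product of reliability and validity."""
    for name, value in (("reliability", reliability), ("validity", validity)):
        # Assumption: both inputs are proportions in the range [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return reliability * validity


# Hypothetical values for a single survey question, for illustration only.
quality = measurement_quality(reliability=0.8, validity=0.9)
print(f"quality = {quality:.2f}")
```

Because the two factors are multiplied, a question can only reach high quality when both reliability and validity are high.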

Some studies have compared the quality of face-to-face and/or telephone surveys with that of online surveys, both for single questions and for more complex concepts measured with more than one question (also called composite scores or indices). Focusing only on probability-based surveys (including the online ones), they found overall that face-to-face surveys (using show-cards) and web surveys have quite similar levels of measurement quality, whereas telephone surveys performed worse. Other studies comparing paper-and-pencil questionnaires with web-based questionnaires showed that employees preferred online survey approaches to the paper-and-pencil format. There are also concerns about what has been called "ballot stuffing", in which employees respond repeatedly to the same survey. Some employees are also concerned about privacy. Even if they do not provide their names when responding to a company survey, can they be certain that their anonymity is protected? Such fears prevent some employees from expressing an opinion.

Advantages of online surveys

 * Web surveys are faster, simpler, and cheaper. However, lower costs are not so straightforward in practice, as they are strongly interconnected with errors. Because response rates for online surveys usually compare unfavourably with those of other survey modes, efforts to achieve a higher response rate (e.g., with traditional solicitation methods) may substantially increase costs.
 * The entire data collection period is significantly shortened, as all data can be collected and processed in little more than a month.
 * Interaction between the respondent and the questionnaire is more dynamic compared to e-mail or paper surveys. Online surveys are also less intrusive, and they suffer less from social desirability effects.
 * Complex skip patterns can be implemented in ways that are mostly invisible to the respondent.
 * Pop-up instructions can be attached to individual questions to provide help exactly where assistance is required.
 * Questions with long lists of answer choices can be used to provide immediate coding of answers to certain questions that are usually asked in an open-ended fashion in paper questionnaires.
 * Online surveys can be tailored to the situation (e.g., respondents may be allowed to save a partially completed form, the questionnaire may be preloaded with already available information, etc.).
 * Online questionnaires may be improved by applying usability testing, where usability is measured with reference to the speed with which a task can be performed, the frequency of errors and user satisfaction with the interface.
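
The skip patterns mentioned in the list above can be sketched as routing rules that choose the next question from the current answer, so that a respondent never sees questions that do not apply. The example below is a minimal illustration only: the question identifiers, wording, and routing rules are hypothetical, and real survey software typically offers its own configuration for this.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical questions, keyed by an identifier.
QUESTIONS: Dict[str, str] = {
    "owns_phone": "Do you own a mobile phone? (yes/no)",
    "phone_type": "Is it a smartphone? (yes/no)",
    "age": "What is your age in years?",
}

# Each rule maps the answer to the id of the next question (None ends the survey).
ROUTING: Dict[str, Callable[[str], Optional[str]]] = {
    "owns_phone": lambda answer: "phone_type" if answer == "yes" else "age",
    "phone_type": lambda answer: "age",
    "age": lambda answer: None,
}


def run_questionnaire(answers: Dict[str, str], start: str = "owns_phone") -> List[str]:
    """Return the ids of the questions shown, given scripted answers."""
    shown: List[str] = []
    current: Optional[str] = start
    while current is not None:
        shown.append(current)
        current = ROUTING[current](answers[current])
    return shown


# A respondent without a phone skips the follow-up question about phone type.
for question_id in run_questionnaire({"owns_phone": "no", "age": "40"}):
    print(QUESTIONS[question_id])
```

Keeping the routing rules separate from the question text is a design choice made here for readability; it lets the skip logic be checked without touching the wording.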

Key methodological issues of online surveys

 * Sampling. The difference between probability samples (where the inclusion probabilities for all units of the target population are known in advance) and non-probability samples (which often require less time and effort but generally do not support statistical inference) is crucial. Probability samples are highly affected by problems of non-coverage (not all members of the general population have Internet access) and frame problems (online survey invitations are most conveniently distributed by e-mail, but there are no e-mail directories of the general population that might be used as a sampling frame). Because coverage and frame problems can significantly affect data quality, they should be adequately reported when disseminating the research results.
 * Invitations to online surveys. Due to the lack of sampling frames, many online survey invitations are published as a URL on web sites or in other media, which leads to sample selection bias that is outside the researcher's control and to non-probability samples. Traditional solicitation modes, such as telephone or mail invitations to web surveys, can help overcome probability sampling issues in online surveys. However, such approaches face problems of dramatically higher costs and questionable effectiveness.
 * Non-response. Online survey response rates are generally low and vary widely, from less than 1% in enterprise surveys with e-mail invitations to almost 100% in specific membership surveys. In addition to refusing participation, terminating the survey partway through, or not answering certain questions, several other non-response patterns can be observed in online surveys, such as lurking respondents and a combination of partial and item non-response. Response rates can be increased by offering monetary or other incentives to respondents, by contacting respondents several times (follow-up), and by keeping the questionnaire difficulty as low as possible. Incentives have drawbacks, however: when responses are rewarded, it can be questioned whether they are unbiased. Another way to encourage feedback is to publicize what is done with the results. Taking concrete action based on feedback, and showing that action to the customer base, strongly motivates customers to keep making their voices heard.
 * Acquiescence bias. Many people have acquiescent tendencies and are more likely to agree with statements than to disagree with them, regardless of the content. Such respondents often see the person asking the question as an expert in the field, which makes them more likely to respond positively to the question asked.
 * Platform issues. Lack of familiarity with the platform used can confuse participants and clients, or limit who is willing and able to navigate surveys on digital platforms.
 * Questionnaire design. While modern web questionnaires offer a range of design features (different question types, images, multimedia), the use of such elements should be limited to what is necessary for respondents to understand the questions or to stimulate a response. Design elements should not affect the responses themselves, because that would lower the validity and reliability of the data. Appropriate questionnaire design can also help lower the measurement error that arises from the respondents or from the survey mode itself (respondent’s motivation, computer literacy, abilities, privacy concerns, etc.).
 * Post-survey adjustments. Various robust procedures have been developed for situations where sampling deviates from probability selection, or where non-coverage and non-response problems arise. The standard statistical inference procedures (e.g. confidence interval calculations and hypothesis testing) still require a probability sample. Actual survey practice, particularly in marketing research and public opinion polling, widely neglects the principles of probability sampling and increasingly requires the statistical profession to specify the conditions under which non-probability samples may work.

These issues, and potential remedies, are discussed in a number of sources.
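
As an illustration of a post-survey adjustment, the sketch below computes post-stratification weights, which rescale respondents so that the weighted sample matches known population shares. Post-stratification is chosen here only as a familiar example; the text above does not name a specific procedure. The group labels, sample composition, and population shares are all hypothetical, and weighting of this kind does not turn a non-probability sample into a probability sample.

```python
from collections import Counter
from typing import Dict, List


def poststratification_weights(
    sample_groups: List[str], population_shares: Dict[str, float]
) -> Dict[str, float]:
    """Return one weight per group: population share divided by sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights: Dict[str, float] = {}
    for group, population_share in population_shares.items():
        if counts[group] == 0:
            # A group with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in group {group!r}")
        sample_share = counts[group] / n
        weights[group] = population_share / sample_share
    return weights


# Hypothetical online sample that over-represents younger respondents.
sample = ["under_40"] * 80 + ["40_plus"] * 20
shares = {"under_40": 0.6, "40_plus": 0.4}  # assumed known population shares

for group, weight in poststratification_weights(sample, shares).items():
    print(f"{group}: weight {weight:.2f}")
```

In this example the under-represented group receives a weight above 1 and the over-represented group a weight below 1, so that weighted totals reflect the assumed population composition.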

Telephone
Telephone surveys use interviewers to encourage the sample persons to respond, which leads to higher response rates. There is some potential for interviewer bias (e.g., some people may be more willing to discuss a sensitive issue with a female interviewer than with a male one). Depending on local call charge structure and coverage, this method can be cost-efficient and may be appropriate for large national (or international) sampling frames using traditional phones or computer-assisted telephone interviewing (CATI). Because it is audio-based, this mode cannot be used for non-audio information such as graphics, demonstrations, or taste/smell samples.

Mail
Depending on local bulk mail postage rates, mail surveys may be relatively low in cost compared to other modes. The field period tends to be longer, often several months, before the surveys are returned and statistical analysis can begin. The questionnaire may be handed to the respondents or mailed to them, but in all cases it is returned to the researcher by mail. Because no interviewer is present, the mail mode is not suitable for issues that may require clarification. However, there is no interviewer bias, and respondents can answer at their own convenience (allowing them to break up long surveys; this is also useful if they need to check records to answer a question). To correct for nonresponse bias, estimates can be extrapolated across waves. Response rates can be improved by using mail panels (members of the panel must agree to participate) and prepaid monetary incentives, but they are also affected by the class of mail through which the survey was sent. Panels can be used in longitudinal designs where the same respondents are surveyed several times.
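
One simple reading of extrapolation across waves is to treat each mailing or reminder as a wave, fit a trend to the wave means, and project that trend one wave ahead as a stand-in for people who never responded. The sketch below illustrates this under strong assumptions: a linear trend, nonrespondents who resemble the projected next wave, and waves of equal size. The function names and all figures are hypothetical.

```python
from typing import List


def extrapolate_next_wave(wave_means: List[float]) -> float:
    """Fit a least-squares line to the wave means and project one wave ahead."""
    k = len(wave_means)
    if k < 2:
        raise ValueError("need at least two waves to fit a trend")
    xs = list(range(1, k + 1))  # wave numbers 1..k
    x_bar = sum(xs) / k
    y_bar = sum(wave_means) / k
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, wave_means)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * (k + 1)


def adjusted_mean(respondent_mean: float, response_rate: float,
                  nonrespondent_estimate: float) -> float:
    """Combine respondents and projected nonrespondents by their shares."""
    return response_rate * respondent_mean + (1 - response_rate) * nonrespondent_estimate


# Hypothetical mean scores from three mailing waves of equal size.
wave_means = [4.0, 3.5, 3.0]
projected = extrapolate_next_wave(wave_means)
overall = adjusted_mean(sum(wave_means) / len(wave_means), 0.6, projected)
print(f"projected nonrespondent mean: {projected:.2f}")
print(f"adjusted overall mean: {overall:.2f}")
```

If the wave means show no trend, the projection equals the respondent mean and the adjustment leaves the estimate unchanged.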

The visual presentation of survey questions makes a difference in how respondents answer them. There are four primary design elements: words (meaning), numbers (sequencing), symbols (e.g. arrows), and graphics (e.g. text boxes). In translated surveys, writing conventions (e.g. Spanish words are longer and require more printing space) and text orientation (e.g. Arabic is read from right to left) must be considered in the visual design of the questionnaire to minimize missing data.

Face-to-face
The face-to-face mode is suitable for locations where telephone or mail services are not well developed. As in the telephone mode, the presence of an interviewer carries the risk of interviewer bias.

Video interviewing
Video interviewing is similar to face-to-face interviewing, except that the interviewer and respondent are not physically in the same location and communicate through video conferencing software such as Zoom or Teams.

Virtual worlds
Virtual-world interviews take place online in a space created for virtual interaction with other users or players, such as Second Life. Both the respondent and interviewer choose avatars to represent themselves and interact by a chat feature or by real voice audio.

Chatbots
Chatbots are regularly used in marketing and sales to gather feedback on customer experience. When used to collect survey responses, chatbot surveys should be kept short, and the chatbot should be trained to use a friendly, human tone and offer an easy-to-navigate interface supported by more advanced artificial intelligence.

Mixed-mode surveys
Researchers can combine several of the above methods for data collection. For example, researchers can invite shoppers at malls and send questionnaires by e-mail to those willing to participate. With the introduction of computers to the survey process, survey mode now includes combinations of different approaches, or mixed-mode designs. Some of the most common methods are:
 * Computer-assisted personal interviewing (CAPI): The computer displays the questions on screen, the interviewer reads them to the respondent, and then enters the respondent's answers.
 * Audio computer-assisted self-interviewing (audio CASI): The respondent operates the computer, which displays each question on the screen and plays a recording of it to the respondent, who then enters their answers.
 * Computer-assisted telephone interviewing (CATI)
 * Interactive voice response (IVR): The computer plays recordings of the questions to respondents over the telephone, who then respond by using the keypad of the telephone or speaking their answers aloud.
 * Web surveys: The computer administers the questions online. See computer-assisted web interviewing (CAWI).