User:Sam a jones/sandbox

In research, experimenter bias refers to any situation where the outcome of a study has been influenced by the expectations of a researcher. The term encompasses more specific phenomena such as the observer-expectancy effect and the Clever Hans effect, and is related to the Pygmalion effect. Bias can take many forms, including the manipulation of participants through unconscious suggestion, selection of participants who are more likely to exhibit the expected behaviour, and biased interpretation of data. Experimenter bias has been demonstrated in both human and animal studies.

Experimenter bias threatens both the internal and external validity of a study, and in particular increases the risk of a Type I error (finding an effect that is not truly present), so researchers take steps to reduce its impact. Strategies such as double-blind experimental designs and establishing good inter-rater reliability between multiple observers have been shown to be effective in controlling experimenter bias in various research situations.

History
Experimenter bias has been repeatedly demonstrated as an issue affecting scientific research since at least the early 20th century. An early example is the debunking of “N-rays”, a purely subjective phenomenon whose reported existence rested entirely on biased observations. In 1903 the French physicist Prosper-René Blondlot believed he had identified a novel form of radiation, N-rays, that emanated from most substances. When other researchers attempted to replicate his work, many reported similar findings, and it was not until the physicist Robert W. Wood published a thorough debunking in the journal Nature that N-rays were shown not to exist. Wood concluded that physicists worldwide had reported finding an entirely fictional form of radiation simply because they were looking for it.

The behavioural-influence aspect of experimenter bias was first documented by the animal researcher and psychologist Oskar Pfungst in 1911. Upon learning of a horse, Clever Hans, that was apparently able to answer mathematical questions correctly, Pfungst carried out a formal investigation. He concluded that the horse was in fact responding to subtle, unintentional cues from people who knew the answer, demonstrating experimenter bias through unconscious suggestion.

Although the experimenter bias effect continued to be acknowledged and discussed within research, it was not demonstrated experimentally until the 1960s. In a study by Rosenthal and Fode, two groups of students were each given rats to train to run a maze. One group was told they had been given "maze-bright" rats, the other "maze-dull" rats. Although the rats had in fact been allocated at random, those trained by students who believed they had "maze-bright" rats did indeed learn the maze faster, demonstrating the effect of the student "experimenters'" expectations. Rosenthal and Jacobson later demonstrated a similar effect in humans: a study conducted in schools found that simply telling teachers that certain students were likely to perform well led to increased IQ scores in those children a year later.

Since the work of Rosenthal and colleagues, research into experimenter bias has continued, and sources of bias other than behavioural influence have been identified, along with ways of controlling them. Guidelines now exist to account for experimenter bias in a variety of research situations.

Selection/Allocation Bias
Selection or allocation bias refers to situations in which a researcher has sampled from a population that is more likely to demonstrate an expected effect, or has allocated participants to conditions based on their likelihood of showing the effect. This form of bias is often apparent in studies that attempt to demonstrate the efficacy of pseudoscientific treatments such as homeopathy and acupuncture: researchers conducting such studies have been shown to recruit, or to allocate selectively to the active (non-placebo) treatment groups, participants who already believe in the treatment's effects.
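The mechanism can be illustrated with a small simulation. The sketch below is hypothetical and not drawn from any cited study: it assumes a treatment with no real effect, a population in which half of the participants are "believers" who report about 1 point more improvement whichever group they are in, and a biased researcher who places believers in the treatment group 80% of the time. The function name `simulate_trial` and all parameter values are illustrative assumptions.

```python
import random
from statistics import mean


def simulate_trial(biased_allocation: bool, n: int = 2000, seed: int = 1) -> float:
    """Return mean reported improvement in the treatment group minus the placebo group.

    Hypothetical assumptions: the treatment has no real effect, half of the
    participants are 'believers', and believers report 1 point more improvement
    than non-believers in either group (an expectation effect).
    """
    rng = random.Random(seed)
    treatment, placebo = [], []
    for _ in range(n):
        believer = rng.random() < 0.5
        # Reported improvement: random noise plus an expectation effect for believers.
        score = rng.gauss(0, 1) + (1.0 if believer else 0.0)
        if biased_allocation:
            # The researcher steers believers into the treatment group 80% of the time.
            in_treatment = rng.random() < (0.8 if believer else 0.2)
        else:
            # Random allocation: a fair coin flip, independent of belief.
            in_treatment = rng.random() < 0.5
        (treatment if in_treatment else placebo).append(score)
    return mean(treatment) - mean(placebo)


print(f"biased allocation: {simulate_trial(True):.2f}")
print(f"random allocation: {simulate_trial(False):.2f}")
```

Under these assumptions, biased allocation should yield an apparent treatment effect of roughly 0.6 points (about 80% of the treatment group but only 20% of the placebo group are believers), while random allocation should yield a difference close to zero.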

A common cause of selection bias is participant self-selection, in which people from a particular subset of the population, who are more likely to volunteer as research participants, are disproportionately represented in the sample. Self-selection bias reduces the external validity of a study, as conclusions can then be drawn only for the subpopulation that the volunteers belong to.
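A similarly hypothetical simulation shows how self-selection distorts a sample. It assumes a standardised trait (for example, interest in the research topic) and a logistic rule, chosen purely for illustration, under which people with higher trait values are more likely to volunteer; `self_selection_demo` is an illustrative name, not an established method.

```python
import math
import random
from statistics import mean


def self_selection_demo(n: int = 10_000, seed: int = 2) -> tuple[float, float]:
    """Return (population mean, volunteer-sample mean) of a hypothetical trait.

    Hypothetical assumption: the chance that a person volunteers rises with the
    trait, following the logistic curve 1 / (1 + exp(-2x)).
    """
    rng = random.Random(seed)
    # Trait values for the whole population, standardised to mean 0, SD 1.
    population = [rng.gauss(0, 1) for _ in range(n)]
    # Each person volunteers with a probability that increases with their trait value.
    volunteers = [x for x in population if rng.random() < 1 / (1 + math.exp(-2 * x))]
    return mean(population), mean(volunteers)


pop_mean, sample_mean = self_selection_demo()
print(f"population mean: {pop_mean:.2f}, volunteer mean: {sample_mean:.2f}")
```

With these assumptions the volunteer mean sits well above the population mean of about zero, so an estimate based on volunteers alone would not generalise to the population as a whole.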

Influencing Participants' Behaviour
This is the most widely researched manifestation of experimenter bias, and refers to the researcher providing (usually unintentional) cues to participants that then affect their behaviour. Rosenthal and Fode outlined three primary ways in which a researcher may influence the behaviour of participants: verbal reinforcement of expected responses; paralinguistic cues, such as variation in vocal tone in response to expected answers; and kinaesthetic cues, including facial expressions. For example, an experimenter who smiles and nods every time an expected response is made may steer the participant towards responses of that type on later trials.

A researcher can also influence participants' behaviour by revealing the task demands. If, for example, the experimenter mentions that they are testing for a specific effect, the participant may try to display that effect to please the researcher.

Data Collection and Interpretation
Rosenthal and Fode identified two further forms of experimenter bias: misjudging participants' responses in the direction of the expected results, and recording results with errors that favour the expected direction. These fall under the umbrella of data collection bias and can substantially inflate the rate of Type I errors if left unchecked. Double-blind designs and the use of multiple observers can minimise the impact of data collection bias.

Investigators can also bias results during the analysis and interpretation of data. For example, the researcher may choose to run the statistical tests that are most likely to show the expected effect. Because a p-value depends on the sample size as well as on the observed effect, a researcher can also manipulate statistical significance by changing which data points are included: selectively excluding cases can make a result non-significant, while adding cases until the test crosses the significance threshold can make it significant, even when no true effect exists.
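The effect of adding data until a result becomes significant (often called optional stopping) can be checked by simulation. The sketch below assumes the null hypothesis is true: all data are drawn from a normal distribution with mean 0 and a known standard deviation of 1, so a simple z-test can stand in for the t-test a real study would more likely use. It compares a fixed-sample design with one that tests after every batch of 10 observations and stops as soon as p < 0.05. The function names and parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def two_sided_p(sample: list[float]) -> float:
    """Two-sided one-sample z-test of mean 0, assuming a known SD of 1 (a simplification)."""
    z = mean(sample) * len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(optional_stopping: bool, runs: int = 2000, start_n: int = 10,
                        max_n: int = 100, step: int = 10, alpha: float = 0.05,
                        seed: int = 3) -> float:
    """Fraction of simulated studies that reach p < alpha although no effect exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # The null hypothesis is true: every observation comes from N(0, 1).
        data = [rng.gauss(0, 1) for _ in range(start_n)]
        if optional_stopping:
            # Peek after every batch and stop collecting as soon as p < alpha.
            while two_sided_p(data) >= alpha and len(data) < max_n:
                data += [rng.gauss(0, 1) for _ in range(step)]
        else:
            # Fixed design: collect all max_n observations, then test once.
            data += [rng.gauss(0, 1) for _ in range(max_n - start_n)]
        if two_sided_p(data) < alpha:
            hits += 1
    return hits / runs


print(f"fixed sample:      {false_positive_rate(False):.3f}")
print(f"optional stopping: {false_positive_rate(True):.3f}")
```

With a fixed sample the false-positive rate should stay close to the nominal 5%, whereas repeated testing with optional stopping produces a markedly higher rate, even though no true effect exists.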

Qualitative Bias
Experimenter bias can be a particular problem for qualitative research, as the data collected are much more open to subjective interpretation. Most qualitative research requires researchers to respond on a personal level to the behaviour of their participants, so the neutrality of the research often depends on researchers identifying and addressing their own biases. Such neutrality can be very difficult to achieve, especially when the researcher expects a particular outcome, which makes experimenter bias a common criticism levelled at qualitative research.

Double-blind testing
The most commonly accepted way to address experimenter bias in human and animal studies, and especially to minimise the chance of experimenters influencing participants' behaviour, is to incorporate double-blind procedures into the experimental design. In a double-blind study, neither the participants nor the researchers who interact with them know which condition each participant has been assigned to. In a drug trial, for example, neither the participant nor the researcher administering the drug knows whether the participant is receiving the active drug or a placebo. Such a design prevents the researcher from giving cues that could bias the results, and is considered the gold standard for many types of research study.
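One common way to implement blinding in a drug trial is to have an independent party prepare identically labelled treatment kits and hold the key linking each kit to its contents. The Python sketch below illustrates the idea under stated assumptions: `blind_allocation` and the `KIT-001` numbering scheme are hypothetical, and the returned key stands in for whatever secure third-party record a real trial would use.

```python
import random
from typing import Optional


def blind_allocation(participant_ids: list[str], seed: Optional[int] = None):
    """Hypothetical sketch of double-blind allocation via coded kits.

    Returns (schedule, sealed_key): schedule maps participant -> kit code and is
    all the administering researcher sees; sealed_key maps kit code -> arm and
    is held by an independent third party until unblinding.
    """
    rng = random.Random(seed)
    n = len(participant_ids)
    # Balanced arms: half drug, half placebo (one extra placebo if n is odd).
    arms = ["drug"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(arms)
    schedule, sealed_key = {}, {}
    for i, (pid, arm) in enumerate(zip(participant_ids, arms), start=1):
        kit = f"KIT-{i:03d}"   # hypothetical kit-numbering scheme
        schedule[pid] = kit    # visible to the researcher administering the kit
        sealed_key[kit] = arm  # hidden until data collection is complete
    return schedule, sealed_key


schedule, sealed_key = blind_allocation(["P01", "P02", "P03", "P04"], seed=42)
print(schedule)  # kit codes only; no information about which arm each kit belongs to
```

The administering researcher works only from `schedule`; `sealed_key` is opened at unblinding, after data collection is complete.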

The concept of double-blind testing is not new: the physiologist Claude Bernard wrote an essay in 1868 advocating the use of a naïve observer to eliminate bias in scientific experiments. The method has, however, become increasingly prominent, especially since the middle of the 20th century.

The main disadvantage of double-blind testing is the extra time and resources it requires. Because of these costs, double-blind procedures are often used only where experimenter influence over participant behaviour is likely to bias results, as in drug trials.

Data Interpretation
The triple-blind approach may be considered an extension of double-blind testing, in which the data are also analysed by a third party who is likewise unaware of the group assignments. This approach is rarely practical, however, as effective analysis of a dataset generally requires knowledge of the study that only the team that designed it would have.

Bias in data analysis and interpretation can, however, be reduced in other ways. In research that requires statistical analysis, deciding a priori which statistical tests will be used reduces the danger of choosing only analyses likely to produce the expected results. Because this limits the options for exploring unexpected findings, studies often combine planned comparisons with post-hoc tests.
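The risk of choosing analyses after seeing the data can be made concrete with a standard calculation. Assuming k independent tests, each run at significance level α when no true effect exists, the probability of at least one false positive (the family-wise error rate, FWER) is given below, together with the Bonferroni correction, one common and conservative adjustment applied to post-hoc tests. The values α = 0.05 and k = 10 are illustrative.

```latex
\begin{aligned}
\text{FWER} &= 1 - (1 - \alpha)^{k} \\
\alpha = 0.05,\ k = 10:\quad \text{FWER} &= 1 - 0.95^{10} \approx 0.40 \\
\text{Bonferroni, } \alpha' = \alpha / k = 0.005:\quad \text{FWER} &= 1 - 0.995^{10} \approx 0.049 \le 0.05
\end{aligned}
```

In other words, a researcher free to pick from ten unplanned analyses has roughly a 40% chance of finding at least one "significant" result by chance alone, which is why pre-specified tests, or corrected post-hoc tests, are preferred.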

Selection Bias
Some basic safeguards can be employed against selection and allocation bias. First, participants can be allocated to groups randomly or pseudo-randomly, which removes the researcher's discretion over who is assigned to which condition. Self-selection bias (and many other confounds) may also be minimised by ensuring that participants are unaware of the aims of the study, and by the proper use of placebo and control groups.
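Pseudo-random allocation is often implemented with permuted blocks, which keep group sizes balanced throughout recruitment while leaving the order within each block unpredictable. The sketch below is one illustrative implementation; the function name, the two group labels and the block size of 4 are assumptions, not requirements of the method.

```python
import random
from typing import Optional


def permuted_block_allocation(n: int, groups: tuple[str, ...] = ("treatment", "control"),
                              block_size: int = 4, seed: Optional[int] = None) -> list[str]:
    """Allocate n participants to groups in randomly shuffled, balanced blocks."""
    if block_size % len(groups):
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)
    allocation: list[str] = []
    while len(allocation) < n:
        # Each block contains every group equally often, in a random order.
        block = list(groups) * (block_size // len(groups))
        rng.shuffle(block)
        allocation.extend(block)
    # Trim the final block if n is not a multiple of block_size.
    return allocation[:n]


print(permuted_block_allocation(8, seed=7))
```

Smaller blocks keep the groups more tightly balanced but make the next allocation easier to guess towards the end of each block, which is one reason trials sometimes vary the block size.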

If collected data are known to have been drawn from a biased sample, it is often still possible to use them in research. Statistical methods such as the Heckman correction can be employed to account for selection bias in a dataset, allowing the conclusions drawn to be applied to the wider population.
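In outline, the Heckman correction treats selection as a separate equation and adds a correction term to the outcome model. The formulation below is the standard two-step version. It assumes that the error terms u_i and ε_i are jointly normal with Var(u_i) = 1, Var(ε_i) = σ_ε² and correlation ρ; w_i are variables predicting whether a case is observed (s_i = 1), x_i are the predictors of interest, and φ and Φ are the standard normal density and distribution functions.

```latex
\begin{aligned}
z_i^{*} &= \mathbf{w}_i^{\top}\boldsymbol{\gamma} + u_i, \qquad s_i = \mathbf{1}[z_i^{*} > 0] \\
y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \qquad \text{(observed only when } s_i = 1\text{)} \\
\mathbb{E}[y_i \mid \mathbf{x}_i, s_i = 1] &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \rho\,\sigma_{\varepsilon}\,\lambda(\mathbf{w}_i^{\top}\boldsymbol{\gamma}), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

In the first step, a probit model of s_i on w_i, fitted to all cases, gives an estimate of γ; in the second, y_i is regressed on x_i and the estimated inverse Mills ratio λ(w_iᵀγ̂) using only the observed cases. Leaving out the λ term is what biases an ordinary regression on the selected sample whenever ρ ≠ 0.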

In Qualitative Research
In qualitative research (especially unstructured interviews), blind study designs are often not possible, as the researcher must direct the participant towards the subject of interest. However, various strategies exist for minimising experimenter bias in qualitative research. For example, when interview data or observations must be coded subjectively, multiple researchers can code the data independently so that inter-rater reliability can be established.
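Agreement between two independent coders is often summarised with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), which compares the observed proportion of agreement p_o with the agreement p_e expected by chance given each coder's category frequencies. The sketch below assumes exactly two coders who have each assigned one category per item; the theme labels in the usage example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who each assigned one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


coder_a = ["anxiety", "anxiety", "coping", "coping", "other", "coping"]
coder_b = ["anxiety", "coping", "coping", "coping", "other", "anxiety"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # → 0.455
```

Values near 1 indicate strong agreement beyond chance and values near 0 indicate agreement no better than chance; low agreement may suggest that coders' expectations are shaping the coding and that the coding scheme needs refinement.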