Intuitive statistics

Intuitive statistics, or folk statistics, is the cognitive phenomenon where organisms use data to make generalizations and predictions about the world. This can be a small amount of sample data or training instances, which in turn contribute to inductive inferences about either population-level properties, future data, or both. Inferences can involve revising hypotheses, or beliefs, in light of probabilistic data that inform and motivate future predictions. The informal tendency for cognitive animals to intuitively generate statistical inferences, when formalized with certain axioms of probability theory, constitutes statistics as an academic discipline.

Because this capacity can accommodate a broad range of informational domains, the subject matter is similarly broad and overlaps substantially with other cognitive phenomena. Indeed, some have argued that "cognition as an intuitive statistician" is an apt companion metaphor to the computer metaphor of cognition. Others appeal to a variety of statistical and probabilistic mechanisms behind theory construction and category structuring. Research in this domain commonly focuses on generalizations relating to number, relative frequency, risk, and any systematic signatures in inferential capacity that an organism (e.g., humans, or non-human primates) might have.

Background and theory
Intuitive inferences can involve generating hypotheses from incoming sense data, such as categorization and concept structuring. Data are typically probabilistic and uncertainty is the rule, rather than the exception, in learning, perception, language, and thought. Recently, researchers have drawn from ideas in probability theory, philosophy of mind, computer science, and psychology to model cognition as a predictive and generative system of probabilistic representations, allowing information structures to support multiple inferences in a variety of contexts and combinations. This approach has been called a probabilistic language of thought because it constructs representations probabilistically, from pre-existing concepts to predict a possible and likely state of the world.

Probability
Statisticians and probability theorists have long debated the use of various tools, assumptions, and problems relating to inductive inference in particular. David Hume famously considered the problem of induction, questioning the logical foundations of how and why people can arrive at conclusions that extend beyond past experiences, both spatiotemporally and epistemologically. More recently, theorists have considered the problem by emphasizing techniques for moving from data to hypothesis using formal content-independent procedures, or in contrast, by considering informal, content-dependent tools for inductive inference. Searches for formal procedures have led to different developments in statistical inference and probability theory with different assumptions, including Fisherian frequentist statistics, Bayesian inference, and Neyman-Pearson statistics.

Gerd Gigerenzer and David Murray argue that twentieth century psychology as a discipline adopted probabilistic inference as a unified set of ideas and ignored the controversies among probability theorists. They claim that a normative but incorrect view of how humans "ought to think rationally" follows from this acceptance. They also maintain, however, that the intuitive statistician metaphor of cognition is promising, and should consider different formal tools or heuristics as specialized for different problem domains, rather than a content- or context-free toolkit. Signal detection theorists and object detection models, for example, often use a Neyman-Pearson approach, whereas Fisherian frequentist statistics might aid cause-effect inferences.

Frequentist inference
Frequentist inference focuses on the relative proportions or frequencies of occurrences to draw probabilistic conclusions. It is defined by its closely related concept, frequentist probability. This entails a view that "probability" is nonsensical in the absence of pre-existing data, because it is understood as a relative frequency that long-run samples would approach given large amounts of data. Leda Cosmides and John Tooby have argued that it is not possible to derive a probability without reference to some frequency of previous outcomes, and this likely has evolutionary origins: Single-event probabilities, they claim, are not observable because organisms evolved to intuitively understand and make statistical inferences from frequencies of prior events, rather than to "see" probability as an intrinsic property of an event.
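
The idea of probability as a long-run relative frequency can be illustrated with a short simulation. This is only a sketch under stated assumptions: the event rate `true_rate`, the sample size, and the function name `running_relative_frequency` are hypothetical and do not come from any study cited here.

```python
import random


def running_relative_frequency(outcomes):
    """Return the relative frequency of True outcomes after each observation."""
    freqs = []
    hits = 0
    for n, outcome in enumerate(outcomes, start=1):
        hits += bool(outcome)
        freqs.append(hits / n)
    return freqs


rng = random.Random(0)   # fixed seed so the run is reproducible
true_rate = 0.3          # hypothetical long-run rate of the event (assumption)
outcomes = [rng.random() < true_rate for _ in range(10_000)]
freqs = running_relative_frequency(outcomes)

# The running estimate is unstable for small samples and settles as data accumulate.
for n in (10, 100, 1_000, 10_000):
    print(n, round(freqs[n - 1], 3))
```

On this view, the early estimates are not "wrong probabilities"; there is simply too little frequency data for the relative frequency to have stabilized.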

Bayesian inference
Bayesian inference generally emphasizes the subjective probability of a hypothesis, which is computed as a posterior probability using Bayes' Theorem. It requires a "starting point" called a prior probability, which has been contentious for some frequentists who claim that frequency data are required to develop a prior probability, in contrast to taking a probability as an a priori assumption.
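
For reference, Bayes' Theorem can be written as follows, where H is a hypothesis, D is the observed data, P(H) is the prior probability, and P(H | D) is the posterior probability. The symbols are chosen here for illustration only.

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)},
\qquad
P(D) = \sum_{i} P(D \mid H_i)\, P(H_i)
```

The sum in the denominator runs over all competing hypotheses, which is why a prior must be specified for each of them before any posterior can be computed.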

Bayesian models have been quite popular among psychologists, particularly learning theorists, because they appear to emulate the iterative, predictive process by which people learn and develop expectations from new observations, while giving appropriate weight to previous observations. Andy Clark, a cognitive scientist and philosopher, recently wrote a detailed argument in support of understanding the brain as a constructive Bayesian engine that is fundamentally action-oriented and predictive, rather than passive or reactive. More classic lines of evidence cited among supporters of Bayesian inference include conservatism, or the phenomenon where people modify previous beliefs toward, but not all the way to, a conclusion implied by previous observations. This pattern of behavior is similar to the pattern of posterior probability distributions when a Bayesian model is conditioned on data, though critics argued that this evidence had been overstated and lacked mathematical rigor.
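
The iterative character of Bayesian updating can be sketched with two competing hypotheses about an urn. Everything in this example is an assumption made for illustration: the hypothesis names, the 0.7 and 0.3 proportions, and the sequence of draws are hypothetical and are not taken from any experiment described in this article.

```python
def update(prior, likelihoods):
    """One Bayesian update over a discrete set of hypotheses.

    prior:       dict mapping hypothesis -> P(hypothesis)
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical urns: probability of drawing a red chip under each hypothesis.
p_red = {"mostly_red": 0.7, "mostly_white": 0.3}
belief = {"mostly_red": 0.5, "mostly_white": 0.5}  # uniform prior (assumption)

for draw in ["red", "red", "white", "red"]:
    likelihoods = {h: (p if draw == "red" else 1 - p) for h, p in p_red.items()}
    belief = update(belief, likelihoods)  # yesterday's posterior is today's prior
    print(draw, round(belief["mostly_red"], 3))
```

Each posterior becomes the prior for the next observation, so earlier draws continue to carry weight. A "conservative" reasoner in this setting would report a final belief that moves in the same direction as the computed posterior but stops short of it.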

Alison Gopnik more recently tackled the problem by advocating the use of Bayesian networks, or directed graph representations of conditional dependencies. In a Bayesian network, edge weights are conditional dependency strengths that are updated in light of new data, and nodes are observed variables. The graphical representation itself constitutes a model, or hypothesis, about the world and is subject to change, given new data.
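
A very small Bayesian network can be sketched to show how such a graph supports inference. The network below (rain and a sprinkler as two possible causes of wet grass) is a generic textbook-style illustration, not a model from Gopnik's work, and every probability in it is a made-up assumption.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# All numbers below are illustrative assumptions.
p_rain = 0.2
p_sprinkler = 0.1
p_wet = {  # P(wet | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.05,
}


def joint(rain, sprinkler, wet):
    """Joint probability factorized along the edges of the graph."""
    p = (p_rain if rain else 1 - p_rain) * (p_sprinkler if sprinkler else 1 - p_sprinkler)
    pw = p_wet[(rain, sprinkler)]
    return p * (pw if wet else 1 - pw)


def prob_rain_given_wet():
    """P(rain | wet grass), computed by enumerating the unobserved variable."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / denominator


print(round(prob_rain_given_wet(), 3))
```

Observing wet grass raises the probability of rain above its prior value. Changing an entry in the conditional table, or adding or removing an edge, corresponds to revising the hypothesis that the graph encodes.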

Error management theory
Error management theory (EMT) is an application of Neyman-Pearson statistics to cognitive and evolutionary psychology. It maintains that the possible fitness costs and benefits of type I (false positive) and type II (false negative) errors are relevant to adaptively rational inferences, toward which an organism is expected to be biased due to natural selection. EMT was originally developed by Martie Haselton and David Buss, with initial research focusing on its possible role in sexual overperception bias in men and sexual underperception bias in women.

This is closely related to a concept called the "smoke detector principle" in evolutionary theory. It is defined by the tendency for immune, affective, and behavioral defenses to be hypersensitive and overreactive, rather than insensitive or weakly expressed. Randolph Nesse maintains that this is a consequence of a typical payoff structure in signal detection: In a system that is invariantly structured with a relatively low cost of false positives and high cost of false negatives, naturally selected defenses are expected to err on the side of hyperactivity in response to potential threat cues. This general idea has been applied to hypotheses about the apparent tendency for humans to apply agency to non-agents based on uncertain or agent-like cues. In particular, some claim that it is adaptive for potential prey to assume agency by default if it is even slightly suspected, because potential predator threats typically involve cheap false positives and lethal false negatives.
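
The payoff logic behind error management and the smoke detector principle can be sketched as an expected-cost comparison. This is a simplified illustration, not a model taken from Haselton, Buss, or Nesse; the function names and cost values are hypothetical.

```python
def response_threshold(cost_false_alarm, cost_miss):
    """Threat probability above which responding has the lower expected cost.

    Responding when there is no threat costs `cost_false_alarm`;
    failing to respond to a real threat costs `cost_miss`.
    """
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def should_respond(p_threat, cost_false_alarm, cost_miss):
    """Respond when the expected cost of a miss exceeds that of a false alarm."""
    return p_threat * cost_miss > (1 - p_threat) * cost_false_alarm


# Hypothetical costs: a false alarm is cheap, a miss is 100 times worse.
print(round(response_threshold(1, 100), 4))
print(should_respond(0.02, 1, 100))
print(should_respond(0.005, 1, 100))
```

With symmetric costs the threshold is 0.5, but as misses become relatively more expensive the threshold falls toward zero, so a system tuned this way will respond to cues that usually turn out to be false alarms.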

Heuristics and biases
Heuristics are efficient rules, or computational shortcuts, for producing a judgment or decision. The intuitive statistician metaphor of cognition led to a shift in focus for many psychologists, away from emotional or motivational principles and toward computational or inferential principles. Empirical studies investigating these principles have led some to conclude that human cognition, for example, has built-in and systematic errors in inference, or cognitive biases. As a result, cognitive psychologists have largely adopted the view that intuitive judgments, generalizations, and numerical or probabilistic calculations are systematically biased. The result is commonly an error in judgment, including (but not limited to) recurrent logical fallacies (e.g., the conjunction fallacy), innumeracy, and emotionally motivated shortcuts in reasoning. Social and cognitive psychologists have thus considered it "paradoxical" that humans can outperform powerful computers at complex tasks, yet be deeply flawed and error-prone in simple, everyday judgments.

Much of this research was carried out by Amos Tversky and Daniel Kahneman as an expansion of work by Herbert Simon on bounded rationality and satisficing. Tversky and Kahneman argue that people are regularly biased in their judgments under uncertainty, because in a speed-accuracy tradeoff they often rely on fast and intuitive heuristics with wide margins of error rather than slow calculations from statistical principles. These errors are called "cognitive illusions" because they involve systematic divergences between judgments and accepted, normative rules in statistical prediction.

Gigerenzer has been critical of this view, arguing that it builds from a flawed assumption that a unified "normative theory" of statistical prediction and probability exists. His contention is that cognitive psychologists neglect the diversity of ideas and assumptions in probability theory, and in some cases, their mutual incompatibility. Consequently, Gigerenzer argues that many cognitive illusions are not violations of probability theory per se, but involve some kind of experimenter confusion between subjective probabilities (degrees of confidence) and long-run outcome frequencies. Cosmides and Tooby similarly claim that different probabilistic assumptions can be more or less normative and rational in different types of situations, and that there is no general-purpose statistical toolkit for making inferences across all informational domains. In a review of several experiments they conclude, in support of Gigerenzer, that previous heuristics and biases experiments did not represent problems in an ecologically valid way, and that re-representing problems in terms of frequencies rather than single-event probabilities can make cognitive illusions largely vanish.

Tversky and Kahneman disputed this claim, arguing that making illusions disappear by manipulating them, whether they are cognitive or visual, does not undermine the initially discovered illusion. They also note that Gigerenzer ignores cognitive illusions resulting from frequency data, e.g., illusory correlations such as the hot hand in basketball. This, they note, is an example of an illusory positive autocorrelation that cannot be corrected by converting data to natural frequencies.

For adaptationists, EMT can be applied to inference under any informational domain, where risk or uncertainty are present, such as predator avoidance, agency detection, or foraging. Researchers advocating this adaptive rationality view argue that evolutionary theory casts heuristics and biases in a new light, namely, as computationally efficient and ecologically rational shortcuts, or instances of adaptive error management.

Base rate neglect
People often neglect base rates, or true actuarial facts about the probability or rate of a phenomenon, and instead give inappropriate amounts of weight to specific observations. In a Bayesian model of inference, this would amount to an underweighting of the prior probability, which has been cited as evidence against the appropriateness of a normative Bayesian framework for modeling cognition. Frequency representations can resolve base rate neglect, and some consider the phenomenon to be an experimental artifact, i.e., a result of probabilities or rates being represented as mathematical abstractions, which are difficult to think about intuitively. Gigerenzer speculates that there is an ecological reason for this, noting that individuals learn frequencies through successive trials in nature. Tversky and Kahneman dispute Gigerenzer's claim, pointing to experiments where subjects predicted a disease based on the presence vs. absence of pre-specified symptoms across 250 trials, with feedback after each trial. They note that base rate neglect was still found, despite the frequency formulation of subject trials in the experiment.

Conjunction fallacy
Another popular example of a supposed cognitive illusion is the conjunction fallacy, described in an experiment by Tversky and Kahneman known as the "Linda problem." In this experiment, participants are presented with a short description of a person called Linda, who is 31 years old, single, intelligent, outspoken, and went to a university where she majored in philosophy, was concerned about discrimination and social justice, and participated in anti-nuclear protests. When participants were asked whether it was more probable that Linda is (1) a bank teller, or (2) a bank teller and a feminist, 85% responded with option 2, even though option 1 cannot be less probable than option 2. They concluded that this was a product of a representativeness heuristic, or a tendency to draw probabilistic inferences based on property similarities between instances of a concept, rather than a statistically structured inference.
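
The reason a conjunction can never be more probable than either of its parts follows directly from the product rule. In the notation below, which is introduced here only for illustration, T stands for "Linda is a bank teller" and F for "Linda is a feminist".

```latex
P(T \land F) = P(T)\, P(F \mid T) \le P(T),
\quad \text{since } 0 \le P(F \mid T) \le 1
```

Equality holds only if every bank teller fitting the description is also a feminist, so the conjunction can at best tie with the single statement.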

Gigerenzer argued that the conjunction fallacy is based on a single-event probability, and would dissolve under a frequentist approach. He and other researchers demonstrate that conclusions from the conjunction fallacy result from ambiguous language, rather than robust statistical errors or cognitive illusions. In an alternative version of the Linda problem, participants are told that 100 people fit Linda's description and are asked how many are (1) bank tellers and (2) bank tellers and feminists. Experimentally, this version of the task appears to eliminate or mitigate the conjunction fallacy.

Computational models
There has been some question about how concept structuring and generalization can be understood in terms of brain architecture and processes. This question is shaped by a neighboring debate among theorists about the nature of thought, specifically between connectionist and language of thought models. Concept generalization and classification have been modeled in a variety of connectionist models, or neural networks, specifically in domains like language learning and categorization. Some emphasize the limitations of pure connectionist models when they are expected to generalize to future instances after training on previous instances. Gary Marcus, for example, asserts that training data would have to be completely exhaustive for generalizations to occur in existing connectionist models, and that as a result, they do not handle novel observations well. He further advocates an integrationist perspective between a language of thought, consisting of symbol representations and operations, and connectionist models that retain the distributed processing that is likely used by neural networks in the brain.

Evidence in humans
In practice, humans routinely make conceptual, linguistic, and probabilistic generalizations from small amounts of data. There is some debate about the utility of various tools of statistical inference in understanding the mind, but it is commonly accepted that the human mind is somehow an exceptionally apt prediction machine, and that action-oriented processes underlying this phenomenon, whatever they might entail, are at the core of cognition. Probabilistic inferences and generalization play central roles in concepts and categories and language learning, and infant studies are commonly used to understand the developmental trajectory of humans' intuitive statistical toolkit(s).

Infant studies
Developmental psychologists such as Jean Piaget have traditionally argued that children do not develop the general cognitive capacities for probabilistic inference and hypothesis testing until concrete operational (age 7–11 years) and formal operational (age 12 years-adulthood) stages of development, respectively.

This is sometimes contrasted with a growing body of empirical evidence suggesting that humans are capable generalizers in infancy. For example, looking-time experiments using expected outcomes of red and white ping pong ball proportions found that 8-month-old infants appear to make inferences about the characteristics of the population from which a sample came, and vice versa when given population-level data. Other experiments have similarly supported a capacity for probabilistic inference with 6- and 11-month-old infants, but not in 4.5-month-olds.
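
The statistical logic behind such ball-sampling tasks can be sketched with the hypergeometric distribution, which gives the probability of a particular sample drawn without replacement from a known population. The box contents and sample size below are hypothetical numbers chosen for illustration; they are not the values used in the infant experiments.

```python
from math import comb


def sample_probability(k_red, n_draws, red_total, white_total):
    """Probability of exactly k_red red balls in n_draws draws without replacement."""
    k_white = n_draws - k_red
    if k_red < 0 or k_white < 0 or k_red > red_total or k_white > white_total:
        return 0.0
    return (comb(red_total, k_red) * comb(white_total, k_white)
            / comb(red_total + white_total, n_draws))


# Hypothetical box: 60 red and 10 white balls, from which 5 are drawn.
RED, WHITE, DRAWS = 60, 10, 5
p_mostly_red = sample_probability(4, DRAWS, RED, WHITE)    # 4 red, 1 white
p_mostly_white = sample_probability(1, DRAWS, RED, WHITE)  # 1 red, 4 white

print(p_mostly_red > p_mostly_white)
```

A sample that mirrors the box is far more probable than one that reverses its proportions, which is the kind of contrast that a longer looking time at the unlikely sample is taken to reflect.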

The colored ball paradigm in these experiments did not distinguish between infants' inferences based on quantity and those based on proportion. This was addressed in follow-up research in which 12-month-old infants seemed to understand proportions, basing probabilistic judgments, motivated by preferences for the more probable outcomes, on initial evidence of the proportions in their available options. Researchers critical of the effectiveness of looking-time tasks instead allowed infants to search for preferred objects in single-sample probability tasks, and their results supported the notion that infants can infer probabilities of single events when given a small or large initial sample size. The researchers involved in these findings have argued that humans possess some statistically structured, inferential system during preverbal stages of development and prior to formal education.

It is less clear, however, how and why generalization is observed in infants: It might extend directly from detection and storage of similarities and differences in incoming data, or frequency representations. Conversely, it might be produced by something like general-purpose Bayesian inference, starting with a knowledge base that is iteratively conditioned on data to update subjective probabilities, or beliefs. This ties together questions about the statistical toolkit(s) that might be involved in learning, and how they apply to infant and childhood learning specifically.

Gopnik advocates the hypothesis that infant and childhood learning are examples of inductive inference, a general-purpose mechanism for generalization, acting upon specialized information structures ("theories") in the brain. On this view, infants and children are essentially proto-scientists because they regularly use a kind of scientific method, developing hypotheses, performing experiments via play, and updating models about the world based on their results. For Gopnik, this use of scientific thinking and categorization in development and everyday life can be formalized as models of Bayesian inference. An application of this view is the "sampling hypothesis," or the view that individual variation in children's causal and probabilistic inferences is an artifact of random sampling from a diverse set of hypotheses, and flexible generalizations based on sampling behavior and context. These views, particularly those advocating general Bayesian updating from specialized theories, are considered successors to Piaget's theory rather than wholesale refutations because they maintain its domain-generality, viewing children as randomly and unsystematically considering a range of models before selecting a probable conclusion.

In contrast to the general-purpose mechanistic view, some researchers advocate both domain-specific information structures and similarly specialized inferential mechanisms. For example, while humans do not usually excel at conditional probability calculations, the use of conditional probability calculations is central to parsing speech sounds into comprehensible syllables, a relatively straightforward and intuitive skill emerging as early as 8 months. Infants also appear to be good at tracking not only the spatiotemporal states of objects but also the properties of objects, and these cognitive systems appear to be developmentally distinct. This has been interpreted as evidence for domain-specific toolkits of inference, each of which corresponds to separate types of information and has applications to concept learning.
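
One way conditional probabilities could support speech parsing is through transitional probabilities between adjacent syllables, with boundaries placed where the probability of the next syllable dips. The sketch below illustrates that idea only; the nonsense words, the syllable stream, the 0.9 threshold, and the function names are assumptions, and it is not a reconstruction of any specific experiment.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from adjacent pairs."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]] for pair, count in pair_counts.items()}


def segment(syllables, tps, threshold=0.9):
    """Place a word boundary wherever the transitional probability is low."""
    words, current = [], [syllables[0]]
    for prev, nxt in zip(syllables, syllables[1:]):
        if tps[(prev, nxt)] < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words


# Hypothetical three-syllable nonsense words, concatenated without pauses.
lexicon = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
order = [0, 1, 2, 1, 0, 2, 0, 1, 2, 0, 2, 1]
stream = [syll for index in order for syll in lexicon[index]]

tps = transitional_probabilities(stream)
print(segment(stream, tps)[:3])
```

Within a word each syllable predicts the next perfectly, whereas across word boundaries several continuations are possible, so the low-probability transitions mark the places where words end.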

Concept formation
Infants use form similarities and differences to develop concepts relating to objects, and this relies on multiple trials with multiple patterns, exhibiting some kind of common property between trials. Infants appear to become proficient at this ability in particular by 12 months, but different concepts and properties employ different relevant principles of Gestalt psychology, many of which might emerge at different stages of development. Specifically, infant categorization as early as 4.5 months involves iterative and interdependent processes by which exemplars (data) and their similarities and differences are crucial for drawing boundaries around categories. These abstract rules are statistical by nature, because they can entail common co-occurrences of certain perceived properties in past instances and facilitate inferences about their structure in future instances. This idea has been extrapolated by Douglas Hofstadter and Emmanuel Sander, who argue that because analogy is a process of inference relying on similarities and differences between concept properties, analogy and categorization are fundamentally the same process used for organizing concepts from incoming data.

Language learning
Infants and small children are not only capable generalizers of trait quantity and proportion, but of abstract rule-based systems such as language and music. These rules can be referred to as “algebraic rules” of abstract informational structure, and are representations of rule systems, or grammars. For language, creating generalizations with Bayesian inference and similarity detection has been advocated by researchers as a special case of concept formation. Infants appear to be proficient in inferring abstract and structural rules from streams of linguistic sounds produced in their developmental environments, and to generate wider predictions based on those rules.

For example, 9-month-old infants are capable of more quickly and dramatically updating their expectations when repeated syllable strings contain surprising features, such as rare phonemes. In general, preverbal infants appear to be capable of discriminating between grammars on which they have been trained and novel grammars. In looking-time tasks with 7-month-old infants, infants seemed to pay more attention to unfamiliar grammatical structures than to familiar ones, and in a separate study using 3-syllable strings, infants similarly appeared to have generalized expectations based on abstract syllabic structure previously presented, suggesting that they used surface occurrences, or data, in order to infer deeper abstract structure. This was taken to support the “multiple hypotheses [or models]” view by the researchers involved.

Grey parrots
Multiple studies by Irene Pepperberg and her colleagues suggested that Grey parrots (Psittacus erithacus) have some capacity for recognizing numbers or number-like concepts, appearing to understand ordinality and cardinality of numerals. Recent experiments also indicated that, given some language training and capacity for referencing recognized objects, they also have some ability to make inferences about probabilities and hidden object type ratios.

Non-human primates
Experiments found that when reasoning about preferred vs. non-preferred food proportions, capuchin monkeys were able to make inferences about proportions from sequentially sampled data. Rhesus monkeys were similarly capable of using probabilistic and sequentially sampled data to make inferences about rewarding outcomes, and neural activity in the parietal cortex appeared to be involved in the decision-making process when they made inferences. In a series of 7 experiments using a variety of relative frequency differences between banana pellets and carrots, orangutans, chimpanzees and gorillas also appeared to guide their decisions based on the ratios favoring the banana pellets after this was established as their preferred food item.

Reasoning in medicine
Research on reasoning in medicine, or clinical reasoning, usually focuses on cognitive processes and/or decision-making outcomes among physicians and patients. Considerations include assessments of risk, patient preferences, and evidence-based medical knowledge. On a cognitive level, clinical inference relies heavily on interplay between abstraction, abduction, deduction, and induction. Intuitive "theories," or knowledge in medicine, can be understood as prototypes in concept spaces, or alternatively, as semantic networks. Such models serve as a starting point for intuitive generalizations to be made from a small number of cues, resulting in the physician's tradeoff between the "art and science" of medical judgement. This tradeoff was captured in an artificially intelligent (AI) program called MYCIN, which outperformed medical students, but not experienced physicians with extensive practice in symptom recognition. Some researchers argue that despite this, physicians are prone to systematic biases, or cognitive illusions, in their judgment (e.g., satisficing to make premature diagnoses, confirmation bias when diagnoses are suspected a priori).

Communication of patient risk
Statistical literacy and risk judgments have been described as problematic for physician-patient communication. For example, physicians frequently inflate the perceived risk of non-treatment, alter patients' risk perceptions by positively or negatively framing single statistics (e.g., 97% survival rate vs. 3% death rate), and/or fail to sufficiently communicate "reference classes" of probability statements to patients. The reference class is the object of a probability statement: If a psychiatrist says, for example, “this medication can lead to a 30-50% chance of a sexual problem,” it is ambiguous whether this means that 30-50% of patients will develop a sexual problem at some point, or if all patients will have problems in 30-50% of their sexual encounters.

Base rates in clinical judgment
In studies of base rate neglect, the problems given to participants often use base rates of disease prevalence. In these experiments, physicians and non-physicians are similarly susceptible to base rate neglect, or errors in calculating conditional probability. Here is an example from an empirical survey problem given to experienced physicians: Suppose that a hypothetical cancer had a prevalence of 0.3% in the population, and the true positive rate of a screening test was 50% with a false positive rate of 3%. Given a patient with a positive test result, what is the probability that the patient has cancer? When asked this question, physicians with an average of 14 years of experience in medical practice ranged in their answers from 1-99%, with most answers being 47% or 50%. (The correct answer is approximately 5%.) This observation of clinical base rate neglect and conditional probability error has been replicated in multiple empirical studies. Physicians' judgments in similar problems, however, improved substantially when the rates were re-formulated as natural frequencies.
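
The screening problem above can be worked through both with Bayes' Theorem and with natural frequencies. The sketch below uses the prevalence, true positive rate, and false positive rate given in the problem; the function name and the reference population of 10,000 people are assumptions added for illustration.

```python
def posterior_given_positive(prevalence, true_positive_rate, false_positive_rate):
    """P(disease | positive test) from Bayes' Theorem."""
    true_positives = prevalence * true_positive_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Probability format, using the figures from the problem.
p_cancer = posterior_given_positive(0.003, 0.50, 0.03)

# Natural frequency format: imagine 10,000 people (assumed reference population).
population = 10_000
with_cancer = population * 0.003                            # about 30 people
cancer_and_positive = with_cancer * 0.50                    # about 15 of them test positive
healthy_and_positive = (population - with_cancer) * 0.03    # about 299 healthy people test positive
p_from_counts = cancer_and_positive / (cancer_and_positive + healthy_and_positive)

print(f"{p_cancer:.1%}", f"{p_from_counts:.1%}")
```

Both routes give the same answer. In the frequency format the base rate is hard to overlook: roughly 15 of the approximately 314 people who test positive actually have the disease.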