User:Homer3009/sandbox

There is considerable debate in psychology on the conditions under which people do or do not appreciate base rate information. Researchers in the heuristics-and-biases program have stressed empirical findings showing that people tend to ignore base rates and make inferences that violate certain norms of probabilistic reasoning, such as Bayes’ theorem. The conclusion drawn from this line of research was that human probabilistic thinking is fundamentally flawed and error-prone. Other researchers have emphasized the link between cognitive processes and information formats, arguing that such conclusions are not generally warranted.

Consider again Example 2 from above. The required inference is to estimate the (posterior) probability that a (randomly picked) driver is drunk, given that the breathalyzer test is positive. Formally, this probability can be calculated using Bayes’ theorem, as shown above. However, there are different ways of presenting the relevant information. Consider the following, formally equivalent variant of the problem:


 * 1 out of 1000 drivers are driving drunk. The breathalyzers never fail to detect a truly drunk person. For 50 out of the 999 drivers who are not drunk the breathalyzer falsely displays drunkness. Suppose the policemen then stop a driver at random, and force them to take a breathalyzer test. It indicates that he or she is drunk. We assume you don't know anything else about him or her. How high is the probability he or she really is drunk?

In this case, the relevant numerical information—p(drunk), p(D | drunk), p(D | sober)—is presented in terms of natural frequencies with respect to a certain reference class (see reference class problem). Empirical studies show that people’s inferences correspond more closely to Bayes’ rule when information is presented this way, helping to overcome base-rate neglect in laypeople and experts. Teaching people to translate these kinds of Bayesian reasoning problems into natural frequency formats is more effective than merely teaching them to plug probabilities (or percentages) into Bayes’ theorem. It has also been shown that graphical representations of natural frequencies (e.g., icon arrays) help people to make better inferences.

Why are natural frequency formats helpful? One important reason is that this information format facilitates the required inference because it simplifies the necessary calculations. This can be seen when using an alternative way of computing the required probability p(drunk|D):


 * $$p(drunk| D) = \frac{N(drunk \cap D)}{N(D)} = \frac{1}{51} = 0.0196$$

where N(drunk &cap; D) denotes the number of drivers that are drunk and get a positive breathalyzer result, and N(D) denotes the total number of cases with a positive breathalyzer result. The equivalence of this equation to the above one follows from the axioms of probability theory, according to which N(drunk &cap; D) = p (D | drunk) × p (drunk). Importantly, although this equation is formally equivalent to Bayes’ rule, it is not psychologically equivalent. Using natural frequencies simplifies the inference because the required mathematical operation can be performed on natural numbers, instead of normalized fractions (i.e., probabilities), because it makes the high number of false positive more transparent, and because natural frequencies exhibit a “nested-set structure”.

It is important to note that not any kind of frequency format facilitates Bayesian reasoning. Natural frequencies refer to frequency information that results from natural sampling, which preserves base rate information (e.g., number of drunken drivers when taking a random sample of drivers). This is different from systematic sampling, in which base rates are fixed a priori (e.g., in scientific experiments). In the latter case it is not possible to infer the posterior probability p (drunk | positive test) from comparing the number of drivers who are drunk and test positive compared to the total number of people who get a positive breathalyzer result, because base rate information is not preserved and must be explicitly re-introduced using Bayes’ theorem.