Talk:Base rate fallacy

Incorrect example
The article currently has an example that reads: "A group of policemen have breathalyzers displaying false drunkenness in 5% of the cases. 1/1000 chauffeurs are drunk drivers. Suppose the policemen then stop a driver and force him to take a breathalyzer test. How high is the probability that a chauffeur who fails the test (assuming you don't know anything else about him or her) is driving drunk?"

This example cannot be solved, because it contains too little information. --Kaba3 (talk) 11:36, 14 September 2013 (UTC)
 * You are correct: thanks for pointing this out. Although it's unoriginal and used by a lot of books, maybe the article should lead with Tversky and Kahneman's blue/green taxi example? MartinPoulter (talk) 13:26, 14 September 2013 (UTC)


 * Example #1 of Drunk Drivers is still wrong. The example states the breathalyzer only gives false positives 5% of the time and yet the example claims the probability a positive test is accurate is only 2%. This is patently absurd.  RonCram (talk) 15:23, 4 June 2017 (UTC)
 * No, it's not absurd. Calculate all parameters passionately. Or maybe you are trying to say that the conditions of this example are ridiculous? Derek Di My Mind (talk) 18:29, 16 August 2023 (UTC)
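
For anyone who wants to check the roughly 2% figure, here is a minimal sketch in Python. The function name `posterior` and the detection rates are illustrative assumptions: the quoted example gives only the 5% false positive rate and the 1-in-1000 base rate, so the chance that a drunk driver fails the test has to be assumed (which is the missing information noted at the top of this thread).

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive test), computed with Bayes' theorem."""
    true_pos = true_positive_rate * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


# From the quoted example: 1 in 1000 drivers is drunk, 5% false positives.
# ASSUMPTION (not in the quoted example): every drunk driver fails the test.
p_perfect = posterior(prior=0.001, true_positive_rate=1.0, false_positive_rate=0.05)

# ASSUMPTION: a hypothetical weaker device that catches only half of drunk drivers.
p_weaker = posterior(prior=0.001, true_positive_rate=0.5, false_positive_rate=0.05)

print(f"{p_perfect:.1%}")  # about 2%
print(f"{p_weaker:.1%}")   # lower still
```

Under the first assumption the result comes out near 2%, and any lower detection rate only reduces it, so the answer depends on a number the example never states.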

Something else
For instance, appealing to vivid examples should not be taken knowledge of prior probabilities.

I don't understand what this statement means? CSTAR 19:44, 17 Jun 2004 (UTC)


 * Yea, that makes no sense. I'll fix it in a minute. --Taak 21:36, 17 Jun 2004 (UTC)

I thought that base rate neglect involved ignoring priors in a Bayesian context. For example, if a medical test with a 5% false positive rate is applied to a population whose background incidence of the disease is, say, 1%, the great majority of positive test results will be wrong: they will indicate disease where there is none. It's not that the medical test is irrelevant, though; rather, its significance can be overweighted.
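
A rough worked version of the figures in the comment above, assuming (this is not stated in the comment) that the test detects every true case, i.e. a sensitivity of 1. Here D stands for having the disease and + for a positive result:

$$P(\neg D \mid +) = \frac{P(+ \mid \neg D)\,P(\neg D)}{P(+ \mid \neg D)\,P(\neg D) + P(+ \mid D)\,P(D)} = \frac{0.05 \cdot 0.99}{0.05 \cdot 0.99 + 1 \cdot 0.01} = \frac{0.0495}{0.0595} \approx 0.83$$

Under that assumption about 83% of positive results would be false; a lower sensitivity would push the share higher.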

I agree with the previous poster. This article seems totally bogus after reading other explanations of the base rate fallacy at http://www.schneier.com/blog/archives/2006/07/terrorists_data.html and http://citeseer.ist.psu.edu/axelsson00baserate.html. Somebody should really correct this article. Jbl26 21:28, 11 July 2007 (UTC)

Fallacies in example
"In some experiments, students were asked to estimate the Grade Point Averages of hypothetical students. When given relevant statistics about GPA distribution, students tended to ignore them if given descriptive information about the particular student, even if the new descriptive information did not seem to have anything to do with school performance."

"This finding has been used to argue that interviews are an unnecessary part of the college admissions process because empirical evidence shows that interviewers are unable to pick successful candidates better than basic statistics."

This is fallacious in itself:

Students ignore statistics in favour of descriptive information. College interviews give a form of descriptive information. Therefore, college interviewers will ignore statistics in favour of descriptive information.

The descriptive information given to students in the experiment was irrelevant. College interviews give a form of descriptive information. Therefore, college interviews are irrelevant when considering suitability of an academic candidate.

The first fallacy assumes that students and college interviewers possess the same lack of skill in judging relevance of information.

The second fallacy assumes: A. Statistics are always relevant, and descriptive information is always irrelevant. AND/OR B. Statistics are a better judge of suitability for academic placement than descriptive information. -- Sasuke Sarutobi 21:26, 12 December 2006 (UTC)

It looks like the text has been changed, but I still can't make heads or tails of it; maybe it needs a concrete example? Reyemile (talk) 08:01, 2 January 2008 (UTC)

Here's another example (starts after 11 min): http://www.ted.com/talks/peter_donnelly_shows_how_stats_fool_juries.html 89.142.146.23 (talk) 23:21, 21 August 2009 (UTC)

Definition
Hi IP editor 219.25.218.88. Upon thinking about it, it seems the definition should be expanded, because the typical error actually excludes both. I added a mathematical formalism section to try to clarify the matter. WavePart (talk) 18:18, 10 June 2010 (UTC)

A cognitive bias?
Where are the reliable sources identifying Base rate fallacy as a cognitive bias? More neutral to describe it as an error. Also, I agree that the article needed simplifying, that the initial example was too complicated and that the article had too much unreferenced content, but it seems this edit took out a lot of content which is actually quite useful, eg that it's also known as Base rate neglect and that the error involves confusing two different failure rates. Is there a happy medium that simplifies without taking so much away? Cheers, MartinPoulter (talk) 15:13, 2 September 2013 (UTC)
 * I assume you are referring to me. My guiding philosophy is "We should not write so that it is possible for the reader to understand us, but so that it is impossible for him to misunderstand us." The reason I deleted or dramatically altered much of the content was that it seemed perplexing to me. I believe Wikipedia articles should be written in a way that someone with only an elementary knowledge of math and statistics could read them (though there are always exceptions, as there are to all rules, of course). And to answer your question, yes, I do believe there is "a happy medium that simplifies without taking so much away". --Spannerjam 16:31, 2 September 2013 (UTC)
 * Very sorry to get your name wrong, Spannerjam: too many of my tabs open at once and I got confused. It looks like you're still working on the article, so I'll withhold further comment until you've done more. The section on mathematical formalism was correct, and probability is relatively simple maths, so I think it belongs in the article, but a long way down since it's the least accessible content, and it needs to be made consonant with the given example. "Error in thinking" is much preferable, so thanks for that. Cheers, MartinPoulter (talk) 19:29, 2 September 2013 (UTC)

Removed text
The following was removed from the article in this edit, and I've copied it here to give other editors a chance to make it more accessible and consistent. I think the article needs a section towards the end on Bayes' theorem, but it needs to be explained adequately for those who don't understand the formalism.

In a city of 1 million inhabitants let there be 100 terrorists and 999,900 non-terrorists. To simplify the example, it is assumed that all people present in the city are inhabitants. Thus, the base rate probability of a randomly selected inhabitant of the city being a terrorist is 0.0001, and the base rate probability of that same inhabitant being a non-terrorist is 0.9999. In an attempt to catch the terrorists, the city installs an alarm system with a surveillance camera and automatic facial recognition software. The software has two failure rates of 1%:


 * 1) The false negative rate: If the camera scans a terrorist, a bell will ring 99% of the time, and it will fail to ring 1% of the time.
 * 2) The false positive rate: If the camera scans a non-terrorist, a bell will not ring 99% of the time, but it will ring 1% of the time.

Suppose now that an inhabitant triggers the alarm. What is the chance that the person is a terrorist? In other words, what is P(T | B), the probability that a terrorist has been detected given the ringing of the bell? Someone making the 'base rate fallacy' would infer that there is a 99% chance that the detected person is a terrorist. Although the inference seems to make sense, it is actually bad reasoning, and a calculation below will show that the chances they are a terrorist are actually near 1%, not near 99%.

The fallacy arises from confusing the natures of two different failure rates. The 'number of non-bells per 100 terrorists' and the 'number of non-terrorists per 100 bells' are unrelated quantities. One does not necessarily equal the other, and they don't even have to be almost equal. To show this, consider what happens if an identical alarm system were set up in a second city with no terrorists at all. As in the first city, the alarm sounds for 1 out of every 100 non-terrorist inhabitants detected, but unlike in the first city, the alarm never sounds for a terrorist. Therefore 100% of all occasions of the alarm sounding are for non-terrorists, but a false negative rate cannot even be calculated. The 'number of non-terrorists per 100 bells' in that city is 100, yet P(T | B) = 0%. There is zero chance that a terrorist has been detected given the ringing of the bell.

Imagine that the city's entire population of one million people pass in front of the camera. About 99 of the 100 terrorists will trigger the alarm — and so will about 9,999 of the 999,900 non-terrorists. Therefore, about 10,098 people will trigger the alarm, among which about 99 will be terrorists. So, the probability that a person triggering the alarm actually is a terrorist, is only about 99 in 10,098, which is less than 1%, and very, very far below our initial guess of 99%.
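
The head count in the paragraph above can be reproduced with a short sketch. The variable names are illustrative; the figures are the ones given in the removed text, and whole-number arithmetic is used so the counts stay exact.

```python
from fractions import Fraction

population = 1_000_000
terrorists = 100
non_terrorists = population - terrorists  # 999,900

# Both failure rates are 1%, as stated in the removed text.
true_alarms = terrorists * 99 // 100       # terrorists who ring the bell
false_alarms = non_terrorists * 1 // 100   # non-terrorists who ring the bell
total_alarms = true_alarms + false_alarms

# Share of alarms that are real terrorists, kept as an exact fraction.
p_terrorist_given_bell = Fraction(true_alarms, total_alarms)

print(true_alarms, false_alarms, total_alarms)
print(p_terrorist_given_bell, float(p_terrorist_given_bell))
```

Counting people rather than multiplying probabilities gives the same 99 in 10,098 as the text, which reduces to 1 in 102.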

The base rate fallacy is so misleading in this example because there are many more non-terrorists than terrorists. If, instead, the city had about as many terrorists as non-terrorists, and the false-positive rate and the false-negative rate were nearly equal, then the probability of misidentification would be about the same as the false-positive rate of the device. These special conditions hold sometimes: as for instance, about half the women undergoing a pregnancy test are actually pregnant, and some pregnancy tests give about the same rates of false positives and of false negatives. In this case, the rate of false positives per positive test will be nearly equal to the rate of false positives per nonpregnant woman. This is why it is very easy to fall into this fallacy: by coincidence it gives the correct answer in many common situations.
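
The balanced case described above can be checked directly. As a sketch, write e for the common error rate (a symbol introduced here, not used in the removed text) and take P(T) = P(¬T) = 1/2:

$$P(\neg T \mid B) = \frac{P(B \mid \neg T)\,P(\neg T)}{P(B \mid \neg T)\,P(\neg T) + P(B \mid T)\,P(T)} = \frac{e \cdot \tfrac{1}{2}}{e \cdot \tfrac{1}{2} + (1 - e) \cdot \tfrac{1}{2}} = e$$

The chance that a given alarm is a misidentification therefore equals the false-positive rate only when the two groups are the same size and the two error rates match.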

In many real-world situations, though, particularly problems like detecting criminals in a largely law-abiding population, the small proportion of targets in the large population makes the base rate fallacy very applicable. Even a very low false-positive rate will result in so many false alarms as to make such a system useless in practice.


 * (Section heading) Mathematical formalism

In the above example, where P(T | B) means the probability of T given B, the base rate fallacy is committed by assuming that P(terrorist | bell) equals P(bell | terrorist) and then adding the premise that P(bell | terrorist)=99%. Now, is it true that P(terrorist | bell) equals P(bell | terrorist)?


 * $$P(\mathrm{terrorist}\mid\mathrm{bell}) \,\overset{\underset{\mathrm{?}}{}}{=}\, P(\mathrm{bell}\mid\mathrm{terrorist}).$$

That is not true. Instead, the correct calculation uses Bayes' theorem to take into account the prior probability of any randomly selected inhabitant in the city being a terrorist and the total probability of the bell ringing:



$$\begin{align} P(\mathrm{terrorist}\mid\mathrm{bell}) &= \frac{P(\mathrm{bell} \mid \mathrm{terrorist}) \, P(\mathrm{terrorist})}{P(\mathrm{bell})} \\[10pt] &= \frac{P(\mathrm{bell} \mid \mathrm{terrorist}) \times P(\mathrm{terrorist})}{P(\mathrm{bell} \mid \mathrm{terrorist}) \times P(\mathrm{terrorist}) + P(\mathrm{bell} \mid \mathrm{nonterrorist}) \times P(\mathrm{nonterrorist})} \\[10pt] &= \frac{0.99 \cdot (100/1{,}000{,}000)}{\frac{0.99 \cdot 100}{1{,}000{,}000} + \frac{0.01 \cdot 999{,}900}{1{,}000{,}000}} \\[10pt] &= 1/102 \approx 1\% \end{align}$$

Thus, in the example the probability was overestimated by more than 100 times due to the failure to take into account the fact that there are about 10000 times more nonterrorists than terrorists (a.k.a. failure to take into account the 'prior probability' of being a terrorist).

(end of copied text) MartinPoulter (talk) 13:31, 7 September 2013 (UTC)

Visualization
I think including tree diagrams like the ones in this article could help with making the examples more intuitive and easy to understand. Anka.213 (talk) 12:02, 12 April 2017 (UTC)

Proposed merger from False positive paradox
The false positive paradox is a common example of the base rate fallacy and probably doesn't deserve its own article. However, the text of the other article does a better job of explaining the idea in plain language (as opposed to just throwing math at you) than this one, and thus should not be deleted entirely. Jode32 (talk) 02:02, 16 July 2017 (UTC)
 * ✅ Klbrain (talk) 11:28, 4 November 2018 (UTC)

Not A Formal Fallacy -- and starting an article with an obviously false claim is a Bad Idea(tm.)
This base rate fallacy is not a "formal fallacy" as claimed in the opening paragraph. Utter nonsense. From the click-through: "a formal fallacy (also called deductive fallacy) is a pattern of reasoning rendered invalid by a flaw in its logical structure that can neatly be expressed in a standard logic system, for example propositional logic." It is a plausibly claimed empirical observation. By calling it plausibly claimed, I am saying that I have observed somewhat the same thing. It's a "Ya see that sort of thing pretty often..." kind of truth. It's cocktail-party social science, like the Peter Principle and the Dunning-Kruger Effect. Great fun, and sometimes useful.

Its opposite, the Iron Law of Phlegmatism, which I have just concocted, is equally true: some people are so impressed with the hum-drum ordinary that they can't open their eyes to see something new. Here I have shown an example of Somebody-Or-Other's Principle, which states that a Really Big Truth is one the opposite of which is equally true. FWIW. David Lloyd-Jones (talk) 07:45, 5 December 2017 (UTC)


 * There is no reason why you should not write a better intro para.  I have difficulty following this stuff, but my principal objection to the intro para as it stands is that it doesn't describe the Base rate fallacy.   At least, not in terms that the interested generalist can follow.   A good wiki entry para should answer (1) "What is it?" and (2) "Why should I care?", say I.  If this one also starts out with an assertion which - according to you (and who should doubt it?) - is plain wrong, and if you know what the base rate fallacy is, please share?   Please?   Success Charles01 (talk) 17:51, 9 February 2018 (UTC)
 * Isn't this article confusing Base Rate Error with Base Rate Fallacy/Bias? The first is something that can be calculated; it is not a cognitive bias, but rather a statistical factor one needs to be aware of when evaluating the results of a test in a population. The second is a cognitive bias where one either 1) dismisses a positive result as a false positive (i.e., assumes it is false) since the actual incidence rate in the population (or a previously tested population) is low relative to the false positive rate, or 2) dismisses the possibility of a false positive result (i.e., assumes it is true) because the actual incidence rate in the population (or previously tested population) is high relative to the false positive rate. In other words, Base Rate Fallacy is when one has a bias toward the rate of incidence in a population or previously experienced population when evaluating the Base Rate Error rather than taking the normal, more objective, precautions to compensate for Base Rate Error. Base Rate Fallacy is also a factor in Confirmation Biases (a feedback loop where the base rate fallacy can falsely feed into reinforcing the perception of the incidence rate). Additionally, False Positive Paradox was merged into this article, but it is an example of Base Rate Error, not Base Rate Fallacy. -- al-Shimoni (talk) 07:09, 16 December 2019 (UTC)

Improve the lede?
I agree that the lede is not readily accessible to general readers. Base rate is not clearly defined, and no examples are provided to illustrate why base rate neglect is such an important phenomenon. I'll work on a revision and will post it here for comment. Regutten (talk) 21:38, 1 January 2019 (UTC)

Cancer screening
Understanding the effects of base rates is critical for properly interpreting the results from cancer screenings. Unfortunately, very few people understand that in the context of the general screening of asymptomatic patients, most of those who receive a positive (indicative of cancer) screening outcome do not have cancer or would not die from the cancer if they had not been screened. I would like to add a section to the page that discusses base rate neglect in the context of the interpretation of cancer screening results. Regutten (talk) 21:38, 1 January 2019 (UTC)

Tversky/Kahneman taxi problem?
Given its prominence in psychology discussions of base rate neglect, I'm surprised that the taxi problem is not discussed anywhere on this page. Some discussion of the problem could be added to the "Findings in psychology" section, or the problem could be added as the first of the examples in the "Examples" section. Thoughts? Regutten (talk) 21:38, 1 January 2019 (UTC)
 * WP has some coverage here that might be cross-referenced: https://en.wikipedia.org/wiki/Representativeness_heuristic#The_taxicab_problem DKEdwards (talk) 03:36, 11 January 2023 (UTC)

"Incidence" vs "Prevalence"
Throughout this article, I found "Incidence" to be defined as "the proportion of diseased people in the population", while in Epidemiology this definition actually pertains to "Prevalence". Incidence would be the proportion of people getting the condition in a condition-free population over a certain timespan.

I think this is confusing for a reader and should be corrected.
 * ✅ Thank you, I have retitled the relevant subsections. Coolclawcat (talk) 03:50, 19 January 2024 (UTC)

Politically contentious examples
Is it really necessary to use examples relating to Covid, drink driving, surveillance etc. that arouse strong political feelings, to illustrate a philosophical concept? Subspace345 (talk) 21:12, 21 August 2023 (UTC)
 * First of all, the base rate fallacy is more than just a philosophical concept. As the article mentions, people have been imprisoned because of it. Second, the reason there are examples of medical testing and law enforcement is that those areas are where the fallacy often appears in real life, and where it is most impactful when it does.
 * Have you ever sat through a boring lecture on some topic where you were thinking something like "Why is this even relevant? How would I ever use this knowledge in real life"? Well, we have an important topic, and nowadays Wikipedia articles are one of the most common ways to learn about a topic whether we like it or not. By giving engaging examples that are relevant to people's lives, people can absorb the statistical concepts and connect them to the practical situations they might be used in.
 * If you have some other example in mind that would get this across better with lower risk of polarization, please feel free to be bold and add it yourself. Coolclawcat (talk) 03:50, 19 January 2024 (UTC)