Wikipedia:Reference desk/Archives/Mathematics/2021 February 1

= February 1 =

== Winning percentage fallacy ==
Consider a combat tournament with a large number (sufficiently large that small sample size is not a problem) of combatants, in which each match is zero-sum and has a winner (there are no ties). The rules specify that each combatant may choose to bring into the tournament exactly 2 of the following 3 "weapons": a pillow, a thimble, a sword. The sword obviously being superior to the other "weapons", ALL combatants choose a sword as one of their 2 weapons. For the other weapon though, the combatants are split: exactly half choose to bring a pillow, and exactly half choose to bring a thimble.

After the conclusion of the tournament, a statistician analyzes the results of the tournament. The statistician finds that:


 * combatants with a sword had a combined winning percentage of exactly 50% (remember that EVERY combatant brought a sword, so every win by any combatant with a sword meant a loss by another combatant with a sword, resulting in a combined winning percentage of exactly 50%)
 * combatants with a thimble had a combined winning percentage of less than 50%
 * combatants with a pillow had a combined winning percentage of more than 50%

Simple example with round-robin tournament of only 4 combatants to demonstrate the mathematics: Alice and Bob both wield sword and pillow. Carol and Dave both wield sword and thimble. The 6 matches:


 * Alice defeats Bob.
 * Alice defeats Carol.
 * Alice defeats Dave.
 * Bob defeats Carol.
 * Bob defeats Dave.
 * Carol defeats Dave.

Then, standings:


 * Alice (sword and pillow) 3-0
 * Bob (sword and pillow) 2-1
 * Carol (sword and thimble) 1-2
 * Dave (sword and thimble) 0-3

Winning percentages by weapon:


 * sword 6-6 (50%)
 * thimble 1-5 (16.7%)
 * pillow 5-1 (83.3%)
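The per-weapon tally above can be reproduced mechanically. The following is a minimal sketch, assuming each match result is credited to both weapons a combatant carries; the names `equipment`, `matches` and `weapon_records` are illustrative and not part of the original question.

```python
from collections import defaultdict

# Equipment and results copied from the 4-combatant round-robin example.
equipment = {
    "Alice": ("sword", "pillow"),
    "Bob": ("sword", "pillow"),
    "Carol": ("sword", "thimble"),
    "Dave": ("sword", "thimble"),
}
matches = [  # (winner, loser)
    ("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dave"),
    ("Bob", "Carol"), ("Bob", "Dave"), ("Carol", "Dave"),
]

def weapon_records(equipment, matches):
    """Return {weapon: [wins, losses]}, crediting each result to every weapon carried."""
    records = defaultdict(lambda: [0, 0])
    for winner, loser in matches:
        for weapon in equipment[winner]:
            records[weapon][0] += 1
        for weapon in equipment[loser]:
            records[weapon][1] += 1
    return dict(records)

records = weapon_records(equipment, matches)
for weapon, (wins, losses) in sorted(records.items()):
    print(f"{weapon}: {wins}-{losses} ({wins / (wins + losses):.1%})")
```

Because every combatant carries a sword, each match adds one win and one loss to the sword's record, so its percentage is pinned at 50% regardless of how effective the sword actually is.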

Since pillow has the highest winning percentage, even higher than sword, the statistician concludes that a pillow is a better, more effective weapon than a sword.

What logical or statistical fallacy is this statistician making? I checked the list of fallacies and misuse of statistics articles, but those articles list so many different fallacies that I am uncertain.

—SeekingAnswers (reply) 08:59, 1 February 2021 (UTC)


 * Were any combatants harmed in this trial? It is a pity they were not allowed to bring a pen. As to the fallacy, I'd go with false causality. --Lambiam 09:45, 1 February 2021 (UTC)
 * Specifically, the correlation between having a pillow and winning is due to chance rather than cause and effect. On the other hand, causality can't be dismissed completely either; perhaps the people who got a pillow strapped it to their body as a kind of makeshift armor. But I think what this really illustrates is the importance of good experimental design and drawing the correct conclusion from the data. Even if you assume the difference between pillow and thimble is not due to chance, as in the armor scenario, the only conclusion you can draw from the statistic is that people who have a pillow as their second "weapon" are more likely to win than people who have a thimble. In a truly "random trial", every contestant would be given a random selection of two out of the three "weapons". In that case the winning likelihoods would be S+P: 2/3, S+T: 2/3, P+T: 1/6, which has the correct correlation between sword and winning. So the real issue is the bias introduced by the conditions of the experiment, and so perhaps some form of sampling bias fits as well. --RDBury (talk) 12:04, 1 February 2021 (UTC)
 * As an experimental set-up, it is a poor design, but in this hypothetical scenario the people who organized the tournament and decided on the rules of combat may have had no interest whatsoever in statistical studies, and the hapless statistician conducting the study could not have made them change the rules. When circumstances make an experimental design unfeasible or unethical, natural experiments may still allow one to draw statistical inferences, but avoiding the many pitfalls of statistics then requires more care and expertise. For a sad case of the consequences of lack of expertise, see Lucia de B. --Lambiam 13:33, 1 February 2021 (UTC)
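The figures 2/3, 2/3 and 1/6 quoted for the random-assignment scenario can be checked with a short calculation. This is a sketch under assumptions that the thread does not state explicitly: a combatant with a sword always beats one without, two combatants who both have (or both lack) a sword each win half the time, and opponents are drawn from a large population in which the three kits are equally common. The names `win_probability`, `kit_rate` and `weapon_rate` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

WEAPONS = ("sword", "pillow", "thimble")
KITS = list(combinations(WEAPONS, 2))  # the three possible two-weapon kits

def win_probability(kit, opponent_kit):
    """Assumed rule: a sword always beats no sword; otherwise it is a coin flip."""
    has_sword = "sword" in kit
    opp_has_sword = "sword" in opponent_kit
    if has_sword == opp_has_sword:
        return Fraction(1, 2)
    return Fraction(1) if has_sword else Fraction(0)

# Winning likelihood of each kit against an opponent whose kit is uniformly random.
kit_rate = {
    kit: sum(win_probability(kit, opp) for opp in KITS) / len(KITS)
    for kit in KITS
}

# Winning likelihood per weapon, averaged over the two kits that contain it.
weapon_rate = {
    w: sum(rate for kit, rate in kit_rate.items() if w in kit) / 2
    for w in WEAPONS
}

for kit, rate in kit_rate.items():
    print("+".join(kit), rate)
for weapon, rate in weapon_rate.items():
    print(weapon, rate)
```

Under these assumptions the per-weapon rates come out as 2/3 for the sword and 5/12 each for the pillow and the thimble, so the sword ranks first once the equipment varies independently of the combatant.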


 * Yes, the difficulty is that the tournament wasn't intended as an experiment in the first place, and the statistician has no power over the tournament's design or rules. The statistician is just trying to draw conclusions from data for a concluded event that wasn't intended as an experiment to test any specific variable. —SeekingAnswers (reply) 00:25, 2 February 2021 (UTC)


 * The scenario presented is actually not purely theoretical. The scenario is a highly-abstracted version of a more complex and less obvious (that one particular weapon-equivalent is superior) real-life situation that I am facing, in which a colleague is arguing that our organization should adopt the pillow-equivalent in place of the sword-equivalent based on essentially similar/analogous data and reasoning that purportedly shows that the pillow-equivalent is "more successful" than the sword-equivalent. If he is able to convince the rest of our organization to adopt the pillow-equivalent and discard the sword-equivalent, then I fear we will be taking a large loss in the future. I am trying to convince our other colleagues that his reasoning is bad, which would be easier if I could put a name to the fallacy that he is making. —SeekingAnswers (reply) 00:47, 2 February 2021 (UTC)
 * Getting to the truth is one thing; getting management to believe the truth is an entirely different matter. The rules of statistical inference, and even logic, don't seem to apply when you're trying to convince someone that the position they've committed to is incorrect, especially when they think it will make them look bad to their superiors. That's not really a math problem though. --RDBury (talk) 09:01, 2 February 2021 (UTC)


 * TL;DR. I only took a quick glance at this question, but it looks possibly related to the Condorcet paradox in voting theory. Or maybe Simpson's paradox in statistics. 2602:24A:DE47:BB20:50DE:F402:42A6:A17D (talk) 05:08, 3 February 2021 (UTC)


 * I concur with RDB that convincing your colleagues or management is an entirely different issue than technically winning the argument. Having a catchy name for the fallacy, such as "correlation does not imply causation", is hardly going to be helpful IMO. A better approach may be to present your analysis and invite your colleagues to criticize it.
 * As to the technical issue, with rare exceptions, statistical techniques are based on an underlying very abstract mathematical model of (an isolated aspect of) the real world. If that model is not adequate for its purpose, then we get into the realm of "lies, damned lies, and statistics". A key aspect of simple (often too simple) models is the assumption of independence of certain variables. Perhaps poor sword fighters know that they are so clumsy that they cannot even wield a pillow to defend themselves, so their choice of a thimble and their losing the combats may both be the result of their clumsiness, and not of the inferiority of the thimble. Other combatants may be master kung-fu pillow fighters; they simply discard the useless swords and smother their opponents with the pillows, against which thimbles offer no effective defence. With the independence assumption, the simplest models are the linear ones; here we can attempt to explain the scores of the participants as a linear combination of weights assigned to their choices of weapons. If we do this without thinking, we obtain a system of equations that is dependent and unsolvable. We can tentatively conclude (tentatively because the sample size is too low: the one-sided p > 5%) that the pillow is mightier than the thimble, but the ranking of the sword remains unapproachable. What also remains unexamined is a possible synergy between several weapons in a combatant's equipment. --Lambiam 11:51, 3 February 2021 (UTC)
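The "dependent and unsolvable" system can be made concrete for the 4-combatant example. The sketch below is one possible reading of the linear model, not necessarily the exact calculation intended: each combatant's number of wins is modelled as the sum of one weight per weapon carried. The permutation-style p-value at the end is likewise an assumption about how a one-sided p above 5% might arise; it treats the four win totals as exchangeable, which a real round-robin does not strictly satisfy.

```python
from fractions import Fraction
from itertools import combinations

# Columns: sword, pillow, thimble. Rows: Alice, Bob, Carol, Dave.
design = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
]
wins = [3, 2, 1, 0]

def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Dependent: the sword column equals pillow + thimble, so the rank is below 3.
coef_rank = rank(design)
# Unsolvable: appending the scores raises the rank, so no exact solution exists.
aug_rank = rank([row + [w] for row, w in zip(design, wins)])

# Only the pillow-minus-thimble difference is identifiable; a least-squares
# fit of this indicator model reproduces the two group means.
pillow_minus_thimble = Fraction(wins[0] + wins[1], 2) - Fraction(wins[2] + wins[3], 2)

# One-sided permutation p-value (assumed test): the share of ways to hand out
# the two pillows that give the pillow group at least the observed 5 wins.
observed = wins[0] + wins[1]
splits = list(combinations(range(4), 2))
extreme = sum(1 for a, b in splits if wins[a] + wins[b] >= observed)
p_value = Fraction(extreme, len(splits))

print(coef_rank, aug_rank, pillow_minus_thimble, p_value)
```

In this sketch the sword's weight cannot be separated from the other two because every combatant carries one, which matches the remark that the sword's ranking stays out of reach, and the permutation p-value of 1/6 is well above 5%.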