Jury theorem

A jury theorem is a mathematical theorem proving that, under certain assumptions, a decision attained using majority voting in a large group is more likely to be correct than a decision attained by a single expert. It serves as a formal argument for the idea of wisdom of the crowd, for decision of questions of fact by jury trial, and for democracy in general.

The first and most famous jury theorem is Condorcet's jury theorem. It assumes that all voters have independent probabilities to vote for the correct alternative, these probabilities are larger than 1/2, and are the same for all voters. Under these assumptions, the probability that the majority decision is correct is strictly larger when the group is larger; and when the group size tends to infinity, the probability that the majority decision is correct tends to 1.

There are many other jury theorems, relaxing some or all of these assumptions.

Setting
The premise of all jury theorems is that there is an objective truth, which is unknown to the voters. Most theorems focus on binary issues (issues with two possible states), for example, whether a certain defendant is guilty or innocent, whether a certain stock is going to rise or fall, etc. There are $$n$$ voters (or jurors), and their goal is to reveal the truth. Each voter has an opinion about which of the two options is correct. The opinion of each voter is either correct (i.e., equals the true state), or wrong (i.e., differs than the true state). This is in contrast to other settings of voting, in which the opinion of each voter represents his/her subjective preferences and is thus always "correct" for this specific voter. The opinion of a voter can be considered a random variable: for each voter, there is a positive probability that his opinion equals the true state.

The group decision is determined by the majority rule. For example, if a majority of voters says "guilty" then the decision is "guilty", while if a majority says "innocent" then the decision is "innocent". To avoid ties, it is often assumed that the number of voters $$n$$ is odd. Alternatively, if $$n$$ is even, then ties are broken by tossing a fair coin.

Jury theorems are interested in the probability of correctness - the probability that the majority decision coincides with the objective truth. Typical jury theorems make two kinds of claims on this probability:

Claim 1 is often called the non-asymptotic part and claim 2 is often called the asymptotic part of the jury theorem.
 * 1) Growing Reliability: the probability of correctness is larger when the group is larger.
 * 2) Crowd Infallibility: the probability of correctness goes to 1 when the group size goes to infinity.

Obviously, these claims are not always true, but they are true under certain assumptions on the voters. Different jury theorems make different assumptions.

Independence, competence, and uniformity
Condorcet's jury theorem makes the following three assumptions:

The jury theorem of Condorcet says that these three assumptions imply Growing Reliability and Crowd Infallibility.
 * 1) Unconditional Independence: the voters make up their minds independently. In other words, their opinions are independent random variables.
 * 2) Unconditional Competence: the probability that the opinion of a single voter coincides with the objective truth is larger than 1/2 (i.e., the voter is smarter than a random coin-toss).
 * 3) Uniformity: all voters have the same probability of being correct.

Correlated votes: weakening the independence assumption
The opinions of different voters are often correlated, so Unconditional Independence may not hold. In this case, the Growing Reliability claim might fail.

Example
Let $$p$$ be the probability of a juror voting for the correct alternative and $$c$$ be the (second-order) correlation coefficient between any two correct votes. If all higher-order correlation coefficients in the Bahadur representation of the joint probability distribution of votes equal to zero, and $$(p,c)\in\mathcal{B}_n$$ is an admissible pair, then the probability of the jury collectively reaching the correct decision under simple majority is given by:
 * $$P(n,p,c)=I_p\left(\frac{n+1}{2},\frac{n+1}{2}\right)+0.5c(n-1)(0.5-p)\frac{\partial I_p(\frac{n+1}{2},\frac{n+1}{2})}{\partial p},$$

where $$I_p$$ is the regularized incomplete beta function.

Example: Take a jury of three jurors $$(n=3)$$, with individual competence $$p=0.55$$ and second-order correlation $$c=0.4$$. Then $$P(3,0.55,0.4)=0.54505$$. The competence of the jury is lower than the competence of a single juror, which equals to $$0.55$$. Moreover, enlarging the jury by two jurors $$(n=5)$$ decreases the jury competence even further, $$P(5,0.55,0.4)=0.5196194$$. Note that $$p=0.55$$ and $$c=0.4$$ is an admissible pair of parameters. For $$n=5$$ and $$p=0.55$$, the maximum admissible second-order correlation coefficient equals $$\approx 0.43$$.

The above example shows that when the individual competence is low but the correlation is high:

The above result is due to Kaniovski and Zaigraev. They also discuss optimal jury design for homogenous juries with correlated votes.
 * The collective competence under simple majority may fall below that of a single juror;
 * Enlarging the jury may decrease its collective competence.

There are several jury theorems that weaken the Independence assumption in various ways.

Truth-sensitive independence and competence
In binary decision problems, there is often one option that is easier to detect that the other one. For example, it may be easier to detect that a defendant is guilty (as there is clear evidence for guilt) than to detect that he is innocent. In this case, the probability that the opinion of a single voter is correct is represented by two different numbers: probability given that option #1 is correct, and probability given that option #2 is correct. This also implies that opinions of different voters are correlated. This motivates the following relaxations of the above assumptions:


 * 1) Conditional Independence: for each of the two options, the voters' opinions given that this option is the true one are independent random variables.
 * 2) Conditional Competence: for each of the two options, the probability that a single voter's opinion is correct given that this option is true is larger than 1/2.
 * 3) Conditional Uniformity:  for each of the two options, all voters have the same probability of being correct given that this option is true.

Growing Reliability and Crowd Infallibility continue to hold under these weaker assumptions.

One criticism of Conditional Competence is that it depends on the way the decision question is formulated. For example, instead of asking whether the defendant is guilty or innocent, one can ask whether the defendant is guilty of exactly 10 charges (option A), or guilty of another number of charges (0..9 or more than 11). This changes the conditions, and hence, the conditional probability. Moreover, if the state is very specific, then the probability of voting correctly might be below 1/2, so Conditional Competence might not hold.

Effect of an opinion leader
Another cause of correlation between voters is the existence of an opinion leader. Suppose each voter makes an independent decision, but then each voter, with some fixed probability, changes his opinion to match that of the opinion leader. Jury theorems by Boland and Boland, Proschan and Tong shows that, if (and only if) the probability of following the opinion leader is less than 1-1/2p (where p is the competence level of all voters), then Crowd Infallibility holds.

Problem-sensitive independence and competence
In addition to the dependence on the true option, there are many other reasons for which voters' opinions may be correlated. For example: It is possible to weaken the Conditional Independence assumption, and conditionalize on all common causes of the votes (rather than just the state). In other words, the votes are now independent conditioned on the specific decision problem. However, in a specific problem, the Conditional Competence assumption may not be valid. For example, in a specific problem with false evidence, it is likely that most voters will have a wrong opinion. Thus, the two assumptions - conditional independence and conditional competence - are not justifiable simultaneously (under the same conditionalization).
 * Deliberation among voters;
 * Peer pressure;
 * False evidence (e.g. a guilty defendant that excels at pretending to be innocent);
 * External conditions (e.g. poor weather affecting their judgement).
 * Any other common cause of votes

A possible solution is to weaken Conditional Competence as follows. For each voter and each problem x, there is a probability p(x) that the voter's opinion is correct in this specific problem. Since x is a random variable, p(x) is a random variable too. Conditional Competence requires that p(x) > 1/2 with probability 1. The weakened assumption is:


 * Tendency to Competence: for each voter, and for each r>0, the probability that p(x) = 1/2+r is at least as large as the probability that p(x) = 1/2-r.

A jury theorem by Dietrich and Spiekerman says that Conditional Independence, Tendency to Competence, and Conditional Uniformity, together imply Growing Reliability. Note that Crowd Infallibility is not implied. In fact, the probability of correctness tends to a value which is below 1, if and only of Conditional Competence does not hold.

Bounded correlation
A jury theorem by Pivato shows that, if the average covariance between voters becomes small as the population becomes large, then Crowd Infallibility holds (for some voting rule). There are other jury theorems that take into account the degree to which votes may be correlated.

Other solutions
Other ways to cope with voter correlation include causal networks, dependence structures, and interchangeability.

Diverse capabilities: weakening the uniformity assumption
Different voters often have different competence levels, so the Uniformity assumption does not hold. In this case, both Growing Reliability and Crowd Infallibility may not hold. This may happen if new voters have much lower competence than existing voters, so that adding new voters decreases the group's probability of correctness. In some cases, the probability of correctness might converge to 1/2 (- a random decision) rather than to 1.

Stronger competence requirements
Uniformity can be dismissed if the Competence assumption is strengthened. There are several ways to strengthen it:


 * Strong Competence: for each voter i, the probability of correctness pi is at least 1/2+e, where e>0 is fixed for all voters. In other words: the competence is bounded away from a fair coin toss. A jury theorem by Paroush shows that Strong Competence and Conditional Independence together imply Crowd Infallibility (but not Growing Reliability).
 * Average Competence: the average of the individual competence levels of the voters (i.e. the average of their individual probabilities of deciding correctly) is slightly greater than half, or converges to a value above 1/2. Jury theorems by Grofman, Owen and Feld, and Berend and Paroush, show that Average Competence and Conditional Independence together imply Crowd Infallibility (but not Growing Reliability).

Random voter selection
instead of assuming that the voter identity is fixed, one can assume that there is a large pool of potential voters with different competence levels, and the actual voters are selected at random from this pool (as in sortition).

A jury theorem by Ben Yashar and Paroush shows that, under certain conditions, the correctness probability of a jury, or of a subset of it chosen at random, is larger than the correctness probability of a single juror selected at random. A more general jury theorem by Berend and Sapir proves that Growing Reliability holds in this setting: the correctness probability of a random committee increases with the committee size. The theorem holds, under certain conditions, even with correlated votes.

A jury theorem by Owen, Grofman and Feld analyzes a setting where the competence level is random. They show what distribution of individual competence maximizes or minimizes the probability of correctness.

Weighted majority rule
When the competence levels of the voters are known, the simple majority rule may not be the best decision rule. There are various works on identifying the optimal decision rule - the rule maximizing the group correctness probability. Nitzan and Paroush show that, under Unconditional Independence, the optimal decision rule is a weighted majority rule, where the weight of each voter with correctness probability pi is log(pi/(1-pi)), and an alternative is selected iff the sum of weights of its supporters is above some threshold. Grofman and Shapley analyze the effect of interdependencies between voters on the optimal decision rule. Ben-Yashar and Nitzan prove a more general result.

Dietrich generalizes this result to a setting that does not require prior probabilities of the 'correctness' of the two alternative. The only required assumption is Epistemic Monotonicity, which says that, if under certain profile alternative x is selected, and the profile changes such that x becomes more probable, then x is still selected. Dietrich shows that Epistemic Monotonicity implies that the optimal decision rule is weighted majority with a threshold. In the same paper, he generalizes the optimal decision rule to a setting that does not require the input to be a vote for one of the alternatives. It can be, for example, a subjective degree of belief. Moreover, competence parameters do not need to be known. For example, if the inputs are subjective beliefs x1,...,xn, then the optimal decision rule sums log(xi/(1-xi)) and checks whether the sum is above some threshold. Epistemic Monotonicity is not sufficient for computing the threshold itself; the threshold can be computed by assuming expected-utility maximization and prior probabilities.

A general problem with the weighted majority rules is that they require to know the competence levels of the different voters, which is usually hard to compute in an objective way. Baharad, Goldberger, Koppel and Nitzan present an algorithm that solves this problem using statistical machine learning. It requires as input only a list of past votes; it does not need to know whether these votes were correct or not. If the list is sufficiently large, then its probability of correctness converges to 1 even if the individual voters' competence levels are close to 1/2.

More than two options
Often, decision problems involve three or more options. This critical limitation was in fact recognized by Condorcet (see Condorcet's paradox), and in general it is very difficult to reconcile individual decisions between three or more outcomes (see Arrow's theorem).

This limitation may also be overcome by means of a sequence of votes on pairs of alternatives, as is commonly realized via the legislative amendment process. (However, as per Arrow's theorem, this creates a "path dependence" on the exact sequence of pairs of alternatives; e.g., which amendment is proposed first can make a difference in what amendment is ultimately passed, or if the law—with or without amendments—is passed at all.)

With three or more options, Conditional Competence can be generalized as follows:


 * Multioption Conditional Competence: for any two options x and y, if x is correct and y is not, then any voter is more likely to vote for x than for y.

A jury theorem by List and Goodin shows that Multioption Conditional Competence and Conditional Independence together imply Crowd Infallibility. Dietrich and Spiekermann conjecture that they imply Growing Reliability too. Another related jury theorem is by Everaere, Konieczny and Marquis.

When there are more than two options, there are various voting rules that can be used instead of simple majority. The statistic and utilitarian properties of such rules are analyzed e.g. by Pivato.

Indirect majority systems
Condorcet's theorem considers a direct majority system, in which all votes are counted directly towards the final outcome. Many countries use an indirect majority system, in which the voters are divided into groups. The voters in each group decide on an outcome by an internal majority vote; then, the groups decide on the final outcome by a majority vote among them. For example, suppose there are 15 voters. In a direct majority system, a decision is accepted whenever at least 8 votes support it. Suppose now that the voters are grouped into 3 groups of size 5 each. A decision is accepted whenever at least 2 groups support it, and in each group, a decision is accepted whenever at least 3 voters support it. Therefore, a decision may be accepted even if only 6 voters support it.

Boland, Proschan and Tong prove that, when the voters are independent and p>1/2, a direct majority system - as in Condorcet's theorem - always has a higher chance of accepting the correct decision than any indirect majority system.

Berg and Paroush consider multi-tier voting hierarchies, which may have several levels with different decision-making rules in each level. They study the optimal voting structure, and compares the competence against the benefit of time-saving and other expenses.

Goodin and Spiekermann compute the amount by which a small group of experts should be better than the average voters, in order for them to accept better decisions.

Strategic voting
It is well-known that, when there are three or more alternatives, and voters have different preferences, they may engage in strategic voting, for example, vote for the second-best option in order to prevent the worst option from being elected. Surprisingly, strategic voting might occur even with two alternatives and when all voters have the same preference, which is to reveal the truth. For example, suppose the question is whether a defendant is guilty or innocent, and suppose a certain juror thinks the true answer is "guilty". However, he also knows that his vote is effective only if the other votes are tied. But, if other votes are tied, it means that the probability that the defendant is guilty is close to 1/2. Taking this into account, our juror might decide that this probability is not sufficient for deciding "guilty", and thus will vote "innocent". But if all other voters do the same, the wrong answer is derived. In game-theoretic terms, truthful voting might not be a Nash equilibrium. This problem has been termed the swing voter's curse, as it is analogous to the winner's curse in auction theory.

A jury theorem by Peleg and Zamir shows sufficient and necessary conditions for the existence of a Bayesian-Nash equilibrium that satisfies Condorcet's jury theorem. Bozbay, Dietrich and Peters show voting rules that lead to efficient aggregation of the voters' private information even with strategic voting.

In practice, this problem may not be very severe, since most voters care not only about the final outcome, but also about voting correctly by their conscience. Moreover, most voters are not sophisticated enough to vote strategically.

Subjective opinions
The notion of "correctness" may not be meaningful when making policy decisions, which are based on values or preferences, rather than just on facts.

Some defenders of the theorem hold that it is applicable when voting is aimed at determining which policy best promotes the public good, rather than at merely expressing individual preferences. On this reading, what the theorem says is that although each member of the electorate may only have a vague perception of which of two policies is better, majority voting has an amplifying effect. The "group competence level", as represented by the probability that the majority chooses the better alternative, increases towards 1 as the size of the electorate grows assuming that each voter is more often right than wrong.

Several papers show that, under reasonable conditions, large groups are better trackers of the majority preference.

Applicability
The applicability of jury theorems, in particular, Condorcet's Jury Theorem (CJT) to democratic processes is debated, as it can prove majority rule to be a perfect mechanism or a disaster depending on individual competence. Recent studies show that, in a non-homogeneous case, the theorem's thesis does not hold almost surely (unless weighted majority rule is used with stochastic weights that are correlated with epistemic rationality but such that every voter has a minimal weight of one).