
Scholarpedia Bell's theorem -- SHORT VERSION!

Bell's theorem asserts that if certain predictions of quantum theory are correct then our world is non-local. "Non-local" here means that there exist interactions between events that are too far apart in space and too close together in time for the events to be connected even by signals moving at the speed of light. This theorem was proved in 1964 by John Stewart Bell and has been in recent decades the subject of extensive analysis, discussion, and development by both physicists and philosophers of science. The relevant predictions of quantum theory were first convincingly confirmed by the experiment of Aspect et al. in 1982; they have been even more convincingly reconfirmed many times since. In light of Bell's theorem, the experiments thus establish that our world is non-local. This conclusion is very surprising, since non-locality is normally taken to be prohibited by the theory of relativity.

Historical background
John Bell's interest in non-locality was triggered by his analysis of the problem of hidden variables in quantum theory and in particular by his learning about the de Broglie–Bohm "pilot-wave" theory (aka "Bohmian mechanics"). Bell wrote that David "Bohm's 1952 papers on quantum mechanics were for me a revelation. The elimination of indeterminism was very striking. But more important, it seemed to me, was the elimination of any need for a vague division of the world into 'system' on the one hand, and 'apparatus' or 'observer' on the other."

In particular, learning about Bohm's "hidden variables" theory helped Bell recognize the invalidity of the various "no hidden variables" theorems (by John von Neumann and others) which had been taken almost universally by physicists as conclusively establishing something like Niels Bohr's Copenhagen interpretation of quantum theory. Bohm's pilot-wave theory was a clean counterexample, i.e., a proof-by-example that the theorems somehow didn't rule out what they had been taken to rule out.

This led Bell to carefully scrutinize those theorems. The result of this work was his paper "On the problem of hidden variables in quantum mechanics". This paper was written prior to the 1964 paper in which Bell's theorem was first presented, but (due to an editorial accident) remained unpublished until 1966. The 1966 paper shows that the "no hidden variables" theorems of von Neumann and others all made unwarranted (and in some cases unacknowledged) assumptions. (All these theorems involved an assumption which today is usually called non-contextuality.) In examining how Bohm's theory managed to violate these assumptions, Bell noticed that it did have one "curious feature": the theory was manifestly non-local. As Bell explained, "in this theory an explicit causal mechanism exists whereby the disposition of one piece of apparatus affects the results obtained with a distant piece." This naturally raised the question of whether the non-locality was eliminable, or somehow essential:

"... to the present writer's knowledge, there is no proof that any hidden variable account of quantum mechanics must have this extraordinary character. It would therefore be interesting, perhaps, to pursue some further 'impossibility proofs,' replacing the arbitrary axioms objected to above by some condition of locality, or of separability of distant systems."

Because of the editorial accident mentioned above, Bell had answered his own question before the paper in which it appeared was even published. The answer is contained in what we will here call "Bell's inequality theorem", which states precisely that "any hidden variable account of quantum mechanics must have this extraordinary character", i.e., must violate a locality constraint that is motivated by relativity.

But the more general result we here call "Bell's theorem" is much more than this: combined with the Einstein–Podolsky–Rosen (EPR) argument "from locality to deterministic hidden variables", the inequality theorem establishes a contradiction between locality as such (and not merely some special class of local theories) and the (now experimentally confirmed) predictions of quantum theory.

The EPR argument for pre-existing values
It is a general principle of orthodox formulations of quantum theory that measurements of physical quantities do not simply reveal pre-existing or pre-determined values, the way they do in classical theories. Instead, the particular outcome of the measurement somehow "emerges" from the dynamical interaction of the system being measured with the measuring device, so that even someone who was omniscient about the states of the system and device prior to the interaction couldn't have predicted in advance which outcome would be realized.

In a celebrated 1935 paper, however, Albert Einstein, Boris Podolsky, and Nathan Rosen pointed out that, in situations involving specially-prepared pairs of particles, this orthodox principle conflicted with locality. Unfortunately, the role of locality in the discussion is often misunderstood — or missed entirely. One thus often hears that the EPR paper is essentially just an expression of (in particular) Einstein's philosophical discontent with quantum theory. This is quite wrong: what the paper actually contains is an argument showing that, if non-local influences are forbidden, and if certain quantum theoretical predictions are correct, then the measurements (whose outcomes are correlated) must be revealing pre-existing values. It is on this basis — in particular, on the assumption of locality — that EPR claimed to have established the "incompleteness" of orthodox quantum theory (which denies the existence of any such pre-existing values).

In the 1935 EPR paper, the argument was formulated in terms of position and momentum (which are observables having continuous spectra). The argument was later reformulated (by Bohm) in terms of spin. This "EPRB" version is conceptually simpler and also more closely related to the recent experiments designed to test Bell's inequality.

The EPRB argument is as follows: assume that one has prepared a pair of spin-1/2 particles in the entangled spin singlet state

$$\frac1{\sqrt2}\,\big(\left\vert\uparrow\right\rangle\otimes\left\vert\downarrow\right\rangle-\left\vert\downarrow\right\rangle\otimes\left\vert\uparrow\right\rangle\big),$$

with $$\left\vert\uparrow\right\rangle$$, $$\left\vert\downarrow\right\rangle$$ an orthonormal basis of the spin state space. A measurement of the spin of one of the particles along a given axis yields either the result "up" (i.e., "spin up") or the result "down" (i.e., "spin down"). Moreover, if one measures the spin of both particles along some given axis (say, the $$z$$-axis), then quantum theory predicts that the results obtained will be perfectly anti-correlated, i.e., they will be opposite ("up" for one particle and "down" for the other). If such measurements are carried out simultaneously on two spatially-separated particles (technically, if the measurements are performed at space-like separation) then locality requires that any disturbance triggered by the measurement on one side cannot influence the result of the measurement on the other side. But without any such interaction, the only way to ensure the perfect anti-correlation between the results on the two sides is to have each particle carry a pre-existing determinate value (appropriately anti-correlated with the value carried by the other particle) for spin along the $$z$$-axis. Any element of locally-confined indeterminism would at least sometimes spoil the predicted perfect anti-correlation between the outcomes.

Now, obviously there is nothing special here about the $$z$$-axis, so what was just established for the $$z$$-axis applies to any axis. Thus it applies to all axes at once. That is, assuming (a) locality and (b) that the perfect anti-correlations predicted by quantum theory actually obtain, it follows that each particle must carry a pre-existing value for spin along all possible axes, with the values for the two particles in a given pair — which, of course, needn't be the same from one particle pair to another — perfectly anti-correlated, axis by axis. (A mathematical formulation of this argument is presented at the end of Section 5.)

Bell's inequality theorem
Pre-existing values are thus the only local way to account for perfect anti-correlations in the outcomes of spin measurements along identical axes. But a simple argument shows that pre-existing values are incompatible with the predictions of quantum theory (for a pair of particles prepared in the singlet state) when we allow also for the possibility of spin measurements along different axes.

According to quantum theory, when spin measurements along different axes are performed on the pair of particles in the singlet state, the probability that the two results will be opposite (one "up" and one "down") is equal to $$(1+\cos\,\theta)/2$$, where $$\theta\in[0,\pi]$$ is the angle between the chosen (oriented) axes. It follows from the simple mathematical result below, Bell's inequality theorem, that this is not compatible with the pre-existing values we have been discussing.
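
The quoted probability can be checked numerically with a minimal sketch. It assumes, purely for illustration, that both measurement axes lie in the x–z plane, so that the spin eigenstates can be taken real; the function names `eigenstate`, `joint_probability`, and `prob_opposite` are hypothetical helpers and not part of any library.

```python
import math

def eigenstate(theta, outcome):
    """Real spin eigenstate for an axis at angle `theta` from z in the x-z plane.

    outcome = +1 gives "up" along the axis, outcome = -1 gives "down".
    (Restricting to one plane is an illustrative assumption.)
    """
    if outcome == +1:
        return (math.cos(theta / 2), math.sin(theta / 2))
    return (-math.sin(theta / 2), math.cos(theta / 2))

def joint_probability(theta1, s1, theta2, s2):
    """Born-rule probability of outcomes (s1, s2) for the singlet state."""
    u = eigenstate(theta1, s1)
    v = eigenstate(theta2, s2)
    # Amplitude <u (x) v | singlet>, with singlet = (|up,down> - |down,up>)/sqrt(2).
    amplitude = (u[0] * v[1] - u[1] * v[0]) / math.sqrt(2)
    return amplitude ** 2

def prob_opposite(theta1, theta2):
    """Probability that the two results are opposite (one up, one down)."""
    return (joint_probability(theta1, +1, theta2, -1)
            + joint_probability(theta1, -1, theta2, +1))

# Compare with (1 + cos(theta)) / 2 for a few relative angles.
for theta in (0.0, math.pi / 3, 2 * math.pi / 3, math.pi):
    predicted = (1 + math.cos(theta)) / 2
    assert abs(prob_opposite(0.0, theta) - predicted) < 1e-12
```

For identical axes the sketch returns probability 1 of opposite results, which is the perfect anti-correlation used in the EPR argument.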

To see this, suppose that the spin measurements for both particles do simply reveal pre-existing values. Denote by $$Z^i_\alpha$$, $$i=1,2$$, the pre-determined outcome of the spin measurement for particle number $$i$$ along axis $$\alpha$$. These values will evidently vary from one run of the experiment (i.e., one particle pair) to the next, and can thus be treated mathematically as random variables (each one assuming only two possible values, say 1 for "up" and -1 for "down").

Now consider three particular axes $$\mathbf a$$, $$\mathbf b$$, and $$\mathbf c$$ that lie in a single plane and are such that the angle between any two of them is equal to $$2\pi/3$$. Then, since $$\big(1+\cos(2\pi/3)\big)/2=1/4$$, agreement with quantum theory will require that $$P(Z^1_\alpha\ne Z^2_\beta)=1/4$$ if $$\alpha\ne\beta$$ are among $$\mathbf a$$, $$\mathbf b$$, $$\mathbf c$$ (where $$P$$ stands for probability). Agreement with quantum theory also requires opposite outcomes for identical measurement axes, i.e., $$Z^1_\alpha=-Z^2_\alpha$$, for all $$\alpha$$. But it turns out that it is impossible to satisfy both requirements:

Bell's inequality theorem. Consider random variables $$Z^i_\alpha$$, $$i=1,2$$, $$\alpha=\mathbf a, \mathbf b, \mathbf c$$, taking only the values $$\pm1$$. If these random variables are perfectly anti-correlated, i.e., if $$Z^1_\alpha=-Z^2_\alpha$$, for all $$\alpha$$, then:


 * $$(1)\quad P(Z^1_{\mathbf a}\ne Z^2_{\mathbf b})+P(Z^1_{\mathbf b}\ne Z^2_{\mathbf c})+P(Z^1_{\mathbf c}\ne Z^2_{\mathbf a})\ge1.$$

Proof. Since (at any given point of the sample space) the three $$\pm1$$-valued random variables $$Z^1_\alpha$$ can't all disagree, the union of the events $$\{Z^1_{\mathbf a}=Z^1_{\mathbf b}\}$$, $$\{Z^1_{\mathbf b}=Z^1_{\mathbf c}\}$$, $$\{Z^1_{\mathbf c}=Z^1_{\mathbf a}\}$$ is equal to the entire sample space. Therefore the sum of their probabilities must be greater than or equal to 1:


 * $$P(Z^1_{\mathbf a}=Z^1_{\mathbf b})+P(Z^1_{\mathbf b}=Z^1_{\mathbf c})+P(Z^1_{\mathbf c}=Z^1_{\mathbf a})\ge1.$$

But since $$Z^1_\beta = -Z^2_\beta$$, we have that $$P(Z^1_\alpha=Z^1_\beta)=P(Z^1_\alpha\ne Z^2_\beta)$$. Inequality (1) follows immediately.

Each of the three terms on the left hand side of (1) must equal $$1/4$$ in order to reproduce the quantum predictions. But, since $$1/4+1/4+1/4=3/4<1$$, the full set of quantum predictions cannot be matched. This establishes the incompatibility between the quantum predictions and the existence of pre-existing values.
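
The counting behind inequality (1) can be made concrete with a short sketch that enumerates all eight possible assignments of pre-existing values. The helper name `disagreement_count` is hypothetical; the only inputs are the perfect anti-correlation $$Z^2_\alpha=-Z^1_\alpha$$ and the quantum value $$1/4$$ quoted above.

```python
from fractions import Fraction
from itertools import product

def disagreement_count(z1):
    """Number of the pairs (a,b), (b,c), (c,a) for which Z1 != Z2.

    z1 = (Z1_a, Z1_b, Z1_c); perfect anti-correlation fixes Z2 = -Z1.
    """
    z2 = tuple(-v for v in z1)
    pairs = [(0, 1), (1, 2), (2, 0)]
    return sum(1 for i, j in pairs if z1[i] != z2[j])

# Every one of the 8 assignments contributes at least one disagreement,
# so any probability mixture of them has a left-hand side of (1) >= 1.
counts = {z1: disagreement_count(z1) for z1 in product((+1, -1), repeat=3)}
min_count = min(counts.values())

# Quantum prediction for the three axes at mutual angle 2*pi/3.
quantum_sum = 3 * Fraction(1, 4)

assert min_count >= 1
assert quantum_sum < 1
```

Since each single assignment already yields at least one disagreement among the three pairs, averaging over any distribution of assignments cannot bring the sum below 1, whereas the quantum value is 3/4.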

We note that Bell's original paper considered for this purpose, instead of the disagreement probability $$P(Z^1_\alpha\ne Z^2_\beta)$$, the correlation $$C(\alpha,\beta)$$, defined as the expected value of the product $$Z^1_\alpha Z^2_\beta$$:


 * $$C(\alpha,\beta)=E(Z^1_\alpha Z^2_\beta)=P(Z^1_\alpha Z^2_\beta=1)\,-\,P(Z^1_\alpha Z^2_\beta=-1)=P(Z^1_\alpha=Z^2_\beta)\,-\,P(Z^1_\alpha\ne Z^2_\beta)=1\,-\,2P(Z^1_\alpha\ne Z^2_\beta).$$

Bell's original inequality (under the same assumptions as for Bell's inequality theorem above) is:


 * $$\vert C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)\vert\le 1+C(\mathbf b,\mathbf c).$$

Let us see how this inequality is related to inequality (1). Rewriting inequality (1) in terms of the correlations $$C(\alpha,\beta)$$, we obtain:


 * $$\quad C(\mathbf a,\mathbf b)+C(\mathbf b,\mathbf c)+C(\mathbf c,\mathbf a)\le1.$$

Since (because of the perfect anti-correlations) $$C(\alpha,\beta)=C(\beta,\alpha)$$, this yields that
 * $$(2)\quad C(\mathbf a,\mathbf b)+C(\mathbf a,\mathbf c)+C(\mathbf b,\mathbf c)\le1.$$

Bell's original inequality is equivalent to the conjunction of two inequalities without absolute value: one of them is obtained from (2) by changing the signs of $$C(\mathbf a,\mathbf c)$$ and $$C(\mathbf b,\mathbf c)$$. (This inequality follows, as (2) does, from Bell's inequality theorem above if we replace $$Z^i_{\mathbf c}$$ with $$-Z^i_{\mathbf c}$$.) The other inequality is obtained from (2) by changing the signs of $$C(\mathbf a,\mathbf b)$$ and $$C(\mathbf b,\mathbf c)$$. (This inequality follows from Bell's inequality theorem above by replacing $$Z^i_{\mathbf b}$$ with $$-Z^i_{\mathbf b}$$.)
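
As a hedged illustration, Bell's original inequality can be checked in the same way. The sketch below evaluates it on each of the eight assignments of pre-existing values (the inequality for averages then follows because the absolute value of an average is at most the average of the absolute values). It also evaluates the quantum prediction $$C(\alpha,\beta)=-\cos\theta$$, which follows from the disagreement probability $$(1+\cos\,\theta)/2$$ quoted earlier. The coplanar axes at angles 0, $$\pi/3$$, and $$2\pi/3$$ are an illustrative choice made here, not one taken from the text; the function names are hypothetical.

```python
import math
from itertools import product

def bell_gap(z):
    """1 + C(b,c) - |C(a,b) - C(a,c)| for one assignment z = (Z1_a, Z1_b, Z1_c)."""
    za, zb, zc = z
    # With Z2 = -Z1, the single-run correlation is C(x, y) = -Z1_x * Z1_y.
    c_ab, c_ac, c_bc = -za * zb, -za * zc, -zb * zc
    return 1 + c_bc - abs(c_ab - c_ac)

# The gap is never negative for pre-existing values.
min_gap = min(bell_gap(z) for z in product((+1, -1), repeat=3))

def quantum_correlation(theta_x, theta_y):
    """Singlet-state correlation for coplanar axes at the given angles."""
    return -math.cos(theta_x - theta_y)

# Illustrative (assumed) axes: a, b, c at 0, 60 and 120 degrees in one plane.
a, b, c = 0.0, math.pi / 3, 2 * math.pi / 3
quantum_gap = (1 + quantum_correlation(b, c)
               - abs(quantum_correlation(a, b) - quantum_correlation(a, c)))

assert min_gap >= 0
assert quantum_gap < 0
```

For these axes the quantum correlations give a left-hand side of 1 and a right-hand side of 1/2, so the inequality fails by 1/2.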

Bell's theorem
Bell's theorem states that the predictions of quantum theory (for measurements of spin on particles prepared in the singlet state) cannot be accounted for by any local theory. The proof of Bell's theorem is obtained by combining the EPR argument (from locality and certain quantum predictions to pre-existing values) and Bell's inequality theorem (from pre-existing values to an inequality incompatible with other quantum predictions).

Here is how Bell himself recapitulated the two-part argument:

"Let us summarize once again the logic that leads to the impasse. The EPRB correlations are such that the result of the experiment on one side immediately foretells that on the other, whenever the analyzers happen to be parallel. If we do not accept the intervention on one side as a causal influence on the other, we seem obliged to admit that the results on both sides are determined in advance anyway, independently of the intervention on the other side, by signals from the source and by the local magnet setting. But this has implications for non-parallel settings which conflict with those of quantum mechanics. So we cannot dismiss intervention on one side as a causal influence on the other."

Already at the time Bell wrote this, there was a tendency for critics to miss the crucial role of the EPR argument here. The conclusion is not just that some special class of local theories (namely, those which explain the measurement outcomes in terms of pre-existing values) is incompatible with the predictions of quantum theory (which is what follows from Bell's inequality theorem alone), but that local theories as such (whether deterministic or not, whether positing hidden variables or not, etc.) are incompatible with the predictions of quantum theory. This confusion has persisted in more recent decades, so perhaps it is worth emphasizing the point by (again) quoting from Bell's pointed footnote from the same 1980 paper quoted just above: "My own first paper on this subject ... starts with a summary of the EPR argument from locality to deterministic hidden variables. But the commentators have almost universally reported that it begins with deterministic hidden variables."

The CHSH–Bell inequality: Bell's theorem without perfect correlations
Perhaps motivated by this widespread and persistent misunderstanding concerning his 1964 paper, Bell wrote many subsequent papers in which he explained and elaborated upon his very interesting result from a variety of angles. After 1975 Bell sometimes presented his result using a new strategy that does not rely on perfect (anti-)correlations or on the EPR argument. The new strategy has some advantages: perfect correlations cannot be demonstrated empirically, and one could also imagine the possibility that quantum theory might be replaced with a new theory that predicts some small deviation from the perfect correlations. So it is desirable to have a version of Bell's theorem that "depends continuously" on the correlations. The new strategy also sheds some light on the meaning of locality.

The idea is to write down a mathematically precise formulation of a consequence of locality in the context of an experiment in which measurements are performed on two systems which have previously interacted — say, systems that have been produced by a common source — but which are now spatially separated. (The EPR scenario considered above is of course an example of such an experiment.) Which of the several possible measurements are actually performed on each system will be determined by (control) parameters — $$\alpha_1$$ and $$\alpha_2$$ — which should be thought of as being randomly and freely chosen by the experimenters, just before the measurements. The measurements (and the choices of the control parameters) are assumed to be space-like separated. Once $$\alpha_1$$ and $$\alpha_2$$ are chosen, the experiment is performed, yielding (say, real-valued) outcomes $$A_1$$ and $$A_2$$ for the measurements on the two systems. While the values of $$A_1$$ and $$A_2$$ may vary from one run of the experiment to another even for the same choice of parameters, we assume that, for a fixed preparation procedure on the two systems, these outcomes exhibit statistical regularities. More precisely, we assume these are governed by probability distributions $$P_{\alpha_1,\alpha_2}(A_1,A_2)$$ depending of course on the experiments performed, and in particular on $$\alpha_1$$ and $$\alpha_2$$.

Notice that no assumption of pre-determined outcomes is being invoked here: part (or all) of the randomness of $$A_1$$, $$A_2$$ can arise during the process of measurement. By contrast, recall that in the above proof of Bell's inequality theorem using the random variables $$Z^i_\alpha$$, the randomness was entirely located at the source, or at least occurred prior to the measurements. Moreover, in that context it was meaningful to talk about the joint probability distribution of $$(Z^i_\alpha,Z^i_\beta)$$ with $$\alpha\ne\beta$$ (i.e., the joint probability distribution for outcomes of different measurements on the same system), while here a joint probability distribution of that type is not meaningful.

Let us now see how a mathematically precise necessary condition for locality can be formulated. First of all, one should realize that locality does not imply the independence $$P_{\alpha_1,\alpha_2}(A_1,A_2)=P_{\alpha_1,\alpha_2}(A_1)P_{\alpha_1,\alpha_2}(A_2)$$ of the outcomes $$A_1$$, $$A_2$$. Indeed, it is perfectly natural to expect that the previous interaction between the systems 1 and 2 could produce dependence relations between the outcomes. However, if locality is assumed, then it must be the case that any additional randomness that might affect system 1 after it separates from system 2 must be independent of any additional randomness that might affect system 2 after it separates from system 1. More precisely, locality requires that some set of data $$\lambda$$ — made available to both systems, say, by a common source — must fully account for the dependence between $$A_1$$ and $$A_2$$; in other words, the randomness that generates $$A_1$$ out of the parameter $$\alpha_1$$ and the data codified by $$\lambda$$ must be independent of the randomness that generates $$A_2$$ out of the parameter $$\alpha_2$$ and $$\lambda$$. Since $$\lambda$$ can vary from one run of the experiment to the other, it should be modeled as a random variable.

Let us re-state these ideas mathematically: $$\lambda$$ is a random variable conditioning upon which yields a decomposition


 * $$(3)\quad P_{\alpha_1,\alpha_2}(A_1,A_2)=\int_\Lambda P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)\,\mathrm dP(\lambda),$$

into conditional probabilities obeying a factorizability condition of the form:


 * $$(4)\quad P_{\alpha_1,\alpha_2}(A_1,A_2|\lambda)=P_{\alpha_1}(A_1|\lambda)P_{\alpha_2}(A_2|\lambda).$$

The probability distribution $$P$$ of $$\lambda$$ should not be allowed to depend on $$(\alpha_1,\alpha_2)$$; this is the mathematical meaning of the assumption, noted above, that the control parameters $$\alpha_1$$, $$\alpha_2$$ are "randomly and freely chosen by the experimenters". One might imagine here that the experimenter on each side makes a free-will choice (just before the measurement) about how to set his apparatus, that is independent of the data codified by $$\lambda$$ (which existed before the choices were made). One needn't worry, however, about whether experimenters have "genuine free will" or about what that exactly means. In a real experiment, the parameters $$\alpha_1$$ and $$\alpha_2$$ would typically be chosen by some random or pseudo-random number generator (say, a computer) that is independent of any other physical processes that might be relevant for the outcomes, and hence independent of $$\lambda$$ — unless, that is, there exists some incredible conspiracy of nature (the kind of conspiracy that would make any kind of scientific inquiry impossible). We will thus call the assumption that the probability distribution of $$\lambda$$ is independent of $$(\alpha_1,\alpha_2)$$ the "no conspiracy" condition.

Note that the "no conspiracy" condition doesn't follow from locality: even if we assume that the choices of $$\alpha_1$$ and $$\alpha_2$$ are made at space-like separation from the physical processes creating the value of $$\lambda$$, it is still possible in principle that the supposedly random process determining $$\alpha_1$$ and $$\alpha_2$$ is in fact dependent, via some local influences from the more distant past, on whatever is going on in the process that creates $$\lambda$$. The "no conspiracy" assumption, then, is strictly speaking just that — an additional assumption (beyond locality) on which the derivation of Bell-type inequalities rests. That said, we stress that this assumption is necessarily always made whenever one does any empirical science; in practice, one assesses the applicability of the assumption to a given experiment by examining the care with which the experimental design precludes any non-conspiratorial dependencies between the preparation of the systems and the settings of instruments.

The precise mathematical setup for formulas (3) and (4) is the following: one considers a probability space $$(\Lambda,P)$$ and, with each $$\lambda\in\Lambda$$ and each choice of the parameters $$\alpha_1$$, $$\alpha_2$$, one associates a probability measure $$P_{\alpha_1,\alpha_2}(\cdot|\lambda)$$ on the set of possible values for the pair $$(A_1,A_2)$$. Formula (4) says that, for each $$\lambda\in\Lambda$$, the probability measure $$P_{\alpha_1,\alpha_2}(\cdot|\lambda)$$ factorizes as the product of a probability measure $$P_{\alpha_1}(\cdot|\lambda)$$ (the marginal of $$A_1$$ given $$\lambda$$) that depends only on $$\alpha_1$$ and a probability measure $$P_{\alpha_2}(\cdot|\lambda)$$ (the marginal of $$A_2$$ given $$\lambda$$) that depends only on $$\alpha_2$$. The probability distribution (3) of $$(A_1,A_2)$$ that is observed in the experiment (and for which quantum theory makes predictions) is obtained from $$P_{\alpha_1,\alpha_2}(\cdot|\lambda)$$ by averaging (i.e., integrating) over $$\lambda$$ with respect to the probability measure of the space $$(\Lambda,P)$$. As in Section 3, we define the correlation $$C(\alpha_1,\alpha_2)$$ as the expected value of the product $$A_1A_2$$ for a given choice of $$\alpha_1$$, $$\alpha_2$$:


 * $$C(\alpha_1,\alpha_2)=E_{\alpha_1,\alpha_2}(A_1A_2)=\int_\Lambda E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)\,\mathrm dP(\lambda),$$

where $$E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)$$ is the expected value of the product $$A_1A_2$$ with respect to the probability measure $$P_{\alpha_1,\alpha_2}(\cdot|\lambda)$$.

Now it is easy to prove the CHSH inequality (after John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt). This inequality is also known in the literature as the CHSH–Bell inequality or simply "Bell's inequality". In this article we will call it the "CHSH–Bell inequality" in order to distinguish it from the inequalities of Section 3 which are used in the versions of Bell's theorem that require the assumption of certain perfect (anti-)correlations.

Theorem. Suppose that the possible values for $$A_1$$ and $$A_2$$ are $$\pm1$$. Under the mathematical setup described above, assuming the factorizability condition (4), the following inequality holds:


 * $$|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|\le2,$$

for any choice of parameters $$\mathbf a$$, $$\mathbf b$$, $$\mathbf c$$, $$\mathbf a'$$.

Proof. It follows from (4) that $$E_{\alpha_1,\alpha_2}(A_1A_2|\lambda)=E_{\alpha_1}(A_1|\lambda)E_{\alpha_2}(A_2|\lambda)$$, for all $$\lambda$$, $$\alpha_1$$, $$\alpha_2$$. Thus:


 * $$|C(\mathbf a,\mathbf b)-C(\mathbf a,\mathbf c)|+|C(\mathbf a',\mathbf b)+C(\mathbf a',\mathbf c)|$$
 * $$\le\int_\Lambda\Big[\big|E_{\mathbf a}(A_1|\lambda)\big|\,\big(\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|\big)\,+\,\big|E_{\mathbf a'}(A_1|\lambda)\big|\,\big(\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\big)\Big]\,\mathrm dP(\lambda)$$
 * $$\le\int_\Lambda\Big[\big|E_{\mathbf b}(A_2|\lambda)-E_{\mathbf c}(A_2|\lambda)\big|\,+\,\big|E_{\mathbf b}(A_2|\lambda)+E_{\mathbf c}(A_2|\lambda)\big|\Big]\,\mathrm dP(\lambda),$$

where the second inequality follows from the observation that $$|E_\alpha(A_1|\lambda)|\le1$$. The conclusion now follows directly from the following elementary lemma:

Lemma. For real numbers $$x,y\in[-1,1]$$, we have that $$|x-y|+|x+y|\le2$$.

Proof. Squaring $$|x-y|+|x+y|$$ we obtain $$2x^2+2y^2+2|x^2-y^2|$$, which is either equal to $$4x^2$$ or to $$4y^2$$; in either case, it is less than or equal to 4.
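
The theorem can be sanity-checked numerically with a minimal sketch, under the simplifying assumption (made here only for illustration) that $$\Lambda$$ is finite. A factorizable model is then specified by a weight for each value of $$\lambda$$ together with the conditional expectations $$E_{\alpha_1}(A_1|\lambda)$$ and $$E_{\alpha_2}(A_2|\lambda)$$, each in $$[-1,1]$$. The names `chsh_value` and `random_model` are hypothetical.

```python
import random

def chsh_value(weights, e1, e2):
    """Left-hand side of the CHSH-Bell inequality for a finite factorizable model.

    weights[k] : probability of the k-th value of lambda
    e1[k][s]   : E_s(A_1 | lambda_k) for s in ("a", "a'")
    e2[k][s]   : E_s(A_2 | lambda_k) for s in ("b", "c")
    """
    def corr(s1, s2):
        # By (4), the conditional expectation of A_1 * A_2 factorizes.
        return sum(w * x[s1] * y[s2] for w, x, y in zip(weights, e1, e2))
    return (abs(corr("a", "b") - corr("a", "c"))
            + abs(corr("a'", "b") + corr("a'", "c")))

def random_model(rng, n_lambda=5):
    """Draw a random factorizable model with `n_lambda` values of lambda."""
    raw = [rng.random() for _ in range(n_lambda)]
    total = sum(raw)
    weights = [r / total for r in raw]
    e1 = [{s: rng.uniform(-1, 1) for s in ("a", "a'")} for _ in range(n_lambda)]
    e2 = [{s: rng.uniform(-1, 1) for s in ("b", "c")} for _ in range(n_lambda)]
    return weights, e1, e2

rng = random.Random(0)
largest = max(chsh_value(*random_model(rng)) for _ in range(10_000))
assert largest <= 2 + 1e-9

# A deterministic model (single lambda, outcomes fixed) saturates the bound.
saturating = chsh_value([1.0], [{"a": 1, "a'": 1}], [{"b": 1, "c": -1}])
assert saturating == 2
```

Random sampling of this kind is no substitute for the proof; it only illustrates that factorizable models stay within the bound and that the bound can be reached.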

For the experiment considered in Section 2 (spin measurements on a pair of particles in the singlet state), quantum theory predicts $$C(\alpha,\beta)=-\alpha\cdot\beta$$ (where the dot denotes the Euclidean inner product and the oriented axes $$\alpha$$, $$\beta$$ are identified with their corresponding unit vectors). For this experiment, the CHSH–Bell inequality is maximally violated by the quantum predictions if $$\mathbf b$$ and $$\mathbf c$$ are mutually orthogonal, $$\mathbf a'$$ bisects $$\mathbf b$$ and $$\mathbf c$$, and $$\mathbf a$$ bisects $$\mathbf b$$ and the opposite axis $$-\mathbf c$$. In that case, the left hand side is equal to $$2\sqrt2$$. We remark also that Bell's original inequality is obtained from the CHSH–Bell inequality by setting $$\mathbf a'=\mathbf b$$ and using $$C(\mathbf b,\mathbf b)=-1$$.
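
The value $$2\sqrt2$$ can be reproduced with a short sketch that plugs the axes just described into $$C(\alpha,\beta)=-\alpha\cdot\beta$$. The concrete coordinates (with $$\mathbf b$$ and $$\mathbf c$$ along the x and y directions) are an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math

def quantum_correlation(u, v):
    """Singlet-state prediction C(u, v) = -u . v for unit vectors u, v."""
    return -sum(x * y for x, y in zip(u, v))

def normalize(v):
    """Rescale a nonzero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

# Assumed coordinates: b and c orthogonal, a' bisecting b and c,
# a bisecting b and -c.
b = (1.0, 0.0, 0.0)
c = (0.0, 1.0, 0.0)
a_prime = normalize((1.0, 1.0, 0.0))
a = normalize((1.0, -1.0, 0.0))

chsh = (abs(quantum_correlation(a, b) - quantum_correlation(a, c))
        + abs(quantum_correlation(a_prime, b) + quantum_correlation(a_prime, c)))

assert abs(chsh - 2 * math.sqrt(2)) < 1e-12
assert chsh > 2
```

Each absolute value contributes $$\sqrt2$$, so the left hand side exceeds the bound 2 obeyed by every factorizable model.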

We have thus established again the incompatibility between locality and certain predictions of quantum theory: we have proven that the CHSH–Bell inequality, which is violated by the quantum predictions, follows from the assumption of locality (and the "no conspiracy" condition).

Let us now take advantage of the mathematical formulation of (a consequence of) locality presented above — the factorizability condition (4) — in order to formulate mathematically the version of Bell's theorem presented in Section 4. Since Bell's inequality theorem has already been formulated mathematically, it remains for us to do so for the EPR argument as well. The mathematical statement (which we will prove in a moment) corresponding to the EPR argument is the following: assuming (4) and the perfect anti-correlations $$P_{\alpha,\alpha}(A_1\ne A_2)=1$$, there exist random variables $$Z^i_\alpha$$ on the probability space $$(\Lambda,P)$$ such that:


 * $$(5)\quad P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)\;\stackrel{(4)}=\;P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1,$$

for $$i=1,2$$ and all $$\lambda$$, $$\alpha_1$$, and $$\alpha_2$$.

Notice that (using integration over $$\lambda$$) equality (5) implies that, for all $$\alpha_1$$, $$\alpha_2$$, the probability distribution of the pair of random variables $$(Z^1_{\alpha_1},Z^2_{\alpha_2})$$ is equal to the (unconditional) probability distribution (3) of the pair of outcomes $$(A_1,A_2)$$ (the probability distribution observed in the experiment, for which quantum theory makes predictions). In particular, we have $$P_{\alpha_1,\alpha_2}(A_1\ne A_2)=P(Z^1_{\alpha_1}\ne Z^2_{\alpha_2})$$. The random variables $$Z^i_\alpha$$ are precisely the ingredients necessary for the proof of Bell's inequality theorem and hence we obtain, as just announced, a mathematical formulation of the version of Bell's theorem presented in Section 4.

Here is the proof of the mathematical statement corresponding to the EPR argument: assume (4) and the perfect anti-correlations. It follows from $$P_{\alpha,\alpha}(A_1\ne A_2)=1$$ that $$P_{\alpha,\alpha}(A_1\ne A_2|\lambda)=1$$ holds for all $$\lambda\in\Lambda$$.

(Note: more precisely, what follows from $$P_{\alpha,\alpha}(A_1\ne A_2)=1$$ is that $$P_{\alpha,\alpha}(A_1\ne A_2|\lambda)=1$$ holds for all $$\lambda$$ in a subset $$G_\alpha$$ of $$\Lambda$$ having probability equal to 1. Then, strictly speaking, it cannot be proven that (5) holds for all $$\lambda$$ in $$\Lambda$$, but merely that (5) holds for all $$\lambda$$ in the set $$G_{\alpha_i}$$ (a set of probability 1). Nevertheless, since sets of probability zero are irrelevant for integration, it does follow from this that the probability distribution of the pair of random variables $$(Z^1_{\alpha_1},Z^2_{\alpha_2})$$ is equal to the (unconditional) probability distribution (3) of the pair of outcomes $$(A_1,A_2)$$. Note, however, that for the latter argument the "no conspiracy" assumption is crucial: namely, in a "conspiratorial" model, the probability measure $$P$$ on $$\Lambda$$ depends on $$\alpha_1$$ and $$\alpha_2$$. Denoting this probability measure by $$P_{\alpha_1,\alpha_2}$$, we have $$P_{\alpha,\alpha}(G_\alpha)=1$$. However, the integral in (3) should be taken with respect to the probability measure $$P_{\alpha_1,\alpha_2}$$ and it may not be true that $$P_{\alpha_1,\alpha_2}(G_{\alpha_1}\cap G_{\alpha_2})=1$$. In other words, the "bad" set (namely, the complement of $$G_{\alpha_1}\cap G_{\alpha_2}$$), consisting of those $$\lambda$$ for which the pair $$(Z^1_{\alpha_1},Z^2_{\alpha_2})$$ is not even well-defined, is not an irrelevant set of probability zero with respect to the relevant probability measure $$P_{\alpha_1,\alpha_2}$$. We note also that Bell's inequality theorem requires that the random variables $$Z^i_\alpha$$ all be defined on the same probability space (endowed, of course, with one probability measure). So, without the "no conspiracy" assumption, we wouldn't have the right ingredients for Bell's inequality theorem. The importance of the "no conspiracy" assumption for the EPR argument will be discussed again in Subsection 10.3.)

Returning to the proof: when $$\alpha_1=\alpha_2=\alpha$$, for each $$\lambda\in\Lambda$$, the outcomes $$A_1$$ and $$A_2$$ given $$\lambda$$ (whose joint probability distribution is $$P_{\alpha,\alpha}(\cdot|\lambda)$$) are at the same time independent (by (4)) and perfectly anti-correlated. An elementary lemma from probability theory shows that this can happen only if they are not really random, i.e., if they are constant. The constant may depend upon $$\alpha$$ and $$\lambda$$, and thus there are functions $$f_i$$ such that $$P_{\alpha,\alpha}\big(A_i=f_i(\alpha,\lambda)|\lambda\big)=1$$. Define the random variables $$Z^i_\alpha$$ by setting $$Z^i_\alpha(\lambda)=f_i(\alpha,\lambda)$$. In order to conclude the proof, observe that condition (4) implies:


 * $$P_{\alpha_1,\alpha_2}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=P_{\alpha_i,\alpha_i}\big(A_i=Z^i_{\alpha_i}(\lambda)|\lambda\big)=1.$$
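
For $$\pm1$$-valued outcomes, the elementary lemma invoked in this proof can be spelled out in a few lines. What follows is a sketch in which the shorthand $$p$$ and $$q$$ is introduced only for illustration. Fix $$\alpha$$ and $$\lambda$$, and write $$p=P_\alpha(A_1=1|\lambda)$$ and $$q=P_\alpha(A_2=1|\lambda)$$. By the factorizability condition (4), the probability that the two outcomes agree is:

 * $$P_{\alpha,\alpha}(A_1=A_2|\lambda)=pq+(1-p)(1-q).$$

Perfect anti-correlation requires this probability to vanish. Since both terms are non-negative, we must have $$pq=0$$ and $$(1-p)(1-q)=0$$. If $$p=0$$ then the second equation forces $$q=1$$, and if $$q=0$$ it forces $$p=1$$. Hence $$(p,q)$$ is either $$(0,1)$$ or $$(1,0)$$: given $$\lambda$$, each outcome is fixed with probability 1, which is the determinism asserted by the lemma.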

Experiments
Bell's theorem brings out the existence of a contradiction between the empirical predictions of quantum theory and the assumption of locality. Since locality has been widely taken to be an implication of relativity theory, one thus has some grounds for wondering if the relevant predictions of quantum theory are correct. This question can only be addressed through experiment.

The first really convincing experimental tests of the relevant quantum predictions were performed in 1981–1982 by Aspect et al. These experiments involved measuring the polarizations of pairs of photons emitted (in a state of total angular momentum zero analogous to the singlet state mentioned previously) during the decay from an excited state of calcium. Correlations between the outcomes of the two polarization measurements were monitored as the axes along which the polarizations were being measured were changed. Results consistent with the quantum predictions were observed and a Bell-type inequality was violated with high statistical significance. A subsequent experiment demonstrated that the quantum predictions continued to hold even when the apparatus settings (i.e., the axes along which the incoming photons' polarizations were measured) were not fixed until the last possible moment, after the photons had already been emitted by the source. (Physically rotating a piece of measurement apparatus is a practical impossibility on the ten-nanosecond timescale involved in a photon's traversal of the several meters between the calcium source and a detector. Aspect et al. instead used an ingenious device that shunted each incoming photon, effectively randomly for the purpose at hand, to one of two polarization measurement devices of fixed orientation.)
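
To make concrete what it means for a Bell-type inequality to be violated, here is a minimal numerical sketch using the CHSH form of the inequality. It is not a model of the actual experiment. It assumes the idealized quantum correlation function $$E(a,b)=\cos 2(a-b)$$ for polarization measurements on a maximally entangled photon pair, and a commonly used set of analyzer angles (0°, 45°, 22.5°, 67.5°). All function and variable names are hypothetical and chosen only for illustration.

```python
import itertools
import math


def quantum_correlation(a_deg, b_deg):
    """Idealized polarization correlation E(a, b) = cos 2(a - b).

    Assumption: perfect detectors and a maximally entangled photon pair.
    """
    return math.cos(2.0 * math.radians(a_deg - b_deg))


def chsh(E, a, a_prime, b, b_prime):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b_prime) + E(a_prime, b) + E(a_prime, b_prime)


def max_local_deterministic_chsh():
    """Largest |S| over all pre-assigned outcomes (+1 or -1) for the
    four possible measurements, i.e. over local deterministic models."""
    best = 0
    for A, A_prime, B, B_prime in itertools.product((-1, 1), repeat=4):
        s = A * B - A * B_prime + A_prime * B + A_prime * B_prime
        best = max(best, abs(s))
    return best


# Analyzer angles in degrees (an assumed, standard choice).
S_quantum = chsh(quantum_correlation, 0.0, 45.0, 22.5, 67.5)
S_local_max = max_local_deterministic_chsh()

print("idealized quantum CHSH value:", S_quantum)
print("maximum for pre-assigned outcomes:", S_local_max)
```

Under these assumptions the pre-assigned-outcome models never exceed 2, while the idealized quantum value is $$2\sqrt{2}\approx 2.83$$. Real experiments report somewhat smaller values because of imperfect sources and detectors.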

The innovation of Aspect et al. represented an important first step toward closing the so-called locality loophole. Recall that the locality assumption used in, for example, the derivation of the CHSH–Bell inequality, requires that the (conditional) probability distribution for possible outcomes of one of the measurements be independent of the choice of apparatus setting for the other measurement. But this is a consequence of the relativistic notion of locality only if each apparatus setting is made too late for it to affect (via influences propagating at the speed of light) the distant measurement. Fixing the final apparatus settings only after the photons (moving at the speed of light) have been emitted ensured this. However, the 1982 experiment of Aspect et al. involved, on each side of the apparatus, a periodic switching between the two possible settings (albeit with incommensurate frequencies on the two sides); one could thus conceivably still worry that the photon source and/or the nearby measurement were somehow "anticipating" the final distant apparatus setting, thus violating the formal locality assumption but without violating relativity's supposed prohibition on superluminal influences.

The locality loophole was closed much more convincingly in a later experiment, performed in Innsbruck by Weihs et al. in 1998. The basic experimental procedure was analogous to that of Aspect et al., but the Innsbruck group used entangled pairs of photons created in parametric down-conversion (instead of the decay of calcium atoms, as in Aspect et al.) and high-speed electro-optic modulators to switch between two polarization measurement settings on each side. Importantly, the modulators could be controlled on a nanosecond timescale, allowing the choice between the two possible apparatus settings on each side to be made (by independent, spatially separated quantum random number generators) only well after the window for possible light-speed influence on the distant measurement had passed. Leaving aside the possibility of a cosmic conspiracy, this setup thus guarantees that the formal locality assumption can be violated only if some information about the measurement on one side is somehow being broadcast, faster than light, to the photon and/or measuring device on the opposite side and influencing the results there. In light of Bell's theorem the experiment thus quite conclusively establishes the relativistic non-locality of the actual world.

Other experiments (Tittel et al.) have shown that the quantum predictions remain accurate even when the particles are allowed to separate by several kilometers before their polarizations are measured. Also, in experiments designed to close the so-called detection loophole (Rowe et al. and Matsukevich et al.), Bell-type inequalities were violated even when a much higher fraction of all emitted pairs was successfully detected.

Another interesting recent experiment (Salart et al.) relates experimental violations of a Bell-type inequality to the motion of the Earth in order to put lower limits on the speed (relative to some hypothetical preferred frame) of any involved superluminal influences.