User:RayLei/sandbox

In probability theory, if a random variable X has a binomial distribution with parameters n and p, i.e., X is distributed as the number of "successes" in n independent Bernoulli trials with probability p of success on each trial, then


 * $$P(X\leq x) = P(X<x+1)$$

for any x ∈ {0, 1, 2, ..., n}. If np and n(1 − p) are large (sometimes taken to mean ≥ 5), then the probability above is fairly well approximated by


 * $$P(Y\leq x+1/2)$$

where Y is a normally distributed random variable with the same expected value and the same variance as X, i.e., E(Y) = np and var(Y) = np(1 − p). This addition of 1/2 to x is a continuity correction.
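The approximation can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names (`normal_cdf`, `binom_cdf`, `binom_cdf_normal_approx`) and the parameters n = 20, p = 0.5, x = 8 are illustrative choices, not part of any particular library or of the text above.

```python
from math import comb, erf, sqrt


def normal_cdf(y, mean, var):
    """P(Y <= y) for Y normally distributed with the given mean and variance."""
    return 0.5 * (1.0 + erf((y - mean) / sqrt(2.0 * var)))


def binom_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Bin(n, p), summing the probability mass function."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def binom_cdf_normal_approx(x, n, p, correction=True):
    """Normal approximation to P(X <= x), optionally shifted by 1/2."""
    mean, var = n * p, n * p * (1 - p)
    shift = 0.5 if correction else 0.0
    return normal_cdf(x + shift, mean, var)


if __name__ == "__main__":
    n, p, x = 20, 0.5, 8  # illustrative values; np = n(1 - p) = 10
    print("exact:              ", binom_cdf(x, n, p))
    print("with correction:    ", binom_cdf_normal_approx(x, n, p))
    print("without correction: ", binom_cdf_normal_approx(x, n, p, correction=False))
```

For these illustrative values, the corrected approximation should land noticeably closer to the exact probability than the uncorrected one.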

A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution. For example, if X has a Poisson distribution with expected value λ then the variance of X is also λ, and


 * $$P(X\leq x)=P(X<x+1)\approx P(Y\leq x+1/2)$$

if Y is normally distributed with expectation and variance both λ.
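The Poisson case can be sketched the same way. The helper names below and the values λ = 20 and x = 18 are assumptions chosen purely for illustration.

```python
from math import erf, exp, sqrt


def normal_cdf(y, mean, var):
    """P(Y <= y) for Y normally distributed with the given mean and variance."""
    return 0.5 * (1.0 + erf((y - mean) / sqrt(2.0 * var)))


def poisson_cdf(x, lam):
    """Exact P(X <= x) for X ~ Poisson(lam), built up term by term."""
    term = exp(-lam)  # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k  # P(X = k) from P(X = k - 1)
        total += term
    return total


def poisson_cdf_normal_approx(x, lam, correction=True):
    """Normal approximation with mean and variance both lam."""
    shift = 0.5 if correction else 0.0
    return normal_cdf(x + shift, lam, lam)


if __name__ == "__main__":
    lam, x = 20.0, 18  # illustrative values
    print("exact:              ", poisson_cdf(x, lam))
    print("with correction:    ", poisson_cdf_normal_approx(x, lam))
    print("without correction: ", poisson_cdf_normal_approx(x, lam, correction=False))
```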

Reason
Look at the Bin(10, 0.5) distribution, whose mean and variance are 5 and 2.5, respectively. How well does a N(5, 2.5) distribution approximate the Bin(10, 0.5)?

A continuous distribution, the normal in particular, assigns probability zero to any single point, so it is reasonable to approximate the lumps of probability at the integers by areas under the normal curve. Take P(X = 3), for example. The exact binomial probability is 0.1172.

If we integrate the N(5, 2.5) density from 2 to 3, it is too low everywhere on that interval and the result is too small.

Φ((3-5)/sqrt(2.5)) - Φ((2-5)/sqrt(2.5)) = Φ(-1.265) - Φ(-1.897) = 0.1030 - 0.0289 = 0.0741

On the other hand, if we integrate the N(5, 2.5) density from 3 to 4, it is too high everywhere on that interval and the result is too large.

Φ((4-5)/sqrt(2.5)) - Φ((3-5)/sqrt(2.5)) = Φ(-0.6325) - Φ(-1.265) = 0.2635 - 0.1030 = 0.1605

But if we integrate from 2.5 to 3.5, the result is a much closer approximation.

Φ((3.5-5)/sqrt(2.5)) - Φ((2.5-5)/sqrt(2.5)) = Φ(-0.9487) - Φ(-1.581) = 0.1714 - 0.0569 = 0.1145

More generally, one can integrate the normal density from minus infinity to a + 0.5 to approximate the binomial probability P(X ≤ a), and from a − 0.5 to infinity to approximate P(X ≥ a). For example, with a = 3 the exact binomial calculation gives P(X ≤ 3) = 0.1719 and P(X ≥ 3) = 0.9453, while the normal approximation with continuity correction gives Φ((3.5-5)/sqrt(2.5)) = Φ(-0.9487) = 0.1714 and 1 - Φ((2.5-5)/sqrt(2.5)) = 1 - Φ(-1.581) = 1 - 0.0569 = 0.9431.
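The figures in this section can be reproduced with a short script. This is a sketch using only the standard library; the names `phi`, `normal_area` and `binom_pmf` are illustrative helpers, with Φ computed from the error function.

```python
from math import comb, erf, sqrt

N, P = 10, 0.5                      # the Bin(10, 0.5) example
MEAN, VAR = N * P, N * P * (1 - P)  # 5 and 2.5
SD = sqrt(VAR)


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_area(lo, hi):
    """Area under the N(MEAN, VAR) density between lo and hi."""
    return phi((hi - MEAN) / SD) - phi((lo - MEAN) / SD)


def binom_pmf(k):
    """Exact P(X = k) for X ~ Bin(N, P)."""
    return comb(N, k) * P**k * (1 - P) ** (N - k)


exact_point = binom_pmf(3)
area_2_to_3 = normal_area(2.0, 3.0)      # too small
area_3_to_4 = normal_area(3.0, 4.0)      # too large
area_corrected = normal_area(2.5, 3.5)   # continuity-corrected

exact_lower = sum(binom_pmf(k) for k in range(0, 4))      # P(X <= 3)
exact_upper = sum(binom_pmf(k) for k in range(3, N + 1))  # P(X >= 3)
approx_lower = phi((3.5 - MEAN) / SD)
approx_upper = 1.0 - phi((2.5 - MEAN) / SD)

if __name__ == "__main__":
    print(f"P(X = 3)  exact {exact_point:.4f}  corrected {area_corrected:.4f}")
    print(f"          2 to 3: {area_2_to_3:.4f}  3 to 4: {area_3_to_4:.4f}")
    print(f"P(X <= 3) exact {exact_lower:.4f}  corrected {approx_lower:.4f}")
    print(f"P(X >= 3) exact {exact_upper:.4f}  corrected {approx_upper:.4f}")
```

The printed values should agree with those quoted above to within rounding in the last decimal place, since the text works from tabulated values of Φ.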

Applications
Before statistical software that can evaluate probability distribution functions accurately became readily available, continuity corrections played an important role in the practical application of statistical tests whose test statistic has a discrete distribution; they were especially important for manual calculations. A particular example is the binomial test, which involves the binomial distribution, as in checking whether a coin is fair. Where extreme accuracy is not necessary, computer calculations for some ranges of parameters may still rely on continuity corrections to improve accuracy while retaining simplicity.
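As an illustration of the coin example, the sketch below compares an exact one-sided binomial tail probability with its continuity-corrected normal approximation. The scenario (60 heads in 100 tosses) and the function names are hypothetical, chosen only to show the calculation; a one-sided test is assumed for simplicity.

```python
from math import comb, erf, sqrt


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_upper_p_value(heads, flips, p=0.5):
    """Exact P(X >= heads) for X ~ Bin(flips, p)."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


def approx_upper_p_value(heads, flips, p=0.5):
    """Normal approximation to P(X >= heads) with continuity correction."""
    mean = flips * p
    sd = sqrt(flips * p * (1 - p))
    return 1.0 - phi((heads - 0.5 - mean) / sd)


if __name__ == "__main__":
    heads, flips = 60, 100  # hypothetical observed data
    print("exact one-sided p-value:  ", exact_upper_p_value(heads, flips))
    print("normal approx. (corrected):", approx_upper_p_value(heads, flips))
```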