
Posterior distribution of the binomial parameter
The problem considered by Bayes in Proposition 9 of his essay is the posterior distribution for the parameter of the binomial distribution.

Consider $$n$$ Bernoulli trials. If the success probability is equal to $$a$$, then the conditional probability of observing $$k$$ successes is given by the (discrete) binomial probability mass function
 * $$ p(k|a) = \binom n k a^k (1-a)^{n-k} .$$
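As a quick numerical check, this mass function can be evaluated directly with SciPy; a minimal sketch, where the values of $$n$$, $$a$$, and $$k$$ below are arbitrary illustrations rather than anything from the text:

```python
import numpy as np
from scipy.stats import binom

n, a = 10, 0.3   # assumed number of trials and success probability
k = 4            # assumed number of observed successes

# p(k|a) = C(n,k) a^k (1-a)^(n-k)
print(binom.pmf(k, n, a))                       # probability of exactly k successes
print(binom.pmf(np.arange(n + 1), n, a).sum())  # the pmf sums to 1 over k = 0..n
```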

The mean value of $$ k $$ is $$ na $$ and the standard deviation is $$ \sqrt{na(1-a)} .$$ The mean value of $$ \frac k n $$ is $$ a $$ and the standard deviation is
 * $$ \sqrt{\frac{a(1-a)}n}\approx \sqrt{\frac{k(n-k)}{n^3}},$$

where the approximation follows from substituting the estimate $$a\approx\frac k n$$.
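These moments of $$\frac k n$$ can be confirmed by simulation; a minimal Monte Carlo sketch with assumed values of $$n$$ and $$a$$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 100, 0.3
frac = rng.binomial(n, a, size=200_000) / n   # samples of k/n

print(frac.mean())                 # close to a = 0.3
print(frac.std())                  # close to sqrt(a(1-a)/n)
print(np.sqrt(a * (1 - a) / n))    # ~ 0.0458
```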

In the more realistic situation when $$k$$ is known and $$a$$ is unknown, $$ p(k|a) $$ is a likelihood function of $$a$$. The posterior probability distribution function of $$a$$, after observing $$k$$, is
 * $$ p(a|k) = \frac{p(k|a)\,p(a)}{\int_0^1 p(k|a)\,p(a)\,da} = \frac{a^k (1-a)^{n-k}\,p(a)}{\int_0^1 a^k (1-a)^{n-k}\,p(a)\,da} , $$

where a prior probability distribution function, $$p(a)$$, is available to express what was known about $$a$$ before observing $$k$$.
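For a general prior this integral has no closed form, but the posterior can be approximated on a grid. A sketch, with assumed data $$n$$ and $$k$$, using the uniform prior of the next paragraph (any density on $$[0,1]$$ can be substituted):

```python
import numpy as np

n, k = 10, 4                        # assumed observed data
grid = np.linspace(0, 1, 1001)      # grid of candidate values of a
da = grid[1] - grid[0]

likelihood = grid**k * (1 - grid)**(n - k)
prior = np.ones_like(grid)          # uniform prior p(a) = 1; swap in any prior density

posterior = likelihood * prior
posterior /= posterior.sum() * da   # normalize so the density integrates to 1

print((posterior * da).sum())           # ~ 1.0
print((grid * posterior * da).sum())    # posterior mean, ~ (k+1)/(n+2) = 5/12
```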

Assume now that the prior distribution is the continuous uniform distribution, $$p(a)=1$$ for $$0\le a\le 1$$. Then the posterior distribution,
 * $$ p(a|k) = \frac{a^k (1-a)^{n-k}}{\int_0^1 a^k (1-a)^{n-k}\,da},$$

is a beta distribution,
 * $$a|k \sim \textrm{Be}(k+1,\,n-k+1).$$

The mean value of $$a$$ is $$\frac{k+1}{n+2}$$, rather than $$\frac k n$$, and the standard deviation is $$\sqrt{\frac{(k+1)(n-k+1)}{(n+2)^2\,(n+3)}}$$, rather than $$ \sqrt{\frac{k(n-k)}{n^3}}.$$
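Since the uniform-prior posterior is $$\textrm{Be}(k+1,\,n-k+1)$$, these moments can be checked against SciPy's beta distribution; a sketch with the same assumed $$n$$ and $$k$$ as above:

```python
import numpy as np
from scipy.stats import beta

n, k = 10, 4
posterior = beta(k + 1, n - k + 1)

print(posterior.mean(), (k + 1) / (n + 2))   # both 5/12 ~ 0.41667
print(posterior.std(),
      np.sqrt((k + 1) * (n - k + 1) / ((n + 2)**2 * (n + 3))))
```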

If the prior distribution is
 * $$a \sim \textrm{Be}(k_0+1,\,n_0-k_0+1),$$

then the posterior distribution is
 * $$a|k \sim \textrm{Be}((k+k_0)+1,\,(n+n_0)-(k+k_0)+1).$$

So the beta distribution is a conjugate prior for the binomial distribution.
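In code, the conjugate update reduces to adding the observed counts to the prior's counts; a minimal sketch (the function name is mine, chosen for illustration):

```python
def update_beta(k0, n0, k, n):
    """Combine the prior Be(k0+1, n0-k0+1) with k successes in n trials.

    Returns the parameters of the posterior
    Be((k+k0)+1, (n+n0)-(k+k0)+1).
    """
    return (k + k0) + 1, (n + n0) - (k + k0) + 1

# Prior worth n0 = 4 pseudo-trials with k0 = 2 pseudo-successes,
# then observe k = 4 successes in n = 10 trials:
print(update_beta(2, 4, 4, 10))   # (7, 9), i.e. a|k ~ Be(7, 9)
```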

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter $$a$$. That is, not only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra is used to make inferences of either kind. Interestingly, Bayes actually states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball. By making the binomial parameter $$a\,$$ depend on a random event, he cleverly escapes a philosophical quagmire that was an issue he most likely was not even aware of.