Bernoulli distribution

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability $$p$$ and the value 0 with probability $$q = 1-p$$. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are Boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails", respectively, and p would be the probability of the coin landing on heads (or vice versa where 1 would represent tails and p would be the probability of tails). In particular, unfair coins would have $$p \neq 1/2.$$
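
Sampling from a Bernoulli distribution is straightforward: draw a uniform random number and compare it with $$p$$. The following minimal sketch (in Python; $$p = 0.3$$ is an arbitrary example value) simulates a biased coin this way:

```python
import random

def bernoulli_trial(p: float) -> int:
    """Return 1 with probability p and 0 with probability q = 1 - p."""
    return 1 if random.random() < p else 0

# Simulate a biased coin with p = 0.3 (arbitrary example value).
samples = [bernoulli_trial(0.3) for _ in range(10_000)]
print(sum(samples) / len(samples))  # fraction of ones; close to 0.3
```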

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so n would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.

Properties
If $$X$$ is a random variable with a Bernoulli distribution, then:


 * $$\Pr(X=1) = p = 1 - \Pr(X=0) = 1 - q.$$

The probability mass function $$f$$ of this distribution, over possible outcomes k, is


 * $$f(k;p) = \begin{cases} p & \text{if } k = 1, \\ q = 1-p & \text{if } k = 0. \end{cases}$$

This can also be expressed as


 * $$f(k;p) = p^k (1-p)^{1-k} \quad \text{for } k\in\{0,1\}$$

or as


 * $$f(k;p)=pk+(1-p)(1-k) \quad \text{for } k\in\{0,1\}.$$
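
The three expressions are algebraically equivalent on $$k\in\{0,1\}$$. As a quick sanity check (a sketch in Python; $$p = 0.7$$ is an arbitrary example value), the following evaluates each form at both outcomes:

```python
p = 0.7  # arbitrary example value
q = 1 - p

for k in (0, 1):
    piecewise = p if k == 1 else q       # case-by-case form
    power = p**k * (1 - p)**(1 - k)      # p^k (1-p)^(1-k)
    linear = p*k + (1 - p)*(1 - k)       # pk + (1-p)(1-k)
    assert piecewise == power == linear
```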

The Bernoulli distribution is a special case of the binomial distribution with $$n = 1.$$

The kurtosis goes to infinity as $$p$$ approaches 0 or 1, but for $$p=1/2$$ the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis, namely −2, than any other probability distribution.
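
This follows from the excess kurtosis of the Bernoulli distribution,


 * $$\frac{1-6pq}{pq},$$

which equals $$-2$$ at $$p = q = 1/2$$ and diverges as $$p$$ approaches 0 or 1.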

The Bernoulli distributions for $$0 < p < 1$$ form an exponential family.
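
In canonical exponential-family form, the probability mass function can be written as


 * $$f(k;p) = \exp\left(k \ln\frac{p}{1-p} + \ln(1-p)\right),$$

with natural parameter $$\ln\frac{p}{1-p}$$ (the log-odds of success), which is well defined only for $$0 < p < 1$$.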

The maximum likelihood estimator of $$p$$ based on a random sample is the sample mean.
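
A minimal estimation sketch (assuming NumPy; the true $$p$$ and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3  # arbitrary example value
sample = rng.binomial(n=1, p=p_true, size=100_000)  # Bernoulli(p) draws

p_hat = sample.mean()  # the MLE of p is the sample mean
print(p_hat)           # close to 0.3
```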

Mean
The expected value of a Bernoulli random variable $$X$$ is


 * $$\operatorname{E}[X]=p$$

This is because, for a Bernoulli distributed random variable $$X$$ with $$\Pr(X=1)=p$$ and $$\Pr(X=0)=q$$, we find


 * $$\operatorname{E}[X] = \Pr(X=1)\cdot 1 + \Pr(X=0)\cdot 0 = p \cdot 1 + q \cdot 0 = p.$$

Variance
The variance of a Bernoulli distributed $$X$$ is


 * $$\operatorname{Var}[X] = pq = p(1-p)$$

We first find


 * $$\operatorname{E}[X^2] = \Pr(X=1)\cdot 1^2 + \Pr(X=0)\cdot 0^2 = p \cdot 1^2 + q\cdot 0^2 = p = \operatorname{E}[X] $$

From this it follows that


 * $$\operatorname{Var}[X] = \operatorname{E}[X^2]-\operatorname{E}[X]^2 = \operatorname{E}[X]-\operatorname{E}[X]^2 = p-p^2 = p(1-p) = pq$$

Since $$p(1-p)$$ is a concave quadratic in $$p$$ that vanishes at $$p=0$$ and $$p=1$$ and attains its maximum at $$p=1/2$$, the variance of any Bernoulli distribution lies in $$[0,1/4]$$, with the maximum value $$1/4$$ reached at $$p=1/2$$.
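
A quick numerical check of the identity and of the bound (a sketch assuming NumPy; the grid of $$p$$ values is arbitrary):

```python
import numpy as np

for p in np.linspace(0.0, 1.0, 101):
    var = p * (1 - p)          # Var[X] = pq
    assert 0.0 <= var <= 0.25  # the maximum 1/4 is attained at p = 1/2
```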

Skewness
The skewness is $$\frac{q-p}{\sqrt{pq}}=\frac{1-2p}{\sqrt{pq}}$$. When we take the standardized Bernoulli distributed random variable $$\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}$$ we find that this random variable attains $$\frac{q}{\sqrt{pq}}$$ with probability $$p$$ and attains $$-\frac{p}{\sqrt{pq}}$$ with probability $$q$$. Thus we get


 * $$\begin{align}
\gamma_1 &= \operatorname{E} \left[\left(\frac{X-\operatorname{E}[X]}{\sqrt{\operatorname{Var}[X]}}\right)^3\right] \\
&= p \cdot \left(\frac{q}{\sqrt{pq}}\right)^3 + q \cdot \left(-\frac{p}{\sqrt{pq}}\right)^3 \\
&= \frac{1}{\sqrt{pq}^3} \left(pq^3-qp^3\right) \\
&= \frac{pq}{\sqrt{pq}^3} (q-p) \\
&= \frac{q-p}{\sqrt{pq}}.
\end{align}$$
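
The closed form can be checked against a library implementation (a sketch assuming SciPy; $$p = 0.2$$ is an arbitrary example value):

```python
from math import sqrt
from scipy.stats import bernoulli

p = 0.2  # arbitrary example value
q = 1 - p
formula = (q - p) / sqrt(p * q)  # (q - p) / sqrt(pq) = 1.5 here
_, _, skew, _ = bernoulli.stats(p, moments='mvsk')
assert abs(skew - formula) < 1e-12
```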

Higher moments and cumulants
The raw moments are all equal, because $$1^k=1$$ and $$0^k=0$$ for $$k \ge 1$$.


 * $$\operatorname{E}[X^k] = \Pr(X=1)\cdot 1^k + \Pr(X=0)\cdot 0^k = p \cdot 1 + q\cdot 0 = p = \operatorname{E}[X].$$

The central moment of order $$k$$ is given by


 * $$\mu_k = (1-p)(-p)^k + p(1-p)^k.$$

The first six central moments are


 * $$\begin{align}
\mu_1 &= 0, \\
\mu_2 &= p(1-p), \\
\mu_3 &= p(1-p)(1-2p), \\
\mu_4 &= p(1-p)(1-3p(1-p)), \\
\mu_5 &= p(1-p)(1-2p)(1-2p(1-p)), \\
\mu_6 &= p(1-p)(1-5p(1-p)(1-p(1-p))).
\end{align}$$

The higher central moments can be expressed more compactly in terms of $$\mu_2$$ and $$\mu_3$$:


 * $$\begin{align}
\mu_4 &= \mu_2 (1-3\mu_2), \\
\mu_5 &= \mu_3 (1-2\mu_2), \\
\mu_6 &= \mu_2 (1-5\mu_2 (1-\mu_2)).
\end{align}$$

The first six cumulants are


 * $$\begin{align}
\kappa_1 &= p, \\
\kappa_2 &= \mu_2, \\
\kappa_3 &= \mu_3, \\
\kappa_4 &= \mu_2 (1-6\mu_2), \\
\kappa_5 &= \mu_3 (1-12\mu_2), \\
\kappa_6 &= \mu_2 (1-30\mu_2 (1-4\mu_2)).
\end{align}$$
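
These identities can be verified directly from the closed form for $$\mu_k$$ (a sketch in plain Python; the value of $$p$$ is arbitrary):

```python
p = 0.37  # arbitrary example value

def mu(k: int) -> float:
    """Central moment of order k of a Bernoulli(p) variable."""
    return (1 - p) * (-p)**k + p * (1 - p)**k

m2, m3 = mu(2), mu(3)
assert abs(mu(4) - m2 * (1 - 3*m2)) < 1e-12
assert abs(mu(5) - m3 * (1 - 2*m2)) < 1e-12
assert abs(mu(6) - m2 * (1 - 5*m2 * (1 - m2))) < 1e-12
```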

Related distributions

 * If $$X_1,\dots,X_n$$ are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability p, then their sum is distributed according to a binomial distribution with parameters n and p (illustrated numerically in the sketch after this list):
 * $$\sum_{k=1}^n X_k \sim \operatorname{B}(n,p)$$ (binomial distribution).


 * The Bernoulli distribution is simply $$\operatorname{B}(1, p)$$, also written as $$\mathrm{Bernoulli}(p).$$


 * The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
 * The Beta distribution is the conjugate prior of the Bernoulli distribution.
 * The geometric distribution models the number of independent, identically distributed Bernoulli trials needed to obtain one success.
 * If $$Y \sim \mathrm{Bernoulli}\left(\frac{1}{2}\right)$$, then $$2Y - 1$$ has a Rademacher distribution.
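
As a numerical illustration of the first relationship (a sketch assuming NumPy; $$n$$, $$p$$, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, p = 10, 0.4  # arbitrary example values

# Sum n independent Bernoulli(p) draws, replicated many times.
sums = rng.binomial(n=1, p=p, size=(200_000, n_trials)).sum(axis=1)

# Compare empirical mean and variance with Binomial(n, p): np and np(1-p).
print(sums.mean(), n_trials * p)            # both close to 4.0
print(sums.var(), n_trials * p * (1 - p))   # both close to 2.4
```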