User:Scychan/sandbox

The expected discrepancy between the biased estimator and the true variance is

$$ \begin{align} \operatorname{E} \left[ \sigma^2 - s_{\text{biased}}^2 \right] &= \operatorname{E}\left[ \frac{1}{n}\sum_{i=1}^n(x_i - \mu)^2 - \frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2 \right] \\ &= \frac{1}{n} \operatorname{E}\left[ \sum_{i=1}^n\left((x_i^2 - 2 x_i \mu + \mu^2) - (x_i^2 - 2 x_i \overline{x} + \overline{x}^2)\right) \right] \\ &= \operatorname{E}\left[ \mu^2 - 2 \overline{x} \mu + \overline{x}^2 \right] \\ &= \operatorname{E}\left[ (\overline{x} - \mu)^2 \right] \\ &= \operatorname{Var} (\overline{x}) \\ &= \frac{\sigma^2}{n} \end{align} $$

So, the expected value of the biased estimator is
 * $$ \operatorname{E} \left[ s^2_{\text{biased}} \right] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n} \sigma^2 $$

Hence, an unbiased estimator is given by
 * $$ s_{\text{unbiased}}^2 = \frac{n}{n-1} s_{\text{biased}}^2 $$
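
The two results above can be checked exactly on a small example. The following is a minimal sketch, assuming a hypothetical population (a fair six-sided die) and independent draws with replacement; the function names and the choice of n = 3 are illustrative only. It enumerates every equally likely sample and uses exact fractions, so no random simulation is involved.

```python
from fractions import Fraction
from itertools import product


def biased_variance(sample):
    """Mean squared deviation from the sample mean (divides by n)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


def expected_biased_variance(values, n):
    """Exact E[s_biased^2] over all equally likely samples of size n,
    drawn independently with replacement from `values`."""
    samples = list(product(values, repeat=n))
    return sum(biased_variance(s) for s in samples) / len(samples)


# Assumed for illustration: a fair die as the population, samples of size 3.
values = [1, 2, 3, 4, 5, 6]
n = 3

# True mean and true variance of the population.
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

e_biased = expected_biased_variance(values, n)
e_unbiased = Fraction(n, n - 1) * e_biased

print("sigma^2          =", sigma2)
print("E[s_biased^2]    =", e_biased)
print("(n-1)/n sigma^2  =", Fraction(n - 1, n) * sigma2)
print("E[s_unbiased^2]  =", e_unbiased)
```

Under these assumptions, the expected biased estimate matches (n − 1)/n times the true variance, and rescaling by n/(n − 1) recovers the true variance.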

Intuition
In the biased estimator, by using the sample mean instead of the true mean, you are underestimating each xᵢ − µ by x̄ − µ. We know that the variance of a sum is the sum of the variances (for uncorrelated variables). So, to find the discrepancy between the biased estimator and the true variance, we just need to find the variance of x̄ − µ.

This is just the variance of the sample mean, which is σ²/n. So, we expect that the biased estimator underestimates σ² by σ²/n.
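
The intuition can also be seen for a single sample: the gap between the mean squared deviation around the true mean and the mean squared deviation around the sample mean equals (x̄ − µ)² exactly, before any expectation is taken. The sketch below illustrates this with a hypothetical sample and a hypothetical true mean; both values, and the function name, are assumptions chosen for the example.

```python
from fractions import Fraction


def discrepancy(sample, mu):
    """Return (gap, squared_error) for one sample, where
    gap           = mean sq. deviation around mu - mean sq. deviation around xbar
    squared_error = (xbar - mu)^2."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    around_true_mean = sum((x - mu) ** 2 for x in sample) / n
    around_sample_mean = sum((x - xbar) ** 2 for x in sample) / n
    return around_true_mean - around_sample_mean, (xbar - mu) ** 2


# Assumed for illustration only.
sample = [2, 4, 9]
mu = Fraction(7, 2)

gap, squared_error = discrepancy(sample, mu)
print("gap            =", gap)
print("(xbar - mu)^2  =", squared_error)
```

Averaging this per-sample gap over all samples gives the variance of the sample mean, which is the σ²/n shortfall described above.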