Scoring algorithm

The scoring algorithm, also known as Fisher's scoring (named after Ronald Fisher), is a form of Newton's method used in statistics to solve maximum likelihood equations numerically.

Sketch of derivation
Let $$Y_1,\ldots,Y_n$$ be independent and identically distributed random variables with twice-differentiable p.d.f. $$f(y; \theta)$$, and suppose we wish to calculate the maximum likelihood estimator (M.L.E.) $$\theta^*$$ of $$\theta$$. First, suppose we have a starting point for our algorithm $$\theta_0$$, and consider a Taylor expansion of the score function, $$V(\theta) = \sum_{i=1}^n \nabla \log f(Y_i ; \theta)$$, about $$\theta_0$$:


 * $$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0), \,$$

where


 * $$\mathcal{J}(\theta_0) = - \sum_{i=1}^n \left. \nabla \nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i ; \theta)$$

is the observed information matrix at $$\theta_0$$. Now, setting $$\theta = \theta^*$$, using the fact that $$V(\theta^*) = 0$$ and rearranging gives us:


 * $$\theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}). \,$$

We therefore use the algorithm


 * $$\theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m}), \,$$

and under certain regularity conditions, it can be shown that $$\theta_m \rightarrow \theta^*$$.
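As a concrete sketch of this iteration, the following Python snippet applies it to the location parameter of a Cauchy sample, a standard case in which the score equation has no closed-form solution. The model choice, the simulated data, and the starting point are assumptions made for this illustration, not part of the derivation above:

```python
import numpy as np

def score(theta, y):
    """Score V(theta): first derivative of the Cauchy log-likelihood."""
    u = y - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def observed_information(theta, y):
    """Observed information J(theta): minus the second derivative
    of the Cauchy log-likelihood at theta."""
    u = y - theta
    return np.sum(2.0 * (1.0 - u**2) / (1.0 + u**2)**2)

def newton_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Iterate theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, y) / observed_information(theta, y)
        theta = theta + step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(200) + 3.0  # simulated sample, true location 3.0
print(newton_scoring(y, theta0=np.median(y)))
```

Starting from the sample median (a consistent estimator of the location) helps keep the observed information positive, so that each step moves in an ascent direction.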

Fisher scoring
In practice, $$\mathcal{J}(\theta)$$ is usually replaced by $$\mathcal{I}(\theta)= \mathrm{E}[\mathcal{J}(\theta)]$$, the Fisher information, which depends only on $$\theta$$ and not on the observed sample and is therefore often simpler to evaluate; this gives the Fisher scoring algorithm:


 * $$\theta_{m+1} = \theta_{m} + \mathcal{I}^{-1}(\theta_{m})V(\theta_{m}). \,$$

Under some regularity conditions, if $$\theta_m$$ is a consistent estimator, then $$\theta_{m+1}$$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.
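To illustrate both the Fisher scoring update and this one-step property, the sketch below continues the Cauchy location example from the previous section. It uses the standard closed-form result that the expected information for that model is $$\mathcal{I}(\theta) = n/2$$, and the sample median as the consistent starting estimator; as before, the data are simulated for the example:

```python
import numpy as np

def score(theta, y):
    """Score V(theta) for the Cauchy location model, as before."""
    u = y - theta
    return np.sum(2.0 * u / (1.0 + u**2))

def fisher_scoring(y, theta0, tol=1e-10, max_iter=100):
    """Iterate theta_{m+1} = theta_m + I(theta_m)^{-1} V(theta_m),
    with expected information I(theta) = n/2, constant in theta here."""
    info = len(y) / 2.0
    theta = theta0
    for _ in range(max_iter):
        step = score(theta, y) / info
        theta = theta + step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.standard_cauchy(200) + 3.0  # same simulated sample as above

# One-step estimator: a single Fisher scoring correction applied to a
# consistent starting estimator (the sample median).
median = np.median(y)
one_step = median + score(median, y) / (len(y) / 2.0)
print(one_step, fisher_scoring(y, theta0=median))
```

A single correction from the median typically lands close to the fully iterated estimate, which is exactly what the one-step optimality result describes.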