
In algebraic statistics, the concept of maximum likelihood degree (ML degree) arises naturally as the number of complex solutions of the likelihood equations. The ML degree is bounded by the degree of the likelihood ideal.

Introduction
A parametric probability model for a discrete random variable is given by a map $$\psi: U \rightarrow \Delta_{n-1}$$, where $$U \subset \mathbb{R}^d$$ is an open set and $$\Delta_{n-1}$$ is the probability simplex $$\Delta_{n-1}= \{(p_1,p_2, \dots ,p_n): p_1+p_2+ \dots +p_n=1,\ p_i \geq 0 \}$$. The model is $$\psi(U)$$, where $$\psi = (\psi_1(\theta), \dots, \psi_n(\theta))$$ and $$\theta =(\theta_1,\dots ,\theta_d)$$. The problem of maximum likelihood estimation for a fixed data vector $$u = (u_1, \dots ,u_n)$$ is to find a parameter $$\theta$$ that best explains the data, which leads to the problem of maximizing


 * $$\psi_1(\theta)^{u_1} \dots \psi_n(\theta)^{u_n}$$ subject to $$\psi_1+\dots +\psi_n =1$$

or, equivalently, maximizing the log-likelihood function


 * $$ \sum_{i=1}^n u_i \log \psi_i$$

In the above definition, if $$\psi_1,\dots ,\psi_n$$ are polynomials in $$\theta_1,\dots ,\theta_d$$, the Zariski closure of $$\psi(U)$$ is called an algebraic statistical model. To employ tools from algebraic geometry, we extend the domain of $$\psi$$ to the complex numbers:


 * $$ \psi: \mathbb{C}^d \rightarrow \mathbb{C}^n $$        such that $$\psi_1+\dots +\psi_n =1$$

The Maximum Likelihood Degree
The maximum likelihood degree (ML degree) of a discrete statistical model is the number of complex critical points of the log-likelihood function.


 * $$\sum_{i=1}^n u_i \log \psi_i$$  for generic data $$ u = (u_1, \dots ,u_n)$$.

Equivalently, the ML degree is the number of complex solutions to the system of equations



$$\begin{align} \frac{\partial \sum_{i=1}^n u_i \log \psi_i}{\partial \theta_1} &= N \frac{\partial g}{\partial \theta_1}, \\ &\;\;\vdots \\ \frac{\partial \sum_{i=1}^n u_i \log \psi_i}{\partial \theta_d} &= N \frac{\partial g}{\partial \theta_d}. \end{align}$$

Here $$g= \psi_1+\dots +\psi_n$$ and $$N = \sum _{i=1}^n u_i$$ is the sample size.

Example
The Hardy-Weinberg curve has ML degree 1.


 * $$\begin{align} \psi_0(\theta) &= \theta^2, \\ \psi_1(\theta) &= 2\theta(1-\theta), \\ \psi_2(\theta) &= (1-\theta)^2. \end{align}$$

Here $$\theta$$ is the probability that a biased coin lands on tails. Suppose the coin is tossed twice; then $$\psi_i(\theta)$$ is the probability of observing exactly $$i$$ heads. Repeating the experiment $$N$$ times yields a data vector $$u= (u_0,u_1,u_2)$$, where $$u_i$$ is the number of trials in which exactly $$i$$ heads appeared. The maximum likelihood estimation problem is to estimate the unknown parameter $$\theta$$ by maximizing the likelihood function

$$ \ell_{u_0,u_1,u_2} = \psi_0(\theta)^{u_0}\psi_1(\theta)^{u_1}\psi_2(\theta)^{u_2} $$

One may find it more convenient to work with the logarithm of the likelihood function. There are many ways to maximize the log-likelihood function, such as Lagrange multipliers and other optimization tools, but for this particular example we do not need any such machinery.

$$ \log \ell_{u_0,u_1,u_2} = (2u_0 + u_1) \log \theta + (u_1+2u_2) \log (1-\theta) + u_1 \log 2 $$

Setting the derivative of the log-likelihood function to zero and clearing denominators produces the following polynomial, the likelihood equation:

$$ (2u_0+2u_1+2u_2)\theta-(2u_0+u_1)=0 $$

This is a polynomial of degree 1, so it has exactly one complex solution, namely $$\hat\theta = \frac{2u_0+u_1}{2u_0+2u_1+2u_2} = \frac{2u_0+u_1}{2N}$$. Thus the ML degree is 1. This is noteworthy because a generic quadric has ML degree 6, so the Hardy-Weinberg curve is special in this respect. In general, solving the likelihood equations by hand can be troublesome, and computer software such as Macaulay2, Singular, and Polymake may be helpful.
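The computation above can be reproduced symbolically. The following sketch uses sympy with hypothetical counts $$u = (3, 5, 2)$$; it forms the log-likelihood, clears denominators in its derivative, and solves the resulting likelihood equation.

```python
import sympy as sp

theta = sp.symbols("theta")
u0, u1, u2 = 3, 5, 2  # hypothetical counts of 0, 1, and 2 heads

# Log-likelihood of the Hardy-Weinberg model, dropping the
# additive constant u1*log(2), which has no effect on critical points
loglik = (2*u0 + u1)*sp.log(theta) + (u1 + 2*u2)*sp.log(1 - theta)

# Combine the derivative over a common denominator and take the numerator:
# this is the likelihood equation (2u0+2u1+2u2)*theta - (2u0+u1) up to sign
score = sp.together(sp.diff(loglik, theta))
likelihood_poly = sp.expand(sp.numer(score))

sols = sp.solve(likelihood_poly, theta)
# A degree-1 polynomial with a single solution: ML degree 1
```

For these counts the solution is $$\hat\theta = (2u_0+u_1)/(2N) = 11/20$$, in agreement with the closed-form estimate.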

Birch's Theorem for Toric Ideals
Let $$ A \in \mathbb{N}^{d \times k} $$ and let $$ u \in \mathbb{N}^{ k} $$ be a vector of positive counts. The maximum likelihood estimate of the frequencies $$\hat{u}$$ in the log-linear model $$\mathcal M_A$$ is the unique non-negative solution of the simultaneous system of equations

$$ A \hat{u} = Au $$  and   $$ \hat{ u} \in V(I_A) $$

Note that the number of complex solutions to this system is the ML degree of the model $$\mathcal M_A$$.



History
The study of the ML degree is relatively recent and started with two papers: "The maximum likelihood degree" by Catanese, Hosten, Khetan, and Sturmfels, and "Solving the likelihood equations" by Hosten, Khetan, and Sturmfels. Both papers describe the connection between the ML degree and Newton polytopes.