User:Sabrina119

The concept of a mean field first appeared in physics to describe phase transitions. Since then, mean field theories have been extensively employed to describe a very wide variety of phenomena, and the approach has spread to many fields beyond physics, including statistical inference, graphical models, artificial intelligence, and computer network performance.

Variational theory and mean field
Variational methods have been widely used in physics, statistics, control theory, and economics. They have also appeared in machine learning contexts as regularization theory and maximum entropy estimation, and they have been further developed in the context of approximate inference and estimation.

The variational approach to probabilistic inference consists in transforming the inference problem into an optimization problem. Hence, given a probability distribution p(x|θ) that factors according to a graph, variational methods yield approximations to marginal probabilities via the solution of an optimization problem that exploits some of the graphical structure.

Let P(x1, . . ., xn) be the distribution of interest over n variables. We divide the set of variables into:


 * 1) “visible” variables xv, whose marginal distribution P(xv) we are interested in computing
 * 2) “hidden” variables xh, whose posterior distribution P(xh|xv) we aim to compute

Evaluating the marginal or posterior involves summing over all configurations of the hidden variables xh:

P(xv) = Σxh P(xh, xv)

The complexity of this operation depends on the structure, or factorization, of the joint distribution P. Therefore, to find a variational solution we need to cast these computations as optimization problems.
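To make the cost concrete, below is a minimal sketch of brute-force marginalization (a hypothetical dense joint table over binary variables, using NumPy; none of the specifics are from the original text). The sum runs over every configuration of the hidden variables, so the work grows exponentially with their number.

```python
import numpy as np
from itertools import product

# Hypothetical example: a dense joint table P(x1, ..., xn) over n binary
# variables, stored as an array of shape (2, 2, ..., 2).
n = 12
rng = np.random.default_rng(0)
joint = rng.random((2,) * n)
joint /= joint.sum()            # normalize so the table sums to 1

visible = [0, 1]                # indices of the "visible" variables xv
hidden = [i for i in range(n) if i not in visible]

def marginal(joint, visible, hidden, xv):
    # Brute force: P(xv) = sum over all 2^|hidden| configurations of xh.
    total = 0.0
    for xh in product([0, 1], repeat=len(hidden)):
        idx = [0] * joint.ndim
        for i, v in zip(visible, xv):
            idx[i] = v
        for i, h in zip(hidden, xh):
            idx[i] = h
        total += joint[tuple(idx)]
    return total

print(marginal(joint, visible, hidden, (1, 0)))  # already 2^10 = 1024 terms
```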

The objective function is as follows:

J(Q) = −KL(Q(xh) || P(xh|xv)) + log P(xv)

where Q is a variational distribution over hidden variables xh.

This objective function can be written without reference to the posterior or the marginal as follows:

J(Q) = Σxh Q(xh) log P(xh, xv) + H(Q)

where H(Q) is the Shannon entropy of Q.
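As a quick sanity check on the equivalence of the two forms of J(Q), the following sketch (a made-up binary hidden variable with arbitrary numbers, not taken from the text) verifies numerically that −KL(Q(xh) || P(xh|xv)) + log P(xv) and Σxh Q(xh) log P(xh, xv) + H(Q) coincide.

```python
import numpy as np

# Hypothetical toy model: one binary hidden variable xh, visible xv fixed.
# Joint probabilities P(xh, xv) for the observed xv (they need not sum to 1).
p_joint = np.array([0.10, 0.30])        # P(xh=0, xv), P(xh=1, xv)
p_xv = p_joint.sum()                    # marginal P(xv)
p_post = p_joint / p_xv                 # posterior P(xh | xv)

q = np.array([0.6, 0.4])                # some variational distribution Q(xh)

# Form 1: J(Q) = -KL(Q || P(xh|xv)) + log P(xv)
kl = np.sum(q * np.log(q / p_post))
j1 = -kl + np.log(p_xv)

# Form 2: J(Q) = sum_xh Q(xh) log P(xh, xv) + H(Q)
entropy = -np.sum(q * np.log(q))
j2 = np.sum(q * np.log(p_joint)) + entropy

print(j1, j2)   # the two values coincide (up to floating-point error)
```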

Now, the maximization of J(Q) can be restricted to some manageable class of Q distributions (in the same spirit as finite element methods).

The simplest choice is the class of mean field distributions, that is, Q is taken to be a completely factorized distribution:

Q(xh) = Πi∈h Qi(xi)

In other words, the mean field approximation refers to a class of variational approximation methods that approximate the true distribution p(x|θ) on a graph G with a simpler distribution Q(x|γ), for which it is possible to do exact inference.
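For illustration, here is a minimal sketch of how J(Q) can be maximized over a fully factorized Q by updating one factor Qi at a time while the others are held fixed. The binary pairwise model and its parameters are hypothetical choices for the example, not something specified in the text.

```python
import numpy as np

# Hypothetical model: P(x) ∝ exp( sum_i b_i x_i + sum_{i<j} W_ij x_i x_j ),
# with x_i in {0, 1}. The factorized Q is parameterized by the means
# m_i = Q_i(x_i = 1), the "mean field" parameters.
rng = np.random.default_rng(1)
n = 5
W = rng.normal(scale=0.5, size=(n, n))
W = np.triu(W, 1) + np.triu(W, 1).T     # symmetric couplings, zero diagonal
b = rng.normal(size=n)                   # biases

m = np.full(n, 0.5)                      # initial means
for _ in range(100):
    for i in range(n):
        # Coordinate update: Q_i(x_i) ∝ exp(E_{Q over the other factors}[log P(x)]),
        # which for this model reduces to a logistic function of the local field.
        field = b[i] + W[i] @ m
        m[i] = 1.0 / (1.0 + np.exp(-field))

print(np.round(m, 3))                    # approximate marginals Q_i(x_i = 1)
```

Updating one factor at a time can only increase J(Q), so the iteration converges to a local maximum of the variational objective.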

Naive mean field approximation
The naive approach uses a completely disconnected subgraph. Hence, the approximating distribution is fully factorized: