User:ChrisSwetenham/PPCA

Probabilistic Principal Component Analysis (PPCA) is a topic in machine learning and computer vision. Probabilistic PCA is a latent variable model. Since the latent space has lower dimension than the data space, it is a form of dimensionality reduction.

Model
The model for a $$D$$-dimensional data space with an $$M$$-dimensional latent space is $${x} = {Wz} + {\mu} + {\varepsilon}\,$$ where $${z}$$ is an $$M$$-dimensional latent random variable distributed according to a multivariate Gaussian distribution with zero mean and unit covariance, $$N(0, {I})$$; $${W}$$ is a $$D \times M$$ matrix of parameters; $${\mu}$$ is a $$D$$-dimensional constant vector; and $${\varepsilon}$$ is a $$D$$-dimensional random variable distributed according to a multivariate Gaussian distribution with zero mean and independent, identically distributed components, $$N(0, \sigma^2{I})$$.
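
As a concrete illustration, the generative process can be sampled directly. The Python sketch below uses arbitrary illustrative values for $$D$$, $$M$$, $$W$$, $$\mu$$ and $$\sigma^2$$ (none of these values come from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2                       # data and latent dimensions (illustrative)
W = rng.standard_normal((D, M))   # illustrative loading matrix
mu = rng.standard_normal(D)       # illustrative data-space mean
sigma2 = 0.1                      # illustrative noise variance

z = rng.standard_normal(M)                      # z ~ N(0, I)
eps = np.sqrt(sigma2) * rng.standard_normal(D)  # eps ~ N(0, sigma2 * I)
x = W @ z + mu + eps                            # x = W z + mu + eps
```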

Since linear transformations of Gaussian random variables are Gaussian, the marginal distribution of $${x}$$ is itself Gaussian, $$N({\mu}, {C})$$, with covariance matrix $${C} = {WW}^T + \sigma^2{I}\,$$. The form of the term $${WW}^T$$ implies that $${C}$$ is unchanged when $${W}$$ is post-multiplied by any orthogonal matrix $${R}$$, since $$({WR})({WR})^T = {WRR}^T{W}^T = {WW}^T$$; the model is therefore identifiable only up to such rotations.
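
This rotation invariance is easy to verify numerically. The sketch below uses illustrative dimensions and draws a random orthogonal $$R$$ via a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 5, 2
W = rng.standard_normal((D, M))
sigma2 = 0.1

C = W @ W.T + sigma2 * np.eye(D)      # marginal covariance of x

# Build an arbitrary M x M orthogonal matrix R via a QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))
C_rot = (W @ R) @ (W @ R).T + sigma2 * np.eye(D)

assert np.allclose(C, C_rot)          # C is invariant under W -> WR
```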

Inference
Given a value for $${x}$$, we can infer the posterior distribution over the latent space: $$p({z}|{x}) = N({z};\, {M}^{-1}{W}^T({x}-{\mu}),\, \sigma^{2}{M}^{-1})\,$$ where $${M} = {W}^T{W} + \sigma^2{I}\,$$ is an $$M \times M$$ matrix.
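
A minimal Python sketch of this posterior computation follows; the function name `ppca_posterior` and the use of an explicit matrix inverse are illustrative choices, not a reference implementation:

```python
import numpy as np

def ppca_posterior(x, W, mu, sigma2):
    """Mean and covariance of p(z|x) under the PPCA model."""
    latent_dim = W.shape[1]
    M_mat = W.T @ W + sigma2 * np.eye(latent_dim)  # M = W^T W + sigma^2 I
    M_inv = np.linalg.inv(M_mat)
    mean = M_inv @ W.T @ (x - mu)                  # M^{-1} W^T (x - mu)
    cov = sigma2 * M_inv                           # sigma^2 M^{-1}
    return mean, cov
```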

Parameter Estimation
Given a set of samples of $$x$$, we can find the maximum likelihood solution for the parameters of the model. It can be shown that the maximum likelihood estimate of the matrix $$W$$ is given by $$W = U_M(L_M - \sigma^2{I})^{\frac{1}{2}}\,$$ (up to post-multiplication by an orthogonal matrix $$R$$, as noted above), where $$U_M$$ is the $$D \times M$$ matrix whose columns are the $$M$$ eigenvectors of the sample covariance matrix with the largest eigenvalues, and $$L_M$$ is the $$M \times M$$ diagonal matrix of the corresponding eigenvalues.

The remaining parameter of the model, $$\sigma^2$$, has the maximum likelihood estimate $$\sigma^2 = \frac{1}{D - M}\sum_{i=M+1}^{D}{\lambda_i}\,$$ where $$\lambda_i$$ are the remaining eigenvalues; the noise variance is thus estimated as the average variance discarded by the projection onto the latent space.
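
Putting the two estimates together, a closed-form maximum likelihood fit can be sketched as follows; `fit_ppca_ml` is a hypothetical helper, and the eigendecomposition route shown is one of several valid ways to obtain $$U_M$$ and $$L_M$$:

```python
import numpy as np

def fit_ppca_ml(X, M):
    """Closed-form maximum likelihood PPCA fit to the rows of X (shape N x D)."""
    D = X.shape[1]
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[M:].mean()                  # average of the D - M discarded eigenvalues
    U_M = eigvecs[:, :M]                         # top-M eigenvectors
    L_M = np.diag(eigvals[:M])                   # top-M eigenvalues
    W = U_M @ np.sqrt(L_M - sigma2 * np.eye(M))  # W = U_M (L_M - sigma^2 I)^{1/2}
    return W, mu, sigma2
```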

The parameters of the model can also be estimated using the EM algorithm, which avoids an explicit eigendecomposition of the sample covariance matrix; in that case the solution can be extended to mixtures of PPCA models.
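
A sketch of the EM updates (following the standard E- and M-step equations for this model) is given below; the random initialization and fixed iteration count are arbitrary choices rather than recommendations:

```python
import numpy as np

def ppca_em(X, M, n_iter=200, seed=0):
    """EM for PPCA; a sketch, not an optimized or convergence-checked implementation."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((D, M))  # arbitrary initialization
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables.
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ M_inv                   # rows are E[z_n] (M_inv is symmetric)
        Ezz = N * sigma2 * M_inv + Ez.T @ Ez  # sum_n E[z_n z_n^T]
        # M-step: re-estimate W, then sigma^2 using the new W.
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * D)
    return W, mu, sigma2
```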

Relationship to Other Models
The maximum likelihood solution of the Probabilistic PCA model above recovers the same principal subspace as classical Principal Component Analysis, and in the limit $$\sigma^2 \to 0$$ the posterior mean of $${z}$$ reduces to the orthogonal projection performed in classical PCA.
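
This correspondence can be checked numerically. The sketch below constructs $$W$$ from an orthonormal basis $$U$$ (standing in for $$U_M$$) as in the maximum likelihood solution, takes $$\sigma^2$$ close to zero, and confirms that reconstruction from the posterior mean matches the orthogonal projection $$UU^Tx$$ for a mean-centered point:

```python
import numpy as np

rng = np.random.default_rng(2)
D, M = 4, 2
U, _ = np.linalg.qr(rng.standard_normal((D, M)))  # orthonormal columns, standing in for U_M
L = np.diag([3.0, 2.0])                           # illustrative top-M eigenvalues
sigma2 = 1e-9                                     # near the sigma^2 -> 0 limit
W = U @ np.sqrt(L - sigma2 * np.eye(M))           # ML form of W

x = rng.standard_normal(D)                        # a mean-centered data point
z_mean = np.linalg.solve(W.T @ W + sigma2 * np.eye(M), W.T @ x)  # posterior mean

# Reconstruction from the posterior mean matches the orthogonal PCA projection.
assert np.allclose(W @ z_mean, U @ U.T @ x, atol=1e-6)
```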

Since the components of the data vector are conditionally independent given the latent variable, Probabilistic PCA can be seen as an instance of the Naive Bayes model.

Factor Analysis is a similar model to Probabilistic PCA, but it allows each component of the error term $${\varepsilon}$$ to have a different variance; the noise covariance is a general diagonal matrix rather than the isotropic $$\sigma^2{I}$$.

Applications
The Eigenface technique applies PCA to the recognition of human faces. Using Probabilistic PCA can make the technique more robust to outliers (for example, images in the dataset which are not faces). More generally, PPCA can be used to model an underlying space of features that contribute to the appearance of an object.