Probabilistic principal component analysis (PPCA) formulates principal component analysis (PCA) as the maximum likelihood solution of a probabilistic latent variable model. When only the first few leading dimensions are needed, an EM algorithm can be used for efficiency instead of evaluating the sample covariance matrix, and a Bayesian treatment can automatically determine the dimensionality of the principal subspace in situations where it is not clearly defined. Because the model generates an observed variable in the original d dimensions from a linear transformation of a lower-dimensional latent variable plus Gaussian noise, it can also be used to sample from the resulting distribution. Probabilistic PCA overcomes several problems of standard approaches to principal component analysis: it can handle missing data, whereas normally incomplete points must be discarded; it defines a proper density model, so that we can estimate how well new data points fit the model; and it provides an efficient way to deal with high-dimensional data.

Description of the model
Probabilistic PCA is a generative latent variable model. It describes an observed d-dimensional data vector $$\mathbf{y}$$ as produced by a linear mapping of a q-dimensional latent (hidden) variable $$\mathbf{x}$$, shifted by a mean vector $$\mathbf{\mu}$$, plus additive Gaussian noise $$\mathbf{\epsilon}$$:


 * $$\mathbf{y} = \mathbf{Wx} + \mathbf{\mu} + \mathbf{\epsilon}$$

Generally, q < d. Thus, the d × q matrix W denotes a linear transformation from the lower dimensional latent space to the higher dimensional data space. The parameter $$\mathbf{\mu}$$ allows the data model to have a mean different from zero. The latent variable $$\mathbf{x}$$ is distributed according to a normal distribution with zero mean and an isotropic covariance matrix $$I$$:


 * $$ x\ \sim\ \mathcal{N}(0,\,I). \,$$

Here, the identity matrix $$I$$ is q × q (i.e. the latent variable has q dimensions), and q is a design parameter that needs to be specified according to the application. The noise model is assumed to be independent of $$\mathbf{x}$$ and is also distributed according to a Gaussian distribution:


 * $$ \mathbf{\epsilon}\ \sim\ \mathcal{N}(0,\,\mathbf{\Psi}). \,$$

The covariance matrix $$\mathbf{\Psi}$$ is diagonal, which means the observed variables $$\mathbf{y}$$ are conditionally independent given the latent variables $$\mathbf{x}$$. For a general matrix $$\mathbf{\Psi}$$ it is not possible to obtain a closed-form analytic solution. This is possible only if $$\mathbf{\Psi}$$ is of the form:


 * $$\mathbf{\Psi} = \sigma^{2}\mathbf{I}$$
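
The generative model defined so far can be sketched in code. The following is a minimal illustration, with W drawn at random and the dimensions d, q and noise level sigma chosen arbitrarily for the example; a point is produced exactly as the model prescribes, by drawing x from a standard normal, applying W, adding the mean, and adding isotropic Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

d, q = 5, 2                        # observed and latent dimensions (illustrative values)
W = rng.standard_normal((d, q))    # linear mapping W; here drawn at random for the sketch
mu = np.zeros(d)                   # mean vector mu
sigma = 0.1                        # noise standard deviation (Psi = sigma^2 I)

def sample_y(n):
    """Draw n observations y = W x + mu + eps from the PPCA model."""
    x = rng.standard_normal((n, q))            # x ~ N(0, I_q)
    eps = sigma * rng.standard_normal((n, d))  # eps ~ N(0, sigma^2 I_d)
    return x @ W.T + mu + eps

Y = sample_y(1000)
print(Y.shape)  # (1000, 5)
```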

Given the above formulation, and since a linear transformation of a normally distributed random variable is itself normally distributed (and the sum of independent Gaussians is Gaussian), the observed vectors $$\mathbf{y}$$ are distributed as follows:


 * $$ \mathbf{y}\ \sim\ \mathcal{N}(\mathbf{\mu},\,\mathbf{C}), \,$$ where $$\mathbf{C} = \mathbf{WW^{T}} + \mathbf{\Psi}$$

where we have the term $$\textstyle \mathbf{WW^{T}}$$ because the covariance of the linear transformation of $$\mathbf{x}$$ is:


 * $$\operatorname{Cov}(\mathbf{Wx}) = \mathbf{W}\operatorname{Cov}(\mathbf{x})\mathbf{W^{T}} = \mathbf{WW^{T}}$$
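This identity can be checked numerically: the sample covariance of points drawn from the model should approach $$\mathbf{WW^{T}} + \sigma^{2}\mathbf{I}$$ as the sample grows. A small sketch, with d, q, sigma and a random W chosen for illustration (mu is taken as zero, which does not affect the covariance):

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, sigma = 4, 2, 0.5
W = rng.standard_normal((d, q))    # illustrative random mapping

# Sample many points from the model y = W x + eps.
n = 200_000
x = rng.standard_normal((n, q))
eps = sigma * rng.standard_normal((n, d))
Y = x @ W.T + eps

C_theory = W @ W.T + sigma**2 * np.eye(d)   # C = W W^T + sigma^2 I
C_sample = np.cov(Y, rowvar=False)          # empirical d x d covariance

# The entrywise gap is small and shrinks as n grows.
print(np.max(np.abs(C_sample - C_theory)))
```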

Therefore, we obtain a probability distribution over the data space given a latent variable $$\mathbf{x}$$:



 * $$p(\mathbf{y}|\mathbf{x}) = (2\pi\sigma^{2})^{-d/2} \exp\left(-\frac{1}{2\sigma^{2}}||{\mathbf y}-\mathbf{W}\mathbf{x}-\mathbf{\mu}||^{2} \right).$$

Then we integrate over $$\mathbf{x}$$ to get the marginal distribution of $$\mathbf{y}$$:


 * $$p(\mathbf{y}) = \int p(\mathbf{y}|\mathbf{x}) \, p(\mathbf{x}) \, d\mathbf{x} = (2\pi)^{-d/2}|\mathbf{C}|^{-1/2} \exp\left(-\frac{1}{2}({\mathbf y}-{\mathbf\mu})^{T}\mathbf{C}^{-1}({\mathbf y}-{\mathbf\mu}) \right).$$
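
This closed form can be verified against a standard library implementation of the multivariate normal density, $$p(\mathbf{y}) = \mathcal{N}(\mathbf{\mu}, \mathbf{C})$$. A minimal check, with d, q, sigma, W and mu chosen arbitrarily for the example:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
d, q, sigma = 3, 1, 0.2
W = rng.standard_normal((d, q))     # illustrative random mapping
mu = np.array([1.0, -1.0, 0.5])     # illustrative mean

C = W @ W.T + sigma**2 * np.eye(d)  # marginal covariance C = W W^T + sigma^2 I

# Density of a test point under the marginal p(y) = N(mu, C), via SciPy.
y = mu + 0.1
p = multivariate_normal(mean=mu, cov=C).pdf(y)

# The same density from the closed-form expression above.
diff = y - mu
p_manual = (2 * np.pi)**(-d / 2) * np.linalg.det(C)**(-0.5) * np.exp(
    -0.5 * diff @ np.linalg.solve(C, diff))

print(np.isclose(p, p_manual))  # True
```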

Using Bayes' rule we can also obtain the posterior distribution of the latent variable $$\mathbf{x}$$:


 * $$\mathbf{x}|\mathbf{y}\ \sim\ \mathcal{N}\left(\mathbf{M}^{-1}\mathbf{W}^{T}(\mathbf{y}-\mathbf{\mu}),\ \sigma^{2}\mathbf{M}^{-1}\right),$$ where $$\mathbf{M} = \mathbf{W}^{T}\mathbf{W} + \sigma^{2}\mathbf{I}$$ is q × q.
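
The posterior mentioned here has the standard closed form $$p(\mathbf{x}|\mathbf{y}) = \mathcal{N}(\mathbf{M}^{-1}\mathbf{W}^{T}(\mathbf{y}-\mathbf{\mu}),\ \sigma^{2}\mathbf{M}^{-1})$$ with $$\mathbf{M} = \mathbf{W}^{T}\mathbf{W} + \sigma^{2}\mathbf{I}$$. A short sketch of computing it, again with illustrative values for d, q, sigma, W and mu:

```python
import numpy as np

rng = np.random.default_rng(3)
d, q, sigma = 4, 2, 0.3
W = rng.standard_normal((d, q))    # illustrative random mapping
mu = np.zeros(d)

# M = W^T W + sigma^2 I is only q x q, so the posterior is cheap to compute.
M = W.T @ W + sigma**2 * np.eye(q)

y = rng.standard_normal(d)         # an observed point

post_mean = np.linalg.solve(M, W.T @ (y - mu))  # M^{-1} W^T (y - mu)
post_cov = sigma**2 * np.linalg.inv(M)          # sigma^2 M^{-1}

print(post_mean.shape, post_cov.shape)  # (2,) (2, 2)
```

Note that the q × q system is solved rather than inverting any d × d matrix, which is what makes projecting points into the latent space efficient in high dimensions.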