Normal-inverse-Wishart distribution

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).

Definition
Suppose


 * $$ \boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right)$$

has a multivariate normal distribution with mean $$\boldsymbol\mu_0$$ and covariance matrix $$\tfrac{1}{\lambda}\boldsymbol\Sigma$$, where


 * $$\boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)$$

has an inverse Wishart distribution. Then $$(\boldsymbol\mu,\boldsymbol\Sigma) $$ has a normal-inverse-Wishart distribution, denoted as
 * $$ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu). $$

Probability density function

 * $$f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)$$

The full version of the PDF is as follows:

$$ f(\boldsymbol{\mu},\boldsymbol{\Sigma} | \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu ) =\frac{\lambda^{D/2}|\boldsymbol{\Psi}|^{\nu / 2}|\boldsymbol{\Sigma}|^{-\frac{\nu + D + 2}{2}}}{(2 \pi)^{D/2}2^{\frac{\nu D}{2}}\Gamma_D(\frac{\nu}{2})}\text{exp}\left\{ -\frac{1}{2}Tr(\boldsymbol{\Psi   \Sigma}^{-1})-\frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}  - \boldsymbol{\mu}_0) \right\}$$

Here $$D$$ is the dimension of $$\boldsymbol\mu$$, $$\Gamma_D(\cdot)$$ is the multivariate gamma function, and $$Tr(\cdot)$$ denotes the trace of its matrix argument.
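The density above can be checked numerically. The following is a minimal sketch, assuming NumPy and SciPy are available. The function names `niw_logpdf` and `niw_logpdf_factored` are illustrative and not part of any library. The first evaluates the logarithm of the full formula term by term; the second uses the factorization into a normal and an inverse Wishart density via `scipy.stats`.

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import invwishart, multivariate_normal


def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
    """Log of the full NIW density, evaluated term by term (illustrative helper)."""
    D = mu0.shape[0]
    diff = mu - mu0
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_Psi = np.linalg.slogdet(Psi)
    # Log of the normalizing constant:
    # lambda^{D/2} |Psi|^{nu/2} / ((2 pi)^{D/2} 2^{nu D / 2} Gamma_D(nu/2))
    log_norm = (0.5 * D * np.log(lam) + 0.5 * nu * logdet_Psi
                - 0.5 * D * np.log(2.0 * np.pi)
                - 0.5 * nu * D * np.log(2.0)
                - multigammaln(0.5 * nu, D))
    # Log of |Sigma|^{-(nu + D + 2)/2} times the exponential term
    log_kernel = (-0.5 * (nu + D + 2) * logdet_Sigma
                  - 0.5 * np.trace(Psi @ Sigma_inv)
                  - 0.5 * lam * diff @ Sigma_inv @ diff)
    return log_norm + log_kernel


def niw_logpdf_factored(mu, Sigma, mu0, lam, Psi, nu):
    """Same log-density as the sum of a normal and an inverse Wishart log-density."""
    return (multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
            + invwishart.logpdf(Sigma, df=nu, scale=Psi))
```

For any valid parameters (positive definite $$\boldsymbol\Psi$$ and $$\boldsymbol\Sigma$$, $$\lambda > 0$$, $$\nu > D - 1$$) the two functions should agree up to floating-point error.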

Marginal distributions
By construction, the marginal distribution over $$\boldsymbol\Sigma$$ is an inverse Wishart distribution, and the conditional distribution over $$\boldsymbol\mu$$ given $$\boldsymbol\Sigma$$ is a multivariate normal distribution. The marginal distribution over $$\boldsymbol\mu$$ is a multivariate t-distribution.
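
As a supplement not stated in the text above, the marginal over $$\boldsymbol\mu$$ can be written out explicitly. The parameters below are the standard textbook form and assume $$\nu > D - 1$$:

$$ \boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0,\ \frac{\boldsymbol\Psi}{\lambda(\nu-D+1)}\right) $$

That is, a multivariate t-distribution with $$\nu-D+1$$ degrees of freedom, location $$\boldsymbol\mu_0$$ and scale matrix $$\boldsymbol\Psi/(\lambda(\nu-D+1))$$. For instance, with $$D=2$$, $$\lambda=2$$ and $$\nu=5$$ the marginal has $$4$$ degrees of freedom and scale matrix $$\boldsymbol\Psi/8$$.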

Posterior distribution of the parameters
Suppose the sampling density is a multivariate normal distribution


 * $$\boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)$$

where $$\boldsymbol{y}$$ is an $$n\times p$$ matrix whose row $$i$$ is the observation $$\boldsymbol{y_i}$$ (of length $$p$$); here $$p$$ plays the role of the dimension $$D$$ used above.

When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly:



$$ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu). $$

The resulting posterior distribution for the mean and covariance matrix is also a normal-inverse-Wishart distribution:



$$ (\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n), $$

where



$$ \boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n} $$



$$ \lambda_n = \lambda + n $$



$$ \nu_n = \nu + n $$



$$ \boldsymbol\Psi_n = \boldsymbol{\Psi + S} +\frac{\lambda n}{\lambda+n} (\boldsymbol{\bar{y}-\mu_0})(\boldsymbol{\bar{y}-\mu_0})^T \quad\text{with}\quad \boldsymbol{S}= \sum_{i=1}^{n} (\boldsymbol{y_i-\bar{y}})(\boldsymbol{y_i-\bar{y}})^T. $$
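
The update equations translate directly into code. The following is a minimal sketch, assuming NumPy. The function name `niw_posterior` is illustrative, and the data are assumed to be stored as an $$n\times p$$ array with one observation per row.

```python
import numpy as np


def niw_posterior(y, mu0, lam, Psi, nu):
    """Return (mu_n, lam_n, Psi_n, nu_n) for NIW prior parameters and data y (n x p)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)               # sample mean, length p
    centered = y - ybar
    S = centered.T @ centered           # scatter matrix about the sample mean
    diff = ybar - mu0
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    Psi_n = Psi + S + (lam * n / (lam + n)) * np.outer(diff, diff)
    return mu_n, lam_n, Psi_n, nu_n


# Small worked example with made-up numbers (n = 3, p = 2).
y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu_n, lam_n, Psi_n, nu_n = niw_posterior(y, np.zeros(2), 1.0, np.eye(2), 4.0)
# mu_n  -> [2.25, 3.0], lam_n -> 4.0, nu_n -> 7.0
# Psi_n -> [[15.75, 17.0], [17.0, 21.0]]
```

In the example the sample mean is $$(3, 4)$$, so the posterior mean is pulled from the prior mean $$(0, 0)$$ three quarters of the way towards it, reflecting $$n = 3$$ observations against a prior weight of $$\lambda = 1$$.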

To sample from the joint posterior of $$(\boldsymbol\mu,\boldsymbol\Sigma)$$, first draw $$\boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n)$$, then draw $$\boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n)$$. To draw from the posterior predictive distribution of a new observation, draw $$\tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)$$, given the already drawn values of $$\boldsymbol\mu$$ and $$\boldsymbol\Sigma$$.
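
The three-stage draw described above can be sketched as follows, assuming NumPy and SciPy. The function name `sample_posterior_predictive` is illustrative, and the posterior parameters are taken as given inputs rather than computed from data.

```python
import numpy as np
from scipy.stats import invwishart


def sample_posterior_predictive(mu_n, lam_n, Psi_n, nu_n, size, rng):
    """Draw `size` joint samples (mu, Sigma, y_new) given posterior NIW parameters."""
    p = mu_n.shape[0]
    # Stage 1: Sigma | y ~ inverse Wishart(Psi_n, nu_n)
    Sigmas = invwishart.rvs(df=nu_n, scale=Psi_n, size=size, random_state=rng)
    Sigmas = np.reshape(Sigmas, (size, p, p))   # keeps the shape uniform when size == 1
    mus = np.empty((size, p))
    y_new = np.empty((size, p))
    for s in range(size):
        # Stage 2: mu | Sigma, y ~ N(mu_n, Sigma / lam_n)
        mus[s] = rng.multivariate_normal(mu_n, Sigmas[s] / lam_n)
        # Stage 3: y_new | mu, Sigma ~ N(mu, Sigma)
        y_new[s] = rng.multivariate_normal(mus[s], Sigmas[s])
    return mus, Sigmas, y_new
```

Discarding the `mus` and `Sigmas` columns and keeping only `y_new` gives draws from the posterior predictive distribution with the parameters integrated out.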

Generating normal-inverse-Wishart random variates
Generation of random variates is straightforward:
 * 1) Sample $$\boldsymbol\Sigma$$ from an inverse Wishart distribution with parameters $$\boldsymbol\Psi$$ and $$\nu$$
 * 2) Sample $$\boldsymbol\mu$$ from a multivariate normal distribution with mean $$\boldsymbol\mu_0$$ and covariance matrix $$\tfrac{1}{\lambda} \boldsymbol\Sigma$$
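
The two steps above can be sketched in a few lines, assuming NumPy and SciPy. The function name `sample_niw` is illustrative, and the sketch assumes dimension $$D \ge 2$$, since `invwishart.rvs` returns a scalar rather than a matrix in one dimension.

```python
import numpy as np
from scipy.stats import invwishart


def sample_niw(mu0, lam, Psi, nu, rng):
    """Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu); assumes D >= 2."""
    # Step 1: Sigma from an inverse Wishart with scale Psi and nu degrees of freedom
    Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)
    # Step 2: mu from a normal with mean mu0 and covariance Sigma / lam
    mu = rng.multivariate_normal(mu0, Sigma / lam)
    return mu, Sigma


rng = np.random.default_rng(1)
mu, Sigma = sample_niw(np.array([0.0, 1.0]), 2.0,
                       np.array([[2.0, 0.5], [0.5, 1.0]]), 12.0, rng)
```

As a sanity check, averaging many such draws should approach $$\boldsymbol\mu_0$$ for $$\boldsymbol\mu$$ and the inverse Wishart mean $$\boldsymbol\Psi/(\nu-D-1)$$ for $$\boldsymbol\Sigma$$ (the latter requires $$\nu > D+1$$).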

Related distributions

 * The normal-Wishart distribution is essentially the same distribution parameterized by precision rather than variance. If $$ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)$$ then $$(\boldsymbol\mu,\boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi^{-1},\nu)$$.
 * The normal-inverse-gamma distribution is the one-dimensional equivalent.
 * The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.