Mills ratio

In probability theory, the Mills ratio (or Mills's ratio) of a continuous random variable $$X$$ is the function


 * $$m(x) := \frac{\bar{F}(x)}{f(x)} ,$$

where $$f(x)$$ is the probability density function, and


 * $$\bar{F}(x) := \Pr[X>x] = \int_x^{+\infty} f(u)\, du$$

is the complementary cumulative distribution function (also called survival function). The concept is named after John P. Mills. The Mills ratio is related to the hazard rate $$h(x)$$, which is defined as


 * $$h(x):=\lim_{\delta\to 0} \frac{1}{\delta}\Pr[x < X \leq x + \delta | X > x]$$

by


 * $$m(x) = \frac{1}{h(x)}.$$

Upper and lower bounds
When $$X$$ has a standard normal distribution then the following bounds hold for $$x>0$$:


 * $$\frac{x}{x^2 + 1} < m(x) < \frac{1}{x}$$
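These bounds are easy to check numerically at a few sample points; a minimal sketch using SciPy's standard normal:

```python
from scipy.stats import norm

def mills_ratio(x):
    # Mills ratio of the standard normal
    return norm.sf(x) / norm.pdf(x)

# Verify x/(x^2+1) < m(x) < 1/x on a few positive points
for x in (0.5, 1.0, 2.0, 5.0):
    m = mills_ratio(x)
    assert x / (x**2 + 1) < m < 1 / x, (x, m)
print("bounds hold at the sampled points")
```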

Example
If $$X$$ has a standard normal distribution, then
 * $$m(x) \sim 1/x, \, $$

where the sign $$\sim$$ means that the quotient of the two functions converges to 1 as $$x\to+\infty$$; see Q-function for details. More precise asymptotics can be given.
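The convergence can be observed directly by tabulating $$x \, m(x)$$, which should approach 1 as $$x$$ grows (a quick numerical check, not from the source):

```python
from scipy.stats import norm

def mills_ratio(x):
    # Mills ratio of the standard normal
    return norm.sf(x) / norm.pdf(x)

# x * m(x) tends to 1 as x -> +infinity
for x in (2.0, 5.0, 10.0, 20.0):
    print(x, x * mills_ratio(x))
```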

Inverse Mills ratio
The inverse Mills ratio is the ratio of the probability density function to the complementary cumulative distribution function of a distribution. Its use is often motivated by the following property of the truncated normal distribution. If X is a random variable having a normal distribution with mean $$\mu$$ and variance $$\sigma^2$$, then


 * $$\begin{align}

& \operatorname{E}[\,X\,|\ X > \alpha \,] = \mu + \sigma \frac {\phi\big(\tfrac{\alpha-\mu}{\sigma}\big)}{1-\Phi\big(\tfrac{\alpha-\mu}{\sigma}\big)}, \\ & \operatorname{E}[\,X\,|\ X < \alpha \,] = \mu - \sigma \frac {\phi\big(\tfrac{\alpha-\mu}{\sigma}\big)}{\Phi\big(\tfrac{\alpha-\mu}{\sigma}\big)}, \end{align}$$

where $$\alpha$$ is a constant, $$\phi$$ denotes the standard normal density function, and $$\Phi$$ is the standard normal cumulative distribution function. The two fractions are the inverse Mills ratios.
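The first identity can be verified against SciPy's truncated normal distribution; the parameter values below are arbitrary:

```python
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, alpha = 1.0, 2.0, 0.5
a = (alpha - mu) / sigma  # standardized truncation point

# Closed form: E[X | X > alpha] = mu + sigma * phi(a) / (1 - Phi(a))
closed_form = mu + sigma * norm.pdf(a) / (1 - norm.cdf(a))

# The same moment from the truncated normal distribution itself
truncated_mean = truncnorm(a, np.inf, loc=mu, scale=sigma).mean()

print(closed_form, truncated_mean)  # the two agree
```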

Use in regression
A common application of the inverse Mills ratio (sometimes also called the “non-selection hazard”) arises in regression analysis to take account of a possible selection bias. If a dependent variable is censored (i.e., a positive outcome is not observed for all observations), observations concentrate at zero. This problem was first acknowledged by Tobin (1958), who showed that if this is not taken into consideration in the estimation procedure, an ordinary least squares estimation will produce biased parameter estimates. With censored dependent variables, the Gauss–Markov assumption of zero correlation between the independent variables and the error term is violated.

James Heckman proposed a two-stage estimation procedure using the inverse Mills ratio to correct for the selection bias. In the first step, a regression for observing a positive outcome of the dependent variable is modeled with a probit model. The inverse Mills ratio must be generated from the estimation of a probit model; a logit model cannot be used. The probit model assumes that the error term follows a standard normal distribution. The estimated parameters are used to calculate the inverse Mills ratio, which is then included as an additional explanatory variable in the OLS estimation.
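A sketch of the two-step procedure on hypothetical simulated data, using only NumPy and SciPy; the data-generating process, coefficient values, and variable names are invented for illustration and are not from the source:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: z enters selection but not the outcome (an exclusion
# restriction); the errors are correlated, which is the source of the bias.
x = rng.normal(size=n)
z = rng.normal(size=n)
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
selected = (-0.5 + x + z + e[:, 0]) > 0      # probit selection rule
y = 1.0 + 2.0 * x + e[:, 1]                  # outcome, observed only if selected

# Step 1: probit of the selection indicator on (1, x, z) by maximum likelihood.
W = np.column_stack([np.ones(n), x, z])

def negloglik(g):
    wg = W @ g
    return -(norm.logcdf(wg[selected]).sum() + norm.logsf(wg[~selected]).sum())

gamma = minimize(negloglik, np.zeros(3)).x

# Inverse Mills ratio evaluated at the fitted probit index.
imr = norm.pdf(W @ gamma) / norm.cdf(W @ gamma)

# Step 2: OLS of y on (1, x, imr) over the selected sample only.
X = np.column_stack([np.ones(n), x, imr])[selected]
beta, *_ = np.linalg.lstsq(X, y[selected], rcond=None)
print(beta[1])  # slope on x, close to the true value 2.0
```

Without the inverse Mills ratio column, the second-stage OLS on the selected sample alone would give a biased slope, because the selection rule depends on x and the errors are correlated.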