
In statistics, the coefficient of variation is defined as the ratio of the standard deviation to the mean ($$c_v=\frac{\sigma}{\mu}$$). It is frequently used in chemistry, biology, and other laboratory sciences for measurements such as weights and the concentrations of viruses, bacteria, blood, and proteins. Constant coefficient of variation models can be used to model data whose coefficient of variation is constant.

Constant coefficient of variation
In linear regression, the variance of the response variable is assumed to be constant, $$\mathrm{var}(y)=\sigma^2$$. Poisson regression assumes that the variance of the data is equal to the mean. In constant coefficient of variation models, the standard deviation is proportional to the mean, $$\sigma \propto Ey$$, which can also be written as

$$\mathrm{v}=\frac{\mathrm{var}(y)}{(Ey)^2}$$

where $$\mathrm{v}$$ is a constant, and $$\mathrm{var}(y)=\sigma^2=\text{v}(Ey)^2=\text{v}\mu^2$$,  $$\mu=Ey$$.

Data with constant coefficient of variation are always non-negative. By Taylor expansion of $$\mathrm{log}\ y$$ about $$Ey$$, assuming the remainders $$R$$ and $$\tilde R$$ are negligible,


 * first order: $$\mathrm{log}\ y=\mathrm{log}(Ey)+(y-Ey)\frac{1}{Ey}+R$$


 * second order: $$\mathrm{log}\ y=\mathrm{log}(Ey)+(y-Ey)\frac{1}{Ey}-\frac{1}{2}(y-Ey)^2\frac{1}{(Ey)^2}+\tilde R$$

Therefore, the log transformation stabilizes the variance for data with constant coefficient of variation. Running least squares after the log transformation, however, biases the intercept by the offset $$-\frac{1}{2}\mathrm{v}$$:

$$E(\mathrm{log}\ y)\approx \mathrm{log}\ \mu-\frac{1}{2}\mathrm{v}$$

$$\mathrm{var}(\mathrm{log}\ y)\approx \frac{\mathrm{var}(y)}{\mu^2}=\mathrm{v}$$, constant.
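The variance-stabilizing effect of the log transformation, and the $$-\frac{1}{2}\mathrm{v}$$ offset in the mean, can be checked numerically. The sketch below (using NumPy, with gamma-distributed samples as one assumed example of constant coefficient of variation data) holds $$\mathrm{v}$$ fixed while varying the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
v = 0.25                    # squared coefficient of variation, held constant
nu = 1.0 / v                # gamma shape giving var(y) = v * mu^2
for mu in (1.0, 10.0, 100.0):
    y = rng.gamma(shape=nu, scale=mu / nu, size=100_000)
    raw_var = y.var()               # grows like v * mu^2 as mu increases
    log_var = np.log(y).var()       # stays roughly constant, near v
    log_mean = np.log(y).mean()     # close to log(mu) - v / 2
```

On the raw scale the variance grows with the square of the mean, while on the log scale it stays near $$\mathrm{v}$$ and the mean of $$\mathrm{log}\ y$$ sits just below $$\mathrm{log}\ \mu$$.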

Gamma regression
In statistics, gamma regression is a generalized linear model in which the response variable follows a gamma distribution; such data are continuous, non-negative, and right-skewed.

Gamma data
The probability density function of gamma-distributed data $$y$$ with parameters $$\lambda$$ and $$\kappa$$ has the form

$$f_{\lambda,\kappa}(y)=\frac{\lambda^\kappa}{\Gamma(\kappa)}y^{\kappa-1}e^{-\lambda y}$$, $$y\ge0$$

Since the gamma distribution is in the exponential family, the reparameterization in terms of $$\mu$$ and $$\nu$$ makes its properties easier to see:

$$f_{\mu,\nu}(y)=\frac{1}{\Gamma(\nu)}\left(\frac{\nu y}{\mu}\right)^{\nu}\exp\left(-\frac{\nu y}{\mu}\right)\frac{1}{y}$$, $$y\ge0$$

where $$\nu$$ is the nuisance parameter, $$\mu$$ is the mean, variance is $$\frac{\mu^2}{\nu}$$, and the variance function is $$V(\mu)=\mu^2$$.

Therefore, gamma regression is a constant coefficient of variation model, because gamma data satisfy $$\frac{\text{var}(y)}{(Ey)^2}=\frac{1}{\nu}$$, a constant.
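As a quick numerical sanity check, the $$(\mu,\nu)$$ parameterization can be simulated with NumPy, which parameterizes the gamma distribution by shape and scale; mean $$\mu$$ with shape $$\nu$$ corresponds to `shape=nu, scale=mu/nu` (the particular values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, nu = 3.0, 5.0
# mean mu with shape nu corresponds to shape=nu, scale=mu/nu
y = rng.gamma(shape=nu, scale=mu / nu, size=200_000)
mean_est = y.mean()                   # close to mu
cv2_est = y.var() / y.mean() ** 2     # close to 1 / nu, regardless of mu
```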

Regression model
The gamma regression model assumes that $$g(\mu)$$ can be modeled by a linear combination of unknown parameters $$\beta$$ and predictors $$X$$:

$$g(\mu)=\eta=X\beta$$, $$\beta \in \mathcal{R}^p$$ and $$X\in \mathcal{R}^{n\times p}$$

where $$\mu=E(y|X)$$ and $$g$$ is the link function of the GLM. The canonical link for gamma regression is the negative inverse link $$g(\mu)=-\frac{1}{\mu}$$. However, the canonical link requires $${\mu}\ne0$$. A more practical choice is the log link $$g(\mu)=\mathrm{log}\ \mu$$, under which the model can be written as

$$\mathrm{log} (E(y|X))=\eta=X\beta$$

The parameter $$\beta$$ can be estimated by maximum likelihood, computed with iteratively reweighted least squares (IRLS).
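For the log link the IRLS iteration is particularly simple: since $$V(\mu)=\mu^2$$ and $$g'(\mu)=1/\mu$$, the working weights are constant, so each step is an ordinary least-squares fit of the working response. A minimal NumPy sketch on simulated data (the sample size, true coefficients, and shape $$\nu$$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])      # design matrix with intercept
beta_true = np.array([1.0, 0.5])
mu = np.exp(X @ beta_true)                # log link: mu = exp(X beta)
nu = 4.0                                  # shape, so var(y) = mu^2 / nu
y = rng.gamma(shape=nu, scale=mu / nu)

# IRLS for gamma regression with log link: with V(mu) = mu^2 and
# g'(mu) = 1/mu the working weights are constant, so each step is an
# unweighted least-squares fit of the working response z on X.
beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu_hat = np.exp(eta)
    z = eta + (y - mu_hat) / mu_hat       # working response
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
```

The estimate `beta` converges to roughly the true coefficients, up to sampling error.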

Transformation models
In statistical modeling, transformations are often used when a model shows lack of fit. Transforming variables presents the data on a different scale, which can improve the modeling assumptions. Transformations can be applied to predictors or to responses. Some common transformation functions include

 * Box-Cox transformations with the form $$h(x) = \begin{cases} x^\alpha & \text{if } \alpha \neq 0 \\ \mathrm{log}\ x & \text{if } \alpha = 0 \end{cases}$$; the simplest cases are linear ($$\alpha=1$$), quadratic ($$\alpha=2$$), or square root ($$\alpha=\frac{1}{2}$$).


 * Inverse transformation: $$h(x)=\frac{1}{x}$$


 * Exponential transformation: $$h(x)=e^x$$


 * Normalization: $$h(x)=\frac{x-x_{min}}{x_{max}-x_{min}}$$


 * Standardization: $$h(x)=\frac{x-\mu}{\sigma}$$


 * Two-parameter transformations: $$h(x_1,x_2)=x_1+x_2$$ or $$h(x_1,x_2)=x_1x_2$$, where $$x_1$$ and $$x_2$$ are two different predictors. The product form should only be considered when there is substantial justification for an interaction effect between the predictors.
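The power/log family above can be sketched as a small helper function (note this is the simple $$x^\alpha$$/log form given here, not the shifted $$(x^\alpha-1)/\alpha$$ variant of the Box-Cox transformation; the function name is illustrative):

```python
import numpy as np

def power_log_transform(x, alpha):
    """Return x**alpha, or log(x) when alpha == 0 (x assumed positive)."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if alpha == 0 else x ** alpha

x = np.array([1.0, 4.0, 9.0])
roots = power_log_transform(x, 0.5)    # square roots: [1., 2., 3.]
logs = power_log_transform(x, 0)       # natural logs
```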

Linear predictor with transformed variables
For the linear predictor $$\eta=X\beta=\sum_{j=1}^q \beta_j x_j$$, each predictor $$x_j$$ can be replaced by a function of $$x_j$$, where $$\beta_j$$ is the corresponding parameter and $$q$$ is the number of predictors. The linear predictor then has the form

$$\eta=\sum_{j=1}^q \tilde\beta_j h_j(x_j)$$

where $$h_j(x_j)$$ is the transformation function for variable $$x_j$$ and $$\tilde\beta_j$$ is the corresponding parameter for the transformed variable $$h_j(x_j)$$. When performing transformations on predictors, the model usually keeps the original predictors and adds the transformed variables. Although transformations can improve the goodness of fit of the model, adding too many transformed predictors can lose the natural information in the original data and make the results harder to interpret. Therefore, model assumptions, their violations, and the primary interest of the analysis should be checked before and after transformation.

Transformed responses
In constant coefficient of variation models, the transformation is instead performed on the response, usually with $$h:\mathcal{R}^{+}\rightarrow \mathcal{R}$$ so that the non-negative response is mapped onto the real line. The model takes the form

$$E(h(Y)|X)=\eta=X\beta$$

It can also be written as

$$h(Y)=X\beta+\epsilon$$

where $$\epsilon$$ is the additive error with mean $$E(\epsilon)=0$$, and finite variance $$\mathrm{var}(\epsilon)=\sigma^2$$.

Log-additive model
The log-additive model is a regression method that applies a log transformation to the response variable. It takes the form

$$\mathrm{log}\ y=\mu+\epsilon$$

The mean and variance can be calculated as

$$Ey=e^{\mu}Ee^{\epsilon}\approx e^\mu=: \tilde\mu$$

$$\mathrm{var}(y)\approx(e^\mu)^2\mathrm{var}(e^\epsilon) \approx \tilde\mu^2\sigma^2$$

Since $$\frac{\mathrm{var}(y)}{(Ey)^2}\approx\sigma^2$$, the log-additive model behaves very similarly to a constant coefficient of variation model.

Log-normal model
The log-normal model is used for data that follow a log-normal distribution, and it works sufficiently well in constant coefficient of variation situations. In the log-normal model, the error term after log transformation is assumed to follow a normal distribution, $$\epsilon \sim N(0,\sigma^2)$$. The model can be implemented by ordinary multiple linear regression after the transformation, and takes the form

$$\mathrm{log}\ y=X\beta+\epsilon$$
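Because the normal error is added on the log scale, the model can be fitted by ordinary least squares on $$\mathrm{log}\ y$$. A minimal NumPy sketch on simulated log-normal data (the coefficients and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.0])
sigma = 0.3
# log-normal response: log y = X beta + eps, with eps ~ N(0, sigma^2)
y = np.exp(X @ beta_true + rng.normal(0.0, sigma, n))

# ordinary least squares on the log-transformed response
beta_hat, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
```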

A transformation model is not a generalized linear model. In a GLM, the non-linear link function $$g$$ transforms the conditional expectation: $$g(E(Y|X))=\eta=X\beta$$. In a transformation model, by contrast, the conditional expectation of the transformed response is expressed as a linear combination of parameters and predictors: $$E(h(Y)|X)=X\beta$$. The link function of gamma regression and the transformation function of the log-normal model are both the logarithm, but the two models are not interchangeable in general, since $$\mathrm{log}\ E(y|X) \ne E(\mathrm{log}\ y|X)$$.

Application
Constant coefficient of variation models are used in various fields, mostly in the laboratory sciences. Examples include applying the log-normal model to estimate the relationship between children's blood lead concentrations and residential dust-lead levels, and using gamma regression to evaluate the relative effects of confounding variables on the antibody concentrations of quantitative assay data. Other fields, such as the behavioral sciences and healthcare, have used the log-normal model to estimate response times or surgical procedure times, in combination with Bayesian or machine learning techniques.