Generalized Pareto distribution

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location $$\mu$$, scale $$\sigma$$, and shape $$\xi$$. It is sometimes specified by only scale and shape, and sometimes by only its shape parameter. Some references give the shape parameter as $$ \kappa = - \xi \,$$.

Definition
The standard cumulative distribution function (cdf) of the GPD is defined by


 * $$F_{\xi}(z) = \begin{cases} 1 - \left(1 + \xi z\right)^{-1/\xi} & \text{for }\xi \neq 0, \\ 1 - e^{-z} & \text{for }\xi = 0, \end{cases}$$

where the support is $$ z \geq 0 $$ for $$ \xi \geq 0$$ and $$ 0 \leq z \leq - 1 /\xi $$ for $$ \xi < 0$$. The corresponding probability density function (pdf) is


 * $$f_{\xi}(z) = \begin{cases} (1 + \xi z)^{-\frac{\xi +1}{\xi }} & \text{for }\xi \neq 0, \\ e^{-z} & \text{for }\xi = 0. \end{cases}$$
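As a concrete sketch, the standard cdf and pdf can be coded directly from the two definitions above; the function names and the boundary handling are my own choices, not part of any standard library:

```python
import math

def gpd_cdf(z, xi):
    """Standard GPD cdf F_xi(z); boundary handling is a sketch."""
    if z < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z)
    if xi < 0 and z >= -1.0 / xi:
        return 1.0  # at or above the upper endpoint -1/xi of the support
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)

def gpd_pdf(z, xi):
    """Standard GPD pdf f_xi(z) on the interior of the support."""
    if xi == 0.0:
        return math.exp(-z)
    return (1.0 + xi * z) ** (-(xi + 1.0) / xi)

# The xi != 0 branch tends to the exponential branch as xi -> 0:
print(gpd_cdf(1.3, 1e-9), gpd_cdf(1.3, 0.0))  # nearly identical
```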

Characterization
The related location-scale family of distributions is obtained by replacing the argument $$z$$ by $$\frac{x-\mu}{\sigma}$$ and adjusting the support accordingly.

The cumulative distribution function of $$X \sim GPD(\mu, \sigma, \xi)$$ ($$\mu\in\mathbb R$$, $$\sigma>0$$, and $$\xi\in\mathbb R$$) is


 * $$F_{(\mu,\sigma,\xi)}(x) = \begin{cases} 1 - \left(1+ \frac{\xi(x-\mu)}{\sigma}\right)^{-1/\xi} & \text{for }\xi \neq 0, \\ 1 - \exp \left(-\frac{x-\mu}{\sigma}\right) & \text{for }\xi = 0, \end{cases}$$ where the support of $$X$$ is $$ x \geqslant \mu $$ when $$ \xi \geqslant 0 \,$$, and $$ \mu \leqslant x \leqslant \mu - \sigma /\xi $$ when $$ \xi < 0$$.

The probability density function (pdf) of $$X \sim GPD(\mu, \sigma, \xi)$$ is


 * $$f_{(\mu,\sigma,\xi)}(x) = \frac{1}{\sigma}\left(1 + \frac{\xi (x-\mu)}{\sigma}\right)^{\left(-\frac{1}{\xi} - 1\right)}$$,

again, for $$ x \geqslant \mu $$ when $$ \xi \geqslant 0$$, and $$ \mu \leqslant x \leqslant \mu - \sigma /\xi $$ when $$ \xi < 0$$.

The pdf is a solution of the following differential equation:


 * $$\left\{\begin{array}{l} f'(x) (-\mu \xi +\sigma+\xi x)+(\xi+1) f(x)=0, \\ f(0)=\frac{\left(1-\frac{\mu \xi}{\sigma}\right)^{-\frac{1}{\xi }-1}}{\sigma} \end{array}\right\}$$
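The differential equation can be sanity-checked numerically with a central difference for $$f'(x)$$; the parameter values below are arbitrary illustrations:

```python
# Check that the location-scale pdf satisfies
#   f'(x) (sigma + xi (x - mu)) + (xi + 1) f(x) = 0
# at an arbitrary interior point, using a central difference for f'(x).
mu, sigma, xi = 0.5, 2.0, 0.3

def f(x):
    return (1.0 / sigma) * (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi - 1.0)

x, h = 1.7, 1e-6
fprime = (f(x + h) - f(x - h)) / (2.0 * h)
residual = fprime * (sigma + xi * (x - mu)) + (xi + 1.0) * f(x)
print(abs(residual))  # ~0, up to finite-difference error
```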

Special cases

 * If the shape $$\xi$$ and location $$\mu$$ are both zero, the GPD is equivalent to the exponential distribution with mean $$\sigma$$.
 * With shape $$\xi = -1$$ and location $$\mu = 0$$, the GPD is equivalent to the continuous uniform distribution $$U(0, \sigma)$$.
 * With shape $$\xi > 0$$ and location $$\mu = \sigma/\xi$$, the GPD is equivalent to the Pareto distribution with scale $$x_m=\sigma/\xi$$ and shape $$\alpha=1/\xi$$.
 * If $$X \sim GPD(\mu = 0, \sigma, \xi)$$, then $$Y = \log (X) \sim exGPD(\sigma, \xi)$$. (exGPD stands for the exponentiated generalized Pareto distribution.)
 * The GPD is similar to the Burr distribution.
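Two of the special cases above can be spot-checked against the three-parameter cdf; the numerical values are arbitrary:

```python
import math

def gpd_cdf(x, mu, sigma, xi):
    """Three-parameter GPD cdf on the interior of the support (a sketch)."""
    if xi == 0.0:
        return 1.0 - math.exp(-(x - mu) / sigma)
    return 1.0 - (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)

sigma = 2.0

# xi = -1, mu = 0: the cdf reduces to x / sigma, i.e. Uniform(0, sigma)
print(gpd_cdf(0.7, 0.0, sigma, -1.0), 0.7 / sigma)

# xi = 0.5, mu = sigma/xi: the cdf reduces to the Pareto cdf 1 - (x/x_m)^(-alpha)
xi = 0.5
x_m, alpha = sigma / xi, 1.0 / xi
print(gpd_cdf(6.0, x_m, sigma, xi), 1.0 - (6.0 / x_m) ** (-alpha))
```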

Generating GPD random variables
If $$U$$ is uniformly distributed on $$(0, 1]$$, then


 * $$ X = \mu + \frac{\sigma (U^{-\xi}-1)}{\xi} \sim GPD(\mu, \sigma, \xi \neq 0)$$

and
 * $$ X = \mu - \sigma \ln(U) \sim GPD(\mu,\sigma,\xi =0).$$

Both formulas are obtained by inversion of the cdf.

In the MATLAB Statistics Toolbox, the "gprnd" function can be used to generate generalized Pareto random numbers.
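A minimal Python sketch of the inversion method (the function name and seed are my own choices; this is not MATLAB's gprnd):

```python
import math
import random

def gpd_rvs(mu, sigma, xi, rng):
    """One GPD(mu, sigma, xi) draw by inverting the cdf (a sketch)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], as the formulas require
    if xi == 0.0:
        return mu - sigma * math.log(u)
    return mu + sigma * (u ** -xi - 1.0) / xi

rng = random.Random(0)
sample = [gpd_rvs(0.0, 1.0, 0.25, rng) for _ in range(200_000)]
# For xi < 1 the GPD mean is mu + sigma / (1 - xi); here 1 / 0.75, about 1.333
print(sum(sample) / len(sample))
```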

GPD as an Exponential-Gamma Mixture
A GPD random variable can also be expressed as an exponential random variable whose rate parameter is itself Gamma distributed. If


 * $$X|\Lambda \sim \operatorname{Exp}(\Lambda)  $$

and
 * $$\Lambda \sim \operatorname{Gamma}(\alpha, \beta)  $$

then
 * $$X \sim \operatorname{GPD}(\xi = 1/\alpha, \ \sigma = \beta/\alpha)  $$

Note, however, that since the parameters of the Gamma distribution must be greater than zero, we obtain the additional restriction that $$\xi$$ must be positive.

In addition to this mixture (or compound) expression, the generalized Pareto distribution can also be expressed as a simple ratio. Concretely, for $$Y \sim \text{Exponential}(1)$$ and $$Z \sim \text{Gamma}(1/\xi, 1)$$, we have $$\mu + \sigma \frac{Y}{\xi Z} \sim \text{GPD}(\mu,\sigma,\xi)$$. This is a consequence of the mixture after setting $$\beta = \alpha$$ and noting that the rate parameter of the exponential (or gamma) distribution acts simply as an inverse scale factor.
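This mixture representation can be verified by simulation; below, Gamma(α, β) is read as shape α and rate β (the convention under which σ = β/α emerges), and the parameter values are arbitrary:

```python
import random

alpha, beta = 4.0, 2.0
xi, sigma = 1.0 / alpha, beta / alpha   # implied GPD parameters (mu = 0)

rng = random.Random(1)
xs = []
for _ in range(200_000):
    # gammavariate takes (shape, SCALE), so the rate beta enters as 1/beta
    lam = rng.gammavariate(alpha, 1.0 / beta)
    xs.append(rng.expovariate(lam))     # X | Lambda ~ Exp(rate Lambda)

# GPD(0, sigma, xi) has mean sigma / (1 - xi) for xi < 1; here 0.5/0.75
print(sum(xs) / len(xs))
```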

The exponentiated generalized Pareto distribution (exGPD)
If $$X \sim GPD(\mu = 0, \sigma, \xi)$$, then $$Y = \log (X)$$ is distributed according to the exponentiated generalized Pareto distribution, denoted by $$Y \sim exGPD(\sigma, \xi)$$.

The probability density function (pdf) of $$Y \sim exGPD(\sigma, \xi)$$ (with $$\sigma > 0$$) is


 * $$ g_{(\sigma, \xi)}(y) = \begin{cases} \frac{e^y}{\sigma}\bigg( 1 + \frac{\xi e^y}{\sigma} \bigg)^{-1/\xi -1} & \text{for } \xi \neq 0, \\ \frac{1}{\sigma}e^{y - e^{y}/\sigma} & \text{for } \xi = 0, \end{cases}$$ where the support is $$ -\infty < y < \infty $$ for $$ \xi \geq 0 $$, and $$ -\infty < y \leq \log(-\sigma/\xi)$$ for $$ \xi < 0 $$.

For all $$\xi$$, $$\log \sigma$$ serves as the location parameter. See the right panel for the pdf when the shape $$\xi$$ is positive.

The exGPD has finite moments of all orders for all $$\sigma>0$$ and $$-\infty< \xi < \infty $$.



The moment-generating function of $$ Y \sim exGPD(\sigma,\xi)$$ is
 * $$ M_Y(s) = E[e^{sY}] = \begin{cases} -\frac{1}{\xi}\bigg(-\frac{\sigma}{\xi}\bigg)^{s} B(s+1, -1/\xi) & \text{for } s \in (-1, \infty), \ \xi < 0, \\ \frac{1}{\xi}\bigg(\frac{\sigma}{\xi}\bigg)^{s} B(s+1, 1/\xi - s) & \text{for } s \in (-1, 1/\xi), \ \xi > 0, \\ \sigma^{s} \Gamma(1+s) & \text{for } s \in (-1, \infty), \ \xi = 0, \end{cases}$$ where $$B(a,b)$$ and $$\Gamma(a)$$ denote the beta function and gamma function, respectively.
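The $$\xi = 0$$ branch can be checked by simulation: if $$X \sim GPD(0, \sigma, 0)$$ (an exponential with scale $$\sigma$$) and $$Y = \log X$$, then $$E[e^{sY}] = E[X^s]$$ should equal $$\sigma^s \Gamma(1+s)$$. The values of $$s$$ and $$\sigma$$ below are arbitrary:

```python
import math
import random

sigma, s = 2.0, 0.5
rng = random.Random(2)
n = 200_000
# Empirical E[X^s] for X exponential with scale sigma (i.e. rate 1/sigma)
emp = sum(rng.expovariate(1.0 / sigma) ** s for _ in range(n)) / n
print(emp, sigma ** s * math.gamma(1.0 + s))  # the two should nearly agree
```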

The expected value of $$Y \sim exGPD(\sigma, \xi)$$ depends on both the scale $$\sigma$$ and shape $$\xi$$ parameters, with $$\xi$$ entering through the digamma function $$\psi$$:
 * $$ E[Y] = \begin{cases} \log\bigg(-\frac{\sigma}{\xi} \bigg)+ \psi(1) - \psi(-1/\xi+1) & \text{for }\xi < 0, \\ \log\bigg(\frac{\sigma}{\xi} \bigg)+ \psi(1) - \psi(1/\xi) & \text{for }\xi > 0, \\ \log \sigma + \psi(1) & \text{for }\xi = 0. \end{cases}$$ Note that for any fixed $$\xi \in (-\infty,\infty)$$, $$\log \sigma$$ plays the role of a location parameter under the exponentiated generalized Pareto distribution.

The variance of $$ Y $$ $$\sim$$ $$exGPD$$ $$($$$$\sigma$$, $$\xi$$ $$)$$ depends on the shape parameter $$ \xi $$ only through the polygamma function of order 1 (also called the trigamma function):
 * $$ Var[Y] = \begin{cases} \psi'(1) - \psi'(-1/\xi +1) & \text{for }\xi < 0, \\ \psi'(1) + \psi'(1/\xi) & \text{for }\xi > 0, \\ \psi'(1) & \text{for }\xi = 0. \end{cases}$$ See the right panel for the variance as a function of $$\xi$$. Note that $$ \psi'(1) = \pi^2/6 \approx 1.644934 $$.
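For instance, with $$\xi = 1$$ and $$\sigma = 1$$ the formula gives $$Var[Y] = 2\psi'(1) = \pi^2/3 \approx 3.29$$, which a simulation of $$Y = \log X$$ reproduces (the sampling setup is my own):

```python
import math
import random

rng = random.Random(3)
n = 200_000
ys = []
for _ in range(n):
    u = 1.0 - rng.random()       # uniform on (0, 1]
    x = u ** -1.0 - 1.0          # GPD(0, 1, 1) draw by cdf inversion
    ys.append(math.log(x))       # Y = log X ~ exGPD(1, 1)

mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(var, math.pi ** 2 / 3)     # the two should nearly agree
```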

Note that under $$Y \sim exGPD(\sigma, \xi)$$ the roles of the scale parameter $$\sigma$$ and the shape parameter $$\xi$$ are separately interpretable, which may lead to more robust and efficient estimation of $$\xi$$ than working with $$X \sim GPD(\sigma, \xi)$$ directly. Under $$X \sim GPD(\mu=0,\sigma, \xi)$$ the roles of the two parameters are intertwined (at least up to the second central moment); see the formula for the variance $$Var(X)$$, in which both parameters appear.

Hill's estimator
Assume that $$ X_{1:n} = (X_1, \cdots, X_n) $$ are $$n$$ observations (not necessarily i.i.d.) from an unknown heavy-tailed distribution $$ F $$ whose tail distribution is regularly varying with tail index $$1/\xi $$ (so the corresponding shape parameter is $$\xi $$). Specifically, the tail distribution is described as

$$\bar{F}(x) = 1 - F(x) = L(x) \cdot x^{-1/\xi}, \quad \text{for some }\xi>0, \text{ where } L \text{ is a slowly varying function.}$$ It is of particular interest in extreme value theory to estimate the shape parameter $$\xi$$, especially when $$\xi$$ is positive (the so-called heavy-tailed case).

Let $$F_u$$ be the conditional excess distribution function above a threshold $$u$$. The Pickands–Balkema–de Haan theorem (Pickands, 1975; Balkema and de Haan, 1974) states that for a large class of underlying distribution functions $$F$$ and large $$u$$, $$F_u$$ is well approximated by the generalized Pareto distribution, which motivates Peaks Over Threshold (POT) methods for estimating $$\xi$$: the GPD plays the key role in the POT approach.

A renowned estimator built on the POT methodology is the Hill estimator, formulated as follows. For $$ 1\leq i \leq n $$, write $$ X_{(i)} $$ for the $$i$$-th largest value of $$ X_1, \cdots, X_n $$. With this notation, the Hill estimator (see page 190 of Reference 5 by Embrechts et al.) based on the $$k$$ upper order statistics is defined as

$$\widehat{\xi}_{k}^{\text{Hill}} = \widehat{\xi}_{k}^{\text{Hill}}(X_{1:n}) = \frac{1}{k-1} \sum_{j=1}^{k-1} \log \bigg(\frac{X_{(j)}}{X_{(k)}} \bigg), \quad \text{for } 2 \leq k \leq n.$$ In practice, the Hill estimator is used as follows. First, calculate the estimator $$\widehat{\xi}_{k}^{\text{Hill}}$$ at each integer $$k \in \{ 2, \cdots, n\}$$, and plot the ordered pairs $$\{(k,\widehat{\xi}_{k}^{\text{Hill}})\}_{k=2}^{n}$$. Then, select from the set of Hill estimators $$\{\widehat{\xi}_{k}^{\text{Hill}}\}_{k=2}^{n}$$ those that are roughly constant with respect to $$k$$: these stable values are regarded as reasonable estimates of the shape parameter $$\xi$$. If $$ X_1, \cdots, X_n $$ are i.i.d., then the Hill estimator is a consistent estimator of the shape parameter $$\xi$$.
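The consistency claim can be illustrated by simulation: on i.i.d. Pareto data with tail index $$a$$ (true shape $$\xi = 1/a$$), the Hill estimate should hover near $$1/a$$. The sampling setup below is my own illustration:

```python
import math
import random

rng = random.Random(4)
a = 2.0                                  # Pareto tail index; true xi = 1/a = 0.5
data = [rng.paretovariate(a) for _ in range(50_000)]

def hill(data, k):
    """Hill estimate of xi from the k upper order statistics (2 <= k <= n)."""
    xs = sorted(data, reverse=True)      # xs[0] is the largest observation
    return sum(math.log(xs[j] / xs[k - 1]) for j in range(k - 1)) / (k - 1)

print(hill(data, 2_000))                 # should hover near 0.5
```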

Note that the Hill estimator $$\widehat{\xi}_{k}^{\text{Hill}}$$ makes use of a log-transformation of the observations $$ X_{1:n} = (X_1, \cdots, X_n) $$. (The Pickands estimator $$\widehat{\xi}_{k}^{\text{Pickands}}$$ also employs a log-transformation, but in a slightly different way.)