User:Dicklyon/Gamma median

The median of a gamma distribution does not have a closed-form formula, but has some known simple special cases and asymptotic behavior.

On Talk:Gamma distribution we have been talking about approximations for the median of a gamma distribution (well, more accurately, I've been mostly talking to myself there). Here I will discuss some approximations I have come up with for the low-$k$ regime. Per WP:NOR, this is not ready for article space.



The gamma distribution pdf is $$\frac{1}{\Gamma(k) \theta^k} x^{k - 1} e^{-\frac{x}{\theta}}$$, but we'll use $θ = 1$ because both the mean and median scale with the scale parameter. So we'll work with $$p(x) = \frac{1}{\Gamma(k)} x^{k - 1} e^{-x}$$.

The median of this pdf is the heavy black curve in the figures (computed by iteratively solving for 0.5 == gammainc(median, k) in Matlab using Newton–Raphson iteration).
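The same computation can be sketched in plain Python (the original used Matlab's gammainc; the function names here are mine, the CDF is the standard power series for the regularized lower incomplete gamma, and the bisection safeguard is my addition, since a raw Newton step on the CDF can overshoot):

```python
import math

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma P(k, x), via the power series
    P(k, x) = x^k e^{-x} / Gamma(k) * sum_{n>=0} x^n / (k (k+1) ... (k+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / k
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (k + n)
        total += term
    return math.exp(k * math.log(x) - x - math.lgamma(k)) * total

def gamma_pdf(k, x):
    """Gamma pdf with theta = 1: x^{k-1} e^{-x} / Gamma(k)."""
    return math.exp((k - 1.0) * math.log(x) - x - math.lgamma(k))

def gamma_median(k):
    """Solve P(k, nu) = 0.5 for nu by Newton-Raphson,
    safeguarded by a bracketing bisection fallback."""
    lo, hi = 0.0, max(k, 1.0)
    while reg_lower_gamma(k, hi) < 0.5:   # grow bracket until it contains the median
        hi *= 2.0
    nu = math.exp((math.log(0.5) + math.lgamma(k + 1.0)) / k)  # low-k start
    if not lo < nu < hi:
        nu = 0.5 * (lo + hi)
    for _ in range(200):
        f = reg_lower_gamma(k, nu) - 0.5
        lo, hi = (nu, hi) if f < 0.0 else (lo, nu)
        nxt = nu - f / gamma_pdf(k, nu)   # Newton step; the CDF's derivative is the pdf
        if not lo < nxt < hi:             # step left the bracket: bisect instead
            nxt = 0.5 * (lo + hi)
        if abs(nxt - nu) <= 1e-13 * nu:
            return nxt
        nu = nxt
    return nu
```

For $k = 1$ this converges to $\log 2$; the series form of the CDF is practical here because the median is never far above $k$.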

Approximations for high $k$
The Laurent series described in the gamma distribution article, originally by Choi and then extended and corrected by Berg and Pedersen, is increasingly accurate with more terms for high enough $k$, but is not very good around $k = 1$, and diverges below that.

Approximations for low $k$


To find the median, we just need to solve $$\int_0^\nu p(x) dx = 0.5$$ for $\nu$ – but that's hard to solve in closed form. That integral is called the lower incomplete gamma function, and it can be computed as a sum of easy-to-compute terms by expressing the exponential as a power series.


 * $$0.5 = \int_0^\nu p(x) dx = \int_0^\nu \frac{x^{k - 1} }{\Gamma(k)} e^{-x} dx$$
 * $$0.5 = \int_0^\nu \frac{x^{k - 1} }{\Gamma(k)} \sum_{n=0}^\infty \frac{(-x)^n}{n!} dx$$
 * $$0.5 = \frac{1}{\Gamma(k)} \sum_{n=0}^\infty \frac{(-1)^n}{n!} \int_0^\nu x^{n + k - 1} dx$$

But that expression can't be solved for $\nu$, except numerically.

What we can do, however, is take just the first ($n = 0$) term, which is actually pretty accurate when $k$ is low enough. The modified pdf is the upper (dashed) curve in the figure, $$\frac{x^{k - 1}}{\Gamma(k)}$$. This is not a probability distribution, as it doesn't integrate to 1 (in fact its complete integral is infinite for all $k$). Nevertheless, it is an OK approximation to the pdf in the region below the median, when the median is much less than 1, as the figure suggests.

So, let's integrate, keeping just one term (that is, taking $e^{-x} ≈ 1$), to find a first approximation to the median:


 * $$0.5 = \frac{1}{\Gamma(k)} \int_0^{\nu_0} x^{k - 1} dx$$
 * $$0.5 \Gamma(k) = \frac{x^k}{k}\vert_{x=\nu_0}$$
 * $$\nu_0 = (0.5 k \Gamma(k))^{1/k} = (0.5 \Gamma(k+1))^{1/k}$$
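A quick numeric check of this formula (a Python sketch using only the standard library; the series CDF helper and names are mine, not part of the original Matlab computation):

```python
import math

def nu0(k):
    """First low-k approximation to the median: (0.5 * Gamma(k+1))**(1/k),
    computed in log space to avoid underflow at small k."""
    return math.exp((math.log(0.5) + math.lgamma(k + 1.0)) / k)

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma P(k, x), by power series."""
    term = 1.0 / k
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (k + n)
        total += term
    return math.exp(k * math.log(x) - x - math.lgamma(k)) * total

# Dropping e^{-x} only adds area, so the true CDF at nu0 falls just short
# of 0.5: nu0 slightly underestimates the median -- barely at all when k
# is small, but noticeably by k = 1, where P(1, nu0) = 1 - e^{-0.5}.
```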



When $ν$ is very small, $$\nu_0$$ is an excellent approximation. But as the figure shows, the vertically-hashed area between curves is excess area counted in the integral. If we can estimate that area, and increase the estimate of $$\nu$$ to include that much more area to the right (the horizontally-hashed area), then we can get a better estimate. So let's take the next (negative) term in the series to estimate the extra area.

Call the extra area that we counted $$\Delta A$$:


 * $$\Delta A = - \frac{1}{\Gamma(k)} \sum_{n=1}^\infty \frac{(-1)^n}{n!} \int_0^{\nu_0} x^{n + k - 1} dx$$

And consider just the first ($n = 1$) term:


 * $$\Delta A_1 = \frac{1}{\Gamma(k)} \int_0^{\nu_0} x^k dx$$
 * $$\Delta A_1 = \frac{\nu_0^{k+1}}{\Gamma(k) (k+1)}$$

And take the area to the right to be a rectangle of width $$\Delta x_1$$ and height $$p(\nu_0)$$.


 * $$\Delta A_1 = \Delta x_1 \left( \frac{\nu_0^{k - 1}}{\Gamma(k)} e^{-\nu_0} \right) $$
 * $$\Delta x_1 = \frac{\frac{\nu_0^{k+1}}{\Gamma(k) (k+1)}}{\frac{\nu_0^{k - 1}}{\Gamma(k)} e^{-\nu_0}} = \frac{e^{\nu_0} \nu_0^2}{(k+1)}$$

So our next approximation is:


 * $$\nu_1 = \nu_0 + \Delta x_1 = \nu_0 + \frac{\nu_0^2 e^{\nu_0}}{(k+1) } = \nu_0 \left( 1 + \frac{e^{\nu_0} \nu_0}{(k+1) }      \right)    $$

Or we could write it out without the intermediate $\nu_0$ as


 * $$\nu_1 = (0.5 \Gamma(k+1))^{1/k} \left( 1 + \frac{ e^{(0.5 \Gamma(k+1))^{1/k}} (0.5 \Gamma(k+1))^{1/k} }{(k+1) } \right)$$
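To see how much the correction helps, a small stdlib Python sketch (names mine) can compare how far the CDF misses the 0.5 target at $\nu_0$ versus at $\nu_1$:

```python
import math

def nu0(k):
    """nu0 = (0.5 * Gamma(k+1))**(1/k), the first low-k approximation."""
    return math.exp((math.log(0.5) + math.lgamma(k + 1.0)) / k)

def nu1(k):
    """nu1 = nu0 * (1 + nu0 * exp(nu0) / (k + 1)), the corrected estimate."""
    v = nu0(k)
    return v * (1.0 + v * math.exp(v) / (k + 1.0))

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma P(k, x), by power series."""
    term = 1.0 / k
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (k + n)
        total += term
    return math.exp(k * math.log(x) - x - math.lgamma(k)) * total

def cdf_miss(k, x):
    """How far the CDF at x is from the 0.5 target."""
    return abs(reg_lower_gamma(k, x) - 0.5)
```

At $k = 0.2$ the miss shrinks by more than an order of magnitude going from $\nu_0$ to $\nu_1$.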

The error in estimating $\Delta A$ and the error in converting $\Delta A$ to $\Delta x$ are in opposite directions, so they partially cancel. To get a next better estimate, we can improve the estimate of $\Delta A$, or the conversion to $\Delta x$, or both. But to get an improvement, we again want opposite signs of the errors. With two more terms for $\Delta A$, and a trapezoidal estimate for the area to the right, we get a good result:


 * $$\Delta A_2 = \frac{1}{\Gamma(k) } \left( \frac{\nu_0^{k+1}}{(k+1)} - \frac{\nu_0^{k+2}}{2(k+2)} +  \frac{\nu_0^{k+3}}{6(k+3)}\right)$$

And for the righthand area that should equal this, we can take the height about in the middle instead of at the left edge; we could get that by evaluating the pdf halfway between $ν_{0}$ and $ν_{1}$, or we can extrapolate from the height at $ν_{0}$ using the derivative there, over half of the previously computed $\Delta x_1$, which is what we decided to do.


 * $$\Delta A_2 = \Delta x_2 \left( p(\nu_0) + p'(\nu_0) \frac{\Delta x_1}{2} \right) = \Delta x_2 p(\nu_0) \left( 1 + \frac{p'(\nu_0)}{p(\nu_0)} \frac{\Delta x_1}{2} \right) $$

where the logarithmic derivative is:


 * $$\frac{p'(\nu_0)}{p(\nu_0)} = \frac{d \log(p(x))}{dx} |_{x=\nu_0} = -\left(1 + \frac{1 - k}{\nu_0}\right)$$

Therefore:


 * $$\Delta x_2 = \frac{\Delta A_2}{ p(\nu_0) \left( 1 + \frac{p'(\nu_0)}{p(\nu_0)} \frac{\Delta x_1}{2} \right) }$$
 * $$\Delta x_2 = \frac{\frac{1}{\Gamma(k) } \left( \frac{\nu_0^{k+1}}{(k+1)} - \frac{\nu_0^{k+2}}{2(k+2)} +  \frac{\nu_0^{k+3}}{6(k+3)}\right)}{ \left(\frac{\nu_0^{k - 1}}{\Gamma(k)}  e^{-\nu_0}\right) \left( 1 - \left(1 + \frac{1 - k}{\nu_0}\right)\frac{\nu_0^2}{2(k+1) e^{-\nu_0}} \right) }$$
 * $$\Delta x_2 = \frac{ e^{\nu_0} \left( \frac{\nu_0^2}{(k+1)} - \frac{\nu_0^3}{2(k+2)} +  \frac{\nu_0^4}{6(k+3)} \right) }{ 1 - \left(1 + \frac{1 - k}{\nu_0}\right)\frac{e^{\nu_0}\nu_0^2}{2(k+1)} }$$

Then we have the next approximation to the median:


 * $$\nu_2 = \nu_0 + \Delta x_2 = \nu_0 + \frac{ e^{\nu_0} \left( \frac{\nu_0^2}{(k+1)} - \frac{\nu_0^3}{2(k+2)} +  \frac{\nu_0^4}{6(k+3)} \right) }{ 1 - \left(1 + \frac{1 - k}{\nu_0}\right)\frac{e^{\nu_0}\nu_0^2}{2(k+1)} }  =  \nu_0 \left( 1 + \frac{ e^{\nu_0} \left( \frac{\nu_0}{(k+1)} -  \frac{\nu_0^2}{2(k+2)} +  \frac{\nu_0^3}{6(k+3)} \right) }{ 1 - \left(1 + \frac{1 - k}{\nu_0}\right)\frac{e^{\nu_0}\nu_0^2}{2(k+1)} }  \right)$$

It's complicated, but it's a ridiculously good approximation at low values of $k$, and it points toward how a series of better approximations might be derived, using more terms from the exponential and higher-order derivatives.

These are essentially just simple approximations to Newton–Raphson steps that one could do numerically, for very quick convergence when the starting point is not too bad. But it's nice that they can produce "closed form" approximations, not just numerical recipes.
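The full $\nu_2$ expression can be checked the same way (a stdlib Python sketch, names mine; the quality of the estimate is judged by how close the series CDF gets to 0.5 there):

```python
import math

def nu0(k):
    """nu0 = (0.5 * Gamma(k+1))**(1/k)."""
    return math.exp((math.log(0.5) + math.lgamma(k + 1.0)) / k)

def nu2(k):
    """nu2 = nu0 + Delta x_2: three series terms for the extra area,
    and a trapezoidal (derivative-corrected) height for the conversion."""
    v = nu0(k)
    num = math.exp(v) * (v**2 / (k + 1.0)
                         - v**3 / (2.0 * (k + 2.0))
                         + v**4 / (6.0 * (k + 3.0)))
    den = 1.0 - (1.0 + (1.0 - k) / v) * math.exp(v) * v**2 / (2.0 * (k + 1.0))
    return v + num / den

def reg_lower_gamma(k, x):
    """Regularized lower incomplete gamma P(k, x), by power series."""
    term = 1.0 / k
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (k + n)
        total += term
    return math.exp(k * math.log(x) - x - math.lgamma(k)) * total
```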

Approximations for medium-to-high $k$


Banneheka & Ekanayake (2009) provided the rational function $$(k - 1 + 0.2)/(k + 0.2)$$ as an approximation for the median at high enough $k$. Generalizing the 0.2 to be a function of $k$ leads to a variety of improved approximations that work down to lower values of the shape factor, and potentially down to 0 if a better form is found.

Matching the leading term of the Laurent series gives the right behavior for very large $k$, so we arranged this series of approximations to use that first coefficient. The others are least-squares fitted to an ideal $\nu$ computed as a function of $k$ using the numerical median values. To get a good fit, the lower limit of $k$ used is adjusted for each fit.

Unlike the original Laurent series, in this one adding more terms makes it work to lower values of $k$.

Approximations across all $k$


To get a functional form with low relative error across all $k$, it seems we need to combine the above strategies. E.g. use $c = 8/45 = 0.1777777$ to get the low-$k$ shape, and a Laurent series to make it fit everywhere.

Since $2^{-1/k}$ approaches $1 - \log(2)/k$ at high $k$ (according to Wolfram), multiplying by $2^{-1/k}$ makes an approximation with the right leading behavior at high $k$, which is a good place to start. Beyond that, we find coefficients empirically, by least-squares fit. The error curves shown correspond to the rounded coefficients stated on the figure.

By using negative powers of $k + \log(2) - 1/3$ instead of $k$, we allow convergence of the series down to 0. This offset seems a bit better than other values tried, but has not been optimized.



From looking at the small-$k$ approximations, it seems that maybe a polynomial in $k - 1/3$ might work. But that term is proportional to $k$ at the high end, so powers of it wouldn't be useful. And being tempted to get rid of the gamma function, which is itself not really an easy closed-form expression, we can try polynomials in $2^{-1/k}$ instead. However, since the latter flattens out at 1, and we want to end up with a result close to $k$, we included terms multiplied by $k$, yielding very good least-squares fits with not too many coefficients.

Trying to be smarter about this


The $$2^{-1/k}$$ is useful. It needs a factor approaching $$(1 + k \pi^2/12)e^{-\gamma}$$ at the low end, and approaching $$k + \log(2) - 1/3$$ at the high end, as pointed out in earlier sections. If we make an appropriate interpolator between those coefficients of 1 and $k$, e.g. using a logistic sigmoid of $$\log(k)$$, we will be able to approach the known asymptotes at both ends, such that both absolute and relative errors will tend toward zero for low and high $k$.

The coefficient of 1 changes from 0.5615 to 0.3598, while the coefficient of $k$ changes from 0.4618 to 1.0. Finding a pair of interpolators jointly by searching over four parameters (width and centers of two sigmoids) is easy by brute-force search. We minimized the max of absolute and relative errors to arrive at these (sigmoidal in log $k$) interpolators, which yield absolute and relative errors everywhere less than 0.00228:


 * $$\alpha_0 = \frac{1}{1 + (k/0.2131)^{1.2900}}$$
 * $$\alpha_1 = \frac{1}{1 + (k/0.00802)^{3.5251}}$$

The formula for the smarter everywhere-good median approximation is then:


 * $$\nu \approx 2^{-1/k} \left(  \alpha_0 e^{-\gamma} + (1 - \alpha_0) \left( \log(2) - \frac{1}{3} \right)    + k \left(     \alpha_1 \frac{e^{-\gamma}\pi^2}{12} + (1 - \alpha_1)   \right)          \right)$$



The logistic sigmoid leads to a rather odd-looking best fit, as the plot of interpolators shows (the red dashed curve, the sigmoid for the $k$ coefficient, in particular). I actually ran this optimization down to $$ k = 2^{-10} $$ to make sure the relative error is not bouncing back up, and it does eventually settle on the low-$k$ coefficients down there.
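This combined formula is easy to spot-check in Python (a stdlib sketch, names mine; the reference medians used for checking are the exact $\log 2$ at $k = 1$, half the chi-squared(1) median $\approx 0.2274682$ at $k = 1/2$, and the Laurent-series value $\approx 9.6687145$ at $k = 10$):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def alpha0(k):
    """Logistic-in-log-k interpolator for the constant coefficient."""
    return 1.0 / (1.0 + (k / 0.2131) ** 1.2900)

def alpha1(k):
    """Logistic-in-log-k interpolator for the k coefficient."""
    return 1.0 / (1.0 + (k / 0.00802) ** 3.5251)

def median_approx(k):
    """2^{-1/k} times interpolated low-k / high-k coefficients of 1 and k."""
    eg = math.exp(-EULER_GAMMA)
    a0, a1 = alpha0(k), alpha1(k)
    const = a0 * eg + (1.0 - a0) * (math.log(2.0) - 1.0 / 3.0)
    slope = a1 * eg * math.pi**2 / 12.0 + (1.0 - a1)
    return 2.0 ** (-1.0 / k) * (const + k * slope)
```

Per the bounds above, both the absolute and relative errors should stay under 0.00228 at these points.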

Gompertz function as better interpolator
The Gompertz function is a good asymmetric function that ends up working just a bit better as an interpolator for max relative error (less than 0.00184), but considerably better for max absolute error (less than 0.000612) at the same time, and gives a more sensible-looking pair of interpolating curves (solid curves in the two figures).


 * $$\alpha_0 = 1 - \exp \left(-(k / 0.1500)^{-1.0023} \right)$$
 * $$\alpha_1 = 1 - \exp \left(-(k / 0.0500)^{-2.5080} \right)$$

These interpolators work with the same formula for the median approximation given above.

As 4-parameter fits go, this is the best.
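The Gompertz interpolators drop into the same base formula (a stdlib Python sketch, names mine; checked against the reference medians $\log 2$ at $k = 1$, half the chi-squared(1) median $\approx 0.2274682$ at $k = 1/2$, and the Laurent-series value $\approx 9.6687145$ at $k = 10$):

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def alpha0(k):
    """Gompertz-style interpolator for the constant coefficient."""
    return 1.0 - math.exp(-((k / 0.1500) ** -1.0023))

def alpha1(k):
    """Gompertz-style interpolator for the k coefficient."""
    return 1.0 - math.exp(-((k / 0.0500) ** -2.5080))

def median_approx(k):
    """Same base formula as the logistic version, new interpolators."""
    eg = math.exp(-EULER_GAMMA)
    a0, a1 = alpha0(k), alpha1(k)
    const = a0 * eg + (1.0 - a0) * (math.log(2.0) - 1.0 / 3.0)
    slope = a1 * eg * math.pi**2 / 12.0 + (1.0 - a1)
    return 2.0 ** (-1.0 / k) * (const + k * slope)
```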

Single-interpolator version
I tried one interpolator for both terms, but that was a lot worse. The graph of interpolators suggests that a single interpolator for the constant factor, with the $k$ factor just fixed at its high-$k$ value, might be about as good. Indeed, that worked OK (and the same trick with the Gompertz function did not). The best interpolator is not visually different from the (blue dashed) one plotted:


 * $$\alpha = \frac{1}{1 + (k/0.2136)^{1.2789}}$$

Then we have this simpler expression, with absolute and relative errors less than 0.00271 with just two fitted parameters:


 * $$\nu \approx 2^{-1/k} \left(  \alpha e^{-\gamma} + (1 - \alpha) \left( \log(2) - \frac{1}{3} \right)    + k     \right)$$

Having the wrong coefficient of $k$ doesn't hurt very much for very small $k$ – the relative error just doesn't go to zero as quickly as it might.

Rewrite it this way:


 * $$\nu \approx 2^{-1/k} \left(  \frac{1}{1 + (k/0.2136)^{1.2789}} \left(  e^{-\gamma} - \log(2) + \frac{1}{3} \right) +  \log(2) - \frac{1}{3}   + k      \right)$$
 * $$\nu \approx 2^{-1/k} \left( \frac{ 0.2016}{1 + (k/0.2136)^{1.2789}} + 0.3598 + k      \right)$$
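In code, this simplified form is a one-liner (a Python sketch; checked against the reference medians $\log 2$ at $k = 1$, half the chi-squared(1) median $\approx 0.2274682$ at $k = 1/2$, and the Laurent-series value $\approx 9.6687145$ at $k = 10$):

```python
def median_approx(k):
    """Single-interpolator fit:
    2^{-1/k} * (0.2016 / (1 + (k/0.2136)^1.2789) + 0.3598 + k)."""
    return 2.0 ** (-1.0 / k) * (0.2016 / (1.0 + (k / 0.2136) ** 1.2789)
                                + 0.3598 + k)
```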

Better single interpolator function: Gudermannian


Alternatively, we can numerically find the ideal single interpolator, see what it looks like, and seek a better functional form than a sigmoid to approximate it. The difference from a logistic sigmoid is small, but enough that we can see which direction to move, among the various sigmoid functions illustrated in Wikipedia, and it looks like the Gudermannian function would be a step in the right direction.

For the Gudermannian of $\log k$ we use the equivalent arctan form, but with optimized offset and scaling, this way:


 * $$\alpha = 1 - \frac{2}{\pi} \arctan\left( (k/0.21239)^{1.0554} \right)$$

This actually yields less than half as much max relative error, below 0.00132, and max absolute error below 0.00122. Using the same base formula from above, the new best two-parameter approximation is:


 * $$\nu \approx 2^{-1/k} \left( e^{-\gamma} + k - \frac{2}{\pi}\arctan\left( \left( \frac{k}{0.21239} \right)^{1.0554} \right) \left( e^{-\gamma} - \log(2) + \frac{1}{3} \right) \right)$$
 * $$\nu \approx 2^{-1/k} \left( 0.56146 + k   - 0.12837 \arctan\left( (4.7083 k)^{1.0554} \right)    \right)$$

I was not able to reduce the maximum error further by using two Gudermannian interpolators.
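The final two-parameter formula is likewise a one-liner (a Python sketch; checked against the reference medians $\log 2$ at $k = 1$, half the chi-squared(1) median $\approx 0.2274682$ at $k = 1/2$, and the Laurent-series value $\approx 9.6687145$ at $k = 10$; per the bounds above, its absolute error should stay near a thousandth everywhere):

```python
import math

def median_approx(k):
    """Gudermannian-interpolated two-parameter fit:
    2^{-1/k} * (0.56146 + k - 0.12837 * arctan((4.7083 k)^1.0554))."""
    return 2.0 ** (-1.0 / k) * (0.56146 + k
                                - 0.12837 * math.atan((4.7083 * k) ** 1.0554))
```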