User:Tomkeelin/sandbox

Moments
The $$m^{th}$$ moment of the unbounded metalog distribution, $$E[x^m] =\int_{y=0}^1 {M_k(y)}^m \, dy$$, is a special case of the more general formula for QPDs. For the unbounded metalog, such integrals evaluate to closed-form moments that are $$m^{th}$$ order polynomials in the coefficients $$a_i$$. The first four central moments of the four-term unbounded metalog are:

$$ \begin{align} \text{mean} = {} & a_1 +{a_3\over2}\\[6pt] \text{variance} = {} & \pi^2{{a_2}^2\over3}+{{a_3}^2\over{12}}+\pi^2{{a_3}^2\over{36}}+a_2a_4 +{{a_4}^2\over{12}}\\[6pt] \text{skewness} = {} & \pi^2 {a_2}^2{a_3}+\pi^2 {{a_3}^3 \over{24}} + {{{a_2} {a_3} {a_4}} \over{2}}+ \pi^2{{{a_2} {a_3} {a_4}} \over{6}} + {{{a_3} {a_4}^2} \over{8}}\\[6pt] \text{kurtosis} = {} & 7\pi^4 {{a_2}^4\over{15}}+3\pi^2 {{{a_2}^2{a_3}^2} \over{24}}+7\pi^4 {{{a_2}^2{a_3}^2} \over{30}}+{{{a_3}^4}\over{80}} + \pi^2{{a_3}^4 \over{24}} + 7\pi^4 {{a_3}^4\over{1200}} + 2\pi^2{a_2}^3{a_4}\\[6pt] & {} +{{{a_2} {a_3}^2 {a_4}} \over{2}}+2\pi^2{{{a_2} {a_3}^2 {a_4}} \over{3}}+2{{a_2}^2 {a_4}^2}+\pi^2{{{a_2}^2 {a_4}^2 } \over{6}}+{{{a_3}^2 {a_4}^2 } \over{8}}+\pi^2{{{a_3}^2 {a_4}^2 } \over{40}}+{{{a_2} {a_4}^3 } \over{3}} +{{a_4}^4 \over{80}} \end{align} $$

Moments for fewer terms are subsumed in these equations. For example, moments of the three-term metalog can be obtained by setting $$a_4$$ to zero. Moments for metalogs with more terms, and higher-order moments ($$m>4$$), are also available. Moments for semi-bounded and bounded metalogs are not available in closed form.
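The closed-form expressions above can be checked numerically. The following Python sketch is illustrative only: it assumes the standard four-term unbounded metalog quantile function $$M_4(y)=a_1+a_2\ln{y\over{1-y}}+a_3(y-0.5)\ln{y\over{1-y}}+a_4(y-0.5)$$, and the function names and the midpoint-rule integration are choices made for this example.

```python
import math

def metalog4_quantile(y, a1, a2, a3, a4):
    """Four-term unbounded metalog quantile function M_4(y), for 0 < y < 1."""
    logit = math.log(y / (1.0 - y))
    return a1 + a2 * logit + a3 * (y - 0.5) * logit + a4 * (y - 0.5)

def metalog4_moments(a1, a2, a3, a4):
    """Mean and 2nd to 4th central moments from the closed-form expressions."""
    p2, p4 = math.pi ** 2, math.pi ** 4
    mean = a1 + a3 / 2
    var = p2 * a2**2 / 3 + a3**2 / 12 + p2 * a3**2 / 36 + a2 * a4 + a4**2 / 12
    skew = (p2 * a2**2 * a3 + p2 * a3**3 / 24 + a2 * a3 * a4 / 2
            + p2 * a2 * a3 * a4 / 6 + a3 * a4**2 / 8)
    kurt = (7 * p4 * a2**4 / 15 + 3 * p2 * a2**2 * a3**2 / 24
            + 7 * p4 * a2**2 * a3**2 / 30 + a3**4 / 80 + p2 * a3**4 / 24
            + 7 * p4 * a3**4 / 1200 + 2 * p2 * a2**3 * a4
            + a2 * a3**2 * a4 / 2 + 2 * p2 * a2 * a3**2 * a4 / 3
            + 2 * a2**2 * a4**2 + p2 * a2**2 * a4**2 / 6
            + a3**2 * a4**2 / 8 + p2 * a3**2 * a4**2 / 40
            + a2 * a4**3 / 3 + a4**4 / 80)
    return mean, var, skew, kurt

def numeric_central_moment(order, coeffs, n=200_000):
    """Midpoint-rule estimate of E[(x - mean)^order], integrating over y in (0, 1)."""
    mean = coeffs[0] + coeffs[2] / 2
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        total += (metalog4_quantile(y, *coeffs) - mean) ** order
    return total / n
```

Setting `a3 = a4 = 0` recovers the logistic moments (variance $$\pi^2{a_2}^2/3$$), and setting `a2 = a3 = 0` recovers the uniform moments (variance $${a_4}^2/12$$).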

Parameterization with Four Moments
Let $$m, v, s,$$ and $$k$$ be mean, variance, skewness, and kurtosis, respectively. For the four-term unbounded metalog, these central moments may be expressed in terms of coefficients.

$$ \begin{align} m = {} & a_1 +{a_3\over2}\\[6pt] v = {} & \pi^2{{a_2}^2\over3}+{{a_3}^2\over{12}}+\pi^2{{a_3}^2\over{36}}+a_2a_4 +{{a_4}^2\over{12}}\\[6pt] s = {} & \pi^2 {a_2}^2{a_3}+\pi^2 {{a_3}^3 \over{24}} + {{{a_2} {a_3} {a_4}} \over{2}}+ \pi^2{{{a_2} {a_3} {a_4}} \over{6}} + {{{a_3} {a_4}^2} \over{8}}\\[6pt] k = {} & 7\pi^4 {{a_2}^4\over{15}}+3\pi^2 {{{a_2}^2{a_3}^2} \over{24}}+7\pi^4 {{{a_2}^2{a_3}^2} \over{30}}+{{{a_3}^4}\over{80}} + \pi^2{{a_3}^4 \over{24}} + 7\pi^4 {{a_3}^4\over{1200}} + 2\pi^2{a_2}^3{a_4}\\[6pt] & {} +{{{a_2} {a_3}^2 {a_4}} \over{2}}+2\pi^2{{{a_2} {a_3}^2 {a_4}} \over{3}}+2{{a_2}^2 {a_4}^2}+\pi^2{{{a_2}^2 {a_4}^2 } \over{6}}+{{{a_3}^2 {a_4}^2 } \over{8}}+\pi^2{{{a_3}^2 {a_4}^2 } \over{40}}+{{{a_2} {a_4}^3 } \over{3}} +{{a_4}^4 \over{80}} \end{align} $$

Let $$s_s$$ be the standardized skewness, $$s_s=s/v^{3/2}$$.

Parameterization with Three Moments
Three-term unbounded metalogs can be parameterized in closed form with moments. Let $$m, v,$$ and $$s$$ be the mean, variance, and skewness as given above, and let $$s_s$$ be the standardized skewness, $$s_s=s/v^{3/2}$$. Equivalent expressions of the moments in terms of the coefficients and coefficients in terms of the moments are as follows.

Setting $$a_4=0$$ yields a set of cubic equations in terms of coefficients $$a_1, a_2,$$ and $$a_3$$ that can be solved in closed form as a function of $$m, v,$$ and $$s$$.

$$ \begin{array}{l} a_1 = m -{a_3\over2}\\ a_2 = {1\over{\pi}}\Bigl[3\Bigl(v-\Bigl({1\over{12}}+{{\pi}^2\over{36}}\Bigr){a_3}^2\Bigr)\Bigr]^{1\over{2}}\\ a_3 = 4\Bigl({6v\over{6+\pi^2}}\Bigr)^{1\over{2}}\cos\Bigl[{1\over{3}}\Bigl(\cos^{-1}\Bigl(-{s_s\over{4}}\Bigl(1+{{\pi}^2\over{6}}\Bigr)^{1\over{2}}\Bigr)+4\pi\Bigr)\Bigr] \end{array} $$

Parameterizing metalogs with their moments is useful, for example, when summing independent, non-identically distributed uncertainties. Given first three or four central moments

$$ \begin{align} m = {} & a_1 +{a_3\over2}\\[6pt] v = {} & \pi^2{{a_2}^2\over3}+{{a_3}^2\over{12}}+\pi^2{{a_3}^2\over{36}}\\[6pt] s = {} & \pi^2 {a_2}^2{a_3}+\pi^2 {{a_3}^3 \over{24}}\\[6pt] \end{align} $$

To validate this equivalence, start with a given set of moments and calculate the corresponding coefficients with the expressions for $$a_1$$, $$a_2$$, and $$a_3$$. Then substitute these coefficients into the expressions for $$m$$, $$v$$, and $$s$$, and note that the result is exactly the set of moments with which you started. This result follows from noting that the moment equations reduce to a cubic polynomial in terms of the coefficients, which can be solved in closed form in terms of the moments. Moreover, this solution is unique. In terms of moments, the feasibility condition is $$|s_s|\leq 2.07093$$, which can be shown to be equivalent to the feasibility condition in terms of coefficients: $$a_2>0$$ and $${|a_3|/a_2}\leq 1.66711$$.
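The round trip described above can be sketched in Python. This is a minimal illustration rather than a reference implementation; the function names are hypothetical, and the feasibility bound 2.07093 is taken from the text.

```python
import math

def metalog3_from_moments(m, v, s):
    """Coefficients (a1, a2, a3) of the three-term unbounded metalog from
    the mean m, variance v, and third central moment s."""
    s_s = s / v ** 1.5  # standardized skewness
    if abs(s_s) > 2.07093:
        raise ValueError("standardized skewness outside the feasible range")
    arg = -(s_s / 4) * math.sqrt(1 + math.pi ** 2 / 6)
    a3 = (4 * math.sqrt(6 * v / (6 + math.pi ** 2))
          * math.cos((math.acos(arg) + 4 * math.pi) / 3))
    a2 = math.sqrt(3 * (v - (1 / 12 + math.pi ** 2 / 36) * a3 ** 2)) / math.pi
    a1 = m - a3 / 2
    return a1, a2, a3

def moments_from_metalog3(a1, a2, a3):
    """Mean, variance, and third central moment from the coefficients."""
    p2 = math.pi ** 2
    m = a1 + a3 / 2
    v = p2 * a2 ** 2 / 3 + a3 ** 2 / 12 + p2 * a3 ** 2 / 36
    s = p2 * a2 ** 2 * a3 + p2 * a3 ** 3 / 24
    return m, v, s

# Round trip: moments -> coefficients -> moments.
a1, a2, a3 = metalog3_from_moments(10.0, 4.0, 4.0)
print(moments_from_metalog3(a1, a2, a3))
```

Up to floating-point error, the printed moments should match the inputs. With zero skewness the formula gives $$a_3=0$$, and the result reduces to a logistic distribution.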


Median
The median of any distribution in the metalog family has a simple closed form. Note that $$y=0.5$$ defines the median and that $$M_k(0.5)=a_1$$, since all subsequent terms are zero under this condition. It follows that the medians of the unbounded metalog, log metalog, negative-log metalog, and logit metalog distributions are $$a_1$$, $$b_l+e^{a_1}$$, $$b_u-e^{-a_1}$$, and $${b_l+b_u e^{a_1}\over{1+e^{a_1}}}$$, respectively.
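As a quick check, the medians can be obtained by evaluating each quantile function at $$y=0.5$$. The sketch below assumes the standard transforms of the unbounded metalog (log: $$b_l+e^{M_k(y)}$$; negative-log: $$b_u-e^{-M_k(y)}$$; logit: $${b_l+b_u e^{M_k(y)}\over{1+e^{M_k(y)}}}$$) and, for brevity, handles at most four terms; the function names are hypothetical.

```python
import math

def metalog_core(y, a):
    """Unbounded metalog M_k(y) for up to four coefficients a = [a1, a2, a3, a4]."""
    a = list(a) + [0.0] * (4 - len(a))
    logit = math.log(y / (1.0 - y))
    return a[0] + a[1] * logit + a[2] * (y - 0.5) * logit + a[3] * (y - 0.5)

def log_metalog(y, a, b_l):
    """Semi-bounded metalog with lower bound b_l."""
    return b_l + math.exp(metalog_core(y, a))

def neglog_metalog(y, a, b_u):
    """Semi-bounded metalog with upper bound b_u."""
    return b_u - math.exp(-metalog_core(y, a))

def logit_metalog(y, a, b_l, b_u):
    """Bounded metalog on (b_l, b_u)."""
    e = math.exp(metalog_core(y, a))
    return (b_l + b_u * e) / (1.0 + e)

a = [0.7, 1.2, -0.4, 0.9]  # arbitrary illustrative coefficients
print(metalog_core(0.5, a), log_metalog(0.5, a, 2.0),
      neglog_metalog(0.5, a, 5.0), logit_metalog(0.5, a, 2.0, 5.0))
```

At $$y=0.5$$ both the logit term and $$(y-0.5)$$ vanish, so only $$a_1$$ contributes to each median.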

Simulation
Since the quantile function is expressed in closed form, metalogs facilitate Monte Carlo simulation. Substituting in uniformly distributed random samples of $$y$$ produces random samples of $$x$$ in closed form, thereby eliminating the need to invert a CDF. See below for simulation applications.
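A minimal sketch of this sampling approach, assuming the standard four-term unbounded metalog quantile function; the function names and the seed parameter are illustrative choices for this example.

```python
import math
import random

def metalog4_quantile(y, a1, a2, a3, a4):
    """Four-term unbounded metalog quantile function M_4(y), for 0 < y < 1."""
    logit = math.log(y / (1.0 - y))
    return a1 + a2 * logit + a3 * (y - 0.5) * logit + a4 * (y - 0.5)

def sample_metalog4(n, coeffs, seed=None):
    """Draw n samples by pushing uniform variates through the quantile function."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        y = rng.random()
        if y == 0.0:
            # random() may return exactly 0, where the logit is undefined
            continue
        samples.append(metalog4_quantile(y, *coeffs))
    return samples

xs = sample_metalog4(100_000, (10.0, 1.0, 0.5, 0.3), seed=1)
print(sum(xs) / len(xs))  # should be close to the closed-form mean a1 + a3/2
```

No root-finding or CDF inversion is needed: each sample costs one logarithm and a few multiplications.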

Applications
Due to their shape and bounds flexibility, metalogs can be used to represent empirical or other data in a wide range of fields.
 * Astronomy. Metalogs were applied to assess the risks of asteroid impact.
 * Cybersecurity. Metalogs were used in cybersecurity risk assessment.
 * Eliciting and Combining Expert Opinion. Statistics Canada elicited expert opinions on future Canadian fertility rates from 18 experts, which included the use of spreadsheet-based real-time PDF feedback based on five-term metalogs. The individual expert opinions were then weighted and combined into an overall metalog-based forecast.
 * Empirical Data Exploration and Visualization. In fish biology, a 10-term log metalog distribution (bounded below at 0) was fit to the weights of 3,474 steelhead trout caught and released on the Babine River in British Columbia during 2010-2014. The bimodality of the resulting distribution has been attributed to the presence of both first-time and second-time spawners in the river, the latter of which tend to weigh more.
 * Hydrology. A 10-term semi-bounded metalog was used to model the probability distribution of annual river gauge heights.
 * Oil Field Production. Semi-bounded SPT metalogs were used to analyze biases in projections of oil-field production when compared to observed production after the fact.
 * Portfolio Management. SPT metalogs have been used to model commercial value of new products and product portfolios.
 * Simulation Input Distributions. Since quantile functions in the metalog family are expressed in closed form, they facilitate Monte Carlo simulation. Substituting in uniformly distributed random samples of $$y$$ produces random samples of $$x$$ in closed form, thereby eliminating the need to invert a CDF expressed as $$y=F(x)$$. This approach was used to simulate the total value of a portfolio of 259 financial assets.
 * Simulation Output Distributions. Metalogs have also been used to fit output data from simulations in order to represent those outputs as closed-form continuous distributions (both CDFs and PDFs). Used in this way, they are typically more stable and smoother than histograms.
 * Sums of Lognormals. Metalogs enable a closed-form representation of known distributions whose CDFs have no closed-form expression. Keelin et al. (2019) apply this to the sum of independent identically distributed lognormal distributions, where quantiles of the sum can be determined by a large number of simulations. Nine such quantiles are used to parameterize a semi-bounded metalog distribution that runs through each of these nine quantiles exactly. Quantile parameters are stored in a table, which can then be interpolated to yield in-between values; these values are guaranteed to be feasible by the convexity property above.

For a given application and data set, choosing the number of metalog terms $$k$$ requires judgment. For expert elicitation, three to five terms are usually sufficient. For data exploration and matching other probability distributions such as the sum of lognormals, eight to 12 terms are usually sufficient. A metalog panel, which displays the metalog PDFs corresponding to differing numbers of terms $$k$$ for a given data set, may aid this judgment. For example, in the steelhead weight metalog panel above, using fewer than seven terms arguably underfits the data by obscuring the data's inherent bimodality. Using more than 10 terms is unnecessary and could, in principle, overfit the data. The case with 16 terms is infeasible for this data set, as indicated by the blank cell in the metalog panel above. Keelin (2016) offers further perspectives on distribution selection within the metalog family. Other tools (such as regularization, the Akaike information criterion, and the Bayesian information criterion) may also be useful.

Related distributions
The following distributions are subsumed within the metalog family:
 * The logistic distribution is a special case of the unbounded metalog where $$a_i = 0$$ for all $$i>2$$.
 * The uniform distribution is a special case of: 1) the unbounded metalog where $$k\geq4$$, $$a_1 = 0.5$$, $$a_4=1$$ and $$a_i=0$$ otherwise; and 2) the bounded metalog where $$k\geq2$$, $$b_l = 0$$, $$b_u = 1$$, $$a_2 = 1$$, and $$a_i=0$$ otherwise.
 * The log-logistic distribution, also known as the Fisk distribution in economics, is a special case of the log metalog where $$b_l = 0$$, and $$a_i = 0$$ for all $$i>2$$.
 * The log-uniform distribution is a special case of the log metalog where $$k\geq4$$, $$a_1 = 0.5$$, $$a_4=1$$, and $$a_i=0$$ otherwise.
 * The logit-logistic distribution is a special case of the logit metalog where $$a_i = 0$$ for all $$i>2$$.
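Several of these special cases can be confirmed directly from the quantile functions. The sketch below is illustrative only: it assumes the standard unbounded, log, and logit metalog quantile functions for up to four terms, and the helper names are hypothetical.

```python
import math

def metalog(y, a):
    """Unbounded metalog M_k(y) for up to four coefficients."""
    a = list(a) + [0.0] * (4 - len(a))
    logit = math.log(y / (1.0 - y))
    return a[0] + a[1] * logit + a[2] * (y - 0.5) * logit + a[3] * (y - 0.5)

def log_metalog(y, a, b_l):
    """Log metalog with lower bound b_l."""
    return b_l + math.exp(metalog(y, a))

def logit_metalog(y, a, b_l, b_u):
    """Logit metalog on (b_l, b_u)."""
    e = math.exp(metalog(y, a))
    return (b_l + b_u * e) / (1.0 + e)

def logistic_quantile(y, mu, s):
    """Quantile function of the logistic distribution."""
    return mu + s * math.log(y / (1.0 - y))

def log_logistic_quantile(y, alpha, beta):
    """Quantile function of the log-logistic (Fisk) distribution."""
    return alpha * (y / (1.0 - y)) ** (1.0 / beta)

for y in (0.05, 0.3, 0.5, 0.8, 0.99):
    # Logistic: only a1 and a2 are nonzero.
    assert math.isclose(metalog(y, [2.0, 0.7]), logistic_quantile(y, 2.0, 0.7))
    # Uniform on (0, 1): a1 = 0.5, a4 = 1, giving M(y) = y.
    assert math.isclose(metalog(y, [0.5, 0.0, 0.0, 1.0]), y)
    # Uniform on (0, 1) via the bounded metalog: b_l = 0, b_u = 1, a2 = 1.
    assert math.isclose(logit_metalog(y, [0.0, 1.0], 0.0, 1.0), y)
    # Log-logistic with scale e^{a1} and shape 1/a2.
    assert math.isclose(log_metalog(y, [0.4, 0.5], 0.0),
                        log_logistic_quantile(y, math.exp(0.4), 2.0))
```

In each case the metalog quantile function coincides with the quantile function of the named distribution at every probability checked.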

Software
Freely available software tools and commercially available packages can be used to work with metalog distributions:
 * Excel Workbooks. By pasting or typing in CDF data, metalogs (with choice of bounds) are instantly displayed.
   * SPT metalogs workbook calculates 2-3 term metalogs determined by three $$(x_i,y_i)$$ CDF data points.
   * Metalogs workbook calculates 2-16 term metalogs (including metalog panel) determined by 2-10,000 $$(x_i,y_i)$$ CDF data points.
   * ELD (equally likely data) Metalog workbooks calculate 2-16 term metalogs determined by 2-10,000 $$(x_i)$$ data points, where the $$y_i$$ values and metalog panel are automatically calculated.
 * R. rmetalog (approved by the Comprehensive R Archive Network).
 * Python. Pymetalog closely mirrors the R package. Metalogistic takes advantage of the SciPy platform.
 * Web browser. MakeDistribution.com facilitates experimentation with metalogs parameterized by several CDF data points. The SPT metalog calculator, metalog calculator and ELD metalog calculator are online versions of the Excel Workbooks.
 * SIPmath Modeler Tools support metalog distributions in an Excel add-in for simulation.
 * Lumina's Analytica Free 101 software for modeling and aiding difficult decisions.
 * FrontLine Solvers: Analytic Solver, RASON, and Solver SDK, Excel-based software for optimization.
 * Lone Star Analysis: TruNavigator and AnalyticsOS software for predictive and prescriptive analytics.

Convex Hull for Feasible Coefficients of Three-Term Metalogs
Feasibility condition for metalogs with $$k=3$$ terms: $$a_1$$ is any real number, $$a_2>0$$ and $$|a_3|/a_2\leq 1.66711$$.

Convex Hull for Feasible Coefficients of Four-Term Metalogs

Feasibility for metalogs with $$k=4$$ terms is defined as follows:
 * $$a_1$$ is any real number, and
 * $$a_2\geq0$$, and
 * If $$a_2=0$$, then $$a_3=0$$ and $$a_4>0$$ (uniform distribution exactly)
 * If $$a_2>0$$, then feasibility conditions are specified numerically
 * For a given $$|a_3|/a_2$$, feasibility requires that $$a_4/a_2\geq$$ number shown.
 * For a given $$a_4/a_2$$, feasibility requires that $$|a_3|/a_2\leq$$ number shown.
 * At the top of this table, the four-term metalog is symmetric and peaked, similar to a Student's t-distribution with 3 degrees of freedom.
 * At the bottom of this table, the four-term metalog is a uniform distribution exactly.
 * In between, it has varying degrees of skewness depending on $$a_3$$. Positive $$a_3$$ yields right skew. Negative $$a_3$$ yields left skew. When $$a_3=0$$, the four-term metalog is symmetric.

Convex Hull Equations
The feasible area can be closely approximated by an ellipse (dashed, gray curve), defined by center $$b =4.5$$ and semi-axis lengths $$c=8.5$$ and $$d =1.93$$. Supplementing this with linear interpolation outside its applicable range, feasibility, given $$a_2>0$$, can be closely approximated:

$$ \left\{ \begin{array}{rlcrll} {|a_3|\over a_2}&\leq{d\over c}\sqrt{c^2-({a_4\over a_2}-b)^2} & \text{ for } & -4.0& \leq{a_4\over a_2}\leq 4.5,\\ {|a_3|\over a_2}&\leq 0.0216 ({a_4\over a_2} -4.5)+ 1.930& \text{ for } & 4.5&<{a_4\over a_2}\leq 7.0, \\ {|a_3|\over a_2}& \leq 0.0040 ({a_4\over a_2}-7.0) +1.984& \text{ for } & 7.0 &<{a_4\over a_2}\leq 10.0,\\ {|a_3|\over a_2}& \leq 0.0002 ({a_4\over a_2}-10.0) +1.996& \text{ for } & 10.0 &<{a_4\over a_2}\leq 30.0,\\ {|a_3|\over a_2}& \leq 2.0& \text{ for } & 30.0 &<{a_4\over a_2}. \end{array}\right. $$

Points inside or on the boundary of this range have been shown to be feasible. There may be additional feasible points very slightly outside it.
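The piecewise approximation can be written as a small feasibility test. The Python sketch below is illustrative, with a hypothetical function name; it treats $$a_4/a_2<-4.0$$ as infeasible, which is an assumption based on the stated range of the ellipse, and it handles $$a_2=0$$ according to the four-term feasibility conditions listed earlier.

```python
import math

def approx_feasible_4term(a2, a3, a4, b=4.5, c=8.5, d=1.93):
    """Approximate feasibility test for the four-term unbounded metalog
    (a1 may be any real number, so it is not an argument)."""
    if a2 < 0:
        return False
    if a2 == 0:
        # Degenerate case: exactly uniform, requires a3 = 0 and a4 > 0.
        return a3 == 0 and a4 > 0
    r = a4 / a2
    ratio = abs(a3) / a2
    if r < -4.0:
        return False  # assumption: outside the range covered by the ellipse
    if r <= 4.5:
        # Elliptical boundary; max() guards against tiny negative round-off.
        bound = (d / c) * math.sqrt(max(0.0, c ** 2 - (r - b) ** 2))
    elif r <= 7.0:
        bound = 0.0216 * (r - 4.5) + 1.930
    elif r <= 10.0:
        bound = 0.0040 * (r - 7.0) + 1.984
    elif r <= 30.0:
        bound = 0.0002 * (r - 10.0) + 1.996
    else:
        bound = 2.0
    return ratio <= bound

print(approx_feasible_4term(1.0, 1.6, 0.0))  # three-term case, inside the ellipse
print(approx_feasible_4term(1.0, 1.7, 0.0))  # exceeds the three-term limit
```

Because the approximation lies slightly inside the true feasible region, a `False` result near the boundary does not by itself prove infeasibility.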