Well behaving statistics

What property makes a statistic relevant to a sample? In Algorithmic Inference, relevance is identified with pivotal features that allow transferring probability masses from the sample distribution to the distribution of the parameters compatible with the actually observed sample.

Facing a sample $$\boldsymbol x=\{x_1,\ldots,x_m\}$$ of a random variable X, given a sampling mechanism $$(g_\theta,Z)$$ with $$\theta$$ scalar, we have $$\boldsymbol x=\{g_\theta(z_1),\ldots,g_\theta(z_m)\}$$. A statistic s, computed as a function $$\rho$$ of $$\{x_1,\ldots,x_m\}$$ with specifications in $$\mathfrak S$$, has an explaining function h defined by the master equation

$$s=\rho(g_\theta(z_1),\ldots,g_\theta(z_m))=h(\theta,z_1,\ldots,z_m)\qquad (1)$$

for suitable seeds $$\boldsymbol z=\{z_1,\ldots,z_m\}$$ and parameter $$\theta$$.

Well behaving
In order to derive the distribution law of the parameter $$\Theta$$ compatible with $$\boldsymbol x$$, we require the statistic to fulfill some technical properties. Namely, we say that a statistic s is well behaving if it satisfies the following three statements:
 * 1) monotony. A uniformly monotone relation exists between s and $$\theta$$ for any fixed seed $$\{z_1,\ldots,z_m\}$$, so that (1) has a unique solution in $$\theta$$;
 * 2) well definition. For each observed s, the statistic is well defined for every value of $$\theta$$, i.e. any sample specification $$\{x_1,\ldots,x_m\}\in\mathfrak X^m$$ such that $$\rho(x_1,\ldots,x_m)=s$$ has a probability density different from 0. This avoids non-surjective mappings from $$\mathfrak X^m$$ to $$\mathfrak S$$, i.e. associating via s to a sample $$\{x_1,\ldots,x_m\}$$ a $$\theta$$ that could not generate the sample itself;
 * 3) local sufficiency. $$\{\breve\theta_1,\ldots, \breve\theta_N\}$$ constitutes a true $$\Theta$$ sample for the observed s, so that we may attribute the same probability to each sampled value. Here $$\breve\theta_j= h^{-1}(s,\breve z_1^j, \ldots,\breve z_m^j)$$ is a solution of (1) with the seed $$\{\breve z_1^j,\ldots,\breve z_m^j\}$$. Since the seeds are identically distributed, the sole caveat concerns whether they are independent of $$\theta$$ itself. We must restrict this check to the seeds involved by s; we avert the drawback by requiring that the distribution of $$\{Z_1,\ldots,Z_m\mid S=s\}$$ be independent of $$\theta$$. An easy way to check this property is to map seed specifications into $$x_i$$ specifications. The mapping of course depends on $$\theta$$, but the distribution of $$\{X_1, \ldots,X_m\mid S=s\}$$ will not depend on $$\theta$$ if the above seed independence holds, a condition that looks like a local sufficiency of the statistic S.
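
The scheme behind these properties can be exercised numerically. The sketch below (an illustration of ours, not part of the standard treatment; the names `solve_master` and `h_exp` are hypothetical) inverts the master equation (1) by bisection, relying on monotony for uniqueness of the root, and collects parameter replicas from seed replicas, using the Exponential law with statistic $$\sum_{i=1}^m x_i$$ as a test case:

```python
import math
import random

def solve_master(s, seeds, h, lo, hi, tol=1e-9):
    """Invert the master equation s = h(theta, seeds) for theta by bisection.

    Monotony (property 1) makes the root unique on [lo, hi];
    well definition (property 2) guarantees it exists for every observed s.
    """
    increasing = h(hi, seeds) > h(lo, seeds)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (h(mid, seeds) < s) == increasing:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid   # the root lies to the left of mid
    return 0.5 * (lo + hi)

# Exponential test case: the statistic s = sum(x_i) has explaining
# function h(lam, z) = -(1/lam) * sum(log z_i).
def h_exp(lam, seeds):
    return -sum(math.log(u) for u in seeds) / lam

rng = random.Random(0)
m, s_obs = 10, 5.0                       # sample size and observed statistic
replicas = [solve_master(s_obs, [rng.random() for _ in range(m)],
                         h_exp, 1e-6, 1e6)
            for _ in range(1000)]        # a Lambda sample compatible with s_obs
```

Each replica is one $$\breve\theta_j$$; local sufficiency (property 3) is what entitles us to read the collection of replicas as a sample of the compatible parameter.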

Example
For instance, for both the Bernoulli distribution with parameter p and the Exponential distribution with parameter $$\lambda$$, the statistic $$\sum_{i=1}^m x_i$$ is well behaving. The satisfaction of the above three properties is straightforward when looking at both explaining functions: $$g_p(u)=1$$ if $$u\leq p$$, 0 otherwise, in the case of the Bernoulli random variable, and $$g_\lambda(u)=-(\log u)/\lambda$$ for the Exponential random variable, giving rise to the statistics $$s_p=\sum_{i=1}^m I_{[0,p]}(u_i)$$ and $$s_\lambda=-\frac{1}{\lambda}\sum_{i=1}^m \log u_i$$.
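
A quick numerical check of these two explaining functions (an illustrative Python sketch; the function names are ours) confirms that the statistic computed through them takes the stated forms and is monotone in the parameter for fixed seeds:

```python
import math
import random

def g_bernoulli(p, u):
    """Explaining function of the Bernoulli law: 1 if u <= p, else 0."""
    return 1 if u <= p else 0

def g_exponential(lam, u):
    """Explaining function of the Exponential law: -(log u)/lambda."""
    return -math.log(u) / lam

rng = random.Random(42)
seeds = [rng.random() for _ in range(8)]
p, lam = 0.4, 2.0

# The statistic sum(x_i) computed through each explaining function:
s_p = sum(g_bernoulli(p, u) for u in seeds)        # = sum of I_[0,p](u_i)
s_lam = sum(g_exponential(lam, u) for u in seeds)  # = -(1/lam) * sum(log u_i)

# Monotony in the parameter for fixed seeds (property 1):
assert sum(g_bernoulli(0.9, u) for u in seeds) >= s_p
assert sum(g_exponential(1.0, u) for u in seeds) > s_lam
```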

Vice versa, in the case of X following a continuous uniform distribution on $$[0,a]$$, the same statistic does not meet the second requirement. For instance, the observed sample $$\{c,c/2,c/3\}$$ gives $$s_a=\frac{11}{6}c$$. But the explaining function of this X is $$g_a(u)=u a$$, hence the master equation $$s_a=\sum_{i=1}^m u_i a$$ would produce, with a U sample $$\{0.8, 0.8, 0.8\}$$, the solution $$\breve a\approx 0.76\,c$$. This conflicts with the observed sample, since the first observed value c would be greater than the right extreme $$\breve a$$ of the X range. The statistic $$s_a=\max\{x_1,\ldots,x_m\}$$ is well behaving in this case.
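
The numbers in this counterexample can be verified directly (a minimal Python sketch with c = 1; the variable names are ours):

```python
c = 1.0
sample = [c, c / 2, c / 3]
seeds = [0.8, 0.8, 0.8]

# Sum statistic: master equation s = a * sum(u_i) gives a_breve = s / sum(u_i)
s_sum = sum(sample)            # = 11/6 * c
a_breve = s_sum / sum(seeds)   # ~ 0.76 * c
assert a_breve < max(sample)   # x_1 = c falls outside [0, a_breve]: property 2 fails

# Max statistic: master equation s = a * max(u_i) gives a_breve = s / max(u_i)
s_max = max(sample)
a_max = s_max / max(seeds)     # = 1.25 * c
assert a_max >= max(sample)    # every observation fits in [0, a_max]
```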

Analogously, for a random variable X following the power law distribution with parameters K and $$A$$ (see Pareto examples for a complete reference on the matter), we use $$s_1=\sum_{i=1}^m \log x_i$$ and $$s_2=\min_{i=1,\ldots,m} \{x_i\}$$ as a joint statistic for these parameters.
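
Assuming A is the lower extreme and K the shape parameter of the power law, a sampling mechanism can be taken as the inverse-CDF map $$g_{A,K}(u)=A\,u^{-1/K}$$ (this form, and the function names, are our assumptions for illustration). The sketch below computes the joint statistic $$(s_1,s_2)$$ through it:

```python
import math
import random

def g_pareto(a, k, u):
    """Assumed explaining function of the power law (Pareto) distribution:
    maps a uniform seed u to a * u**(-1/k), with a the lower extreme (A)
    and k the shape parameter (K)."""
    return a * u ** (-1.0 / k)

rng = random.Random(7)
seeds = [rng.random() for _ in range(6)]
xs = [g_pareto(2.0, 3.0, u) for u in seeds]

# The joint statistic (s_1, s_2) for the pair (K, A):
s1 = sum(math.log(x) for x in xs)   # informative about the shape K
s2 = min(xs)                        # informative about the lower extreme A
assert s2 >= 2.0                    # no observation ever falls below A
```

The minimum plays the same role here as the maximum did for the uniform law: it keeps the inferred lower extreme compatible with every observation.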

As a general statement that holds under weak conditions, sufficient statistics are well behaving with respect to the related parameters. In the table below we report sufficient / well behaving statistics for the parameters of the most commonly used distribution laws.