Lexis ratio

The Lexis ratio is used in statistics as a measure which seeks to evaluate differences between the statistical properties of random mechanisms where the outcome is two-valued, for example "success" or "failure", "win" or "lose". The idea is that the probability of success might vary between different sets of trials in different situations. The ratio is now little used, having been largely replaced by the chi-squared test for the homogeneity of samples.

This measure compares the between-set variance of the sample proportions (evaluated for each set) with what the variance should be if there were no difference in the true proportions of success across the different sets. Thus the measure is used to evaluate how data compare to a fixed-probability-of-success Bernoulli distribution. The Lexis ratio is sometimes denoted L or Q, where


 * $$L^2 =Q^2 = \frac{s^2}{\sigma_0^2}.$$

Here $$s^2$$ is the (weighted) sample variance derived from the observed proportions of success in sets in "Lexis trials" and $$\sigma_0^2$$ is the variance computed from the expected Bernoulli distribution on the basis of the overall average proportion of success. Trials where L falls significantly above or below 1 are known as supranormal and subnormal, respectively.

This ratio (Q) is a measure that can be used to distinguish between three types of variation in sampling for attributes: Bernoullian, Lexian and Poissonian.

Definition
Let there be k samples of sizes n1, n2, n3, ..., nk, and let the proportions of the attribute being examined in these samples be p1, p2, p3, ..., pk respectively. Let p denote the overall proportion of the attribute across all samples. Then the Lexis ratio is


 * $$ Q = \frac{ \sum{ n_i ( p_i - p )^2 } }{ ( k - 1 ) p( 1 - p ) } $$
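As a rough illustration, the formula above could be computed as in the following sketch. The function name `lexis_ratio` is hypothetical, and the code assumes that p is the pooled, size-weighted proportion of successes across all samples.

```python
def lexis_ratio(sizes, proportions):
    """Return Q = sum n_i (p_i - p)^2 / ((k - 1) p (1 - p))."""
    if len(sizes) != len(proportions):
        raise ValueError("sizes and proportions must have the same length")
    k = len(sizes)
    if k < 2:
        raise ValueError("at least two samples are needed")
    total = sum(sizes)
    # Assumption: p is the pooled (size-weighted) proportion of successes.
    p = sum(n * p_i for n, p_i in zip(sizes, proportions)) / total
    if p <= 0.0 or p >= 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    numerator = sum(n * (p_i - p) ** 2 for n, p_i in zip(sizes, proportions))
    return numerator / ((k - 1) * p * (1 - p))


# Three samples of 100 trials each with visibly different success rates.
sizes = [100, 100, 100]
proportions = [0.4, 0.5, 0.6]
q = lexis_ratio(sizes, proportions)
print(q)
```

For this made-up data the pooled proportion is 0.5 and the ratio comes out at about 4, well above 1.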

If the Lexis ratio is significantly below 1, the sampling is referred to as Poissonian (or subnormal); if it is equal to 1, the sampling is referred to as Bernoullian (or normal); and if it is above 1, it is referred to as Lexian (or supranormal). Chuprov showed in 1922 that in the case of statistical homogeneity
 * $$ E( Q ) = 1 $$

and

 * $$ var( Q ) = \frac{ 2 }{ n - 1 } $$

where E is the expectation and var is the variance. The formula for the variance is approximate and holds only for large values of n.
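The statement that E(Q) = 1 under homogeneity can be checked informally by simulation. The sketch below is illustrative only: the helper names, the parameter values (k = 10 samples of n = 50 trials, true probability 0.3) and the use of a pooled proportion for p are all assumptions, and only the mean of Q is examined.

```python
import random


def lexis_ratio(sizes, proportions):
    """Q = sum n_i (p_i - p)^2 / ((k - 1) p (1 - p)), with p pooled (assumed)."""
    k = len(sizes)
    p = sum(n * p_i for n, p_i in zip(sizes, proportions)) / sum(sizes)
    numerator = sum(n * (p_i - p) ** 2 for n, p_i in zip(sizes, proportions))
    return numerator / ((k - 1) * p * (1 - p))


def simulate_mean_q(k=10, n=50, p_true=0.3, reps=2000, seed=0):
    """Average Q over repeated homogeneous (Bernoullian) experiments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_q = 0.0
    for _ in range(reps):
        props = []
        for _ in range(k):
            # Every sample uses the same success probability p_true.
            successes = sum(rng.random() < p_true for _ in range(n))
            props.append(successes / n)
        total_q += lexis_ratio([n] * k, props)
    return total_q / reps


mean_q = simulate_mean_q()
print(round(mean_q, 2))
```

With these settings the simulated mean should lie close to 1; giving the samples different success probabilities would be expected to push it above 1.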

An alternative definition is


 * $$ Q = \frac{ s^2 }{ \sigma_0^2 }$$

where $$s^2 \,$$ is the (weighted) sample variance derived from the observed proportions of success in sets in "Lexis trials" and $$\sigma_0^2$$ is the variance computed from the expected Bernoulli distribution on the basis of the overall average proportion of success.

Lexis variation
A closely related concept is the Lexis variation. Let k samples each of size n be drawn at random. Let the probability of success be constant within each sample, and let the actual probability of success in the ith sample be pi, for i = 1, 2, ..., k.

The average probability of success (p) is


 * $$ p = \frac{ 1 }{ k } \sum{ p_i } $$

The variance in the number of successes is


 * $$ var(successes) = n p ( 1 - p ) + n ( n - 1 ) var( p_i ) $$

where var( pi ) is the variance of the pi.
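The variance formula can be checked numerically against the law of total variance. The sketch below assumes that var( pi ) is the population variance of the pi (dividing by k) and that a sample is equally likely to have any of the k probabilities. The function names are hypothetical.

```python
def population_variance(values):
    """Variance with divisor len(values), not len(values) - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lexis_variance(n, probs):
    """n p (1 - p) + n (n - 1) var(p_i), with p the mean of the p_i."""
    p = sum(probs) / len(probs)
    return n * p * (1 - p) + n * (n - 1) * population_variance(probs)


def total_variance(n, probs):
    """Law of total variance, assuming each p_i is equally likely to apply."""
    k = len(probs)
    # Mean of the binomial variances within each sample.
    within = sum(n * p_i * (1 - p_i) for p_i in probs) / k
    # Variance of the expected numbers of successes n * p_i.
    between = population_variance([n * p_i for p_i in probs])
    return within + between


n = 20
probs = [0.2, 0.5, 0.8]
v_formula = lexis_variance(n, probs)
v_direct = total_variance(n, probs)
print(v_formula, v_direct)
```

Both calculations give about 27.8 for this example, compared with a binomial variance of 5 when every pi equals 0.5.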

If all the pi are equal the sampling is said to be Bernoullian; where the pi differ the sampling is said to be Lexian and the dispersion is said to be supranormal.

Lexian sampling occurs in sampling from non-homogeneous strata.

History
Wilhelm Lexis introduced this statistic to test the then commonly held assumption that sampling data could be regarded as homogeneous.