Probability of superiority

The probability of superiority, or common language effect size, is the probability that, when a pair of observations is sampled at random from two groups, the observation from the second group will be larger than the observation from the first group. It is used to describe the difference between two groups. D. Wolfe and R. Hogg introduced the concept in 1971. Kenneth McGraw and S. P. Wong returned to the concept in 1992, preferring the term common language effect size. The term probability of superiority was proposed by R. J. Grissom a couple of years later.

The probability of superiority can be formalized as $$ P(X > Y) $$, the probability that a value $$ X $$ sampled at random from one population is larger than a value $$ Y $$ sampled at random from another population. (D. Wolfe and R. Hogg originally discussed it in the inverted form $$ P(X < Y) $$.)

Examples
McGraw and Wong gave the example of sex differences in height, noting that when a random man is compared with a random woman, the probability that the man will be taller is 92%. (Alternatively, in 92 out of 100 blind dates, the man will be taller than the woman.)

The population value for the common language effect size is often reported like this, in terms of pairs randomly chosen from the population. Kerby (2014) notes that a pair, defined as a score in one group paired with a score in another group, is a core concept of the common language effect size.

As another example, consider a scientific study (say, of a treatment for a chronic disease such as arthritis) with ten people in the treatment group and ten people in the control group. If everyone in the treatment group is compared with everyone in the control group, there are 10 × 10 = 100 pairs. At the end of the study, each individual's outcome is rated as a score (in an arthritis study, for example, on a scale of mobility and pain), and the two scores in each pair are compared. The common language effect size is the proportion of pairs that support the hypothesis. In the example study it would be 0.80 if 80 of the 100 pairs show a better outcome for the treated patient than for the control patient, and the report may read as follows: "When a patient in the treatment group was compared to a patient in the control group, in 80 of 100 pairs the treated patient showed a better treatment outcome." The sample value in a study like this is an unbiased estimator of the population value.
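
The pair-counting procedure in this example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the scores are invented for the purpose (they come from no real study), higher scores are assumed to mean a better outcome, and tied pairs are simply counted as not favorable, which is only one of several possible conventions.

```python
from itertools import product


def probability_of_superiority(treatment, control):
    """Return the proportion of (treatment, control) pairs in which
    the treatment score is strictly larger than the control score.

    Assumption: ties count as not favorable.
    """
    pairs = list(product(treatment, control))  # every treated patient vs. every control patient
    favorable = sum(1 for t, c in pairs if t > c)
    return favorable / len(pairs)


# Hypothetical scores, invented for illustration (higher = better outcome).
treatment_scores = [6.5, 6.5, 7.5, 7.5, 8.5, 8.5, 9.5, 9.5, 10.5, 10.5]
control_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

cles = probability_of_superiority(treatment_scores, control_scores)
print(f"{cles:.2f}")  # 80 of the 100 pairs favor the treatment group: 0.80
```

With ten scores per group the function compares all 100 pairs, mirroring the report sentence "in 80 of 100 pairs the treated patient showed a better treatment outcome."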

Equivalent statistics
An effect size related to the common language effect size is the rank-biserial correlation. This measure was introduced by Cureton as an effect size for the Mann–Whitney U test. That is, it applies when there are two groups and the scores for the groups have been converted to ranks.

The Kerby simple difference formula computes the rank-biserial correlation from the common language effect size. Letting f be the proportion of pairs favorable to the hypothesis (the common language effect size), and letting u be the proportion of pairs not favorable, the rank-biserial r is the simple difference between the two proportions: r = f − u. In other words, the correlation is the difference between the common language effect size and its complement. For example, if the common language effect size is 60%, then the rank-biserial r equals 60% minus 40%, or r = 0.20. The Kerby formula is directional, with positive values indicating that the results support the hypothesis.
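
As a small sketch of the simple difference formula, the Python function below computes r = f − u from counts of favorable and unfavorable pairs. The function name is hypothetical, and the sketch assumes that every pair is either favorable or unfavorable (no ties), so that f and u sum to 1.

```python
def kerby_rank_biserial(favorable_pairs, unfavorable_pairs):
    """Rank-biserial r as the simple difference f - u.

    Assumption: there are no tied pairs, so the two counts
    together make up all pairs.
    """
    total = favorable_pairs + unfavorable_pairs
    # Equivalent to f - u with f = favorable/total and u = unfavorable/total.
    return (favorable_pairs - unfavorable_pairs) / total


# Common language effect size of 60%: 60 favorable and 40 unfavorable pairs.
print(kerby_rank_biserial(60, 40))  # 0.2
# The formula is directional: reversing the counts flips the sign.
print(kerby_rank_biserial(40, 60))  # -0.2
```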

A non-directional formula for the rank-biserial correlation was provided by Wendt, such that the correlation is always positive.