ANOVA on ranks

In statistics, one purpose of the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality. ANOVA on ranks is a procedure designed for situations in which the normality assumption has been violated.

Logic of the F test on means
The F statistic is a ratio of a numerator to a denominator. Consider randomly selected subjects that are subsequently randomly assigned to groups A, B, and C. If the null hypothesis is true, the variability (or sum of squares) of scores on some dependent variable will be roughly the same within each group. Dividing the pooled within-group sum of squares by its degrees of freedom (based on the number of subjects per group) yields the denominator of the F ratio.

Treat the mean for each group as a score, and compute the variability (again, the sum of squares) of those three scores. Dividing this sum of squares by its degrees of freedom (based on the number of groups) yields the numerator of the F ratio.

When the null hypothesis is true, the sampling distribution of the F ratio depends on the degrees of freedom of the numerator and the denominator.
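The ratio just described can be computed directly. A minimal sketch in Python (the group scores are hypothetical, and scipy.stats.f_oneway is used only as a cross-check):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups, A, B, and C
a = np.array([4.0, 5.0, 6.0, 5.0])
b = np.array([5.0, 6.0, 7.0, 6.0])
c = np.array([7.0, 8.0, 9.0, 8.0])
groups = [a, b, c]

n = sum(len(g) for g in groups)          # total number of observations
k = len(groups)                          # number of groups
grand_mean = np.concatenate(groups).mean()

# Denominator: pooled within-group sum of squares over its degrees of freedom
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (n - k)

# Numerator: variability of the group means over its degrees of freedom
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

F = ms_between / ms_within
# Agrees with the library routine:
F_scipy, p = stats.f_oneway(a, b, c)
```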

Model a treatment applied to group A by increasing every score in that group by X. (This model maintains the underlying assumption of homogeneous variances; in practice it is rare, if not impossible, for an increase of X in a group mean to occur via an increase of exactly X in each member's score.) The shift moves group A's distribution X units in the positive direction, but has no impact on the variability within the group. The variability between the three group means, however, increases. If the resulting F ratio is large enough to exceed the critical value for a rare event (determined by the alpha level), the ANOVA F test rejects the null hypothesis of equal means across the three groups in favor of the alternative hypothesis that at least one group has a larger mean (in this example, group A).
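This shift argument can be verified numerically. The sketch below (using simulated, hypothetical data) adds a constant X to every score in group A and confirms that the within-group sum of squares is unchanged while the F ratio grows:

```python
import numpy as np
from scipy import stats

def ss_within(groups):
    """Pooled within-group sum of squares (the denominator's sum of squares)."""
    return sum(((g - g.mean()) ** 2).sum() for g in groups)

# Three hypothetical groups drawn from the same population (null is true)
rng = np.random.default_rng(0)
a, b, c = (rng.normal(0.0, 1.0, 10) for _ in range(3))

F_null, _ = stats.f_oneway(a, b, c)

X = 3.0                      # hypothetical treatment effect added to group A
a_treated = a + X

F_alt, _ = stats.f_oneway(a_treated, b, c)

# Shifting every score in A leaves within-group variability untouched...
assert np.isclose(ss_within([a, b, c]), ss_within([a_treated, b, c]))
# ...but inflates the between-group variability, and hence F.
assert F_alt > F_null
```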

Handling violation of population normality
Ranking is one of many procedures used to transform data that do not meet the assumptions of normality. Conover and Iman provided a review of the four main types of rank transformations (RT). One method replaces each original data value by its rank (from 1 for the smallest to N for the largest). This rank-based procedure has been recommended as being robust to non-normal errors, resistant to outliers, and highly efficient for many distributions. It may result in a known statistic (e.g., in the two independent samples layout, ranking yields the Wilcoxon rank-sum / Mann–Whitney U test) and provides the desired robustness and increased statistical power. For example, Monte Carlo studies have shown that the rank transformation in the two independent samples t-test layout can be successfully extended to the one-way independent samples ANOVA, as well as to the two independent samples multivariate Hotelling's T² layouts. Commercial statistical software packages (e.g., SAS) followed with recommendations that data analysts run their data sets through a ranking procedure (e.g., PROC RANK) prior to conducting standard analyses using parametric procedures.
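As a sketch of the rank-transformation procedure described above (the samples and sizes are hypothetical), the one-way layout can be analyzed by pooling the observations, replacing each by its rank, and running an ordinary ANOVA on the ranks. In this layout the resulting F statistic is, in the absence of ties, an exact monotone function of the Kruskal–Wallis H statistic:

```python
import numpy as np
from scipy import stats

# Hypothetical skewed (non-normal) samples for three groups of 12
rng = np.random.default_rng(1)
a, b, c = (rng.lognormal(mean=m, sigma=1.0, size=12) for m in (0.0, 0.0, 0.8))

# RT-1: pool all N observations, replace each by its rank (1..N),
# then run the ordinary parametric procedure on the ranks.
pooled = np.concatenate([a, b, c])
ranks = stats.rankdata(pooled)
ra, rb, rc = ranks[:12], ranks[12:24], ranks[24:]

F_ranks, p_ranks = stats.f_oneway(ra, rb, rc)

# Relation to the Kruskal-Wallis H statistic (no ties, N = 36, k = 3):
#   F = (H / (k - 1)) / ((N - 1 - H) / (N - k))
H, p_kw = stats.kruskal(a, b, c)
```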

Failure of ranking in the factorial ANOVA and other complex layouts
ANOVA on ranks means that a standard analysis of variance is calculated on the rank-transformed data. Conducting factorial ANOVA on the ranks of original scores has also been suggested. However, Monte Carlo studies, and subsequent asymptotic studies, found that the rank transformation is inappropriate for testing interaction effects in 4×3 and 2×2×2 factorial designs. As the number of non-null effects (i.e., main, interaction) increases, and as the magnitude of the non-null effects increases, Type I error increases, resulting in a complete failure of the statistic, with as high as a 100% probability of making a false positive decision. Similarly, it was found that the rank transformation increasingly fails in the two dependent samples layout as the correlation between pretest and posttest scores increases. It was also discovered that the Type I error rate problem was exacerbated in the context of analysis of covariance, particularly as the correlation between the covariate and the dependent variable increased.
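The interaction test at issue can be made concrete. The sketch below (data values are hypothetical) computes the interaction F for a balanced 2×2 layout, first on the raw scores and then on the pooled ranks; Monte Carlo studies of the kind cited above repeat the rank-based computation many times under non-null main effects and count the false rejections:

```python
import numpy as np
from scipy import stats

def interaction_F(cells):
    """Interaction F statistic for a balanced two-way (a x b) layout.
    `cells` is an a-by-b grid of equal-length samples."""
    cells = [[np.asarray(c, dtype=float) for c in row] for row in cells]
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = np.array([[c.mean() for c in row] for row in cells])
    row_means = cell_means.mean(axis=1)
    col_means = cell_means.mean(axis=0)
    grand = cell_means.mean()
    # Interaction SS: cell-mean deviations after removing both main effects
    ss_ab = n * ((cell_means - row_means[:, None]
                  - col_means[None, :] + grand) ** 2).sum()
    # Error SS: pooled within-cell variability
    ss_e = sum(((c - c.mean()) ** 2).sum() for row in cells for c in row)
    return (ss_ab / ((a - 1) * (b - 1))) / (ss_e / (a * b * (n - 1)))

# Hypothetical 2x2 layout with two observations per cell
raw = [[[1.0, 3.0], [2.0, 4.0]],
       [[3.0, 5.0], [10.0, 12.0]]]
F_raw = interaction_F(raw)

# Rank-transform version: pool all 8 scores, rank them, restore the layout
ranks = stats.rankdata(np.ravel(raw)).reshape(2, 2, 2)
F_ranks = interaction_F(ranks)
```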

Transforming ranks
A variant of rank-transformation is 'quantile normalization', in which a further transformation is applied to the ranks such that the resulting values have some defined distribution (often a normal distribution with a specified mean and variance). Further analyses of quantile-normalized data may then assume that distribution to compute significance values. However, two specific types of secondary transformations, the random normal scores and expected normal scores transformations, have been shown to greatly inflate Type I errors and severely reduce statistical power.
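A minimal sketch of quantile normalization to a standard normal target, using the common plotting position (i − 0.5)/N (the expected normal scores variant mentioned above instead uses positions such as Blom's (i − 3/8)/(N + 1/4)):

```python
import numpy as np
from scipy import stats

def quantile_normalize(x):
    """Map each value to the normal quantile of its rank: the i-th smallest
    of N values becomes Phi^{-1}((i - 0.5) / N)."""
    x = np.asarray(x, dtype=float)
    ranks = stats.rankdata(x)               # ranks 1..N (ties get average rank)
    return stats.norm.ppf((ranks - 0.5) / len(x))

# Hypothetical heavily skewed sample
rng = np.random.default_rng(2)
x = rng.exponential(scale=3.0, size=101)
z = quantile_normalize(x)
# z preserves the ordering of x but follows a standard normal shape
```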

Violating homoscedasticity
The ANOVA on ranks has never been recommended when the underlying assumption of homogeneous variances has been violated, whether by itself or in conjunction with a violation of the assumption of population normality. In general, rank-based statistics become non-robust with respect to Type I errors under departures from homoscedasticity even more quickly than parametric counterparts that share the same assumption.
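The kind of Monte Carlo study behind such conclusions can be sketched as follows (sample sizes, variances, and replication count are all illustrative): the null hypothesis of equal means is true, but the smaller group has the larger variance, and the empirical rejection rate of the rank-sum test at a nominal alpha of .05 is tallied:

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch: equal means (null true), but heteroscedastic groups
# with unequal sample sizes -- a configuration known to distort Type I error.
rng = np.random.default_rng(3)
alpha, reps = 0.05, 2000
rejections = 0
for _ in range(reps):
    small = rng.normal(0.0, 4.0, 10)   # n = 10, sd = 4
    large = rng.normal(0.0, 1.0, 40)   # n = 40, sd = 1
    _, p = stats.mannwhitneyu(small, large, alternative="two-sided")
    rejections += p < alpha
rate = rejections / reps               # empirical Type I error rate
```

Comparing `rate` with the nominal .05 level across such configurations is how the robustness claims above are evaluated.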

Further information
Kepner and Wackerly summarized the literature in noting "by the late 1980s, the volume of literature on RT methods was rapidly expanding as new insights, both positive and negative, were gained regarding the utility of the method. Concerned that RT methods would be misused, Sawilowsky et al. (1989, p. 255) cautioned practitioners to avoid the use of these tests 'except in those specific situations where the characteristics of the tests are well understood'." According to Hettmansperger and McKean, "Sawilowsky (1990) provides an excellent review of nonparametric approaches to testing for interaction" in ANOVA.