Mokken scale

The Mokken scale is a psychometric method of data reduction. A Mokken scale is a unidimensional scale that consists of hierarchically ordered items measuring the same underlying, latent concept. The method is named after the political scientist Rob Mokken, who proposed it in 1971.

Mokken scales have been used in psychology, education, political science, public opinion, medicine and nursing.

Overview


Mokken scaling belongs to item response theory. In essence, a Mokken scale is a non-parametric, probabilistic version of the Guttman scale. Both Guttman and Mokken scaling can be used to assess whether a number of items measure the same underlying concept. Both are based on the assumption that the items are hierarchically ordered: they can be ranked by "difficulty", where difficulty means the percentage of respondents that answer the question affirmatively. The hierarchical order means that a respondent who answers a difficult question correctly is assumed to answer an easy question correctly as well. The key difference between a Guttman and a Mokken scale is that Mokken scaling is probabilistic in nature. The assumption is not that every respondent who answered a difficult question affirmatively will necessarily answer an easy question affirmatively; violations of this pattern are called Guttman errors. Instead, the assumption is that respondents who answered a difficult question affirmatively are more likely to answer an easy question affirmatively. The scalability of the scale is measured by Loevinger's coefficient H, which compares the actual number of Guttman errors to the number of errors expected if the items were unrelated.
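The pairwise form of this comparison can be sketched in a few lines of Python. This is a minimal illustration for two dichotomous (0/1) items, not a full implementation; the function name `loevinger_h_pair` is invented for the example.

```python
import numpy as np

def loevinger_h_pair(x, y):
    """Loevinger's H for a pair of dichotomous items (0/1 sequences).

    H = 1 - (observed Guttman errors) / (errors expected if the items
    were statistically unrelated).
    """
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    # Order the pair by difficulty: the "easy" item is the one with the
    # higher proportion of affirmative responses.
    if x.mean() >= y.mean():
        easy, hard = x, y
    else:
        easy, hard = y, x
    # A Guttman error: affirmative on the hard item, negative on the easy one.
    observed = np.sum((hard == 1) & (easy == 0))
    expected = n * hard.mean() * (1 - easy.mean())
    return 1 - observed / expected
```

A perfect Guttman pattern yields H = 1, while statistically unrelated items yield H near 0.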

The chance that a respondent will answer an item correctly is described by an item response function. Mokken scales are similar to Rasch scales in that both adapt the Guttman scale to a probabilistic model. However, Mokken scaling is described as 'non-parametric' because it makes no assumptions about the precise shape of the item response function, only that it is monotonically non-decreasing. The key difference between Mokken and Rasch scales is that the latter assumes all item response functions share the same (logistic) shape, whereas in Mokken scaling the item response functions may differ in shape between items.
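The contrast in assumptions can be made concrete with a sketch. The function names and the particular step function below are invented for the illustration; any monotonically non-decreasing function is admissible as a Mokken item response function.

```python
import math

def rasch_irf(theta, b):
    """Parametric IRF assumed by the Rasch model: every item has the
    same logistic shape, merely shifted by its difficulty parameter b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def step_irf(theta):
    """One admissible non-parametric IRF: Mokken scaling only requires
    that the probability never decrease as theta grows, so an arbitrary
    non-decreasing step function like this is allowed."""
    if theta < -1.0:
        return 0.1
    if theta < 1.0:
        return 0.4
    return 0.9

# Both functions are non-decreasing over a grid of latent-trait values.
grid = [i / 10.0 for i in range(-40, 41)]
for irf in (lambda t: rasch_irf(t, 0.0), step_irf):
    values = [irf(t) for t in grid]
    assert all(a <= b for a, b in zip(values, values[1:]))
```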

Mokken scales come in two forms: first, the Double Monotonicity model, in which the items can differ in their difficulty; it is essentially an ordinal version of the Rasch scale. Second, the Monotone Homogeneity model, in which items also differ in their discrimination, meaning that the relationship between some items and the latent variable can be weaker than that between other items and the latent variable. Double Monotonicity models are used most often.

Monotone homogeneity
Monotone homogeneity models are based on three assumptions.
 * 1) There is a unidimensional latent trait on which subjects and items can be ordered.
 * 2) The item response function is monotonically non-decreasing: as a respondent's position on the latent trait increases, the chance of giving a positive response never decreases.
 * 3) The items are locally stochastically independent: given a respondent's position on the latent trait, his or her responses to any two items are not a function of any other aspect of the respondent or the items.
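Assumption 2 is commonly checked empirically by grouping respondents on their rest score (the sum score on all other items) and verifying that the observed proportion of positive responses to the item never drops as the rest score rises. A minimal sketch, assuming dichotomous items in a respondents-by-items 0/1 matrix; `monotonicity_check` is an invented helper name:

```python
import numpy as np

def monotonicity_check(data, item):
    """Proportion of positive responses to `item` within each rest-score
    group. Under monotone homogeneity this sequence should be
    non-decreasing (up to sampling noise)."""
    data = np.asarray(data)
    # Rest score: the sum over all items except the one being checked.
    rest = data.sum(axis=1) - data[:, item]
    return [data[rest == r, item].mean() for r in sorted(set(rest))]
```

For example, on a perfect Guttman data matrix the resulting sequence of proportions is non-decreasing.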

Double monotonicity and invariant item ordering
The Double Monotonicity model adds a fourth assumption, namely non-intersecting item response functions, so that the items retain the same rank order by difficulty across the latent trait. There has been some confusion in Mokken scaling between the Double Monotonicity model and invariant item ordering. The latter implies that all respondents to a series of questions respond to them in the same order across the whole range of the latent trait. For dichotomously scored items the Double Monotonicity model implies invariant item ordering; for polytomously scored items, however, this does not necessarily hold. For invariant item ordering to hold, not only must the item response functions not intersect; the item step response functions between one level and the next within each item must also not intersect.
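The non-intersection condition can be illustrated with a simple grid check. This is a sketch only; `irfs_intersect` and the two-parameter logistic items are invented for the example.

```python
import math

def logistic_irf(a, b):
    """A two-parameter logistic IRF with discrimination a and difficulty b."""
    return lambda theta: 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irfs_intersect(irf_1, irf_2, grid):
    """True if the two curves cross somewhere on the grid, i.e. neither
    curve lies everywhere at or above the other. Non-intersection is the
    extra assumption of the Double Monotonicity model."""
    diffs = [irf_1(t) - irf_2(t) for t in grid]
    return any(d > 0 for d in diffs) and any(d < 0 for d in diffs)

grid = [i / 10.0 for i in range(-40, 41)]
# Two items with equal discrimination (Rasch-type) never cross ...
assert not irfs_intersect(logistic_irf(1.0, -1.0), logistic_irf(1.0, 1.0), grid)
# ... but items with different discriminations can.
assert irfs_intersect(logistic_irf(2.0, 0.5), logistic_irf(0.5, -0.5), grid)
```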

Sample size
The issue of sample size for Mokken scaling is largely unresolved. Work using simulated samples, varying the quality of the items in the scales (Loevinger's coefficient and the correlation between scales), suggests that where item quality is high, smaller sample sizes in the region of 250–500 are required, compared with sample sizes of 1250–1750 where item quality is low. Analyses using real data from the Warwick–Edinburgh Mental Well-being Scale (WEMWBS) suggest that the required sample size also depends on which Mokken scaling parameters are of interest, as they do not all respond in the same way to varying sample size.

Extensions
While Mokken scaling analysis was originally developed to measure the extent to which individual dichotomous items form a scale, it has since been extended for polytomous items. Moreover, while Mokken scaling analysis is a confirmatory method, meant to test whether a number of items form a coherent scale (like confirmatory factor analysis), an Automatic Item Selection Procedure has been developed to explore which latent dimensions structure responses on a number of observable items (like factor analysis).

Analysis
Mokken scaling software is available within the public domain statistical software R (programming language) and also within the data analysis and statistical software stata. MSP5 for Windows for use on personal computers is no longer compatible with current versions of Microsoft Windows. Also within the R (programming language), unusual response patterns in Mokken Scales can be checked using the package PerFit. Two guides on how to conduct a Mokken scale analysis have been published.