User:Philipmccarthy/Lexical diversity

Lexical Diversity

Lexical diversity, also known as lexical richness or vocabulary richness, refers to the range of different words used in a text.

The earliest index of lexical diversity was the type-token ratio (Templin, 1957). The type-token ratio (TTR) is the number of types (unique words) in a text divided by the number of tokens (total words) in that text. Although still extensively used, TTR is generally recognized as being extremely sensitive to variations in text length (Malvern et al., 2004; McCarthy and Jarvis, 2007). The problem arises because the token count grows linearly (every additional word adds exactly one token), whereas the type count grows rapidly in the early stages of a text and then gradually levels off as words are repeated. As a result, TTR tends to fall as a text gets longer, which makes texts of different lengths difficult to compare.
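The length effect can be illustrated with a minimal Python sketch. The tokenizer, the function names, and the sample sentence are assumptions made for illustration only; they are not taken from the cited studies, and real analyses typically use more careful tokenization.

```python
import re


def tokenize(text):
    """Lowercase the text and extract runs of letters/apostrophes.

    This is a deliberately simple tokenizer (an assumption for this sketch).
    """
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Return the number of unique tokens (types) divided by the total tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


# Hypothetical sample text: 13 tokens, 8 types.
sample = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)

# The same vocabulary measured at three different text lengths.
print(f"first 5 tokens: {type_token_ratio(tokens[:5]):.2f}")   # 4 types / 5 tokens
print(f"full sample:    {type_token_ratio(tokens):.2f}")       # 8 types / 13 tokens
print(f"sample doubled: {type_token_ratio(tokens * 2):.2f}")   # 8 types / 26 tokens
```

In this toy example the ratio drops from 0.80 to 0.62 to 0.31 as the text grows, even though the doubled text uses exactly the same vocabulary as the original. This is the pattern described above: tokens keep accumulating while new types become increasingly rare.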

In recent years, two further indices of lexical diversity have emerged: vocd and MTLD. Although vocd remains sensitive to text length, the effect is very small (McCarthy and Jarvis, 2007). MTLD appears not to be a function of text length (McCarthy, 2005). While both vocd and MTLD are indices of lexical diversity, each assesses text quite differently. Thus, although the two indices tend to correlate quite highly (r ≥ .700), they are probably measuring different aspects of lexical diversity.