Word sketch

A word sketch is a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour. Word sketches were first introduced by the British corpus linguist Adam Kilgarriff and implemented in the Sketch Engine corpus management system. They extend the general concept of collocation used in corpus linguistics by grouping collocations according to particular grammatical relations (e.g. subject, object, modifier). The collocation candidates in a word sketch are sorted either by their frequency or by a lexicographic association score such as Dice, T-score or MI-score.
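To illustrate how the choice of sorting criterion changes the order of collocation candidates, the following minimal sketch computes the three association scores named above from raw counts, using their common textbook definitions. It assumes plain co-occurrence counts (f_ab for the pair, f_a for the headword, f_b for the candidate, n for corpus size); Sketch Engine computes its scores over relation-specific frequencies and may use variants of these formulas. The function names and all counts are hypothetical.

```python
import math


def dice(f_ab: int, f_a: int, f_b: int) -> float:
    """Dice coefficient: 2 * f(A,B) / (f(A) + f(B))."""
    return 2 * f_ab / (f_a + f_b)


def mi_score(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    """Mutual information in bits: log2(f(A,B) * N / (f(A) * f(B)))."""
    return math.log2(f_ab * n / (f_a * f_b))


def t_score(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    """T-score: (observed - expected) / sqrt(observed), expected = f(A) * f(B) / N."""
    expected = f_a * f_b / n
    return (f_ab - expected) / math.sqrt(f_ab)


def rank_collocates(f_head: int, candidates: dict, n: int, measure: str = "dice") -> list:
    """Return candidate words sorted best-first by the chosen measure.

    candidates maps word -> (f_ab, f_b); measure is one of
    "frequency", "dice", "mi" or "t" (hypothetical labels).
    """
    scores = {}
    for word, (f_ab, f_b) in candidates.items():
        if measure == "frequency":
            scores[word] = f_ab
        elif measure == "dice":
            scores[word] = dice(f_ab, f_head, f_b)
        elif measure == "mi":
            scores[word] = mi_score(f_ab, f_head, f_b, n)
        elif measure == "t":
            scores[word] = t_score(f_ab, f_head, f_b, n)
        else:
            raise ValueError(f"unknown measure: {measure}")
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical counts for modifiers of "man" in a 1,000,000-token corpus.
candidates = {"young": (50, 2000), "old": (40, 1500), "rare": (2, 10)}
for m in ("frequency", "dice", "mi", "t"):
    print(m, rank_collocates(5000, candidates, 1_000_000, m))
```

With these invented counts, frequency, Dice and T-score all rank "young" first, while MI-score promotes the infrequent "rare" to the top, which reflects the well-known tendency of MI to favour low-frequency pairs.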

Since their introduction, word sketches have been used by lexicographers at major publishing houses to develop modern corpus-based dictionaries, including the Oxford English Dictionary and the Macmillan English Dictionary, covering dozens of languages such as English, Chinese, Slovene, Japanese, Dutch, Romanian, Russian, Czech, Polish, Vietnamese, Turkish, Portuguese, Hindi and Spanish.

Formal account
A word sketch triple consists of a headword, a grammatical relation and a collocation (e.g. man, modifier, young). Relative to an underlying text corpus, a word sketch quintuple consists of a headword, a grammatical relation, a collocation, the position of the headword in the corpus and the position of the collocation in the corpus (e.g. man, modifier, young, 104, 103). A word sketch database is a set of such triples or quintuples, which may be generated either by querying a corpus with a corpus query language or by parsing the corpus with a natural language parser.
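The definitions above can be modelled directly as data. The sketch below is an illustration under stated assumptions, not Sketch Engine's implementation: it represents quintuples as a hypothetical `Quintuple` named tuple, projects them to triples, and builds a word sketch for one headword by grouping collocations per grammatical relation and counting occurrences. Apart from the (man, modifier, young, 104, 103) example taken from the text, the relation names, words and corpus positions are invented.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Quintuple(NamedTuple):
    headword: str
    relation: str
    collocation: str
    head_pos: int  # position of the headword in the corpus
    coll_pos: int  # position of the collocation in the corpus


def to_triples(database: list) -> set:
    """Project quintuples onto (headword, relation, collocation) triples."""
    return {(q.headword, q.relation, q.collocation) for q in database}


def word_sketch(database: list, headword: str) -> dict:
    """Group a headword's collocations by relation, most frequent first."""
    sketch = defaultdict(Counter)
    for q in database:
        if q.headword == headword:
            sketch[q.relation][q.collocation] += 1
    return {rel: counts.most_common() for rel, counts in sketch.items()}


# Hypothetical word sketch database; only the first entry comes from the text.
database = [
    Quintuple("man", "modifier", "young", 104, 103),
    Quintuple("man", "modifier", "old", 210, 209),
    Quintuple("man", "modifier", "young", 350, 349),
    Quintuple("man", "subject", "walk", 500, 501),
]
print(word_sketch(database, "man"))
```

Because the corpus positions make every occurrence a distinct quintuple, a set of quintuples preserves the frequencies needed for sorting, whereas the projected set of triples keeps only which combinations occur: the four quintuples above collapse to three triples.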