A Linguistic Atlas of Early Middle English

A Linguistic Atlas of Early Middle English (LAEME) is a digital, corpus-driven, historical dialect resource for Early Middle English (1150–1325). LAEME combines a searchable Corpus of Tagged Texts (CTT), an Index of Sources, and dot maps showing the distribution of textual dialect features. LAEME is headed by Margaret Laing of the University of Edinburgh, and includes contributions from Roger Lass (University of Cape Town) and web scripts by Keith Williamson, Vasilis Karaiskos (University of Edinburgh) and Sherrylyn Branchaw (University of California, Los Angeles).

Dating from 1987, a year after the publication of A Linguistic Atlas of Late Medieval English (LALME), its parent project, LAEME builds on medieval dialect methodologies developed for LALME, but parts ways with the latter by employing corpus linguistics methods. At present, these methods include the lexico-grammatical tagging of a select but comprehensive Early Middle English corpus, to which an ongoing project will add syntactic parsing (see Parsed Linguistic Atlas of Early Middle English, P-LAEME). Public access to a fully tagged, syntactically annotated corpus should provide unprecedented scope for phonological, lexico-grammatical, semantic, pragmatic and dialectal inquiry into a period marked by rapid linguistic change but also by a paucity of surviving texts. At the lexical level, further scope is added by A Corpus of Narrative Etymologies (CoNE, Version 1.1, 2013). CoNE derives a Corpus of Changes (CC) from LAEME's Corpus of Tagged Texts (CTT), giving relative chronologies of lexical forms annotated by Special Codes. CoNE's narrative etymologies are not based on semantic backtracking through cognates, but on the processual narrative of word forms through time.

Background
Prior to the advent of modern linguistics, medieval philologists had sought to describe patterns of linguistic variation on the basis of literary language. This approach posed significant problems, as literary language was and is more likely to be imitative, archaizing, or syncretic than everyday documentary language. Texts composed in a literary tradition or copied from literary exemplars, then, are not ideal for use in reconstructing the distribution of dialect features for a given period. Nevertheless, to take the most famous Middle English instance, J.R.R. Tolkien was able to conjecture the AB literary dialect by comparing manuscripts of the Katherine Group ("B") and the Ancrene Wisse ("A"). Drawing on his personal knowledge of Old English and Old Norse, while counting and cataloging thousands of verbs by hand, Tolkien argued for a regional literary standard localized to north-west Herefordshire that preserved Old English elements into the 13th century. His culminating philological essay, Ancrene Wisse and Hali Meiðhad (1929), has been called "one of the great triumphs of English philology" but also "philology's last gasp."

In the decade after this pioneering work, Middle English dialect studies entered a generation-long hiatus. LALME, whose initial stages date to 1952, ushered in the next phase. Motivated by strong arguments for scribal normalization, historical dialectologists were no longer constrained by the unevenness of scribal fidelity. Medieval dialect studies would now rely on the relative consistency of scribal translation into a scribe's own language, while developing techniques for discriminating source from scribe. Angus McIntosh, one of LALME's compilers, "observed that most copied Middle English texts were...in language that was dialectally homogeneous," suggesting scribal conversion of exemplar language into local varieties. Problematically, though, such varieties might not reflect the geographic areas of composition, since scribes often traveled to copying centers from far afield. This and other problems arising from the diversity of scribal practices (e.g. literatim, or literal, and Mischsprache, or mixed-language, copying) placed a premium on texts of explicit local provenance to anchor dialectal domains. Such anchor texts, often correspondence or legal documents, form the basis of LALME's fit-technique. Michael Benskin, another LALME compiler, describes the fit-technique as a "mechanical means of discovering whereabouts in the continuum of accents [an] unfamiliar accent belongs." This is done by comparing word forms in texts of uncertain origin against like forms attested in the anchor matrix, a comparison which "depends on the progressive elimination of the areas to which the individual elements of the accent do not belong." Only frequently attested items of particular dialectal salience are so compared. In the case of LALME, these items were solicited through text questionnaires later converted into Linguistic Profiles (LPs).
Questionnaires were not deemed practicable in LAEME's case, since the amount of linguistic information desired from comparatively fewer, comparatively fragmentary texts would have made them unwieldy. Instead, LAEME opted for corpus digitization.
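The progressive elimination on which the fit-technique depends can be illustrated with a minimal Python sketch. It is not LALME's or LAEME's actual procedure or data: the anchor matrix, the area labels A to D, the item names, and the function name fit are all hypothetical, chosen only to show how each attested form narrows the set of candidate areas.

```python
from typing import Dict, Set

# Hypothetical anchor matrix (invented for illustration): for each item,
# the areas in which each spelling is attested in localized anchor texts.
ANCHOR_MATRIX: Dict[str, Dict[str, Set[str]]] = {
    "THEM": {"hem": {"A", "B", "C"}, "thaim": {"D"}},
    "SHE": {"heo": {"A", "B"}, "scho": {"D"}, "she": {"C"}},
    "CHURCH": {"chirche": {"B", "C"}, "kirk": {"D"}},
}


def fit(profile: Dict[str, str],
        matrix: Dict[str, Dict[str, Set[str]]] = ANCHOR_MATRIX) -> Set[str]:
    """Return the areas left after eliminating those where a form is unattested."""
    # Start from every area mentioned anywhere in the anchor matrix.
    candidates: Set[str] = set()
    for forms in matrix.values():
        for areas in forms.values():
            candidates |= areas

    # Each form in the unlocalized text removes the areas where it is not found.
    for item, form in profile.items():
        attested = matrix.get(item, {}).get(form)
        if attested is None:
            continue  # form absent from the matrix: no evidence either way
        candidates &= attested
    return candidates


unlocalized_text = {"THEM": "hem", "SHE": "heo", "CHURCH": "chirche"}
print(fit(unlocalized_text))
```

With these invented data, "hem" leaves areas A, B and C, "heo" removes C, and "chirche" removes A, so only area B survives. A real fit works over a geographical continuum rather than discrete labels, and a thin anchor matrix makes any such result provisional.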

Corpus and tools
The period covered by LAEME is of prime grammatical and phonological interest, as the language was undergoing widespread inflectional loss from Old English (OE), but at different rates in different regions. Variation within phonological categories and inflectional paradigms possesses dialectal and base-grammatical significance. The emergent orthography of the phonetically variable characters yogh, thorn and edh indexes other changes in the language. Scribes during this period show a proportionally higher preference for literatim and mixed-language copying, resulting in a higher proportion of composite texts compared to later periods. Relict usage, where older exemplar forms remain unmodified in a translated text, and constrained selection, where exemplar forms are maintained because of scribal familiarity, pose additional problems for dialectal discrimination. Students of the period must also take into account its pronounced text/speech diglossia, with the majority of texts composed in Latin or French. Because LAEME's anchor matrix is thin and patchy, its dialectal fits are informed approximations subject to revision. LAEME's texts overall are geographically and temporally uneven: extremely sparse for its first half-century and for Northern varieties in general, and much denser for Southern varieties and through its final half-century, where it begins to overlap with LALME.

The above complexities and complications have obliged LAEME's architects to build the atlas around a digitized corpus of lexico-grammatically tagged texts that can be searched and compared according to the research needs of users. Instead of creating Linguistic Profiles from questionnaire responses, text dictionaries (taxonomical inventories of each text language) are derived from tagged texts. Independent searchability multiplies LAEME's uses across disciplines, as scholars and researchers can generate their own Form and Tag Dictionaries, Feature Maps, and Concordances. LAEME surpasses its remit as a dialect resource by offering a suite of multi-function linguistic tools for public use. This versatility accords with Angus McIntosh's progressive vision for LALME, which he saw as "more likely to benefit those concerned with the literature and culture and social structure of medieval England than those primarily concerned with...linguistics."
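How a text dictionary with frequency counts might be derived from a tagged text can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LAEME's implementation: the (tag, form) token pairs, the simplified tag labels, and the function name tag_dictionary are hypothetical and do not reproduce LAEME's actual tag syntax.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tag_dictionary(tokens: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group the forms found in one tagged text under their tags, with counts."""
    inventory: Dict[str, Counter] = defaultdict(Counter)
    for tag, form in tokens:
        inventory[tag][form] += 1
    return inventory


# Hypothetical tagged text: (tag, form) pairs invented for illustration.
tagged_text = [
    ("THE", "þe"), ("KING", "king"), ("THE", "þe"),
    ("CHURCH", "chirche"), ("THE", "te"), ("CHURCH", "chirche"),
]

for tag, forms in sorted(tag_dictionary(tagged_text).items()):
    # most_common() lists each form with its frequency, highest first.
    print(tag, forms.most_common())
```

Keying the same counts by form rather than by tag would give the complementary form dictionary, in which each spelling lists the tags it realizes.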

An example of a multi-function LAEME tool is the Concordance. Users begin by selecting a tag type (suffixes, grammatical words, inflection, lexis), entering a search string, and then choosing a position limiter (initial, medial, final). This search is followed by a filter set allowing users to specify counties, the number of words to precede and/or follow the search string, and sorting parameters (form, tag, date, file). Finally, users may select specific tagged forms generated by the search. Entries in the resulting concordance link to manuscript descriptions and to corresponding text dictionaries. Users are able to view contextualized instances of items indexed to coded sources available in several file types. Similar processes can be used to create Tag and Form Dictionaries with frequency counts, and to generate Feature Maps.