Data-driven learning

Data-driven learning (DDL) is an approach to foreign language learning. Whereas most language learning is guided by teachers and textbooks, data-driven learning treats language as data and students as researchers undertaking guided discovery tasks. Underpinning this pedagogical approach is the data-information-knowledge paradigm (see DIKW pyramid). It is informed by a pattern-based approach to grammar and vocabulary, and by a lexicogrammatical approach to language in general. Thus the basic task in DDL is to identify patterns at all levels of language. From their findings, foreign language students can see how an aspect of language is typically used, which in turn informs how they use it in their own speaking and writing. Learning how to frame questions about language, use resources to obtain data, and interpret that data is fundamental to learner autonomy. When students arrive at their own conclusions through such procedures, they use higher-order thinking skills (see Bloom's taxonomy) and construct knowledge (see Vygotsky).

In DDL, students use the same types of tools that professional linguists use, namely a corpus of texts that have been sampled and stored electronically, and a concordancer, which is a search engine designed for linguistic analysis. Some tools have been created specifically for data-driven learning, such as SkELL, WriteBetter, and Micro-concord.
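The core display of a concordancer is the keyword-in-context (KWIC) line: every occurrence of a search word is centred, with a fixed amount of surrounding text on each side, so that recurring patterns line up vertically and become visible. The following is a minimal sketch of that idea in Python, assuming plain-text input and a character-based context window; the function names `kwic` and `show` and the 25-character width are hypothetical illustrations, not the interface of SkELL, Micro-concord or WordSmith Tools.

```python
import re


def kwic(text: str, keyword: str, width: int = 25, sort_right: bool = True):
    """Return (left, node, right) keyword-in-context tuples.

    Hypothetical helper: matches whole words only, ignoring case, and keeps
    up to `width` characters of context on each side of every match.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    rows = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Collapse line breaks and runs of spaces so each row prints on one line.
        rows.append((" ".join(left.split()), m.group(0), " ".join(right.split())))
    if sort_right:
        # Sorting on the right-hand context groups lines with similar continuations.
        rows.sort(key=lambda row: row[2].lower())
    return rows


def show(rows, width: int = 25) -> None:
    """Print rows with the search word aligned in a central column."""
    for left, node, right in rows:
        print(f"{left:>{width}}  {node}  {right}")


if __name__ == "__main__":
    sample = ("She made a decision. They will make a decision soon. "
              "He took a decision quickly.")
    show(kwic(sample, "decision"))
```

Real concordancers typically also allow sorting on the left-hand context and work over corpora of millions of words with precomputed indexes; this sketch scans a single string only to show what students see on screen.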

Micro-concord was the first significant software designed for classroom use. It was developed for MS-DOS microcomputers by Tim Johns and Mike Scott and published by OUP in 1993. It evolved into the widely used WordSmith Tools.

Johns (1936–2009) pioneered data-driven learning and coined the term, which first appeared in his article "Should you be persuaded: Two examples of data-driven learning" (1991). His paper "From Printout to Handout" is reprinted and discussed at length in Volume 2 of Hubbard's Computer-Assisted Language Learning. Thomas' task-based Discovering English with Sketch Engine exemplifies DDL and acknowledges Johns throughout. Other books on DDL that credit Johns as the originator of the approach include those by Anderson and Corbett (2009), Reppen (2010), Bennett (2010), Flowerdew (2012), Boulton and Tyne (2014), and Friginal (2018).

Johns worked at the English for Overseas Students Unit of Birmingham University from 1971 until the end of his career. During this period, John Sinclair led a large team of linguists at Birmingham University on the COBUILD project, which delivered the first major corpus-based dictionaries and grammars of English for foreign students. COBUILD, however, never tasked students with exploring language data themselves.

Johns referred to his specific DDL approach as kibitzing: when he returned his students' written work, together they would explore the errors using corpus data. A selection of these Kibbitzer tutorials is accessible on Mike Scott's website.

Despite widespread awareness of corpora among leading figures in foreign language teaching, DDL has not been widely adopted by classroom teachers. One of the main reasons for this is an incompatibility of views on language and language learning: traditional language teachers and textbooks take a prescriptive view of language, treating it as a system of rules to be memorised, which engages only lower-order thinking skills. A descriptive view of language, by contrast, permits the observation of the patterns and outliers that exist in language itself. DDL positions students to use higher-order thinking skills as they learn how to make, and learn from, their own observations. Such guided discovery leads to fuzzy results, which are incompatible with prescriptive linguistics and teaching.

There is a considerable body of research into DDL, as evidenced by professional bodies, books, journal articles and conference presentations. TaLC (Teaching and Language Corpora) is a biennial conference that provides a platform for corpus-based research with a pedagogical focus. CorpusCALL is a special interest group within EuroCALL and is mostly active through its Facebook group. The online teaching journal Humanising Language Teaching hosts a section called Corpus Ideas.