Talk:List of text corpora

From Wikipedia, the free encyclopedia

What about using a simple criterion for including a corpus in this list? There are hundreds of corpora, but only a few of them are used and mentioned in corpus linguistics papers. What about setting a threshold of at least 10 citations or uses of a corpus by different authors? This is easy to check with Google Scholar. Of course, each corpus listed here should also have been published as a paper. Vít Baisa (talk) 09:41, 25 January 2016 (UTC)

More languages

I was surprised to see how few languages are listed here with corpora. I added several African corpora, but I hope that those with specialized knowledge of corpus linguistics will sift through the ones already listed (removing those that are not complete) and add useful corpora. Pete unseth (talk) 21:12, 23 July 2023 (UTC)