Wikipedia:Reference desk/Archives/Language/2013 May 4

= May 4 =

Clusivity in the second person
Our article Clusivity explains that it is the presence of separate inclusive (I and you) and exclusive (I and others, but not you) forms of first-person pronouns. That's clear enough. It then goes on to talk about clusivity in the second person (you vs. you and them). I don't see the difference between this and number. Surely "you and them" is just the second person plural. Can anyone explain what I'm missing here? Rojomoke (talk) 12:22, 4 May 2013 (UTC)
 * If I'm addressing Jack and Jill, my use of "you" (or, let's say, a marked plural form like German ihr or AmE y'all) might mean Jack and Jill, or it might mean Jack and Jill and Bob, say. I guess these are the two different situations it's referring to. Victor Yus (talk) 12:37, 4 May 2013 (UTC)
 * Yes. There are potentially three distinct groups: the speaker(s), the person(s) addressed, and others, and it is possible for a language to distinguish all seven combinations of these, irrespective of number. Lojban does so: I don't know if any natural languages do. --ColinFine (talk)
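
The figure of seven is the number of non-empty subsets of three groups (2^3 - 1 = 7). A minimal Python sketch that lists them follows; the labels `speaker`, `addressee` and `other` are illustrative names chosen here, not terms taken from the Clusivity article.

```python
from itertools import combinations

# Illustrative labels for the three groups named in the reply above.
GROUPS = ("speaker", "addressee", "other")

def nonempty_combinations(groups):
    """Return every non-empty subset of groups, smallest subsets first."""
    return [
        subset
        for size in range(1, len(groups) + 1)
        for subset in combinations(groups, size)
    ]

COMBOS = nonempty_combinations(GROUPS)

for combo in COMBOS:
    print(" + ".join(combo))
```

In this listing, `addressee` alone and `addressee + other` are separate entries, which is the second-person distinction the question asks about.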

Free, extensively tagged corpora of contemporary English?
Are there free, extensively tagged corpora of contemporary English? By "contemporary", I mean from recent decades, especially the last two. What I'm hoping to find is a free corpus of mainstream English in which
 * nouns and pronouns are tagged with such grammatical attributes as gender, number, person, and case; and
 * verbs are tagged with their verb forms & valencies.

Thanks. --108.2.210.141 (talk) 19:41, 4 May 2013 (UTC)


 * Digitized? For what purpose?  Given your location I suggest you try asking Labov's department. Native PA speakers shouldn't have a problem identifying such attributes for common English words--your purpose is obscure. μηδείς (talk) 07:00, 5 May 2013 (UTC)


 * Thanks for responding. I'm not a linguist, just someone with an intellectual curiosity about languages, particularly English. I want to know the relative prevalence of the variations in the way people use English. For example, how commonly is the singular "they" used? How commonly is "she" used as a pronoun for a person of unknown gender? How commonly are words like "police" and "government" treated as plural? The more extensively tagged a corpus is, the more kinds of statistics can be computed from it mechanically. For that purpose, it'd be ideal if the tags are easy for a computer program to process. There are a few other things that I'd like to try if I have a richly tagged, computer-friendly corpus to play with. --108.2.210.141 (talk) 13:31, 5 May 2013 (UTC)
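
As a rough illustration of the kind of statistic described above, here is a minimal Python sketch that counts whether "police" and "government" are followed by a singular or a plural verb. It assumes the corpus has already been loaded as (word, tag) pairs with Penn Treebank-style verb tags (`VBZ` for third-person singular present, `VBP` for other present-tense forms). The sample sentences and the function name `agreement_counts` are invented for the example and come from no real corpus.

```python
from collections import Counter

# Hypothetical sample data: (word, tag) pairs with Penn Treebank-style tags.
# The sentences are invented; a real corpus would supply these pairs.
TAGGED = [
    ("The", "DT"), ("police", "NN"), ("are", "VBP"), ("here", "RB"), (".", "."),
    ("The", "DT"), ("government", "NN"), ("has", "VBZ"), ("decided", "VBN"), (".", "."),
    ("The", "DT"), ("police", "NN"), ("have", "VBP"), ("left", "VBN"), (".", "."),
    ("The", "DT"), ("government", "NN"), ("are", "VBP"), ("divided", "VBN"), (".", "."),
]

def agreement_counts(tagged, nouns):
    """Count singular vs. plural present-tense verbs directly after each noun."""
    counts = {noun: Counter() for noun in nouns}
    # Walk over adjacent pairs: (current token, next token).
    for (word, _), (_, next_tag) in zip(tagged, tagged[1:]):
        key = word.lower()
        if key in counts and next_tag in ("VBZ", "VBP"):
            counts[key]["singular" if next_tag == "VBZ" else "plural"] += 1
    return counts

RESULT = agreement_counts(TAGGED, {"police", "government"})

for noun, tally in sorted(RESULT.items()):
    print(noun, dict(tally))
```

The sketch only looks at the immediately following token, so it misses cases such as "the police in this town are ..."; a corpus with syntactic annotation would allow a more reliable count.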


 * Your choices will be limited because high-quality corpora of useful size are horribly expensive to create. If you want it big and free, then the WaCKy corpora are probably your best bet. http://wacky.sslmit.unibo.it/doku.php?id=corpora Note however that they have been annotated automatically. That means that the quality of the annotations is limited by the software used to create them. For many applications that won't matter at all, but you should be aware of it. If an older and much, much smaller corpus is acceptable, then you could check the manually annotated SUSANNE corpus. http://www.grsampson.net/Resources.html If you have never used a corpus before, be warned that they use their own proprietary formats and you will have to put in some work before you can access the information you are looking for. KarlLohmann (talk) 06:28, 6 May 2013 (UTC)
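
To give an idea of the work involved in reading such a format, here is a minimal Python sketch for a one-token-per-line layout with tab-separated word, part-of-speech tag and lemma, and sentences closed by `</s>`. That layout is an assumption made for the example; the actual format of each corpus should be checked against its own documentation. The sample text and the function name `read_sentences` are invented.

```python
import io

# Invented sample in the assumed layout: word <TAB> tag <TAB> lemma,
# one token per line, with markup lines around each sentence.
SAMPLE = (
    "<s>\n"
    "They\tPP\tthey\n"
    "are\tVBP\tbe\n"
    "here\tRB\there\n"
    "</s>\n"
)

def read_sentences(lines):
    """Yield each sentence as a list of (word, tag, lemma) tuples."""
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "\t" in line:
            fields = line.split("\t")
            if len(fields) == 3:  # ignore lines that do not fit the assumed layout
                sentence.append(tuple(fields))
        elif line == "</s>":
            if sentence:
                yield sentence
            sentence = []
        # Any other line (opening tags, blank lines) is skipped.
    if sentence:  # flush a final sentence that lacks a closing tag
        yield sentence

SENTENCES = list(read_sentences(io.StringIO(SAMPLE)))

for sent in SENTENCES:
    print(sent)
```

Passing an open file instead of `io.StringIO(SAMPLE)` would process a corpus file line by line without loading it all into memory, which matters for corpora of this size.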