Wikipedia:Reference desk/Archives/Language/2021 January 11

= January 11 =

Parsing
Using common sense to parse this sentence:

Caren was born and raised in Rustenberg, South Africa with her three older siblings.

It's clear that Caren was born alone, but raised with 3 siblings. Both things happened in Rustenberg.

But, isn't the sentence actually saying that she was born and raised with 3 siblings? If we keep the structure and use other words, would it be so clear that the first act was alone and the second with others? Like: --Bumptump (talk) 14:16, 11 January 2021 (UTC)
 * Yes, it's poorly worded. Where did you see it? ←Baseball Bugs What's up, Doc? carrots→ 15:56, 11 January 2021 (UTC)
 * It's found here. I don't find the sentence problematic. Bus stop (talk) 16:33, 11 January 2021 (UTC)


 * Formally, the sentence has not two but three different potential meanings: 1) [born] and [raised with siblings]; 2) [born and raised] with siblings; 3) [born-and-raised] with siblings. The difference between 2) and 3) is that 2) may be expanded to 2a) [born with siblings] and [raised with siblings], but 3) cannot be. Your common-sense interpretation is either 1 or 3, but your claim that the sentence is "saying" 2 is false, unless you mean that that is one of several things it is saying.
 * With some choices of words, the sentence would be actually ambiguous; but language is used by real people in the real world, and both discourse pragmatics and real-world constraints are crucial parts of understanding language, as AI researchers have often found. --ColinFine (talk) 17:14, 11 January 2021 (UTC)
 * We don't need ultimate clarity in all contexts. In a legal contract ultimate clarity is necessary. But in other contexts it may not be. Bus stop (talk) 18:26, 11 January 2021 (UTC)
 * Alternatively we may say "At the point of Caren's birth three other siblings were in existence. The birth of Caren transpired in Rustenberg, South Africa. The upbringing of Caren transpired in Rustenberg, South Africa, in the presence of her three older siblings." Bus stop (talk) 19:10, 11 January 2021 (UTC)
 * Or simply, "Caren was born in Rustenberg, South Africa and raised with her three older siblings." ←Baseball Bugs What's up, Doc? carrots→ 21:30, 11 January 2021 (UTC)
 * That works too. Bus stop (talk) 21:58, 11 January 2021 (UTC)


 * At a stretch, it could mean that she was the last-born of a set of quadruplets. But nobody would read it that way, without some prior context. --  Jack of Oz   [pleasantries]  22:05, 11 January 2021 (UTC)
 * It was clear to me what the writer meant. But you can't count on publicity writers for perfect English. I do wonder if Caren Pistorius is related to Oscar Pistorius. ←Baseball Bugs What's up, Doc? carrots→ 22:40, 11 January 2021 (UTC)
 * It's not clear to me. It's ambiguous as to whether her siblings were also born in Rustenberg. Clarityfiend (talk) 23:49, 11 January 2021 (UTC)
 * Right. Not automatically clear to everyone. Unfortunately, her website is not subject to Wikipedia rules. ←Baseball Bugs What's up, Doc? carrots→ 01:39, 12 January 2021 (UTC)
 * And, as Bus stop implied above, it's not necessary that the sentence should also indicate where her siblings were born: the article is about her, not them. {The poster formerly known as 87.81.230.195} 90.200.40.9 (talk) 13:49, 12 January 2021 (UTC)
 * According to Oscar's article, they are not related. Bumptump (talk) 00:32, 12 January 2021 (UTC)
 * The surname Pistorius entered South Africa with the progenitor Friedrich Heinrich Pistorius (born in Germany in 1789), who is the 4x great-grandfather of Oscar. While Caren's patrilineage is not available, it is likely she and Oscar are no more distantly related than fifth cousins. 124.148.156.188 (talk) 10:44, 12 January 2021 (UTC)
 * Also consider this: "born and raised in Rustenberg, South Africa with her three older siblings, while her seventeen younger siblings were sold to passing merchants". Obviously not intended, and also not how the sentence would normally be understood, but the wording does, strictly speaking, not imply that the subject was the youngest of the litter. --Lambiam 11:07, 12 January 2021 (UTC)
 * Good point. We aren't trying, in this sort of writing, to remove all ambiguity. Legal writing aims to leave no loopholes. This source writes: "Contract language is limited and stylised," says Adams. He compares it to software code: do it right and everything works smoothly. But make a typo and the whole thing falls apart. When errors are introduced into legal documents, they’re likely to be noticed far more than in any other form of writing, he says. "People are more prone to fighting over instances of syntactic ambiguity than in other kinds of writing." Bus stop (talk) 13:49, 12 January 2021 (UTC)


 * It needs a balancing comma after South Africa. —Tamfang (talk) 02:06, 15 January 2021 (UTC)

Vowel location frequency within words
Playing word games sometimes leads my mind to weird places. For example, if you're playing a Scrabble-like game, where you need to form words based on a set of letters, and you have an "A" and an "E" and a selection of consonants, it makes sense to put the "A" closer to the front and the "E" closer to the back because there seem to be more words that follow that pattern. Of course there are lots of words where the "E" comes before the "A", like "beat" and "break", but many more where the opposite is true, like "bake" and "fake" and "mashed" and so on. Further, the patterns are probably even more distinct if you control for prefixes and suffixes and so on. My question is: has this been studied using a corpus of English words? Matt Deres (talk) 23:30, 11 January 2021 (UTC)


 * If you have a Unix/Linux system, you can do some basic checking yourself:
 * grep -ic 'e.*a' /usr/share/dict/words
 * grep -ic 'a.*e' /usr/share/dict/words
 * etc... AnonMoos (talk) 05:37, 12 January 2021 (UTC)
 * And if you restrict the count to words that contain just two vowels:
 * grep -ic '^[^aeiou]*a[^aeiou]*e[^aeiouy]*$' /usr/share/dict/words
 * grep -ic '^[^aeiou]*e[^aeiou]*a[^aeiouy]*$' /usr/share/dict/words
 * (an approximation, because of the ambiguity of the letter (y)), the discrepancy is even more pronounced; I find a-e : e-a = 3000 : 1356. --Lambiam 10:57, 12 January 2021 (UTC)


 * Neat! Thank you both; maybe I should have asked this on RD/C. :) I guess I was assuming this was a common way of describing languages and someone had already done this kind of work, comparing languages or variants and so on. Like, how does the pattern change between EN-variants. Unfortunately, I don't use Unix or Linux. Matt Deres (talk) 16:59, 12 January 2021 (UTC)
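For anyone without grep, the same checks can be approximated in Python on any platform. This is only a minimal sketch: the helper names `load_words` and `count_matching` and the tiny `SAMPLE` list are made up for illustration, and you would need to supply your own plain-text word list (one word per line) in place of /usr/share/dict/words.

```python
import re
from typing import Iterable, List


def load_words(path: str) -> List[str]:
    """Read a one-word-per-line list (hypothetical helper; the path is up to you)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh]


def count_matching(words: Iterable[str], pattern: str) -> int:
    """Count words containing a match, ignoring case, roughly like `grep -ic`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(1 for word in words if rx.search(word))


# The patterns quoted in the thread above.
E_BEFORE_A = r"e.*a"
A_BEFORE_E = r"a.*e"
ONLY_A_THEN_E = r"^[^aeiou]*a[^aeiou]*e[^aeiouy]*$"
ONLY_E_THEN_A = r"^[^aeiou]*e[^aeiou]*a[^aeiouy]*$"

# Illustrative sample only; it is not a real corpus.
SAMPLE = ["beat", "break", "bake", "fake", "mashed", "Area", "sky"]

if __name__ == "__main__":
    for name, pattern in [
        ("e before a", E_BEFORE_A),
        ("a before e", A_BEFORE_E),
        ("only a then e", ONLY_A_THEN_E),
        ("only e then a", ONLY_E_THEN_A),
    ]:
        print(name, count_matching(SAMPLE, pattern))
```

To run it on a real list, replace `SAMPLE` with `load_words("your_word_list.txt")`; the counts will of course depend on which list you use.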


 * I would think that "o" before "u" would show a lot more frequency variation between UK and American English spellings than "a" before "e". Traditional cryptanalysts often compiled tables of digraph frequency, and sometimes trigraph frequency, but such tables referred to combinations of adjacent letters...  AnonMoos (talk) 21:57, 12 January 2021 (UTC)

I finally rebooted my computer from Windows into Linux to do some tasks, and also did some searches. The "words" file there has 45425 words, and according to the simplest search, 6753 words have "e" before "a", while 10831 words have "a" before "e" (of course, some words have both). On another Unix-style system I have access to (not Linux), the "words" file has 235970 words, and 54427 have "e" before "a", while 63392 have "a" before "e". AnonMoos (talk) 05:58, 15 January 2021 (UTC)
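Because some words match both patterns, the two simple counts overlap. A rough way to separate the cases is to put each word into exactly one bucket. The sketch below is an illustration only: the function names, the bucket labels and the sample words are assumptions, not anything used in the searches above. The `ratio` helper just divides one reported count by another.

```python
from typing import Dict, Iterable


def has_order(word: str, first: str, second: str) -> bool:
    """True if some `first` letter occurs before some `second` letter in `word`."""
    i = word.find(first)    # earliest occurrence of the first letter
    j = word.rfind(second)  # latest occurrence of the second letter
    return i != -1 and j != -1 and i < j


def order_breakdown(words: Iterable[str]) -> Dict[str, int]:
    """Assign each word to exactly one of four buckets (labels are hypothetical)."""
    counts = {"a_then_e_only": 0, "e_then_a_only": 0, "both_orders": 0, "neither": 0}
    for raw in words:
        word = raw.lower()
        a_e = has_order(word, "a", "e")
        e_a = has_order(word, "e", "a")
        if a_e and e_a:
            counts["both_orders"] += 1
        elif a_e:
            counts["a_then_e_only"] += 1
        elif e_a:
            counts["e_then_a_only"] += 1
        else:
            counts["neither"] += 1
    return counts


def ratio(a_before_e: int, e_before_a: int) -> float:
    """Ratio of the a-before-e count to the e-before-a count."""
    return a_before_e / e_before_a


if __name__ == "__main__":
    sample = ["beat", "break", "bake", "fake", "mashed", "Area", "sky"]
    print(order_breakdown(sample))
    # Ratios from the figures reported in this thread.
    print(round(ratio(3000, 1356), 2))    # two-vowel words only
    print(round(ratio(10831, 6753), 2))   # 45425-word list
    print(round(ratio(63392, 54427), 2))  # 235970-word list
```

With the figures quoted above, the preference for "a" before "e" is strongest among the two-vowel words and weakest on the larger word list.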