Wikipedia:Reference desk/Archives/Mathematics/2015 June 11

= June 11 =

Can we define the frequency of a word in a language with infinite sentences?
Given a language with finite words, aka elements, but infinite combinations of words, aka sentences, could we say that any word has the same frequency or say that frequency is not defined?--Llaanngg (talk) 18:46, 11 June 2015 (UTC)
 * English is such a language, see Word_lists_by_frequency - there's no real linguistic problem here unless you try to make it a problem. There's not even a mathematical problem, you just restrict to a corpus, e.g. here . But if you really really want to force it into being mathematically problematic, then sure, you can do so in various ways. It's sort of like trying to find a way to force a uniform distribution on to the real numbers. SemanticMantis (talk) 19:38, 11 June 2015 (UTC)


 * Yes, a reasonable person won't have any problem considering "the" more common than "grumpy" in English, as he will be assuming that we are referring to a corpus. The same applies to other natural languages. Mathematicians often are more pedantic than this. --Abaget (talk) 20:19, 11 June 2015 (UTC)


 * Chiming in here with a pet peeve: what you mean is infinitely many sentences, not "infinite sentences". A "language with infinite sentences" would be one in which a single sentence could be infinitely long.  This may seem picky but when you start discussing these things it's really pretty important to get it right. --Trovatore (talk) 19:49, 11 June 2015 (UTC)


 * It is both infinite sentences (as in as long as you wish, since a recurrently built part of a sentence is allowed), and infinitely many combinations of words (but not all combinations allowed). Both could be meant here. --Abaget (talk) 20:19, 11 June 2015 (UTC)
 * It is absolutely critical, and I can't possibly say this strongly enough, to distinguish between "as long as you wish" and "infinitely long". --Trovatore (talk) 23:49, 11 June 2015 (UTC)
 * Yeah, I don't know what exactly OP meant, and there is no written infinitely long sentence in English, but valid English sentences can be arbitrarily long (in word number). I agree it is slightly sloppy in a pure mathematical sense, but "infinite" is commonly used for "no upper bound" or "arbitrarily large" and that's what we have with respect to sentence length in English.
 * As an aside, I'm fairly certain we can describe an infinite sentence in English (that is grammatically valid), even if we can't write it down (much like we do with infinite sequences). Something like "I like the dog that bit the man who kicked the dog that bit another man who kicked another dog..." or even simpler - "I will meditate on the number 1, and then on 2, and then on 3..." - I'm sure some linguists/mathematicians have talked about this sort of infinite sentence but I can't search for it right now. After all, Chomsky has been talking about this kind of thing (see first sentence of Recursion) for quite some time. SemanticMantis (talk) 20:22, 11 June 2015 (UTC)
 * Oh, infinitely long (mathematical) sentences are a well-studied topic. See infinitary logic for a start.  It's not just a curiosity; it has very important applications in model theory and descriptive set theory.  See for example Scott analysis and Barwise compactness theorem. --Trovatore (talk) 21:20, 11 June 2015 (UTC)
 * Thanks for the refs, but I meant truly infinite-length sentences in natural language, not logic/math. SemanticMantis (talk) 21:46, 11 June 2015 (UTC)
 * Well, I don't think the issues are really all that different. --Trovatore (talk) 21:50, 11 June 2015 (UTC)
 * Yikes, we don't have an article on the Scott analysis? And the BCT is a tiny stub?  That really needs to change.... --Trovatore (talk) 21:32, 11 June 2015 (UTC)
 * I was going to respond with an example sentence that is infinitely long, but for some strange reason I could never finish writing my response. It feels like I never get any closer to the end no matter how many words I write... :) --Guy Macon (talk)
 * You're not trying - how about There's a Hole in My Bucket or better 'There was an old man named Michael Finnigan' or many of the others at Repetitive song :) Sorry I see you want a sentence - well how about the terminal sentence in I don't know why she swallowed a fly perhaps she'll die or the house that Jack built. Dmcq (talk) 09:10, 12 June 2015 (UTC)
 * Did someone call? I don't know any infinitely long sentences, but I do know an infinitely long poem: "$∞$ Green Bottles".  --   Jack of Oz   [pleasantries]  11:13, 12 June 2015 (UTC)

There is no language with either an infinitely long sentence or with infinitely many sentences. Even if the number of sentences in English, uttered or written, is huge, it is still finite. So "a language with infinite sentences" is a contradiction. Bo Jacoby (talk) 13:09, 12 June 2015 (UTC).
 * I'm not convinced English grammar prohibits arbitrarily long sentences, and if not that would imply there are infinitely many sentences. A humorous example is Monty Python's Njorl's Saga, which starts "Erik Njorl, son of Frothgar, leaves his home to seek Hangar the Elder at the home of Thorvald Nlodvisson, the son of Gudleif, half brother of Thorgier, ..." Eventually the lead sentence gets into so many details about Thorvald's relatives and their doings that the announcer has to cut in, leaving the impression that the sentence is quite long indeed. In any case, it's really a question for linguists, not mathematicians.
 * If you want to talk about formal grammars though, an example might be the language consisting of strings of a's and b's with no two consecutive b's. You could then ask, what is the expected number of b's in a randomly selected word of length n? Using generating functions it can be shown that this is asymptotically n/(φ√5), which would imply that the average frequency of b's is 1/(φ√5). (Here φ is the golden ratio.) You can extend this naturally to infinite words. Imagine a communication channel which sends a's and b's, but can't send two consecutive b's. Then you might ask what the maximum rate is at which information can be sent. Equivalently, if you have a channel capable of transmitting dots and dashes, but the dashes take twice as long to send as the dots, what is the maximum rate? If information is being transmitted at the maximum rate, then the probability of a given allowable word of length n should be asymptotically the same as the probability of any other allowable word of length n. Which means that the average frequency of b's is 1/(φ√5), provided the channel is transmitting at its maximum information rate. So it does make sense to talk about the frequency of letters in formal grammars and it can even be computed in some cases. There are some languages though where this doesn't work, for example the language consisting of even length strings of a's and odd length strings of b's. The frequency of b's doesn't converge in this case and it's not clear how you would extend this to infinite words.
 * PS. (For the nit pickers out there.) If Thorvald was the son of Gudleif, then his name would be Thorvald Gudleifson, which makes me question whether Njorl's Saga is historically accurate. --RDBury (talk) 15:37, 12 June 2015 (UTC)
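
As a rough numerical check of the 1/(φ√5) figure in the formal-grammar example above, the sketch below counts b's over all words of length n with no two consecutive b's, treating every allowed word as equally likely. The function name `b_frequency` and the dynamic-programming bookkeeping are illustrative assumptions, not something taken from the post, and this is a brute count rather than the generating-function argument mentioned there.

```python
from math import sqrt

def b_frequency(n):
    """Average fraction of b's over all length-n words with no two consecutive b's.

    Hypothetical helper for illustration: every allowed word is weighted equally.
    """
    # For words of the current length, track (number of words, total b's in them),
    # split by whether the word ends in 'a' or in 'b'.
    count_a, total_a = 1, 0  # the empty word, treated as "ending in a"
    count_b, total_b = 0, 0
    for _ in range(n):
        # 'a' may follow anything; 'b' may only follow a word ending in 'a'.
        new_count_a = count_a + count_b
        new_total_a = total_a + total_b
        new_count_b = count_a
        new_total_b = total_a + count_a  # each such word gains one extra b
        count_a, total_a = new_count_a, new_total_a
        count_b, total_b = new_count_b, new_total_b
    words = count_a + count_b
    return (total_a + total_b) / (n * words)

PHI = (1 + sqrt(5)) / 2
LIMIT = 1 / (PHI * sqrt(5))  # about 0.2764

for n in (1, 2, 10, 100, 500):
    print(n, b_frequency(n), LIMIT)
```

The printed frequencies should drift toward the limit as n grows, with a gap that shrinks roughly like 1/n.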


 * Um, for both claims in your first sentence - For the claim of infinitely long sentences, see the link above to Chomsky's perspective on recursion. That is perhaps debatable, and reasonable people might reasonably disagree, but it is by no means obvious that there cannot exist an infinite sentence.
 * For infinitely many possible sentences, that is completely reasonable, and I can't see how or why you would reject that. All you need for infinitely many sentences is a finite word list and arbitrarily long sentences. Do you think that there is an upper bound on the number of words in a valid English sentence? Are you perhaps not considering sentences with different words or different orders to be unique? Or do you think that the number of English sentences is finite, by virtue of the fact that only finitely many have currently been written? There is indeed only a finite number of sentences that have been written or spoken, so maybe we need to distinguish between possible sentences (infinite) and extant sentences (finite). Or maybe you mean something else, but I can't tell. SemanticMantis (talk) 15:44, 12 June 2015 (UTC)
 * If there are an infinite number of possible sentences but only a finite number of words then there must be some infinite sentence. Dmcq (talk) 16:44, 12 June 2015 (UTC)
 * Huh? How ya figure? --Trovatore (talk) 16:55, 12 June 2015 (UTC)
 * There are an infinite number of natural numbers, but each one is still finite, and there is no infinitely long natural number. We do have a decent article on Arbitrarily large, which may clarify some of the possible confusion on the distinction between that concept and infinity. SemanticMantis (talk) 17:09, 12 June 2015 (UTC)
 * If there are an infinite number of different sentences then consider any particular length of starting words. There must be one such sequence which starts an infinite number of sentences. Therefore we can start saying a sentence and never stop and at any time it is still the start of a valid sentence. Dmcq (talk) 22:59, 12 June 2015 (UTC)
 * Yes, but that's different from saying there's an infinite sentence. Basically you're arguing by König's lemma that a certain tree must have an infinite branch, but what's not clear is that that branch is a sentence. --Trovatore (talk) 23:10, 12 June 2015 (UTC)
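
The distinction drawn in the exchange above can be seen in a toy language. The sketch below assumes a made-up language (not English, and not anything proposed in the thread) whose sentences are zero or more copies of the word "a" followed by exactly one "b". Every run of a's begins more and more sentences as the length limit is raised, so the tree of prefixes has an infinite branch, yet no run of a's is itself a sentence. The names `is_sentence` and `count_extensions` are hypothetical.

```python
def is_sentence(words):
    """Toy language (an assumption): zero or more 'a' followed by exactly one 'b'."""
    return len(words) >= 1 and words[-1] == "b" and all(w == "a" for w in words[:-1])

def count_extensions(prefix, max_len):
    """Number of sentences with at most max_len words that begin with prefix."""
    total = 0
    for n in range(max_len):  # candidate sentence: n a's, then one b
        candidate = ["a"] * n + ["b"]
        if candidate[: len(prefix)] == prefix:
            total += 1
    return total

prefix = ["a"] * 5
# The prefix starts ever more sentences as the length limit grows...
print(count_extensions(prefix, 10), count_extensions(prefix, 100))
# ...but the prefix itself never becomes a sentence, however long it gets.
print(any(is_sentence(["a"] * k) for k in range(1000)))
```

In this toy case the infinite branch "a a a ..." is a limit of sentence beginnings, but it never reaches the closing "b", which is the gap between "every finite start can be continued" and "there is an infinite sentence".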

English has been spoken in a finite time by a finite number of people, so the totality of the English language is finite. Bo Jacoby (talk) 19:21, 12 June 2015 (UTC).
 * Thanks for explaining yourself, but that's not how linguists (or anyone I can think of other than you) think of language. You are correct that the corpus of every sentence written or spoken is finite. But a language is far more than the collection of utterances that have been made. Language and Philosophy_of_language describe more of what languages are. Even if you want to ignore everything Chomsky ever said, I don't think you'll find many perspectives claiming that a corpus is a language. SemanticMantis (talk) 20:16, 12 June 2015 (UTC)
 * I think it depends on what one is really talking about. There's natural language as found in the wild, and then there are Platonic idealizations of it.  As an example, is this an English sentence?
 * The man the dog the bird the cat the child petted chased pecked bit left.
 * Well, it can be successfully diagrammed; according to some formal notion of English grammar it is syntactically correct. The child petted the cat, the cat chased the bird, the bird pecked the dog, the dog bit the man, and the man left.  But can it really be said to be part of natural English?  I think it's sort of a stretch.
 * So certainly, the simplest formal idealizations of English contain arbitrarily large sentences. But it's not so clear that English as a natural language does.  Even in that case Bo is a little bit off: almost no one would claim that something can't be an English sentence unless it's been spoken before.  But that's a detail. --Trovatore (talk) 22:52, 12 June 2015 (UTC)
 * And as you (T.) certainly know, but not all others: Even with a formal grammar that allows arbitrarily long sentences, you still cannot generate infinitely long sentences. --Stephan Schulz (talk) 23:33, 12 June 2015 (UTC)

"a language is far more than the collection of utterances that have been made." Even if you include far more, such as the utterances that will be made in the next million years, the collection is still finite. You may define some infinite mathematical structure, and you are free to call it 'language', but until you have done that the OP's question makes no sense. Bo Jacoby (talk) 06:59, 13 June 2015 (UTC).


 * For some (finite) set of infinite sequences of words, the most obvious approach is to interleave them (first word of each, then second word, etc.) and compute the natural density (if it exists). If you also have an infinite number of sequences, you might try ordering them as can be done with the rationals.  --Tardis (talk) 04:36, 16 June 2015 (UTC)
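
A minimal sketch of the interleaving idea above, assuming two made-up infinite word sequences chosen only for illustration: merge them round-robin and estimate the natural density of a target word from a long finite prefix. The helper names `interleave` and `running_density` are hypothetical, and a finite prefix can only suggest a limit, not prove that one exists.

```python
from itertools import chain, cycle, islice, repeat

def interleave(*sequences):
    """Round-robin merge: first word of each sequence, then the second of each, and so on."""
    return chain.from_iterable(zip(*sequences))

def running_density(words, target, n):
    """Fraction of the first n words that equal target."""
    hits = sum(1 for w in islice(words, n) if w == target)
    return hits / n

# Assumed example sequences: "a b a b ..." and "a a a a ..."
merged = interleave(cycle("ab"), repeat("a"))
print(running_density(merged, "b", 4000))  # 0.25
```

Here "b" fills every other slot of the first sequence and none of the second, so it occupies one slot in four of the merged sequence.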