Talk:Regular language

The word regular
I removed the following:
 * The word regular is used because these expressions are built up using very distinct limited and regulated syntax rules.

I think one could argue that context-free languages are also built according to limited syntax rules. I'm pretty sure the word "regular" was chosen without much thinking; it's one of those meaningless terms, like "normal", that are used all over the place. Also, the above "these expressions" seems to refer to regular expressions, but regular expressions hadn't been mentioned in the article at that point. --AxelBoldt

Regular expressions
A regular language is a formal language (i.e. a possibly infinite set of finite sequences of symbols from a finite alphabet) that has one of the following equivalent properties:
 * it can be described by a regular expression.

Modern "regular" expressions can describe more languages than just regular ones. Example: /([a-z]*)\1/ --Taw


 * Obviously, this refers to regular expressions in the sense used in the theory of computation, not Perl regular expressions etc. -- Schnee 12:16, 13 Jun 2004 (UTC)
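Taw's point is easy to demonstrate in Python, whose re module supports backreferences (this illustration is mine, not part of the discussion above): the pattern matches exactly the strings of the form ww, a standard example of a non-regular language.

```python
import re

# ([a-z]*)\1 matches a block of lowercase letters followed by an exact
# repetition of that block, i.e. the "copy language" { ww }, which is
# provably not regular (e.g. by the pumping lemma).
square = re.compile(r'([a-z]*)\1')

assert square.fullmatch('abcabc')      # w = 'abc'
assert not square.fullmatch('abcabd')  # not of the form ww
```

So in the theory-of-computation sense, backreferences take "regular" expressions strictly beyond the regular languages.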

Examples and other questions
I'm reading about regular languages for the first time, so I'm curious about a few things which I could not find in the article.
 * 1) Why the word 'regular'? I can see a line on this talk page about it, but it'd be good if a line were added to the main article as well.
 * 2) Examples of some regular languages.
 * 3) What do you call a language that's not a regular language? An "irregular language"? If there is anything of the sort, a link to it and maybe an example would help in understanding this article better.

Jay 07:48, 14 Nov 2003 (UTC)


 * An example of a regular language might be that consisting of all words formed by three a's followed by any number of b's (with the alphabet consisting of a and b). I don't know where the term "regular" comes from, but I guess it's historic; as for non-regular languages, I never heard of a special moniker for them (outside of "non-regular"). -- Schnee 12:16, 13 Jun 2004 (UTC)
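Schnee's example language can be written as the textbook regular expression aaab*; a quick Python sketch (my own illustration, not part of the original comment):

```python
import re

# Three a's followed by any number of b's, over the alphabet {a, b}.
lang = re.compile(r'aaab*')

assert lang.fullmatch('aaa')        # zero b's is allowed
assert lang.fullmatch('aaabbbb')
assert not lang.fullmatch('aab')    # only two a's
assert not lang.fullmatch('aaaba')  # the b's must come last
```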


 * As mentioned, it is not obvious that a language that is not regular should be called nonregular or non-regular. The reason is that irregular fits better as a correct English word (my students were also confused by this and were using irregular). I did add this clarification to the page; can you please reconsider or comment on your revert? Cheater no1 (talk) 16:31, 28 October 2020 (UTC)
 * As I said in my edit summary, the prefix "non-" means negation in common language, so I guess it is obvious that a non-regular language is a language that is not regular, and this needn't be explained. We don't explain "non-zero number", "non-empty set", etc. either.
 * If there is a subtle distinction in English between "irregular" and "non-regular", things might be different, and my English knowledge is insufficient to comment about this. In the standard textbook by Hopcroft.Ullman.1979, none of these negations appears in the index. I wonder if you have a citation for "non-regular" and/or "irregular".
 * As for the name "regular", it doesn't appear in Chomsky.1956 ("finite-state grammar" is used instead, p.114), but does in Chomsky.1959a (however with a different meaning, p.149), so it appears to have been introduced in between. - Jochen Burghardt (talk) 18:47, 28 October 2020 (UTC)
 * Thank you for the swift comment. I indeed don't see a definition of "nonregular" in our textbook Juraj Hromkovic -- Theoretical Computer Science. The author instead uses only "not regular" as a negation of "regular". I will update this thread if I find a reliable source (other than university presentations) that defines "nonregular", or that comments on the usage of "nonregular" instead of "irregular". Cheater no1 (talk) 12:36, 29 October 2020 (UTC)
 * Continuing research on the origin of the name "regular", I found that it doesn't appear in Chomsky.Miller.1958, while Chomsky.1959b says in fn.2: "Finite state languages are what are called 'regular events' in Kleene (1956)." Indeed, Kleene.1956 introduces "regular events" on p.1, and the precursor Kleene.1951 defines them on p.46, noting "We would welcome any suggestions as to a more descriptive term." So it seems that Kleene introduced the term "regular" in 1951, and Chomsky adopted it in 1959. - Jochen Burghardt, 29 October 2020

NFA's
'# it can be accepted by a deterministic finite state machine.' is superfluous. Every NFA has an equivalent DFA, so requiring a regular language to be accepted by an NFA already requires it to be accepted by a DFA (the corresponding one).


 * That's not superfluous. All the properties listed are equivalent, after all, and to the uninitiated, it's not obvious that DFAs and NFAs are basically the same. -- Schnee 12:16, 13 Jun 2004 (UTC)
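The equivalence Schnee mentions is established by the subset (powerset) construction. Here is a minimal Python sketch (the function names and the example NFA are my own) that determinizes an NFA and cross-checks the result against the intended language "the second-to-last symbol is 1":

```python
from itertools import product

def determinize(nfa_delta, start, accept):
    """Subset construction: NFA transitions (state, sym) -> set of states
    become DFA transitions on frozensets of NFA states."""
    alphabet = {sym for (_, sym) in nfa_delta}
    start_set = frozenset(start)
    dfa, seen, todo = {}, {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for sym in alphabet:
            T = frozenset(r for q in S for r in nfa_delta.get((q, sym), ()))
            dfa[(S, sym)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa, start_set, {S for S in seen if S & accept}

def dfa_accepts(dfa, start, accept, w):
    S = start
    for sym in w:
        S = dfa[(S, sym)]
    return S in accept

# NFA for "second-to-last symbol is 1": nondeterministically guess where
# the final two symbols begin.
nfa_delta = {(0, '0'): {0}, (0, '1'): {0, 1}, (1, '0'): {2}, (1, '1'): {2}}
dfa, start, accept = determinize(nfa_delta, {0}, {2})

# The resulting DFA agrees with the language on all strings up to length 5.
for n in range(6):
    for w in map(''.join, product('01', repeat=n)):
        assert dfa_accepts(dfa, start, accept, w) == (len(w) >= 2 and w[-2] == '1')
```

This example also shows why the properties are worth listing separately: the NFA has 3 states, while the determinized automaton can have exponentially many in general.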

Formal definition
This article describes the properties of regular languages and gives a few examples, but I think a kind of formal definition was missing. I added a well-known recursive definition. I think it fits pretty well in the article. What do you think? --Alexandre 22:27, 25 Jun 2004 (UTC)

The formal definition needs to include {ε} as an atomic language. Tobinyehle (talk) 16:56, 22 February 2016 (UTC)
 * Union only produces {ε} when one argument is {ε} and the other is {ε} or ∅
 * Concatenation only produces {ε} when both arguments are {ε}
 * Kleene Star only produces {ε} when its argument is {ε}


 * The section Regular language says that ∅* yields {ε}. That was my reason to revert the edit of 21 Feb. I just looked it up: the definition of Aho.Hopcroft.Ullman.1974 (sect.9.1, p.318) agrees with the example. - Jochen Burghardt (talk) 21:36, 22 February 2016 (UTC)
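The convention Jochen cites can be checked mechanically. A small Python sketch (my own; the star is truncated to a length bound so the computation stays finite) confirms that the Kleene star of the empty language is {ε}:

```python
def star(L, maxlen):
    """Kleene star of a finite language, truncated to words of length <= maxlen."""
    result = {''}          # epsilon is always in L*
    frontier = {''}
    while frontier:
        frontier = {x + w for x in frontier for w in L
                    if len(x + w) <= maxlen} - result
        result |= frontier
    return result

assert star(set(), 5) == {''}    # the empty language: its star still contains epsilon
assert star({'a'}, 3) == {'', 'a', 'aa', 'aaa'}
assert star({'ab'}, 4) == {'', 'ab', 'abab'}
```

So with ∅ and the singleton letters as the only atoms, {ε} is still generated, as ∅*.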

Kleene star
The recursive definition is quite nice. There's something I don't understand here, though. Why is the Kleene star required in the recursive definition? It seems to me that you can construct it by induction using concatenation.

Here's a proof. Is it incorrect somewhere?

Suppose L is regular. If $$\{\epsilon\} \cup L \cup L^2 \cup \cdots \cup L^n$$ is regular, then concatenate $$L$$ to get that $$L \cup L^2 \cup \cdots \cup L^{n+1}$$ is regular. Union with $$\{\epsilon\}$$. Therefore L* is regular.
 * The trick is that ordinary induction cannot "go to infinity". Your proof only establishes that the language $$\{\epsilon\} \cup L \cup L^2 \cup \cdots \cup L^n$$ is regular for all integers n. But none of these languages includes all of L*, only their union does, and regular languages are not closed under infinite unions. (if they were, every countable language would be regular) Deco 21:56, 5 December 2005 (UTC)
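Deco's point can be made concrete with a toy computation (Python, my own illustration): each finite union is regular (indeed finite here), but for L = {a} the word a^(n+1) always lies outside the union up to n.

```python
def power(L, k):
    """L^k: all concatenations of k words from L."""
    result = {''}
    for _ in range(k):
        result = {x + w for x in result for w in L}
    return result

def union_up_to(L, n):
    """{epsilon} ∪ L ∪ L^2 ∪ ... ∪ L^n."""
    out = set()
    for k in range(n + 1):
        out |= power(L, k)
    return out

L = {'a'}
for n in range(5):
    U = union_up_to(L, n)
    assert 'a' * n in U            # the union does reach a^n ...
    assert 'a' * (n + 1) not in U  # ... but never all of L* = a*
```

No single stage of the induction ever equals L*; only the infinite union does, which is exactly why the star must be taken as primitive.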

Properties of non-regular languages
Can anybody talk about properties of non-regular languages? (unsigned, undated)

it would be of great help if someone could post more info about non-regular languages 7 OCT 2007  —Preceding unsigned comment added by 64.234.33.191 (talk) 19:18, 7 October 2007 (UTC)

What are half, log, and sqrt?
The article states that regular languages are closed under the "half", "log" and "sqrt" operations. What are these? I could guess that $$\mathrm{sqrt}(L) = \{x \in \Sigma^*|xx \in L\}$$, but that is not the inverse of the square of a language. I have no idea for the other two. Eric119 19:31, 14 August 2006 (UTC)
 * I have removed the claim, since it seems bogus to me. If these functions are real and anyone knows what they are, please add them back in (with an explanation!). Eric119 05:32, 18 August 2006 (UTC)


 * I don't have a definition, but I think I have some examples (just guessing)-- for example, if $$L=a^*b^*$$ (i.e. $$L=\Sigma^*$$ for $$\Sigma=\{a,b\}$$) then half(L) would be the set of strings starting only with the letter b. The other half would be the strings starting only with the letter a. The two halves add to the whole language. The square-root would be $$sqrt(L)=(ab)^*=a^*=b^*$$-- i.e. just take half number of the generators. Finally, log just extracts the generators from out of the language, so if $$L=a^*$$ then $$log(L)=\{a\}$$. Or something like that. I don't know the general definition. linas 04:00, 26 April 2007 (UTC)


 * The statement is true. http://www.cs.caltech.edu/~cs20/a/hw/hw2/solution/sol2.pdf (i hope i am not infringing anything by posting this link)


 * Not at all. And in the homework assignment corresponding to those solutions is the definition of these sets:
 * Half(L) = { x | ∃y ( |y| = |x| and xy ∈ L ) }
 * Sqrt(L) = { x | ∃y ( |y| = |x|^2 and xy ∈ L ) }
 * Log(L) = { x | ∃y ( |y| = 2^|x| and xy ∈ L ) }
 * What is not clear to me is that these are significant at all, other than as a homework problem. — Carl (CBM · talk) 18:10, 11 September 2007 (UTC)
 * The above operations are discussed in some detail in the recent textbook
 * Jeffrey Shallit: A second course in formal languages and automata theory, 2009, Chapter 3.
 * The book also points to further studies of regularity preserving transformations, for example
 * Dexter Kozen: On Regularity-Preserving Functions. Bulletin of the EATCS 58:131-138 (1996)
 * An additional recent reference in the research literature studying Half(L) is
 * Michael Domaratzki: State Complexity of Proportional Removals. Journal of Automata, Languages and Combinatorics 7(4): 455-468 (2002).
 * I do not know if that answers your question, that is, if you think that coverage in the (mathematically oriented) research literature is enough to witness significance of a topic. You might find some additional motivation for studying these operations in one of the above references. Hermel (talk) 10:59, 16 February 2009 (UTC)
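For readers landing here, the three operations as defined above are easy to experiment with on a truncated language. A Python sketch (my own; it works on a finite slice of the language, so the results are only valid for the word lengths covered by the slice):

```python
def half(L):
    """{ x | exists y, |y| = |x|, xy in L }: the first half of each even-length word."""
    return {w[:len(w) // 2] for w in L if len(w) % 2 == 0}

def sqrt_lang(L):
    """{ x | exists y, |y| = |x|^2, xy in L }: split w where |w| = k + k^2."""
    out = set()
    for w in L:
        k = 0
        while k + k * k < len(w):
            k += 1
        if k + k * k == len(w):
            out.add(w[:k])
    return out

def log_lang(L):
    """{ x | exists y, |y| = 2^|x|, xy in L }: split w where |w| = k + 2^k."""
    out = set()
    for w in L:
        k = 0
        while k + 2 ** k < len(w):
            k += 1
        if k + 2 ** k == len(w):
            out.add(w[:k])
    return out

# A slice of the regular language a*: all a^n with n < 20.
L = {'a' * n for n in range(20)}
assert half(L) == {'a' * n for n in range(10)}
assert sqrt_lang(L) == {'a' * k for k in range(4)}  # word lengths 0, 2, 6, 12
assert log_lang(L) == {'a' * k for k in range(4)}   # word lengths 1, 3, 6, 11
```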

HALF(L)
Regular languages most definitely are closed under this operation. The proof is a little long and I am not including it here. It can be found in the text Problem Solving in Automata, Languages, and Complexity by Ding-Zhu Du and Ker-I Ko. The basic idea, however, is that if L is regular, then a DFA exists that accepts it. We want to make a new machine that has this original machine and also a copy of it that runs backwards (non-deterministically), and to run these in parallel on the input. If the parallel machines stop with both in the same state (note that this state is not necessarily in the set of final states of the original machine), then we have half of L. Isometric 05:40, 17 October 2006 (UTC)


 * Thanks. I was not aware of what the "half" function is, and didn't succeed in finding out. Though just now I did a Google search that found the answer. I'll add a definition to the article. Eric119 06:05, 18 October 2006 (UTC)
 * Incidentally, I believe I proved that the regular languages are closed under the "sqrt" operation I defined above, but it was rather tricky. Eric119 06:12, 18 October 2006 (UTC)


 * I removed the statement
 * * HALF(L), the set of strings that are the first half of a string in L 
 * from the article, as it's (clearly?) wrong. A standard textbook example of a language that is not regular is $$\{a^nb^n\vert n\ge 0\}$$. I'll try to supply a proof that it's not regular shortly. linas 03:47, 26 April 2007 (UTC)


 * Isometric is talking about something completely different; he is not talking about "half of a string", he is talking about "half of a language", which has "half" as many strings in it as the original language does. Similarly for the definitions of sqrt(L) and log(L). I can kinda guess what these are, but this claim would need to be backed up by a longer article defining them so that we don't get mistakes like this. linas 03:55, 26 April 2007 (UTC)


 * Linas, I don't get the point of your example. That is a non-regular language whose "half" a* is regular. Here I am defining half as the article said, which more explicitly is half(L) = {x | exists y, length(x) = length(y) and xy in L}. Incidentally, the document at http://www.comp.nus.edu.sg/~sanjay/cs3231/half.pdf claims a proof of closure, but I haven't verified it myself. Eric119 20:15, 28 April 2007 (UTC)

Regular languages are closed under HALF. If A is a regular language, then HALF(A) = {x | xy in A and |x|=|y|}. The proof is like this: given a DFA D that recognizes A, you can guess that, given a string w in A, after running through the first half of w you will end up in some state q of D. So you make another machine M with initial state q, and make every transition in M a transition on either 0 or 1 (assume the alphabet is {0,1}); M has the same set of accept states as D. Now you can run D and M in parallel; this can be done with the subset construction. You accept x if running x on D ends up in state q, and running {0,1}* for |x| steps on M ends in an accept state. You repeat this for every state q of D, and accept x if for some q both conditions hold.

This proof works in general, that is, any constant fraction of a regular language is also regular. E.g. if A is regular, then the first third, or the middle third, or the last third, all three languages are regular.
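The midpoint-guessing idea in the proof above can be sketched per-string in Python (my own toy version: it tests membership in HALF(A) directly rather than building the product automaton, but the "reach state q on x, then see whether an accept state is reachable from q in exactly |x| steps" logic is the same):

```python
from itertools import product

def half_accepts(delta, q0, accept, alphabet, x):
    """Is x in HALF(A) = { x | exists y, |y| = |x|, xy in A }?"""
    q = q0
    for sym in x:          # run the DFA on x to find the midpoint state q
        q = delta[(q, sym)]
    frontier = {q}
    for _ in x:            # states reachable from q in exactly |x| steps
        frontier = {delta[(p, sym)] for p in frontier for sym in alphabet}
    return bool(frontier & accept)

# DFA for A = (ab)*: state 0 = start/accept, 1 = "just read a", 2 = dead.
delta = {(0, 'a'): 1, (0, 'b'): 2, (1, 'a'): 2, (1, 'b'): 0,
         (2, 'a'): 2, (2, 'b'): 2}

def in_A(w):               # direct membership test for (ab)*
    q = 0
    for sym in w:
        q = delta[(q, sym)]
    return q == 0

# Cross-check against the definition for all strings up to length 4.
for n in range(5):
    for x in map(''.join, product('ab', repeat=n)):
        by_def = any(in_A(x + ''.join(y)) for y in product('ab', repeat=n))
        assert half_accepts(delta, 0, {0}, 'ab', x) == by_def
```

For (ab)* this yields HALF(A) = the prefixes of (ab)*, i.e. the alternating strings starting with a, which is again regular.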


 * The following comment was deleted by IP editor 192.5.109.34 (talk). I'm restoring it. Hermel (talk) 19:38, 11 March 2011 (UTC)


 * On the other hand, if A is regular, then the language derived by removing the middle third of a string in A is not regular. An example is a*b+c*, assume alphabet {a,b,c}. You can see the language with middle third removed is not regular by observing a*b+c* intersects with a*c*. The intersection is a^nc^n, but this one is not regular.


 * The editor 192.5.109.34 (talk) replaced the above paragraph with the following. Hermel (talk) 19:38, 11 March 2011 (UTC)


 * On the other hand, if A is regular, then the language derived by removing the middle third of a string in A is not regular. An example is (00*1)*0*. You can see the language with middle third removed is not regular since the language contains all the sequences of the form $$0^i110^i,$$ but does not contain the sequences $$0^i110^j$$ for i≠j.

Reference: Stearns and Hartmanis, Regularity Preserving Modifications of Regular Expressions, Information and Control, v6, 1963, pp 55-69


 * The following comment by 128.174.24.69 was deleted as well by 192.5.109.34 (talk). I'm restoring this as well. Hermel (talk) 19:38, 11 March 2011 (UTC)


 * The a*b+c* intersection of a*c* does not equal a^n c^n, it equals the empty set. This also appears to be completely unrelated to the problem of removing the middle third. Also, the source cited doesn't appear to be available anywhere on the internet; you could just as well cite nothing at all. —Preceding unsigned comment added by 128.174.24.69 (talk) 05:44, 24 February 2011 (UTC)


 * It is related: if the first third and last third are regular, their concatenation is regular. This is not true.

BTW, the paper is available in many university libraries, but googling it brings up the following link. Admittedly, it is behind a WP:PAYWALL, but still... Hermel (talk) 19:58, 11 March 2011 (UTC)
 * The language formed by catenating first and last thirds may be regular, but it isn't the language formed by excising the middle third. In general it has many additional elements, e.g. {aaa, bbb} yields extreme thirds {a, b}, but their catenation is {aa, ab, ba, bb}.
 * The counterexample for excising the middle third (dub it EMT) seems to have been garbled at some point. It should be not a*b+c*, but a*bc*. Then to find the overlap of the derived language with a*c*, note that the 'b' in the original word must be in the middle third, so all the c's come from the final third, and thus constitute the original final third. Then EMT(a*bc*) ∩ a*c* = {a^nc^n}, so EMT(a*bc*) is not a regular language. QED. --RichardW57m (talk) 10:02, 31 March 2022 (UTC)
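RichardW57m's corrected counterexample can be checked by brute force in Python (my own sketch; EMT is applied to the words of a*bc* whose length is divisible by 3):

```python
import re

def emt(w):
    """Excise the middle third of a word whose length is divisible by 3."""
    k = len(w) // 3
    return w[:k] + w[2 * k:]

# Enumerate the words a^i b c^j of a*bc* up to length 9 and apply EMT.
derived = set()
for i in range(10):
    for j in range(10):
        w = 'a' * i + 'b' + 'c' * j
        if len(w) <= 9 and len(w) % 3 == 0:
            derived.add(emt(w))

# The overlap with a*c* consists exactly of the words a^n c^n.
overlap = {w for w in derived if re.fullmatch(r'a*c*', w)}
assert overlap == {'a' * n + 'c' * n for n in (1, 2, 3)}
```

On this finite slice the overlap is exactly {ac, aacc, aaaccc}, consistent with EMT(a*bc*) ∩ a*c* = {a^nc^n}.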

More Than One Definition?
The Article first describes regular language as one that is accepted by a deterministic finite state machine. Then it defines it using the recursive definition. The point is that either one can be used as definition. However, if one is used as definition, the other one must be proved as theorem.--CBKAtTopsails 16:20, 6 June 2007 (UTC)
 * This is implied by the word "equivalent", but may deserve clarification. Dcoetzee 18:15, 11 September 2007 (UTC)

"On the other hand, it is not known to contain AC0."
Um... I'm not exactly an expert on the subject, but doesn't it certainly not contain AC0? I'm hesitant to take it out, because I don't really know what I'm talking about (not to mention that I don't want to say they're separate without a source, and just cutting it might imply to the casual reader that it does contain it), but it seems to me as though certain languages in AC0 being regular would allow you to use their DFAs to construct DFAs for L-hard languages, violating the space hierarchy theorem. Twin Bird (talk) 04:40, 1 July 2011 (UTC)

Merger proposal
I propose that Rational language (currently orphaned) be merged into Regular language. I think that the content in the rational language article can easily be explained in the context of regular languages, and this article is of a reasonable size that the merging of Rational language will not cause any problems as far as article size or undue weight is concerned. Hermel (talk) 22:01, 3 August 2013 (UTC)
 * Are you proposing this just because the names are similar? Regular language is also a particular case of context-free language and a lot of others... That doesn't mean all notions more general than Regular should be merged into Regular. If anything should be done with Rational language, it is to merge it into rational series—its generating notion. JMP EAX (talk) 23:00, 17 August 2014 (UTC)

Actually, I'm unable to verify the [more general] def given at Rational language in either of the sources cited! Both simply use "Rational language" as a 100% synonym for Regular language as far as I can tell. (See Talk:Rational language for specifics.) So simply redirecting that page here (and perhaps mentioning the synonym) is the best solution, because there's nothing (verifiable) worth merging anywhere in that stub, unless I've missed something in the tomes cited as refs there. JMP EAX (talk) 00:08, 18 August 2014 (UTC)

✅. JMP EAX (talk) 00:22, 19 August 2014 (UTC)

Mistake in "Equivalent formalisms" section
"the number of equivalence classes of its syntactic congruence is finite.[note 8][note 9] (This number equals the number of states of the minimal deterministic finite automaton accepting L.)"

I'm quite sure that the sentence in parentheses is not true. The number of equivalence classes of the syntactic congruence is the size of the syntactic monoid. This is also the size of the transition monoid of the minimal deterministic finite automaton accepting L. However, there are usually more elements in the monoid than states in the automaton. — Preceding unsigned comment added by 81.218.76.25 (talk) 08:45, 6 October 2019 (UTC)

Indeed, for (ab+ba)* the minimum automaton has 4 states but the syntactic monoid has 15 elements. (The congruences are generated by aba ≡ a, bab ≡ b, abba ≡ baab and aaa ≡ bbb ≡ aaaa ≡ bbbb.) I think we should split the equivalent formalism into:

"the number of equivalence classes of its syntactic congruence is finite.[note 8][note 9]"

"the number of equivalence classes of its left syntactic equivalence (also known as the right syntactic congruence) is finite.[note 8][note 9] (This number equals the number of states of the minimal deterministic finite automaton accepting L.)"

The notes will need corresponding review. --RichardW57m (talk) 14:33, 1 April 2022 (UTC)
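The counts quoted above (4 states, 15 monoid elements) can be verified with a short Python computation (my own sketch; the actions of the letters on the minimal DFA for (ab+ba)* are hand-coded, with state 3 the dead state):

```python
# Minimal DFA for (ab+ba)*: 0 = start/accept, 1 = "read a", 2 = "read b",
# 3 = dead.  Each letter acts on the 4 states as a function, stored as a tuple.
a = (1, 3, 0, 3)   # action of the letter a: state q goes to a[q]
b = (2, 0, 3, 3)   # action of the letter b

def compose(f, g):
    """Action of the word uv, given f = action of u and g = action of v."""
    return tuple(g[f[q]] for q in range(4))

# Close {identity} under right-multiplication by the generators a and b.
monoid = {(0, 1, 2, 3)}            # identity = action of the empty word
frontier = set(monoid)
while frontier:
    frontier = {compose(f, gen) for f in frontier for gen in (a, b)} - monoid
    monoid |= frontier

assert compose(compose(a, b), a) == a  # the congruence aba ≡ a noted above
assert len(monoid) == 15               # 15 monoid elements, versus 4 DFA states
```

Since the transition monoid of the minimal DFA is (isomorphic to) the syntactic monoid, this confirms that the number of syntactic congruence classes can exceed the number of states, so the parenthetical remark does need to move to the Nerode-style equivalence.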

Trace Languages
We also have regular and rational languages in the domain of trace languages. They are defined by different definitions (recognition by DFAs and the purely algebraic definition, respectively) that coincide for the free monoid of strings over an alphabet, but not in general: not all rational trace languages are regular trace languages. The bad-boy example is (U+0F73 TIBETAN VOWEL SIGN II)* in the domain of Unicode strings under canonical equivalence, with decomposable characters treated as syntactic sugar.

Should we mention such divergences here, or just have a hatnote redirecting the reader to Trace monoid for the use of the terms with respect to trace languages? That article doesn't yet have a section on these languages. --RichardW57m (talk) 12:02, 30 March 2022 (UTC)


 * IMO this deserves a mention under the Generalizations section, but not elsewhere, and the main content related to this should go at trace monoid. Caleb Stanford (talk) 14:56, 30 March 2022 (UTC)
 * The existence of a generalisation needs to be mentioned somewhere before the reader concludes that this article has nothing to say about these terms for traces. --RichardW57m (talk) 16:04, 30 March 2022 (UTC)

Wiki Education assignment: Linguistics in the Digital Age
— Assignment last updated by Fedfed2 (talk) 00:53, 9 December 2023 (UTC)