Talk:Myhill–Nerode theorem

Example proof of non-regularity
Consider the language $$L=\{a^i b^j \mid i \geq j\}$$. Now consider the infinite set of strings $$\{a^i \mid i \geq 0\}$$. For any two strings from this set, $$x = a^i$$ and $$y = a^k$$ with $$i < k$$, we can append $$z = b^k$$ to each, which results in $$xz = a^i b^k$$, which is not in L, and $$yz = a^k b^k$$, which is in L. Thus each string of the form $$a^i$$ belongs to a different equivalence class, so there are an infinite number of equivalence classes defined by the language, and so by the Myhill-Nerode Theorem it is not regular.
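
The distinguishing-extension argument above can be spot-checked mechanically for small exponents. The following is only a minimal sketch: the helper names `in_L` and `distinguishes` and the bound of 12 are assumptions made for illustration, and a finite check like this is not a proof.

```python
import re

def in_L(s: str) -> bool:
    """Membership test for L = { a^i b^j : i >= j } (hypothetical helper)."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) >= len(m.group(2))

def distinguishes(x: str, y: str, z: str) -> bool:
    """True if z is a distinguishing extension: exactly one of xz, yz is in L."""
    return in_L(x + z) != in_L(y + z)

# For every i < k below a small (arbitrary) bound, z = b^k separates a^i from a^k.
BOUND = 12
all_separated = all(
    distinguishes("a" * i, "a" * k, "b" * k)
    for k in range(BOUND)
    for i in range(k)
)
print(all_separated)
```

Each pair tested here lands in different equivalence classes, which is what the argument claims for all i < k.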

Note that this language can be "pumped" in the sense that any nonempty string in the language can be expanded by replacing a single a with arbitrarily many copies of a. The language thus gives an example of a non-regular language that cannot be shown to be non-regular using the pumping lemma.


 * This is false. The pumping lemma requires it to be true for the case i = 0 as well. So say the string is $$a^p b^p$$. If the y portion of the string is $$a^n$$, then removing it gives the string $$a^{p - n} b^p$$. Since $$n \geq 1$$ (required by the condition that y is nonempty), we have $$p > p - n$$, so this string is not in the language. Therefore, it cannot be pumped, so the Myhill-Nerode Theorem is not required.


 * As suggested by Michael Sipser, the language F = {$$a^i b^j c^k | i,j,k \geq 0$$ and if i = 1 then j = k} can be pumped (including for the case where the y portion is omitted) but is not regular, so it might be a candidate for using the Myhill-Nerode Theorem. Jrincayc 17:11, 21 Mar 2004 (UTC)
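
Both replies above can be checked on small cases. The sketch below tests (1) that removing any nonempty y from $$a^p b^p$$, with xy inside the first p symbols, leaves the language L, and (2) that for Sipser's F the strings $$a b^j$$ are pairwise separated by the extension $$c^j$$. The function names and the bounds are assumptions for illustration; this is a finite check, not a proof.

```python
import re

def in_L(s: str) -> bool:
    """L = { a^i b^j : i >= j }."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) >= len(m.group(2))

def in_F(s: str) -> bool:
    """F = { a^i b^j c^k : i, j, k >= 0 and (i = 1 implies j = k) }."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i != 1 or j == k

def pumping_down_always_leaves_L(p: int) -> bool:
    """For w = a^p b^p, try every split w = xyz with |xy| <= p and |y| >= 1."""
    w = "a" * p + "b" * p
    return all(
        not in_L(w[:start] + w[end:])   # xz with y = w[start:end] removed
        for end in range(1, p + 1)
        for start in range(end)
    )

def ab_powers_separated_in_F(bound: int) -> bool:
    """For j != k, the extension c^j puts a b^j c^j in F and a b^k c^j outside F."""
    return all(
        in_F("a" + "b" * j + "c" * j) and not in_F("a" + "b" * k + "c" * j)
        for j in range(bound)
        for k in range(bound)
        if j != k
    )

pump_check = all(pumping_down_always_leaves_L(p) for p in range(1, 9))
sipser_check = ab_powers_separated_in_F(8)
print(pump_check, sipser_check)
```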

Unintuitive formal definition
While the current version of the article may be cleaner from a formal point of view, IMO it's highly unintuitive. My initial version, explaining it in terms of FSM states and alluding to a simple proof by the pigeonhole principle is much more intuitive (again, IMO). Any ideas on how to integrate this? --Delirium 22:36, Sep 9, 2004 (UTC)

Not Helpful
As a grad student trying to understand Myhill-Nerode, I find the description not at all helpful. Delirium's definition contains the following:

"..two strings x and y are said to belong to the same equivalence class if they both drive a given machine to the same state q. A consequence of this is that for any string z, xz and yz drive the machine to the same state q'. Since a finite state machine has a finite number of states, there can only be a finite number of such equivalence classes (a proof readily follows from the pigeonhole principle). This observation constitutes the Myhill-Nerode Theorem: A language is regular if and only if the set of equivalence classes defined by the language is finite."

It's more illuminating than what is currently posted.

I also think an example of each case (regular vs non-regular) would be very useful for those of us who learn by example. Myrikhan (talk) 20:45, 7 February 2008 (UTC)

Indeed, this article had almost zero impact on my understanding of the theorem. An example (or more examples) would greatly help in this article. —Preceding unsigned comment added by 147.251.53.17 (talk) 10:02, 13 November 2008 (UTC)

Second this, but probably "x ~ y iff they both drive a given machine to the same state q" is wrong. Intuition (on bisimulations) tells me that it should be "x ~ y iff they both drive a given machine to states that are observationally equivalent". Maybe simpler "iff they both drive a given machine to states that behave equally." I think for intuition all three sentences are fine if the formal details are mentioned. 2A02:810D:903F:E4B0:2169:44BD:D32C:F0A0 (talk) 11:26, 11 April 2021 (UTC)
 * The theorem is about languages, not machines, so definitions of distinguishing extensions based on machines are incorrect. —David Eppstein (talk) 16:10, 11 April 2021 (UTC)

Unclear reference or word missing
The section Myhill-Nerode theorem currently reads: "…and if one starts with a partition into equivalence classes, one can easily…" This reads like an obscure English idiom, as though a word is missing. What exactly are we partitioning? The automaton? The set of words in the language? Please clarify and illustrate. -- 172.192.233.87 (talk) 19:25, 7 February 2009 (UTC)

Formalism
I think a formal statement of at least the Equivalence Relation and/or the whole theorem would be helpful and unambiguous for understanding. —Preceding unsigned comment added by Methossant (talk • contribs) 17:31, 2 April 2009 (UTC)

I also think the statement of the theorem is not quite correct. For example, if we have a two-letter alphabet {a, b}, then a* can be recognized by a 1-state automaton, but there are two equivalence classes. The minimal DFA needs to be complete; there should be a sink state. I agree that the current discussion is unclear and not very helpful. Aclark17 (talk) 10:21, 3 June 2010 (UTC)
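
The count of two classes for a* over {a, b} can be illustrated by grouping short strings according to how they behave under short extensions. This is a sketch under assumptions: the helper names and the length bounds (strings up to length 4, extensions up to length 3) are chosen arbitrarily, and in general a bounded set of extensions can under-count the classes.

```python
from itertools import product

ALPHABET = "ab"

def in_a_star(s: str) -> bool:
    """Membership in a* over the alphabet {a, b}."""
    return set(s) <= {"a"}

def words(max_len: int):
    """All strings over ALPHABET of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            yield "".join(letters)

def signature(x: str, in_lang, suffixes) -> tuple:
    """Which of the test extensions z put xz in the language."""
    return tuple(in_lang(x + z) for z in suffixes)

suffixes = list(words(3))
classes = {}
for x in words(4):
    classes.setdefault(signature(x, in_a_star, suffixes), []).append(x)

print(len(classes))  # prints 2
```

One group holds the strings in a*, and the other holds every string containing a b, which corresponds to the sink state of a complete DFA.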

Problem with the explanation of how to find L-equivalence classes
In the section Myhill–Nerode theorem one can read « This may be done by an exhaustive case analysis in which, beginning from the empty string, distinguishing extensions are used to find additional equivalence classes » and further « Given the empty string, 00 (or 11), 01, and 10 are distinguishing extensions ». The first proposition is ambiguous and the second one does not make sense.
 * 1. The empty string generates an L-equivalence class (equal to L, binary representations of numbers divisible by 3, as $$2^n$$ is invertible mod 3) but it can also be used as a distinguishing extension (distinguishing between L and its complement, as extending by the empty string is the identity). Both uses of the empty string yield results on L-equivalence classes.
 * 2. Talking about "distinguishing extensions" only makes sense with respect to 2 different strings, but here we are only mentioning the empty string, and no other from which to distinguish it. If we use the empty string as a distinguishing extension then the 3 2-character strings generate different L-equivalence classes, but the empty string is a distinguishing extension only for the pairs {00,01} and {00,10}, while {01,10} are not distinguished by the empty string.
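
The claim in point 2 about which pairs the empty string separates can be checked directly. A minimal sketch, assuming (as in point 1) that the empty string is treated as a representation of 0 and so belongs to L; the helper name `in_L` is hypothetical.

```python
from itertools import combinations

def in_L(s: str) -> bool:
    """Binary strings whose value is divisible by 3.
    Assumption: the empty string is read as 0 and is therefore in L."""
    value = int(s, 2) if s else 0
    return value % 3 == 0

def distinguishes(x: str, y: str, z: str) -> bool:
    """True if exactly one of xz, yz is in L."""
    return in_L(x + z) != in_L(y + z)

pairs = list(combinations(["00", "01", "10"], 2))

# Pairs separated by the empty extension.
separated_by_empty = [(x, y) for x, y in pairs if distinguishes(x, y, "")]
print(separated_by_empty)  # [('00', '01'), ('00', '10')]

# 01 and 10 need a nonempty extension; "1" works since 011 = 3 and 101 = 5.
print(distinguishes("01", "10", "1"))  # True
```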

I recommend modifying those two parts to « beginning with the equivalence class of the empty string, distinguishing extensions are used to find additional equivalence classes », and « 00 (0, 11, or the empty string), 01 (or 1), and 10 are distinguishing extensions for every pair of elements: one in the equivalence class of (the binary representation of) $$0\equiv 0,\ -1\equiv 2,\ \text{and } -2\equiv 1 \pmod{3},$$ respectively, and one in the complement of this equivalence class. In particular the equivalence class of an arbitrary string of 0s is the binary representations of numbers divisible by 3, and is a distinguishing extension for this language and its complement; for any language L the empty string is always a distinguishing extension wrt L for L and its complement. »

One could probably also write down an efficient general algorithm for finding all L-equivalence classes, based on the proof of the Myhill-Nerode theorem, and apply it to binary representations of numbers divisible by 3. But I have not worked it out. Plm203 (talk) 23:46, 23 August 2023 (UTC)
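
As a rough illustration of the kind of procedure suggested above, here is a bounded breadth-first exploration starting from the empty string. It is a heuristic sketch, not the general algorithm: it assumes membership in L is decidable, it treats two strings as equivalent when they agree on all test extensions up to an arbitrarily chosen length (`suffix_len`), and all names are hypothetical. The empty string is assumed to represent 0.

```python
from collections import deque
from itertools import product

ALPHABET = "01"

def in_L(s: str) -> bool:
    """Binary strings divisible by 3 (assumption: empty string counts as 0)."""
    return (int(s, 2) if s else 0) % 3 == 0

def words(max_len: int):
    """All strings over ALPHABET of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(ALPHABET, repeat=n):
            yield "".join(letters)

def explore_classes(in_lang, suffix_len: int = 3):
    """Collect one representative per class found, plus a transition table."""
    suffixes = list(words(suffix_len))

    def sig(x: str) -> tuple:
        return tuple(in_lang(x + z) for z in suffixes)

    reps = {sig(""): ""}          # signature -> representative string
    transitions = {}              # (representative, letter) -> representative
    queue = deque([""])
    while queue:
        r = queue.popleft()
        for c in ALPHABET:
            s = sig(r + c)
            if s not in reps:     # r + c is distinguished from every known class
                reps[s] = r + c
                queue.append(r + c)
            transitions[(r, c)] = reps[s]
    return list(reps.values()), transitions

reps, delta = explore_classes(in_L)
print(reps)   # ['', '1', '10']
print(delta)
```

For this language the three representatives correspond to the residues 0, 1 and 2 modulo 3, and the transition table is that of the usual three-state automaton.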