Talk:DFA minimization

Updated
Hello. This article needs to be updated. It is not true that the Hopcroft's algorithm is the best known anymore. Admittedly the results are new (2008 forwards). For a very recent paper, see for example this excellent paper for an algorithm that runs in either $$O(n + m \log m)$$ or $$O(n + m \log n)$$, where n is the number of states, and m is the number of transitions in the DFA. Combined with the simplicity of the algorithm, this algorithm is a prime candidate to be described in this article. --Kaba3 (talk) 21:15, 23 July 2012 (UTC)


 * I know the paper claims to be about DFA minimization, but I'm not sure I believe it. It talks about "incomplete" state transition tables where only some entries are specified; how is this a DFA? What happens when an input symbol hits one of the missing transitions? And of course if there are no missing transitions then its complexity is exactly the same as Hopcroft. The claim in the paper that they simplify the algorithm down to 150 lines of code also doesn't much impress me: the minimization routine in my own implementation of Hopcroft, http://www.ics.uci.edu/~eppstein/PADS/Automata.py, is 38 lines counting comments, and uses a partition refinement library that is 85 lines. —David Eppstein (talk) 01:55, 24 July 2012 (UTC)


 * Hi, David. It is implicitly assumed that when a state does not have a transition, the transition is to an implicit trap state, which has all transitions to itself. Clearly in a computer implementation it does not make sense to store the trap state, or the transitions leading to that trap state. Are you able to access the paper? --Kaba3 (talk) 16:23, 24 July 2012 (UTC)


 * From the mathematical viewpoint it is easy to define a structure which corresponds exactly to this computer-friendly concept. In another paper, the same author defines such a structure as PT-DFA. Here the DFA-definition is relaxed by turning the transition function into a partial function (a relation ~ for which (a, t) ~ b and (a, t) ~ c implies b = c, where a, b, and c are states, and t is a symbol). The other definitions are then extended as you would guess (the generated language etc.). --Kaba3 (talk) 16:31, 24 July 2012 (UTC)

Why
Why is W initialized with F and not with the smallest part among F and Q \ F? PS: I fixed another small bug: it was not sufficient to test whether X ∩ Y is nonempty, but also Y \ X must be tested — Preceding unsigned comment added by 188.79.6.215 (talk) 10:56, 2 November 2013 (UTC)


 * My bad, I thought there was a bug were there was none. I've deleted part of my own original question since it made no sense (I'm the same person, different IP). Still, I believe W could be initialized with the minimum of the parts. Also, Stanford seems to have a copy of the original publication: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/190/CS-TR-71-190.pdf Shouldn't this be linked? Although the algorithms described in that PDF and in this wiki are quite different.
 * Choosing the smaller of the two parts in the initialization step might make the algorithm run slightly faster in practice (I'm not sure, it would need experiments to verify, so without WP:RS we can't just state that it does) but it definitely won't change the O-notation for the worst case running time. —David Eppstein (talk) 18:07, 4 November 2013 (UTC)


 * Indeed, but it seems strange to not choose the initialization that looks preferable. Besides that point, in the Technical Report PDF that I linked (relevant only if it is really the original algorithm), the selection of which part to include in the waiting list does not depend on their sizes, but on the number of elements that have an anti-image (with respect to the transition function of the automaton). There are more differences besides that one, but maybe that's the biggest one. All descriptions/implementations of the algorithm that I've found online look more similar to the implementation proposed in the Wikipedia than to the PDF I linked... I find all this a bit puzzling. Do you know if the published version is available somewhere? — Preceding unsigned comment added by 147.83.72.247 (talk) 19:11, 4 November 2013 (UTC)
 * I don't think any other versions of the paper are online. worldcat.org says that the book it's in is held by several British and German libraries, and not by many other places than that. —David Eppstein (talk) 21:30, 4 November 2013 (UTC)
 * Thanks for the info. It's a shame that old papers are usually difficult to find :( — Preceding unsigned comment added by 147.83.72.247 (talk) 17:03, 5 November 2013 (UTC)

Is there a bug in the presentation of Hopcroft's algorithm?
I tried it with Q={1,2,3}, edges = {0-a->1, 1-b->2, 2-a->3}, and F={1,3}. Now Pre[b](F) is empty and Pre[a](F)=Q\F. So no splitting. So we end up merging the states in F, and also the states in Q\F. (If I had started with W={Q\F} then the result would have been ok.) Am I missing something obvious ? PhS (talk)


 * Which version of Hopcroft's algorithm are you referring to? I found no "Pre" in the wikipedia article. - Jochen Burghardt (talk) 18:06, 27 July 2021 (UTC)


 * I write Pre[c](A) for what is described as "the set of states from which a transition on c leads to a state in A". Thanks for your patience. PhS (talk)


 * Ok, I see. Now I understand your computation, and I think you are right.
 * Before 14 June, the algorithm in the article initialized, and your counter-example would no longer work, I think, since choosing A as Q\F = {0,2}, and choosing c as b would lead to X = {1}, as you indicated in parantheses above. So, I suspect the recent edits by 84.245.120.107 did introduce an error.
 * My problem is, Hopcroft+Ullman (1979) doesn't present the algorithm but just gives a computation example (Exm.3.8, p.68-69), and diving into the notation of Technical Report STAN-CS-71-190 (cited at Hopcroft (1971) in the article's References section) is a lot of work. If you (or somebody else) have a modern textbook at hand that presents the algorithm, could you check it? - Jochen Burghardt (talk) 08:43, 28 July 2021 (UTC)
 * Knuutila (2001) is open-access via DOI, and may contain a readymade algorithm description; I'll have a look at it in the next days. - Jochen Burghardt (talk) 08:56, 28 July 2021 (UTC)
 * Knuutila (2001) does not give an easily-understood pseudocode version of the original algorithm. The closest I found is in an obscure (possibly draft?) paper, Xu (2009).  It initialises W as {F, Q \ F}, and is so close to the page's pseudocode that it is either the source or shares a source with it.  I am going to edit the algorithm back to match, and reference Xu's paper. Tz 98 (talk) 12:49, 18 October 2021 (UTC)

Unreachable states algorithm
Not sure that in "Unreachable states"  algorithms I understand this line: reachable_states := temp; Isn't temp contains only states which are reachable from new_states only? Av life (talk) 15:23, 24 October 2014 (UTC)


 * (I inserted a section header before your posting.) I think you are right, the assignment should be replaced by the version that is given in the comment right to it, viz.  In fact, this was the version until 's edit of 17:53, 17 August 2013. It is tempting to conclude:
 * new_states  =   temp \ reachable_states     (known from the previous assignment), hence
 * reachable_states ∪ new_states  =   reachable_states ∪ (temp \ reachable_states)   =   temp.
 * However, the last equation holds only if reachable_states is a subset of temp, which is usually false. Therefore, I'll restore the old version.
 * I tried to find a source for the algorithm, but neither Aho.Hopcroft.Ullman.1974 "The Design and Analysis of Computer Algorithms", nor Hopcroft.Ullman.1979 "Introduction to Automata Theory, Languages, and Computation", nor "Hopcroft.Motwani.Ullman.2003 (same title), nor Goos.Waite.1984 "Compiler Construction" bother with explaining a reachability algorithm. - Jochen Burghardt (talk) 17:21, 24 October 2014 (UTC)
 * The fix looks ok to me. This is just a variation of breadth-first search, so look there for links to explanations. —David Eppstein (talk) 18:37, 27 July 2021 (UTC)