Wikipedia:Reference desk/Archives/Mathematics/2007 September 29

= September 29 =

History of Knuth–Morris–Pratt algorithm
(This is a repost of a question I've mentioned months ago on Talk:Knuth–Morris–Pratt algorithm but got no attention there.)

The algorithm described in Item 179 of HAKMEM seems quite similar to the Knuth–Morris–Pratt algorithm, despite that HAKMEM is from 1972 so it predates the publication by KMP. Were variants of the algorithm for fast string matching known long before this publication or am I misinterpreting something and the two algorithms differ in some important point? &#x2013; b_jonas 15:52, 29 September 2007 (UTC)


 * The Gosper algorithm is in fact closer to a similar algorithm also based on a finite-state automation, due to Robert Boyer and J Strother Moore (A fast string searching algorithm, Communications of the ACM 20:10 (October 1977) pp. 762–772 ). In the final section "Historical Remarks" the authors write: "We have since learned that R.W. Gosper, of Stanford University, simultaneously and independently discovered the delta1 version of the algorithm (private communication)." I once saw a presentation of a development by stepwise refinement of the various algorithms, in which the different variants of the same basic idea arose through subtly different design choices, but I've forgotten all details, including who presented this when and where. Based on the publication dates, I'd agree that Gosper appears to have scientific priority for the basic idea. --Lambiam 16:33, 29 September 2007 (UTC)
 * We actually have an article on the Boyer–Moore string search algorithm. --Lambiam 16:39, 29 September 2007 (UTC)


 * I could be wrong, but I don't think the algorithm in HAKMEM is similar to the Boyer-Moore algorithm. &#x2013; b_jonas 16:53, 29 September 2007 (UTC)


 * I think you're right. I did not look at the actual algorithm in HAKMEM, but, like KMP, in incremental matching it essentially proceeds along the pattern left-to-right as long as there is a match. It would be interesting to see if Knuth references this in his The Art of Computer Programming (which I don't have access to). Bill Gosper was research assistant of Knuth in the period 1974–1977, so it is hardly conceivable that Knuth was not aware of his algorithm. --Lambiam 19:35, 29 September 2007 (UTC)
 * Clearly we see common ideas among the algorithms. From the ACM page we can see that Boyer and Moore cite both HAKMEM and the 1974 Stanford technical report that became the 1977 Knuth–Morris–Pratt paper; Moore's web page says he and Bob Boyer invented their algorithm around 1975. Looking at the second edition of TAOCP Volume III: Sorting and Searching, I find no mention of string searching or the Boyer–Moore algorithm or even Gosper. I believe the first PhD thesis on computational complexity was in 1975 by Knuth's student Bob Sedgewick (founding Chair of the Department of Computer Science at Princeton), so the whole idea of analyzing and proving complexity bounds was in its infancy as these algorithms were being developed. The HAKMEM item strives for efficiency, but predates the complexity era. Physically, these authors were in close proximity (Xerox PARC was on Stanford land, and SRI International was still the "Stanford Research Institute"), and the institutions had close ties. But why are we discussing this on the mathematics reference desk? The computing desk would seem more appropriate. --KSmrqT 22:24, 29 September 2007 (UTC)


 * Thanks for all comments. Just let me comment that KMP is part of the material that shall be in TAOCP volume 5.  &#x2013; b_jonas 08:27, 30 September 2007 (UTC)