Talk:CYK algorithm

I don't like the article
I don't like this article because it gives an example about categorizing NP(noun phrase) VP(verb phrase) and so forth, and this would be useful for someone who wanted to build a POS tagger. However, this article is not about POS taggers, it's about CYK, so examples should be given with that in mind. Just give a grammar, and a word, and have various states of the matrix P for the example. Not that complicated. —Preceding unsigned comment added by 188.26.48.167 (talk) 12:56, 23 January 2011 (UTC)


 * The example given is for parsing sentences into NP, VP and so on, which is more than part-of-speech tagging. It may be more common to introduce CYK with artificial examples, but I don't see anything wrong with using an example from an application area. Computational linguistics is one of the areas where CYK is applied, and CYK is sometimes studied as part of a CL curriculum. But I've reconstructed the grammar from the example and added it to the article. It should now make more sense. -- UKoch (talk) 17:25, 20 December 2011 (UTC)

I think there is still much room for improvement
The fact that binary form (no RHS longer than 2) should be used, rather than CNF has been known for about 40 years. It was not published, because it was considered obvious, as the complexity results from the length of right-hand sides. It is also the only way to preserve conveniently the shape of parse trees. Beau Shiel had a nice paper about it at Coling 76. The empty word is a problem that cannot be escaped, if parse-trees are to be preserved, which is implied by the name parser (but sometimes not described with the algorithm). If no parse tree is kept, it is a recognizer.

The size of a grammar should be defined explicitly, or a reference given.Teetooan (talk) 20:32, 27 February 2014 (UTC)

The proper way to write for an encyclopedia is to give the modern and simpler description first. Then to give some historical details such as the use of CNF in the first versions. On the other hand, the CNF is easier to explain. It should be noted that it loses infinite ambiguity.

In 1974, Valiant used the Strassen algorithm (1969) to reduce the complexity, but the complexity of matrix multiplication was reduced since then, first by Coppersmith–Winograd (1990), then by Williams (2011). It should be better to discuss usability on the basis of Strassen's algorithm, which is the only one that has a chance of being meaningful. Teetooan (talk) 00:49, 26 February 2014 (UTC)

It might be nice to underscore that CYK is the only cubic algorithm that can run directly on the grammar (up to the binary transformation, that is both trivial an unescapable).Teetooan (talk) 13:09, 26 February 2014 (UTC)

Parsing weighted context-free grammars is not specific of CYK and is available in a similar way to all general CF parsers. It is a special case of decorating the rules with values from a semi-ring.Teetooan (talk) 20:45, 27 February 2014 (UTC)

Explain practical parse tree generation!
Original threadstarter did not title this thread. Also: linebreaks added to original threadstarter's post. --131.234.234.12 (talk) 11:56, 23 January 2009 (UTC) anyone please help is there any proper algorithm to show that how the string is enabled. Example: given a string,the algortithm could give how it comes from the start S, show how many ways it can be ,or even give the shortest way. —Preceding unsigned comment added by 218.7.20.106 (talk • contribs) 02:32, 21 July 2003
 * It sounds like you're asking about parse trees... the page mentions that the algorithm is easy to modify to find a parse tree (or several). This would show you how to construct the string from the rules. I *think* that with Chomsky normal form, all derivations are the same length, but I'm not really an expert...
 * —Preceding unsigned comment added by 24.21.186.8 (talk • contribs) 18:40, 9 July 2004

Give examples!
Original threadstarter did not title this thread. --131.234.234.12 (talk) 11:59, 23 January 2009 (UTC) This page is great. However - it would incredible with a few examples to lead readers through the process. —Preceding unsigned comment added by 209.134.42.169 (talk • contribs) 13:01, 10 May 2007

Another article on CYK

 * Done. Ceroklis 10:01, 26 September 2007 (UTC)

What does C stand for in CYK?
So far, I have come across three different names for letter C: Which one is correct? —Preceding unsigned comment added by 89.133.45.42 (talk • contribs) 21:36, 21 June 2007
 * Cook
 * Cocke
 * Coke
 * Cocke, according to the Jurafsky and Martin book. Ealdent 15:26, 23 October 2007 (UTC)

Mistake in the Algorithm?
I'm almost sure there is a mistake in the algorithm, but I'll write it here first just in case... The line: "Let the grammar contain r terminal and nonterminal symbols R1 ... Rr." should be: "Let the grammar contain r nonterminal symbols R1 ... Rr." There is no need to consider the terminals also. Sararkd (talk) 02:37, 18 January 2008 (UTC)

Change algorithm indexing
Is there any objection to changing the algorithm indexing from 1-indexed to 0-indexed? This would make converting from pseudocode to real code much easier. —Preceding unsigned comment added by Jhm718 (talk • contribs) 11:40, 24 April 2009 (UTC)

Why "Carlos"
Is there any good reason for naming the input string Carlos? It seems silly and irrelevant to me. Sigurdmeldgaard (talk) 08:34, 17 May 2009 (UTC)

Seems to be corrected - thanks Sigurdmeldgaard (talk) 19:08, 27 May 2009 (UTC)

Citation from 2009?
Is that paper from 2009 really notable enough to warrant a citation in Wikipedia? It is used to make a fairly technical point. Since all other sources (besides Knuth, which is a reference work) are from the 60s, 70s, it makes the impression as if the latest paper would be the only continuation of the study of this algorithm, which is difficult to believe... that is, IMO it inflates the importance of the paper and the reference should be removed.
 * I support this removal Teetooan (talk) 00:56, 26 February 2014 (UTC)
 * While I understand your feelings about giving undue weight to that paper, I would suggest to add more references of recent date about the CYK algorithm. For instance, the CYK algorithm, as well as Valiant's improvement have been recently generalized from context-free grammars to the case of Boolean grammars. Removing content from the article would certainly be a step into the wrong direction, given the size of the article. If there are any recent papers that deserve to be included, please be bold and just go ahead. I am confident that the article will further expanded over time, given that the article's subject is covered in many textbooks and courses on automata theory. Hermel (talk) 21:29, 20 July 2010 (UTC)
 * While I understand your feelings about giving undue weight to that paper, I would suggest to add more references of recent date about the CYK algorithm. For instance, the CYK algorithm, as well as Valiant's improvement have been recently generalized from context-free grammars to the case of Boolean grammars. Removing content from the article would certainly be a step into the wrong direction, given the size of the article. If there are any recent papers that deserve to be included, please be bold and just go ahead. I am confident that the article will further expanded over time, given that the article's subject is covered in many textbooks and courses on automata theory. Hermel (talk) 21:29, 20 July 2010 (UTC)

There is no reason to add recent references on CYK if there was no significant change in know-how regarding this algorithm. The best would rather be an account of actual use, and of current role. Some references could be added from the nineties that give original results on the relationship between all the general CF parsers (and some non CF ones).Teetooan (talk) 07:57, 26 February 2014 (UTC)

In the previous comment, do you mean the Lange & Leiß paper? I just read that paper, without going deep into the math, and found it incredibly informative, since the main aim is pedagogical.
 * So are lots of other papers on the web. But it is no significant contribution Teetooan (talk) 00:56, 26 February 2014 (UTC))

The paragraph in this article summarizing this paper is weak, and would benefit from being written in more specific terms concerning the three operations (BIN, DEL, UNIT) necessary to transform to CNF, the effect of ordering these operations on size explosion (exponential if DEL precedes BIN), and that DEL and UNIT can be internalized into the CYK algorithm at no extra cost, leaving only a linear transformation on grammar size into 2NF form (BIN). This papers omits TERM because it offers no advantage for the CYK algorithm.

Why so concerned about importance? Clarity often comes at remove from original research and should be valued for its own sake. &mdash; MaxEnt 11:27, 3 December 2010 (UTC)
 * Yes, the previous comment was about the Lange & Leiß paper. Please, be bold and go ahead if you want to improve that section. Hermel (talk) 14:59, 4 December 2010 (UTC)

403 Access Forbidden
External Link to "CYK parsing demo in JavaScript" (http://homepages.uni-tuebingen.de/student/martin.lazarov/demos/cky.html) is broken.

A short search didn't result in any alternative location.

I haven't removed the link, since maybe it's a temporary problem (don't think so) and maybe someone else might come up with something. --84.109.241.65 (talk) 05:08, 27 December 2010 (UTC)

Mistake in the Algorithm?
I think it should be

set P[i,1,j] = true

instead of

set P[i,i,j] = true

if the second index denotes the length of a span. —Preceding unsigned comment added by 93.138.47.91 (talk) 00:58, 7 May 2011 (UTC)

|G| in time complexity
I'm not aware of any literature that mentions |G|. Of course the actual time the algorithm takes depends on the "size" of the grammar (I assume |G| is supposed to be the number of productions), but in this kind of problem, I've only ever seen n mentioned. In particular, in the pseudocode given, the loop over all productions is unnecessary, or at least unnecessarily specific: One would implement that as a hash table lookup, which is much faster. -- My point: If nobody disagrees, I'll erase the |G| part. -- UKoch (talk) 22:14, 9 December 2011 (UTC)

Most efficient parsing algorithm?
In the article, the CYK algorithm is called the most efficient parsing algorithm in terms of worst-case asymptotic complexity. Where does that come from? As discussed above, the time complexity is $$O(n^3)$$, and Hopcroft and Ullman mention Earley parsing, which has the same time complexity, and they write that there's an algorithm whose time complexity is $$O(n^{2.8})$$, and another one whose time complexity is $$O(n^3/\log n)$$. (Of course, they give sources for all of these.) I'll remove the "ost efficient" part if no-one objects. -- UKoch (talk) 17:38, 14 December 2011 (UTC)


 * I changed "the most efficient parsing algorithm" to "one of the most efficient parsing algorithms". -- UKoch (talk) 14:08, 20 December 2011 (UTC)

Pseudocode Difficult to Read
Aside from the fact that one-letter variable names obfuscate meaning and are difficult to follow, having a lowercase letter L in a font where it looks nearly identical for the numeral for "one" (cf. l and 1) only compounds the issue. Antigravity711 (talk) 17:06, 27 January 2020 (UTC)