Talk:Recursive descent parser

Untitled
This page really needs a discussion of the problem of left-recursive grammar rules. Such rules are extremely common, as for example

EXPR = INT | EXPR + EXPR | EXPR * EXPR;

A recursive descent parser, presented with this grammar, will loop forever on a malformed input.

Should probably mention that hand-written recursive descent parsers are usually based on EBNF grammars? These support left-recursion by turning

A = A b C | C

into

A = C (b C)*

Paolo Bonzini, 09:34, 8 Mar 2006 (UTC)

parser Programming Languages do the same:

expr = term $(('+':ADD|'-':SUB) term!2);

This constructs a left descending tree bottom up. Syntax formula like expr are compiled into executable code.

expr: call  term je     %g1                   //success ret                              // fail ne status %g1: push S,(address of)"+" call  strcmp jne   %g2 push N,ADD_NODE jmp. %g4 %g2: push S,(address of)"-" call  strcmpr je     %g3 cmp X,X                   // set eq success status ret                            // return success %g3: push N,SUB_NODE %g4: push S,2 call   maketree ret 216.146.244.139 (talk) 20:05, 9 August 2021 (UTC)

Recursive descent parser example
Dominus 04:44, 28 Aug 2003 (UTC)

The code on the page is wrong. Parse_E will parse the string NOTTRUETRUE which is not part of the language.


 * I don't think it does. Care to demonstrate? -- Jan Hidders 11:22, 28 Aug 2003 (UTC)


 * maybe the problem is hidden behind the if's, which should be else-if's??


 * Ah, yes, of course. Hope I fixed that now. -- Jan Hidders 00:51, 4 Nov 2004 (UTC)

Does it accept 1*-1? I'm not sure it does
Does not look like it accepts this:

//                    ident :=      number   *    -   number.
 * Correct.

Symbol symbols[7] = {ident,becomes,number,times,minus,number,period}; Should it?


 * The parser is based on the [PL/0] grammar (a compiler used in many compiler construction courses), which is in turn based on Pascal. For reasons unknown to me, standard Pascal does not allow imbedded unary expressions.  For instance:

ident := -ident * number;
 * is allowed, however, expressions such as yours are not.


 * To allow this, you could add "-|+" to the factor production, as in:

factor = ident | ("+"|"-") factor | number | "(" expression ")"
 * And change the parser appropriately.


 * Since the [PL/0] grammar is so well known, I'm of the opinion that the grammar should be left as is. 68.219.72.181 13:47, 17 April 2006 (UTC)

Question about See Also entry
There is a new See Also entry that I have a question about.

It points to a 'source code' site (nothing wrong with that), but it doesn't seem to add very much. It is essentially the example, copied from this wikipedia site (without any notice of the copying, by the way), plus another simple example.

Should this link be deleted? 67.34.42.125 12:27, 22 May 2006 (UTC)

expect in C implementation
Before returning, expect emits an error internally if necessary. No client of expect tests the function's error result; ignoring it is legal but dubious practise which detracts from clarity in example code. The function's type should be void not int. — Preceding unsigned comment added by 203.211.96.228 (talk) 00:01, 3 March 2021 (UTC)

PEGs
I have removed references to parsing expression grammars in this discussion on recursive descent parsing, as the content referring to PEGs, at least as formulated, tended to give the impression that PEGs are a widely adopted formalism. Parsing expression grammars, while analytic, are a formal grammar, and RD parsing is a parsing strategy (algorithm). QTJ 04:36, 11 October 2006 (UTC)

Bug in the parser code
The code inside block, after "while (accept(procsym))" seems to be nonsense, right? There are 2 nested while-loops and as far as i understand the grammar only the outer one should be there. Catskineater (talk) 17:28, 30 March 2008 (UTC)

SML example
I just wrote one of these in SML for an assignment. Should I post it here under Implementation in functional languages? bkhl (talk) 07:47, 13 May 2008 (UTC)

Recursive descent with backup
The "recursive descent with backup" discussion, while correct, troubles me. "X with backup", for any parsing algorithm X, will parse any context-free language. It will do so, except in a very few cases, at the expense of every virtue which the original algorithm X had.

Perhaps the "backup" discussion might be moved later in the article. I'd almost suggest deleting it, but the topic does come up. Almost automatically, whenever algorithm X proves inadequate, backup will be suggested. Quite often, it will be tried. My wishing it were not so, does not help. So some kind of discussion of "backup" as an alternative probably needs to be kept.

--Jeffreykegler (talk) 03:23, 26 May 2010 (UTC)

Expansion necessary
The example uses no token look ahead at all and thus is not really helpful to understand the strength of recursive descent parsers. There should at least be a bit of ambiguity in the grammar, otherwise the example is too trivial and - even worse - points the reader into wrong directions. — Preceding unsigned comment added by 139.30.5.126 (talk) 18:33, 27 March 2012 (UTC)

The article doesn't explain what a Recursive descent parser is
Sorry to nag, but this article doesn't provide an answer to the simple question of what a Recursive descent parser is. I'm sure that the information is technically correct in a formal way, but an encyclopedia article really should provide some understanding of the subject to an interested reader. I suspect the authors are very talented people that are communicating among themselves, but to the average person this is gobbledygook. I've been programming since 1972, know FORTRAN, Pascal, C, assembly languages for the PDP-8, CDC 6400 and 6600, 8080 and 8086 Intel chips, Python, APL, PHP, and probably a few others, and I don't know what a recursive descent parser is after reading this article. If I'm lost, many others are also going to be left bewildered by the article. Kd4ttc (talk) 04:54, 22 November 2012 (UTC)

Another bug in example code
The grammar rules suggest that the following string should be accepted by this grammar:

begin foo := 5 ; end

i.e. The grammar rules accepts (in fact, requires) a semicolon between the "5" number token and the "end" endsym token whereas the implementation of the relative portion of the "statement" function looks something like:

if (accept(beginsym)) {   do    { statement; }   while (accept(semicolon)); expect(endsym); }

This only accepts strings of the form:

"begin" statement { ";" statement } "end"

Or, am I missing something? This example code is as old as the hills so it seems unlikely that such a bug could've been around for so long, but I guess it's possible.

Update: The version of this example parser at the URL http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Recursive_descent_parser.html has a modified version of the grammar to reflect this. I'll update the article when I get the chance.

Update: I updated the grammar. It now matches the grammar as specified on the PL/0 page (without the | "?" ident | "!" expression rule) and corresponds to the C implementation given.

--Richardcook (talk) 15:45, 18 July 2014 (UTC)

Bad description
I know we have the accepted description. But it doesn't really describe most real world implementations.

The first implementations of these parsing languages were most likely the metacompiler's developed by Val Schorre. These are a type of TDPL languages.

The languages are very close to the example language used. $ is used for zero or more instead of { } and are used for all grouping.

expr = term $(('+'/'-') term); term = factor $(('*'/'/') factor); factor = .number/.id/'('expr')'; These languages included transforming constructs not shown above.

Each rule, like TDPLs, is a function returning success or failure. They read the source program as an input character stream and output the transformed code.

These are not substitution rules. They are analitical rules. Exactly the opposite of production rules.

Second generation Schorre metacompilers transformed the input into abstract syntax trees: expr = term $(('+':ADD/'-':SUB) term!2); term = factor $(('*':MPY/'/':DIV) factor!2); factor = .NUMBER/.ID/'('expr')' ('^' factor:XPN!2|.EMPTY); There are two operational stacks. The node stack holds nodes. The : operator creates a node object and pushes it onto the node stack.. A tree is constructed on the parse stack. A recognized token (.ID or .NUMBER) is pushed onto the parse stack. The ! operator creates a tree using the most recent created node and top of parse stack objects specified by the. Nodes are placed on a stack when created by a the : opetator. The ! operator removes the top of parse stack entries and combines them with the poped top node stack entry to form a list of +1 objects. The newly created list, tree representation, is pushed onto the parse stack. Recognized tokens .number or .id are placed on the parse stack when recognized. The rules operating on the string "A + B" would have recognized A and B placing them on the parse stack. The ":ADD" operator upon recognition of the + would of created an ADD node and pushed it on the node stack. The !2 removes the top node stack "ADD" and top 2 parse stack entries A and B. Combines them creating a tree ADD[A, B] and pushes it on the parse stack.

These transform by analysis the input into an abstract syntax tree. Steamerandy (talk) 20:04, 24 December 2014 (UTC)

The Schorre line of metacompilers illistrate top down parser programming languages. I have seen and been classing them as recursive decent.

They are explained as inferior to LR parsers.

In my experience there has been no need of undoing any previous action except in the case of an error in the input. Any parser has the same problem.

In my experience Tree Meta and CWIC are more efficient then LR parsers. All the Schorre metacompilers provide at least 1 symbol look head. The CWIC expr rules:


 * expr = term $(('+':ADD | '-':SUB) term!2);
 * term = factor $(('*':MPY | '/':DIV) factor!2);
 * factor = ('(' expr ')' | number | id) ('^' factor:EXP!2|.EMPTY);

The above will parse an arithmetic expression and build an abstract syntax tree.

In CWIC there backtracking is limited only by book keeping storage. A higher level rule would use a backtracking alternate to catch an error. The parsed structure would be released and the input reset to backtrack point. A look ahead test would be used to find a restart point. Usually a negative match looking for a statement termination character.


 * skip = $(-';' .any) ';';

The above would skip over any character not a ;. The - operator nagates the success or failure of the test that follows. In neither case is the parse is advanced. .ANY matches any character. Literal string matches are not normally kept. The '+' string test in the expr rule will match the character leaving the parse on the character following the matched character. There is a lot to these metacompilers. But the point I am trying to get at is they are programed. They are grammar rules that have a deterministic translation. They are logic equations returning success or failure. True or false if you will. A rule's operation, code wise, is known. The way one writes the rules determines the order of alternative tests etc.

The way these systems transform the input directly into an abstract syntax tree is direct eleminating the generation of a parse tree and reprocessing it to get the abstract syntax tree.

A fuller description of Schorre Metacompilers can be found in my personal space: Schorre Meta languages CorrecionsSteamerandy (talk) 18:22, 12 September 2018 (UTC) Steamerandy (talk) 11:07, 8 September 2015 (UTC)

Knuth on recursive descent
Granted that this article doesn't have a history section at this point but this might be useful, from http://www.bayfronttechnologies.com/mc_tutorial.html

In 1990 when Donald Knuth was asked what helped to crystallize his thinking about attribute grammars he related the following [Knuth90].

"Many details of those two days (February 4-5, 1967) still remain fresh in my mind. ... Peter [Wegner] asked me what I thought about formal semantics, and I said I liked Iron's idea of synthesizing an overall meaning from submeanings. I also said that I liked the way other people had combined Irons's approach with a top-down or "recursive-descent" parser; Irons had used a more complex parsing scheme. In particular, I was impressed by the metacompiler of Val Schorre [Schorre64] and by the more powerful (though less elegant) "transmogrifier" of Bob McClure [McClure65]."

[Knuth90] D. E. Knuth, "The Genesis of Attribute Grammars," In Proceedings of the International Conference on Attribute Grammars and their Applications (Paris, France). P. Deransart and M. Jourdan, Eds. Springer-Verlag New York, New York, NY, 1-12. 1990.

The "other people" that he referred to were probably Waychoff et al. at Burroughs for whom he consulted, however they rarely published formally so he probably felt it inappropriate to name them. MarkMLl (talk) 11:56, 1 September 2017 (UTC)

Pascal Implementation is Redundant
The Pascal implementation is identical to the C implementation. The differences are lexical and superficial: Rather than:

6 of the top ten most popular languages today are C-derived. Virtually any programmer viewing this page will be familiar enough with C syntax for the C code to function as pseudocode, so the Pascal implementation is completely superfluous.

Moreover, both implementations omit `nextsym` and `error`. It would be much more valuable to have one *complete* implementation than two identical, incomplete implementations.

EricTetz (talk) 00:16, 8 August 2019 (UTC)
 * I agree with this point, especially "Virtually any programmer viewing this page will be familiar enough with C syntax for the C code to function as pseudocode". The Pascal example states in a comment that it's a port of the C example, and it's almost transformable to C with search-and-replace. The three paragraphs describing it are identical to those describing C, word-for-word, except for the single substitution of "C" to "Pascal". As it currently is, it looks like a 157-line copy of the immediately preceding section. The which added the Pascal example says "helping non-C programmers with another dialect example", which is noble in thought, but I think out of scope; Wikipedia doesn't need to be 99-bottles-of-beer.net.
 * If an implementation in another language is warranted (which I don't really think it is) then may I suggest that it not be in a C-like procedural language; perhaps there's value in using a functional language like Haskell, since the whole algorithm is recursive in nature and Haskell is a very recursive language. oatco (talk) 18:55, 2 May 2020 (UTC)
 * I decided to go ahead and be bold and remove that section. The page is immediately 25 % shorter. oatco (talk) 19:03, 2 May 2020 (UTC)