Talk:Abstract syntax tree

Wording
"...since in an AST the grouping of operands is explicit in the tree structure." Wouldn't it make more sense to say "implicit."?


 * Please everybody sign posts adding 4 ~'s. It's implicit in the source code, where it is implied by the precedence and associativity rules, plus the arity of every operator (whether it is unary, binary, etc.). In the AST it is explicit: each operand node groups the operands it has as sons. --euyyn 23:37, 18 June 2006 (UTC)

Has anyone else ever thought of calling it the 'internal reptreesentation ...'?

NO

Confused about context
I'm afraid I'm a bit confused by the article. Does a parser create an abstract syntax tree as part of the compilation process for each individual program? That is, will there be a different abstract syntax tree for every C program I compile? Or is there one "abstract syntax tree" for the language of C, like with attribute grammars? -- Creidieki 18:07, 10 November 2005 (UTC)
 * Yes, the AST is created as part of the compilation process, so each different program has a different AST --Pezezin 11:53, 9 March 2006 (UTC)

Shunting yard algorithm
In the see also section, I just added a wikilink to Shunting yard algorithm. Though this currently redirects to Reverse Polish notation, it had already been suggested to split ''Shun

"Syntax tree"?
Syntax tree currently redirects here, but the term applies equally to concrete syntax trees, i.e. parse trees. As a linguist myself, may I just say: yo, comp sci kids, step off. By which I mean that the validity of this particular redirect should be carefully evaluated and appropriate disambiguation should be put into place after appropriate deliberation. Thank you. 69.140.12.199 05:32, 20 January 2007 (UTC)


 * Although you might assume "syntax tree" is an umbrella term for abstract and concrete syntax trees, I read the Dragon book carefully (which I think is the definitive book on the subject) and it states that
 * Syntax tree and Abstract syntax tree are synonymous
 * Parse tree and Concrete syntax tree are synonymous
 * I recently edited both articles to make this clear. Egriffin 15:55, 30 November 2007 (UTC)


 * Why would the dragon book be "definitive" on that distinction? The authors are not linguists, they simply borrow certain linguistic concepts, and there are many other books. HenkeB (talk) 17:26, 13 December 2008 (UTC)


 * So what is the difference between abstract and concrete syntax trees? Is the distinction made in  universally accepted? Rp (talk) 16:19, 16 May 2008 (UTC)
 * Between concrete and abstract syntax: yes; between parse trees and syntax trees: no, these are pretty lose terms. HenkeB (talk) 17:37, 13 December 2008 (UTC)

WikiProject Philosophy?
Since this is a topic in computer science, and is not related to Philosophy, I believe it should be removed from WikiProject Philosophy. Biomedtechnology 20:31, 26 September 2007 (UTC)

Link to "Tutorial"?
There is a link to a socalled "Tutorial", which I don't see much tutorial value in, it looks more like a proposal for a standard for ASTs (which to my mind seems rather futile). I'd think this article would actually be improved if the link was removed, but someone must have put it there for some reason? --Lasse Hillerøe Petersen (talk) 21:47, 12 August 2012 (UTC)

Design Patterns
This section takes it as a given or at least suggests that ASTs should be implemented in an object-oriented fashion, and, on the basis of this assumption, suggests the use of the Visitor pattern. However, this pattern is nothing but boilerplate to make up for the lack of pattern matching. If anything, this suggests object orientation is perhaps not the best paradigm for implementing ASTs.

Most grammars meant to be mechanically parsed are deterministically context-free and can be defined using Backus–Naur Form or Extended Backus–Naur Form. The former maps directly to algebraic data types. The latter also maps to algebraic data types, but takes a given that the standard library provides an option and a list type, which in turn can also be implemented using algebraic data types. So, perhaps a better suggestion would be the use of algebraic data types as the backbone for the implementation of ASTs. Eduardo León (talk) 21:35, 1 May 2013 (UTC)

Old Fart Retrospective
I have been writing compilers sense before the dragon book. We simply called them trees back then. Although the diagrams look more like a root system.

I think that there should be something about equivalent s-expressions. The example while statement in a tree form can be expressed several ways. It's simplified here a bit. I just wont to get the concept across. Three forms are shown below. The list form has the node name, in bold, as the first element. The tree on the right and branching out to the right. The lower version is the symbolic tree from. Like the list form only the node has been taken out and placed in front of the opening '[' bracket.


 * while (a NEQ b)
 * if (a > b)
 * a = a - b;
 * else
 * b = b - a;

WHILE /    \   NEQ       \ /  \       ELSE a    b     /    \ /     ASGN IF     /    \ / \    b      SUB GT   \         /   \ / \    ASGN    b     a     a    b  /    \ a     SUB /  \                 a     b
 * WHILE[NEQ[a,b],ELSE[IF[GT[a,b],ASGN[a,SUB[a,b]]],ASGN[b,SUB[b,a]]]]


 * [WHILE,[NEQ,a,b],[ELSE,[IF,[GT,a,b],[ASGN,a,[SUB,a,b]]],[ASGN,b,[SUB,b,a]]]]

The three forms of showing a tree above were called pictorial form, tree form and list form. The list and tree forms are like polish prefix notation. With the tree form we can talk about the parts and analyze them. For example The WHILE node has specific branches.

WHILE[condition,expression]

The while executes the expression as long as the condition is true. A compiler could generate code for the condition and expression. It's hard to explain the advantage of the different trees notation without getting into code generation. So I am just going to touch on it here and if somebody thinks this should be in the topic. Use it.

The code can be generated condition first or last. It we generate expression code first than the first thing we output is a jump to the condition code and then a label to start the loop. If we generate the condition first we would output a label to start the loop. So lets do the condition first. A compiler needs a way to create symbols. Call them gensyms short for generated symbols that are not part of the language being compiled. Symbolically I am going to represent a generated symbol as %G %G1, %G2 etc. Lets say we have function falsegen(bexpr, loc) taking a conditional tree bexp and a gensym loc. The gensem is the location to go to when the condition is false. Another function gencode(tree) that generates code for a general tree. In this example %g1: would assign the current code location. Assume that we have a mechanism for getting/generating address of an as yet undefined symbol. A fix up, cashing code, or tracking were the code makes forward references. So in a pictorial form the code generation of a while loop would be:

%G1: falsegen(condition,%G2); gencode(expression); goto(%G1); %G2:

generating the expression first using a truegen function would be:

goto(%G2); %G1: gencode(expression); %G2: truegen(condition,%G1); Again for reference codegeneration of: WHILE[condition,expression] The different tree representation gives a way of talking about them. Abstracting the tree branch using descriptive names is an advantage in understanding how to generate code. In 1970 there was a compiler compiler that actually generated code very similar to my example. I should mention that gensyms were local to the function instance.

--Steamerandy (talk) 08:02, 30 October 2014 (UTC) fixed examples that were not displaying right Steamerandy (talk) 22:32, 4 May 2017 (UTC)

Does this make sense?
The article contains the sentence "Although this method can lead to a more efficient compiler, it goes against the software engineering principles of writing and maintaining programs". What software engineering principles of writing and maintaining programs? On it's own the statement is meaningless. For it to have meaning it would need to specify the principals the method violates (and where necessary to indicate how a violation occurs). FreeFlow99 (talk) 12:58, 18 January 2021 (UTC)


 * I agree it's ridiculous. The preceding sentence, "This parse tree can be used to perform almost all functions of a compiler by means of syntax-directed translation.", appears to misunderstand what "almost all functions of a compiler" implies. Maybe the author meant to say "This parse tree contains all the information of an AST.", but this is obvious. This is also the first I've heard the claim that retaining useless tokens in a syntax tree will somehow "lead to a more efficient compiler". I'm going to remove and reword this because I really don't think an RS exists to support it—the claim doesn't even pass the smell test. What engineer would avoid using something that workably leads to a more efficient compiler? TricksterWolf (talk) 16:05, 7 December 2021 (UTC)


 * I also removed "Java provides an excellent example, where the '+' operator is both numerical addition and concatenation of strings." from the previous paragraph. I don't think an example is necessary to explain that operator overloading is context-sensitive, but I'm not sure this is an 'excellent example' either due to how differently the compiler and JVM treat String concatenation and how this differs substantially from operator overloading in other languages. The + and += overloading in Java trigger a lot of complexity (creating objects, invoking methods, some at compile time and some at runtime) that is a fundamental part of the Java language and JVM. I may come back later and rewrite this entire section if I have time.TricksterWolf (talk) 16:18, 7 December 2021 (UTC)