Talk:History of compiler construction

Untitled
This article comes from an outgrown history section in the article Compiler along with older references. There appears to be enough material to create a good article although it definitely needs some work. See also. pgr94 (talk) 13:30, 29 January 2009 (UTC)

I'd agree that such an article is academically useful, although, I might add, some might belong in the Compiler article proper as still technically relevant, fostering understanding with seminal concepts. --- (Bob) Wikiklrsc (talk) 15:18, 29 January 2009 (UTC)

This deserves to be an article in its own right. Do not merge with Compiler or Compiler construction. Paul Foxworthy (talk) 12:10, 30 June 2011 (UTC)

I'm interested in this sentence: "LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting", because from what I've learned, known, and experienced, LR parsing cannot detect the exact error location, since the reduce step could have more than one possibility. It can only say "somewhere around here". Should it be corrected? Or if anyone has a proof, please tell me. Leledumbo (talk) 16:23, 08 December 2011 (UTC + 7)

Hi Leledumbo, the LR parser article states

LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.

No source cited, but I hope that helps. Paul Foxworthy (talk) 13:07, 11 March 2012 (UTC)

I thought that's where this article took the sentence from; that's why the sentence is exactly the same. Anyway, in the papers I have read, each author says different things depending on the method he or she prefers to use, and I have never found one that really discusses both methods from a neutral point of view. --Leledumbo (talk) 05:53, 12 March 2012 (UTC)

Direct quotation of a paragraph?
The paragraph on Frances Allen appears to be a direct quotation from her Turing award citation. Should this be sourced differently? 108.36.110.176 (talk) 12:06, 21 June 2013 (UTC)

Link no longer exists
This reference contains a link to a page at the Computer History Museum that no longer exists.



RussAbbott (talk) 20:52, 30 June 2013 (UTC)

Inconsistency
"Any program written in a high level programming language must be translated to object code before it can be executed" Is inconsistent with actions performed by some interpreters. Interpreters must perform similar code analysis. But translation to object code is not necessarily the result.

--Steamerandy (talk) 17:30, 27 October 2014 (UTC)

The first self-hosting compiler appears to be the NELIAC ALGOL compiler in 1958. Not LISP in 62. Steamerandy (talk) 07:11, 3 January 2015 (UTC)

BNF was not used in the specification of ALGOL 58. It was first used in the specification of ALGOL 60.

Steamerandy (talk) 22:56, 24 March 2015 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to one external link on History of compiler construction. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive https://web.archive.org/20111013021915/http://www.computerhistory.org/events/lectures/cobol_06121997/index.shtml to http://www.computerhistory.org/events/lectures/cobol_06121997/index.shtml

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 22:20, 26 January 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on History of compiler construction. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20140525030319/http://cm.bell-labs.com/who/dmr/chist.html to http://cm.bell-labs.com/who/dmr/chist.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 10:18, 3 April 2017 (UTC)

Parser Programming (grammar description) Languages
META II and other META "languages" based on Dewey Val Schorre's transformational metalanguages are in part grammar description languages. That includes TREE-META and CWIC. Dewey was a member of the CWIC (Compiler for Writing and Implementing Compilers) development team.

I developed SLIC (System of Languages for Implementing Compilers) in the 1970s, based on CWIC. I do not know the limitations of these parser languages. They have not been properly studied by computer science; in fact they are mostly ignored by CS. Probably a not-invented-here situation. I know of no compiled programming language whose parser cannot be written in or handled by them.

How they do this is hard to explain today because of terminology changes over the years. To start, I want to explain the overall operation of SLIC. SLIC was planned to have 5 sub-languages:


 * 1) SYNTAX Parser programming.
 * 2) GENERATOR Code sequencing.
 * 3) ISO In Sequence Optimizing.
 * 4) PSEUDO (macro like) instructions
 * 5) MACHOP MAChine OPeration

The MACHOP language is used to define the assembly machine instructions of the target machine. MACHOPs are actually called to output a sequence of bit fields making up a machine instruction. A MACHOP may define a family of instructions by way of a vectored entry. A vector is a mechanism developed in CWIC: basically, corresponding arrays. Calling a vectored entry function simply sets up the function to use the corresponding vectors' values. For example, DEC-10 user mode instructions have identical formatting, differing only by specific values in some fields depending on the mnemonic opcode name used as the entry. A vectored entry sets specific local variables to the corresponding vector values, commonly only the actual numeric opcode.
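A minimal Python sketch of the vectored-entry idea, assuming the PDP-10 (DEC-10) 36-bit user-mode layout: 9-bit opcode, 4-bit accumulator, 1 indirect bit, 4-bit index register, 18-bit address. The names `machop`, `emit_fields`, `MNEMONICS` and `OPCODES` are hypothetical and the opcode values are only illustrative; this is not SLIC's MACHOP syntax, just the underlying idea of corresponding arrays selecting field values for a family of like-formatted instructions.

```python
# Hypothetical sketch of a MACHOP with a vectored entry (not SLIC syntax).
# Field layout: PDP-10 user-mode instruction, 36 bits in total.
FIELDS = [("opcode", 9), ("ac", 4), ("indirect", 1), ("index", 4), ("address", 18)]

# Corresponding vectors: the i-th mnemonic goes with the i-th opcode value.
# The opcode values are illustrative; check a DEC manual before relying on them.
MNEMONICS = ["MOVE", "ADD", "SUB"]
OPCODES = [0o200, 0o270, 0o274]


def emit_fields(values):
    """Pack the (name, width) fields left to right into one integer word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def machop(mnemonic, ac, address, indirect=0, index=0):
    """Vectored entry: the mnemonic selects the corresponding opcode value;
    all other fields are formatted identically for the whole family."""
    opcode = OPCODES[MNEMONICS.index(mnemonic)]
    return emit_fields({"opcode": opcode, "ac": ac, "indirect": indirect,
                        "index": index, "address": address})


if __name__ == "__main__":
    print(oct(machop("ADD", ac=1, address=0o100)))
```

Every entry shares one packing routine; only the vector lookup differs, which is the point of defining a family of instructions through one MACHOP.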

The PSEUDO functions are akin to assembly macros. They were developed from CWIC's GENERATOR language; the generator language was changed to emit PSEUDO calls. Code generation is into named (declared) code sections. Emitting a PSEUDO instruction creates a PSEUDO call block and appends it to a section's code list. A generator flushes a section to output code. On flushing, the ISOs would run on the pseudo code list, and then each PSEUDO would be called and deleted. You may have as many sections as you wish. The SLIC linker handled linking SLIC load files and libraries, but one had to write output formatting for specific machines and/or program loaders.
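The section mechanism described above can be modeled roughly as follows. This is a hypothetical Python sketch, not SLIC code: `CodeGen`, `emit`, `flush` and the `drop_add_zero` optimizer are invented names, and the PSEUDO procedures here return assembly text lines instead of calling MACHOPs.

```python
# Hypothetical model of SLIC-style code sections holding PSEUDO call blocks.
from dataclasses import dataclass, field


@dataclass
class PseudoCall:
    name: str    # which PSEUDO procedure to call at flush time
    args: tuple  # arguments captured when the generator emitted it


@dataclass
class Section:
    name: str
    code: list = field(default_factory=list)  # pending PseudoCall blocks


class CodeGen:
    def __init__(self, pseudos, isos=()):
        self.pseudos = pseudos  # name -> function(*args) returning output lines
        self.isos = list(isos)  # in-sequence optimizers: list -> list
        self.sections = {}
        self.output = []

    def section(self, name):
        return self.sections.setdefault(name, Section(name))

    def emit(self, section_name, pseudo_name, *args):
        # Emitting generates no code yet; it only appends a call block.
        self.section(section_name).code.append(PseudoCall(pseudo_name, args))

    def flush(self, section_name):
        sec = self.section(section_name)
        code = sec.code
        for iso in self.isos:       # optimizers see the whole pending list
            code = iso(code)
        for call in code:           # then each PSEUDO is called...
            self.output.extend(self.pseudos[call.name](*call.args))
        sec.code = []               # ...and the call blocks are discarded


def drop_add_zero(code):
    """Example ISO: delete 'add immediate 0' pseudo instructions."""
    return [c for c in code if not (c.name == "ADDI" and c.args[1] == 0)]


if __name__ == "__main__":
    gen = CodeGen(
        pseudos={
            "LOAD": lambda reg, v: [f"MOVEI {reg},{v}"],
            "ADDI": lambda reg, v: [f"ADDI {reg},{v}"],
        },
        isos=[drop_add_zero],
    )
    gen.emit("CODE", "LOAD", 1, 5)
    gen.emit("CODE", "ADDI", 1, 0)
    gen.emit("CODE", "ADDI", 1, 3)
    gen.flush("CODE")
    print(gen.output)
```

Deferring the calls until the flush is what gives an ISO the chance to inspect and rewrite the whole pending sequence before any code is produced.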

The generator language was developed from LISP 2. LISP 2 was a great choice: LISP requires the same type of underlying memory management a compiler requires. With the advent of OOP languages it is hard to classify these languages. I originally used the term object based, but object based has since been used for something else and doesn't fit. So I'll just say that almost every piece of data is an object, except the input character stream and the output.

So in the GENERATOR, PSEUDO, and MACHOP languages, variables contain objects or NIL. The syntax language transforms its input into atomic (numeric, string, and symbol) objects. Symbol objects are objects cataloged in the symbol table.

Now for the parser languages, consisting of test formulas, where a test is an action that may succeed or fail. A test formula is akin to a Boolean function, returning success (TRUE) or failure (FALSE). A test formula is defined by a test expression. A test expression may be a sequence of tests that functions similarly to a Boolean AND expression, though it is really more a chain of if-then-else expressions. A sequence

'A' 'B' 'B' is like

IF nextchar=='A' THEN
    IF nextchar=='B' THEN
        IF nextchar=='B' THEN
            // continue
        ELSE LONGFAIL;   // backtrack
    ELSE LONGFAIL;       // backtrack
ELSE                     // first test failure, no backtrack.
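The same control flow as a small Python sketch, assuming a long failure is modeled as an exception (`LongFail`) that a backtrack point higher up would catch. `Input`, `string_test` and `sequence` are hypothetical names. The point is the asymmetry: only a failure after input has been consumed needs to backtrack.

```python
class LongFail(Exception):
    """Raised when a sequence fails after it has consumed input."""


class Input:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def string_test(self, s):
        """Compare s with the input; on success consume it and return True."""
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False


def sequence(inp, first, *rest):
    """A sequence of string tests such as 'A' 'B' 'B'."""
    if not inp.string_test(first):
        return False  # first test failed: nothing consumed, no backtrack needed
    for s in rest:
        if not inp.string_test(s):
            # Input was already consumed, so this must long-fail (backtrack).
            raise LongFail(f"expected {s!r} at position {inp.pos}")
    return True
```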

A string test compares the quoted string to the input stream. On success the matched characters are consumed advancing the input. We can use a negative string test:

-"something"

It fails if the quoted string matches the input.
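A minimal sketch of both kinds of string test, assuming the parse state is just a position in the input and ignoring skip_class handling. It also assumes the negative test never advances the input, so it acts as a look-ahead; all function names are hypothetical.

```python
def string_test(text, pos, s):
    """Succeed, returning the advanced position, if s matches at pos."""
    return pos + len(s) if text.startswith(s, pos) else None


def not_string_test(text, pos, s):
    """-"s": succeed WITHOUT consuming input, only if s does NOT match."""
    return pos if string_test(text, pos, s) is None else None


def any_but_semicolon(text, pos):
    """-';' .ANY : accept any single character that is not a semicolon."""
    p = not_string_test(text, pos, ";")
    if p is None or p >= len(text):
        return None
    return p + 1
```

Combining a negative test with .ANY in this way is the pattern used by error-recovery loops that skip input up to a delimiter.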

Note that these are general operators; I am using quoted string test examples for simplicity. I have tried to place these languages' grammars in the Chomsky hierarchy, but they really do not fit. Being top-down, context is handled naturally. The GENERATOR languages of CWIC and SLIC were based on LISP 2. Having the LISP runtime is a great asset. When a token is recognized, a token object is created and pushed on the parse stack.

There are 3 levels of parsing formula.

Character class formulas name a group of character tests.

dgt: '0'|'1';

A character class is defined by a list of character constants. Character classes are generally used in token formulas (defined with ..). Token formulas are declarative functions. I implemented them by creating a class table: an array indexed by a character's ordinal value contains a bit array, or mask, whose value indicates class memberships.
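A sketch of such a class table in Python. The class bits and class contents (including a full 0 to 9 `dgt`) are assumptions for illustration; a 256-entry table is assumed, and characters outside it belong to no class.

```python
import string

# Hypothetical class bits; one character may belong to several classes.
DGT, LET, ALPHANUM, SKIP = 1, 2, 4, 8

CLASS_TABLE = [0] * 256  # indexed by a character's ordinal value


def define_class(bit, chars):
    """Set the class bit for every character listed in the class definition."""
    for ch in chars:
        CLASS_TABLE[ord(ch)] |= bit


define_class(DGT, string.digits)
define_class(LET, string.ascii_letters)
define_class(ALPHANUM, string.digits + string.ascii_letters)
define_class(SKIP, " \t\r\n")


def in_class(ch, bit):
    """Class membership test: one table lookup and one bitwise AND."""
    code = ord(ch)
    return code < len(CLASS_TABLE) and bool(CLASS_TABLE[code] & bit)
```

Because memberships are bits in one mask, a token formula can test any class in constant time no matter how many characters the class lists.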

These grammar description languages are best described as parser programming languages. Though in appearance they look like a declarative language, they are really imperative. I now use the term formula, hoping to get away from that association.

A formula is a type of Boolean function. I use the term test because a Boolean test result comes from attempting an action. Parsing an element is a test action, as the parse may succeed or fail. Unlike Chomsky grammars, which do not have rigid semantics, these grammars are easier to use. A term in Backus–Naur form (BNF), for example:



In the above we have to know a lot about the parser and parser generator we are writing for in order to produce a reasonable tree.

The same in CWIC's parser language:

expr = term $(('+':ADD|'-':SUB) term!2);

Here expr is a callable function, as is term.

In the above, expr first calls term. If term returns failure, then expr returns failure. If term is successful, we assume a term tree to be on the parse stack. On failure we expect the parse point and stacks to be unchanged; expr simply returns failure.

On successfully recognizing that first term, we begin the programmed $(...) zero-or-more loop. The first operation of the loop is a grouped test:

('+':ADD|'-':SUB)

This recognizes a + or - character. The first alternative, '+':ADD, tests for the next non-skip_class character being a + character; on success it creates an ADD node and pushes it onto the node stack. If unsuccessful, it tries the alternative '-':SUB, testing for the - and on success pushing SUB onto the node stack. Assuming we found a + or - and have the ADD or SUB node on the node stack, we look for a term. On success, !2 creates a two-branch tree, popping the top node and the top 2 parse stack entries into a 3-element list:

[node, left branch, right branch]

4+5   -> [ADD,4,5]

Trees, though, are usually displayed with the node (the first list element) ahead of the branches, which are enclosed in [..]:

ADD[4,5]
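The two-stack tree construction can be modeled with Python lists, assuming a tree is stored as `[node, branch, ...]`. `TreeStacks`, `push_node` (standing in for `:`) and `make_tree` (standing in for `!n`) are hypothetical names for this sketch.

```python
class TreeStacks:
    """Parse stack and node stack shared by all formulas (hypothetical model)."""

    def __init__(self):
        self.parse = []  # tokens and finished trees
        self.nodes = []  # node names pushed by ':'

    def push_token(self, tok):
        self.parse.append(tok)

    def push_node(self, name):
        """':NAME' pushes a node object on the node stack."""
        self.nodes.append(name)

    def make_tree(self, n):
        """'!n' pops n parse stack entries and the top node into one list."""
        split = len(self.parse) - n
        if split < 0:
            raise IndexError("not enough entries on the parse stack")
        branches = self.parse[split:]
        del self.parse[split:]
        self.parse.append([self.nodes.pop(), *branches])


def show(tree):
    """Display a list tree in node[branch,...] form."""
    if isinstance(tree, list):
        return f"{tree[0]}[{','.join(show(b) for b in tree[1:])}]"
    return str(tree)
```

Pushing 4, :ADD and 5 and then calling `make_tree(2)` leaves `[ADD, 4, 5]` on the parse stack. Doing :SUB, 1, !2 on top of that gives SUB[ADD[4,5],1], which is how each pass of a $ loop builds a left-handed tree.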

Parser programming languages are part top-down reductive analytical grammar, combined with a stack-based tree construction language.

It would be hard to improve on CWIC's SYNTAX (parser programming language). When I implemented SLIC based on CWIC, I made only two changes to the SYNTAX language, neither of which affected its parsing capabilities. Basically, the changes I made were for readability: simplified and/or shorter ways of doing some operations. An example was the $ zero-or-more operator: I added a % backtracking zero-or-more operator.

%x

is equivalent to

$(x \ .FAIL)

The documentation I had on CWIC didn't explain whether $ was backtracking or not. I assumed it wasn't, and in many cases backtracking wasn't needed; it would have been unnecessary overhead. On the other hand, when backtracking was required, adding the "\ .FAIL" characters would often cause the line to wrap. Thus % was added for cosmetic readability.
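The difference between $ and % can be sketched in Python, assuming a long failure is an exception and the saved parse state is only the input position (CWIC and SLIC would also restore the parse and node stacks). The formula `ab` (standing for 'a' 'b') and all function names are hypothetical.

```python
class LongFail(Exception):
    """A failure after input was consumed."""


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def lit(self, s):
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False


def ab(p):
    """Hypothetical formula x = 'a' 'b'."""
    if not p.lit("a"):
        return False      # nothing consumed: plain failure
    if not p.lit("b"):
        raise LongFail()  # 'a' was consumed: long fail
    return True


def zero_or_more(p, x):
    """$x: repeat x until it fails; a long failure inside x propagates."""
    while x(p):
        pass
    return True


def zero_or_more_backtracking(p, x):
    # %x, i.e. $(x \ .FAIL): each iteration gets a backtrack point. If x
    # long-fails, the parse state is restored to the start of that iteration,
    # .FAIL then fails without consuming anything, and the loop ends successfully.
    while True:
        saved = p.pos
        try:
            if not x(p):
                return True
        except LongFail:
            p.pos = saved
            return True
```

On input `ababa`, `zero_or_more` lets the long failure from the dangling `a` escape, while `zero_or_more_backtracking` stops after two iterations with the position restored to 4.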

In these languages you program a top-down reductive analysis: basically a recursive descent parser on steroids. CWIC and SLIC compiled to machine code, creating fast, efficient compilers.

SLIC was used to write a COBOL cross-compiler. It implemented exactly the DEC-10 COBOL syntax, even having the same RECORD CONTAINS bug as DEC's. The COBOL compiler was completed in approximately three man-months. It compiled more lines per minute than the supplied native DEC-10 COBOL compiler. And when the RECORD CONTAINS bug was discovered, we fixed it in about 10 minutes.

Note: I use the term formula. The term formula may be defined as a recipe: a procedure, or steps, to be followed.

Parsing formulas are test functions, not production rewrite rules. A test is an action that may succeed or fail. An example is worth more than words:

expr = term $(('+':ADD|'-':SUB) term!2);

The expr formula (a test function) first calls term. Assuming term is a parsing formula, it may succeed or fail. On success the

$(('+':ADD|'-':SUB) term!2)

zero-or-more loop then looks for additional terms preceded by a + or - symbol. The grouped sequence test

(('+':ADD|'-':SUB) term!2)

is the loop body. The remaining formulae are:

term = factor $(('*':MPY|'/':DIV) factor!2);
factor = (id | number | '(' expr ')') ('^' factor:POW!2 | .EMPTY);
// Token formulas:
id .. let $(alphanum|+'_');
number .. dgt $dgt MAKENUM[];

The above set of parser formulae produces a left- or right-handed syntax tree appropriate to the operation: +, -, * and / are left-handed and ^ is right-handed. The $ zero-or-more loop operator is equivalent to the * used today in regular expressions. The $ loops build the trees bottom up.

I use left-handed or right-handed to describe the branching direction relative to the parent node.

The expression:

12*x^3-3*(t^2+5)-8

parsed by the above formulae would be transformed into a tree:

              SUB
             /   \
          SUB     8
         /   \
      MPY     MPY
     /  \     /  \
   12   POW  3   ADD
       /  \     /   \
      x    3  POW    5
             /  \
            t    2

SUB[SUB[MPY[12,POW[x,3]],MPY[3,ADD[POW[t,2],5]]],8]

The tree is constructed on the parse stack.
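To check the walkthrough, here is a hand-written Python recursive descent parser that mirrors the expr, term and factor formulae one for one, using a parse stack and a node stack. It is a sketch under stated assumptions, not CWIC output: the token formulas are approximated by regular expressions, numbers become Python ints (standing in for MAKENUM[]), identifiers stay plain strings rather than cataloged symbols, and a failure after input has been consumed raises a hypothetical `ParseError` instead of long-failing to a backtrack point.

```python
import re

# Hypothetical stand-ins for the id and number token formulas.
ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUM = re.compile(r"[0-9]+")


class ParseError(Exception):
    """A failure after input was consumed, with no backtrack point to catch it."""


class ExprParser:
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.parse = []  # parse stack: tokens and trees
        self.nodes = []  # node stack: filled by ':NODE'

    def skip(self):
        # Skip whitespace before each test (standing in for skip_class).
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def lit(self, s, node=None):
        """String test; when node is given, also performs ':node' on success."""
        self.skip()
        if not self.text.startswith(s, self.pos):
            return False
        self.pos += len(s)
        if node is not None:
            self.nodes.append(node)
        return True

    def token(self, regex, convert):
        """Token test: on a match, push the converted token on the parse stack."""
        self.skip()
        m = regex.match(self.text, self.pos)
        if m is None:
            return False
        self.pos = m.end()
        self.parse.append(convert(m.group()))
        return True

    def make_tree(self, n):
        """'!n': pop n parse stack entries and the top node into one list."""
        split = len(self.parse) - n
        branches = self.parse[split:]
        del self.parse[split:]
        self.parse.append([self.nodes.pop(), *branches])

    def require(self, ok, what):
        if not ok:
            raise ParseError(f"expected {what} at position {self.pos}")

    # expr = term $(('+':ADD|'-':SUB) term!2);
    def expr(self):
        if not self.term():
            return False
        while self.lit("+", "ADD") or self.lit("-", "SUB"):
            self.require(self.term(), "term")
            self.make_tree(2)
        return True

    # term = factor $(('*':MPY|'/':DIV) factor!2);
    def term(self):
        if not self.factor():
            return False
        while self.lit("*", "MPY") or self.lit("/", "DIV"):
            self.require(self.factor(), "factor")
            self.make_tree(2)
        return True

    # factor = (id | number | '(' expr ')') ('^' factor:POW!2 | .EMPTY);
    def factor(self):
        if self.token(ID, str) or self.token(NUM, int):
            pass
        elif self.lit("("):
            self.require(self.expr(), "expression")
            self.require(self.lit(")"), "')'")
        else:
            return False
        if self.lit("^"):
            # Recursing before building the tree makes ^ right-handed.
            self.require(self.factor(), "factor")
            self.nodes.append("POW")
            self.make_tree(2)
        return True


def show(tree):
    """Display a list tree as node[branch,...]."""
    if isinstance(tree, list):
        return f"{tree[0]}[{','.join(show(b) for b in tree[1:])}]"
    return str(tree)


def parse_expr(text):
    p = ExprParser(text)
    p.require(p.expr(), "expression")
    p.skip()
    p.require(p.pos == len(p.text), "end of input")
    return show(p.parse.pop())
```

Calling `parse_expr("12*x^3-3*(t^2+5)-8")` reproduces the list form shown above. The while loops give left-handed trees because `make_tree` runs after each operand, whereas the recursive call ahead of `make_tree` in `factor` makes ^ right-handed.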

Token formulas recognize token strings, creating token objects that are pushed on the parse stack. By default a recognized token is cataloged into the dictionary, creating a symbol object. There are supplied interceding functions that create numeric and string objects, bypassing cataloging. The : operation creates a node object and pushes it on the node stack. !n pops the top n parse stack entries and the top node stack object into a list; the list is then pushed on the parse stack. The stack tree construction is a global system, allowing tree construction independent of the formula, which in turn allows factoring that eliminates backtracking.

Formulas are test functions, a test being an action returning success or failure. On success the parse has advanced; on failure the parse state is unchanged. A test can be thought of as a Boolean type: success=true and failure=false. Tests, also being actions, are performed in left-to-right order and downward. A test may long-fail, returning past the stacked formula calls. The backtrack alternative \ sets a backtrack point ahead of its left alternative. A backtrack is akin to a longjmp in C: only the parse state is saved, and on a failure return the parse state is restored:

program = $( ( declaration             // A program is a sequence of
             | .EOF .STOP)             // declarations terminated by
                                       // End Of File, or on an error
           \ ERRORX["Error"]           // backtrack and report the error,
                                       // flagging the furthest parse point.
             $(-';' (.ANY|.EOF .STOP)) // Error recovery: find a ; or
                                       // terminate on an EOF.
             ';');                     // On matching a ; continue the declaration loop.

The above program formula defines a program to be a sequence of declarations terminated by End of File (.EOF), but it also includes error reporting and recovery.
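A Python sketch of how the backtrack alternative and the error-recovery loop in the program formula interact. Assumptions: `declaration` is a hypothetical toy formula `id '=' number ';'`, only the input position is saved at the backtrack point, ERRORX is modeled as appending the message and the furthest parse point to a list, and `\` is taken to catch both plain and long failures of its left alternative.

```python
import re

# Hypothetical token patterns for a toy declaration: id '=' number ';'
ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NUM = re.compile(r"[0-9]+")


class LongFail(Exception):
    """Failure after input was consumed; unwinds to the nearest backtrack point."""


class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0
        self.furthest = 0  # furthest point reached, reported by ERRORX
        self.decls, self.errors = [], []

    def skip(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def advance(self, to):
        self.pos = to
        self.furthest = max(self.furthest, to)

    def lit(self, s):
        self.skip()
        if not self.text.startswith(s, self.pos):
            return False
        self.advance(self.pos + len(s))
        return True

    def token(self, regex):
        self.skip()
        m = regex.match(self.text, self.pos)
        if m is None:
            return None
        self.advance(m.end())
        return m.group()

    def declaration(self):
        """Hypothetical formula: declaration = id '=' number ';'"""
        name = self.token(ID)
        if name is None:
            return False  # first test failed: plain failure
        if not self.lit("=") or self.token(NUM) is None or not self.lit(";"):
            raise LongFail()  # input already consumed: long fail
        self.decls.append(name)
        return True

    def program(self):
        while True:  # $( ... )
            saved = self.pos  # '\' sets a backtrack point here
            try:
                if self.declaration():
                    continue
                self.skip()
                if self.pos >= len(self.text):
                    return self.decls  # .EOF .STOP
            except LongFail:
                pass
            self.pos = saved  # restore the parse state, then take the right side
            self.errors.append(("Error", self.furthest))  # ERRORX["Error"]
            # $(-';' (.ANY | .EOF .STOP)): skip to a ';', or stop at end of file.
            while not self.text.startswith(";", self.pos):
                if self.pos >= len(self.text):
                    return self.decls
                self.pos += 1
            self.pos += 1  # ';' then continue the declaration loop
```

With input `a = 1; b = ; c = 3;` the sketch records one error and still collects the declarations a and c, because recovery skips to the next ; and resumes the loop.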

You can find a short paper on CWIC in the ACM archive.

SLIC's SYNTAX and GENERATOR languages were basically the same as CWIC's. Only the generator language was changed, to produce PSEUDO instructions and append them to section lists. CWIC produced 8, 16 and 32 bit values into memory blocks associated with named sections; code was written out by a .FLUSH. SLIC added PSEUDO and MACHOP defining languages. PSEUDO procedures were coded in a LISP 2 dialect similar to GENERATOR actions. MACHOPs are also procedurally defined output functions. MACHOPs are used to define a target machine's instructions. They define an instruction, or a family of like-formatted instructions, as a sequence of bit fields. Conditional expressions are used in selecting different sequences or field values. They also specify assembly listing formatting that can optionally be output to the compile listing, a helpful compiler debugging aid.

I am interested in determining the grammar type. Reductive is not a recognized term today. CWIC should be in this history, but as it stands it doesn't fit. These were called metacompilers. All Schorre-based metacompilers work in a similar manner. META II had a * stack whose entries were popped and output in the output productions by a *1 argument. CWIC used a *1 argument in generator calls from a syntax formula to pop the top parse stack entry. .ID, .NUMBER, and .STRING were built-in token recognizing functions in the parser languages previous to CWIC. Steamerandy (talk) 23:06, 31 October 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on History of compiler construction. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20070921161049/http://hopl.murdoch.edu.au/showlanguage.prx?exp=242 to http://hopl.murdoch.edu.au/showlanguage.prx?exp=242
 * Added archive https://web.archive.org/web/20120328052718/http://www.dickgrune.com/Summaries/CS/CompilerConstruction-1979.html to http://www.dickgrune.com/Summaries/CS/CompilerConstruction-1979.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 02:06, 5 November 2017 (UTC)