Talk:Compiler/Archive 1

Where is the runtime system/library?
I think it is indispensable for compilers dealing with Fortran 90 and OpenMP.

Moved from original Compilers page
Compilers - Tools for creating machine-readable programs

A compiler is a software program which takes as its input a set of modules written in a programming language and creates a machine-readable binary executable image file.

Computer hardware can only understand very specific binary machine language instructions. Each instruction must be written in an extremely precise format and must conform to the extremely limited abilities of the computer's CPU (Central Processing Unit). Such binary machine language is too cumbersome for humans to read, write, and understand directly.

Compilers have the ability to understand the more abstract, symbolic, human-friendly programming languages and to generate a low level binary machine language program which executes *exactly* what the programming language specifies.
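The translation described above can be illustrated at toy scale. The sketch below is a minimal example under stated assumptions, not how any production compiler works: a hypothetical stack machine with three instructions (PUSH, ADD, MUL) stands in for real binary machine language, and compile_expr, run, Num and BinOp are illustrative names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Opcodes of the hypothetical target machine.
OPCODES = {"+": "ADD", "*": "MUL"}


def compile_expr(e: Expr) -> list[tuple]:
    """Translate an expression tree into postfix code for the stack machine."""
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    # Operands first, then the operator: a post-order walk of the tree.
    return compile_expr(e.left) + compile_expr(e.right) + [(OPCODES[e.op],)]


def run(code: list[tuple]) -> int:
    """Execute the generated code on the hypothetical stack machine."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


# (2 + 3) * 4
program = BinOp("*", BinOp("+", Num(2), Num(3)), Num(4))
code = compile_expr(program)
print(code)       # [('PUSH', 2), ('PUSH', 3), ('ADD',), ('PUSH', 4), ('MUL',)]
print(run(code))  # 20
```

Emitting the operands before the operator is what makes the output fit a stack machine; a compiler targeting a register machine such as x86 would also need register allocation.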

Moved from original Talk page
Right, I forgot about that. Most of my experience programming was in the mid-80s, tinkering around with BASIC. Not exactly a powerhouse language. :-D

Compilers do not always take source code and output executable machine code. The program may have to be processed by a linkage editor and/or loader program first.

... Steve Gibson (http://grc.com/) releases his programs in machine language. Not sure how he does it, whether he actually writes in it or what. I would imagine not, but I haven't asked. I suppose it's possible, though that would probably make him a certifiable genius. :-D

Yes, he writes in assembler. See How to write Windows programs in assembler. There is even a link to a free assembler for Intel 386 processors and beyond (486, Pentium, etc.).

It would be nice to have an assembler entry with some example programs.

No. Steve Gibson packs his executables with a custom/obsolete version of UPX and claims they are written in assembly. Please verify this for yourself. Goat-see 05:00, 8 May 2004 (UTC)

Moved from Compile (computing)
In telecommunication, the term compile has the following meanings:

1. To translate a computer program expressed in a high-level language into a program expressed in a lower level language, such as an intermediate language, assembly language, or a machine language.

* See: Compiler

2. To prepare a machine language program from a computer program written in another programming language by making use of the overall logic structure of the program or by generating more than one computer instruction for each symbolic statement as well as performing the function of an assembler.

Source: from Federal Standard 1037C

Semantic analysis
Shouldn't something be said about semantic analysis? That's an important part of the front end too: type checking and such.
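As a rough illustration of what such a pass checks, here is a minimal type checker for a tiny expression language, run over a tree the parser has already built. The node classes, the two types int and bool, and the SemanticError name are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class IntLit:
    value: int


@dataclass
class BoolLit:
    value: bool


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"


Expr = Union[IntLit, BoolLit, Var, Add, If]


class SemanticError(Exception):
    """Raised when a syntactically valid program is meaningless (e.g. ill-typed)."""


def check(e: Expr, env: dict[str, str]) -> str:
    """Return the static type ("int" or "bool") of e, or raise SemanticError."""
    if isinstance(e, IntLit):
        return "int"
    if isinstance(e, BoolLit):
        return "bool"
    if isinstance(e, Var):
        if e.name not in env:  # symbol-table lookup
            raise SemanticError(f"undeclared variable '{e.name}'")
        return env[e.name]
    if isinstance(e, Add):
        lt, rt = check(e.left, env), check(e.right, env)
        if lt != "int" or rt != "int":
            raise SemanticError(f"'+' expects int operands, got {lt} and {rt}")
        return "int"
    if isinstance(e, If):
        if check(e.cond, env) != "bool":
            raise SemanticError("condition of 'if' must be bool")
        t1, t2 = check(e.then, env), check(e.orelse, env)
        if t1 != t2:
            raise SemanticError(f"'if' branches differ: {t1} vs {t2}")
        return t1
    raise SemanticError(f"unknown node {e!r}")


env = {"x": "int", "flag": "bool"}  # declared names and their types
print(check(If(Var("flag"), Add(Var("x"), IntLit(1)), IntLit(0)), env))  # int
try:
    check(Add(Var("x"), BoolLit(True)), env)
except SemanticError as err:
    print(err)  # '+' expects int operands, got int and bool
```

Both example programs parse fine; only the second is rejected, which is exactly the kind of error a parser alone cannot catch.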

Source code/language
I changed "source code" to "source language" because a compiler can be pretty much anything that translates one language into another. This includes a lot of translations whose input is not source code, such as binary-to-binary translations and just-in-time compilation processes. "Source code" in its normal meaning refers to a list of instructions in a language understandable to humans. A source language can be anything that can be captured by a formal language definition (and that is pretty much anything).

Compiled vs interpreted
The article makes a distinction between compiled and interpreted languages. While most languages are either primarily interpreted, or primarily compiled, any language that can be compiled can be interpreted, and (modulo eval statements and the like) any language that can be interpreted can be compiled. In practice, there are compilers for normally-interpreted languages such as Perl (which is compiled to bytecode anyway), LISP, and Prolog, and interpreters for such commonly-compiled languages as C and C++. -- Robert Merkel
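To make the point concrete, here is one tiny expression language given two implementations: a tree-walking interpreter, and a compiler that translates the same tree into Python source (the target language) once, ahead of time. This is a hypothetical sketch; the tuple-based tree format and the function names are invented for illustration, and it assumes variable names are valid Python identifiers and trusts its input, since the generated text is passed to exec.

```python
def interpret(node, env):
    """Evaluate the tree directly, re-inspecting every node on each run."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = interpret(node[1], env), interpret(node[2], env)
    return left + right if kind == "add" else left * right


def to_python(node) -> str:
    """Translate the tree into an equivalent Python expression."""
    kind = node[0]
    if kind == "num":
        return repr(node[1])
    if kind == "var":
        return node[1]
    op = "+" if kind == "add" else "*"
    return f"({to_python(node[1])} {op} {to_python(node[2])})"


def compile_to_function(node, params):
    """Compile once into a Python function; later calls skip the tree walk."""
    source = f"def compiled({', '.join(params)}):\n    return {to_python(node)}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["compiled"]


# x * (x + 1)
tree = ("mul", ("var", "x"), ("add", ("var", "x"), ("num", 1)))
f = compile_to_function(tree, ["x"])
print(to_python(tree))                 # (x * (x + 1))
print(interpret(tree, {"x": 6}), f(6))  # 42 42
```

Both routes give the same answers; the language does not dictate the strategy, only the implementation does.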

Definition of "compiler"
The original definition of compiler is not necessarily bound to the generation of executable/object code. The general definition in the dragon book (which would be worth a mention on its own on the compiler main page) goes roughly as "a compiler reads a program written in one language - the source language - and translates it into an equivalent program written in another language - the target language". Also, a rough outline of the history of compiler development, maybe split into "first FORTRAN compiler to dragon book" and "dragon book until now", would be nice. -- scut


 * From language compiler: a programming language compiler is an application that translates the text of a computer program written in some human-understandable programming language into the native code (machine code) of a particular physical processor, or into some virtual (software) machine code (e.g. Java compiler). --Anonymous

Need help in compiler construction
Hi, I need some help with my compiler construction course. I acquired the famous Dragon Book, but it seems very difficult to me, as all its examples are in Pascal and I have a C/C++ background. So if you know any good book or link, please share it with me. Thanks in advance, Iqbal


 * This is not a compiler discussion forum. It is for discussion of the Compiler entry of Wikipedia.


 * Oh, and the Dragon book is definitely the book you want. Learn Pascal.  It's not that hard.  -Doradus 20:15, 9 Jun 2004 (UTC)

On the other hand, Compiler construction *is* a compiler discussion forum. --DavidCary 04:00, 27 Jun 2005 (UTC)

Pascal
"The Pascal programming language is well known for this capability, and in fact many Pascal compilers are themselves written in the Pascal language because of the rigid specification of the language and the capability to use a single pass to compile Pascal language programs."

I don't understand the logic here. --Grouse 13:37, 10 Jul 2004 (UTC)


 * Nor do I. I've heard similar sentiments expressed as an article of faith in Pascal - 20 years ago. Charmingly quaint now, but quaintness is not part of the mission, and it should go away or be rephrased. Stan 19:10, 10 Jul 2004 (UTC)


 * I suspect the real reason "many Pascal compilers are themselves written in the Pascal language" is merely that Pascal is used in the Dragon book. But the authors picked Pascal for the reasons above.


 * Let me expand on what I *think* that sentence in the article is trying to say:


 * If I had to write a compiler for some given language from scratch, I wouldn't implement the compiler in Pascal. I would pick some more modern language (perhaps using Lex, AWK, Bison, etc.), something incredibly sophisticated and full-featured. However,
 * if I had to write a compiler, and I could pick what language it compiled, I'd pick something really simple and easy like assembly language or L00P. I'd still try to implement the compiler in some more modern language.
 * The idea of a "self-hosted compiler" is intellectually challenging. If a language is really great, wouldn't it make it easy to write everything in it? Even compilers? Even its own compiler? But if I get to pick the language, I would never pick some more modern language -- there's far too much stuff to implement. But I wouldn't write such a thing for L00P, either -- there are only a few things to implement, but the code is so difficult to read, write, and test. I think Pascal is close to the "sweet spot", and the Dragon book seems to agree. If I had to rank languages from "easiest to write a self-hosted compiler" to "most difficult", I would guess something like Forth, Pascal, Lisp, assembly language, BASIC, Small C ...
 * Does that make sense ? Of course, I'm far too long-winded -- is there a better way to rephrase this ?
 * --DavidCary 04:00, 27 Jun 2005 (UTC)

I would disagree with your assessment. There are two very good reasons for a language compiler to be self-hosted, and some good reasons to use Pascal or Basic for a compiler as opposed to C:
 * 1) You do not have to have two code bases, one consisting of the test packages for the compiler (in the language the compiler processes) and a second code base consisting of the compiler source itself.  You can even use the compiler as a test case for itself.
 * 2) If the compiler is open source, the users of that compiler can (theoretically) submit bug fixes and updates, since they can actually read the source code.
 * 1) Unless you're writing an assembler, or you can write the compiler's own run-time library 100% in its own language, there is going to be some assembly language used in the compiler, but usually you want this to be as small as possible. So it means that every language has some assembler in it, at least at the run-time library level (I'm not counting the assembly or object code generated by the compiler). Now, if you can find a way to do run-time startup and setup code in the target language, then it is theoretically possible not to have to use it, but I suspect sooner or later there's going to be "some assembly required." :)
 * 2) I have actually looked at the code involved in at least two self-hosted Basic compilers, and I've written a Fortran to Visual Basic translator (using Visual Basic), and I think that, if the compiler is powerful enough, it is no more difficult to write a compiler using Basic than Pascal or any other language, and probably easier than C, because Basic (and Pascal) have built-in string types which C lacks. (C's strings are an abstraction: they cannot contain embedded zero bytes, and to know the length of a string you have to count it every time; Basic and Pascal use strings with a length prefix, and you can store anything in them.) Paul Robinson (Rfc1394) 16:00, 3 April 2007 (UTC)
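The difference Paul describes between C strings and length-prefixed Basic/Pascal strings can be sketched by modelling both as raw bytes. This is an illustration under assumptions: the one-byte length header (capping strings at 255 bytes) mirrors classic Pascal short strings, while real Basic and Pascal implementations vary; c_strlen, make_pstring and p_strlen are hypothetical names.

```python
def c_strlen(buf: bytes) -> int:
    """C-style length: scan until the first zero byte (O(n) on every call)."""
    n = 0
    while buf[n] != 0:
        n += 1
    return n


def make_pstring(data: bytes) -> bytes:
    """Length-prefixed string: an assumed one-byte header holding the length,
    followed by the data itself (so at most 255 bytes)."""
    if len(data) > 255:
        raise ValueError("short string limited to 255 bytes")
    return bytes([len(data)]) + data


def p_strlen(pstr: bytes) -> int:
    """Length-prefixed length: just read the header byte (O(1))."""
    return pstr[0]


payload = b"AB\x00CD"          # five bytes, one of them a zero byte
c_version = payload + b"\x00"  # C appends a terminating NUL
p_version = make_pstring(payload)

print(c_strlen(c_version))  # 2: C "sees" only "AB"
print(p_strlen(p_version))  # 5: all five bytes survive
print(p_version[1:1 + p_strlen(p_version)] == payload)  # True
```

The trade-off is visible in the header: the prefix makes the length free to read and lets any byte be stored, at the cost of a maximum length fixed by the header size.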


 * On the other hand, perhaps my understanding is quaint and out of date. Please bring me up-to-date. Perhaps there is some *other* language that is now considered better for learning about self-hosting? Perhaps the idea of "self-hosting" is now considered unimportant?
 * --DavidCary 04:00, 27 Jun 2005 (UTC)

I mostly agree with David. Also keep in mind that in the old days, a language that was not self-hosting was considered a failure, or at best not truly a general-purpose language (since if it were general purpose, one could implement a compiler in it). Pascal is fairly nice for writing a compiler for an imperative language.

Writing compilers in assembler is insane, since the refactoring for each new iteration (e.g. with better optimizations) would kill you. I know, since I tried :-)

The *nix parser tools (the LALR parser generator Yacc, its GNU variant Bison, and the lexer generator Lex) IMHO have the problem that they don't really simplify relatively clean languages, and the quality of their error messages is often quite bad. This is one of the reasons why most commercial compilers (including GCC since either 4.0 or one of the later 3.x releases) are recursive descent.
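For comparison with generator-based parsing, here is a minimal hand-written recursive descent parser (and evaluator) for integer arithmetic: one method per grammar rule, which makes it straightforward to report exactly what was expected and where. The grammar, the names tokenize and Parser, and the error wording are assumptions for illustration; this is not GCC's code.

```python
def tokenize(text: str) -> list[tuple[str, str, int]]:
    """Split text into (kind, lexeme, column) triples, ending with an EOF token."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "0123456789":
            start = i
            while i < len(text) and text[i] in "0123456789":
                i += 1
            tokens.append(("NUM", text[start:i], start))
        elif ch in "+-*/()":
            tokens.append((ch, ch, i))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at column {i}")
    tokens.append(("EOF", "", len(text)))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0]

    def expect(self, kind: str) -> tuple[str, str, int]:
        tok = self.tokens[self.pos]
        if tok[0] != kind:
            found = "end of input" if tok[0] == "EOF" else repr(tok[1])
            raise SyntaxError(f"expected {kind!r} at column {tok[2]}, found {found}")
        self.pos += 1
        return tok

    def parse(self) -> int:
        value = self.expr()
        self.expect("EOF")  # reject trailing input such as "1 2"
        return value

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.expect(self.peek())[0]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.expect(self.peek())[0]
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self) -> int:
        if self.peek() == "NUM":
            return int(self.expect("NUM")[1])
        self.expect("(")
        value = self.expr()
        self.expect(")")
        return value


print(Parser("2 * (3 + 4)").parse())  # 14
print(Parser("10 - 2 - 3").parse())   # 5 (left-associative)
try:
    Parser("2 * (3 + 4").parse()
except SyntaxError as err:
    print(err)  # expected ')' at column 10, found end of input
```

Because each rule is an ordinary function, the error message can name the exact token the rule was waiting for, which is harder to arrange in a table-driven LALR parser.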

I'm not an OOP purist, but OOP (read: basic inheritance) does help when writing a compiler, so on that count the original Pascal is disqualified, and I wouldn't use it today. Modernised dialects like Delphi or Free Pascal are quite OK, though. (Unsigned comment)


 * If you want to check out this link, what you'll find is a 350K, 12,537-line listing of the source to a Pascal compiler for the PDP-10. It's not OOP, and it is self-compiling. Probably something for people who are very masochistic. :) Speaking of masochism, Niklaus Wirth reported that (and if you'd ever written programs on a CDC mainframe, as I have, you'd understand why he did) the original Pascal compiler was written in Fortran. Paul Robinson (Rfc1394) 04:45, 26 March 2007 (UTC)

Self-hosted vs. not being self-hosted is a lot like making a choice between a motorcycle, a John Deere tractor, or a pickup truck. If you need great gas mileage and want to be able to slip into spots with little space, grab a motorcycle. If you're wanting to plow your field, I wouldn't try to hook up a plow to a motorcycle! And neither a motorcycle nor a tractor is really good for carrying all your furniture to a new house, unless you have a trailer attached to it.... Not to mention, a tractor's not as good on dates with most women.... Likewise, if you're developing an open-source compiler whose goal is to allow others to modify it, then by far self-hosted is the way to go! If, on the other hand, you're wanting to sell your compiler, and wish to gain an edge by having your own optimized routines, or routines that perform tasks that no other compilers can do, then it makes no sense to demand that the compiler be able to compile itself. --Johng1970 (talk) 22:17, 9 June 2008 (UTC)

If I remember right, Wirth originally set out to write Pascal in Fortran, but the attempt failed. So instead he (and others) wrote the Pascal compiler in Pascal itself. --anonymous 16 Jan 2008 —Preceding unsigned comment added by 68.222.42.172 (talk) 23:53, 16 January 2009 (UTC)

Weird, weird, weird
Normal humans, as opposed to persons knowledgeable in computer science, would expect an article titled compilation to be about what that word meant until last week. But those who have studied computer science (most of them, anyway) deny as a matter of religious dogma that the world existed before last week. Michael Hardy 20:45, 19 Jul 2004 (UTC)


 * ??? Stan 01:00, 20 Jul 2004 (UTC)


 * Um... there is no article named compilation. When I try to find one, I only hit a redirect to compiler.
 * What do you really want ? --DavidCary 06:00, 26 Jul 2004 (UTC)


 * Michael, if you're unhappy with the redirect and have better ideas for it, disambiguate and break the redirect yourself. I don't think there's much room for something other than a dictionary definition, if I understand you correctly... Dysprosia 07:24, 26 Jul 2004 (UTC)

For the record compilers have been around since at least the 1960s.


 * It would be nice to have a discussion of the origins of the words "compiler" and "to compile" as it applies to computers. Who used the term first? Before there were computers these words were in use, usually associated with "compile a list". The first computer programmers were very literate folks. While defining this new field they made very creative use of the English language. --- And by the way, for the record compilers have been around since at least the late 1950's.  --rchrd 01:50, 11 July 2006 (UTC)

Aren't compilers system software?

 * Until quite recently, I lived under the impression that assemblers, compilers, linkers, etc., belonged to the category of system software. But the Wikipedia category of the same name seems to indicate otherwise (only OSs, drivers, and related SW are included). Could someone enlighten me on this matter? I just haven't been used to thinking of compilers etc. as applications (implying that apps are for -eh- "end users", as opposed to programmers/hackers). At the university where I got my CEng degree, we even had a course called System programming which specifically addressed compiler writing and connected topics (no wonder I'm confused...). --Wernher 23:36, 24 Jul 2004 (UTC)


 * (What the &^%$! was that last edit all about? I think some of the discussion material got deleted) Anyway, yes, it's usual to put compilers in system software if you're only allowed to have app and system software categories; otherwise you'd finesse the question by putting them in programming tools and not asking too many questions. :-) Stan 03:32, 25 Jul 2004 (UTC)


 * At the top level, Software is divided into only two categories: system and application. All other categories are subcategories of these two. Kenny 10:11, 2004 Aug 8 (UTC)


 * Yep, no problem. I did. BTW, I've made a major effort today/night in (sub)categorizing all the stuff in Category:Software (too much was lying around in the Software 'root directory'; in my opinion the subcategories should at least be used if we create them in the first place...). I hope my work led to a more logical and easily navigable sw information collection than before. --Wernher 03:30, 26 Jul 2004 (UTC)


 * I'd disagree strongly that compilers should be counted as system software. I would expect that category to contain the software that keeps the 'system' running, ie. mainly the OS functionality. A compiler, while creating code which may perform that function, is not necessary in any way to operate a 'system' (although it may make system calls during operation, etc). --[[User:VampWillow|VampWillow]] 21:44, 7 Aug 2004 (UTC)


 * Excuse me, IMHO, VampWillow is not acquainted enough with CS. In CS, compilers etc. are under system software. Kenny 10:11, 2004 Aug 8 (UTC)


 * Compilers can be system software (typical case:Unix). However they don't need to be. E.g. something like Delphi doesn't have any direct connection to the OS builders. I'd rather put them under apps myself too. Kenny, could you produce a reference for your argumentation? That is more useful than yes/no fights.  (unsigned entry)


 * I think I can add some light (and less heat) to this discussion by adding the following. Originally, when large computers were sold, they came with a base set of system software to be able to use them, including an assembler, Fortran, Cobol, Basic (interpreted or compiled or both) and RPG III compilers, or you bought them as add-ons from the manufacturer of your computer and operating system.  At that time you could classify compilers as system software because they were either sold as a package as part of the operating system or were sold to sophisticated, technical users.
 * Compilers help convert high-level programs developed in languages like C, C++ and Basic to lower-level programs expressed in assembly or machine language for a specific processor. As such, computer or microprocessor hardware is completely useless without a compiler that can convert significant amounts of code to that computer's instruction set architecture. A modern-day computer has one or more physical processors and several virtual processors or virtual machines. A Java Virtual Machine (JVM) is one example of a virtual processor. A JIT compiler can, and does, compile the bytecode to the underlying machine's ISA at run time. Often, compiler designers work closely with processor architects even before a processor is developed, at the architecture and design stage. The ability to purchase something at a specific location has nothing to do with the category for that thing. A compiler often produces object code in a standard format like COFF or ELF, which is then loaded by the operating system. A compiler often links standard run-time libraries into executables. In languages like C++, the working of something like exception handling is a complex puzzle, pieces of which are handled at compile time and others at run time, without the programmer's knowledge. Just as an operating system increases the abstraction of the system services available to a user (programmer), a compiler increases the abstraction of the system processing architecture available to a programmer. In fact, a compiler is more fundamental system software than the operating system itself, because most of the time the operating system developers depend on the capabilities of the compiler. There are interesting ways in which compilers are bootstrapped for new processors that don't yet have an operating system. With that argument, I rest my case that compilers are, in theory and practice, the core part of system software.
It is time someone proficient in the ways of Wikipedia made the effort to revisit this categorization.


 * Today, compilers are sold over the counter in retail stores - or downloadable for free - by third parties who make them and have nothing to do with the computer manufacturer or operating system vendor and in some cases are available for multiple operating systems (Free Pascal, Free Basic and Java, among others), and thus would no longer qualify as "system software." Paul Robinson (Rfc1394) 19:29, 26 March 2007 (UTC)


 * In the past there have been plenty of examples of operating systems which could be implemented without using a compiler (e.g. any operating system written in assembly language); but a cross-compiler running on some other architecture can make operating system implementation much easier. At the extreme you don't even need an assembler. On one occasion, just for the experience, I developed a very small operating system by writing it in assembler using absolute addresses for variables etc., hand-translating it to binary, writing out the 0's and 1's, splitting it into 5-bit chunks, very carefully punching the equivalent characters onto 5-hole paper tape, and feeding the resulting tape into the built-in bootstrap loader; somewhat to my surprise it actually worked first time. Murray Langton (talk) 09:19, 16 June 2008 (UTC)

n-pass vs "n-parse"
Hmmm, how is it that the Dragon Book uses pass rather than parse in the relevant context? I would like to believe that Aho, Sethi, and Ullman knew what they were writing about. Strange, or what, VampWillow? --Wernher 21:15, 7 Aug 2004 (UTC)


 * Ah well, these 'Johnny come lately's do get things wrong occasionally, but as I was taught how to write compilers (and the rest) back in the mid-70s, and *the* book on the subject was the 1971 Compiler Construction for Digital Computers by David Gries, which uses 'parse', I think I rest my case!
 * It is, however, a very common mistake: as newer compilers repeated their transit through the source code, rather than being well-written and going through it just once, the original usage (parsing the lexical strings to create the output streams/code/etc.) and the original spelling got somewhat lost (and some youngsters like to think they invented computer languages! ;-) --[[User:VampWillow|VampWillow]] 21:40, 7 Aug 2004 (UTC)


 * May be somewere it is misstake, but not here. Kenny 10:19, 2004 Aug 8 (UTC)


 * Um, given that you are unable to spell 'mistake' and acknowledge on your userpage that "I am not fluent in english, so please feel free to edit my comments in wikipedia", I think it was rather pre-emptive of you to revert the correction made, especially while discussion is ongoing. As such I have removed the reversion. --[[User:VampWillow|VampWillow]] 11:23, 8 Aug 2004 (UTC)

OK kids, let's hold off on the edit war and finish the discussion please. I think the use of "multi-parse" must be something that died relatively quickly, because I started learning about compilers in 1975 and don't remember hearing of that usage (I thought I had a copy of Gries, but can't seem to find it). I suspect that "pass" came to be favored because a step that processes intermediate code into better intermediate code is not really "parsing" anymore, plus preprocessing and assembly steps aren't really doing parsing either. Since our usual rule is to use the most common term, I think we should use "pass", but at the same time include a note that some sources use "parse" to mean the same thing (since there's confusion about this, the WP article can be valuable in clearing it up). Stan 13:53, 8 Aug 2004 (UTC)


 * I have now entered a comprehensive note about the pass/parse usage, with which I hope we may reach a consensus regarding this question. I find the discussion very interesting, as I had in fact not thought of pass as emanating from parse before (quite plausible using English pronunciation). Whether this is the case for all use of pass in compilers, or whether pass has also been used independently from the beginning, I think is an open question. Also allow me to admit that I should perhaps have added txt tags around my starting comment of this thread. I did not intend to be 'evil'/sarcastic as such. --Wernher 00:37, 9 Aug 2004 (UTC)


 * "parse" is the source of the derivation "pass", as to parse something is to work through it, and the first parse would grab syntactic elements, the second parse build tables, etc., as each parse would re-inspect the data. When you try to think carefully about it you can't really make the word "pass" do the same function in an active sense. I suspect there could even be a language (en-us/en-uk) issue in here too, maybe? 'Object code generation' btw is done during a parse of the stored data (and again 'pass' doesn't really make sense!). PS. Nothing wrong with being evil occasionally! --[[User:VampWillow|VampWillow]] 10:24, 9 Aug 2004 (UTC)


 * "Parse" and "pass" have nothing to do with each other etymologically, so saith OED. I got an opportunity to look at Gries (via a colleague at Apple even older than I am :-) ) last week and not only does Gries also use "pass" in the "multiple passes" sense, but there is no reference to "multiple parses" that I could find. So I think we need to get a book and page number that justifies this alternate sense of "parse" - don't want this article to become the source of a lemming-suicide-type legend! Stan 21:14, 15 Aug 2004 (UTC)


 * Coming late to this discussion, but I also find the footnote to be fairly tangential and a bit unlikely. I've never heard of parse and pass being conflated; they seem very distinct to me. To make things more confusing, the Dragon Book (Aho, Sethi, and Ullman) does not refer to either multi-pass or multi-parse. It defines the former, but then discards it as being too narrow for their purposes:
 * "Several phases of compilation are usually implemented in a single pass consisting of reading an input file and writing an output file. In practice, there is great variation in the way the phases of a compiler are grouped into passes, so we prefer to organize our discussion of compiling around phases rather than passes." -- Compilers: Principles, Techniques, and Tools, 1988 edition, page 20.
 * Here, a pass is primarily a means of counting the number of times data is read and written -- it is mainly a way to count the expensive file I/O loops. The dragon book uses "phases" where this wikipedia article uses "passes". Given the importance of this book in the general CS populace, it might even make sense to add a footnote about "phases", to replace the one on "parses"! But anyway, unless someone has references to a book that uses "multi-parse", I'm going to say that the footnote is unnecessary. And since Stan asked for a reference years ago, and it didn't materialize, it seems safe to delete this footnote now. So I did! If that's inappropriate, you can revert it and yell at me. Captain Wingo 11:37, 5 December 2006 (UTC) [Edit - changed "1998 edition" to "1988 edition" :-)]
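The distinction quoted from the Dragon Book, phases as logical stages versus passes as read-throughs of the data, can be illustrated with a toy pipeline. This is a hypothetical sketch: the three phases, the PUSH output and the trace format are invented for illustration, and Python lists stand in for the intermediate files a real multi-pass compiler would write.

```python
def lex(text: str, trace: list[str]):
    """Phase 1 (lexical analysis): characters -> tokens."""
    for word in text.split():
        trace.append(f"lex {word}")
        yield word


def analyse(tokens, trace: list[str]):
    """Phase 2 (a stand-in for semantic analysis): tokens -> checked integers."""
    for tok in tokens:
        trace.append(f"check {tok}")
        yield int(tok)  # raises ValueError for a non-integer token


def codegen(values, trace: list[str]):
    """Phase 3 (code generation): integers -> hypothetical PUSH instructions."""
    for v in values:
        trace.append(f"emit {v}")
        yield f"PUSH {v}"


def one_pass(text: str, trace: list[str]) -> list[str]:
    # Generators are lazy, so each token flows through all three phases
    # before the next token is read: three phases grouped into one pass.
    return list(codegen(analyse(lex(text, trace), trace), trace))


def three_passes(text: str, trace: list[str]) -> list[str]:
    # Each phase reads its whole input and writes a complete intermediate
    # result (a list standing in for an intermediate file).
    tokens = list(lex(text, trace))
    values = list(analyse(tokens, trace))
    return list(codegen(values, trace))


t1, t3 = [], []
print(one_pass("1 2", t1), t1)
# ['PUSH 1', 'PUSH 2'] ['lex 1', 'check 1', 'emit 1', 'lex 2', 'check 2', 'emit 2']
print(three_passes("1 2", t3), t3)
# ['PUSH 1', 'PUSH 2'] ['lex 1', 'lex 2', 'check 1', 'check 2', 'emit 1', 'emit 2']
```

The phases and the output are identical in both versions; only the grouping into passes, visible in the trace, differs.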

There is a difference; I've never heard the word "parse" used in the context of multiple repetition of source-code processing by a compiler; the term has always been "pass", e.g. a 1-pass compiler, a 2-pass compiler, etc. Now, on each pass the compiler may - and probably does - parse the source, but I've never heard of a 1-parse or 2-parse reference. I mean, I heard the first Fortran compiler had to squeeze into such a small area that it was something like over 50 passes! Paul Robinson (Rfc1394) 19:35, 26 March 2007 (UTC)

threaded and incremental: the same ?
The article currently mentions


 * threaded code compiler (or interpreter)
 * incremental compiler

but I'm still a bit fuzzy on the difference between them. I think I understand threaded code, but incremental compiler is simply a redirect to compiler, which has no more information.

What's the difference ?

--DavidCary 01:57, 6 Nov 2004 (UTC)

A threaded compiler can compile parts in parallel, or parse and do code generation/optimization (for previously compiled parts) in parallel. The former is more likely.

An incremental compiler only (re)compiles the needed parts of a source. One could debate the granularity of the minimal part that must be recompiled (a procedure or a whole compilation unit).
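A minimal sketch of the incremental idea at compilation-unit granularity follows, assuming content hashes (SHA-256 via Python's hashlib) decide what changed. IncrementalBuilder and compile_unit are hypothetical names, and the "compiler" passed in is a stand-in; a real incremental compiler would also have to track dependencies between units (e.g. recompiling users of a changed interface), which this sketch deliberately ignores.

```python
import hashlib
from typing import Callable


def fingerprint(source: str) -> str:
    """Content hash used to detect whether a unit's source changed."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class IncrementalBuilder:
    """Recompiles a unit only when its source text changed since the last build."""

    def __init__(self, compile_unit: Callable[[str], str]):
        self.compile_unit = compile_unit  # source text -> "object code"
        self.cache: dict[str, tuple[str, str]] = {}  # name -> (hash, object code)

    def build(self, units: dict[str, str]) -> list[str]:
        """Bring the cache up to date; return the names that were recompiled."""
        recompiled = []
        for name, source in units.items():
            fp = fingerprint(source)
            cached = self.cache.get(name)
            if cached is None or cached[0] != fp:
                self.cache[name] = (fp, self.compile_unit(source))
                recompiled.append(name)
        # Forget units that no longer exist.
        for name in set(self.cache) - set(units):
            del self.cache[name]
        return recompiled


# A stand-in "compiler" that just upper-cases the source text.
builder = IncrementalBuilder(lambda source: source.upper())
print(builder.build({"main.pas": "begin end.", "util.pas": "x := 1;"}))  # ['main.pas', 'util.pas']
print(builder.build({"main.pas": "begin end.", "util.pas": "x := 2;"}))  # ['util.pas']
print(builder.build({"main.pas": "begin end.", "util.pas": "x := 2;"}))  # []
```

Choosing a finer unit (e.g. per procedure) would only change what the keys of the cache represent; the change-detection logic stays the same.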

These two things are IMHO not related.