compiler-construction - your - writing a compiler in assembly




Learning to write a compiler (20)

Preferred languages : C/C++, Java, and Ruby.

I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.


"... Let's Build a Compiler ..."

I'd second http://compilers.iecc.com/crenshaw/ by @sasb . Forget buying more books for the moment.

Why? Tools & language.

The language required is Pascal and if I remember correctly is based on Turbo-Pascal. It just so happens if you go to http://www.freepascal.org/ and download the Pascal compiler all the examples work straight from the page ~ http://www.freepascal.org/download.var The beaut thing about Free Pascal is you can use it almost whatever processor or OS you can care for.

Once you have mastered the lessons then try the more advanced " Dragon Book " ~ http://en.wikipedia.org/wiki/Dragon_book


Big List of Resources:

Legend:

  • ¶ Link to a PDF file
  • $ Link to a printed book

An easy way to create a compiler is to use bison and flex (or similar), build a tree (AST) and generate code in C. With generating C code being the most important step. By generating C code, your language will automatically work on all platforms that have a C compiler.

Generating C code is as easy as generating HTML (just use print, or equivalent), which in turn is much easier than writing a C parser or HTML parser.


From the comp.compilers FAQ :

"Programming a Personal Computer" by Per Brinch Hansen Prentice-Hall 1982 ISBN 0-13-730283-5

This unfortunately-titled book explains the design and creation of a single-user programming environment for micros, using a Pascal-like language called Edison. The author presents all source code and explanations for the step-by-step implementation of an Edison compiler and simple supporting operating system, all written in Edison itself (except for a small supporting kernel written in a symbolic assembler for PDP 11/23; the complete source can also be ordered for the IBM PC).

The most interesting things about this book are: 1) its ability to demonstrate how to create a complete, self-contained, self-maintaining, useful compiler and operating system, and 2) the interesting discussion of language design and specification problems and trade-offs in Chapter 2.

"Brinch Hansen on Pascal Compilers" by Per Brinch Hansen Prentice-Hall 1985 ISBN 0-13-083098-4

Another light-on-theory heavy-on-pragmatics here's-how-to-code-it book. The author presents the design, implementation, and complete source code for a compiler and p-code interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and integer types (but no characters, reals, subranged or enumerated types), constant and variable definitions and array and record types (but no packed, variant, set, pointer, nameless, renamed, or file types), expressions, assignment statements, nested procedure definitions with value and variable parameters, if statements, while statements, and begin-end blocks (but no function definitions, procedural parameters, goto statements and labels, case statements, repeat statements, for statements, and with statements).

The compiler and interpreter are written in Pascal* (Pascal "star"), a Pascal subset extended with some Edison-style features for creating software development systems. A Pascal* compiler for the IBM PC is sold by the author, but it's easy to port the book's Pascal- compiler to any convenient Pascal platform.

This book makes the design and implementation of a compiler look easy. I particularly like the way the author is concerned with quality, reliability, and testing. The compiler and interpreter can easily be used as the basis for a more involved language or compiler project, especially if you're pressed to quickly get something up and running.


I am looking into the same concept, and found this promising article by Joel Pobar,

Create a Language Compiler for the .NET Framework - not sure where this has gone

Create a Language Compiler for the .NET Framework - pdf copy of the original doc

he discusses a high level concept of a compiler and proceeds to invent his own langauge for the .Net framework. Although its aimed at the .Net Framework, many of the concepts should be able to be reproduced. The Article covers:

  1. Langauge definition
  2. Scanner
  3. Parser (the bit im mainly interested in)
  4. Targeting the .Net Framework The
  5. Code Generator

there are other topics, but you get the just.

Its aimed to people starting out, written in C# (not quite Java)

HTH

bones


I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.


I liked the Crenshaw tutorial too, because it makes it absolutely clear that a compiler is just another program that reads some input and writes some out put.

Read it.

Work it if you want, but then look at another reference on how bigger and more complete compilers are really written.

And read On Trusting Trust , to get a clue about the unobvious things that can be done in this domain.


I remember asking this question about seven years ago when I was rather new to programming.

I was very careful when I asked and surprisingly I didn't get as much criticism as you are getting here. They did however point me in the direction of the " Dragon Book " which is in my opinion, a really great book that explains everything you need to know to write a compiler (you will of course have to master a language or two. The more languages you know, the merrier.).

And yes, many people say reading that book is crazy and you won't learn anything from it, but I disagree completely with that.

Many people also say that writing compilers is stupid and pointless. Well, there are a number of reasons why compiler development are useful:

  • Because it's fun.
  • It's educational, when learning how to write compilers you will learn a lot about computer science and other techniques that are useful when writing other applications.
  • If nobody wrote compilers the existing languages wouldn't get any better.

I didn't write my own compiler right away, but after asking I knew where to start. And now, after learning many different languages and reading the Dragon Book, writing isn't that much of a problem. (I'm also studying computer engineering atm, but most of what I know about programming is self taught.)

In conclusion, The Dragon Book is a great "tutorial". But spend some time mastering a language or two before attempting to write a compiler. Don't expect to be a compiler guru within the next decade or so though.

The book is also good if you want to learn how to write parsers/interpreters.


If you are interested in writing a compiler for a functional language (rather than a procedural one) Simon Peyton-Jones and David Lester's " Implementing functional languages: a tutorial " is an excellent guide.

The conceptual basics of how functional evaluation works is guided by examples in a simple but powerful functional language called "Core". Additionally, each part of the Core language compiler is explained with code examples in Miranda (a pure functional language very similar to Haskell).

Several different types of compilers are described but even if you only follow the so-called template compiler for Core you will have an excellent understanding of what makes functional programming tick.


If you have little time, I recommend ethoberon.ethz.ch/WirthPubl/CBEAll.pdf , a tiny little booklet that you can read in a day, but it explains the basics (including how to implement lexers, recursive descent parsers, and your own stack-based virtual machines). After that, if you want a deep dive, there's no way around the Dragon book as other commenters suggest.


If you're willing to use LLVM, check this out: http://llvm.org/docs/tutorial/ . It teaches you how to write a compiler from scratch using LLVM's framework, and doesn't assume you have any knowledge about the subject.

The tutorial suggest you write your own parser and lexer etc, but I advise you to look into bison and flex once you get the idea. They make life so much easier.


Not a book, but a technical paper and an enormously fun learning experience if you want to know more about compilers (and metacompilers)... This website walks you through building a completely self-contained compiler system that can compile itself and other languages:

Tutorial: Metacompilers Part 1

This is all based on an amazing little 10-page technical paper:

Val Schorre META II: A Syntax-Oriented Compiler Writing Language

from honest-to-god 1964. I learned how to build compilers from this back in 1970. There's a mind-blowing moment when you finally grok how the compiler can regenerate itself....

I know the website author from my college days, but I have nothing to do with the website.


One book not yet suggested but very important is "Linkers and Loaders" by John Levine. If you're not using an external assembler, you'll need a way to output a object file that can be linked into your final program. Even if you're using an external assembler, you'll probably need to understand relocations and how the whole program loading process works to make a working tool. This book collects a lot of the random lore around this process for various systems, including Win32 and Linux.


Python comes bundled with a python compiler written in Python. You can see the source code, and it includes all phases, from parsing, abstract syntax tree, emitting code, etc. Hack it.



The LCC compiler ( wikipedia ) ( project homepage ) of Fraser and Hanson is described in their book "A Retargetable C Compiler: Design and Implementation". It is quite readable and explains the whole compiler, down to code generation.


This is a pretty vague question, I think; just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however; a top-half and a bottom-one. The top-half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and go to generating some "code" (that intermediate representation I mentioned). So it will take your source program and convert it to another representation and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the the bottom half will take your intermediate representation and generate a PE executable.

Some books on this topic that I found particularly helpful was Compilers Principles and Techniques (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers Context-Free Grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called " lex and yacc " picked up where the Dragon Book left off for this part.


You can use BCEL by the Apache Software Foundation. With this tool you can generate assembler-like code, but it's Java with the BCEL API. You can learn how you can generate intermediate language code (in this case byte code).

Simple example

  1. Create a Java class with this function:

    public String maxAsString(int a, int b) {
        if (a > b) {
            return Integer.valueOf(a).toString();
        } else if (a < b) {
            return Integer.valueOf(b).toString();
        } else {
            return "equals";
        }
    }
    

Now run BCELifier with this class

BCELifier bcelifier = new BCELifier("MyClass", System.out);
bcelifier.start();

You can see the result on the console for the whole class (how to build byte code MyClass.java). The code for the function is this:

private void createMethod_1() {
  InstructionList il = new InstructionList();
  MethodGen method = new MethodGen(ACC_PUBLIC, Type.STRING, new Type[] { Type.INT, Type.INT }, new String[] { "arg0", "arg1" }, "maxAsString", "MyClass", il, _cp);

  il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load first parameter to address 1
  il.append(InstructionFactory.createLoad(Type.INT, 2)); // Load second parameter to adress 2
    BranchInstruction if_icmple_2 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPLE, null); // Do if condition (compare a > b)
  il.append(if_icmple_2);
  il.append(InstructionFactory.createLoad(Type.INT, 1)); // Load value from address 1 into the stack
  il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
  il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  InstructionHandle ih_13 = il.append(InstructionFactory.createLoad(Type.INT, 1));
  il.append(InstructionFactory.createLoad(Type.INT, 2));
    BranchInstruction if_icmpge_15 = InstructionFactory.createBranchInstruction(Constants.IF_ICMPGE, null); // Do if condition (compare a < b)
  il.append(if_icmpge_15);
  il.append(InstructionFactory.createLoad(Type.INT, 2));
  il.append(_factory.createInvoke("java.lang.Integer", "valueOf", new ObjectType("java.lang.Integer"), new Type[] { Type.INT }, Constants.INVOKESTATIC));
  il.append(_factory.createInvoke("java.lang.Integer", "toString", Type.STRING, Type.NO_ARGS, Constants.INVOKEVIRTUAL));
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  InstructionHandle ih_26 = il.append(new PUSH(_cp, "equals")); // Return "equals" string
  il.append(InstructionFactory.createReturn(Type.OBJECT));
  if_icmple_2.setTarget(ih_13);
  if_icmpge_15.setTarget(ih_26);
  method.setMaxStack();
  method.setMaxLocals();
  _cg.addMethod(method.getMethod());
  il.dispose();
}

You should check out Darius Bacon's " ichbins ", which is a compiler for a small Lisp dialect, targeting C, in just over 6 pages of code. The advantage it has over most toy compilers is that the language is complete enough that the compiler is written in it. (The tarball also includes an interpreter to bootstrap the thing.)

There's more stuff about what I found useful in learning to write a compiler on my Ur-Scheme web page.


"Let's Build a Compiler" is awesome, but it's a bit outdated. (I'm not saying it makes it even a little bit less valid.)

Or check out SLANG . This is similar to "Let's Build a Compiler" but is a much better resource especially for beginners. This comes with a pdf tutorial which takes a 7 step approach at teaching you a compiler. Adding the quora link as it have the links to all the various ports of SLANG, in C++, Java and JS, also interpreters in python and java, originally written using C# and the .NET platform.





language-agnostic