CPTR 415 Compiler Construction
The Decaf Programming Language


  1. Introduction
  2. Decaf Overview
  3. Notation
  4. Decaf Syntax
  5. Lexical Conventions
  6. Types
  7. Classes
  8. Namespaces
  9. Statements
  10. Expressions
  11. Arrays
  12. Miscellaneous

  1. Introduction
  2. During this course you will implement a compiler for the Decaf programming language. Decaf is an adaptation of JavaTM. (The description of Decaf presented here is actually an adaptation of the original Decaf language specification devised by Brad Vander Zanden.) Decaf presents a number of learning opportunities:

    1. Decaf is an imperative language. The experience you gain in developing a Decaf compiler will extend to other imperative languages such as C, C++, Pascal, and FORTRAN.

    2. Decaf is object-oriented. You will learn some of rudiments of compiler technology for object-oriented languages.

    3. Usefulness. If you are unfamiliar with the Java programming language, programming in Decaf provides an opportunity to get the feel for programming in a Java-like language.

    Student compilers will be written in the C++ programming language and will take advantage of the Lex (Flex) and Yacc (Bison) compiler tools. Your compiler must be able to be built with GNU C++ (G++) and run on the cpp.cs.southern.edu programming host (running Linux). You are free to develop on another platform, but you should check to make sure it works on cpp before submitting it.

    Your compiler will generate MIPS assembly code that will run on the SPIM simulator.

    If you have your own PC, all the tools necessary for writing the Decaf compiler are available for free (Linux, GNU C++, SPIM).

  3. Decaf Overview
  4. Decaf is a strongly typed language that supports the following features:

    1. Primitive data types: int (32-bit signed integer) and void.
    2. Structured types: Classes and one-dimensional arrays.
    3. Assignment.
    4. Conditionals: if/else.
    5. Iteration: while.
    6. Methods and method calls.
    7. Arithmetic: +, -, *, /, %.
    8. Relational operators: <, >, <=, >=, ==, !=.
    9. Boolean operators: &&, ||, !.
    10. Integer input/output: print, read.

    In order to make the compiler construction process tractable for a semester course, many important features of Java are omitted from Decaf. These include inheritance, exceptions, interfaces, static members, constructors, multidimensional arrays, threads, other conditional and iteration statements (like switch and for), access modifiers (public, private, protected), garbage collection, and package (library) support.

    Here is a sample Decaf program that computes the greatest common divisor of two numbers entered by the user (the gcd method implements Euclid's algorithm):

    	    class Example {
                    void main() {
                        int x; int y;
                        x = read();
                        y = read();
                        print(gcd(x, y));
                    }
                    int gcd(int a, int b) {
                        if ( b == 0 )
                            return a;
                        else
                            return gcd(b, a % b);
                    }
                }
             

    The following Decaf program creates a linked list:

                
    	    class ListNode {
                    int data;
                    ListNode next;
                    void setData(int x) {
                        data = x;
                    }
                }
                class List {
                    ListNode head;
                    void show(ListNode list) {
                        ListNode cursor;
                        int count;
                        count = 0;
                        cursor = list;
                        while ( cursor != null ) {
                            count = count + 1;
                            print(count, cursor.data);
                            cursor = cursor.next;
                        }
                    }
                    void main() {
                        ListNode second;
                        ListNode third;
                        head = new ListNode();
                        head.setData(4);  //  Could use "head.data = 4;"
                        print(head.data);
                        second = new ListNode();
                        second.setData(5);
                        print(second.data);
                        third = new ListNode();
                        third.setData(11);
                        print(third.data);
                        head.next = second;
                        second.next = third;
                        third.next = null;
                        show(head);
                    }
                }
             

  5. Notation
  6. The following notational conventions are used throughout this document:

    1. Nonterminal symbols are denoted by all uppercase italics; for example, CLASS.
    2. Terminal symbols, operator symbols, and keywords are denoted by boldface; for example, identifier.
    3. < ... >* means zero or more occurrences of the item within the angle braces; for example, a* means zero or more occurrences of a.
    4. < ... >+ means one or more occurrences of the item within the angle braces; for example, a+ means one or more occurrences of a.
    5. The empty string is denoted by epsilon.

  7. Decaf Syntax
    1. Decaf Grammar
    2. 1.

      PROGRAM

      -->

      CLASS_DECLARATION+

      2.

      CLASS_DECLARATION

      -->

      class identifier CLASS_BODY

      3.

      CLASS_BODY

      -->

      { VAR_DECLARATION* METHOD_DECLARATION* }

      4.

      VAR_DECLARATION

      -->

      TYPE identifier ;

      5.

      TYPE

      -->

      SIMPLE_TYPE

      |

      SIMPLE_TYPE[]

      6.

      SIMPLE_TYPE

      -->

      int

      |

      identifier

      7.

      METHOD_DECLARATION

      -->

      RESULT_TYPE identifier ( PARAMETER_LIST ) METHOD_BODY

      8.

      RESULT_TYPE

      -->

      TYPE

      |

      void

      9.

      PARAMETER_LIST

      -->

      epsilon

      |

      PARAMETER < , PARAMETER >*

      10.

      PARAMETER

      -->

      TYPE identifier

      11.

      METHOD_BODY

      -->

      { LOCAL_VAR_DECLARATION* SIMPLE_STATEMENT* }

      12.

      LOCAL_VAR_DECLARATION

      -->

      TYPE identifier ;

      13.

      STATEMENT

      -->

      SIMPLE_STATEMENT

      |

      { SIMPLE_STATEMENT+ }

      14.

      SIMPLE_STATEMENT

      -->

      ;

      |

      NAME = EXPRESSION ;

      |

      NAME ( ARG_LIST ) ;

      |

      print ( ARG_LIST ) ;

      |

      CONDITIONAL_STATEMENT

      |

      while ( EXPRESSION ) STATEMENT

      |

      return OPTIONAL_EXPRESSION ;

      15.

      NAME

      -->

      VARIABLE_ACCESS


      |

      VARIABLE_ACCESS.FIELD_ACCESS

      16.

      VARIABLE_ACCESS

      -->

      this

      |

      identifier

      |

      identifier [EXPRESSION]

      17.

      FIELD_ACCESS

      -->

      identifier



      |

      identifier [EXPRESSION]

      18.

      ARG_LIST

      -->

      epsilon

      |

      EXPRESSION < , EXPRESSION >*

      19.

      CONDITIONAL_STATEMENT

      -->

      if ( EXPRESSION ) STATEMENT

      |

      if ( EXPRESSION ) STATEMENT else STATEMENT

      20.

      OPTIONAL_EXPRESSION

      -->

      epsilon

      |

      EXPRESSION

      21.

      EXPRESSION

      -->

      NAME

      |

      number

      |

      null

      |

      NAME ( ARG_LIST )

      |

      read ()

      |

      NEW_EXPRESSION

      |

      UNARY_OP EXPRESSION

      |

      EXPRESSION RELATION_OP EXPRESSION

      |

      EXPRESSION SUM_OP EXPRESSION

      |

      EXPRESSION PRODUCT_OP EXPRESSION

      |

      ( EXPRESSION )

      22.

      NEW_EXPRESSION

      -->

      new identifier ()

      |

      new int [ EXPRESSION ]

      |

      new identifier [ EXPRESSION ]

      23.

      UNARY_OP

      -->

      +

      |

      -

      |

      !

      24.

      RELATION_OP

      -->

      ==

      |

      !=

      |

      <=

      |

      >=

      |

      <

      |

      >

      25.

      SUM_OP

      -->

      +

      |

      -

      |

      ||

      26.

      PRODUCT_OP

      -->

      *



      |

      /

      |

      %

      |

      &&

    3. Operator Precedence
    4. Operator precedence is as follows (from highest to lowest):

      1. UNARY_OP
      2. PRODUCT_OP
      3. SUM_OP
      4. RELATION_OP

      Within each group of operators each operator has the same precedence; for example, +, -, and || have the same precedence.

    5. Operator Associativity
    6. All operators associate left-to-right; for example, a + b + c is interpreted as (a + b) + c.

  8. Lexical Conventions
  9. The Decaf language observes the following lexical conventions:

    1. Comments begin with the characters //. All text after the // until the end of the line is ignored by the compiler.

    2. Identifers are unlimited sequences of letters, numbers, and the underscore _ character. They must begin with a letter. (Note: Unlike Decaf, Java permits an idenitifier to begin with an underscore.) Formally:

      letter

      -->

      [a-z] | [A-Z]

      digit

      -->

      [0-9]

      alpha

      -->

      letter | digit | _

      identifier

      -->

      letter alpha*

    3. Numbers are unsigned quantities of one or more digits. A negative number can be constructed using the unary - operator.

      number

      -->

      digit*

    4. The following set of words are reserved keywords in the Decaf language. It is a compile time error to declare an identifier to have the name of a keyword.

                              
      		        class     else      if        int
                              new       null      print     read      
                              return    this      void      while
                      

    5. The following symbols are used in Decaf as operators, grouping symbols, comment specifiers, and delimiters:

      		        [         ]         {         }
                              !=        ==        <         >
                              <=        >=        &&        ||
                              !         +         -         *
                              /         %         ;         ,
                              (         )         =         //
                              .
                       

    6. White space consists of blanks, tabs, and end-of-line characters. White space may be used to mark the end of a symbol, keyword, identifier, or other token. However, with the exception delimiting the beginning and ending of a keyword, white space is not required. For example, a&&b is a legitimate lexical sequence that should be broken into the tokens identifier, conditional and, and identifier. White space is not passed to the parser.

  10. Types
    1. Primitive Types
    2. Decaf has two primitive types:

      1. int: a 32-bit signed integer, and

      2. void: a degenerate type used to indicate that a method returns no value.

    3. Structured Types
    4. Decaf has two structured types:

      1. Classes. A class consists of a set of instance variables and a set of methods that operate on these instance variables. A program creates instances of classes called objects. A program accomplishes its designated task by calling methods associated with variouus objects.

      2. Arrays. Decaf arrays are one-dimensional homogeneous arrays that may consist of integers or instances of a programmer-defined class. Mulidimensional arrays are not supported in Decaf.

    5. References
    6. All array and class variables (objects) in Decaf are reference variables. When a variable is declared, space is allocated for a reference, but no space is allocated for an object. The programmer must explicitly create an object with the new operator and then assign the reference that new returns to the appropriate variable. Java works exactly the same way. When an object is no longer referenced by any variable, it becomes garbage; Java would properly collect the garbage, but Decaf simply ignores the object. This failure to reclaim storage should pose no problem for most of the Decaf programs you will write.

      The following code performs the actions indicated by the comments:


                      class Foo {  //  Declare a class named Foo
                          int key;
                      }
                      
                      Foo a;   //  Declare a couple of variables that may
                      Foo b;   //  reference objects of type Foo
      
                      a = new Foo();  //  Create a Foo object and assign the
                                      //  reference to variable a
                      b = a;          //  b references the same Foo object as a
      
                      a.key = 10;
                      print(b.key);   //  10 gets printed
      
                      a = new Foo();  //  a and b will now reference different
                      b = new Foo();  //  objects
                  

      Parentheses appear after the variable name when a class object is created with new. Java requires this since the use of new actually calls a constructor, and arguments can be supplied if needed. Decaf does not implement constructors, but the parentheses remain so Decaf code looks more like Java code.

  11. Classes
  12. A class declaration creates a new class type and specifies its implementation. The identifier immediately after the class keyword defines the name of the class. The class body then defines the implementation by possibly declaring a set of instance variables and a set of methods.

    2.

    CLASS_DECLARATION

    -->

    class identifier CLASS_BODY

    3.

    CLASS_BODY

    -->

    { VAR_DECLARATION* METHOD_DECLARATION* }

    The body of a class is delimited by curly braces. There is no semicolon after the class body. Java allows variable and method declarations to be intermixed, but for simplicity Decaf requires that all instance variable declarations precede all method declarations.

    Variable Declarations. Java supports multiple variable declarations (variables separated by commas) in a single declaration and variable initialization. Decaf allows only one variable to be declared at a time and omits variable initializers. Decaf also omits class variables (refered to as static in Java).

    Integer instance variables should be automatically initialized to zero (0), and reference variables should be initialized to null.

    Types. A valid type for a variable is either the primitive type int, a previously defined class, or an array type. The current class is considered to be a previously defined class, so the type of an instance variable may be the class being defined. For example, the following definition is valid:


                class TreeNode {
                    int key;
                    TreeNode left_child;
                    TreeNode right_child;
                }
             

    Unlike C/C++, Java allows the square brackets of an array declaration to be placed after the type, instead of after the variable (Java also allows the C/C++ style of brackets after the variable). Decaf requires the brackets to be placed after the type. Also unlike C/C++, Java does not specify a size at the point of declaration; the size is provided when the new operator is used. Consequently, an array variable may reference different length arrays over its lifetime. Example array declarations include:


                int[] a;            //  Declare a
                Foo[] f;            //  Declare f
                a = new int[10];    //  Allocate array object a
                f = new Foo[1000];  //  Allocate array object f
             

    Identifier names. It is not an error to assign a variable the same name as a previously or subsequently defined class name (although it is poor programming style).

    Method Declarations. A method is similar to a function in conventional imperative languages. A method is a block of code that can be invoked with zero or more values as arguments. Unlike a traditional function, a method belongs to a particular class, and its definition must appear within the body of a class definition.

    Java supports special methods called constructors that automatically initialize an object when new is invoked. For simplicity, Decaf does not have constructors.

    Java supports method overloading; that is, multiple methods with the same name but different parameter lists. For simplicity, Decaf does not support method overloading, so it is a compile time error if two methods in a class have the same name.

    In Decaf, unlike Java, methods and instance variables must have unique names; that is, in Decaf it is illegal to have an instance variable and a method with the same name within the same class.

    Return Types. If a method does not return a value, its return type should be declared void. If the return type is an array type, Decaf requires the empty brackets be placed after the type name. For example:


                    int[] foo(int a, int b) { . . . }  //  Correct
    
                    int foo[](int a, int b) { . . . }  //  Incorrect
    
                    int foo(int a, int b)[] { . . . }  //  Incorrect
             

    Parameters. Parameters are considered local variables of the method (that is, they are in the method's namespace). They may not be overridden by other local variables, and it is a compile time error to attempt to do so.

    Parameters are passed by value. Thus, integer arguments are copied to their corresponding parameters and that reference arguments are copied to their corresponding parameters.

    this. The keyword this may be used within a method to reference the current object. this is useful for referencing shadowed variables or methods (that is, variables or methods in the current object that are masked by identically named local variables or parameters in the method) or for passing the current object as an argument to another object's method. For example:


                    class Example {
                        int key;
                        int foo;
                        void setKey(int key, AnotherClass obj) {
                            this.key = key;   //  key is a local variable
                                              //  (actually a parameter) that
                                              //  masks an instance variable
                                              //  so this must be used
                            foo = key;        //  foo == this.foo but key
                                              //  is the parameter key
                            obj.notify(this); //  Pass the current object to
                                              //  the method of another
                                              //  object
                        }
                    }
             

    As illustrated above, the expression "this." is appended implicitly to any non-shadowed instance variable or method of the current object. If a local variable has the same name as an instance variable or method, then the programmer must explicitly preface the instance variable or method name with "this." in order to access the shadowed instance variable or method.

    Body of Methods. The body of a method consists of a sequence of local variable declarations followed by statements:

    11.

    METHOD_BODY

    -->

    { LOCAL_VAR_DECLARATION* SIMPLE_STATEMENT+ }

    Unlike Java, Decaf requires that all local variable declarations precede any statement. Java allows declarations and statements to be intermixed. The syntax for local declarations is identical to that for instance variables: one variable per declararion and no initialization.

    Unlike Java, Decaf does automatically assign zero (0) to local integers does automatically assign null to local reference variables. This permits a run time check for uninitialized references. (Java checks for uninitialized variables at compile time.)

  13. Namespaces
  14. There are only two namespaces in Decaf---the method namespace and the class namespace. When a variable is declared in a method body it is placed in the method's namespace.

    A local variable's declaration is in force throughout the body of the method in which it is declared. If an identifier previously declared as an instance variable or method in the enclosing class, then the other declaration is hidden for the remainder of the method body. this must be used to access these hidden instance variables and methods.

    Type scope. Unlike Java, an identifier with the same name as a previously defined class does not hide the class name. Hence, the previously defined class name can still be used as a type name. In other words, a type name always remains in force.

    Variable Name Resolution Summary. The compiler should resolve a variable reference by looking first in the method namespace. If the variable cannot be found in the method namespace, the compiler should search the enclosing class's namespace. If the variable is found in neither the method namespace nor the enclosing class's namespace, then the variable is considered undeclared and a compile time error should be generated.

  15. Statements
  16. Decaf statements do not have values; they are evaluated only for their effect. In Decaf (unlike Java) this restriction extends to assignment. For example, in Java one can write expressions such as a = b = c or (a = b)[3]. Such statements are illegal in Decaf since an assignment statement has no value.

    Statements may be one of the following:

    14.

    SIMPLE_STATEMENT

    -->

    ;

    |

    NAME = EXPRESSION ;

    |

    NAME ( ARG_LIST ) ;

    |

    print ( ARG_LIST ) ;

    |

    CONDITIONAL_STATEMENT

    |

    while ( EXPRESSION ) STATEMENT

    |

    return OPTIONAL_EXPRESSION ;

    Empty Statement. The semicolon by itself represents an empty statement. It does nothing.

    Assignment. An assignment statement evaluates an expression and assigns the result to a variable:

    NAME = EXPRESSION ;

    A NAME is either the reserved keyword this, an identifer, a field access via the dot notation, or an array access:

    15.

    NAME

    -->

    VARIABLE_ACCESS

    |

    VARIABLE_ACCESS.FIELD_ACCESS

    16.

    VARIABLE_ACCESS

    -->

    this

    |

    identifier

    |

    identifier [EXPRESSION]

    17.

    FIELD_ACCESS

    -->

    identifier

    |

    identifier [EXPRESSION]

    Note that array accesses may occur in either half of a dotted name. For example, both a[3].b and c.d[2] are legal names.

    Method Calls. A method call consists of the name of the method name and a list of zero or more comma-separated arguments. The name may be a dotted name; for example, obj.f(3, x) A Decaf program may ignore the return value of a method which is not declared void. For example, if foo returns an int, then both of the following statements are valid:


            
    	       stack[3].foo(5, 10);
                   x = stack[3].foo(5, 10);
    	

    Output. The print operator writes zero or more integers to the screen. Each displayed value is followed by a blank space. The arguments to print must all be integer-valued expressions. A print operator with no arguments prints a new line; thus, a print operator with no arguments may be used to obtain a blank line in the output.

    For example, the code fragment

            
           print(34, 4, 2, 15);
           print();
           print(34, 4, 2, 15);
           print(15);
           print(16);
           print();
           print(-4);
           print();
           print();
           print(-4);
    	

    would display

            
           34 4 2 15
           34 4 2 15 15 16
           -4
           
           -4
    	

    There is no way to display a string of characters.

    Decaf's technique of handling input and output is different from Java's. Decaf builds the input and output commands into the language; Java uses library commands for I/O. Java's libraries are more flexible since developers can write new I/O routines to match the needs of their applications or handle new I/O devices. Since Decaf was devised to teach the concepts of compiler construction and is not meant to be a production language, the limitations of Decaf's I/O is acceptable.

    Conditional Statements. For simplicity, Decaf does not support the switch construct of Java. It does support the if/else statement:

    19.

    CONDITIONAL_STATEMENT

    -->

    if ( EXPRESSION ) STATEMENT

    |

    if ( EXPRESSION ) STATEMENT else STATEMENT

    In Decaf, the conditonal expression must evaluate to an int. As in C/C++, any integer except 0 is considered to be true, and 0 is considered false. (Java requires that conditional expression be type boolean, but Decaf does not support a boolean type.)

    The dangling else problem is resolved by matching an else with the closest previous unmatched if in the same method body. (Java resolves the dangling else the same way.)

    Iteration. Java supports multiple loop forms, including while, do, and for. For simplicity, Decaf supports only the while construct:

    while ( EXPRESSION ) STATEMENT

    The expression in the while statement must evaluate to an integer. The statement is executed until the expression evaluates to 0 (that is, false).

    Java supports break and continue that alter the normal program flow in a loop. These constructs are convenient but are not absolutely necessary, and thus they have been omitted from Decaf to make it more managable.

    Return Statement. Values are returned from methods via the return statement.

    return OPTIONAL_EXPRESSION ;

    The type of the return expression must be identical to the return type of the method in which it is used. It is a compile time error if the types do not agree or for the return expression to have an argument in a void method.

    Compound Statements. A compound statement can be used to group a sequence of statements to be executed as the body of a while of if statement. A compound statement is a sequence of statements enclosed in curly braces. Unlike Java, Decaf does not support "inner blocks." In Java, a pair of curly braces within a method body defines an inner block in which variables can be declared which are local to this inner block. Since Decaf supports only two namespaces, method and class, inner blocks for finer control of variable scoping are not supported.

  17. Expressions
  18. Decaf expressions are simpler and therefore less powerful than Java expressions. For example, Java allows object references to be determined by complex expressions, but Decaf requires that object references be determined by variable names. The following expressions are legal in Java but illegal in Decaf:


            c = (a = b)[3];      //  c is assigned the contents of b[3] and 
                                 //  a is assigned a reference to the array 
                                 //  pointed to by b
            c = foo.goo(a,b).x;  //  the object containing field x is determined
                                 //  at runtime by executing the method goo 
                                 //  in the object referenced by object foo
            

    One would use multiple Decaf statements to achieve the effects of each of these single Java statements.

    Decaf expressions involve the following:

  19. Arrays
  20. Decaf arrays are one-dimensional arrays. As in Java, arrays are first-class objects with a single publicly accessible field called length. If the array's length is n, then its components range from 0 to n - 1 inclusive. The array subscripts must be integers (i.e., int types). A runtime error must be generated and the program terminated if the subscript is out of range. Java behaves similarly by throwing an exception.

    An array's length field may be examined to determine the number of elements in the array. This length field is read only; once new is used to allocate the array, its length never changes. The array itself can always be reassigned, of course: it can be assigned to another, already allocated, array, and it can be assigned to the result of invoking the new operator again.

    Arrays of zero length can be allocated. In this case, the length field of the array object would be zero, and the array itself would be empty. The array object has a secret instance variable that points to the array components. The only way the programmer can access these components is via square brackets. For example, the fourth element of array foo is accessed via the expression foo[3]. The length field is accessed via the standard dot notation: foo.length.

  21. Miscellaneous
  22. A Decaf program must have exactly one class with a method named main(). main() should take no arguments, and its return type is void.