CPTR 415 Compiler Construction
The Decaf Programming Language

Introduction
Decaf Overview
Notation
Decaf Syntax
Lexical Conventions
Types
Classes
Namespaces
Statements
Expressions
Arrays
Miscellaneous

Introduction

During this course you will implement a compiler for the Decaf programming language. Decaf is an adaptation of Java^TM. (The description of Decaf presented here is actually an adaptation of the original Decaf language specification devised by Brad Vander Zanden.) Decaf presents a number of learning opportunities:

Decaf is an imperative language. The experience you gain in developing a Decaf compiler will extend to other imperative languages such as C, C++, Pascal, and FORTRAN.
Decaf is object-oriented. You will learn some of rudiments of compiler technology for object-oriented languages.
Usefulness. If you are unfamiliar with the Java programming language, programming in Decaf provides an opportunity to get the feel for programming in a Java-like language.

Student compilers will be written in the C++ programming language and will take advantage of the Lex (Flex) and Yacc (Bison) compiler tools. Your compiler must be able to be built with GNU C++ (G++) and run on the cpp.cs.southern.edu programming host (running Linux). You are free to develop on another platform, but you should check to make sure it works on cpp before submitting it.

Your compiler will generate MIPS assembly code that will run on the SPIM simulator.

If you have your own PC, all the tools necessary for writing the Decaf compiler are available for free (Linux, GNU C++, SPIM).

Decaf Overview

Decaf is a strongly typed language that supports the following features:

Primitive data types: int (32-bit signed integer) and void.
Structured types: Classes and one-dimensional arrays.
Assignment.
Conditionals: if/else.
Iteration: while.
Methods and method calls.
Arithmetic: +, -, *, /, %.
Relational operators: <, >, <=, >=, ==, !=.
Boolean operators: &&, ||, !.
Integer input/output: print, read.

In order to make the compiler construction process tractable for a semester course, many important features of Java are omitted from Decaf. These include inheritance, exceptions, interfaces, static members, constructors, multidimensional arrays, threads, other conditional and iteration statements (like switch and for), access modifiers (public, private, protected), garbage collection, and package (library) support.

Here is a sample Decaf program that computes the greatest common divisor of two numbers entered by the user (the gcd method implements Euclid's algorithm):

	    class Example {
                void main() {
                    int x; int y;
                    x = read();
                    y = read();
                    print(gcd(x, y));
                }
                int gcd(int a, int b) {
                    if ( b == 0 )
                        return a;
                    else
                        return gcd(b, a % b);
                }
            }

The following Decaf program creates a linked list:

            
	    class ListNode {
                int data;
                ListNode next;
                void setData(int x) {
                    data = x;
                }
            }
            class List {
                ListNode head;
                void show(ListNode list) {
                    ListNode cursor;
                    int count;
                    count = 0;
                    cursor = list;
                    while ( cursor != null ) {
                        count = count + 1;
                        print(count, cursor.data);
                        cursor = cursor.next;
                    }
                }
                void main() {
                    ListNode second;
                    ListNode third;
                    head = new ListNode();
                    head.setData(4);  //  Could use "head.data = 4;"
                    print(head.data);
                    second = new ListNode();
                    second.setData(5);
                    print(second.data);
                    third = new ListNode();
                    third.setData(11);
                    print(third.data);
                    head.next = second;
                    second.next = third;
                    third.next = null;
                    show(head);
                }
            }

Notation

The following notational conventions are used throughout this document:

Nonterminal symbols are denoted by all uppercase italics; for example, CLASS.
Terminal symbols, operator symbols, and keywords are denoted by boldface; for example, identifier.
< ... >^* means zero or more occurrences of the item within the angle braces; for example, a^* means zero or more occurrences of a.
< ... >⁺ means one or more occurrences of the item within the angle braces; for example, a⁺ means one or more occurrences of a.
The empty string is denoted by epsilon.

Decaf Syntax

Decaf Grammar

1.	PROGRAM	`-->`	CLASS_DECLARATION⁺
2.	CLASS_DECLARATION	`-->`	`class` identifier CLASS_BODY
3.	CLASS_BODY	`-->`	{ VAR_DECLARATION^* METHOD_DECLARATION^* }
4.	VAR_DECLARATION	`-->`	TYPE identifier ;
5.	TYPE	`-->`	SIMPLE_TYPE
		\|	SIMPLE_TYPE`[]`
6.	SIMPLE_TYPE	`-->`	`int`
		\|	identifier
7.	METHOD_DECLARATION	`-->`	RESULT_TYPE identifier ( PARAMETER_LIST ) METHOD_BODY
8.	RESULT_TYPE	`-->`	TYPE
		\|	`void`
9.	PARAMETER_LIST	`-->`	epsilon
		\|	PARAMETER < , PARAMETER >^*
10.	PARAMETER	`-->`	TYPE identifier
11.	METHOD_BODY	`-->`	`{` LOCAL_VAR_DECLARATION^* SIMPLE_STATEMENT^* `}`
12.	LOCAL_VAR_DECLARATION	`-->`	TYPE identifier ;
13.	STATEMENT	`-->`	SIMPLE_STATEMENT
		\|	{ SIMPLE_STATEMENT⁺ }
14.	SIMPLE_STATEMENT	`-->`	`;`
		\|	NAME = EXPRESSION ;
		\|	NAME ( ARG_LIST ) ;
		\|	`print` ( ARG_LIST ) ;
		\|	CONDITIONAL_STATEMENT
		\|	`while (` EXPRESSION `)` STATEMENT
		\|	`return` OPTIONAL_EXPRESSION ;
15.	NAME	`-->`	VARIABLE_ACCESS
		\|	VARIABLE_ACCESS.FIELD_ACCESS
16.	VARIABLE_ACCESS	`-->`	`this`
		\|	identifier
		\|	identifier [EXPRESSION]
17.	FIELD_ACCESS	`-->`	identifier
		\|	identifier [EXPRESSION]
18.	ARG_LIST	`-->`	epsilon
		\|	EXPRESSION < , EXPRESSION >^*
19.	CONDITIONAL_STATEMENT	`-->`	`if (` EXPRESSION ) STATEMENT
		\|	`if (` EXPRESSION ) STATEMENT `else` STATEMENT
20.	OPTIONAL_EXPRESSION	`-->`	epsilon
		\|	EXPRESSION
21.	EXPRESSION	`-->`	NAME
		\|	number
		\|	`null`
		\|	NAME ( ARG_LIST )
		\|	`read` ()
		\|	NEW_EXPRESSION
		\|	UNARY_OP EXPRESSION
		\|	EXPRESSION RELATION_OP EXPRESSION
		\|	EXPRESSION SUM_OP EXPRESSION
		\|	EXPRESSION PRODUCT_OP EXPRESSION
		\|	( EXPRESSION )
22.	NEW_EXPRESSION	`-->`	`new` identifier ()
		\|	`new int [` EXPRESSION `]`
		\|	`new` identifier `[` EXPRESSION `]`
23.	UNARY_OP	`-->`	`+`
		\|	`-`
		\|	`!`
24.	RELATION_OP	`-->`	`==`
		\|	`!=`
		\|	`<=`
		\|	`>=`
		\|	`<`
		\|	`>`
25.	SUM_OP	`-->`	`+`
		\|	`-`
		\|	`\|\|`
26.	PRODUCT_OP	`-->`	`*`
		\|	`/`
		\|	`%`
		\|	`&&`

Operator Precedence

Operator precedence is as follows (from highest to lowest):

UNARY_OP
PRODUCT_OP
SUM_OP
RELATION_OP

Within each group of operators each operator has the same precedence; for example, +, -, and || have the same precedence.

Operator Associativity

All operators associate left-to-right; for example, a + b + c is interpreted as (a + b) + c.

Lexical Conventions

The Decaf language observes the following lexical conventions:

Comments begin with the characters //. All text after the // until the end of the line is ignored by the compiler.

Identifers are unlimited sequences of letters, numbers, and the underscore _ character. They must begin with a letter. (Note: Unlike Decaf, Java permits an idenitifier to begin with an underscore.) Formally:

letter

-->

[a-z] | [A-Z]

digit

-->

[0-9]

alpha

-->

letter | digit | _

identifier

-->

letter alpha^*

Numbers are unsigned quantities of one or more digits. A negative number can be constructed using the unary - operator.

number

-->

digit^*

The following set of words are reserved keywords in the Decaf language. It is a compile time error to declare an identifier to have the name of a keyword.

                        
		        class     else      if        int
                        new       null      print     read      
                        return    this      void      while

The following symbols are used in Decaf as operators, grouping symbols, comment specifiers, and delimiters:

		        [         ]         {         }
                        !=        ==        <         >
                        <=        >=        &&        ||
                        !         +         -         *
                        /         %         ;         ,
                        (         )         =         //
                        .

White space consists of blanks, tabs, and end-of-line characters. White space may be used to mark the end of a symbol, keyword, identifier, or other token. However, with the exception delimiting the beginning and ending of a keyword, white space is not required. For example, a&&b is a legitimate lexical sequence that should be broken into the tokens identifier, conditional and, and identifier. White space is not passed to the parser.

Types

Primitive Types

Decaf has two primitive types:

int: a 32-bit signed integer, and
void: a degenerate type used to indicate that a method returns no value.

Structured Types

Decaf has two structured types:

Classes. A class consists of a set of instance variables and a set of methods that operate on these instance variables. A program creates instances of classes called objects. A program accomplishes its designated task by calling methods associated with variouus objects.
Arrays. Decaf arrays are one-dimensional homogeneous arrays that may consist of integers or instances of a programmer-defined class. Mulidimensional arrays are not supported in Decaf.

References

All array and class variables (objects) in Decaf are reference variables. When a variable is declared, space is allocated for a reference, but no space is allocated for an object. The programmer must explicitly create an object with the new operator and then assign the reference that new returns to the appropriate variable. Java works exactly the same way. When an object is no longer referenced by any variable, it becomes garbage; Java would properly collect the garbage, but Decaf simply ignores the object. This failure to reclaim storage should pose no problem for most of the Decaf programs you will write.

The following code performs the actions indicated by the comments:

                class Foo {  //  Declare a class named Foo
                    int key;
                }
                
                Foo a;   //  Declare a couple of variables that may
                Foo b;   //  reference objects of type Foo

                a = new Foo();  //  Create a Foo object and assign the
                                //  reference to variable a
                b = a;          //  b references the same Foo object as a

                a.key = 10;
                print(b.key);   //  10 gets printed

                a = new Foo();  //  a and b will now reference different
                b = new Foo();  //  objects

Parentheses appear after the variable name when a class object is created with new. Java requires this since the use of new actually calls a constructor, and arguments can be supplied if needed. Decaf does not implement constructors, but the parentheses remain so Decaf code looks more like Java code.

Classes

A class declaration creates a new class type and specifies its implementation. The identifier immediately after the class keyword defines the name of the class. The class body then defines the implementation by possibly declaring a set of instance variables and a set of methods.

2.

CLASS_DECLARATION

-->

class identifier CLASS_BODY

3.

CLASS_BODY

-->

{ VAR_DECLARATION^* METHOD_DECLARATION^* }

The body of a class is delimited by curly braces. There is no semicolon after the class body. Java allows variable and method declarations to be intermixed, but for simplicity Decaf requires that all instance variable declarations precede all method declarations.

Variable Declarations. Java supports multiple variable declarations (variables separated by commas) in a single declaration and variable initialization. Decaf allows only one variable to be declared at a time and omits variable initializers. Decaf also omits class variables (refered to as static in Java).

Integer instance variables should be automatically initialized to zero (0), and reference variables should be initialized to null.

Types. A valid type for a variable is either the primitive type int, a previously defined class, or an array type. The current class is considered to be a previously defined class, so the type of an instance variable may be the class being defined. For example, the following definition is valid:

            class TreeNode {
                int key;
                TreeNode left_child;
                TreeNode right_child;
            }

Unlike C/C++, Java allows the square brackets of an array declaration to be placed after the type, instead of after the variable (Java also allows the C/C++ style of brackets after the variable). Decaf requires the brackets to be placed after the type. Also unlike C/C++, Java does not specify a size at the point of declaration; the size is provided when the new operator is used. Consequently, an array variable may reference different length arrays over its lifetime. Example array declarations include:

            int[] a;            //  Declare a
            Foo[] f;            //  Declare f
            a = new int[10];    //  Allocate array object a
            f = new Foo[1000];  //  Allocate array object f

Identifier names. It is not an error to assign a variable the same name as a previously or subsequently defined class name (although it is poor programming style).

Method Declarations. A method is similar to a function in conventional imperative languages. A method is a block of code that can be invoked with zero or more values as arguments. Unlike a traditional function, a method belongs to a particular class, and its definition must appear within the body of a class definition.

Java supports special methods called constructors that automatically initialize an object when new is invoked. For simplicity, Decaf does not have constructors.

Java supports method overloading; that is, multiple methods with the same name but different parameter lists. For simplicity, Decaf does not support method overloading, so it is a compile time error if two methods in a class have the same name.

In Decaf, unlike Java, methods and instance variables must have unique names; that is, in Decaf it is illegal to have an instance variable and a method with the same name within the same class.

Return Types. If a method does not return a value, its return type should be declared void. If the return type is an array type, Decaf requires the empty brackets be placed after the type name. For example:

                int[] foo(int a, int b) { . . . }  //  Correct

                int foo[](int a, int b) { . . . }  //  Incorrect

                int foo(int a, int b)[] { . . . }  //  Incorrect

Parameters. Parameters are considered local variables of the method (that is, they are in the method's namespace). They may not be overridden by other local variables, and it is a compile time error to attempt to do so.

Parameters are passed by value. Thus, integer arguments are copied to their corresponding parameters and that reference arguments are copied to their corresponding parameters.

this. The keyword this may be used within a method to reference the current object. this is useful for referencing shadowed variables or methods (that is, variables or methods in the current object that are masked by identically named local variables or parameters in the method) or for passing the current object as an argument to another object's method. For example:

                class Example {
                    int key;
                    int foo;
                    void setKey(int key, AnotherClass obj) {
                        this.key = key;   //  key is a local variable
                                          //  (actually a parameter) that
                                          //  masks an instance variable
                                          //  so this must be used
                        foo = key;        //  foo == this.foo but key
                                          //  is the parameter key
                        obj.notify(this); //  Pass the current object to
                                          //  the method of another
                                          //  object
                    }
                }

As illustrated above, the expression "this." is appended implicitly to any non-shadowed instance variable or method of the current object. If a local variable has the same name as an instance variable or method, then the programmer must explicitly preface the instance variable or method name with "this." in order to access the shadowed instance variable or method.

Body of Methods. The body of a method consists of a sequence of local variable declarations followed by statements:

11.

METHOD_BODY

-->

{ LOCAL_VAR_DECLARATION^* SIMPLE_STATEMENT⁺ }

Unlike Java, Decaf requires that all local variable declarations precede any statement. Java allows declarations and statements to be intermixed. The syntax for local declarations is identical to that for instance variables: one variable per declararion and no initialization.

Unlike Java, Decaf does automatically assign zero (0) to local integers does automatically assign null to local reference variables. This permits a run time check for uninitialized references. (Java checks for uninitialized variables at compile time.)

Namespaces

There are only two namespaces in Decaf---the method namespace and the class namespace. When a variable is declared in a method body it is placed in the method's namespace.

A local variable's declaration is in force throughout the body of the method in which it is declared. If an identifier previously declared as an instance variable or method in the enclosing class, then the other declaration is hidden for the remainder of the method body. this must be used to access these hidden instance variables and methods.

Type scope. Unlike Java, an identifier with the same name as a previously defined class does not hide the class name. Hence, the previously defined class name can still be used as a type name. In other words, a type name always remains in force.

Variable Name Resolution Summary. The compiler should resolve a variable reference by looking first in the method namespace. If the variable cannot be found in the method namespace, the compiler should search the enclosing class's namespace. If the variable is found in neither the method namespace nor the enclosing class's namespace, then the variable is considered undeclared and a compile time error should be generated.

Statements

Decaf statements do not have values; they are evaluated only for their effect. In Decaf (unlike Java) this restriction extends to assignment. For example, in Java one can write expressions such as a = b = c or (a = b)[3]. Such statements are illegal in Decaf since an assignment statement has no value.

Statements may be one of the following:

Empty Statement. The semicolon by itself represents an empty statement. It does nothing.

Assignment. An assignment statement evaluates an expression and assigns the result to a variable:

NAME = EXPRESSION ;

A NAME is either the reserved keyword this, an identifer, a field access via the dot notation, or an array access:

15.

NAME

-->

VARIABLE_ACCESS

|

VARIABLE_ACCESS.FIELD_ACCESS

16.

VARIABLE_ACCESS

-->

this

|

identifier

|

identifier [EXPRESSION]

17.

FIELD_ACCESS

-->

identifier

|

identifier [EXPRESSION]

Note that array accesses may occur in either half of a dotted name. For example, both a[3].b and c.d[2] are legal names.

Method Calls. A method call consists of the name of the method name and a list of zero or more comma-separated arguments. The name may be a dotted name; for example, obj.f(3, x) A Decaf program may ignore the return value of a method which is not declared void. For example, if foo returns an int, then both of the following statements are valid:

        
	       stack[3].foo(5, 10);
               x = stack[3].foo(5, 10);

Output. The print operator writes zero or more integers to the screen. Each displayed value is followed by a blank space. The arguments to print must all be integer-valued expressions. A print operator with no arguments prints a new line; thus, a print operator with no arguments may be used to obtain a blank line in the output.

For example, the code fragment

        
       print(34, 4, 2, 15);
       print();
       print(34, 4, 2, 15);
       print(15);
       print(16);
       print();
       print(-4);
       print();
       print();
       print(-4);

would display

        
       34 4 2 15
       34 4 2 15 15 16
       -4
       
       -4

There is no way to display a string of characters.

Decaf's technique of handling input and output is different from Java's. Decaf builds the input and output commands into the language; Java uses library commands for I/O. Java's libraries are more flexible since developers can write new I/O routines to match the needs of their applications or handle new I/O devices. Since Decaf was devised to teach the concepts of compiler construction and is not meant to be a production language, the limitations of Decaf's I/O is acceptable.

Conditional Statements. For simplicity, Decaf does not support the switch construct of Java. It does support the if/else statement:

19.	CONDITIONAL_STATEMENT	`-->`	`if (` EXPRESSION ) STATEMENT
		\|	`if (` EXPRESSION ) STATEMENT `else` STATEMENT

In Decaf, the conditonal expression must evaluate to an int. As in C/C++, any integer except 0 is considered to be true, and 0 is considered false. (Java requires that conditional expression be type boolean, but Decaf does not support a boolean type.)

The dangling else problem is resolved by matching an else with the closest previous unmatched if in the same method body. (Java resolves the dangling else the same way.)

Iteration. Java supports multiple loop forms, including while, do, and for. For simplicity, Decaf supports only the while construct:

while ( EXPRESSION ) STATEMENT

The expression in the while statement must evaluate to an integer. The statement is executed until the expression evaluates to 0 (that is, false).

Java supports break and continue that alter the normal program flow in a loop. These constructs are convenient but are not absolutely necessary, and thus they have been omitted from Decaf to make it more managable.

Return Statement. Values are returned from methods via the return statement.

return OPTIONAL_EXPRESSION ;

The type of the return expression must be identical to the return type of the method in which it is used. It is a compile time error if the types do not agree or for the return expression to have an argument in a void method.

Compound Statements. A compound statement can be used to group a sequence of statements to be executed as the body of a while of if statement. A compound statement is a sequence of statements enclosed in curly braces. Unlike Java, Decaf does not support "inner blocks." In Java, a pair of curly braces within a method body defines an inner block in which variables can be declared which are local to this inner block. Since Decaf supports only two namespaces, method and class, inner blocks for finer control of variable scoping are not supported.

Expressions

Decaf expressions are simpler and therefore less powerful than Java expressions. For example, Java allows object references to be determined by complex expressions, but Decaf requires that object references be determined by variable names. The following expressions are legal in Java but illegal in Decaf:

        c = (a = b)[3];      //  c is assigned the contents of b[3] and 
                             //  a is assigned a reference to the array 
                             //  pointed to by b
        c = foo.goo(a,b).x;  //  the object containing field x is determined
                             //  at runtime by executing the method goo 
                             //  in the object referenced by object foo

One would use multiple Decaf statements to achieve the effects of each of these single Java statements.

Decaf expressions involve the following:

Input

read reads and returns and 32-bit signed integer from the keyboard. Read is an operator like print and new; it is neither a function nor a method. It looks and behaves as if it were a function, however. The trailing parentheses emphasize the fact that read is not a simple variable.
Arithmetic and Relational Operators

The arithmetic and relational operators are straightforward. Integer division (/) truncates. For example, 8/3 = 2. Your compiler must insert code that detects division by zero. If your code detects an attempt to divide by zero, the program should terminate and report the execptional condition.

Relational operators should return integer 1 if the expression is true and the integer 0 if the expression is false.
Conditional and, Conditional or, and not

The conditonal and operator (&&) should evaluate only its first operand if the first operand is false (i.e., the integer 0). Similarly, the conditional or (||) operator should evaluate only its first operand if the first operand evaluates to true (i.e., any integer except 0). Both conditional and and conditional or operators should return the integer 1 if the expression is true and the integer 0 if the expression is false.

The not operator (!) should return the integer 0 if the expression is true and the integer 1 if the expression is false.
The new Operator

The new operator creates a new object, initializes all its instance variables to zero or null (as appropriate), and returns a reference to the newly created object.

Array Allocation. An allocation request for an array object must specify the size of the array. The new operator allocates enough space to accommodate the requested number of objects. For example, the allocation request

a = new int[10];

would be handled by allocating space for 10 integer components (40 bytes).

Note that an array variable is not allocated storage when it it declared. The dimension is not specified when the array is declared. An array declaration only allocates space for a reference to the array (4 bytes). Consequently, the only way to allocate space for an array object is to use the new operator.
Order of Evaluation

All operands in an expression are evaluated left-to-right order, regardless of the precedence of operators. For example, suppose a[3] is originally 5 and that the method obj.foo() returns 20 and sets a[3] to 10. Then the result of the expression

obj.foo() + (a[3] * a[3])

is 20 + (10 * 10) or 120 (notice that even though the parenthesized expression has greater precedence, the operands are evaluated in left-to-right order). Similarly, all parameters are evaluated in strictly left-to-right order. Specific rules such as these are required because because side effects can cause different results depending on the order in which the operands are evaluated. For example, if the operands in the expression obj.foo() + (a[3] * a[3]) were evaluated in right-to-left order, the result would be 20 + (5 * 5) or 45. Hence, to ensure that Decaf programs generated by different compilers produce the same results, an order of evaluation must be imposed.

The Java language specification imposes a strict order of evaluation, unlike C and C++ in which the order of evaluation is unspecified. To see why this is important, this sequence of C++ code:
```
      Stack s = new Stack();
      s.push(3);
      s.push(5);
      cout << s.pop() << " " << s.pop() << endl;
```
will print "3 5" on some compiler/platform combinations and "5 3" on others.
null References

It is a runtime error to attempt to access the field of a null reference. For example, if the statement

obj.x = 2;

were executed when obj is null, the program should be terminated and an error message should be displayed.

Arrays

Decaf arrays are one-dimensional arrays. As in Java, arrays are first-class objects with a single publicly accessible field called length. If the array's length is n, then its components range from 0 to n - 1 inclusive. The array subscripts must be integers (i.e., int types). A runtime error must be generated and the program terminated if the subscript is out of range. Java behaves similarly by throwing an exception.

An array's length field may be examined to determine the number of elements in the array. This length field is read only; once new is used to allocate the array, its length never changes. The array itself can always be reassigned, of course: it can be assigned to another, already allocated, array, and it can be assigned to the result of invoking the new operator again.

Arrays of zero length can be allocated. In this case, the length field of the array object would be zero, and the array itself would be empty. The array object has a secret instance variable that points to the array components. The only way the programmer can access these components is via square brackets. For example, the fourth element of array foo is accessed via the expression foo[3]. The length field is accessed via the standard dot notation: foo.length.

Miscellaneous

A Decaf program must have exactly one class with a method named main(). main() should take no arguments, and its return type is void.

letter	`-->`	[a-z] \| [A-Z]
digit	`-->`	[0-9]
alpha	`-->`	letter \| digit \| _
identifier	`-->`	letter alpha^*

CPTR 415 Compiler Construction The Decaf Programming Language