Semantic Analysis

Assignment 4
CPTR 415 Compiler Construction


  1. Overview
  2. In this assignment you will implement a semantic analyzer for your Decaf compiler. Your semantic analyzer will construct an abstract syntax tree and then perform various semantic checks including type checking.

  3. Issues
    1. Strategy. Your parse tree from the last assignment was fairly simple. Each node in your AST will contain additional information that facilitates semantic analysis. You should enhance your parse tree code to support a more elaborate AST. One useful approach is to devise a C++ class for each language construct. For example, you may have an abstract ASTNode superclass from which all other classes (like IdentifierNode, ExpressionNode, and ThisNode) are derived. Each class would have a method responsible for determining its own type correctness. For example, the AssignmentNode class would check the type of its lefthand object (identifier, array access, or object dereference) and compare that to the type of its righthand expression.

      Your AST may be a simplified form of your parse tree. The AST can be built to more closely match the original grammar of the Decaf language specification. Remember, this original grammar was slightly modified to make it LALR(1) and therefore acceptable to Yacc. You may choose to simplify your tree so that Kleene closure is represented by multiple subtrees instead a long, deep branch.

    2. Semantic Checks. The primary task at this stage is type checking, but other semantic checks should be performed. These checks should include:
      • The types used within a statement must be compatible for that statement. For example, it it illegal to assign an integer to an object reference, and it is illegal to attempt to add two objects.
      • The number and types of arguments passed to a method must agree with its definition.
      • It is illegal to declare more than one instance variable with the same name within the same class and more than one local variable with the same name name within the same method.
      • Exactly one class in a program can have a main() method.
      • The this reference is a reference type to a class. The class depends on which method is referencing this. If a method in class Foo references this, then within the scope of this method, this is a reference to type Foo.
      • The this reference is restricted; that is, it may appear at the beginning of a dotted name, it may be passed as a parameter, and it may be assigned from ( x = this; // OK), but not to (this = x; // Illegal). In other words, this is read-only.
      • The null reference is a reference type to a class or array. Its type matches any reference or array type.
      • The null reference is restricted; that is, it may not appear at the beginning of a dotted name, it may be passed as a parameter, and it may be assigned from ( x = null;), but not to (null = x;).
      • Arithmetic operators return an integer.
      • print returns void.
      • The only field accessible in an array is length.

      This list is not meant to be exhaustive; you may find other semantic checks that are not on this list.

    3. Error Handling. You should continue to report lexical and syntactic errors you detect. For simplicity, you do not have to perform semantic checking if there are any lexical or syntactic errors in the program. However, if you find it easier to do semantic checking while parsing, you may choose to interleave syntactic and semantic error messages.

      Semantic errors should be reported. Semantic error messages should include the number of the line responsible for the error.

    4. Symbol Table. You should integrate your symbol table code from the first assignment with your semantic analyzer. Because of the limited number of scopes, it makes sense to support three symbol tables at any given time--one for the type names, one for the current class namespace, and one for the current method namespace. Your method symbol table may have a link to the class symbol table. Using this organization, an identifier can be looked up in the method table; if it is not found there, the class table can then be consulted via the link from the class table. The type table would be accessible at any time. You may elect to use a different organization of symbol tables.

      Your symbol table code will need to be augmented. Your symbol table entries still consist of strings (identifiers), but you will include information about type and possible structure (indication of an array, parameter list for a method, etc.).

      You may find it convenient to have each node in your AST point to the symbol table to be used to analyze the section of code that is represented by that node.

      It is very important to get the symbol table code integrated correctly. If you must compromise between increasing the number of semantic checks to perform or getting the symbol table code to work flawlessly, opt for getting the symbol table code to work. For one thing, your checks cannot really work very well anyway with faulty symbol table code. Perhaps even more importantly, correct symbol table management is essential to the final part of the project, code generation. If you perform no semantic checks but set up the symbols tables correctly, your code generator (if implemented correctly) will be able to emit correct assembly code for correct Decaf programs. (Illegal Decaf programs do not progress to the code generation phase.)

  4. Format of Output
  5. Your protocompiler should produce a set of error messages:

    1. Lexer errors, if any
    2. Parser errors, if any
    3. Semantic errors, if any

    Each error message should consist of:

    1. a line number where the error was detected
    2. an informative message describing the nature of the error

    If the program has no errors, nothing should be printed.

  6. Assignment Submission and Grading
  7. You should do all your developement within a single directory. When you are ready to submit the assignment, remove all executable and object files from this directory and use the submit415 command while in this directory to submit your assignment. Provide a make file, so I need only unpack your files, run make, and then execute your program. The executable should be named decafc. It should be possible to execute this program on a file using the command

    decafc < filename
    

    Hand in a brief write-up describing

    Your semantic analyzer will be tested against a variety of test files.

    You source code should use comment blocks to explain any non-obvious operations (for example, how you handle errors).

    The possible number of points for this assignment is 10. Points are weighted as follows:

    Correctness of your program including the completeness of
    your semantic checks
    8
    Error messages are informative 1
    Source code is readable and well commented;
    write-up is well structured and informative
    1

    I will deduct 1 point if your program does not build flawlessly.

    Your protocompiler should not produce segmentation faults. I consider segmentation faults to be a strong indication of poor programming quality; please do your best to have your program behave gracefully in all situations.