Auxiliary Compiler Modules

This document describes the modules you will need, following the skeleton code of your compiler.

The main program of your compiler will look like the following code. Read the comments and make sure you understand each step.

  int main(int argc, const char* argv[]) {

    // parse command line: options may appear in any order, plus one optional file name
    bool doTypeCheck=true, doCodeGen=true, doLLVM=false;
    std::string filename;
    for (int i=1; i<argc; ++i) {
      const std::string arg(argv[i]);
      if (arg == "--noTypecheck") doTypeCheck=false;
      else if (arg == "--noCodegen") doCodeGen=false;
      else if (arg == "--genLLVM") doLLVM=true;
      else if (filename.empty()) {
        // not a valid option, so it must be the file name (only the first one is accepted)
        filename = arg;
      }
      else { // neither a valid option nor the first file name: report usage and stop
        std::cout << "Usage: ./asl [--noTypecheck|--noCodegen|--genLLVM] [<file.asl>]" << std::endl;
        return EXIT_FAILURE;
      }
    }

    // open input file (or std::cin) and create a character stream
    antlr4::ANTLRInputStream input;
    if (not filename.empty()) {
      std::ifstream stream;
      stream.open(filename);
      if (stream.fail()) {
        std::cout << "Could not open file: " << filename << std::endl;
        return EXIT_FAILURE;
      }
      input = antlr4::ANTLRInputStream(stream);
    }
    else {            // read from std::cin
      input = antlr4::ANTLRInputStream(std::cin);
    }

    // create a lexer that consumes the character stream and produces a token stream
    AslLexer lexer(&input);
    antlr4::CommonTokenStream tokens(&lexer);

    // create a parser that consumes the token stream, and parses it.
    AslParser parser(&tokens);

    // call the parser and get the parse tree
    antlr4::tree::ParseTree *tree = parser.program();

    // check for lexical or syntactical errors
    if (lexer.getNumberOfSyntaxErrors() > 0 or
        parser.getNumberOfSyntaxErrors() > 0) {
      std::cout << "Lexical and/or syntactical errors have been found." << std::endl;
      return EXIT_FAILURE;
    }

    // print the parse tree (for debugging purposes)
    // std::cout << tree->toStringTree(&parser) << std::endl;

    if (not doTypeCheck) {
      std::cout << "-- Early stop: no typecheck has been made." << std::endl;
      return EXIT_SUCCESS;
    }

    // auxiliary classes we are going to need to store information while
    // traversing the tree. They are described below in this document
    TypesMgr       types;
    SymTable       symbols(types);
    TreeDecoration decorations;
    SemErrors      errors;

    // create a visitor that looks for variable and function declarations
    // in the tree and stores the required information
    SymbolsVisitor symboldecl(types, symbols, decorations, errors);
    symboldecl.visit(tree);

    // create another visitor that will perform type checks wherever
    // they are needed (on expressions, assignments, parameter passing, etc.)
    TypeCheckVisitor typecheck(types, symbols, decorations, errors);
    typecheck.visit(tree);

    if (errors.getNumberOfSemanticErrors() > 0) {
      std::cout << "There are semantic errors: no code generated." << std::endl;
      return EXIT_FAILURE;
    }

    if (not doCodeGen) {
      std::cout << "-- Early stop: no code generated." << std::endl;
      return EXIT_SUCCESS;
    }

    // create a third visitor that will return the generated code
    // for each part of the tree, and will store it in 'mycode'
    CodeGenVisitor codegenerator(types, symbols, decorations);
    code mycode = std::any_cast<code>(codegenerator.visit(tree));

    // print generated code as output
    std::cout << mycode.dump() << std::endl;

    if (doLLVM) {
      // write the LLVM code to <input base name>.ll (or output.ll when reading std::cin)
      std::string llvmStr = mycode.dumpLLVM(types, symbols);
      std::string llvmFileName;
      if (filename.empty())
        llvmFileName = "output.ll";
      else {
        std::size_t slashPos = filename.rfind("/");
        std::size_t dotPos   = filename.rfind(".");
        llvmFileName = filename.substr(slashPos+1, dotPos-slashPos-1) + ".ll";
      }
      std::ofstream myLLVMFile(llvmFileName, std::ofstream::out);
      myLLVMFile << llvmStr << std::endl;
      myLLVMFile.close();
    }

    return EXIT_SUCCESS;
  }
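
The output file name computed at the end of main depends on unsigned wrap-around when rfind returns std::string::npos (for instance, when the file name has no "/"). As an illustration only, the same rule can be written as a small helper with the cases spelled out. The function name llvmFileNameFor is hypothetical and is not part of the skeleton.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the skeleton): derives the name of the .ll
// output file from the input file name, following the rule used in main.
std::string llvmFileNameFor(const std::string& filename) {
  if (filename.empty()) return "output.ll";   // input was read from std::cin
  std::size_t slashPos = filename.rfind('/');
  std::size_t dotPos   = filename.rfind('.');
  // Start right after the last '/', or at 0 when there is no directory part.
  std::size_t start = (slashPos == std::string::npos) ? 0 : slashPos + 1;
  // A '.' located before 'start' belongs to a directory name: ignore it.
  std::size_t len = (dotPos == std::string::npos || dotPos < start)
                        ? std::string::npos
                        : dotPos - start;
  return filename.substr(start, len) + ".ll";
}
```

With this sketch, an input such as examples/test1.asl maps to test1.ll, and an empty file name maps to output.ll.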

Code generated by ANTLR4

ANTLR4 is a parser generator that will create much of the code needed to build the compiler.

Given a grammar (file Asl.g4), ANTLR4 will generate the classes AslLexer and AslParser, which can be called directly from our main to obtain the parse tree of the target program.

Once we have the parse tree, we need to traverse it to perform type checking and code generation. ANTLR4 will also generate one abstract class called AslVisitor that has one visit method for each rule (or rule label) in the grammar. The derived class AslBaseVisitor provides default implementations of these methods, which simply walk the entire tree.

SymbolsVisitor, TypeCheckVisitor, and CodeGenVisitor are classes derived from AslBaseVisitor. In each of them, the visit methods for the nodes we want to process have been written, thus overriding the default methods of the base class. Each visitor deals with some nodes and ignores others, since each one does a different job.

For instance:

  • SymbolsVisitor will only declare visit methods for nodes related to variable, parameter, or function declarations, and ignore all the rest.

  • TypeCheckVisitor will declare visit methods for nodes related to expressions, assignments, and parameter passing, but ignore nodes about variable or function declarations.

  • CodeGenVisitor will declare visit methods for nodes related to instructions and expressions, but ignore others.

Thus, to build a compiler with ANTLR4, we only need to write the grammar, a short main program like the previous example, and the needed visit methods of one (or more) derived Visitor classes that deal with the tree nodes we want to process.
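
The overriding scheme described above can be sketched without ANTLR4. Everything in the following block (Node, BaseVisitor, DeclCollector, AssignCounter, makeNode) is a hypothetical stand-in invented for illustration. The real generated classes work on ANTLR4 context objects and return std::any, so take this only as a rough picture of the idea: the base class visits children by default, and each derived visitor overrides only the methods it cares about.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy parse tree node (illustrative, not ANTLR-generated code).
struct Node {
  enum Kind { Program, Declaration, Assignment } kind;
  std::string text;
  std::vector<std::unique_ptr<Node>> children;
};

// Helper to build toy nodes.
std::unique_ptr<Node> makeNode(Node::Kind kind, std::string text) {
  auto n = std::make_unique<Node>();
  n->kind = kind;
  n->text = std::move(text);
  return n;
}

// Plays the role of AslBaseVisitor: by default every method visits the children.
class BaseVisitor {
public:
  virtual ~BaseVisitor() = default;
  void visit(const Node& n) {
    switch (n.kind) {
      case Node::Program:     visitProgram(n);     break;
      case Node::Declaration: visitDeclaration(n); break;
      case Node::Assignment:  visitAssignment(n);  break;
    }
  }
protected:
  virtual void visitProgram(const Node& n)     { visitChildren(n); }
  virtual void visitDeclaration(const Node& n) { visitChildren(n); }
  virtual void visitAssignment(const Node& n)  { visitChildren(n); }
  void visitChildren(const Node& n) {
    for (const auto& child : n.children) visit(*child);
  }
};

// Plays the role of SymbolsVisitor: only declarations are processed.
class DeclCollector : public BaseVisitor {
public:
  std::vector<std::string> declared;
protected:
  void visitDeclaration(const Node& n) override { declared.push_back(n.text); }
};

// Plays the role of TypeCheckVisitor: only assignments are processed.
class AssignCounter : public BaseVisitor {
public:
  int assignments = 0;
protected:
  void visitAssignment(const Node&) override { ++assignments; }
};
```

Both derived visitors traverse the same tree, and nodes they do not override are handled by the default walk inherited from the base class.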

Auxiliary Modules

The classes TypesMgr, SymTable, TreeDecoration, and SemErrors, together with the Code Manager module (which contains class code and other related classes), are used by our ASL compiler to store data about the program being compiled and to propagate this information from one traversal to the next. For example, the symbol declaration traversal stores which variables are declared and which type each of them has. The type checking traversal uses this type information to verify that operations are performed correctly, and the code generation traversal uses the names and sizes of the variables to produce the right low-level code.

Class code holds the partially generated code, allows extending it with new instructions, and prints it when it is complete. Class counters provides counters to keep track of used labels and temporaries.

Type Manager

The Type Manager stores which data types have been seen in the target program (e.g. bool, array of 10 char, function receiving one int and returning bool) and offers some methods to manipulate them.
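
To give a rough idea of what storing the types that have been seen can mean, here is a minimal sketch in which each distinct type is stored once and identified by a small integer. The class MiniTypes and all its methods are hypothetical and are not the interface of TypesMgr. Refer to the Type Manager documentation for the real one.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical miniature type manager: structurally equal types share one id.
class MiniTypes {
public:
  using TypeId = std::size_t;

  TypeId primitive(const std::string& name) { return intern(name); }

  TypeId arrayOf(std::size_t size, TypeId elem) {
    return intern("array<" + std::to_string(size) + "," + names_[elem] + ">");
  }

  TypeId function(const std::vector<TypeId>& params, TypeId ret) {
    std::string key = "(";
    for (std::size_t i = 0; i < params.size(); ++i)
      key += (i ? "," : "") + names_[params[i]];
    return intern(key + ")->" + names_[ret]);
  }

  const std::string& toString(TypeId t) const { return names_[t]; }

private:
  // Returns the id already assigned to 'key', or registers a new one.
  TypeId intern(const std::string& key) {
    auto it = ids_.find(key);
    if (it != ids_.end()) return it->second;
    names_.push_back(key);
    ids_.emplace(key, names_.size() - 1);
    return names_.size() - 1;
  }

  std::vector<std::string> names_;                 // id -> description
  std::unordered_map<std::string, TypeId> ids_;    // description -> id
};
```

In this sketch, comparing two types is just comparing two ids, because asking twice for "array of 10 char" yields the same id.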

More information about Type Manager.

Symbol Table

The Symbol Table stores which identifiers have been seen in the target program (variable names, parameter names, function names) and in which scope they appeared (i.e. in which function or code block), and associates each of them with a type (stored in the Type Manager).
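
One common way to organize such a table is a stack of scopes, each mapping identifiers to types. The sketch below assumes that design. The class MiniSymTable and its methods are hypothetical (types are plain strings here for brevity), and the real SymTable interface may be quite different.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped symbol table: a stack of (identifier -> type) maps.
class MiniSymTable {
public:
  void pushScope() { scopes_.emplace_back(); }
  void popScope()  { scopes_.pop_back(); }

  // Declares 'name' in the current (innermost) scope. A scope must have been
  // pushed first. Returns false if 'name' is already declared in that scope.
  bool declare(const std::string& name, const std::string& type) {
    return scopes_.back().emplace(name, type).second;
  }

  // Looks 'name' up from the innermost scope outwards.
  std::optional<std::string> lookup(const std::string& name) const {
    for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
      auto found = it->find(name);
      if (found != it->end()) return found->second;
    }
    return std::nullopt;
  }

private:
  std::vector<std::unordered_map<std::string, std::string>> scopes_;
};
```

A visitor would typically push a scope when entering a function, declare its parameters and local variables, and pop the scope when leaving it.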

More information about Symbol Table.

Tree Decoration

The Tree Decoration module allows storing information associated with specific nodes in the parse tree.
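
Conceptually, a decoration is a side table keyed by the address of a tree node. The following sketch shows that idea under simplifying assumptions: TreeNode and MiniDecorations are hypothetical names, the only decoration kept is a type written as a string, and the real TreeDecoration class may store other data through a different interface.

```cpp
#include <string>
#include <unordered_map>

struct TreeNode {};  // hypothetical stand-in for a parse tree node

// Hypothetical decoration store: node address -> information about the node.
class MiniDecorations {
public:
  void putType(const TreeNode* n, const std::string& type) { types_[n] = type; }
  bool hasType(const TreeNode* n) const { return types_.count(n) > 0; }
  // Precondition: hasType(n). Otherwise std::out_of_range is thrown.
  const std::string& getType(const TreeNode* n) const { return types_.at(n); }

private:
  std::unordered_map<const TreeNode*, std::string> types_;
};
```

Because the table lives outside the tree, one traversal can record a fact about a node and a later traversal can read it back without modifying the tree itself.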

More information about Tree Decoration.

Code Manager

The Code Manager module contains several classes that ease the handling and combination of code fragments.
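
As a loose illustration of combining code fragments, the sketch below models a fragment as an ordered list of textual instructions that can be concatenated and dumped, plus a counter that hands out fresh temporary names. MiniCode, MiniCounters, the + operator, and the "%N" naming are all assumptions made for this example and do not describe the real code or counters classes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical code fragment: an ordered list of textual instructions.
class MiniCode {
public:
  MiniCode() = default;
  explicit MiniCode(std::string instr) { instrs_.push_back(std::move(instr)); }

  // Appends the instructions of 'other' (taken by value, so a += a is safe).
  MiniCode& operator+=(MiniCode other) {
    instrs_.insert(instrs_.end(), other.instrs_.begin(), other.instrs_.end());
    return *this;
  }
  friend MiniCode operator+(MiniCode lhs, const MiniCode& rhs) {
    lhs += rhs;
    return lhs;
  }

  // One instruction per line.
  std::string dump() const {
    std::string out;
    for (const auto& i : instrs_) out += i + "\n";
    return out;
  }

private:
  std::vector<std::string> instrs_;
};

// Hypothetical counter that produces fresh temporary names: %1, %2, ...
class MiniCounters {
public:
  std::string newTemp() { return "%" + std::to_string(++temps_); }
private:
  int temps_ = 0;
};
```

A code generation visitor built this way would return one fragment per subtree and concatenate the fragments of the children in the order the target code requires.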

More information about Code Manager.

Semantic Errors

The Semantic Errors module simplifies the handling of errors by associating each error with a node in the tree.
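
A minimal sketch of an error collector follows. It assumes that the source position (line and column) has already been extracted from the tree node, and that errors should be listed in source order regardless of the order in which the traversals find them. MiniSemErrors, its methods, and the message format are hypothetical and do not reflect the real SemErrors interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error collector: stores errors with their source position.
class MiniSemErrors {
private:
  struct Error {
    std::size_t line;
    std::size_t col;
    std::string msg;
  };
  std::vector<Error> errors_;

public:
  void report(std::size_t line, std::size_t col, const std::string& msg) {
    errors_.push_back(Error{line, col, msg});
  }

  std::size_t count() const { return errors_.size(); }

  // Messages ordered by source position (line first, then column).
  std::vector<std::string> sorted() const {
    std::vector<Error> copy = errors_;
    std::stable_sort(copy.begin(), copy.end(),
                     [](const Error& a, const Error& b) {
                       return a.line != b.line ? a.line < b.line : a.col < b.col;
                     });
    std::vector<std::string> out;
    for (const auto& e : copy)
      out.push_back("Line " + std::to_string(e.line) + ":" +
                    std::to_string(e.col) + " error: " + e.msg);
    return out;
  }
};
```

In the main program above, the equivalent of count() is what decides whether code generation goes ahead.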

More information about Semantic Errors.