Auxiliary Compiler Modules¶
This page describes the modules you will need to use, following the skeleton code of your compiler.
The main program of your compiler will look like the following code. Read the comments and make sure you understand each step.
int main(int argc, const char* argv[]) {

  bool doTypeCheck=true, doCodeGen=true, doLLVM=false;
  std::string filename;
  for (int i=1; i<argc; ++i) {
    if (std::string(argv[i]) == "--noTypecheck") doTypeCheck=false;
    else if (std::string(argv[i]) == "--noCodegen") doCodeGen=false;
    else if (std::string(argv[i]) == "--genLLVM") doLLVM=true;
    else if (filename=="") {
      // it is not a valid option, so it must be the file name; make sure it is the first one
      filename = std::string(argv[i]);
    }
    else { // something unexpected came: not a valid option, and a second file name
      std::cout << "Usage: ./asl [--noTypecheck|--noCodegen|--genLLVM] [<file.asl>]" << std::endl;
      return EXIT_FAILURE;
    }
  }

  // open input file (or std::cin) and create a character stream
  antlr4::ANTLRInputStream input;
  if (filename != "") {
    std::ifstream stream;
    stream.open(filename);
    if (stream.fail()) {
      std::cout << "Could not open file: " << filename << std::endl;
      return EXIT_FAILURE;
    }
    input = antlr4::ANTLRInputStream(stream);
  }
  else { // read from std::cin
    input = antlr4::ANTLRInputStream(std::cin);
  }

  // create a lexer that consumes the character stream and produces a token stream
  AslLexer lexer(&input);
  antlr4::CommonTokenStream tokens(&lexer);

  // create a parser that consumes the token stream, and parses it.
  AslParser parser(&tokens);

  // call the parser and get the parse tree
  antlr4::tree::ParseTree *tree = parser.program();

  // check for lexical or syntactical errors
  if (lexer.getNumberOfSyntaxErrors() > 0 or
      parser.getNumberOfSyntaxErrors() > 0) {
    std::cout << "Lexical and/or syntactical errors have been found." << std::endl;
    return EXIT_FAILURE;
  }

  // print the parse tree (for debugging purposes)
  // std::cout << tree->toStringTree(&parser) << std::endl;

  if (not doTypeCheck) {
    std::cout << "-- Early stop: no typecheck has been made." << std::endl;
    return EXIT_SUCCESS;
  }

  // auxiliary classes we are going to need to store information while
  // traversing the tree. They are described below in this document
  TypesMgr types;
  SymTable symbols(types);
  TreeDecoration decorations;
  SemErrors errors;

  // create a visitor that looks for variable and function declarations
  // in the tree and stores the required information
  SymbolsVisitor symboldecl(types, symbols, decorations, errors);
  symboldecl.visit(tree);

  // create another visitor that will perform type checking wherever
  // it is needed (on expressions, assignments, parameter passing, etc.)
  TypeCheckVisitor typecheck(types, symbols, decorations, errors);
  typecheck.visit(tree);

  if (errors.getNumberOfSemanticErrors() > 0) {
    std::cout << "There are semantic errors: no code generated." << std::endl;
    return EXIT_FAILURE;
  }

  if (not doCodeGen) {
    std::cout << "-- Early stop: no code generated." << std::endl;
    return EXIT_SUCCESS;
  }

  // create a third visitor that will return the generated code
  // for each part of the tree, and will store it in 'mycode'
  CodeGenVisitor codegenerator(types, symbols, decorations);
  code mycode = std::any_cast<code>(codegenerator.visit(tree));

  // print generated code as output
  std::cout << mycode.dump() << std::endl;

  if (doLLVM) {
    std::string llvmStr = mycode.dumpLLVM(types, symbols);
    std::string llvmFileName;
    if (filename == "")
      llvmFileName = "output.ll";
    else {
      // keep only the base name: drop the directory part and the extension
      std::size_t slashPos = filename.rfind("/");
      std::size_t dotPos = filename.rfind(".");
      llvmFileName = filename.substr(slashPos+1, dotPos-slashPos-1) + ".ll";
    }
    std::ofstream myLLVMFile(llvmFileName, std::ofstream::out);
    myLLVMFile << llvmStr << std::endl;
    myLLVMFile.close();
  }

  return EXIT_SUCCESS;
}
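The last step of main derives the name of the LLVM output file from the input file name. In the skeleton, a file name without "/" still works because slashPos+1 wraps around to 0 in unsigned arithmetic. The sketch below spells those cases out explicitly. The function name llvmFileNameFor is our own invention for illustration and is not part of the provided code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the skeleton): computes the same output
// file name as main does, with the corner cases written out explicitly.
std::string llvmFileNameFor(const std::string& filename) {
  // No input file (program read from std::cin): use a fixed name.
  if (filename.empty()) return "output.ll";

  std::size_t slashPos = filename.rfind('/');
  std::size_t dotPos = filename.rfind('.');

  // The base name starts right after the last '/', or at 0 if there is none.
  std::size_t start = (slashPos == std::string::npos) ? 0 : slashPos + 1;

  // A '.' located before the last '/' belongs to a directory name, so it is
  // not treated as the start of an extension.
  std::size_t end = (dotPos == std::string::npos || dotPos < start)
                        ? filename.size()
                        : dotPos;

  return filename.substr(start, end - start) + ".ll";
}
```

For example, under these assumptions examples/prog.asl is mapped to prog.ll, written in the current directory.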
Code generated by ANTLR4¶
ANTLR4 is a parser generator that will create much of the code needed to build the compiler.
Given a grammar (file Asl.g4), ANTLR4 will generate the classes AslLexer and AslParser, which can be called directly from our main to obtain the parse tree of the target program.
Once we have the parse tree, we need to traverse it to perform type checking and code generation.
ANTLR4 will also generate an abstract class called AslVisitor that has one visit method for each rule (or rule label) in the grammar. The derived class AslBaseVisitor implements these methods so that they walk the entire tree.
The SymbolsVisitor, TypeCheckVisitor, and CodeGenVisitor classes are derived from AslBaseVisitor. In each of them, the visit methods for the nodes we want to process have been written, thus overriding the default methods in the base class.
Each visitor will deal with some nodes and ignore others, since each one does a different job.
For instance:
SymbolsVisitor will only declare visit methods for nodes related to variable, parameter, or function declarations, and ignore all the rest.
TypeCheckVisitor will declare visit methods for nodes related to expressions, assignments, and parameter passing, but ignore nodes about variable or function declarations.
CodeGenVisitor will declare visit methods for nodes related to instructions and expressions, but ignore others.
Thus, to build a compiler with ANTLR4, we only need to write the grammar, a short main program like the previous example, and the needed visit methods of one (or more) derived Visitor classes that deal with the tree nodes we want to process.
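The relation between the base visitor and the derived visitors can be illustrated with a small stand-alone sketch. It does not use ANTLR4: ToyNode, ToyBaseVisitor, and ToyDeclCollector are made-up names that only mimic the roles of the parse tree, AslBaseVisitor, and SymbolsVisitor. The base class walks the children by default, and the derived class overrides only the method for the node kind it cares about.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy parse tree node (stand-in for the context classes ANTLR4 generates).
struct ToyNode {
  std::string kind;  // e.g. "program", "block", "decl", "assign"
  std::string text;  // payload, e.g. the declared identifier
  std::vector<std::unique_ptr<ToyNode>> children;
};

std::unique_ptr<ToyNode> makeNode(std::string kind, std::string text = "") {
  auto node = std::make_unique<ToyNode>();
  node->kind = std::move(kind);
  node->text = std::move(text);
  return node;
}

// Plays the role of AslBaseVisitor: every visit method just walks the children.
class ToyBaseVisitor {
public:
  virtual ~ToyBaseVisitor() = default;
  // Dispatch on the node kind, as the generated code does per grammar rule.
  void visit(const ToyNode& n) {
    if (n.kind == "decl") visitDecl(n);
    else if (n.kind == "assign") visitAssign(n);
    else visitChildren(n);
  }
protected:
  virtual void visitDecl(const ToyNode& n) { visitChildren(n); }
  virtual void visitAssign(const ToyNode& n) { visitChildren(n); }
  void visitChildren(const ToyNode& n) {
    for (const auto& child : n.children) visit(*child);
  }
};

// Plays the role of SymbolsVisitor: overrides only the declaration method
// and inherits the default tree walk for everything else.
class ToyDeclCollector : public ToyBaseVisitor {
public:
  std::vector<std::string> declared;
protected:
  void visitDecl(const ToyNode& n) override { declared.push_back(n.text); }
};

// Builds a tiny tree and returns the identifiers found by the collector.
std::vector<std::string> collectSampleDeclarations() {
  auto program = makeNode("program");
  program->children.push_back(makeNode("decl", "x"));
  auto block = makeNode("block");
  block->children.push_back(makeNode("assign", "x = 1"));
  block->children.push_back(makeNode("decl", "y"));
  program->children.push_back(std::move(block));

  ToyDeclCollector collector;
  collector.visit(*program);
  return collector.declared;
}
```

The assignment node is reached by the walk but silently skipped, which is the same effect as a real visitor ignoring the nodes it has no visit method for.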
Auxiliary Modules¶
The classes TypesMgr, SymTable, TreeDecoration, and SemErrors, together with the Code Manager module (which contains class code and other related subclasses), are used by our ASL compiler to store data about the program being compiled and to propagate this information from one traversal to the next. For example, the symbol declaration traversal stores information about which variables are declared and which type each of them has. The type checking traversal then uses the type information to verify that operations are correctly performed, and the code generation traversal uses information about the names and sizes of the variables to produce the right low-level code.
Class code is useful to hold the partially generated code, extend it with new instructions, and print it when it is complete.
Class counters provides counters to keep track of used labels and temporaries.
Type Manager¶
The Type Manager stores which data types have been seen in the target program (e.g. bool, array of 10 char, function receiving one int and returning bool) and offers some methods to manipulate them.
More information about Type Manager.
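To give an idea of what such a module does, here is a minimal sketch of a type store that assigns one identifier to each distinct type, so that comparing two types reduces to comparing two identifiers. ToyTypesMgr and all its method names are assumptions made for this example. The actual TypesMgr interface is described in the linked page and may differ.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy type store (not the real TypesMgr).
class ToyTypesMgr {
public:
  using TypeId = std::size_t;

  TypeId createPrimitive(const std::string& name) { return intern(name); }

  TypeId createArray(std::size_t size, TypeId elem) {
    return intern("array<" + std::to_string(size) + "," +
                  descriptions_.at(elem) + ">");
  }

  TypeId createFunction(const std::vector<TypeId>& params, TypeId ret) {
    std::string desc = "function(";
    for (std::size_t k = 0; k < params.size(); ++k) {
      if (k > 0) desc += ",";
      desc += descriptions_.at(params[k]);
    }
    desc += ")->" + descriptions_.at(ret);
    return intern(desc);
  }

  // Structurally equal types share one id, so equality is an id comparison.
  bool equal(TypeId a, TypeId b) const { return a == b; }

  const std::string& describe(TypeId t) const { return descriptions_.at(t); }

private:
  // Returns the existing id for this description, or registers a new one.
  TypeId intern(const std::string& desc) {
    auto it = ids_.find(desc);
    if (it != ids_.end()) return it->second;
    TypeId id = descriptions_.size();
    descriptions_.push_back(desc);
    ids_.emplace(desc, id);
    return id;
  }

  std::unordered_map<std::string, TypeId> ids_;
  std::vector<std::string> descriptions_;
};
```

With this sketch, the three example types mentioned above would be built with createPrimitive("bool"), createArray(10, createPrimitive("char")), and createFunction({intId}, boolId), where intId and boolId are ids obtained earlier.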
Symbol Table¶
The Symbol Table stores which identifiers have been seen in the target program (variable names, parameter names, function names) and in which scope they appeared (i.e. in which function or code block), and associates them with a type (stored in the Type Manager).
More information about Symbol Table.
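The idea of scoped identifiers can be sketched with a stack of scopes, where lookup starts at the innermost scope and moves outwards. ToySymTable and its methods are hypothetical names used only for this illustration, and types are represented as plain strings to keep the example short. The real SymTable interface is documented in the linked page.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy symbol table (not the real SymTable).
class ToySymTable {
public:
  void pushScope(const std::string& name) {
    scopes_.push_back(Scope{name, {}});
  }

  void popScope() {
    if (!scopes_.empty()) scopes_.pop_back();
  }

  // Declares ident in the current scope. Returns false if there is no open
  // scope or the identifier is already declared in the current scope.
  bool declare(const std::string& ident, const std::string& type) {
    if (scopes_.empty()) return false;
    return scopes_.back().symbols.emplace(ident, type).second;
  }

  // Searches from the innermost scope outwards.
  std::optional<std::string> lookup(const std::string& ident) const {
    for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
      auto found = it->symbols.find(ident);
      if (found != it->symbols.end()) return found->second;
    }
    return std::nullopt;
  }

private:
  struct Scope {
    std::string name;
    std::unordered_map<std::string, std::string> symbols;
  };
  std::vector<Scope> scopes_;
};
```

In this sketch, a declaration traversal would call pushScope when entering a function, declare for each parameter and local variable, and popScope when leaving it.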
Tree Decoration¶
The Tree Decoration module allows storing information associated with specific nodes in the parse tree.
More information about Tree Decoration.
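One simple way to attach information to tree nodes without modifying the node classes is a map keyed by node address. The sketch below shows that idea under our own assumptions: ToyNode, ToyDecoration, putType, and getType are illustrative names, not the interface of the real TreeDecoration class.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Toy stand-in for a parse tree node.
struct ToyNode {
  std::string text;
};

// Hypothetical toy decoration store: one attribute (a type name) per node.
class ToyDecoration {
public:
  void putType(const ToyNode* node, const std::string& type) {
    types_[node] = type;
  }

  // Returns the stored type, or nothing if the node was never decorated.
  std::optional<std::string> getType(const ToyNode* node) const {
    auto it = types_.find(node);
    if (it == types_.end()) return std::nullopt;
    return it->second;
  }

private:
  std::unordered_map<const ToyNode*, std::string> types_;
};
```

A type checking traversal could store the type of each expression node this way, and a later code generation traversal could read it back from the same store.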
Code Manager¶
The Code Manager module contains several classes that ease the handling and combination of code fragments.
More information about Code Manager.
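As a rough picture of what combining code fragments means, the following sketch keeps a fragment as a list of instruction strings that can be concatenated and dumped, plus a counter that hands out fresh temporary names. ToyCode, ToyCounters, and the instruction texts are invented for this example. The real code and counters classes are described in the linked page.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy code fragment (not the real 'code' class).
class ToyCode {
public:
  ToyCode() = default;
  explicit ToyCode(std::string instr) : instrs_{std::move(instr)} {}

  // Concatenation: the instructions of rhs follow those of lhs.
  friend ToyCode operator+(ToyCode lhs, const ToyCode& rhs) {
    lhs.instrs_.insert(lhs.instrs_.end(), rhs.instrs_.begin(),
                       rhs.instrs_.end());
    return lhs;
  }

  // One instruction per line.
  std::string dump() const {
    std::string out;
    for (const auto& instr : instrs_) out += instr + "\n";
    return out;
  }

private:
  std::vector<std::string> instrs_;
};

// Hypothetical toy counter that produces fresh temporary names.
class ToyCounters {
public:
  std::string newTemp() { return "%" + std::to_string(++lastTemp_); }

private:
  int lastTemp_ = 0;
};
```

In a code generation visitor, the fragment returned for an assignment would typically be the fragment of the right-hand expression followed by the instruction that stores its result, which is what the + operator expresses here.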
Semantic Errors¶
The Semantic Errors module simplifies the handling of errors by associating each error with a node in the tree.
More information about Semantic Errors.
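A minimal sketch of an error collector is shown below. It records each error with a source position, which in a real compiler would be taken from the offending tree node, and prints the errors in source order regardless of the order in which the traversals found them. ToySemErrors, its methods, and the message format are assumptions for illustration only. Of the real SemErrors class, this document only shows getNumberOfSemanticErrors.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy error collector (not the real SemErrors).
class ToySemErrors {
private:
  struct Entry {
    std::size_t line;
    std::size_t column;
    std::string message;
  };
  std::vector<Entry> errors_;

public:
  void report(std::size_t line, std::size_t column,
              const std::string& message) {
    errors_.push_back(Entry{line, column, message});
  }

  std::size_t count() const { return errors_.size(); }

  // Renders all errors sorted by line and then by column.
  std::string render() const {
    std::vector<Entry> sorted = errors_;
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const Entry& a, const Entry& b) {
                       return a.line != b.line ? a.line < b.line
                                               : a.column < b.column;
                     });
    std::ostringstream out;
    for (const Entry& e : sorted) {
      out << "Line " << e.line << ":" << e.column << " error: " << e.message
          << "\n";
    }
    return out.str();
  }
};
```

The main program would then check count() after the type checking traversal and stop before code generation if it is greater than zero, as the skeleton does with getNumberOfSemanticErrors.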