# Compiler


## Structure

The project is structured in three directories.

- [`c_compiler`](/c_compiler) contains the main compiler, together with the final parser and lexer.
- [`c_parser`](/c_parser) contains the first parser that was written, which covered a very simple C-like syntax.
- [`c_lexer`](/c_lexer) contains the first lexer for the C syntax.


## Usage

The necessary dependencies are: `make bison flex g++-mips-linux-gnu qemu tidy`.

### Vagrant
All the necessary dependencies and the correct environment can be installed using Vagrant and the
provided Vagrantfile. This can be done using the following commands.

``` shell
# Installing the virtual machine
vagrant up

# ssh into the machine
vagrant ssh

# cd into the correct directory
cd /vagrant
```

The tests and builds can then be run from there.

### Build

The compiler, the parser and the lexer are built using the [`makefile`](/makefile). The following
commands build all the targets at once or each target individually.

``` shell
# Building all the targets
make

# Building the main compiler
make bin/c_compiler

# Building the parser
make bin/c_parser

# Building the lexer
make bin/c_lexer
```

### Testing

These can then be tested using the following shell scripts.

``` shell
# Running tests for the compiler
./run_test_deliverable.sh 1

# Running tests against the gcc reference implementation of the compiler
./run_test_deliverable.sh 2

# Running smaller testbench on compiler
./test_compiler.sh

# Running tests for the parser
./test_parser.sh

# Running tests for the lexer
./test_lexer.sh
```


## Functionality

Implemented compiler functionality following the C89 Spec.

- [x] Local variables
- [x] Integer arithmetic
- [x] While
- [x] IfElse
- [x] For
- [x] Function calls
- [x] Arrays
- [x] Multi Dimensional Arrays
- [x] Pointers
- [x] Strings
- [ ] Structures
- [ ] Floating-point

## AST

### Overview Diagram

Overview of the AST that is built by the compiler. It only contains some of the important classes
in the compiler.

![my-ast.png](/res/my-ast.png)

### Description

#### Description of the structure

I used a pure abstract [`Node`](/c_compiler/include/node.hpp) class as an entry point to the AST.
I then had a [`TranslationUnit`](/c_compiler/include/translation_unit.hpp) class which contained
the external declarations and function definitions in a vector. All other classes used linked
lists instead of vectors to store multiple elements. For example, the
[`Statement`](/c_compiler/include/statement.hpp) class has a pointer to the next statement instead
of having a separate [`StatementList`](/c_compiler/include/statement.hpp) class. This meant that
I did not have to write an extra list class, and it made it easier to inherit from the right classes.

I used the grammar to separate the classes properly, as an
[`Expression`](/c_compiler/include/expression.hpp) is completely different to a
[`Type`](/c_compiler/include/type.hpp), a [`Statement`](/c_compiler/include/statement.hpp)
or a [`Function`](/c_compiler/include/function.hpp).
This meant that I could separate the member functions and not have to declare all of them in the
[`Node`](/c_compiler/include/node.hpp) class, as that would lead to a lot of empty definitions.
These classes were mostly abstract too, but contained a few function definitions that throw
exceptions, so that I did not have to define these functions in subclasses that did not need them.
I also had a few member functions that are specific to one class
(e.g. for [`UnaryExpression`](/c_compiler/include/expression.hpp)), for which I had to use dynamic
and static pointer casting to access them.
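
The structure described above could be sketched roughly as follows. This is a minimal illustration and not the code in `c_compiler/include`: the member names (`print`, `constantFold`, `linkStatement`, `next`) and the `Constant`, `Identifier` and `ExpressionStatement` classes are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for the real AST classes.
class Node {
public:
    virtual ~Node() = default;
    virtual std::string print() const = 0;  // pure virtual entry point
};

class Expression : public Node {
public:
    // Default that throws, so subclasses only override what they support.
    virtual int constantFold() const {
        throw std::runtime_error("constantFold is not supported here");
    }
};

class Constant : public Expression {
public:
    explicit Constant(int value) : value_(value) {}
    std::string print() const override { return std::to_string(value_); }
    int constantFold() const override { return value_; }
private:
    int value_;
};

class Identifier : public Expression {
public:
    explicit Identifier(std::string name) : name_(std::move(name)) {}
    std::string print() const override { return name_; }
private:
    std::string name_;
};

class Statement : public Node {
public:
    // Linked list without a separate list class: each statement owns the next.
    void linkStatement(std::unique_ptr<Statement> next) {
        assert(next_ == nullptr && "statement is already linked");
        next_ = std::move(next);
    }
    const Statement* next() const { return next_.get(); }
private:
    std::unique_ptr<Statement> next_;
};

class ExpressionStatement : public Statement {
public:
    explicit ExpressionStatement(std::unique_ptr<Expression> e) : expr_(std::move(e)) {}
    std::string print() const override { return expr_->print() + ";"; }
private:
    std::unique_ptr<Expression> expr_;
};

// Walk the list; the loop condition is the nullptr check for the last element.
std::string printAll(const Statement* head) {
    std::string out;
    for (const Statement* s = head; s != nullptr; s = s->next()) {
        out += s->print();
    }
    return out;
}
```

In this sketch, calling `constantFold` on an `Identifier` throws, because only `Constant` overrides the default.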

#### Strengths

The class hierarchy and structure are very logical as they closely match the grammar.
This also meant that it was easy to know what member variables a class needed, and the
inheritance was very logical too. All the member functions are well separated
and only appear where they are actually used.

All the general base classes, that are mostly abstract as well, are in the bison union,
which means that I can use and assign all of those classes directly, and be more
specific in the member variables of those classes so that they only contain the types I need.

#### Limitations 

The [`Type`](/c_compiler/include/type.hpp) class is not very useful for arrays, as it does not
capture them correctly: for a multidimensional array, for example, it does not store all of the
array's information. It also does not let me extract this information correctly, as it only gives
me the primitive type of the array.

As I did not want all the classes to contain functions that they do not need, classes like
UnaryOperator have member functions of their own that their base class does not declare. To access
these I have to use dynamic casts, and with my linked lists I always have to check for nullptr
before doing anything with the next element.
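
The casting pattern this refers to looks roughly like the sketch below. The `getOperator` member and the `isDereference` helper are hypothetical names chosen for the example, and the classes are reduced to the minimum needed to compile.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Expression {
public:
    virtual ~Expression() = default;  // polymorphic base, required for dynamic_cast
};

class UnaryExpression : public Expression {
public:
    explicit UnaryExpression(std::string op) : op_(std::move(op)) {
        assert(!op_.empty() && "a unary expression needs an operator");
    }
    // Exists only on this subclass, so it is not reachable through Expression*.
    const std::string& getOperator() const { return op_; }
private:
    std::string op_;
};

class Identifier : public Expression {};

// True when `e` is a dereference such as *p. dynamic_cast returns nullptr
// both for a null input and for any other kind of expression.
bool isDereference(const Expression* e) {
    const auto* unary = dynamic_cast<const UnaryExpression*>(e);
    return unary != nullptr && unary->getOperator() == "*";
}
```

Each specialised accessor needs a cast and a null check of this kind at the call site, which is the cost described above.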

### Variable binding

#### General approach

I did not use many registers because they are a limited resource. Instead, I decided
to use only registers $2 and $3 for performing operations, and occasionally registers $t0 and $t1
when loading the address of a value, for example for pointers. I also used registers $4, $5, $6
and $7 for passing values to functions. I used a frame pointer for the current function to access
the stack. This frame is split into multiple parts: first, enough space to store and pass values
when making function calls, then space for all declared variables in the function, and lastly enough
space to store temporary results from expressions. It also stores the previous frame pointer and the
return address at the end of the frame. Every time I perform an operation, I store
its result in that temporary space in the frame. I keep track of the stack
position and the temporary expression stack position using the Bindings class. This class also
includes a map of variable bindings, which binds each identifier to its type and its stack position
relative to the frame pointer. The Bindings class is passed by value to account for scopes and
variable shadowing.
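
A minimal sketch of how passing the bindings by value gives scoping and shadowing for free is shown below. The member names (`insert`, `positionOf`), the `VariableInfo` struct and the use of a string for the type are assumptions made for the example, and the real class also tracks the temporary expression position.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record stored for each identifier.
struct VariableInfo {
    std::string type;    // the real class would store a Type object
    int stack_position;  // byte offset relative to the frame pointer
};

class Bindings {
public:
    explicit Bindings(int first_variable_offset) : next_position_(first_variable_offset) {}

    // Bind (or shadow) a name at the next free word and return its offset.
    int insert(const std::string& id, const std::string& type) {
        int position = next_position_;
        variables_[id] = VariableInfo{type, position};
        next_position_ += 4;  // assumes every variable occupies one 4-byte word
        return position;
    }

    int positionOf(const std::string& id) const {
        auto it = variables_.find(id);
        assert(it != variables_.end() && "identifier was not declared");
        return it->second.stack_position;
    }

private:
    std::map<std::string, VariableInfo> variables_;
    int next_position_;
};

// Taking Bindings by value gives the inner block its own copy: declaring `x`
// here shadows the outer `x` without modifying the caller's map.
int offsetOfShadowedX(Bindings inner) {
    inner.insert("x", "int");
    return inner.positionOf("x");
}
```

After `offsetOfShadowedX` returns, the caller's `Bindings` still maps `x` to its original offset, which matches how an inner scope ends in C.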

#### Strengths

The Bindings class stores the type of the identifier so that I can look
it up and perform the right operation in the [`Expression`](/c_compiler/include/expression.hpp) class. Storing the type there also
means that I do not have to store the type of an [`Expression`](/c_compiler/include/expression.hpp), but can deduce it instead.

By only using two registers for operations, I do not have to worry about running out of
registers, or about which registers are preserved across a function call.
The temporary expression result will always be in the current frame of the function, even after
a function call.

#### Limitations

As I am only using two registers to perform operations, I have to include loads and
stores to access the temporary results of the operations. I also store the result of an operation
even when I do not need it anymore. This means that the generated code runs much slower.
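
To illustrate the cost, the sketch below emits the kind of MIPS sequence this approach produces for a single addition whose operands already sit in frame slots. The `emitAdd` function and its offset parameters are hypothetical, and the exact instructions the compiler emits may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Emit MIPS assembly for `lhs + rhs`, where both operands and the result
// live in word-sized slots addressed relative to the frame pointer.
std::string emitAdd(int lhs_offset, int rhs_offset, int result_offset) {
    assert(lhs_offset % 4 == 0 && rhs_offset % 4 == 0 && result_offset % 4 == 0);
    std::ostringstream out;
    out << "\tlw\t$2," << lhs_offset << "($fp)\n"      // load left operand
        << "\tlw\t$3," << rhs_offset << "($fp)\n"      // load right operand
        << "\taddu\t$2,$2,$3\n"                        // the actual operation
        << "\tsw\t$2," << result_offset << "($fp)\n";  // spill the result
    return out.str();
}
```

Three of the four instructions are memory accesses, which is where the slowdown comes from.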

When counting the variables that are being assigned in the current function, I assume that
they are all integers. I also always store the return address in $31, even when there is no
function call. This means that the frame is often larger than it needs to be.
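
The frame layout from the general approach can be turned into a size computation, sketched below. The `frameSize` function and its parameters are hypothetical. It assumes 4-byte words, that the argument area always has room for at least the four argument registers $4 to $7, and that the total is rounded up to a multiple of 8 bytes as the MIPS o32 convention requires; the real compiler may compute the size differently.

```cpp
#include <algorithm>
#include <cassert>

// Frame size in bytes under the assumptions stated above.
int frameSize(int max_call_args, int variable_count, int temp_count) {
    assert(max_call_args >= 0 && variable_count >= 0 && temp_count >= 0);
    const int word = 4;
    int arg_area = std::max(4, max_call_args) * word;  // slots for outgoing arguments
    int locals = variable_count * word;                // every variable counted as an int
    int temps = temp_count * word;                     // spilled expression results
    int saved = 2 * word;                              // previous $fp and $31, always stored
    int total = arg_area + locals + temps + saved;
    return (total + 7) / 8 * 8;                        // round up to 8-byte alignment
}
```

Under these assumptions even a function with no calls, variables or temporaries gets a 24-byte frame, which shows the overhead described above.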

### Reflection

#### Strengths

Although the [`Type`](/c_compiler/include/type.hpp) class was not structured well for handling arrays, it does work well
with the other types and makes it easy to store and load variables using the
right operations. It also made it easy to identify pointers for pointer arithmetic and to store
values in arrays without padding.

Function calls work well as they follow the MIPS ABI and leave enough space for the called
function to store the parameters in the previous frame. The return address is also always stored,
which makes recursive calls possible. Function declarations also work because they do not have
to emit assembly.

#### Scope for Improvement

I would like to work on multidimensional arrays as they do not fully work. To do
that I would have to add information such as the initial array in the type so that I can assign
arrays inside the array to a pointer and perform the right pointer arithmetic.
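
As an illustration of why the dimensions need to be kept in the type, the sketch below computes the byte offset of an element in a row-major array. The `elementOffset` function is hypothetical and is not part of the compiler; it only shows the arithmetic that the extra type information would enable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte offset of a[i0][i1]... in a row-major array with the given dimensions.
std::size_t elementOffset(const std::vector<std::size_t>& dims,
                          const std::vector<std::size_t>& indices,
                          std::size_t element_size) {
    assert(dims.size() == indices.size());
    std::size_t linear = 0;
    for (std::size_t k = 0; k < dims.size(); ++k) {
        assert(indices[k] < dims[k]);
        linear = linear * dims[k] + indices[k];  // every inner dimension is needed here
    }
    return linear * element_size;
}
```

For `int a[3][4]`, the element `a[1][2]` lies at byte offset `(1 * 4 + 2) * 4 = 24`, which cannot be computed from the primitive type alone.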

I would also like to reduce the number of dynamic and static casts by adding more
intermediate classes. This would also make the whole AST design more extensible and it would
be easier to possibly add structures in the future. This would also make classes more specialized
and maintainable.