To run, start tinycc from the build directory. You can pass it --verbose to enable debug prints; any other argument is treated as the path to a file to be compiled. If no file is specified, tinycc executes all tests defined in main.cpp.
The code is generated by tiling. Each option that can generate code is weighted by the number of instructions it produces. The tiler then finds the path through the expression tree with the smallest total code size and generates code along it, yielding the best instruction selection the tile set allows.
For example, the source expression a+3+b*6 is compiled into a single LEA instruction.
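As a hedged illustration of the idea, the core of cost-based tiling can be sketched as a minimum-cost cover of the expression tree. The tile shapes and costs below are invented for the example (they are not the project's actual tile set, and real x86 LEA has scale restrictions this ignores):

```cpp
#include <algorithm>

// Hypothetical sketch of cost-based tiling. Each tile option is weighted
// by the number of instructions it emits; the tiler keeps the cheapest
// cover of each subtree.
struct Node {
    char op;                                   // '+', '*', or 0 for a leaf
    Node *lhs = nullptr, *rhs = nullptr;
};

// Generic tiling: one instruction per operator node.
int genericCost(const Node *n) {
    if (!n || n->op == 0) return 0;            // leaves cost nothing here
    return 1 + genericCost(n->lhs) + genericCost(n->rhs);
}

// Best tiling: additionally try a single LEA-style tile that covers a
// whole "x + y*z" shape in one instruction.
int bestCost(const Node *n) {
    if (!n || n->op == 0) return 0;
    int cost = 1 + bestCost(n->lhs) + bestCost(n->rhs);
    if (n->op == '+' && n->rhs && n->rhs->op == '*')
        cost = std::min(cost, 1);              // one LEA covers the shape
    return cost;
}
```

For a + b*c the generic tiling needs two instructions, while the LEA tile brings the whole subtree down to one — the same effect the example above relies on.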
The project includes non-trivial additional support for passing values not only in registers, but also in flags. Usually, a comparison has to be generated by branching with the appropriate jump instruction and producing a register value, like any other operator. When that result then feeds into a branch, the register has to be compared against zero again and further jumps generated. Since comparison operators mostly appear in ifs, this is highly inefficient — and exactly what the additional support addresses.
Unless asked to, the codegen doesn’t generate any code to put the result of a comparison into a register. It prefers to produce only meta-information such as: should there be branching based on this result, the conditional jump to use is X.
For example, if the source code contains if(a>b), no register is generated for the logical result of a>b. Instead, the codegen recognizes that the comparison feeds an if, and branches with the GT jump instruction.
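A minimal sketch of the idea follows; the names (Cond, jumpFor, lowerIf) and the emitted text are invented for illustration, not the project's actual interfaces:

```cpp
#include <string>

// Instead of materializing the comparison into a 0/1 register, the codegen
// can hand back meta-information: which conditional jump to use if the
// result feeds a branch.
enum class Cond { GT, LT, GE, LE, EQ, NE };

std::string jumpFor(Cond c) {
    switch (c) {
        case Cond::GT: return "jg";
        case Cond::LT: return "jl";
        case Cond::GE: return "jge";
        case Cond::LE: return "jle";
        case Cond::EQ: return "je";
        case Cond::NE: return "jne";
    }
    return "";
}

// if (a > b) lowers to one cmp plus one conditional jump -- no intermediate
// 0/1 value and no second compare against zero.
std::string lowerIf(Cond c, const std::string &thenLabel) {
    return "cmp a, b\n" + jumpFor(c) + " " + thenLabel;
}
```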
Operators & and && are different, as are | and ||. The project carries this distinction all the way to codegen, where the appropriate measures are taken.
The logical operators (&& and ||) generate branching. If the first operand of a logical AND is false, the second operand is not evaluated. This is important, because the second operand could have side effects. The logical OR behaves analogously, short-circuiting when its first operand is true. Bitwise AND/OR, on the other hand, simply compute the result with the appropriate target instruction.
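The semantic difference the codegen must preserve can be demonstrated directly in C++ (probe is a stand-in for any operand with a side effect):

```cpp
// Logical && short-circuits: when the first operand is false, the second
// is never evaluated. Bitwise & always evaluates both operands.
int sideEffects = 0;

bool probe(bool v) {
    ++sideEffects;                    // observable side effect
    return v;
}

bool logicalAnd(bool a, bool b) { return a && probe(b); }  // may skip probe
bool bitwiseAnd(bool a, bool b) { return a & probe(b); }   // always runs probe
```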
The project incorporates top-down local register allocation. The algorithm first finds the most frequently accessed variables; accesses to those are then replaced with register accesses, reducing the memory overhead. At the end of each basic block, all variables are spilled back to memory.
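The selection step can be sketched as a simple access-count heuristic. This is a hedged illustration — the function name and interface are invented, and the actual allocator has more to do (assigning concrete registers, emitting the spills):

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count accesses per variable inside a basic block and give the k
// available registers to the most frequently accessed variables.
// Everything else stays in memory.
std::vector<std::string> pickRegisterVars(
        const std::vector<std::string> &accesses, std::size_t k) {
    std::map<std::string, int> counts;
    for (const auto &v : accesses) ++counts[v];

    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const auto &a, const auto &b) { return a.second > b.second; });

    std::vector<std::string> chosen;
    for (std::size_t i = 0; i < ranked.size() && i < k; ++i)
        chosen.push_back(ranked[i].first);
    return chosen;
}
```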
The project includes non-trivial constant propagation. Each line of the program gets its own state, which is also propagated between basic blocks. The results of the analysis are then applied to the program: not only replacing expressions with their compile-time values, but also eliminating branches where possible. The analysis works on registers and local variables in unison.
The constant propagation fully supports cycles, and traverses up through the lattice until a fixed point is reached.
When it knows which path the code will take, the applied propagation also replaces conditional branches with unconditional jumps. Further optimizations then remove the dead path.
Apart from constants, the analysis also supports these extended states: Positive, Negative, SemiPositive, SemiNegative, and NonZero. These values increase the precision and make it possible to evaluate even more code at compile time.
Suppose, for example, a variable is initialized to zero and then decremented in a loop. At first, its value is the constant 0. After reaching the fixed point, the algorithm recognizes it as a semi-negative value (it can be negative or zero).
Should a later branch compare whether that variable is less than or equal to zero, the optimizer will know the answer (it is!) and take only the corresponding path, replacing the branch with an unconditional jump.
Similar optimizations apply to values known to be non-zero (such as pointers), or, for example, to results of logical comparisons, which are always semi-positive.
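A sketch of how such a lattice could look, with an invented join (the merge where control-flow paths meet) and an abstract transfer function for decrement; the project's actual lattice and transfer functions may differ in detail:

```cpp
// Abstract value lattice: exact zero, the sign states from the text, and
// Top for "unknown". Bottom means "no information yet".
enum class Abs { Bottom, Zero, Positive, Negative,
                 SemiPositive, SemiNegative, NonZero, Top };

// Lattice merge at control-flow joins.
Abs join(Abs a, Abs b) {
    if (a == b) return a;
    if (a == Abs::Bottom) return b;
    if (b == Abs::Bottom) return a;
    auto semiNeg = [](Abs x) {
        return x == Abs::Zero || x == Abs::Negative || x == Abs::SemiNegative; };
    auto semiPos = [](Abs x) {
        return x == Abs::Zero || x == Abs::Positive || x == Abs::SemiPositive; };
    auto nonZero = [](Abs x) {
        return x == Abs::Positive || x == Abs::Negative || x == Abs::NonZero; };
    if (semiNeg(a) && semiNeg(b)) return Abs::SemiNegative;
    if (semiPos(a) && semiPos(b)) return Abs::SemiPositive;
    if (nonZero(a) && nonZero(b)) return Abs::NonZero;
    return Abs::Top;
}

// Transfer function for x - 1: if x <= 0, then x - 1 < 0.
Abs decrement(Abs x) {
    if (x == Abs::Zero || x == Abs::Negative || x == Abs::SemiNegative)
        return Abs::Negative;
    return Abs::Top;                  // other cases omitted in this sketch
}
```

Iterating join and decrement from Zero reproduces the loop example: the value climbs from Zero to SemiNegative and stabilizes there, which is enough to decide an `x <= 0` branch at compile time.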
The analysis doesn’t support taking addresses or dereferencing, making it effectively useless for programs that make heavy use of pointers. The result is still sound: the moment a variable's address is taken, its value is set to unknown (top), but this also means the analysis gives up any hope of optimizing that variable.
A similar approach is taken for short-circuiting, as the constant propagation does not support it natively.
On its own, the constant propagation would have unsatisfactory results, as it only adds or modifies instructions, never removing them. This is where dead code elimination comes into play: it filters the code down to only the statements that contribute to the final result (or have side effects).
When paired with constant propagation, it removes the leftover code that is no longer needed once a result has been computed at compile time. This makes it possible to reduce a whole program to just its return statement, should everything be computable at compile time.
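A hedged sketch of such a pass over straight-line code (the Instr representation is invented for the example; the real IM is richer): walk backwards, keep an instruction only if it has side effects or defines a value that is still needed, and propagate the need to its operands.

```cpp
#include <set>
#include <string>
#include <vector>

struct Instr {
    std::string def;                 // variable written, "" if none
    std::vector<std::string> uses;   // variables read
    bool sideEffect = false;         // e.g. store, call, return
};

std::vector<Instr> eliminateDead(std::vector<Instr> code) {
    std::set<std::string> live;      // values still needed later
    std::vector<Instr> kept;
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        bool needed = it->sideEffect || live.count(it->def);
        if (!needed) continue;                         // dead: drop it
        live.erase(it->def);                           // def is now covered
        live.insert(it->uses.begin(), it->uses.end()); // operands are needed
        kept.push_back(*it);
    }
    return {kept.rbegin(), kept.rend()};               // restore order
}
```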
The project supports inlining of basic blocks. If an unconditional jump targets a block with no other predecessors, the algorithm “inlines” the two blocks into one. It also deflates the jump chains that can sometimes be present in the IM.
This decreases the number of basic blocks, shrinks the code (by the removed jumps), and gives the back-end more context, allowing its tiling process to produce better results.
After running the previous optimizations (and not only then), there often remain basic blocks that can never be reached. They are removed, which decreases code size and compilation time and also enables more inlining.
The algorithm searches for multiple definitions of the same basic block and, when found, redirects all references to a single one. This decreases code size when such duplicates exist (often in a switch, for example) and enables further optimizations.
The code is fed to a peepholer that looks for a conditional jump immediately followed by an unconditional jump to the same location. When found, the conditional jump is removed.
In combination with deduplication, this makes it possible to output code containing no branching at all when both branches produce the same result.
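The rule can be sketched over an abstract jump list (the Jump struct is invented for the example; a real peepholer operates on the full instruction stream):

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Jump { bool conditional; std::string target; };

// Drop a conditional jump when the very next instruction is an
// unconditional jump to the same label: control ends up there either way.
std::vector<Jump> peephole(const std::vector<Jump> &code) {
    std::vector<Jump> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].conditional && i + 1 < code.size() &&
            !code[i + 1].conditional && code[i + 1].target == code[i].target)
            continue;                       // redundant conditional jump
        out.push_back(code[i]);
    }
    return out;
}
```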
If a jump's target is another jump instruction, this optimization removes the overhead by retargeting it straight to the final destination.
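A sketch of the chain-following step, assuming blocks that consist only of a jump are recorded in a map (an assumption for the example; also, a real pass would guard against cyclic chains, which this sketch omits):

```cpp
#include <map>
#include <string>

// Follow a chain of jump-only blocks to its final destination, so a
// jump into the chain can be retargeted directly.
std::string finalTarget(const std::map<std::string, std::string> &jumpOnly,
                        std::string block) {
    while (jumpOnly.count(block))
        block = jumpOnly.at(block);         // hop to the next link
    return block;
}
```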
Jumps that lead to the instruction immediately following them are omitted. This yields much smaller code, and combines with more advanced optimizations, such as:
The algorithm tries all possible code layouts (permuting the order of the basic blocks) until it finds the one with the smallest average jump length. Conditional jumps have half the weight of unconditional ones, because they are not always taken.
This optimization also takes a parameter (pipeline size) that decides when it is better to produce fewer instructions in total (omitting some jumps by placing basic blocks after one another), and when it is better to lower the average jump length, even at the cost of more instructions.
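The exhaustive search can be sketched as follows (names invented; distance here is measured in block positions rather than bytes, and the pipeline-size trade-off is left out). Note that trying every permutation is only feasible for small functions:

```cpp
#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <vector>

// A jump between basic blocks; conditional jumps get weight 0.5,
// unconditional ones weight 1.0.
struct JumpEdge { int from, to; double weight; };

// Total weighted jump distance for a given block ordering.
double layoutCost(const std::vector<int> &order,
                  const std::vector<JumpEdge> &jumps) {
    std::vector<int> pos(order.size());
    for (std::size_t i = 0; i < order.size(); ++i)
        pos[order[i]] = static_cast<int>(i);
    double cost = 0;
    for (const auto &j : jumps)
        cost += j.weight * std::abs(pos[j.to] - pos[j.from]);
    return cost;
}

// Try every ordering of n blocks and keep the cheapest one.
std::vector<int> bestLayout(int n, const std::vector<JumpEdge> &jumps) {
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::vector<int> best = order;
    do {
        if (layoutCost(order, jumps) < layoutCost(best, jumps))
            best = order;
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```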
Similarly to the IM level, basic blocks that can never be reached are removed by the algorithm.