Why yet another 6502 C compiler?
As an LLVM backend, we benefit from the expansive high-level optimizations available. These include radical code transformations of switch statements, loops, and table lookups. None of this is beyond what a human could do, of course; humans wrote all of these optimizations. But there's potential far beyond what a human would have patience for. As an example, switch statement cases may be shifted and bitwise operations applied to them to make the different case integers denser. This increases the number that can fit into a jump table, which decreases the amount of branching needed to execute the switch. A human could do that for a switch statement, but it's unlikely they'd go through the effort for any but the most performance-critical ones. LLVM will tirelessly consider it for every single switch in the program.
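The switch densification described above can be sketched by hand in C. This is only an illustration of the idea, not the compiler's actual output: the function names, the case values, and the lookup table are hypothetical, and the second function shows roughly what a shifted, denser form looks like when written out manually.

```c
#include <stdint.h>

/* Sparse form: the case values 0, 8, 16, 24 span a range of 25 values,
   so a direct jump table would be mostly empty. (Hypothetical example.) */
uint8_t sprite_height_sparse(uint8_t mode) {
    switch (mode) {
    case 0:  return 8;
    case 8:  return 16;
    case 16: return 24;
    case 24: return 32;
    default: return 0;
    }
}

/* Hand-written dense form: every case shares a factor of 8, so shifting
   it out leaves the contiguous values 0..3, which index a 4-entry table. */
uint8_t sprite_height_dense(uint8_t mode) {
    static const uint8_t heights[4] = {8, 16, 24, 32};
    if (mode & 7)       /* low bits set: not one of the original cases */
        return 0;
    mode >>= 3;         /* 0, 8, 16, 24 -> 0, 1, 2, 3 */
    return mode < 4 ? heights[mode] : 0;
}
```

Both functions return the same result for every possible `mode`; the point is that the second shape is tedious to maintain by hand, while a compiler can consider it for every switch.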
At a lower level, good use of the zero page is essential to producing good 6502 code. To that end, we model the zero page as an "imaginary register" bank. The placement of these registers is completely customizable by the end user to fit a variety of target system memory models. Using registers for this purpose gives us full access to LLVM's register allocator, which can often allocate program temporary values in such a way that they never need to leave the zero page, A, X, and Y. This vastly reduces the need for a soft (emulated) stack, which has been a sticking point for earlier 6502 compilers.
Even when a stack of some kind is required, the optimizer performs whole-program analysis to identify functions that cannot simultaneously have more than one invocation active. These functions can have their "stack frames" allocated in absolute memory, again avoiding use of the soft stack. We reserve the actual soft stack only for cases where it cannot be statically proven that a function doesn't intrinsically require it (due to function pointers or other complex control flow).
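To make the distinction concrete, here is a small sketch in C. It assumes recursion as the example of a function that needs more than one active frame (the text above mentions function pointers and other complex control flow; recursion is just the simplest case to show). All names are hypothetical, and the `static` locals are a hand-written stand-in for what a statically allocated "stack frame" amounts to, not the compiler's actual mechanism.

```c
#include <stdint.h>

/* Never calls itself, directly or indirectly, so at most one invocation
   is active at a time. Its locals are candidates for fixed addresses. */
uint16_t sum_bytes(const uint8_t *data, uint8_t len) {
    uint16_t total = 0;
    for (uint8_t i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Hand-written picture of a static frame: the same locals, but placed at
   fixed addresses by declaring them static. Only safe because the
   function is never re-entered. */
uint16_t sum_bytes_static(const uint8_t *data, uint8_t len) {
    static uint16_t total;
    static uint8_t i;
    total = 0;
    for (i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Recursive: several invocations are active at once, and each needs its
   own copy of n, so a real (soft) stack is required here. */
uint16_t sum_to(uint8_t n) {
    return n == 0 ? 0 : (uint16_t)(n + sum_to((uint8_t)(n - 1)));
}
```

The first two functions behave identically; the third cannot be rewritten with `static` locals without breaking it, which is the kind of case the soft stack is reserved for.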
As for the code itself, we perform a remarkably effective loop optimization that detects 16-bit index operations that can be converted to a 16-bit base address plus an 8-bit index. The latter is a directly supported addressing mode on the 6502, and 8-bit index manipulation can be done in a single instruction. This allows us to convert idiomatic 16-bit "int c" loops into something much more suitable for the 6502. Eventually, we hope that optimizations of this kind will transform standard, naive C code into tightly optimized 6502 code.
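The loop transformation above can be pictured with a hand-written before and after in C. This is a sketch under assumptions, not the optimizer's real output: the function names are hypothetical, and the comments naming 6502 instructions describe the intended shape rather than verified generated code.

```c
#include <stdint.h>

/* Idiomatic form: a 16-bit int index walks the whole buffer. */
void fill_naive(uint8_t *dst, uint8_t value, int len) {
    for (int i = 0; i < len; i++)
        dst[i] = value;
}

/* Hand-written equivalent: a 16-bit base pointer that advances one page
   (256 bytes) at a time, plus an 8-bit index within the page. */
void fill_paged(uint8_t *dst, uint8_t value, int len) {
    if (len <= 0)
        return;
    while (len >= 256) {
        uint8_t i = 0;
        do {
            dst[i] = value;      /* intended shape: STA (ptr),Y */
        } while (++i != 0);      /* 8-bit increment wraps after 256 steps */
        dst += 256;              /* bump only the high byte of the pointer */
        len -= 256;
    }
    /* Remaining partial page: len is now in 0..255. */
    for (uint8_t i = 0; i < (uint8_t)len; i++)
        dst[i] = value;
}
```

Both functions write exactly the same bytes. The second keeps all per-iteration arithmetic in 8 bits, which is where the 6502 is comfortable.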
Because this is a subproject of LLVM, it inherits all the features of LLVM. In particular, this project provides full ELF support for 6502 objects, libraries, and executables. This opens up previously impossible functionality, such as viewing 6502 program properties in ELF tools that don't know anything about the 6502 specifically.
llvm-mos is not limited to C. It also provides a fully functional assembler and disassembler that read and write assembly source files in a GNU assembler-compatible format. This in turn opens up a world of macro programming functionality for those who prefer to work at the metal level.
A proof of concept exists demonstrating that llvm-mos can support Rust as a source language. This suggests that llvm-mos can support other source languages as well, such as C++.
Lastly, the llvm-mos project is entirely open source and is developed with LLVM coding standards in mind. Want to experiment with a new codegen pass, or add a new target? Jump right in, clone the codebase, and start playing.
See also Findings.