Code generation overview

See also this talk at 2022 EuroLLVM, which gives a high-level overview of llvm-mos's code generation strategy.

As with all LLVM backends, the bulk of the implementation lives in its own directory within the LLVM hierarchy, in llvm/lib/Target/MOS. llvm-mos uses LLVM's new GlobalISel architecture for instruction selection.

The instruction set of the 6502 isn't much more irregular than that of most modern CPUs, but the way its regularity manifests itself is unusual. Modern CPUs generally have multiple operand slots for each opcode, so that addresses, indices, offsets, source registers, and destination registers can be directly specified. For the 6502, many of these values are instead encoded implicitly in the opcode itself. For example, the 6502 can load from an absolute memory location into any register (A, X, or Y), but there are three opcodes to do so: LDA ADDR, LDX ADDR, and LDY ADDR. A more modern CPU would typically use one opcode: "LD R, ADDR".

To make the 6502 instruction set look more like what LLVM expects, we recast it "as if" it were a more modern instruction set. Thus, we report that the 6502 does have one "LD R, ADDR" instruction, where R is A, X, or Y. After code is generated in terms of these "logical instructions", we lower them down to the real 6502 instructions for final assembly output, machine code generation, and linking.
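As a rough illustration of that lowering step, the sketch below models a logical load as an (operation, register) pair and maps it to a real mnemonic. It is written in Python purely for illustration; the names `REAL_OPCODES` and `lower` are hypothetical, and the actual backend does this work in C++ inside LLVM.

```python
# Illustrative model only: not the llvm-mos implementation.
# A logical instruction carries its register as an operand; the real
# 6502 mnemonic encodes that register in the opcode itself.
REAL_OPCODES = {
    ("LD", "A"): "LDA",
    ("LD", "X"): "LDX",
    ("LD", "Y"): "LDY",
}


def lower(op: str, reg: str, addr: int) -> str:
    """Lower a logical 'op reg, addr' to real 6502 assembly text."""
    mnemonic = REAL_OPCODES[(op, reg)]
    return f"{mnemonic} ${addr:04X}"


if __name__ == "__main__":
    for reg in ("A", "X", "Y"):
        print(lower("LD", reg, 0x1234))
```

The point of the table is that the register becomes an ordinary operand during code generation and is folded back into the opcode only at the very end.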

Because we model the 6502 instruction set in a way that is amenable to LLVM's algorithms, we benefit greatly from its machine-independent optimization flows, from instruction selection, to register allocation, to basic block layout. There are some 6502-specific difficulties, but LLVM provides relatively good means for targets like ours to sort them out themselves, by providing pseudoinstructions and subroutine implementations that abstract away the complexity so that LLVM doesn't need to know about it.

The original architects of the NMOS 6502 compensated for the 6502's small number of registers by providing 256 bytes of memory called the zero page, which could be accessed relatively quickly and cheaply by the processor. The llvm-mos C compiler utilizes a user-selectable range of zero-page memory, and performs nearly all of its operations there directly. We refer to these selectable ranges of zero page as imaginary registers, to distinguish them from LLVM's virtual registers. Code generation allocates and chooses imaginary registers for all operations that do not access 16-bit memory. Eventually, references to the imaginary registers are emitted as abstract symbols like "__rc12". The linker script will later map these to available locations in the zero page, depending on the target's specific memory map. Accordingly, the compiler's use of the zero page is highly customizable, and it can make use of the highly discontiguous zero-page fragments typical on real 6502 hardware.
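To make the symbol-to-address mapping concrete, here is a minimal Python sketch of how symbols such as "__rc12" could be assigned to bytes drawn from discontiguous zero-page fragments. In llvm-mos this mapping is done by the linker script, not by code like this; the function name `assign_imaginary_registers` and the fragment ranges are hypothetical, and the sketch ignores any placement constraints a real target may impose.

```python
# Illustrative model only: in llvm-mos the mapping is done by the linker
# script, not by code like this. The fragment ranges used are made up.
def assign_imaginary_registers(fragments, count):
    """Map __rc0..__rc{count-1} onto free zero-page bytes.

    fragments: list of (start, length) zero-page ranges available to
    the compiler; they need not be contiguous.
    """
    free = [start + i for start, length in fragments for i in range(length)]
    if any(not 0x00 <= addr <= 0xFF for addr in free):
        raise ValueError("fragment lies outside the zero page")
    if count > len(free):
        raise ValueError("not enough zero-page bytes for imaginary registers")
    return {f"__rc{n}": free[n] for n in range(count)}


if __name__ == "__main__":
    mapping = assign_imaginary_registers([(0x02, 2), (0xFB, 2)], 4)
    for name, addr in mapping.items():
        print(f"{name} = ${addr:02X}")
```

Because generated code refers only to the symbols, the same compiled output can be placed on targets with quite different zero-page layouts.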

Because we reserve a chunk of zero page memory for imaginary registers, and because LLVM has a great deal of specialized knowledge about pointer offsets and the like, llvm-mos can intelligently use a lot of the 6502's specialized addressing modes. For example, when a memory address that is an offset from a specific 16-bit pointer must be calculated, and that offset is at most 255 bytes, llvm-mos can use LDA/STA (zp),y instructions to access that memory directly in a single instruction. This in turn means that LLVM's GetElementPtr (GEP) instruction can in many cases be reduced to a small handful of 6502 instructions. This is a big win when operating on pointed-to structs.
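The effective-address calculation behind (zp),y can be sketched as a small simulation. The Python below is only a model of the addressing mode, under the assumption that memory is a flat 64 KiB byte array; `indirect_indexed_address` and `load_field` are hypothetical helper names, not llvm-mos functions.

```python
# Illustrative model only: a tiny simulation of the 6502's (zp),y
# effective-address calculation, not llvm-mos code.
def indirect_indexed_address(memory, zp, y):
    """Return the address accessed by LDA/STA (zp),y."""
    # The 16-bit pointer is stored little-endian in two zero-page bytes.
    base = memory[zp] | (memory[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


def load_field(memory, zp, field_offset):
    """Model 'LDY #offset' followed by 'LDA (zp),y' for a struct field."""
    if not 0 <= field_offset <= 0xFF:
        raise ValueError("offset does not fit in the 8-bit Y register")
    return memory[indirect_indexed_address(memory, zp, field_offset)]


if __name__ == "__main__":
    memory = bytearray(0x10000)
    memory[0x10], memory[0x11] = 0x00, 0x20  # pointer $2000 held at $10/$11
    memory[0x2003] = 0x7F                    # field at offset 3
    print(hex(load_field(memory, 0x10, 3)))
```

With the pointer already sitting in a pair of zero-page bytes, reaching a field costs one index load and one access, which is why small constant offsets map so well onto this mode.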

We can even try to produce 8-bit offsets where they wouldn't otherwise exist. For example, wherever possible, we rewrite 16-bit pointer loop indices to a sum of a 16-bit base and an 8-bit offset. Later on, the sum will be folded away into an LDA/STA (zp),y instruction. The advantage is that the loop increment is now just INY, not a full 16-bit increment. This optimization is possible because LLVM allows us to detect this pattern early in the codegen pipeline. See MOSIndexIV.cpp for more information on this particular optimization.
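The equivalence this rewrite relies on can be shown with a small model. The Python sketch below compares a loop driven by a full 16-bit pointer with the same loop expressed as a fixed base plus an 8-bit offset. It does not reflect how MOSIndexIV.cpp is implemented; both function names are hypothetical, and it assumes the trip count is known to fit in 8 bits.

```python
# Illustrative model only: shows the equivalence the rewrite relies on,
# not the MOSIndexIV.cpp implementation.
def sum_with_16bit_pointer(memory, start, n):
    """Loop whose induction variable is a full 16-bit pointer."""
    total, ptr = 0, start
    for _ in range(n):
        total += memory[ptr]
        ptr = (ptr + 1) & 0xFFFF  # 16-bit increment: multi-byte arithmetic on a 6502
    return total


def sum_with_8bit_index(memory, base, n):
    """Same loop as base + 8-bit offset; the increment models INY."""
    if n > 256:
        raise ValueError("offset would not fit in 8 bits")
    total, y = 0, 0
    for _ in range(n):
        total += memory[(base + y) & 0xFFFF]  # models LDA (zp),y
        y = (y + 1) & 0xFF                    # models INY
    return total


if __name__ == "__main__":
    memory = bytearray(range(256)) * 256  # 64 KiB of sample data
    a = sum_with_16bit_pointer(memory, 0x20F0, 40)
    b = sum_with_8bit_index(memory, 0x20F0, 40)
    print(a == b)
```

Both loops touch the same addresses; only the second keeps the base fixed and advances a single byte per iteration.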