Current status

C compiler

  • Code generation for the Commodore 64, Atari 800, and an included 6502 simulator.
  • The high and low-level optimizations expected of a young-ish LLVM backend
    • Sophisticated register allocation over A, X, Y, and a field of 16 2-byte zero-page (imaginary) registers
    • The imaginary registers can be placed anywhere and need not be contiguous.
    • The calling convention passes through registers whenever possible.
    • Loop optimizations to select 6502 addressing modes
    • Whole program "static stack" optimization
      • Automatically identifies non-reentrant functions and allocates their frames as static globals
      • Programs without recursion or complex function pointers may not need a soft stack at all.
      • No manual annotations required
    • Link time inlining and optimization across the whole program
      • Includes SDK libraries. Library calls can often be optimized away completely!
  • Broad C99 and C++11 freestanding standards compatibility
    • Interrupt handling
    • C++ templates
    • C++ virtual functions
    • C++ new/delete
    • C++ Run-Time Type Information (dynamic_cast, typeid)
    • C++ static constructors/destructors (run before and after main)
    • C++ "magic" function local static constructors/destructors
  • Excellent compiler usability
    • Clang's world-class error messages
    • IDE integration via the Language Server Protocol, provided by the included custom clangd
    • Straightforward invocations to compile for various targets: mos-c64-clang++ -Os -o game.prg game.cc
  • A small standard library sufficient to provide the above and a few extras
    • Simple printf
    • Simple malloc/free
    • exit, _Exit, and atexit
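
The "static stack" item in the list above can be illustrated with a small sketch. It is plain C with nothing llvm-mos specific in it, and the function names are made up for this example. Whether the compiler actually moves a given frame into static storage depends on its own whole-program analysis, so treat the comments as the intuition rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: this function never calls itself and is not used
 * from an interrupt handler here, so at most one activation exists at a
 * time. A frame like this is a candidate for static allocation. */
uint16_t sum_bytes(const uint8_t *data, size_t len) {
    uint16_t total = 0;
    for (size_t i = 0; i < len; ++i)
        total += data[i];
    return total;
}

/* Hypothetical example: recursion means several activations can be live
 * at once, so each one needs its own frame on a real (soft) stack. */
uint16_t factorial(uint8_t n) {
    return n <= 1 ? 1 : (uint16_t)(n * factorial((uint8_t)(n - 1)));
}
```

A program built only from functions like sum_bytes is the kind the list describes as possibly not needing a soft stack at all.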

Notable Exceptions

  • 64 KiB locals, which C99 requires supporting
  • Hosted C with all the standard library bells and whistles.
  • Float/double
  • C++ Exceptions

Assembler

The assembler, llvm-mc, understands and assembles all NMOS 6502 opcodes. The assembler correctly understands symbols, and it's possible to use them as branch targets, do pointer math on them, and the like. Fixups work as expected at link time.

The assembler correctly deals with 6502 relative branches. BEQ, BCC, etc., all correctly calculate PC relative offsets in the unusual 6502 convention, in the range of [-126,+129]. Since llvm-mc is GNU assembler compatible, you can use all GNU assembler features while writing 65xx code, including macros, ifdefs, and similar.
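
To make the branch range concrete, here is a small C sketch of the arithmetic involved. The helper name is invented for this example and is not part of llvm-mc. It assumes the standard 6502 rule that the signed 8-bit displacement is applied to the address of the instruction following the two-byte branch, which is where a range of [-126,+129] measured from the branch itself comes from. Address wrap-around at the ends of the 64 KiB space is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of llvm-mc.
 * branch_addr: address of the branch opcode.
 * target:      address the branch should reach.
 * On success, stores the encoded displacement byte and returns true. */
bool encode_relative_branch(uint16_t branch_addr, uint16_t target,
                            int8_t *displacement) {
    /* The CPU adds the displacement to the address after the 2-byte branch. */
    int32_t delta = (int32_t)target - ((int32_t)branch_addr + 2);
    if (delta < -128 || delta > 127)
        return false; /* out of reach for a single relative branch */
    *displacement = (int8_t)delta;
    return true;
}
```

A branch at $1000 can therefore reach $1081 (displacement +127) and $0F82 (displacement -128), but nothing further away.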

The assembler can work out, at compile time, whether a symbol refers to a zero-page or a 16-bit location. If you place a symbol in a section named .zeropage, .directpage, or .zp, that symbol is assumed to be located in zero page; otherwise, it is assumed to refer to a 16-bit address. Additionally, a symbol placed in a section with the z section flag enabled is assumed to be located in zero page, with addressing calculated accordingly.
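
The section-name rule can be pictured with a short C sketch. This only restates the rule from the paragraph above and is not the assembler's actual code. The function name is hypothetical, and the sketch assumes the names must match exactly.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical illustration, not llvm-mc source: true when a section name
 * is one of the three names the text says imply zero page. The "z"
 * section flag is not modeled here. */
bool section_name_implies_zero_page(const char *name) {
    static const char *const zp_names[] = { ".zeropage", ".directpage", ".zp" };
    for (size_t i = 0; i < sizeof zp_names / sizeof zp_names[0]; ++i) {
        if (strcmp(name, zp_names[i]) == 0)
            return true;
    }
    return false;
}
```

The distinction matters because a zero-page operand takes one byte, while an absolute 16-bit operand takes two.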

The assembler and linker both understand that $ is a legal prefix for hexadecimal constants. Much existing 6502 assembly code depends on this older convention. See the DollarIsHexPrefix constant in MCAsmInfo.h. The lexer now queries the current MCAsmInfo structure to see whether the target wants the dollar sign to be a hex prefix. So, everything that depends on the lexer (which is almost everything in LLVM) can now recognize 6502 format hexadecimal constants, if the corresponding MCAsmInfo asks for it. The modern 0x prefix works fine as well.
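
As a rough picture of what honoring the dollar prefix means, the C sketch below accepts a constant written either as $D020 or as 0xD020. It is not LLVM's lexer. The function name is hypothetical, and the dollar_is_hex_prefix parameter merely stands in for the per-target MCAsmInfo setting mentioned above. The limit of eight hex digits is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not LLVM's lexer: parse "$1F" or "0x1F".
 * Returns false for anything that is not a well-formed hex constant. */
bool parse_hex_constant(const char *text, bool dollar_is_hex_prefix,
                        uint32_t *value) {
    const char *digits;
    if (text[0] == '$' && dollar_is_hex_prefix)
        digits = text + 1;               /* 6502-style prefix */
    else if (text[0] == '0' && (text[1] == 'x' || text[1] == 'X'))
        digits = text + 2;               /* modern prefix */
    else
        return false;

    uint32_t acc = 0;
    size_t n = 0;
    for (; digits[n] != '\0'; ++n) {
        char c = digits[n];
        uint32_t d;
        if (c >= '0' && c <= '9')      d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (uint32_t)(c - 'A' + 10);
        else return false;               /* not a hex digit */
        if (n >= 8) return false;        /* more than 32 bits of digits */
        acc = (acc << 4) | d;
    }
    if (n == 0) return false;            /* prefix with no digits */
    *value = acc;
    return true;
}
```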

The assembler understands the names of the 6502 registers, including a, x, y, p, sp, and pc. It understands references to these names to be references to those registers. If your code depends on these names as variable or section names, you can force the assembler to use the prefix of llvm_mos_ on those registers, e.g. llvm_mos_a, llvm_mos_x, etc. To require this prefix on references to those registers, enable the mos-long-register-names compilation feature. For example, with llvm-mc, use the flag -mattr=+mos-long-register-names. Printed assembly output uses this naming scheme to avoid conflicts with existing code.

To target MOS family processors, you will need to use a triple of "mos" (try: -triple mos) as a parameter to any tool.

Linker

The ELF file format for objects and executables has been extended to support 65xx compatible processors. Hello-world type programs have been proven to compile and work as expected on emulated Commodore 64, VIC-20, and Apple II machines.

The linker, lld, can be called with a "-flavor gnu" parameter in order to permit linking of ELF executables.

If you've written an appropriate linker file for your 6502 target, you can season the following overly verbose formula to compile assembly code for your particular target, where %s is the name of your assembly source, %S is the directory of your include files, and %t is the base name of your project:

llvm-mc -triple mos --filetype=obj -o=%t.obj %s 
llvm-objdump --all-headers --print-imm-hex -D %t.obj 
llvm-readelf --all %t.obj
lld -flavor gnu %t.obj -o %t.elf -L %S your-target.ld
llvm-readelf --all %t.elf 
llvm-objdump --all-headers --print-imm-hex -D %t.elf
llvm-objcopy --output-target binary --strip-unneeded %t.elf %t.bin

The llvm-objdump and llvm-readelf programs are not necessary; they're just there to help you debug your own pipeline.

As of this writing, some example linker files and BASIC stubs exist at llvm/test/MC/MOS/Inputs.

ELF

Both the assembler and the linker support the ELF format, for both object files and executables. The ELF format has been extended with a machine type of 6502 (naturally) to permit storing 65xx code in ELF files.
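
Because the machine type lives in the ordinary ELF header, a generic tool can identify a MOS file with a few lines of code. The C sketch below reads the e_machine field from a little-endian ELF header held in memory. The function and macro names are invented for this example, and the value 6502 is taken from the paragraph above and assumed to be decimal.

```c
#include <stddef.h>
#include <stdint.h>

/* Value taken from the text above; assumed to be decimal 6502. */
#define EM_MOS_SKETCH 6502

/* Hypothetical helper: returns the e_machine field of a little-endian ELF
 * header, or -1 if the buffer is too short, lacks the ELF magic, or is
 * not marked little-endian. */
int32_t elf_machine_le(const uint8_t *header, size_t len) {
    if (len < 20)
        return -1;                       /* e_machine ends at byte 19 */
    if (header[0] != 0x7F || header[1] != 'E' ||
        header[2] != 'L'  || header[3] != 'F')
        return -1;                       /* missing ELF magic */
    if (header[5] != 1)
        return -1;                       /* EI_DATA: 1 means little-endian */
    /* e_machine is a 16-bit field at offset 18, after e_ident and e_type. */
    return (int32_t)(header[18] | (header[19] << 8));
}
```

Comparing the result against EM_MOS_SKETCH is then enough to tell whether a file claims to contain 65xx code.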

Because the 6502 assembler and linker both work with ELF files, you can use any of your favorite tools to inspect or understand ELF files generated by the LLVM tools. The llvm-readobj, llvm-objdump, llvm-objcopy, llvm-strip, and likely the other command line tools as well, work as expected. This also means that generic tools that work on ELF files, such as this online ELF viewer, can read and dump basic information about MOS executables.

Deployment

If any branch builds and passes the check-all suite of LLVM tests, it's immediately deployed so you can get fresh bits for your favorite platform. Always grab the main build first. We smoke test on Windows, macOS, and Ubuntu, but llvm-mos should build on anything that LLVM can build on. You can download fresh binary bits from the GitHub releases page.

Implementation

TableGen has been taught all native 6502 instructions and formats. llvm-mc can assemble all 6502 instructions.

For examples of what the backend can do as of this writing, see the functional assembler tests in the llvm/test/MC/MOS directory. Building the check-llvm-mc-mos target runs just these tests for MOS.